Bug 218427

Summary: Kernel panic during kernel startup: ARM Marvell Dove in 6.6
Product: Linux
Reporter: walther-it
Component: Kernel
Assignee: Ard Biesheuvel (ardb)
Status: RESOLVED CODE_FIX
Severity: blocking
CC: ardb, bagasdotme, regressions
Priority: P3
Hardware: ARM
OS: Linux
Kernel Version: 6.6.14, 6.7.2, 6.8-rc2
Subsystem:
Regression: Yes
Bisected commit-id: 8bcba70cb5c2204a011e06278a1fbfb1213e1df1
Attachments: config; patch to handle thumb2 encodings of ldc/stc

Description walther-it 2024-01-27 17:17:03 UTC
After migrating from kernel 6.5 to 6.6.14 (keeping the default settings during the upgrade), the kernel panics during startup:

[    5.463091] ### dt-test ### EXPECT \ : platform testcase-data:testcase-device2: error -ENXIO: IRQ index 0 not found
[    5.463108] platform testcase-data:testcase-device2: error -ENXIO: IRQ index 0 not found
[    5.481573] ### dt-test ### EXPECT / : platform testcase-data:testcase-device2: error -ENXIO: IRQ index 0 not found
[    5.481580] ### dt-test ### pass of_unittest_platform_populate():1464
[    5.498402] ### dt-test ### pass of_unittest_platform_populate():1469
[    5.504934] ### dt-test ### pass of_unittest_platform_populate():1475
[    5.512498] ### dt-test ### pass of_unittest_platform_populate():1495
[    5.518950] ### dt-test ### pass of_unittest_platform_populate():1495
[    5.525794] ### dt-test ### pass of_unittest_platform_populate():1505
[    5.532221] ### dt-test ### pass of_unittest_platform_populate():1505
[    5.538751] ### dt-test ### pass of_unittest_lifecycle():3176
[    5.544538] ### dt-test ### EXPECT \ : OF: ERROR: of_node_release() detected bad of_node_put() on /testcase-data/refcount-node
[    5.544548] ### dt-test ### pass of_unittest_lifecycle():3201
[    5.561648] OF: ERROR: of_node_release() detected bad of_node_put() on /testcase-data/refcount-node
[    5.570693] ### dt-test ### EXPECT / : OF: ERROR: of_node_release() detected bad of_node_put() on /testcase-data/refcount-node
[    5.570701] ### dt-test ### EXPECT \ : ------------[ cut here ]------------
[    5.582052] ### dt-test ### EXPECT \ : WARNING: <<all>>
[    5.589000] ### dt-test ### EXPECT \ : refcount_t: underflow; use-after-free.
[    5.594215] ### dt-test ### EXPECT \ : ---[ end trace <<int>> ]---
[    5.601314] ### dt-test ### pass of_unittest_lifecycle():3221
[    5.613194] ------------[ cut here ]------------
[    5.617802] WARNING: CPU: 0 PID: 1 at lib/refcount.c:28 refcount_warn_saturate+0x13c/0x174
[    5.626062] refcount_t: underflow; use-after-free.
[    5.630829] Modules linked in:
[    5.633892] CPU: 0 PID: 1 Comm: swapper Tainted: G                 N 6.6.14 #1
[    5.641091] Hardware name: Marvell Dove
[    5.644911] Backtrace: 
[    5.647356]  dump_backtrace from show_stack+0x20/0x24
[    5.652418]  r7:00000009 r6:c06068b4 r5:000b0013 r4:c0d5bb38
[    5.658053]  show_stack from dump_stack_lvl+0x2c/0x34
[    5.663107]  dump_stack_lvl from dump_stack+0x18/0x1c
[    5.668155]  r5:0000001c r4:c0d94164
[    5.671712]  dump_stack from __warn+0x88/0x12c
[    5.676161]  __warn from warn_slowpath_fmt+0x120/0x1a4
[    5.681301]  r8:c06068b4 r7:c0d941a0 r6:c0d94164 r5:c1964600 r4:00000000
[    5.687972]  warn_slowpath_fmt from refcount_warn_saturate+0x13c/0x174
[    5.694497]  r10:13ea60fe r9:89464842 r8:abbf414f r7:c1d455ac r6:0503ae7f r5:c7e49613
[    5.702290]  r4:c1d455ec
[    5.704812]  refcount_warn_saturate from kobject_put+0xe0/0xe8
[    5.710636]  kobject_put from of_node_put+0x24/0x28
[    5.715524]  r7:c1d455ac r6:0503ae7f r5:c7e49613 r4:c117c890
[    5.721159]  of_node_put from of_unittest+0x4370/0x470c
[    5.726387]  of_unittest from do_one_initcall+0x58/0x28c
[    5.731701]  r10:64370a8f r9:c0f6e858 r8:c0f6e834 r7:00000000 r6:c1964600 r5:c0f57bd8
[    5.739494]  r4:c0faa898
[    5.742016]  do_one_initcall from kernel_init_freeable+0x294/0x348
[    5.748192]  r8:c0f6e834 r7:00000054 r6:00000007 r5:36eba1c8 r4:c0faa898
[    5.754864]  kernel_init_freeable from kernel_init+0x28/0x140
[    5.760609]  r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c0b40b60
[    5.768404]  r4:c1003240
[    5.770926]  kernel_init from ret_from_fork+0x14/0x28
[    5.775971] Exception stack(0xf0819fb0 to 0xf0819ff8)
[    5.781005] 9fa0:                                     00000000 00000000 00000000 00000000
[    5.789155] 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    5.797300] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[    5.803890]  r5:c0b40b60 r4:00000000
[    5.807469] ---[ end trace 0000000000000000 ]---
[    5.812070] ### dt-test ### EXPECT / : ---[ end trace <<int>> ]---
[    5.812076] ### dt-test ### EXPECT / : refcount_t: underflow; use-after-free.
[    5.818244] ### dt-test ### EXPECT / : WARNING: <<all>>
[    5.825360] ### dt-test ### EXPECT / : ------------[ cut here ]------------
[    5.830558] ### dt-test ### EXPECT_NOT \ : ------------[ cut here ]------------
[    5.837499] ### dt-test ### EXPECT_NOT \ : WARNING: <<all>>
[    5.844783] ### dt-test ### EXPECT_NOT \ : refcount_t: underflow; use-after-free.
[    5.850327] ### dt-test ### EXPECT_NOT \ : ---[ end trace <<int>> ]---
[    5.857786] ### dt-test ### pass of_unittest_lifecycle():3238
[    5.870019] ### dt-test ### EXPECT_NOT / : ---[ end trace <<int>> ]---
[    5.870025] ### dt-test ### EXPECT_NOT / : refcount_t: underflow; use-after-free.
[    5.876564] ### dt-test ### EXPECT_NOT / : WARNING: <<all>>
[    5.884029] ### dt-test ### EXPECT_NOT / : ------------[ cut here ]------------
[    5.889582] ### dt-test ### pass of_unittest_lifecycle():3264
[    5.902590] ### dt-test ### pass of_unittest_lifecycle():3265
[    5.908449] ### dt-test ### pass of_unittest_check_tree_linkage():271
[    5.914885] ### dt-test ### pass of_unittest_check_tree_linkage():272
[    5.921295] ### dt-test ### end of unittest - 257 passed, 0 failed
[    5.927549] clk: Disabling unused clocks
[    5.938529] Freeing unused kernel image (initmem) memory: 1024K
[    5.965422] Run /init as init process
[    5.970498] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004
[    5.978135] CPU: 0 PID: 1 Comm: init Tainted: G        W        N 6.6.14 #1
[    5.985077] Hardware name: Marvell Dove
[    5.988895] Backtrace: 
[    5.991341]  dump_backtrace from show_stack+0x20/0x24
[    5.996412]  r7:c0d5116c r6:00000000 r5:600d0093 r4:c0d5bb38
[    6.002045]  show_stack from dump_stack_lvl+0x2c/0x34
[    6.007092]  dump_stack_lvl from dump_stack+0x18/0x1c
[    6.012141]  r5:00000000 r4:c113e338
[    6.015696]  dump_stack from panic+0x118/0x314
[    6.020139]  panic from make_task_dead+0x0/0x174
[    6.024766]  r3:00000001 r2:00000000 r1:00000004 r0:c0d5116c
[    6.030404]  r7:00000048
[    6.032924]  do_exit from do_group_exit+0x48/0xbc
[    6.037627]  r7:00000048
[    6.040148]  do_group_exit from get_signal+0x9f4/0xa40
[    6.045297]  get_signal from do_work_pending+0x11c/0x518
[    6.050612]  r10:004226a4 r9:c1964600 r8:00000000 r7:00000000 r6:f0819fb0 r5:00000000
[    6.058406]  r4:c1964600
[    6.060927]  do_work_pending from slow_work_pending+0xc/0x24
[    6.066569] Exception stack(0xf0819fb0 to 0xf0819ff8)
[    6.071602] 9fa0:                                     bef7b518 00000000 0004ead6 b6f8cdc0
[    6.079754] 9fc0: bef7b704 00000000 004c78cc bef7b6c0 bef7b710 b6f8da60 004226a4 bef7b6a8
[    6.087900] 9fe0: bef7b580 bef7b4f8 b6f70d07 b6f84024 000d0030 ffffffff
[    6.094494]  r10:004226a4 r9:c1964600 r8:00000000 r7:c1964600 r6:ffffffff r5:000d0030
[    6.102288]  r4:b6f84026
[    6.104817] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004 ]---

The system is an armhf machine (Marvell Dove CuBox) running Debian testing (trixie).
Comment 1 walther-it 2024-01-27 17:21:29 UTC
Created attachment 305784 [details]
config
Comment 2 Bagas Sanjaya 2024-01-28 04:29:37 UTC
(In reply to walther-it from comment #0)
> after migrating from kernel 6.5 to 6.6.14 with default settings during
> upgrade, a kernel panic during kernel startup is shown:
> 

Then please bisect (see [1] for how to perform that).

Also, it's helpful to test current mainline (v6.8-rc1) to confirm or deny
this regression.

[1]: https://lore.kernel.org/linux-doc/c763e15e-e82e-49f8-a540-d211d18768a3@leemhuis.info/
Comment 3 walther-it 2024-01-29 20:46:23 UTC
Hello Sanjaya,

I performed a bisect between Linux 6.6-rc1 (bad) and Linux 6.5 (good).
It traced the problem down to:

8bcba70cb5c2204a011e06278a1fbfb1213e1df1 is the first bad commit
commit 8bcba70cb5c2204a011e06278a1fbfb1213e1df1
Author: Ard Biesheuvel <ardb@kernel.org>
Date:   Sun Mar 19 15:18:25 2023 +0100

    ARM: entry: Disregard Thumb undef exception in coproc dispatch
    
    Now that the only remaining coprocessor instructions being handled via
    the dispatch in entry-armv.S are ones that only exist in a ARM (A32)
    encoding, we can simplify the handling of Thumb undef exceptions, and
    send them straight to the undefined instruction handlers in C code.
    
    This also means we can drop the code that partially decodes the
    instruction to decide whether it is a 16-bit or 32-bit Thumb
    instruction: this is all taken care of by the undef hook.
    
    Acked-by: Linus Walleij <linus.walleij@linaro.org>
    Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

 arch/arm/kernel/entry-armv.S | 121 +++++++------------------------------------
 1 file changed, 18 insertions(+), 103 deletions(-)
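
The commit message mentions code that partially decoded an instruction to tell 16-bit from 32-bit Thumb encodings. As a rough illustration only (this is not the removed assembly, and the function name is made up for this note), the architectural rule is that a first halfword whose top five bits are 0b11101, 0b11110 or 0b11111 starts a 32-bit instruction, which is the same as comparing against 0xe800:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the kernel sources.
 *
 * Returns true if the first halfword of a Thumb instruction marks a
 * 32-bit (Thumb-2) encoding. Bits [15:11] equal to 0b11101, 0b11110 or
 * 0b11111 select the wide form, which is equivalent to hw >= 0xe800.
 */
bool thumb_is_wide(uint16_t first_halfword)
{
    return first_halfword >= 0xe800;
}
```

For example, 0x4770 (the 16-bit `bx lr`) is narrow, while any halfword starting with 0xf0 begins a 32-bit instruction.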

The full log is:
# bad: [0bb80ecc33a8fb5a682236443c1e740d5c917d1d] Linux 6.6-rc1
# good: [2dde18cd1d8fac735875f2e4987f11817cc0bc2c] Linux 6.5
git bisect start '0bb80ecc33a8fb5a682236443c1e740d5c917d1d' '2dde18cd1d8fac735875f2e4987f11817cc0bc2c'
# good: [461f35f014466c4e26dca6be0f431f57297df3f2] Merge tag 'drm-next-2023-08-30' of git://anongit.freedesktop.org/drm/drm
git bisect good 461f35f014466c4e26dca6be0f431f57297df3f2
# bad: [e925992671907314b7db6793a28eb39b36bc21a4] Merge tag 'staging-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
git bisect bad e925992671907314b7db6793a28eb39b36bc21a4
# good: [0e72db77672ff4758a31fb5259c754a7bb229751] Merge tag 'soc-dt-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
git bisect good 0e72db77672ff4758a31fb5259c754a7bb229751
# good: [df57721f9a63e8a1fb9b9b2e70de4aa4c7e0cd2e] Merge tag 'x86_shstk_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good df57721f9a63e8a1fb9b9b2e70de4aa4c7e0cd2e
# bad: [e0152e7481c6c63764d6ea8ee41af5cf9dfac5e9] Merge tag 'riscv-for-linus-6.6-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
git bisect bad e0152e7481c6c63764d6ea8ee41af5cf9dfac5e9
# bad: [659b3613fc635fb1813fb3006680876b24d86919] Merge tag 'dlm-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
git bisect bad 659b3613fc635fb1813fb3006680876b24d86919
# good: [4d15721177d539d743fcf31d7bb376fb3b81aeb6] powerpc/mm: Cleanup memory block size probing
git bisect good 4d15721177d539d743fcf31d7bb376fb3b81aeb6
# good: [b9bbbf4979073d5536b7650decd37fcb901e6556] powerpc/mpc5xxx: Add missing fwnode_handle_put()
git bisect good b9bbbf4979073d5536b7650decd37fcb901e6556
# bad: [f441ff73f1ec568acef03f0ce4d5088c7e65c106] powerpc: Fix pud_mkwrite() definition after pte_mkwrite() API changes
git bisect bad f441ff73f1ec568acef03f0ce4d5088c7e65c106
# bad: [53ae158f6ddc14df5c44d62c06e33fdb66de1196] Merge tag 'arm-vfp-refactor-for-rmk' of git://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux into devel-stable
git bisect bad 53ae158f6ddc14df5c44d62c06e33fdb66de1196
# good: [6ee1e6772e1e19436f573672de5ff8aab7163be6] ARM: kernel: Get rid of thread_info::used_cp[] array
git bisect good 6ee1e6772e1e19436f573672de5ff8aab7163be6
# bad: [8bcba70cb5c2204a011e06278a1fbfb1213e1df1] ARM: entry: Disregard Thumb undef exception in coproc dispatch
git bisect bad 8bcba70cb5c2204a011e06278a1fbfb1213e1df1
# good: [cdd87465adfd75e4ebd11507575533c6bf7a5525] ARM: vfp: Use undef hook for handling VFP exceptions
git bisect good cdd87465adfd75e4ebd11507575533c6bf7a5525
# first bad commit: [8bcba70cb5c2204a011e06278a1fbfb1213e1df1] ARM: entry: Disregard Thumb undef exception in coproc dispatch

Is there anything else I can do to help find the root cause and fix it?

Best regards
Comment 4 The Linux kernel's regression tracker (Thorsten Leemhuis) 2024-01-30 05:12:03 UTC
(In reply to walther-it from comment #3)
>
> I performed a bisect between Linux 6.6-rc1 (bad) and Linux 6.5 (good).
> It traced the problem down to:
> 
> 8bcba70cb5c2204a011e06278a1fbfb1213e1df1 is the first bad commit
> commit 8bcba70cb5c2204a011e06278a1fbfb1213e1df1
> Author: Ard Biesheuvel <ardb@kernel.org>
> Date:   Sun Mar 19 15:18:25 2023 +0100
> 
>     ARM: entry: Disregard Thumb undef exception in coproc dispatch
>     
>     Now that the only remaining coprocessor instructions being handled via
>     the dispatch in entry-armv.S are ones that only exist in a ARM (A32)
>     encoding, we can simplify the handling of Thumb undef exceptions, and
>     send them straight to the undefined instruction handlers in C code.
> [...]     

Two quick follow up questions, the first one being the more important one:

* does the problem still happen with 6.8-rc2?
* if the problem still happens there, could you try to "git revert" the commit and, if the revert applies, compile another kernel to see if the problem goes away?
Comment 5 walther-it 2024-01-30 12:52:58 UTC
Hi Thorsten,

yes, the issue also happens with:
- 6.7.2
- 6.8-rc2

the last working kernel is:
- 6.5.13

thus I assume it's a regression.

I also tried to revert 8bcba70cb5c2204a011e06278a1fbfb1213e1df1 but this failed, as the file was subsequently changed in the relevant sections,
i.e. by ardb@kernel.org in commits:
- 303d6da167dcbc3dd89adf3ca4e36c369950ed01 and
- 47ba5f39eab3c2a9a1ba878159a6050f2bbfc0e2

Any further help is appreciated.
Comment 6 Ard Biesheuvel 2024-01-30 14:48:05 UTC
Created attachment 305793 [details]
patch to handle thumb2 encodings of ldc/stc

Please try the attached patch.
Comment 7 walther-it 2024-01-31 20:32:57 UTC
Hi Ard,
thanks for the update.
I tried the patch on top of both 6.6.14 and 6.8-rc2; unfortunately, no luck.
The result looks very similar:

[    5.956636] clk: Disabling unused clocks
[    5.967674] Freeing unused kernel image (initmem) memory: 1024K
[    5.994270] Run /init as init process
[    5.999463] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    6.007098] CPU: 0 PID: 1 Comm: init Tainted: G        W        N 6.6.14 #1
[    6.014039] Hardware name: Marvell Dove
[    6.017858] Backtrace: 
[    6.020304]  dump_backtrace from show_stack+0x20/0x24
[    6.025374]  r7:c0d51598 r6:00000000 r5:600d0193 r4:c0d5bf64
[    6.031008]  show_stack from dump_stack_lvl+0x2c/0x34
[    6.036063]  dump_stack_lvl from dump_stack+0x18/0x1c
[    6.041121]  r5:00000000 r4:c113e338
[    6.044676]  dump_stack from panic+0x118/0x314
[    6.049127]  panic from do_exit+0xab8/0xac4
[    6.053323]  r3:00000001 r2:00000000 r1:0000000b r0:c0d51598
[    6.058960]  r7:000000d4
[    6.061481]  do_exit from do_group_exit+0x48/0xbc
[    6.066192]  r7:000000d4
[    6.068712]  do_group_exit from get_signal+0xa30/0xa64
[    6.073854]  get_signal from do_work_pending+0x11c/0x518
[    6.079168]  r10:004346a4 r9:c19c8000 r8:00000000 r7:00000000 r6:f0819fb0 r5:00000000
[    6.086962]  r4:c19c8000
[    6.089484]  do_work_pending from slow_work_pending+0xc/0x24
[    6.095135] Exception stack(0xf0819fb0 to 0xf0819ff8)
[    6.100167] 9fa0:                                     00000000 00000000 00000000 00000060
[    6.108319] 9fc0: be917704 00000000 004d98cc be9176c0 be917710 b6f89a60 004346a4 be9176a8
[    6.116464] 9fe0: be9175b0 be9174f8 b6f6cd07 b6f7aa5e 400d0030 ffffffff
[    6.123060]  r10:004346a4 r9:c19c8000 r8:00000000 r7:c19c8000 r6:ffffffff r5:400d0030
[    6.130854]  r4:b6f7aa5e
[    6.133381] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
Comment 8 Ard Biesheuvel 2024-01-31 22:48:23 UTC
Please rebuild your kernel with CONFIG_DEBUG_USER enabled, and boot with user_debug=1 on the command line. That should give more information in the log about the instruction that triggered the exception.
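
A note on the value passed to user_debug: it is a bit mask. The sketch below decodes it using the bit assignments given in the CONFIG_DEBUG_USER help text (1 = undefined instruction events, 2 = system calls, 4 = invalid data aborts, 8 = SIGSEGV faults, 16 = SIGBUS faults). The two helper functions are hypothetical and exist only for this illustration; they are not kernel APIs.

```c
#include <stddef.h>
#include <stdio.h>

/* Bit values as listed in the CONFIG_DEBUG_USER help text. */
#define UDBG_UNDEFINED (1u << 0) /* undefined instruction events */
#define UDBG_SYSCALL   (1u << 1) /* system calls */
#define UDBG_BADABORT  (1u << 2) /* invalid data aborts */
#define UDBG_SEGV      (1u << 3) /* SIGSEGV faults */
#define UDBG_BUS       (1u << 4) /* SIGBUS faults */

/* Hypothetical helper: is a given event class enabled by this mask? */
int user_debug_enables(unsigned int user_debug, unsigned int event_bit)
{
    return (user_debug & event_bit) != 0;
}

/* Hypothetical helper: list every event class a mask enables. */
void print_user_debug(unsigned int user_debug)
{
    static const struct {
        unsigned int bit;
        const char *what;
    } classes[] = {
        { UDBG_UNDEFINED, "undefined instruction events" },
        { UDBG_SYSCALL,   "system calls" },
        { UDBG_BADABORT,  "invalid data aborts" },
        { UDBG_SEGV,      "SIGSEGV faults" },
        { UDBG_BUS,       "SIGBUS faults" },
    };

    for (size_t i = 0; i < sizeof(classes) / sizeof(classes[0]); i++) {
        if (user_debug & classes[i].bit)
            printf("%2u: %s\n", classes[i].bit, classes[i].what);
    }
}
```

With this layout, user_debug=1 reports only undefined-instruction events, while user_debug=31 sets all five bits and therefore also reports SIGSEGV faults.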
Comment 9 walther-it 2024-02-01 17:10:27 UTC
Result with user_debug=1:
[    5.908611] clk: Disabling unused clocks
[    5.919596] Freeing unused kernel image (initmem) memory: 1024K
[    5.945457] Run /init as init process
[    5.950622] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    5.958263] CPU: 0 PID: 1 Comm: init Tainted: G        W        N 6.6.14 #1
[    5.965206] Hardware name: Marvell Dove
[    5.969025] Backtrace: 
[    5.971469]  dump_backtrace from show_stack+0x20/0x24
[    5.976540]  r7:c0d5129c r6:00000000 r5:600d0193 r4:c0d5bc64
[    5.982174]  show_stack from dump_stack_lvl+0x2c/0x34
[    5.987228]  dump_stack_lvl from dump_stack+0x18/0x1c
[    5.992277]  r5:00000000 r4:c113e33c
[    5.995834]  dump_stack from panic+0x118/0x314
[    6.000276]  panic from make_task_dead+0x0/0x174
[    6.004903]  r3:00000001 r2:00000000 r1:0000000b r0:c0d5129c
[    6.010541]  r7:000000d4
[    6.013063]  do_exit from do_group_exit+0x48/0xbc
[    6.017772]  r7:000000d4
[    6.020294]  do_group_exit from get_signal+0x9f4/0xa40
[    6.025444]  get_signal from do_work_pending+0x11c/0x518
[    6.030757]  r10:004816a4 r9:c1961180 r8:00000000 r7:00000000 r6:f0819fb0 r5:00000000
[    6.038553]  r4:c1961180
[    6.041073]  do_work_pending from slow_work_pending+0xc/0x24
[    6.046715] Exception stack(0xf0819fb0 to 0xf0819ff8)
[    6.051749] 9fa0:                                     00000000 00000000 00000000 00000060
[    6.059900] 9fc0: bed07704 00000000 005268cc bed076c0 bed07710 b6f08a60 004816a4 bed076a8
[    6.068045] 9fe0: bed075b0 bed074f8 b6eebd07 b6ef9a5e 400d0030 ffffffff
[    6.074641]  r10:004816a4 r9:c1961180 r8:00000000 r7:c1961180 r6:ffffffff r5:400d0030
[    6.082435]  r4:b6ef9a5e
[    6.084962] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

Result with user_debug=31:
[    6.074723] Freeing unused kernel image (initmem) memory: 1024K
[    6.105467] Run /init as init process
[    6.110602] 8<--- cut here ---
[    6.113703] init: unhandled page fault (11) at 0x00000160, code 0x815
[    6.120124] [00000160] *pgd=00000000
[    6.123736] CPU: 0 PID: 1 Comm: init Tainted: G        W        N 6.6.14 #1
[    6.130673] Hardware name: Marvell Dove
[    6.134532] PC is at 0xb6fc9a5e
[    6.137672] LR is at 0xb6fbbd07
[    6.140807] pc : [<b6fc9a5e>]    lr : [<b6fbbd07>]    psr: 400d0030
[    6.147076] sp : beb224f8  ip : beb225b0  fp : beb226a8
[    6.152281] r10: 004216a4  r9 : b6fd8a60  r8 : beb22710
[    6.157496] r7 : beb226c0  r6 : 004c68cc  r5 : 00000000  r4 : beb22704
[    6.164013] r3 : 00000060  r2 : 00000000  r1 : 00000000  r0 : 00000000
[    6.170521] Flags: nZcv  IRQs on  FIQs on  Mode USER_32  ISA Thumb  Segment user
[    6.177907] Control: 10c5387d  Table: 019cc019  DAC: 00000055
[    6.183639] Backtrace: invalid frame pointer 0xbeb226a8
[    6.188895] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    6.196519] CPU: 0 PID: 1 Comm: init Tainted: G        W        N 6.6.14 #1
[    6.203456] Hardware name: Marvell Dove
[    6.207275] Backtrace: 
[    6.209712]  dump_backtrace from show_stack+0x20/0x24
[    6.214774]  r7:c0d5129c r6:00000000 r5:600d0193 r4:c0d5bc64
[    6.220407]  show_stack from dump_stack_lvl+0x2c/0x34
[    6.225462]  dump_stack_lvl from dump_stack+0x18/0x1c
[    6.230511]  r5:00000000 r4:c113e33c
[    6.234068]  dump_stack from panic+0x118/0x314
[    6.238509]  panic from make_task_dead+0x0/0x174
[    6.243137]  r3:00000001 r2:00000000 r1:0000000b r0:c0d5129c
[    6.248775]  r7:000000d4
[    6.251295]  do_exit from do_group_exit+0x48/0xbc
[    6.256006]  r7:000000d4
[    6.258526]  do_group_exit from get_signal+0x9f4/0xa40
[    6.263669]  get_signal from do_work_pending+0x11c/0x518
[    6.268982]  r10:004216a4 r9:c1964600 r8:00000000 r7:00000000 r6:f0819fb0 r5:00000000
[    6.276777]  r4:c1964600
[    6.279299]  do_work_pending from slow_work_pending+0xc/0x24
[    6.284940] Exception stack(0xf0819fb0 to 0xf0819ff8)
[    6.289974] 9fa0:                                     00000000 00000000 00000000 00000060
[    6.298125] 9fc0: beb22704 00000000 004c68cc beb226c0 beb22710 b6fd8a60 004216a4 beb226a8
[    6.306271] 9fe0: beb225b0 beb224f8 b6fbbd07 b6fc9a5e 400d0030 ffffffff
[    6.312864]  r10:004216a4 r9:c1964600 r8:00000000 r7:c1964600 r6:ffffffff r5:400d0030
[    6.320658]  r4:b6fc9a5e
[    6.323187] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
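
The register dump above reports psr: 400d0030 together with "Flags: nZcv", "Mode USER_32" and "ISA Thumb". As a small sketch of how those fields follow from the raw value, assuming the standard ARMv7 CPSR layout (mode in bits [4:0], T bit at bit 5, N/Z/C/V in bits [31:28]), the following helpers pick them apart. The macro and function names are chosen for this note and are not the kernel's own definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPSR fields per the ARMv7 layout; names are local to this sketch. */
#define PSR_MODE_MASK 0x1fu        /* bits [4:0]: processor mode */
#define PSR_MODE_USR  0x10u        /* user mode */
#define PSR_T_BIT     (1u << 5)    /* Thumb state */
#define PSR_V_BIT     (1u << 28)   /* overflow flag */
#define PSR_C_BIT     (1u << 29)   /* carry flag */
#define PSR_Z_BIT     (1u << 30)   /* zero flag */
#define PSR_N_BIT     (1u << 31)   /* negative flag */

/* True if the saved PSR describes user mode. */
bool psr_is_user_mode(uint32_t psr)
{
    return (psr & PSR_MODE_MASK) == PSR_MODE_USR;
}

/* True if the faulting code was executing Thumb instructions. */
bool psr_is_thumb(uint32_t psr)
{
    return (psr & PSR_T_BIT) != 0;
}

/* True if a given condition flag (one of the PSR_*_BIT values) is set. */
bool psr_flag_set(uint32_t psr, uint32_t flag_bit)
{
    return (psr & flag_bit) != 0;
}
```

Applied to 0x400d0030, this gives user mode, Thumb state, and only the Z flag set, which matches the "nZcv ... Mode USER_32  ISA Thumb" line in the log.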
Comment 10 Ard Biesheuvel 2024-02-03 16:00:39 UTC
How odd.

The exception causing the crash went from

  Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004

which means 'illegal instruction' to

  Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

which means 'segmentation fault', so the patch definitely makes a difference.
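
For reference, here is a minimal sketch of how these exit codes map to signals. It assumes the usual wait-status layout for a task killed by a signal: the low 7 bits hold the signal number and bit 7 (0x80) flags a core dump. The function names are hypothetical and only serve this illustration.

```c
#include <signal.h>

/* Extract the terminating signal number from the low 7 bits. */
int exitcode_signal(unsigned int exitcode)
{
    return (int)(exitcode & 0x7f);
}

/* Report whether the core-dump flag (bit 7) is set. */
int exitcode_core_dumped(unsigned int exitcode)
{
    return (exitcode & 0x80) != 0;
}

/* Name the two signals that appear in this report. */
const char *exitcode_signal_name(unsigned int exitcode)
{
    switch (exitcode_signal(exitcode)) {
    case SIGILL:
        return "SIGILL (illegal instruction)";
    case SIGSEGV:
        return "SIGSEGV (segmentation fault)";
    default:
        return "other";
    }
}
```

On Linux, SIGILL is 4 and SIGSEGV is 11, so exitcode=0x00000004 decodes to SIGILL and exitcode=0x0000000b to SIGSEGV.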

Before we go down this rabbit hole any further, could you please double check whether this kernel boots correctly with iWMMXt disabled? I.e., do something like the below to ensure that the iWMMXt capability is never detected, and see if you can boot into a shell.


--- a/arch/arm/kernel/pj4-cp0.c
+++ b/arch/arm/kernel/pj4-cp0.c
@@ -110,7 +110,7 @@ static int __init pj4_cp0_init(void)
        u32 __maybe_unused cp_access;
        int vers;
 
-       if (!cpu_is_pj4())
+       //if (!cpu_is_pj4())
                return 0;
 
        vers = pj4_get_iwmmxt_version();
Comment 11 walther-it 2024-02-10 22:32:37 UTC
Hi Ard,

I can confirm that the kernel boots with the suggested line commented out.

What can be done next?
Comment 12 Ard Biesheuvel 2024-02-11 13:30:53 UTC
What we are currently contemplating is to disable IWMMXT support entirely on these chips. There is no software that makes use of it, and keeping it enabled actually results in some performance overhead.

https://lkml.kernel.org/r/20240209110901.4032939-2-ardb%2Bgit%40google.com

You can achieve the same result without applying the patch by simply disabling CONFIG_IWMMXT in your kernel configuration.
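
As a rough illustration of that last step: a disabled boolean option is recorded in the kernel's .config file as a comment line. Assuming the option was previously enabled (CONFIG_IWMMXT=y), the relevant line after disabling it, for example through menuconfig, would read:

```
# CONFIG_IWMMXT is not set
```

The kernel then has to be rebuilt and reinstalled for the change to take effect.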