Bug 218427
Summary: | Kernel panic during kernel startup: ARM Marvell Dove in 6.6 | ||
---|---|---|---|
Product: | Linux | Reporter: | walther-it |
Component: | Kernel | Assignee: | Ard Biesheuvel (ardb) |
Status: | RESOLVED CODE_FIX | ||
Severity: | blocking | CC: | ardb, bagasdotme, regressions |
Priority: | P3 | ||
Hardware: | ARM | ||
OS: | Linux | ||
Kernel Version: | 6.6.14, 6.7.2, 6.8-rc2 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | 8bcba70cb5c2204a011e06278a1fbfb1213e1df1 |
Attachments: |
config
patch to handle thumb2 encodings of ldc/stc |
Description
walther-it
2024-01-27 17:17:03 UTC
Created attachment 305784 [details]
config
(In reply to walther-it from comment #0)
> after migrating from kernel 6.5 to 6.6.14 with default settings during
> upgrade, a kernel panic during kernel startup is shown:
>

Then please bisect (see [1] for how to perform that). Also, it's helpful to test current mainline (v6.8-rc1) to confirm or deny this regression.

[1]: https://lore.kernel.org/linux-doc/c763e15e-e82e-49f8-a540-d211d18768a3@leemhuis.info/

Hello Sanjaya,

I performed a bisect between Linux 6.6-rc1 (bad) and Linux 6.5 (good). It traced the problem down to:

8bcba70cb5c2204a011e06278a1fbfb1213e1df1 is the first bad commit
commit 8bcba70cb5c2204a011e06278a1fbfb1213e1df1
Author: Ard Biesheuvel <ardb@kernel.org>
Date: Sun Mar 19 15:18:25 2023 +0100

    ARM: entry: Disregard Thumb undef exception in coproc dispatch

    Now that the only remaining coprocessor instructions being handled via
    the dispatch in entry-armv.S are ones that only exist in a ARM (A32)
    encoding, we can simplify the handling of Thumb undef exceptions, and
    send them straight to the undefined instruction handlers in C code.

    This also means we can drop the code that partially decodes the
    instruction to decide whether it is a 16-bit or 32-bit Thumb
    instruction: this is all taken care of by the undef hook.

    Acked-by: Linus Walleij <linus.walleij@linaro.org>
    Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

 arch/arm/kernel/entry-armv.S | 121 +++++++------------------------------------
 1 file changed, 18 insertions(+), 103 deletions(-)

The full log is:

# bad: [0bb80ecc33a8fb5a682236443c1e740d5c917d1d] Linux 6.6-rc1
# good: [2dde18cd1d8fac735875f2e4987f11817cc0bc2c] Linux 6.5
git bisect start '0bb80ecc33a8fb5a682236443c1e740d5c917d1d' '2dde18cd1d8fac735875f2e4987f11817cc0bc2c'
# good: [461f35f014466c4e26dca6be0f431f57297df3f2] Merge tag 'drm-next-2023-08-30' of git://anongit.freedesktop.org/drm/drm
git bisect good 461f35f014466c4e26dca6be0f431f57297df3f2
# bad: [e925992671907314b7db6793a28eb39b36bc21a4] Merge tag 'staging-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
git bisect bad e925992671907314b7db6793a28eb39b36bc21a4
# good: [0e72db77672ff4758a31fb5259c754a7bb229751] Merge tag 'soc-dt-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
git bisect good 0e72db77672ff4758a31fb5259c754a7bb229751
# good: [df57721f9a63e8a1fb9b9b2e70de4aa4c7e0cd2e] Merge tag 'x86_shstk_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good df57721f9a63e8a1fb9b9b2e70de4aa4c7e0cd2e
# bad: [e0152e7481c6c63764d6ea8ee41af5cf9dfac5e9] Merge tag 'riscv-for-linus-6.6-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
git bisect bad e0152e7481c6c63764d6ea8ee41af5cf9dfac5e9
# bad: [659b3613fc635fb1813fb3006680876b24d86919] Merge tag 'dlm-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
git bisect bad 659b3613fc635fb1813fb3006680876b24d86919
# good: [4d15721177d539d743fcf31d7bb376fb3b81aeb6] powerpc/mm: Cleanup memory block size probing
git bisect good 4d15721177d539d743fcf31d7bb376fb3b81aeb6
# good: [b9bbbf4979073d5536b7650decd37fcb901e6556] powerpc/mpc5xxx: Add missing fwnode_handle_put()
git bisect good b9bbbf4979073d5536b7650decd37fcb901e6556
# bad: [f441ff73f1ec568acef03f0ce4d5088c7e65c106] powerpc: Fix pud_mkwrite() definition after pte_mkwrite() API changes
git bisect bad f441ff73f1ec568acef03f0ce4d5088c7e65c106
# bad: [53ae158f6ddc14df5c44d62c06e33fdb66de1196] Merge tag 'arm-vfp-refactor-for-rmk' of git://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux into devel-stable
git bisect bad 53ae158f6ddc14df5c44d62c06e33fdb66de1196
# good: [6ee1e6772e1e19436f573672de5ff8aab7163be6] ARM: kernel: Get rid of thread_info::used_cp[] array
git bisect good 6ee1e6772e1e19436f573672de5ff8aab7163be6
# bad: [8bcba70cb5c2204a011e06278a1fbfb1213e1df1] ARM: entry: Disregard Thumb undef exception in coproc dispatch
git bisect bad 8bcba70cb5c2204a011e06278a1fbfb1213e1df1
# good: [cdd87465adfd75e4ebd11507575533c6bf7a5525] ARM: vfp: Use undef hook for handling VFP exceptions
git bisect good cdd87465adfd75e4ebd11507575533c6bf7a5525
# first bad commit: [8bcba70cb5c2204a011e06278a1fbfb1213e1df1] ARM: entry: Disregard Thumb undef exception in coproc dispatch

Is there anything else I can do to help find the root cause and fix it?

Best regards

(In reply to walther-it from comment #3)
>
> I performed a bisect between Linux 6.6-rc1 (bad) and Linux 6.5 (good).
> It traced the problem down to:
>
> 8bcba70cb5c2204a011e06278a1fbfb1213e1df1 is the first bad commit
> commit 8bcba70cb5c2204a011e06278a1fbfb1213e1df1
> Author: Ard Biesheuvel <ardb@kernel.org>
> Date: Sun Mar 19 15:18:25 2023 +0100
>
> ARM: entry: Disregard Thumb undef exception in coproc dispatch
>
> Now that the only remaining coprocessor instructions being handled via
> the dispatch in entry-armv.S are ones that only exist in a ARM (A32)
> encoding, we can simplify the handling of Thumb undef exceptions, and
> send them straight to the undefined instruction handlers in C code.
> [...]

Two quick follow up questions, the first one being the more important one:

* does the problem still happen with 6.8-rc2?
* if the problem still happens there, could you try to "git revert" the commit -- and if that works compile another kernel to see if this goes away?

Hi Thorsten,

yes, the issue also happens with:
- 6.7.2
- 6.8-rc2

the last working kernel is:
- 6.5.13

thus I assume it's a regression.

I also tried to revert 8bcba70cb5c2204a011e06278a1fbfb1213e1df1 but this failed, as the file was subsequently changed in the relevant sections, i.e. by ardb@kernel.org in commits:
- 303d6da167dcbc3dd89adf3ca4e36c369950ed01
and
- 47ba5f39eab3c2a9a1ba878159a6050f2bbfc0e2

Any further help is appreciated.

Created attachment 305793 [details]
patch to handle thumb2 encodings of ldc/stc
Please try the attached patch.
Hi Ard,

thanks for the update. I tried the patch on top of both 6.6.14 and 6.8-rc2, unfortunately no luck. The result looks very similar:

[ 5.956636] clk: Disabling unused clocks
[ 5.967674] Freeing unused kernel image (initmem) memory: 1024K
[ 5.994270] Run /init as init process
[ 5.999463] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 6.007098] CPU: 0 PID: 1 Comm: init Tainted: G W N 6.6.14 #1
[ 6.014039] Hardware name: Marvell Dove
[ 6.017858] Backtrace:
[ 6.020304] dump_backtrace from show_stack+0x20/0x24
[ 6.025374] r7:c0d51598 r6:00000000 r5:600d0193 r4:c0d5bf64
[ 6.031008] show_stack from dump_stack_lvl+0x2c/0x34
[ 6.036063] dump_stack_lvl from dump_stack+0x18/0x1c
[ 6.041121] r5:00000000 r4:c113e338
[ 6.044676] dump_stack from panic+0x118/0x314
[ 6.049127] panic from do_exit+0xab8/0xac4
[ 6.053323] r3:00000001 r2:00000000 r1:0000000b r0:c0d51598
[ 6.058960] r7:000000d4
[ 6.061481] do_exit from do_group_exit+0x48/0xbc
[ 6.066192] r7:000000d4
[ 6.068712] do_group_exit from get_signal+0xa30/0xa64
[ 6.073854] get_signal from do_work_pending+0x11c/0x518
[ 6.079168] r10:004346a4 r9:c19c8000 r8:00000000 r7:00000000 r6:f0819fb0 r5:00000000
[ 6.086962] r4:c19c8000
[ 6.089484] do_work_pending from slow_work_pending+0xc/0x24
[ 6.095135] Exception stack(0xf0819fb0 to 0xf0819ff8)
[ 6.100167] 9fa0: 00000000 00000000 00000000 00000060
[ 6.108319] 9fc0: be917704 00000000 004d98cc be9176c0 be917710 b6f89a60 004346a4 be9176a8
[ 6.116464] 9fe0: be9175b0 be9174f8 b6f6cd07 b6f7aa5e 400d0030 ffffffff
[ 6.123060] r10:004346a4 r9:c19c8000 r8:00000000 r7:c19c8000 r6:ffffffff r5:400d0030
[ 6.130854] r4:b6f7aa5e
[ 6.133381] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

Please rebuild your kernel with CONFIG_DEBUG_USER enabled, and boot with user_debug=1 on the command line. That should give more information in the log about the instruction that triggered the exception.
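For readers following along: user_debug is a bitmask, so the value chosen decides which classes of user-space fault get the extra logging. The sketch below lists which classes a given value enables. The bit meanings are taken from memory of the kernel-parameters documentation and arch/arm/include/asm/system_misc.h, so treat them as an assumption and check them against the tree you are building; decode_user_debug is a hypothetical helper name, not a kernel interface.

```python
# Assumed bit assignments for the ARM user_debug= parameter (verify in your tree).
USER_DEBUG_BITS = {
    0x01: "UDBG_UNDEFINED",  # undefined instruction events
    0x02: "UDBG_SYSCALL",    # system calls
    0x04: "UDBG_BADABORT",   # invalid data aborts
    0x08: "UDBG_SEGV",       # SIGSEGV faults
    0x10: "UDBG_BUS",        # SIGBUS faults
}


def decode_user_debug(value):
    """Return the names of the debug classes enabled by a user_debug value."""
    return [name for bit, name in USER_DEBUG_BITS.items() if value & bit]


if __name__ == "__main__":
    for value in (1, 31):
        print(value, decode_user_debug(value))
```

Under these assumptions, user_debug=1 only reports undefined instructions, so a crash that ends in a segmentation fault produces no extra output unless the UDBG_SEGV bit is also set; a value of 31 sets all five bits.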
Result with user_debug=1:

[ 5.908611] clk: Disabling unused clocks
[ 5.919596] Freeing unused kernel image (initmem) memory: 1024K
[ 5.945457] Run /init as init process
[ 5.950622] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 5.958263] CPU: 0 PID: 1 Comm: init Tainted: G W N 6.6.14 #1
[ 5.965206] Hardware name: Marvell Dove
[ 5.969025] Backtrace:
[ 5.971469] dump_backtrace from show_stack+0x20/0x24
[ 5.976540] r7:c0d5129c r6:00000000 r5:600d0193 r4:c0d5bc64
[ 5.982174] show_stack from dump_stack_lvl+0x2c/0x34
[ 5.987228] dump_stack_lvl from dump_stack+0x18/0x1c
[ 5.992277] r5:00000000 r4:c113e33c
[ 5.995834] dump_stack from panic+0x118/0x314
[ 6.000276] panic from make_task_dead+0x0/0x174
[ 6.004903] r3:00000001 r2:00000000 r1:0000000b r0:c0d5129c
[ 6.010541] r7:000000d4
[ 6.013063] do_exit from do_group_exit+0x48/0xbc
[ 6.017772] r7:000000d4
[ 6.020294] do_group_exit from get_signal+0x9f4/0xa40
[ 6.025444] get_signal from do_work_pending+0x11c/0x518
[ 6.030757] r10:004816a4 r9:c1961180 r8:00000000 r7:00000000 r6:f0819fb0 r5:00000000
[ 6.038553] r4:c1961180
[ 6.041073] do_work_pending from slow_work_pending+0xc/0x24
[ 6.046715] Exception stack(0xf0819fb0 to 0xf0819ff8)
[ 6.051749] 9fa0: 00000000 00000000 00000000 00000060
[ 6.059900] 9fc0: bed07704 00000000 005268cc bed076c0 bed07710 b6f08a60 004816a4 bed076a8
[ 6.068045] 9fe0: bed075b0 bed074f8 b6eebd07 b6ef9a5e 400d0030 ffffffff
[ 6.074641] r10:004816a4 r9:c1961180 r8:00000000 r7:c1961180 r6:ffffffff r5:400d0030
[ 6.082435] r4:b6ef9a5e
[ 6.084962] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

Result with user_debug=31:

[ 6.074723] Freeing unused kernel image (initmem) memory: 1024K
[ 6.105467] Run /init as init process
[ 6.110602] 8<--- cut here ---
[ 6.113703] init: unhandled page fault (11) at 0x00000160, code 0x815
[ 6.120124] [00000160] *pgd=00000000
[ 6.123736] CPU: 0 PID: 1 Comm: init Tainted: G W N 6.6.14 #1
[ 6.130673] Hardware name: Marvell Dove
[ 6.134532] PC is at 0xb6fc9a5e
[ 6.137672] LR is at 0xb6fbbd07
[ 6.140807] pc : [<b6fc9a5e>] lr : [<b6fbbd07>] psr: 400d0030
[ 6.147076] sp : beb224f8 ip : beb225b0 fp : beb226a8
[ 6.152281] r10: 004216a4 r9 : b6fd8a60 r8 : beb22710
[ 6.157496] r7 : beb226c0 r6 : 004c68cc r5 : 00000000 r4 : beb22704
[ 6.164013] r3 : 00000060 r2 : 00000000 r1 : 00000000 r0 : 00000000
[ 6.170521] Flags: nZcv IRQs on FIQs on Mode USER_32 ISA Thumb Segment user
[ 6.177907] Control: 10c5387d Table: 019cc019 DAC: 00000055
[ 6.183639] Backtrace: invalid frame pointer 0xbeb226a8
[ 6.188895] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 6.196519] CPU: 0 PID: 1 Comm: init Tainted: G W N 6.6.14 #1
[ 6.203456] Hardware name: Marvell Dove
[ 6.207275] Backtrace:
[ 6.209712] dump_backtrace from show_stack+0x20/0x24
[ 6.214774] r7:c0d5129c r6:00000000 r5:600d0193 r4:c0d5bc64
[ 6.220407] show_stack from dump_stack_lvl+0x2c/0x34
[ 6.225462] dump_stack_lvl from dump_stack+0x18/0x1c
[ 6.230511] r5:00000000 r4:c113e33c
[ 6.234068] dump_stack from panic+0x118/0x314
[ 6.238509] panic from make_task_dead+0x0/0x174
[ 6.243137] r3:00000001 r2:00000000 r1:0000000b r0:c0d5129c
[ 6.248775] r7:000000d4
[ 6.251295] do_exit from do_group_exit+0x48/0xbc
[ 6.256006] r7:000000d4
[ 6.258526] do_group_exit from get_signal+0x9f4/0xa40
[ 6.263669] get_signal from do_work_pending+0x11c/0x518
[ 6.268982] r10:004216a4 r9:c1964600 r8:00000000 r7:00000000 r6:f0819fb0 r5:00000000
[ 6.276777] r4:c1964600
[ 6.279299] do_work_pending from slow_work_pending+0xc/0x24
[ 6.284940] Exception stack(0xf0819fb0 to 0xf0819ff8)
[ 6.289974] 9fa0: 00000000 00000000 00000000 00000060
[ 6.298125] 9fc0: beb22704 00000000 004c68cc beb226c0 beb22710 b6fd8a60 004216a4 beb226a8
[ 6.306271] 9fe0: beb225b0 beb224f8 b6fbbd07 b6fc9a5e 400d0030 ffffffff
[ 6.312864] r10:004216a4 r9:c1964600 r8:00000000 r7:c1964600 r6:ffffffff r5:400d0030
[ 6.320658] r4:b6fc9a5e
[ 6.323187] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

How odd. The exception causing the crash went from

Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004

which means 'illegal instruction' to

Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

which means 'segmentation fault', so the patch definitely makes a difference.

Before we go down this rabbit hole any further, could you please double check whether this kernel boots correctly with iWMMXt disabled? I.e., do something like the below to ensure that the iWMMXt capability is never detected, and see if you can boot into a shell.
--- a/arch/arm/kernel/pj4-cp0.c
+++ b/arch/arm/kernel/pj4-cp0.c
@@ -110,7 +110,7 @@ static int __init pj4_cp0_init(void)
 	u32 __maybe_unused cp_access;
 	int vers;
 
-	if (!cpu_is_pj4())
+	//if (!cpu_is_pj4())
 		return 0;
 
 	vers = pj4_get_iwmmxt_version();

Hi Ard,

I can confirm that commenting out the suggested line boots up the kernel. What can be done next?

What we are currently contemplating is to disable IWMMXT support entirely on these chips. There is no software that makes use of it, and keeping it enabled actually results in some performance overhead.

https://lkml.kernel.org/r/20240209110901.4032939-2-ardb%2Bgit%40google.com

You can achieve the same result without applying the patch by simply disabling CONFIG_IWMMXT in your kernel configuration.
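
The exit codes and the psr value quoted in the logs of this thread can be decoded by hand. The sketch below shows the arithmetic in Python. It assumes the usual Linux convention that the low 7 bits of the exit code hold the terminating signal and bit 7 is the core-dump flag, and the standard ARM CPSR layout (bit 5 is the Thumb bit, bits 4..0 the mode, bits 31..28 the NZCV flags). The helper names decode_exitcode and decode_psr are hypothetical, and the signal table only covers the values relevant here.

```python
# Linux signal numbers on ARM; deliberately limited to the ones seen in this report.
SIGNALS = {4: "SIGILL", 7: "SIGBUS", 11: "SIGSEGV"}

# ARM processor modes; only user mode (0x10) appears in the logs above.
MODES = {0x10: "USER"}


def decode_exitcode(exitcode):
    """Split a panic 'exitcode=' value into (signal number, name, core flag)."""
    signo = exitcode & 0x7F
    core_dumped = bool(exitcode & 0x80)
    return signo, SIGNALS.get(signo, "unknown"), core_dumped


def decode_psr(psr):
    """Extract condition flags, Thumb state and mode from a CPSR value."""
    # Same rendering as the kernel's "Flags:" line: upper case means the flag is set.
    flags = "".join(
        letter.upper() if psr & (1 << bit) else letter
        for letter, bit in (("n", 31), ("z", 30), ("c", 29), ("v", 28))
    )
    mode = psr & 0x1F
    return {
        "flags": flags,
        "thumb": bool(psr & (1 << 5)),
        "mode": MODES.get(mode, hex(mode)),
    }


if __name__ == "__main__":
    print(decode_exitcode(0x00000004))
    print(decode_exitcode(0x0000000B))
    print(decode_psr(0x400D0030))
```

With these assumptions, 0x00000004 decodes to SIGILL and 0x0000000b to SIGSEGV, matching the 'illegal instruction' and 'segmentation fault' readings given in the thread, and psr 400d0030 decodes to user mode with the Thumb bit set, in line with the "Mode USER_32 ISA Thumb" line of the user_debug=31 log.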