Bug 216041
Summary: | Stack overflow at boot (do_IRQ: stack overflow: 1984) on a PowerMac G4 DP, KASAN debug build | ||
---|---|---|---|
Product: | Platform Specific/Hardware | Reporter: | Erhard F. (erhard_f) |
Component: | PPC-32 | Assignee: | platform_ppc-32 |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | arnd, christophe.leroy, michael |
Priority: | P1 | ||
Hardware: | PPC-32 | ||
OS: | Linux | ||
Kernel Version: | 5.18.0 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
dmesg (5.18.0, PowerMac G4 DP), case 1
dmesg (5.18.0, PowerMac G4 DP), case 2
kernel .config (5.18.0, PowerMac G4 DP)
kernel .config (5.19-rc1, Outline KASAN + patches, PowerMac G4 DP)
attachment-25616-0.html |
Description
Erhard F.
2022-05-28 11:43:04 UTC
Created attachment 301066 [details]
dmesg (5.18.0, PowerMac G4 DP), case 2
Created attachment 301067 [details]
kernel .config (5.18.0, PowerMac G4 DP)
I can't see any issue, other than your CONFIG_THREAD_SHIFT is set to 13.

It should be 14 by default, see https://elixir.bootlin.com/linux/v5.18/source/arch/powerpc/Kconfig#L769

Is there any reason why you set it to 13 ?

Setting it higher is probably a good idea, but there really isn't a safe limit with KASAN, at least if KASAN_STACK is active: running with KASAN always carries a risk of running into stack overflows.

One thing that sticks out is that there is an interrupt on the same stack as the task, in

[eaa1c800] [c0009258] do_IRQ+0x20/0x34
[eaa1c820] [c00045b4] HardwareInterrupt_virt+0x108/0x10c
[eaa1c920] [c0c59b2c] __schedule+0x3f0/0x9dc
[eaa1c9b0] [c0c5a18c] schedule+0x74/0x13c

It looks like on ppc32, as of 547db12fd8a0 ("powerpc/32: Use vmapped stacks for interrupts"), you have either VMAP_STACK (to detect stack overflows) or IRQ stacks (to make them less likely). I think you really want both instead, and allocate the IRQ stacks from vmalloc space as well.

The ext4 read path is a bit wasteful with KASAN enabled, using 1776 bytes from ext4_lookup to ext4_read_bh, but not excessively so.

There is an interrupt that needs to be looked at a bit deeper:

[eaa1c7a0] [c07d0bd4] dump_stack_lvl+0x60/0x90
[eaa1c7c0] [c0009234] __do_IRQ+0x170/0x174
[eaa1c800] [c0009258] do_IRQ+0x20/0x34
[eaa1c820] [c00045b4] HardwareInterrupt_virt+0x108/0x10c

The interesting part is __do_IRQ():

void __do_IRQ(struct pt_regs *regs)
{
	struct pt_regs *old_regs = set_irq_regs(regs);
	void *cursp, *irqsp, *sirqsp;

	/* Switch to the irq stack to handle this */
	cursp = (void *)(current_stack_pointer & ~(THREAD_SIZE - 1));
	irqsp = hardirq_ctx[raw_smp_processor_id()];
	sirqsp = softirq_ctx[raw_smp_processor_id()];

	check_stack_overflow();

	/* Already there ? */
	if (unlikely(cursp == irqsp || cursp == sirqsp)) {
		__do_irq(regs);
		set_irq_regs(old_regs);
		return;
	}
	/* Switch stack and call */
	call_do_irq(regs, irqsp);

	set_irq_regs(old_regs);
}

The dump_stack() we see in the call trace comes from check_stack_overflow(), following the message "do_IRQ: stack overflow: 1984", because the stack dropped below 0xeaa1c800.

The check_stack_overflow() function emits a warning and a stack dump when CONFIG_DEBUG_STACKOVERFLOW is selected and only 2 kbytes remain available on the stack. But here we get an Oops when the stack reaches 0xeaa1c000. It seems the 2 kbyte limit is not enough to properly perform the stack dump.

Commit 547db12fd8a0 ("powerpc/32: Use vmapped stacks for interrupts") doesn't remove IRQ stacks. It changes the IRQ stack allocation from kmalloc to vmalloc. Here we are still on the original stack; the switch to the IRQ stack is performed by call_do_irq().

(In reply to Christophe Leroy from comment #3)
> I can't see any issue, other than your CONFIG_THREAD_SHIFT is set to 13.
>
> It should be 14 by default, see
> https://elixir.bootlin.com/linux/v5.18/source/arch/powerpc/Kconfig#L769
>
> Is there any reason why you set it to 13 ?

I was not aware of setting it to a custom value. I thought 13 is the default on ppc32, which gets overridden to 14 if I select KASAN? But I'll make sure to double-check this on future builds. The only advanced option I did set is CONFIG_LOWMEM_SIZE=0x28000000 (see bug #215389).

Created attachment 301129 [details]
kernel .config (5.19-rc1, Outline KASAN + patches, PowerMac G4 DP)

Tried to reinvestigate this issue with a KASAN build of v5.19-rc1, but it seems it's not quite there. I applied the 2 patches "powerpc-kasan-Force-thread-size-increase-with-KASAN" and "v2-powerpc-irq-Increase-stack_overflow-detection-limit-when-KASAN-is-enabled" on top of v5.19-rc1, but I get a non-booting kernel.
The kernel boots at first but gets stuck on a white screen reading "done found display: /pci@f0000000/ATY,AlteracParent@10/ATY,Alterac_B@1, opening..." A kernel with the same config but with KFENCE instead of KASAN boots fine (see bug #216095).

Reinvestigated this issue with a KASAN build of v6.0.0-rc2 and it's looking good so far! No stack overflow at boot across about 10 reboots. Outline KASAN also seems to work fine. I'll keep an eye on this and close it here if I don't see it in the next few kernel releases.

The two patches mentioned in comment #7 were merged as:

3e8635fb2e07 ("powerpc/kasan: Force thread size increase with KASAN")
https://git.kernel.org/torvalds/c/3e8635fb2e072672cbc650989ffedf8300ad67fb

41f20d6db2b6 ("powerpc/irq: Increase stack_overflow detection limit when KASAN is enabled")
https://git.kernel.org/torvalds/c/41f20d6db2b64677225bb0b97df956241c353ef8

The boot failure with v5.19-rc1 might have been some other issue? I'll close this for now; please reopen if you see this again.

Created attachment 304123 [details]
attachment-25616-0.html
I'm away from the office until April 24th