Bug 207129

Summary: PowerMac G4 DP (5.6.2 debug kernel + inline KASAN) freezes shortly after booting with "do_IRQ: stack overflow: 1760"
Product: Platform Specific/Hardware
Reporter: Erhard F. (erhard_f)
Component: PPC-32
Assignee: platform_ppc-32
Status: RESOLVED CODE_FIX
Severity: normal
CC: christophe.leroy, michael
Priority: P1
Hardware: PPC-32
OS: Linux
Kernel Version: 5.6.2
Subsystem:
Regression: No
Bisected commit-id:
Attachments: kernel .config (5.6.2, INLINE KASAN, PowerMac G4 DP)
screenshot01.jpg
screenshot02.jpg
kernel .config (6.8-rc6, KASAN_INLINE=y, PowerMac G4 DP)

Description Erhard F. 2020-04-05 21:32:40 UTC
Created attachment 288221 [details]
kernel .config (5.6.2, INLINE KASAN, PowerMac G4 DP)

I was doing some testing with the PowerMac G4 DP again, running a 5.6.2 debug kernel with KASAN_INLINE. The G4 boots fine but crashes shortly afterwards when in use, leaving no stack trace, only this message on the screen:

do_IRQ: stack overflow: 1760
CPU: 0 PID: 209 Comm: rsync Tainted: G        W        5.6.2-PowerMacG4+ #3
Call Trace:


The 120-second panic timer does not kick in. I have to power-cycle the G4 manually.
Comment 1 Christophe Leroy 2020-04-06 05:29:12 UTC
So it hangs in show_stack().

Does it also hang without CONFIG_DEBUG_STACKOVERFLOW ? If not, it means we have a problem with check_stack_overflow()

Regardless of the result above, can you try increasing CONFIG_THREAD_SHIFT ?

Can you maybe also do a test without CONFIG_VMAP_STACK ?
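
For context on the message in the report, here is a minimal sketch of the kind of test check_stack_overflow() performs. It assumes the check masks the stack pointer with THREAD_SIZE - 1 and warns when fewer than 2048 bytes remain; that threshold and the stack address used below are assumptions made for illustration, and this is a user-space model, not kernel code.

```python
THREAD_SHIFT = 13                # value from the discussion in this report
THREAD_SIZE = 1 << THREAD_SHIFT  # 8192 bytes
LOW_WATER_MARK = 2048            # assumed warning threshold, in bytes


def stack_bytes_left(sp: int, thread_size: int = THREAD_SIZE) -> int:
    """Offset of sp within its thread_size-aligned stack.

    The stack grows downwards, so this offset is the space still free.
    """
    return sp & (thread_size - 1)


def overflow_message(sp: int, thread_size: int = THREAD_SIZE):
    """Return the warning text if free space is under the threshold, else None."""
    left = stack_bytes_left(sp, thread_size)
    if left < LOW_WATER_MARK:
        return f"do_IRQ: stack overflow: {left}"
    return None


# Hypothetical 8 KiB-aligned stack base, with sp 1760 bytes above it.
stack_base = 0xC1234000
print(overflow_message(stack_base + 1760))  # do_IRQ: stack overflow: 1760
print(overflow_message(stack_base + 4096))  # None
```

Under these assumptions, the number in the message is the count of stack bytes still free at the time of the interrupt, so "1760" means the task was already within 2 KiB of the bottom of its stack.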
Comment 2 Erhard F. 2020-04-06 12:26:32 UTC
Created attachment 288229 [details]
screenshot01.jpg

Without CONFIG_DEBUG_STACKOVERFLOW things are better. The rsync completes, and the G4 was building stuff for 2 hours or so until I got these errors and a hard freeze:

[...]
Oops: kernel stack overflow, sig: 11 [#1]
BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2 PowerMac
Modules linked in: ...
CPU: 1 PID: 17105 Comm: kworker/u4:5 Tainted: G        W        5.6.2-PowerMacG4+ #5
------------[ cut here  ]------------
kernel BUG at mm/usercopy.c:99!
Oops: Exception in kernel mode, sig: 5 [#2]
BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2 PowerMac
Modules linked in: ...
CPU: 1 PID: 17185 Comm: kworker/u4:5 Tainted: G        W        5.6.2-PowerMacG4+ #5
usercopy: Kernel memory overwrite attempt detected to kernel text (offset 6336, size 4)!
------------[ cut here  ]------------
kernel BUG at mm/usercopy.c:99!
Oops: Exception in kernel mode, sig: 5 [#3]
BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2 PowerMac
Modules linked in: ...
CPU: 1 PID: 17185 Comm: kworker/u4:5 Tainted: G        W        5.6.2-PowerMacG4+ #5
usercopy: Kernel memory overwrite attempt detected to kernel text (offset 5336, size 4)!
------------[ cut here  ]------------
kernel BUG at mm/usercopy.c:99!
Oops: Exception in kernel mode, sig: 5 [#4]
BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2 PowerMac
Modules linked in: ...
CPU: 1 PID: 17185 Comm: kworker/u4:5 Tainted: G        W        5.6.2-PowerMacG4+ #5
usercopy: Kernel memory overwrite attempt detected to kernel text (offset 4336, size 4)!
------------[ cut here  ]------------
kernel BUG at mm/usercopy.c:99!
Oops: Exception in kernel mode, sig: 5 [#5]
BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2 PowerMac
Modules linked in: ...
Unrecoverable FP Unavailable Exception 801 at 9b8
CPU: 1 PID: 17185 Comm: kworker/u4:5 Tainted: G        W        5.6.2-PowerMacG4+ #5
usercopy: Kernel memory overwrite attempt detected to kernel text (offset 3336, size 4)!
------------[ cut here  ]------------

Now running with CONFIG_THREAD_SHIFT=14 which runs fine so far... Did not try without CONFIG_VMAP_STACK yet.
Comment 3 Erhard F. 2020-04-06 12:27:01 UTC
Created attachment 288231 [details]
screenshot02.jpg
Comment 4 Erhard F. 2020-04-06 22:57:16 UTC
Without CONFIG_VMAP_STACK I had one crash after 2-3 hours of building but the panic timer kicked in and rebooted the machine. Now it has been building packages for hours again without any anomalies.
Comment 5 Christophe Leroy 2020-04-08 14:55:09 UTC
Ok, so as a summary:
- With CONFIG_THREAD_SHIFT = 13 and CONFIG_DEBUG_STACKOVERFLOW, the system gets stuck
- With CONFIG_THREAD_SHIFT = 13 and without CONFIG_DEBUG_STACKOVERFLOW, stack overflow is not really detected until it gets into kernel text !!!
- With CONFIG_THREAD_SHIFT = 14 it runs fine
- With CONFIG_VMAP_STACK, the automatic restart doesn't work
- Without CONFIG_VMAP_STACK, the automatic restart works

So I'll send a patch to set CONFIG_THREAD_SHIFT to 14 when CONFIG_KASAN is selected. x86 and arm64 already do that.

And I'll try to investigate the other points when I have time.
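
To make the effect of the proposed change concrete: the kernel stack size is 1 << THREAD_SHIFT, so going from 13 to 14 doubles it. The sketch below also reuses the 1760 bytes reported free in the original message to estimate the headroom the same call depth would have on the larger stack; that projection is purely illustrative and assumes identical stack usage.

```python
def thread_size(shift: int) -> int:
    """Kernel stack size in bytes for a given THREAD_SHIFT."""
    return 1 << shift


for shift in (13, 14):
    size = thread_size(shift)
    print(f"THREAD_SHIFT={shift}: {size} bytes ({size // 1024} KiB)")

# Illustration only: 1760 bytes were free on the 8 KiB stack in the report.
bytes_in_use = thread_size(13) - 1760
free_with_16k = thread_size(14) - bytes_in_use
print(f"in use: {bytes_in_use} bytes, free on a 16 KiB stack: {free_with_16k} bytes")
```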
Comment 6 Erhard F. 2020-04-08 15:59:23 UTC
Yes, precisely summarized! Thanks for your efforts!

AFAIK CONFIG_KASAN is only available on x86_64 though, not on 32-bit x86.
Comment 7 Michael Ellerman 2024-02-26 11:02:08 UTC
I think this was resolved by increasing the stack size for KASAN builds.

ie.
edbadaf06710 ("powerpc/kasan: Fix stack overflow by increasing THREAD_SHIFT")
and later
3e8635fb2e07 ("powerpc/kasan: Force thread size increase with KASAN")

If not, feel free to reopen.
Comment 8 Erhard F. 2024-02-27 00:40:41 UTC
Currently (as of v6.8-rc6), and for quite a while now, my G4 does not boot at all with CONFIG_KASAN_INLINE=y.

When I try booting a KASAN_INLINE-enabled kernel, it fails with an invalid memory access and I get dropped to the OpenFirmware console.

The output differs slightly when I boot the 56M vmlinux-6.8.0-rc6-PMacG4:

Please wait, loading kernel...
   Elf32 kernel loaded...

Invalid memory access at   %SRR0: 00000013   %SRR1: 00001300

Apple PowerMac3,6 4.6.0f1 BootROM built on 02/20/03 at 13:52:27
[...]


vs. booting the 16M arch/powerpc/boot/zImage:


Please wait, loading kernel...
   Elf32 kernel loaded...

zImage starting: loaded at 0x00400000 (sp: 0x012eefb0)
OF version = 'OpenFirmware 3'
Allocating 0x2c337e0 bytes for kernel...
Trying to claim from 0x400000 to 0x12ef5d8 (0xeef5d8) got ffffffff
Decompressing (0x01414000 <- 0x00410000:0x12ea9c9)...
Done! Decompressed 0x2bcc80c bytes

Linux/PowerPC load: ro root=/dev/sda5 slub_debug=FZP page_poison=1 netconsole=6666@192.168.2.8/eth0,6666@192.168.2.3/A8:A1:59:16:4F:EA debug

Finalizing device tree... using OF tree (promptr=ff847240)

Invalid memory access at   %SRR0: 40000000   %SRR1: 00000000

Apple PowerMac3,6 4.6.0f1 BootROM built on 02/20/03 at 13:52:27
[...]


Same kernel with CONFIG_KASAN_OUTLINE=y instead of KASAN_INLINE boots and runs ok.
Comment 9 Erhard F. 2024-02-27 00:42:42 UTC
Created attachment 305910 [details]
kernel .config (6.8-rc6, KASAN_INLINE=y, PowerMac G4 DP)
Comment 10 Christophe Leroy 2024-02-27 15:25:10 UTC
I built a kernel with your .config; the problem is the size of the kernel.

PPC32 kernels are not designed to be that big.

Extract from generated System.map:
  c2394000 D _etext
  c2800000 T _sinittext
  c2bf5000 B _end

You need to keep the size of the kernel below 32 Mbytes; otherwise substantial work would be required to enable the kernel to perform far jumps before it is relocated.
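
A rough check of the System.map extract above, as a sketch under two assumptions that are not stated in this report: that the image is linked at 0xc0000000, and that the 32 Mbyte figure corresponds to the reach of a PowerPC relative branch (a 24-bit word offset, i.e. a signed 26-bit byte displacement).

```python
# Reach of a PowerPC relative branch (b/bl): 24-bit LI field, shifted left by 2,
# giving a signed 26-bit displacement, i.e. 32 MiB forwards or backwards.
BRANCH_REACH = 1 << 25

# Assumption: the kernel image is linked at this base address.
KERNEL_BASE = 0xC0000000

# Symbol addresses copied from the System.map extract.
symbols = {
    "_etext": 0xC2394000,
    "_sinittext": 0xC2800000,
    "_end": 0xC2BF5000,
}


def offset_from_base(addr: int, base: int = KERNEL_BASE) -> int:
    """Distance in bytes from the start of the image to addr."""
    return addr - base


def within_branch_reach(addr: int, base: int = KERNEL_BASE) -> bool:
    """True if addr is less than 32 MiB past the start of the image."""
    return offset_from_base(addr, base) < BRANCH_REACH


for name, addr in symbols.items():
    off = offset_from_base(addr)
    print(f"{name}: {off} bytes ({off / 2**20:.1f} MiB), "
          f"within 32 MiB: {within_branch_reach(addr)}")
```

With these assumptions, all three symbols sit more than 32 MiB past the start of the image, so even the end of the regular text section is already out of range.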
Comment 11 Erhard F. 2024-02-28 21:44:50 UTC
You were correct! I forgot about that...

I shrank the kernel by building with -Os, disabling some debugging options and changing some statically built-in options to 'M', without sacrificing debugging capabilities too much, until it fit in < 32 MiB:

KASAN_OUTLINE vs.
 # size vmlinux-6.8.0-rc6-PMacG4 
   text	   data	    bss	    dec	    hex	filename
12367737 6652440 426336 19446513 128baf1 vmlinux-6.8.0-rc6-PMacG4

KASAN_INLINE
 # size vmlinux-6.8.0-rc6-PMacG4 
   text	   data	    bss	    dec	    hex	filename
24660169 6652440  426336 31738945 1e44c41 vmlinux-6.8.0-rc6-PMacG4


Apart from that, I can confirm inline KASAN runs fine now and I no longer get this stack overflow when using it.
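
For reference, the totals in the two size listings above can be checked against the 32 MiB limit as follows. This is a sketch only: the text + data + bss sum is used as a proxy for the in-memory image size, which may differ somewhat from the actual span between the start of the image and _end.

```python
LIMIT = 32 * 1024 * 1024  # 32 MiB, in bytes

# Section sizes in bytes, copied from the two `size` listings.
builds = {
    "KASAN_OUTLINE": {"text": 12367737, "data": 6652440, "bss": 426336},
    "KASAN_INLINE": {"text": 24660169, "data": 6652440, "bss": 426336},
}


def total(sections: dict) -> int:
    """Sum of text, data and bss (the 'dec' column of `size`)."""
    return sum(sections.values())


def headroom(sections: dict, limit: int = LIMIT) -> int:
    """Bytes left before the total reaches the limit."""
    return limit - total(sections)


for name, sections in builds.items():
    print(f"{name}: total {total(sections)} bytes, "
          f"headroom {headroom(sections)} bytes")
```

The inline build roughly doubles the text size and leaves a bit under 2 MiB of headroom by this measure, so re-enabling built-in options could push it back over the limit.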