Bug 208181

Summary: BUG: KASAN: stack-out-of-bounds in strcmp+0x58/0xd8
Product: Platform Specific/Hardware Reporter: Erhard F. (erhard_f)
Component: PPC-32Assignee: platform_ppc-32
Status: RESOLVED OBSOLETE    
Severity: normal CC: christophe.leroy
Priority: P1    
Hardware: PPC-32   
OS: Linux   
Kernel Version: 5.8-rc1 Subsystem:
Regression: No Bisected commit-id:
Attachments: dmesg (5.8-rc1, INLINE KASAN, PowerMac G4 DP)
kernel .config (5.8-rc1, PowerMac G4 DP)
dmesg (5.8-rc3, INLINE KASAN, PowerMac G4 DP
kernel .config (5.8-rc3, PowerMac G4 DP)
block_address_translation
segment_registers
kernel .config (5.8-rc5, PowerMac G4 DP)
dmesg (5.9-rc3, INLINE KASAN, PowerMac G4 DP)
kernel .config (5.9-rc3, PowerMac G4 DP)

Description Erhard F. 2020-06-15 11:17:31 UTC
Created attachment 289659 [details]
dmesg (5.8-rc1, INLINE KASAN, PowerMac G4 DP)

The G4 DP boots up and is useable, though I get this at early boot:

[...]
Jun 15 11:41:11 T600 kernel: setbat: no BAT available for mapping 0x2a000000
Jun 15 11:41:11 T600 kernel: setbat: no BAT available for mapping 0x2c000000
Jun 15 11:41:11 T600 kernel: KASAN init done
Jun 15 11:41:11 T600 kernel: ==================================================================
Jun 15 11:41:11 T600 kernel: BUG: KASAN: stack-out-of-bounds in strcmp+0x58/0xd8
Jun 15 11:41:11 T600 kernel: Read of size 1 at addr c11c1a80 by task swapper/0
Jun 15 11:41:11 T600 kernel: 
Jun 15 11:41:11 T600 kernel: CPU: 0 PID: 0 Comm: swapper Not tainted 5.8.0-rc1-PowerMacG4 #2
Jun 15 11:41:11 T600 kernel: Call Trace:
Jun 15 11:41:11 T600 kernel: [c1ae97d0] [c0a2069c] dump_stack+0xfc/0x158 (unreliable)
Jun 15 11:41:11 T600 kernel: [c1ae9800] [c04ac5cc] print_address_description.isra.0+0x30/0x3fc
Jun 15 11:41:11 T600 kernel: [c1ae9870] [c04acb28] kasan_report+0x110/0x170
Jun 15 11:41:11 T600 kernel: [c1ae98b0] [c0a44c00] strcmp+0x58/0xd8
Jun 15 11:41:11 T600 kernel: [c1ae98d0] [c0170790] register_lock_class+0xfa4/0x10a0
Jun 15 11:41:11 T600 kernel: [c1ae9990] [c0170a34] __lock_acquire+0x1a8/0x382c
Jun 15 11:41:11 T600 kernel: [c1ae9b40] [c016f398] lock_acquire+0x5e0/0x854
Jun 15 11:41:11 T600 kernel: [c1ae9c00] [c1144014] _raw_spin_lock_irqsave+0x48/0x70
Jun 15 11:41:11 T600 kernel: [c1ae9c20] [c0ccbe84] of_find_property+0x2c/0x5c
Jun 15 11:41:11 T600 kernel: [c1ae9c50] [c0ccbec8] of_get_property+0x14/0x6c
Jun 15 11:41:11 T600 kernel: [c1ae9c70] [c0cdbcd8] unflatten_dt_nodes+0xc4c/0xcdc
Jun 15 11:41:11 T600 kernel: [c1ae9ec0] [c0cdbe90] __unflatten_device_tree+0x114/0x1e0
Jun 15 11:41:11 T600 kernel: [c1ae9ef0] [c184a294] unflatten_device_tree+0x38/0x54
Jun 15 11:41:11 T600 kernel: [c1ae9f10] [c1808600] setup_arch+0xc8/0x630
Jun 15 11:41:11 T600 kernel: [c1ae9f50] [c1803268] start_kernel+0xcc/0x4cc
Jun 15 11:41:11 T600 kernel: [c1ae9ff0] [000038a0] 0x38a0
Jun 15 11:41:11 T600 kernel: 
Jun 15 11:41:11 T600 kernel: The buggy address belongs to the variable:
Jun 15 11:41:11 T600 kernel:  __func__.22122+0x80/0x320
Jun 15 11:41:11 T600 kernel: 
Jun 15 11:41:11 T600 kernel: Memory state around the buggy address:
Jun 15 11:41:11 T600 kernel:  c11c1980: 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 f1 f1
Jun 15 11:41:11 T600 kernel:  c11c1a00: 04 f2 04 f2 04 f2 04 f2 04 f2 00 00 00 04 f2 f2
Jun 15 11:41:11 T600 kernel: >c11c1a80: f2 f2 00 00 00 04 f3 f3 f3 f3 00 00 00 00 00 00
Jun 15 11:41:11 T600 kernel:            ^
Jun 15 11:41:11 T600 kernel:  c11c1b00: 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00
Jun 15 11:41:11 T600 kernel:  c11c1b80: 00 00 f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00
Jun 15 11:41:11 T600 kernel: ==================================================================
Jun 15 11:41:11 T600 kernel: Disabling lock debugging due to kernel taint
[...]
Comment 1 Erhard F. 2020-06-15 11:22:45 UTC
Created attachment 289661 [details]
kernel .config (5.8-rc1, PowerMac G4 DP)
Comment 2 Erhard F. 2020-06-29 14:31:53 UTC
Created attachment 289935 [details]
dmesg (5.8-rc3, INLINE KASAN, PowerMac G4 DP

No change with 5.8-rc3.
Comment 3 Erhard F. 2020-06-29 14:32:30 UTC
Created attachment 289937 [details]
kernel .config (5.8-rc3, PowerMac G4 DP)
Comment 4 Erhard F. 2020-06-29 14:42:34 UTC
Erm wait... there is some change.

In 5.8-rc1 stacktrace was:
BUG: KASAN: stack-out-of-bounds in strcmp+0x58/0xd8
Read of size 1 at addr c11c1a80 by task swapper/0

CPU: 0 PID: 0 Comm: swapper Not tainted 5.8.0-rc1-PowerMacG4 #2
Call Trace:
[c1ae97d0] [c0a2069c] dump_stack+0xfc/0x158 (unreliable)
[c1ae9800] [c04ac5cc] print_address_description.isra.0+0x30/0x3fc
[c1ae9870] [c04acb28] kasan_report+0x110/0x170
[c1ae98b0] [c0a44c00] strcmp+0x58/0xd8
[c1ae98d0] [c0170790] register_lock_class+0xfa4/0x10a0
[c1ae9990] [c0170a34] __lock_acquire+0x1a8/0x382c
[c1ae9b40] [c016f398] lock_acquire+0x5e0/0x854
[c1ae9c00] [c1144014] _raw_spin_lock_irqsave+0x48/0x70
[c1ae9c20] [c0ccbe84] of_find_property+0x2c/0x5c
[c1ae9c50] [c0ccbec8] of_get_property+0x14/0x6c
[c1ae9c70] [c0cdbcd8] unflatten_dt_nodes+0xc4c/0xcdc
[c1ae9ec0] [c0cdbe90] __unflatten_device_tree+0x114/0x1e0
[c1ae9ef0] [c184a294] unflatten_device_tree+0x38/0x54
[c1ae9f10] [c1808600] setup_arch+0xc8/0x630
[c1ae9f50] [c1803268] start_kernel+0xcc/0x4cc
[c1ae9ff0] [000038a0] 0x38a0


In 5.8-rc3 stacktrace is:
BUG: KASAN: stack-out-of-bounds in vprintk_func+0x100/0x4b4
Read of size 4 at addr c1919e14 by task swapper/0

CPU: 0 PID: 0 Comm: swapper Not tainted 5.8.0-rc3-PowerMacG4 #2
Call Trace:
[c1ae9c00] [c0a304dc] dump_stack+0xfc/0x158 (unreliable)
[c1ae9c30] [c04ac990] print_address_description.isra.0+0x30/0x3fc
[c1ae9ca0] [c04aceec] kasan_report+0x110/0x170
[c1ae9ce0] [c018c204] vprintk_func+0x100/0x4b4
[c1ae9d10] [c018afd4] printk+0xa8/0xd4
[c1ae9db0] [c003c8c4] __ioremap_caller+0x1c4/0x27c
[c1ae9df0] [c003c394] ioremap+0x20/0x30
[c1ae9e00] [c1813fe4] pmac_feature_init+0x288/0xd90
[c1ae9ed0] [c1812cb0] pmac_probe+0x13c/0x190
[c1ae9ef0] [c001d938] probe_machine+0xe8/0x13c
[c1ae9f10] [c1808614] setup_arch+0xdc/0x630
[c1ae9f50] [c1803268] start_kernel+0xcc/0x4cc
[c1ae9ff0] [000038a0] 0x38a0


What stays the same are the two "setbat: no BAT available" messages in both cases.
Comment 5 Christophe Leroy 2020-06-29 14:55:54 UTC
Can we get a dump of /sys/kernel/debug/powerpc/block_address_translation
Comment 6 Erhard F. 2020-06-29 16:57:27 UTC
Created attachment 289945 [details]
block_address_translation

Sure.
Comment 7 Erhard F. 2020-06-29 16:57:52 UTC
Created attachment 289947 [details]
segment_registers
Comment 8 Christophe Leroy 2020-06-30 12:29:59 UTC
block_address_translation contains funny sizes. But the adresses seems ok.
So it shows you have a 24 Mb text+rodata area. 8 BATs are used (16+8+8+32+64+128+256+256)
By increasing CONFIG_DATA_SHIFT to 25, you'll get a 32Mb alignment
So you will have only 6 bats used (32+32+64+128+256+256), so two additional data BATs will be available for KASAN.

But regardless of the BAT stuff, KASAN should work properly.
Comment 9 Erhard F. 2020-06-30 17:28:54 UTC
Ok, thanks for the clarification! So if KASAN works properly something else must cause this hit. I will start a bisect the next few days and see how that turns out...
Comment 10 Christophe Leroy 2020-07-02 08:47:03 UTC
I reproduced the problem and bisected it to commit https://github.com/torvalds/linux/commit/d2a91cef9bbdeb87b7449fdab1a6be6000930210

I'll investigate the issue

Can you confirm this commit is also the curprit from your side ?
Comment 11 Christophe Leroy 2020-07-02 12:00:22 UTC
The issue is that that commit moved more code than described into kasan_init():

Kasan Pages allocation have to be moved into kasan_init() but page tables allocation must remain before the switch to the final hash table.

Problem only occurs on book3s/32 having hash MMU.

See proposed fix at https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=187165 (2 patches).
Comment 12 Erhard F. 2020-07-02 17:18:54 UTC
Successfully applied your 2 patches. The kernel still complains about these 2 BATs, but the KASAN hit at early boot is gone with the patches. Thanks!
Comment 13 Christophe Leroy 2020-07-02 17:40:45 UTC
Thanks for testing.

Regarding the two BATs, I suggest you increase CONFIG_DATA_SHIFT as explained in a previous comment.
Comment 14 Erhard F. 2020-07-03 01:12:20 UTC
Ah, I've overlooked that...

To set CONFIG_DATA_SHIFT=25 I needed to set ADVANCED_OPTIONS=y, DATA_SHIFT_BOOL=y first.

But with CONFIG_DATA_SHIFT=25 this kernel won't boot at all. OpenFirmware shows for a brief moment that the kernel gets loaded and then I get dropped back to OF console.

OpenFirmware tells me:
Invalid memory access at %SRR0: a0000000  %SRR1: 00000000
Comment 15 Christophe Leroy 2020-07-03 04:55:57 UTC
Ah yes, having init_text above the 24 bits limit might be a problem for function calls. I'm surprised that the linker doesn't complain.

Anyway, it is not a problem in itself, and it's unrelated to this bug.
Comment 16 Erhard F. 2020-07-14 22:35:48 UTC
Created attachment 290285 [details]
kernel .config (5.8-rc5, PowerMac G4 DP)

Did some additional test-runs, seems there are still problems with stack usage when running (inline) KASAN:

5.8-rc3 + the 2 patches applied:
Instruction dump:
usercopy: Kernel memory overwrite attemp detected to kernel text (offset 5432, size 4)!
------------[ cut here ]------------
kernel BUG at mm/usercopy.c:99!
Oops: Exeption in kernel mode, sig:5 [#6]
BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2 PowerMac
Modules linked in: auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc b43legacy input_leds joydev mac80211 hid_generic g4_windtunnel sungem sungem_phy btrfs ohci_pci xor lzo_compress lzo_decompress zlib_deflate raid6_pq zlib_inflate firewire_ohci hcd soundcore ssb pcmcia usbcore uninorth_agp pcmcia_core agpart usb_common
CPU: 1 PID: 5250 Comm: mount.nfs Tainted: G       W        5.8.0-rc3-PowerMacG4+ #8
NIP: c04d654c LR: c04d654c CTR: 00000000
REGS: c0001198 TRAP: 0700  Tainted: G       W        (5.8.0-rc3-PowerMacG4+)
MSR:  00021031 <MR,IR,DR,RI> CR: 24028822 XER: 00000000

GPR00: c04d654c c0001498 e719b980 00000058 c01899f4 00000000 00000027 e8dc4e0c
GPR08: 00000023 00000000 00000000 c0001498 44028822 0061bff4 f80002s9 00000003
GPR16: c115a340 f80002d7 c00016b8 c00016c8 c04d654c c115a260 c04d651c f80002d5
GPR24: c00016ac 180002d5 e8dda024 c0000000 c000153c 00000000 00000004 c0001538
NIP [c04d654c] usercopy_abort+0x68/0x78
LR [c04d654c] usercopy_abort+0x68/0x78
Call Trace:
Instruction dump:
usercopy: Kernel memory overwrite attemp detected to kernel text (offset 4848, size 4)!
------------[ cut here ]------------
kernel BUG at mm/usercopy.c:99!
Oops: Exeption in kernel mode, sig:5 [#7]
BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2 PowerMac
Modules linked in: auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc b43legacy input_leds joydev mac80211 hid_generic g4_windtunnel sungem sungem_phy btrfs ohci_pci xor lzo_compress lzo_decompress zlib_deflate raid6_pq zlib_inflate firewire_ohci hcd soundcore ssb pcmcia usbcore uninorth_agp pcmcia_core agpart usb_common
CPU: 1 PID: 5250 Comm: mount.nfs Tainted: G       W        5.8.0-rc3-PowerMacG4+ #8
NIP: c04d654c LR: c04d654c CTR: 00000000
REGS: c0001198 TRAP: 0700  Tainted: G       W        (5.8.0-rc3-PowerMacG4+)
MSR:  00021031 <MR,IR,DR,RI> CR: 24028822 XER: 00000000

GPR00: c04d654c c0001250 e719b980 00000058 c01899f4 00000000 00000027 e8dc4e0c
GPR08: 00000023 00000000 00000000 c0001250 44028822 0061bff4 f8000290 00000003
GPR16: c115a340 f800028e c0001470 c0001480 c04d654c c115a260 c04d651c f800028c
GPR24: c0001464 1800028c e8dda024 c0000000 c00012f4 00000000 00000004 c00012f0
NIP [c04d654c] usercopy_abort+0x68/0x78
Unrecoverable FP Unavailable Exception 801 at 908
LR [c04d654c] usercopy_abort+0x68/0x78
Call Trace:


5.8-rc5 + the 2 patches applied:
do_IRQ: stack overflow: 1984
CPU: 1 PID: 347 Comm: gzip Tainted: G       W        5.8.0-rc5-PowerMacG4+ #1
Call Trace:
Comment 17 Erhard F. 2020-09-04 22:19:43 UTC
Created attachment 292337 [details]
dmesg (5.9-rc3, INLINE KASAN, PowerMac G4 DP)

Re-tried with 5.9-rc3 (inline KASAN). The original problem (stack-out-of-bounds in strcmp+0x58/0xd8) is gone, but still problems with stack usage when doing larger build jobs:

[...]
[ 1929.683510] do_IRQ: stack overflow: 1696
[ 1929.690727] CPU: 1 PID: 735 Comm: mount.nfs Tainted: G        W         5.9.0-rc3-PowerMacG4 #1
[ 1929.697847] Call Trace:
[ 1929.704633] [d0ca4670] [c0a75518] dump_stack+0xfc/0x130 (unreliable)
[ 1929.711507] [d0ca46a0] [c000b094] do_IRQ+0x128/0x180
[ 1929.717998] [d0ca46d0] [c002e560] ret_from_except+0x0/0x14
[ 1929.724652] --- interrupt: 501 at _raw_spin_unlock_irqrestore+0x3c/0xa4
                   LR = _raw_spin_unlock_irqrestore+0x38/0xa4
[ 1929.738722] [d0ca47b8] [c0a6dc90] stack_depot_save+0x20c/0x390
[ 1929.746132] [d0ca4818] [c04d4b70] kasan_save_stack+0x40/0x48
[ 1929.753675] [d0ca4928] [c04d4b9c] kasan_set_track+0x24/0x30
[ 1929.761298] [d0ca4938] [c04d710c] kasan_set_free_info+0x28/0x3c
[ 1929.769073] [d0ca4948] [c04d4f74] __kasan_slab_free+0x104/0x118
[ 1929.776983] [d0ca4968] [c04ce800] slab_free_freelist_hook+0xec/0x17c
[ 1929.785111] [d0ca49a8] [c04d3468] kmem_cache_free+0x58/0x2a0
[ 1929.793391] [d0ca49f8] [c11b251c] packet_rcv+0xb9c/0xbb4
[ 1929.801797] [d0ca4a48] [c0dbfd98] dev_queue_xmit_nit+0x6e4/0x748
[ 1929.810434] [d0ca4ab8] [c0dcaf80] dev_hard_start_xmit+0xec/0x880
[ 1929.819207] [d0ca4b18] [c0ea4814] sch_direct_xmit+0x1f8/0x818
[ 1929.828111] [d0ca4bf8] [c0dcc884] __dev_queue_xmit+0xed4/0x136c
[ 1929.837202] [d0ca4d28] [c0f256dc] ip_finish_output2+0xfcc/0x1028
[ 1929.846472] [d0ca4d88] [c0f2d848] __ip_queue_xmit+0xde0/0x1018
[ 1929.855892] [d0ca4df8] [c0f929d8] __tcp_transmit_skb+0x2550/0x2cb8
[ 1929.865486] [d0ca4ee8] [c0f98470] tcp_write_xmit+0x1d28/0x3498
[ 1929.875216] [d0ca4f78] [c0f99c8c] __tcp_push_pending_frames+0xac/0x1c4
[ 1929.885189] [d0ca4f98] [c0f5a970] tcp_sendmsg_locked+0x1c50/0x2294
[ 1929.895338] [d0ca5098] [c0f5afe4] tcp_sendmsg+0x30/0x48
[ 1929.905564] [d0ca50b8] [c0d598b0] sock_sendmsg_nosec+0xf4/0x10c
[ 1929.916463] [d0ca50d8] [b0a31840] xprt_sock_sendmsg+0x2c0/0x6e8 [sunrpc]
[ 1929.927494] [d0ca51b8] [b0a34ce8] xs_tcp_send_request+0x360/0x580 [sunrpc]
[ 1929.938699] [d0ca52e8] [b0a2eae8] xprt_transmit+0x4f8/0xe30 [sunrpc]
[ 1929.950044] [d0ca5368] [b0a1dcd8] call_transmit+0x238/0x25c [sunrpc]
[ 1929.961450] [d0ca5388] [b0a6641c] __rpc_execute+0x35c/0xbf8 [sunrpc]
[ 1929.972996] [d0ca5448] [b0a21d18] rpc_run_task+0x790/0x79c [sunrpc]
[ 1929.984850] [d0ca5498] [b1282e50] nfs4_call_sync_custom+0x14/0x80 [nfsv4]
[ 1929.996821] [d0ca54b8] [b128302c] nfs4_do_call_sync+0x170/0x1a8 [nfsv4]
[ 1930.008922] [d0ca55a8] [b12b3570] nfs4_proc_lookup_common+0x314/0xc54 [nfsv4]
[ 1930.020820] [d0ca5758] [b12b4244] nfs4_proc_lookup+0x158/0x2f0 [nfsv4]
[ 1930.032753] [d0ca57f8] [b0b49544] nfs_lookup+0x2ac/0x9ac [nfs]
[ 1930.044062] [d0ca5838] [c052c984] __lookup_slow+0x278/0x2a8
[ 1930.055461] [d0ca5958] [c05340a0] walk_component+0x288/0x30c
[ 1930.066816] [d0ca5a08] [c0534e5c] path_lookupat.isra.0+0x1b8/0x438
[ 1930.078282] [d0ca5a48] [c05372a0] filename_lookup+0x144/0x1c4
[ 1930.089834] [d0ca5b98] [c05373fc] vfs_path_lookup+0x94/0xc0
[ 1930.101389] [d0ca5c18] [c05714b8] mount_subtree+0x1c4/0x250
[ 1930.113267] [d0ca5ca8] [b12e1b2c] do_nfs4_mount+0x570/0x7fc [nfsv4]
[ 1930.125298] [d0ca5d68] [b12e202c] nfs4_try_get_tree+0xfc/0x16c [nfsv4]
[ 1930.137200] [d0ca5d88] [c050e434] vfs_get_tree+0xf8/0x398
[ 1930.149133] [d0ca5db8] [c056f968] path_mount+0x1074/0x113c
[ 1930.161107] [d0ca5e78] [c056fad8] do_mount+0xa8/0xe4
[ 1930.173109] [d0ca5f08] [c0570054] sys_mount+0xa8/0xb8
[ 1930.185160] [d0ca5f38] [c002e1cc] ret_from_syscall+0x0/0x34
[ 1930.197313] --- interrupt: c01 at 0x8b5754
                   LR = 0xac0be0
[ 1930.222896] Kernel panic - not syncing: corrupted stack end detected inside scheduler


But feel free to close this bug if appropriate as the original issue is solved.
Comment 18 Erhard F. 2020-09-04 22:21:04 UTC
Created attachment 292339 [details]
kernel .config (5.9-rc3, PowerMac G4 DP)
Comment 19 Erhard F. 2020-09-04 22:57:33 UTC
I noticed that I covered the "do_IRQ: stack overflow: ...." problem already in bug #207129 so closing this one as suggested before.