Bug 208181
Summary: | BUG: KASAN: stack-out-of-bounds in strcmp+0x58/0xd8 | ||
---|---|---|---|
Product: | Platform Specific/Hardware | Reporter: | Erhard F. (erhard_f) |
Component: | PPC-32 | Assignee: | platform_ppc-32 |
Status: | RESOLVED OBSOLETE | ||
Severity: | normal | CC: | christophe.leroy |
Priority: | P1 | ||
Hardware: | PPC-32 | ||
OS: | Linux | ||
Kernel Version: | 5.8-rc1 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
dmesg (5.8-rc1, INLINE KASAN, PowerMac G4 DP)
kernel .config (5.8-rc1, PowerMac G4 DP) dmesg (5.8-rc3, INLINE KASAN, PowerMac G4 DP kernel .config (5.8-rc3, PowerMac G4 DP) block_address_translation segment_registers kernel .config (5.8-rc5, PowerMac G4 DP) dmesg (5.9-rc3, INLINE KASAN, PowerMac G4 DP) kernel .config (5.9-rc3, PowerMac G4 DP) |
Created attachment 289661 [details]
kernel .config (5.8-rc1, PowerMac G4 DP)
Created attachment 289935 [details]
dmesg (5.8-rc3, INLINE KASAN, PowerMac G4 DP
No change with 5.8-rc3.
Created attachment 289937 [details]
kernel .config (5.8-rc3, PowerMac G4 DP)
Erm wait... there is some change. In 5.8-rc1 stacktrace was: BUG: KASAN: stack-out-of-bounds in strcmp+0x58/0xd8 Read of size 1 at addr c11c1a80 by task swapper/0 CPU: 0 PID: 0 Comm: swapper Not tainted 5.8.0-rc1-PowerMacG4 #2 Call Trace: [c1ae97d0] [c0a2069c] dump_stack+0xfc/0x158 (unreliable) [c1ae9800] [c04ac5cc] print_address_description.isra.0+0x30/0x3fc [c1ae9870] [c04acb28] kasan_report+0x110/0x170 [c1ae98b0] [c0a44c00] strcmp+0x58/0xd8 [c1ae98d0] [c0170790] register_lock_class+0xfa4/0x10a0 [c1ae9990] [c0170a34] __lock_acquire+0x1a8/0x382c [c1ae9b40] [c016f398] lock_acquire+0x5e0/0x854 [c1ae9c00] [c1144014] _raw_spin_lock_irqsave+0x48/0x70 [c1ae9c20] [c0ccbe84] of_find_property+0x2c/0x5c [c1ae9c50] [c0ccbec8] of_get_property+0x14/0x6c [c1ae9c70] [c0cdbcd8] unflatten_dt_nodes+0xc4c/0xcdc [c1ae9ec0] [c0cdbe90] __unflatten_device_tree+0x114/0x1e0 [c1ae9ef0] [c184a294] unflatten_device_tree+0x38/0x54 [c1ae9f10] [c1808600] setup_arch+0xc8/0x630 [c1ae9f50] [c1803268] start_kernel+0xcc/0x4cc [c1ae9ff0] [000038a0] 0x38a0 In 5.8-rc3 stacktrace is: BUG: KASAN: stack-out-of-bounds in vprintk_func+0x100/0x4b4 Read of size 4 at addr c1919e14 by task swapper/0 CPU: 0 PID: 0 Comm: swapper Not tainted 5.8.0-rc3-PowerMacG4 #2 Call Trace: [c1ae9c00] [c0a304dc] dump_stack+0xfc/0x158 (unreliable) [c1ae9c30] [c04ac990] print_address_description.isra.0+0x30/0x3fc [c1ae9ca0] [c04aceec] kasan_report+0x110/0x170 [c1ae9ce0] [c018c204] vprintk_func+0x100/0x4b4 [c1ae9d10] [c018afd4] printk+0xa8/0xd4 [c1ae9db0] [c003c8c4] __ioremap_caller+0x1c4/0x27c [c1ae9df0] [c003c394] ioremap+0x20/0x30 [c1ae9e00] [c1813fe4] pmac_feature_init+0x288/0xd90 [c1ae9ed0] [c1812cb0] pmac_probe+0x13c/0x190 [c1ae9ef0] [c001d938] probe_machine+0xe8/0x13c [c1ae9f10] [c1808614] setup_arch+0xdc/0x630 [c1ae9f50] [c1803268] start_kernel+0xcc/0x4cc [c1ae9ff0] [000038a0] 0x38a0 What stays the same are the two "setbat: no BAT available" messages in both cases. Can we get a dump of /sys/kernel/debug/powerpc/block_address_translation Created attachment 289945 [details]
block_address_translation
Sure.
Created attachment 289947 [details]
segment_registers
block_address_translation contains funny sizes. But the adresses seems ok. So it shows you have a 24 Mb text+rodata area. 8 BATs are used (16+8+8+32+64+128+256+256) By increasing CONFIG_DATA_SHIFT to 25, you'll get a 32Mb alignment So you will have only 6 bats used (32+32+64+128+256+256), so two additional data BATs will be available for KASAN. But regardless of the BAT stuff, KASAN should work properly. Ok, thanks for the clarification! So if KASAN works properly something else must cause this hit. I will start a bisect the next few days and see how that turns out... I reproduced the problem and bisected it to commit https://github.com/torvalds/linux/commit/d2a91cef9bbdeb87b7449fdab1a6be6000930210 I'll investigate the issue Can you confirm this commit is also the curprit from your side ? The issue is that that commit moved more code than described into kasan_init(): Kasan Pages allocation have to be moved into kasan_init() but page tables allocation must remain before the switch to the final hash table. Problem only occurs on book3s/32 having hash MMU. See proposed fix at https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=187165 (2 patches). Successfully applied your 2 patches. The kernel still complains about these 2 BATs, but the KASAN hit at early boot is gone with the patches. Thanks! Thanks for testing. Regarding the two BATs, I suggest you increase CONFIG_DATA_SHIFT as explained in a previous comment. Ah, I've overlooked that... To set CONFIG_DATA_SHIFT=25 I needed to set ADVANCED_OPTIONS=y, DATA_SHIFT_BOOL=y first. But with CONFIG_DATA_SHIFT=25 this kernel won't boot at all. OpenFirmware shows for a brief moment that the kernel gets loaded and then I get dropped back to OF console. OpenFirmware tells me: Invalid memory access at %SRR0: a0000000 %SRR1: 00000000 Ah yes, having init_text above the 24 bits limit might be a problem for function calls. I'm surprised that the linker doesn't complain. Anyway, it is not a problem in itself, and it's unrelated to this bug. Created attachment 290285 [details]
kernel .config (5.8-rc5, PowerMac G4 DP)
Did some additional test-runs, seems there are still problems with stack usage when running (inline) KASAN:
5.8-rc3 + the 2 patches applied:
Instruction dump:
usercopy: Kernel memory overwrite attemp detected to kernel text (offset 5432, size 4)!
------------[ cut here ]------------
kernel BUG at mm/usercopy.c:99!
Oops: Exeption in kernel mode, sig:5 [#6]
BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2 PowerMac
Modules linked in: auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc b43legacy input_leds joydev mac80211 hid_generic g4_windtunnel sungem sungem_phy btrfs ohci_pci xor lzo_compress lzo_decompress zlib_deflate raid6_pq zlib_inflate firewire_ohci hcd soundcore ssb pcmcia usbcore uninorth_agp pcmcia_core agpart usb_common
CPU: 1 PID: 5250 Comm: mount.nfs Tainted: G W 5.8.0-rc3-PowerMacG4+ #8
NIP: c04d654c LR: c04d654c CTR: 00000000
REGS: c0001198 TRAP: 0700 Tainted: G W (5.8.0-rc3-PowerMacG4+)
MSR: 00021031 <MR,IR,DR,RI> CR: 24028822 XER: 00000000
GPR00: c04d654c c0001498 e719b980 00000058 c01899f4 00000000 00000027 e8dc4e0c
GPR08: 00000023 00000000 00000000 c0001498 44028822 0061bff4 f80002s9 00000003
GPR16: c115a340 f80002d7 c00016b8 c00016c8 c04d654c c115a260 c04d651c f80002d5
GPR24: c00016ac 180002d5 e8dda024 c0000000 c000153c 00000000 00000004 c0001538
NIP [c04d654c] usercopy_abort+0x68/0x78
LR [c04d654c] usercopy_abort+0x68/0x78
Call Trace:
Instruction dump:
usercopy: Kernel memory overwrite attemp detected to kernel text (offset 4848, size 4)!
------------[ cut here ]------------
kernel BUG at mm/usercopy.c:99!
Oops: Exeption in kernel mode, sig:5 [#7]
BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2 PowerMac
Modules linked in: auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc b43legacy input_leds joydev mac80211 hid_generic g4_windtunnel sungem sungem_phy btrfs ohci_pci xor lzo_compress lzo_decompress zlib_deflate raid6_pq zlib_inflate firewire_ohci hcd soundcore ssb pcmcia usbcore uninorth_agp pcmcia_core agpart usb_common
CPU: 1 PID: 5250 Comm: mount.nfs Tainted: G W 5.8.0-rc3-PowerMacG4+ #8
NIP: c04d654c LR: c04d654c CTR: 00000000
REGS: c0001198 TRAP: 0700 Tainted: G W (5.8.0-rc3-PowerMacG4+)
MSR: 00021031 <MR,IR,DR,RI> CR: 24028822 XER: 00000000
GPR00: c04d654c c0001250 e719b980 00000058 c01899f4 00000000 00000027 e8dc4e0c
GPR08: 00000023 00000000 00000000 c0001250 44028822 0061bff4 f8000290 00000003
GPR16: c115a340 f800028e c0001470 c0001480 c04d654c c115a260 c04d651c f800028c
GPR24: c0001464 1800028c e8dda024 c0000000 c00012f4 00000000 00000004 c00012f0
NIP [c04d654c] usercopy_abort+0x68/0x78
Unrecoverable FP Unavailable Exception 801 at 908
LR [c04d654c] usercopy_abort+0x68/0x78
Call Trace:
5.8-rc5 + the 2 patches applied:
do_IRQ: stack overflow: 1984
CPU: 1 PID: 347 Comm: gzip Tainted: G W 5.8.0-rc5-PowerMacG4+ #1
Call Trace:
Created attachment 292337 [details]
dmesg (5.9-rc3, INLINE KASAN, PowerMac G4 DP)
Re-tried with 5.9-rc3 (inline KASAN). The original problem (stack-out-of-bounds in strcmp+0x58/0xd8) is gone, but still problems with stack usage when doing larger build jobs:
[...]
[ 1929.683510] do_IRQ: stack overflow: 1696
[ 1929.690727] CPU: 1 PID: 735 Comm: mount.nfs Tainted: G W 5.9.0-rc3-PowerMacG4 #1
[ 1929.697847] Call Trace:
[ 1929.704633] [d0ca4670] [c0a75518] dump_stack+0xfc/0x130 (unreliable)
[ 1929.711507] [d0ca46a0] [c000b094] do_IRQ+0x128/0x180
[ 1929.717998] [d0ca46d0] [c002e560] ret_from_except+0x0/0x14
[ 1929.724652] --- interrupt: 501 at _raw_spin_unlock_irqrestore+0x3c/0xa4
LR = _raw_spin_unlock_irqrestore+0x38/0xa4
[ 1929.738722] [d0ca47b8] [c0a6dc90] stack_depot_save+0x20c/0x390
[ 1929.746132] [d0ca4818] [c04d4b70] kasan_save_stack+0x40/0x48
[ 1929.753675] [d0ca4928] [c04d4b9c] kasan_set_track+0x24/0x30
[ 1929.761298] [d0ca4938] [c04d710c] kasan_set_free_info+0x28/0x3c
[ 1929.769073] [d0ca4948] [c04d4f74] __kasan_slab_free+0x104/0x118
[ 1929.776983] [d0ca4968] [c04ce800] slab_free_freelist_hook+0xec/0x17c
[ 1929.785111] [d0ca49a8] [c04d3468] kmem_cache_free+0x58/0x2a0
[ 1929.793391] [d0ca49f8] [c11b251c] packet_rcv+0xb9c/0xbb4
[ 1929.801797] [d0ca4a48] [c0dbfd98] dev_queue_xmit_nit+0x6e4/0x748
[ 1929.810434] [d0ca4ab8] [c0dcaf80] dev_hard_start_xmit+0xec/0x880
[ 1929.819207] [d0ca4b18] [c0ea4814] sch_direct_xmit+0x1f8/0x818
[ 1929.828111] [d0ca4bf8] [c0dcc884] __dev_queue_xmit+0xed4/0x136c
[ 1929.837202] [d0ca4d28] [c0f256dc] ip_finish_output2+0xfcc/0x1028
[ 1929.846472] [d0ca4d88] [c0f2d848] __ip_queue_xmit+0xde0/0x1018
[ 1929.855892] [d0ca4df8] [c0f929d8] __tcp_transmit_skb+0x2550/0x2cb8
[ 1929.865486] [d0ca4ee8] [c0f98470] tcp_write_xmit+0x1d28/0x3498
[ 1929.875216] [d0ca4f78] [c0f99c8c] __tcp_push_pending_frames+0xac/0x1c4
[ 1929.885189] [d0ca4f98] [c0f5a970] tcp_sendmsg_locked+0x1c50/0x2294
[ 1929.895338] [d0ca5098] [c0f5afe4] tcp_sendmsg+0x30/0x48
[ 1929.905564] [d0ca50b8] [c0d598b0] sock_sendmsg_nosec+0xf4/0x10c
[ 1929.916463] [d0ca50d8] [b0a31840] xprt_sock_sendmsg+0x2c0/0x6e8 [sunrpc]
[ 1929.927494] [d0ca51b8] [b0a34ce8] xs_tcp_send_request+0x360/0x580 [sunrpc]
[ 1929.938699] [d0ca52e8] [b0a2eae8] xprt_transmit+0x4f8/0xe30 [sunrpc]
[ 1929.950044] [d0ca5368] [b0a1dcd8] call_transmit+0x238/0x25c [sunrpc]
[ 1929.961450] [d0ca5388] [b0a6641c] __rpc_execute+0x35c/0xbf8 [sunrpc]
[ 1929.972996] [d0ca5448] [b0a21d18] rpc_run_task+0x790/0x79c [sunrpc]
[ 1929.984850] [d0ca5498] [b1282e50] nfs4_call_sync_custom+0x14/0x80 [nfsv4]
[ 1929.996821] [d0ca54b8] [b128302c] nfs4_do_call_sync+0x170/0x1a8 [nfsv4]
[ 1930.008922] [d0ca55a8] [b12b3570] nfs4_proc_lookup_common+0x314/0xc54 [nfsv4]
[ 1930.020820] [d0ca5758] [b12b4244] nfs4_proc_lookup+0x158/0x2f0 [nfsv4]
[ 1930.032753] [d0ca57f8] [b0b49544] nfs_lookup+0x2ac/0x9ac [nfs]
[ 1930.044062] [d0ca5838] [c052c984] __lookup_slow+0x278/0x2a8
[ 1930.055461] [d0ca5958] [c05340a0] walk_component+0x288/0x30c
[ 1930.066816] [d0ca5a08] [c0534e5c] path_lookupat.isra.0+0x1b8/0x438
[ 1930.078282] [d0ca5a48] [c05372a0] filename_lookup+0x144/0x1c4
[ 1930.089834] [d0ca5b98] [c05373fc] vfs_path_lookup+0x94/0xc0
[ 1930.101389] [d0ca5c18] [c05714b8] mount_subtree+0x1c4/0x250
[ 1930.113267] [d0ca5ca8] [b12e1b2c] do_nfs4_mount+0x570/0x7fc [nfsv4]
[ 1930.125298] [d0ca5d68] [b12e202c] nfs4_try_get_tree+0xfc/0x16c [nfsv4]
[ 1930.137200] [d0ca5d88] [c050e434] vfs_get_tree+0xf8/0x398
[ 1930.149133] [d0ca5db8] [c056f968] path_mount+0x1074/0x113c
[ 1930.161107] [d0ca5e78] [c056fad8] do_mount+0xa8/0xe4
[ 1930.173109] [d0ca5f08] [c0570054] sys_mount+0xa8/0xb8
[ 1930.185160] [d0ca5f38] [c002e1cc] ret_from_syscall+0x0/0x34
[ 1930.197313] --- interrupt: c01 at 0x8b5754
LR = 0xac0be0
[ 1930.222896] Kernel panic - not syncing: corrupted stack end detected inside scheduler
But feel free to close this bug if appropriate as the original issue is solved.
Created attachment 292339 [details]
kernel .config (5.9-rc3, PowerMac G4 DP)
I noticed that I covered the "do_IRQ: stack overflow: ...." problem already in bug #207129 so closing this one as suggested before. |
Created attachment 289659 [details] dmesg (5.8-rc1, INLINE KASAN, PowerMac G4 DP) The G4 DP boots up and is useable, though I get this at early boot: [...] Jun 15 11:41:11 T600 kernel: setbat: no BAT available for mapping 0x2a000000 Jun 15 11:41:11 T600 kernel: setbat: no BAT available for mapping 0x2c000000 Jun 15 11:41:11 T600 kernel: KASAN init done Jun 15 11:41:11 T600 kernel: ================================================================== Jun 15 11:41:11 T600 kernel: BUG: KASAN: stack-out-of-bounds in strcmp+0x58/0xd8 Jun 15 11:41:11 T600 kernel: Read of size 1 at addr c11c1a80 by task swapper/0 Jun 15 11:41:11 T600 kernel: Jun 15 11:41:11 T600 kernel: CPU: 0 PID: 0 Comm: swapper Not tainted 5.8.0-rc1-PowerMacG4 #2 Jun 15 11:41:11 T600 kernel: Call Trace: Jun 15 11:41:11 T600 kernel: [c1ae97d0] [c0a2069c] dump_stack+0xfc/0x158 (unreliable) Jun 15 11:41:11 T600 kernel: [c1ae9800] [c04ac5cc] print_address_description.isra.0+0x30/0x3fc Jun 15 11:41:11 T600 kernel: [c1ae9870] [c04acb28] kasan_report+0x110/0x170 Jun 15 11:41:11 T600 kernel: [c1ae98b0] [c0a44c00] strcmp+0x58/0xd8 Jun 15 11:41:11 T600 kernel: [c1ae98d0] [c0170790] register_lock_class+0xfa4/0x10a0 Jun 15 11:41:11 T600 kernel: [c1ae9990] [c0170a34] __lock_acquire+0x1a8/0x382c Jun 15 11:41:11 T600 kernel: [c1ae9b40] [c016f398] lock_acquire+0x5e0/0x854 Jun 15 11:41:11 T600 kernel: [c1ae9c00] [c1144014] _raw_spin_lock_irqsave+0x48/0x70 Jun 15 11:41:11 T600 kernel: [c1ae9c20] [c0ccbe84] of_find_property+0x2c/0x5c Jun 15 11:41:11 T600 kernel: [c1ae9c50] [c0ccbec8] of_get_property+0x14/0x6c Jun 15 11:41:11 T600 kernel: [c1ae9c70] [c0cdbcd8] unflatten_dt_nodes+0xc4c/0xcdc Jun 15 11:41:11 T600 kernel: [c1ae9ec0] [c0cdbe90] __unflatten_device_tree+0x114/0x1e0 Jun 15 11:41:11 T600 kernel: [c1ae9ef0] [c184a294] unflatten_device_tree+0x38/0x54 Jun 15 11:41:11 T600 kernel: [c1ae9f10] [c1808600] setup_arch+0xc8/0x630 Jun 15 11:41:11 T600 kernel: [c1ae9f50] [c1803268] start_kernel+0xcc/0x4cc Jun 15 11:41:11 T600 kernel: [c1ae9ff0] [000038a0] 0x38a0 Jun 15 11:41:11 T600 kernel: Jun 15 11:41:11 T600 kernel: The buggy address belongs to the variable: Jun 15 11:41:11 T600 kernel: __func__.22122+0x80/0x320 Jun 15 11:41:11 T600 kernel: Jun 15 11:41:11 T600 kernel: Memory state around the buggy address: Jun 15 11:41:11 T600 kernel: c11c1980: 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 f1 f1 Jun 15 11:41:11 T600 kernel: c11c1a00: 04 f2 04 f2 04 f2 04 f2 04 f2 00 00 00 04 f2 f2 Jun 15 11:41:11 T600 kernel: >c11c1a80: f2 f2 00 00 00 04 f3 f3 f3 f3 00 00 00 00 00 00 Jun 15 11:41:11 T600 kernel: ^ Jun 15 11:41:11 T600 kernel: c11c1b00: 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00 Jun 15 11:41:11 T600 kernel: c11c1b80: 00 00 f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 Jun 15 11:41:11 T600 kernel: ================================================================== Jun 15 11:41:11 T600 kernel: Disabling lock debugging due to kernel taint [...]