Bug 204479

Summary: KASAN hit at modprobe zram
Product: Drivers Reporter: Erhard F. (erhard_f)
Component: Flash/Memory Technology DevicesAssignee: David Woodhouse (dwmw2)
Status: RESOLVED CODE_FIX    
Severity: normal CC: christophe.leroy, kasan-dev, linuxppc-dev
Priority: P1    
Hardware: PPC-32   
OS: Linux   
See Also: https://bugzilla.kernel.org/show_bug.cgi?id=204819
Kernel Version: 5.3.0-rc2 Subsystem:
Regression: No Bisected commit-id:
Attachments: dmesg (kernel 5.3-rc2, PowerMac G4 DP)
kernel .config (5.3-rc2, PowerMac G4 DP)
kernel .config (5.3-rc3, PowerMac G4 DP)
dmesg (kernel 5.3-rc3 + patch, PowerMac G4 DP)
dmesg (kernel 5.3-rc3 + patch, without CONFIG_SMP, PowerMac G4 DP)
dmesg (kernel 5.3-rc3 + patch + 2nd patch, without CONFIG_SMP, PowerMac G4 DP)
dmesg (kernel 5.3-rc3 + patch + 2nd patch, without CONFIG_SMP, v2, PowerMac G4 DP)
dmesg (kernel 5.3-rc3 + debug patch + shadow patch + parallel patch, PowerMac G4 DP)
Disassembly of lib/raid6/altivec8.o
dmesg (kernel 5.3-rc4 + shadow patch + parallel patch, PowerMac G4 DP)
kernel .config (5.3-rc4, PowerMac G4 DP)

Description Erhard F. 2019-08-05 00:16:02 UTC
Created attachment 284175 [details]
dmesg (kernel 5.3-rc2, PowerMac G4 DP)

During boot I set up a 256 MiB zram swap on my PowerMac G4 DP. This fails with KASAN enabled.

# systemctl status zram_swap
● zram_swap.service - swap with zram
   Loaded: loaded (/lib/systemd/system/zram_swap.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/zram_swap.service.d
           └─override.conf
   Active: failed (Result: exit-code) since Mon 2019-08-05 01:17:04 CEST; 8min ago
  Process: 86 ExecStart=/sbin/zram-init -d0 -s2 -alzo -Lzram_swap 256 (code=exited, status=1/FAILURE)
 Main PID: 86 (code=exited, status=1/FAILURE)

Aug 05 01:17:00 T600 zram-init[86]: /sbin/zram-init: Zeile 221:    87 Speicherzugriffsfehler  modprobe zram ${num:+num_devices=$num} > /dev/null 2>&1
Aug 05 01:17:03 T600 zram-init[86]: zram-init: cannot create /dev/zram0: /sys/class/zram-control missing
Aug 05 01:17:04 T600 systemd[1]: zram_swap.service: Main process exited, code=exited, status=1/FAILURE
Aug 05 01:17:04 T600 systemd[1]: zram_swap.service: Failed with result 'exit-code'.
Aug 05 01:17:04 T600 systemd[1]: Failed to start swap with zram.

[...]
[   12.409055] systemd[1]: Starting swap with zram...
[   12.438700] systemd[1]: Created slice User and Session Slice.
[   12.535683] BUG: Unable to handle kernel data access at 0xfe2543f8
[   12.545791] Faulting instruction address: 0xc0025ecc
[   12.555615] Oops: Kernel access of bad area, sig: 11 [#1]
[   12.555627] BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2 PowerMac
[   12.555636] Modules linked in: zsmalloc(+)
[   12.555661] CPU: 0 PID: 87 Comm: modprobe Tainted: G        W         5.3.0-rc2+ #6
[   12.555670] NIP:  c0025ecc LR: c023b970 CTR: 00000001
[   12.555681] REGS: ec01dae0 TRAP: 0300   Tainted: G        W          (5.3.0-rc2+)
[   12.555686] MSR:  00009032 <EE,ME,IR,DR,RI>  CR: 44228828  XER: 00000000
[   12.625679] DAR: fe2543f8 DSISR: 0a000000 
               GPR00: c023d05c ec01db98 eceed840 fe2543f8 00000000 00000001 fe2543f7 00000003 
               GPR08: ec01da7c 1e2543f8 00000000 f10790c8 c023d018 0082fff4 ec01deb8 ec01deb0 
               GPR16: f107750c 00000124 f12a1f00 f1055000 c0103664 c144c440 00000017 ec713d18 
               GPR24: f12a1f5c f12a1f60 f10790c8 f12a1e90 00000011 00000001 00000008 f12a1fc0 
[   12.625802] NIP [c0025ecc] __memset+0xb4/0xc0
[   12.625836] LR [c023b970] kasan_unpoison_shadow+0x20/0x40
[   12.699705] Call Trace:
[   12.699716] [ec01db98] [ee802420] 0xee802420 (unreliable)
[   12.699733] [ec01dba8] [c023d05c] __asan_register_globals+0x44/0x68
[   12.699753] [ec01dbc8] [c01077e0] do_init_module+0x134/0x2c4
[   12.699767] [ec01dbf8] [c010a5b0] load_module+0x2b60/0x2d1c
[   12.699782] [ec01ddf8] [c010a9d0] sys_finit_module+0x100/0x140
[   12.699796] [ec01df38] [c001a274] ret_from_syscall+0x0/0x34
[   12.699808] --- interrupt: c01 at 0x67e2c4
                   LR = 0x8067c4
[   12.699814] Instruction dump:
[   12.699824] 4200fffc 70a50003 4d820020 7ca903a6 38c60003 9c860001 4200fffc 4e800020 
[   12.816341] 2c050000 4d820020 7ca903a6 38c3ffff <9c860001> 4200fffc 4e800020 7c032040 
[   12.816377] ---[ end trace bc9c74a40639ae92 ]---
Comment 1 Erhard F. 2019-08-05 00:16:51 UTC
Created attachment 284177 [details]
kernel .config (5.3-rc2, PowerMac G4 DP)
Comment 2 Christophe Leroy 2019-08-05 07:07:24 UTC
Looks like not only this modules but other ones too fail to load as far as I can tell from the attached dmesg.

A powerpc dedicated module_alloc() function is defined in arch/powerpc/mm/kasan/kasan_init_32.c to create the shadow area associated to the module being loaded. This function seems to be OK for several drivers but not for all.
Comment 3 Erhard F. 2019-08-05 14:00:06 UTC
Yes, at least one usb driver is also affected.

Also radeon.ko sometimes loads ok, sometimes it stalls:
# modprobe -v radeon
insmod /lib/modules/5.3.0-rc2+/kernel/drivers/i2c/algos/i2c-algo-bit.ko
Comment 4 Christophe Leroy 2019-08-08 13:57:01 UTC
We need to identify if the allocation of KASAN shadow area at module allocation fails, or if kasan accesses outside of the allocated area.

Could you please run again with the below trace: 

diff --git a/arch/powerpc/mm/kasan/kasan_init_32.c b/arch/powerpc/mm/kasan/kasan_init_32.c
index 74f4555a62ba..2bca2bf691a9 100644
--- a/arch/powerpc/mm/kasan/kasan_init_32.c
+++ b/arch/powerpc/mm/kasan/kasan_init_32.c
@@ -142,6 +142,9 @@ void *module_alloc(unsigned long size)
 	if (!base)
 		return NULL;
 
+	pr_err("###### module_alloc(%lx) = %px [%px-%px]\n", size, base,
+	       kasan_mem_to_shadow(base), kasan_mem_to_shadow(base + size));
+
 	if (!kasan_init_region(base, size))
 		return base;
Comment 5 Erhard F. 2019-08-08 21:55:28 UTC
Created attachment 284271 [details]
kernel .config (5.3-rc3, PowerMac G4 DP)
Comment 6 Erhard F. 2019-08-08 21:57:55 UTC
(In reply to Christophe Leroy from comment #4)
> We need to identify if the allocation of KASAN shadow area at module
> allocation fails, or if kasan accesses outside of the allocated area.
> 
> Could you please run again with the below trace: 
The patch did not apply to the mainstream kernnel with 'patch -p1 < ...' but I inserted the code manually. Please find the new results attached.
Comment 7 Erhard F. 2019-08-08 21:58:41 UTC
Created attachment 284273 [details]
dmesg (kernel 5.3-rc3 + patch, PowerMac G4 DP)
Comment 8 Christophe Leroy 2019-08-09 05:34:29 UTC
List of allocated areas with associated kasan shadow area in [ ], together with the addresses when shadow initialisation fails:

Aug 08 23:39:58 T600 kernel: ###### module_alloc(c78c) = f1470000 [fe28e000-fe28f8f1]
Aug 08 23:39:58 T600 kernel: ###### module_alloc(36f8) = f147e000 [fe28fc00-fe2902df]
Aug 08 23:39:58 T600 kernel: ###### module_alloc(c78c) = f1483000 [fe290600-fe291ef1]
Aug 08 23:39:58 T600 kernel: ###### module_alloc(c78c) = f1491000 [fe292200-fe293af1]
Aug 08 23:39:58 T600 kernel: ###### module_alloc(36f8) = f1502000 [fe2a0400-fe2a0adf]
Aug 08 23:39:58 T600 kernel: ###### module_alloc(1521) = f1013000 [fe202600-fe2028a4]
Aug 08 23:39:58 T600 kernel: ###### module_alloc(13bc5) = f103d000 [fe207a00-fe20a178]
Aug 08 23:39:58 T600 kernel: ###### module_alloc(1357) = f1027000 [fe204e00-fe20506a]
Aug 08 23:39:58 T600 kernel: ###### module_alloc(36f8) = f102a000 [fe205400-fe205adf]
Aug 08 23:39:58 T600 kernel: ###### module_alloc(4301) = f102f000 [fe205e00-fe206660]
Aug 08 23:39:58 T600 kernel: ###### module_alloc(4718) = f1065000 [fe20ca00-fe20d2e3]
Aug 08 23:39:58 T600 kernel: ###### module_alloc(19ac) = f1076000 [fe20ec00-fe20ef35]
Aug 08 23:39:58 T600 kernel: ###### module_alloc(4718) = f129d000 [fe253a00-fe2542e3]
Aug 08 23:39:58 T600 kernel: ###### module_alloc(16ca) = f102a000 [fe205400-fe2056d9]
Aug 08 23:39:58 T600 kernel: ###### module_alloc(1f81) = f1079000 [fe20f200-fe20f5f0]
Aug 08 23:39:58 T600 kernel: ###### module_alloc(1f81) = f1027000 [fe204e00-fe2051f0]
Aug 08 23:39:59 T600 kernel: BUG: Unable to handle kernel data access at 0xfe20d040
Aug 08 23:39:59 T600 kernel: ###### module_alloc(185ef) = f12d0000 [fe25a000-fe25d0bd]
Aug 08 23:39:59 T600 kernel: ###### module_alloc(4035) = f106b000 [fe20d600-fe20de06]
Aug 08 23:39:59 T600 kernel: ###### module_alloc(6196) = f12b3000 [fe256600-fe257232]
Aug 08 23:39:59 T600 kernel: ###### module_alloc(1d27) = f1071000 [fe20e200-fe20e5a4]
Aug 08 23:39:59 T600 kernel: ###### module_alloc(4035) = f102d000 [fe205a00-fe206206]
Aug 08 23:39:59 T600 kernel: ###### module_alloc(a11b) = f13ad000 [fe275a00-fe276e23]
Aug 08 23:39:59 T600 kernel: ###### module_alloc(4035) = f12b3000 [fe256600-fe256e06]
Aug 08 23:39:59 T600 kernel: ###### module_alloc(4035) = f12ea000 [fe25d400-fe25dc06]
Aug 08 23:39:59 T600 kernel: ###### module_alloc(1d27) = f1033000 [fe206600-fe2069a4]
Aug 08 23:39:59 T600 kernel: ###### module_alloc(4035) = f1397000 [fe272e00-fe273606]
Aug 08 23:39:59 T600 kernel: ###### module_alloc(307a) = f12f0000 [fe25e000-fe25e60f]
Aug 08 23:39:59 T600 kernel: ###### module_alloc(1d27) = f1062000 [fe20c400-fe20c7a4]
Aug 08 23:39:59 T600 kernel: ###### module_alloc(1d27) = f12f7000 [fe25ee00-fe25f1a4]
Aug 08 23:39:59 T600 kernel: ###### module_alloc(1d27) = f12fd000 [fe25fa00-fe25fda4]
Aug 08 23:39:59 T600 kernel: ###### module_alloc(d102) = f1429000 [fe285200-fe286c20]
Aug 08 23:39:59 T600 kernel: ###### module_alloc(2a37) = f1033000 [fe206600-fe206b46]
Aug 08 23:39:59 T600 kernel: ###### module_alloc(4718) = f106b000 [fe20d600-fe20dee3]
Aug 08 23:39:59 T600 kernel: ###### module_alloc(9a3f2) = f1db8000 [fe3b7000-fe3ca47e]
Aug 08 23:39:59 T600 kernel: ###### module_alloc(18571) = f13cd000 [fe279a00-fe27caae]
Aug 08 23:39:59 T600 kernel: ###### module_alloc(1f81) = f1071000 [fe20e200-fe20e5f0]
Aug 08 23:39:59 T600 kernel: ###### module_alloc(1fdb9) = f1438000 [fe287000-fe28afb7]
Aug 08 23:39:59 T600 kernel: ###### module_alloc(56a49) = f1e54000 [fe3ca800-fe3d5549]
Aug 08 23:39:59 T600 kernel: ###### module_alloc(56a49) = f1eac000 [fe3d5800-fe3e0549]
Aug 08 23:39:59 T600 kernel: ###### module_alloc(56a49) = f1f04000 [fe3e0800-fe3eb549]
Aug 08 23:39:59 T600 kernel: ###### module_alloc(7c61) = f12ea000 [fe25d400-fe25e38c]
Aug 08 23:39:59 T600 kernel: ###### module_alloc(e011) = f140c000 [fe281800-fe283402]
Aug 08 23:39:59 T600 kernel: ###### module_alloc(56a49) = f1f5c000 [fe3eb800-fe3f6549]
Aug 08 23:39:59 T600 kernel: ###### module_alloc(56a49) = f1fb4000 [fe3f6800-fe401549]
Aug 08 23:39:59 T600 kernel: ###### module_alloc(e011) = f1459000 [fe28b200-fe28ce02]
Aug 08 23:39:59 T600 kernel: ###### module_alloc(e011) = f147e000 [fe28fc00-fe291802]
Aug 08 23:39:59 T600 kernel: ###### module_alloc(2561) = f1033000 [fe206600-fe206aac]
Aug 08 23:39:59 T600 kernel: ###### module_alloc(6ae1) = f12b3000 [fe256600-fe25735c]
Aug 08 23:39:59 T600 kernel: ###### module_alloc(e011) = f148e000 [fe291c00-fe293802]
Aug 08 23:39:59 T600 kernel: ###### module_alloc(e011) = f200c000 [fe401800-fe403402]
Aug 08 23:40:00 T600 kernel: ###### module_alloc(3355) = f1397000 [fe272e00-fe27346a]
Aug 08 23:40:00 T600 kernel: ###### module_alloc(1c8f) = f12f7000 [fe25ee00-fe25f191]
Aug 08 23:40:00 T600 kernel: BUG: Unable to handle kernel data access at 0xfe2731a0
Aug 08 23:40:00 T600 kernel: ###### module_alloc(1c078) = f13cd000 [fe279a00-fe27d20f]
Aug 08 23:40:00 T600 kernel: ###### module_alloc(12f27) = f13eb000 [fe27d600-fe27fbe4]
Aug 08 23:40:00 T600 kernel: ###### module_alloc(12f27) = f13ff000 [fe27fe00-fe2823e4]
Aug 08 23:40:00 T600 kernel: ###### module_alloc(12f27) = f1413000 [fe282600-fe284be4]
Aug 08 23:40:00 T600 kernel: ###### module_alloc(12f27) = f1459000 [fe28b200-fe28d7e4]
Aug 08 23:40:00 T600 kernel: ###### module_alloc(601c) = f12b3000 [fe256600-fe257203]
Aug 08 23:40:00 T600 kernel: ###### module_alloc(3716) = f147e000 [fe28fc00-fe2902e2]
Aug 08 23:40:00 T600 kernel: ###### module_alloc(3716) = f1483000 [fe290600-fe290ce2]
Aug 08 23:40:00 T600 kernel: ###### module_alloc(3716) = f1488000 [fe291000-fe2916e2]
Aug 08 23:40:00 T600 kernel: ###### module_alloc(3716) = f148d000 [fe291a00-fe2920e2]
Aug 08 23:40:00 T600 kernel: ###### module_alloc(12f27) = f1647000 [fe2c8e00-fe2cb3e4]
Aug 08 23:40:00 T600 kernel: ###### module_alloc(1b7a) = f1033000 [fe206600-fe20696f]
Aug 08 23:40:00 T600 kernel: ###### module_alloc(3716) = f165b000 [fe2cb600-fe2cbce2]
Aug 08 23:40:00 T600 kernel: ###### module_alloc(15f7) = f12fd000 [fe25fa00-fe25fcbe]
Aug 08 23:40:00 T600 kernel: ###### module_alloc(fe69) = f13ff000 [fe27fe00-fe281dcd]
Aug 08 23:40:01 T600 kernel: ###### module_alloc(fe69) = f1410000 [fe282000-fe283fcd]
Aug 08 23:40:01 T600 kernel: ###### module_alloc(fe69) = f1459000 [fe28b200-fe28d1cd]
Aug 08 23:40:01 T600 kernel: ###### module_alloc(fe69) = f147e000 [fe28fc00-fe291bcd]
Aug 08 23:40:01 T600 kernel: ###### module_alloc(3087) = f12b3000 [fe256600-fe256c10]
Aug 08 23:40:01 T600 kernel: ###### module_alloc(3087) = f1421000 [fe284200-fe284810]
Aug 08 23:40:01 T600 kernel: ###### module_alloc(3087) = f146a000 [fe28d400-fe28da10]
Aug 08 23:40:01 T600 kernel: ###### module_alloc(2d72) = f1592000 [fe2b2400-fe2b29ae]
Aug 08 23:40:01 T600 kernel: ###### module_alloc(1c09) = f12b8000 [fe257000-fe257381]
Aug 08 23:40:01 T600 kernel: ###### module_alloc(1e4d) = f12fd000 [fe25fa00-fe25fdc9]
Aug 08 23:40:01 T600 kernel: ###### module_alloc(3087) = f1596000 [fe2b2c00-fe2b3210]
Aug 08 23:40:01 T600 kernel: ###### module_alloc(1562) = f1426000 [fe284c00-fe284eac]
Aug 08 23:40:01 T600 kernel: ###### module_alloc(2d72) = f15ab000 [fe2b5600-fe2b5bae]
Aug 08 23:40:01 T600 kernel: ###### module_alloc(24c2) = f15bc000 [fe2b7800-fe2b7c98]
Aug 08 23:40:01 T600 kernel: ###### module_alloc(2d72) = f12b3000 [fe256600-fe256bae]
Aug 08 23:40:01 T600 kernel: ###### module_alloc(2d72) = f1420000 [fe284000-fe2845ae]
Aug 08 23:40:01 T600 kernel: ###### module_alloc(1c09) = f1424000 [fe284800-fe284b81]
Aug 08 23:40:01 T600 kernel: ###### module_alloc(15b9) = f1469000 [fe28d200-fe28d4b7]
Aug 08 23:40:01 T600 kernel: ###### module_alloc(1c09) = f146c000 [fe28d800-fe28db81]
Aug 08 23:40:01 T600 kernel: ###### module_alloc(1c09) = f147e000 [fe28fc00-fe28ff81]
Aug 08 23:40:01 T600 kernel: ###### module_alloc(2d72) = f1491000 [fe292200-fe2927ae]
Aug 08 23:40:01 T600 kernel: ###### module_alloc(1c09) = f1495000 [fe292a00-fe292d81]
Aug 08 23:40:01 T600 kernel: ###### module_alloc(2068) = f1498000 [fe293000-fe29340d]
Aug 08 23:40:01 T600 kernel: ###### module_alloc(13c0) = f1469000 [fe28d200-fe28d478]
Aug 08 23:40:01 T600 kernel: ###### module_alloc(3f159) = f149c000 [fe293800-fe29b62b]
Aug 08 23:40:01 T600 kernel: ###### module_alloc(d8e5) = f14dd000 [fe29ba00-fe29d51c]
Aug 08 23:40:02 T600 kernel: ###### module_alloc(4ed5) = f1500000 [fe2a0000-fe2a09da]
Aug 08 23:40:02 T600 kernel: ###### module_alloc(2843) = f12b3000 [fe256600-fe256b08]
Aug 08 23:40:02 T600 kernel: ###### module_alloc(2843) = f12b7000 [fe256e00-fe257308]
Aug 08 23:40:02 T600 kernel: ###### module_alloc(2843) = f141e000 [fe283c00-fe284108]
Aug 08 23:40:02 T600 kernel: ###### module_alloc(2584) = f1422000 [fe284400-fe2848b0]
Aug 08 23:40:02 T600 kernel: ###### module_alloc(1be5) = f1426000 [fe284c00-fe284f7c]
Aug 08 23:40:02 T600 kernel: ###### module_alloc(1be5) = f1467000 [fe28ce00-fe28d17c]
Aug 08 23:40:02 T600 kernel: ###### module_alloc(2843) = f146a000 [fe28d400-fe28d908]
Aug 08 23:40:02 T600 kernel: ###### module_alloc(1be5) = f147e000 [fe28fc00-fe28ff7c]
Aug 08 23:40:02 T600 kernel: ###### module_alloc(1be5) = f1491000 [fe292200-fe29257c]
Aug 08 23:40:02 T600 kernel: ###### module_alloc(3921) = f1523000 [fe2a4600-fe2a4d24]
Aug 08 23:40:02 T600 kernel: ###### module_alloc(199d) = f1410000 [fe282000-fe282333]
Aug 08 23:40:02 T600 kernel: ###### module_alloc(9d3b) = f1412000 [fe282400-fe2837a7]
Aug 08 23:40:02 T600 kernel: ###### module_alloc(2bfc) = f1422000 [fe284400-fe28497f]
Aug 08 23:40:03 T600 kernel: ###### module_alloc(19a6) = f1422000 [fe284400-fe284734]
Aug 08 23:40:03 T600 kernel: ###### module_alloc(5215) = f1514000 [fe2a2800-fe2a3242]
Aug 08 23:40:03 T600 kernel: ###### module_alloc(1524) = f148b000 [fe291600-fe2918a4]
Aug 08 23:40:03 T600 kernel: ###### module_alloc(1d62) = f148e000 [fe291c00-fe291fac]
Aug 08 23:40:03 T600 kernel: ###### module_alloc(bf4f) = f1596000 [fe2b2c00-fe2b43e9]
Aug 08 23:40:03 T600 kernel: ###### module_alloc(19bf4) = f15f7000 [fe2bee00-fe2c217e]
Aug 08 23:40:03 T600 kernel: ###### module_alloc(2851) = f1481000 [fe290200-fe29070a]
Aug 08 23:40:03 T600 kernel: ###### module_alloc(52fd) = f1485000 [fe290a00-fe29145f]
Aug 08 23:40:04 T600 kernel: BUG: Unable to handle kernel data access at 0xfe2b40dc
Aug 08 23:40:04 T600 kernel: ###### module_alloc(f30b) = f14eb000 [fe29d600-fe29f461]
Aug 08 23:40:04 T600 kernel: ###### module_alloc(2d9d) = f1485000 [fe290a00-fe290fb3]
Aug 08 23:40:04 T600 kernel: BUG: Unable to handle kernel data access at 0xfe29f0b0
Aug 08 23:40:05 T600 kernel: ###### module_alloc(3f9d) = f1489000 [fe291200-fe2919f3]
Aug 08 23:40:05 T600 kernel: ###### module_alloc(1e82) = f148e000 [fe291c00-fe291fd0]
Aug 08 23:40:05 T600 kernel: ###### module_alloc(666f) = f151b000 [fe2a3600-fe2a42cd]
Aug 08 23:40:05 T600 kernel: ###### module_alloc(264d) = f14fc000 [fe29f800-fe29fcc9]
Aug 08 23:40:06 T600 kernel: ###### module_alloc(180d) = f12b3000 [fe256600-fe256901]
Aug 08 23:40:06 T600 kernel: ###### module_alloc(13fa) = f141d000 [fe283a00-fe283c7f]
Aug 08 23:40:06 T600 kernel: ###### module_alloc(74a8) = f1459000 [fe28b200-fe28c095]
Aug 08 23:40:06 T600 kernel: ###### module_alloc(2cc6) = f141d000 [fe283a00-fe283f98]
Aug 08 23:40:07 T600 kernel: ###### module_alloc(eeb18) = f19ee000 [fe33dc00-fe35b963]
Aug 08 23:40:07 T600 kernel: ###### module_alloc(2f546) = f1528000 [fe2a5000-fe2aaea8]
Aug 08 23:40:07 T600 kernel: ###### module_alloc(37fed) = f1559000 [fe2ab200-fe2b21fd]
Aug 08 23:40:07 T600 kernel: ###### module_alloc(37fed) = f1c94000 [fe392800-fe3997fd]
Aug 08 23:40:07 T600 kernel: ###### module_alloc(37fed) = f1ccd000 [fe399a00-fe3a09fd]
Aug 08 23:40:07 T600 kernel: ###### module_alloc(4fa5) = f1462000 [fe28c400-fe28cdf4]
Aug 08 23:40:08 T600 kernel: ###### module_alloc(4fa5) = f1468000 [fe28d000-fe28d9f4]
Aug 08 23:40:08 T600 kernel: ###### module_alloc(4fa5) = f148e000 [fe291c00-fe2925f4]
Aug 08 23:40:08 T600 kernel: ###### module_alloc(37fed) = f1e54000 [fe3ca800-fe3d17fd]
Aug 08 23:40:08 T600 kernel: ###### module_alloc(4fa5) = f14dd000 [fe29ba00-fe29c3f4]
Aug 08 23:40:08 T600 kernel: ###### module_alloc(114e6) = f15a3000 [fe2b4600-fe2b689c]
Aug 08 23:40:08 T600 kernel: ###### module_alloc(114e6) = f15c0000 [fe2b8000-fe2ba29c]
Aug 08 23:40:08 T600 kernel: ###### module_alloc(114e6) = f15d3000 [fe2ba600-fe2bc89c]
Aug 08 23:40:08 T600 kernel: ###### module_alloc(3afc) = f141d000 [fe283a00-fe28415f]
Aug 08 23:40:08 T600 kernel: ###### module_alloc(3afc) = f1462000 [fe28c400-fe28cb5f]
Aug 08 23:40:08 T600 kernel: ###### module_alloc(114e6) = f16de000 [fe2dbc00-fe2dde9c]
Aug 08 23:40:08 T600 kernel: ###### module_alloc(3afc) = f1467000 [fe28ce00-fe28d55f]
Aug 08 23:40:08 T600 kernel: ###### module_alloc(3afc) = f148e000 [fe291c00-fe29235f]
Aug 08 23:40:09 T600 kernel: ###### module_alloc(1bde) = f141d000 [fe283a00-fe283d7b]
Aug 08 23:40:09 T600 kernel: ###### module_alloc(16f8) = f146d000 [fe28da00-fe28dcdf]
Aug 08 23:40:09 T600 kernel: ###### module_alloc(1bde) = f147e000 [fe28fc00-fe28ff7b]
Aug 08 23:40:09 T600 kernel: ###### module_alloc(1bde) = f148e000 [fe291c00-fe291f7b]
Aug 08 23:40:09 T600 kernel: ###### module_alloc(1bde) = f1491000 [fe292200-fe29257b]
Aug 08 23:40:09 T600 kernel: ###### module_alloc(16f8) = f14e8000 [fe29d000-fe29d2df]
Aug 08 23:40:09 T600 kernel: ###### module_alloc(16f8) = f14fc000 [fe29f800-fe29fadf]
Aug 08 23:40:09 T600 kernel: ###### module_alloc(16f8) = f1511000 [fe2a2200-fe2a24df]
Aug 08 23:40:09 T600 kernel: ###### module_alloc(105dd6) = f1612000 [fe2c2400-fe2e2fba]
Aug 08 23:40:09 T600 kernel: ###### module_alloc(2b1fa) = f1719000 [fe2e3200-fe2e883f]
Aug 08 23:40:10 T600 kernel: ###### module_alloc(35b8) = f1466000 [fe28cc00-fe28d2b7]
Aug 08 23:40:10 T600 kernel: ###### module_alloc(2006) = f1554000 [fe2aa800-fe2aac00]
Aug 08 23:40:10 T600 kernel: ###### module_alloc(2c633) = f1528000 [fe2a5000-fe2aa8c6]
Aug 08 23:40:10 T600 kernel: ###### module_alloc(5c9d) = f14dd000 [fe29ba00-fe29c593]
Aug 08 23:40:10 T600 kernel: ###### module_alloc(571d) = f14e4000 [fe29c800-fe29d2e3]
Aug 08 23:40:10 T600 kernel: ###### module_alloc(571d) = f1506000 [fe2a0c00-fe2a16e3]
Aug 08 23:40:10 T600 kernel: ###### module_alloc(571d) = f150d000 [fe2a1a00-fe2a24e3]
Aug 08 23:40:10 T600 kernel: ###### module_alloc(22af) = f1462000 [fe28c400-fe28c855]
Aug 08 23:40:10 T600 kernel: ###### module_alloc(22af) = f146b000 [fe28d600-fe28da55]
Aug 08 23:40:10 T600 kernel: ###### module_alloc(2946) = f148e000 [fe291c00-fe292128]
Aug 08 23:40:10 T600 kernel: ###### module_alloc(22af) = f14fc000 [fe29f800-fe29fc55]
Aug 08 23:40:10 T600 kernel: ###### module_alloc(18bc) = f147e000 [fe28fc00-fe28ff17]
Comment 9 Christophe Leroy 2019-08-09 05:38:09 UTC
The module loads seems to be nested. It might then be an SMP issue, kasan_init_region() is most likely not SMP safe.

Could you test without CONFIG_SMP or with only one CPU ?
Comment 10 Erhard F. 2019-08-09 10:44:03 UTC
Created attachment 284297 [details]
dmesg (kernel 5.3-rc3 + patch, without CONFIG_SMP, PowerMac G4 DP)

Here's the dmesg with the kernel built without CONFIG_SMP.
Comment 11 Christophe Leroy 2019-08-09 14:08:01 UTC
Thanks. Then it is not about SMP allthough there's anyway a theoritical problem with SMP that's I'll address in another patch.

I think I finally spotted the issue. Let's take the first occurence of the first log:

Aug 08 23:39:58 T600 kernel: ###### module_alloc(4718) = f1065000 [fe20ca00-fe20d2e3]
[...]
Aug 08 23:39:59 T600 kernel: BUG: Unable to handle kernel data access at 0xfe20d040

In kasan_init_region(), the loop starts with k_cur = 0xfe20ca00 to set the pte for the first shadow page at 0xfe20c000. Then k_cur is increased by PAGE_SIZE so now k_cur = 0xfe20da00.

As this is over 0xfe20d2e3, it doesn't set the pte for the second page at 0xfe20d000.

It should be fixed by changing the init value of k_cur in the for() loop of kasan_init_region() by:

for (k_cur = k_start & PAGE_MASK; ....)

Can you test it ?
Comment 12 Christophe Leroy 2019-08-09 15:10:21 UTC
Patch at https://patchwork.ozlabs.org/patch/1144756/
Comment 13 Erhard F. 2019-08-09 17:50:08 UTC
Created attachment 284301 [details]
dmesg (kernel 5.3-rc3 + patch + 2nd patch, without CONFIG_SMP, PowerMac G4 DP)

Definitely an improvement with the latest patch. b43legacy and nfs load now reliably without Oops.
Comment 14 Erhard F. 2019-08-09 17:52:05 UTC
Created attachment 284303 [details]
dmesg (kernel 5.3-rc3 + patch + 2nd patch, without CONFIG_SMP, v2, PowerMac G4 DP)

However the radeon module und btrfs (if built as module) still freeze the machine until the 2min reboot timer kicks in. Also some EHCI driver modules oopses, but not always.
Comment 15 Christophe Leroy 2019-08-09 20:31:52 UTC
As far as I can see in the latest dmesg, the Oops occurs in raid6 pq module.
An this time it is not anymore in kasan register global.
Comment 16 Erhard F. 2019-08-10 14:42:17 UTC
Created attachment 284309 [details]
dmesg (kernel 5.3-rc3 + debug patch + shadow patch + parallel patch, PowerMac G4 DP)

Also tested your powerpc-kasan-fix-parallele-loading-of-modules.diff now which seems to work fine! dmesg from the G4 DP with CONFIG_SMP back on is almost identical to non-smp kernel dmesg.

raid6 pq reliably oopses. Probably the 1st issue revealed by ppc32 KASAN. ;)

Loading the radeon module at boot still freezes the G4. modprobing it later on works, without any special dmesg output, switching display over from Offb to radeonfb.
Comment 17 Christophe Leroy 2019-08-12 16:01:05 UTC
Created attachment 284343 [details]
Disassembly of lib/raid6/altivec8.o
Comment 18 Christophe Leroy 2019-08-12 16:06:11 UTC
The Oops occurs at 0x3c8:

 3b0:	81 21 00 88 	lwz     r9,136(r1)
 3b4:	13 67 dc c4 	vxor    v27,v7,v27
 3b8:	7d 11 a8 ce 	lvx     v8,r17,r21
 3bc:	11 5f 5b 06 	vcmpgtsb v10,v31,v11
 3c0:	11 6b 58 00 	vaddubm v11,v11,v11
 3c4:	81 41 00 8c 	lwz     r10,140(r1)
>3c8:   7c 00 48 ce     lvx     v0,0,r9

This is because the value in r9 is most likely wrong.

r9 is loaded from the stack at 0x3b0

r9 was calculated and stored in the stack by the below code.

  70:	3d 20 00 00 	lis     r9,0
			72: R_PPC_ADDR16_HA	.rodata.cst16
  74:	3b b3 00 10 	addi    r29,r19,16
  78:	39 29 00 00 	addi    r9,r9,0
			7a: R_PPC_ADDR16_LO	.rodata.cst16
  7c:	91 21 00 88 	stw     r9,136(r1)

The value comes from .rodata.cst16

Two possibilities, either the value in .rodata.cst16 is wrong or the stack gets corrupted.

Maybe you could try disabling KASAN in lib/raid6/Makefile for altivec8.o ? Or maybe for the entire lib/raid6/ directory, just to see what happens ?
Comment 19 Erhard F. 2019-08-13 11:12:10 UTC
Created attachment 284355 [details]
dmesg (kernel 5.3-rc4 + shadow patch + parallel patch, PowerMac G4 DP)
Comment 20 Erhard F. 2019-08-13 11:21:57 UTC
(In reply to Christophe Leroy from comment #18)
> Two possibilities, either the value in .rodata.cst16 is wrong or the stack
> gets corrupted.
> 
> Maybe you could try disabling KASAN in lib/raid6/Makefile for altivec8.o ?
> Or maybe for the entire lib/raid6/ directory, just to see what happens ?
Disabled KASAN with KASAN_SANITIZE := n in lib/raid6/Makefile. As you can see in my latest dmesg, the G4 continues booting without further issues.

If btrfs gets loaded it still fails with KASAN (will update bug #204397).

Another funny issue. Mounting my nfs share works via:
modprobe nfs
mount /media/distanthome

If I mount it without modprobing nfs beforehand I get:
[...]
[   66.271748] ==================================================================
[   66.272076] BUG: KASAN: global-out-of-bounds in _copy_to_iter+0x3d4/0x5a8
[   66.272331] Write of size 4096 at addr f1c27000 by task modprobe/312

[   66.272598] CPU: 0 PID: 312 Comm: modprobe Tainted: G        W         5.3.0-rc4+ #1
[   66.272883] Call Trace:
[   66.272964] [e100b848] [c075026c] dump_stack+0xb0/0x10c (unreliable)
[   66.273211] [e100b878] [c02334a8] print_address_description+0x80/0x45c
[   66.273456] [e100b908] [c0233128] __kasan_report+0x140/0x188
[   66.273667] [e100b948] [c0233fbc] check_memory_region+0x28/0x184
[   66.273889] [e100b958] [c023206c] memcpy+0x48/0x74
[   66.274061] [e100b978] [c044342c] _copy_to_iter+0x3d4/0x5a8
[   66.274265] [e100baa8] [c04437a8] copy_page_to_iter+0x90/0x550
[   66.274482] [e100bb08] [c01b6898] generic_file_read_iter+0x5c8/0x7bc
[   66.274720] [e100bb78] [c0249034] __vfs_read+0x1b0/0x1f4
[   66.274912] [e100bca8] [c0249134] vfs_read+0xbc/0x124
[   66.275094] [e100bcd8] [c02491f0] kernel_read+0x54/0x70
[   66.275284] [e100bd08] [c02535c8] kernel_read_file+0x240/0x358
[   66.275499] [e100bdb8] [c02537cc] kernel_read_file_from_fd+0x54/0x74
[   66.275737] [e100bdf8] [c01068ac] sys_finit_module+0xd8/0x140
[   66.275949] [e100bf38] [c001a274] ret_from_syscall+0x0/0x34
[   66.276152] --- interrupt: c01 at 0xa602c4
                   LR = 0xbe87c4


[   66.276417] Memory state around the buggy address:
[   66.276588]  f1c27a00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   66.276824]  f1c27a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   66.277060] >f1c27b00: 00 00 00 00 00 00 00 00 05 fa fa fa fa fa fa fa
[   66.277293]                                    ^
[   66.277453]  f1c27b80: 07 fa fa fa fa fa fa fa 00 03 fa fa fa fa fa fa
[   66.277688]  f1c27c00: 04 fa fa fa fa fa fa fa 00 06 fa fa fa fa fa fa
[   66.277920] ==================================================================
[   66.428224] RPC: Registered named UNIX socket transport module.
[   66.428484] RPC: Registered udp transport module.
[   66.428647] RPC: Registered tcp transport module.
[   66.428809] RPC: Registered tcp NFSv4.1 backchannel transport module.
[   66.741275] Key type dns_resolver registered
[   67.974192] NFS: Registering the id_resolver key type
[   67.974534] Key type id_resolver registered
[   67.974681] Key type id_legacy registered


But maybe it's better to not open too many ppc32 KASAN related bugs for now. ;) It probably can wait until you patches are in some later 5.3-rc I guess.
Comment 21 Erhard F. 2019-08-13 11:49:24 UTC
Created attachment 284361 [details]
kernel .config (5.3-rc4, PowerMac G4 DP)
Comment 22 Erhard F. 2019-10-06 20:52:26 UTC
On kernel 5.4-rc1 zram loads & runs fine without KASAN complaining.

As the original issue is fixed now I will close this bug. For any other issues KASAN was complaining about here, I will open new bugs if they still happen (like bug #205099).