Created attachment 284931 [details] dmesg (5.3-rc8, PowerMac G4 DP) I tried finding a minimal kernel .config for bug #204479 to rule out other influences. What I found was, that KASAN shows different BUG: Unable to handle kernel data access at 0x.......", depending on which module gets loaded/used first, like: [...] Sep 12 00:12:09 T600 kernel: BUG: Unable to handle kernel data access at 0xfe205150 Sep 12 00:12:09 T600 kernel: Faulting instruction address: 0xc00261d8 Sep 12 00:12:09 T600 kernel: Oops: Kernel access of bad area, sig: 11 [#1] Sep 12 00:12:09 T600 kernel: BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2 PowerMac Sep 12 00:12:09 T600 kernel: Modules linked in: sungem_phy(+) usb_common Sep 12 00:12:09 T600 kernel: CPU: 0 PID: 117 Comm: systemd-udevd Tainted: G W 5.3.0-rc8 #3 Sep 12 00:12:09 T600 kernel: NIP: c00261d8 LR: c02355c4 CTR: c0236c6c Sep 12 00:12:09 T600 kernel: REGS: eb7d5ae0 TRAP: 0300 Tainted: G W (5.3.0-rc8) Sep 12 00:12:09 T600 kernel: MSR: 00009032 <EE,ME,IR,DR,RI> CR: 48228828 XER: 20000000 Sep 12 00:12:09 T600 kernel: DAR: fe205150 DSISR: 0a000000 GPR00: c0236cb0 eb7d5b98 ebbd2320 fe205150 00000000 00000007 c01083dc 00000003 GPR08: e991025c 1e205150 00000000 f1013000 c0236c6c 007d6980 eb7d5eb8 eb7d5eb0 GPR16: f10504c0 00000124 f102a880 f103d000 c010426c c144c480 00000011 eb9ef9a0 GPR24: f102a8dc f102a8e0 f1013000 f102a810 0000002d 00000001 0000003c f1028a80 Sep 12 00:12:09 T600 kernel: NIP [c00261d8] __memset+0x10/0xc0 Sep 12 00:12:09 T600 kernel: LR [c02355c4] kasan_unpoison_shadow+0x20/0x40 Sep 12 00:12:09 T600 kernel: Call Trace: Sep 12 00:12:09 T600 kernel: [eb7d5b98] [ee802420] 0xee802420 (unreliable) Sep 12 00:12:09 T600 kernel: [eb7d5ba8] [c0236cb0] __asan_register_globals+0x44/0x68 Sep 12 00:12:09 T600 kernel: [eb7d5bc8] [c01083e8] do_init_module+0x134/0x2c4 Sep 12 00:12:09 T600 kernel: [eb7d5bf8] [c010b1b8] load_module+0x2b60/0x2d1c Sep 12 00:12:09 T600 kernel: [eb7d5df8] [c010b5d8] sys_finit_module+0x100/0x140 Sep 12 00:12:09 T600 kernel: [eb7d5f38] [c001a274] ret_from_syscall+0x0/0x34 Sep 12 00:12:09 T600 kernel: --- interrupt: c00 at 0x24f2c4 LR = 0x43e35c Sep 12 00:12:09 T600 kernel: Instruction dump: Sep 12 00:12:09 T600 kernel: 7c062396 7d4021d6 7d080214 7cca3050 90e30000 91030004 7cc33378 4e800020 Sep 12 00:12:09 T600 kernel: 28050004 418000a0 5084442e 5084801e <90830000> 4d820020 70600003 7ca02a14 Sep 12 00:12:09 T600 kernel: ---[ end trace 00769ab21374844d ]--- [...] This changes depending on which modules I build for the kernel. So I suspect KASAN to still not get the module loading quite right for all modules. At least on my system (PowerMac G4 DP).
Created attachment 284933 [details] kernel .config (5.3-rc8, PowerMac G4 DP)
Could you please provide the content of /sys/kernel/debug/kernel_page_tables together with the dmesg, and also an 'objdump -h' of the failing modules ?
Created attachment 284949 [details] dmesg (5.3-rc8, PowerMac G4 DP)
Created attachment 284951 [details] kernel_page_tables (5.3-rc8, PowerMac G4 DP)
Created attachment 284953 [details] objdump sungem (5.3-rc8, PowerMac G4 DP)
Aaarrgghhh ! kernel_page_tables addresses are wrong. Ok, I'll manage. For next time, please apply https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/arch/powerpc/mm/ptdump?h=next-20190904&id=7c7a532ba3fc51bf9527d191fb410786c1fdc73c
Created attachment 284961 [details] Manually fixed kernel_page_tables
You get: Sep 13 17:43:49 T600 kernel: BUG: Unable to handle kernel data access at 0xfe205150 However this area is properly mapped RW: 0xfe205000-0xfe205fff 0x297d6000 4K rw present dirty accessed There must be something wrong with hash table update.
I can apply the patch but how do I apply the manually fixed kernel_page_tables? Not quite sure how to proceed.
No no, what I mean by 'manually fixed kernel_page_tables' is that I fixed the file you attached so that it has the same values as what it would have had with the patch applied. Don't worry.
I have carefully reviewed the flushing calls and have not been able to identify any issue. Maybe a SMP issue ? Does this problem also happen without CONFIG_SMP ?
Created attachment 284991 [details] dmesg v2 (5.3-rc8 + ptdump patch)
Created attachment 284993 [details] kernel_page_tables v2 (5.3-rc8 + ptdump patch)
Created attachment 284995 [details] objdump firewire-ohci v2 (5.3-rc8 + ptdump patch)
Created attachment 284997 [details] objdump usbcore v2 (5.3-rc8 + ptdump patch)
Created attachment 284999 [details] dmesg v3 (5.3-rc8 + ptdump patch, NO SMP)
Created attachment 285001 [details] kernel_page_tables v3 (5.3-rc8 + ptdump patch, NO SMP)
Created attachment 285003 [details] objdump usbcore v3 (5.3-rc8 + ptdump patch, NO SMP)
Created attachment 285005 [details] objdump ohci-hcd v3 (5.3-rc8 + ptdump patch, NO SMP)
Created attachment 285007 [details] objdump firewire-ohci v3 (5.3-rc8 + ptdump patch, NO SMP)
Created attachment 285009 [details] objdump ehci-pci v3 (5.3-rc8 + ptdump patch, NO SMP)
(In reply to Christophe Leroy from comment #11) > I have carefully reviewed the flushing calls and have not been able to > identify any issue. > > Maybe a SMP issue ? Does this problem also happen without CONFIG_SMP ? For my 1st (v2) retest I applied the "powerpc/ptdump: Fix addresses display on PPC32" patch you linked on top of -rc8. For my 2nd (v3) retest I applied the patch and disabled SMP in the .config. As you can see the issue still persists without SMP. Also different 'problematic' modules popup even without SMP.
Thanks. The third one is interesting, as it shows : Sep 16 12:57:39 T600 kernel: BUG: Unable to handle kernel data access at 0xfe295404 0xfe295000-0xfe295fff 0x00d05000 4K user r present accessed So unlike the other cases, this one shows that the area is still read-only and pointing to the zero shadow area. Wondering whether for the other ones the area is not allocated to a further module loader after the failing one. Are you able to dump kernel_page_tables just after each module load failure before loading any additional module ?
Do you confirm that the two patches done in the scope of bug #204479 are included in your config, especially the first one: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/arch/powerpc/mm/kasan/kasan_init_32.c?id=663c0c9496a69f80011205ba3194049bcafd681d https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/arch/powerpc/mm/kasan/kasan_init_32.c?id=45ff3c55958542c3b76075d59741297b8cb31cbb
(In reply to Christophe Leroy from comment #24) > Do you confirm that the two patches done in the scope of bug #204479 are > included in your config, especially the first one: > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/ > arch/powerpc/mm/kasan/kasan_init_32. > c?id=663c0c9496a69f80011205ba3194049bcafd681d > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/ > arch/powerpc/mm/kasan/kasan_init_32. > c?id=45ff3c55958542c3b76075d59741297b8cb31cbb Ahh, now we are getting closer... No, I did the testing with a vanilla 5.3-rc8 kernel assuming that powerpc/kasan: Fix shadow area set up for modules powerpc/kasan: Fix parallel loading of modules were already in 5.3-rc8. Now I realize I was wrong. In -rc8 the latest one (2019-07-31) is: powerpc/kasan: fix early boot failure on PPC32 The other two patches are dated 2019-08-20 and not in 5.3-rc8 or the newly released 5.3. Sorry for the noise! I will close the bug for now and retest as soon as the 2 patches are in 5.4-rcx or 5.3.x. If it still seems valid that time I will re-open.
If you want to test things that are not quite in mainline you can always try the "merge" branch of the powerpc tree. That is a merge of master, next and fixes, so should have any fixes that are in the pipeline.