Created attachment 301289 [details]
kernel dmesg (kernel 5.19-rc4, Talos II)

5.19-rc4 boots ok when
CONFIG_PPC_RADIX_MMU=y
CONFIG_PPC_RADIX_MMU_DEFAULT=y
is enabled in the .config but fails to boot when MMU is changed to
# CONFIG_PPC_RADIX_MMU is not set
CONFIG_PPC_HASH_MMU_NATIVE=y
in the same .config.

[...]
Disabling lock debugging due to kernel taint
Oops: Machine check, sig: 7 [#1]
BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=32 NUMA PowerNV
Modules linked in: cbc aes_generic snd_hda_codec_hdmi libaes snd_hda_intel snd_intel_dspcfg xhci_pci snd_hda_codec snd_hwdep xhci_hcd snd_hda_core cfg80211 drm_ttm_helper ghash_generic rfkill ofpart ttm i2c_algo_bit snd_pcm powernv_flash vmx_crypto(+) ibmpowernv at24(+) usbcore drm_display_helper mtd gf128mul snd_timer hwmon opal_prd regmap_i2c usb_common drm_kms_helper sysimgblt syscopyarea snd sysfillrect fb_sys_fops soundcore zram pkcs8_key_parser zsmalloc powernv_cpufreq drm fuse drm_panel_orientation_quirks backlight configfs
CPU: 9 PID: 0 Comm: swapper/9 Tainted: G M 5.19.0-rc4-P9 #4
NIP: 0000000000000000 LR: 0000000000000000 CTR: 00ac408f3f6b677d
REGS: c0000007ffe7e900 TRAP: c000000000008354 Tainted: G M (5.19.0-rc4-P9)
MSR: 0301010000000000 <> CR: c0000007ffe7ed40 XER: c0003d000007e680
CFAR: 0000000000000003 IRQMASK: 3
GPR00: 0000000000000000 c0000007ffe7eaa0 c0000007ffe7e990 0000000000000000
GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR12: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR16: 0000000000000000 c0000007ffe7eaa0 c0000007ffe7ea30 0000000000000000
GPR20: c00000000004a3b4 0000000000000000 0000000000000000 c000000001237e00
GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR28: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
NIP [0000000000000000] 0x0
LR [0000000000000000] 0x0
Call Trace:
[c0000007ffe7eaa0] [c000000001237e00] 0xc000000001237e00 (unreliable)
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
---[ end trace 0000000000000000 ]---
input: HDA ATI HDMI HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:00.0/0000:01:00.1/sound/card0/input0
Adding 16777212k swap on /dev/nvme0n1p4. Priority:-2 extents:1 across:16777212k SSFS
at24 7-0050: 256 byte spd EEPROM, read-only
at24 7-0052: 256 byte spd EEPROM, read-only
at24 8-0054: 256 byte spd EEPROM, read-only
at24 8-0056: 256 byte spd EEPROM, read-only
EXT4-fs (nvme0n1p2): mounting ext2 file system using the ext4 subsystem
[drm] radeon kernel modesetting enabled.
EXT4-fs (nvme0n1p2): mounted filesystem without journal. Quota mode: disabled.
EXT4-fs (zram1): mounting ext2 file system using the ext4 subsystem
EXT4-fs (zram1): mounted filesystem without journal. Quota mode: disabled.
Oops: Machine check, sig: 7 [#2]
BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=32 NUMA PowerNV
Modules linked in: xts ecb ctr evdev cbc aes_generic snd_hda_codec_hdmi libaes snd_hda_intel snd_intel_dspcfg xhci_pci snd_hda_codec snd_hwdep radeon(+) xhci_hcd snd_hda_core cfg80211 drm_ttm_helper ghash_generic rfkill ofpart ttm i2c_algo_bit snd_pcm powernv_flash vmx_crypto ibmpowernv at24 usbcore drm_display_helper mtd gf128mul snd_timer hwmon opal_prd regmap_i2c usb_common drm_kms_helper sysimgblt syscopyarea snd sysfillrect fb_sys_fops soundcore zram pkcs8_key_parser zsmalloc powernv_cpufreq drm fuse drm_panel_orientation_quirks backlight configfs
CPU: 1 PID: 0 Comm: swapper/1 Tainted: G M D 5.19.0-rc4-P9 #4
NIP: 0000000000000000 LR: 0000000000000000 CTR: 0063d2a43fc97e45
REGS: c0000007ffede900 TRAP: c000000000008354 Tainted: G M D (5.19.0-rc4-P9)
MSR: 0301010000000000 <> CR: c0000007ffeded40 XER: c0003d0000016680
CFAR: 0000000000000003 IRQMASK: 3
GPR00: 0000000000000000 c0000007ffedeaa0 c0000007ffede990 0000000000000000
GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR12: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR16: 0000000000000000 c0000007ffedeaa0 c0000007ffedea30 0000000000000000
GPR20: c00000000004a3b4 0000000000000000 0000000000000000 c000000001237e00
GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR28: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
NIP [0000000000000000] 0x0
LR [0000000000000000] 0x0
Call Trace:
[c0000007ffedeaa0] [c000000001237e00] 0xc000000001237e00 (unreliable)
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Aiee, killing interrupt handler!
Oops: Machine check, sig: 7 [#3]
BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=32 NUMA PowerNV
Modules linked in: xts ecb ctr evdev cbc aes_generic snd_hda_codec_hdmi libaes snd_hda_intel snd_intel_dspcfg xhci_pci snd_hda_codec snd_hwdep radeon(+) xhci_hcd snd_hda_core cfg80211 drm_ttm_helper ghash_generic rfkill ofpart ttm i2c_algo_bit snd_pcm powernv_flash vmx_crypto ibmpowernv at24 usbcore drm_display_helper mtd gf128mul snd_timer hwmon opal_prd regmap_i2c usb_common drm_kms_helper sysimgblt syscopyarea snd sysfillrect fb_sys_fops soundcore zram pkcs8_key_parser zsmalloc powernv_cpufreq drm fuse drm_panel_orientation_quirks backlight configfs
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G M D 5.19.0-rc4-P9 #4
NIP: 0000000000000000 LR: 0000000000000000 CTR: 007652c6b5124d60
REGS: c0000007ffeea900 TRAP: c000000000008354 Tainted: G M D (5.19.0-rc4-P9)
MSR: 0301010000000000 <> CR: c0000007ffeead40 XER: c0003d0000009680
CFAR: 0000000000000003 IRQMASK: 3
GPR00: 0000000000000000 c0000007ffeeaaa0 c0000007ffeea990 0000000000000000
GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR12: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR16: 0000000000000000 c0000007ffeeaaa0 c0000007ffeeaa30 0000000000000000
GPR20: c00000000004a3b4 0000000000000000 0000000000000000 c000000001237e00
GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR28: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
NIP [0000000000000000] 0x0
LR [0000000000000000] 0x0
Call Trace:
[c0000007ffeeaaa0] [c000000001237e00] 0xc000000001237e00 (unreliable)
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
---[ end trace 0000000000000000 ]---

Machine is a 2 X 4-core POWER9 Talos II:
# lspci
0000:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0000:01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Turks XT [Radeon HD 6670/7670]
0000:01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Turks HDMI Audio [Radeon HD 6500/6600 / 6700M Series]
0001:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0001:01:00.0 Non-Volatile memory controller: Phison Electronics Corporation Device 5008 (rev 01)
0002:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0003:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0003:01:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
0004:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0004:01:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0004:01:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0005:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0005:01:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 04)
0005:02:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 41)
0030:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0031:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0032:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0033:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
Created attachment 301290 [details] kernel .config (kernel 5.19-rc4, Talos II)
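For anyone comparing configs against the attached one, here is a small shell sketch that prints the MMU-related settings of a given .config. The function name mmu_options is made up for illustration, and it only looks at the three symbols quoted in the report; it knows nothing about other Kconfig dependencies.

```shell
#!/bin/sh
# Hypothetical helper: list the MMU-related settings of a kernel .config.
# Only the symbols quoted in this report are checked.
mmu_options() {
    config="${1:-.config}"
    # Match both "CONFIG_X=y" lines and "# CONFIG_X is not set" lines.
    grep -E '^(# )?CONFIG_PPC_(RADIX_MMU|RADIX_MMU_DEFAULT|HASH_MMU_NATIVE)[= ]' "$config"
}
```

Running mmu_options .config in the build directory should show at a glance whether a kernel was built for Radix or for Hash only.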
I can't repro this on my Talos 2. I have some different PCI devices, a different GPU and nvme controller. I can't see an obvious reason for this, will require some more digging.
Biggest difference probably is that I run the Talos 2 on Big Endian. ;) I'll check out older LTS kernels and see if I can get a bisect, provided they just work with Hash MMU.
Tried https://cgit.freedesktop.org/drm/drm-misc/commit/?h=drm-misc-fixes&id=925b6e59138cefa47275c67891c65d48d3266d57 suggested in https://gitlab.freedesktop.org/drm/amd/-/issues/2050#note_1461646 but it did not work out. This bug here seems an entirely different matter.
Damn, posted that to the wrong bug... Sorry! Please ignore comment #4.
Created attachment 301395 [details]
kernel .config (kernel 5.10.129, Talos II)

Tried some LTS kernels, and with 5.10.x I got a .config that boots the Talos 2 with HASH MMU on my system.

I also found out that selecting CONFIG_PAGE_POISONING=y in the working 5.10.x config renders the kernel unbootable again. This seems to be a different issue though, as simply deselecting PAGE_POISONING in my 5.19-rc .config did not help. So I opened bug #216238 for that issue.

5.11.x also boots with HASH MMU, but I got problems on 5.12.x again. 5.15 LTS shows almost the same behaviour as described here for 5.19-rc.

At least I now have a starting point for a bisect.
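With 5.11.x booting and 5.12.x failing, a bisect between the two could be started roughly like this. This is a sketch only: it assumes a mainline git checkout in which the release tags v5.11 and v5.12 exist, and the function name start_bisect is made up.

```shell
#!/bin/sh
# Hypothetical helper: start a bisect between a good and a bad revision.
# The defaults assume the mainline release tags v5.11 (boots with Hash MMU)
# and v5.12 (fails); pass other revisions as arguments if needed.
start_bisect() {
    good="${1:-v5.11}"
    bad="${2:-v5.12}"
    # "git bisect start <bad> <good>" marks both ends and checks out a midpoint.
    git bisect start "$bad" "$good"
}

# After building and booting each candidate kernel, record the result with
#   git bisect good    (boots with Hash MMU)
#   git bisect bad     (machine check during boot)
# and run "git bisect reset" once the first bad commit has been reported.
```

Saving the output of git bisect log at the end gives a log that can be attached to the bug.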
Created attachment 301396 [details] kernel dmesg (kernel 5.10.129, Talos II)
Created attachment 301425 [details]
bisect.log

Successfully did a bisect which revealed this commit:

 # git bisect good
a008f8f9fd67ffb13d906ef4ea6235a3d62dfdb6 is the first bad commit
commit a008f8f9fd67ffb13d906ef4ea6235a3d62dfdb6
Author: Nicholas Piggin <npiggin@gmail.com>
Date:   Sat Jan 30 23:08:41 2021 +1000

    powerpc/64s/hash: improve context tracking of hash faults

    This moves the 64s/hash context tracking from hash_page_mm() to
    __do_hash_fault(), so it's no longer called by OCXL / SPU
    accelerators, which was certainly the wrong thing to be doing,
    because those callers are not low level interrupt handlers, so
    should have entered a kernel context tracking already.

    Then remain in kernel context for the duration of the fault,
    rather than enter/exit for the hash fault then enter/exit for
    the page fault, which is pointless.

    Even still, calling exception_enter/exit in __do_hash_fault seems
    questionable because that's touching per-cpu variables, tracing,
    etc., which might have been interrupted by this hash fault or
    themselves cause hash faults. But maybe I miss something because
    hash_page_mm very deliberately calls trace_hash_fault too, for
    example. So for now go with it, it's no worse than before, in this
    regard.

    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20210130130852.2952424-32-npiggin@gmail.com

 arch/powerpc/include/asm/bug.h        |  1 +
 arch/powerpc/mm/book3s64/hash_utils.c |  7 ++++---
 arch/powerpc/mm/fault.c               | 39 +++++++++++++++++++++++++----------
 3 files changed, 33 insertions(+), 14 deletions(-)
I can't make sense of that bisection result. I'm not saying it's wrong, but I can't see how that commit can cause this bug.
To verify this I tried to revert a008f8f9fd67ffb13d906ef4ea6235a3d62dfdb6 on current -rc and on 5.15 LTS, but the revert did not apply easily. It seems the kernel has diverged too much in the meantime. Is there anything else I could do to help debug this issue?
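For reference, a revert attempt of this kind can be sketched as below: try the revert without committing, and back out again if it conflicts. The function name try_revert is made up for illustration; the only git commands used are git revert --no-commit and git revert --abort, and a clean working tree is assumed.

```shell
#!/bin/sh
# Hypothetical helper: attempt to revert one commit on top of the current tree.
# Assumes a clean working tree before the attempt.
try_revert() {
    commit="$1"
    if git revert --no-commit "$commit"; then
        echo "revert of $commit applied cleanly (not committed yet)"
    else
        # Conflicts: put the tree back into its state before the attempt.
        git revert --abort
        echo "revert of $commit conflicts" >&2
        return 1
    fi
}
```

Called as try_revert a008f8f9fd67ffb13d906ef4ea6235a3d62dfdb6 on a current tree, it would report whether the revert still applies or, as described above, conflicts.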