Bug 216183

Summary: [bisected] Kernel 5.19-rc4 boots ok with CONFIG_PPC_RADIX_MMU=y but fails to boot with CONFIG_PPC_HASH_MMU_NATIVE=y
Product: Platform Specific/Hardware
Reporter: Erhard F. (erhard_f)
Component: PPC-64
Assignee: platform_ppc-64
Status: ASSIGNED
Severity: normal
CC: michael, npiggin
Priority: P1
Hardware: PPC-64
OS: Linux
See Also: https://bugzilla.kernel.org/show_bug.cgi?id=216238
Kernel Version: 5.19-rc4
Subsystem:
Regression: No
Bisected commit-id:
Attachments: kernel dmesg (kernel 5.19-rc4, Talos II)
kernel .config (kernel 5.19-rc4, Talos II)
kernel .config (kernel 5.10.129, Talos II)
kernel dmesg (kernel 5.10.129, Talos II)
bisect.log

Description Erhard F. 2022-06-27 22:54:15 UTC
Created attachment 301289
kernel dmesg (kernel 5.19-rc4, Talos II)

5.19-rc4 boots ok when
CONFIG_PPC_RADIX_MMU=y
CONFIG_PPC_RADIX_MMU_DEFAULT=y

are enabled in the .config, but fails to boot when the MMU is changed to

# CONFIG_PPC_RADIX_MMU is not set
CONFIG_PPC_HASH_MMU_NATIVE=y

in the same .config.
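
When comparing a booting and a non-booting build, it can help to confirm that the MMU options are the only difference between the two .config files. The following is a minimal sketch, not part of the original report: the helper names `parse_kconfig` and `diff_options` are made up for illustration, and the sample input only contains the lines quoted above, so the real files will list more options.

```python
import re

def parse_kconfig(text):
    """Return {option: value} from .config text; 'is not set' lines map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        unset = re.fullmatch(r"# (CONFIG_\w+) is not set", line)
        if unset:
            options[unset.group(1)] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options

def diff_options(a, b, pattern="MMU"):
    """List (option, value_in_a, value_in_b) for differing options whose name contains pattern."""
    names = sorted(n for n in set(a) | set(b) if pattern in n)
    return [(n, a.get(n, "absent"), b.get(n, "absent"))
            for n in names
            if a.get(n, "absent") != b.get(n, "absent")]

# Sample input: only the lines quoted in this report (hypothetical stand-ins
# for the full attached .config files).
radix = parse_kconfig("CONFIG_PPC_RADIX_MMU=y\nCONFIG_PPC_RADIX_MMU_DEFAULT=y\n")
hash_ = parse_kconfig("# CONFIG_PPC_RADIX_MMU is not set\nCONFIG_PPC_HASH_MMU_NATIVE=y\n")

for row in diff_options(radix, hash_):
    print(row)
```

Passing `pattern=""` to `diff_options` lists every differing option, which is useful for spotting unintended changes between the two builds.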

[...]
Disabling lock debugging due to kernel taint
Oops: Machine check, sig: 7 [#1]
BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=32 NUMA PowerNV
Modules linked in: cbc aes_generic snd_hda_codec_hdmi libaes snd_hda_intel snd_intel_dspcfg xhci_pci snd_hda_codec snd_hwdep xhci_hcd snd_hda_core cfg80211 drm_ttm_helper ghash_generic rfkill ofpart ttm i2c_algo_bit snd_pcm powernv_flash vmx_crypto(+) ibmpowernv at24(+) usbcore drm_display_helper mtd gf128mul snd_timer hwmon opal_prd regmap_i2c usb_common drm_kms_helper sysimgblt syscopyarea snd sysfillrect fb_sys_fops soundcore zram pkcs8_key_parser zsmalloc powernv_cpufreq drm fuse drm_panel_orientation_quirks backlight configfs
CPU: 9 PID: 0 Comm: swapper/9 Tainted: G   M              5.19.0-rc4-P9 #4
NIP:  0000000000000000 LR: 0000000000000000 CTR: 00ac408f3f6b677d
REGS: c0000007ffe7e900 TRAP: c000000000008354   Tainted: G   M               (5.19.0-rc4-P9)
MSR:  0301010000000000 <>  CR: c0000007ffe7ed40  XER: c0003d000007e680
CFAR: 0000000000000003 IRQMASK: 3 
GPR00: 0000000000000000 c0000007ffe7eaa0 c0000007ffe7e990 0000000000000000 
GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
GPR12: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
GPR16: 0000000000000000 c0000007ffe7eaa0 c0000007ffe7ea30 0000000000000000 
GPR20: c00000000004a3b4 0000000000000000 0000000000000000 c000000001237e00 
GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
GPR28: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
NIP [0000000000000000] 0x0
LR [0000000000000000] 0x0
Call Trace:
[c0000007ffe7eaa0] [c000000001237e00] 0xc000000001237e00 (unreliable)
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX 
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX 
---[ end trace 0000000000000000 ]---
input: HDA ATI HDMI HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:00.0/0000:01:00.1/sound/card0/input0
Adding 16777212k swap on /dev/nvme0n1p4.  Priority:-2 extents:1 across:16777212k SSFS
at24 7-0050: 256 byte spd EEPROM, read-only
at24 7-0052: 256 byte spd EEPROM, read-only
at24 8-0054: 256 byte spd EEPROM, read-only
at24 8-0056: 256 byte spd EEPROM, read-only
EXT4-fs (nvme0n1p2): mounting ext2 file system using the ext4 subsystem
[drm] radeon kernel modesetting enabled.
EXT4-fs (nvme0n1p2): mounted filesystem without journal. Quota mode: disabled.
EXT4-fs (zram1): mounting ext2 file system using the ext4 subsystem
EXT4-fs (zram1): mounted filesystem without journal. Quota mode: disabled.

Oops: Machine check, sig: 7 [#2]
BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=32 NUMA PowerNV
Modules linked in: xts ecb ctr evdev cbc aes_generic snd_hda_codec_hdmi libaes snd_hda_intel snd_intel_dspcfg xhci_pci snd_hda_codec snd_hwdep radeon(+) xhci_hcd snd_hda_core cfg80211 drm_ttm_helper ghash_generic rfkill ofpart ttm i2c_algo_bit snd_pcm powernv_flash vmx_crypto ibmpowernv at24 usbcore drm_display_helper mtd gf128mul snd_timer hwmon opal_prd regmap_i2c usb_common drm_kms_helper sysimgblt syscopyarea snd sysfillrect fb_sys_fops soundcore zram pkcs8_key_parser zsmalloc powernv_cpufreq drm fuse drm_panel_orientation_quirks backlight configfs
CPU: 1 PID: 0 Comm: swapper/1 Tainted: G   M  D           5.19.0-rc4-P9 #4
NIP:  0000000000000000 LR: 0000000000000000 CTR: 0063d2a43fc97e45
REGS: c0000007ffede900 TRAP: c000000000008354   Tainted: G   M  D            (5.19.0-rc4-P9)
MSR:  0301010000000000 <>  CR: c0000007ffeded40  XER: c0003d0000016680
CFAR: 0000000000000003 IRQMASK: 3 
GPR00: 0000000000000000 c0000007ffedeaa0 c0000007ffede990 0000000000000000 
GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
GPR12: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
GPR16: 0000000000000000 c0000007ffedeaa0 c0000007ffedea30 0000000000000000 
GPR20: c00000000004a3b4 0000000000000000 0000000000000000 c000000001237e00 
GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
GPR28: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
NIP [0000000000000000] 0x0
LR [0000000000000000] 0x0
Call Trace:
[c0000007ffedeaa0] [c000000001237e00] 0xc000000001237e00 (unreliable)
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX 
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX 
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Aiee, killing interrupt handler!

Oops: Machine check, sig: 7 [#3]
BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=32 NUMA PowerNV
Modules linked in: xts ecb ctr evdev cbc aes_generic snd_hda_codec_hdmi libaes snd_hda_intel snd_intel_dspcfg xhci_pci snd_hda_codec snd_hwdep radeon(+) xhci_hcd snd_hda_core cfg80211 drm_ttm_helper ghash_generic rfkill ofpart ttm i2c_algo_bit snd_pcm powernv_flash vmx_crypto ibmpowernv at24 usbcore drm_display_helper mtd gf128mul snd_timer hwmon opal_prd regmap_i2c usb_common drm_kms_helper sysimgblt syscopyarea snd sysfillrect fb_sys_fops soundcore zram pkcs8_key_parser zsmalloc powernv_cpufreq drm fuse drm_panel_orientation_quirks backlight configfs
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G   M  D           5.19.0-rc4-P9 #4
NIP:  0000000000000000 LR: 0000000000000000 CTR: 007652c6b5124d60
REGS: c0000007ffeea900 TRAP: c000000000008354   Tainted: G   M  D            (5.19.0-rc4-P9)
MSR:  0301010000000000 <>  CR: c0000007ffeead40  XER: c0003d0000009680
CFAR: 0000000000000003 IRQMASK: 3 
GPR00: 0000000000000000 c0000007ffeeaaa0 c0000007ffeea990 0000000000000000 
GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
GPR12: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
GPR16: 0000000000000000 c0000007ffeeaaa0 c0000007ffeeaa30 0000000000000000 
GPR20: c00000000004a3b4 0000000000000000 0000000000000000 c000000001237e00 
GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
GPR28: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
NIP [0000000000000000] 0x0
LR [0000000000000000] 0x0
Call Trace:
[c0000007ffeeaaa0] [c000000001237e00] 0xc000000001237e00 (unreliable)
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX 
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX 
---[ end trace 0000000000000000 ]---
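
The three reports above share the same shape: a machine check on an idle task (swapper) with NIP and LR both zero, each on a different CPU. A short script can pull those headers out of a longer dmesg for a quick overview. This is only a sketch under assumptions: the function name `summarize_oopses` is invented here, and the patterns are written against the exact line formats quoted in this report, so other kernels or log prefixes may need adjustments.

```python
import re

# Patterns are unanchored so that a leading timestamp would not prevent a match.
OOPS_RE = re.compile(r"Oops: (?P<kind>.+?), sig: (?P<sig>\d+) \[#(?P<seq>\d+)\]")
CPU_RE = re.compile(r"CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) Comm: (?P<comm>\S+)")

def summarize_oopses(log_text):
    """Return one dict per 'Oops:' header, with the CPU and task that follow it."""
    reports = []
    current = None
    for line in log_text.splitlines():
        oops = OOPS_RE.search(line)
        if oops:
            current = {
                "kind": oops["kind"],
                "sig": int(oops["sig"]),
                "seq": int(oops["seq"]),
            }
            reports.append(current)
            continue
        cpu = CPU_RE.search(line)
        # Attach only the first CPU line after each Oops header.
        if cpu and current is not None and "cpu" not in current:
            current["cpu"] = int(cpu["cpu"])
            current["comm"] = cpu["comm"]
    return reports

# Lines copied from the log in this report.
sample = """Oops: Machine check, sig: 7 [#1]
BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=32 NUMA PowerNV
CPU: 9 PID: 0 Comm: swapper/9 Tainted: G   M              5.19.0-rc4-P9 #4
Oops: Machine check, sig: 7 [#2]
CPU: 1 PID: 0 Comm: swapper/1 Tainted: G   M  D           5.19.0-rc4-P9 #4
"""

for report in summarize_oopses(sample):
    print(report)
```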


Machine is a 2 X 4-core POWER9 Talos II:
 # lspci 
0000:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0000:01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Turks XT [Radeon HD 6670/7670]
0000:01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Turks HDMI Audio [Radeon HD 6500/6600 / 6700M Series]
0001:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0001:01:00.0 Non-Volatile memory controller: Phison Electronics Corporation Device 5008 (rev 01)
0002:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0003:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0003:01:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
0004:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0004:01:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0004:01:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0005:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0005:01:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 04)
0005:02:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 41)
0030:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0031:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0032:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0033:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
Comment 1 Erhard F. 2022-06-27 22:56:06 UTC
Created attachment 301290
kernel .config (kernel 5.19-rc4, Talos II)
Comment 2 Michael Ellerman 2022-06-29 06:35:38 UTC
I can't repro this on my Talos 2.

I have some different PCI devices, a different GPU and a different NVMe controller. I can't see an obvious reason for this; it will require some more digging.
Comment 3 Erhard F. 2022-06-29 10:28:08 UTC
The biggest difference is probably that I run the Talos 2 in big-endian mode. ;)

I'll check out older LTS kernels and, if they work with the Hash MMU, see whether I can get a bisect.
Comment 4 Erhard F. 2022-07-10 10:29:21 UTC Comment hidden (obsolete)
Comment 5 Erhard F. 2022-07-10 11:01:45 UTC Comment hidden (obsolete)
Comment 6 Erhard F. 2022-07-11 18:07:51 UTC
Created attachment 301395
kernel .config (kernel 5.10.129, Talos II)

I tried some LTS kernels, and with 5.10.x I got a working .config that boots the Talos 2 with the Hash MMU.

I also found that selecting CONFIG_PAGE_POISONING=y in the working 5.10.x config makes the kernel unbootable again. This seems to be a different issue, though, as simply deselecting PAGE_POISONING in my 5.19-rc .config did not help, so I opened bug #216238 for it.

5.11.x also boots with the Hash MMU, but I ran into problems again on 5.12.x. 5.15 LTS shows almost the same behaviour as described here for 5.19-rc.

At least I now have a starting point for a bisect.
Comment 7 Erhard F. 2022-07-11 18:08:16 UTC
Created attachment 301396
kernel dmesg (kernel 5.10.129, Talos II)
Comment 8 Erhard F. 2022-07-14 12:57:21 UTC
Created attachment 301425
bisect.log

The bisect completed successfully and identified this commit:

 # git bisect good
a008f8f9fd67ffb13d906ef4ea6235a3d62dfdb6 is the first bad commit
commit a008f8f9fd67ffb13d906ef4ea6235a3d62dfdb6
Author: Nicholas Piggin <npiggin@gmail.com>
Date:   Sat Jan 30 23:08:41 2021 +1000

    powerpc/64s/hash: improve context tracking of hash faults
    
    This moves the 64s/hash context tracking from hash_page_mm() to
    __do_hash_fault(), so it's no longer called by OCXL / SPU
    accelerators, which was certainly the wrong thing to be doing,
    because those callers are not low level interrupt handlers, so
    should have entered a kernel context tracking already.
    
    Then remain in kernel context for the duration of the fault,
    rather than enter/exit for the hash fault then enter/exit for
    the page fault, which is pointless.
    
    Even still, calling exception_enter/exit in __do_hash_fault seems
    questionable because that's touching per-cpu variables, tracing,
    etc., which might have been interrupted by this hash fault or
    themselves cause hash faults. But maybe I miss something because
    hash_page_mm very deliberately calls trace_hash_fault too, for
    example. So for now go with it, it's no worse than before, in this
    regard.
    
    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20210130130852.2952424-32-npiggin@gmail.com

 arch/powerpc/include/asm/bug.h        |  1 +
 arch/powerpc/mm/book3s64/hash_utils.c |  7 ++++---
 arch/powerpc/mm/fault.c               | 39 +++++++++++++++++++++++++----------
 3 files changed, 33 insertions(+), 14 deletions(-)
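
For readers unfamiliar with how a bisect arrives at a "first bad commit", the idea can be sketched as a binary search between a known-good and a known-bad point. This is a simplified model, not what git does internally: it assumes a linear list of commits, while git bisect works on the full commit graph and chooses its test points differently. The function name `first_bad` and the commit labels are made up for illustration.

```python
def first_bad(commits, is_bad):
    """Find the first bad commit in a list ordered oldest to newest.

    Assumes commits[0] is known good and commits[-1] is known bad.
    Returns (first_bad_commit, number_of_test_builds).
    """
    good, bad = 0, len(commits) - 1
    steps = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        steps += 1
        if is_bad(commits[mid]):
            bad = mid    # the failure was introduced at or before mid
        else:
            good = mid   # the failure was introduced after mid
    return commits[bad], steps

# Hypothetical history of 16 commits where "c11" introduces the failure.
history = [f"c{i}" for i in range(16)]
culprit, builds = first_bad(history, lambda c: int(c[1:]) >= 11)
print(culprit, builds)
```

The search only works if every good/bad verdict is correct. If a boot failure is intermittent, a single wrong verdict steers the search into the wrong half of the history and the reported commit can be unrelated to the bug.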
Comment 9 Michael Ellerman 2022-07-29 07:13:18 UTC
I can't make sense of that bisection result. I'm not saying it's wrong, but I can't see how that commit can cause this bug.
Comment 10 Erhard F. 2022-07-30 13:15:23 UTC
To verify this, I tried to revert a008f8f9fd67ffb13d906ef4ea6235a3d62dfdb6 on the current -rc and on 5.15 LTS, but the revert did not apply easily. It seems the kernel has diverged too much in the meantime.

Is there anything else I can do to help debug this issue?