Created attachment 296759 [details] dmesg (5.13-rc1, PowerMac G5 11,2) With v5.13-rc1 I get IRQ problems and crashes on my G5 sooner or later. IRQ 63 is my NVMe SSD. [...] irq 63: nobody cared (try booting with the "irqpoll" option) CPU: 1 PID: 11783 Comm: emerge Tainted: G W 5.13.0-rc1-PowerMacG5 #3 Call Trace: [c00000000ffefae0] [c000000000549790] .dump_stack+0xe0/0x13c (unreliable) [c00000000ffefb80] [c0000000000def44] .__report_bad_irq+0x34/0xf0 [c00000000ffefc20] [c0000000000dee2c] .note_interrupt+0x258/0x300 [c00000000ffefce0] [c0000000000db0a8] .handle_irq_event_percpu+0x64/0x90 [c00000000ffefd70] [c0000000000db118] .handle_irq_event+0x44/0x70 [c00000000ffefe00] [c0000000000e0530] .handle_fasteoi_irq+0xac/0x158 [c00000000ffefea0] [c0000000000da164] .generic_handle_irq+0x38/0x58 [c00000000ffeff10] [c000000000011674] .__do_irq+0x15c/0x238 [c00000000ffeff90] [c000000000012068] .do_IRQ+0x180/0x188 [c00000014d357d70] [c000000000011f88] .do_IRQ+0xa0/0x188 [c00000014d357e10] [c000000000007f94] hardware_interrupt_common_virt+0x1a4/0x1b0 --- interrupt: 500 at 0x3fffb07a1a9c NIP: 00003fffb07a1a9c LR: 00003fffb07a3d08 CTR: 00003fffb074cb30 REGS: c00000014d357e80 TRAP: 0500 Tainted: G W (5.13.0-rc1-PowerMacG5) MSR: 900000000000f032 <SF,HV,EE,PR,FP,ME,IR,DR,RI> CR: 22482820 XER: 20000000 IRQMASK: 0 GPR00: 00003fffb07a3d08 00003fffe84d07a0 00003fffb0ad1200 00003fffa8131100 GPR04: 00003fffa9ea4bd0 a5a8b016e7fdc57d 00003fffe84d0810 00003fffb0aa7ac0 GPR08: 00003fffb0ab3708 00003fffab4eb870 0000000000000000 0000000000000000 GPR12: 00003fffb07b92a0 00003fffb0b8e850 00003fffe84d0a58 000000014df42388 GPR16: 00003fffe84d0a70 ffffffffffffffff 00003fffafbf54c0 ffffffffffffffff GPR20: 0000000000000000 000000014df42338 000000014c677878 0000000000000000 GPR24: 00003fffafc0b5b0 000000014c677830 00003fffafcc8a50 a5a8b016e7fdc57d GPR28: 00003fffa863bcc0 00003fffa8131100 00003fffa9ea4bd0 00003fffa8131100 NIP [00003fffb07a1a9c] 0x3fffb07a1a9c LR [00003fffb07a3d08] 0x3fffb07a3d08 --- 
interrupt: 500 handlers: [<00000000370eb0ba>] .nvme_irq [<00000000370eb0ba>] .nvme_irq Disabling IRQ #63 Call Trace: Kernel panic - not syncing: corrupted stack end detected inside scheduler CPU: 0 PID: 814 Comm: kworker/u4:2 Tainted: G W 5.13.0-rc1-PowerMacG5 #3 Workqueue: writeback .wb_workfn (flush-254:1) [c00000007db5ab40] [c000000000549790] .dump_stack+0xe0/0x13c (unreliable) [c00000007db5abe0] [c0000000000680dc] .panic+0x168/0x430 [c00000007db5ac90] [c000000000811e40] .__schedule+0x80/0x840 [c00000007db5ad70] [c00000000081274c] .preempt_schedule_common+0x28/0x48 [c00000007db5adf0] [c00000000081279c] .__cond_resched+0x30/0x4c [c00000007db5ae70] [c0000000001c6a98] .mempool_alloc+0x38/0x1a4 [c00000007db5af50] [c0000000004a1a70] .bio_alloc_bioset+0x94/0x174 [c00000007db5b000] [c000000000354840] .ext4_bio_write_page+0x314/0x480 [c00000007db5b0c0] [c0000000003334d4] .mpage_submit_page+0x70/0xa0 [c00000007db5b140] [c000000000333630] .mpage_process_page_bufs+0x12c/0x18c [c00000007db5b1d0] [c0000000003338b8] .mpage_prepare_extent_to_map+0x1f8/0x228 [c00000007db5b320] [c000000000339088] .ext4_writepages+0x360/0xe5c [c00000007db5b5d0] [c0000000001cee84] .do_writepages+0x54/0xa0 [c00000007db5b650] [c0000000002a49bc] .__writeback_single_inode+0x100/0x560 [c00000007db5b700] [c0000000002a53d8] .writeback_sb_inodes+0x2dc/0x4c8 [c00000007db5b880] [c0000000002a5654] .__writeback_inodes_wb+0x90/0xcc [c00000007db5b930] [c0000000002a58c0] .wb_writeback+0x230/0x3dc [c00000007db5ba50] [c0000000002a6790] .wb_workfn+0x380/0x460 [c00000007db5bbb0] [c0000000000890a0] .process_one_work+0x318/0x4dc [c00000007db5bca0] [c000000000089730] .worker_thread+0x224/0x290 [c00000007db5bd60] [c000000000091200] .kthread+0x134/0x13c [c00000007db5be10] [c00000000000bbf4] .ret_from_kernel_thread+0x58/0x64 Rebooting in 120 seconds.. 
# lspci -vv -s 0001:08:00.0 0001:08:00.0 Non-Volatile memory controller: Intel Corporation SSD Pro 7600p/760p/E 6100p Series (rev 03) (prog-if 02 [NVM Express]) Subsystem: Intel Corporation SSD Pro 7600p/760p/E 6100p Series [NVM Express] Device tree node: /sys/firmware/devicetree/base/ht@0,f2000000/pci@5/pci8086,390b@0 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx+ Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 63 NUMA node: 0 Region 0: Memory at a0000000 (64-bit, non-prefetchable) [size=16K] Capabilities: [40] Power Management version 3 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- Capabilities: [50] MSI: Enable- Count=1/8 Maskable+ 64bit+ Address: 0000000000000000 Data: 0000 Masking: 00000000 Pending: 00000000 Capabilities: [70] Express (v2) Endpoint, MSI 00 DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq- RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset- MaxPayload 128 bytes, MaxReadReq 512 bytes DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend- LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L1 <8us ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+ LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk- ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 2.5GT/s (downgraded), Width x4 (ok) TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+ 10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix- EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit- FRS- TPHComp- ExtTPHComp- AtomicOpsCap: 32bit- 64bit- 128bitCAS- DevCtl2: 
Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled, AtomicOpsCtl: ReqEn- LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS- LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis- Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance De-emphasis: -6dB LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1- EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest- Retimer- 2Retimers- CrosslinkRes: unsupported Capabilities: [b0] MSI-X: Enable- Count=16 Masked- Vector table: BAR=0 offset=00002000 PBA: BAR=0 offset=00002100 Kernel driver in use: nvme
Created attachment 296761 [details] kernel .config (5.13-rc1, PowerMac G5 11,2)
Hmm... Just also happened on 5.12.3. But without the Kernel panic (yet). [...] irq 63: nobody cared (try booting with the "irqpoll" option) Call Trace: CPU: 1 PID: 43491 Comm: emerge Tainted: G W 5.12.3-gentoo-PowerMacG5 #2 [c00000000ffefae0] [c00000000053950c] .dump_stack+0xe0/0x13c (unreliable) [c00000000ffefb80] [c0000000000ddb68] .__report_bad_irq+0x34/0xf0 [c00000000ffefc20] [c0000000000dda50] .note_interrupt+0x250/0x2f8 [c00000000ffefce0] [c0000000000d9cf8] .handle_irq_event_percpu+0x64/0x90 [c00000000ffefd70] [c0000000000d9d68] .handle_irq_event+0x44/0x70 [c00000000ffefe00] [c0000000000df164] .handle_fasteoi_irq+0xac/0x158 [c00000000ffefea0] [c0000000000d8db8] .generic_handle_irq+0x38/0x58 [c00000000ffeff10] [c000000000011314] .__do_irq+0x15c/0x238 [c00000000ffeff90] [c00000000001fe04] .call_do_irq+0x14/0x24 [c000000056e2fd70] [c00000000001154c] .do_IRQ+0x15c/0x164 [c000000056e2fe10] [c000000000007d38] hardware_interrupt_common_virt+0x158/0x160 --- interrupt: 500 at 0x3fffb8a21520 handlers: NIP: 00003fffb8a21520 LR: 00003fffb8a214a0 CTR: 00003fffb8ae6d20 REGS: c000000056e2fe80 TRAP: 0500 Tainted: G W (5.12.3-gentoo-PowerMacG5) MSR: 900000000200f032 <SF,HV,VEC,EE,PR,FP,ME,IR,DR,RI> CR: 42482824 XER: 20000000 IRQMASK: 0 GPR00: 00003fffb8a214a0 00003fffdb199650 00003fffb8df7200 000000014e8ddc60 GPR04: 00003fffb210e000 95bfd31b66b69e10 00003fffdb199478 0000000000024d50 GPR08: 000000014cb987c0 0000000000000002 0000000000000000 0000000000000000 GPR12: 00003fffb8ae0e50 00003fffb8eb4850 00003fffdb199a58 000000014e8ddf60 GPR16: 00003fffdb199a70 ffffffffffffffff 0000000000000001 000000014b5d8460 GPR20: 0000000000000000 0000000000000002 000000014e8ddf38 00003fffb6b176e8 GPR24: 000000014c126958 00003fffb2030390 000000014b94c380 000000014b5d8460 GPR28: 000000014c1267f0 000000014c126a60 000000014c1267f0 0000000000000000 NIP [00003fffb8a21520] 0x3fffb8a21520 LR [00003fffb8a214a0] 0x3fffb8a214a0 --- interrupt: 500 [<000000000e5af612>] .nvme_irq [<000000000e5af612>] 
.nvme_irq Disabling IRQ #63
Some time after the "irq 63: nobody cared" on 5.12.3: [...] --- interrupt: 500 [<000000000e5af612>] .nvme_irq [<000000000e5af612>] .nvme_irq Disabling IRQ #63 Call Trace: Kernel panic - not syncing: corrupted stack end detected inside scheduler CPU: 0 PID: 105549 Comm: kworker/u4:1 Tainted: G W 5.12.3-gentoo-PowerMacG5 #2 Workqueue: 0x0 (flush-259:0) [c000000078dc79f0] [c00000000053950c] .dump_stack+0xe0/0x13c (unreliable) [c000000078dc7a90] [c000000000066074] .panic+0x168/0x430 [c000000078dc7b40] [c0000000007f19f0] .__schedule+0x80/0x848 [c000000078dc7c20] [c0000000007f2270] .schedule+0xb8/0x110 [c000000078dc7ca0] [c000000000086d18] .worker_thread+0x278/0x290 [c000000078dc7d60] [c00000000008e75c] .kthread+0x134/0x13c [c000000078dc7e10] [c00000000000b1f4] .ret_from_kernel_thread+0x58/0x64 Rebooting in 120 seconds..
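For anyone triaging similar logs, here is a minimal Python sketch (not part of the original report) that scans a saved dmesg dump for the two messages seen above, "irq N: nobody cared" and "Disabling IRQ #N". The function name spurious_irqs and the sample string are hypothetical; only the message formats are taken from the logs in this thread.

```python
import re

# Message formats copied from the dmesg excerpts in this report.
NOBODY_CARED = re.compile(r"irq (\d+): nobody cared")
DISABLING = re.compile(r"Disabling IRQ #(\d+)")


def spurious_irqs(log_text):
    """Summarise spurious-IRQ reports found in a dmesg dump.

    Returns a dict mapping IRQ number to
    {"reported": <count of 'nobody cared' lines>, "disabled": <bool>}.
    """
    result = {}
    for match in NOBODY_CARED.finditer(log_text):
        entry = result.setdefault(
            int(match.group(1)), {"reported": 0, "disabled": False}
        )
        entry["reported"] += 1
    for match in DISABLING.finditer(log_text):
        entry = result.setdefault(
            int(match.group(1)), {"reported": 0, "disabled": False}
        )
        entry["disabled"] = True
    return result


# Hypothetical, shortened sample in the same format as the logs above.
sample = (
    'irq 63: nobody cared (try booting with the "irqpoll" option)\n'
    "handlers:\n"
    "[<000000000e5af612>] .nvme_irq\n"
    "Disabling IRQ #63\n"
)
print(spurious_irqs(sample))
```

Because the patterns are searched over the whole text rather than line by line, the sketch should also cope with dumps where concurrent printk output is interleaved, as in some of the traces here.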
Created attachment 297191 [details] bisect.log It turns out the problem was introduced between v5.11 and v5.12 by the following commit: # git bisect good fbbefb320214db14c3e740fce98e2c95c9d0669b is the first bad commit commit fbbefb320214db14c3e740fce98e2c95c9d0669b Author: Oliver O'Halloran <oohall@gmail.com> Date: Tue Nov 3 15:35:07 2020 +1100 powerpc/pci: Move PHB discovery for PCI_DN using platforms Make powernv, pseries, powermac and maple use ppc_mc.discover_phbs. These platforms need to be done together because they all depend on pci_dn's being created from the DT. The pci_dn contains a pointer to the relevant pci_controller so they need to be created after the pci_controller structures are available, but before PCI devices are scanned. Currently this ordering is provided by initcalls and the sequence is: 1. PHBs are discovered (setup_arch) (early boot, pre-initcalls) 2. pci_dn are created from the unflattended DT (core initcall) 3. PHBs are scanned pcibios_init() (subsys initcall) The new ppc_md.discover_phbs() function is also a core_initcall so we can't guarantee ordering between the creation of pci_controllers and the creation of pci_dn's which require a pci_controller. We could use the postcore, or core_sync initcall levels, but it's cleaner to just move the pci_dn setup into the per-PHB inits which occur inside of .discover_phb() for these platforms. This brings the boot-time path in line with the PHB hotplug path that is used for pseries DLPAR operations too. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> [mpe: Squash powermac & maple in to avoid breakage those platforms, convert memblock allocs to use kmalloc to avoid warnings] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201103043523.916109-2-oohall@gmail.com
Hmm, it's pretty weird to see an NVMe drive using LSIs. Not too sure what to make of that. I figure there's something screwy going on with interrupt routing, but I don't have any G5 hardware to replicate this with. Could you add "debug" to the kernel command line and post the dmesg output for one boot with the patch applied and one with it reverted?
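The lspci dump in the first comment is where the LSI usage shows up: both the MSI and the MSI-X capability report "Enable-", and the device has "Interrupt: pin A routed to IRQ 63". As a rough illustration, this Python sketch (not part of the original thread) pulls those enable flags out of saved `lspci -vv` output. The helper names msi_state and uses_legacy_interrupt are hypothetical, and the parsing assumes the "MSI: Enable-" / "MSI-X: Enable-" text format shown in that dump.

```python
import re


def msi_state(lspci_text):
    """Map 'MSI' and 'MSI-X' to True (Enable+) or False (Enable-).

    Capabilities that are absent from the text are simply not in the result.
    """
    state = {}
    for name, flag in re.findall(r"\b(MSI(?:-X)?): Enable([+-])", lspci_text):
        state[name] = flag == "+"
    return state


def uses_legacy_interrupt(lspci_text):
    """True when neither MSI nor MSI-X is reported as enabled."""
    return not any(msi_state(lspci_text).values())


# Two capability lines copied from the lspci output in the first comment.
sample = (
    "Capabilities: [50] MSI: Enable- Count=1/8 Maskable+ 64bit+\n"
    "Capabilities: [b0] MSI-X: Enable- Count=16 Masked-\n"
)
print(msi_state(sample), uses_legacy_interrupt(sample))
```

This only reports what lspci printed; it says nothing about why MSI setup did not happen on this machine.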
This is already a custom built kernel with lots of debugging options turned on (see bugzilla attached kernel .config). But of course I can add "debug" to the other kernel command line parameters. I'll report back when I get access to this G5 next time in about 2-3 weeks.
(In reply to Oliver O'Halloran from comment #5) > Could you add "debug" to the kernel command line and post the dmesg output > for a boot with the patch applied and reverted? Ok, on top of 5.13-rc6 I reverted fbbefb3, which went fine except for the "pci-ioda.c" part, where I needed to manually apply the old code. Here's the vanilla debug dmesg and the debug dmesg with the patch reverted.
Created attachment 297435 [details] dmesg (5.13-rc6 + debug, PowerMac G5 11,2)
Created attachment 297437 [details] dmesg (5.13-rc6 w. patch fbbefb3 reverted + debug, PowerMac G5 11,2)
Created attachment 297439 [details] kernel .config (5.13-rc6, PowerMac G5 11,2)
Created attachment 297473 [details] dmesg (5.13-rc6 + DEBUG_VM_PGTABLE, PowerMac G5 11,2) The trace got some additional data with DEBUG_VM_PGTABLE=y, slub_debug=P and page_poison=1: [...] irq 63: nobody cared (try booting with the "irqpoll" option) Call Trace: CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 5.13.0-rc6-PowerMacG5+ #2 [c00000000fff7ae0] [c00000000054eafc] .dump_stack+0xe0/0x13c (unreliable) [c00000000fff7b80] [c0000000000e1428] .__report_bad_irq+0x34/0xf0 [c00000000fff7c20] [c0000000000e1310] .note_interrupt+0x258/0x300 [c00000000fff7ce0] [c0000000000dd58c] .handle_irq_event_percpu+0x64/0x90 [c00000000fff7d70] [c0000000000dd5fc] .handle_irq_event+0x44/0x70 [c00000000fff7e00] [c0000000000e2a14] .handle_fasteoi_irq+0xac/0x158 [c00000000fff7ea0] [c0000000000dc648] .generic_handle_irq+0x38/0x58 [c00000000fff7f10] [c000000000011688] .__do_irq+0x15c/0x238 [c00000000fff7f90] [c00000000001207c] .do_IRQ+0x180/0x188 [c0000000012db810] [c000000000011f9c] .do_IRQ+0xa0/0x188 [c0000000012db8b0] [c000000000007f94] hardware_interrupt_common_virt+0x1a4/0x1b0 --- interrupt: 500 at .power4_idle_nap+0x30/0x34 NIP: c00000000002cc04 LR: c000000000016828 CTR: c000000000016768 REGS: c0000000012db920 TRAP: 0500 Tainted: G W (5.13.0-rc6-PowerMacG5+) MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI> CR: 44082242 XER: 00000000 IRQMASK: 0 GPR00: c0000000000167dc c0000000012dbbc0 c0000000012df700 0000000000000001 GPR04: 0000000000000000 0000000000000000 0000000000000002 9000000000049032 GPR08: 0000000000000001 c0000000011b3b80 0000000000000001 0000000000000016 GPR12: 0000000044082242 c0000000023a6000 000000000014aa88 00000000ffb30100 GPR16: 0000000001e7b8da 0000000001e7bd5f 0000000001e7b9f0 0000000001e88d8d GPR20: 0000000001e7bd3d 0000000001e7b98b 0000000001e7bbb2 0000000001e7b89c GPR24: 000000000270f700 c000000001081008 c000000000a7c02d 0000000000000000 GPR28: c0000000012edb9c c0000000011b3b80 9000000000009032 c0000000012ed985 NIP [c00000000002cc04] .power4_idle_nap+0x30/0x34 LR 
[c000000000016828] .power4_idle+0xc0/0xe8 --- interrupt: 500 [c0000000012dbbc0] [c0000000000167dc] .power4_idle+0x74/0xe8 (unreliable) handlers: [c0000000012dbc40] [c00000000001665c] .arch_cpu_idle+0x80/0x18c [c0000000012dbcc0] [c00000000081f058] .default_idle_call+0x7c/0xd0 [c0000000012dbd30] [c0000000000a7bcc] .do_idle+0x128/0x140 [c0000000012dbdd0] [c0000000000a7eb4] .cpu_startup_entry+0x28/0x2c [c0000000012dbe40] [c000000000010044] .rest_init+0x1b0/0x1bc [c0000000012dbec0] [c0000000010047f4] .start_kernel+0x934/0x9b8 [c0000000012dbf90] [c00000000000b390] start_here_common+0x1c/0x8c [<000000001553d54b>] .nvme_irq [<000000001553d54b>] .nvme_irq Disabling IRQ #63
Created attachment 297755 [details] hackfix for MSI init
Hi, I got a loaner G5 with an NVMe drive, but I haven't been able to replicate the crash you're seeing. However, I think that's probably because I'm only reading from the NVMe since it's NTFS formatted and I didn't want to trash someone else's files. I'm waiting for a new NVMe drive to arrive so I can do some destructive testing which should hopefully replicate the bug. In the meantime, can you try the patch above? It seems to fix a bug which is causing MSIs to be unusable. I'm not 100% sure why that would matter, but it's possible the crashes are due to some other bug which doesn't appear when MSIs are in use.
Thanks for the patch! I will try it as soon as I get to this G5 again. I don't know whether write access is necessary to trigger the bug. This past weekend I saw it just by running 'emerge -pv distcc' on its Gentoo partition, which only shows the flags and version of distcc that would be installed but does not build anything yet. The bug was triggered all the same. The filesystem was ext4, but I've seen it on btrfs at other times. I'm running kernel 5.10.x LTS for the time being, which works just fine.
(In reply to Oliver O'Halloran from comment #13) > In the meanwhile, can you try the patch above? That seems to fix a bug which > is causing MSIs to be unusable. I'm not 100% sure why that would matter, > but it's possible the crashes are due to some other bug which doesn't appear > when MSIs are in use. I have now had time to test your patch on top of kernel 5.13-rc6 and 5.13.4. I can't test it on top of 5.14-rc2 due to bug #213803. Your patch seems to work fine and I no longer get the "irq 63: nobody cared" messages or the crashes that followed them! However, when building stuff the G5 now sooner or later crashes with: [...] Kernel panic - not syncing: corrupted stack end detected inside scheduler Call Trace: CPU: 1 PID: 2968 Comm: powerpc64-unkno Tainted: G W 5.13.0-rc6-PowerMacG5+ #2 [c0000000717178c0] [c0000000005412d0] .dump_stack+0xe0/0x13c (unreliable) [c000000071717960] [c0000000000681a0] .panic+0x168/0x430 [c000000071717a10] [c000000000809ca0] .__schedule+0x80/0x840 [c000000071717af0] [c0000000000a0ea8] .do_task_dead+0x54/0x58 [c000000071717b70] [c00000000006e7b4] .do_exit+0xa14/0xa6c [c000000071717c60] [c00000000006e89c] .do_group_exit+0x50/0xb0 [c000000071717cf0] [c00000000006e910] .__wake_up_parent+0x0/0x34 [c000000071717d60] [c000000000021530] .system_call_exception+0x1b4/0x1ec [c000000071717e10] [c00000000000b9c4] system_call_common+0xe4/0x214 --- interrupt: c00 at 0x3fffa8092aa8 NIP: 00003fffa8092aa8 LR: 00003fffa7ff2d04 CTR: 0000000000000000 REGS: c000000071717e80 TRAP: 0c00 Tainted: G W (5.13.0-rc6-PowerMacG5+) MSR: 900000000200f032 <SF,HV,VEC,EE,PR,FP,ME,IR,DR,RI> CR: 22000482 XER: 00000000 IRQMASK: 0 GPR00: 00000000000000ea 00003fffd04ef2a0 00003fffa81b1300 0000000000000000 GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR12: 0000000000000000 00003fffa8318c30 000000012e5ff800 00000001136b53b0 GPR16: 00000001200cec38 00003fffddea1c68 00000001200ceb28 000000000000002f 
GPR20: 0000000000000000 00003fffa81abff8 0000000000000001 00003fffa81aaa58 GPR24: 0000000000000000 0000000000000000 0000000000000003 0000000000000001 GPR28: 0000000000000000 00003fffa8311c50 fffffffffffff000 0000000000000000 NIP [00003fffa8092aa8] 0x3fffa8092aa8 LR [00003fffa7ff2d04] 0x3fffa7ff2d04 --- interrupt: c00 Rebooting in 120 seconds.. I don't know whether this is related. I'll enable more debugging options, file this as a separate issue and link it here just in case.
Created attachment 298371 [details] dmesg (5.14-rc6, PowerMac G5 11,2) As there is a fix now for bug #213803 I was able to build v5.14-rc6 and gave it a testride. Looks like the issue persists: [...] irq 63: nobody cared (try booting with the "irqpoll" option) CPU: 0 PID: 10732 Comm: emerge Tainted: G W 5.14.0-rc6-PowerMacG5+ #2 Call Trace: [c00000000fff7af0] [c00000000054de24] .dump_stack_lvl+0x98/0xe0 (unreliable) [c00000000fff7b80] [c0000000000e1724] .__report_bad_irq+0x34/0xf0 [c00000000fff7c20] [c0000000000e160c] .note_interrupt+0x258/0x300 [c00000000fff7ce0] [c0000000000dd840] .handle_irq_event_percpu+0x5c/0x88 [c00000000fff7d70] [c0000000000dd8b0] .handle_irq_event+0x44/0x70 [c00000000fff7e00] [c0000000000e2d34] .handle_fasteoi_irq+0xac/0x158 [c00000000fff7ea0] [c0000000000dc8bc] .handle_irq_desc+0x34/0x54 [c00000000fff7f10] [c000000000012058] .__do_irq+0x15c/0x238 [c00000000fff7f90] [c000000000012978] .__do_IRQ+0xac/0xb4 [c00000001e9cfcf0] [c00000001e9cfd90] 0xc00000001e9cfd90 [c00000001e9cfd90] [c000000000012ac4] .do_IRQ+0x144/0x194 [c00000001e9cfe10] [c000000000008050] hardware_interrupt_common_virt+0x210/0x220 --- interrupt: 500 at 0x3fffb9b25d9c NIP: 00003fffb9b25d9c LR: 00003fffb9b2811c CTR: 00003fffb9b25d9c REGS: c00000001e9cfe80 TRAP: 0500 Tainted: G W (5.14.0-rc6-PowerMacG5+) MSR: 900000000000f032 <SF,HV,EE,PR,FP,ME,IR,DR,RI> CR: 22482822 XER: 20000000 IRQMASK: 0 GPR00: 00003fffb9b28100 00003ffffd4e7550 00003fffb9ef6200 00003fffb7977790 GPR04: 00003fffb7977790 00003fffb55e8b80 0000000000000000 00003fffb9eccac0 GPR08: 00003fffb9b25d9c 0000000000000000 000000000000000f 0000000000000000 GPR12: 00003fffb9b7eeb0 00003fffb9fc8890 00003ffffd4e7658 00003fffb395c548 GPR16: 00003ffffd4e7670 ffffffffffffffff 00003fffb7902480 ffffffffffffffff GPR20: 0000000000000000 00003fffb395c528 000000014b8f7878 0000000000000000 GPR24: 00003fffb7969a80 000000014b8f7830 00003fffb7a750d0 000000000000000a GPR28: 00003fffb7a750dc 000000000000007c 000000014b8f9420 
00003fffb395c3c0 NIP [00003fffb9b25d9c] 0x3fffb9b25d9c LR [00003fffb9b2811c] 0x3fffb9b2811c --- interrupt: 500 handlers: [<c0000000015a6568>] .nvme_irq [<c0000000015a6568>] .nvme_irq Disabling IRQ #63
Created attachment 298373 [details] kernel .config (5.14-rc6, PowerMac G5 11,2)
The 'hackfix for MSI init' patch also applies on top of v5.14-rc6. The outcome is unchanged, though: with the patch the G5 later runs into bug #213837.
(Luckily) I am no longer able to reproduce this. Re-tested on 5.19-rc5. Perhaps the problem was also specific to this particular NVMe SSD. I swapped it for another one and have not seen the issue since. I'll keep an eye on it and will close this bug if it stays that way for the next few stable kernels.