Bug 218906

Summary: Intel E810 HW runtime init FAIL after suspend/resume
Product: Drivers Reporter: En-Wei Wu (en-wei.wu)
Component: NetworkAssignee: drivers_network (drivers_network)
Status: RESOLVED PATCH_ALREADY_AVAILABLE    
Severity: normal CC: regressions
Priority: P3    
Hardware: All   
OS: Linux   
Kernel Version: Subsystem:
Regression: No Bisected commit-id:
Attachments: dmesg with ice and irdma errors
lspci with E810
.config of 6.9.0-rc5

Description En-Wei Wu 2024-05-28 08:30:28 UTC
Created attachment 306359 [details]
dmesg with ice and irdma errors

ice and irdma reports the following errors on 6.9.0-rc5:
[Log message]
Apr 22 21:27:49  kernel: [ 119.302080] PM: suspend exit
Apr 22 21:27:49  kernel: [ 119.349791] ice 0000:87:00.0: Failure Adding VLAN 0 on VSI 4, status -5
Apr 22 21:27:49  kernel: [ 119.351707] ice 0000:87:00.0 enp135s0f0np0: Failed to open VSI 0x0004 on switch 0x0000
Apr 22 21:27:49  kernel: [ 119.366536] ice 0000:87:00.1: Failure Adding VLAN 0 on VSI 5, status -5
Apr 22 21:27:49  kernel: [ 119.368133] ice 0000:87:00.1 enp135s0f1np1: Failed to open VSI 0x0005 on switch 0x0001
Apr 22 21:27:49  kernel: [ 119.419424] genirq: Flags mismatch irq 368. 00000000 (irdma-0000:87:00.0-CEQ-0) vs. 00000000 (ice-0000:87:00.0:ctrl-TxRx-0)
Apr 22 21:27:49  kernel: [ 119.419474] ice 0000:87:00.0: IRDMA hardware initialization FAILED init_state=5 status=-16
Apr 22 21:27:49  kernel: [ 119.442196] Generic FE-GE Realtek PHY r8169-0-500:00: attached PHY driver (mii_bus:phy_addr=r8169-0-500:00, irq=MAC)
Apr 22 21:27:49  kernel: [ 119.481918] irdma ice.roce.1: probe with driver irdma failed with error -16
Apr 22 21:27:49  kernel: [ 119.548561] r8169 0000:05:00.0 enp5s0f0: Link is Down
Apr 22 21:27:50  kernel: [ 119.687275] genirq: Flags mismatch irq 640. 00000000 (irdma-0000:87:00.1-CEQ-8) vs. 00000080 (nvme0q1)
Apr 22 21:27:50  kernel: [ 119.765382] ice 0000:87:00.1: HW runtime init FAIL status = -16 last cmpl = 1
Apr 22 21:27:50  kernel: [ 119.765390] (null): bad init_state = 1
Apr 22 21:27:50  kernel: [ 119.828314] irdma ice.roce.2: probe with driver irdma failed with error -16

After suspend/resume, irdma breaks. But ice still works normally.

The error messages is similar with bug ID 218799 but with another error message: genirq: Flags mismatch irq 640. 00000000 (irdma-0000:87:00.1-CEQ-8) vs. 00000080 (nvme0q1), and it's probably the main reason why irdma breaks.
Comment 1 En-Wei Wu 2024-05-28 08:35:25 UTC
Created attachment 306360 [details]
lspci with E810
Comment 2 En-Wei Wu 2024-05-28 08:36:50 UTC
Created attachment 306361 [details]
.config of 6.9.0-rc5
Comment 3 The Linux kernel's regression tracker (Thorsten Leemhuis) 2024-05-30 07:41:27 UTC
Am I right assuming it worked with 6.8.y? Could you bisect?
Comment 4 The Linux kernel's regression tracker (Thorsten Leemhuis) 2024-05-30 07:41:55 UTC
And is the problem still happening with 6.10-rc1?
Comment 5 En-Wei Wu 2024-06-27 09:32:15 UTC
Hi, sorry for the late reply. The issue has been solved, and the patch has also been merged into mainline kernel: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bc69ad74867dba1377abe14356c94a946d9837a3

Thanks you for your time.