Since commit 1db806ec06b7c6e08e8af57088da067963ddf117, the current Linux git master, which will become 6.14, prints the following log message:
`igc 0000:06:00.0 (unnamed net_device) (uninitialized): PCIe link lost, device now detached`
After bisection led me to that commit, I found an EFI/BIOS setting mentioning L1 substates and set it to `disabled`, which, IIRC, should be the default value and is therefore probably the reason why no one else has hit this regression yet. With the L1 substates disabled, the current git master, which contains the regression, prints the following (expected?) messages instead:
`igc 0000:06:00.0 (unnamed net_device) (uninitialized): PHC added`
`igc 0000:06:00.0: 4.000 Gb/s available PCIe bandwidth (5.0 GT/s PCIe x1 link)`
I forgot to mention that the motherboard is an Asus ROG Strix Z690-G WiFi running BIOS version 4001 from late 2024. I hope it's okay if I try to add the relevant people to the CC list. And sorry if I put this in the wrong category.
Could you please be more specific about when it prints that message? Did you do something at that point? Please provide lspci -vvv from both the working and the non-working configuration. We might also need to look at dmesg, but that should be taken with PCI dynamic debugging on ("file drivers/pci/*.c +p" written into /sys/kernel/debug/dynamic_debug/control, or put into dyndbg on the kernel command line if the former is too late).
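For reference, the runtime variant of that dynamic debug request can be sketched in Python as below. This is only a minimal sketch: it assumes debugfs is mounted at /sys/kernel/debug and that the script runs as root, and the helper name `enable_pci_dyndbg` is made up for illustration. Because igc probes during boot, the kernel command line variant (`dyndbg="file drivers/pci/*.c +p"`) is probably the one that matters here.

```python
from pathlib import Path

# Assumed default location of the dynamic debug control file
# (requires debugfs mounted at /sys/kernel/debug).
CONTROL = Path("/sys/kernel/debug/dynamic_debug/control")


def enable_pci_dyndbg(control=CONTROL, query="file drivers/pci/*.c +p"):
    """Write a dynamic debug query to the control file (needs root)."""
    with open(control, "w") as f:
        f.write(query + "\n")
    return query


# Usage (as root): enable_pci_dyndbg()
```

The `control` parameter exists only so the helper can be pointed at a different path, for example when debugfs is mounted elsewhere.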
Created attachment 307543 dmesg of the broken igc with dyndbg and lockdown=confidentiality
Created attachment 307544 lspci -vvv with broken igc; L1.1 and L1.2 are enabled in EFI settings
Created attachment 307545 lspci -vvv with working igc; L1 substates are disabled in EFI settings
Sorry for the confusion. The messages get printed during regular bootup and presumably come from either kernel or systemd-udev device discovery procedures. I have captured dmesg with the required kernel cmdline, which seems to have worked, but only later did I realize that the lockdown LSM was still set to confidentiality mode. If required, I can redo the dmesg with lockdown disabled. I also have lspci -vvv output with only L1.1 enabled, but I did not include it since I felt it probably adds little over the L1.1 & L1.2 case (though igc is likewise broken).
By the working case, I meant a kernel prior to commit 1db806ec06b7c6e08e8af57088da067963ddf117 (or with that commit reverted, if that works too). That is, do not disable L1 substates, because I cannot compare how their configuration differs when the entire L1 PM Substates capability is removed. Could you also take a dmesg from the working case so I can see if the D3hot->D0 pattern for the 1c.2 bridge is the same?
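One way to make that comparison less error-prone is to pull the L1 substates control flags out of both lspci dumps and diff them per device. The following is a rough sketch only: it assumes the usual `lspci -vvv` layout in which each device starts at column 0 with its address and the enabled substates appear on an indented `L1SubCtl1:` line as `+`/`-` flags. The function names are made up for illustration, and timing fields such as `LTR1.2_Threshold` on the continuation line are not compared.

```python
import re

# Matches a device header such as "06:00.0 Ethernet controller: ..."
# (optionally with a PCI domain prefix like "0000:").
DEVICE_RE = re.compile(r"^((?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s")
L1SUBCTL1_RE = re.compile(r"^\s+L1SubCtl1:\s+(.*)$")


def l1ss_state(lspci_text):
    """Return {device address: {flag name: enabled}} from the L1SubCtl1 lines."""
    state = {}
    device = None
    for line in lspci_text.splitlines():
        header = DEVICE_RE.match(line)
        if header:
            device = header.group(1)
            continue
        ctl = L1SUBCTL1_RE.match(line)
        if ctl and device is not None:
            flags = {}
            for token in ctl.group(1).split():
                # Flags look like "ASPM_L1.2+" (enabled) or "ASPM_L1.2-" (disabled).
                if token[-1] in "+-":
                    flags[token[:-1]] = token.endswith("+")
            state[device] = flags
    return state


def l1ss_diff(text_a, text_b):
    """Return {device: (flags_a, flags_b)} for devices whose L1SubCtl1 flags differ."""
    a, b = l1ss_state(text_a), l1ss_state(text_b)
    return {
        dev: (a.get(dev), b.get(dev))
        for dev in sorted(set(a) | set(b))
        if a.get(dev) != b.get(dev)
    }
```

Given two saved dumps (the file names here are placeholders), `l1ss_diff(open("lspci-broken.txt").read(), open("lspci-reverted.txt").read())` would list only the devices whose enabled substates differ between the two kernels.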
It might be worth a try to add this as the last thing in pci_enable_bridge(): pci_bridge_wait_for_secondary_bus(dev, "pm wakeup"); (Just let me know if you want me to provide a patch instead.) It seems to me that 1c.2 has just woken up from D3hot due to igc_probe() calling pci_enable_device_mem(), but AFAICT there's nothing that ensures the PCIe link has come up before pci_enable_bridge() returns, so probe continues and finds that the device is not yet accessible.
Created attachment 307551 lspci -vvv with working igc and L1 substates; commit reverted
Created attachment 307552 dmesg of the working igc with dyndbg and commit reverted
I have uploaded the updated lspci and dmesg of the current git master with the commit reverted. The `pci_bridge_wait_for_secondary_bus(dev, "pm wakeup");` modification on top of the current git master compiled and booted but igc was still broken with the same PCIe link lost message as before.
Created attachment 307555 L1SS saving fix
It seems I found at least one problem in commit 1db806ec06b7. The fix patch is attached, please test.
I can confirm that the igc driver is working again with the fix applied on top of the current git master. Thank you for looking into this and providing a fix.
Great, thanks for testing. Can I put you into a Tested-by tag?
Yes, thank you. Sorry for not specifying that in the previous reply. Tested-by: Niklāvs Koļesņikovs <pinkflames.linux@gmail.com> Or you can use Reported-and-tested-by: Niklāvs Koļesņikovs <pinkflames.linux@gmail.com> if that's an official thing and not a made-up tag.
I have re-opened the bug, because the fix hasn't landed in time for 6.14-rc1 and a bug marked as resolved or verified does not show up in default search results. Sorry for the noise.
The fix has been released as part of Linux 6.14-rc2. Thanks again to everyone involved for the quick fix and for seeing it through to a successful merge. I'd also like to note that there are remarkably few known reports of things breaking even after rc1 was released, which highlights just how unlikely it was for this to get caught during developer testing before merging upstream. Cheers.