Created attachment 301447: dmesg output at time of error

I have a router with multiple Intel I225-V Ethernet controllers. eth0, eth1 and eth2 are bridged (br-lan). After I powered down a computer connected through eth0, errors started showing up in dmesg (see attachment) and packet loss increased.
Repo for the kernel package used:
https://github.com/archlinux/svntogit-packages/tree/36002ee46aa77239515447c045b2721ac5b3edd3/repos/core-x86_64
Build script (it basically just checks out the source and runs prepare, build, package):
https://raw.githubusercontent.com/archlinux/svntogit-packages/36002ee46aa77239515447c045b2721ac5b3edd3/repos/core-x86_64/PKGBUILD
Config:
https://raw.githubusercontent.com/archlinux/svntogit-packages/36002ee46aa77239515447c045b2721ac5b3edd3/repos/core-x86_64/config
Created attachment 302959: dmesg lts-5.15.70

It also happened with an LTS kernel:
Linux version 5.15.70-1-lts (linux-lts@archlinux) (gcc (GCC) 12.2.0, GNU ld (GNU Binutils) 2.39.0) #1 SMP Fri, 23 Sep 2022 16:05:15 +0000
I have attached the dmesg output from the time of the error again.
Created attachment 303019: dmesg 6.0.1

Another driver crash with Linux 6.0.1.
Created attachment 303210: another crash with Linux 6.0.7
Created attachment 303211: yet another crash with Linux 6.0.7
Same problem on several NUC11TNKi3/NUC11TNBi3 units:

root@09999103:~ uname -a
Linux 09999103 6.1.11 #1 SMP PREEMPT_DYNAMIC 2021-08-01T00:00:00+00:00 x86_64 GNU/Linux

57:00.0 Ethernet controller: Intel Corporation Intel(R) Ethernet Controller I225-LM (rev 03)
	Subsystem: Intel Corporation Device 3002
	Flags: bus master, fast devsel, latency 0, IRQ 17, IOMMU group 14
	Memory at 6a200000 (32-bit, non-prefetchable) [size=1M]
	Memory at 6a300000 (32-bit, non-prefetchable) [size=16K]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
	Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
	Capabilities: [a0] Express Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [140] Device Serial Number 48-21-0b-ff-ff-32-3b-5f
	Capabilities: [1c0] Latency Tolerance Reporting
	Capabilities: [1f0] Precision Time Measurement
	Capabilities: [1e0] L1 PM Substates
	Kernel driver in use: igc

I ran a test on 5 devices over the weekend and all of them broke. I ran the systems with Ethernet connected to a switch, then removed the switch's uplink so that no DHCP server was reachable. When I came back on Monday and reconnected the uplink, none of them received an IP address. Connecting over Wi-Fi, I saw the following error messages when I disconnected and reconnected the RJ45 cable:
Feb 27 07:26:42 09999101 systemd-networkd[501]: eth0: Lost carrier
Feb 27 07:26:42 09999101 kernel: igc 0000:57:00.0 eth0: NIC Link is Down
Feb 27 07:26:42 09999101 kernel: igc 0000:57:00.0 eth0: Register Dump
Feb 27 07:26:42 09999101 kernel: igc 0000:57:00.0 eth0: Register Name Value
Feb 27 07:26:42 09999101 kernel: igc 0000:57:00.0 eth0: CTRL 181c0641
Feb 27 07:26:42 09999101 kernel: igc 0000:57:00.0 eth0: STATUS 40280691
Feb 27 07:26:42 09999101 kernel: igc 0000:57:00.0 eth0: CTRL_EXT 10000040
Feb 27 07:26:42 09999101 kernel: igc 0000:57:00.0 eth0: MDIC 18017949
Feb 27 07:26:42 09999101 kernel: igc 0000:57:00.0 eth0: ICR 00000000
Feb 27 07:26:42 09999101 kernel: igc 0000:57:00.0 eth0: RCTL 04408022
Feb 27 07:26:42 09999101 kernel: igc 0000:57:00.0 eth0: RDLEN[0-3] 00001000 00001000 00001000 00001000
Feb 27 07:26:42 09999101 kernel: igc 0000:57:00.0 eth0: RDH[0-3] 00000018 0000000c 00000082 00000029
Feb 27 07:26:42 09999101 kernel: igc 0000:57:00.0 eth0: RDT[0-3] 00000017 0000000b 00000081 00000028
Feb 27 07:26:42 09999101 kernel: igc 0000:57:00.0 eth0: RXDCTL[0-3] 02040808 02040808 02040808 02040808
Feb 27 07:26:42 09999101 kernel: igc 0000:57:00.0 eth0: RDBAL[0-3] ffffb000 ffffa000 ffff9000 ffff8000
Feb 27 07:26:42 09999101 kernel: igc 0000:57:00.0 eth0: RDBAH[0-3] 00000000 00000000 00000000 00000000
Feb 27 07:26:42 09999101 kernel: igc 0000:57:00.0 eth0: TCTL a503f0fa
Feb 27 07:26:42 09999101 kernel: igc 0000:57:00.0 eth0: TDBAL[0-3] fffff000 ffffe000 ffffd000 ffffc000
Feb 27 07:26:42 09999101 kernel: igc 0000:57:00.0 eth0: TDBAH[0-3] 00000000 00000000 00000000 00000000
Feb 27 07:26:42 09999101 kernel: igc 0000:57:00.0 eth0: TDLEN[0-3] 00001000 00001000 00001000 00001000
Feb 27 07:26:42 09999101 kernel: igc 0000:57:00.0 eth0: TDH[0-3] 00000001 00000002 00000021 0000000c
Feb 27 07:26:42 09999101 kernel: igc 0000:57:00.0 eth0: TDT[0-3] 00000001 00000002 00000025 0000000c
Feb 27 07:26:42 09999101 kernel: igc 0000:57:00.0 eth0: TXDCTL[0-3] 02100108 02100108 02100108 02100108
Feb 27 07:26:42 09999101 kernel: igc 0000:57:00.0 eth0: Reset adapter

I also saw a lot of messages like this one:

Feb 27 07:01:02 09999101 kernel: igc 0000:57:00.0 eth0: Detected Tx Unit Hang Tx Queue <2> TDH <0> TDT <0> next_to_use <0> next_to_clean <0> buffer_info[next_to_clean] time_stamp <10c55f45b> next_to_watch <0000000012555140> jiffies <10dc6cbc0> desc.status <280200>

I removed the device from the PCI bus and rescanned the bus: same problem. The kernel error messages were gone, but no packets were transmitted. Receiving still worked.
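The register dump and the hang message can be cross-checked by hand. Below is a minimal Python sketch, assuming that TDLEN holds the Tx ring length in bytes and that each advanced Tx descriptor is 16 bytes (so 0x1000 bytes means 256 descriptors). It compares TDH (where the hardware is) with TDT (where the driver has queued up to) for each queue, and computes how many jiffies separate `time_stamp` and `jiffies` in the hang message. The helper names (`parse_per_queue`, `pending_tx_descriptors`, `hang_age_jiffies`, `jiffies_to_seconds`) are hypothetical and not part of igc or any existing tool.

```python
import re

# Excerpt copied from the 07:26:42 register dump above.
DUMP = """\
igc 0000:57:00.0 eth0: TDLEN[0-3] 00001000 00001000 00001000 00001000
igc 0000:57:00.0 eth0: TDH[0-3] 00000001 00000002 00000021 0000000c
igc 0000:57:00.0 eth0: TDT[0-3] 00000001 00000002 00000025 0000000c
"""

# Excerpt copied from the 07:01:02 "Detected Tx Unit Hang" message above.
HANG_LINE = (
    "igc 0000:57:00.0 eth0: Detected Tx Unit Hang Tx Queue <2> TDH <0> TDT <0> "
    "next_to_use <0> next_to_clean <0> buffer_info[next_to_clean] "
    "time_stamp <10c55f45b> next_to_watch <0000000012555140> "
    "jiffies <10dc6cbc0> desc.status <280200>"
)

DESC_SIZE = 16  # assumption: one advanced Tx descriptor is 16 bytes


def parse_per_queue(dump: str, reg: str) -> list[int]:
    """Return the four per-queue values of a register such as TDH or TDT."""
    m = re.search(rf"\b{reg}\[0-3\]((?:\s+[0-9a-fA-F]{{8}}){{4}})", dump)
    if m is None:
        raise ValueError(f"{reg}[0-3] not found in dump")
    return [int(v, 16) for v in m.group(1).split()]


def pending_tx_descriptors(dump: str) -> list[int]:
    """Descriptors queued by the driver (up to TDT) not yet processed (from TDH)."""
    heads = parse_per_queue(dump, "TDH")
    tails = parse_per_queue(dump, "TDT")
    ring_sizes = [length // DESC_SIZE for length in parse_per_queue(dump, "TDLEN")]
    # The ring wraps around, so take the head-to-tail distance modulo the ring size.
    return [(t - h) % n for h, t, n in zip(heads, tails, ring_sizes)]


def hang_age_jiffies(line: str) -> int:
    """Jiffies elapsed between buffer_info[next_to_clean]'s time_stamp and the check."""
    m = re.search(r"time_stamp <([0-9a-f]+)>.*jiffies <([0-9a-f]+)>", line)
    if m is None:
        raise ValueError("not a Tx Unit Hang line")
    return int(m.group(2), 16) - int(m.group(1), 16)


def jiffies_to_seconds(delta: int, hz: int) -> float:
    """Convert a jiffies delta to seconds; hz must match the kernel's CONFIG_HZ."""
    return delta / hz


if __name__ == "__main__":
    print("pending Tx descriptors per queue:", pending_tx_descriptors(DUMP))
    print("hang age in jiffies:", hang_age_jiffies(HANG_LINE))
```

For the values in this report it gives [0, 0, 4, 0] pending descriptors: only queue 2 has work the hardware had not processed, which is consistent with (though not proof of) the repeated hang reports for Tx Queue <2> some 25 minutes earlier. The hang line gives a delta of 24172389 jiffies. Its wall-clock equivalent depends on CONFIG_HZ, which is not stated for this kernel; at HZ=1000 it would be roughly 6.7 hours.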
A temporary workaround that I'm using at the moment is disabling "WoL link speed reduction" on the connected computer. This feature reduced the link speed to 10 Mbps every time that computer went into standby, which apparently triggered this bug intermittently.
Is it fixed in the latest stable 6.5.9 kernel?