Created attachment 79641 [details] dmesg
Under any significant load the driver starts producing "Detected Tx Unit Hang" messages and resets the adapter, so networking is not really usable. By significant load I mean an attempt to download a big (>2 Mb) file over ssh. Please see the attached files for system details. The bug is 100% reproducible. It looks like the problem has been known for a long time, but the situation with the Linux e1000 drivers is confusing: the kernel appears to contain a 7.x version (alive), and there is an 8.x version by Intel on SourceForge (not maintained, and it doesn't compile for newer kernels). A lot of the suggestions concern Intel's version of the driver. Please shed some light on the situation with e1000 in Linux...
Created attachment 79651 [details] ethtool
Created attachment 79661 [details] kernel config
Created attachment 79671 [details] lshw
Created attachment 79681 [details] lspci -vvv
Created attachment 79691 [details] uname
I want to confirm this also happened to me today using 3.3.1-gentoo after 192 days of uptime, while the system was _not_ under any significant load.
e1000 0000:01:03.0: eth0: Detected Tx Unit Hang
  Tx Queue <0>
  TDH <bd>
  TDT <bd>
  next_to_use <bd>
  next_to_clean <73>
buffer_info[next_to_clean]
  time_stamp <5fd26739>
  next_to_watch <74>
  jiffies <5fd267fb>
  next_to_watch.status <0>
e1000 0000:01:03.0: eth0: Detected Tx Unit Hang
  Tx Queue <0>
  TDH <bd>
  TDT <bd>
  next_to_use <bd>
  next_to_clean <73>
buffer_info[next_to_clean]
  time_stamp <5fd26739>
  next_to_watch <74>
  jiffies <5fd268c3>
  next_to_watch.status <0>
e1000 0000:01:03.0: eth0: Detected Tx Unit Hang
  Tx Queue <0>
  TDH <bd>
  TDT <bd>
  next_to_use <bd>
  next_to_clean <73>
buffer_info[next_to_clean]
  time_stamp <5fd26739>
  next_to_watch <74>
  jiffies <5fd2698b>
  next_to_watch.status <0>
------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:256 dev_watchdog+0x1b3/0x1bc()
Hardware name: PowerEdge 650
NETDEV WATCHDOG: eth0 (e1000): transmit queue 0 timed out
Modules linked in: usb_storage usb_libusual xt_comment nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp iptable_filter ip_tables x_tables ohci_hcd usbcore usb_common
Pid: 0, comm: swapper Not tainted 3.3.1-gentoo #3
Call Trace:
 [<c101fa16>] warn_slowpath_common+0x67/0x8e
 [<c123d824>] ? dev_watchdog+0x1b3/0x1bc
 [<c123d824>] ? dev_watchdog+0x1b3/0x1bc
 [<c101fab9>] warn_slowpath_fmt+0x2e/0x30
 [<c123d824>] dev_watchdog+0x1b3/0x1bc
 [<c102a8d1>] run_timer_softirq+0xfb/0x28a
 [<c123d671>] ? netif_carrier_off+0x26/0x26
 [<c102461b>] __do_softirq+0x72/0x14a
 [<c10245a9>] ? __tasklet_hi_schedule_first+0x4b/0x4b
 <IRQ>
 [<c102485f>] ? irq_exit+0x64/0x85
 [<c100395d>] ? do_IRQ+0x3d/0x84
 [<c12b6029>] ? common_interrupt+0x29/0x30
 [<c10082b3>] ? default_idle+0x4d/0x129
 [<c100167f>] ? cpu_idle+0x40/0x63
 [<c12aa515>] ? rest_init+0x55/0x60
 [<c13ce607>] ? start_kernel+0x24c/0x252
 [<c13ce13f>] ? loglevel+0x2b/0x2b
 [<c13ce044>] ? i386_start_kernel+0x44/0x46
---[ end trace 7a625de8614c18af ]---
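For anyone comparing several of these reports, here is a minimal Python sketch that pulls the ring counters out of one hang record. The function names (`parse_tx_hang`, `summarize`) and the ring size of 256 descriptors are assumptions made for illustration; they are not taken from the driver or from this report, and jiffies wraparound is ignored.

```python
import re

# Matches the counters in a "Detected Tx Unit Hang" record; values are hex.
# "next_to_watch.status" and "buffer_info[next_to_clean]" do not match,
# because the name must be followed directly by whitespace and "<".
FIELD_RE = re.compile(
    r"\b(TDH|TDT|next_to_use|next_to_clean|time_stamp|next_to_watch|jiffies)"
    r"\s+<([0-9a-f]+)>"
)


def parse_tx_hang(record):
    """Return the counters of one hang record as a dict of integers."""
    return {name: int(value, 16) for name, value in FIELD_RE.findall(record)}


def summarize(fields, ring_size=256):
    """Derive two numbers that are handy when comparing reports.

    ring_size=256 is an assumed descriptor count; check `ethtool -g` output.
    """
    return {
        # Descriptors queued by the driver but not yet cleaned up.
        "pending": (fields["next_to_use"] - fields["next_to_clean"]) % ring_size,
        # How long the oldest uncleaned buffer has been waiting, in jiffies.
        "age_jiffies": fields["jiffies"] - fields["time_stamp"],
    }


# First record from the report above, joined onto one line.
record = (
    "Tx Queue <0> TDH <bd> TDT <bd> next_to_use <bd> next_to_clean <73> "
    "buffer_info[next_to_clean] time_stamp <5fd26739> next_to_watch <74> "
    "jiffies <5fd267fb> next_to_watch.status <0>"
)
print(summarize(parse_tx_hang(record)))
```

For that record TDH equals TDT while descriptors are still pending on the driver side and next_to_watch.status is 0, which suggests the hardware fetched everything but the driver never saw the oldest descriptor marked as done.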
Set the current msglvl by 'ethtool -s ethx msglvl 0x2c01' so driver will print hw ring info when problem occurs. Please submit full dmesg log and lspci -vvv output after issue occurs. -Tushar
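To see what the suggested value 0x2c01 actually switches on, the bitmask can be decoded as in the sketch below. The bit order is the NETIF_MSG_* order from include/linux/netdevice.h as I recall it, so treat the table as an assumption and verify it against your kernel tree; `decode_msglvl` is a hypothetical helper name.

```python
# Message-level bits, lowest bit first (assumed to follow the NETIF_MSG_*
# definitions in include/linux/netdevice.h; verify against your kernel).
NETIF_MSG_BITS = [
    "drv", "probe", "link", "timer", "ifdown", "ifup", "rx_err", "tx_err",
    "tx_queued", "intr", "tx_done", "rx_status", "pktdata", "hw", "wol",
]


def decode_msglvl(value):
    """Return the names of the message classes enabled in a msglvl bitmask."""
    return [name for bit, name in enumerate(NETIF_MSG_BITS) if value & (1 << bit)]


print(decode_msglvl(0x2c01))
```

With that table, 0x2c01 enables the drv, tx_done, rx_status and hw classes, which is consistent with asking the driver for hardware and ring state when the hang is detected.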
Created attachment 84761 [details] dmesg output with verbose settings on
Created attachment 84771 [details] lspci -vvv after the issue occurred
The dmesg log seems to have been overwritten; it does not contain the tx ring info. If you have not attached the full dmesg, please attach it. I do see a PCI Master Abort error in lspci:
07:03.0 Ethernet controller: Intel Corporation 82541PI Gigabit Ethernet
  Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
  (note the <MAbort+ flag)
You may need to freshly boot the system and dump lspci -vvv before and after the tx hang occurs to confirm that MAbort is the cause.
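The before/after comparison requested here can be done by eye, but a small sketch like the following makes it mechanical. It parses the flags on an lspci "Status:" line and reports which ones became set. The helper names are hypothetical, and the "before" line is an illustrative copy of the line above with <MAbort- substituted, not real output from the reporter's machine.

```python
import re

# A flag is a name (optionally prefixed by < or >) followed by + or -.
FLAG_RE = re.compile(r"([<>]?[A-Za-z0-9]+)([+-])")


def parse_status_flags(line):
    """Map each flag on an lspci 'Status:' line to True (+) or False (-)."""
    return {name: sign == "+" for name, sign in FLAG_RE.findall(line)}


def newly_set(before, after):
    """Return flags that are set after the hang but were clear before it."""
    old = parse_status_flags(before)
    new = parse_status_flags(after)
    return [name for name, on in new.items() if on and not old.get(name, False)]


# Illustrative "before" line (assumed); "after" is the line quoted above.
before = ("Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium "
          ">TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-")
after = ("Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium "
         ">TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-")
print(newly_set(before, after))
```

Fields such as DEVSEL=medium are skipped on purpose because they are not +/- flags.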
Created attachment 84821 [details] lspci -vvv after reboot (no issues)
Created attachment 84831 [details] kernel log with full dump of registers after the issue
So looking at lspci -vvv before and after confirms that root cause of the tx hang is PCI MAbort. (For some reason I don't see TX descriptor ring dump logged in dmesg log.) Looks like system has 8GB RAM. Can you test with only 2 GB ram and see if issue occurs?
My problem was on a much older system, with only 1.5GB of RAM. Doubt that is related.
Did this work at all before, and did the problem start appearing with a kernel upgrade?
(In reply to comment #13)
> Looks like system has 8GB RAM. Can you test with only 2 GB ram and see if
> issue occurs?
When I first installed the hardware and hit the issue, I spent some time looking for a solution. One of the possible causes named was RAM > 4 GB, so I tried with 4 GB and the result was the same. I can't remember whether I tried with 2 GB. Anyway, I need it working with 8 GB.
pci bus trace would be very helpful to find out cause of Mabort error. Do you have facility to capture bus trace? If not then can you send me the dmesg log again with Tx ring info. (Last time you sent dmesg log - Comment #12, it has not dumped the tx ring. Make sure msglvl is set to value 0x2c01 - 'ethtool -s ethx msglvl 0x2c01')
I have no idea how to get PCI bus trace. If it does not require some special hardware, I can try if you explain how. I don't know why kernel didn't dump TX ring, since I've entered command 'ethtool -s ethx msglvl 0x2c01' and the result was that there were a lot of verbose messages from driver, not just errors. I thought that this line was one marking start of what you are looking for: >> Oct 25 21:22:13 myhost kernel: e1000: Tx descriptor cache in 64bit format
I upgraded a number of boxes using 82546GB from 3.1.x kernels to 3.6.x recently, and a number of them have started having these TX hangs regularly. So what changed between 3.1 and 3.6 to cause this? Any output I can provide which would assist?
With my report regarding 3.3.1 we can reduce that to: what happened between 3.1 and 3.3.1.
Hey, I have this with kernel 2.6 as well as with kernel 3.3.8. I use OpenWrt on a router with 5 Intel e1000 and e1000e cards. Most of these routers have 1 or 2 GB of RAM. The bug happens after approx. 5 hours of running, regardless of whether there is a lot of traffic or not. I can reproduce it on various kernel versions, but with kernel 3.3.8 it happens far more often than with kernel 2.6. I also posted this bug at 52571 (sorry for crossposting). Best, ulf
(In reply to comment #18)
> I have no idea how to get PCI bus trace.
> If it does not require some special hardware, I can try if you explain how.
> I don't know why kernel didn't dump TX ring, since I've entered
> command 'ethtool -s ethx msglvl 0x2c01' and the result was that
> there were a lot of verbose messages from driver, not just errors.
Yes, please send me the full dmesg log taken.
> I thought that this line was one marking start of what you are looking for:
> >> Oct 25 21:22:13 myhost kernel: e1000: Tx descriptor cache in 64bit format
Would you also please try disabling TSO with 'ethtool -K ethx tso off' and see if that makes any difference? Meanwhile I will see if I can get hold of a system similar to yours and reproduce the issue locally.
So this looks a lot like the problem I once had with these cards. http://sourceforge.net/p/e1000/bugs/266/ is the relevant bug report. Verbatim quote from the bug report at the time, though it seems to be gone from the updated version (sf.net migrated bug trackers I suppose): "We were able to reproduce this bug here, and verify that the system in question has a cache coherency problem. the driver is correctly updating the system memory and then requesting hardware to DMA the data. When the hardware DMA request is completed by the memory controller, the data in question is stale (the value prior to the update) and then the software suffers an apparent "tx hang" We are still investigating if there is a fix possible." I believe they eventually did indeed have a fix in the 8.x series - at least, at some point I downloaded it and used it, and the problem went away. The motherboard being Intel-branded implies it probably isn't a crap board with cache coherency bugs...one hopes, at least. But that is the same era of hardware we had these problems on.
I am on vacation 02/06 -Tushar
We all hope the vacation was great =)) but back on topic:
1) I was able to reproduce the problem with kernel 3.7.9
2) The problem is 100% reproducible with 2 gig of RAM
3) Turning tso off doesn't help
4) I was able to get full logs with "ethtool -s ethx msglvl 0x2c01"
I'm attaching a tarball with logs (kernel log, lspci and some other related information) for all these cases.
Created attachment 94201 [details] logs demonstrating the issue with 3.7.9 and 2 gig of ram
Created attachment 166421 [details] dmesg I got this same thing, unexpectedly, inside virtualbox(8GB RAM) with kernel 3.16.5-gentoo (found on install-amd64-minimal-20141204.iso)
I am also affected by this on 4.4.0-1-686-pae #1 SMP Debian 4.4.2-3 (2016-02-21).
Me too, Ubuntu 4.4.0-22.39-generic 4.4.8 + Intel(R) PRO/1000 Network Driver - 3.2.6-k
Trying now with these options:
ethtool -K eno1 gso off gro off tso off
ethtool -s eno1 msglvl 0x2c01
ethtool --set-eee eno1 eee off
ethtool --set-eee eno1 advertise 0
I have not yet tried disabling Active-State Power Management (boot option):
pcie_aspm=off
It seems the link goes off/on at times, but there are no more details in dmesg. I am thinking of forcing the network adapter to work at 100Mb speed ...
Hi again. It seems the problem is fixed by setting the options I wrote before and reducing the speed to 100 Mb:
ethtool -s eno1 speed 100 duplex full
I've been downloading for half an hour at 600Kb-1Mb. At home I do not need more, but it would be interesting to know why ...
There are more ideas around, like
ethtool -K eno1 gso off gro off tso off lro off
and upgrading to e1000e-3.3.3:
https://communities.intel.com/thread/70244
https://downloadcenter.intel.com/download/15817/Network-Adapter-Driver-for-PCI-E-Gigabit-Network-Connections-under-Linux-?v=t
Hope it helps others ...
I can confirm the issue with gentoo sources 4.15.2 on a C220 chipset under heavy load (> 500 Mbit; the adapter hangs every few seconds). I can also confirm that the following workaround helps. The issue only popped up recently, after using the hardware without any problem up to recent kernels.
ethtool -K eno1 gso off gro off tso off
00:19.0 Ethernet controller [0200]: Intel Corporation Ethernet Connection I217-LM [8086:153a] (rev 05)
  Subsystem: ASUSTeK Computer Inc. Ethernet Connection I217-LM [1043:8535]
  Flags: bus master, fast devsel, latency 0, IRQ 27
  Memory at f7d00000 (32-bit, non-prefetchable) [size=128K]
  Memory at f7d35000 (32-bit, non-prefetchable) [size=4K]
  I/O ports at f080 [size=32]
  Capabilities: [c8] Power Management version 2
  Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
  Capabilities: [e0] PCI Advanced Features
  Kernel driver in use: e1000e
After a recent kernel upgrade to 5.2.8 on GNU/Gentoo Linux I started to experience the same problem. This is observed on a Dell Precision Tower 5810 with an x86_64 kernel. This also seems to be a regression, as the issue was not there at least on the earlier kernels below:
kernel-genkernel-x86_64-4.19.27-gentoo-r1
kernel-genkernel-x86_64-5.0.5-gentoo
kernel-genkernel-x86_64-5.0.7-gentoo
kernel-genkernel-x86_64-5.0.10-gentoo
kernel-genkernel-x86_64-5.1.0-gentoo
kernel-genkernel-x86_64-5.1.3-gentoo
kernel-genkernel-x86_64-5.1.7-gentoo
kernel-genkernel-x86_64-5.1.11-gentoo
The same ethernet driver from kernel 5.2.8 is in use on a Dell Latitude laptop and the problem has not yet shown there, thus this seems to be hardware related. On the Precision Tower 5810, disabling the segmentation offloading with:
ethtool -K enp0s25 gso off gro off tso off
does seem to provide relief. Thanks! -N
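Since several commenters apply the same workaround on different interface names, here is a small Python sketch that assembles the ethtool command for a given interface and only runs it when asked. The helper names and the dry-run default are my own choices for illustration. The setting made by `ethtool -K` does not survive a reboot, so it would have to be reapplied from whatever network hook your distribution provides.

```python
import subprocess


def offload_off_command(iface, features=("gso", "gro", "tso")):
    """Build the `ethtool -K` argument list that turns the given offloads off."""
    cmd = ["ethtool", "-K", iface]
    for feature in features:
        cmd += [feature, "off"]
    return cmd


def apply_workaround(iface, dry_run=True):
    """Print the command, and run it only when dry_run is False (needs root)."""
    cmd = offload_off_command(iface)
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


apply_workaround("enp0s25")
```

Keeping the feature list as a parameter makes it easy to test one offload at a time (for example only `tso`) to narrow down which one triggers the hang.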
I overlooked the kernel driver name as originally reported in year 2012, I use 'e1000e' kernel driver nowadays, not 'e1000'.
This is still an issue. I'm running OpenWRT with Kernel 4.14.131. Any reasonable load on the Intel onboard I217LM NIC causes it to hardware fault repeatedly.
[ 917.996439] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
[ 917.996439] TDH <db>
[ 917.996439] TDT <f1>
[ 917.996439] next_to_use <f1>
[ 917.996439] next_to_clean <db>
[ 917.996439] buffer_info[next_to_clean]:
[ 917.996439] time_stamp <10000efce>
[ 917.996439] next_to_watch <db>
[ 917.996439] jiffies <10000f168>
[ 917.996439] next_to_watch.status <0>
[ 917.996439] MAC Status <80083>
[ 917.996439] PHY Status <796d>
[ 917.996439] PHY 1000BASE-T Status <3800>
[ 917.996439] PHY Extended Status <3000>
[ 917.996439] PCI Status <10>
I can confirm the "ethtool -K enp0s25 gso off gro off tso off" workaround does indeed appear to work.
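The "PCI Status <10>" value in these e1000e records can be checked for a latched master abort, which was the suspected cause in the original e1000 report. The sketch below decodes it under the assumption that the driver prints the standard 16-bit PCI configuration-space Status register; the bit positions are the ones from the PCI specification, and `decode_pci_status` is a hypothetical helper name.

```python
# Bit positions of the standard PCI Status register (config offset 0x06).
PCI_STATUS_BITS = {
    3: "Interrupt Status",
    4: "Capabilities List",
    5: "66 MHz Capable",
    7: "Fast Back-to-Back Capable",
    8: "Master Data Parity Error",
    11: "Signaled Target Abort",
    12: "Received Target Abort",
    13: "Received Master Abort",
    14: "Signaled System Error",
    15: "Detected Parity Error",
}


def decode_pci_status(value):
    """Return the names of the status bits set in a PCI Status word."""
    return [name for bit, name in sorted(PCI_STATUS_BITS.items())
            if value & (1 << bit)]


print(decode_pci_status(0x10))
```

Under that assumption, 0x10 has only the Capabilities List bit set, so no master abort is recorded in these e1000e hangs. That would fit the later remark in this thread that the e1000e reports may be a different bug from the original one.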
I have this running and repeatable on a test-rig if there are any diagnostics that would be useful.
Still an issue for me also. I'm running on a single board computer w/ a 3 network PC/104+ card and getting these errors like crazy. [Thu Nov 21 21:08:35 2019] perf: interrupt took too long (5094 > 5083), lowering kernel.perf_event_max_sample_rate to 39000 [Thu Nov 21 21:09:48 2019] e1000 0000:03:06.0 eth-lcs: Detected Tx Unit Hang Tx Queue <0> TDH <6d> TDT <6d> next_to_use <6d> next_to_clean <5b> buffer_info[next_to_clean] time_stamp <1002a3b17> next_to_watch <5c> jiffies <1002a3d00> next_to_watch.status <0> [Thu Nov 21 21:09:50 2019] e1000 0000:03:06.0 eth-lcs: Detected Tx Unit Hang Tx Queue <0> TDH <6d> TDT <6d> next_to_use <6d> next_to_clean <5b> buffer_info[next_to_clean] time_stamp <1002a3b17> next_to_watch <5c> jiffies <1002a3f80> next_to_watch.status <0> [Thu Nov 21 21:09:52 2019] e1000 0000:03:06.0 eth-lcs: Detected Tx Unit Hang Tx Queue <0> TDH <6d> TDT <6d> next_to_use <6d> next_to_clean <5b> buffer_info[next_to_clean] time_stamp <1002a3b17> next_to_watch <5c> jiffies <1002a4200> next_to_watch.status <0> [Thu Nov 21 21:09:54 2019] e1000 0000:03:06.0 eth-lcs: Detected Tx Unit Hang Tx Queue <0> TDH <6d> TDT <6d> next_to_use <6d> next_to_clean <5b> buffer_info[next_to_clean] time_stamp <1002a3b17> next_to_watch <5c> jiffies <1002a4480> next_to_watch.status <0> [Thu Nov 21 21:09:55 2019] e1000 0000:03:06.0 eth-lcs: Reset adapter [Thu Nov 21 21:10:00 2019] e1000: eth-lcs NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
Kernel Version impacted ----------------------- 5.4.43 and 5.6.15 Trigger ------- On heavy network use (so the bug may be much older than the kernels I found it in). Consequence ----------- I get random packet drops and packets with wrong data (for example checksums of some downloaded files fail) Log --- ###dmesg [68832.822344] e1000e 0000:00:19.0 enp0s25: Detected Hardware Unit Hang: TDH <23> TDT <42> next_to_use <42> next_to_clean <23> buffer_info[next_to_clean]: time_stamp <10139b38b> next_to_watch <24> jiffies <10139b540> next_to_watch.status <0> MAC Status <80083> PHY Status <796d> PHY 1000BASE-T Status <3800> PHY Extended Status <3000> PCI Status <10> [68834.742296] e1000e 0000:00:19.0 enp0s25: Detected Hardware Unit Hang: TDH <23> TDT <42> next_to_use <42> next_to_clean <23> buffer_info[next_to_clean]: time_stamp <10139b38b> next_to_watch <24> jiffies <10139b780> next_to_watch.status <0> MAC Status <80083> PHY Status <796d> PHY 1000BASE-T Status <3800> PHY Extended Status <3000> PCI Status <10> [68836.662275] e1000e 0000:00:19.0 enp0s25: Detected Hardware Unit Hang: TDH <23> TDT <42> next_to_use <42> next_to_clean <23> buffer_info[next_to_clean]: time_stamp <10139b38b> next_to_watch <24> jiffies <10139b9c0> next_to_watch.status <0> MAC Status <80083> PHY Status <796d> PHY 1000BASE-T Status <3800> PHY Extended Status <3000> PCI Status <10> [68838.796845] e1000e 0000:00:19.0 enp0s25: Detected Hardware Unit Hang: TDH <23> TDT <42> next_to_use <42> next_to_clean <23> buffer_info[next_to_clean]: time_stamp <10139b38b> next_to_watch <24> jiffies <10139bc40> next_to_watch.status <0> MAC Status <80083> PHY Status <796d> PHY 1000BASE-T Status <3800> PHY Extended Status <3000> PCI Status <10> [68839.648816] e1000e 0000:00:19.0 enp0s25: Reset adapter unexpectedly [68843.384565] e1000e 0000:00:19.0 enp0s25: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx ### Temporary solution ------------------ It is solved with: ethtool -K enp0s25 tso off gso off HW info 
(lshw) -------------- *-network description: Ethernet interface product: Ethernet Connection (3) I218-LM vendor: Intel Corporation physical id: 19 bus info: pci@0000:00:19.0 logical name: enp0s25 version: 03 serial: <SNIP> size: 1Gbit/s capacity: 1Gbit/s width: 32 bits clock: 33MHz capabilities: pm msi bus_master cap_list ethernet physical tp 10bt 10bt-fd 100bt 100bt-fd 1000bt-fd autonegotiation configuration: autonegotiation=on broadcast=yes driver=e1000e driverversion=3.2.6-k duplex=full firmware=0.2-4 latency=0 link=yes multicast=yes port=twisted pair speed=1Gbit/s resources: irq:48 memory:c1300000-c131ffff memory:c133d000-c133dfff ioport:5080(size=32) HW info (lspci) --------------- 00:19.0 Ethernet controller: Intel Corporation Ethernet Connection (3) I218-LM (rev 03) Subsystem: Hewlett-Packard Company Ethernet Connection (3) I218-LM Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Interrupt: pin A routed to IRQ 48 Region 0: Memory at c1300000 (32-bit, non-prefetchable) [size=128K] Region 1: Memory at c133d000 (32-bit, non-prefetchable) [size=4K] Region 2: I/O ports at 5080 [disabled] [size=32] Capabilities: [c8] Power Management version 2 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME- Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+ Address: 00000000fee00358 Data: 0000 Capabilities: [e0] PCI Advanced Features AFCap: TP+ FLR+ AFCtrl: FLR- AFStatus: TP- Kernel driver in use: e1000e Kernel modules: e1000e
Running kernel 5.8.4, I am seeing the same errors. Sep 06 22:30:10 kernel: e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang: TDH <39> TDT <60> next_to_use <60> next_to_clean <38> buffer_info[next_to_clean]: time_stamp <1030819ce> next_to_watch <39> jiffies <103082240> next_to_watch.status <0> MAC Status <80083> PHY Status <796d> PHY 1000BASE-T Status <3800> PHY Extended Status <3000> PCI Status <10> Sep 06 22:30:12 kernel: e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang: TDH <39> TDT <60> next_to_use <60> next_to_clean <38> buffer_info[next_to_clean]: time_stamp <1030819ce> next_to_watch <39> jiffies <103082a00> next_to_watch.status <0> MAC Status <80083> PHY Status <796d> PHY 1000BASE-T Status <3800> PHY Extended Status <3000> PCI Status <10> Sep 06 22:30:14 kernel: e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang: TDH <39> TDT <60> next_to_use <60> next_to_clean <38> buffer_info[next_to_clean]: time_stamp <1030819ce> next_to_watch <39> jiffies <103083200> next_to_watch.status <0> MAC Status <80083> PHY Status <796d> PHY 1000BASE-T Status <3800> PHY Extended Status <3000> PCI Status <10> Sep 06 22:30:16 kernel: e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang: TDH <39> TDT <60> next_to_use <60> next_to_clean <38> buffer_info[next_to_clean]: time_stamp <1030819ce> next_to_watch <39> jiffies <1030839c0> next_to_watch.status <0> MAC Status <80083> PHY Status <796d> PHY 1000BASE-T Status <3800> PHY Extended Status <3000> PCI Status <10> Sep 06 22:30:17 kernel: NETDEV WATCHDOG: eno1 (e1000e): transmit queue 0 timed out Sep 06 22:30:17 kernel: snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device at24 snd_pcm snd_timer i2c_i801 snd intel_pch_thermal lpc_ich mei_me i2c_sm> Sep 06 22:30:17 kernel: e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly Sep 06 22:30:20 kernel: e1000e 0000:00:19.0 eno1: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
Same issue in 5.4.78, except that the workaround of:
ethtool -K enp0s25 tso off gso off
doesn't seem to apply (they are already off and "fixed").
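To check whether the offloads are really off, and whether the driver marks them as fixed so that `ethtool -K` cannot change them, the output of `ethtool -k` can be parsed as sketched below. The sample text is illustrative and not taken from the reporter's machine, and `parse_features` is a hypothetical helper name.

```python
def parse_features(text):
    """Parse `ethtool -k` output into {feature: (enabled, fixed)}."""
    features = {}
    for line in text.splitlines():
        name, sep, rest = line.strip().partition(":")
        if not sep or not rest.strip():
            continue  # header line such as "Features for enp0s25:" or blank
        words = rest.split()
        features[name] = (words[0] == "on", "[fixed]" in words)
    return features


# Illustrative sample in the format printed by `ethtool -k <iface>` (assumed).
sample = """Features for enp0s25:
tcp-segmentation-offload: off
\ttx-tcp-segmentation: off [fixed]
generic-segmentation-offload: off [fixed]
generic-receive-offload: on
"""

for name, (enabled, fixed) in parse_features(sample).items():
    print(name, "on" if enabled else "off", "(fixed)" if fixed else "")
```

If every segmentation offload already reports off and the hang still occurs, the workaround from the earlier comments cannot be the whole story for that machine.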
I had hoped hardware offloading would be fixed in 5.14.0, but no, this driver is still thoroughly fuxored. I got around this on my other servers by installing an ethernet card with a Marvell chipset, but unfortunately that card conflicts with an iomemory-vsl flash card that I have in this machine and need, so that solution doesn't work here. For now I've disabled hardware offloading again, but I would really like this to be fixed. You would think a bug on common hardware that is eight years old would be addressed.
(In reply to Robert Dinse from comment #40) > I had hoped hardware offloading would be fixed in 5.14.0, but no, this > driver is still thoroughly fuxored. I got around this on my other servers by > installing an ethernet card with a marvel chipset, but unfortunately that > card conflicts with an iomemory-vsl flash card I have in this machine and > need so that solution doesn't work. For now I've disabled hardware > offloading again but would really like this to be fixed. You would think a > bug on common hardware that is eight years old would be addressed. [13762.953724] e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang: TDH <1f> TDT <6f> next_to_use <6f> next_to_clean <1e> buffer_info[next_to_clean]: time_stamp <10014897b> next_to_watch <1f> jiffies <100148a88> next_to_watch.status <0> MAC Status <80083> PHY Status <796d> PHY 1000BASE-T Status <3800> PHY Extended Status <3000> PCI Status <10> [13764.953680] e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang: TDH <1f> TDT <6f> next_to_use <6f> next_to_clean <1e> buffer_info[next_to_clean]: time_stamp <10014897b> next_to_watch <1f> jiffies <100148b50> next_to_watch.status <0> MAC Status <80083> PHY Status <796d> PHY 1000BASE-T Status <3800> PHY Extended Status <3000> PCI Status <10> [13766.953901] e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang: TDH <1f> TDT <6f> next_to_use <6f> next_to_clean <1e> buffer_info[next_to_clean]: time_stamp <10014897b> next_to_watch <1f> jiffies <100148c18> next_to_watch.status <0> MAC Status <80083> PHY Status <796d> PHY 1000BASE-T Status <3800> PHY Extended Status <3000> PCI Status <10> [13768.953875] e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang: TDH <1f> TDT <6f> next_to_use <6f> next_to_clean <1e> buffer_info[next_to_clean]: time_stamp <10014897b> next_to_watch <1f> jiffies <100148ce0> next_to_watch.status <0> MAC Status <80083> PHY Status <796d> PHY 1000BASE-T Status <3800> PHY Extended Status <3000> PCI Status <10>
This thread is probably scoped over a bunch of different bugs, since the one symptom we have of a transmit issue is a tx-hang and recovery. The most recent comments seem to be about e1000e and not e1000 as the original reports. However, if turning off TSO seems to help, we might be able to figure out what is causing the problem, if we can reproduce the issue. As for e1000e, if you have a good reproducer, I will try to get it reproduced or work internally to get it so.
I apologize, you are correct, this is for the E1000e, but yes turning off TSO results in stable operation, with it enabled it will hang two or three times in a 24 hour period. The machine that it is on: iglulik description: Desktop Computer product: All Series (All) vendor: ASUS version: System Version serial: System Serial Number width: 64 bits capabilities: smbios-3.0.0 dmi-3.0.0 smp vsyscall32 configuration: boot=normal chassis=desktop family=ASUS MB sku=All uuid=40E0EFA1-21C0-D311-88D8-704D7BB335D9 *-core description: Motherboard product: X99-E vendor: ASUSTeK COMPUTER INC. physical id: 0 version: Rev 1.xx serial: 161086337600177 slot: Default string *-firmware description: BIOS vendor: American Megatrends Inc. physical id: 0 version: 2101 date: 07/10/2019 size: 64KiB capacity: 16MiB capabilities: pci apm upgrade shadowing cdboot bootselect socketedrom edd int13floppy1200 int13floppy720 int13floppy2880 int5printscreen int9keyboard int14serial int17printer acpi usb biosbootspecification uefi *-memory description: System Memory physical id: 5b slot: System board or motherboard size: 128GiB *-bank:0 description: DIMM DDR4 Synchronous 2400 MHz (0.4 ns) product: F4-2400C14-16GRK vendor: G-Skill physical id: 0 serial: 00000000 slot: DIMM_A1 size: 16GiB width: 72 bits clock: 2400MHz (0.4ns) *-bank:1 description: DIMM DDR4 Synchronous 2400 MHz (0.4 ns) product: F4-2400C14-16GRK vendor: G-Skill physical id: 1 serial: 00000000 slot: DIMM_A2 size: 16GiB width: 72 bits clock: 2400MHz (0.4ns) *-bank:2 description: DIMM DDR4 Synchronous 2400 MHz (0.4 ns) product: F4-2400C14-16GRK vendor: G-Skill physical id: 2 serial: 00000000 slot: DIMM_B1 size: 16GiB width: 72 bits clock: 2400MHz (0.4ns) *-bank:3 description: DIMM DDR4 Synchronous 2400 MHz (0.4 ns) product: F4-2400C14-16GRK vendor: G-Skill physical id: 3 serial: 00000000 slot: DIMM_B2 size: 16GiB width: 72 bits clock: 2400MHz (0.4ns) *-bank:4 description: DIMM DDR4 Synchronous 2400 MHz (0.4 ns) product: F4-2400C14-16GRK 
vendor: G-Skill physical id: 4 serial: 00000000 slot: DIMM_C1 size: 16GiB width: 72 bits clock: 2400MHz (0.4ns) *-bank:5 description: DIMM DDR4 Synchronous 2400 MHz (0.4 ns) product: F4-2400C14-16GRK vendor: G-Skill physical id: 5 serial: 00000000 slot: DIMM_C2 size: 16GiB width: 72 bits clock: 2400MHz (0.4ns) *-bank:6 description: DIMM DDR4 Synchronous 2400 MHz (0.4 ns) product: F4-2400C14-16GRK vendor: G-Skill physical id: 6 serial: 00000000 slot: DIMM_D1 size: 16GiB width: 72 bits clock: 2400MHz (0.4ns) *-bank:7 description: DIMM DDR4 Synchronous 2400 MHz (0.4 ns) product: F4-2400C14-16GRK vendor: G-Skill physical id: 7 serial: 00000000 slot: DIMM_D2 size: 16GiB width: 72 bits clock: 2400MHz (0.4ns) *-cache:0 description: L1 cache physical id: 6d slot: CPU Internal L1 size: 384KiB capacity: 384KiB capabilities: internal write-back configuration: level=1 *-cache:1 description: L2 cache physical id: 6e slot: CPU Internal L2 size: 1536KiB capacity: 1536KiB capabilities: internal write-back unified configuration: level=2 *-cache:2 description: L3 cache physical id: 6f slot: CPU Internal L3 size: 15MiB capacity: 15MiB capabilities: internal write-back unified configuration: level=3 *-cpu description: CPU product: Intel(R) Core(TM) i7-6850K CPU @ 3.60GHz vendor: Intel Corp. 
physical id: 70 bus info: cpu@0 version: Intel(R) Core(TM) i7-6850K CPU @ 3.60GHz slot: SOCKET 2011 size: 4199MHz capacity: 4200MHz width: 64 bits clock: 100MHz capabilities: lm fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp x86-64 constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cp l vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 s mep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts md_clear flush_l1d cpufreq configuration: cores=6 enabledcores=6 threads=12 *-pci description: Host bridge product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DMI2 vendor: Intel Corporation physical id: 100 bus info: pci@0000:00:00.0 version: 01 width: 32 bits clock: 33MHz *-pci:0 description: PCI bridge product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 1 vendor: Intel Corporation physical id: 1 bus info: pci@0000:00:01.0 version: 01 width: 32 bits clock: 33MHz capabilities: pci msi pciexpress pm normal_decode bus_master cap_list configuration: driver=pcieport resources: irq:25 *-pci:1 description: PCI bridge product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 1 vendor: Intel Corporation physical id: 1.1 bus info: pci@0000:00:01.1 version: 01 width: 32 bits clock: 33MHz capabilities: pci msi pciexpress pm normal_decode bus_master cap_list configuration: driver=pcieport resources: irq:26 *-pci:2 description: PCI bridge product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 2 vendor: Intel 
Corporation physical id: 2 bus info: pci@0000:00:02.0 version: 01 width: 32 bits clock: 33MHz capabilities: pci msi pciexpress pm normal_decode bus_master cap_list configuration: driver=pcieport resources: irq:28 memory:fb100000-fb2fffff *-storage description: Mass storage controller product: ioDrive2 vendor: SanDisk physical id: 0 bus info: pci@0000:02:00.0 version: 04 width: 32 bits clock: 33MHz capabilities: storage pm msi pciexpress msix bus_master cap_list rom configuration: driver=iodrive latency=0 resources: irq:49 memory:fb220000-fb221fff memory:fb200000-fb21ffff memory:fb100000-fb1fffff *-pci:3 description: PCI bridge product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 3 vendor: Intel Corporation physical id: 3 bus info: pci@0000:00:03.0 version: 01 width: 32 bits clock: 33MHz capabilities: pci msi pciexpress pm normal_decode bus_master cap_list configuration: driver=pcieport resources: irq:30 ioport:e000(size=4096) memory:fa000000-fb0fffff ioport:c0000000(size=301989888) *-display description: VGA compatible controller product: GT218 [GeForce 210] vendor: NVIDIA Corporation physical id: 0 bus info: pci@0000:01:00.0 version: a2 width: 64 bits clock: 33MHz capabilities: pm msi pciexpress vga_controller bus_master cap_list rom configuration: driver=nouveau latency=0 resources: irq:45 memory:fa000000-faffffff memory:c0000000-cfffffff memory:d0000000-d1ffffff ioport:e000(size=128) memory:c0000-dffff *-multimedia description: Audio device product: High Definition Audio Controller vendor: NVIDIA Corporation physical id: 0.1 bus info: pci@0000:01:00.1 version: a1 width: 32 bits clock: 33MHz capabilities: pm msi pciexpress bus_master cap_list configuration: driver=snd_hda_intel latency=0 resources: irq:48 memory:fb080000-fb083fff *-generic:0 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Map/VTd_Misc/System Management vendor: Intel Corporation physical id: 5 bus info: pci@0000:00:05.0 version: 01 width: 
32 bits clock: 33MHz capabilities: pciexpress cap_list configuration: latency=0 *-generic:1 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D IIO Hot Plug vendor: Intel Corporation physical id: 5.1 bus info: pci@0000:00:05.1 version: 01 width: 32 bits clock: 33MHz capabilities: pciexpress msi cap_list configuration: latency=0 *-generic:2 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D IIO RAS/Control Status/Global Errors vendor: Intel Corporation physical id: 5.2 bus info: pci@0000:00:05.2 version: 01 width: 32 bits clock: 33MHz capabilities: pciexpress cap_list configuration: latency=0 *-generic:3 UNCLAIMED description: PIC product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D I/O APIC vendor: Intel Corporation physical id: 5.4 bus info: pci@0000:00:05.4 version: 01 width: 32 bits clock: 33MHz capabilities: pciexpress pm io_x_-apic bus_master cap_list configuration: latency=0 resources: memory:fb43e000-fb43efff *-generic:4 UNCLAIMED description: Unassigned class product: C610/X99 series chipset SPSR vendor: Intel Corporation physical id: 11 bus info: pci@0000:00:11.0 version: 05 width: 32 bits clock: 33MHz capabilities: pciexpress pm bus_master cap_list configuration: latency=0 *-sata:0 description: SATA controller product: C610/X99 series chipset sSATA Controller [AHCI mode] vendor: Intel Corporation physical id: 11.4 bus info: pci@0000:00:11.4 version: 05 width: 32 bits clock: 66MHz capabilities: sata msi pm ahci_1.0 bus_master cap_list configuration: driver=ahci latency=0 resources: irq:43 ioport:f0f0(size=8) ioport:f0e0(size=4) ioport:f0d0(size=8) ioport:f0c0(size=4) ioport:f020(size=32) memory:fb43a000-fb43a7ff *-usb:0 description: USB controller product: C610/X99 series chipset USB xHCI Host Controller vendor: Intel Corporation physical id: 14 bus info: pci@0000:00:14.0 version: 05 width: 64 bits clock: 33MHz capabilities: pm msi xhci bus_master cap_list configuration: 
driver=xhci_hcd latency=0 resources: irq:34 memory:fb420000-fb42ffff *-usbhost:0 product: xHCI Host Controller vendor: Linux 5.14.0 xhci-hcd physical id: 0 bus info: usb@3 logical name: usb3 version: 5.14 capabilities: usb-2.00 configuration: driver=hub slots=15 speed=480Mbit/s *-usb description: USB hub product: ASM107x vendor: ASUS TEK. physical id: 4 bus info: usb@3:4 version: 0.01 capabilities: usb-2.10 configuration: driver=hub maxpower=100mA slots=4 speed=480Mbit/s *-usbhost:1 product: xHCI Host Controller vendor: Linux 5.14.0 xhci-hcd physical id: 1 bus info: usb@4 logical name: usb4 version: 5.14 capabilities: usb-3.00 configuration: driver=hub slots=6 speed=5000Mbit/s *-usb description: USB hub product: ASM107x vendor: ASUS TEK. physical id: 4 bus info: usb@4:4 version: 0.01 capabilities: usb-3.00 configuration: driver=hub maxpower=8mA slots=4 speed=5000Mbit/s *-communication description: Communication controller product: C610/X99 series chipset MEI Controller #1 vendor: Intel Corporation physical id: 16 bus info: pci@0000:00:16.0 version: 05 width: 64 bits clock: 33MHz capabilities: pm msi bus_master cap_list configuration: driver=mei_me latency=0 resources: irq:46 memory:fb439000-fb43900f *-network description: Ethernet interface product: Ethernet Connection (2) I218-V vendor: Intel Corporation physical id: 19 bus info: pci@0000:00:19.0 logical name: eno1 version: 05 serial: 70:4d:7b:b3:35:d9 size: 1Gbit/s capacity: 1Gbit/s width: 32 bits clock: 33MHz capabilities: pm msi bus_master cap_list ethernet physical tp 10bt 10bt-fd 100bt 100bt-fd 1000bt-fd autonegotiation configuration: autonegotiation=on broadcast=yes driver=e1000e driverversion=5.14.0 duplex=full firmware=0.1-4 latency=0 link=yes multicast=yes port=twisted pair speed=1Gbit/s resources: irq:33 memory:fb400000-fb41ffff memory:fb436000-fb436fff ioport:f000(size=32) *-usb:1 description: USB controller product: C610/X99 series chipset USB Enhanced Host Controller #2 vendor: Intel Corporation 
physical id: 1a bus info: pci@0000:00:1a.0 version: 05 width: 32 bits clock: 33MHz capabilities: pm debug ehci bus_master cap_list configuration: driver=ehci-pci latency=0 resources: irq:18 memory:fb435000-fb4353ff *-usbhost product: EHCI Host Controller vendor: Linux 5.14.0 ehci_hcd physical id: 1 bus info: usb@1 logical name: usb1 version: 5.14 capabilities: usb-2.00 configuration: driver=hub slots=2 speed=480Mbit/s *-usb description: USB hub vendor: Intel Corp. physical id: 1 bus info: usb@1:1 version: 0.05 capabilities: usb-2.00 configuration: driver=hub slots=6 speed=480Mbit/s *-multimedia description: Audio device product: C610/X99 series chipset HD Audio Controller vendor: Intel Corporation physical id: 1b bus info: pci@0000:00:1b.0 version: 05 width: 64 bits clock: 33MHz capabilities: pm msi pciexpress bus_master cap_list configuration: driver=snd_hda_intel latency=0 resources: irq:47 memory:fb430000-fb433fff *-pci:4 description: PCI bridge product: C610/X99 series chipset PCI Express Root Port #1 vendor: Intel Corporation physical id: 1c bus info: pci@0000:00:1c.0 version: d5 width: 32 bits clock: 33MHz capabilities: pci pciexpress msi pm normal_decode bus_master cap_list configuration: driver=pcieport resources: irq:31 *-pci:5 description: PCI bridge product: C610/X99 series chipset PCI Express Root Port #5 vendor: Intel Corporation physical id: 1c.4 bus info: pci@0000:00:1c.4 version: d5 width: 32 bits clock: 33MHz capabilities: pci pciexpress msi pm normal_decode bus_master cap_list configuration: driver=pcieport resources: irq:32 memory:fb300000-fb3fffff *-usb description: USB controller product: ASM1142 USB 3.1 Host Controller vendor: ASMedia Technology Inc. 
physical id: 0 bus info: pci@0000:06:00.0 version: 00 width: 64 bits clock: 33MHz capabilities: msi msix pm pciexpress xhci bus_master cap_list configuration: driver=xhci_hcd latency=0 resources: irq:17 memory:fb300000-fb307fff *-usbhost:0 product: xHCI Host Controller vendor: Linux 5.14.0 xhci-hcd physical id: 0 bus info: usb@5 logical name: usb5 version: 5.14 capabilities: usb-2.00 configuration: driver=hub slots=2 speed=480Mbit/s *-usbhost:1 product: xHCI Host Controller vendor: Linux 5.14.0 xhci-hcd physical id: 1 bus info: usb@6 logical name: usb6 version: 5.14 capabilities: usb-3.10 configuration: driver=hub slots=2 speed=10000Mbit/s *-usb:2 description: USB controller product: C610/X99 series chipset USB Enhanced Host Controller #1 vendor: Intel Corporation physical id: 1d bus info: pci@0000:00:1d.0 version: 05 width: 32 bits clock: 33MHz capabilities: pm debug ehci bus_master cap_list configuration: driver=ehci-pci latency=0 resources: irq:21 memory:fb434000-fb4343ff *-usbhost product: EHCI Host Controller vendor: Linux 5.14.0 ehci_hcd physical id: 1 bus info: usb@2 logical name: usb2 version: 5.14 capabilities: usb-2.00 configuration: driver=hub slots=2 speed=480Mbit/s *-usb description: USB hub vendor: Intel Corp. 
physical id: 1 bus info: usb@2:1 version: 0.05 capabilities: usb-2.00 configuration: driver=hub slots=8 speed=480Mbit/s *-isa description: ISA bridge product: C610/X99 series chipset LPC Controller vendor: Intel Corporation physical id: 1f bus info: pci@0000:00:1f.0 version: 05 width: 32 bits clock: 33MHz capabilities: isa bus_master cap_list configuration: driver=lpc_ich latency=0 resources: irq:0 *-sata:1 description: SATA controller product: C610/X99 series chipset 6-Port SATA Controller [AHCI mode] vendor: Intel Corporation physical id: 1f.2 bus info: pci@0000:00:1f.2 version: 05 width: 32 bits clock: 66MHz capabilities: sata msi pm ahci_1.0 bus_master cap_list configuration: driver=ahci latency=0 resources: irq:44 ioport:f130(size=8) ioport:f120(size=4) ioport:f110(size=8) ioport:f100(size=4) ioport:f040(size=32) memory:fb43d000-fb43d7ff *-serial description: SMBus product: C610/X99 series chipset SMBus Controller vendor: Intel Corporation physical id: 1f.3 bus info: pci@0000:00:1f.3 version: 05 width: 64 bits clock: 33MHz configuration: driver=i801_smbus latency=0 resources: irq:18 memory:fb43c000-fb43c0ff ioport:580(size=32) *-generic:0 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D R3 QPI Link 0/1 vendor: Intel Corporation physical id: b bus info: pci@0000:ff:0b.0 version: 01 width: 32 bits clock: 33MHz configuration: latency=0 *-generic:1 description: Performance counters product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D R3 QPI Link 0/1 vendor: Intel Corporation physical id: b.1 bus info: pci@0000:ff:0b.1 version: 01 width: 32 bits clock: 33MHz configuration: driver=bdx_uncore latency=0 resources: irq:0 *-generic:2 description: Performance counters product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D R3 QPI Link 0/1 vendor: Intel Corporation physical id: b.2 bus info: pci@0000:ff:0b.2 version: 01 width: 32 bits clock: 33MHz configuration: driver=bdx_uncore latency=0 resources: irq:0 *-generic:3 UNCLAIMED description: 
System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D R3 QPI Link Debug vendor: Intel Corporation physical id: b.3 bus info: pci@0000:ff:0b.3 version: 01 width: 32 bits clock: 33MHz configuration: latency=0 *-generic:4 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Caching Agent vendor: Intel Corporation physical id: c bus info: pci@0000:ff:0c.0 version: 01 width: 32 bits clock: 33MHz configuration: latency=0 *-generic:5 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Caching Agent vendor: Intel Corporation physical id: c.1 bus info: pci@0000:ff:0c.1 version: 01 width: 32 bits clock: 33MHz configuration: latency=0 *-generic:6 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Caching Agent vendor: Intel Corporation physical id: c.2 bus info: pci@0000:ff:0c.2 version: 01 width: 32 bits clock: 33MHz configuration: latency=0 *-generic:7 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Caching Agent vendor: Intel Corporation physical id: c.3 bus info: pci@0000:ff:0c.3 version: 01 width: 32 bits clock: 33MHz configuration: latency=0 *-generic:8 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Caching Agent vendor: Intel Corporation physical id: c.4 bus info: pci@0000:ff:0c.4 version: 01 width: 32 bits clock: 33MHz configuration: latency=0 *-generic:9 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Caching Agent vendor: Intel Corporation physical id: c.5 bus info: pci@0000:ff:0c.5 version: 01 width: 32 bits clock: 33MHz configuration: latency=0 *-generic:10 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Caching Agent vendor: Intel Corporation physical id: f bus info: pci@0000:ff:0f.0 version: 01 width: 32 bits clock: 33MHz configuration: latency=0 *-generic:11 UNCLAIMED 
description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Caching Agent vendor: Intel Corporation physical id: f.1 bus info: pci@0000:ff:0f.1 version: 01 width: 32 bits clock: 33MHz configuration: latency=0 *-generic:12 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Caching Agent vendor: Intel Corporation physical id: f.4 bus info: pci@0000:ff:0f.4 version: 01 width: 32 bits clock: 33MHz configuration: latency=0 *-generic:13 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Caching Agent vendor: Intel Corporation physical id: f.5 bus info: pci@0000:ff:0f.5 version: 01 width: 32 bits clock: 33MHz configuration: latency=0 *-generic:14 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Caching Agent vendor: Intel Corporation physical id: f.6 bus info: pci@0000:ff:0f.6 version: 01 width: 32 bits clock: 33MHz configuration: latency=0 *-generic:15 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D R2PCIe Agent vendor: Intel Corporation physical id: 10 bus info: pci@0000:ff:10.0 version: 01 width: 32 bits clock: 33MHz configuration: latency=0 *-generic:16 description: Performance counters product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D R2PCIe Agent vendor: Intel Corporation physical id: 10.1 bus info: pci@0000:ff:10.1 version: 01 width: 32 bits clock: 33MHz configuration: driver=bdx_uncore latency=0 resources: irq:0 *-generic:17 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Ubox vendor: Intel Corporation physical id: 10.5 bus info: pci@0000:ff:10.5 version: 01 width: 32 bits clock: 33MHz configuration: latency=0 *-generic:18 UNCLAIMED description: Performance counters product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Ubox vendor: Intel Corporation physical id: 10.6 bus info: pci@0000:ff:10.6 version: 01 width: 32 bits clock: 33MHz configuration: 
latency=0 *-generic:19 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Ubox vendor: Intel Corporation physical id: 10.7 bus info: pci@0000:ff:10.7 version: 01 width: 32 bits clock: 33MHz configuration: latency=0 *-generic:20 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Home Agent 0 vendor: Intel Corporation physical id: 12 bus info: pci@0000:ff:12.0 version: 01 width: 32 bits clock: 33MHz configuration: latency=0 *-generic:21 description: Performance counters product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Home Agent 0 vendor: Intel Corporation physical id: 12.1 bus info: pci@0000:ff:12.1 version: 01 width: 32 bits clock: 33MHz configuration: driver=bdx_uncore latency=0 resources: irq:0 *-generic:22 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Target Address/Thermal/RAS vendor: Intel Corporation physical id: 13 bus info: pci@0000:ff:13.0 version: 01 width: 32 bits clock: 33MHz capabilities: pciexpress cap_list configuration: latency=0 *-generic:23 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Target Address/Thermal/RAS vendor: Intel Corporation physical id: 13.1 bus info: pci@0000:ff:13.1 version: 01 width: 32 bits clock: 33MHz capabilities: pciexpress cap_list configuration: latency=0 *-generic:24 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel Target Address Decoder vendor: Intel Corporation physical id: 13.2 bus info: pci@0000:ff:13.2 version: 01 width: 32 bits clock: 33MHz capabilities: pciexpress cap_list configuration: latency=0 *-generic:25 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel Target Address Decoder vendor: Intel Corporation physical id: 13.3 bus info: pci@0000:ff:13.3 version: 01 width: 
32 bits clock: 33MHz capabilities: pciexpress cap_list configuration: latency=0 *-generic:26 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel Target Address Decoder vendor: Intel Corporation physical id: 13.4 bus info: pci@0000:ff:13.4 version: 01 width: 32 bits clock: 33MHz capabilities: pciexpress cap_list configuration: latency=0 *-generic:27 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel Target Address Decoder vendor: Intel Corporation physical id: 13.5 bus info: pci@0000:ff:13.5 version: 01 width: 32 bits clock: 33MHz capabilities: pciexpress cap_list configuration: latency=0 *-generic:28 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DDRIO Channel 0/1 Broadcast vendor: Intel Corporation physical id: 13.6 bus info: pci@0000:ff:13.6 version: 01 width: 32 bits clock: 33MHz configuration: latency=0 *-generic:29 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DDRIO Global Broadcast vendor: Intel Corporation physical id: 13.7 bus info: pci@0000:ff:13.7 version: 01 width: 32 bits clock: 33MHz configuration: latency=0 *-generic:30 description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel 0 Thermal Control vendor: Intel Corporation physical id: 14 bus info: pci@0000:ff:14.0 version: 01 width: 32 bits clock: 33MHz capabilities: pciexpress cap_list configuration: driver=bdx_uncore latency=0 resources: irq:0 *-generic:31 description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel 1 Thermal Control vendor: Intel Corporation physical id: 14.1 bus info: pci@0000:ff:14.1 version: 01 width: 32 bits clock: 33MHz capabilities: pciexpress cap_list configuration: driver=bdx_uncore latency=0 resources: irq:0 *-generic:32 UNCLAIMED description: System 
peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel 0 Error vendor: Intel Corporation physical id: 14.2 bus info: pci@0000:ff:14.2 version: 01 width: 32 bits clock: 33MHz capabilities: pciexpress cap_list configuration: latency=0 *-generic:33 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel 1 Error vendor: Intel Corporation physical id: 14.3 bus info: pci@0000:ff:14.3 version: 01 width: 32 bits clock: 33MHz capabilities: pciexpress cap_list configuration: latency=0 *-generic:34 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DDRIO Channel 0/1 Interface vendor: Intel Corporation physical id: 14.4 bus info: pci@0000:ff:14.4 version: 01 width: 32 bits clock: 33MHz configuration: latency=0 *-generic:35 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DDRIO Channel 0/1 Interface vendor: Intel Corporation physical id: 14.5 bus info: pci@0000:ff:14.5 version: 01 width: 32 bits clock: 33MHz configuration: latency=0 *-generic:36 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DDRIO Channel 0/1 Interface vendor: Intel Corporation physical id: 14.6 bus info: pci@0000:ff:14.6 version: 01 width: 32 bits clock: 33MHz configuration: latency=0 *-generic:37 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DDRIO Channel 0/1 Interface vendor: Intel Corporation physical id: 14.7 bus info: pci@0000:ff:14.7 version: 01 width: 32 bits clock: 33MHz configuration: latency=0 *-generic:38 description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel 2 Thermal Control vendor: Intel Corporation physical id: 15 bus info: pci@0000:ff:15.0 version: 01 width: 32 bits clock: 33MHz capabilities: pciexpress cap_list configuration: driver=bdx_uncore latency=0 resources: irq:0 
*-generic:39 description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel 3 Thermal Control vendor: Intel Corporation physical id: 15.1 bus info: pci@0000:ff:15.1 version: 01 width: 32 bits clock: 33MHz capabilities: pciexpress cap_list configuration: driver=bdx_uncore latency=0 resources: irq:0 *-generic:40 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel 2 Error vendor: Intel Corporation physical id: 15.2 bus info: pci@0000:ff:15.2 version: 01 width: 32 bits clock: 33MHz capabilities: pciexpress cap_list configuration: latency=0 *-generic:41 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel 3 Error vendor: Intel Corporation physical id: 15.3 bus info: pci@0000:ff:15.3 version: 01 width: 32 bits clock: 33MHz capabilities: pciexpress cap_list configuration: latency=0 *-generic:42 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Target Address/Thermal/RAS vendor: Intel Corporation physical id: 16 bus info: pci@0000:ff:16.0 version: 01 width: 32 bits clock: 33MHz capabilities: cap_list configuration: latency=0 *-generic:43 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DDRIO Channel 2/3 Broadcast vendor: Intel Corporation physical id: 16.6 bus info: pci@0000:ff:16.6 version: 01 width: 32 bits clock: 33MHz configuration: latency=0 *-generic:44 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DDRIO Global Broadcast vendor: Intel Corporation physical id: 16.7 bus info: pci@0000:ff:16.7 version: 01 width: 32 bits clock: 33MHz configuration: latency=0 *-generic:45 description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 1 - Channel 0 Thermal Control vendor: Intel Corporation physical id: 17 bus info: pci@0000:ff:17.0 
version: 01 width: 32 bits clock: 33MHz capabilities: cap_list configuration: driver=bdx_uncore latency=0 resources: irq:0 *-generic:46 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DDRIO Channel 2/3 Interface vendor: Intel Corporation physical id: 17.4 bus info: pci@0000:ff:17.4 version: 01 width: 32 bits clock: 33MHz configuration: latency=0 *-generic:47 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DDRIO Channel 2/3 Interface vendor: Intel Corporation physical id: 17.5 bus info: pci@0000:ff:17.5 version: 01 width: 32 bits clock: 33MHz configuration: latency=0 *-generic:48 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DDRIO Channel 2/3 Interface vendor: Intel Corporation physical id: 17.6 bus info: pci@0000:ff:17.6 version: 01 width: 32 bits clock: 33MHz configuration: latency=0 *-generic:49 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DDRIO Channel 2/3 Interface vendor: Intel Corporation physical id: 17.7 bus info: pci@0000:ff:17.7 version: 01 width: 32 bits clock: 33MHz configuration: latency=0 *-generic:50 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit vendor: Intel Corporation physical id: 1e bus info: pci@0000:ff:1e.0 version: 01 width: 32 bits clock: 33MHz configuration: latency=0 *-generic:51 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit vendor: Intel Corporation physical id: 1e.1 bus info: pci@0000:ff:1e.1 version: 01 width: 32 bits clock: 33MHz configuration: latency=0 *-generic:52 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit vendor: Intel Corporation physical id: 1e.2 bus info: pci@0000:ff:1e.2 version: 01 width: 32 bits clock: 33MHz configuration: latency=0 *-generic:53 UNCLAIMED description: System 
peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit vendor: Intel Corporation physical id: 1e.3 bus info: pci@0000:ff:1e.3 version: 01 width: 64 bits clock: 33MHz configuration: latency=0 *-generic:54 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit vendor: Intel Corporation physical id: 1e.4 bus info: pci@0000:ff:1e.4 version: 01 width: 32 bits clock: 33MHz configuration: latency=0 *-generic:55 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit vendor: Intel Corporation physical id: 1f bus info: pci@0000:ff:1f.0 version: 01 width: 32 bits clock: 33MHz configuration: latency=0 *-generic:56 UNCLAIMED description: System peripheral product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit vendor: Intel Corporation physical id: 1f.2 bus info: pci@0000:ff:1f.2 version: 01 width: 32 bits clock: 33MHz configuration: latency=0 *-pnp00:00 product: PnP device PNP0b00 physical id: 1 capabilities: pnp configuration: driver=rtc_cmos *-pnp00:01 product: PnP device PNP0c02 physical id: 2 capabilities: pnp configuration: driver=system *-pnp00:02 product: PnP device PNP0c02 physical id: 3 capabilities: pnp configuration: driver=system *-pnp00:03 product: PnP device PNP0501 physical id: 4 capabilities: pnp configuration: driver=serial *-scsi:0 physical id: 5 logical name: scsi2 capabilities: emulated *-disk description: ATA Disk product: WDC WD100EFAX-68 vendor: Western Digital physical id: 0.0.0 bus info: scsi@2:0.0.0 logical name: /dev/sda version: 0A83 serial: JEJXVLXN size: 9314GiB (10TB) configuration: ansiversion=5 logicalsectorsize=512 sectorsize=4096 *-scsi:1 physical id: 6 logical name: scsi3 capabilities: emulated *-disk description: ATA Disk product: WDC WD100EFAX-68 vendor: Western Digital physical id: 0.0.0 bus info: scsi@3:0.0.0 logical name: /dev/sdb version: 0A83 serial: JEJWTYNN size: 9314GiB (10TB) 
configuration: ansiversion=5 logicalsectorsize=512 sectorsize=4096 *-scsi:2 physical id: 7 logical name: scsi4 capabilities: emulated *-disk description: ATA Disk product: WDC WD2002FAEX-0 vendor: Western Digital physical id: 0.0.0 bus info: scsi@4:0.0.0 logical name: /dev/sdc version: 1D05 serial: WD-WMAWP0382725 size: 1863GiB (2TB) capabilities: gpt-1.00 partitioned partitioned:gpt configuration: ansiversion=5 guid=514bbf6c-cc02-458b-82f2-2bf484dbb2ca logicalsectorsize=512 sectorsize=512 *-volume:0 description: BIOS Boot partition vendor: EFI physical id: 1 bus info: scsi@4:0.0.0,1 logical name: /dev/sdc1 serial: c72aa955-eb69-4dc5-9b15-9f0f86f0141f capacity: 8015KiB capabilities: nofs configuration: name=bios_grub *-volume:1 description: Windows FAT volume vendor: mkfs.fat physical id: 2 bus info: scsi@4:0.0.0,2 logical name: /dev/sdc2 logical name: /boot/efi version: FAT32 serial: ebaa-74cc size: 188MiB capacity: 196MiB capabilities: boot fat initialized configuration: FATs=2 filesystem=fat label=EFI BOOT mount.fstype=vfat mount.options=rw,relatime,fmask=0077,dmask=0077,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro name=EFI System Partition state=mounted *-volume:2 description: EXT4 volume vendor: Linux physical id: 3 bus info: scsi@4:0.0.0,3 logical name: /dev/sdc3 logical name: /boot version: 1.0 serial: 806823aa-fd55-4613-bb5e-0d3f8f367b2f size: 501MiB capabilities: journaled extended_attributes large_files huge_files dir_nlink recover 64bit extents ext4 ext2 initialized configuration: created=2017-05-05 07:15:27 filesystem=ext4 lastmountpoint=/boot modified=2021-09-02 00:11:29 mount.fstype=ext4 mount.options=rw,relatime mounted=2021-09-02 00:11:29 name=boot state=mounted *-volume:3 description: Linux swap volume vendor: Linux physical id: 4 bus info: scsi@4:0.0.0,4 logical name: /dev/sdc4 version: 1 serial: 963cb206-8962-4fc0-82a1-fc4f02a9b5c5 size: 249GiB capacity: 249GiB capabilities: nofs swap initialized configuration: 
filesystem=swap name=swap pagesize=4096 *-volume:4 description: EXT4 volume vendor: Linux physical id: 5 bus info: scsi@4:0.0.0,5 logical name: /dev/sdc5 logical name: / version: 1.0 serial: 28825f5b-a6fd-4e09-982c-0513ae4d2842 size: 1612GiB capacity: 1612GiB capabilities: journaled extended_attributes large_files huge_files dir_nlink recover 64bit extents ext4 ext2 initialized configuration: created=2017-05-05 07:15:27 filesystem=ext4 lastmountpoint=/ modified=2021-09-02 00:11:07 mount.fstype=ext4 mount.options=rw,relatime,errors=remount-ro mounted=2021-09-02 00:11:21 name=root state=mounted *-scsi:3 physical id: 8 logical name: scsi5 capabilities: emulated *-disk description: ATA Disk product: WDC WD4001FAEX-0 vendor: Western Digital physical id: 0.0.0 bus info: scsi@5:0.0.0 logical name: /dev/sdd version: 1L01 serial: WD-WCC131138500 size: 3726GiB (4TB) configuration: ansiversion=5 logicalsectorsize=512 sectorsize=512 *-scsi:4 physical id: 9 logical name: scsi6 capabilities: emulated *-disk description: ATA Disk product: WDC WD4003FZEX-0 vendor: Western Digital physical id: 0.0.0 bus info: scsi@6:0.0.0 logical name: /dev/sde version: 1A01 serial: WD-WMC5D0D8WL11 size: 3726GiB (4TB) configuration: ansiversion=5 logicalsectorsize=512 sectorsize=4096 *-scsi:5 physical id: a logical name: scsi7 capabilities: emulated *-disk description: ATA Disk product: HGST HUS726T4TAL physical id: 0.0.0 bus info: scsi@7:0.0.0 logical name: /dev/sdf version: W40H serial: V6HEM0PS size: 3726GiB (4TB) configuration: ansiversion=5 logicalsectorsize=512 sectorsize=512 *-scsi:6 physical id: d logical name: scsi8 capabilities: emulated *-disk description: ATA Disk product: WDC WD4003FZEX-0 vendor: Western Digital physical id: 0.0.0 bus info: scsi@8:0.0.0 logical name: /dev/sdg version: 1A01 serial: WD-WMC5D0D4H5TC size: 3726GiB (4TB) configuration: ansiversion=5 logicalsectorsize=512 sectorsize=4096 *-scsi:7 physical id: e logical name: scsi9 capabilities: emulated *-cdrom description: 
DVD-RAM writer product: DRW-24B1ST a vendor: ASUS physical id: 0.0.0 bus info: scsi@9:0.0.0 logical name: /dev/cdrom logical name: /dev/cdrw logical name: /dev/dvd logical name: /dev/dvdrw logical name: /dev/sr0 version: 1.04 capabilities: removable audio cd-r cd-rw dvd dvd-r dvd-ram configuration: ansiversion=5 status=nodisc *-power UNCLAIMED description: To Be Filled By O.E.M. product: To Be Filled By O.E.M. vendor: To Be Filled By O.E.M. physical id: 1 version: To Be Filled By O.E.M. serial: To Be Filled By O.E.M. 
capacity: 32768mWh *-network:0 DISABLED description: Ethernet interface physical id: 2 logical name: virbr0-nic serial: 52:54:00:15:2d:d0 size: 10Mbit/s capabilities: ethernet physical configuration: autonegotiation=off broadcast=yes driver=tun driverversion=1.6 duplex=full link=no multicast=yes port=twisted pair speed=10Mbit/s *-network:1 description: Ethernet interface physical id: 3 logical name: vnet0 serial: fe:54:00:23:40:47 size: 10Mbit/s capabilities: ethernet physical configuration: autonegotiation=off broadcast=yes driver=tun driverversion=1.6 duplex=full link=yes multicast=yes port=twisted pair speed=10Mbit/s *-network:2 description: Ethernet interface physical id: 4 logical name: vnet1 serial: fe:54:00:6f:da:d0 size: 10Mbit/s capabilities: ethernet physical configuration: autonegotiation=off broadcast=yes driver=tun driverversion=1.6 duplex=full link=yes multicast=yes port=twisted pair speed=10Mbit/s *-network:3 description: Ethernet interface physical id: 5 logical name: vnet2 serial: fe:54:00:31:7f:f7 size: 10Mbit/s capabilities: ethernet physical configuration: autonegotiation=off broadcast=yes driver=tun driverversion=1.6 duplex=full link=yes multicast=yes port=twisted pair speed=10Mbit/s *-network:4 description: Ethernet interface physical id: 6 logical name: vnet3 serial: fe:54:00:51:fc:f3 size: 10Mbit/s capabilities: ethernet physical configuration: autonegotiation=off broadcast=yes driver=tun driverversion=1.6 duplex=full link=yes multicast=yes port=twisted pair speed=10Mbit/s *-network:5 description: Ethernet interface physical id: 7 logical name: vnet4 serial: fe:54:00:da:81:0c size: 10Mbit/s capabilities: ethernet physical configuration: autonegotiation=off broadcast=yes driver=tun driverversion=1.6 duplex=full link=yes multicast=yes port=twisted pair speed=10Mbit/s
The problem is definitely related to the Intel driver. I have had no issues with servers that have the Broadcom chip and exactly the same OS configuration, installed packages, kernel version, etc. It always appeared with the igb driver. The workaround with ethtool -K eno1 gso off gro off tso off didn't really help. I also tried changing the smp_affinity of the related IRQs (I thought it might be related). No luck. The good news is that after upgrading to kernel 5.11.0-40-generic the issue is no longer reproducible (I have an environment where it was easy to test and always resulted in an adapter reset). Now no resets are visible, and the "Detected Tx Unit Hang" message no longer appears.
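To compare hang reports like the ones quoted in this thread before and after a kernel upgrade, it can help to pull the ring indices out of the log text. The following is only a sketch: the function name summarize_hangs and both regular expressions are my own assumptions, written against the two message formats quoted in this report ("Detected Tx Unit Hang" and "Detected Hardware Unit Hang"), not taken from the driver source.

```python
import re

# Matches the two message variants quoted in this report.
HANG_RE = re.compile(r"Detected (?:Tx|Hardware) Unit Hang")

# Ring fields are printed as "NAME <hex>", for example "TDH <bd>".
# "buffer_info[next_to_clean]" is not matched because no " <" follows the name.
FIELD_RE = re.compile(r"\b(TDH|TDT|next_to_use|next_to_clean)\s+<([0-9a-fA-F]+)>")


def summarize_hangs(log_text):
    """Return one dict of ring fields (as integers) per hang message found."""
    reports = []
    matches = list(HANG_RE.finditer(log_text))
    for i, match in enumerate(matches):
        # A report runs from this message up to the next hang message (or the end).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(log_text)
        chunk = log_text[match.end():end]
        fields = {}
        for name, value in FIELD_RE.findall(chunk):
            # Keep the first occurrence of each field within the report.
            fields.setdefault(name, int(value, 16))
        reports.append(fields)
    return reports


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): dmesg | python3 summarize_hangs.py
    for report in summarize_hangs(sys.stdin.read()):
        print(report)
```

An empty result after the upgrade would be consistent with the observation above that the resets stopped. Equal TDH and TDT values together with a lagging next_to_clean, as in the first report of this thread, are what the script would make easy to spot across many log lines.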
Kernels: the problem arises under both 5.18.3 and 5.15.42 (and possibly others), so it doesn't seem to have gone away. It happens mostly straight after the initial DHCP exchange, when there is not much network load. If it doesn't happen, the system boots correctly and works. Journal snippet: Jun 10 15:30:30 zerz2001 kernel: e1000e 0000:00:1f.6 boot0: Detected Hardware Unit Hang: TDH <0> TDT <1> next_to_use <1> next_to_clean <0> buffer_info[next_to_clean]: time_stamp <fffee272> next_to_watch <0> jiffies <fffee690> next_to_watch.status <0> MAC Status <80083> PHY Status <796d> PHY 1000BASE-T Status <3c00> PHY Extended Status <3000> PCI Status <10> Jun 10 15:30:32 zerz2001 kernel: ------------[ cut here ]------------ Jun 10 15:30:32 zerz2001 kernel: NETDEV WATCHDOG: boot0 (e1000e): transmit queue 0 timed out Jun 10 15:30:32 zerz2001 kernel: WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:477 dev_watchdog+0x267/0x270 Jun 10 15:30:32 zerz2001 kernel: Modules linked in: i915 aesni_intel crypto_simd cec rc_core ahci intel_gtt cryptd e1000e ttm libahci wmi video xloop_file_fmt_raw(OE) xloop_file_fmt_qcow(OE) xloop(OE) dnbd3(OE) autofs4 Jun 10 15:30:32 zerz2001 kernel: CPU: 2 PID: 0 Comm: swapper/2 Tainted: G OE 5.15.42 #1 Jun 10 15:30:32 zerz2001 kernel: Hardware name: FUJITSU ESPRIMO Q958/D3613-A1, BIOS V5.0.0.13 R1.13.0 for D3613-A1x 02/11/2019 Jun 10 15:30:32 zerz2001 kernel: RIP: 0010:dev_watchdog+0x267/0x270 Jun 10 15:30:32 zerz2001 kernel: Code: eb a6 48 8b 5d d0 c6 05 85 a7 fc 00 01 48 89 df e8 4e e0 f9 ff 44 89 e1 48 89 de 48 c7 c7 28 b1 71 bd 48 89 c2 e8 f7 3b 18 00 <0f> 0b eb 83 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 57 49 89 Jun 10 15:30:32 zerz2001 kernel: RSP: 0018:ffffac4d00148e88 EFLAGS: 00010282 Jun 10 15:30:32 zerz2001 kernel: RAX: 0000000000000000 RBX: ffff8db199b7c000 RCX: 000000000000083f Jun 10 15:30:32 zerz2001 kernel: RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 000000000000083f Jun 10 15:30:32 zerz2001 kernel: RBP: ffffac4d00148ec0 R08: ffff8db4dc4a03d0 R09: 
ffffac4d00148c68 Jun 10 15:30:32 zerz2001 kernel: R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000 Jun 10 15:30:32 zerz2001 kernel: R13: ffff8db181010280 R14: 0000000000000001 R15: ffff8db199b7c4c0 Jun 10 15:30:32 zerz2001 kernel: FS: 0000000000000000(0000) GS:ffff8db4dc480000(0000) knlGS:0000000000000000 Jun 10 15:30:32 zerz2001 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jun 10 15:30:32 zerz2001 kernel: CR2: 0000556836b811d8 CR3: 00000001c620a004 CR4: 00000000003706e0 Jun 10 15:30:32 zerz2001 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Jun 10 15:30:32 zerz2001 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Jun 10 15:30:32 zerz2001 kernel: Call Trace: Jun 10 15:30:32 zerz2001 kernel: <IRQ> Jun 10 15:30:32 zerz2001 kernel: ? pfifo_fast_enqueue+0x150/0x150 Jun 10 15:30:32 zerz2001 kernel: call_timer_fn+0x29/0x100 Jun 10 15:30:32 zerz2001 kernel: run_timer_softirq+0x413/0x490 Jun 10 15:30:32 zerz2001 kernel: ? lapic_next_deadline+0x2c/0x40 Jun 10 15:30:32 zerz2001 kernel: ? 
clockevents_program_event+0x8f/0xe0 Jun 10 15:30:32 zerz2001 kernel: __do_softirq+0xcc/0x275 Jun 10 15:30:32 zerz2001 kernel: irq_exit_rcu+0x79/0xa0 Jun 10 15:30:32 zerz2001 kernel: sysvec_apic_timer_interrupt+0x7c/0x90 Jun 10 15:30:32 zerz2001 kernel: </IRQ> Jun 10 15:30:32 zerz2001 kernel: <TASK> Jun 10 15:30:32 zerz2001 kernel: asm_sysvec_apic_timer_interrupt+0x12/0x20 Jun 10 15:30:32 zerz2001 kernel: RIP: 0010:cpuidle_enter_state+0xcc/0x360 Jun 10 15:30:32 zerz2001 kernel: Code: 3d 61 f5 56 43 e8 34 cf 68 ff 49 89 c6 0f 1f 44 00 00 31 ff e8 85 da 68 ff 80 7d d7 00 0f 85 01 01 00 00 fb 66 0f 1f 44 00 00 <45> 85 ff 0f 88 0d 01 00 00 49 63 c7 4c 2b 75 c8 48 8d 14 40 48 8d Jun 10 15:30:32 zerz2001 kernel: RSP: 0018:ffffac4d000cfe48 EFLAGS: 00000246 Jun 10 15:30:32 zerz2001 kernel: RAX: ffff8db4dc4b07c0 RBX: 0000000000000008 RCX: 000000000000001f Jun 10 15:30:32 zerz2001 kernel: RDX: 0000000000000000 RSI: 000000003c9b3d5b RDI: 0000000000000000 Jun 10 15:30:32 zerz2001 kernel: RBP: ffffac4d000cfe80 R08: 0000000310e8c483 R09: 0000000000022b00 Jun 10 15:30:32 zerz2001 kernel: R10: 00000000dc4a2000 R11: ffff8db4dc4af484 R12: ffffcc4cffc91300 Jun 10 15:30:32 zerz2001 kernel: R13: ffffffffbda793a0 R14: 0000000310e8c483 R15: 0000000000000008 Jun 10 15:30:32 zerz2001 kernel: ? 
cpuidle_enter_state+0xbb/0x360 Jun 10 15:30:32 zerz2001 kernel: cpuidle_enter+0x2e/0x40 Jun 10 15:30:32 zerz2001 kernel: call_cpuidle+0x23/0x40 Jun 10 15:30:32 zerz2001 kernel: do_idle+0x1e6/0x260 Jun 10 15:30:32 zerz2001 kernel: cpu_startup_entry+0x20/0x30 Jun 10 15:30:32 zerz2001 kernel: start_secondary+0x11a/0x150 Jun 10 15:30:32 zerz2001 kernel: secondary_startup_64_no_verify+0xb0/0xbb Jun 10 15:30:32 zerz2001 kernel: </TASK> Jun 10 15:30:32 zerz2001 kernel: ---[ end trace f63dbc1bc6719ce1 ]--- lspci says: 00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (7) I219-LM (rev 10) lspci -n: 00:1f.6 0200: 8086:15bb (rev 10) lspci -v: 00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (7) I219-LM (rev 10) DeviceName: Onboard - Ethernet Subsystem: Fujitsu Technology Solutions Ethernet Connection (7) I219-LM Flags: bus master, fast devsel, latency 0, IRQ 132 Memory at 8f900000 (32-bit, non-prefetchable) [size=128K] Capabilities: [c8] Power Management version 3 Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+ Kernel driver in use: e1000e Kernel modules: e1000e
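For anyone comparing hang reports across machines, here is a minimal sketch of pulling the hex fields out of a "Detected Hardware Unit Hang" line and working out how many jiffies the oldest pending descriptor had been waiting. The helper names (parse_hang, stalled_jiffies) are made up for illustration and are not part of any driver or tool. The sketch assumes the one-line field layout quoted in this report and does not handle jiffies wraparound. Converting jiffies to seconds needs the kernel's HZ value, which is not stated here.

```python
import re

# Field names as they appear in the hang report quoted in this thread.
# "next_to_watch.status" and "buffer_info[next_to_clean]" are deliberately
# not matched, because the name must be followed by whitespace and "<".
FIELD_RE = re.compile(
    r"\b(TDH|TDT|next_to_use|next_to_clean|time_stamp|next_to_watch|jiffies)"
    r"\s+<([0-9a-fA-F]+)>"
)

def parse_hang(line):
    """Return the ring/timer fields of one hang report as integers."""
    return {name: int(value, 16) for name, value in FIELD_RE.findall(line)}

def stalled_jiffies(fields):
    """Jiffies elapsed between the buffer time stamp and the report."""
    return fields["jiffies"] - fields["time_stamp"]

# Sample taken from the journal snippet above (status fields trimmed).
SAMPLE = (
    "e1000e 0000:00:1f.6 boot0: Detected Hardware Unit Hang: "
    "TDH <0> TDT <1> next_to_use <1> next_to_clean <0> "
    "buffer_info[next_to_clean]: time_stamp <fffee272> next_to_watch <0> "
    "jiffies <fffee690> next_to_watch.status <0>"
)

if __name__ == "__main__":
    fields = parse_hang(SAMPLE)
    print(fields)
    print("stalled for", stalled_jiffies(fields), "jiffies")
```

For the report above this gives 0x41e (1054) jiffies between time_stamp and jiffies, with a single descriptor outstanding (TDH 0, TDT 1).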
So I'll just add to this, as I fell down this rabbit hole while trying to diagnose corosync failures where network connectivity on an Intel NUC dies for multiple seconds.

The error printed to dmesg is:

[293502.911056] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang: TDH <fe> TDT <65> next_to_use <65> next_to_clean <fd> buffer_info[next_to_clean]: time_stamp <1045e7288> next_to_watch <fe> jiffies <1045e7400> next_to_watch.status <0> MAC Status <80083> PHY Status <796d> PHY 1000BASE-T Status <3800> PHY Extended Status <3000> PCI Status <10>
[293504.899112] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang: TDH <fe> TDT <65> next_to_use <65> next_to_clean <fd> buffer_info[next_to_clean]: time_stamp <1045e7288> next_to_watch <fe> jiffies <1045e75f1> next_to_watch.status <0> MAC Status <80083> PHY Status <796d> PHY 1000BASE-T Status <3800> PHY Extended Status <3000> PCI Status <10>

Hardware details:

Extract from dmidecode:

Handle 0x0000, DMI type 0, 26 bytes
BIOS Information
        Vendor: Intel Corp.
        Version: PNWHL57v.0047.2022.0330.1436
        Release Date: 03/30/2022

System Information
        Manufacturer: Intel(R) Client Systems
        Product Name: NUC8v7PNH
        Version: K60013-402
        Serial Number: <sn>
        UUID: a15f153a-eb5f-11ea-a0f5-4002a8e72d00
        Wake-up Type: Power Switch
        SKU Number: BKNUC8v7PNH
        Family: PN

# lspci | grep Ethernet
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (6) I219-LM (rev 30)

# cat /sys/class/mei/mei0/fw_ver
0:12.0.90.2072
0:12.0.90.2072
0:12.0.45.1509

Kernel Version:
Linux version 5.15.35-3-pve (build@proxmox) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PVE 5.15.35-6 (Fri, 17 Jun 2022 13:42:35 +0200)

I am currently testing with tso / gso / gro disabled. This seems to reduce the problem, but I'm not yet convinced that it works around the issue.

In /etc/network/interfaces:

iface eno1 inet manual
        post-up /usr/bin/logger -p debug -t ifup "Disabling segmentation offload for eno1" && /sbin/ethtool -K $IFACE tso off gso off gro off && /usr/bin/logger -p debug -t ifup "Disabled offload for eno1"
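Since a post-up hook can fail quietly, it may be worth confirming that the offloads really ended up off before judging whether the workaround helps. Below is a rough sketch that reads the output of ethtool -k (lowercase k, which lists features) and reports any of the three offloads that are not shown as "off". The function names are hypothetical, and the sketch assumes ethtool's usual "name: state" output format.

```python
import subprocess

# Long feature names that `ethtool -k` prints for tso, gso and gro.
WANTED_OFF = (
    "tcp-segmentation-offload",
    "generic-segmentation-offload",
    "generic-receive-offload",
)

def parse_features(text):
    """Map feature name -> first word of its state ("on" or "off")."""
    features = {}
    for line in text.splitlines():
        name, sep, state = line.partition(":")
        state = state.strip()
        if not sep or not state:
            continue  # skips the "Features for <iface>:" header line
        # State may carry a suffix such as "off [fixed]"; keep the first word.
        features[name.strip()] = state.split()[0]
    return features

def still_enabled(text):
    """Return the wanted-off features that are not reported as off.

    A feature missing from the output is also returned, since its
    state could not be confirmed.
    """
    features = parse_features(text)
    return [name for name in WANTED_OFF if features.get(name) != "off"]

def check_interface(iface):
    """Run `ethtool -k <iface>` and return the offloads still enabled."""
    result = subprocess.run(
        ["ethtool", "-k", iface], capture_output=True, text=True, check=True
    )
    return still_enabled(result.stdout)

if __name__ == "__main__":
    # "eno1" is the interface name used in the comment above.
    print(check_interface("eno1"))
```

An empty list means all three offloads are reported as off for that interface.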