Bug 205047

Summary: e1000e driver crashes/resets connection
Product: Drivers Reporter: ginste51
Component: Network Assignee: drivers_network (drivers_network)
Status: NEW ---    
Severity: normal CC: adam, aminux, andymann375, anon.amish, bugzilla.kernel.org, cagnulein, carloscg, chemobejk, christian.rohmann, dan3805, dion, felix, grizzlyuser, hbayindir, jeffrey.t.kirsher, kimmo, lgpserranegra, lucas.yamanishi, marcus, mg05182-kernel, michael.groh, michael.j.lelli, mikegarcia556, nazar, nenad, ngodfriedt+bugzilla, null, nvaert1986, peter, pierrick, pnedkov, pyther, raxetul, sam.saffron, sasha.neftin, spamtrap, tabaire, thomas.natschlaeger, tmn505, vikb, vitaly.lifshits, wolkenschieber, xxdrshadowxx
Priority: P1    
Hardware: Intel   
OS: Linux   
Kernel Version: 5.3.1 Subsystem:
Regression: No Bisected commit-id:
Attachments: lspci of affected system
dmesg of ThinkPad T470s failing to connect (via NetworkManager)
dmesg of comment 7
journalctl for comment 7
dmesg of ThinkPad T470s - with patch from comment #11 applied.
attachment-5901-0.html
Thinkpad T470s - dmesg output with additional printk - 5.4.0-rc3
e1000e interface fails to connect with NetworkManager
Patch to revert commit 59653e6497d16f7ac1d9db088f3959f57ee8c3db based on 5.4.0-rc8
Rejects from Attachment #285979 when applied to kernel 5.5rc1
attachment-24216-0.html
dmesg output of Lenovo ThinkCentre M900 Tiny with the error

Description ginste51 2019-09-30 10:24:32 UTC
NIC:
Intel Corporation 82579V Gigabit Network Connection (rev 05)

Driver:
e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k

What happened:
After update to 5.3.1 kernel my wired network did not work any more.

Netctl showed the following:
systemd[1]: Starting Networking for netctl profile eno1-uninetz...
network[5456]: Starting network profile 'eno1-uninetz'...
network[5456]: No connection found on interface 'eno1' (timeout)
kernel: e1000e: eno1 NIC Link is Down
network[5456]: Failed to bring the network up for profile 'eno1-uninetz'
systemd[1]: netctl@eno1\x2duninetz.service: Main process exited, code=exited, status=1/FAILURE
systemd[1]: netctl@eno1\x2duninetz.service: Failed with result 'exit-code'.
systemd[1]: Failed to start Networking for netctl profile eno1-uninetz.

dmesg (very unspectacular):
e1000e: eno1 NIC Link is Down

For additional info, visit:
https://bbs.archlinux.org/viewtopic.php?pid=1866114

Kernel 4.19.75 (Arch LTS kernel) works though.
Comment 1 Tomasz Maciej Nowak 2019-10-01 17:54:46 UTC
This bug also affects me. 
When connected the interface tries to go up but is disconnected immediately and it goes on repeatedly.
My interface is 82577LC [8086:10eb] (more info in attached lspci output).
I bisected this to:

commit def4ec6dce393e2136b62a05712f35a7fa5f5e56
e1000e: PCIm function state support

After reverting this commit everything went back to normal.
The latest 5.4 rc1 does not fix the issue.
Comment 2 Tomasz Maciej Nowak 2019-10-01 17:56:12 UTC
Created attachment 285283 [details]
lspci of affected system
Comment 3 Sam Saffron 2019-10-02 02:04:14 UTC
Also happening here: 

00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2) I219-V
	Subsystem: ASUSTeK Computer Inc. Ethernet Connection (2) I219-V
	Flags: bus master, fast devsel, latency 0, IRQ 151
	Memory at df400000 (32-bit, non-prefetchable) [size=128K]
	Capabilities: [c8] Power Management version 3
	Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [e0] PCI Advanced Features
	Kernel driver in use: e1000e
	Kernel modules: e1000e
Comment 4 Vitaly Lifshits 2019-10-06 10:37:00 UTC
Please attach dmesg log.

If possible try applying the mentioned patch on an older kernel (4.19 for example).
Also please try to reproduce this on e1000e out-of-tree driver:
https://sourceforge.net/projects/e1000/files/e1000e%20stable/3.6.0/
Comment 5 Michael Groh 2019-10-08 07:30:24 UTC
I can confirm this problem with a Lenovo ThinkPad T470s. After upgrading to 5.4-rc1 I was unable to get my Ethernet connection working.

According to "lspci -v", I have the following NIC:

00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (4) I219-V (rev 21)
        Subsystem: Lenovo Ethernet Connection (4) I219-V
        Flags: bus master, fast devsel, latency 0, IRQ 132
        Memory at e2200000 (32-bit, non-prefetchable) [size=128K]
        Capabilities: [c8] Power Management version 3
        Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [e0] PCI Advanced Features
        Kernel driver in use: e1000e
        Kernel modules: e1000e

I will attach dmesg. I will also revert said commit and report back whether it works without "e1000e: PCIm function state support".

Thank you for your work,
Michael
Comment 6 Michael Groh 2019-10-08 07:31:52 UTC
Created attachment 285397 [details]
dmesg of ThinkPad T470s failing to connect (via NetworkManager)
Comment 7 ginste51 2019-10-08 12:48:45 UTC
I built the in-tree module for 4.19.77 with the cherry-picked

commit def4ec6dce393e2136b62a05712f35a7fa5f5e56
e1000e: PCIm function state support

Result:
Does not work. Same problem as reported initially.

See logs attached.
Comment 8 ginste51 2019-10-08 12:49:37 UTC
Created attachment 285403 [details]
dmesg of comment 7
Comment 9 ginste51 2019-10-08 12:50:03 UTC
Created attachment 285405 [details]
journalctl for comment 7
Comment 10 Michael Groh 2019-10-08 14:12:50 UTC
I can now confirm that on said ThinkPad T470s, 5.4.0-rc2 has a working Ethernet connection again if I revert the following commit:

commit def4ec6dce393e2136b62a05712f35a7fa5f5e56
e1000e: PCIm function state support

If someone needs me to test a new version of said patch, I will gladly help. I can also test the out-of-tree module with or without the patch if requested.

Thanks for your work,
Michael
Comment 11 Vitaly Lifshits 2019-10-10 05:59:04 UTC
Please try applying this patch:

http://patchwork.ozlabs.org/patch/1172931/
Comment 12 Michael Groh 2019-10-11 09:59:06 UTC
Created attachment 285465 [details]
dmesg of ThinkPad T470s - with patch from comment #11 applied.

I applied the patch from comment 11, and it seems that the bug is not fixed. After login, NetworkManager tries to connect but never succeeds.

Attached is the dmesg output. If you need more info, I am happy to help.

Thank you for your work,
Michael
Comment 13 Robin KERDILES 2019-10-11 18:08:11 UTC
Hi,

I can confirm that this issue affects at least two users of Z370/Z390-family motherboards on kernels 5.3.1 through 5.3.5:

https://bugs.archlinux.org/task/64018

I'm currently using a DKMS package of the e1000e driver, built without commit def4ec6dce393e2136b62a05712f35a7fa5f5e56
e1000e: PCIm function state support
Comment 14 Robin KERDILES 2019-10-11 18:10:23 UTC
Sorry, I thought there was some kind of edit function here. What I meant about the DKMS package above is that without commit def4ec6dce393e2136b62a05712f35a7fa5f5e56, the network interface works as expected.
Comment 15 Michael Groh 2019-10-17 07:21:24 UTC
I just wanted to add that, in order to have working Ethernet on the ThinkPad T470s with 5.4.0-rc3, I still had to revert commit def4ec6dce393e2136b62a05712f35a7fa5f5e56.

Is there a timeline for a potential fix? If there is not, I think the revert should be included in 5.4.0, since this is a regression on quite popular hardware.

Thank you for your work,
Michael
Comment 16 Sasha Neftin 2019-10-17 07:21:35 UTC
Created attachment 285515 [details]
attachment-5901-0.html

Out of office. Expected delayed response
Comment 17 Vitaly Lifshits 2019-10-17 08:02:00 UTC
We have not been able to reproduce this issue on our side, which is why fixing it is difficult.
We are working on reproducing it in order to get more information about this bug.

I think that reverting this patch is not possible since it's a bug fix for LM devices.

Until we have a system that reproduces it, I could use your help with debugging.
Can you please add this line before and after the patch in the watchdog_task function:
printk("e1000e deb: STATUS = %d\n", er32(STATUS));

Also please attach dmesg output.
Comment 18 Michael Groh 2019-10-17 08:22:57 UTC
Created attachment 285517 [details]
Thinkpad T470s - dmesg output with additional printk - 5.4.0-rc3

As requested, the dmesg output with a failing kernel (5.4.0-rc3) with additional printk lines.

It makes no difference whether the laptop is in its dock (with the Ethernet cable connected to the dock) or by itself (cable connected to the port on the laptop).

Also, STATUS = 1074266240 only appears when I disconnect the Ethernet cable, which I did at 80 and 149 seconds of runtime.

Let me know if I can help to debug this further.

Thank you for your work,
Michael
Comment 19 Vitaly Lifshits 2019-10-17 10:47:38 UTC
Please try:

1. rmmod mei && rmmod mei_me
2. removing the if in the patch and moving the call e1000_phy_hw_reset(&adapter->hw) outside of the while loop:

 if (!(pcim_state & E1000_STATUS_PCIM_STATE))
     e1000_phy_hw_reset(&adapter->hw);
Comment 20 Plamen Nedkov 2019-10-18 17:48:22 UTC
I am running Arch Linux on Dell T5610 with 82579LM rev 06 and I can easily reproduce the problem with all 5.3.x releases so far. After boot the e1000e network interface is constantly switching between "activated" and "deactivated" state every few seconds. The LEDs on the network port switch between going blank and blinking yellow every few seconds respectively. The last known kernel version where e1000e works fine is 5.2.14.

Please let me know how I can help.
Comment 21 Plamen Nedkov 2019-10-18 17:55:40 UTC
Created attachment 285549 [details]
e1000e interface fails to connect with NetworkManager
Comment 22 Michael Groh 2019-10-21 07:47:54 UTC
(In reply to Vitaly Lifshits from comment #19)
> Please try:
> 
> 1. rmmod mei && rmmod mei_me
> 2. removing the if in the patch and moving the call
> e1000_phy_hw_reset(&adapter->hw) outside of the while loop:
> 
>  if (!(pcim_state & E1000_STATUS_PCIM_STATE))
>      e1000_phy_hw_reset(&adapter->hw);

Hello Vitaly,

I tried both, and the problem is still there.

I did "rmmod mei_hdcp && rmmod mei_me && rmmod mei && rmmod e1000e && modprobe e1000e" but still can't get it to connect. dmesg says:

[  959.013605] e1000e 0000:00:1f.6 enp0s31f6: removed PHC
[  959.088952] e1000e: enp0s31f6 NIC Link is Down
[  959.133390] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k
[  959.133392] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[  959.133597] e1000e 0000:00:1f.6: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
[  959.323759] e1000e 0000:00:1f.6 0000:00:1f.6 (uninitialized): registered PHC clock
[  959.388088] e1000e 0000:00:1f.6 eth0: (PCI Express:2.5GT/s:Width x1) c8:5b:76:fb:b5:47
[  959.388122] e1000e 0000:00:1f.6 eth0: Intel(R) PRO/1000 Network Connection
[  959.388222] e1000e 0000:00:1f.6 eth0: MAC: 12, PHY: 12, PBA No: 1000FF-0FF
[  959.393484] e1000e 0000:00:1f.6 enp0s31f6: renamed from eth0
[  995.521449] e1000e deb: STATUS = 1074266243
[  997.536794] e1000e deb: STATUS = 1074266240
[  997.536808] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Half Duplex, Flow Control: None
[  997.537766] IPv6: ADDRCONF(NETDEV_CHANGE): enp0s31f6: link becomes ready
[ 1001.857343] e1000e deb: STATUS = 1074266243
[ 1003.883432] e1000e deb: STATUS = 1074266240
[ 1003.883440] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Half Duplex, Flow Control: None
[ 1009.217251] e1000e deb: STATUS = 1074266243
[ 1011.240393] e1000e deb: STATUS = 1074266240
[ 1011.240402] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Half Duplex, Flow Control: None
[ 1015.617225] e1000e deb: STATUS = 1074266243
[ 1017.642151] e1000e deb: STATUS = 1074266240
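
For readers following the log: the STATUS values are printed in decimal, which hides the individual register bits. Below is a minimal sketch that decodes them. The bit masks are assumptions, written down as recalled from the e1000e driver's defines.h (including the PCIM_STATE bit that the bisected commit checks), so verify them against your own kernel tree. decode_status is a hypothetical helper name, not something from the driver.

```python
# Bit masks as recalled from the e1000e driver's defines.h. They are
# assumptions for illustration; check them against your kernel tree.
STATUS_BITS = {
    0x00000001: "FD (full duplex)",
    0x00000002: "LU (link up)",
    0x00000400: "PHYRA (PHY reset asserted)",
    0x00080000: "GIO_MASTER_ENABLE",
    0x40000000: "PCIM_STATE",
}
SPEED_MASK = 0x000000C0
SPEEDS = {0x00: "10 Mb/s", 0x40: "100 Mb/s", 0x80: "1000 Mb/s"}


def decode_status(value):
    """Return (hex string, names of the set flags, speed field as text)."""
    flags = [name for mask, name in STATUS_BITS.items() if value & mask]
    speed = SPEEDS.get(value & SPEED_MASK, "unknown")
    return f"0x{value:08x}", flags, speed


# The two values that alternate in the dmesg output above.
for raw in (1074266243, 1074266240):
    hex_value, flags, speed = decode_status(raw)
    print(hex_value, speed, ", ".join(flags))
```

Under these assumed masks, the two values differ only in the two lowest bits (link up and full duplex), and the PCIM_STATE bit is set in both. Printing the register with %x rather than %d in the debug printk would make this visible directly.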

Is there anything more I can do to help debug the problem? Thank you,
Michael
Comment 23 Emrah Urhan 2019-10-31 05:55:02 UTC
My dmesg:


[    9.642143] audit: type=1130 audit(1572499108.324:29): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-wait-online comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[    9.857839] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[    9.857874] IPv6: ADDRCONF(NETDEV_CHANGE): eno1: link becomes ready
[   14.009279] audit: type=1131 audit(1572499112.691:30): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   15.976448] audit: type=1130 audit(1572499114.657:31): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   16.065737] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[   22.271383] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[   26.009253] audit: type=1131 audit(1572499124.691:32): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   28.388391] audit: type=1130 audit(1572499127.071:33): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   28.464720] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[   32.437828] random: crng init done
[   32.437833] random: 5 urandom warning(s) missed due to ratelimiting
[   32.710380] aufs 5.3-20190923
[   32.868380] resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000d8000-0x000dbfff window]
[   32.868507] caller _nv000939rm+0x1bf/0x1f0 [nvidia] mapping multiple BARs
[   32.951333] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[   32.953206] Bridge firewalling registered
[   32.973349] audit: type=1325 audit(1572499131.654:34): table=nat family=2 entries=0
[   32.977263] audit: type=1325 audit(1572499131.657:35): table=filter family=2 entries=0
[   32.993180] audit: type=1325 audit(1572499131.674:36): table=nat family=2 entries=5
[   32.994507] audit: type=1325 audit(1572499131.677:37): table=filter family=2 entries=4
[   32.996106] audit: type=1325 audit(1572499131.677:38): table=filter family=2 entries=6
[   32.997528] audit: type=1325 audit(1572499131.681:39): table=filter family=2 entries=8
[   32.999261] audit: type=1325 audit(1572499131.681:40): table=filter family=2 entries=10
[   33.000619] audit: type=1325 audit(1572499131.681:41): table=filter family=2 entries=11
[   33.001245] audit: type=1325 audit(1572499131.684:42): table=filter family=2 entries=12
[   33.008622] Initializing XFRM netlink socket
[   33.019825] audit: type=1325 audit(1572499131.701:43): table=nat family=2 entries=7
[   34.581104] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[   38.989875] kauditd_printk_skb: 49 callbacks suppressed
[   38.989876] audit: type=1006 audit(1572499137.671:93): pid=1065 uid=0 old-auid=4294967295 auid=1000 tty=(none) old-ses=4294967295 ses=2 res=1
[   39.004396] audit: type=1130 audit(1572499137.687:94): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=user-runtime-dir@1000 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   39.005892] audit: type=1131 audit(1572499137.687:95): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   39.009556] audit: type=1006 audit(1572499137.691:96): pid=1069 uid=0 old-auid=4294967295 auid=1000 tty=(none) old-ses=4294967295 ses=3 res=1
[   39.061782] audit: type=1130 audit(1572499137.744:97): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=user@1000 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   39.133667] fuse: init (API version 7.31)
[   39.643913] logitech-hidpp-device 0003:046D:4008.0005: HID++ 2.0 device connected.
[   40.349622] audit: type=1130 audit(1572499139.031:98): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=rtkit-daemon comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   40.716153] audit: type=1130 audit(1572499139.397:99): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   40.821104] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[   47.024427] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[   49.330765] audit: type=1131 audit(1572499148.014:100): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=user@969 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   49.338774] audit: type=1131 audit(1572499148.021:101): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=user-runtime-dir@969 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   51.005671] audit: type=1131 audit(1572499149.687:102): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   53.140055] audit: type=1130 audit(1572499151.821:103): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   53.211122] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[   59.401131] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[   63.004496] audit: type=1131 audit(1572499161.687:104): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   65.518240] audit: type=1130 audit(1572499164.201:105): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   65.584434] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[   71.757782] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[   76.007103] audit: type=1131 audit(1572499174.687:106): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   77.874156] audit: type=1130 audit(1572499176.557:107): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   77.957836] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[   84.117783] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[   88.008960] audit: type=1131 audit(1572499186.691:108): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   90.235067] audit: type=1130 audit(1572499188.917:109): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   90.317781] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[   96.491127] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[  100.008245] audit: type=1131 audit(1572499198.691:110): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[  102.608210] audit: type=1130 audit(1572499201.291:111): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[  102.690547] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[  108.877784] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
.....
..... Lots of same message
.....
[ 1555.006073] audit: type=1131 audit(1572500653.687:364): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 1556.694618] audit: type=1130 audit(1572500655.377:365): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 1556.775603] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[ 1562.970906] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[ 1567.007714] audit: type=1131 audit(1572500665.691:366): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 1569.088027] audit: type=1130 audit(1572500667.771:367): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 1569.144215] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[ 1575.330893] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[ 1579.007277] audit: type=1131 audit(1572500677.691:368): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 1581.447017] audit: type=1130 audit(1572500680.127:369): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 1581.520891] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
.....
..... continues trying to reconnect
.....
----------------------------------------------------------------------------------------

I am using KDE and journalctl is below:

Eki 31 08:33:30 MY-HOSTNAME kwin_x11[1182]: qt.qpa.xcb: QXcbConnection: XCB error: 3 (BadWindow), sequence: 63372, resource id: 35653485, major code: 19 (DeleteProperty), minor code: 0
Eki 31 08:33:30 MY-HOSTNAME kwin_x11[1182]: qt.qpa.xcb: QXcbConnection: XCB error: 3 (BadWindow), sequence: 63375, resource id: 35653485, major code: 19 (DeleteProperty), minor code: 0
Eki 31 08:33:30 MY-HOSTNAME kwin_x11[1182]: qt.qpa.xcb: QXcbConnection: XCB error: 3 (BadWindow), sequence: 63376, resource id: 35653485, major code: 18 (ChangeProperty), minor code: 0
Eki 31 08:33:30 MY-HOSTNAME kwin_x11[1182]: qt.qpa.xcb: QXcbConnection: XCB error: 3 (BadWindow), sequence: 63377, resource id: 35653485, major code: 19 (DeleteProperty), minor code: 0
Eki 31 08:33:30 MY-HOSTNAME kwin_x11[1182]: qt.qpa.xcb: QXcbConnection: XCB error: 3 (BadWindow), sequence: 63378, resource id: 35653485, major code: 19 (DeleteProperty), minor code: 0
Eki 31 08:33:30 MY-HOSTNAME kwin_x11[1182]: qt.qpa.xcb: QXcbConnection: XCB error: 3 (BadWindow), sequence: 63379, resource id: 35653485, major code: 19 (DeleteProperty), minor code: 0
Eki 31 08:33:30 MY-HOSTNAME kwin_x11[1182]: qt.qpa.xcb: QXcbConnection: XCB error: 3 (BadWindow), sequence: 63380, resource id: 35653485, major code: 7 (ReparentWindow), minor code: 0
Eki 31 08:33:30 MY-HOSTNAME kwin_x11[1182]: qt.qpa.xcb: QXcbConnection: XCB error: 3 (BadWindow), sequence: 63381, resource id: 35653485, major code: 6 (ChangeSaveSet), minor code: 0
Eki 31 08:33:30 MY-HOSTNAME kwin_x11[1182]: qt.qpa.xcb: QXcbConnection: XCB error: 3 (BadWindow), sequence: 63382, resource id: 35653485, major code: 2 (ChangeWindowAttributes), minor code: 0
Eki 31 08:33:30 MY-HOSTNAME kwin_x11[1182]: qt.qpa.xcb: QXcbConnection: XCB error: 3 (BadWindow), sequence: 63383, resource id: 35653485, major code: 10 (UnmapWindow), minor code: 0
Eki 31 08:33:30 MY-HOSTNAME kwin_x11[1182]: qt.qpa.xcb: QXcbConnection: XCB error: 3 (BadWindow), sequence: 63468, resource id: 35653491, major code: 15 (QueryTree), minor code: 0
Eki 31 08:33:31 MY-HOSTNAME NetworkManager[611]: <info>  [1572500011.7515] device (eno1): state change: activated -> unavailable (reason 'carrier-changed', sys-iface-state: 'managed')
Eki 31 08:33:31 MY-HOSTNAME avahi-daemon[609]: Withdrawing address record for 10.6.1.77 on eno1.
Eki 31 08:33:31 MY-HOSTNAME avahi-daemon[609]: Leaving mDNS multicast group on interface eno1.IPv4 with address 10.6.1.77.
Eki 31 08:33:31 MY-HOSTNAME avahi-daemon[609]: Interface eno1.IPv4 no longer relevant for mDNS.
Eki 31 08:33:31 MY-HOSTNAME avahi-daemon[609]: Withdrawing address record for fe80::9f94:fdd5:ce5d:d1b3 on eno1.
Eki 31 08:33:31 MY-HOSTNAME avahi-daemon[609]: Leaving mDNS multicast group on interface eno1.IPv6 with address fe80::9f94:fdd5:ce5d:d1b3.
Eki 31 08:33:31 MY-HOSTNAME avahi-daemon[609]: Interface eno1.IPv6 no longer relevant for mDNS.
Eki 31 08:33:31 MY-HOSTNAME NetworkManager[611]: <info>  [1572500011.7579] manager: NetworkManager state is now CONNECTED_LOCAL
Eki 31 08:33:31 MY-HOSTNAME kwin_x11[1182]: qt.qpa.xcb: QXcbConnection: XCB error: 9 (BadDrawable), sequence: 63702, resource id: 0, major code: 14 (GetGeometry), minor code: 0
Eki 31 08:33:31 MY-HOSTNAME kwin_x11[1182]: qt.qpa.xcb: QXcbConnection: XCB error: 9 (BadDrawable), sequence: 63703, resource id: 0, major code: 14 (GetGeometry), minor code: 0
Eki 31 08:33:31 MY-HOSTNAME NetworkManager[611]: <info>  [1572500011.8454] device (eno1): carrier: link connected
Eki 31 08:33:31 MY-HOSTNAME NetworkManager[611]: <info>  [1572500011.8456] device (eno1): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed')
Eki 31 08:33:31 MY-HOSTNAME NetworkManager[611]: <info>  [1572500011.8464] policy: auto-activating connection 'MY CONNECTION' (bd544ca7-0af1-4df3-b6d5-82486565ad83)
Eki 31 08:33:31 MY-HOSTNAME NetworkManager[611]: <info>  [1572500011.8470] device (eno1): Activation: starting connection 'MY CONNECTION' (bd544ca7-0af1-4df3-b6d5-82486565ad83)
Eki 31 08:33:31 MY-HOSTNAME NetworkManager[611]: <info>  [1572500011.8471] device (eno1): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
Eki 31 08:33:31 MY-HOSTNAME NetworkManager[611]: <info>  [1572500011.8474] manager: NetworkManager state is now CONNECTING
Eki 31 08:33:31 MY-HOSTNAME kernel: e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Eki 31 08:33:31 MY-HOSTNAME NetworkManager[611]: <info>  [1572500011.9370] device (eno1): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
Eki 31 08:33:31 MY-HOSTNAME NetworkManager[611]: <info>  [1572500011.9375] device (eno1): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Eki 31 08:33:31 MY-HOSTNAME avahi-daemon[609]: Joining mDNS multicast group on interface eno1.IPv6 with address fe80::9f94:fdd5:ce5d:d1b3.
Eki 31 08:33:31 MY-HOSTNAME avahi-daemon[609]: New relevant interface eno1.IPv6 for mDNS.
Eki 31 08:33:31 MY-HOSTNAME avahi-daemon[609]: Registering new address record for fe80::9f94:fdd5:ce5d:d1b3 on eno1.*.
Eki 31 08:33:31 MY-HOSTNAME avahi-daemon[609]: Joining mDNS multicast group on interface eno1.IPv4 with address 10.6.1.77.
Eki 31 08:33:31 MY-HOSTNAME avahi-daemon[609]: New relevant interface eno1.IPv4 for mDNS.
Eki 31 08:33:31 MY-HOSTNAME avahi-daemon[609]: Registering new address record for 10.6.1.77 on eno1.IPv4.
Eki 31 08:33:31 MY-HOSTNAME NetworkManager[611]: <warn>  [1572500011.9390] acd[0x55f8eecbdb20,3]: couldn't init ACD for announcing addresses on interface 'eno1': İşleme izin verilmedi [Operation not permitted]
Eki 31 08:33:31 MY-HOSTNAME NetworkManager[611]: <info>  [1572500011.9391] device (eno1): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'managed')
Eki 31 08:33:31 MY-HOSTNAME NetworkManager[611]: <info>  [1572500011.9405] device (eno1): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'managed')
Eki 31 08:33:31 MY-HOSTNAME NetworkManager[611]: <info>  [1572500011.9407] device (eno1): state change: secondaries -> activated (reason 'none', sys-iface-state: 'managed')
Eki 31 08:33:31 MY-HOSTNAME NetworkManager[611]: <info>  [1572500011.9410] manager: NetworkManager state is now CONNECTED_LOCAL
Eki 31 08:33:31 MY-HOSTNAME NetworkManager[611]: <info>  [1572500011.9418] manager: NetworkManager state is now CONNECTED_SITE
Eki 31 08:33:31 MY-HOSTNAME NetworkManager[611]: <info>  [1572500011.9419] policy: set 'MY CONNECTION' (eno1) as default for IPv4 routing and DNS
Eki 31 08:33:31 MY-HOSTNAME NetworkManager[611]: <info>  [1572500011.9458] device (eno1): Activation: successful, device activated.
Eki 31 08:33:31 MY-HOSTNAME kdeinit5[1122]: plasma-nm: Unhandled active connection state change:  1
Eki 31 08:33:32 MY-HOSTNAME kwin_x11[1182]: qt.qpa.xcb: QXcbConnection: XCB error: 9 (BadDrawable), sequence: 63971, resource id: 0, major code: 14 (GetGeometry), minor code: 0
Eki 31 08:33:32 MY-HOSTNAME kwin_x11[1182]: qt.qpa.xcb: QXcbConnection: XCB error: 9 (BadDrawable), sequence: 63972, resource id: 0, major code: 14 (GetGeometry), minor code: 0
Eki 31 08:33:32 MY-HOSTNAME kwin_x11[1182]: qt.qpa.xcb: QXcbConnection: XCB error: 9 (BadDrawable), sequence: 63973, resource id: 0, major code: 14 (GetGeometry), minor code: 0
Eki 31 08:33:32 MY-HOSTNAME kwin_x11[1182]: qt.qpa.xcb: QXcbConnection: XCB error: 9 (BadDrawable), sequence: 63974, resource id: 0, major code: 14 (GetGeometry), minor code: 0
Eki 31 08:33:32 MY-HOSTNAME kwin_x11[1182]: qt.qpa.xcb: QXcbConnection: XCB error: 9 (BadDrawable), sequence: 63975, resource id: 0, major code: 14 (GetGeometry), minor code: 0
Eki 31 08:33:32 MY-HOSTNAME kwin_x11[1182]: qt.qpa.xcb: QXcbConnection: XCB error: 9 (BadDrawable), sequence: 63976, resource id: 0, major code: 14 (GetGeometry), minor code: 0
Eki 31 08:33:32 MY-HOSTNAME kwin_x11[1182]: qt.qpa.xcb: QXcbConnection: XCB error: 9 (BadDrawable), sequence: 63977, resource id: 0, major code: 14 (GetGeometry), minor code: 0
Eki 31 08:33:32 MY-HOSTNAME kwin_x11[1182]: qt.qpa.xcb: QXcbConnection: XCB error: 9 (BadDrawable), sequence: 63978, resource id: 0, major code: 14 (GetGeometry), minor code: 0
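
The NetworkManager lines in the journal above are easy to lose among the kwin_x11 messages. Here is a minimal sketch that pulls out only the device state transitions. The regular expression is an assumption based on the line format seen in this excerpt, and state_changes is a hypothetical helper, not part of NetworkManager.

```python
import re

# Matches NetworkManager "state change" lines in the format seen in this journal.
STATE_CHANGE = re.compile(
    r"device \((?P<dev>[^)]+)\): state change: (?P<old>[\w-]+) -> (?P<new>[\w-]+)"
)


def state_changes(lines, device):
    """Return (old, new) state pairs for one device, in log order."""
    pairs = []
    for line in lines:
        match = STATE_CHANGE.search(line)
        if match and match.group("dev") == device:
            pairs.append((match.group("old"), match.group("new")))
    return pairs


# Four lines copied from the excerpt, with the date and host prefix trimmed.
sample = [
    "NetworkManager[611]: <info>  [1572500011.7515] device (eno1): state change: activated -> unavailable (reason 'carrier-changed', sys-iface-state: 'managed')",
    "NetworkManager[611]: <info>  [1572500011.8456] device (eno1): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed')",
    "NetworkManager[611]: <info>  [1572500011.8471] device (eno1): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')",
    "NetworkManager[611]: <info>  [1572500011.9407] device (eno1): state change: secondaries -> activated (reason 'none', sys-iface-state: 'managed')",
]

for old, new in state_changes(sample, "eno1"):
    print(f"{old} -> {new}")
```

Run over a full journal (for example the output of journalctl -u NetworkManager saved to a file), this would show each pass through the activated, unavailable, and reconnect cycle in order.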
Comment 24 Vitaly Lifshits 2019-10-31 14:49:55 UTC
Please try applying this:

@@ -5208,6 +5208,14 @@
                        mac->ops.get_link_up_info(&adapter->hw,
                                                  &adapter->link_speed,
                                                  &adapter->link_duplex);
+
+                       /* Check for Duplex mismatch in 1gb */
+                       if (adapter->link_duplex == HALF_DUPLEX &&
+                           adapter->link_speed == SPEED_1000) {
+                               e1000e_down(adapter, true);
+                               e1000e_up(adapter);
+                       }
+
                        e1000_print_link_info(adapter);

                        /* check if SmartSpeed worked */
Comment 25 peter_s 2019-11-04 19:37:58 UTC
Does not work for me. Intel I219V nic (on Asrock Z390 Taichi motherboard).
Arch linux kernel 5.3.8.

#dmesg|grep e1000

Nov 03 23:09:54 kernel: e1000e: loading out-of-tree module taints kernel.
Nov 03 23:09:54 kernel: e1000e: module verification failed: signature and/or required key missing - tainting kernel
Nov 03 23:09:54 kernel: e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-l
Nov 03 23:09:54 kernel: e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
Nov 03 23:09:54 kernel: e1000e 0000:00:1f.6: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
Nov 03 23:09:54 kernel: e1000e 0000:00:1f.6 0000:00:1f.6 (uninitialized): registered PHC clock
Nov 03 23:09:54 kernel: e1000e 0000:00:1f.6 eth0: (PCI Express:2.5GT/s:Width x1) 70:85:c2:a4:d0:16
Nov 03 23:09:54 kernel: e1000e 0000:00:1f.6 eth0: Intel(R) PRO/1000 Network Connection
Nov 03 23:09:54 kernel: e1000e 0000:00:1f.6 eth0: MAC: 13, PHY: 12, PBA No: FFFFFF-0FF
Nov 03 23:09:54 kernel: e1000e 0000:00:1f.6 eno1: renamed from eth0
Nov 03 23:10:01 kernel: e1000e: eno1 NIC Link is Down
Comment 26 Michael Groh 2019-11-05 08:41:37 UTC
(In reply to Vitaly Lifshits from comment #24)
> Please try applying this:
> 
> @@ -5208,6 +5208,14 @@
>                         mac->ops.get_link_up_info(&adapter->hw,
>                                                   &adapter->link_speed,
>                                                   &adapter->link_duplex);
> +
> +                       /* Check for Duplex mismatch in 1gb */
> +                       if (adapter->link_duplex == HALF_DUPLEX &&
> +                           adapter->link_speed == SPEED_1000) {
> +                               e1000e_down(adapter, true);
> +                               e1000e_up(adapter);
> +                       }
> +
>                         e1000_print_link_info(adapter);
> 
>                         /* check if SmartSpeed worked */

Hello Vitaly,

I applied this to 5.4.0-rc5, and it still does not work. Here is the dmesg output after plugging in the Ethernet cable:

[  180.620029] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[  180.620136] IPv6: ADDRCONF(NETDEV_CHANGE): enp0s31f6: link becomes ready
[  186.994114] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[  193.835072] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[  200.617964] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None

(I did not apply the "STATUS" patch.)

Is there anything more I could test?

Thank you, 
Michael
Comment 27 peter_s 2019-11-06 08:25:19 UTC
(In reply to peter_s from comment #25)
> Does not work for me. Intel I219V nic (on Asrock Z390 Taichi motherboard).
> Arch linux kernel 5.3.8.
> 
> #dmesg|grep e1000
> 
> Nov 03 23:09:54 kernel: e1000e: loading out-of-tree module taints kernel.
> Nov 03 23:09:54 kernel: e1000e: module verification failed: signature and/or
> required key missing - tainting kernel
> Nov 03 23:09:54 kernel: e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-l
> Nov 03 23:09:54 kernel: e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
> Nov 03 23:09:54 kernel: e1000e 0000:00:1f.6: Interrupt Throttling Rate
> (ints/sec) set to dynamic conservative mode
> Nov 03 23:09:54 kernel: e1000e 0000:00:1f.6 0000:00:1f.6 (uninitialized):
> registered PHC clock
> Nov 03 23:09:54 kernel: e1000e 0000:00:1f.6 eth0: (PCI Express:2.5GT/s:Width
> x1) 70:85:c2:a4:d0:16
> Nov 03 23:09:54 kernel: e1000e 0000:00:1f.6 eth0: Intel(R) PRO/1000 Network
> Connection
> Nov 03 23:09:54 kernel: e1000e 0000:00:1f.6 eth0: MAC: 13, PHY: 12, PBA No:
> FFFFFF-0FF
> Nov 03 23:09:54 kernel: e1000e 0000:00:1f.6 eno1: renamed from eth0
> Nov 03 23:10:01 kernel: e1000e: eno1 NIC Link is Down

Note: for this report I also applied the latest proposed patch (by Vitaly) on the 5.3.8 kernel.
Comment 28 Vitaly Lifshits 2019-11-19 09:39:47 UTC
Recently we received a similar complaint that is connected to a different patch, and we are working on reverting it.

Can you please try to revert it and see if it resolves your issue?

The patch is:
The commit introducing the bug is 59653e6497d16f7ac1d9db088f3959f57ee8c3db
(e1000e: Make watchdog use delayed work)
Comment 29 peter_s 2019-11-19 22:14:34 UTC
Your recommendation was tested this way (https://bugs.archlinux.org/task/64018). It does not work for me either.

[    4.195139] e1000e: loading out-of-tree module taints kernel.
[    4.215400] e1000e: module verification failed: signature and/or required key missing - tainting kernel
[    4.218393] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-l
[    4.218394] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[    4.218581] e1000e 0000:00:1f.6: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
[    4.475106] e1000e 0000:00:1f.6 0000:00:1f.6 (uninitialized): registered PHC clock
[    4.543593] e1000e 0000:00:1f.6 eth0: (PCI Express:2.5GT/s:Width x1) 70:85:c2:a4:d0:16
[    4.543596] e1000e 0000:00:1f.6 eth0: Intel(R) PRO/1000 Network Connection
[    4.543677] e1000e 0000:00:1f.6 eth0: MAC: 13, PHY: 12, PBA No: FFFFFF-0FF
[    4.604465] e1000e 0000:00:1f.6 eno1: renamed from eth0
[   11.407189] e1000e: eno1 NIC Link is Down
[   12.435905] e1000e: eno1 NIC Link is Up 1000 Mbps Half Duplex, Flow Control: None

Maybe the commit was created earlier. Let us know if there is a new patch to test.
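
For anyone scanning their own logs for the condition the patch in comment 24 tests (1000 Mbps reported together with half duplex, as in the last dmesg line above), here is a minimal Python sketch. The regular expression is an assumption based on the message format seen in this thread, and `gigabit_half_duplex_links` is a hypothetical helper name, not part of the driver.

```python
import re

# Assumed format, taken from the dmesg lines quoted in this thread, e.g.:
#   e1000e: eno1 NIC Link is Up 1000 Mbps Half Duplex, Flow Control: None
LINK_UP = re.compile(
    r"e1000e: (?P<iface>\S+) NIC Link is Up (?P<speed>\d+) Mbps "
    r"(?P<duplex>Full|Half) Duplex"
)


def gigabit_half_duplex_links(log_lines):
    """Return (iface, speed, duplex) for each link-up at 1000 Mbps half duplex."""
    hits = []
    for line in log_lines:
        match = LINK_UP.search(line)
        if not match:
            continue
        speed = int(match.group("speed"))
        duplex = match.group("duplex")
        if speed == 1000 and duplex == "Half":
            hits.append((match.group("iface"), speed, duplex))
    return hits


sample = [
    "[   11.407189] e1000e: eno1 NIC Link is Down",
    "[   12.435905] e1000e: eno1 NIC Link is Up 1000 Mbps Half Duplex, Flow Control: None",
]
print(gigabit_half_duplex_links(sample))
```

A non-empty result only shows that the driver reported this speed/duplex combination; it says nothing about the cause.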
Comment 30 Vitaly Lifshits 2019-11-20 08:08:41 UTC
(In reply to peter_s from comment #29)
> You recommendation was tested this way
> (https://bugs.archlinux.org/task/64018). It does not work for me either.
> 
> [    4.195139] e1000e: loading out-of-tree module taints kernel.
> [    4.215400] e1000e: module verification failed: signature and/or required
> key missing - tainting kernel
> [    4.218393] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-l
> [    4.218394] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
> [    4.218581] e1000e 0000:00:1f.6: Interrupt Throttling Rate (ints/sec) set
> to dynamic conservative mode
> [    4.475106] e1000e 0000:00:1f.6 0000:00:1f.6 (uninitialized): registered
> PHC clock
> [    4.543593] e1000e 0000:00:1f.6 eth0: (PCI Express:2.5GT/s:Width x1)
> 70:85:c2:a4:d0:16
> [    4.543596] e1000e 0000:00:1f.6 eth0: Intel(R) PRO/1000 Network Connection
> [    4.543677] e1000e 0000:00:1f.6 eth0: MAC: 13, PHY: 12, PBA No: FFFFFF-0FF
> [    4.604465] e1000e 0000:00:1f.6 eno1: renamed from eth0
> [   11.407189] e1000e: eno1 NIC Link is Down
> [   12.435905] e1000e: eno1 NIC Link is Up 1000 Mbps Half Duplex, Flow
> Control: None
> 
> Maybe the commit was created earlier. Let us know if there is a new patch to
> test.

Can you please try reverting the patch I mentioned, together with the change I offered in comment 24?
(the patch to revert is: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/net/ethernet/intel/e1000e?id=59653e6497d16f7ac1d9db088f3959f57ee8c3db)

I saw in Michael's comment that the link comes up with correct speed and duplex. Probably the usage of the struct in the patch creates problems when it is working with network manager.
Comment 31 Michael Groh 2019-11-20 08:16:32 UTC
(In reply to Vitaly Lifshits from comment #28)
> Recently we got a similar complain that is connected to a different patch,
> and we are working on reverting it. 
> 
> Can you please try to revert it and see if it resolves your issue?
> 
> The patch is:
> The commit introducing the bug is 59653e6497d16f7ac1d9db088f3959f57ee8c3db
> (e1000e: Make watchdog use delayed work)

Hello Vitaly,

I reverted said commit. However, there were merge conflicts in the file drivers/net/ethernet/intel/e1000e/netdev.c. I cleaned those up and will post the "git diff" as a patch based on 5.4.0-rc8.

I can confirm that reverting said patch does indeed help. I can use ethernet on my ThinkPad T470s again.

Will this get reverted from mainline then?

Anyway, thank you for your work, have a nice day,
Michael
Comment 32 Sasha Neftin 2019-11-20 08:18:41 UTC
Yes. We will work on reverting it.
Comment 33 Michael Groh 2019-11-20 08:20:35 UTC
Created attachment 285979 [details]
Patch to revert commit 59653e6497d16f7ac1d9db088f3959f57ee8c3db based on 5.4.0-rc8

As suggested in comment #28, I reverted commit 59653e6497d16f7ac1d9db088f3959f57ee8c3db. There were merge conflicts, which I tried to resolve; this is the resulting diff for 5.4.0-rc8.
Comment 34 peter_s 2019-11-20 19:12:32 UTC
It works now after reverting the right commit (on Arch Linux with kernel 5.3.11-arch1-1).

[    4.156613] e1000e: loading out-of-tree module taints kernel.
[    4.162353] e1000e: module verification failed: signature and/or required key missing - tainting kernel
[    4.196559] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-l
[    4.196560] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[    4.196767] e1000e 0000:00:1f.6: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
[    4.437055] e1000e 0000:00:1f.6 0000:00:1f.6 (uninitialized): registered PHC clock
[    4.503870] e1000e 0000:00:1f.6 eth0: (PCI Express:2.5GT/s:Width x1) 70:85:c2:a4:d0:16
[    4.503875] e1000e 0000:00:1f.6 eth0: Intel(R) PRO/1000 Network Connection
[    4.503956] e1000e 0000:00:1f.6 eth0: MAC: 13, PHY: 12, PBA No: FFFFFF-0FF
[    4.584850] e1000e 0000:00:1f.6 eno1: renamed from eth0
[   11.243194] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None

Can we expect this code in final version of 5.4?
Comment 35 Emrah Urhan 2019-11-26 09:10:47 UTC
The repeated disconnects continue in 5.3.12-1, but the retry count has dropped. Once connected, I experience many connection drops even though nothing indicates a disconnection.

These silent connection drops also occur in 4.9.85-1.

Distro is Arch based Manjaro.
Comment 36 Nazar Mokrynskyi 2019-11-30 00:55:15 UTC
Also affects me with I219-V on AsRock Z370 motherboard.
Kernel 5.3.6 works fine, but 5.4.0-rc8 and 5.4.1 (I didn't check others) result in the connection being established and then dropped shortly afterwards.
No messages in kernel log besides this:
[   23.043952] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx

Duplicated as many times as reconnection happened.

So whatever happened between 5.3.6 and 5.4.0-rc8 introduced a regression.

I'm on Ubuntu 19.04 daily (not using stock kernel obviously).
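
A repeated "NIC Link is Up" line with no other messages is the pattern described above and also seen in comment 26. A rough way to quantify it is to collect the link-up timestamps per interface and look at the gaps between them. The sketch below assumes the dmesg timestamp format shown in this thread; the helper names and the threshold of three events are arbitrary choices for illustration.

```python
import re

# Assumed dmesg format: "[  180.620029] e1000e: enp0s31f6 NIC Link is Up ..."
LINK_UP = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] e1000e: (?P<iface>\S+) NIC Link is Up"
)


def link_up_times(log_lines):
    """Map each interface name to the timestamps of its link-up messages."""
    times = {}
    for line in log_lines:
        match = LINK_UP.search(line)
        if match:
            times.setdefault(match.group("iface"), []).append(float(match.group("ts")))
    return times


def gaps(timestamps):
    """Seconds between consecutive link-up messages, rounded to 0.1 s."""
    return [round(b - a, 1) for a, b in zip(timestamps, timestamps[1:])]


def looks_like_flapping(timestamps, min_events=3):
    """Hypothetical heuristic: several link-ups in one log suggest flapping."""
    return len(timestamps) >= min_events


# Sample lines copied from the dmesg excerpt in comment 26.
sample = [
    "[  180.620029] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None",
    "[  180.620136] IPv6: ADDRCONF(NETDEV_CHANGE): enp0s31f6: link becomes ready",
    "[  186.994114] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None",
    "[  193.835072] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None",
    "[  200.617964] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None",
]

times = link_up_times(sample)
for iface, stamps in times.items():
    print(iface, gaps(stamps), looks_like_flapping(stamps))
```

On that sample the link-up messages arrive roughly every 6 to 7 seconds, which may help when comparing behaviour before and after a revert.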
Comment 37 Vitaly Lifshits 2019-12-03 06:47:15 UTC
(In reply to Nazar Mokrynskyi from comment #36)
> Also affects me with I219-V on AsRock Z370 motherboard.
> Kernel 5.3.6 works fine, but 5.4.0-rc8 and 5.4.1 (didn't check other) result
> in connection being established and then dropped shortly afterwards.
> No messages in kernel log besides this:
> [   23.043952] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow
> Control: Rx/Tx
> 
> Duplicated as many times as reconnection happened.
> 
> So whatever happened between 5.3.6 and 5.4.0-rc8 introduced a regression.
> 
> I'm on Ubuntu 19.04 daily (not using stock kernel obviously).

Did you revert the patch we mentioned in comment 28?
Does it happen if you disable NetworkManager? (sudo systemctl stop NetworkManager)
Comment 38 peter_s 2019-12-03 18:23:01 UTC
It seems that Vitaly's proposal (i.e. eliminating the problematic commit)

> The patch is:
> The commit introducing the bug is 59653e6497d16f7ac1d9db088f3959f57ee8c3db
> (e1000e: Make watchdog use delayed work)

did not hit the 5.4 kernel. The e1000e driver is still broken in kernel 5.4.1. What are the plans now to release this fix?
Comment 39 null31 2019-12-06 17:24:19 UTC
(In reply to Michael Groh from comment #33)
> Created attachment 285979 [details]
> Patch to revert commit 59653e6497d16f7ac1d9db088f3959f57ee8c3db based on
> 5.4.0-rc8
> 
> As suggested in #28 i did revert commit
> 59653e6497d16f7ac1d9db088f3959f57ee8c3db. There have been merge conflicts
> which i did try to resolve, this is the diff for 5.4.0-rc8.

Yeah, applying this patch solved my problem with 5.4.1 and 5.4.2.

I had the same problem as Nazar Mokrynskyi (comment #36) before patching the kernel.

I use an Intel 82579V with linux-ck 5.4.2 and NetworkManager 1.20.
Comment 40 Michael Groh 2019-12-11 13:43:21 UTC
(In reply to Sasha Neftin from comment #32)
> yes. We will work up to revert it.

This seems to have missed the 5.4 tree, and the problem is still there with 5.5-rc1.

Is there somewhere I can contribute so that the patch gets reverted? Maybe an LKML thread?

Thanks for your work,
Michael
Comment 41 UnicornsOnLSD 2019-12-14 13:06:25 UTC
Created attachment 286283 [details]
Rejects from Attachment #285979 [details] when applied to kernel 5.5rc1

A new patch would be required (unless I'm doing something wrong).

Hunk #9 FAILED at 7414.

1 out of 9 hunks FAILED -- saving rejects to file drivers/net/ethernet/intel/e1000e/netdev.c.rej
Comment 42 Sasha Neftin 2019-12-14 13:06:45 UTC
Created attachment 286285 [details]
attachment-24216-0.html

Out of office. Expected delayed response
Comment 43 Gerald H. 2019-12-29 23:07:31 UTC
Not fixed for me as of kernel 5.4.6 with the integrated I218V Intel Ethernet on an ASUS H97M-PLUS board.

Unfortunately, I cannot downgrade because older kernels do not support my Realtek RTL8125 2.5GbE Adapter. The patch from comment #33 applied fine to my 5.4.6 kernel though and I'll try over the next few days if this indeed fixes it for me.

Please bring a fix to mainline linux. The e1000e driver is really really important and widely used and should not remain broken for so long.
Comment 44 Gerald H. 2019-12-29 23:59:44 UTC
Update to my previous comment: Applying the patch from comment #33 did NOT fix it for me. e1000e still hangs and resets. It could be related to traffic sent through wireguard interfaces at the time, but on the other hand it might be that I just do not notice those e1000e hangs+resets unless I'm using the wireguard tunnels...

However: I'm able to transfer lots of traffic through the e1000e NIC just fine, but once I have small amounts of Remote Desktop traffic going through e1000e + the wireguard interface, I can reproduce the hang+reset within 5 minutes.

Perhaps this is a different bug?
Comment 45 Vitaly Lifshits 2019-12-30 07:30:35 UTC
(In reply to Gerald H. from comment #44)
> Update to my previous comment: Applying the patch from comment #33 did NOT
> fix it for me. e1000e still hangs and resets. It could be related to traffic
> sent through wireguard interfaces at the time, but on the other hand it
> might be that I just do not notice those e1000e hangs+resets unless I'm
> using the wireguard tunnels...
> 
> However: I'm able to transfer lots of traffic through the e1000e NIC just
> fine, but once I have small amounts of Remote Desktop traffic going through
> e1000e + the wireguard interface, I can reproduce the hang+reset within 5
> minutes.
> 
> Perhaps this is a different bug?

It does look like a different bug since the original issue was with the interface not coming up at all.

Please try disabling TCP segmentation offload (tso), it may solve your issue.
Comment 46 Gerald H. 2019-12-30 12:57:41 UTC
Vitaly, thank you very much. "ethtool -K eno1 tso off" has completely solved my hang+reset problems with the e1000e driver that were happening as soon as the e1000e was involved with wireguard traffic.
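
To confirm that the setting took effect, the output of `ethtool -k <iface>` can be checked for the `tcp-segmentation-offload` line. Below is a minimal Python sketch that parses such output. The sample text is an illustrative assumption (it was not captured from any system in this thread), and `parse_features` and `tso_off_args` are hypothetical helper names.

```python
def parse_features(ethtool_k_output):
    """Parse `ethtool -k` style output into {feature_name: enabled}."""
    features = {}
    for line in ethtool_k_output.splitlines():
        name, sep, value = line.strip().partition(":")
        if not sep or not value.strip():
            continue  # skips the "Features for <iface>:" header and blank lines
        # Values look like "on", "off" or "off [fixed]"; only the first word matters.
        features[name] = value.split()[0] == "on"
    return features


def tso_off_args(iface):
    """Argument list equivalent to the command quoted above."""
    return ["ethtool", "-K", iface, "tso", "off"]


# Illustrative sample only; real output lists many more features.
sample_output = """Features for eno1:
rx-checksumming: on
tx-checksumming: on
tcp-segmentation-offload: off
generic-segmentation-offload: on
rx-vlan-filter: off [fixed]
"""

features = parse_features(sample_output)
print("TSO enabled:", features["tcp-segmentation-offload"])
print(" ".join(tso_off_args("eno1")))
```

Note that a setting made with `ethtool -K` normally does not persist across reboots, so it has to be reapplied by whatever network configuration tool is in use.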
Comment 47 peter_s 2020-01-04 23:00:03 UTC
Hi all, the e1000e driver is still in a broken state (from mid-September 5.1.x until now, 5.4.7). My understanding is that you successfully identified the commit that causes the problem, and many of us tested/confirmed that as well.
Could you please elaborate when you plan to remove it from the kernel code? Thanks.
Comment 48 Sasha Neftin 2020-01-05 07:28:15 UTC
Jeff submitted the revert on 25/12/2019. Please stay tuned:

Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> on behalf of Jeff Kirsher <jeffrey.t.kirsher@intel.com>
[Intel-wired-lan] [net] e1000e: Revert "e1000e: Make watchdog use delayed work"

This reverts commit 59653e6497d16f7ac1d9db088f3959f57ee8c3db.

This is due to this commit causing driver crashes and connections to reset unexpectedly.
Comment 49 Sasha Neftin 2020-01-05 07:43:53 UTC
Look at: https://patchwork.ozlabs.org/patch/1217709/
Comment 50 Stefan Becker 2020-01-06 09:28:19 UTC
I can confirm that the revert from comment #49, backported to stable 5.4.8 (with one minor change to make it apply) fixes the issue on my Dell Latitude E6420:

00:19.0 Ethernet controller: Intel Corporation 82579LM Gigabit Network Connection (Lewisville) (rev 04)
        Subsystem: Dell Device 0493
        Flags: bus master, fast devsel, latency 0, IRQ 33
        Memory at e6e00000 (32-bit, non-prefetchable) [size=128K]
        Memory at e6e80000 (32-bit, non-prefetchable) [size=4K]
        I/O ports at 5080 [size=32]
        Capabilities: [c8] Power Management version 2
        Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [e0] PCI Advanced Features
        Kernel driver in use: e1000e
        Kernel modules: e1000e

I would appreciate a backport to stable once it has been merged to master. Thank you.
Comment 51 Martin Knoblauch 2020-01-15 11:25:16 UTC
I am facing the same issue on a Dell Precision M4800 with kernel 5.4.12. Question to Stefan: what was necessary to make it apply on 5.4.8?
Comment 52 Martin Knoblauch 2020-01-15 11:57:11 UTC
Actually I took the patch from comment #33 which applied cleanly to 5.4.12 and it solved my problem. Please have it included in stable series for 5.4
Comment 53 nvaert1986 2020-01-16 11:34:36 UTC
Attachment: https://bugzilla.kernel.org/attachment.cgi?id=285979&action=diff from bug: https://bugzilla.kernel.org/show_bug.cgi?id=205047#c33 resolves the issue for me, please revert this patch in the mainline kernel.
Comment 54 nvaert1986 2020-01-16 11:36:37 UTC
*** Bug 206219 has been marked as a duplicate of this bug. ***
Comment 55 null31 2020-01-27 18:00:29 UTC
I can confirm that kernel 5.5.0 works flawlessly with my 82579V NIC, since this release includes the revert commit mentioned in https://bugzilla.kernel.org/show_bug.cgi?id=205047#c48. You can see the commit here: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.5&id=d5ad7a6a7f3c87b278d7e4973b65682be4e588dd
Comment 56 Sasha Neftin 2020-01-28 06:34:19 UTC
Thanks for confirming that.
Comment 57 Stefan Becker 2020-01-28 08:37:57 UTC
As 5.4.x is an LTS kernel, can you please submit the revert to that stable branch too? Thank you.
Comment 58 Martin Knoblauch 2020-01-28 09:32:25 UTC
Please also for the 5.3 stable branch. Even if not LTS, there are people using it.
Comment 59 Amin 2020-02-01 15:27:02 UTC
Hello.
I ran some experiments with this. The bug appeared after upgrading the kernel in Fedora 31 from 5.3.16-300.fc31.x86_64 to 5.4.x.

The connection continuously starts and resets, with these messages:

янв 26 15:54:08 Host1 kded5[3837]: plasma-nm: Unhandled active connection state change:  1
янв 26 15:54:08 Host1 kded5[3837]: plasma-nm: Unhandled active connection state change:  1

янв 26 15:54:14 Host1 NetworkManager[949]: <info>  [1580043254.5138] device (enp0s31f6): state change: ip-config -> unavailable (reason 'carrier-changed', sys-iface-state: 'managed')
янв 26 15:54:14 Host1 NetworkManager[949]: <info>  [1580043254.5282] dhcp4 (enp0s31f6): canceled DHCP transaction
янв 26 15:54:14 Host1 NetworkManager[949]: <info>  [1580043254.5283] dhcp4 (enp0s31f6): state changed unknown -> done
янв 26 15:54:14 Host1 NetworkManager[949]: <info>  [1580043254.5315] manager: NetworkManager state is now DISCONNECTED
янв 26 15:54:15 Host1 kernel: e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
янв 26 15:54:15 Host1 NetworkManager[949]: <info>  [1580043255.0662] device (enp0s31f6): carrier: link connected
янв 26 15:54:15 Host1 NetworkManager[949]: <info>  [1580043255.0665] device (enp0s31f6): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed')
янв 26 15:54:15 Host1 NetworkManager[949]: <info>  [1580043255.0671] policy: auto-activating connection 'Onboard_LAN' (708b2b50-68f3-4413-9167-b85dc5741983)
янв 26 15:54:15 Host1 NetworkManager[949]: <info>  [1580043255.0677] device (enp0s31f6): Activation: starting connection 'Onboard_LAN' (708b2b50-68f3-4413-9167-b85dc5741983)
янв 26 15:54:15 Host1 NetworkManager[949]: <info>  [1580043255.0678] device (enp0s31f6): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
янв 26 15:54:15 Host1 NetworkManager[949]: <info>  [1580043255.0682] manager: NetworkManager state is now CONNECTING
янв 26 15:54:15 Host1 NetworkManager[949]: <info>  [1580043255.1550] device (enp0s31f6): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
янв 26 15:54:15 Host1 NetworkManager[949]: <info>  [1580043255.1575] device (enp0s31f6): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
янв 26 15:54:15 Host1 NetworkManager[949]: <info>  [1580043255.1585] dhcp4 (enp0s31f6): activation: beginning transaction (timeout in 45 seconds)
янв 26 15:54:15 Host1 kded5[3837]: plasma-nm: Unhandled active connection state change:  1


Switching from DHCP to a static configuration has no effect; the connection keeps resetting.

Switching the "auto-negotiation" option to "off" can help, and forcing the link speed to 1 Gbps/half duplex can also help to establish a connection.
But this connection does not survive (or reopen after) a reboot.

After suspend-to-RAM and wake-up, the connection dropped to 10 Mbps, and I detected "TxError" errors on the switch port (DGS-1100-10ME):
	
TX
OutOctets 	193115544851
OutUcastPkts 	501851548
OutNUcastPkts 	676615
OutErrors 	51788
LateCollisions 	51788
ExcessiveCollisions 	0

In the normal state, no errors or collisions are detected on the switch interface, even under a load of about 700-800 Mbps.



This bug does not appear on all motherboards/NICs!

- My Asus STRIX H270F GAMING with BIOS version 1205 (release date 05/11/2018) and NIC "Intel Corporation Ethernet Connection (2) I219-V" is affected; in 5.4.14-200 I can start the network only after setting "Half-Duplex" in the connection properties.

- Another PC with MB "ASUS P8P67 EVO" (BIOS version 3602, release date 11/01/2012) and NIC "Intel Corporation 82579V Gigabit Network Connection (rev 05)" works correctly without tweaks with the same e1000e driver.

Kernel 5.3.16 works perfectly with both MB/NIC combinations.
Comment 60 Vitaly Lifshits 2020-02-02 11:18:43 UTC
(In reply to Amin from comment #59)
> Hello.
> I make some experiments with this - this bug appear after upgrade kernel in
> Fedora-31 from 5.3.16-300.fc31.x86_64  to 5.4.x
> 
> Connection continously start/reset with this messages:
> 
> янв 26 15:54:08 Host1 kded5[3837]: plasma-nm: Unhandled active connection
> state change:  1
> янв 26 15:54:08 Host1 kded5[3837]: plasma-nm: Unhandled active connection
> state change:  1
> 
> янв 26 15:54:14 Host1 NetworkManager[949]: <info>  [1580043254.5138] device
> (enp0s31f6): state change: ip-config -> unavailable (reason
> 'carrier-changed', sys-iface-state: 'managed')
> янв 26 15:54:14 Host1 NetworkManager[949]: <info>  [1580043254.5282] dhcp4
> (enp0s31f6): canceled DHCP transaction
> янв 26 15:54:14 Host1 NetworkManager[949]: <info>  [1580043254.5283] dhcp4
> (enp0s31f6): state changed unknown -> done
> янв 26 15:54:14 Host1 NetworkManager[949]: <info>  [1580043254.5315]
> manager: NetworkManager state is now DISCONNECTED
> янв 26 15:54:15 Host1 kernel: e1000e: enp0s31f6 NIC Link is Up 1000 Mbps
> Full Duplex, Flow Control: Rx/Tx
> янв 26 15:54:15 Host1 NetworkManager[949]: <info>  [1580043255.0662] device
> (enp0s31f6): carrier: link connected
> янв 26 15:54:15 Host1 NetworkManager[949]: <info>  [1580043255.0665] device
> (enp0s31f6): state change: unavailable -> disconnected (reason
> 'carrier-changed', sys-iface-state: 'managed')
> янв 26 15:54:15 Host1 NetworkManager[949]: <info>  [1580043255.0671] policy:
> auto-activating connection 'Onboard_LAN'
> (708b2b50-68f3-4413-9167-b85dc5741983)
> янв 26 15:54:15 Host1 NetworkManager[949]: <info>  [1580043255.0677] device
> (enp0s31f6): Activation: starting connection 'Onboard_LAN'
> (708b2b50-68f3-4413-9167-b85dc5741983)
> янв 26 15:54:15 Host1 NetworkManager[949]: <info>  [1580043255.0678] device
> (enp0s31f6): state change: disconnected -> prepare (reason 'none',
> sys-iface-state: 'managed')
> янв 26 15:54:15 Host1 NetworkManager[949]: <info>  [1580043255.0682]
> manager: NetworkManager state is now CONNECTING
> янв 26 15:54:15 Host1 NetworkManager[949]: <info>  [1580043255.1550] device
> (enp0s31f6): state change: prepare -> config (reason 'none',
> sys-iface-state: 'managed')
> янв 26 15:54:15 Host1 NetworkManager[949]: <info>  [1580043255.1575] device
> (enp0s31f6): state change: config -> ip-config (reason 'none',
> sys-iface-state: 'managed')
> янв 26 15:54:15 Host1 NetworkManager[949]: <info>  [1580043255.1585] dhcp4
> (enp0s31f6): activation: beginning transaction (timeout in 45 seconds)
> янв 26 15:54:15 Host1 kded5[3837]: plasma-nm: Unhandled active connection
> state change:  1
> 
> 
> Switching from DHCP to static-config don't affect - connection continue
> resetting;
> 
> Switching "auto-negotiation" option to "off" can help, also force setting
> link-speed to 1Gbps/HalfDuplex can help to make connection;
> But this connection can not survive/reopen after reboot;
> 
> Sleeping in RAM-standby mode and waking up made connection drop to 10mbps
> and i detect "TxError" errors on switch port (DGS-1100-10ME):
>       
> TX
> OutOctets     193115544851
> OutUcastPkts  501851548
> OutNUcastPkts         676615
> OutErrors     51788
> LateCollisions        51788
> ExcessiveCollisions   0
> 
> In normal state Errors / Collisions not detected on switch interface even
> under load about 700-800 Mbps.
> 
> 
> 
> This bug not all motherboards/NIC appear !
> 
> - My Asus STRIX H270F GAMING with bios version 1205 from release date:
> 05/11/2018 and NIC "Intel Corporation Ethernet Connection (2) I219-V"
> affected - only in 5.4.14-200 i can start network after setting
> "Half-Duplex" in connection properties;
> 
> - Another PC with MB "ASUS P8P67 EVO" bios version: 3602 release date:
> 11/01/2012 and NIC "Intel Corporation 82579V Gigabit Network Connection (rev
> 05)" network work correctly without tweaks with same e1000e driver.
> 
> Kernel 5.3.16 work ideal with both MB/NIC.

It seems that Fedora's 5.4.x kernel did not revert the problematic patch. You can try the latest stable vanilla kernel, 5.5.1, or wait until Fedora updates the kernel.
Comment 61 Hakan Bayindir 2020-02-02 19:28:30 UTC
I'm having the same problem with the 82579V on 5.14.13, but only on gigabit or auto-negotiated-to-gigabit connections. If I force the driver to 100 Mbps via NetworkManager, the link works reliably.

Another strange point is that the gigabit link sometimes works on first boot, but after a suspend/resume it is impossible to connect at gigabit speed. It just resets the PHY indefinitely. Forcing the card to 100 Mbps keeps the connection stable even after a suspend/resume cycle.
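
The comment above forces the speed through NetworkManager. For testing the same workaround without a connection manager, the equivalent `ethtool -s` invocation can be built as in the Python sketch below. Using ethtool here is an assumption (it is not what the reporter did), `force_link_args` is a hypothetical helper, and `eno1` is a placeholder interface name.

```python
def force_link_args(iface, speed_mbps, duplex="full"):
    """Build an `ethtool -s` argument list that disables autonegotiation
    and pins the link to the given speed and duplex."""
    if duplex not in ("full", "half"):
        raise ValueError("duplex must be 'full' or 'half'")
    if speed_mbps not in (10, 100, 1000):
        raise ValueError("unexpected speed for a gigabit copper NIC")
    return [
        "ethtool", "-s", iface,
        "speed", str(speed_mbps),
        "duplex", duplex,
        "autoneg", "off",
    ]


# Placeholder interface name; substitute the real one from `ip link`.
args = force_link_args("eno1", 100)
print(" ".join(args))
# To actually apply it (needs root), something like:
#   import subprocess; subprocess.run(args, check=True)
```

Like other ethtool settings, this is lost on reboot, and the link partner may need a matching configuration when autonegotiation is off.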
Comment 62 Emrah Urhan 2020-02-03 05:53:13 UTC
Resolved with Linux 5.5.0-1-MANJARO in Manjaro Dist. I also tested sleep/resume and everything seems to be fine.
Comment 63 UndeadKernel 2020-04-16 09:02:21 UTC
With kernel 5.6.4-arch1-1, this bug seems to still be present.

lspci:

00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (6) I219-V (rev 30)
        Subsystem: Lenovo Ethernet Connection (6) I219-V
        Flags: bus master, fast devsel, latency 0, IRQ 148
        Memory at dd300000 (32-bit, non-prefetchable) [size=128K]
        Capabilities: <access denied>
        Kernel driver in use: e1000e
        Kernel modules: e1000e


ip link:

5: enp0s31f6: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN mode DEFAULT group default qlen 1000
    link/ether 98:fa:9b:5f:ee:8f brd ff:ff:ff:ff:ff:ff

The interface is shown correctly by `ip link`. The driver still keeps connecting and disconnecting constantly.
Comment 64 Amin 2020-04-23 09:28:53 UTC
In Fedora 31 this bug seems to be fully resolved with 5.5.x kernels.
I use 5.5.x kernels now and don't have any problems with e1000e in fc31.
Comment 65 nvaert1986 2020-04-23 14:30:31 UTC
(In reply to UndeadKernel from comment #63)
> With kernel 5.6.4-arch1-1, this bug seems to still be present.
> 
> lspci:
> 
> 00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (6)
> I219-V (rev 30)
>         Subsystem: Lenovo Ethernet Connection (6) I219-V
>         Flags: bus master, fast devsel, latency 0, IRQ 148
>         Memory at dd300000 (32-bit, non-prefetchable) [size=128K]
>         Capabilities: <access denied>
>         Kernel driver in use: e1000e
>         Kernel modules: e1000e
> 
> 
> ip link:
> 
> 5: enp0s31f6: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel
> state DOWN mode DEFAULT group default qlen 1000
>     link/ether 98:fa:9b:5f:ee:8f brd ff:ff:ff:ff:ff:ff
> 
> The interface is correctly shown with `ip link`. The driver keeps connecting
> and disconnecting constantly still.

Are you using the Intel (out-of-tree) e1000e driver or the in-kernel e1000e driver? The driver from the intel.com website still contains the bug and overrides the in-kernel driver. It also keeps getting recompiled if you're using dkms. If you tried the driver from the Intel website (like me), you need to remove the e1000e-dkms package and then either manually remove the module from the source tree's update folder or compile a newer kernel.

With 5.6.5 / 5.6.6 or 5.4.30ish it should work, as I have a similar adapter to yours:

00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (7) I219-V (rev 10)
	Subsystem: Lenovo Ethernet Connection (7) I219-V
	Flags: bus master, fast devsel, latency 0, IRQ 159
	Memory at a4300000 (32-bit, non-prefetchable) [size=128K]
	Capabilities: [c8] Power Management version 3
	Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Kernel driver in use: e1000e
	Kernel modules: e1000e
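
One hint for telling the two drivers apart is the "loading out-of-tree module taints kernel" line that appears in several dmesg excerpts earlier in this thread. The Python sketch below looks for that line for a given module. `loaded_out_of_tree` is a hypothetical helper, and the check is only a heuristic: the kernel may print this message only for the first out-of-tree module it loads, so a missing line is not proof that the in-kernel driver is in use.

```python
def loaded_out_of_tree(log_lines, module="e1000e"):
    """Return True if the log contains the out-of-tree taint message for `module`."""
    needle = f"{module}: loading out-of-tree module taints kernel"
    return any(needle in line for line in log_lines)


# First sample copied from the dmesg excerpt in comment 29.
oot_log = [
    "[    4.195139] e1000e: loading out-of-tree module taints kernel.",
    "[    4.218393] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-l",
]
# Second sample: link-up line only, as in comment 36.
plain_log = [
    "[   23.043952] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx",
]

print(loaded_out_of_tree(oot_log))
print(loaded_out_of_tree(plain_log))
```

Checking which file `modinfo e1000e` reports is a more direct test when shell access to the affected machine is available.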
Comment 66 lgpserranegra 2020-04-26 17:35:10 UTC
This issue still happens with my 82579V. Under any significant load the driver starts to hang and drop packets. I tried several kernels (3.10, 4.18 and now 5.6.7) on CentOS 7, 8 and now 8.1, with both the in-kernel driver and Intel's latest 3.8.4-NAPI driver, and tried several settings (tso, gso); nothing helps.

I also have some servers using the I210 and 82574L with no issues whatsoever. The I210 never had issues, and the 82574L only needed an update to the Intel 3.8.4 driver. The issue seems to affect the 82579V and I219 only.
Comment 67 Felix Kaechele 2020-05-15 20:45:45 UTC
Created attachment 289157 [details]
dmesg output of Lenovo ThinkCentre M900 Tiny with the error

I am seeing this as well and it is reproducible for me.

Kernel: kernel-5.6.12-300.fc32.x86_64

Setup:
Hardware name: LENOVO 10FLS0A000/30D0, BIOS FWKTACA   03/24/2020 (Lenovo ThinkCentre M900 Tiny)
Host OS: Fedora 32 KVM host
Guest OS: CentOS 8.1 KVM guest

KVM network configuration: Bridged to Intel I219-LM on the Host hardware.

Scenario
When compiling OpenWrt on the CentOS guest through SSH using make V=s (full verbosity) the ethernet adapter will hang and I receive an SSH error as soon as there is a high volume of console output.
The connection is reset but recovers eventually after some time.
Comment 68 Andy Mann 2020-06-03 15:48:21 UTC
Perhaps a cable issue? I note some logs posted here have "carrier-changed" errors.

I had "carrier-changed" errors, frequently disconnecting my ethernet, and after trying every solution in multiple forums, I finally upgraded the CAT5 cable to a CAT7 cable (10 meter run - cheap on ebay) which cured the problem. Apparently, the old CAT5 cable couldn't handle my ISP's upgraded speeds (Virgin Media, UK). The new CAT7 cable has not only stopped the dropouts, but also now allows the speed to auto-configure from 100Mb/s to 1000Mb/s. Hope that helps anyone looking to fix "carrier-changed" problems.
Comment 69 dan3805 2020-06-05 04:08:20 UTC
(In reply to Andy Mann from comment #68)
> Perhaps a cable issue? I note some logs posted here have "carrier-changed"
> errors.
> 
> I had "carrier-changed" errors, frequently disconnecting my ethernet, and
> after trying every solution in multiple forums, I finally upgraded the CAT5
> cable to a CAT7 cable (10 meter run - cheap on ebay) which cured the
> problem. Apparently, the old CAT5 cable couldn't handle my ISP's upgraded
> speeds (Virgin Media, UK). The new CAT7 cable has not only stopped the
> dropouts, but also now allows the speed to auto-configure from 100Mb/s to
> 1000Mb/s. Hope that helps anyone looking to fix "carrier-changed" problems.

That is not the cable. I have the same issue with multiple servers at OVH (datacenter).
Comment 70 Roberto Viola 2020-07-22 06:05:09 UTC
Yesterday I installed kernel version "5.4.44-2-pve #1 SMP PVE 5.4.44-2 (Wed, 01 Jul 2020 16:37:57 +0200)", which includes the patch above.
Ethernet seems fine, but I got this in the logs. The link is still working, but I think something happened (maybe with kernel 5.3.x I would have had the up/down loop instead).

[37932.264016] hrtimer: interrupt took 11516 ns
[39959.724485] e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
                 TDH                  <42>
                 TDT                  <6f>
                 next_to_use          <6f>
                 next_to_clean        <42>
               buffer_info[next_to_clean]:
                 time_stamp           <100974783>
                 next_to_watch        <42>
                 jiffies              <100974a40>
                 next_to_watch.status <0>
               MAC Status             <80083>
               PHY Status             <796d>
               PHY 1000BASE-T Status  <3800>
               PHY Extended Status    <3000>
               PCI Status             <10>
[39961.708425] e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
                 TDH                  <42>
                 TDT                  <6f>
                 next_to_use          <6f>
                 next_to_clean        <42>
               buffer_info[next_to_clean]:
                 time_stamp           <100974783>
                 next_to_watch        <42>
                 jiffies              <100974c30>
                 next_to_watch.status <0>
               MAC Status             <80083>
               PHY Status             <796d>
               PHY 1000BASE-T Status  <3800>
               PHY Extended Status    <3000>
               PCI Status             <10>
[39963.724435] e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
                 TDH                  <42>
                 TDT                  <6f>
                 next_to_use          <6f>
                 next_to_clean        <42>
               buffer_info[next_to_clean]:
                 time_stamp           <100974783>
                 next_to_watch        <42>
                 jiffies              <100974e28>
                 next_to_watch.status <0>
               MAC Status             <80083>
               PHY Status             <796d>
               PHY 1000BASE-T Status  <3800>
               PHY Extended Status    <3000>
               PCI Status             <10>
[39964.844187] ------------[ cut here ]------------
[39964.844190] NETDEV WATCHDOG: enp0s31f6 (e1000e): transmit queue 0 timed out
[39964.844201] WARNING: CPU: 5 PID: 0 at net/sched/sch_generic.c:448 dev_watchdog+0x264/0x270
[39964.844202] Modules linked in: veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter softdog nfnetlink_log nfnetlink snd_hda_codec_hdmi zfs(PO) zunicode(PO) zlua(PO) zavl(PO) snd_hda_codec_realtek icp(PO) snd_hda_codec_generic ledtrig_audio zcommon(PO) znvpair(PO) spl(O) vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi vfio_pci mei_hdcp vfio_virqfd kvmgt intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_intel_dspcfg btusb aesni_intel btrtl snd_hda_codec crypto_simd btbcm btintel snd_hda_core cryptd bluetooth glue_helper snd_hwdep snd_pcm ecdh_generic snd_timer ecc intel_cstate snd soundcore mei_me intel_rapl_perf 8250_dw intel_pch_thermal mei intel_wmi_thunderbolt pcspkr mac_hid acpi_pad i915 vfio_mdev mdev vfio_iommu_type1 vfio drm_kms_helper drm i2c_algo_bit
[39964.844226]  fb_sys_fops syscopyarea sysfillrect sysimgblt kvm irqbypass sunrpc ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq hid_generic dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c usbhid hid intel_lpss_pci xhci_pci intel_lpss e1000e ahci idma64 i2c_i801 virt_dma libahci xhci_hcd wmi video
[39964.844237] CPU: 5 PID: 0 Comm: swapper/5 Tainted: P           O      5.4.44-2-pve #1
[39964.844237] Hardware name: SECO S.p.A. MB09/MB09, BIOS MB09 1.03 2018/08/30
[39964.844239] RIP: 0010:dev_watchdog+0x264/0x270
[39964.844240] Code: 48 85 c0 75 e6 eb a0 4c 89 ef c6 05 81 1a eb 00 01 e8 80 b1 fa ff 89 d9 4c 89 ee 48 c7 c7 70 2f 03 89 48 89 c2 e8 cd 7a 74 ff <0f> 0b eb 82 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41
[39964.844241] RSP: 0018:ffffbfa280230e58 EFLAGS: 00010282
[39964.844242] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
[39964.844243] RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffff9a299db578c0
[39964.844243] RBP: ffffbfa280230e88 R08: 00000000000003a1 R09: 0000000000000004
[39964.844244] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000001
[39964.844244] R13: ffff9a2990a60000 R14: ffff9a2990a60480 R15: ffff9a2991497c80
[39964.844245] FS:  0000000000000000(0000) GS:ffff9a299db40000(0000) knlGS:0000000000000000
[39964.844246] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[39964.844246] CR2: 00007ff83fb170f4 CR3: 00000002fc20a006 CR4: 00000000003626e0
[39964.844247] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[39964.844247] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[39964.844248] Call Trace:
[39964.844249]  <IRQ>
[39964.844252]  ? pfifo_fast_enqueue+0x160/0x160
[39964.844254]  call_timer_fn+0x32/0x130
[39964.844255]  run_timer_softirq+0x1a5/0x430
[39964.844257]  ? enqueue_hrtimer+0x3c/0x90
[39964.844258]  ? ktime_get+0x3c/0xa0
[39964.844260]  ? lapic_next_deadline+0x26/0x30
[39964.844261]  ? clockevents_program_event+0x93/0xf0
[39964.844264]  __do_softirq+0xdc/0x2d4
[39964.844265]  irq_exit+0xa9/0xb0
[39964.844267]  smp_apic_timer_interrupt+0x79/0x130
[39964.844268]  apic_timer_interrupt+0xf/0x20
[39964.844269]  </IRQ>
[39964.844271] RIP: 0010:cpuidle_enter_state+0xbd/0x450
[39964.844272] Code: ff e8 a7 b4 84 ff 80 7d c7 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 63 03 00 00 31 ff e8 ca 22 8b ff fb 66 0f 1f 44 00 00 <45> 85 ed 0f 88 8d 02 00 00 49 63 cd 48 8b 75 d0 48 2b 75 c8 48 8d
[39964.844273] RSP: 0018:ffffbfa280113e48 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[39964.844274] RAX: ffff9a299db6ad40 RBX: ffffffff89357a00 RCX: 000000000000001f
[39964.844274] RDX: 000024590a574828 RSI: 000000002aaa9f7b RDI: 0000000000000000
[39964.844275] RBP: ffffbfa280113e88 R08: 0000000000000002 R09: 000000000002a5c0
[39964.844275] R10: 00006d19820469de R11: ffff9a299db699e0 R12: ffffdfa27fd52500
[39964.844276] R13: 0000000000000006 R14: ffffffff89357c58 R15: ffffffff89357c40
[39964.844278]  ? cpuidle_enter_state+0x99/0x450
[39964.844280]  cpuidle_enter+0x2e/0x40
[39964.844282]  call_cpuidle+0x23/0x40
[39964.844283]  do_idle+0x22c/0x270
[39964.844285]  cpu_startup_entry+0x1d/0x20
[39964.844286]  start_secondary+0x166/0x1c0
[39964.844288]  secondary_startup_64+0xa4/0xb0
[39964.844289] ---[ end trace 6978f9a6f235f4ac ]---
[39964.844300] e1000e 0000:00:1f.6 enp0s31f6: Reset adapter unexpectedly
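To make the dumps above easier to compare, here is a small sketch that parses one "Detected Hardware Unit Hang" block and derives two numbers from it: how many descriptors sit between TDH and TDT, and how old the stuck buffer is in jiffies. The function names are made up for illustration, and the ring size of 256 is an assumption (it is the usual e1000e default, but `ethtool -g <iface>` shows the real value).

```python
import re

# One "name <hexvalue>" pair per line, as printed in the dump.
FIELD_RE = re.compile(r"\s*(.+?)\s*<([0-9a-fA-F]+)>\s*$")


def parse_hang_dump(text):
    """Map each field name in a hang dump to its integer value."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            fields[match.group(1)] = int(match.group(2), 16)
    return fields


def summarize(fields, ring_size=256):
    """Derive queue depth and stall age from parsed fields.

    ring_size=256 is an assumption, not a value taken from the log.
    """
    return {
        # Descriptors queued by the driver that the hardware has not consumed.
        "pending_descriptors": (fields["TDT"] - fields["TDH"]) % ring_size,
        # Age of the oldest unfinished buffer, in jiffies.
        "stalled_jiffies": fields["jiffies"] - fields["time_stamp"],
    }


# Field lines copied from the first dump in this comment.
DUMP = """\
  TDH                  <42>
  TDT                  <6f>
  next_to_use          <6f>
  next_to_clean        <42>
buffer_info[next_to_clean]:
  time_stamp           <100974783>
  next_to_watch        <42>
  jiffies              <100974a40>
  next_to_watch.status <0>
MAC Status             <80083>
"""

if __name__ == "__main__":
    print(summarize(parse_hang_dump(DUMP)))
```

For the first dump this gives 45 pending descriptors and a stall of 701 jiffies. TDH stays at 0x42 in all three dumps, so the hardware made no progress on the ring between them. The first two dumps are about 1.98 s apart and their jiffies values differ by 0x1f0 = 496, which suggests HZ=250 on this kernel, so 701 jiffies would be roughly 2.8 s.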
Comment 71 Roberto Viola 2020-07-29 06:40:40 UTC
OK, I can confirm that the patch DOESN'T SOLVE the issue. It only postpones it.

After one week of uptime, the error came back:

Jul 26 00:00:02 pvei7 rsyslogd:  [origin software="rsyslogd" swVersion="8.1901.0" x-pid="798" x-info="https://www.rsyslog.com"] rsyslogd was HUPed
Jul 27 00:00:01 pvei7 rsyslogd:  [origin software="rsyslogd" swVersion="8.1901.0" x-pid="798" x-info="https://www.rsyslog.com"] rsyslogd was HUPed
Jul 27 02:59:04 pvei7 lvm[492]: WARNING: Thin pool pve-data-tpool data is now 90.01% full.
Jul 27 18:10:03 pvei7 kernel: [511516.174678] perf: interrupt took too long (3917 > 3911), lowering kernel.perf_event_max_sample_rate to 51000
Jul 28 00:00:01 pvei7 rsyslogd:  [origin software="rsyslogd" swVersion="8.1901.0" x-pid="798" x-info="https://www.rsyslog.com"] rsyslogd was HUPed
Jul 28 19:46:58 pvei7 kernel: [603731.430464] vmbr0: port 1(enp0s31f6) entered disabled state
Jul 28 19:47:02 pvei7 kernel: [603735.319318] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Jul 28 19:47:02 pvei7 kernel: [603735.319389] vmbr0: port 1(enp0s31f6) entered blocking state
Jul 28 19:47:02 pvei7 kernel: [603735.319394] vmbr0: port 1(enp0s31f6) entered forwarding state
Jul 28 19:47:07 pvei7 kernel: [603740.386574] vmbr0: port 1(enp0s31f6) entered disabled state
Jul 28 19:47:11 pvei7 kernel: [603744.357497] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Jul 28 19:47:11 pvei7 kernel: [603744.357566] vmbr0: port 1(enp0s31f6) entered blocking state
Jul 28 19:47:11 pvei7 kernel: [603744.357569] vmbr0: port 1(enp0s31f6) entered forwarding state
Jul 28 19:47:21 pvei7 kernel: [603754.466795] vmbr0: port 1(enp0s31f6) entered disabled state
Jul 28 19:47:25 pvei7 kernel: [603758.362778] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Jul 28 19:47:25 pvei7 kernel: [603758.362848] vmbr0: port 1(enp0s31f6) entered blocking state
Jul 28 19:47:25 pvei7 kernel: [603758.362851] vmbr0: port 1(enp0s31f6) entered forwarding state
Jul 28 19:47:35 pvei7 kernel: [603768.547088] vmbr0: port 1(enp0s31f6) entered disabled state
Jul 28 19:47:39 pvei7 kernel: [603772.512977] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Jul 28 19:47:39 pvei7 kernel: [603772.513047] vmbr0: port 1(enp0s31f6) entered blocking state
Jul 28 19:47:39 pvei7 kernel: [603772.513050] vmbr0: port 1(enp0s31f6) entered forwarding state
Jul 28 19:47:49 pvei7 kernel: [603782.371358] vmbr0: port 1(enp0s31f6) entered disabled state
Jul 28 19:47:53 pvei7 kernel: [603786.306186] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Jul 28 19:47:53 pvei7 kernel: [603786.306258] vmbr0: port 1(enp0s31f6) entered blocking state
Jul 28 19:47:53 pvei7 kernel: [603786.306262] vmbr0: port 1(enp0s31f6) entered forwarding state
Jul 28 19:48:03 pvei7 kernel: [603796.455426] vmbr0: port 1(enp0s31f6) entered disabled state
Jul 28 19:48:07 pvei7 kernel: [603800.433411] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Jul 28 19:48:07 pvei7 kernel: [603800.433483] vmbr0: port 1(enp0s31f6) entered blocking state
Jul 28 19:48:07 pvei7 kernel: [603800.433487] vmbr0: port 1(enp0s31f6) entered forwarding state
Jul 28 19:48:17 pvei7 kernel: [603810.531636] vmbr0: port 1(enp0s31f6) entered disabled state
Jul 28 19:48:21 pvei7 kernel: [603814.478659] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Jul 28 19:48:21 pvei7 kernel: [603814.478732] vmbr0: port 1(enp0s31f6) entered blocking state
Jul 28 19:48:21 pvei7 kernel: [603814.478736] vmbr0: port 1(enp0s31f6) entered forwarding state
Jul 28 19:48:31 pvei7 kernel: [603824.355883] vmbr0: port 1(enp0s31f6) entered disabled state
Jul 28 19:48:35 pvei7 kernel: [603828.308836] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Jul 28 19:48:35 pvei7 kernel: [603828.308914] vmbr0: port 1(enp0s31f6) entered blocking state
Jul 28 19:48:35 pvei7 kernel: [603828.308918] vmbr0: port 1(enp0s31f6) entered forwarding state
Jul 28 19:48:40 pvei7 kernel: [603833.572009] vmbr0: port 1(enp0s31f6) entered disabled state
Jul 28 19:48:44 pvei7 kernel: [603837.425924] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Jul 28 19:48:44 pvei7 kernel: [603837.426002] vmbr0: port 1(enp0s31f6) entered blocking state
Jul 28 19:48:44 pvei7 kernel: [603837.426005] vmbr0: port 1(enp0s31f6) entered forwarding state
Jul 28 19:48:49 pvei7 kernel: [603842.536142] vmbr0: port 1(enp0s31f6) entered disabled state
Jul 28 19:48:53 pvei7 kernel: [603846.400033] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Jul 28 19:48:53 pvei7 kernel: [603846.400124] vmbr0: port 1(enp0s31f6) entered blocking state
Jul 28 19:48:53 pvei7 kernel: [603846.400128] vmbr0: port 1(enp0s31f6) entered forwarding state
Jul 28 19:48:58 pvei7 kernel: [603851.496230] vmbr0: port 1(enp0s31f6) entered disabled state
Jul 28 19:49:02 pvei7 kernel: [603855.363223] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Jul 28 19:49:02 pvei7 kernel: [603855.363292] vmbr0: port 1(enp0s31f6) entered blocking state
Jul 28 19:49:02 pvei7 kernel: [603855.363296] vmbr0: port 1(enp0s31f6) entered forwarding state
Jul 28 19:49:12 pvei7 kernel: [603865.572422] vmbr0: port 1(enp0s31f6) entered disabled state
Jul 28 19:49:16 pvei7 kernel: [603869.442345] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Jul 28 19:49:16 pvei7 kernel: [603869.442418] vmbr0: port 1(enp0s31f6) entered blocking state
Jul 28 19:49:16 pvei7 kernel: [603869.442422] vmbr0: port 1(enp0s31f6) entered forwarding state
Jul 28 19:49:21 pvei7 kernel: [603874.532574] vmbr0: port 1(enp0s31f6) entered disabled state
Jul 28 19:49:25 pvei7 kernel: [603878.468504] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Jul 28 19:49:25 pvei7 kernel: [603878.468576] vmbr0: port 1(enp0s31f6) entered blocking state
Jul 28 19:49:25 pvei7 kernel: [603878.468579] vmbr0: port 1(enp0s31f6) entered forwarding state
Jul 28 19:49:35 pvei7 kernel: [603888.356714] vmbr0: port 1(enp0s31f6) entered disabled state
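To put numbers on the up/down loop, the log above can be reduced to "how long was the link down each time". This is only a sketch: it pairs each bridge "entered disabled state" line with the next "NIC Link is Up" line and uses the kernel timestamps in square brackets. The function name `link_down_periods` is made up for illustration.

```python
import re

# Kernel timestamp such as "[603731.430464]".
TS_RE = re.compile(r"\[\s*(\d+\.\d+)\]")


def link_down_periods(log_text):
    """Return (down_timestamp, seconds_until_link_up) pairs."""
    periods = []
    down_at = None
    for line in log_text.splitlines():
        match = TS_RE.search(line)
        if not match:
            continue
        stamp = float(match.group(1))
        if "entered disabled state" in line:
            down_at = stamp
        elif "NIC Link is Up" in line and down_at is not None:
            periods.append((down_at, round(stamp - down_at, 3)))
            down_at = None
    return periods


# First two cycles copied from the log in this comment.
SAMPLE = """\
Jul 28 19:46:58 pvei7 kernel: [603731.430464] vmbr0: port 1(enp0s31f6) entered disabled state
Jul 28 19:47:02 pvei7 kernel: [603735.319318] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Jul 28 19:47:07 pvei7 kernel: [603740.386574] vmbr0: port 1(enp0s31f6) entered disabled state
Jul 28 19:47:11 pvei7 kernel: [603744.357497] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
"""

if __name__ == "__main__":
    for down_at, seconds in link_down_periods(SAMPLE):
        print(f"down at {down_at:.3f}, back up after {seconds} s")
```

On the two cycles in the sample the link comes back after about 3.9 s and 4.0 s. The full log shows the same pattern: the link returns roughly 4 s after each drop and then drops again 5 to 10 s later.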
Comment 72 Adam Chasen 2021-03-16 20:07:04 UTC
I am experiencing similarly reported behavior with 5.10.22-200.fc33.x86_64 (Fedora). Usually works for an extended period (days) and then enters a connect/disconnect cycle with short or no connectivity.

I tried a different switch and cable while it was in the connect/disconnect cycle.

Excerpt from logs:

Mar 16 15:47:29 localhost.localdomain kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
                                                TDH                  <0>
                                                TDT                  <1>
                                                next_to_use          <1>
                                                next_to_clean        <0>
                                              buffer_info[next_to_clean]:
                                                time_stamp           <10b2a22d2>
                                                next_to_watch        <0>
                                                jiffies              <10b2a3a00>
                                                next_to_watch.status <0>
                                              MAC Status             <40080083>
                                              PHY Status             <796d>
                                              PHY 1000BASE-T Status  <3800>
                                              PHY Extended Status    <3000>
                                              PCI Status             <10>
Mar 16 15:47:31 localhost.localdomain kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
                                                TDH                  <0>
                                                TDT                  <1>
                                                next_to_use          <1>
                                                next_to_clean        <0>
                                              buffer_info[next_to_clean]:
                                                time_stamp           <10b2a22d2>
                                                next_to_watch        <0>
                                                jiffies              <10b2a41c0>
                                                next_to_watch.status <0>
                                              MAC Status             <40080083>
                                              PHY Status             <796d>
                                              PHY 1000BASE-T Status  <3800>
                                              PHY Extended Status    <3000>
                                              PCI Status             <10>
Mar 16 15:47:33 localhost.localdomain kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
                                                TDH                  <0>
                                                TDT                  <1>
                                                next_to_use          <1>
                                                next_to_clean        <0>
                                              buffer_info[next_to_clean]:
                                                time_stamp           <10b2a22d2>
                                                next_to_watch        <0>
                                                jiffies              <10b2a4980>
                                                next_to_watch.status <0>
                                              MAC Status             <40080083>
                                              PHY Status             <796d>
                                              PHY 1000BASE-T Status  <3800>
                                              PHY Extended Status    <3000>
                                              PCI Status             <10>
Mar 16 15:47:33 localhost.localdomain kernel: e1000e 0000:00:1f.6 eno1: Reset adapter unexpectedly
Mar 16 15:47:33 localhost.localdomain NetworkManager[1140]: <info>  [1615924053.4093] device (eno1): state change: disconnected -> unavailable (reason 'carrier-changed', sys-iface-state: 'managed')
Mar 16 15:47:38 localhost.localdomain systemd[1]: NetworkManager-dispatcher.service: Succeeded.
Mar 16 15:47:38 localhost.localdomain audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Mar 16 15:47:39 localhost.localdomain kernel: e1000e 0000:00:1f.6 eno1: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
Mar 16 15:47:39 localhost.localdomain NetworkManager[1140]: <info>  [1615924059.2580] device (eno1): carrier: link connected
Mar 16 15:47:39 localhost.localdomain NetworkManager[1140]: <info>  [1615924059.2586] device (eno1): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed')
Mar 16 15:47:41 localhost.localdomain kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
                                                TDH                  <0>
                                                TDT                  <1>
                                                next_to_use          <1>
                                                next_to_clean        <0>
                                              buffer_info[next_to_clean]:
                                                time_stamp           <10b2a60df>
                                                next_to_watch        <0>
                                                jiffies              <10b2a68c1>
                                                next_to_watch.status <0>
                                              MAC Status             <40080083>
                                              PHY Status             <796d>
                                              PHY 1000BASE-T Status  <3800>
                                              PHY Extended Status    <3000>
                                              PCI Status             <10>
Mar 16 15:47:43 localhost.localdomain kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
                                                TDH                  <0>
                                                TDT                  <1>
                                                next_to_use          <1>
                                                next_to_clean        <0>
                                              buffer_info[next_to_clean]:
                                                time_stamp           <10b2a60df>
                                                next_to_watch        <0>
                                                jiffies              <10b2a7080>
                                                next_to_watch.status <0>
                                              MAC Status             <40080083>
                                              PHY Status             <796d>
                                              PHY 1000BASE-T Status  <3800>
                                              PHY Extended Status    <3000>
                                              PCI Status             <10>
Mar 16 15:47:44 localhost.localdomain kernel: e1000e 0000:00:1f.6 eno1: Reset adapter unexpectedly
Mar 16 15:47:44 localhost.localdomain NetworkManager[1140]: <info>  [1615924064.6731] device (eno1): state change: disconnected -> unavailable (reason 'carrier-changed', sys-iface-state: 'managed')
Mar 16 15:47:50 localhost.localdomain kernel: e1000e 0000:00:1f.6 eno1: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
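The "MAC Status" value in these dumps can be decoded to see what the MAC thought about the link at the moment of the hang. The sketch below does that for the two values seen in this bug (0x80083 in comment 70, 0x40080083 in the dumps above). The bit masks are written from my recollection of the `E1000_STATUS_*` definitions in the e1000e driver, so please treat them as an assumption and check them against your kernel tree. Any bit not covered is reported as-is rather than guessed at.

```python
# Assumed bit layout, recalled from the e1000e E1000_STATUS_* defines.
STATUS_FD = 0x00000001                  # full duplex
STATUS_LU = 0x00000002                  # link up
STATUS_SPEED_MASK = 0x000000C0
STATUS_SPEED_MBPS = {0x00: 10, 0x40: 100, 0x80: 1000}
STATUS_GIO_MASTER_ENABLE = 0x00080000   # bus master requests enabled

KNOWN_BITS = (STATUS_FD | STATUS_LU | STATUS_SPEED_MASK
              | STATUS_GIO_MASTER_ENABLE)


def decode_mac_status(value):
    """Split a MAC Status register value into the fields named above."""
    return {
        "full_duplex": bool(value & STATUS_FD),
        "link_up": bool(value & STATUS_LU),
        # None if the speed bits hold a pattern not listed above.
        "speed_mbps": STATUS_SPEED_MBPS.get(value & STATUS_SPEED_MASK),
        "gio_master_enable": bool(value & STATUS_GIO_MASTER_ENABLE),
        # Bits this sketch does not interpret.
        "other_bits": hex(value & ~KNOWN_BITS),
    }


if __name__ == "__main__":
    for raw in (0x80083, 0x40080083):
        print(hex(raw), decode_mac_status(raw))
```

Under those assumptions both values decode to link up, full duplex, 1000 Mb/s, which matches the "NIC Link is Up 1000 Mbps Full Duplex" lines. That would mean the MAC still reported a good link while the transmit ring was stuck. The only difference between the two values is bit 0x40000000, which this sketch leaves uninterpreted.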