|Summary:||Regression in e1000e with kernel 4.14.3|
|Component:||Other||Assignee:||Stephen Hemminger (stephen)|
|Severity:||normal||CC:||andrey.vihrov, athantor+bugzillakernel, benjamin.poirier, conrad, hydrapolic, jeffrey.t.kirsher, jn, jur, kernel-bugzilla, kernel, kernel, mail, marc.schlaich, michael, public, rauchwolke, rwarsow, till2.schaefer, wgh|
Description Ronald 2017-11-30 19:32:04 UTC
Created attachment 260963 [details]
screenshot network manager

I have a regression with my e1000e network interface. With kernel 4.14.3 on Fedora 27, the interface does not come up if I set the MTU to 1492 and boot with that setting. With the MTU set to auto, the interface is sometimes active and sometimes not.

In NetworkManager (Fedora 27) the button to activate/deactivate the network interface is greyed out. The network LEDs on the back of the computer are off. There are no errors in the logs and no SELinux issues.

With all kernels from the 4.13 series, up to 4.13.2, I got no errors.

How can I debug this?

Possibly of interest: with the late Fedora 27 Beta (kernel 4.12.x, I believe) I remember issues when switching the interface MTU from auto to 1492.

lspci:
=====
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2) I219-V
	Subsystem: Micro-Star International Co., Ltd. [MSI] Device 7a72
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 123
	Region 0: Memory at df100000 (32-bit, non-prefetchable) [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel driver in use: e1000e

dmesg | grep -iE 'eth|enp0s31f6':
================================
[ 1.255886] e1000e 0000:00:1f.6 eth0: (PCI Express:2.5GT/s:Width x1) 4c:cc:6a:bc:8c:a2
[ 1.255889] e1000e 0000:00:1f.6 eth0: Intel(R) PRO/1000 Network Connection
[ 1.256032] e1000e 0000:00:1f.6 eth0: MAC: 12, PHY: 12, PBA No: FFFFFF-0FF
[ 1.495646] e1000e 0000:00:1f.6 enp0s31f6: renamed from eth0
[ 4.091853] e1000e 0000:00:1f.6 enp0s31f6: changing MTU from 1500 to 1492
[ 4.250990] IPv6: ADDRCONF(NETDEV_UP): enp0s31f6: link is not ready
[ 4.463180] IPv6: ADDRCONF(NETDEV_UP): enp0s31f6: link is not ready

nmcli connection show:
=====================
NAME       UUID                                  TYPE            DEVICE
Profile 1  0e0cc197-be48-43a7-83d8-423ee89a448e  802-3-ethernet  --
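One low-level starting point for the debugging question above is to read the link state straight from sysfs, independent of NetworkManager. The following is a minimal sketch, not part of the original report: the helper name `read_link_state` and the `sysfs_root` parameter are made up for illustration, and it assumes the standard `/sys/class/net/<iface>/` attributes `operstate`, `carrier` and `mtu`.

```python
from pathlib import Path


def read_link_state(iface, sysfs_root="/sys/class/net"):
    """Return operstate, carrier and mtu for iface as reported by sysfs.

    read_link_state and sysfs_root are illustrative names (assumptions),
    not an existing API. A value of None means the attribute could not
    be read.
    """
    base = Path(sysfs_root) / iface
    state = {}
    for name in ("operstate", "carrier", "mtu"):
        try:
            state[name] = (base / name).read_text().strip()
        except OSError:
            # Reading 'carrier' fails while the interface is
            # administratively down; a missing file ends up here too.
            state[name] = None
    return state


if __name__ == "__main__":
    # Interface name taken from the dmesg output in the report.
    print(read_link_state("enp0s31f6"))
```

A `carrier` value of `0` while the interface is administratively up, together with an `operstate` other than `up`, would match the "link is not ready" messages in the dmesg output.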
Comment 1 Ronald 2017-11-30 19:52:25 UTC
Update: I realized that I *always* set the MTU to 1492 via rc.local. I commented that out and rebooted. Even without the rc.local MTU setting, e1000e activation is flaky. If the network does not get activated, I need a complete poweroff and reboot to activate it.
Comment 2 Ronald 2017-12-01 15:36:10 UTC
Update: e1000e does not get activated at all. Back to 4.13.2. One difference: new BIOS since yesterday.
Comment 3 Tim Ruffing 2017-12-04 16:05:09 UTC
I have the same issue on Arch Linux (regression between 4.13.12 and 4.14.3). I checked only a few times with the new kernel, and the interface never worked with the 4.14 kernel, i.e., the link was never ready. Interestingly, when I tried a soft reboot, my machine hung in the UEFI. After Ctrl-Alt-Del it hung again, so I needed a full poweroff. The whole thing happened twice and I've never seen this before, so it could actually be related to the driver issue. Maybe the device is in some weird state and the UEFI tries to initialize it, or similar.
Comment 4 Tim Ruffing 2017-12-04 16:06:58 UTC
Forgot to include this:

00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (4) I219-LM (rev 21)
	Subsystem: Lenovo Ethernet Connection (4) I219-LM
	Flags: bus master, fast devsel, latency 0, IRQ 128
	Memory at ed200000 (32-bit, non-prefetchable) [size=128K]
	Capabilities: [c8] Power Management version 3
	Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [e0] PCI Advanced Features
	Kernel driver in use: e1000e
	Kernel modules: e1000e

This is on a Lenovo T570.
Comment 5 Ronald 2017-12-04 20:26:47 UTC
Kernel 4.14.4-rc1: ditto, no fun!
Comment 6 Felix Walter 2017-12-05 08:33:24 UTC
Can confirm this regression, also on an I219-LM Ethernet controller. Looks like it works with v4.14.2, but not with v4.14.3. There are only a few changes to e1000e between these revisions, and they seem to affect the link status: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/diff/drivers/net/ethernet/intel/e1000e?id=v4.14.3&id2=v4.14.2
Comment 7 Felix Walter 2017-12-05 12:04:36 UTC
Update: Seems reverting commit 830466993daf09adbd179e4c74db07279a088f8c ("e1000e: Separate signaling for link check/link up", upstream: 19110cfbb34d4af0cdfe14cd243f3b09dc95b013) on top of v4.14.3 fixes it.
Comment 8 Thomas Mann 2017-12-06 03:25:02 UTC
reverting commit 830466993daf09adbd179e4c74db07279a088f8c fixes the problem for me too
Comment 9 Thomas Mann 2017-12-06 03:33:34 UTC
By the way, I see a different symptom: for me the device comes up and gets an IP address after bootup, but as soon as the network cable is detached or the connected switch goes down, NetworkManager no longer "sees" the interface until I reboot the machine. Reverting 830466993daf09adbd179e4c74db07279a088f8c fixes the problem.
Comment 10 J. Niggemann 2017-12-06 09:13:53 UTC
I confirm that reverting 830466993daf09adbd179e4c74db07279a088f8c on top of 4.14.4 fixes the issue.

00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection I219-LM (rev 21)

[ 0.744127] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k
[ 0.744127] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[ 0.945030] e1000e 0000:00:1f.6 eth0: (PCI Express:2.5GT/s:Width x1) c8:5b:76:02:aa:10
[ 0.945033] e1000e 0000:00:1f.6 eth0: Intel(R) PRO/1000 Network Connection
[ 0.945260] e1000e 0000:00:1f.6 eth0: MAC: 12, PHY: 12, PBA No: 1000FF-0FF
[ 8.178968] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Comment 11 Ronald 2017-12-08 20:51:32 UTC
applied patch from here: https://marc.info/?l=linux-kernel&m=151272209903675&w=2 on top of 4.14.5-rc1 and e1000e is great again :)
Comment 12 Michael Marley 2017-12-10 13:25:33 UTC
I too have this issue. In my case, on a NUC5i5MYHE, the network, once it goes down, does not seem to come back up until I connect using AMT. At that point, the link comes up and works normally.
Comment 13 Thomas Mann 2017-12-11 18:03:40 UTC
Applying the patch from https://marc.info/?l=linux-kernel&m=151272209903675&w=2 on top of 4.14.5 fixes the problem for me too.
Comment 14 Thomas Mann 2017-12-14 15:32:29 UTC
please get the patch into stable. still not fixed in 4.14.6
Comment 15 Thomas Mann 2017-12-14 15:33:25 UTC
Created attachment 261183 [details] patch from https://marc.info/?l=linux-kernel&m=151272209903675&w=2
Comment 16 Till Schäfer 2017-12-20 14:21:19 UTC
I can confirm the issue for 4.14.7 on two machines (both gentoo-sources):
- Lenovo Thinkpad T440p: 00:19.0 Ethernet controller: Intel Corporation Ethernet Connection I217-LM (rev 04)
- Desktop: 00:19.0 Ethernet controller: Intel Corporation 82579V Gigabit Network Connection (rev 04)

Link does not come up. From dmesg on the Thinkpad:

[ 0.137961] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k
[ 0.137964] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[ 0.138078] e1000e 0000:00:19.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
[ 0.225862] e1000e 0000:00:19.0 0000:00:19.0 (uninitialized): registered PHC clock
[ 0.305328] e1000e 0000:00:19.0 eth0: (PCI Express:2.5GT/s:Width x1) 68:f7:28:8d:d0:25
[ 0.305333] e1000e 0000:00:19.0 eth0: Intel(R) PRO/1000 Network Connection
[ 0.305367] e1000e 0000:00:19.0 eth0: MAC: 11, PHY: 12, PBA No: 1000FF-0FF
[ 1.999343] e1000e 0000:00:19.0 enp0s25: renamed from eth0
[ 4.143835] IPv6: ADDRCONF(NETDEV_UP): enp0s25: link is not ready
[ 4.351132] IPv6: ADDRCONF(NETDEV_UP): enp0s25: link is not ready

# ethtool enp0s25
Settings for enp0s25:
	Supported ports: [ TP ]
	Supported link modes:   10baseT/Half 10baseT/Full
	                        100baseT/Half 100baseT/Full
	                        1000baseT/Full
	Supported pause frame use: No
	Supports auto-negotiation: Yes
	Advertised link modes:  10baseT/Half 10baseT/Full
	                        100baseT/Half 100baseT/Full
	                        1000baseT/Full
	Advertised pause frame use: No
	Advertised auto-negotiation: Yes
	Speed: Unknown!
	Duplex: Unknown! (255)
	Port: Twisted Pair
	PHYAD: 2
	Transceiver: internal
	Auto-negotiation: on
	MDI-X: Unknown (auto)
	Supports Wake-on: pumbg
	Wake-on: g
	Current message level: 0x00000007 (7)
	                       drv probe link
	Link detected: no

This is with MTU 1500...
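For anyone checking several machines, ethtool output like the above can be reduced to a simple up/down answer. This is only a sketch under assumptions: `parse_ethtool` and `link_is_up` are hypothetical helper names, and the parser relies on nothing more than the "Key: value" line layout visible in this comment.

```python
def parse_ethtool(text):
    """Collect the 'Key: value' lines of ethtool output into a dict.

    Hypothetical helper for illustration. Continuation lines (such as
    the wrapped link-mode lists) and headers with an empty value
    ('Settings for enp0s25:') are skipped.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def link_is_up(fields):
    """True only if ethtool reported 'Link detected: yes'."""
    return fields.get("Link detected") == "yes"


if __name__ == "__main__":
    # Excerpt of the output quoted in this comment.
    sample = (
        "Settings for enp0s25:\n"
        "\tSpeed: Unknown!\n"
        "\tDuplex: Unknown! (255)\n"
        "\tAuto-negotiation: on\n"
        "\tLink detected: no\n"
    )
    fields = parse_ethtool(sample)
    print(fields["Speed"], link_is_up(fields))
```

On an affected kernel the text could come from running `ethtool <iface>` through `subprocess.run(..., capture_output=True, text=True)`. The parsing is kept separate so it can be tried on saved output.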
Comment 17 Till Schäfer 2017-12-20 14:49:01 UTC
The attached patch 261183 works for both machines (mentioned above).
Comment 18 Andreas Ziegler 2017-12-23 17:00:53 UTC
would be nice to get this into a release! me and some people i know are also affected by this bug
Comment 19 Dillon Dixon 2017-12-24 07:12:45 UTC
^ me too. I have some servers I've been purposely holding back kernel updates on until this is merged.
Comment 20 Andreas Ziegler 2018-01-10 11:51:34 UTC
The patch landed in Linus' tree 8 days ago. We'll see how long it takes for the patch to get into a 4.14.x or 4.15.x release...
Comment 21 Ronald 2018-01-10 12:20:58 UTC
Yeah, I pinged the developer and LKML yesterday. Answer:

It (the patch) was part of the last network pull request and should be included in the next mainline release as
4110e02eb45e e1000e: Fix e1000_check_for_copper_link_ich8lan return value.

It's needed in stable branches that include commit 19110cfbb34d ("e1000e: Separate signaling for link check/link up"):
linux-4.14.y
linux-4.9.y
linux-4.4.y
linux-4.1.y
linux-3.18.y

The developer cc'ed it to firstname.lastname@example.org too! The patch is still not in 4.14.13! Next stable round maybe...
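The rule in the developer's answer (a branch needs the fix if it contains the "Separate signaling" commit but not yet the follow-up) can be written down as a small check. This is a sketch under assumptions: `needs_fix` is a hypothetical helper, and it presumes that stable backports keep the upstream subject lines, so that subjects collected with something like `git log --format=%s <tag> -- drivers/net/ethernet/intel/e1000e` can be compared as plain strings.

```python
# Subject lines as quoted in this thread.
REGRESSION = "e1000e: Separate signaling for link check/link up"
FIX = "e1000e: Fix e1000_check_for_copper_link_ich8lan return value"


def needs_fix(subjects):
    """Return True if the regression commit is present without the fix.

    'subjects' is an iterable of commit subject lines for the e1000e
    driver directory of one branch or tag. Hypothetical helper: it
    assumes backports keep the upstream subject line unchanged.
    """
    seen = {s.strip() for s in subjects}
    return REGRESSION in seen and FIX not in seen


if __name__ == "__main__":
    # A branch with the regression commit but without the fix.
    print(needs_fix([REGRESSION]))
    # A branch that has picked up the fix as well.
    print(needs_fix([REGRESSION, FIX]))
```

Matching on subject lines rather than hashes is a deliberate choice here, since a backport gets a new hash in each stable branch (as with 830466993daf versus upstream 19110cfbb34d earlier in this thread).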
Comment 22 Ronald 2018-01-15 17:48:22 UTC
Houston, the patch has landed! :-) Thanks!!!

Tested:
- 4.14.14
- 4.15-rc8
Comment 23 Jur van der Burg 2018-01-18 07:59:22 UTC
The issue is still present in the 4.14.14 kernel. When I create a virtual machine on VMware ESXi 6.5 and give it an e1000e interface, it will not come online after booting. I have to disconnect and reconnect it in the VM config before it will show online. If I use the e1000e driver from 4.14.2, it works without an issue.
Comment 24 Jeff Kirsher 2018-04-18 20:25:19 UTC
Thanks Stephen for bringing this to my attention. I have developers looking into the issue, since it still seems to be present in 4.14.14 kernels.
Comment 25 Benjamin Poirier 2018-04-24 07:06:33 UTC
(In reply to Jur van der Burg from comment #23)
> The issue is still present in the 4.14.14 kernel. When i create a virtual
> machine on vmware ESXi 6.5 and give it an e1000e interface it will not come
> online after booting. I have to disconnect en reconnect it in the vm config
> before it will show online.
>
> If I use the e1000e driver from 4.14.2 then it will work without an issue.

The VMware problem is a different issue. It needs a backport of commit 745d0bd3af99 ("e1000e: Remove Other from EIAC", v4.16-rc7). It works on 4.14.2 because mainline commit 4aea7a5c5e94 ("e1000e: Avoid receiver overrun interrupt bursts", v4.15-rc1) was only backported to stable in 4.14.3, as commit 10d0fd293103.