Bug 198047 - Regression in e1000e with kernel 4.14.3
Status: NEW
Alias: None
Product: Networking
Classification: Unclassified
Component: Other
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: Stephen Hemminger
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-11-30 19:32 UTC by Ronald
Modified: 2018-04-24 07:06 UTC
CC List: 19 users

See Also:
Kernel Version: 4.14.3
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
screenshot network manager (37.91 KB, image/png)
2017-11-30 19:32 UTC, Ronald
patch from https://marc.info/?l=linux-kernel&m=151272209903675&w=2 (1.39 KB, patch)
2017-12-14 15:33 UTC, Thomas Mann

Description Ronald 2017-11-30 19:32:04 UTC
Created attachment 260963
screenshot network manager

I have a regression with my e1000e network interface.

With kernel 4.14.3 on Fedora 27, the e1000e network interface does not come up if I set the MTU to 1492 and boot with that setting.

With the MTU set to auto, the interface is sometimes active and sometimes not.

In NetworkManager (Fedora 27) the button to activate/deactivate the network interface is greyed out. The network LEDs on the back of the computer are off.

There are no errors in the logs and no SELinux issues.

With all 4.13 series kernels up to 4.13.2 I got no errors.


How can I debug this?


Maybe of interest:
With the late Fedora 27 Beta (kernel 4.12.x, I believe) I remember issues when switching the interface MTU from auto to 1492.


lspci:
=====

00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2) I219-V
     Subsystem: Micro-Star International Co., Ltd. [MSI] Device 7a72
     Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
     Interrupt: pin A routed to IRQ 123
     Region 0: Memory at df100000 (32-bit, non-prefetchable) [disabled] [size=128K]
     Capabilities: <access denied>
     Kernel driver in use: e1000e 


dmesg | grep -iE 'eth|enp0s31f6':
================================

[    1.255886] e1000e 0000:00:1f.6 eth0: (PCI Express:2.5GT/s:Width x1) 4c:cc:6a:bc:8c:a2
[    1.255889] e1000e 0000:00:1f.6 eth0: Intel(R) PRO/1000 Network Connection
[    1.256032] e1000e 0000:00:1f.6 eth0: MAC: 12, PHY: 12, PBA No: FFFFFF-0FF
[    1.495646] e1000e 0000:00:1f.6 enp0s31f6: renamed from eth0
[    4.091853] e1000e 0000:00:1f.6 enp0s31f6: changing MTU from 1500 to 1492
[    4.250990] IPv6: ADDRCONF(NETDEV_UP): enp0s31f6: link is not ready
[    4.463180] IPv6: ADDRCONF(NETDEV_UP): enp0s31f6: link is not ready



nmcli connection show:
=====================

NAME       UUID                                  TYPE            DEVICE
Profile 1  0e0cc197-be48-43a7-83d8-423ee89a448e  802-3-ethernet  --
Comment 1 Ronald 2017-11-30 19:52:25 UTC
Update: I realized that I *always* set the MTU to 1492 via rc.local. I commented that out and rebooted.

Anyway, even without the rc.local MTU setting, e1000e activation is flaky.

If the network does not get activated, I need a complete poweroff and reboot to activate it.
Comment 2 Ronald 2017-12-01 15:36:10 UTC
Update:
e1000e does not get activated at all. Back to 4.13.2.
Difference: I installed a new BIOS yesterday.
Comment 3 Tim Ruffing 2017-12-04 16:05:09 UTC
I have the same issue on Arch Linux (regression between 4.13.12 and 4.14.3).

I checked only a few times with the new kernel, but the interface never worked with the 4.14 kernel, i.e., the link was never ready.

Interestingly, when I tried a (soft) reboot, my machine hung in the UEFI... After Ctrl-Alt-Del it hung again, so I needed a full poweroff. The whole thing happened twice and I've never seen this before, so it could actually be related to the driver issue. Maybe the device is in some weird state and UEFI tries to initialize it, or similar.
Comment 4 Tim Ruffing 2017-12-04 16:06:58 UTC
Forgot to include this:

00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (4) I219-LM (rev 21)
	Subsystem: Lenovo Ethernet Connection (4) I219-LM
	Flags: bus master, fast devsel, latency 0, IRQ 128
	Memory at ed200000 (32-bit, non-prefetchable) [size=128K]
	Capabilities: [c8] Power Management version 3
	Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [e0] PCI Advanced Features
	Kernel driver in use: e1000e
	Kernel modules: e1000e


This is on a Lenovo T570.
Comment 5 Ronald 2017-12-04 20:26:47 UTC
Kernel 4.14.4-rc1

Ditto: no fun!
Comment 6 Felix Walter 2017-12-05 08:33:24 UTC
Can confirm this regression, also on an I219-LM Ethernet controller. Looks like it works with v4.14.2, but not with v4.14.3. There are only a few changes to e1000e between these revisions and they seem to affect the link status:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/diff/drivers/net/ethernet/intel/e1000e?id=v4.14.3&id2=v4.14.2
Comment 7 Felix Walter 2017-12-05 12:04:36 UTC
Update: Seems reverting commit 830466993daf09adbd179e4c74db07279a088f8c ("e1000e: Separate signaling for link check/link up", upstream: 19110cfbb34d4af0cdfe14cd243f3b09dc95b013) on top of v4.14.3 fixes it.
Comment 8 Thomas Mann 2017-12-06 03:25:02 UTC
reverting commit 830466993daf09adbd179e4c74db07279a088f8c fixes the problem for me too
Comment 9 Thomas Mann 2017-12-06 03:33:34 UTC
Btw, I have a different error: for me the device comes up and takes an IP address (after bootup), but as soon as the network cable is detached or the connected switch goes down, NetworkManager isn't "seeing" the interface anymore until I reboot the machine. Reverting 830466993daf09adbd179e4c74db07279a088f8c fixes the problem.
Comment 10 J. Niggemann 2017-12-06 09:13:53 UTC
I confirm that reverting 830466993daf09adbd179e4c74db07279a088f8c on top of 4.14.4 fixes the issue.


00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection I219-LM (rev 21)

[    0.744127] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k
[    0.744127] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[    0.945030] e1000e 0000:00:1f.6 eth0: (PCI Express:2.5GT/s:Width x1) c8:5b:76:02:aa:10
[    0.945033] e1000e 0000:00:1f.6 eth0: Intel(R) PRO/1000 Network Connection
[    0.945260] e1000e 0000:00:1f.6 eth0: MAC: 12, PHY: 12, PBA No: 1000FF-0FF
[    8.178968] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Comment 11 Ronald 2017-12-08 20:51:32 UTC
applied patch from here:
https://marc.info/?l=linux-kernel&m=151272209903675&w=2

on top of 4.14.5-rc1

and e1000e is great again :)
Comment 12 Michael Marley 2017-12-10 13:25:33 UTC
I too have this issue.  In my case, on a NUC5i5MYHE, the network, once it goes down, does not seem to come back up until I connect using AMT.  At that point, the link comes up and works normally.
Comment 13 Thomas Mann 2017-12-11 18:03:40 UTC
Applying the patch from here
https://marc.info/?l=linux-kernel&m=151272209903675&w=2

on top of 4.14.5 fixes the problem for me too.
Comment 14 Thomas Mann 2017-12-14 15:32:29 UTC
please get the patch into stable. still not fixed in 4.14.6
Comment 16 Till Schäfer 2017-12-20 14:21:19 UTC
I can confirm the issue for 4.14.7 on two machines (both gentoo-sources): 


- Lenovo Thinkpad T440p: 00:19.0 Ethernet controller: Intel Corporation Ethernet Connection I217-LM (rev 04)
- Desktop: 00:19.0 Ethernet controller: Intel Corporation 82579V Gigabit Network Connection (rev 04)

Link does not come up: 

from dmesg on the thinkpad:
[    0.137961] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k
[    0.137964] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[    0.138078] e1000e 0000:00:19.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
[    0.225862] e1000e 0000:00:19.0 0000:00:19.0 (uninitialized): registered PHC clock
[    0.305328] e1000e 0000:00:19.0 eth0: (PCI Express:2.5GT/s:Width x1) 68:f7:28:8d:d0:25
[    0.305333] e1000e 0000:00:19.0 eth0: Intel(R) PRO/1000 Network Connection
[    0.305367] e1000e 0000:00:19.0 eth0: MAC: 11, PHY: 12, PBA No: 1000FF-0FF
[    1.999343] e1000e 0000:00:19.0 enp0s25: renamed from eth0 
[    4.143835] IPv6: ADDRCONF(NETDEV_UP): enp0s25: link is not ready
[    4.351132] IPv6: ADDRCONF(NETDEV_UP): enp0s25: link is not ready

# ethtool enp0s25
Settings for enp0s25:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Full 
        Supported pause frame use: No
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Full 
        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Speed: Unknown!
        Duplex: Unknown! (255)
        Port: Twisted Pair
        PHYAD: 2
        Transceiver: internal
        Auto-negotiation: on
        MDI-X: Unknown (auto)
        Supports Wake-on: pumbg
        Wake-on: g
        Current message level: 0x00000007 (7)
                               drv probe link
        Link detected: no


This is with MTU 1500...
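
For anyone who wants to check the same state from a script, here is a minimal Python sketch that reads the `Link detected` field from `ethtool` output like the one above. The helper names `parse_ethtool` and `link_is_up` and the shortened `SAMPLE` text are illustrative assumptions, not part of any existing tool.

```python
import re

# Shortened excerpt of the `ethtool enp0s25` output shown above.
SAMPLE = """\
Settings for enp0s25:
        Speed: Unknown!
        Duplex: Unknown! (255)
        Auto-negotiation: on
        Link detected: no
"""


def parse_ethtool(text):
    """Collect the indented 'Key: value' lines of `ethtool <iface>` output."""
    fields = {}
    for line in text.splitlines():
        # Only indented lines containing a colon are fields; the
        # 'Settings for ...' header and wrapped continuation lines are skipped.
        match = re.match(r"^\s+([^:]+):\s*(.*)$", line)
        if match:
            fields[match.group(1).strip()] = match.group(2).strip()
    return fields


def link_is_up(text):
    """Return True only if ethtool reports 'Link detected: yes'."""
    return parse_ethtool(text).get("Link detected") == "yes"


if __name__ == "__main__":
    print("link up:", link_is_up(SAMPLE))
```

On a live system the text would come from running `ethtool <interface>` instead of the embedded sample; the interface name differs per machine.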
Comment 17 Till Schäfer 2017-12-20 14:49:01 UTC
The attached patch 261183 works for both machines (mentioned above).
Comment 18 Andreas Ziegler 2017-12-23 17:00:53 UTC
would be nice to get this into a release! me and some people i know are also affected by this bug
Comment 19 Dillon Dixon 2017-12-24 07:12:45 UTC
^ me too. I have some servers I've been purposely holding back kernel updates on until this is merged.
Comment 20 Andreas Ziegler 2018-01-10 11:51:34 UTC
The patch landed in Linus' tree 8 days ago.
We'll see how long it takes for the patch to get into a 4.14.x or 4.15.x release...
Comment 21 Ronald 2018-01-10 12:20:58 UTC
Yeah, I pinged the developer and LKML yesterday:

answer:

It (the patch) was part of the last network pull request and should be included in the next mainline release as 4110e02eb45e ("e1000e: Fix e1000_check_for_copper_link_ich8lan return value").

It's needed in stable branches that include commit 19110cfbb34d
("e1000e: Separate signaling for link check/link up"):
	linux-4.14.y
	linux-4.9.y
	linux-4.4.y
	linux-4.1.y
	linux-3.18.y


the developer cc'ed it to stable@vger.kernel.org too !

patch is still not in 4.14.13 !

next stable round maybe...
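
To see whether a given stable branch carries the commit that needs the fix, one option is to search the branch log for the upstream hash. This is a rough Python sketch, under the assumption that stable backports quote the full upstream hash in their commit message (as the 830466993daf / 19110cfbb34d pair in comment 7 suggests). The function names, the `origin/` remote name, and the repository path are hypothetical.

```python
import subprocess

# Full upstream hash of "e1000e: Separate signaling for link check/link up"
# (taken from comment 7). The full hash is used on purpose: a "Fixes:" tag
# normally carries only a 12-character abbreviation, so it will not match.
UPSTREAM_SHA = "19110cfbb34d4af0cdfe14cd243f3b09dc95b013"

E1000E_PATH = "drivers/net/ethernet/intel/e1000e"


def backport_search_cmd(upstream_sha, branch):
    """Build a `git log` command listing commits on `branch` that touch
    the e1000e driver and mention `upstream_sha` in their message."""
    return [
        "git", "log", "--oneline",
        "--grep=" + upstream_sha,
        branch,
        "--", E1000E_PATH,
    ]


def has_backport(repo_dir, upstream_sha, branch):
    """Return True if at least one matching commit exists on `branch`."""
    result = subprocess.run(
        backport_search_cmd(upstream_sha, branch),
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return bool(result.stdout.strip())


if __name__ == "__main__":
    # Hypothetical checkout location and remote name; adjust to your setup.
    for branch in ("origin/linux-4.14.y", "origin/linux-4.9.y"):
        print(branch, has_backport("linux-stable", UPSTREAM_SHA, branch))
```

A branch that reports a match would also need 4110e02eb45e (or its backport) to be fixed.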
Comment 22 Ronald 2018-01-15 17:48:22 UTC
Houston, the patch has landed !
:-)

thanks !!!


tested:
- 4.14.14
- 4.15-rc8
Comment 23 Jur van der Burg 2018-01-18 07:59:22 UTC
The issue is still present in the 4.14.14 kernel. When I create a virtual machine on VMware ESXi 6.5 and give it an e1000e interface, it will not come online after booting. I have to disconnect and reconnect it in the VM config before it will show online.

If I use the e1000e driver from 4.14.2 then it will work without an issue.
Comment 24 Jeff Kirsher 2018-04-18 20:25:19 UTC
Thanks Stephen for bringing this to my attention.  I have developers looking into the issue, since it still seems to be present in 4.14.14 kernels.
Comment 25 Benjamin Poirier 2018-04-24 07:06:33 UTC
(In reply to Jur van der Burg from comment #23)
> The issue is still present in the 4.14.14 kernel. When i create a virtual
> machine on vmware ESXi 6.5 and give it an e1000e interface it will not come
> online after booting. I have to disconnect en reconnect it in the vm config
> before it will show online.
> 
> If I use the e1000e driver from 4.14.2 then it will work without an issue.

vmware is a different issue. It needs a backport of commit 745d0bd3af99
("e1000e: Remove Other from EIAC", v4.16-rc7).

It works on 4.14.2 because the backport of commit 4aea7a5c5e94 ("e1000e: Avoid
receiver overrun interrupt bursts", v4.15-rc1) in mainline was added to stable
4.14.3 as commit 10d0fd293103.
