Bug 213377 - Poor receive speeds using e1000e driver on Intel I219-V
Summary: Poor receive speeds using e1000e driver on Intel I219-V
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: All, OS: Linux
Importance: P1 normal
Assignee: drivers_network@kernel-bugs.osdl.org
Depends on:
Reported: 2021-06-09 06:47 UTC by Roland Sommer
Modified: 2022-01-12 18:39 UTC
CC List: 11 users

See Also:
Kernel Version: 5.8, 5.10, 5.11
Tree: Mainline
Regression: No

dmesg from one of the test installs using a 5.10 kernel on ubuntu 20.04 (96.97 KB, text/plain)
2021-06-09 06:47 UTC, Roland Sommer
dmesg from test boot without additional kernel parameters, 5.11.10 from ubuntu mainline ppa (95.07 KB, text/plain)
2021-06-10 09:07 UTC, Roland Sommer
dmesg from test boot without additional kernel parameters, 5.10.42 built from git linux-stable (103.23 KB, text/plain)
2021-06-10 13:34 UTC, Roland Sommer
speed test with/without usb keys (49.76 KB, text/plain)
2021-06-30 07:56 UTC, Robin Jarry
disable ASPM (PCIe link power management) for CNL (CannonLake/TigerLake) chips (521 bytes, text/plain)
2021-08-19 08:42 UTC, akinzler

Description Roland Sommer 2021-06-09 06:47:38 UTC
Created attachment 297253
dmesg from one of the test installs using a 5.10 kernel on ubuntu 20.04


I am seeing very poor receive speeds with the e1000e driver on the wired Ethernet connection of the internal Intel I219-V network card.

I filed a distribution bug at https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1927925 and was asked to file a kernel bug for this.

So far I have tested kernel versions 5.8, 5.10 and 5.11, mainly on Ubuntu, plus 5.11 on Fedora, so this does not seem to be distribution-specific.

When booting without any special kernel parameters, receive speeds are very poor, up to a complete hang of the transfer. ethtool reports steadily increasing rx_errors and rx_crc_errors counters.
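To watch just these two counters while reproducing the problem, the ethtool -S output can be filtered. This is a minimal sketch; the helper name rx_error_counters is hypothetical, and the interface name enp0s31f6 (taken from a later comment) is an assumption for your system:

```shell
# Hypothetical helper: read `ethtool -S` output on stdin and print only
# the rx_errors and rx_crc_errors counters as "name value" pairs.
rx_error_counters() {
    awk -F':' '
        {
            gsub(/^[ \t]+|[ \t]+$/, "", $1)   # trim the counter name
            gsub(/^[ \t]+|[ \t]+$/, "", $2)   # trim the counter value
        }
        $1 == "rx_errors" || $1 == "rx_crc_errors" { print $1, $2 }
    '
}
```

Usage, assuming the interface is enp0s31f6: ethtool -S enp0s31f6 | rx_error_counters. Running it before and after a large transfer shows whether the counters grow.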

When booting the same kernel(s) with "intel_idle.max_cstate=1", everything works as expected.
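Because the behaviour changes between reboots later in this report, it helps to confirm on every boot that the parameter is really active. A sketch with a hypothetical helper has_cmdline_param that looks for an exact token in /proc/cmdline; separately, cat /sys/module/intel_idle/parameters/max_cstate shows the limit intel_idle is running with:

```shell
# Hypothetical helper: succeed if the exact parameter (e.g.
# intel_idle.max_cstate=1) appears as a whole token on the kernel
# command line. $2 overrides the file to read (default /proc/cmdline).
has_cmdline_param() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}
```

Usage: has_cmdline_param intel_idle.max_cstate=1 && echo active || echo "not set".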

So far I have not spotted any conspicuous log messages (dmesg, syslog, etc.).

I'll attach a sample dmesg output from one of the test runs.

Quote from the Launchpad bug report:
OK, so we know two things now:
1) The issue happens when the CPU reaches a deeper power-saving state, likely PC10.
2) Temporarily disabling C-states by adding a CPU latency QoS request via cpu_latency_qos_add_request() doesn't help.
Comment 1 Vitaly Lifshits 2021-06-09 16:04:07 UTC
Hello Roland,

I suspect that the issue is related to a patch in the pmc_core driver.

Can you try to reproduce this issue with a vanilla kernel 5.11.10 or older (for example, the long-term kernel 5.10.42)?
Comment 2 Roland Sommer 2021-06-10 09:07:24 UTC
Created attachment 297285
dmesg from test boot without additional kernel parameters, 5.11.10 from ubuntu mainline ppa

I booted and tested 5.11.10 from the Ubuntu mainline PPA without additional kernel parameters: same behaviour as before.

I'm currently building 5.10.42 from git/linux-stable for further testing.
Comment 3 Roland Sommer 2021-06-10 13:34:25 UTC
Created attachment 297293
dmesg from test boot without additional kernel parameters, 5.10.42 built from git linux-stable

I built v5.10.42 from git/linux-stable. The behaviour is the same: without the additional kernel parameter, receive speeds are slow and ethtool reports rx_errors and rx_crc_errors.
Comment 4 Roland Sommer 2021-06-11 07:59:48 UTC
OK, things are getting really weird.

I re-ran the speed tests with various kernels I had already tested, and for some reason networking was broken, with ethtool reporting errors, even when booting with intel_idle.max_cstate=1. All I had changed was installing everything needed to build the vanilla kernel, plus some sensor packages for fan control.

I removed all the newly installed packages and every module setting added by or for lm-sensors and thinkfan, then re-tested the kernels: the parameter still had no effect.

What I did next:
- I booted a live stick (Ubuntu, with Ubuntu kernel 5.8.0-43), and speeds were fine without any kernel parameter
- After this clean boot, I booted the updated Ubuntu kernel 5.8.0-55 with the max_cstate parameter, and speeds were fine again
- I rebooted the same kernel without the parameter, and the errors returned
- I installed the vanilla 5.10.42 kernel, rebooted without the kernel parameter, and the errors persisted
- I rebooted with the max_cstate parameter added: speeds were fine, and ethtool showed no errors
- I reinstalled lm-sensors and thinkfan and ran the sensor detection with all defaults (adding coretemp to /etc/modules, etc.)
- Everything now seems back to the way it was, with the kernel parameter fixing the receive problem

I'm really curious what could have caused this weird behaviour. If there is any way to produce more usable debug information, I would like to help.
Comment 5 Roland Sommer 2021-06-11 13:05:20 UTC
To rule out build mistakes on my side, I started over with a fresh build of v5.10.42 (commit 65859eca4dff1af0db5e36d1cfbac15b834c6a65). This time I saw no network errors, either with or without the additional kernel parameter.

I retried with the Ubuntu kernel 5.10.0-1029-oem, which is based on 5.10.35. This kernel produces rx errors when booted without the additional parameter.

I will now build 5.10.35 from git to see if there is any difference.
Comment 6 Roland Sommer 2021-06-11 15:07:58 UTC
Vanilla kernel 5.10.35 with "intel_idle.max_cstate=1" works.
Vanilla kernel 5.10.35 without it produces rx errors on the interface.

I suspect the successful test with 5.10.42 was a fluke (maybe the errors sometimes appear only later): I did another reboot cycle with 5.10.42, and this time it behaved like 5.10.35. Without parameters I get rx errors; with max_cstate=1 everything seems fine.
Comment 7 Roland Sommer 2021-06-11 20:34:24 UTC
I did one more vanilla build using v5.12.9. The behaviour stays the same: no errors with "intel_idle.max_cstate=1", rx errors without.
Comment 8 David Kraeutmann 2021-06-12 14:58:36 UTC
Booting with intel_idle.max_cstate=1 significantly reduced the impact of the bug on an I219-LM Tiger Lake machine running 5.11.0-18 (Ubuntu), but iperf3 still shows odd results:

In the send direction, everything is fine (both TCP and UDP).

In the receive direction, it starts losing packets heavily after a few seconds: 0% packet loss for the first 7-10 seconds of an iperf3 run, then spikes to around 50%.

ethtool -S shows only a few CRC errors: 132 RX errors and 66 CRC errors out of millions of packets.

Without intel_idle.max_cstate=1, receive is consistently lossy (send results are identical): about 1-10% loss throughout. CRC errors stay at the same level.
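For reproducing the receive-direction loss, iperf3's reverse mode (-R) makes the remote server send while the local machine receives, and in UDP mode (-u) iperf3 reports lost datagrams directly. A sketch, assuming an iperf3 server (started with iperf3 -s) on a host reachable as storage; the wrapper names and the 900M default bitrate are illustrative choices, not taken from this report:

```shell
# Hypothetical wrapper: TCP test in the receive direction.
# $1: server host, $2: duration in seconds (default 30).
iperf_rx_tcp() {
    iperf3 -c "$1" -R -t "${2:-30}"
}

# Hypothetical wrapper: UDP test in the receive direction; iperf3 prints
# lost/total datagrams per interval, which shows when the loss starts.
# $1: server host, $2: duration in seconds, $3: target bitrate.
iperf_rx_udp() {
    iperf3 -c "$1" -R -u -b "${3:-900M}" -t "${2:-30}"
}
```

Running iperf_rx_udp storage 60 once with and once without intel_idle.max_cstate=1 makes the "clean for 7-10 seconds, then lossy" pattern easy to compare.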

Predictably, this absolutely wrecks TCP throughput.

I'm not sure whether this is related, but since intel_idle.max_cstate=1 changes how the problem manifests, I think it might be.
Comment 9 Roland Sommer 2021-06-18 14:05:48 UTC
I just applied firmware updates (1.36) and re-tested vanilla kernel 5.12.9. Now I'm back to "does not work even with intel_idle.max_cstate=1". I booted several times, always with the same result, and I did not change anything else since the last test runs.
Comment 10 Roland Sommer 2021-06-22 10:27:22 UTC
After updating to UEFI BIOS 1.36 (N34ET36W) and Embedded Controller Version 1.33 (N34HT33W), none of the already tested kernels work now, with or without intel_idle.max_cstate=1.
Comment 11 Roland Sommer 2021-06-22 13:25:37 UTC
This is getting ridiculous: I removed the custom kernel builds to free up space for a new distribution kernel. After booting the distribution kernel (5.10.0-1032-oem), I got a good connection using the kernel parameter. I then retested kernels that had been failing just a few hours earlier, and now they give good connection speeds with intel_idle.max_cstate=1 again.

If the hardware had not previously been tested on Windows without any errors, I would bet on a hardware problem, because this seems so random: a given behaviour is reproducible for a while, and then suddenly changes after minor or unrelated changes (this time, removing some custom kernel builds).
Comment 12 Kiran K Telukunta 2021-06-24 23:15:54 UTC
Setting the parameter in /etc/default/grub as

and running grub2-mkconfig -o /boot/efi/EFI/fedora/grub.cfg

does not improve the Ethernet situation for me.

Or am I doing something wrong?
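One thing worth checking is whether the parameter actually reached the running kernel (cat /proc/cmdline after the reboot). Below is a sketch of adding it to GRUB_CMDLINE_LINUX idempotently; the helper add_grub_cmdline_param is hypothetical, it assumes that line is double-quoted, and parameters containing / or & would need escaping. On Fedora, grubby --update-kernel=ALL --args="intel_idle.max_cstate=1" edits the boot entries directly and is an alternative to regenerating the GRUB configuration.

```shell
# Hypothetical helper: append a kernel parameter to GRUB_CMDLINE_LINUX.
# $1: parameter (e.g. intel_idle.max_cstate=1)
# $2: GRUB defaults file (default /etc/default/grub)
add_grub_cmdline_param() {
    param=$1
    file=${2:-/etc/default/grub}
    # Already present as a whole word on that line: nothing to do.
    if grep '^GRUB_CMDLINE_LINUX=' "$file" | grep -qwF -- "$param"; then
        return 0
    fi
    # Insert the parameter before the closing double quote, then copy the
    # result back so the original file keeps its permissions.
    sed "/^GRUB_CMDLINE_LINUX=/ s/\"\$/ $param\"/" "$file" > "$file.tmp" &&
        cat "$file.tmp" > "$file" &&
        rm -f "$file.tmp"
}
```

After editing the file, the GRUB configuration still has to be regenerated before rebooting.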
Comment 13 Roland Sommer 2021-06-28 07:31:24 UTC
I discovered the next oddity. For now, I can reproduce the following behaviour on the currently running system:

1. Transferring a large file via wget from the internal network is horribly slow or gets stuck completely
2. I plug in a USB stick and reissue the same wget: the transfer is fast
3. I remove the USB stick: wget is slow again

I did this several times in a row. The USB stick is not involved in the transfer in any way; it is just inserted. The wget command looks like this:

wget -O /dev/null http://storage/largefile

Transfer speed was around 93 MB/s while the USB stick was inserted: 5 GB transferred in around 50 s.
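As a sanity check on those numbers: 5 GB in about 50 s is roughly 100 MB/s, in line with the reported ~93 MB/s and close to the 125 MB/s raw rate of a gigabit link. To get one speed figure per run instead of reading wget's progress output, curl (swapped in here for wget) can report the average download speed through its speed_download write-out variable. A sketch; download_mb_per_s is a hypothetical helper:

```shell
# Hypothetical helper: download a URL, discard the body, and print the
# average speed in MB/s (10^6 bytes per second).
download_mb_per_s() {
    curl -s -o /dev/null -w '%{speed_download}\n' "$1" |
        awk '{ printf "%.1f MB/s\n", $1 / 1000000 }'
}
```

Usage: download_mb_per_s http://storage/largefile, run once with and once without the USB stick inserted.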
Comment 14 Roland Sommer 2021-06-30 05:37:22 UTC
For reference, another user has reported a similar bug on the netdev mailing list: https://www.spinics.net/lists/netdev/msg743850.html
Comment 15 Robin Jarry 2021-06-30 07:56:31 UTC
Created attachment 297675
speed test with/without usb keys

Hi Roland,

I have exactly the same problem with an I219-LM network card. I tested 5.10 (Debian testing) and 5.12 (tag v5.12.13), which I compiled myself.

Out of curiosity, I tried different USB devices (mouse, YubiKey), and the TCP transfer rate seems to improve only with USB mass storage devices.
Comment 16 Roland Sommer 2021-07-07 14:24:54 UTC
The workaround from https://bugzilla.kernel.org/show_bug.cgi?id=213651 also seems to fix the problem: I just tested it and got good rx speeds afterwards.
Comment 17 akinzler 2021-08-19 08:42:59 UTC
Created attachment 298359
disable ASPM (PCIe link power management) for CNL (CannonLake/TigerLake) chips

@Roland: could you try my patch? It fixes the problem on my Tiger Lake system, although there the cause was broken ASPM.
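When testing whether ASPM is involved, it can help to record which ASPM policy the kernel is using. The pcie_aspm module exposes it in /sys/module/pcie_aspm/parameters/policy, which lists all policies with the active one in brackets. A sketch with a hypothetical helper active_aspm_policy; booting with pcie_aspm=off is a coarser, global way to rule ASPM in or out without patching e1000e:

```shell
# Hypothetical helper: print the active ASPM policy, i.e. the entry shown
# in brackets, e.g. "[default] performance powersave powersupersave".
# $1 overrides the file to read.
active_aspm_policy() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "${1:-/sys/module/pcie_aspm/parameters/policy}"
}
```

Usage: active_aspm_policy, before and after applying the patch or the boot parameter.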
Comment 18 Roland Sommer 2021-08-19 11:41:50 UTC
@Andreas: against which kernel version should I apply this patch?
Comment 19 Roland Sommer 2021-08-19 15:12:47 UTC
I tried it against linux-stable 5.13.12 without success: receive speeds are still slow (assuming I made no mistake while building).
Comment 20 akinzler 2021-08-20 13:28:21 UTC
I developed the patch for 5.12.19, but it should work for 5.13.12 too.
Comment 21 Kiran K Telukunta 2021-08-20 13:56:28 UTC
I have 5.13.9-200.fc34.x86_64 and still get the RX errors

enp0s31f6: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet  netmask  broadcast
        inet6 fd00::6ede:5e4d:3a7d:342b  prefixlen 64  scopeid 0x0<global>
        inet6 fe80::99b2:d9d1:233b:e70f  prefixlen 64  scopeid 0x20<link>
        ether xxxxxxxxxxxxx  txqueuelen 1000  (Ethernet)
        RX packets 4064  bytes 4188271 (3.9 MiB)
        RX errors 190  dropped 97  overruns 0  frame 95
        TX packets 16261  bytes 1315382 (1.2 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 16  memory 0xbed80000-beda0000
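To compare reports like this one across kernels, the RX error count can be expressed relative to the number of received packets. A rough sketch with a hypothetical helper rx_error_rate that parses ifconfig output; errored frames may not be included in the RX packets count, so the ratio is only indicative:

```shell
# Hypothetical helper: read ifconfig output on stdin and print
# "errors/packets (percent)" for the receive side.
rx_error_rate() {
    awk '
        $1 == "RX" && $2 == "packets" { packets = $3 }
        $1 == "RX" && $2 == "errors"  { errors = $3 }
        END {
            if (packets > 0)
                printf "%d/%d (%.2f%%)\n", errors, packets, 100 * errors / packets
        }
    '
}
```

For the output above, ifconfig enp0s31f6 | rx_error_rate prints 190/4064 (4.68%).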
Comment 22 Roland Sommer 2021-09-09 10:26:53 UTC
@Andreas: now that I'm back from holidays, I re-ran the test with kernel 5.12.19 and your patch, but rx speed is still broken.
Comment 23 Kiran K Telukunta 2021-10-01 13:11:14 UTC
I checked again: it does not look resolved in the 5.13.19-200.fc34.x86_64 kernel.

enp0s31f6: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether xxxxxxxxxxxxx  txqueuelen 1000  (Ethernet)
        RX packets 31038  bytes 19012398 (18.1 MiB)
        RX errors 926  dropped 138  overruns 0  frame 464
        TX packets 51132  bytes 59055699 (56.3 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 16  memory 0xbed80000-beda0000
Comment 24 Roland Sommer 2021-10-01 13:56:57 UTC
The patches from https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1927925/comments/93 fixed the problem with 5.15-rc3 as the base kernel.
Comment 25 Steve Smith 2021-11-21 08:40:10 UTC
I'm still seeing this issue with a build of kernel 5.15.3. Adding the MEI power control override still fixes it. This is on a NUC7i3BNH with the e1000e driver, I219-V (rev 21).
Comment 26 Kiran K Telukunta 2021-12-02 22:40:17 UTC
Is this specific to kernel 5.13? I first hit the issue on Fedora with kernel 5.13, and now I see it on Ubuntu, which also ships a 5.13 kernel now.
Comment 27 Dustin Bensing 2021-12-03 12:27:55 UTC
I found this issue after recently getting constant disconnections on a Dell Latitude 5421 (11th Gen i7-11850H) with an I219-LM on kernel 5.15.5. The problem was probably introduced earlier (I use Wi-Fi most of the time). I'm wondering whether it is connected to the issues discussed here (and maybe introduced by the fixes applied).
Comment 28 ChrisO 2021-12-16 13:20:15 UTC
I can confirm that e1000e links at 1 Gb/s, but throughput is limited to a maximum of 10 Mbps on Fedora kernel 5.15.7-200.fc35.x86_64, which was just pushed out last night. All prior Fedora 35 kernels (5.14) are affected as well.

00:19.0 Ethernet controller: Intel Corporation 82579V Gigabit Network Connection (rev 05)
        DeviceName:  Onboard LAN
        Subsystem: ASUSTeK Computer Inc. P8P67 Deluxe Motherboard

I have tried all the fixes above with no resolution.
Comment 29 Mike 2022-01-05 10:33:13 UTC
I also have a problem with the network card on Ubuntu 21.10. I found that no RX CRC errors occur when the "C-State(Package)" option in the UEFI setup is limited to C6. With the option set to "auto" or to anything deeper than C6, the network adapter gets RX CRC errors (as shown by ip -s -s link show dev eno1).

00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (10) I219-V (rev 11)
	DeviceName: Onboard - Ethernet
	Subsystem: Micro-Star International Co., Ltd. [MSI] Ethernet Connection (10) I219-V
	Kernel driver in use: e1000e
	Kernel modules: e1000e

cat /sys/devices/system/cpu/cpuidle/current_driver
cat /sys/module/intel_idle/parameters/max_cstate
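The CRC counter that ip -s -s link shows is also available per interface under /sys/class/net/<interface>/statistics/rx_crc_errors, which makes it easy to measure how many errors accumulate while toggling a BIOS or power setting. A sketch with hypothetical helpers crc_errors and crc_errors_delta; eno1 is the interface from this comment:

```shell
# Hypothetical helper: print the rx_crc_errors counter of an interface.
# $1: interface, $2: sysfs net directory (default /sys/class/net).
crc_errors() {
    cat "${2:-/sys/class/net}/$1/statistics/rx_crc_errors"
}

# Hypothetical helper: print how many CRC errors were added over an interval.
# $1: interface, $2: interval in seconds (default 10), $3: sysfs net directory.
crc_errors_delta() {
    before=$(crc_errors "$1" "$3")
    sleep "${2:-10}"
    after=$(crc_errors "$1" "$3")
    echo $((after - before))
}
```

Usage: crc_errors_delta eno1 30 while a large download runs in parallel; a non-zero result with the deeper package C-states enabled matches the behaviour described above.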
Comment 30 Mike 2022-01-12 18:39:56 UTC
I think I have found the cause in my case.
I set the "C-State(Packages)" option in the BIOS back to max. C10. Using PowerTOP 2.14-1 (Debian Sid), I found that the communication controller is causing the errors.

On Ubuntu 21.10 the controller shows up as: 00:16.0 Communication controller: Intel Corporation Tiger Lake-H Management Engine Interface (rev 11)

Debian Sid Kernel 5.15.0-2-amd64 #1 SMP Debian 5.15.5-2 (2021-12-18) x86_64 GNU/Linux
lspci -vv
00:16.0 Communication controller: Intel Corporation Device 43e0 (rev 11)
	DeviceName: Onboard - Other
	Subsystem: Micro-Star International Co., Ltd. [MSI] Device 7d22
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 132
	Region 0: Memory at a123d000 (64-bit, non-prefetchable) [size=4K]
	Capabilities: <access denied>
	Kernel driver in use: mei_me
	Kernel modules: mei_me
I have now disabled power saving for the communication controller with PowerTOP and created a service that applies this setting automatically at system start:

echo 'on' > '/sys/bus/pci/devices/0000:00:16.0/power/control';

No CRC errors have appeared since then. The kernel default for this setting was auto (echo 'auto' > '/sys/bus/pci/devices/0000:00:16.0/power/control';).
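A boot-time service like the one described can be a small oneshot systemd unit. A sketch that writes such a unit; the unit name mei-power-on.service and the helper write_mei_power_unit are hypothetical, and 0000:00:16.0 is the MEI address from this comment, which may differ on other machines:

```shell
# Hypothetical helper: write a oneshot systemd unit that keeps the MEI
# controller out of runtime power management at boot.
# $1: PCI address (default 0000:00:16.0)
# $2: unit directory (default /etc/systemd/system)
write_mei_power_unit() {
    addr=${1:-0000:00:16.0}
    dir=${2:-/etc/systemd/system}
    cat > "$dir/mei-power-on.service" <<EOF
[Unit]
Description=Keep the MEI controller at $addr out of runtime suspend

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo on > /sys/bus/pci/devices/$addr/power/control'

[Install]
WantedBy=multi-user.target
EOF
}
```

After writing the unit as root, systemctl daemon-reload followed by systemctl enable --now mei-power-on.service applies the setting immediately and on every boot. A oneshot unit is enough here because the value only needs to be written once per boot.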
