Bug 213651

Summary: Intel MEI driver affects the network speed a lot
Product: Drivers
Reporter: AceLan Kao (acelan)
Component: Other
Assignee: drivers_other
Status: RESOLVED CODE_FIX
Severity: normal
CC: alexander.usyskin, martin.hamant, michael.lin, msalle, tarkasteve
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 5.10 & 5.13
Subsystem:
Regression: No
Bisected commit-id:
Attachments: dmesg.log
             lspci_vvv.log
             dmesg log after switched back the AMT setting

Description AceLan Kao 2021-07-06 08:22:36 UTC
Created attachment 297765 [details]
dmesg.log

Found this issue on Dell Latitude 5420
BIOS: 1.6.0
Kernel: 5.10 & 5.13

The rx speed drops below 1 Mb/s when runtime PM (runpm) is enabled on the Intel MEI device.

u@BM4-DVT2-C99:~$ sudo lspci -vvnns 00:16.0
00:16.0 Communication controller [0780]: Intel Corporation Device [8086:a0e0] (rev 20)
        Subsystem: Dell Device [1028:0a20]
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 166
        Region 0: Memory at 6053199000 (64-bit, non-prefetchable) [size=4K]
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [8c] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee006d8  Data: 0000
        Capabilities: [a4] Vendor Specific Information: Len=14 <?>
        Kernel driver in use: mei_me
        Kernel modules: mei_me


u@BM4-DVT2-C99:~$ echo auto | sudo tee /sys/bus/pci/devices/0000\:00\:16.0/power/control 
auto
u@BM4-DVT2-C99:~$ speedtest-cli --source 10.101.46.59
Retrieving speedtest.net configuration...
Testing from Chunghwa Telecom (61.220.137.37)...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by WiFly (New Taipei) [8.57 km]: 3.882 ms
Testing download speed................................................................................
Download: 0.32 Mbit/s
Testing upload speed......................................................................................................
Upload: 131.80 Mbit/s
u@BM4-DVT2-C99:~$ echo on | sudo tee /sys/bus/pci/devices/0000\:00\:16.0/power/control 
on
u@BM4-DVT2-C99:~$ speedtest-cli --source 10.101.46.59
Retrieving speedtest.net configuration...
Testing from Chunghwa Telecom (61.220.137.37)...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by Taiwan Mobile (Da'an District) [9.56 km]: 2.107 ms
Testing download speed................................................................................
Download: 263.88 Mbit/s
Testing upload speed......................................................................................................
Upload: 302.24 Mbit/s
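
The toggle used above can be wrapped in a few small helpers. This is only a minimal sketch: the function names and the SYSFS_ROOT override are hypothetical (not part of any tool), and the PCI address 0000:00:16.0 is the one from the lspci output in this report.

```shell
#!/bin/sh
# Sketch: read and switch the runtime PM policy of a PCI device via sysfs.
# SYSFS_ROOT is a hypothetical override so the helpers can be tried on a
# scratch directory; it defaults to the real /sys.

# Print the path of the power/control attribute for a PCI address.
runpm_path() {
    printf '%s/bus/pci/devices/%s/power/control\n' "${SYSFS_ROOT:-/sys}" "$1"
}

# Print the current policy ("on" = always powered, "auto" = runtime PM allowed).
runpm_get() {
    cat "$(runpm_path "$1")"
}

# Set the policy; only the two values accepted by the kernel are allowed.
runpm_set() {
    case "$2" in
        on|auto) ;;
        *) echo "usage: runpm_set <pci-address> on|auto" >&2; return 2 ;;
    esac
    printf '%s\n' "$2" > "$(runpm_path "$1")"
}
```

For example, as root, "runpm_set 0000:00:16.0 on" followed by "runpm_get 0000:00:16.0" should correspond to the "echo on | sudo tee ..." step above. The setting does not survive a reboot.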
Comment 1 Alexander Usyskin 2021-07-11 08:52:40 UTC
What network speed do you get with the mei_me and mei modules unloaded?
Dell has now published BIOS 1.8; does it help?
Comment 2 Alexander Usyskin 2021-07-11 08:58:56 UTC
Can you please post the output of 'cat /sys/class/mei/mei0/fw_ver' here?
Comment 3 Alexander Usyskin 2021-07-11 10:19:11 UTC
A full 'lspci -vvv' would also help to understand the network HW types.
Comment 4 AceLan Kao 2021-07-12 07:26:08 UTC
With mei* unloaded, the network speed is around 300 Mb/s for both rx and tx.

u@BM4-DVT2-C99:~$ cat /sys/class/mei/mei0/fw_ver
0:15.0.10.1602
0:15.0.10.1602
0:15.0.10.1432

BIOS settings -> System Management -> Intel AMT Capability
Switching it from "Restrict MEBx Access" to "Disabled" fixes this issue.

After that, I can't reproduce this issue even if I switch the AMT setting back.
Comment 5 AceLan Kao 2021-07-12 07:26:38 UTC
Created attachment 297793 [details]
lspci_vvv.log
Comment 6 AceLan Kao 2021-07-12 07:33:40 UTC
Sorry, switching the AMT setting back doesn't fix the issue. The situation got worse: it is now hard to get the network connected at all.

[ 252.814437] e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Down
[ 255.299362] e1000e 0000:00:1f.6 enp0s31f6: PHY Wakeup cause - Link Status Change
[ 262.906680] e1000e 0000:00:1f.6 enp0s31f6: PHY Wakeup cause - Link Status Change
[ 266.439342] e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[ 288.906644] e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Down
[ 291.059876] e1000e 0000:00:1f.6 enp0s31f6: PHY Wakeup cause - Link Status Change
[ 297.893882] e1000e 0000:00:1f.6 enp0s31f6: PHY Wakeup cause - Link Status Change
[ 301.426005] e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[ 307.526655] e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Down
[ 309.577103] e1000e 0000:00:1f.6 enp0s31f6: PHY Wakeup cause - Link Status Change
[ 312.497550] e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[ 320.314637] e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Down
[ 322.423937] e1000e 0000:00:1f.6 enp0s31f6: PHY Wakeup cause - Link Status Change
[ 325.920565] e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[ 327.850659] e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Down
[ 329.922971] e1000e 0000:00:1f.6 enp0s31f6: PHY Wakeup cause - Link Status Change
[ 337.690129] e1000e 0000:00:1f.6 enp0s31f6: PHY Wakeup cause - Link Status Change
[ 340.734454] e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[ 395.406614] e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Down
[ 397.552266] e1000e 0000:00:1f.6 enp0s31f6: PHY Wakeup cause - Link Status Change
[ 401.048543] e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
Comment 7 AceLan Kao 2021-07-12 07:34:12 UTC
Created attachment 297795 [details]
dmesg log after switched back the AMT setting
Comment 8 AceLan Kao 2021-07-12 07:54:34 UTC
The issue mentioned in comment #6 resulted from a loose cable on my side; please ignore it.
So, toggling the AMT setting in the BIOS fixes this issue.
Comment 9 AceLan Kao 2021-07-13 06:32:30 UTC
Toggling the AMT setting in the BIOS doesn't fix the same issue on another platform, but disabling mei_me runpm works on both systems.
Comment 10 AceLan Kao 2021-07-13 11:59:22 UTC
https://lkml.org/lkml/2021/7/12/2926
This patchset fixes the issue. Could you help review it? Thanks.
Comment 11 Michael Lin 2021-08-10 06:29:03 UTC
Sharing stable reproduction steps below.

==========

In my testing, I can reliably reproduce this with two Linux machines on a LAN using iperf3.

On server:

1. set packet delay to 5ms

   "tc qdisc add dev eno1 root netem delay 5ms" or
   "tc qdisc change dev eno1 root netem delay 5ms"

2. run iperf3 server by command "iperf3 -s"

On Client to simulate download:

   "iperf3 -c SERVER_IP -R"

The network speed is then very low, around 3 Mbits/sec,
when the machines are connected through a 1G full-duplex switch.

-------

For comparison, with a USB Ethernet dongle and the same test setup
(5ms delay), the speed is around 346 Mbits/sec
(probably limited by USB).
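
The steps above can be sketched as a small script. This is an illustrative sketch only: the helper names and the DRY_RUN switch are hypothetical, the interface name and the 5ms delay are the ones quoted in the steps, and the cleanup step is an addition that is not part of the steps above.

```shell
#!/bin/sh
# Sketch of the reproduction steps. With DRY_RUN=1 the commands are only
# printed instead of executed (hypothetical convenience switch).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Server side: add the netem delay (or change it if a root qdisc is
# already configured), then start the iperf3 server.
repro_server() {
    iface="$1"
    delay="${2:-5ms}"
    run tc qdisc add dev "$iface" root netem delay "$delay" ||
        run tc qdisc change dev "$iface" root netem delay "$delay"
    run iperf3 -s
}

# Client side: -R reverses the direction, so the server sends and the
# client receives, which simulates a download.
repro_client() {
    run iperf3 -c "$1" -R
}

# Remove the artificial delay from the server interface when done.
repro_cleanup() {
    run tc qdisc del dev "$1" root
}
```

On the server this would be "repro_server eno1" (as root, since tc changes the qdisc), and on the client "repro_client SERVER_IP".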
Comment 12 Martin 2022-01-04 16:17:08 UTC
Brand new 5420 here on Ubuntu 20.04.3 and kernel 5.11.0-43-generic.

First I hit https://bugzilla.kernel.org/show_bug.cgi?id=213667 - the NIC wasn't working, so I installed Intel's e1000e with DKMS (3.8.7) to "solve" the issue, and then I hit this one. What's the best workaround, please?
Comment 13 AceLan Kao 2022-02-25 02:30:13 UTC
This issue should be fixed in v5.15 by the commit below:

commit 639e298f432fb058a9496ea16863f53b1ce935fe
Author: Sasha Neftin <sasha.neftin@intel.com>
Date:   Wed Sep 22 09:55:42 2021 +0300

    e1000e: Fix packet loss on Tiger Lake and later
    
    Update the HW MAC initialization flow. Do not gate DMA clock from
    the modPHY block. Keeping this clock will prevent dropped packets
    sent in burst mode on the Kumeran interface.
    
    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=213651
    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=213377
    Fixes: fb776f5d57ee ("e1000e: Add support for Tiger Lake")
    Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
    Tested-by: Mark Pearson <markpearson@lenovo.com>
    Tested-by: Nechama Kraus <nechamax.kraus@linux.intel.com>
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
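
A quick way to check whether a running kernel is at least v5.15 is sketched below. The helper name is hypothetical and the check relies on "sort -V" (GNU coreutils). It is only a heuristic: it compares version numbers, so it cannot tell whether a distribution kernel older than 5.15 carries a backport of the commit.

```shell
#!/bin/sh
# Sketch: succeed if a kernel release string is at least a given version.
# Only the leading numeric part (e.g. 5.11.0 from 5.11.0-43-generic) is
# compared, using version sort.
kernel_at_least() {
    have=$(printf '%s\n' "$1" | sed 's/^\([0-9.]*\).*/\1/')
    have="${have%.}"
    want="$2"
    # The wanted version must sort first (or be equal) to count as satisfied.
    [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n1)" = "$want" ]
}
```

For example: kernel_at_least "$(uname -r)" 5.15 && echo "v5.15 or later"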