Bug 213943 - Poor network speed with 10G NIC with kernel 5.13 (Intel X710-T2L)
Summary: Poor network speed with 10G NIC with kernel 5.13 (Intel X710-T2L)
Status: RESOLVED UNREPRODUCIBLE
Alias: None
Product: Networking
Classification: Unclassified
Component: IPV4
Hardware: Intel Linux
Importance: P1 high
Assignee: Stephen Hemminger
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-08-02 14:05 UTC by Aurélien Mora
Modified: 2021-08-27 16:47 UTC
CC List: 1 user

See Also:
Kernel Version: 5.13.x
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments

Description Aurélien Mora 2021-08-02 14:05:30 UTC
Hello,

On a server, I receive files (parallel cp over NFSv4) through a 10G NIC (Intel X710-T2L). I can usually achieve a total speed close to 1 GB/s (generally 4 cp processes at the same time, targeting 4 different HDDs).
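
For reference, the workload looks roughly like this (a sketch; the NFS mount point and HDD paths are hypothetical, and I assume the files are pulled from the NFS mount onto local disks):

# four copies running in parallel, each writing to a different local HDD
cp -r /mnt/nfs/set1 /data/hdd1/ &
cp -r /mnt/nfs/set2 /data/hdd2/ &
cp -r /mnt/nfs/set3 /data/hdd3/ &
cp -r /mnt/nfs/set4 /data/hdd4/ &
wait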

After switching to kernel 5.13 (Arch Linux) with the exact same configuration, my speed is now limited to 250 MB/s. Rolling back to a previous kernel restores the speed.

The problem may be in the latest Intel i40e driver, but I don't really know how to investigate further.


The problem still appears in 5.13.7 (the latest at the moment).
Comment 1 Jesse Brandeburg 2021-08-16 18:50:17 UTC
Thanks for your bug report. I don't know of anything immediately pending that might cause this, so this is the first time I've heard of this issue.

There are a few troubleshooting items you can provide that will help us reproduce and hopefully fix the bug:

full dmesg from boot
output from ethtool -i ethX
ethtool -S ethX, before and after the test
output from netstat -s before and after the test
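
For reference, one way to collect these (a sketch; ethX stands for the i40e interface, and the output file names are just examples):

dmesg > dmesg-boot.txt
ethtool -i ethX > ethtool-i.txt
ethtool -S ethX > ethtool-S-before.txt
netstat -s > netstat-before.txt
# ... run the NFS copy test ...
ethtool -S ethX > ethtool-S-after.txt
netstat -s > netstat-after.txt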

Any reproduction instructions? Mount options for NFSv4 (output of the mount command would do)?
Comment 2 Aurélien Mora 2021-08-17 13:08:49 UTC
Thanks for your answer. 

> full dmesg from boot
> output from ethtool -i ethX
> ethtool -S ethX, before and after the test
> output from netstat -s before and after the test

This server is currently processing data for the next few days. I'll provide this info as soon as I can.


> Any reproduction instructions? mount options for NFSv4 (output of mount
> command would do)

Not really; whether I used `cp` with normal files or `dd` for the test, the throughput was low.
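
The `dd` test was roughly like the following (a sketch; the mount point, file, and sizes are hypothetical):

# read a large file from the NFS mount, bypassing the local page cache
dd if=/mnt/nfs/testfile of=/dev/null bs=1M count=8192 iflag=direct
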
About the NFS options (in fstab):
nfsvers=4.2,noatime,nodiratime,_netdev,noauto,x-systemd.automount,x-systemd.mount-timeout=10
With `nfsstat -m`: 
rw,noatime,nodiratime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.0.2.2,local_lock=none,addr=10.0.2.1
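
For completeness, the full fstab line looks roughly like this (server name, export path, and mount point are hypothetical):

nas:/export/data  /mnt/nfs  nfs  nfsvers=4.2,noatime,nodiratime,_netdev,noauto,x-systemd.automount,x-systemd.mount-timeout=10  0 0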

I also increased the RX ring buffer a bit (to 1024) to reduce packet loss, but it didn't help with the throughput.
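
For reference, the ring change was made with something like the following (ethX being the 10G interface):

# show current/maximum ring sizes, then raise the RX ring to 1024 descriptors
ethtool -g ethX
ethtool -G ethX rx 1024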


> I don't know of anything immediately pending
> that might cause this, so this is the first time I've heard of this issue.

I don't know how specific this bug is, but for the record, another user on the Phoronix forum has a similar problem:
https://www.phoronix.com/forums/forum/software/general-linux-open-source/1270526-vmware-hits-a-nasty-performance-regression-with-linux-5-13?p=1270550#post1270550
Comment 3 Aurélien Mora 2021-08-27 16:47:06 UTC
I switched back to the 5.13 branch (currently 5.13.12), but I cannot reproduce the problem anymore.

I'm not sure what happened, but everything's good now. 


Thank you for your time.
