Bug 213943

Summary: Poor network speed with 10G NIC with kernel 5.13 (Intel X710-T2L)
Product: Networking
Component: IPV4
Reporter: Aurélien Mora (ealrann)
Assignee: Stephen Hemminger (stephen)
Status: RESOLVED UNREPRODUCIBLE
Severity: high
Priority: P1
CC: jesse.brandeburg
Hardware: Intel
OS: Linux
Kernel Version: 5.13.x
Subsystem:
Regression: Yes
Bisected commit-id:

Description Aurélien Mora 2021-08-02 14:05:30 UTC
Hello,

On a server, I receive files over a 10G NIC (Intel X710-T2L) using parallel cp over NFSv4. I can usually achieve a total speed close to 1 GB/s (generally 4 cp processes at the same time, each targeting a different HDD).

Switching to kernel 5.13 (Arch Linux) with the exact same configuration, my speed is now limited to 250 MB/s. Rolling back to a previous kernel restores the speed.

The problem may be in the latest Intel i40e driver, but I don't really know how to investigate further.


The problem still appears in 5.13.7 (the latest at the moment).
Comment 1 Jesse Brandeburg 2021-08-16 18:50:17 UTC
Thanks for your bug report. I don't know of anything immediately pending that might cause this, so this is the first time I've heard of this issue.

There are a few troubleshooting items you can provide that will help us reproduce and hopefully fix the bug:

full dmesg from boot
output from ethtool -i ethX
ethtool -S ethX, before and after the test
output from netstat -s before and after the test

Any reproduction instructions? Mount options for NFSv4 (the output of the mount command would do)?
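For the before/after statistics capture, a small helper makes the changed counters easy to spot. This is a minimal sketch, assuming the `name: value` line format that `ethtool -S` emits; `diff_counters` and the snapshot file names are illustrative, not part of any tool:

```shell
# Sketch: print only the counters whose values changed between two
# "ethtool -S" snapshots (assumes "name: value" lines).
diff_counters() {
    # $1 = snapshot taken before the test, $2 = snapshot taken after
    awk -F': *' '
        NR == FNR { before[$1] = $2; next }    # load the "before" file
        $1 in before && before[$1] != $2 {     # value changed during the test?
            printf "%s: %s -> %s\n", $1, before[$1], $2
        }' "$1" "$2"
}

# Typical use around a test run ("eth0" is a placeholder):
#   ethtool -S eth0 > before.txt
#   ... run the NFS copy workload ...
#   ethtool -S eth0 > after.txt
#   diff_counters before.txt after.txt
```

The same helper works on `netstat -s` output, since it uses the same `name: value` shape for most of its counters.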
Comment 2 Aurélien Mora 2021-08-17 13:08:49 UTC
Thanks for your answer. 

> full dmesg from boot
> output from ethtool -i ethX
> ethtool -S ethX, before and after the test
> output from netstat -s before and after the test

This server is currently processing data for the next few days. I'll provide this info as soon as I can.


> Any reproduction instructions? mount options for NFSv4 (output of mount
> command would do)

Not really; whether I used `cp` with normal files or `dd` for the test, the throughput was low.
About the NFS options (in fstab):
nfsvers=4.2,noatime,nodiratime,_netdev,noauto,x-systemd.automount,x-systemd.mount-timeout=10
With `nfsstat -m`: 
rw,noatime,nodiratime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.0.2.2,local_lock=none,addr=10.0.2.1

I also increased the rx ring buffer a bit (-> 1024) to reduce packet loss, but it didn't help with the throughput.
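For reference, the ring resize above can be inspected and applied with ethtool; a sketch, with `eth0` standing in for the actual i40e interface:

```shell
# Sketch ("eth0" is a placeholder for the real interface name):
ethtool -g eth0          # show current and maximum ring parameters
ethtool -G eth0 rx 1024  # raise the RX ring to 1024 descriptors
```

Note that `ethtool -G` is not persistent across reboots, so the setting has to be reapplied (e.g. via a udev rule or network configuration hook) when comparing kernels.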


> I don't know of anything immediately pending
> that might cause this, so this is the first time I've heard of this issue.

I don't know how specific this bug is, but for the record, another user on the Phoronix forums has a similar problem:
https://www.phoronix.com/forums/forum/software/general-linux-open-source/1270526-vmware-hits-a-nasty-performance-regression-with-linux-5-13?p=1270550#post1270550
Comment 3 Aurélien Mora 2021-08-27 16:47:06 UTC
I switched back to the 5.13 branch (currently 5.13.12), but I can no longer reproduce the problem.

I'm not sure what happened, but everything's good now. 


Thank you for your time.