Bug 81661 - Network Performance Regression for large TCP transfers starting with v3.10
Summary: Network Performance Regression for large TCP transfers starting with v3.10
Status: RESOLVED INVALID
Alias: None
Product: Networking
Classification: Unclassified
Component: Other
Hardware: All
OS: Linux
Importance: P1 normal
Assignee: Stephen Hemminger
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-08-04 13:12 UTC by Alexander Steffen
Modified: 2014-08-21 18:49 UTC
CC List: 2 users

See Also:
Kernel Version: 3.10 and later
Subsystem:
Regression: No
Bisected commit-id:


Attachments
tshark captures of good/bad performance (815.09 KB, application/zip)
2014-08-04 13:12 UTC, Alexander Steffen

Description Alexander Steffen 2014-08-04 13:12:57 UTC
Created attachment 145061
tshark captures of good/bad performance

Our network consists of two separate geographical locations that are transparently connected by some kind of VPN. With newer kernel versions (v3.10 or later) we noticed a strange performance regression when transferring larger amounts of data via TCP (e.g. HTTP downloads of files). It only affects transfers from one location to the other, not the other way around. The kernel version of the receiving machine does not seem to have any influence (tested: v3.2, v3.5, v3.11), whereas on the sending machine everything starting with v3.10 results in bad performance.

The problem could be reproduced using iperf, and bisecting identified 3e59cb0ddfd2c59991f38e89352ad8a3c71b2374 as the first bad commit. Reverting this commit on top of v3.15.4 restores the performance of previous kernels (the bisect steps are sketched below).
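
The bisect and revert steps were roughly the following (illustrative; the starting points shown are approximate, and each bisect step requires building and booting the candidate kernel and re-running the iperf test):

user@site1:~$ git bisect start
user@site1:~$ git bisect bad v3.10
user@site1:~$ git bisect good v3.9
(build, boot, run "iperf -c site2", then "git bisect good" or "git bisect bad"; repeat)
...
3e59cb0ddfd2c59991f38e89352ad8a3c71b2374 is the first bad commit
user@site1:~$ git checkout v3.15.4
user@site1:~$ git revert 3e59cb0ddfd2c59991f38e89352ad8a3c71b2374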

Reproducing this problem in a different environment does not seem to be easy. I've therefore attached packet captures, created with tshark on both the sending and the receiving side, for the last good commit and for the first bad commit, using iperf to demonstrate the problem (see output below). Our network experts did not find anything obviously wrong with the network configuration. Can you see any problem in the packet captures? Or was the algorithm removed in 3e59cb0ddfd2c59991f38e89352ad8a3c71b2374 not so bad after all?
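
For reference, the captures were taken roughly like this on each side (the interface name and capture filter shown here are illustrative; only the resulting capture files are attached):

user@site1:~$ tshark -i eth0 -f "tcp port 5001" -w capture-bad-site1
user@site2:~$ tshark -i eth0 -f "tcp port 5001" -w capture-bad-site2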


This is the iperf output of 3cc7587b30032b7c4dd9610a55a77519e84da7db (the last good commit):
user@site1:~$ iperf -c site2
------------------------------------------------------------
Client connecting to site2, TCP port 5001
TCP window size: 20.1 KByte (default)
------------------------------------------------------------
[  3] local 172.31.22.15 port 32821 connected with 172.31.25.248 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.1 sec  15.5 MBytes  12.9 Mbits/sec

This is the iperf output of 3e59cb0ddfd2c59991f38e89352ad8a3c71b2374 (the first bad commit):
user@site1:~$ iperf -c site2
------------------------------------------------------------
Client connecting to site2, TCP port 5001
TCP window size: 20.1 KByte (default)
------------------------------------------------------------
[  3] local 172.31.22.15 port 39947 connected with 172.31.25.248 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-11.3 sec  1.88 MBytes  1.39 Mbits/sec

This is the corresponding iperf output on the server side:
user@site2:~$ iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 172.31.25.248 port 5001 connected with 172.31.22.15 port 32821
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.7 sec  15.5 MBytes  12.1 Mbits/sec
[  5] local 172.31.25.248 port 5001 connected with 172.31.22.15 port 39947
[  5]  0.0-19.0 sec  1.88 MBytes   826 Kbits/sec
Comment 1 Yuchung Cheng 2014-08-04 17:52:06 UTC
There seems to be some middle-box that re-writes the ackno but not the SACK sequences. As a result, the SACK blocks reported by the receiver were ignored by the sender, so fast recovery was never triggered. All losses were forced to be recovered by timeouts.

I would suggest fixing the sequence corruption first.

Here is what your capture-bad-site* files show:

snd$ tcpdump -Snr capture-bad-site1  |head -3
reading from file capture-bad-site1, link-type EN10MB (Ethernet)
01:19:50.805681 IP 172.31.22.15.39947 > 172.31.25.248.5001: Flags [S], seq 3137619979, win 12600, options [mss 1260,sackOK,TS val 4294933997 ecr 0,nop,wscale 7], length 0
01:19:50.820197 IP 172.31.25.248.5001 > 172.31.22.15.39947: Flags [S.], seq 3018990051, ack 3137619980, win 12480, options [mss 1260,sackOK,TS val 715849432 ecr 4294933997,nop,wscale 7], length 0
01:19:50.820341 IP 172.31.22.15.39947 > 172.31.25.248.5001: Flags [.], ack 3018990052, win 99, options [nop,nop,TS val 4294934001 ecr 715849432], length 0

rcv$ tcpdump -Snr capture-bad-site2|head -3
reading from file capture-bad-site2, link-type EN10MB (Ethernet)
01:19:29.707701 IP 172.31.22.15.39947 > 172.31.25.248.5001: Flags [S], seq 3751174188, win 12600, options [mss 1260,sackOK,TS val 4294933997 ecr 0,nop,wscale 7], length 0
01:19:29.707768 IP 172.31.25.248.5001 > 172.31.22.15.39947: Flags [S.], seq 3018990051, ack 3751174189, win 12480, options [mss 1260,sackOK,TS val 715849432 ecr 4294933997,nop,wscale 7], length 0
01:19:29.721487 IP 172.31.22.15.39947 > 172.31.25.248.5001: Flags [.], ack 3018990052, win 99, options [nop,nop,TS val 4294934001 ecr 715849432], length 0

snd$ tcpdump -Snr capture-bad-site1  |grep 'sack ' | head -1
reading from file capture-bad-site1, link-type EN10MB (Ethernet)
01:19:51.057309 IP 172.31.25.248.5001 > 172.31.22.15.39947: Flags [.], ack 3138076772, win 1482, options [nop,nop,TS val 715849491 ecr 4294934042,nop,nop,sack 1 {3751632229:3751633477}], length 0


rcv$ tcpdump -Snr capture-bad-site2|grep 'sack ' | head -1
reading from file capture-bad-site2, link-type EN10MB (Ethernet)
01:19:29.942530 IP 172.31.25.248.5001 > 172.31.22.15.39947: Flags [.], ack 3751630981, win 1482, options [nop,nop,TS val 715849491 ecr 4294934042,nop,nop,sack 1 {3751632229:3751633477}], length 0
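
Concretely, some shell arithmetic on the numbers above (added for illustration; all inputs are taken from the two captures):

$ echo $((3751174188 - 3137619979))   # client ISN: receiver capture minus sender capture
613554209
$ echo $((3751630981 - 3138076772))   # ackno of the SACK packet: rewritten by the same offset
613554209
$ echo $((3751632229 - 613554209))    # left SACK edge translated into the sender's sequence space
3138078020

The server ISN (3018990051) is identical in both captures, so only the client's sequence space is shifted. The SACK block {3751632229:3751633477} reaches the sender unrewritten, i.e. still in receiver-side sequence space, about 613 MB beyond anything the sender has sent, so the sender has to ignore it. A correct rewrite would have produced {3138078020:3138079268}.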
