Bug 81661
Summary: | Network Performance Regression for large TCP transfers starting with v3.10 | ||
---|---|---|---|
Product: | Networking | Reporter: | Alexander Steffen (Alexander.Steffen) |
Component: | Other | Assignee: | Stephen Hemminger (stephen) |
Status: | RESOLVED INVALID | ||
Severity: | normal | CC: | alan, ycheng |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 3.10 and later | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | tshark captures of good/bad performance |
There seems to be a middlebox that rewrites the ACK number but not the SACK block sequence numbers. As a result, the sender ignored the SACK blocks reported by the receiver, so they never triggered fast recovery as they normally would. All losses were forced to be recovered by retransmission timeouts. I would suggest fixing the sequence number corruption first. Here is the evidence from your capture-bad-site* files:

```
snd$ tcpdump -Snr capture-bad-site1 |head -3
reading from file capture-bad-site1, link-type EN10MB (Ethernet)
01:19:50.805681 IP 172.31.22.15.39947 > 172.31.25.248.5001: Flags [S], seq 3137619979, win 12600, options [mss 1260,sackOK,TS val 4294933997 ecr 0,nop,wscale 7], length 0
01:19:50.820197 IP 172.31.25.248.5001 > 172.31.22.15.39947: Flags [S.], seq 3018990051, ack 3137619980, win 12480, options [mss 1260,sackOK,TS val 715849432 ecr 4294933997,nop,wscale 7], length 0
01:19:50.820341 IP 172.31.22.15.39947 > 172.31.25.248.5001: Flags [.], ack 3018990052, win 99, options [nop,nop,TS val 4294934001 ecr 715849432], length 0

rcv$ tcpdump -Snr capture-bad-site2|head -3
reading from file capture-bad-site2, link-type EN10MB (Ethernet)
01:19:29.707701 IP 172.31.22.15.39947 > 172.31.25.248.5001: Flags [S], seq 3751174188, win 12600, options [mss 1260,sackOK,TS val 4294933997 ecr 0,nop,wscale 7], length 0
01:19:29.707768 IP 172.31.25.248.5001 > 172.31.22.15.39947: Flags [S.], seq 3018990051, ack 3751174189, win 12480, options [mss 1260,sackOK,TS val 715849432 ecr 4294933997,nop,wscale 7], length 0
01:19:29.721487 IP 172.31.22.15.39947 > 172.31.25.248.5001: Flags [.], ack 3018990052, win 99, options [nop,nop,TS val 4294934001 ecr 715849432], length 0

snd$ tcpdump -Snr capture-bad-site1 |grep 'sack ' | head -1
reading from file capture-bad-site1, link-type EN10MB (Ethernet)
01:19:51.057309 IP 172.31.25.248.5001 > 172.31.22.15.39947: Flags [.], ack 3138076772, win 1482, options [nop,nop,TS val 715849491 ecr 4294934042,nop,nop,sack 1 {3751632229:3751633477}], length 0

rcv$ tcpdump -Snr capture-bad-site2|grep 'sack ' | head -1
reading from file capture-bad-site2, link-type EN10MB (Ethernet)
01:19:29.942530 IP 172.31.25.248.5001 > 172.31.22.15.39947: Flags [.], ack 3751630981, win 1482, options [nop,nop,TS val 715849491 ecr 4294934042,nop,nop,sack 1 {3751632229:3751633477}], length 0
```

The sender's SYN (seq 3137619979) reaches the receiver as seq 3751174188. The receiver's ACK 3751630981 reaches the sender correctly translated back to 3138076772. The SACK block {3751632229:3751633477}, however, arrives unchanged, so it is still in the receiver's sequence space.
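The following minimal Python sketch makes the diagnosis above concrete. It derives the middlebox's offset from the two SYN packets and confirms that this offset explains the translated ACK. It then computes what the SACK block should have looked like in the sender's sequence space. Finally, it applies a simplified SACK validity check that is loosely modeled on, not copied from, the Linux kernel's `tcp_is_sackblock_valid()`. D-SACK and the undo-window cases are omitted. The helper names are hypothetical, and so is `SND_NXT`: its value cannot be read off the lines shown, so the sketch assumes it is 20 full segments past `SND_UNA`.

```python
# Sketch: detect sequence-number rewriting from the two captures and check
# whether the untranslated SACK block could pass a simplified validity test.
# Sequence numbers come from the tcpdump output above; SND_NXT is hypothetical.

MOD = 1 << 32  # TCP sequence space is 32 bits


def seq_diff(a: int, b: int) -> int:
    """Signed 32-bit difference a - b, like the kernel's (s32)(a - b)."""
    d = (a - b) % MOD
    return d - MOD if d >= 1 << 31 else d


def before(a: int, b: int) -> bool:
    """True if sequence number a comes before b (wraparound-safe)."""
    return seq_diff(a, b) < 0


def after(a: int, b: int) -> bool:
    """True if sequence number a comes after b (wraparound-safe)."""
    return before(b, a)


# Offset added by the middlebox: SYN seq seen at the receiver minus at the sender.
SYN_AT_SENDER = 3137619979
SYN_AT_RECEIVER = 3751174188
OFFSET = (SYN_AT_RECEIVER - SYN_AT_SENDER) % MOD


def to_sender_space(seq: int) -> int:
    """Map a sequence number from the receiver's view to the sender's view."""
    return (seq - OFFSET) % MOD


# The same ACK as captured on each side: the middlebox did translate it.
ACK_AT_RECEIVER = 3751630981
ACK_AT_SENDER = 3138076772
assert to_sender_space(ACK_AT_RECEIVER) == ACK_AT_SENDER

# The SACK block reaches the sender untranslated.
SACK_AS_RECEIVED = (3751632229, 3751633477)
SACK_EXPECTED = tuple(to_sender_space(s) for s in SACK_AS_RECEIVED)


def sack_block_plausible(start: int, end: int, snd_una: int, snd_nxt: int) -> bool:
    """Simplified check (hypothetical helper): the block must be non-empty,
    must not extend past snd_nxt, and must start after snd_una."""
    if after(end, snd_nxt) or not before(start, end):
        return False
    return after(start, snd_una)


SND_UNA = ACK_AT_SENDER
SND_NXT = SND_UNA + 20 * 1248  # hypothetical: 20 full segments (1248 bytes each) in flight

print("offset:", OFFSET)
print("expected SACK block at sender:", SACK_EXPECTED)
print("received block plausible:",
      sack_block_plausible(*SACK_AS_RECEIVED, SND_UNA, SND_NXT))
print("translated block plausible:",
      sack_block_plausible(*SACK_EXPECTED, SND_UNA, SND_NXT))
```

With these inputs, the translated block (3138078020:3138079268) sits just above `SND_UNA` and passes the check. The block actually received lies roughly 613 million sequence numbers ahead of `SND_UNA` and is rejected. This is consistent with the sender never entering SACK-based fast recovery.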
Created attachment 145061: tshark captures of good/bad performance

Our network consists of two separate geographical locations that are transparently connected via some kind of VPN. With newer kernel versions (v3.10 or later), we noticed a strange performance regression when transferring larger amounts of data via TCP (e.g. HTTP downloads of files). It only affects transfers from one location to the other, not the other way around. The kernel version of the receiving machine does not seem to matter (tested: v3.2, v3.5, v3.11). On the sending machine, every version starting with v3.10 performs badly.

The problem could be reproduced using iperf, and bisecting showed 3e59cb0ddfd2c59991f38e89352ad8a3c71b2374 to be the first bad commit. Reverting this commit on top of v3.15.4 restores the performance of previous kernels.

Reproducing this problem in a different environment does not seem to be easy. Therefore, I've attached packet captures created with tshark on both the sending and the receiving side. They cover the last good commit and the first bad commit, using iperf to demonstrate the problem (see output below). Our network experts did not find anything obviously wrong with the network configuration.

Can you see any problem in the packet captures? Or was the algorithm removed in 3e59cb0ddfd2c59991f38e89352ad8a3c71b2374 not so bad after all?
This is the iperf output of 3cc7587b30032b7c4dd9610a55a77519e84da7db (the last good commit):

```
user@site1:~$ iperf -c site2
------------------------------------------------------------
Client connecting to site2, TCP port 5001
TCP window size: 20.1 KByte (default)
------------------------------------------------------------
[ 3] local 172.31.22.15 port 32821 connected with 172.31.25.248 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.1 sec 15.5 MBytes 12.9 Mbits/sec
```

This is the iperf output of 3e59cb0ddfd2c59991f38e89352ad8a3c71b2374 (the first bad commit):

```
user@site1:~$ iperf -c site2
------------------------------------------------------------
Client connecting to site2, TCP port 5001
TCP window size: 20.1 KByte (default)
------------------------------------------------------------
[ 3] local 172.31.22.15 port 39947 connected with 172.31.25.248 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-11.3 sec 1.88 MBytes 1.39 Mbits/sec
```

This is the corresponding iperf output on the server side:

```
user@site2:~$ iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 4] local 172.31.25.248 port 5001 connected with 172.31.22.15 port 32821
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.7 sec 15.5 MBytes 12.1 Mbits/sec
[ 5] local 172.31.25.248 port 5001 connected with 172.31.22.15 port 39947
[ 5] 0.0-19.0 sec 1.88 MBytes 826 Kbits/sec
```
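A little arithmetic cross-checks the reported bandwidth figures and quantifies the regression. The Python sketch below assumes iperf2's convention of 1024-based byte units (MBytes = 2^20 bytes) and 1000-based bit units (Mbits = 10^6 bits); the numbers above are consistent with that assumption. Because iperf rounds the transfer sizes in its output, the recomputed values match only to within rounding.

```python
# Sketch: recompute iperf bandwidth from the transfer sizes and intervals above.
# Assumption: MBytes are 2**20 bytes and Mbits are 10**6 bits (iperf2 style).

MIB = 2 ** 20  # bytes per MByte under the assumed convention


def mbit_per_s(mbytes: float, seconds: float) -> float:
    """Bandwidth in Mbit/s (10**6 bits per second) for a transfer given in MBytes."""
    return mbytes * MIB * 8 / seconds / 1e6


runs = {
    "good commit, client": (15.5, 10.1),  # reported 12.9 Mbits/sec
    "bad commit, client": (1.88, 11.3),   # reported 1.39 Mbits/sec
    "bad commit, server": (1.88, 19.0),   # reported 826 Kbits/sec
}

for name, (mbytes, secs) in runs.items():
    print(f"{name}: {mbit_per_s(mbytes, secs):.2f} Mbit/s")

# Ratio of good to bad client-side throughput.
slowdown = mbit_per_s(15.5, 10.1) / mbit_per_s(1.88, 11.3)
print(f"client-side slowdown: about {slowdown:.1f}x")
```

On the client side, the first bad commit delivers a little over a ninth of the throughput of the last good commit, about 9.2 times slower.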