Bug 28542 - 3c59x.c: Regression since 2.6.36 (incl.), for e. g., TCP stalls on receive
Summary: 3c59x.c: Regression since 2.6.36 (incl.), for e. g., TCP stalls on receive
Status: RESOLVED OBSOLETE
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network (show other bugs)
Hardware: All
OS: Linux
Importance: P1 high
Assignee: Neil Horman
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-02-07 18:55 UTC by Igor M Podlesny
Modified: 2012-08-15 22:08 UTC (History)
2 users (show)

See Also:
Kernel Version: 2.6.36—2.6.37
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments

Description Igor M Podlesny 2011-02-07 18:55:27 UTC
Dump of kernel source bz2 download being stalled: http://poige.livejournal.com/475372.html#cutid1 (see the bunch of ACKs #5555523).

As a workaround I've just replaced drivers/net/3c59x.c with its former (2.6.35.11) version, the connectivity problems are gone now.
Comment 1 Andrew Morton 2011-02-07 23:09:52 UTC
On Mon, 7 Feb 2011 18:55:29 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=28542
> 
>            Summary: 3c59x.c: Regression since 2.6.36 (incl.), for e. g.,
>                     TCP stalls on receive
>            Product: Drivers
>            Version: 2.5
>     Kernel Version: 2.6.36—2.6.37
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: high
>           Priority: P1
>          Component: Network
>         AssignedTo: drivers_network@kernel-bugs.osdl.org
>         ReportedBy: for.poige+bugzilla.kernel.org@gmail.com
>         Regression: Yes
> 
> 
> Dump of kernel source bz2 download being stalled:
> http://poige.livejournal.com/475372.html#cutid1 (see the bunch of ACKs
> #5555523).
> 
> As a workaround I've just replaced drivers/net/3c59x.c with its former
> (2.6.35.11) version, the connectivity problems are gone now.
>
Comment 2 Ben Hutchings 2011-02-08 01:20:47 UTC
On Mon, 2011-02-07 at 15:09 -0800, Andrew Morton wrote:
> On Mon, 7 Feb 2011 18:55:29 GMT
> bugzilla-daemon@bugzilla.kernel.org wrote:
[...]
> > Dump of kernel source bz2 download being stalled:
> > http://poige.livejournal.com/475372.html#cutid1 (see the bunch of ACKs
> > #5555523).

That's not a stall, that's selective-ACKing (see the ranges listed on
the right of each ACK line).

> > As a workaround I've just replaced drivers/net/3c59x.c with its former
> > (2.6.35.11) version, the connectivity problems are gone now.

This could just be a coincidence.  But it could be that packet loss is
now more likely for some reason.  Does the rx_fifo_errors statistic
increase quickly with the later version of the driver?  (This seems to
be the only receive error type reported.)
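For reference, the counter Ben asks about can also be read without ethtool, via sysfs (a sketch; assumes the interface is eth0 — substitute your own):

```shell
# Sample rx_fifo_errors twice, 10 seconds apart, and print the delta.
# /sys/class/net/<if>/statistics/ exposes the same counters that
# "ifconfig" / "ip -s link" report.
IF=eth0
before=$(cat /sys/class/net/$IF/statistics/rx_fifo_errors)
sleep 10
after=$(cat /sys/class/net/$IF/statistics/rx_fifo_errors)
echo "rx_fifo_errors grew by $((after - before)) in 10s"
```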

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.
Comment 3 Igor M Podlesny 2011-02-08 02:45:31 UTC
So you consider the bunch of ACKs (see the timestamps for all of the {5556071:x}) isn't a stall?

Actually, I didn't figure out the reason for the problems right away. I spent a lot of time playing with the MTU ( http://poige.livejournal.com/474589.html#cutid1 ), ruling out the firewall and sysctl configs, then tried 2.6.36, then 2.6.35.11, and only the latter worked flawlessly. I then decided to check whether there had been changes in the network card driver, and yes, there were. Using the old version of the driver in 2.6.37 solves the problem as well. So what kind of coincidence are you talking about? There is none; it has proved to be a consistent pattern.
Comment 4 Igor M Podlesny 2011-02-08 02:59:08 UTC
Regarding rx_fifo_errors, the only statistics ethtool (-S) reports are: {
NIC statistics:
     tx_deferred: 0
     tx_max_collisions: 0
     tx_multiple_collisions: 0
     tx_single_collisions: 0
     rx_bad_ssd: 0
}

The network card as reported by lspci: {
  05:02.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 74)
}
Comment 5 Neil Horman 2011-02-08 18:44:07 UTC
Was that tcpdump taken on the system with the 3c59x driver?  If so, it may not show some problems, since frames are cloned to the AF_PACKET socket just before they are sent to the driver on transmit, and not until the driver actually gets a packet on receive.  If the trace is from the 3c59x system, can you redo the trace on the system you're trying to communicate with, so that we can get a better idea of the on-the-wire traffic?  Also, if you could capture it in binary format and attach it here (the -w option in tcpdump), that would be helpful.  Thanks!
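Concretely, the capture being asked for might look like this (a sketch; the interface name, host address, and file name are placeholders — adjust to your setup):

```shell
# On the peer (non-3c59x) machine: capture the full conversation in
# binary pcap format so it can be attached to the bug and opened in
# wireshark.  -s 0 grabs whole packets; 10.0.0.2 stands in for the
# address of the 3c59x box.
tcpdump -i eth0 -s 0 -w stall.pcap host 10.0.0.2

# Sanity-check the capture afterwards by reading it back:
tcpdump -r stall.pcap | head
```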
Comment 6 Igor M Podlesny 2011-02-09 13:02:08 UTC
Hi, Neil! Yeah, the dump was taken on the computer with the 3c59x onboard, so your suggestion sounds quite advisable to me. I'll try to test it once again, recording from the other side.
Comment 7 Igor M Podlesny 2011-03-20 11:21:22 UTC
Hi!

I have tried 2.6.38 just now, and have to say it has the same problem, so I'm compiling it with the driver borrowed from 2.6.36 (as usual, yeah).

Also, I've tried capturing packets from the remote side of the connection; it looks the very same way as it does from my side. I also have what seems like an interesting observation: when I tried using zero-filled data files, it worked OK, but downloading VirtualBox-4.0.4-70112-Linux_amd64.run always stops at the same stream position:

7,730,466   --.-K/s  eta 2m 55s
7,730,466   --.-K/s  eta 5m 50s
7,730,466   --.-K/s  eta 4m 41s
7,730,466   --.-K/s  eta 77m 8s

When I tried downloading the dump file itself, it stopped @ 
7,866,245   --.-K/s  eta 3s

so I tried gzipping it, and despite it becoming only about half a megabyte smaller, the system could download it normally:

8,415,331    175K/s   in 47s.

The 2.6.36 driver seems to have compiled successfully into 2.6.38, so I'm rebooting now.
Comment 8 Neil Horman 2011-03-20 18:56:03 UTC
Could you attach the new traces here please?  I'd like to have a look at the failing scenario.  Thanks!
Comment 9 Igor M Podlesny 2011-03-20 19:12:06 UTC
Here it is: http://pastebin.com/Kq4LiP5B
Comment 10 Neil Horman 2011-03-21 13:32:21 UTC
Ok, thank you.  Just FYI, when I said binary tcpdump, I had intended for you to use the -w flag and write out the dump in pcap format, so that you could upload it to this bz and I could look at it with wireshark.  The text format just takes too long to parse through.

That said, now that we have a trace from the server side, I think we can reliably comment on the data here.  

Observations:
The problem actually seems to begin prior to the start of the trace.  You can see from the first frame that the server is sending sequence values 7970141:7971601, while the client only acks sequence 7729241.  So the client is already significantly behind in processing the received data.  In frame 11 we begin sending selective acks for sequences of received data, which is fine.  Unfortunately, since there is a gap between the last ack and the left edge of the sack, the server's congestion window continues to shrink as it sends data.  I'm guessing that the server's cwnd closes at around frame 734, which is when the data transmission stall starts.  During this period, the client continues to send backlogged frames with an increasing sack range, selectively acking the data the server has sent.  This continues up until frame 826, when we sack the sequence number 8149721, which is the highest sequence the server had sent to us in frame 728.  At this point, the server realizes we've acked everything except that missing frame, and re-sends sequences 7730701:7732161 (which, incidentally, it also tried to retransmit in frame 732).

Thoughts:
It seems as though we're specifically having a problem receiving that one frame containing that sequence of data.  Why, for the life of me, we would have a problem with it, I don't know.  But we never seem to be able to ACK it.  That suggests to me a problem higher up than the network driver.

It also appears to me that we might be having some problems with buffer latency here.  I'm hesitant to use the term bufferbloat, since no one seems to agree on what exactly that term means, but given that we can observe what appears to be the server's cwnd closing on this connection while the client catches up on acks, it suggests to me that we have a driver transmit queue backlog that's causing the TCP implementation to think the connection is far less congested than it really is.

Questions:
When this 'stall' happens, is it still possible for the system to send data over this NIC via other connections?  I.e., can you ping another host on your network reliably while the stall is occurring?


Next Steps:
I think we have three options here:

1) Make some adjustments to the TCP stack to see if the problem clears up, and use that data to better understand the problem itself.  Since it seems from this trace that the server is sending us data faster than we can ACK it, let's see if we can do some things to slow it down, in the hope of a more reliably consistent connection.  I would recommend disabling selective ACKs on your system (echo 0 > /proc/sys/net/ipv4/tcp_sack) and shortening the 3c59x card's txqueuelen value to 100 via ifconfig.  This should cause the server to notice the client's lack of progress in the ack values sooner, causing the cwnd to shrink, retransmits to happen faster, and throughput to be more consistent.  If that solves the problem, we can move forward by trying to understand what changed between 2.6.35 and .36 that started causing the client system to fall behind in acking received data.

2) We can do a manual bisect of 3c59x to see where the problem started.  There are about 11 commits between .35 and .36, so you should be able to narrow down which commit introduced the degraded performance.  These are the commits of interest:
[nhorman@hmsreliant net-next-2.6]$ git log --format=format:%H refs/tags/v2.6.35..refs/tags/v2.6.36 -- drivers/net/3c59x.c 
3fd6c88ef875a14740801ebfc6b6e4e064a1cdd4
84176b7b56704580e008e6cb820dd4ccf622a1fd
89b12faba4f3441c9457c5278851e8a93ffd008d
24cd804d1dc60a74c53da983094df3516500c276
aa25ab7d943a5e1e6bcc2a65ff6669144f5b5d60
690a1f2002a3091bd18a501f46c9530f10481463
344e0f623cec5eba273db06fe57db080988d6b26
f89f5d0e94e001e0a09bd433c0c0f089eaf0dea9
8cd47ea19bf8c6f9d3a41b3c312237d007138ae0
de847272149365363a6043a963a6f42fb91566e2
a095cfc40ec7ebe63e9532383c5b5c2a27b14075

3) Try a git bisect of the entire tree, on the chance that something higher up the stack caused this change (which reverting the driver simply papered over).  There are approximately 10000 commits between 2.6.35 and .36, but that should only be about 14 bisect steps, so it's not too bad.

I would recommend doing actions 1 and 3.  Option 1 is a nice quick test you can do now, and will confirm that we are headed in the right direction.  Option 3 will let us pinpoint the change that introduced this behavior, which in conjunction with option 1 should give us a much clearer picture of the problem.  Plus, if the problem actually is in the driver, a whole-tree bisect should land us on one of the commits listed in (2) anyway.
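The quick test in option (1) and the bisect in option (3) amount to something like this (a sketch; the interface name is an assumption — adjust to your setup):

```shell
# Option 1: disable SACK and shorten the transmit queue on the
# 3c59x interface (assumed here to be eth0), then retry the download.
echo 0 > /proc/sys/net/ipv4/tcp_sack
ifconfig eth0 txqueuelen 100

# Option 3: whole-tree bisect between the last good and first bad
# releases; build, boot, and test the download at each step, then
# tell git the result until it names the offending commit.
git bisect start
git bisect bad v2.6.36
git bisect good v2.6.35
# ... build, reboot, test ...
#   git bisect good   # if the stall does not reproduce
#   git bisect bad    # if it does
```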
Comment 11 Igor M Podlesny 2011-03-21 15:00:56 UTC
> Just FYI, when I said binary TCP dump

Sorry, overlooked it.

Thanks for other suggestions, gotta think it over.
