Bug 213827 - Stall In TCP/IP
Summary: Stall In TCP/IP
Status: NEW
Alias: None
Product: Networking
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 normal
Assignee: Stephen Hemminger
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-07-22 22:03 UTC by Tom Martens
Modified: 2025-03-04 04:45 UTC
CC List: 1 user

See Also:
Kernel Version: Observed 5.9.2. Confirmed on 5.12 and 5.13
Subsystem:
Regression: No
Bisected commit-id:


Attachments
A basic repro of the observed issue. (2.59 KB, application/x-gzip)
2021-07-22 22:03 UTC, Tom Martens

Description Tom Martens 2021-07-22 22:03:50 UTC
Created attachment 298007
A basic repro of the observed issue.

We see an unexpected pause of 3-60 seconds during which the test no longer receives data: recv() blocks, netstat shows data sitting in the send queue of the server on port 7777, and 0 bytes are in the test's receive queue.
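
For anyone who wants to watch the two queues without netstat, here is a minimal sketch that reads the same counters from /proc/net/tcp. It assumes a Linux host and IPv4 sockets; the helper names and the sample row are hypothetical and are not taken from the attached repro.

```python
def parse_proc_net_tcp_line(line):
    """Parse one data row of /proc/net/tcp (IPv4) into a small dict."""
    fields = line.split()
    # fields[1] and fields[2] look like "0100007F:1E61" (hex address:hex port)
    local_port = int(fields[1].split(":")[1], 16)
    remote_port = int(fields[2].split(":")[1], 16)
    # fields[4] is "tx_queue:rx_queue", both in hex
    tx_hex, rx_hex = fields[4].split(":")
    return {
        "local_port": local_port,
        "remote_port": remote_port,
        "state": int(fields[3], 16),   # 0x01 is ESTABLISHED
        "tx_queue": int(tx_hex, 16),   # bytes queued for sending / not yet ACKed
        "rx_queue": int(rx_hex, 16),   # bytes received but not yet read
    }


def queues_for_port(port, path="/proc/net/tcp"):
    """Return parsed rows whose local or remote port matches `port`."""
    with open(path) as f:
        next(f)  # skip the header row
        rows = [parse_proc_net_tcp_line(l) for l in f if l.strip()]
    return [r for r in rows if port in (r["local_port"], r["remote_port"])]


# Hypothetical row for illustration only: a socket on local port 7777
# with a non-empty send queue and an empty receive queue.
SAMPLE = ("   0: 0100007F:1E61 0100007F:C350 01 00012345:00000000 "
          "01:00000014 00000000  1000        0 12345 1 0000000000000000")

if __name__ == "__main__":
    print(parse_proc_net_tcp_line(SAMPLE))
```

Calling queues_for_port(7777) in a loop while the repro runs should show tx_queue staying non-zero on the server side while rx_queue on the test side stays at 0 during the stall.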

Setting net.core.rmem_max seems to be a key factor. For our testing we used net.core.rmem_max=16777216 (16 MiB).
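
A minimal sketch of how net.core.rmem_max interacts with a socket's receive buffer, assuming the test requests its buffer size via SO_RCVBUF (this is an assumption; see the attached README for what the repro actually does). On Linux the kernel caps the requested value at rmem_max and reports back double the value it granted. The helper names are hypothetical.

```python
import socket


def read_rmem_max(path="/proc/sys/net/core/rmem_max"):
    """Return net.core.rmem_max in bytes, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def request_rcvbuf(requested):
    """Ask for a receive buffer of `requested` bytes and return what
    getsockopt() reports afterwards."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    print("rmem_max:", read_rmem_max())
    print("granted for 64 KiB request:", request_rcvbuf(64 * 1024))
```

Comparing the granted size before and after raising rmem_max shows how much data the receiver can buffer before it has to close its window.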

See attached README and repro.
Comment 1 Luiz Carvalho 2025-03-04 04:45:20 UTC
Hi, after *a lot* of research and debugging, I've come to the conclusion that this might not, in fact, be a bug. The server (mock_server) continuously responds to messages from the client. The client does not process these responses as they arrive; instead, they are buffered until the client finishes sending its 100k messages.


When the client's RX buffer fills up to rmem_max, it advertises a Zero Window; the sender stops sending new data and begins Zero Window Probing (handled by the RTO timer). From then on, every RTO expiry retransmits the first SKB on rtx_queue. If it is ACKed, great: the RTO is re-armed with the same timeout. Otherwise the RTO is backed off exponentially (for me this happens about 7 times, starting at approx. 200 ms, which in the end adds up to over 25 s!).
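
The arithmetic behind that figure, as a small sketch. It assumes a pure doubling of the timeout on every unanswered expiry, an initial timeout of exactly 200 ms, and 7 expiries; the real values on a given system will differ somewhat, and the function name is hypothetical.

```python
def backoff_schedule(initial_ms, expiries):
    """Return (intervals, fire_times) in ms for a timer that doubles
    its interval after every expiry."""
    intervals = [initial_ms * 2 ** i for i in range(expiries)]
    fire_times = []
    elapsed = 0
    for interval in intervals:
        elapsed += interval
        fire_times.append(elapsed)
    return intervals, fire_times


if __name__ == "__main__":
    intervals, fire_times = backoff_schedule(200, 7)
    print(intervals)       # [200, 400, 800, 1600, 3200, 6400, 12800]
    print(fire_times[-1])  # 25400
```

Seven doublings from 200 ms add up to 25.4 s, and the last interval alone is 12.8 s.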


You might see where this is going. By the time the reader has processed everything and announces a Window Update, the RTO timer is still partway to its next expiry, so no data is retransmitted until it fires. Because the sender only takes RTT measurements when *new data* is ACKed by the reader, the RTO is not reset when the Window Update arrives, since ack_seq < SND.UNA.
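
A toy model of that timing, to show how long the sender can sit idle after the window reopens. It assumes the timer starts at 200 ms, doubles on every expiry up to a 120 s cap, and is not re-armed by the Window Update; the function name and the 14 s example are hypothetical, not measurements from the repro.

```python
def extra_stall_ms(window_open_ms, initial_ms=200, max_interval_ms=120_000):
    """Time between the window reopening and the next timer expiry,
    assuming the Window Update does not re-arm the timer."""
    elapsed = 0
    interval = initial_ms
    while True:
        elapsed += interval  # the timer fires at this point in time
        if elapsed >= window_open_ms:
            return elapsed - window_open_ms
        interval = min(interval * 2, max_interval_ms)


if __name__ == "__main__":
    # Reader drains its buffer 14 s after the zero window began.
    print(extra_stall_ms(14_000))  # 11400
```

In this model the timer fires at 12.6 s and then not again until 25.4 s, so a window that reopens at 14 s still leaves the connection idle for another 11.4 s.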


My interpretation is that the implementation is correct, although I do find it strange that Window Updates do not reset the RTO per the RFCs I could find, especially RFC 6298 section 5. Should Window Updates after a Zero Window trigger an RTT measurement? In the remote case that they should, someone please let me know; I'd love to do it. I've already spent about a week investigating this and have actually done it.
