Bug 206319 - ECN header flag processing overly restrictive in TCP
Summary: ECN header flag processing overly restrictive in TCP
Status: NEW
Alias: None
Product: Networking
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 normal
Assignee: Stephen Hemminger
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-01-27 08:15 UTC by rscheff
Modified: 2020-01-27 08:30 UTC

See Also:
Kernel Version: HEAD
Subsystem:
Regression: No
Bisected commit-id:



Description rscheff 2020-01-27 08:15:36 UTC
RFC 3168 states that a CWR flag "SHOULD" be *sent* together with a new data segment.

However, Linux, as data receiver, processes the CWR flag ONLY when it arrives together with some data (although it apparently does accept it on retransmissions).
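
The difference between the two receiver behaviours can be sketched as a small model. This is not kernel code: the names `segment`, `ecn_receiver` and `ecn_rx`, and the `strict` switch, are made up for illustration, and the real stack tracks more state than a single flag.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of an incoming segment. */
struct segment {
    bool   ce;      /* IP header carries the Congestion Experienced mark */
    bool   cwr;     /* TCP header has the CWR flag set */
    size_t payload; /* payload bytes; 0 for a pure ACK */
};

/* Hypothetical receiver state: true while ECE is echoed on outgoing ACKs. */
struct ecn_receiver {
    bool ece_latched;
};

/*
 * Process one incoming segment.
 * strict == true:  CWR is honoured only when the segment carries data
 *                  (the behaviour described in this report).
 * strict == false: CWR is honoured on any segment, including pure ACKs.
 */
static void ecn_rx(struct ecn_receiver *r, const struct segment *s, bool strict)
{
    if (s->cwr && (!strict || s->payload > 0))
        r->ece_latched = false;
    /* Checked last, so a CE mark on the same segment latches ECE again. */
    if (s->ce)
        r->ece_latched = true;
}
```

In this model, a pure ACK carrying CWR leaves `ece_latched` set when `strict` is true and clears it when `strict` is false.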

This has been found to be an interoperability issue with *BSD, which so far sends the CWR as quickly as possible, including on pure ACKs (or retransmissions). That deviation from RFC 3168 is reported at https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=243231

Nevertheless, CWR processing as receiver should be less restrictive, to meet the spirit of Postel's Law: "Be liberal in what you accept, and conservative in what you send."


This has been demonstrated to be a dramatic performance impediment: the data receiver (Linux) keeps the ECE latched, while *BSD interprets the additional ECE flags as another round of congestion. The data sender reacts with continuous reductions of the congestion window until extremely low packet transmission rates are hit (1 packet per delayed ACK timeout, or even per persist timeout (5s)), and it stays at that level for extended periods of time.
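
To get a feel for how quickly the window collapses, here is a rough sketch. It assumes the sender halves cwnd once per round trip for as long as ECE keeps arriving, with a floor of one segment; the actual reduction factor depends on the congestion control module in use, and `cwnd_after_rounds` is a made-up helper, not a function from any stack.

```c
#include <stdint.h>

/*
 * cwnd (in segments) after `rounds` consecutive ECN reactions.
 * Assumption: each reaction halves cwnd, never going below 1 segment.
 */
static uint32_t cwnd_after_rounds(uint32_t cwnd, unsigned rounds)
{
    for (unsigned i = 0; i < rounds; i++) {
        cwnd /= 2;
        if (cwnd < 1)
            cwnd = 1;
    }
    return cwnd;
}
```

Under this assumption, a window of 64 segments is down to a single segment after 6 round trips of latched ECE, and stays there for as long as ECE keeps being echoed.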

I have discussed this issue with Neal and Yuchung already; this bug report is to track the issue in the field (impacted environments).

Comment 1 rscheff 2020-01-27 08:30:41 UTC
Also, while not having validated this on Linux:

As CWR is effectively bound to the first transmission of [snd_max+1] when entering ECN-congestion, the recovery point should be set to that sequence number too; instead, it may be set to snd_max (as with loss recovery). This can lead to premature termination of the ECN-based congestion reaction on a full ACK of snd_max, which is likely during request-response workloads, where the data direction changes frequently.

If that happens, the still-latched ECE can lead to two consecutive ECN reactions (reducing cwnd twice).
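
A toy model of the suspected sequence, to make the two recovery point choices comparable. Everything here is an assumption for illustration: the struct, the function names, the halving of cwnd, and the rule that ECE is ignored while a reaction is in progress. It is not taken from any real stack.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sender state for an ECN congestion reaction. */
struct ecn_sender {
    uint32_t cwnd;        /* congestion window, in segments */
    uint32_t recover;     /* an ACK at or beyond this ends the reaction */
    bool     in_reaction; /* while true, ECE does not reduce cwnd again */
    unsigned reductions;  /* number of cwnd reductions so far */
};

/* Wrap-safe "a >= b" for 32-bit sequence numbers. */
static bool seq_geq(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) >= 0;
}

/* Start a reaction: halve cwnd (assumed) and remember the recovery point. */
static void ecn_enter(struct ecn_sender *s, uint32_t recover)
{
    s->cwnd = s->cwnd > 1 ? s->cwnd / 2 : 1;
    s->recover = recover;
    s->in_reaction = true;
    s->reductions++;
}

/*
 * Process one ACK. next_recover is the recovery point to use if this
 * ACK starts a new reaction.
 */
static void ecn_on_ack(struct ecn_sender *s, uint32_t ack, bool ece,
                       uint32_t next_recover)
{
    if (s->in_reaction) {
        if (seq_geq(ack, s->recover))
            s->in_reaction = false;
    } else if (ece) {
        ecn_enter(s, next_recover);
    }
}
```

With snd_max = 1000 and two ACKs of 1000 that still carry ECE (the peer has not seen a CWR yet), entering with `recover = 1000` ends the reaction on the first ACK and reduces cwnd a second time on the next one. Entering with `recover = 1001` keeps the reaction open until the segment carrying CWR has been acknowledged, so cwnd is reduced only once.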

Again, I have not verified whether this problem exists on Linux, but found it on *BSD-derived stacks.
