Bug 9031 - TCP window is too cautious on send
Summary: TCP window is too cautious on send
Status: REJECTED DOCUMENTED
Alias: None
Product: Networking
Classification: Unclassified
Component: IPV4
Hardware: All Linux
Importance: P1 normal
Assignee: Stephen Hemminger
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-09-16 17:02 UTC by Andrew J. Kroll
Modified: 2007-09-20 20:11 UTC (History)

See Also:
Kernel Version: Any
Subsystem:
Regression: ---
Bisected commit-id:


Description Andrew J. Kroll 2007-09-16 17:02:45 UTC
This has been a longstanding "bug" of sorts when talking to a system that has extremely small windows (under 1.5k).

The only way to give the stack on the other side a nudge is to ACK twice.

Here is a sample transcript, with a max window size of 1025 bytes.

18:25:43.968358 IP dr.ea.ms.http > 192.168.80.2.40246: . 37377:37633(256) ack 120 win 5840
18:25:43.992402 IP 192.168.80.2.40246 > dr.ea.ms.http: . ack 37121 win 769 <mss 256>
18:25:44.390305 IP 192.168.80.2.40246 > dr.ea.ms.http: . ack 37121 win 1025 <mss 256>
18:25:44.823084 IP dr.ea.ms.http > 192.168.80.2.40246: . 37633:37889(256) ack 120 win 5840

If I take the "nudge" code out of my IP stack, the sender sits for an awfully long time before sending the next packet, when there clearly is room for a few more.

Should I:
1: Have my IP stack lie about the window till it is important?
2: Something else?

I can't see any good reason for the large delay, since it is on a serial link, via SLIP.

I can point you to source code that will allow you to verify the problem for yourself, if you would like.
Comment 1 Stephen Hemminger 2007-09-16 19:35:42 UTC
The Linux stack follows the RFC standard for silly window avoidance.
Any window less than a full MTU is deemed a silly window and will
not be used. The application can turn off the Nagle algorithm on
a per socket basis with TCP_NODELAY.

What is the OS or device on the other end that is so far from
standards-compliant?

Since Linux follows the standard, you really need to fix the receiver.
Comment 2 Anonymous Emailer 2007-09-16 23:44:13 UTC
Reply-To: akpm@linux-foundation.org

Comment 3 Anonymous Emailer 2007-09-17 10:04:22 UTC
Reply-To: shemminger@linux-foundation.org

See my comment on the bug report: Linux is doing Silly Window Syndrome (SWS)
avoidance (RFC 813), as required by the host requirements RFC 1122:

         4.2.3.4  When to Send Data

            A TCP MUST include a SWS avoidance algorithm in the sender.

            A TCP SHOULD implement the Nagle Algorithm [TCP:9] to
            coalesce short segments.  However, there MUST be a way for
            an application to disable the Nagle algorithm on an
            individual connection.  In all cases, sending data is also
            subject to the limitation imposed by the Slow Start
            algorithm (Section 4.2.2.15).

The Linux mechanism to disable Nagle is setsockopt(TCP_NODELAY).
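
As a minimal sketch of that call (written in Python rather than C for brevity, and not taken from the original report), the option can be set on a socket before connecting. The helper name make_nodelay_socket is hypothetical; socket.setsockopt, IPPROTO_TCP, and TCP_NODELAY are the standard names exposed by Python's socket module. Note that this only disables Nagle coalescing on the sending application's socket.

```python
import socket


def make_nodelay_socket():
    """Create a TCP socket with the Nagle algorithm disabled (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # A non-zero value turns TCP_NODELAY on for this connection only.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


sock = make_nodelay_socket()
# Read the option back to confirm the kernel accepted it.
nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
sock.close()
```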
Comment 4 Andrew J. Kroll 2007-09-20 19:59:09 UTC
So then the option is to lie on my end. There is no way to increase the actual window on some of the devices, which is very unfortunate. It shouldn't matter much anyway, since it is a serial device and the buffer space is available. The windowing scheme reports to the other host the amount of buffer space available per connection. It doesn't count the free shared buffers, which are usually used to collect fragments and hold them for reassembly. I suppose counting those would be the ideal fix. Thanks for pointing me to the RFC; it will be seriously helpful in the development of my IP stack.
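
For a stack on the receiving side, RFC 1122 section 4.2.3.3 describes a matching receiver-side SWS avoidance rule: keep the advertised right window edge fixed until the window can grow by at least min(Fr * RCV.BUFF, Eff.snd.MSS), with Fr recommended as 1/2. The following is only a rough sketch of that rule, not code from the stack discussed here; the function name and parameter names are hypothetical and simply mirror the RFC's variables.

```python
def advertised_window(rcv_buff, rcv_user, rcv_wnd, eff_snd_mss, fr=0.5):
    """Return the window to advertise under RFC 1122 receiver-side SWS avoidance.

    rcv_buff    -- total buffer space reserved for the connection (RCV.BUFF)
    rcv_user    -- bytes received but not yet consumed by the application (RCV.USER)
    rcv_wnd     -- window currently on offer to the peer (RCV.WND)
    eff_snd_mss -- effective send MSS for the connection (Eff.snd.MSS)
    """
    free = rcv_buff - rcv_user      # space that could be offered right now
    gain = free - rcv_wnd           # how far the right edge would advance
    if gain >= min(fr * rcv_buff, eff_snd_mss):
        return free                 # the increase is large enough to announce
    return rcv_wnd                  # otherwise hold the right edge where it is


# With a 1025-byte buffer and a 256-byte MSS, the window only moves
# once at least 256 more bytes have been freed.
held = advertised_window(1025, 300, 513, 256)
opened = advertised_window(1025, 256, 513, 256)
```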
Comment 5 Stephen Hemminger 2007-09-20 20:11:31 UTC
If the device doesn't have enough buffer space for a whole Ethernet
packet, then a sensible thing to do would be to use a smaller Maximum
Segment Size (MSS) during the TCP negotiation phase.  The SWS avoidance
is done based on MSS, so if you set it to 512 bytes everything would work.
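
To illustrate why the MSS matters, here is a rough sketch of the sender-side test from RFC 1122 section 4.2.3.4. It follows the RFC text, not the Linux source, and the function and parameter names are hypothetical. The override timeout in the RFC's rule (4) is not modelled.

```python
def sender_may_send(usable_window, queued_bytes, eff_mss, max_window_seen,
                    pushed=False, nothing_outstanding=False, nagle=True):
    """Return True if RFC 1122 4.2.3.4 would let the sender transmit now."""
    sendable = min(queued_bytes, usable_window)
    if sendable <= 0:
        return False
    # (1) A maximum-sized segment can be sent.
    if sendable >= eff_mss:
        return True
    # With Nagle enabled, rules (2) and (3) also require no unacknowledged data.
    idle = nothing_outstanding or not nagle
    # (2) The data is pushed and all of it fits in the usable window.
    if idle and pushed and queued_bytes <= usable_window:
        return True
    # (3) At least half of the largest window ever offered can be filled (Fs = 1/2).
    if idle and 2 * sendable >= max_window_seen:
        return True
    # (4) Otherwise wait for the override timeout (not modelled here).
    return False


# 257 usable bytes with data still in flight: too small for a 1460-byte MSS,
# but enough for a full segment when the MSS is 256.
with_large_mss = sender_may_send(257, 4096, 1460, 1025)
with_small_mss = sender_may_send(257, 4096, 256, 1025)
```

Under these assumptions the first call reports that the sender should wait, while the second allows a full-sized segment, which is the effect of negotiating an MSS that fits the receiver's buffer.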
