Bug 12014 - Incorrect Urgent Pointer in outgoing packets
Incorrect Urgent Pointer in outgoing packets
Status: CLOSED CODE_FIX
Product: Networking
Classification: Unclassified
Component: IPV4
All Linux
: P1 normal
Assigned To: Stephen Hemminger
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2008-11-12 02:16 UTC by Petr Tesarik
Modified: 2012-05-16 14:55 UTC (History)
3 users (show)

See Also:
Kernel Version: 2.6.27
Tree: Mainline
Regression: No


Attachments
strace of a test client (11.14 KB, text/plain)
2008-11-12 02:18 UTC, Petr Tesarik
Details
tcpdump from the test case run (139.83 KB, application/octet-stream)
2008-11-12 02:27 UTC, Petr Tesarik
Details

Description Petr Tesarik 2008-11-12 02:16:32 UTC
Latest working kernel version: ??? (maybe never)
Earliest failing kernel version: 2.6.27
Distribution: kernel.org
Hardware Environment: x86, x86_64
Software Environment: Oracle, testcase
Problem Description:

When urgent mode is initiated using send(..., MSG_OOB), and the current outgoing queue is longer than MTU, several packets are sent out with the URG flag set. However, the Urgent Pointer field in the TCP header is not updated correctly. Instead, it stays the same.

The receiving end interprets this as a new urgent message inserted further in the stream, sends multiple SIGURGs to the server application and reads across the urgent data. This behaviour is correct.

Steps to reproduce:

1. Initiate a TCP connection
2. Fill the outgoing queue with some (normal) data
3. Send an urgent message (with MSG_OOB)
4. Send some more non-urgent data.
Comment 1 Petr Tesarik 2008-11-12 02:18:28 UTC
Created attachment 18817 [details]
strace of a test client

I don't have permission to publish the test case here, but I'm attaching an strace, at least. It follows closely the steps to reproduce from the initial comment.
Comment 2 Petr Tesarik 2008-11-12 02:27:53 UTC
Created attachment 18818 [details]
tcpdump from the test case run

It is not easy to capture a tcpdump from the same run as the strace, as the strace adds overhead to the client and so the output queue usually drains too quickly.

When I created some load on the network to slow it down and used a fast machine against an old and slow machine, it worked out. So, this is the network dump corresponding exactly to the strace posted above.

Look at frames 138. It has Urgent Pointer set to 4097. This segment itself moves 1448 bytes forward in the stream, but the following frame (139) still has Urgent Pointer set to 4097.

The bug can also be demonstrated on the TCP sequence number. If there is only one urgent message, obviously Sequence Number + Urgent Pointer should stay constant in all segments up to the urgent message. However in frame 138 it is 126977+4097 = 131074, in frame 139 it is 128425+4097 = 132522.
Interestingly in the following frame 140, it is again 129873+1201 = 131074, so the Urgent Pointer does get updated sometimes...
Comment 3 Petr Tesarik 2008-11-12 05:29:11 UTC
The bug is triggered when a packet is broken into more pieces to fit into the MTU. If I understand the problem correctly, the Linux TCP/IP stack happily constructs oversize TCP packets, which are then transformed into smaller (but not fragmented) TCP packets. At least a netdump on the client side shows me:

14:16:21.532735 IP (tos 0x0, ttl 64, id 26991, offset 0, flags [DF], proto TCP (6), length 4396) 10.10.4.188.50387 > 10.10.4.140.58699: . 257745:262089(4344) ack 1 win 1460 urg 4401 <nop,nop,timestamp 307239942 327183599>

Which is seen on the server side as 3 packets:

14:16:21.537512 IP (tos 0x0, ttl 64, id 26991, offset 0, flags [DF], proto TCP (6), length 1500) 10.10.4.188.50387 > 10.10.4.140.58699: ., cksum 0xecba (correct), 257745:259193(1448) ack 1 win 1460 urg 4401 <nop,nop,timestamp 307239942 327183599>
14:16:21.537644 IP (tos 0x0, ttl 64, id 26992, offset 0, flags [DF], proto TCP (6), length 1500) 10.10.4.188.50387 > 10.10.4.140.58699: ., cksum 0xe712 (correct), 259193:260641(1448) ack 1 win 1460 urg 4401 <nop,nop,timestamp 307239942 327183599>
14:16:21.537767 IP (tos 0x0, ttl 64, id 26993, offset 0, flags [DF], proto TCP (6), length 1500) 10.10.4.188.50387 > 10.10.4.140.58699: ., cksum 0xe16a (correct), 260641:262089(1448) ack 1 win 1460 urg 4401 <nop,nop,timestamp 307239942 327183599>

Note that urg is 4401 in all three packets.
Comment 4 Petr Tesarik 2008-11-12 06:24:35 UTC
I'm now quite certain that the bug is in tcp_fragment() in net/ipv4/tcp_output.c. The code corrects the sequence numbers, but it doesn't touch TCP_SKB_CB(buff)->urg_ptr, which seems wrong to me.

I don't understand the code well enough to produce a patch (what happens if urg_ptr points before the new skb's start?), but it shouldn't be too hard with this analysis, should it?
Comment 5 Ilpo Järvinen 2008-11-12 13:12:55 UTC
On Wed, 12 Nov 2008, Stephen Hemminger wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=12014
> 
>            Summary: Incorrect Urgent Pointer in outgoing packets
>            Product: Networking
>            Version: 2.5
>      KernelVersion: 2.6.27
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: IPV4
>         AssignedTo: shemminger@linux-foundation.org
>         ReportedBy: kernel@tesarici.cz
> 
> 
> Latest working kernel version: ??? (maybe never)
> Earliest failing kernel version: 2.6.27
> Distribution: kernel.org
> Hardware Environment: x86, x86_64
> Software Environment: Oracle, testcase
> Problem Description:
> 
> When urgent mode is initiated using send(..., MSG_OOB), and the current
> outgoing queue is longer than MTU, several packets are sent out with the URG
> flag set.

This urg flag in many segments is expected to happen (the urg pointer
might not point to the current segment but something that will follow)...

> However, the Urgent Pointer field in the TCP header is not updated
> correctly. Instead, it stays the same.

...but I've hard time believing that this true since the the only place
where urg_ptr is assigned is in tcp_output.c and that's done relative
to segments seqno:

	th->urg_ptr = htons(tp->snd_up - tcb->seq);

So if what you say is true, it means that snd_up should advance equally
to the segments seqno. Which means that writes would update it.

> The receiving end interprets this as a new urgent message inserted further in
> the stream, sends multiple SIGURGs to the server application and reads across
> the urgent data. This behaviour is correct.
> 
> Steps to reproduce:
> 
> 1. Initiate a TCP connection
> 2. Fill the outgoing queue with some (normal) data
> 3. Send an urgent message (with MSG_OOB)
> 4. Send some more non-urgent data.

Can you prove that your application is doing what you describe, I suspect 
that you e.g., forget to clear MSG_OOB... Can you strace the program 
please...  Getting a tcpdump at the same time wouldn't hurt either.

I've tried with this exact sequence recently (though after the rewrite of 
urg handling for 2.6.28), and did get a nicely working urg after fixing 
the bug that got introduced by the rewrite and just a single sigurg. It 
seems that I've lost that tcpdump somewhere though. I was using sendfile 
though for the normal data while you might not be doing the same.


Comment 6 Petr Tesarik 2008-11-12 23:32:33 UTC
(In reply to comment #5)
> > When urgent mode is initiated using send(..., MSG_OOB), and the current
> > outgoing queue is longer than MTU, several packets are sent out with the URG
> > flag set.
> 
> This urg flag in many segments is expected to happen (the urg pointer
> might not point to the current segment but something that will follow)...

Correct. No question about that.

> > However, the Urgent Pointer field in the TCP header is not updated
> > correctly. Instead, it stays the same.
> 
> ...but I've hard time believing that this true since the the only place
> where urg_ptr is assigned is in tcp_output.c and that's done relative
> to segments seqno:
> 
>         th->urg_ptr = htons(tp->snd_up - tcb->seq);
> 
> So if what you say is true, it means that snd_up should advance equally
> to the segments seqno. Which means that writes would update it.

Yes, of course. But the resulting skb can later be broken up into several packets in tcp_fragment(). Please, read the code.

> Can you prove that your application is doing what you describe, I suspect 
> that you e.g., forget to clear MSG_OOB... Can you strace the program 
> please...

See my comment #1. Attachment #18817 [details].

>  Getting a tcpdump at the same time wouldn't hurt either.

Comment #2. Attachment #18818 [details].

> I've tried with this exact sequence recently (though after the rewrite of 
> urg handling for 2.6.28), and did get a nicely working urg after fixing 
> the bug that got introduced by the rewrite and just a single sigurg. It 
> seems that I've lost that tcpdump somewhere though. I was using sendfile 
> though for the normal data while you might not be doing the same.

I'm using write() for the normal data. I'll ask the author of the test program for permission to publish it here, if that helps.
Comment 7 Petr Tesarik 2008-11-13 01:36:58 UTC
(In reply to comment #5)
> I've tried with this exact sequence recently (though after the rewrite of 
> urg handling for 2.6.28), and did get a nicely working urg [...]

I don't think this is a recent regression. I specifically observed exactly the same behaviour on 2.6.16, but that was a distro kernel, so I assumed you would have considered that piece of information worthless.

Anyway, I tested with latest git (commit 4ffaf869c7780bbdfc11291e5fd4b61dde662b1c), and it fails exactly the same way.
Comment 8 Tina Yang 2009-01-07 12:27:12 UTC
We have a testcase that produces the same symptoms.  From the net trace, the urgent pointers only get updated periodically.  On closer examination, it seems that the ones that are not updated are produced by NIC when chopping the TSO into smaller segments.  So, if I turn off TSO on the NIC, the problem disappears.  This seems to point to the individual NIC fw/hw to be the source of the problem.
Comment 9 Alan 2012-05-14 16:39:59 UTC
Closing as obsolete please re-open if incorrect
Comment 10 Petr Tesarik 2012-05-16 13:42:41 UTC
Yes, this bug was originally fixed by
commit 33cf71cee14743185305c61625c4544885055733

and then for real with
commit f8269a495a1924f8b023532dd3e77423432db810

Note You need to log in before you can comment on or make changes to this bug.