Latest working kernel version: ??? (maybe never) Earliest failing kernel version: 2.6.27 Distribution: kernel.org Hardware Environment: x86, x86_64 Software Environment: Oracle, testcase Problem Description: When urgent mode is initiated using send(..., MSG_OOB), and the current outgoing queue is longer than MTU, several packets are sent out with the URG flag set. However, the Urgent Pointer field in the TCP header is not updated correctly. Instead, it stays the same. The receiving end interprets this as a new urgent message inserted further in the stream, sends multiple SIGURGs to the server application and reads across the urgent data. This behaviour is correct. Steps to reproduce: 1. Initiate a TCP connection 2. Fill the outgoing queue with some (normal) data 3. Send an urgent message (with MSG_OOB) 4. Send some more non-urgent data.
Created attachment 18817 [details] strace of a test client I don't have permission to publish the test case here, but I'm attaching an strace, at least. It follows closely the steps to reproduce from the initial comment.
Created attachment 18818 [details] tcpdump from the test case run It is not easy to capture a tcpdump from the same run as the strace, as the strace adds overhead to the client and so the output queue usually drains too quickly. When I created some load on the network to slow it down and used a fast machine against an old and slow machine, it worked out. So, this is the network dump corresponding exactly to the strace posted above. Look at frames 138. It has Urgent Pointer set to 4097. This segment itself moves 1448 bytes forward in the stream, but the following frame (139) still has Urgent Pointer set to 4097. The bug can also be demonstrated on the TCP sequence number. If there is only one urgent message, obviously Sequence Number + Urgent Pointer should stay constant in all segments up to the urgent message. However in frame 138 it is 126977+4097 = 131074, in frame 139 it is 128425+4097 = 132522. Interestingly in the following frame 140, it is again 129873+1201 = 131074, so the Urgent Pointer does get updated sometimes...
The bug is triggered when a packet is broken into more pieces to fit into the MTU. If I understand the problem correctly, the Linux TCP/IP stack happily constructs oversize TCP packets, which are then transformed into smaller (but not fragmented) TCP packets. At least a netdump on the client side shows me: 14:16:21.532735 IP (tos 0x0, ttl 64, id 26991, offset 0, flags [DF], proto TCP (6), length 4396) 10.10.4.188.50387 > 10.10.4.140.58699: . 257745:262089(4344) ack 1 win 1460 urg 4401 <nop,nop,timestamp 307239942 327183599> Which is seen on the server side as 3 packets: 14:16:21.537512 IP (tos 0x0, ttl 64, id 26991, offset 0, flags [DF], proto TCP (6), length 1500) 10.10.4.188.50387 > 10.10.4.140.58699: ., cksum 0xecba (correct), 257745:259193(1448) ack 1 win 1460 urg 4401 <nop,nop,timestamp 307239942 327183599> 14:16:21.537644 IP (tos 0x0, ttl 64, id 26992, offset 0, flags [DF], proto TCP (6), length 1500) 10.10.4.188.50387 > 10.10.4.140.58699: ., cksum 0xe712 (correct), 259193:260641(1448) ack 1 win 1460 urg 4401 <nop,nop,timestamp 307239942 327183599> 14:16:21.537767 IP (tos 0x0, ttl 64, id 26993, offset 0, flags [DF], proto TCP (6), length 1500) 10.10.4.188.50387 > 10.10.4.140.58699: ., cksum 0xe16a (correct), 260641:262089(1448) ack 1 win 1460 urg 4401 <nop,nop,timestamp 307239942 327183599> Note that urg is 4401 in all three packets.
I'm now quite certain that the bug is in tcp_fragment() in net/ipv4/tcp_output.c. The code corrects the sequence numbers, but it doesn't touch TCP_SKB_CB(buff)->urg_ptr, which seems wrong to me. I don't understand the code well enough to produce a patch (what happens if urg_ptr points before the new skb's start?), but it shouldn't be too hard with this analysis, should it?
On Wed, 12 Nov 2008, Stephen Hemminger wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=12014 > > Summary: Incorrect Urgent Pointer in outgoing packets > Product: Networking > Version: 2.5 > KernelVersion: 2.6.27 > Platform: All > OS/Version: Linux > Tree: Mainline > Status: NEW > Severity: normal > Priority: P1 > Component: IPV4 > AssignedTo: shemminger@linux-foundation.org > ReportedBy: kernel@tesarici.cz > > > Latest working kernel version: ??? (maybe never) > Earliest failing kernel version: 2.6.27 > Distribution: kernel.org > Hardware Environment: x86, x86_64 > Software Environment: Oracle, testcase > Problem Description: > > When urgent mode is initiated using send(..., MSG_OOB), and the current > outgoing queue is longer than MTU, several packets are sent out with the URG > flag set. This urg flag in many segments is expected to happen (the urg pointer might not point to the current segment but something that will follow)... > However, the Urgent Pointer field in the TCP header is not updated > correctly. Instead, it stays the same. ...but I've hard time believing that this true since the the only place where urg_ptr is assigned is in tcp_output.c and that's done relative to segments seqno: th->urg_ptr = htons(tp->snd_up - tcb->seq); So if what you say is true, it means that snd_up should advance equally to the segments seqno. Which means that writes would update it. > The receiving end interprets this as a new urgent message inserted further in > the stream, sends multiple SIGURGs to the server application and reads across > the urgent data. This behaviour is correct. > > Steps to reproduce: > > 1. Initiate a TCP connection > 2. Fill the outgoing queue with some (normal) data > 3. Send an urgent message (with MSG_OOB) > 4. Send some more non-urgent data. Can you prove that your application is doing what you describe, I suspect that you e.g., forget to clear MSG_OOB... Can you strace the program please... Getting a tcpdump at the same time wouldn't hurt either. I've tried with this exact sequence recently (though after the rewrite of urg handling for 2.6.28), and did get a nicely working urg after fixing the bug that got introduced by the rewrite and just a single sigurg. It seems that I've lost that tcpdump somewhere though. I was using sendfile though for the normal data while you might not be doing the same.
(In reply to comment #5) > > When urgent mode is initiated using send(..., MSG_OOB), and the current > > outgoing queue is longer than MTU, several packets are sent out with the > URG > > flag set. > > This urg flag in many segments is expected to happen (the urg pointer > might not point to the current segment but something that will follow)... Correct. No question about that. > > However, the Urgent Pointer field in the TCP header is not updated > > correctly. Instead, it stays the same. > > ...but I've hard time believing that this true since the the only place > where urg_ptr is assigned is in tcp_output.c and that's done relative > to segments seqno: > > th->urg_ptr = htons(tp->snd_up - tcb->seq); > > So if what you say is true, it means that snd_up should advance equally > to the segments seqno. Which means that writes would update it. Yes, of course. But the resulting skb can later be broken up into several packets in tcp_fragment(). Please, read the code. > Can you prove that your application is doing what you describe, I suspect > that you e.g., forget to clear MSG_OOB... Can you strace the program > please... See my comment #1. Attachment #18817 [details]. > Getting a tcpdump at the same time wouldn't hurt either. Comment #2. Attachment #18818 [details]. > I've tried with this exact sequence recently (though after the rewrite of > urg handling for 2.6.28), and did get a nicely working urg after fixing > the bug that got introduced by the rewrite and just a single sigurg. It > seems that I've lost that tcpdump somewhere though. I was using sendfile > though for the normal data while you might not be doing the same. I'm using write() for the normal data. I'll ask the author of the test program for permission to publish it here, if that helps.
(In reply to comment #5) > I've tried with this exact sequence recently (though after the rewrite of > urg handling for 2.6.28), and did get a nicely working urg [...] I don't think this is a recent regression. I specifically observed exactly the same behaviour on 2.6.16, but that was a distro kernel, so I assumed you would have considered that piece of information worthless. Anyway, I tested with latest git (commit 4ffaf869c7780bbdfc11291e5fd4b61dde662b1c), and it fails exactly the same way.
We have a testcase that produces the same symptoms. From the net trace, the urgent pointers only get updated periodically. On closer examination, it seems that the ones that are not updated are produced by NIC when chopping the TSO into smaller segments. So, if I turn off TSO on the NIC, the problem disappears. This seems to point to the individual NIC fw/hw to be the source of the problem.
Closing as obsolete please re-open if incorrect
Yes, this bug was originally fixed by commit 33cf71cee14743185305c61625c4544885055733 and then for real with commit f8269a495a1924f8b023532dd3e77423432db810