Bug 61681

Summary:	Incoming TCP4 connections fail to start, don't get past SYN_RECV and then quickly disappear
Product:	Networking	Reporter:	Dave (dcrooke)
Component:	IPV4	Assignee:	Stephen Hemminger (stephen)
Status:	NEW ---
Severity:	normal	CC:	alan, dcrooke, eric.dumazet, nealcardwell, szg00000
Priority:	P1
Hardware:	IA-64
OS:	Linux
Kernel Version:	3.4.57	Subsystem:
Regression:	No	Bisected commit-id:

Description Dave 2013-09-19 16:41:56 UTC

This bug appears to be very rare, but entirely real, and it dates back a long time. I tried to debug it thoroughly looking at both kernel and webserver settings, and then got down to looking at netstat.

The Linux kernel can sometimes get into a state where it fails to complete approx 98% of incoming TCP connection attempts, and only correctly processes about 2%. These numbers may be relevant as others have posted finding the same "1 in 50" ratio on much older kernels over the years.

I did not get a chance to capture traffic with iptables / pcap / Wireshark (it was a production box, so we gave up quickly and tried a reboot), but other folks with the same issue indicate that Linux is sending the wrong remote sequence number back in the SYN-ACK packet, and the client simply drops it. In my experience the half-formed connection is torn down almost immediately; I was running netstat in a continuous loop to see this. Others have observed that their clients send RST in response to the malformed SYN-ACK.
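For anyone trying to catch this state without a packet capture, the netstat loop can be approximated by counting sockets in SYN_RECV per local port. The following is only a minimal sketch: it assumes the usual /proc/net/tcp column layout (local address in the second column, hex state in the fourth, with 03 meaning SYN_RECV), and the function name `count_syn_recv` is made up for illustration.

```python
from collections import Counter

# State code for SYN_RECV as it appears (in hex) in the "st" column.
TCP_SYN_RECV = 0x03


def count_syn_recv(proc_net_tcp_text):
    """Return a Counter mapping local port -> number of SYN_RECV sockets.

    The argument is the text content of /proc/net/tcp (assumed layout).
    """
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or truncated lines
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        state = int(fields[3], 16)
        if state == TCP_SYN_RECV:
            counts[local_port] += 1
    return counts


# Tiny hand-written sample in the assumed format, not real captured data.
sample = (
    "  sl  local_address rem_address   st\n"
    "   0: 00000000:0050 00000000:0000 0A\n"
    "   1: 0A000001:0050 C0A80001:D431 03\n"
    "   2: 0A000001:0050 C0A80001:D432 01\n"
)
print(count_syn_recv(sample))
```

On a live box the same function could be fed the contents of /proc/net/tcp once a second; a count on port 80 that keeps appearing and vanishing without matching established connections would match the behaviour described above.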

http://serverfault.com/questions/297134/server-not-sending-a-syn-ack-packet-in-response-to-a-syn-packet

http://ask.wireshark.org/questions/23885/rst-after-syn-ack

For us, the problem went away on a reboot and so far has stayed away, so I am wondering if it is a factor of cumulative traffic. However, TCP sequence number wraparound on the Linux end shouldn't cause this afaict; the kernel should simply be replying to the client with the sequence number that came in the SYN packet.
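To make the expectation concrete: per the TCP specification, the acknowledgment number in a SYN-ACK is the client's initial sequence number plus one, modulo 2^32, and a client in SYN-SENT answers an unacceptable ACK with a RST. The sketch below only illustrates that arithmetic; the function names are hypothetical and it is not kernel code.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def expected_synack_ack(client_isn):
    """Acknowledgment number a correct SYN-ACK carries for a given client ISN."""
    return (client_isn + 1) % SEQ_SPACE


def client_accepts_synack(client_isn, synack_ack):
    """True if the SYN-ACK acknowledges exactly the client's SYN."""
    return synack_ack == expected_synack_ack(client_isn)


# Wraparound of the client's sequence number is handled by the modulo.
print(expected_synack_ack(0xFFFFFFFF))
# A stale acknowledgment number (e.g. from an earlier connection) is rejected.
print(client_accepts_synack(1000, 5000))
```

Because the value depends only on the incoming SYN, wraparound on the server side should not matter, which is consistent with the reasoning above.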

A number of people have had very similar looking issues due to broken multi-path network config or a broken NAT device. Obviously this is not the case here: Amazon knows how to do IT, this box only has one interface, and in any case the Linux kernel is still responsible for the sequence number it replies with.

Comment 1 Eric Dumazet 2013-09-20 13:39:28 UTC

Really, Linux just copies the sequence number received in the SYN message back to the SYNACK message. No wraparound issue is involved here.

Make sure the server really sends a SYNACK message. It might drop your SYN packet for valid reasons. nstat should help to understand why.

netfilter / TCP conntrack might be the problem. Are you using it?
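As a rough illustration of the nstat suggestion: the counters it reports can also be read from /proc/net/netstat, which (as far as I know) consists of pairs of lines, a header line of names followed by a line of values, each starting with a prefix such as `TcpExt:`. The sketch below parses that layout; the function name is hypothetical and the sample numbers are invented, not taken from the affected server.

```python
def parse_proc_netstat(text):
    """Parse /proc/net/netstat-style text into {prefix: {counter: value}}."""
    tables = {}
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Lines come in pairs: names first, then the matching values.
    for header, values in zip(lines[0::2], lines[1::2]):
        prefix, names = header.split(":", 1)
        value_prefix, numbers = values.split(":", 1)
        if prefix != value_prefix:
            continue  # unexpected layout, skip this pair
        tables[prefix] = dict(
            zip(names.split(), (int(n) for n in numbers.split()))
        )
    return tables


# Invented sample in the assumed format.
sample = (
    "TcpExt: SyncookiesSent ListenOverflows ListenDrops\n"
    "TcpExt: 0 12 12\n"
)
counters = parse_proc_netstat(sample)["TcpExt"]
print(counters["ListenOverflows"], counters["ListenDrops"])
```

Counters such as ListenOverflows and ListenDrops rising while connections fail would point to SYNs being dropped at the listener rather than to a malformed SYN-ACK.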

Comment 2 Dave 2013-09-20 16:30:40 UTC

Hi Eric, thanks for the quick reply.

I have no way to reproduce the problem, but it's definitely not firewall related and we are not using any of the filters you mention. iptables is blank. Port 80 is not firewalled by Amazon.

Some people have reported a malformed SYN-ACK when a NAT device uses its own (single) IP on the inside to talk to the Linux server, but this is apparently because the NAT has to quickly recycle source port numbers when using a single IP. Linux will apparently return the ACK sequence number for the previous connection, which is understandable.

Amazon EC2 only NATs the server VM IP; the external Internet IP and port from the upstream client are passed in to us as the source. Thus it seems unlikely in this case that the problem is due to port number re-use.

The traffic level on the server was low when the problem occurred, perhaps 10 requests per second at the most, and the server had plenty of file descriptors, Apache children, RAM, CPU, etc. It had been up for a few months, and no config changes had been made to the OS or Apache.

I can't figure it out, but hopefully it won't recur :)

Comment 3 Neal Cardwell 2013-09-20 23:57:24 UTC

From logs, can you quantify exactly how long the machine was up when this problem happened?

Interesting bugs can happen at 24 days and 49 days, due to 32-bit millisecond-based jiffies values flipping sign, wrapping around, overflowing, etc.
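The 24-day and 49-day figures follow from simple arithmetic on a 32-bit counter that ticks once per millisecond. The sketch below assumes a tick rate of 1000 per second (i.e. HZ=1000, which the report does not confirm for this kernel); the helper name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def days_to_reach(ticks, hz=1000):
    """Days of uptime needed for a counter ticking `hz` times/second to reach `ticks`."""
    return ticks / hz / SECONDS_PER_DAY


# A signed 32-bit view of the counter flips sign at 2**31 ticks.
sign_flip_days = days_to_reach(2 ** 31)
# The unsigned 32-bit counter wraps to zero at 2**32 ticks.
wrap_days = days_to_reach(2 ** 32)

print(round(sign_flip_days, 2))  # 24.86
print(round(wrap_days, 2))       # 49.71
```

With a different tick rate the thresholds move proportionally, so the uptime at which the problem appeared is only meaningful once the kernel's HZ value is known.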

Comment 4 Dave 2013-09-23 15:52:40 UTC

Unfortunately, the logs had rolled, but I found this (not my systems so I am not super familiar):

[xx@xxxx log]$ stat dmesg.old
  File: `dmesg.old'
  Size: 10320     	Blocks: 24         IO Block: 4096   regular file
Device: ca01h/51713d	Inode: 1380        Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-06-17 14:22:46.495339112 -0500
Modify: 2013-06-17 14:22:46.499339112 -0500
Change: 2013-09-18 11:17:58.625811219 -0500
[xx@xxxx log]$ stat dmesg
  File: `dmesg'
  Size: 10320     	Blocks: 24         IO Block: 4096   regular file
Device: ca01h/51713d	Inode: 17          Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-09-18 11:17:58.625811219 -0500
Modify: 2013-09-18 11:17:58.649811219 -0500
Change: 2013-09-18 11:17:58.649811219 -0500
[xx@xxxx log]$ 

I make this just under 93 days.
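As a cross-check, the interval between the two Modify timestamps above can be computed directly. This sketch assumes that each dmesg file is written at boot, so its modification time approximates the boot time; both timestamps carry the same -0500 offset, so the offset and the fractional seconds are dropped.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Modify time of dmesg.old (previous boot) and of dmesg (the reboot).
previous_boot = datetime.strptime("2013-06-17 14:22:46", FMT)
reboot = datetime.strptime("2013-09-18 11:17:58", FMT)

uptime = reboot - previous_boot
hours, remainder = divmod(uptime.seconds, 3600)
minutes = remainder // 60

print(f"{uptime.days} days, {hours} hours, {minutes} minutes")
```

This works out to 92 days plus about 21 hours between boots, which is an upper bound on the uptime at the moment the problem appeared.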