Bug 42595 - TCP checksum gets incorrectly calculated on SYN/ACK packets to destinations with RTAX_FEATURE_ALLFRAG set
Status: RESOLVED CODE_FIX
Alias: None
Product: Networking
Classification: Unclassified
Component: IPV6
Hardware: All Linux
Importance: P1 normal
Assignee: Hideaki YOSHIFUJI
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-01-17 19:57 UTC by Tore Anderson
Modified: 2015-05-03 14:11 UTC
CC List: 2 users

See Also:
Kernel Version: 3.2.0
Subsystem:
Regression: No
Bisected commit-id:


Attachments
tcpdump taken at the Linux server's IPv6-only interface (410 bytes, application/vnd.tcpdump.pcap)
2012-01-17 19:58 UTC, Tore Anderson
tcpdump taken at the stateless translator's IPv6 and IPv4 interfaces (hardware is Cisco ASR1K) (708 bytes, application/vnd.tcpdump.pcap)
2012-01-17 19:58 UTC, Tore Anderson
Incorrect checksums for FIN/ACK packets (446.59 KB, application/vnd.tcpdump.pcap)
2012-01-17 22:15 UTC, Tore Anderson
Two downloads in a row from server with patched kernel (626.56 KB, application/vnd.tcpdump.pcap)
2012-01-18 14:18 UTC, Tore Anderson

Description Tore Anderson 2012-01-17 19:57:12 UTC
If the allfrag feature has been set on a host route (because an ICMPv6 Packet Too Big message was received indicating an MTU of less than 1280), TCP SYN/ACK packets to that destination appear to get an incorrect TCP checksum. As a result, they are thrown away as invalid.

In the case of an IPv4 client behind a link with an MTU of less than 1260 that accesses an IPv6 server through a stateless translator, this means that the client can download only a single large file from the server: once the client is in the server's routing cache with the allfrag feature set, new TCP connections can no longer be established.
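The allfrag behaviour described above can be sketched in C. This is a simplified model, not the kernel's code: the struct route_pmtu type and the functions translated_ptb_mtu() and apply_packet_too_big() are hypothetical names. The +20 adjustment assumes that the translator compensates for the 20-byte difference between the IPv6 and IPv4 header sizes (roughly as RFC 6145 describes), which is why an IPv4 link MTU below 1260 produces an ICMPv6 PTB below 1280.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IPV6_MIN_MTU   1280u /* minimum IPv6 link MTU (RFC 2460, section 5) */
#define V6_V4_HDR_DIFF 20u   /* 40-byte IPv6 header minus 20-byte IPv4 header */

/* Hypothetical model of a cached host route. */
struct route_pmtu {
    uint32_t mtu;     /* path MTU used for this destination */
    bool     allfrag; /* every packet must carry a Fragment header */
};

/* Assumption: the MTU a stateless translator reports in the ICMPv6 Packet
 * Too Big is the IPv4 next-hop MTU plus the header-size difference. */
static uint32_t translated_ptb_mtu(uint32_t ipv4_mtu)
{
    return ipv4_mtu + V6_V4_HDR_DIFF;
}

/* Apply a Packet Too Big to the route. A reported MTU below 1280 is not
 * used as-is: the route keeps 1280 and sets allfrag instead. */
static void apply_packet_too_big(struct route_pmtu *rt, uint32_t reported_mtu)
{
    assert(rt != NULL);
    if (reported_mtu < IPV6_MIN_MTU) {
        rt->mtu = IPV6_MIN_MTU;
        rt->allfrag = true;
    } else if (reported_mtu < rt->mtu) {
        rt->mtu = reported_mtu;
    }
}
```

For example, an IPv4 link MTU of 1200 yields a reported MTU of 1220, so in this model the route ends up with mtu 1280 and allfrag set, the same combination as the "mtu 1280 features 8" cache entry quoted in this report.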

See the attached PCAP files. At this point, the server's routing cache contained the following entry (created by an earlier HTTP download from the client, which resulted in an ICMPv6 PTB with an MTU lower than 1280):

2a02:c0::46:0:57ee:2a2a via 2a02:c0:200:102:ffff::1 dev eth0  metric 0 
    cache  expires 87sec mtu 1280 features 8

"features 8" indicates the allfrag feature, and 2a02:c0::46:0:57ee:2a2a is actually the IPv6 representation of the IPv4 client 87.238.42.42, which is behind an IPv4 link with a lower MTU than 1280.
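The "features 8" value is a bitmask of route feature flags. A minimal C sketch of decoding it follows. The bit values mirror the RTAX_FEATURE_* constants in the Linux <linux/rtnetlink.h> UAPI header, but they are redefined locally so the example compiles without kernel headers; the helpers route_has_allfrag() and describe_route_features() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Route feature bits, mirroring RTAX_FEATURE_* in <linux/rtnetlink.h>. */
#define RTAX_FEATURE_ECN       (1u << 0)
#define RTAX_FEATURE_SACK      (1u << 1)
#define RTAX_FEATURE_TIMESTAMP (1u << 2)
#define RTAX_FEATURE_ALLFRAG   (1u << 3)

/* True if the "features" value of a route has the allfrag bit set. */
static bool route_has_allfrag(unsigned features)
{
    return (features & RTAX_FEATURE_ALLFRAG) != 0;
}

/* Write the names of the set feature bits, space-separated, into buf.
 * Stops (without writing a partial name) when buf is too small.
 * Returns the length of the resulting string. */
static size_t describe_route_features(unsigned features, char *buf, size_t len)
{
    static const struct { unsigned bit; const char *name; } names[] = {
        { RTAX_FEATURE_ECN, "ecn" },
        { RTAX_FEATURE_SACK, "sack" },
        { RTAX_FEATURE_TIMESTAMP, "timestamp" },
        { RTAX_FEATURE_ALLFRAG, "allfrag" },
    };
    size_t used = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        if (!(features & names[i].bit))
            continue;
        size_t sep = used ? 1 : 0;
        size_t n = strlen(names[i].name);
        if (used + sep + n >= len)
            break; /* not enough room, including the terminating NUL */
        if (sep)
            buf[used++] = ' ';
        memcpy(buf + used, names[i].name, n);
        used += n;
        buf[used] = '\0';
    }
    return used;
}
```

With the cache entry above, describe_route_features(8, buf, sizeof buf) leaves "allfrag" in buf.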

The two pcaps show the problem both from the server's point of view (at-server.pcap) and from the stateless translator's point of view (at-translator.pcap). In the latter, you can see both the IPv4 and the IPv6 packets before and after IP version translation. Also note that RX checksum offloading was disabled on the server (as evidenced by the fact that the checksum of the outgoing TCP SYN/ACK packet is unchanged when it arrives at the translator).

Tore
Comment 1 Tore Anderson 2012-01-17 19:58:04 UTC
Created attachment 72096
tcpdump taken at the Linux server's IPv6-only interface
Comment 2 Tore Anderson 2012-01-17 19:58:57 UTC
Created attachment 72097
tcpdump taken at the stateless translator's IPv6 and IPv4 interfaces (hardware is Cisco ASR1K)
Comment 3 Eric Dumazet 2012-01-17 20:30:18 UTC
You wrote "Also note that RX checksum offloading was disabled on the server "

Did you mean  "TX checksum offloading was disabled" ?

Please provide "ethtool -k eth0" on your machine :)

Thanks
Comment 4 Eric Dumazet 2012-01-17 21:04:22 UTC
Seems to me checksums are OK :

tcpdump -n -v -r at-server.pcap
reading from file at-server.pcap, link-type EN10MB (Ethernet)

20:40:17.180463 IP6 (hlim 58, next-header TCP (6) payload length: 40) 2a02:c0::46:0:57ee:2a2a.46805 > 2a02:c0::46:0:57ee:3d82.80: Flags [S], cksum 0xc011 (correct), seq 3347429097, win 14600, options [mss 1460,sackOK,TS val 895918 ecr 0,nop,wscale 4], length 0

20:40:17.180516 IP6 (hlim 64, next-header Fragment (44) payload length: 48) 2a02:c0::46:0:57ee:3d82 > 2a02:c0::46:0:57ee:2a2a: frag (0x99f4b014:0|40) 80 > 46805: Flags [S.], seq 1790467254, ack 3347429098, win 12080, options [mss 1220,sackOK,TS val 113331996 ecr 895918,nop,wscale 7], length 0

20:40:17.181070 IP6 (class 0xc0, hlim 59, next-header ICMPv6 (58) payload length: 88) 2a02:c0::46:0:57ee:2a2a > 2a02:c0::46:0:57ee:3d82: [icmp6 sum ok] ICMP6, destination unreachable, length 88,  unreachable prohibited 2a02:c0::46:0:57ee:2a2a

So maybe it's a NIC bug?
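The second packet in this trace carries its whole TCP segment behind an IPv6 Fragment header with offset 0 (tcpdump prints it as frag (0x99f4b014:0|40), i.e. identification:offset|length), which is what the allfrag feature is expected to produce; such a packet is often called an atomic fragment. Its payload length of 48 is the 8-byte Fragment header plus the 40-byte TCP segment. The C sketch below parses that header layout (RFC 2460, section 4.5); struct frag_hdr_info, parse_frag_hdr() and is_atomic_fragment() are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded IPv6 Fragment header (RFC 2460, section 4.5). */
struct frag_hdr_info {
    uint8_t  next_header; /* protocol of the fragmentable part, e.g. 6 = TCP */
    uint16_t offset;      /* fragment offset in bytes (wire field is in 8-byte units) */
    bool     more;        /* M flag: more fragments follow */
    uint32_t ident;       /* fragment identification */
};

/* Parse the 8-byte Fragment header at p (network byte order):
 * next header, reserved, 13-bit offset + 2 reserved bits + M flag, ident. */
static struct frag_hdr_info parse_frag_hdr(const uint8_t p[8])
{
    struct frag_hdr_info f;
    uint16_t off_flags = (uint16_t)((p[2] << 8) | p[3]);

    f.next_header = p[0];
    f.offset = (uint16_t)(off_flags & 0xFFF8u); /* (off_flags >> 3) * 8 */
    f.more = (off_flags & 0x1u) != 0;
    f.ident = ((uint32_t)p[4] << 24) | ((uint32_t)p[5] << 16) |
              ((uint32_t)p[6] << 8) | (uint32_t)p[7];
    return f;
}

/* Offset 0 and no more fragments: the whole packet is one "fragment". */
static bool is_atomic_fragment(const struct frag_hdr_info *f)
{
    return f->offset == 0 && !f->more;
}
```

Assuming a zero reserved byte and a clear M flag, this packet's Fragment header would be encoded as 06 00 00 00 99 f4 b0 14; parse_frag_hdr() then reports next header 6 (TCP) and identification 0x99f4b014, and is_atomic_fragment() returns true.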
Comment 5 Tore Anderson 2012-01-17 22:07:42 UTC
(In reply to comment #3)
> You wrote "Also note that RX checksum offloading was disabled on the server "
> 
> Did you mean  "TX checksum offloading was disabled" ?

Yes, correct. My apologies.

> Please provide "ethtool -k eth0" on your machine :)

Offload parameters for eth0:
rx-checksumming: off
tx-checksumming: off
scatter-gather: off
tcp-segmentation-offload: off
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: off
tx-vlan-offload: off
ntuple-filters: off
receive-hashing: off
root@v6only:~# 

(In reply to comment #4)
> Seems to me checksums are OK :

> 20:40:17.180516 IP6 (hlim 64, next-header Fragment (44) payload length: 48) 2a02:c0::46:0:57ee:3d82 > 2a02:c0::46:0:57ee:2a2a: frag (0x99f4b014:0|40) 80 > 46805: Flags [S.], seq 1790467254, ack 3347429098, win 12080, options [mss 1220,sackOK,TS val 113331996 ecr 895918,nop,wscale 7], length 0

I can't see the checksum being reported as correct here, or even being checked at all?

If I load the capture in Wireshark, it reports an incorrect checksum for this packet. (The other two packets did not originate from the Linux server and do not have any checksum problems as far as I can tell.)
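Wireshark's check follows the IPv6 rule in RFC 2460, section 8.1: the TCP checksum covers a pseudo-header made of the source address, destination address, upper-layer packet length and next-header value 6, followed by the TCP segment. When a Fragment header is present, as in this SYN/ACK, the pseudo-header still uses the TCP length (40) and next header 6, not the IPv6 Payload Length (48) or the Fragment header's type (44). The plain-C sketch below illustrates that calculation; tcp6_checksum() and tcp6_checksum_ok() are hypothetical names, this is not the kernel's implementation, and it does not by itself identify the cause of this bug.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add len bytes to a 32-bit one's-complement accumulator as big-endian
 * 16-bit words (an odd trailing byte is padded with zero). A 32-bit
 * accumulator is enough for segments up to 64 KiB. */
static uint32_t csum_add_bytes(uint32_t sum, const uint8_t *p, size_t len)
{
    while (len > 1) {
        sum += (uint32_t)((p[0] << 8) | p[1]);
        p += 2;
        len -= 2;
    }
    if (len)
        sum += (uint32_t)(p[0] << 8);
    return sum;
}

/* Fold carries back into 16 bits and take the one's complement. */
static uint16_t csum_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}

/* TCP-over-IPv6 checksum. To compute it, the segment's checksum field
 * (bytes 16-17) must be zero; seg_len is the TCP length only. */
static uint16_t tcp6_checksum(const uint8_t src[16], const uint8_t dst[16],
                              const uint8_t *seg, size_t seg_len)
{
    const uint8_t pseudo_tail[8] = {
        (uint8_t)(seg_len >> 24), (uint8_t)(seg_len >> 16),
        (uint8_t)(seg_len >> 8),  (uint8_t)seg_len,
        0, 0, 0, 6 /* next header: TCP, even if a Fragment header precedes it */
    };
    uint32_t sum = 0;

    sum = csum_add_bytes(sum, src, 16);
    sum = csum_add_bytes(sum, dst, 16);
    sum = csum_add_bytes(sum, pseudo_tail, sizeof pseudo_tail);
    sum = csum_add_bytes(sum, seg, seg_len);
    return csum_fold(sum);
}

/* A received segment verifies when summing it with its checksum in place
 * yields 0. */
static int tcp6_checksum_ok(const uint8_t src[16], const uint8_t dst[16],
                            const uint8_t *seg, size_t seg_len)
{
    return tcp6_checksum(src, dst, seg, seg_len) == 0;
}
```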

Tore
Comment 6 Tore Anderson 2012-01-17 22:15:43 UTC
Created attachment 72098
Incorrect checksums for FIN/ACK packets

This PCAP also shows the incorrect TCP checksum problem occurring for FIN/ACK packets (frames 4012, 4014, 4016, 4018, 4020). So the problem is apparently not limited to SYN/ACK packets.

The capture shows a normal download that starts while the server has no routing cache entry (and thus no allfrag feature set) for the client. The TCP handshake works well, and you can then see PMTUD taking place, followed by lots of the mini-fragments described in bug #42572, before the incorrect TCP checksum problem appears in the FIN/ACK packets that are transmitted and retransmitted after the download has finished and the connection is being closed.

(Any subsequent connection attempt while the routing cache entry remains present on the server at this point will result in the incorrect TCP checksums during the SYN/ACK part of the TCP handshake, preventing it from succeeding.)

Tore
Comment 7 Tore Anderson 2012-01-18 14:18:37 UTC
Created attachment 72107
Two downloads in a row from server with patched kernel

This tcpdump was taken after patching the kernel with these two patches:

http://thread.gmane.org/gmane.linux.network/217998/focus=218021
http://thread.gmane.org/gmane.linux.network/217998/focus=218071

The first flow triggers Path MTU Discovery, as expected. Unlike earlier, the FIN/ACK packet sent by the server appears to have a correct TCP checksum (frame 2334).

Furthermore, the second flow (which starts while the allfrag-enabled route is present in the routing cache) also has a correct TCP checksum (frame 2337).

So now I can download the file over and over and over again with no problems, without having to flush the routing cache on the server between each attempt.

In other words - this looks very good! :-)
Comment 8 xerofoify 2014-06-26 04:44:15 UTC
Please close this bug; after reading its log, it seems to be fixed.
Cheers Nick
