Created attachment 72059
An IPv4 client behind a link with an MTU of 1259 downloading a file from an IPv6 server

When RTAX_FEATURE_ALLFRAG is set on a route, the effective TCP segment size does not take into account the size of the IPv6 Fragmentation header that needs to be included in outbound packets. As a result, every transmitted TCP segment is fragmented across two IPv6 packets, the latter of which contains only 8 bytes of actual payload.

RTAX_FEATURE_ALLFRAG is typically set on a route in response to receiving an ICMPv6 Packet Too Big message indicating a Path MTU of less than 1280 bytes. 1280 bytes is the minimum IPv6 MTU; however, ICMPv6 PTBs with MTU < 1280 are still valid, in particular when an IPv6 packet is sent to an IPv4 destination through a stateless translator. Any ICMPv4 Need To Fragment packets originating from the IPv4 part of the path will be translated to ICMPv6 PTBs, which may then indicate an MTU of less than 1280.

RFC 2460 section 5 specifies what an IPv6 stack should do when this happens:

> In response to an IPv6 packet that is sent to an IPv4 destination
> (i.e., a packet that undergoes translation from IPv6 to IPv4), the
> originating IPv6 node may receive an ICMP Packet Too Big message
> reporting a Next-Hop MTU less than 1280. In that case, the IPv6 node
> is not required to reduce the size of subsequent packets to less than
> 1280, but must include a Fragment header in those packets so that the
> IPv6-to-IPv4 translating router can obtain a suitable Identification
> value to use in resulting IPv4 fragments. Note that this means the
> payload may have to be reduced to 1232 octets (1280 minus 40 for the
> IPv6 header and 8 for the Fragment header), and smaller still if
> additional extension headers are used.

The Linux kernel refuses to reduce the effective MTU to anything below 1280 bytes; instead it sets it to exactly 1280 bytes and also sets RTAX_FEATURE_ALLFRAG.
However, the TCP segment size appears to be set to 1240 bytes (1280 Path MTU - 40 bytes of IPv6 header) instead of 1232 (additionally taking into account the 8 bytes required by the IPv6 Fragmentation extension header). This in turn results in rather inefficient transmission, as every transmitted TCP segment is now split into two fragments containing 1232+8 bytes of payload.

I am attaching a tcpdump that shows this happening. In this case, 2a02:c0::46:0:57ee:3d82 is an IPv6-only server running Linux 3.2.0, while 2a02:c0::46:0:57ee:2a2a really is 87.238.42.42, a NAT device with an IPv4 node behind it. The link between the NAT device and the IPv4 node has an MTU of 1259. Somewhere between the NAT device and the server there's a stateless IPv4/IPv6 translator.

When the server sends its first full-sized (1500 bytes) packets, the NAT device responds with an ICMPv4 Need To Fragment (MTU=1259), which is then received by the server in its translated form (ICMPv6 PTB, MTU 1279). After that, a large number of these mini-fragments containing only 8 bytes of payload are transmitted. They should have been avoided.

Tore
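For illustration only, the arithmetic in the report above can be sketched in Python. This is not the kernel code: the function names and the `account_for_frag_header` flag are hypothetical, and the sketch assumes no extension headers other than the Fragment header. "Payload" here means everything after the IPv6 header (TCP header plus data), matching the 1240 and 1232 figures in the report.

```python
IPV6_MIN_MTU = 1280     # minimum IPv6 link MTU
IPV6_HEADER_LEN = 40    # fixed IPv6 header
FRAG_HEADER_LEN = 8     # IPv6 Fragment extension header


def translated_mtu(ipv4_mtu):
    """MTU as seen after ICMPv4 -> ICMPv6 translation.

    The IPv6 header is 20 bytes larger than the basic IPv4 header,
    which is why MTU=1259 shows up as MTU 1279 in the capture.
    """
    return ipv4_mtu + 20


def effective_mtu(reported_mtu):
    """Return (mtu, allfrag) as described in the report (hypothetical helper).

    The MTU is never lowered below 1280; instead the "always add a
    Fragment header" flag is set.
    """
    if reported_mtu < IPV6_MIN_MTU:
        return IPV6_MIN_MTU, True
    return reported_mtu, False


def payload_per_packet(mtu, allfrag, account_for_frag_header):
    """Bytes available after the IPv6 header for one unfragmented packet."""
    size = mtu - IPV6_HEADER_LEN
    if allfrag and account_for_frag_header:
        size -= FRAG_HEADER_LEN
    return size


def fragment_sizes(payload_len, mtu):
    """Payload bytes carried by each fragment when every packet must
    include a Fragment header. Non-final fragments are multiples of 8."""
    max_frag = (mtu - IPV6_HEADER_LEN - FRAG_HEADER_LEN) // 8 * 8
    sizes = []
    remaining = payload_len
    while remaining > max_frag:
        sizes.append(max_frag)
        remaining -= max_frag
    sizes.append(remaining)
    return sizes


mtu, allfrag = effective_mtu(translated_mtu(1259))

# Reported behaviour: Fragment header ignored when sizing segments.
observed = payload_per_packet(mtu, allfrag, account_for_frag_header=False)
print(observed, fragment_sizes(observed, mtu))

# Expected behaviour: Fragment header accounted for.
expected = payload_per_packet(mtu, allfrag, account_for_frag_header=True)
print(expected, fragment_sizes(expected, mtu))
```

Under these assumptions, sizing to 1240 bytes yields a 1232-byte fragment followed by an 8-byte one, while sizing to 1232 bytes fits in a single packet that carries a Fragment header but is not actually split.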
Created attachment 72106
tcpdump taken at the Linux server's IPv6-only interface (with proposed patch)

This is a tcpdump of the same download taken with the patch from http://thread.gmane.org/gmane.linux.network/217998/focus=218021 applied. It seems to work fine to me: there are no "mini-fragments" any more, and all the packets that include a Fragmentation header are in fact *not* fragmented.

Tore
A patch referencing this bug report has been merged in Linux v3.5-rc1:

commit 67469601406c12ced3db9956aeb0ef0854e2952f
Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Apr 24 07:37:38 2012 +0000

    ipv6: RTAX_FEATURE_ALLFRAG causes inefficient TCP segment sizing