Bug 42572

Summary: RTAX_FEATURE_ALLFRAG causes inefficient TCP segment sizing
Product: Networking
Reporter: Tore Anderson (tore)
Component: IPV6
Assignee: Hideaki YOSHIFUJI (yoshfuji)
Status: RESOLVED CODE_FIX
Severity: normal
CC: alan, florian
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.2.0
Subsystem:
Regression: No
Bisected commit-id:
Attachments: An IPv4 client behind a link with a MTU of 1259 downloading a file from an IPv6 server
tcpdump taken at the Linux server's IPv6-only interface (with proposed patch)

Description Tore Anderson 2012-01-12 17:32:51 UTC
Created attachment 72059 [details]
An IPv4 client behind a link with a MTU of 1259 downloading a file from an IPv6 server

When RTAX_FEATURE_ALLFRAG is set on a route, the effective TCP segment
size does not take into account the size of the IPv6 Fragmentation
header that needs to be included in outbound packets, causing every
transmitted TCP segment to be fragmented across two IPv6 packets, the
latter of which will only contain 8 bytes of actual payload.

RTAX_FEATURE_ALLFRAG is typically set on a route in response to
receiving an ICMPv6 Packet Too Big message indicating a Path MTU of less
than 1280 bytes. 1280 bytes is the minimum IPv6 MTU; however, ICMPv6
PTBs with MTU < 1280 are still valid, in particular when an IPv6
packet is sent to an IPv4 destination through a stateless translator.
Any ICMPv4 Need To Fragment packets originating from the IPv4 part of
the path will be translated to ICMPv6 PTBs, which may then indicate an
MTU of less than 1280.

RFC 2460 section 5 specifies what an IPv6 stack should do when this
happens:

> In response to an IPv6 packet that is sent to an IPv4 destination
> (i.e., a packet that undergoes translation from IPv6 to IPv4), the
> originating IPv6 node may receive an ICMP Packet Too Big message
> reporting a Next-Hop MTU less than 1280.  In that case, the IPv6 node
> is not required to reduce the size of subsequent packets to less than
> 1280, but must include a Fragment header in those packets so that the
> IPv6-to-IPv4 translating router can obtain a suitable Identification
> value to use in resulting IPv4 fragments.  Note that this means the
> payload may have to be reduced to 1232 octets (1280 minus 40 for the
> IPv6 header and 8 for the Fragment header), and smaller still if
> additional extension headers are used.

The Linux kernel refuses to reduce the effective MTU to anything below
1280 bytes; instead, it sets it to exactly 1280 bytes and also sets
RTAX_FEATURE_ALLFRAG. However, the TCP segment size appears
to be set to 1240 bytes (1280 Path MTU - 40 bytes of IPv6 header),
instead of 1232 (additionally taking into account the 8 bytes required
by the IPv6 Fragmentation extension header).
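
The behaviour described above can be modelled roughly as follows. This
is only an illustrative sketch, not kernel code: RouteState,
handle_packet_too_big() and segment_size() are hypothetical names made
up for this example, and the account_for_fragment_header flag simply
switches between the observed (1240) and the expected (1232) result.

```python
from dataclasses import dataclass

IPV6_MIN_MTU = 1280        # minimum IPv6 link MTU
IPV6_HEADER_LEN = 40       # fixed IPv6 header
FRAGMENT_HEADER_LEN = 8    # IPv6 Fragment extension header


@dataclass
class RouteState:
    """Hypothetical stand-in for the per-route PMTU state."""
    mtu: int
    allfrag: bool = False  # models RTAX_FEATURE_ALLFRAG


def handle_packet_too_big(route: RouteState, reported_mtu: int) -> None:
    """Simplified model of how a PTB updates the route."""
    if reported_mtu < IPV6_MIN_MTU:
        # Never go below 1280; add a Fragment header to every packet instead.
        route.mtu = IPV6_MIN_MTU
        route.allfrag = True
    else:
        route.mtu = min(route.mtu, reported_mtu)


def segment_size(route: RouteState, account_for_fragment_header: bool) -> int:
    """Bytes left for the TCP header plus data in one IPv6 packet."""
    size = route.mtu - IPV6_HEADER_LEN
    if route.allfrag and account_for_fragment_header:
        size -= FRAGMENT_HEADER_LEN
    return size


route = RouteState(mtu=1500)
handle_packet_too_big(route, 1279)
print(segment_size(route, account_for_fragment_header=False))  # observed
print(segment_size(route, account_for_fragment_header=True))   # expected
```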

This in turn results in rather inefficient transmission, as every
transmitted TCP segment is now split into two fragments containing
1232 and 8 bytes of payload, respectively.
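
To make the 1232+8 split concrete, here is a small sketch of how a
1240-byte segment gets cut up once every packet has to carry a
Fragment header. fragment_payloads() is a hypothetical helper written
for this report; it assumes no extension headers other than the
Fragment header and only models the sizes, not the actual packets.

```python
IPV6_MIN_MTU = 1280        # minimum IPv6 link MTU
IPV6_HEADER_LEN = 40       # fixed IPv6 header
FRAGMENT_HEADER_LEN = 8    # IPv6 Fragment extension header


def fragment_payloads(l4_len: int, mtu: int = IPV6_MIN_MTU) -> list[int]:
    """Return the payload size carried by each fragment.

    l4_len is the fragmentable part (TCP header plus data).
    """
    room = mtu - IPV6_HEADER_LEN - FRAGMENT_HEADER_LEN
    # Every fragment except the last must carry a multiple of 8 bytes.
    room -= room % 8
    sizes = []
    remaining = l4_len
    while remaining > room:
        sizes.append(room)
        remaining -= room
    sizes.append(remaining)
    return sizes


print(fragment_payloads(1240))  # segment sized without the Fragment header
print(fragment_payloads(1232))  # segment sized with the Fragment header
```

With a 1232-byte segment the packet still carries a Fragment header,
but it fits in a single packet and no second fragment is needed.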

I am attaching a tcpdump that shows this happening. In this case,
2a02:c0::46:0:57ee:3d82 is an IPv6-only server running Linux 3.2.0,
while 2a02:c0::46:0:57ee:2a2a really is 87.238.42.42, a NAT device with
an IPv4 node behind it. The link between the NAT device and the IPv4
node has an MTU of 1259. Somewhere between the NAT device and the server
there's a stateless IPv4/IPv6 translator. When the server sends its
first full-sized (1500 bytes) packets, the NAT device responds with
an ICMPv4 Need To Fragment (MTU=1259), which is then received by the
server in its translated form (ICMPv6 PTB, MTU 1279). After that, a
large number of these mini-fragments containing only 8 bytes of
payload are transmitted. They should have been avoided.
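
The jump from MTU 1259 to MTU 1279 comes from the translator
compensating for the 20-byte difference between the IPv4 and IPv6
header sizes. A minimal sketch of that arithmetic follows;
translate_ptb_mtu() is a hypothetical name, it assumes an IPv4 header
without options, and it leaves out any clamping to the translator's own
next-hop MTUs that a real translator may also apply.

```python
IPV4_HEADER_LEN = 20   # IPv4 header without options (assumption)
IPV6_HEADER_LEN = 40   # fixed IPv6 header
IPV6_MIN_MTU = 1280    # minimum IPv6 link MTU


def translate_ptb_mtu(icmpv4_mtu: int) -> int:
    """MTU reported in the ICMPv6 PTB for a given ICMPv4 Need To Fragment MTU."""
    return icmpv4_mtu + (IPV6_HEADER_LEN - IPV4_HEADER_LEN)


translated = translate_ptb_mtu(1259)
print(translated)                 # MTU seen by the IPv6 server
print(translated < IPV6_MIN_MTU)  # below 1280, so ALLFRAG handling applies
```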

Tore
Comment 1 Tore Anderson 2012-01-18 12:40:59 UTC
Created attachment 72106 [details]
tcpdump taken at the Linux server's IPv6-only interface (with proposed patch)

This is a tcpdump of the same download taken with the patch from http://thread.gmane.org/gmane.linux.network/217998/focus=218021 applied. It seems to work fine to me: there are no "mini-fragments" any more, and all the packets that include a Fragmentation header are in fact *not* fragmented.

Tore
Comment 2 Florian Mickler 2012-07-01 09:43:36 UTC
A patch referencing this bug report has been merged in Linux v3.5-rc1:

commit 67469601406c12ced3db9956aeb0ef0854e2952f
Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Apr 24 07:37:38 2012 +0000

    ipv6: RTAX_FEATURE_ALLFRAG causes inefficient TCP segment sizing