Bug 1148

Summary: When using IPsec over DSL connections, communication hangs if the MTU is 1500 (the default value)
Product: Networking
Reporter: Uwe Schmeling (uwe.schmeling)
Component: IPV4
Assignee: Herbert Xu (herbert)
Status: REJECTED INVALID
Severity: normal
CC: YnVnemlsbGEua2VybmVsLm9y
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.0-test4
Subsystem:
Regression: ---
Bisected commit-id:

Description Uwe Schmeling 2003-08-26 00:07:56 UTC
Distribution:
Hardware Environment:
Software Environment:
Problem Description:

Steps to reproduce: 
Start up a tunneling connection; if one of the gateway interfaces has mtu=1500,
communication will hang after transferring some data through the tunnel. As far
as I know, the problem occurs with DSL connections only.
Comment 1 Andrew Morton 2003-08-26 02:35:49 UTC
davem says:


 We don't do PMTU properly in tunneled mode, known problem
 but fix won't happen any time soon as it's non-trivial
 to cure this properly.
Comment 2 Valentijn Sessink 2004-02-03 05:42:32 UTC
Not doing PMTU is one thing, but leaving connections hanging around is another.
When using a masq'ed connection, the  ICMP "frag needed" is sent from the end
point. Setup:

10.15.67.21 ---IPsec--- 10.15.67.1 MASQ 24.132.71.96 -----> some web server

Now look what happens:
13:57:30.163411 24.132.71.96.1098 > 145.58.30.41.80: SWE
1300925025:1300925025(0) win 5840 <mss 1460,sackOK,timestamp 18267217
0,nop,wscale 0> (DF)
13:57:30.201869 145.58.30.41.80 > 24.132.71.96.1098: S 401835726:401835726(0)
ack 1300925026 win 5792 <mss 1460,sackOK,timestamp 457753496 18267217,nop,wscale
0> (DF)
13:57:30.214933 24.132.71.96.1098 > 145.58.30.41.80: . ack 1 win 5840
<nop,nop,timestamp 18267222 457753496> (DF)
13:57:30.226815 24.132.71.96.1098 > 145.58.30.41.80: P 1:255(254) ack 1 win 5840
<nop,nop,timestamp 18267222 457753496> (DF)
13:57:30.231337 145.58.30.41.80 > 24.132.71.96.1098: . ack 255 win 6432
<nop,nop,timestamp 457753501 18267222> (DF)
13:57:30.235774 145.58.30.41.80 > 24.132.71.96.1098: . 1:1449(1448) ack 255 win
6432 <nop,nop,timestamp 457753501 18267222> (DF)
13:57:30.237906 24.132.71.96 > 145.58.30.41: icmp: 24.132.71.96 unreachable -
need to frag (mtu 1444) [tos 0xc0]
13:57:30.237420 145.58.30.41.80 > 24.132.71.96.1098: . 1449:2897(1448) ack 255
win 6432 <nop,nop,timestamp 457753501 18267222> (DF)
13:57:30.240329 24.132.71.96 > 145.58.30.41: icmp: 24.132.71.96 unreachable -
need to frag (mtu 1444) [tos 0xc0]
13:57:30.482018 145.58.30.41.80 > 24.132.71.96.1098: . 1:1449(1448) ack 255 win
6432 <nop,nop,timestamp 457753526 18267222> (DF)
13:57:30.483851 24.132.71.96 > 145.58.30.41: icmp: 24.132.71.96 unreachable -
need to frag (mtu 1444) [tos 0xc0]

Ad nauseam. A hang, as Uwe Schmeling reported.
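
A quick check of the sizes in the trace above, as a minimal sketch. It assumes
an IPv4 header and a TCP header of 20 bytes each with no IP options, plus the
12 bytes taken by the nop,nop,timestamp option visible in the dump. The names
below are illustrative only.

```python
# Header sizes are assumptions: IPv4 and TCP without extra options.
IP_HEADER = 20
TCP_HEADER = 20
# nop,nop,timestamp as seen in the dump: 1 + 1 + 10 bytes.
TIMESTAMP_OPTION = 12


def ip_packet_size(payload, tcp_options=TIMESTAMP_OPTION):
    """Total IP packet size for a TCP segment carrying `payload` bytes."""
    return IP_HEADER + TCP_HEADER + tcp_options + payload


def max_payload(mtu, tcp_options=TIMESTAMP_OPTION):
    """Largest TCP payload that still fits in one packet of size `mtu`."""
    return mtu - IP_HEADER - TCP_HEADER - tcp_options


segment = 1448        # payload of the server's data segments in the trace
reported_mtu = 1444   # value carried in the ICMP "need to frag" messages

print(ip_packet_size(segment))                  # 1500
print(ip_packet_size(segment) <= reported_mtu)  # False
print(max_payload(reported_mtu))                # 1392
```

With DF set, a 1500-byte packet cannot cross a 1444-byte hop, so the sender
would have to drop to 1392-byte segments once it receives the ICMP message. In
the trace it keeps retransmitting 1448 bytes instead.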

The inside MTU (10.15.67.21 to 67.1) is 1460, which is plain wrong. Now if PMTU
detection doesn't work, shouldn't the kernel use fragmentation as a workaround?
In the above situation, no connection is possible.

Comment 3 Herbert Xu 2005-06-05 14:37:01 UTC
Your packet dump shows that it's actually the webserver that's not responding to
the ICMP need-to-frag messages.  That is, you've got an ICMP black hole between
your DSL provider and the webserver.

You can work around it using MSS clamping.  However, this is certainly not an
IPsec problem.
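
MSS clamping boils down to lowering the MSS value in the SYN packets that pass
through the gateway, so that neither end ever sends a segment too large for the
path. A minimal sketch of the arithmetic, assuming 20-byte IPv4 and TCP headers
and the 1444-byte MTU from the ICMP messages in comment 2. `clamp_mss` is a
hypothetical helper for illustration, not a kernel or netfilter API; on Linux
the rewriting itself is usually done with the netfilter TCPMSS target.

```python
# Header sizes are assumptions: IPv4 and TCP without options.
IP_HEADER = 20
TCP_HEADER = 20


def clamp_mss(advertised_mss, path_mtu):
    """Return the MSS a gateway would write into a forwarded SYN.

    The MSS is never raised, only lowered to what the path MTU allows.
    """
    return min(advertised_mss, path_mtu - IP_HEADER - TCP_HEADER)


print(clamp_mss(1460, 1444))  # 1404: the SYN in the trace advertised 1460
print(clamp_mss(1200, 1444))  # 1200: already small enough, left untouched
```

Because the clamp is applied to the handshake, it works even when ICMP
need-to-frag messages never reach the far end.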