Bug 213669 - PMTU discovery not working for IPsec
Summary: PMTU discovery not working for IPsec
Status: NEW
Alias: None
Product: Networking
Classification: Unclassified
Component: IPV4
Hardware: x86-64 Linux
Importance: P1 high
Assignee: Stephen Hemminger
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-07-07 09:08 UTC by marek.gresko
Modified: 2021-08-04 20:42 UTC

See Also:
Kernel Version: 5.12.13
Subsystem:
Regression: No
Bisected commit-id:


Description marek.gresko 2021-07-07 09:08:07 UTC
Hello,

I have two sites interconnected using IPsec (libreswan).

The situation is as follows:

X <=> (a) <=> (Internet) <=> (b) <=> Y

So you have two gateways, a and b, connected to the Internet, with their corresponding internal subnets X and Y. Gateway a is connected to the provider p using PPPoE. The IPsec tunnel is created between a and b to interconnect subnets X and Y. When gateway b, using its internal address y, communicates with gateway a at its internal address x (addresses x and y are defined by leftsourceip and rightsourceip in the libreswan configuration), you get this behavior:

b# ping -M do x -s 1392 -c 1
PING x (x.x.x.x) 1392(1420) bytes of data.

--- ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

b# ping -M do a -s 1460 -c 3
PING a (a.a.a.a) 1460(1488) bytes of data.
From p (p.p.p.p) icmp_seq=1 Frag needed and DF set (mtu = 1480)
ping: local error: message too long, mtu=1480
ping: local error: message too long, mtu=1480

--- ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2014ms

b# ping -M do x -s 1392 -c 3
PING x (x.x.x.x) 1392(1420) bytes of data.
ping: local error: message too long, mtu=1418
ping: local error: message too long, mtu=1418
ping: local error: message too long, mtu=1418

--- ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2046ms


Legend:
x.x.x.x is the inner IP address of the gateway (a) (or x from the inside).
a.a.a.a is an outer address of the gateway (a).
p.p.p.p is some address in the provider's network of the (a) side.

So the IPsec tunnel evidently becomes aware of the MTU only when some outer communication is in progress. The inner communication by itself does not react to the ICMP packets used for PMTU discovery. I also had a situation where even the outer pings did not make the IPsec tunnel aware of the MTU; after a reboot it started to behave as described again.
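
For reference, the sizes in the ping output above can be checked with a short calculation. This is only a sketch, assuming a 20-byte IPv4 header without options and the 8-byte ICMP echo header; the function names are illustrative and not part of any tool.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def ping_packet_size(payload: int) -> int:
    """IP packet size produced by `ping -s <payload>`."""
    return payload + ICMP_HEADER + IPV4_HEADER


def fits(payload: int, mtu: int) -> bool:
    """True if the ping packet can be sent unfragmented at this MTU."""
    return ping_packet_size(payload) <= mtu


# (payload, mtu) pairs taken from the ping output in this report
for payload, mtu in [(1392, 1418), (1460, 1480)]:
    print(payload, ping_packet_size(payload), mtu, fits(payload, mtu))
```

With -s 1392 the packet is 1420 bytes, which exceeds the 1418 reported once the tunnel has learned the path MTU; that matches the "message too long" errors in the third ping.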

Did I describe it understandably or should I clarify things?

Thanks

Marek
Comment 1 Vadim Fedorenko 2021-07-07 18:43:35 UTC
Hi Marek!
Could you please provide routing information for both cases using
ip route get x.x.x.x
ip route get a.a.a.a
and
ip route list

The best would be to get this information before and after PMTU is active
Thanks
Comment 2 marek.gresko 2021-07-08 05:18:10 UTC
Hi Vadim,

Do you mean that ip route get x.x.x.x and ip route get a.a.a.a should be run on gateway (a) or (b)? I suspect (b).

Should the ip route list output contain all interfaces, or should I filter it down to the relevant ones? The list could be pretty large on gateway (a). Side (b) is pretty simple.

I hope I will be able to get the data "before" on side (b) because the ssh connection usually freezes until the outer ping is run.

There is a change on side (b). The new provider does not use MTU 1500. Some kind of IPv6 tunnel is used to carry the IPv4 traffic, and the resulting MTU is even lower than on side (a) using PPPoE. The behavior is the same as it was before with MTU 1500 on side (b). I can see the ICMP unreachable message in tcpdump, but it is ignored until there is communication outside of IPsec. To be more specific, the IPsec is NAT-T over UDP port 4500.

Thanks

Marek

Comment 3 Vadim Fedorenko 2021-07-08 10:55:03 UTC
(In reply to marek.gresko from comment #2)
> Hi Vadim,
> 
> Do you mean that ip route get x.x.x.x and ip route get a.a.a.a should be run
> on gateway (a) or (b)? I suspect (b).

Yes, it's all about router (b), where you observe the problem.
 
> Should the ip route list output contain all interfaces, or should I filter it
> down to the relevant ones? The list could be pretty large on gateway (a).
> Side (b) is pretty simple.

Again, router (b); you may include only the relevant routes.

> There is a change on side (b). The new provider does not use MTU 1500. Some
> kind of IPv6 tunnel is used to carry the IPv4 traffic, and the resulting MTU
> is even lower than on side (a) using PPPoE. The behavior is the same as it
> was before with MTU 1500 on side (b). I can see the ICMP unreachable message
> in tcpdump, but it is ignored until there is communication outside of IPsec.
> To be more specific, the IPsec is NAT-T over UDP port 4500.

It's OK to have changes. I would like to see the network configuration with all layers of encapsulation. A pcap file could also help a lot.
Comment 4 marek.gresko 2021-07-08 17:58:40 UTC
Hello,

before MTU learning:

ip route get x.x.x.x
x.x.x.x via b.b.b.d dev enp2s0 src y.y.y.y uid 0 cache

ip route get a.a.a.a
a.a.a.a via b.b.b.d dev enp2s0 src b.b.b.b uid 0 cache

ip route list
default via b.b.b.d dev enp2s0 proto dhcp metric 100
b.b.b.0/24 dev enp2s0 proto kernel scope link src b.b.b.b metric 100
x.x.x.0/24 via b.b.b.d dev enp2s0 src y.y.y.y

after MTU learning:

ip route get x.x.x.x
x.x.x.x via b.b.b.d dev enp2s0 src y.y.y.y uid 0 cache

ip route get a.a.a.a
a.a.a.a via b.b.b.d dev enp2s0 src b.b.b.b uid 0 cache expires 590sec mtu 1444

ip route list
default via b.b.b.d dev enp2s0 proto dhcp metric 100
b.b.b.0/24 dev enp2s0 proto kernel scope link src b.b.b.b metric 100
x.x.x.0/24 via b.b.b.d dev enp2s0 src y.y.y.y
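
The difference between the two captures is the cached exception ("expires 590sec mtu 1444") that appears only on the route to a.a.a.a. As a rough sketch, this could be checked from a script as follows. It assumes the output format shown above, and the helper name cached_pmtu is made up for illustration.

```python
import re
from typing import Optional, Tuple


def cached_pmtu(route_line: str) -> Optional[Tuple[int, Optional[int]]]:
    """Return (mtu, expires_seconds) from one line of `ip route get` output.

    Returns None when the line carries no cached mtu value.
    """
    mtu = re.search(r"\bmtu (?:lock )?(\d+)", route_line)
    if mtu is None:
        return None
    expires = re.search(r"\bexpires (\d+)sec", route_line)
    expires_sec = int(expires.group(1)) if expires else None
    return int(mtu.group(1)), expires_sec


# Lines copied from the "after MTU learning" output in this comment
inner = "x.x.x.x via b.b.b.d dev enp2s0 src y.y.y.y uid 0 cache"
outer = "a.a.a.a via b.b.b.d dev enp2s0 src b.b.b.b uid 0 cache expires 590sec mtu 1444"

print(cached_pmtu(inner))
print(cached_pmtu(outer))
```

The inner route to x.x.x.x shows no cached mtu in either capture, so the learned value is visible only on the outer route to a.a.a.a.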

Marek
Comment 5 marek.gresko 2021-07-08 18:28:42 UTC
tcpdump -nnvv -i enp2s0 icmp
dropped privs to tcpdump
tcpdump: listening on enp2s0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
20:21:34.748792 IP (tos 0xc0, ttl 64, id 59803, offset 0, flags [none], proto ICMP (1), length 576)
    b.b.b.d > b.b.b.b: ICMP a.a.a.a unreachable - need to frag (mtu 1444), length 556
        IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 1448)
    b.b.b.b.4500 > a.a.a.a.4500: [no cksum] UDP-encap: ESP(spi=0xaaaaaaaa,seq=0xaaa), length 1420
^C
1 packet captured
1 packet received by filter
0 packets dropped by kernel
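
The captured message is an ICMP "fragmentation needed" error (type 3, code 4), which carries the next-hop MTU in the last two bytes of its 8-byte header (RFC 1191). The following is a minimal sketch of reading that field. The sample bytes are constructed by hand with a zero checksum as a placeholder and are not taken from the capture.

```python
import struct
from typing import Optional

ICMP_DEST_UNREACH = 3  # ICMP type: destination unreachable
ICMP_FRAG_NEEDED = 4   # ICMP code: fragmentation needed and DF set


def next_hop_mtu(icmp: bytes) -> Optional[int]:
    """Return the next-hop MTU from an ICMP frag-needed message, else None."""
    if len(icmp) < 8:
        return None
    icmp_type, code, _checksum, _unused, mtu = struct.unpack("!BBHHH", icmp[:8])
    if icmp_type != ICMP_DEST_UNREACH or code != ICMP_FRAG_NEEDED:
        return None
    return mtu


# Hand-built header: type 3, code 4, checksum 0 (placeholder), unused 0, mtu 1444
sample = struct.pack("!BBHHH", 3, 4, 0, 0, 1444)

# Quoted outer packet from the capture: IPv4 header + UDP header + ESP payload
outer_len = 20 + 8 + 1420

print(next_hop_mtu(sample), outer_len, outer_len > 1444)
```

The quoted outer packet is 1448 bytes, 4 bytes over the advertised MTU of 1444, which is why the router rejected it.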



Strange it is in reverse order....

Marek
Comment 6 Vadim Fedorenko 2021-07-10 02:36:25 UTC
I can confirm a regression in the current stable kernel. I would suggest downgrading to the latest LTS (5.10.x), as it is not affected.
Comment 7 marek.gresko 2021-07-10 09:05:06 UTC
Hello,

I can live with the bug, since a workaround using a constant ping is available. Is there an ETA for the version in which the problem will be fixed?

Thanks

Marek

Comment 8 Vadim Fedorenko 2021-07-21 22:24:27 UTC
The fix is committed to the -net branch and will go to stable versions later.
Comment 9 marek.gresko 2021-08-03 09:46:36 UTC
Could you please post a note here when the merge occurs and specify the version?

Thanks

Marek
Comment 10 Vadim Fedorenko 2021-08-03 13:35:51 UTC
The fix was merged into v5.13.6 and will not be merged into 5.12.x because that series is EOL.
Comment 11 marek.gresko 2021-08-04 20:42:56 UTC
Hello,

I confirm kernel 5.13.6-200.fc34 fixes the problem.

At first I did not notice, because I pinged with -c 1 and the first response did not arrive.

Thanks

Marek
