Bug 213581 - Change in ip_dst_mtu_maybe_forward() breaks WebRTC connections
Summary: Change in ip_dst_mtu_maybe_forward() breaks WebRTC connections
Status: NEW
Alias: None
Product: Networking
Classification: Unclassified
Component: IPV4
Hardware: All Linux
Importance: P1 normal
Assignee: Stephen Hemminger
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-06-25 11:54 UTC by Juan Manuel Santos
Modified: 2021-06-27 20:29 UTC

See Also:
Kernel Version: 5.13.0-rc7
Subsystem:
Regression: No
Bisected commit-id:



Description Juan Manuel Santos 2021-06-25 11:54:19 UTC
Recent Linux kernel versions (>=5.9, if my calculations are correct), when used as a gateway on a LAN (or in a similar setup), break WebRTC-based applications such as Google Meet and Discord. (I have not done extensive testing, but I would guess that most similar applications are affected.) In Meet, no participant's video is ever shown other than my own, and nobody can see my video, although audio does work. In Discord, no audio or video from other participants ever comes through. Note that every meeting is initiated or joined from inside the LAN, not on the gateway itself.

Using plain iptables, firewalld+iptables or firewalld+nftables makes no difference (it was the first thing I tried). I discovered this a few months ago when updating the kernel, and found that reverting to the previous kernel made this work again. I didn't look further into it until now, when I can no longer stay on that old of a kernel :).

Using git-bisect I was able to identify the offending commit: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=02a1b175b0e92d9e0fa5df3957ade8d733ceb6a0

This patch was backported to linux-stable shortly after 5.4.72 was released. It appears to still be there in vanilla upstream. I can confirm that reverting this patch in 5.4.109 fixes the issue and WebRTC works again.

I have also reverted this patch in 5.13.0-rc7 and WebRTC works with the patch reverted. Without reverting the patch, it's broken.

No other protocol/connection seems to be affected.

Reproducible: Always

Steps to Reproduce:
1. Install an affected kernel (for example 5.4.109 or 5.13.0-rc7) on a gateway device.
2. Try to use a conferencing application that uses WebRTC (Meet, Discord, etc.). Either start or join a meeting from a device that sits in the LAN.

Actual Results:  
Audio and/or video does not work when a meeting is initiated/joined from within the LAN

Expected Results:  
Both audio and video should work when inside the meeting.

My C is quite limited, but it appears that this function, from wherever it gets called, returns a different value after the mentioned commit. It used to return:

return min(READ_ONCE(dst->dev->mtu), IP_MAX_MTU);

Now it returns:

mtu = dst_metric_raw(dst, RTAX_MTU);
if (mtu)
    return mtu;
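
To make the difference concrete, here is a minimal user-space sketch of the two behaviours on the forwarding path. It is not the kernel code: `struct model_dst`, `fwd_mtu_before()` and `fwd_mtu_after()` are hypothetical names, the 0xFFFF cap is assumed to match the kernel's IP_MAX_MTU, and the sketch assumes the new code still falls back to the device MTU when the route carries no MTU metric.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to mirror the kernel's IP_MAX_MTU cap. */
#define MODEL_IP_MAX_MTU 0xFFFFu

/* Hypothetical stand-in for the parts of a dst entry that matter here. */
struct model_dst {
    unsigned int route_mtu; /* "mtu N" attribute on the route, 0 if unset */
    unsigned int dev_mtu;   /* MTU of the outgoing interface */
};

static unsigned int min_uint(unsigned int a, unsigned int b)
{
    return a < b ? a : b;
}

/* Before the commit: forwarding only looked at the device MTU. */
static unsigned int fwd_mtu_before(const struct model_dst *dst)
{
    assert(dst != NULL);
    return min_uint(dst->dev_mtu, MODEL_IP_MAX_MTU);
}

/* After the commit: an explicit route MTU wins over the device MTU. */
static unsigned int fwd_mtu_after(const struct model_dst *dst)
{
    assert(dst != NULL);
    if (dst->route_mtu)
        return dst->route_mtu;
    /* Assumed fallback when the route has no MTU metric. */
    return min_uint(dst->dev_mtu, MODEL_IP_MAX_MTU);
}
```

For a route with an MTU attribute of 576 on a device whose MTU is 1500, the first function yields 1500 and the second yields 576, so forwarded packets that used to fit no longer do.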
Comment 1 Vadim Fedorenko 2021-06-25 16:31:05 UTC
Hi!
Could you please provide network configuration on gateway device?
It really depends on usage of any kind of tunnel interfaces to connect networks
Thanks
Comment 2 Juan Manuel Santos 2021-06-25 20:46:39 UTC
Hi Vadim,

Not exactly sure what configuration you're looking for but here goes:

root@manya ~ # ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:a1:b0:8b:35:3b brd ff:ff:ff:ff:ff:ff
    inet 172.16.0.1/24 brd 172.16.0.255 scope global enp3s0
       valid_lft forever preferred_lft forever
    inet6 fe80::2a1:b0ff:fe8b:353b/64 scope link 
       valid_lft forever preferred_lft forever
3: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:22:4d:7a:51:20 brd ff:ff:ff:ff:ff:ff
    inet W.X.Y.Z/24 brd 255.255.255.255 scope global dynamic noprefixroute eno1
       valid_lft 18535sec preferred_lft 15835sec
    inet6 fe80::ffb7:39b3:70b4:b67b/64 scope link 
       valid_lft forever preferred_lft forever
4: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:94:3d:f3:8b brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:94ff:fe3d:f38b/64 scope link 
       valid_lft forever preferred_lft forever
6: veth2a862ce@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default 
    link/ether c6:67:6a:9b:ea:13 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::c467:6aff:fe9b:ea13/64 scope link 
       valid_lft forever preferred_lft forever
7: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 500
    link/none 
    inet 172.20.0.38 peer 172.20.0.37/32 scope global tun0
       valid_lft forever preferred_lft forever
    inet6 fe80::5658:73ac:2b48:c43b/64 scope link stable-privacy 
       valid_lft forever preferred_lft forever



root@manya ~ # ip route
default via W.X.Y.1 dev eno1 proto dhcp src W.X.Y.Z metric 3 mtu 576 
172.16.0.0/24 dev enp3s0 proto kernel scope link src 172.16.0.1 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 
172.20.0.0/24 via 172.20.0.37 dev tun0 
172.20.0.37 dev tun0 proto kernel scope link src 172.20.0.38 
W.X.Y.0/24 dev eno1 proto dhcp scope link src W.X.Y.Z metric 3 mtu 576 
root@manya ~ # 

Where W.X.Y.Z is my public IP address.

- eno1: card connected to the cable modem, receiving DHCP and a public IP address from my ISP.
- enp3s0: card connected to the LAN, providing DHCP for my home systems.
- docker0 / veth*: docker stuff
- tun0: an openvpn connection with my server

Other than disabling TSO/GRO/GSO because of some issues with the NIC crashing and rebooting, nothing else is customized (i.e. no custom MTU changes or anything).

Do let me know if this is enough or if you need anything else.
Comment 3 Vadim Fedorenko 2021-06-27 18:20:22 UTC
Hi!

Thanks for the configuration. The problem is definitely caused by the "mtu 576" part of the routes via the eno1 interface. It looks like you received this attribute via DHCP, but I'm not sure it's correct. Path MTU Discovery should help in this case, at least for TCP. The kernel is actually doing what the configuration asks for: it will not route packets longer than 576 bytes and will send ICMP Fragmentation Needed back to the source of the packet.
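
As a rough illustration of that behaviour, the sketch below models the decision a forwarding IPv4 router makes for one packet. It is a simplified model, not the kernel's code: `enum fwd_action` and `fwd_decide()` are hypothetical names, offloads such as GSO are ignored, and the case where an oversized packet without DF gets fragmented is standard IPv4 behaviour added here for completeness.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of a forwarding decision for a single packet. */
enum fwd_action {
    FWD_PASS,                  /* fits the MTU, forwarded unchanged */
    FWD_FRAGMENT,              /* too big, DF clear: router may fragment */
    FWD_DROP_ICMP_FRAG_NEEDED  /* too big, DF set: dropped, ICMP sent to source */
};

/* Decide what happens to a packet of pkt_len bytes on a path with this MTU. */
static enum fwd_action fwd_decide(unsigned int pkt_len, bool df_set,
                                  unsigned int mtu)
{
    assert(mtu > 0);
    if (pkt_len <= mtu)
        return FWD_PASS;
    return df_set ? FWD_DROP_ICMP_FRAG_NEEDED : FWD_FRAGMENT;
}
```

With an effective MTU of 576, a 1200-byte datagram with DF set (an illustrative size, not a measurement from this report) is dropped with ICMP Fragmentation Needed, while the same datagram passes when the effective MTU is 1500.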
Comment 4 Juan Manuel Santos 2021-06-27 19:05:05 UTC
(In reply to Vadim Fedorenko from comment #3)
> Hi!
> 
> Thanks for configuration. The problem itself is definitely because of "mtu
> 576" part of the routes via eno1 interface. Looks like you received this
> attribute via DHCP configuration, but I'm not sure if it's correct. Path MTU
> Discovery should help in this case, at least for TCP. And actually kernel is
> doing things that are asked by configuration - it will not route packets
> more than 576 bytes long and will send ICMP Fragmentation needed back to the
> source of packet.

Interesting! So it's my ISP's fault? That would make sense since I've not seen this reported elsewhere.

Forcibly ignoring the MTU setting in my DHCP client should get rid of this, then? If that is correct, I can certainly try that out.
Comment 5 Vadim Fedorenko 2021-06-27 19:06:53 UTC
(In reply to Juan Manuel Santos from comment #4)
> (In reply to Vadim Fedorenko from comment #3)
> > Hi!
> > 
> > Thanks for configuration. The problem itself is definitely because of "mtu
> > 576" part of the routes via eno1 interface. Looks like you received this
> > attribute via DHCP configuration, but I'm not sure if it's correct. Path
> > MTU Discovery should help in this case, at least for TCP. And actually
> > kernel is doing things that are asked by configuration - it will not route
> > packets more than 576 bytes long and will send ICMP Fragmentation needed
> > back to the source of packet.
> 
> Interesting! So it's my ISP's fault? That would make sense since I've not
> seen this reported elsewhere.
> 
> Forcibly ignoring the MTU setting in my DHCP client should get rid of this
> then? If that is correct then i can certainly try that out.

I suppose it should help if it works without the patch you mentioned
Comment 6 Juan Manuel Santos 2021-06-27 20:29:06 UTC
I don't want to claim victory just yet, but that looks to be it. /etc/dhcpcd.conf has the option:

option interface_mtu

I commented it out, restarted eno1, and had to restart both browsers. Now the problem seems to be gone!
