Bug 99081 - Bridge multicast snooping breaks ICMPv6
Summary: Bridge multicast snooping breaks ICMPv6
Status: NEW
Alias: None
Product: Networking
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 normal
Assignee: Stephen Hemminger
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-05-28 10:34 UTC by Steinar H. Gunderson
Modified: 2024-09-10 23:37 UTC
CC List: 7 users

See Also:
Kernel Version: 4.0.4
Subsystem:
Regression: No
Bisected commit-id:


Attachments
cap on both pc and router. (28.82 KB, application/vnd.rar)
2023-10-26 05:33 UTC, jade

Description Steinar H. Gunderson 2015-05-28 10:34:55 UTC
Hi,

I've seen this reported many times around the net, with no definite solution; since I got hit by it again and now used the latest kernel, I thought the best place would be the kernel Bugzilla.

I have a bridge br0 with eth0 on it, and then some tap devices from KVM guests. In this situation, by default, I don't have IPv6. I can't ping anything because ND fails. I've seen this over multiple kernel versions for years, so I'm fairly certain this is not something about my specific setup.

The common workaround is:

 echo 0 > /sys/devices/virtual/net/br0/bridge/multicast_snooping 

which immediately makes IPv6 work well again.

IPv6 unicast is unaffected; only multicast (i.e., ND) seems to have the problem.
Comment 1 Eric Dumazet 2015-05-28 11:16:43 UTC
I noticed the following bug, not sure if it is relevant...

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index a3abe6ed111e..22fd0419b314 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1822,7 +1822,7 @@ static void br_multicast_query_expired(struct net_bridge *br,
        if (query->startup_sent < br->multicast_startup_query_count)
                query->startup_sent++;
 
-       RCU_INIT_POINTER(querier, NULL);
+       RCU_INIT_POINTER(querier->port, NULL);
        br_multicast_send_query(br, NULL, query);
        spin_unlock(&br->multicast_lock);
 }
Comment 2 Linus Lüssing 2015-05-28 17:32:11 UTC
Hm, no, that one can't be relevant as the querier port information is only used by the export which is supposed to be used by batman-adv later (patches for that are still pending review and inclusion on the batman-adv mailing list). The selected querier port rcu-pointer isn't used by the forwarding decisions of the bridge. (Nevertheless thanks for finding it! Would have taken me quite some time to spot from a more complex setup with batman-adv+bridge)

The recent patch from Thadeu Lima de Souza Cascardo seems promising though ("bridge: fix parsing of MLDv2 reports"). Steinar, could it be that you are one of the (few?) people using MLDv2 queriers? Also, behind which port is your querier, eth0, br0 or one of the tap interfaces? Does multicast traffic on the port which has the querier work fine?
Comment 3 Steinar H. Gunderson 2015-05-28 20:01:05 UTC
Uhm, I'm fairly IPv6 versed, but I don't think I know what an MLDv2 querier is. Care to explain?

The regular network is behind eth0, and I can't reach stuff on that from my machine, IIRC even before a VM is started (so eth0 is essentially alone on br0).
Comment 4 Linus Lüssing 2015-05-28 20:49:27 UTC
Sure. The bridge learns which host is interested in which multicast traffic via IGMP (IPv4) or MLD (IPv6). Multicast listeners issue so-called IGMP/MLD reports on the link to inform switches and multicast routers about what they want to receive. However, they don't do this unconditionally: they only do this if they are asked to report. For this, some host needs to regularly issue so-called IGMP/MLD queries (usually such a querying host is a multicast router or sometimes a bridge/snooping switch itself).

The Linux bridge also only kicks into gear once it sees that a querier is present. Otherwise there wouldn't be any reports and the bridge wouldn't be able to learn about any multicast listeners.

Could you maybe figure out with tcpdump behind which port the selected querier resides? E.g. do a "tcpdump -i eth0 icmp6", check the IPv6 source address and figure out which bridge port this source is coming from. And please post the output line from tcpdump about the query message here so we can figure out whether it's an MLDv1 or MLDv2 querier. (For the latter, and the MLDv2 reports it triggers, there is a known issue which just got fixed and queued for stable.)
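
One rough way to tell an MLDv1 query from an MLDv2 query in a capture is the length of the ICMPv6 message: RFC 3810 (section 8.1) treats a query of exactly 24 octets as MLDv1 and a query of 28 octets or more as MLDv2. A minimal Python sketch, assuming the ICMPv6 type and message length have already been read off the capture (the function name is made up for illustration and is not part of any tool mentioned here):

```python
MLD_QUERY = 130  # ICMPv6 type of a Multicast Listener Query (MLDv1 and MLDv2)


def mld_query_version(icmp_type: int, icmp_len: int) -> str:
    """Classify a query by ICMPv6 message length, following RFC 3810 sec. 8.1.

    icmp_len is the length of the ICMPv6 message itself, i.e. without the
    IPv6 header and without the Hop-by-Hop Options header.
    """
    if icmp_type != MLD_QUERY:
        return "not a query"
    if icmp_len == 24:
        return "MLDv1"
    if icmp_len >= 28:
        return "MLDv2"
    # Any other length (e.g. 26 octets) is to be silently ignored per the RFC.
    return "invalid"
```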
Comment 5 Steinar H. Gunderson 2015-05-28 21:07:34 UTC
We have a 3560-X as router, including IPv6 multicast routing. It's speaking MLDv2, and we have MLD snooping in place in most of the network. Unfortunately, there's way too much IPv6 going on on this subnet for a simple icmp6 tcpdump to be very selective, and I can't see any MLD packets from a simple eyeballing. It's news to me that ND would be treated differently depending on whether we've seen a multicast router or not, but I'll believe you there. :-)

As for which bridge port it would be behind, it has to be eth0 if so; none of the KVM hosts (the only other things on the bridge) speak much IPv6 at all, let alone route it.
Comment 6 Salah Coronya 2015-07-07 05:55:50 UTC
I have the same problem (with the same setup, bridged KVM on eth0) and the same symptom: turning off multicast_snooping fixes it. Kernel is 4.0.5. My setup is a lot simpler though: just 1 computer and a Linksys router.

However, I manually applied "bridge: fix parsing of MLDv2 reports" (47cc84ce0c2fe75c99ea5963c4b5704dd78ead54) and that seems to have fixed it.
Comment 7 Steinar H. Gunderson 2015-08-03 11:29:05 UTC
I'm still seeing this in 4.2-rc4. Isn't that supposed to have the patch?
Comment 8 Linus Lüssing 2015-08-23 13:12:03 UTC
Yes, that patch is there since 4.2-rc1 and 4.1-rc5.

4.1-rc1 introduced a regression which is supposed to be fixed in master or with the upcoming 4.2-rc8: "net: fix wrong skb_get() usage / crash in IGMP/MLD parsing code" (a516993f0ac1694673412eb2d16a091eafa77d2a). Not quite sure whether missing this patch would create erratic behaviour other than crashing.

Steinar, if the issue is still present for you on current master or any (non-rc) 4.1 kernel, I'd be very interested in having a look at some tcpdumps of ICMPv6 traffic and the corresponding "/sbin/bridge mdb show" output.
Comment 9 Steinar H. Gunderson 2015-08-23 15:16:00 UTC
I'm down on 4.1.4 now (4.2-rc* had other blocking bugs); I've enabled multicast snooping again, and will let you know if I see problems.
Comment 10 Steinar H. Gunderson 2016-01-20 10:50:49 UTC
I still see this in 4.3.0 and 4.4.0. However, it's intermittent; I just experienced it and could get a bridge dump, but once I started the tcpdump, it started working (possibly due to promisc mode?). Anyway, the bridge dump is:

pannekake:~# /sbin/bridge mdb show
dev br0 port eth2 grp ff02::2 temp
dev br0 port eth2 grp ff02::1:ff00:60 temp
dev br0 port eth2 grp ff02::1:ff00:53 temp
dev br0 port eth2 grp ff02::1:ff00:61 temp
dev br0 port eth2 grp ff02::1:ff00:56 temp
dev br0 port eth2 grp ff02::202 temp
dev br0 port eth2 grp ff02::1:ff00:68 temp
dev br0 port eth2 grp ff02::1:ff00:63 temp
dev br0 port eth2 grp ff02::1:ff00:69 temp
dev br0 port eth2 grp ff02::1:ffac:97d6 temp

This was from before starting the tcpdump.
Comment 11 Steinar H. Gunderson 2016-01-21 01:02:43 UTC
Yes indeed, starting tcpdump fixes the problem. I had a hanging SSH to an IPv6 host on the same subnet, and the second I started tcpdump, the connection went through:

01:56:12.880344 IP6 2001:67c:29f4::29 > ff02::1:ff00:50: ICMP6, neighbor solicitation, who has 2001:67c:29f4::50, length 32
01:56:12.880379 IP6 2001:67c:29f4::50 > 2001:67c:29f4::29: ICMP6, neighbor advertisement, tgt is 2001:67c:29f4::50, length 32

::50 here is my machine, ::29 was the one I was SSH-ing to.

I did some testing on ::29, doing “ip -6 neigh flush dev eth0” and trying to ping ::50. No response. “ifconfig br0 promisc” on ::50, and it works. “ifconfig br0 -promisc” and flush the table again, it doesn't. Turn off multicast snooping, and it immediately works. So now I can reproduce this 100% by simple means.

I note that ff02::1:ff00:50 is not in the list of “bridge mdb show”. Might it be that it doesn't understand it should listen on that group, and thus does not get the neighbor solicitation?
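
The check described above (is the solicited-node group of an address present in the mdb?) can be sketched in Python. The solicited-node multicast address is ff02::1:ff00:0/104 plus the low 24 bits of the unicast address (RFC 4291). The parser below only assumes the "dev ... port ... grp ... temp" line layout shown in this report; other iproute2 versions may print more fields, and both function names are hypothetical helpers, not part of any existing tool.

```python
import ipaddress


def solicited_node_group(addr: str) -> ipaddress.IPv6Address:
    """Return the solicited-node multicast address for a unicast IPv6 address."""
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF
    base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return ipaddress.IPv6Address(base | low24)


def mdb_groups(mdb_output: str) -> set:
    """Collect the IP group addresses that follow a 'grp' token in mdb output."""
    groups = set()
    for line in mdb_output.splitlines():
        fields = line.split()
        if "grp" not in fields:
            continue
        idx = fields.index("grp")
        if idx + 1 >= len(fields):
            continue
        try:
            groups.add(ipaddress.ip_address(fields[idx + 1]))
        except ValueError:
            # Not an IP address (for example an L2 group); skip it.
            continue
    return groups


def group_missing(addr: str, mdb_output: str) -> bool:
    """True if the solicited-node group of addr has no mdb entry."""
    return solicited_node_group(addr) not in mdb_groups(mdb_output)
```

For 2001:67c:29f4::50 this yields ff02::1:ff00:50, the group the neighbor solicitation in the capture above was sent to.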
Comment 12 Steinar H. Gunderson 2016-06-03 21:01:06 UTC
Still there in 4.7-rc1:

pannekake:~# /sbin/bridge mdb show
dev br0 port eth2 grp ff02::1:ff00:63 temp
dev br0 port eth2 grp ff02::1:ff00:0 temp
dev br0 port eth2 grp ff02::1:ff2f:1158 temp
dev br0 port eth2 grp ff02::fb temp
dev br0 port eth2 grp ff02::1:ff00:56 temp
dev br0 port eth2 grp ff02::1:ff00:1007 temp
dev br0 port eth2 grp ff02::2 temp
dev br0 port eth2 grp ff02::202 temp
dev br0 port eth2 grp ff02::1:ff4f:316 temp
dev br0 port eth2 grp ff02::1:ff00:68 temp
Comment 13 Linus Lüssing 2016-06-16 21:15:03 UTC
Thanks for still checking for this issue! It's a pity the issue is still there for you even with a very recent kernel...

-----

Regarding the bridge mdb output, hm, I'm getting a similar output, entries for the bridge device itself seem to be missing in "bridge mdb show" for me too...

I added some printk's in br_multicast_ipv6_rcv() and I am receiving reports from br0 (that is "port == NULL" in there) and no errors are thrown by any functions called with br_multicast_ipv6_rcv(). Seems it is being parsed and added fine in there at least in my VMs.

I am ping6'ing the solicited-node multicast address of br0 from a second VM. With no querier, those multicast echo requests are flooded on all other ports and br0 itself. Then I enabled the built-in multicast_querier of br0 and echo replies continued to arrive from br0, while the echo requests stopped being flooded on other bridge ports. So just as expected.

It seems that maybe the bridge tool or netlink interface is somehow broken or at least misleading? I'm having the same issue with a 4.0.2 kernel.

-----

Regarding your packet loss, it'd still be very helpful to get a tcpdump of all ICMPv6 traffic for 10-15 minutes from affected interfaces, while using the "--no-promiscuous-mode" option. Or at least a capture of all MLD messages and the corresponding broken neighbor solicitation/advertisement exchanges. We need to check first whether MLD works fine.

Very specific captures should be doable relatively easily with wireshark or tshark and their powerful filter rules (for instance "icmpv6.type == 130" or 131/132/143 for MLD). A full ICMPv6 capture with tcpdump on multiple interfaces, even if large, would be fine for me too. I can find my way around with wireshark.
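
For reference, the four ICMPv6 types named above map to the MLD messages defined in RFC 2710 and RFC 3810. A small Python sketch that builds a single display filter covering all of them (the helper name is hypothetical; the "icmpv6.type" field and the "||" operator are regular wireshark display-filter syntax):

```python
# ICMPv6 types used by MLD (RFC 2710 for MLDv1, RFC 3810 for MLDv2).
MLD_TYPES = {
    130: "Multicast Listener Query (MLDv1 and MLDv2)",
    131: "MLDv1 Multicast Listener Report",
    132: "MLDv1 Multicast Listener Done",
    143: "MLDv2 Multicast Listener Report",
}


def mld_display_filter(types=MLD_TYPES) -> str:
    """Build a wireshark/tshark display filter matching the given ICMPv6 types."""
    return " || ".join(f"icmpv6.type == {t}" for t in sorted(types))
```

The resulting string could be pasted into the wireshark filter bar or, presumably, passed to tshark via its display-filter option (-Y).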
Comment 14 jade 2023-10-26 05:28:17 UTC
I have the same issue. I have an OpenWRT router whose kernel is 5.4.225; there is a bridge br0 with wan (eth1) and lan (lan2/lan3/lan4).

The PC connects to the router's lan3, and the router connects to the PON modem via wan (eth1).

When multicast_snooping is set to 1, the PC sends RS and DHCPv6 packets but can't get the corresponding responses. However, when multicast_snooping is set to 0, the PC gets both the RA and the DHCPv6 response, and obtains an IPv6 address.

Please refer to the attachment: it contains the packets captured with wireshark on the PC, and the packets captured on the router's lan and wan interfaces.
Comment 15 jade 2023-10-26 05:33:21 UTC
Created attachment 305296
cap on both pc and router.
Comment 16 Jakob 2024-09-10 08:30:09 UTC
I have the same issue with the default bridge on Proxmox.
The only device that seems to be regularly affected is my Android mobile phone (maybe because it relies solely on neighbour solicitation once connected) which can not be reached on IPv6 by my OpenWrt VM after a few hours.
Disabling multicast snooping on the Bridge in Proxmox fixes the issue.

I'm running kernel 6.2.16-12-pve.

It took me way too long to figure out why my phone loses IPv6 connectivity after a while.
It is especially annoying as the phone still thinks it has a valid IPv6 connection and most apps do not implement Happy Eyeballs, so you often have to wait for the IPv6 connection to time out before IPv4 is tried.

If I can be of any help with debugging please tell me as I can pretty reliably reproduce this after a few hours.
Comment 17 Jakob 2024-09-10 08:50:39 UTC
Oh sorry I might've been a bit too quick.
This one bridge for whatever reason had multicast snooping enabled but the multicast querier disabled (while the others had the querier enabled).

I'm still not sure if it is expected that neighbour solicitation breaks without a multicast querier but I now enabled both and will report back if just enabling the querier fixes this as well.
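
The configuration described here (snooping on, local querier off) can be spotted from sysfs. A minimal Python sketch, assuming the multicast_querier attribute sits next to multicast_snooping under /sys/devices/virtual/net/<bridge>/bridge/; the function names and the sysfs_root parameter are made up for illustration. Note that a disabled local querier does not prove that there is no querier elsewhere on the link.

```python
from pathlib import Path


def bridge_mcast_state(bridge: str, sysfs_root: str = "/sys") -> dict:
    """Read the snooping and querier switches of a bridge as booleans."""
    base = Path(sysfs_root) / "devices" / "virtual" / "net" / bridge / "bridge"
    return {
        name: (base / name).read_text().strip() == "1"
        for name in ("multicast_snooping", "multicast_querier")
    }


def snooping_without_local_querier(state: dict) -> bool:
    """True if snooping is enabled but this bridge does not send queries itself."""
    return state["multicast_snooping"] and not state["multicast_querier"]
```

On the host this would be called as bridge_mcast_state("vmbr0") or bridge_mcast_state("br0"), depending on the bridge name.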
Comment 18 Linus Lüssing 2024-09-10 09:31:29 UTC
(In reply to Jakob from comment #17)
> Oh sorry I might've been a bit too quick.
> This one bridge for whatever reason had multicast snooping enabled but the
> multicast querier disabled (while the others had the querier enabled).
> 
> I'm still not sure if it is expected that neighbour solicitation breaks
> without a multicast querier but I now enabled both and will report back if
> just enabling the querier fixes this as well.

If there is one MLD querier on a link then it will enable IPv6 multicast snooping for any other bridge/switch capable of MLD snooping, too. No need to enable the MLD querier on all bridges/switches. You can use  "bridge -s -d mdb show" to check if it seems active.

As you mention issues with Android only: I believe this sounds like this Android bug: https://issuetracker.google.com/issues/149630944

Took them 3 years but finally they seem to have fixed it in Android last year. And will likely need many more years to trickle down (or will never reach your phone, because the vendor has abandoned it, like for my HTC U11 here...).

In Gluon (based on top of OpenWrt) we have added this workaround / hack to our WLAN routers: https://github.com/freifunk-gluon/gluon/blob/v2023.2.3/patches/openwrt/0005-kernel-bridge-Implement-MLD-Querier-wake-up-calls-Android-bug-workaround.patch

However this patch got rejected by upstream Linux: https://lists.infradead.org/pipermail/openwrt-devel/2020-August/030919.html

The suggestion was to reimplement this workaround as a BPF hook or netfilter hook instead of directly in the Linux bridge core.
Comment 19 Jakob 2024-09-10 09:50:44 UTC
Hi and thanks for the quick response!

It does not look like this specific issue. I did not see any MLD queries using tcpdump before enabling it on the bridge.
I'm also running LineageOS 21 (Android 14 based) on the latest Android Patch level on a Pixel 5 so the chances that I already have the fix are pretty high (although I'm not entirely certain about that as the Android build system is *slightly* complicated).

IPv6 also did not start working again if the device is awake - even if I keep it awake for half an hour. The only thing that fixed it temporarily was to reconnect to the WiFi.

Someone suggested to me that it might be nice to detect the absence of an MLD querier on the network and at least warn in the kernel log if multicast snooping is enabled but no querier is present (or even disable snooping automatically).
Comment 20 Linus Lüssing 2024-09-10 12:45:04 UTC
(In reply to Jakob from comment #19)
> It does not look like this specific issue. I did not see any MLD queries
> using tcpdump before enabling it on the bridge.

ah, okay

> Someone suggested to me that it might be nice to detect the absence of an
> MLD querier on the network and at least warn in the kernel log, if multicast
> snooping is enabled but no querier is present (or even disable snooping
> automatically).

The last part, the automatic detection and disabling on querier absence, should already have been happening for a decade now, since this commit: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b00589af3b04736376f24625ab0b394642e89e29

Are you sure it's a Linux bridge then which drops the multicast packets? Also make sure you use 'tcpdump ... "ip6 proto 0"' as libpcap won't jump over the IPv6 hop-by-hop header for the proto option. But you do see MLD reports in tcpdump, at least when some interface gets up and/or a new host joins the network etc.?

> Someone suggested to me that it might be nice to detect the absence of an MLD
> querier on the network and at least warn in the kernel log

Hm, indeed, might be helpful? At least I added such a warning in the kernel log for batman-adv (which uses the Linux bridge for multicast snooping now). But there is no warning/information in the kernel logs from the Linux bridge code itself for that. (But don't know if the bridge people would like such a warning/info from the bridge code.)
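
Background on the "ip6 proto 0" hint above: MLD messages carry a Hop-by-Hop Options header with a Router Alert option (RFC 2710, RFC 3810), so the Next Header field of the fixed IPv6 header is 0 rather than 58 (ICMPv6). A minimal Python sketch of walking past that header in a raw IPv6 packet (starting at the IPv6 header, without the Ethernet header). The function name is hypothetical, and only extension headers with the common "next header, length" layout are followed; fragment headers and others are deliberately not handled.

```python
ICMPV6 = 58
# Extension headers that start with (next header, length in 8-octet units
# not counting the first 8 octets): hop-by-hop, routing, destination options.
WALKABLE = (0, 43, 60)


def first_header_and_icmpv6_type(packet: bytes):
    """Return (Next Header of the fixed IPv6 header, ICMPv6 type or None)."""
    if len(packet) < 40 or packet[0] >> 4 != 6:
        raise ValueError("not an IPv6 packet")
    first_nh = packet[6]
    nh, offset = first_nh, 40
    while nh in WALKABLE:
        if offset + 2 > len(packet):
            return first_nh, None  # truncated extension header
        nh, offset = packet[offset], offset + (packet[offset + 1] + 1) * 8
    if nh == ICMPV6 and offset < len(packet):
        return first_nh, packet[offset]
    return first_nh, None
```

For an MLD query this reports a first Next Header of 0 together with ICMPv6 type 130, which is why a filter keyed on the first Next Header value has to ask for 0 to catch MLD.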
Comment 21 Jakob 2024-09-10 23:37:02 UTC
> The last part, the automatic detection and disabling on querier absence
> should already happen for a decade now, since this commit

That's strange. Can I somehow inspect the state of this mechanism on a running system (without compiling my own kernel with some "printf-debugging")?
Then I could get it into this state again and try to figure out what it currently thinks is happening.
I would prefer not to regularly reboot my NAS for custom kernels though (I'm not the only one using the internet connection that is routed by a VM on there so I'd have to do it late at night ^^').

> Are you sure it's a Linux bridge then which drops the multicast packets?

I'm fairly sure about that.
In the "nonfunctional" state I tried pinging the affected multicast address (one used for neighbour solicitation of my Android phone) from the host bridge port (vmbr0) while listening on the physical interface that is enslaved to this bridge using tcpdump (enp2s0).
This results in absolutely nothing.
Pinging the multicast address used for neighbour solicitation of my PC worked fine and I can see the relevant echo requests in the same tcpdump.

So whatever the bridge does to the affected multicast packets, they do not make it out of the correct bridge port.

> Also make sure you use 'tcpdump ... "ip6 proto 0"' as libpcap won't jump over
> the IPv6 hop-by-hop header for the proto option. But you do see MLD reports
> in tcpdump, at least when some interface gets up and/or a new host joins the
> network etc.?

Yes, but I used a different filter: `tcpdump ... "not tcp and not udp"`. While it may be a bit crude, it helped to filter out most traffic (thanks for telling me the proper filter ^^').
I just verified again that this bridge now is the only multicast querier on the network and that the default config is querier off but snooping on (which is the configuration that causes my issues - with the querier enabled it seems to be working, it's a bit too early for me to tell you for sure though).

> But don't know if the bridge people would like such a warning/info from the
> bridge code.

IMO it would be helpful - just make sure to not flood the kernel log (for example by logging it once on state transition).
