Bug 209487
| Summary: | DSA: Regression with Marvell 88E6176: Excess traffic on slave ports | | |
|---|---|---|---|
| Product: | Drivers | Reporter: | Klaus Kudielka (klaus.kudielka) |
| Component: | Network | Assignee: | drivers_network (drivers_network) |
| Status: | RESOLVED CODE_FIX | | |
| Severity: | normal | CC: | klaus.kudielka |
| Priority: | P1 | | |
| Hardware: | ARM | | |
| OS: | Linux | | |
| Kernel Version: | 5.1 and above | Subsystem: | |
| Regression: | Yes | Bisected commit-id: | |
| Attachments: | boot logs, node's port info, port info of affected device | | |
After bisecting this, I realize that the default DSA behaviour changed with kernel 5.1:

```
From c138806344855cd7c094abbe7507832107e8171e Mon Sep 17 00:00:00 2001
From: Russell King <rmk+kernel@armlinux.org.uk>
Date: Wed, 20 Feb 2019 15:35:06 -0800
Subject: net: dsa: enable flooding for bridge ports

[...]

This commit enables by default flooding of both unknown unicast and
unknown multicast frames whenever a port is added to a bridge, and
disables the flooding when a port leaves the bridge.

This means that mv88e6xxx DSA switches now behave as per the bridge(8)
man page, and IPv6 works flawlessly through such a switch.
```

If I disable unicast flooding on the slave ports, I can confirm that packets addressed to the bridge master's MAC are no longer flooded to the slave ports. But this may break IPv6 connectivity, as the commit above explains.

So, what in my opinion is still not working correctly: the bridge master's MAC address should be *known* to the DSA switch, and therefore unicasts to the master should NOT be flooded to the slave ports, even when unicast flooding is enabled.

I have kernel 5.9.1 deployed on a TO and cannot reproduce the issue. My findings are three-fold:

---

1) `ping -i 0.1` does not work in the first place (with iputils-ping 20200821-1 from OpenWrt), producing

> ping: invalid number '0.1'

---

2) With "turris-omnia" it is not clear which (master) interface you are pinging. Assuming this is IPv4 to br-lan, the observation

> *all* active LAN port LEDs will start flashing on the Turris Omnia

does not reproduce here, although I did compile the kernel with CONFIG_LEDS_TURRIS_OMNIA=y (only available for kernel 5.x currently) and the LEDs are defined accordingly in the device tree.

That aside, is there any other indicator than the observed LED flashing that

> the switch is re-emitting Ethernet packets (a) received on any slave port and
> (b) addressed to its master port, to *all* other active slave ports

?

---

3) The multiple (16 times) switch detection during boot does not reproduce, but then I do not compile the kernel with modules ("# CONFIG_MODULES is not set"), which could make a difference and which also negates the kmod loading mechanism of OpenWrt.

Ok, I think I have to improve the procedure for reproducing the bug. Let's forget about the Turris Omnia LEDs for the moment; there are better indicators.

----

1. You need at least three hosts, one of them being the Turris Omnia. My example:

```
gateway:  Turris Omnia (TO), upstream kernel 5.9.1, debian testing
          br0        192.168.1.1   - bridge master
          eth1                     - DSA master
          lan0-lan4                - DSA slaves
host1:    with tcpdump
          eth0       192.168.1.4   - connected to gateway/lan3
host2:    with ping
          eno1       192.168.1.151 - connected to gateway/lan4
```

----

2. If not already done by your OS, configure your DSA bridge on the TO. Note the MAC address of br0! Just to be sure, enable unicast flooding (which is the kernel default).
```
root@gateway:~# bridge link show
5: lan0@eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br0 state forwarding priority 32 cost 19
6: lan1@eth1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 master br0 state disabled priority 32 cost 100
7: lan2@eth1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 master br0 state disabled priority 32 cost 100
8: lan3@eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br0 state forwarding priority 32 cost 4
9: lan4@eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br0 state forwarding priority 32 cost 4

root@gateway:~# ip address show br0
10: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 1e:1a:6c:ef:b9:7d brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.1/24 brd 192.168.1.255 scope global br0
       valid_lft forever preferred_lft forever

root@gateway:~# bridge link set dev lan0 flood on
root@gateway:~# bridge link set dev lan1 flood on
root@gateway:~# bridge link set dev lan2 flood on
root@gateway:~# bridge link set dev lan3 flood on
root@gateway:~# bridge link set dev lan4 flood on
```

----

3. Run the following commands in parallel (note the different hosts):

```
root@host2:~# ping gateway
PING gateway (192.168.1.1) 56(84) bytes of data.
64 bytes from gateway (192.168.1.1): icmp_seq=1 ttl=64 time=0.404 ms
64 bytes from gateway (192.168.1.1): icmp_seq=2 ttl=64 time=0.380 ms
64 bytes from gateway (192.168.1.1): icmp_seq=3 ttl=64 time=0.378 ms
```

```
root@host1:~# tcpdump -n -i eth0 -e icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
17:49:40.263993 bc:ee:7b:e1:f0:91 > 1e:1a:6c:ef:b9:7d, ethertype IPv4 (0x0800), length 98: 192.168.1.151 > 192.168.1.1: ICMP echo request, id 2, seq 27, length 64
17:49:41.287978 bc:ee:7b:e1:f0:91 > 1e:1a:6c:ef:b9:7d, ethertype IPv4 (0x0800), length 98: 192.168.1.151 > 192.168.1.1: ICMP echo request, id 2, seq 28, length 64
17:49:42.311964 bc:ee:7b:e1:f0:91 > 1e:1a:6c:ef:b9:7d, ethertype IPv4 (0x0800), length 98: 192.168.1.151 > 192.168.1.1: ICMP echo request, id 2, seq 29, length 64
```

*** This should not happen. host1@lan3 is getting traffic from host2@lan4
*** to gateway. It seems the switch treats the MAC address of br0 as
*** "unknown", and therefore floods the other slaves.
*** As far as I can see, *all* packets from a host on the bridge to the
*** gateway (or further on to the Internet) are being flooded to *all* other
*** hosts on the bridge. Not quite what I would expect from a gateway.

4. Nasty workaround: disable unicast flooding.

```
root@gateway:~# bridge link set dev lan0 flood off
root@gateway:~# bridge link set dev lan1 flood off
root@gateway:~# bridge link set dev lan2 flood off
root@gateway:~# bridge link set dev lan3 flood off
root@gateway:~# bridge link set dev lan4 flood off
```

5. Repeat step 3.

```
root@host2:~# ping gateway
PING gateway (192.168.1.1) 56(84) bytes of data.
64 bytes from gateway (192.168.1.1): icmp_seq=1 ttl=64 time=0.408 ms
64 bytes from gateway (192.168.1.1): icmp_seq=2 ttl=64 time=0.421 ms
64 bytes from gateway (192.168.1.1): icmp_seq=3 ttl=64 time=0.389 ms
```

```
root@host1:~# tcpdump -n -i eth0 -e icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
[ No further output ]
```

*** With unicast flooding disabled, we have the expected behaviour.
*** But, as Russell King explains in commit c1388063, this may break
*** IPv6 connectivity in subtle ways, so it is *not* recommended.
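As an additional check of the "unknown MAC" theory, it may be worth looking for the bridge's own MAC address in the forwarding databases; a minimal sketch, assuming iproute2's bridge tool (how, or whether, hardware-known entries are flagged depends on the kernel and iproute2 versions):

```
# On the gateway: search all forwarding-database entries for the MAC of
# br0 (1e:1a:6c:ef:b9:7d, taken from "ip address show br0" in step 2).
BR_MAC=1e:1a:6c:ef:b9:7d
bridge fdb show | grep -i "$BR_MAC"
# If the address only appears as a local/permanent entry on br0 itself,
# and never as an entry associated with the switch ports, the hardware
# has no choice but to treat unicast to that address as unknown and
# flood it out of every port that has flooding enabled.
```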
Created attachment 293161 [details]
node's port info
If I understand correctly, the kernel on your end is compiled from the linux source and not sourced from the debian distro. If so, that is the same on my end, the difference being the userland sourced from OpenWrt instead of debian.
Attached are details about the network/ports on this node, notably "flood on" (controls whether a given port will flood unicast traffic for which there is no FDB entry; by default this flag is on) on all DSA downstream ports (lan0 - lan4).
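For reference, the per-port flag can also be read back directly; a small sketch, assuming a reasonably recent iproute2 (older versions may not print these per-port details):

```
# Detailed per-port bridge options for one DSA slave port, including the
# unknown-unicast "flood" flag and the unknown-multicast "mcast_flood" flag.
bridge -d link show dev lan0
```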
br-lan enslaves lan 0|1|2
* downstream client on lan1 pinging br-lan ipv4 (192.168.84.23)
* remote ssh packet dump (wireshark) for lan0 with downstream client on lan0
The packet dump did not (re)produce your observation of frames being (re)flooded to lan0.
Even tried explicitly with:

```
bridge link set dev lan0 flood on
bridge link set dev lan1 flood on
```

but same result.
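In case the capture setup itself is the difference, this is roughly the command-line equivalent of that Wireshark capture on the downstream client; a sketch only, with the interface name and MAC address as placeholders for the client's own:

```
# On the client attached to the port that should stay quiet: show every
# frame that is neither sent to nor from this client's own MAC, so only
# traffic that has no business arriving here remains visible.
tcpdump -n -e -i eth0 'not ether host aa:bb:cc:dd:ee:ff'
```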
Created attachment 293223 [details]
port info of affected device
> If I understand correctly, the kernel on your end is compiled from the linux
> source and not sourced from the debian distro. If so, that is the same on my
> end, the difference being the userland sourced from OpenWrt instead of debian.

Correct. Kernel source from kernel.org, kernel config from debian (armmp).

> Attached are details about the network/ports on this node, notably "flood on"

For comparison, I have attached similar details about the ports involved on my (affected) device. At that time, only the lan0 & lan4 ports were active.

At first glance, I cannot find a reason for the different behaviour. Theoretically, I suppose there could be four (separate or combined) causes that produce different results:

(1) different kernel config
(2) different sysctl config (which I suspect might be the case)
(3) different netfilter userland (on my node it is nftables)
(4) other userland differences

If you are interested, I could share (1) and (2) from my end with you via email.

> If you are interested, I could share (1) and (2) from my end with you via
> email.

I don't think that makes sense at the moment. Just to be absolutely sure, I'd like to calibrate our test methods first. I found a relatively simple method to have the same kernel, userland, and configuration on your & my device: use plain OpenWrt master, which is affected as well.

Step 1:
=======
I just installed the following image and booted it:
https://downloads.openwrt.org/snapshots/targets/mvebu/cortexa9/openwrt-mvebu-cortexa9-cznic_turris-omnia-sysupgrade.img.gz

(A less intrusive but otherwise identical way would be to directly boot cznic_turris-omnia-initramfs-kernel.bin instead. You would need to extract the DTB from the sysupgrade image or the medkit.)

No additional install, no configuration. Just boot the image.

Step 2:
=======
Plug two computers into the LAN ports, generate traffic on one port (e.g. ssh 192.168.1.1), and watch it on the other port.

If you still don't see it, please double-check whether your Wireshark is in promiscuous mode. A very simple but effective capture filter rule could be: "ether host not <my_mac_address>".

Would you be able to reproduce that? Okay, it's not the latest kernel, but let's take one step at a time...

Pardon me, but I would not accommodate your suggestion about testing this with the OpenWrt kernel, since that distro patches the kernel heavily, whilst this is the bugzilla for the Linux kernel.

Checked and made sure that the lan port the traffic is being dumped from is indeed in promiscuous mode; e.g. STP traffic is visible, but nothing from the node(s) connected to the other lan port(s).

It would seem that I cannot contribute anything helpful/useful here; it might even be that somehow I am not replicating your setup exactly and thus fail to reproduce your observation.

For reference, it seems the issue is being worked on:
https://lore.kernel.org/netdev/20210116012515.3152-1-tobias@waldekranz.com/

The issue seems to be finally solved by acceptance of the "RX filtering in DSA" patch series (thanks to Tobias Waldekranz & Vladimir Oltean):
https://lore.kernel.org/netdev/20210629140658.2510288-1-olteanv@gmail.com/

I tested this with 5.14-rc3 on a Turris Omnia. Traffic from a DSA switch port, addressed to the bridge, is no longer flooded to other DSA switch ports - even if unicast flooding is turned on.

Setting the status to "resolved".
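For completeness, a condensed sketch of how the fix can be re-verified with flooding left at its default, reusing the hosts and addresses from the reproduction procedure above (illustrative commands, not a transcript):

```
# On the gateway: confirm the step-4 workaround is not in effect, i.e.
# unknown-unicast flooding is still enabled on the slave ports.
bridge -d link show dev lan4

# On host2: generate unicast traffic to the bridge address.
ping -c 10 192.168.1.1

# On host1: with the RX filtering series applied, no echo requests from
# host2 should show up here anymore, even with flooding enabled.
tcpdump -n -e -i eth0 icmp
```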
Created attachment 292789 [details]
boot logs

I am facing a regression on the Turris Omnia (Armada 385), which is equipped with a Marvell 88E6176 switch, revision 1. The DSA configuration is according to the "bridge" showcase.

With 4.19.148, the switch is operating as expected. With 5.4.x kernels, up to and including the current one (5.9-rc7), the switch is re-emitting Ethernet packets (a) received on any slave port and (b) addressed to its master port, to *all* other active slave ports. (It may re-emit in other cases as well; I haven't checked this.) This generates a HUGE amount of excess traffic on the slave ports. Effectively, the switch acts as a hub.

This can easily be confirmed e.g. by "ping -i 0.1 turris-omnia" on one of the slave ports: *all* active LAN port LEDs will start flashing on the Turris Omnia. With 4.19 kernels, this does NOT happen.

I am attaching boot logs for three different kernels, one for the "good" case and two for the "bad" case. Maybe related or not, the "bad" kernels detect the same switch 16 times (!); the "good" kernel (4.19.148) detects it only once.
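An LED-independent way to observe the same effect is to watch the egress counters of one of the other active slave ports while the ping runs; a rough sketch, assuming the switch driver exposes its hardware MIB counters through ethtool (counter names differ between drivers, and lan3 here is just an example port):

```
# On the Turris Omnia: sample the egress counters of an active slave port
# that the ping traffic is not addressed to, then sample again while
# "ping -i 0.1 turris-omnia" runs against another port.
ethtool -S lan3 | grep -i out_
sleep 10
ethtool -S lan3 | grep -i out_
# Egress counters climbing in step with the ping rate on a port the
# traffic is not destined for indicate the frames are being flooded there.
```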