Bug 209487 - DSA: Regression with Marvell 88E6176: Excess traffic on slave ports
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: ARM Linux
Importance: P1 normal
Assignee: drivers_network@kernel-bugs.osdl.org
Reported: 2020-10-03 13:47 UTC by Klaus Kudielka
Modified: 2021-08-01 08:13 UTC
Kernel Version: 5.1 and above
Regression: Yes

Attachments
boot logs (92.77 KB, text/plain)
2020-10-03 13:47 UTC, Klaus Kudielka
node's port info (4.20 KB, text/plain)
2020-10-23 20:24 UTC, vtolkm
port info of affected device (7.73 KB, text/plain)
2020-10-26 18:23 UTC, Klaus Kudielka

Description Klaus Kudielka 2020-10-03 13:47:44 UTC
Created attachment 292789 [details]
boot logs

I am facing a regression on Turris Omnia (Armada 385), which is equipped with a Marvell 88E6176 switch, revision 1. DSA Configuration is according to the "bridge" showcase.

With 4.19.148, the switch is operating as expected.

With 5.4.x kernels, up to and including the current one (5.9-rc7), the switch is re-emitting Ethernet packets (a) received on any slave port and (b) addressed to its master port, to *all* other active slave ports. (It may re-emit in other cases as well, I haven't checked this).

This generates a HUGE amount of excess traffic on the slave ports. Effectively, the switch acts as a hub. This can easily be confirmed e.g. by "ping -i 0.1 turris-omnia" on one of the slave ports, and *all* active LAN port LEDs will start flashing on the Turris Omnia. With 4.19 kernels, this does NOT happen.

I am attaching boot logs for three different kernels, one for the "good" case, and two for the "bad" case.

Maybe related or not, the "bad" kernels detect the same switch 16 times (!), the "good" kernel (4.19.148) only once.
Comment 1 Klaus Kudielka 2020-10-05 18:21:15 UTC
After bisecting this, I realize that the default DSA behaviour has changed with kernel 5.1:

*****

From c138806344855cd7c094abbe7507832107e8171e Mon Sep 17 00:00:00 2001
From: Russell King <rmk+kernel@armlinux.org.uk>
Date: Wed, 20 Feb 2019 15:35:06 -0800
Subject: net: dsa: enable flooding for bridge ports

[...]

This commit enables by default flooding of both unknown unicast and
unknown multicast frames whenever a port is added to a bridge, and
disables the flooding when a port leaves the bridge.  This means that
mv88e6xxx DSA switches now behave as per the bridge(8) man page, and
IPv6 works flawlessly through such a switch.

*****

If I disable unicast flooding on the slave ports, I can confirm that packets addressed to the bridge master's MAC are no longer flooded to the slave ports. But this may break IPv6 connectivity, as the commit above explains.

So, what IMO is still not working correctly:

The bridge master's MAC address should be *known* to the DSA switch, and therefore unicasts to the master should NOT be flooding the slave ports, even when unicast flooding is enabled!?
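
To illustrate the expectation above, here is a minimal Python sketch of a learning switch with per-port unicast flooding. It is a toy model under stated assumptions, not the DSA or mv88e6xxx logic: the class name `ToySwitch`, the `cpu` port label, and the rule that the CPU port always receives unknown unicast are all hypothetical. The two MAC addresses are the ones that appear later in this report.

```python
from dataclasses import dataclass, field

CPU = "cpu"  # hypothetical label for the switch port facing the DSA master


@dataclass
class ToySwitch:
    """Simplified learning switch; an illustration, not the DSA/mv88e6xxx logic."""
    user_ports: list
    flood: dict = field(default_factory=dict)  # user port -> unknown-unicast flooding on?
    fdb: dict = field(default_factory=dict)    # destination MAC -> egress port

    def __post_init__(self):
        # Flooding defaults to "on" for every user port, as with bridge(8).
        for port in self.user_ports:
            self.flood.setdefault(port, True)

    def forward(self, in_port, src, dst):
        """Return the ports on which a unicast frame is sent out."""
        self.fdb[src] = in_port  # learn the source address
        known = self.fdb.get(dst)
        if known is not None:
            return [] if known == in_port else [known]
        # Unknown destination: assume the CPU port always receives it,
        # plus every user port that has flooding enabled.
        candidates = [CPU] + [p for p in self.user_ports if self.flood[p]]
        return [p for p in candidates if p != in_port]


BR0_MAC = "1e:1a:6c:ef:b9:7d"    # br0 address from this report
HOST2_MAC = "bc:ee:7b:e1:f0:91"  # host2 address from this report

sw = ToySwitch(["lan3", "lan4"])
flooded = sw.forward("lan4", HOST2_MAC, BR0_MAC)  # br0 MAC unknown to the switch
sw.fdb[BR0_MAC] = CPU                             # pretend the MAC is installed
direct = sw.forward("lan4", HOST2_MAC, BR0_MAC)
```

In this model `flooded` includes `lan3` (the behaviour reported here), while `direct` contains only the CPU port, which is what one would expect once the bridge MAC is known to the switch.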
Comment 2 vtolkm 2020-10-22 18:39:28 UTC
I deployed kernel 5.9.1 on a TO (Turris Omnia) and cannot reproduce the issue. Three points:

---

1) ping -i 0.1 

does not work in the first place (with iputils-ping 20200821-1 from OpenWrt), producing > ping: invalid number '0.1'

---

2) "turris-omnia": it is not clear which (master) interface you are pinging. Assuming it is IPv4 to br-lan, this

>  *all* active LAN port LEDs will start flashing on the Turris Omnia

does not reproduce but I compiled the kernel with:

CONFIG_LEDS_TURRIS_OMNIA=y

(only available for kernel 5.x currently) and LEDs stipulated accordingly in the device tree.

That aside, is there any other indicator than the observed LED flashing that 

>the switch is re-emitting Ethernet packets (a) received on any slave port and
>(b) addressed to its master port, to *all* other active slave ports

?

---

3) the multiple (16 times) switch detection during boot does not reproduce but then I do not compile the kernel with modules and also

# CONFIG_MODULES is not set

which could make a difference and which also negates the kmod loading mechanism of OpenWrt.
Comment 3 Klaus Kudielka 2020-10-23 17:43:09 UTC
Ok, I think I have to improve the procedure for reproducing the bug.
Let's forget about Turris Omnia LEDs for the moment, there are better
indicators.


----

1. You need at least three hosts, one of them being the Turris Omnia.

My example:

gateway: Turris Omnia (TO), upstream kernel 5.9.1, debian testing
	br0		192.168.1.1	bridge master
	eth1		-		DSA master
	lan0-lan4	-		DSA slave

host1: with tcpdump
	eth0		192.168.1.4	connected to gateway/lan3

host2: with ping
	eno1		192.168.1.151	connected to gateway/lan4

----

2. If not already done by your OS, configure your DSA bridge on the TO.
   Note the MAC address of br0!
   Just to be sure, enable unicast flooding (which is the kernel default).

root@gateway:~# bridge link show
5: lan0@eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br0 state forwarding priority 32 cost 19 
6: lan1@eth1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 master br0 state disabled priority 32 cost 100 
7: lan2@eth1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 master br0 state disabled priority 32 cost 100 
8: lan3@eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br0 state forwarding priority 32 cost 4 
9: lan4@eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br0 state forwarding priority 32 cost 4 
root@gateway:~# ip address show br0
10: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 1e:1a:6c:ef:b9:7d brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.1/24 brd 192.168.1.255 scope global br0
       valid_lft forever preferred_lft forever
root@gateway:~# bridge link set dev lan0 flood on
root@gateway:~# bridge link set dev lan1 flood on
root@gateway:~# bridge link set dev lan2 flood on
root@gateway:~# bridge link set dev lan3 flood on
root@gateway:~# bridge link set dev lan4 flood on


----

3. Run the following commands in parallel (note the different hosts):

root@host2:~# ping gateway
PING gateway (192.168.1.1) 56(84) bytes of data.
64 bytes from gateway (192.168.1.1): icmp_seq=1 ttl=64 time=0.404 ms
64 bytes from gateway (192.168.1.1): icmp_seq=2 ttl=64 time=0.380 ms
64 bytes from gateway (192.168.1.1): icmp_seq=3 ttl=64 time=0.378 ms

root@host1:~# tcpdump -n -i eth0 -e icmp 
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
17:49:40.263993 bc:ee:7b:e1:f0:91 > 1e:1a:6c:ef:b9:7d, ethertype IPv4 (0x0800), length 98: 192.168.1.151 > 192.168.1.1: ICMP echo request, id 2, seq 27, length 64
17:49:41.287978 bc:ee:7b:e1:f0:91 > 1e:1a:6c:ef:b9:7d, ethertype IPv4 (0x0800), length 98: 192.168.1.151 > 192.168.1.1: ICMP echo request, id 2, seq 28, length 64
17:49:42.311964 bc:ee:7b:e1:f0:91 > 1e:1a:6c:ef:b9:7d, ethertype IPv4 (0x0800), length 98: 192.168.1.151 > 192.168.1.1: ICMP echo request, id 2, seq 29, length 64

*** This should not happen. host1@lan3 is getting traffic from host2@lan4
*** to gateway. It seems the switch treats the MAC address of br0 as
*** "unknown", and therefore floods the other slaves.

*** As far as I can see, *all* packets from a host on the bridge to the
*** gateway (or further on to the Internet) are being flooded to *all* other
*** hosts on the bridge. Not quite what I would expect from a gateway.


4. Nasty workaround: Disable unicast flooding

root@gateway:~# bridge link set dev lan0 flood off
root@gateway:~# bridge link set dev lan1 flood off
root@gateway:~# bridge link set dev lan2 flood off
root@gateway:~# bridge link set dev lan3 flood off
root@gateway:~# bridge link set dev lan4 flood off


5. Repeat step 3

root@host2:~# ping gateway
PING gateway (192.168.1.1) 56(84) bytes of data.
64 bytes from gateway (192.168.1.1): icmp_seq=1 ttl=64 time=0.408 ms
64 bytes from gateway (192.168.1.1): icmp_seq=2 ttl=64 time=0.421 ms
64 bytes from gateway (192.168.1.1): icmp_seq=3 ttl=64 time=0.389 ms

root@host1:~# tcpdump -n -i eth0 -e icmp 
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
[ No further output ]

*** With unicast flooding disabled, we have the expected behaviour.
*** But, as Russell King explains in commit c1388063, this may break
*** IPv6 connectivity in subtle ways, so is *not* recommended.
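
As a hedged aid for step 3, the following Python sketch scans `tcpdump -e` output for unicast frames that involve neither the capturing host nor a group (broadcast/multicast) address. Such frames on host1 indicate flooding by the switch. The function names and the MAC address used for host1 are hypothetical (host1's real address is not given in this report); the sample line is copied from the capture above.

```python
import re

# Link-level header as printed by `tcpdump -e`: "<src MAC> > <dst MAC>,"
MAC = r"(?:[0-9a-f]{2}:){5}[0-9a-f]{2}"
FRAME_RE = re.compile(rf"({MAC}) > ({MAC}),", re.IGNORECASE)


def is_group_address(mac):
    """True for multicast/broadcast: lowest bit of the first octet is set."""
    return int(mac.split(":")[0], 16) & 1 == 1


def leaked_frames(lines, own_mac):
    """Return (src, dst) pairs of unicast frames not addressed to or sent by own_mac."""
    own = own_mac.lower()
    leaked = []
    for line in lines:
        match = FRAME_RE.search(line)
        if not match:
            continue  # not a frame line (e.g. the tcpdump banner)
        src, dst = match.group(1).lower(), match.group(2).lower()
        if own in (src, dst) or is_group_address(dst):
            continue  # legitimate traffic for this host
        leaked.append((src, dst))
    return leaked


HOST1_MAC = "02:00:00:00:00:04"  # hypothetical; substitute the real eth0 address
capture = [
    "listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes",
    "17:49:40.263993 bc:ee:7b:e1:f0:91 > 1e:1a:6c:ef:b9:7d, ethertype IPv4 (0x0800), "
    "length 98: 192.168.1.151 > 192.168.1.1: ICMP echo request, id 2, seq 27, length 64",
]
leaks = leaked_frames(capture, HOST1_MAC)
```

An empty result corresponds to the "No further output" case in step 5; any entry means host1 saw traffic that was meant for another station.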
Comment 4 vtolkm 2020-10-23 20:24:55 UTC
Created attachment 293161 [details]
node's port info

If I understand correctly the kernel is compiled from linux source your end and not sourced from the debian distro. If so that is the same my end, difference the userland being sourced from OpenWrt instead of debian.

Attached are details about the network/ports on this node, notably "flood on" (Controls whether a given port will flood unicast traffic for which there is no FDB entry. By default this flag is on) on all DSA downstream ports (lan 0 - 4).

br-lan enslaves lan 0|1|2

* downstream client on lan1 pinging br-lan ipv4 (192.168.84.23)
* remote ssh packet dump (wireshark) for lan0  with downstream client on lan0

The packet dump did not (re)produce your observation of frames being (re)flooded to lan0.

I even tried explicitly with:

bridge link set dev lan0 flood on
bridge link set dev lan1 flood on

but same result.
Comment 5 Klaus Kudielka 2020-10-26 18:23:22 UTC
Created attachment 293223 [details]
port info of affected device
Comment 6 Klaus Kudielka 2020-10-26 18:33:49 UTC
> If I understand correctly the kernel is compiled from linux source your end
> and not sourced from the debian distro. If so that is the same my end, 
> difference the userland being sourced from OpenWrt instead of debian.

Correct. Kernel source from kernel.org, kernel config from debian (armmp).

> Attached are details about the network/ports on this node, notably "flood on"

For comparison, I have attached similar details about the ports involved on my
(affected) device. At that time, only lan0 & lan4 ports were active.

At a first glance, I cannot find a reason for the different behaviour.
Comment 7 vtolkm 2020-10-26 18:54:53 UTC
Theoretically I suppose there could be four (separate or combined) causes that produce different results:

(1) different kernel conf
(2) different sysctl conf (which I suspect might be the case)
(3) different netfilter userland (on my node it is nftables)
(4) other userland differences

If you are interested I could share (1) and (2) from my end with you via email.
Comment 8 Klaus Kudielka 2020-10-29 07:44:08 UTC
> If you are interested I could share (1) and (2) from my end with you via
> email.

I don't think that makes sense at the moment.

Just to be absolutely sure, I'd like to calibrate our test methods first.

I found a relatively simple method to have same kernel, user land, and configuration on your & my device: Use plain OpenWrt master, which is affected
as well.

Step 1:
=======

I just installed the following image, and booted it.

https://downloads.openwrt.org/snapshots/targets/mvebu/cortexa9/openwrt-mvebu-cortexa9-cznic_turris-omnia-sysupgrade.img.gz

(A less intrusive but otherwise identical way would be to directly boot
cznic_turris-omnia-initramfs-kernel.bin instead. You would need to extract the DTB from the sysupgrade image or the medkit.)

No additional install, no configuration. Just boot the image.

Step 2:
=======

Plug two computers into the LAN ports, generate traffic on one port (e.g. ssh 192.168.1.1), and watch it on the other port. If you still don't see it, please double check whether your Wireshark is in promiscuous mode. A very simple but effective capture filter rule could be: "ether host not <my_mac_address>".


Would you be able to reproduce that? Okay, it's not the latest kernel, but let's make one step at a time...
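
For step 2, a stricter variant of the capture filter can be built as sketched below. It is only a suggestion, not the filter used in this report: it additionally drops broadcast and multicast frames so that only foreign unicast traffic remains. The helper name `foreign_unicast_filter` is hypothetical; `ether host`, `ether broadcast` and `ether multicast` are standard pcap-filter primitives.

```python
import re

MAC_RE = re.compile(r"^(?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}$")


def foreign_unicast_filter(own_mac):
    """Build a pcap capture filter that keeps only unicast frames
    neither sent by nor addressed to own_mac."""
    if not MAC_RE.match(own_mac):
        raise ValueError(f"not a MAC address: {own_mac!r}")
    # "ether multicast" already covers broadcast (group bit set);
    # "ether broadcast" is listed explicitly only for readability.
    return (
        f"not ether host {own_mac.lower()} "
        "and not ether broadcast and not ether multicast"
    )


# Hypothetical address of the capturing host; substitute your own.
capture_filter = foreign_unicast_filter("02:00:00:00:00:04")
```

The resulting string can be pasted into the Wireshark capture filter field or passed to tcpdump as the filter expression. With this filter in place, any captured frame is traffic that the switch should not have delivered to this port.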
Comment 9 vtolkm 2020-10-29 21:11:30 UTC
Pardon me, but I would rather not test this with the OpenWrt kernel, since the distro patches the kernel heavily, whereas this is the bugzilla for the Linux kernel.

I checked and made sure that the LAN port from which traffic is being dumped is indeed in promiscuous mode, e.g. STP traffic is visible but nothing from the node(s) connected to the other LAN port(s).

It would seem that I cannot contribute anything useful here; it might even be that I am somehow not replicating your setup exactly and thus fail to reproduce your observation.
Comment 10 Klaus Kudielka 2021-02-03 19:35:22 UTC
For reference, it seems the issue is being worked on:

https://lore.kernel.org/netdev/20210116012515.3152-1-tobias@waldekranz.com/
Comment 11 Klaus Kudielka 2021-08-01 08:12:10 UTC
The issue seems to be finally solved by acceptance of the "RX filtering in DSA" patch series (thanks to Tobias Waldekranz & Vladimir Oltean).

https://lore.kernel.org/netdev/20210629140658.2510288-1-olteanv@gmail.com/

I tested this with 5.14-rc3 on a Turris Omnia. Traffic from a DSA switch port, addressed to the bridge, is not flooded anymore to other DSA switch ports - even if unicast flooding is turned on.
Comment 12 Klaus Kudielka 2021-08-01 08:13:20 UTC
Setting the status to "resolved".
