Bug 56221 - bonding: ARP monitoring doesn't work with bridges
Status: RESOLVED CODE_FIX
Alias: None
Product: Networking
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 normal
Assignee: Stephen Hemminger
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-04-04 13:57 UTC by Christian Ruppert
Modified: 2015-06-23 14:27 UTC

See Also:
Kernel Version: 3.8.5
Subsystem:
Regression: No
Bisected commit-id:


Description Christian Ruppert 2013-04-04 13:57:49 UTC
Hi,

we tried to set up VLANs and bridges on top of a bonding device.

Creating a bond over eth0 and eth1:
eth0 \
      --> bond0
eth1 /

Adding VLANs 1, 2 and 3 on top of the bond:
       --> bond0.1
      /
bond0 ---> bond0.2
      \
       --> bond0.3

And finally adding bridges on top of the VLAN devices:
bond0.1 -> br1
bond0.2 -> br2
bond0.3 -> br3

Those three VLANs are all tagged. br1 will also get the default route.

ARP monitoring stops working as soon as a bridge carries the default route (or at least the route to the arp_ip_target).
<snip>
[76655.096076] bonding: bond0: no path to arp_ip_target 172.16.0.1 via rt.dev br1
</snip>

I then tried it with just the VLANs (no bridges) and it worked fine.
Then I tried it with just a bridge and no VLANs, and it failed in the same way.

Here are some steps to reproduce it:
ifconfig eth0 0.0.0.0 up
ifconfig eth1 0.0.0.0 up

modprobe bonding mode="active-backup" primary="eth0" arp_interval=3000 arp_ip_target="172.16.0.1"

ifconfig bond0 0.0.0.0 up
ifenslave bond0 eth0
ifenslave bond0 eth1

brctl addbr br1 
brctl addif br1 bond0
ifconfig br1 172.16.0.2/28 up

With tcpdump you'll see that no ARP requests are sent at all, and in dmesg or the kernel log you'll notice the "no path to ..." warnings.

So I took a look at the bonding driver sources (mainly bond_main.c):
The bonding driver looks up the route to the arp_ip_target, which points to br1 in this case, and then compares the route's device against the bond device and the VLAN devices on top of it (bond0 and/or bond0.1, bond0.2 and so forth). So neither bond->dev nor vlan_dev will ever equal rt->dst.dev as long as the route is attached to the bridge.
There is no check for bridge devices at all.
The driver should be notified when the bond becomes a member of a bridge or is removed from one, basically the same as what has been added for the VLAN handling.
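
To illustrate the comparison described above, here is a minimal user-space model in Python. It is not kernel code. The names NetDev, path_found_old and path_found_walk are made up for this sketch, and the second function is only one possible way to accept stacked devices; I am not claiming it is what the driver does or should do.

```python
# Simplified user-space model of the device comparison; NOT kernel code.
# All names here (NetDev, path_found_old, path_found_walk) are hypothetical.
from dataclasses import dataclass, field
from typing import List


@dataclass
class NetDev:
    name: str
    is_vlan: bool = False
    # Devices stacked directly on top of this one (VLANs, bridges, ...).
    uppers: List["NetDev"] = field(default_factory=list)

    def stack(self, upper: "NetDev") -> "NetDev":
        """Put `upper` on top of this device and return it."""
        self.uppers.append(upper)
        return upper


def path_found_old(bond: NetDev, route_dev: NetDev) -> bool:
    """Model of the check as described in the report: the route's device
    must be the bond itself or a VLAN device directly on top of it."""
    if route_dev is bond:
        return True
    return any(up.is_vlan and up is route_dev for up in bond.uppers)


def path_found_walk(bond: NetDev, route_dev: NetDev) -> bool:
    """Assumed alternative: accept any device stacked above the bond,
    no matter how many layers (VLAN, bridge, ...) are in between."""
    pending = [bond]
    seen = set()
    while pending:
        dev = pending.pop()
        if dev is route_dev:
            return True
        if id(dev) in seen:
            continue
        seen.add(id(dev))
        pending.extend(dev.uppers)
    return False


# Topology from the report: bond0 -> bond0.1 -> br1
bond0 = NetDev("bond0")
vlan1 = bond0.stack(NetDev("bond0.1", is_vlan=True))
br1 = vlan1.stack(NetDev("br1"))

# Topology from the reproduction steps: bond0 -> br1 (no VLAN)
plain_bond = NetDev("bond0")
plain_br = plain_bond.stack(NetDev("br1"))

if __name__ == "__main__":
    for label, bond, rt_dev in [
        ("route via VLAN", bond0, vlan1),
        ("route via bridge on VLAN", bond0, br1),
        ("route via bridge on bond", plain_bond, plain_br),
    ]:
        print(label, path_found_old(bond, rt_dev), path_found_walk(bond, rt_dev))
```

In this model the old check only succeeds when the route points at the bond or at a VLAN directly on it, which matches what I see: VLANs alone work, anything involving a bridge ends in "no path to arp_ip_target".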

Just let me know if you need anything else. Thanks in advance.
Comment 1 Christian Ruppert 2014-02-21 14:57:22 UTC
The related thread on the ML: http://marc.info/?t=136509410200004&r=1&w=2
Comment 2 Christian Ruppert 2015-06-23 14:27:16 UTC
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/drivers/net/bonding?id=27bc11e63888c7cb0bd6d443e98775254cf7dbdd

It looks like this has been fixed. At least I can no longer reproduce it. Thanks!
