Hi,

we tried to set up VLANs and bridges on top of the bonding device.

Creating a bond over eth0 and eth1:

  eth0 \
        --> bond0
  eth1 /

Adding VLANs 1, 2 and 3 on top of the bond:

         --> bond0.1
        /
  bond0 ---> bond0.2
        \
         --> bond0.3

And finally adding bridges on top of the VLAN devices:

  bond0.1 -> br1
  bond0.2 -> br2
  bond0.3 -> br3

Those three VLANs are all tagged. br1 will also get the default route.

The ARP monitoring stops working as soon as a bridge gets the default route (or at least the route to the ARP IP target).

<snip>
[76655.096076] bonding: bond0: no path to arp_ip_target 172.16.0.1 via rt.dev br1
</snip>

I then tried it with just the VLANs (no bridges) and it works fine. Then I tried it with just bridges and no VLANs, and it doesn't work.

Here are some steps to reproduce it:

  ifconfig eth0 0.0.0.0 up
  ifconfig eth1 0.0.0.0 up
  modprobe bonding mode="active-backup" primary="eth0" arp_interval=3000 arp_ip_target="172.16.0.1"
  ifconfig bond0 0.0.0.0 up
  ifenslave bond0 eth0
  ifenslave bond0 eth1
  brctl addbr br1
  brctl addif br1 bond0
  ifconfig br1 172.16.0.2/28 up

With tcpdump you'll see that there are no ARP requests/pings at all, and in either dmesg or the kernel log you'll notice the "no path to ..." warnings.

So I took a look at the bonding driver sources (mainly bond_main.c):

The bonding driver looks up the route to the arp_ip_target, whose output device is br1 in this case, and then compares that device against the bond device and its VLAN devices (bond0, bond0.1, bond0.2 and so forth). So neither bond->dev nor vlan_dev will ever be the same as rt->dst.dev as long as the route points at the bridge. There is no check for bridged devices at all.

The driver should get an event/notification when the bond device becomes a member of a bridge or is removed from one, basically the same as what has been added for the VLAN handling.

Just let me know if you need anything else.

Thanks in advance.
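To illustrate the device comparison described above, here is a minimal user-space sketch in C. It is not the code from bond_main.c. The names struct dev, route_matches_reported() and route_matches_bridge_aware() are hypothetical, and the bridge-aware variant is only one assumed way such a check could look, not the actual fix.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for the kernel's struct net_device. */
enum dev_kind { DEV_PHYS, DEV_BOND, DEV_VLAN, DEV_BRIDGE };

struct dev {
    const char *name;
    enum dev_kind kind;
    const struct dev *lower;   /* for a VLAN: the device it sits on, else NULL */
    const struct dev *master;  /* bridge this device is a port of, else NULL */
};

/*
 * Check as described in the report: the route's output device must be
 * the bond itself or a VLAN device directly on top of the bond.
 * A bridge such as br1 never matches.
 */
static bool route_matches_reported(const struct dev *bond,
                                   const struct dev *rt_dev)
{
    if (rt_dev == bond)
        return true;
    return rt_dev->kind == DEV_VLAN && rt_dev->lower == bond;
}

/* True if d is the route device or is a port of the route device. */
static bool reaches(const struct dev *d, const struct dev *rt_dev)
{
    return d == rt_dev || (d->master != NULL && d->master == rt_dev);
}

/*
 * Assumed bridge-aware variant: additionally accept a bridge that has
 * the bond, or one of the bond's VLAN devices, as a port.
 */
static bool route_matches_bridge_aware(const struct dev *bond,
                                       const struct dev *rt_dev,
                                       const struct dev *const *vlans,
                                       size_t n_vlans)
{
    if (reaches(bond, rt_dev))
        return true;
    for (size_t i = 0; i < n_vlans; i++) {
        if (vlans[i]->lower == bond && reaches(vlans[i], rt_dev))
            return true;
    }
    return false;
}
```

With bond0.1 enslaved to br1 and the route pointing at br1, the first function returns false (which corresponds to the "no path to arp_ip_target" warning), while the second one returns true.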
The related thread on the ML: http://marc.info/?t=136509410200004&r=1&w=2
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/drivers/net/bonding?id=27bc11e63888c7cb0bd6d443e98775254cf7dbdd

It looks like it has been fixed. At least I cannot reproduce it anymore.

Thanks!