On Fedora 21, with kernel 3.17.6-300.fc21.x86_64 and iproute-3.16.0-3 (installed from rpm):

    ip -V: ip utility, iproute2-ss140804

Running in one terminal:

    ip monitor all

and then running this sequence in a second terminal:

    ip link add br0 type bridge
    bridge vlan add vid 10 dev br0 self

causes "ip monitor all" to terminate with "EOF on netlink".

This also happens on older Fedora releases (Fedora 20 and earlier) with older kernels.

It seems the reason is that skb->len is 0 for the netlink notification sent by rtnl_notify(), which is invoked from rtnl_bridge_notify(), which in turn is invoked from rtnl_bridge_setlink(). See:
http://lxr.free-electrons.com/source/net/core/rtnetlink.c#L2773

Rami Rosen
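The termination reported above can be illustrated with a small model of a monitor's receive loop. This is only a sketch in Python, not iproute2's code: the function listen(), the payload strings and the "input exhausted" status are hypothetical. The only behaviour taken from the report is that a zero-length message ends the loop with "EOF on netlink".

```python
from typing import Callable, Iterable, List


def listen(datagrams: Iterable[bytes], handler: Callable[[bytes], None]) -> str:
    """Hypothetical model of a monitor loop reading netlink datagrams.

    A zero-length read is treated as end-of-file, which matches the
    behaviour reported for "ip monitor all".
    """
    for data in datagrams:
        if len(data) == 0:
            # An empty notification looks the same as EOF to this loop.
            return "EOF on netlink"
        handler(data)
    return "input exhausted"


seen: List[bytes] = []
# Made-up payloads: a normal notification, an empty one, and one never read.
status = listen([b"newlink br0", b"", b"newlink eth0"], seen.append)
print(status)
print(seen)
```

In this model the third message is never delivered, because the loop has already stopped on the empty one.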
The reason for the zero-length message in this case is that the user is sending the setlink request to the bridge with the self flag set. Since getlink on the bridge device only returns bytes when the device is a bridge port, there are no bytes in the skb.

Two fixes are needed:
- one is the skb->len check
- the other is to fix the bridge driver's ndo_bridge_getlink to return the correct vlan changes for the bridge device
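The first of the two fixes can be sketched as a guard in the notify step. This is a Python model under stated assumptions, not the kernel patch: bridge_notify(), fill_link_info and send are hypothetical stand-ins for rtnl_bridge_notify(), the driver's ndo_bridge_getlink callback and rtnl_notify(), and a bytearray stands in for the skb.

```python
from typing import Callable, List


def bridge_notify(fill_link_info: Callable[[bytearray], int],
                  send: Callable[[bytes], None]) -> bool:
    """Build a notification and send it only if it carries data."""
    skb = bytearray()
    err = fill_link_info(skb)
    if err < 0:
        return False
    if len(skb) == 0:
        # The skb->len check: nothing was filled in, so skip the send.
        return False
    send(bytes(skb))
    return True


def getlink_for_non_port(skb: bytearray) -> int:
    # Models a getlink callback that returns 0 without writing anything.
    return 0


def getlink_for_port(skb: bytearray) -> int:
    # Models a getlink callback that writes some (made-up) vlan info.
    skb.extend(b"vlan 10")
    return 0


sent: List[bytes] = []
bridge_notify(getlink_for_non_port, sent.append)
bridge_notify(getlink_for_port, sent.append)
print(sent)
```

With the guard in place, only the non-empty notification reaches the listener. The second fix (making getlink report vlan changes for the bridge device itself) is not modelled here.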
Pasting here more detailed comments from Rami Rosen on the netdev list:

For the sake of those who are interested in more implementation details and in a code walkthrough of this scenario, here is the path to follow when running "bridge vlan add vid 1 dev br0 self".

The rtnl_bridge_setlink() method is invoked in this case. See:
http://lxr.free-electrons.com/source/net/core/rtnetlink.c#L2782

If the SELF flag is set, it calls dev->netdev_ops->ndo_bridge_setlink(). See:
http://lxr.free-electrons.com/source/net/core/rtnetlink.c#L2840

It then calls rtnl_bridge_notify(). See:
http://lxr.free-electrons.com/source/net/core/rtnetlink.c#L2850

rtnl_bridge_notify() calls dev->netdev_ops->ndo_bridge_getlink() when the self flag is set. See:
http://lxr.free-electrons.com/source/net/core/rtnetlink.c#L2767

When "bridge vlan add" is run on a bridge device, as we do here (and **not on a bridge port**), the dev variable is an instance of a software bridge. So this calls the ndo_bridge_getlink() callback of the software bridge, which is br_getlink(). See:
http://lxr.free-electrons.com/source/net/bridge/br_netlink.c#L205

br_getlink() first checks whether the device is a bridge port:

    struct net_bridge_port *port = br_port_get_rtnl(dev);

and it returns 0 if it is not.

As a result, skb->len is 0 and an empty notification is sent. When the rtnetlink socket that "ip monitor all" opens to listen for netlink messages receives an empty notification, it terminates with the "EOF" message (as mentioned in the bugzilla link).
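The walkthrough above can be condensed into a small userspace model. This is a sketch in Python, not kernel code. The function names mirror the kernel functions named in the walkthrough, but the Device class, the flag value, the payload format and the sent list are assumptions made for illustration. As a simplification, every device uses the same getlink callback, whereas the kernel dispatches through each device's own netdev_ops.

```python
from dataclasses import dataclass, field
from typing import List

BRIDGE_FLAGS_SELF = 2  # flag value used only inside this model


@dataclass
class Device:
    name: str
    is_bridge_port: bool
    vlans: List[int] = field(default_factory=list)


def br_getlink(skb: bytearray, dev: Device) -> int:
    """Model of the bridge getlink callback: fills skb only for bridge ports."""
    if not dev.is_bridge_port:
        return 0  # nothing written, so the skb stays empty
    skb.extend(f"{dev.name} vlans={dev.vlans}".encode())
    return 0


def rtnl_bridge_notify(dev: Device, flags: int, sent: List[bytes]) -> None:
    skb = bytearray()
    if flags & BRIDGE_FLAGS_SELF:
        br_getlink(skb, dev)
    # Models the behaviour before the fix: sent even when the skb is empty.
    sent.append(bytes(skb))


def rtnl_bridge_setlink(dev: Device, vid: int, flags: int,
                        sent: List[bytes]) -> None:
    if flags & BRIDGE_FLAGS_SELF:
        dev.vlans.append(vid)  # stands in for ndo_bridge_setlink()
    rtnl_bridge_notify(dev, flags, sent)


sent: List[bytes] = []
br0 = Device("br0", is_bridge_port=False)
rtnl_bridge_setlink(br0, 1, BRIDGE_FLAGS_SELF, sent)
print(len(sent[0]))
```

In this model the vlan is recorded on br0, yet the notification that goes out has length 0, which is the empty message that a listener then treats as EOF.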
It seems to me that this BUG should be closed. The following patch from Roopa Prabhu fixed it:
http://www.spinics.net/lists/netdev/msg314256.html

This BUG id is mentioned in the commit message as the reason for submitting it. The patch was already integrated in 3.19. I tested the same scenario on 4.4 and the problem mentioned in the BUG description did not occur. So unless I get an objection within 24 hours, I intend to close it.

Regards,
Rami Rosen
Intel Corporation
As said yesterday, I am changing the status to "RESOLVED", as the following patch from Roopa Prabhu fixed it:
http://www.spinics.net/lists/netdev/msg314256.html

Rami Rosen
Intel Corporation
Closing the BUG.

Rami Rosen
Intel Corporation