On Fedora 21 with kernel 3.17.6-300.fc21.x86_64 and iproute-3.16.0-3 (installed from rpm; the ip utility reports iproute2-ss140804).
Running in one terminal:
ip monitor all
And then running in a second terminal this sequence:
ip link add br0 type bridge
bridge vlan add vid 10 dev br0 self
causes "ip monitor all" to terminate with "EOF on netlink".
This also happens on older Fedora releases (Fedora 20 and earlier) with older kernels.
The reason seems to be that skb->len is 0 for the netlink notification sent by rtnl_notify(), which is invoked from rtnl_bridge_notify(), which in turn is invoked from rtnl_bridge_setlink().
The message has zero length in this case because the user is sending the setlink request to the bridge with the self flag set.
Since getlink on the bridge device only returns bytes when the device is a bridge port, there are no bytes in the skb.
Two fixes are needed:
- add an skb->len check so that an empty notification is not sent
- fix the bridge driver's ndo_bridge_getlink to return the correct vlan changes for the bridge device
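The first of these fixes can be sketched in user-space C. This is only a model under assumed names: struct fake_skb, notify_if_nonempty() and the notifications_sent counter are hypothetical and are not kernel symbols. It shows the idea of dropping a notification whose payload length is zero instead of handing it to listeners.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct sk_buff; only the length matters here. */
struct fake_skb {
    size_t len;
};

/* Counts the notifications that the real code would have passed on to rtnl_notify(). */
static int notifications_sent;

/*
 * Model of the skb->len check: returns 1 if the notification was sent,
 * 0 if it was dropped because the getlink callback wrote nothing.
 */
static int notify_if_nonempty(const struct fake_skb *skb)
{
    assert(skb != NULL);

    if (skb->len == 0)
        return 0;               /* nothing to report, do not wake up listeners */

    notifications_sent++;
    return 1;
}
```

With such a check in place, a setlink on the bridge device itself would simply produce no notification, so a listener like "ip monitor all" would never see the empty message.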
Pasting here more detailed comments from Rami Rosen on the netdev list:
For those who are interested in more implementation details and in a code walkthrough of this scenario, here is what happens when "bridge vlan add vid 1 dev br0 self" is run:
Look at the rtnl_bridge_setlink() method, which is invoked in this case.
If the SELF flag is set it calls dev->netdev_ops->ndo_bridge_setlink()
and then it calls rtnl_bridge_notify()
Now, rtnl_bridge_notify() calls dev->netdev_ops->ndo_bridge_getlink()
when the self flag is set.
Now, when running the "bridge vlan add" on a bridge device like we do (and **not on a bridge port**)
then the dev variable is an instance of a software bridge. So this calls the ndo_bridge_getlink() callback of the software bridge, which is br_getlink().
Now, br_getlink() first checks if the device is a bridge port:
struct net_bridge_port *port = br_port_get_rtnl(dev);
And it returns 0 if not.
So as a result, the skb->len is 0 and an empty notification is sent.
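The walkthrough above can be modelled in a few lines of user-space C. This is a simplified sketch, not the kernel code: struct fake_dev, struct fake_skb, fake_getlink() and the 16-byte payload are all hypothetical. It only illustrates how a callback that returns 0 (success) without writing anything leaves the buffer length at 0 when the device is not a bridge port.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device model: the only property used is whether it is a bridge port. */
struct fake_dev {
    int is_bridge_port;
};

/* Hypothetical stand-in for struct sk_buff; only the length is tracked. */
struct fake_skb {
    size_t len;
};

/*
 * Model of the getlink callback described above: for a device that is
 * not a bridge port it returns 0 without filling the buffer.
 */
static int fake_getlink(struct fake_skb *skb, const struct fake_dev *dev)
{
    assert(skb != NULL && dev != NULL);

    if (!dev->is_bridge_port)
        return 0;               /* "success", but the skb stays empty */

    skb->len += 16;             /* pretend some link attributes were written */
    return 0;
}
```

Because the return value is 0 in both cases, a caller that checks only for a negative error cannot tell the empty buffer apart from a filled one.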
And when the rtnetlink socket, which is opened by "ip monitor all" and listens to netlink messages, receives an
empty notification, it terminates with the "EOF" message (as mentioned in the bugzilla link).
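The listener side can be sketched as follows, assuming the monitor treats a zero-byte read from the socket as end of file, which is what the error message suggests. The enum and classify_recv() are hypothetical names for illustration; status stands for the return value of a recv-style call on the netlink socket.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical result codes for one read from the netlink socket. */
enum recv_result { RECV_OK, RECV_EOF, RECV_ERROR };

/*
 * Classify the return value of a recv-style call: -1 is an error,
 * 0 is treated as end of file, anything larger is a message to parse.
 */
static enum recv_result classify_recv(ssize_t status)
{
    assert(status >= -1);       /* recv-style calls return -1 on error */

    if (status < 0)
        return RECV_ERROR;

    if (status == 0) {
        fprintf(stderr, "EOF on netlink\n");
        return RECV_EOF;        /* the caller stops listening here */
    }

    return RECV_OK;
}
```

Under this assumption, a single empty notification is enough to make the listener give up, even though the socket itself is still usable.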
It seems to me that this BUG should be closed.
The following patch from Roopa Prabhu fixed it:
This BUG id is mentioned in the commit message as the reason for submitting it.
This patch was already integrated in 3.19.
I tested the same scenario on 4.4 and the problem mentioned in the BUG description did not occur.
So unless I get an objection within 24 hours, I intend to close it.
As I said yesterday, I am changing the status to "RESOLVED", as
the following patch from Roopa Prabhu fixed it:
Closing the BUG