Bug 92081 - skb->len=0 and getting "EOF on netlink" with "ip monitor all" (of iproute) when adding a vlan with "bridge vlan add"
Status: CLOSED PATCH_ALREADY_AVAILABLE
Alias: None
Product: Networking
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 high
Assignee: Stephen Hemminger
Reported: 2015-01-26 18:15 UTC by Rami Rosen
Modified: 2016-02-18 17:12 UTC
Kernel Version: 3.17.6-300
Regression: No

Description Rami Rosen 2015-01-26 18:15:01 UTC
On Fedora 21, with 3.17.6-300.fc21.x86_64, with iproute-3.16.0-3 (installed from rpm), 
ip -V:
ip utility, iproute2-ss140804

Running in one terminal:
ip monitor all

And then running in a second terminal this sequence:
ip link add br0 type bridge
bridge vlan add vid 10 dev br0 self

causes the "ip monitor all" to terminate, with "EOF on netlink".

This also happens on older Fedora releases (Fedora 20 and earlier) with older kernels.

The reason seems to be that skb->len is 0 for the netlink notification, which is sent 
with rtnl_notify(), which is invoked from rtnl_bridge_notify(), which in turn is invoked from rtnl_bridge_setlink().

See:
http://lxr.free-electrons.com/source/net/core/rtnetlink.c#L2773

Rami Rosen
Comment 1 Roopa Prabhu 2015-01-28 05:10:32 UTC
The message has zero length in this case because the user is sending
the setlink request to the bridge with the self flag set.
Since getlink on the bridge device only returns bytes when the device is a bridge port, there are no bytes in the skb.

There are two fixes needed:
- one is the skb->len check
- the other is to fix the bridge driver's ndo_bridge_getlink so that it returns the correct vlan changes for the bridge device
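
The first of these fixes can be modelled outside the kernel. What follows is a minimal Python sketch, not kernel code: Skb, getlink_for_bridge and bridge_notify are hypothetical stand-ins for the skb, the bridge getlink callback and rtnl_bridge_notify(), and where the length check sits is only an assumption about what such a check might look like.

```python
from dataclasses import dataclass, field


@dataclass
class Skb:
    """Hypothetical stand-in for a kernel socket buffer (payload only)."""
    data: bytearray = field(default_factory=bytearray)

    @property
    def len(self) -> int:
        return len(self.data)


def getlink_for_bridge(skb: Skb, is_bridge_port: bool) -> int:
    """Model of a getlink callback that only fills the skb for a bridge port."""
    if not is_bridge_port:
        return 0  # success, but nothing was written to the skb
    skb.data += b"vlan-info"  # placeholder payload, not a real netlink message
    return 0


def bridge_notify(is_bridge_port: bool, sent: list) -> int:
    """Model of the notify step, with the length check applied."""
    skb = Skb()
    err = getlink_for_bridge(skb, is_bridge_port)
    if err < 0:
        return err
    if skb.len == 0:
        # The check: do not send an empty notification to listeners.
        return 0
    sent.append(bytes(skb.data))
    return 0
```

In this model, calling bridge_notify(False, sent) leaves sent untouched, while bridge_notify(True, sent) appends one non-empty message. The second fix (making getlink report vlan changes for the bridge device itself) is not modelled here.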
Comment 2 Roopa Prabhu 2015-01-28 05:12:01 UTC
Pasting here more detailed comments from Rami Rosen on the netdev list:

For those who are interested in more implementation details and in a code walkthrough of this scenario, here is the path that is followed when running "bridge vlan add vid 1 dev br0 self":

Look at the rtnl_bridge_setlink() method, which is invoked in this case:
http://lxr.free-electrons.com/source/net/core/rtnetlink.c#L2782

If the SELF flag is set it calls dev->netdev_ops->ndo_bridge_setlink()
See:
http://lxr.free-electrons.com/source/net/core/rtnetlink.c#L2840

and then it calls rtnl_bridge_notify()
See:
http://lxr.free-electrons.com/source/net/core/rtnetlink.c#L2850

Now, rtnl_bridge_notify() calls dev->netdev_ops->ndo_bridge_getlink()
when the self flag is set.
See:
http://lxr.free-electrons.com/source/net/core/rtnetlink.c#L2767

Now, when running "bridge vlan add" on a bridge device, as we do here (and **not on a bridge port**),
the dev variable is an instance of a software bridge. So this calls the ndo_bridge_getlink() callback of the software bridge, which is br_getlink():
See:
http://lxr.free-electrons.com/source/net/bridge/br_netlink.c#L205

Now, br_getlink() first checks if the device is a bridge port:
struct net_bridge_port *port = br_port_get_rtnl(dev);

If it is not a bridge port, br_getlink() returns 0.
As a result, skb->len is 0 and an empty notification is sent.

And when the rtnetlink socket, which is opened by "ip monitor all" and listens to netlink messages, receives an
empty notification, it terminates with the "EOF" message (as mentioned in the bugzilla link).
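
Why an empty notification looks like end-of-file to a listener can be shown with ordinary datagram sockets. This is a minimal sketch under stated assumptions: an AF_UNIX datagram socketpair stands in for the netlink socket, and read_one is a hypothetical listener step that treats a zero-byte receive as EOF, which is what the "EOF on netlink" message suggests. It is not the iproute2 code.

```python
import socket


def read_one(sock: socket.socket):
    """Hypothetical listener step: a 0-byte receive is reported as EOF."""
    data = sock.recv(4096)
    if len(data) == 0:
        return "EOF"
    return data


def demo():
    # A datagram socketpair stands in for the kernel/listener netlink pair.
    sender, listener = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        sender.send(b"")             # the empty notification
        first = read_one(listener)   # receive returns zero bytes
        sender.send(b"link-update")  # a normal, non-empty notification
        second = read_one(listener)
        return first, second
    finally:
        sender.close()
        listener.close()
```

The sender is still open when the empty datagram is read, so nothing has actually ended; the listener simply cannot tell a zero-length message apart from end-of-file by the return value alone.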
Comment 3 Rami Rosen 2016-02-17 10:06:32 UTC
It seems to me that this BUG should be closed.
The following patch from Roopa Prabhu fixed it:
http://www.spinics.net/lists/netdev/msg314256.html

This BUG id is mentioned in the commit message as the reason for submitting it.

This patch was already integrated in 3.19.

I tested the same scenario on 4.4 and the problem mentioned in the BUG description did not occur.

So unless I get an objection within 24 hours, I intend to close it.

Regards,
Rami Rosen
Intel Corporation
Comment 4 Rami Rosen 2016-02-18 17:11:37 UTC
As said yesterday, I am changing the status to "RESOLVED", as
the following patch from Roopa Prabhu fixed it:
http://www.spinics.net/lists/netdev/msg314256.html


Rami Rosen
Intel Corporation
Comment 5 Rami Rosen 2016-02-18 17:12:41 UTC
Closing the BUG

Rami Rosen
Intel Corporation
