Bug 204389

Summary: bridge: AF_BRIDGE NEWNEIGH netlink message with ifidx is zero
Product: Networking Reporter: michael-dev
Component: Other Assignee: Stephen Hemminger (stephen)
Status: NEW ---    
Severity: normal CC: michael-dev, nikolay
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 4.19 Subsystem:
Regression: No Bisected commit-id:

Description michael-dev 2019-07-31 09:32:20 UTC
My application is receiving the following netlink messages from kernel:

-------------------------- BEGIN NETLINK MESSAGE ---------------------------
 [NETLINK HEADER] 16 octets
 .nlmsg_len = 76
 .type = 28 <route/neigh::new>
 .flags = 0 <>
 .seq = 0
 .port = 0
 [PAYLOAD] 12 octets
 07 00 00 00 00 00 00 00 00 80 00 00 ............
 [ATTR 02] 6 octets
 4e db c4 30 92 f4 N..0..
 [PADDING] 2 octets
 00 00 ..
 [ATTR 09] 4 octets
 00 00 00 00 ....
 [ATTR 03] 16 octets
 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
 [ATTR 05] 2 octets
 00 01 ..
 [PADDING] 2 octets
 00 00 ..
--------------------------- END NETLINK MESSAGE ---------------------------

Clearly, the ifindex is zero here, so the application cannot correlate this NEWNEIGH message with any interface.

Adding a WARN_ON() for a zero ifindex to fdb_fill_info() produces the following backtrace:

[ 43.071801] [efb47c10] [c0510580] fdb_fill_info+0x180/0x22c (unreliable)
[ 43.071829] [efb47c50] [c0510684] fdb_notify.isra.22+0x58/0xd8
[ 43.071849] [efb47c70] [c0511b44] fdb_insert+0xa4/0x108
[ 43.071870] [efb47c90] [c051254c] br_fdb_insert+0x40/0x64
[ 43.071891] [efb47cb0] [c052b640] __vlan_add+0xe50/0xf24
[ 43.071911] [efb47d20] [c052a77c] br_vlan_add+0x528/0x59c
[ 43.071931] [efb47d70] [c052c38c] br_vlan_init+0x90/0xd4
[ 43.071951] [efb47da0] [c050f96c] br_dev_init+0xc8/0x170
[ 43.071983] [efb47de0] [c03cff8c] register_netdevice+0x148/0x6e8
[ 43.072003] [efb47e20] [c03d0554] register_netdev+0x28/0x50
[ 43.072026] [efb47e30] [c05154f4] br_add_bridge+0x44/0x78
[ 43.072047] [efb47e40] [c0517ce8] br_ioctl_deviceless_stub+0x2c4/0x2f0
[ 43.072069] [efb47e80] [c03a372c] sock_ioctl+0x17c/0x3d8
[ 43.072091] [efb47ed0] [c017655c] do_vfs_ioctl+0x7ac/0x894
[ 43.072111] [efb47f20] [c0176684] ksys_ioctl+0x40/0x74
[ 43.072140] [efb47f40] [c0011288] ret_from_syscall+0x0/0x3c

So br_vlan_init() clearly triggers a NEWNEIGH message before bridge creation has completed and before an ifindex has been assigned.

Comment 1 michael-dev 2019-07-31 09:33:30 UTC
libnl parses this message as "dev 0 lladdr 4e:db:c4:30:92:f4 vlan 1 <permanent>"

Comment 2 Nikolay Aleksandrov 2019-07-31 14:48:33 UTC
Indeed, that is a known bug. br_vlan_init() is called from ndo_init(), which runs before the device is assigned an ifindex; it has always been like that.
After running a few tests, I'll propose a patch on netdev that resolves it (I'll keep you CC'ed).