Each IP interface in the UP state "have" appropriate multicast addresses, which can be shown by "ip maddr show" output. For example, for an ordinary one-card computer, "ip maddr show" shows something like: 1: lo inet 224.0.0.1 2: eth0 link 01:00:5e:00:00:01 inet 224.0.0.1 or something more if IPv6 is enabled. I've discovered an issue with the link layer multicasts (aka 01:00:5e:00:00:01 in the example above) and VLAN devices. When I do some non-trivial vlan manipulations (more than one vlan device, "set up/set down" several times, etc.), "ip maddr show" shows either the wrong "users %d" count for "link" addresses, or the "link" addresses disappears at all. Unfortunately, it seems that there is no any "stable" scheme to catch the issue, but it happens "often enough" in various situations. I've wrote a test script (see below), which perform in a loop some trivial things over vlans and then shows "ip maddr show". (To run the script further just type enter. Ctrl-C to break). I've disabled ipv6 module on the test machine to be more clean. For me, in the first 5 steps I already catched the issue. Instead of the normal situation: 1: lo inet 224.0.0.1 2: eth0 link 01:00:5e:00:00:01 users 3 inet 224.0.0.1 3: eth0.2 link 01:00:5e:00:00:01 users 2 inet 224.0.0.1 4: eth0.3 link 01:00:5e:00:00:01 users 2 inet 224.0.0.1 I often see: 1: lo inet 224.0.0.1 2: eth0 link 01:00:5e:00:00:01 inet 224.0.0.1 3: eth0.2 link 01:00:5e:00:00:01 users 2 inet 224.0.0.1 4: eth0.3 inet 224.0.0.1 or even: 1: lo inet 224.0.0.1 2: eth0 inet 224.0.0.1 3: eth0.2 inet 224.0.0.1 4: eth0.3 inet 224.0.0.1 at all. Sorry that I've tested it on too old kernel (2.6.23), but I hope the test script is convenient enough to check whether the issue still present at the newest kernels or not.
Created attachment 15735 [details] The test shell script.
Reply-To: akpm@linux-foundation.org (switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface). On Fri, 11 Apr 2008 09:18:07 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=10443 > > Summary: VLAN: link level multicasts addresses disappears... > Product: Networking > Version: 2.5 > KernelVersion: 2.6.23.15 > Platform: All > OS/Version: Linux > Tree: Mainline > Status: NEW > Severity: normal > Priority: P1 > Component: Other > AssignedTo: acme@ghostprotocols.net > ReportedBy: dmitry@butskoy.name > > > Each IP interface in the UP state "have" appropriate multicast addresses, > which > can be shown by "ip maddr show" output. > > For example, for an ordinary one-card computer, "ip maddr show" shows > something > like: > > 1: lo > inet 224.0.0.1 > 2: eth0 > link 01:00:5e:00:00:01 > inet 224.0.0.1 > > or something more if IPv6 is enabled. > > I've discovered an issue with the link layer multicasts (aka > 01:00:5e:00:00:01 > in the example above) and VLAN devices. > > > When I do some non-trivial vlan manipulations (more than one vlan device, > "set > up/set down" several times, etc.), "ip maddr show" shows either the wrong > "users %d" count for "link" addresses, or the "link" addresses disappears at > all. > > Unfortunately, it seems that there is no any "stable" scheme to catch the > issue, but it happens "often enough" in various situations. > > I've wrote a test script (see below), which perform in a loop some trivial > things over vlans and then shows "ip maddr show". (To run the script further > just type enter. Ctrl-C to break). I've disabled ipv6 module on the test > machine to be more clean. > > For me, in the first 5 steps I already catched the issue. > > > Instead of the normal situation: > > 1: lo > inet 224.0.0.1 > 2: eth0 > link 01:00:5e:00:00:01 users 3 > inet 224.0.0.1 > 3: eth0.2 > link 01:00:5e:00:00:01 users 2 > inet 224.0.0.1 > 4: eth0.3 > link 01:00:5e:00:00:01 users 2 > inet 224.0.0.1 > > > I often see: > > 1: lo > inet 224.0.0.1 > 2: eth0 > link 01:00:5e:00:00:01 > inet 224.0.0.1 > 3: eth0.2 > link 01:00:5e:00:00:01 users 2 > inet 224.0.0.1 > 4: eth0.3 > inet 224.0.0.1 > > > or even: > > 1: lo > inet 224.0.0.1 > 2: eth0 > inet 224.0.0.1 > 3: eth0.2 > inet 224.0.0.1 > 4: eth0.3 > inet 224.0.0.1 > > at all. > > > Sorry that I've tested it on too old kernel (2.6.23), but I hope the test > script is convenient enough to check whether the issue still present at the > newest kernels or not. There's a test case in the bugzilla report.
From: Andrew Morton <akpm@linux-foundation.org> Date: Fri, 11 Apr 2008 21:41:34 -0700 > There's a test case in the bugzilla report. Could be fixed by: commit 0ed21b321a13421e2dfeaa70a6c324e05e3e91e6 Author: Patrick McHardy <kaber@trash.net> Date: Wed Mar 26 00:15:17 2008 -0700 [VLAN]: Don't copy ALLMULTI/PROMISC flags from underlying device Changing these flags requires to use dev_set_allmulti/dev_set_promiscuity or dev_change_flags. Setting it directly causes two unwanted effects: - the next dev_change_flags call will notice a difference between dev->gflags and the actual flags, enable promisc/allmulti mode and incorrectly update dev->gflags - this keeps the underlying device in promisc/allmulti mode until the VLAN device is deleted Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c index 8fbcefe..480ea90 100644 --- a/net/8021q/vlan_dev.c +++ b/net/8021q/vlan_dev.c @@ -660,7 +660,7 @@ static int vlan_dev_init(struct net_device *dev) int subclass = 0; /* IFF_BROADCAST|IFF_MULTICAST; ??? */ - dev->flags = real_dev->flags & ~IFF_UP; + dev->flags = real_dev->flags & ~(IFF_UP | IFF_PROMISC | IFF_ALLMULTI); dev->iflink = real_dev->ifindex; dev->state = (real_dev->state & ((1<<__LINK_STATE_NOCARRIER) | (1<<__LINK_STATE_DORMANT))) |
David Miller wrote: > From: Andrew Morton <akpm@linux-foundation.org> > Date: Fri, 11 Apr 2008 21:41:34 -0700 > >> There's a test case in the bugzilla report. > > Could be fixed by: > > commit 0ed21b321a13421e2dfeaa70a6c324e05e3e91e6 > Author: Patrick McHardy <kaber@trash.net> > Date: Wed Mar 26 00:15:17 2008 -0700 > > [VLAN]: Don't copy ALLMULTI/PROMISC flags from underlying device Either that patch or this one I guess: [NET]: Messed multicast lists after dev_mc_sync/unsync I'll go through our recent multicast fixes and check which ones make sense for -stable. commit 12aa343add3eced38a44bdb612b35fdf634d918c Author: Jorge Boncompte [DTI2] <jorge@dti2.net> Date: Tue Feb 19 14:17:04 2008 -0800 [NET]: Messed multicast lists after dev_mc_sync/unsync Commit a0a400d79e3dd7843e7e81baa3ef2957bdc292d0 ("[NET]: dev_mcast: add multicast list synchronization helpers") from you introduced a new field "da_synced" to struct dev_addr_list that is not properly initialized to 0. So when any of the current users (8021q, macvlan, mac80211) calls dev_mc_sync/unsync they mess the address list for both devices. The attached patch fixed it for me and avoid future problems. Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> diff --git a/net/core/dev.c b/net/core/dev.c index 6cfc123..9516105 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -2900,7 +2900,7 @@ int __dev_addr_add(struct dev_addr_list **list, int *count, } } - da = kmalloc(sizeof(*da), GFP_ATOMIC); + da = kzalloc(sizeof(*da), GFP_ATOMIC); if (da == NULL) return -ENOMEM; memcpy(da->da_addr, addr, alen);
Patrick McHardy wrote: > David Miller wrote: >> From: Andrew Morton <akpm@linux-foundation.org> >> Date: Fri, 11 Apr 2008 21:41:34 -0700 >> >>> There's a test case in the bugzilla report. >> >> Could be fixed by: >> >> commit 0ed21b321a13421e2dfeaa70a6c324e05e3e91e6 >> Author: Patrick McHardy <kaber@trash.net> >> Date: Wed Mar 26 00:15:17 2008 -0700 >> >> [VLAN]: Don't copy ALLMULTI/PROMISC flags from underlying device > > Either that patch or this one I guess: > > [NET]: Messed multicast lists after dev_mc_sync/unsync > > I'll go through our recent multicast fixes and check which > ones make sense for -stable. The "messed multicast lists" patch is already in the latest -stable release. I've attached backports of two more multicast fixes. Dave, could you forward them to -stable please in case you're fine with them? Thanks. [NET]: Fix multicast device ioctl checks Upstream commit 61ee6bd487b9cc160e533034eb338f2085dc7922: SIOCADDMULTI/SIOCDELMULTI check whether the driver has a set_multicast_list method to determine whether it supports multicast. Drivers implementing secondary unicast support use set_rx_mode however. Check for both dev->set_multicast_mode and dev->set_rx_mode to determine multicast capabilities. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> --- commit 4eed9fb2564cf763918b9711558169e06dbfb8d5 tree 1cead411884581aa0b8932edcc399cab9551e556 parent 16c64cac7d9c6a503f49887219c4fe675e7d43d9 author Patrick McHardy <kaber@trash.net> Wed, 26 Mar 2008 02:12:11 -0700 committer Patrick McHardy <kaber@trash.net> Sun, 13 Apr 2008 08:04:43 +0200 net/core/dev.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index 4d44372..82f77ef 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -3240,7 +3240,7 @@ static int dev_ifsioc(struct net *net, struct ifreq *ifr, unsigned int cmd) return -EOPNOTSUPP; case SIOCADDMULTI: - if (!dev->set_multicast_list || + if ((!dev->set_multicast_list && !dev->set_rx_mode) || ifr->ifr_hwaddr.sa_family != AF_UNSPEC) return -EINVAL; if (!netif_device_present(dev)) @@ -3249,7 +3249,7 @@ static int dev_ifsioc(struct net *net, struct ifreq *ifr, unsigned int cmd) dev->addr_len, 1); case SIOCDELMULTI: - if (!dev->set_multicast_list || + if ((!dev->set_multicast_list && !dev->set_rx_mode) || ifr->ifr_hwaddr.sa_family != AF_UNSPEC) return -EINVAL; if (!netif_device_present(dev)) [VLAN]: Don't copy ALLMULTI/PROMISC flags from underlying device Upstream commit 0ed21b321a13421e2dfeaa70a6c324e05e3e91e6: Changing these flags requires to use dev_set_allmulti/dev_set_promiscuity or dev_change_flags. Setting it directly causes two unwanted effects: - the next dev_change_flags call will notice a difference between dev->gflags and the actual flags, enable promisc/allmulti mode and incorrectly update dev->gflags - this keeps the underlying device in promisc/allmulti mode until the VLAN device is deleted Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> --- commit da2c2dd3bbd2e57ebf6ce9a404ede39a3103d4bd tree 86c0ea5efb73fada291d7ed70897afd5b6234c7b parent 4eed9fb2564cf763918b9711558169e06dbfb8d5 author Patrick McHardy <kaber@trash.net> Sun, 13 Apr 2008 08:06:00 +0200 committer Patrick McHardy <kaber@trash.net> Sun, 13 Apr 2008 08:06:00 +0200 net/8021q/vlan.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c index 032bf44..156ad38 100644 --- a/net/8021q/vlan.c +++ b/net/8021q/vlan.c @@ -326,7 +326,7 @@ static int vlan_dev_init(struct net_device *dev) int subclass = 0; /* IFF_BROADCAST|IFF_MULTICAST; ??? */ - dev->flags = real_dev->flags & ~IFF_UP; + dev->flags = real_dev->flags & ~(IFF_UP | IFF_PROMISC | IFF_ALLMULTI); dev->iflink = real_dev->ifindex; dev->state = (real_dev->state & ((1<<__LINK_STATE_NOCARRIER) | (1<<__LINK_STATE_DORMANT))) |
From: Patrick McHardy <kaber@trash.net> Date: Sun, 13 Apr 2008 08:08:19 +0200 > Dave, could you forward them to -stable please in case you're > fine with them? Thanks. I've already forwarded the second one, but not the first. I'll forward that one too, thanks!
For comment #4 : Yes. This patch fixes the issue for me. --- a/net/core/dev.c +++ b/net/core/dev.c @@ -2900,7 +2900,7 @@ int __dev_addr_add(struct dev_addr_list **list, int *count, } } - da = kmalloc(sizeof(*da), GFP_ATOMIC); + da = kzalloc(sizeof(*da), GFP_ATOMIC); if (da == NULL) return -ENOMEM; memcpy(da->da_addr, addr, alen);