Bug 10443 - VLAN: link level multicasts addresses disappears...
Summary: VLAN: link level multicasts addresses disappears...
Status: CLOSED CODE_FIX
Alias: None
Product: Networking
Classification: Unclassified
Component: Other (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Arnaldo Carvalho de Melo
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-04-11 09:18 UTC by Dmitry Butskoy
Modified: 2008-04-15 05:32 UTC (History)
0 users

See Also:
Kernel Version: 2.6.23.15
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
The test shell script. (334 bytes, application/x-shellscript)
2008-04-11 09:19 UTC, Dmitry Butskoy
Details

Description Dmitry Butskoy 2008-04-11 09:18:04 UTC
Each IP interface in the UP state "have" appropriate multicast addresses, which can be shown by "ip maddr show" output.

For example, for an ordinary one-card computer, "ip maddr show" shows something like:

1:      lo
        inet  224.0.0.1
2:      eth0
        link  01:00:5e:00:00:01
        inet  224.0.0.1

or something more if IPv6 is enabled.

I've discovered an issue with the link layer multicasts (aka 01:00:5e:00:00:01 in the example above) and VLAN devices.


When I do some non-trivial vlan manipulations (more than one vlan device, "set up/set down" several times, etc.), "ip maddr show" shows either the wrong "users %d" count for "link" addresses, or the "link" addresses disappears at all.

Unfortunately, it seems that there is no any "stable" scheme to catch the issue, but it happens "often enough" in various situations.

I've wrote a test script (see below), which perform in a loop some trivial things over vlans and then shows "ip maddr show". (To run the script further just type enter. Ctrl-C to break). I've disabled ipv6 module on the test machine to be more clean.

For me, in the first 5 steps I already catched the issue.


Instead of the normal situation:

1:      lo
        inet  224.0.0.1
2:      eth0
        link  01:00:5e:00:00:01 users 3
        inet  224.0.0.1
3:      eth0.2
        link  01:00:5e:00:00:01 users 2
        inet  224.0.0.1
4:      eth0.3
        link  01:00:5e:00:00:01 users 2
        inet  224.0.0.1


I often see:

1:      lo
        inet  224.0.0.1
2:      eth0
        link  01:00:5e:00:00:01
        inet  224.0.0.1
3:      eth0.2
        link  01:00:5e:00:00:01 users 2
        inet  224.0.0.1
4:      eth0.3
        inet  224.0.0.1


or even:

1:      lo
        inet  224.0.0.1
2:      eth0
        inet  224.0.0.1
3:      eth0.2
        inet  224.0.0.1
4:      eth0.3
        inet  224.0.0.1

at all.


Sorry that I've tested it on too old kernel (2.6.23), but I hope the test script is convenient enough to check whether the issue still present at the newest kernels or not.
Comment 1 Dmitry Butskoy 2008-04-11 09:19:21 UTC
Created attachment 15735 [details]
The test shell script.
Comment 2 Anonymous Emailer 2008-04-11 21:41:59 UTC
Reply-To: akpm@linux-foundation.org

(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Fri, 11 Apr 2008 09:18:07 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=10443
> 
>            Summary: VLAN: link level multicasts addresses disappears...
>            Product: Networking
>            Version: 2.5
>      KernelVersion: 2.6.23.15
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Other
>         AssignedTo: acme@ghostprotocols.net
>         ReportedBy: dmitry@butskoy.name
> 
> 
> Each IP interface in the UP state "have" appropriate multicast addresses,
> which
> can be shown by "ip maddr show" output.
> 
> For example, for an ordinary one-card computer, "ip maddr show" shows
> something
> like:
> 
> 1:      lo
>         inet  224.0.0.1
> 2:      eth0
>         link  01:00:5e:00:00:01
>         inet  224.0.0.1
> 
> or something more if IPv6 is enabled.
> 
> I've discovered an issue with the link layer multicasts (aka
> 01:00:5e:00:00:01
> in the example above) and VLAN devices.
> 
> 
> When I do some non-trivial vlan manipulations (more than one vlan device,
> "set
> up/set down" several times, etc.), "ip maddr show" shows either the wrong
> "users %d" count for "link" addresses, or the "link" addresses disappears at
> all.
> 
> Unfortunately, it seems that there is no any "stable" scheme to catch the
> issue, but it happens "often enough" in various situations.
> 
> I've wrote a test script (see below), which perform in a loop some trivial
> things over vlans and then shows "ip maddr show". (To run the script further
> just type enter. Ctrl-C to break). I've disabled ipv6 module on the test
> machine to be more clean.
> 
> For me, in the first 5 steps I already catched the issue.
> 
> 
> Instead of the normal situation:
> 
> 1:      lo
>         inet  224.0.0.1
> 2:      eth0
>         link  01:00:5e:00:00:01 users 3
>         inet  224.0.0.1
> 3:      eth0.2
>         link  01:00:5e:00:00:01 users 2
>         inet  224.0.0.1
> 4:      eth0.3
>         link  01:00:5e:00:00:01 users 2
>         inet  224.0.0.1
> 
> 
> I often see:
> 
> 1:      lo
>         inet  224.0.0.1
> 2:      eth0
>         link  01:00:5e:00:00:01
>         inet  224.0.0.1
> 3:      eth0.2
>         link  01:00:5e:00:00:01 users 2
>         inet  224.0.0.1
> 4:      eth0.3
>         inet  224.0.0.1
> 
> 
> or even:
> 
> 1:      lo
>         inet  224.0.0.1
> 2:      eth0
>         inet  224.0.0.1
> 3:      eth0.2
>         inet  224.0.0.1
> 4:      eth0.3
>         inet  224.0.0.1
> 
> at all.
> 
> 
> Sorry that I've tested it on too old kernel (2.6.23), but I hope the test
> script is convenient enough to check whether the issue still present at the
> newest kernels or not.

There's a test case in the bugzilla report.
Comment 3 David S. Miller 2008-04-12 00:10:47 UTC
From: Andrew Morton <akpm@linux-foundation.org>
Date: Fri, 11 Apr 2008 21:41:34 -0700

> There's a test case in the bugzilla report.

Could be fixed by:

commit 0ed21b321a13421e2dfeaa70a6c324e05e3e91e6
Author: Patrick McHardy <kaber@trash.net>
Date:   Wed Mar 26 00:15:17 2008 -0700

    [VLAN]: Don't copy ALLMULTI/PROMISC flags from underlying device
    
    Changing these flags requires to use dev_set_allmulti/dev_set_promiscuity
    or dev_change_flags. Setting it directly causes two unwanted effects:
    
    - the next dev_change_flags call will notice a difference between
      dev->gflags and the actual flags, enable promisc/allmulti
      mode and incorrectly update dev->gflags
    
    - this keeps the underlying device in promisc/allmulti mode until
      the VLAN device is deleted
    
    Signed-off-by: Patrick McHardy <kaber@trash.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 8fbcefe..480ea90 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -660,7 +660,7 @@ static int vlan_dev_init(struct net_device *dev)
 	int subclass = 0;
 
 	/* IFF_BROADCAST|IFF_MULTICAST; ??? */
-	dev->flags  = real_dev->flags & ~IFF_UP;
+	dev->flags  = real_dev->flags & ~(IFF_UP | IFF_PROMISC | IFF_ALLMULTI);
 	dev->iflink = real_dev->ifindex;
 	dev->state  = (real_dev->state & ((1<<__LINK_STATE_NOCARRIER) |
 					  (1<<__LINK_STATE_DORMANT))) |
Comment 4 Patrick McHardy 2008-04-12 22:44:28 UTC
David Miller wrote:
> From: Andrew Morton <akpm@linux-foundation.org>
> Date: Fri, 11 Apr 2008 21:41:34 -0700
> 
>> There's a test case in the bugzilla report.
> 
> Could be fixed by:
> 
> commit 0ed21b321a13421e2dfeaa70a6c324e05e3e91e6
> Author: Patrick McHardy <kaber@trash.net>
> Date:   Wed Mar 26 00:15:17 2008 -0700
> 
>     [VLAN]: Don't copy ALLMULTI/PROMISC flags from underlying device

Either that patch or this one I guess:

[NET]: Messed multicast lists after dev_mc_sync/unsync

I'll go through our recent multicast fixes and check which
ones make sense for -stable.

commit 12aa343add3eced38a44bdb612b35fdf634d918c
Author: Jorge Boncompte [DTI2] <jorge@dti2.net>
Date:   Tue Feb 19 14:17:04 2008 -0800

    [NET]: Messed multicast lists after dev_mc_sync/unsync
    
    Commit a0a400d79e3dd7843e7e81baa3ef2957bdc292d0 ("[NET]: dev_mcast:
    add multicast list synchronization helpers") from you introduced a new
    field "da_synced" to struct dev_addr_list that is not properly
    initialized to 0. So when any of the current users (8021q, macvlan,
    mac80211) calls dev_mc_sync/unsync they mess the address list for both
    devices.
    
    The attached patch fixed it for me and avoid future problems.
    
    Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net>
    Signed-off-by: Patrick McHardy <kaber@trash.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/net/core/dev.c b/net/core/dev.c
index 6cfc123..9516105 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2900,7 +2900,7 @@ int __dev_addr_add(struct dev_addr_list **list, int *count,
 		}
 	}
 
-	da = kmalloc(sizeof(*da), GFP_ATOMIC);
+	da = kzalloc(sizeof(*da), GFP_ATOMIC);
 	if (da == NULL)
 		return -ENOMEM;
 	memcpy(da->da_addr, addr, alen);
Comment 5 Patrick McHardy 2008-04-12 23:08:55 UTC
Patrick McHardy wrote:
> David Miller wrote:
>> From: Andrew Morton <akpm@linux-foundation.org>
>> Date: Fri, 11 Apr 2008 21:41:34 -0700
>>
>>> There's a test case in the bugzilla report.
>>
>> Could be fixed by:
>>
>> commit 0ed21b321a13421e2dfeaa70a6c324e05e3e91e6
>> Author: Patrick McHardy <kaber@trash.net>
>> Date:   Wed Mar 26 00:15:17 2008 -0700
>>
>>     [VLAN]: Don't copy ALLMULTI/PROMISC flags from underlying device
> 
> Either that patch or this one I guess:
> 
> [NET]: Messed multicast lists after dev_mc_sync/unsync
> 
> I'll go through our recent multicast fixes and check which
> ones make sense for -stable.

The "messed multicast lists" patch is already in the latest -stable
release. I've attached backports of two more multicast fixes.

Dave, could you forward them to -stable please in case you're
fine with them? Thanks.




[NET]: Fix multicast device ioctl checks

Upstream commit 61ee6bd487b9cc160e533034eb338f2085dc7922:

SIOCADDMULTI/SIOCDELMULTI check whether the driver has a set_multicast_list
method to determine whether it supports multicast. Drivers implementing
secondary unicast support use set_rx_mode however.

Check for both dev->set_multicast_mode and dev->set_rx_mode to determine
multicast capabilities.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

---
commit 4eed9fb2564cf763918b9711558169e06dbfb8d5
tree 1cead411884581aa0b8932edcc399cab9551e556
parent 16c64cac7d9c6a503f49887219c4fe675e7d43d9
author Patrick McHardy <kaber@trash.net> Wed, 26 Mar 2008 02:12:11 -0700
committer Patrick McHardy <kaber@trash.net> Sun, 13 Apr 2008 08:04:43 +0200

 net/core/dev.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 4d44372..82f77ef 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3240,7 +3240,7 @@ static int dev_ifsioc(struct net *net, struct ifreq *ifr, unsigned int cmd)
 			return -EOPNOTSUPP;
 
 		case SIOCADDMULTI:
-			if (!dev->set_multicast_list ||
+			if ((!dev->set_multicast_list && !dev->set_rx_mode) ||
 			    ifr->ifr_hwaddr.sa_family != AF_UNSPEC)
 				return -EINVAL;
 			if (!netif_device_present(dev))
@@ -3249,7 +3249,7 @@ static int dev_ifsioc(struct net *net, struct ifreq *ifr, unsigned int cmd)
 					  dev->addr_len, 1);
 
 		case SIOCDELMULTI:
-			if (!dev->set_multicast_list ||
+			if ((!dev->set_multicast_list && !dev->set_rx_mode) ||
 			    ifr->ifr_hwaddr.sa_family != AF_UNSPEC)
 				return -EINVAL;
 			if (!netif_device_present(dev))
[VLAN]: Don't copy ALLMULTI/PROMISC flags from underlying device

Upstream commit 0ed21b321a13421e2dfeaa70a6c324e05e3e91e6:

Changing these flags requires to use dev_set_allmulti/dev_set_promiscuity
or dev_change_flags. Setting it directly causes two unwanted effects:

- the next dev_change_flags call will notice a difference between
  dev->gflags and the actual flags, enable promisc/allmulti
  mode and incorrectly update dev->gflags

- this keeps the underlying device in promisc/allmulti mode until
  the VLAN device is deleted

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

---
commit da2c2dd3bbd2e57ebf6ce9a404ede39a3103d4bd
tree 86c0ea5efb73fada291d7ed70897afd5b6234c7b
parent 4eed9fb2564cf763918b9711558169e06dbfb8d5
author Patrick McHardy <kaber@trash.net> Sun, 13 Apr 2008 08:06:00 +0200
committer Patrick McHardy <kaber@trash.net> Sun, 13 Apr 2008 08:06:00 +0200

 net/8021q/vlan.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 032bf44..156ad38 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -326,7 +326,7 @@ static int vlan_dev_init(struct net_device *dev)
 	int subclass = 0;
 
 	/* IFF_BROADCAST|IFF_MULTICAST; ??? */
-	dev->flags  = real_dev->flags & ~IFF_UP;
+	dev->flags  = real_dev->flags & ~(IFF_UP | IFF_PROMISC | IFF_ALLMULTI);
 	dev->iflink = real_dev->ifindex;
 	dev->state  = (real_dev->state & ((1<<__LINK_STATE_NOCARRIER) |
 					  (1<<__LINK_STATE_DORMANT))) |
Comment 6 David S. Miller 2008-04-12 23:11:42 UTC
From: Patrick McHardy <kaber@trash.net>
Date: Sun, 13 Apr 2008 08:08:19 +0200

> Dave, could you forward them to -stable please in case you're
> fine with them? Thanks.

I've already forwarded the second one, but not the first.
I'll forward that one too, thanks!
Comment 7 Dmitry Butskoy 2008-04-14 06:40:00 UTC
For comment #4 :

Yes. This patch fixes the issue for me.


--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2900,7 +2900,7 @@ int __dev_addr_add(struct dev_addr_list **list, int
*count,
                }
        }

-       da = kmalloc(sizeof(*da), GFP_ATOMIC);
+       da = kzalloc(sizeof(*da), GFP_ATOMIC);
        if (da == NULL)
                return -ENOMEM;
        memcpy(da->da_addr, addr, alen);

Note You need to log in before you can comment on or make changes to this bug.