Bug 9778

Summary: unregister_netdevice: waiting for [device] to become free
Product: Networking Reporter: Nigel Cunningham (nigel)
Component: IPV4Assignee: Stephen Hemminger (stephen)
Status: CLOSED CODE_FIX    
Severity: blocking CC: bunk, holger, jbrandeb, wrar, xemul
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.24-rc8 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 9243    

Description Nigel Cunningham 2008-01-19 12:20:27 UTC
Latest working kernel version: 2.6.24-rc7
Earliest failing kernel version: 2.6.24-rc8
Distribution: Ubuntu Gutsy
Hardware Environment: AMD64, Atheros 5212 chipset
Software Environment: Fairly recent madwifi svn.
Problem Description: Patch "[PATCH][NEIGH] Fix race between neigh_parms_release and neightbl_fill_parms" (9cd40029423701c376391da59d2c6469672b4bed) causes the error described in the subject line to occur when seeking to unload a device. Also seen by some of my TuxOnIce users with e1000 and ppp42.

Steps to reproduce: Run a kernel with the above patch, bring up a (nonloopback) interface, seek to take it down.
Comment 1 Anonymous Emailer 2008-01-19 16:57:47 UTC
Reply-To: akpm@linux-foundation.org

On Sat, 19 Jan 2008 12:20:28 -0800 (PST) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=9778
> 
>            Summary: unregister_netdevice: waiting for [device] to become
>                     free
>            Product: Networking
>            Version: 2.5
>      KernelVersion: 2.6.24-rc8
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: blocking
>           Priority: P1
>          Component: Other
>         AssignedTo: acme@ghostprotocols.net
>         ReportedBy: nigel@suspend2.net
> 
> 
> Latest working kernel version: 2.6.24-rc7
> Earliest failing kernel version: 2.6.24-rc8
> Distribution: Ubuntu Gutsy
> Hardware Environment: AMD64, Atheros 5212 chipset
> Software Environment: Fairly recent madwifi svn.
> Problem Description: Patch "[PATCH][NEIGH] Fix race between
> neigh_parms_release
> and neightbl_fill_parms" (9cd40029423701c376391da59d2c6469672b4bed) causes
> the
> error described in the subject line to occur when seeking to unload a device.
> Also seen by some of my TuxOnIce users with e1000 and ppp42.
> 
> Steps to reproduce: Run a kernel with the above patch, bring up a
> (nonloopback)
> interface, seek to take it down.
> 

ouch.
Comment 2 David S. Miller 2008-01-20 02:30:26 UTC
From: Andrew Morton <akpm@linux-foundation.org>
Date: Sat, 19 Jan 2008 16:58:02 -0800

> ouch.

Yep, several people are hitting this it seems.

If Pavel doesn't provide a fix or direction soon I'll just revert.
Comment 3 Evgeniy Polyakov 2008-01-21 04:15:04 UTC
On Sun, Jan 20, 2008 at 02:30:27AM -0800, David Miller (davem@davemloft.net) wrote:
> From: Andrew Morton <akpm@linux-foundation.org>
> Date: Sat, 19 Jan 2008 16:58:02 -0800
> 
> > ouch.
> 
> Yep, several people are hitting this it seems.
> 
> If Pavel doesn't provide a fix or direction soon I'll just revert.

It looks like patch is still valid.
Here is a problem description as I undestood.

When new device (let's talk about ethernet, since that is what I tested)
is being turned on, it gets neigh_parms entry allocated for it via
inetdev_init(), which is called for NETDEV_REGISTER inetdev event.
This entry is stored in arp_tbl table and is in_dev->arp_parms.

When later new arp entry is created, device is provided into
arp_constructor(), which clones (increase reference counter) device's
in_dev->arp_parms and puts it into provided neighbour entry.

When later we remove device, its in_dev->arp_parms's reference counter
is high enough (it is equal to number of arp entries found on given
device plu one), so neigh_parms_destroy() is not called. Later all
neighbour entries are flushed by garbage collector and reference counter
for that parm hits zero and device can be removed.

I will think about how to fix the problem nicely or if this patch still
can be simplified/dropped, but so far it looks valid. Maybe this
analysis will help someone to fix problem first.

Here is debug dmesg:
[   21.835595] inetdev_init: allocating parms.
[   21.839829] neigh_parms_alloc: parms: ffff81003d8e8df0, dev: eth0, refcnt: 1, dev_refcnt: 2.
...
[   30.251576] r8169: eth0: link up
[   31.067079] NET: Registered protocol family 10
[   31.072055] neigh_parms_alloc: parms: ffff81003efc72a8, dev: lo, refcnt: 1, dev_refcnt: 9.
[   31.080891] neigh_alloc: parms: ffffffff8812afe8, dev: <NULL>, refcnt: 2.
[   31.087816] neigh_parms_alloc: parms: ffff81003efc7210, dev: eth0, refcnt: 1, dev_refcnt: 9.
[   31.097335] neigh_alloc: parms: ffffffff804deb88, dev: <NULL>, refcnt: 2.
[   31.104172] arp_constructor: parms: ffff81003f8c3be8, dev: lo, refcnt: 2.
[   31.500348] neigh_alloc: parms: ffffffff8812afe8, dev: <NULL>, refcnt: 2.
[   32.499628] neigh_alloc: parms: ffffffff8812afe8, dev: <NULL>, refcnt: 2.
[  102.827796] neigh_destroy: parms: ffff81003efc7210, dev: eth0, refcnt: 3, dev_refcnt: 13.
[  106.828843] neigh_destroy: parms: ffff81003f8c3be8, dev: lo, refcnt: 2, dev_refcnt: 78.
[  109.810987] neigh_alloc: parms: ffffffff804deb88, dev: <NULL>, refcnt: 2.

First arp entry for eth0 device, bump the counter:
[  109.817827] arp_constructor: parms: ffff81003d8e8df0, dev: eth0, refcnt: 2.

[  109.831811] neigh_alloc: parms: ffffffff804deb88, dev: <NULL>, refcnt: 2.
[  109.838661] arp_constructor: parms: ffff81003f8c3be8, dev: lo, refcnt: 2.
[  110.837894] neigh_destroy: parms: ffff81003efc7210, dev: eth0, refcnt: 2, dev_refcnt: 15.

Can not release that neigh parm:
[  113.638228] neigh_parms_release: parms: ffff81003d8e8df0, dev: eth0, refcnt: 2, dev_refcnt: 5.

Can release some other (for ipv6):
[  113.649380] neigh_parms_release: parms: ffff81003efc7210, dev: eth0, refcnt: 1, dev_refcnt: 5.
[  113.671806] neigh_parms_destroy: parms: ffff81003efc7210, dev: eth0, dev_refcnt: 3.

[  123.916250] unregister_netdevice: waiting for eth0 to become free. Usage count = 1

GC hits us:
[  124.839572] neigh_destroy: parms: ffff81003d8e8df0, dev: eth0, refcnt: 1, dev_refcnt: 11.
[  124.847813] neigh_parms_destroy: parms: ffff81003d8e8df0, dev: eth0, dev_refcnt: 1.
[  124.952026] ACPI: PCI interrupt for device 0000:02:0d.0 disabled
Comment 4 David S. Miller 2008-01-21 04:36:12 UTC
From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Date: Mon, 21 Jan 2008 15:14:45 +0300

> I will think about how to fix the problem nicely or if this patch still
> can be simplified/dropped, but so far it looks valid. Maybe this
> analysis will help someone to fix problem first.

I have currently reverted Pavel's patch.

I have no doubt that his patch was in some aspects
correct, the regressions were worse than the disease
he initially intended to cure.

We can add the race fix back once we get to the
bottom of the reference count issue.
Comment 5 Pavel Emelyanov 2008-01-21 05:37:19 UTC
Agree. I overlooked that the neighbor entries (i.e. arp in our case) are not flushed when the device goes down :(

The same problem exists for net namespaces - we need to flush (among other things) the neighbors on namespace stop. Either me, or Denis Lunev will implement this  so this will be done, but I'm afraid, that this is not 2.6.24 issue.
Comment 6 Evgeniy Polyakov 2008-01-21 06:36:50 UTC
On Mon, Jan 21, 2008 at 03:14:45PM +0300, Evgeniy Polyakov (johnpol@2ka.mipt.ru) wrote:
> It looks like patch is still valid.
> Here is a problem description as I undestood.
> 
> When new device (let's talk about ethernet, since that is what I tested)
> is being turned on, it gets neigh_parms entry allocated for it via
> inetdev_init(), which is called for NETDEV_REGISTER inetdev event.
> This entry is stored in arp_tbl table and is in_dev->arp_parms.
> 
> When later new arp entry is created, device is provided into
> arp_constructor(), which clones (increase reference counter) device's
> in_dev->arp_parms and puts it into provided neighbour entry.
> 
> When later we remove device, its in_dev->arp_parms's reference counter
> is high enough (it is equal to number of arp entries found on given
> device plu one), so neigh_parms_destroy() is not called. Later all
> neighbour entries are flushed by garbage collector and reference counter
> for that parm hits zero and device can be removed.
> 
> I will think about how to fix the problem nicely or if this patch still
> can be simplified/dropped, but so far it looks valid. Maybe this
> analysis will help someone to fix problem first.

Yes, patch is valid, and there is a (very noticeble) race between
neighbour processing and parm release - parm still can be accessed after
device was fully freed (as with old behaviour when dev_pu() was called
from neigh_parms_release()), although no one access it, so the simplest
solution is to move dev_put() under the table lock and allow to access
parms->dev only under table lock and always check if it is non-null.
So I propose a following patch as a simplest solution for the current
time.

Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>

diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index a4f2618..410b7e7 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -34,6 +34,11 @@ struct neighbour;
 
 struct neigh_parms
 {
+	/*
+	 * This device is only allowed to be accessed under table lock (bh turned off)
+	 * and while device is alive. After parm was released, it will be set to NULL
+	 * and has to be always checked before accessed.
+	 */
 	struct net_device *dev;
 	struct neigh_parms *next;
 	int	(*neigh_setup)(struct neighbour *);
 
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index cc8a2f1..5076acd 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1315,7 +1315,12 @@ void neigh_parms_release(struct neigh_table *tbl, struct neigh_parms *parms)
 		if (*p == parms) {
 			*p = parms->next;
 			parms->dead = 1;
+			if (parms->dev) {
+				dev_put(parms->dev);
+				parms->dev = NULL;
+			}
 			write_unlock_bh(&tbl->lock);
+
 			call_rcu(&parms->rcu_head, neigh_rcu_free_parms);
 			return;
 		}
@@ -1326,8 +1331,6 @@ void neigh_parms_release(struct neigh_table *tbl, struct neigh_parms *parms)
 
 void neigh_parms_destroy(struct neigh_parms *parms)
 {
-	if (parms->dev)
-		dev_put(parms->dev);
 	kfree(parms);
 }
 
Comment 7 Adrian Bunk 2008-01-22 11:05:13 UTC
Reverted in commit cecbb63967b4f36701b9412a12377e8fe006a93b.