Latest working kernel version: 2.6.24-rc7 Earliest failing kernel version: 2.6.24-rc8 Distribution: Ubuntu Gutsy Hardware Environment: AMD64, Atheros 5212 chipset Software Environment: Fairly recent madwifi svn. Problem Description: Patch "[PATCH][NEIGH] Fix race between neigh_parms_release and neightbl_fill_parms" (9cd40029423701c376391da59d2c6469672b4bed) causes the error described in the subject line to occur when seeking to unload a device. Also seen by some of my TuxOnIce users with e1000 and ppp42. Steps to reproduce: Run a kernel with the above patch, bring up a (nonloopback) interface, seek to take it down.
Reply-To: akpm@linux-foundation.org On Sat, 19 Jan 2008 12:20:28 -0800 (PST) bugme-daemon@bugzilla.kernel.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=9778 > > Summary: unregister_netdevice: waiting for [device] to become > free > Product: Networking > Version: 2.5 > KernelVersion: 2.6.24-rc8 > Platform: All > OS/Version: Linux > Tree: Mainline > Status: NEW > Severity: blocking > Priority: P1 > Component: Other > AssignedTo: acme@ghostprotocols.net > ReportedBy: nigel@suspend2.net > > > Latest working kernel version: 2.6.24-rc7 > Earliest failing kernel version: 2.6.24-rc8 > Distribution: Ubuntu Gutsy > Hardware Environment: AMD64, Atheros 5212 chipset > Software Environment: Fairly recent madwifi svn. > Problem Description: Patch "[PATCH][NEIGH] Fix race between > neigh_parms_release > and neightbl_fill_parms" (9cd40029423701c376391da59d2c6469672b4bed) causes > the > error described in the subject line to occur when seeking to unload a device. > Also seen by some of my TuxOnIce users with e1000 and ppp42. > > Steps to reproduce: Run a kernel with the above patch, bring up a > (nonloopback) > interface, seek to take it down. > ouch.
From: Andrew Morton <akpm@linux-foundation.org> Date: Sat, 19 Jan 2008 16:58:02 -0800 > ouch. Yep, several people are hitting this it seems. If Pavel doesn't provide a fix or direction soon I'll just revert.
On Sun, Jan 20, 2008 at 02:30:27AM -0800, David Miller (davem@davemloft.net) wrote: > From: Andrew Morton <akpm@linux-foundation.org> > Date: Sat, 19 Jan 2008 16:58:02 -0800 > > > ouch. > > Yep, several people are hitting this it seems. > > If Pavel doesn't provide a fix or direction soon I'll just revert. It looks like patch is still valid. Here is a problem description as I undestood. When new device (let's talk about ethernet, since that is what I tested) is being turned on, it gets neigh_parms entry allocated for it via inetdev_init(), which is called for NETDEV_REGISTER inetdev event. This entry is stored in arp_tbl table and is in_dev->arp_parms. When later new arp entry is created, device is provided into arp_constructor(), which clones (increase reference counter) device's in_dev->arp_parms and puts it into provided neighbour entry. When later we remove device, its in_dev->arp_parms's reference counter is high enough (it is equal to number of arp entries found on given device plu one), so neigh_parms_destroy() is not called. Later all neighbour entries are flushed by garbage collector and reference counter for that parm hits zero and device can be removed. I will think about how to fix the problem nicely or if this patch still can be simplified/dropped, but so far it looks valid. Maybe this analysis will help someone to fix problem first. Here is debug dmesg: [ 21.835595] inetdev_init: allocating parms. [ 21.839829] neigh_parms_alloc: parms: ffff81003d8e8df0, dev: eth0, refcnt: 1, dev_refcnt: 2. ... [ 30.251576] r8169: eth0: link up [ 31.067079] NET: Registered protocol family 10 [ 31.072055] neigh_parms_alloc: parms: ffff81003efc72a8, dev: lo, refcnt: 1, dev_refcnt: 9. [ 31.080891] neigh_alloc: parms: ffffffff8812afe8, dev: <NULL>, refcnt: 2. [ 31.087816] neigh_parms_alloc: parms: ffff81003efc7210, dev: eth0, refcnt: 1, dev_refcnt: 9. [ 31.097335] neigh_alloc: parms: ffffffff804deb88, dev: <NULL>, refcnt: 2. [ 31.104172] arp_constructor: parms: ffff81003f8c3be8, dev: lo, refcnt: 2. [ 31.500348] neigh_alloc: parms: ffffffff8812afe8, dev: <NULL>, refcnt: 2. [ 32.499628] neigh_alloc: parms: ffffffff8812afe8, dev: <NULL>, refcnt: 2. [ 102.827796] neigh_destroy: parms: ffff81003efc7210, dev: eth0, refcnt: 3, dev_refcnt: 13. [ 106.828843] neigh_destroy: parms: ffff81003f8c3be8, dev: lo, refcnt: 2, dev_refcnt: 78. [ 109.810987] neigh_alloc: parms: ffffffff804deb88, dev: <NULL>, refcnt: 2. First arp entry for eth0 device, bump the counter: [ 109.817827] arp_constructor: parms: ffff81003d8e8df0, dev: eth0, refcnt: 2. [ 109.831811] neigh_alloc: parms: ffffffff804deb88, dev: <NULL>, refcnt: 2. [ 109.838661] arp_constructor: parms: ffff81003f8c3be8, dev: lo, refcnt: 2. [ 110.837894] neigh_destroy: parms: ffff81003efc7210, dev: eth0, refcnt: 2, dev_refcnt: 15. Can not release that neigh parm: [ 113.638228] neigh_parms_release: parms: ffff81003d8e8df0, dev: eth0, refcnt: 2, dev_refcnt: 5. Can release some other (for ipv6): [ 113.649380] neigh_parms_release: parms: ffff81003efc7210, dev: eth0, refcnt: 1, dev_refcnt: 5. [ 113.671806] neigh_parms_destroy: parms: ffff81003efc7210, dev: eth0, dev_refcnt: 3. [ 123.916250] unregister_netdevice: waiting for eth0 to become free. Usage count = 1 GC hits us: [ 124.839572] neigh_destroy: parms: ffff81003d8e8df0, dev: eth0, refcnt: 1, dev_refcnt: 11. [ 124.847813] neigh_parms_destroy: parms: ffff81003d8e8df0, dev: eth0, dev_refcnt: 1. [ 124.952026] ACPI: PCI interrupt for device 0000:02:0d.0 disabled
From: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Date: Mon, 21 Jan 2008 15:14:45 +0300 > I will think about how to fix the problem nicely or if this patch still > can be simplified/dropped, but so far it looks valid. Maybe this > analysis will help someone to fix problem first. I have currently reverted Pavel's patch. I have no doubt that his patch was in some aspects correct, the regressions were worse than the disease he initially intended to cure. We can add the race fix back once we get to the bottom of the reference count issue.
Agree. I overlooked that the neighbor entries (i.e. arp in our case) are not flushed when the device goes down :( The same problem exists for net namespaces - we need to flush (among other things) the neighbors on namespace stop. Either me, or Denis Lunev will implement this so this will be done, but I'm afraid, that this is not 2.6.24 issue.
On Mon, Jan 21, 2008 at 03:14:45PM +0300, Evgeniy Polyakov (johnpol@2ka.mipt.ru) wrote: > It looks like patch is still valid. > Here is a problem description as I undestood. > > When new device (let's talk about ethernet, since that is what I tested) > is being turned on, it gets neigh_parms entry allocated for it via > inetdev_init(), which is called for NETDEV_REGISTER inetdev event. > This entry is stored in arp_tbl table and is in_dev->arp_parms. > > When later new arp entry is created, device is provided into > arp_constructor(), which clones (increase reference counter) device's > in_dev->arp_parms and puts it into provided neighbour entry. > > When later we remove device, its in_dev->arp_parms's reference counter > is high enough (it is equal to number of arp entries found on given > device plu one), so neigh_parms_destroy() is not called. Later all > neighbour entries are flushed by garbage collector and reference counter > for that parm hits zero and device can be removed. > > I will think about how to fix the problem nicely or if this patch still > can be simplified/dropped, but so far it looks valid. Maybe this > analysis will help someone to fix problem first. Yes, patch is valid, and there is a (very noticeble) race between neighbour processing and parm release - parm still can be accessed after device was fully freed (as with old behaviour when dev_pu() was called from neigh_parms_release()), although no one access it, so the simplest solution is to move dev_put() under the table lock and allow to access parms->dev only under table lock and always check if it is non-null. So I propose a following patch as a simplest solution for the current time. Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru> diff --git a/include/net/neighbour.h b/include/net/neighbour.h index a4f2618..410b7e7 100644 --- a/include/net/neighbour.h +++ b/include/net/neighbour.h @@ -34,6 +34,11 @@ struct neighbour; struct neigh_parms { + /* + * This device is only allowed to be accessed under table lock (bh turned off) + * and while device is alive. After parm was released, it will be set to NULL + * and has to be always checked before accessed. + */ struct net_device *dev; struct neigh_parms *next; int (*neigh_setup)(struct neighbour *); diff --git a/net/core/neighbour.c b/net/core/neighbour.c index cc8a2f1..5076acd 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -1315,7 +1315,12 @@ void neigh_parms_release(struct neigh_table *tbl, struct neigh_parms *parms) if (*p == parms) { *p = parms->next; parms->dead = 1; + if (parms->dev) { + dev_put(parms->dev); + parms->dev = NULL; + } write_unlock_bh(&tbl->lock); + call_rcu(&parms->rcu_head, neigh_rcu_free_parms); return; } @@ -1326,8 +1331,6 @@ void neigh_parms_release(struct neigh_table *tbl, struct neigh_parms *parms) void neigh_parms_destroy(struct neigh_parms *parms) { - if (parms->dev) - dev_put(parms->dev); kfree(parms); }
Reverted in commit cecbb63967b4f36701b9412a12377e8fe006a93b.