Bug 16611 - can not remove vlan device: unregister_netdevice: waiting for <device> to become free.
Status: RESOLVED INVALID
Alias: None
Product: Networking
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 normal
Assignee: Patrick McHardy
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-08-17 08:37 UTC by LÉVAI Dániel
Modified: 2014-01-23 10:03 UTC
CC List: 3 users

See Also:
Kernel Version: 2.6.36
Subsystem:
Regression: No
Bisected commit-id:


Attachments
2.6.33.7 kernel config (56.38 KB, text/plain)
2010-11-19 14:45 UTC, LÉVAI Dániel

Description LÉVAI Dániel 2010-08-17 08:37:02 UTC
Hi!

I'm experiencing this problem when I try to remove a vlan device with vconfig rem <device>.

After deconfiguring the interface (e.g. ifconfig eth1.100 down) and making sure it is really down, I cannot remove it with vconfig rem eth1.100, because I start getting these messages:
kernel: unregister_netdevice: waiting for eth1.100 to become free. Usage count = 472

And I cannot escape from this. I can ssh into the machine or log in on a new console, but I cannot remove the vlan device, and thus I cannot reboot the machine.

The vlan interface was used by pppoe-server and many pppds, but they are no longer running when I try to remove the vlan interface.
The usage count is different every time, but I cannot imagine what could still be using the device. I've killed every process that could have been using it. I've experienced this with 2.6.34.* too, and I upgraded to .35 to see if it was fixed there. Googling just gave me some shady tips: unload the bonding module or disable ipv6 in the kernel. Although I've done both, this didn't cure it.
I'm willing to test patches of course or compile debug features into the kernel image I'm using.
Also if you need any other information, please do not hesitate to ask.
Comment 1 Andrew Morton 2010-08-19 23:39:09 UTC
Patrick, I think this is yours?
Comment 2 LÉVAI Dániel 2010-08-21 07:09:54 UTC
I've tried with these NICs so far and the problem persists:

02:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5722 Gigabit Ethernet PCI Express
03:01.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5703 Gigabit Ethernet (rev 10)
01:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5721 Gigabit Ethernet PCI Express (rev 21)

^^^ These are using the tg3 module.


01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 01)

^^^ This is using the r8169 module.


01:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection

^^^ And this is using the e1000e module.


All of the machines create the vlan devices "on top" of the above NICs, and the machines use kernel versions from 2.6.34.1 to 2.6.35.1.
Comment 3 LÉVAI Dániel 2010-08-31 07:06:12 UTC
Can I be of any more assistance or help regarding this? I would really love to help solve this issue.
Thanks!

Daniel
Comment 4 LÉVAI Dániel 2010-09-09 09:49:39 UTC
May I at least get an answer? I'm really willing to do my best to help you guys. This is really a major problem for me.
Comment 5 Michael 2010-09-15 15:26:08 UTC
Not sure if this is of any help, but I'm experiencing either the same or a very similar bug.

It happens when the laptop wants to go into sleep mode. Very often, it gives these messages:

15.9.2010 15.49:32	LaptopMB	NetworkManager[2054]	   SCPlugin-Ifupdown: devices removed (path: /sys/devices/pci0000:00/0000:00:1c.1/0000:03:00.0/net/wlan0, iface: wlan0)
15.9.2010 15.49:42	LaptopMB	kernel	[83857.248053] unregister_netdevice: waiting for wlan0 to become free. Usage count = 3
15.9.2010 15.49:52	LaptopMB	kernel	[83867.284151] unregister_netdevice: waiting for wlan0 to become free. Usage count = 3
15.9.2010 15.50:02	LaptopMB	kernel	[83877.328604] unregister_netdevice: waiting for wlan0 to become free. Usage count = 3
15.9.2010 15.50:12	LaptopMB	kernel	[83887.360163] unregister_netdevice: waiting for wlan0 to become free. Usage count = 3
15.9.2010 15.50:22	LaptopMB	kernel	[83897.372118] unregister_netdevice: waiting for wlan0 to become free. Usage count = 3
15.9.2010 15.50:32	LaptopMB	kernel	[83907.404160] unregister_netdevice: waiting for wlan0 to become free. Usage count = 3
15.9.2010 15.50:42	LaptopMB	kernel	[83917.432044] unregister_netdevice: waiting for wlan0 to become free. Usage count = 3
15.9.2010 15.50:52	LaptopMB	kernel	[83927.484642] unregister_netdevice: waiting for wlan0 to become free. Usage count = 3
15.9.2010 15.51:03	LaptopMB	kernel	[83937.532556] unregister_netdevice: waiting for wlan0 to become free. Usage count = 3
15.9.2010 15.51:13	LaptopMB	kernel	[83947.576552] unregister_netdevice: waiting for wlan0 to become free. Usage count = 3
15.9.2010 15.51:23	LaptopMB	kernel	[83957.612124] unregister_netdevice: waiting for wlan0 to become free. Usage count = 3
15.9.2010 15.51:33	LaptopMB	kernel	[83967.644150] unregister_netdevice: waiting for wlan0 to become free. Usage count = 3
15.9.2010 15.51:43	LaptopMB	kernel	[83977.664134] unregister_netdevice: waiting for wlan0 to become free. Usage count = 3
15.9.2010 15.51:53	LaptopMB	kernel	[83987.676113] unregister_netdevice: waiting for wlan0 to become free. Usage count = 3
15.9.2010 15.51:55	LaptopMB	NetworkManager[2054]	<info> radio killswitch /sys/devices/pci0000:00/0000:00:1c.1/0000:03:00.0/ieee80211/phy0/rfkill10 disappeared
15.9.2010 15.51:55	LaptopMB	kernel	[83989.746639] iwlagn 0000:03:00.0: PCI INT A disabled
15.9.2010 17.10:15	LaptopMB	kernel	[83990.263731] PM: Syncing filesystems ... done.
15.9.2010 17.10:15	LaptopMB	kernel	[83990.293006] PM: Preparing system for mem sleep
15.9.2010 17.10:15	LaptopMB	kernel	[83990.293197] Freezing user space processes ... (elapsed 0.01 seconds) done.
15.9.2010 17.10:15	LaptopMB	kernel	[83990.308110] Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
15.9.2010 17.10:15	LaptopMB	kernel	[83990.324111] PM: Entering mem sleep

Kernel 2.6.35 as well as 2.6.34.

Not sure whether, in my case, NetworkManager may be the culprit that is not freeing the device...
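
For anyone collecting data on this, the repeated kernel messages can be condensed with a small script. This is only a minimal sketch, assuming log lines in the exact format quoted in this report; the function name `summarize` is made up for illustration.

```python
import re
from collections import defaultdict

# Matches the kernel message format quoted in this bug report.
PATTERN = re.compile(
    r"unregister_netdevice: waiting for (?P<dev>\S+) to become free\. "
    r"Usage count = (?P<count>\d+)"
)


def summarize(lines):
    """Return {device: (number_of_messages, min_usage, max_usage)}."""
    seen = defaultdict(list)
    for line in lines:
        match = PATTERN.search(line)
        if match:
            seen[match.group("dev")].append(int(match.group("count")))
    return {dev: (len(c), min(c), max(c)) for dev, c in seen.items()}


sample = [
    "kernel: unregister_netdevice: waiting for eth1.100 to become free. Usage count = 472",
    "[83857.248053] unregister_netdevice: waiting for wlan0 to become free. Usage count = 3",
    "[83867.284151] unregister_netdevice: waiting for wlan0 to become free. Usage count = 3",
]
print(summarize(sample))
```

A usage count that stays constant across messages (as in the wlan0 log above) suggests the remaining references are never dropped, rather than being released slowly.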
Comment 6 Patrick McHardy 2010-09-15 19:30:46 UTC
(In reply to comment #3)
> Can I be of any more assistance or help regarding this? I would really love
> to help solving this issue.

Something is leaking device references to the VLAN devices. Please describe your full network setup, including routing, packet filtering, TC and any other features you might use. Thanks.
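
To illustrate what a leaked device reference means here, the following is a toy model in Python, not kernel code. In the kernel, references are taken and dropped with dev_hold() and dev_put(); everything else in this sketch (the class, the `unregister` helper, the `max_polls` parameter) is hypothetical and only mimics the behaviour seen in the logs of this report.

```python
class ToyNetDevice:
    """Toy stand-in for a network device with a reference count."""

    def __init__(self, name):
        self.name = name
        self.refcnt = 0

    def hold(self):
        # Plays the role of dev_hold(): a user takes a reference.
        self.refcnt += 1

    def put(self):
        # Plays the role of dev_put(): a user drops its reference.
        if self.refcnt == 0:
            raise RuntimeError("reference count underflow")
        self.refcnt -= 1


def unregister(dev, max_polls=3):
    """Poll until the count reaches zero; give up after max_polls.

    Returns (freed, messages). The real kernel keeps waiting instead
    of giving up, which is why the removal in this report never ends.
    """
    messages = []
    for _ in range(max_polls):
        if dev.refcnt == 0:
            return True, messages
        messages.append(
            f"unregister_netdevice: waiting for {dev.name} "
            f"to become free. Usage count = {dev.refcnt}"
        )
    return dev.refcnt == 0, messages


vlan = ToyNetDevice("eth1.100")
vlan.hold()
vlan.hold()
vlan.put()  # one hold() is never matched by a put(): the leak
freed, log = unregister(vlan)
print(freed)
print("\n".join(log))
```

Every code path that takes a reference without releasing it adds one to the final usage count, so a count that grows with the number of destroyed and recreated devices would point at a per-device leak.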
Comment 7 LÉVAI Dániel 2010-09-16 10:08:11 UTC
A pppoe-server is running on the machine. It listens on eth1.<vlanid> devices.
At any given time, there are approximately 300 to 500 ppp devices on the
machine. The ppp devices change constantly and sometimes rapidly (i.e. they are
destroyed and recreated all the time). These ppp devices are created using
the rp-pppoe plugin.
The ppp interfaces get routable public ip addresses, and this machine is
their default gateway. From now on, let's call this machine (the one with the
problem at hand) the "gateway". The gateway has iptables rules which are applied
to the traffic from and to the ppp interfaces' addresses. In the FORWARD chain,
there are some rules rejecting RFC1918 addresses and some smtp port filtering,
nothing fancy. Routing-wise, in the gateway's route table every ppp
interface appears with its public ip address as a /32 route, and these are
advertised as aggregate routes via bgpd to a neighbor router using Quagga.
Every ppp device gets an htb and a bfifo qdisc, which are created by an ip-up
script and removed by an ip-down script.
So after all, nothing is using the vlan devices directly except the pppoe-server
and the pppds.
After every pppd process has been killed (along with the pppoe-server process),
and nothing is using the vlan devices anymore, I still cannot remove the vlan
interface.
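
For reference, the per-device qdisc setup described above could look roughly like the tc commands generated below. This is a sketch under assumptions, not the actual ip-up/ip-down scripts from this machine: the handles, class id, rate and bfifo limit are invented example values, and the helper names are hypothetical. The functions only build the command lines; they do not run them.

```python
def ip_up_commands(iface, rate="1mbit", limit_bytes=10240):
    """Build tc commands that attach an htb root qdisc with one class
    and a bfifo leaf to iface. All numeric values are example values."""
    return [
        ["tc", "qdisc", "add", "dev", iface, "root",
         "handle", "1:", "htb", "default", "10"],
        ["tc", "class", "add", "dev", iface, "parent", "1:",
         "classid", "1:10", "htb", "rate", rate],
        ["tc", "qdisc", "add", "dev", iface, "parent", "1:10",
         "handle", "10:", "bfifo", "limit", str(limit_bytes)],
    ]


def ip_down_commands(iface):
    """Build the tc command that removes the root qdisc, and with it
    the class and leaf qdisc below it."""
    return [["tc", "qdisc", "del", "dev", iface, "root"]]


for cmd in ip_up_commands("ppp0") + ip_down_commands("ppp0"):
    print(" ".join(cmd))
```

Running the equivalent of these commands in a loop while creating and destroying ppp devices might be one way to check whether the qdisc handling contributes to the usage count.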
Comment 8 Patrick McHardy 2010-09-17 13:02:04 UTC
On 16.09.2010 12:08, bugzilla-daemon@bugzilla.kernel.org wrote:
> --- Comment #7 from LÉVAI Dániel <leva@ecentrum.hu>  2010-09-16 10:08:11 ---
> A pppoe-server is running on the machine. It listens on eth1.<vlanid> devices.
> At a given time, there is approximately between 300 and 500 pppd devices on the
> machine. The pppd devices are constantly and sometimes rapidly changing (ie.:
> they are destroyed and recreated always). These ppp devices are created using
> the rp-pppoe plugin.

Is the usage count correlated with the number of PPP devices? What are
the minimal and maximal values you've seen so far?

> The ppp interfaces are getting routable public ip addresses, and this machine is
> their default gateway. From now on, let's this machine (with the problem at
> hand) the "gateway". Gateway has iptables rules which are applied to the traffic
> from and to the ppp interfaces' addresses. In the FORWARD chain, there is some
> rules regarding rejection of RFC1918 addresses and some smtp port filtering,
> nothing fancy. Routing wise on the gateway, in the route table, every ppp
> interface appears with their public ip address as /32 routes. and these are
> advertised as aggregate routes via bgpd to a neighbor router using Quagga.
> Every ppp device gets a htb and a bfifo qdisc, which are created using an ip-up
> script and removed using an ip-down script.
> So after all, nothing is using the vlan devices directly but the pppoe-server
> and the pppds.
> After every pppd process has been killed (along with the pppoe-server process),
> and nothing is using the vlan devices anymore, I still can not remove the vlan
> interface.

Does it make a difference if you stop the pppoe-server before trying to
remove the device?
Comment 9 LÉVAI Dániel 2010-09-17 14:07:06 UTC
(In reply to comment #8)
> Is the usage count correlated with the number of PPP devices? What are
> the minimal and maximal values you've seen so far?
It is correlated with the time pppoe-server has been running. If I had to guess, every destroyed/recreated ppp device adds something to the usage count, because after just a few hours the usage count is ~50-200, but after a few days it is sometimes 500+.

The minimum is one: I started pppoe-server and stopped it after a few minutes. The maximum I've seen so far is something around 500, IIRC.

> Does it make a difference if you stop the pppoe-server before trying to
> remove the device?

No, it doesn't make a difference.
Comment 10 LÉVAI Dániel 2010-11-09 22:13:14 UTC
This bug is also present in the latest 2.6.36 kernel. Do you guys have any idea what could cause this malfunction?
Comment 11 LÉVAI Dániel 2010-11-19 14:40:53 UTC
I must update this bug report, because just now, I've experienced similar hangs, and error messages with a 2.6.33.7 kernel.
I'm attaching my kernel .config.
Comment 12 LÉVAI Dániel 2010-11-19 14:45:01 UTC
Created attachment 37662 [details]
2.6.33.7 kernel config
Comment 13 LÉVAI Dániel 2011-04-17 11:11:12 UTC
.
Comment 14 Fred Ma 2014-01-23 06:26:40 UTC
I hit this problem with kernel 2.6.36 too.

The kernel log prints this:
unregister_netdevice: waiting for eth0.2 to become free. Usage count = 1
unregister_netdevice: waiting for eth0.2 to become free. Usage count = 1
unregister_netdevice: waiting for eth0.2 to become free. Usage count = 1
unregister_netdevice: waiting for eth0.2 to become free. Usage count = 1

eth0.2 is a vlan interface on the switch device eth0, and it is the WAN interface. When I repeat a pptp down/up and WAN up/down test, this problem sometimes occurs; the probability is about 5%.

I merged Eric Dumazet's patch (fc75fc8339e7727167443469027540b283daac71), but it doesn't help.

Did you resolve this problem, LÉVAI Dániel? Thanks.
Comment 15 LÉVAI Dániel 2014-01-23 08:15:02 UTC
Devs didn't give two shits about this, and obviously didn't even bother to understand or track down the problem, then closed this as invalid...
Same old, same old Linux mentality. Next time make sure to come from a wealthy supporter like Red Hat or HP...
Comment 16 Fred Ma 2014-01-23 10:03:55 UTC
OK, so sad...
Is there a guide somewhere for analysing these problems?
Does anyone know of one? Thanks.