Bug 6197

Summary: unregister_netdevice: waiting for ppp9 to become free. Usage count = 658
Product: Networking
Component: Other
Status: CLOSED CODE_FIX
Severity: blocking
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.15 and all 2.6 series
Subsystem:
Regression: ---
Bisected commit-id:
Reporter: Zabavschi Vlad (vlad031)
Assignee: Patrick McHardy (kaber)
CC: andrew.hall, anton, kernel, kernelbugs, protasnb

Attachments:
- /var/log/messages output
- Proposed patch to fix
- kernel configuration
- Hard test patch
- Free queue entries when bridge port disappears
- My kernel config
- Handle NF_STOP in nf_reinject
- Handle NF_STOP and invalid verdicts
- fix xfrm state leak

Description Zabavschi Vlad 2006-03-09 01:24:02 UTC
Hi there! I've been experiencing a big problem with the latest kernel version
and all earlier 2.6 versions.
I'm using Fedora Core 3 and the machine in question is a router and a dial-in
server. I use pppoe-server in kernel mode (rp-pppoe-3.7 and pppd 2.4.3).
When a user connects to the server with PPPoE, then to the HTTP daemon, and
then disconnects, the kernel starts printing messages like this:

Message from syslogd@nextc at Thu Mar  9 10:51:14 2006 ...

nextc kernel: unregister_netdevice: waiting for ppp9 to become free. Usage count
= 233

This blocks any connection attempt to the pppoe-server for at least 10
minutes! I cannot even run a simple "ip addr list" because my shell hangs!
I've made this patch:

--- include/net/tcp.h.original  2006-03-06 05:22:38.000000000 +0200
+++ include/net/tcp.h   2006-03-06 05:24:58.000000000 +0200
@@ -100,7 +100,7 @@
                                 */


-#define TCP_TIMEWAIT_LEN (60*HZ) /* how long to wait to destroy TIME-WAIT
+#define TCP_TIMEWAIT_LEN (2*HZ) /* how long to wait to destroy TIME-WAIT
                                  * state, about 60 seconds     */
 #define TCP_FIN_TIMEOUT        TCP_TIMEWAIT_LEN
                                  /* BSD style FIN_WAIT2 deadlock breaker.

after some googling, the TCP connections now disappear very quickly, but the
problem still remains...

Best regards,
Zabavschi Vlad
Comment 1 Zabavschi Vlad 2006-03-09 01:28:04 UTC
Now I realise that this bug occurs even if the user doesn't disconnect
from the server!
Comment 2 Zabavschi Vlad 2006-03-09 01:33:43 UTC
Created attachment 7539 [details]
/var/log/messages output
Comment 3 Andrew Morton 2006-03-09 02:19:13 UTC

Begin forwarded message:

Date: Thu, 9 Mar 2006 01:24:06 -0800
From: bugme-daemon@bugzilla.kernel.org
To: bugme-new@lists.osdl.org
Subject: [Bugme-new] [Bug 6197] New: unregister_netdevice: waiting for ppp9 to become free. Usage count = 658


http://bugzilla.kernel.org/show_bug.cgi?id=6197

           Summary: unregister_netdevice: waiting for ppp9 to become free.
                    Usage count = 658
    Kernel Version: 2.6.15 and all 2.6 series
            Status: NEW
          Severity: blocking
             Owner: acme@conectiva.com.br
         Submitter: vlad031@yahoo.com


Comment 4 Baruch Even 2006-03-09 14:20:51 UTC
Created attachment 7551 [details]
Proposed patch to fix

We need to handle the NETDEV_UNREGISTER message and remove all references to
the device. We currently fail to do so.
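For background, the usual pattern for doing this in a 2.6-era kernel is a
netdevice notifier: any code that caches a struct net_device pointer registers
for netdev events and drops its reference when NETDEV_UNREGISTER arrives,
otherwise unregister_netdevice() waits forever for the refcount to hit zero.
A minimal sketch of that pattern (illustration only; the names are invented
and this is not the attached patch):

#include <linux/module.h>
#include <linux/netdevice.h>
#include <linux/notifier.h>

/* Hypothetical cached reference: a device some subsystem pinned with
 * dev_hold() and stashed for later use. */
static struct net_device *cached_dev;

static int example_netdev_event(struct notifier_block *nb,
                                unsigned long event, void *ptr)
{
        struct net_device *dev = ptr;  /* 2.6-era notifiers pass the device directly */

        if (event == NETDEV_UNREGISTER && dev == cached_dev) {
                /* Drop our reference so unregister_netdevice() can finish;
                 * forgetting this is exactly what produces the
                 * "waiting for ... to become free" messages. */
                dev_put(cached_dev);
                cached_dev = NULL;
        }
        return NOTIFY_DONE;
}

static struct notifier_block example_netdev_notifier = {
        .notifier_call = example_netdev_event,
};

static int __init example_init(void)
{
        return register_netdevice_notifier(&example_netdev_notifier);
}

static void __exit example_exit(void)
{
        unregister_netdevice_notifier(&example_netdev_notifier);
        if (cached_dev)
                dev_put(cached_dev);
}

module_init(example_init);
module_exit(example_exit);
MODULE_LICENSE("GPL");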
Comment 5 Zabavschi Vlad 2006-03-17 06:16:23 UTC
The patch doesn't work... worse than that, it breaks iproute!
# ip route add default dev ppp0 table 1
RTNETLINK answers: No such device
#
Comment 6 Arkadiusz Miskiewicz 2006-04-13 11:10:54 UTC
Same here on 2.6.14.7 with the rp-pppoe server (tested PPPoE in kernel and in
userspace mode - no difference). I have a few routers with the same kernel and
the same setup, but the problem is happening on only one of them :/
Comment 7 Arkadiusz Miskiewicz 2006-04-13 11:12:11 UTC
Created attachment 7864 [details]
kernel configuration
Comment 8 Arkadiusz Miskiewicz 2006-04-13 11:16:45 UTC
loaded modules:
Module                  Size  Used by
ipt_mac                 1796  1
iptable_nat             6520  0
ip_nat                 16597  1 iptable_nat
ipv6                  225836  32
ipt_connlimit           2787  36
ip_conntrack           45456  3 iptable_nat,ip_nat,ipt_connlimit
nfnetlink               5163  2 ip_nat,ip_conntrack
ipt_ipp2p               7843  54
ipt_p2p                 4060  54
ipt_mark                1612  36
ipt_limit               2072  18
ipt_IMQ                 1949  36
cls_u32                 7297  2
sch_htb                15470  2
iptable_mangle          2308  1
iptable_filter          2453  1
ip6_tables             16532  0
ip_tables              18741  10 ipt_mac,iptable_nat,ipt_connlimit,ipt_ipp2p,ipt_p2p,ipt_mark,ipt_limit,ipt_IMQ,iptable_mangle,iptable_filter
imq                     4023  0
sch_sfq                 4972  38
realtime                9245  0
pppoe                  10325  10
pppox                   2727  1 pppoe
ppp_generic            24204  22 pppoe,pppox
slhc                    6068  1 ppp_generic
8139too                22976  0
mii                     4316  1 8139too
ne2k_pci                8674  0
8390                    8061  1 ne2k_pci
ext3                  120069  2
mbcache                 7417  1 ext3
jbd                    46895  1 ext3
ide_disk               13945  4
piix                    9060  0 [permanent]
ide_core              110102  2 ide_disk,piix

Comment 9 Zabavschi Vlad 2006-05-23 14:18:58 UTC
The problem still remains in the 2.6.16 series... could someone PLEASE fix this
bug? I've been waiting for weeks now and the problem isn't solved!
Comment 10 Zabavschi Vlad 2006-06-07 13:55:45 UTC
Created attachment 8272 [details]
Hard test patch

So... I've patched dev.c NOT to check for refs anymore...
The problem remains, but it doesn't appear as often... and instead of showing
the "waiting for ... to become free" message it gives me this:
------------[ cut here ]------------
kernel BUG at net/core/dev.c:2949!
invalid opcode: 0000 [#1]
PREEMPT
Modules linked in: xt_MARK xt_mark xt_limit xt_state bonding sch_teql eql
ipt_ULOG ipt_TTL ipt_ttl ipt_TOS ipt_tos ipt_TCPMSS ipt_SAME ipt_REJECT
ipt_REDIRECT ipt_recent ipt_policy ipt_owner ipt_NETMAP ipt_multiport
ipt_MASQUERADE ipt_LOG ipt_iprange ipt_IMQ ipt_hashlimit ipt_esp ipt_ECN
ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype iptable_raw iptable_nat
iptable_mangle iptable_filter ip_tables ip_nat_tftp ip_nat_snmp_basic
ip_nat_pptp ip_nat_irc ip_nat_ftp ip_nat_amanda ip_conntrack_tftp
ip_conntrack_pptp ip_conntrack_netlink ip_nat ip_conntrack_netbios_ns
ip_conntrack_irc ip_conntrack_ftp ip_conntrack_amanda ip_conntrack arpt_mangle
arptable_filter arp_tables intel_agp agpgart
CPU:	0
EIP:	0060:[<c0258ffb>]    Not tainted VLI
EFLAGS: 00010206   (2.6.16.19.31 #1)
EIP is at netdev_run_todo+0xf2/0x1a4
eax: 00000003	ebx: dd286400	ecx: d7923df0	edx: 00000000
esi: dd286400	edi: d7e79c00	ebp: dfccdac4	esp: dccc5f68
ds: 007b   es: 007b   ss: 0068
Process pppd (pid: 22669, threadinfo=dccc5000 task=de559030)
Stack: <0>dccc5f68 dccc5f68 dccc5000 dd286400 c02249ff d7e79c00 d8f92900
dfca611c
       c0224c54 dfccdb1c c013f299 dffa4f40 d8f92900 00000000 dfc03580 dccc5000
       c013cf85 00000008 80040e98 00000000 c010245f 00000008 ffffffe0 8003e488
Call Trace:
 [<c02249ff>] ppp_shutdown_interface+0x57/0xa0
 [<c0224c54>] ppp_release+0x20/0x4c
 [<c013f299>] __fput+0x85/0x146
 [<c013cf85>] filp_close+0x4e/0x54
 [<c010245f>] sysenter_past_esp+0x54/0x75
Code: bd 00 00 00 50 53 68 bc ea 2e c0 e9 a9 00 00 00 89 d8 e8 21 8d 00 00 c7
83 6c 01 00 00 04 00 00 00 8b 83 58 01 00 00 85 c0 74 08 <0f> 0b 85 0b 91 e8 2e
c0 83 bb a8 00 00 00 00 74 1c 68 86 0b 00

[root@localhost ~]#
Comment 11 Zabavschi Vlad 2006-06-09 10:59:14 UTC
I HAVE FOUND THE BUG!
The bug is in the conntrack modules from netfilter... if I do NAT, they
keep a ref and the ppp connection hangs!
Comment 12 Zabavschi Vlad 2006-06-09 12:17:20 UTC
Now... this bug is starting to piss me off... I see that conntrack is keeping
connection-tracking entries for all addresses... not just the NATed ones...
isn't that a bug?! (and also a performance problem?!)

Best regards,
Vlad Z.
Comment 13 ralf 2006-06-11 13:39:37 UTC
Your kernel configuration file looks like you are using a distribution kernel
with some big patches, so please bug your vendor.

As an aside, I wonder how I ended up on the CC list for this bug. I may have
done some networking stuff, but certainly nothing related to PPP.
Comment 14 Zabavschi Vlad 2006-06-12 02:42:39 UTC
First, I'm not using a distro kernel...
I'm using the latest stable vanilla kernel from kernel.org...
Second, THIS IS A KERNEL BUG, not a pppd one...
Sorry if my mail disturbed you,
but it seems that no one is willing to help me...

So, I will no longer use pppoe-server; as for those who can help me and
don't want to... I'll just say -> YOU SUCK!

Sorry for everyone else ...
Best regards,
Vlad Z.
Comment 15 Patrick McHardy 2006-06-13 08:53:24 UTC
Please retry without the IMQ patch and tell us if the problem persists.
Comment 16 Zabavschi Vlad 2006-06-14 04:48:52 UTC
Yes, the problem still persists without the imq patch ... however ... 
I'm starting to think that this is a ppp bug  ... I really don't know how 
to approach this problem ...

Best regards,
Vlad Z.
Comment 17 Patrick McHardy 2006-06-15 13:35:06 UTC
Possible, but it could just as well be anywhere else. Just to clarify: does
unloading conntrack/NAT really make the problem go away? If so, please describe
your full ruleset and (in any case) your network setup.
Comment 18 Zabavschi Vlad 2006-06-17 14:39:47 UTC
Well... I can accelerate the appearance of the bug... but still...
I can't make it go away... this happens when I let users connect to the server
and just add a -j DROP rule for a user's IP in the FORWARD chain, filter table...

Even if I put 0 in /proc/sys/net/ipv4/ip_conntrack_max the bug is still there...
I don't know what to do...

Best regards,
Vlad Z.
Comment 19 Patrick McHardy 2006-06-17 19:59:54 UTC
Please answer my questions if you want me to help you.
Comment 20 Zabavschi Vlad 2006-06-18 02:25:24 UTC
And I told you: no... I cannot make the bug go away by unloading
conntrack/NAT... That was just an illusion...
And I was also trying to say that the bug appears very quickly with -j DROP...
Comment 21 Patrick McHardy 2006-06-18 05:06:10 UTC
That was only one part of it. Please describe your network setup and post
your iptables ruleset.
Comment 22 Anton Khalikov 2006-06-18 10:05:09 UTC
Guys!

I have exactly the same problem, but with the bridging code. I use XEN and
bridged networking. When any of the hosted domUs goes down, the system tries to
release the bridged network interface but fails with the message:
unregister_netdevice: waiting for vif3.0 to become free. Usage count = 15
This bug existed in the 2.6.12 kernel and in all kernels I have tried with XEN
up to the current 2.6.16.13.
Someone may say that this is a XEN-related problem, but it's definitely not,
because the XEN patches have nothing to do with vanilla bridging support.
Also, I did some searching through this bugzilla and found another person with
the problem, but IPv6-related: http://bugzilla.kernel.org/show_bug.cgi?id=6698

I can give as much information about my system and settings as needed, just ask.
Comment 23 Patrick McHardy 2006-06-18 11:04:57 UTC
That message is just the symptom of a bug which can have a large number of
causes, so without any evidence I wouldn't necessarily expect them to be
related. Feel free to describe your setup, we might notice some similarities,
but if not I prefer to focus on this case first.
Comment 24 Anton Khalikov 2006-06-18 22:10:29 UTC
I am not sure what info is required; here is something that may be useful. If
anything else is needed, please let me know.
andrew:~# lsmod
Module                  Size  Used by
xt_physdev              2128  5
iptable_nat             6596  0
ip_nat                 15244  1 iptable_nat
ip_conntrack           45164  2 iptable_nat,ip_nat
nfnetlink               5176  2 ip_nat,ip_conntrack
iptable_filter          2304  1
ip_tables              10936  2 iptable_nat,iptable_filter
x_tables                9732  3 xt_physdev,iptable_nat,ip_tables
ip_queue                8800  1
intel_agp              20732  1
agpgart                30096  1 intel_agp

andrew:~# lspci
0000:00:00.0 Host bridge: Intel Corp. 82865G/PE/P DRAM Controller/Host-Hub
Interface (rev 02)
0000:00:01.0 PCI bridge: Intel Corp. 82865G/PE/P PCI to AGP Controller (rev 02)
0000:00:1d.0 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB UHCI #1
(rev 02)
0000:00:1d.1 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB UHCI #2
(rev 02)
0000:00:1d.2 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB UHCI #3
(rev 02)
0000:00:1d.3 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB UHCI #4
(rev 02)
0000:00:1d.7 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB2 EHCI
Controller (rev 02)
0000:00:1e.0 PCI bridge: Intel Corp. 82801 PCI Bridge (rev c2)
0000:00:1f.0 ISA bridge: Intel Corp. 82801EB/ER (ICH5/ICH5R) LPC Bridge (rev 02)
0000:00:1f.2 IDE interface: Intel Corp. 82801EB (ICH5) Serial ATA 150 Storage
Controller (rev 02)
0000:00:1f.3 SMBus: Intel Corp. 82801EB/ER (ICH5/ICH5R) SMBus Controller (rev 02)
0000:00:1f.5 Multimedia audio controller: Intel Corp. 82801EB/ER (ICH5/ICH5R)
AC'97 Audio Controller (rev 02)
0000:01:00.0 VGA compatible controller: nVidia Corporation NV34 [GeForce FX
5200] (rev a1)
0000:02:01.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 78)


andrew:~# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0A:5E:49:B7:8F
          inet addr:xxx.xxx.191.100  Bcast:xxx.xxx.191.111  Mask:255.255.255.240
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:7392 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4358 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:5677283 (5.4 MiB)  TX bytes:1011620 (987.9 KiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:447 errors:0 dropped:0 overruns:0 frame:0
          TX packets:447 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:153615 (150.0 KiB)  TX bytes:153615 (150.0 KiB)

peth0     Link encap:Ethernet  HWaddr FE:FF:FF:FF:FF:FF
          UP BROADCAST RUNNING NOARP MULTICAST  MTU:1500  Metric:1
          RX packets:824228 errors:0 dropped:0 overruns:30 frame:0
          TX packets:1029137 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:178338055 (170.0 MiB)  TX bytes:1302189026 (1.2 GiB)
          Interrupt:16 Base address:0x2000

vif0.0    Link encap:Ethernet  HWaddr FE:FF:FF:FF:FF:FF
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:4358 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7392 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1011620 (987.9 KiB)  TX bytes:5677283 (5.4 MiB)

vif5.0    Link encap:Ethernet  HWaddr FE:FF:FF:FF:FF:FF
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1669 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3235 errors:0 dropped:12 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:238965 (233.3 KiB)  TX bytes:896602 (875.5 KiB)

vif6.0    Link encap:Ethernet  HWaddr FE:FF:FF:FF:FF:FF
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:101808 errors:0 dropped:0 overruns:0 frame:0
          TX packets:88639 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:92074727 (87.8 MiB)  TX bytes:11000932 (10.4 MiB)

vif7.0    Link encap:Ethernet  HWaddr FE:FF:FF:FF:FF:FF
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:96355 errors:0 dropped:0 overruns:0 frame:0
          TX packets:87861 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:80993591 (77.2 MiB)  TX bytes:14691890 (14.0 MiB)

vif8.0    Link encap:Ethernet  HWaddr FE:FF:FF:FF:FF:FF
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:820304 errors:0 dropped:0 overruns:0 frame:0
          TX packets:636143 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1125509012 (1.0 GiB)  TX bytes:142082026 (135.4 MiB)

vif9.0    Link encap:Ethernet  HWaddr FE:FF:FF:FF:FF:FF
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:4589 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4718 errors:0 dropped:4 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2115778 (2.0 MiB)  TX bytes:715754 (698.9 KiB)

xenbr0    Link encap:Ethernet  HWaddr FE:FF:FF:FF:FF:FF
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:740 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:29306 (28.6 KiB)  TX bytes:0 (0.0 b)


The only common thing I can see is that we both use netfilter conntrack.
Comment 25 Anton Khalikov 2006-06-19 05:07:43 UTC
some more info

andrew:~# iptables -nvL
Chain INPUT (policy ACCEPT 7775 packets, 5872K bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain FORWARD (policy ACCEPT 12182 packets, 6714K bytes)
 pkts bytes target     prot opt in     out     source               destination
 381K  420M QUEUE      all  --  *      *       0.0.0.0/0            217.24.191.105
 196K   37M QUEUE      all  --  *      *       0.0.0.0/0            217.24.191.104
 256K   27M QUEUE      all  --  *      *       0.0.0.0/0            217.24.191.103
 728K  152M QUEUE      all  --  *      *       0.0.0.0/0            217.24.191.102
 6707 1022K QUEUE      all  --  *      *       0.0.0.0/0            217.24.191.101
 258K   26M ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0  
        PHYSDEV match --physdev-in vif5.0
 289K  251M ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0  
        PHYSDEV match --physdev-in vif6.0
 220K  198M ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0  
        PHYSDEV match --physdev-in vif7.0
 948K 1270M ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0  
        PHYSDEV match --physdev-in vif8.0
 7495 3030K ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0  
        PHYSDEV match --physdev-in vif9.0

Chain OUTPUT (policy ACCEPT 5522 packets, 1316K bytes)
 pkts bytes target     prot opt in     out     source               destination
Comment 26 Patrick McHardy 2006-06-19 10:06:55 UTC
@Anton: does flushing ip_queue's queue help? Packets passed to iptables from the
bridging layer have a reference to the bridge port which ip_queue doesn't check
for when a NETDEV_UNREGISTER notification comes, so these packets are not
automatically released.
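To illustrate the idea (a sketch of the approach, not the attached patch):
packets that crossed a bridge carry an nf_bridge_info block with
physindev/physoutdev pointers, so a device-match check used when flushing
queued packets on NETDEV_UNREGISTER has to compare those as well, not only
skb->dev (the real code also compares the netfilter hook's indev/outdev,
omitted here):

#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* Does this queued packet pin a reference to 'dev'?  Entries for which
 * this returns true must be dropped when 'dev' is unregistering. */
static int queued_skb_uses_dev(const struct sk_buff *skb,
                               const struct net_device *dev)
{
        if (skb->dev == dev)
                return 1;
#ifdef CONFIG_BRIDGE_NETFILTER
        if (skb->nf_bridge) {
                if (skb->nf_bridge->physindev == dev)
                        return 1;
                if (skb->nf_bridge->physoutdev == dev)
                        return 1;
        }
#endif
        return 0;
}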
Comment 27 Patrick McHardy 2006-06-19 10:23:59 UTC
Created attachment 8343 [details]
Free queue entries when bridge port disappears

@Anton: alternatively just try this patch.
Comment 28 Anton Khalikov 2006-06-20 09:11:59 UTC
Patrick, I am sorry for my silence - been damn busy :(
Unfortunately I can't apply your patch right now, but I can try in 2-3 hours.

What I can check right now is: I can remove all lines with -j QUEUE from
iptables, then unload the ip_queue module, and then try to shut down a test
domU (and its interface). Would that help shed some light before I apply the
patch?
Comment 29 Patrick McHardy 2006-06-20 09:20:12 UTC
Yes, that would also show if the problem really is within ip_queue.
Comment 30 Anton Khalikov 2006-06-20 09:29:22 UTC
Well ... I tried and got the same again:
andrew:~# iptables -n -v -L
Chain INPUT (policy ACCEPT 16628 packets, 7073K bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain FORWARD (policy ACCEPT 29629 packets, 8852K bytes)
 pkts bytes target     prot opt in     out     source               destination
 833K  690M ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0  
        PHYSDEV match --physdev-in vif6.0
 549K  502M ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0  
        PHYSDEV match --physdev-in vif7.0
2141K 2797M ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0  
        PHYSDEV match --physdev-in vif8.0
20186 5781K ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0  
        PHYSDEV match --physdev-in vif9.0

Chain OUTPUT (policy ACCEPT 14553 packets, 2948K bytes)
 pkts bytes target     prot opt in     out     source               destination



syslog:
Jun 20 22:25:58 andrew kernel: xenbr0: port 3(vif5.0) entering disabled state
Jun 20 22:25:58 andrew kernel: device vif5.0 left promiscuous mode
Jun 20 22:25:58 andrew kernel: xenbr0: port 3(vif5.0) entering disabled state
Jun 20 22:25:58 andrew logger: /etc/xen/scripts/vif-bridge: offline
XENBUS_PATH=backend/vif/5/0
Jun 20 22:25:59 andrew logger: /etc/xen/scripts/vif-bridge: brctl delif xenbr0
vif5.0 failed
Jun 20 22:25:59 andrew logger: /etc/xen/scripts/vif-bridge: ifconfig vif5.0 down
failed
Jun 20 22:25:59 andrew logger: /etc/xen/scripts/vif-bridge: Successful
vif-bridge offline for vif5.0, bridge xenbr0.
Jun 20 22:26:08 andrew kernel: unregister_netdevice: waiting for vif5.0 to
become free. Usage count = 249
Jun 20 22:26:39 andrew last message repeated 3 times
Comment 31 Anton Khalikov 2006-06-20 09:31:16 UTC
P.S. Sorry, I ran "iptables -n -v -L" after the shutdown of the domU, but I
don't think it really matters.
Comment 32 Anton Khalikov 2006-06-20 09:50:26 UTC
Patrick, is it possible to check what is locking the device?
I found that the kernel gets into an infinite loop in the netdev_wait_allrefs()
function in net/core/dev.c. I see where it gets the usage count from, but I
can't see how to find out what is using the device in order to print a useful
debug message.
Comment 33 Patrick McHardy 2006-06-20 10:07:17 UTC
No, that's not possible. Something took the references and didn't release them,
which could be for a number of reasons:

- the references leaked
- something that contains them leaked
- something that is still holding them doesn't notice that they should be released

ip_queue falls into the last category. I'll see if I can spot some more; I
could imagine more similar mistakes, since this part of bridging is badly
integrated. BTW, does the problem go away after some time (a couple of minutes)?
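For reference, the loop Anton found looks roughly like this (a simplified
sketch of 2.6-era netdev_wait_allrefs() in net/core/dev.c, not the exact
source). All it has to go on is a bare atomic counter, which is why the kernel
can report only the count, never the holder:

#include <linux/netdevice.h>
#include <linux/delay.h>
#include <linux/jiffies.h>

static void netdev_wait_allrefs_sketch(struct net_device *dev)
{
        unsigned long warning_time = jiffies;

        while (atomic_read(&dev->refcnt) != 0) {
                /* Every second or so the real code also rebroadcasts
                 * NETDEV_UNREGISTER under the rtnl lock, giving
                 * well-behaved subsystems another chance to drop their
                 * references (omitted here). */
                msleep(250);

                if (time_after(jiffies, warning_time + 10 * HZ)) {
                        printk(KERN_EMERG "unregister_netdevice: "
                               "waiting for %s to become free. "
                               "Usage count = %d\n",
                               dev->name, atomic_read(&dev->refcnt));
                        warning_time = jiffies;
                }
        }
}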
Comment 34 Patrick McHardy 2006-06-20 10:13:01 UTC
One more thing: are you using NAT on your bridge?
Comment 35 Anton Khalikov 2006-06-20 10:22:13 UTC
Patrick:
1. No, the problem is still here about an hour after the interface shutdown;
the refcount is still the same (249).
2. No, I don't use NAT. All addresses, including the dom0 address, are from the
same network xxx.xxx.191.96/255.255.255.240.

again, the current list of loaded modules:
andrew:~# lsmod
Module                  Size  Used by
xt_physdev              2128  4
iptable_nat             6596  0
ip_nat                 15244  1 iptable_nat
ip_conntrack           45164  2 iptable_nat,ip_nat
nfnetlink               5176  2 ip_nat,ip_conntrack
iptable_filter          2304  1
ip_tables              10936  2 iptable_nat,iptable_filter
x_tables                9732  3 xt_physdev,iptable_nat,ip_tables
intel_agp              20732  1
agpgart                30096  1 intel_agp

and current iptables rules:
andrew:~# iptables-save
# Generated by iptables-save v1.2.11 on Tue Jun 20 23:22:41 2006
*nat
:PREROUTING ACCEPT [127949:6881642]
:POSTROUTING ACCEPT [128275:6901908]
:OUTPUT ACCEPT [329:20482]
COMMIT
# Completed on Tue Jun 20 23:22:41 2006
# Generated by iptables-save v1.2.11 on Tue Jun 20 23:22:41 2006
*filter
:INPUT ACCEPT [18890:7257551]
:FORWARD ACCEPT [47562:17379588]
:OUTPUT ACCEPT [16316:3317802]
-A FORWARD -m physdev  --physdev-in vif6.0 -j ACCEPT
-A FORWARD -m physdev  --physdev-in vif7.0 -j ACCEPT
-A FORWARD -m physdev  --physdev-in vif8.0 -j ACCEPT
-A FORWARD -m physdev  --physdev-in vif9.0 -j ACCEPT
COMMIT
# Completed on Tue Jun 20 23:22:41 2006
Comment 36 Anton Khalikov 2006-06-20 10:29:39 UTC
Created attachment 8353 [details]
My kernel config

Here is my dom0 kernel config. Hope this helps
Comment 37 Anton Khalikov 2006-07-04 07:44:26 UTC
Hello guys

Today I set up a new XEN server with exactly the same config, except that I
used Debian Etch (testing) instead of Debian Sarge (stable) for dom0, with the
same domUs. I see NO SUCH PROBLEM with this distro. It looks like it's a Debian
Sarge-specific problem.
I am going to move all users from the affected server to the new one and then
reinstall the new one with Etch to be sure it is not hardware-related.

P.S. Debian Etch uses a different gcc (version 4.0.4 20060507) vs gcc version
3.3.5 in Sarge.
Comment 38 Anton Khalikov 2006-07-06 12:17:54 UTC
Umm, sorry...
It seems like Debian is not to blame. I am getting this message again on the
new server. It looks like it's ip_queue's fault. I will check Patrick's patch
tomorrow to see whether it helps.
Comment 39 Anton Khalikov 2006-07-06 23:39:38 UTC
To Patrick:
I've done a lot of tests and I can say for sure: the problem is inside the IPQ
kernel module. I tried to apply your patch too, but it doesn't help.
More information: when I boot the server without `modprobe ip_queue` it works
perfectly. I can reboot or shut down any domU (which changes the bridge
topology) without problems. After I load the IPQ module and redirect some
traffic to it (I use ipcad to collect stats) I get the problem. Then even if I
do `rmmod ip_queue` before the "unregister_netdevice" kernel message appears, I
get it anyway, so it looks like the IPQ module leaves some trace in the kernel
(I don't know how).

So I am going to try ULOG next.
Comment 40 Patrick McHardy 2006-07-07 00:15:19 UTC
Created attachment 8494 [details]
Handle NF_STOP in nf_reinject

This means we're leaking somewhere on the queue path. I found a possible reason
for this in the queueing core, but I can't see how this could be triggered.
Please try this patch anyway.
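Roughly, the problem the patch title points at is that the verdict switch in
nf_reinject() had no case for NF_STOP, so a queued packet reinjected with that
verdict was never passed on or freed, and the references held for it (including
device references) leaked. A sketch of the kind of handling involved,
simplified and with an invented signature (not the attached patch):

#include <linux/netfilter.h>
#include <linux/skbuff.h>

typedef int (*okfn_t)(struct sk_buff *);

/* Every verdict path must either pass the packet on, re-queue it, or
 * free it; any verdict that falls through leaks the skb and keeps the
 * device refcount from ever reaching zero. */
static void reinject_verdict_sketch(struct sk_buff *skb,
                                    unsigned int verdict, okfn_t okfn)
{
        switch (verdict) {
        case NF_ACCEPT:
        case NF_STOP:           /* previously unhandled */
                okfn(skb);      /* deliver the packet onwards */
                break;
        case NF_QUEUE:
                /* hand the packet back to a queue handler (omitted) */
                break;
        case NF_DROP:
        default:                /* bogus verdicts must not leak either */
                kfree_skb(skb);
                break;
        }
}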
Comment 41 Anton Khalikov 2006-07-07 00:38:55 UTC
Nope, SSDD :(
I applied your patch in addition to the 2 previous ones.

Btw, after I applied it and then ran `make modules` it didn't rebuild any
modules, so I ran `touch net/netfilter/nfnetlink_queue.c` and then `make
modules` again so that it rebuilt net/netfilter/nfnetlink_queue.ko.
Was that correct?
Comment 42 Patrick McHardy 2006-07-07 00:41:13 UTC
No, this affects a part that is always statically built in, so you need to
rebuild and install the entire kernel.
Comment 43 Anton Khalikov 2006-07-07 01:51:41 UTC
Patrick, I have rebuilt the whole kernel (make clean and then make ...).
Now it behaves strangely, but it is a bit more stable. When I try to ping the
domU domains to generate some traffic it works reliably. When I use a flood
ping to do the same, I sometimes get the unregister_netdevice message after
shutdown, but not always. I tried to reproduce the bug twice more but had no
success.
The statistics are as follows: I rebooted the dom0 server 4 times, did some
pings (normal and flood) after every reboot, and tried to reboot/shut down the
domU domains. Once I got the error; 3 times everything worked fine (stable).
The first time it was fine, the second time I got the error and used a sysrq
reboot, then 2 times it worked fine and I couldn't reproduce the bug again.
Comment 44 Patrick McHardy 2006-07-07 01:57:12 UTC
There was one case my previous patch didn't handle: if userspace sends
incorrect verdicts, the packet may also leak. Does this patch behave more
reliably?

BTW, you don't need to make clean, just "make" should be fine.
Comment 45 Patrick McHardy 2006-07-07 01:57:55 UTC
Created attachment 8496 [details]
Handle NF_STOP and invalid verdicts
Comment 46 Patrick McHardy 2006-07-07 02:02:22 UTC
One more thing: please also try to rmmod the ip_queue module with this patch in
case the problem appears.
Comment 47 Anton Khalikov 2006-07-07 02:08:57 UTC
Patrick, please let me know if your first patch (Free queue entries when
bridge port disappears) is required. I have it applied to my kernel.
Comment 48 Patrick McHardy 2006-07-07 02:20:00 UTC
Not sure if it's required for your case (it doesn't seem to help), but it is a
correct fix and it is also part of 2.6.18-rc1.
Comment 49 Anton Khalikov 2006-07-07 02:30:18 UTC
OK, some new info: I still CAN reproduce the bug, but only under high traffic
(2 flood pings to different domUs in parallel). The first domU rebooted fine
under flood ping; the second domU failed and I got:
unregister_netdevice: waiting for vif10.0 to become free. Usage count = 49786
Removing the -j QUEUE lines from iptables and then running `rmmod ip_queue`
didn't help either. When traffic is not high, everything works fine.
Comment 50 Patrick McHardy 2006-07-07 02:33:56 UTC
OK, thanks. The patch is right in any case; I'm going to continue looking for
other possible reasons, but I probably won't get to it today. One question I'm
not entirely clear about: the problem appears in the host system, not in the
guest system, right?
Comment 51 Anton Khalikov 2006-07-07 02:39:54 UTC
Yes, the problem appears in the host system (dom0) when rebooting/shutting down
a guest system (domU).

Patrick, before you go, please answer: is there any chance you could do
something over the weekend? I need to upgrade the system and put the server
back in the datacenter today, but I can wait until Monday morning (local time)
if you need to check anything else before then.
Comment 52 Patrick McHardy 2006-07-07 02:44:27 UTC
I'll look into it this weekend, but I'm not sure if I will find anything.
Comment 53 Anton Khalikov 2006-07-07 02:58:42 UTC
OK, I'll leave the server here and will be watching this topic.
Comment 54 Natalie Protasevich 2007-07-07 15:46:25 UTC
Any updates on this problem? Vlad, Anton - have you tested with later kernels since?
There were multiple netlink fixes submitted by Patrick lately. Please test with the latest kernel if you haven't already.
Thanks.
Comment 55 Stephen Hemminger 2007-09-05 06:56:56 UTC
*** Bug 8638 has been marked as a duplicate of this bug. ***
Comment 56 Andrew Hall 2007-11-11 13:48:41 UTC
I've consistently seen this problem, and it appears to be a deadlock where a net device cannot be released because it's being held by ipsec/racoon. I've tried this on stock kernels 2.6.21.5 and 2.6.23.1, both with the same issue.

The scenario that makes this happen consistently for me is to have two pppoe processes in use at the same time with an ipsec tunnel using racoon established over one of the links. If I then terminate the link (ppp0) that does NOT have the tunnel established over it, I get:
 
unregister_netdevice: waiting for ppp0 to become free. Usage count = 1

and both the pppd and racoon processes are held indefinitely in a non-interruptible 'D' state, and the only option is to reboot the server. Killing the other pppoe session (ppp2), the one the tunnel is established over, does not seem to cause this problem. Interestingly, even if I tell racoon to bind ONLY to the ppp device that has the tunnel, the problem still happens.
Comment 57 Daniel Lin 2007-11-25 16:59:55 UTC
On 2.6.23.1, there are two setups which expose this bug for me:

eth0 (skge)
eth1 (tulip)
6to4 (sit)
IPv4 and IPv6 forwarding is in use.
After about 24 hours of heavy traffic, attempting to bring down 6to4 hangs.

eth0 (skge)
eth1 (tulip)
br0 (bridge of eth0 and eth1)
After about 24 hours of heavy traffic, attempting to bring down br0 hangs.

In both cases, dmesg shows a steady stream of something like
unregister_netdevice: waiting for br0 to become free. Usage count = 647
Comment 58 Trevor Cordes 2007-11-26 06:26:45 UTC
Comment #56, I get exactly the same behavior, but I'm not using racoon (just a manual 2.6 IPsec configuration). I will be upgrading the relevant box to F8 very soon and will report back. That will move me from 2.6.20 to 2.6.23. I'm disappointed to hear the bug shows up for you in 2.6.23 - does that mean this hasn't been resolved yet?!
Comment 59 Patrick McHardy 2007-11-26 07:13:16 UTC
Created attachment 13759 [details]
fix xfrm state leak

This should fix the IPsec-related problems. As for the others: since this report is a complete mess of I don't know how many different problems, I'm going to remove myself from the CC list. If you want someone to actually look into this, I'd suggest opening new bugs for the different cases and adding full information about your network configuration.
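As general background on why a leak like this shows up as a stuck device refcount (an illustration only; the actual upstream fix is a different, xfrm-specific change): any object that caches a routing entry holds it via a struct dst_entry, and the dst in turn holds a reference on its device, so forgetting to drop the cached dst keeps the device's usage count above zero indefinitely. A hypothetical sketch (names invented):

#include <linux/netdevice.h>
#include <net/dst.h>

/* Hypothetical object that caches a route.  The dst_entry pins its
 * output device, so leaking this object - or never flushing cached_dst
 * when the device goes away - keeps that device's usage count from
 * reaching zero, which is how a leak elsewhere can pin a ppp device
 * long after pppd has exited. */
struct example_cache {
        struct dst_entry *cached_dst;
};

static void example_cache_flush(struct example_cache *c)
{
        if (c->cached_dst) {
                dst_release(c->cached_dst);   /* indirectly releases the device */
                c->cached_dst = NULL;
        }
}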
Comment 60 Natalie Protasevich 2008-03-18 03:06:39 UTC
Why don't we do just that. I will close the bug, and if anyone still has problems, please open a new entry. It is OK if we end up with duplicates; that's better than several problems in one...
Patrick, are you planning to submit the patch, or was the code fixed in another way?
Thanks.
Comment 61 Patrick McHardy 2008-03-18 04:29:34 UTC
The IPsec leak is already fixed upstream (5dba4797), using a slightly different patch.
Comment 62 Natalie Protasevich 2008-03-18 08:32:30 UTC
Great, thanks! Closing the bug then.