Bug 6295

Summary: unregister_netdevice loops indefinitely when bringing down an interface if static ARP entries are present
Product: Networking
Reporter: Jurij Smakov (jurij)
Component: IPV4
Assignee: Stephen Hemminger (stephen)
Status: RESOLVED CODE_FIX
Severity: normal
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.16
Subsystem:
Regression: ---
Bisected commit-id:
Attachments: Kernel config with which the problem was reproduced

Description Jurij Smakov 2006-03-27 23:12:49 UTC
Most recent kernel where this bug did not occur: 2.6.8

Distribution: Debian unstable, bug reproduced with vanilla kernel from kernel.org

Problem Description:
When using a VPN client such as openvpn or gvpe, killing the client (which
should bring down the VPN interface) leaves the process stuck in D+ state
if static ARP entries are present for the interface. The kernel periodically
emits the message "unregister_netdevice: waiting for <interface> to become
free. Usage count = 1". The usage count is equal to the number of static ARP
entries.

Thanks a lot to Marc Lehmann for patient debugging of this problem and
constructing the test case.

Steps to reproduce (using openvpn, all mentioned addresses are non-existent):

# modprobe tun
# openvpn --remote 192.168.1.111 --dev tun1 --ifconfig 10.0.0.1 10.0.0.2
In another console:
# ip neigh add to 10.0.0.3 dev tun1 nud permanent
Return to the console in which openvpn is running and press Ctrl-C. The openvpn
process hangs in D+ state (it can't be killed), and the error messages mentioned
above are periodically printed on the console.
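
To check the observation that the usage count matches the number of static ARP entries, the two values can be extracted and compared. This is only a sketch: the function names are hypothetical, and the sed pattern assumes the exact message format quoted in the problem description.

```shell
# Print "<interface> <usage count>" for every unregister_netdevice
# warning read from standard input (for example, the output of dmesg).
parse_unregister_warnings() {
    sed -n 's/.*unregister_netdevice: waiting for \([^ ]*\) to become free\. Usage count = \([0-9][0-9]*\).*/\1 \2/p'
}

# Count permanent neighbour entries in "ip neigh show" style output
# read from standard input.
count_permanent_neigh() {
    grep -c 'PERMANENT'
}
```

For example, `dmesg | parse_unregister_warnings` lists the affected interfaces, and `ip neigh show dev tun1 nud permanent | count_permanent_neigh` gives the number to compare against.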

Best regards,

Jurij Smakov                                        jurij@wooyd.org
Key: http://www.wooyd.org/pgpkey/                   KeyID: C99E03CC
Comment 1 Jurij Smakov 2006-03-27 23:17:00 UTC
Created attachment 7692
Kernel config with which the problem was reproduced
Comment 2 Andrew Morton 2006-03-27 23:22:23 UTC
bugme-daemon@bugzilla.kernel.org wrote:
>
> http://bugzilla.kernel.org/show_bug.cgi?id=6295
> 
>            Summary: unregister_netdevice loops indefinitely when bringing
>                     down an interface if static ARP entries are present
>     Kernel Version: 2.6.16
>             Status: NEW
>           Severity: normal
>              Owner: shemminger@osdl.org
>          Submitter: jurij@wooyd.org

Comment 3 Stephen Hemminger 2006-04-10 16:34:12 UTC
I cannot reproduce this with 2.6.16.2. Did you have the problem on
2.6.16, or were you running the ancient Debian kernel?

Comment 4 Jurij Smakov 2006-04-10 20:56:00 UTC
I have just successfully reproduced it with 2.6.16.2 built with the config I
posted here earlier. The original submitter mentioned that the bug is not
triggered with all configs. Could you please post your config, so that I can
confirm that it does not trigger the bug and try to identify the option which
causes the problem?
Comment 5 Stephen Hemminger 2006-04-11 16:13:02 UTC
The problem is in the ATM Classical IP (clip) layer.
It is not correctly cleaning up the neighbor table entry.
Comment 6 Jurij Smakov 2006-04-11 19:57:30 UTC
Right, I can confirm that disabling CONFIG_ATM_CLIP made the problem disappear.
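
For anyone wanting to check whether a given kernel config has the option enabled, a minimal sketch follows. The function name is hypothetical, and the config file path in the usage note is an assumption that depends on the distribution.

```shell
# Succeed if CONFIG_ATM_CLIP is built in (=y) or built as a module (=m)
# in the kernel config file given as the first argument.
clip_enabled() {
    grep -Eq '^CONFIG_ATM_CLIP=(y|m)$' "$1"
}
```

Usage, assuming the running kernel's config is installed under /boot: `clip_enabled "/boot/config-$(uname -r)" && echo "CONFIG_ATM_CLIP is enabled"`.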
Comment 7 Stephen Hemminger 2006-04-12 10:57:07 UTC
Patch submitted for 2.6.16.5 and 2.6.17 to fix ATM clip neighbor table.