Bug 6698

Summary: unregister_netdevice hangs indefinitely from /proc/sys/net/ipv6/conf/all/forwarding
Product: Networking Reporter: Bernhard M. Wiedemann (kernelbmw2017)
Component: IPV6 Assignee: Hideaki YOSHIFUJI (yoshfuji)
Status: REJECTED INSUFFICIENT_DATA    
Severity: normal CC: bunk, herbert, protasnb
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.17-rc6 Subsystem:
Regression: --- Bisected commit-id:

Description Bernhard M. Wiedemann 2006-06-16 09:20:16 UTC
Most recent kernel where this bug did not occur: none known (yet)
Distribution: reproduced on Debian/stable, SuSE/10.0, SuSE/10.1
Hardware Environment: reproduced on UML, i386, x86/64
Software Environment: reproduced with openvpn and UML tap devices
Problem Description: after adding IPv6 to my previously working openvpn
tunneling setup, a (really old) IPv6-related bug started to occur:
http://lkml.org/lkml/2003/8/21/1
I also reproduced this bug with kernel 2.6.15.1(vanilla,uml) and
2.6.16.13(SuSE-version,x86/64) and linux-2.6.13 (SuSE-version,i386)

Steps to reproduce:
echo 0 > /proc/sys/net/ipv6/conf/all/forwarding # this is important initialization

Have (any version of) openvpn open a tunnel using a tap (virtual ethernet)
device. In the "up" script do:
echo 1 > /proc/sys/net/ipv6/conf/all/forwarding
this can be easily tested with these lines:
apt-get install openvpn
modprobe tun
mknod /dev/net/tun c 10 200
echo 0 > /proc/sys/net/ipv6/conf/all/forwarding
echo "echo 1 > /proc/sys/net/ipv6/conf/all/forwarding" > /tmp/up ; chmod a+x /tmp/up
openvpn --dev-type tap --remote tunnel.lsmod.de 5003 --ifconfig 10.9.0.2 255.255.255.0 --dev-node /dev/net/tun --up /tmp/up
# at this point you can verify your tunnel setup by ping 10.9.0.1
# on the server I have this: openvpn --dev-type tap --ifconfig 10.9.0.1 255.255.255.0 --port 5003 --dev-node /dev/net/tun --float
# you need UDP port 5003 to pass through your firewall for this


Alternatively, get a user-mode-linux (UML) binary and do something along the
lines of:
apt-get install uml-utilities
TAP=`tunctl -b`
ifconfig $TAP 192.168.121.1 netmask 255.255.255.252
echo 1 > /proc/sys/net/ipv6/conf/all/forwarding
/path/to/linux eth0=tuntap,$TAP ... # boot up to the point where the tap dev
# is really bound (at "ifconfig eth0 192.168.121.2" within the UML)
tunctl -d $TAP


After 20 seconds kill the openvpn or linux process.
This hangs indefinitely, leaving the openvpn process in "D" state.
syslog states every 10 secs:
unregister_netdevice: waiting for tap0 to become free.  Usage count = 1

After that, "ifconfig" and "ip" commands hang as well, probably because the
task waiting for tap0 still holds a mutex.
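
A minimal sketch for spotting this symptom in the kernel log, assuming the
message format quoted above. The function name leaked_netdevs is made up for
illustration; it reads log text on stdin and prints the device name and the
reported usage count for each matching line.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and print
# "<device> <usage count>" for every unregister_netdevice wait message.
# Lines that do not match the message are silently dropped.
leaked_netdevs() {
    sed -n 's/.*unregister_netdevice: waiting for \([^ ]*\) to become free\. *Usage count = \([0-9][0-9]*\).*/\1 \2/p'
}
```

It could be used as "dmesg | leaked_netdevs | sort -u" to list the devices
that are stuck.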

After a dozen reboots of trying I found a work-around: replacing the critical
line with
(sleep 2 ; echo 1 > /proc/sys/net/ipv6/conf/all/forwarding )&

A sleep 1 does not suffice.
Doing the echo before calling openvpn also works fine, so there seems to be a
timing problem or race condition during initialization of the IPv6 on the newly
created tap0 device.
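
The work-around could be packaged for the "up" script roughly as follows. This
is only a sketch: FWD, DELAY and enable_forwarding_later are names introduced
here for illustration, and the 2-second default is simply the value that worked
in my tests, not a guaranteed bound.

```shell
#!/bin/sh
# Sketch of the delayed-forwarding work-around for an openvpn "up" script.
# FWD and DELAY are illustrative variables, not openvpn settings.
FWD=${FWD:-/proc/sys/net/ipv6/conf/all/forwarding}
DELAY=${DELAY:-2}   # a delay of 1 second was not enough in the tests above

enable_forwarding_later() {
    # Run in the background so the up script returns immediately;
    # presumably this lets the new tap device finish its IPv6 setup
    # before forwarding is switched on.
    ( sleep "$DELAY" ; echo 1 > "$FWD" ) &
}
```

Calling enable_forwarding_later as the last line of /tmp/up gives the same
effect as the one-liner above.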
Comment 1 Bernhard M. Wiedemann 2006-06-16 20:48:31 UTC
Now I found an even simpler way to trigger this bug, not needing any separate
software.

# preparation
ifconfig eth0 down
echo 0 > /proc/sys/net/ipv6/conf/all/forwarding
modprobe ipv6

# triggering - this line should be executed as one statement
ifconfig eth0 up ; echo 1 > /proc/sys/net/ipv6/conf/eth0/forwarding
ifconfig eth0 down
rmmod $ethernetdriver # this hangs because of the refcount-leak

I also reproduced the bug on a 2.4.21 kernel, so this bug is REALLY old.
Comment 2 Andrew Morton 2006-06-19 16:02:25 UTC
Thought to be an ipv6 refcount leak.

Comment 3 Herbert Xu 2006-07-11 05:55:24 UTC
I completely failed to reproduce this with either 2.6.16 or 2.6.17.  Could you
please do a cat /proc/net/igmp6 just before you bring eth0 down and unload the
module?

Thanks,
Comment 4 Bernhard M. Wiedemann 2006-07-27 10:52:12 UTC
I reproduced this bug with 2.6.18-rc2 today (with even worse impact) and wrote
a script to reproduce it easily. Sometimes the script fails to set the IPv6
address and does not trigger the bug; in that case just run it a second time.

uploaded to
http://www3.zq1.de/~bernhard/linux/6698/

I also uploaded my kernel config (the x86_64 one) and the igmp6 outputs before
the rmmod. (igmp6-bad is the one when the bug is happening)

diff igmp6-good igmp6-bad
2d1
< 6    eth0            ff0200000000000000000001ff000000     1 00000004 0
3a3,5
> 6    eth0            ff0200000000000000000001ff000000     2 00000004 0
> 6    eth0            ff020000000000000000000000000002     1 00000004 0
> 6    eth0            ff0200000000000000000001ff000001     1 00000004 0
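
To compare two such dumps by group rather than by line position, a sketch like
the following may help. It assumes the fourth column of /proc/net/igmp6 is the
per-group user count; igmp6_users is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: print "<device> <group> <users>" for one interface
# from an igmp6 dump read on stdin. Assumed column order:
# ifindex, device, group address, users, flags, timer.
igmp6_users() {
    awk -v dev="$1" '$2 == dev { print $2, $3, $4 }'
}
```

For example, "igmp6_users eth0 < igmp6-bad" lists only the eth0 groups with
their user counts.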



### script to cause bug 6698 ###
#!/bin/sh
drv=sk98lin

echo 0 > /proc/sys/net/ipv6/conf/all/forwarding
modprobe ipv6
ifconfig eth0 down
ifconfig eth1 down
rmmod forcedeth
rmmod $drv

modprobe $drv
# the bug happens in the next line
ifconfig eth0 192.168.0.2 up ; ifconfig eth0 add 2003:1234:1234::1/64 ; echo 1 > /proc/sys/net/ipv6/conf/all/forwarding
sleep 9
cat /proc/net/igmp6 > /root/igmp6.$$
rmmod $drv


This should help you get started... err, get your kernel crashed :-)
Comment 5 Natalie Protasevich 2007-07-08 11:29:59 UTC
Any updates on this bug? There have been multiple patches to ipv6 lately. Would you please test with the latest git?
Thanks.
Comment 6 Adrian Bunk 2007-09-22 19:25:59 UTC
Please reopen this bug if it's still present with kernel 2.6.22.