Bug 13097

Summary: Kernel will freeze network after using a tun/tap device
Product: Drivers Reporter: Dâniel Fraga (fragabr)
Component: Network Assignee: Herbert Xu (herbert)
Status: CLOSED CODE_FIX    
Severity: normal CC: herbert, rjw
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.30-rc2 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 13070    
Attachments: Cisco Kernel Panic Log Screenshot

Description Dâniel Fraga 2009-04-15 22:19:05 UTC
I chose the forcedeth driver for this report because I think the bug is related to forcedeth and not the tun/tap driver, although I could be wrong.

With 2.6.29 kernel everything works fine, but with 2.6.30-rc2, a simple ping to a tun/tap device (for example ping to a vpn using openvpn) will freeze the network. In X I can't even use the keyboard anymore.

It just happens with a tun/tap device, not with a normal ethernet device or a normal internet connection.

If you think it's a tun/tap driver problem, please change the bug, but as I'm not completely sure, I chose forcedeth.

Thanks.

PS: I couldn't test with 2.6.30-rc1 because it wouldn't let me use the network at all. 2.6.30-rc2 fixes the network but has this tun/tap problem.
Comment 1 Dâniel Fraga 2009-04-15 22:19:41 UTC
And no, there's no log message, nor error messages.
Comment 2 Andrew Morton 2009-04-15 22:33:12 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

Big fat regression!

On Wed, 15 Apr 2009 22:19:06 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=13097
> 
>            Summary: Kernel will freeze network after using a tun/tap
>                     device
>            Product: Drivers
>            Version: 2.5
>     Kernel Version: 2.6.30-rc2
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Network
>         AssignedTo: drivers_network@kernel-bugs.osdl.org
>         ReportedBy: fragabr@gmail.com
>         Regression: Yes
> 
> 
> I choose the forcedeth driver to report this bug because I think it's related
> to forcedeth and not tun/tap driver, although I can be wrong.
> 
> With 2.6.29 kernel everything works fine, but with 2.6.30-rc2, a simple ping
> to
> a tun/tap device (for example ping to a vpn using openvpn) will freeze the
> network. In X I can't even use the keyboard anymore.
> 
> It just happens with a tun/tap device, not with a normal ethernet device or a
> normal internet connection.
> 
> If you think it's a tun/tap driver problem, please change the bug, but as I'm
> not completely sure, I choose forcedeth.

An obvious question would be: are you able to test the same setup with
a different type of network card?
Comment 3 Dâniel Fraga 2009-04-15 23:21:13 UTC
On Wed, 15 Apr 2009 22:33:13 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=13097

> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).

	Ok.

> Big fat regression!

> An obvious question would be: are you able to test the same setup with
> a different type of network card?

	Unfortunately no. I just have this computer and I use just the
onboard nic.

	I can test patches if you send them to me. No problem. Or if there's a
way to make it more verbose...
Comment 4 Dâniel Fraga 2009-04-15 23:36:33 UTC
Is there an easy way to see all the changes made to the forcedeth driver between 2.6.29 and 2.6.30-rc2? That way I can try reverting the patches to see if it helps...
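
One way to answer this question, assuming a git checkout of the mainline kernel tree with the release tags v2.6.29 and v2.6.30-rc2 available, is to restrict git log and git diff to the driver's source file. The sketch below only builds and runs those commands; the helper names are hypothetical, and drivers/net/forcedeth.c is where the driver lived in these releases.

```python
import subprocess

# Location of the forcedeth driver in the 2.6.29 / 2.6.30 source trees.
DRIVER_PATH = "drivers/net/forcedeth.c"


def git_range_cmd(subcommand, old_tag, new_tag, path, *options):
    """Build a git command covering old_tag..new_tag, limited to one path.

    Hypothetical helper: it only assembles the argument list, so it can be
    inspected or logged before anything is executed.
    """
    return ["git", subcommand, *options, f"{old_tag}..{new_tag}", "--", path]


def run_in_tree(cmd, tree="."):
    """Run cmd inside the kernel checkout at `tree` and return its stdout."""
    result = subprocess.run(
        cmd, cwd=tree, check=True, capture_output=True, text=True
    )
    return result.stdout


def forcedeth_commits(tree="."):
    """One line per commit that touched the driver between the two releases."""
    cmd = git_range_cmd("log", "v2.6.29", "v2.6.30-rc2", DRIVER_PATH, "--oneline")
    return run_in_tree(cmd, tree).splitlines()


def forcedeth_diff(tree="."):
    """Combined diff of the driver between the two releases."""
    cmd = git_range_cmd("diff", "v2.6.29", "v2.6.30-rc2", DRIVER_PATH)
    return run_in_tree(cmd, tree)
```

Calling forcedeth_commits("/path/to/linux") from a script would list the candidate commits; each one could then be reverted in turn with git revert and the kernel rebuilt and retested. The checkout path here is a placeholder.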
Comment 5 Herbert Xu 2009-04-17 00:59:16 UTC
Andrew Morton <akpm@linux-foundation.org> wrote:
>
>> With 2.6.29 kernel everything works fine, but with 2.6.30-rc2, a simple ping
>> to
>> a tun/tap device (for example ping to a vpn using openvpn) will freeze the
>> network. In X I can't even use the keyboard anymore.
>> 
>> It just happens with a tun/tap device, not with a normal ethernet device or
>> a
>> normal internet connection.
>> 
>> If you think it's a tun/tap driver problem, please change the bug, but as
>> I'm
>> not completely sure, I choose forcedeth.
> 
> An obvious question would be: are you able to test the same setup with
> a different type of network card?

Please get a stack backtrace of all processes on the machine after
the hang, e.g., with Ctrl-ScrollLock.

Thanks,
Comment 6 Dâniel Fraga 2009-04-17 04:32:12 UTC
(In reply to comment #5)

> Please get a stack backtrace of all processes on the machine after
> the hang, e.g., with Ctrl-ScrollLock.

Ok, but is there a way to redirect the backtrace output to a file? I ask because it's too much information and the output scrolls off the screen.

If it's possible to save it directly to a file, it would be easier to post the requested backtrace.

Thanks.
Comment 7 Dâniel Fraga 2009-04-19 23:45:43 UTC
(In reply to comment #5)

> Please get a stack backtrace of all processes on the machine after
> the hang, e.g., with Ctrl-ScrollLock.

My last question was stupid because I noticed that the stack backtrace is written to syslog, but the problem is that after the hang nothing is written to syslog anymore... So we're stuck here.
Comment 8 Herbert Xu 2009-04-20 06:07:59 UTC
Can you try again with these two patches? I thought they didn't resemble your symptoms, but then again if OpenVPN detaches and reattaches the tun device, it could cause this problem.

http://patchwork.ozlabs.org/patch/26173/
http://patchwork.ozlabs.org/patch/26058/

Thanks!
Comment 9 Dâniel Fraga 2009-04-20 06:38:39 UTC
(In reply to comment #8)
> Can you try again with these two patches? I thought they didn't resemble your
> symptoms, but then again if OpenVPN detaches and reattaches the tun device,
> it
> could cause this problem.

Maybe that's not my problem because I use a persistent connection. As far as I know, openvpn keeps the connection alive all the time, right? Anyway...

> http://patchwork.ozlabs.org/patch/26173/

...this patch applied correctly...

> http://patchwork.ozlabs.org/patch/26058/

...but this didn't.

I tried to apply them on 2.6.30-rc2, ok?

> Thanks!

Thanks! I'd like to comment more on the symptoms: after I try to use the tun device (for example, ping over the vpn), ping gets no reply and I can't kill the ping process either. Then the load average gets higher and higher and the cpu cooler spins faster and faster... So something is stuck and the whole system is affected.

Anyway if you correct the second patch above, I'll apply again and test. Thanks.
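
The symptoms in this comment (an unkillable ping and a climbing load average) can be examined from user space while the machine still responds. On Linux the load average counts tasks that are runnable as well as tasks in uninterruptible sleep (state D in /proc/<pid>/stat), and a task in state D cannot be killed until it wakes up. The sketch below is an illustration only: the thread does not record which state the ping process was in, and the function names are hypothetical.

```python
import os


def parse_stat(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of splitting on whitespace.
    head, _, tail = stat_line.rpartition(")")
    pid, _, comm = head.partition(" (")
    return int(pid), comm, tail.split()[0]


def tasks_in_state(wanted="D", proc_root="/proc"):
    """List (pid, comm) for every process whose state letter is `wanted`."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == wanted:
            found.append((pid, comm))
    return found


def load_average(proc_root="/proc"):
    """Return the 1, 5 and 15 minute load averages as floats."""
    with open(os.path.join(proc_root, "loadavg")) as fh:
        return tuple(float(field) for field in fh.read().split()[:3])
```

Running tasks_in_state("D") and load_average() every few seconds after starting the ping would show whether blocked tasks accumulate as the load rises. If the system is already too unresponsive to start a new process, this approach does not help.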
Comment 10 Herbert Xu 2009-04-20 08:38:29 UTC
Ah sorry, my changes to the first patch had invalidated the second.  Here's an updated version of the second patch that should apply.

http://patchwork.ozlabs.org/patch/26183/
Comment 11 Dâniel Fraga 2009-04-20 17:16:36 UTC
(In reply to comment #10)
> Ah sorry, my changes to the first patch had invalidated the second.  Here's
> an
> updated version of the second patch that should apply.
> 
> http://patchwork.ozlabs.org/patch/26183/

Thank you very much Herbert! I confirm that your patches worked perfectly!
I'm changing the status of this bug to resolved, ok?
Comment 12 Rafael J. Wysocki 2009-04-25 21:27:41 UTC
Fixed by commit c40af84a6726f63e35740d26f841992e8f31f92c.
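
To check whether a given kernel tag or branch already includes this fix, one option is git merge-base --is-ancestor, which exits with status 0 when the first commit is an ancestor of the second and 1 when it is not. The following is a minimal sketch assuming a git checkout of the kernel tree; the function names are hypothetical.

```python
import subprocess

# Commit hash quoted in the comment above.
FIX_COMMIT = "c40af84a6726f63e35740d26f841992e8f31f92c"


def is_ancestor_cmd(commit, revision):
    """Build the git command that tests whether `revision` contains `commit`."""
    return ["git", "merge-base", "--is-ancestor", commit, revision]


def contains_fix(revision, tree=".", commit=FIX_COMMIT):
    """Return True if `revision` (a tag, branch, or hash) includes `commit`."""
    result = subprocess.run(is_ancestor_cmd(commit, revision), cwd=tree)
    if result.returncode not in (0, 1):
        # Any other status means git itself failed, e.g. an unknown revision.
        raise RuntimeError(f"git exited with status {result.returncode}")
    return result.returncode == 0
```

For example, contains_fix("v2.6.30-rc2", tree="/path/to/linux") would report whether that tag includes the fix; the checkout path is a placeholder.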
Comment 13 Marian 2009-06-18 18:25:56 UTC
Created attachment 21996
Cisco Kernel Panic Log Screenshot

Kernel panic screen message taken after a ping to a remote host over a Cisco VPN connection. The kernel panic was generated in Linux 2.6.29 (latest FC11 kernel).