Bug 9342 (__skb_dequeue_error)

Summary: a kernel error happened in __skb_dequeue when called from pfifo_fast_dequeue
Product: Networking Reporter: almighty (jiang.wei7)
Component: IPV4 Assignee: Stephen Hemminger (stephen)
Status: REJECTED INVALID    
Severity: normal CC: jiang.wei7, protasnb
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: linux-2.6.11.12 Subsystem:
Regression: --- Bisected commit-id:

Description almighty 2007-11-09 22:33:22 UTC
Most recent kernel where this bug did not occur:
Distribution:
Hardware Environment:CNXT Driver 
Software Environment:
Problem Description:

Steps to reproduce:
The bug happened while we were trying to create a PPPoE connection. The kernel crashed and printed the following information. (By the way, this bug occurs with very low probability.)

Unable to handle kernel NULL pointer dereference at virtual address 00000004
pgd = c0024000
[00000004] *pgd=00000000
Internal error: Oops: 817 [#1]
Modules linked in: stun_ahb solosw_defaultrestore wlan_wsc cnxt_drv fiq_rel hsl_mod_gpl solos_ethernet
CPU: 0
pc : [<c0156768>]    lr : [<c1bb99c0>]    Tainted: P     
sp : c02cbf40  ip : c1bb9a4c  fp : c02cbf4c
r10: 00000000  r9 : 00000000  r8 : 00000000
r7 : c1bb99c0  r6 : 0000000a  r5 : c128f800  r4 : c128f800
r3 : 00000000  r2 : 00000000  r1 : 00000000  r0 : c11223e0
Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  Segment kernel
Control: 5397F  Table: 01B74000  DAC: 00000017
Process ksoftirqd/0 (pid: 2, stack limit = 0xc02ca194)
Stack: (0xc02cbf40 to 0xc02cc000)
bf40: c02cbf6c c02cbf50 c0156200 c0156734 c128f800 00000000 0000000a c0053838 
bf60: c02cbf84 c02cbf70 c0147c8c c01561ec 00000001 c024439c c02cbfa0 c02cbf88 
bf80: c0053714 c0147bb0 60000013 00000000 c02d7f58 c02cbfb4 c02cbfa4 c005382c 
bfa0: c00536c4 c02ca000 c02cbfcc c02cbfb8 c005389c c00537f0 00000000 c02ca000 
bfc0: c02cbff4 c02cbfd0 c006211c c0053844 ffffffff ffffffff 00000000 00000000 
bfe0: 00000000 00000000 00000000 c02cbff8 c0050b38 c0062040 00000000 00000000 
Backtrace: 
Function entered at [<c0156728>] from [<c0156200>]
Function entered at [<c01561e0>] from [<c0147c8c>]
 r7 = C0053838  r6 = 0000000A  r5 = 00000000  r4 = C128F800
Function entered at [<c0147ba4>] from [<c0053714>]
 r5 = C024439C  r4 = 00000001 
Function entered at [<c00536b8>] from [<c005382c>]
 r6 = C02D7F58  r5 = 00000000  r4 = 60000013 
Function entered at [<c00537e4>] from [<c005389c>]
 r4 = C02CA000 
Function entered at [<c0053838>] from [<c006211c>]
 r5 = C02CA000  r4 = 00000000 
Function entered at [<c0062034>] from [<c0050b38>]
 r7 = 00000000  r6 = 00000000  r5 = 00000000  r4 = 00000000
Code: e3a02000 e2433001 e58c3008 e58c1000 (e581c004) 
 <0>Kernel panic - not syncing: Aiee, killing interrupt handler!
 FSB v0.06 PLL w ln p08 zi

After some research, we found that the call trace is: softirq->net_tx_action->qdisc_run->qdisc_restart->(pfifo_fast_ops), and that the crash occurs in the function __skb_dequeue; more precisely, at the following statement: next->prev = prev; (next is NULL, but how?)
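
For illustration, here is a minimal user-space sketch of that unlink step. It is not the kernel source: struct node, struct queue and checked_dequeue are hypothetical stand-ins, and the NULL guard is added only so the failure can be observed instead of crashing. It assumes the two link pointers sit at the start of the structure, so that prev lives at offset 4 on a 32-bit machine, which would match the faulting address 00000004. The faulting instruction word in the Code: line, e581c004, decodes as a store to [r1, #4], and r1 is 00000000 in the register dump, which is consistent with this reading.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two link fields at the start of an skb. */
struct node {
    struct node *next;  /* offset 0 */
    struct node *prev;  /* offset sizeof(void *), i.e. 4 on 32-bit ARM */
};

/* Hypothetical stand-in for the queue head; the links come first. */
struct queue {
    struct node head;
    unsigned int qlen;
};

static void queue_init(struct queue *q)
{
    /* An empty queue is a ring containing only the head. */
    q->head.next = q->head.prev = &q->head;
    q->qlen = 0;
}

static void enqueue_tail(struct queue *q, struct node *n)
{
    assert(n != NULL);
    n->next = &q->head;
    n->prev = q->head.prev;
    q->head.prev->next = n;
    q->head.prev = n;
    q->qlen++;
}

/*
 * Usual doubly linked unlink of the first element, plus a guard.
 * Returns 1 and sets *out on success, 0 if the queue is empty, and
 * -1 if the first element's next pointer is NULL. An unguarded
 * version would store to NULL + offsetof(struct node, prev) there.
 */
static int checked_dequeue(struct queue *q, struct node **out)
{
    struct node *prev = &q->head;
    struct node *next = prev->next;
    struct node *result;

    *out = NULL;
    if (next == prev)
        return 0;               /* empty queue */
    result = next;
    next = next->next;
    if (next == NULL)
        return -1;              /* corrupted element: next->prev would fault */
    q->qlen--;
    next->prev = prev;
    prev->next = next;
    result->next = result->prev = NULL;
    *out = result;
    return 1;
}
```

In this model the only way to reach the -1 branch is for something outside the queue code to clear or overwrite the next pointer of an element that is still linked.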

Please help us solve this bug. Thanks!
Comment 1 Natalie Protasevich 2007-11-12 11:51:18 UTC
Can you please try a newer kernel, such as the recent 2.6.23 or 2.6.24-rc, and confirm whether the problem is still there?
Thanks.
Comment 2 Stephen Hemminger 2007-11-13 10:26:32 UTC
You are working on your own driver, or using drivers that are not available in a kernel.org (mainline) kernel. Since it is most likely a bug in one of those drivers, please post a link to the kernel source for those drivers.

The following are not in the standard kernel:

stun_ahb solosw_defaultrestore wlan_wsc cnxt_drv fiq_rel
hsl_mod_gpl solos_ethernet
Comment 3 almighty 2007-11-13 19:29:42 UTC
Thank you for your help.

Do you mean that this bug definitely comes from the driver, and that the kernel itself is not responsible for it?
Comment 4 Stephen Hemminger 2007-11-14 12:48:07 UTC
The message comes from the standard kernel, but since it hasn't been seen in
any other environment, suspicion falls on the driver doing something wrong.

As a matter of policy for kernel developers and bugzilla, we refuse to work on
any problems caused by closed source (non-GPL) drivers. If the driver is closed source, this bug will be closed and forgotten. If the driver is open source, then please post a link and the network developers will review it to see if it is causing the problem.
Comment 5 Stephen Hemminger 2007-12-10 17:03:42 UTC
Probably an out-of-tree driver bug. This would happen if an skb was being used
after it was freed.
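
A rough user-space sketch of that scenario, under stated assumptions: struct node, simulate_reuse and first_bad_hop are hypothetical names, not kernel APIs, and zeroing the memory is only one possible effect of a buffer being freed and reused while it is still linked on a queue. The walker shows the kind of consistency check (each element's successor must point back to it) that could be added as temporary debugging code around the driver's transmit path to catch the corruption before the dequeue faults.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the link fields of a queued buffer. */
struct node {
    struct node *next;
    struct node *prev;
};

static void ring_init(struct node *head)
{
    head->next = head->prev = head;
}

static void ring_add_tail(struct node *head, struct node *n)
{
    assert(head != NULL && n != NULL);
    n->next = head;
    n->prev = head->prev;
    head->prev->next = n;
    head->prev = n;
}

/* Assumption: models the buffer's memory being freed and reused
 * (here simply zeroed) while the queue still links to it. */
static void simulate_reuse(struct node *n)
{
    memset(n, 0, sizeof(*n));
}

/*
 * Walks the ring from head and checks that every forward link is
 * matched by the successor's back link. Returns -1 if the ring is
 * consistent, otherwise the number of hops from head at which the
 * first broken link was found (max_hops if the ring never closes).
 */
static int first_bad_hop(const struct node *head, int max_hops)
{
    const struct node *cur = head;
    int hop;

    for (hop = 0; hop < max_hops; hop++) {
        const struct node *nxt = cur->next;

        if (nxt == NULL || nxt->prev != cur)
            return hop;
        if (nxt == head)
            return -1;
        cur = nxt;
    }
    return max_hops;
}
```

With two elements queued and the second one reused, first_bad_hop reports the break one hop from the head, which is the element a later dequeue would trip over.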
Comment 6 almighty 2008-01-31 19:38:04 UTC
Two more kernel messages showed up when this bug happened. They are:
"KERNEL: assertion (!atomic_read(&sk->sk_wmem_alloc)) failed at net/packet/af_packet.c (221)
Attempt to release alive packet socket: c17c7440"
These were immediately followed by "Unable to handle kernel NULL pointer dereference at virtual address 00000004".

I think these messages have something to do with this bug. Please give me some hints for solving this problem.
Thanks, sincerely