Most recent kernel where this bug did not occur:
Distribution:
Hardware Environment: CNXT Driver
Software Environment:
Problem Description:

Steps to reproduce:
The bug happens when we try to create a PPPoE connection. The kernel crashes and prints the following (by the way, this bug occurs with very low probability):

Unable to handle kernel NULL pointer dereference at virtual address 00000004
pgd = c0024000
[00000004] *pgd=00000000
Internal error: Oops: 817 [#1]
Modules linked in: stun_ahb solosw_defaultrestore wlan_wsc cnxt_drv fiq_rel hsl_mod_gpl solos_ethernet
CPU: 0
pc : [<c0156768>] lr : [<c1bb99c0>] Tainted: P
sp : c02cbf40 ip : c1bb9a4c fp : c02cbf4c
r10: 00000000 r9 : 00000000 r8 : 00000000
r7 : c1bb99c0 r6 : 0000000a r5 : c128f800 r4 : c128f800
r3 : 00000000 r2 : 00000000 r1 : 00000000 r0 : c11223e0
Flags: Nzcv IRQs on FIQs on Mode SVC_32 Segment kernel
Control: 5397F Table: 01B74000 DAC: 00000017
Process ksoftirqd/0 (pid: 2, stack limit = 0xc02ca194)
Stack: (0xc02cbf40 to 0xc02cc000)
bf40: c02cbf6c c02cbf50 c0156200 c0156734 c128f800 00000000 0000000a c0053838
bf60: c02cbf84 c02cbf70 c0147c8c c01561ec 00000001 c024439c c02cbfa0 c02cbf88
bf80: c0053714 c0147bb0 60000013 00000000 c02d7f58 c02cbfb4 c02cbfa4 c005382c
bfa0: c00536c4 c02ca000 c02cbfcc c02cbfb8 c005389c c00537f0 00000000 c02ca000
bfc0: c02cbff4 c02cbfd0 c006211c c0053844 ffffffff ffffffff 00000000 00000000
bfe0: 00000000 00000000 00000000 c02cbff8 c0050b38 c0062040 00000000 00000000
Backtrace:
Function entered at [<c0156728>] from [<c0156200>]
Function entered at [<c01561e0>] from [<c0147c8c>]
 r7 = C0053838 r6 = 0000000A r5 = 00000000 r4 = C128F800
Function entered at [<c0147ba4>] from [<c0053714>]
 r5 = C024439C r4 = 00000001
Function entered at [<c00536b8>] from [<c005382c>]
 r6 = C02D7F58 r5 = 00000000 r4 = 60000013
Function entered at [<c00537e4>] from [<c005389c>]
 r4 = C02CA000
Function entered at [<c0053838>] from [<c006211c>]
 r5 = C02CA000 r4 = 00000000
Function entered at [<c0062034>] from [<c0050b38>]
 r7 = 00000000 r6 = 00000000 r5 = 00000000 r4 = 00000000
Code: e3a02000 e2433001 e58c3008 e58c1000 (e581c004)
<0>Kernel panic - not syncing: Aiee, killing interrupt handler!
FSB v0.06 PLL w ln p08 zi

After some research, we found that the call trace is:
softirq -> net_tx_action -> qdisc_run -> qdisc_restart -> (pfifo_fast_ops)
and that the crash is in the function __skb_dequeue, more precisely in the statement:
next->prev = prev;
(next is NULL, but how?)

Please help us solve this bug. Thanks!
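
For readers following the analysis, here is a minimal user-space sketch of the unlink step the report points at. The names `struct node`, `struct queue`, `prev_offset`, `enqueue_tail` and `dequeue_checked` are hypothetical stand-ins, not the kernel's `struct sk_buff`, `struct sk_buff_head` or `__skb_dequeue`; only the list linkage is modeled. Assuming `prev` is the second pointer in the structure, a NULL `next` makes `next->prev = prev` write to NULL plus the size of one pointer, which would be address 00000004 on a 32-bit build.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the list linkage at the start of an skb. */
struct node {
    struct node *next;  /* offset 0 */
    struct node *prev;  /* offset sizeof(void *): 4 on a 32-bit build */
};

/* Hypothetical stand-in for the queue head: a circular list plus a length. */
struct queue {
    struct node head;
    unsigned int qlen;
};

/* Offset of the field written by "next->prev = prev". */
static size_t prev_offset(void)
{
    return offsetof(struct node, prev);
}

static void queue_init(struct queue *q)
{
    q->head.next = q->head.prev = &q->head;
    q->qlen = 0;
}

static void enqueue_tail(struct queue *q, struct node *n)
{
    assert(q != NULL && n != NULL);
    n->next = &q->head;
    n->prev = q->head.prev;
    q->head.prev->next = n;
    q->head.prev = n;
    q->qlen++;
}

/*
 * Unlink the first node. Returns 0 on success (*out is the node, or NULL
 * if the queue is empty) and -1 if the first node has next == NULL, which
 * is the state where unchecked code would store through a NULL pointer.
 */
static int dequeue_checked(struct queue *q, struct node **out)
{
    struct node *prev = &q->head;
    struct node *result = prev->next;
    struct node *next;

    *out = NULL;
    if (result == prev)
        return 0;               /* empty queue */
    next = result->next;
    if (next == NULL)
        return -1;              /* "next->prev = prev" would fault here */
    q->qlen--;
    next->prev = prev;
    prev->next = next;
    result->next = result->prev = NULL;  /* dequeued nodes end up unlinked */
    *out = result;
    return 0;
}
```

Note that a successfully dequeued node is left with `next == NULL`, so one way to reach the failing state is for a node that was already dequeued (or freed and reused) to still be reachable from the queue head.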
Can you please try a newer kernel (a recent 2.6.23 or 2.6.24-rc) and confirm whether the problem is still there? Thanks.
You are working on your own driver, or using drivers that are not available in a kernel.org (mainline) kernel. Since it is most likely a bug in one of those drivers, please post a link to the kernel source for those drivers. The following are not in the standard kernel: stun_ahb solosw_defaultrestore wlan_wsc cnxt_drv fiq_rel hsl_mod_gpl solos_ethernet
Thank you for your help. Do you mean that this bug definitely comes from the driver, and that the kernel itself is not responsible for it?
The message comes from the standard kernel, but since it hasn't been seen in any other environment, the suspicion falls on the driver doing something wrong. As a matter of policy for kernel developers and Bugzilla, we refuse to work on any problems caused by closed-source (non-GPL) drivers. If the driver is closed source, this bug will be closed and forgotten. If the driver is open source, then please post a link and the network developers will review it to see whether it is causing the problem.
Probable out-of-tree driver bug. This would happen if an skb was being used after it was freed.
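
To illustrate how a stale buffer pointer could produce this kind of oops, here is a user-space sketch under stated assumptions. All names (`struct node`, `struct queue`, `dequeue`, `double_enqueue_corrupts`) are hypothetical and only model the list linkage; the scenario (a driver queueing the same buffer twice) is one possible instance of using a buffer it no longer owns, not a diagnosis of the drivers in this report.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the list linkage only. */
struct node {
    struct node *next;
    struct node *prev;
};

struct queue {
    struct node head;
    unsigned int qlen;
};

static void queue_init(struct queue *q)
{
    q->head.next = q->head.prev = &q->head;
    q->qlen = 0;
}

/* Unchecked tail insert: trusts that n is not already on a queue. */
static void enqueue_tail(struct queue *q, struct node *n)
{
    assert(q != NULL && n != NULL);
    n->next = &q->head;
    n->prev = q->head.prev;
    q->head.prev->next = n;
    q->head.prev = n;
    q->qlen++;
}

/*
 * Unlink the first node, or return NULL if the queue is empty.
 * *corrupt is set to 1 when the first node has next == NULL; code
 * without this check would store through a NULL pointer instead.
 */
static struct node *dequeue(struct queue *q, int *corrupt)
{
    struct node *prev = &q->head;
    struct node *result = prev->next;
    struct node *next;

    *corrupt = 0;
    if (result == prev)
        return NULL;
    next = result->next;
    if (next == NULL) {
        *corrupt = 1;
        return NULL;
    }
    q->qlen--;
    next->prev = prev;
    prev->next = next;
    result->next = result->prev = NULL;
    return result;
}

/*
 * Queue the same buffer twice, then dequeue twice. Returns 1 if the
 * second dequeue finds a first node whose next pointer is NULL.
 */
static int double_enqueue_corrupts(void)
{
    struct queue q;
    struct node buf = { NULL, NULL };
    int corrupt;

    queue_init(&q);
    enqueue_tail(&q, &buf);
    enqueue_tail(&q, &buf);     /* the hypothetical driver bug */
    if (dequeue(&q, &corrupt) != &buf || corrupt)
        return 0;               /* first dequeue still looks normal */
    (void)dequeue(&q, &corrupt);
    return corrupt;
}
```

In this model the first dequeue succeeds and clears the buffer's pointers, but the queue head still points at the buffer, so the failure only shows up on a later dequeue, far from the code that caused it.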
Two more kernel messages showed up when this bug happened:
"KERNEL: assertion (!atomic_read(&sk->sk_wmem_alloc)) failed at net/packet/af_packet.c (221)"
"Attempt to release alive packet socket: c17c7440"
They were immediately followed by "Unable to handle kernel NULL pointer dereference at virtual address 00000004". I think these messages have something to do with this bug. Please give me some hints for solving this problem. Thanks, sincerely