Distribution: Debian Lenny

Hardware Environment:
INTEL Server Board S5520HC
Intel(R) Xeon(R) CPU X5560 @ 2.80GHz
2x RAID bus controller 3ware Inc 9690SA-8I

Software Environment: squid (multi instances)

Problem Description:

The following error occurs whenever I halt the machine:
[ 439.629361] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
[ 439.723410] IP: [<ffffffff812462aa>] vlan_gro_common+0xd7/0x190
[ 439.794374] PGD 0
[ 439.818655] Oops: 0000 [#1] SMP
[ 439.857638] last sysfs file: /sys/devices/pci0000:00/0000:00:09.0/0000:05:00.0/host1/target1:0:0/1:0:0:0/block/sdc/queue/nr_requests
[ 440.000143] CPU 7
[ 440.024385] Pid: 0, comm: swapper Not tainted 2.6.33.1-univ #1 S5520HC/S5520HC
[ 440.110817] RIP: 0010:[<ffffffff812462aa>] [<ffffffff812462aa>] vlan_gro_common+0xd7/0x190
[ 440.210919] RSP: 0018:ffff8800283c3d10 EFLAGS: 00010203
[ 440.274472] RAX: 0000000000000001 RBX: ffff88066d5fd818 RCX: ffff8806683aa680
[ 440.359798] RDX: 00000000000003e7 RSI: 0000000000000000 RDI: ffff88066d2b2000
[ 440.445124] RBP: 00000000000003e7 R08: ffff88066c5ea000 R09: ffff8806683aa680
[ 440.530488] R10: 00000000000003e7 R11: 000000000000040b R12: 0000000000000000
[ 440.615816] R13: ffff88066d5fd818 R14: 000000000000003e R15: 000000000000040c
[ 440.701169] FS: 0000000000000000(0000) GS:ffff8800283c0000(0000) knlGS:0000000000000000
[ 440.797986] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 440.866705] CR2: 0000000000000028 CR3: 0000000001361000 CR4: 00000000000006e0
[ 440.952036] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 441.037364] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 441.122690] Process swapper (pid: 0, threadinfo ffff88066fa82000, task ffff88066fa7acd0)
[ 441.219462] Stack:
[ 441.243535] ffff8806683aa680 ffffffff812466c7 0000000000000246 ffff8806683aa680
[ 441.330462] <0> 000000000000003e ffff88066d5fd680 0000000000000008 ffffffffa00b1871
[ 441.422868] <0> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 441.517692] Call Trace:
[ 441.546959] <IRQ>
[ 441.572233] [<ffffffff812466c7>] ? vlan_gro_receive+0x6e/0x83
[ 441.641993] [<ffffffffa00b1871>] ? igb_poll+0x741/0xe80 [igb]
[ 441.711790] [<ffffffff811e07ee>] ? net_rx_action+0xa8/0x1a1
[ 441.779478] [<ffffffff81039ffe>] ? __do_softirq+0xd7/0x195
[ 441.846122] [<ffffffff8100370c>] ? call_softirq+0x1c/0x28
[ 441.911727] [<ffffffff81005325>] ? do_softirq+0x31/0x63
[ 441.975300] [<ffffffff81039df4>] ? irq_exit+0x36/0x78
[ 442.036746] [<ffffffff81004a26>] ? do_IRQ+0xa7/0xbd
[ 442.096119] [<ffffffff81258ed3>] ? ret_from_intr+0x0/0xa
[ 442.160685] <EOI>
[ 442.185957] [<ffffffff811cad43>] ? poll_idle+0x1b/0x55
[ 442.248444] [<ffffffff811cad32>] ? poll_idle+0xa/0x55
[ 442.309890] [<ffffffff811cb06a>] ? cpuidle_idle_call+0x8e/0xe8
[ 442.380683] [<ffffffff81001cd4>] ? cpu_idle+0x53/0x8b
[ 442.442133] Code: 41 81 79 7e 88 09 0f 85 d1 00 00 00 44 89 d2 44 89 d0 66 81 e2 ff 0f 80 cc 10 66 41 89 81 b8 00 00 00 89 d0 66 c1 e8 09 0f b7 c0 <48> 8b 4c c6 20 31 c0 48 85 c9 74 0c 48 89 d0 25 ff$
[ 442.678920] RIP [<ffffffff812462aa>] vlan_gro_common+0xd7/0x190
[ 442.750922] RSP <ffff8800283c3d10>
[ 442.792645] CR2: 0000000000000028
[ 442.832293] ---[ end trace 9245d00ed2188cae ]---
[ 442.887516] Kernel panic - not syncing: Fatal exception in interrupt
[ 442.963506] Pid: 0, comm: swapper Tainted: G D 2.6.33.1-univ #1
[ 443.042610] Call Trace:
[ 443.071875] <IRQ> [<ffffffff812568b7>] ? panic+0x86/0x145
[ 443.138593] [<ffffffff81039e06>] ? irq_exit+0x48/0x78
[ 443.200050] [<ffffffff81258ed3>] ? ret_from_intr+0x0/0xa
[ 443.264661] [<ffffffff8103554c>] ? kmsg_dump+0x99/0x124
[ 443.328202] [<ffffffff81006416>] ? oops_end+0x9f/0xac
[ 443.389656] [<ffffffff8101e9af>] ? no_context+0x1f2/0x201
[ 443.455263] [<ffffffffa014cd20>] ? bond_dev_queue_xmit+0x14c/0x169 [bonding]
[ 443.540611] [<ffffffff8101eb65>] ? __bad_area_nosemaphore+0x1a7/0x1cb
[ 443.618679] [<ffffffff811e0d8f>] ? dev_hard_start_xmit+0x221/0x2dd
[ 443.693630] [<ffffffff811e134d>] ? dev_queue_xmit+0x401/0x433
[ 443.763397] [<ffffffff812590df>] ? page_fault+0x1f/0x30
[ 443.826927] [<ffffffff812462aa>] ? vlan_gro_common+0xd7/0x190
[ 443.896686] [<ffffffff812466c7>] ? vlan_gro_receive+0x6e/0x83
[ 443.966485] [<ffffffffa00b1871>] ? igb_poll+0x741/0xe80 [igb]
[ 444.036246] [<ffffffff811e07ee>] ? net_rx_action+0xa8/0x1a1
[ 444.103927] [<ffffffff81039ffe>] ? __do_softirq+0xd7/0x195
[ 444.170569] [<ffffffff8100370c>] ? call_softirq+0x1c/0x28
[ 444.236176] [<ffffffff81005325>] ? do_softirq+0x31/0x63
[ 444.299704] [<ffffffff81039df4>] ? irq_exit+0x36/0x78
[ 444.361157] [<ffffffff81004a26>] ? do_IRQ+0xa7/0xbd
[ 444.420533] [<ffffffff81258ed3>] ? ret_from_intr+0x0/0xa
[ 444.485101] <EOI> [<ffffffff811cad43>] ? poll_idle+0x1b/0x55
[ 444.555036] [<ffffffff811cad32>] ? poll_idle+0xa/0x55
[ 444.616486] [<ffffffff811cb06a>] ? cpuidle_idle_call+0x8e/0xe8
[ 444.687283] [<ffffffff81001cd4>] ? cpu_idle+0x53/0x8b

Steps to reproduce: Whenever I halt the machine.

Igb driver is replaced by:
http://downloadcenter.intel.com/detail_desc.aspx?agr=Y&DwnldID=13663

This is the same machine from bugs:
https://bugzilla.kernel.org/show_bug.cgi?id=15148,
https://bugzilla.kernel.org/show_bug.cgi?id=15581.
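As a side note on reading the dump: the fault address can be reproduced from the registers. This is only a sketch under my own assumptions. I am reading the marked bytes `<48> 8b 4c c6 20` in the Code: line as `mov 0x20(%rsi,%rax,8),%rcx`, and the preceding `66 81 e2 ff 0f` / `66 c1 e8 09` as `and $0xfff` followed by `shr $9` on the value held in R10; the oops itself does not print that disassembly. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of an x86 "disp(base,index,scale)" memory operand. */
static uint64_t effective_address(uint64_t base, uint64_t index,
                                  unsigned scale, int32_t disp)
{
	assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
	return base + index * scale + (uint64_t)(int64_t)disp;
}

/*
 * Index left in RAX just before the fault, assuming the
 * "and $0xfff" / "shr $9" reading of the Code: bytes is right.
 */
static uint64_t index_from_tag(uint64_t tag)
{
	return (tag & 0xfff) >> 9;
}
```

With the values from the dump, `index_from_tag(0x3e7)` gives 1 (matching RAX), and `effective_address(0, 1, 8, 0x20)` with RSI = 0 as the base gives 0x28, which matches CR2. So the base pointer in RSI being NULL is consistent with the reported address.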
(switched to email. Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Fri, 19 Mar 2010 12:01:10 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=15582
>
> Summary: BUG: unable to handle kernel NULL pointer dereference
> at 0000000000000028

A bug in igb or the vlan code, I guess.

> Product: Memory Management
> Version: 2.5
> Kernel Version: 2.6.33.1
> Platform: All
> OS/Version: Linux
> Tree: Mainline
> Status: NEW
> Severity: normal
> Priority: P1
> Component: Other
> AssignedTo: akpm@linux-foundation.org
> ReportedBy: stivi@kity.pl
> Regression: No
From: Andrew Morton <akpm@linux-foundation.org>
Date: Mon, 22 Mar 2010 16:14:16 -0700

>
> (switched to email. Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> On Fri, 19 Mar 2010 12:01:10 GMT
> bugzilla-daemon@bugzilla.kernel.org wrote:
>
>> http://bugzilla.kernel.org/show_bug.cgi?id=15582
>>
>> Summary: BUG: unable to handle kernel NULL pointer dereference
>> at 0000000000000028
>
> A bug in igb or the vlan code, I guess.

Hmmm, should have been fixed by:

commit d1c76af9e2434fac3add561e26c61b06503de986
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date: Mon Mar 16 10:50:02 2009 -0700

    GRO: Move netpoll checks to correct location

...

Nevermind, the backtrace signature is different for this one.
From: "Duyck, Alexander H" <alexander.h.duyck@intel.com>
Date: Tue, 23 Mar 2010 11:32:19 -0700

> The patch below should address it. However I suspect it will get
> mangled by our email system here so I don't believe it will apply.
> I have also sent a copy to Jeff to pull into his tree for testing
> and submission.

Good spotting, thanks Alex.

I'll wait for testing and a final version via Jeff.
David Miller wrote:
> From: Andrew Morton <akpm@linux-foundation.org>
> Date: Mon, 22 Mar 2010 16:14:16 -0700
>
>>
>> (switched to email. Please respond via emailed reply-to-all, not
>> via the bugzilla web interface).
>>
>> On Fri, 19 Mar 2010 12:01:10 GMT
>> bugzilla-daemon@bugzilla.kernel.org wrote:
>>
>>> http://bugzilla.kernel.org/show_bug.cgi?id=15582
>>>
>>> Summary: BUG: unable to handle kernel NULL pointer
>>> dereference at 0000000000000028
>>
>> A bug in igb or the vlan code, I guess.
>
> Hmmm, should have been fixed by:
>
> commit d1c76af9e2434fac3add561e26c61b06503de986
> Author: Herbert Xu <herbert@gondor.apana.org.au>
> Date: Mon Mar 16 10:50:02 2009 -0700
>
> GRO: Move netpoll checks to correct location
>
>
> ...
>
> Nevermind, the backtrace signature is different for this
> one.

Actually I think this may be a bug in igb_receive_skb. My guess would be that promiscuous mode is somehow being enabled, which turns off the vlan filtering, and as a result we are probably picking up vlan traffic when we have no vlans registered. The null pointer in that case would be adapter->vlgrp.

The patch below should address it. However, I suspect it will get mangled by our email system here, so I don't believe it will apply. I have also sent a copy to Jeff to pull into his tree for testing and submission.

Thanks,

Alex

---

This change makes it so that vlan_gro_receive is only used if vlans have been registered to the adapter structure. Previously we were just sending all vlan tagged frames in via this function but this results in a null pointer dereference when vlans are not registered.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---

 drivers/net/igb/igb_main.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index 45a0e4f..7855f71 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -5110,7 +5110,7 @@ static void igb_receive_skb(struct igb_q_vector *q_vector,
 {
 	struct igb_adapter *adapter = q_vector->adapter;
 
-	if (vlan_tag)
+	if (vlan_tag && adapter->vlgrp)
 		vlan_gro_receive(&q_vector->napi, adapter->vlgrp,
 		                 vlan_tag, skb);
 	else
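
For anyone who wants to see the effect of the one-line change outside the kernel, here is a small user-space model of the branch selection. It is only a sketch: `struct vlgrp_model`, `struct adapter_model`, `enum rx_path` and both functions are hypothetical stand-ins invented for illustration, not the real igb or vlan definitions, and the model only reports which branch would be taken rather than receiving any frame.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures, illustration only. */
struct vlgrp_model { int unused; };
struct adapter_model { struct vlgrp_model *vlgrp; };

/* Which branch of the if/else in the patched function would run. */
enum rx_path { RX_VLAN_GRO, RX_ELSE_BRANCH };

/* Before the patch: any nonzero tag selects the vlan_gro_receive branch. */
static enum rx_path rx_path_before(const struct adapter_model *a,
                                   unsigned short vlan_tag)
{
	assert(a != NULL);
	return vlan_tag ? RX_VLAN_GRO : RX_ELSE_BRANCH;
}

/* After the patch: the vlan branch also requires a registered group. */
static enum rx_path rx_path_after(const struct adapter_model *a,
                                  unsigned short vlan_tag)
{
	assert(a != NULL);
	return (vlan_tag && a->vlgrp) ? RX_VLAN_GRO : RX_ELSE_BRANCH;
}
```

With `vlgrp == NULL` and a tagged frame, `rx_path_before()` still selects the vlan branch, which is the case that would hand a NULL group to vlan_gro_receive, while `rx_path_after()` falls through to the else branch. With a registered group both versions behave the same.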