Bug 42764 - BUG at net/core/skbuff.c:147
Summary: BUG at net/core/skbuff.c:147
Status: RESOLVED CODE_FIX
Alias: None
Product: Networking
Classification: Unclassified
Component: IPV4
Hardware: All Linux
Importance: P1 normal
Assignee: Stephen Hemminger
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-02-13 09:12 UTC by Guido Aulisi
Modified: 2012-12-27 23:49 UTC
CC List: 3 users

See Also:
Kernel Version: 3.1.9
Subsystem:
Regression: No
Bisected commit-id:


Description Guido Aulisi 2012-02-13 09:12:27 UTC
I got this BUG on an IBM blade server. This is the stack trace:

------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:147!
invalid opcode: 0000 [#1] SMP 
CPU 3 
Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler ipv6 fuse loop sg mptsas mptscsih mptbase tpm_tis i2c_piix4 tpm scsi_transport_sas button i2c_core shpchp pci_hotplug bnx2 serio_raw tpm_bios linear scsi_dh_alua scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc dm_round_robin sd_mod crc_t10dif qla2xxx scsi_transport_fc scsi_tgt dm_snapshot dm_multipath scsi_dh scsi_mod edd dm_mod ext3 mbcache jbd fan thermal processor thermal_sys hwmon [last unloaded: usbcore]

Pid: 15629, comm: nscd Not tainted 3.1.9-inps #1 IBM BladeCenter LS42 -[7902CQG]-/Server Blade
RIP: 0010:[<ffffffff81241209>]  [<ffffffff81241209>] skb_push+0x75/0x7e
RSP: 0018:ffff880233f239e8  EFLAGS: 00010296
RAX: 0000000000000083 RBX: 0000000000000800 RCX: 000000000000e179
RDX: 000000000000d7d7 RSI: 0000000000000046 RDI: ffffffff8152dc9c
RBP: ffff880233f23a08 R08: ffff880233f238b8 R09: 0720072007200720
R10: ffff880233f237c8 R11: 0720072007200720 R12: 0000000000000000
R13: ffff8808a7cb5118 R14: 0000000000000055 R15: ffff880ca70c8000
FS:  00007fe510a1b950(0000) GS:ffff8810bfc00000(0000) knlGS:00000000e82ddb90
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fe51bc35000 CR3: 00000010a7967000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process nscd (pid: 15629, threadinfo ffff880233f22000, task ffff88049e476750)
Stack:
 0000000000000057 0000000000000080 ffff880ca70c8000 0000000000000246
 ffff880233f23a48 ffffffff8125eb59 ffff880233f23a48 ffff8810a6ad4b80
 ffff8808a7cb5080 000000000000ff3c ffff8808a7cb5118 ffff8808a7cb5110
Call Trace:
 [<ffffffff8125eb59>] eth_header+0x29/0xa8
 [<ffffffff81252a2e>] neigh_resolve_output+0x150/0x19c
 [<ffffffff81276084>] ip_finish_output+0x237/0x26a
 [<ffffffff8127613c>] ip_output+0x85/0x89
 [<ffffffff81275519>] ip_local_out+0x24/0x29
 [<ffffffff81275527>] ip_send_skb+0x9/0x30
 [<ffffffff81291f38>] udp_send_skb+0x26c/0x2c9
 [<ffffffff81293d7c>] udp_sendmsg+0x50b/0x712
 [<ffffffff810e0e15>] ? poll_freewait+0x8d/0x8d
 [<ffffffff81274b51>] ? ip_append_page+0x4d8/0x4d8
 [<ffffffff812d3b2f>] ? page_fault+0x1f/0x30
 [<ffffffff81299c1a>] inet_sendmsg+0x83/0x90
 [<ffffffff8123a0c1>] sock_sendmsg+0xe3/0x106
 [<ffffffff8109b2d9>] ? release_pages+0x226/0x238
 [<ffffffff8126ed17>] ? __ip_route_output_key+0x141/0x815
 [<ffffffff812d36d0>] ? _raw_spin_unlock_bh+0xf/0x11
 [<ffffffff812d36d0>] ? _raw_spin_unlock_bh+0xf/0x11
 [<ffffffff8123cd9e>] ? release_sock+0xfd/0x106
 [<ffffffff8123a691>] sys_sendto+0xfa/0x123
 [<ffffffff8123af3f>] ? sys_connect+0x88/0x9c
 [<ffffffff812d8a3b>] system_call_fastpath+0x16/0x1b
Code: 8b 57 68 48 89 44 24 10 8b 87 d0 00 00 00 48 89 44 24 08 8b bf cc 00 00 00 31 c0 48 89 3c 24 48 c7 c7 ee 91 3a 81 e8 cb 00 09 00 <0f> 0b eb fe 48 89 c8 c9 c3 55 89 f1 48 89 e5 48 83 ec 20 83 7f 
RIP  [<ffffffff81241209>] skb_push+0x75/0x7e
 RSP <ffff880233f239e8>
---[ end trace 0909a5224bc05174 ]---
Comment 1 Guido Aulisi 2012-02-13 09:22:50 UTC
I forgot this line:

skb_under_panic: text:ffffffff8125eb59 len:99 put:14 head:ffff880cc5e5f000 data:ffff880cc5e5eff4 tail:0x57 end:0x80 dev:eth2
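
For readers decoding the line above, here is a minimal sketch of the arithmetic it implies. It assumes that tail and end are printed as offsets from head (they are small values here, which suggests offsets rather than pointers) and that the skb is linear; parse_fields is a hypothetical helper written for this illustration, not a kernel or library function.

```python
# The skb_under_panic line exactly as quoted in the report.
LINE = ("skb_under_panic: text:ffffffff8125eb59 len:99 put:14 "
        "head:ffff880cc5e5f000 data:ffff880cc5e5eff4 tail:0x57 end:0x80 dev:eth2")


def parse_fields(line):
    """Hypothetical helper: split the 'key:value' tokens into a dict."""
    fields = {}
    for token in line.split()[1:]:  # skip the 'skb_under_panic:' prefix
        key, _, value = token.partition(":")
        fields[key] = value
    return fields


f = parse_fields(LINE)
head = int(f["head"], 16)
data = int(f["data"], 16)
tail = int(f["tail"], 16)   # assumed to be an offset from head
put = int(f["put"])
length = int(f["len"])

# How far below the start of the buffer the data pointer ended up.
underflow = head - data

# Headroom that was left before the failing 14-byte push.
headroom_before = put - underflow

# For a linear skb, len should equal the distance from data to tail.
len_consistent = (tail + underflow) == length

print(f"data is {underflow} bytes below head")
print(f"headroom before the push was {headroom_before} bytes")
print(f"len matches tail - data: {len_consistent}")
```

Under those assumptions, data sits 12 bytes below head, so only 2 bytes of headroom were left when eth_header() asked for 14. That would be consistent with a 14-byte header having already been pushed once onto this skb, although the line by itself does not prove it.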
Comment 2 Stephen Hemminger 2012-10-30 16:09:25 UTC
You never included any information about the application or the type of network device.
Comment 3 Guido Aulisi 2012-10-31 08:55:31 UTC
I got it again. The Ethernet card is:
0e:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709S Gigabit Ethernet (rev 20)

ethtool -i eth0
driver: bnx2
version: 2.1.6
firmware-version: bc 4.4.14 IPMI 0.1.16
bus-info: 0000:04:00.0

ethtool -i eth2
driver: bnx2
version: 2.1.6
firmware-version: bc 4.4.10
bus-info: 0000:0e:00.0

Could this be related to old firmware?

This machine handles a lot of network traffic because it loads data from a database every night.

Kernel version is 3.0.34

Sorry for not reporting the hardware; I thought it was a protocol bug not related to a specific card.

------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:147!
invalid opcode: 0000 [#1] SMP 
CPU 2 
Modules linked in: af_packet ipmi_devintf ipmi_si ipmi_msghandler ipv6 fuse loop sr_mod cdrom mptsas mptscsih mptbase scsi_transport_sas tpm_tis tpm bnx2 tpm_bios sg i2c_piix4 i2c_core shpchp pci_hotplug button serio_raw linear scsi_dh_alua scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc dm_round_robin sd_mod crc_t10dif qla2xxx scsi_transport_fc scsi_tgt dm_snapshot dm_multipath scsi_dh scsi_mod edd dm_mod ext3 mbcache jbd fan thermal processor thermal_sys hwmon [last unloaded: usbcore]

Pid: 21566, comm: nscd Not tainted 3.0.34-inps #5 IBM BladeCenter LS42 -[7902CQG]-/Server Blade
RIP: 0010:[<ffffffff8123fd12>]  [<ffffffff8123fd12>] skb_push+0x75/0x7e
RSP: 0018:ffff880b956c39d8  EFLAGS: 00010292
RAX: 0000000000000083 RBX: 0000000000000800 RCX: 0000000000023382
RDX: 0000000000007878 RSI: 0000000000000046 RDI: ffffffff8152ee9c
RBP: ffff880b956c39f8 R08: 0000000000000000 R09: 0720072007200720
R10: ffff880b956c37c8 R11: 0720072007200720 R12: 0000000000000000
R13: ffff8810231fb718 R14: 0000000000000055 R15: ffff880c25d24000
FS:  00007fd9a9222950(0000) GS:ffff880c3fc00000(0000) knlGS:00000000e8dafb90
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fd9b403a000 CR3: 0000000c265e8000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process nscd (pid: 21566, threadinfo ffff880b956c2000, task ffff8803e50e4790)
Stack:
 0000000000000057 0000000000000080 ffff880c25d24000 ffff880c71806440
 ffff880b956c3a38 ffffffff8125e0c5 ffff880b956c3a28 0000000000000036
 ffff8810231fb680 ffff8810231fb718 ffff880b2cc49380 ffff8810231fb710
Call Trace:
 [<ffffffff8125e0c5>] eth_header+0x29/0xa8
 [<ffffffff81251a59>] neigh_resolve_output+0x284/0x2ed
 [<ffffffff8127575e>] ip_finish_output+0x24c/0x293
 [<ffffffff81097cac>] ? zone_watermark_ok+0x1a/0x1c
 [<ffffffff81275846>] ip_output+0xa1/0xa5
 [<ffffffff81274b92>] ip_local_out+0x24/0x29
 [<ffffffff81274ba0>] ip_send_skb+0x9/0x4c
 [<ffffffff8129178b>] udp_send_skb+0x28b/0x2e8
 [<ffffffff812935cb>] udp_sendmsg+0x501/0x708
 [<ffffffff810e42dd>] ? poll_freewait+0x8d/0x8d
 [<ffffffff812740ec>] ? ip_append_page+0x4f4/0x4f4
 [<ffffffff8109b8f2>] ? __alloc_pages_nodemask+0x731/0x77c
 [<ffffffff810541ad>] ? sched_clock_local+0x1c/0x80
 [<ffffffff81299607>] inet_sendmsg+0x83/0x90
 [<ffffffff81238e93>] sock_sendmsg+0xe3/0x106
 [<ffffffff810c7c8a>] ? ____cache_alloc_node+0x4c/0x132
 [<ffffffff810c7b4f>] ? fallback_alloc+0xe1/0x1d0
 [<ffffffff8126e390>] ? __ip_route_output_key+0x139/0x816
 [<ffffffff812d24e3>] ? _raw_spin_unlock_bh+0xf/0x11
 [<ffffffff8123ba9e>] ? release_sock+0xfd/0x106
 [<ffffffff81239463>] sys_sendto+0xfa/0x123
 [<ffffffff81239c3d>] ? sys_connect+0x88/0x9c
 [<ffffffff812d78bb>] system_call_fastpath+0x16/0x1b
Code: 8b 57 68 48 89 44 24 10 8b 87 d0 00 00 00 48 89 44 24 08 8b bf cc 00 00 00 31 c0 48 89 3c 24 48 c7 c7 5a 79 3a 81 e8 df 03 09 00 <0f> 0b eb fe 48 89 c8 c9 c3 55 89 f1 48 89 e5 48 83 ec 20 83 7f 
RIP  [<ffffffff8123fd12>] skb_push+0x75/0x7e
 RSP <ffff880b956c39d8>
---[ end trace 7e8c514a1c33a3f0 ]---

Output of lspci -vvv:

0e:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709S Gigabit Ethernet (rev 20)
	Subsystem: IBM Device 039f
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 26
	Region 0: Memory at ce000000 (64-bit, non-prefetchable) [size=32M]
	Capabilities: [48] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [50] Vital Product Data <?>
	Capabilities: [58] Message Signalled Interrupts: Mask- 64bit+ Count=1/8 Enable-
		Address: 0000000000000000  Data: 0000
	Capabilities: [a0] MSI-X: Enable+ Mask- TabSize=9
		Vector table: BAR=0 offset=0000c000
		PBA: BAR=0 offset=0000e000
	Capabilities: [ac] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 512 bytes, PhantFunc 0, Latency L0s <4us, L1 <64us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr+ NoSnoop+
			MaxPayload 256 bytes, MaxReadReq 4096 bytes
		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x4, ASPM L0s L1, Latency L0 <4us, L1 <4us
			ClockPM- Suprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 128 bytes Disabled- Retrain- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis+
		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB
	Kernel driver in use: bnx2
	Kernel modules: bnx2

0e:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709S Gigabit Ethernet (rev 20)
	Subsystem: IBM Device 039f
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin B routed to IRQ 26
	Region 0: Memory at d0000000 (64-bit, non-prefetchable) [size=32M]
	Capabilities: [48] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [50] Vital Product Data <?>
	Capabilities: [58] Message Signalled Interrupts: Mask- 64bit+ Count=1/8 Enable-
		Address: 0000000000000000  Data: 0000
	Capabilities: [a0] MSI-X: Enable- Mask- TabSize=8
		Vector table: BAR=0 offset=0000c000
		PBA: BAR=0 offset=0000e000
	Capabilities: [ac] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 512 bytes, PhantFunc 0, Latency L0s <4us, L1 <64us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr+ NoSnoop+
			MaxPayload 256 bytes, MaxReadReq 4096 bytes
		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x4, ASPM L0s L1, Latency L0 <4us, L1 <4us
			ClockPM- Suprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 128 bytes Disabled- Retrain- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis+
		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB
	Kernel driver in use: bnx2
	Kernel modules: bnx2
Comment 4 sysop 2012-12-27 18:28:12 UTC
Hey,
The status of this bug report is NEEDINFO. Has there been any progress?
I experience the same problem on several machines with 2.6 and 3.x kernels. 

Regards
Maxim
Comment 5 Stephen Hemminger 2012-12-27 23:21:36 UTC
Fixed in 3.0.49 by:
commit 5891cb7c82658d26ca323639553b94b7272ebb68
Author: ramesh.nagappa@gmail.com <ramesh.nagappa@gmail.com>
Date:   Fri Oct 5 19:10:15 2012 +0000

    net: Fix skb_under_panic oops in neigh_resolve_output
    
    [ Upstream commit e1f165032c8bade3a6bdf546f8faf61fda4dd01c ]
    
    The retry loop in neigh_resolve_output() and neigh_connected_output()
    call dev_hard_header() with out reseting the skb to network_header.
    This causes the retry to fail with skb_under_panic. The fix is to
    reset the network_header within the retry loop.
    
    Signed-off-by: Ramesh Nagappa <ramesh.nagappa@ericsson.com>
    Reviewed-by: Shawn Lu <shawn.lu@ericsson.com>
    Reviewed-by: Robert Coulson <robert.coulson@ericsson.com>
    Reviewed-by: Billie Alsup <billie.alsup@ericsson.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

I always recommend updating to the latest stable kernel.
If you want to stay on 3.0.x, that is 3.0.57.
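
The failure mode described in the commit message can be illustrated with a toy model. This is only a sketch, not kernel code: ToySkb, build_header and demo are hypothetical names, offsets are counted from head, and the 16 bytes of headroom and 71-byte IP packet are assumptions chosen so that the numbers line up with the skb_under_panic line in Comment 1.

```python
ETH_HLEN = 14  # Ethernet header size, matching "put:14" in the report


class SkbUnderPanic(Exception):
    """Stands in for the kernel BUG raised when data falls below head."""


class ToySkb:
    """Hypothetical, minimal model of an skb; offsets are relative to head."""

    def __init__(self, headroom, payload_len):
        self.data = headroom            # offset where packet data starts
        self.network_header = headroom  # offset of the IP header
        self.len = payload_len

    def push(self, n):
        # Same idea as skb_push(): grow the packet toward head, then check.
        self.data -= n
        self.len += n
        if self.data < 0:
            raise SkbUnderPanic(f"put:{n} data offset:{self.data}")

    def reset_to_network_header(self):
        # Drop any link-layer header pushed by an earlier attempt.
        self.len -= self.network_header - self.data
        self.data = self.network_header


def build_header(skb, attempts, reset_each_time):
    """Push an Ethernet header once per attempt, as a retry loop would."""
    for _ in range(attempts):
        if reset_each_time:
            skb.reset_to_network_header()
        skb.push(ETH_HLEN)


def demo(reset_each_time):
    # Assumed values: 16 bytes of headroom and a 71-byte IP packet.
    skb = ToySkb(headroom=16, payload_len=71)
    try:
        build_header(skb, attempts=2, reset_each_time=reset_each_time)
        return ("ok", skb.data, skb.len)
    except SkbUnderPanic:
        return ("panic", skb.data, skb.len)


print("without reset:", demo(reset_each_time=False))
print("with reset:   ", demo(reset_each_time=True))
```

Without the reset, the second attempt pushes a second header and the data offset drops to -12 with len 99, the same values as in the reported panic. With the reset, every attempt starts again from the network header, so a retry leaves the skb in the same state as a single successful attempt.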
Comment 6 Guido Aulisi 2012-12-27 23:49:11 UTC
Many thanks, I'll try to upgrade as soon as possible.
