Aug 2 10:52:39 master CPU: 2
Aug 2 10:52:39 master EIP: 0060:[<c034ff25>] Not tainted VLI
Aug 2 10:52:39 master EFLAGS: 00010296 (2.6.17-gentoo-r3 #2)
Aug 2 10:52:39 master EIP is at skb_over_panic+0x37/0x45
Aug 2 10:52:39 master eax: 00000073 ebx: f7905000 ecx: 00000000 edx: 00000292
Aug 2 10:52:39 master esi: f7905000 edi: 00000040 ebp: 00000000 esp: c04c4f28
Aug 2 10:52:39 master ds: 007b es: 007b ss: 0068
Aug 2 10:52:39 master Process apache2 (pid: 1369, threadinfo=c04c4000 task=d02d7a90)
Aug 2 10:52:39 master Stack: c03fc6ee c02e32b8 00000620 000005ea db267c00 db267c6a db26828a db267d00
Aug 2 10:52:39 master f7905000 000005ea 000000cc c02e32c3 c04c4fb8 f7919640 f7905400 00000023
Aug 2 10:52:39 master f7905000 f7548cb0 f7548cc0 f8c0fcb0 000005ea 000000cc c4809480 23000002
Aug 2 10:52:39 master Call Trace:
Aug 2 10:52:39 master <c02e32b8> e1000_clean_rx_irq+0x31e/0x52e <c02e32c3> e1000_clean_rx_irq+0x329/0x52e
Aug 2 10:52:39 master <c02e1667> e1000_clean+0xc9/0x175 <c0355473> net_rx_action+0x99/0x148
Aug 2 10:52:39 master <c011be1f> __do_softirq+0x58/0xc2 <c0104bc5> do_softirq+0x46/0x51
Aug 2 10:52:39 master =======================

Kernel crash under high load, even though TSO has been disabled completely! It is a random bug that shows up after a long uptime and under high load.
master root # lspci
00:00.0 Host bridge: Intel Corporation E7501 Memory Controller Hub (rev 01)
00:00.1 Class ff00: Intel Corporation E7500/E7501 Host RASUM Controller (rev 01)
00:02.0 PCI bridge: Intel Corporation E7500/E7501 Hub Interface B PCI-to-PCI Bridge (rev 01)
00:1d.0 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #1) (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #2) (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #3) (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 42)
00:1f.0 ISA bridge: Intel Corporation 82801CA LPC Interface Controller (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801CA Ultra ATA Storage Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801CA/CAM SMBus Controller (rev 02)
01:1c.0 PIC: Intel Corporation 82870P2 P64H2 I/OxAPIC (rev 04)
01:1d.0 PCI bridge: Intel Corporation 82870P2 P64H2 Hub PCI Bridge (rev 04)
01:1e.0 PIC: Intel Corporation 82870P2 P64H2 I/OxAPIC (rev 04)
01:1f.0 PCI bridge: Intel Corporation 82870P2 P64H2 Hub PCI Bridge (rev 04)
02:03.0 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 01)
02:03.1 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 01)
03:02.0 SCSI storage controller: Adaptec AIC-7902B U320 (rev 10)
03:02.1 SCSI storage controller: Adaptec AIC-7902B U320 (rev 10)
04:01.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
Aug 2 10:52:39 master skb_over_panic: text:c02e32b8 len:1568 put:1514 head:db267c00 data:db267c6a tail:db26828a end:db267d00 dev:eth1
Aug 2 10:52:39 master ------------[ cut here ]------------
Aug 2 10:52:39 master kernel BUG at net/core/skbuff.c:94!
Aug 2 10:52:39 master invalid opcode: 0000 [#1]
Aug 2 10:52:39 master SMP
Aug 2 10:52:39 master Modules linked in: ipt_REJECT ipt_LOG xt_tcpudp xt_state xt_pkttype iptable_raw xt_CLASSIFY xt_CONNMARK xt_connmark ipt_owner ipt_recent ipt_iprange xt_conntrack iptable_mangle iptable_nat ip_nat ip_conntrack_ftp ip_conntrack nfnetlink iptable_filter ip_tables x_tables netconsole
Please add your dmesg output from before the crash. Is it possible that you have jumbo frames on your network? I'm specifically looking for the driver version you're running, since there are a couple of known bugs in certain versions of the code. In this case it looks like we tried to do a put on an skb with a very long frame, which is really odd.
Jumbo frames... I don't know. The external interface is 100 Mb, the internal one is 1 Gb.

I had trouble before with vanilla kernels: random segfaults with the ULOG module, and skb-related hangs with TSO on. I used a patch from the openSUSE kernel to disable TSO completely (it improved my uptime from 2 days to 2 weeks):

======================
From: Olaf Kirch <okir@suse.de>
Subject: [e1000] Disable TSO for now
References: 157600

It seems there is a memory corruption problem related the use of TSO with the e1000 driver. As a matter of caution, I am turning off TSO by default on the e1000 for the time being.

Signed-off-by: okir@suse.de

 drivers/net/e1000/e1000_main.c |    2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

Index: build/drivers/net/e1000/e1000_main.c
===================================================================
--- build.orig/drivers/net/e1000/e1000_main.c
+++ build/drivers/net/e1000/e1000_main.c
@@ -735,7 +735,7 @@ e1000_probe(struct pci_dev *pdev,
 	}
-#ifdef NETIF_F_TSO
+#ifdef NETIF_F_TSO_default_to_off_for_now
 	if ((adapter->hw.mac_type >= e1000_82544) &&
 	   (adapter->hw.mac_type != e1000_82547))
 		netdev->features |= NETIF_F_TSO;
======================

My dmesg (new):
Linux version 2.6.17-gentoo-r3 (root@master) (gcc version 4.1.1 (Gentoo 4.1.1)) #2 SMP Sun Jul 16 10:14:08 MSD 2006
BIOS-provided physical RAM map:
What is the status of this issue in 2.6.18-rc6?
I can't test this on the production server until 2.6.18 is released, sorry :-| The load is too high, and 5 minutes of downtime costs half of my salary. :(
Please reopen this bug if it's still present in kernel 2.6.19.