Distribution: Slackware Hardware Environment: # cat /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 8 model name : Pentium III (Coppermine) stepping : 10 cpu MHz : 998.478 cache size : 256 KB fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 2 wp : yes flags : fpu vme de tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse bogomips : 1999.41 processor : 1 vendor_id : GenuineIntel cpu family : 6 model : 8 model name : Pentium III (Coppermine) stepping : 10 cpu MHz : 998.478 cache size : 256 KB fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 2 wp : yes flags : fpu vme de tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse bogomips : 1996.47 # lspci -v 00:00.0 Host bridge: VIA Technologies, Inc. VT82C693A/694x [Apollo PRO133x] (rev c4) Subsystem: ABIT Computer Corp.: Unknown device a204 Flags: bus master, medium devsel, latency 8 Memory at d0000000 (32-bit, prefetchable) [size=16M] Capabilities: [a0] AGP version 2.0 Capabilities: [c0] Power Management version 2 00:01.0 PCI bridge: VIA Technologies, Inc. VT82C598/694x [Apollo MVP3/Pro133x AGP] (prog-if 00 [Normal decode]) Flags: bus master, 66Mhz, medium devsel, latency 0 Bus: primary=00, secondary=01, subordinate=01, sec-latency=0 Capabilities: [80] Power Management version 2 00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 40) Subsystem: ABIT Computer Corp.: Unknown device 0000 Flags: bus master, stepping, medium devsel, latency 0 Capabilities: [c0] Power Management version 2 00:07.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06) (prog-if 8a [Master SecP PriP]) Subsystem: VIA Technologies, Inc. 
VT82C586/B/VT82C686/A/B/VT8233/A/C/VT8235 PIPC Bus Master IDE Flags: bus master, medium devsel, latency 32 I/O ports at c000 [size=16] Capabilities: [c0] Power Management version 2 00:07.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 16) (prog-if 00 [UHCI]) Subsystem: VIA Technologies, Inc. (Wrong ID) USB Controller Flags: bus master, medium devsel, latency 32, IRQ 10 I/O ports at c400 [size=32] Capabilities: [80] Power Management version 2 00:07.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 16) (prog-if 00 [UHCI]) Subsystem: VIA Technologies, Inc. (Wrong ID) USB Controller Flags: bus master, medium devsel, latency 32, IRQ 10 I/O ports at c800 [size=32] Capabilities: [80] Power Management version 2 00:07.4 Bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 40) Flags: medium devsel, IRQ 9 Capabilities: [68] Power Management version 2 00:09.0 VGA compatible controller: ATI Technologies Inc 215CT [Mach64 CT] (rev 41) (prog-if 00 [VGA]) Flags: stepping, medium devsel, IRQ 7 Memory at d1000000 (32-bit, non-prefetchable) [size=16M] Expansion ROM at 88060000 [disabled] [size=64K] 00:0b.0 Ethernet controller: 3Com Corporation 3c940 10/100/1000Base-T [Marvell] (rev 10) Subsystem: 3Com Corporation 3C941 Gigabit LOM Ethernet Adapter Flags: bus master, 66Mhz, medium devsel, latency 32, IRQ 177 Memory at d3000000 (32-bit, non-prefetchable) [size=16K] I/O ports at cc00 [size=256] Expansion ROM at 88000000 [disabled] [size=128K] Capabilities: [48] Power Management version 2 Capabilities: [50] Vital Product Data 00:0c.0 Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone] (rev 30) Subsystem: 3Com Corporation 3C905B Fast Etherlink XL 10/100 Flags: bus master, medium devsel, latency 32, IRQ 169 I/O ports at d000 [size=128] Memory at d3004000 (32-bit, non-prefetchable) [size=128] Expansion ROM at 88020000 [disabled] [size=128K] Capabilities: [dc] Power Management version 1 00:0e.0 Mass 
storage controller: Triones Technologies, Inc. HPT366/368/370/370A/372/372N (rev 04) Subsystem: Triones Technologies, Inc. HPT370A Flags: bus master, 66Mhz, medium devsel, latency 120, IRQ 185 I/O ports at d400 [size=8] I/O ports at d800 [size=4] I/O ports at dc00 [size=8] I/O ports at e000 [size=4] I/O ports at e400 [size=256] Expansion ROM at 88040000 [disabled] [size=128K] Capabilities: [60] Power Management version 2 # cat /proc/interrupts CPU0 CPU1 0: 86255208 115575456 IO-APIC-edge timer 1: 7 1 IO-APIC-edge i8042 4: 136 146 IO-APIC-edge serial 8: 1 0 IO-APIC-edge rtc 9: 1 0 IO-APIC-level acpi 10: 0 0 IO-APIC-level uhci_hcd:usb1, uhci_hcd:usb2 12: 71 23 IO-APIC-edge i8042 14: 1936004 2200642 IO-APIC-edge ide0 15: 1957196 2227564 IO-APIC-edge ide1 169: 0 0 IO-APIC-level eth0 177: 24238205 13926 IO-APIC-level skge 185: 3713789 4198732 IO-APIC-level ide2, ide3 NMI: 0 0 LOC: 201861862 201862935 ERR: 0 MIS: 0 Software Environment: # cat /proc/version Linux version 2.6.15.4-debug (root@space) (gcc version 3.4.5) #1 SMP PREEMPT Sat Feb 25 12:41:19 CET 2006 # cat /proc/modules bonding 55396 0 - Live 0xf89c3000 # cat /proc/net/bonding/bond0 Ethernet Channel Bonding Driver: v2.6.5 (November 4, 2005) Bonding Mode: fault-tolerance (active-backup) Primary Slave: eth0 Currently Active Slave: eth1 MII Status: up MII Polling Interval (ms): 100 Up Delay (ms): 0 Down Delay (ms): 0 Slave Interface: eth0 MII Status: down Link Failure Count: 0 Permanent HW addr: 00:04:76:90:8b:ba Slave Interface: eth1 MII Status: up Link Failure Count: 0 Permanent HW addr: 00:0a:5e:53:ac:1a dmesg: Linux version 2.6.15.4-debug (root@space) (gcc version 3.4.5) #1 SMP PREEMPT Sat Feb 25 12:41:19 CET 2006 BIOS-provided physical RAM map: BIOS-e820: 0000000000000000 - 000000000009fc00 (usable) BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved) BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved) BIOS-e820: 0000000000100000 - 000000007fff0000 (usable) BIOS-e820: 000000007fff0000 - 
000000007fff3000 (ACPI NVS) BIOS-e820: 000000007fff3000 - 0000000080000000 (ACPI data) BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved) BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved) BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved) 2047MB LOWMEM available. found SMP MP-table at 000f5700 On node 0 totalpages: 524272 DMA zone: 4096 pages, LIFO batch:0 DMA32 zone: 0 pages, LIFO batch:0 Normal zone: 520176 pages, LIFO batch:31 HighMem zone: 0 pages, LIFO batch:0 DMI 2.3 present. ACPI: RSDP (v000 VIA694 ) @ 0x000f7050 ACPI: RSDT (v001 VIA694 AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x7fff3000 ACPI: FADT (v001 VIA694 AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x7fff3040 ACPI: MADT (v001 VIA694 0x00000000 0x00000000) @ 0x7fff5640 ACPI: DSDT (v001 VIA694 AWRDACPI 0x00001000 MSFT 0x0100000c) @ 0x00000000 ACPI: Local APIC address 0xfee00000 ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled) Processor #0 6:8 APIC version 17 ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled) Processor #1 6:8 APIC version 17 ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0]) IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23 ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 dfl dfl) ACPI: IRQ0 used by override. ACPI: IRQ2 used by override. ACPI: IRQ9 used by override. Enabling APIC mode: Flat. Using 1 I/O APICs Using ACPI (MADT) for SMP configuration information Allocating PCI resources starting at 88000000 (gap: 80000000:7ec00000) Built 1 zonelists Kernel command line: auto BOOT_IMAGE=Linux-2.6.15.4d ro root=900 rootflags=data=journal hdb=noprobe console=ttyS0,115200 ide_setup: hdb=noprobe mapped APIC to ffffd000 (fee00000) mapped IOAPIC to ffffc000 (fec00000) Initializing CPU#0 PID hash table entries: 4096 (order: 12, 65536 bytes) Detected 998.478 MHz processor. 
Using tsc for high-res timesource Console: colour VGA+ 80x30 Dentry cache hash table entries: 524288 (order: 9, 2097152 bytes) Inode-cache hash table entries: 262144 (order: 8, 1048576 bytes) Memory: 2070484k/2097088k available (2812k kernel code, 26068k reserved, 993k data, 200k init, 0k highmem) Checking if this processor honours the WP bit even in supervisor mode... Ok. Calibrating delay using timer specific routine.. 1999.41 BogoMIPS (lpj=999707) Mount-cache hash table entries: 512 CPU: After generic identify, caps: 0387fbff 00000000 00000000 00000000 00000000 00000000 00000000 CPU: After vendor identify, caps: 0387fbff 00000000 00000000 00000000 00000000 00000000 00000000 CPU: L1 I cache: 16K, L1 D cache: 16K CPU: L2 cache: 256K CPU serial number disabled. CPU: After all inits, caps: 0383fbf7 00000000 00000000 00000040 00000000 00000000 00000000 Intel machine check architecture supported. Intel machine check reporting enabled on CPU#0. mtrr: v2.0 (20020519) Enabling fast FPU save and restore... done. Enabling unmasked SIMD FPU exception support... done. Checking 'hlt' instruction... OK. CPU0: Intel Pentium III (Coppermine) stepping 0a Booting processor 1/1 eip 3000 Initializing CPU#1 Calibrating delay using timer specific routine.. 1996.47 BogoMIPS (lpj=998238) CPU: After generic identify, caps: 0387fbff 00000000 00000000 00000000 00000000 00000000 00000000 CPU: After vendor identify, caps: 0387fbff 00000000 00000000 00000000 00000000 00000000 00000000 CPU: L1 I cache: 16K, L1 D cache: 16K CPU: L2 cache: 256K CPU serial number disabled. CPU: After all inits, caps: 0383fbf7 00000000 00000000 00000040 00000000 00000000 00000000 Intel machine check architecture supported. Intel machine check reporting enabled on CPU#1. CPU1: Intel Pentium III (Coppermine) stepping 0a Total of 2 processors activated (3995.89 BogoMIPS). ENABLING IO-APIC IRQs ..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1 checking TSC synchronization across 2 CPUs: passed. 
Brought up 2 CPUs NET: Registered protocol family 16 ACPI: bus type pci registered PCI: PCI BIOS revision 2.10 entry at 0xfb370, last bus=1 PCI: Using configuration type 1 mtrr: your CPUs had inconsistent variable MTRR settings mtrr: probably your BIOS does not setup all CPUs. mtrr: corrected configuration. ACPI: Subsystem revision 20050902 ACPI: Interpreter enabled ACPI: Using IOAPIC for interrupt routing ACPI: PCI Root Bridge [PCI0] (0000:00) PCI: Probing PCI hardware (bus 00) ACPI: Assume root bridge [\_SB_.PCI0] bus is 0 PCI quirk: region 6000-607f claimed by vt82c686 HW-mon PCI quirk: region 5000-500f claimed by vt82c686 SMB Boot video device is 0000:00:09.0 ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT] ACPI: PCI Interrupt Link [LNKA] (IRQs 1 3 4 5 6 *7 10 11 12 14 15) ACPI: PCI Interrupt Link [LNKB] (IRQs 1 3 4 *5 6 7 10 11 12 14 15) ACPI: PCI Interrupt Link [LNKC] (IRQs 1 3 4 5 6 7 10 *11 12 14 15) ACPI: PCI Interrupt Link [LNKD] (IRQs 1 3 4 5 6 7 *10 11 12 14 15) Linux Plug and Play Support v0.97 (c) Adam Belay pnp: PnP ACPI init pnp: PnP ACPI: found 11 devices SCSI subsystem initialized usbcore: registered new driver usbfs usbcore: registered new driver hub PCI: Using ACPI for IRQ routing PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report PCI: Bridge: 0000:00:01.0 IO window: disabled. MEM window: disabled. PREFETCH window: disabled. PCI: Setting latency timer of device 0000:00:01.0 to 64 IA-32 Microcode Update Driver: v1.14 <tigran@veritas.com> audit: initializing netlink socket (disabled) audit(1140970843.006:1): initialized VFS: Disk quotas dquot_6.5.1 Dquot-cache hash table entries: 1024 (order 0, 4096 bytes) NTFS driver 2.1.25 [Flags: R/O]. 
Initializing Cryptographic API io scheduler noop registered io scheduler anticipatory registered io scheduler deadline registered io scheduler cfq registered PCI: Enabling Via external APIC routing ACPI: Power Button (FF) [PWRF] ACPI: Power Button (CM) [PWRB] Real Time Clock Driver v1.12 PNP: PS/2 Controller [PNP0303:PS2K,PNP0f13:PS2M] at 0x60,0x64 irq 1,12 serio: i8042 AUX port at 0x60,0x64 irq 12 serio: i8042 KBD port at 0x60,0x64 irq 1 Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A 00:07: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A 00:08: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A Floppy drive(s): fd0 is 1.44M FDC 0 is a post-1991 82077 nbd: registered device at major 43 ACPI: PCI Interrupt 0000:00:0c.0[A] -> GSI 19 (level, low) -> IRQ 169 3c59x version LK1.1.19 eth0: 3Com PCI 3c905B Cyclone 100baseTx at 0xf8802000. 00:04:76:90:8b:ba, IRQ 169 product code 4d4c rev 00.12 date 04-11-01 Internal config register is 1800000, transceivers 0xa. 8K byte-wide RAM 5:3 Rx:Tx split, autoselect/Autonegotiate interface. MII transceiver found at address 24, status 7849. Enabling bus-master transmits and whole-frame receives. eth0: scatter/gather enabled. 
h/w checksums enabled ACPI: PCI Interrupt 0000:00:0b.0[A] -> GSI 17 (level, low) -> IRQ 177 skge 1.3 addr 0xd3000000 irq 177 chip Yukon rev 1 skge eth1: addr 00:0a:5e:53:ac:1a Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2 ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx VP_IDE: IDE controller at PCI slot 0000:00:07.1 PCI: Via IRQ fixup for 0000:00:07.1, from 255 to 0 VP_IDE: chipset revision 6 VP_IDE: not 100% native mode: will probe irqs later VP_IDE: VIA vt82c686b (rev 40) IDE UDMA100 controller on pci0000:00:07.1 ide0: BM-DMA at 0xc000-0xc007, BIOS settings: hda:DMA, hdb:DMA ide1: BM-DMA at 0xc008-0xc00f, BIOS settings: hdc:DMA, hdd:pio Probing IDE interface ide0... hda: WDC WD800JB-00JJC0, ATA DISK drive ide0 at 0x1f0-0x1f7,0x3f6 on irq 14 Probing IDE interface ide1... hdc: WDC WD800BB-00JHC0, ATA DISK drive ide1 at 0x170-0x177,0x376 on irq 15 HPT370A: IDE controller at PCI slot 0000:00:0e.0 ACPI: PCI Interrupt 0000:00:0e.0[A] -> GSI 18 (level, low) -> IRQ 185 HPT370A: chipset revision 4 HPT370A: 100% native mode on irq 185 HPT37X: using 33MHz PCI clock ide2: BM-DMA at 0xe400-0xe407, BIOS settings: hde:DMA, hdf:pio HPT37X: using 33MHz PCI clock ide3: BM-DMA at 0xe408-0xe40f, BIOS settings: hdg:DMA, hdh:pio Probing IDE interface ide2... hde: WDC WD800JB-00FSA0, ATA DISK drive ide2 at 0xd400-0xd407,0xd802 on irq 185 Probing IDE interface ide3... hdg: WDC WD800JB-00JJC0, ATA DISK drive ide3 at 0xdc00-0xdc07,0xe002 on irq 185 Probing IDE interface ide4... Probing IDE interface ide5... 
hda: max request size: 128KiB hda: 156301488 sectors (80026 MB) w/8192KiB Cache, CHS=65535/16/63, UDMA(100) hda: cache flushes supported hda: hda1 hda2 hda3 < hda5 hda6 hda7 hda8 hda9 hda10 hda11 hda12 > hdc: max request size: 128KiB hdc: 156301488 sectors (80026 MB) w/2048KiB Cache, CHS=65535/16/63, UDMA(100) hdc: cache flushes supported hdc: hdc1 hdc2 hdc3 < hdc5 hdc6 hdc7 hdc8 hdc9 hdc10 hdc11 hdc12 > hde: max request size: 1024KiB hde: 156301488 sectors (80026 MB) w/8192KiB Cache, CHS=16383/255/63, UDMA(100) hde: cache flushes supported hde: hde1 hde2 hde3 < hde5 hde6 hde7 hde8 hde9 hde10 hde11 hde12 > hdg: max request size: 128KiB hdg: 156301488 sectors (80026 MB) w/8192KiB Cache, CHS=65535/16/63, UDMA(100) hdg: cache flushes supported hdg: hdg1 hdg2 hdg3 < hdg5 hdg6 hdg7 hdg8 hdg9 hdg10 hdg11 hdg12 > libata version 1.20 loaded. USB Universal Host Controller Interface driver v2.3 ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 10 ACPI: PCI Interrupt 0000:00:07.2[D] -> Link [LNKD] -> GSI 10 (level, low) -> IRQ 10 uhci_hcd 0000:00:07.2: UHCI Host Controller uhci_hcd 0000:00:07.2: new USB bus registered, assigned bus number 1 uhci_hcd 0000:00:07.2: irq 10, io base 0x0000c400 hub 1-0:1.0: USB hub found hub 1-0:1.0: 2 ports detected ACPI: PCI Interrupt 0000:00:07.3[D] -> Link [LNKD] -> GSI 10 (level, low) -> IRQ 10 uhci_hcd 0000:00:07.3: UHCI Host Controller uhci_hcd 0000:00:07.3: new USB bus registered, assigned bus number 2 uhci_hcd 0000:00:07.3: irq 10, io base 0x0000c800 hub 2-0:1.0: USB hub found hub 2-0:1.0: 2 ports detected usbcore: registered new driver usbhid drivers/usb/input/hid-core.c: v2.6:USB HID core driver mice: PS/2 mouse device common for all mice input: PC Speaker as /class/input/input0 md: raid1 personality registered as nr 3 md: raid10 personality registered as nr 9 md: raid5 personality registered as nr 4 raid5: automatically using best checksumming function: pIII_sse input: AT Translated Set 2 keyboard as /class/input/input1 pIII_sse : 
1964.000 MB/sec raid5: using function: pIII_sse (1964.000 MB/sec) md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27 md: bitmap version 4.39 Netfilter messages via NETLINK v0.30. NET: Registered protocol family 2 IP route cache hash table entries: 131072 (order: 7, 524288 bytes) TCP established hash table entries: 262144 (order: 9, 3145728 bytes) TCP bind hash table entries: 65536 (order: 7, 786432 bytes) TCP: Hash tables configured (established 262144 bind 65536) TCP reno registered ip_conntrack version 2.4 (8192 buckets, 65536 max) - 228 bytes per conntrack ctnetlink v0.90: registering with nfnetlink. ip_tables: (C) 2000-2002 Netfilter core team input: ImExPS/2 Generic Explorer Mouse as /class/input/input2 ipt_time loading ipt_random match loaded ipt_recent v0.3.1: Stephen Frost <sfrost@snowman.net>. http://snowman.net/projects/ipt_recent/ arp_tables: (C) 2002 David S. Miller TCP bic registered TCP westwood registered TCP highspeed registered TCP hybla registered TCP htcp registered TCP vegas registered TCP scalable registered NET: Registered protocol family 1 NET: Registered protocol family 10 ip6_tables: (C) 2000-2002 Netfilter core team registering ipv6 mark target NET: Registered protocol family 17 802.1Q VLAN Support v1.8 Ben Greear <greearb@candelatech.com> All bugs added by David S. Miller <davem@redhat.com> Using IPI No-Shortcut mode ACPI wakeup devices: PCI0 USB0 USB1 MODM UAR1 UAR2 ACPI: (supports S0 S1 S4 S5) BIOS EDD facility v0.16 2004-Jun-25, 4 devices found md: Autodetecting RAID arrays. (...) md: ... autorun DONE. Problem Description: Kernel generates Oops about two or three times per week in random areas. 
I enabled the suggested kernel debugging options and caught two more accurate oopses:

CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_SLAB=y
CONFIG_DEBUG_PREEMPT=y
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_DEBUG_STACKOVERFLOW=y
CONFIG_DEBUG_PAGEALLOC=y

Unable to handle kernel paging request at virtual address 252d7a5a
 printing eip:
*pde = 00000000
Oops: 0000 [#1]
PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in: bonding
CPU:    0
EIP:    0060:[<7831dea6>]    Not tainted VLI
EFLAGS: 00010202   (2.6.15.4-debug)
EIP is at skb_copy_bits+0x11b/0x1f5
eax: 00005a5a   ebx: 00000002   ecx: 00000002   edx: a75cded8
esi: 252d7a5a   edi: a75cded8   ebp: a75cded8   esp: a5219c7c
ds: 007b   es: 007b   ss: 0068
Process httpd (pid: 32547, threadinfo=a5218000 task=b1242ae0)
Stack: 00000000 00005a90 00000036 00000000 c99d1f64 a93d1f60 000000aa 00000004
       7831d947 a93d1f60 00000036 a75cddf8 00000002 a93d1f60 a93d1f60 f76e8c00
       7831d9d9 a93d1f60 000000aa 00000004 00000020 f76e8ebc a93d1f60 f76e8c00
Call Trace:
 [<7831d947>] skb_copy_expand+0xa7/0xc5
 [<7831d9d9>] skb_pad+0x74/0xcb
 [<782abdad>] skge_xmit_frame+0x45/0x28f
 [<7832b342>] qdisc_restart+0xdf/0x1b8
 [<78321b71>] net_tx_action+0x9c/0xef
 [<78123279>] __do_softirq+0x55/0xbd
 [<78123311>] do_softirq+0x30/0x35
 [<78123374>] local_bh_enable+0x5e/0x7e
 [<78321902>] dev_queue_xmit+0x1d8/0x1df
 [<78338726>] ip_output+0x1e0/0x236
 [<78338b67>] ip_queue_xmit+0x3eb/0x461
 [<781440d3>] poison_obj+0x21/0x41
 [<7814551c>] cache_free_debugcheck+0x1cd/0x1d7
 [<78145da4>] kmem_cache_free+0x29/0x5e
 [<7819da3b>] journal_stop+0x1a0/0x1ac
 [<78195968>] __ext3_journal_stop+0x19/0x37
 [<783470c3>] tcp_transmit_skb+0x596/0x65f
 [<783be436>] _spin_unlock+0xd/0x21
 [<78347dfd>] tcp_write_xmit+0x1be/0x2d3
 [<78347f35>] __tcp_push_pending_frames+0x23/0x80
 [<7833ff90>] tcp_setsockopt+0x151/0x316
 [<7831c71f>] sock_common_setsockopt+0x1e/0x22
 [<7831a7da>] sys_setsockopt+0x58/0x70
 [<7831ad5d>] sys_socketcall+0x164/0x1a4
 [<78157f83>] sys_sendfile+0x5d/0x84
 [<78102ddb>] sysenter_past_esp+0x54/0x75
Code: 39 52 78 0f b7 44 ca 1c 89 d9 c1 e9 02 c1 fe 05 c1 e6 0c 8d b4 06 00 00 00 78 03 74 24 28 2b 74 24 08 f3 a5 89 d9 83 e1 03 74 02 <f3> a4 29 5c 24 30 0f 84 bd 00 00 00 01 5c 24 28 01 dd ff 44 24
 <0>Kernel panic - not syncing: Fatal exception in interrupt
 <0>Rebooting in 30 seconds..

Unable to handle kernel paging request at virtual address 252d7a5a
 printing eip:
*pde = 00000000
Oops: 0000 [#1]
PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in: bonding
CPU:    1
EIP:    0060:[<7831de9d>]    Not tainted VLI
EFLAGS: 00010216   (2.6.15.4-debug)
EIP is at skb_copy_bits+0x112/0x1f5
eax: 00005a5a   ebx: 00000004   ecx: 00000001   edx: e3526ed8
esi: 252d7a5a   edi: e3526ed8   ebp: e3526ed8   esp: b2a29b78
ds: 007b   es: 007b   ss: 0068
Process httpd (pid: 3922, threadinfo=b2a28000 task=b7b73ae0)
Stack: 00000000 00005a90 00000036 00000000 97582f64 92055f60 000000aa 00000002
       7831d947 92055f60 00000036 e3526df8 00000004 92055f60 92055f60 f76e7c00
       7831d9d9 92055f60 000000aa 00000002 00000020 f76e7ebc 92055f60 f76e7c00
Call Trace:
 [<7831d947>] skb_copy_expand+0xa7/0xc5
 [<7831d9d9>] skb_pad+0x74/0xcb
 [<782abdad>] skge_xmit_frame+0x45/0x28f
 [<7832b342>] qdisc_restart+0xdf/0x1b8
 [<78321b71>] net_tx_action+0x9c/0xef
 [<78123279>] __do_softirq+0x55/0xbd
 [<78123311>] do_softirq+0x30/0x35
 [<78123374>] local_bh_enable+0x5e/0x7e
 [<78321902>] dev_queue_xmit+0x1d8/0x1df
 [<78338726>] ip_output+0x1e0/0x236
 [<78338b67>] ip_queue_xmit+0x3eb/0x461
 [<783be49f>] _spin_unlock_irqrestore+0xf/0x23
 [<78115cde>] change_page_attr+0x46/0x4d
 [<7831d157>] kfree_skbmem+0xb/0x70
 [<78115dd4>] kernel_map_pages+0x1c/0x48
 [<7814550e>] cache_free_debugcheck+0x1bf/0x1d7
 [<7831d157>] kfree_skbmem+0xb/0x70
 [<78145e59>] kfree+0x45/0x7a
 [<7831d157>] kfree_skbmem+0xb/0x70
 [<78321b30>] net_tx_action+0x5b/0xef
 [<783470c3>] tcp_transmit_skb+0x596/0x65f
 [<783be49f>] _spin_unlock_irqrestore+0xf/0x23
 [<78115cde>] change_page_attr+0x46/0x4d
 [<781440d3>] poison_obj+0x21/0x41
 [<78347dfd>] tcp_write_xmit+0x1be/0x2d3
 [<78347f35>] __tcp_push_pending_frames+0x23/0x80
 [<7833df71>] do_tcp_sendpages+0x54a/0x574
 [<7833dfe7>] tcp_sendpage+0x4c/0x5f
 [<7831998a>] sock_sendpage+0x3a/0x3e
 [<7813e210>] file_send_actor+0x32/0x49
 [<7813dbc3>] do_generic_mapping_read+0x170/0x3ed
 [<7813e26e>] generic_file_sendfile+0x47/0x58
 [<7813e1de>] file_send_actor+0x0/0x49
 [<78157e8c>] do_sendfile+0x1a3/0x23d
 [<7813e1de>] file_send_actor+0x0/0x49
 [<78157f70>] sys_sendfile+0x4a/0x84
 [<78102ddb>] sysenter_past_esp+0x54/0x75
Code: 24 30 8b 74 02 18 2b 35 90 39 52 78 0f b7 44 ca 1c 89 d9 c1 e9 02 c1 fe 05 c1 e6 0c 8d b4 06 00 00 00 78 03 74 24 28 2b 74 24 08 <f3> a5 89 d9 83 e1 03 74 02 f3 a4 29 5c 24 30 0f 84 bd 00 00 00
 <0>Kernel panic - not syncing: Fatal exception in interrupt
 <0>Rebooting in 30 seconds..

I also tried the skge-fix-napi-irq-race patch, but it didn't help. After some tests I finally discovered that disabling round-robin IRQ balancing (echo 1 > /proc/irq/177/smp_affinity) helps: there have been no oopses for nearly three days. Everything is fine when only the other NIC (3c905B) is plugged into the network. This Marvell-based 3c940 NIC is known to work without problems in another (UP) server.
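The IRQ number differs between machines (177 here), so the affinity workaround above can be sketched as a small lookup plus the same write to smp_affinity. This is only a sketch: the function names irq_for_handler and pin_irq_to_cpu0 are hypothetical helpers, not part of any tool, and the lookup assumes the usual /proc/interrupts layout in which the handler names are the trailing fields of each numbered line.

```shell
# Print the IRQ number(s) whose handler list contains the given name.
# $1 = handler name as shown in /proc/interrupts (e.g. skge)
# $2 = table to read (defaults to /proc/interrupts; a saved copy also works)
irq_for_handler() {
    awk -v name="$1" '
        $1 ~ /^[0-9]+:$/ {
            for (i = 2; i <= NF; i++) {
                field = $i
                sub(/,$/, "", field)      # shared IRQs are comma separated
                if (field == name) {
                    irq = $1
                    sub(/:$/, "", irq)    # drop the trailing colon
                    print irq
                    break
                }
            }
        }' "${2:-/proc/interrupts}"
}

# Restrict an IRQ to CPU0 by writing the hex mask 1 (needs root).
# $1 = IRQ number
pin_irq_to_cpu0() {
    echo 1 > "/proc/irq/$1/smp_affinity"
}

# Example (as root): pin_irq_to_cpu0 "$(irq_for_handler skge)"
```

The setting does not survive a reboot, so it would have to be reapplied from a boot script.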
I'm having a similar problem with 2.6.15.6 on an Athlon64 X2 3800+ running 64-bit Gentoo. The motherboard is an ASUS A8N-SLI, an nForce4-based board with two integrated NICs, one Marvell 88E8001 and one nVidia. The nVidia NIC works fine, but using the Marvell NIC with the skge driver eventually causes the system to lock up hard. It takes a while, but usually ~10 minutes of heavy NFS traffic (>20 MB/s) will break the system. It's not a hardware issue, since the Marvell NIC works fine (albeit slower and less efficiently) with the in-kernel sk98lin driver. The problem only manifests when using an SMP kernel.

Setting smp_affinity to 1 on the skge interrupt (82 on my system) seems to make the problem go away. Smells like the race condition problems haven't quite been fixed yet. One minor complication: I'm using the loop-aes 3.1c patch and have disk encryption on all of my drives. Perhaps this is the source of the problem.

Here is a list of things that don't seem to have any effect on the problem:

Over/Underclocking the system
2.6.16-rc5
Linux Vserver patches
Preempt vs. Non-Preempt
Monkeying around with the interrupt coalescing settings with ethtool

Side note: The newer sk98lin driver from SysKonnect causes the system to crash spectacularly whenever any NFS traffic occurs unless a big chunk of SSH traffic (>10MB) occurs first. If the SSH transfer occurs first, the system will be rock solid -- hours of high-speed data transfer -- from then on out.
On Wed, 15 Mar 2006, bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=6142
>
> ------- Additional Comments From robert@firehead.org 2006-03-15 15:18 -------
> [...]

You may also try disabling rx and/or tx checksumming. With rx and tx hardware checksumming disabled, my system is stable even with smp_affinity set to 3. Now I only need to test which one is the real problem: rx or tx...

Best regards,
Krzysztof Ol
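For readers comparing the smp_affinity values 1 and 3 mentioned in this thread: the file takes a hexadecimal CPU bitmask in which bit N allows CPU N, so 1 means CPU0 only and 3 means CPU0 and CPU1. A minimal sketch of that arithmetic follows; cpus_to_mask is a hypothetical helper name used only for illustration.

```shell
# Convert a list of CPU numbers into the hex bitmask that
# /proc/irq/<n>/smp_affinity expects (bit N set = CPU N allowed).
# The body runs in a subshell so its variables do not leak.
cpus_to_mask() (
    mask=0
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '%x\n' "$mask"
)

# cpus_to_mask 0      prints 1  (CPU0 only, the workaround value)
# cpus_to_mask 0 1    prints 3  (both CPUs of a two-way box)
```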
Please retest with the new 1.4 version (post 2.6.16). You can find a diff against the 2.6.16 version at: http://developer.osdl.org/shemminger/prototypes/skge-1.4.diff
Applied the skge 1.4 patch to 2.6.16-vserver (the presence or absence of vserver had no effect on crashes previously). This time the system locked up within a few minutes of heavy NFS traffic, so it seems the bug is still there. The SMP affinity setting decreased the frequency of crashes, but did not eliminate the problem entirely. Turning off tx and rx checksumming with ethtool -K seems to have made the bug go away for now. This caused a performance hit of about 20%, which I was able to get rid of by adjusting the interrupt coalescing settings on all the machines.
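The checksum workaround can be sketched as follows. `ethtool -K IFACE rx off tx off` changes the offload settings and `ethtool -k IFACE` reports them; the helper names disable_csum and csum_state are hypothetical, and the parser assumes the report contains lines of the form "rx-checksumming: on" and "tx-checksumming: off".

```shell
# Turn off rx and tx hardware checksumming on one interface (needs root).
# $1 = interface name, e.g. eth1
disable_csum() {
    ethtool -K "$1" rx off tx off
}

# Read `ethtool -k IFACE` output on stdin and summarise the checksum
# offload state as "rx=<state> tx=<state>".
csum_state() {
    awk -F': *' '
        $1 == "rx-checksumming" { split($2, a, " "); rx = a[1] }
        $1 == "tx-checksumming" { split($2, a, " "); tx = a[1] }
        END { printf "rx=%s tx=%s\n", rx, tx }'
}

# Example (as root):
#   disable_csum eth1
#   ethtool -k eth1 | csum_state
```

Like the affinity setting, this is lost on reboot or driver reload, so it belongs in a network startup script if it is used as a longer-term workaround.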
Please send the full .config of a non-working system. I can't reproduce this with an old P3 SMP box and 2.6.16.6, so something different is going on. It may have something to do with bonding or VLANs. I saw the bonding config; are you using VLANs as well?
Please reopen this bug if: - it is still present in kernel 2.6.17 and - you can provide the requested information.
Created attachment 8711 [details] Full .config file
The bug still exists in 2.6.17. It takes some time before the system crashes, though - sometimes even a day or two - and this server is quite busy (pop3/imap/smtp/amavis/apache/mysql/etc). For now I'm happy with the "/usr/sbin/ethtool -K eth1 tx off" workaround.
Ah, I don't use vlans on this server - only bonding (active/backup) with eth0+eth1.
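In an active-backup bond like this one, it can help to confirm which slave is actually carrying traffic before applying a per-interface workaround. A small sketch that reads the same status file shown in the original report is below; bond_slaves and bond_active_slave are hypothetical helper names, and the parsing assumes the "Slave Interface:" and "Currently Active Slave:" lines of /proc/net/bonding/bond0.

```shell
# List every slave interface of a bond.
# $1 = status file (defaults to /proc/net/bonding/bond0)
bond_slaves() {
    sed -n 's/^Slave Interface: //p' "${1:-/proc/net/bonding/bond0}"
}

# Print the slave that is currently active.
# $1 = status file (defaults to /proc/net/bonding/bond0)
bond_active_slave() {
    sed -n 's/^Currently Active Slave: //p' "${1:-/proc/net/bonding/bond0}"
}

# Example: bond_active_slave    (eth1 on the reporter's system)
```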
Created attachment 8900 [details] possible IRQ race fix

This changes the order of locking and the IRQ register read, since the current order could theoretically cause problems.
The problems should now be fixed in 2.6.17.13 and 2.6.18