Bug 7546
Summary: | sky2 transmit timed out and finally kernel crash, marvell 88E8053 | ||
---|---|---|---|
Product: | Networking | Reporter: | Regis Damongeot (regis.damongeot) |
Component: | IPV4 | Assignee: | Stephen Hemminger (stephen) |
Status: | RESOLVED CODE_FIX | ||
Severity: | high | CC: | aros, bryce, bunk, erik, int, jerome.venturi, lindqvist, lkmlist, mathias.behrle, mvalsasna, peter, petr, pkdawson, roy.franz, serge, slavon.net, stefan, strerror, tony, ulrich |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.19-rc5 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: |
Kernel configuration for my system with sky2 driver
Kernel config for 2.6.19-rc6 (egon2003)
Kernel config 2.6.19-rc6 with sky2
Full log for sky2 (cat /var/log/messages | grep sky2)
turn flow control off
errorlog from 2.6.20.2
NAPI poll fix patch against 2.6.22-rc4 or later. missed IRQ workaround |
Description
Regis Damongeot
2006-11-18 01:28:07 UTC
Hi,

Kernel version: 2.6.19-rc6
Distribution: Gentoo 2006.1
Hardware environment: Marvell Technology Group Ltd. 88E8053 Gigabit Ethernet Controller (rev 20)

I've got exactly the same problem as described. I'm not downloading through BitTorrent but through newsgroups. It just freezes :s

Here is a log from syslog-ng:

Nov 26 04:24:21 Gentoo-LiNuX sky2 eth1: tx timeout
Nov 26 04:24:21 Gentoo-LiNuX sky2 eth1: transmit ring 278 .. 255 report=278 done=278
Nov 26 04:24:21 Gentoo-LiNuX sky2 hardware hung? flushing

Then I can't do anything else: no keyboard, no mouse, nothing --> hard reset.

Kernel: 2.6.19-rc6
Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 19)
Distribution: Gentoo

I also get hangs pretty often, usually within the hour when running BitTorrent. My internet connection is 10MB full duplex. From the log:

Nov 27 17:22:53 elite sky2 v1.10 addr 0xcdefc000 irq 17 Yukon-EC (0xb6) rev 2
Nov 27 17:22:53 elite sky2 eth0: addr 00:17:31:84:3f:28
Nov 27 17:22:55 elite sky2 eth0: enabling interface
Nov 27 17:22:58 elite sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control both
Nov 27 17:57:23 elite NETDEV WATCHDOG: eth0: transmit timed out
Nov 27 17:57:23 elite sky2 eth0: tx timeout
Nov 27 17:57:23 elite sky2 eth0: transmit ring 287 .. 264 report=287 done=287
Nov 27 17:57:23 elite sky2 hardware hung? flushing
Nov 27 17:57:32 elite BUG: soft lockup detected on CPU#0!
Nov 27 17:57:32 elite [<c013c520>] [<c0121422>] [<c010d1be>] [<c0103613>] [<c0394f52>] [<f8f57d8f>] [<f8f59ef5>] [<c0354631>] [<c03546ad>] [<c012137c>] [<c011d804>] [<c01052fd>] [<c012143f>] [<c010d1c3>] [<c0103613>] [<c0100fb2>] [<c0100fce>] [<c0101a35>] [<c0437795>] [<c04371e0>]
=======================

I need to know, based on past experience.
IRQ routing: cat /proc/interrupts
Kernel config
Hardware config (lspci) and if possible motherboard vendor
Full kernel log of sky2 messages: dmesg | grep sky2

cat /proc/interrupts
CPU0 CPU1
0: 1202 846636 IO-APIC-edge timer
1: 4342 0 IO-APIC-edge i8042
7: 0 0 IO-APIC-edge parport0
8: 2 0 IO-APIC-edge rtc
9: 0 0 IO-APIC-fasteoi acpi
10: 0 0 IO-APIC-edge MPU401 UART
12: 72434 86935 IO-APIC-edge i8042
14: 150366 0 IO-APIC-edge ide0
16: 98606 793965 IO-APIC-fasteoi nvidia
17: 2758017 2732033 IO-APIC-fasteoi uhci_hcd:usb5, eth0
18: 111945 0 IO-APIC-fasteoi EMU10K1
19: 35978 56133 IO-APIC-fasteoi libata, ohci1394
20: 0 0 IO-APIC-fasteoi ehci_hcd:usb1, uhci_hcd:usb2
21: 0 0 IO-APIC-fasteoi uhci_hcd:usb3
22: 0 0 IO-APIC-fasteoi uhci_hcd:usb4
NMI: 0 0
LOC: 837057 837056
ERR: 0
MIS: 0

Motherboard is an ASUS P5LD2

lspci:
00:00.0 Host bridge: Intel Corporation 945G/GZ/P/PL Express Memory Controller Hub (rev 02)
00:01.0 PCI bridge: Intel Corporation 945G/GZ/P/PL Express PCI Express Root Port (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 01)
00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 4 (rev 01)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #3 (rev 01)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #4 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface Bridge (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 01)
00:1f.2 SATA controller: Intel Corporation 82801GR/GH (ICH7 Family) Serial ATA Storage Controller AHCI (rev 01)
00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01)
01:02.0 Multimedia audio controller: Creative Labs SB Audigy (rev 04)
01:02.1 Input device controller: Creative Labs SB Audigy Game Port (rev 04)
01:02.2 FireWire (IEEE 1394): Creative Labs SB Audigy FireWire Port (rev 04)
02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 19)
04:00.0 VGA compatible controller: nVidia Corporation GeForce 7900 GT (rev a1)

dmesg | grep sky2
sky2 v1.10 addr 0xcdefc000 irq 17 Yukon-EC (0xb6) rev 2
sky2 eth0: addr 00:17:31:84:3f:28
sky2 eth0: enabling interface
sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control both

Adding kernel config as attachment.

Created attachment 9642 [details]
Kernel configuration for my system with sky2 driver
Created attachment 9643 [details]
Kernel config for 2.6.19-rc6 (egon2003)
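Since the IRQ routing was requested above, here is a minimal Python sketch for spotting IRQ lines that a network device shares with other handlers in a pasted `/proc/interrupts` dump. The function name `shared_irqs` and the `prefix` parameter are illustrative only, and the parsing assumes the column layout seen in this report (per-CPU counts, then the controller type, then a comma-separated handler list); other kernel versions may print the file differently.

```python
def shared_irqs(text, prefix="eth"):
    """Return {irq: [handlers]} for numbered IRQ lines where a handler whose
    name starts with `prefix` shares the line with at least one other handler."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the CPU header and non-numeric rows such as NMI/LOC/ERR/MIS.
        if not fields or not fields[0].rstrip(":").isdigit():
            continue
        irq = int(fields[0].rstrip(":"))
        i = 1
        # Skip the per-CPU counters.
        while i < len(fields) and fields[i].isdigit():
            i += 1
        # fields[i] is the controller type (e.g. IO-APIC-fasteoi);
        # everything after it is the comma-separated handler list.
        handlers = [h.strip() for h in " ".join(fields[i + 1:]).split(",") if h.strip()]
        if len(handlers) > 1 and any(h.startswith(prefix) for h in handlers):
            result[irq] = handlers
    return result


# Two rows copied from the first /proc/interrupts dump in this report.
SAMPLE = """\
           CPU0       CPU1
 17:    2758017    2732033   IO-APIC-fasteoi   uhci_hcd:usb5, eth0
 19:      35978      56133   IO-APIC-fasteoi   libata, ohci1394
NMI:          0          0
"""

if __name__ == "__main__":
    print(shared_irqs(SAMPLE))
```

A shared line is not by itself evidence of a fault; the sketch only makes it quicker to compare the dumps posted by different reporters.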
cat /proc/interrupts
CPU0 CPU1
0: 216807783 0 IO-APIC-edge timer
1: 2 0 IO-APIC-edge i8042
4: 193261 0 IO-APIC-edge serial
6: 3 0 IO-APIC-edge floppy
8: 0 0 IO-APIC-edge rtc
9: 0 0 IO-APIC-fasteoi acpi
12: 3 0 IO-APIC-edge i8042
14: 1462014 0 IO-APIC-edge ide0
16: 74813181 0 IO-APIC-fasteoi radeon@pci:0000:04:00.0
17: 125492 0 IO-APIC-fasteoi uhci_hcd:usb3
18: 0 0 IO-APIC-fasteoi uhci_hcd:usb4
19: 38077760 0 IO-APIC-fasteoi eth0, uhci_hcd:usb5, HDA Intel
20: 13920035 0 IO-APIC-fasteoi ide4, ide5, ehci_hcd:usb1, uhci_hcd:usb2
21: 407053844 0 IO-APIC-fasteoi eth1
22: 13752946 0 IO-APIC-fasteoi ide2, ide3
23: 6091017 0 IO-APIC-fasteoi libata
NMI: 150612 123474
LOC: 554264226 554264167
ERR: 0

/sbin/lspci
00:00.0 Host bridge: Intel Corporation 945G/P Memory Controller Hub (rev 02)
00:01.0 PCI bridge: Intel Corporation 945G/P PCI Express Graphics Port (rev 02)
00:1b.0 Class 0403: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller (rev 01)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 01)
00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 4 (rev 01)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #3 (rev 01)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #4 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface Bridge (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 01)
00:1f.2 Class 0106: Intel Corporation 82801GR/GH (ICH7 Family) Serial ATA Storage Controllers cc=AHCI (rev 01)
00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01)
01:01.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
01:02.0 Mass storage controller: Promise Technology, Inc. PDC20268 (Ultra100 TX2) (rev 02)
01:03.0 Mass storage controller: Integrated Technology Express, Inc. ITE 8211F Single Channel UDMA 133 (ASUS 8211 (ITE IT8212 ATA RAID Controller)) (rev 11)
02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 Gigabit Ethernet Controller (rev 19)
04:00.0 VGA compatible controller: ATI Technologies Inc RV380 0x3e50 [Radeon X600]
04:00.1 Display controller: ATI Technologies Inc RV380 [Radeon X600] Secondary

My motherboard is an Asus P5LD2 (pcb v1) with an integrated Marvell 88E8053 Gigabit Ethernet Controller.

grep sky2 /var/log/syslog.2
Nov 15 07:57:35 regis kernel: sky2 eth0: tx timeout
(syslog.2 contains boot and crash of the kernel)

cat /proc/interrupts
CPU0 CPU1
0: 16717 485853 IO-APIC-edge timer
1: 130 0 IO-APIC-edge i8042
6: 5 0 IO-APIC-edge floppy
8: 17 0 IO-APIC-edge rtc
9: 0 0 IO-APIC-fasteoi acpi
12: 93653 0 IO-APIC-edge i8042
14: 5145 0 IO-APIC-edge libata
15: 34255 0 IO-APIC-edge libata
16: 3 0 IO-APIC-fasteoi ohci1394
17: 51994 0 IO-APIC-fasteoi eth1, nvidia
18: 658 0 IO-APIC-fasteoi uhci_hcd:usb5, HDA Intel, eth0
19: 5532 0 IO-APIC-fasteoi aic7xxx
20: 66 0 IO-APIC-fasteoi ehci_hcd:usb1, uhci_hcd:usb2
21: 0 0 IO-APIC-fasteoi uhci_hcd:usb3
22: 0 0 IO-APIC-fasteoi uhci_hcd:usb4
23: 8656 0 IO-APIC-fasteoi EMU10K1
NMI: 0 0
LOC: 502400 502904
ERR: 0
MIS: 0

Motherboard is: Asus P5W DH Deluxe

lspci:
00:00.0 Host bridge: Intel Corporation Memory Controller Hub (rev c0)
00:01.0 PCI bridge: Intel Corporation PCI Express Graphics Port (rev c0)
00:1b.0 Class 0403: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller (rev 01)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 01)
00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 4 (rev 01)
00:1c.4 PCI bridge: Intel Corporation 82801GR/GH/GHM (ICH7 Family) PCI Express Port 5 (rev 01)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #3 (rev 01)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #4 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface Bridge (rev 01)
00:1f.2 IDE interface: Intel Corporation 82801GB/GR/GH (ICH7 Family) Serial ATA Storage Controllers cc=IDE (rev 01)
00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01)
01:01.0 Multimedia audio controller: Creative Labs SB Live! EMU10k1 (rev 07)
01:01.1 Input device controller: Creative Labs SB Live! MIDI/Game Port (rev 07)
01:02.0 SCSI storage controller: Adaptec AIC-7892B U160/m (rev 02)
01:03.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 Gigabit Ethernet Controller (rev 20)
03:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 Gigabit Ethernet Controller (rev 20)
05:00.0 VGA compatible controller: nVidia Corporation Unknown device 0391 (rev a1)

Created attachment 9645 [details]
Kernel config 2.6.19-rc6 with sky2
Created attachment 9646 [details]
Full log for sky2 (cat /var/log/messages | grep sky2)
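For anyone sifting through a long sky2 log like the attached one, the following Python sketch pulls the numbers out of the `transmit ring A .. B report=R done=D` lines. The labels it assigns are inferred only from the messages that follow such lines in this report ("hardware hung? flushing" when report, done and the first ring number are all equal; "status report lost?" when report equals done but differs from the first ring number). They are not copied from the driver source, and the names `first`, `last` and `classify_ring_line` are my own.

```python
import re

# Matches the ring-state line logged after a tx timeout in this report.
RING_RE = re.compile(r"transmit ring (\d+) \.\. (\d+) report=(\d+) done=(\d+)")


def classify_ring_line(line):
    """Extract the four numbers from a ring-state line and attach a label.

    Returns None for lines that do not contain a ring-state message.
    """
    m = RING_RE.search(line)
    if m is None:
        return None
    first, last, report, done = (int(g) for g in m.groups())
    if report != done:
        # No example of this case appears in the report; placeholder label.
        label = "report and done differ"
    elif report != first:
        label = "status report lost?"
    else:
        label = "hardware hung?"
    return {"first": first, "last": last, "report": report,
            "done": done, "label": label}


if __name__ == "__main__":
    for sample in (
        "Nov 26 04:24:21 Gentoo-LiNuX sky2 eth1: transmit ring 278 .. 255 report=278 done=278",
        "Mar 13 23:05:49 elite sky2 eth0: transmit ring 59 .. 37 report=60 done=60",
    ):
        print(classify_ring_line(sample))
```

Counting how often each label occurs per boot may help tell apart reports that look identical at the "tx timeout" level.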
Please reproduce the problem without the proprietary nvidia module. If it cannot be reproduced, then please close.

I have removed the proprietary nvidia module and I have the same problem: a system crash after about an hour of running BitTorrent with heavy traffic.

Oops, I still had the module loaded, but I wasn't using it. Trying without it now...

I still get hangs without the proprietary nvidia module.

Nov 29 22:26:07 elite NETDEV WATCHDOG: eth0: transmit timed out
Nov 29 22:26:07 elite sky2 eth0: tx timeout
Nov 29 22:26:07 elite sky2 eth0: transmit ring 177 .. 156 report=177 done=177
Nov 29 22:26:07 elite sky2 hardware hung? flushing
Nov 29 22:26:16 elite BUG: soft lockup detected on CPU#0!
Nov 29 22:26:16 elite [<c013c520>] [<c0121422>] [<c010d1be>] [<c0103613>] [<c011007b>] [<c0394f52>] [<f8f6ed8f>] [<f8f70ef5>] [<c0354631>] [<c03546ad>] [<c012137c>] [<c011d804>] [<c01052fd>] [<c012143f>] [<c010d1c3>] [<c0103613>] [<c0100fb2>] [<c0100fce>] [<c0101a35>] [<c0437795>] [<c04371e0>]
=======================

Yes, without the nvidia module, the problem is still here.

Another way to reproduce (a bit harder): download through newsgroups on eth0 and burn a DVD through eth1 (cifs), then it freezes... Both eth0 and eth1 are using the sky2 driver.

I just compiled 2.6.19-git3 and I still get this problem.

*** Bug 7670 has been marked as a duplicate of this bug. ***

For the last 4 days I have been using an add-in NIC with a Realtek 8139 chip. I have not had one hang during this time; BitTorrent has been running whenever the computer has been running.

Transmit flow control seems to cause a hardware problem. I can reproduce a hang with Tx flow control, but if I turn off flow control, it won't hang (at least after 48 hrs it's okay)... I sent a patch to disable flow control by default.

Created attachment 10292 [details]
turn flow control off
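To confirm which flow-control mode a link actually negotiated, for example after applying a change like the one above, the link-up message that sky2 logs can be parsed. This is a minimal Python sketch based only on the message format quoted in this report ("Link is up at 1000 Mbps, full duplex, flow control both"); the function name `parse_link_line` and the returned keys are hypothetical, and the set of possible flow-control words beyond "both" and "none" is not taken from the report.

```python
import re

# Format as seen in this report's logs; other driver versions may differ.
LINK_RE = re.compile(
    r"sky2 (\S+): Link is up at (\d+) Mbps, (\w+) duplex, flow control (\w+)"
)


def parse_link_line(line):
    """Return device, speed, duplex and flow-control mode from a link-up line,
    or None if the line is not a sky2 link-up message."""
    m = LINK_RE.search(line)
    if m is None:
        return None
    dev, speed, duplex, flow = m.groups()
    return {"dev": dev, "speed_mbps": int(speed), "duplex": duplex, "flow": flow}


if __name__ == "__main__":
    line = ("Nov 27 17:22:58 elite sky2 eth0: Link is up at 1000 Mbps, "
            "full duplex, flow control both")
    print(parse_link_line(line))
```

Running it over `dmesg | grep sky2` output gives a quick check of whether a given boot came up with flow control "both" or "none".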
Compiled 2.6.20-gentoo with the patch below. I still get hangs. From the log below (there are lots more of the "BUG: soft lockup" messages):

Feb 6 19:06:01 elite NETDEV WATCHDOG: eth0: transmit timed out
Feb 6 19:06:01 elite sky2 eth0: tx timeout
Feb 6 19:06:01 elite sky2 eth0: transmit ring 125 .. 102 report=125 done=125
Feb 6 19:06:01 elite sky2 hardware hung? flushing
Feb 6 19:06:10 elite BUG: soft lockup detected on CPU#0!
Feb 6 19:06:10 elite [<c0138165>] [<c011f0b3>] [<c010bc7e>] [<c0103434>] [<c036710a>] [<f9e61c46>] [<f9e63b7e>] [<c032a77b>] [<c032a7ee$
Feb 6 19:06:20 elite BUG: soft lockup detected on CPU#0!
Feb 6 19:06:20 elite [<c0138165>] [<c011f0b3>] [<c010bc7e>] [<c0103434>] [<c036710a>] [<f9e61c46>] [<f9e63b7e>] [<c032a77b>] [<c032a7ee$
Feb 6 19:06:30 elite BUG: soft lockup detected on CPU#0!
Feb 6 19:06:30 elite [<c0138165>] [<c011f0b3>] [<c010bc7e>] [<c0103434>] [<c036710d>] [<f9e61c46>] [<f9e63b7e>] [<c032a77b>] [<c032a7ee$
Feb 6 19:06:40 elite BUG: soft lockup detected on CPU#0!
Feb 6 19:06:40 elite [<c0138165>] [<c011f0b3>] [<c010bc7e>] [<c0103434>] [<c0367108>] [<f9e61c46>] [<f9e63b7e>] [<c032a77b>] [<c032a7ee$
Feb 6 19:06:50 elite BUG: soft lockup detected on CPU#0!
Feb 6 19:06:50 elite [<c0138165>] [<c011f0b3>] [<c010bc7e>] [<c0103434>] [<c036710a>] [<f9e61c46>] [<f9e63b7e>] [<c032a77b>] [<c032a7ee$
Feb 6 19:07:00 elite BUG: soft lockup detected on CPU#0!
Feb 6 19:07:00 elite [<c0138165>] [<c011f0b3>] [<c010bc7e>] [<c0103434>] [<c036710d>] [<f9e61c46>] [<f9e63b7e>] [<c032a77b>] [<c032a7ee$
Feb 6 19:07:10 elite BUG: soft lockup detected on CPU#0!
Feb 6 19:07:10 elite [<c0138165>] [<c011f0b3>] [<c010bc7e>] [<c0103434>] [<c036710d>] [<f9e61c46>] [<f9e63b7e>] [<c032a77b>] [<c032a7ee$
Feb 6 19:07:20 elite BUG: soft lockup detected on CPU#0!

It still crashes for me too with 2.6.20 and flow control off.
syslog:
Feb 10 01:16:42 regis kernel: sky2 eth0: tx timeout

/var/log/message:
Feb 10 01:16:42 regis kernel: NETDEV WATCHDOG: eth0: transmit timed out
Feb 10 01:16:42 regis kernel: sky2 hardware hung? flushing

After that, the keyboard is useless; the mouse works but all apps seem to crash (I can't leave X either: KDE crashes when leaving X and then everything is frozen).

*** Bug 7647 has been marked as a duplicate of this bug. ***

Same problem here. Even under minimal load my machine hangs... the mouse still works but a hard reset is required. Currently I am testing the workaround mentioned here: http://bugzilla.kernel.org/show_bug.cgi?id=6839#c12 (ethtool -A eth0 autoneg off rx on tx on)

Some info I can contribute: I got a few of these soft lockups; I don't actually know if they belong to the sky2 problem. When I get these soft lockups the PC freezes (the mouse works), but after a minute or so the system is responsive again.

BUG: soft lockup detected on CPU#0!
Call Trace: <IRQ>
[<ffffffff80254ba1>] softlockup_tick+0xdb/0xed
[<ffffffff8023b6d4>] update_process_times+0x42/0x68
[<ffffffff80217c62>] smp_local_timer_interrupt+0x34/0x55
[<ffffffff80218329>] smp_apic_timer_interrupt+0x52/0x6a
[<ffffffff8020a136>] apic_timer_interrupt+0x66/0x70
[<ffffffff804bd474>] pci_conf1_read+0x0/0xc6
[<ffffffff80254e58>] handle_IRQ_event+0x1a/0x53
[<ffffffff8025639e>] handle_edge_irq+0xec/0x132
[<ffffffff8020bb34>] do_IRQ+0x100/0x15c
[<ffffffff80208964>] mwait_idle+0x0/0x48
[<ffffffff80209a81>] ret_from_intr+0x0/0xa
<EOI>
[<ffffffff802089a6>] mwait_idle+0x42/0x48
[<ffffffff802088fc>] cpu_idle+0x8b/0xae
[<ffffffff807a06d8>] start_kernel+0x218/0x21d
[<ffffffff807a015c>] _sinittext+0x15c/0x160

irq 18: nobody cared (try booting with the "irqpoll" option)
Call Trace: <IRQ>
[<ffffffff80255837>] __report_bad_irq+0x30/0x7d
[<ffffffff80255a65>] note_interrupt+0x1e1/0x224
[<ffffffff8025628b>] handle_fasteoi_irq+0x9e/0xc5
[<ffffffff8020bb34>] do_IRQ+0x100/0x15c
[<ffffffff80209a81>] ret_from_intr+0x0/0xa
[<ffffffff804bd474>] pci_conf1_read+0x0/0xc6
[<ffffffff80254e58>] handle_IRQ_event+0x1a/0x53
[<ffffffff8025639e>] handle_edge_irq+0xec/0x132
[<ffffffff8020bb34>] do_IRQ+0x100/0x15c
[<ffffffff80208964>] mwait_idle+0x0/0x48
[<ffffffff80209a81>] ret_from_intr+0x0/0xa
<EOI>
[<ffffffff802089a6>] mwait_idle+0x42/0x48
[<ffffffff802088fc>] cpu_idle+0x8b/0xae
[<ffffffff807a06d8>] start_kernel+0x218/0x21d
[<ffffffff807a015c>] _sinittext+0x15c/0x160
handlers:
[<ffffffff8048137c>] (usb_hcd_irq+0x0/0x52)
Disabling IRQ #18

And another one:

BUG: soft lockup detected on CPU#0!
Call Trace: <IRQ>
[<ffffffff80254ba1>] softlockup_tick+0xdb/0xed
[<ffffffff8023b6d4>] update_process_times+0x42/0x68
[<ffffffff80217c62>] smp_local_timer_interrupt+0x34/0x55
[<ffffffff80218329>] smp_apic_timer_interrupt+0x52/0x6a
[<ffffffff8020a136>] apic_timer_interrupt+0x66/0x70
[<ffffffff804bd474>] pci_conf1_read+0x0/0xc6
[<ffffffff80254e58>] handle_IRQ_event+0x1a/0x53
[<ffffffff8025639e>] handle_edge_irq+0xec/0x132
[<ffffffff8020bb34>] do_IRQ+0x100/0x15c
[<ffffffff80209a81>] ret_from_intr+0x0/0xa
[<ffffffff8044f8e1>] ata_scsi_qc_complete+0x0/0x20f
[<ffffffff80237a09>] __do_softirq+0x4a/0xc3
[<ffffffff80219c84>] ack_apic_level+0x33/0x47
[<ffffffff8020a68c>] call_softirq+0x1c/0x28
[<ffffffff8020b9e3>] do_softirq+0x2c/0x7d
[<ffffffff8020bb6d>] do_IRQ+0x139/0x15c
[<ffffffff80209a81>] ret_from_intr+0x0/0xa
<EOI>
[<ffffffff8056b8e8>] thread_return+0x56/0xed
[<ffffffff80208964>] mwait_idle+0x0/0x48
[<ffffffff8020891d>] cpu_idle+0xac/0xae
[<ffffffff807a06d8>] start_kernel+0x218/0x21d
[<ffffffff807a015c>] _sinittext+0x15c/0x160

irq 1272: nobody cared (try booting with the "irqpoll" option)
Call Trace: <IRQ>
[<ffffffff80255837>] __report_bad_irq+0x30/0x7d
[<ffffffff80255a65>] note_interrupt+0x1e1/0x224
[<ffffffff802563b3>] handle_edge_irq+0x101/0x132
[<ffffffff8020bb34>] do_IRQ+0x100/0x15c
[<ffffffff80209a81>] ret_from_intr+0x0/0xa
[<ffffffff8044f8e1>] ata_scsi_qc_complete+0x0/0x20f
[<ffffffff80237a09>] __do_softirq+0x4a/0xc3
[<ffffffff80219c84>] ack_apic_level+0x33/0x47
[<ffffffff8020a68c>] call_softirq+0x1c/0x28
[<ffffffff8020b9e3>] do_softirq+0x2c/0x7d
[<ffffffff8020bb6d>] do_IRQ+0x139/0x15c
[<ffffffff80209a81>] ret_from_intr+0x0/0xa
<EOI>
[<ffffffff8056b8e8>] thread_return+0x56/0xed
[<ffffffff80208964>] mwait_idle+0x0/0x48
[<ffffffff8020891d>] cpu_idle+0xac/0xae
[<ffffffff807a06d8>] start_kernel+0x218/0x21d
[<ffffffff807a015c>] _sinittext+0x15c/0x160
handlers:
[<ffffffff80454f9d>] (ahci_interrupt+0x0/0x3cd)
Disabling IRQ #1272

operapluginwrap[8324]: segfault at 00000000000004d1 rip 00000000f7d919f6 rsp 00000000ffd0b2a0 error 4

BUG: soft lockup detected on CPU#0!
Call Trace: <IRQ>
[<ffffffff80254ba1>] softlockup_tick+0xdb/0xed
[<ffffffff8023b6d4>] update_process_times+0x42/0x68
[<ffffffff80217c62>] smp_local_timer_interrupt+0x34/0x55
[<ffffffff80218329>] smp_apic_timer_interrupt+0x52/0x6a
[<ffffffff8020a136>] apic_timer_interrupt+0x66/0x70
[<ffffffff804bd474>] pci_conf1_read+0x0/0xc6
[<ffffffff80254e58>] handle_IRQ_event+0x1a/0x53
[<ffffffff8025639e>] handle_edge_irq+0xec/0x132
[<ffffffff8020a68c>] call_softirq+0x1c/0x28
[<ffffffff8020bb34>] do_IRQ+0x100/0x15c
[<ffffffff80209a81>] ret_from_intr+0x0/0xa
<EOI>
[<ffffffff8056d433>] _spin_unlock_irqrestore+0x8/0x9
[<ffffffff8044bb77>] ata_hsm_move+0x214/0x668
[<ffffffff80244abe>] keventd_create_kthread+0x0/0x65
[<ffffffff8044e585>] ata_pio_task+0x0/0xe9
[<ffffffff8044e661>] ata_pio_task+0xdc/0xe9
[<ffffffff80241754>] run_workqueue+0x8f/0x137
[<ffffffff80242059>] worker_thread+0x0/0x14a
[<ffffffff80244abe>] keventd_create_kthread+0x0/0x65
[<ffffffff8024216d>] worker_thread+0x114/0x14a
[<ffffffff8022d0d7>] default_wake_function+0x0/0xe
[<ffffffff80244d14>] kthread+0xd1/0x101
[<ffffffff8020a318>] child_rip+0xa/0x12
[<ffffffff80244abe>] keventd_create_kthread+0x0/0x65
[<ffffffff80244c43>] kthread+0x0/0x101
[<ffffffff8020a30e>] child_rip+0x0/0x12

Info about my installed Linux:

>>> cfg-update-1.8.0-r6: No new packages have been emerged, checksum index OK!
Portage 2.1.2-r9 (default-linux/amd64/2006.1, gcc-4.1.1, glibc-2.5-r0, 2.6.20-gentoo x86_64)
=================================================================
System uname: 2.6.20-gentoo x86_64 Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz
Gentoo Base System release 1.12.9
Timestamp of tree: Sat, 17 Feb 2007 16:50:01 +0000
dev-java/java-config: 1.3.7, 2.0.31
dev-lang/python: 2.4.3-r4
dev-python/pycrypto: 2.0.1-r5
sys-apps/sandbox: 1.2.17
sys-devel/autoconf: 2.13, 2.61
sys-devel/automake: 1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10
sys-devel/binutils: 2.16.1-r3
sys-devel/gcc-config: 1.3.14
sys-devel/libtool: 1.5.22
virtual/os-headers: 2.6.17-r1
ACCEPT_KEYWORDS="amd64"
AUTOCLEAN="yes"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=nocona -O2 -pipe -fomit-frame-pointer"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/share/X11/xkb /usr/share/config"
CONFIG_PROTECT_MASK="/etc/env.d /etc/env.d/java/ /etc/gconf /etc/java-config/vms/ /etc/revdep-rebuild /etc/terminfo /etc/texmf/web2c"
CXXFLAGS="-O2 -pipe"

I hope this info is useful.
Regards,
Bjoern

Okay, with the workaround I have no more _noticeable_ soft lockups, but now I got the following:

Feb 18 21:02:58 freax sky2 eth0: Link is up at 100 Mbps, full duplex, flow control none
Feb 18 21:10:01 freax cron[8113]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
Feb 18 21:11:53 freax BUG: soft lockup detected on CPU#0!
Feb 18 21:11:53 freax
Feb 18 21:11:53 freax Call Trace:
Feb 18 21:11:53 freax <IRQ> [<ffffffff80255638>] softlockup_tick+0xda/0xf5
Feb 18 21:11:53 freax [<ffffffff8023b929>] update_process_times+0x42/0x68
Feb 18 21:11:53 freax [<ffffffff802173c5>] smp_local_timer_interrupt+0x34/0x52
Feb 18 21:11:53 freax [<ffffffff80217a78>] smp_apic_timer_interrupt+0x49/0x61
Feb 18 21:11:53 freax [<ffffffff8020a256>] apic_timer_interrupt+0x66/0x70
Feb 18 21:11:53 freax [<ffffffff802558f8>] handle_IRQ_event+0x1a/0x53
Feb 18 21:11:53 freax [<ffffffff80256a56>] handle_edge_irq+0xec/0x133
Feb 18 21:11:53 freax [<ffffffff8020a7ac>] call_softirq+0x1c/0x28
Feb 18 21:11:53 freax [<ffffffff8020bc7f>] do_IRQ+0xf7/0x150
Feb 18 21:11:53 freax [<ffffffff80209b71>] ret_from_intr+0x0/0xa
Feb 18 21:11:53 freax <EOI> [<ffffffff80242598>] worker_thread+0x0/0x14a
Feb 18 21:11:53 freax [<ffffffff80570fed>] __sched_text_start+0x12d/0x9d6
Feb 18 21:11:53 freax [<ffffffff80229c4c>] __wake_up_common+0x3e/0x68
Feb 18 21:11:53 freax [<ffffffff8022a199>] __wake_up+0x38/0x4f
Feb 18 21:11:53 freax [<ffffffff805738c6>] _spin_unlock_irqrestore+0x16/0x31
Feb 18 21:11:53 freax [<ffffffff80242598>] worker_thread+0x0/0x14a
Feb 18 21:11:53 freax [<ffffffff80245162>] keventd_create_kthread+0x0/0x6a
Feb 18 21:11:53 freax [<ffffffff80242598>] worker_thread+0x0/0x14a
Feb 18 21:11:53 freax [<ffffffff8024267c>] worker_thread+0xe4/0x14a
Feb 18 21:11:53 freax [<ffffffff8022cb73>] default_wake_function+0x0/0xe
Feb 18 21:11:53 freax [<ffffffff802453bd>] kthread+0xd1/0x100
Feb 18 21:11:53 freax [<ffffffff8020a438>] child_rip+0xa/0x12
Feb 18 21:11:53 freax [<ffffffff80245162>] keventd_create_kthread+0x0/0x6a
Feb 18 21:11:53 freax [<ffffffff802452ec>] kthread+0x0/0x100
Feb 18 21:11:53 freax [<ffffffff8020a42e>] child_rip+0x0/0x12
Feb 18 21:11:53 freax
Feb 18 21:12:28 freax irq 18: nobody cared (try booting with the "irqpoll" option)
Feb 18 21:12:28 freax
Feb 18 21:12:28 freax Call Trace:
Feb 18 21:12:28 freax <IRQ> [<ffffffff802562eb>] __report_bad_irq+0x30/0x7d
Feb 18 21:12:28 freax [<ffffffff802564ff>] note_interrupt+0x1c7/0x20c
Feb 18 21:12:28 freax [<ffffffff80245162>] keventd_create_kthread+0x0/0x6a
Feb 18 21:12:28 freax [<ffffffff80256b44>] handle_fasteoi_irq+0xa7/0xd0
Feb 18 21:12:28 freax [<ffffffff8020bc7f>] do_IRQ+0xf7/0x150
Feb 18 21:12:28 freax [<ffffffff80209b71>] ret_from_intr+0x0/0xa
Feb 18 21:12:28 freax [<ffffffff802558f8>] handle_IRQ_event+0x1a/0x53
Feb 18 21:12:28 freax [<ffffffff80256a56>] handle_edge_irq+0xec/0x133
Feb 18 21:12:28 freax [<ffffffff8020a7ac>] call_softirq+0x1c/0x28
Feb 18 21:12:28 freax [<ffffffff8020bc7f>] do_IRQ+0xf7/0x150
Feb 18 21:12:28 freax [<ffffffff80209b71>] ret_from_intr+0x0/0xa
Feb 18 21:12:28 freax <EOI> [<ffffffff80242598>] worker_thread+0x0/0x14a
Feb 18 21:12:28 freax [<ffffffff80570fed>] __sched_text_start+0x12d/0x9d6
Feb 18 21:12:28 freax [<ffffffff80229c4c>] __wake_up_common+0x3e/0x68
Feb 18 21:12:28 freax [<ffffffff8022a199>] __wake_up+0x38/0x4f
Feb 18 21:12:28 freax [<ffffffff805738c6>] _spin_unlock_irqrestore+0x16/0x31
Feb 18 21:12:28 freax [<ffffffff80242598>] worker_thread+0x0/0x14a
Feb 18 21:12:28 freax [<ffffffff80245162>] keventd_create_kthread+0x0/0x6a
Feb 18 21:12:28 freax [<ffffffff80242598>] worker_thread+0x0/0x14a
Feb 18 21:12:28 freax [<ffffffff8024267c>] worker_thread+0xe4/0x14a
Feb 18 21:12:28 freax [<ffffffff8022cb73>] default_wake_function+0x0/0xe
Feb 18 21:12:28 freax [<ffffffff802453bd>] kthread+0xd1/0x100
Feb 18 21:12:28 freax [<ffffffff8020a438>] child_rip+0xa/0x12
Feb 18 21:12:28 freax [<ffffffff80245162>] keventd_create_kthread+0x0/0x6a
Feb 18 21:12:28 freax [<ffffffff802452ec>] kthread+0x0/0x100
Feb 18 21:12:28 freax [<ffffffff8020a42e>] child_rip+0x0/0x12
Feb 18 21:12:28 freax
Feb 18 21:12:28 freax handlers:
Feb 18 21:12:28 freax [<ffffffff8048e824>] (usb_hcd_irq+0x0/0x52)
Feb 18 21:12:28 freax Disabling IRQ #18
Feb 18 21:20:01 freax cron[8136]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
Feb 18 21:30:01 freax cron[8183]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
Feb 18 21:30:36 freax sky2 eth0: rx error, status 0x5350002 length 1333
Feb 18 21:30:36 freax sky2 eth0: rx error, status 0x4d50002 length 1237
Feb 18 21:34:13 freax sky2 eth0: rx error, status 0x1b10002 length 433
Feb 18 21:34:14 freax sky2 eth0: rx error, status 0x2470002 length 583
Feb 18 21:34:16 freax sky2 eth0: rx error, status 0x7f0002 length 127
Feb 18 21:34:18 freax sky2 eth0: rx error, status 0x1f30002 length 499
Feb 18 21:34:18 freax sky2 eth0: rx error, status 0x58b0002 length 1419
Feb 18 21:34:18 freax sky2 eth0: rx error, status 0x4fd0002 length 1277
Feb 18 21:34:18 freax sky2 eth0: rx error, status 0x58a0002 length 1418
Feb 18 21:34:18 freax sky2 eth0: rx error, status 0x5900002 length 1424
Feb 18 21:34:18 freax sky2 eth0: rx error, status 0x57b0002 length 1403
Feb 18 21:34:18 freax sky2 eth0: rx error, status 0x50f0002 length 1295
Feb 18 21:34:18 freax sky2 eth0: rx error, status 0x2680002 length 616
Feb 18 21:34:26 freax printk: 2 messages suppressed.

Hope this helps in a way.
Best regards,
Bjoern

An update that fixes this problem is now in 2.6.21-rc1 and has been submitted to the 2.6.20 stable series.

Created attachment 10680 [details]
errorlog from 2.6.20.2
I still get this error with 2.6.20.2. No hang but the network goes down.
Maybe a similar problem to the one I experienced: http://bugzilla.kernel.org/show_bug.cgi?id=8091

It still crashes with 2.6.20.2 for me too.

Syslog:
Mar 13 17:29:48 regis kernel: sky2 eth0: tx timeout
Mar 13 17:30:43 regis kernel: sky2 eth0: tx timeout
(repeated endlessly)

Messages:
Mar 13 11:26:47 regis kernel: NETDEV WATCHDOG: eth0: transmit timed out
Mar 13 11:26:47 regis kernel: sky2 hardware hung? flushing
Mar 13 11:37:07 regis kernel: NETDEV WATCHDOG: eth0: transmit timed out
Mar 13 11:37:07 regis kernel: sky2 status report lost?
Mar 13 11:37:52 regis kernel: NETDEV WATCHDOG: eth0: transmit timed out
Mar 13 11:37:52 regis kernel: sky2 hardware hung? flushing
(repeated endlessly too)

A couple of questions:
1. Are you using hardware flow control (it is on by default)? If so, what kind of switch/hub are you connected to? What are the pause statistics? ethtool -S eth0
2. What chip version? dmesg | grep sky2

1) If hardware flow control is activated by default then I use it (but you previously posted a patch disabling flow control and the bug was still present). I use a Netgear gigabit switch (GS605).

root@regis:/home/regis# ethtool -S eth0
NIC statistics:
tx_bytes: 6571
rx_bytes: 4318
tx_broadcast: 47
rx_broadcast: 28
tx_multicast: 0
rx_multicast: 0
tx_unicast: 0
rx_unicast: 0
tx_mac_pause: 0
rx_mac_pause: 0
collisions: 0
late_collision: 0
aborted: 0
single_collisions: 0
multi_collisions: 0
rx_short: 0
rx_runt: 0
rx_64_byte_packets: 1
rx_65_to_127_byte_packets: 3
rx_128_to_255_byte_packets: 22
rx_256_to_511_byte_packets: 2
rx_512_to_1023_byte_packets: 0
rx_1024_to_1518_byte_packets: 0
rx_1518_to_max_byte_packets: 0
rx_too_long: 0
rx_fifo_overflow: 0
rx_jabber: 0
rx_fcs_error: 0
tx_64_byte_packets: 0
tx_65_to_127_byte_packets: 24
tx_128_to_255_byte_packets: 20
tx_256_to_511_byte_packets: 3
tx_512_to_1023_byte_packets: 0
tx_1024_to_1518_byte_packets: 0
tx_1519_to_max_byte_packets: 0
tx_fifo_underrun: 0

This is after a reboot, not when the card has crashed.

root@regis:/home/regis# dmesg | grep sky2
sky2 v1.10 addr 0xcfefc000 irq 19 Yukon-EC (0xb6) rev 2
sky2 eth0: addr 00:15:f2:7e:74:45
sky2 eth0: enabling interface
sky2 eth0: ram buffer 48K
sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control both

I am using hardware flow control if it's on by default. The switch I am using is a Netgear GS108.

ethtool -S eth0
This is before the error has occurred. I will post the output with the error as soon as it happens.
NIC statistics:
tx_bytes: 486559
rx_bytes: 5341832
tx_broadcast: 35
rx_broadcast: 3
tx_multicast: 0
rx_multicast: 0
tx_unicast: 6246
rx_unicast: 3852
tx_mac_pause: 0
rx_mac_pause: 0
collisions: 0
late_collision: 0
aborted: 0
single_collisions: 0
multi_collisions: 0
rx_short: 0
rx_runt: 0
rx_64_byte_packets: 28
rx_65_to_127_byte_packets: 185
rx_128_to_255_byte_packets: 98
rx_256_to_511_byte_packets: 26
rx_512_to_1023_byte_packets: 62
rx_1024_to_1518_byte_packets: 3456
rx_1518_to_max_byte_packets: 0
rx_too_long: 0
rx_fifo_overflow: 0
rx_jabber: 0
rx_fcs_error: 0
tx_64_byte_packets: 35
tx_65_to_127_byte_packets: 6127
tx_128_to_255_byte_packets: 43
tx_256_to_511_byte_packets: 21
tx_512_to_1023_byte_packets: 53
tx_1024_to_1518_byte_packets: 2
tx_1519_to_max_byte_packets: 0
tx_fifo_underrun: 0

dmesg | grep sky2
sky2 v1.10 addr 0xcdefc000 irq 17 Yukon-EC (0xb6) rev 2
sky2 eth0: addr 00:17:31:84:3f:28
sky2 eth0: enabling interface
sky2 eth0: ram buffer 48K
sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control both

Just got the error again:

Mar 13 23:05:49 elite sky2 eth0: tx timeout
Mar 13 23:05:49 elite sky2 eth0: transmit ring 59 .. 37 report=60 done=60
Mar 13 23:05:49 elite sky2 status report lost?
Mar 13 23:05:54 elite NETDEV WATCHDOG: eth0: transmit timed out
Mar 13 23:05:54 elite sky2 eth0: tx timeout
Mar 13 23:05:54 elite sky2 eth0: transmit ring 60 .. 37 report=60 done=60
Mar 13 23:05:54 elite sky2 hardware hung? flushing
and so on.....

ethtool -S eth0 with the error
NIC statistics:
tx_bytes: 1694409496
rx_bytes: 2089054273
tx_broadcast: 49
rx_broadcast: 56
tx_multicast: 4
rx_multicast: 0
tx_unicast: 2228657
rx_unicast: 2296615
tx_mac_pause: 1
rx_mac_pause: 0
collisions: 0
late_collision: 0
aborted: 0
single_collisions: 0
multi_collisions: 0
rx_short: 0
rx_runt: 0
rx_64_byte_packets: 340370
rx_65_to_127_byte_packets: 376059
rx_128_to_255_byte_packets: 38899
rx_256_to_511_byte_packets: 44021
rx_512_to_1023_byte_packets: 274974
rx_1024_to_1518_byte_packets: 1222348
rx_1518_to_max_byte_packets: 0
rx_too_long: 0
rx_fifo_overflow: 0
rx_jabber: 0
rx_fcs_error: 0
tx_64_byte_packets: 401437
tx_65_to_127_byte_packets: 624384
tx_128_to_255_byte_packets: 83117
tx_256_to_511_byte_packets: 18825
tx_512_to_1023_byte_packets: 52777
tx_1024_to_1518_byte_packets: 1048171
tx_1519_to_max_byte_packets: 0
tx_fifo_underrun: 0

I have the same problem when using the sky2 driver, though with the latest .21 rc it seems to be able to fix itself.
It's annoying when streaming things though :)

cat /proc/interrupts
           CPU0       CPU1
  0:        113          0   IO-APIC-edge      timer
  1:          2          0   IO-APIC-edge      i8042
  6:          5          0   IO-APIC-edge      floppy
  8:         30          0   IO-APIC-edge      rtc
  9:          0          0   IO-APIC-fasteoi   acpi
 12:          4          0   IO-APIC-edge      i8042
 14:      29828          0   IO-APIC-edge      ide0
 16:     221084          0   IO-APIC-fasteoi   libata, fglrx
 17:        202          0   IO-APIC-fasteoi   uhci_hcd:usb5, HDA Intel
 18:          0          0   IO-APIC-fasteoi   libata, uhci_hcd:usb3
 21:    1610499      34072   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2
 22:          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
 23:          3          0   IO-APIC-fasteoi   ohci1394
216:          1          0   PCI-MSI-edge      eth1
217:     380888    2773219   PCI-MSI-edge      eth0
218:      51615     290181   PCI-MSI-edge      libata
NMI:          0          0
LOC:     743870     743841
ERR:          0
MIS:          0

dmesg|grep sky2
[ 55.730269] sky2 0000:04:00.0: v1.13 addr 0xff8fc000 irq 17 Yukon-EC (0xb6) rev 2
[ 55.730449] sky2 eth0: addr 00:17:31:ee:d8:46
[ 55.730486] sky2 0000:03:00.0: v1.13 addr 0xff7fc000 irq 16 Yukon-EC (0xb6) rev 2
[ 55.730634] sky2 eth1: addr 00:17:31:ee:d2:ce
[ 55.744293] sky2 eth0: enabling interface
[ 55.746101] sky2 eth0: ram buffer 48K
[ 57.484538] sky2 eth0: Link is up at 100 Mbps, full duplex, flow control both
[ 70.143440] Modules linked in: fglrx agpgart cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_ondemand freq_table cpufreq_conservative video sbs dock i2c_ec button battery container ac asus_acpi iptable_filter ip_tables x_tables ipv6 xfs nls_utf8 ntfs w83627ehf eeprom i2c_isa i2c_i801 fuse sbp2 parport_pc lp parport tsdev snd_hda_intel snd_hda_codec dvb_usb_vp7045 dvb_usb snd_pcm_oss snd_mixer_oss dvb_core dvb_pll psmouse snd_pcm snd_timer sg usbhid i2c_core 8139cp 8139too mii sky2 snd soundcore serio_raw rng_core snd_page_alloc shpchp pci_hotplug pcspkr floppy evdev ext3 jbd mbcache raid456 md_mod xor ohci1394 ieee1394 uhci_hcd ehci_hcd usbcore ide_generic jmicron ide_cd cdrom ide_disk piix generic sd_mod thermal processor fan
[ 1267.324218] sky2 eth0: tx timeout
[ 1267.324222] sky2 eth0: transmit ring 345 .. 322 report=350 done=350
[ 1267.324228] sky2 eth0: disabling interface
[ 1267.324702] sky2 eth0: enabling interface
[ 1267.326645] sky2 eth0: ram buffer 48K
[ 1269.009555] sky2 eth0: Link is up at 100 Mbps, full duplex, flow control both
[ 2323.600088] sky2 eth0: tx timeout
[ 2323.600092] sky2 eth0: transmit ring 109 .. 86 report=114 done=114
[ 2323.600098] sky2 eth0: disabling interface
[ 2323.600596] sky2 eth0: enabling interface
[ 2323.602540] sky2 eth0: ram buffer 48K
[ 2325.353570] sky2 eth0: Link is up at 100 Mbps, full duplex, flow control both

lspci
00:00.0 Host bridge: Intel Corporation 975X Express Memory Controller Hub (rev c0)
00:01.0 PCI bridge: Intel Corporation 975X Express PCI Express Root Port (rev c0)
00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller (rev 01)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 01)
00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 4 (rev 01)
00:1c.4 PCI bridge: Intel Corporation 82801GR/GH/GHM (ICH7 Family) PCI Express Port 5 (rev 01)
00:1c.5 PCI bridge: Intel Corporation 82801GR/GH/GHM (ICH7 Family) PCI Express Port 6 (rev 01)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #3 (rev 01)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #4 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface Bridge (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 01)
00:1f.2 SATA controller: Intel Corporation 82801GR/GH (ICH7 Family) Serial ATA Storage Controller AHCI (rev 01)
00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01)
01:01.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
01:03.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
02:00.0 SATA controller: JMicron Technologies, Inc. JMicron 20360/20363 AHCI Controller (rev 02)
02:00.1 IDE interface: JMicron Technologies, Inc. JMicron 20360/20363 AHCI Controller (rev 02)
03:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 20)
04:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 20)
06:00.0 VGA compatible controller: ATI Technologies Inc Unknown device 7249
06:00.1 Display controller: ATI Technologies Inc Unknown device 7269

Motherboard is ASUS P5H DH Deluxe. Seems pretty reproducible if I upload large amounts of data. Can give shell if needed to help solve this issue.

I can do uploads all weekend with the same motherboard. What is the kernel version you are using? What are the statistics (ethtool -S eth0)? What kind of switch?

*** Bug 8091 has been marked as a duplicate of this bug. ***

*** Bug 8324 has been marked as a duplicate of this bug. ***

Just want to add: still happens on 2.6.21.1. 3 times today. syslog says:

May 6 00:07:17 burner kernel: NETDEV WATCHDOG: mv0: transmit timed out
May 6 00:07:17 burner kernel: sky2 mv0: tx timeout
May 6 00:07:17 burner kernel: sky2 mv0: transmit ring 252 .. 229 report=252 done=252
May 6 00:07:17 burner kernel: sky2 hardware hung? flushing
May 6 00:15:42 burner kernel: NETDEV WATCHDOG: mv0: transmit timed out
May 6 00:15:42 burner kernel: sky2 mv0: tx timeout
May 6 00:15:42 burner kernel: sky2 mv0: transmit ring 229 .. 206 report=252 done=252
May 6 00:15:42 burner kernel: sky2 status report lost?
May 6 00:16:07 burner kernel: NETDEV WATCHDOG: mv0: transmit timed out
May 6 00:16:07 burner kernel: sky2 mv0: tx timeout
May 6 00:16:07 burner kernel: sky2 mv0: transmit ring 252 .. 229 report=252 done=252
May 6 00:16:07 burner kernel: sky2 hardware hung? flushing

And here it did not come back after getting this in the logs:

May 6 01:22:13 burner kernel: NETDEV WATCHDOG: mv0: transmit timed out
May 6 01:22:13 burner kernel: sky2 mv0: tx timeout
May 6 01:22:13 burner kernel: sky2 mv0: transmit ring 453 .. 430 report=453 done=453
May 6 01:22:13 burner kernel: sky2 mv0: disabling interface
May 6 01:22:13 burner kernel: sky2 mv0: enabling interface
May 6 01:22:13 burner kernel: sky2 mv0: ram buffer 48K
May 6 01:22:16 burner kernel: sky2 mv0: Link is up at 1000 Mbps, full duplex, flow control both

At this point the network is plain dead. Not pingable from outside, ifconfig mv0 down and up does not change anything. Using another (until now unused) NIC (rl8169) works immediately, so it's 'only' the sky2 driver.

lspci:
02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 20)

/proc/interrupts
           CPU0       CPU1
  0:         85          0   IO-APIC-edge      timer
  1:          0          2   IO-APIC-edge      i8042
  6:          2          1   IO-APIC-edge      floppy
  9:          0          0   IO-APIC-fasteoi   acpi
 12:          0          4   IO-APIC-edge      i8042
 14:     269957     121746   IO-APIC-edge      ide0
 15:         49         13   IO-APIC-edge      ide1
 18:     665606     403627   IO-APIC-fasteoi   mv0
 20:          0          2   IO-APIC-fasteoi   ehci_hcd:usb1, ohci_hcd:usb7
 21:          3          4   IO-APIC-fasteoi   ehci_hcd:usb2, ohci1394
 22:          0         19   IO-APIC-fasteoi   ohci_hcd:usb3
 23:     270673          1   IO-APIC-fasteoi   ohci_hcd:usb4, eth2
 24:     118010    1032119   IO-APIC-fasteoi   ohci_hcd:usb5, ohci_hcd:usb6, ohci1394
 25:          9          1   IO-APIC-fasteoi   ul0
 26:          0          0   IO-APIC-fasteoi   ALi M5455
NMI:          0          0
LOC:    1772837    1772837
ERR:          0
MIS:          0

CPU is dual core Athlon 3800+, kernel is a freshly compiled 2.6.21.1. If there is anything I can do to get more debug output, someone please let me know!

*** Bug 8453 has been marked as a duplicate of this bug.
***

Here's a script which I run (nohup script &> /dev/null &) to avoid losing a connection:

#! /bin/sh
export LANG=C
IP=192.168.1.200
LOG=/var/log/sky2
LOGDEBUG=/var/log/sky2.debug

# Record a timestamp, a short ping summary, and the link state.
debug() {
    date '+%Y-%m-%d %H:%M:%S'
    ping -c5 $IP 2>&1 | grep transmitted
    ethtool eth1 2>&1 | egrep "Speed|detected"
}

while :; do
    # Plain redirection: under a strict POSIX /bin/sh, "&>" would background ping.
    ping -c3 $IP > /dev/null 2>&1
    RESULT=$?
    if [ "$RESULT" != "0" ]; then
        date '+%Y-%m-%d %H:%M:%S' >> $LOG
        debug >> $LOGDEBUG
        # Take the interface down, unload the driver, and bring the interface back up.
        /sbin/ifdown eth0
        rmmod sky2
        /sbin/ifup eth0
    fi
    sleep 60
done

*** Bug 7579 has been marked as a duplicate of this bug. ***

I have just done some testing with 2.6.22-rc4 on one of the machines I have with sky2 NICs. This one has an Asus P5W DH motherboard with two of these NICs integrated.

lspci:
02:00.0 0200: 11ab:4362 (rev 20)
03:00.0 0200: 11ab:4362 (rev 20)

02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 20)
    Subsystem: ASUSTeK Computer Inc. Marvell 88E8053 Gigabit Ethernet controller PCIe (Asus)
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
    Latency: 0, Cache Line Size: 16 bytes
    Interrupt: pin A routed to IRQ 218
    Region 0: Memory at fa8fc000 (64-bit, non-prefetchable) [size=16K]
    Region 2: I/O ports at a800 [size=256]
    Expansion ROM at fa8c0000 [disabled] [size=128K]
    Capabilities: [48] Power Management version 2
        Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
        Status: D0 PME-Enable- DSel=0 DScale=1 PME-
    Capabilities: [50] Vital Product Data
    Capabilities: [5c] Message Signalled Interrupts: Mask- 64bit+ Queue=0/1 Enable+
        Address: 00000000fee0300c Data: 4172
    Capabilities: [e0] Express Legacy Endpoint IRQ 0
        Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag-
        Device: Latency L0s unlimited, L1 unlimited
        Device: AtnBtn- AtnInd- PwrInd-
        Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
        Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
        Device: MaxPayload 128 bytes, MaxReadReq 512 bytes
        Link: Supported Speed 2.5Gb/s, Width x1, ASPM L0s, Port 0
        Link: Latency L0s <256ns, L1 unlimited
        Link: ASPM Disabled RCB 128 bytes CommClk+ ExtSynch-
        Link: Speed 2.5Gb/s, Width x1

03:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 20)
    Subsystem: ASUSTeK Computer Inc. Marvell 88E8053 Gigabit Ethernet controller PCIe (Asus)
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
    Latency: 0, Cache Line Size: 16 bytes
    Interrupt: pin A routed to IRQ 219
    Region 0: Memory at fa9fc000 (64-bit, non-prefetchable) [size=16K]
    Region 2: I/O ports at b800 [size=256]
    Expansion ROM at fa9c0000 [disabled] [size=128K]
    Capabilities: [48] Power Management version 2
        Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
        Status: D0 PME-Enable- DSel=0 DScale=1 PME-
    Capabilities: [50] Vital Product Data
    Capabilities: [5c] Message Signalled Interrupts: Mask- 64bit+ Queue=0/1 Enable+
        Address: 00000000fee0300c Data: 416a
    Capabilities: [e0] Express Legacy Endpoint IRQ 0
        Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag-
        Device: Latency L0s unlimited, L1 unlimited
        Device: AtnBtn- AtnInd- PwrInd-
        Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
        Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
        Device: MaxPayload 128 bytes, MaxReadReq 512 bytes
        Link: Supported Speed 2.5Gb/s, Width x1, ASPM L0s, Port 0
        Link: Latency L0s <256ns, L1 unlimited
        Link: ASPM Disabled RCB 128 bytes CommClk+ ExtSynch-
        Link: Speed 2.5Gb/s, Width x1

dmesg:
ACPI: PCI Interrupt 0000:03:00.0[A] -> GSI 19 (level, low) -> IRQ 17
PCI: Setting latency timer of device 0000:03:00.0 to 64
sky2 0000:03:00.0: v1.14 addr 0xfa9fc000 irq 17 Yukon-EC (0xb6) rev 2
sky2 eth1: addr 00:18:f3:e0:46:f0
ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 16 (level, low) -> IRQ 16
PCI: Setting latency timer of device 0000:02:00.0 to 64
sky2 0000:02:00.0: v1.14 addr 0xfa8fc000 irq 16 Yukon-EC (0xb6) rev 2
sky2 eth2: addr 00:18:f3:e0:22:42
...
sky2 eth1: enabling interface
sky2 eth1: ram buffer 48K
ADDRCONF(NETDEV_UP): eth1: link is not ready
sky2 eth2: enabling interface
sky2 eth2: ram buffer 48K
ADDRCONF(NETDEV_UP): eth2: link is not ready
sky2 eth2: Link is up at 1000 Mbps, full duplex, flow control both
ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
eth2: no IPv6 routers present

What I did was run these two commands simultaneously:

$ ssh 192.168.1.1 "cat /dev/urandom" >/dev/null

and

$ cat /dev/urandom | ssh 192.168.1.1 "cat >/dev/null"

And sure enough, pretty soon (once after about an hour, once after about 20 minutes) it stopped working. The ping I had running looked like this:

64 bytes from 192.168.1.1: icmp_seq=786 ttl=64 time=0.225 ms
64 bytes from 192.168.1.1: icmp_seq=787 ttl=64 time=0.270 ms
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available

But nothing new appeared in dmesg. Stopping all the above commands did not make it recover; only unloading and reloading the sky2 module helped.

The last test showed you have a different problem. You ran out of memory on the system. Sorry, that isn't a driver problem.

I cannot agree with comment 44, because it was said earlier:

> Stopping all the above commands did not make it recover, only unloading and reloading the sky2 module helped.

I still wonder: if this bug is so easily triggerable and reproducible, why is the patch still missing?
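A stall like the one in the ping output above can be picked out of a saved ping log without watching the terminal. The following is a minimal sketch, assuming ping's combined stdout and stderr has been saved or piped in; the function name `summarize_ping` is made up for illustration. It prints the last `icmp_seq` that got a reply and how many `No buffer space available` errors were logged.

```shell
#!/bin/sh
# summarize_ping: hypothetical helper for illustration only.
# Reads ping output (stdout and stderr combined) on stdin.
summarize_ping() {
    awk '
    /icmp_seq=/ {
        # Remember the sequence number of the most recent reply line.
        for (i = 1; i <= NF; i++)
            if ($i ~ /^icmp_seq=/) { last = $i; sub(/^icmp_seq=/, "", last) }
    }
    # Count every send failure that ping reported.
    /No buffer space available/ { enobufs++ }
    END {
        printf "last_reply_seq=%s enobufs=%d\n", (last == "" ? "none" : last), enobufs
    }'
}
```

One way to use it would be `ping 192.168.1.1 2>&1 | tee ping.log` during the test and `summarize_ping < ping.log` afterwards; the timestamp of the last reply can then be compared against dmesg.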
I have to agree that it MAY be a different problem than what I have observed otherwise (at those times getting the "hw csum" errors in dmesg). I don't however understand why running those commands would cause the machine to run out of memory. (It's not a machine with low memory, it has 2 GB, so it's not right on the edge to start with.) I personally think the reason why it ran out of buffer space was because the sky2 driver stopped working properly. (The memory problem being an effect, not the cause.)

The problem I mentioned above only happens when using the onboard sky2 rather than the e1000 PCI card that I normally use (to avoid the sky2 driver problems).

FWIW, the current sk98lin driver version from Marvell (v10.0.5.3) with kernel 2.6.21.3 seems to be a working setup with the above mentioned card. (Their driver does not seem to compile out of the box with 2.6.22-rc4, which is why I switched kernel versions to get a quick test done with that.) I don't know if that can be of any use for tracking down the issues. (It implies that the hardware is ok, if nothing else.)

Dmesg output with that driver for comparison:

ACPI: PCI Interrupt 0000:03:00.0[A] -> GSI 19 (level, low) -> IRQ 17
sk98lin: Network Device Driver v10.0.5.3
(C)Copyright 1999-2007 Marvell(R).
PCI: Setting latency timer of device 0000:03:00.0 to 64
eth1: Marvell Yukon 88E8053 Gigabit Ethernet Controller
      PrefPort:A RlmtMode:Check Link State
ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 16 (level, low) -> IRQ 16
PCI: Setting latency timer of device 0000:02:00.0 to 64
eth2: Marvell Yukon 88E8053 Gigabit Ethernet Controller
      PrefPort:A RlmtMode:Check Link State
...
eth2: network connection up using port A
    speed: 1000
    autonegotiation: yes
    duplex mode: full
    flowctrl: symmetric
    role: slave
    irq moderation: disabled
    tcp offload: enabled
    scatter-gather: enabled
    tx-checksum: enabled
    rx-checksum: enabled
    rx-polling: enabled
ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
eth2: no IPv6 routers present

Created attachment 11880 [details]
NAPI poll fix patch against 2.6.22-rc4 or later.
Recheck for more status in NAPI poll
I have been running some tests with 2.6.22-rc6 + the patch above (id=11880) on one of my machines and have yet to get sky2 to stop working the way it previously did. I'll need some further testing to be sure, but the initial results seem promising.

Unfortunately, I have to retract my previous statement. Just minutes after posting that, I got the same kind of behavior as before: sky2 stopped working, leaving rmmod sky2 followed by modprobe sky2 as the only way to get it working again.

Created attachment 11984 [details]
missed IRQ workaround
This patch uses the existing idle timeout to poll for lost IRQs. It works around all known stress test failures.
Great, Stephen! It makes sense too, as 2.6.16 was the last working version (at least for me). Will you send this patch to the stable team?