Bug 7647
Summary: | sky2 88E8053 goes down under heavy traffic | ||
---|---|---|---|
Product: | Drivers | Reporter: | Ing. Petr Dvoracek (petr) |
Component: | Network | Assignee: | Stephen Hemminger (stephen) |
Status: | REJECTED DUPLICATE | ||
Severity: | high | CC: | genneth, peter, stephen |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.19 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: | Ethtool dump of eth0 after error |
Description
Ing. Petr Dvoracek
2006-12-06 23:37:01 UTC
I'm on a Mac Mini. This seems to have started happening only after a recent Apple firmware update to EFI version 1.1, which contained various changes to the BIOS emulation layer (not sure if that's relevant here).

Distribution: Fedora Core 6

lspci output:

    00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT Express Memory Controller Hub (rev 03)
    00:02.0 VGA compatible controller: Intel Corporation Mobile 945GM/GMS/940GML Express Integrated Graphics Controller (rev 03)
    00:07.0 Performance counters: Intel Corporation Unknown device 27a3 (rev 03)
    00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller (rev 02)
    00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 02)
    00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 2 (rev 02)
    00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #1 (rev 02)
    00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #2 (rev 02)
    00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #3 (rev 02)
    00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #4 (rev 02)
    00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 02)
    00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e2)
    00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface Bridge (rev 02)
    00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 02)
    00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family) Serial ATA Storage Controller IDE (rev 02)
    00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 02)
    01:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 22)
    02:00.0 Ethernet controller: Atheros Communications, Inc. Unknown device 001c (rev 01)
    03:03.0 FireWire (IEEE 1394): Agere Systems FW323 (rev 61)

In detail:

    01:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 22)
        Subsystem: Marvell Technology Group Ltd. Marvell RDK-8053
        Flags: bus master, fast devsel, latency 0, IRQ 233
        Memory at 90200000 (64-bit, non-prefetchable) [size=16K]
        I/O ports at 1000 [size=256]
        Expansion ROM at 50000000 [disabled] [size=128K]
        Capabilities: [48] Power Management version 2
        Capabilities: [50] Vital Product Data
        Capabilities: [5c] Message Signalled Interrupts: 64bit+ Queue=0/1 Enable+
        Capabilities: [e0] Express Legacy Endpoint IRQ 0

These are the things you need to debug a sky2-related problem.

1) What is the exact kernel version in use? This is important because problems get fixed, but it can be a long while until the fix bubbles down to the vendor kernels.
2) What is the chip version? The driver prints this in the console log at boot (dmesg | grep sky2). This matters because each chip version has different bugs to deal with.
3) Does it work with the vendor driver? The vendor driver does a number of things differently than the sky2 driver and can mask problems, but if it doesn't work either, that is a useful data point. If you want to know why the sky2 driver was written instead of just using the vendor driver, look at the code: the sk98lin driver is huge, includes unsupportable and broken features, and has locking mistakes. But sk98lin also has a watchdog that masks bugs and may provide useful insight.
4) What is the IRQ routing? There are two issues here: first, the driver will never work with edge-triggered IRQs; second, some motherboards have broken BIOSes and chipsets that don't do MSI properly. A couple of module parameters are available to help:
       disable_msi=1    avoids using MSI
       idle_timeout=10  polls for lost IRQs every 10 ms
5) What are the messages in the console log when the problem happens?
6) Are you running any of the following: bonding, VLANs, bridging, netfilter, traffic control?
7) Please get a current version of ethtool from git://git.kernel.org/pub/scm/network/ethtool/ethtool.git and run an ethtool register dump after a problem occurs:
       ethtool -d eth0

1) Fedora Core 6, 2.6.18-1.2849.fc6. It is based on 2.6.18.2, but with a few patches. I believe that the FC people tend to cooperate with upstream?
2) Relevant line from dmesg | grep sky2:
       sky2 v1.5 addr 0x90200000 irq 177 Yukon-EC (0xb6) rev 2
3) I'm on a Mac Mini, and I don't know what the vendor driver is. I do not have an sk98lin driver on my system.
4) I have changed the module loading parameters; I can still trigger the problem.
5) No dmesg output. The first time it occurs after a fresh boot, there is usually a complaint about either an rx error or a tx error. Not very helpful, unfortunately. Is there any chance of more verbose output?
6) None.
7) See attachment below.

Notes: I've found that using GTK2 applications over an ssh tunnel seems to reliably trigger the error.

Created attachment 9830 [details]
Ethtool dump of eth0 after error
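Item 2 of the checklist asks for the chip version, which the driver prints in its boot banner (here `sky2 v1.5 addr 0x90200000 irq 177 Yukon-EC (0xb6) rev 2`). When comparing banners from several affected machines, a small parser makes the chip ID and revision easy to line up. This is a minimal Python sketch, not part of sky2 or any existing tool: the `parse_sky2_banner` name and its regex are assumptions written only against the banner format quoted in this report, and other driver versions may print something different.

```python
import re
from typing import NamedTuple, Optional


class Sky2Banner(NamedTuple):
    driver_version: str  # e.g. "1.5"
    mmio_addr: int       # register window base address
    irq: int
    chip_name: str       # e.g. "Yukon-EC"
    chip_id: int         # e.g. 0xb6
    chip_rev: int


# Hypothetical pattern, matched against banners of the form
# "sky2 v1.5 addr 0x90200000 irq 177 Yukon-EC (0xb6) rev 2".
_BANNER_RE = re.compile(
    r"sky2 v(?P<ver>\S+) addr (?P<addr>0x[0-9a-fA-F]+) irq (?P<irq>\d+) "
    r"(?P<name>\S+) \((?P<id>0x[0-9a-fA-F]+)\) rev (?P<rev>\d+)"
)


def parse_sky2_banner(line: str) -> Optional[Sky2Banner]:
    """Return the banner fields, or None if the line is not a sky2 banner."""
    m = _BANNER_RE.search(line)
    if m is None:
        return None
    return Sky2Banner(
        driver_version=m["ver"],
        mmio_addr=int(m["addr"], 16),
        irq=int(m["irq"]),
        chip_name=m["name"],
        chip_id=int(m["id"], 16),
        chip_rev=int(m["rev"]),
    )


if __name__ == "__main__":
    banner = parse_sky2_banner(
        "sky2 v1.5 addr 0x90200000 irq 177 Yukon-EC (0xb6) rev 2")
    if banner is not None:
        print(banner.chip_name, hex(banner.chip_id), banner.chip_rev)
```

For the banner above this prints `Yukon-EC 0xb6 2`. The openSUSE banner further down in this report (`sky2 v1.10 ...`) parses to the same chip ID and revision, so both reporters are on the same chip.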
This is still present in FC6 kernel 2.6.18-1.2869, which is based on 2.6.18.6.

I have the same problem on openSUSE 10.2 on a Gigabyte DQ6 mainboard with an onboard Marvell 88E8053. Unloading and reloading the driver module fixes the problem.

uname -a:

    Linux jupiter 2.6.18.2-34-default #1 SMP Mon Nov 27 11:46:27 UTC 2006 i686 i686 i386 GNU/Linux

lspci:

    00:00.0 Host bridge: Intel Corporation 82P965/G965 Memory Controller Hub (rev 02)
    00:01.0 PCI bridge: Intel Corporation 82P965/G965 PCI Express Root Port (rev 02)
    00:1a.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI #4 (rev 02)
    00:1a.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI #5 (rev 02)
    00:1a.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI #2 (rev 02)
    00:1b.0 Audio device: Intel Corporation 82801H (ICH8 Family) HD Audio Controller (rev 02)
    00:1c.0 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 1 (rev 02)
    00:1c.4 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 5 (rev 02)
    00:1c.5 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 6 (rev 02)
    00:1d.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI #1 (rev 02)
    00:1d.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI #2 (rev 02)
    00:1d.2 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI #3 (rev 02)
    00:1d.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI #1 (rev 02)
    00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev f2)
    00:1f.0 ISA bridge: Intel Corporation 82801HB/HR (ICH8/R) LPC Interface Controller (rev 02)
    00:1f.2 SATA controller: Intel Corporation 82801HR/HO/HH (ICH8R/DO/DH) 6 port SATA AHCI Controller (rev 02)
    00:1f.3 SMBus: Intel Corporation 82801H (ICH8 Family) SMBus Controller (rev 02)
    01:00.0 VGA compatible controller: ATI Technologies Inc RV530 [Radeon X1600]
    01:00.1 Display controller: ATI Technologies Inc RV530 [Radeon X1600] (Secondary)
    03:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 22)
    04:00.0 SATA controller: JMicron Technologies, Inc. JMicron 20360/20363 AHCI Controller (rev 02)
    04:00.1 IDE interface: JMicron Technologies, Inc. JMicron 20360/20363 AHCI Controller (rev 02)
    05:00.0 Multimedia video controller: Internext Compression Inc iTVC16 (CX23416) MPEG-2 Encoder (rev 01)
    05:01.0 Multimedia controller: Philips Semiconductors SAA7146 (rev 01)
    05:06.0 FireWire (IEEE 1394): Texas Instruments TSB43AB23 IEEE-1394a-2000 Controller (PHY/Link)

I tried the module parameters, but they didn't help:

    disable_msi=1    avoids using MSI
    idle_timeout=10  polls for lost IRQs every 10 ms

I set sky2 to maximum debug output. The output before the error in /var/log/messages:

    Jan 21 16:10:26 jupiter kernel: eth0: tx done 224
    Jan 21 16:10:26 jupiter kernel: eth0: tx done 226
    Jan 21 16:10:26 jupiter kernel: sky2 eth0: rx slot 158 status 0x3c0100 len 60
    Jan 21 16:10:26 jupiter kernel: eth0: tx queued, slot 230, len 4434
    Jan 21 16:10:26 jupiter kernel: eth0: tx done 229
    Jan 21 16:10:26 jupiter kernel: sky2 eth0: rx slot 159 status 0x3c0100 len 60
    Jan 21 16:10:26 jupiter kernel: eth0: tx queued, slot 233, len 4434
    Jan 21 16:10:26 jupiter kernel: eth0: tx done 232
    Jan 21 16:10:26 jupiter kernel: sky2 eth0: rx slot 160 status 0x3c0100 len 60
    Jan 21 16:10:26 jupiter kernel: eth0: tx queued, slot 236, len 5894
    Jan 21 16:10:26 jupiter kernel: eth0: tx done 235
    Jan 21 16:10:26 jupiter kernel: sky2 eth0: rx slot 161 status 0x3c0100 len 60
    Jan 21 16:10:26 jupiter kernel: eth0: tx queued, slot 240, len 5894
    Jan 21 16:10:26 jupiter kernel: eth0: tx done 239
    Jan 21 16:10:27 jupiter kernel: eth0: tx queued, slot 243, len 1506
    Jan 21 16:10:27 jupiter kernel: eth0: tx queued, slot 245, len 1226
    Jan 21 16:10:27 jupiter kernel: eth0: tx queued, slot 248, len 1514
    Jan 21 16:10:27 jupiter kernel: eth0: tx queued, slot 250, len 1462
    Jan 21 16:10:27 jupiter kernel: eth0: tx queued, slot 253, len 1514
    Jan 21 16:10:27 jupiter kernel: eth0: tx queued, slot 256, len 86
    Jan 21 16:10:27 jupiter kernel: eth0: tx queued, slot 258, len 1506
    Jan 21 16:10:27 jupiter kernel: eth0: tx queued, slot 260, len 1514
    Jan 21 16:10:27 jupiter kernel: eth0: tx queued, slot 262, len 1462
    Jan 21 16:10:28 jupiter kernel: eth0: tx queued, slot 265, len 1506
    Jan 21 16:10:28 jupiter kernel: eth0: tx queued, slot 267, len 1514
    Jan 21 16:10:28 jupiter kernel: eth0: tx queued, slot 269, len 86
    Jan 21 16:10:29 jupiter kernel: eth0: tx queued, slot 271, len 1462
    Jan 21 16:10:29 jupiter kernel: eth0: tx queued, slot 274, len 60
    Jan 21 16:10:29 jupiter kernel: eth0: tx queued, slot 276, len 60
    Jan 21 16:10:29 jupiter kernel: eth0: tx queued, slot 278, len 1506
    Jan 21 16:10:30 jupiter kernel: eth0: tx queued, slot 280, len 1514
    Jan 21 16:10:30 jupiter kernel: eth0: tx queued, slot 282, len 86
    Jan 21 16:10:31 jupiter kernel: eth0: tx queued, slot 284, len 75
    Jan 21 16:10:32 jupiter kernel: eth0: tx queued, slot 286, len 1462
    Jan 21 16:10:33 jupiter kernel: eth0: tx queued, slot 290, len 1506
    Jan 21 16:10:33 jupiter kernel: eth0: tx queued, slot 292, len 1514
    Jan 21 16:10:34 jupiter kernel: eth0: tx queued, slot 294, len 86
    Jan 21 16:10:36 jupiter kernel: eth0: tx queued, slot 296, len 75
    Jan 21 16:10:36 jupiter kernel: eth0: tx queued, slot 298, len 75
    Jan 21 16:10:37 jupiter kernel: eth0: tx queued, slot 299, len 66
    Jan 21 16:10:37 jupiter kernel: eth0: tx queued, slot 302, len 66
    Jan 21 16:10:38 jupiter kernel: eth0: tx queued, slot 304, len 1462
    Jan 21 16:10:40 jupiter kernel: eth0: tx queued, slot 307, len 1506
    Jan 21 16:10:40 jupiter kernel: eth0: tx queued, slot 309, len 1514
    Jan 21 16:10:41 jupiter kernel: eth0: tx queued, slot 311, len 79
    Jan 21 16:10:41 jupiter kernel: eth0: tx queued, slot 313, len 75
    Jan 21 16:10:42 jupiter kernel: eth0: tx queued, slot 314, len 86
    Jan 21 16:10:46 jupiter kernel: eth0: tx queued, slot 317, len 79
    Jan 21 16:10:46 jupiter kernel: eth0: tx queued, slot 319, len 79
    Jan 21 16:10:47 jupiter kernel: eth0: tx queued, slot 320, len 75
    Jan 21 16:10:50 jupiter kernel: eth0: tx queued, slot 321, len 74
    Jan 21 16:10:50 jupiter kernel: eth0: tx queued, slot 322, len 98
    Jan 21 16:10:51 jupiter kernel: eth0: tx queued, slot 323, len 1462
    Jan 21 16:10:51 jupiter kernel: eth0: tx queued, slot 327, len 75
    Jan 21 16:10:52 jupiter kernel: eth0: tx queued, slot 329, len 75
    Jan 21 16:10:52 jupiter kernel: eth0: tx queued, slot 330, len 75
    Jan 21 16:10:53 jupiter kernel: eth0: tx queued, slot 331, len 74
    Jan 21 16:10:53 jupiter kernel: eth0: tx queued, slot 332, len 1506
    Jan 21 16:10:54 jupiter kernel: eth0: tx queued, slot 335, len 66
    Jan 21 16:10:54 jupiter kernel: eth0: tx queued, slot 337, len 1514
    Jan 21 16:10:55 jupiter kernel: eth0: tx queued, slot 339, len 66
    Jan 21 16:10:55 jupiter kernel: eth0: tx queued, slot 340, len 77
    Jan 21 16:10:55 jupiter kernel: eth0: tx queued, slot 342, len 66

    jupiter:~ # dmesg|grep sky2
    sky2 v1.10 addr 0xe9000000 irq 169 Yukon-EC (0xb6) rev 2
    sky2 eth0: addr 00:16:e6:84:7b:57
    sky2 eth0: enabling interface
    sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control none

I have the same problem with the Marvell 88E8053 sky2 driver when running Gentoo amd64 with a vanilla 2.6.20 kernel.
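In the openSUSE debug log above, the last `tx done` and `rx slot` entries are logged at 16:10:26, while `tx queued` entries keep appearing until 16:10:55. That pattern is consistent with the driver continuing to queue transmits while no completions come back, though the log alone does not say why. The following is a minimal Python sketch of that check on a saved log excerpt; `tx_stall_summary` is a hypothetical helper, not part of sky2, and its regexes assume exactly the three message formats shown in this log.

```python
import re
from typing import Iterable, NamedTuple, Optional

# Message shapes as they appear in the sky2 debug output in this report
# (assumption: only these three matter for spotting a stall).
_TX_QUEUED = re.compile(r"eth\d+: tx queued, slot (\d+), len (\d+)")
_TX_DONE = re.compile(r"eth\d+: tx done (\d+)")
_RX_SLOT = re.compile(r"eth\d+: rx slot (\d+) ")
_TIME = re.compile(r"\d\d:\d\d:\d\d")


class StallSummary(NamedTuple):
    last_tx_done_time: Optional[str]
    last_rx_time: Optional[str]
    queued_after_last_done: int


def tx_stall_summary(lines: Iterable[str]) -> StallSummary:
    """Report when the last tx completion and rx were logged, and how many
    transmits were queued after the last tx completion."""
    last_done: Optional[str] = None
    last_rx: Optional[str] = None
    queued_after = 0
    for line in lines:
        t = _TIME.search(line)
        stamp = t.group(0) if t else None
        if _TX_DONE.search(line):
            last_done = stamp
            queued_after = 0  # completions still arriving, restart the count
        elif _RX_SLOT.search(line):
            last_rx = stamp
        elif _TX_QUEUED.search(line):
            queued_after += 1
    return StallSummary(last_done, last_rx, queued_after)


# A few lines copied from the log above.
SAMPLE = [
    "Jan 21 16:10:26 jupiter kernel: sky2 eth0: rx slot 161 status 0x3c0100 len 60",
    "Jan 21 16:10:26 jupiter kernel: eth0: tx queued, slot 240, len 5894",
    "Jan 21 16:10:26 jupiter kernel: eth0: tx done 239",
    "Jan 21 16:10:27 jupiter kernel: eth0: tx queued, slot 243, len 1506",
    "Jan 21 16:10:27 jupiter kernel: eth0: tx queued, slot 245, len 1226",
    "Jan 21 16:10:55 jupiter kernel: eth0: tx queued, slot 342, len 66",
]

if __name__ == "__main__":
    s = tx_stall_summary(SAMPLE)
    print(s.last_tx_done_time, s.last_rx_time, s.queued_after_last_done)
```

On this excerpt it prints `16:10:26 16:10:26 3`. Fed the whole /var/log/messages excerpt, a large `queued_after_last_done` next to an old `last_tx_done_time` is the signature to look for; the counter is reset on every `tx done` so that normal interleaved traffic is not reported as a stall.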