Bug 7647

Summary: sky2 88E8053 goes down under heavy traffic
Product: Drivers
Reporter: Ing. Petr Dvoracek (petr)
Component: Network
Assignee: Stephen Hemminger (stephen)
Status: REJECTED DUPLICATE
Severity: high
CC: genneth, peter, stephen
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.19
Subsystem:
Regression: ---
Bisected commit-id:
Attachments: Ethtool dump of eth0 after error

Description Ing. Petr Dvoracek 2006-12-06 23:37:01 UTC
Most recent kernel where this bug did *NOT* occur:
2.6.15 with yukon official drivers

Distribution: gentoo

Hardware Environment:
00:00.0 Host bridge: Intel Corporation 945G/GZ/P/PL Express Memory Controller
Hub (rev 02)
00:02.0 VGA compatible controller: Intel Corporation 945G/GZ Express Integrated
Graphics Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1
(rev 01)
00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 2
(rev 01)
00:1c.2 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 3
(rev 01)
00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 4
(rev 01)
00:1c.4 PCI bridge: Intel Corporation 82801GR/GH/GHM (ICH7 Family) PCI Express
Port 5 (rev 01)
00:1c.5 PCI bridge: Intel Corporation 82801GR/GH/GHM (ICH7 Family) PCI Express
Port 6 (rev 01)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #3 (rev 01)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #4 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI
Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface
Bridge (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller
(rev 01)
00:1f.2 IDE interface: Intel Corporation 82801GB/GR/GH (ICH7 Family) Serial ATA
Storage Controller IDE (rev 01)
00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01)
01:03.0 Mass storage controller: Integrated Technology Express, Inc. ITE 8211F
Single Channel UDMA 133 (ASUS 8211 (ITE IT8212 ATA RAID Controller)) (rev 11)
05:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit
Ethernet Controller (rev 19)

Software Environment:
gentoo
mysql
proftpd

Problem Description:
Under heavy traffic with small packets, the 88E8053 PCI-E network adapter goes
down. I have to restart the system.


Steps to reproduce:
Send a large number of small packets through the network adapter.
Comment 1 Gen Zhang 2006-12-11 08:01:01 UTC
I'm on a Mac Mini -- this seems to have started happening only after a recent
firmware update from Apple to EFI version 1.1, which contained various changes
to the BIOS emulation layer (not sure if that's relevant here).

Distribution: Fedora Core 6

lspci output:
00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT
Express Memory Controller Hub (rev 03)
00:02.0 VGA compatible controller: Intel Corporation Mobile 945GM/GMS/940GML
Express Integrated Graphics Controller (rev 03)
00:07.0 Performance counters: Intel Corporation Unknown device 27a3 (rev 03)
00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition
Audio Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1
(rev 02)
00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 2
(rev 02)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #3 (rev 02)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #4 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI
Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e2)
00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface Bridge
(rev 02)
00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller
(rev 02)
00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family) Serial ATA
Storage Controller IDE (rev 02)
00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 02)
01:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit
Ethernet Controller (rev 22)
02:00.0 Ethernet controller: Atheros Communications, Inc. Unknown device 001c
(rev 01)
03:03.0 FireWire (IEEE 1394): Agere Systems FW323 (rev 61)

In detail:
01:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit
Ethernet Controller (rev 22)
        Subsystem: Marvell Technology Group Ltd. Marvell RDK-8053
        Flags: bus master, fast devsel, latency 0, IRQ 233
        Memory at 90200000 (64-bit, non-prefetchable) [size=16K]
        I/O ports at 1000 [size=256]
        Expansion ROM at 50000000 [disabled] [size=128K]
        Capabilities: [48] Power Management version 2
        Capabilities: [50] Vital Product Data
        Capabilities: [5c] Message Signalled Interrupts: 64bit+ Queue=0/1 Enable+
        Capabilities: [e0] Express Legacy Endpoint IRQ 0

Comment 2 Stephen Hemminger 2006-12-14 11:28:37 UTC
These are the things you need to debug a sky2 related problem.

1) What is exact kernel version in use?  This is important because
   problems get fixed but it can be a long while until the fix bubbles down
   to the vendor kernels.

2) What is the chip version?  The driver prints this out on boot up in
   the console log.   (dmesg | grep sky2)
   This matters because each chip version has different
   bugs to deal with.

3) Does it work with the vendor driver?
   The vendor driver does a number of things differently than the sky2 driver
   and can mask problems, but if it doesn't work as well that is a useful
   data point.  If you want to know why the sky2 driver was written instead
   of just using the vendor driver, look at the code. The sk98lin driver
   is huge, includes features that are unsupportable and broken, and has
   locking mistakes.  But sk98lin also has a watchdog that masks bugs and
   may provide useful insight.

4) What is the IRQ routing?
   There are two issues here. First, the driver will never work with
   edge-triggered IRQs. Second, some motherboards have broken BIOSes and
   chipsets that don't do MSI properly. A couple of module parameters are
   available to help:
      disable_msi=1      avoids using MSI
      idle_timeout=10    polls for lost IRQs every 10ms

5) What are the messages in the console log when problem happens?

6) Are you running any of the following: bonding, vlans, bridging,
   netfilter, traffic control?

7) Please get a current version of ethtool from:
   git://git.kernel.org/pub/scm/network/ethtool/ethtool.git
   and run ethtool register dump after a problem occurs:
      ethtool -d eth0
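
The read-only parts of this checklist can be gathered in one pass. The following is a minimal sketch, not part of the sky2 tooling: the function names are hypothetical, and reading /proc/interrupts to see the IRQ routing for item 4 is an assumption about what is wanted there. It only wraps commands already named above (dmesg, ethtool -d) plus uname -r for the kernel version.

```python
import subprocess


def diagnostic_commands(iface="eth0"):
    """Return argv lists for the read-only commands in the checklist."""
    return [
        ["uname", "-r"],              # item 1: exact kernel version
        ["dmesg"],                    # items 2 and 5: driver banner, console log
        ["cat", "/proc/interrupts"],  # item 4: IRQ routing (assumed source)
        ["ethtool", "-d", iface],     # item 7: register dump
    ]


def grep_lines(text, needle):
    """Keep only the lines containing needle, like `grep needle`."""
    return [line for line in text.splitlines() if needle in line]


def collect(iface="eth0"):
    """Run each command; record failures in the report instead of raising."""
    report = {}
    for argv in diagnostic_commands(iface):
        key = " ".join(argv)
        try:
            done = subprocess.run(argv, capture_output=True, text=True, timeout=30)
            report[key] = done.stdout if done.returncode == 0 else done.stderr
        except (OSError, subprocess.TimeoutExpired) as exc:
            report[key] = f"failed: {exc}"
    return report
```

Run collect("eth0") as root right after the problem occurs, then use grep_lines(report["dmesg"], "sky2") to pull out the driver lines. Items 3 and 6 still need a manual answer.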

Comment 3 Gen Zhang 2006-12-15 10:21:19 UTC
1) Fedora Core 6, 2.6.18-1.2849.fc6. It is based off 2.6.18.2, but with a few
patches. I believe that the FC people tend to try and cooperate with upstream?

2) Relevant line from dmesg | grep sky2:
sky2 v1.5 addr 0x90200000 irq 177 Yukon-EC (0xb6) rev 2

3) I'm on a mac mini -- and I don't know what the vendor driver is. I do not
have a sk98lin driver on my system.

4) I have tried the module parameters -- the problem can still be triggered.

5) No dmesg output. The first time that it occurs (from a fresh boot) usually
complains of either an rx error or a tx error. Not very helpful, unfortunately.
Is there any chance of a more verbose output?

6) None.

7) See attachment below.

Notes: I've found that using GTK2 applications over an ssh tunnel seems to be
able to reliably trigger the error.
Comment 4 Gen Zhang 2006-12-15 10:23:54 UTC
Created attachment 9830
Ethtool dump of eth0 after error
Comment 5 Gen Zhang 2007-01-09 02:38:23 UTC
This is still present in FC6 kernel 2.6.18-1.2869, which is based on 2.6.18.6
Comment 6 Alexander Grimm 2007-01-21 07:54:08 UTC
I have the same problem on openSUSE 10.2 with a Gigabyte DQ6 mainboard and
onboard Marvell 88E8053. Unloading and reloading the driver module fixes the
problem.

uname -a:
Linux jupiter 2.6.18.2-34-default #1 SMP Mon Nov 27 11:46:27 UTC 2006 i686 i686
i386 GNU/Linux
lspci:
00:00.0 Host bridge: Intel Corporation 82P965/G965 Memory Controller Hub (rev 02)
00:01.0 PCI bridge: Intel Corporation 82P965/G965 PCI Express Root Port (rev 02)
00:1a.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI #4 (rev 02)
00:1a.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI #5 (rev 02)
00:1a.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI #2 (rev 02)
00:1b.0 Audio device: Intel Corporation 82801H (ICH8 Family) HD Audio Controller
(rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 1
(rev 02)
00:1c.4 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 5
(rev 02)
00:1c.5 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 6
(rev 02)
00:1d.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI #3 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev f2)
00:1f.0 ISA bridge: Intel Corporation 82801HB/HR (ICH8/R) LPC Interface
Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation 82801HR/HO/HH (ICH8R/DO/DH) 6 port
SATA AHCI Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801H (ICH8 Family) SMBus Controller (rev 02)
01:00.0 VGA compatible controller: ATI Technologies Inc RV530 [Radeon X1600]
01:00.1 Display controller: ATI Technologies Inc RV530 [Radeon X1600] (Secondary)
03:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit
Ethernet Controller (rev 22)
04:00.0 SATA controller: JMicron Technologies, Inc. JMicron 20360/20363 AHCI
Controller (rev 02)
04:00.1 IDE interface: JMicron Technologies, Inc. JMicron 20360/20363 AHCI
Controller (rev 02)
05:00.0 Multimedia video controller: Internext Compression Inc iTVC16 (CX23416)
MPEG-2 Encoder (rev 01)
05:01.0 Multimedia controller: Philips Semiconductors SAA7146 (rev 01)
05:06.0 FireWire (IEEE 1394): Texas Instruments TSB43AB23 IEEE-1394a-2000
Controller (PHY/Link)

I tried the module parameters, but they didn't help:
disable_msi=1      avoids using MSI
idle_timeout=10    polls for lost IRQs every 10ms

I set sky2 to maximum debug output. The messages in /var/log/messages before the error:
Jan 21 16:10:26 jupiter kernel: eth0: tx done 224
Jan 21 16:10:26 jupiter kernel: eth0: tx done 226
Jan 21 16:10:26 jupiter kernel: sky2 eth0: rx slot 158 status 0x3c0100 len 60
Jan 21 16:10:26 jupiter kernel: eth0: tx queued, slot 230, len 4434
Jan 21 16:10:26 jupiter kernel: eth0: tx done 229
Jan 21 16:10:26 jupiter kernel: sky2 eth0: rx slot 159 status 0x3c0100 len 60
Jan 21 16:10:26 jupiter kernel: eth0: tx queued, slot 233, len 4434
Jan 21 16:10:26 jupiter kernel: eth0: tx done 232
Jan 21 16:10:26 jupiter kernel: sky2 eth0: rx slot 160 status 0x3c0100 len 60
Jan 21 16:10:26 jupiter kernel: eth0: tx queued, slot 236, len 5894
Jan 21 16:10:26 jupiter kernel: eth0: tx done 235
Jan 21 16:10:26 jupiter kernel: sky2 eth0: rx slot 161 status 0x3c0100 len 60
Jan 21 16:10:26 jupiter kernel: eth0: tx queued, slot 240, len 5894
Jan 21 16:10:26 jupiter kernel: eth0: tx done 239
Jan 21 16:10:27 jupiter kernel: eth0: tx queued, slot 243, len 1506
Jan 21 16:10:27 jupiter kernel: eth0: tx queued, slot 245, len 1226
Jan 21 16:10:27 jupiter kernel: eth0: tx queued, slot 248, len 1514
Jan 21 16:10:27 jupiter kernel: eth0: tx queued, slot 250, len 1462
Jan 21 16:10:27 jupiter kernel: eth0: tx queued, slot 253, len 1514
Jan 21 16:10:27 jupiter kernel: eth0: tx queued, slot 256, len 86
Jan 21 16:10:27 jupiter kernel: eth0: tx queued, slot 258, len 1506
Jan 21 16:10:27 jupiter kernel: eth0: tx queued, slot 260, len 1514
Jan 21 16:10:27 jupiter kernel: eth0: tx queued, slot 262, len 1462
Jan 21 16:10:28 jupiter kernel: eth0: tx queued, slot 265, len 1506
Jan 21 16:10:28 jupiter kernel: eth0: tx queued, slot 267, len 1514
Jan 21 16:10:28 jupiter kernel: eth0: tx queued, slot 269, len 86
Jan 21 16:10:29 jupiter kernel: eth0: tx queued, slot 271, len 1462
Jan 21 16:10:29 jupiter kernel: eth0: tx queued, slot 274, len 60
Jan 21 16:10:29 jupiter kernel: eth0: tx queued, slot 276, len 60
Jan 21 16:10:29 jupiter kernel: eth0: tx queued, slot 278, len 1506
Jan 21 16:10:30 jupiter kernel: eth0: tx queued, slot 280, len 1514
Jan 21 16:10:30 jupiter kernel: eth0: tx queued, slot 282, len 86
Jan 21 16:10:31 jupiter kernel: eth0: tx queued, slot 284, len 75
Jan 21 16:10:32 jupiter kernel: eth0: tx queued, slot 286, len 1462
Jan 21 16:10:33 jupiter kernel: eth0: tx queued, slot 290, len 1506
Jan 21 16:10:33 jupiter kernel: eth0: tx queued, slot 292, len 1514
Jan 21 16:10:34 jupiter kernel: eth0: tx queued, slot 294, len 86
Jan 21 16:10:36 jupiter kernel: eth0: tx queued, slot 296, len 75
Jan 21 16:10:36 jupiter kernel: eth0: tx queued, slot 298, len 75
Jan 21 16:10:37 jupiter kernel: eth0: tx queued, slot 299, len 66
Jan 21 16:10:37 jupiter kernel: eth0: tx queued, slot 302, len 66
Jan 21 16:10:38 jupiter kernel: eth0: tx queued, slot 304, len 1462
Jan 21 16:10:40 jupiter kernel: eth0: tx queued, slot 307, len 1506
Jan 21 16:10:40 jupiter kernel: eth0: tx queued, slot 309, len 1514
Jan 21 16:10:41 jupiter kernel: eth0: tx queued, slot 311, len 79
Jan 21 16:10:41 jupiter kernel: eth0: tx queued, slot 313, len 75
Jan 21 16:10:42 jupiter kernel: eth0: tx queued, slot 314, len 86
Jan 21 16:10:46 jupiter kernel: eth0: tx queued, slot 317, len 79
Jan 21 16:10:46 jupiter kernel: eth0: tx queued, slot 319, len 79
Jan 21 16:10:47 jupiter kernel: eth0: tx queued, slot 320, len 75
Jan 21 16:10:50 jupiter kernel: eth0: tx queued, slot 321, len 74
Jan 21 16:10:50 jupiter kernel: eth0: tx queued, slot 322, len 98
Jan 21 16:10:51 jupiter kernel: eth0: tx queued, slot 323, len 1462
Jan 21 16:10:51 jupiter kernel: eth0: tx queued, slot 327, len 75
Jan 21 16:10:52 jupiter kernel: eth0: tx queued, slot 329, len 75
Jan 21 16:10:52 jupiter kernel: eth0: tx queued, slot 330, len 75
Jan 21 16:10:53 jupiter kernel: eth0: tx queued, slot 331, len 74
Jan 21 16:10:53 jupiter kernel: eth0: tx queued, slot 332, len 1506
Jan 21 16:10:54 jupiter kernel: eth0: tx queued, slot 335, len 66
Jan 21 16:10:54 jupiter kernel: eth0: tx queued, slot 337, len 1514
Jan 21 16:10:55 jupiter kernel: eth0: tx queued, slot 339, len 66
Jan 21 16:10:55 jupiter kernel: eth0: tx queued, slot 340, len 77
Jan 21 16:10:55 jupiter kernel: eth0: tx queued, slot 342, len 66
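
In the log above, "tx done" messages stop after slot 239 while "tx queued" messages continue up to slot 342, which suggests transmit completions are no longer being processed. A minimal sketch for spotting this pattern in a longer log follows. It assumes the two message formats shown above, and the function name is hypothetical.

```python
import re

# Message formats as they appear in the debug log above.
TX_QUEUED = re.compile(r"tx queued, slot (\d+), len (\d+)")
TX_DONE = re.compile(r"tx done (\d+)")


def tx_stall_summary(lines):
    """Return (last 'tx done' slot, number of 'tx queued' lines after it)."""
    last_done = None
    queued_since_done = 0
    for line in lines:
        done = TX_DONE.search(line)
        if done:
            # A completion was logged: remember it and restart the count.
            last_done = int(done.group(1))
            queued_since_done = 0
        elif TX_QUEUED.search(line):
            queued_since_done += 1
    return last_done, queued_since_done


sample = [
    "Jan 21 16:10:26 jupiter kernel: eth0: tx queued, slot 240, len 5894",
    "Jan 21 16:10:26 jupiter kernel: eth0: tx done 239",
    "Jan 21 16:10:27 jupiter kernel: eth0: tx queued, slot 243, len 1506",
    "Jan 21 16:10:27 jupiter kernel: eth0: tx queued, slot 245, len 1226",
]
print(tx_stall_summary(sample))  # (239, 2)
```

A count that keeps growing with no new "tx done" is only a hint of a stalled transmit path, not a diagnosis.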

jupiter:~ # dmesg|grep sky2
sky2 v1.10 addr 0xe9000000 irq 169 Yukon-EC (0xb6) rev 2
sky2 eth0: addr 00:16:e6:84:7b:57
sky2 eth0: enabling interface
sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control none
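
The first line of this output carries the driver version, chip name and chip revision asked for earlier in this report. When comparing several reports, these fields can be pulled out mechanically. The sketch below assumes the banner format shown in this report (driver v1.5 and v1.10), and other driver versions may print it differently. The function name is hypothetical.

```python
import re

# Matches e.g. "sky2 v1.10 addr 0xe9000000 irq 169 Yukon-EC (0xb6) rev 2"
BANNER = re.compile(
    r"sky2 v(?P<driver>[\d.]+) addr (?P<addr>0x[0-9a-fA-F]+) irq (?P<irq>\d+) "
    r"(?P<chip>\S+) \((?P<chip_id>0x[0-9a-fA-F]+)\) rev (?P<rev>\d+)"
)


def parse_sky2_banner(line):
    """Return the banner fields as a dict, or None if the line is not a banner."""
    match = BANNER.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["irq"] = int(info["irq"])
    info["rev"] = int(info["rev"])
    return info


info = parse_sky2_banner("sky2 v1.10 addr 0xe9000000 irq 169 Yukon-EC (0xb6) rev 2")
print(info["driver"], info["chip"], info["chip_id"], info["rev"])
```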
Comment 7 Peter Kerwien 2007-02-11 03:16:23 UTC
I have the same problem with Marvell 88E8053 sky2 driver when running Gentoo
amd64 with a vanilla 2.6.20 kernel.
Comment 8 Stephen Hemminger 2007-02-13 11:40:59 UTC
This looks like the same problem as another bug. Transmit timeout
followed by crash.

*** This bug has been marked as a duplicate of 7546 ***