Bug 7617 - sky2 driver crashes
Summary: sky2 driver crashes
Status: RESOLVED PATCH_ALREADY_AVAILABLE
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: i386 Linux
Importance: P2 high
Assignee: Stephen Hemminger
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-12-02 00:22 UTC by Ziga Mlinar
Modified: 2007-12-07 13:57 UTC

See Also:
Kernel Version: 2.6.19-rc6-mm2
Subsystem:
Regression: ---
Bisected commit-id:



Description Ziga Mlinar 2006-12-02 00:22:44 UTC
Most recent kernel where this bug did *NOT* occur: do not remember any more
Distribution: Gentoo
Hardware Environment: amd64, lspci follows at the end of post
Software Environment: gcc version 4.1.1 (Gentoo 4.1.1-r3)
Problem Description: 

The sky2 driver starts fine, but after some time, or under heavy load, it crashes. 
Sometimes I can use 

rmmod sky2
modprobe sky2

to bring it back, but this works only a few times. Usually the network 
becomes slow first, then dies. I have two NICs: Marvell (sky2) and 
nVidia (forcedeth). The problem occurs whether the Marvell is used for the 
internet and the nVidia for the local network, or vice versa.
I tried compiling the kernel with and without "Optimize for size". The network 
dies either way. 

I had high hopes for these kernels (2.6.19-*) after reading the change 
logs. Good work guys, but my card still doesn't work.

Here are the outputs from lspci 

02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 19)
        Subsystem: Giga-byte Technology Marvell 88E8053 Gigabit Ethernet Controller (Gigabyte)
        Flags: fast devsel, IRQ 17
        Memory at f1000000 (64-bit, non-prefetchable) [size=16K]
        I/O ports at b000 [size=256]
        [virtual] Expansion ROM at 50000000 [disabled] [size=128K]
        Capabilities: [48] Power Management version 2
        Capabilities: [50] Vital Product Data
        Capabilities: [5c] Message Signalled Interrupts: 64bit+ Queue=0/1 Enable-
        Capabilities: [e0] Express Legacy Endpoint IRQ 0
        Capabilities: [100] Advanced Error Reporting

00:0a.0 Bridge: nVidia Corporation CK804 Ethernet Controller (rev a3)
        Subsystem: Giga-byte Technology GA-K8N Ultra-9 Mainboard
        Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 20
        Memory at f2101000 (32-bit, non-prefetchable) [size=4K]
        I/O ports at e400 [size=8]
        Capabilities: [44] Power Management version 2




Steps to reproduce: Turn on the network. It dies sooner under heavy load (rsync, 
bittorrent), especially when both NICs are in use.
Comment 1 Ziga Mlinar 2006-12-02 00:34:50 UTC
The system keeps working afterwards; only the network doesn't. Sometimes both NICs die, 
sometimes only the Marvell dies, and sometimes the Marvell dies and the nVidia slows down.
Comment 2 Ziga Mlinar 2006-12-02 00:46:50 UTC
As far as I remember, it has never worked with any kernel I tried: 
gentoo-sources, any mm, and other experimental kernels (many mm-based).

With 2.6.19-rc6-mm2 it looked promising at the beginning, but after a day it 
was the same. I just can't get the Marvell to work properly. I'm writing this with the nVidia 
(forcedeth) only. After the network crashed I couldn't get online again. I needed a hard 
reset of the machine (or two) to get forcedeth to work again.

Sorry for the two additions. I only remembered to add this data afterwards.

Ziga
Comment 3 Stephen Hemminger 2006-12-04 12:13:24 UTC
Your problem is a duplicate of an earlier bug.
It occurs only on the 88E8053 version of the chip.
I don't have that hardware to debug/fix the problem, so resolution will be slow.
You might try the vendor driver, but it has other problems.


*** This bug has been marked as a duplicate of 7579 ***
Comment 4 Matthew 2007-09-22 16:19:01 UTC
Since the behavior I'm seeing resembles this bug, I'll post it here:
(hope the CC was right)

I just encountered a hang with 2.6.23-rc7 & sky2:

hardware: Asus P5W DH Deluxe with two Gigabit LAN ports (Marvell Yukon2 88E8053 chipset); one port is connected to a Linksys WRT54GL router running dd-wrt
internet connection-speed: 8 MBit/s up, 0.5 MBit/s down
kernel: 2.6.23-rc7, gcc-4.2.1 hardened, GNU/Gentoo hardened x86 (32bit)

03:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 20)
        Subsystem: ASUSTeK Computer Inc. Marvell 88E8053 Gigabit Ethernet controller PCIe (Asus)
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Memory at ebcfc000 (64-bit, non-prefetchable) [size=16K]
        I/O ports at a800 [size=256]
        Expansion ROM at ebcc0000 [disabled] [size=128K]
        Capabilities: [48] Power Management version 2
        Capabilities: [50] Vital Product Data
        Capabilities: [5c] Message Signalled Interrupts: Mask- 64bit+ Queue=0/1 Enable-
        Capabilities: [e0] Express Legacy Endpoint IRQ 0

04:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 20)
        Subsystem: ASUSTeK Computer Inc. Marvell 88E8053 Gigabit Ethernet controller PCIe (Asus)
        Flags: bus master, fast devsel, latency 0, IRQ 17
        Memory at ebdfc000 (64-bit, non-prefetchable) [size=16K]
        I/O ports at b800 [size=256]
        Expansion ROM at ebdc0000 [disabled] [size=128K]
        Capabilities: [48] Power Management version 2
        Capabilities: [50] Vital Product Data
        Capabilities: [5c] Message Signalled Interrupts: Mask- 64bit+ Queue=0/1 Enable-
        Capabilities: [e0] Express Legacy Endpoint IRQ 0


steps taken:
ethtool -s eth4 autoneg off speed 100
ethtool -s eth4 wol d
(as recommended on http://www.lesswatts.org/tips/index.php to save some energy ;) )

surfed some minutes through the net, started MultiGet (downloadmanager), 
added http://ftp.utcluj.ro/pub/rofreesbie/devel/RoFreeSBIE-1.3_rc4.iso , surfed some more through the net (with constantly 7 MBit/s downloading), then wanted to download some documentation: http://www.rofreesbie.org/GUIDE_USER.pdf#
BAM! everything stood still (network down, no ping possible, not even to the router, ...)

here is the output (I didn't find it right away, so hopefully nothing is missing ;) ):

cat /sys/kernel/debug/sky2/eth4 
IRQ src=0 mask=c000001d control=0
Status ring (empty)
Tx ring pending=498...68 report=498 done=498
498: 0x3640908e(66)
499: 0x3640988e(66)
500: 0x363c64be(66) frag=0x343cd246(680)
502: 0x364094be(66) frag=0x343cd246(680)
504: 0x3630c48e(66)
505: 0x3630ca8e(66)
506: 0x36a2c68e(66)
507: 0x363c6c8e(66)
508: 0x363c6a8e(66)
509: 0x366eee8e(66)
510: 0x366a8e86(74)
511: 0x35dd9486(74)
0: 0x36708286(74)
1: 0x37091e8e(66)
2: 0x37091c8e(66)
3: 0x377fe28e(66)
4: 0x35dd968e(66)
5: 0x37f2c68e(66)
6: 0x37f2ca8e(66)
7: 0x35dd908e(66)
8: 0x366b288e(66)
9: 0x3670808e(66)
10: 0x36a02c8e(66)
11: 0x3670888e(66)
12: 0x36310e8e(66)
13: 0x36025a8e(66)
14: 0x3673508e(66)
15: 0x36a0248e(66)
16: 0x366a808e(66)
17: 0x3630ae9a(54)
18: 0x3630a29a(54)
19: csum=0x220028 0x352ebc02(73)
21: 0x3630a802(42)
22: 0x3630a002(42)
23: 0x36116002(42)
24: 0x352eba02(73)
25: 0x352eb402(42)
26: 0x352eb002(42)
27: 0x352ebe02(42)
28: 0x352eb802(42)
29: 0x352eb602(42)
30: 0x36116602(42)
31: 0x358a2a02(42)
32: 0x352eb202(42)
33: 0x33cc2e02(42)
34: 0x33cc2002(42)
35: 0x33cc2402(42)
36: 0x33cc2802(42)
37: 0x33cc2602(42)
38: 0x33cc2202(42)
39: 0x33cc2c02(42)
40: 0x33cda202(42)
41: 0x33cdac02(42)
42: 0x33cda602(42)
43: 0x33cdaa02(42)
44: 0x33cda402(42)
45: 0x36a02a02(42)
46: 0x33cda802(42)
47: 0x33cdae02(42)
48: 0x358a2802(42)
49: 0x33c7f202(42)
50: 0x33c7f402(42)
51: 0x33c7fc02(42)
52: 0x33c7fe02(42)
53: 0x33c7f002(42)
54: 0x33c7f802(42)
55: 0x33c7f602(42)
56: 0x3513d602(42)
57: 0x3513dc02(42)
58: 0x3513d202(42)
59: 0x3513d402(42)
60: 0x3513d802(42)
61: 0x3513d002(42)
62: 0x33c7fa02(42)
63: 0x33cda002(42)
64: 0x34154e02(42)
65: 0x34154002(42)
66: 0x34154c02(42)
67: 0x33aefc02(42)

Rx ring hw get=828 put=988 last=1023

after that I pinged 192.168.1.1 a few times and tried to reach some sites, but got no response:

cat /sys/kernel/debug/sky2/eth4 
IRQ src=0 mask=c000001d control=0
Status ring (empty)
Tx ring pending=498...145 report=498 done=498
498: 0x3640908e(66)
499: 0x3640988e(66)
500: 0x363c64be(66) frag=0x343cd246(680)
502: 0x364094be(66) frag=0x343cd246(680)
504: 0x3630c48e(66)
505: 0x3630ca8e(66)
506: 0x36a2c68e(66)
507: 0x363c6c8e(66)
508: 0x363c6a8e(66)
509: 0x366eee8e(66)
510: 0x366a8e86(74)
511: 0x35dd9486(74)
0: 0x36708286(74)
1: 0x37091e8e(66)
2: 0x37091c8e(66)
3: 0x377fe28e(66)
4: 0x35dd968e(66)
5: 0x37f2c68e(66)
6: 0x37f2ca8e(66)
7: 0x35dd908e(66)
8: 0x366b288e(66)
9: 0x3670808e(66)
10: 0x36a02c8e(66)
11: 0x3670888e(66)
12: 0x36310e8e(66)
13: 0x36025a8e(66)
14: 0x3673508e(66)
15: 0x36a0248e(66)
16: 0x366a808e(66)
17: 0x3630ae9a(54)
18: 0x3630a29a(54)
19: csum=0x220028 0x352ebc02(73)
21: 0x3630a802(42)
22: 0x3630a002(42)
23: 0x36116002(42)
24: 0x352eba02(73)
25: 0x352eb402(42)
26: 0x352eb002(42)
27: 0x352ebe02(42)
28: 0x352eb802(42)
29: 0x352eb602(42)
30: 0x36116602(42)
31: 0x358a2a02(42)
32: 0x352eb202(42)
33: 0x33cc2e02(42)
34: 0x33cc2002(42)
35: 0x33cc2402(42)
36: 0x33cc2802(42)
37: 0x33cc2602(42)
38: 0x33cc2202(42)
39: 0x33cc2c02(42)
40: 0x33cda202(42)
41: 0x33cdac02(42)
42: 0x33cda602(42)
43: 0x33cdaa02(42)
44: 0x33cda402(42)
45: 0x36a02a02(42)
46: 0x33cda802(42)
47: 0x33cdae02(42)
48: 0x358a2802(42)
49: 0x33c7f202(42)
50: 0x33c7f402(42)
51: 0x33c7fc02(42)
52: 0x33c7fe02(42)
53: 0x33c7f002(42)
54: 0x33c7f802(42)
55: 0x33c7f602(42)
56: 0x3513d602(42)
57: 0x3513dc02(42)
58: 0x3513d202(42)
59: 0x3513d402(42)
60: 0x3513d802(42)
61: 0x3513d002(42)
62: 0x33c7fa02(42)
63: 0x33cda002(42)
64: 0x34154e02(42)
65: 0x34154002(42)
66: 0x34154c02(42)
67: 0x33aefc02(42)
68: 0x33aef802(42)
69: 0x33aef602(42)
70: 0x33aefa02(42)
71: 0x34154602(42)
72: 0x34154202(42)
73: 0x33aef202(42)
74: 0x33aefe02(42)
75: 0x3630a602(42)
76: 0x33d97202(42)
77: 0x34154802(42)
78: 0x33d97802(42)
79: 0x33ade202(42)
80: 0x33ade002(42)
81: 0x33ade802(42)
82: 0x33adee02(42)
83: 0x3340a402(42)
84: 0x3340a802(42)
85: 0x33465a02(42)
86: 0x33465402(42)
87: 0x33465602(42)
88: 0x33465e02(42)
89: 0x3350a602(42)
90: 0x3340a202(42)
91: 0x33465802(42)
92: 0x3350a402(42)
93: 0x3350ac02(42)
94: 0x33414402(42)
95: 0x33414202(42)
96: 0x33465002(42)
97: 0x3350a002(42)
98: 0x335c3402(42)
99: 0x335c3e02(42)
100: 0x33792002(42)
101: 0x33646602(42)
102: 0x33646a02(42)
103: 0x3306d802(42)
104: 0x3306da02(42)
105: 0x3306d002(42)
106: 0x330aaa02(42)
107: 0x330aae02(42)
108: 0x330aa002(42)
109: 0x33646e02(42)
110: 0x3350ae02(42)
111: 0x330aac02(42)
112: 0x33792202(42)
113: 0x330dac02(42)
114: 0x330da202(42)
115: 0x330dae02(42)
116: 0x330da802(42)
117: 0x330da402(42)
118: 0x330ed802(42)
119: 0x330ed202(42)
120: 0x330ede02(42)
121: 0x330e9c02(42)
122: 0x330e9002(42)
123: 0x330cf802(42)
124: 0x330eda02(42)
125: 0x330edc02(42)
126: 0x335c3c02(42)
127: 0x330da002(42)
128: 0x330daa02(42)
129: 0x335c3802(42)
130: 0x330ed602(42)
131: 0x3306d402(42)
132: 0x330e9202(42)
133: 0x330e9e02(42)
134: 0x330e9802(42)
135: 0x330da602(42)
136: 0x330e9402(42)
137: 0x330ed002(42)
138: 0x330cfa02(42)
139: 0x330cf402(42)
140: 0x33646202(42)
141: 0x33792c02(42)
142: 0x33112a02(42)
143: 0x330cfc02(42)
144: 0x330e9a02(42)

Rx ring hw get=828 put=988 last=1023
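
The two dumps above can be compared mechanically. The following is a minimal sketch, not part of the original report, that extracts the ring pointers from each dump and computes how many Tx slots are queued. It assumes the Tx ring has 512 slots (inferred from the dump, where index 511 is followed by 0) and that the two numbers in "pending=A...B" are the start and end of the queued region. The names summarize and looks_stalled are hypothetical helpers chosen for illustration.

```python
import re

# Assumption: 512 Tx slots, inferred from the dump (index 511 wraps to 0).
TX_RING_SIZE = 512

TX_RE = re.compile(r"Tx ring pending=(\d+)\.\.\.(\d+) report=(\d+) done=(\d+)")
RX_RE = re.compile(r"Rx ring hw get=(\d+) put=(\d+) last=(\d+)")


def summarize(dump: str) -> dict:
    """Extract ring pointers from one debugfs dump and compute the Tx backlog."""
    tx = TX_RE.search(dump)
    rx = RX_RE.search(dump)
    if tx is None or rx is None:
        raise ValueError("dump does not contain the expected Tx/Rx ring lines")
    start, end, _report, done = map(int, tx.groups())
    get, put, _last = map(int, rx.groups())
    return {
        # Modulo arithmetic handles the wrap from slot 511 back to slot 0.
        "tx_backlog": (end - start) % TX_RING_SIZE,
        "tx_done": done,
        "rx_get": get,
        "rx_put": put,
    }


def looks_stalled(before: dict, after: dict) -> bool:
    """Heuristic only: Tx backlog grew while no other pointer moved."""
    return (
        after["tx_backlog"] > before["tx_backlog"]
        and after["tx_done"] == before["tx_done"]
        and after["rx_get"] == before["rx_get"]
        and after["rx_put"] == before["rx_put"]
    )


# Header lines copied from the two dumps in this comment.
first = summarize(
    "Tx ring pending=498...68 report=498 done=498\n"
    "Rx ring hw get=828 put=988 last=1023\n"
)
second = summarize(
    "Tx ring pending=498...145 report=498 done=498\n"
    "Rx ring hw get=828 put=988 last=1023\n"
)
print(first["tx_backlog"], second["tx_backlog"], looks_stalled(first, second))
# prints: 82 159 True
```

Under these assumptions the backlog grows from 82 to 159 slots while the done index and the Rx pointers stay put, which is consistent with the interface no longer making progress in either direction.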


after that: ifconfig eth4 down 
waited some time, then ifconfig eth4 up



lexa mat # ping -c 3 192.168.1.1
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
From 192.168.1.140 icmp_seq=2 Destination Host Unreachable
From 192.168.1.140 icmp_seq=3 Destination Host Unreachable

--- 192.168.1.1 ping statistics ---
3 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1999ms
, pipe 2
lexa mat # cat /sys/kernel/debug/sky2/eth4 
IRQ src=0 mask=c000001d control=0
Status ring (empty)
Tx ring pending=21...21 report=21 done=21

Rx ring hw get=60 put=169 last=1023


didn't help ;(


after modprobe -r sky2 && modprobe sky2 
it is working again
Comment 5 Stephen Hemminger 2007-09-22 17:09:40 UTC
Don't turn flow control off!  Some of the chip versions (EC/XL) have a hardware bug where if the receive fifo gets full it gets stuck... If you have hardware flow control, the FIFO should never get full. Also, if you turn off flow control and the other side sends a flow control packet the chip might stop as well.
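
One way to check whether flow control is still enabled is to look at the pause parameters reported by ethtool -a. The sketch below is an illustration only: it parses text in the usual "key: on/off" layout of that command, and the sample text is a made-up example of that layout, not output captured from the reporter's machine. The function names are hypothetical.

```python
def pause_settings(ethtool_a_output: str) -> dict:
    """Map 'Autonegotiate', 'RX' and 'TX' to True/False from `ethtool -a` text."""
    settings = {}
    for line in ethtool_a_output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "key: value" line
        key = key.strip()
        value = value.strip().lower()
        if key in ("Autonegotiate", "RX", "TX") and value in ("on", "off"):
            settings[key] = value == "on"
    return settings


def flow_control_enabled(settings: dict) -> bool:
    """True only if pause frames are enabled in both directions."""
    return bool(settings.get("RX", False) and settings.get("TX", False))


# Illustrative sample in the layout printed by `ethtool -a <iface>` (assumed values).
SAMPLE = (
    "Pause parameters for eth4:\n"
    "Autonegotiate:\toff\n"
    "RX:\t\toff\n"
    "TX:\t\toff\n"
)

sample_settings = pause_settings(SAMPLE)
if not flow_control_enabled(sample_settings):
    print("flow control is off; affected chips may hang when the receive FIFO fills")
```

On a live system the input text could come from subprocess.run(["ethtool", "-a", "eth4"], capture_output=True, text=True).stdout, with the interface name adjusted as needed.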
Comment 6 Matthew 2007-09-22 17:20:13 UTC
ok, acknowledged, sorry - all my bad - this time

it nevertheless still hangs from time to time, but in those cases a simple ifconfig up & down does the trick

thanks for that fast reply, btw, Stephen =) 

is it still advisable to append pci=nomsi to the kernel command line at boot?
Comment 7 Stephen Hemminger 2007-10-29 22:58:50 UTC
MSI works fine if the underlying BIOS and hardware aren't broken.
By now, every chipset should either work or have MSI automatically disabled
via the PCI quirk table (for example, AMD Opteron PCI-X chipsets have
MSI marked broken).
Comment 8 Stephen Hemminger 2007-12-07 13:57:22 UTC
This should be at least managed by the receive hang detection and recovery
logic in 2.6.23 (and later kernels). Reopen the bug if it still occurs.
