Bug 7546

Summary: sky2 transmit timed out and finally kernel crash, marvell 88E8053
Product: Networking Reporter: Regis Damongeot (regis.damongeot)
Component: IPV4 Assignee: Stephen Hemminger (stephen)
Status: RESOLVED CODE_FIX    
Severity: high CC: aros, bryce, bunk, erik, int, jerome.venturi, lindqvist, lkmlist, mathias.behrle, mvalsasna, peter, petr, pkdawson, roy.franz, serge, slavon.net, stefan, strerror, tony, ulrich
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.19-rc5 Subsystem:
Regression: --- Bisected commit-id:
Attachments: Kernel configuration for my system with sky2 driver
Kernel config for 2.6.19-rc6 (egon2003)
Kernel config 2.6.19-rc6 with sky2
Full log for sky2 (cat /var/log/messages | grep sky2)
turn flow control off
errorlog from 2.6.20.2
NAPI poll fix patch against 2.6.22-rc4 or later.
missed IRQ workaround

Description Regis Damongeot 2006-11-18 01:28:07 UTC
Most recent kernel where this bug did *NOT* occur: none, I always had this 
problem with the sky2 driver
Distribution: Slamd64
Hardware Environment: Marvell 88E8053 Gigabit Ethernet Controller (rev 19)
Software Environment:
Problem Description:
When there is a lot of transfer activity on the card, after some time
(~1-2 days) I get these errors:
-in syslog:
Nov 15 07:57:35 regis kernel: sky2 eth0: tx timeout
-in "message" log file:
Nov 15 07:57:35 regis kernel: NETDEV WATCHDOG: eth0: transmit timed out
Nov 15 07:57:35 regis kernel: sky2 hardware hung? flushing

The kernel then seems to be almost crashed. "Almost" because the mouse cursor
still moves, but the keyboard does nothing, and trying to quit X with the
mouse doesn't work either. Only a hard reset (reboot) works.
Recent kernels seem to cope with this issue better than before: with 2.6.18
and earlier, one night of this card working with the sky2 driver was enough
to crash the kernel.

Steps to reproduce: use the card to download with BitTorrent for 2 days at
100 kB/s of traffic.
Comment 1 Jerome Venturi 2006-11-25 18:15:46 UTC
Hi,

Kernel version : 2.6.19-rc6
Distribution: Gentoo 2006.1
Hardware Environment : Marvell Technology Group Ltd. 88E8053 Gigabit Ethernet
Controller (rev 20)


I have exactly the same problem as described.
I'm not downloading through BitTorrent but through newsgroups.
Comment 2 Jerome Venturi 2006-11-25 19:37:10 UTC
It just froze :s
Here is a log from syslog-ng:

Nov 26 04:24:21 Gentoo-LiNuX sky2 eth1: tx timeout
Nov 26 04:24:21 Gentoo-LiNuX sky2 eth1: transmit ring 278 .. 255 report=278 done=278
Nov 26 04:24:21 Gentoo-LiNuX sky2 hardware hung? flushing

After that I can't do anything else: no keyboard, no mouse, nothing --> hard reset
Comment 3 egon2003 2006-11-27 09:11:02 UTC
Kernel 2.6.19-rc6
Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit
Ethernet Controller (rev 19)
Distribution: Gentoo

I also get hangs pretty often, usually within the hour when running BitTorrent.
My Internet connection is 10MB full duplex.
From the log:

Nov 27 17:22:53 elite sky2 v1.10 addr 0xcdefc000 irq 17 Yukon-EC (0xb6) rev 2
Nov 27 17:22:53 elite sky2 eth0: addr 00:17:31:84:3f:28
Nov 27 17:22:55 elite sky2 eth0: enabling interface
Nov 27 17:22:58 elite sky2 eth0: Link is up at 1000 Mbps, full duplex, flow
control both
Nov 27 17:57:23 elite NETDEV WATCHDOG: eth0: transmit timed out
Nov 27 17:57:23 elite sky2 eth0: tx timeout
Nov 27 17:57:23 elite sky2 eth0: transmit ring 287 .. 264 report=287 done=287
Nov 27 17:57:23 elite sky2 hardware hung? flushing
Nov 27 17:57:32 elite BUG: soft lockup detected on CPU#0!
Nov 27 17:57:32 elite [<c013c520>]  [<c0121422>]  [<c010d1be>]  [<c0103613>] 
[<c0394f52>]  [<f8f57d8f>]  [<f8f59ef5>]  [<c0354631>]  [<c03546ad>] 
[<c012137c>]  [<c011d804>]  [<c01052fd>]  [<c012143f>]  [<c010d1c3>] 
[<c0103613>]  [<c0100fb2>]  [<c0100fce>]  [<c0101a35>]  [<c0437795>] 
[<c04371e0>]  =======================
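
When comparing the "transmit ring" lines posted by different reporters, it can help to pull the counters out of the log mechanically. The following is a minimal sketch that only parses the line format as it appears in this thread; the field names `start` and `end` and the wording returned by `summarize` are assumptions made for illustration, not something taken from the sky2 source.

```python
import re

# Pattern for the "transmit ring" line as quoted in the logs in this thread.
RING_RE = re.compile(
    r"sky2 (?P<dev>\w+): transmit ring (?P<start>\d+) \.\. (?P<end>\d+) "
    r"report=(?P<report>\d+) done=(?P<done>\d+)"
)


def parse_tx_ring(line):
    """Return the ring fields from one log line, or None if it does not match."""
    m = RING_RE.search(line)
    if m is None:
        return None
    fields = {k: int(v) for k, v in m.groupdict().items() if k != "dev"}
    fields["dev"] = m.group("dev")
    return fields


def summarize(fields):
    """Describe how the counters relate (labels are hypothetical, not driver terms)."""
    if fields["report"] != fields["done"]:
        return "report and done differ"
    if fields["report"] != fields["start"]:
        return "report/done agree but differ from first ring index"
    return "report, done and first ring index all agree"


sample = "Nov 27 17:57:23 elite sky2 eth0: transmit ring 287 .. 264 report=287 done=287"
info = parse_tx_ring(sample)
print(info["dev"], summarize(info))
```

In the "transmit ring" lines quoted in this report, all three counters agree, and each such line is followed by "hardware hung? flushing".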
Comment 4 Stephen Hemminger 2006-11-27 21:05:36 UTC
Based on past experience, I need to know the following.

IRQ routing:
  cat /proc/interrupts

Kernel config
Hardware config (lspci) and if possible motherboard vendor

Full kernel log of sky2 messages
   dmesg | grep sky2
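
For the IRQ routing question, one thing worth checking in the /proc/interrupts output is whether the sky2 interface shares its interrupt line with other devices. A rough sketch of that check follows; it assumes the column layout shown in this thread (per-CPU counters, one controller-type column, then a comma-separated device list), and the function name `devices_sharing_irq` is made up for the example.

```python
def devices_sharing_irq(interrupts_text, device):
    """Return (irq, other_devices) for the row that lists `device`, else None."""
    for line in interrupts_text.splitlines():
        head, sep, rest = line.partition(":")
        if not sep or not head.strip().isdigit():
            continue  # skips the CPU header and the NMI/LOC/ERR/MIS rows
        tokens = rest.split()
        # Drop the per-CPU counters; the next token is the controller type
        # (e.g. IO-APIC-fasteoi), and everything after it is the device list.
        while tokens and tokens[0].isdigit():
            tokens.pop(0)
        names = [n.strip() for n in " ".join(tokens[1:]).split(",") if n.strip()]
        if device in names:
            return int(head), [n for n in names if n != device]
    return None


sample = """\
           CPU0       CPU1
 16:      98606     793965   IO-APIC-fasteoi   nvidia
 17:    2758017    2732033   IO-APIC-fasteoi   uhci_hcd:usb5, eth0
 18:     111945          0   IO-APIC-fasteoi   EMU10K1
NMI:          0          0
"""
print(devices_sharing_irq(sample, "eth0"))
```

Other kernel versions may print additional columns, so the token handling would need adjusting there.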
Comment 5 egon2003 2006-11-28 10:39:58 UTC
cat /proc/interrupts

           CPU0       CPU1
  0:       1202     846636   IO-APIC-edge      timer
  1:       4342          0   IO-APIC-edge      i8042
  7:          0          0   IO-APIC-edge      parport0
  8:          2          0   IO-APIC-edge      rtc
  9:          0          0   IO-APIC-fasteoi   acpi
 10:          0          0   IO-APIC-edge      MPU401 UART
 12:      72434      86935   IO-APIC-edge      i8042
 14:     150366          0   IO-APIC-edge      ide0
 16:      98606     793965   IO-APIC-fasteoi   nvidia
 17:    2758017    2732033   IO-APIC-fasteoi   uhci_hcd:usb5, eth0
 18:     111945          0   IO-APIC-fasteoi   EMU10K1
 19:      35978      56133   IO-APIC-fasteoi   libata, ohci1394
 20:          0          0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2
 21:          0          0   IO-APIC-fasteoi   uhci_hcd:usb3
 22:          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
NMI:          0          0
LOC:     837057     837056
ERR:          0
MIS:          0

Motherboard is an ASUS P5LD2

lspci:

00:00.0 Host bridge: Intel Corporation 945G/GZ/P/PL Express Memory Controller
Hub (rev 02)
00:01.0 PCI bridge: Intel Corporation 945G/GZ/P/PL Express PCI Express Root Port
(rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1
(rev 01)
00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 4
(rev 01)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #3 (rev 01)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #4 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI
Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface
Bridge (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller
(rev 01)
00:1f.2 SATA controller: Intel Corporation 82801GR/GH (ICH7 Family) Serial ATA
Storage Controller AHCI (rev 01)
00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01)
01:02.0 Multimedia audio controller: Creative Labs SB Audigy (rev 04)
01:02.1 Input device controller: Creative Labs SB Audigy Game Port (rev 04)
01:02.2 FireWire (IEEE 1394): Creative Labs SB Audigy FireWire Port (rev 04)
02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit
Ethernet Controller (rev 19)
04:00.0 VGA compatible controller: nVidia Corporation GeForce 7900 GT (rev a1)


dmesg |grep sky2

sky2 v1.10 addr 0xcdefc000 irq 17 Yukon-EC (0xb6) rev 2
sky2 eth0: addr 00:17:31:84:3f:28
sky2 eth0: enabling interface
sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control both

adding kernel config as attachment
Comment 6 Regis Damongeot 2006-11-28 10:41:06 UTC
Created attachment 9642 [details]
Kernel configuration for my system with sky2 driver
Comment 7 egon2003 2006-11-28 10:41:46 UTC
Created attachment 9643 [details]
Kernel config for 2.6.19-rc6 (egon2003)
Comment 8 Regis Damongeot 2006-11-28 10:52:45 UTC
cat /proc/interrupts
           CPU0       CPU1
  0:  216807783          0   IO-APIC-edge      timer
  1:          2          0   IO-APIC-edge      i8042
  4:     193261          0   IO-APIC-edge      serial
  6:          3          0   IO-APIC-edge      floppy
  8:          0          0   IO-APIC-edge      rtc
  9:          0          0   IO-APIC-fasteoi   acpi
 12:          3          0   IO-APIC-edge      i8042
 14:    1462014          0   IO-APIC-edge      ide0
 16:   74813181          0   IO-APIC-fasteoi   radeon@pci:0000:04:00.0
 17:     125492          0   IO-APIC-fasteoi   uhci_hcd:usb3
 18:          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
 19:   38077760          0   IO-APIC-fasteoi   eth0, uhci_hcd:usb5, HDA Intel
 20:   13920035          0   IO-APIC-fasteoi   ide4, ide5, ehci_hcd:usb1, 
uhci_hcd:usb2
 21:  407053844          0   IO-APIC-fasteoi   eth1
 22:   13752946          0   IO-APIC-fasteoi   ide2, ide3
 23:    6091017          0   IO-APIC-fasteoi   libata
NMI:     150612     123474
LOC:  554264226  554264167
ERR:          0


/sbin/lspci
00:00.0 Host bridge: Intel Corporation 945G/P Memory Controller Hub (rev 02)
00:01.0 PCI bridge: Intel Corporation 945G/P PCI Express Graphics Port (rev 02)
00:1b.0 Class 0403: Intel Corporation 82801G (ICH7 Family) High Definition 
Audio Controller (rev 01)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 
(rev 01)
00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 4 
(rev 01)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #1 (rev 
01)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #2 (rev 
01)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #3 (rev 
01)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #4 (rev 
01)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI 
Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface 
Bridge (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller 
(rev 01)
00:1f.2 Class 0106: Intel Corporation 82801GR/GH (ICH7 Family) Serial ATA 
Storage Controllers cc=AHCI (rev 01)
00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01)
01:01.0 Ethernet controller: Realtek Semiconductor Co., Ltd. 
RTL-8139/8139C/8139C+ (rev 10)
01:02.0 Mass storage controller: Promise Technology, Inc. PDC20268 (Ultra100 
TX2) (rev 02)
01:03.0 Mass storage controller: Integrated Technology Express, Inc. ITE 8211F 
Single Channel UDMA 133 (ASUS 8211 (ITE IT8212 ATA RAID Controller)) (rev 11)
02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 Gigabit 
Ethernet Controller (rev 19)
04:00.0 VGA compatible controller: ATI Technologies Inc RV380 0x3e50 [Radeon 
X600]
04:00.1 Display controller: ATI Technologies Inc RV380 [Radeon X600] Secondary


My motherboard is an Asus P5LD2 (pcb v1) with an integrated Marvell 88E8053 
Gigabit Ethernet Controller.

grep sky2 /var/log/syslog.2
Nov 15 07:57:35 regis kernel: sky2 eth0: tx timeout

(syslog.2 contains boot and crash of the kernel)
Comment 9 Jerome Venturi 2006-11-28 14:20:31 UTC
cat /proc/interrupts
           CPU0       CPU1
  0:      16717     485853   IO-APIC-edge      timer
  1:        130          0   IO-APIC-edge      i8042
  6:          5          0   IO-APIC-edge      floppy
  8:         17          0   IO-APIC-edge      rtc
  9:          0          0   IO-APIC-fasteoi   acpi
 12:      93653          0   IO-APIC-edge      i8042
 14:       5145          0   IO-APIC-edge      libata
 15:      34255          0   IO-APIC-edge      libata
 16:          3          0   IO-APIC-fasteoi   ohci1394
 17:      51994          0   IO-APIC-fasteoi   eth1, nvidia
 18:        658          0   IO-APIC-fasteoi   uhci_hcd:usb5, HDA Intel, eth0
 19:       5532          0   IO-APIC-fasteoi   aic7xxx
 20:         66          0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2
 21:          0          0   IO-APIC-fasteoi   uhci_hcd:usb3
 22:          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
 23:       8656          0   IO-APIC-fasteoi   EMU10K1
NMI:          0          0
LOC:     502400     502904
ERR:          0
MIS:          0

Motherboard is: Asus P5W DH Deluxe

lspci :

00:00.0 Host bridge: Intel Corporation Memory Controller Hub (rev c0)
00:01.0 PCI bridge: Intel Corporation PCI Express Graphics Port (rev c0)
00:1b.0 Class 0403: Intel Corporation 82801G (ICH7 Family) High Definition Audio
Controller (rev 01)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1
(rev 01)
00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 4
(rev 01)
00:1c.4 PCI bridge: Intel Corporation 82801GR/GH/GHM (ICH7 Family) PCI Express
Port 5 (rev 01)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #3 (rev 01)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #4 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI
Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface
Bridge (rev 01)
00:1f.2 IDE interface: Intel Corporation 82801GB/GR/GH (ICH7 Family) Serial ATA
Storage Controllers cc=IDE (rev 01)
00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01)
01:01.0 Multimedia audio controller: Creative Labs SB Live! EMU10k1 (rev 07)
01:01.1 Input device controller: Creative Labs SB Live! MIDI/Game Port (rev 07)
01:02.0 SCSI storage controller: Adaptec AIC-7892B U160/m (rev 02)
01:03.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000
Controller (PHY/Link)
02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 Gigabit
Ethernet Controller (rev 20)
03:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 Gigabit
Ethernet Controller (rev 20)
05:00.0 VGA compatible controller: nVidia Corporation Unknown device 0391 (rev a1)
Comment 10 Jerome Venturi 2006-11-28 14:23:01 UTC
Created attachment 9645 [details]
Kernel config 2.6.19-rc6 with sky2
Comment 11 Jerome Venturi 2006-11-28 14:29:24 UTC
Created attachment 9646 [details]
Full log for sky2 (cat /var/log/messages | grep sky2)
Comment 12 Stephen Hemminger 2006-11-28 14:31:25 UTC
Please reproduce the problem without the proprietary nvidia module.
If it can not be reproduced then please close.
Comment 13 egon2003 2006-11-29 09:40:25 UTC
I have removed the proprietary nvidia module and I have the same problem.
System crash after about an hour of running BitTorrent with heavy traffic.
Comment 14 egon2003 2006-11-29 12:49:28 UTC
Oops, I still had the module loaded, but I wasn't using it. Trying without it now...
Comment 15 egon2003 2006-11-29 13:30:12 UTC
I still get hangs without the proprietary nvidia module.

Nov 29 22:26:07 elite NETDEV WATCHDOG: eth0: transmit timed out
Nov 29 22:26:07 elite sky2 eth0: tx timeout
Nov 29 22:26:07 elite sky2 eth0: transmit ring 177 .. 156 report=177 done=177
Nov 29 22:26:07 elite sky2 hardware hung? flushing
Nov 29 22:26:16 elite BUG: soft lockup detected on CPU#0!
Nov 29 22:26:16 elite [<c013c520>]  [<c0121422>]  [<c010d1be>]  [<c0103613>] 
[<c011007b>]  [<c0394f52>]  [<f8f6ed8f>]  [<f8f70ef5>]  [<c0354631>] 
[<c03546ad>]  [<c012137c>]  [<c011d804>]  [<c01052fd>]  [<c012143f>] 
[<c010d1c3>]  [<c0103613>]  [<c0100fb2>]  [<c0100fce>]  [<c0101a35>] 
[<c0437795>]  [<c04371e0>]  =======================
Comment 16 Jerome Venturi 2006-11-29 14:14:27 UTC
Yes, without the nvidia module, the problem is still here.

Another way to reproduce it (a bit harder):

Download through newsgroups on eth0 and burn a DVD through eth1 (cifs); the
system then freezes.
Both eth0 and eth1 are using the sky2 driver.
Comment 17 Jerome Venturi 2006-12-03 11:58:31 UTC
Just compiled 2.6.19-git3 and I still get this problem.
Comment 18 Stephen Hemminger 2006-12-14 11:13:19 UTC
*** Bug 7670 has been marked as a duplicate of this bug. ***
Comment 19 egon2003 2006-12-22 00:24:35 UTC
For the last 4 days I have been using an add-in NIC with a Realtek 8139 chip.
I have not had a single hang during this time, and BitTorrent has been running
whenever the computer was on.
Comment 20 Stephen Hemminger 2007-02-05 09:39:04 UTC
Transmit flow control seems to cause a hardware problem.
I can reproduce a hang with Tx flow control on, but if I turn flow control
off, it doesn't hang (at least it is still okay after 48 hours).

I sent a patch to disable flow control by default.
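
To confirm which flow control mode a given boot actually negotiated, the sky2 "Link is up" message can be read back from the log. This is a small sketch that assumes the message format seen elsewhere in this report; only the values "both" and "none" appear in these logs, so any other value is simply passed through unchanged.

```python
import re

# "flow\s+control" tolerates the line wrap seen in some pasted logs.
LINK_RE = re.compile(
    r"sky2 (?P<dev>\w+): Link is up at (?P<speed>\d+) Mbps, "
    r"(?P<duplex>full|half) duplex, flow\s+control (?P<flow>\w+)"
)


def link_state(text):
    """Return (device, speed_mbps, duplex, flow_control) or None if no match."""
    m = LINK_RE.search(text)
    if m is None:
        return None
    return m.group("dev"), int(m.group("speed")), m.group("duplex"), m.group("flow")


def flow_control_enabled(text):
    """True when a link-up message is found and its flow control is not 'none'."""
    state = link_state(text)
    return state is not None and state[3] != "none"


before = "sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control both"
after = "sky2 eth0: Link is up at 100 Mbps, full duplex, flow control none"
print(link_state(before), flow_control_enabled(before))
print(link_state(after), flow_control_enabled(after))
```

Running it over `dmesg | grep sky2` output after applying a change would show whether the link came up with flow control still active.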
Comment 21 Stephen Hemminger 2007-02-05 09:42:10 UTC
Created attachment 10292 [details]
turn flow control off
Comment 22 egon2003 2007-02-06 11:53:31 UTC
Compiled 2.6.20-gentoo with the patch above (attachment 10292). I still get
hangs. From the log (there are many more of the "BUG: soft lockup" messages):


Feb  6 19:06:01 elite NETDEV WATCHDOG: eth0: transmit timed out
Feb  6 19:06:01 elite sky2 eth0: tx timeout
Feb  6 19:06:01 elite sky2 eth0: transmit ring 125 .. 102 report=125 done=125
Feb  6 19:06:01 elite sky2 hardware hung? flushing
Feb  6 19:06:10 elite BUG: soft lockup detected on CPU#0!
Feb  6 19:06:10 elite [<c0138165>]  [<c011f0b3>]  [<c010bc7e>]  [<c0103434>] 
[<c036710a>]  [<f9e61c46>]  [<f9e63b7e>]  [<c032a77b>]  [<c032a7ee$
Feb  6 19:06:20 elite BUG: soft lockup detected on CPU#0!
Feb  6 19:06:20 elite [<c0138165>]  [<c011f0b3>]  [<c010bc7e>]  [<c0103434>] 
[<c036710a>]  [<f9e61c46>]  [<f9e63b7e>]  [<c032a77b>]  [<c032a7ee$
Feb  6 19:06:30 elite BUG: soft lockup detected on CPU#0!
Feb  6 19:06:30 elite [<c0138165>]  [<c011f0b3>]  [<c010bc7e>]  [<c0103434>] 
[<c036710d>]  [<f9e61c46>]  [<f9e63b7e>]  [<c032a77b>]  [<c032a7ee$
Feb  6 19:06:40 elite BUG: soft lockup detected on CPU#0!
Feb  6 19:06:40 elite [<c0138165>]  [<c011f0b3>]  [<c010bc7e>]  [<c0103434>] 
[<c0367108>]  [<f9e61c46>]  [<f9e63b7e>]  [<c032a77b>]  [<c032a7ee$
Feb  6 19:06:50 elite BUG: soft lockup detected on CPU#0!
Feb  6 19:06:50 elite [<c0138165>]  [<c011f0b3>]  [<c010bc7e>]  [<c0103434>] 
[<c036710a>]  [<f9e61c46>]  [<f9e63b7e>]  [<c032a77b>]  [<c032a7ee$
Feb  6 19:07:00 elite BUG: soft lockup detected on CPU#0!
Feb  6 19:07:00 elite [<c0138165>]  [<c011f0b3>]  [<c010bc7e>]  [<c0103434>] 
[<c036710d>]  [<f9e61c46>]  [<f9e63b7e>]  [<c032a77b>]  [<c032a7ee$
Feb  6 19:07:10 elite BUG: soft lockup detected on CPU#0!
Feb  6 19:07:10 elite [<c0138165>]  [<c011f0b3>]  [<c010bc7e>]  [<c0103434>] 
[<c036710d>]  [<f9e61c46>]  [<f9e63b7e>]  [<c032a77b>]  [<c032a7ee$
Feb  6 19:07:20 elite BUG: soft lockup detected on CPU#0!
Comment 23 Regis Damongeot 2007-02-10 01:45:19 UTC
It still crashes for me too with 2.6.20 and flow control off.

syslog:
Feb 10 01:16:42 regis kernel: sky2 eth0: tx timeout

/var/log/message:
Feb 10 01:16:42 regis kernel: NETDEV WATCHDOG: eth0: transmit timed out
Feb 10 01:16:42 regis kernel: sky2 hardware hung? flushing

After that, the keyboard is useless; the mouse works but all applications seem
to crash (I can't leave X either: KDE crashes when leaving X and then
everything is frozen).
Comment 24 Stephen Hemminger 2007-02-13 11:41:07 UTC
*** Bug 7647 has been marked as a duplicate of this bug. ***
Comment 25 Bjoern Olausson 2007-02-18 12:47:49 UTC
Same problem here.

Even under minimal load my machine hangs... the mouse still works, but a hard
reset is required.

Currently I am testing the workaround mentioned here:
http://bugzilla.kernel.org/show_bug.cgi?id=6839#c12
(ethtool -A eth0 autoneg off rx on tx on)

Some information I can contribute:

I got a few of these soft lockups; I don't actually know whether they belong
to the sky2 problem. When they occur the PC freezes (the mouse works), but
after a minute or so the system is responsive again.

BUG: soft lockup detected on CPU#0! 

Call Trace: 
<IRQ> [<ffffffff80254ba1>] softlockup_tick+0xdb/0xed 
[<ffffffff8023b6d4>] update_process_times+0x42/0x68 
[<ffffffff80217c62>] smp_local_timer_interrupt+0x34/0x55 
[<ffffffff80218329>] smp_apic_timer_interrupt+0x52/0x6a 
[<ffffffff8020a136>] apic_timer_interrupt+0x66/0x70 
[<ffffffff804bd474>] pci_conf1_read+0x0/0xc6 
[<ffffffff80254e58>] handle_IRQ_event+0x1a/0x53 
[<ffffffff8025639e>] handle_edge_irq+0xec/0x132 
[<ffffffff8020bb34>] do_IRQ+0x100/0x15c 
[<ffffffff80208964>] mwait_idle+0x0/0x48 
[<ffffffff80209a81>] ret_from_intr+0x0/0xa 
<EOI> [<ffffffff802089a6>] mwait_idle+0x42/0x48 
[<ffffffff802088fc>] cpu_idle+0x8b/0xae 
[<ffffffff807a06d8>] start_kernel+0x218/0x21d 
[<ffffffff807a015c>] _sinittext+0x15c/0x160 

irq 18: nobody cared (try booting with the "irqpoll" option) 

Call Trace: 
<IRQ> [<ffffffff80255837>] __report_bad_irq+0x30/0x7d 
[<ffffffff80255a65>] note_interrupt+0x1e1/0x224 
[<ffffffff8025628b>] handle_fasteoi_irq+0x9e/0xc5 
[<ffffffff8020bb34>] do_IRQ+0x100/0x15c 
[<ffffffff80209a81>] ret_from_intr+0x0/0xa 
[<ffffffff804bd474>] pci_conf1_read+0x0/0xc6 
[<ffffffff80254e58>] handle_IRQ_event+0x1a/0x53 
[<ffffffff8025639e>] handle_edge_irq+0xec/0x132 
[<ffffffff8020bb34>] do_IRQ+0x100/0x15c 
[<ffffffff80208964>] mwait_idle+0x0/0x48 
[<ffffffff80209a81>] ret_from_intr+0x0/0xa 
<EOI> [<ffffffff802089a6>] mwait_idle+0x42/0x48 
[<ffffffff802088fc>] cpu_idle+0x8b/0xae 
[<ffffffff807a06d8>] start_kernel+0x218/0x21d 
[<ffffffff807a015c>] _sinittext+0x15c/0x160 

handlers: 
[<ffffffff8048137c>] (usb_hcd_irq+0x0/0x52) 
Disabling IRQ #18

And another one:

BUG: soft lockup detected on CPU#0! 

Call Trace: 
<IRQ> [<ffffffff80254ba1>] softlockup_tick+0xdb/0xed 
[<ffffffff8023b6d4>] update_process_times+0x42/0x68 
[<ffffffff80217c62>] smp_local_timer_interrupt+0x34/0x55 
[<ffffffff80218329>] smp_apic_timer_interrupt+0x52/0x6a 
[<ffffffff8020a136>] apic_timer_interrupt+0x66/0x70 
[<ffffffff804bd474>] pci_conf1_read+0x0/0xc6 
[<ffffffff80254e58>] handle_IRQ_event+0x1a/0x53 
[<ffffffff8025639e>] handle_edge_irq+0xec/0x132 
[<ffffffff8020bb34>] do_IRQ+0x100/0x15c 
[<ffffffff80209a81>] ret_from_intr+0x0/0xa 
[<ffffffff8044f8e1>] ata_scsi_qc_complete+0x0/0x20f 
[<ffffffff80237a09>] __do_softirq+0x4a/0xc3 
[<ffffffff80219c84>] ack_apic_level+0x33/0x47 
[<ffffffff8020a68c>] call_softirq+0x1c/0x28 
[<ffffffff8020b9e3>] do_softirq+0x2c/0x7d 
[<ffffffff8020bb6d>] do_IRQ+0x139/0x15c 
[<ffffffff80209a81>] ret_from_intr+0x0/0xa 
<EOI> [<ffffffff8056b8e8>] thread_return+0x56/0xed 
[<ffffffff80208964>] mwait_idle+0x0/0x48 
[<ffffffff8020891d>] cpu_idle+0xac/0xae 
[<ffffffff807a06d8>] start_kernel+0x218/0x21d 
[<ffffffff807a015c>] _sinittext+0x15c/0x160 

irq 1272: nobody cared (try booting with the "irqpoll" option) 

Call Trace: 
<IRQ> [<ffffffff80255837>] __report_bad_irq+0x30/0x7d 
[<ffffffff80255a65>] note_interrupt+0x1e1/0x224 
[<ffffffff802563b3>] handle_edge_irq+0x101/0x132 
[<ffffffff8020bb34>] do_IRQ+0x100/0x15c 
[<ffffffff80209a81>] ret_from_intr+0x0/0xa 
[<ffffffff8044f8e1>] ata_scsi_qc_complete+0x0/0x20f 
[<ffffffff80237a09>] __do_softirq+0x4a/0xc3 
[<ffffffff80219c84>] ack_apic_level+0x33/0x47 
[<ffffffff8020a68c>] call_softirq+0x1c/0x28 
[<ffffffff8020b9e3>] do_softirq+0x2c/0x7d 
[<ffffffff8020bb6d>] do_IRQ+0x139/0x15c 
[<ffffffff80209a81>] ret_from_intr+0x0/0xa 
<EOI> [<ffffffff8056b8e8>] thread_return+0x56/0xed 
[<ffffffff80208964>] mwait_idle+0x0/0x48 
[<ffffffff8020891d>] cpu_idle+0xac/0xae 
[<ffffffff807a06d8>] start_kernel+0x218/0x21d 
[<ffffffff807a015c>] _sinittext+0x15c/0x160 

handlers: 
[<ffffffff80454f9d>] (ahci_interrupt+0x0/0x3cd) 
Disabling IRQ #1272 
operapluginwrap[8324]: segfault at 00000000000004d1 rip 00000000f7d919f6 rsp 
00000000ffd0b2a0 error 4 
BUG: soft lockup detected on CPU#0! 

Call Trace: 
<IRQ> [<ffffffff80254ba1>] softlockup_tick+0xdb/0xed 
[<ffffffff8023b6d4>] update_process_times+0x42/0x68 
[<ffffffff80217c62>] smp_local_timer_interrupt+0x34/0x55 
[<ffffffff80218329>] smp_apic_timer_interrupt+0x52/0x6a 
[<ffffffff8020a136>] apic_timer_interrupt+0x66/0x70 
[<ffffffff804bd474>] pci_conf1_read+0x0/0xc6 
[<ffffffff80254e58>] handle_IRQ_event+0x1a/0x53 
[<ffffffff8025639e>] handle_edge_irq+0xec/0x132 
[<ffffffff8020a68c>] call_softirq+0x1c/0x28 
[<ffffffff8020bb34>] do_IRQ+0x100/0x15c 
[<ffffffff80209a81>] ret_from_intr+0x0/0xa 
<EOI> [<ffffffff8056d433>] _spin_unlock_irqrestore+0x8/0x9 
[<ffffffff8044bb77>] ata_hsm_move+0x214/0x668 
[<ffffffff80244abe>] keventd_create_kthread+0x0/0x65 
[<ffffffff8044e585>] ata_pio_task+0x0/0xe9 
[<ffffffff8044e661>] ata_pio_task+0xdc/0xe9 
[<ffffffff80241754>] run_workqueue+0x8f/0x137 
[<ffffffff80242059>] worker_thread+0x0/0x14a 
[<ffffffff80244abe>] keventd_create_kthread+0x0/0x65 
[<ffffffff8024216d>] worker_thread+0x114/0x14a 
[<ffffffff8022d0d7>] default_wake_function+0x0/0xe 
[<ffffffff80244d14>] kthread+0xd1/0x101 
[<ffffffff8020a318>] child_rip+0xa/0x12 
[<ffffffff80244abe>] keventd_create_kthread+0x0/0x65 
[<ffffffff80244c43>] kthread+0x0/0x101 
[<ffffffff8020a30e>] child_rip+0x0/0x12

Information about my installed Linux:

>>> cfg-update-1.8.0-r6: No new packages have been emerged, checksum index OK!
Portage 2.1.2-r9 (default-linux/amd64/2006.1, gcc-4.1.1, glibc-2.5-r0, 2.6.20-
gentoo x86_64)
=================================================================
System uname: 2.6.20-gentoo x86_64 Intel(R) Core(TM)2 CPU          6600  @ 
2.40GHz
Gentoo Base System release 1.12.9
Timestamp of tree: Sat, 17 Feb 2007 16:50:01 +0000
dev-java/java-config: 1.3.7, 2.0.31
dev-lang/python:     2.4.3-r4
dev-python/pycrypto: 2.0.1-r5
sys-apps/sandbox:    1.2.17
sys-devel/autoconf:  2.13, 2.61
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10
sys-devel/binutils:  2.16.1-r3
sys-devel/gcc-config: 1.3.14
sys-devel/libtool:   1.5.22
virtual/os-headers:  2.6.17-r1
ACCEPT_KEYWORDS="amd64"
AUTOCLEAN="yes"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=nocona -O2 -pipe -fomit-frame-pointer"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/
shutdown /usr/share/X11/xkb /usr/share/config"
CONFIG_PROTECT_MASK="/etc/env.d /etc/env.d/java/ /etc/gconf /etc/java-config/
vms/ /etc/revdep-rebuild /etc/terminfo /etc/texmf/web2c"
CXXFLAGS="-O2 -pipe"

I hope this information is useful.

Regards
Bjoern
Comment 26 Bjoern Olausson 2007-02-18 13:12:39 UTC
Okay, with the workaround I have no more _noticeable_ soft lockups, but now I
get the following:

Feb 18 21:02:58 freax sky2 eth0: Link is up at 100 Mbps, full duplex, flow 
control none
Feb 18 21:10:01 freax cron[8113]: (root) CMD (test -x /usr/sbin/run-crons && /
usr/sbin/run-crons )
Feb 18 21:11:53 freax BUG: soft lockup detected on CPU#0!
Feb 18 21:11:53 freax
Feb 18 21:11:53 freax Call Trace:
Feb 18 21:11:53 freax <IRQ>  [<ffffffff80255638>] softlockup_tick+0xda/0xf5
Feb 18 21:11:53 freax [<ffffffff8023b929>] update_process_times+0x42/0x68
Feb 18 21:11:53 freax [<ffffffff802173c5>] smp_local_timer_interrupt+0x34/0x52
Feb 18 21:11:53 freax [<ffffffff80217a78>] smp_apic_timer_interrupt+0x49/0x61
Feb 18 21:11:53 freax [<ffffffff8020a256>] apic_timer_interrupt+0x66/0x70
Feb 18 21:11:53 freax [<ffffffff802558f8>] handle_IRQ_event+0x1a/0x53
Feb 18 21:11:53 freax [<ffffffff80256a56>] handle_edge_irq+0xec/0x133
Feb 18 21:11:53 freax [<ffffffff8020a7ac>] call_softirq+0x1c/0x28
Feb 18 21:11:53 freax [<ffffffff8020bc7f>] do_IRQ+0xf7/0x150
Feb 18 21:11:53 freax [<ffffffff80209b71>] ret_from_intr+0x0/0xa
Feb 18 21:11:53 freax <EOI>  [<ffffffff80242598>] worker_thread+0x0/0x14a
Feb 18 21:11:53 freax [<ffffffff80570fed>] __sched_text_start+0x12d/0x9d6
Feb 18 21:11:53 freax [<ffffffff80229c4c>] __wake_up_common+0x3e/0x68
Feb 18 21:11:53 freax [<ffffffff8022a199>] __wake_up+0x38/0x4f
Feb 18 21:11:53 freax [<ffffffff805738c6>] _spin_unlock_irqrestore+0x16/0x31
Feb 18 21:11:53 freax [<ffffffff80242598>] worker_thread+0x0/0x14a
Feb 18 21:11:53 freax [<ffffffff80245162>] keventd_create_kthread+0x0/0x6a
Feb 18 21:11:53 freax [<ffffffff80242598>] worker_thread+0x0/0x14a
Feb 18 21:11:53 freax [<ffffffff8024267c>] worker_thread+0xe4/0x14a
Feb 18 21:11:53 freax [<ffffffff8022cb73>] default_wake_function+0x0/0xe
Feb 18 21:11:53 freax [<ffffffff802453bd>] kthread+0xd1/0x100
Feb 18 21:11:53 freax [<ffffffff8020a438>] child_rip+0xa/0x12
Feb 18 21:11:53 freax [<ffffffff80245162>] keventd_create_kthread+0x0/0x6a
Feb 18 21:11:53 freax [<ffffffff802452ec>] kthread+0x0/0x100
Feb 18 21:11:53 freax [<ffffffff8020a42e>] child_rip+0x0/0x12
Feb 18 21:11:53 freax
Feb 18 21:12:28 freax irq 18: nobody cared (try booting with the "irqpoll" 
option)
Feb 18 21:12:28 freax
Feb 18 21:12:28 freax Call Trace:
Feb 18 21:12:28 freax <IRQ>  [<ffffffff802562eb>] __report_bad_irq+0x30/0x7d
Feb 18 21:12:28 freax [<ffffffff802564ff>] note_interrupt+0x1c7/0x20c
Feb 18 21:12:28 freax [<ffffffff80245162>] keventd_create_kthread+0x0/0x6a
Feb 18 21:12:28 freax [<ffffffff80256b44>] handle_fasteoi_irq+0xa7/0xd0
Feb 18 21:12:28 freax [<ffffffff8020bc7f>] do_IRQ+0xf7/0x150
Feb 18 21:12:28 freax [<ffffffff80209b71>] ret_from_intr+0x0/0xa
Feb 18 21:12:28 freax [<ffffffff802558f8>] handle_IRQ_event+0x1a/0x53
Feb 18 21:12:28 freax [<ffffffff80256a56>] handle_edge_irq+0xec/0x133
Feb 18 21:12:28 freax [<ffffffff8020a7ac>] call_softirq+0x1c/0x28
Feb 18 21:12:28 freax [<ffffffff8020bc7f>] do_IRQ+0xf7/0x150
Feb 18 21:12:28 freax [<ffffffff80209b71>] ret_from_intr+0x0/0xa
Feb 18 21:12:28 freax <EOI>  [<ffffffff80242598>] worker_thread+0x0/0x14a
Feb 18 21:12:28 freax [<ffffffff80570fed>] __sched_text_start+0x12d/0x9d6
Feb 18 21:12:28 freax [<ffffffff80229c4c>] __wake_up_common+0x3e/0x68
Feb 18 21:12:28 freax [<ffffffff8022a199>] __wake_up+0x38/0x4f
Feb 18 21:12:28 freax [<ffffffff805738c6>] _spin_unlock_irqrestore+0x16/0x31
Feb 18 21:12:28 freax [<ffffffff80242598>] worker_thread+0x0/0x14a
Feb 18 21:12:28 freax [<ffffffff80245162>] keventd_create_kthread+0x0/0x6a
Feb 18 21:12:28 freax [<ffffffff80242598>] worker_thread+0x0/0x14a
Feb 18 21:12:28 freax [<ffffffff8024267c>] worker_thread+0xe4/0x14a
Feb 18 21:12:28 freax [<ffffffff8022cb73>] default_wake_function+0x0/0xe
Feb 18 21:12:28 freax [<ffffffff802453bd>] kthread+0xd1/0x100
Feb 18 21:12:28 freax [<ffffffff8020a438>] child_rip+0xa/0x12
Feb 18 21:12:28 freax [<ffffffff80245162>] keventd_create_kthread+0x0/0x6a
Feb 18 21:12:28 freax [<ffffffff802452ec>] kthread+0x0/0x100
Feb 18 21:12:28 freax [<ffffffff8020a42e>] child_rip+0x0/0x12
Feb 18 21:12:28 freax
Feb 18 21:12:28 freax handlers:
Feb 18 21:12:28 freax [<ffffffff8048e824>] (usb_hcd_irq+0x0/0x52)
Feb 18 21:12:28 freax Disabling IRQ #18
Feb 18 21:20:01 freax cron[8136]: (root) CMD (test -x /usr/sbin/run-crons && /
usr/sbin/run-crons )
Feb 18 21:30:01 freax cron[8183]: (root) CMD (test -x /usr/sbin/run-crons && /
usr/sbin/run-crons )
Feb 18 21:30:36 freax sky2 eth0: rx error, status 0x5350002 length 1333
Feb 18 21:30:36 freax sky2 eth0: rx error, status 0x4d50002 length 1237
Feb 18 21:34:13 freax sky2 eth0: rx error, status 0x1b10002 length 433
Feb 18 21:34:14 freax sky2 eth0: rx error, status 0x2470002 length 583
Feb 18 21:34:16 freax sky2 eth0: rx error, status 0x7f0002 length 127
Feb 18 21:34:18 freax sky2 eth0: rx error, status 0x1f30002 length 499
Feb 18 21:34:18 freax sky2 eth0: rx error, status 0x58b0002 length 1419
Feb 18 21:34:18 freax sky2 eth0: rx error, status 0x4fd0002 length 1277
Feb 18 21:34:18 freax sky2 eth0: rx error, status 0x58a0002 length 1418
Feb 18 21:34:18 freax sky2 eth0: rx error, status 0x5900002 length 1424
Feb 18 21:34:18 freax sky2 eth0: rx error, status 0x57b0002 length 1403
Feb 18 21:34:18 freax sky2 eth0: rx error, status 0x50f0002 length 1295
Feb 18 21:34:18 freax sky2 eth0: rx error, status 0x2680002 length 616
Feb 18 21:34:26 freax printk: 2 messages suppressed.

Hope this helps in a way
best regards
Bjoern
Comment 27 Stephen Hemminger 2007-02-23 13:36:47 UTC
An update that fixes this problem is now in 2.6.21-rc1 and has been submitted
to the 2.6.20 stable series.
Comment 28 egon2003 2007-03-10 04:14:02 UTC
Created attachment 10680 [details]
errorlog from 2.6.20.2

I still get this error with 2.6.20.2. No hang but the network goes down.
Comment 29 Peter Kerwien 2007-03-10 04:22:18 UTC
This may be similar to a problem I experienced:

http://bugzilla.kernel.org/show_bug.cgi?id=8091
Comment 30 Regis Damongeot 2007-03-13 10:54:54 UTC
It still crashes with 2.6.20.2 for me too.
Syslog:
Mar 13 17:29:48 regis kernel: sky2 eth0: tx timeout
Mar 13 17:30:43 regis kernel: sky2 eth0: tx timeout
(repeated endlessly)

Messages:
Mar 13 11:26:47 regis kernel: NETDEV WATCHDOG: eth0: transmit timed out
Mar 13 11:26:47 regis kernel: sky2 hardware hung? flushing
Mar 13 11:37:07 regis kernel: NETDEV WATCHDOG: eth0: transmit timed out
Mar 13 11:37:07 regis kernel: sky2 status report lost?
Mar 13 11:37:52 regis kernel: NETDEV WATCHDOG: eth0: transmit timed out
Mar 13 11:37:52 regis kernel: sky2 hardware hung? flushing
(repeated endlessly too)
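
To compare how often each diagnostic shows up, a log can be tallied with a small helper like the one below. This is only a sketch: the function name sky2_tally is made up for illustration, and the message strings are copied from the excerpts above, so other driver versions may word them differently.

```shell
# Tally the sky2 diagnostics quoted in this report from a log read on stdin.
# sky2_tally is a hypothetical helper name; the patterns are the literal
# message texts shown in the excerpts above.
sky2_tally() {
    awk '
        /transmit timed out/         { wd++ }    # NETDEV WATCHDOG lines
        /sky2 [^ ]+: tx timeout/     { tx++ }    # per-interface tx timeout
        /sky2 hardware hung\?/       { hung++ }  # driver decided to flush
        /sky2 status report lost\?/  { lost++ }  # driver suspects a lost report
        END {
            printf "watchdog=%d tx_timeout=%d hung=%d report_lost=%d\n",
                   wd, tx, hung, lost
        }'
}
```

Typical use would be something like: sky2_tally < /var/log/messages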
Comment 31 Stephen Hemminger 2007-03-13 11:11:45 UTC
A couple of questions:

1. Are you using hardware flow control (it is on by default)? If so, what kind
   of switch/hub are you connected to? What are the pause statistics?
         ethtool -S eth0

2. What chip version?
        dmesg | grep sky2


Comment 32 Regis Damongeot 2007-03-13 11:36:47 UTC
1) If hardware flow control is enabled by default, then I am using it (but you
previously posted a patch disabling flow control, and the bug was still
present).
I use a netgear gigabit switch (GS605).
root@regis:/home/regis# ethtool -S eth0
NIC statistics:
     tx_bytes: 6571
     rx_bytes: 4318
     tx_broadcast: 47
     rx_broadcast: 28
     tx_multicast: 0
     rx_multicast: 0
     tx_unicast: 0
     rx_unicast: 0
     tx_mac_pause: 0
     rx_mac_pause: 0
     collisions: 0
     late_collision: 0
     aborted: 0
     single_collisions: 0
     multi_collisions: 0
     rx_short: 0
     rx_runt: 0
     rx_64_byte_packets: 1
     rx_65_to_127_byte_packets: 3
     rx_128_to_255_byte_packets: 22
     rx_256_to_511_byte_packets: 2
     rx_512_to_1023_byte_packets: 0
     rx_1024_to_1518_byte_packets: 0
     rx_1518_to_max_byte_packets: 0
     rx_too_long: 0
     rx_fifo_overflow: 0
     rx_jabber: 0
     rx_fcs_error: 0
     tx_64_byte_packets: 0
     tx_65_to_127_byte_packets: 24
     tx_128_to_255_byte_packets: 20
     tx_256_to_511_byte_packets: 3
     tx_512_to_1023_byte_packets: 0
     tx_1024_to_1518_byte_packets: 0
     tx_1519_to_max_byte_packets: 0
     tx_fifo_underrun: 0
This is after a reboot, not when the card is crashed.

root@regis:/home/regis# dmesg | grep sky2
sky2 v1.10 addr 0xcfefc000 irq 19 Yukon-EC (0xb6) rev 2
sky2 eth0: addr 00:15:f2:7e:74:45
sky2 eth0: enabling interface
sky2 eth0: ram buffer 48K
sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control both
Comment 33 egon2003 2007-03-13 14:32:57 UTC
I am using hardware flow control if it's on by default.
The switch I am using is a Netgear GS108.


ethtool -S eth0

This is before the error has occurred. I will post the output with the error as
soon as it happens.

NIC statistics:
     tx_bytes: 486559
     rx_bytes: 5341832
     tx_broadcast: 35
     rx_broadcast: 3
     tx_multicast: 0
     rx_multicast: 0
     tx_unicast: 6246
     rx_unicast: 3852
     tx_mac_pause: 0
     rx_mac_pause: 0
     collisions: 0
     late_collision: 0
     aborted: 0
     single_collisions: 0
     multi_collisions: 0
     rx_short: 0
     rx_runt: 0
     rx_64_byte_packets: 28
     rx_65_to_127_byte_packets: 185
     rx_128_to_255_byte_packets: 98
     rx_256_to_511_byte_packets: 26
     rx_512_to_1023_byte_packets: 62
     rx_1024_to_1518_byte_packets: 3456
     rx_1518_to_max_byte_packets: 0
     rx_too_long: 0
     rx_fifo_overflow: 0
     rx_jabber: 0
     rx_fcs_error: 0
     tx_64_byte_packets: 35
     tx_65_to_127_byte_packets: 6127
     tx_128_to_255_byte_packets: 43
     tx_256_to_511_byte_packets: 21
     tx_512_to_1023_byte_packets: 53
     tx_1024_to_1518_byte_packets: 2
     tx_1519_to_max_byte_packets: 0
     tx_fifo_underrun: 0

dmesg | grep sky2
sky2 v1.10 addr 0xcdefc000 irq 17 Yukon-EC (0xb6) rev 2
sky2 eth0: addr 00:17:31:84:3f:28
sky2 eth0: enabling interface
sky2 eth0: ram buffer 48K
sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control both

Comment 34 egon2003 2007-03-13 15:11:22 UTC
Just got the error again:

Mar 13 23:05:49 elite sky2 eth0: tx timeout
Mar 13 23:05:49 elite sky2 eth0: transmit ring 59 .. 37 report=60 done=60
Mar 13 23:05:49 elite sky2 status report lost?
Mar 13 23:05:54 elite NETDEV WATCHDOG: eth0: transmit timed out
Mar 13 23:05:54 elite sky2 eth0: tx timeout
Mar 13 23:05:54 elite sky2 eth0: transmit ring 60 .. 37 report=60 done=60
Mar 13 23:05:54 elite sky2 hardware hung? flushing

and so on.....

ethtool -S eth0

with the error

NIC statistics:
     tx_bytes: 1694409496
     rx_bytes: 2089054273
     tx_broadcast: 49
     rx_broadcast: 56
     tx_multicast: 4
     rx_multicast: 0
     tx_unicast: 2228657
     rx_unicast: 2296615
     tx_mac_pause: 1
     rx_mac_pause: 0
     collisions: 0
     late_collision: 0
     aborted: 0
     single_collisions: 0
     multi_collisions: 0
     rx_short: 0
     rx_runt: 0
     rx_64_byte_packets: 340370
     rx_65_to_127_byte_packets: 376059
     rx_128_to_255_byte_packets: 38899
     rx_256_to_511_byte_packets: 44021
     rx_512_to_1023_byte_packets: 274974
     rx_1024_to_1518_byte_packets: 1222348
     rx_1518_to_max_byte_packets: 0
     rx_too_long: 0
     rx_fifo_overflow: 0
     rx_jabber: 0
     rx_fcs_error: 0
     tx_64_byte_packets: 401437
     tx_65_to_127_byte_packets: 624384
     tx_128_to_255_byte_packets: 83117
     tx_256_to_511_byte_packets: 18825
     tx_512_to_1023_byte_packets: 52777
     tx_1024_to_1518_byte_packets: 1048171
     tx_1519_to_max_byte_packets: 0
     tx_fifo_underrun: 0
Comment 35 Henrik Olsson 2007-03-17 17:13:14 UTC
I have the same problem when using the sky2 driver, though with the latest .21
rc it seems to be able to fix itself. It's annoying when streaming things though :)

cat /proc/interrupts
           CPU0       CPU1       
  0:        113          0   IO-APIC-edge      timer
  1:          2          0   IO-APIC-edge      i8042
  6:          5          0   IO-APIC-edge      floppy
  8:         30          0   IO-APIC-edge      rtc
  9:          0          0   IO-APIC-fasteoi   acpi
 12:          4          0   IO-APIC-edge      i8042
 14:      29828          0   IO-APIC-edge      ide0
 16:     221084          0   IO-APIC-fasteoi   libata, fglrx
 17:        202          0   IO-APIC-fasteoi   uhci_hcd:usb5, HDA Intel
 18:          0          0   IO-APIC-fasteoi   libata, uhci_hcd:usb3
 21:    1610499      34072   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2
 22:          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
 23:          3          0   IO-APIC-fasteoi   ohci1394
216:          1          0   PCI-MSI-edge      eth1
217:     380888    2773219   PCI-MSI-edge      eth0
218:      51615     290181   PCI-MSI-edge      libata
NMI:          0          0 
LOC:     743870     743841 
ERR:          0
MIS:          0


 dmesg|grep sky2
[   55.730269] sky2 0000:04:00.0: v1.13 addr 0xff8fc000 irq 17 Yukon-EC (0xb6) rev 2
[   55.730449] sky2 eth0: addr 00:17:31:ee:d8:46
[   55.730486] sky2 0000:03:00.0: v1.13 addr 0xff7fc000 irq 16 Yukon-EC (0xb6) rev 2
[   55.730634] sky2 eth1: addr 00:17:31:ee:d2:ce
[   55.744293] sky2 eth0: enabling interface
[   55.746101] sky2 eth0: ram buffer 48K
[   57.484538] sky2 eth0: Link is up at 100 Mbps, full duplex, flow control both
[   70.143440] Modules linked in: fglrx agpgart cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_ondemand freq_table cpufreq_conservative video sbs dock i2c_ec button battery container ac asus_acpi iptable_filter ip_tables x_tables ipv6 xfs nls_utf8 ntfs w83627ehf eeprom i2c_isa i2c_i801 fuse sbp2 parport_pc lp parport tsdev snd_hda_intel snd_hda_codec dvb_usb_vp7045 dvb_usb snd_pcm_oss snd_mixer_oss dvb_core dvb_pll psmouse snd_pcm snd_timer sg usbhid i2c_core 8139cp 8139too mii sky2 snd soundcore serio_raw rng_core snd_page_alloc shpchp pci_hotplug pcspkr floppy evdev ext3 jbd mbcache raid456 md_mod xor ohci1394 ieee1394 uhci_hcd ehci_hcd usbcore ide_generic jmicron ide_cd cdrom ide_disk piix generic sd_mod thermal processor fan
[ 1267.324218] sky2 eth0: tx timeout
[ 1267.324222] sky2 eth0: transmit ring 345 .. 322 report=350 done=350
[ 1267.324228] sky2 eth0: disabling interface
[ 1267.324702] sky2 eth0: enabling interface
[ 1267.326645] sky2 eth0: ram buffer 48K
[ 1269.009555] sky2 eth0: Link is up at 100 Mbps, full duplex, flow control both
[ 2323.600088] sky2 eth0: tx timeout
[ 2323.600092] sky2 eth0: transmit ring 109 .. 86 report=114 done=114
[ 2323.600098] sky2 eth0: disabling interface
[ 2323.600596] sky2 eth0: enabling interface
[ 2323.602540] sky2 eth0: ram buffer 48K
[ 2325.353570] sky2 eth0: Link is up at 100 Mbps, full duplex, flow control both

 lspci
00:00.0 Host bridge: Intel Corporation 975X Express Memory Controller Hub (rev c0)
00:01.0 PCI bridge: Intel Corporation 975X Express PCI Express Root Port (rev c0)
00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller (rev 01)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 01)
00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 4 (rev 01)
00:1c.4 PCI bridge: Intel Corporation 82801GR/GH/GHM (ICH7 Family) PCI Express Port 5 (rev 01)
00:1c.5 PCI bridge: Intel Corporation 82801GR/GH/GHM (ICH7 Family) PCI Express Port 6 (rev 01)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #3 (rev 01)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #4 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface Bridge (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 01)
00:1f.2 SATA controller: Intel Corporation 82801GR/GH (ICH7 Family) Serial ATA Storage Controller AHCI (rev 01)
00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01)
01:01.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
01:03.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
02:00.0 SATA controller: JMicron Technologies, Inc. JMicron 20360/20363 AHCI Controller (rev 02)
02:00.1 IDE interface: JMicron Technologies, Inc. JMicron 20360/20363 AHCI Controller (rev 02)
03:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 20)
04:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 20)
06:00.0 VGA compatible controller: ATI Technologies Inc Unknown device 7249
06:00.1 Display controller: ATI Technologies Inc Unknown device 7269

Motherboard is ASUS P5H DH Deluxe.

It seems pretty reproducible if I upload large amounts of data. I can give shell
access if needed to help solve this issue.
Comment 36 Stephen Hemminger 2007-03-19 11:28:08 UTC
I can do uploads all weekend with the same motherboard.

What is the kernel version you are using?

What are the statistics (ethtool -S eth0)?

What kind of switch? 
Comment 37 Stephen Hemminger 2007-04-18 13:19:57 UTC
*** Bug 8091 has been marked as a duplicate of this bug. ***
Comment 38 Stephen Hemminger 2007-04-18 13:20:57 UTC
*** Bug 8324 has been marked as a duplicate of this bug. ***
Comment 39 Harald Kubota 2007-05-05 10:27:35 UTC
Just want to add: still happens on 2.6.21.1. 3 times today.

syslog says:
May  6 00:07:17 burner kernel: NETDEV WATCHDOG: mv0: transmit timed out
May  6 00:07:17 burner kernel: sky2 mv0: tx timeout
May  6 00:07:17 burner kernel: sky2 mv0: transmit ring 252 .. 229 report=252 done=252
May  6 00:07:17 burner kernel: sky2 hardware hung? flushing
May  6 00:15:42 burner kernel: NETDEV WATCHDOG: mv0: transmit timed out
May  6 00:15:42 burner kernel: sky2 mv0: tx timeout
May  6 00:15:42 burner kernel: sky2 mv0: transmit ring 229 .. 206 report=252 done=252
May  6 00:15:42 burner kernel: sky2 status report lost?
May  6 00:16:07 burner kernel: NETDEV WATCHDOG: mv0: transmit timed out
May  6 00:16:07 burner kernel: sky2 mv0: tx timeout
May  6 00:16:07 burner kernel: sky2 mv0: transmit ring 252 .. 229 report=252 done=252
May  6 00:16:07 burner kernel: sky2 hardware hung? flushing

And here it did not come back after getting this in the logs:

May  6 01:22:13 burner kernel: NETDEV WATCHDOG: mv0: transmit timed out
May  6 01:22:13 burner kernel: sky2 mv0: tx timeout
May  6 01:22:13 burner kernel: sky2 mv0: transmit ring 453 .. 430 report=453 done=453
May  6 01:22:13 burner kernel: sky2 mv0: disabling interface
May  6 01:22:13 burner kernel: sky2 mv0: enabling interface
May  6 01:22:13 burner kernel: sky2 mv0: ram buffer 48K
May  6 01:22:16 burner kernel: sky2 mv0: Link is up at 1000 Mbps, full duplex, flow control both

At this point the network is plain dead. Not pingable from outside, ifconfig mv0
down and up does not change anything. Using another (until now unused) NIC
(rl8169) works immediately, so it's 'only' the sky2 driver.

lspci:
02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 20)
/proc/interrupts
           CPU0       CPU1
  0:         85          0   IO-APIC-edge      timer
  1:          0          2   IO-APIC-edge      i8042
  6:          2          1   IO-APIC-edge      floppy
  9:          0          0   IO-APIC-fasteoi   acpi
 12:          0          4   IO-APIC-edge      i8042
 14:     269957     121746   IO-APIC-edge      ide0
 15:         49         13   IO-APIC-edge      ide1
 18:     665606     403627   IO-APIC-fasteoi   mv0
 20:          0          2   IO-APIC-fasteoi   ehci_hcd:usb1, ohci_hcd:usb7
 21:          3          4   IO-APIC-fasteoi   ehci_hcd:usb2, ohci1394
 22:          0         19   IO-APIC-fasteoi   ohci_hcd:usb3
 23:     270673          1   IO-APIC-fasteoi   ohci_hcd:usb4, eth2
 24:     118010    1032119   IO-APIC-fasteoi   ohci_hcd:usb5, ohci_hcd:usb6, ohci1394
 25:          9          1   IO-APIC-fasteoi   ul0
 26:          0          0   IO-APIC-fasteoi   ALi M5455
NMI:          0          0
LOC:    1772837    1772837
ERR:          0
MIS:          0

CPU is dual core Athlon 3800+, kernel is a freshly compiled 2.6.21.1.

If there is anything I can do to get more debug output, please let me know!
Comment 40 Stephen Hemminger 2007-05-08 14:17:07 UTC
*** Bug 8453 has been marked as a duplicate of this bug. ***
Comment 41 Artem S. Tashkinov 2007-05-15 22:40:13 UTC
Here's a script which I run (nohup script &> /dev/null &) to avoid losing a
connection:

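#! /bin/sh
# Watchdog: ping a known host once a minute and reload sky2 when it stops answering.
export LANG=C

IP=192.168.1.200
LOG=/var/log/sky2
LOGDEBUG=/var/log/sky2.debug

# Record a timestamp, the ping summary and the link state for later inspection.
debug()
{
        date '+%Y-%m-%d %H:%M:%S'
        ping -c5 "$IP" 2>&1 | grep transmitted
        ethtool eth1 2>&1 | grep -E "Speed|detected"
}


while :; do
        # "> /dev/null 2>&1" rather than "&>", which a plain /bin/sh does not support
        ping -c3 "$IP" > /dev/null 2>&1
        RESULT=$?

        if [ "$RESULT" != "0" ]; then
                date '+%Y-%m-%d %H:%M:%S' >> "$LOG"
                debug >> "$LOGDEBUG"
                /sbin/ifdown eth0
                rmmod sky2
                /sbin/ifup eth0
        fi
        sleep 60
done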
Comment 42 Stephen Hemminger 2007-06-04 09:08:22 UTC
*** Bug 7579 has been marked as a duplicate of this bug. ***
Comment 43 H 2007-06-06 04:02:35 UTC
I have just done some testing with 2.6.22-rc4 on one of the machines I have with
sky2 NICs.

This one has an Asus P5W DH motherboard with two of these NICs integrated.

lspci:
02:00.0 0200: 11ab:4362 (rev 20)
03:00.0 0200: 11ab:4362 (rev 20)

02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 20)
	Subsystem: ASUSTeK Computer Inc. Marvell 88E8053 Gigabit Ethernet controller PCIe (Asus)
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 0, Cache Line Size: 16 bytes
	Interrupt: pin A routed to IRQ 218
	Region 0: Memory at fa8fc000 (64-bit, non-prefetchable) [size=16K]
	Region 2: I/O ports at a800 [size=256]
	Expansion ROM at fa8c0000 [disabled] [size=128K]
	Capabilities: [48] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [50] Vital Product Data
	Capabilities: [5c] Message Signalled Interrupts: Mask- 64bit+ Queue=0/1 Enable+
		Address: 00000000fee0300c  Data: 4172
	Capabilities: [e0] Express Legacy Endpoint IRQ 0
		Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag-
		Device: Latency L0s unlimited, L1 unlimited
		Device: AtnBtn- AtnInd- PwrInd-
		Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
		Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
		Device: MaxPayload 128 bytes, MaxReadReq 512 bytes
		Link: Supported Speed 2.5Gb/s, Width x1, ASPM L0s, Port 0
		Link: Latency L0s <256ns, L1 unlimited
		Link: ASPM Disabled RCB 128 bytes CommClk+ ExtSynch-
		Link: Speed 2.5Gb/s, Width x1

03:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 20)
	Subsystem: ASUSTeK Computer Inc. Marvell 88E8053 Gigabit Ethernet controller PCIe (Asus)
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 0, Cache Line Size: 16 bytes
	Interrupt: pin A routed to IRQ 219
	Region 0: Memory at fa9fc000 (64-bit, non-prefetchable) [size=16K]
	Region 2: I/O ports at b800 [size=256]
	Expansion ROM at fa9c0000 [disabled] [size=128K]
	Capabilities: [48] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [50] Vital Product Data
	Capabilities: [5c] Message Signalled Interrupts: Mask- 64bit+ Queue=0/1 Enable+
		Address: 00000000fee0300c  Data: 416a
	Capabilities: [e0] Express Legacy Endpoint IRQ 0
		Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag-
		Device: Latency L0s unlimited, L1 unlimited
		Device: AtnBtn- AtnInd- PwrInd-
		Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
		Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
		Device: MaxPayload 128 bytes, MaxReadReq 512 bytes
		Link: Supported Speed 2.5Gb/s, Width x1, ASPM L0s, Port 0
		Link: Latency L0s <256ns, L1 unlimited
		Link: ASPM Disabled RCB 128 bytes CommClk+ ExtSynch-
		Link: Speed 2.5Gb/s, Width x1

dmesg:
ACPI: PCI Interrupt 0000:03:00.0[A] -> GSI 19 (level, low) -> IRQ 17
PCI: Setting latency timer of device 0000:03:00.0 to 64
sky2 0000:03:00.0: v1.14 addr 0xfa9fc000 irq 17 Yukon-EC (0xb6) rev 2
sky2 eth1: addr 00:18:f3:e0:46:f0
ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 16 (level, low) -> IRQ 16
PCI: Setting latency timer of device 0000:02:00.0 to 64
sky2 0000:02:00.0: v1.14 addr 0xfa8fc000 irq 16 Yukon-EC (0xb6) rev 2
sky2 eth2: addr 00:18:f3:e0:22:42
...
sky2 eth1: enabling interface
sky2 eth1: ram buffer 48K
ADDRCONF(NETDEV_UP): eth1: link is not ready
sky2 eth2: enabling interface
sky2 eth2: ram buffer 48K
ADDRCONF(NETDEV_UP): eth2: link is not ready
sky2 eth2: Link is up at 1000 Mbps, full duplex, flow control both
ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
eth2: no IPv6 routers present


What I did was run these two commands simultaneously:
$ ssh 192.168.1.1 "cat /dev/urandom" >/dev/null
and 
$ cat /dev/urandom | ssh 192.168.1.1 "cat >/dev/null"

And sure enough, pretty soon (once after about an hour, once after about 20
minutes) it stopped working.

The ping I had running looked like this:
64 bytes from 192.168.1.1: icmp_seq=786 ttl=64 time=0.225 ms
64 bytes from 192.168.1.1: icmp_seq=787 ttl=64 time=0.270 ms
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available

But nothing new appeared in dmesg.

Stopping all the above commands did not make it recover, only unloading and
reloading the sky2 module helped.
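
To pin down when the stall starts during such a run, the ping output can be fed through a small filter like this one. It is only a sketch: last_good_seq is a made-up name, and it assumes ping prints "icmp_seq=N" on replies and "sendmsg: No buffer space available" on send failures, as in the output above.

```shell
# Read ping output on stdin and report the last reply seen before the first
# "No buffer space available" send error.
# last_good_seq is a hypothetical helper; the matched strings are taken from
# the ping output quoted above.
last_good_seq() {
    awk '
        /icmp_seq=/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^icmp_seq=/) {
                    sub(/^icmp_seq=/, "", $i)
                    last = $i            # remember the most recent reply
                }
        }
        /sendmsg: No buffer space available/ {
            print "stalled after icmp_seq=" last
            found = 1
            exit                         # only the first failure matters
        }
        END { if (!found) print "no stall seen" }'
}
```

Typical use would be to save the ping output to a file during the test and then run: last_good_seq < ping.log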
Comment 44 Stephen Hemminger 2007-06-06 09:08:28 UTC
The last test showed you have a different problem.
You ran out of memory on the system. Sorry, that isn't a driver problem.
Comment 45 Artem S. Tashkinov 2007-06-06 10:18:44 UTC
I cannot agree with comment 44, because it was said earlier:

> Stopping all the above commands did not make it recover, only unloading and
> reloading the sky2 module helped.

I still wonder: if this bug is so easy to trigger and reproduce, why is a patch
still missing?
Comment 46 H 2007-06-06 11:31:32 UTC
I have to agree that it MAY be a different problem than what I have observed
otherwise (at those times getting the "hw csum" errors in dmesg).

I don't however understand why running those commands would cause the machine to
run out of memory. (It's not a machine with low memory, it has 2 GB, so it's not
right on the edge to start with.)

I personally think the reason why it ran out of buffer space was because the
sky2 driver stopped working properly. (The memory problem being an effect, not
the cause.)


The problem I mentioned above only happens when using the onboard sky2 rather
than the e1000 PCI card that I normally use (to avoid the sky2 driver problems).
Comment 47 H 2007-06-07 05:23:59 UTC
FWIW, the current sk98lin driver version from Marvell (v10.0.5.3) with kernel
2.6.21.3 seems to be a working setup with the above mentioned card.

(Their driver does not seem to compile out of the box with 2.6.22-rc4, which is
why I switched kernel versions to get a quick test done with that.)

I don't know if that can be of any use for tracking down the issues. (It implies
that the hardware is ok, if nothing else.)


Dmesg output with that driver for comparison:
ACPI: PCI Interrupt 0000:03:00.0[A] -> GSI 19 (level, low) -> IRQ 17
sk98lin: Network Device Driver v10.0.5.3
(C)Copyright 1999-2007 Marvell(R).
PCI: Setting latency timer of device 0000:03:00.0 to 64
eth1: Marvell Yukon 88E8053 Gigabit Ethernet Controller
      PrefPort:A  RlmtMode:Check Link State
ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 16 (level, low) -> IRQ 16
PCI: Setting latency timer of device 0000:02:00.0 to 64
eth2: Marvell Yukon 88E8053 Gigabit Ethernet Controller
      PrefPort:A  RlmtMode:Check Link State
...
eth2: network connection up using port A
    speed:           1000
    autonegotiation: yes
    duplex mode:     full
    flowctrl:        symmetric
    role:            slave
    irq moderation:  disabled
    tcp offload:     enabled
    scatter-gather:  enabled
    tx-checksum:     enabled
    rx-checksum:     enabled
    rx-polling:      enabled
ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
eth2: no IPv6 routers present
Comment 48 Stephen Hemminger 2007-06-25 17:52:14 UTC
Created attachment 11880 [details]
NAPI poll fix patch against 2.6.22-rc4 or later.

Recheck for more status in NAPI poll
Comment 49 H 2007-06-27 12:36:23 UTC
I have been running some tests with 2.6.22-rc6 + the patch above (id=11880) on one of my machines and have yet to get sky2 to stop working the way it previously did.

I'll need some further testing to be sure, but the initial results seem promising.
Comment 50 H 2007-06-27 12:42:25 UTC
Unfortunately, I have to retract my previous statement. Just minutes after posting that, I got the same kind of behavior as before: sky2 stopped working, leaving rmmod sky2 followed by modprobe sky2 as the only way to get it working again.
Comment 51 Stephen Hemminger 2007-07-09 13:55:56 UTC
Created attachment 11984 [details]
missed IRQ workaround

This patch uses the existing idle timeout to poll for lost IRQs. It works around all known stress test failures.
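
For anyone who wants to check from user space whether interrupts stop arriving during a stall, the per-device count in /proc/interrupts can be summed with a helper like the following. This is a diagnostic sketch only, not what the attached patch does: irq_count is a made-up name, and it assumes the /proc/interrupts layout shown earlier in this report (IRQ number, one count per CPU, controller type, then a comma-separated device list).

```shell
# Sum the per-CPU interrupt counts for one device from /proc/interrupts text
# read on stdin. irq_count is a hypothetical helper name.
irq_count() {
    dev=$1
    awk -v dev="$dev" '
        {
            # Look for the device among the names at the end of the line;
            # shared IRQs list several names separated by ", ".
            hit = 0
            for (i = 2; i <= NF; i++) {
                name = $i
                sub(/,$/, "", name)
                if (name == dev) hit = 1
            }
            if (!hit) next
            # The numeric fields right after the IRQ number are per-CPU counts.
            total = 0
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) total += $i
            print total
            exit
        }'
}
```

Taking two readings a few seconds apart under load (irq_count eth0 < /proc/interrupts) and seeing the same number both times would be consistent with missed interrupts, though it does not prove that cause.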
Comment 52 H 2007-07-13 05:13:35 UTC
Great, Stephen!

It makes sense too, as 2.6.16 was the last working version (at least for me).

Will you send this patch to the stable team?