Bug 7579

Summary:	Sky2 receive checksum errors
Product:	Drivers	Reporter:	Badalian Slava (slavon.net)
Component:	Network	Assignee:	Stephen Hemminger (stephen)
Status:	REJECTED DUPLICATE
Severity:	high	CC:	bryce, bunk, flyboy, grail, rakhal, stephen, zigamlinar
Priority:	P2
Hardware:	i386
OS:	Linux
Kernel Version:	2.6.18	Subsystem:
Regression:	---	Bisected commit-id:
Attachments:	config; Filtered messages log; detect and turn off receive checksum

Description Badalian Slava 2006-11-24 07:09:06 UTC

Distribution:
Linux ns1 2.6.18-gentoo #1 SMP Thu Oct 5 20:49:00 MSD 2006 i686 Intel(R)
Pentium(R) 4 CPU 3.00GHz GenuineIntel GNU/Linux
Hardware Environment:

00:00.0 Host bridge: Intel Corporation 915G/P/GV/GL/PL/910GL Express Memory
Controller Hub (rev 04)
        Subsystem: Intel Corporation 915G/P/GV/GL/PL/910GL Express Memory
Controller Hub
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort+ >SERR- <PERR-
        Latency: 0
        Capabilities: [e0] Vendor Specific Information

00:1c.0 PCI bridge: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) PCI
Express Port 1 (rev 03) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR+ FastB2B-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
        Latency: 0, Cache Line Size: 16 bytes
        Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
        I/O behind bridge: 0000e000-0000efff
        Memory behind bridge: fff00000-000fffff
        Prefetchable memory behind bridge: 00000000fff00000-0000000000000000
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
        Capabilities: [40] Express Root Port (Slot+) IRQ 0
                Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag+
                Device: Latency L0s unlimited, L1 unlimited
                Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
                Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                Device: MaxPayload 128 bytes, MaxReadReq 128 bytes
                Link: Supported Speed 2.5Gb/s, Width x1, ASPM L0s L1, Port 1
                Link: Latency L0s <1us, L1 <4us
                Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch-
                Link: Speed 2.5Gb/s, Width x0
                Slot: AtnBtn- PwrCtrl- MRL- AtnInd- PwrInd- HotPlug+ Surpise+
                Slot: Number 1, PowerLimit 0.000000
                Slot: Enabled AtnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq-
                Slot: AttnInd Unknown, PwrInd Unknown, Power-
                Root: Correctable- Non-Fatal- Fatal- PME-
        Capabilities: [80] Message Signalled Interrupts: 64bit- Queue=0/0 Enable-
                Address: 00000000  Data: 0000
        Capabilities: [90] #0d [0000]
        Capabilities: [a0] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:1c.1 PCI bridge: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) PCI
Express Port 2 (rev 03) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR+ FastB2B-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
        Latency: 0, Cache Line Size: 16 bytes
        Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
        I/O behind bridge: 0000d000-0000dfff
        Memory behind bridge: dff00000-dfffffff
        Prefetchable memory behind bridge: 00000000fff00000-0000000000000000
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
        Capabilities: [40] Express Root Port (Slot+) IRQ 0
                Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag+
                Device: Latency L0s unlimited, L1 unlimited
                Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
                Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                Device: MaxPayload 128 bytes, MaxReadReq 128 bytes
                Link: Supported Speed 2.5Gb/s, Width x1, ASPM L0s L1, Port 2
                Link: Latency L0s <1us, L1 <4us
                Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch-
                Link: Speed 2.5Gb/s, Width x1
                Slot: AtnBtn- PwrCtrl- MRL- AtnInd- PwrInd- HotPlug+ Surpise+
                Slot: Number 2, PowerLimit 0.000000
                Slot: Enabled AtnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq-
                Slot: AttnInd Unknown, PwrInd Unknown, Power-
                Root: Correctable- Non-Fatal- Fatal- PME-
        Capabilities: [80] Message Signalled Interrupts: 64bit- Queue=0/0 Enable-
                Address: 00000000  Data: 0000
        Capabilities: [90] #0d [0000]
        Capabilities: [a0] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d3) (prog-if 01
[Subtractive decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR+ FastB2B-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
        Latency: 0
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=32
        I/O behind bridge: 0000c000-0000cfff
        Memory behind bridge: de000000-dfefffff
        Prefetchable memory behind bridge: 0000000050000000-0000000050000000
        Secondary status: 66MHz- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort+ <SERR- <PERR-
        BridgeCtl: Parity- SERR+ NoISA- VGA+ MAbort- >Reset- FastB2B-
        Capabilities: [50] #0d [0000]

00:1f.0 ISA bridge: Intel Corporation 82801FB/FR (ICH6/ICH6R) LPC Interface
Bridge (rev 03)
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
        Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
        Latency: 0

00:1f.1 IDE interface: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) IDE
Controller (rev 03) (prog-if 8a [Master SecP PriP])
        Subsystem: ASUSTeK Computer Inc. P5GD1-VW Mainboard
        Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
        Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
        Latency: 0
        Interrupt: pin A routed to IRQ 18
        Region 0: I/O ports at <unassigned>
        Region 1: I/O ports at <unassigned>
        Region 2: I/O ports at <unassigned>
        Region 3: I/O ports at <unassigned>
        Region 4: I/O ports at ffa0 [size=16]

00:1f.2 IDE interface: Intel Corporation 82801FR/FRW (ICH6R/ICH6RW) SATA
Controller (rev 03) (prog-if 8f [Master SecP SecO PriP PriO])
        Subsystem: ASUSTeK Computer Inc. Unknown device 2601
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
        Latency: 0
        Interrupt: pin B routed to IRQ 19
        Region 0: I/O ports at bc00 [size=8]
        Region 1: I/O ports at b880 [size=4]
        Region 2: I/O ports at b800 [size=8]
        Region 3: I/O ports at b480 [size=4]
        Region 4: I/O ports at b400 [size=16]
        Region 5: Memory at ddfffc00 (32-bit, non-prefetchable) [size=1K]
        Capabilities: [70] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot+,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:1f.3 SMBus: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) SMBus
Controller (rev 03)
        Subsystem: ASUSTeK Computer Inc. P5GD1-VW Mainboard
        Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
        Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
        Interrupt: pin B routed to IRQ 0
        Region 4: I/O ports at 0400 [size=32]

01:0b.0 VGA compatible controller: ATI Technologies Inc 3D Rage II+ 215GTB
[Mach64 GTB] (rev 9a) (prog-if 00 [VGA])
        Subsystem: ATI Technologies Inc 3D Rage II+ 215GTB [Mach64 GTB]
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping+ SERR- FastB2B-
        Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
        Latency: 64 (2000ns min), Cache Line Size: 16 bytes
        Region 0: Memory at de000000 (32-bit, non-prefetchable) [size=16M]
        Region 1: I/O ports at c000 [size=256]
        Region 2: Memory at dfeff000 (32-bit, non-prefetchable) [size=4K]
        Expansion ROM at 50000000 [disabled] [size=128K]

02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit
Ethernet Controller (rev 15)
        Subsystem: ASUSTeK Computer Inc. Marvell 88E8053 Gigabit Ethernet
controller PCIe (Asus)
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
        Latency: 0, Cache Line Size: 16 bytes
        Interrupt: pin A routed to IRQ 17
        Region 0: Memory at dfffc000 (64-bit, non-prefetchable) [size=16K]
        Region 2: I/O ports at d800 [size=256]
        Expansion ROM at dffc0000 [disabled] [size=128K]
        Capabilities: [48] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [50] Vital Product Data
        Capabilities: [5c] Message Signalled Interrupts: 64bit+ Queue=0/1 Enable-
                Address: 0000000000000000  Data: 0000
        Capabilities: [e0] Express Legacy Endpoint IRQ 0
                Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag-
                Device: Latency L0s unlimited, L1 unlimited
                Device: AtnBtn- AtnInd- PwrInd-
                Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
                Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr+ NoSnoop-
                Device: MaxPayload 128 bytes, MaxReadReq 512 bytes
                Link: Supported Speed 2.5Gb/s, Width x1, ASPM L0s, Port 0
                Link: Latency L0s <256ns, L1 unlimited
                Link: ASPM Disabled RCB 128 bytes CommClk- ExtSynch-
                Link: Speed 2.5Gb/s, Width x1



Problem Description:

dmesg:

ip_ct_ras: decoding error: out of bound
sky2 eth0: rx error, status 0x7ffc0001 length 96
ip_ct_ras: decoding error: out of bound
ip_ct_ras: decoding error: out of bound
ip_ct_ras: decoding error: out of bound
ip_ct_ras: decoding error: out of bound
sky2 eth0: rx error, status 0x7ffc0001 length 88
sky2 eth0: Link is down.
sky2 eth0: rx error, status 0x7ffc0001 length 88
sky2 eth0: Link is up at 100 Mbps, full duplex, flow control none
sky2 eth0: rx error, status 0x7ffc0001 length 96
eth0: hw csum failure.
 [<c02e8d43>] __skb_checksum_complete+0x67/0x69
 [<c0336868>] udp_error+0xca/0x1a8
 [<c0114d4c>] try_to_wake_up+0x40/0x401
 [<c03017d0>] ip_rcv_finish+0x0/0x27b
 [<c0334b31>] ip_conntrack_in+0xa4/0x4b8
 [<c03217ec>] udp_queue_rcv_skb+0xa8/0x2a8
 [<c03017d0>] ip_rcv_finish+0x0/0x27b
 [<c02faf06>] nf_iterate+0x66/0x8a
 [<c03017d0>] ip_rcv_finish+0x0/0x27b
 [<c03017d0>] ip_rcv_finish+0x0/0x27b
 [<c02fb079>] nf_hook_slow+0x59/0xd9
 [<c03017d0>] ip_rcv_finish+0x0/0x27b
 [<c0302073>] ip_rcv+0x304/0x51b
 [<c03017d0>] ip_rcv_finish+0x0/0x27b
 [<c02eba3d>] __net_timestamp+0x14/0x27
 [<c02ebbd2>] netif_receive_skb+0x182/0x1fc
 [<c0290040>] sky2_poll+0x514/0xa5a
 [<c02ed624>] net_rx_action+0x7d/0x10a
 [<c01200a2>] __do_softirq+0x73/0xdf
 [<c0120149>] do_softirq+0x3b/0x3d
 [<c01054f5>] do_IRQ+0x30/0x6b
 [<c01036de>] common_interrupt+0x1a/0x20
 [<c0101a99>] mwait_idle+0x2a/0x34
 [<c0101a59>] cpu_idle+0x63/0x79
 [<c042c7af>] start_kernel+0x34a/0x3fb
 [<c042c1eb>] unknown_bootoption+0x0/0x27a
eth0: hw csum failure.
 [<c02e8d43>] __skb_checksum_complete+0x67/0x69
 [<c0336868>] udp_error+0xca/0x1a8
 [<c03017d0>] ip_rcv_finish+0x0/0x27b
 [<c0334b31>] ip_conntrack_in+0xa4/0x4b8
 [<c03217ec>] udp_queue_rcv_skb+0xa8/0x2a8
 [<c03017d0>] ip_rcv_finish+0x0/0x27b
 [<c02faf06>] nf_iterate+0x66/0x8a
 [<c03017d0>] ip_rcv_finish+0x0/0x27b
 [<c03017d0>] ip_rcv_finish+0x0/0x27b
 [<c02fb079>] nf_hook_slow+0x59/0xd9
 [<c03017d0>] ip_rcv_finish+0x0/0x27b
 [<c0302073>] ip_rcv+0x304/0x51b
 [<c03017d0>] ip_rcv_finish+0x0/0x27b
 [<c02eba3d>] __net_timestamp+0x14/0x27
 [<c02ebbd2>] netif_receive_skb+0x182/0x1fc
 [<c0290040>] sky2_poll+0x514/0xa5a
 [<c02ed624>] net_rx_action+0x7d/0x10a
 [<c01200a2>] __do_softirq+0x73/0xdf
 [<c0120149>] do_softirq+0x3b/0x3d
 [<c01054f5>] do_IRQ+0x30/0x6b
 [<c01036de>] common_interrupt+0x1a/0x20
 [<c0101a99>] mwait_idle+0x2a/0x34
 [<c0101a59>] cpu_idle+0x63/0x79
 [<c042c7af>] start_kernel+0x34a/0x3fb
 [<c042c1eb>] unknown_bootoption+0x0/0x27a
ip_ct_ras: decoding error: out of bound
eth0: hw csum failure.
 [<c02e8d43>] __skb_checksum_complete+0x67/0x69
 [<c0336868>] udp_error+0xca/0x1a8
 [<c0114d4c>] try_to_wake_up+0x40/0x401
 [<c03017d0>] ip_rcv_finish+0x0/0x27b
 [<c0334b31>] ip_conntrack_in+0xa4/0x4b8
 [<c03217ec>] udp_queue_rcv_skb+0xa8/0x2a8
 [<c03017d0>] ip_rcv_finish+0x0/0x27b
 [<c02faf06>] nf_iterate+0x66/0x8a
 [<c03017d0>] ip_rcv_finish+0x0/0x27b
 [<c03017d0>] ip_rcv_finish+0x0/0x27b
 [<c02fb079>] nf_hook_slow+0x59/0xd9
 [<c03017d0>] ip_rcv_finish+0x0/0x27b
 [<c0302073>] ip_rcv+0x304/0x51b
 [<c03017d0>] ip_rcv_finish+0x0/0x27b
 [<c02eba3d>] __net_timestamp+0x14/0x27
 [<c02ebbd2>] netif_receive_skb+0x182/0x1fc
 [<c0290040>] sky2_poll+0x514/0xa5a
 [<c02ed624>] net_rx_action+0x7d/0x10a
 [<c01200a2>] __do_softirq+0x73/0xdf
 [<c0120149>] do_softirq+0x3b/0x3d
 [<c01054f5>] do_IRQ+0x30/0x6b
 [<c01036de>] common_interrupt+0x1a/0x20
 [<c0101a99>] mwait_idle+0x2a/0x34
 [<c0101a59>] cpu_idle+0x63/0x79
 [<c042c7af>] start_kernel+0x34a/0x3fb
 [<c042c1eb>] unknown_bootoption+0x0/0x27a
sky2 eth0: rx error, status 0x7ffc0001 length 80
ip_ct_ras: decoding error: out of bound
sky2 eth0: rx error, status 0x7ffc0001 length 72
eth0: hw csum failure.
 [<c02e8d43>] __skb_checksum_complete+0x67/0x69
 [<c0336868>] udp_error+0xca/0x1a8
 [<c0114d4c>] try_to_wake_up+0x40/0x401
 [<c03017d0>] ip_rcv_finish+0x0/0x27b
 [<c0334b31>] ip_conntrack_in+0xa4/0x4b8
 [<c03217ec>] udp_queue_rcv_skb+0xa8/0x2a8
 [<c03017d0>] ip_rcv_finish+0x0/0x27b
 [<c02faf06>] nf_iterate+0x66/0x8a
 [<c03017d0>] ip_rcv_finish+0x0/0x27b
 [<c03017d0>] ip_rcv_finish+0x0/0x27b
 [<c02fb079>] nf_hook_slow+0x59/0xd9
 [<c03017d0>] ip_rcv_finish+0x0/0x27b
 [<c0302073>] ip_rcv+0x304/0x51b
 [<c03017d0>] ip_rcv_finish+0x0/0x27b
 [<c02eba3d>] __net_timestamp+0x14/0x27
 [<c02ebbd2>] netif_receive_skb+0x182/0x1fc
 [<c0290040>] sky2_poll+0x514/0xa5a
 [<c02ed624>] net_rx_action+0x7d/0x10a
 [<c01200a2>] __do_softirq+0x73/0xdf
 [<c0120149>] do_softirq+0x3b/0x3d
 [<c01054f5>] do_IRQ+0x30/0x6b
 [<c01036de>] common_interrupt+0x1a/0x20
 [<c0101a99>] mwait_idle+0x2a/0x34
 [<c0101a59>] cpu_idle+0x63/0x79
 [<c042c7af>] start_kernel+0x34a/0x3fb
 [<c042c1eb>] unknown_bootoption+0x0/0x27a
sky2 eth0: Link is down.
sky2 eth0: Link is up at 100 Mbps, full duplex, flow control none
ip_ct_ras: decoding error: out of bound
ip_ct_ras: decoding error: out of bound
sky2 eth0: Link is down.
sky2 eth0: Link is up at 100 Mbps, full duplex, flow control none
ip_ct_ras: decoding error: out of bound
ip_ct_ras: decoding error: out of bound
ip_ct_ras: decoding error: out of bound
ip_ct_ras: decoding error: out of bound
ip_ct_ras: decoding error: out of bound
ip_ct_ras: decoding error: out of bound
sky2 eth0: rx error, status 0x7ffc0001 length 72
ip_ct_ras: decoding error: out of bound
sky2 eth0: Link is down.
sky2 eth0: Link is up at 100 Mbps, full duplex, flow control none

Comment 1 Stephen Hemminger 2006-11-28 10:36:06 UTC

Does it fail instantly, or only under load?  The sky2 receive error message
occurs if the receiver can't keep up with the arriving data.  How loaded is this
machine?

Comment 2 Badalian Slava 2006-11-28 13:18:04 UTC

The computer is used as a DNS server... it is 99% idle most of the time.
I have a second computer with the same problem (a freshly installed system, no
clients or connections, only 1-5 SSH sessions).

Symptoms:

The machine does not answer ping or other requests for 2-10 minutes, and after
that everything works again... Unplugging and replugging the network cable also
makes everything work again...

Comment 3 Stephen Hemminger 2006-11-28 13:29:58 UTC

I need some more information:
1. What is the system name/motherboard? Perhaps I can find one to try and
reproduce the problem.

2. What is the output of the driver on boot up? (dmesg | grep sky2)
    The driver prints chip version information.

Please retry with 2.6.18.3 or 2.6.19-rc6; there were fixes for sky2 after 2.6.18.

Comment 4 Badalian Slava 2006-11-28 22:18:21 UTC

1. OS: Gentoo, latest version and latest portage. I can't see the motherboard
name; the computer is 200 km away from me =( The second computer now has 2 e1000
cards and is connected to the first computer... I can't see its motherboard name
either =( I can get you any other information that can be collected from Linux
remotely.
2. sky2 v1.7 addr 0xdfffc000 irq 17 Yukon-EC (0xb6) rev 1

OK, I will try 2.6.18.3 and post the results, but the bug may take up to a week
to reproduce...

Comment 5 Badalian Slava 2006-11-28 22:42:21 UTC

After update to 2.6.18.3

ns1 ~ # uname -a
Linux ns1 2.6.18-gentoo-r3 #1 SMP Wed Nov 29 09:32:49 MSK 2006 i686 Intel(R)
Pentium(R) 4 CPU 3.00GHz GenuineIntel GNU/Linux

ns1 ~ # dmesg | grep sky2
sky2 v1.5 addr 0xdfffc000 irq 17 Yukon-EC (0xb6) rev 1
sky2 eth0: addr 00:11:2f:88:9d:e4
sky2 eth0: enabling interface
sky2 eth0: Link is up at 100 Mbps, full duplex, flow control none

The version did not change ;)

Comment 6 Stephen Hemminger 2006-11-29 12:03:01 UTC

Does turning off receive checksumming fix the problem?
  ethtool -K eth0 rx off

Are you using both ports?

Comment 7 Stephen Hemminger 2006-11-29 12:40:25 UTC

What is the kernel configuration?

What is the IRQ assignment? i.e., cat /proc/interrupts

Comment 8 Badalian Slava 2006-11-30 03:46:01 UTC

2.6.18.3

ip_ct_ras: decoding error: out of bound
eth0: hw csum failure.
 [<c02e9245>] __skb_checksum_complete+0x67/0x69
 [<c0336bc8>] udp_error+0xca/0x1a8
 [<c0114d6c>] try_to_wake_up+0x40/0x401
 [<c0301b50>] ip_rcv_finish+0x0/0x27b
 [<c0334e91>] ip_conntrack_in+0xa4/0x4b8
 [<c0321b3c>] udp_queue_rcv_skb+0xa8/0x2a8
 [<c0301b50>] ip_rcv_finish+0x0/0x27b
 [<c02fb286>] nf_iterate+0x66/0x8a
 [<c0301b50>] ip_rcv_finish+0x0/0x27b
 [<c0301b50>] ip_rcv_finish+0x0/0x27b
 [<c02fb3f9>] nf_hook_slow+0x59/0xd9
 [<c0301b50>] ip_rcv_finish+0x0/0x27b
 [<c03023f3>] ip_rcv+0x304/0x51b
 [<c0301b50>] ip_rcv_finish+0x0/0x27b
 [<c02ebf3d>] __net_timestamp+0x14/0x27
 [<c02ec0d2>] netif_receive_skb+0x182/0x1fc
 [<c02905d1>] sky2_poll+0x545/0xa69
 [<c0274ee5>] i8042_interrupt+0x1e7/0x22a
 [<c02edb24>] net_rx_action+0x7d/0x10a
 [<c01200e2>] __do_softirq+0x73/0xdf
 [<c0120189>] do_softirq+0x3b/0x3d
 [<c01054f5>] do_IRQ+0x30/0x6b
 [<c01036de>] common_interrupt+0x1a/0x20
 [<c0101a99>] mwait_idle+0x2a/0x34
 [<c0101a59>] cpu_idle+0x63/0x79
 [<c042e7af>] start_kernel+0x34a/0x3fb
 [<c042e1eb>] unknown_bootoption+0x0/0x27a

ns1 ~ # cat /proc/interrupts
           CPU0       CPU1
  0:   26205345          0    IO-APIC-edge  timer
  7:          0          0    IO-APIC-edge  parport0
  9:          0          0   IO-APIC-level  acpi
 17:    6791063          0   IO-APIC-level  sky2
 19:      70142          0   IO-APIC-level  libata
NMI:          0          0
LOC:   26088304   26088304
ERR:          0
MIS:          0

>Does turning off receive checksumming fix the problem?
>  ethtool -K eth0 rx off

I am only trying it now... I will report whether it helps.

> Are you using both ports?

My motherboard has only one Ethernet port...

I am attaching config.gz.

Comment 9 Badalian Slava 2006-11-30 03:47:20 UTC

Created attachment 9673 [details]
config

Comment 10 Badalian Slava 2006-12-02 04:49:02 UTC

OK, now I have no error messages in dmesg, but I still have a connection problem.

Sometimes the server does not respond on its IP (no ping, no SSH connection). If
I shut down the port on the Cisco switch and bring it back up, everything is
normal again for a while, until the next outage =(

in dmesg

sky2 eth0: Link is down.
sky2 eth0: Link is up at 100 Mbps, full duplex, flow control none
sky2 eth0: Link is down.
sky2 eth0: Link is up at 100 Mbps, full duplex, flow control none

Comment 11 Daniel Berglund 2006-12-02 09:01:09 UTC

I have the same problem...

System information:

Motherboard: AOPEN i915gmm-hfs
Distribution: debian

atos:/etc# uname -a
Linux atos 2.6.18.2 #1 PREEMPT Sat Nov 18 19:28:03 CET 2006 i686 GNU/Linux

atos:/etc# lspci
0000:00:00.0 Host bridge: Intel Corporation Mobile 915GM/PM/GMS/910GML Express 
Processor to DRAM Controller (rev 03)
0000:00:02.0 VGA compatible controller: Intel Corporation Mobile 
915GM/GMS/910GML Express Graphics Controller (rev 03)
0000:00:02.1 Display controller: Intel Corporation Mobile 915GM/GMS/910GML 
Express Graphics Controller (rev 03)
0000:00:1c.0 PCI bridge: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) 
PCI Express Port 1 (rev 04)
0000:00:1c.1 PCI bridge: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) 
PCI Express Port 2 (rev 04)
0000:00:1c.2 PCI bridge: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) 
PCI Express Port 3 (rev 04)
0000:00:1c.3 PCI bridge: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) 
PCI Express Port 4 (rev 04)
0000:00:1d.0 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 
Family) USB UHCI #1 (rev 04)
0000:00:1d.1 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 
Family) USB UHCI #2 (rev 04)
0000:00:1d.2 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 
Family) USB UHCI #3 (rev 04)
0000:00:1d.3 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 
Family) USB UHCI #4 (rev 04)
0000:00:1d.7 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 
Family) USB2 EHCI Controller (rev 04)
0000:00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev d4)
0000:00:1f.0 ISA bridge: Intel Corporation 82801FBM (ICH6M) LPC Interface 
Bridge (rev 04)
0000:00:1f.1 IDE interface: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 
Family) IDE Controller (rev 04)
0000:00:1f.2 IDE interface: Intel Corporation 82801FBM (ICH6M) SATA Controller 
(rev 04)
0000:00:1f.3 SMBus: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) 
SMBus Controller (rev 04)
0000:02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E 
Gigabit Ethernet Controller (rev 19)
0000:03:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E 
Gigabit Ethernet Controller (rev 19)
0000:05:04.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] 
(rev 78)
0000:05:05.0 Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone] 
(rev 30)

atos:/etc# ethtool -i eth2
driver: sky2
version: 1.5
firmware-version: N/A
bus-info: 0000:02:00.0

Problem Description:
Recently I downloaded and compiled the 2.6.18.2 kernel because I needed to add
USB support to the kernel. Previously I ran a 2.6.14 kernel patched with sk98lin
from SysKonnect without any problem for over a year. After the kernel update
the 88E8053 Ethernet interface randomly freezes. The machine has two 3Com
Ethernet cards installed. I installed the first when the machine was new,
because when I used the 88E8053 the ISP's DHCP server activated a failsafe and
shut down the link: my Ethernet interfaces had reported over 30 different MAC
addresses to the server. I installed the second one today because I was tired
of my internal network being down due to the new sky2 bug.

Comment 12 Daniel Berglund 2006-12-02 09:08:29 UTC

Created attachment 9721 [details]
Filtered messages log

Comment 13 Stephen Hemminger 2006-12-04 12:13:29 UTC

*** Bug 7617 has been marked as a duplicate of this bug. ***

Comment 14 Stephen Hemminger 2006-12-04 12:14:50 UTC

*** Bug 7615 has been marked as a duplicate of this bug. ***

Comment 15 Stephen Hemminger 2006-12-05 16:56:56 UTC

Change title to match description

Comment 16 Stephen Hemminger 2006-12-05 16:58:00 UTC

*** Bug 7611 has been marked as a duplicate of this bug. ***

Comment 17 Stephen Hemminger 2006-12-06 15:11:06 UTC

What are the netfilter/iptables rules in use?
I need to audit those code paths.

Comment 18 Badalian Slava 2006-12-06 22:34:55 UTC

ns1 ~ # lsmod
Module                  Size  Used by
iptable_filter          3456  0

ns1 ~ # iptables-save
# Generated by iptables-save v1.3.6 on Thu Dec  7 10:41:39 2006
*filter
:INPUT ACCEPT [198729160:19918983787]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [201042059:31695809779]
COMMIT
# Completed on Thu Dec  7 10:41:39 2006

See the attached config for the modules compiled into the kernel.

Comment 19 Ziga Mlinar 2006-12-07 22:09:14 UTC

Stephen Hemminger, could you write down all the bash commands whose output you
would like to see? I will gladly post it here.

Ziga

Comment 20 Stephen Hemminger 2007-02-23 15:39:59 UTC

Created attachment 10514 [details]
detect and turn off receive checksum

I have seen a similar problem on a Sony laptop.
This patch automatically turns off hardware rx checksumming if we get a bogus
value.

Comment 21 Vedran Sego 2007-03-10 12:36:54 UTC

I have also been experiencing this problem, under several Fedora Core 6 kernels
(2.6.18 to 2.6.19-1.2911.6.5). No reboot is needed to recover; it is enough to run
rmmod sky2
ifup eth0
This is a web server with a phpBB2 forum and a proxy for only 6-7 people, i.e.
the machine is under no heavy network load (it does have a small but almost
constant load).

Before this started to happen, I could transfer hundreds of MB over the network
in one go with no trouble at all. Under heavy testing, the card stopped
responding after somewhere around 6GB was transferred (testing was done with an
8GB file).

From dmesg:
NETDEV WATCHDOG: eth0: transmit timed out
sky2 eth0: tx timeout
sky2 eth0: transmit ring 107 .. 84 report=110 done=110
sky2 status report lost?

grep sky2 /var/log/messages:
Mar 10 20:53:56 xxx kernel: sky2 eth0: tx timeout
Mar 10 20:53:56 xxx kernel: sky2 status report lost?
Mar 10 20:54:01 xxx kernel: sky2 eth0: tx timeout
Mar 10 20:54:01 xxx kernel: sky2 hardware hung? flushing

Ethernet is Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet
Controller (rev 19), built on some Intel motherboard.

Any idea which kernel this started with, so I can downgrade until it is fixed?

Comment 22 Vedran Sego 2007-03-10 13:07:40 UTC

Forgot to mention: I've read somewhere that adding idle=poll to the kernel boot
options sometimes helps (with the additional comment "I don't really understand
why it helps" :-D), and I have tried it with no visible improvement.

Heavy load DOES increase the probability of network failure; e.g., simply running
an aMule client will produce that effect.

Comment 23 Rakhal Dave 2007-03-12 05:46:17 UTC

Distribution: Debian
Linux oft10 2.6.18-3-686 #1 SMP Mon Dec 4 16:41:14 UTC 2006 i686 GNU/Linux

I want to confirm that with Debian 2.6.18-3-686 we were seeing the problem:
Feb 19 10:39:17 oft14 kernel: sky2 v1.5 addr 0xdf100000 irq 169 Yukon-EC 
(0xb6) rev 2
Feb 19 10:39:17 oft14 kernel: sky2 eth0: addr 00:03:25:27:c3:62
Feb 19 10:39:17 oft14 kernel: sky2 v1.5 addr 0xdf200000 irq 177 Yukon-EC 
(0xb6) rev 2
Feb 19 10:39:17 oft14 kernel: sky2 eth1: addr 00:03:25:27:c3:63
Feb 19 10:39:17 oft14 kernel: sky2 eth0: enabling interface
Feb 19 10:39:17 oft14 kernel: sky2 eth0: Link is up at 100 Mbps, full duplex, 
flow control both

Mar  2 17:24:14 oft14 kernel: sky2 hardware hung? flushing
Mar  2 21:55:27 oft14 kernel: sky2 status report lost?
Mar  2 22:12:31 oft14 kernel: sky2 hardware hung? flushing
As you can see above, the lockup occurs after a few days of high load; it seems
like a cumulative effect of handling high bandwidth.

Now we have upgraded to Debian 2.6.18-4-686
We are waiting to see if the event recurs and will report it if it does.
(If we see the problem again we would be willing to try different settings 
upon request - to help diagnose the problem).

Comment 24 Vedran Sego 2007-03-12 07:27:38 UTC

Stumbled upon this bug: http://bugs.gentoo.org/show_bug.cgi?id=127367

They claim that the issue (not sure if it was/is the same issue!) was solved in
sky2 1.4. Funny thing is: my FC6 kernel (2.6.19-1.2911.6.5.fc6) has sky2 1.10,
while the other machine, running an older kernel (2.6.18-1.2798.fc6), has sky2
1.5. Unless .10 is more than .5, this is confusing. The other machine is under
almost no load, so I cannot confirm that everything is OK there.

Version 1.4 is in kernel 2.6.17; 2.6.20 has 1.10.

By the way, a little workaround for when the issue arises: add to /etc/crontab:
6-56/10 * * * * root if ping -c1 -q -w1 ip1 >/dev/null 2>&1 || ping -c1 -q -w1 ip2 >/dev/null 2>&1; then :; else /sbin/rmmod sky2; /sbin/ifup eth0; date | tee /root/network_restart | /bin/mail -s "Network restart" your@email; fi

ip1, ip2 are IP addresses of some machines that are always up. This chunk pings
those machines every 10 minutes and if none of them is responding, it restarts
sky2 and eth0.

Comment 25 Vedran Sego 2007-03-19 06:07:11 UTC

I've been running kernel 2.6.16.43 with sky2 version 0.15.1 for 4.5 days now
and have had no problems with it (aMule running all the time, so there is a
constant network load which previously was enough to crash sky2 in a matter of
hours). It seems to me that this issue is resolved.

Comment 26 Rakhal Dave 2007-03-21 02:13:10 UTC

Following up on comment #23, I want to report that we again had the sky2 hang
incident this morning: 21.03.2007 09:10:00 MET
The incident seems to repeat consistently every 10 days or so under our load.
Inspired by comment #24 (thanks Vedran), we have placed the following entry
in /etc/crontab:

*  *    * * *   root    if ! (ping -c1 -q -w2 IP >/dev/null 2>&1); then /sbin/rmmod sky2; /sbin/modprobe sky2; /usr/bin/mailx -s "Sky2 Restarted" EMAIL; fi
[where IP is a machine which is always on, and EMAIL is where you want
notification]

Comment 27 Rakhal Dave 2007-03-21 02:14:54 UTC

Forgot to mention - we are using: Debian
Linux oft14 2.6.18-4-686 #1 SMP Wed Feb 21 16:06:54 UTC 2007 i686 GNU/Linux

Comment 28 Stephen Hemminger 2007-04-25 11:26:06 UTC

Can you please reproduce the problem on a 2.6.20 or later kernel?

Comment 29 H 2007-04-30 15:21:33 UTC

I have the same problem (I believe) under 2.6.21.

See: http://lkml.org/lkml/2007/4/28/105

Comment 30 Daniel Berglund 2007-06-03 09:34:46 UTC

I have updated the network driver on my machine and have now reduced the
number of network interface hangs to just a few a month. The bug is still
there. I do not find any messages in the logs when the hang occurs. Is there
something I have to configure to get the messages? (I do get the standard
messages when I start and stop the device and plug and unplug the cable.) I
bought a managed switch to rule out that the switch had anything to do with
the hangs.

atos:/# uname -a
Linux atos 2.6.20.2 #5 PREEMPT Mon Mar 12 00:34:27 CET 2007 i686 GNU/Linux
atos:/# ethtool -i eth2
driver: sky2
version: 1.10
firmware-version: N/A
bus-info: 0000:03:00.0

Comment 31 Stephen Hemminger 2007-06-04 09:08:15 UTC

The remaining problems look like the (long) list reported in another
bug, so I am going to mark this bug as a duplicate.



*** This bug has been marked as a duplicate of 7546 ***