Bug 6444 - (net 3c59x) transmit timed out
Summary: (net 3c59x) transmit timed out
Status: RESOLVED DUPLICATE of bug 7440
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: i386 Linux
Importance: P2 normal
Assignee: Steffen Klassert
URL:
Keywords:
Depends on:
Blocks: 7440
Reported: 2006-04-26 04:53 UTC by Petr Sebor
Modified: 2009-03-17 08:57 UTC
CC List: 2 users

See Also:
Kernel Version: 2.6.16.7
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
[PATCH] Support ringsize changes with ethtool (16.72 KB, patch)
2008-04-15 02:35 UTC, Steffen Klassert

Description Petr Sebor 2006-04-26 04:53:05 UTC
Distribution: 
Debian unstable

Hardware Environment:
AMD Athlon(tm) XP 1600+, 512MB mem size
0000:00:00.0 Host bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo KT266/A/333]
0000:00:01.0 PCI bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo KT266/A/333 AGP]
0000:00:06.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 78)
        Subsystem: 3Com Corporation 3C905C-TX Fast Etherlink for PC Management NIC
        Flags: bus master, medium devsel, latency 64, IRQ 10
        I/O ports at ec00 [size=128]
        Memory at dfffff80 (32-bit, non-prefetchable) [size=128]
        Expansion ROM at dffc0000 [disabled] [size=128K]
        Capabilities: [dc] Power Management version 2
0000:00:11.0 ISA bridge: VIA Technologies, Inc. VT8233 PCI to ISA Bridge
0000:00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
0000:01:00.0 VGA compatible controller: ATI Technologies Inc Rage 128 RL/VR AGP

/proc/interrupts:
           CPU0
  0:  168625658          XT-PIC  timer
  1:       1381          XT-PIC  i8042
  2:          0          XT-PIC  cascade
 10:  245915268          XT-PIC  eth0
 14:   12461309          XT-PIC  ide0
 15:         27          XT-PIC  ide1
NMI:          0
ERR:          0

Problem Description: 
Kernel displays the following error message in the logs -
NETDEV WATCHDOG: eth0: transmit timed out
eth0: transmit timed out, tx_status 00 status 8000.
  diagnostics: net 0ccc media 8880 dma 000000a0 fifo 0000
  Flags; bus-master 1, dirty 128746672(0) current 128746688(0)
  Transmit list 1f9b4200 vs. df9b4200.
  0: @df9b4200  length 00000036 status 0c0005ea
  1: @df9b42a0  length 00000036 status 0c0005ea
  2: @df9b4340  length 00000036 status 0c0005ea
  3: @df9b43e0  length 00000036 status 0c0005ea
  4: @df9b4480  length 00000036 status 0c0005ea
  5: @df9b4520  length 00000036 status 0c0005ea
  6: @df9b45c0  length 00000036 status 0c0005ea
  7: @df9b4660  length 00000036 status 0c0005ea
  8: @df9b4700  length 00000036 status 0c0005ea
  9: @df9b47a0  length 00000036 status 0c0005e2
  10: @df9b4840  length 00000042 status 0c0005ea
  11: @df9b48e0  length 00000042 status 0c0005ea
  12: @df9b4980  length 00000036 status 0c0005e2
  13: @df9b4a20  length 800005ea status 0c0005ea
  14: @df9b4ac0  length 80000036 status 80000036
  15: @df9b4b60  length 00000036 status 8c0005e2
eth0: Resetting the Tx ring pointer.

Steps to reproduce:
It is very random, happens under higher network load.
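
The `Flags;` line in the dump above carries enough information to tell how full the Tx ring was when the watchdog fired. The following is a minimal sketch, not driver code. It assumes (this is not stated in the report) that the number in parentheses is the counter modulo a 16-entry ring, which matches the 16 descriptors printed. `parse_flags_line` and `RING_SIZE` are hypothetical names used for illustration only.

```python
import re

# Assumption: 16-entry Tx ring, inferred from the 16 descriptors in the dump.
RING_SIZE = 16

FLAGS_RE = re.compile(r"dirty (\d+)\((\d+)\) current (\d+)\((\d+)\)")


def parse_flags_line(line):
    """Return ring occupancy details from a 'Flags;' line of the dump."""
    m = FLAGS_RE.search(line)
    if m is None:
        raise ValueError("not a Flags line: %r" % line)
    dirty, dirty_idx, current, current_idx = (int(g) for g in m.groups())
    return {
        "dirty": dirty,
        "current": current,
        # Descriptors queued by the driver but not yet reclaimed.
        "outstanding": current - dirty,
        # True when each printed index equals its counter modulo RING_SIZE.
        "indexes_consistent": (dirty % RING_SIZE == dirty_idx
                               and current % RING_SIZE == current_idx),
    }


# Line copied from the problem description above.
line = "Flags; bus-master 1, dirty 128746672(0) current 128746688(0)"
info = parse_flags_line(line)
print(info["outstanding"], info["indexes_consistent"])
```

For the line above, `current - dirty` is 16, so under the stated assumption every slot of the ring was occupied when the timeout fired.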
Comment 1 Michel Grentzinger 2006-08-22 05:51:19 UTC
I have the same problem with a "3Com Corporation 3c905B 100BaseTX [Cyclone] (rev 34)".

It appears after a long period of uploading at a slow rate (between 2 hours and
24 hours at 60 KB/s), or after a short time if the rate is higher.

In my syslog file:

Aug 22 12:33:41 kayak kernel: NETDEV WATCHDOG: eth0: transmit timed out
Aug 22 12:33:41 kayak kernel: eth0: transmit timed out, tx_status 00 status e000.
Aug 22 12:33:41 kayak kernel:   diagnostics: net 0cda media 8880 dma 000000a0 fifo 8000
Aug 22 12:33:41 kayak kernel:   Flags; bus-master 1, dirty 117999(15) current 118015(15)
Aug 22 12:33:41 kayak kernel:   Transmit list 1ff7cb60 vs. dff7cb60.
Aug 22 12:33:41 kayak kernel:   0: @dff7c200  length 800005ba status 000005ba
Aug 22 12:33:41 kayak kernel:   1: @dff7c2a0  length 800005ba status 000005ba
Aug 22 12:33:41 kayak kernel:   2: @dff7c340  length 800005ba status 000005ba
Aug 22 12:33:41 kayak kernel:   3: @dff7c3e0  length 800005ba status 000005ba
Aug 22 12:33:41 kayak kernel:   4: @dff7c480  length 80000042 status 00000042
Aug 22 12:33:41 kayak kernel:   5: @dff7c520  length 800005ba status 000005ba
Aug 22 12:33:41 kayak kernel:   6: @dff7c5c0  length 800005ba status 000005ba
Aug 22 12:33:41 kayak kernel:   7: @dff7c660  length 800005ba status 000005ba
Aug 22 12:33:41 kayak kernel:   8: @dff7c700  length 800005ba status 000005ba
Aug 22 12:33:41 kayak kernel:   9: @dff7c7a0  length 800005ba status 000005ba
Aug 22 12:33:41 kayak kernel:   10: @dff7c840  length 800005ba status 000005ba
Aug 22 12:33:41 kayak kernel:   11: @dff7c8e0  length 800005ba status 000005ba
Aug 22 12:33:41 kayak kernel:   12: @dff7c980  length 800005ba status 000005ba
Aug 22 12:33:41 kayak kernel:   13: @dff7ca20  length 80000302 status 80000302
Aug 22 12:33:41 kayak kernel:   14: @dff7cac0  length 800005ba status 800005ba
Aug 22 12:33:41 kayak kernel:   15: @dff7cb60  length 800005ba status 000005ba
Aug 22 12:33:41 kayak kernel: eth0: Resetting the Tx ring pointer.

If you want more details about this, please ask me.

Thanks,
Comment 2 Michel Grentzinger 2006-08-28 00:35:49 UTC
This bug seems to appear only when the network card is in promiscuous mode.

Petr, can you confirm?
Comment 3 Petr Sebor 2006-08-28 01:28:29 UTC
You might have been just lucky with the promiscuous mode. This was certainly not
my case.
Comment 4 Michel Grentzinger 2006-09-20 09:54:24 UTC
Sorry...
I also have the same problem with or without promiscuous mode!
Comment 5 Sebastian Serrano 2006-12-13 04:59:19 UTC
Same problem:
Distribution: Debian Sarge (stable)
Kernel: 2.6.16.9

Hardware:
Intel(R) Pentium(R) 4 CPU 2.80GHz

lspci: 
0000:00:00.0 Host bridge: ServerWorks GCNB-LE Host Bridge (rev 32)
0000:00:00.1 Host bridge: ServerWorks GCNB-LE Host Bridge
0000:00:03.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5702X Gigabit Ethernet (rev 02)
0000:00:04.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 78)
0000:00:05.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 78)
0000:00:06.0 Ethernet controller: 3Com Corporation 3c940 10/100/1000Base-T [Marvell] (rev 10)
0000:00:09.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
0000:00:0f.0 ISA bridge: ServerWorks CSB6 South Bridge (rev a0)
0000:00:0f.1 IDE interface: ServerWorks CSB6 RAID/IDE Controller (rev a0)
0000:00:0f.2 USB Controller: ServerWorks CSB6 OHCI USB Controller (rev 05)
0000:00:0f.3 Host bridge: ServerWorks GCLE-2 Host Bridge

/proc/interrupts
  0:   79610834          XT-PIC  timer
  2:          0          XT-PIC  cascade
  3:    3420250          XT-PIC  eth1
  5:          1          XT-PIC  eth2
  8:          4          XT-PIC  rtc
  9:          0          XT-PIC  acpi, skge
 12:    4074200          XT-PIC  eth0
 14:    1756907          XT-PIC  ide0
 15:         12          XT-PIC  ide1
NMI:          0
ERR:    2342688

lsmod:
sch_sfq                 4864  6
sch_htb                14464  2
cls_fw                  4096  3
cls_u32                 7300  10
sch_hfsc               16256  3
sch_prio                4224  5
ipt_TOS                 2048  1
ipt_multiport           2304  5
ipt_layer7             10540  26
xt_MARK                 2432  26
xt_mark                 1664  2
xt_CONNMARK             2176  2
xt_limit                2176  1
xt_tcpudp               3072  2946
xt_state                1792  32
ipt_SAME                2048  2
iptable_mangle          2304  1
iptable_nat             6788  1
iptable_filter          2432  1
ip_tables               9928  3 iptable_mangle,iptable_nat,iptable_filter
x_tables                9220  12 ipt_TOS,ipt_multiport,ipt_layer7,xt_MARK,xt_mark,xt_CONNMARK,xt_limit,xt_tcpudp,xt_state,ipt_SAME,iptable_nat,ip_tables
pcspkr                  2564  0
8250_pnp                8320  0
8250                   18068  1 8250_pnp
serial_core            15872  1 8250
floppy                 55364  0
i2c_piix4               7696  0
i2c_core               16272  1 i2c_piix4
skge                   31888  0
3c59x                  39208  0
tg3                    94980  0
ip_nat_tftp             1536  0
ip_conntrack_tftp       3320  1 ip_nat_tftp
ip_conntrack_proto_sctp     7300  0
ip_nat_pptp             4740  0
ip_conntrack_pptp       8464  1 ip_nat_pptp
ip_nat_irc              2176  0
ip_conntrack_irc        5232  1 ip_nat_irc
ip_nat_ftp              2944  0
ip_conntrack_ftp        6128  1 ip_nat_ftp
ip_nat_amanda           1792  0
ip_nat                 13740  7 ipt_SAME,iptable_nat,ip_nat_tftp,ip_nat_pptp,ip_nat_irc,ip_nat_ftp,ip_nat_amanda
ip_conntrack_amanda     3336  1 ip_nat_amanda
ip_conntrack           44236  15 xt_CONNMARK,xt_state,iptable_nat,ip_nat_tftp,ip_conntrack_tftp,ip_conntrack_proto_sctp,ip_nat_pptp,ip_conntrack_pptp,ip_nat_irc,ip_conntrack_irc,ip_nat_ftp,ip_conntrack_ftp,ip_nat_amanda,ip_nat,ip_conntrack_amanda
nfnetlink               4888  2 ip_nat,ip_conntrack
8021q                  16520  0
psmouse                35336  0
ide_cd                 33696  0
cdrom                  34208  1 ide_cd


Same messages in kernel.log:

kernel: NETDEV WATCHDOG: eth2: transmit timed out
kernel: eth2: transmit timed out, tx_status 00 status e000.
kernel:   diagnostics: net 0ccc media 8880 dma 000000a0 fifo 0000
kernel:   Flags; bus-master 1, dirty 166045(13) current 166061(13)
kernel:   Transmit list 36f2da20 vs. f6f2da20.
kernel:   0: @f6f2d200  length 80000036 status 00000036
kernel:   1: @f6f2d2a0  length 8000003e status 0000003e
kernel:   2: @f6f2d340  length 80000036 status 00000036
kernel:   3: @f6f2d3e0  length 8000003e status 0000003e
kernel:   4: @f6f2d480  length 8000003e status 0000003e
kernel:   5: @f6f2d520  length 800005ea status 000005ea
kernel:   6: @f6f2d5c0  length 800005ea status 000005ea
kernel:   7: @f6f2d660  length 8000003e status 0000003e
kernel:   8: @f6f2d700  length 800005ae status 000005ae
kernel:   9: @f6f2d7a0  length 800005ea status 000005ea
kernel:   10: @f6f2d840  length 80000036 status 00000036
kernel:   11: @f6f2d8e0  length 800000ca status 800000ca
kernel:   12: @f6f2d980  length 800000d9 status 800000d9
kernel:   13: @f6f2da20  length 80000a8d status 00000a8d
kernel:   14: @f6f2dac0  length 800005ea status 000005ea
kernel:   15: @f6f2db60  length 800005ea status 000005ea
kernel: eth2: Resetting the Tx ring pointer.


The failing NICs are eth1 and eth2; eth3 never fails, but it carries very little traffic.

I have a second server with the same hardware but kernel 2.6.15 without problems.
Comment 6 Natalie Protasevich 2007-07-19 00:20:36 UTC
How does it work with new kernels?
Could you please run git bisect (from 2.6.15 to 2.6.16.X) if the problem is still there?
Thanks.
Comment 7 Michel Grentzinger 2007-08-24 05:32:30 UTC
OK, I will try in a few days.
I haven't used this network card for a long time, since I changed my server. I will try with kernel 2.6.18.
Comment 8 Adryan Ban 2007-10-07 07:34:35 UTC
I've got the same problem. I ran tests on kernels 2.6.18, 2.6.20 and 2.6.22, with and without APIC; after a while my network card stops responding and gives me the same messages.

I guess it is a bug in the 3c59x driver! I had five 3Com 905B PCI cards and all of them show the same problem. I'm now using an Intel Pro 100, and there are no problems.
Comment 9 Adryan Ban 2007-10-07 07:37:56 UTC
BTW: I also changed the mainboard. I tested those cards on an AMD K8 AM2 with an nVidia chipset and on my AMD K7 Athlon XP 1800+ with VIA KT400 and KT600. Same problems. I also have a dual-CPU Intel PIII 1000 MHz with a VIA chipset.

I'll run a test on this system; if the problem occurs there too, it is a driver problem.
Comment 10 Adryan Ban 2007-10-08 04:36:45 UTC
The problem comes from some parameters in the driver: TX_RING_SIZE and RX_RING_SIZE. max_interrupt_work is also a problem, even for a desktop machine! These values are too small, and that causes all the problems. I think it is time for new kernel versions to update the driver parameters to these values:

RX_RING_SIZE = 256
TX_RING_SIZE = 256

max_interrupt_work = 1024; /* or 2048 */

The bug is also described here: http://bugzilla.kernel.org/show_bug.cgi?id=7440

I built a patch; it is available on my website: http://linux.mantech.ro

I ran the NIC with the modified driver and these parameters for 1 day under high load, and there is no sign of problems :). I hope this is the end of my problems with these NICs.
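
To put the proposed ring sizes in perspective, the arithmetic below estimates how long a full Tx ring of maximum-size frames takes to drain at 100 Mbit/s. This is a rough sketch, not a measurement. The 1538-byte on-wire frame size (1514 bytes of frame plus 24 bytes for preamble, FCS and inter-frame gap) and the helper name `ring_drain_ms` are assumptions made for illustration, and the ring size of 16 is inferred from the 16-entry dumps in this report.

```python
# Assumptions: 100 Mbit/s link, full-size frames, 1538 bytes on the wire each
# (1514 bytes of frame + 24 bytes of preamble, FCS and inter-frame gap).
WIRE_BYTES_PER_FRAME = 1514 + 24
LINK_BITS_PER_SECOND = 100_000_000


def ring_drain_ms(ring_size):
    """Milliseconds needed to send ring_size full-size frames back to back."""
    bits = ring_size * WIRE_BYTES_PER_FRAME * 8
    return bits * 1000.0 / LINK_BITS_PER_SECOND


# 16 is the ring size inferred from the dumps; 256 is the value proposed above.
for size in (16, 256):
    print("ring of %3d frames: about %.1f ms" % (size, ring_drain_ms(size)))
```

Under these assumptions a 16-entry ring holds roughly 2 ms of full-size traffic and a 256-entry ring roughly 31 ms.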
Comment 11 Adryan Ban 2007-10-18 09:28:07 UTC
I found another problem. The Tx ring is reset when the watchdog expires. If I use a watchdog value larger than 5000 ms (5 seconds), for example about 10000 ms, the driver seems to work fine.

Look for this message:

kernel: NETDEV WATCHDOG: eth0: transmit timed out

Raise the watchdog value when you load the driver (see vortex.txt in the Documentation directory of the kernel source).
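
To compare how often the watchdog fires before and after changing the timeout, one can count the `NETDEV WATCHDOG` lines per interface in the kernel log. The following is a minimal sketch: the function name `count_watchdog_timeouts` is hypothetical, and the sample lines are copied from this report.

```python
import re
from collections import Counter

WATCHDOG_RE = re.compile(r"NETDEV WATCHDOG: (\S+): transmit timed out")


def count_watchdog_timeouts(lines):
    """Count 'NETDEV WATCHDOG' transmit timeouts per interface name."""
    counts = Counter()
    for line in lines:
        m = WATCHDOG_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


# Sample lines taken from the logs quoted in this report.
sample = [
    "kernel: NETDEV WATCHDOG: eth0: transmit timed out",
    "kernel: eth0: transmit timed out, tx_status 00 status e000.",
    "kernel: NETDEV WATCHDOG: eth2: transmit timed out",
]
counts = count_watchdog_timeouts(sample)
print(dict(counts))
```

On a real system the lines would come from the kernel log, for example by passing an open file object instead of `sample`.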
Comment 12 Roc Vallès 2008-03-04 16:02:57 UTC
Mar  3 09:42:37 hiyono [25556.647041] NETDEV WATCHDOG: eth1: transmit timed out
Mar  3 09:42:37 hiyono [25556.647057] eth1: transmit timed out, tx_status 00 status 8601.
Mar  3 09:42:37 hiyono [25556.647070]   diagnostics: net 0cd2 media 8880 dma 0000003a fifo 0000
Mar  3 09:42:37 hiyono [25556.647080] eth1: Interrupt posted but not delivered -- IRQ blocked by another device?
Mar  3 09:42:37 hiyono [25556.647389]   Flags; bus-master 1, dirty 2041697(1) current 2041697(1)
Mar  3 09:42:37 hiyono [25556.647399]   Transmit list 00000000 vs. fffff800fb0f4260.
Mar  3 09:42:37 hiyono [25556.647409]   0: @fffff800fb0f4200  length 8000004d status 8c01004d
Mar  3 09:42:37 hiyono [25556.647416]   1: @fffff800fb0f4260  length 8000006c status 0001006c
Mar  3 09:42:37 hiyono [25556.647424]   2: @fffff800fb0f42c0  length 8000008e status 0001008e
Mar  3 09:42:37 hiyono [25556.647431]   3: @fffff800fb0f4320  length 8000004a status 0001004a
Mar  3 09:42:37 hiyono [25556.647439]   4: @fffff800fb0f4380  length 8000004d status 0c01004d
Mar  3 09:42:37 hiyono [25556.647447]   5: @fffff800fb0f43e0  length 8000004d status 0c01004d
Mar  3 09:42:37 hiyono [25556.647454]   6: @fffff800fb0f4440  length 8000004d status 0c01004d
Mar  3 09:42:37 hiyono [25556.647462]   7: @fffff800fb0f44a0  length 8000004d status 0c01004d
Mar  3 09:42:37 hiyono [25556.647470]   8: @fffff800fb0f4500  length 8000004d status 0c01004d
Mar  3 09:42:37 hiyono [25556.647477]   9: @fffff800fb0f4560  length 8000004d status 0c01004d
Mar  3 09:42:37 hiyono [25556.647485]   10: @fffff800fb0f45c0  length 8000004d status 0c01004d
Mar  3 09:42:37 hiyono [25556.647493]   11: @fffff800fb0f4620  length 8000004b status 0c01004b
Mar  3 09:42:37 hiyono [25556.647500]   12: @fffff800fb0f4680  length 80000050 status 0c010050
Mar  3 09:42:37 hiyono [25556.647508]   13: @fffff800fb0f46e0  length 80000050 status 0c010050
Mar  3 09:42:37 hiyono [25556.647515]   14: @fffff800fb0f4740  length 8000004d status 0c01004d
Mar  3 09:42:37 hiyono [25556.647523]   15: @fffff800fb0f47a0  length 8000004d status 8c01004d
Mar  3 09:42:37 hiyono [25556.647533] eth1: Resetting the Tx ring pointer.

This is on sparc64. Didn't happen with 2.6.21, happens with 2.6.24.3; once it happens a reboot is needed :/.
Comment 13 Natalie Protasevich 2008-03-20 12:30:44 UTC
Adryan, since the problem is still there, is it possible for you to post the patch to netdev@vger.kernel.org and/or lkml?
Comment 14 Steffen Klassert 2008-03-21 00:13:02 UTC
Roc's problem looks different. In his case the IntLatch flag is set, which
means that there are pending interrupts, and this does not seem to be the case for the others.

The line

eth1: Interrupt posted but not delivered -- IRQ blocked by another device?

indicates that.

The driver's ISR is probably just not executed for some reason. This is
already the second report of this kind of problem on a sparc. It came in
between 2.6.23-rc4 and 2.6.23.1. Roc, can you confirm this?
Perhaps something in the generic interrupt handling of the sparc changed;
I'm going to ask the sparc people about that.

Regarding the problem of the others, unfortunately I'm not able to reproduce it,
even though I tried hard. So it would be very helpful if somebody who sees
this problem frequently could bisect it. Perhaps we need Adryan's patch,
but we should not apply anything without knowing why it is necessary. The driver
supports a lot of different NICs and it is therefore pretty easy to fix
something at one end and break something at the other. This would not be the first time in the long history of the driver.
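
The difference between the reports can be read off the `status` word printed in each "transmit timed out" line. The sketch below decodes those words. The bit names are given as recalled from the `vortex_status` enum in 3c59x.c, so treat the table as an assumption and check it against the driver source you are running; `decode_status` is a hypothetical helper name.

```python
# Bit names as recalled from the vortex_status enum in 3c59x.c; treat them as
# an assumption and check them against the driver source you are running.
STATUS_BITS = {
    0x0001: "IntLatch",
    0x0002: "HostError",
    0x0004: "TxComplete",
    0x0008: "TxAvailable",
    0x0010: "RxComplete",
    0x0020: "RxEarly",
    0x0040: "IntReq",
    0x0080: "StatsFull",
    0x0100: "DMADone",
    0x0200: "DownComplete",
    0x0400: "UpComplete",
    0x0800: "DMAInProgress",
    0x1000: "CmdInProgress",
}


def decode_status(status):
    """Return the names of the known flag bits set in a status word."""
    return [name for bit, name in sorted(STATUS_BITS.items()) if status & bit]


# Status words taken from the "transmit timed out" lines in this report.
for status in (0x8000, 0xE000, 0x8601):
    print("%04x: %s" % (status, ", ".join(decode_status(status)) or "(none)"))
```

Of the three status words, only 8601 from the sparc64 report has bit 0 (IntLatch) set. The upper bits in 8000 and e000 fall outside this table and are simply ignored here.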
Comment 15 Steffen Klassert 2008-04-15 02:35:22 UTC
Created attachment 15757 [details]
[PATCH] Support ringsize changes with ethtool

Ok, I think it is time to find a solution here. I attached a patch to support
ringsize changes via ethtool. So everybody who has problems with the standard
values can set the ringsize from userspace. I wrote that patch yesterday, so it
has just some basic testing but it appeared to work for me. These changes made
it necessary to adapt the suspend/resume code of the driver too. That change is
included in the patch but is completely untested so far, so be careful if you use
suspend/resume.

The maximum value for the tx ring is 512 and for the rx ring 1024 but we can
change this to other values if needed. The standard values are left untouched 
for now, but we can change them later.

It would be great if somebody could test the attached patch. It is based on
git current, but should apply to 2.6.24 too.

One more thing to spot the source of the problem, could somebody please try to
set
#define tx_interrupt_mitigation 0
in the driver's code and see whether this changes anything?
Comment 16 Roland Kletzing 2008-12-31 05:52:06 UTC
What's the status of this patch?
Has this been brought up on the netdev ML in the meantime?
Comment 17 Alan 2009-03-17 08:57:10 UTC
Folding into 7440


*** This bug has been marked as a duplicate of bug 7440 ***
