Distribution: Debian unstable

Hardware Environment: AMD Athlon(tm) XP 1600+, 512MB mem size

0000:00:00.0 Host bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo KT266/A/333]
0000:00:01.0 PCI bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo KT266/A/333 AGP]
0000:00:06.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 78)
    Subsystem: 3Com Corporation 3C905C-TX Fast Etherlink for PC Management NIC
    Flags: bus master, medium devsel, latency 64, IRQ 10
    I/O ports at ec00 [size=128]
    Memory at dfffff80 (32-bit, non-prefetchable) [size=128]
    Expansion ROM at dffc0000 [disabled] [size=128K]
    Capabilities: [dc] Power Management version 2
0000:00:11.0 ISA bridge: VIA Technologies, Inc. VT8233 PCI to ISA Bridge
0000:00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
0000:01:00.0 VGA compatible controller: ATI Technologies Inc Rage 128 RL/VR AGP

/proc/interrupts:
       CPU0
  0: 168625658 XT-PIC timer
  1: 1381 XT-PIC i8042
  2: 0 XT-PIC cascade
 10: 245915268 XT-PIC eth0
 14: 12461309 XT-PIC ide0
 15: 27 XT-PIC ide1
NMI: 0
ERR: 0

Problem Description: Kernel displays the following error message in the logs -

NETDEV WATCHDOG: eth0: transmit timed out
eth0: transmit timed out, tx_status 00 status 8000.
diagnostics: net 0ccc media 8880 dma 000000a0 fifo 0000
Flags; bus-master 1, dirty 128746672(0) current 128746688(0)
Transmit list 1f9b4200 vs. df9b4200.
0: @df9b4200 length 00000036 status 0c0005ea
1: @df9b42a0 length 00000036 status 0c0005ea
2: @df9b4340 length 00000036 status 0c0005ea
3: @df9b43e0 length 00000036 status 0c0005ea
4: @df9b4480 length 00000036 status 0c0005ea
5: @df9b4520 length 00000036 status 0c0005ea
6: @df9b45c0 length 00000036 status 0c0005ea
7: @df9b4660 length 00000036 status 0c0005ea
8: @df9b4700 length 00000036 status 0c0005ea
9: @df9b47a0 length 00000036 status 0c0005e2
10: @df9b4840 length 00000042 status 0c0005ea
11: @df9b48e0 length 00000042 status 0c0005ea
12: @df9b4980 length 00000036 status 0c0005e2
13: @df9b4a20 length 800005ea status 0c0005ea
14: @df9b4ac0 length 80000036 status 80000036
15: @df9b4b60 length 00000036 status 8c0005e2
eth0: Resetting the Tx ring pointer.

Steps to reproduce: It is very random, happens under higher network load.
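One thing that can be read off the "Flags;" line of such a dump is how many Tx descriptors were still outstanding when the watchdog fired. The following is only a rough sketch: it assumes that "dirty" counts descriptors already reclaimed and "current" counts descriptors queued so far, that the number in parentheses is the ring index, and that the ring has 16 slots (matching the 16 entries printed above). The function name is made up for illustration.

```python
import re

# Assumption: the Tx ring has 16 descriptors, matching entries 0..15 in the dump.
TX_RING_SIZE = 16

# Matches e.g. "dirty 128746672(0) current 128746688(0)".
FLAGS_RE = re.compile(r"dirty (\d+)\((\d+)\) current (\d+)\((\d+)\)")


def tx_ring_occupancy(flags_line, ring_size=TX_RING_SIZE):
    """Return how many descriptors are outstanding according to a 'Flags;' line."""
    match = FLAGS_RE.search(flags_line)
    if match is None:
        raise ValueError("no dirty/current counters found in line")
    dirty, dirty_idx, current, current_idx = (int(g) for g in match.groups())
    outstanding = current - dirty
    return {
        "outstanding": outstanding,
        "ring_full": outstanding >= ring_size,
        # Sanity check of the assumed meaning of the value in parentheses.
        "indices_consistent": (dirty % ring_size == dirty_idx
                               and current % ring_size == current_idx),
    }


if __name__ == "__main__":
    line = "Flags; bus-master 1, dirty 128746672(0) current 128746688(0)"
    print(tx_ring_occupancy(line))
```

Under these assumptions the dump above shows 16 outstanding descriptors, i.e. the ring was completely full and nothing was being reclaimed when the timeout hit.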
I have the same problem with a "3Com Corporation 3c905B 100BaseTX [Cyclone] (rev 34)". It appears after a long time of uploading at a slow rate (between 2 hours and 24 hours at 60 KB/s), or after a short time if the rate is higher. In my syslog file:

Aug 22 12:33:41 kayak kernel: NETDEV WATCHDOG: eth0: transmit timed out
Aug 22 12:33:41 kayak kernel: eth0: transmit timed out, tx_status 00 status e000.
Aug 22 12:33:41 kayak kernel: diagnostics: net 0cda media 8880 dma 000000a0 fifo 8000
Aug 22 12:33:41 kayak kernel: Flags; bus-master 1, dirty 117999(15) current 118015(15)
Aug 22 12:33:41 kayak kernel: Transmit list 1ff7cb60 vs. dff7cb60.
Aug 22 12:33:41 kayak kernel: 0: @dff7c200 length 800005ba status 000005ba
Aug 22 12:33:41 kayak kernel: 1: @dff7c2a0 length 800005ba status 000005ba
Aug 22 12:33:41 kayak kernel: 2: @dff7c340 length 800005ba status 000005ba
Aug 22 12:33:41 kayak kernel: 3: @dff7c3e0 length 800005ba status 000005ba
Aug 22 12:33:41 kayak kernel: 4: @dff7c480 length 80000042 status 00000042
Aug 22 12:33:41 kayak kernel: 5: @dff7c520 length 800005ba status 000005ba
Aug 22 12:33:41 kayak kernel: 6: @dff7c5c0 length 800005ba status 000005ba
Aug 22 12:33:41 kayak kernel: 7: @dff7c660 length 800005ba status 000005ba
Aug 22 12:33:41 kayak kernel: 8: @dff7c700 length 800005ba status 000005ba
Aug 22 12:33:41 kayak kernel: 9: @dff7c7a0 length 800005ba status 000005ba
Aug 22 12:33:41 kayak kernel: 10: @dff7c840 length 800005ba status 000005ba
Aug 22 12:33:41 kayak kernel: 11: @dff7c8e0 length 800005ba status 000005ba
Aug 22 12:33:41 kayak kernel: 12: @dff7c980 length 800005ba status 000005ba
Aug 22 12:33:41 kayak kernel: 13: @dff7ca20 length 80000302 status 80000302
Aug 22 12:33:41 kayak kernel: 14: @dff7cac0 length 800005ba status 800005ba
Aug 22 12:33:41 kayak kernel: 15: @dff7cb60 length 800005ba status 000005ba
Aug 22 12:33:41 kayak kernel: eth0: Resetting the Tx ring pointer.

If you want more details about this, please ask me. Thanks,
This bug seems to appear only when the network card is in promiscuous mode. Petr, can you confirm?
You might have been just lucky with the promiscuous mode. This was certainly not my case.
Sorry... I also have the same problem, with or without promiscuous mode!
Same problem:

Distribution: Debian Sarge (stable)
Kernel: 2.6.16.9
Hardware: Intel(R) Pentium(R) 4 CPU 2.80GHz

lspci:
0000:00:00.0 Host bridge: ServerWorks GCNB-LE Host Bridge (rev 32)
0000:00:00.1 Host bridge: ServerWorks GCNB-LE Host Bridge
0000:00:03.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5702X Gigabit Ethernet (rev 02)
0000:00:04.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 78)
0000:00:05.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 78)
0000:00:06.0 Ethernet controller: 3Com Corporation 3c940 10/100/1000Base-T [Marvell] (rev 10)
0000:00:09.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
0000:00:0f.0 ISA bridge: ServerWorks CSB6 South Bridge (rev a0)
0000:00:0f.1 IDE interface: ServerWorks CSB6 RAID/IDE Controller (rev a0)
0000:00:0f.2 USB Controller: ServerWorks CSB6 OHCI USB Controller (rev 05)
0000:00:0f.3 Host bridge: ServerWorks GCLE-2 Host Bridge

/proc/interrupts:
  0: 79610834 XT-PIC timer
  2: 0 XT-PIC cascade
  3: 3420250 XT-PIC eth1
  5: 1 XT-PIC eth2
  8: 4 XT-PIC rtc
  9: 0 XT-PIC acpi, skge
 12: 4074200 XT-PIC eth0
 14: 1756907 XT-PIC ide0
 15: 12 XT-PIC ide1
NMI: 0
ERR: 2342688

lsmod:
sch_sfq 4864 6
sch_htb 14464 2
cls_fw 4096 3
cls_u32 7300 10
sch_hfsc 16256 3
sch_prio 4224 5
ipt_TOS 2048 1
ipt_multiport 2304 5
ipt_layer7 10540 26
xt_MARK 2432 26
xt_mark 1664 2
xt_CONNMARK 2176 2
xt_limit 2176 1
xt_tcpudp 3072 2946
xt_state 1792 32
ipt_SAME 2048 2
iptable_mangle 2304 1
iptable_nat 6788 1
iptable_filter 2432 1
ip_tables 9928 3 iptable_mangle,iptable_nat,iptable_filter
x_tables 9220 12 ipt_TOS,ipt_multiport,ipt_layer7,xt_MARK,xt_mark,xt_CONNMARK,xt_limit,xt_tcpudp,xt_state,ipt_SAME,iptable_nat,ip_tables
pcspkr 2564 0
8250_pnp 8320 0
8250 18068 1 8250_pnp
serial_core 15872 1 8250
floppy 55364 0
i2c_piix4 7696 0
i2c_core 16272 1 i2c_piix4
skge 31888 0
3c59x 39208 0
tg3 94980 0
ip_nat_tftp 1536 0
ip_conntrack_tftp 3320 1 ip_nat_tftp
ip_conntrack_proto_sctp 7300 0
ip_nat_pptp 4740 0
ip_conntrack_pptp 8464 1 ip_nat_pptp
ip_nat_irc 2176 0
ip_conntrack_irc 5232 1 ip_nat_irc
ip_nat_ftp 2944 0
ip_conntrack_ftp 6128 1 ip_nat_ftp
ip_nat_amanda 1792 0
ip_nat 13740 7 ipt_SAME,iptable_nat,ip_nat_tftp,ip_nat_pptp,ip_nat_irc,ip_nat_ftp,ip_nat_amanda
ip_conntrack_amanda 3336 1 ip_nat_amanda
ip_conntrack 44236 15 xt_CONNMARK,xt_state,iptable_nat,ip_nat_tftp,ip_conntrack_tftp,ip_conntrack_proto_sctp,ip_nat_pptp,ip_conntrack_pptp,ip_nat_irc,ip_conntrack_irc,ip_nat_ftp,ip_conntrack_ftp,ip_nat_amanda,ip_nat,ip_conntrack_amanda
nfnetlink 4888 2 ip_nat,ip_conntrack
8021q 16520 0
psmouse 35336 0
ide_cd 33696 0
cdrom 34208 1 ide_cd

Same log in kernel.log:

kernel: NETDEV WATCHDOG: eth2: transmit timed out
kernel: eth2: transmit timed out, tx_status 00 status e000.
kernel: diagnostics: net 0ccc media 8880 dma 000000a0 fifo 0000
kernel: Flags; bus-master 1, dirty 166045(13) current 166061(13)
kernel: Transmit list 36f2da20 vs. f6f2da20.
kernel: 0: @f6f2d200 length 80000036 status 00000036
kernel: 1: @f6f2d2a0 length 8000003e status 0000003e
kernel: 2: @f6f2d340 length 80000036 status 00000036
kernel: 3: @f6f2d3e0 length 8000003e status 0000003e
kernel: 4: @f6f2d480 length 8000003e status 0000003e
kernel: 5: @f6f2d520 length 800005ea status 000005ea
kernel: 6: @f6f2d5c0 length 800005ea status 000005ea
kernel: 7: @f6f2d660 length 8000003e status 0000003e
kernel: 8: @f6f2d700 length 800005ae status 000005ae
kernel: 9: @f6f2d7a0 length 800005ea status 000005ea
kernel: 10: @f6f2d840 length 80000036 status 00000036
kernel: 11: @f6f2d8e0 length 800000ca status 800000ca
kernel: 12: @f6f2d980 length 800000d9 status 800000d9
kernel: 13: @f6f2da20 length 80000a8d status 00000a8d
kernel: 14: @f6f2dac0 length 800005ea status 000005ea
kernel: 15: @f6f2db60 length 800005ea status 000005ea
kernel: eth2: Resetting the Tx ring pointer.

The failing NICs are eth1 and eth2; eth3 never fails, but it has very little traffic.
I have a second server with the same hardware but kernel 2.6.15 without problems.
How does it work with newer kernels? If the problem is still there, can you please do a git bisect (from 2.6.15 to 2.6.16.X)? Thanks.
OK, I will try in a few days. I haven't used this network card for a long time, since I changed my server. I will try with kernel 2.6.18.
I've got the same problem. I ran tests on kernels 2.6.18, 2.6.20 and 2.6.22, with and without APIC; after a while my network card stops responding and gives me the same messages. I guess this is a bug in the 3c59x driver! I had 5 PCI 3Com 905B cards and all of them showed the same problem. Now I'm using an Intel Pro 100, and there are no problems.
BTW: I changed the mainboard. I tested those cards on an AMD K8 AM2 with an nVidia chipset, and on my AMD K7 Athlon XP 1800+ with VIA KT400 and KT600. Same problems. I have a dual-CPU Intel PIII 1000MHz with a VIA chipset. I'll run a test on this system; if the problem occurs there too, it is a driver problem.
The problem comes from some parameters in the driver: TX_RING_SIZE and RX_RING_SIZE. max_interrupt_work is also a problem, even for a desktop station! Those parameters are so small that all these problems occur. I think it is time for new kernel versions to update the driver parameters to these values:

RX_RING_SIZE = 256
TX_RING_SIZE = 256
max_interrupt_work = 1024; /* or 2048 */

The bug is also described here: http://bugzilla.kernel.org/show_bug.cgi?id=7440

I built a patch and it is available on my website: http://linux.mantech.ro

I ran the NIC with the modified driver for 1 day with those parameters under high load, and there is no sign of problems :). I hope that is the end of my problems with these NICs.
I found another problem. The TX ring is reset when the watchdog expires. If I use a watchdog value larger than the 5000 ms (5 seconds) default, about 10000 ms, the driver seems to work fine. Compare this message: kernel: NETDEV WATCHDOG: eth0: transmit timed out. Raise the watchdog value when you load the driver (see vortex.txt in the Documentation directory of the kernel source).
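For anyone who wants to try the larger watchdog without rebuilding the driver, here is a small sketch that assembles the module load command. It assumes the driver is built as the 3c59x module (as in the lsmod output earlier in this report) and that `watchdog` (in milliseconds) and `max_interrupt_work` are accepted as load-time options, as vortex.txt describes. The helper name is made up. The ring sizes are compile-time constants, so they cannot be set this way.

```python
def modprobe_command(watchdog_ms=10000, max_interrupt_work=None):
    """Build the argument list for loading 3c59x with a longer Tx watchdog.

    watchdog_ms: Tx watchdog timeout in milliseconds (10000 was reported to help).
    max_interrupt_work: optional; only added to the command when given.
    """
    if watchdog_ms <= 0:
        raise ValueError("watchdog_ms must be positive")
    args = ["modprobe", "3c59x", f"watchdog={watchdog_ms}"]
    if max_interrupt_work is not None:
        if max_interrupt_work <= 0:
            raise ValueError("max_interrupt_work must be positive")
        args.append(f"max_interrupt_work={max_interrupt_work}")
    return args


if __name__ == "__main__":
    # Only prints the command; run it as root after unloading the module.
    print(" ".join(modprobe_command(10000, max_interrupt_work=1024)))
```

Note that a longer watchdog only delays the ring reset; it does not explain why the transmitter stalls in the first place.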
Mar 3 09:42:37 hiyono [25556.647041] NETDEV WATCHDOG: eth1: transmit timed out
Mar 3 09:42:37 hiyono [25556.647057] eth1: transmit timed out, tx_status 00 status 8601.
Mar 3 09:42:37 hiyono [25556.647070] diagnostics: net 0cd2 media 8880 dma 0000003a fifo 0000
Mar 3 09:42:37 hiyono [25556.647080] eth1: Interrupt posted but not delivered -- IRQ blocked by another device?
Mar 3 09:42:37 hiyono [25556.647389] Flags; bus-master 1, dirty 2041697(1) current 2041697(1)
Mar 3 09:42:37 hiyono [25556.647399] Transmit list 00000000 vs. fffff800fb0f4260.
Mar 3 09:42:37 hiyono [25556.647409] 0: @fffff800fb0f4200 length 8000004d status 8c01004d
Mar 3 09:42:37 hiyono [25556.647416] 1: @fffff800fb0f4260 length 8000006c status 0001006c
Mar 3 09:42:37 hiyono [25556.647424] 2: @fffff800fb0f42c0 length 8000008e status 0001008e
Mar 3 09:42:37 hiyono [25556.647431] 3: @fffff800fb0f4320 length 8000004a status 0001004a
Mar 3 09:42:37 hiyono [25556.647439] 4: @fffff800fb0f4380 length 8000004d status 0c01004d
Mar 3 09:42:37 hiyono [25556.647447] 5: @fffff800fb0f43e0 length 8000004d status 0c01004d
Mar 3 09:42:37 hiyono [25556.647454] 6: @fffff800fb0f4440 length 8000004d status 0c01004d
Mar 3 09:42:37 hiyono [25556.647462] 7: @fffff800fb0f44a0 length 8000004d status 0c01004d
Mar 3 09:42:37 hiyono [25556.647470] 8: @fffff800fb0f4500 length 8000004d status 0c01004d
Mar 3 09:42:37 hiyono [25556.647477] 9: @fffff800fb0f4560 length 8000004d status 0c01004d
Mar 3 09:42:37 hiyono [25556.647485] 10: @fffff800fb0f45c0 length 8000004d status 0c01004d
Mar 3 09:42:37 hiyono [25556.647493] 11: @fffff800fb0f4620 length 8000004b status 0c01004b
Mar 3 09:42:37 hiyono [25556.647500] 12: @fffff800fb0f4680 length 80000050 status 0c010050
Mar 3 09:42:37 hiyono [25556.647508] 13: @fffff800fb0f46e0 length 80000050 status 0c010050
Mar 3 09:42:37 hiyono [25556.647515] 14: @fffff800fb0f4740 length 8000004d status 0c01004d
Mar 3 09:42:37 hiyono [25556.647523] 15: @fffff800fb0f47a0 length 8000004d status 8c01004d
Mar 3 09:42:37 hiyono [25556.647533] eth1: Resetting the Tx ring pointer.

This is on sparc64. Didn't happen with 2.6.21, happens with 2.6.24.3; once it happens a reboot is needed :/.
Adryan, since the problem is still there, is it possible for you to post the patch to netdev@vger.kernel.org and/or lkml?
Roc's problem looks different. In his case the IntLatch flag is set, which means that there are pending interrupts, and this does not seem to be the case for the others. The line

eth1: Interrupt posted but not delivered -- IRQ blocked by another device?

indicates that. The driver's ISR is probably just not executed for some reason. This is already the second report of this kind of problem on sparc. It came in between 2.6.23-rc4 and 2.6.23.1. Roc, can you confirm this? Perhaps something in the generic interrupt handling of sparc changed; I'm going to ask the sparc people about that.

Regarding the problem of the others, unfortunately I'm not able to reproduce it, even though I tried hard. So it would be very helpful if somebody who sees this problem frequently could bisect it. Perhaps we need Adryan's patch, but we should not apply anything without knowing why it is necessary. The driver supports a lot of different NICs, and it is therefore pretty easy to fix something at one end and break something at the other. This would not be the first time in the long history of the driver.
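To make the IntLatch comparison concrete, here is a small sketch that decodes the "status" word printed in the timeout message. The bit names and values are written down from memory of the status enum in 3c59x.c and should be treated as an assumption to check against the source; bits above 0x1000 are ignored here because they are not interrupt flags.

```python
# Assumed bit layout of the status word, as recalled from 3c59x.c.
STATUS_BITS = [
    (0x0001, "IntLatch"),
    (0x0002, "HostError"),
    (0x0004, "TxComplete"),
    (0x0008, "TxAvailable"),
    (0x0010, "RxComplete"),
    (0x0020, "RxEarly"),
    (0x0040, "IntReq"),
    (0x0080, "StatsFull"),
    (0x0100, "DMADone"),
    (0x0200, "DownComplete"),
    (0x0400, "UpComplete"),
    (0x0800, "DMAInProgress"),
    (0x1000, "CmdInProgress"),
]


def decode_status(status):
    """Return the names of the flag bits set in a 16-bit status word."""
    return [name for mask, name in STATUS_BITS if status & mask]


if __name__ == "__main__":
    # Status values taken from the timeout messages in this report.
    for value in (0x8000, 0xE000, 0x8601):
        print(f"{value:04x}: {decode_status(value)}")
```

With this table, 8601 decodes to IntLatch plus DownComplete and UpComplete, while 8000 and e000 from the other reports have none of these flags set.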
Created attachment 15757 [details] [PATCH] Support ringsize changes with ethtool

OK, I think it is time to find a solution here. I attached a patch to support ringsize changes via ethtool, so everybody who has problems with the standard values can set the ringsize from userspace. I wrote that patch yesterday, so it has had just some basic testing, but it appeared to work for me. These changes made it necessary to adapt the suspend/resume code of the driver too. This is included in the patch but completely untested so far, so be careful if you use suspend/resume.

The maximum value for the tx ring is 512 and for the rx ring 1024, but we can change this to other values if needed. The standard values are left untouched for now, but we can change them later. It would be great if somebody could test the attached patch. It is based on current git, but should apply to 2.6.24 too.

One more thing to help spot the source of the problem: could somebody please try to set

#define tx_interrupt_mitigation 0

in the driver's code and see whether this changes something?
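For testers, a minimal sketch of how the ring sizes could be set once the patch is applied. It builds an `ethtool -G` command line and checks the requested values against the maxima mentioned above (tx 512, rx 1024). The helper name is made up, and the interface name and sizes are placeholders.

```python
import subprocess

# Maxima stated for the patch in this comment.
MAX_TX_RING = 512
MAX_RX_RING = 1024


def ethtool_ring_command(iface, rx=None, tx=None):
    """Build an 'ethtool -G' argument list for the given ring sizes."""
    if rx is None and tx is None:
        raise ValueError("specify at least one of rx or tx")
    cmd = ["ethtool", "-G", iface]
    if rx is not None:
        if not 1 <= rx <= MAX_RX_RING:
            raise ValueError(f"rx must be between 1 and {MAX_RX_RING}")
        cmd += ["rx", str(rx)]
    if tx is not None:
        if not 1 <= tx <= MAX_TX_RING:
            raise ValueError(f"tx must be between 1 and {MAX_TX_RING}")
        cmd += ["tx", str(tx)]
    return cmd


if __name__ == "__main__":
    cmd = ethtool_ring_command("eth0", rx=256, tx=256)
    print(" ".join(cmd))
    # To apply for real (needs root and the patched driver):
    # subprocess.run(cmd, check=True)
```

The current settings can be read back with `ethtool -g eth0` to confirm that the change took effect.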
What's the status of this patch? Has this been brought up on the netdev ML in the meantime?
Folding into 7440 *** This bug has been marked as a duplicate of bug 7440 ***