Distribution: Gentoo
Hardware Environment: AMD Athlon64 3200+, K8N Neo4 Platinum (nForce4 chipset)
Software Environment: forcedeth kernel module
Problem Description: The nForce onboard NIC sometimes stops working and stays blocked even after a reboot. No packets come in or go out through this NIC. The following errors appear in /var/log/messages:

Apr 24 21:44:30 amit NETDEV WATCHDOG: eth0: transmit timed out
Apr 24 21:44:30 amit nv_stop_tx: TransmitterStatus remained busy<7>eth0: tx_timeout: dead entries!
Apr 24 21:44:30 amit Badness in local_bh_enable at kernel/softirq.c:140
Apr 24 21:44:30 amit
Apr 24 21:44:30 amit Call Trace: <IRQ> <ffffffff80135905>{local_bh_enable+53} <ffffffff880e3d35>{:ip_conntrack:destroy_conntrack+53}
Apr 24 21:44:30 amit <ffffffff802bf764>{__kfree_skb+196} <ffffffff88000535>{:forcedeth:nv_drain_tx+133}
Apr 24 21:44:30 amit <ffffffff880008ed>{:forcedeth:nv_tx_timeout+93} <ffffffff801e7ed0>{cursor_timer_handler+0}
Apr 24 21:44:30 amit <ffffffff802d2350>{dev_watchdog+0} <ffffffff802d23b3>{dev_watchdog+99}
Apr 24 21:44:30 amit <ffffffff801390de>{run_timer_softirq+366} <ffffffff80135833>{__do_softirq+83}
Apr 24 21:44:30 amit <ffffffff801358c5>{do_softirq+53} <ffffffff801110b7>{do_IRQ+71}
Apr 24 21:44:30 amit <ffffffff8010ec69>{ret_from_intr+0} <EOI> <ffffffff8030e184>{thread_return+0}
Apr 24 21:44:30 amit <ffffffff88072aa0>{:parport:parport_ieee1284_write_compat+0}
Apr 24 21:44:30 amit <ffffffff8010ca60>{default_idle+0} <ffffffff8010ca80>{default_idle+32}
Apr 24 21:44:30 amit <ffffffff8010cb91>{cpu_idle+49} <ffffffff804f47cf>{start_kernel+463}
Apr 24 21:44:30 amit <ffffffff804f4263>{_sinittext+611}
Apr 24 21:44:30 amit Badness in local_bh_enable at kernel/softirq.c:140
Apr 24 21:44:30 amit
Apr 24 21:44:30 amit Call Trace: <IRQ> <ffffffff80135905>{local_bh_enable+53} <ffffffff880e3d9f>{:ip_conntrack:destroy_conntrack+159}
Apr 24 21:44:30 amit <ffffffff802bf764>{__kfree_skb+196} <ffffffff88000535>{:forcedeth:nv_drain_tx+133}
Apr 24 21:44:30 amit <ffffffff880008ed>{:forcedeth:nv_tx_timeout+93} <ffffffff801e7ed0>{cursor_timer_handler+0}
Apr 24 21:44:30 amit <ffffffff802d2350>{dev_watchdog+0} <ffffffff802d23b3>{dev_watchdog+99}
Apr 24 21:44:30 amit <ffffffff801390de>{run_timer_softirq+366} <ffffffff80135833>{__do_softirq+83}
Apr 24 21:44:30 amit <ffffffff801358c5>{do_softirq+53} <ffffffff801110b7>{do_IRQ+71}
Apr 24 21:44:30 amit <ffffffff8010ec69>{ret_from_intr+0} <EOI> <ffffffff8030e184>{thread_return+0}
Apr 24 21:44:30 amit <ffffffff88072aa0>{:parport:parport_ieee1284_write_compat+0}
Apr 24 21:44:30 amit <ffffffff8010ca60>{default_idle+0} <ffffffff8010ca80>{default_idle+32}
Apr 24 21:44:30 amit <ffffffff8010cb91>{cpu_idle+49} <ffffffff804f47cf>{start_kernel+463}
Apr 24 21:44:30 amit <ffffffff804f4263>{_sinittext+611}

The only way to get out of this blocked state is to power the computer off completely (unplug it from the mains). For me it happens once or twice a day. It depends on network traffic (more traffic means a bigger chance of blocking the NIC).

The kernels that I've tried:
* Gentoo patched kernel 2.6.9-gentoo-r14 - works fine
* Gentoo patched kernel 2.6.11-gentoo-r6 - bug is present
* Vanilla kernel 2.6.12_rc3 - bug is present

See the bug report in the Gentoo bugzilla: http://bugs.gentoo.org/show_bug.cgi?id=90069
Other sources describing this bug:
http://forums.gentoo.org/viewtopic-t-320241.html
http://forums.gentoo.org/viewtopic-t-318214.html
http://forums.gentoo.org/viewtopic-t-310223.html
http://www.ussg.iu.edu/hypermail/linux/kernel/0502.0/0219.html

Steps to reproduce:
1. Boot Linux.
2. Generate some network traffic (copy a big file from/to the network). It is not easy to reproduce.
I've tried to reproduce the bug, without success: 10 hours at 80 MByte/sec, no hang with the latest kernel (2.6.12-rc5-git10) on an nForce 250-Gb board. Could you add a few more details?
- What's the link partner? A switch or a cross-over cable to another NIC?
- At which link speed do you operate? Gigabit or 100 Mbit?
- Could you send me (manfred@colorfullife.com) the source code of forcedeth.c from 2.6.9-gentoo-r14? I'm not aware of a change that might have introduced the regression.
The computer is connected to a 10/100 Mbit switch with a straight-through cable. The speed is 100 Mbit. I've sent you the forcedeth.c from 2.6.9-gentoo-r14.
Distribution: Gentoo
Hardware Environment: nForce4 chipset on 2 different systems (see below for details)
Software Environment: forcedeth kernel module
Problem Description: I have to second this problem. I have 2 computers affected by it.

The first is an Asus K8N (nForce4 chipset, Athlon64 CPU). When I use the nvidia NIC (the motherboard also has a built-in sk98lin NIC) I sometimes hit the problem. I have had it ever since I moved to a 2.6.11 kernel (all gentoo-sources versions I tried; I moved from 2.6.9 to 2.6.11 and never tried 2.6.10). It happened both when the card was plugged directly into a 100 Mbit/s card with a straight-through cable and when it was plugged into a 1000 Mbit/s switch (Netgear) with a straight-through cable.

The second is a Tyan Thunder K8WE (S2895): nForce4 chipset, dual dual-core Opteron. This board has 2 NICs from two nvidia parts ("1st from nForce™ Prof. 2200, 2nd from nForce Prof. 2050", http://www.tyan.com/products/html/thunderk8we_spec.html), each connected to a CPU. The problem occurred just 2 days ago (the system is only 3 weeks old). It currently affects only the first NIC; the second one still works properly. The first NIC is connected to a 1000 Mbit/s switch (I can't check the brand right now, but I could if needed), the second to another computer, on the second port of its Broadcom BCM5704C dual-channel integrated NIC (tg3 driver). Both use straight-through cables. The problem on this system was noticed about 2h (but it could have happened earlier) after a night-long shutdown (normally it stays on, but we had to turn it off because the building A/C was being repaired). On power-on it booted a 2.6.12-rc6 kernel (previously 2.6.12-rc5), downloaded from the kernel.org website and NOT a Gentoo kernel. We had to move to a 2.6.12 kernel to get the system working with the Interrupt Mode BIOS option set to APIC (the default); 2.6.11 gentoo-sources only worked with Interrupt Mode=PIC, and with a lot of lag.
In both cases, when the problem occurs I get (in the dmesg log):
NETDEV WATCHDOG: eth0: transmit timed out
nv_stop_tx: TransmitterStatus remained busy
and after some time:
nv_stop_tx: TransmitterStatus remained busy<7>eth0: tx_timeout: dead entries!
The "NETDEV WATCHDOG: eth0: transmit timed out" message appears from time to time, usually when I try to restart the adapter. The "nv_stop_tx: TransmitterStatus remained busy<7>eth0: tx_timeout: dead entries!" line, on the other hand, can appear more and more often, at varying rates (it looks like the rate depends on how much data is queued for transmission).

I've noticed that power cycling (unplugging the power and plugging it back in), as stated previously, corrects the problem (until it happens again). Also, on one or two occasions, I only removed the network cable and did a soft reboot, and that worked.

This problem is intermittent. Sometimes I use the Asus system for 1h before I get the problem; other times it works for over a week without any problem. I can't say it's related to high load; it seems to happen at low load as well.

One difference from the first report: it seems that the NIC is sometimes still able to receive some packets (only a few) when this problem occurs, but it is not transmitting anything.
Correction: it is an Asus A8N, not a K8N (specifically the A8N-SLI Deluxe, still an nForce4 chipset). Sorry for the typo.
I have a MSI K8N Diamond with all the same problems and errors under Debian 2.6.11-1-k7.
Same problem here on a pristine 2.6.12 on an Asus A8N-E board, which has an nForce4 chipset with a Marvell PHY. The machine is running Debian Sarge (AMD64 version) and is connected to a gigabit switch, but the problem can also happen when it is connected with a direct cable to another NIC. If you need more info just ask. Regards!
Created attachment 5234 [details] Add statistics to forcedeth tx handler
Hi all. It's a known bug, but unfortunately I do not understand what exactly causes the hang. It would be great if you could do the following:
- Bring the NIC into the hung state.
- rmmod the driver.
- Enable all debug outputs (the change is in line 124: replace "#if 0" with "#if 1"), recompile the driver, load the new driver and try to use the NIC.
- Attach the complete dmesg output.
Other interesting tests might be to use version 0.35 of the forcedeth driver [it should be in the latest git tree], to use the attached patch, or to double check that both packet receive and transmit really do not work: ping from another computer and check with "tcpdump -n" whether any packet is received. But a full debug output would be the best starting point for me.
FYI gentoo-sources-2.6.12-r2 includes forcedeth 0.35
Created attachment 5242 [details] dmesg with debug info This is the dmesg of linux-2.6.12-gentoo-r2. The network operated as normal until: nv_stop_tx: TransmitterStatus remained busy After the freeze-up I removed forcedeth, and inserted the debug version. I rebooted to Windows XP (SP2+nforce drivers) and the network card was still inoperative. Did the power-down/plug out trick. The network socket's LEDs were both lit without blinking until I plugged out the power cord. They were on even when I plugged out the network cable! After the power-down/plug out trick, the interface came back up normally. Don't hesitate to ask me to experiment on my hardware... I work all day on my PC and I experience network hangups about 2-4 times a day.
Thanks. Unfortunately you didn't wait long enough with the debug version: The initialization worked as expected. The link was detected. Packet receive worked. Packets were queued for transmission. The only thing that is missing is either the timeout from the packet transmission or the tx done interrupt. The timeout is 5 seconds: ping another computer, wait 5 seconds and then check for a line that starts with "nv_tx_done: looking at packet" in the dmesg dump.
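The check described above can be scripted once the dmesg output has been saved as text. The following is a minimal sketch, not part of the driver: the function name `classify_tx_state` and the returned dictionary are illustrative, and the two marker strings are simply the messages quoted in this report. It assumes a line may carry an optional log-level prefix such as `<7>` and an interface prefix such as `eth0:`.

```python
import re

# Marker strings as they are quoted in this bug report.
TX_DONE_MARKER = "nv_tx_done: looking at packet"
TIMEOUT_MARKER = "transmit timed out"

# Optional "<7>"-style log level and "eth0: "-style interface prefix.
PREFIX_RE = re.compile(r"^\s*(<\d>)?\s*(eth\d+:\s*)?")


def classify_tx_state(dmesg_text):
    """Report which of the two expected events shows up in a dmesg dump."""
    result = {"tx_done": False, "timeout": False}
    for line in dmesg_text.splitlines():
        body = PREFIX_RE.sub("", line, count=1)
        if body.startswith(TX_DONE_MARKER):
            result["tx_done"] = True
        if TIMEOUT_MARKER in line:
            result["timeout"] = True
    return result


if __name__ == "__main__":
    sample = (
        "NETDEV WATCHDOG: eth0: transmit timed out\n"
        "eth0: nv_tx_done: looking at packet 0, Flags 0xa0800059.\n"
    )
    print(classify_tx_state(sample))
```

If neither key is true after pinging and waiting longer than the 5 second timeout, the dump was most likely taken too early.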
Created attachment 5285 [details] Forcedeth with debug info, pinging out This is take 2 of the dmesg dump.... I've actually concatenated a bunch of dmesgs together to get this one, so it's long (deleted the duplicate lines by hand so it *should* be just like one long dmesg). It's about 10-20 seconds of me pinging out on a hung-up forcedeth connection.
Created attachment 5286 [details] Forcedeth with debug info, being pinged Like above, but I stopped the outgoing ping and started an incoming ping from another PC on the net.
Created attachment 5287 [details] Forcedeth tcpdump -n -i eth0 output Finally, while the interface was "hung" I did a tcpdump -n -i eth0 to confirm that data is actually received. Therefore, it looks like it's mainly outgoing packets that are affected. Which might explain why my NIC didn't hang yesterday: nobody copied from me...
Created attachment 5290 [details] unconditionally check for completed packets Thanks for the dmesg output - finally progress on that bug. Could you again use the stock driver until the nic hangs, and then load a nic driver with - dprintk enabled - the attached patch applied? The patch will show if the tx engine really hangs, or if it just doesn't generate interrupts.
Created attachment 5294 [details] Forcedeth with debug info, pinging out (added debug patch) Another day, another hang... ;-) Here's the dmesg with the patched forcedeth. Now it shows: eth0: nv_tx_done: looking at packet 0, Flags 0xa0800059.
Could you please output all the MAC registers (from 0x0 to 0x400) and the whole Tx ring after the hang? Thanks!
How would I do that? I assume I use something like ethtool, or if you give me a little C program I can run it too...
Created attachment 5303 [details] dump MAC registers and tx ring on timeout
Unfortunately forcedeth doesn't support the ethtool register dump command yet. I've written a patch that dumps everything interesting on a tx timeout. Could you add this patch to a normal NIC driver and then use it until it hangs? Instead of the "normal" error message that you got so far, i.e.
NETDEV WATCHDOG: eth1: transmit timed out
eth1: Got tx_timeout. irq: 00000000
the patched driver dumps around 100 lines with all registers and all tx entries. Please send us that part of your dmesg file. Thanks for your patience!
Created attachment 5329 [details] Register dump of the hung NIC FINALLY! I've been running the patch since Monday, and this morning the NIC hung! Attached, please find the register dump... Sorry, the hangs *seem* less frequent than they used to be...
Created attachment 5330 [details] Another register dump of the hung NIC Nope, it still hangs randomly. Another hang, another register dump, just in case having two dumps might be useful.
Thanks for the debug output. I see some changes that can be made and will work with Manfred to give you a patch to try out.
Hi Ayaz,
The attached patch is what you want, correct? The timeout was 5 seconds - that should be long enough to guarantee that the queue is drained.
--
Manfred

--- 2.6/drivers/net/forcedeth.c 2005-07-15 21:42:35.000000000 +0200
+++ build-2.6/drivers/net/forcedeth.c 2005-07-15 21:42:30.000000000 +0200
@@ -306,7 +306,7 @@
 #define NV_TX2_LASTPACKET (1<<29)
 #define NV_TX2_RETRYERROR (1<<18)
-#define NV_TX2_LASTPACKET1 (1<<23)
+#define NV_TX2_LASTPACKET1 (1<<30)
 #define NV_TX2_DEFERRED (1<<25)
 #define NV_TX2_CARRIERLOST (1<<26)
 #define NV_TX2_LATECOLLISION (1<<27)
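To see what the one-line change means for the descriptor reported earlier in this thread ("looking at packet 0, Flags 0xa0800059"), the flags word can be decoded against the bit definitions visible in the patch context. This is a minimal sketch under two assumptions: that the reported descriptor uses the NV_TX2_* layout, and that only the six bits named in the patch context are known; every other set bit is reported by number rather than given a name. The helper `decode_flags` is hypothetical.

```python
# Bit definitions copied from the context lines of the patch in this comment.
# NV_TX2_LASTPACKET1 is listed with its value before the patch.
NV_TX2_BEFORE = {
    "NV_TX2_RETRYERROR": 1 << 18,
    "NV_TX2_LASTPACKET1": 1 << 23,
    "NV_TX2_DEFERRED": 1 << 25,
    "NV_TX2_CARRIERLOST": 1 << 26,
    "NV_TX2_LATECOLLISION": 1 << 27,
    "NV_TX2_LASTPACKET": 1 << 29,
}

# Same table with the value proposed by the patch.
NV_TX2_AFTER = dict(NV_TX2_BEFORE, NV_TX2_LASTPACKET1=1 << 30)


def decode_flags(value, table):
    """Return (sorted names of set flags, bit numbers set but not in the table)."""
    named = sorted(name for name, mask in table.items() if value & mask)
    known = 0
    for mask in table.values():
        known |= mask
    unnamed_bits = [bit for bit in range(32) if ((value & ~known) >> bit) & 1]
    return named, unnamed_bits


if __name__ == "__main__":
    flags = 0xA0800059  # value quoted from the debug output in this thread
    print("before patch:", decode_flags(flags, NV_TX2_BEFORE))
    print("after patch: ", decode_flags(flags, NV_TX2_AFTER))
```

With the original definition, bit 23 of the reported value matches NV_TX2_LASTPACKET1; with the patched definition (bit 30) it does not. The sketch only shows that difference; it does not establish which definition matches the hardware.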
Yes, that is correct. Jan, please give it a try and let me know if you encounter the hang.
Created attachment 5347 [details] Crash dump... I *think* I left my PC on with the updated forcedeth, but I might be completely wrong. This is what I found this morning... I won't trust this for incontrovertible evidence of a hang, but I'll log a new dump as soon as it hangs. Cross fingers!
Created attachment 5352 [details] Register dump (crashed with patch) Definite hang this time with the one-liner patch in the previous comment.
One more experiment to try is the new tx interrupt scheme (in forcedeth version 38). You can find the patch here: http://www.colorfullife.com/~manfred/Linux-kernel/forcedeth/ Thanks, Ayaz
Created attachment 5373 [details] Register dump (crashed with forcedeth 038) Up for roughly 5 hours, then a crash. This time with forcedeth 038, so even with the new interrupt routine it fails...
Created attachment 5374 [details] YARD (hung with forcedeth 038) So, 2 minutes after rebooting from the previous hang, it hangs again! I'm now trying forcedeth 040... I wonder if 64-bit DMA will have any effect?
Created attachment 5375 [details] Forcedeth 040 hang Forcedeth 040 also fails to fix the problem...
Created attachment 5376 [details] Tx ring size increasement Could you please try out this new patch? It will increase the Tx ring size to 1024. Thanks, Ayaz
Sorry, even with the enlarged tx ring it hangs. Do you want another register dump, or are they getting a bit redundant?
Can you describe your exact setup, i.e. switch or hub, link speed/duplex, and network environment (many machines, access to the internet, etc.)? Also, what kind of network traffic are you generating? A lot of FTP traffic? Web traffic? NFS shares? Etc.
1) Link characteristics: I'm currently on a corporate LAN that's using loads of different equipment and has grown over years... I'll get the specific switch specs from the IT guys tomorrow... BTW, I've had crashes at home running a crossover between the nvidia NIC and a 3com NIC (100Mb/s). Ethtool reports: 100Mb/s Full duplex, autonegotiated. 2) I've found that opening a samba share on my box (sharing nigh 200MB data ;-) causes the link to go down almost reliably after +- 30 minutes to 2 hours. This is totally random, though, and may manifest itself as fast as 10 minutes or I can have a whole day relatively problem-free. So yeah, I assume with lots of outgoing traffic, the hangs manifest quicker. I'm currently using the backup sk98lin NIC on my motherboard, but since it's a patch out of mainline, I'd actually prefer to get the nforce NIC working. Yah, intermittent failures are HORRIBLE to debug...
Sorry, meant 200GB. AFAIK, as soon as a couple of GB is transferred to one destination, it hangs...
I've found out the switch we're using is an Alcatel 6024 with cat6 utp copper cabling... Any other info that's needed?
Created attachment 5496 [details] Modified packet filter flags Could you please try this new patch? Thanks.
Created attachment 5504 [details] Forcedeth 041 with patch hang Hung again after +/- 3 hours of samba traffic...
Certain switches send pause frames to the Ethernet device when there is heavy load. The previous patch I created disabled pause frame handling in forcedeth, but it might be worthwhile to see if you can turn off pause (flow control) frames on your switch as a cross check. It would also be worthwhile to try the "nvnet" binary driver, to help isolate the problem as a forcedeth driver issue vs. a hardware issue. The nvnet driver can be downloaded from the nvidia website.
I don't think it's the pause frame issue because I've had the same problem with a crossover cable and a 3c59x card (unless that ALSO can send pause frames), but I'll ask the IT people if they can turn off pause/flow frames on my port... Also, I've never experienced this problem in Windows and I've done some serious traffic there too with the exact same setup. Depending, naturally on whether the Windows driver uses different features on the NIC to do the same thing... Finally, the nvnet driver refuses to build cleanly with the newer kernel (2.6.12-gentoo-r7 x86_64) I'm using, so I'll look up a patch or *gasp* try to figure out how to finagle one...
I have the same damn error. I only have a cable-modem and I simple surf the internet (http) and I get this error, too. No FTP, no NFS.
I've just confirmed with the IT guys that flow control/pause frames are disabled on our switch. I've also successfully patched nvenet.c (one-liner) and I'm currently running on nvnet. If it hangs I'll report, otherwise assume that it's stable ;-)
Let me try to summarize the reports: The TX engine crashes, and only a hard power cycle can restart it.
A) What could cause a hang of the tx engine?
- PAUSE frame reception. Some 3c59x cards support PAUSE, but I don't know if they generate PAUSE frames or only listen to them. But PAUSE is definitely unlikely with a cable modem.
- PAUSE frame sending: Some NICs send a pause packet on rx ring overflow. I'll try to test that.
- DMA underruns. Perhaps a graphics card blocks the HT link too long? Does the bug appear in text mode, without loading the nv module? I've tried to force that condition by reducing the HT frequency, but I didn't run into any problems. Perhaps my graphics card is not fast enough (ATI 9200, open source driver).
B) Is there a way to reset the tx engine even harder than we do right now in nv_probe/nv_open?
I would like to summarize some of my personal experiences.

My hardware:
- three nForce 2 Gigabit boards (MSI K7N Delta 2) running Gentoo Linux and WinXP
- one brand-new Apple PowerBook 15" with Gigabit LAN running Mac OS X 10.4.2
- one D-Link DGS-1005D Gigabit switch

My problems clearly started when upgrading from kernel 2.6.10 to 2.6.11.x, with the ethtool patch that brought the driver to v0.31: http://www.colorfullife.com/~manfred/Linux-kernel/forcedeth/patch-forcedeth-031-ethtool

My perfectly working solution so far (kernel 2.6.12) is to use the good old v0.30. After various tests I've found that netio v1.23 tells me very reliably whether my nForce NIC is working or not:
- with v0.30 I'm getting fantastic speeds of 90-113 MBytes/s
- with v0.31, v0.35 and v0.41 only 80 KBytes/s - 2 MBytes/s, plus some lockups which needed the full power cycle
- with v0.42 it improved very much, but not perfectly, to 75 - 105 MBytes/s, and very strangely from my PowerBook to only 30 - 100 MBytes/s

Other test variants (direct cross cables, vanilla vs. gentoo-sources, 2.6.11 vs. 2.6.12.x, WinXP vs. Linux vs. Mac OS) didn't show any significant differences so far. So whatever is causing these problems, for me it was introduced in v0.31 and only partially resolved in v0.42.

PS: I would also like to announce that from now on I have a spare nForce machine with Linux and WinXP installed, so I am ready to run further tests and patches.
Created attachment 5573 [details] further linkspeed changes
Thanks! I've sent 0.42 to Jeff; it should appear in the kernel soon. Now we must figure out how to fix the PowerBook slowdown:
- Could you try the attached patch, on top of 0.42?
- Could you boot without a network cable attached, and then attach it after boot?
- Boot, and then load the network interface manually: ifdown ethX;rmmod forcedeth; modprobe forcedeth;<wait>;ifup ethX.
It seems that recent Linux distros do ifdown/ifup during boot. Perhaps they issue some ethtool commands, too.
Ayaz: Perhaps excessive collisions cause the problem, not PAUSE frames? Do "modprobe forcedeth; ifup ethX;<wait>;ifdown ethX;<wait>;ifup ethX". Then the link speed registers are not initialized properly.
The logic in nv_tx_timeout is fine. You can be more aggressive and perform a reset of the NvRegTxRxControl register but then you will also need to halt the rx engine. I believe collisions only happen in half duplex.
OK Manfred Spraul, I applied your patch #5573 and followed all your instructions for both v0.42 and v0.42 with the patch. I also tested on 2.6.12 and 2.6.13-rc6, with a fully manual network configuration (only ifconfig up/down ...). But I'm sorry, nothing resulted in a significant change. Here are my netio v1.23 measurements (PowerBook was client -> Linux-forcedeth was netio-server):

-> with forcedeth v0.30
Packet size 1k bytes: 65904 KByte/s Tx, 71846 KByte/s Rx.
Packet size 2k bytes: 85714 KByte/s Tx, 86864 KByte/s Rx.
Packet size 4k bytes: 103276 KByte/s Tx, 93461 KByte/s Rx.
Packet size 8k bytes: 110972 KByte/s Tx, 108088 KByte/s Rx.
Packet size 16k bytes: 110914 KByte/s Tx, 109928 KByte/s Rx.
Packet size 32k bytes: 112113 KByte/s Tx, 109661 KByte/s Rx.

-> with forcedeth v0.42 with patch #5573
Packet size 1k bytes: 35722 KByte/s Tx, 70138 KByte/s Rx.
Packet size 2k bytes: 47363 KByte/s Tx, 86635 KByte/s Rx.
Packet size 4k bytes: 57858 KByte/s Tx, 93541 KByte/s Rx.
Packet size 8k bytes: 63085 KByte/s Tx, 108703 KByte/s Rx.
Packet size 16k bytes: 64960 KByte/s Tx, 110812 KByte/s Rx.
Packet size 32k bytes: 66648 KByte/s Tx, 110466 KByte/s Rx.

As you can see, the receiving speed of the nForce NIC is nearly cut in half (+- 10 MBytes/s) while sending is quite stable. Could it be that v0.42 uses more CPU power on receive than v0.30, given that netio is very CPU-intensive?
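The size of the slowdown in the PowerBook measurements can be put into numbers by dividing each v0.42 figure by the matching v0.30 figure. The sketch below is illustrative only: the throughput values are copied from the comment above, while the helper `ratios` and the variable names are made up for this example. It assumes that netio reports Tx and Rx from the client's point of view, so the Tx column corresponds to the receive path of the forcedeth machine.

```python
# netio v1.23 results in KByte/s, PowerBook as client, forcedeth box as server.
PACKET_SIZES_KB = [1, 2, 4, 8, 16, 32]

V030 = {
    "tx": [65904, 85714, 103276, 110972, 110914, 112113],
    "rx": [71846, 86864, 93461, 108088, 109928, 109661],
}
V042_PATCHED = {
    "tx": [35722, 47363, 57858, 63085, 64960, 66648],
    "rx": [70138, 86635, 93541, 108703, 110812, 110466],
}


def ratios(new, old):
    """Element-wise new/old throughput ratio, rounded to three decimals."""
    return [round(n / o, 3) for n, o in zip(new, old)]


if __name__ == "__main__":
    tx = ratios(V042_PATCHED["tx"], V030["tx"])
    rx = ratios(V042_PATCHED["rx"], V030["rx"])
    for size, t, r in zip(PACKET_SIZES_KB, tx, rx):
        print(f"{size:>2}k packets: Tx at {t:.1%} of v0.30, Rx at {r:.1%} of v0.30")
```

For these figures the Tx column stays between roughly 54% and 60% of the v0.30 value at every packet size, while the Rx column stays within a few percent of it.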
For completeness, here are my measurements nForce <-> nForce (WinXP was client -> Linux-forcedeth was netio-server):

-> with forcedeth v0.30
Packet size 1k bytes: 83960 KByte/s Tx, 115644 KByte/s Rx.
Packet size 2k bytes: 100446 KByte/s Tx, 115671 KByte/s Rx.
Packet size 4k bytes: 113597 KByte/s Tx, 115664 KByte/s Rx.
Packet size 8k bytes: 114886 KByte/s Tx, 115683 KByte/s Rx.
Packet size 16k bytes: 114898 KByte/s Tx, 115682 KByte/s Rx.
Packet size 32k bytes: 114903 KByte/s Tx, 115667 KByte/s Rx.

-> with forcedeth v0.42 with patch #5573
Packet size 1k bytes: 69623 KByte/s Tx, 113654 KByte/s Rx.
Packet size 2k bytes: 88212 KByte/s Tx, 113944 KByte/s Rx.
Packet size 4k bytes: 95181 KByte/s Tx, 114135 KByte/s Rx.
Packet size 8k bytes: 112202 KByte/s Tx, 114065 KByte/s Rx.
Packet size 16k bytes: 112974 KByte/s Tx, 114051 KByte/s Rx.
Packet size 32k bytes: 113790 KByte/s Tx, 114100 KByte/s Rx.

I've also run some tests with "top" and the Windows Task Manager to check my CPU-power suspicion, but I wasn't able to find any particular relationship between CPU load, netio and v0.30 vs. v0.42.
Sorry, had a bunch of public holidays this week... 1) I can conclusively state that nvnet works without lockups: strike hardware... 2) Forcedeth 042 looks like it did the trick! Another reason to keep the "forced" in the name: forced linkinit! I'll report back the moment the NIC hangs again, hopefully never...
Thanks for the netio output. I must reproduce it myself and then figure out what went wrong. Perhaps rx hardware checksumming doesn't work anymore, or the 64-bit DMA patch has unintended side effects. Which board do you use? nForce 3 or 4?
Neither, I only have nForce 2 Gigabit boards from MSI, see here: http://www.msi.com.tw/program/products/mainboard/mbd/pro_mbd_detail.php?UID=613 The PowerBook has a so-called "SunGEM" (Sun Gigabit Ethernet) NIC. Sorry, I don't have any further info about it, but maybe you can take a look at kernel-sources/drivers/net/sungem.c
Can someone attach v0.42? I want to test it too.
Created attachment 5623 [details] forcedeth 0.42 Here is the forcedeth 0.42 patch. All recent patches are available from http://www.colorfullife.com/~manfred/Linux-kernel/forcedeth/ The slowdown is still unresolved, I'll try to look at the issue tomorrow.
Created attachment 5629 [details] IRQ are going high
Hi, I am a kernel dummy, but I noticed that every time my network card stopped this week (4 times), there was a lot of IRQ traffic (see attachment), along with these log entries:

root@pc1:~# cat /var/log/messages| grep "TransmitterStatus"
Aug 7 12:28:51 pc1 kernel: nv_stop_tx: TransmitterStatus remained busy<7>eth1: tx_timeout:dead entries!
Aug 7 12:30:17 pc1 kernel: nv_stop_tx: TransmitterStatus remained busy<7>eth1: tx_timeout:dead entries!
Aug 7 12:31:32 pc1 kernel: nv_stop_tx: TransmitterStatus remained busy<7>eth1: tx_timeout:dead entries!
Aug 7 12:32:48 pc1 kernel: nv_stop_tx: TransmitterStatus remained busy<7>eth1: tx_timeout:dead entries!
Aug 7 17:10:12 pc1 kernel: nv_stop_tx: TransmitterStatus remained busy<7>eth1: tx_timeout:dead entries!
Aug 7 17:11:57 pc1 kernel: nv_stop_tx: TransmitterStatus remained busy<7>eth1: tx_timeout:dead entries!
Aug 7 17:12:18 pc1 kernel: nv_stop_tx: TransmitterStatus remained busy<7>capilib_new_ncci:kcapi: appl 2 ncci 0x10101 up
Aug 7 21:41:11 pc1 kernel: nv_stop_tx: TransmitterStatus remained busy<7>eth1: tx_timeout:dead entries!
Aug 8 09:42:49 pc1 kernel: nv_stop_tx: TransmitterStatus remained busy<7>eth1: tx_timeout:dead entries!
Aug 8 09:43:59 pc1 kernel: nv_stop_tx: TransmitterStatus remained busy<7>eth1: tx_timeout:dead entries!
Aug 11 17:51:11 pc1 kernel: nv_stop_tx: TransmitterStatus remained busy<7>eth1: tx_timeout:dead entries!
Aug 11 17:51:21 pc1 kernel: nv_stop_tx: TransmitterStatus remained busy<7>eth1: tx_timeout:dead entries!
Aug 11 17:51:32 pc1 kernel: nv_stop_tx: TransmitterStatus remained busy<7>eth1: tx_timeout:dead entries!
Aug 11 17:51:42 pc1 kernel: nv_stop_tx: TransmitterStatus remained busy<7>eth1: tx_timeout:dead entries!
Aug 11 17:51:53 pc1 kernel: nv_stop_tx: TransmitterStatus remained busy<7>eth1: tx_timeout:dead entries!
Aug 11 17:52:03 pc1 kernel: nv_stop_tx: TransmitterStatus remained busy<7>eth1: tx_timeout:dead entries!
Aug 11 17:52:14 pc1 kernel: nv_stop_tx: TransmitterStatus remained busy<7>eth1: tx_timeout:dead entries!
Aug 11 17:52:24 pc1 kernel: nv_stop_tx: TransmitterStatus remained busy<7>eth1: tx_timeout:dead entries!
Aug 11 17:52:35 pc1 kernel: nv_stop_tx: TransmitterStatus remained busy<7>eth1: tx_timeout:dead entries!
Aug 11 17:52:45 pc1 kernel: nv_stop_tx: TransmitterStatus remained busy<7>eth1: tx_timeout:dead entries!
Aug 11 17:52:56 pc1 kernel: nv_stop_tx: TransmitterStatus remained busy<7>eth1: tx_timeout:dead entries!
Aug 11 17:53:06 pc1 kernel: nv_stop_tx: TransmitterStatus remained busy<7>eth1: tx_timeout:dead entries!
Aug 11 17:53:17 pc1 kernel: nv_stop_tx: TransmitterStatus remained busy<7>eth1: tx_timeout:dead entries!
Aug 11 17:53:27 pc1 kernel: nv_stop_tx: TransmitterStatus remained busy<7>eth1: tx_timeout:dead entries!
Aug 11 17:53:38 pc1 kernel: nv_stop_tx: TransmitterStatus remained busy<7>eth1: tx_timeout:dead entries!
root@pc1:~#

I run http://hotsanic.sourceforge.net/ for some debugging on my PC. Maybe it helps someone, maybe not, I do not know...
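A log excerpt like the one above is easier to compare across reporters when it is reduced to a count per day and the spacing between consecutive messages. The following is a minimal sketch assuming classic syslog lines of the form "Mon D HH:MM:SS host ..."; the function names `count_hangs_per_day` and `gaps_in_seconds` are invented for this example.

```python
import re
from collections import Counter

MARKER = "nv_stop_tx: TransmitterStatus remained busy"
STAMP_RE = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2})\s+(\d{2}):(\d{2}):(\d{2})\s")


def _matching_stamps(lines):
    """Yield ((month, day), seconds_since_midnight) for every marker line."""
    for line in lines:
        match = STAMP_RE.match(line)
        if match and MARKER in line:
            month, day, hh, mm, ss = match.groups()
            yield (month, int(day)), int(hh) * 3600 + int(mm) * 60 + int(ss)


def count_hangs_per_day(lines):
    """Count marker lines per (month, day)."""
    return Counter(day for day, _ in _matching_stamps(lines))


def gaps_in_seconds(lines):
    """Seconds between consecutive marker lines that fall on the same day."""
    gaps = []
    previous = None
    for day, seconds in _matching_stamps(lines):
        if previous is not None and previous[0] == day:
            gaps.append(seconds - previous[1])
        previous = (day, seconds)
    return gaps


if __name__ == "__main__":
    sample = [
        "Aug 11 17:51:11 pc1 kernel: nv_stop_tx: TransmitterStatus remained busy<7>eth1: tx_timeout:dead entries!",
        "Aug 11 17:51:21 pc1 kernel: nv_stop_tx: TransmitterStatus remained busy<7>eth1: tx_timeout:dead entries!",
        "Aug 11 17:51:32 pc1 kernel: nv_stop_tx: TransmitterStatus remained busy<7>eth1: tx_timeout:dead entries!",
    ]
    print(count_hangs_per_day(sample))
    print(gaps_in_seconds(sample))
```

Regular gaps of about ten seconds, as in the Aug 11 entries, suggest the same hang being reported repeatedly rather than separate hangs.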
I have the same problem with my new ASUS A8N SLI Premium motherboard (nForce4 chipset), with an AMD X2 4400+ CPU and 2 GB RAM. Maybe my dmesg output helps:

warning: many lost ticks.
Your time source seems to be instable or some driver is hogging interupts
nv_stop_tx: TransmitterStatus remained busy<6>forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.35.
ACPI: PCI Interrupt 0000:00:0a.0[A] -> Link [LMAC] -> GSI 3 (level, low) -> IRQ 3
PCI: Setting latency timer of device 0000:00:0a.0 to 64
eth0: forcedeth.c: subsystem: 01043:8141 bound to 0000:00:0a.0
nv_stop_tx: TransmitterStatus remained busy<7>eth0: no IPv6 routers present
nv_stop_tx: TransmitterStatus remained busy<6>ld[2613]: segfault at 0000000000000020 rip 00002aaaaad194d5 rsp 00007fffff8e0470 error 4
ld[8686]: segfault at 0000000000000020 rip 00002aaaaad194d5 rsp 00007fffff824bc0 error 4
forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.35.
ACPI: PCI Interrupt 0000:00:0a.0[A] -> Link [LMAC] -> GSI 3 (level, low) -> IRQ 3
PCI: Setting latency timer of device 0000:00:0a.0 to 64
eth0: forcedeth.c: subsystem: 01043:8141 bound to 0000:00:0a.0
eth0: no link during initialization.
nv_stop_tx: TransmitterStatus remained busyeth0: no link during initialization.
eth0: link up.
eth0: no IPv6 routers present
nv_stop_tx: TransmitterStatus remained busy<6>forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.35.
ACPI: PCI Interrupt 0000:00:0a.0[A] -> Link [LMAC] -> GSI 3 (level, low) -> IRQ 3
PCI: Setting latency timer of device 0000:00:0a.0 to 64
eth0: forcedeth.c: subsystem: 01043:8141 bound to 0000:00:0a.0
eth0: no link during initialization.
nv_stop_tx: TransmitterStatus remained busyeth0: no link during initialization.
eth0: no IPv6 routers present
eth0: link up.
nv_stop_tx: TransmitterStatus remained busy<6>ld[2085]: segfault at 0000000000000020 rip 00002aaaaad194d5 rsp 00007fffff85c9c0 error 4
ld[2322]: segfault at 0000000000000020 rip 00002aaaaad194d5 rsp 00007fffff99c590 error 4

I hope the problem will be fixed soon. Thanks and best regards.
I've been running with forcedeth 0.42 for a week now and never had the hang again.
On my box, 0.42 seems stable.
I've been experiencing the same lockups on my Abit NI8-SLI running a P4 with EM64T, and also previously with an Epox board and an Athlon64 (can't recall the board model). Hopefully the 0.42 patch will fix the problem. However, the actual reason I'm posting here is that I noticed that the 0.42 patch doesn't update the driver version number. It adds a line to the changelog about 0.42, but the #define FORCEDETH_VERSION line is still "0.41". I didn't know where else to post this - it certainly doesn't seem to deserve its own bug. Anyway, just a heads-up.
The force-linkinit patch (the bit that fixes the hangup problems) is in the latest gentoo-sources kernel. Since I've had that one-liner patch on my kernel, the NIC hasn't blinked.
I cannot find these patches in the latest vanilla kernel 2.6.13. Are they missing?
I tried to apply patch 0.42 to the 2.6.13 kernel and got a reject in forcedeth.c.rej:

cat forcedeth.c.rej
--- 2180,2188 ----
  writel(NVREG_MIISTAT_MASK, base + NvRegMIIStatus);
  dprintk(KERN_INFO "startup: got 0x%08x.\n", miistat);
  }
+ /* set linkspeed to invalid value, thus force nv_update_linkspeed
+ * to init hw */
+ np->linkspeed = 0;
  ret = nv_update_linkspeed(dev);
  nv_start_rx(dev);
  nv_start_tx(dev);
Re: post #61
On vanilla kernel 2.6.13, I applied: http://www.colorfullife.com/~manfred/Linux-kernel/forcedeth/patch-forcedeth-042-forcelinkinit
It caused one rejection (in the comments, so it can basically be ignored). In effect the patch adds the line:
np->linkspeed = 0;
in front of the line:
ret = nv_update_linkspeed(dev);
in the file drivers/net/forcedeth.c (relative to the kernel sources directory).
This fixed the bug for me (the gentoo sources are patched in exactly the same way, according to the .orig file ;-)
Personally I vote for this bug to be closed THE MOMENT that one-liner gets accepted into the "stable kernel release". Any other problems with the driver are clearly not related to this specific bug. The link speed regressions should be handled as a new bug, not as a variant of this one. I'd be very surprised if the hangs and regressions are symptoms of the same bug...
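Why a single assignment can make a difference is easier to see with a toy model. The sketch below is not driver code. It assumes, based only on the patch comment ("set linkspeed to invalid value, thus force nv_update_linkspeed to init hw"), that the update routine skips programming the hardware when the negotiated speed equals the cached `linkspeed` value. The class `NicModel` and its members are hypothetical.

```python
class NicModel:
    """Toy model of a driver that caches the last programmed link speed."""

    def __init__(self):
        self.linkspeed = 0   # cached value; 0 stands for "invalid / not programmed"
        self.hw_writes = 0   # how often the (simulated) hardware was programmed

    def update_linkspeed(self, negotiated):
        # Assumed behaviour: nothing is written when the cache claims
        # that the negotiated speed is already programmed.
        if negotiated == self.linkspeed:
            return False
        self.linkspeed = negotiated
        self.hw_writes += 1
        return True

    def open(self, negotiated, force_init):
        # force_init models the one-liner: invalidate the cache before
        # calling the update routine, so the hardware is always programmed.
        if force_init:
            self.linkspeed = 0
        return self.update_linkspeed(negotiated)


if __name__ == "__main__":
    nic = NicModel()
    print("first ifup programs hardware:", nic.open(100, force_init=False))
    # After ifdown the cached value survives, so a second ifup is skipped.
    print("second ifup programs hardware:", nic.open(100, force_init=False))
    print("second ifup with the one-liner:", nic.open(100, force_init=True))
```

In this model the second bring-up without the forced reset leaves the hardware untouched, which is consistent with the earlier remark in this thread that after "ifup; ifdown; ifup" the link speed registers are not initialized properly.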
I agree: As far as I can see, 0.42 fixes the bug. The patch is in 2.6.13-git3. Unfortunately, I forgot to increase the version number, but I won't send another patch just to increase the number. Could someone close the bug? I see the performance regression, too, although less severe: The cpu load at 60 MB/sec is around 60% instead of 50% as it was before. I have no idea why, but I'll try to figure out which change caused that.
Peter, can you close out this bug?
Hi, first of all, I wonder why I am the owner of this bug. Could someone else change that for me, please? Last week I got this:

Badness in local_bh_enable at kernel/softirq.c:140
 [<c011feb2>] local_bh_enable+0x72/0x80
 [<f91d90d3>] destroy_conntrack+0x83/0xd0 [ip_conntrack]
 [<c0238c27>] __kfree_skb+0xa7/0x130
 [<c0238b74>] kfree_skbmem+0x24/0x30
 [<f8a8084b>] nv_drain_tx+0x3b/0x70 [forcedeth]
 [<f8a80b66>] nv_tx_timeout+0x56/0xd0 [forcedeth]
 [<c024f3e0>] dev_watchdog+0x0/0xa0
 [<c024f477>] dev_watchdog+0x97/0xa0
 [<c0123b66>] run_timer_softirq+0xb6/0x1a0
 [<c011fdfd>] __do_softirq+0x7d/0x90
 [<c011fe36>] do_softirq+0x26/0x30
 [<c010565e>] do_IRQ+0x1e/0x30
 [<c0103b0a>] common_interrupt+0x1a/0x20
 [<f8d0aa70>] acpi_processor_idle+0x0/0x299 [processor]
 [<f8d0ab75>] acpi_processor_idle+0x105/0x299 [processor]
 [<c01010d8>] cpu_idle+0x48/0x60
 [<c03627db>] start_kernel+0x17b/0x1c0
 [<c0362390>] unknown_bootoption+0x0/0x1b0
[the identical trace is logged several more times in a row]

NETDEV WATCHDOG: eth1: transmit timed out
nv_stop_tx: TransmitterStatus remained busy<7>eth1: tx_timeout: dead entries!
[this pair of messages repeats many times]

nv_stop_tx: TransmitterStatus remained busy<7>capilib_new_ncci: kcapi: appl 2 ncci 0x10101 up
kcapi: appl 2 ncci 0x10101 down
capilib_new_ncci: kcapi: appl 2 ncci 0x10101 up
capidrv-1: incoming call ,1,1,6099700
capidrv-1: patching si2=1 to 0 for VBOX
isdn_net: Incoming call without OAD, assuming '0'
isdn_net: call from 0 -> 0 6099700 ignored
isdn_tty: Incoming call without OAD, assuming '0'
isdn_tty: call from 0 -> 6099700 ignored
capidrv-1: incoming call ,1,0,6099700 ignored
kcapi: appl 2 ncci 0x10101 down
eth1: no IPv6 routers present

NETDEV WATCHDOG: eth1: transmit timed out
nv_stop_tx: TransmitterStatus remained busy<7>eth1: tx_timeout: dead entries!
[this pair of messages again repeats many times]

nv_stop_tx: TransmitterStatus remained busy<7>eth1: no IPv6 routers present

This is with kernel: Linux pc1 2.6.13.1 #1 Thu Sep 15 20:44:51 CEST 2005 i686 GNU/Linux. Is it a good or a bad sign? Regards, Peter
Try 2.6.13.2, it contains the fix: http://www.kernel.org/diff/diffview.cgi?file=%2Fpub%2Flinux%2Fkernel%2Fv2.6%2Fpatch-2.6.13.2.bz2;z=all#11
Is this issue still a problem?
The problem seems to have resurfaced for me in the last week on a Tyan Thunder K8WE (S2895; nForce4 chipset, two dual-core Opterons), across several kernel versions: Gentoo kernels 2.6.12-r9 and 2.6.12-r3 (both with forcedeth updated to 0.42), and 2.6.13-r5 (with the default driver as well as with driver versions up to 0.46). The problem also seems to have resurfaced not long after the computer was moved to a 100 Mbps switch. I'm going to attach the output of dmesg.
Created attachment 6515 [details] dmesg output dmesg output of Tyan K8WE, nic connected to 100mbps switch.
Based on the dmesg dump, it seems that the link speed is 100mbps and the duplex is Half. Can you verify using ethtool the link speed and duplex before and after the tx hang?
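For anyone who wants to record the settings before and after a hang, the following is a small sketch that pulls the speed and duplex out of the text ethtool prints for an interface. It relies only on the "Speed:" and "Duplex:" lines of that output; the sample text is illustrative and was not captured from any machine in this report, and the function name is made up for the example.

```python
import re


def parse_link(ethtool_text):
    """Return (speed, duplex) as strings from the output of `ethtool <iface>`."""
    speed = re.search(r"^\s*Speed:\s*(\S+)", ethtool_text, re.MULTILINE)
    duplex = re.search(r"^\s*Duplex:\s*(\S+)", ethtool_text, re.MULTILINE)
    return (speed.group(1) if speed else None,
            duplex.group(1) if duplex else None)


# Illustrative sample only; real output contains more fields.
SAMPLE = """Settings for eth0:
\tSpeed: 100Mb/s
\tDuplex: Half
\tAuto-negotiation: on
\tLink detected: yes
"""

print(parse_link(SAMPLE))
```

The text could be obtained with something like subprocess.run(["ethtool", "eth0"], capture_output=True, text=True).stdout, once before the hang and once after, and the two tuples compared.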
The link is 100 Mbps half-duplex both before and after the hang. Note that the speed and duplex are auto-negotiated.
Created attachment 6620 [details] dmesg output Same here on a Tyan K8WE and kernel 2.6.14-ck5. Attached via a cross-over cable to a 100 Mbps ADSL "router" that is configured for full duplex. Likewise, I generally only get hangs when there's a lot of outbound traffic.
Michael, what is the model of the switch? What kind of traffic are you running? I cannot reproduce the problem when running at 100 Mbps and half duplex. Jason, based on your dmesg output, you are running version 0.41, which does not have the fix for this. Please try version 0.42 or above.
The patch is applied... I confirmed that while initially reading through this bug.
Could you please enable the debug messages and send me the output after the timeout? Please change the #if 0 below to #if 1:

#if 0
#define dprintk printk
#else
#define dprintk(x...) do { } while (0)
#endif
The switch is a 3Com SuperStack II Switch 3000. As for the debug messages, I will capture them as soon as I can get the system back on the onboard NIC (we borrowed another NIC since we have a few critical things to do over the next few days).
Created attachment 6642 [details] dprintk during failure (gzipped due to size constraints) I've tried to kill off most of the output and just capture a bit of normal traffic before the failure, what happens during the failure and a couple of unsuccessful resets afterward. I've still got the full log so let me know if I've cut it too close.
Created attachment 6643 [details] dprintk after reboot (gzipped due to size constraints) This is what happens during the following reboot.
Jason, thanks for the output. I just want to double-check your duplex settings. In an earlier post you said you were running at full duplex. Based on the output, you are running at half duplex. Is that true? You can use ethtool to verify before and after the failure. The reason I ask is that if there is a discrepancy between the speed/duplex the MAC assumes and what the PHY is actually running at, there could be a hang.
Bingo. ethtool reports half duplex. That's consistent with the scenarios under which it hangs too - lots of incoming and outgoing packets. I'll switch the router to half duplex for the time being, but is there anything that can be done for that?
Can you send me the full output? Based on the capture before the failure, the NIC was already in half duplex (due to router only advertising 100 half). I want to know if it ever was in full duplex to begin with.
I don't have the log since the boot, but there is about 20 minutes' worth (several gigabytes) before the hang. I researched a little into how to read the output, and it seems that the link was in fact 100/half from the start of what I do have. I've done a few tests with my router and come up with the following results, though.

With the router set to auto-negotiate and plugging in the network cable:

eth0: mii_rw read from reg 5 at PHY 1: 0x41e1.
eth0: nv_update_linkspeed: PHY advertises 0x0de1, lpa 0x41e1.
eth0: changing link setting from 66536/0 to 65636/1.
eth0: link up.
eth0: nv_start_rx
eth0: nv_start_rx to duplex 1, speed 0x00010064.

Everything seems good. However, specifically setting the router to any of 100/full, 100/half, 10/full or 10/half yields:

eth0: mii_rw read from reg 5 at PHY 1: 0x81.
eth0: nv_update_linkspeed: PHY advertises 0x0de1, lpa 0x0081.
eth0: changing link setting from 66536/0 to 65636/0.
eth0: link up.
eth0: nv_start_rx
eth0: nv_start_rx to duplex 0, speed 0x00010064.

The router is always being detected as 100/half, and subsequent nv_update_linkspeed and nv_start_rx lines continue to show the same. However, ethtool (nv_get_settings?) correctly detects a speed of 10 Mb/s when the router is set to 10. When I got the last hang, the router was specifically set to run at 100/full, so it'd be safe to assume that the NIC was running at 100/half from the start.
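The register values in these two captures can be decoded by hand. The sketch below is an illustration in Python, not driver code: it uses the standard MII bit layout for the advertisement and link-partner-ability registers (the same bits Linux names LPA_100FULL, LPA_100HALF, LPA_10FULL and LPA_10HALF in linux/mii.h) and picks the best mode both sides offer. The function names are made up for the example, and 100BASE-T4 is left out because none of the logged values set that bit.

```python
# Ability bits shared by MII register 4 (advertise) and register 5 (lpa),
# listed from highest to lowest priority.
ABILITY_BITS = [
    (0x0100, "100/full"),
    (0x0080, "100/half"),
    (0x0040, "10/full"),
    (0x0020, "10/half"),
]


def abilities(reg):
    """List the speed/duplex modes whose bits are set in a register value."""
    return [name for bit, name in ABILITY_BITS if reg & bit]


def resolve(advertise, lpa):
    """Pick the highest-priority mode present in both registers, if any."""
    common = abilities(advertise & lpa)
    return common[0] if common else None


print(abilities(0x41e1))        # router auto-negotiating: all four modes offered
print(abilities(0x0081))        # router forced: only 100/half is reported
print(resolve(0x0de1, 0x41e1))  # 100/full, matching "duplex 1" in the log
print(resolve(0x0de1, 0x0081))  # 100/half, matching "duplex 0" in the log
print(hex(65636))               # 0x10064, the "speed 0x00010064" value
```

With the router forced, the partner register carries only the 100/half bit regardless of what the router was actually set to, which is consistent with the driver always settling on 100/half in the second capture.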
That is the problem. You cannot use forced settings on the router and leave the client machine on autoneg. If you force one side, you must force the other side as well. Otherwise, the outcome is undefined. Please use autoneg on the router and autoneg on forcedeth (autoneg is the default). Or, if you use forced settings on the router, you must use forced settings on forcedeth too (through ethtool). I suggest leaving both on autoneg. That should resolve your tx hang problem.
Changing it was actually a misguided attempt at solving hangs that occurred when the router was set to auto-negotiate. I wouldn't call it "the" problem, as mismatched settings shouldn't cause a hard lock of the NIC, but there's a FIXME in the code for 'parallel detection', which is good enough for me. Sorry for the noise on this issue, but if I get another hang I'll definitely be able to produce more useful info. :)
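The rule about matching settings on both ends can be sketched as a toy model in Python. It rests on a simplifying assumption: an auto-negotiating side that faces a forced partner picks up the partner's speed but falls back to half duplex, since it receives no negotiation information about duplex. It also assumes both ends support 100/full and that two forced ends use the same speed. All names are hypothetical.

```python
def link_result(local, peer):
    """Predict (local_mode, peer_mode) for a simplified two-ended link.

    Each side is either "auto" or a forced (speed, duplex) tuple such as
    (100, "full").
    """
    if local == "auto" and peer == "auto":
        best = (100, "full")  # assumption: both ends support 100/full
        return best, best
    if local == "auto":
        # Assumption: speed is detected, duplex falls back to half.
        return (peer[0], "half"), peer
    if peer == "auto":
        return local, (local[0], "half")
    return local, peer


def duplex_mismatch(local, peer):
    """True when the two ends would end up with different duplex settings."""
    local_mode, peer_mode = link_result(local, peer)
    return local_mode[1] != peer_mode[1]


print(duplex_mismatch("auto", "auto"))         # False
print(duplex_mismatch("auto", (100, "full")))  # True
print(duplex_mismatch("auto", (100, "half")))  # False
```

On the forcedeth side, forcing to match a forced router would be done with ethtool, for example ethtool -s eth0 speed 100 duplex full autoneg off, though leaving both ends on autoneg is the simpler option.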
Jeff, can this bug be closed out now?
I seem to be running into something similar to this problem. It may just be coincidence, but it seems to happen almost always when I load KDE and NOT before I login. No real network activity either way, just minor work over SSH. asus a8n-vm csm 2.6.14-1.1656_FC4 x86_64
2.6.14-1.1656_FC4 ships with forcedeth-0.41. You need to upgrade to a much newer version of forcedeth to resolve this problem. See here for semi-official FC4 kernel RPMS with the latest in-kernel forcedeth: http://people.redhat.com/linville/kernels/fedora-netdev/