Most recent kernel where this bug did not occur:
Distribution: Debian sarge
Hardware Environment: Tyan NF-CK804, AMD Athlon(tm) 64 Processor 3200+, CK804 Ethernet Controller (the problem); see attachment for more
Software Environment: Services: Routing, Firewalling, Mail, Proxy
Problem Description: After a while the transmitter of the CK804 NIC hangs and recovers only after a reboot. The netdev watchdog cannot successfully reset the NIC.
Steps to reproduce: No usage pattern detected so far. It happens after more than one day of use. Cabling is a Cat5 crossover connection to one LAN port of a "Sonicwall TZ170 Enhanced". Cables and machines were replaced with no luck. Using the other onboard NIC (the Net-Extreme) instead avoids the problem. The same combination with a Cisco switch between the Sonicwall and the CK804 NIC works without problems (at least since 2.6.13.4).
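Since no usage pattern has been identified, one way to narrow down when the transmitter stops could be to sample the TX packet counter from /proc/net/dev at intervals and note when it stops advancing. The following is only a minimal sketch under that assumption: the helper names `parse_tx_packets` and `tx_stalled` are made up for illustration, and an unchanged counter only suggests a hang if the host is known to be sending traffic during the interval.

```python
def parse_tx_packets(proc_net_dev_text):
    """Return {interface: tx_packets} parsed from the text of /proc/net/dev."""
    counters = {}
    for line in proc_net_dev_text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, _, rest = line.partition(":")
        fields = rest.split()
        if len(fields) < 16:
            continue  # not a regular per-interface statistics line
        # Fields 0-7 are receive counters, 8-15 transmit; index 9 is TX packets.
        counters[name.strip()] = int(fields[9])
    return counters


def tx_stalled(before, after, iface):
    """True if iface appears in both samples and its TX packet count did not change."""
    return iface in before and iface in after and after[iface] == before[iface]
```

In use, the text would be read from /proc/net/dev twice, some seconds apart, and the two parsed samples passed to `tx_stalled` together with the interface name; logging a timestamp whenever it returns true could then be matched against the kernel log.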
Created attachment 8010 [details] The hardware as lshw sees it of the problematic machine
Created attachment 8011 [details] Log messages from the kernel regarding this bug
Created attachment 8012 [details] My kernel config Please ignore the drbd part. It is never opened, so it is nothing more than a block device hanging around in the device tree.
Oh! Forgot to complete the summary line
Comment on attachment 8010 [details] The hardware as lshw sees it of the problematic machine correct mime type
Reply-To: netdev@axxeo.de

Hi Manfred,

I filed BUG 6480 describing the problem and providing lots of info. If you need more info, just ask.

At the moment I have no idea about the reasons, except for crossover cabling vs. using a switch.

The machine is a production machine, so I cannot test kernels and patches on it. But I have several machines with identical hardware where I can test this. Once we can reproduce it without a Sonicwall, I can build a test setup using any kernel hackery required to resolve the issue :-)

Could you please contact the right nVIDIA people, if needed?

PS: Stephen, you are not CC'ed from bugzilla and "agreed to help out with net driver maintenance", so I CC'ed you here manually.

Regards
Ingo Oeser
Does this still occur with the current forcedeth driver (2.6.18 or later)?
Hi Stephen,

bugme-daemon@bugzilla.kernel.org wrote:
> ------- Additional Comments From shemminger@osdl.org 2006-12-18 12:02 -------
> Does this still occur with the current forcedeth driver (2.6.18 or later)?

I'll ask the customer if I can flip interfaces on his mail server again, but this will take time. I can also only test up to 2.6.18.x at the moment, until that disk corruption problem is sorted out. But I'll try my best to set up a test.
Can you attach your forcedeth.c file?
Created attachment 10705 [details] Fix for calling interrupt routine in nv_do_nic_poll Patch 1/2
Created attachment 10706 [details] Fix tx timeout routine Patch 2/2
The patches are in now. Ingo, does everything work for you now? If so, we can close this bug. Thanks.
We've hit it only with this customer; we have several machines deployed, 4 of them in the same configuration, without problems. We'll try this at the next complete on-site day for the customer, but that day is one or two months away. So I'll close this bug and reopen it if we hit it again. The NIC cannot be replaced (without soldering), as it is on-board in the chipset.