Bug 5632 - forcedeth driver occasionally hangs
Status: RESOLVED CODE_FIX
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: i386 Linux
Importance: P2 high
Assigned To: Ayaz Abdulla
Reported: 2005-11-20 07:00 UTC by Alexey Dobriyan
Modified: 2006-12-22 17:42 UTC
5 users

Kernel Version: 2.6.15-rc1
Tree: Mainline
Regression: ---


Attachments
kernel log (38.43 KB, text/plain)
2005-11-25 12:35 UTC, Anders Fugmann
More data from the kernel log. (419.74 KB, text/plain)
2005-11-25 15:23 UTC, Anders Fugmann
Added more debug prints. (2.02 KB, patch)
2005-11-27 16:32 UTC, Ayaz Abdulla
Kernel log with extra debug (149.05 KB, application/x-bzip)
2005-12-03 05:59 UTC, Anders Fugmann
segmentation fix (9.27 KB, patch)
2005-12-09 21:14 UTC, Ayaz Abdulla
segmentation fix (9.27 KB, patch)
2005-12-09 21:22 UTC, Ayaz Abdulla
TSO fix for large buffers (9.38 KB, patch)
2005-12-10 14:01 UTC, Ayaz Abdulla

Description Alexey Dobriyan 2005-11-20 07:00:18 UTC
From: Anders Peter Fugmann <afu@fugmann.net>
http://marc.theaimsgroup.com/?t=113208292000008&r=1&w=2

Most recent kernel where this bug did not occur: 2.6.14
Hardware Environment:

0000:00:05.0 Bridge: nVidia Corporation CK8S Ethernet Controller (rev a2)
	Subsystem: ASUSTeK Computer Inc.: Unknown device 80a7
	Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 18
	Memory at febfc000 (32-bit, non-prefetchable) [size=4K]
	I/O ports at ec00 [size=8]
	Capabilities: [44] Power Management version 2

Problem Description:

The forcedeth driver occasionally hangs, sometimes after several hours,
sometimes after just a few minutes of traffic. The hardware is an Nvidia nForce3
motherboard with onboard LAN (kernel compiled for 64-bit, using a gcc 4.0.3
prerelease). Removing and reinserting the module resolves the problem for a
short period of time.

The following is printed to the kernel log repeatedly:

NETDEV WATCHDOG: eth1: transmit timed out
eth1: Got tx_timeout. irq: 00000000
eth1: Ring at 1ee38000: next 271242 nic 271178
eth1: Dumping tx registers
  0: 00000200 000000ff 00000003 012b03ca 00000040 00000000 00000000 00000000
 20: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 40: 0420e20e 00000855 00000000 00000000 00000000 00000000 00000000 00000000
 60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 80: 003b0f3c 00000001 00040000 007f0028 0000061c 00000001 00000000 00002d3f
 a0: 0016070f 00000016 69d81100 00000b48 00000001 00000000 9600cccd 0000f4b7
 c0: 10000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001
 e0: 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001
100: 1ee38400 1ee38000 007f003f 00000000 00010064 00000000 0000003e 1ee38440
120: 1ee38380 1eb093c0 a00002e7 1eb13810 8000061c 1ee38454 1ee38304 00200010
140: 003041c0 00002500 00000000 00000000 00000000 00000000 00000000 00000000
160: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
180: 00000016 00000008 0194796d 00008103 00000025 000045e1 0194796d 0000c5e3
1a0: 00000016 00000008 0194796d 00008103 00000025 000045e1 0194796d 0000c5e3
1c0: 00000016 00000008 0194796d 00008103 00000025 000045e1 0194796d 0000c5e3
1e0: 00000016 00000008 0194796d 00008103 00000025 000045e1 0194796d 0000c5e3
200: 00007770 00000000 00000000 00000000 00000000 00000000 00000000 00000000
220: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
240: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
260: 00000000 00000000 fe020001 00000100 00000000 00000000 fe020001 00000100
280: 0e424b11 00041674 00000000 00000000 00000000 00000000 00000000 00000000
2a0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
2c0: 00000000 00000000 00036e63 00000004 0000000c 00000001 00000001 00000001
2e0: 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001
300: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
320: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
340: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
360: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
380: 00000000 00000000 00000000 00000000 ffffffff ffffffff ffffffff ffffffff
3a0: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
3c0: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
3e0: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
eth1: Dumping tx ring
000: 0d8ca892 a000004d // 08ae0012 a0000041 // 08ae0e12 a0000041 // 0624aa02 a0000029
004: 0d8ca4b2 8000004d // 0344ca97 a000002f // 0624a002 a0000029 // 04741402 a0000029
008: 0d8ca092 a000004d // 04741002 a0000029 // 11fef0be a00001e1 // 10dc00be b16a1139
00c: 027340be b16a1139 // 0a33e0be b16a1c89 // 1eb090be a00005e9 // 0cf410be b16a0b91
010: 0fcce0be b16a1139 // 0ebca8be a00005e9 // 1a9880be b16a16e1 // 143390be a00001e1
014: 03f8d8be 80000041 // 0da08000 a0000073 // 03f8d492 a000004d // 03f8d0be a00001e1
018: 0f1a9c92 a000004d // 1eb17812 a0000189 // 0f1a98be a00001e1 // 0f276812 a0000189
01c: 0f1a9492 a000004d // 1e84d812 a0000189 // 1d9afcbe a00001e1 // 0f1a9092 a000004d
020: 1d9af4be a0000087 // 1d9af092 a000004d // 1e89f012 a0000189 // 04e56c92 a000004d
024: 04e56892 a000004d // 04e564be a00001e1 // 04e56092 a000004d // 04396c92 a000004d
028: 1eadb812 a0000189 // 04741612 a0000041 // 128f7412 a0000041 // 0735de12 a0000041
02c: 060b4812 a0000041 // 14473c12 a0000041 // 0735d612 a0000041 // 04396892 a000004d
030: 1d8cae12 a0000041 // 128f7012 a0000041 // 0e281812 a0000189 // 043960be a00001e1
034: 04396492 a000004d // 0735d402 a0000045 // 0735d202 a0000029 // 0bf0e8be 80000041
038: 0344ca07 a000002f // 0bf0e0be 80000041 // 0344ca37 a000002f // 0ef0a8be 80000041
03c: 0344ca67 a000002f // 0ef0a092 a000004d // 0d8cacbe 80000041 // 0344ca07 a000002f
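The two counters on the "Ring at" line and the second word of each ring entry can be read with a little arithmetic. The sketch below is a reading aid under stated assumptions, not driver code: it assumes the ring holds 64 entries (the dump lists slots 000 through 03f) and that each entry's second word uses a version-2 style "flaglen" layout modelled on forcedeth.c (bit 31 = owned by the NIC, bit 29 = last fragment, bit 28 = TSO, MSS from bit 14 when TSO is set, low 14 bits = length minus one). All names (`TX2_*`, `tx_in_flight`, `tx_slot`, `decode_flaglen`) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 64 ring entries, as the dump lists slots 000 through 03f. */
#define TX_RING 64u

/* Assumed layout of the second word of each entry ("flaglen"), modelled on
 * the version-2 descriptor flags in forcedeth.c. Hypothetical names. */
#define TX2_VALID      (1u << 31) /* set while the NIC still owns the entry */
#define TX2_LASTPACKET (1u << 29) /* last fragment of a packet */
#define TX2_TSO        (1u << 28) /* TCP segmentation offload requested */
#define TX2_MSS_SHIFT  14         /* MSS field position when TSO is set */
#define TX2_MSS_MASK   0x3fffu
#define TX2_LEN_MASK   0x3fffu    /* low 14 bits hold (buffer length - 1) */

struct tx_desc_info {
	bool     nic_owned;
	bool     last_fragment;
	bool     tso;
	uint32_t length; /* buffer length in bytes */
	uint32_t mss;    /* 0 unless tso is set */
};

/* Entries queued by the driver but not yet completed by the NIC. Both
 * arguments are free-running counters; unsigned subtraction stays correct
 * when they wrap around. */
static uint32_t tx_in_flight(uint32_t next, uint32_t nic)
{
	return next - nic;
}

/* Ring slot that a free-running counter points at. */
static uint32_t tx_slot(uint32_t counter)
{
	return counter % TX_RING;
}

/* Split one flaglen word into its (assumed) fields. */
static struct tx_desc_info decode_flaglen(uint32_t flaglen)
{
	struct tx_desc_info d;

	d.nic_owned     = (flaglen & TX2_VALID) != 0;
	d.last_fragment = (flaglen & TX2_LASTPACKET) != 0;
	d.tso           = (flaglen & TX2_TSO) != 0;
	d.length        = (flaglen & TX2_LEN_MASK) + 1;
	d.mss           = d.tso ? (flaglen >> TX2_MSS_SHIFT) & TX2_MSS_MASK : 0;
	return d;
}
```

Under these assumptions, `tx_in_flight(271242, 271178)` is 64, a completely full ring, and every dumped entry still has bit 31 set, which suggests the NIC had stopped completing transmits. Decoding `b16a1139` gives a 4410-byte TSO buffer with an MSS of 1448, and `8000004d` a 78-byte first fragment that is not the last piece of its packet.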
Comment 1 Ayaz Abdulla 2005-11-21 14:55:11 UTC
Could you please enable the debug messages and send me the output after the 
timeout?

Please change the #if 0 below to #if 1, so that dprintk maps to printk:

#if 0
#define dprintk			printk
#else
#define dprintk(x...)		do { } while (0)
#endif

Comment 2 Anders Fugmann 2005-11-25 12:35:03 UTC
Created attachment 6681
kernel log

Kernel log, captured just before the driver hung. Linux 2.6.15-rc1, module
compiled with debug enabled.
Comment 3 Ayaz Abdulla 2005-11-25 14:46:43 UTC
Could you post more output, especially the output relating to the transmit of
packet 8452580 and beyond, as well as the tx timeout register dump?
Comment 4 Anders Fugmann 2005-11-25 15:23:12 UTC
Created attachment 6682
More data from the kernel log.

Additional kernel output. 

There is no sign of the TX register dumps, but that may be because of the script
that removed and reinserted the forcedeth module after five seconds of inactivity.
Please note that I do not have console access to the machine in question, so I
hope that my 1.5 GB of kernel logs contain enough information for you to
find the problem.

Removing (rmmod) and reinserting the forcedeth module did not resolve the problem
this time. The problem also occurred, reproducibly, when running rsync from the
machine. Copying large files over NFS or streaming video through the machine
did not trigger it.
Comment 5 Ayaz Abdulla 2005-11-27 16:32:50 UTC
Created attachment 6696
Added more debug prints.

Please use this new patch, which adds more debug prints, and then send me
the output log. Thanks!
Comment 6 Anders Fugmann 2005-12-03 05:59:42 UTC
Created attachment 6755
Kernel log with extra debug

Full log, forcedeth driver patched with extra debug information.
Comment 7 Ayaz Abdulla 2005-12-03 11:27:16 UTC
The attachment is a binary file. 
Comment 8 Anders Fugmann 2005-12-03 15:08:12 UTC
True. As the attachment type indicates, I compressed it with bzip2 (reducing
the size from 12 MB to 150 KB). I hope you can use this format.
Sorry for the inconvenience.
Comment 9 Ayaz Abdulla 2005-12-03 17:58:54 UTC
Thanks for taking the time to generate the output. The issue is that, during
the stream, a transmit buffer is too large for the hardware to handle. I am
working on a fix for the issue.
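One way to handle a transmit buffer that is too large for a single descriptor is to spread it across several consecutive descriptors. The following is a minimal sketch of that idea, not the attached patch: it assumes a per-descriptor limit of 16 KiB (a 14-bit length field), uses hypothetical names (`TX_MAX_DESC_LEN`, `tx_descs_needed`, `tx_split`), and only computes the chunk layout, leaving out DMA mapping, flag bits and ring bookkeeping.

```c
#include <stdint.h>

/* Assumption: one descriptor can carry at most 16 KiB (a 14-bit length). */
#define TX_MAX_DESC_SHIFT 14
#define TX_MAX_DESC_LEN   (1u << TX_MAX_DESC_SHIFT)

/* Descriptors needed for a buffer of len bytes; written without
 * "len + max - 1" so it cannot overflow for very large len. */
static unsigned int tx_descs_needed(uint32_t len)
{
	return (len >> TX_MAX_DESC_SHIFT) +
	       ((len & (TX_MAX_DESC_LEN - 1)) ? 1u : 0u);
}

/* Store the length of each piece in chunk[]; all pieces but the last are
 * TX_MAX_DESC_LEN bytes. Returns the piece count, or 0 if len is 0 or the
 * buffer would need more than max_chunks descriptors (in a driver, that is
 * the point where the transmit queue would have to be stopped). */
static unsigned int tx_split(uint32_t len, uint32_t *chunk,
			     unsigned int max_chunks)
{
	unsigned int n = 0;

	if (len == 0 || tx_descs_needed(len) > max_chunks)
		return 0;
	while (len > 0) {
		uint32_t piece = len > TX_MAX_DESC_LEN ? TX_MAX_DESC_LEN : len;

		chunk[n++] = piece;
		len -= piece;
	}
	return n;
}
```

For example, `tx_split(40000, c, 4)` yields three pieces of 16384, 16384 and 7232 bytes, while a 70000-byte buffer needs five descriptors and is refused when only four are free. Counting ring space per piece rather than per buffer is the important part of this design: large buffers fill the ring faster than the packet count alone would suggest.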
Comment 10 Anders Fugmann 2005-12-05 01:05:48 UTC
Jeff, can you change the Assigned To field to Ayaz, as he is currently working
on fixing the bug?
Comment 11 Ayaz Abdulla 2005-12-09 21:14:38 UTC
Created attachment 6789
segmentation fix

I have fixed the segmentation issue. Could you please try this patch out?

Thanks.
Comment 12 Ayaz Abdulla 2005-12-09 21:22:15 UTC
Created attachment 6790
segmentation fix

Modified the patch to account for the real maximum size. The previous patch used
my debug test size.
Comment 13 Anders Fugmann 2005-12-10 12:27:08 UTC
Patch works flawlessly - Thanks.

I hope that this patch can make it into 2.6.15. 

The patch creates a huge amount of debug output. Do you still need it?
Comment 14 Ayaz Abdulla 2005-12-10 14:01:56 UTC
Created attachment 6793
TSO fix for large buffers

I have submitted this patch to netdev kernel maintainers.

Jeff, can this patch be applied to the 2.6.15 kernel, since it is a bug fix?
