From: Anders Peter Fugmann <afu@fugmann.net>
http://marc.theaimsgroup.com/?t=113208292000008&r=1&w=2

Most recent kernel where this bug did not occur: 2.6.14

Hardware Environment:
0000:00:05.0 Bridge: nVidia Corporation CK8S Ethernet Controller (rev a2)
        Subsystem: ASUSTeK Computer Inc.: Unknown device 80a7
        Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 18
        Memory at febfc000 (32-bit, non-prefetchable) [size=4K]
        I/O ports at ec00 [size=8]
        Capabilities: [44] Power Management version 2

Problem Description:
The forcedeth driver occasionally hangs, sometimes after several hours, sometimes after just a few minutes of traffic. Hardware is an Nvidia NForce3 motherboard with onboard LAN (Linux compiled for 64 bit, using gcc 4.0.3 prerelease). Removing and reinserting the module resolves the problem for a short period of time.

The following is printed to the kernel log repeatedly:

NETDEV WATCHDOG: eth1: transmit timed out
eth1: Got tx_timeout. irq: 00000000
eth1: Ring at 1ee38000: next 271242 nic 271178
eth1: Dumping tx registers
  0: 00000200 000000ff 00000003 012b03ca 00000040 00000000 00000000 00000000
 20: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 40: 0420e20e 00000855 00000000 00000000 00000000 00000000 00000000 00000000
 60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 80: 003b0f3c 00000001 00040000 007f0028 0000061c 00000001 00000000 00002d3f
 a0: 0016070f 00000016 69d81100 00000b48 00000001 00000000 9600cccd 0000f4b7
 c0: 10000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001
 e0: 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001
100: 1ee38400 1ee38000 007f003f 00000000 00010064 00000000 0000003e 1ee38440
120: 1ee38380 1eb093c0 a00002e7 1eb13810 8000061c 1ee38454 1ee38304 00200010
140: 003041c0 00002500 00000000 00000000 00000000 00000000 00000000 00000000
160: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
180: 00000016 00000008 0194796d 00008103 00000025 000045e1 0194796d 0000c5e3
1a0: 00000016 00000008 0194796d 00008103 00000025 000045e1 0194796d 0000c5e3
1c0: 00000016 00000008 0194796d 00008103 00000025 000045e1 0194796d 0000c5e3
1e0: 00000016 00000008 0194796d 00008103 00000025 000045e1 0194796d 0000c5e3
200: 00007770 00000000 00000000 00000000 00000000 00000000 00000000 00000000
220: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
240: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
260: 00000000 00000000 fe020001 00000100 00000000 00000000 fe020001 00000100
280: 0e424b11 00041674 00000000 00000000 00000000 00000000 00000000 00000000
2a0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
2c0: 00000000 00000000 00036e63 00000004 0000000c 00000001 00000001 00000001
2e0: 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001
300: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
320: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
340: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
360: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
380: 00000000 00000000 00000000 00000000 ffffffff ffffffff ffffffff ffffffff
3a0: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
3c0: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
3e0: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
eth1: Dumping tx ring
000: 0d8ca892 a000004d // 08ae0012 a0000041 // 08ae0e12 a0000041 // 0624aa02 a0000029
004: 0d8ca4b2 8000004d // 0344ca97 a000002f // 0624a002 a0000029 // 04741402 a0000029
008: 0d8ca092 a000004d // 04741002 a0000029 // 11fef0be a00001e1 // 10dc00be b16a1139
00c: 027340be b16a1139 // 0a33e0be b16a1c89 // 1eb090be a00005e9 // 0cf410be b16a0b91
010: 0fcce0be b16a1139 // 0ebca8be a00005e9 // 1a9880be b16a16e1 // 143390be a00001e1
014: 03f8d8be 80000041 // 0da08000 a0000073 // 03f8d492 a000004d // 03f8d0be a00001e1
018: 0f1a9c92 a000004d // 1eb17812 a0000189 // 0f1a98be a00001e1 // 0f276812 a0000189
01c: 0f1a9492 a000004d // 1e84d812 a0000189 // 1d9afcbe a00001e1 // 0f1a9092 a000004d
020: 1d9af4be a0000087 // 1d9af092 a000004d // 1e89f012 a0000189 // 04e56c92 a000004d
024: 04e56892 a000004d // 04e564be a00001e1 // 04e56092 a000004d // 04396c92 a000004d
028: 1eadb812 a0000189 // 04741612 a0000041 // 128f7412 a0000041 // 0735de12 a0000041
02c: 060b4812 a0000041 // 14473c12 a0000041 // 0735d612 a0000041 // 04396892 a000004d
030: 1d8cae12 a0000041 // 128f7012 a0000041 // 0e281812 a0000189 // 043960be a00001e1
034: 04396492 a000004d // 0735d402 a0000045 // 0735d202 a0000029 // 0bf0e8be 80000041
038: 0344ca07 a000002f // 0bf0e0be 80000041 // 0344ca37 a000002f // 0ef0a8be 80000041
03c: 0344ca67 a000002f // 0ef0a092 a000004d // 0d8cacbe 80000041 // 0344ca07 a000002f
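A note on reading the "Ring at 1ee38000: next 271242 nic 271178" line in the report above. This is a sketch that assumes "next" and "nic" are free-running descriptor counters (the next slot the driver will fill, and the next slot the NIC is expected to complete). Under that assumption their difference is the number of descriptors still in flight. Here it is 64, which matches the 64 entries in the tx ring dump and would mean the ring was completely full when the watchdog fired. The helper names below are hypothetical and are not taken from the forcedeth source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from forcedeth): descriptors handed to the NIC
 * but not yet completed. Unsigned subtraction gives the right answer even
 * after the free-running counters wrap around. */
static uint32_t tx_in_flight(uint32_t next_tx, uint32_t nic_tx)
{
    return next_tx - nic_tx;
}

/* The ring is full when every descriptor is still waiting on the NIC. */
static int tx_ring_full(uint32_t next_tx, uint32_t nic_tx, uint32_t ring_size)
{
    assert(ring_size > 0);
    return tx_in_flight(next_tx, nic_tx) >= ring_size;
}
```

With the values from the log, tx_in_flight(271242, 271178) is 64, so tx_ring_full(271242, 271178, 64) is true.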
Could you please enable the debug messages and send me the output after the timeout? Please set the following define to 1 (that is, change "#if 0" to "#if 1"):

  #if 0
  #define dprintk printk
  #else
  #define dprintk(x...) do { } while (0)
  #endif
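For readers unfamiliar with this pattern, here is a minimal userspace sketch of the same compile-time switch. It is an illustration only: DEBUG, debug_log and debug_count are hypothetical names, and it prints with vfprintf to stderr instead of the kernel's printk.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define DEBUG 1   /* flip to 0 to compile every dprintk() call away */

static int debug_count;   /* number of debug messages emitted so far */

/* Hypothetical stand-in for printk: print to stderr and count the call. */
static void debug_log(const char *fmt, ...)
{
    va_list ap;

    assert(fmt != NULL);
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    debug_count++;
}

#if DEBUG
#define dprintk(...) debug_log(__VA_ARGS__)
#else
#define dprintk(...) do { } while (0)   /* expands to an empty statement */
#endif
```

With DEBUG set to 0 the arguments are never evaluated, so the debug calls cost nothing in a normal build. That is also why the driver has to be recompiled to turn the messages on.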
Created attachment 6681 [details] kernel log Kernel log, captured just before the driver hung. Linux 2.6.15-rc1, module compiled with debug enabled.
Could you post more output? Especially the output relating to the transmit of packet 8452580 and beyond, and also the tx timeout register dump.
Created attachment 6682 [details] More data from the kernel log. Additional kernel output. There is no sign of TX register dumps, but that may be related to the script that removed and reinserted the forcedeth module after five seconds of inactivity. Please note that I do not have console access to the machine in question, so I hope that my 1.5 GB of kernel logs contain enough information for you to find the problem. Rmmod'ing and reinserting the forcedeth module did not resolve the problem this time. Also, the problem occurred (reproducibly) when doing an rsync from the machine. Copying large files over NFS or streaming video through the machine did not trigger the problem.
Created attachment 6696 [details] Added more debug prints. Please use this new patch which has some more debug prints. Then please send me the output log. Thanks!
Created attachment 6755 [details] Kernel log with extra debug Full log, forcedeth driver patched with extra debug information.
The attachment is a binary file.
True. As suggested by the attachment type I compressed it with bzip2 (reducing the size from 12M to 150K). I hope you can use this format. Sorry for the inconvenience.
Thanks for taking the time to generate the output. The issue is that during the stream a buffer for transmit is too big for the hardware to handle. I am working on a fix for the issue.
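As a rough illustration of the kind of fix this implies (not the actual patch, which is in the attachments), an oversized transmit buffer can be split across several descriptors so that no single descriptor exceeds the hardware limit. The struct, the function names and the limit passed in below are hypothetical. The real maximum size is whatever the hardware supports and is not stated in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor: one contiguous piece of a transmit buffer. */
struct tx_chunk {
    size_t offset;  /* start of the piece within the original buffer */
    size_t len;     /* length of the piece, never more than max_len */
};

/* Number of descriptors needed for a buffer of len bytes. */
static size_t chunks_needed(size_t len, size_t max_len)
{
    assert(max_len > 0);
    return len / max_len + (len % max_len != 0);   /* ceiling division */
}

/* Fill out[] with the pieces of the buffer. Returns the number of chunks
 * written, or 0 if out[] (capacity cap) is too small or len is 0. */
static size_t split_buffer(size_t len, size_t max_len,
                           struct tx_chunk *out, size_t cap)
{
    size_t n = chunks_needed(len, max_len);
    size_t offset = 0;

    if (n > cap)
        return 0;
    for (size_t i = 0; i < n; i++) {
        size_t left = len - offset;
        size_t piece = left < max_len ? left : max_len;

        out[i].offset = offset;
        out[i].len = piece;
        offset += piece;
    }
    return n;
}
```

For example, with an assumed limit of 16384 bytes, a 40000-byte buffer needs three chunks of 16384, 16384 and 7232 bytes. A driver would also have to check that the ring has that many free descriptors before queuing the packet.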
Jeff, can you change the "Assigned To" field to Ayaz, as he is currently working on fixing the bug?
Created attachment 6789 [details] segmentation fix I have fixed the segmentation issue. Could you please try this patch out? Thanks.
Created attachment 6790 [details] segmentation fix Modified the patch to account for the real max size. Previous patch had my debug test size.
Patch works flawlessly - Thanks. I hope that this patch can make it into 2.6.15. The patch creates a huge amount of debug output. Do you need it?
Created attachment 6793 [details] TSO fix for large buffers I have submitted this patch to netdev kernel maintainers. Jeff, can this patch be applied to 2.6.15 kernel since it is a bug fix?