Bug 9015 - network load can break nforce nic (forcedeth driver)
Summary: network load can break nforce nic (forcedeth driver)
Status: CLOSED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: All Linux
Importance: P1 normal
Assignee: Ayaz Abdulla
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-09-13 05:14 UTC by edo
Modified: 2009-03-24 05:03 UTC
CC List: 5 users

See Also:
Kernel Version: 2.6.23
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
ethtool output after tx timeout happens (32 bytes, text/plain)
2007-09-27 22:52 UTC, spiridonov ed

Description edo 2007-09-13 05:14:50 UTC
Most recent kernel where this bug did not occur:

Distribution:
Debian Etch

Hardware Environment:
CPU: AMD Athlon(tm) 64 X2 Processor 3800+
Chipset: NF4-Ultra (DFI motherboard)

Software Environment:
Kernel versions 2.6.21.5 and 2.6.22.6

Problem Description:
Sometimes I get the message "NETDEV WATCHDOG: eth0: transmit timed out".
After that, the network card stops working.

kernel logs can be found here:
http://dionis.pnz.ru/err.log
http://dionis.pnz.ru/err2.log

Steps to reproduce:
I can't reproduce this bug on demand; it occurs randomly, but it looks related to network load.

A similar report is here:
http://groups.google.com/group/linux.kernel/browse_thread/thread/a3c25d6593fd47b2/02f842c98b7e60c8
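Because the failure is intermittent, it can help to count how often the watchdog fires per interface in a saved log. A minimal Python sketch, assuming log lines contain the `NETDEV WATCHDOG: <iface>: transmit timed out` text quoted in this report; `count_tx_timeouts` is a hypothetical helper, not part of any kernel tool:

```python
import re
from collections import Counter

# Matches the watchdog message format quoted in this report, e.g.
# "NETDEV WATCHDOG: eth0: transmit timed out"
WATCHDOG_RE = re.compile(r"NETDEV WATCHDOG: ([^:\s]+): transmit timed out")


def count_tx_timeouts(lines):
    """Count transmit-timeout watchdog messages per interface name."""
    counts = Counter()
    for line in lines:
        m = WATCHDOG_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts
```

For example, passing an open log file returns a `Counter` keyed by interface name; which file holds kernel messages (for instance `/var/log/kern.log`) is an assumption that varies by distribution.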
Comment 1 Ayaz Abdulla 2007-09-26 12:59:46 UTC
Adding report information:


100% reproducible hang on xmit timeout.
Just do a "make -j4 modules" on an NFS-mounted kernel source.

Attached is the messages log.

berkley
--
E. F. Berkley Shands, MSc
Exegy Inc.

[  messages ] 
Aug 23 18:34:55 crash kernel: [30819.690155] NETDEV WATCHDOG: eth1: transmit timed out 
Aug 23 18:34:55 crash kernel: [30819.690162] eth1: Got tx_timeout. irq: 00000036 
Aug 23 18:34:55 crash kernel: [30819.690164] eth1: Ring at 16e086000 
Aug 23 18:34:55 crash kernel: [30819.690166] eth1: Dumping tx registers 
Aug 23 18:34:55 crash kernel: [30819.690171]   0: 00000036 000000ff 00000003 024e03ca 00000000 00000000 00000000 00000000 
Aug 23 18:34:55 crash kernel: [30819.690176]  20: 06255300 ff701365 00000000 00000000 00000000 00000000 00000000 00000000 
Aug 23 18:34:55 crash kernel: [30819.690181]  40: 0420e20e 0000a855 00002e20 00000000 00000000 00000000 00000000 00000000 
Aug 23 18:34:55 crash kernel: [30819.690186]  60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
Aug 23 18:34:55 crash kernel: [30819.690192]  80: 003b0f3c 00000001 00040000 007f0020 0000061c 00000001 00200000 00007f87 
Aug 23 18:34:55 crash kernel: [30819.690197]  a0: 0014050f 00000016 5781e000 0000020a 00000001 00000000 a800cccd 0000fcf5 
Aug 23 18:34:55 crash kernel: [30819.690203]  c0: 10000002 00000001 00000001 00000001 00000001 00000001 00000001 00000001 
Aug 23 18:34:55 crash kernel: [30819.690207]  e0: 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001 
Aug 23 18:34:55 crash kernel: [30819.690213] 100: 6e086800 6e086000 007f00ff 00008000 00010032 00000000 0000002c 6e0874c0 
Aug 23 18:34:55 crash kernel: [30819.690220] 120: 6e086360 1ca37240 a000ffeb 00000000 00000000 6e0874cc 6e08636c 0fe08000 
Aug 23 18:34:55 crash kernel: [30819.690225] 140: 00304120 80002600 00000001 00000001 00000000 00000000 00000000 00000000 
Aug 23 18:34:55 crash kernel: [30819.690229] 160: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
Aug 23 18:34:55 crash kernel: [30819.690235] 180: 00000016 00000008 0194796d 00008103 0000002a 00003800 0194000f 00000003 
Aug 23 18:34:55 crash kernel: [30819.690241] 1a0: 00000016 00000008 0194796d 00008103 0000002a 00003800 0194000f 00000003 
Aug 23 18:34:55 crash kernel: [30819.690246] 1c0: 00000016 00000008 0194796d 00008103 0000002a 00003800 0194000f 00000003 
Aug 23 18:34:55 crash kernel: [30819.690252] 1e0: 00000016 00000008 0194796d 00008103 0000002a 00003800 0194000f 00000003 
Aug 23 18:34:55 crash kernel: [30819.690257] 200: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
Aug 23 18:34:55 crash kernel: [30819.690261] 220: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
Aug 23 18:34:55 crash kernel: [30819.690266] 240: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
Aug 23 18:34:55 crash kernel: [30819.690271] 260: 00000000 00000000 fe020001 00000100 00000000 00000000 7e020001 00000100 
Aug 23 18:34:55 crash kernel: [30819.690276] 280: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
Aug 23 18:34:55 crash kernel: [30819.690280] 2a0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
Aug 23 18:34:55 crash kernel: [30819.690285] 2c0: 00000000 00000000 00000000 00000000 00000000 00000001 00000001 00000001 
Aug 23 18:34:55 crash kernel: [30819.690287] eth1: Dumping tx ring 
Aug 23 18:34:55 crash kernel: [30819.690292] 000: 00000000 8fd00892 20000052 // 00000000 88115c92 20000052 // 00000000 875ae892 20000052 // 00000000 8a660492 20000052 
Aug 23 18:34:55 crash kernel: [30819.690298] 004: 00000001 61fdb492 20000052 // 00000000 8bf3f892 20000052 // 00000000 8daa7092 20000052 // 00000000 8fa29892 20000052 
Aug 23 18:34:55 crash kernel: [30819.690304] 008: 00000001 0d558892 20000052 // 00000000 8e0bf892 20000052 // 00000000 8fd00492 20000052 // 00000000 8d160092 20000052 
Aug 23 18:34:55 crash kernel: [30819.690310] 00c: 00000001 27698092 20000052 // 00000000 7fc6cc92 20000052 // 00000000 8d03ec92 20000052 // 00000000 88085492 20000052 
Aug 23 18:34:55 crash kernel: [30819.690317] 010: 00000000 850ee492 20000052 // 00000000 8bba8c92 20000052 // 00000001 56108492 20000052 // 00000000 7f0ed892 20000052 
Aug 23 18:34:55 crash kernel: [30819.690323] 014: 00000001 509a0492 20000052 // 00000000 87b60892 20000052 // 00000000 87b62092 20000052 // 00000000 8f0d5892 20000052 
Aug 23 18:34:55 crash kernel: [30819.690329] 018: 00000000 87c7b492 20000052 // 00000000 8ee76092 20000052 // 00000001 4fb8e892 20000052 // 00000000 7fc6c492 20000052 
Aug 23 18:34:55 crash kernel: [30819.690335] 01c: 00000001 541bc492 20000052 // 00000000 85572492 20000052 // 00000000 8ee77492 20000052 // 00000000 8bbab492 20000052 
Aug 23 18:34:55 crash kernel: [30819.690341] 020: 00000001 1b4b3492 20000052 // 00000001 541bc892 20000052 // 00000000 7fc6e092 20000052 // 00000000 8fd01092 20000052 
Aug 23 18:34:55 crash kernel: [30819.690347] 024: 00000001 0d530c92 20000052 // 00000000 8bbaa092 20000052 // 00000000 8d03e892 20000052 // 00000000 7deac0b2 00000000 
Aug 23 18:34:55 crash kernel: [30819.690352] 028: 00000000 8257dba0 00000000 // 00000000 895d0000 200005ee // 00000000 8d03e4b2 00000000 // 00000000 895d013c 22000bdc 
Aug 23 18:34:55 crash kernel: [30819.690357] 02c: 00000000 8eba30b2 00000000 // 00000000 895d0c74 00000000 // 00000000 895d1000 22000bdc // 00000000 875aecb2 00000000 
Aug 23 18:34:55 crash kernel: [30819.690363] 030: 00000000 895d17ac 00000000 // 00000000 86f62000 22000bdc // 00000000 8a6618b2 00000000 // 00000000 86f622e4 22000bdc 
Aug 23 18:34:55 crash kernel: [30819.690368] 034: 00000000 855734b2 00000000 // 00000000 86f62e1c 00000000 // 00000000 86f63000 22000bdc // 00000001 65b95cb2 00000000 
Aug 23 18:34:55 crash kernel: [30819.690374] 038: 00000000 86f63954 00000000 // 00000000 7fb6e000 22000bdc // 00000000 850ee0b2 00000000 // 00000000 7fb6e48c 22000bdc 
Aug 23 18:34:55 crash kernel: [30819.690379] 03c: 00000000 8d1604b2 00000000 // 00000000 7fb6efc4 00000000 // 00000000 7fb6f000 22000bdc // 00000000 8d03f8b2 00000000 
Aug 23 18:34:55 crash kernel: [30819.690384] 040: 00000000 7fb6fafc 00000000 // 00000000 8b7c0000 220005ee // 00000001 1000e8b2 00000000 // 00000000 8b7c0098 2200007a 
Aug 23 18:34:55 crash kernel: [30819.690390] 044: 00000000 8e0bf0b2 00000000 // 00000000 8b7c00c0 22000fde // 00000000 8d03e0b2 00000000 // 00000000 8b7c0fa8 00000000 
Aug 23 18:34:55 crash kernel: [30819.690395] 048: 00000000 8b7c1000 22000bdc // 00000000 87b600b2 00000000 // 00000000 8b7c1ae0 00000000 // 00000000 8dc38000 22000bdc 
Aug 23 18:34:55 crash kernel: [30819.690401] 04c: 00000001 4fb8e0b2 00000000 // 00000000 8dc38618 00000000 // 00000000 8dc39000 22000bdc // 00000000 8fa294b2 00000000 
Aug 23 18:34:55 crash kernel: [30819.690406] 050: 00000000 8dc39150 22000bdc // 00000001 620870b2 00000000 // 00000000 8dc39c88 00000000 // 00000000 85ad8000 22000bdc 
Aug 23 18:34:55 crash kernel: [30819.690411] 054: 00000000 875af8b2 00000000 // 00000000 85ad87c0 00000000 // 00000000 85ad9000 22000bdc // 00000000 881150b2 00000000 
Aug 23 18:34:55 crash kernel: [30819.690417] 058: 00000000 85ad92f8 22000bdc // 00000001 62f8e8b2 00000000 // 00000000 85ad9e30 00000000 // 00000000 887d8000 22000bdc 
Aug 23 18:34:55 crash kernel: [30819.690422] 05c: 00000000 8ee764b2 00000000 // 00000000 887d8968 00000000 // 00000000 887d9000 22000bdc // 00000000 8d160cb2 00000000 
Aug 23 18:34:55 crash kernel: [30819.690428] 060: 00000000 887d94a0 220005ee // 00000000 8a6600b2 00000000 // 00000000 887d9a3c 2200023e // 00000001 6427e4b2 00000000 
Aug 23 18:34:55 crash kernel: [30819.690433] 064: 00000000 887d9c28 00000000 // 00000000 87e80000 22000fde // 00000000 875af0b2 00000000 // 00000000 87e80b10 00000000 
Aug 23 18:34:55 crash kernel: [30819.690439] 068: 00000000 87e81000 22000bb4 // 00000000 850ee892 22000052 // 00000001 1000ec92 22000052 // 00000000 7deac492 22000052 
Aug 23 18:34:55 crash kernel: [30819.690445] 06c: 00000000 8daa7c92 22000052 // 00000001 1b4b3092 22000052 // 00000000 8bbaa492 22000052 // 00000000 8fa29c92 22000052 
Aug 23 18:34:55 crash kernel: [30819.690451] 070: 00000000 850ed492 22000052 // 00000000 875ae092 22000052 // 00000000 7fc6e492 22000052 // 00000000 8d03fc92 22000052 
Aug 23 18:34:55 crash kernel: [30819.690457] 074: 00000000 8bf3f492 22000052 // 00000000 8daa7892 22000052 // 00000001 509a0c92 22000052 // 00000000 8eba3892 22000052 
Aug 23 18:34:55 crash kernel: [30819.690463] 078: 00000000 8eba3c92 22000052 // 00000000 8a661492 22000052 // 00000000 8f0d5092 22000052 // 00000000 850eec92 22000052 
Aug 23 18:34:55 crash kernel: [30819.690469] 07c: 00000000 7f0ed492 22000052 // 00000000 88115492 22000052 // 00000000 87c7a48a 2200005a // 00000000 8ee7708a 2200005a 
Aug 23 18:34:55 crash kernel: [30819.690475] 080: 00000001 6233848a 2200005a // 00000001 4fb8688a 2200005a // 00000000 8557288a 2200005a // 00000000 87c7b88a 2200005a 
Aug 23 18:34:55 crash kernel: [30819.690481] 084: 00000000 8a660c8a 2200005a // 00000000 8557208a 2200005a // 00000000 8ee7788a 2200005a // 00000001 4fb8608a 2200005a 
Aug 23 18:34:55 crash kernel: [30819.690487] 088: 00000001 62f8e48a 2200005a // 00000001 567cb88a 2200005a // 00000000 7f0edcaa 00000000 // 00000000 8257dba0 00000000 
Aug 23 18:34:55 crash kernel: [30819.690492] 08c: 00000000 895d0000 200005ee // 00000001 6ebb8802 20000066 // 00000001 620874aa 00000000 // 00000000 8257dba0 00000000 
Aug 23 18:34:55 crash kernel: [30819.690498] 090: 00000000 895d0000 200005ee // 00000001 63945a02 20000066 // 00000001 61a9b4aa 00000000 // 00000000 8257dba0 00000000 
Aug 23 18:34:55 crash kernel: [30819.690504] 094: 00000000 895d0000 200005ee // 00000001 259b9202 20000066 // 00000001 339e6002 20000040 // 00000001 61a9b8aa 00000000 
Aug 23 18:34:55 crash kernel: [30819.690510] 098: 00000000 8257dba0 00000000 // 00000000 895d0000 220005ee // 00000001 259b9a02 20000066 // 00000001 2f34ea02 20000040 
Aug 23 18:34:55 crash kernel: [30819.690516] 09c: 00000001 6ebb8402 22000066 // 00000001 63945402 20000040 // 00000001 4f226a02 22000066 // 00000001 2e04f402 20000066 
Aug 23 18:34:55 crash kernel: [30819.690521] 0a0: 00000001 4fb8ecaa 00000000 // 00000000 8257dba0 00000000 // 00000000 895d0000 200005ee ...

We've been seeing this same timeout since 2.6.21, any time we try to
push a significant amount of traffic through the nForce Ethernet. We rolled
back to 2.6.20.18 and don't see any problems. It seems that this bug was
introduced along with all the forcedeth fixes and optimizations in 2.6.21.


Steve 
Comment 2 Ayaz Abdulla 2007-09-26 13:03:56 UTC
- Is CONFIG_FORCEDETH_NAPI defined?
- Can you send me the output of the following:
1) cat /proc/interrupts
2) ethtool eth1
3) ethtool -d eth1

Thanks.
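The first question can be answered without rebuilding anything when the running kernel exposes its configuration. A Python sketch that mirrors a `zgrep FORCEDETH /proc/config.gz` check; it assumes `/proc/config.gz` exists (the kernel must be built with CONFIG_IKCONFIG_PROC), and both function names are hypothetical:

```python
import gzip


def parse_config_option(lines, name):
    """Return the value of `name` (e.g. "y" or "m"), or None if it is unset."""
    prefix = name + "="
    for line in lines:
        line = line.strip()
        if line.startswith(prefix):
            return line[len(prefix):]
        # Kconfig records disabled options as "# CONFIG_FOO is not set"
        if line == f"# {name} is not set":
            return None
    return None


def kernel_config_option(name, path="/proc/config.gz"):
    """Read the running kernel's config (needs CONFIG_IKCONFIG_PROC)."""
    with gzip.open(path, "rt") as f:
        return parse_config_option(f, name)
```

For a configuration like the one quoted in Comment 3, `kernel_config_option("CONFIG_FORCEDETH_NAPI")` would return `None`, while `CONFIG_FORCEDETH` would return `"y"`.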
Comment 3 spiridonov ed 2007-09-26 14:15:32 UTC
> - Is CONFIG_FORCEDETH_NAPI defined?

topalm-dionis:~# zgrep FORCEDETH /proc/config.gz 
CONFIG_FORCEDETH=y
# CONFIG_FORCEDETH_NAPI is not set

> - Can you send me the output of the following:

Yes, of course.
Do you need the output after the crash, or doesn't it matter?

For now, here is the output from the working system.

> 1) cat /proc/interrupts

           CPU0       CPU1       
  0:         88          0   IO-APIC-edge      timer
  1:          1          1   IO-APIC-edge      i8042
  4:      65433          1   IO-APIC-edge      serial
  6:          0          3   IO-APIC-edge      floppy
  8:          0          1   IO-APIC-edge      rtc
  9:          0          0   IO-APIC-fasteoi   acpi
 14:          0          0   IO-APIC-edge      libata
 15:          0          0   IO-APIC-edge      libata
 16:         15          1   IO-APIC-fasteoi   serial
 17:    3338746    3437054   IO-APIC-fasteoi   ehci_hcd:usb2, eth0
 18:     269419     233099   IO-APIC-fasteoi   3w-xxxx
 19:          0          0   IO-APIC-fasteoi   sata_nv
 20:       8108          7   IO-APIC-fasteoi   sata_nv
 21:          0          0   IO-APIC-fasteoi   ohci_hcd:usb1
 22:    2321751          1   IO-APIC-fasteoi   saa7146 (0)
NMI:          0          0 
LOC:    1812879    1831471 
ERR:          1
MIS:         66

> 2) ethtool eth1

Settings for eth0:
        Supported ports: [ MII ]
        Supported link modes:   10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Full 
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Full 
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: MII
        PHYAD: 1
        Transceiver: external
        Auto-negotiation: on
        Supports Wake-on: g
        Wake-on: d
        Link detected: yes

> 3) ethtool -d eth1

Offset  Values
--------        -----
000:     00 00 00 00 ff 00 00 00 03 00 00 00 ca 03 6d 00
010:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
020:     00 53 25 06 65 13 70 ff 00 00 00 00 00 00 00 00
030:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
040:     0e e2 20 04 55 a8 00 00 20 2e 00 00 00 00 00 00
050:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
060:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
070:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
080:     3c 0f 3b 00 01 00 00 00 00 00 04 00 20 00 7f 00
090:     1c 06 00 00 01 00 00 00 00 00 00 00 35 7f 00 00
0a0:     0f 05 14 00 16 00 00 00 00 01 29 d2 fd 3d 00 00
0b0:     01 00 5e 00 00 01 00 00 ff ff ff ff ff ff 00 00
0c0:     02 00 00 10 01 00 00 00 01 00 00 00 01 00 00 00
0d0:     01 00 00 00 01 00 00 00 01 00 00 00 01 00 00 00
0e0:     01 00 00 00 01 00 00 00 01 00 00 00 01 00 00 00
0f0:     01 00 00 00 01 00 00 00 01 00 00 00 01 00 00 00
100:     00 68 43 37 00 60 43 37 ff 00 7f 00 00 80 00 00
110:     32 00 01 00 00 00 00 00 04 00 00 00 e0 77 43 37
120:     a0 67 43 37 40 9a fb 33 e7 ff 00 a0 d8 4e ba 26
130:     1c 06 00 80 ec 77 43 37 bc 66 43 37 00 80 e0 0f
140:     20 41 30 00 00 26 00 80 00 00 00 00 00 00 00 00
150:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
160:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
170:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
180:     16 00 00 00 08 00 00 00 6d 79 94 01 03 81 00 00
190:     29 00 00 00 00 03 00 00 0d 00 94 01 03 00 00 00
1a0:     16 00 00 00 08 00 00 00 6d 79 94 01 03 81 00 00
1b0:     29 00 00 00 00 03 00 00 0d 00 94 01 03 00 00 00
1c0:     16 00 00 00 08 00 00 00 6d 79 94 01 03 81 00 00
1d0:     29 00 00 00 00 03 00 00 0d 00 94 01 03 00 00 00
1e0:     16 00 00 00 08 00 00 00 6d 79 94 01 03 81 00 00
1f0:     29 00 00 00 00 03 00 00 0d 00 94 01 03 00 00 00
200:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
210:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
220:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
230:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
240:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
250:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
260:     00 00 00 00 00 00 00 00 01 00 02 fe 00 01 00 00
270:     00 00 00 00 00 00 00 00 01 00 02 7e 00 01 00 00
280:     89 5a 01 00 76 00 00 00 00 00 00 00 00 00 00 00
290:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2a0:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2b0:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2c0:     00 00 00 00 00 00 00 00 a1 00 00 00 00 00 00 00
2d0:     00 00 00 00
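The `ethtool -d` dump covers the same register space as the driver's "Dumping tx registers" log, but prints it byte by byte. Assuming a little-endian host (true of the x86-64 machines in this report), each group of four bytes is one 32-bit register with its low byte first, so `ca 03 6d 00` at offset 0x0c reads as `006d03ca`. A Python sketch of that conversion; the helper names are hypothetical:

```python
def ethtool_row_to_words(row):
    """Convert one `ethtool -d` row ("000:  00 00 ...") into (offset, [u32 words]).

    Assumes a little-endian host, so the first byte of each 4-byte group
    is the least significant byte of the register.
    """
    offset_str, _, data = row.partition(":")
    offset = int(offset_str.strip(), 16)
    raw = bytes(int(tok, 16) for tok in data.split())
    usable = len(raw) - len(raw) % 4  # ignore a trailing partial word
    words = [int.from_bytes(raw[i:i + 4], "little") for i in range(0, usable, 4)]
    return offset, words


def format_words(offset, words):
    """Render words in a layout similar to the kernel's register dump."""
    return f"{offset:x}: " + " ".join(f"{w:08x}" for w in words)
```

This makes the two dumps in this report directly comparable word by word, for example the register at offset 0x0c across the working and failed states.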
Comment 4 Ayaz Abdulla 2007-09-26 14:20:46 UTC
After the tx timeout happens, please... :)
Comment 5 spiridonov ed 2007-09-27 22:52:43 UTC
Created attachment 12976 [details]
ethtool output after tx timeout happens
Comment 6 Ayaz Abdulla 2007-09-28 11:42:20 UTC
Interrupts have stopped working, as the interrupt count does not increment. The driver has set up the proper IRQ mask in the HW.

Can you send me the PCI config space of the device while it is in the failed state?
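A quick way to confirm that interrupts have stopped is to snapshot `/proc/interrupts` twice while traffic should be flowing and compare the counts for the NIC's line. A Python sketch, assuming the layout shown in Comment 3 (a CPU header row, then `IRQ: per-CPU counts, controller type, device names`); the helper names are hypothetical:

```python
def irq_counts(text, device):
    """Return the per-CPU counts for the /proc/interrupts line naming `device`."""
    lines = text.splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    for line in lines[1:]:
        _, _, rest = line.partition(":")
        # Device names are a comma-separated tail, e.g. "ehci_hcd:usb2, eth0"
        if device in rest.replace(",", " ").split():
            return [int(x) for x in rest.split()[:ncpus]]
    return None


def interrupts_stalled(before, after, device):
    """True if the total interrupt count for `device` did not increase."""
    a, b = irq_counts(before, device), irq_counts(after, device)
    if a is None or b is None:
        raise ValueError(f"{device} not found in /proc/interrupts")
    return sum(b) <= sum(a)
```

For example, read `/proc/interrupts` into `before`, generate traffic for a few seconds, read it into `after`, then call `interrupts_stalled(before, after, "eth0")`. Summing across CPUs matters because the IRQ may be serviced on either core.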
Comment 7 spiridonov ed 2007-09-28 12:40:34 UTC
Is the output of "lspci -vvvv -xxx -s 00:0a.0" enough?
I can send it after the timeout occurs again.

P.S. lspci output on my system:
00:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
00:01.0 ISA bridge: nVidia Corporation CK804 ISA Bridge (rev a3)
00:01.1 SMBus: nVidia Corporation CK804 SMBus (rev a2)
00:02.0 USB Controller: nVidia Corporation CK804 USB Controller (rev a2)
00:02.1 USB Controller: nVidia Corporation CK804 USB Controller (rev a3)
00:06.0 IDE interface: nVidia Corporation CK804 IDE (rev a2)
00:07.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev a3)
00:08.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev a3)
00:09.0 PCI bridge: nVidia Corporation CK804 PCI Bridge (rev a2)
00:0a.0 Bridge: nVidia Corporation CK804 Ethernet Controller (rev a3)
00:0b.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:0c.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:0d.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:06.0 FireWire (IEEE 1394): VIA Technologies, Inc. IEEE 1394 Host Controller (rev 80)
01:07.0 Communication controller: NetMos Technology PCI 9835 Multi-I/O Controller (rev 01)
01:08.0 Multimedia controller: Philips Semiconductors SAA7146 (rev 01)
01:09.0 RAID bus controller: 3ware Inc 7xxx/8xxx-series PATA/SATA-RAID (rev 01)
Comment 8 Ayaz Abdulla 2007-09-28 13:01:47 UTC
Yes, that command will do.
Comment 9 spiridonov ed 2007-09-28 13:40:10 UTC
After the bug occurred:
00:0a.0 Bridge: nVidia Corporation CK804 Ethernet Controller (rev a3)
        Subsystem: nVidia Corporation Unknown device cb84
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0 (250ns min, 5000ns max)
        Interrupt: pin A routed to IRQ 17
        Region 0: Memory at febfa000 (32-bit, non-prefetchable) [size=4K]
        Region 1: I/O ports at bc00 [size=8]
        Capabilities: [44] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 PME-Enable+ DSel=0 DScale=0 PME-
00: de 10 57 00 07 00 b8 00 a3 00 80 06 00 00 00 00
10: 00 a0 bf fe 01 bc 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 de 10 84 cb
30: 00 00 00 00 44 00 00 00 00 00 00 00 07 01 01 14
40: de 10 84 cb 01 00 02 fe 00 01 00 00 0a 00 00 10
50: 05 64 84 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 0f 00 00 00 08 00 02 a8 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 11 00 00 00 40 ff ff ff 04 2a 32 07

Diff against the lspci output from before the bug occurred:
 e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
-f0: 00 00 00 00 11 00 00 00 40 ff ff ff 12 27 32 07
+f0: 00 00 00 00 11 00 00 00 40 ff ff ff 04 2a 32 07
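Comparing config-space dumps by eye is error-prone; a byte-level diff pinpoints exactly which offsets changed. A Python sketch that parses `lspci -xxx` rows (the same `offset: xx xx ...` layout also fits `ethtool -d`), skips the descriptive `lspci -vvvv` lines, and reports `(offset, before, after)` tuples; the function names are hypothetical:

```python
import re

# A dump row: hex offset, colon, then whitespace-separated two-digit hex bytes
ROW_RE = re.compile(r"^\s*([0-9a-fA-F]+):((?:\s+[0-9a-fA-F]{2})+)\s*$")


def parse_hex_dump(text):
    """Map absolute byte offset -> byte value for every dump row in `text`."""
    bytes_at = {}
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if not m:
            continue  # e.g. "Control: I/O+ Mem+ ..." or the device header line
        base = int(m.group(1), 16)
        for i, tok in enumerate(m.group(2).split()):
            bytes_at[base + i] = int(tok, 16)
    return bytes_at


def diff_hex_dumps(before, after):
    """List (offset, old, new) for every byte that differs between two dumps."""
    a, b = parse_hex_dump(before), parse_hex_dump(after)
    return [(off, a.get(off), b.get(off))
            for off in sorted(set(a) | set(b))
            if a.get(off) != b.get(off)]
```

On the two `f0:` rows above, only offsets 0xfc and 0xfd differ.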
Comment 10 Ayaz Abdulla 2007-09-28 14:30:10 UTC
The diff at offset 0xf0 is normal.

Can you try booting the system with the boot options noapic acpi=off?
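To confirm which options a test boot actually used, the running kernel's command line is available in `/proc/cmdline`. A Python sketch that checks for both suggested options; it assumes simple space-separated tokens (quoted values containing spaces are not handled), and the helper names are hypothetical:

```python
def boot_options(cmdline):
    """Split a kernel command line into {option: value, or None for bare flags}."""
    opts = {}
    for tok in cmdline.split():
        key, sep, val = tok.partition("=")
        opts[key] = val if sep else None
    return opts


def has_noapic_acpi_off(cmdline):
    """True if both `noapic` and `acpi=off` are present on the command line."""
    opts = boot_options(cmdline)
    return "noapic" in opts and opts.get("acpi") == "off"
```

For example, `has_noapic_acpi_off(open("/proc/cmdline").read())` reports whether the current boot is running with the workaround.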
Comment 11 edo 2007-10-03 04:01:24 UTC
I cannot reproduce this bug with the "noapic acpi=off" boot options.
Comment 12 Natalie Protasevich 2008-02-11 08:40:48 UTC
Any update on this problem, please?
Thanks
Comment 13 Francois Cartegnie 2008-03-02 14:33:25 UTC
Still valid on 2.6.24.2
Comment 14 Francois Cartegnie 2008-03-06 09:00:36 UTC
ethtool register dump after the lockup:

Offset  Values
--------        -----
000:     72 00 00 00 ff 00 00 00 03 00 00 00 ca 03 ae 03
010:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
020:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
030:     01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
040:     0e e2 20 04 55 a4 00 00 20 2e 00 00 00 00 00 00
050:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
060:     00 00 00 00 00 00 00 00 00 00 00 00 ff ff 00 00
070:     ff ff 00 00 ff ff 00 00 ff ff 00 00 00 00 00 00
080:     3c 0f 3b 00 01 00 00 c0 00 00 00 00 20 00 7f 00
090:     1c 06 00 00 01 00 00 00 00 00 00 00 8d 2d 00 00
0a0:     0f 07 16 00 16 00 00 00 00 1a 4d 62 72 e8 00 00
0b0:     01 00 00 00 00 00 00 00 cd cc 00 9d 8d 16 00 00
0c0:     01 00 00 1c 01 00 00 00 01 00 00 00 01 00 00 00
0d0:     01 00 00 00 01 00 00 00 01 00 00 00 01 00 00 00
0e0:     01 00 00 00 01 00 00 00 01 00 00 00 01 00 00 00
0f0:     01 00 00 00 01 00 00 00 01 00 00 00 01 00 00 00
100:     00 28 4d 6e 00 20 4d 6e ff 00 7f 00 00 80 00 00
110:     64 00 01 00 00 00 00 00 38 00 00 00 00 2b 4d 6e
120:     40 25 4d 6e 40 44 df 69 eb ff 00 a0 10 f0 d7 57
130:     1c 06 00 80 0c 2b 4d 6e a0 24 4d 6e 00 80 e0 01
140:     20 41 30 00 00 22 c0 80 00 00 00 00 00 00 00 00
150:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
160:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
170:     80 00 ff 01 00 c0 00 00 00 00 00 00 00 00 00 00
180:     1e 00 00 00 08 00 00 00 6d 79 94 01 03 81 00 00
190:     2a 00 00 00 00 00 00 00 b0 00 00 00 b3 81 00 00
1a0:     1e 00 00 00 08 00 00 00 6d 79 94 01 03 81 00 00
1b0:     2a 00 00 00 00 00 00 00 b0 00 00 00 b3 81 00 00
1c0:     1e 00 00 00 08 00 00 00 6d 79 94 01 03 81 00 00
1d0:     2a 00 00 00 00 00 00 00 b0 00 00 00 b3 81 00 00
1e0:     1e 00 00 00 08 00 00 00 6d 79 94 01 03 81 00 00
1f0:     2a 00 00 00 00 00 00 00 b0 00 00 00 b3 81 00 00
200:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
210:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
220:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
230:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
240:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
250:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
260:     00 00 00 00 00 00 00 00 01 50 02 fe 00 01 00 00
270:     10 00 00 00 a1 00 00 00 11 50 02 fe a1 01 00 00
280:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
290:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2a0:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2b0:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2c0:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2d0:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2e0:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2f0:     00 00 00 00 01 00 00 00 01 00 00 00 01 00 00 00
300:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
310:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
320:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
330:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
340:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
350:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
360:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
370:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
380:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
390:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
3a0:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
3b0:     00 00 00 00 04 00 00 00 ff ff 00 00 ff ff 00 00
3c0:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
3d0:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
3e0:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
3f0:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
400:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
410:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
420:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
430:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
440:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
450:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
460:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
470:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
480:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
490:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
4a0:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
4b0:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
4c0:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
4d0:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
4e0:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
4f0:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
500:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
510:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
520:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
530:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
540:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
550:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
560:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
570:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
580:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
590:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
5a0:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
5b0:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
5c0:     00 00 06 00 ff ff 00 00 00 00 00 00 00 00 00 00
5d0:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
5e0:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
5f0:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
600:     00 00 00 00
Comment 15 Christoph Gysin 2008-03-07 00:35:54 UTC
Same problem here. I have a Realtek 8139 with linux-2.6.23-gentoo-r6. Sometimes I get the "NETDEV WATCHDOG" message, sometimes I don't. But either way, if I generate constant high traffic on eth0, it suddenly dies.

pci=noacpi seems to partly fix the problem for me. I still get the message:

NETDEV WATCHDOG: eth0: transmit timed out
eth0: link up, 100Mbps, full-duplex, lpa 0x41E1
NETDEV WATCHDOG: eth0: transmit timed out
eth0: link up, 100Mbps, full-duplex, lpa 0x41E1

But after that, the interface hangs for 5 seconds, then everything continues as before.

Let me know if you need more info about my configuration.

Before it happens:

$ ethtool -d eth0
RealTek RTL-8100B/8139D registers:
------------------------------
0x00: MAC Address                      00:03:2d:0b:75:24
0x08: Multicast Address Filter     0x80000000 0x00000000
0x10: Transmit Status Desc 0                  0x0008a042
0x14: Transmit Status Desc 1                  0x0008a072
0x18: Transmit Status Desc 2                  0x0008a042
0x1C: Transmit Status Desc 3                  0x0008a042
0x20: Transmit Start Addr  0                  0x1f280000
0x24: Transmit Start Addr  1                  0x1f280600
0x28: Transmit Start Addr  2                  0x1f280c00
0x2C: Transmit Start Addr  3                  0x1f281200
0x30: Rx buffer addr (C mode)                 0x1f290000
0x34: Early Rx Byte Count                              0
0x36: Early Rx Status                               0x0a
      ERxGood ERxOverWrite 
0x37: Command                                       0x0d
      Rx on, Tx on
0x38: Current Address of Packet Read (C mode)     0x593c
0x3A: Current Rx buffer address (C mode)          0x594c
0x3C: Interrupt Mask                              0xc07f
      SERR TimeOut RxFIFO LinkChg RxNoBuf TxErr TxOK RxErr RxOK 
0x3E: Interrupt Status                            0x0000
      
0x40: Tx Configuration                        0x77400680
0x44: Rx Configuration                        0x0000f78e
0x48: Timer count                             0x6d62d03c
0x4C: Missed packet counter                     0x000000
0x50: EEPROM Command                                0x00
0x51: Config 0                                      0x10
0x52: Config 1                                      0x8d
0x54: Timer interrupt                         0x00000000
0x58: Media status                                  0x10
0x59: Config 3                                      0xc5
0x5A: Config 4                                      0x88
0x5C: Multiple Interrupt Select                   0x0000
0x5E: PCI revision id                               0x10
0x60: Transmit Status of All Desc (C mode)        0xf00f
0x62: MII Basic Mode Control Register             0x1100
0x64: MII Basic Mode Status Register              0x782d
0x66: MII Autonegotiation Advertising             0x01e1
0x68: MII Link Partner Ability                    0x41e1
0x6A: MII Expansion                               0x0001
0x6C: MII Disconnect counter                      0x0000
0x6E: MII False carrier sense counter             0x0000
0x70: MII Nway test                               0x0704
0x72: MII RX_ER counter                           0x0000
0x74: MII CS configuration                        0x07c0
0x78: PHY parameter 1                         0x60f60c59
0x7C: Twister parameter                       0x7b732660
0x80: PHY parameter 2                               0x1a
0x84: PM CRC for wakeup frame 0                     0x00
0x85: PM CRC for wakeup frame 1                     0x00
0x86: PM CRC for wakeup frame 2                     0x00
0x87: PM CRC for wakeup frame 3                     0x00
0x88: PM CRC for wakeup frame 4                     0x00
0x89: PM CRC for wakeup frame 5                     0x00
0x8A: PM CRC for wakeup frame 6                     0x00
0x8B: PM CRC for wakeup frame 7                     0x00
0x8C: PM wakeup frame 0            0x80000000 0x00000000
0x94: PM wakeup frame 1            0x00000000 0x00000000
0x9C: PM wakeup frame 2            0x00000000 0x00000000
0xA4: PM wakeup frame 3            0x00000000 0x00000000
0xAC: PM wakeup frame 4            0x00000000 0x00000000
0xB4: PM wakeup frame 5            0x00800000 0x00000000
0xBC: PM wakeup frame 6            0x00000000 0x00000000
0xC4: PM wakeup frame 7            0x00020000 0x00000000
0xCC: PM LSB CRC for wakeup frame 0                 0x00
0xCD: PM LSB CRC for wakeup frame 1                 0x00
0xCE: PM LSB CRC for wakeup frame 2                 0x00
0xCF: PM LSB CRC for wakeup frame 3                 0x00
0xD0: PM LSB CRC for wakeup frame 4                 0x00
0xD1: PM LSB CRC for wakeup frame 5                 0x00
0xD2: PM LSB CRC for wakeup frame 6                 0x00
0xD3: PM LSB CRC for wakeup frame 7                 0x00
0xD8: Config 5                                      0x07

Diff against the same output after eth0 hangs:

$ diff -u ethtool-d.before ethtool-d.after 
--- ethtool-d.before	2008-03-07 09:21:00.695940591 +0100
+++ ethtool-d.after	2008-03-07 09:19:57.672609881 +0100
@@ -3,8 +3,8 @@
 ------------------------------
 0x00: MAC Address                      00:03:2d:0b:75:24
 0x08: Multicast Address Filter     0x80000000 0x00000000
-0x10: Transmit Status Desc 0                  0x0008a042
-0x14: Transmit Status Desc 1                  0x0008a072
+0x10: Transmit Status Desc 0                  0x0008a072
+0x14: Transmit Status Desc 1                  0x0008a042
 0x18: Transmit Status Desc 2                  0x0008a042
 0x1C: Transmit Status Desc 3                  0x0008a042
 0x20: Transmit Start Addr  0                  0x1f280000
@@ -12,21 +12,21 @@
 0x28: Transmit Start Addr  2                  0x1f280c00
 0x2C: Transmit Start Addr  3                  0x1f281200
 0x30: Rx buffer addr (C mode)                 0x1f290000
-0x34: Early Rx Byte Count                              0
+0x34: Early Rx Byte Count                             32
 0x36: Early Rx Status                               0x0a
       ERxGood ERxOverWrite 
-0x37: Command                                       0x0d
+0x37: Command                                       0x0c
       Rx on, Tx on
-0x38: Current Address of Packet Read (C mode)     0x593c
-0x3A: Current Rx buffer address (C mode)          0x594c
+0x38: Current Address of Packet Read (C mode)     0xbfa0
+0x3A: Current Rx buffer address (C mode)          0x3fac
 0x3C: Interrupt Mask                              0xc07f
       SERR TimeOut RxFIFO LinkChg RxNoBuf TxErr TxOK RxErr RxOK 
-0x3E: Interrupt Status                            0x0000
-      
+0x3E: Interrupt Status                            0x0051
+      RxFIFO RxNoBuf RxOK 
 0x40: Tx Configuration                        0x77400680
 0x44: Rx Configuration                        0x0000f78e
-0x48: Timer count                             0x6d62d03c
-0x4C: Missed packet counter                     0x000000
+0x48: Timer count                             0x686f2a80
+0x4C: Missed packet counter                     0x000031
 0x50: EEPROM Command                                0x00
 0x51: Config 0                                      0x10
 0x52: Config 1                                      0x8d
@@ -46,7 +46,7 @@
 0x6E: MII False carrier sense counter             0x0000
 0x70: MII Nway test                               0x0704
 0x72: MII RX_ER counter                           0x0000
-0x74: MII CS configuration                        0x07c0
+0x74: MII CS configuration                        0x07c8
 0x78: PHY parameter 1                         0x60f60c59
 0x7C: Twister parameter                       0x7b732660
 0x80: PHY parameter 2                               0x1a
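The diff above shows Interrupt Status going from 0x0000 to 0x0051 and Command going from 0x0d to 0x0c, while ethtool prints "Rx on, Tx on" for both. A small decoder makes the bit changes explicit. This is a sketch only. The interrupt bit positions are inferred from how `ethtool -d` decodes mask 0xc07f and status 0x0051 in this dump, assuming it lists names from high bit to low. The Command bits (Reset 0x10, RxEnb 0x08, TxEnb 0x04, RxBufEmpty 0x01) are assumed from the RTL8139 layout used by Linux's `8139too` driver. The function names are hypothetical.

```python
# Interrupt Mask (0x3C) / Interrupt Status (0x3E) bits, labelled as
# `ethtool -d` prints them. Positions are inferred from this dump,
# not taken from a datasheet.
INTR_BITS = {
    0x8000: "SERR",
    0x4000: "TimeOut",
    0x0040: "RxFIFO",
    0x0020: "LinkChg",
    0x0010: "RxNoBuf",
    0x0008: "TxErr",
    0x0004: "TxOK",
    0x0002: "RxErr",
    0x0001: "RxOK",
}

# Command register (0x37) bits; ASSUMED from the RTL8139 layout used by
# Linux's 8139too driver. Bit 0 set means the Rx buffer is empty.
CMD_BITS = {
    0x10: "Reset",
    0x08: "RxEnb",
    0x04: "TxEnb",
    0x01: "RxBufEmpty",
}


def decode(value, table):
    """Return the names of the bits set in `value`, in table order (high bit first)."""
    return [name for bit, name in table.items() if value & bit]


def decode_intr(value):
    return decode(value, INTR_BITS)


def decode_cmd(value):
    return decode(value, CMD_BITS)


if __name__ == "__main__":
    mask = 0xC07F
    for label, status in (("before", 0x0000), ("after", 0x0051)):
        # Pending interrupts that are also enabled in the mask.
        print(label, "IntrStatus:", decode_intr(status & mask))
    print("Command before:", decode_cmd(0x0D))
    print("Command after: ", decode_cmd(0x0C))
```

Under these assumptions, 0x0051 & 0xc07f is still 0x0051, so all three pending status bits (RxFIFO, RxNoBuf, RxOK) are enabled in the mask. The only Command bit that changed is RxBufEmpty, meaning the receive buffer held unread data after the hang.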
Comment 16 Christoph Gysin 2008-03-07 01:11:39 UTC
Hmm, alright, it seems that the boot option "irqpoll" fixes this for me. Haven't had the problem since. Does anybody know if this option might have any other drawbacks/side effects?
Comment 17 Francois Cartegnie 2008-03-12 16:55:12 UTC
It solves nothing for me; I have always been on irqpoll.
What's your kernel version? (Maybe we need to update the kernel version in the bug summary.)
Comment 18 Christoph Gysin 2008-03-13 01:35:25 UTC
As mentioned in #15, I'm using linux-2.6.23-gentoo-r6.
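Since comments 16 and 17 disagree about whether irqpoll helps, it may be worth confirming that the option is actually active on the running kernel. Linux exposes the boot command line in `/proc/cmdline`. A minimal sketch: `has_boot_option` is a hypothetical helper, and it treats the command line as whitespace-separated tokens, ignoring quoting.

```python
def has_boot_option(cmdline, option):
    """True if `option` appears as a bare token or as `option=...` on a kernel command line."""
    for token in cmdline.split():
        if token == option or token.startswith(option + "="):
            return True
    return False
```

For example, `has_boot_option(open("/proc/cmdline").read(), "irqpoll")` checks the running kernel.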
Comment 19 Jesse Morgan 2009-02-10 09:31:53 UTC
I'm also running into this on 2.6.24-gentoo-r4. The following messages repeated until the box was rebooted:

/var/log/messages:
Feb  9 21:54:03 xenon kernel: [1154047.642902] nv_stop_tx: TransmitterStatus remained busy<7>eth0: tx_timeout: dead entries!
Feb  9 22:03:24 xenon kernel: [1154607.776022] NETDEV WATCHDOG: eth0: transmit timed out
Feb  9 22:03:24 xenon kernel: [1154607.776027] eth0: Got tx_timeout. irq: 00000000
Feb  9 22:03:24 xenon kernel: [1154607.776028] eth0: Ring at 7b082000
Feb  9 22:03:24 xenon kernel: [1154607.776030] eth0: Dumping tx registers
...
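When a box logs these messages repeatedly until reboot, it can help to count how often the watchdog fired per interface. This is a minimal sketch that assumes the message form quoted above (`NETDEV WATCHDOG: <iface>: transmit timed out`). The wording differs across kernel versions, so the pattern only covers this form. `count_tx_timeouts` is a hypothetical helper, not part of any tool mentioned in this report.

```python
import re
from collections import Counter

# Matches the watchdog message as it appears in the 2.6.2x logs quoted
# in this report; other kernel versions word it differently.
WATCHDOG_RE = re.compile(r"NETDEV WATCHDOG: (\S+): transmit timed out")


def count_tx_timeouts(lines):
    """Count 'transmit timed out' watchdog events per interface."""
    counts = Counter()
    for line in lines:
        m = WATCHDOG_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts
```

For example, `count_tx_timeouts(open("/var/log/messages", errors="replace"))` returns a `Counter` keyed by interface name.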

From lspci -vvv:
00:07.0 Bridge: nVidia Corporation MCP61 Ethernet (rev a2)
        Subsystem: Giga-byte Technology Device e000
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0 (250ns min, 5000ns max)
        Interrupt: pin A routed to IRQ 23
        Region 0: Memory at fb006000 (32-bit, non-prefetchable) [size=4K]
        Region 1: I/O ports at cc00 [size=8]
        Capabilities: [44] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 PME-Enable+ DSel=0 DScale=0 PME-
        Capabilities: [50] Message Signalled Interrupts: Mask+ 64bit+ Count=1/8 Enable-
                Address: 0000000000000000  Data: 0000
                Masking: 00000000  Pending: 00000000
        Capabilities: [6c] HyperTransport: MSI Mapping Enable- Fixed+
        Kernel driver in use: forcedeth
Comment 20 Francois Cartegnie 2009-02-19 17:19:18 UTC
I no longer have this problem with 2.6.27.7.
