Latest working kernel version: 2.6.25.4
Earliest failing kernel version: 2.6.22.10
Distribution: modified Rock linux 2.0.2
Hardware Environment: Sun Netra X4200 M2 Server, CK804 Ethernet Controller (rev a3)
Software Environment: Any recent kernel.org kernel, either 32 bit or 64 bit

Problem Description:

The Nvidia CK48 chipset N2200 chip has an integrated NIC that hangs under specific conditions. The hang completely prevents the NIC from sending or receiving packets. The conditions are as follows:

1.) Configure the NIC for 100 Mb with autoneg on.
2.) Connect the NIC to a managed switch port configured for 100 Mb with autoneg on.
3.) Boot up the server.

Under these conditions, the NIC will always link at 100 Mb half duplex, while the switch will link at 100 Mb full duplex. This is a bug in itself, but it isn't the failure mode. (Note that this NIC can be forced to 100 Mb full duplex by configuring it as such with ethtool.) This link mismatch is a necessary condition to reproduce the NIC hang.

4.) The NIC can fail in normal service. It can also fail at boot time link negotiation. The boot time failure is the easiest to reproduce.

Feb 1 22:31:06 dut kernel: nv_stop_tx: TransmitterStatus remained busy<6>NETDEV WATCHDOG: eth2: transmit timed out
Feb 1 22:31:06 dut kernel: eth2: Got tx_timeout. irq: 00000020
Feb 1 22:31:06 dut kernel: eth2: Ring at 21ebd8000
Feb 1 22:31:06 dut kernel: eth2: Dumping tx registers
Feb 1 22:31:06 dut kernel: 0: 00000020 00000000 00000003 009803ca 00000000 00000000 00000000 00000000
Feb 1 22:31:06 dut kernel: 20: 00000014 4f9e6480 00000000 00000000 00000000 00000000 00000000 00000000
Feb 1 22:31:06 dut kernel: 40: 0420e20e 0000a855 00002e20 00000000 00000000 00000000 00000000 00000000
Feb 1 22:31:06 dut kernel: 60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Feb 1 22:31:06 dut kernel: 80: 003b0f3e 00000000 00000001 007f0020 0000061c 00000000 00000000 00002d5f
Feb 1 22:31:06 dut kernel: a0: 0016070f 00000016 9e4f1400 00008064 00000001 00000000 6100cccd 0000049b
Feb 1 22:31:06 dut kernel: c0: 10000101 00000001 00000001 00000001 00000001 00000001 00000001 00000001
Feb 1 22:31:06 dut kernel: e0: 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001
Feb 1 22:31:06 dut kernel: 100: 1ebd8800 1ebd8000 007f00ff 00008000 00000000 00000000 0000005f 1ebd8ae0
Feb 1 22:31:06 dut kernel: 120: 1ebd80e0 1fb1f440 ac0000ea 00000000 00000000 1ebd880c 1ebd800c 01e08000
Feb 1 22:31:06 dut kernel: 140: 00304120 c000260c 00000002 00000002 00000000 00000000 00000000 00000000
Feb 1 22:31:06 dut kernel: 160: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Feb 1 22:31:06 dut kernel: 180: 00000016 00000008 0194796d 00008103 00000021 0000796d 0194000d 0000000f
Feb 1 22:31:06 dut kernel: 1a0: 00000016 00000008 0194796d 00008103 00000021 0000796d 0194000d 0000000f
Feb 1 22:31:06 dut kernel: 1c0: 00000016 00000008 0194796d 00008103 00000021 0000796d 0194000d 0000000f
Feb 1 22:31:06 dut kernel: 1e0: 00000016 00000008 0194796d 00008103 00000021 0000796d 0194000d 0000000f
Feb 1 22:31:06 dut kernel: 200: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Feb 1 22:31:06 dut kernel: 220: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Feb 1 22:31:06 dut kernel: 240: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Feb 1 22:31:06 dut kernel: 260: 00000000 00000000 fe020001 00000100 00000000 00000000 7e020001 00000100
Feb 1 22:31:07 dut kernel: 280: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 2a0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 2c0: 00000000 00000000 00000000 00000000 00000000 00000001 00000001 00000001
Feb 1 22:31:07 dut kernel: eth2: Dumping tx ring
Feb 1 22:31:07 dut kernel: 000: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 004: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 008: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 00c: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 010: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 014: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 018: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 01c: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 020: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 024: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 028: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 02c: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 030: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 034: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 038: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 03c: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 040: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 044: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 048: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 04c: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 050: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 054: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 058: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 05c: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 060: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 064: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 068: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 06c: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 070: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 074: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 078: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 07c: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 080: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 084: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 088: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 08c: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 090: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 094: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 098: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 09c: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 0a0: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 0a4: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 0a8: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 0ac: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 0b0: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 0b4: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 0b8: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 0bc: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 0c0: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 0c4: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 0c8: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 0cc: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 0d0: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 0d4: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 0d8: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 0dc: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:07 dut kernel: 0e0: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:08 dut kernel: 0e4: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:08 dut kernel: 0e8: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:08 dut kernel: 0ec: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:08 dut kernel: 0f0: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:08 dut kernel: 0f4: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:08 dut kernel: 0f8: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:31:08 dut kernel: 0fc: 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000 // 00000000 00000000 00000000
Feb 1 22:35:48 dut init: Switching to runlevel: 0
rebooting...
Feb 1 22:48:50 dut kernel: eth2: forcedeth.c: subsystem: 010de:cb84 bound to 0000:00:0a.0
shutting down.
Feb 2 19:36:02 dut kernel: nv_stop_tx: TransmitterStatus remained busy<6>nv_stop_tx: TransmitterStatus remained busy<5>audit(1202009762.086:103): audit_pid=0 old=2625 by auid=4294967295

Note: a soft reboot did not clear the failure.

Steps to reproduce:

The NIC must be in 100 Mb half duplex mode while the switch port is in 100 Mb full duplex mode. The quickest way to recreate this failure is to write a boot script that repeatedly reboots the server and terminates the reboot sequence once the NIC has failed.

After configuring the NIC as described above, and with a sustained packet rate of about 3000 packets per second, run a reboot test on the Sun X4200 M2 server. The reboot test simply checks whether the NIC is linked and running after the boot. If it is working OK, then the reboot test reboots the server.
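For anyone trying to reproduce this, a minimal sketch of one iteration of such a reboot test is shown here. It is not the script used for this report. The function names, the `SYSFS_NET` and `REBOOT_CMD` variables, and the use of the sysfs `carrier` file as the "linked and running" check are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical reboot-test step; all names here are illustrative assumptions.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"   # overridable so the logic can be dry-run
REBOOT_CMD="${REBOOT_CMD:-reboot}"         # overridable so the logic can be dry-run

# Succeeds when the kernel reports carrier on the given interface.
# A missing or unreadable carrier file is treated as "no link".
nic_link_up() {
    carrier=$(cat "$SYSFS_NET/$1/carrier" 2>/dev/null) || return 1
    [ "$carrier" = "1" ]
}

# One iteration: reboot while the NIC still links, stop once it has failed.
reboot_test_step() {
    if nic_link_up "$1"; then
        echo "$1: link up, rebooting"
        $REBOOT_CMD
    else
        echo "$1: no link, leaving the system up for inspection"
        return 1
    fi
}

# Example, called from a late boot script once traffic is flowing:
#   reboot_test_step eth2
```

Stopping instead of rebooting when the link check fails leaves the hung NIC in place so that the register dump and the ethtool selftest can be collected.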
My reboot test is quite effective at inducing this failure. Many failures occur within a couple of hours. The longest I have seen this test run without failure is about 7 hours.

The failure can be definitively identified by running the ethtool offline selftest on the NIC. This test always fails once the NIC has failed. Note that this is at 100 Mb: I am aware that the ethtool selftest always fails when the NIC is configured for 1000 Mb. Also note that the failure I am describing does not occur when the CK804 NIC is configured for 1000 Mb.

Once the NIC fails, it is fatally hung. A warm reboot will not clear this failure. I have learned that I can clear the failure with the following sequence: power the appliance off, then power it up, then power it down again immediately after the power is on and the BIOS POST is starting to execute. Then power it up once more and let the X4200 M2 server boot all the way this time. The NIC will have the error cleared.

Note that this failure mode also reproduces with RHEL 5.1 kernels on either 32 or 64 bit installs. The failure is always the same with any of the kernels mentioned in this bug report. I did the extra testing with the RHEL kernels because Sun Microsystems required it, even though the production kernels for the product are kernel.org kernels running on Rock-2.0.2 distributions.
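The two checks mentioned in this report (the duplex mismatch and the offline selftest) can be scripted roughly as follows. This is only a sketch: the helper names are made up, and the strings being matched ("The test result is FAIL", "Speed:", "Duplex:") are assumed from the output of common ethtool versions and may differ on other builds. The helpers read ethtool output on stdin, so they can be tried on saved output as well.

```shell
#!/bin/sh
# Hypothetical helpers; names and matched strings are assumptions.

# Reads `ethtool -t IFACE offline` output on stdin.
# Succeeds when the selftest reported a failure.
selftest_failed() {
    grep -q 'The test result is FAIL'
}

# Reads `ethtool IFACE` output on stdin and prints speed and duplex,
# for example "100Mb/s Half".
link_mode() {
    awk -F': *' '$1 ~ /Speed$/ {s=$2} $1 ~ /Duplex$/ {d=$2} END {print s, d}'
}

# Example use on the affected interface (eth2 in the logs above):
#   ethtool eth2 | link_mode
#   ethtool -t eth2 offline | selftest_failed && echo "eth2 is hung"
# Forcing full duplex to avoid the mismatch:
#   ethtool -s eth2 speed 100 duplex full autoneg off
```

Since the selftest is reported to always fail at 1000 Mb, the result is only meaningful when `link_mode` shows a 100 Mb link.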
More info:

# lspci
00:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
00:01.0 ISA bridge: nVidia Corporation CK804 ISA Bridge (rev a3)
00:01.1 SMBus: nVidia Corporation CK804 SMBus (rev a2)
00:02.0 USB Controller: nVidia Corporation CK804 USB Controller (rev a2)
00:02.1 USB Controller: nVidia Corporation CK804 USB Controller (rev a3)
00:06.0 IDE interface: nVidia Corporation CK804 IDE (rev f2)
00:09.0 PCI bridge: nVidia Corporation CK804 PCI Bridge (rev a2)
00:0a.0 Bridge: nVidia Corporation CK804 Ethernet Controller (rev a3)
00:0b.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:0c.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:0d.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:03.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
04:00.0 PCI bridge: Intel Corporation 41210 [Lanai] Serial to Parallel PCI Bridge (A-Segment Bridge) (rev 09)
04:00.2 PCI bridge: Intel Corporation 41210 [Lanai] Serial to Parallel PCI Bridge (B-Segment Bridge) (rev 09)
80:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
80:01.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
80:0a.0 Bridge: nVidia Corporation CK804 Ethernet Controller (rev a3)
80:0b.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
80:0c.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
80:0d.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
80:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
80:10.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8132 PCI-X Bridge (rev 12)
80:10.1 PIC: Advanced Micro Devices [AMD] AMD-8132 PCI-X IOAPIC (rev 12)
80:11.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8132 PCI-X Bridge (rev 12)
80:11.1 PIC: Advanced Micro Devices [AMD] AMD-8132 PCI-X IOAPIC (rev 12)
83:00.0 PCI bridge: Intel Corporation 41210 [Lanai] Serial to Parallel PCI Bridge (A-Segment Bridge) (rev 09)
83:00.2 PCI bridge: Intel Corporation 41210 [Lanai] Serial to Parallel PCI Bridge (B-Segment Bridge) (rev 09)
8e:01.0 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 03)
8e:01.1 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 03)
8e:02.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064 PCI-X Fusion-MPT SAS (rev 02)

# lspci -t
-+-[0000:80]-+-00.0
 |           +-01.0
 |           +-0a.0
 |           +-0b.0-[0000:81]--
 |           +-0c.0-[0000:82]--
 |           +-0d.0-[0000:83-85]--+-00.0-[0000:85]--
 |           |                    \-00.2-[0000:84]--
 |           +-0e.0-[0000:86]--
 |           +-10.0-[0000:87]--
 |           +-10.1
 |           +-11.0-[0000:8e]--+-01.0
 |           |                 +-01.1
 |           |                 \-02.0
 |           \-11.1
 \-[0000:00]-+-00.0
             +-01.0
             +-01.1
             +-02.0
             +-02.1
             +-06.0
             +-09.0-[0000:01]----03.0
             +-0a.0
             +-0b.0-[0000:02]--
             +-0c.0-[0000:03]--
             +-0d.0-[0000:04-06]--+-00.0-[0000:06]--
             |                    \-00.2-[0000:05]--
             +-0e.0-[0000:07]--
             +-18.0
             +-18.1
             +-18.2
             +-18.3
             +-19.0
             +-19.1
             +-19.2
             \-19.3

# lspci -v
00:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
        Flags: bus master, 66MHz, fast devsel, latency 0
        Capabilities: [44] HyperTransport: Slave or Primary Interface
        Capabilities: [e0] HyperTransport: MSI Mapping

00:01.0 ISA bridge: nVidia Corporation CK804 ISA Bridge (rev a3)
        Subsystem: nVidia Corporation Unknown device cb84
        Flags: bus master, 66MHz, fast devsel, latency 0

00:01.1 SMBus: nVidia Corporation CK804 SMBus (rev a2)
        Subsystem: nVidia Corporation Unknown device cb84
        Flags: 66MHz, fast devsel
        I/O ports at 2800 [size=32]
        I/O ports at 0400 [size=64]
        I/O ports at 0440 [size=64]
        Capabilities: [44] Power Management version 2

00:02.0 USB Controller: nVidia Corporation CK804 USB Controller (rev a2) (prog-if 10 [OHCI])
        Subsystem: Sun Microsystems Computer Corp. Unknown device cb84
        Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 58
        Memory at fe3ff000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: [44] Power Management version 2

00:02.1 USB Controller: nVidia Corporation CK804 USB Controller (rev a3) (prog-if 20 [EHCI])
        Subsystem: Sun Microsystems Computer Corp. Unknown device cb84
        Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 66
        Memory at fe3fec00 (32-bit, non-prefetchable) [size=256]
        Capabilities: [44] Debug port
        Capabilities: [80] Power Management version 2

00:06.0 IDE interface: nVidia Corporation CK804 IDE (rev f2) (prog-if 8a [Master SecP PriP])
        Subsystem: nVidia Corporation Unknown device cb84
        Flags: bus master, 66MHz, fast devsel, latency 0
        I/O ports at 2000 [size=16]
        Capabilities: [44] Power Management version 2

00:09.0 PCI bridge: nVidia Corporation CK804 PCI Bridge (rev a2) (prog-if 01 [Subtractive decode])
        Flags: bus master, 66MHz, fast devsel, latency 0
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=128
        I/O behind bridge: 0000c000-0000cfff
        Memory behind bridge: fc200000-fe2fffff
        Prefetchable memory behind bridge: e2000000-e20fffff

00:0a.0 Bridge: nVidia Corporation CK804 Ethernet Controller (rev a3)
        Subsystem: nVidia Corporation Unknown device cb84
        Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 90
        Memory at fe3fd000 (32-bit, non-prefetchable) [size=4K]
        I/O ports at dc00 [size=8]
        Capabilities: [44] Power Management version 2

00:0b.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
        Capabilities: [40] Power Management version 2
        Capabilities: [48] Message Signalled Interrupts: 64bit+ Queue=0/1 Enable+
        Capabilities: [58] HyperTransport: MSI Mapping
        Capabilities: [80] Express Root Port (Slot+) IRQ 0

00:0c.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
        Capabilities: [40] Power Management version 2
        Capabilities: [48] Message Signalled Interrupts: 64bit+ Queue=0/1 Enable+
        Capabilities: [58] HyperTransport: MSI Mapping
        Capabilities: [80] Express Root Port (Slot+) IRQ 0

00:0d.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=00, secondary=04, subordinate=06, sec-latency=0
        Capabilities: [40] Power Management version 2
        Capabilities: [48] Message Signalled Interrupts: 64bit+ Queue=0/1 Enable+
        Capabilities: [58] HyperTransport: MSI Mapping
        Capabilities: [80] Express Root Port (Slot+) IRQ 0

00:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=00, secondary=07, subordinate=07, sec-latency=0
        Capabilities: [40] Power Management version 2
        Capabilities: [48] Message Signalled Interrupts: 64bit+ Queue=0/1 Enable+
        Capabilities: [58] HyperTransport: MSI Mapping
        Capabilities: [80] Express Root Port (Slot+) IRQ 0

00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
        Flags: fast devsel
        Capabilities: [80] HyperTransport: Host or Secondary Interface
        Capabilities: [a0] HyperTransport: Host or Secondary Interface
        Capabilities: [c0] HyperTransport: Host or Secondary Interface

00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
        Flags: fast devsel

00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
        Flags: fast devsel

00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
        Flags: fast devsel
        Capabilities: [f0] #0f [0010]

00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
        Flags: fast devsel
        Capabilities: [80] HyperTransport: Host or Secondary Interface
        Capabilities: [a0] HyperTransport: Host or Secondary Interface
        Capabilities: [c0] HyperTransport: Host or Secondary Interface

00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
        Flags: fast devsel

00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
        Flags: fast devsel

00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
        Flags: fast devsel
        Capabilities: [f0] #0f [0010]

01:03.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27) (prog-if 00 [VGA])
        Subsystem: ATI Technologies Inc Rage XL
        Flags: bus master, stepping, medium devsel, latency 64, IRQ 10
        Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
        I/O ports at c800 [size=256]
        Memory at fe2ff000 (32-bit, non-prefetchable) [size=4K]
        Expansion ROM at e2000000 [disabled] [size=128K]
        Capabilities: [5c] Power Management version 2

04:00.0 PCI bridge: Intel Corporation 41210 [Lanai] Serial to Parallel PCI Bridge (A-Segment Bridge) (rev 09) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=04, secondary=06, subordinate=06, sec-latency=64
        Capabilities: [44] Express PCI/PCI-X Bridge IRQ 0
        Capabilities: [5c] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
        Capabilities: [6c] Power Management version 2
        Capabilities: [d8] PCI-X bridge device

04:00.2 PCI bridge: Intel Corporation 41210 [Lanai] Serial to Parallel PCI Bridge (B-Segment Bridge) (rev 09) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=04, secondary=05, subordinate=05, sec-latency=64
        Capabilities: [44] Express PCI/PCI-X Bridge IRQ 0
        Capabilities: [5c] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
        Capabilities: [6c] Power Management version 2
        Capabilities: [d8] PCI-X bridge device

80:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
        Flags: bus master, 66MHz, fast devsel, latency 0
        Capabilities: [44] HyperTransport: Slave or Primary Interface
        Capabilities: [e0] HyperTransport: MSI Mapping

80:01.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
        Subsystem: nVidia Corporation Unknown device cb84
        Flags: bus master, 66MHz, fast devsel, latency 0
        Memory at feaff000 (32-bit, non-prefetchable) [size=4K]

80:0a.0 Bridge: nVidia Corporation CK804 Ethernet Controller (rev a3)
        Subsystem: nVidia Corporation Unknown device cb84
        Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 106
        Memory at feafe000 (32-bit, non-prefetchable) [size=4K]
        I/O ports at fc00 [size=8]
        Capabilities: [44] Power Management version 2

80:0b.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=80, secondary=81, subordinate=81, sec-latency=0
        Capabilities: [40] Power Management version 2
        Capabilities: [48] Message Signalled Interrupts: 64bit+ Queue=0/1 Enable+
        Capabilities: [58] HyperTransport: MSI Mapping
        Capabilities: [80] Express Root Port (Slot+) IRQ 0

80:0c.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=80, secondary=82, subordinate=82, sec-latency=0
        Capabilities: [40] Power Management version 2
        Capabilities: [48] Message Signalled Interrupts: 64bit+ Queue=0/1 Enable+
        Capabilities: [58] HyperTransport: MSI Mapping
        Capabilities: [80] Express Root Port (Slot+) IRQ 0

80:0d.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=80, secondary=83, subordinate=85, sec-latency=0
        Capabilities: [40] Power Management version 2
        Capabilities: [48] Message Signalled Interrupts: 64bit+ Queue=0/1 Enable+
        Capabilities: [58] HyperTransport: MSI Mapping
        Capabilities: [80] Express Root Port (Slot+) IRQ 0

80:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=80, secondary=86, subordinate=86, sec-latency=0
        Capabilities: [40] Power Management version 2
        Capabilities: [48] Message Signalled Interrupts: 64bit+ Queue=0/1 Enable+
        Capabilities: [58] HyperTransport: MSI Mapping
        Capabilities: [80] Express Root Port (Slot+) IRQ 0

80:10.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8132 PCI-X Bridge (rev 12) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 64
        Bus: primary=80, secondary=87, subordinate=87, sec-latency=64
        Capabilities: [60] PCI-X bridge device
        Capabilities: [b8] HyperTransport: Interrupt Discovery and Configuration
        Capabilities: [c0] HyperTransport: Slave or Primary Interface
        Capabilities: [f4] HyperTransport: MSI Mapping

80:10.1 PIC: Advanced Micro Devices [AMD] AMD-8132 PCI-X IOAPIC (rev 12) (prog-if 10 [IO-APIC])
        Subsystem: Advanced Micro Devices [AMD] AMD-8132 PCI-X IOAPIC
        Flags: bus master, medium devsel, latency 0
        Memory at feafd000 (64-bit, non-prefetchable) [size=4K]

80:11.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8132 PCI-X Bridge (rev 12) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 64
        Bus: primary=80, secondary=8e, subordinate=8e, sec-latency=64
        I/O behind bridge: 0000e000-0000efff
        Memory behind bridge: fe500000-fe9fffff
        Capabilities: [60] PCI-X bridge device
        Capabilities: [b8] HyperTransport: Interrupt Discovery and Configuration
        Capabilities: [c0] HyperTransport: Revision ID: 2.00
        Capabilities: [f4] HyperTransport: MSI Mapping

80:11.1 PIC: Advanced Micro Devices [AMD] AMD-8132 PCI-X IOAPIC (rev 12) (prog-if 10 [IO-APIC])
        Subsystem: Advanced Micro Devices [AMD] AMD-8132 PCI-X IOAPIC
        Flags: bus master, medium devsel, latency 0
        Memory at feafc000 (64-bit, non-prefetchable) [size=4K]

83:00.0 PCI bridge: Intel Corporation 41210 [Lanai] Serial to Parallel PCI Bridge (A-Segment Bridge) (rev 09) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=83, secondary=85, subordinate=85, sec-latency=64
        Capabilities: [44] Express PCI/PCI-X Bridge IRQ 0
        Capabilities: [5c] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
        Capabilities: [6c] Power Management version 2
        Capabilities: [d8] PCI-X bridge device

83:00.2 PCI bridge: Intel Corporation 41210 [Lanai] Serial to Parallel PCI Bridge (B-Segment Bridge) (rev 09) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=83, secondary=84, subordinate=84, sec-latency=64
        Capabilities: [44] Express PCI/PCI-X Bridge IRQ 0
        Capabilities: [5c] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
        Capabilities: [6c] Power Management version 2
        Capabilities: [d8] PCI-X bridge device

8e:01.0 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 03)
        Subsystem: Intel Corporation PRO/1000 MT Dual Port Server Adapter
        Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 82
        Memory at fe9e0000 (64-bit, non-prefetchable) [size=128K]
        I/O ports at ec00 [size=64]
        Capabilities: [dc] Power Management version 2
        Capabilities: [e4] PCI-X non-bridge device
        Capabilities: [f0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-

8e:01.1 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 03)
        Subsystem: Intel Corporation PRO/1000 MT Dual Port Server Adapter
        Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 98
        Memory at fe9c0000 (64-bit, non-prefetchable) [size=128K]
        I/O ports at e800 [size=64]
        Capabilities: [dc] Power Management version 2
        Capabilities: [e4] PCI-X non-bridge device
        Capabilities: [f0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-

8e:02.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064 PCI-X Fusion-MPT SAS (rev 02)
        Subsystem: LSI Logic / Symbios Logic Unknown device 3060
        Flags: bus master, 66MHz, medium devsel, latency 72, IRQ 74
        I/O ports at e400 [disabled] [size=256]
        Memory at fe9bc000 (64-bit, non-prefetchable) [size=16K]
        Memory at fe9a0000 (64-bit, non-prefetchable) [size=64K]
        Expansion ROM at fe600000 [disabled] [size=2M]
        Capabilities: [50] Power Management version 2
        Capabilities: [98] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
        Capabilities: [68] PCI-X non-bridge device
        Capabilities: [b0] MSI-X: Enable- Mask- TabSize=1
One more detail. I personally reproduced the NIC link mismatch and the NIC TX failure many times in the lab on 5 different Netra servers, using quite a few different kernel.org and RHEL kernels. It happened on every kernel I tested, in both 32-bit and 64-bit builds. It was also reproduced several times in a data center on the 2.6.22.10 kernel, completely independently of me. For more details, please see bug ID 10885.

I also need to clarify that the last kernel I tested this on was actually linux-2.6.24-rc8-git6. I misstated in the bug that I observed this failure on the 2.6.25.4 kernel. That is inaccurate. I don't know whether the problem exists in the 2.6.25.4 kernel, because I never tested that kernel. My error.
Could you test how that NIC behaves with either "irqpoll" or "noapic acpi=off"? (See http://bugzilla.kernel.org/show_bug.cgi?id=9015 - maybe related.)
Can you try the latest ethtool? I recall there was an issue with older ethtool versions that would not send the correct settings down to the NIC driver.
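One way to cross-check what the driver actually ended up with, independent of the ethtool binary, is to read the link state the kernel reports. The following is only a sketch: it assumes a kernel recent enough to expose the `speed` and `duplex` attributes under /sys/class/net/<iface>/ (the kernels discussed in this bug may predate them), and the helper name `read_link_state` is made up for illustration.

```python
from pathlib import Path


def read_link_state(iface, sysfs_root="/sys/class/net"):
    """Return (speed_mbps, duplex) as reported by the kernel.

    Either element is None when the attribute is missing or unreadable
    (for example, when the link is down or the kernel does not expose it).
    """
    base = Path(sysfs_root) / iface

    def read(name):
        try:
            return (base / name).read_text().strip()
        except OSError:
            return None

    speed = read("speed")
    duplex = read("duplex")
    # The speed attribute can be non-numeric or -1 when unknown.
    if speed is not None and speed.isdigit():
        speed_mbps = int(speed)
    else:
        speed_mbps = None
    return speed_mbps, duplex


if __name__ == "__main__":
    # "eth2" is the interface name seen in the log above; adjust as needed.
    print(read_link_state("eth2"))
```

If this reports half duplex while the switch port shows full duplex, the mismatch is present regardless of what the ethtool version in use claims to have configured.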
For info, the problem also appears with recent RHEL4 kernels (2.6.9-78), and maybe earlier ones. forcedeth module versions 0.60 and 0.61 have the problem.

The forcedeth module fails a few seconds into big transfers IF a static configuration is set on the switch (no autoneg, full duplex) and autoneg is set on the forcedeth card. In this case, duplex cannot be negotiated and the NIC falls back to 100 Mb half duplex. In other cases, transfers complete without problems. Once a transfer has stalled, no more traffic can pass and the network must be restarted.

dmesg:
../..
forcedeth: Reverse Engineered nForce ethernet driver. Version 0.61.
ACPI: PCI Interrupt 0000:00:14.0[A] -> GSI 21 (level, low) -> IRQ 201
PCI: Setting latency timer of device 0000:00:14.0 to 64
divert: allocating divert_blk for eth0
../..
forcedeth 0000:00:14.0: ifname eth0, PHY OUI 0x1c1 @ 0, addr 00:19:db:44:b6:b8
forcedeth 0000:00:14.0: highdma pwrctl timirq gbit lnktim desc-v3
../..
nv_stop_tx: TransmitterStatus remained busy<6>eth0: link down.
nv_stop_tx: TransmitterStatus remained busy<6>eth0: link up.
nv_stop_tx: TransmitterStatus remained busy
../..

As requested, I tried the "irqpoll" and "noapic acpi=off" options; they change nothing.

HW config: NEC POWERMATE_VL360 C51MCP51 AMD3800+
00:14.0 Bridge: nVidia Corporation MCP51 Ethernet Controller (rev a3)

Behaviour with different HW: the problem does not appear with the TG3 and E1000 modules.

PS: restarting the network requires the MACADDRESS field to be set to the HWADDRESS value in the ifcfg-eth0 config file (RHEL^h^h^h^hLSB ;-)) to avoid the reverse numbering address problem.
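The fallback described above (forced full duplex on the switch, autoneg on the NIC, NIC ends up at half duplex) matches standard 802.3 behaviour: a port that autonegotiates against a partner that does not can only detect the speed and has to assume half duplex. Below is a small model of that rule, as a sketch only. The function names are hypothetical, and it assumes both ends advertise full duplex whenever they autonegotiate. It does not explain the original report, where both ends had autoneg on and still disagreed.

```python
def resolve_duplex(side_a, side_b):
    """Model the duplex each end of a link settles on.

    Each side is "auto" (autonegotiation on), "full" or "half" (forced).
    Returns (duplex_a, duplex_b).
    """
    def outcome(me, peer):
        if me != "auto":
            # A forced port simply uses its configured duplex.
            return me
        # An autonegotiating port gets full duplex only if the peer also
        # autonegotiates; against a forced peer it falls back to half duplex.
        return "full" if peer == "auto" else "half"

    return outcome(side_a, side_b), outcome(side_b, side_a)


def is_duplex_mismatch(side_a, side_b):
    """True when the two ends of the link end up with different duplex."""
    duplex_a, duplex_b = resolve_duplex(side_a, side_b)
    return duplex_a != duplex_b


if __name__ == "__main__":
    # NIC on autoneg, switch forced to full duplex: the case described above.
    print(resolve_duplex("auto", "full"))
    print(is_duplex_mismatch("auto", "full"))
```

In this model the only mismatching combinations are the ones where one end is on autoneg and the other is forced to full duplex, which is the configuration that triggers the stall here.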
Please re-open if seen on a modern (2.6.32+) kernel