Most recent kernel where this bug did not occur: none
Distribution: Debian Sarge with latest kernel
Hardware Environment: Compaq nc6000
Software Environment:
Problem Description:
The output engine of the tg3 driver freezes under heavy load.

`ifconfig' shows incoming packets, but the outgoing counter is no longer incremented.

Resetting the device (ifdown eth0, ifup eth0) fixes the problem.

Steps to reproduce:
Copy large amounts of files to an NFS-mounted disk.
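The symptom above (the receive counter keeps rising while the transmit counter stays frozen) can also be checked programmatically instead of by watching `ifconfig`. This is a minimal Python sketch, assuming a Linux system that exposes per-interface counters under `/sys/class/net/<iface>/statistics/`. `read_counters` and `tx_stalled` are hypothetical helper names, not part of tg3 or any existing tool.

```python
from pathlib import Path
from typing import Tuple

Counters = Tuple[int, int]  # (rx_packets, tx_packets)


def read_counters(iface: str, root: str = "/sys/class/net") -> Counters:
    """Read the RX and TX packet counters of `iface` from sysfs (assumed layout)."""
    stats = Path(root) / iface / "statistics"
    rx = int((stats / "rx_packets").read_text())
    tx = int((stats / "tx_packets").read_text())
    return rx, tx


def tx_stalled(before: Counters, after: Counters) -> bool:
    """True if packets kept arriving but none were sent between two samples."""
    rx_before, tx_before = before
    rx_after, tx_after = after
    return rx_after > rx_before and tx_after == tx_before
```

For example, take one sample with `read_counters("eth0")`, wait a few seconds while the NFS copy is running, take a second sample, and pass both to `tx_stalled`. A `True` result matches the failure mode reported here. A lightly loaded link can also receive broadcasts without sending anything, so the check is only meaningful while traffic is actually being generated.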
On Fri, 2 Jun 2006 05:40:51 -0700 bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=6638
>
>            Summary: tg3 output freezes on compaq nc6000
>     Kernel Version: 2.6.16.19
>             Status: NEW
>           Severity: normal
>              Owner: jgarzik@pobox.com
>          Submitter: Klaus.Reichl@alcatel.at
Reply-To: mchan@broadcom.com

On Fri, 2006-06-02 at 11:22 -0700, Andrew Morton wrote:
> > The output engine of the tg3 driver freezes when generating high load.
> >
> > `ifconfig' shows incomming packets, however, outgoing counter is not incremented
> > any more.

Please provide:
1. the tg3 probing output during ifconfig up;
2. /proc/interrupts output, to see whether the interrupt counter is still increasing after the failure;
3. a register dump taken after the failure with "ethtool -d eth0 > dump".
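Item 2 amounts to comparing two snapshots of /proc/interrupts. This is a minimal Python sketch, assuming the usual layout: a header row with one column per CPU, then one line per IRQ with the per-CPU counts followed by the controller type and a comma-separated device list. `irq_count` is a hypothetical helper name.

```python
import re


def irq_count(proc_interrupts: str, device: str) -> int:
    """Total interrupts (summed over CPUs) on the IRQ line(s) listing `device`."""
    lines = proc_interrupts.splitlines()
    ncpu = len(lines[0].split())  # header row: one "CPUn" column per CPU
    total = 0
    for line in lines[1:]:
        if ":" not in line:
            continue
        # Split only at the first colon: device names such as
        # "i915@pci:0000:00:02.0" contain colons themselves.
        _irq, rest = line.split(":", 1)
        fields = rest.split()
        counts = fields[:ncpu]
        names = re.split(r"[,\s]+", " ".join(fields[ncpu:]))
        if device in names and all(c.isdigit() for c in counts):
            total += sum(int(c) for c in counts)
    return total
```

Read the file twice after the failure, a few seconds apart (e.g. `open("/proc/interrupts").read()`), and compare `irq_count(first, "eth0")` with `irq_count(second, "eth0")`. If the IRQ line is shared with another device, the count includes that device's interrupts as well, so a rising number does not by itself prove the NIC is still interrupting.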
Created attachment 8305: data from the requested actions
Created attachment 8306: data from the requested actions
Attachment 8306 is a duplicate of 8305.
Does the NETDEV WATCHDOG catch this after some time (a few hours)? Also, do you have TCP segmentation offload (TSO) enabled (check with ethtool -k eth0)? I'm seeing a problem that looks like the one you describe, and it hasn't recurred in the few days since I disabled TSO (ethtool -K eth0 tso off). I'm wondering whether this is the same issue or something else. I'm on 2.6.17.4, by the way.
>>>>> Thomas M Steenholdt == tmus@tmus.dk writes:

> ------- Additional Comments From tmus@tmus.dk 2006-08-08 13:34 -------
> Does the NETDEV WATCHDOG catch this after some time (a few hours)?

I'm not sure whether I waited that long (this laptop is my workhorse :-( ). Contrary to what I wrote in one of my last postings, I can no longer reproduce the situation easily. Either some necessary precondition on our network has changed, or 2.6.17 has had a positive influence. Yes, I know I should not change kernels while searching for a bug, but as I said before, I work on that machine.

> Also, do you have tcp segmentation offloading enabled (ethtool -k eth0)?

Segmentation offload is off:

    bash# ethtool -k eth0
    Offload parameters for eth0:
    rx-checksumming: on
    tx-checksumming: on
    scatter-gather: on
    tcp segmentation offload: off

> I'm seeing a problem that looks like what you describe and I haven't had the
> problem in a few days, since I disabled tso (ethtool -K eth0 tso off).
> I'm wondering if this is the same issue or something else.
> I'm on 2.6.17.4 btw.

This is 2.6.17. I will upgrade, however, as soon as things stabilize after my holidays.

Best regards,
Klaus
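Whether TSO is on can be read mechanically from `ethtool -k` output like the one quoted in the reply above. This is a minimal Python sketch that assumes one `name: on|off` line per feature. The two TSO spellings checked by the hypothetical `tso_enabled` helper (`tcp segmentation offload`, as in the output above, and `tcp-segmentation-offload`, as printed by later ethtool releases) and the optional `[fixed]` suffix are assumptions about the tool's output format.

```python
from typing import Dict


def parse_offloads(text: str) -> Dict[str, bool]:
    """Map each "feature: on|off" line of `ethtool -k` output to a bool."""
    features: Dict[str, bool] = {}
    for line in text.splitlines():
        name, sep, value = line.rpartition(":")
        words = value.split()  # e.g. ["off"] or ["off", "[fixed]"]
        if sep and words and words[0] in ("on", "off"):
            features[name.strip()] = words[0] == "on"
    return features


def tso_enabled(features: Dict[str, bool]) -> bool:
    """Look up TSO under the (assumed) old and newer ethtool spellings."""
    for key in ("tcp segmentation offload", "tcp-segmentation-offload"):
        if key in features:
            return features[key]
    raise KeyError("no TSO line found in ethtool -k output")
```

To use it on a live system, pass the text returned by `subprocess.run(["ethtool", "-k", "eth0"], capture_output=True, text=True).stdout`. After `ethtool -K eth0 tso off`, `tso_enabled` on fresh output would be expected to return `False`.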
We have a similar problem. Some of our servers lose their network connection during backups to an NFS mount. Our servers, by NIC (number of servers, PCI vendor:device ID, revision):

     44  14e4:1648 (rev 10)
     32  14e4:16a6 (rev 02)
    119  14e4:16a7 (rev 02)
     59  14e4:16c7 (rev 10)

Only the 59 servers equipped with "14e4:16c7 (rev 10)" freeze occasionally. None of the other servers has this problem.
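Since only one PCI ID is reported to freeze, machines can be screened for it. This is a minimal Python sketch that filters `lspci -n` output, assuming lines of the form `03:00.0 0200: 14e4:16c7 (rev 10)` (slot, class, vendor:device, revision). `matching_devices` is a hypothetical helper name.

```python
import re
from typing import List, Optional

# vendor:device followed by an optional "(rev NN)", as assumed for `lspci -n`
PCI_ID = re.compile(r"\b([0-9a-f]{4}):([0-9a-f]{4})(?: \(rev ([0-9a-f]{2})\))?")


def matching_devices(lspci_n: str, vendor: str, device: str,
                     rev: Optional[str] = None) -> List[str]:
    """Return the PCI slots whose vendor:device (and rev, if given) match."""
    slots = []
    for line in lspci_n.splitlines():
        m = PCI_ID.search(line)
        if m is None:
            continue
        v, d, r = m.groups()
        if v == vendor and d == device and (rev is None or r == rev):
            slots.append(line.split()[0])  # first column is the slot
    return slots
```

For example, `matching_devices(lspci_output, "14e4", "16c7", "10")` lists the slots carrying the revision reported to freeze, where `lspci_output` is the text printed by `lspci -n` on a given server.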
I can confirm this bug using 2.6.15 from the Ubuntu distribution (Dapper). Before upgrading from Breezy (which used 2.6.12), I didn't have this problem. The bug first showed up when I tried to back up my laptop (HP nc6120) using rdiff-backup over NFS. Strangely, the backup succeeded when I used rdiff-backup's client-server mode instead of NFS. Today I was writing changes to a bunch of MP3 files on my server, mounted via NFS. Again the driver locked up: I couldn't do any networking, either locally or on the Internet, until I ran ifdown and ifup on the eth0 device (ifup and ifdown are Debian's higher-level networking commands). If there's still interest in resolving this bug, I'd be willing to do some testing on the 2.6.12, 2.6.15 and 2.6.17 series kernels. I'll be able to test the highest-versioned one after my upgrade to Ubuntu Edgy in less than a week.
I've upgraded to 2.6.17 but still suffer from this bug. :( Although I've spent hours modifying MP3 files over NFS without a problem, with the new kernel rdiff-backup traffic still seems to be too much for the driver, and it drops out. Has anyone made any progress on this issue? I'd be really willing to help with debugging. Here is what mchan@broadcom.com asked for earlier:

1. dmesg output of "ifconfig -v eth0 up":

    Nov 9 00:50:58 localhost kernel: [17210726.392000] ADDRCONF(NETDEV_UP): eth0: link is not ready
    Nov 9 00:51:00 localhost kernel: [17210727.960000] tg3: eth0: Link is up at 100 Mbps, full duplex.
    Nov 9 00:51:00 localhost kernel: [17210727.960000] tg3: eth0: Flow control is on for TX and on for RX.
    Nov 9 00:51:00 localhost kernel: [17210727.960000] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

   dmesg output of "modprobe -v":

    [17210352.316000] tg3.c:v3.59.1 (August 25, 2006)
    [17210352.316000] ACPI: PCI Interrupt 0000:02:0e.0[A] -> GSI 16 (level, low) -> IRQ 169
    [17210352.348000] eth0: Tigon3 [partno(BCM95705A50) rev 3003 PHY(5705)] (PCI:33MHz:32-bit) 10/100/1000BaseT Ethernet 00:14:38:1a:20:0a
    [17210352.348000] eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[0] TSOcap[1]
    [17210352.348000] eth0: dma_rwctrl[763f0000] dma_mask[64-bit]

2. After the network failure, the interrupt counter for eth0 (the `i915@pci:0000:00:02.0, eth0' line in /proc/interrupts) still increases, but extremely slowly, close to stalling. (Note: I kept pinging an Internet host to make sure there should still be some traffic.)

3. ethtool output attached.
Created attachment 9433: output of "ethtool -d eth0" after the NIC broke down
Hi,

I recently upgraded to 2.6.18.1 and am running Emacs with my INBOX again on the laptop (auto-saving my rather big INBOX is actually what originally triggered the problem). Since then (more than a week now), I have not seen the problem again. I don't have time to compare the tg3 driver versions, but I saw in the ChangeLogs that there was activity on tg3. Maybe somebody familiar with those changes could comment here; perhaps we can close the issue ;-).

Cheers,
Klaus