Bug 11147
Summary: | tg3 + jumbo frames + scatter-gather == inoperative NIC | | |
---|---|---|---|
Product: | Drivers | Reporter: | Jon Nelson (jnelson-kernel-bugzilla) |
Component: | Network | Assignee: | drivers_network (drivers_network) |
Status: | CLOSED OBSOLETE | | |
Severity: | normal | CC: | alan, mcarlson |
Priority: | P1 | | |
Hardware: | All | | |
OS: | Linux | | |
Kernel Version: | 2.6.25.9 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments:
{ lspci -t ; lspci -vvv; lspci -k -xxx ; } > lspci-all.frank
registers from ethtool -d eth0 > registers.tg3
registers (9000 byte mtu)
registers (9000 byte mtu) with scatter-gather turned off
Description
Jon Nelson
2008-07-22 12:45:54 UTC
Reply-To: akpm@linux-foundation.org

(switched to email. Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Tue, 22 Jul 2008 12:45:54 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=11147
>
>            Summary: tg3 + jumbo frames + scatter-gather == inoperative NIC
>            Product: Drivers
>            Version: 2.5
>      KernelVersion: 2.6.25.9
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Network
>         AssignedTo: jgarzik@pobox.com
>         ReportedBy: jnelson-kernel-bugzilla@jamponi.net
>
>
> Latest working kernel version: n/a
> Earliest failing kernel version: 2.6.25.9
> Distribution: openSUSE 11.0
> Hardware Environment: 32bit x86 (AMD Athlon XP 2200+)
> Software Environment: openSUSE 11.0
> Problem Description:
>
> When I enable jumbo frames the board still *appears* to work but
> nobody can communicate with it. The other machines (multiple) in the
> network don't see ANY traffic FROM the tg3. On the machine with the
> tg3, tcpdump shows arp requests (from the other machines) and arp
> replies.
>
> After much work, I have determined the problem to be a combination of
> scatter-gather and jumbo frames. When both are used, the card does not
> work correctly. Disabling *either* appears to work but no benefit is
> realized until the interface is brought down and back up again.
>
> An "ethtool -t eth0" shows a single failed test (registers). I have a
> register dump and anything else anybody might want if this can be
> useful in bug-fixing.
>
> I also saw this:
>
> tg3: DEBUG: MAC_TX_STATUS[00000008] MAC_RX_STATUS[00000006]
> tg3: DEBUG: RDMAC_STATUS[00000000] WDMAC_STATUS[00000000]
> tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
> tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2
>
>
> Steps to reproduce:
>
> I've also tried the tg3.[c,h] compiled against 2.6.25.9 but using the latest
> git source from either linux-2.6 or netdev-2.6 as of 17 July 2008.
>
> tg3.c:v3.92.1 (June 9, 2008)
> the tg3.c sha1sum is 97c198a8152045f2e7da5fe0d702df1cd185cb8d.
>
>

Reply-To: jnelson@jamponi.net

What other information can I provide?
This is 100% reproducible.

Jon Nelson wrote:
> What other information can I provide?
> This is 100% reproducible.
>
Please provide lspci output and tg3 signon output so that
we know what device is failing.
Created attachment 16949 [details]
{ lspci -t ; lspci -vvv; lspci -k -xxx ; } > lspci-all.frank
Created attachment 16950 [details]
registers from ethtool -d eth0 > registers.tg3
Reply-To: jnelson@jamponi.net

On Wed, Jul 23, 2008 at 1:25 AM, Michael Chan <mchan@broadcom.com> wrote:
> Jon Nelson wrote:
>
>> What other information can I provide?
>> This is 100% reproducible.
>>
>
> Please provide lspci output and tg3 signon output so that
> we know what device is failing.

OK. I added the output of:

{ lspci -t ; lspci -vvv; lspci -k -xxx ; } > lspci-all.frank

as an attachment to the bug.

Also:

frank:~ # ethtool -t eth0
The test result is FAIL
The test extra info:
nvram test        (online)       0
link test         (online)       0
register test     (offline)      1
memory test       (offline)      0
loopback test     (offline)      0
interrupt test    (offline)      0

frank:~ #

So, I added the output of:

ethtool -d eth0

Now. The above is all with 1500 byte frames. If I switch to 9000 byte
frames and take the interface down and then back up again, I am unable
to communicate with it, *provided* scatter-gather is also enabled
(which it is by default).

From a cold boot (and 1500 byte frames):

frank:~ # ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: off
frank:~ #

I'll be adding ethtool -d using 9000 byte frames in the next 30 minutes.

Is there anything else I can provide?

Created attachment 16951 [details]
registers (9000 byte mtu)
Created attachment 16952 [details]
registers (9000 byte mtu) with scatter-gather turned off
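The failing combination reported above (scatter-gather on together with a jumbo MTU) can be checked mechanically from `ethtool -k` output. What follows is only a minimal sketch, not part of the original report: the function names, the `JUMBO_THRESHOLD` constant, and the choice of "anything above 1500" as the definition of jumbo are assumptions made for illustration. The sample text is the `ethtool -k eth0` output quoted in this thread.

```python
import re

# Assumption: any MTU above the standard Ethernet 1500 bytes counts as "jumbo".
JUMBO_THRESHOLD = 1500

# Offload listing as quoted in this bug (1500 byte frames, cold boot).
SAMPLE = """\
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: off
"""


def parse_offloads(ethtool_k_output):
    """Parse `ethtool -k` text into {feature name: True/False}.

    Lines without an on/off value (such as the header line) are skipped.
    """
    features = {}
    for line in ethtool_k_output.splitlines():
        match = re.match(r"^\s*([^:]+):\s*(on|off)\b", line)
        if match:
            features[match.group(1).strip()] = match.group(2) == "on"
    return features


def is_suspect_config(ethtool_k_output, mtu):
    """Return True when both conditions from the report hold:
    scatter-gather is on and the MTU is above the jumbo threshold."""
    features = parse_offloads(ethtool_k_output)
    return mtu > JUMBO_THRESHOLD and features.get("scatter-gather", False)


if __name__ == "__main__":
    print(is_suspect_config(SAMPLE, 9000))
    print(is_suspect_config(SAMPLE, 1500))
```

With the sample above, a 9000 byte MTU is flagged and a 1500 byte MTU is not. The check says nothing about whether a given tg3 device or kernel is actually affected; it only detects the configuration described in this bug.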
Reply-To: jnelson@jamponi.net

I am able to confirm that 2.6.26 from openSUSE-Factory
(2.6.26-20-pae-20.1) seems to work for a minute or two and then fails
in the same manner as 2.6.25.[9,11].

--
Jon

Confirmed for 2.6.27.1
Will be trying 2.6.27.4 ASAP.

All I have to do is turn scatter-gather *off* and the card works again.
With it on *and* jumbo frames (9000 mtu) the card becomes inoperative.
I can provide whatever debugging you like.

Closing as obsolete. If this can be reproduced with a modern kernel please re-open.
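For anyone who hits the same symptom, the workaround described in this thread (disable either scatter-gather or jumbo frames, then bring the interface down and back up) can be sketched as a list of commands. This is a hedged illustration only: the function name and the `keep_jumbo` parameter are hypothetical, the use of iproute2's `ip link` for the down/up step is an assumption (the thread does not say which tool was used), and the order of the steps is a guess. The sketch builds the commands without running them, since they need root and change a live interface.

```python
def workaround_commands(iface, keep_jumbo=True):
    """Return the command lists for one of the two workarounds.

    keep_jumbo=True  -> keep the 9000 byte MTU, turn scatter-gather off
    keep_jumbo=False -> keep scatter-gather, go back to a 1500 byte MTU
    Nothing is executed here; the caller decides whether to run them.
    """
    if keep_jumbo:
        # `ethtool -K <iface> sg off` disables scatter-gather.
        change = ["ethtool", "-K", iface, "sg", "off"]
    else:
        change = ["ip", "link", "set", "dev", iface, "mtu", "1500"]
    return [
        change,
        # The report notes no benefit until the interface is cycled.
        ["ip", "link", "set", "dev", iface, "down"],
        ["ip", "link", "set", "dev", iface, "up"],
    ]


if __name__ == "__main__":
    for cmd in workaround_commands("eth0"):
        print(" ".join(cmd))
```

Each command list can be passed to `subprocess.run(cmd, check=True)` by a privileged script once it has been reviewed for the target system.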