Latest working kernel version: n/a
Earliest failing kernel version: 2.6.25.9
Distribution: openSUSE 11.0
Hardware Environment: 32bit x86 (AMD Athlon XP 2200+)
Software Environment: openSUSE 11.0
Problem Description:

When I enable jumbo frames the board still *appears* to work but
nobody can communicate with it. The other machines (multiple) in the
network don't see ANY traffic FROM the tg3. On the machine with the
tg3, tcpdump shows arp requests (from the other machines) and arp
replies.

After much work, I have determined the problem to be a combination of
scatter-gather and jumbo frames. When both are used, the card does not
work correctly. Disabling *either* appears to work but no benefit is
realized until the interface is brought down and back up again.

An "ethtool -t eth0" shows a single failed test (registers). I have a
register dump and anything else anybody might want if this can be
useful in bug-fixing.

I also saw this:

tg3: DEBUG: MAC_TX_STATUS[00000008] MAC_RX_STATUS[00000006]
tg3: DEBUG: RDMAC_STATUS[00000000] WDMAC_STATUS[00000000]
tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2

Steps to reproduce:

I've also tried the tg3.[c,h] compiled against 2.6.25.9 but using the latest
git source from either linux-2.6 or netdev-2.6 as of 17 July 2008.

tg3.c:v3.92.1 (June 9, 2008)
the tg3.c sha1sum is 97c198a8152045f2e7da5fe0d702df1cd185cb8d.
Reply-To: akpm@linux-foundation.org

(switched to email. Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Tue, 22 Jul 2008 12:45:54 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=11147
>
>            Summary: tg3 + jumbo frames + scatter-gather == inoperative NIC
>            Product: Drivers
>            Version: 2.5
>      KernelVersion: 2.6.25.9
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Network
>         AssignedTo: jgarzik@pobox.com
>         ReportedBy: jnelson-kernel-bugzilla@jamponi.net
>
>
> Latest working kernel version: n/a
> Earliest failing kernel version: 2.6.25.9
> Distribution: openSUSE 11.0
> Hardware Environment: 32bit x86 (AMD Athlon XP 2200+)
> Software Environment: openSUSE 11.0
> Problem Description:
>
> When I enable jumbo frames the board still *appears* to work but
> nobody can communicate with it. The other machines (multiple) in the
> network don't see ANY traffic FROM the tg3. On the machine with the
> tg3, tcpdump shows arp requests (from the other machines) and arp
> replies.
>
> After much work, I have determined the problem to be a combination of
> scatter-gather and jumbo frames. When both are used, the card does not
> work correctly. Disabling *either* appears to work but no benefit is
> realized until the interface is brought down and back up again.
>
> An "ethtool -t eth0" shows a single failed test (registers). I have a
> register dump and anything else anybody might want if this can be
> useful in bug-fixing.
>
> I also saw this:
>
> tg3: DEBUG: MAC_TX_STATUS[00000008] MAC_RX_STATUS[00000006]
> tg3: DEBUG: RDMAC_STATUS[00000000] WDMAC_STATUS[00000000]
> tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
> tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2
>
>
> Steps to reproduce:
>
> I've also tried the tg3.[c,h] compiled against 2.6.25.9 but using the latest
> git source from either linux-2.6 or netdev-2.6 as of 17 July 2008.
>
> tg3.c:v3.92.1 (June 9, 2008)
> the tg3.c sha1sum is 97c198a8152045f2e7da5fe0d702df1cd185cb8d.
Reply-To: jnelson@jamponi.net What other information can I provide? This is 100% reproducible.
Jon Nelson wrote:

> What other information can I provide?
> This is 100% reproducible.

Please provide lspci output and tg3 signon output so that
we know what device is failing.
Created attachment 16949
{ lspci -t ; lspci -vvv; lspci -k -xxx ; } > lspci-all.frank
Created attachment 16950
registers from ethtool -d eth0 > registers.tg3
Reply-To: jnelson@jamponi.net

On Wed, Jul 23, 2008 at 1:25 AM, Michael Chan <mchan@broadcom.com> wrote:
> Jon Nelson wrote:
>
>> What other information can I provide?
>> This is 100% reproducible.
>>
>
> Please provide lspci output and tg3 signon output so that
> we know what device is failing.

OK. I added the output of:

{ lspci -t ; lspci -vvv; lspci -k -xxx ; } > lspci-all.frank

as an attachment to the bug.

Also:

frank:~ # ethtool -t eth0
The test result is FAIL
The test extra info:
nvram test        (online)    0
link test         (online)    0
register test     (offline)   1
memory test       (offline)   0
loopback test     (offline)   0
interrupt test    (offline)   0

frank:~ #

So, I added the output of:

ethtool -d eth0

Now. The above is all with 1500 byte frames.
If I switch to 9000 byte frames and take the interface down and then
back up again, I am unable to communicate with it, *provided*
scatter-gather is also enabled (which it is by default).

From a cold boot (and 1500 byte frames):

frank:~ # ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: off
frank:~ #

I'll be adding ethtool -d using 9000 byte frames in the next 30 minutes.

Is there anything else I can provide?
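For anyone trying to confirm whether a machine is in the failing configuration described above (MTU above 1500 with scatter-gather on), the check can be sketched roughly as below. This is only an illustration and not something posted in the thread: the function names sg_state and jumbo_sg_combo are made up here, and the parsing assumes the "scatter-gather: on" line format shown in the ethtool -k output above.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the original report.

# Reads `ethtool -k` output on stdin and prints "on", "off", or "unknown"
# for the top-level "scatter-gather:" line.
sg_state() {
    awk -F': *' '
        $1 == "scatter-gather" { split($2, w, " "); print w[1]; found = 1; exit }
        END { if (!found) print "unknown" }
    '
}

# Usage: ethtool -k IFACE | jumbo_sg_combo MTU
# Succeeds (exit status 0) only when the MTU is above 1500 and
# scatter-gather is reported as on, i.e. the combination reported as failing.
jumbo_sg_combo() {
    mtu=$1
    state=$(sg_state)
    [ "$mtu" -gt 1500 ] && [ "$state" = "on" ]
}
```

A possible invocation, assuming the interface is eth0 as in this report and that the MTU can be read from sysfs:

ethtool -k eth0 | jumbo_sg_combo "$(cat /sys/class/net/eth0/mtu)" && echo "jumbo frames + scatter-gather both active"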
Created attachment 16951
registers (9000 byte mtu)
Created attachment 16952
registers (9000 byte mtu) with scatter-gather turned off
Reply-To: jnelson@jamponi.net

I am able to confirm that 2.6.26 from openSUSE-Factory
(2.6.26-20-pae-20.1) seems to work for a minute or two and then fails
in the same manner as 2.6.25.[9,11].

-- 
Jon
Confirmed for 2.6.27.1. Will be trying 2.6.27.4 ASAP.

All I have to do is turn scatter-gather *off* and the card works again. With scatter-gather on *and* jumbo frames (9000 mtu), the card becomes inoperative. I can provide whatever debugging you like.
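The workaround mentioned here (scatter-gather off while keeping the 9000 byte MTU) could be scripted along these lines. This is a sketch under assumptions, not the reporter's actual commands: the function name sg_workaround_cmds is invented, the use of iproute2's ip tool to cycle the link is an assumption, and the down/up step is included only because the original report says changes showed no benefit until the interface was brought down and back up. The function only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original report.
# Prints (does not run) the commands for the scatter-gather workaround.
# The interface defaults to eth0, the name used in this bug.
sg_workaround_cmds() {
    iface=${1:-eth0}
    # Turn scatter-gather off via ethtool's offload option.
    printf 'ethtool -K %s sg off\n' "$iface"
    # Cycle the link (assumes iproute2 is installed).
    printf 'ip link set dev %s down\n' "$iface"
    printf 'ip link set dev %s up\n' "$iface"
}
```

To apply the commands after reviewing them, the output can be piped to a shell as root, for example: sg_workaround_cmds eth0 | sh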
Closing as obsolete. If this can be reproduced with a modern kernel, please re-open.