Bug 12971

Summary: "tg3 transmit timed out" when transmitting at high bitrate
Product: Drivers
Component: Network
Status: CLOSED CODE_FIX
Severity: normal
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.29
Subsystem:
Regression: No
Bisected commit-id:
Reporter: Nikolay (dobrev666)
Assignee: drivers_network (drivers_network)
CC: alan, alexey.kv, dobrev666, dwayne.fontenot, fragabr, mcarlson, mizvekov, rjw, stefanonafets
Bug Depends on:
Bug Blocks: 12398
Attachments: tg3: Avoid RDMA engine lockups

Description Nikolay 2009-03-29 18:02:04 UTC
I have a laptop, an HP Compaq 6715s, with a Broadcom LAN card (from lspci):
10:00.0 Ethernet controller: Broadcom Corporation NetLink BCM5906M Fast Ethernet PCI Express (rev 02)

I am using Slackware 12.2 with kernel 2.6.29, which I compiled from the kernel.org sources.

When I try to copy a file over the LAN from this laptop with a command like this:
scp test.bin 192.168.0.1:/tmp

A few MBytes are copied, then the transmission stops and these messages appear in dmesg:

WARNING: at net/sched/sch_generic.c:226 dev_watchdog+0x1c2/0x1d0()
Hardware name: HP Compaq 6715s (GR897ES#ABB)
NETDEV WATCHDOG: eth0 (tg3): transmit timed out
Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss ipv6 cpufreq_stats powernow_k8 freq_table ppdev lp parport_pc parport fuse snd_hda_codec_analog pcmcia snd_hda_intel snd_hda_codec tg3 fan yenta_socket snd_hwdep video rsrc_nonstatic rtc_cmos snd_pcm thermal output rtc_core pcmcia_core ati_agp processor libphy rtc_lib psmouse i2c_piix4 agpgart container snd_timer snd thermal_sys sg serio_raw evdev button battery ac wmi shpchp soundcore k8temp snd_page_alloc hwmon
Pid: 0, comm: swapper Not tainted 2.6.29-smp #1
Call Trace:
 [<c01294b6>] warn_slowpath+0x86/0xa0
 [<c0120030>] ? sd_init_MC+0xa0/0xd0
 [<c0120fce>] ? __enqueue_entity+0x8e/0xb0
 [<c03a483f>] ? cpumask_next_and+0x1f/0x40
 [<c012260a>] ? find_busiest_group+0x18a/0x710
 [<c011f8e5>] ? enqueue_task+0x15/0x30
 [<c03a94ed>] ? strlcpy+0x1d/0x60
 [<c06bfc02>] dev_watchdog+0x1c2/0x1d0
 [<c014320b>] ? getnstimeofday+0x4b/0x120
 [<c01320f4>] run_timer_softirq+0x124/0x190
 [<c06bfa40>] ? dev_watchdog+0x0/0x1d0
 [<c012e03a>] __do_softirq+0x8a/0x150
 [<c0132614>] ? update_process_times+0x54/0x70
 [<c012e13b>] do_softirq+0x3b/0x50
 [<c012e47b>] irq_exit+0x3b/0x50
 [<c011467e>] smp_apic_timer_interrupt+0x5e/0x90
 [<c0103940>] apic_timer_interrupt+0x28/0x30
 [<c0109978>] ? default_idle+0x38/0x50
 [<c0109b80>] c1e_idle+0x90/0xf0
 [<c0109b91>] ? c1e_idle+0xa1/0xf0
 [<c0101bea>] cpu_idle+0x4a/0x70
 [<c072f705>] rest_init+0x55/0x60
---[ end trace 0c67ed7bcfb14db6 ]---
tg3: eth0: transmit timed out, resetting
tg3: DEBUG: MAC_TX_STATUS[00000008] MAC_RX_STATUS[00000000]
tg3: DEBUG: RDMAC_STATUS[00000000] WDMAC_STATUS[00000000]
tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2
tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
tg3: eth0: Link is down.
tg3: eth0: Link is up at 100 Mbps, full duplex.
tg3: eth0: Flow control is on for TX and on for RX.

After that I cannot send or receive any data over the LAN. I cannot ping any host.
These commands bring the network back:

ip l set dev eth0 down
ip l set dev eth0 up

But as soon as I run `scp test.bin 192.168.0.1:/tmp` again, the network stops working again.

This laptop has no problems when receiving data, though. I can copy with scp from another PC connected to the same LAN at 11 MBytes/s.

I have searched the web for a solution without success.
I have tried some kernel options like "irqpoll" and "acpi=off", also without success.
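
For anyone collecting reports of the same hang, the following is a minimal Python sketch that pulls the relevant facts out of a dmesg capture: the interface and driver named by the watchdog, plus the register offsets reported by tg3_stop_block. The function name `summarize_tg3_hang` is made up for this illustration, and treating both `ofs` and `enable_bit` as hexadecimal is an assumption about how the driver prints them.

```python
import re

# Patterns for the two kinds of messages quoted in the report above.
WATCHDOG_RE = re.compile(r"NETDEV WATCHDOG: (\S+) \((\w+)\): transmit timed out")
STOP_BLOCK_RE = re.compile(
    r"tg3_stop_block timed out, ofs=([0-9a-fA-F]+) enable_bit=([0-9a-fA-F]+)"
)


def summarize_tg3_hang(dmesg_text):
    """Return (interface, driver, [(offset, enable_bit), ...]) or None.

    Hypothetical helper: offsets and enable bits are parsed as hex,
    which is an assumption about the driver's print format.
    """
    watchdog = WATCHDOG_RE.search(dmesg_text)
    if watchdog is None:
        return None
    blocks = [
        (int(ofs, 16), int(bit, 16))
        for ofs, bit in STOP_BLOCK_RE.findall(dmesg_text)
    ]
    return watchdog.group(1), watchdog.group(2), blocks


# Lines copied from the dmesg output in this report.
SAMPLE = """\
NETDEV WATCHDOG: eth0 (tg3): transmit timed out
tg3: eth0: transmit timed out, resetting
tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2
tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
"""

if __name__ == "__main__":
    iface, driver, blocks = summarize_tg3_hang(SAMPLE)
    print(iface, driver, [(hex(ofs), bit) for ofs, bit in blocks])
```

Feeding it the output of `dmesg` from an affected machine should make it easy to compare which blocks fail to stop across different laptops.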
Comment 1 Nikolay 2009-04-02 08:36:27 UTC
Hi,

On Thu, Apr 2, 2009 at 2:43 AM, Andrew Morton <akpm@linux-foundation.org> wrote:

>
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> On Sun, 29 Mar 2009 18:02:04 GMT
> bugzilla-daemon@bugzilla.kernel.org wrote:
>
> > http://bugzilla.kernel.org/show_bug.cgi?id=12971
> >
> >            Summary: "tg3 transmit timed out" when transmitting at high
> >                     bitrate
> >            Product: Drivers
> >            Version: 2.5
> >     Kernel Version: 2.6.29
> >           Platform: All
> >         OS/Version: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: normal
> >           Priority: P1
> >          Component: Network
> >         AssignedTo: drivers_network@kernel-bugs.osdl.org
> >         ReportedBy: dobrev666@gmail.com
> >         Regression: No
> >
>
> I assume that 2.6.28 was OK, and that this is a regression?


I have not tried 2.6.28, but I did try 2.6.27.9 and it behaves the same way. If you
wish, I can try 2.6.28. If you need additional information, I will try to send
it.

Thanks,
Nikolay
Comment 2 Alexey Kunitskiy 2009-04-14 09:46:44 UTC
It seems that the problem is in the scatter-gather offload in the tg3 driver. I have the same laptop and the same problem.
As a quick fix, try turning it off:
ethtool --offload eth0 sg off

And post your results here.
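
If the workaround needs to be applied automatically (for example at boot), a small Python sketch along these lines could check the current setting first and only then run the command above. The helper names are invented for this example, and the parser assumes that `ethtool -k` prints one `name: value` line per feature with a line named `scatter-gather`; that may differ between ethtool versions.

```python
import subprocess


def parse_offload_settings(ethtool_k_output):
    """Map each 'name: value' line of `ethtool -k` style output to its value."""
    settings = {}
    for line in ethtool_k_output.splitlines():
        name, sep, value = line.partition(":")
        # Skip header lines such as "... for eth0:" that carry no value.
        if sep and value.strip():
            settings[name.strip()] = value.strip()
    return settings


def disable_scatter_gather(interface):
    """Turn scatter-gather off if it is currently on (needs root).

    Hypothetical helper; returns True if the workaround was applied.
    """
    output = subprocess.run(
        ["ethtool", "-k", interface], capture_output=True, text=True, check=True
    ).stdout
    if parse_offload_settings(output).get("scatter-gather", "").startswith("on"):
        subprocess.run(["ethtool", "--offload", interface, "sg", "off"], check=True)
        return True
    return False
```

Calling `disable_scatter_gather("eth0")` as root would then apply the workaround only when it is needed.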
Comment 3 Nikolay 2009-04-15 19:53:50 UTC
(In reply to comment #2)
> It seems that problem is in scatter-gather offload in tg3 driver. I have the
> same laptop and the same problem.
> For quick fix try to turn it off:
> > ethtool --offload eth0 sg off
> 
> And post here your results

It works with "sg off", but the link is not fully utilized: I transmit at 9.5 MBytes/sec and receive at 11.4 MBytes/sec.

Thanks,
Nikolay
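
To put the two rates in perspective, here is a short Python calculation of how much of the 100 Mbps link (the speed reported in the dmesg output) each direction uses. It assumes 1 MByte = 8 Mbit and ignores Ethernet, IP, TCP and ssh overhead, so it is only a rough comparison; the function name is made up for this example.

```python
def link_utilization(throughput_mbytes_per_s, link_mbits_per_s):
    """Fraction of the raw link rate in use, taking 1 MByte = 8 Mbit."""
    return throughput_mbytes_per_s * 8 / link_mbits_per_s


if __name__ == "__main__":
    for label, rate in (("transmit", 9.5), ("receive", 11.4)):
        print(f"{label}: {link_utilization(rate, 100):.1%} of a 100 Mbps link")
    # transmit: 76.0% of a 100 Mbps link
    # receive: 91.2% of a 100 Mbps link
```

So with scatter-gather disabled the transmit side loses roughly 15 percentage points of the raw link rate compared to the receive side.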
Comment 4 Matt Carlson 2009-04-15 23:05:58 UTC
On Thu, Apr 02, 2009 at 01:36:23AM -0700, Nikolay Dobrev wrote:
> I assume that 2.6.28 was OK, and that this is a regression?
> 
> I have not tried 2.6.28, but 2.6.27.9 and there is no difference. If you wish
> I can try 2.6.28. If you need additional information I will try to send it.

Does the problem go away if you try "pcie_aspm=off"?
Comment 5 Stefano 2009-04-16 13:31:37 UTC
Hello, this is my first post here.
Same laptop, same issue.
"pcie_aspm=off" does not solve the problem (I tried adding it even though I did not enable ASPM in the kernel config).

# ethtool --offload eth0 sg off
works, but transfer rates are lower (but now I can do my job, so who cares? :-) Thank you!!).

If you need information (about hardware, config, etc.), please ask.
Also, if someone writes a patch, I will be happy to test it if you want.

Sorry for my bad english,
Stefano
Comment 6 linuxjacques 2009-04-16 17:20:58 UTC
Lenovo S10
BCM5906M ethernet
Same issue - I thought I had bad hardware since I had not seen this reported
anywhere else.
Just remotely displaying a non-trivial X app is enough to reproduce the issue.
Or logging in remotely and displaying lots of text (such as when compiling).

ethtool --offload eth0 sg off

works around the issue for me too (THANK YOU! I can use my laptop now).

Of course, ideally the driver bug would be fixed (unless this is a hardware bug).

I too am willing and able to test potential fixes.
Comment 7 Nikolay 2009-04-16 19:14:21 UTC
Hi,

On Thu, Apr 16, 2009 at 2:05 AM, Matt Carlson <mcarlson@broadcom.com> wrote:

> Does the problem go away if you try "pcie_aspm=off"?
>

"pcie_aspm=off" does not solve the problem, but "ethtool --offload eth0 sg
off" does solve it.
Some people post at bugzilla.kernel.org and not in this thread, so please check

http://bugzilla.kernel.org/show_bug.cgi?id=12971

Thanks,
Nikolay
Comment 8 Matt Carlson 2009-04-16 22:10:55 UTC
O.K.  I managed to get my hands on a HP 5716s.  Let me see if I can repro this locally.  Stay tuned.
Comment 9 Rafael J. Wysocki 2009-04-26 11:32:27 UTC
Handled-By : Matt Carlson <mcarlson@broadcom.com>
Comment 10 Matt Carlson 2009-05-04 17:47:16 UTC
Created attachment 21215
tg3: Avoid RDMA engine lockups

This patch attempts to detect conditions that may cause the device's RDMA engine to lock up.
Comment 11 Matt Carlson 2009-05-04 18:02:14 UTC
O.K.  Installing Linux on that machine requires a bit of a dance.  I'll have to pursue that in the background.

Nikolay, can you try reenabling sg and applying the above patch to the 2.6.29 sources?  This sounds like it could be the source of your problems.
Comment 12 Dâniel Fraga 2009-05-04 18:04:38 UTC
I don't know if it helps, but I have this issue too and with sg off, it didn't work. But with gso off it seems to be better. I need more time to verify this... so I'm using for now:

ethtool --offload eth0 sg off gso off

Maybe sg off is useless for me, but I leave it off anyway.
Comment 13 Matt Carlson 2009-05-04 18:26:15 UTC
Hmmm.  Maybe this is an LSO problem.  What happens if you turn LSO off?

What hardware are you encountering this problem on?  I have an outstanding request for a Lenovo S10, but it may take a while to get my hands on it.  It would help a lot if I could get a machine that readily accepts Linux.
Comment 14 Dâniel Fraga 2009-05-04 18:47:35 UTC
(In reply to comment #13)
> Hmmm.  Maybe this is an LSO problem.  What happens if you turn LSO off?

Sorry, but what do you mean by LSO?

> What hardware are you encountering this problem on?  I have an outstanding

I have this issue on an IBM x3200 server:

http://www-03.ibm.com/systems/x/hardware/tower/x3200m2/specs.html

06:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5721 Gigabit Ethernet PCI Express (rev 21)
Comment 15 Alexey Kunitskiy 2009-05-05 14:40:19 UTC
This problem appeared with the .27 kernel and is present in later versions. With .26 it was OK; at least, I did not observe it there.
Comment 16 Matt Carlson 2009-05-05 21:24:42 UTC
Thanks, Alexey.  I looked over the changes between .26 and .27.  There are a lot of them, but most revolve around the phylib support integration.  I haven't yet found any smoking guns.

The patch I submitted earlier presumes that the problem might have been brought about by a change in how the kernel uses the hardware's SG facility.  I still think that is the likeliest culprit at the moment.
Comment 17 Nikolay 2009-05-07 18:31:38 UTC
Hi Matt,

On Mon, May 4, 2009 at 9:03 PM, <bugzilla-daemon@bugzilla.kernel.org> wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=12971
>
> --- Comment #11 from Matt Carlson <mcarlson@broadcom.com>  2009-05-04 18:02:14 ---
> O.K.  Installing Linux on that machine requires a bit of a dance.  I'll have to
> pursue that in the background.
>
> Nikolay, can you try reenabling sg and applying the above patch to the 2.6.29
> sources?  This sounds like it could be the source of your problems.


Sorry for the delayed answer. The patch does not solve the problem: the
transmission stops in the same way as before.

Thanks,
Nikolay

Comment 18 Nikolay 2009-05-07 21:04:20 UTC
Hi Matt,
I looked at the patch and saw that the changes are in the function tg3_start_xmit_dma_bug().
This function is referenced in

static const struct net_device_ops tg3_netdev_ops_dma_bug ...

which is selected in tg3_init_one():
...........
        if ((tp->tg3_flags3 & TG3_FLG3_5755_PLUS) ||
            GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5906)
                dev->netdev_ops = &tg3_netdev_ops;
        else
                dev->netdev_ops = &tg3_netdev_ops_dma_bug;

...........

In dmesg I have:
eth0: Tigon3 [partno(BCM95906) rev c002] (PCI Express) MAC address 00:1a:4b:78:90:64

So I think tg3_netdev_ops is used here, not tg3_netdev_ops_dma_bug.
Am I right?

Thanks,
Nikolay
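
The branch quoted above can be modelled in a few lines of Python to make the reasoning explicit. This is only a sketch of the if/else as quoted from tg3_init_one(): the string labels stand in for the driver's numeric constants, the boolean stands in for the TG3_FLG3_5755_PLUS flag test, and it assumes the BCM5906M is identified as ASIC_REV_5906.

```python
# Placeholder label, not the driver's numeric ASIC revision constant.
ASIC_REV_5906 = "5906"


def select_netdev_ops(is_5755_plus, asic_rev):
    """Mirror the if/else quoted from tg3_init_one()."""
    if is_5755_plus or asic_rev == ASIC_REV_5906:
        return "tg3_netdev_ops"
    return "tg3_netdev_ops_dma_bug"


if __name__ == "__main__":
    # A 5906 takes the first branch regardless of the 5755_PLUS flag,
    # so a change confined to the *_dma_bug transmit path is never reached.
    print(select_netdev_ops(False, ASIC_REV_5906))
    # Any other non-5755_PLUS chip (arbitrary label here) takes the else branch.
    print(select_netdev_ops(False, "other"))
```

Under those assumptions the first call yields tg3_netdev_ops, which is why a patch that only touches tg3_start_xmit_dma_bug() would have no effect on this hardware.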
Comment 19 Matt Carlson 2009-05-08 00:46:37 UTC
Right.  I forgot to include the part that pivots the 5906 over to the tg3_netdev_ops_dma_bug netdev_ops.  Sorry about that.
Comment 20 Matt Carlson 2009-05-08 16:46:02 UTC
Dwayne, what does 'ethtool -e eth0 offset 0x4 length 0x4' show on your system?

Can you also give me the output of 'lspci -vvv -xxx' for this device?
Comment 21 linuxjacques 2009-05-09 01:02:19 UTC
s10 ~ # ethtool -e eth0 offset 0x4 length 0x4
Address         Data
----------      ----
0x00000004      0x19
0x00000005      0x20
0x00000006      0x45
0x00000007      0x91
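
For reference, the four bytes above can be turned into a single value with a short Python sketch like the one below. The parser name is made up, and which byte order is the meaningful one for this NVRAM word is not stated in this thread, so both readings are shown.

```python
def parse_eeprom_dump(dump_text):
    """Collect the data bytes from 'address data' rows, in address order."""
    rows = {}
    for line in dump_text.splitlines():
        fields = line.split()
        # Keep only rows made of two 0x-prefixed numbers; skip the header.
        if len(fields) == 2 and all(f.lower().startswith("0x") for f in fields):
            rows[int(fields[0], 16)] = int(fields[1], 16)
    return bytes(rows[addr] for addr in sorted(rows))


# Output copied from the ethtool -e run above.
DUMP = """\
Address         Data
----------      ----
0x00000004      0x19
0x00000005      0x20
0x00000006      0x45
0x00000007      0x91
"""

if __name__ == "__main__":
    data = parse_eeprom_dump(DUMP)
    print(hex(int.from_bytes(data, "big")))     # 0x19204591
    print(hex(int.from_bytes(data, "little")))  # 0x91452019
```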

s10 ~ # lspci -vvv -xxx -s 02:00.0
02:00.0 Ethernet controller: Broadcom Corporation NetLink BCM5906M Fast Ethernet PCI Express (rev 02)
        Subsystem: Lenovo Device 3a23
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 27
        Region 0: Memory at f0200000 (64-bit, non-prefetchable) [size=64K]
        Expansion ROM at <ignored> [disabled]
        Capabilities: [48] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] Vital Product Data
                End
        Capabilities: [58] Vendor Specific Information <?>
        Capabilities: [e8] MSI: Mask- 64bit+ Count=1/1 Enable+
                Address: 00000000fee0300c  Data: 41a1
        Capabilities: [d0] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 4096 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <4us, L1 <64us
                        ClockPM+ Surprise- LLActRep- BwNot-
                LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [100] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr+ BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 14, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [13c] Virtual Channel <?>
        Capabilities: [160] Device Serial Number 37-e2-f0-fe-ff-68-1e-00
        Kernel driver in use: tg3
        Kernel modules: tg3
00: e4 14 13 17 06 04 10 00 02 00 00 02 10 00 00 00
10: 04 00 20 f0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 aa 17 23 3a
30: 00 00 fe bf 48 00 00 00 00 00 00 00 0b 01 00 00
40: 00 00 00 00 00 00 00 00 01 50 03 c0 08 00 00 00
50: 03 58 00 80 78 00 00 00 09 e8 78 00 e8 80 00 0f
60: 00 00 00 00 00 00 00 00 98 02 02 c0 00 00 18 76
70: f2 10 00 00 c0 00 00 00 20 70 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 34 00 13 04 82 50 08 24
90: 29 92 00 01 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 10 00 01 00 a0 8f 04 05 00 50 11 00 11 6c 07 00
e0: 43 01 11 10 00 00 00 00 05 d0 81 00 0c 30 e0 fe
f0: 00 00 00 00 a1 41 00 00 00 00 00 00 00 00 00 00

:-)
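
As a cross-check of the hex dump against the decoded lspci text, the following Python sketch reads the vendor, device and subsystem IDs straight from the raw configuration space bytes. It relies on the standard PCI type 0 header layout (little-endian 16-bit fields at offsets 0x00, 0x02, 0x2c and 0x2e); the function names are invented for this example.

```python
import struct


def parse_config_dump(dump_text):
    """Turn 'offset: xx xx ...' rows of an lspci -xxx dump into bytes."""
    rows = {}
    for line in dump_text.splitlines():
        offset, sep, rest = line.partition(":")
        if not sep:
            continue
        try:
            rows[int(offset, 16)] = bytes.fromhex(rest)
        except ValueError:
            continue  # not a hex row (for example the decoded text lines)
    return b"".join(rows[off] for off in sorted(rows))


def pci_ids(config):
    """Return (vendor, device, subsystem vendor, subsystem id)."""
    vendor, device = struct.unpack_from("<HH", config, 0x00)
    sub_vendor, sub_device = struct.unpack_from("<HH", config, 0x2C)
    return vendor, device, sub_vendor, sub_device


# First three rows of the dump above.
DUMP = """\
00: e4 14 13 17 06 04 10 00 02 00 00 02 10 00 00 00
10: 04 00 20 f0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 aa 17 23 3a
"""

if __name__ == "__main__":
    print([hex(value) for value in pci_ids(parse_config_dump(DUMP))])
    # ['0x14e4', '0x1713', '0x17aa', '0x3a23']
```

Vendor 0x14e4 is Broadcom, and the subsystem pair 17aa:3a23 agrees with the "Subsystem: Lenovo Device 3a23" line in the decoded output.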
Comment 22 Matheus Izvekov 2009-07-03 23:32:55 UTC
Same issue here on a Lenovo S10. Turning scatter-gather off solves it.

Aren't the following bugs duplicates of this one, or at least related?
http://bugzilla.kernel.org/show_bug.cgi?id=11147
http://bugzilla.kernel.org/show_bug.cgi?id=12877
http://bugzilla.kernel.org/show_bug.cgi?id=11107
Comment 23 Matt Carlson 2009-11-03 17:30:24 UTC
This bug should be fixed by commit 92c6b8d16a36df3f28b2537bed2a56491fb08f11 (tg3: Fix 5906 transmit hangs).