I have an HP Compaq 6715s laptop with a Broadcom LAN card (from lspci):

10:00.0 Ethernet controller: Broadcom Corporation NetLink BCM5906M Fast Ethernet PCI Express (rev 02)

I am using Slackware 12.2 with a 2.6.29 kernel that I compiled from the kernel.org sources.

When I try to copy a file over the LAN from this laptop with a command like this:

scp test.bin 192.168.0.1:/tmp

a few MBytes are copied, then the transmit stops and these messages appear in dmesg:

WARNING: at net/sched/sch_generic.c:226 dev_watchdog+0x1c2/0x1d0()
Hardware name: HP Compaq 6715s (GR897ES#ABB)
NETDEV WATCHDOG: eth0 (tg3): transmit timed out
Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss ipv6 cpufreq_stats powernow_k8 freq_table ppdev lp parport_pc parport fuse snd_hda_codec_analog pcmcia snd_hda_intel snd_hda_codec tg3 fan yenta_socket snd_hwdep video rsrc_nonstatic rtc_cmos snd_pcm thermal output rtc_core pcmcia_core ati_agp processor libphy rtc_lib psmouse i2c_piix4 agpgart container snd_timer snd thermal_sys sg serio_raw evdev button battery ac wmi shpchp soundcore k8temp snd_page_alloc hwmon
Pid: 0, comm: swapper Not tainted 2.6.29-smp #1
Call Trace:
 [<c01294b6>] warn_slowpath+0x86/0xa0
 [<c0120030>] ? sd_init_MC+0xa0/0xd0
 [<c0120fce>] ? __enqueue_entity+0x8e/0xb0
 [<c03a483f>] ? cpumask_next_and+0x1f/0x40
 [<c012260a>] ? find_busiest_group+0x18a/0x710
 [<c011f8e5>] ? enqueue_task+0x15/0x30
 [<c03a94ed>] ? strlcpy+0x1d/0x60
 [<c06bfc02>] dev_watchdog+0x1c2/0x1d0
 [<c014320b>] ? getnstimeofday+0x4b/0x120
 [<c01320f4>] run_timer_softirq+0x124/0x190
 [<c06bfa40>] ? dev_watchdog+0x0/0x1d0
 [<c012e03a>] __do_softirq+0x8a/0x150
 [<c0132614>] ? update_process_times+0x54/0x70
 [<c012e13b>] do_softirq+0x3b/0x50
 [<c012e47b>] irq_exit+0x3b/0x50
 [<c011467e>] smp_apic_timer_interrupt+0x5e/0x90
 [<c0103940>] apic_timer_interrupt+0x28/0x30
 [<c0109978>] ? default_idle+0x38/0x50
 [<c0109b80>] c1e_idle+0x90/0xf0
 [<c0109b91>] ? c1e_idle+0xa1/0xf0
 [<c0101bea>] cpu_idle+0x4a/0x70
 [<c072f705>] rest_init+0x55/0x60
---[ end trace 0c67ed7bcfb14db6 ]---
tg3: eth0: transmit timed out, resetting
tg3: DEBUG: MAC_TX_STATUS[00000008] MAC_RX_STATUS[00000000]
tg3: DEBUG: RDMAC_STATUS[00000000] WDMAC_STATUS[00000000]
tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2
tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
tg3: eth0: Link is down.
tg3: eth0: Link is up at 100 Mbps, full duplex.
tg3: eth0: Flow control is on for TX and on for RX.

After that I cannot send or receive any data through the LAN; I cannot ping anything. These commands fix the problem:

ip l set dev eth0 down
ip l set dev eth0 up

But when I run `scp test.bin 192.168.0.1:/tmp` again, the network stops working again.

The laptop has no problems receiving data. I can copy to it with scp from another PC on the same LAN at 11 MBytes/s.

I have searched the web for a solution without success. I have also tried kernel options such as "irqpoll" and "acpi=off", again without success.
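The manual recovery described in the report can be sketched as a small shell helper. This is only an illustration under stated assumptions: the IFACE variable and the function names tg3_timed_out and bounce_link are hypothetical and do not come from the report. Bouncing the link restores connectivity only until the next heavy transmit, so this is not a fix.

```shell
#!/bin/sh
# Sketch only: IFACE and the function names are illustrative assumptions.
IFACE="${IFACE:-eth0}"

# Succeeds if the kernel log text on stdin contains the tg3 watchdog
# message for $IFACE, as quoted in the report.
tg3_timed_out() {
    grep -qF "NETDEV WATCHDOG: ${IFACE} (tg3): transmit timed out"
}

# Take the link down and bring it back up, as in the report (needs root).
bounce_link() {
    ip link set dev "$IFACE" down && ip link set dev "$IFACE" up
}

# Typical use, left commented out so that sourcing this file has no side effects:
#   dmesg | tg3_timed_out && bounce_link
```

Note that the watchdog message stays in the dmesg buffer after it first appears, so a real script would need to look only at new log lines before deciding to bounce the link again.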
Hi,

On Thu, Apr 2, 2009 at 2:43 AM, Andrew Morton <akpm@linux-foundation.org> wrote:

> (switched to email. Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> On Sun, 29 Mar 2009 18:02:04 GMT
> bugzilla-daemon@bugzilla.kernel.org wrote:
>
> > http://bugzilla.kernel.org/show_bug.cgi?id=12971
> >
> > Summary: "tg3 transmit timed out" when transmitting at high bitrate
> > Product: Drivers
> > Version: 2.5
> > Kernel Version: 2.6.29
> > Platform: All
> > OS/Version: Linux
> > Tree: Mainline
> > Status: NEW
> > Severity: normal
> > Priority: P1
> > Component: Network
> > AssignedTo: drivers_network@kernel-bugs.osdl.org
> > ReportedBy: dobrev666@gmail.com
> > Regression: No
> >
> > [...]
>
> I assume that 2.6.28 was OK, and that this is a regression?

I have not tried 2.6.28, but I did try 2.6.27.9 and there is no difference. If you wish, I can try 2.6.28. If you need additional information, I will try to send it.

Thanks,
Nikolay
It seems that the problem is in the scatter-gather offload in the tg3 driver. I have the same laptop and the same problem. As a quick fix, try turning it off:

> ethtool --offload eth0 sg off

And post your results here.
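To confirm that the workaround took effect, the offload state can be read back with `ethtool --show-offload`. The helper below is a minimal sketch: the IFACE variable and the function names sg_state and disable_sg are hypothetical, and it assumes the output contains a line of the form "scatter-gather: on" or "scatter-gather: off".

```shell
#!/bin/sh
# Sketch only: IFACE and the function names below are illustrative assumptions.
IFACE="${IFACE:-eth0}"

# Read `ethtool --show-offload` output on stdin and print the
# scatter-gather state ("on" or "off").
sg_state() {
    awk '$1 == "scatter-gather:" { print $2; exit }'
}

# Turn scatter-gather off (needs root), then print the resulting state.
disable_sg() {
    ethtool --offload "$IFACE" sg off &&
        ethtool --show-offload "$IFACE" | sg_state
}

# Typical use, left commented out so that sourcing this file changes nothing:
#   disable_sg
```

Settings made with ethtool are not kept across a reboot, so the command has to be run again after each boot for as long as the workaround is needed.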
(In reply to comment #2)
> It seems that problem is in scatter-gather offload in tg3 driver. I have the
> same laptop and the same problem.
> For quick fix try to turn it off:
> > ethtool --offload eth0 sg off
>
> And post here your results

It works with "sg off", but the LAN is not fully utilized: I transmit at 9.5 MBytes/s and receive at 11.4 MBytes/s.

Thanks,
Nikolay
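To put those figures in context, a 100 Mbps link carries at most 12.5 MB/s before any protocol overhead. The sketch below computes the share of that raw capacity each rate represents. It assumes 1 MByte = 10^6 bytes, which may not match how scp reports rates, and the function name link_utilization is hypothetical.

```shell
#!/bin/sh
# Percentage of a link's raw capacity used by a measured transfer rate.
# $1 = measured rate in MB/s, $2 = link speed in Mbit/s.
# Assumes 1 MB = 10^6 bytes and ignores protocol overhead.
link_utilization() {
    awk -v rate="$1" -v mbit="$2" \
        'BEGIN { printf "%.0f%%\n", rate / (mbit / 8) * 100 }'
}

link_utilization 9.5 100    # transmit with sg off: prints 76%
link_utilization 11.4 100   # receive: prints 91%
```

Under these assumptions the workaround costs roughly 15 percentage points of transmit throughput compared with the receive direction.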
On Thu, Apr 02, 2009 at 01:36:23AM -0700, Nikolay Dobrev wrote:
> On Thu, Apr 2, 2009 at 2:43 AM, Andrew Morton <akpm@linux-foundation.org> wrote:
> > [...]
> > I assume that 2.6.28 was OK, and that this is a regression?
>
> I have not tried 2.6.28, but 2.6.27.9 and there is no difference. If you wish
> I can try 2.6.28. If you need additional information I will try to send it.

Does the problem go away if you try "pcie_aspm=off"?
Hello, this is my first post here. Same laptop, same issue.

"pcie_aspm=off" does not solve the problem (I tried adding it even though I did not enable ASPM in the kernel config).

# ethtool --offload eth0 sg off

works, but transfer rates are lower (still, now I can do my job, so who cares? :-) Thank you!!).

If you need information (about hardware, config, etc.), please ask. Also, if someone writes a patch, I will be happy to test it.

Sorry for my bad English,
Stefano
Lenovo S10 BCM5906M ethernet Same issue - I thought I had bad hardware since I had not seen this reported anywhere else. Just remotely displaying a non-trivial X app is enough to reproduce the issue. Or logging in remotely and displaying lots of text (such as when compiling). ethtool --offload eth0 sg off works around the issue for me too (THANK YOU! I can use my laptop now). Of course, ideally the driver bug would be fixed (unless this is a hardware bug). I too am willing and able to test potential fixes.
Hi,

On Thu, Apr 16, 2009 at 2:05 AM, Matt Carlson <mcarlson@broadcom.com> wrote:
> [...]
> Does the problem go away if you try "pcie_aspm=off"?

"pcie_aspm=off" does not solve the problem, but "ethtool --offload eth0 sg off" does solve it.

Some people are posting at bugzilla.kernel.org rather than in this thread, so please check http://bugzilla.kernel.org/show_bug.cgi?id=12971

Thanks,
Nikolay
O.K. I managed to get my hands on a HP 5716s. Let me see if I can repro this locally. Stay tuned.
Handled-By : Matt Carlson <mcarlson@broadcom.com>
Created attachment 21215 [details] tg3: Avoid RDMA engine lockups This patch attempts to detect conditions that may cause the device's RDMA engine to lockup.
O.K. Installing Linux on that machine requires a bit of a dance. I'll have to pursue that in the background. Nikolay, can you try reenabling sg and applying the above patch to the 2.6.29 sources? This sounds like it could be the source of your problems.
I don't know if it helps, but I have this issue too and with sg off, it didn't work. But with gso off it seems to be better. I need more time to verify this... so I'm using for now: ethtool --offload eth0 sg off gso off Maybe sg off is useless for me, but I leave it off anyway.
Hmmm. Maybe this is an LSO problem. What happens if you turn LSO off? What hardware are you encountering this problem on? I have an outstanding request for a Lenovo S10, but it may take a while to get my hands on it. It would help a lot if I could get a machine that readily accepts Linux.
(In reply to comment #13)
> Hmmm. Maybe this is an LSO problem. What happens if you turn LSO off?

Sorry, but what do you mean by LSO?

> What hardware are you encountering this problem on? I have an outstanding

I have this issue on an IBM x3200 server:
http://www-03.ibm.com/systems/x/hardware/tower/x3200m2/specs.html

06:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5721 Gigabit Ethernet PCI Express (rev 21)
This problem was introduced in kernel version .27 and is present in later versions. With .26 it was OK; at least, I did not observe it there.
Thanks Aleksey. I looked over the changes between .26 and .27. There are a lot of changes, but most of them revolve around phylib support integration. I haven't yet found any smoking guns.

The patch I submitted earlier presumes that the problem might have been brought about by a change in how the kernel uses the hardware's SG facility. I still think that is the likeliest culprit at the moment.
Hi Matt,

On Mon, May 4, 2009 at 9:03 PM, <bugzilla-daemon@bugzilla.kernel.org> wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=12971
>
> --- Comment #11 from Matt Carlson <mcarlson@broadcom.com> 2009-05-04 18:02:14 ---
> O.K. Installing Linux on that machine requires a bit of a dance. I'll have to
> pursue that in the background.
>
> Nikolay, can you try reenabling sg and applying the above patch to the 2.6.29
> sources? This sounds like it could be the source of your problems.

Sorry for the delayed answer, but the patch does not solve the problem. The transmit stops in the same way as before.

Thanks,
Nikolay
Hi Matt,

I looked at the patch and see that the changes are in the function tg3_start_xmit_dma_bug(). This function is included in

    static const struct net_device_ops tg3_netdev_ops_dma_bug ...

which is used in tg3_init_one():

    ...........
    if ((tp->tg3_flags3 & TG3_FLG3_5755_PLUS) ||
        GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5906)
            dev->netdev_ops = &tg3_netdev_ops;
    else
            dev->netdev_ops = &tg3_netdev_ops_dma_bug;
    ...........

In dmesg I have:

    eth0: Tigon3 [partno(BCM95906) rev c002] (PCI Express) MAC address 00:1a:4b:78:90:64

So I think tg3_netdev_ops is used here, not tg3_netdev_ops_dma_bug. Am I right?

Thanks,
Nikolay
Right. I forgot to include the part that pivots the 5906 over to the tg3_netdev_ops_dma_bug netdev_ops. Sorry about that.
Dwayne, what does 'ethtool -e eth0 offset 0x4 length 0x4' show on your system? Can you also give me the output of 'lspci -vvv -xxx' for this device?
s10 ~ # ethtool -e eth0 offset 0x4 length 0x4
Address         Data
----------      ----
0x00000004      0x19
0x00000005      0x20
0x00000006      0x45
0x00000007      0x91

s10 ~ # lspci -vvv -xxx -s 02:00.0
02:00.0 Ethernet controller: Broadcom Corporation NetLink BCM5906M Fast Ethernet PCI Express (rev 02)
        Subsystem: Lenovo Device 3a23
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 27
        Region 0: Memory at f0200000 (64-bit, non-prefetchable) [size=64K]
        Expansion ROM at <ignored> [disabled]
        Capabilities: [48] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] Vital Product Data
                End
        Capabilities: [58] Vendor Specific Information <?>
        Capabilities: [e8] MSI: Mask- 64bit+ Count=1/1 Enable+
                Address: 00000000fee0300c  Data: 41a1
        Capabilities: [d0] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 4096 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <4us, L1 <64us
                        ClockPM+ Surprise- LLActRep- BwNot-
                LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [100] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr+ BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 14, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [13c] Virtual Channel <?>
        Capabilities: [160] Device Serial Number 37-e2-f0-fe-ff-68-1e-00
        Kernel driver in use: tg3
        Kernel modules: tg3
00: e4 14 13 17 06 04 10 00 02 00 00 02 10 00 00 00
10: 04 00 20 f0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 aa 17 23 3a
30: 00 00 fe bf 48 00 00 00 00 00 00 00 0b 01 00 00
40: 00 00 00 00 00 00 00 00 01 50 03 c0 08 00 00 00
50: 03 58 00 80 78 00 00 00 09 e8 78 00 e8 80 00 0f
60: 00 00 00 00 00 00 00 00 98 02 02 c0 00 00 18 76
70: f2 10 00 00 c0 00 00 00 20 70 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 34 00 13 04 82 50 08 24
90: 29 92 00 01 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 10 00 01 00 a0 8f 04 05 00 50 11 00 11 6c 07 00
e0: 43 01 11 10 00 00 00 00 05 d0 81 00 0c 30 e0 fe
f0: 00 00 00 00 a1 41 00 00 00 00 00 00 00 00 00 00

:-)
Same issue here on a Lenovo S10. Turning scatter-gather off solves it.

Aren't the following bugs duplicates of this one, or at least related?

http://bugzilla.kernel.org/show_bug.cgi?id=11147
http://bugzilla.kernel.org/show_bug.cgi?id=12877
http://bugzilla.kernel.org/show_bug.cgi?id=11107
This bug should be fixed by commit 92c6b8d16a36df3f28b2537bed2a56491fb08f11 (tg3: Fix 5906 transmit hangs).