Bug 3050

Summary: (net b44) Link is down! problem
Product: Drivers Reporter: Thomas Bekkering (thomas.bekkering)
Component: Network Assignee: Jeff Garzik (jgarzik)
Status: REJECTED INVALID    
Severity: high CC: bugzilla-kernel, bugzilla, masterdriverz, protasnb
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.7 Subsystem:
Regression: --- Bisected commit-id:
Attachments: Kernel config
kernel config

Description Thomas Bekkering 2004-07-11 14:40:13 UTC
Distribution: Gentoo Linux
Hardware Environment: Acer TM 803LCi, Intel Centrino, 512MB
Software Environment: 

Problem Description:
When I try to connect to some servers the link goes down.
I think it is only happening when it is using UDP.
Because when I browse the network with a smb browser nothing goes wrong.
But when I start a file transfer the link goes down.
The same thing is happening with FTP.

However, if I disable ACPI support in the kernel everything works fine.
I tried the driver from broadcom.com but that driver doesn't work well either.
Ifconfig says that I have 2000000 transfer errors with the Broadcom driver.

But I need ACPI or else my TFT dimming buttons won't work.

Is it ACPI or is it just the b44 driver?

BTW: I tested the network cables; they work fine.

Steps to reproduce:
Comment 1 Thomas Bekkering 2004-07-12 03:06:27 UTC
Created attachment 3339 [details]
Kernel config

The last kernel config I used
Comment 2 Thomas Bekkering 2004-07-19 07:56:17 UTC
I installed Windows on my laptop and then booted Linux again. It now works
again. Did Windows do something with that NIC?
Comment 3 Bernd Wurst 2005-01-14 06:30:30 UTC
I can confirm this problem.

Today, I upgraded from 2.6.7 to 2.6.10 and now I get something like this:

Jan 14 15:17:31 aragorn b44: eth0: Link is down.
Jan 14 15:17:36 aragorn b44: eth0: Link is up at 100 Mbps, full duplex.
Jan 14 15:17:36 aragorn b44: eth0: Flow control is on for TX and on for RX.
Jan 14 15:17:37 aragorn b44: eth0: Link is down.
Jan 14 15:17:39 aragorn b44: eth0: Link is up at 100 Mbps, full duplex.
Jan 14 15:17:39 aragorn b44: eth0: Flow control is on for TX and on for RX.
Jan 14 15:17:40 aragorn b44: eth0: Link is down.
Jan 14 15:17:51 aragorn b44: eth0: Link is up at 100 Mbps, full duplex.
Jan 14 15:17:51 aragorn b44: eth0: Flow control is on for TX and on for RX.
Jan 14 15:17:55 aragorn b44: eth0: Link is down.
Jan 14 15:18:02 aragorn b44: eth0: Link is up at 100 Mbps, full duplex.
Jan 14 15:18:02 aragorn b44: eth0: Flow control is on for TX and on for RX.
Jan 14 15:18:05 aragorn b44: eth0: Link is down.
Jan 14 15:18:07 aragorn b44: eth0: Link is up at 100 Mbps, full duplex.
Jan 14 15:18:07 aragorn b44: eth0: Flow control is on for TX and on for RX.
Jan 14 15:18:08 aragorn b44: eth0: Link is down.
Jan 14 15:18:10 aragorn b44: eth0: Link is up at 100 Mbps, full duplex.
Jan 14 15:18:10 aragorn b44: eth0: Flow control is on for TX and on for RX.
Jan 14 15:18:11 aragorn b44: eth0: Link is down.
Jan 14 15:18:13 aragorn b44: eth0: Link is up at 100 Mbps, full duplex.
Jan 14 15:18:13 aragorn b44: eth0: Flow control is on for TX and on for RX.
Jan 14 15:18:14 aragorn b44: eth0: Link is down.

The hardware LED goes off while the link is considered down.
Before that, I used 2.6.7 and never saw such a problem. Today I upgraded,
but did not see this happen until now, approx. 7 hours after the update.
What I did was open an SMB connection to my Samba server (but got timeouts). I
did not do this earlier today, so I think it's related to that.
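
To put a number on the flapping in a log like the one above, the link messages can be counted and the down time summed. This is only a minimal sketch: it assumes the syslog layout shown in this comment, an English/C locale for the month name, and a year that has to be supplied because syslog does not record one. The function names are made up for illustration.

```python
import re
from datetime import datetime

# Matches b44 link messages in the syslog layout quoted above, e.g.
# "Jan 14 15:17:31 aragorn b44: eth0: Link is down."
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+b44:\s+"
    r"(?P<iface>\w+):\s+Link is (?P<state>up|down)"
)


def parse_link_events(lines, year=2005):
    """Return (timestamp, interface, state) tuples for b44 link messages.

    The year is an assumption passed in by the caller; syslog omits it.
    """
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skips e.g. the "Flow control" lines
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        events.append((ts, m["iface"], m["state"]))
    return events


def downtime_seconds(events):
    """Sum the seconds between each 'down' event and the next 'up' event."""
    total = 0.0
    down_since = None
    for ts, _iface, state in events:
        if state == "down":
            down_since = ts
        elif down_since is not None:
            total += (ts - down_since).total_seconds()
            down_since = None
    return total


# First five lines of the log from this comment.
sample = [
    "Jan 14 15:17:31 aragorn b44: eth0: Link is down.",
    "Jan 14 15:17:36 aragorn b44: eth0: Link is up at 100 Mbps, full duplex.",
    "Jan 14 15:17:36 aragorn b44: eth0: Flow control is on for TX and on for RX.",
    "Jan 14 15:17:37 aragorn b44: eth0: Link is down.",
    "Jan 14 15:17:39 aragorn b44: eth0: Link is up at 100 Mbps, full duplex.",
]
events = parse_link_events(sample)
print(len(events), downtime_seconds(events))
```

Running the same two functions over a full log from before and after a kernel change would give a rough way to compare how often the link bounces.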
Comment 4 Surakshan Mendis 2005-01-15 15:58:32 UTC
Have a look at #3765

I had this problem, but only when doing a network transfer (FTP at the time)
and only if I was writing to a SATA disk at the same time (the transfer and the
write did not have to be related).

I got a new NIC, and moved from uni (vacation).
I just updated to 2.6.10; there are a few b44 updates (ignore carrier loss
signals etc.), so I updated and it seems to be working, BUT I have only tried
an SMB transfer, whereas before I used FTP at 7 to 8 megabytes a second, which
is good for a 100 Mbit NIC. I'll try with FTP and report back.
Comment 5 Bernd Wurst 2005-01-16 00:50:58 UTC
Upgrade to kernel 2.6.11-rc1 seems to fix this for me. I don't see the problem
now with samba any more.
Comment 6 Bernd Wurst 2005-02-21 03:47:00 UTC
Sorry, but at the moment I see the problem reappearing with linux-2.6.11-rc4.
I am thoroughly confused now, as it happens unreproducibly with several
versions from 2.6.7 to 2.6.11-rc4.

Bug #3765 seems to be a duplicate of this. I don't think it is related to S-ATA,
because I don't have this kind of hardware; I just have an Acer TM800, pretty
much the same as the reporter has. ;-)
Comment 7 Richard Tarrant 2005-02-25 05:48:39 UTC
I can confirm this behaviour on an Acer TM800, using kernels 2.6.9 and 2.6.10 on
Gentoo Linux. Disabling ACPI 'fixes' the issue, as suggested by the reporter.

Once traffic throughput reaches a certain level (only 10K/s is needed), the link
bounces and transfer rates drop to between 3 and 12K/s. For me, protocols make
no difference providing the traffic is fast enough to trigger the bounce.

What is confusing is that I've previously run most vanilla kernels up to and
including 2.6.9 with identical .config to the one used now, without any trouble.
I don't know if GCC and co. have anything to do with it, as the problem has only
started since I re-built a new clean system, with newer revisions of gcc (and
related tools) than before. 
Comment 8 Thomas Bekkering 2005-06-05 23:40:23 UTC
Resetting my BIOS and keeping the default settings on my Acer TM800 seems to fix
this problem for me.
Comment 9 Bernd Wurst 2005-07-05 22:30:44 UTC
@comment #8: Did you do a hardware reset (by opening the case and setting a
jumper) or just a "reload default settings" inside the BIOS software?

I tried the latter one and did not succeed.
Comment 10 Joachim Deguara 2006-07-17 02:02:41 UTC
I can confirm this exists on a Fujitsu Lifebook S2110 with a 'Broadcom
Corporation BCM4401-B0 100Base-TX (rev 02)' NIC. Neither resetting the BIOS nor
using acpi=off helps. The only way I can use the NIC without the link going
down and up under heavy network traffic is by setting it to 10Mb/s with ethtool.
Unfortunately it has only maxed out at ~2.7Mb/s when set to 10Mb/s (which is
also puzzling but secondary to the link going down at 100Mb/s). This is with
kernel 2.6.16, though I also tried the latest driver from Broadcom (v1.00g), and
neither worked.
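
For anyone who wants to try the 10Mb/s workaround, the ethtool call can be sketched as below. The comment does not give the exact invocation, so the duplex setting and turning autonegotiation off are assumptions, and the helper names are made up for illustration. By default the sketch only builds the command and does not run it.

```python
import subprocess


def ethtool_force_speed_cmd(iface, speed_mbps=10, duplex="full"):
    """Build an `ethtool -s` command that pins speed and duplex.

    Assumption: autonegotiation is switched off so the forced speed sticks.
    """
    if speed_mbps not in (10, 100):
        raise ValueError("BCM4401 is a 10/100 part; speed must be 10 or 100")
    if duplex not in ("half", "full"):
        raise ValueError("duplex must be 'half' or 'full'")
    return ["ethtool", "-s", iface, "speed", str(speed_mbps),
            "duplex", duplex, "autoneg", "off"]


def force_speed(iface, speed_mbps=10, duplex="full", dry_run=True):
    """Return the command; execute it only when dry_run is False (needs root)."""
    cmd = ethtool_force_speed_cmd(iface, speed_mbps, duplex)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


print(" ".join(force_speed("eth0")))
```

If the link partner still autonegotiates, forcing one side can cause a duplex mismatch, so the switch port may need a matching setting.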
Comment 11 Joachim Deguara 2006-07-17 02:27:46 UTC
Scratch that last comment about bandwidth problems when setting the speed to
10Mb/s. When using netcat I was able to fill the 10Mb/s.
Comment 12 Thomas Bekkering 2006-07-17 03:01:43 UTC
Compiling all SCSI drivers as modules can help.
Comment 13 Lorincz Andras 2006-10-04 23:53:54 UTC
Created attachment 9160 [details]
kernel config

I also have a Broadcom 4401 NIC on an Amilo L1310G and I use kernel 2.6.17.13.
It works when browsing a Samba share, and I can watch a movie from the share,
but when I try to copy something it copies a few MB and then stops. After that,
if I try to ping the other host, I get destination host unreachable. I can only
get it to work again with a reboot. I also tried acpi=off and pci=noacpi, but
it's worse. When setting the IP of the NIC I get:

b44: eth0: BUG!  Timeout waiting for bit 00000002 of register 42c to clear.
ADDRCONF(NETDEV_UP): eth0: link is not ready
lorand-nb:~# b44: eth0: Link is up at 100 Mbps, full duplex.
b44: eth0: Flow control is off for TX and off for RX.  "then I plug the cable"
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready 

The first line appears only when passing acpi=off and pci=noacpi to the kernel.
Comment 14 Lorincz Andras 2006-10-17 23:05:23 UTC
I found a workaround for this. If I compile the kernel with SMP support then the
NIC works; otherwise it has problems.
Comment 15 Natalie Protasevich 2007-09-04 08:18:49 UTC
Any updates on this bug? Does current kernel work better for you?
Thanks.
Comment 16 Natalie Protasevich 2007-09-22 18:32:49 UTC
The problems you described suggest that interrupts were not set up correctly, and that the card was either in poll mode or riding on some other device's interrupt if they were sharing one; this would explain the timeouts. The interrupt subsystem was reworked around 2.6.19. If anyone can confirm there is still a problem with their device, we'll keep the bug open.
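
One way to check the shared-interrupt theory is to look at which devices sit on the same line as eth0 in /proc/interrupts. The following is a rough sketch only: the parsing is a heuristic (it takes the last word before the first comma as the first device name), the function names are made up, and the sample table is hypothetical, not taken from any machine in this report.

```python
def devices_by_irq(proc_interrupts_text):
    """Map each numeric IRQ to the device names listed on its line."""
    table = {}
    for line in proc_interrupts_text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep or not label.strip().isdigit():
            continue  # header line, or non-numeric rows such as NMI/ERR
        parts = [p.strip() for p in rest.split(",")]
        first_tokens = parts[0].split()
        if not first_tokens:
            continue
        # Heuristic: counts and controller type come first; the last
        # whitespace-separated token before the first comma is a device name.
        names = [first_tokens[-1]] + [p for p in parts[1:] if p]
        table[int(label)] = names
    return table


def irq_sharers(proc_interrupts_text, device="eth0"):
    """Return (irq, other devices on that IRQ), or None if device is absent."""
    for irq, names in devices_by_irq(proc_interrupts_text).items():
        if device in names:
            return irq, [n for n in names if n != device]
    return None


# Hypothetical sample in the style of a single-CPU /proc/interrupts.
SAMPLE = """\
           CPU0
  0:    8123456          XT-PIC  timer
  1:      10234          XT-PIC  i8042
 11:     456789          XT-PIC  yenta, uhci_hcd:usb1, eth0
 14:      98765          XT-PIC  ide0
"""

print(irq_sharers(SAMPLE))
```

On a real system the text would come from reading /proc/interrupts; an empty list of sharers would mean the NIC has the line to itself.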