Bug 11056

Summary: constant disconnects and reconnects on Broadcom NIC (wired) when under load
Product: Drivers
Reporter: John Peters (anothersillyname)
Component: Network
Assignee: drivers_network (drivers_network)
Status: RESOLVED DUPLICATE
Severity: high
CC: alan, andreas, bishillo, bojan, marc321, noiano, rdunlap, sarannmr
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.35
Subsystem:
Regression: No
Bisected commit-id:

Description John Peters 2008-07-08 15:34:04 UTC
Latest working kernel version: can't confirm against previous kernel
Earliest failing kernel version: 2.6.25.6-55
Distribution: Fedora
Hardware Environment: 9400 Dell Laptop, Broadcom BCM4401-B NIC, Core Duo, 4GB RAM
Software Environment: Fedora 9, Kernel versions as above
Problem Description: I recently did two back-to-back kernel upgrades, so I am not sure exactly when this started. Running x11vnc on a remote box and vncviewer on the local laptop, I suddenly started to notice disconnects. When I checked /var/log/messages I could see the following errors multiple times.

15:19:40 kernel: b44: eth0: powering down PHY
15:19:41 kernel: b44: eth0: Link is down.
15:19:44 kernel: b44: eth0: Link is up at 100 Mbps, full duplex.
15:19:44 kernel: b44: eth0: Flow control is off for TX and off for RX.
15:19:52 kernel: b44: eth0: powering down PHY
15:19:53 kernel: b44: eth0: Link is down.
15:19:56 kernel: b44: eth0: Link is up at 100 Mbps, full duplex.
15:19:56 kernel: b44: eth0: Flow control is off for TX and off for RX.

When I looked at the x11vnc output I noticed the following was happening with each disconnect.

rfbSendUpdateBuf: write: Connection timed out

What seems to be happening is that if the NIC comes under load (i.e. if I view the x11vnc session or do anything in the session), the NIC borks, cycles and resets.
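
To put numbers on the cycling described above, here is a minimal sketch that pairs each "Link is down" message with the next "Link is up" message and reports how long the link was gone. It assumes the log lines look like the b44 excerpt quoted earlier (optionally with a hostname before "kernel:"); the function name `link_outages` and the `SAMPLE` list are made up for illustration and are not part of any existing tool.

```python
import re
from datetime import datetime

# Matches lines such as "15:19:41 kernel: b44: eth0: Link is down."
# A hostname between the timestamp and "kernel:" is tolerated (assumption
# about the unedited syslog format).
LINK_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}:\d{2})\s+(?:\S+\s+)?kernel: b44: "
    r"(?P<iface>\w+): Link is (?P<state>up|down)"
)


def link_outages(lines):
    """Return (iface, down_time, up_time, seconds) for each down/up pair."""
    down_at = {}   # interface name -> time the link went down
    outages = []
    for line in lines:
        m = LINK_RE.search(line)
        if not m:
            continue
        t = datetime.strptime(m.group("time"), "%H:%M:%S")
        iface = m.group("iface")
        if m.group("state") == "down":
            down_at[iface] = t
        elif iface in down_at:
            start = down_at.pop(iface)
            outages.append((iface, start.time(), t.time(), (t - start).seconds))
    return outages


# Log excerpt copied from the description above.
SAMPLE = [
    "15:19:40 kernel: b44: eth0: powering down PHY",
    "15:19:41 kernel: b44: eth0: Link is down.",
    "15:19:44 kernel: b44: eth0: Link is up at 100 Mbps, full duplex.",
    "15:19:44 kernel: b44: eth0: Flow control is off for TX and off for RX.",
    "15:19:52 kernel: b44: eth0: powering down PHY",
    "15:19:53 kernel: b44: eth0: Link is down.",
    "15:19:56 kernel: b44: eth0: Link is up at 100 Mbps, full duplex.",
    "15:19:56 kernel: b44: eth0: Flow control is off for TX and off for RX.",
]

if __name__ == "__main__":
    for iface, down, up, secs in link_outages(SAMPLE):
        print(f"{iface}: down {down} -> up {up} ({secs}s)")
```

On the excerpt above this finds two outages of about three seconds each. To run it against a real log, pass `open("/var/log/messages")` instead of `SAMPLE`; the timestamps carry no date, so only outages within the same day are measured correctly.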

At first I thought it might be a suspect cable, so I changed the cable and the switch port it's connected to. I've even tested on another switch to ensure the problem is localised to the laptop.

I can replicate the problem with or without NetworkManager running (if NM is running the errors are different but the problem is the same).

Errors in /var/log/messages if NM running.

19:26:52  kernel: b44: eth0: Link is up at 100 Mbps, full duplex.
19:26:52  kernel: b44: eth0: Flow control is off for TX and off for RX.
19:26:52  NetworkManager: <info>  (eth0): carrier now ON (device state 2)
19:26:52  NetworkManager: <info>  (eth0): device state change: 2 -> 3
19:26:52  NetworkManager: <info>  Activation (eth0) starting connection 'System eth0'
19:26:52  NetworkManager: <info>  (eth0): device state change: 3 -> 4
19:26:52  NetworkManager: <info>  Activation (eth0) Stage 1 of 5 (Device Prepare) scheduled...
19:26:52  NetworkManager: <info>  Activation (eth0) Stage 1 of 5 (Device Prepare) started...
19:26:52  NetworkManager: <info>  Activation (eth0) Stage 2 of 5 (Device Configure) scheduled...
19:26:52  NetworkManager: <info>  Activation (eth0) Stage 1 of 5 (Device Prepare) complete.
19:26:52  NetworkManager: <info>  Activation (eth0) Stage 2 of 5 (Device Configure) starting...
19:26:52  NetworkManager: <info>  (eth0): device state change: 4 -> 5
19:26:52  NetworkManager: <info>  Activation (eth0) Stage 2 of 5 (Device Configure) successful.
19:26:52  NetworkManager: <info>  Activation (eth0) Stage 3 of 5 (IP Configure Start) scheduled.
19:26:52  NetworkManager: <info>  Activation (eth0) Stage 2 of 5 (Device Configure) complete.
19:26:52  NetworkManager: <info>  Activation (eth0) Stage 3 of 5 (IP Configure Start) started...
19:26:52  NetworkManager: <info>  (eth0): device state change: 5 -> 7
19:26:52  NetworkManager: <info>  Activation (eth0) Beginning DHCP transaction.
19:26:52  dhclient: Internet Systems Consortium DHCP Client 4.0.0
19:26:52  dhclient: Copyright 2004-2007 Internet Systems Consortium.
19:26:52  dhclient: All rights reserved.
19:26:52  dhclient: For info, please visit http://www.isc.org/sw/dhcp/
19:26:52  dhclient: 
19:26:52  NetworkManager: <info>  dhclient started with pid 2644
19:26:52  NetworkManager: <info>  Activation (eth0) Stage 3 of 5 (IP Configure Start) complete.
19:26:52  NetworkManager: <info>  DHCP: device eth0 state changed normal exit -> preinit
19:26:52  dhclient: Listening on LPF/eth0/00:xx:xx:xx:xx:xx
19:26:52  dhclient: Sending on   LPF/eth0/00:xx:xx:xx:xx:xx
19:26:52  dhclient: Sending on   Socket/fallback
19:26:55  dhclient: DHCPDISCOVER on eth0 to xxx.xxx.xxx.xxx port 67 interval 7
19:27:02  dhclient: DHCPDISCOVER on eth0 to xxx.xxx.xxx.xxx port 67 interval 10
19:27:02  dhclient: DHCPOFFER from xxx.xxx.xxx.xxx
19:27:02  dhclient: DHCPREQUEST on eth0 to xxx.xxx.xxx.xxx port 67
19:27:02  dhclient: DHCPACK from xxx.xxx.xxx.xxx
19:27:02  dhclient: bound to xxx.xxx.xxx.xxx -- renewal in 120000 seconds.
19:27:02  NetworkManager: <info>  DHCP: device eth0 state changed preinit -> bound
19:27:02  NetworkManager: <info>  Activation (eth0) Stage 4 of 5 (IP Configure Get) scheduled...
19:27:02  NetworkManager: <info>  Activation (eth0) Stage 4 of 5 (IP Configure Get) started...
19:27:02  NetworkManager: <info>    address xxx.xxx.xxx.xxx
19:27:02  NetworkManager: <info>    netmask xxx.xxx.xxx.xxx
19:27:02  NetworkManager: <info>    gateway xxx.xxx.xxx.xxx
19:27:02  NetworkManager: <info>    hostname '.xxxxxxxxx.org'
19:27:02  NetworkManager: <info>    nameserver 'xxx.xxx.xxx.xxx'
19:27:02  NetworkManager: <info>    domain name 'xxxxxxxxxx.org'
19:27:02  NetworkManager: <info>  Activation (eth0) Stage 5 of 5 (IP Configure Commit) scheduled...
19:27:02  NetworkManager: <info>  Activation (eth0) Stage 4 of 5 (IP Configure Get) complete.
19:27:02  NetworkManager: <info>  Activation (eth0) Stage 5 of 5 (IP Configure Commit) started...
19:27:02  avahi-daemon[2734]: Joining mDNS multicast group on interface eth0.IPv4 with address xxx.xxx.xxx.xxx.
19:27:02  avahi-daemon[2734]: New relevant interface eth0.IPv4 for mDNS.
19:27:02  avahi-daemon[2734]: Registering new address record for xxx.xxx.xxx.xxx on eth0.IPv4.
19:27:03  NetworkManager: <info>  (eth0): device state change: 7 -> 8
19:27:03  NetworkManager: <info>  Policy set (eth0) as default device for routing and DNS.
19:27:03  NetworkManager: <info>  Activation (eth0) successful, device activated.
19:27:03  NetworkManager: <info>  Activation (eth0) Stage 5 of 5 (IP Configure Commit) complete.

Steps to reproduce: run X11vnc remotely, connect using vncviewer, execute tasks on remote machine, watch the disconnect.
Comment 1 John Peters 2008-07-08 15:36:14 UTC
Sorry, it just occurred to me that the machine was recently upgraded to 4 GB; I don't remember it happening prior to that.
Comment 2 John Peters 2008-07-09 06:01:34 UTC
Now this is strange........

I had to turn off compiz on the laptop for another reason and the disconnects have stopped?

I suspect this could be a PCI mapping issue, since the OS shows 3.25 GB while 4 GB is actually fitted? Frankly I'm out of my depth here, and while I've seen some weird stuff over the years with computers, this one is pretty unusual.

Please HELP before my head explodes.
Comment 3 John Peters 2008-07-11 02:17:51 UTC
OK, I reduced the memory back to 2 GB and there are still disconnects when Compiz is turned on.

Any ideas?
Comment 4 marvin 2008-11-17 00:23:11 UTC
I have a similar problem with a "BCM4401-B0 100Base-TX (rev 02)" NIC.

https://bugs.launchpad.net/linux/+bug/279102

related?
Comment 5 Guille (bisho) 2008-11-21 00:20:14 UTC
I confirm this with the following hardware.

When using an X session (especially with compiz, where it's much more reproducible), the ethernet controller's link goes down under high network traffic.

Using nvidia graphics card and Broadcom BCM4401 network controller.

00:05.0 VGA compatible controller [0300]: nVidia Corporation C51 [GeForce 6150 LE] [10de:0241] (rev a2)
	Subsystem: Dell Device [1028:01ed]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 16
	Region 0: Memory at fc000000 (32-bit, non-prefetchable) [size=16M]
	Region 1: Memory at e0000000 (64-bit, prefetchable) [size=256M]
	Region 3: Memory at fb000000 (64-bit, non-prefetchable) [size=16M]
	[virtual] Expansion ROM at 88000000 [disabled] [size=128K]
	Capabilities: [48] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
		Address: 0000000000000000  Data: 0000
	Kernel driver in use: nvidia
	Kernel modules: nvidiafb, nvidia
04:07.0 Ethernet controller [0200]: Broadcom Corporation BCM4401-B0 100Base-TX [14e4:170c] (rev 02)
	Subsystem: Dell Device [1028:01ed]
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 64
	Interrupt: pin A routed to IRQ 19
	Region 0: Memory at fdbfc000 (32-bit, non-prefetchable) [size=8K]
	Capabilities: [40] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 PME-Enable- DSel=0 DScale=2 PME-
	Kernel driver in use: b44
	Kernel modules: b44

04:09.0 Network controller [0280]: Elsa AG QuickStep 1000 [1048:1000] (rev 01)
	Subsystem: Elsa AG QuickStep 1000 [1048:1000]
	Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 7
	Region 0: Memory at fdbff000 (32-bit, non-prefetchable) [size=128]
	Region 1: I/O ports at 9c00 [size=128]
	Region 3: I/O ports at 9800 [size=4]
	Kernel modules: hisax
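
For anyone comparing hardware across reports, here is a rough sketch that scans `lspci -v` style text and lists the PCI slots bound to a given driver. It assumes the layout shown in the listing above (an unindented device line followed by indented detail lines); the function name `slots_using` and the abbreviated `SAMPLE` text are hypothetical and only for illustration.

```python
import re

DRIVER_RE = re.compile(r"\s*Kernel driver in use:\s*(\S+)")


def slots_using(lspci_text, driver):
    """Return the PCI slots whose 'Kernel driver in use' equals driver."""
    slots = []
    current = None   # slot of the device block being read
    for line in lspci_text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # Unindented line starts a new device, e.g. "04:07.0 Ethernet ..."
            current = line.split()[0]
            continue
        m = DRIVER_RE.match(line)
        if m and current is not None and m.group(1) == driver:
            slots.append(current)
    return slots


# Abbreviated from the lspci output above; detail lines are tab-indented.
SAMPLE = (
    "00:05.0 VGA compatible controller [0300]: nVidia Corporation C51 "
    "[GeForce 6150 LE] [10de:0241] (rev a2)\n"
    "\tKernel driver in use: nvidia\n"
    "\tKernel modules: nvidiafb, nvidia\n"
    "04:07.0 Ethernet controller [0200]: Broadcom Corporation BCM4401-B0 "
    "100Base-TX [14e4:170c] (rev 02)\n"
    "\tKernel driver in use: b44\n"
    "\tKernel modules: b44\n"
    "\n"
    "04:09.0 Network controller [0280]: Elsa AG QuickStep 1000 "
    "[1048:1000] (rev 01)\n"
    "\tKernel modules: hisax\n"
)

if __name__ == "__main__":
    print(slots_using(SAMPLE, "b44"))
```

The QuickStep card has a candidate module listed but no driver in use, so it is not reported for any driver.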
Comment 6 Saran 2009-05-01 18:38:21 UTC
It doesn't really have to be heavy load. My interface disconnects as soon as DHCP gets an address. If statically configured, it stays up a little while longer.
Does this problem disappear if one uses a b44 driver from older kernels?
The NIC used to work fine for me a year ago!
Comment 7 Andreas Schipplock 2009-10-20 17:48:08 UTC
I encounter the same issue on my Dell Vostro 1000 with the "08:00.0 Ethernet controller: Broadcom Corporation BCM4401-B0 100Base-TX (rev 02)" ethernet controller on Linux 2.6.31.4. Once dhcpcd assigns an IP, the device gets powered down and I'm unable to establish a reliable connection.

My last linux kernel was 2.6.28 something and it worked perfectly. 

With the new kernel I get the following message in dmesg being repeated a _lot_ : http://pastie.org/662308

Is there a chance to fix this issue somehow? Is it possibly related to power management or some other part of the kernel?

Thanks.
Comment 8 Bojan Smojver 2009-11-18 22:32:46 UTC
I can see this problem on a Dell Inspiron 6400 (03:00.0 Ethernet controller: Broadcom Corporation BCM4401-B0 100Base-TX (rev 02)) running Fedora 12 (2.6.31.5-127.fc12.i686). Due to problems with Intel graphics, I am actually running metacity and nomodeset, so this is probably not related to anything compiz is doing.
Comment 9 Bojan Smojver 2009-12-17 00:20:23 UTC
Pretty sure this is not related to compiz or anything like that.

Anyone looking at b44 driver?
Comment 10 Marc 2010-08-21 15:46:33 UTC
This bug has been around for a long time. I wish I knew more about programming so I could help out.

Bug is still present in 2.6.35.y series. My experience is similar to Saran's. DHCP results in instant disconnect. Static IP is better.

In the interest of not duplicating info, see bug 14451 comment 4 for some of my specs.
Comment 11 Noiano 2010-11-04 16:15:52 UTC
I have the same issue with kernel 2.6.35-22 and NIC

02:05.0 Ethernet controller: Broadcom Corporation BCM4401 100Base-T (rev 01)
	Subsystem: ASUSTeK Computer Inc. A7V8X motherboard
	Flags: bus master, fast devsel, latency 32, IRQ 20
	Memory at de000000 (32-bit, non-prefetchable) [size=8K]
	Capabilities: [40] Power Management version 2
	Kernel driver in use: b44
	Kernel modules: b44

Is it possible there is no workaround? I got disconnected only when transferring data on the LAN, and yes, the IP is statically assigned.
Comment 12 Alan 2012-06-13 20:32:41 UTC

*** This bug has been marked as a duplicate of bug 7696 ***