Bug 14844

Summary: No more wireless interface eth1 after boot
Product: Networking Reporter: Christian Casteyde (casteyde.christian)
Component: Wireless Assignee: networking_wireless (networking_wireless)
Status: CLOSED CODE_FIX    
Severity: normal CC: Larry.Finger, linville, mb, rjw
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.33-rc1 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 14885    
Attachments: Patch to see which part of map_ring_memory is failing
Replacement patch
A correct replacement
Test patch for problem allocating ring memory
Patch to revert commit 9bd568a50c4464330

Description Christian Casteyde 2009-12-19 16:33:46 UTC
Kernel 2.6.33-rc1 64bits with kmemcheck and other debug options
Athlon64 3GHz single core
Acer Aspire 1511LMi
Bluewhite 64 13 (32 bits port of Slackware)
Broadcom wireless interface

Since 2.6.33-rc1, I cannot configure any network interface, because they are not created anymore.
On my laptop, I have a built in eth0 Ethernet port (tg3), and a wireless wlan0 port. When I boot, wlan0 is renamed to eth1 by udev, then the interfaces are brought up by /etc/rc.d scripts (slackware based).

dmesg says that at boot:
b43 ssb0:0: firmware: requesting b43/ucode5.fw
b43 ssb0:0: firmware: requesting b43/pcm5.fw
b43 ssb0:0: firmware: requesting b43/b0g0initvals5.fw
b43 ssb0:0: firmware: requesting b43/b0g0bsinitvals5.fw
b43-phy0: Loading firmware version 478.104 (2008-07-01 00:50:23)
b43-phy0 ERROR: Failed to allocate or map pages for DMA ringmemory
b43-phy0 ERROR: Microcode not responding
b43-phy0 ERROR: You must go to http://wireless.kernel.org/en/users/Drivers/b43#devicefirmware and download the correct firmware for this driver version. Please carefully read all instructions on this website.
b43-phy0: Loading firmware version 478.104 (2008-07-01 00:50:23)
b43-phy0 ERROR: Failed to allocate or map pages for DMA ringmemory
b43-phy0 ERROR: Microcode not responding
b43-phy0 ERROR: You must go to http://wireless.kernel.org/en/users/Drivers/b43#devicefirmware and download the correct firmware for this driver version. Please carefully read all instructions on this website.

Of course I have the latest firmware.

If I run ifconfig /all, I get an error message saying it is unable to scan the network interface list. However, if I run iwconfig, I can see wlan0 (which therefore has not been renamed to eth1 as expected).

In syslog, I got the following errors:
Dec 19 15:38:33 athor dhcpcd[3176]: eth1: ioctl SIOCSIFFLAGS: No such device
Dec 19 15:45:32 athor dnsmasq[3360]: failed to access /var/dhcp/resolv.conf.eth1: No such file or directory
Dec 19 15:50:12 athor dhcpcd[3475]: eth1: dhcpcd not running

I haven't checked if I could associate and configure the wlan0 interface.
Comment 1 Larry Finger 2009-12-23 02:30:25 UTC
Which BCM43XX device?
Comment 2 Christian Casteyde 2009-12-23 18:24:43 UTC
Partial output of a previous dmesg log from the same machine:

tg3.c:v3.102 (September 1, 2009)
ACPI: PCI Interrupt Link [LNK2] enabled at IRQ 17
tg3 0000:02:06.0: PCI INT A -> Link[LNK2] -> GSI 17 (level, low) -> IRQ 17
eth0: Tigon3 [partno(BCM95788A50) rev 3003] (PCI:33MHz:32-bit) MAC address 00:c0:9f:3e:de:12
eth0: attached PHY is 5705 (10/100/1000Base-T Ethernet) (WireSpeed[0])
eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
eth0: dma_rwctrl[763f0000] dma_mask[32-bit]
PPP generic driver version 2.4.2
PPP Deflate Compression module registered
PPP BSD Compression module registered
NET: Registered protocol family 24
usbcore: registered new interface driver cdc_ether
b43-phy0: Broadcom 4306 WLAN found (core revision 5)
phy0: Selected rate control algorithm 'minstrel'
Broadcom 43xx driver loaded [ Features: P, Firmware-ID: FW13 ]

So it is the integrated Broadcom 4306 wireless chip on my laptop (Aspire 1511LMi).
Comment 3 Larry Finger 2009-12-23 18:36:20 UTC
My 4306, rev 5 device works fine with i386 architecture. There must be something strange with x86_64.

The other difference is that I only have 384 MB RAM. You likely have a lot more.

I will move the device to a different machine that uses a 64-bit OS and has 4 GB RAM. Perhaps that will trigger the same fault.
Comment 4 Michael Buesch 2009-12-23 19:23:18 UTC
Well, the obvious first thing is to find out whether it's the page allocation or the DMA remapping that fails.
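
The idea of telling the two stages apart can be sketched in plain user-space C. This is only an illustration, assuming ring setup is one allocation step followed by one mapping step; every name here (ring_status, ring_ops, ring_setup and the stub functions) is hypothetical and none of it is taken from the b43 sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Which stage of ring setup failed (hypothetical names, not from b43). */
enum ring_status { RING_OK, RING_ALLOC_FAILED, RING_MAP_FAILED };

/* Both stages are passed in so each can be made to fail independently. */
struct ring_ops {
	void *(*alloc)(size_t size);				/* NULL on failure */
	int (*map)(void *buf, size_t size, uint64_t *bus_addr);	/* nonzero on failure */
};

/* Run both stages and report the first one that fails. */
enum ring_status ring_setup(const struct ring_ops *ops, size_t size,
			    void **buf, uint64_t *bus_addr)
{
	assert(ops != NULL && buf != NULL && bus_addr != NULL);

	*buf = ops->alloc(size);
	if (*buf == NULL)
		return RING_ALLOC_FAILED;

	if (ops->map(*buf, size, bus_addr) != 0) {
		free(*buf);	/* undo stage one; assumes a malloc-compatible alloc */
		*buf = NULL;
		return RING_MAP_FAILED;
	}
	return RING_OK;
}

/* Stub stages for experimenting with each failure path. */
void *alloc_ok(size_t size) { return malloc(size); }
void *alloc_fail(size_t size) { (void)size; return NULL; }

int map_ok(void *buf, size_t size, uint64_t *bus_addr)
{
	(void)size;
	*bus_addr = (uint64_t)(uintptr_t)buf;	/* pretend bus address == pointer value */
	return 0;
}

int map_fail(void *buf, size_t size, uint64_t *bus_addr)
{
	(void)buf;
	(void)size;
	*bus_addr = 0;
	return -1;
}
```

Calling ring_setup() with { alloc_fail, map_ok } yields RING_ALLOC_FAILED, while { alloc_ok, map_fail } yields RING_MAP_FAILED, so a single generic error message is no longer ambiguous.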
Comment 5 Larry Finger 2009-12-23 19:48:31 UTC
Of course, but if I can duplicate it here, the debugging will be a lot easier.

That said, Murphy rules. I find that 2.6.31-rc1 from mainline will not even boot on my x86_64 system. My attempts to debug expose another regression in 2.6.33.
Comment 6 Larry Finger 2009-12-23 19:57:49 UTC
Created attachment 24269 [details]
Patch to see which part of map_ring_memory is failing

Please add this patch to 2.6.33-rc1 and post the error output.
Comment 7 Larry Finger 2009-12-25 03:56:18 UTC
Before you try the patch above, please do one other thing. Check your configuration to see if CONFIG_ACPI_WMI is set. If it is, generate a kernel with this variable not set and see if it works. On my system, the BCM4306 works fine as long as this routine is not configured.

There appears to be a bug in this routine that prevents firmware from loading. It not only affects b43, but one of my TV interfaces as well. There were some changes in this routine between 2.6.33-rc1 and -rc2, but the problem persists.
Comment 8 Christian Casteyde 2009-12-27 19:25:01 UTC
I've tested with CONFIG_ACPI_WMI unset on kernel 2.6.33-rc2.
It still cannot create the interface correctly. I get the following error on the console while the inet1 script is trying to start the network interface:

SIOCSIFFLAGS: Cannot allocate memory

So I tried your patch and got the following traces:

b43 ssb0:0: firmware: requesting b43/ucode5.fw
b43 ssb0:0: firmware: requesting b43/pcm5.fw
b43 ssb0:0: firmware: requesting b43/b0g0initvals5.fw
b43 ssb0:0: firmware: requesting b43/b0g0bsinitvals5.fw
b43-phy0: Loading firmware version 478.104 (2008-07-01 00:50:23)
b43-phy0 ERROR: Failed to allocate or map pages for DMA ringmemory
b43-phy0 ERROR: Microcode not responding
b43-phy0 ERROR: You must go to http://wireless.kernel.org/en/users/Drivers/b43#devicefirmware and download the correct firmware for this driver version. Please carefully read all instructions on this website.
b43-phy0: Loading firmware version 478.104 (2008-07-01 00:50:23)
b43-phy0 ERROR: Failed to allocate or map pages for DMA ringmemory
b43-phy0 ERROR: Microcode not responding
b43-phy0 ERROR: You must go to http://wireless.kernel.org/en/users/Drivers/b43#devicefirmware and download the correct firmware for this driver version. Please carefully read all instructions on this website.
Comment 9 Larry Finger 2009-12-27 22:18:49 UTC
Are you sure you had my patch applied? With it, you should have gotten either 

b43-phy0 ERROR: Get free pages failed

or

b43-phy0 ERROR: ssb_dma_mapping failed

just before the 

b43-phy0 ERROR: Failed to allocate or map pages for DMA ringmemory

As you posted neither, it looks as if the patch were not applied.

Note: As I cannot duplicate this error with identical wireless hardware and the same architecture, debugging this will require your help.

How much memory do you have? I also want to confirm that you are using x86_64 architecture. The original posting seems to be contradictory with 64-bit kernel and 32-bit support.
Comment 10 Larry Finger 2009-12-27 22:43:07 UTC
Created attachment 24320 [details]
Replacement patch

Sorry, the previous patch has a typo. Please try this one.
Comment 11 Michael Buesch 2009-12-27 22:52:15 UTC
(In reply to comment #10)
> Created an attachment (id=24320) [details]
> Replacement patch
> 
> Sorry, the previous patch has a typo. Please try this one.

Do you realize that ssb_dma_mapping_error() will always fail with the following hunk applied?


===================================================================
--- wireless-testing.orig/include/linux/ssb/ssb.h
+++ wireless-testing/include/linux/ssb/ssb.h
@@ -484,10 +484,11 @@ static inline void __cold __ssb_dma_not_
 
 static inline int ssb_dma_mapping_error(struct ssb_device *dev, dma_addr_t addr)
 {
+	int err;
 	switch (dev->bus->bustype) {
 	case SSB_BUSTYPE_PCI:
 #ifdef CONFIG_SSB_PCIHOST
-		return pci_dma_mapping_error(dev->bus->host_pci, addr);
+		err = pci_dma_mapping_error(dev->bus->host_pci, addr);
 #endif
 		break;
 	case SSB_BUSTYPE_SSB:
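
The effect of that hunk can be reproduced in a small user-space sketch. It assumes that the code after the switch returns a generic "not implemented" error, which the quoted hunk does not show; the enum, BAD_ADDR, and the fake_/mapping_error_ functions are hypothetical stand-ins, not kernel code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins; these are not the kernel's real types. */
enum bus_type { BUS_PCI, BUS_OTHER };
#define BAD_ADDR (~0UL)

/* Stand-in for pci_dma_mapping_error(): nonzero means the mapping failed. */
int fake_pci_dma_mapping_error(unsigned long addr)
{
	return addr == BAD_ADDR;
}

/* Shape before the hunk: the PCI case returns the real result. */
int mapping_error_before(enum bus_type bus, unsigned long addr)
{
	switch (bus) {
	case BUS_PCI:
		return fake_pci_dma_mapping_error(addr);
	default:
		break;
	}
	return -ENOSYS;	/* assumed fallback for unhandled bus types */
}

/* Shape after the hunk: the result is stored, then "break" reaches the fallback. */
int mapping_error_after(enum bus_type bus, unsigned long addr)
{
	int err;

	switch (bus) {
	case BUS_PCI:
		err = fake_pci_dma_mapping_error(addr);
		(void)err;	/* never returned, which is the bug */
		break;
	default:
		break;
	}
	return -ENOSYS;
}

/* A good address is reported as an error only by the patched shape. */
void demo_hunk(void)
{
	assert(mapping_error_before(BUS_PCI, 0x1000UL) == 0);
	assert(mapping_error_after(BUS_PCI, 0x1000UL) != 0);
}
```

Under that assumption the patched shape reports an error for every address, good or bad, because the stored result is discarded when control leaves the switch.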
Comment 12 Larry Finger 2009-12-27 23:02:06 UTC
Created attachment 24321 [details]
A correct replacement

This patch fixes the problem with the previous one. Thanks Michael.
Comment 13 Christian Casteyde 2009-12-28 09:53:02 UTC
With the latest patch, I got this:

EXT3-fs (hda3): using internal journal
EXT3-fs (hda3): mounted filesystem with writeback data mode
b43 ssb0:0: firmware: requesting b43/ucode5.fw
b43 ssb0:0: firmware: requesting b43/pcm5.fw
b43 ssb0:0: firmware: requesting b43/b0g0initvals5.fw
b43 ssb0:0: firmware: requesting b43/b0g0bsinitvals5.fw
b43-phy0: Loading firmware version 478.104 (2008-07-01 00:50:23)
b43-phy0 ERROR: ssb_dma_mapping failed for dmaaddr of 0x0
b43-phy0 ERROR: Failed to allocate or map pages for DMA ringmemory
b43-phy0 ERROR: Microcode not responding
b43-phy0 ERROR: You must go to http://wireless.kernel.org/en/users/Drivers/b43#devicefirmware and download the correct firmware for this driver version. Please carefully read all instructions on this website.
b43-phy0: Loading firmware version 478.104 (2008-07-01 00:50:23)
b43-phy0 ERROR: ssb_dma_mapping failed for dmaaddr of 0x0
b43-phy0 ERROR: Failed to allocate or map pages for DMA ringmemory
b43-phy0 ERROR: Microcode not responding
b43-phy0 ERROR: You must go to http://wireless.kernel.org/en/users/Drivers/b43#devicefirmware and download the correct firmware for this driver version. Please carefully read all instructions on this website.

I have a little more than 1GB RAM, and I am indeed running the "64bit port of slackware" (I made a mistake in the first post, sorry):
christian@athor:~$ uname -a

Linux athor 2.6.32.2 #2 PREEMPT Sat Dec 19 11:24:25 CET 2009 x86_64 AMD Athlon 64 Processor 3000+ AuthenticAMD GNU/Linux

christian@athor:~$ free -t
             total       used       free     shared    buffers     cached
Mem:       1280208     691536     588672          0      47848     281840
-/+ buffers/cache:     361848     918360
Swap:       506008          0     506008
Total:     1786216     691536    1094680

Moreover, I'm running kmemcheck+kmemleak, which may eat more kernel address space, but it has always managed to boot until now.
Comment 14 Larry Finger 2009-12-28 12:19:07 UTC
Created attachment 24327 [details]
Test patch for problem allocating ring memory

After turning kmemleak on and restricting my system to the amount of memory that you have, I was able to duplicate the problem.

A BCM4306 uses 30-bit DMA, thus all DMA buffers and descriptors must be in the first 1 GB of RAM. It seems that when kmemleak is on, the DMA descriptors are above the 1 GB boundary. The patch forces them to be below that point.
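
The 30-bit constraint can be illustrated with a small user-space check. This is a sketch of the arithmetic only, not the driver's code: dma_mask() mirrors the idea behind the kernel's DMA_BIT_MASK() macro, and dma_mask(), dma_range_ok() and demo_mask() are hypothetical names. With 30 bits the highest reachable address is 0x3FFFFFFF, and the 1280208 kB of RAM reported in comment 13 exceeds 1 GB (1048576 kB), which suggests that some pages on that machine lie above the limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask with the low `bits` bits set (same idea as the kernel's DMA_BIT_MASK). */
uint64_t dma_mask(unsigned bits)
{
	return bits >= 64 ? UINT64_MAX : (((uint64_t)1 << bits) - 1);
}

/* True if every byte of [addr, addr + len) is addressable under `mask`. */
bool dma_range_ok(uint64_t addr, uint64_t len, uint64_t mask)
{
	if (addr > mask)
		return false;
	/* Compare the last byte's offset without overflowing addr + len. */
	return len == 0 || (len - 1) <= (mask - addr);
}

/* The last page below 1 GB fits a 30-bit mask; the first page above does not. */
void demo_mask(void)
{
	uint64_t mask30 = dma_mask(30);

	assert(mask30 == 0x3FFFFFFFULL);
	assert(dma_range_ok(0x3FFFF000ULL, 4096, mask30));
	assert(!dma_range_ok(0x40000000ULL, 4096, mask30));
}
```

A buffer that merely starts below the boundary is not enough: a 4096-byte buffer at 0x3FFFF001 fails the check because its last byte lands on 0x40000000.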

This patch is unlikely to be the final one, but please test it.
Comment 15 Christian Casteyde 2009-12-28 23:18:23 UTC
Indeed, I also have the problem without kmemcheck.
However, your proposed patch fixes the problem: I get the wireless interface and managed to associate to the AP.
Comment 16 Larry Finger 2009-12-28 23:26:40 UTC
Created attachment 24334 [details]
Patch to revert commit 9bd568a50c4464330

Very good that the above patch fixes the problem. At least we understand the problem.

Between my posting that patch and your report, we decided to revert the faulty patch. As a result, I would like you to test the patch I am adding now.

Sorry for the extra work and thanks for testing.
Comment 17 Christian Casteyde 2009-12-29 12:51:54 UTC
This latest patch works nicely (applied on vanilla -rc2), with and without kmemcheck.

Moreover, with the patch from comment #14 I also used to get the following warning, which I have not managed to reproduce with your latest patch:

ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
b43-phy0 ERROR: DMA tx mapping failure

Don't know what this means, but it didn't prevent me from sending/receiving packets.
Comment 18 Rafael J. Wysocki 2009-12-29 21:03:50 UTC
On Tuesday 29 December 2009, Christian Casteyde wrote:
> Yes, still present in 2.6.33-rc2
> 
> On Tuesday 29 December 2009 16:09:57, you wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> > 
> > The following bug entry is on the current list of known regressions
> > from 2.6.32.  Please verify if it still should be listed and let me know
> > (either way).
> > 
> > 
> > Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=14844
> > Subject             : No more wireless interface eth1 after boot
> > Submitter   : Christian Casteyde <casteyde.christian@free.fr>
> > Date                : 2009-12-19 16:33 (11 days old)
Comment 19 Larry Finger 2009-12-29 21:38:46 UTC
John W. Linville just pushed a revert of "b43: Enforce DMA descriptor memory constraints" to DaveM. This change will fix this problem as soon as it hits mainline.
Comment 20 Christian Casteyde 2010-01-06 21:13:23 UTC
Fixed in 2.6.33-rc3.