Bug 14761 - 2.6.32 b43 low power wireless driver hogs CPU
Summary: 2.6.32 b43 low power wireless driver hogs CPU
Status: RESOLVED WILL_FIX_LATER
Alias: None
Product: Networking
Classification: Unclassified
Component: Wireless
Hardware: All Linux
Importance: P1 normal
Assignee: Larry Finger
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-12-07 21:39 UTC by Bob Billson
Modified: 2009-12-13 16:50 UTC
CC List: 4 users

See Also:
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Fall back to PIO (8.77 KB, patch)
2009-12-08 23:51 UTC, Michael Buesch

Description Bob Billson 2009-12-07 21:39:28 UTC
I'm trying out the b43 low-power wireless driver in the 2.6.32 kernel. The module
loads; however, 'top' shows that "phy0" is using 42% of the CPU, and the keyboard
is less responsive. The wireless does not connect to an open network. The Broadcom proprietary wireless driver does work properly.

Computer is a Dell Inspiron 1545 (laptop). Wireless chip is Broadcom 4312 (pci-id: 14e4:4312). Distribution is Kubuntu 9.10. Kernel is vanilla 2.6.32
compiled myself. .config is attached.

Would be happy to test any fixes.

          bob
Comment 1 Andrew Morton 2009-12-07 22:02:26 UTC
Is this a regression?  Was any earlier kernel version OK?  If so, which version?

Thanks.
Comment 2 John W. Linville 2009-12-08 14:09:26 UTC
Earlier kernels didn't support this phy hardware, so no regression... :-)

I'll ask Gábor to take a look.
Comment 3 Larry Finger 2009-12-08 15:26:42 UTC
I would like more details as this is atypical behavior. On my AMD Turion 64X2 with a 2.0 GHz clock, the phy tasklet takes no more than 8% of 1 CPU when copying a CD image using NFS. The irq tasklet takes an additional 15%.

Is this one of the systems where PIO is being used because DMA fails?
Comment 4 Bob Billson 2009-12-08 19:52:19 UTC
(In reply to comment #1)
> Is this a regression?  Was any earlier kernel version OK?  If so, which
> version?

Hi Andrew,

No, not a regression. Before 2.6.32, the low-power wireless did not work
with the b43 driver distributed with the kernel at all. Ubuntu distributes
their kernel (currently 2.6.31-16) with Broadcom's closed-source wl driver,
which works without any problems.
Comment 5 Bob Billson 2009-12-08 20:19:35 UTC
(In reply to comment #3)
> I would like more details as this is atypical behavior. On my AMD Turion 64X2
> with a 2.0 GHz clock, the phy tasklet takes no more than 8% of 1 CPU when
> copying a CD image using NFS. The irq tasklet takes an additional 15%.

Gladly; I'll provide as much as I can. If you need more, let me know what.

The laptop is a dual-core Pentium at 2.0 GHz. Other than KDE 4.3, the
machine is pretty much idle. Nothing unusual is running: no NFS, no
CD burning, not even a web browser.

> Is this one of the systems where PIO is being used because DMA fails?

hmm... possibly. Looking in /var/log/messages, I see:

Dec  5 19:29:08 babelfish kernel: [ 2199.833975] b43-phy0: Broadcom 4312 WLAN found (core revision 15)
Dec  5 19:29:08 babelfish kernel: [ 2199.861386] Broadcom 43xx driver loaded [ Features: P, Firmware-ID: FW13 ]
Dec  5 19:29:08 babelfish kernel: [ 2199.942240] b43 ssb0:0: firmware: requesting b43/ucode15.fw
Dec  5 19:29:08 babelfish kernel: [ 2199.995173] b43 ssb0:0: firmware: requesting b43/lp0initvals15.fw
Dec  5 19:29:08 babelfish kernel: [ 2199.999637] b43 ssb0:0: firmware: requesting b43/lp0bsinitvals15.fw
Dec  5 19:29:08 babelfish kernel: [ 2200.133274] b43-phy0: Loading firmware version 410.2160 (2007-05-26 15:32:10)
Dec  5 19:29:10 babelfish kernel: [ 2201.534791] ADDRCONF(NETDEV_UP): wlan0: link is not ready
Dec  5 19:30:46 babelfish kernel: [ 2297.472826] b43-phy0: Controller RESET (DMA error) ...
Dec  5 19:30:46 babelfish kernel: [ 2297.626282] b43-phy0: Loading firmware version 410.2160 (2007-05-26 15:32:10)
Dec  5 19:30:47 babelfish kernel: [ 2299.023086] b43-phy0: Controller restarted
Dec  5 19:30:47 babelfish kernel: [ 2299.032312] b43-phy0: Controller RESET (DMA error) ...
Dec  5 19:30:47 babelfish kernel: [ 2299.190288] b43-phy0: Loading firmware version 410.2160 (2007-05-26 15:32:10)
Dec  5 19:30:49 babelfish kernel: [ 2300.588100] b43-phy0: Controller restarted
Dec  5 19:30:49 babelfish kernel: [ 2300.588131] b43-phy0: Controller RESET (DMA error) ...
Dec  5 19:30:49 babelfish kernel: [ 2300.742290] b43-phy0: Loading firmware version 410.2160 (2007-05-26 15:32:10)
Dec  5 19:30:50 babelfish kernel: [ 2302.140102] b43-phy0: Controller restarted
Dec  5 19:30:51 babelfish kernel: [ 2302.303285] b43-phy0: Loading firmware version 410.2160 (2007-05-26 15:32:10)
Dec  5 19:30:52 babelfish kernel: [ 2303.699101] __ratelimit: 1 callbacks suppressed__ratelimit: 1 callbacks
Dec  5 19:30:52 babelfish kernel: [ 2303.699103] b43-phy0: Controller restarted
Dec  5 19:30:52 babelfish kernel: [ 2303.699134] b43-phy0: Controller RESET (DMA error) ...
Dec  5 19:30:52 babelfish kernel: [ 2303.853282] b43-phy0: Loading firmware version 410.2160 (2007-05-26 15:32:10)

[these lines repeat until the b43 module is unloaded or the machine is rebooted]

The "__ratelimit: 1 callbacks" message changes to "2 callbacks" after the
first loop and stays that way.

Does this help any?

        bob
Comment 6 Larry Finger 2009-12-08 20:57:45 UTC
That is the reason for hogging the CPU. When those DMA errors occur, the interface is restarted over and over.
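The log in comment 5 shows how fast that loop runs. The sketch below is a back-of-the-envelope check, not a profile: it takes the kernel timestamps of the "Loading firmware" and "Controller restarted" lines from that log and computes the reset-cycle period and how much of each cycle is spent bringing the controller back up. It does not measure CPU time directly; the timestamps are copied from the log, and the variable names are illustrative.

```python
# Kernel timestamps (seconds since boot) copied from the log in comment 5.
# Each "Loading firmware version 410.2160" line marks the start of a restart.
firmware_loads = [2297.626282, 2299.190288, 2300.742290, 2302.303285, 2303.853282]
# "Controller restarted" lines for the first three restarts above.
restarted = [2299.023086, 2300.588100, 2302.140102]


def gaps(timestamps):
    """Differences between consecutive timestamps."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]


# Time from one firmware load to the next: one full reset cycle.
cycle = gaps(firmware_loads)
mean_cycle = sum(cycle) / len(cycle)

# Time from starting the firmware load until the controller reports it is up.
reinit = [up - start for start, up in zip(firmware_loads, restarted)]
mean_reinit = sum(reinit) / len(reinit)

# Share of each cycle spent reloading firmware and reinitializing.
reinit_share = mean_reinit / mean_cycle

print(f"cycle: {mean_cycle:.3f} s, reinit: {mean_reinit:.3f} s, share: {reinit_share:.0%}")
```

With these numbers the controller is reset roughly every 1.56 s and spends about 1.4 s of each cycle reinitializing, which is consistent with the interface never getting far enough to connect.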

We do not know the cause of those errors, as none of the developers' machines have the problem. Most occur on Atom systems, but some are on other CPUs as well.

If you want to use b43, the workaround is to set the CONFIG_B43_FORCE_PIO variable in your configuration. Otherwise, you will need to use the Broadcom wl driver.

For completeness, could you please post the output of 'lspci'? That way, the identity of the host bridges on your system will be known.
Comment 7 John W. Linville 2009-12-08 21:02:53 UTC
Could we switch to PIO after we see a DMA error?
Comment 8 Larry Finger 2009-12-08 21:17:20 UTC
At the moment, b43 is either DMA or PIO. It used to have both capabilities, and b43legacy still does, but without Michael's help, I probably would mess it up completely.
Comment 9 Bob Billson 2009-12-08 21:19:15 UTC
(In reply to comment #6)
> Most occur for Atom systems, but some are on other CPUs as well.
 
It isn't an Atom. I feel special. :-) If any of the developers come up with patches, I'll be happy to test them.

> If you want to use b43, the workaround is to set the CONFIG_B43_FORCE_PIO
> variable in your configuration. Otherwise, you will need to use the Broadcom
> wl driver.

I'll give this a try and report back. (It may be a day or two.)

> For completeness, could you please post the output of 'lspci'? That way, the
> identity of the host bridges on your system will be known.

Sure ...

00:00.0 Host bridge: Intel Corporation Mobile 4 Series Chipset Memory Controller Hub (rev 07)
00:02.0 VGA compatible controller: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller (rev 07)
00:02.1 Display controller: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller (rev 07)
00:1a.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 03)
00:1a.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 03)
00:1a.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 (rev 03)
00:1a.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 03)
00:1b.0 Audio device: Intel Corporation 82801I (ICH9 Family) HD Audio Controller (rev 03)
00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 (rev 03)
00:1c.1 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 2 (rev 03)
00:1c.2 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 3 (rev 03)
00:1c.4 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 5 (rev 03)
00:1d.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 03)
00:1d.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 03)
00:1d.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 03)
00:1d.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 03)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev 93)
00:1f.0 ISA bridge: Intel Corporation ICH9M LPC Interface Controller (rev 03)
00:1f.2 SATA controller: Intel Corporation ICH9M/M-E SATA AHCI Controller (rev 03)
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 03)
09:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8040 PCI-E Fast Ethernet Controller (rev 13)
0c:00.0 Network controller: Broadcom Corporation BCM4312 802.11b/g (rev 01
Comment 10 Bob Billson 2009-12-08 21:23:03 UTC
BTW, the closing ) of the last lspci line got chopped off when I cut and pasted.
Comment 11 Michael Buesch 2009-12-08 23:51:22 UTC
Created attachment 24101 [details]
Fall back to PIO

Here's a (broken) patch that tries to implement such a PIO fallback mechanism. It doesn't work, because it fails to drain the queues properly or whatever. I don't know. Try it and you'll notice.

I suggest _not_ applying this braindamage. It has lots of issues:

* Lots of code just to work around a bug. Just get the developers a machine and fix the bug instead.
* New hardly-tested codepaths that blow up all the time, because nobody tests them. We already have enough of those codepaths.
* It adds bloat: unnecessary code for 99% of the people, and wasted kmalloc'ed memory, because we're forced to remove the dma/pio data structure union.
* Pulling DMA away under b43's ass means we drop all currently queued TX packets. I'm not sure the stack will like this.
* It doesn't work and I'm not going to check why.
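To make the memory point concrete: with a union, each queue pays only for the larger of its DMA and PIO bookkeeping; once a runtime fallback needs both to be valid at once, every queue carries both. The sketch below models this with Python's `ctypes`. The structures and field names are hypothetical, chosen only to illustrate layout, and are not b43's real definitions.

```python
import ctypes


# Hypothetical per-queue bookkeeping; the field names are illustrative and
# are not b43's real structure definitions.
class DmaRingState(ctypes.Structure):
    _fields_ = [
        ("descbase", ctypes.c_void_p),   # CPU address of the descriptor ring
        ("dmabase", ctypes.c_uint64),    # bus address handed to the device
        ("nr_slots", ctypes.c_int),
        ("current_slot", ctypes.c_int),
        ("used_slots", ctypes.c_int),
    ]


class PioQueueState(ctypes.Structure):
    _fields_ = [
        ("mmio_base", ctypes.c_uint16),  # register offset of the PIO FIFO
        ("rev", ctypes.c_uint8),
        ("buffer_used", ctypes.c_uint32),
    ]


# Mode fixed for the lifetime of the device: the two states share storage.
class QueueUnion(ctypes.Union):
    _fields_ = [("dma", DmaRingState), ("pio", PioQueueState)]


# Runtime fallback: both states must be valid at once, so both are stored.
class QueueBoth(ctypes.Structure):
    _fields_ = [("dma", DmaRingState), ("pio", PioQueueState)]


for cls in (DmaRingState, PioQueueState, QueueUnion, QueueBoth):
    print(f"{cls.__name__:14} {ctypes.sizeof(cls)} bytes")
```

The exact byte counts depend on the platform's pointer size and alignment rules, but the union is always as large as its largest member, while the combined structure is at least the sum of both.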

So it does not work this way. What other options do we have?
* The bug could probably get fixed instead ;)
* Some flag voodoo magic could probably be implemented that handles the PIO fallback using a whole card reset. This does not solve all issues, however, especially because the whole restart logic is seriously broken in b43. (Did I talk about crap hardly-tested codepaths already?)

So my final suggestion is: rip out all that restart crap and, when the DMA error occurs, simply printk a message telling the user that his device does not work and what to do...
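The difference between the current restart-on-error behaviour and the "just report it" suggestion can be shown with a toy model. It assumes, as on the affected machines, that every DMA attempt fails, so the error recurs as soon as the controller is running again. The function name, the policy names, and the message text are hypothetical; this is not b43 code.

```python
# Toy model with hypothetical names: on this hardware every DMA attempt fails,
# so the error recurs as soon as the controller is running again.

def run(policy, ticks):
    """Simulate `ticks` scheduling rounds; return (restarts, messages)."""
    running = True
    restarts = 0
    messages = []
    for _ in range(ticks):
        if not running:
            continue  # device was shut down; nothing touches DMA anymore
        # The DMA engine reports an error on every round it is used.
        if policy == "restart":
            restarts += 1  # reload firmware, reinit, and hit the same error again
        elif policy == "report":
            messages.append("DMA error: device unusable in DMA mode; try PIO")
            running = False  # stop the interface instead of looping
        else:
            raise ValueError(f"unknown policy: {policy}")
    return restarts, messages


print(run("restart", 10))
print(run("report", 10))
```

Under the restart policy the work grows with time and never ends; under the report policy the user sees one message and the driver stops touching the hardware.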
Comment 12 John W. Linville 2009-12-09 15:50:29 UTC
Michael, thanks for the effort!  At least there is a basis for more work if someone decides to pursue that option.  Also thanks for the subtly phrased analysis of the situation. :-)  Still, it might be nice to have a run-time (or load-time) option to use PIO...?

Any takers on Michael's "final suggestion"?
Comment 13 Larry Finger 2009-12-09 15:58:29 UTC
I'll take a hack at removing the restart crap. To my knowledge, there has never been a case where it has worked.

Any thoughts on reinserting the code that lets one choose PIO at module load time rather than at compile time? It will make the module bigger, but it shouldn't be too bad.
Comment 14 Michael Buesch 2009-12-09 19:25:39 UTC
> I'll take a hack at removing the restart crap. To my knowledge, there has
> never been a case where it has worked.

It cannot work, because it doesn't notify the 802.11 stack about the restart. mac80211 does have a callback, but it's rather nontrivial to implement, because it has some assumptions about the start/stop device state. I'm not sure if it's worth worrying about. Device restart is hardly used in b43 anyway. I'd rather remove it completely than fix it.

> Any thoughts on reinserting the code that lets one choose PIO at module load
> time rather than at compile time. It will make the module bigger, but it
> shouldn't be too bad.

Well, I guess we could do this. You wouldn't need to remove the union then.
Comment 15 Bob Billson 2009-12-13 00:28:26 UTC
> If you want to use b43, the workaround is to set the CONFIG_B43_FORCE_PIO
> variable in your configuration.

That worked. No CPU hogging.
Comment 16 Larry Finger 2009-12-13 00:53:59 UTC
You might be interested to learn that I have submitted a patch that will allow someone in your situation to switch to PIO merely by adding the option "pio=1" to the module load. No more having to rebuild the kernel.
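The decision the patch moves from build time to load time can be sketched as follows. Assuming the patch is merged as described, the option would be passed like any other module option, for example `modprobe b43 pio=1` or an `options b43 pio=1` line in a modprobe configuration file. The constant `FORCE_PIO_AT_BUILD` is a hypothetical stand-in for a `CONFIG_B43_FORCE_PIO=y` build, and the option parser is a simplified illustration, not the kernel's parameter parser.

```python
# FORCE_PIO_AT_BUILD stands in for a CONFIG_B43_FORCE_PIO=y build; the name is
# hypothetical, and so is the simplified option parser below.
FORCE_PIO_AT_BUILD = False


def parse_options(optstring):
    """Split "key=value key2=value2" into a dict (simplified, not the kernel's parser)."""
    options = {}
    for token in optstring.split():
        key, _, value = token.partition("=")
        options[key] = value
    return options


def transfer_mode(optstring=""):
    """Pick PIO if it was forced at build time or requested with pio=<nonzero>."""
    if FORCE_PIO_AT_BUILD:
        return "PIO"
    pio = int(parse_options(optstring).get("pio", "0") or "0")
    return "PIO" if pio else "DMA"


print(transfer_mode(""))
print(transfer_mode("pio=1"))
```

The build-time switch still works, but a user on affected hardware can now get PIO from a stock kernel without recompiling.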
Comment 17 Larry Finger 2009-12-13 00:55:29 UTC
Closed. This is a known problem - the fix is as yet unknown.
Comment 18 Bob Billson 2009-12-13 04:08:39 UTC
(In reply to comment #16)
> someone in your situation to switch to PIO merely by adding the option
> "pio=1" to the module load. No more having to rebuild the kernel.

Thanks on behalf of any other b43 users with the same problem. Hopefully, a better fix can be found. I wonder if Broadcom's closed driver successfully uses DMA.
Comment 19 Larry Finger 2009-12-13 16:50:42 UTC
It does, but we have not found the difference between theirs and ours. Even more strange, most systems will work if you use wl, then unload it and load b43, or warm boot into b43 after loading wl. No, we do not understand. :)
