Bug 15825 - oops removing b43 with rmmod
Status: RESOLVED CODE_FIX
Product: Networking
Classification: Unclassified
Component: Wireless
Hardware: All Linux
Importance: P1 normal
Assigned To: Larry Finger
Depends on:
Blocks:
 
Reported: 2010-04-21 10:42 UTC by bugzillakernelorg
Modified: 2012-07-11 15:44 UTC
CC List: 6 users

See Also:
Kernel Version: 2.6.34-rc5
Tree: Mainline
Regression: No


Attachments
current oops, untainted (6.60 KB, text/plain)
2010-04-22 07:29 UTC, bugzillakernelorg
current kernel config (92.93 KB, application/octet-stream)
2010-04-22 07:30 UTC, bugzillakernelorg
Test patch to debug the reason for ssb initialization failure (1.87 KB, patch)
2010-04-23 17:45 UTC, Larry Finger
Test Patch V2 to debug the reason for ssb initialization failure (2.59 KB, patch)
2010-04-24 03:21 UTC, Larry Finger
Test patch V3 to debug the reason for ssb initialization failure (3.38 KB, patch)
2010-04-24 04:51 UTC, Larry Finger
Test Patch V4 to debug the reason for ssb initialization failure (1.01 KB, application/octet-stream)
2010-04-24 14:13 UTC, Larry Finger
Patch to test for SPROM read failure with fallback (1.46 KB, patch)
2010-04-24 16:51 UTC, Larry Finger
Patch V2 to test for SPROM read failure with fallback (2.00 KB, patch)
2010-04-24 22:23 UTC, Larry Finger
Patch to test for existence of SPROM on machine (5.47 KB, patch)
2010-04-24 23:21 UTC, Larry Finger
Additional patch to set fast powerup delay (913 bytes, patch)
2010-04-26 16:36 UTC, Larry Finger
Additional patch to ssb pmu init (932 bytes, patch)
2010-04-28 15:30 UTC, Larry Finger
Patch to detect SPROM at alternate location (4.76 KB, patch)
2010-05-06 03:15 UTC, Larry Finger
dmesg with patch 26253 applied, ssb & b43 loaded, on AP (67.35 KB, text/plain)
2010-05-07 07:03 UTC, bugzillakernelorg

Description bugzillakernelorg 2010-04-21 10:42:18 UTC
as part of trying to start bisecting a hard lockup when loading ssb: this occurred after probing, removing, probing again, and finally attempting to remove, all within about 5 seconds (i was trying to get ssb to freeze on this build)

[  359.312663] BUG: unable to handle kernel paging request at 051eb001
[  359.312695] IP: [<c012a558>] __ticket_spin_lock+0x8/0x20
[  359.312734] *pde = 00000000
[  359.312753] Oops: 0002 [#1] SMP
[  359.312771] last sysfs file: /sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/ssb0:0/firmware/ssb0:0/loading
[  359.312791] Modules linked in: b43(-) ssb arc4 mac80211 cfg80211 led_class binfmt_misc ppdev snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep fbcon tileblit font bitblit snd_pcm_oss snd_mixer_oss snd_pcm softcursor snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event joydev snd_seq snd_timer i915 drm_kms_helper snd_seq_device lp drm i2c_algo_bit parport snd uvcvideo videodev soundcore intel_agp v4l1_compat psmouse video agpgart output serio_raw snd_page_alloc atl1c ahci lzo_compress [last unloaded: ssb]
[  359.313017]
[  359.313038] Pid: 1726, comm: rmmod Tainted: G         C 2.6.34-rc5 #6 308F/Compaq Mini 110c-1100
[  359.313059] EIP: 0060:[<c012a558>] EFLAGS: 00210082 CPU: 1
[  359.313083] EIP is at __ticket_spin_lock+0x8/0x20
[  359.313099] EAX: 051eb001 EBX: 00200282 ECX: 051eb001 EDX: 00000100
[  359.313117] ESI: ed49ce1c EDI: e0871bf0 EBP: e0847e10 ESP: e0847e10
[  359.313134]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[  359.313155] Process rmmod (pid: 1726, ti=e0846000 task=e09419a0 task.ti=e0846000)
[  359.313168] Stack:
[  359.313177]  e0847e18 c012a5e8 e0847e2c c057d90f 051eb001 ed49ce1c e0871bf0 e0847e40
[  359.313218] <0> c016139f 00000001 00000000 e0871bf0 e0847e50 c0161450 e0871c40 ed49ce1c
[  359.313263] <0> e0847e5c c01614aa ed49c2a0 e0847e6c f8d5dfbc ed49cd24 00000000 e0847e74
[  359.313308] Call Trace:
[  359.313337]  [<c012a5e8>] ? default_spin_lock_flags+0x8/0x10
[  359.313364]  [<c057d90f>] ? _raw_spin_lock_irqsave+0x2f/0x50
[  359.313389]  [<c016139f>] ? __queue_work+0x1f/0x50
[  359.313414]  [<c0161450>] ? queue_work_on+0x40/0x60
[  359.313437]  [<c01614aa>] ? queue_work+0x1a/0x20
[  359.313516]  [<f8d5dfbc>] ? ieee80211_queue_work+0x2c/0x40 [mac80211]
[  359.313571]  [<f8ed9fa8>] ? b43_led_brightness_set+0x28/0x30 [b43]
[  359.313599]  [<c049cb5f>] ? led_trigger_set+0xbf/0xd0
[  359.313621]  [<c049cc09>] ? led_trigger_unregister+0x99/0xa0
[  359.313691]  [<f8d5ecaa>] ? ieee80211_led_exit+0x1a/0x80 [mac80211]
[  359.313755]  [<f8d43100>] ? ieee80211_unregister_hw+0xb0/0xe0 [mac80211]
[  359.313782]  [<c01616ef>] ? cancel_work_sync+0xf/0x20
[  359.313823]  [<f8ebcd9a>] ? b43_remove+0xaa/0xc0 [b43]
[  359.313854]  [<f8e60292>] ? ssb_device_remove+0x22/0x40 [ssb]
[  359.313878]  [<c03d41e1>] ? __device_release_driver+0x51/0xb0
[  359.313902]  [<c03d42cf>] ? driver_detach+0x8f/0xa0
[  359.313927]  [<c03d34d3>] ? bus_remove_driver+0x63/0xa0
[  359.313951]  [<c03d4849>] ? driver_unregister+0x49/0x80
[  359.313983]  [<f8e60e80>] ? ssb_driver_unregister+0x10/0x20 [ssb]
[  359.314026]  [<f8edb9b2>] ? b43_exit+0x12/0x37 [b43]
[  359.314051]  [<c017d499>] ? sys_delete_module+0x169/0x200
[  359.314078]  [<c01e7132>] ? do_munmap+0x212/0x2e0
[  359.314104]  [<c0102fe3>] ? sysenter_do_call+0x12/0x28
[  359.314119] Code: b8 fd a3 12 c0 e9 59 ff ff ff 90 b9 00 a4 12 c0 b8 03 a4 12 c0 e9 49 ff ff ff 90 90 90 90 90 90 90 90 90 55 ba 00 01 00 00 89 e5 <f0> 66 0f c1 10 38 f2 74 06 f3 90 8a 10 eb f6 5d c3 8d b4 26 00
[  359.314357] EIP: [<c012a558>] __ticket_spin_lock+0x8/0x20 SS:ESP 0068:e0847e10
[  359.314392] CR2: 00000000051eb001
[  359.314410] ---[ end trace bbc1892e66551ae2 ]---
Comment 1 John W. Linville 2010-04-21 14:19:17 UTC
Not sure who needs to look at this...could we get that far down the call stack if ieee80211_queue_work were passing bad values to queue_work?
Comment 2 bugzillakernelorg 2010-04-22 01:40:03 UTC
can those bits be instrumented? i can turn it on and get it to oops again, gimme config options to set :]
Comment 3 Larry Finger 2010-04-22 02:15:17 UTC
You will have to debug this. I installed my BCM4311/2 tonight and ran about 20 minutes of a rmmod/insmod loop without any problem.
Comment 4 bugzillakernelorg 2010-04-22 02:18:49 UTC
alright; was just wondering what else i could turn on to help, i am using linux-2.6-git as of 05ce7bfe547c9fa967d9cab6c37867a9cb6fb3fa, and my card is an lp-phy in an hp mini if that's of any consequence
Comment 5 bugzillakernelorg 2010-04-22 02:20:02 UTC
oh, and it did involve removing ssb, and i was using modprobe (i.e. modprobe b43, rmmod b43, rmmod ssb, modprobe b43)
Comment 6 Larry Finger 2010-04-22 02:50:54 UTC
Actually, my first test was a modprobe -r b43, sleep 2, modprobe b43, sleep 2. I just did a loop of rmmod b43 ; rmmod ssb; sleep 2 ; modprobe b43 ; sleep 10. That one ran for about 40 cycles.

My card is a G PHY, but it shouldn't make any difference. Most of the code that could be causing this kind of problem is common.

Using git-describe, my kernel is v2.6.34-rc5-59-g1ef6ce7.
Comment 7 bugzillakernelorg 2010-04-22 07:24:50 UTC
alright, it's doing it after modprobe ssb; modprobe b43; rmmod b43

using v2.6.34-rc5-23-g05ce7bf

will attach new oops and .config
Comment 8 bugzillakernelorg 2010-04-22 07:29:03 UTC
Created attachment 26090 [details]
current oops, untainted
Comment 9 bugzillakernelorg 2010-04-22 07:30:33 UTC
Created attachment 26091 [details]
current kernel config

derived from ubuntu's 2.6.32 config, run through localmodconfig to reduce build time (it's a netbook)
Comment 10 bugzillakernelorg 2010-04-22 07:40:24 UTC
it did it again when i rmmod'd b43 after posting those attachments, fault address is 6a72282e this time, how do i use gdb to look at this? or get it to dump something so i can use gdb?
Comment 11 bugzillakernelorg 2010-04-23 00:21:09 UTC
it's oopsing at shutdown in apbt_cpuhp_notify (which this machine doesn't even have), regardless of whether i load b43 or ssb; gonna reup and see what it does
Comment 12 bugzillakernelorg 2010-04-23 11:22:30 UTC
this is the wrong place to post this; but regarding the freeze that started the investigation


it manages to return from here
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=drivers/ssb/main.c#l766

but not get to here
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=drivers/ssb/main.c#l783

console spam when successfully loaded:
[   77.289384] ssb: Core 0 found: ChipCommon (cc 0x800, rev 0x16, vendor 0x4243)
[   77.292437] ssb: Core 1 found: IEEE 802.11 (cc 0x812, rev 0x0F, vendor 0x4243)
[   77.295377] ssb: Core 2 found: PCMCIA (cc 0x80D, rev 0x0A, vendor 0x4243)
[   77.298451] ssb: Core 3 found: PCI-E (cc 0x820, rev 0x09, vendor 0x4243)
[   77.320252] ssb: Found rev 1 PMU (capabilities 0x02A62F01)
[   77.324665] ssb: SPROM revision 8 detected.
[   77.344416] ssb: Sonics Silicon Backplane found on PCI device 0000:01:00.0

console spam on failure:
[   77.289384] ssb: Core 0 found: ChipCommon (cc 0x800, rev 0x16, vendor 0x4243)
[   77.292437] ssb: Core 1 found: IEEE 802.11 (cc 0x812, rev 0x0F, vendor 0x4243)
[   77.295377] ssb: Core 2 found: PCMCIA (cc 0x80D, rev 0x0A, vendor 0x4243)
[   77.298451] ssb: Core 3 found: PCI-E (cc 0x820, rev 0x09, vendor 0x4243)

i may try some printf debugging tomorrow
Comment 13 Larry Finger 2010-04-23 15:10:43 UTC
I'm confused. What are those magic numbers #1766 and #1783 above? I first expected them to be line numbers, but drivers/ssb/main.c only has 1454 lines.

If "successfully" loaded and you get the PMU, SPROM, and PCI device messages logged, does that mean you can then unload ssb, and that if the log does not include those messages, your box hangs on unload?
Comment 14 Larry Finger 2010-04-23 17:45:24 UTC
Created attachment 26114 [details]
Test patch to debug the reason for ssb initialization failure

The patch will provide info on which portion of the ssb initialization is failing.
Comment 15 bugzillakernelorg 2010-04-23 23:56:04 UTC
sorry; it was a hard freeze. on failure the machine is utterly ruined and needs a power cycle. gonna try that patch out (also, those are "l766" and "l783", with a lowercase L for "line"; the anchor goes right to the line in git)
Comment 16 bugzillakernelorg 2010-04-24 02:14:34 UTC
this thing is very ugly; it's freezing in random places as i modify things, and now it's locking up after finding the pmu.

if i consume cpu with cat /dev/zero > /dev/null & it actually gets past probing but is "wrong"[1], and probing b43 will cause a flood of bad_io oopses

1:
wrong as in missing the pmu and sprom init messages

(typed out, rebooting to probe wl is a pain)
ssb: Core 0 ...
ssb: Core 1 ...
ssb: Core 2 ...
ssb: Core 3 found: PCI-E (...)
ssb: Sonics Silicon Backplane found on PCI device 0000:01:00.0

missing the pmu line and the rest
and if i rmmod ssb immediately after probing:

BUG: unable to handle kernel paging request at 00100104
IP: [<...>] ssb_bus_unregister+0x61/0xa0 [ssb]
*pde = 00000000
Oops: 0002 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor

stack
ssb_pcihost_remove
pci_device_remove
__device_release_driver
driver_detach
bus_remove_driver
driver_unregister
sysfs_remove_file
pci_unregister_driver
b43_pci_ssb_bridge_exit
ssb_modexit
sys_delete_module
do_munmap
sysenter_do_call
Comment 17 bugzillakernelorg 2010-04-24 02:17:47 UTC
when i say freeze; even the blinking cursor on the text console stops blinking
Comment 18 Larry Finger 2010-04-24 03:09:35 UTC
Is this a netbook?

The crash in Comment #16 is understandable. If it cannot successfully register the ssb bus, then unregister will fail.
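The point above, that unregistering a bus whose registration did not fully succeed will itself crash (as in the ssb_bus_unregister oops of comment 16), can be illustrated with a small user-space sketch. It shows a general C pattern, not the ssb code: undo partial setup on the registration error path, and record success explicitly so that teardown only undoes what was actually done. `fake_bus`, `fake_bus_register()` and `fake_bus_unregister()` are hypothetical names, and using `-ENODEV` for the failed-SPROM case is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical bus object, not struct ssb_bus. */
struct fake_bus {
    bool registered;   /* set only after registration fully succeeds */
    int *cores;        /* a resource acquired during registration */
};

/* Hypothetical registration: acquire a resource, then fail if the
 * (simulated) SPROM read failed, releasing what was already acquired. */
static int fake_bus_register(struct fake_bus *bus, bool sprom_ok)
{
    assert(bus != NULL);
    bus->registered = false;
    bus->cores = calloc(4, sizeof(*bus->cores));
    if (bus->cores == NULL)
        return -ENOMEM;
    if (!sprom_ok) {
        free(bus->cores);          /* undo partial setup on the error path */
        bus->cores = NULL;
        return -ENODEV;            /* assumed error code */
    }
    bus->registered = true;
    return 0;
}

/* Safe to call whether or not registration succeeded. */
static void fake_bus_unregister(struct fake_bus *bus)
{
    assert(bus != NULL);
    if (!bus->registered)
        return;                    /* nothing to tear down */
    free(bus->cores);
    bus->cores = NULL;
    bus->registered = false;
}
```

Calling `fake_bus_unregister()` after a failed `fake_bus_register()` is a no-op instead of a second free or a walk over half-initialized state.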

Assuming that you had applied the previous test patch, I will generate another one with slightly better diagnostics.
Comment 19 Larry Finger 2010-04-24 03:21:54 UTC
Created attachment 26115 [details]
Test Patch V2 to debug the reason for ssb initialization failure

There was a bug in the previous patch that prevented it from ever working. Sorry.
Comment 20 bugzillakernelorg 2010-04-24 03:31:58 UTC
yes, this is a netbook; it has an lp-phy, and wl calls it a 4315.

i checked the patch after the freeze but i missed that too :>

now using v2.6.34-rc5-130-gd5a3045 as well
Comment 21 bugzillakernelorg 2010-04-24 04:12:32 UTC
_bus_scan returned OK
_pci_init returned OK
_pcmcia_init returned OK
_bus_powerup returned OK
ssb: Found rev 1 PMU (capabilities 0x02A62F01)

then freeze
Comment 22 Larry Finger 2010-04-24 04:51:49 UTC
Created attachment 26118 [details]
Test patch V3 to debug the reason for ssb initialization failure

I think we are gaining :).
Comment 23 bugzillakernelorg 2010-04-24 05:45:01 UTC
back from _chipcommon_init
back from _mipscore_init
Comment 24 bugzillakernelorg 2010-04-24 08:35:34 UTC
it gets to pci.c:sprom_do_read

wasn't there a patch re: SSB_SPROM_BASE?

still investigating, but it just does this in a loop

sprom[i] = ioread16(bus->mmio + SSB_SPROM_BASE + (i * 2));
Comment 25 bugzillakernelorg 2010-04-24 09:11:34 UTC
it reads the first element (sprom[0] = 65535, i.e. 0xFFFF), then freezes reading the next
Comment 26 Larry Finger 2010-04-24 14:13:46 UTC
Created attachment 26119 [details]
Test Patch V4 to debug the reason for ssb initialization failure

Thanks for a really good job of debugging.

The 0xFFFF value returned by the first SPROM read is an indication that there is no SPROM at that address. This version of the patch will detect that condition and bail out.

We did have a patch for a relocated SPROM, which I think is your situation, but it didn't cure the problem on the test machine. I will be reconstituting the patch for you to try.
Comment 27 Larry Finger 2010-04-24 16:51:47 UTC
Created attachment 26120 [details]
Patch to test for SPROM read failure with fallback

This patch tests to see if the read of SPROM at offset 0x1000 returns 0xFFFF. If so, it tries at an offset of 0x0800. If that also fails, it returns an error code, otherwise it reads the SPROM at the new offset.
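The fallback described above can be sketched in user space. A fake register array stands in for `bus->mmio`, and the word-by-word loop follows the one quoted in comment 24 (word `i` at `base + i * 2`). The offsets 0x1000 and 0x0800 come from this comment; everything else (`fake_mmio`, `fake_ioread16()`, `sprom_read_with_fallback()`, and returning `-ENODEV`) is hypothetical and is not the code in attachment 26120. A real hang on a bus read, as in comment 25, cannot be modelled this way; the sketch only covers the case where an empty location reads back as 0xFFFF.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets from comment 27; the macro names are hypothetical. */
#define SPROM_OFFSET_PRIMARY   0x1000
#define SPROM_OFFSET_FALLBACK  0x0800
#define SPROM_EMPTY_WORD       0xFFFF

/* Hypothetical stand-in for the device's MMIO window (bus->mmio). */
#define FAKE_MMIO_BYTES 0x2000
static uint16_t fake_mmio[FAKE_MMIO_BYTES / 2];

/* Fill the window with 0xFFFF, as if nothing answered at any offset. */
static void fake_mmio_reset(void)
{
    for (size_t i = 0; i < FAKE_MMIO_BYTES / 2; i++)
        fake_mmio[i] = SPROM_EMPTY_WORD;
}

/* Stand-in for ioread16(bus->mmio + offset). */
static uint16_t fake_ioread16(size_t offset)
{
    assert(offset % 2 == 0 && offset < FAKE_MMIO_BYTES);
    return fake_mmio[offset / 2];
}

/* Same shape as the loop quoted in comment 24: word i is at base + i * 2. */
static void sprom_read_words(size_t base, uint16_t *sprom, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++)
        sprom[i] = fake_ioread16(base + i * 2);
}

/* Probe the first word at each candidate offset and skip a location whose
 * first word reads back as 0xFFFF. Returns 0 and sets *found_at on
 * success, or -ENODEV (an assumed error code) if both look empty. */
static int sprom_read_with_fallback(uint16_t *sprom, size_t nwords,
                                    size_t *found_at)
{
    static const size_t candidates[] = {
        SPROM_OFFSET_PRIMARY,
        SPROM_OFFSET_FALLBACK,
    };

    for (size_t c = 0; c < sizeof(candidates) / sizeof(candidates[0]); c++) {
        if (fake_ioread16(candidates[c]) == SPROM_EMPTY_WORD)
            continue;
        sprom_read_words(candidates[c], sprom, nwords);
        *found_at = candidates[c];
        return 0;
    }
    return -ENODEV;
}
```

Only the first word at each offset is probed, which keeps the check cheap; the full read happens once, at whichever offset answered.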

John: Will your box boot with this patch?
Comment 28 bugzillakernelorg 2010-04-24 19:16:08 UTC
what i'm wondering about is why it'd occasionally work with 2.6.31; was something like this backed out in .32+?
Comment 29 bugzillakernelorg 2010-04-24 19:20:01 UTC
#26's patch isn't text/plain
Comment 30 bugzillakernelorg 2010-04-24 19:58:12 UTC
first load of ssb goes fine with the patch; rmmod and reprobe explodes

modprobe ssb, modprobe b43, rmmod b43 -> oops

gonna look at .31 and see why it worked "sometimes"
Comment 31 Larry Finger 2010-04-24 20:15:25 UTC
The "patch" from #26 can be ignored.

What does the dmesg output with the patch from #27 show?

Nothing like this was backed out that I can recall.

I'll check to see if something is not being freed that causes the oops on unload/reload.
Comment 32 bugzillakernelorg 2010-04-24 22:04:54 UTC
alright, to refine "fine": i just got to a friend's house and tried to probe ssb again, and it's freezing after "Found rev 1 PMU" just like before, even with the patch. i think the SPROM isn't "ready" until some period after init
Comment 33 Larry Finger 2010-04-24 22:23:07 UTC
Created attachment 26125 [details]
Patch V2 to test for SPROM read failure with fallback

I think I got caught taking too many steps at once.

With this version, I only test the initial SPROM read.

I also found that the code calling sprom_do_read() was unprepared for anything but success. That is fixed with this patch.
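Making the caller of sprom_do_read() handle failure follows a familiar C pattern: check the return value, propagate it, and never parse a buffer that the read did not fill. A minimal sketch, assuming a reader callback that returns 0 or a negative errno value; `fake_sprom`, `fake_sprom_get()` and the two example readers are hypothetical, and the four-word SPROM size is chosen only to keep the example small.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SPROM_WORDS 4   /* tiny size for illustration only */

/* Hypothetical holder for the words read from the SPROM. */
struct fake_sprom {
    uint16_t words[SPROM_WORDS];
    bool valid;
};

/* A reader fills buf with nwords words and returns 0, or returns a
 * negative errno value on failure (stand-in for sprom_do_read()). */
typedef int (*sprom_reader_fn)(uint16_t *buf, size_t nwords);

/* Caller that no longer assumes success: on failure it propagates the
 * error and leaves *out marked invalid instead of using a stale buffer. */
static int fake_sprom_get(sprom_reader_fn reader, struct fake_sprom *out)
{
    uint16_t buf[SPROM_WORDS];
    int err;

    assert(reader != NULL && out != NULL);
    out->valid = false;

    err = reader(buf, SPROM_WORDS);
    if (err)
        return err;

    memcpy(out->words, buf, sizeof(buf));
    out->valid = true;
    return 0;
}

/* Example readers that exercise the failure and success paths. */
static int reader_fails(uint16_t *buf, size_t nwords)
{
    (void)buf;
    (void)nwords;
    return -ENODEV;
}

static int reader_ok(uint16_t *buf, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++)
        buf[i] = (uint16_t)(0x0100 + i);
    return 0;
}
```

Marking the result invalid before the read, rather than after a failure, means every early return leaves the caller with a consistent "nothing usable" state.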
Comment 34 Larry Finger 2010-04-24 23:21:51 UTC
Created attachment 26126 [details]
Patch to test for existence of SPROM on machine

This patch is one that was applied to wireless-testing, but pulled in part because we did not have test machines.

Does this one report that the machine has no SPROM?
Comment 35 bugzillakernelorg 2010-04-24 23:26:25 UTC
when it succeeds and gets loaded without freezing, it says it has a "version 8" SPROM

from post #8's dmesg: [   77.324665] ssb: SPROM revision 8 detected.
Comment 36 bugzillakernelorg 2010-04-25 04:08:03 UTC
rebuilding with the patch now. i'm wondering how it functions when it does "find" an SPROM, if it's not present? also, i've used the ssb-sprom tool from bu3sch's git repo, and it said the one i got from sysfs was junk (this was long before i started getting freezes)
Comment 37 bugzillakernelorg 2010-04-25 04:39:37 UTC
ssb: No SPROM available!
ssb: Failed to register ...

now what :]

wl and ssb/b43 (when it works) report the same mac address; how could that be if there was no sprom?
Comment 38 Larry Finger 2010-04-25 05:15:12 UTC
I had forgotten that you had posted a log message where the Rev 8 SPROM was listed.

Obviously, your device does have an SPROM at offset 0x1000, but we have some problem with reading those data.

If you do a cold boot from power down, is b43 more likely to work if you load and unload wl first? There are some systems that work following the wl cycle, but get DMA errors when coming from a cold boot.

Have you been able to test whether 2.6.32 works better than 2.6.34-rcX?

I will try to find differences between the Broadcom driver and our SSB setup and generate some test patches.
Comment 39 bugzillakernelorg 2010-04-25 06:00:49 UTC
2.6.32 was when i started having near 100% boot failure, and began investigation

that being said, when using .31[1], it would generally freeze during a cold boot; after a forced power-down, the very next boot would complete

i did not bother with wl when using .31; i didn't even know it was ssb causing the boot failures and it was easy enough to restart

1: that was when i was running ubuntu karmic; i've since upgraded to lucid beta, didn't get any logs for freezes on karmic
Comment 40 Matt Parnell 2010-04-26 02:06:09 UTC
I can confirm this with .32 and above here. I've tried with the wireless-testing tree, as well as with the Zen kernel that generally pulls in the latest -next stuff, and despite this I still can't get b43 working.

I too have the HP mini 110 netbook, and it also has the bcm4312 rev 15 card. The only way to get b43 working here is to use the default Archlinux kernel (which is pretty vanilla in most ways), and pass acpi=off to grub. In this situation, b43 works perfectly for me. Any kernel version above .33 and it fails even with acpi=off.

I'm open to helping in whatever capacity I can. I'd be happy to test patches, and do whatever it takes to help get this bug squashed.
Comment 41 Larry Finger 2010-04-26 02:58:43 UTC
A bisection of where the problem starts between 2.6.32 and 2.6.33 would be helpful. You should use the wireless-testing tree for that. The Archlinux configuration should be fine.
Comment 42 Matt Parnell 2010-04-26 03:40:08 UTC
The problem with that is that I only briefly tried b43 with .32 and don't remember whether it worked, and before that I hadn't bought the laptop yet. I guess a good start would be to try the .30 or .31 and .32 mainlines again and see if either of them has the same issue.

I'll look into it.
Comment 43 Larry Finger 2010-04-26 03:50:17 UTC
The 4315 device was not added until 2.6.32. It was in wireless-testing during 2.6.31, but not anywhere before that.

After thinking about it, using the mainline linux-2.6.git tree would be better as it has fewer reversions of commits. It is a fact of the way the wireless-testing tree was maintained until recently.
Comment 44 Larry Finger 2010-04-26 16:36:47 UTC
Created attachment 26147 [details]
Additional patch to set fast powerup delay

This patch should be applied on top of the other two. It is a quick hack to set a parameter found in the Broadcom driver.
Comment 45 John W. Linville 2010-04-26 17:45:26 UTC
Just for the record, as I mentioned on linux-wireless mailing list the patch posted there that is similar to the patch in comment 33 still produced a hang on my box.  But the "== 0xFFFF" test seems to have worked -- it hung somewhere on the return path after that.  I'm trying to pin-down the actual hang location.
Comment 46 Matt Parnell 2010-04-27 02:56:34 UTC
I applied those 3 patches to .33, and after the build and modprobing, I don't have a system freeze/crash. No interface is created, but b43 and ssb both load fine with them...so I guess that's a start.

Dmesg:
http://paste.pocoo.org/show/206573/
Comment 47 bugzillakernelorg 2010-04-27 03:25:14 UTC
i'll be trying the patch in #44 ASAP (should be tonight) thanks
Comment 48 Matt Parnell 2010-04-27 03:31:32 UTC
You're supposed to use #44 plus the two before it btw.

Oh...and just to make it clear, the modules load fine and the system doesn't crash, and I don't have to disable ACPI, but you need to know that nothing actually works...(yet)...no wireless interface is created by b43.
Comment 49 Larry Finger 2010-04-27 03:48:12 UTC
It is expected that b43 won't work. I'm just trying to get a feeling on what will avoid the freezes. We crawl before we run.

BTW, I have arranged to get a loan of a netbook with these problems so I can do debugging locally. I certainly hope it helps.
Comment 50 Matt Parnell 2010-04-27 04:09:38 UTC
Ok. That's fine.

And excellent on the netbook thing... I'm going on a trip May 8, and I hope I can help enough to get b43 working before then if possible, though I know progress can be slow.
Comment 51 bugzillakernelorg 2010-04-27 05:06:38 UTC
no change with #44, finds pmu, doesn't find sprom; i'll see if i can get it to ever load and see what it says
Comment 52 bugzillakernelorg 2010-04-27 05:16:42 UTC
checking that the first read is 0xFFFF and failing seems to make it never succeed, even by chance; gonna try backing that out
Comment 53 bugzillakernelorg 2010-04-27 06:22:06 UTC
still investigating after backing out the patch(es), but with #44 it seems to load ssb without a freeze _much_ more readily (even more readily than on .31)
Comment 54 bugzillakernelorg 2010-04-27 06:24:31 UTC
as usual, spoke too soon; it loaded and reloaded nearly 30 times, and now it won't do it at all, like before
Comment 55 Larry Finger 2010-04-28 15:30:49 UTC
Created attachment 26172 [details]
Additional patch to ssb pmu init

This patch, to be applied on top of #26125, #26126, and #26147, reflects a change in the PMU initialization. Does it make any difference?
Comment 56 Matt Parnell 2010-04-28 21:30:19 UTC
It changed the dmesg output a bit:

http://pastebin.org/190603
Comment 57 Larry Finger 2010-04-28 21:53:32 UTC
I'm sorry, but I didn't see any difference between the dmesg posting in #56 and the one in #46.
Comment 58 Matt Parnell 2010-04-28 23:57:57 UTC
I was mistaken. I couldn't get to the first paste for some reason earlier...DNS error or something.


In that case, the patch didn't really do anything visibly apparent.
Comment 59 bugzillakernelorg 2010-04-29 06:35:14 UTC
with applied patches:
Failed to register PCI version of SSB with error -19
Comment 60 Matt Parnell 2010-05-04 21:45:37 UTC
Any news on this bug?

I'll do anything I can to help.
Comment 61 Larry Finger 2010-05-06 03:15:28 UTC
Created attachment 26253 [details]
Patch to detect SPROM at alternate location

Please try this patch and report if the BCM4312 works after it is applied.
Comment 62 Matt Parnell 2010-05-07 00:54:06 UTC
The b43 interface works this time and I can bring it up, but I can't scan or connect. I'm getting some interesting error messages, though...

Dmesg:
b43-phy1: Broadcom 4312 WLAN found (core revision 15)
phy1: Selected rate control algorithm 'pid'
Broadcom 43xx driver loaded [ Features: PL, Firmware-ID: FW13 ]
b43 ssb0:0: firmware: requesting b43/ucode15.fw
b43 ssb0:0: firmware: requesting b43/lp0initvals15.fw
b43 ssb0:0: firmware: requesting b43/lp0bsinitvals15.fw
b43-phy1: Loading firmware version 478.104 (2008-07-01 00:50:23)
b43-phy1 ERROR: DMA RX reset timed out
b43-phy1 ERROR: DMA TX reset timed out

And when I do ifconfig wlan0 up:
SIOCSIFFLAGS: Unknown error 132


You're getting closer.
Comment 63 Matt Parnell 2010-05-07 01:10:57 UTC
Another development:

When I load from boot and use wicd, it appears that every time I scan, the device is unloaded and reloaded. The LED cycles, etc...

Dmesg:
atl1c 0000:02:00.0: atl1c: eth0 NIC Link is Up<100 Mbps Full Duplex>
atl1c 0000:02:00.0: Unable to allocate MSI interrupt Error: -1
atl1c 0000:02:00.0: atl1c: eth0 NIC Link is Up<100 Mbps Full Duplex>
b43-phy0: Loading firmware version 478.104 (2008-07-01 00:50:23)
atl1c 0000:02:00.0: atl1c: eth0 NIC Link is Down
b43-phy0: Loading firmware version 478.104 (2008-07-01 00:50:23)
b43-phy0: Loading firmware version 478.104 (2008-07-01 00:50:23)
atl1c 0000:02:00.0: Unable to allocate MSI interrupt Error: -1
b43-phy0: Loading firmware version 478.104 (2008-07-01 00:50:23)
b43-phy0: Loading firmware version 478.104 (2008-07-01 00:50:23)
b43-phy0: Loading firmware version 478.104 (2008-07-01 00:50:23)
Comment 64 bugzillakernelorg 2010-05-07 06:48:02 UTC
alright, tried the patch; it "finds" the sprom (dmesg forthcoming), b43 doesn't work (networkmanager says "device not ready"), attempting to copy /sys/devices/pci*/*/*/ssb_sprom for inspection by Mr Buesch's ssb-sprom tool resulted in a hard freeze in the manner of the original freeze

it does say that it found it at 0x800
Comment 65 bugzillakernelorg 2010-05-07 07:00:42 UTC
spoke too soon _again_, after the freeze & reboot wifi comes up, performance is very poor however; posting this from the machine :]
Comment 66 bugzillakernelorg 2010-05-07 07:03:55 UTC
Created attachment 26270 [details]
dmesg with patch 26253 applied, ssb & b43 loaded, on AP
Comment 67 bugzillakernelorg 2010-05-07 07:09:57 UTC
oooh bonus; the rfkill light behaves properly now, too
Comment 68 bugzillakernelorg 2010-05-07 07:18:26 UTC
and further, i was able to copy the sprom

Reading input from "ssb_sprom"...
Raw input:  DE0168000187570600000000C0000A001400000080000F0047004700830164003009C0FC0000000000000000000001030003000002000200010004000400000002000000030002000E0047000028000001000500C0FCE005FFFFFFFF0A000101080000000000000039C3000000000000E7C60000010000000000C600F6504CB406001027010065190000000000000702000000001A04FA000A0E0B090E0200000500000000003F01FFFF0000FC03427B0200900700000000B80B00801404140400000000FA000000000000004C1DE204AA000000F40141000000000000000000DC9E00000000AC34000000000000030000000000000000000000000000001A0000000000F3089A00EC172B43233A6BC948190333A901EE1900000000EE19DE0FA100F844000000005800C1280000842800000C00104F0000000000000000000000000000000000000000000000000000000000000000000000000000000000004252434D5F544553545F535349440000000000000000000000000000000000003900000050000000C0FC00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Corrupt input data (crc: 0x89, expected: 0x00)

(from http://bu3sch.de/gitweb?p=b43-tools.git;a=summary)

sorry for the rapid fire bug messages; it also appears the "slowness" doesn't exist; it operates as normal, i had just turned on a bunch of debugging in the kernel and was seeing some unimportant messages and it seemed slow.
Comment 69 bugzillakernelorg 2010-06-08 05:38:07 UTC
hi; i know it's been a while, but i've only just got to use the netbook in some capacity with these fixes. the problem i'm having is that networkmanager says "device not ready" for the wifi adapter; it takes a bunch of reboots and reloads to get it to work every once in a while.

i'm also getting "b43-phy warning: LEDs: Unknown behaviour" a bunch, with 0x5D, 0x67, and a few other values (they seem random); these messages dot my dmesg around the module init messages
Comment 70 bugzillakernelorg 2010-06-08 05:39:58 UTC
there are also a lot of "udev: renamed network interface wlan0 to wlan5" messages, and it looks like the number is increasing

if i'm not mistaken, isn't that tied to the mac address? that would indicate that it's changing, so investigating whether the sprom is really working appropriately is probably warranted
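The MAC address is read from the SPROM, so an address that changes between loads (confirmed in comment 73) points at the SPROM data rather than at udev. Two quick checks one might run on the addresses seen in dmesg are sketched below: whether each is a well-formed unicast address (the same idea as the kernel's `is_valid_ether_addr()`: not all zeros and not multicast), and whether the addresses from several loads agree. The function names are hypothetical. The second check is the more telling one, since a corrupted SPROM read can still produce an address that looks well-formed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Well-formed unicast address: not all zeros, and the I/G (multicast)
 * bit, the lowest bit of the first octet, is clear. The all-ones
 * broadcast address fails the second test. */
static bool mac_is_valid_unicast(const uint8_t mac[MAC_LEN])
{
    static const uint8_t zero[MAC_LEN];

    assert(mac != NULL);
    if (memcmp(mac, zero, MAC_LEN) == 0)
        return false;
    return (mac[0] & 0x01) == 0;
}

/* A working SPROM should give the same address on every load. */
static bool mac_stable_across_loads(const uint8_t (*seen)[MAC_LEN], size_t n)
{
    assert(seen != NULL || n == 0);
    for (size_t i = 1; i < n; i++)
        if (memcmp(seen[0], seen[i], MAC_LEN) != 0)
            return false;
    return true;
}
```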
Comment 71 bugzillakernelorg 2010-06-08 06:42:14 UTC
agh, been using it since the last post; the packet loss (or something) is unreal, it's nearly unusable
Comment 72 bugzillakernelorg 2010-06-15 03:33:57 UTC
the firmware only seems to load rarely, leading to "device not ready" and "cannot assign requested address" when operations are done on the interface

it says it's loading firmware version "478.104" 2008-07-01 00:50:23

looks like a whole new class of awesome; when the driver was ignorant of lpphy stuff insofar as just identifying it, it didn't have these problems
Comment 73 bugzillakernelorg 2010-06-15 03:39:17 UTC
just confirmed that the mac address is changing randomly whenever ssb is removed/reloaded
Comment 74 bugzillakernelorg 2010-06-15 04:15:38 UTC
as to #71: i replaced my router after i originally reported this bug. the new one lets me set the phy power, and i set it low; all my other devices can reach it from anywhere in the house, but this one, as it is now, can only reach it from within 3 feet (rssi wrong?). hope that helps
Comment 75 Matt Parnell 2010-08-05 02:28:14 UTC
Any developments on this?

b43 still won't work, and I've tried on up to .35 rc....
Comment 76 Artem S. Tashkinov 2012-06-18 22:55:37 UTC
This often happens here too:

12:00.0 Network controller: Broadcom Corporation BCM4312 802.11b/g LP-PHY (rev 01)
        Subsystem: Dell Wireless 1397 WLAN Mini-Card
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 17
        Region 0: Memory at fbb00000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=2 PME-
        Capabilities: [58] Vendor Specific Information: Len=78 <?>
        Capabilities: [e8] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [d0] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <4us, L1 <64us
                        ClockPM+ Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [13c v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=01
                        Status: NegoPending- InProgress-
        Capabilities: [160 v1] Device Serial Number d3-64-a1-ff-ff-c8-70-f1
        Capabilities: [16c v1] Power Budgeting <?>
        Kernel driver in use: b43-pci-bridge

Laptop: Dell Vostro 3500-0240
Comment 77 Larry Finger 2012-06-19 00:50:58 UTC
Adding a comment to a nearly two-year-old thread about an oops and then not posting the dump does nothing. AFAIK, the other complaints were fixed long ago. In any case, I will not do anything based on this report.
Comment 78 Artem S. Tashkinov 2012-06-19 08:07:09 UTC
(In reply to comment #77)
> Adding a comment to a nearly two-year-old thread about an oops and then not
> posting the dump does nothing. AFAIK, the other complaints were fixed long ago.
> In any case, I will not do anything based on this report.

I'm sorry about that; here's a crash dump. If advisable, I'll open a new bug report.

BUG: unable to handle kernel NULL pointer dereference at 0000004c
IP: [<c045ef4a>] drain_workqueue+0x1a/0x1b0
*pdpt = 0000000033d00001 *pde = 0000000000000000 
Oops: 0000 [#1] SMP 
Modules linked in: vfat fat nf_conntrack_ipv4 ip6t_REJECT nf_defrag_ipv4 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables b43(-) bcma mac80211 cfg80211 dell_wmi ssb uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev media sparse_keymap iTCO_wdt iTCO_vendor_support intel_ips dell_laptop snd_hda_codec_hdmi r8169 i2c_i801 snd_hda_codec_idt rfkill coretemp dcdbas snd_hda_intel snd_hda_codec snd_hwdep snd_pcm mii mmc_core microcode snd_page_alloc snd_timer snd soundcore xts gf128mul dm_crypt crc32c_intel wmi usb_storage i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan]

Pid: 856, comm: rmmod Not tainted 3.4.0-1.fc17.i686.PAE #1 Dell Inc. Vostro 3500/0G2R51
EIP: 0060:[<c045ef4a>] EFLAGS: 00010246 CPU: 3
EIP is at drain_workqueue+0x1a/0x1b0
EAX: c0ce4500 EBX: 00000000 ECX: 00000031 EDX: 00003131
ESI: 00000000 EDI: f6ec5540 EBP: f3d39e74 ESP: f3d39e54
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
CR0: 8005003b CR2: 0000004c CR3: 32795000 CR4: 000007f0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
Process rmmod (pid: 856, ti=f3d38000 task=f26bf080 task.ti=f3d38000)
Stack:
 00000286 f3d39e6c 00000282 f452e498 00000282 f452e340 00000000 f6ec5540
 f3d39e88 c045f0f2 f452e340 f3c60000 f6ec5540 f3d39e9c f80f81a4 c045f40f
 f3d39e9c f452f180 f3d39eb0 f9a0f8c2 f4f3f864 f9a5609c f3ce5834 f3d39ebc
Call Trace:
 [<c045f0f2>] destroy_workqueue+0x12/0x120
 [<f80f81a4>] ieee80211_unregister_hw+0xd4/0x110 [mac80211]
 [<c045f40f>] ? cancel_work_sync+0xf/0x20
 [<f9a0f8c2>] b43_ssb_remove+0x92/0xa0 [b43]
 [<f80025b2>] ssb_device_remove+0x22/0x30 [ssb]
 [<c073476b>] __device_release_driver+0x5b/0xb0
 [<c0734f47>] driver_detach+0x87/0x90
 [<c0734363>] bus_remove_driver+0x73/0xe0
 [<c049dfb0>] ? show_refcnt+0x30/0x30
 [<c04ad319>] ? __stop_machine+0x99/0xd0
 [<c07353b9>] driver_unregister+0x49/0x80
 [<f80028b0>] ssb_driver_unregister+0x10/0x20 [ssb]
 [<f9a351c8>] b43_exit+0xd/0x23 [b43]
 [<c049f36a>] sys_delete_module+0x13a/0x2b0
 [<c055e4a3>] ? mntput_no_expire+0x23/0x100
 [<c04b4546>] ? __audit_syscall_exit+0x356/0x3b0
 [<c04b401c>] ? __audit_syscall_entry+0xbc/0x290
 [<c04b4546>] ? __audit_syscall_exit+0x356/0x3b0
 [<c094fedf>] sysenter_do_call+0x12/0x28
Code: fc ff ff 5d c3 8d b6 00 00 00 00 8d bf 00 00 00 00 55 89 e5 57 56 53 83 ec 14 3e 8d 74 26 00 89 c3 b8 00 45 ce c0 e8 e6 99 4e 00 <8b> 43 4c 8d 50 01 85 c0 89 53 4c 75 03 83 0b 40 80 05 00 45 ce 
EIP: [<c045ef4a>] drain_workqueue+0x1a/0x1b0 SS:ESP 0068:f3d39e54
CR2: 000000000000004c
---[ end trace 5e16a8b8872b2956 ]---
Comment 79 Alan 2012-07-11 15:21:36 UTC
Please open a new report as directed
Comment 80 Artem S. Tashkinov 2012-07-11 15:44:31 UTC
(In reply to comment #79)
> Please open a new report as directed

I will probably do that, but only in the distant future, since this laptop is no longer in my possession.
