Bug 14538
Summary: | Unable to associate with AP after resume since 2.6.32-rc6 | ||
---|---|---|---|
Product: | Networking | Reporter: | Christian Casteyde (casteyde.christian) |
Component: | Wireless | Assignee: | Larry Finger (Larry.Finger) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | Larry.Finger, linville, rjw |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.32-rc6 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 7216, 14230 | ||
Attachments: |
git bisect log
Possible patch
Patch to log the ssb core scan results
Patch to log information at resume time
dmesg output after boot, with 3 patches applied
dmesg output after resume, with 3 patches applied
Test patch
Description
Christian Casteyde
2009-11-03 22:07:36 UTC
What adapter? Is it PCI, PCMCIA or USB? Also, it should be relatively easy to find the commit that introduced the issue by bisection.

Oops, sorry. It's a PCI b43 adapter of an Aspire 1511 Lmi laptop. But I'm confident it's not the b43 driver since I was the tester of the commit done on this driver, and this commit worked on a 2.6.32-rc kernel source base.

Re-oops, the commit worked on -rc5.

OK, thanks for the info.

#define ERFKILL 132 /* Operation not possible due to RF-kill */
Did you happen to change your rfkill switch during the suspend/resume cycle?

No, just doing:
closing the lid -> acpi event that starts a script that does:
    rc.inet1 eth1_stop
    echo mem > /sys/power/state
opening the lid -> acpi event that resumes the script here:
    rc.inet1 eth1_start    ** this fails **
then try to ping (fails), restart the script, etc. fails.
The rc.inet1 script basically does wpa/iwconfig + dhcp on slack.

What sort of laptop is it? Are you able to do the git bisect between 2.6.32-rc5 and 2.6.32-rc6?

The laptop is an Acer Aspire 1511Lmi, quite old but very good to get suspend/resume and wireless problems :-) I'm bisecting, still 7 reboots to do.

Finally, I didn't manage to bisect, because at the end the suspend to RAM fails systematically. However, I narrowed the candidate commits down to those in the attached git bisect log.

Created attachment 23656 [details]
git bisect log
I finally gave up after too many test failures (skip = the computer freezes when suspending).

Another interesting point is that each time it fails, I don't get the warning reported in http://bugzilla.kernel.org/show_bug.cgi?id=13987 so this may well be an RF kill problem.
Well, since it was apparently related to rfkill, I reverted the most probable patch on top of vanilla 2.6.32-rc6, that is:

--- linux-2.6.32-rc5/drivers/net/wireless/b43/rfkill.c 2009-11-03 19:48:11.805090636 +0000
+++ linux-2.6.32-rc6/drivers/net/wireless/b43/rfkill.c 2009-11-03 19:48:17.955464847 +0000
@@ -33,7 +33,8 @@
 		      & B43_MMIO_RADIO_HWENABLED_HI_MASK))
 			return 1;
 	} else {
-		if (b43_read16(dev, B43_MMIO_RADIO_HWENABLED_LO)
+		if (b43_status(dev) >= B43_STAT_STARTED &&
+		    b43_read16(dev, B43_MMIO_RADIO_HWENABLED_LO)
 		    & B43_MMIO_RADIO_HWENABLED_LO_MASK)
 			return 1;
 	}

and indeed it works. So it is this commit that broke RF kill on my laptop. That is, if I use:

	if (/* b43_status(dev) >= B43_STAT_STARTED &&*/

the problem does not appear anymore. I've also seen this patch:

--- linux-2.6.32-rc5/drivers/net/wireless/b43/main.c 2009-11-03 19:48:11.801464075 +0000
+++ linux-2.6.32-rc6/drivers/net/wireless/b43/main.c 2009-11-03 19:48:17.952088831 +0000
@@ -4501,7 +4501,6 @@
 	cancel_work_sync(&(wl->beacon_update_trigger));
-	wiphy_rfkill_stop_polling(hw->wiphy);
 	mutex_lock(&wl->mutex);
 	if (b43_status(dev) >= B43_STAT_STARTED) {
 		dev = b43_wireless_core_stop(dev);

that could be involved, but I didn't test the 4 combinations of these patches.

Please note that the b43 commit fixes http://bugzilla.kernel.org/show_bug.cgi?id=14277 which I tested successfully, but the proposed patch only contained the bounce buffer fix, not the RFkill stuff. Maybe a partial revert should be done (I don't know if the RF kill bug slipped into the bounce buffer fix; I didn't check the commit numbers).

Forget what I said about http://bugzilla.kernel.org/show_bug.cgi?id=13987 in #10, since apparently the NMI occurs only at association or nearby, but as RFkill prevents association, I cannot see this other bug while this one is there.

Still present in 2.6.32-rc7, and commenting out "b43_status(dev) >= B43_STAT_STARTED" still solves the problem.
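The effect of the added status test can be illustrated with a small user-space sketch. It is not the driver code: the `sketch_` names, the register stand-in, the mask value and the names of the two non-started states are assumptions made for illustration only. It models the two variants of the check from the rfkill.c diff above and shows that the guarded variant reports the radio as disabled whenever the device is not in the started state, whatever the hardware switch says.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Device states; names and values are assumptions for this sketch. */
enum sketch_status {
	SKETCH_STAT_UNINIT = 0,
	SKETCH_STAT_INITIALIZED = 1,
	SKETCH_STAT_STARTED = 2,
};

/* Placeholder bit, not taken from the b43 headers. */
#define SKETCH_HWENABLED_LO_MASK 0x0010u

struct sketch_dev {
	enum sketch_status status;
	uint16_t hwenabled_lo;	/* stand-in for the hardware register */
};

/* Stand-in for a register read; real hardware access is not modelled. */
static uint16_t sketch_read16(const struct sketch_dev *dev)
{
	assert(dev != NULL);
	return dev->hwenabled_lo;
}

/* Variant without the status test: the register is always read. */
static bool radio_enabled_unguarded(const struct sketch_dev *dev)
{
	return (sketch_read16(dev) & SKETCH_HWENABLED_LO_MASK) != 0;
}

/* Variant with the status test: the register is read only once the
 * device is started; otherwise the radio is reported as disabled. */
static bool radio_enabled_guarded(const struct sketch_dev *dev)
{
	return dev->status >= SKETCH_STAT_STARTED &&
	       (sketch_read16(dev) & SKETCH_HWENABLED_LO_MASK) != 0;
}
```

With a device whose switch bit is set but whose status is below the started state, `radio_enabled_unguarded()` returns true and `radio_enabled_guarded()` returns false. If the rfkill poll runs in that window after resume, the guarded check would make the radio look hardware-blocked, which would be consistent with the ERFKILL symptom described in this report.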
I've found the commit that triggers the problem (no problem before, and reverting it solves the problem). It's: d50bae33d1358b909ade05ae121d83d3a60ab63f
Beware that reverting it would reopen bug #14181 as indicated in the comment. So both fixes are indeed broken.

Caused by:

commit d50bae33d1358b909ade05ae121d83d3a60ab63f
Author: Larry Finger <Larry.Finger@lwfinger.net>
Date: Fri Oct 16 10:18:09 2009 -0500

    b43: Fix Bugzilla #14181 and the bug from the previous 'fix'

    Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
    Signed-off-by: John W. Linville <linville@tuxdriver.com>

(Christian, please also add the subject of the commit and ideally the author to the report in future, thanks).

First-Bad-Commit : d50bae33d1358b909ade05ae121d83d3a60ab63f

What are the details of the Broadcom device? Please show the results from
dmesg | egrep "b43|ssb"

Created attachment 23851 [details]
Possible patch
Please test this patch. It appears that there was/is a bug in our specs.
The proposed patch on -rc8 fails. The output of dmesg is appended below:

christian@athor:~$ dmesg | egrep "b43|ssb"
b43-pci-bridge 0000:02:08.0: PCI INT A -> Link[LNK4] -> GSI 19 (level, low) -> IRQ 19
ssb: Sonics Silicon Backplane found on PCI device 0000:02:08.0
b43-phy0: Broadcom 4306 WLAN found (core revision 5)
b43 ssb0:0: firmware: requesting b43/ucode5.fw
b43 ssb0:0: firmware: requesting b43/pcm5.fw
b43 ssb0:0: firmware: requesting b43/b0g0initvals5.fw
b43 ssb0:0: firmware: requesting b43/b0g0bsinitvals5.fw
b43-phy0: Loading firmware version 410.2160 (2007-05-26 15:32:10)
b43-phy0: Loading firmware version 410.2160 (2007-05-26 15:32:10)

Created attachment 23856 [details]
Patch to log the ssb core scan results
Please add this patch and then resubmit the results of
dmesg | egrep "b43|ssb"
Created attachment 23859 [details]
Patch to log information at resume time
There is something that I don't understand. You say that if you eliminate the b43_status(dev) >= B43_STAT_STARTED test, then it works. On my system, however, when this routine is entered, the value of b43_status(dev) is 2, which is the value for B43_STAT_STARTED.
Please add this patch, which will print the value of b43_status(dev), and send the dmesg output from the "ACPI: Waking up from system sleep state S4" point.
OK, I've patched the kernel with the 3 patches from comment #16, #18 and #19, and rebooted. The first attached dmesg output is just after boot. The second one is after resume. In this case the status is not 2 anymore, but 0.

Created attachment 23868 [details]
dmesg output after boot, with 3 patches applied
Created attachment 23869 [details]
dmesg output after resume, with 3 patches applied
Thanks for testing. Based on your findings, the status of 0 makes sense; I just don't know why it has that value. That is the real bug here. FYI, the code change to test for status >= 2 is needed because some architectures will fault and crash the system if one attempts to read a register when the interface is in the state indicated by a status of 0 or 1. I'll let you know when I have another patch for testing.

Created attachment 23876 [details]
Test patch
Please try this patch. It is a bandaid rather than a fix, and there will likely be resistance to including it; however, I want to know if your system works after including it. Note: This is the only patch that should be included. The others have been deleted.
This patch works better than expected. When applied on 2.6.32-rc8, I not only can connect to the network at resume, but it also seems to fix the NMI regression I reported in http://bugzilla.kernel.org/show_bug.cgi?id=13987 I do not know why it works at all (whereas reverting the commit mentioned in #14 does not suffice to suppress the NMI), but it works: I made several suspend/resume cycles in a row and never got the NMI anymore. So for me this fixes both bugs; at least it lets me use my network after resume, and it also seems to prevent the code execution that would trigger the NMI.

Update: Still present in 2.6.32, and the proposed patch still fixes it (and http://bugzilla.kernel.org/show_bug.cgi?id=13987).

Yes, sorry...it (or rather its successor) didn't get pulled in time for 2.6.32...

Update: Still present in 2.6.32.2, and the patch still fixes it. Btw, 2.6.32.2 has the patch for b43 legacy, but not for current b43 :-) Seems to be integrated in 2.6.33-rc1, I will test it soon.

I missed the Cc for stable in the b43 patch, but included it in b43legacy. :) A note has been sent to GregKH and stable. The patch should be in 2.6.32.3 or .4.

Is it the patch from comment #25 or another one?

On 12/29/2009 10:24 AM, bugzilla-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=14538
>
> --- Comment #30 from Rafael J. Wysocki <rjw@sisk.pl> 2009-12-29 16:24:53 ---
> Is it the patch from comment #25 or another one?

It is actually the patch in mainline commit c2ff581acab16c6af56d9e8c1a579bf041ec00b1. The code does the same things, but was rearranged to make it clearer and some comments have been added.

Larry

OK, so I'm closing this as fixed in the mainline and please make sure it appears in -stable.