Bug 24402
Summary: | rtl8187 stop searching for network after suspend | ||
---|---|---|---|
Product: | Drivers | Reporter: | David Heidelberg (david) |
Component: | network-wireless | Assignee: | drivers_network-wireless (drivers_network-wireless) |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | CC: | david, herton, htl10, johannes, Larry.Finger, linville |
Priority: | P1 | ||
Hardware: | x86-64 | ||
OS: | Linux | ||
Kernel Version: | 2.6.37-rc4-git | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
dmesg
Proposed patch for driver rtl8187 part_of_new_dmesg Revised patch for rtl8187 dmesg_new Third version of patch for rtl8187 4th_dmesg Fourth patch version 5th_dmesg Avoid calling ieee80211_work_work when not needed |
Does ifconfig down, modprobe -r, modprobe, ifconfig up, or ifconfig down/up or modprobe -r/modprobe ? reset the device to a usable state? and or modprobe -r/modprobe mac80211 ? The problem is not *after suspend*, but *before* suspend: [ 1914.662031] WARNING: at net/mac80211/work.c:884 ieee80211_work_work+0x363/0x1240() [ 1914.662033] Hardware name: Satellite A210 [ 1914.662035] work scheduled while going to suspend [ 1914.662038] Pid: 3971, comm: kworker/u:1 Not tainted 2.6.37-rc4-00153-g59e57c6-dirty #5 [ 1914.662041] Call Trace: [ 1914.662046] [<ffffffff815ba9a3>] ? ieee80211_work_work+0x363/0x1240 ... [ 1914.662108] [<ffffffff815ada16>] ? ieee80211_unregister_hw+0x46/0x100 [ 1914.662112] [<ffffffff815e0fa7>] ? rtl8187_disconnect+0x30/0x6f [ 1914.662117] [<ffffffff813dd956>] ? usb_unbind_interface+0x56/0x150 ... Created attachment 39212 [details]
Proposed patch for driver rtl8187
Good catch Hin-Tak.
Please try this patch. I cannot test as my laptop does not suspend.
Patch tested, but it doesn't helped. Behaviour same, including warning. (In reply to comment #3) > Patch tested, but it doesn't helped. Behaviour same, including warning. A new dmesg? The oops should be at a different place. Also the problem *before* suspend seems to be related to an active scan - does it helps if you rfkill the device (i.e. slide the radio switch underneath the front of the keyboard) or ifconfig down it before suspend? Where is the new dmesg? when wifi rfkill is off while suspending, it work Created attachment 39232 [details]
part_of_new_dmesg
Created attachment 39292 [details]
Revised patch for rtl8187
Please try this patch. It appears that the work queue must be stopped.
Created attachment 39322 [details]
dmesg_new
After applying patch, still failing
Created attachment 39352 [details]
Third version of patch for rtl8187
This patch flushes the work queue.
Created attachment 39612 [details]
4th_dmesg
I'm afraid it didn't helped
I noticed something on the dmesg attached: ieee80211 phy0: hwaddr 00:c0:a8:ff:54:49, RTL8187BvE V0 + rtl8225z2, rfkill mask 2 In fact it is a B device, so why the patches don't work, because of the "if (!priv->is_rtl8187b)" check. It's strange the warning is happening now, I guess some code changed in mac80211 and may be triggering this, I don't know. I'll try to get access to an 8187b again too and try to reproduce it. okias, can you comment the if check that the last patch adds in rtl8187_dev.c and test again? The code on rtl8187_disconnect function should look like this after commenting: ... priv = dev->priv; /*if (!priv->is_rtl8187b) {*/ ieee80211_stop_queues(dev); flush_delayed_work(&priv->work); /*}*/ ... Created attachment 39762 [details]
Fourth patch version
The previous suggestion will certainly crash as the workqueue is not initialized for an RTL8187B.
We do, however, need to stop the ieee80211 queues for both kinds of devices. Try the newest version of the patch.
Created attachment 39852 [details]
5th_dmesg
Failed again
(In reply to comment #13) > The previous suggestion will certainly crash as the workqueue is not > initialized for an RTL8187B. Indeed, I forgot that priv->work is only used with rtl8187l But what bothers me is why this is happening now, we don't have any relevant changes in 2.6.36..2.6.37-rc4-git in rtl8187. Also this last patch also failed too. Looking at mac80211 code and rtl8187 code again, I think we have another problem, not in rtl8187. I don't see how rtl8187 is queuing work in this case which triggers the warning, I think it comes from somewhere else: __ieee80211_suspend sets local->suspended = true, but before that, it already calls ieee80211_stop_queues (in fact it calls ieee80211_stop_queues_by_reason(hw, IEEE80211_QUEUE_STOP_REASON_SUSPEND), which is the same as we calling ieee80211_stop_queues from the driver, only reason is different). Also before setting "local->suspended = true", __ieee80211_suspend also calls the stop callback in the driver: ... if (local->open_count) ieee80211_stop_device(local); ... In our stop callback, we cancel all delayed work in priv->work of rtl8187. Besides this, all work we schedule is the led work, but in rtl8187_disconnect, we call rtl8187_leds_exit before ieee80211_unregister_hw (which triggers the warning). Also, the last led command should come before "local->suspended = true", when led is set to off in ieee80211_stop_device, and workqueue is flushed there, so shouldn't cause issues. Anyway, if we were scheduling work after "local->suspended = true" in mac80211, we would be triggering the warning before when calling ieee80211_queue_delayed_work. Herton makes some really good points, which makes me wonder if the bug really is in rtl8187. You did not list this as a regression. Have you tested the standard kernel for your distribution? I guess it worked in 2.6.35 or 2.6.36, but not sure. I updating kernel from git, so I don't know for sure. I could try bisect between some work revision, but it can takes lot of hour rebooting and testing. I think this problem should happen since linux 2.6.34, after the commits: commit af6b637 "mac80211: generalise work handling" + commit b8bc4b0 "mac80211: support remain-on-channel command" The first one added the ieee80211_work_purge function calling it on ieee80211_stop, and the second is what would appear in a bisect, it's the one which should be causing the warning, because it adds the ieee80211_work_work on ieee80211_work_purge (ieee80211_work_work checks the local->suspended and does the warning). Because rtl8187 doesn't have suspend/resume hooks, usb core on suspend removes rtl8187 after mac80211 is suspended, calling rtl8187_disconnect. But on rtl8187_disconnect, we call ieee80211_unregister_hw, which calls dev_close, which calls ieee80211_stop, and in the end calls ieee80211_work_purge->ieee80211_work_work) In my mind there are two fixes we can do: - We add suspend/resume hooks on rtl8187, or - Do something on mac80211 to avoid situation above, to support drivers without suspend/resume hooks. Created attachment 39892 [details]
Avoid calling ieee80211_work_work when not needed
okias, please check this patch.
For example, zd1211rw is another case which could have this problem too, as rtl8187 it doesn't have suspend/resume hooks. So seems sensible to make a fix in mac80211.
Base problem fixed by last patch. After suspend NM wasn't able to work propertly, only after RFkill wifi and reload, it start work. But it's success :-) Ok, so really the warning was bogus at least (no work in work_list, as I would expect after mac80211 is suspended). And when there is really work queued on suspend, the warning will trigger. About you need to rfkill after suspend, may be network-manager is confused about removal of device by usb stack. Johannes, can you check if the patch on comment #19 for mac80211 is ok? If it is, I'll submit to linux-wireless. Yes, that seems reasonable. I guess the only change is actually killing the warning in this case. No, with that patch I'm able use wifi even after suspend. Without patch it doesn't work at all. (In reply to comment #23) > No, with that patch I'm able use wifi even after suspend. Without patch it > doesn't work at all. There is still the mis-understanding - the problem is not 'after suspend' (waking up), but "before suspend" (going to sleep).- the driver does not go to sleep properly, and therefore does not work later. As for network manager - it usually takes say, 30 seconds between waking up to NM re-establishing association so you may need to wait a little; but either way, there should be some log entries in syslog (/var/log/message or thereabouts) about NetworkManager detecting device state changes, etc. "wasn't able to work propertly" is just a bit vague. - the result of 'grep NetworkManager /var/log/messages' or 'grep -10 NetworkManager /var/log/messages' (include 10 lines before and after any NetworkManager entries) should be informative (but is really outside the scope of kernel bugzilla, really). Well, the patch really only removes the wrong warning, I don't expect it change behaviour (well, only difference is kernel is not 'warning tainted' anymore), as without it ieee80211_work_work runs but exit early anyway after the WARN, so even without the patch I would expect it not breaking anything. If even with the patch the warning was still being printed, then we would have something broken, but was not the case, just the warning was bogus. |
Created attachment 39122 [details] dmesg Laptop Toshiba A210-15j, 64bit gentoo. Suspend done by "hibernate-ram" Dmesg contain probably related warning. Bus 001 Device 002: ID 0bda:8197 Realtek Semiconductor Corp. RTL8187B Wireless Adapter