Kernel Bug Tracker – Bug 13223
compat-wireless: Frequent Probe Response timeouts and disassociations
Last modified: 2009-08-02 20:10:21 UTC
Current mainline with the ath9k from compat-wireless causes frequent disconnects due to probe response timeouts. Mainline sans the commit 47afbaf5af9454a7a1a64591e20cbfcc27ca67a8 was the last correctly working kernel version. Fix for 47afbaf5af9454a7a1a64591e20cbfcc27ca67a8 is known and expected to land in mainline shortly but the fix already exists in compat-wireless, testing of which uncovered this separate bug -
[ 340.966352] ADDRCONF(NETDEV_CHANGE): wlan1: link becomes ready
[ 351.852009] wlan1: no IPv6 routers present
[ 376.916017] wlan1: no probe response from AP 00:1e:e5:6a:2c:c2 - disassociating
[ 377.765211] wlan1: authenticate with AP 00:1e:e5:6a:2c:c2
[ 377.767276] wlan1: authenticated
[ 377.767281] wlan1: associate with AP 00:1e:e5:6a:2c:c2
[ 377.771343] wlan1: RX ReassocResp from 00:1e:e5:6a:2c:c2 (capab=0x411 status=0 aid=1)
[ 377.771346] wlan1: associated
[ 412.704016] wlan1: no probe response from AP 00:1e:e5:6a:2c:c2 - disassociating
[ 413.549010] wlan1: authenticate with AP 00:1e:e5:6a:2c:c2
[ 413.551662] wlan1: authenticated
[ 413.551667] wlan1: associate with AP 00:1e:e5:6a:2c:c2
[ 413.555733] wlan1: RX ReassocResp from 00:1e:e5:6a:2c:c2 (capab=0x411 status=0 aid=1)
[ 413.555736] wlan1: associated
[ 663.008267] wlan1: no probe response from AP 00:1e:e5:6a:2c:c2 - disassociating
[ 663.848934] wlan1: authenticate with AP 00:1e:e5:6a:2c:c2
[ 663.853677] wlan1: authenticate with AP 00:1e:e5:6a:2c:c2
[ 663.855729] wlan1: authenticated
[ 663.855731] wlan1: associate with AP 00:1e:e5:6a:2c:c2
[ 663.862015] wlan1: RX ReassocResp from 00:1e:e5:6a:2c:c2 (capab=0x411 status=0 aid=1)
[ 663.862020] wlan1: associated
[ 698.284016] wlan1: no probe response from AP 00:1e:e5:6a:2c:c2 - disassociating
01:04.0 Network controller: Atheros Communications Inc. AR5008 Wireless Network Adapter (rev 01)
Subsystem: D-Link System Inc Device 3a6d
Flags: bus master, 66MHz, medium devsel, latency 168, IRQ 16
Memory at f0200000 (32-bit, non-prefetchable) [size=64K]
Capabilities: <access denied>
Kernel driver in use: ath9k
Kernel modules: ath9k
Some other useful dmesg stuff -
[ 325.446143] ath9k 0000:01:04.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[ 325.874899] ath: Country alpha2 being used: CO
[ 325.874902] ath: Regpair detected: 0x10
[ 325.876676] phy0: Selected rate control algorithm 'ath9k_rate_control'
[ 325.877598] cfg80211: Calling CRDA for country: CO
[ 325.877629] Registered led device: ath9k-phy0::radio
[ 325.877644] Registered led device: ath9k-phy0::assoc
[ 325.877657] Registered led device: ath9k-phy0::tx
[ 325.877671] Registered led device: ath9k-phy0::rx
[ 325.877679] phy0: Atheros AR5416 MAC/BB Rev:2 AR2133 RF Rev:81: mem=0xffffc20010180000, irq=16
[ 325.879227] udev: renamed network interface wlan0 to wlan1
[ 325.880202] cfg80211: Regulatory domain changed to country: CO
[ 325.880204] (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
[ 325.880207] (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2700 mBm)
[ 325.880209] (5170000 KHz - 5250000 KHz @ 20000 KHz), (300 mBi, 1700 mBm)
[ 325.880211] (5250000 KHz - 5330000 KHz @ 20000 KHz), (300 mBi, 2300 mBm)
[ 325.880213] (5735000 KHz - 5835000 KHz @ 20000 KHz), (300 mBi, 3000 mBm)
Is that only compat? I thought previously you said it affected mainline (with txpower issue fixed) too (btw., instead of reverting the patch you can just run "iwconfig wlan0 txpower 30dBm")?
I'm also not sure how frequent this is ... can you see a pattern in there with more information maybe? I don't see a pattern so it could very well be a microwave oven or something, unfortunately.
1) Without reverting 47afbaf5af9454a7a1a64591e20cbfcc27ca67a8, mainline is affected by Probe Timeouts - i.e. no connection at all. In other words not by this issue
2) Mainline, after Reverting 47afbaf5af9454a7a1a64591e20cbfcc27ca67a8 works perfectly fine - no issues at all
3) Compat-Wireless which has a fix to 47afbaf5af9454a7a1a64591e20cbfcc27ca67a8 is affected by this bug which is Probe "Response" Timeouts - which translates into "connects for a few seconds and disassociates"
I will try mainline without reverting 47afbaf5af9454a7a1a64591e20cbfcc27ca67a8 but setting txpower to 30dBM and see. (It would only make sense to try if the fix for txpower still not in mainline?)
And since mainline without 47afbaf5af9454a7a1a64591e20cbfcc27ca67a8 works perfectly fine for the exact same setup, unless my microwave is hidden from sight and decides to only impact compat-wireless but forgive mainline, its not the microwave ;)
Oh and I forgot to add this bug is 100% reproducible - i.e. never works. So no question of pattern I suppose.
Ok, thanks. Yes, setting txpower would only apply to mainline without the fix or reverted patch, but it doesn't matter much. Was just suggesting that in case you wanted to avoid recompiles.
Good to know mainline works though, so we can focus on the newer code.
Well it's not about a pattern to "does it work", I was thinking of a pattern in the time stream where it disassociates. Sometimes it takes 30 seconds between one association and the next disassoc, sometimes it takes a couple of minutes, according to your log.
I would appreciate if you could test, instead of compat, the wireless-testing git tree: git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-testing.git
If you can reproduce the problem there, a bisect would be very useful, but then you need to remember to set txpower.
If you do that, commit ee8718c104baee36a8bfa56d8b8e61b9e8bc1da0 in the tree is identical to mainline -rc4 (except for the localversion-wireless file), and it would be extremely useful to have a bisect between that (you said mainline works) and the head of wireless-testing (which should not work, like compat).
With today's compat-wireless it lasted a bit longer - Today's mainline git (no reverts/customizations just plain main line) continues to be fine.
[ 772.243160] ADDRCONF(NETDEV_UP): wlan1: link is not ready
[ 783.021629] wlan1: authenticate with AP 00:1e:e5:6a:2c:c2
[ 783.023786] wlan1: authenticated
[ 783.023788] wlan1: associate with AP 00:1e:e5:6a:2c:c2
[ 783.027811] wlan1: RX AssocResp from 00:1e:e5:6a:2c:c2 (capab=0x411 status=0 aid=1)
[ 783.027815] wlan1: associated
[ 783.034656] ADDRCONF(NETDEV_CHANGE): wlan1: link becomes ready
[ 793.676259] wlan1: no IPv6 routers present
[ 1112.132021] wlan1: no probe response from AP 00:1e:e5:6a:2c:c2 - disassociating
[ 1112.973339] wlan1: authenticate with AP 00:1e:e5:6a:2c:c2
[ 1112.976121] wlan1: authenticated
[ 1112.976127] wlan1: associate with AP 00:1e:e5:6a:2c:c2
[ 1112.980190] wlan1: RX ReassocResp from 00:1e:e5:6a:2c:c2 (capab=0x411 status=0 aid=1)
[ 1112.980192] wlan1: associated
Rafael, are you sure you want to track a regression in experimental code? He just said "Today's mainline git (no reverts/customizations just plain main line) continues to be fine."
I just saw the same probe response timeouts in mainline albeit after long time - no kidding. I am pretty sure I never saw those with previous mainline (i.e. prior to 47afbaf5af9454a7a1a64591e20cbfcc27ca67a8 ).
I will do some more testing with current mainline for a day or two and see. But for now I am OK if this is not tracked as a regression in mainline. We can do that later if I see consistent problems. Thanks.
If you just occasionally see it that's nothing unexpected really. Your AP might be flaky or something might be causing enough disturbance in the spectrum that too many frames get lost, etc. Normally you wouldn't even notice if you aren't specifically looking for that in the logs.
Ok, as of now, there is no connectivity issue, so as long as that remains the case I will ignore the mainline timeouts and continue on testing wireless.git as you mentioned earlier.
I tried 2.6.30_rc5-git1 in hope this was fixed, Bug is still present.
I tried wireless-testing.git(bf2c6a38af6095ee1116278f240ccb9c65c6337d), Bug is still present. So I bisected it and found
576f8ba3cc2dcb8ee593548990bde4f8e4d3707e is first bad commit
Author: Kalle Valo <firstname.lastname@example.org>
Date: Tue Feb 10 17:09:31 2009 +0200
mac80211: use ps-poll when dynamic power save mode is disabled
When a directed tim bit is set, mac80211 currently disables power save
ands sends a null frame to the AP. But if dynamic power save is
disabled, mac80211 will not enable power save ever gain. Fix this by
adding ps-poll functionality to mac80211. When a directed tim bit is
set, mac80211 sends a ps-poll frame to the AP and checks for the more
data bit in the returned data frames.
Using ps-poll is slower than waking up with null frame, but it's saves more
power in cases where the traffic is low. Userspace can control if either
ps-poll or null wakeup method is used by enabling and disabling dynamic
Signed-off-by: Kalle Valo <email@example.com>
Signed-off-by: John W. Linville <firstname.lastname@example.org>
:040000 040000 9b208f02227687e2c918a49a3f2c123c022346a5 c69d5b217810a975923db9f58e91dd3239de0ae0 M net
02:00.0 Network controller: Atheros Communications Inc. AR5008 Wireless Network Adapter (rev 01)
Physical Slot: 2
Flags: bus master, fast devsel, latency 0, IRQ 17
Memory at 90100000 (64-bit, non-prefetchable) [size=64K]
Capabilities:  Power Management version 2
Capabilities:  MSI: Mask- 64bit- Count=1/1 Enable-
Capabilities:  Express Legacy Endpoint, MSI 00
Capabilities:  MSI-X: Enable- Mask- TabSize=1
Capabilities:  Advanced Error Reporting
Capabilities:  Virtual Channel <?>
Kernel driver in use: ath9k
I hope someone can fix this now :)
This is the same as comment #0 for me
Good example for how bisect will totally go wrong when you can't really 100% reproduce the issue...
This doesn't really seem to make sense as the problem.
Please revert the commit and confirm that doing so solves it... And don't just try once, wireless connections aren't always 100% reproducible.
So the mainline is not affected, correct? Closing, please reopen if this is not the case.