Bug 13223 - compat-wireless: Frequent Probe Response timeouts and disassociations
Summary: compat-wireless: Frequent Probe Response timeouts and disassociations
Status: CLOSED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: drivers_network-wireless@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks: 13070
  Show dependency tree
 
Reported: 2009-05-02 18:08 UTC by paragw
Modified: 2009-08-02 20:10 UTC (History)
6 users (show)

See Also:
Kernel Version: 2.6.30-rc3
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments

Description paragw 2009-05-02 18:08:10 UTC
Current mainline with the ath9k from compat-wireless causes frequent disconnects due to probe response timeouts. Mainline sans the commit 47afbaf5af9454a7a1a64591e20cbfcc27ca67a8 was the last correctly working kernel version. Fix for 47afbaf5af9454a7a1a64591e20cbfcc27ca67a8 is known and expected to land in mainline shortly but the fix already exists in compat-wireless, testing of which uncovered this separate bug -

[  340.966352] ADDRCONF(NETDEV_CHANGE): wlan1: link becomes ready                                                                                 
[  351.852009] wlan1: no IPv6 routers present                                                                                                     
[  376.916017] wlan1: no probe response from AP 00:1e:e5:6a:2c:c2 - disassociating                                                                
[  377.765211] wlan1: authenticate with AP 00:1e:e5:6a:2c:c2                                                                                      
[  377.767276] wlan1: authenticated                                                                                                               
[  377.767281] wlan1: associate with AP 00:1e:e5:6a:2c:c2                                                                                         
[  377.771343] wlan1: RX ReassocResp from 00:1e:e5:6a:2c:c2 (capab=0x411 status=0 aid=1)                                                          
[  377.771346] wlan1: associated                                                                                                                  
[  412.704016] wlan1: no probe response from AP 00:1e:e5:6a:2c:c2 - disassociating                                                                
[  413.549010] wlan1: authenticate with AP 00:1e:e5:6a:2c:c2                                                                                      
[  413.551662] wlan1: authenticated                                                                                                               
[  413.551667] wlan1: associate with AP 00:1e:e5:6a:2c:c2                                                                                         
[  413.555733] wlan1: RX ReassocResp from 00:1e:e5:6a:2c:c2 (capab=0x411 status=0 aid=1)                                                          
[  413.555736] wlan1: associated                                                                                                                  
[  663.008267] wlan1: no probe response from AP 00:1e:e5:6a:2c:c2 - disassociating                                                                
[  663.848934] wlan1: authenticate with AP 00:1e:e5:6a:2c:c2                                                                                      
[  663.853677] wlan1: authenticate with AP 00:1e:e5:6a:2c:c2                                                                                      
[  663.855729] wlan1: authenticated                                                                                                               
[  663.855731] wlan1: associate with AP 00:1e:e5:6a:2c:c2                                                                                         
[  663.862015] wlan1: RX ReassocResp from 00:1e:e5:6a:2c:c2 (capab=0x411 status=0 aid=1)                                                          
[  663.862020] wlan1: associated                                                                                                                  
[  698.284016] wlan1: no probe response from AP 00:1e:e5:6a:2c:c2 - disassociating
Comment 1 paragw 2009-05-02 18:12:26 UTC
lspci -v 
01:04.0 Network controller: Atheros Communications Inc. AR5008 Wireless Network Adapter (rev 01)
        Subsystem: D-Link System Inc Device 3a6d                                                
        Flags: bus master, 66MHz, medium devsel, latency 168, IRQ 16                            
        Memory at f0200000 (32-bit, non-prefetchable) [size=64K]                                
        Capabilities: <access denied>                                                           
        Kernel driver in use: ath9k                                                             
        Kernel modules: ath9k 

Some other useful dmesg stuff -

[  325.446143] ath9k 0000:01:04.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16                                                                     
[  325.874899] ath: Country alpha2 being used: CO                                                                                                 
[  325.874902] ath: Regpair detected: 0x10                                                                                                        
[  325.876676] phy0: Selected rate control algorithm 'ath9k_rate_control'                                                                         
[  325.877598] cfg80211: Calling CRDA for country: CO                                                                                             
[  325.877629] Registered led device: ath9k-phy0::radio                                                                                           
[  325.877644] Registered led device: ath9k-phy0::assoc                                                                                           
[  325.877657] Registered led device: ath9k-phy0::tx                                                                                              
[  325.877671] Registered led device: ath9k-phy0::rx                                                                                              
[  325.877679] phy0: Atheros AR5416 MAC/BB Rev:2 AR2133 RF Rev:81: mem=0xffffc20010180000, irq=16                                                 
[  325.879227] udev: renamed network interface wlan0 to wlan1                                                                                     
[  325.880202] cfg80211: Regulatory domain changed to country: CO                                                                                 
[  325.880204]  (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)                                                                 
[  325.880207]  (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2700 mBm)                                                                      
[  325.880209]  (5170000 KHz - 5250000 KHz @ 20000 KHz), (300 mBi, 1700 mBm)                                                                      
[  325.880211]  (5250000 KHz - 5330000 KHz @ 20000 KHz), (300 mBi, 2300 mBm)                                                                      
[  325.880213]  (5735000 KHz - 5835000 KHz @ 20000 KHz), (300 mBi, 3000 mBm)
Comment 2 Johannes Berg 2009-05-04 16:50:09 UTC
Is that only compat? I thought previously you said it affected mainline (with txpower issue fixed) too (btw., instead of reverting the patch you can just run "iwconfig wlan0 txpower 30dBm")?

I'm also not sure how frequent this is ... can you see a pattern in there with more information maybe? I don't see a pattern so it could very well be a microwave oven or something, unfortunately.
Comment 3 paragw 2009-05-04 21:49:59 UTC
1) Without reverting 47afbaf5af9454a7a1a64591e20cbfcc27ca67a8, mainline is affected by Probe Timeouts - i.e. no connection at all. In other words not by this issue
2) Mainline, after Reverting 47afbaf5af9454a7a1a64591e20cbfcc27ca67a8 works perfectly fine - no issues at all
3) Compat-Wireless which has a fix to 47afbaf5af9454a7a1a64591e20cbfcc27ca67a8 is affected by this bug which is Probe "Response" Timeouts - which translates into "connects for a few seconds and disassociates"

I will try mainline without reverting 47afbaf5af9454a7a1a64591e20cbfcc27ca67a8  but setting txpower to 30dBM and see. (It would only make sense to try if the fix for txpower still not in mainline?)

And since mainline without 47afbaf5af9454a7a1a64591e20cbfcc27ca67a8  works perfectly fine for the exact same setup, unless my microwave is hidden from sight and decides to only impact compat-wireless but forgive mainline, its not the microwave ;)
Comment 4 paragw 2009-05-04 21:52:00 UTC
Oh and I forgot to add this bug is 100% reproducible - i.e. never works. So no question of pattern I suppose.
Comment 5 Johannes Berg 2009-05-04 22:03:54 UTC
Ok, thanks. Yes, setting txpower would only apply to mainline without the fix or reverted patch, but it doesn't matter much. Was just suggesting that in case you wanted to avoid recompiles.

Good to know mainline works though, so we can focus on the newer code.

Well it's not about a pattern to "does it work", I was thinking of a pattern in the time stream where it disassociates. Sometimes it takes 30 seconds between one association and the next disassoc, sometimes it takes a couple of minutes, according to your log.
Comment 6 Johannes Berg 2009-05-04 22:24:03 UTC
I would appreciate if you could test, instead of compat, the wireless-testing git tree: git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-testing.git

If you can reproduce the problem there, a bisect would be very useful, but then you need to remember to set txpower.

If you do that, commit ee8718c104baee36a8bfa56d8b8e61b9e8bc1da0 in the tree is identical to mainline -rc4 (except for the localversion-wireless file), and it would be extremely useful to have a bisect between that (you said mainline works) and the head of wireless-testing (which should not work, like compat).
Comment 7 paragw 2009-05-04 22:40:10 UTC
With today's compat-wireless it lasted a bit longer - Today's mainline git (no reverts/customizations just plain main line) continues to be fine.

[  772.243160] ADDRCONF(NETDEV_UP): wlan1: link is not ready
[  783.021629] wlan1: authenticate with AP 00:1e:e5:6a:2c:c2
[  783.023786] wlan1: authenticated
[  783.023788] wlan1: associate with AP 00:1e:e5:6a:2c:c2
[  783.027811] wlan1: RX AssocResp from 00:1e:e5:6a:2c:c2 (capab=0x411 status=0 aid=1)
[  783.027815] wlan1: associated
[  783.034656] ADDRCONF(NETDEV_CHANGE): wlan1: link becomes ready
[  793.676259] wlan1: no IPv6 routers present
[ 1112.132021] wlan1: no probe response from AP 00:1e:e5:6a:2c:c2 - disassociating
[ 1112.973339] wlan1: authenticate with AP 00:1e:e5:6a:2c:c2
[ 1112.976121] wlan1: authenticated
[ 1112.976127] wlan1: associate with AP 00:1e:e5:6a:2c:c2
[ 1112.980190] wlan1: RX ReassocResp from 00:1e:e5:6a:2c:c2 (capab=0x411 status=0 aid=1)
[ 1112.980192] wlan1: associated
Comment 8 Johannes Berg 2009-05-05 21:56:17 UTC
Rafael, are you sure you want to track a regression in experimental code? He just said "Today's mainline git (no reverts/customizations just plain main line) continues to be fine."
Comment 9 paragw 2009-05-05 23:03:56 UTC
I just saw the same probe response timeouts in mainline albeit after long time - no kidding. I am pretty sure I never saw those with previous mainline (i.e. prior to 47afbaf5af9454a7a1a64591e20cbfcc27ca67a8 ). 

I will do some more testing with current mainline for a day or two and see. But for now I am OK if this is not tracked as a regression in mainline. We can do that later if I see consistent problems. Thanks.
Comment 10 Johannes Berg 2009-05-05 23:13:01 UTC
If you just occasionally see it that's nothing unexpected really. Your AP might be flaky or something might be causing enough disturbance in the spectrum that too many frames get lost, etc. Normally you wouldn't even notice if you aren't specifically looking for that in the logs.
Comment 11 paragw 2009-05-06 00:28:59 UTC
Ok, as of now, there is no connectivity issue, so as long as that remains the case I will ignore the mainline timeouts and continue on testing wireless.git as you mentioned earlier.
Comment 12 Bernd Buschinski 2009-05-12 19:46:23 UTC
I tried 2.6.30_rc5-git1 in hope this was fixed, Bug is still present.
I tried wireless-testing.git(bf2c6a38af6095ee1116278f240ccb9c65c6337d), Bug is still present. So I bisected it and found


576f8ba3cc2dcb8ee593548990bde4f8e4d3707e is first bad commit
commit 576f8ba3cc2dcb8ee593548990bde4f8e4d3707e
Author: Kalle Valo <kalle.valo@nokia.com>
Date:   Tue Feb 10 17:09:31 2009 +0200

    mac80211: use ps-poll when dynamic power save mode is disabled

    When a directed tim bit is set, mac80211 currently disables power save
    ands sends a null frame to the AP. But if dynamic power save is
    disabled, mac80211 will not enable power save ever gain. Fix this by
    adding ps-poll functionality to mac80211. When a directed tim bit is
    set, mac80211 sends a ps-poll frame to the AP and checks for the more
    data bit in the returned data frames.

    Using ps-poll is slower than waking up with null frame, but it's saves more
    power in cases where the traffic is low. Userspace can control if either
    ps-poll or null wakeup method is used by enabling and disabling dynamic
    power save.

    Signed-off-by: Kalle Valo <kalle.valo@nokia.com>
    Signed-off-by: John W. Linville <linville@tuxdriver.com>

:040000 040000 9b208f02227687e2c918a49a3f2c123c022346a5 c69d5b217810a975923db9f58e91dd3239de0ae0 M      net


02:00.0 Network controller: Atheros Communications Inc. AR5008 Wireless Network Adapter (rev 01)
        Physical Slot: 2
        Flags: bus master, fast devsel, latency 0, IRQ 17
        Memory at 90100000 (64-bit, non-prefetchable) [size=64K]
        Capabilities: [40] Power Management version 2
        Capabilities: [50] MSI: Mask- 64bit- Count=1/1 Enable-
        Capabilities: [60] Express Legacy Endpoint, MSI 00
        Capabilities: [90] MSI-X: Enable- Mask- TabSize=1
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Virtual Channel <?>
        Kernel driver in use: ath9k


I hope someone can fix this now :)
This is the same as comment #0 for me
Comment 13 Johannes Berg 2009-05-12 20:19:42 UTC
Good example for how bisect will totally go wrong when you can't really 100% reproduce the issue...

http://article.gmane.org/gmane.linux.network/122327

This doesn't really seem to make sense as the problem.

Please revert the commit and confirm that doing so solves it... And don't just try once, wireless connections aren't always 100% reproducible.
Comment 14 Rafael J. Wysocki 2009-05-13 11:02:46 UTC
So the mainline is not affected, correct?  Closing, please reopen if this is not the case.

Note You need to log in before you can comment on or make changes to this bug.