Bug 13807

Summary: ath9k: Signal strength and stability dropped by 20-30% starting from 2.6.30
Product: Drivers
Reporter: hayarms
Component: network-wireless
Assignee: Luis Chamberlain (mcgrof)
Status: RESOLVED CODE_FIX
Severity: normal
CC: adam, adriano.lols, andrej, ath9k-devel, bugzilla, frank.dekervel, hayarms, hobbes1069, hugh, kernel, kristoffer.ericson, linville, matt, mcgrof, michael_zanetti, mike, nenad, netllama, paulo.albuquerque, senthilkumar, sujith, tom, vaibhav_19842002, zdevai
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.30 and up
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments: compat-wireless-22-07-2009 ooops
             WiFi hardware details

Description hayarms 2009-07-21 09:57:52 UTC
Hi, I'm using the ath9k driver on my Asus EEEPC 1000HE for my wireless connection, and up to kernel 2.6.29 everything was great (very stable connection and good signal strength).

When I upgraded to kernel 2.6.30, the signal strength dropped a lot. I get constant disconnections, or no signal at all, in places where I had a stable connection with 2.6.29. This behaviour continued with the 2.6.30.1 kernel, and I also tried 2.6.31-rc3-git4 yesterday without success (I guess 2.6.30.2 doesn't solve the problem either, because its changelog doesn't seem to contain ath9k fixes).

I also tried applying this patch: http://bugzilla.kernel.org/attachment.cgi?id=21958

which I found in this bug report that seemed similar: http://bugzilla.kernel.org/show_bug.cgi?id=13537

but without success.

There's also a discussion on the Arch Linux forum about this issue: http://bbs.archlinux.org/viewtopic.php?pid=574759

Under 2.6.30 I don't see any error message in dmesg.

This is the model of my card:

01:00.0 Network controller: Atheros Communications Inc. AR928X Wireless Network Adapter (PCI-Express) (rev 01)

but it seems that other cards may also suffer from the problem, as described in the Arch forum thread.
Comment 1 Senthil Balasubramanian 2009-07-22 12:47:24 UTC
Please try with the latest wireless-testing and let me know if the problem still occurs.
Comment 2 hayarms 2009-07-22 21:10:05 UTC
Created attachment 22453 [details]
compat-wireless-22-07-2009 ooops

The latest compat-wireless oopses on kernel 2.6.30.2 at boot
Comment 3 Luis Chamberlain 2009-07-22 21:13:30 UTC
Use wireless-testing directly then:

http://wireless.kernel.org/en/developers/Documentation/git-guide

Not sure what the issue with compat-wireless on 2.6.30 may be; we can deal with that separately though.
Comment 4 Matt Behrens 2009-08-18 21:16:50 UTC
I believe I'm seeing this bug on my own ath9k (AR9280, as dmesg says) on my Asus 1000HEB.

I tried compat-wireless as of today on 2.6.31-6 (which is based on rc6, I think) and saw no change in performance (pretty abysmal transfer rates), though I did not see any more disassociations.

Ubuntu bug: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/414560
Comment 5 Michael Cronenworth 2009-09-07 09:29:23 UTC
+1 me too. Fedora just went from 2.6.29 to 2.6.30 and immediately I was having problems reconnecting to my WLAN. Turning off power management helps maintain a stable connection but I still get some packet loss from LAN to LAN pings. As soon as I turn on power management again I get dropped from the AP.

lspci
03:00.0 Network controller: Atheros Communications Inc. AR928X Wireless Network Adapter (PCI-Express) (rev 01)

dmesg
phy0: Atheros AR9280 MAC/BB Rev:2 AR5133 RF Rev:d0: mem=0xffffc20011160000, irq=17

uname
2.6.30.5-43.fc11.x86_64

I believe this[1] commit is the culprit, but I don't have a kernel git clone ready or built to test my theory.

[1] http://git.kernel.org/linus/3cbb5dd73697b3f1c677daffe29f00ace22b71e9
Comment 6 Frank Dekervel 2009-09-09 13:35:56 UTC
Hi, the above comments describe the problems I have. I run yesterday's wireless-testing (via compat-wireless on Ubuntu kernel 2.6.31-9), and I still have high packet loss.

I already tried two things: disabling the rfkill poll in main.c, as suggested by somebody on ath9k-devel, and disabling DEFAULT_PS in the kernel config.

Neither made any difference for me. When I run ping -f [my router] I can clearly see periods of good connection (e.g. all pings are answered quickly) interrupted by periods of no connection at all (e.g. I see dots appearing very quickly). For me it looks like about 5 "good" seconds followed by 1 or 2 "bad" seconds.
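With ping -f, ping prints a dot for every echo request and a backspace for every reply, so a fast-growing row of dots means replies have stopped. To put numbers on the "about 5 good seconds, then 1 or 2 bad seconds" pattern, one could count replies per second and measure the bad stretches. A minimal sketch, assuming the per-second reply counts have already been extracted from a log; the struct, the function name and the min_ok threshold are hypothetical.

```c
#include <stddef.h>

/* Summary of "bad" seconds in a series of per-second reply counts. */
struct outage_stats {
    size_t bad_runs;    /* number of separate stretches of bad seconds */
    size_t longest_bad; /* longest stretch, in seconds */
    size_t bad_seconds; /* total number of bad seconds */
};

/* Hypothetical helper: replies[i] is the number of echo replies received
 * during second i. A second is "bad" when fewer than min_ok replies
 * arrived; min_ok is an arbitrary threshold chosen by the user. */
struct outage_stats summarize_outages(const unsigned *replies, size_t n,
                                      unsigned min_ok)
{
    struct outage_stats s = {0, 0, 0};
    size_t run = 0; /* length of the bad stretch currently in progress */

    for (size_t i = 0; i < n; i++) {
        if (replies[i] < min_ok) {
            if (run == 0)
                s.bad_runs++; /* a new bad stretch starts here */
            run++;
            s.bad_seconds++;
            if (run > s.longest_bad)
                s.longest_bad = run;
        } else {
            run = 0; /* a good second ends any bad stretch */
        }
    }
    return s;
}
```

Comparing bad_runs and longest_bad before and after a change (a new kernel, power save off, the rfkill patch) gives something firmer than an impression of "fewer freezes".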

BTW, I also had the disconnections mentioned in the original report, but that doesn't seem to happen anymore.
Comment 7 Frank Dekervel 2009-09-10 12:31:32 UTC
Hi, with wireless-testing from 2009-09-09, rfkill polling disabled and DEFAULT_PS disabled, I cannot reproduce the problem anymore. Wireless has been working very well for over a day now.
Comment 8 Adriano Moura 2009-09-17 05:23:33 UTC
Could you provide some information on how to disable rfkill polling and DEFAULT_PS? I've tried editing the ./ath9k/main.c file, but no luck...
And when you say to disable DEFAULT_PS in the kernel config, do you mean compiling it from scratch with this option disabled?
Comment 9 Frank Dekervel 2009-09-17 06:13:49 UTC
Hi,

I used compat-wireless-2.6 and changed the following:

kervel@anthe:~/src/git/compat-wireless-2.6$ git diff master..kervel
diff --git a/config.mk b/config.mk
index b482b68..b8ebc0c 100644
--- a/config.mk
+++ b/config.mk
@@ -107,8 +107,8 @@ CONFIG_MAC80211_LEDS=y
 CONFIG_MAC80211_MESH=y

 CONFIG_CFG80211=m
-CONFIG_CFG80211_DEFAULT_PS=y
-CONFIG_CFG80211_DEFAULT_PS_VALUE=1
+CONFIG_CFG80211_DEFAULT_PS=n
+CONFIG_CFG80211_DEFAULT_PS_VALUE=0
 # CONFIG_CFG80211_REG_DEBUG=y

 CONFIG_LIB80211=m
@@ -129,7 +129,7 @@ CONFIG_ATH5K=m
 # CONFIG_ATH5K_DEBUG=y
 CONFIG_ATH5K_RFKILL=y
 CONFIG_ATH9K=m
-# CONFIG_ATH9K_DEBUG=y
+CONFIG_ATH9K_DEBUG=y

I also changed ath9k's main.c:

diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c
index 3dc7b5a..536d10a 100644
--- a/drivers/net/wireless/ath/ath9k/main.c
+++ b/drivers/net/wireless/ath/ath9k/main.c
@@ -1227,10 +1227,10 @@ static void ath9k_rfkill_poll_state(struct ieee80211_hw *hw)

        wiphy_rfkill_set_hw_state(hw->wiphy, blocked);

-       if (blocked)
+       /*if (blocked)
                ath_radio_disable(sc);
        else
-               ath_radio_enable(sc);
+               ath_radio_enable(sc);*/
 }
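The main.c change above can be modelled in userspace C to show what it does and does not change. This is a simplified, hypothetical sketch, not driver code: model_rfkill_poll() stands in for ath9k_rfkill_poll_state(), and the assignments stand in for wiphy_rfkill_set_hw_state(), ath_radio_disable() and ath_radio_enable(). It only mirrors the control flow visible in the diff; whether the real enable/disable calls are cheap when the state has not changed is not addressed here.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the rfkill poll; not ath9k code. */
struct model_dev {
    bool reported_blocked; /* last state passed to the wireless core */
    bool radio_on;         /* radio power state in the model */
    unsigned radio_calls;  /* how often the poll called enable/disable */
};

/* 'patched' selects the behaviour after the if/else is commented out. */
void model_rfkill_poll(struct model_dev *d, bool blocked, bool patched)
{
    /* Stand-in for wiphy_rfkill_set_hw_state(): done in both versions. */
    d->reported_blocked = blocked;

    /* With the patch, the radio is left alone from here on. */
    if (patched)
        return;

    /* Unpatched: one of these runs on every poll, even if nothing changed. */
    if (blocked)
        d->radio_on = false; /* stand-in for ath_radio_disable() */
    else
        d->radio_on = true;  /* stand-in for ath_radio_enable() */
    d->radio_calls++;
}
```

In the model, the patched poll still reports a hardware kill switch to the wireless core, but that path no longer powers the radio down or back up.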

For Ubuntu, I've set up a PPA here:
https://launchpad.net/~frank-dekervel/+archive/ppa

If you don't use Ubuntu, you can avoid rebuilding the kernel by using compat-wireless-2.6.
Comment 10 vaibhav 2009-09-26 11:11:48 UTC
I too have frequent disconnects with ath9k in 2.6.31, while 2.6.30 did not have such problems. In 2.6.31, despite good signal strength, the bitrate keeps varying.

My logs are filled with:
wlan0: no probe response from AP <mac> - disassociating

With ath9k debugging compiled in, I got this extra info:

ath9k: DMA failed to stop in 10 ms AR_CR=0x00000024 AR_DIAG_SW=0x40000020 

Frank's patch:

First, I disabled only power save mode; it didn't improve anything.
The second time, I applied both parts of the patch (power save + rfkill), and it's working rock solid now.

Thank you Frank, it was a real nightmare to use wireless without your patch.
Comment 11 John W. Linville 2009-11-19 15:02:21 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=538792
Comment 12 Andrej Podzimek 2009-11-21 21:17:10 UTC
Unfortunately, the patches didn't help at all. Network freezes are so frequent that Atheros cards became unusable on Linux.

As already mentioned in other threads, I think this is closely related to rate control malfunction. But it's just my stupid theory... ;-) There are two important things to note:

1) Most (if not all) freezes are related to movement around the access point (people walking by) or near the network card's antenna. Simply waving your hand around the antenna provokes a freeze.

2) When I use the old iwconfig to set the card's rate, something surprising happens: It takes about five seconds or more before the output from iwconfig reflects the rate change! When invoked sooner, iwconfig shows the previous rate.

(Yes, I know, iwconfig uses a deprecated API, doesn't show 802.11n rates and so forth...)

Manual rate changes on Intel adapters (both old ones based on ipw2200 and new ones based on the mac80211 framework) occur immediately (from a human's point of view).

If rate changes are so terribly slow (for whatever reason), couldn't this cause all the freezes? When signal quality drops rapidly, possibly due to movement around the antenna, the transfer rate must be decreased ASAP. When the rate change takes too long, the retry limit is reached and packets get lost.
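The hypothesis above can be illustrated with a toy model: the usable rate drops suddenly, but the sender only adopts the new rate after a fixed lag, and every frame sent above the usable rate is counted as lost (its retries assumed exhausted). This is a hypothetical sketch of the reasoning only, not how minstrel, PID or ath9k's own rate control actually work; the rates, lag and frame counts are made up.

```c
#include <stddef.h>

/* usable[t] is the highest rate (Mbit/s) the channel supports in slot t.
 * The sender only learns about it 'lag' slots later, so in slot t it
 * transmits at usable[t - lag] (or at usable[0] before that).
 * Frames sent faster than the channel supports are counted as lost,
 * i.e. the retry limit is assumed to be reached for them. */
size_t lost_frames(const unsigned *usable, size_t slots, size_t lag,
                   size_t frames_per_slot)
{
    size_t lost = 0;

    for (size_t t = 0; t < slots; t++) {
        unsigned chosen = (t >= lag) ? usable[t - lag] : usable[0];
        if (chosen > usable[t])
            lost += frames_per_slot;
    }
    return lost;
}
```

With a drop from 54 to 11 Mbit/s lasting four slots, a lag of 2 slots loses two slots' worth of frames and a lag of 4 loses four. The loss grows with the lag, which is the point of the hypothesis; the sketch says nothing about whether the slow iwconfig update reflects a slow rate change in the hardware.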
Comment 13 Kristoffer Ericson 2009-11-22 15:25:48 UTC
Have you tried this?

> So that just means the driver is broken. Please run
> 
>       iwconfig wlan0 power off
> 
> to fix it then. And the patch below will disable it by default since
> it's broken.
> 
> johannes
> 
> --- wireless-testing.orig/drivers/net/wireless/ath/ath9k/main.c      2009-11-22 11:44:41.000000000 +0100
> +++ wireless-testing/drivers/net/wireless/ath/ath9k/main.c    2009-11-22 11:45:26.000000000 +0100
> @@ -1893,6 +1893,8 @@ void ath_set_hw_capab(struct ath_softc *
>               BIT(NL80211_IFTYPE_ADHOC) |
>               BIT(NL80211_IFTYPE_MESH_POINT);
>  
> +     hw->wiphy->ps_default = false;
> +
>       hw->queues = 4;
>       hw->max_rates = 4;
>       hw->channel_change_time = 5000;
> 
>
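One way to see how this one-line patch relates to 'iwconfig wlan0 power off' (the question Lonni raises in comment 17) is a simplified model of where the power-save setting comes from. The sketch assumes that the wireless core initialises ps_default from CONFIG_CFG80211_DEFAULT_PS_VALUE, that the driver may override it, and that the value is copied to the interface when it is set up, after which iwconfig can change it at runtime. All types and functions are hypothetical; this is not the cfg80211/mac80211 source.

```c
#include <stdbool.h>

/* Hypothetical, simplified model; not cfg80211/mac80211 source. */
#define MODEL_DEFAULT_PS_VALUE 1 /* like CONFIG_CFG80211_DEFAULT_PS_VALUE=1 */

struct model_wiphy { bool ps_default; };
struct model_iface { bool ps; };

/* Core allocates the wiphy with the compile-time default. */
void model_wiphy_new(struct model_wiphy *w)
{
    w->ps_default = MODEL_DEFAULT_PS_VALUE;
}

/* Driver setup: override_off models "hw->wiphy->ps_default = false;". */
void model_driver_setup(struct model_wiphy *w, bool override_off)
{
    if (override_off)
        w->ps_default = false;
}

/* Interface setup copies the default into the per-interface state. */
void model_iface_up(struct model_iface *i, const struct model_wiphy *w)
{
    i->ps = w->ps_default;
}

/* Runtime change, as done by 'iwconfig wlan0 power off'. */
void model_set_power(struct model_iface *i, bool on)
{
    i->ps = on;
}
```

In this model both routes end in the same per-interface state, which matches the reading in comment 17; the practical difference is only that the patch takes effect without a boot script.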
Comment 14 Lonni J Friedman 2009-11-22 22:28:13 UTC
I too am afflicted by this bug (even when disabling power management with iwconfig). Unfortunately, the patches that Frank posted 2 months ago no longer apply cleanly to the latest compat-wireless snapshot. I attempted to apply them manually, but the code inside main.c isn't similar enough for me to be certain where to make the changes.
Comment 15 Paulo Albuquerque 2009-11-25 09:45:26 UTC
The solution proposed here is only a workaround. In bug 14267, Rafael J. Wysocki did a bisect and may have found the defect. A fix that involves reverting a commit is proposed there, and it seems to work (=
Comment 16 John W. Linville 2009-11-25 14:13:19 UTC
Not in linux-2.6 yet, but...

commit 54ab040d24904d1fa2c0a6a27936b7c56a4efb24
Author: John W. Linville <linville@tuxdriver.com>
Date:   Mon Nov 23 16:15:19 2009 -0500

    ath9k: set ps_default as false
    
    Copied from original one-line patch here:
    
        http://bugzilla.kernel.org/show_bug.cgi?id=14267#c26
    
    Signed-off-by: John W. Linville <linville@tuxdriver.com>
Comment 17 Lonni J Friedman 2009-11-25 15:20:04 UTC
I'm a bit confused now. From my admittedly naive understanding of the one-line patch, all it's doing is setting the default power save mode to false inside the driver (rather than true). Isn't that equivalent to running 'iwconfig wlan0 power off', or does it have a more dramatic effect than what the iwconfig command accomplishes?

My concern is that I've already been manually disabling power mgmt with the iwconfig command (at boot), and it doesn't eliminate the problem.  I'm not sure if it makes it less likely, as I could never find a reliable means of triggering the problem, and I've only had this hardware for less than a week.

Thanks for putting up with my questions. I just don't want to pin my hopes on this as a fix if I'm potentially hitting a different/unrelated problem.
Comment 18 Michael Cronenworth 2009-11-25 20:39:13 UTC
(In reply to comment #17)
> I'm a bit confused now.  From my admittedly naive understanding of the one
> line patch, all its doing is making the default power save mode as false
> inside the driver (rather than true).  Isn't that equivalent to running
> 'iwconfig wlan0 power off' or does it have a more dramatic effect that
> would be different than what the iwconfig command accomplishes?

That is my understanding, too. This change only works around part of the problem. I don't see how disabling power management is a permanent solution.

> 
> My concern is that I've already been manually disabling power mgmt with the
> iwconfig command (at boot), and it doesn't eliminate the problem.  I'm not
> sure if it makes it less likely, as I could never find a reliable means of
> triggering the problem, and I've only had this hardware for less than a week.
> 

Yes, the problem is not solved completely. Signal strengths are still reported erratically: if I'm 1 foot away from the AP I see ~60% signal, and if I'm ~30 ft away and behind two walls I also see 60% signal. Signal used to be reported properly and correctly (IMHO) before 2.6.30.

Also, disabling power management allows one to have a stable connection at idle, but if I start a hulu.com video or a BitTorrent download, I have seen random disconnects, and the wireless device is rendered useless (cannot connect or scan) when this happens. I do not get any output (bad or good) from the kernel or anything else when the device becomes munged.

Please do not close this issue by simply defaulting power saving off. It is much deeper than that.
Comment 19 Paulo Albuquerque 2009-11-25 21:28:26 UTC
I also agree that the work around is not a satisfactory solution. Please have a look at http://bugzilla.kernel.org/show_bug.cgi?id=14267#c26 where I think a more suitable solution is suggested. Quoting Rafael J. Wysocki:

> > I did a bisect yesterday on this, and the results
> > seemed to have worked over here by reverting:
> > 
> > 75e6c3b72b3ab01c47629f3fbd0fed4e6550bf3a
> > cfg80211: lower dynamic PS timeout to 100ms
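The reverted commit changes a single number, the dynamic power save timeout. In dynamic power save, a station stays awake for that long after the last frame and then dozes, so a shorter timeout turns more gaps in normal traffic into doze/wake cycles. The sketch below counts such cycles for a list of frame timestamps. It is a deliberately crude model (no beacon interval, no buffering at the AP), and the function name and example timestamps are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: count how many gaps between consecutive frames
 * (timestamps in ms, sorted ascending) are longer than timeout_ms. In a
 * simple dynamic power save model, each such gap lets the station doze
 * and then wake up again for the next frame. */
size_t doze_cycles(const unsigned long *t_ms, size_t n, unsigned long timeout_ms)
{
    size_t cycles = 0;

    for (size_t i = 1; i < n; i++) {
        if (t_ms[i] - t_ms[i - 1] > timeout_ms)
            cycles++;
    }
    return cycles;
}
```

For frames at 0, 50, 180, 300, 800 and 850 ms, a 100 ms timeout yields three doze/wake cycles where 200 ms yields one. If the driver or hardware handled those transitions badly, a lower timeout would expose the problem more often; that would fit the bisect result, but the sketch does not show it is the cause.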
Comment 20 Luis Chamberlain 2009-11-25 22:05:22 UTC
Are you guys using wireless-testing? Please try it.

http://wireless.kernel.org/en/developers/Documentation/git-guide

"Link quality" report issues won't be fixed in stable releases of the kernel. If you are not using bleeding edge, it's pointless to work with you on this report.

You will also get some other enhancements and fixes which sometimes cannot be backported and might also just not meet "stable" fix requirements.
Comment 21 Lonni J Friedman 2009-11-25 22:21:48 UTC
Luis,
I've not been using wireless-testing, as you're the first to suggest it.  After reading over the URL that you noted, I think what I need to do is: 

git clone git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-testing.git
cd wireless-testing
./scripts/driver-select ath9k
make
make install

And then unload and reload the driver(s) ?

Please confirm that's what you're requesting, and I'll give it a try. Granted, it's going to be hard to know for sure if bleeding edge fixes this, since I can't trigger the failure in any intentional way.
Comment 22 Luis Chamberlain 2009-11-25 22:28:26 UTC
No, wireless-testing is a full kernel git tree, just like Linus' tree that you would clone. It consists of 2.6.32-rc8 + all the queued-up patches for the wireless subsystem for 2.6.33.

This is bleeding edge, so chances are that if you have *any* issue it can get fixed here more easily and quickly than waiting for a fix to propagate down to the old stable kernel trees. Not that fixes should not go to stable trees; they should, but the type of fixes that go to stable trees do not involve things like "link quality" fixes, as those are purely minor. Even a fix for some sort of connection problem may not be propagated to stable kernels, as some patches may be too big for a stable kernel. Please read this for more information:

http://wireless.kernel.org/en/users/Documentation/Fix_Propagation

Also, when you have time, go over all of this documentation:

http://wireless.kernel.org/en/users/Documentation/

Now, you need to compile an entire kernel, so no, you cannot just select ath9k and compile it. We do have a backport package that lets you do that, though, so you wouldn't have to compile your entire kernel. For info on that see:

http://wireless.kernel.org/en/users/Download

For stable release see:

http://wireless.kernel.org/en/users/Download/stable

What card do you have again?
Comment 23 Luis Chamberlain 2009-11-25 22:28:50 UTC
Oh OK, AR9280 it seems.
Comment 24 Luis Chamberlain 2009-12-01 07:57:00 UTC
Can someone please close this bug report? Patches for signal quality were provided a while ago for newer kernels; if there are specific issues with newer kernels, please report them.

If it is unclear why I am asking you to test later kernels, please see:

http://wireless.kernel.org/en/users/Documentation/Fix_Propagation
Comment 25 Richard 2009-12-05 15:04:58 UTC
I tried the 12/1/2009 compat-wireless on Fedora 12 (2.6.31.6-145.fc12.x86_64) and the signal quality was not improved. It ranges from 47-72% and the AP is right across the room (clear air).

That being said, subjectively I seem to get fewer disassociations, although they still occur.

Prior to installing the compat-wireless ath9k driver, I could not complete a /home backup with BackupPC over wireless after many attempts. Afterwards I can; however, as I mentioned, I still get disassociations.

Also strange: I get a better signal from a couple of my neighbors' APs, even though we're not in a crowded area (in fact we're on 3/4 acre lots). Could it be that, with the current drivers, I get a better signal from an 802.11b AP than from an 802.11g one? I tried using "iwlist wlan0 scanning", but it did not show what type of APs they were.

I did make sure that depmod.conf is loading updates first, but is there a way to verify that the new driver is the one being loaded?
Comment 26 Luis Chamberlain 2009-12-29 20:25:01 UTC
modprobe -l ath9k

It should list where in the filesystem (or, with newer modutils, the path under /lib/modules/$(uname -r)/) the module is being picked up from.
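The same check can also be made against modules.dep, the dependency file generated by depmod that modprobe uses to locate modules. The sketch below assumes the usual "path/name.ko: dependencies..." line format, with paths relative to /lib/modules/$(uname -r)/ on newer module tools (older ones write absolute paths). The function name is hypothetical; read the file into memory however you like and pass its contents in.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: scan the text of a modules.dep file and copy the
 * path of the module named 'name' (e.g. "ath9k") into 'out'.
 * Each line looks like "some/dir/ath9k.ko: dep.ko ...".
 * Returns 1 if found, 0 if not found or if 'out' is too small. */
int find_module_path(const char *dep, const char *name, char *out, size_t outlen)
{
    size_t name_len = strlen(name);
    const char *line = dep;

    while (*line) {
        const char *colon = strchr(line, ':');
        const char *eol = strchr(line, '\n');
        if (!eol)
            eol = line + strlen(line); /* last line without newline */

        if (colon && colon < eol) {
            /* Locate the basename of the path before the colon. */
            const char *base = line;
            for (const char *p = line; p < colon; p++)
                if (*p == '/')
                    base = p + 1;

            size_t base_len = (size_t)(colon - base);
            if (base_len == name_len + 3 &&
                strncmp(base, name, name_len) == 0 &&
                strncmp(base + name_len, ".ko", 3) == 0) {
                size_t path_len = (size_t)(colon - line);
                if (path_len + 1 > outlen)
                    return 0; /* output buffer too small */
                memcpy(out, line, path_len);
                out[path_len] = '\0';
                return 1;
            }
        }
        line = (*eol == '\n') ? eol + 1 : eol;
    }
    return 0;
}
```

For a compat-wireless install into the updates directory, with depmod.conf preferring updates, the path returned for ath9k would be expected to start with updates/. Compressed modules (.ko.gz) and the '-'/'_' equivalence in module names are not handled.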
Comment 27 Luis Chamberlain 2009-12-29 20:26:07 UTC
Already fixed upstream.
Comment 28 Andrej Podzimek 2010-01-02 19:18:35 UTC
Does 2.6.32.2 include the fix or do I need a GIT version?
Comment 29 Senthil Balasubramanian 2010-01-04 04:49:29 UTC
No need for GIT. 2.6.32 is sufficient.
Comment 30 Adriano Moura 2010-01-07 02:41:02 UTC
It is definitely much better on 2.6.32. I can't do deeper tests, but it hasn't dropped a single time with my AP.

But it still reports 16 dBm of Tx-Power. I remember it going well over 20 dBm on older kernels (2.6.29, I think).
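For scale, assuming the figures are the dBm values iwconfig reports: dBm is a logarithmic unit, so the gap between 16 dBm and 20 dBm is a factor of about 2.5 in transmitted power.

```latex
P_{\mathrm{mW}} = 10^{P_{\mathrm{dBm}}/10}, \qquad
10^{16/10} \approx 39.8\ \mathrm{mW}, \qquad
10^{20/10} = 100\ \mathrm{mW}, \qquad
\frac{100}{39.8} \approx 2.5
```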
Comment 31 Andrej Podzimek 2010-01-19 14:12:55 UTC
The problem persists for me in 2.6.32. It is even much worse now. Inexplicable reassociations appear about once every 10 minutes in dmesg. However, the connection gets interrupted much more often than that. It doesn't last for more than 30 seconds.

Only ping floods can (temporarily) keep the connection alive.

I run 2.6.32-zen5, the only 2.6.32 that currently has Reiser4 support.

It's still the same old story:

    wlan0: deauthenticating from 00:23:f8:22:aa:a6 by local choice (reason=3)
    wlan0: direct probe to AP 00:23:f8:22:aa:a6 (try 1)
    wlan0: direct probe responded
    wlan0: authenticate with AP 00:23:f8:22:aa:a6 (try 1)
    wlan0: authenticated
    wlan0: associate with AP 00:23:f8:22:aa:a6 (try 1)
    wlan0: RX AssocResp from 00:23:f8:22:aa:a6 (capab=0x31 status=0 aid=1)
    wlan0: associated
    ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready

Sometimes there's a message I haven't seen before:

    ath9k: Two wiphys trying to scan at the same time

Should I try compat-wireless? Any suggestions?
Comment 32 Andrej Podzimek 2010-01-19 14:15:59 UTC
Created attachment 24630 [details]
WiFi hardware details

This is the dmesg output I get after inserting the PCMCIA card.
Comment 33 Andrej Podzimek 2010-01-19 14:24:24 UTC
BTW, is it possible to set other rate control algorithms than 'ath9k_rate_control' (PID, minstrel)?

As already mentioned, I think this issue might be related to rate control. When changing the rate manually, it takes 5 to 10 seconds before the change is reflected in the output of iwconfig.

This might mean that the rate doesn't drop quickly enough when necessary. Consequently, too many frames get lost and the problem propagates to the IP layer.

Just like before, many failures are related to movement near the antenna. They are less frequent at night when people don't walk by.
Comment 34 Andrej Podzimek 2010-02-15 16:53:55 UTC
2.6.31.12 works almost fine, about 1 freeze per hour.

2.6.32.8 seems to be a disaster. Freezes occur almost immediately when there's no network traffic. As usual, a ping flood unblocks the interface. But imagine doing this every minute... :-(

The current version is unusable for me. It can only operate under a constant ping flood.
Comment 35 Andrej Podzimek 2010-03-08 09:03:08 UTC
2.6.33 is also affected. Just 30 seconds with no traffic is a reliable trigger. Movement near the antenna will cause an immediate freeze, no matter if there is network traffic. Ping floods „work“ the usual way mentioned above.

As already mentioned, it takes an insane period of time (about 15 seconds right now) to change the transfer rate with iwconfig. I don't know whether this is just a mac80211/wext impairment or an issue related to the driver or adapter. If the latter holds and rate changes really take that long, it is obvious that the device can't react to outer conditions quickly enough. Consequently, packets get lost.

Sorry about my multi-posting. This was my last report on the topic. I was just attempting to make my Atheros hardware work, which seems to be impossible. I switched back to an Intel wireless card that just works. I don't believe in Atheros any more. :-)
Comment 36 michael_zanetti 2010-03-26 08:31:08 UTC
Still not solved for me in 2.6.32 either. Since 2.6.30 my ath9k card keeps disconnecting about once or twice a day. When this happens, the process phy0 takes 100% CPU for about 10 seconds. I have to unload and reload the module before I'm able to reconnect again.

my dmesg:
ath9k: Two wiphys trying to scan at the same time
wlan0: deauthenticating from 00:14:51:6a:77:6b by local choice (reason=3)
wlan0: direct probe to AP 00:14:51:6a:77:6b (try 1)
wlan0: direct probe to AP 00:14:51:6a:77:6b (try 2)
wlan0: direct probe to AP 00:14:51:6a:77:6b (try 3)
wlan0: direct probe to AP 00:14:51:6a:77:6b timed out
wlan0: direct probe to AP 00:14:51:6a:77:6b (try 1)
wlan0: direct probe responded
wlan0: authenticate with AP 00:14:51:6a:77:6b (try 1)
wlan0: authenticated
wlan0: associate with AP 00:14:51:6a:77:6b (try 1)
wlan0: RX AssocResp from 00:14:51:6a:77:6b (capab=0x411 status=0 aid=3)
wlan0: associated
ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
wlan0: no IPv6 routers present
No probe response from AP 00:14:51:6a:77:6b after 500ms, disconnecting.
wlan0: direct probe to AP 00:14:51:6a:77:6b (try 1)
wlan0: direct probe to AP 00:14:51:6a:77:6b (try 2)
wlan0: direct probe to AP 00:14:51:6a:77:6b (try 3)
wlan0: direct probe to AP 00:14:51:6a:77:6b timed out
ath9k 0000:0b:00.0: PCI INT A disabled
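The "No probe response from AP ... after 500ms, disconnecting" line suggests a connection monitor: after a while without frames from the AP, the station probes it and disconnects if no answer arrives within 500 ms. A toy model of that idea, assuming the monitor only probes after an idle period, would also fit two observations in this thread: a constant ping flood keeps the link up (incoming replies keep resetting the idle timer), and quiet periods reliably trigger the failure. The idle threshold and all names are hypothetical; this is not the mac80211 implementation.

```c
#include <stdbool.h>

/* Hypothetical, simplified connection monitor; not mac80211 code. */
#define MODEL_IDLE_MS       2000UL /* made-up idle time before probing */
#define MODEL_PROBE_WAIT_MS  500UL /* the 500ms seen in the log above */

struct model_monitor {
    unsigned long last_rx_ms;    /* last frame received from the AP */
    unsigned long probe_sent_ms; /* when the outstanding probe was sent */
    bool probing;                /* a probe is awaiting a response */
    bool connected;
};

/* Call whenever a frame (including a probe response) arrives from the AP. */
void model_rx_from_ap(struct model_monitor *m, unsigned long now_ms)
{
    m->last_rx_ms = now_ms;
    m->probing = false;
}

/* Periodic timer: probe after an idle period, and give up if the probe
 * goes unanswered for MODEL_PROBE_WAIT_MS. */
void model_tick(struct model_monitor *m, unsigned long now_ms)
{
    if (!m->connected)
        return;
    if (m->probing) {
        if (now_ms - m->probe_sent_ms >= MODEL_PROBE_WAIT_MS)
            m->connected = false; /* "No probe response ... disconnecting" */
        return;
    }
    if (now_ms - m->last_rx_ms >= MODEL_IDLE_MS) {
        m->probing = true; /* a probe request would be sent here */
        m->probe_sent_ms = now_ms;
    }
}
```

In this model a single lost probe response is enough to drop the link, so anything that makes the hardware miss frames would surface as idle-time disconnects, while steady incoming traffic hides it.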