Bug 13807
Summary: | ath9k: Signal strength and stability dropped starting from 2.6.30 of 20-30% | ||
---|---|---|---|
Product: | Drivers | Reporter: | hayarms |
Component: | network-wireless | Assignee: | Luis Chamberlain (mcgrof) |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | CC: | adam, adriano.lols, andrej, ath9k-devel, bugzilla, frank.dekervel, hayarms, hobbes1069, hugh, kernel, kristoffer.ericson, linville, matt, mcgrof, michael_zanetti, mike, nenad, netllama, paulo.albuquerque, senthilkumar, sujith, tom, vaibhav_19842002, zdevai |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.30 and up | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: |
compat-wireless-22-07-2009 ooops
WiFi hardware details |
Description
hayarms
2009-07-21 09:57:52 UTC
Please try with the latest wireless-testing and let me know if the problem still occurs.

Created attachment 22453 [details]
compat-wireless-22-07-2009 ooops
The latest compat-wireless oopses on kernel 2.6.30.2 at boot
Use wireless-testing directly then: http://wireless.kernel.org/en/developers/Documentation/git-guide

Not sure what the issue may be regarding compat-wireless on 2.6.30, we can deal with that separately though.

I believe I'm seeing this bug on my own ath9k (AR9280, as dmesg says) on my Asus 1000HEB. I tried compat-wireless as of today on 2.6.31-6 (which is based on rc6, I think) and saw no change in performance (pretty abysmal transfer rates), though I did not see any more disassociations.

Ubuntu bug: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/414560

+1 me too. Fedora just went from 2.6.29 to 2.6.30 and immediately I was having problems reconnecting to my WLAN. Turning off power management helps maintain a stable connection but I still get some packet loss from LAN to LAN pings. As soon as I turn on power management again I get dropped from the AP.

lspci
03:00.0 Network controller: Atheros Communications Inc. AR928X Wireless Network Adapter (PCI-Express) (rev 01)

dmesg
phy0: Atheros AR9280 MAC/BB Rev:2 AR5133 RF Rev:d0: mem=0xffffc20011160000, irq=17

uname
2.6.30.5-43.fc11.x86_64

I believe this[1] commit is the culprit, but I don't have a kernel git clone ready or built to test my theory.

[1] http://git.kernel.org/linus/3cbb5dd73697b3f1c677daffe29f00ace22b71e9

hi, the above comments describe the problems i have. i run yesterday's wireless-testing (via wireless-compat on ubuntu kernel 2.6.31-9), and still i have high packet loss. i tried two things already: disabling the rfkill poll in main.c as suggested by somebody on ath9k-devel and disabling DEFAULT_PS in the kernel config. both didn't result in any difference for me. when i do ping -f [my router] i can clearly see periods of good connection (e.g. all pings are answered quickly) followed by periods of no connection at all (e.g. i see dots appearing very quickly).
for me it looks like about 5 "good" seconds followed by 1 or 2 "bad" seconds

btw, i also had the disconnections mentioned in the original report, but that doesn't seem to happen anymore now.

hi, with wireless-testing from 2009-09-09, disabled rfkill polling and disabled DEFAULT_PS i cannot reproduce the problem anymore. wireless works very well now for over a day.

Could you provide some information on how to disable rfkill and DEFAULT_PS? I've tried the ./ath9k/main.c file but no deal... And, when you say to disable DEFAULT_PS in the kernel config, you mean to compile it from scratch with this option disabled?

hi, i used compat-wireless-2.6, and i changed the following:

kervel@anthe:~/src/git/compat-wireless-2.6$ git diff master..kervel
diff --git a/config.mk b/config.mk
index b482b68..b8ebc0c 100644
--- a/config.mk
+++ b/config.mk
@@ -107,8 +107,8 @@ CONFIG_MAC80211_LEDS=y
 CONFIG_MAC80211_MESH=y
 CONFIG_CFG80211=m
-CONFIG_CFG80211_DEFAULT_PS=y
-CONFIG_CFG80211_DEFAULT_PS_VALUE=1
+CONFIG_CFG80211_DEFAULT_PS=n
+CONFIG_CFG80211_DEFAULT_PS_VALUE=0
 # CONFIG_CFG80211_REG_DEBUG=y
 CONFIG_LIB80211=m
@@ -129,7 +129,7 @@ CONFIG_ATH5K=m
 # CONFIG_ATH5K_DEBUG=y
 CONFIG_ATH5K_RFKILL=y
 CONFIG_ATH9K=m
-# CONFIG_ATH9K_DEBUG=y
+CONFIG_ATH9K_DEBUG=y

i changed main.c from ath9k.
diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c
index 3dc7b5a..536d10a 100644
--- a/drivers/net/wireless/ath/ath9k/main.c
+++ b/drivers/net/wireless/ath/ath9k/main.c
@@ -1227,10 +1227,10 @@ static void ath9k_rfkill_poll_state(struct ieee80211_hw *hw)
 	wiphy_rfkill_set_hw_state(hw->wiphy, blocked);
-	if (blocked)
+	/*if (blocked)
 		ath_radio_disable(sc);
 	else
-		ath_radio_enable(sc);
+		ath_radio_enable(sc);*/
 }

for ubuntu i've set up a PPA here: https://launchpad.net/~frank-dekervel/+archive/ppa
if you don't use ubuntu, you can avoid rebuilding the kernel by using compat-wireless-2.6

I too have a frequent disconnect problem with ath9k in 2.6.31, while 2.6.30 did not have such problems. In 2.6.31, in spite of good signal strength, the bitrate keeps varying. My logs are filled with:

wlan0: no probe response from AP <mac> - disassociating

Compiled with ath9k debug, I got this extra info:

ath9k: DMA failed to stop in 10 ms AR_CR=0x00000024 AR_DIAG_SW=0x40000020

Frank's patch: the 1st time I disabled only power save mode, and it didn't improve anything. The 2nd time I applied both parts of the patch (power save + rfkill) and it's working rock solid now. Thank you Frank, it was really a nightmare to use wireless without your patch.

Unfortunately, the patches didn't help at all. Network freezes are so frequent that Atheros cards became unusable on Linux. As already mentioned in other threads, I think this is closely related to rate control malfunction. But it's just my stupid theory... ;-)

There are two important things to note:

1) Most (if not all) freezes are related to movement around the access point (people walking by) or near the network card's antenna. Simply waving your hand around the antenna provokes a freeze.

2) When I use the old iwconfig to set the card's rate, something surprising happens: It takes about five seconds or more before the output from iwconfig reflects the rate change! When invoked sooner, iwconfig shows the previous rate.
(Yes, I know, iwconfig uses a deprecated API, doesn't show 802.11n rates and so forth...) Manual rate changes on Intel adapters (both old ones based on ipw2200 and new ones based on the mac80211 framework) occur immediately (from a human's point of view).

If rate changes are so terribly slow (for whatever reason), couldn't this cause all the freezes? When signal quality drops rapidly, possibly due to movement around the antenna, the transfer rate must be decreased ASAP. When the rate change takes too long, the retry limit is reached and packets get lost.

Have you tried this?
> So that just means the driver is broken. Please run
>
> iwconfig wlan0 power off
>
> to fix it then. And the patch below will disable it by default since
> it's broken.
>
> johannes
>
> --- wireless-testing.orig/drivers/net/wireless/ath/ath9k/main.c 2009-11-22 11:44:41.000000000 +0100
> +++ wireless-testing/drivers/net/wireless/ath/ath9k/main.c 2009-11-22 11:45:26.000000000 +0100
> @@ -1893,6 +1893,8 @@ void ath_set_hw_capab(struct ath_softc *
> BIT(NL80211_IFTYPE_ADHOC) |
> BIT(NL80211_IFTYPE_MESH_POINT);
>
> + hw->wiphy->ps_default = false;
> +
> hw->queues = 4;
> hw->max_rates = 4;
> hw->channel_change_time = 5000;
>
>
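For anyone checking whether the `iwconfig wlan0 power off` workaround quoted above actually took effect, here is a minimal Python sketch that reads the power management state from iwconfig output. The helper names and the sample text are made up for illustration, and the sketch assumes the output contains a `Power Management:on` or `Power Management:off` field; none of this is taken from the report itself.

```python
import re
import subprocess

# Illustrative sample only; not output captured from any machine in this report.
SAMPLE_OUTPUT = """wlan0     IEEE 802.11bgn  ESSID:"example"
          Bit Rate=54 Mb/s   Tx-Power=16 dBm
          Power Management:off
"""


def power_management_state(iwconfig_output):
    """Return 'on', 'off', or None if no Power Management field is found."""
    match = re.search(r"Power Management:\s*(on|off)", iwconfig_output)
    return match.group(1) if match else None


def query_interface(interface="wlan0"):
    """Run iwconfig for one interface and return its power management state.

    Assumes the iwconfig binary is installed; raises FileNotFoundError otherwise.
    """
    result = subprocess.run(
        ["iwconfig", interface], capture_output=True, text=True, check=False
    )
    return power_management_state(result.stdout)


if __name__ == "__main__":
    print(power_management_state(SAMPLE_OUTPUT))
```

Calling `query_interface("wlan0")` once before and once after the workaround would show whether the setting changed. It says nothing about whether the disconnects themselves are gone.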
I, too, am afflicted by this bug (even when disabling power management with iwconfig). Unfortunately, the patches that Frank posted 2 months ago no longer apply cleanly to the latest compat-wireless snapshot. I attempted to apply them manually, but the code inside main.c isn't similar enough for me to be certain of where to make the changes.

The solution proposed here is only a workaround. At bug 14267 Rafael J. Wysocki did a bisect and possibly found the defect. A bug fix is proposed there that involves reverting a commit, and it seems to work (=

Not in linux-2.6 yet, but...

commit 54ab040d24904d1fa2c0a6a27936b7c56a4efb24
Author: John W. Linville <linville@tuxdriver.com>
Date: Mon Nov 23 16:15:19 2009 -0500

ath9k: set ps_default as false

Copied from original one-line patch here:
http://bugzilla.kernel.org/show_bug.cgi?id=14267#c26

Signed-off-by: John W. Linville <linville@tuxdriver.com>

I'm a bit confused now. From my admittedly naive understanding of the one-line patch, all it's doing is making the default power save mode false inside the driver (rather than true). Isn't that equivalent to running 'iwconfig wlan0 power off', or does it have a more dramatic effect that would be different from what the iwconfig command accomplishes?

My concern is that I've already been manually disabling power management with the iwconfig command (at boot), and it doesn't eliminate the problem. I'm not sure if it makes it less likely, as I could never find a reliable means of triggering the problem, and I've only had this hardware for less than a week.

Thanks for putting up with my questions. I just don't want to pin my hopes on this as a fix if I'm potentially hitting a different/unrelated problem.

(In reply to comment #17)
> I'm a bit confused now. From my admittedly naive understanding of the one-line
> patch, all it's doing is making the default power save mode false inside the
> driver (rather than true).
> Isn't that equivalent to running 'iwconfig wlan0 power off', or does it have
> a more dramatic effect that would be different from what the iwconfig command
> accomplishes?

That is my understanding, too. This change will only help work around part of the problem. I don't see how disabling power management is a permanent solution.

> My concern is that I've already been manually disabling power mgmt with the
> iwconfig command (at boot), and it doesn't eliminate the problem. I'm not sure
> if it makes it less likely, as I could never find a reliable means of
> triggering the problem, and I've only had this hardware for less than a week.

Yes, the problem is not solved completely. Signal strengths are still reported wildly. If I'm 1 foot away from the AP I can see ~60% signal. If I'm ~30 ft away and behind two walls I can see 60% signal. Signal used to be reported properly and correctly (IMHO) back before 2.6.30.

Also, disabling power management allows one to have a stable connection at idle, but if you start up a hulu.com video or bittorrent download I have seen random disconnects, and the wireless device is rendered useless (cannot connect or scan) when this happens. I do not have any output (bad or good) from the kernel or anything else when the device becomes munged.

Please do not close this issue by simply defaulting power saving off. It is much deeper than that.

I also agree that the workaround is not a satisfactory solution. Please have a look at http://bugzilla.kernel.org/show_bug.cgi?id=14267#c26 where I think a more suitable solution is suggested. Quoting Rafael J. Wysocki:

> > I did a bisect yesterday on this, and the results
> > seemed to have worked over here by reverting:
> >
> > 75e6c3b72b3ab01c47629f3fbd0fed4e6550bf3a
> > cfg80211: lower dynamic PS timeout to 100ms

Are you guys using wireless-testing? Please try it.
http://wireless.kernel.org/en/developers/Documentation/git-guide

"Link quality" report issues won't be fixed on stable releases of the kernel. If you are not using bleeding edge it's pointless to work with you on this report. You will also get some other enhancements and fixes which sometimes cannot be backported and might also just not meet "stable" fix requirements.

Luis, I've not been using wireless-testing, as you're the first to suggest it. After reading over the URL that you noted, I think what I need to do is:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-testing.git
cd wireless-testing
./scripts/driver-select ath9k
make
make install

And then unload and reload the driver(s)? Please confirm that's what you're requesting, and I'll give it a try. Granted, it's going to be hard to know for sure if bleeding edge fixes this, since I can't trigger the failure in any intentional way.

No, wireless-testing is a full kernel git tree, just as you would clone Linus' tree. It consists of 2.6.32-rc8 + all the queued-up patches for the wireless subsystem for 2.6.33. This is bleeding edge, so chances are that if you have *any* issue it can get fixed here more easily and quickly than waiting for a fix to propagate down the old stable kernel trees. Not that fixes should not go to stable trees, they should, but the type of fixes that go to stable trees do not involve things like "link quality" fixes, as that is purely minor. Even a fix for some sort of connection problem may not be propagated to stable kernels, as some patches may be toooo big for a stable kernel.. Please read this for more information:

http://wireless.kernel.org/en/users/Documentation/Fix_Propagation

Also, when you have time, just go over all of this documentation:

http://wireless.kernel.org/en/users/Documentation/

Now -- you need to compile an entire kernel, so no, you cannot just select to compile ath9k.
We do have a backport package which lets you do that though, and you wouldn't have to compile your entire kernel. For info on that see:

http://wireless.kernel.org/en/users/Download

For stable releases see:

http://wireless.kernel.org/en/users/Download/stable

What card do you have again? Oh ok, AR9280 it seems.

Can someone please close this bug report? Patches for signal quality were provided a while ago for newer kernels; if there are specific issues with newer kernels please report them. If it is unclear why I am asking you to test later kernels please see:

http://wireless.kernel.org/en/users/Documentation/Fix_Propagation

I tried the 12/1/2009 compat-wireless on Fedora 12 (2.6.31.6-145.fc12.x86_64) and the signal quality was not improved. It ranges from 47-72% and the AP is right across the room (clear air). That being said, subjectively I seemed to get fewer disassociations, although they still occur. Prior to installing the compat-wireless ath9k driver I could not complete a /home backup with BackupPC over wireless after many attempts. Afterwards I can; however, as I mentioned, I still get disassociations.

Also strange is that I get better signal from a couple of my neighbors' APs, even though we're not in a crowded area (in fact we're on 3/4 acre lots). I'm wondering if with the current drivers I get better signals from an 802.11b AP than from an 802.11g one? I tried using "iwlist wlan0 scanning" but it did not show what type of APs they were.

I did make sure that depmod.conf is loading updates first, but is there a way to verify that the new driver is the one being loaded?

modprobe -l ath9k

It should list where in the filesystem (or the path under /lib/modules/$(uname -r)/ on newer modutils) the module is being picked up from.

Already fixed upstream.

Does 2.6.32.2 include the fix or do I need a GIT version?

No need for GIT. 2.6.32 is sufficient.

It is definitely much better on 2.6.32.
Can't do deeper tests, but it hasn't dropped a single time within range of my AP. But it still reports 16 dB of Tx-Power. I remember it going well over 20 dB on older kernels (2.6.29 I think).

The problem persists for me in 2.6.32. It is even much worse now. Inexplicable reassociations appear about once in 10 minutes in dmesg. However, the connection gets interrupted much more often than that. It doesn't last for more than 30 seconds. Only ping floods can (temporarily) keep the connection alive. I run 2.6.32-zen5, the only 2.6.32 that currently has Reiser4 support. It's still the same old story:

wlan0: deauthenticating from 00:23:f8:22:aa:a6 by local choice (reason=3)
wlan0: direct probe to AP 00:23:f8:22:aa:a6 (try 1)
wlan0: direct probe responded
wlan0: authenticate with AP 00:23:f8:22:aa:a6 (try 1)
wlan0: authenticated
wlan0: associate with AP 00:23:f8:22:aa:a6 (try 1)
wlan0: RX AssocResp from 00:23:f8:22:aa:a6 (capab=0x31 status=0 aid=1)
wlan0: associated
ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready

Sometimes there's a message I haven't seen before:

ath9k: Two wiphys trying to scan at the same time

Should I try compat-wireless? Any suggestions?

Created attachment 24630 [details]
WiFi hardware details
This is the dmesg output I get after inserting the PCMCIA card.
BTW, is it possible to set other rate control algorithms than 'ath9k_rate_control' (PID, minstrel)? As already mentioned, I think this issue might be related to rate control. When changing the rate manually, it takes 5 to 10 seconds before the change is reflected in the output of iwconfig. This might mean that the rate doesn't drop quickly enough when necessary. Consequently, too many frames get lost and the problem propagates to the IP layer. Just like before, many failures are related to movement near the antenna. They are less frequent at night when people don't walk by.

2.6.31.12 works almost fine, about 1 freeze per hour. 2.6.32.8 seems to be a disaster. Freezes occur almost immediately when there's no network traffic. As usual, a ping flood unblocks the interface. But imagine doing this every minute... :-( The current version is unusable for me. It can only operate under a constant ping flood.

2.6.33 is also affected. Just 30 seconds with no traffic is a reliable trigger. Movement near the antenna will cause an immediate freeze, no matter if there is network traffic. Ping floods „work“ the usual way mentioned above.

As already mentioned, it takes an insane period of time (about 15 seconds right now) to change the transfer rate with iwconfig. I don't know whether this is just a mac80211/wext impairment or an issue related to the driver or adapter. If the latter holds and rate changes really take that long, it is obvious that the device can't react to outer conditions quickly enough. Consequently, packets get lost.

Sorry about my multi-posting. This was my last report on the topic. I was just attempting to make my Atheros hardware work, which seems to be impossible. I switched back to an Intel wireless card that just works. I don't believe in Atheros any more. :-)

Still not solved for me in 2.6.32 too. Since 2.6.30 my ath9k card keeps disconnecting about once or twice a day. When this happens the process phy0 takes 100% cpu for about 10 seconds.
I have to unload and reload the module before I'm able to reconnect again. my dmesg:

ath9k: Two wiphys trying to scan at the same time
wlan0: deauthenticating from 00:14:51:6a:77:6b by local choice (reason=3)
wlan0: direct probe to AP 00:14:51:6a:77:6b (try 1)
wlan0: direct probe to AP 00:14:51:6a:77:6b (try 2)
wlan0: direct probe to AP 00:14:51:6a:77:6b (try 3)
wlan0: direct probe to AP 00:14:51:6a:77:6b timed out
wlan0: direct probe to AP 00:14:51:6a:77:6b (try 1)
wlan0: direct probe responded
wlan0: authenticate with AP 00:14:51:6a:77:6b (try 1)
wlan0: authenticated
wlan0: associate with AP 00:14:51:6a:77:6b (try 1)
wlan0: RX AssocResp from 00:14:51:6a:77:6b (capab=0x411 status=0 aid=3)
wlan0: associated
ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
wlan0: no IPv6 routers present
No probe response from AP 00:14:51:6a:77:6b after 500ms, disconnecting.
wlan0: direct probe to AP 00:14:51:6a:77:6b (try 1)
wlan0: direct probe to AP 00:14:51:6a:77:6b (try 2)
wlan0: direct probe to AP 00:14:51:6a:77:6b (try 3)
wlan0: direct probe to AP 00:14:51:6a:77:6b timed out
ath9k 0000:0b:00.0: PCI INT A disabled
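Several reporters describe disconnect frequency only roughly ("once or twice a day", "once in 10 minutes"). A small Python sketch like the one below could tally the relevant mac80211 messages from saved dmesg output so that kernels can be compared by numbers. The event names, the regular expressions and the `summarize` helper are assumptions built from the log lines quoted in this thread; message wording differs between kernel versions, so the patterns would likely need adjusting.

```python
import re
from collections import Counter

MAC = r"[0-9a-f]{2}(?::[0-9a-f]{2}){5}"

# Hypothetical event names mapped to patterns modelled on lines quoted in this report.
PATTERNS = {
    "deauth": re.compile(r"deauthenticating from " + MAC),
    "probe_timeout": re.compile(r"direct probe to AP " + MAC + r" timed out"),
    "no_probe_response": re.compile(r"[Nn]o probe response from AP " + MAC),
    "associated": re.compile(r"wlan\d+: associated\b"),
}


def summarize(dmesg_text):
    """Count how many lines of dmesg_text match each event pattern."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# A few lines copied from the dmesg excerpt in this report.
SAMPLE_LOG = """wlan0: deauthenticating from 00:14:51:6a:77:6b by local choice (reason=3)
wlan0: direct probe to AP 00:14:51:6a:77:6b (try 1)
wlan0: direct probe to AP 00:14:51:6a:77:6b timed out
wlan0: associated
No probe response from AP 00:14:51:6a:77:6b after 500ms, disconnecting.
"""

if __name__ == "__main__":
    for event, count in sorted(summarize(SAMPLE_LOG).items()):
        print(event, count)
```

Feeding it the output of `dmesg` saved to a file after a fixed test period on each kernel would give comparable counts; the counts alone do not identify the cause of the disconnects.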