Bug 15693

Summary: Plugging or unplugging notebook charger renders Atheros card unusable
Product: Drivers
Reporter: registosites1
Component: network-wireless
Assignee: drivers_network-wireless (drivers_network-wireless)
Status: CLOSED INSUFFICIENT_DATA
Severity: high
CC: aleksey.kamenskikh, fredbezies, linville, mark.langsdorf, me, registosites1, rjw, thomas
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.33.2
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 14885
Attachments: dmesg_after
wlan0_scan_1
wlan0_scan_2
wlan0_scan_3
lspci -vvv
lspci -xxxx
/proc/modules
dmesg output on resume from STR

Description registosites1 2010-04-04 21:03:26 UTC
Created attachment 25854
dmesg_after

Overview:
Unplugging or plugging in the notebook's charger makes the Atheros wireless card unusable once a scan is issued, as happens when the notebook is moved from one place to another, loses the connection to the access point and scans for available access points. The card remains unusable until the machine is powered off and on again; a reboot does not bring the card back.

Steps to Reproduce:
1) Turn on the machine with the charger plugged in and bring the network up with 'ifconfig wlan0 up', no association with the access point is needed to trigger the problem.

2) Issue a scan with 'iwlist wlan0 scan'. The results are as expected apart from the signal strength (see file wlan0_scan_1). If another scan is issued (see file wlan0_scan_2), everything looks normal and it is still possible to associate with the access point. No errors or warnings are issued.

3) Unplug the notebook charger.

4) Issue another scan. This time no access point is detected (see file wlan0_scan_3) and dmesg shows "gain calibration timeout" errors (see file dmesg_after). As far as I can see, no other errors are issued.

The problem also occurs with the reverse sequence: boot the machine on battery, perform steps 1) and 2), then plug in the charger and perform step 4).
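The pass/fail check in steps 2) and 4) amounts to "did the scan list any access points?". The Python sketch below automates it under two assumptions: that `iwlist` prints one `Cell NN - Address:` line per access point (the usual wireless-tools output), and that it is run as root. The helper names (`run_scan`, `scanned_bssids`, `card_looks_dead`) are hypothetical and not part of any tool used in this report.

```python
import re
import subprocess
from typing import List

# iwlist prints one "Cell NN - Address: xx:xx:xx:xx:xx:xx" header per access point.
CELL_RE = re.compile(r"Cell\s+\d+\s+-\s+Address:\s+([0-9A-Fa-f:]{17})")


def run_scan(iface: str = "wlan0") -> str:
    """Run 'iwlist <iface> scan' (hypothetical helper; needs root and wireless-tools)."""
    result = subprocess.run(["iwlist", iface, "scan"],
                            capture_output=True, text=True, check=False)
    return result.stdout + result.stderr


def scanned_bssids(scan_output: str) -> List[str]:
    """Return the BSSIDs (lower-case) listed in iwlist scan output."""
    return [m.group(1).lower() for m in CELL_RE.finditer(scan_output)]


def card_looks_dead(before: str, after: str) -> bool:
    """Heuristic for steps 2) and 4): access points were visible before the
    charger was plugged/unplugged, and none are visible afterwards."""
    return bool(scanned_bssids(before)) and not scanned_bssids(after)
```

Usage: call `run_scan()` after step 2), change the charger state, call it again, and pass both outputs to `card_looks_dead`. The heuristic cannot tell the bug apart from simply having no access point in range, so it only makes sense when an access point is known to be nearby.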

The expected result is a working network card after the scan. I can reproduce this problem with kernel 2.6.32.9 or newer, and it seems easier to trigger with newer kernels (2.6.33 and 2.6.34). I cannot reproduce it with kernel 2.6.32.8, and as far as I remember this had been working fine for a long time.

This has been tested with the kernel shipped by Arch Linux.
Comment 1 registosites1 2010-04-04 21:11:10 UTC
Created attachment 25855
wlan0_scan_1
Comment 2 registosites1 2010-04-04 21:12:26 UTC
Created attachment 25856
wlan0_scan_2
Comment 3 registosites1 2010-04-04 21:13:42 UTC
Created attachment 25858
wlan0_scan_3
Comment 4 registosites1 2010-04-04 21:14:46 UTC
Created attachment 25859
lspci -vvv
Comment 5 registosites1 2010-04-04 21:16:39 UTC
Created attachment 25860
lspci -xxxx

Taken after triggering the problem.
Comment 6 registosites1 2010-04-04 21:17:34 UTC
Created attachment 25861
/proc/modules
Comment 7 John W. Linville 2010-04-05 15:31:42 UTC
Well, that sounds a bit... odd. What kind of laptop do you have? Could it be that the power change is inadvertently setting an rfkill condition?
Comment 8 registosites1 2010-04-05 16:41:11 UTC
The notebook/laptop is a Packard Bell Easynote MX61-B-054PT.

The problem doesn't seem to be related to rfkill; after triggering the problem, this is what I get:

> rfkill list all
0: phy0: Wireless LAN
	Soft blocked: no
	Hard blocked: no

Please note that everything works well if I change only the kernel to version 2.6.32.8; otherwise I would have assumed the card was half toast or that the problem was caused by something else.

I have tried recompiling the kernel with "Networking support -> Wireless -> cfg80211 -> enable powersave by default (CONFIG_CFG80211_DEFAULT_PS)" changed from enabled to disabled, and it does seem to help a little (when using the ath5k driver), but I can still trigger the problem if I associate with an access point before unplugging/plugging the charger and issuing a scan.
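When testing a rebuilt kernel it helps to confirm that the kernel actually booted was built with the toggled option. A small Python sketch that reads a Kconfig `.config` and reports the value of a symbol such as `CONFIG_CFG80211_DEFAULT_PS`; it assumes the standard `.config` syntax, where a disabled option is written as `# CONFIG_FOO is not set`. `kconfig_value` and `running_kernel_config` are hypothetical helpers, and `/proc/config.gz` only exists when the kernel was built with `CONFIG_IKCONFIG_PROC`.

```python
import gzip
from typing import Optional


def kconfig_value(config_text: str, symbol: str) -> Optional[str]:
    """Return the value of a Kconfig symbol in .config text.

    'y', 'm' or a string for 'SYMBOL=value'; 'n' for '# SYMBOL is not set';
    None if the symbol does not appear at all.
    """
    for raw in config_text.splitlines():
        line = raw.strip()
        if line == f"# {symbol} is not set":
            return "n"
        if line.startswith(symbol + "="):
            return line.split("=", 1)[1]
    return None


def running_kernel_config(path: str = "/proc/config.gz") -> str:
    """Read the running kernel's config (requires CONFIG_IKCONFIG_PROC)."""
    with gzip.open(path, "rt") as f:
        return f.read()
```

For the rebuilt kernel described above, `kconfig_value(running_kernel_config(), "CONFIG_CFG80211_DEFAULT_PS")` would be expected to return `"n"`; a config file under `/boot` can be passed in instead if `/proc/config.gz` is unavailable.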

If it helps, I have also tried madwifi and ndiswrapper and the problem is the same; I just don't get the gain calibration timeout messages, which makes sense because those are issued by ath5k.
Comment 9 John W. Linville 2010-04-06 13:37:16 UTC
There really doesn't seem to be anything pertinent between 2.6.32.8 and 2.6.32.9.  You said you are using an archlinux kernel?  Have you reported this issue to the archlinux folks?  Can you recreate the issue with a plain kernel extracted from git?
Comment 10 registosites1 2010-04-06 21:29:37 UTC
I have just reported this in the Arch Linux bug tracker; now I'll have to wait and see if anyone has any ideas.

I haven't tried sources from git, but I have tried kernels 2.6.33.2 and 2.6.34-rc2 from [1] without any patches, using Arch's config file as a starting point, and I can still trigger this problem.

I have also tried an openSUSE live CD and I can trigger the problem there too ('uname -r' says 2.6.33-6-desktop), which suggests this is not specific to Arch Linux.

Note that with kernel 2.6.32.9 the problem could go almost unnoticed because it takes a few tries to trigger, but with newer kernels it is easier and more consistent to reproduce.

[1] ftp://ftp.kernel.org/pub/linux/kernel/v2.6/
Comment 11 registosites1 2010-04-10 18:07:58 UTC
I have tested a kernel from git as suggested above. I cannot confirm whether the problem is still there because I have to boot it with acpi=off, and with this boot option I cannot reproduce the problem.

If I boot 2.6.33 with acpi=off I cannot reproduce the problem either. After some testing I have discovered that the problem only happens if I modprobe powernow_k8, which I can only do when booting without acpi=off, so the problem may still be present in the git kernel.

If powernow_k8 is not loaded, I can plug and unplug the charger and issue scans and the wireless keeps working, whether or not it is associated with an access point.
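This comment narrows the trigger down to two conditions: ACPI enabled (no `acpi=off` on the kernel command line) and `powernow_k8` loaded. A Python sketch that checks both, assuming the standard `/proc/cmdline` format (space-separated boot parameters) and `/proc/modules` format (module name as the first field of each line); all function names are hypothetical.

```python
from typing import Set


def loaded_modules(proc_modules: str) -> Set[str]:
    """Module names from /proc/modules text (first whitespace-separated field)."""
    return {line.split()[0] for line in proc_modules.splitlines() if line.strip()}


def boot_params(cmdline: str) -> Set[str]:
    """Boot parameters from /proc/cmdline text."""
    return set(cmdline.split())


def trigger_conditions_present(proc_modules: str, cmdline: str) -> bool:
    """True if the conditions reported in this comment hold:
    ACPI not disabled on the command line and powernow_k8 loaded."""
    return ("acpi=off" not in boot_params(cmdline)
            and "powernow_k8" in loaded_modules(proc_modules))


def read_text(path: str) -> str:
    """Read a /proc file as text."""
    with open(path) as f:
        return f.read()
```

Usage: `trigger_conditions_present(read_text("/proc/modules"), read_text("/proc/cmdline"))`. Running this before each test makes it explicit whether a "cannot reproduce" result was obtained with the suspected trigger actually in place.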

Any more ideas or things I can test?
Comment 12 Frederic Bezies 2010-04-11 17:12:51 UTC
I'm seeing a bug like this one on frugalware-current, whose kernel is "only" 2.6.33.0.

I thought it was bug 14611, but the proposed workaround (disabling power management) works for me :(

More info:

ath5k phy0: Atheros AR2425 chip found (MAC: 0xe2, PHY: 0x70)


eth0: no IPv6 routers present
ADDRCONF(NETDEV_UP): wlan0: link is not ready
wlan0: direct probe to AP 00:1d:6a:9b:6f:a0 (try 1)
wlan0: direct probe responded
wlan0: authenticate with AP 00:1d:6a:9b:6f:a0 (try 1)
wlan0: authenticated
wlan0: associate with AP 00:1d:6a:9b:6f:a0 (try 1)
wlan0: RX AssocResp from 00:1d:6a:9b:6f:a0 (capab=0x431 status=0 aid=1)
wlan0: associated
ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
wlan0: no IPv6 routers present
eth0: link down.
CE: hpet increasing min_delta_ns to 15000 nsec
ath5k phy0: gain calibration timeout (2412MHz)
ath5k phy0: gain calibration timeout (2417MHz)
ath5k phy0: gain calibration timeout (2412MHz)
ath5k phy0: gain calibration timeout (2422MHz)
ath5k phy0: gain calibration timeout (2412MHz)
ath5k phy0: gain calibration timeout (2427MHz)
ath5k phy0: gain calibration timeout (2412MHz)
ath5k phy0: gain calibration timeout (2432MHz)
ath5k phy0: gain calibration timeout (2412MHz)
ath5k phy0: gain calibration timeout (2437MHz)
net_ratelimit: 10 callbacks suppressed
ath5k phy0: gain calibration timeout (2412MHz)
ath5k phy0: gain calibration timeout (2467MHz)
ath5k phy0: gain calibration timeout (2412MHz)
ath5k phy0: gain calibration timeout (2472MHz)
ath5k phy0: gain calibration timeout (2412MHz)
No probe response from AP 00:1d:6a:9b:6f:a0 after 500ms, disconnecting.
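The timeouts above are reported per frequency. A Python sketch that tallies them by 2.4 GHz channel number, using the standard mapping channel = (f - 2407) / 5 for channels 1 to 13 (2412 to 2472 MHz), plus channel 14 at 2484 MHz. The regular expression assumes the exact `ath5k phyN: gain calibration timeout (NNNNMHz)` wording shown in this log, and the helper names are hypothetical. Because of the `net_ratelimit: 10 callbacks suppressed` line, counts taken from a log like this one are lower bounds.

```python
import re
from collections import Counter

GAIN_TIMEOUT_RE = re.compile(r"ath5k (phy\d+): gain calibration timeout \((\d+)MHz\)")


def mhz_to_channel(freq_mhz: int) -> int:
    """Map a 2.4 GHz centre frequency in MHz to its channel number."""
    if freq_mhz == 2484:  # channel 14 is off the 5 MHz grid
        return 14
    if 2412 <= freq_mhz <= 2472 and (freq_mhz - 2407) % 5 == 0:
        return (freq_mhz - 2407) // 5
    raise ValueError(f"not a 2.4 GHz channel frequency: {freq_mhz} MHz")


def gain_timeouts_by_channel(dmesg: str) -> Counter:
    """Count 'gain calibration timeout' messages per channel."""
    return Counter(mhz_to_channel(int(m.group(2)))
                   for m in GAIN_TIMEOUT_RE.finditer(dmesg))
```

Applied to the log above, channel 1 (2412 MHz) accounts for 8 of the 15 visible timeouts; the visible lines alternate between 2412 MHz and each successive channel being scanned.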
Comment 13 Aleksey Kamenskikh 2010-04-12 08:58:56 UTC
I have the same problem, but for me it is triggered when my Asus F3KA laptop resumes from suspend to RAM. Tested with the 2.6.33 and 2.6.33.2 kernels (using gentoo-sources).
Comment 14 Bob Copeland 2010-04-12 11:42:59 UTC
Any chance one of you can attempt a bisect to pin down the patch?  It sounds like a fairly small range of kernels that are affected.
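Bisecting is a binary search over the commits between a known-good and a known-bad kernel, here 2.6.32.8 and 2.6.32.9 according to the original report; in practice `git bisect start`, `git bisect good` and `git bisect bad` drive this loop. The Python sketch below only illustrates the search itself: `first_bad` is hypothetical, and `is_bad` stands in for building and booting a kernel at that commit and running the reproduction steps (with ACPI enabled and powernow_k8 loaded, per comment 11).

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(commits: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Return the first bad commit in an ordered range.

    Assumes commits[0] is known good, commits[-1] is known bad, and the
    behaviour flips from good to bad exactly once (the model git bisect uses).
    Each is_bad() call is one build-boot-test cycle, so roughly log2(len)
    calls are needed.
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one bad commit")
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]
```

Since comment 10 notes that 2.6.32.9 needs several tries to show the problem, an intermittent `is_bad` would violate the single-flip assumption; repeating the test a few times before marking a commit good is a reasonable precaution.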
Comment 15 Aleksey Kamenskikh 2010-04-12 13:11:59 UTC
Created attachment 25966
dmesg output on resume from STR

If I stop the wlan0 interface before going into suspend to RAM, then wlan0 works fine after resume. But if I have not stopped wlan0 before suspending, dmesg reports that a new device, phy1, was registered, whereas it was phy0 before.
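The phy0 to phy1 change can be detected automatically. A Python sketch that lists `phyN` names in the order they first appear in dmesg output; it assumes only that the messages contain the lower-case `phyN` token, as in the ath5k lines quoted in comment 12, since the exact registration message is in the attachment rather than in this text. The helper names are hypothetical.

```python
import re
from typing import List

# Lower-case "phyN" tokens; the upper-case "PHY: 0x70" field is not matched.
PHY_RE = re.compile(r"\b(phy\d+)\b")


def phy_names_in_order(dmesg: str) -> List[str]:
    """Distinct phyN names in order of first appearance."""
    seen: List[str] = []
    for m in PHY_RE.finditer(dmesg):
        if m.group(1) not in seen:
            seen.append(m.group(1))
    return seen


def phy_renumbered(dmesg: str) -> bool:
    """True if more than one phyN name shows up, e.g. phy0 before suspend, phy1 after."""
    return len(phy_names_in_order(dmesg)) > 1
```

Feeding it the dmesg output from a full suspend/resume cycle should report `["phy0", "phy1"]` in the failing case described here and only `["phy0"]` when wlan0 was stopped beforehand.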
Comment 16 registosites1 2010-04-12 15:27:35 UTC
I have found a workaround: until the cause is found, using phc-k8 [1] instead of powernow_k8 seems to give me both a fully working wireless card and CPU frequency scaling.

As for the bisect, I'll give it a go at the weekend; I still need to learn how to do it and find time to test it. I'll post back when I find something.

[1] http://www.linux-phc.org/
Comment 17 John W. Linville 2010-08-18 19:06:05 UTC
Any word on this bisection?  Comment 16 suggests to me that the problem is somewhere other than ath5k...
Comment 18 John W. Linville 2010-10-04 19:31:34 UTC
Closing due to lack of response -- please reopen if/when requested information becomes available...