Bug 14561

Summary: ath5k card stops working after a while and requires reboot
Product: Networking Reporter: freeseek (giulio.genovese)
Component: Wireless Assignee: networking_wireless (networking_wireless)
Status: RESOLVED DUPLICATE    
Severity: normal CC: gvlists, harviecz, linville, me, pavle.predic
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.31-14-generic from Ubuntu 9.04 Subsystem:
Regression: No Bisected commit-id:

Description freeseek 2009-11-08 19:53:54 UTC
This bug has been around at least since kernel version 2.6.28. I am using a NC20 laptop with an Atheros card, more precisely:
01:00.0 Ethernet controller: Atheros Communications Inc. AR5001 Wireless Network Adapter (rev 01)
Regularly, but more often when there is high transfer of files through the network, the wireless stops working. When that happens, the device wlan0 disappears and the following output shows up in dmesg:

wlan0: no probe response from AP 00:14:bf:d3:01:cf - disassociating
ath5k phy0: failed to wakeup the MAC Chip
ath5k phy0: can't reset hardware (-5)
ath5k phy0: failed to wakeup the MAC Chip
ath5k phy0: can't reset hardware (-5)
ath5k phy0: failed to wakeup the MAC Chip
ath5k phy0: can't reset hardware (-5)
ath5k phy0: failed to wakeup the MAC Chip
ath5k phy0: can't reset hardware (-5)
ath5k phy0: failed to wakeup the MAC Chip
__ratelimit: 14 callbacks suppressed
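
A minimal Python sketch, not part of the original report, for counting how often this failure signature appears in a saved dmesg dump. The two message patterns are copied from the excerpt above; the function name and the layout of the returned dictionary are assumptions made for illustration.

```python
import re

# Patterns copied from the dmesg excerpt in this report.
WAKEUP_FAILED = re.compile(r"ath5k (phy\d+): failed to wakeup the MAC Chip")
RESET_FAILED = re.compile(r"ath5k (phy\d+): can't reset hardware \((-?\d+)\)")


def summarize_ath5k_failures(dmesg_text):
    """Count wakeup and reset failures per phy in a dmesg dump (hypothetical helper)."""
    summary = {}

    def entry_for(phy):
        # Create the per-phy counters on first use.
        return summary.setdefault(
            phy, {"wakeup_failed": 0, "reset_failed": 0, "error_codes": set()}
        )

    for line in dmesg_text.splitlines():
        wakeup = WAKEUP_FAILED.search(line)
        reset = RESET_FAILED.search(line)
        if wakeup:
            entry_for(wakeup.group(1))["wakeup_failed"] += 1
        elif reset:
            entry = entry_for(reset.group(1))
            entry["reset_failed"] += 1
            entry["error_codes"].add(int(reset.group(2)))
    return summary
```

Fed the excerpt above, this would report five wakeup failures and four reset failures for phy0, all with error code -5. The rate-limit line indicates that further messages were suppressed, so the real counts are higher.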

At that point, removing and reinserting the ath5k module does not help, but if I try, the following output shows up in dmesg:

cfg80211: Calling CRDA to update world regulatory domain
cfg80211: World regulatory domain updated:
(start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
(2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
(2457000 KHz - 2482000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
(2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
(5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
(5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
ath5k 0000:01:00.0: PCI INT A -> GSI 28 (level, low) -> IRQ 28
ath5k 0000:01:00.0: setting latency timer to 64
ath5k 0000:01:00.0: registered as 'phy0'
ath5k phy0: failed to wakeup the MAC Chip
ath5k 0000:01:00.0: PCI INT A disabled
ath5k: probe of 0000:01:00.0 failed with error -5
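
For readers decoding the "-5" in these messages: kernel functions conventionally return negated errno values, and errno 5 is EIO. The small Python sketch below only looks that number up; the helper name is hypothetical and the description text comes from the C library of whatever machine runs it.

```python
import errno
import os


def describe_kernel_error(code):
    """Translate a negated errno such as -5 into its symbolic name and description."""
    number = abs(code)
    # errno.errorcode maps numeric values to names like "EIO".
    name = errno.errorcode.get(number, "UNKNOWN")
    return f"{name}: {os.strerror(number)}"


print(describe_kernel_error(-5))
```

An I/O error at probe time is consistent with the driver being unable to talk to the chip at all, although the message alone does not say why.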

The only way I know to get the wireless working again is to reboot the whole system. Suspending and resuming does not help either. Other people with the same laptop experience the same problem. More thorough debug output would help, since presented like this the problem looks very similar to other bugs that I believe are unrelated.
Comment 1 Bob Copeland 2009-11-09 02:08:12 UTC
Thanks, I just got new hardware with which I can reproduce it after some time, but I haven't quite worked out a reliable way to make it fail.  Do you have any hints other than just transferring a lot of data?  Also can you post the line that looks like:

[   20.633526] ath5k phy0: Atheros AR5414 chip found (MAC: 0xa3, PHY: 0x61)
Comment 2 freeseek 2009-11-09 02:15:48 UTC
ath5k phy0: Atheros AR2425 chip found (MAC: 0xe2, PHY: 0x70)

I wish I had more insight into this. Transferring a lot of files seems to increase the chance of it happening, but it can also happen out of the blue when nothing is going on.
Comment 3 freeseek 2009-11-13 04:39:31 UTC
Today I reinstalled Ubuntu on the laptop, and while downloading packages from the internet the driver gave up three times. So copying a large amount of files over the wireless definitely triggers the driver failure.
Comment 4 Pavle Predic 2009-11-21 02:24:42 UTC
I don't know if any of this helps, but here goes... I'm experiencing very similar issues on my MSI laptop with an Atheros AR5BXB63 chip. I didn't notice it has anything to do with downloading/uploading large files - in fact I just uploaded over 2G (within my LAN) without problems. It also doesn't have anything to do with uptime - sometimes it happens within a couple of hours of booting the system, sometimes after 5-6 hours of uptime. 

A couple of facts I find interesting:
- I once moved the HD from this laptop (MSI MS-6837D) to another (ASUS X50V) which also has an atheros chip (the system used the same module) - and it worked flawlessly! After a week or so, I returned the HD to MSI laptop and the problem reappeared.
- When this occurs, lspci doesn't list the device correctly, but instead gives an error that says something like: 'unknown header type 7f'.
- My laptop has a button for enabling/disabling wireless. And this is not an on/off button, but the one that cycles through different modes. Something tells me this is the key of the problem...
- When this happens the system becomes slow and unresponsive and it's rather difficult to control the mouse motion.
- Nothing helps but a reboot.
- I tried logging iwconfig output, and here's what I got at the moment of breakdown (note how the frequency changed even though I use a fixed value; also note how bit rate falls to 1Mb/s just before the failure):

[2009-11-20 22:50:52]
wlan0     IEEE 802.11bg  ESSID:"dlink"
          Mode:Managed  Frequency:2.412 GHz  Access Point: 00:1C:F0:F2:C4:6E
          Bit Rate=54 Mb/s   Tx-Power=27 dBm
          Retry min limit:7   RTS thr:off   Fragment thr:off
          Power Management:off
          Link Quality=47/70  Signal level=-63 dBm  Noise level=-96 dBm
          Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
          Tx excessive retries:0  Invalid misc:0   Missed beacon:0

[2009-11-20 22:50:53]
wlan0     IEEE 802.11bg  ESSID:"dlink"
          Mode:Managed  Frequency:2.412 GHz  Access Point: 00:1C:F0:F2:C4:6E
          Bit Rate=1 Mb/s   Tx-Power=27 dBm
          Retry min limit:7   RTS thr:off   Fragment thr:off
          Power Management:off
          Link Quality=47/70  Signal level=-63 dBm  Noise level=-96 dBm
          Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
          Tx excessive retries:0  Invalid misc:0   Missed beacon:0

[2009-11-20 22:50:54]
wlan0     IEEE 802.11bg  ESSID:"dlink"
          Mode:Managed  Frequency:2.412 GHz  Access Point: 00:1C:F0:F2:C4:6E
          Bit Rate=1 Mb/s   Tx-Power=27 dBm
          Retry min limit:7   RTS thr:off   Fragment thr:off
          Power Management:off
          Link Quality=47/70  Signal level=-63 dBm  Noise level=-96 dBm
          Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
          Tx excessive retries:0  Invalid misc:0   Missed beacon:0

[2009-11-20 22:50:55]
wlan0     IEEE 802.11bg  ESSID:"dlink"
          Mode:Managed  Frequency:2.427 GHz  Access Point: Not-Associated
          Tx-Power=27 dBm
          Retry min limit:7   RTS thr:off   Fragment thr:off
          Power Management:off
          Link Quality:0  Signal level:0  Noise level:0
          Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
          Tx excessive retries:0  Invalid misc:0   Missed beacon:0
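
The comparison made by eye above (bit rate dropping to 1 Mb/s, then the frequency changing and the association being lost) can be automated. The following Python sketch assumes a log in exactly the format shown here, with a bracketed timestamp line before each iwconfig snapshot; the function names and the choice of tracked fields are illustrative assumptions.

```python
import re

SNAPSHOT_HEADER = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]$")
FREQUENCY = re.compile(r"Frequency:([\d.]+) GHz")
BIT_RATE = re.compile(r"Bit Rate=([\d.]+) Mb/s")
ACCESS_POINT = re.compile(r"Access Point: (\S+)")

TRACKED_FIELDS = ("frequency_ghz", "bit_rate_mbps", "access_point")


def parse_snapshots(log_text):
    """Split a timestamped iwconfig log into one dict per snapshot."""
    snapshots = []
    current = None
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        header = SNAPSHOT_HEADER.match(line)
        if header:
            current = {"time": header.group(1)}
            current.update(dict.fromkeys(TRACKED_FIELDS))
            snapshots.append(current)
            continue
        if current is None:
            continue  # ignore anything before the first timestamp
        match = FREQUENCY.search(line)
        if match:
            current["frequency_ghz"] = float(match.group(1))
        match = BIT_RATE.search(line)
        if match:
            current["bit_rate_mbps"] = float(match.group(1))
        match = ACCESS_POINT.search(line)
        if match:
            current["access_point"] = match.group(1)
    return snapshots


def describe_changes(snapshots):
    """Return (time, field, old, new) for every tracked field that changed."""
    changes = []
    for previous, current in zip(snapshots, snapshots[1:]):
        for field in TRACKED_FIELDS:
            if previous[field] != current[field]:
                changes.append(
                    (current["time"], field, previous[field], current[field])
                )
    return changes
```

A field that is missing from a snapshot, such as the bit rate once the card is no longer associated, is recorded as None, so its disappearance also shows up as a change.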


Sorry for the rather long post. Hopefully someone will find it useful...
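
One possible reading of the 'unknown header type 7f' error, offered as an assumption rather than a diagnosis confirmed in this thread: when a PCI device stops responding, configuration space reads typically return all ones (0xff). The Header Type register sits at offset 0x0e, and with its top (multi-function) bit masked off, 0xff becomes 0x7f. The Python sketch below checks a raw configuration space dump for that pattern. The sysfs path uses the address 0000:01:00.0 from the original report and may differ on other machines; the helper names are hypothetical.

```python
PCI_HEADER_TYPE_OFFSET = 0x0E  # Header Type register in PCI configuration space
HEADER_TYPE_MASK = 0x7F        # drops bit 7, the multi-function flag

# Assumed device address, taken from the dmesg output in the description.
EXAMPLE_CONFIG_PATH = "/sys/bus/pci/devices/0000:01:00.0/config"


def read_config_space(path=EXAMPLE_CONFIG_PATH, length=64):
    """Read the first bytes of a device's configuration space from sysfs."""
    with open(path, "rb") as config_file:
        return config_file.read(length)


def header_type(config_space):
    """Return the header type with the multi-function bit masked off."""
    return config_space[PCI_HEADER_TYPE_OFFSET] & HEADER_TYPE_MASK


def looks_unreachable(config_space):
    """True when the standard header fields all read back as 0xff."""
    return all(byte == 0xFF for byte in config_space[:16])
```

If `looks_unreachable(read_config_space())` is true after a failure, the card has most likely dropped off the bus, which would fit the observation that reloading the module does not help.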
Comment 5 Geo 2010-02-02 20:12:22 UTC
I also experience this problem on a Samsung NC20. Here are my observations. It seems to happen when I start more than one application that requires WLAN, for example Kmail or Mozilla. Those applications do not require much bandwidth. I believe it has become more frequent since I installed a frequency scaling monitor; before that, the WLAN was more stable. Once the WLAN connection is lost, the network manager pops up asking me to enter the WLAN password. Even if I give the correct password it never connects. A reboot is required to see the WLAN connection again. Sometimes even after a reboot I do not see any WLAN list. Today the connection broke while I was using Firefox and Kmail. I tried to close Firefox, but it got stuck. The screen also froze and I could not shut down the laptop. Ctrl+Alt+F1 gave me a terminal, but I could not log in: my login password kept echoing back on the screen. After a hard reboot WLAN did not start. After a second reboot it started working, and I am writing this. Are there any logs or traces I could generate to help you? Please let me know the commands.
Comment 6 John W. Linville 2010-03-01 19:02:51 UTC
Has everyone tested 2.6.33?  Before responding with "still broken", please make an effort to create a better description of the issue.  Include output from dmesg and /var/log/messages.  Even better, open a new bug that is specific to your problem -- this one seems to be a bit stale and seems to be drawing "me too" reports without specific details.
Comment 7 Geo 2010-03-05 02:42:41 UTC
No, I do not have 2.6.33. I have the current kernel from the Ubuntu Jaunty repository: Linux nc20 2.6.28-18-generic #59-Ubuntu SMP Thu Jan 28 01:23:03 UTC 2010 i686 GNU/Linux
I use a Samsung NC20 netbook. Please see the /var/log/messages and dmesg output attached. They were taken when WLAN networks were not visible in nm-applet. Is there anything else I could help with? I am not sure about trying 2.6.33, as it is not yet available as an update.
Comment 8 John W. Linville 2010-03-05 03:42:23 UTC
Geo, I'm sorry but I just don't have time to spend on analyzing a kernel from over a year ago.  Please try 2.6.33 and if the problem persists then please provide new snapshots of the information requested.  Otherwise I simply cannot help you.
Comment 9 Pavle Predic 2010-03-06 01:04:21 UTC
I tried using ndiswrapper with this chip and had the same issues, so for all we know this might be just crappy hardware. This is the sort of message I would get in syslog while I was using ndiswrapper.
Mar  3 02:38:03 debian kernel: [ 4924.077406] ndiswrapper (mp_reset:64): wlan0 is being reset
Mar  3 02:38:07 debian kernel: [ 4928.099834] ndiswrapper (mp_reset:64): wlan0 is being reset

BTW, the model number of my atheros chip is AR5BXB63. Was just wondering if the other guys also have this chip or a different one?
Comment 10 Geo 2010-03-06 18:39:39 UTC
I have an AR242x chip. I believe many components of the driver could be the same for different chips.

lspci output
01:00.0 Ethernet controller: Atheros Communications Inc. AR242x 802.11abg Wireless PCI Express Adapter (rev 01)
Comment 11 Geo 2010-03-06 20:17:24 UTC
John,

Thanks for the good work. I understand that it is an old release 
kernel. Ubuntu 9.04, i believe, is supported until Oct 2010. I 
update my netbook often and 2.6.28 is the latest kernel I could 
get. I try to follow the recommended update/upgrade procedure. 
Getting a network disconnection and having to reboot one or two 
times almost every day is a serious problem. You can see some 
more info at
 https://help.ubuntu.com/community/NC20#Wireless
It says "as of kernel 2.6.31-15, it has not been solved". I am 
not sure if it is a circular reference to this bug.
Regarding the upstream kernel, even Lucid Lynx provides only 2.6.32, 
not 2.6.33.

 I also do understand your frustration that kernel team may not 
be responsible for the Ubuntu distribution which packs an old 
kernel. Let us work together to make it better. Even a suggestion 
to restart the wireless manually without rebooting will be 
greatly appreciated by NC20 users. :-)

Personally, I am waiting to migrate to Lucid Lynx through Karmic 
Koala soon because it is long term supported. 

> --- Comment #8 from John W. Linville  
> Geo, I'm sorry but I just don't have
> time to spend on analyzing a kernel from over a year ago. 
<snip>
Comment 12 John W. Linville 2010-03-07 04:10:22 UTC
Geo, I'm not sure what you are asking.  I'm not an Ubuntu user -- I've never even used Debian very much.  I basically know nothing about Ubuntu's plans, policies, etc.  If you want Ubuntu-centric support then you need to get it from Canonical and/or the Ubuntu community.

That said, it is my understanding that there is some mechanism for Ubuntu users to get a "vanilla" kernel package installed with little or no trouble.  If you don't want to build your own kernel to test, then perhaps you can figure out how to take advantage of that package?

In any case, there is really very little chance that I or anyone else can help you unless you can run modern kernels and test prospective patches.
Comment 13 Pavle Predic 2010-03-16 01:16:35 UTC
Good news. I upgraded to kernel 2.6.32 and tried the latest version of ndiswrapper (1.56) with the latest version of the Windows XP drivers for my card, and it works (knock on wood). Over a week now without a glitch. It's quite likely that the ath5k module also works with the 2.6.32 kernel, so this is something that you might want to test, Geo. In any case, there finally seems to be a solution to our problems. 

Geo, I am on Debian, and I found 2.6.32 in apt sources (not the stable repo, though), so I'm sure you can get it on Ubuntu.
Comment 14 Tomas Mudrunka 2010-05-03 23:51:40 UTC
I've got the same problem with 2.6.33:
https://bugzilla.kernel.org/show_bug.cgi?id=ath5k-wakeup
(I've created another bug, because at the beginning it seemed to be destroying the EEPROM...)
Comment 15 John W. Linville 2010-08-13 16:02:23 UTC

*** This bug has been marked as a duplicate of bug 15843 ***
Comment 16 Geo 2010-08-16 00:55:39 UTC
Is this really a duplicate of 15843, which relates to wake up 
after sleep? For me it happens even without going to sleep. With 
the Lucid upgrade (2.6.32-24-generic) it is less frequent, though.
Comment 17 Tomas Mudrunka 2011-05-02 23:12:35 UTC
try adding nohz=off to kernel options in GRUB/LILO
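
After editing the boot loader configuration and rebooting, it can be worth confirming that the option actually reached the kernel. On Linux the active command line is exposed in /proc/cmdline. The Python sketch below parses such a line; the function name is hypothetical, and returning the last occurrence of a repeated parameter is an assumption about which value takes effect.

```python
def kernel_parameter(cmdline, name):
    """Return the value of a kernel parameter, '' for a bare flag, None if absent."""
    result = None
    for token in cmdline.split():
        key, separator, value = token.partition("=")
        if key == name:
            # Keep the last occurrence (assumed to override earlier ones).
            result = value if separator else ""
    return result


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as cmdline_file:
        return cmdline_file.read().strip()
```

For example, `kernel_parameter(read_cmdline(), "nohz")` should return "off" once the option is in effect, and None if it was never passed.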