Kernel Bug Tracker – Bug 12068
ath5k: card eventually fails to function, spams dmesg with can't reset hardware errors
Last modified: 2015-02-19 15:16:32 UTC
Latest working kernel version: [none]
Earliest failing kernel version: 2.6.27 [tested with .6 as well]
Hardware Environment: ThinkPad T61p
Software Environment: Ubuntu 8.10 x86_64
Eventually [there is no evident deterministic time period], my wireless card stops working under Linux. dmesg is flooded alternately with WARNINGs from warn_on_slowpath in rate_control_pid_get_rate and the following:
ath5k phy0: failed to reset the MAC Chip
ath5k phy0: can't reset hardware (-5)
Attempting to reload the driver does not help - I just get the above printout again.
Sometimes, when I first boot the machine, it outputs the above messages and then the card recovers, and I can connect to wireless networks - but if this behavior occurs once I've already connected to a wireless network, it will not recover, and will remain this way across driver reloads.
Steps to reproduce:
1) Have an ath5k miniPCI wireless card in my T61p.
2) Load the ath5k driver [in any version I've ever tried - I first tried the ath5k driver in 2.6.24 with the compat-wireless-old snapshots, and it also had this problem].
3) Try connecting to some wireless networks - eventually, it will stop working, and you will need to reboot.
This is fixed in wireless-testing:
Author: Bob Copeland <email@example.com>
Date: Mon Nov 3 22:14:00 2008 -0500
ath5k: correct handling of rx status fields
ath5k_rx_status fields rs_antenna and rs_more are u8s, but we
were setting them with bitwise ANDs of 32-bit values.
As a consequence, jumbo frames would not be discarded as intended.
Then, because the hw rate value of such frames is zero, and, since
"ath5k: rates cleanup", we do not fall back to the basic rate, such
packets would trigger the following WARN_ON:
------------[ cut here ]------------
WARNING: at net/mac80211/rx.c:2192 __ieee80211_rx+0x4d/0x57e [mac80211]()
Modules linked in: ath5k af_packet sha256_generic aes_i586 aes_generic cbc loop i915 drm binfmt_misc acpi_cpu
Pid: 0, comm: swapper Tainted: G W 2.6.28-rc2-wl #14
[<c012005d>] ? sched_debug_show+0x31e/0x9c6
[<c012489f>] ? vprintk+0x369/0x389
[<c0309539>] ? _spin_unlock_irqrestore+0x54/0x58
[<c011cd8f>] ? try_to_wake_up+0x14f/0x15a
[<f81918cb>] __ieee80211_rx+0x4d/0x57e [mac80211]
[<f828872a>] ath5k_tasklet_rx+0x5a1/0x5e4 [ath5k]
[<c013b9cd>] ? clockevents_program_event+0xd4/0xe3
[<f80e934a>] ? acpi_idle_enter_bm+0x2ad/0x31b [processor]
Signed-off-by: Bob Copeland <firstname.lastname@example.org>
Signed-off-by: John W. Linville <email@example.com>
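To illustrate the truncation the commit message describes, here is a minimal sketch in Python that uses ctypes.c_uint8 to mimic storing into a u8 field. The mask value and the helper names are assumptions made for illustration; they are not taken from the ath5k headers, and the "fixed" variant is only one common way to reduce a masked value to a flag.

```python
import ctypes

# Illustrative mask: a flag living in bit 12 of a 32-bit status word.
# The real ath5k mask names and values are not quoted in this report.
STATUS_MORE = 0x00001000


def store_u8(value):
    """Store an integer into an 8-bit unsigned field, truncating as C would."""
    return ctypes.c_uint8(value).value


def more_flag_buggy(status_word):
    # The masked result is 0x1000, which does not fit into 8 bits,
    # so the stored value is always 0 and the flag is lost.
    return store_u8(status_word & STATUS_MORE)


def more_flag_fixed(status_word):
    # Reduce the masked value to 0 or 1 before storing it.
    return store_u8(1 if status_word & STATUS_MORE else 0)


status = 0x00001234  # bit 12 is set
print(more_flag_buggy(status))  # 0
print(more_flag_fixed(status))  # 1
```

With the flag always read as zero, any check that depends on it (such as discarding jumbo frames) would never fire, which matches the behaviour described above.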
This just needs to be propagated to 2.6.27, if not already done.
I'm not so sure - can you test the patch and report whether it helps (it should apply to .27 with fuzz)? I didn't send it for 2.6.27 since it was a rate cleanup in .28-rc1 that started the above warning. However, if we report packets with hwrate=1 for jumbo frames, it could still trigger a separate warning in the pid rate controller.
Patch is here:
I just pulled latest wireless-testing and ran it - it fixes can't reset hardware (as far as I've seen), but I still get warn_on_slowpath.
<IRQ> [<ffffffff8024e9b4>] warn_on_slowpath+0x64/0x90
[<ffffffff802e1d4b>] ? __slab_alloc+0x24b/0x260
[<ffffffff803b353c>] ? map_single+0x1fc/0x280
[<ffffffff80234059>] ? __phys_addr+0x9/0x50
[<ffffffff803b383d>] ? swiotlb_map_single_attrs+0x6d/0xf0
[<ffffffff80232b80>] ? swiotlb_map_single_phys+0x0/0x20
[<ffffffffa02b0669>] lbm_cw___ieee80211_rx+0x1c9/0x630 [lbm_cw_mac80211]
[<ffffffff803b31f0>] ? unmap_single+0x140/0x160
[<ffffffffa02f0c6d>] ath5k_tasklet_rx+0x34d/0x5c0 [ath5k]
<EOI> [<ffffffffa003ac09>] ? acpi_idle_enter_bm+0x287/0x2d7 [processor]
[<ffffffffa003ac01>] ? acpi_idle_enter_bm+0x27f/0x2d7 [processor]
[<ffffffff80272e39>] ? tick_nohz_get_sleep_length+0x9/0x30
[<ffffffff8044bb49>] ? cpuidle_idle_call+0xb9/0x100
[<ffffffff80210e95>] ? cpu_idle+0x75/0x110
[<ffffffff804f0536>] ? rest_init+0x66/0x70
This is different, re-opening the bug.
No, no, I lied. It doesn't fix can't reset hardware - it even spits out different kinds of can't reset hardware. Great.
Dec 10 11:59:26 eris kernel: [17580.657110] ath5k phy0: can't reset hardware (-11)
Dec 10 11:59:26 eris kernel: [17580.657137] ------------[ cut here ]------------
Dec 10 11:59:26 eris kernel: [17580.657143] WARNING: at /home/rich/linux-backports-modules-2.6.27-2.6.27/debian/build/build-generic/compat-wireless-2.6/net/mac80211/main.c:227 ieee80211_hw_config+0xac/0xb0 [lbm_cw_mac80211]()
Dec 10 11:59:26 eris kernel: [17580.657151] Modules linked in: aes_x86_64 aes_generic binfmt_misc af_packet rfcomm bridge stp bnep sco l2cap bluetooth kvm_intel kvm kqemu ppdev autofs4 ipv6 acpi_cpufreq cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_conservative cpufreq_ondemand freq_table container pci_slot sbs sbshc iptable_filter ip_tables x_tables uvesafb sbp2 parport_pc lp parport pcmcia joydev thinkpad_acpi rfkill arc4 nvram sdhci_pci snd_hda_intel ecb crypto_blkcipher snd_pcm_oss sdhci psmouse snd_mixer_oss evdev serio_raw yenta_socket mmc_core snd_pcm rsrc_nonstatic pcmcia_core ricoh_mmc pcspkr ath5k iTCO_wdt iTCO_vendor_support snd_seq_dummy lbm_cw_mac80211 nvidia(P) snd_seq_oss battery lbm_cw_cfg80211 ac i2c_core led_class snd_seq_midi video output snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device wmi snd soundcore button snd_page_alloc shpchp pci_hotplug intel_agp ext3 jbd mbcache sr_mod cdrom sd_mod crc_t10dif sg ata_piix ata_generic ahci pata_acpi libata scsi_mod ohci1394
Dec 10 11:59:26 eris kernel: ieee1394 uhci_hcd ehci_hcd e1000e usbcore dock thermal processor fan fbcon tileblit font bitblit softcursor fuse
Dec 10 11:59:26 eris kernel: [17580.657353] Pid: 7541, comm: wpa_supplicant Tainted: P W 2.6.27-9-generic #1
Dec 10 11:59:26 eris kernel: [17580.657358]
Dec 10 11:59:26 eris kernel: [17580.657359] Call Trace:
Dec 10 11:59:26 eris kernel: [17580.657371] [<ffffffff8024e9b4>] warn_on_slowpath+0x64/0x90
Dec 10 11:59:26 eris kernel: [17580.657388] [<ffffffffa0a6d0bf>] ? ath5k_hw_reset+0xf1f/0x1150 [ath5k]
Dec 10 11:59:26 eris kernel: [17580.657397] [<ffffffff803a73d5>] ? __ratelimit+0xa5/0xf0
Dec 10 11:59:26 eris kernel: [17580.657404] [<ffffffff80471665>] ? net_ratelimit+0x15/0x20
Dec 10 11:59:26 eris kernel: [17580.657419] [<ffffffffa0a70292>] ? ath5k_reset+0x232/0x240 [ath5k]
Dec 10 11:59:26 eris kernel: [17580.657445] [<ffffffffa0a1540c>] ieee80211_hw_config+0xac/0xb0 [lbm_cw_mac80211]
Dec 10 11:59:26 eris kernel: [17580.657476] [<ffffffffa0a2ce0d>] ieee80211_set_freq+0x8d/0x90 [lbm_cw_mac80211]
Dec 10 11:59:26 eris kernel: [17580.657502] [<ffffffffa0a16135>] ieee80211_ioctl_siwfreq+0xf5/0x100 [lbm_cw_mac80211]
Dec 10 11:59:26 eris kernel: [17580.657512] [<ffffffff804ea6b6>] ioctl_standard_call+0x66/0xf0
Dec 10 11:59:26 eris kernel: [17580.657520] [<ffffffff804ea650>] ? ioctl_standard_call+0x0/0xf0
Dec 10 11:59:26 eris kernel: [17580.657528] [<ffffffff804ea250>] ? ioctl_private_call+0x0/0xb0
Dec 10 11:59:26 eris kernel: [17580.657535] [<ffffffff804ea8d0>] wireless_process_ioctl+0xe0/0x170
Dec 10 11:59:26 eris kernel: [17580.657543] [<ffffffff804ea250>] ? ioctl_private_call+0x0/0xb0
Dec 10 11:59:26 eris kernel: [17580.657550] [<ffffffff804ea650>] ? ioctl_standard_call+0x0/0xf0
Dec 10 11:59:26 eris kernel: [17580.657557] [<ffffffff804ea9cf>] wext_ioctl_dispatch+0x6f/0xb0
Dec 10 11:59:26 eris kernel: [17580.657565] [<ffffffff804eab26>] wext_handle_ioctl+0x46/0x90
Dec 10 11:59:26 eris kernel: [17580.657573] [<ffffffff80466b2f>] dev_ioctl+0x3cf/0x410
Dec 10 11:59:26 eris kernel: [17580.657581] [<ffffffff802e97d9>] ? do_sync_write+0xf9/0x140
Dec 10 11:59:26 eris kernel: [17580.657589] [<ffffffff8026afaf>] ? hrtimer_start+0xdf/0x1b0
Dec 10 11:59:26 eris kernel: [17580.657596] [<ffffffff80454f01>] sock_ioctl+0x91/0x280
Dec 10 11:59:26 eris kernel: [17580.657604] [<ffffffff802f8586>] vfs_ioctl+0x36/0xb0
Dec 10 11:59:26 eris kernel: [17580.657610] [<ffffffff802f8883>] do_vfs_ioctl+0x283/0x2f0
Dec 10 11:59:26 eris kernel: [17580.657616] [<ffffffff802f8991>] sys_ioctl+0xa1/0xb0
Dec 10 11:59:26 eris kernel: [17580.657625] [<ffffffff8021285a>] system_call_fastpath+0x16/0x1b
Dec 10 11:59:26 eris kernel: [17580.657630]
Dec 10 11:59:26 eris kernel: [17580.657634] ---[ end trace a6dd66994c3403c0 ]---
Dec 10 11:59:26 eris kernel: [17580.672186] wlan0: authenticate with AP 00:13:c3:59:48:91
Dec 10 11:59:26 eris kernel: [17580.702284] ath5k phy0: failed to wakeup the MAC Chip
Dec 10 11:59:26 eris kernel: [17580.702288] ath5k phy0: can't reset hardware (-5)
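For readers triaging similar logs: the number in parentheses is a negated Linux errno, so (-5) is EIO and (-11) is EAGAIN. A minimal sketch of decoding these lines follows; the function name and the small lookup table are illustrative assumptions, and only the two codes seen in this report are listed.

```python
import re

# Linux errno values (include/asm-generic/errno-base.h).
# Only the codes that appear in this report are listed here.
LINUX_ERRNO = {5: "EIO", 11: "EAGAIN"}

RESET_RE = re.compile(r"can't reset hardware \((-?\d+)\)")


def decode_reset_error(line):
    """Return the errno name for a "can't reset hardware" line, or None."""
    match = RESET_RE.search(line)
    if match is None:
        return None  # not a reset failure line
    code = abs(int(match.group(1)))
    return LINUX_ERRNO.get(code, "unknown")


print(decode_reset_error("ath5k phy0: can't reset hardware (-5)"))   # EIO
print(decode_reset_error("ath5k phy0: can't reset hardware (-11)"))  # EAGAIN
```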
Yup I've seen this before. This is ieee80211_hw_config complaining because we return failure from ->config. I guess we should detect this condition then disable hardware. At least then it doesn't fill up syslog with a zillion errors. Fact is, once we're in this state there's not much to be done. Card is totally hung and only a power cycle (e.g. suspend/resume cycle or reboot) seems to bring it back.
*** Bug 12175 has been marked as a duplicate of this bug. ***
Created attachment 19574 [details]
add a reset lock
Please try this patch. It won't change the fact that mac80211 will spew errors if config fails for some reason, but it should fix the root cause of hanging the device.
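The attachment itself is not reproduced in this thread, so the following is only a generic sketch of what serializing resets with a lock can look like, not the patch. The Device class, its counters, and run_concurrent_resets are hypothetical names introduced for illustration.

```python
import threading


class Device:
    """Hypothetical stand-in for hardware whose reset must not run concurrently."""

    def __init__(self):
        self._reset_lock = threading.Lock()
        self._in_reset = False
        self.resets = 0
        self.overlaps = 0  # times a reset started while another was running

    def reset(self):
        # Only one caller at a time may be inside the reset sequence.
        with self._reset_lock:
            if self._in_reset:
                self.overlaps += 1
            self._in_reset = True
            self.resets += 1
            self._in_reset = False


def run_concurrent_resets(n_threads=8, per_thread=100):
    """Hammer reset() from several threads and return the device for inspection."""
    dev = Device()

    def worker():
        for _ in range(per_thread):
            dev.reset()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dev
```

With the lock in place, every reset is counted and none overlap, whichever path (for example a config call or an interrupt-driven recovery) requested it.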
Patch seems to have resolved it - I can't swear to it, since the problem is sporadic, but I definitely haven't seen it crop up since I compiled+installed the patched driver a few days ago.
What's the status of this bug? Is this still an issue with a recent kernel?
Works fine for me unpatched in 2.6.31. I think it worked in 2.6.28, but I can't swear to it at present.
Fedora 12, 220.127.116.11-174.fc12.i686
The bug still exists on my system. I receive messages like this when downloading big files from the internet:
Jan 12 00:57:56 localhost kernel: ath5k phy0: noise floor calibration timeout (2422MHz)
Jan 12 00:58:46 localhost kernel: ath5k phy0: noise floor calibration timeout (2422MHz)
Jan 12 01:01:04 localhost kernel: ath5k phy0: noise floor calibration timeout (2422MHz)
Jan 12 01:01:06 localhost kernel: ath5k phy0: noise floor calibration timeout (2422MHz)
Jan 12 01:01:27 localhost kernel: ath5k phy0: noise floor calibration timeout (2422MHz)
Jan 12 01:01:28 localhost kernel: ath5k phy0: noise floor calibration timeout (2422MHz)
When this happens, the whole system slows down terribly. The program "top" does not show it. The mouse cursor begins to move jerkily, and the connection speed drops sharply (to about 30-35 KiB/s).
At the moment I cannot show you a debug log, because I need to recompile the kernel.
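Until a debug log is available, counting how often the timeout fires per device and frequency can already help compare systems. A minimal sketch, assuming the message format shown in the log lines above; the function name is illustrative.

```python
import re
from collections import Counter

# Matches lines such as:
#   ath5k phy0: noise floor calibration timeout (2422MHz)
NF_RE = re.compile(r"ath5k (\S+): noise floor calibration timeout \((\d+)MHz\)")


def count_nf_timeouts(lines):
    """Count noise floor calibration timeouts per (phy, frequency in MHz)."""
    counts = Counter()
    for line in lines:
        match = NF_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts
```

It could be fed the lines of a saved log file, for example count_nf_timeouts(open(path)), where path points at whatever file holds the kernel messages on the system in question.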
I've got the same problem (also Acer Aspire) with 2.6.33:
(I've created another bug, because at the beginning it seemed to be destroying
I see the same symptoms on my LG E500. When the card eventually fails to function, all I can do is switch the computer off and on again (rebooting does not solve the problem). Removing the module and loading it again makes the system go crazy (it eventually hard freezes).
Now I have compiled the ath5k module with debug support. Is there any specific debug info I could get to help solving this problem?
I have also seen that developers are sending lots of patches to ath5k-devel. How can I know whether they are related to this bug? What's the best way to test them: should I apply the patches myself, or do you maintain a repository with these patches already applied?
Thanks for your time.
To Cesar: It seems I have the same issue here:
To Thomas: It looks similar, although I wouldn't say it is exactly the same problem (at least, according to dmesg messages).
Hello, I have good news for ath5k (and bad news for my PC).
At least in my case, it seems to be a hardware problem, not a software problem.
I've been making some tests with Windows Vista, and I experience the same problems (after some hours working, the card stops working and I can only solve it by switching the computer off and on). So I wouldn't say it is an ath5k problem, but a hardware problem.
Cesar: And what if ath5k really corrupted the card? I saw some EEPROM errors in dmesg when those issues started, and there have been more cases of really corrupted EEPROMs in the past...
What card exactly do you have? (Product, manufacturer and lspci -vv, please.)
I have the same issues with a Wistron CM9, but I guess that such a popular card can't be so buggy...
Thomas: The card works fine for some hours before it stops working... I don't think it would work at all if the EEPROM were permanently damaged. However, I'm no expert in hardware or drivers...
The card is an internal one, in an LG E500 laptop.
ath5k phy0: Atheros AR2425 chip found (MAC: 0xe2, PHY: 0x70)
03:00.0 Ethernet controller: Atheros Communications Inc. AR5001 Wireless Network Adapter [168c:001c] (rev 01)
Subsystem: Device 1a3b:1026
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 32 bytes
Interrupt: pin A routed to IRQ 19
Region 0: Memory at fe1f0000 (64-bit, non-prefetchable) [size=64K]
Capabilities:  Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities:  Message Signalled Interrupts: Mask- 64bit- Queue=0/0 Enable-
Address: 00000000 Data: 0000
Capabilities:  Express (v1) Legacy Endpoint, MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ UncorrErr+ FatalErr- UnsuppReq+ AuxPwr- TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <512ns, L1 <64us
ClockPM- Suprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 128 bytes Disabled- Retrain- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
Capabilities:  MSI-X: Enable- Mask- TabSize=1
Vector table: BAR=0 offset=00000000
PBA: BAR=0 offset=00000000
Capabilities:  Advanced Error Reporting <?>
Capabilities:  Virtual Channel <?>
Kernel driver in use: ath5k
Kernel modules: ath5k
Still occurring on the latest kernel.
00:09.0 Ethernet controller: Atheros Communications Inc. Atheros AR5001X+ Wireless Network Adapter (rev 01)
Subsystem: Netgear Device 5e00
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 168 (2500ns min, 7000ns max), Cache Line Size: 32 bytes
Interrupt: pin A routed to IRQ 17
Region 0: Memory at d4800000 (32-bit, non-prefetchable) [size=64K]
Capabilities: <access denied>
Kernel driver in use: ath5k
Kernel modules: ath5k
Linux nautilus 3.3.2-1.fc16.i686 #1 SMP Sat Apr 14 01:11:09 UTC 2012 i686 i686 i386 GNU/Linux
This is on an internal PCI card.
If you need any additional information or traces let me know.
This bug is old. I would recommend testing on a newer kernel, the newest as of this writing
This bug relates to a very old kernel. Closing as obsolete.