Bug 12068
Summary: | ath5k: card eventually fails to function, spams dmesg with can't reset hardware errors | ||
---|---|---|---|
Product: | Networking | Reporter: | Rich Ercolani (rercola) |
Component: | Wireless | Assignee: | Luis Chamberlain (mcgrof) |
Status: | RESOLVED OBSOLETE | ||
Severity: | normal | CC: | alan, antoine, bem, chrisahrendt, dispiste, erik.andren, harviecz, mcgrof, me, mozilla_bugs, vitaliy.tokarev, xerofoify |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 3.3.2 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | add a reset lock |
Description
Rich Ercolani
2008-11-19 16:39:37 UTC
This is fixed in wireless-testing:

Author: Bob Copeland <me@bobcopeland.com>
Date: Mon Nov 3 22:14:00 2008 -0500

ath5k: correct handling of rx status fields

ath5k_rx_status fields rs_antenna and rs_more are u8s, but we were setting them with bitwise ANDs of 32-bit values. As a consequence, jumbo frames would not be discarded as intended. Then, because the hw rate value of such frames is zero, and, since "ath5k: rates cleanup", we do not fall back to the basic rate, such packets would trigger the following WARN_ON:

------------[ cut here ]------------
WARNING: at net/mac80211/rx.c:2192 __ieee80211_rx+0x4d/0x57e [mac80211]()
Modules linked in: ath5k af_packet sha256_generic aes_i586 aes_generic cbc loop i915 drm binfmt_misc acpi_cpu
Pid: 0, comm: swapper Tainted: G W 2.6.28-rc2-wl #14
Call Trace:
[<c0123d1e>] warn_on_slowpath+0x41/0x5b
[<c012005d>] ? sched_debug_show+0x31e/0x9c6
[<c012489f>] ? vprintk+0x369/0x389
[<c0309539>] ? _spin_unlock_irqrestore+0x54/0x58
[<c011cd8f>] ? try_to_wake_up+0x14f/0x15a
[<f81918cb>] __ieee80211_rx+0x4d/0x57e [mac80211]
[<f828872a>] ath5k_tasklet_rx+0x5a1/0x5e4 [ath5k]
[<c013b9cd>] ? clockevents_program_event+0xd4/0xe3
[<c01283a9>] tasklet_action+0x94/0xfd
[<c0127d19>] __do_softirq+0x8c/0x13e
[<c0127e04>] do_softirq+0x39/0x55
[<c0128082>] irq_exit+0x46/0x85
[<c010576c>] do_IRQ+0x9a/0xb2
[<c010461c>] common_interrupt+0x28/0x30
[<f80e934a>] ? acpi_idle_enter_bm+0x2ad/0x31b [processor]
[<c02976bf>] cpuidle_idle_call+0x65/0x9a
[<c010262c>] cpu_idle+0x76/0xa6
[<c02fb402>] rest_init+0x62/0x64

Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

This just needs to be propagated to 2.6.27 if not already done.

I'm not so sure - can you test the patch and report if it helps (it should apply to .27 with fuzz). I didn't send it for 2.6.27 since it was a rate cleanup in .28-rc1 that started the above warning. However, if we report packets with hwrate=1 for jumbo frames, then it could still trigger a separate warning in the pid rate controller.

Patch is here: http://bugzilla.kernel.org/show_bug.cgi?id=11901

I just pulled latest wireless-testing and ran it - it fixes can't reset hardware (as far as I've seen), but I still get warn_on_slowpath.

Call Trace:
<IRQ> [<ffffffff8024e9b4>] warn_on_slowpath+0x64/0x90
[<ffffffff802e1d4b>] ? __slab_alloc+0x24b/0x260
[<ffffffff803b353c>] ? map_single+0x1fc/0x280
[<ffffffff80234059>] ? __phys_addr+0x9/0x50
[<ffffffff803b383d>] ? swiotlb_map_single_attrs+0x6d/0xf0
[<ffffffff80232b80>] ? swiotlb_map_single_phys+0x0/0x20
[<ffffffffa02b0669>] lbm_cw___ieee80211_rx+0x1c9/0x630 [lbm_cw_mac80211]
[<ffffffff803b31f0>] ? unmap_single+0x140/0x160
[<ffffffffa02f0c6d>] ath5k_tasklet_rx+0x34d/0x5c0 [ath5k]
[<ffffffff802546c6>] tasklet_action+0x86/0x110
[<ffffffff80254d8c>] __do_softirq+0x8c/0x100
[<ffffffff8021417c>] call_softirq+0x1c/0x30
[<ffffffff80215875>] do_softirq+0x65/0xa0
[<ffffffff80254af5>] irq_exit+0x95/0xa0
[<ffffffff80215b1b>] do_IRQ+0x8b/0x100
[<ffffffff80212f0e>] ret_from_intr+0x0/0x29
<EOI> [<ffffffffa003ac09>] ? acpi_idle_enter_bm+0x287/0x2d7 [processor]
[<ffffffffa003ac01>] ? acpi_idle_enter_bm+0x27f/0x2d7 [processor]
[<ffffffff80272e39>] ? tick_nohz_get_sleep_length+0x9/0x30
[<ffffffff8044bb49>] ? cpuidle_idle_call+0xb9/0x100
[<ffffffff80210e95>] ? cpu_idle+0x75/0x110
[<ffffffff804f0536>] ? rest_init+0x66/0x70

This is different, re-opening the bug.

No, no, I lied. It doesn't fix can't reset hardware - it even spits out different kinds of can't reset hardware. Great.
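For readers following the commit message quoted earlier in this thread: the truncation it describes (a masked 32-bit value stored into a u8 field) can be reproduced outside the driver. The sketch below is not ath5k code. It uses Python's `ctypes.c_uint8` to mimic a C u8, and the mask value `STATUS_MORE` and both function names are assumptions made up for illustration; the real driver masks are not quoted in this report.

```python
import ctypes

# Hypothetical bit position for the "more descriptors follow" flag in a
# 32-bit rx status word. The real ath5k mask is not given in this thread.
STATUS_MORE = 0x00001000


def more_flag_truncated(status_word: int) -> int:
    """Buggy pattern: store the raw masked 32-bit value in a u8.

    Any mask bit above bit 7 is lost when narrowed to 8 bits, so the
    stored flag reads as 0 even when the bit was set.
    """
    return ctypes.c_uint8(status_word & STATUS_MORE).value


def more_flag_normalized(status_word: int) -> int:
    """One possible remedy: reduce the masked value to 0 or 1 first."""
    return ctypes.c_uint8(1 if status_word & STATUS_MORE else 0).value


if __name__ == "__main__":
    word = 0x00001234  # flag bit set
    print("truncated:", more_flag_truncated(word))
    print("normalized:", more_flag_normalized(word))
```

With the flag lost in this way, a check such as "discard the frame if rs_more is set" would never fire, which matches the commit message's statement that jumbo frames were not discarded as intended.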
Dec 10 11:59:26 eris kernel: [17580.657110] ath5k phy0: can't reset hardware (-11)
Dec 10 11:59:26 eris kernel: [17580.657137] ------------[ cut here ]------------
Dec 10 11:59:26 eris kernel: [17580.657143] WARNING: at /home/rich/linux-backports-modules-2.6.27-2.6.27/debian/build/build-generic/compat-wireless-2.6/net/mac80211/main.c:227 ieee80211_hw_config+0xac/0xb0 [lbm_cw_mac80211]()
Dec 10 11:59:26 eris kernel: [17580.657151] Modules linked in: aes_x86_64 aes_generic binfmt_misc af_packet rfcomm bridge stp bnep sco l2cap bluetooth kvm_intel kvm kqemu ppdev autofs4 ipv6 acpi_cpufreq cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_conservative cpufreq_ondemand freq_table container pci_slot sbs sbshc iptable_filter ip_tables x_tables uvesafb sbp2 parport_pc lp parport pcmcia joydev thinkpad_acpi rfkill arc4 nvram sdhci_pci snd_hda_intel ecb crypto_blkcipher snd_pcm_oss sdhci psmouse snd_mixer_oss evdev serio_raw yenta_socket mmc_core snd_pcm rsrc_nonstatic pcmcia_core ricoh_mmc pcspkr ath5k iTCO_wdt iTCO_vendor_support snd_seq_dummy lbm_cw_mac80211 nvidia(P) snd_seq_oss battery lbm_cw_cfg80211 ac i2c_core led_class snd_seq_midi video output snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device wmi snd soundcore button snd_page_alloc shpchp pci_hotplug intel_agp ext3 jbd mbcache sr_mod cdrom sd_mod crc_t10dif sg ata_piix ata_generic ahci pata_acpi libata scsi_mod ohci1394
Dec 10 11:59:26 eris kernel: ieee1394 uhci_hcd ehci_hcd e1000e usbcore dock thermal processor fan fbcon tileblit font bitblit softcursor fuse
Dec 10 11:59:26 eris kernel: [17580.657353] Pid: 7541, comm: wpa_supplicant Tainted: P W 2.6.27-9-generic #1
Dec 10 11:59:26 eris kernel: [17580.657358]
Dec 10 11:59:26 eris kernel: [17580.657359] Call Trace:
Dec 10 11:59:26 eris kernel: [17580.657371] [<ffffffff8024e9b4>] warn_on_slowpath+0x64/0x90
Dec 10 11:59:26 eris kernel: [17580.657388] [<ffffffffa0a6d0bf>] ? ath5k_hw_reset+0xf1f/0x1150 [ath5k]
Dec 10 11:59:26 eris kernel: [17580.657397] [<ffffffff803a73d5>] ? __ratelimit+0xa5/0xf0
Dec 10 11:59:26 eris kernel: [17580.657404] [<ffffffff80471665>] ? net_ratelimit+0x15/0x20
Dec 10 11:59:26 eris kernel: [17580.657419] [<ffffffffa0a70292>] ? ath5k_reset+0x232/0x240 [ath5k]
Dec 10 11:59:26 eris kernel: [17580.657445] [<ffffffffa0a1540c>] ieee80211_hw_config+0xac/0xb0 [lbm_cw_mac80211]
Dec 10 11:59:26 eris kernel: [17580.657476] [<ffffffffa0a2ce0d>] ieee80211_set_freq+0x8d/0x90 [lbm_cw_mac80211]
Dec 10 11:59:26 eris kernel: [17580.657502] [<ffffffffa0a16135>] ieee80211_ioctl_siwfreq+0xf5/0x100 [lbm_cw_mac80211]
Dec 10 11:59:26 eris kernel: [17580.657512] [<ffffffff804ea6b6>] ioctl_standard_call+0x66/0xf0
Dec 10 11:59:26 eris kernel: [17580.657520] [<ffffffff804ea650>] ? ioctl_standard_call+0x0/0xf0
Dec 10 11:59:26 eris kernel: [17580.657528] [<ffffffff804ea250>] ? ioctl_private_call+0x0/0xb0
Dec 10 11:59:26 eris kernel: [17580.657535] [<ffffffff804ea8d0>] wireless_process_ioctl+0xe0/0x170
Dec 10 11:59:26 eris kernel: [17580.657543] [<ffffffff804ea250>] ? ioctl_private_call+0x0/0xb0
Dec 10 11:59:26 eris kernel: [17580.657550] [<ffffffff804ea650>] ? ioctl_standard_call+0x0/0xf0
Dec 10 11:59:26 eris kernel: [17580.657557] [<ffffffff804ea9cf>] wext_ioctl_dispatch+0x6f/0xb0
Dec 10 11:59:26 eris kernel: [17580.657565] [<ffffffff804eab26>] wext_handle_ioctl+0x46/0x90
Dec 10 11:59:26 eris kernel: [17580.657573] [<ffffffff80466b2f>] dev_ioctl+0x3cf/0x410
Dec 10 11:59:26 eris kernel: [17580.657581] [<ffffffff802e97d9>] ? do_sync_write+0xf9/0x140
Dec 10 11:59:26 eris kernel: [17580.657589] [<ffffffff8026afaf>] ? hrtimer_start+0xdf/0x1b0
Dec 10 11:59:26 eris kernel: [17580.657596] [<ffffffff80454f01>] sock_ioctl+0x91/0x280
Dec 10 11:59:26 eris kernel: [17580.657604] [<ffffffff802f8586>] vfs_ioctl+0x36/0xb0
Dec 10 11:59:26 eris kernel: [17580.657610] [<ffffffff802f8883>] do_vfs_ioctl+0x283/0x2f0
Dec 10 11:59:26 eris kernel: [17580.657616] [<ffffffff802f8991>] sys_ioctl+0xa1/0xb0
Dec 10 11:59:26 eris kernel: [17580.657625] [<ffffffff8021285a>] system_call_fastpath+0x16/0x1b
Dec 10 11:59:26 eris kernel: [17580.657630]
Dec 10 11:59:26 eris kernel: [17580.657634] ---[ end trace a6dd66994c3403c0 ]---
Dec 10 11:59:26 eris kernel: [17580.672186] wlan0: authenticate with AP 00:13:c3:59:48:91
Dec 10 11:59:26 eris kernel: [17580.702284] ath5k phy0: failed to wakeup the MAC Chip
Dec 10 11:59:26 eris kernel: [17580.702288] ath5k phy0: can't reset hardware (-5)

Yup I've seen this before. This is ieee80211_hw_config complaining because we return failure from ->config. I guess we should detect this condition then disable hardware. At least then it doesn't fill up syslog with a zillion errors. Fact is, once we're in this state there's not much to be done. Card is totally hung and only a power cycle (e.g. suspend/resume cycle or reboot) seems to bring it back.

*** Bug 12175 has been marked as a duplicate of this bug. ***

Created attachment 19574 [details]
add a reset lock
Please try this patch. It won't change the fact that mac80211 will spew errors if config fails for some reason, but it should fix the root cause of hanging the device.
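
The attachment itself is not reproduced in this thread, so what follows is only a sketch of the general idea its title suggests: serializing the reset sequence so that two callers cannot interleave their steps. It is written in Python with `threading.Lock` standing in for a kernel lock, and `FakeRadio`, `reset`, and `hammer` are hypothetical names that do not come from ath5k or from the patch.

```python
import threading
import time


class FakeRadio:
    """Stand-in for a device whose reset sequence must not be interleaved."""

    def __init__(self) -> None:
        self._reset_lock = threading.Lock()
        self._in_reset = False
        self.resets = 0     # completed reset sequences
        self.overlaps = 0   # times a reset started while another was running

    def _reset_sequence(self) -> None:
        # Multi-step sequence; entering it twice at once is the failure
        # this sketch is meant to rule out.
        if self._in_reset:
            self.overlaps += 1
        self._in_reset = True
        time.sleep(0.0005)  # pretend to program registers
        self._in_reset = False
        self.resets += 1

    def reset(self) -> None:
        # Only one caller at a time may run the sequence.
        with self._reset_lock:
            self._reset_sequence()


def hammer(radio: FakeRadio, callers: int = 4, rounds: int = 25) -> None:
    """Call reset() from several threads at once."""
    def worker() -> None:
        for _ in range(rounds):
            radio.reset()

    threads = [threading.Thread(target=worker) for _ in range(callers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    radio = FakeRadio()
    hammer(radio)
    print("resets:", radio.resets, "overlaps:", radio.overlaps)
```

Because every path into `_reset_sequence` goes through the same lock, the overlap counter cannot increase. Whether the actual patch uses a mutex, a spinlock, or something else is not stated in this report.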
Patch seems to have resolved it - I can't swear to it, since the problem is sporadic, but I definitely haven't seen it crop up since I compiled+installed the patched driver a few days ago.

What's the status of this bug? Is this still an issue with a recent kernel?

Works fine for me unpatched in 2.6.31. I think it worked in 2.6.28, but I can't swear to it at present.

Fedora 12, 2.6.31.9-174.fc12.i686

The bug still exists on my system. I receive messages like this when downloading big files from the internet:

Jan 12 00:57:56 localhost kernel: ath5k phy0: noise floor calibration timeout (2422MHz)
Jan 12 00:58:46 localhost kernel: ath5k phy0: noise floor calibration timeout (2422MHz)
Jan 12 01:01:04 localhost kernel: ath5k phy0: noise floor calibration timeout (2422MHz)
Jan 12 01:01:06 localhost kernel: ath5k phy0: noise floor calibration timeout (2422MHz)
Jan 12 01:01:27 localhost kernel: ath5k phy0: noise floor calibration timeout (2422MHz)
Jan 12 01:01:28 localhost kernel: ath5k phy0: noise floor calibration timeout (2422MHz)

When this happens, the system slows down terribly, and it affects the whole system. The program "top" cannot show it. The mouse cursor begins to move jerkily. There is a strong decrease in connection speed (to about 30-35KiB/s). I cannot show you a debug log at the moment, because I need to recompile the kernel first.

I've got the same problem (also Acer Aspire) with 2.6.33: https://bugzilla.kernel.org/show_bug.cgi?id=ath5k-wakeup (I've created another bug, because at the beginning it seemed to be destroying EEPROM...)

I suffer the same symptoms on my LG E500. When the card eventually fails to function, all I can do is switch the computer off and on again (rebooting does not solve the problem). Removing the module and loading it again makes the system go crazy (it eventually hard freezes). Now I have compiled the ath5k module with debug support. Is there any specific debug info I could get to help solve this problem? I have also seen that developers are sending lots of patches to ath5k-devel. How can I know if they are related to this bug? What's the best way to test them: should I apply the patches myself, or do you maintain a repository with these patches already applied? Thanks for your time.

To Cesar: Seems I have the same issue here: https://bugzilla.kernel.org/show_bug.cgi?id=15843

To Thomas: It looks similar, although I wouldn't say it is exactly the same problem (at least, according to the dmesg messages).

Hello, I have good news for ath5k (and bad news for my PC). At least in my case, it seems to be a hardware problem, not a software problem. I've been running some tests with Windows Vista, and I experience the same problems (after some hours of use, the card stops working and I can only solve it by switching the computer off and on). So I wouldn't say it is an ath5k problem, but a hardware problem.

Cesar: And what if ath5k really corrupted the card? I saw some EEPROM errors in dmesg when those issues started, and there were more cases of really corrupted EEPROM in the past... What card exactly do you have? (product, manufacturer and lspci -vv please) I have the same issues with a Wistron CM9, but I guess that such a popular card can't be so buggy...

Thomas: The card works fine for some hours before it stops working... I don't think it would work at all if the EEPROM were permanently damaged. However, I'm not an expert at all in hardware or drivers... The card is an internal one, in an LG E500 laptop.

Dmesg says:

ath5k phy0: Atheros AR2425 chip found (MAC: 0xe2, PHY: 0x70)

lspci -vv:

03:00.0 Ethernet controller: Atheros Communications Inc. AR5001 Wireless Network Adapter [168c:001c] (rev 01)
    Subsystem: Device 1a3b:1026
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 32 bytes
    Interrupt: pin A routed to IRQ 19
    Region 0: Memory at fe1f0000 (64-bit, non-prefetchable) [size=64K]
    Capabilities: [40] Power Management version 2
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
        Status: D0 PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [50] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 Enable-
        Address: 00000000 Data: 0000
    Capabilities: [60] Express (v1) Legacy Endpoint, MSI 00
        DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
        DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
            MaxPayload 128 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr+ UncorrErr+ FatalErr- UnsuppReq+ AuxPwr- TransPend-
        LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <512ns, L1 <64us
            ClockPM- Suprise- LLActRep- BwNot-
        LnkCtl: ASPM Disabled; RCB 128 bytes Disabled- Retrain- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
    Capabilities: [90] MSI-X: Enable- Mask- TabSize=1
        Vector table: BAR=0 offset=00000000
        PBA: BAR=0 offset=00000000
    Capabilities: [100] Advanced Error Reporting <?>
    Capabilities: [140] Virtual Channel <?>
    Kernel driver in use: ath5k
    Kernel modules: ath5k

Still occurring on the latest kernel.

00:09.0 Ethernet controller: Atheros Communications Inc. Atheros AR5001X+ Wireless Network Adapter (rev 01)
    Subsystem: Netgear Device 5e00
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 168 (2500ns min, 7000ns max), Cache Line Size: 32 bytes
    Interrupt: pin A routed to IRQ 17
    Region 0: Memory at d4800000 (32-bit, non-prefetchable) [size=64K]
    Capabilities: <access denied>
    Kernel driver in use: ath5k
    Kernel modules: ath5k

uname -a:

Linux nautilus 3.3.2-1.fc16.i686 #1 SMP Sat Apr 14 01:11:09 UTC 2012 i686 i686 i386 GNU/Linux

This is on an internal PCI card. If you need any additional information or traces, let me know.

This bug is old. I would recommend testing on a newer kernel; the newest as of this writing is 3.15.1.
Cheers,
Nick

This bug relates to a very old kernel. Closing as obsolete.