Latest working kernel version: n/a
Earliest failing kernel version: up to now
Distribution: fedora, vanilla

Hardware Environment:
08:04.0 Ethernet controller: Atheros Communications Inc. AR2413 802.11bg NIC (rev 01)
        Subsystem: AMBIT Microsystem Corp. Unknown device 0418
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 168 (2500ns min, 7000ns max), Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 21
        Region 0: Memory at c0200000 (32-bit, non-prefetchable) [size=64K]
        Capabilities: [44] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=2 PME-
        Kernel driver in use: ath5k_pci
        Kernel modules: ath5k

Software Environment: installed are
system-config-network-tui-1.5.10-1.fc9.noarch
system-config-network-1.5.10-1.fc9.noarch
kdenetwork-4.1.3-1.fc9.i386
kdenetwork-libs-4.1.3-1.fc9.i386
NetworkManager-gnome-0.7.0-0.11.svn4022.4.fc9.i386
NetworkManager-0.7.0-0.11.svn4022.4.fc9.i386
NetworkManager-glib-0.7.0-0.11.svn4022.4.fc9.i386

Problem Description:
I get thousands of messages like:

ath5k phy0: gain calibration timeout (2412MHz)
ath5k phy0: unable to reset hardware: -11

and after removing the driver I cannot reinsert it. Only a hardware restart helps. I can still make a wireless connection, but if it is lost I need to restart the computer.
The messages that I got are:

ath5k_pci 0000:08:04.0: PCI INT A -> GSI 21 (level, low) -> IRQ 21
ath5k_pci 0000:08:04.0: registered as 'phy0'
ath5k phy0: Atheros AR2413 chip found (MAC: 0x78, PHY: 0x45)
device-mapper: multipath: version 1.0.5 loaded
ath5k phy0: noise floor calibration timeout (2422MHz)
ath5k phy0: noise floor calibration timeout (2437MHz)
ath5k phy0: ath5k_chan_set: unable to reset channel (2437 Mhz)
ath5k phy0: gain calibration timeout (2412MHz)
ath5k phy0: ath5k_chan_set: unable to reset channel (2412 Mhz)
ath5k phy0: gain calibration timeout (2422MHz)
ath5k phy0: ath5k_chan_set: unable to reset channel (2422 Mhz)
ath5k phy0: gain calibration timeout (2422MHz)
ath5k phy0: can't reset hardware (-11)
ath5k phy0: noise floor calibration timeout (2422MHz)
ath5k phy0: gain calibration timeout (2422MHz)
ath5k phy0: can't reset hardware (-11)
ath5k phy0: gain calibration timeout (2412MHz)
ath5k phy0: ath5k_chan_set: unable to reset channel (2412 Mhz)
ath5k phy0: gain calibration timeout (2417MHz)
ath5k phy0: ath5k_chan_set: unable to reset channel (2417 Mhz)
ath5k phy0: noise floor calibration failed (2417MHz)
ath5k phy0: gain calibration timeout (2422MHz)
ath5k phy0: ath5k_chan_set: unable to reset channel (2422 Mhz)
ath5k phy0: gain calibration timeout (2427MHz)
ath5k phy0: ath5k_chan_set: unable to reset channel (2427 Mhz)
ath5k phy0: gain calibration timeout (2432MHz)
ath5k phy0: gain calibration timeout (2412MHz)
ath5k phy0: ath5k_chan_set: unable to reset channel (2412 Mhz)
ath5k phy0: gain calibration timeout (2417MHz)
ath5k phy0: ath5k_chan_set: unable to reset channel (2417 Mhz)
ath5k phy0: gain calibration timeout (2422MHz)
ath5k phy0: ath5k_chan_set: unable to reset channel (2422 Mhz)
ath5k phy0: gain calibration timeout (2427MHz)
ath5k phy0: ath5k_chan_set: unable to reset channel (2427 Mhz)
ath5k phy0: gain calibration timeout (2432MHz)
ath5k phy0: ath5k_chan_set: unable to reset channel (2432 Mhz)
ath5k phy0: gain calibration timeout (2422MHz)
ath5k phy0: can't reset hardware (-11)
ath5k phy0: gain calibration timeout (2412MHz)
ath5k phy0: ath5k_chan_set: unable to reset channel (2412 Mhz)
ath5k phy0: gain calibration timeout (2417MHz)
ath5k phy0: ath5k_chan_set: unable to reset channel (2417 Mhz)
ath5k phy0: gain calibration timeout (2422MHz)
ath5k phy0: ath5k_chan_set: unable to reset channel (2422 Mhz)
ath5k phy0: gain calibration timeout (2427MHz)
ath5k phy0: ath5k_chan_set: unable to reset channel (2427 Mhz)
ath5k phy0: gain calibration timeout (2462MHz)
ath5k phy0: ath5k_chan_set: unable to reset channel (2462 Mhz)
ath5k phy0: gain calibration timeout (2422MHz)
ath5k phy0: ath5k_chan_set: unable to reset channel (2422 Mhz)
ath5k phy0: noise floor calibration timeout (2422MHz)
ath5k phy0: gain calibration timeout (2412MHz)
ath5k phy0: ath5k_chan_set: unable to reset channel (2412 Mhz)
ath5k phy0: gain calibration timeout (2417MHz)
ath5k phy0: ath5k_chan_set: unable to reset channel (2417 Mhz)
ath5k phy0: gain calibration timeout (2422MHz)
ath5k phy0: ath5k_chan_set: unable to reset channel (2422 Mhz)
ath5k phy0: gain calibration timeout (2427MHz)
ath5k phy0: ath5k_chan_set: unable to reset channel (2427 Mhz)
ath5k phy0: gain calibration timeout (2432MHz)
ath5k phy0: ath5k_chan_set: unable to reset channel (2432 Mhz)
ath5k phy0: gain calibration timeout (2437MHz)
ath5k phy0: ath5k_chan_set: unable to reset channel (2437 Mhz)
ath5k phy0: gain calibration timeout (2442MHz)
ath5k phy0: ath5k_chan_set: unable to reset channel (2442 Mhz)
ath5k phy0: gain calibration timeout (2447MHz)
ath5k phy0: ath5k_chan_set: unable to reset channel (2447 Mhz)
ath5k phy0: gain calibration timeout (2452MHz)
ath5k phy0: ath5k_chan_set: unable to reset channel (2452 Mhz)
ath5k phy0: gain calibration timeout (2457MHz)
ath5k phy0: ath5k_chan_set: unable to reset channel (2457 Mhz)
ath5k phy0: gain calibration timeout (2412MHz)
ath5k phy0: ath5k_chan_set: unable to reset channel (2412 Mhz)
ath5k phy0: gain calibration timeout (2417MHz)
ath5k phy0: ath5k_chan_set: unable to reset channel (2417 Mhz)
ath5k phy0: gain calibration timeout (2422MHz)
ath5k phy0: ath5k_chan_set: unable to reset channel (2422 Mhz)
ath5k phy0: gain calibration timeout (2427MHz)
ath5k phy0: ath5k_chan_set: unable to reset channel (2427 Mhz)
ath5k phy0: gain calibration timeout (2432MHz)
ath5k phy0: ath5k_chan_set: unable to reset channel (2432 Mhz)
ath5k phy0: noise floor calibration timeout (2422MHz)
ath5k phy0: noise floor calibration timeout (2422MHz)
ath5k phy0: noise floor calibration timeout (2422MHz)
ath5k phy0: noise floor calibration timeout (2422MHz)
ath5k phy0: gain calibration timeout (2422MHz)
ath5k phy0: unable to reset hardware: -11
ath5k phy0: gain calibration timeout (2422MHz)
ath5k phy0: unable to reset hardware: -11
ath5k phy0: gain calibration timeout (2422MHz)
ath5k phy0: unable to reset hardware: -11
ath5k phy0: gain calibration timeout (2422MHz)
ath5k phy0: unable to reset hardware: -11
ath5k_pci 0000:08:04.0: PCI INT A disabled
ath5k_pci 0000:08:04.0: PCI INT A -> GSI 21 (level, low) -> IRQ 21
ath5k_pci 0000:08:04.0: registered as 'phy1'
ath5k phy1: Atheros AR2413 chip found (MAC: 0x78, PHY: 0x45)
ath5k phy1: gain calibration timeout (2412MHz)
ath5k phy1: unable to reset hardware: -11
ath5k phy1: gain calibration timeout (2412MHz)
ath5k phy1: unable to reset hardware: -11
ath5k phy1: gain calibration timeout (2412MHz)
ath5k phy1: unable to reset hardware: -11
ath5k_pci 0000:08:04.0: PCI INT A disabled
ath5k_pci 0000:08:04.0: PCI INT A -> GSI 21 (level, low) -> IRQ 21
ath5k_pci 0000:08:04.0: registered as 'phy2'
ath5k phy2: gain calibration timeout (2412MHz)
ath5k phy2: unable to reset hardware: -11
ath5k phy2: Atheros AR2413 chip found (MAC: 0x78, PHY: 0x45)
ath5k phy2: gain calibration timeout (2412MHz)
ath5k phy2: unable to reset hardware: -11
ath5k phy2: gain calibration timeout (2412MHz)
ath5k phy2: unable to reset hardware: -11
ath5k_pci 0000:08:04.0: PCI INT A disabled
ath5k_pci 0000:08:04.0: PCI INT A -> GSI 21 (level, low) -> IRQ 21
ath5k_pci 0000:08:04.0: registered as 'phy3'
ath5k phy3: Atheros AR2413 chip found (MAC: 0x78, PHY: 0x45)
ath5k phy3: gain calibration timeout (2412MHz)
ath5k phy3: unable to reset hardware: -11
ath5k phy3: gain calibration timeout (2412MHz)
ath5k phy3: unable to reset hardware: -11
ath5k_pci 0000:08:04.0: PCI INT A disabled
ath5k_pci 0000:08:04.0: PCI INT A -> GSI 21 (level, low) -> IRQ 21
ath5k_pci 0000:08:04.0: registered as 'phy4'
ath5k phy4: Atheros AR2413 chip found (MAC: 0x78, PHY: 0x45)
ath5k phy4: gain calibration timeout (2412MHz)
ath5k phy4: unable to reset hardware: -11
ath5k phy4: gain calibration timeout (2412MHz)
ath5k phy4: unable to reset hardware: -11
ath5k_pci 0000:08:04.0: PCI INT A disabled
ath5k_pci 0000:08:04.0: PCI INT A -> GSI 21 (level, low) -> IRQ 21
ath5k_pci 0000:08:04.0: registered as 'phy5'
ath5k phy5: Atheros AR2413 chip found (MAC: 0x78, PHY: 0x45)
ath5k phy5: gain calibration timeout (2412MHz)
ath5k phy5: unable to reset hardware: -11
ath5k phy5: gain calibration timeout (2412MHz)
ath5k phy5: unable to reset hardware: -11

Steps to reproduce:
This is on an Acer Aspire 5051awxmi notebook with an Atheros AR5BMB5 (this is what the label says). I think this is an AR5005. madwifi works fine.
Please try compat-wireless to see if this is still an issue for you on wireless-testing: http://wireless.kernel.org/en/users/Download
This looks similar to what I'm seeing on the Acer Aspire One, using Ubuntu Intrepid, including the linux-backports-modules-intrepid package, which iiuc is a packaging of compat-wireless. For me it happens only occasionally on initial power-on, but somewhat more frequently after a reboot than after a power cycle.
I've seen it for quite a while, though it is much less frequent these days. A suspend-resume cycle will also recover the card. "unable to reset hardware: -11" means the card is fully hung at that point and needs powering down. It would be helpful if you can consistently reproduce this on the latest compat-wireless.
Created attachment 19109 [details]
dmesg with kernel panic backtrace

I tried compat-wireless-02-12-2008 and this is the result: a kernel panic. At first it worked fine and gave me a stronger wireless signal, but when I tried to remove the driver for a second time I got lots of kernel panics (see the file), and then I couldn't do anything else. I tried to reinsert the driver two more times, but without luck.
Hey, it's been more than 10 days and there is no message here. If I can somehow help, let me know.
The problem is, while I see it happen, I cannot consistently reproduce it. Are you able to? We can fix the backtraces but can't yet fix the underlying problem without knowing which sequence is hanging the card. By the way, AFAICT this is the same bug as http://bugzilla.kernel.org/show_bug.cgi?id=12068.
Yes, the bug seems very similar to this one (he gets error -5, I get -11). I cannot reproduce it either. Maybe it is something connected to the ACPI system. Maybe ACPI can try to wake up the card, or just prevent it from entering this sleep state. Since a power reboot can correct this, maybe you can try to make the card go through a "fake" power reboot so that it can wake up. These are just ideas :)
Well, I guess one thing to try is debugging what actually causes the reset. That can be had by:

$ echo intr > /debug/ath5k/phy0/debug
$ echo beacon > /debug/ath5k/phy0/debug

That will add a whole lot of debugging messages to the syslog, but it will tell us which interrupt preceded the reset (probably INT_FATAL, which also won't tell much, but at least that would narrow things a bit). Or, for less debugging output, put a printk directly in ath5k_intr() in the various cases that reschedule restq.
OK, I can try it, but where should I look for /debug/ath5k/phy0/debug? Are you talking about the source code? Is there nothing like this in /proc? If this is in the source, how can I recompile only the ath5k module? I don't want to recompile the whole of compat-wireless, because I don't need the other drivers and it takes about 15 minutes.
You just need to be sure that ath5k is compiled with debug support, and mount debugfs somewhere (as root: mkdir /debug && mount -t debugfs none /debug). You can edit config.mk(?) to only build ath5k (and mac80211 etc) if you want to rebuild it. Without looking, I'm not 100% sure of the details wrt compat-wireless. Another thought occurs to me - we often hit this during a scan when mac80211 calls ->config() to set the channels. That could be a coincidence (since this just happens to invoke reset which could generally be broken), or it could be racing with other hardware code. I'm going to hack up some code to torture test reset, maybe I can get it in the bad state more reliably.
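The debugfs steps described above can be collected into a small script. This is only a sketch, assuming debugfs is mounted at /debug as in the command above and that ath5k was built with debug support. The helper name enable_ath5k_debug is made up for illustration and is not part of the driver or of compat-wireless.

```shell
#!/bin/sh
# Hypothetical helper: write each debug level name to the driver's debug file.
# Assumption: on a real debugfs mount each write switches the named level.
enable_ath5k_debug() {
    debug_file=$1
    shift
    if [ ! -w "$debug_file" ]; then
        echo "cannot write $debug_file (debugfs mounted? ath5k debug enabled?)" >&2
        return 1
    fi
    for level in "$@"; do
        echo "$level" > "$debug_file" || return 1
    done
}

# Usage as root (paths taken from the comments above):
#   mkdir -p /debug && mount -t debugfs none /debug
#   enable_ath5k_debug /debug/ath5k/phy0/debug intr beacon
#   dmesg | tail
```

The check up front only catches the two common mistakes (debugfs not mounted, driver built without debug support); it does not verify that the level names are accepted.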
I'll try it with compat-wireless-2008-12-10 and report back.
I have also encountered this bug, and I did turn on debugging.
Created attachment 19316 [details]
debug messages 1

This is the part of the dmesg that contains the debug messages. I tried about 15 times until I managed to 'lock' the card. This is on fedora-kernel-2.6.27.9-69.fc9.i686 with compat-wireless-2008-12-11. I've marked the places in the file that may be of interest.
Created attachment 19317 [details]
debug messages 2

Here are other debug messages. It is not that easy to lock the card. I've marked some interesting places in the file. The message that looks strange to me is:

wlan0: Failed to config new BSSID to the low-level driver

After this I need to do a hardware reset. Let me know if more info is needed.
Created attachment 19318 [details]
card registers when locked

These are the card registers when it is 'locked'.
Ok, well it's definitely doing a config() when setting bssid, then a scan happens. interrupt before that is TXDESC, looks harmless. I also changed calib_tim to 1 second and made it always reset; that seemed to worsen things to where I could frequently get lockup. It can spend 20s trying to calibrate the noise level so wouldn't be surprised if it races with itself in reset.
Is there a position in the registers that shows when the card is working normally? If so, we can check the corresponding flag before a reset and avoid the endless loop.
Is there something else I can do to help with the debugging or testing of this?
For now, no... I think it is clear that reset with config() is problematic, and with some changes I can reproduce it pretty frequently now. It might just be a matter of locking the sc mutex within config, or scheduling the changes to happen inside restq.
I was looking into the change log and saw this:

commit 5a3503abfc5a2e51a27c0b28339e04b24cedad60
ath5k: Update interrupt masking code

commit 994d90627030722ff38ef134907c7b3c7d3aebae
ath5k: Clean up eeprom parsing and add missing calibration data

commit 23c401574b16cb2b6d2231ba405ebf85b8c87de5
ath5k: ignore the return value of ath5k_hw_noise_floor_calibration

Maybe the last one can help to resolve the problem. If the card needs to call some reset function because of the noise floor calibration, then this commit makes it ignore the message. Some new bits from the legacy HAL have also been added. Are these in compat-wireless-2008-12-20.tar.bz2 so that I can test them?
This is what I get with compat-wireless-2008-12-11 on 2.6.27.9-74.fc9.i686:

ath5k phy0: noise floor calibration timeout (2422MHz)
--- this message repeats 32 times ---
ath5k phy0: noise floor calibration timeout (2422MHz)
ath5k phy0: failed to warm reset the MAC Chip
ath5k phy0: can't reset hardware (-5)
ath5k phy0: noise floor calibration timeout (2422MHz)
--- this message repeats 40 times ---
ath5k phy0: failed to warm reset the MAC Chip
ath5k phy0: can't reset hardware (-5)

This is a flood of calibration errors, but there is a new one: "failed to warm reset the MAC Chip".
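Rather than counting repeats by hand, a flood like the one above can be tallied with a short filter. This is a sketch only: summarize_ath5k_log is a hypothetical name, and it assumes the message layout seen in this report ("ath5k phyN: text (2422MHz)").

```shell
#!/bin/sh
# Hypothetical helper: count ath5k messages on stdin by kind,
# ignoring the phy number and the per-channel frequency.
summarize_ath5k_log() {
    awk '
        /ath5k phy[0-9]+: / {
            msg = $0
            sub(/^.*ath5k phy[0-9]+: /, "", msg)       # drop timestamp and device prefix
            sub(/ *\([0-9]+ ?[Mm][Hh]z\)$/, "", msg)   # drop the trailing frequency
            count[msg]++
        }
        END { for (m in count) printf "%d %s\n", count[m], m }
    ' | sort -k1,1nr -k2
}

# Usage:
#   dmesg | summarize_ath5k_log
```

Error codes such as (-5) and (-11) are left in the message text on purpose, so the two failure variants are counted separately.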
Indeed, I thought 23c401574b16cb2b6d2231ba405ebf85b8c87de5 had gone to 2.6.28 already. You can still get hangs with the card but then you usually get the -5 error.
OK, this means that the problem is still unsolved. How can I help? I'm not a programmer, but not a newbie either. Who should/will take care of this? Is there a difference between error -11 and -5? If the problem is in the reset routine, maybe Atheros should give a hint. I haven't experienced many problems with the madwifi driver based on the proprietary HAL.
I am working on it... there's no real difference, it comes down to trying to write to a register, trying to read back the result, then the card starts returning junk (all ones). We return different error codes based on when the read fails.
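For readers of the logs: the numbers in "can't reset hardware (-11)" and "(-5)" are negated Linux errno values, 11 being EAGAIN and 5 being EIO. A minimal lookup, with the hypothetical name errno_name and covering only the two values that appear in this report:

```shell
#!/bin/sh
# Hypothetical helper: map the return codes seen in this report to errno names.
# Accepts the value with or without the leading minus sign.
errno_name() {
    case "${1#-}" in
        5)  echo EIO ;;      # I/O error
        11) echo EAGAIN ;;   # resource temporarily unavailable, try again
        *)  echo unknown ;;
    esac
}

# Usage:
#   errno_name -11
```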
Thank you for this. Actually, after a cold start-up I got these:

ath5k phy0: gain calibration timeout (2412MHz)
ath5k phy0: can't reset hardware (-11)
ath5k phy0: gain calibration timeout (2417MHz)
ath5k phy0: can't reset hardware (-11)
ath5k phy0: gain calibration timeout (2422MHz)
ath5k phy0: can't reset hardware (-11)
ath5k phy0: gain calibration timeout (2427MHz)
ath5k phy0: can't reset hardware (-11)
ath5k phy0: gain calibration timeout (2432MHz)
ath5k phy0: can't reset hardware (-11)

I couldn't make any connection :)

And there is something else: I have another problem that at first looks to have nothing to do with the Atheros card. Sometimes my screen (!) locks up and only a hardware reset helps. I filed bugs against the X server, the ATI driver etc., but nothing helps (kernel logs and X server logs show nothing). Everything works fine only if I use the compat-wireless package or the madwifi package. With madwifi I've never experienced such problems.

Two days ago I updated from Fedora kernel 2.6.27.9-69 to 2.6.27.9-74, and the problems occur only when I'm using the default (in Fedora 9) ath5k driver. After recompiling ath5k for the latest kernel everything is fine (the lockups occurred every time until I managed to finish the recompilation and load the new ath5k). This sounds insane, but over the last 4 months I've tried different kernels and ATI drivers/X servers. While the ath5k driver is loaded I get my corrupted display in 75% of all cases. I'm almost 100% sure that this comes from the card, because after inserting compat-wireless-2008-12-11 I've not experienced any lockups even with my old kernel. Before this I got the lockups in almost 75% of all cases. Therefore I think that the current ath5k driver hijacks some IRQs in such a way that it totally locks my machine. Is this possible?

I'd like to enable full debugging for this but send the output to a separate file so that I can easily post it here. How can I send the debug output to a separate file?
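On the last question, one simple way to get the driver messages into their own file is to filter the kernel log. This is a sketch, not something suggested by the developers in this thread; save_ath5k_log is a made-up helper name and /tmp/ath5k.log is just an example path.

```shell
#!/bin/sh
# Hypothetical helper: copy only the ath5k lines from stdin into a file.
save_ath5k_log() {
    out_file=$1
    # grep exits non-zero when nothing matched; report that instead of failing
    grep 'ath5k' > "$out_file" || echo "no ath5k messages found" >&2
}

# Usage:
#   dmesg | save_ath5k_log /tmp/ath5k.log
```

This only captures what is still in the kernel ring buffer, so with heavy debug output it is worth running it soon after the problem appears.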
Created attachment 19458 [details]
atheros log messages

I hope this can shed some light on this problem. This is from Fedora 9 kernel 2.6.27.9-74 with compat-wireless-2008-12-21. The kernel actually initializes the card (at least some of the cfg80211 routines go through), and then the ath5k driver cannot set it up. This is from a cold start-up.
Created attachment 19475 [details]
Use a spinlock around ath5k_hw_reset

Please try this patch. I haven't had a chance to really test it other than to see that it didn't lock up immediately.
I applied the patch against compat-wireless-2008-12-21 by running the following in the ath5k directory:

patch -b < bobcopeland.diff

I will give feedback soon. Thank you,
I still haven't tested it. My problem is that the lockup occurs randomly, and the best way to reproduce it was to remove and reinsert the driver. But it didn't happen all the time. Sometimes I had to repeat this more than 15 (!) times in order to lock up the card. So I'm not sure that I can properly test the patch. You said that after modifying something in the reset routine it was easier to lock the card (Comment #19). Can you post what you modified, so that I can directly force the card to reset itself or even lock itself up? With compat-wireless-2008-12-21 it is a little bit harder to lock up the card, and the problem happens less frequently (less than 50%) than earlier, so it's not that easy to verify the patch. How can I force the card to reset itself?
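The remove-and-reinsert procedure described above can be automated so that it stops at the first lockup. This is a rough sketch under several assumptions: it must run as root, the module is named ath5k as in the lspci output at the top of this report, something (for example NetworkManager) brings the interface up after loading, and the 10 second wait is a guess. Both function names are hypothetical.

```shell
#!/bin/sh
# Succeeds if the log text on stdin contains one of the reset failures
# quoted in this report.
has_reset_failure() {
    grep -q -e "can't reset hardware" -e "unable to reset hardware"
}

# Hypothetical loop: reload the module up to $1 times (default 20)
# and stop at the first cycle that logs a reset failure.
reload_until_lockup() {
    max=${1:-20}
    i=1
    while [ "$i" -le "$max" ]; do
        dmesg -c > /dev/null      # clear old kernel messages
        modprobe -r ath5k
        modprobe ath5k
        sleep 10                  # give the interface time to come up and scan
        if dmesg | has_reset_failure; then
            echo "lockup after $i cycles"
            return 0
        fi
        i=$((i + 1))
    done
    echo "no lockup in $max cycles"
    return 1
}

# Usage as root:
#   reload_until_lockup 56
```

Clearing the ring buffer before each cycle keeps failures from an earlier cycle from being counted again, at the cost of losing older kernel messages.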
Created attachment 19489 [details]
ath5k debug messages debug=0x103f

Here are some debug messages from compat-wireless-2008-12-25 with the patch applied. The debug level was set to 0x103f. I've marked some places in the file that may be of interest (here3 and here5). I removed and reinserted the driver 56 (!) consecutive times and could not lock up the card. I think the patch works. I also got some messages in the log files saying that the AP couldn't authenticate the card. The AP disconnected the card 3 times, but it didn't lock up and made the connection. I think the disconnect happened because the card wasn't ready and couldn't transmit the key correctly. But as I said, no problems so far. If there is a way to force it to lock up, let me know.
Ok thanks for testing. I had changed ath5k_calinterval to a lower number to make the lockup happen sooner. The other thing you can try is applying the patch on top of vanilla 2.6.28. I did see one hard lockup but I'm not sure if it was related to the patch or not, so I'll post the patch to ath5k-devel for more testing just to be sure.
As far as I am concerned, the patch seems to work pretty well. With compat-wireless-2008-12-21 I observed frequent lockups when waking from hibernation. After hibernation, I sometimes had no wireless connection anymore and the error "can not reset hardware (-11)" appeared in syslog. Since I applied the patch, I have done approximately 15 cycles (hibernation - wake up) and have not observed a single lockup. If I can do any other test, I would be glad to help. Thanks,
Created attachment 19497 [details]
ath5k debug messages2 debug=0x0033

I set ath5k_calinterval=2 (I think this is low enough) and made 19 cycles (remove -> insert the driver). No lockups so far. The debug messages (debug=0x0033) are attached. I think the patch works fine. What about ath5k_calinterval? With the value set to 2 I could get neither the -5 error nor the -11. Maybe the value should be lowered.
(In reply to comment #31)
> The other thing you can try is applying the patch on top of vanilla 2.6.28.

I just had to change the offsets for vanilla 2.6.28. That's enough. I haven't recompiled the kernel, though, because it takes me about 40 min, but the patch works with it. Here are the new offsets:

@@ -525,6 +525,7 @@ ath5k_pci_probe(struct pci_dev *pdev,
@@ -2664,6 +2665,7 @@ ath5k_reset(struct ath5k_softc *sc, bool stop, bool change_channel)
@@ -2672,7 +2672,11 @@ ath5k_reset(struct ath5k_softc *sc, bool stop, bool change_channel)
@@ -152,6 +152,7 @@ struct ath5k_softc {
@@ -1620,7 +1620,7 @@ int ath5k_hw_rfregs(struct ath5k_hw *ah, struct ieee80211_channel *channel,
In the last 3-4 days I've been through different situations that earlier always resulted in the card locking itself up. With the patch this doesn't happen. I managed to make a connection in all cases. Only once did I have to reinsert the driver, but still no lockup occurred. I've disabled the debugging and therefore cannot attach any logs. But as I said, the patch is definitely working (at least in my case). Thank you, Bob. Can you post the patch in http://bugzilla.kernel.org/show_bug.cgi?id=12068 ? Maybe this helps in their case, too.
Ok thanks for testing. Good idea, I'll post it on the other bug too. I also posted it on ath5k-devel list a few days ago, so far no reports but I'll submit it for upstream in a week or so unless problems arise.
Any news if this has already entered upstream/mainline?
Sorry, I still haven't sent it yet. I can reliably lock up my machine (by setting cal_interval to 1 and making it always reset the card). I spent some time chasing that with netconsole and nmi watchdog but to little avail. I want it to see some wider testing so perhaps now that the merge window is closed.
Created attachment 20240 [details]
fixed patch for latest compat-wireless

This patch has been working for the last month, but I got an error with the latest compat-wireless-20090213. This is the fixed version. Any plans to queue this for stable? As long as cal_interval is > 1 it solves the problem. If it is set to 1, all of us will get this reset lockup anyway. Until then it is a workable solution.
*** Bug 12849 has been marked as a duplicate of this bug. ***
Hey, Bob! What's the story on this? Do we need this patch?
I don't think so. In 2.6.30 we completely rewrote the reset stuff, and it seems to have largely solved these sorts of issues (I for one haven't seen them in a long time). Consequently I didn't really want to submit this patch because it is a bit cargo-cultish, and I could make the kernel hang with it if I tried hard. Though if anyone can still get these errors in 2.6.30 then I can forward-port it and put some time back into finding the lockup.
Sure, fine...Joshua, is this working for you on 2.6.30?
I'm still stuck with 2.6.27.25 because I don't have time to update. I've had the patch applied ever since it was posted. I think you can close this if the whole code has been rewritten. I'll reopen it if I see this again.