Bug 16436
Summary: | ath5k (AR5001) does not work after resume and fails with "ath5k phy0: gain calibration timeout" | ||
---|---|---|---|
Product: | Networking | Reporter: | boris64 (bugzilla.kernel.org) |
Component: | Wireless | Assignee: | networking_wireless (networking_wireless) |
Status: | RESOLVED OBSOLETE | ||
Severity: | normal | CC: | alan, fransschreuder, izmmishao5, juho.kurki, linville, me, mickflemm, neo.tida, phomes, rui.zhang, sdr, trenn, zxcasdqwe |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 3.4 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 56331 | ||
Attachments: |
lspci -vv
dmesg-after-boot-working-wlan
dmesg-after-resume-no-working-wlan
ath_info 0xfebf0000
dmesg with 2.6.32-08666-g292e004
dmesg with 2.6.32-08667-g557a701
dmesg with 2.6.32-08667-g557a701 after shutdown
The problem is still there for Linux 3.3.4 |
Created attachment 27197 [details]
dmesg-after-boot-working-wlan
Created attachment 27198 [details]
dmesg-after-resume-no-working-wlan
*** Bug 16435 has been marked as a duplicate of this bug. ***

Bob & Nick, is there any more information boris64 can provide to make this report helpful?

ath5k phy0: Atheros AR2425 chip found (MAC: 0xe2, PHY: 0x70)

Your card is an AR2425. These cards are known to have problems during the suspend/resume cycle. The weird thing is that other people report a failed warm reset; I haven't seen a failure on gain calibration. Check out this thread: http://www.spinics.net/lists/linux-wireless/msg51379.html

Sorry for taking this long, but I'm on holiday. Sadly, the posted patch (http://www.spinics.net/lists/linux-wireless/msg51379.html) no longer applies to a current kernel, so I can't test whether disabling the L0s/L1 states makes my wireless adapter work as expected. Is there anything else I could do or try out? Do you need more info?

What kernel are you attempting to use to test the patch?

I tried this patch on kernel-2.6.35.{1,2,3}. If I'm correct, the patch made it into 2.6.35.4, which I just booted into. Well, guess what, it didn't work out: same error message after suspend. Anything else I could try out? Any suggestions? Any more info needed?

Yes, it looks like an equivalent patch went into 2.6.35.4. Nick & Bob, any other thoughts?

The kernel is built with CONFIG_PCIEASPM, right? Otherwise the patch won't help at all.

This should be OK, right?

[config]
..
CONFIG_PCIEASPM=y
# CONFIG_PCIEASPM_DEBUG is not set
..
[/config]

Is there any (other?) debug option I could turn on to see what's really going on?

*sigh* Is there anything else I could do to help debug this issue? The status of this bug is set to "NEEDINFO", so what other info do you need?

Nick, what do you think? Maybe we should add more registers to the reg debug file and get a snapshot of them before and after suspend?

Well, this bug is not an easy one. From your logs it seems that the card survives one resume (and connects fine to the AP) and fails on the second one with a gain calibration timeout. This error indicates that the PHY is not properly initialized. First, I want you to tell us whether unloading and reloading the module makes any difference. Then we'll move on to more hardcore stuff :-)

Keep in mind that we don't have any documentation on the PHY/RF parts. Since this bug also happens on MadWiFi (which uses a HAL very close to what Atheros uses in their own drivers), we are on our own here...

Theory 1: It seems that 2425 and 5424 (Swan and Condor) chips have some tweaks related to PCI-E operation that we don't use (MadWiFi doesn't use them either). They are included in the HAL sources, but commented out. Maybe we should try them out and see if they work in your case. Try using any of them inside the ath5k_hw_reset function (drivers/net/wireless/ath/ath5k/reset.c), after writing the initvals.

(Link goes to L1 when MAC goes to sleep and loopback the link down to reset)
    ath5k_hw_reg_write(ah, AR5K_PCIE_PM_CTL,
            AR5K_PCIE_PM_CTL_L1_WHEN_D2 | AR5K_PCIE_PM_CTL_LDRESET_EN);

(Avoid skips before last TS2)
    AR5K_REG_ENABLE_BITS(ah, AR5K_PCIE_PM_CTL, AR5K_PCIE_PM_CTL_PSM_D2);

(Assert power reset along with PCI reset)
    AR5K_REG_DISABLE_BITS(ah, AR5K_PCIE_PM_CTL, AR5K_PCIE_PM_CTL_PSM_D1);

There is one more, to be added after the SERDES programming inside ath5k_hw_attach (drivers/net/wireless/ath/ath5k/attach.c). I have no idea what it does; I just saw it in some register dumps. PCIE_WAEN stands for "PCIE work around enable" (I got that from ah_osdep.c).

    ath5k_hw_reg_write(ah, AR5K_PCIE_WAEN, 0x0000000f);

Maybe we should add the various workarounds to the code and enable them through module options...

Theory 2: When a warm reset happens, all units stop, including the radio. To reduce warm resets, newer cards (after AR2413) introduced "synth-only channel change". For example, we may want to scan or quickly switch channels without changing MAC parameters (BSSID masks etc.) or PHY parameters (modulation mode). In that case we can hit the analog parts directly and set up the synth parameters. This gives direct access to the analog bus; think of it as a "live" RF buffer. We also fire up gain calibration and noise floor (NF) calibration there, but that might be done differently. Our code already differs anyway: we actually wait for gain calibration to complete, and they don't. I don't think that's the issue, though. If it were, it wouldn't happen on MadWiFi, and it would happen only once or twice, not every time after resume. Note that MadWiFi doesn't support synth-only channel change either, but the Atheros Windows driver does.

Theory 3: Atheros lets card vendors store in the EEPROM a series of register tweaks to be performed after reset, such as RF buffer modifications. This mechanism is called EAR (EEPROM Added Registers). Neither ath5k nor MadWiFi supports it yet, but again the Atheros Windows driver does. So your card vendor might have stored register tweaks related to the card's design that we don't apply.

Theory 4: Some PHY/RF registers need a delay on access when the card uses the external 32 kHz reference clock. Normally we disable the external clock capability during reset. This means the card won't switch to the external clock while we initialize the PHY and write the PHY registers and RF buffer. Maybe something goes wrong with that in your case, so we can't initialize the PHY/RF properly. You can check this with the ath_info utility: just download ath_info from MadWiFi svn and run

./ath_info 0xfebf0000

If you see that your card has an external 32 kHz crystal, you might try disabling it. Just edit drivers/net/wireless/ath/ath5k/reset.c and remove (comment out) any calls to ath5k_hw_set_sleep_clock.

2 and 3 have been on my todo list forever, sorry for that ;-(

Created attachment 35602 [details]
ath_info 0xfebf0000
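Theory 1 relies on the ath5k register helpers. As far as I can tell from drivers/net/wireless/ath/ath5k/ath5k.h, ath5k_hw_reg_write() takes the value before the register offset: ath5k_hw_reg_write(ah, val, reg). AR5K_REG_ENABLE_BITS/AR5K_REG_DISABLE_BITS are read-modify-write wrappers around it. The two ath5k_hw_reg_write() calls suggested in comment #14 (and the reset.c test based on them) pass the register first, so the argument order is worth double-checking before testing them.

The sketch below simulates these semantics on an in-memory register file. It also includes a polling loop in the spirit of ath5k_hw_register_timeout(), which is the check that prints "gain calibration timeout". Everything prefixed FAKE_ is a hypothetical placeholder, not a real AR5K_* value. This includes the offsets, the bits, the poll limit, and the simulated PHY that clears the calibration bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical offsets and bits: placeholders, NOT the real AR5K_* values. */
#define FAKE_PCIE_PM_CTL    0x10u
#define FAKE_PHY_AGCCTL     0x20u
#define FAKE_PM_CTL_PSM_D1  (1u << 1)
#define FAKE_PM_CTL_PSM_D2  (1u << 2)
#define FAKE_AGCCTL_CAL     (1u << 0)
#define FAKE_POLL_LIMIT     100   /* hypothetical number of polls before giving up */

struct fake_hw {
    uint32_t regs[64];      /* simulated register space, indexed by offset / 4 */
    int cal_polls_needed;   /* AGCCTL reads until CAL clears; <= 0 means it never clears */
};

/* Value first, offset second, like ath5k_hw_reg_write(ah, val, reg). */
static void reg_write(struct fake_hw *hw, uint32_t val, uint16_t reg)
{
    assert(reg / 4 < 64);
    hw->regs[reg / 4] = val;
}

static uint32_t reg_read(struct fake_hw *hw, uint16_t reg)
{
    assert(reg / 4 < 64);
    /* Simulated PHY: the calibration bit clears after a number of reads. */
    if (reg == FAKE_PHY_AGCCTL && hw->cal_polls_needed > 0 && --hw->cal_polls_needed == 0)
        hw->regs[reg / 4] &= ~FAKE_AGCCTL_CAL;
    return hw->regs[reg / 4];
}

/* Read-modify-write helpers in the style of AR5K_REG_ENABLE_BITS / AR5K_REG_DISABLE_BITS. */
#define REG_ENABLE_BITS(hw, reg, flags)  reg_write((hw), reg_read((hw), (reg)) | (flags), (reg))
#define REG_DISABLE_BITS(hw, reg, flags) reg_write((hw), reg_read((hw), (reg)) & ~(flags), (reg))

/*
 * Poll until the masked value matches: any bit of `flag` set if is_set,
 * otherwise (reg & flag) == val. Returns 0 on success, -1 on timeout.
 * Modelled loosely on ath5k_hw_register_timeout(ah, reg, flag, val, is_set).
 */
static int register_timeout(struct fake_hw *hw, uint16_t reg, uint32_t flag,
                            uint32_t val, bool is_set)
{
    for (int i = 0; i < FAKE_POLL_LIMIT; i++) {
        uint32_t data = reg_read(hw, reg);
        if (is_set ? (data & flag) != 0 : (data & flag) == val)
            return 0;
        /* the real driver delays between polls; omitted in this simulation */
    }
    return -1;
}
```

In this simulation, the following sequence leaves only PSM_D2 set: write PSM_D1, enable PSM_D2 with REG_ENABLE_BITS, then clear PSM_D1 with REG_DISABLE_BITS. With cal_polls_needed = 3, a poll for CAL == 0 succeeds on the third read. With -1 it returns -1 after FAKE_POLL_LIMIT reads, which is the situation the "gain calibration timeout" message reports.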
(In reply to comment #14)
> Well this bug is not an easy one, it seems from your logs that card survives
> one resume (and connects fine to the AP) and fails on the second one with gain
> calibration timeout. This error indicates that the PHY is not properly
> initialized. First i want you to tell us if unloading and reloading the module
> makes any difference, then we are going for more hardcore stuff :-)

No difference, sorry.

[dmesg output while unloading/reloading ath5k module]
...
Oct 31 20:17:22 localhost kernel: [ 668.189967] ath5k phy0: gain calibration timeout (2457MHz)
Oct 31 20:17:24 localhost kernel: [ 670.233246] ath5k 0000:05:00.0: PCI INT A disabled
Oct 31 20:17:28 localhost kernel: [ 674.459327] cfg80211: Calling CRDA to update world regulatory domain
Oct 31 20:17:28 localhost kernel: [ 674.484819] ath5k 0000:05:00.0: PCI INT A -> Link[LNED] -> GSI 19 (level, low) -> IRQ 19
Oct 31 20:17:28 localhost kernel: [ 674.484831] ath5k 0000:05:00.0: setting latency timer to 64
Oct 31 20:17:28 localhost kernel: [ 674.484920] ath5k 0000:05:00.0: registered as 'phy0'
Oct 31 20:17:28 localhost kernel: [ 674.489644] cfg80211: World regulatory domain updated:
Oct 31 20:17:28 localhost kernel: [ 674.489650] (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
Oct 31 20:17:28 localhost kernel: [ 674.489654] (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
Oct 31 20:17:28 localhost kernel: [ 674.489657] (2457000 KHz - 2482000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
Oct 31 20:17:28 localhost kernel: [ 674.489660] (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
Oct 31 20:17:28 localhost kernel: [ 674.489663] (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
Oct 31 20:17:28 localhost kernel: [ 674.489666] (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
Oct 31 20:17:29 localhost kernel: [ 674.981866] ath: EEPROM regdomain: 0x60
Oct 31 20:17:29 localhost kernel: [ 674.981870] ath: EEPROM indicates we should expect a direct regpair map
Oct 31 20:17:29 localhost kernel: [ 674.981875] ath: Country alpha2 being used: 00
Oct 31 20:17:29 localhost kernel: [ 674.981877] ath: Regpair used: 0x60
Oct 31 20:17:29 localhost kernel: [ 674.981993] phy0: Selected rate control algorithm 'minstrel_ht'
Oct 31 20:17:29 localhost kernel: [ 674.990618] Registered led device: ath5k-phy0::rx
Oct 31 20:17:29 localhost kernel: [ 674.990648] Registered led device: ath5k-phy0::tx
Oct 31 20:17:29 localhost kernel: [ 674.990659] ath5k phy0: Atheros AR2425 chip found (MAC: 0xe2, PHY: 0x70)
Oct 31 20:17:29 localhost kernel: [ 675.017510] net_ratelimit: 5 callbacks suppressed
Oct 31 20:17:29 localhost kernel: [ 675.017515] ath5k phy0: gain calibration timeout (2412MHz)
Oct 31 20:17:29 localhost kernel: [ 675.045186] ath5k phy0: gain calibration timeout (2412MHz)
...
[/dmesg output while unloading/reloading ath5k module]

>
> Have in mind that we don't have any documentation on the PHY/RF parts and since
> this bug also happens on MadWiFi (uses a HAL that's very close to what Atheros
> uses on their drivers) we are on our own here...

Well, on MadWiFi the adapter is not able to reconnect to the AP after resume (sometimes it survives one or more suspends), but there are no error messages in dmesg. The result, however, is the same: no WLAN connection or scan results (via "iwlist ath0 scan") are possible until reboot.

>
> Theory 1: Seems that 2425 and 5424 (Swan and Condor) chips have some tweaks
> related to PCI-E operation that we don't use (MadWiFi also doesn't use them)
> but are included in HAL sources (and commented out). Maybe we should try them
> out and see if they work in your case.
>
> Try using any of them inside ath5k_hw_reset function
> (drivers/net/wireless/ath/ath5k/reset.c), after writing initvals.
>
> (Link goes to L1 when MAC goes to sleep
> and loopback the link down to reset)
> ath5k_hw_reg_write(ah, AR5K_PCIE_PM_CTL,
>         AR5K_PCIE_PM_CTL_L1_WHEN_D2 |
>         AR5K_PCIE_PM_CTL_LDRESET_EN);
>
> (Avoid skips before last TS2)
> AR5K_REG_ENABLE_BITS(ah, AR5K_PCIE_PM_CTL,
>         AR5K_PCIE_PM_CTL_PSM_D2);
>
> (Assert power reset along with pci reset)
> AR5K_REG_DISABLE_BITS(ah, AR5K_PCIE_PM_CTL,
>         AR5K_PCIE_PM_CTL_PSM_D1);
>
> And another one to be added after SERDES programming inside ath5k_hw_attach
> (drivers/net/wireless/ath/ath5k/attach.c)
>
> (No idea what this does, I just saw it on some reg dumps
> PCIE_WAEN stands for PCIE work around enable -got that from ah_osdep.c)
> ath5k_hw_reg_write(ah, AR5K_PCIE_WAEN,
>         0x0000000f);
>
> Maybe we should add the various workarounds in the code and enable them through
> module options...

I tried this (with my very limited programming/C skills), and it looks like it didn't work. I entered those lines of code as shown below (I hope I put them in the right place).

[reset.c]
...
/*
 * Main reset function
 */
int ath5k_hw_reset(struct ath5k_hw *ah, enum nl80211_iftype op_mode,
        struct ieee80211_channel *channel, bool change_channel)
{
        struct ath_common *common = ath5k_hw_common(ah);
        u32 s_seq[10], s_led[3], staid1_flags, tsf_up, tsf_lo;
        u32 phy_tst1;
        u8 mode, freq, ee_mode;
        int i, ret;

        ee_mode = 0;
        staid1_flags = 0;
        tsf_up = 0;
        tsf_lo = 0;
        freq = 0;
        mode = 0;

        /* laptop tweaks */
        // Link goes to L1 when MAC goes to sleep and loopback the link down to reset
        ath5k_hw_reg_write(ah, AR5K_PCIE_PM_CTL, AR5K_PCIE_PM_CTL_L1_WHEN_D2 | AR5K_PCIE_PM_CTL_LDRESET_EN);
        // Avoid skips before last TS2
        AR5K_REG_ENABLE_BITS(ah, AR5K_PCIE_PM_CTL, AR5K_PCIE_PM_CTL_PSM_D2);
        // Assert power reset along with pci reset
        AR5K_REG_DISABLE_BITS(ah, AR5K_PCIE_PM_CTL, AR5K_PCIE_PM_CTL_PSM_D1);
        /* laptop tweaks fin */
...
[/reset.c]

> Theory 2: When a warm reset happens all units stop, radio also, in order to
> reduce warm resets newer cards (after AR2413) introduced the "synth only
> channel change" that means eg. when we want to scan or quickly switch channel
> without changing MAC parameters (bssid masks etc) or PHY parameters (modulation
> mode) we can hit the analog parts directly (getting direct access to the analog
> bus -think of it as "live" RF Buffer-) and set up synth parameters. We also
> fire up gain calibration and nf calibration there but it might be different
> (well we have a difference on our code anyway because we actually wait for gain
> calibration to complete, they don't but i don't think that's the issue because
> it wouldn't happen on MadWiFi then + it would happen 1-2 times not always after
> resume). Notice that MadWiFi also doesn't support synth-only channel change but
> Atheros windows driver does.
>
> Theory 3: Atheros provides card vendors with an ability to store on EEPROM a
> series of register tweaks to be performed after reset, RF Buffer modification
> etc. This mechanism is called EAR (EEPROM Added Registers) and both ath5k and
> MadWiFi don't support it yet, again Atheros windows driver does so your card
> vendor might have some register tweaks related to card's design that we don't
> do.
>
> Theory 4: So some PHY/RF registers need delay on access when card uses the
> external 32Khz ref clock, normally during reset we disable the external clock
> capability (means that card won't switch to external clock while we initialize
> PHY and write PHY registers and RF buffer) but maybe something is wrong with it
> in your case so we can't initialize PHY/RF properly. You can check that by
> using ath_info utility. Just download ath_info from MadWiFi svn and run
>
> ./ath_info 0xfebf0000

Please check the attachment for the ath_info output.

>
> if you see that your card has external 32KHz crystal then you might try to
> disable that.
> Just edit drivers/net/wireless/ath/ath5k/reset.c and remove
> (comment out) any calls to ath5k_hw_set_sleep_clock.

There is no 32 kHz crystal (if ath_info is correct). But there is one thing I forgot to mention. The BIOS battery has been dead for quite some time now, and this laptop's date needs to be synchronized on every reboot. Could this be a problem for the ath5k driver?

>
> 2 and 3 are in my todo list forever, sorry for that ;-(

No need to be sorry, I'm happy about every kind of help here ;)

OK, let's be brave: bypass the gain calibration check and see if we have further problems... Go to reset.c again near line 1282 and comment out this chunk:

1284         if (ath5k_hw_register_timeout(ah, AR5K_PHY_AGCCTL,
1285                         AR5K_PHY_AGCCTL_CAL, 0, false)) {
1286                 ATH5K_ERR(ah->ah_sc, "gain calibration timeout (%uMHz)\n",
1287                         channel->center_freq);
1288         }

...and see what happens.

With lines 1284-1288 commented out, the result is the same: no connection after resume and, of course, no "gain calibration timeout" messages in dmesg. When scanning for networks with iwlist I get: "wlan0 No scan results"

Similar problem here:

[ 20.450150] ath5k 0000:07:00.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
[ 20.450163] ath5k 0000:07:00.0: setting latency timer to 64
[ 20.450274] ath5k 0000:07:00.0: registered as 'phy0'
[ 21.006108] ath: EEPROM regdomain: 0x60
[ 21.006110] ath: EEPROM indicates we should expect a direct regpair map
[ 21.006115] ath: Country alpha2 being used: 00
[ 21.006117] ath: Regpair used: 0x60
[ 21.031592] Registered led device: ath5k-phy0::rx
[ 21.031865] Registered led device: ath5k-phy0::tx
[ 21.031871] ath5k phy0: Atheros AR2425 chip found (MAC: 0xe2, PHY: 0x70)
[ 22.234823] ath5k phy0: gain calibration timeout (2412MHz)
[ 22.281569] ath5k phy0: gain calibration timeout (2412MHz)
(.....)
[ 166.689764] ath5k phy0: gain calibration timeout (2457MHz)
[ 168.814595] ath5k phy0: gain calibration timeout (2412MHz)

My connection is already broken from startup, mostly when I boot on battery power. When I boot and it works, my connection is usually stable and stays up indefinitely. Suspend/hibernate will also sometimes cause this error.

Kernel version: 2.6.35-22 (Ubuntu 10.10). It has been happening since 2.6.32; 2.6.31 was still alright.

Is there a way I can supply more information? I have built debug versions of ath5k.ko and ath.ko, but how do I actually debug them? What kind of info can I give?

I found that I need to turn off my computer completely to get my card working again; a reboot is not enough. Is there a way to do a real "reset" of the card (for example, clear all registers) when the error occurs? For example, within these lines of reset.c:

1284         if (ath5k_hw_register_timeout(ah, AR5K_PHY_AGCCTL,
1285                         AR5K_PHY_AGCCTL_CAL, 0, false)) {
1286                 ATH5K_ERR(ah->ah_sc, "gain calibration timeout (%uMHz)\n",
1287                         channel->center_freq);
1288         }

Can you check out the latest wireless-testing and retry adding the lines to reset.c after line 1220 (ath5k_hw_commit_eeprom_settings(ah, channel, ee_mode);)?

When I download the git tree, I do indeed see the line you are mentioning, but when I download the source from the Ubuntu repos I don't see it. When I try to compile the git tree, I end up compiling the whole kernel. Could you give me a clue how to compile only the wireless module for the current Ubuntu kernel? Or should I build a whole new kernel?

You can use compat-wireless. It also contains the latest code from wireless-testing, and you won't have to compile everything ;-)

No, I still get a gain calibration timeout. It happens when I boot on battery power and then plug in the AC adapter before a connection is established.
Whenever I already have a connection (boot on battery power and leave it until connected, or boot on AC power), it stays stable as hell.

ath5k phy0: Atheros AR5414 chip found (MAC: 0xa5, PHY: 0x61)

I use OpenWrt trunk on a RouterStation board with compat-wireless-2011-01-05. Sometimes ath5k starts to show the following error after I use the wifi command in OpenWrt, and sometimes after a few minutes without any change. I have to reboot the router to get rid of it.

ath5k phy0: gain calibration timeout (2412MHz)
ath5k phy0: gain calibration timeout (2417MHz)
ath5k phy0: gain calibration timeout (2422MHz)
ath5k phy0: gain calibration timeout (2427MHz)
ath5k phy0: gain calibration timeout (2432MHz)
ath5k phy0: gain calibration timeout (2437MHz)
ath5k phy0: gain calibration timeout (2442MHz)
ath5k phy0: gain calibration timeout (2447MHz)
ath5k phy0: gain calibration timeout (2452MHz)
ath5k phy0: gain calibration timeout (2457MHz)
ath5k phy0: gain calibration timeout (2462MHz)
ath5k phy0: gain calibration timeout (2467MHz)
ath5k phy0: gain calibration timeout (2472MHz)
ath5k phy0: gain calibration timeout (2484MHz)
__ratelimit: 1 callbacks suppressed
ath5k phy0: gain calibration timeout (5200MHz)
ath5k phy0: gain calibration timeout (5220MHz)
ath5k phy0: gain calibration timeout (5240MHz)
ath5k phy0: gain calibration timeout (5260MHz)
ath5k phy0: gain calibration timeout (5280MHz)
ath5k phy0: gain calibration timeout (5300MHz)
ath5k phy0: gain calibration timeout (5320MHz)
ath5k phy0: gain calibration timeout (5500MHz)
ath5k phy0: gain calibration timeout (5520MHz)
ath5k phy0: gain calibration timeout (5540MHz)
__ratelimit: 1 callbacks suppressed
ath5k phy0: gain calibration timeout (5580MHz)
ath5k phy0: gain calibration timeout (5600MHz)
ath5k phy0: gain calibration timeout (5620MHz)
ath5k phy0: gain calibration timeout (5640MHz)
ath5k phy0: gain calibration timeout (5660MHz)
ath5k phy0: gain calibration timeout (5680MHz)
ath5k phy0: gain calibration timeout (5700MHz)
ath5k phy0: gain calibration timeout (5745MHz)
ath5k phy0: gain calibration timeout (5765MHz)
ath5k phy0: gain calibration timeout (5785MHz)
__ratelimit: 1 callbacks suppressed
ath5k phy0: gain calibration timeout (5825MHz)
ath5k phy0: gain calibration timeout (2457MHz)

I also got this error with an AR5413 chipset. I think powernow-k8 causes this error. I compiled the kernel with CONFIG_X86_POWERNOW_K8=m and then blacklisted the powernow-k8 module. Now my wireless network adapter works perfectly.

I can confirm that it works fine without powernow-k8. I have tried to git bisect; here are the results:

git bisect start '--' 'arch/x86/kernel/cpu/cpufreq/powernow-k8.c' 'arch/x86/kernel/cpu/cpufreq/powernow-k8.h' 'arch/x86/kernel/cpu/cpufreq/powernow-k8.o'
# good: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31
git bisect good 74fca6a42863ffacaf7ba6f1936a9f228950f657
# bad: [521cb40b0c44418a4fd36dc633f575813d59a43d] Linux 2.6.38
git bisect bad 521cb40b0c44418a4fd36dc633f575813d59a43d
# bad: [a2fed573f065e526bfd5cbf26e5491973d9e9aaa] x86, cpufreq: Add APERF/MPERF support for AMD processors
git bisect bad a2fed573f065e526bfd5cbf26e5491973d9e9aaa
# good: [a4636818f8e0991f32d9528f39cf4f3d6a7d30a3] cpumask: rename tsk_cpumask to tsk_cpus_allowed
git bisect good a4636818f8e0991f32d9528f39cf4f3d6a7d30a3
# bad: [8defcaa6ba157f215c437939c3adcd1dbfa1a8fa] Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
git bisect bad 8defcaa6ba157f215c437939c3adcd1dbfa1a8fa
# good: [557a701c16553b0b691dbb64ef30361115a80f64] [CPUFREQ] Fix use after free of struct powernow_k8_data
git bisect good 557a701c16553b0b691dbb64ef30361115a80f64

git rev-list 8defcaa6ba157f215c437939c3adcd1dbfa1a8fa ^557a701c16553b0b691dbb64ef30361115a80f64 -- drivers/net/wireless/ath/ath5k/
ff30b3642c1f56a5ae6522b78e82be867086c637
359207c687cc8f4f9845c8dadd0d6dabad44e584
7f9d3577e2603ca279c3176b696eba392f21cbe2
671adc93b6472eaa0142a88d096c945f7b07893a
242ab7ad689accafd5e87ffd22b85cf1bf7fbbef

I will try these revisions too.

The previous results are wrong... To decide whether a revision was good or bad, I tried to:
1) boot up on the power supply
2) check that the CPU frequency is 1.8 GHz
3) check iwlist scanning
4) remove the power supply
5) check that the CPU frequency is 0.8 GHz
6) bring wlan0 down, then up again
7) check iwlist scanning multiple times
But it does not give correct results every time...

I have tried bisecting again:

git bisect start '--' 'arch/x86/kernel/cpu/cpufreq/powernow-k8.c' 'arch/x86/kernel/cpu/cpufreq/powernow-k8.h'
# good: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31
git bisect good 74fca6a42863ffacaf7ba6f1936a9f228950f657
# bad: [557a701c16553b0b691dbb64ef30361115a80f64] [CPUFREQ] Fix use after free of struct powernow_k8_data
git bisect bad 557a701c16553b0b691dbb64ef30361115a80f64
# good: [c53614ec17fe6296a696aa4ac71a799814bb50c1] [CPUFREQ] powernow-k8: Fix test in get_transition_latency()
git bisect good c53614ec17fe6296a696aa4ac71a799814bb50c1
# good: [b8cbe7e82ec8b55d7bbdde66fc69e788fde00dc6] [CPUFREQ] cpumask: don't put a cpumask on the stack in x86...cpufreq/powernow-k8.c
git bisect good b8cbe7e82ec8b55d7bbdde66fc69e788fde00dc6
# good: [e2f74f355e9e2914483db10c05d70e69e0b7ae04] [ACPI/CPUFREQ] Introduce bios_limit per cpu cpufreq sysfs interface
git bisect good e2f74f355e9e2914483db10c05d70e69e0b7ae04

This time, to test, I built all drivers and started X and NetworkManager. I found that the problem exists at 557a701c16553b0b691dbb64ef30361115a80f64 again. I then built the previous revision, 292e0041c3b22c5347092152504d814119554b57, and this revision does not have the problem. So it looks like the problem is not in the ath5k module...

Created attachment 53842 [details]
dmesg with 2.6.32-08666-g292e004
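The second bisect above had to cope with a test that "does not give correct results every time". One common way to handle an intermittent failure is to be asymmetric. Call a revision good only after several clean runs, and bad as soon as one run fails.

The sketch below shows that rule inside a plain binary search over revision indices. It is not how git bisect is implemented; git works on the commit graph, not a linear array. The flaky_test simulation is hypothetical: buggy revisions fail only every third run. The run count is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One test run on revision index `rev`; returns true if the bug did NOT show up. */
typedef bool (*test_fn)(size_t rev, void *ctx);

/* Call a revision good only if all `runs` attempts pass; one failure makes it bad. */
static bool revision_is_good(test_fn test, void *ctx, size_t rev, int runs)
{
    for (int i = 0; i < runs; i++)
        if (!test(rev, ctx))
            return false;
    return true;
}

/*
 * Revision 0 is known good and revision n - 1 known bad. Returns the index of the
 * first bad revision, assuming the bug stays present once it has been introduced.
 */
static size_t bisect_first_bad(size_t n, test_fn test, void *ctx, int runs)
{
    size_t good = 0, bad = n - 1;

    assert(n >= 2 && runs >= 1);
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (revision_is_good(test, ctx, mid, runs))
            good = mid;
        else
            bad = mid;
    }
    return bad;
}

/* Hypothetical flaky test: revisions >= first_bad fail only on every third run. */
struct flaky_sim {
    size_t first_bad;
    unsigned runs_seen;
};

static bool flaky_test(size_t rev, void *ctx)
{
    struct flaky_sim *sim = ctx;

    sim->runs_seen++;
    if (rev < sim->first_bad)
        return true;                 /* good revisions always pass */
    return sim->runs_seen % 3 != 0;  /* bad ones pass two runs out of three */
}
```

Take 16 revisions with the bug introduced at index 7. With three runs per step, the simulated search lands on index 7. With a single run, a lucky pass fools it and it settles on index 13. The price of the extra safety is `runs` times as many test cycles per bisect step.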
Created attachment 53852 [details]
dmesg with 2.6.32-08667-g557a701
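The dmesg diff of these two attachments (comment #33) pairs each "Detected ... MHz processor" line with a BogoMIPS line, and the numbers are tied by simple arithmetic. The sketch below reproduces both sides of the diff under three assumptions:

- The x86 code of that era derives lpj from the TSC as tsc_khz * 1000 / HZ.
- The boot message prints lpj / (500000 / HZ), followed by two decimals taken from lpj / (5000 / HZ) % 100.
- HZ = 250. This value is inferred because it reproduces the logged numbers; the report does not state it.

It shows that the HPET-based calibration on 557a701 came out only 70 kHz (about 0.004 %) lower than the PIT-based one. The diff alone does not show whether that difference is related to the wireless failure.

```c
#include <assert.h>

/* Assumption: lpj derived from the TSC as tsc_khz * 1000 / HZ. */
static unsigned long lpj_from_tsc_khz(unsigned long tsc_khz, unsigned long hz)
{
    assert(hz > 0);
    return (unsigned long)((unsigned long long)tsc_khz * 1000ULL / hz);
}

/* Whole and two-digit fractional BogoMIPS, as printed in "N.NN BogoMIPS (lpj=...)". */
static void bogomips_parts(unsigned long lpj, unsigned long hz,
                           unsigned long *whole, unsigned long *frac)
{
    assert(hz > 0 && hz <= 5000);  /* 5000 / hz must not be zero */
    *whole = lpj / (500000UL / hz);
    *frac = (lpj / (5000UL / hz)) % 100;
}
```

With HZ = 250, the two calibrations work out as follows:

- 1808.004 MHz (tsc_khz = 1808004) gives lpj = 7232016 and 3616.00 BogoMIPS.
- 1807.934 MHz gives lpj = 7231736 and 3615.86 BogoMIPS.

Both match the lines in the diff.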
Diffing the dmesgs shows some interesting info:

--- dmesg-2.6.32-08666-g292e004 2011-04-09 01:00:51.000000000 +0400
+++ dmesg-2.6.32-08667-g557a701 2011-04-09 01:00:51.000000000 +0400
@@ -1,4 +1,4 @@
-Linux version 2.6.32-08666-g292e004 (root@nataly-hostx) (gcc version 4.4.5 (Gentoo 4.4.5 p1.2, pie-0.4.5) ) #13 SMP Sat Apr 9 00:33:14 SAMST 2011
+Linux version 2.6.32-08667-g557a701 (root@nataly-hostx) (gcc version 4.4.5 (Gentoo 4.4.5 p1.2, pie-0.4.5) ) #12 SMP Sat Apr 9 00:10:57 SAMST 2011
 Command line: root=/dev/sda6
 BIOS-provided physical RAM map:
  BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
@@ -118,9 +118,11 @@
 Console: colour VGA+ 80x25
 console [tty0] enabled
 hpet clockevent registered
-Fast TSC calibration using PIT
-Detected 1808.004 MHz processor.
-Calibrating delay loop (skipped), value calculated using timer frequency.. 3616.00 BogoMIPS (lpj=7232016)
+Fast TSC calibration failed
+TSC: Unable to calibrate against PIT
+TSC: using HPET reference calibration
+Detected 1807.934 MHz processor.
+Calibrating delay loop (skipped), value calculated using timer frequency.. 3615.86 BogoMIPS (lpj=7231736)
 Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
 Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes)
 Mount-cache hash table entries: 256

Created attachment 53872 [details]
dmesg with 2.6.32-08667-g557a701 after shutdown

After a full shutdown, the difference described in comment #33 does not appear.

Reassigning on the basis of comment 30...

I'm on 2.6.38.8-32.fc15.x86_64 and I can reproduce it pretty consistently: I just need to compile/distcheck a few modules in GNOME and it dies. I'm guessing either high CPU load or high temperature triggers it.

hw: ath5k phy0: Atheros AR2425 chip found (MAC: 0xe2, PHY: 0x70)

(sorry, hit send too soon)

It's great that kernel bugzilla is back. Can you please verify whether the problem still exists in the latest upstream kernel?

I still have it on a vanilla 32-bit SMP 3.3.3 with an AR5414.

I upgrade kernels very often, and I first noticed the problem with the 3.3 series (3.3.2, to be precise). I've had these kernels in the 3.2/3.3 series: 3.2.2, 3.2.3, 3.2.4, 3.2.5, 3.2.6, 3.2.7, 3.2.8, 3.2.9, 3.2.11, 3.3.2, 3.3.3.

Hmm, come to think of it, maybe it's a different bug: I don't suspend/resume at all. My driver just stops working once in a while.

Created attachment 73219 [details]
The problem is still there for Linux 3.3.4
This is a grep from my message log done with
tail -n 80000 /var/log/messages | fgrep -a -e 'ath5k' -e 'May 6 06:18:28' -e 'syslog-ng starting up' | head -n 1300 > ath5k.log
It shows that the problem is still there for Linux 3.3.4.
This is a desktop install with no suspend/resume features enabled, just the cpufreq infrastructure.
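The "gain calibration timeout (NNNNMHz)" messages in these logs name a center frequency, not a channel number. The OpenWrt log earlier in the thread, for instance, walks through 2412-2484 MHz and then 5200-5825 MHz. That pattern suggests the timeout fires on every channel the card is tuned to during a scan, rather than on one bad channel.

The sketch below translates such log entries using the standard IEEE 802.11 channel grid:

- 2.4 GHz: f = 2407 + 5 × ch for channels 1-13, and 2484 MHz for channel 14.
- 5 GHz: f = 5000 + 5 × ch.

The helper name is hypothetical, the upper 5 GHz bound is an arbitrary cut-off for this sketch, and it ignores other bands.

```c
#include <assert.h>

/* Hypothetical helper: 802.11 channel for a logged center frequency in MHz, or -1. */
static int freq_mhz_to_channel(int mhz)
{
    if (mhz == 2484)
        return 14;                    /* channel 14 sits off the 5 MHz grid */
    if (mhz >= 2412 && mhz <= 2472 && (mhz - 2407) % 5 == 0) {
        int ch = (mhz - 2407) / 5;    /* 2.4 GHz: f = 2407 + 5 * ch */
        assert(ch >= 1 && ch <= 13);
        return ch;
    }
    if (mhz > 5000 && mhz < 5950 && mhz % 5 == 0)
        return (mhz - 5000) / 5;      /* 5 GHz: f = 5000 + 5 * ch */
    return -1;
}
```

For example, the frequently repeated 2412 MHz and 2457 MHz entries are channels 1 and 10. The last 5 GHz entry, 5825 MHz, is channel 165.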
(In reply to comment #41)
> Created an attachment (id=73219) [details]
> The problem is still there for Linux 3.3.4
>
> This is a grep from my message log done with
>
> tail -n 80000 /var/log/messages | fgrep -a -e 'ath5k' -e 'May 6 06:18:28' -e
> 'syslog-ng starting up' | head -n 1300 > ath5k.log
>
> it shows that the problem is still there for Linux 3.3.4
>
> this is a desktop install no suspend/resume features enabled just cpufreq
> infrastructure

Here is lspci -v:

04:06.0 Ethernet controller: Atheros Communications Inc. AR2413/AR2414 Wireless Network Adapter [AR5005G(S) 802.11bg] (rev 01)
        Subsystem: Atheros Communications Inc. Compex Wireless 802.11 b/g MiniPCI Adapter, Rev A1 [WLM54G]
        Flags: bus master, medium devsel, latency 168, IRQ 20
        Memory at fdce0000 (32-bit, non-prefetchable) [size=64K]
        Capabilities: [44] Power Management version 2
        Kernel driver in use: ath5k

and ./ath_info 0xfdce0000:

 -==Device Information==-
MAC Revision: 2413 (0x78)
Device type: 2
2GHz PHY Revision: 2413 (0x56)
/============== EEPROM Information =============\
| EEPROM Version: 5.3 | EEPROM Size: 16 kbit |
| EEMAP: 2 | Reg. Domain: 0x809C |
|================= Capabilities ================|
| 802.11a Support: no | Turbo-A disabled: no |
| 802.11b Support: yes | Turbo-G disabled: yes |
| 802.11g Support: yes | 2GHz XR disabled: no |
| RFKill Support: no | 5GHz XR disabled: no |
| 32kHz Crystal: no | |
\===============================================/
/=========================================================\
| Calibration data common for all modes |
|=========================================================|
| CCK/OFDM gain delta: 1 |
| CCK/OFDM power delta: 251 |
| Scaled CCK delta: 5 |
| 2GHz Antenna gain: 1 |
| 5GHz Antenna gain: 0 |
| Turbo 2W maximum dBm: 38 |
| Target power start: 0x17c |
| EAR Start: 0x1d6 |
\=========================================================/
/=========================================================\
| Calibration data for 802.11b operation |
|=========================================================|
| I power: 0x00 | Q power: 0x00 |
| Use fixed bias: 0x00 | Max turbo power: 0x00 |
| Max XR power: 0x00 | Switch Settling Time: 0x23 |
| Tx/Rx attenuation: 0x1c | TX end to XLNA On: 0x00 |
| TX end to XPA Off: 0x00 | TX end to XPA On: 0x07 |
| 62db Threshold: 0x1c | XLNA gain: 0x0c |
| XPD: 0x01 | XPD gain: 0x0a |
| I gain: 0x00 | Tx/Rx margin: 0x19 |
| False detect backoff: 0x00 | Noise Floor Threshold: -1 |
| ADC desired size: -38 | PGA desired size: -80 |
|=========================================================|
| Antenna control 0: 0x00 | Antenna control 1: 0x02 |
| Antenna control 2: 0x25 | Antenna control 3: 0x25 |
| Antenna control 4: 0x21 | Antenna control 5: 0x21 |
| Antenna control 6: 0x01 | Antenna control 7: 0x26 |
| Antenna control 8: 0x26 | Antenna control 9: 0x22 |
| Antenna control 10: 0x22 | Antenna control 11: 0x00 |
|=========================================================|
| Octave Band 0: 0 | db 0: 0 |
| Octave Band 1: 3 | db 1: 5 |
| Octave Band 2: 0 | db 2: 0 |
| Octave Band 3: 0 | db 3: 0 |
\=========================================================/
/============== Per rate power calibration ===========\
| Freq | 1Mbit/s | 2Mbit/s | 5.5Mbit/s | 11Mbit/s |
|======|============|==========|===========|==========|
| 2412 | 19.01 | 19.01 | 19.01 | 19.01 |
|======|============|==========|===========|==========|
| 2484 | 19.01 | 19.01 | 19.01 | 19.01 |
\=====================================================/
/====================== Per channel power calibration ===================\
| Freq | pwr_i | pwr_0 | pwr_1 | pwr_2 | pwr_3 |
| | pddac_i | pddac_0 | pddac_1 | pddac_2 | pddac_3 |
|======|=========|=============|=============|=============|=============|
| 2412 | | | | | |
|------|---------|-------------|-------------|-------------|-------------|
| | 0 | 4.00 | 8.50 | 13.00 | 0.00 |
| | 7 | 16 | 37 | 75 | 0 |
|------|---------|-------------|-------------|-------------|-------------|
| | 10 | 14.00 | 18.00 | 22.00 | 25.00 |
| | 12 | 22 | 41 | 69 | 101 |
|======|=========|=============|=============|=============|=============|
| 2472 | | | | | |
|------|---------|-------------|-------------|-------------|-------------|
| | 0 | 4.00 | 8.50 | 13.00 | 0.00 |
| | 9 | 19 | 44 | 89 | 0 |
|------|---------|-------------|-------------|-------------|-------------|
| | 10 | 14.00 | 18.50 | 22.50 | 25.00 |
| | 14 | 25 | 47 | 79 | 111 |
\========================================================================/
/=========================================================\
| Calibration data for 802.11g operation |
|=========================================================|
| I power: 0x00 | Q power: 0x10 |
| Use fixed bias: 0x01 | Max turbo power: 0x26 |
| Max XR power: 0x26 | Switch Settling Time: 0x31 |
| Tx/Rx attenuation: 0x1c | TX end to XLNA On: 0x00 |
| TX end to XPA Off: 0x00 | TX end to XPA On: 0x0e |
| 62db Threshold: 0x1c | XLNA gain: 0x0c |
| XPD: 0x01 | XPD gain: 0x0a |
| I gain: 0x00 | Tx/Rx margin: 0x19 |
| False detect backoff: 0x00 | Noise Floor Threshold: -1 |
| ADC desired size: -38 | PGA desired size: -80 |
|=========================================================|
| Antenna control 0: 0x00 | Antenna control 1: 0x02 |
| Antenna control 2: 0x25 | Antenna control 3: 0x25 |
| Antenna control 4: 0x21 | Antenna control 5: 0x21 |
| Antenna control 6: 0x01 | Antenna control 7: 0x26 |
| Antenna control 8: 0x26 | Antenna control 9: 0x22 |
| Antenna control 10: 0x22 | Antenna control 11: 0x02 |
|=========================================================|
| Octave Band 0: 0 | db 0: 0 |
| Octave Band 1: 3 | db 1: 5 |
| Octave Band 2: 0 | db 2: 0 |
| Octave Band 3: 0 | db 3: 0 |
\=========================================================/
/==================== Turbo mode infos ===================\
| Switch Settling time: 0x62 | Tx/Rx margin: 0x19 |
| Tx/Rx attenuation: 0x1c | ADC desired size: -32 |
| PGA desired size: -80 | |
\=========================================================/
/============== Per rate power calibration ===========\
| Freq | 6-24Mbit/s | 36Mbit/s | 48Mbit/s | 54Mbit/s |
|======|============|==========|===========|==========|
| 2412 | 19.00 | 18.00 | 17.00 | 16.00 |
|======|============|==========|===========|==========|
| 2437 | 19.00 | 18.00 | 17.00 | 16.00 |
|======|============|==========|===========|==========|
| 2462 | 19.00 | 18.00 | 17.00 | 16.00 |
\=====================================================/
/====================== Per channel power calibration ===================\
| Freq | pwr_i | pwr_0 | pwr_1 | pwr_2 | pwr_3 |
| | pddac_i | pddac_0 | pddac_1 | pddac_2 | pddac_3 |
|======|=========|=============|=============|=============|=============|
| 2412 | | | | | |
|------|---------|-------------|-------------|-------------|-------------|
| | 0 | 4.00 | 8.50 | 13.00 | 0.00 |
| | 6 | 12 | 31 | 68 | 0 |
|------|---------|-------------|-------------|-------------|-------------|
| | 10 | 14.00 | 18.00 | 21.50 | 24.00 |
| | 11 | 20 | 35 | 61 | 86 |
|======|=========|=============|=============|=============|=============|
| 2437 | | | | | |
|------|---------|-------------|-------------|-------------|-------------|
| | 0 | 4.00 | 8.50 | 13.00 | 0.00 |
| | 1 | 11 | 29 | 68 | 0 |
|------|---------|-------------|-------------|-------------|-------------|
| | 10 | 14.00 | 18.50 | 21.50 | 24.50 |
| | 12 | 22 | 37 | 62 | 94 |
|======|=========|=============|=============|=============|=============|
| 2472 | | | | | |
|------|---------|-------------|-------------|-------------|-------------|
| | 0 | 4.00 | 8.50 | 13.00 | 0.00 |
| | 3 | 11 | 29 | 66 | 0 |
|------|---------|-------------|-------------|-------------|-------------|
| | 10 | 14.00 | 18.00 | 22.00 | 24.50 |
| | 11 | 22 | 38 | 65 | 97 |
\========================================================================/
GPIO registers: CR 0x00000003, DO 0x00000001, DI 0x00000011
STA_ID0: 74:ea:3a:d9:a5:c4
STA_ID1: 0x1000c4a5, AP: 0, IBSS: 0, KeyCache Disable: 0
TIMER0: 0x00000030, TBTT: 48, TU: 0x1a5c0030
TIMER1: 0x0007ffff, DMAb: 65535, TU: 0x1a5bffff (-49)
TIMER2: 0x01ffffff, SWBA: 65535, TU: 0x1a7fffff (+2359247)
TIMER3: 0x00000031, ATIM: 49, TU: 0x1a5c0031 (+1)
TSF: 0x000000696c13457e, TSFTU: 1233, TU: 0x1a5b04d1
BEACON: 0x00000000
LAST_TSTP: 0x6c12e19b

Managed to grab an ath_info dump with the driver in the "broken" state. This is with cpufreq completely disabled in menuconfig.

sdr@hristo ~/tmp/ath_info $ cat info_in_error.log
 -==Device Information==-
MAC Revision: 2413 (0x78)
Device type: 2
2GHz PHY Revision: 2413 (0x56)
/============== EEPROM Information =============\
| EEPROM Version: 5.3 | EEPROM Size: 16 kbit |
| EEMAP: 2 | Reg. Domain: 0x809C |
|================= Capabilities ================|
| 802.11a Support: no | Turbo-A disabled: no |
| 802.11b Support: yes | Turbo-G disabled: yes |
| 802.11g Support: yes | 2GHz XR disabled: no |
| RFKill Support: no | 5GHz XR disabled: no |
| 32kHz Crystal: no | |
\===============================================/
/=========================================================\
| Calibration data common for all modes |
|=========================================================|
| CCK/OFDM gain delta: 1 |
| CCK/OFDM power delta: 251 |
| Scaled CCK delta: 5 |
| 2GHz Antenna gain: 1 |
| 5GHz Antenna gain: 0 |
| Turbo 2W maximum dBm: 38 |
| Target power start: 0x17c |
| EAR Start: 0x1d6 |
\=========================================================/
/=========================================================\
| Calibration data for 802.11b operation |
|=========================================================|
| I power: 0x00 | Q power: 0x00 |
| Use fixed bias: 0x00 | Max turbo power: 0x00 |
| Max XR power: 0x00 | Switch Settling Time: 0x23 |
| Tx/Rx attenuation: 0x1c | TX end to XLNA On: 0x00 |
| TX end to XPA Off: 0x00 | TX end to XPA On: 0x07 |
| 62db Threshold: 0x1c | XLNA gain: 0x0c |
| XPD: 0x01 | XPD gain: 0x0a |
| I gain: 0x00 | Tx/Rx margin: 0x19 |
| False detect backoff: 0x00 | Noise Floor Threshold: -1 |
| ADC desired size: -38 | PGA desired size: -80 |
|=========================================================|
| Antenna control 0: 0x00 | Antenna control 1: 0x02 |
| Antenna control 2: 0x25 | Antenna control 3: 0x25 |
| Antenna control 4: 0x21 | Antenna control 5: 0x21 |
| Antenna control 6: 0x01 | Antenna control 7: 0x26 |
| Antenna control 8: 0x26 | Antenna control 9: 0x22 |
| Antenna control 10: 0x22 | Antenna control 11: 0x00 |
|=========================================================|
| Octave Band 0: 0 | db 0: 0 |
| Octave Band 1: 3 | db 1: 5 |
| Octave Band 2: 0 | db 2: 0 |
| Octave Band 3: 0 | db 3: 0 |
\=========================================================/
/============== Per rate power calibration ===========\
| Freq | 1Mbit/s | 2Mbit/s | 5.5Mbit/s | 11Mbit/s |
|======|============|==========|===========|==========|
| 2412 | 19.01 | 19.01 | 19.01 | 19.01 |
|======|============|==========|===========|==========|
| 2484 | 19.01 | 19.01 | 19.01 | 19.01 |
\=====================================================/
/====================== Per channel power calibration ===================\
| Freq | pwr_i | pwr_0 | pwr_1 | pwr_2 | pwr_3 |
| | pddac_i | pddac_0 | pddac_1 | pddac_2 | pddac_3 |
|======|=========|=============|=============|=============|=============|
| 2412 | | | | | |
|------|---------|-------------|-------------|-------------|-------------|
| | 0 | 4.00 | 8.50 | 13.00 | 0.00 |
| | 7 | 16 | 37 | 75 | 0 |
|------|---------|-------------|-------------|-------------|-------------|
| | 10 | 14.00 | 18.00 | 22.00 | 25.00 |
| | 12 | 22 | 41 | 69 | 101 |
|======|=========|=============|=============|=============|=============|
| 2472 | | | | | |
|------|---------|-------------|-------------|-------------|-------------|
| | 0 | 4.00 | 8.50 | 13.00 | 0.00 |
| | 9 | 19 | 44 | 89 | 0 |
|------|---------|-------------|-------------|-------------|-------------|
| | 10 | 14.00 | 18.50 | 22.50 | 25.00 |
| | 14 | 25 | 47 | 79 | 111 |
\========================================================================/
/=========================================================\
| Calibration data for 802.11g operation |
|=========================================================|
| I power: 0x00 | Q power: 0x10 |
| Use fixed bias: 0x01 | Max turbo power: 0x26 |
| Max XR power: 0x26 | Switch Settling Time: 0x31 |
| Tx/Rx attenuation: 0x1c | TX end to XLNA On: 0x00 |
| TX end to XPA Off: 0x00 | TX end to XPA On: 0x0e |
| 62db Threshold: 0x1c | XLNA gain: 0x0c |
| XPD: 0x01 | XPD gain: 0x0a |
| I gain: 0x00 | Tx/Rx margin: 0x19 |
| False detect backoff: 0x00 | Noise Floor Threshold: -1 |
| ADC desired size: -38 | PGA desired size: -80 |
|=========================================================|
| Antenna control 0: 0x00 | Antenna control 1: 0x02 |
| Antenna control 2: 0x25 | Antenna control 3: 0x25 |
| Antenna control 4: 0x21 | Antenna control 5: 0x21 |
| Antenna control 6: 0x01 | Antenna control 7: 0x26 |
| Antenna control 8: 0x26 | Antenna control 9: 0x22 |
| Antenna control 10: 0x22 | Antenna control 11: 0x02 |
|=========================================================|
| Octave Band 0: 0 | db 0: 0 |
| Octave Band 1: 3 | db 1: 5 |
| Octave Band 2: 0 | db 2: 0 |
| Octave Band 3: 0 | db 3: 0 |
\=========================================================/
/==================== Turbo mode infos ===================\
| Switch Settling time: 0x62 | Tx/Rx margin: 0x19 |
| Tx/Rx attenuation: 0x1c | ADC desired size: -32 |
| PGA desired size: -80 | |
\=========================================================/
/============== Per rate power calibration ===========\
| Freq | 6-24Mbit/s | 36Mbit/s | 48Mbit/s | 54Mbit/s |
|======|============|==========|===========|==========|
| 2412 | 19.00 | 18.00 | 17.00 | 16.00 |
|======|============|==========|===========|==========|
| 2437 | 19.00 | 18.00 | 17.00 | 16.00 |
|======|============|==========|===========|==========|
| 2462 | 19.00 | 18.00 | 17.00 | 16.00 |
\=====================================================/
/====================== Per channel power calibration ===================\
| Freq | pwr_i | pwr_0 | pwr_1 | pwr_2 | pwr_3 |
| | pddac_i | pddac_0 | pddac_1 | pddac_2 | pddac_3 |
|======|=========|=============|=============|=============|=============|
| 2412 | | | | | |
|------|---------|-------------|-------------|-------------|-------------|
| | 0 | 4.00 | 8.50 | 13.00 | 0.00 |
| | 6 | 12 | 31 | 68 | 0 |
|------|---------|-------------|-------------|-------------|-------------|
| | 10 | 14.00 | 18.00 | 21.50 | 24.00 |
| | 11 | 20 | 35 | 61 | 86 |
|======|=========|=============|=============|=============|=============|
| 2437 | | | | | |
|------|---------|-------------|-------------|-------------|-------------|
| | 0 | 4.00 | 8.50 | 13.00 | 0.00 |
| | 1 | 11 | 29 | 68 | 0 |
|------|---------|-------------|-------------|-------------|-------------|
| | 10 | 14.00 | 18.50 | 21.50 | 24.50 |
| | 12 | 22 | 37 | 62 | 94 |
|======|=========|=============|=============|=============|=============|
| 2472 | | | | | |
|------|---------|-------------|-------------|-------------|-------------|
| | 0 | 4.00 | 8.50 | 13.00 | 0.00 |
| | 3 | 11 | 29 | 66 | 0 |
|------|---------|-------------|-------------|-------------|-------------|
| | 10 | 14.00 | 18.00 | 22.00 | 24.50 |
| | 11 | 22 | 38 | 65 | 97 |
\========================================================================/
GPIO registers: CR 0x00000003, DO 0x00000001, DI 0x00000011
STA_ID0: 74:ea:3a:d9:a5:c4
STA_ID1: 0x1000c4a5, AP: 0, IBSS: 0, KeyCache Disable: 0
TIMER0: 0x00000030, TBTT: 48, TU: 0x00010030
TIMER1: 0x0007ffff, DMAb: 65535, TU: 0x0000ffff (-49)
TIMER2: 0x01ffffff, SWBA: 65535, TU: 0x003fffff (+4128719)
TIMER3: 0x00000031, ATIM: 49, TU: 0x00010031 (+1)
TSF: 0x00000000001e0162, TSFTU: 1920, TU: 0x00000780
BEACON: 0x00000000
LAST_TSTP: 0xda8c7183

I also use ath5k with the built-in wireless card in my laptop. The driver broke somewhere around 2.6.[23][0-9]. I did try to bisect it, but that kernel version contained so many ath5k changes that this was not possible. I then bought a card supported by rt61pci.ko (already marked as Linux-supported on the box) and it does a great job. I doubt that cpufreq had anything to do with it, at least on my machine.

I can confirm that with openSUSE 12.2 (based on kernel 3.4) ath5k works again for me. It's definitely worth giving the latest kernel a try!
As there have been different reports of how the bug shows up, compare with comment #40:
> Hmm, come to think of it, maybe it's a different bug - I don't
> suspend/resume at all. My driver just stops working once in a while.
I guess everybody should try a recent kernel; I could imagine the bug can be closed. Setting to NEEDINFO now.

Closing as obsolete. If this is still seen with modern kernels, please re-open and update.
Created attachment 27196
lspci -vv

On my Asus-F5N notebook the ath5k module works really well with the built-in Atheros AR5001 wireless adapter, until I put the laptop into suspend mode (via pm-suspend). After resume my syslog gets spammed with messages like "kernel: ath5k phy0: gain calibration timeout (2427MHz)". Whatever I do next (unloading and reloading ath5k, or even rebooting), I am _never_ able to get a working WLAN connection again. If I run 'iwlist wlan0 scan', no networks show up (i.e. 'no scan results'). There is just one way to get a working WLAN adapter again:
1) Power off
2) Power on
3) WLAN via ath5k works again (until the next pm-suspend, of course)

Please tell me which logs etc. you need to debug/fix this issue.

PS: I already tried the latest compat-wireless with the very same result. On Windows XP the wireless adapter works as expected after suspend/resume, so I think this can't just be some kind of hardware/firmware error.

FYI this also happens with madwifi (module ath_pci).