When I run ifconfig wlan0 up, the machine hard-locks: no mouse/keyboard, no ping response over wired Ethernet. Using the hardware rfkill switch, the machine becomes responsive again, but the interface remains down. The error is a repeating "ath5k: phy0: too many interrupts, giving up for now" punctuated occasionally by " With rfkill support compiled into the kernel, with the switch back on (radios on), it still believes the switch to be off (kill position) and I can no longer ifconfig up the device. Without rfkill support, I can bring the device up without a lockup; however, the only function I can get to work is iwlist wlan0 scanning. It can see my two APs according to the report, but if I run iwconfig wlan0 essid "Wire2" it does not find or associate. I then specified the AP's MAC address and channel, and it still refused to associate. (Receive only, perhaps?) Attached are lspci -vnn, dmesg of the errors during the event, iwconfig and ifconfig of the device after the event, and the kernel .config.
Created attachment 34462 [details] lspci -vnn
Created attachment 34472 [details] dmesg of event
Created attachment 34482 [details] kernel config
iwconfig after event:
wlan0     IEEE 802.11bg  ESSID:off/any
          Mode:Managed  Access Point: Not-Associated  Tx-Power=20 dBm
          Retry long limit:7  RTS thr:off  Fragment thr:off
          Encryption key:off
          Power Management:off
ifconfig after event:
wlan0     Link encap:Ethernet  HWaddr 00:16:e3:31:02:61
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
Interrupt storm on IFF_UP?
Any way for me to examine this?
It would show up in /proc/interrupts but it definitely sounds like an interrupt storm. Maybe you can turn on interrupt tracing to find out which interrupt is triggering? It'll fill your logs but you only have to have it on for a short period of time... use something like "modprobe ath5k debug=2; ifconfig wlan0 up; sleep .1; modprobe -r ath5k"
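One way to spot the storming line in /proc/interrupts is to take two snapshots and compare the counters. The following is a minimal Python sketch of that idea, not something taken from this report: the function names and the sample counters and device names are made up for illustration, and it assumes the usual layout of a header row of CPU columns followed by one row per interrupt.

```python
import time


def parse_interrupts(text):
    """Return {label: (total_count, description)} from /proc/interrupts text."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        for field in fields[:ncpu]:  # leading numeric columns, one per CPU
            if not field.isdigit():
                break
            counts.append(int(field))
        table[label.strip()] = (sum(counts), " ".join(fields[len(counts):]))
    return table


def busiest(before, after, top=3):
    """Return up to `top` (delta, label, description) tuples, largest delta first."""
    deltas = []
    for label, (count, desc) in after.items():
        delta = count - before.get(label, (0, ""))[0]
        if delta > 0:
            deltas.append((delta, label, desc))
    deltas.sort(reverse=True)
    return deltas[:top]


def sample_live(seconds=1.0, path="/proc/interrupts"):
    """Snapshot the live file twice, `seconds` apart, and rank the deltas."""
    with open(path) as f:
        before = parse_interrupts(f.read())
    time.sleep(seconds)
    with open(path) as f:
        after = parse_interrupts(f.read())
    return busiest(before, after)


# Hypothetical snapshots (invented numbers and device names, for illustration only).
SAMPLE_BEFORE = """\
           CPU0
  0:      51234   XT-PIC-XT        timer
  1:        812   XT-PIC-XT        i8042
 11:       4021   XT-PIC-XT        ath, eth0
NMI:          0   Non-maskable interrupts
"""
SAMPLE_AFTER = """\
           CPU0
  0:      51260   XT-PIC-XT        timer
  1:        812   XT-PIC-XT        i8042
 11:     904021   XT-PIC-XT        ath, eth0
NMI:          0   Non-maskable interrupts
"""

if __name__ == "__main__":
    ranked = busiest(parse_interrupts(SAMPLE_BEFORE), parse_interrupts(SAMPLE_AFTER))
    for delta, label, desc in ranked:
        print(f"IRQ {label}: +{delta}  {desc}")
```

On a live system, calling sample_live() around the ifconfig wlan0 up would do the same comparison, though on a machine that hard-locks the second snapshot can only be taken once the rfkill switch has released it.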
It's probably the GPIO interrupt used for the RFKill switch. It's weird, because we handle it correctly so far and I don't remember any similar bug reports. Maybe we should disable it if it generates too many interrupts (it should generate one interrupt on each switch), but then we won't be able to come back (maybe enable it periodically?). Also, if you want to debug an interrupt storm, it's possible that syslog will also hang, so before you bring up the interface, switch to your primary console, then bring up the interface and cat /proc/kmsg.
So, after using Bob's command string (output attached), modprobing with debug=2 had an unusual side effect: it now works properly. This is the first time I've had this card work in this laptop. (Extra info: Fujitsu laptop that was originally sold with Intel wifi cards but has the AR card in it now.) Given how it had been operating before, I assumed it would hang the system after the ifconfig up. It didn't; it completed and unloaded successfully. I was rather shocked. I tested by loading the module without debug: worked fine. An older kernel version? Worked fine. In fact, I'm posting this response using the wireless right now. I am attaching a new dmesg log, because it is still throwing a ton of "callback suppressed" and ath5k_intr messages in debug. To verify this didn't clear some state left over by Windows, I rebooted into Windows and back out, and after the reboots the card continued to function. Somehow, it appears that debug=2 enabled my card to work again without causing a massive interrupt storm. Is this possible? Nothing else was changed in terms of software or config (still running the posted config that generated these issues).
Created attachment 35122 [details] debug dmesg while working
(In reply to comment #8)
> It's probably the GPIO interrupt used for RFKill switch, it's weird because we
> handle it correctly so far and i don't remember any similar bug reports. Maybe
> we should disable it if it generates too many interrupts (it should generate an
> interrupt on each switch) but then we won't be able to come back (maybe enable
> it periodicaly ?). Also if you want to debug an interrupt storm it's possible
> that syslog will also hang so before you ifup the interface switch to your
> primary console, bring up the interface and cat /proc/kmsg.
Also, when the issue was happening as I originally described, after ifconfig up I would have been unable to give any input or commands at all, and I feel that even if I did ifconfig wlan0 up; cat /proc/kmsg it still would not have executed the cat command until I used the rfkill switch.
Had the problem again. Rebooted, executed the debug statements, and got the behavior described in the original bug report. Had to engage rfkill to unlock the computer. Attached are dmesg with ath5k debug=2 and a copy of /proc/interrupts after the event. Afterwards, I was able to reload ath5k and use the card; the first time before debug and the second time with debug, I was unable to make use of the card.
Created attachment 35162 [details] dmesg ath5k debug=2 with machine lockup
Created attachment 35172 [details] proc/interrupts after lockup event
Good call on the GPIO int, Nick -- that is what it seems to be (status 0x1000000/0x800814b5). Odd that it continues to be raised...? ath5k_hw_get_isr(ah, &status); /* NB: clears IRQ too */ Shouldn't that clear it? Is it at all possible that you have a switch that is flaky? Or not fully switched?
Wouldn't that make it nonfunctional under Windows? Windows respects the rfkill switch as far as I know. Would there be a userspace way to monitor this? (Perhaps something that adding rfkill support back into my config would show?) Yes, every time this happens, I can reboot into Windows, have working wifi, reboot out, and not have it in Linux.
Attempted using rfkill event per Linville's suggestion to gather data:
rfkill event
1288190719.228493: idx 0 type 1 op 0 soft 0 hard 0
modprobe -r ath5k
rfkill event
<no output>
modprobe ath5k
rfkill event
1288190782.311988: idx 1 type 1 op 0 soft 0 hard 0
Each time, I manipulated the rfkill switch repeatedly. I then reloaded the module, started rfkill event on a separate VT, and ifconfig up'd the interface. The machine locked; the output after unlocking with the rfkill switch is as follows:
1288191198.634367: idx 2 type 1 op 0 soft 0 hard 0
1288191211.716144: idx 2 type 1 op 2 soft 0 hard 1
Manipulating the switch afterwards yielded no more output, and ifconfig refused to attempt to re-raise the interface, saying that it was rfkill'd.
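For reading the numeric fields in that rfkill event output, here is a small Python sketch that decodes each line. The type and op names follow the enums in the kernel's linux/rfkill.h header as I understand them (type 1 is WLAN, op 0 is add, op 2 is change); the class and function names are my own and are not part of the rfkill tool.

```python
import re
from typing import NamedTuple, Optional

# Numeric values as defined by enum rfkill_type / enum rfkill_operation
# in linux/rfkill.h (only the ones relevant here are listed).
RFKILL_TYPES = {0: "all", 1: "wlan", 2: "bluetooth", 3: "uwb", 4: "wimax", 5: "wwan"}
RFKILL_OPS = {0: "add", 1: "del", 2: "change", 3: "change_all"}

LINE_RE = re.compile(
    r"(?P<ts>\d+\.\d+): idx (?P<idx>\d+) type (?P<type>\d+) "
    r"op (?P<op>\d+) soft (?P<soft>\d+) hard (?P<hard>\d+)"
)


class RfkillEvent(NamedTuple):
    timestamp: float
    idx: int
    type_name: str
    op_name: str
    soft_blocked: bool
    hard_blocked: bool


def parse_event(line: str) -> Optional[RfkillEvent]:
    """Decode one line of `rfkill event` output; return None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return RfkillEvent(
        timestamp=float(m["ts"]),
        idx=int(m["idx"]),
        type_name=RFKILL_TYPES.get(int(m["type"]), "unknown"),
        op_name=RFKILL_OPS.get(int(m["op"]), "unknown"),
        soft_blocked=m["soft"] != "0",
        hard_blocked=m["hard"] != "0",
    )


if __name__ == "__main__":
    # The two lines captured around the lockup in the comment above.
    captured = [
        "1288191198.634367: idx 2 type 1 op 0 soft 0 hard 0",
        "1288191211.716144: idx 2 type 1 op 2 soft 0 hard 1",
    ]
    for event in map(parse_event, captured):
        print(event)
```

Read this way, the capture shows the wlan device being added unblocked, then a single change event about 13 seconds later reporting it hard-blocked, with no later event reporting the block being cleared.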
Extra data while I have the card out of my laptop: labeled AR5BMB5/PA3458U-1MPC
It's interesting in the case where it worked, we still got a gpio interrupt but it wasn't re-triggered. I wonder if the windows driver does debouncing or something like that.
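To make the debouncing idea concrete, here is a generic sketch in Python. It is not ath5k code, and nothing in this report shows what the Windows driver actually does; the class name, the millisecond units, and the settle time are all assumptions for illustration. A change in the raw switch level is accepted only after the new level has been seen continuously for a settle period, so a line that chatters between levels never produces a state change.

```python
class Debouncer:
    """Accept a new switch level only after it has been stable for `settle_ms`."""

    def __init__(self, initial, settle_ms):
        self.state = initial        # last accepted (debounced) level
        self.settle_ms = settle_ms
        self._since = None          # when the pending new level was first seen

    def sample(self, now_ms, level):
        """Feed one raw reading; return True if the debounced state changed."""
        if level == self.state:
            self._since = None      # bounced back: drop the pending change
            return False
        if self._since is None:
            self._since = now_ms    # first sighting of the new level
            return False
        if now_ms - self._since >= self.settle_ms:
            self.state = level      # stable long enough: accept it
            self._since = None
            return True
        return False


if __name__ == "__main__":
    switch = Debouncer(initial=False, settle_ms=50)
    # Hypothetical raw readings: a bounce at 10 ms, then a steady new level.
    for now_ms, level in [(0, True), (10, False), (20, True), (30, True), (80, True)]:
        changed = switch.sample(now_ms, level)
        print(now_ms, level, "changed" if changed else "ignored")
```

In a driver the equivalent would more likely be a timer or delayed work that re-reads the GPIO line after the interrupt, but the acceptance rule would be the same.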
Would you like a copy of the windows driver I use on the laptop for dissecting?
I acquired a new Atheros card, AR5212, and it exhibits the same exact behavior.
01:0d.0 Ethernet controller [0200]: Atheros Communications Inc. Atheros AR5001X+ Wireless Network Adapter [168c:0013] (rev 01)
        Subsystem: Fujitsu Limited. Device [10cf:1234]
        Flags: bus master, medium devsel, latency 168, IRQ 11
        Memory at d0200000 (32-bit, non-prefetchable) [size=64K]
        Capabilities: [44] Power Management version 2
        Kernel driver in use: ath5k
        Kernel modules: ath5k
the behavior continues under 2.6.37-rc5
Upon further investigation into interrupt handling on the system, I have discovered that at least under Windows and Darwin when used on this laptop - the 8259 PIC controls all interrupts and the processor APIC does absolutely nothing - could this be a corner case not accounted for in the driver?
Have you tried booting with "noapic" on the kernel command line?
Closing due to lack of response...