Bug 20962 - ath5k / AR2413 causes machine hard lockup on ifconfig wlan0 up
Summary: ath5k / AR2413 causes machine hard lockup on ifconfig wlan0 up
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless
Hardware: All, OS: Linux
Importance: P1 blocking
Assignee: drivers_network-wireless@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-10-23 00:25 UTC by Gary Sparkes
Modified: 2011-03-17 19:28 UTC
CC List: 4 users

See Also:
Kernel Version: 2.6.36
Subsystem:
Regression: No
Bisected commit-id:


Attachments
lspci -vnn (6.38 KB, text/plain)
2010-10-23 00:26 UTC, Gary Sparkes
dmesg of event (45.68 KB, text/plain)
2010-10-23 00:28 UTC, Gary Sparkes
kernel config (55.62 KB, text/plain)
2010-10-23 00:29 UTC, Gary Sparkes
debug dmesg while working (120.52 KB, text/plain)
2010-10-26 19:11 UTC, Gary Sparkes
dmesg ath5k debug=2 with machine lockup (32.40 KB, text/plain)
2010-10-27 02:36 UTC, Gary Sparkes
proc/interrupts after lockup event (632 bytes, text/plain)
2010-10-27 02:36 UTC, Gary Sparkes

Description Gary Sparkes 2010-10-23 00:25:59 UTC
When I run "ifconfig wlan0 up", the machine hard-locks: no mouse or keyboard input, and no ping response over wired Ethernet.

Toggling the hardware rfkill switch makes the machine responsive again, but the interface stays down.

The error is a repeating "ath5k: phy0: too many interrupts, giving up for now",
punctuated occasionally by "

With rfkill support compiled into the kernel, even after the switch is turned back on (radios on), the driver still believes it is off (kill position), and I can no longer bring the device up with ifconfig.

Without rfkill support, I can bring the device up without a lockup; however, the only function I can get to work is "iwlist wlan0 scanning". According to the scan it can see my two APs, but if I run iwconfig wlan0 essid "Wire2", it does not find or associate with the AP. I then specified the AP's MAC address and channel, and it still refused to associate. (Receive only, perhaps?)

Attached are lspci -vnn, dmesg of the errors during the event, iwconfig and ifconfig of the device after the event, and the kernel .config.
Comment 1 Gary Sparkes 2010-10-23 00:26:47 UTC
Created attachment 34462 [details]
lspci -vnn
Comment 2 Gary Sparkes 2010-10-23 00:28:19 UTC
Created attachment 34472 [details]
dmesg of event
Comment 3 Gary Sparkes 2010-10-23 00:29:07 UTC
Created attachment 34482 [details]
kernel config
Comment 4 Gary Sparkes 2010-10-23 00:32:03 UTC
iwconfig after event: 


wlan0     IEEE 802.11bg  ESSID:off/any  
          Mode:Managed  Access Point: Not-Associated   Tx-Power=20 dBm   
          Retry  long limit:7   RTS thr:off   Fragment thr:off
          Encryption key:off
          Power Management:off


ifconfig after event:


wlan0     Link encap:Ethernet  HWaddr 00:16:e3:31:02:61  
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
Comment 5 John W. Linville 2010-10-25 19:01:46 UTC
Interrupt storm on IFF_UP?
Comment 6 Gary Sparkes 2010-10-26 14:28:55 UTC
Is there any way for me to examine this?
Comment 7 Bob Copeland 2010-10-26 17:56:15 UTC
It would show up in /proc/interrupts, but it definitely sounds like an interrupt storm.

Maybe you can turn on interrupt tracing to find out which interrupt is firing? It will fill your logs, but you only need it on for a short period of time; use something like "modprobe ath5k debug=2; ifconfig wlan0 up; sleep .1; modprobe -r ath5k"
Comment 8 Nick Kossifidis 2010-10-26 18:49:33 UTC
It's probably the GPIO interrupt used for the RFKill switch. That's odd, because we handle it correctly so far and I don't remember any similar bug reports. Maybe we should disable it if it generates too many interrupts (it should generate one interrupt per switch toggle), but then we wouldn't be able to re-enable it (maybe enable it periodically?). Also, if you want to debug an interrupt storm, syslog may hang as well, so before you bring up the interface, switch to your primary console, then bring up the interface and cat /proc/kmsg.
Comment 9 Gary Sparkes 2010-10-26 19:09:47 UTC
So, after using Bob's command string (output attached), modprobing with debug=2 had an unusual side effect: the card now works properly. This is the first time I've had this card work in this laptop. (Extra info: this is a Fujitsu laptop that was originally sold with Intel Wi-Fi cards but now has the Atheros card in it.)

Given how it had been behaving, I assumed it would hang the system after the ifconfig up. It didn't: the commands completed and the module unloaded successfully. I was rather shocked. I then tested loading the module without debug: it worked fine. An older kernel version? Also worked fine. In fact, I'm posting this response over the wireless right now. I am attaching a new dmesg log, because it is still throwing a ton of "callbacks suppressed" and ath5k_intr messages in debug mode.


To verify that this wasn't just clearing some state left over by Windows, I rebooted into Windows and back, and after the reboots the card continued to function.

Somehow, it appears that loading with debug=2 let my card work without causing a massive interrupt storm. Is this possible? Nothing else was changed in terms of software or configuration (I am still running the posted config that produced these issues).
Comment 10 Gary Sparkes 2010-10-26 19:11:08 UTC
Created attachment 35122 [details]
debug dmesg while working
Comment 11 Gary Sparkes 2010-10-26 19:12:42 UTC
(In reply to comment #8)
> It's probably the GPIO interrupt used for RFKill switch, it's weird because
> we handle it correctly so far and i don't remember any similar bug reports.
> Maybe we should disable it if it generates too many interrupts (it should
> generate an interrupt on each switch) but then we won't be able to come back
> (maybe enable it periodicaly ?). Also if you want to debug an interrupt storm
> it's possible that syslog will also hang so before you ifup the interface
> switch to your primary console, bring up the interface and cat /proc/kmsg.

Also, when the issue was happening as I originally described, I was unable to give any input or commands at all after ifconfig up. I believe that even if I had run "ifconfig wlan0 up; cat /proc/kmsg", the cat command would not have executed until I used the rfkill switch.
Comment 12 Gary Sparkes 2010-10-27 02:35:06 UTC
Had the problem again. I rebooted, ran the debug commands, and got the behavior described in the original report; I had to engage rfkill to unlock the computer. Attached are the dmesg with ath5k debug=2 and a copy of /proc/interrupts after the event.

Afterwards, I was able to reload ath5k and use the card the first time (without debug); the second time, with debug, I was unable to use the card.
Comment 13 Gary Sparkes 2010-10-27 02:36:25 UTC
Created attachment 35162 [details]
dmesg ath5k debug=2 with machine lockup
Comment 14 Gary Sparkes 2010-10-27 02:36:57 UTC
Created attachment 35172 [details]
proc/interrupts after lockup event
Comment 15 John W. Linville 2010-10-27 14:08:15 UTC
Good call on the GPIO int, Nick -- that is what it seems to be (status 0x1000000/0x800814b5).  Odd that it continues to be raised...?

ath5k_hw_get_isr(ah, &status);          /* NB: clears IRQ too */

Shouldn't that clear it?

Is it at all possible that you have a switch that is flaky?  Or not fully switched?
Comment 16 Gary Sparkes 2010-10-27 14:11:20 UTC
Wouldn't that make it nonfunctional under Windows? As far as I know, Windows respects the rfkill switch. Is there a userspace way to monitor this? (Perhaps something that adding rfkill support back into my config would show?)

Yes: every time this happens, I can reboot into Windows and have working Wi-Fi, then reboot back into Linux and not have it there.
Comment 17 Gary Sparkes 2010-10-27 15:23:10 UTC
I tried "rfkill event", per Linville's suggestion, to gather data:

rfkill event
1288190719.228493: idx 0 type 1 op 0 soft 0 hard 0
modprobe -r ath5k
rfkill event
<no output>
modprobe ath5k
rfkill event
1288190782.311988: idx 1 type 1 op 0 soft 0 hard 0 

Each time, I toggled the rfkill switch repeatedly.

I then reloaded the module, started rfkill event on a separate VT, and brought the interface up with ifconfig. The machine locked; after unlocking it with the rfkill switch, the output was as follows.

1288191198.634367: idx 2 type 1 op 0 soft 0 hard 0
1288191211.716144: idx 2 type 1 op 2 soft 0 hard 1

Toggling the switch afterwards produced no more output, and ifconfig refused to bring the interface back up, saying that it was rfkill'd.
Comment 18 Gary Sparkes 2010-10-27 20:30:09 UTC
Extra data while I have the card out of my laptop:

The card is labeled AR5BMB5/PA3458U-1MPC.
Comment 19 Bob Copeland 2010-10-30 15:57:29 UTC
It's interesting that in the case where it worked, we still got a GPIO interrupt, but it wasn't re-triggered. I wonder if the Windows driver does debouncing or something like that.
Comment 20 Gary Sparkes 2010-10-31 13:17:20 UTC
Would you like a copy of the windows driver I use on the laptop for dissecting?
Comment 21 Gary Sparkes 2010-12-17 05:58:23 UTC
I acquired a new Atheros card, an AR5212, and it exhibits exactly the same behavior.

01:0d.0 Ethernet controller [0200]: Atheros Communications Inc.  Atheros AR5001X+ Wireless Network Adapter [168c:0013] (rev 01)
     Subsystem: Fujitsu Limited. Device [10cf:1234]
     Flags: bus master, medium devsel, latency 168, IRQ 11
     Memory at d0200000 (32-bit, non-prefetchable) [size=64K]
     Capabilities: [44] Power Management version 2
     Kernel driver in use: ath5k
     Kernel modules: ath5k
Comment 22 Gary Sparkes 2010-12-17 06:54:58 UTC
The behavior continues under 2.6.37-rc5.
Comment 23 Gary Sparkes 2011-02-06 03:50:29 UTC
Upon further investigation into interrupt handling on this system, I have found that, at least under Windows and Darwin on this laptop, the 8259 PIC handles all interrupts and the processor's APIC does nothing at all. Could this be a corner case not accounted for in the driver?
Comment 24 John W. Linville 2011-02-07 15:58:51 UTC
Have you tried booting with "noapic" on the kernel command line?
Comment 25 John W. Linville 2011-03-17 19:28:42 UTC
Closing due to lack of response...
