After switching from madwifi to ath5k, I get kernel panics within minutes of using the ath5k driver. I'm attaching a photo of the panic dump. Here's the output of lspci (under madwifi/220.127.116.11):
05:00.0 Ethernet controller: Atheros Communications Inc. AR242x 802.11abg Wireless PCI Express Adapter (rev 01)
Subsystem: Askey Computer Corp. Device 7106
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 18
Region 0: Memory at f0800000 (64-bit, non-prefetchable) [size=64K]
Capabilities:  Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities:  MSI: Mask- 64bit- Count=1/1 Enable-
Address: 00000000 Data: 0000
Capabilities:  Express (v1) Legacy Endpoint, MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <128ns, L1 <2us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr- TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <512ns, L1 <64us
ClockPM- Surprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 128 bytes Disabled- Retrain- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
Capabilities:  MSI-X: Enable- Mask- TabSize=1
Vector table: BAR=0 offset=00000000
PBA: BAR=0 offset=00000000
Capabilities:  Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
AERCap: First Error Pointer: 14, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities:  Virtual Channel <?>
Kernel driver in use: ath_pci
Kernel modules: ath_pci
Created attachment 20783
a photo of the dump
(In reply to comment #1)
> Created an attachment (id=20783)
> a photo of the dump
Shot in the dark, but do you get the same with 18.104.22.168? There was a patch in it that might help. Also are you using adhoc or managed mode?
Tried out 22.214.171.124, unfortunately the system still duly freezes within minutes. I'm using managed mode (with WPA2, if that matters).
If there's something more I can do to help debug the problem, please let me know, as I have no experience with debugging kernel issues.
Can you post your config?
Also, if you could turn off automatic association with APs, then try to grab a scan with iw, that might help:
$ sudo iw dev wlan0 scan trigger
# do this a few times
$ sudo iw dev wlan0 scan dump >> dump.log
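The "do this a few times" step can be wrapped in a small loop. This is only a sketch: the function name scan_rounds, the SCAN_WAIT variable, and the defaults (three rounds, a 5-second pause) are assumptions for illustration; only the two iw commands come from the comment above.

```shell
# Hypothetical helper: trigger a scan several times and append each dump to a log.
# Arguments (all optional): interface, number of rounds, log file.
scan_rounds() {
    iface="${1:-wlan0}"
    rounds="${2:-3}"
    log="${3:-dump.log}"
    i=0
    while [ "$i" -lt "$rounds" ]; do
        # A trigger may fail (e.g. "Operation not supported"); report it on
        # stderr and carry on so the remaining rounds still run.
        sudo iw dev "$iface" scan trigger || echo "scan trigger failed (round $i)" >&2
        # Assumed pause so the scan has time to finish before dumping.
        sleep "${SCAN_WAIT:-5}"
        sudo iw dev "$iface" scan dump >> "$log"
        i=$((i + 1))
    done
}
```

Running `scan_rounds wlan0 3 dump.log` should then leave three appended dumps in dump.log, provided the kernel and iw support scanning.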
Created attachment 20904
handle rate control errors with a warning
Can you try this patch and report whether it helps, and if so which warnings it produces?
I'm posting my config. The iw scan (both of the commands) fails with a:
command failed: Operation not supported (-95).
(In reply to comment #5)
> Can you try this patch and report whether it helps, and if so which warnings
I'll try out the patch and post the results.
Created attachment 20921
(In reply to comment #6)
> I'm posting my config. The iw scan (both of the commands) fails with a:
> command failed: Operation not supported (-95).
Ok, thank you. That's fine; it probably requires a very recent kernel and iw
(e.g. wireless-testing and iw from git).
> (In reply to comment #5)
> > Can you try this patch and report whether it helps, and if so which
> > warnings it produces?
> I'll try out the patch and post the results.
Ok great, thanks!
Tried out the patch, unfortunately my kernel still panics very quickly. I'm attaching the warnings I get.
Created attachment 20939
excerpt from /var/log/messages.log
(In reply to comment #9)
> Tried out the patch, unfortunately my kernel still panics very quickly. I'm
> attaching the warnings I get.
So it actually panics after it emits the warning? Or it just emits the warning?
> Apr 10 15:15:25 ogi-laptop kernel: [ 83.710063] minstrel: invalid rate
> report 1 (n=1)
So minstrel actually has only one available rate, that sounds messed up. What sounds even more messed up is that it's asking us to send on a rate that isn't supported.
By any chance does this help:
(In reply to comment #11)
> So it actually panics after it emits the warning? Or it just emits the
Yes it panics afterwards, but only later, and not immediately after the warning.
> By any chance does this help:
Will try it out and post the results.
Unfortunately, still no cigar. I get the same warning, and the kernel still panics later.
BTW, I noticed that the kernel stack trace now looks slightly different from the original one I posted: it now ends in ath5k_tx. I don't know exactly when this changed since the original 2.6.29. I'm attaching a photo of the trace.
Created attachment 21082
Kernel stack trace photo
I installed crda and this resolved the issue for me. No more panics nor warnings in messages.log.
Still, I'm not closing the bug, because I'm not sure this is the intended behaviour - from what I gather, my wireless should still work without crda, just with (possibly) less available channels. So I'm letting someone more knowledgeable decide.
(In reply to comment #15)
> I installed crda and this resolved the issue for me. No more panics nor
> warnings in messages.log.
> Still, I'm not closing the bug, because I'm not sure this is the intended
> behaviour - from what I gather, my wireless should still work without crda,
> just with (possibly) less available channels. So I'm letting someone more
> knowledgeable decide.
Ahhhh very interesting. Thank you for tracking this down, this helps a lot. No, the kernel shouldn't crash without crda. I'll remove it from my test system and see if I can reproduce.
Sorry, scrap that - it's unrelated to crda. I was trying it out on a different AP. Only then I realized that your hunch about rates was spot on. My AP was set to a fixed TX rate of 2 Mbps (duh!), causing the failure.
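For anyone hitting something similar: on a kernel and iw new enough for scan dump to work, the saved dump can be checked for APs that advertise very few rates. A rough sketch, assuming iw prints "Supported rates:" and "Extended supported rates:" lines for each BSS; show_rates is a made-up helper name, and whether a fixed TX rate on the AP actually shows up in these lines depends on the AP.

```shell
# Hypothetical helper: list each BSS in a saved `iw dev <if> scan dump` log
# together with its SSID and the rate lines it advertises.
show_rates() {
    # Keep the BSS header, the SSID line, and both kinds of rate lines
    # ("Supported rates:" and "Extended supported rates:").
    grep -E '^BSS |^[[:space:]]*SSID:|[Ss]upported rates:' "${1:-dump.log}"
}
```

A BSS that shows a single entry, such as "Supported rates: 2.0*", would be a candidate for the one-rate situation described above.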
Interestingly, if I use hostapd with only a 2 Mbps rate, I eventually get a lockup on the client side, and in a couple of cases the machine running the AP has also panicked. It seems different from the stack trace above, but I haven't been fully successful in capturing all the relevant logs yet.
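For reference, a setup along these lines might be reproduced with a hostapd config restricted to a single rate. This is a guess at the setup, not the config actually used: the interface, driver, SSID, hw_mode, and channel values are placeholders, and write_2mbps_conf is a hypothetical helper. hostapd expresses supported_rates and basic_rates in units of 100 kbps, so 20 means 2 Mbps.

```shell
# Hypothetical helper: write a minimal hostapd config that advertises only 2 Mbps.
# Optional argument: output path (defaults to hostapd-2mbps.conf).
write_2mbps_conf() {
    cat > "${1:-hostapd-2mbps.conf}" <<'EOF'
# Placeholder values; adjust for the test machine.
interface=wlan0
driver=nl80211
ssid=rate-test
hw_mode=b
channel=1
# Rates are in units of 100 kbps: advertise only 2 Mbps.
supported_rates=20
basic_rates=20
EOF
}
```

After `write_2mbps_conf`, starting the AP with `hostapd hostapd-2mbps.conf` and associating a client should give a single-rate test environment, assuming the driver accepts the restricted rate set.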
Patch is here:
Sorry for taking so long to respond, but I didn't have the time to test this properly. Works like a charm, still running after a couple of hours of regular load. Many thanks.