Bug 13581
Summary: | ath9k doesn't work with newer kernels | ||
---|---|---|---|
Product: | Networking | Reporter: | Matteo Croce (rootkit85) |
Component: | Wireless | Assignee: | Luis Chamberlain (mcgrof) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | adam, andrej, ath9k-devel, info, j, linville, mcgrof, rjw, sujith |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.30 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 13070 | ||
Attachments: | modprobe ath9k debug=0xffffffff, iwlist wlan0 scan, ram align hack | ||
Description
Matteo Croce
2009-06-19 12:04:19 UTC
ioctl[SIOCSIWSCAN]: Device or resource busy

Hm, can you try this:

iwlist wlan0 scan

and provide the dmesg output of that. Are there no other instances of wpa_supplicant running? Can you also install the latest iw and provide the output of 'iw event -t'?

http://wireless.kernel.org/en/users/Documentation/iw
http://wireless.kernel.org/en/users/Documentation/Reporting_bugs

Do you happen to have NetworkManager running in the background when you start wpa_supplicant manually? It is known to disconnect the connection created by wpa_supplicant if it was not the one asking for the connection in the first place; this results in a dmesg output that looks like the one shown here. If you do not have NetworkManager (or some other software that could behave similarly) running, please attach more verbose debug output from wpa_supplicant (-ddt on command line).

Yes, I know that NetworkManager and connman disconnect me, but I was starting the only wpa_supplicant instance by hand.

Matteo, can you provide more details as I asked?

Sure:

root@macbook-luca:~# iwlist wlan0 scan
wlan0 Interface doesn't support scanning : Network is down
root@macbook-luca:~# ifconfig wlan0 up
root@macbook-luca:~# iwlist wlan0 scan
wlan0 Interface doesn't support scanning : Device or resource busy
root@macbook-luca:~# iw event -t
^C
root@macbook-luca:~# iw event -t
# start wpa_supplicant
# kill wpa_supplicant
1245891191.161903: wlan0 (phy #0): scan aborted
# start wpa_supplicant
# kill wpa_supplicant
1245891209.162864: wlan0 (phy #0): scan aborted

Matteo, are you sure you do not have the rfkill button pressed? If you do not have the rfkill button enabled, please try loading ath9k with debugging enabled. Please read:

http://wireless.kernel.org/en/users/Drivers/ath9k/debug

Please use 0xffffffff for debug and attach the compressed log here or somewhere for retrieval.
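For anyone who needs to produce the compressed log requested above, here is a minimal sketch of one way to do it. It assumes ath9k is built as a module, that you are root, and that gzip is installed. The function names and the default output file name are made up for this example; only the `debug=0xffffffff` module parameter comes from this report.

```shell
#!/bin/sh
# Sketch: reload ath9k with full debugging and save a compressed kernel log.
# Assumptions: run as root, ath9k is a loadable module, gzip is available.
# Function names and the default file name are hypothetical.

DEBUG_MASK=0xffffffff   # debug mask requested in this bug report

# Compress whatever arrives on stdin into the file named by $1.
save_compressed() {
    gzip -c > "$1"
}

# Unload and reload the driver with debugging, then dump and compress dmesg.
capture_ath9k_debug() {
    out=${1:-ath9k-debug.log.gz}
    modprobe -r ath9k &&
    modprobe ath9k debug="$DEBUG_MASK" &&
    dmesg | save_compressed "$out"
}
```

Calling `capture_ath9k_debug /tmp/ath9k-debug.log.gz` after sourcing the file should leave a gzip file that can be attached here. Nothing runs until that function is called, so sourcing the file does not touch the driver.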
Can't test now, the notebook is far away from here, but MacBooks don't have a wifi button.

Please provide feedback.

I have no rfkill module loaded.

Please provide the compressed log of running ath9k with debugging enabled.

I have the same problem. My ath9k driver is broken with 2.6.31-rc8-zen1. The strange thing is that iwlist wlan0 scan is working, while ifup fails. I've attached my dmesg output with the debug=0xffffffff option on.

Created attachment 23019 [details]
modprobe ath9k debug=0xffffffff
dmesg dump of modprobe ath9k debug=0xffffffff on an Amilo xa 3530 laptop.
Created attachment 23023 [details]
iwlist wlan0 scan
Dump of dmesg of iwlist wlan0 scan.
Exactly the same problem here, since 2.6.30. 2.6.31 is affected as well.

Here comes an *important* note: This probably has nothing to do with ath9k. I can see exactly the same problem with both ipw2200 and ath9k. AFAIK, the former doesn't even use the mac80211 module. The problem could be somewhere much deeper than in ath9k. There could be something wrong with wpa_supplicant or with the wireless extensions API implementation it talks to.

BTW, sometimes ping6 -q -i .001 <address-of-my-server> helps, but it takes up to 15 seconds of this terrible ping flood before the network starts working again. Lower packet frequencies mostly don't help and the interface needs to be brought down and up again.

It seems to me that ping6 -q -i .1 <my-server> significantly reduces the probability of total network freezes. Disassociations still *do* occur, but the connection recovers automatically in most cases, unlike situations with no ping at all. Unfortunately, this recovery usually takes a couple of seconds, which is just enough to interrupt all the data streams and VoIP calls and make the user scream with anger.

I'm having a similar problem here with 2.6.31.4 in Arch Linux on an Eee 1000HE. It seems like it might be rfkill related, since when I have bluetooth enabled I can usually get a connection, but it will disconnect me shortly after. If I have bluetooth disabled, I cannot get a connection at all and get:

SIOCSIFFLAGS: Unknown error 132

which is rfkill related. However, I don't know if this is asus-laptop or ath9k related, so I will be creating a new report when I get home and have access to my laptop again.

This does indeed seem rfkill related; note that rfkill was completely rewritten for the 2.6.31 kernel.
Please try out the new rfkill userspace application to see if you can query the rfkill status:

http://wireless.kernel.org/en/users/Documentation/rfkill

I think there is support for a command:

rfkill unblock all

I didn't get to play around too much, but I was able to try the userspace application. It looks like eee-laptop exposes a second set of devices(?) and when I enable or disable one, things act strangely. However, one time I was able to reboot, unblock all, and everything worked fine. Hopefully tonight or tomorrow I will be able to do some more testing and post results.

Can you try:

git revert 5d423ccd7ba4285f1084e91b26805e1d0ae978ed

> git revert 5d423ccd7ba4285f1084e91b26805e1d0ae978ed
Does this apply to eee only or to ath9k supported devices in general? (I have a PCMCIA device and don't see any "second set of devices" as reported by Adam.)
Honestly, I don't know how to get the source. Cloned the kernel repository, switched branch to v2.6.31 and reverted the change -- that worked fine. But the kernel source I obtained was 2.6.31, not 2.6.31.5 (the version I'm using right now), judging by the Makefile. There was no tag called v2.6.31.5 on the list.
Should I just ignore this and try 2.6.31? Or is there a better way to get the source and revert one commit? I guess there are more GIT paths to play with and I just cloned from the wrong one...
(In reply to comment #19)
> Can you try:
>
> git revert 5d423ccd7ba4285f1084e91b26805e1d0ae978ed

On 2.6.31.5 or 2.6.32?

No, the commit 5d423ccd7ba4285f1084e91b26805e1d0ae978ed could be affecting other devices as well. To get the tree:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/hpa/linux-2.6-allstable.git

Since you were on 2.6.31 go ahead and just try that:

git checkout -b linux-2.6.31.5 v2.6.31.5

Compile and test that to confirm it does not work. Then revert the patch I am indicating to you:

git checkout -b linux-2.6.31.5-revert-applied
git revert 5d423ccd7ba4285f1084e91b26805e1d0ae978ed

Compile and test that, and see if your ath9k then works. If the revert does fix it, then please try a patch for 2.6.31.5 which might fix the issue without a full revert of the patch in question. I will attach the patch next. The patch applies on a clean 2.6.31.5, so you would do:

git checkout linux-2.6.31.5
git checkout -b linux-2.6.31.5-with-new-fix
git am ram-align-hack.patch

Compile and test that.

Created attachment 23664 [details]
ram align hack
This is the ram-align-hack.patch; apply it on top of 2.6.31.5 and compile/test. This is supposed to fix the issue introduced by the patch I asked you to revert, but it *might* not fix it; I would like your test results.
Testing this patch is pointless unless you confirm reverting the patch in question helps.
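The three trees described above (baseline, revert, and patched) can be sketched as a small shell helper. This is only an illustration, not something posted in the report. The commit id, tag, branch names and patch file name are the ones given in this thread. The `run` wrapper and the `DRY_RUN` switch are assumptions added so that the commands are printed rather than executed by default. It has to be sourced from inside the cloned linux-2.6-allstable tree.

```shell
#!/bin/sh
# Sketch of the baseline / revert / patched test trees from this report.
# DRY_RUN and run() are hypothetical additions: with DRY_RUN=1 (the default
# here) the git commands are only printed, not executed.

COMMIT=5d423ccd7ba4285f1084e91b26805e1d0ae978ed
TAG=v2.6.31.5
BASE=linux-2.6.31.5
PATCH=ram-align-hack.patch

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# 1. Clean 2.6.31.5: build and confirm the problem is present.
baseline() {
    run git checkout -b "$BASE" "$TAG"
}

# 2. Same tree with the suspect commit reverted.
reverted() {
    run git checkout "$BASE" &&
    run git checkout -b "$BASE-revert-applied" &&
    run git revert "$COMMIT"
}

# 3. Clean 2.6.31.5 plus the attached patch instead of the revert.
patched() {
    run git checkout "$BASE" &&
    run git checkout -b "$BASE-with-new-fix" &&
    run git am "$PATCH"
}
```

Run `baseline`, `reverted` and `patched` in that order, building and booting the kernel after each step. Set `DRY_RUN=0` to actually execute the git commands. The extra `git checkout "$BASE"` in `reverted` only makes the step repeatable; it is not in the original instructions.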
Also please append "debug" to the kernel parameters line. On grub this means editing /boot/grub/menu.lst and adding debug to your specific kernel line. Then please post your full dmesg output on each boot: with the 2.6.31.5 kernel, with the revert, and then with the new ram-align-hack.patch.

Unfortunately, only my Eee uses ath9k, and it takes quite a while to finish compiling and I need it for school. So it might not be until Saturday that I can try this. I will report back when I finish the compiling and testing.

Compiling a kernel on my eeepc takes 17 minutes...

You could cross compile. This is simple if you have a beefy machine with the same architecture around. Just make sure you copy a good config over and use

make tar-pkg

Then untar that stuff to / on the eeepc. I had a hard time doing this myself on my eeepc when I had one, due to the amount of space on the eeepc. But if you have a ~4 GB USB stick this shouldn't be too bad. Don't forget to generate an initramfs if you need one.

(In reply to comment #26)
> Compiling a kernel on my eeepc takes 17 minutes...

Feel free to test this then; I will do it when I'm not at school, work, or spending time with my family.

Patch looks good so far here. My problem was somewhat intermittent, so if it comes back I'll open a new issue. Thanks again for all the help.

Both the reverted and the patched versions work just as badly as the original version and much, much worse than last week's compat-wireless. Failures occur about once in 30 minutes with compat-wireless. All the other versions fail every five minutes or so. That makes the user feel like throwing the whole computer out of the window. Those failures are almost "reproducible"... Just wave your hand quickly in front of the card's antenna and it fails immediately. As I have already said many times, this must be a rate control issue. I'm convinced that multiple rate control algorithms should be tested before trying anything else.
Surprisingly, most failures don't even get logged! The disassociation events correspond to the hardest failures that block the interface for minutes or forever. When shorter failures are "handled" and fixed by a flood ping in time, they don't get logged at all.

Take a look here:

http://bugzilla.kernel.org/show_bug.cgi?id=13807

This seems to be helping some people.

This issue seems to have been the align issue, which the user reported as fixed. The fix was upstream and non-ath9k related.

I can confirm too that it's fixed.