Bug 12110
Summary: | ath9k causes computer to hang after long data transmissions | ||
---|---|---|---|
Product: | Networking | Reporter: | Barry Green (barry) |
Component: | Wireless | Assignee: | Luis Chamberlain (mcgrof) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | aaronkelley, anders1, stubenschrott, sujith |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.27.10 (Ubuntu 2.6.27-11-generic) | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: |
Debug messages
wpa_supplicant messages before computer hang
syslog of communication with two AP
SMP fixes v3
Serialization v4
Backport of serialization patches -- v6 serialization for 2.6.27-2.6.29 |
Description
Barry Green
2008-11-27 03:23:01 UTC
You are reporting a bug on 2.6.27-7 (whatever that is) when using compat-wireless. This means the issue you are facing is not present on 2.6.27 but on wireless-testing. Please understand that compat-wireless comes from wireless-testing and as such it's under development. Our best bet is to take this to the lists, as we probably already have a fix for this. Please see the mailing lists for pending patches. John will soon update wireless-testing with any pending patches from last week.
2.6.27-7-generic is what "uname -a" reports for Ubuntu 8.10. However, it is not correct that the issue is not present on 2.6.27. I have also experienced this same issue on the stock 2.6.27 kernel as included with Ubuntu 8.10. I logged the issue as using compat-wireless because I thought it was a good idea to use the latest version of the ath9k code. So can this be reopened?
Yes, sorry about that. Please try disabling 11n and see if you get the issue. You seem to be able to reproduce it easily, so this should hopefully help. Also please try to reproduce it from a virtual terminal, so that if there is an oops you can at least take a picture of it. You can also try to enable more debugging messages in ath9k. You do that by changing drivers/net/wireless/ath9k/core.h as follows:
diff --git a/drivers/net/wireless/ath9k/core.h b/drivers/net/wireless/ath9k/core.h
index f0c5437..b831271 100644
--- a/drivers/net/wireless/ath9k/core.h
+++ b/drivers/net/wireless/ath9k/core.h
@@ -111,7 +111,7 @@ enum ATH_DEBUG {
 	ATH_DBG_ANY = 0xffffffff
 };
 
-#define DBG_DEFAULT (ATH_DBG_FATAL)
+#define DBG_DEFAULT (ATH_DBG_ANY)
 
 #define DPRINTF(sc, _m, _fmt, ...) do { \
 	if (sc->sc_debug & (_m)) \
That is, just change ATH_DBG_FATAL to ATH_DBG_ANY and recompile. This should work on either 2.6.27 or wireless-testing/compat-wireless. Please provide some debug messages if possible.
Created attachment 19120
Debug messages
I've attached the contents of /var/log/messages (chopped the middle out since there was a lot of repeated debug information). This is with ATH_DBG_ANY. The hang happened at timestamp Dec 2 23:47:12. You can see the restart message when I had to hard reboot at Dec 2 23:50:29. This time it did bring down my router again. I did switch to a virtual console but there was no debug output printed to the console. I also tried turning off 802.11n at the router, but still had the same problem. This log is from when 802.11n was on.
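The core.h change suggested above works because DPRINTF only prints when the message's category bit is also set in sc->sc_debug. Presumably sc_debug is initialized from DBG_DEFAULT, so widening that mask to ATH_DBG_ANY lets every category through. Below is a minimal userspace sketch of that mask test. Only ATH_DBG_ANY's value (all bits set) comes from the diff. The other category values, `fake_softc`, `dbg_enabled` and `DBG_PRINTF` are hypothetical stand-ins, not the driver's definitions.

```c
#include <stdio.h>

/* Hypothetical category bits for illustration only. The all-ones value
 * matches ATH_DBG_ANY in the core.h diff; the others are assumed. */
#define DBG_FATAL 0x00000001u   /* assumed value */
#define DBG_RESET 0x00000002u   /* assumed value */
#define DBG_XMIT  0x00000004u   /* assumed value */
#define DBG_ANY   0xffffffffu   /* same as ATH_DBG_ANY */

/* Stand-in for the driver's softc; only the debug mask is modelled. */
struct fake_softc {
	unsigned int sc_debug;
};

/* Same test as "if (sc->sc_debug & (_m))" in DPRINTF. */
static int dbg_enabled(const struct fake_softc *sc, unsigned int category)
{
	return (sc->sc_debug & category) != 0;
}

/* Print only when the category is enabled, in the style of DPRINTF. */
#define DBG_PRINTF(sc, m, ...)                  \
	do {                                    \
		if (dbg_enabled((sc), (m)))     \
			printf(__VA_ARGS__);    \
	} while (0)
```

With sc_debug set to DBG_FATAL, `DBG_PRINTF(&sc, DBG_XMIT, "tx\n")` prints nothing. With sc_debug set to DBG_ANY, every category passes the test. That is why the one-line change is enough to turn on all of the driver's debug output.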
Barry, please try these patches: http://www.kernel.org/pub/linux/kernel/people/mcgrof/patches/ath9k/2008-11-22/27-IOMMU-01/ Let me know how it goes.
Does the latest compat-wireless have these patches applied, and can I just use that? I'd rather not have to rebuild my whole kernel if at all possible.
Affirmative.
Now I don't get a connection, but the driver no longer hangs my PC. I'm not getting any association with the AP when using wpa_supplicant on 802.11n.
OK, an update. I've experimented and managed association with the AP on channel 10 only, in 802.11n mode, with WPA. However, the connection is so slow (less than 1KB/s) as to make it unusable. So far, any other channel I try to associate with (2, 3, 8 and 11) causes the whole computer to hang. I'm now using compat-wireless-2008-12-11, and am just about to recompile with ATH_DBG_ANY to get more information.
Created attachment 19286
wpa_supplicant messages before computer hang
Turning on ATH_DBG_ANY didn't give me any more messages than having it turned off. I've done some more testing and found the following:
- If I turn off encryption completely, I can associate with the AP, but again the transfer rate is less than 1KB/s.
- Both 11n and 11g cause the computer to hang, except when I associate on channel 10 as explained above.
I've attached a screenshot of the wpa_supplicant messages leading up to the hang.
Not sure where to go from here - any suggestions?
Please boot into single-user mode, then connect manually using iwconfig. See if you can reproduce the issue and get an oops message.
(In reply to comment #12)
> Not sure where to go from here - any suggestions?
Barry! I definitely reckon your issue with the PC hanging is due to the AMD CPU in your computer. I've run into problems on machines with some AMD CPUs much more often than on ones built on Intel architecture. For example, the latest FreeBSD 7.1 freezes on an AMD Phenom X3 even while booting. As for ath9k, my laptop with an Intel Pentium M 1.6 never hangs while connecting to an AP in 802.11n. I'm using channel 1. See my comment on the ath9k speed bug: http://bugzilla.kernel.org/show_bug.cgi?id=12373#c4
Please close this bug report if the issue is not present on a stable release of the kernel. If you are testing compat-wireless / wireless-testing stuff, then please use the linux-wireless mailing list to report issues.
OK, I have tried this on the 2.6.27-9 kernel that is now current in Ubuntu 8.10. With no encryption, I can associate and so far have not experienced a crash. With WPA2 encryption, I get the crash as soon as wpa_supplicant tries to associate with the AP. I will now try to use the nmi_watchdog setting to get an oops message. Will update in a few hours.
I can't get a crash on a console, only when I'm in GNOME (that's what I meant above when I mentioned that I still got the crash). I tried nmi_watchdog=1 (I do have IO-APIC) but it didn't give me anything, although as I say, I was in GNOME at the time.
Barry, Ubuntu's latest kernel is 2.6.27-11 from what I can understand. Can you confirm whether you can upgrade? On Ubuntu, please provide the output of:
cat /proc/version_signature
That will tell us the exact kernel version; "2.6.27-9" is completely useless to me. I need something like 2.6.27.9 or 2.6.27.11. Also, can you clarify what you mean about how you can get a crash?
Am I understanding correctly that you cannot reproduce a hard crash when on a console and not using GNOME?
Output of "cat /proc/version_signature": Ubuntu 2.6.27-9.19-generic
To clarify: I can only get a crash when using GNOME. You are understanding correctly in saying that I cannot reproduce a hard crash when on a console.
Unfortunately it seems Ubuntu started using /proc/version_signature to report the exact kernel version on Jaunty, not Intrepid, so this is useless to me. I did manage to find out what Ubuntu's "linux-image-2.6.27-9-generic" is based on, though: it's 2.6.27.2. This is ancient!!! You need at least 2.6.27.8 to get the latest critical ath9k fixes for 2.6.27. I checked with the Ubuntu kernel team, and they have a "2.6.27-12" kernel package in the works which will have all updates through 2.6.27.11; this is only available through their git tree right now. They do have some other kernels, but only through the intrepid-proposed sources. For example, their 2.6.27-11 kernel has updates through 2.6.27.10. Per Ubuntu policy, packages from -proposed are only moved into -updates (what users have by default) about once per quarter. Because of this, and since ath9k got its critical DMA updates at the beginning of December, Ubuntu users don't yet have anything newer than 2.6.27.2, which explains why some may be experiencing issues with ath9k. To get Ubuntu's 2.6.27-11 kernel, please add the following entries to your /etc/apt/sources.list:
# Ubuntu proposed changes
deb http://us.archive.ubuntu.com/ubuntu/ intrepid-proposed main restricted
deb-src http://us.archive.ubuntu.com/ubuntu/ intrepid-proposed main restricted
Then do:
sudo apt-get update
sudo apt-get install linux-image-2.6.27-12-generic
Please let me know if that fixes your issues.
Sorry, I meant:
sudo apt-get install linux-image-2.6.27-11-generic
as -12 is not yet available.
OK, I will try this later today.
However, note that above I was also testing with compat-wireless (I tried versions right up until the end of December last year) and got the same issue.
The end of December is old for wireless-testing purposes. Quite a few patches have gone in since then.
Tested with 2.6.27-11 as suggested above and I can still reproduce the issue. However, only while in GNOME and only when I set the router to n-only mode.
Barry, I have been stress testing ath9k with iperf over TCP on Ubuntu's 2.6.27-11, using an AR5416 abg CardBus card against a WRT610N AP with 802.11n enabled only on 5 GHz in an HT40 configuration with no encryption. It has been running for over 4 1/2 hours so far and is still cruising, with rates between 18 Mbps and 28 Mbps, mostly around 28 Mbps. No crashes, no outages. Can you please tell me what AP you have and what exact settings you are using for "802.11n only"? I need to know whether you are using HT20 or HT40, and whether you use encryption and if so what type. Also, can you please come up with steps to reproduce this? Are you transferring data? Are you using iperf? Are you idle? Since this only occurs when using GNOME, are you certain it's an ath9k issue? If so, what makes you believe that is the case? Are you using any proprietary graphics driver?
AP: I've tried 2 different APs, the Linksys WAG325N Wireless-N router and the Billion 7402NX (which is what I'm currently using). The problem can be reproduced with either.
Settings: The last crash I got was when using WPA2 with AES. I don't know if it's using HT20 or HT40, since the AP only has a setting to choose "20MHz" or "40/20MHz". Can I get this information from ath9k? Is there a way to force ath9k to use HT40?
I can get the crash by:
1. "modprobe ath9k".
2. Start wpa_supplicant.
3. Start to transfer a large file via FTP.
Within 3 minutes, I get the hang.
Obviously, without any hard evidence I can't be certain that it's an ath9k issue. However, my PC never hangs like this if I'm not trying to use ath9k.
Previously I was using the same wireless card with the madwifi driver and it was solid as a rock. I am using the proprietary nvidia graphics driver.
As far as I can tell, the HT configuration is negotiated with the AP. I will have to check exactly how that is worked out when the AP enables both. Anyway, I've just stress tested the driver against an 11n-only AP with WPA2 AES in both HT40 and HT20, and no crashes occurred. Can you run:
sudo rmmod ath9k
sudo dmesg -c > /dev/null
sudo modprobe -l ath9k
sudo modprobe ath9k
sudo dmesg -c
uname -a
and paste everything here? Also, can you run your tests after rmmod'ing the nvidia module? If it still crashes with the proprietary nvidia module present, can you also try with no encryption and see if that makes a difference? Since I cannot reproduce this, I want you to be able to narrow the issue down. You can also enable debugging as described in comment #3 and see if there is something useful in the log after a crash.
Hello, I have the same wifi hardware, so I am reporting this in the hope that it helps. This happens on my PC:
Ubuntu 8.10 (cat /proc/version_signature: Ubuntu 2.6.27-9.19-generic)
Linksys WMP300N PCI card (AR5008 rev 01)
Linksys WAG325N Wireless-N router (firmware v1.00.12)
AP config: 11n enabled, WPA-PSK
No crash, but the PC never connects to the AP:
- NetworkManager scans and correctly reports the AP
- when the supplicant connection state changes from 2 -> 3, authentication times out
I will try to install linux-image-2.6.27-11-generic and report what happens.
BR/Marco
Ubuntu 2.6.27-9.19-generic is ancient; please add intrepid-proposed to your /etc/apt/sources.list, upgrade, and try 2.6.27-11.
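The version bookkeeping in these comments can be condensed into a small lookup. Two points drive it: Ubuntu package numbers do not match upstream stable numbers, and at least 2.6.27.8 is needed for the critical ath9k fixes. This C sketch encodes only the three package-to-upstream mappings stated in the thread. The table, macro and function names are hypothetical, and any package not listed is reported as unknown rather than guessed.

```c
#include <stddef.h>
#include <string.h>

/* Ubuntu Intrepid kernel package -> upstream 2.6.27.N stable release it
 * carries, as stated in this thread (not an official or complete table). */
struct ubuntu_kernel {
	const char *package;    /* version part of linux-image-<version>-generic */
	int stable_sub;         /* N in upstream 2.6.27.N */
};

static const struct ubuntu_kernel intrepid[] = {
	{ "2.6.27-9",  2 },     /* default Ubuntu 8.10 kernel at the time */
	{ "2.6.27-11", 10 },    /* available from intrepid-proposed */
	{ "2.6.27-12", 11 },    /* only in the Ubuntu kernel git tree then */
};

/* 2.6.27.8 is the minimum with the critical ath9k fixes, per this thread. */
#define ATH9K_MIN_STABLE_SUB 8

/* Returns 1 if the package carries the fixes, 0 if it is too old,
 * and -1 if the package is not in the table. */
static int has_ath9k_fixes(const char *package)
{
	for (size_t i = 0; i < sizeof intrepid / sizeof intrepid[0]; i++)
		if (strcmp(intrepid[i].package, package) == 0)
			return intrepid[i].stable_sub >= ATH9K_MIN_STABLE_SUB;
	return -1;
}
```

For example, `has_ath9k_fixes("2.6.27-9")` returns 0, matching the advice to move to 2.6.27-11. The original 8.10 kernel, 2.6.27-7, returns -1 because the thread never states which upstream release it carries.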
Command output:
mythtv@anpanman:~$ sudo rmmod ath9k
mythtv@anpanman:~$ sudo dmesg -c >/dev/null
mythtv@anpanman:~$ sudo modprobe -l ath9k
/lib/modules/2.6.27-11-generic/kernel/drivers/net/wireless/ath9k/ath9k.ko
mythtv@anpanman:~$ sudo modprobe ath9k
mythtv@anpanman:~$ sudo dmesg -c
[74759.572396] ath9k: 0.1
[74759.572967] ath9k 0000:01:01.0: PCI INT A -> Link[LNKB] -> GSI 17 (level, low) -> IRQ 17
[74760.006381] phy1: Selected rate control algorithm 'ath9k_rate_control'
[74760.010727] phy1: Atheros 5416: mem=0xfaca0000, irq=17
[74760.012362] udev: renamed network interface wlan0 to ath0
mythtv@anpanman:~$ uname -a
Linux anpanman 2.6.27-11-generic #1 SMP Thu Jan 22 17:22:40 UTC 2009 i686 GNU/Linux
root@marco-desktop:/home/marco# rmmod ath9k
root@marco-desktop:/home/marco# dmesg -c > /dev/null
root@marco-desktop:/home/marco# modprobe -l ath9k
/lib/modules/2.6.27-11-generic/kernel/drivers/net/wireless/ath9k/ath9k.ko
root@marco-desktop:/home/marco# modprobe ath9k
root@marco-desktop:/home/marco# dmesg -c
[ 652.844783] ath9k: 0.1
[ 652.844842] ath9k 0000:01:01.0: PCI INT A -> GSI 21 (level, low) -> IRQ 21
[ 653.282128] phy1: Selected rate control algorithm 'ath9k_rate_control'
[ 653.283276] phy1: Atheros 5416: mem=0xffffc200109c0000, irq=21
root@marco-desktop:/home/marco# uname -a
Linux marco-desktop 2.6.27-11-generic #1 SMP Fri Jan 23 13:58:13 UTC 2009 x86_64 GNU/Linux
It does not connect to the AP, although now I get the following:
Jan 25 03:58:19 marco-desktop kernel: [ 811.602744] wlan0: authenticate with AP 00:18:39:a7:e1:d1
Jan 25 03:58:19 marco-desktop kernel: [ 811.605045] wlan0: authenticated
Jan 25 03:58:19 marco-desktop kernel: [ 811.605052] wlan0: associate with AP 00:18:39:a7:e1:d1
Jan 25 03:58:19 marco-desktop kernel: [ 811.609536] wlan0: RX AssocResp from 00:18:39:a7:e1:d1 (capab=0x431 status=0 aid=1)
Jan 25 03:58:19 marco-desktop kernel: [ 811.609544] wlan0: associated
Jan 25 03:58:19 marco-desktop kernel: [ 811.609742] wlan0 (WE) : Wireless Event too big (366)
Jan 25 03:59:04 marco-desktop kernel: [ 856.935477] wlan0: disassociating by local choice (reason=3)
I've now tested after removing the nvidia module and using the nv driver instead. However, I still got the crash. I'm currently testing with no encryption, and so far, even after 10 minutes of iperf, no crash. So next I'll compile the latest compat-wireless and turn on debugging, then test again with encryption on.
Looks like I spoke too soon. With encryption off, I started an iperf test lasting 60 minutes. At some point during it, the crash happened.
Marco -- if you cannot even associate with your AP, that is a separate issue. Please open a separate bug report for it, and please use the latest vanilla stock kernel from kernel.org. I'd simply recommend upgrading to 2.6.28.2 (Ubuntu Jaunty has this), though, as there are a lot of changes and enhancements for ath9k there. Alternatively, you can just install compat-wireless, which is based on the wireless-testing git tree: http://wireless.kernel.org/en/users/Download
Barry -- I cannot reproduce this on 5 GHz with all sorts of different configurations; I'm going to try 2.4 GHz now. Since we cannot get an oops message to work with, can you help us narrow it down further? For instance, can you try to scp a 100M file over to another box? If it hits the bug, try 50M; if it doesn't, try 200M (after a fresh reboot). Also please provide the output of:
free -m
cat /proc/cpuinfo
cat /proc/interrupts
lspci
I've been running iperf over WPA2 AES in an 11n-only HT20 configuration and I see no crashes. I did see a lag in communication, but after a few seconds everything was running smoothly. So far I am unable to reproduce this. Unfortunately, without a captured oops message and without being able to reproduce this ourselves, we cannot help you. I will see if we have the APs you described at the office, but what would help most is getting an oops from you.
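The size-narrowing procedure suggested above is a halving/doubling search: start with a 100M scp, go to 50M after a hang or 200M after a clean run, and reboot between runs. Below is a small C sketch of it with one assumption added beyond the thread: once both a passing and a failing size are known, it bisects between them. `size_search`, `record_run` and `next_size_mb` are hypothetical names.

```c
#include <stdbool.h>

/* Tracks the transfer sizes (in MB) tried so far while hunting for the
 * smallest copy that triggers the hang. 0 means "no result yet". */
struct size_search {
	unsigned pass_mb;   /* largest size that completed without a hang */
	unsigned fail_mb;   /* smallest size that hung the machine */
};

/* Record the outcome of one run (each run starts from a fresh reboot). */
static void record_run(struct size_search *s, unsigned size_mb, bool hung)
{
	if (hung) {
		if (s->fail_mb == 0 || size_mb < s->fail_mb)
			s->fail_mb = size_mb;
	} else if (size_mb > s->pass_mb) {
		s->pass_mb = size_mb;
	}
}

/* Pick the next size: start at 100 MB, halve after a hang, double after a
 * clean run, and bisect once both a passing and a failing size are known. */
static unsigned next_size_mb(const struct size_search *s)
{
	if (s->pass_mb == 0 && s->fail_mb == 0)
		return 100;
	if (s->fail_mb == 0)
		return s->pass_mb * 2;
	if (s->pass_mb == 0)
		return s->fail_mb / 2;
	/* An intermittent hang can let a larger size pass after a smaller
	 * one failed; retest the failing size instead of bisecting. */
	if (s->pass_mb >= s->fail_mb)
		return s->fail_mb;
	return s->pass_mb + (s->fail_mb - s->pass_mb) / 2;
}
```

Barry later reports iperf passing once and hanging on the very next run, so a single clean run is weak evidence. Repeating each size a few times before recording it as passing would make the search more trustworthy.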
Can you try the latest vanilla 2.6.27 from kernel.org, which is now at 2.6.27.13? http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.27.13.tar.bz2 This kernel doesn't have any ath9k-specific patches, but I am curious whether changes in other parts of the kernel may fix your issue. Additionally, you can try the latest wireless-testing git tree, which as of this writing is based on 2.6.29-rc2 and also has the latest wireless subsystem enhancements. Wireless testing: http://wireless.kernel.org/en/developers/Documentation/git-guide If you don't want to compile the entire kernel, you can just try compat-wireless to upgrade only the wireless subsystem: http://wireless.kernel.org/en/users/Download I am very interested in getting your issue fixed on the 2.6.27 kernel series, since kernel crashes are simply unacceptable, but I would like to know whether you still see your issue with the latest wireless subsystem (through wireless-testing or compat-wireless) on later kernels (2.6.28 or 2.6.29-rc2).
Oh, BTW, my new tests were on 2.4 GHz, so no luck there either in trying to reproduce this. The fact that you cannot reproduce this on a console makes me wonder whether it is really ath9k-specific. Your /proc/interrupts output would be appreciated. Can you test again on the console and see if you can reproduce it?
Created attachment 19987
syslog of communication with two AP
Before reading the last posts, I recompiled the ath9k driver with:
+#define DBG_DEFAULT (ATH_DBG_ANY)
Hoping it helps, I am attaching a syslog in which two APs are used:
- 00:14:6c:a9:71:40 : NETGEAR DG834G + Linksys WMP300N PCI card, WPA-PSK: association works and link is established
- 00:18:39:a7:e1:d1 : Linksys WAG325N Wireless-N router + Linksys WMP300N PCI card, WPA-PSK: association does not work
The pre-shared key and the SSID are the same for both APs.
Now I will try compat-wireless.
Please disable NetworkManager, use wpa_supplicant from the console, and see if the problem still occurs.
NetworkManager: <WARN> get_secrets_cb(): Couldn't get connection secrets: applet-device-wifi.c.1512 (get_secrets_dialog_response_cb): canceled.
NetworkManager: <info> (wlan0): device state change: 6 -> 9
NetworkManager: <info> Activation (wlan0) failed for access point (NTGR_p5ypqysp1fqt3cdu5iaarxz)
NetworkManager: <info> Marking connection 'Auto NTGR_p5ypqysp1fqt3cdu5iaarxz' invalid.
NetworkManager: <info> Activation (wlan0) failed.
NetworkManager: <info> (wlan0): device state change: 9 -> 3
NetworkManager: <info> (wlan0): deactivating device (reason: 0).
That was from the log, so maybe NM has something to do with this. I have already tried stopping NetworkManager and using wpa_supplicant from the console, but the result is the same.
Marco -- please keep your issue, which is unrelated to this bug report, separate. This bug report is about a hang caused by connecting to an AP. Your issue is not being able to establish a connection at all. Please file a separate bug report.
Output from "free -m":
             total       used       free     shared    buffers     cached
Mem:          2024       1789        234          0         48       1150
-/+ buffers/cache:        590       1433
Swap:         4690          3       4687
Output from /proc/cpuinfo:
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 67
model name : AMD Athlon(tm) 64 X2 Dual Core Processor 5600+
stepping : 3
cpu MHz : 1000.000
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips : 1999.95
clflush size : 64
power management: ts fid vid ttp tm stc
processor : 1
vendor_id : AuthenticAMD
cpu family : 15
model : 67
model name : AMD Athlon(tm) 64 X2 Dual Core Processor 5600+
stepping : 3
cpu MHz : 1000.000
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips : 1999.95
clflush size : 64
power management: ts fid vid ttp tm stc
Output from /proc/interrupts:
           CPU0       CPU1
  0:        112       2466   IO-APIC-edge      timer
  1:          0          2   IO-APIC-edge      i8042
  3:          1          6   IO-APIC-edge
  4:          1        976   IO-APIC-edge      serial
  7:          1          0   IO-APIC-edge      parport0
  8:          0         62   IO-APIC-edge      rtc0
  9:          0          0   IO-APIC-fasteoi   acpi
 12:          0          4   IO-APIC-edge      i8042
 14:          0          0   IO-APIC-edge      pata_amd
 15:          0          0   IO-APIC-edge      pata_amd
 16:          9       1532   IO-APIC-fasteoi   ivtv1
 17:       1076     165749   IO-APIC-fasteoi   ivtv0
 18:      36742   10315086   IO-APIC-fasteoi   bttv0, bt878
 19:      41434   22896955   IO-APIC-fasteoi   ohci1394, nvidia
 21:      19669    4214618   IO-APIC-fasteoi   sata_nv, HDA Intel
 22:      79967   36610948   IO-APIC-fasteoi   ehci_hcd:usb2, sata_nv
 23:       8906    2074623   IO-APIC-fasteoi   ohci_hcd:usb1, sata_nv
218:     110570   42612308   PCI-MSI-edge      eth2
NMI:          0          0   Non-maskable interrupts
LOC:   48716368   46390275   Local timer interrupts
RES:    3540254    1631848   Rescheduling interrupts
CAL:      98308      96379   function call interrupts
TLB:     218727     160647   TLB shootdowns
SPU:          0          0   Spurious interrupts
ERR:          1
MIS:          0
I'm just about to try the latest compat-wireless.
Finally reproduced on a single-user console, no encryption, using 11n at 2.4GHz. Ran iperf once, it ran OK. Ran it again, got an immediate crash. Using compat-wireless-2009-02-06 with the serialization patches applied.
Barry, good to hear you are able to reproduce it. How easily can you reproduce it? Did you manage to see a panic at all? Also, I do not see ath9k in the /proc/interrupts output above; can you print out that file with ath9k loaded? Also, can you reproduce the crash if you boot with this option:
maxcpus=1
You can add that to your grub menu.lst.
Barry, please remember to enable the NMI watchdog when reproducing on the console:
nmi_watchdog=1
Just got another crash. Using compat-wireless-2009-02-14, with your latest serialisation patches, nmi_watchdog=1. Same as before: single-user console, no encryption, but this time it happened as soon as I did "modprobe ath9k". Didn't get any console output, just a complete freeze. I'll try with maxcpus=1 now. I did notice that in /proc/interrupts, I was seeing ivtv and ath sharing an interrupt. Might this be an issue?
I enabled maxcpus=1 and I've spent the last 45 minutes trying to produce a crash, but it's been rock solid, both with and without encryption. I used iperf to perform 10M, 100M and 1000M tests, and could not get a crash.
I am having the exact same issue with my AR5008-based card, ath9k and Ubuntu 8.10 (or 9.04 alpha 4) on a Core i7 CPU, and it also works flawlessly with maxcpus=1. I couldn't test on Windows, as I don't have a Windows CD.
I just saw that Windows users also sometimes get NMI parity errors with the AR5008, so this might be related. Someone has a "modded" inf file which seems to work; I have no idea what they changed, but maybe it helps: http://www.laptopvideo2go.com/forum/lofiversion/index.php/t15297-0.html
I also noticed that even when using madwifi, I get *occasional* lockups with my card, but usually "just" once a day or so, and only after transferring large amounts of data. So maybe the problem lies in a different part of the kernel, or more likely in a hardware compatibility issue with multicore, as the same card works flawlessly with ath9k on my girlfriend's single-CPU machine.
I am pretty sure I ran into this bug as well on Arch Linux with the 2.6.28 kernel. I also have a dual-core AMD Athlon X2 4200+ CPU. My wireless card is a Linksys WMP110, which lspci reports as an AR5008. When the ath9k module is loaded, it randomly hard-locks my computer, and I always have to reboot. I have seen it hard-lock while running iwlist wlan0 scan. I have also seen it work for 2 hours without issue and then hard-lock for no apparent reason. I have to admit this is my first wireless setup, but getting the connection set up is not the problem; keeping it without locking up my PC is. I have tried with and without encryption, and it makes no difference. I will try maxcpus=1 on boot and see if that helps, and after that I will try the 2.6.29 rc kernel. I will report back; I need to get this fixed.
Like Barry, I enabled maxcpus=1 on boot, and my computer has been rock solid for 20 hours now. I have downloaded multiple torrents of over 1.4 GB and used youtube.com in four Firefox tabs for a couple of hours, and could not get my computer to hard-lock. I also tried spamming the iwlist wlan0 scan command, which had also caused a hard lock for me. Still rock solid.
Created attachment 20383
SMP fixes v3
Barry, can you try these patches?
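Several reporters found that the hang disappears with maxcpus=1. That points at something that only goes wrong when two CPUs run driver code at the same time, which fits patches titled "SMP fixes" and later "serialization". The thread does not describe what these patches change. What follows is only a generic userspace illustration of serializing device access through a single lock, not the ath9k patch. `fake_dev`, `fake_reg_write` and `run_two_writers` are hypothetical, and ordinary memory stands in for device registers. A pthread mutex is used here for simplicity. Kernel code touching registers that interrupt handlers also touch would more typically use a spinlock taken with spin_lock_irqsave().

```c
#include <pthread.h>
#include <stdint.h>

/* Fake device: ordinary memory standing in for a register window
 * (hypothetical; real drivers use MMIO accessors, not plain stores). */
struct fake_dev {
	pthread_mutex_t reg_lock;   /* serializes every register access */
	uint32_t regs[16];
	unsigned long writes;       /* bookkeeping updated under the lock */
};

/* All writes go through here, so only one thread touches the
 * "registers" at a time. */
static void fake_reg_write(struct fake_dev *dev, unsigned idx, uint32_t val)
{
	pthread_mutex_lock(&dev->reg_lock);
	dev->regs[idx] = val;
	dev->writes++;              /* would lose updates if two threads raced */
	pthread_mutex_unlock(&dev->reg_lock);
}

struct writer_arg {
	struct fake_dev *dev;
	unsigned idx;
	unsigned count;
};

static void *writer(void *p)
{
	struct writer_arg *a = p;

	for (unsigned i = 0; i < a->count; i++)
		fake_reg_write(a->dev, a->idx, i);
	return NULL;
}

/* Two threads play the role of two CPUs hammering the device at once.
 * Returns the number of writes recorded (2 * count when serialized). */
static unsigned long run_two_writers(unsigned count)
{
	struct fake_dev dev = { .writes = 0 };
	struct writer_arg a = { &dev, 0, count };
	struct writer_arg b = { &dev, 1, count };
	pthread_t t1, t2;

	pthread_mutex_init(&dev.reg_lock, NULL);
	pthread_create(&t1, NULL, writer, &a);
	pthread_create(&t2, NULL, writer, &b);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	pthread_mutex_destroy(&dev.reg_lock);
	return dev.writes;
}
```

Build with -pthread. With the lock in place, `run_two_writers(100000)` always accounts for all 200000 writes. Without the lock/unlock pair, the concurrent `writes++` is a data race, and the count can come up short.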
I can't get to the patch; it says 'Not Found'. Not that they would work for me, but I would at least like to try them out as well.
Using a little common sense (browsing the ath9k directory of that link), I found it: http://www.kernel.org/pub/linux/kernel/people/mcgrof/patches/ath9k/2009-02-24/serialization-v3.patch But why wouldn't they work for you?
The issue is resolved for me. When can we expect to see this patch pushed into the main tree?
I used compat-wireless-2009-02-26 with the new patches, and while I would love to report success, I actually got a crash the second time I ran iperf with a 50MB data transfer.
It's not going upstream just yet, as I am not happy with the patch; in fact, I think the d1 d0 patch is just a nasty paper-bag fix for another issue, which I'm trying to zero in on now. I will post as soon as I have updates. Thanks for testing and for the reports -- they help a lot.
Created attachment 20475
Serialization v4
Give these a shot, these are now on their way to wireless-testing.
You can try out compat-wireless now too; it has these patches merged.
Tested iperf with 10M, 100M, 1000M and 2000M and no crash. Looking good, well done Luis!
Doesn't seem to work for me; tested with compat-wireless-2009-03-12. At first I could not even load the module correctly, as I have these options:
options cfg80211 ieee80211_regdom=EU
[ 94.988866] cfg80211: Using static regulatory domain info
[ 94.988871] cfg80211: Regulatory domain: EU
[ 94.988873] (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
[ 94.988875] (2402000 KHz - 2482000 KHz @ 40000 KHz), (600 mBi, 2000 mBm)
[ 94.988878] (5170000 KHz - 5190000 KHz @ 40000 KHz), (600 mBi, 2300 mBm)
[ 94.988880] (5190000 KHz - 5210000 KHz @ 40000 KHz), (600 mBi, 2300 mBm)
[ 94.988883] (5210000 KHz - 5230000 KHz @ 40000 KHz), (600 mBi, 2300 mBm)
[ 94.988885] (5230000 KHz - 5330000 KHz @ 40000 KHz), (600 mBi, 2000 mBm)
[ 94.988887] (5490000 KHz - 5710000 KHz @ 40000 KHz), (600 mBi, 3000 mBm)
[ 95.084872] ath9k 0000:09:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[ 95.518451] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
[ 95.518454] IP: [<ffffffffa0c1934e>] freq_reg_info_regd+0x2e/0x180 [cfg80211]
[ 95.518461] PGD 1a98ba067 PUD 14d1d3067 PMD 0
[ 95.518463] Oops: 0000 [1] SMP
[ 95.518465] CPU 3
[ 95.518466] Modules linked in: ath9k(+) mac80211 rfkill cfg80211 led_class binfmt_misc rfcomm bridge stp bnep sco l2cap bluetooth ipv6 ppdev acpi_cpufreq cpufreq_powersave cpufreq_stats cpufreq_ondemand freq_table cpufreq_conservative cpufreq_userspace container sbs sbshc video output pci_slot wmi battery iptable_filter ip_tables x_tables ac sbp2 parport_pc lp parport snd_hda_intel snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_dummy nvidia(P) snd_seq_oss i2c_core snd_seq_midi pcspkr evdev snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device snd button soundcore shpchp pci_hotplug snd_page_alloc ext3 jbd mbcache sr_mod cdrom pata_acpi sd_mod crc_t10dif sg pata_jmicron usbhid hid ata_generic ahci libata ohci1394 scsi_mod ieee1394 uhci_hcd dock ehci_hcd usbcore thermal processor fan fbcon tileblit font bitblit softcursor fuse
[ 95.518503] Pid: 6841, comm: modprobe Tainted: P 2.6.27-13-generic #1
[ 95.518504] RIP: 0010:[<ffffffffa0c1934e>] [<ffffffffa0c1934e>] freq_reg_info_regd+0x2e/0x180 [cfg80211]
[ 95.518509] RSP: 0018:ffff88014d519b48 EFLAGS: 00010286
[ 95.518510] RAX: 0000000000000000 RBX: ffffffffa0cadfe0 RCX: ffff88014d519ba0
[ 95.518511] RDX: ffff88014d519bac RSI: 000000000024cde0 RDI: ffff88015e5840a0
[ 95.518513] RBP: ffff88014d519b78 R08: ffffffffa0ca5b28 R09: 0000000000000004
[ 95.518514] R10: ffff88014d519b08 R11: ffffffffa0c23228 R12: 0000000000000000
[ 95.518515] R13: ffff88015e585e58 R14: 0000000000000000 R15: ffff88014d519bac
[ 95.518517] FS: 00007fee20e7e6e0(0000) GS:ffff8801af06b080(0000) knlGS:0000000000000000
[ 95.518518] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 95.518519] CR2: 0000000000000004 CR3: 00000001aa5f3000 CR4: 00000000000006e0
[ 95.518521] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 95.518522] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 95.518524] Process modprobe (pid: 6841, threadinfo ffff88014d518000, task ffff880152088000)
[ 95.518525] Stack: ffff88014d519ba0 ffffffffa0cadfe0 0000000000000000 ffff88015e585e58
[ 95.518528] 0000000000000000 0000000000000000 ffff88014d519bd8 ffffffffa0c1959b
[ 95.518531] 0000000000000004 ffffffffa0ca5b28 ffff88015e5840a0 0000000000000000
[ 95.518534] Call Trace:
[ 95.518538] [<ffffffffa0c1959b>] wiphy_apply_custom_regulatory+0xdb/0x180 [cfg80211]
[ 95.518546] [<ffffffffa0c8d99c>] ath_attach+0x5fc/0xb70 [ath9k]
[ 95.518551] [<ffffffff80254929>] ? tasklet_init+0x9/0x30
[ 95.518557] [<ffffffffa0c95652>] ath_pci_probe+0x192/0x370 [ath9k]
[ 95.518561] [<ffffffff803a4d6a>] ? kobject_get+0x1a/0x30
[ 95.518565] [<ffffffff803bc219>] ? pci_match_id+0x9/0xa0
[ 95.518568] [<ffffffff803bd1c0>] pci_device_probe+0xe0/0x140
[ 95.518572] [<ffffffff8034c033>] ? sysfs_create_link+0x13/0x20
[ 95.518577] [<ffffffff80431a82>] really_probe+0x72/0x1a0
[ 95.518580] [<ffffffff80431c00>] driver_probe_device+0x50/0x60
[ 95.518583] [<ffffffff80431c9b>] __driver_attach+0x8b/0x90
[ 95.518585] [<ffffffff80431c10>] ? __driver_attach+0x0/0x90
[ 95.518588] [<ffffffff8043120b>] bus_for_each_dev+0x6b/0xa0
[ 95.518591] [<ffffffff802e23eb>] ? kmem_cache_alloc+0x8b/0xd0
[ 95.518596] [<ffffffffa0087000>] ? ath9k_init+0x0/0x57 [ath9k]
[ 95.518600] [<ffffffff804318e1>] driver_attach+0x21/0x30
[ 95.518602] [<ffffffff80430a78>] bus_add_driver+0x1f8/0x270
[ 95.518607] [<ffffffffa0087000>] ? ath9k_init+0x0/0x57 [ath9k]
[ 95.518611] [<ffffffff80431e95>] driver_register+0x75/0x170
[ 95.518616] [<ffffffffa0087000>] ? ath9k_init+0x0/0x57 [ath9k]
[ 95.518619] [<ffffffff803bd4c2>] __pci_register_driver+0x72/0xc0
[ 95.518624] [<ffffffffa0087000>] ? ath9k_init+0x0/0x57 [ath9k]
[ 95.518630] [<ffffffffa0c95853>] ath_pci_init+0x23/0x28 [ath9k]
[ 95.518635] [<ffffffffa008701e>] ath9k_init+0x1e/0x57 [ath9k]
[ 95.518639] [<ffffffff8020a041>] do_one_initcall+0x41/0x170
[ 95.518642] [<ffffffff8026c3e1>] ? __blocking_notifier_call_chain+0x21/0x90
[ 95.518647] [<ffffffff8027d1a5>] sys_init_module+0xb5/0x1f0
[ 95.518650] [<ffffffff8021285a>] system_call_fastpath+0x16/0x1b
[ 95.518651]
[ 95.518652]
[ 95.518652] Code: e5 41 57 41 56 41 55 41 54 53 48 83 ec 08 e8 4a 93 5f df 48 8b 05 5b 20 01 00 4c 8b 1d 4c 20 01 00 48 89 4d d0 4d 85 c0 49 89 d7 <8b> 40 04 4d 0f 45 d8 83 f8 03 0f 84 aa 00 00 00 83 e8 01 0f 84
[ 95.518671] RIP [<ffffffffa0c1934e>] freq_reg_info_regd+0x2e/0x180 [cfg80211]
[ 95.518676] RSP <ffff88014d519b48>
[ 95.518677] CR2: 0000000000000004
[ 95.518680] ---[ end trace 1cbd0f4b44c744df ]---
Without those options, loading the US regulatory settings, I could load the module without any errors in dmesg.
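The oops above can be read with a little arithmetic. RAX is 0000000000000000, and CR2 (the faulting address) is 0000000000000004. The byte marked <8b> in the Code: line starts the sequence `8b 40 04`, which x86-64 decodes as the 32-bit load `mov eax, [rax+0x4]`. Together these point at a NULL structure pointer being dereferenced at field offset 4 inside freq_reg_info_regd. The thread does not say which pointer was NULL. The struct below is hypothetical, and the decoder deliberately handles only this one instruction form, so the sketch illustrates the arithmetic rather than diagnosing cfg80211.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: a 4-byte field followed by the one being read,
 * so the second field sits at offset 4. The real structure is not
 * identified in this thread. */
struct hypothetical_req {
	int idx;        /* offset 0 */
	int initiator;  /* offset 4 */
};

_Static_assert(offsetof(struct hypothetical_req, initiator) == 4,
	       "second int field is expected at offset 4");

/* Decode only the "mov r32, [base + disp8]" form: opcode 0x8b, ModRM
 * mod = 01, no SIB byte, no prefixes. Returns the signed 8-bit
 * displacement, or INT_MIN if the bytes are not in that form. */
static int mov_disp8(const uint8_t *code)
{
	uint8_t modrm;

	if (code[0] != 0x8b)
		return INT_MIN;
	modrm = code[1];
	if ((modrm >> 6) != 1)          /* mod must be 01: [reg + disp8] */
		return INT_MIN;
	if ((modrm & 7) == 4)           /* rm = 100 would mean a SIB byte */
		return INT_MIN;
	return (int8_t)code[2];
}

/* Faulting address = base register value + displacement. */
static uintptr_t fault_address(uintptr_t base_reg, int disp)
{
	return base_reg + (uintptr_t)(intptr_t)disp;
}
```

Feeding in the oops bytes gives a displacement of 4. Adding it to RAX = 0 gives 0x4, which matches CR2, as expected for a read through a NULL pointer to the field at offset 4.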
But a few seconds later (I guess when nm-applet tries to connect to the network), I get the same complete lockup as before: no mouse, no keyboard, nothing. Reboot required. Maybe this is a slightly different error, since I always get the lockups while connecting, whereas some other users here, like Barry, could connect with the 'old' ath9k and only got lockups after transferring data?
The serialization patch series was reverted from wireless-testing, which is why you got a hang; we're going to submit something smaller.
Created attachment 20518
Backport of serialization patches -- v6 serialization for 2.6.27-2.6.29
The 2.6.30 patches have been submitted to John. If you want to use this on older kernels, you can use the backported versions of the patches. Once John propagates the 2.6.29 patches to Linus, the other stable series will get the fixes for 2.6.27 and 2.6.28. For those who can't wait, you can use the patches in the folder here.
I just thought I'd chime back in on this issue. But first of all, thank you Luis, and Barry for the debugging work, for working on this issue and getting it resolved. I sure do appreciate it. I applied the patch to the 2.6.28 source and compiled the kernel. All went well; there were absolutely no issues with the patch or the compilation. The machine has now been running for close to 12 hours with no signs of problems. I am fully satisfied with this patch for now. Thanks again for all the hard work.