Kernel Bug Tracker – Bug 31452
ath9k: throughput issue in 802.11n and also IBSS mode
Last modified: 2011-04-30 20:05:09 UTC
The throughput of my wireless connection with ath9k (AR9285 chipset) in 802.11n mode with 2.6.38 is much lower (by a factor of 10 or more) compared to 22.214.171.124. 802.11g seems to be unaffected. I also noticed that the "Invalid misc" counter shown by iwconfig rises quickly.
I'm experiencing this too, but not in 802.11n mode. I am trying to use ath5k to host an ad-hoc connection and ath9k to connect to it. The throughput is measured in 1xx bytes per second with 2.6.38 on both of these devices, making it unusable. I also see 80-100% packet loss between the client 9k and the host 5k. This happens if both or either machine is running 2.6.38.
2.6.37 works fine and I get reasonable packet loss (5-10%, not bad for my dinky host card). The throughput is several Mb/s with 2.6.37.
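For anyone who wants to put comparable numbers on the packet loss figures above, here is a minimal sketch that derives the loss percentage from the summary line printed by ping. It assumes the usual iputils wording ("N packets transmitted, M received"); the function name packet_loss_percent is made up for this example.

```python
import re

def packet_loss_percent(ping_output):
    """Return packet loss in percent from ping's summary line, or None."""
    # Assumed summary format (iputils ping):
    #   "20 packets transmitted, 4 received, 80% packet loss, time 19012ms"
    # Some versions print "packets received", so that word is optional.
    m = re.search(r"(\d+) packets transmitted, (\d+) (?:packets )?received",
                  ping_output)
    if not m:
        return None
    sent, received = int(m.group(1)), int(m.group(2))
    if sent == 0:
        return None
    # Recompute from the raw counts instead of trusting the rounded figure.
    return 100 * (sent - received) / sent
```

Running the same ping count against the same host under 2.6.37 and 2.6.38 and passing each output through this function gives two figures that can be compared directly.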
I attempted to bisect and ended up with a commit way back in November 2010, which should have been merged as part of 2.6.37. I assume something went wrong on my bisect for me to end up there, but it does appear that compat-wireless from 11/22 works while 11/24 doesn't. The commit that the bisect gave me was 8aec7af9.
That bisection was done intermittently with available compat-wireless archives. I am in the middle of testing on real kernels that I am running my hardware from, instead of just using compat-wireless to pull in from the git tree. I will let you know if this looks any different, but beware that so far it doesn't seem to; I have dipped back into 37-rc7 and while I marked it as good since it behaved slightly better than later versions, it was still much slower than 37 final. I'm not sure what's going on there but everyone seems to agree something is messed up.
This bug is also being tracked on Launchpad at https://bugs.launchpad.net/ubuntu/+source/linux/+bug/735171 and discussed on Arch Linux Mailing List at http://mailman.archlinux.org/pipermail/arch-general/2011-March/019002.html .
Have you tried reverting 8aec7af9 on 2.6.38? Does that resolve the issue?
I did try but I was unable to successfully resolve the merge conflicts. I came out with compile errors. I haven't tried again since. I was making the rounds in IRC on Saturday looking for help merging but couldn't find any; if you can provide a patch I'd happily apply and test it.
It is a bit ugly to revert...I took a whack at it -- let me know if it runs and if so, does it still show the bug?
Created attachment 51542
No patch applies with git apply and all hunks fail with patch -p1 when attempting to apply from v2.6.38 (521cb40b0c) and from f70f5b9dc. What commit should I use as the basis? Are you using wireless-next tree? Right now I'm applying to Linus's tree, can grab wireless-next again if necessary.
wireless-testing -- wireless-next-2.6 should work as well.
Can you give me an SHA ref to make sure I'm on the right thing? I tried on HEAD from wireless-next (7d2c16befae) and got the same issue.
Here it applies without any error or any message at all on 7d2c16befae67b901e6750b845661c1fdffd19f1, either with 'patch -p1' or with 'git am'.
Hmm, must have been some error with line-endings or something, I redownloaded and it now applies to 7d2c16befae without error. Thanks for that, sorry for the hassle. It appears to compile correctly -- I'll reboot in a sec and test.
It works well with 8aec7af9 reverted using the patch here, so it seems my bisection at least hit on something. Any ideas why 2.6.37 works and 2.6.38 doesn't, and why my bisect doesn't show me the incompatible change in 38 that is colliding with this change that went into 37?
Thanks for all the help so far -- I am definitely happy to see this, maybe I can use the compat-wireless build method to get this running on an otherwise stable 38.
This patch only affects ath5k. As expected it doesn't change anything with my ath9k and its 802.11n performance. I tried to bisect between 2.6.37 and 2.6.38, but I wasn't successful because I had to skip many revisions that caused problems like unloadable modules and kernel panics. Also the effect varies strongly, so testing is a pain in the ass.
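Since the effect varies strongly and some revisions cannot be tested at all, one way to make a bisect less error-prone is to take several throughput samples per revision and decide on the median. The sketch below only covers the decision step. The names bisect_verdict and threshold_mbps are made up for this example, the threshold has to be chosen by whoever runs the test, and how the samples are collected (for example with an scp of a test file) is left out. The return values follow the exit code convention of `git bisect run`: 0 for good, 1 for bad, 125 for "cannot test, skip".

```python
import statistics

# Exit codes understood by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125

def bisect_verdict(samples_mbps, threshold_mbps, usable=True):
    """Classify one revision from several throughput samples.

    samples_mbps   -- measured throughputs for this revision, in Mbit/s
    threshold_mbps -- hypothetical cut-off between "good" and "bad"
    usable         -- False if the kernel panicked or the module did not load
    """
    if not usable or not samples_mbps:
        # Nothing trustworthy to judge: let bisect pick a neighbouring commit.
        return SKIP
    # The median is less sensitive to a single lucky or unlucky run
    # than the mean or the last sample.
    if statistics.median(samples_mbps) >= threshold_mbps:
        return GOOD
    return BAD
```

A wrapper script would build and boot the revision, collect the samples, and finish with sys.exit(bisect_verdict(samples, threshold)).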
I'll attempt to bisect on my 9k machine sometime in the near future. Work is going to be ramping up, though, so if anyone else gets time before I post the results here please don't wait on me.
Ah, blast the ath5k/ath9k confusion... Jeff, could you open a separate bug to address the problem associated with the commit you identified?
John: I have done it, please see bug #31922. https://bugzilla.kernel.org/show_bug.cgi?id=31922
(In reply to comment #13)
> I'll attempt to bisect on my 9k machine sometime in the near future. Work is
> going to be ramping up, though, so if anyone else gets time before I post the
> results here please don't wait on me.
Have you got anything regarding ath9k?
There seems to be a problem with hardware crypto. After unloading ath9k and modprobing with nohwcrypt=1, download speed on a test file jumped from a shaky 45 kb/s to 3.7 MB/s...
Can you test if this resolves your issues? If not, I'll open a new bug.
I tried modprobing ath9k with nohwcrypt=1 on my laptop with an AR9285, and it resulted in a complete freeze (panic?) after a short time: no keyboard/mouse, X display frozen, no SSH or even ping.
However, this did improve throughput/resulted in fewer dropped packets for the amount of time that it did work. :-\
I'm running Arch with the latest "generic" kernel from their repos (2.6.38-ARCH).
It really looks like hardware encryption is related to this problem. With nohwcrypt=1 I have a stable data rate and no noticeable packet loss.
Creating /etc/modprobe.d/ath9.conf with: options ath9k nohwcrypt=1
fixes the problem (it looks like the speed is a little bit faster than with 2.6.37, but maybe that's just the Internet connection...)
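To confirm that the option was actually picked up after reloading the module, the parameter can be read back from sysfs. This is a minimal sketch that assumes ath9k exposes nohwcrypt as a readable parameter under /sys/module/ath9k/parameters/, which should be verified on the running kernel; the helper name module_param is made up for this example.

```python
from pathlib import Path

def module_param(module, param, sysfs_root="/sys/module"):
    """Return the current value of a loaded module's parameter as a string.

    Returns None if the module is not loaded or the parameter is not
    exported (assumption: ath9k exports nohwcrypt read-only in sysfs).
    """
    path = Path(sysfs_root) / module / "parameters" / param
    try:
        return path.read_text().strip()
    except OSError:
        return None
```

module_param("ath9k", "nohwcrypt") returning "1" would mean the workaround is active. None means the module is not loaded or the file is not there, so the modprobe.d entry has not taken effect.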
With my AR9285 and with 2.6.38-2 (Arch Linux) kernel all works great, but after suspend/resume network speed decreases from ~70Mbit/s to 4Mbit/s. I've tried to unload/load module manually, adding nohwcrypt=1 to /etc/modprobe.d/ath9k.conf, editing /etc/pm/config.d/config with SUSPEND_MODULES="ath9k" option - nothing helps.
WiFi works fine after reboot, but only before suspend.
Some more info.
wlan0 IEEE 802.11bgn ESSID:"mech"
Mode:Managed Frequency:2.417 GHz Access Point: 00:23:69:C2:67:04
Bit Rate=150 Mb/s Tx-Power=17 dBm
Retry long limit:7 RTS thr:off Fragment thr:off
Link Quality=52/70 Signal level=-58 dBm
Rx invalid nwid:0 Rx invalid crypt:0 Rx invalid frag:0
Tx excessive retries:43 Invalid misc:671 Missed beacon:0
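Several comments here refer to the "Invalid misc" counter rising quickly. A rough way to quantify that is to capture iwconfig output twice and compare the counters. The sketch below parses the counter fields as they appear in the output above. The function names are made up for this example, and the regular expression only covers the field names seen in this report.

```python
import re

def parse_iwconfig_counters(text):
    """Return the error counters from iwconfig output as {name: int}."""
    counters = {}
    # Counter fields look like "Rx invalid nwid:0" or "Invalid misc:671".
    pattern = r"((?:Rx|Tx|Invalid|Missed) [a-z ]+?):(\d+)"
    for name, value in re.findall(pattern, text):
        counters[name] = int(value)
    return counters

def counter_rate(before, after, seconds, name="Invalid misc"):
    """Average increase per second of one counter between two captures."""
    return (after[name] - before[name]) / seconds
```

Taking one capture before and one after a fixed-length transfer gives a per-second rate that can be compared between kernels or with and without nohwcrypt=1.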
Error in dmesg output (after resume from suspend):
btusb 1-1.5:1.0: no reset_resume for driver btusb?
btusb 1-1.5:1.1: no reset_resume for driver btusb?
ata6: SATA link down (SStatus 0 SControl 300)
usb 1-1.2: reset high speed USB device using ehci_hcd and address 3
irq 17: nobody cared (try booting with the "irqpoll" option)
Pid: 0, comm: swapper Not tainted 2.6.38-ARCH #1
<IRQ> [<ffffffff813aa802>] ? __report_bad_irq.isra.3+0x33/0x81
[<ffffffff810b86ce>] ? note_interrupt+0x18e/0x1d0
[<ffffffff810b9415>] ? handle_fasteoi_irq+0xc5/0xf0
[<ffffffff8100decd>] ? handle_irq+0x1d/0x30
[<ffffffff8100db55>] ? do_IRQ+0x55/0xd0
[<ffffffff813b25d3>] ? ret_from_intr+0x0/0x15
<EOI> [<ffffffff812ddf12>] ? poll_idle+0x32/0x70
[<ffffffff812ddeee>] ? poll_idle+0xe/0x70
[<ffffffff812df5b3>] ? menu_select+0xb3/0x330
[<ffffffff812ddfe8>] ? cpuidle_idle_call+0x98/0x350
[<ffffffff81009226>] ? cpu_idle+0xb6/0x100
[<ffffffff81392f2d>] ? rest_init+0x91/0xa4
[<ffffffff8160ccbd>] ? start_kernel+0x401/0x40e
[<ffffffff8160c347>] ? x86_64_start_reservations+0x132/0x136
[<ffffffff8160c140>] ? early_idt_handler+0x0/0x71
[<ffffffff8160c44d>] ? x86_64_start_kernel+0x102/0x111
[<ffffffffa03ced30>] (ath_isr+0x0/0x250 [ath9k])
Disabling IRQ #17
intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 10286, limit 9000
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: configured for UDMA/133
PM: resume of devices complete after 2059.580 msecs
PM: Finishing wakeup.
Restarting tasks ... done.
video LNXVIDEO:00: Restoring backlight state
wlan0: deauthenticated from 00:23:69:c2:67:04 (Reason: 6)
cfg80211: Calling CRDA for country: RU
wlan0: authenticate with 00:23:69:c2:67:04 (try 1)
wlan0: associate with 00:23:69:c2:67:04 (try 1)
wlan0: RX AssocResp from 00:23:69:c2:67:04 (capab=0x431 status=0 aid=1)
EXT4-fs (sda3): re-mounted. Opts: commit=0
EXT4-fs (sda4): re-mounted. Opts: commit=0
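The relevant part of the log above is that IRQ 17 was reported as "nobody cared" and then disabled, with ath_isr from ath9k listed as a handler on that line. The following sketch pulls that information out of a saved dmesg dump. It matches only the message formats visible in this log, and the function name find_disabled_irqs is made up for this example.

```python
import re

def find_disabled_irqs(dmesg_text):
    """Map each IRQ reported as 'nobody cared' to its listed handlers.

    Returns {irq_number: [(handler_function, module), ...]}.
    """
    result = {}
    current = None
    for line in dmesg_text.splitlines():
        m = re.search(r"irq (\d+): nobody cared", line)
        if m:
            current = int(m.group(1))
            result[current] = []
            continue
        if re.search(r"Disabling IRQ #\d+", line):
            # End of the report for the current IRQ.
            current = None
            continue
        # Handler lines look like "[<addr>] (ath_isr+0x0/0x250 [ath9k])".
        m = re.search(r"\((\w+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[(\w+)\]\)", line)
        if m and current is not None:
            result[current].append((m.group(1), m.group(2)))
    return result
```

Checking this after each resume shows whether the slowdown coincides with the interrupt line of the wireless card being disabled.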
…as a sort of update to my previous comment above, the nohwcrypt option actually seems to work fine now. I've done a reboot recently, and so far so good; no abrupt lockups or poor performance. I'm wondering if this might have to do with suspend, as I had done that before the lockup occurred after trying to modprobe it with nohwcrypt=1 (i.e. suspended with ath9k loaded normally, woke up, rmmodded ath9k, modprobed ath9k with nohwcrypt=1, lockup occurred).
Note also that I was doing an scp at the time to test network throughput…I'm wondering if perhaps heavy network activity had anything to do with it?
For what it's worth, this bug also affects my 64-bit Ubuntu system with an AR2003 chipset. On the 2.6.35 kernel wireless-n performance using ath9k is excellent, with 0% packet loss to the router. Loading the 2.6.38 kernel causes 20%-40% loss to my router.
modprobing with nohwcrypt=1 fixes the problem. Have not tested suspend/resume yet.
Thank you all for figuring this out! I hope a fix makes its way into the kernel in time for the upcoming Ubuntu release, but they may be past the kernel freeze already.
(In reply to comment #24)
> modprobing with nohwcrypt=1 fixes the problem. Have not tested suspend/resume
> yet.
> Thank you all for figuring this out! I hope a fix makes its way into the
> kernel in time for the upcoming Ubuntu release, but they may be past the kernel
> freeze already.
Don't forget that this is just a workaround, not a solution.
Same problem on the AR9280, on Arch Linux x86_64. 802.11n drops over 50% of the packets unless I load ath9k with nohwcrypt.
Why is this bug in NEEDINFO state? What info is required?
I am having some problems with ath9k on Arch Linux x86_64, kernel 2.6.38. My wireless card is the AR9285. I am using wireless G, and while the packet loss is not constant, it intermittently jumps to around 30%, then back down. Other times it's a steady 1% to 5% loss, which makes using SSH irritating, but doesn't affect downloads that much.
Of course, this was not an issue with 2.6.37 and earlier, so I have been using kernel 2.6.32 (Arch's LTS kernel) since, and have not tested to see if nohwcrypt fixes the issue.
we will look into this very soon.
Created attachment 55392
Patch that should fix the slow performance issue on ath9k with kernel 2.6.38
I have applied the patch and it looks like the performance is back to normal. But while testing I noticed another strange thing: under heavy load the link quality meter often shows 15/70 instead of the expected value. Can anyone confirm this?
Fixed by commit 115dad7 (ath9k_hw: partially revert "fix dma descriptor rx
error bit parsing").