Bug 31452

Summary: ath9k: throughput issue in 802.11n and also IBSS mode
Product: Networking Reporter: Richard Schütz (r.schtz)
Component: Wireless Assignee: networking_wireless (networking_wireless)
Status: CLOSED CODE_FIX    
Severity: normal CC: acknex32, alexandr.mekh, ath9k-devel, benjarobin+kernel, bruce, elykdav, hokasch, jeff, kmueller, linville, rjw, ryanskingsbury+kernel, shafi.wireless, thomas
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.38 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 27352    
Attachments: 0001-Revert-ath5k-Support-synth-only-channel-change-for-A.patch
Patch that should fix the slow performance issue on ath9k with kernel 2.6.38

Description Richard Schütz 2011-03-19 19:06:43 UTC
The throughput of my wireless connection with ath9k (AR9285 chipset) in 802.11n mode with 2.6.38 is much lower (more than 10 times lower at worst) compared to 2.6.37.4. 802.11g seems to be unaffected. I also noticed that the "Invalid misc" counter shown by iwconfig rises quickly.
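
One way to put a number on how quickly the "Invalid misc" counter rises is to sample the iwconfig output twice and compare the values. The following is only a sketch: the helper names `parse_iwconfig_counters` and `counter_rate` are hypothetical, and it assumes the statistics lines use the `Name:value` layout that iwconfig prints. The sample values are illustrative.

```python
import re

# Hypothetical helper: matches the statistics counters printed by iwconfig.
COUNTER_RE = re.compile(
    r"(Rx invalid nwid|Rx invalid crypt|Rx invalid frag|"
    r"Tx excessive retries|Invalid misc|Missed beacon)[:=](\d+)"
)


def parse_iwconfig_counters(text):
    """Return a dict mapping each counter name to its integer value."""
    return {name: int(value) for name, value in COUNTER_RE.findall(text)}


def counter_rate(before, after, seconds, name="Invalid misc"):
    """Increase of one counter per second between two samples."""
    return (after[name] - before[name]) / seconds


# Illustrative text in the iwconfig statistics layout.
SAMPLE = """\
          Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
          Tx excessive retries:43  Invalid misc:671   Missed beacon:0
"""

if __name__ == "__main__":
    print(parse_iwconfig_counters(SAMPLE))
```

Taking two samples a known number of seconds apart and passing them to `counter_rate` gives a figure that can be compared between kernel versions.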
Comment 1 Jeff Cook 2011-03-19 23:01:07 UTC
I'm experiencing this too, but not in 802.11n mode. I am trying to use ath5k to host an ad-hoc connection and ath9k to connect to it. The throughput is measured in 1xx bytes per second with 2.6.38 on both of these devices, making it unusable. I also see 80-100% packet loss between the client 9k and the host 5k. This happens if both or either machine is running 2.6.38.

2.6.37 works fine and I get reasonable packet loss (5-10%, not bad for my dinky host card). The throughput is several Mb/s with 2.6.37.

I attempted to bisect and ended up with a commit way back in November 2010, which should have been merged as part of 2.6.37. I assume something went wrong on my bisect for me to end up there, but it does appear that compat-wireless from 11/22 works while 11/24 doesn't. The commit that the bisect gave me was 8aec7af9.

That bisection was done intermittently with available compat-wireless archives. I am in the middle of testing on real kernels that I am running my hardware from, instead of just using compat-wireless to pull in from the git tree. I will let you know if this looks any different, but beware that so far it doesn't seem to; I have dipped back into 37-rc7, and while I marked it as good since it behaved slightly better than later versions, it was still much slower than 37 final. I'm not sure what's going on there but everyone seems to agree something is messed up.

This bug is also being tracked on Launchpad at https://bugs.launchpad.net/ubuntu/+source/linux/+bug/735171 and discussed on Arch Linux Mailing List at http://mailman.archlinux.org/pipermail/arch-general/2011-March/019002.html .
Comment 2 John W. Linville 2011-03-21 13:44:19 UTC
Have you tried reverting 8aec7af9 on 2.6.38?  Does that resolve the issue?
Comment 3 Jeff Cook 2011-03-21 15:54:48 UTC
I did try but I was unable to successfully resolve the merge conflicts. I came out with compile errors. I haven't tried again since. I was making the rounds in IRC on Saturday looking for help merging but couldn't find any; if you can provide a patch I'd happily apply and test it.
Comment 4 John W. Linville 2011-03-21 17:45:46 UTC
It is a bit ugly to revert...I took a whack at it -- let me know if it runs and if so, does it still show the bug?
Comment 5 John W. Linville 2011-03-21 17:46:16 UTC
Created attachment 51542 [details]
0001-Revert-ath5k-Support-synth-only-channel-change-for-A.patch
Comment 6 Jeff Cook 2011-03-21 18:42:32 UTC
No patch applies with git apply and all hunks fail with patch -p1 when attempting to apply from v2.6.38 (521cb40b0c) and from f70f5b9dc. What commit should I use as the basis? Are you using wireless-next tree? Right now I'm applying to Linus's tree, can grab wireless-next again if necessary.
Comment 7 John W. Linville 2011-03-21 19:01:51 UTC
wireless-testing -- wireless-next-2.6 should work as well.
Comment 8 Jeff Cook 2011-03-21 19:04:40 UTC
Can you give me an SHA ref to make sure I'm on the right thing? I tried on HEAD from wireless-next (7d2c16befae) and got the same issue.
Comment 9 John W. Linville 2011-03-21 19:39:36 UTC
Here it applies without any error or any message at all on 7d2c16befae67b901e6750b845661c1fdffd19f1, either with 'patch -p1' or with 'git am'.
Comment 10 Jeff Cook 2011-03-21 22:33:03 UTC
Hmm, must have been some error with line-endings or something, I redownloaded and it now applies to 7d2c16befae without error. Thanks for that, sorry for the hassle. It appears to compile correctly -- I'll reboot in a sec and test.
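
Whether line endings were really the cause is not confirmed in this report, but the guess is easy to check before retrying a patch that refuses to apply. A minimal sketch, with hypothetical helper names, that counts CRLF-terminated lines in the raw bytes of a downloaded patch and normalizes them:

```python
def count_crlf_lines(data: bytes) -> int:
    """Count lines in raw file content that end with CRLF."""
    return sum(
        1 for line in data.splitlines(keepends=True) if line.endswith(b"\r\n")
    )


def normalize_line_endings(data: bytes) -> bytes:
    """Convert CRLF line endings to LF; other bytes are left untouched."""
    return data.replace(b"\r\n", b"\n")


if __name__ == "__main__":
    # Illustrative content only; a real check would read the patch file
    # in binary mode, e.g. open(path, "rb").read().
    sample = b"--- a/file.c\r\n+++ b/file.c\r\n"
    print(count_crlf_lines(sample), "CRLF lines")
```

A non-zero count on a patch that was generated with LF endings would suggest the download was altered in transit.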
Comment 11 Jeff Cook 2011-03-21 22:50:59 UTC
It works well with 8aec7af9 reverted using the patch here, so it seems my bisection at least hit on something. Any ideas why 2.6.37 works and 2.6.38 doesn't, and why my bisect doesn't show me the incompatible change in 38 that is colliding with this change that went into 37?

Thanks for all the help so far -- I am definitely happy to see this, maybe I can use the compat-wireless build method to get this running on an otherwise stable 38.
Comment 12 Richard Schütz 2011-03-21 23:21:14 UTC
This patch only affects ath5k. As expected, it doesn't change anything with my ath9k and its 802.11n performance. I tried to bisect between 2.6.37 and 2.6.38, but I wasn't successful because I had to skip many revisions that caused problems like unloadable modules and kernel panics. Also the effect varies strongly, so testing is a pain in the ass.
Comment 13 Jeff Cook 2011-03-22 10:33:07 UTC
I'll attempt to bisect on my 9k machine sometime in the near future. Work is going to be ramping up, though, so if anyone else gets time before I post the results here please don't wait on me.
Comment 14 John W. Linville 2011-03-22 13:31:57 UTC
Ah, blast the ath5k/ath9k confusion...  Jeff, could you open a separate bug to address the problem associated with the commit you identified?
Comment 15 Jeff Cook 2011-03-26 20:08:00 UTC
John: I have done it, please see bug #31922. https://bugzilla.kernel.org/show_bug.cgi?id=31922
Comment 16 shafi 2011-03-28 05:00:00 UTC
(In reply to comment #13)
> I'll attempt to bisect on my 9k machine sometime in the near future. Work is
> going to be ramping up, though, so if anyone else gets time before I post the
> results here please don't wait on me.

Hi,
   did you find anything regarding ath9k?
Comment 17 hokasch 2011-04-13 09:46:08 UTC
There seems to be a problem with hardware crypto. After unloading ath9k and modprobing with nohwcrypt=1, download speed on a test file jumped from a shaky 45 kb/s to 3.7 MB/s...

Can you test if this resolves your issues? If not, I will open a new bug.
Comment 18 acknex32 2011-04-13 14:30:42 UTC
I tried modprobing ath9k with nohwcrypt=1 on my laptop with an AR9285, and it resulted in a complete freeze (panic?) after a short time: no keyboard/mouse, X display frozen, no SSH or even ping.

However, this did improve throughput/resulted in fewer dropped packets for the amount of time that it did work. :-\

I'm running Arch with the latest "generic" kernel from their repos (2.6.38-ARCH).
Comment 19 Richard Schütz 2011-04-13 15:23:03 UTC
It really looks like hardware encryption is related to this problem. With nohwcrypt=1 I have a stable data rate and no noticeable packet loss.
Comment 20 Benjamin Robin 2011-04-13 20:34:17 UTC
Creating /etc/modprobe.d/ath9.conf with: options ath9k nohwcrypt=1
fixes the problem (it looks like the speed is a little bit faster than with 2.6.37, but maybe that's just the Internet connection...)
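
After adding the option and reloading the module, it may be worth confirming that the parameter actually took effect. The sketch below assumes that ath9k exposes the parameter read-only under /sys/module/ath9k/parameters/nohwcrypt, which is the usual sysfs location for module parameters but is not stated in this report; the helper name `read_module_param` is hypothetical.

```python
from pathlib import Path


def read_module_param(module, param, sysfs_root="/sys/module"):
    """Return the current value of a module parameter as a string.

    Returns None if the module is not loaded or the parameter is not
    exported through sysfs (assumed layout: <root>/<module>/parameters/<param>).
    """
    path = Path(sysfs_root) / module / "parameters" / param
    try:
        return path.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    # Expected to print 1 when the workaround is active, None if ath9k
    # is not loaded or the parameter is not visible.
    print("nohwcrypt =", read_module_param("ath9k", "nohwcrypt"))
```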
Comment 21 Alexandr Mekh 2011-04-15 04:59:04 UTC
With my AR9285 and with 2.6.38-2 (Arch Linux) kernel all works great, but after suspend/resume network speed decreases from ~70Mbit/s to 4Mbit/s. I've tried to unload/load module manually, adding nohwcrypt=1 to /etc/modprobe.d/ath9k.conf, editing /etc/pm/config.d/config with SUSPEND_MODULES="ath9k" option - nothing helps.
WiFi works fine after reboot, but only before suspend.
Comment 22 Alexandr Mekh 2011-04-15 05:22:51 UTC
Some more info.
$iwconfig 
wlan0     IEEE 802.11bgn  ESSID:"mech"  
          Mode:Managed  Frequency:2.417 GHz  Access Point: 00:23:69:C2:67:04   
          Bit Rate=150 Mb/s   Tx-Power=17 dBm   
          Retry  long limit:7   RTS thr:off   Fragment thr:off
          Power Management:off
          Link Quality=52/70  Signal level=-58 dBm  
          Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
          Tx excessive retries:43  Invalid misc:671   Missed beacon:0

Error in dmesg output (after resume from suspend):

btusb 1-1.5:1.0: no reset_resume for driver btusb?
btusb 1-1.5:1.1: no reset_resume for driver btusb?
ata6: SATA link down (SStatus 0 SControl 300)
usb 1-1.2: reset high speed USB device using ehci_hcd and address 3
irq 17: nobody cared (try booting with the "irqpoll" option)
Pid: 0, comm: swapper Not tainted 2.6.38-ARCH #1
Call Trace:
<IRQ>  [<ffffffff813aa802>] ? __report_bad_irq.isra.3+0x33/0x81
 [<ffffffff810b86ce>] ? note_interrupt+0x18e/0x1d0
 [<ffffffff810b9415>] ? handle_fasteoi_irq+0xc5/0xf0
 [<ffffffff8100decd>] ? handle_irq+0x1d/0x30
 [<ffffffff8100db55>] ? do_IRQ+0x55/0xd0
 [<ffffffff813b25d3>] ? ret_from_intr+0x0/0x15
 <EOI>  [<ffffffff812ddf12>] ? poll_idle+0x32/0x70
 [<ffffffff812ddeee>] ? poll_idle+0xe/0x70
 [<ffffffff812df5b3>] ? menu_select+0xb3/0x330
 [<ffffffff812ddfe8>] ? cpuidle_idle_call+0x98/0x350
 [<ffffffff81009226>] ? cpu_idle+0xb6/0x100
 [<ffffffff81392f2d>] ? rest_init+0x91/0xa4
 [<ffffffff8160ccbd>] ? start_kernel+0x401/0x40e
 [<ffffffff8160c347>] ? x86_64_start_reservations+0x132/0x136
 [<ffffffff8160c140>] ? early_idt_handler+0x0/0x71
 [<ffffffff8160c44d>] ? x86_64_start_kernel+0x102/0x111
handlers:
[<ffffffffa03ced30>] (ath_isr+0x0/0x250 [ath9k])

Disabling IRQ #17
intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 10286, limit 9000
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: configured for UDMA/133
PM: resume of devices complete after 2059.580 msecs
PM: Finishing wakeup.
Restarting tasks ... done.
video LNXVIDEO:00: Restoring backlight state
wlan0: deauthenticated from 00:23:69:c2:67:04 (Reason: 6)
cfg80211: Calling CRDA for country: RU
wlan0: authenticate with 00:23:69:c2:67:04 (try 1)
wlan0: authenticated
wlan0: associate with 00:23:69:c2:67:04 (try 1)
wlan0: RX AssocResp from 00:23:69:c2:67:04 (capab=0x431 status=0 aid=1)
wlan0: associated
EXT4-fs (sda3): re-mounted. Opts: commit=0
EXT4-fs (sda4): re-mounted. Opts: commit=0
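
The part of the log above that seems most relevant is the "nobody cared" report: it says IRQ 17 was disabled and lists ath_isr from ath9k as the registered handler. A small sketch for pulling those facts out of a saved dmesg log follows; the function name and regular expressions are hypothetical and only assume the message layout seen in this log.

```python
import re

NOBODY_CARED_RE = re.compile(r"irq (\d+): nobody cared")
# Matches handler lines such as "(ath_isr+0x0/0x250 [ath9k])".
HANDLER_RE = re.compile(r"\((\w+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[(\w+)\]\)")
DISABLED_RE = re.compile(r"Disabling IRQ #(\d+)")


def summarize_irq_report(log):
    """Extract the IRQ number, (handler, module) pairs, and disabled IRQ."""
    cared = NOBODY_CARED_RE.search(log)
    disabled = DISABLED_RE.search(log)
    return {
        "irq": int(cared.group(1)) if cared else None,
        "handlers": HANDLER_RE.findall(log),
        "disabled": int(disabled.group(1)) if disabled else None,
    }


# Excerpt of the relevant lines from the log in this comment.
SAMPLE = """\
irq 17: nobody cared (try booting with the "irqpoll" option)
handlers:
[<ffffffffa03ced30>] (ath_isr+0x0/0x250 [ath9k])

Disabling IRQ #17
"""

if __name__ == "__main__":
    print(summarize_irq_report(SAMPLE))
```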
Comment 23 acknex32 2011-04-15 16:59:18 UTC
…as a sort of update to my previous comment above, the nohwcrypt option actually seems to work fine now. I've done a reboot recently, and so far so good; no abrupt lockups or poor performance.  I'm wondering if this might have to do with suspend, as I had done that before the lockup occurred after trying to modprobe it with nohwcrypt=1 (i.e. suspended with ath9k loaded normally, woke up, rmmodded ath9k, modprobed ath9k with nohwcrypt=1, lockup occurred).

Note also that I was doing an scp at the time to test network throughput…I'm wondering if perhaps heavy network activity had anything to do with it?
Comment 24 Ryan 2011-04-17 13:12:57 UTC
For what it's worth, this bug also affects my 64-bit Ubuntu system with an AR2003 chipset.  On  the 2.6.35 kernel wireless-n performance using ath9k is excellent, with 0% packet loss to the router.  Loading the 2.6.38 kernel causes 20%-40% loss to my router.

modprobing with nohwcrypt=1 fixes the problem.  Have not tested suspend/resume yet.

Thank you all for figuring this out!  I hope a fix makes its way into the kernel in time for the upcoming Ubuntu release, but they may be past the kernel freeze already.
Comment 25 Richard Schütz 2011-04-17 13:16:21 UTC
(In reply to comment #24)
> modprobing with nohwcrypt=1 fixes the problem.  Have not tested
> suspend/resume
> yet.
> 
> Thank you all for figuring this out!  I hope a fix makes its way into the
> kernel in time for the upcoming Ubuntu release, but they may be past the
> kernel
> freeze already.

Don't forget that this is just a workaround, not a solution.
Comment 26 Thomas Bächler 2011-04-22 11:17:06 UTC
Same problem on the AR9280, on Arch Linux x86_64. 802.11n drops over 50% of the packets unless I load ath9k with nohwcrypt.

Why is this bug in NEEDINFO state? What info is required?
Comment 27 tm512 2011-04-22 22:20:30 UTC
I am having some problem with ath9k on Arch Linux x86_64, kernel 2.6.38. My wireless card is the AR9285. I am using wireless G, and while the packet loss is not constant, it intermittently hops up to around 30%, then back down. Other times it's a steady 1% to 5% loss, which makes using SSH irritating, but doesn't affect downloads that much.

Of course, this was not an issue with 2.6.37 and earlier, so I have been using kernel 2.6.32 (Arch's LTS kernel) since, and have not tested to see if nohwcrypt fixes the issue.
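
Packet-loss figures like the ones quoted in this report are easier to compare across kernels when they are taken from the same kind of ping run each time. The sketch below parses a ping summary line; it assumes the summary format printed by the iputils ping commonly shipped on Linux, the helper names are hypothetical, and the sample line is made up for illustration.

```python
import re

# Assumed summary layout:
# "10 packets transmitted, 7 received, 30% packet loss, time 9012ms"
PING_SUMMARY_RE = re.compile(
    r"(\d+) packets transmitted, (\d+) (?:packets )?received"
    r".*?([\d.]+)% packet loss"
)


def parse_ping_summary(output):
    """Return (sent, received, loss_percent), or None if no summary is found."""
    match = PING_SUMMARY_RE.search(output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), float(match.group(3))


def loss_percent(sent, received):
    """Packet loss in percent computed from the raw counts."""
    return 100 * (sent - received) / sent


# Made-up sample line, not taken from any system in this report.
SAMPLE = "10 packets transmitted, 7 received, 30% packet loss, time 9012ms"

if __name__ == "__main__":
    print(parse_ping_summary(SAMPLE))
```

Running the same fixed-count ping against the router with and without nohwcrypt=1 would give directly comparable numbers.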
Comment 28 shafi 2011-04-25 06:51:10 UTC
we will look into this very soon.
Comment 29 Hauke Mehrtens 2011-04-25 13:09:15 UTC
Created attachment 55392 [details]
Patch that should fix the slow performance issue on ath9k with kernel 2.6.38
Comment 30 Richard Schütz 2011-04-25 14:54:06 UTC
I have applied the patch and it looks like the performance is back to normal. But while testing I noticed another strange thing: under heavy load the link quality meter often shows 15/70 instead of the expected value. Can anyone confirm this?
Comment 31 Rafael J. Wysocki 2011-04-30 20:04:32 UTC
Fixed by commit 115dad7 (ath9k_hw: partially revert "fix dma descriptor rx
error bit parsing").