Bug 30342 - ath5k has 90-95% outgoing packet loss for certain packet sizes unless nohwcrypt is specified
Status: RESOLVED OBSOLETE
Alias: None
Product: Networking
Classification: Unclassified
Component: Wireless
Hardware: All Linux
Importance: P1 normal
Assignee: networking_wireless@kernel-bugs.osdl.org
 
Reported: 2011-03-01 23:21 UTC by Musaraigne
Modified: 2012-08-17 11:02 UTC
CC List: 6 users

Kernel Version: 2.6.35
Regression: No

Description Musaraigne 2011-03-01 23:21:48 UTC
The bug was reported in Ubuntu Lucid and Maverick almost a year ago, but it seems that nobody filed a bug here.
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/568090

I'm not sure if it affects only AR2413 or if it can affect other chipsets.


According to my tests, outgoing packets with a size of the form
  size = 128*k + 81 + m
or
  size = 128*k + 105 + m
for any k >= 2 and 0 <= m <= 7
are dropped randomly in 90-95% of cases. All other packet sizes work fine.

On systems affected by the bug,
  ping -M do -s 596 www.google.com
should result in 90% packet loss (because 596 + 28 = 624 = 128*4 + 105 + 7; the 28 extra bytes are the IPv4 and ICMP headers that ping adds to the payload)
while
  ping -M do -s 597 www.google.com
should result in negligible packet loss.
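
For anyone who wants to check a given size against the rule above, here is a minimal Python sketch. The function names are hypothetical, and it assumes the sizes in the rule are IP packet sizes, i.e. the ping payload plus the 28 bytes of IPv4 and ICMP headers, as in the 596(624) example.

```python
# Sketch of the reported size rule; names are illustrative, not from the driver.
IP_ICMP_OVERHEAD = 28  # 20-byte IPv4 header + 8-byte ICMP header added by ping


def is_affected(packet_size: int) -> bool:
    """True if packet_size = 128*k + 81 + m or 128*k + 105 + m
    for some k >= 2 and 0 <= m <= 7."""
    k, remainder = divmod(packet_size, 128)
    return k >= 2 and (81 <= remainder <= 88 or 105 <= remainder <= 112)


def ping_payload_affected(payload: int) -> bool:
    """Apply the rule to a 'ping -s <payload>' value."""
    return is_affected(payload + IP_ICMP_OVERHEAD)


if __name__ == "__main__":
    for payload in (596, 597):
        print(payload, payload + IP_ICMP_OVERHEAD, ping_payload_affected(payload))
```

With these definitions, a payload of 596 is flagged as affected and 597 is not, matching the two ping commands above.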
Comment 1 John W. Linville 2011-03-02 13:30:08 UTC
Definitely need someone familiar with the hardware/driver for this one...
Comment 2 John W. Linville 2011-03-29 18:23:38 UTC
Bob and/or Nick, ping?
Comment 3 Bob Copeland 2011-03-31 12:44:29 UTC
Well, that is quite a find.  I can't reproduce it on my hardware, though; maybe it does only affect AR2413.  What encryption are you using? WEP/TKIP/CCMP?
Comment 4 Nick Kossifidis 2011-03-31 14:19:52 UTC
The only thing I can think of is a relation to the 40-bit and 104-bit key lengths. If we were talking in bits instead of bytes, we could say that 81 + n = 80 + m = 40*2 + 1 byte (max) and 105 + n = 104 + m = 104 + 1 byte (max), where 0 <= n <= 7 and 1 <= m <= 8. But this is weird. I haven't looked at the key cache code for a long time; it's now part of the common ath module shared between ath5k and ath9k, so Atheros people have also worked on that part. Are we sure this is a software bug? Have you tried madwifi or ndiswrapper + Windows driver? Maybe you've hit a hw bug. Also, can you use a card in monitor mode and see if you can get any more info with wireshark?
Comment 5 Musaraigne 2011-04-01 14:09:25 UTC
I don't have the affected system on hand any more, but I'll try to answer your questions. I just mentioned this bug on Ubuntu Launchpad so that people with access to the hardware can do further testing.

Encryption type: for me it was on an 802.11g WPA2 network, but I've seen several posts saying it happened with WEP too.

Chipset: all reports that specified a chipset mentioned AR2413, so even if it was not the only chipset affected it is at least the most popular among them.

Software vs hardware? The bug is definitely caused by ath5k: madwifi works fine, and the Windows drivers on Windows work fine (I haven't tried ndiswrapper + Windows driver). Of course ath5k could be triggering a hardware bug that isn't triggered by other drivers, which would explain why not all cards are affected, but either way it's an issue that could be fixed in software.

Capture: as far as I remember, I didn't see anything unusual in wireshark: the lost packets showed up in wireshark but simply didn't get sent. Received packets were fine.



The 40/104 bit theory doesn't make much sense to me -- is there any reason for 40 to be multiplied by 2 but not 104, or for bits to get translated to bytes? Sounds to me more like a coincidence due to the fact that key lengths expressed in bits are multiples of 8.

I didn't want to attempt any analysis in my initial report, but you could define
    number_of_blocks = ceiling(packet_size/8)
and then the packet will encounter the bug iff number_of_blocks>=32 and the lower 4 bits of number_of_blocks are equal to 1011 or 1110 in binary:
    bad_packet = (bit7 | bit6 | bit5) & bit3 & bit1 & (bit2 ^ bit0)
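
As a sanity check, the following Python sketch compares this bit formula with the size rule from the original report. The function names are hypothetical, and the 1 to 1500 byte range is an assumption (a typical MTU); the three-bit test (bit7 | bit6 | bit5) only stands in for number_of_blocks >= 32 while number_of_blocks stays below 256.

```python
def bad_by_blocks(packet_size: int) -> bool:
    """Bit formula: (bit7 | bit6 | bit5) & bit3 & bit1 & (bit2 ^ bit0)."""
    n = (packet_size + 7) // 8  # number_of_blocks = ceiling(packet_size / 8)

    def bit(i: int) -> int:
        return (n >> i) & 1

    return bool((bit(7) | bit(6) | bit(5)) & bit(3) & bit(1) & (bit(2) ^ bit(0)))


def bad_by_size(packet_size: int) -> bool:
    """Size rule: 128*k + 81 + m or 128*k + 105 + m, k >= 2, 0 <= m <= 7."""
    k, remainder = divmod(packet_size, 128)
    return k >= 2 and (81 <= remainder <= 88 or 105 <= remainder <= 112)


# Sizes for which the two formulations disagree (assumed range: 1..1500 bytes).
mismatches = [s for s in range(1, 1501) if bad_by_blocks(s) != bad_by_size(s)]

if __name__ == "__main__":
    print("mismatches:", mismatches)
```

Over that range the two formulations select the same sizes, so the list of mismatches is empty.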

I have no idea how the driver or hardware works, or if 8-byte blocks are used at all by the hardware, but perhaps there could be some I/O register containing number_of_blocks (or 8*number_of_blocks), and if this register is not reset at some point, then bad things can happen depending on its value? Of course this is pure speculation, but it gives an example of a plausible mechanism and how to fix the bug in this case.

Also, perhaps looking for an explanation of why exactly those packet sizes are a problem is pointless: if you're triggering undefined behavior in the circuit, those sizes would simply be a byproduct of the hardware circuit synthesis process and would bring no insight into how to solve the problem.
Comment 6 Jeremy 2011-04-13 23:30:22 UTC
This also affects 2.6.37 through 2.6.38.2 and seems to be related to the hw encryption issue referred to in bug #31452.  My setup is AP mode running WPA2 on an AR2413-based card.  After disabling hw encryption on the module, the packet loss no longer occurs and throughput returns to normal levels.

My last round of testing was on 2.6.38.2.
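
For reference, the workaround named in the bug title is the ath5k module parameter nohwcrypt. A minimal sketch of setting it persistently follows; the file name is an assumption (any .conf file under /etc/modprobe.d should do), and the module has to be reloaded or the machine rebooted before it takes effect.

```
# /etc/modprobe.d/ath5k.conf  (file name is an assumption)
# Disable hardware encryption in ath5k so crypto is done in software.
options ath5k nohwcrypt=1
```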
Comment 7 Peter Ford 2011-06-09 13:09:19 UTC
My first comment was on the Ubuntu bug listed above.  Some of the subsequent comments there have speculated whether this only affects AR2413 devices using an Askey subsystem.  I recently had a chance to look again at the laptop where I encountered the bug.  It *does* use an Askey subsystem.  And after re-enabling hwcrypt I was able to reproduce the ping behaviour described by Musaraigne.

For more details, see this condensed version of my terminal session, below.

The kernel was 2.6.32.32 (IIRC).

The stats I got were:
 Ping payload size (-s, bytes), % Loss
 597, 0
 596, 81
 595, 83
 589, 83
 588, 0

 
admin@laptop:~$ lspci -vnn

09:04.0 Ethernet controller [0200]: Atheros Communications Inc. AR2413 802.11bg NIC [168c:001a] (rev 01)
	Subsystem: Askey Computer Corp. Device [144f:7094]
	Flags: bus master, medium devsel, latency 168, IRQ 22
	Memory at c0110000 (32-bit, non-prefetchable) [size=64K]
	Capabilities: <access denied>
	Kernel driver in use: ath5k
	Kernel modules: ath5k

admin@laptop:~$ 
admin@laptop:~$ ping -M do -s 596 www.google.com
PING www.l.google.com (209.85.143.99) 596(624) bytes of data.
--- www.l.google.com ping statistics ---
103 packets transmitted, 19 received, 81% packet loss, time 102510ms
rtt min/avg/max/mdev = 42.848/43.904/45.322/0.665 ms
admin@laptop:~$
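
To collect loss figures like the ones above for many sizes, the ping summary line can be parsed mechanically. The Python sketch below is only an illustration: the function name is hypothetical, and the regular expression assumes the iputils-style summary format shown in this session.

```python
import re
from typing import Optional, Tuple

# Matches e.g. "103 packets transmitted, 19 received, 81% packet loss, time 102510ms"
# (assumed iputils ping format; an optional "+N errors," field is tolerated).
_SUMMARY = re.compile(
    r"(\d+) packets transmitted, (\d+) (?:packets )?received,.*?"
    r"(\d+(?:\.\d+)?)% packet loss"
)


def parse_loss(ping_output: str) -> Optional[Tuple[int, int, float]]:
    """Return (sent, received, loss_percent), or None if no summary line is found."""
    match = _SUMMARY.search(ping_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), float(match.group(3))


if __name__ == "__main__":
    line = "103 packets transmitted, 19 received, 81% packet loss, time 102510ms"
    print(parse_loss(line))
```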
Comment 8 Alan 2012-08-17 11:02:19 UTC
If this is still seen in modern kernels (3.2 etc.), please update/re-open. Thanks.
