The bug was reported in Ubuntu Lucid and Maverick almost a year ago, but it seems that nobody filed a bug here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/568090
I'm not sure if it affects only AR2413 or if it can affect other chipsets.
According to my tests, outgoing packets whose size in bytes has the form size = 128*k + 81 + m or size = 128*k + 105 + m, for any k>=2 and 0<=m<=7, are dropped randomly in 90-95% of cases. All other packet sizes work fine.
On systems affected by the bug, ping -M do -s 596 www.google.com should result in about 90% packet loss (because 624 = 128*4 + 105 + 7; the 28-byte difference comes from the network headers added by ping), while ping -M do -s 597 www.google.com should result in negligible packet loss.
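As a rough illustration of the size rule in the report above, the check can be sketched as a small Python predicate. This is not from the reporter: the function names are made up for the example, and the 28-byte overhead is assumed to be a 20-byte IPv4 header (no options) plus an 8-byte ICMP header, matching the 596(624) figure in the report.

```python
# Sketch only: helper names are hypothetical, not part of ath5k or ping.
IP_ICMP_OVERHEAD = 28  # assumed: 20-byte IPv4 header + 8-byte ICMP header


def is_affected_size(size: int) -> bool:
    """True if size == 128*k + 81 + m or 128*k + 105 + m, k >= 2, 0 <= m <= 7."""
    k, rem = divmod(size, 128)
    return k >= 2 and (81 <= rem <= 88 or 105 <= rem <= 112)


def is_affected_ping_payload(payload: int) -> bool:
    """Apply the rule to the packet size produced by `ping -s <payload>`."""
    return is_affected_size(payload + IP_ICMP_OVERHEAD)


if __name__ == "__main__":
    for payload in (588, 589, 595, 596, 597):
        verdict = "loss expected" if is_affected_ping_payload(payload) else "should be fine"
        print(f"ping -M do -s {payload}: {verdict}")
```

Under these assumptions, payloads 589 through 596 fall in the affected range, while 588 and 597 sit just outside it.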
Definitely need someone familiar with the hardware/driver for this one...
Bob and/or Nick, ping?
Well, that is quite a find. I can't reproduce on my hardware though, maybe it does only affect AR2413. What encryption are you using? WEP/TKIP/CCMP?
I could only think of a relation to 40-bit and 104-bit key lengths. If we were talking about bits instead of bytes, we could say that 81 + n = 80 + m = 40*2 + 1 byte (max) and 105 + n = 104 + m = 104 + 1 byte (max), where 0 <= n <= 7 and 1 <= m <= 8. But this is weird. I haven't looked at the key cache code for a long time; it's now part of the common ath module shared between ath5k and ath9k, so Atheros people have also worked on that part. Are we sure this is a software bug? Have you tried madwifi or ndiswrapper + the Windows driver? Maybe you've hit a hw bug. Also, can you use a card in monitor mode and see if you can get any more info with wireshark?
I don't have the affected system on hand any more, but I'll try to answer your questions. I just mentioned this bug on Ubuntu Launchpad so that people with access to the hardware can do further testing.
Encryption type: for me it was on an 802.11g WPA2 network, but I've seen several posts saying it happened with WEP too.
Chipset: all reports that specified a chipset mentioned AR2413, so even if it is not the only chipset affected, it is at least the most popular among them.
Software vs hardware? The bug is definitely caused by ath5k: madwifi works fine, and the Windows drivers on Windows work fine (I haven't tried ndiswrapper + the Windows driver). Of course ath5k could be triggering a hardware bug that isn't triggered by other drivers, which would explain why not all cards are affected, but either way it's an issue that could be fixed in software.
Capture: as far as I remember, I didn't see anything unusual in wireshark: the lost packets showed up in wireshark but simply didn't get sent. Received packets were fine.
The 40/104 bit theory doesn't make much sense to me. Is there any reason for 40 to be multiplied by 2 but not 104, or for bits to get translated to bytes? It sounds to me more like a coincidence due to the fact that key lengths expressed in bits are multiples of 8.
I didn't want to attempt any analysis in my initial report, but you could define number_of_blocks = ceiling(packet_size/8), and then the packet will encounter the bug iff number_of_blocks>=32 and the lower 4 bits of number_of_blocks are equal to 1011 or 1110 in binary:
bad_packet = (bit7 | bit6 | bit5) & bit3 & bit1 & (bit2 ^ bit0)
where bitN is bit N of number_of_blocks.
I have no idea how the driver or hardware works, or if 8-byte blocks are used at all by the hardware, but perhaps there could be some I/O register containing number_of_blocks (or 8*number_of_blocks), and if this register is not reset at some point, then bad things can happen depending on its value?
Of course this is pure speculation, but it gives an example of a plausible mechanism and how to fix the bug in this case. Also, perhaps looking for an explanation of why exactly those packet sizes are a problem is pointless: if you're triggering undefined behavior of the circuit, those sizes would simply be a byproduct of the hardware circuit synthesis process and would bring no insight into how to solve the problem.
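For anyone wanting to double-check that the block-count formulation above matches the original size rule, here is a minimal Python sketch. The function names are hypothetical, and the comparison is limited to sizes up to 2040 bytes (at most 255 blocks), since the bit expression only looks at bits 0 through 7 of number_of_blocks. It says nothing about what the hardware actually does; it only compares the three ways of writing the rule.

```python
# Sketch only: compares the three formulations from this thread.
def num_blocks(size: int) -> int:
    """ceiling(size / 8) using integer arithmetic."""
    return (size + 7) // 8


def bad_by_size(size: int) -> bool:
    """Original rule: 128*k + 81 + m or 128*k + 105 + m, k >= 2, 0 <= m <= 7."""
    k, rem = divmod(size, 128)
    return k >= 2 and (81 <= rem <= 88 or 105 <= rem <= 112)


def bad_by_blocks(size: int) -> bool:
    """Block rule: at least 32 blocks and low nibble equal to 1011 or 1110."""
    n = num_blocks(size)
    return n >= 32 and (n & 0xF) in (0b1011, 0b1110)


def bad_by_bits(size: int) -> bool:
    """Bit expression: (bit7 | bit6 | bit5) & bit3 & bit1 & (bit2 ^ bit0)."""
    n = num_blocks(size)

    def bit(i: int) -> int:
        return (n >> i) & 1

    return bool((bit(7) | bit(6) | bit(5)) & bit(3) & bit(1) & (bit(2) ^ bit(0)))


def formulations_agree(max_size: int = 2040) -> bool:
    """Check that all three rules classify every size in 1..max_size identically."""
    return all(
        bad_by_size(s) == bad_by_blocks(s) == bad_by_bits(s)
        for s in range(1, max_size + 1)
    )


if __name__ == "__main__":
    print("formulations agree:", formulations_agree())
```

Beyond 255 blocks the bit expression would need more high bits to stay equivalent to the "number_of_blocks>=32" condition, which is why the check stops at 2040 bytes.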
This is affecting 2.6.37-2.6.38.2 as well and seems to be related to the hw encryption referred to in bug #31452. My setup is AP mode running WPA2 on an AR2413 based card. After disabling hw encryption on the module, the packet loss no longer occurs and throughput returns to normal levels. My last round of testing was on 2.6.38.2.
My first comment was on the Ubuntu bug listed above. Some of the subsequent comments there have speculated whether this only affects AR2413 devices using an Askey subsystem. I recently had a chance to look again at the laptop where I encountered the bug. It *does* use an Askey subsystem. And after re-enabling hwcrypt I was able to reproduce the ping behaviour described by Musaraigne. For more details, see this condensed version of my terminal session, below. The kernel was 2.6.32.32 (IIRC). The stats I got were:

Packet Size, % Loss
597, 0
596, 81
595, 83
589, 83
588, 0

admin@laptop:~$ lspci -vnn
09:04.0 Ethernet controller [0200]: Atheros Communications Inc. AR2413 802.11bg NIC [168c:001a] (rev 01)
	Subsystem: Askey Computer Corp. Device [144f:7094]
	Flags: bus master, medium devsel, latency 168, IRQ 22
	Memory at c0110000 (32-bit, non-prefetchable) [size=64K]
	Capabilities: <access denied>
	Kernel driver in use: ath5k
	Kernel modules: ath5k
admin@laptop:~$
admin@laptop:~$ ping -M do -s 596 www.google.com
PING www.l.google.com (209.85.143.99) 596(624) bytes of data.
--- www.l.google.com ping statistics ---
103 packets transmitted, 19 received, 81% packet loss, time 102510ms
rtt min/avg/max/mdev = 42.848/43.904/45.322/0.665 ms
admin@laptop:~$
If this is still seen in modern kernels (3.2 etc.), please update/re-open. Thanks.