Bug 13237 - ath5k: some arp replies are dropped in master mode
Summary: ath5k: some arp replies are dropped in master mode
Status: CLOSED UNREPRODUCIBLE
Alias: None
Product: Networking
Classification: Unclassified
Component: Wireless (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: networking_wireless@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-05-04 00:05 UTC by aCaB
Modified: 2010-04-15 21:37 UTC (History)
3 users (show)

See Also:
Kernel Version: 2.6.29.2 + wireless-testng
Subsystem:
Regression: No
Bisected commit-id:


Attachments
pcaps (6.38 KB, application/octet-stream)
2009-05-04 00:05 UTC, aCaB
Details
pcap with beacon and missing arp reply (9.55 KB, application/octet-stream)
2009-05-04 19:37 UTC, aCaB
Details
Get a new beacon each time for AP mode (3.00 KB, patch)
2009-05-06 03:31 UTC, Bob Copeland
Details | Diff
send buffered frames after beacons (5.70 KB, patch)
2009-06-15 03:55 UTC, Bob Copeland
Details | Diff
A packet capture showing ath5k AP acknowledging a PS poll, and not then responding (507.45 KB, application/octet-stream)
2010-02-06 11:25 UTC, Simon Farnsworth
Details

Description aCaB 2009-05-04 00:05:05 UTC
Created attachment 21203 [details]
pcaps

Quick description:
I can connect a number of devices to hostapd without any problems except an htc g1 running android. The phone does associate and does get its ip address from a dhcp server but cannot get any reply to its arp requests.


Longer story...
To make sure the problem didn't lay in the phone I've run some tests.

The AP: Hostapd (0-70 git) in open mode on an AR5213 running kernel 2.6.29.2 + Wireless-testing (5ece8a46f9aeb9d6f80a3c4414f2c27682dc706b) + ap mode and other misc patches found at https://lists.ath5k.org/pipermail/ath5k-devel/2009-April/002314.html. This box also runs a dhcp server.

A working client: atheros 5xx in station mode (this could really be anything else). The client associates, gets its dynamic ip address via a dhcp, pings a remote host via the wifi link.

The "broken" client: htc g1 (sorry no idea about the hw specs). For the tests the same steps as for the working client are taken. Please note that, so far, i haven't got any problems connecting to other APs.

A sniffing client: iwl 3945 in monitor mode. I've used this one to make sure the arp are replies are not missed by the phone and to check that what the AP thinks it sends, is actually sent over the air.


The test shows that with the working client arp replies are sent correctly. With the broken client, instead, they appear as they are going out from the AP but they never reach the air.

The bridge_* pcaps in the tarball are dumped on the ap, the dump_* pcaps are dumped on the sniffer.


As a side note, i can reproduce the problem using arpspoof:
- arpspoof -i wlan0 192.168.0.115
This one works. "Spoofed" ARP replies sent to the broadcast are really sent out.

- arpspoof -i wlan0 -t 192.168.0.120 192.168.0.115
This one doesn't work (please note that you may need to force an arp entry for 192.168.0.120). Targeted ARP replies are NOT sent to the air.


Thanks!
Comment 1 John W. Linville 2009-05-04 14:48:43 UTC
Nice analysis!  I'm not sure OTOH what might account for this behavior -- hopefully Johannes or Bob will have some ideas... :-)
Comment 2 Johannes Berg 2009-05-04 15:40:48 UTC
Ummmm. The problem is quite clearly that the phone is in powersave mode and never wakes up to ask for the ARP reply.

Now, the capture file shows _no_ beacons!

If you filtered them, please provide a capture file with beacons in it showing the _bad_ behaviour as seen from the _capture_ (iwl3945) station.

If you didn't filter them, then ath5k isn't sending beacons, and it's not really surprising the client never gets a frame since they get buffered and never sent out...

Please also enable CONFIG_MAC80211_VERBOSE_PS_DEBUG and check kernel log.
Comment 3 aCaB 2009-05-04 16:00:13 UTC
(In reply to comment #2)
> Now, the capture file shows _no_ beacons!
> 
> If you filtered them, please provide a capture file with beacons in it
> showing
> the _bad_ behaviour as seen from the _capture_ (iwl3945) station.

Yeah I did filter them out to keep the pcaps small. The phone is kinda dumb and takes minutes to scan and associate.

> Please also enable CONFIG_MAC80211_VERBOSE_PS_DEBUG and check kernel log.

Ok will repost the updated stuff later.
Thanks for the quick reply!
Comment 4 aCaB 2009-05-04 19:37:28 UTC
Created attachment 21216 [details]
pcap with beacon and missing arp reply

New capture without beacon filtering.
HtH
Comment 5 Johannes Berg 2009-05-04 19:54:23 UTC
Thanks. Do you see any new messages from the CONFIG_MAC80211_VERBOSE_PS_DEBUG config?

Anyway, this is quite clearly an ath5k bug, the beacon always looks the same. The sequence number increases, but the TIM IE is always:
05 04 01 02 00 00

instead of, at least, alternating between
05 04 00 02 00 00
05 04 01 02 00 00

The AID bit is also never set, so the AP never tells the phone that there is data buffered for it.

I'll remove myself from CC since this really is an ath5k problem.
Comment 6 Bob Copeland 2009-05-04 20:07:40 UTC
(In reply to comment #4)
> Created an attachment (id=21216) [details]
> pcap with beacon and missing arp reply
> 
> New capture without beacon filtering.

Thanks for the very detailed report.  I also have a g1 (ti wl1251, for what it is worth) so I should be able to reproduce this at home with ath5k.
Comment 7 aCaB 2009-05-04 20:59:05 UTC
(In reply to comment #5)
> Thanks. Do you see any new messages from the CONFIG_MAC80211_VERBOSE_PS_DEBUG
> config?

I get these:
May  4 22:55:15 loneliness kernel: [90198.257516] wlan0: STA 00:18:41:ea:a1:ba aid 1 enters power save mode                                                                                              
May  4 22:55:16 loneliness kernel: [90199.349062] wlan0: STA 00:18:41:ea:a1:ba aid 1 exits power save mode                                                                                               
May  4 22:55:18 loneliness kernel: [90201.344354] wlan0: STA 00:18:41:ea:a1:ba aid 1 enters power save mode                                                                                              
May  4 22:55:19 loneliness kernel: [90201.841901] STA 00:18:41:ea:a1:ba aid 1: PS buffer (entries before 0)                                                                                              
May  4 22:55:19 loneliness kernel: [90201.973440] STA 00:18:41:ea:a1:ba aid 1: PS buffer (entries before 1)                                                                                              
May  4 22:55:19 loneliness kernel: [90201.980889] STA 00:18:41:ea:a1:ba aid 1: PS buffer (entries before 2)                                                                                              
May  4 22:55:20 loneliness kernel: [90202.838118] STA 00:18:41:ea:a1:ba aid 1: PS buffer (entries before 3)                                                                                              
May  4 22:55:21 loneliness kernel: [90203.837989] STA 00:18:41:ea:a1:ba aid 1: PS buffer (entries before 4)                                                                                              
May  4 22:55:24 loneliness kernel: [90206.837837] STA 00:18:41:ea:a1:ba aid 1: PS buffer (entries before 5)                                                                                              
May  4 22:55:24 loneliness kernel: [90206.973098] STA 00:18:41:ea:a1:ba aid 1: PS buffer (entries before 6)                                                                                              
May  4 22:55:25 loneliness kernel: [90207.973375] STA 00:18:41:ea:a1:ba aid 1: PS buffer (entries before 7)                                                                                              
May  4 22:55:25 loneliness kernel: [90208.168067] STA 00:18:41:ea:a1:ba aid 1: PS buffer (entries before 8)                                                                                              
May  4 22:55:26 loneliness kernel: [90208.842322] STA 00:18:41:ea:a1:ba aid 1: PS buffer (entries before 9)                                                                                              
May  4 22:55:26 loneliness kernel: [90208.972951] STA 00:18:41:ea:a1:ba aid 1: PS buffer (entries before 10)                                                                                             
May  4 22:55:29 loneliness kernel: [90211.837742] STA 00:18:41:ea:a1:ba aid 1: PS buffer (entries before 11)
May  4 22:55:30 loneliness kernel: [90212.840207] STA 00:18:41:ea:a1:ba aid 1: PS buffer (entries before 12)
May  4 22:55:31 loneliness kernel: [90213.837761] STA 00:18:41:ea:a1:ba aid 1: PS buffer (entries before 13)
May  4 22:55:32 loneliness kernel: [90215.006577] Buffered frame expired (STA 00:18:41:ea:a1:ba)
May  4 22:55:32 loneliness kernel: [90215.006599] Buffered frame expired (STA 00:18:41:ea:a1:ba)
May  4 22:55:32 loneliness kernel: [90215.006618] Buffered frame expired (STA 00:18:41:ea:a1:ba)
May  4 22:55:32 loneliness kernel: [90215.006636] Buffered frame expired (STA 00:18:41:ea:a1:ba)
May  4 22:55:32 loneliness kernel: [90215.006654] Buffered frame expired (STA 00:18:41:ea:a1:ba)
May  4 22:55:34 loneliness kernel: [90216.839666] STA 00:18:41:ea:a1:ba aid 1: PS buffer (entries before 9)
May  4 22:55:35 loneliness kernel: [90217.840272] STA 00:18:41:ea:a1:ba aid 1: PS buffer (entries before 10)
May  4 22:55:36 loneliness kernel: [90218.842585] STA 00:18:41:ea:a1:ba aid 1: PS buffer (entries before 11)
May  4 22:55:38 loneliness kernel: [90221.424900] STA 00:18:41:ea:a1:ba aid 1: PS buffer (entries before 12)
May  4 22:55:39 loneliness kernel: [90222.420147] STA 00:18:41:ea:a1:ba aid 1: PS buffer (entries before 13)
May  4 22:55:40 loneliness kernel: [90223.420029] STA 00:18:41:ea:a1:ba aid 1: PS buffer (entries before 14)
May  4 22:55:42 loneliness kernel: [90225.006367] Buffered frame expired (STA 00:18:41:ea:a1:ba)
May  4 22:55:42 loneliness kernel: [90225.006390] Buffered frame expired (STA 00:18:41:ea:a1:ba)
May  4 22:55:42 loneliness kernel: [90225.006407] Buffered frame expired (STA 00:18:41:ea:a1:ba)
May  4 22:55:42 loneliness kernel: [90225.006424] Buffered frame expired (STA 00:18:41:ea:a1:ba)
May  4 22:55:42 loneliness kernel: [90225.006441] Buffered frame expired (STA 00:18:41:ea:a1:ba)
May  4 22:55:42 loneliness kernel: [90225.006459] Buffered frame expired (STA 00:18:41:ea:a1:ba)
May  4 22:55:42 loneliness kernel: [90225.006476] Buffered frame expired (STA 00:18:41:ea:a1:ba)
May  4 22:55:42 loneliness kernel: [90225.006495] Buffered frame expired (STA 00:18:41:ea:a1:ba)
May  4 22:55:42 loneliness kernel: [90225.006512] Buffered frame expired (STA 00:18:41:ea:a1:ba)
May  4 22:55:43 loneliness kernel: [90226.420001] STA 00:18:41:ea:a1:ba aid 1: PS buffer (entries before 6)
May  4 22:55:44 loneliness kernel: [90227.419993] STA 00:18:41:ea:a1:ba aid 1: PS buffer (entries before 7)
May  4 22:55:45 loneliness kernel: [90228.419999] STA 00:18:41:ea:a1:ba aid 1: PS buffer (entries before 8)
May  4 22:55:48 loneliness kernel: [90231.419914] STA 00:18:41:ea:a1:ba aid 1: PS buffer (entries before 9)
May  4 22:55:49 loneliness kernel: [90232.420354] STA 00:18:41:ea:a1:ba aid 1: PS buffer (entries before 10)
May  4 22:55:50 loneliness kernel: [90233.420169] STA 00:18:41:ea:a1:ba aid 1: PS buffer (entries before 11)
May  4 22:55:52 loneliness kernel: [90235.006642] Buffered frame expired (STA 00:18:41:ea:a1:ba)
May  4 22:55:52 loneliness kernel: [90235.006665] Buffered frame expired (STA 00:18:41:ea:a1:ba)
May  4 22:55:52 loneliness kernel: [90235.006682] Buffered frame expired (STA 00:18:41:ea:a1:ba)
May  4 22:55:52 loneliness kernel: [90235.006699] Buffered frame expired (STA 00:18:41:ea:a1:ba)
May  4 22:55:52 loneliness kernel: [90235.006718] Buffered frame expired (STA 00:18:41:ea:a1:ba)
May  4 22:55:52 loneliness kernel: [90235.006735] Buffered frame expired (STA 00:18:41:ea:a1:ba)
May  4 22:55:53 loneliness kernel: [90236.420182] STA 00:18:41:ea:a1:ba aid 1: PS buffer (entries before 6)
May  4 22:55:54 loneliness kernel: [90237.419818] STA 00:18:41:ea:a1:ba aid 1: PS buffer (entries before 7)
May  4 22:55:55 loneliness kernel: [90238.419808] STA 00:18:41:ea:a1:ba aid 1: PS buffer (entries before 8)
May  4 22:56:02 loneliness kernel: [90245.000268] Buffered frame expired (STA 00:18:41:ea:a1:ba)
May  4 22:56:02 loneliness kernel: [90245.000292] Buffered frame expired (STA 00:18:41:ea:a1:ba)
May  4 22:56:02 loneliness kernel: [90245.000310] Buffered frame expired (STA 00:18:41:ea:a1:ba)
May  4 22:56:02 loneliness kernel: [90245.000327] Buffered frame expired (STA 00:18:41:ea:a1:ba)
May  4 22:56:02 loneliness kernel: [90245.000345] Buffered frame expired (STA 00:18:41:ea:a1:ba)
May  4 22:56:02 loneliness kernel: [90245.000363] Buffered frame expired (STA 00:18:41:ea:a1:ba)
May  4 22:56:12 loneliness kernel: [90255.003286] Buffered frame expired (STA 00:18:41:ea:a1:ba)
May  4 22:56:12 loneliness kernel: [90255.003309] Buffered frame expired (STA 00:18:41:ea:a1:ba)
May  4 22:56:12 loneliness kernel: [90255.003325] Buffered frame expired (STA 00:18:41:ea:a1:ba)

> Anyway, this is quite clearly an ath5k bug, the beacon always looks the same.
> The sequence number increases, but the TIM IE is always:
> 05 04 01 02 00 00
> 
> instead of, at least, alternating between
> 05 04 00 02 00 00
> 05 04 01 02 00 00
> 
> The AID bit is also never set, so the AP never tells the phone that there is
> data buffered for it.

Thanks for the insight.
Comment 8 Bob Copeland 2009-05-05 05:24:05 UTC
(In reply to comment #5)
> Anyway, this is quite clearly an ath5k bug, the beacon always looks the same.
> The sequence number increases, but the TIM IE is always:
> 05 04 01 02 00 00
> 
> instead of, at least, alternating between
> 05 04 00 02 00 00
> 05 04 01 02 00 00

I think ath5k_beacon_send() needs to call ieee80211_beacon_get()... otherwise we only have the beacon as it looked when bss_info_changed() was called.  I made a hacky patch to do that, and it looked like TIM would then update correctly but it was also prone to lockups so I'll try again tomorrow.
Comment 9 aCaB 2009-05-05 21:07:09 UTC
(In reply to comment #8)
> I think ath5k_beacon_send() needs to call ieee80211_beacon_get()... otherwise
> we only have the beacon as it looked when bss_info_changed() was called.  I
> made a hacky patch to do that, and it looked like TIM would then update
> correctly but it was also prone to lockups so I'll try again tomorrow.

Hi Bob,
Mind sharing the patch?
I've been trying to add IEEE80211_HW_HOST_BROADCAST_PS_BUFFERING and get_buffered_bc stuff last night without much success. Thought to fit them just after beacon_get inside ath5k_bss_info_changed but this is only called a couple of times at startup. So i'm prolly looking in the wrong direction.

On a marginally related note, there are a few comments in the code about AID s'port and lack thereof.
Since now mac80211 makes it easy to track AIDs it would be nice to have them set properly. Guess this addresses the second point in Johannes' post.

If time permits i'd like to look closer at this stuff.
Comment 10 Bob Copeland 2009-05-06 03:31:11 UTC
Created attachment 21241 [details]
Get a new beacon each time for AP mode
Comment 11 Bob Copeland 2009-05-06 03:35:04 UTC
(In reply to comment #9)
> Hi Bob,
> Mind sharing the patch?

Posted, it is guaranteed to eat your kittens...

> On a marginally related note, there are a few comments in the code about AID
> s'port and lack thereof.
> Since now mac80211 makes it easy to track AIDs it would be nice to have them
> set properly. Guess this addresses the second point in Johannes' post.

Ok - I believe the aid comments you are referring to are related to STA mode operation, which is doubtless broken but should only affect PS mode as a client.  But yeah all that should be fixed.
Comment 12 Bob Copeland 2009-05-18 15:01:55 UTC
Ok, I think I worked out the kinks and sent out a patch.  If you'd like "Reported-by:" (or Tested-by) attribution, please reply to the patch with your real name and email address.

I was able to associate with my G1 using this patch and get a DHCP IP.  I didn't have a proper network setup so didn't test further, but if you have other problems please post them here.
Comment 13 aCaB 2009-05-18 15:20:01 UTC
Hi Bob,
I've been using your previous patch with some additional code to get_buffered_beacon's as well for a couple of days with no problems and no lookups.
Everything worked very good in open mode. WPA instead is pretty broken, but i think for different reasons.

I though i'd wait a few more days before reporting to have a chance to perform more testing.

Thanks anyway. Don't care about the reported-by stuff.
Comment 14 Bob Copeland 2009-05-18 15:36:37 UTC
Ok, thanks for the report.  Yeah, this patch still doesn't send out the buffered frames (get_buffered_bc), so if you want to prepare a patch for that then feel free.  Otherwise I'll give it a go in a few days.

FWIW the issues I had originally were an unrelated panic elsewhere in the w-t tree, and a nested spinlock in ad-hoc mode (oops!).
Comment 15 aCaB 2009-05-18 15:55:19 UTC
(In reply to comment #14)
> Ok, thanks for the report.  Yeah, this patch still doesn't send out the
> buffered frames (get_buffered_bc), so if you want to prepare a patch for that
> then feel free.  Otherwise I'll give it a go in a few days.

I'd be happy to except i'm not totally sure how to actually send the buffered frames after i pick them up.
So the only effect at the moment is to get some debug lines ("PS buffer (entries before..." and "Buffered frame expired") in the logs.

In ath9k it's basically like this:
while (ieee80211_get_buffered_bc(...)) { ath_tx_cabq(...); }

What's the ath_tx_cabq() equivalent in the ath5k world?

> FWIW the issues I had originally were an unrelated panic elsewhere in the w-t
> tree, and a nested spinlock in ad-hoc mode (oops!).

Cool!


Ah just a precisation: The WPA issues are not specific to the g1. It's broken/unstable with any other client board too.
Comment 16 Bob Copeland 2009-06-07 18:49:26 UTC
(In reply to comment #15)
> I'd be happy to except i'm not totally sure how to actually send the buffered
> frames after i pick them up.
> So the only effect at the moment is to get some debug lines ("PS buffer
> (entries before..." and "Buffered frame expired") in the logs.
> 
> In ath9k it's basically like this:
> while (ieee80211_get_buffered_bc(...)) { ath_tx_cabq(...); }
> 
> What's the ath_tx_cabq() equivalent in the ath5k world?

Sorry for the slow reply.

Ath5k doesn't have the CAB queue yet AFAIK, so it would need to be added.
That stands for "Crap [or Content, depending on whom you ask :)] After
Beacon" and is simply another TX queue just for these frames.  I still
have a few things in my queue before I get back to this, maybe end of this
week, so let me know if you plan to make any headway on it. 

> Ah just a precisation: The WPA issues are not specific to the g1. It's
> broken/unstable with any other client board too.

Yeah, anything that uses power-save mode (anything that doesn't go to sleep should work ok).  I also cooked up some patches for AID on the client side
but I haven't put it through the wringer yet.
Comment 17 Bob Copeland 2009-06-15 03:55:01 UTC
Created attachment 21917 [details]
send buffered frames after beacons

This is completely untested since I ran out of time this wknd, but here's a go at using the CAB queue.
Comment 18 aCaB 2010-01-02 10:35:23 UTC
Hi Bob,
it's been a while.

I saw you've merged all of the above patches.
Good job, things are way better now!
Ath5k in AP mode is now pretty usable.

However, besides the powersave stuff that doesn't seem to be working yet (the g1 still doesn't get back from sleep but it's easy to solve the issue client side) there are a few minor glitches.
With hostapd in WPA mode the board tends to temporarly lock up after the 4th step in the eapol handshake.
Things generally goes like this: the ap spams beacons, the client queries a few times, the ap replies a few times, the handshake starts, the handshake completes (step 4/4). At that point, most of the times, the ap totally stops transmitting (nothing is sent, not even beacons). After about 10-20 seconds the AP resumes the transmissions but by that time the clients have generally already given up.
This happens with all the clients i've tried: an ipw3945 + wpa_supplicant, another ath5k + wpa_supplicant, the g1, a wii and an ipw3945 on a windoze laptop.
One time out of 5-10, instead, the AP doesn't "lock up" and the client can happily associate. So, in practice, it's only a matter of trying to associate multiple times to get things working.
Here I assume beaconing is done in hardware which means that, if beacons stop, the driver is to be blamed. If that's not the case, then just ignore everything as hostapd is likely the culprit.

Another minor problem is related to bridging. Wlan0 can't be bridged until hostapd is started. I'm not sure what's the magic behind that but if I try to "brctl addif" with wlan0 up in ap mode i get "Operation not supported". As soon as I start hostapd once, the iface becomes perfetly bridgeable and un-bridgeable. Even if hostapd is killed, the bridging keeps on working as expected.

Another thing... the wep keys can only be set while the interface is down. This means that hostapd will always fail to set them properly. The only way to have the AP in wep mode is to set the keys before starting hostapd. It'll then complain that the keys can't be set but wont' fail.

Finally... the kernel panics within 30-40 seconds after issuing a "modprobe -r ath5k". Sorry no backtrace.


If you want more details, pcaps etc, just ask. Also if you prefer me to open different bugs for the above issues, I'll do so.
Thanks a lot for the amazing work you've done!
Comment 19 Bob Copeland 2010-01-09 04:17:14 UTC
> However, besides the powersave stuff that doesn't seem to be working yet (the
> g1 still doesn't get back from sleep but it's easy to solve the issue client
> side) there are a few minor glitches.

> With hostapd in WPA mode the board tends to temporarly lock up after the 4th
> step in the eapol handshake.
> Things generally goes like this: the ap spams beacons, the client queries a
> few

Anything in dmesg?  The hardware sends beacons but the driver is responsible for loading the beacon into the proper queue.  The hardware generates an "SWBA" interrupt at some point in time before the beacon interval expires, then we DMA the beacon into card memory, then the beacon interval expires and it either sends the beacon, or asserts a 'beacon not ready' interrupt.  You can see if the card is generating the interrupts by "modprobe ath5k debug=0x2" -- which will generate a ton of output -- and looking at the status value that is printed. 

> Another minor problem is related to bridging. Wlan0 can't be bridged until
> hostapd is started. I'm not sure what's the magic behind that but if I try to
> "brctl addif" with wlan0 up in ap mode i get "Operation not supported". As
> soon
> as I start hostapd once, the iface becomes perfetly bridgeable and
> un-bridgeable. Even if hostapd is killed, the bridging keeps on working as
> expected.
> 
> Another thing... the wep keys can only be set while the interface is down.
> This
> means that hostapd will always fail to set them properly. The only way to
> have
> the AP in wep mode is to set the keys before starting hostapd. It'll then
> complain that the keys can't be set but wont' fail.

Hrm.. What version of hostapd are you using?  I have to admit I haven't used WEP or bridged APs, but I do know setting WPA key material works from hostapd config.  The above two are general infrastructure type things, I would expect them to be issues with mac80211/cfg80211, though you can always try nohwcrypt=1 just to rule out the crypto code in ath5k.

> Finally... the kernel panics within 30-40 seconds after issuing a "modprobe
> -r
> ath5k". Sorry no backtrace.

What about just "modprobe ath5k" (no -r).  Either way, a backtrace would help a lot here of course.  It's probably an interrupt or tasklet that didn't get cancelled properly.

> If you want more details, pcaps etc, just ask. Also if you prefer me to open
> different bugs for the above issues, I'll do so.

For now it's ok just putting it here but once we get the specifics ironed out then it may be worth opening different bugs.

> Thanks a lot for the amazing work you've done!

Heh thanks, just wish I could focus more time on the driver but we'll get there :)
Comment 20 Bob Copeland 2010-01-25 21:28:37 UTC
> What about just "modprobe ath5k" (no -r).  Either way, a backtrace would help
> a

I meant "rmmod ath5k" here.
Comment 21 Simon Farnsworth 2010-02-06 00:44:04 UTC
I'm also affected by this - but using a HTC Hero, and an ath5k AP. I'm seeing the ARP replies being dropped by the AP, and everything getting horribly confused after the AP acknowledges a power save poll, but then doesn't send queued traffic.

It appears that the Android expects the AP to react to the acknowledged PS poll, but the AP expects the Android to redo the PS poll when it sees the next beacon, and everything comes to a screaming halt.

I'm using current wireless-testing, and have the pcap to show the fault condition (captured by an iwl3945 in the middle, so not from either end).
Comment 22 Simon Farnsworth 2010-02-06 11:25:37 UTC
Created attachment 24926 [details]
A packet capture showing ath5k AP acknowledging a PS poll, and not then responding

This packet capture was taken from a Linux laptop running 2.6.31, and an iwl3945 WiFi card in monitor mode (MAC address 00:1f:3c:39:ef:c6). The iwl3945 is in monitor mode, with the "fcsfail", "control", and "otherbss" monitor flags set.

The Hero was approximately 20 centimetres from the capture station. The AP was approximately 1 metre from the capture station, with a brick wall between the AP and both the capture station and the Hero.

There were two devices in my WiFi network; a HTC Hero mobile phone, MAC address 00:23:76:58:92:7a, which had AID 1 and an ath5k-based AP, MAC address 00:0c:42:2c:46:9d.

I can supply the WEP key needed to decrypt this capture, if required.

The failure point is frames 3338 onwards. At 3338, you see a beacon with no TIM bits set. 3339 is the Hero sending a DNS request, and indicating that it intends to go to sleep. 3341 is the DTIM, and there are no bits set. At 3343, the beacon includes a bitmap indicating traffic for AID 1 (the Hero). At 3344, you see the Hero's PS poll, which is acknowledged in 3345.

There is no response from the AP other than the acknowledgement. The next traffic you see beyond beacons and ProbeRequests is in 3426, where the Hero ARPs for my DNS server. This is acknowledged. 

You then see no traffic for the Hero until 3443, by which point the Hero has stopped listening again, as it hasn't sent a PS poll since going back to sleep. All the queued traffic dumps out, but it's now too late, and the Hero is confused.

At this point, the only way to recover is to disable WiFi on the Hero, and reassociate; after enough of these powersave incidents, ath5k starts giving me "ath5k phy0: no further txbuf available, dropping packet" messages in dmesg. Once I have started to get these messages, beaconing stops, and I have to stop hostapd, remove the ath5k module, and reload it to get my AP back.

For reference, the AP's ath5k cards (there are two, one used for 2.4GHz, one for 5GHz) are both 5414 MAC, 5413 PHY according to ath_info, PCI ID 168c:0013, subsystem ID 19b6:0016.
Comment 23 Bob Copeland 2010-02-09 16:30:08 UTC
I believe you said 'yes' on irc, but just to confirm, the AP does have my recent w-t patch "ath5k: fix setup for CAB queue", right?
Comment 24 Simon Farnsworth 2010-02-09 20:02:49 UTC
I believe it does; I'm on wireless-testing, as of 064c35b4dbafb66ec416ec57f91c25971ea51ebc.

git log --oneline shows the following as the most recent three commits to ath5k
47db3a6 ath5k: adding LED support for AR5BXB63 cards
a951ae2 ath5k: fix setup for CAB queue
5d6ce62 ath5k: dont use external sleep clock in AP mode

I'm obviously happy to experiment for you (either trying patches, or setting some .config options and breaking it as per the pcap dump again).
Comment 25 Bob Copeland 2010-02-12 18:20:56 UTC
Ok, well it seems things in the cab queue aren't getting sent on your HW, let alone the pspoll responses.  Can you try adding the following printks (let me know if you want me to make a patch, I just figured typing this would save some time):

One in ath5k_beacon_send:
skb = ieee80211_get_buffered_bc
while(skb) {
    printk(KERN_DEBUG "ath5k: cab queued %p\n", skb);
    //...
}

One in ath5k_tx_processq:
pci_unmap_single(sc->pdev, ...)
printk(KERN_DEBUG "ath5k cab finished %p\n", skb);

One in ath5k_tx_queue:
if (info->flags & IEEE80211_TX_CTL_PSPOLL_RESPONSE)
    printk(KERN_DEBUG "ath5k pspoll response %p\n", skb);


Also please turn on CONFIG_MAC80211_PS_VERBOSE_DEBUG (I think) along with CONFIG_PRINTK_TIME, and run "iw event -t" in parallel while doing another capture.  Then we should be able to correlate the queuing in mac80211 with ath5k and the missing packets.
Comment 26 Simon Farnsworth 2010-02-12 19:58:15 UTC
I can't find "info" in ath5k_tx_queue - my version of the function is the same as at http://git.kernel.org/?p=linux/kernel/git/linville/wireless-testing.git;a=blob;f=drivers/net/wireless/ath/ath5k/base.c;h=2468c64d6c128c10bd4af66258f1ae63cc404342;hb=HEAD#l2676

The changes I've managed to make result in the following diff:
$ git diff
diff --git a/drivers/net/wireless/ath/ath5k/base.c b/drivers/net/wireless/ath/ath5k/base.c
index 535a6af..2726729 100644
--- a/drivers/net/wireless/ath/ath5k/base.c
+++ b/drivers/net/wireless/ath/ath5k/base.c
@@ -1983,6 +1983,7 @@ ath5k_tx_processq(struct ath5k_softc *sc, struct ath5k_txq *txq)

                pci_unmap_single(sc->pdev, bf->skbaddr, skb->len,
                                PCI_DMA_TODEVICE);
+                printk(KERN_DEBUG "ath5k cab finished %p\n", skb);

                ieee80211_tx_info_clear_status(info);
                for (i = 0; i < 4; i++) {
@@ -2185,6 +2186,7 @@ ath5k_beacon_send(struct ath5k_softc *sc)

        skb = ieee80211_get_buffered_bc(sc->hw, sc->vif);
        while (skb) {
+                printk(KERN_DEBUG "ath5k: cab queued %p\n", skb);
                ath5k_tx_queue(sc->hw, skb, sc->cabq);
                skb = ieee80211_get_buffered_bc(sc->hw, sc->vif);
        }


I've turned on CONFIG_MAC80211_VERBOSE_PS_DEBUG and CONFIG_PRINTK_TIME, ready for a build to test when I've worked out how to print the PSPOLL flag in ath5k_tx_queue
Comment 27 Simon Farnsworth 2010-02-13 11:23:30 UTC
Think I've worked it out - the resulting diff is:
$ git diff
diff --git a/drivers/net/wireless/ath/ath5k/base.c b/drivers/net/wireless/ath/ath5k/base.c
index 535a6af..e0d1aac 100644
--- a/drivers/net/wireless/ath/ath5k/base.c
+++ b/drivers/net/wireless/ath/ath5k/base.c
@@ -1983,6 +1983,7 @@ ath5k_tx_processq(struct ath5k_softc *sc, struct ath5k_txq *txq)

                pci_unmap_single(sc->pdev, bf->skbaddr, skb->len,
                                PCI_DMA_TODEVICE);
+                printk(KERN_DEBUG "ath5k cab finished %p\n", skb);

                ieee80211_tx_info_clear_status(info);
                for (i = 0; i < 4; i++) {
@@ -2185,6 +2186,7 @@ ath5k_beacon_send(struct ath5k_softc *sc)

        skb = ieee80211_get_buffered_bc(sc->hw, sc->vif);
        while (skb) {
+                printk(KERN_DEBUG "ath5k: cab queued %p\n", skb);
                ath5k_tx_queue(sc->hw, skb, sc->cabq);
                skb = ieee80211_get_buffered_bc(sc->hw, sc->vif);
        }
@@ -2664,6 +2666,10 @@ static int ath5k_tx_queue(struct ieee80211_hw *hw, struct sk_buff *skb,
        unsigned long flags;
        int hdrlen;
        int padsize;
+        struct  ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
+
+        if (info->flags & IEEE80211_TX_CTL_PSPOLL_RESPONSE)
+                printk(KERN_DEBUG "ath5k pspoll response %p\n", skb);

        ath5k_debug_dump_skb(sc, skb, "TX  ", 1);

I'll try this later, and post back.
Comment 28 Bob Copeland 2010-02-13 12:38:51 UTC
Oops, I meant info in ath5k_txbuf_setup, but your patch is equivalent, so do post your results.  Sorry for wasting your time, next time I'll just make a patch.
Comment 29 Simon Farnsworth 2010-02-13 16:31:43 UTC
Ugh. With this extra debug in place, I'm unable to reproduce the problem. I'm going to back it out, and see if I can tie it to any particular bit of debug.
Comment 30 Simon Farnsworth 2010-02-13 16:55:41 UTC
You're going to "love" the results. I cannot get it to fail *if* I set CONFIG_MAC80211_PS_VERBOSE_DEBUG. If I disable that option, I can reproduce failure, with or without the printks patch to ath5k.

This looks likely to be a mac80211 bug, as a result.
Comment 31 Simon Farnsworth 2010-02-13 17:39:04 UTC
Another anomaly - possibly harmless. I'm seeing things in the cab finished messages that don't correspond to a pspoll or cab queued message. In particular, I see 202 unique skb addresses that have only appeared in an "ath5k cab finished" line, and not in a cab queued or pspoll response line.
Comment 32 Bob Copeland 2010-02-13 18:01:46 UTC
(In reply to comment #30)
> You're going to "love" the results. I cannot get it to fail *if* I set
> CONFIG_MAC80211_PS_VERBOSE_DEBUG. If I disable that option, I can reproduce
> failure, with or without the printks patch to ath5k.
> 
> This looks likely to be a mac80211 bug, as a result.

Hmm, well it could be a race condition.  The printks might slow it down enough that the pspoll delivery doesn't interfere with other packets or something like that.  Just a thought -- I haven't looked at the mac80211 code in detail yet.
Comment 33 John W. Linville 2010-03-02 16:30:03 UTC
Simon, is this still an issue w/ 2.6.33?  Bob, any more word on this?
Comment 34 Simon Farnsworth 2010-03-02 21:27:40 UTC
I've updated to wireless-testing as of 77b0c7b18e6472b8b1323ab1844ed60e073cd49c, and the problem has gone away - I've been able to take out the debugging patch, and disable CONFIG_MAC80211_PS_VERBOSE_DEBUG without pain.

No idea what fixed it, mind, or whether it'll reappear in future, but I can always file a new bug if it breaks again.
Comment 35 John W. Linville 2010-03-02 22:02:14 UTC
Cool, thanks!
Comment 36 aCaB 2010-04-15 21:37:55 UTC
Hi,
sorry if it took so long to test. I just wanted to confirm everything works since the latest set of patches from Bob Copeland.

Thanks a lot, good job!

Note You need to log in before you can comment on or make changes to this bug.