Bug 203545 - ath9k : need to reboot AP every day to prevent "4WAY_HANDSHAKE_TIMEOUT". No such issue with ath5k
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: drivers_network-wireless@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-05-08 15:08 UTC by Gilles Buloz
Modified: 2019-05-11 20:53 UTC
0 users

See Also:
Kernel Version: 5.0.9-301.fc30.x86_64
Subsystem:
Regression: No
Bisected commit-id:


Attachments
wpa_supplicant debug outputs (54.46 KB, application/x-compressed-tar)
2019-05-08 15:08 UTC, Gilles Buloz
Some debug added to mac80211 module (17.08 KB, application/x-compressed-tar)
2019-05-11 20:45 UTC, Gilles Buloz

Description Gilles Buloz 2019-05-08 15:08:11 UTC
Created attachment 282675
wpa_supplicant debug outputs

The issue is seen on a laptop with an AR9285 managed by the ath9k driver.
There is no such issue on another laptop with an AR5BXB63 managed by the ath5k driver, nor on other laptops/tablets using other chips/drivers.
The access point is a Freebox HD (ADSL box) from the French ADSL provider "Free", using WPAv1, CCMP/PSK. Unfortunately, no debug output can be retrieved on the AP side.

For 2 weeks I tried several workarounds suggested for this "4WAY_HANDSHAKE_TIMEOUT" (displayed in dmesg), but none worked for me, and the root cause seems to be different.

Running wpa_supplicant manually with maximum debug shows that when the problem occurs, once associated to the AP, I never get "l2_packet_receive: src=xx:xx:xx:xx:xx:xx" in the debug output.
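
To make that check repeatable on a saved log, here is a minimal sketch that counts "l2_packet_receive: src=" lines appearing after the most recent association. The function name and the default "Associated with" marker are assumptions for illustration only; adjust the marker to whatever your wpa_supplicant build actually prints.

```python
import re

# Text taken from the wpa_supplicant debug output quoted above.
L2_RX = re.compile(r"l2_packet_receive: src=")


def l2_frames_after_association(lines, assoc_marker="Associated with"):
    """Count l2_packet_receive lines seen after the last association marker.

    Returns None if no association marker was found at all.
    assoc_marker is an assumed string, not verified against every version.
    """
    count = None
    for line in lines:
        if assoc_marker in line:
            count = 0  # restart counting at each new association
        elif count is not None and L2_RX.search(line):
            count += 1
    return count


if __name__ == "__main__":
    import sys

    # Usage: python check_l2.py wpa_supplicant_debug.log
    with open(sys.argv[1], errors="replace") as fh:
        print(l2_frames_after_association(fh))
```

A result of 0 would correspond to the failing case described here (associated, but nothing delivered to wpa_supplicant), while a positive count would correspond to the working case.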

Issue present with kernel 5.0.9 (Fedora 30), but also with Fedora 29, ... at least down to kernel 4.8 (Fedora 23). Tested by booting from a Live image to have a clean configuration.

This could be an AP issue (a bug?), but as there is no problem with ath5k or other devices, I would like to debug this to see what is going wrong and try to make ath9k work at least as well as ath5k does with this AP. It looks like ath9k does something (wrong?) that my AP does not like...
When the problem occurs, I just have to reboot my AP (ADSL box) to get ath9k successfully connected ... until the next day, when the problem is back.

I've also tried modifying the ath9k driver to remove TDLS and P2P support, to get wpa_supplicant output as close as possible to that of ath5k, but this did not help. I also enabled all debug messages in the ath9k driver, but I'm a little bit lost and don't know what to investigate or where to look.

Is there a known ath9k issue that might be related?
Or any advice to help me debug this?

Please find attached the wpa_supplicant outputs before and after the AP reboot, for both ath9k and ath5k.

Thanks very much
Comment 1 Gilles Buloz 2019-05-11 20:45:35 UTC
Created attachment 282725
Some debug added to mac80211 module
Comment 2 Gilles Buloz 2019-05-11 20:53:27 UTC
To debug this issue with my laptop using ath9k (AR9285), I've added some debug output to the mac80211 module of kernel 5.0.9-301.fc30.x86_64.

In both cases (connection OK or failed), I see that once associated to the AP (wlp2s0: associated), ieee80211_rx_napi() is called every ~100 ms (something is being received, I guess). But in the failed case, ieee80211_tx() is never called. See the attached mac80211dbg.tgz.
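
For comparing the OK and failed runs, here is a minimal sketch that tallies how often each function name shows up in a kernel log after the "associated" line. It assumes each added debug line contains the name of the function it was printed from, which may not match the exact format in the attachment; the helper name and the ": associated" marker are likewise illustrative.

```python
import re


def count_calls_after_assoc(lines, assoc_marker=": associated",
                            names=("ieee80211_rx_napi", "ieee80211_tx")):
    """Count log lines mentioning each function name after association.

    Assumes (hypothetically) that the debug printk lines include the
    function name. Word boundaries keep ieee80211_tx from also matching
    longer names such as ieee80211_tx_status.
    """
    patterns = {n: re.compile(r"\b" + re.escape(n) + r"\b") for n in names}
    counts = {n: 0 for n in names}
    seen_assoc = False
    for line in lines:
        if not seen_assoc:
            # Ignore everything up to and including the association line.
            seen_assoc = assoc_marker in line
            continue
        for name, pattern in patterns.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

In the failed case described above, the ieee80211_tx count would be expected to stay at 0 while the ieee80211_rx_napi count keeps growing.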

So I'm wondering whether, once associated, ath9k has to send something to the AP first, or is expected to receive something before sending anything.
If it is supposed to send something first, that is not happening.

Any advice to help me continue debugging this?

Please note that my other laptop using ath5k with the same kernel version always connects to this AP without problems.
The same is true of another laptop with a BCM4318 and the b43 driver on kernel 4.8.15-300.fc25.i686+PAE (also using mac80211/cfg80211),
and of two other devices I have.
So the AP works quite well, ... except with my ath9k device :-(
