Bug 190561 - Iwlwifi: 8260: frequent authentication timeouts & hangs with Mediatek MT7612E AP - WIFILNX-569
Status: CLOSED UNREPRODUCIBLE
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: DO NOT USE - assign "network-wireless-intel" component instead
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-12-17 11:41 UTC by Stijn Segers
Modified: 2017-02-05 10:41 UTC

See Also:
Kernel Version: 4.7 / 4.8 / 4.9 RC7
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Kernel log when connection hangs (13.47 KB, application/octet-stream)
2016-12-17 11:41 UTC, Stijn Segers
Re-authentication attempts after connection hung (24.92 KB, application/octet-stream)
2016-12-17 11:42 UTC, Stijn Segers
Time event is over already connection failure (4.76 KB, application/octet-stream)
2017-01-04 20:58 UTC, Stijn Segers
Wireshark log (285.04 KB, application/x-pcapng)
2017-01-07 23:33 UTC, Stijn Segers
Wireshark log with channel set (89.20 KB, application/pgp-encrypted)
2017-01-08 11:39 UTC, Stijn Segers
Problematic authentication capture (39.49 KB, application/pgp-encrypted)
2017-01-21 16:08 UTC, Stijn Segers

Description Stijn Segers 2016-12-17 11:41:08 UTC
Created attachment 247911
Kernel log when connection hangs

My Intel 8260AC card frequently fails to connect to my DIR-860L AP (SSID: Zeus 802.11ac). When it does establish a connection, it often hangs - sometimes it takes half an hour, sometimes no more than a few minutes. System is a Dell XPS 13 9350, running Debian Stretch x86_64.

The kernel logs are attached. Frequently recurring messages are 'No association and the time event is over already...' and 'aborting authentication with aa:bb:cc:xx:xx:xx by local choice (Reason: 3=DEAUTH_LEAVING)'.
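
To see how often each of these messages recurs, a saved kernel log can be tallied with a few lines of Python. This is only a sketch: it matches the two message fragments quoted above and assumes nothing about the rest of each log line. The function name and the dictionary keys are made up for illustration.

```python
import re

# Message fragments quoted in the report. The log prefix (timestamp,
# interface or device name) is deliberately not part of the patterns.
PATTERNS = {
    "time_event_over": re.compile(
        r"No association and the time event is over already"
    ),
    "auth_aborted": re.compile(
        r"aborting authentication with \S+ by local choice "
        r"\(Reason: 3=DEAUTH_LEAVING\)"
    ),
}


def count_failures(lines):
    """Count how many log lines contain each known failure message."""
    counts = {name: 0 for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

Passing an open file works because the function only iterates over lines, for example `count_failures(open("kern.log"))`, where the file name is hypothetical.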

I tried disabling power saving, falling back to 802.11n, switching the wireless off and on again from the settings panel in the UI, and restarting Network-Manager, but none of that changes anything. I scoured the web for newer firmware to test, but I cannot find anything newer than iwlwifi-8265-22.ucode, despite the 4.9 iwlwifi driver looking for .25 and .26, and 4.8 looking for .23 and .24.

Tested with Debian's stock kernels - 4.7, 4.8 - and their 4.9 RC7 from Experimental. All exhibit the same behaviour.

I have a fallback Atheros-based AP (WNDR3700, SSID: Zeus 802.11n). I have tested the following situations:

* DIR-860L + Google Nexus 7 2013: works fine;
* DIR-860L + Sony Xperia Z3 Compact: works fine;
* DIR-860L + Dell XPS 13 with Intel 8260AC: breaks as described above.

* WNDR3700 + Google Nexus 7 2013: works fine;
* WNDR3700 + Sony Xperia Z3 Compact: works fine;
* WNDR3700 + Dell XPS 13 with Intel 8260AC: works fine.
Comment 1 Stijn Segers 2016-12-17 11:42:01 UTC
Created attachment 247921
Re-authentication attempts after connection hung
Comment 2 Stijn Segers 2016-12-17 11:58:17 UTC
Edit: the log containing 'No association and the time event is over already...' didn't contain any references to the AC AP so I have removed it. Will upload a new one when the issue pops up again.
Comment 3 Stijn Segers 2017-01-04 20:58:43 UTC
Created attachment 250291
Time event is over already connection failure

Can someone take a look at this please?

I will attach a log with the 'time event is over already' message all over the place; it just took me 10 minutes and a reboot to authenticate to the AP.

Thanks!
Comment 4 Emmanuel Grumbach 2017-01-07 21:01:39 UTC
Could you please use your device as a sniffer and get an air sniffer capture of your channel?

This kind of problem can arise when the beacon of the AP is dropped by our firmware because the beacon is corrupted. Other devices (your Sony and Nexus) can be less sensitive than ours.

This is just a theory based on other bugs we already had in the past, but recording the beacon of your AP can teach us a lot here.
If that doesn't help, we'll need to go for firmware debugging.
Comment 5 Stijn Segers 2017-01-07 23:33:02 UTC
Created attachment 250741
Wireshark log

Thanks Emmanuel. I have put my card in monitor mode; the Wireshark capture is attached.

The AP's MAC is E4:6F:13:2E:BF:2A. I don't see it anywhere in the output, but I'm not familiar with Wireshark so...

Let me know if I need to check or filter for anything.
Comment 6 Emmanuel Grumbach 2017-01-08 03:04:48 UTC
Are you sure you are on the same channel as the AP?

How did you put the device in monitor mode? You first need to stop the supplicant, otherwise it will only be a managed interface with a sniffer virtual monitor interface attached to it.
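
The order of operations described here (stop the supplicant, switch the interface to monitor type, then tune to the AP's channel) can be sketched as a command list. Everything specific in it is an assumption for illustration: the interface name `wlan0`, the systemd unit names `NetworkManager` and `wpa_supplicant`, and whatever channel is passed in. By default the helper only prints the commands; running them needs root.

```python
import shlex
import subprocess


def monitor_mode_commands(iface, channel,
                          services=("NetworkManager", "wpa_supplicant")):
    """Build the commands that turn `iface` into a pure monitor interface.

    The service names are assumptions for a systemd-based distribution.
    """
    return [
        # Stop whatever manages the interface, so it does not stay managed.
        ["systemctl", "stop", *services],
        ["ip", "link", "set", iface, "down"],
        ["iw", "dev", iface, "set", "type", "monitor"],
        ["ip", "link", "set", iface, "up"],
        # Sniff on the same channel the AP is using.
        ["iw", "dev", iface, "set", "channel", str(channel)],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```

Calling `run_all(monitor_mode_commands("wlan0", 36))` prints the five commands without touching the system; the channel value 36 is just a placeholder.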
Comment 7 Emmanuel Grumbach 2017-01-08 06:59:00 UTC
Ok - I looked at your sniffer capture and it looks like your device was properly configured as a sniffer, but I am not sure about the channel.
In your capture, I can't really see ANY beacons from your Access Point, which suggests that you are sniffing on the wrong channel.
Please attach here the output of:
sudo iw wlan0 scan
and tell us what your SSID is.
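
The scan output is also the quickest way to find the channel to sniff on. As a rough sketch, the snippet below pulls the BSSID and frequency for a given SSID out of saved `iw` scan text and converts the frequency to a channel number. It assumes the usual layout of that output (a `BSS <mac>` line followed by indented `freq:` and `SSID:` fields); the function names are hypothetical.

```python
import re


def freq_to_channel(freq_mhz):
    """Convert a 2.4 GHz or 5 GHz centre frequency in MHz to a channel."""
    if freq_mhz == 2484:
        return 14
    if 2412 <= freq_mhz <= 2472:
        return (freq_mhz - 2407) // 5
    if 5000 < freq_mhz < 5900:
        return (freq_mhz - 5000) // 5
    raise ValueError(f"unsupported frequency: {freq_mhz}")


def parse_scan(scan_output):
    """Split scan text into one dict per BSS with bssid, freq and SSID."""
    entries = []
    for line in scan_output.splitlines():
        match = re.match(r"BSS ([0-9a-fA-F:]{17})", line)
        if match:
            entries.append({"bssid": match.group(1).lower()})
            continue
        if not entries:
            continue
        key, sep, value = line.strip().partition(":")
        if sep and key in ("freq", "SSID"):
            # Keep the first occurrence of each field within a BSS entry.
            entries[-1].setdefault(key, value.strip())
    return entries


def channels_for_ssid(scan_output, ssid):
    """Return (bssid, channel) pairs for every BSS advertising `ssid`."""
    return [
        (entry["bssid"], freq_to_channel(int(float(entry["freq"]))))
        for entry in parse_scan(scan_output)
        if entry.get("SSID") == ssid and "freq" in entry
    ]
```

With the scan saved to a file, `channels_for_ssid(text, "Zeus 802.11ac")` would list the channel for each matching BSSID.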

If you want, you can encrypt the data using the keys described here:
https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging#privacy_aspects
Comment 8 Stijn Segers 2017-01-08 11:39:51 UTC
Created attachment 250751
Wireshark log with channel set

Thank you, I did forget to set the channel indeed.

Now with the right channel set I see a lot of traffic to/from the D-Link. I have attached the encrypted wireshark log. Thanks for the GPG link, I figured obfuscating the MACs wouldn't help debugging, didn't know we could upload encrypted files :-).
Comment 9 Emmanuel Grumbach 2017-01-08 12:18:51 UTC
Ok - looked at this. The beacon looks sane. Can you record a trace of a failure?

Thanks.
Comment 10 Stijn Segers 2017-01-10 20:12:52 UTC
I should have time to test the coming days, I'll report back.
Comment 11 Luca Coelho 2017-01-19 15:57:40 UTC
Stijn, did you get a chance to get the traces Emmanuel asked for?
Comment 12 Stijn Segers 2017-01-19 21:22:44 UTC
Hi Luca,

I did. Embarrassingly enough, while I've had this issue for months, I cannot reproduce it anymore at the moment. Router has been powercycled a few times now, but authentication still works like it should. I'm keeping an eye on it, but so far it seems to work fine...
Comment 13 Emmanuel Grumbach 2017-01-20 07:13:40 UTC
OK.

Closing for now.
Comment 14 Stijn Segers 2017-01-21 16:08:10 UTC
Created attachment 252701
Problematic authentication capture

So, as luck would have it... I flashed & rebooted my AP today and have problems authenticating again. Capture in attachment.
Comment 15 Emmanuel Grumbach 2017-01-22 08:24:12 UTC
I looked at the sniffer capture. You seem to have captured the Sony device you mentioned, which does indeed connect to the AP despite a few hiccups.

1) the AP seems to be sending ACKs instead of BACKs during a BA session which your Sony device doesn't really like in the first try (it sends a BAR in packet 782). But the AP doesn't get the hint and continues to misbehave in packet 787 which is a regular ACK and not a BACK. This time though, the Sony device doesn't get angry at the AP and doesn't send the BAR.
2) the AP sends an SMPS action frame (packet 788). While this is totally fine from a technical point of view, this frame is surprising and I have never seen it coming from an AP. Of course not to say that SMPS is disabled whereas it has been disabled in the association response frame.

Besides this, your AP seems to be sending beacons very nicely.

Unfortunately, I can't deduce much from this since this is the working case. Can you record tracing from the non-working case?

Thanks.
Comment 16 Stijn Segers 2017-01-22 09:32:42 UTC
OK. It seems I cannot connect to an AP while in monitor mode, so I'd have to find a third device that I can put in monitor mode to sniff the channel while my laptop connects to the AP, right?
Comment 17 Emmanuel Grumbach 2017-01-22 09:37:02 UTC
Right.

But before you do that, please add power_scheme=1 as a module parameter to iwlmvm and run trace-cmd to record tracing while you associate to the AP on the system that has the Intel device. This can happily be done while you use your device in managed mode.
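
As a sketch of what that setup could look like, the helper below produces a modprobe options line for `power_scheme=1` and a `trace-cmd record` command line. The output file name and the set of event systems (`iwlwifi`, `iwlwifi_msg`, `mac80211`, `cfg80211`) are assumptions on my part, so check them against the iwlwifi debugging page before relying on them. The options line only takes effect after the module is reloaded or the machine is rebooted.

```python
def tracing_setup(output="trace.dat",
                  events=("iwlwifi", "iwlwifi_msg", "mac80211", "cfg80211")):
    """Return (modprobe options line, trace-cmd argv) for a debug session.

    `output` and `events` are illustrative defaults, not values given
    in this thread.
    """
    # Goes into a file under /etc/modprobe.d/ (file name is up to you).
    options_line = "options iwlmvm power_scheme=1\n"

    argv = ["trace-cmd", "record", "-o", output]
    for event in events:
        argv += ["-e", event]
    return options_line, argv
```

The returned argv would be run as root while associating to the AP in managed mode, then stopped with Ctrl-C once the failure has been captured.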
Comment 18 Stijn Segers 2017-02-05 10:00:56 UTC
Hi guys, I'll close this for now since the issue doesn't seem to pop up anymore. Thanks for your help!
Comment 19 Emmanuel Grumbach 2017-02-05 10:41:21 UTC
Ok - see you again when you reboot the AP :)
