Bug 120471

Summary: iwlwifi: 7265: excessive roaming causes disconnections
Product: Drivers Reporter: Ruben De Smet (ruben.de.smet)
Component: network-wireless Assignee: DO NOT USE - assign "network-wireless-intel" component instead (linuxwifi)
Status: CLOSED INVALID    
Severity: normal CC: linuxwifi, luca, ruben.de.smet
Priority: P1    
Hardware: Intel   
OS: Linux   
Kernel Version: 4.6.2-1-ARCH Subsystem:
Regression: No Bisected commit-id:
Attachments: dmesg after I had the issue once.
`trace-cmd record -e iwlwifi` output

Description Ruben De Smet 2016-06-16 15:12:31 UTC
Created attachment 220291 [details]
dmesg after I had the issue once.

Hello,

My specs:
=========

Wireless Access Point:
----------------------
TP-LINK AC1200 rev1.2

PC1 (my laptop)
---

- Lenovo Thinkpad X250
- 03:00.0 Network controller: Intel Corporation Wireless 7265 (rev 59)
- Arch Linux (4.6.2-1)
- Bug also confirmed under 4.5

PC2 (brother's laptop)
---
Acer subnotebook
Intel 7250 (iirc)
Ubuntu 14.04 stock kernel

PC3 (htpc)
---
Zotac HTPC (BI323)
01:00.0 Network controller: Intel Corporation Wireless 3160 (rev 83)
Fedora 23 64 bit with boot-to-Kodi


PC4 (mother's computer)
----------------
Lenovo Thinkpad T450
Intel 7265
Ubuntu 16.04


They all have in common:
========================
- Linux, Intel wifi card and TP-LINK Archer AC router AC1200 rev 1.2.


The problem(s?):
================

When near the TP-LINK access point, NetworkManager asks for the password to be
confirmed again. On the htpc, this means logging in on the console, running
`nmtui`, and activating the connection.

On the computers with a DE, this means a dialog box opens asking for a
password, which is already filled in. Pressing enter connects to the network.

My laptop, like my brother's, tends to lose the connection now and then.
When I checked `dmesg`, it said it was "lowering the transmission power
as requested by the access point by -xdB" (not those exact words, but I can
retrigger the exact error), before losing the connection some seconds later.

Other tested (working) configurations:
- TP-LINK Archer C2 (which is also with AC)
- Good ol' Linksys WRT
- eduroam at my uni
- any other wifi point I've ever had access to.
- Tested another PC with a lot older (Intel) WiFi card (with AC): same issue.

Not tested:
- Windows (don't have any Windows pc with those cards near me atm)

I see two options here:
- TP-LINK bug
- Intel iwlwifi bug

Please advise on what I should test, which logs to send, ...

I'm currently running `sudo trace-cmd record -e iwlwifi`. When it reoccurs, I'll Ctrl-C it and upload it as well.

If you want me to test on current mainline, let me know.
Comment 1 Ruben De Smet 2016-06-16 16:33:18 UTC
Created attachment 220351 [details]
`trace-cmd record -e iwlwifi` output

When coming out of standby, the issue came up again.

This _could_ be a separate issue, but I really doubt it.

This time, it asked me to confirm the WPA2 password twice, before connecting.
Comment 2 Luca Coelho 2016-06-17 08:17:51 UTC
This seems like an issue with the AP's power save behavior.  Can you try to load the iwlmvm module with power_scheme=1? That disables power saving in the driver and may solve the issue here.
Comment 3 Ruben De Smet 2016-06-17 08:49:18 UTC
Hi Luca,

(In reply to Luca Coelho from comment #2)
> This seems like an issue with the AP's power save behavior.  Can you try to
> load the iwlmvm module with power_scheme=1? That disables power saving in
> the driver and may solve the issue here.

I'm putting this setting by default on the T450 Ubuntu machine, I'll let it be for a few days (or until it reappears) and report back.
Comment 4 Ruben De Smet 2016-06-17 09:16:14 UTC
(In reply to Ruben De Smet from comment #3)
> Hi Luca,
> 
> (In reply to Luca Coelho from comment #2)
> > This seems like an issue with the AP's power save behavior.  Can you try to
> > load the iwlmvm module with power_scheme=1? That disables power saving in
> > the driver and may solve the issue here.
> 
> I'm putting this setting by default on the T450 Ubuntu machine, I'll let it
> be for a few days (or until it reappears) and report back.

Okay, that went quick: it doesn't help. I tested it on the T450 and on the Zotac. I used

    rmmod iwlmvm
    modprobe iwlmvm power_scheme=1

and I added `options iwlmvm power_scheme=1` to a file in /etc/modprobe.d/

Rebooting the Zotac didn't give any network, and the T450 lost its connection after using the rmmod-modprobe combo.
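
For reference, a minimal sketch of making the option persistent and checking whether the loaded module picked it up. It assumes the parameter is spelled `power_scheme`, exactly as in the `modprobe` command, and that the driver exposes it read-only under `/sys/module/iwlmvm/parameters/`. The file name `iwlmvm.conf` and both helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; the names and the conf file path are illustrative only.

# Write a modprobe.d options line for iwlmvm to the given file.
write_power_scheme_conf() {
    conf_file="$1"   # e.g. /etc/modprobe.d/iwlmvm.conf (assumed file name)
    scheme="$2"      # 1 = power saving disabled, per comment #2
    printf 'options iwlmvm power_scheme=%s\n' "$scheme" > "$conf_file"
}

# Print the value the loaded module reports.
# Returns non-zero if the parameter file cannot be read (module not loaded).
current_power_scheme() {
    param_file="${1:-/sys/module/iwlmvm/parameters/power_scheme}"
    [ -r "$param_file" ] || return 1
    cat "$param_file"
}
```

Run as root, `write_power_scheme_conf /etc/modprobe.d/iwlmvm.conf 1` followed by a module reload should make `current_power_scheme` print `1` if the option was applied.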
Comment 5 Luca Coelho 2016-06-17 09:29:14 UTC
Okay, thanks for testing!

So, the lowest hanging fruit we had didn't turn out to be fruitful. :)

Let's dig a bit further.  Are you able to capture sniffer logs (with monitor interface on another device)? Also it would be great if you could get trace-cmd logs, as explained here:

https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging#tracing

Let's see if we can find anything with those.
Comment 6 Ruben De Smet 2016-06-17 09:33:56 UTC
(In reply to Luca Coelho from comment #5)
> Let's dig a bit further.  Are you able to capture sniffer logs (with monitor
> interface on another device)?

I could try that, but I'm afraid I won't be able to do that before the 26th of this month. I still have about three hours here before I'm off to Brussels again, and I have some other things to fix that are more urgent :)

> Also it would be great if you could get
> trace-cmd logs, as explained here:
> 
> https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging#tracing

I added the `trace-cmd record -e iwlwifi` output (3.17 MB, application/x-xz) yesterday. Let me know if it suffices; I didn't press Ctrl-C before packing and uploading them, so it's the cpu0-4 files.

I do have the single result file too, but that's after having downloaded ~40GB of data from our NAS, so that's quite a big file (will probably take some time to dig through all the information in there).
Comment 7 Luca Coelho 2016-06-17 09:51:11 UTC
Unfortunately I can't open those traces.  Even though it's supposed to be possible to reassemble them from the separate cpu files, I've never managed to do it.

If you have the single trace.dat file, please upload it.  Hopefully it's not too big to be allowed as an attachment here, but otherwise I think we can handle it (especially now that we know there is a lot of data after the issue happened).

Let's hope we can find some good info there.  Otherwise we'll have to wait till you're back.
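
A sketch of capturing a trace that ends up as a single `trace.dat`. As far as I understand, trace-cmd only merges its per-CPU temporary files into the output file when the recording is stopped with Ctrl-C. The `trace.dat.cpu*` naming of those leftover files and the helper name are assumptions.

```shell
#!/bin/sh
# Record iwlwifi events; stop with Ctrl-C once the problem has reproduced,
# so that trace-cmd can finish writing the merged output file:
#   sudo trace-cmd record -e iwlwifi -o trace.dat

# Hypothetical check: succeed only if DIR contains a non-empty trace.dat
# and no leftover per-CPU files (assumed to be named trace.dat.cpu*).
trace_is_merged() {
    dir="${1:-.}"
    [ -s "$dir/trace.dat" ] || return 1
    for f in "$dir"/trace.dat.cpu*; do
        # An unmatched glob stays literal and fails the -e test.
        [ -e "$f" ] && return 1
    done
    return 0
}
```

Something like `trace_is_merged . && xz -k trace.dat` would then compress the file only when it looks complete, keeping the uncompressed original.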
Comment 8 Ruben De Smet 2016-06-17 10:01:52 UTC
(In reply to Luca Coelho from comment #7)
> Unfortunately I can't open those traces.  Even though it's supposed to be
> possible to reassemble them from the separate cpu files, I've never managed
> to do it.
> 
> If you have the single trace.dat file, please upload it.  Hopefully it's not
> too big to be allowed as an attachment here, but otherwise I think we can
> handle it (especially now that we know there is a lot of data after the
> issue happened).

Decompressed it's 800 MB, I compressed it to 155 MB, and uploaded it to my ownCloud: https://cloud.glycos.org/public.php?service=files&t=f004970b9ea918d0305a8275752335eb

If that's not okay, I'll have to rebuild the data. In any case, the bug was there before I started downloading the huge amount of data.

Little question: should I change my WiFi keys after having shared this data? Is there any sensitive stuff in it?

> Let's hope we can find some good info there.  Otherwise we'll have to wait
> till you're back.
Comment 9 Ruben De Smet 2016-06-17 10:10:11 UTC
Additional information: my brother just informed me that he does _not_ have the issue on his 7260 in his Acer subnotebook.

Could this be a ThinkPad-specific thing perhaps? And the Zotac box having a separate issue? Sounds a bit far-fetched to be honest.
Comment 10 Luca Coelho 2016-06-17 10:13:33 UTC
Thanks, I'll take a look into those.

About privacy, take a look here: https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging#privacy_aspects

If you want, you can encrypt it with one of the keys provided there (mine, Emmanuel's or Johannes').  I don't think there is any way to figure out your key from those logs, but there are other private things (like mac addresses etc.) that may be considered sensitive.

I'd suggest that you encrypt the file with our keys.
Comment 11 Luca Coelho 2016-06-17 10:21:59 UTC
These NICs (7260, 7265 and 3160) are very similar, so I doubt the difference is the NICs themselves.  It could be an RF issue, which could explain the problem.  The 3160 (Zotac) has 1:1 antennas, the other two are 2:2.
Comment 12 Ruben De Smet 2016-06-17 10:57:26 UTC
(In reply to Luca Coelho from comment #10)
> Thanks, I'll take a look into those.
> 
> About privacy, take a look here:
> https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging#privacy_aspects
> 
> If you want, you can encrypt it with one of the keys provided there (mine,
> Emmanuel's or Johannes').  I don't think there is any way to figure out your
> key from those logs, but there are other private things (like mac addresses
> etc.) that may be considered sensitive.
> 
> I'd suggest that you encrypt the file with our keys.

I'd say I'll leave the file as is. Harm has been done by making it public already. When the bug is resolved, I'll take it offline again.

(In reply to Luca Coelho from comment #11)
> These NICs (7260, 7265 and 3160) are very similar, so I doubt the difference
> is the NICs themselves.  It could be an RF issue, which could explain the
> problem.  The 3160 (Zotac) has 1:1 antennas, the other two are 2:2.

The weirdest thing is, the 7260 doesn't have the problem; I could try swapping my 7265 and my brother's 7260 to see whether he gets the issue too? This one is very weird to me.
Comment 13 Emmanuel Grumbach 2016-07-11 07:18:16 UTC
What happens here is that we roam from 2.4GHz to 5.2GHz for a strange reason, since the RSSI on 2.4GHz is pretty strong.
Can you disable one of the bands to see if that helps?
If it does, then we'd need to check why the supplicant decides to roam.
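
To see which band the client is associated on before and after a roam, something like the following may help. It is only a sketch: it assumes `iw dev <iface> link` prints a `freq:` line in MHz and that the interface is called `wlan0`, and both helper names are hypothetical.

```shell
#!/bin/sh
# Map a channel frequency in MHz to its band.
band_of_freq() {
    mhz="${1%%.*}"   # drop a fractional part such as "5180.0"
    if [ "$mhz" -ge 2400 ] && [ "$mhz" -lt 2500 ]; then
        echo "2.4GHz"
    elif [ "$mhz" -ge 5000 ] && [ "$mhz" -lt 6000 ]; then
        echo "5GHz"
    else
        echo "unknown"
    fi
}

# Print the band of the current association on the given interface.
current_band() {
    iface="${1:-wlan0}"   # assumed interface name
    freq=$(iw dev "$iface" link | awk '/freq:/ {print $2; exit}')
    [ -n "$freq" ] || { echo "not connected"; return 1; }
    band_of_freq "$freq"
}
```

Calling `current_band wlan0` every few seconds while sitting near the TP-LINK should show whether the client flips between the two bands.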
Comment 14 Ruben De Smet 2016-07-23 20:02:37 UTC
(In reply to Emmanuel Grumbach from comment #13)
> What happens here is that we roam from 2.4GHz to 5.2GHz for a strange reason,
> since the RSSI on 2.4GHz is pretty strong.
> Can you disable one of the bands to see if that helps?
> If it does, then we'd need to check why the supplicant decides to roam.

Still on vacation. Will report back starting next month.

I can already tell you this:

I have two access points in that building. The TP-Link, which has both 2.4 and 5.2, and one Linksys WRT54, which only has 2.4. But while testing, I was closest to the TP all the time though; when near the WRT54, no roaming happens afaict.
Comment 15 Ruben De Smet 2016-08-05 19:26:17 UTC
I shut down 5 GHz on the TP-Link, and the Kodi box boots flawlessly now. Waiting for results from both my and my mother's laptop, but this is promising indeed.

How can we find out why the supplicant wants to roam? I don't think it's distance. There's one wall, but that's it.
Comment 16 Emmanuel Grumbach 2016-08-15 16:44:13 UTC
We'd need to have a higher debug level in the supplicant. It might be caused by a wrong RSSI report from the driver.
Comment 17 Ruben De Smet 2016-08-15 16:47:49 UTC
(In reply to Emmanuel Grumbach from comment #16)
> We'd need to have a higher debug level in the supplicant. It might be caused
> by a wrong RSSI report from the driver.

Am I correct to follow "Firmware Debugging" on the https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging#tracing page?
Comment 18 Emmanuel Grumbach 2016-08-15 18:04:38 UTC
Nope. I talked about supplicant logs, not firmware logs.
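
A possible way to raise the supplicant's log level and pull out the association events, sketched under some assumptions: that wpa_supplicant's control interface is reachable with `wpa_cli` on `wlan0`, that it runs as a systemd unit named `wpa_supplicant`, and that the event strings matched below appear in its log. The helper name is hypothetical.

```shell
#!/bin/sh
# Raise verbosity on a running wpa_supplicant (needs root):
#   wpa_cli -i wlan0 log_level DEBUG
# Follow the log (assumes a systemd unit named wpa_supplicant):
#   journalctl -u wpa_supplicant -f

# Hypothetical filter: keep only (re)association-related lines from stdin.
roam_events() {
    grep -E 'Trying to associate with|CTRL-EVENT-(CONNECTED|DISCONNECTED)'
}
```

Piping the journal through `roam_events` should leave a short list of connect, disconnect and association attempts that can be compared against the times the password prompt appeared.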
Comment 19 Ruben De Smet 2016-09-03 13:02:14 UTC
I have a feeling that it's a TP-LINK bug. I flashed OpenWrt on the router and now everything is fine. Even the Windows machines can connect to it.

Thanks for the hint on roaming from 2.4 to 5GHz though, that's what pointed me in the right direction.