Bug 108271

Summary: iwlwifi-7265: connection freezes with firmware version 17.246894.0
Product: Drivers
Reporter: Stijn Tintel (stijn+bugs)
Component: network-wireless
Assignee: DO NOT USE - assign "network-wireless-intel" component instead (linuxwifi)
Status: CLOSED DUPLICATE    
Severity: normal    
Priority: P1    
Hardware: Intel   
OS: Linux   
See Also: https://bugzilla.kernel.org/show_bug.cgi?id=97291
https://bugzilla.kernel.org/show_bug.cgi?id=107861
https://bugzilla.kernel.org/show_bug.cgi?id=103531
https://bugzilla.kernel.org/show_bug.cgi?id=107471
Kernel Version: 4.3.0
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments: trace.dat
trace.dat with uapsd_disable=1
trace.dat with uapsd_disable=1

Description Stijn Tintel 2015-11-21 21:33:29 UTC
A few days ago I noticed there were new firmware versions available for iwlwifi-7265, so I installed them on my machine. I was using version 15.195093.0 before, which was relatively stable. After reloading the kernel modules, the v17 firmware was used:

iwlwifi 0000:02:00.0: api_index larger than supported by driver
iwlwifi 0000:02:00.0: loaded firmware version 17.246894.0 op_mode iwlmvm

After a few hours I noticed my connection froze; still associated but no traffic possible. This was solved with rfkill block and rfkill unblock (via Fn+PrtScr on my XPS13):

nov 19 06:22:36 sylvester.nomad.adlevio.net kernel: iwlwifi 0000:02:00.0: RF_KILL bit toggled to disable radio.
nov 19 06:22:36 sylvester.nomad.adlevio.net kernel: iwlwifi 0000:02:00.0: RF_KILL bit toggled to enable radio.
nov 19 06:22:40 sylvester.nomad.adlevio.net kernel: wlan0: associated

This happened a 2nd time:

nov 19 12:38:01 sylvester.nomad.adlevio.net kernel: wlan0: associated
nov 19 15:26:07 sylvester.nomad.adlevio.net kernel: iwlwifi 0000:02:00.0: RF_KILL bit toggled to disable radio.
nov 19 15:26:07 sylvester.nomad.adlevio.net kernel: iwlwifi 0000:02:00.0: RF_KILL bit toggled to enable radio.
nov 19 15:26:11 sylvester.nomad.adlevio.net kernel: wlan0: associated

I had this problem before with the v14 firmware (that was much worse, happened after a few minutes, sometimes even less than a minute), so I immediately suspected the new firmware to be the problem.

I then removed the iwlwifi-7265D-17.ucode file and reloaded the kernel modules again. Since then it has been using version 16.242414.0. I have not seen the problem with this firmware in the last 2 days.
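
To confirm which firmware the driver picked up after a module reload, the "loaded firmware version" line quoted above can be extracted from the kernel log. The following is a minimal sketch, not part of the original report: the function name loaded_firmware is hypothetical, and the regular expression assumes the log line has exactly the format shown in the description.

```python
import re

# Matches lines such as:
#   iwlwifi 0000:02:00.0: loaded firmware version 17.246894.0 op_mode iwlmvm
# The pattern is an assumption based only on the log line quoted in this report.
FW_LINE = re.compile(
    r"iwlwifi (?P<dev>\S+): loaded firmware version (?P<version>\S+) "
    r"op_mode (?P<op_mode>\S+)"
)


def loaded_firmware(log_text):
    """Return a list of (pci_device, firmware_version, op_mode) tuples,
    one per 'loaded firmware version' line found in log_text."""
    return [
        (m["dev"], m["version"], m["op_mode"])
        for m in FW_LINE.finditer(log_text)
    ]
```

Feeding it the output of dmesg or journalctl -k after reloading the modules should show whether the v16 or v17 file was loaded.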
Comment 1 Emmanuel Grumbach 2015-11-22 08:21:00 UTC
Please run tracing as explained here:
https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging

Thanks.
Comment 2 Emmanuel Grumbach 2015-11-22 10:28:36 UTC
If you need hours to reproduce the problem, tracing might not be the right tool.
Let's start with a dmesg output. You don't see anything special from iwlwifi there?
Comment 3 Stijn Tintel 2015-11-22 23:42:56 UTC
(In reply to Emmanuel Grumbach from comment #2)
> Let's start with a dmesg output. You don't see anything special from iwlwifi
> there?

Nothing in dmesg at all, just RF_KILL when I disable wireless to solve the problem:
nov 19 06:17:25 sylvester.nomad.adlevio.net kernel: kvm: zapping shadow pages for mmio generation wraparound
nov 19 06:22:36 sylvester.nomad.adlevio.net kernel: iwlwifi 0000:02:00.0: RF_KILL bit toggled to disable radio.

I will install the v17 firmware again, try to reproduce it and run tracing when it happens.
Comment 4 Emmanuel Grumbach 2015-11-25 21:10:05 UTC
Any news?
Did you try to disable 11n (11n_disable=1 as a module parameter to iwlwifi)?
Comment 5 Stijn Tintel 2015-11-26 20:07:24 UTC
Created attachment 195551 [details]
trace.dat
Comment 6 Stijn Tintel 2015-11-26 20:14:30 UTC
After installing the v17 firmware again, I was unable to reproduce the problem. I then remembered I had enabled U-APSD again to test if the latency spikes I had on my AP (DAP-2695) would be fixed. It still took a while, but today the problem happened twice. The laptop is in a different location than where I usually use it, so it's possible that uapsd_disable=1 is unrelated, not sure.

I ran "trace-cmd record -e iwlwifi" and pinged the default gateway once I noticed the connection hang.
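
Because the hang takes hours to appear and is only noticed by hand, a small watchdog that pings the default gateway could flag it sooner, so the trace starts closer to the failure. This is only a sketch under stated assumptions: the names ping_once and wait_for_hang are hypothetical, the ping flags (-c for count, -W for reply timeout in seconds) are those of Linux iputils ping, and the threshold of five consecutive failures is an arbitrary choice.

```python
import subprocess
import time


def ping_once(host, timeout_s=2):
    """Return True if one ICMP echo to host is answered (Linux iputils ping)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wait_for_hang(probe, threshold=5, interval_s=1.0, max_probes=None,
                  sleep=time.sleep):
    """Call probe() repeatedly. Return the number of probes made once
    `threshold` consecutive probes have failed, or None if max_probes
    is reached first. A successful probe resets the failure counter."""
    failures = 0
    probes = 0
    while max_probes is None or probes < max_probes:
        probes += 1
        failures = 0 if probe() else failures + 1
        if failures >= threshold:
            return probes
        sleep(interval_s)
    return None
```

A call such as wait_for_hang(lambda: ping_once(gateway)), with gateway set to the address of the default gateway, blocks until the link stops passing traffic; at that point trace-cmd can be started while the connection is still hung.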
Comment 7 Emmanuel Grumbach 2015-11-26 20:53:03 UTC
OK, thanks. I'll take a look. Note that there are serious interop issues with U-APSD, which is why it is disabled by default.
Comment 8 Emmanuel Grumbach 2015-11-29 08:22:27 UTC
If the problem doesn't reproduce with uapsd_disable=1 (the default), please close this issue.
Comment 9 Stijn Tintel 2015-11-30 20:35:24 UTC
Problem just happened again with uapsd_disable=1.

sylvester ~ # cat /sys/module/iwlwifi/parameters/uapsd_disable
Y
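
Several iwlwifi module parameters come up in this thread (uapsd_disable, 11n_disable, bt_coex_active), and it can help to record all of them alongside each trace. The sketch below reads them from the same sysfs directory used in the command above. The function name read_module_params is hypothetical, and the sysfs_root argument exists only so the function can be pointed at a different directory for testing.

```python
from pathlib import Path


def read_module_params(module, names, sysfs_root="/sys/module"):
    """Return {name: value} for the given module parameters, read from
    <sysfs_root>/<module>/parameters/<name>. A parameter that does not
    exist or cannot be read maps to None."""
    base = Path(sysfs_root) / module / "parameters"
    params = {}
    for name in names:
        try:
            params[name] = (base / name).read_text().strip()
        except OSError:
            params[name] = None
    return params
```

For example, read_module_params("iwlwifi", ["uapsd_disable", "11n_disable", "bt_coex_active"]) returns the current values as strings, in the same form that cat prints them.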
Comment 10 Stijn Tintel 2015-11-30 20:36:04 UTC
Created attachment 196161 [details]
trace.dat with uapsd_disable=1
Comment 11 Emmanuel Grumbach 2015-11-30 20:41:00 UTC
I don't see anything bad in this tracing.

What are the symptoms?
You run pings and the pings stop?
Comment 12 Stijn Tintel 2015-11-30 22:01:29 UTC
Associated but unable to ping default gateway. I only start the trace when I notice the problem (ssh sessions no longer respond, websites no longer open).
Comment 13 Emmanuel Grumbach 2015-12-01 06:16:09 UTC
Are you using Bluetooth?
Can you try to disable Bluetooth?

Please let trace-cmd run for more time and do some network operations while it is running.

Thanks
Comment 14 Stijn Tintel 2015-12-04 19:13:58 UTC
Created attachment 196401 [details]
trace.dat with uapsd_disable=1

Still associated, unable to ping anything in local subnet
Tried disconnect + reconnect -> network not found
Toggle rfkill on/off -> reconnected fine
Comment 15 Stijn Tintel 2015-12-04 21:02:11 UTC
Missed the bluetooth part. I am using bluetooth, and I need it (working remote), so disabling it is difficult. Did you mean to completely disable bluetooth, or just disable bt_coex_active?
Comment 16 Emmanuel Grumbach 2015-12-05 19:28:17 UTC
I meant completely disable BT.

Please try with firmware -16.ucode. Please remove the -17.ucode version before doing so.
I also recommend you install the latest BT firmware from linux-firmware.git.
Comment 17 Emmanuel Grumbach 2015-12-21 08:01:53 UTC

*** This bug has been marked as a duplicate of bug 107471 ***