Bug 94341

Summary: iwlwifi: dvm: 1030 randomly disconnecting MWG100231518
Product: Drivers Reporter: Viktor Pal (viktorpal)
Component: network-wireless Assignee: drivers_network-wireless (drivers_network-wireless)
Status: CLOSED CODE_FIX    
Severity: normal CC: ilw, linville
Priority: P1    
Hardware: Intel   
OS: Linux   
Kernel Version: 3.16 Subsystem:
Regression: No Bisected commit-id:
Attachments: Contains logs that happen when the connection drops and then during reconnecting
Dmesg output
Dmesg snippet with comments
fix

Description Viktor Pal 2015-03-05 21:10:00 UTC
Created attachment 169351 [details]
Contains logs that happen when the connection drops and then during reconnecting

I'm using a Dell Inspiron 15R (N5110) laptop with the following network card:
Lshw output:
description: Wireless interface
product: Centrino Wireless-N 1030 [Rainbow Peak]
vendor: Intel Corporation
physical id: 0
bus info: pci@0000:09:00.0
logical name: wlan0
version: 34
serial: bc:77:37:46:b1:0d
width: 64 bits
clock: 33MHz
capabilities: pm msi pciexpress bus_master cap_list ethernet physical wireless
configuration: broadcast=yes driver=iwlwifi driverversion=3.13.0-37-generic firmware=18.168.6.1 ip=192.168.1.4 latency=0 link=yes multicast=yes wireless=IEEE 802.11bgn
resources: irq:56 memory:f7a00000-f7a01fff

Lspci output:
09:00.0 Network controller: Intel Corporation Centrino Wireless-N 1030 [Rainbow Peak] (rev 34)
Subsystem: Intel Corporation Centrino Wireless-N 1030 BGN
Flags: bus master, fast devsel, latency 0, IRQ 56
Memory at f7a00000 (64-bit, non-prefetchable) [size=8K]
Capabilities: [c8] Power Management version 3
Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [e0] Express Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Device Serial Number bc-77-37-ff-ff-46-b1-0d
Kernel driver in use: iwlwifi

The card randomly disconnects from the AP.
After it disconnects I have to disable and enable wireless in network manager to be able to reconnect again.

What I see in the logs while this is happening can be seen in the attached file.

Adding the following module options to the modprobe configuration seems to mostly solve the problem:
options iwlmvm power_scheme=1
options iwlwifi bt_coex_active=N swcrypto=1 11n_disable=1
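Options like these are normally read by modprobe from a file under /etc/modprobe.d/ and only take effect when the module is next loaded. A minimal sketch for checking whether a value was actually applied is shown below; it assumes the parameter is exported read-only under /sys/module/<module>/parameters/, and SYSFS_ROOT is a hypothetical override used only to make the helper testable. Note also that the bug title refers to the dvm op-mode, so the iwlmvm line may not apply to this card; that is an inference, not something confirmed in this report.

```shell
# Print the live value of a module parameter as exposed in sysfs.
# SYSFS_ROOT is a hypothetical override (not a kernel or modprobe feature);
# it defaults to the real /sys.
module_param() {
    cat "${SYSFS_ROOT:-/sys}/module/$1/parameters/$2"
}
```

For example, `module_param iwlwifi 11n_disable` would be expected to print 1 once the options above are in effect.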

I'm still seeing some disconnects, though, and the card does not reconnect after suspend, so I have to repeat the disable/enable process described above.

This card had been working flawlessly for several years on Ubuntu 12.04 with the 3.2 kernel.

I have also filed this bug here:
https://bugs.launchpad.net/ubuntu/+source/linux-firmware/+bug/1420935

I'm more than happy to provide any additional information to get this fixed.
Comment 1 Emmanuel Grumbach 2015-03-08 11:19:56 UTC
Please share your dmesg output.

Also - please make sure you have:

commit a0855054e59b0c5b2b00237fdb5147f7bcc18efb
Author: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Date:   Sun Oct 5 09:11:14 2014 +0300

    iwlwifi: dvm: drop non VO frames when flushing

This commit should have been ported to 3.16 by your distribution.
Comment 2 Viktor Pal 2015-03-09 20:19:06 UTC
Created attachment 170021 [details]
Dmesg output
Comment 3 Emmanuel Grumbach 2015-03-09 20:24:30 UTC
If there is an issue (not clear from the dmesg), it will most likely be a firmware issue and hence will not be fixed.
You can nevertheless try to record a WiFi sniffer capture of the problem so that we can get a better understanding of what's going on.
Comment 4 Viktor Pal 2015-03-09 20:41:11 UTC
It seems that the kernel I'm using contains the mentioned patch according to the changelog:
# zgrep -A1 dvm /usr/share/doc/linux-image-3.16.0-31-generic/changelog.Debian.gz
  * iwlwifi: dvm: fix flush support for old firmware
    - LP: #1419125
--
  * iwlwifi: dvm: drop non VO frames when flushing
    - LP: #1393401

This is the bug referred to in the changelog:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1393401

I also suspect that this might be related to suspending the laptop.
I reinstalled my laptop yesterday and had several restarts after the installation. After the final restart I used the laptop for hours without a disconnect.
This morning, when the laptop came back from suspend, it was not able to reconnect, and I had to disable and enable wireless in network manager.
I just used the laptop for a short time in the morning, then suspended it again.
Now (in the evening) I woke the laptop up from hibernation and I am seeing quite frequent disconnects.

Can you please suggest a specific program I can use for the sniffing?
If that is really a firmware issue, what can I do?

Thanks.
Comment 5 Emmanuel Grumbach 2015-03-09 20:44:35 UTC
For sniffing you'd need an additional WiFi device and you'd need to put it in monitor mode on the right frequency.
If that's a firmware issue, there isn't much we can do.
The only thing I can think of is to re-run the calibration after suspend/resume.
Can you test a patch?
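For the sniffing setup mentioned above, a rough sketch using the standard `iw`, `ip` and `tcpdump` tools could look like the following. It assumes a second WiFi adapter that supports monitor mode, a 2.4 GHz AP, and root privileges; the interface name, channel and output file are placeholders supplied by the caller, and a network manager may need to be told to leave the second interface alone.

```shell
# Convert a 2.4 GHz channel number (1-14) to its center frequency in MHz.
chan_to_freq() {
    case "$1" in
        14) echo 2484 ;;
        [1-9]|1[0-3]) echo $((2407 + 5 * $1)) ;;
        *) echo "unsupported channel: $1" >&2; return 1 ;;
    esac
}

# Put a SECOND WiFi interface into monitor mode on the AP's channel and
# write the captured frames to a pcap file. Must run as root.
# Arguments (all caller-chosen): interface, channel, output file.
start_sniffer() {
    iface="$1"; channel="$2"; outfile="$3"
    freq="$(chan_to_freq "$channel")" || return 1
    ip link set "$iface" down &&
    iw dev "$iface" set type monitor &&
    ip link set "$iface" up &&
    iw dev "$iface" set freq "$freq" &&
    tcpdump -i "$iface" -w "$outfile"
}
```

A call such as `start_sniffer wlan1 6 disconnect.pcap` (hypothetical names) would capture until interrupted with Ctrl-C; the resulting file can then be attached to the bug.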
Comment 6 Viktor Pal 2015-03-09 21:01:12 UTC
Okay, sniffing doesn't seem feasible right now, as I have no additional device for that at home.
Regarding trying a patch, of course, that could work.
Would it be okay to patch the sources of the kernel I'm currently running (downloading the source package) and compile it with my distribution's configuration?

I just had a disconnect again. I'm attaching an additional dmesg snippet with comments about what happened when.
Comment 7 Viktor Pal 2015-03-09 21:02:20 UTC
Created attachment 170031 [details]
Dmesg snippet with comments
Comment 8 Emmanuel Grumbach 2015-03-09 21:03:22 UTC
It'd be ok I think since the driver you are using hasn't changed much for a while.
Comment 9 Emmanuel Grumbach 2015-03-09 21:05:56 UTC
please try this:

diff --git a/drivers/net/wireless/iwlwifi/dvm/ucode.c b/drivers/net/wireless/iwlwifi/dvm/ucode.c
index 4dbef7e..bb1322b 100644
--- a/drivers/net/wireless/iwlwifi/dvm/ucode.c
+++ b/drivers/net/wireless/iwlwifi/dvm/ucode.c
@@ -440,8 +440,6 @@ int iwl_run_init_ucode(struct iwl_priv *priv)
         */
        ret = iwl_wait_notification(&priv->notif_wait, &calib_wait,
                                        UCODE_CALIB_TIMEOUT);
-       if (!ret)
-               priv->init_ucode_run = true;
 
        goto out;
Comment 10 Viktor Pal 2015-03-10 20:57:37 UTC
Compiled and installed the patched kernel.
Let's see if that fixes the problem.
Comment 11 Emmanuel Grumbach 2015-03-13 06:24:29 UTC
Any news here?
Comment 12 Viktor Pal 2015-03-13 07:07:55 UTC
Yes, it seems to be much better now, but let me also test it over the weekend and come back after that.
Comment 13 Emmanuel Grumbach 2015-03-16 07:13:59 UTC
Created attachment 170711 [details]
fix

final version of the fix.
Comment 14 Viktor Pal 2015-03-17 21:57:57 UTC
So I experienced some disconnects, but very few, once a day maybe.
To be 100% sure I would have to go back to a previous kernel for a few days to gather some more evidence and then come back to the patched one for another few days.
I'm just a bit suspicious because I would have expected a complete fix based on the patch (the removed code) and I'm not sure if the improvement is really caused by the patch or some other unknown factor.

But to summarize, it is certainly no worse than it was before.
Comment 15 Emmanuel Grumbach 2015-03-18 05:52:21 UTC
The fix seems trivial, but it is not.
The fix affects the behavior of the physical layer, which is the least predictable component in the system. WiFi disconnects from time to time. This is totally fine, and I wouldn't consider it a real bug. The frequent disconnections you had were a bug.
With that patch, the driver re-calibrates the PHY layer every time you exit suspend, which makes the PHY layer more accurate.

I'll leave the bug open for a few more days, but I'll send the patch upstream anyway. An open bug consumes resources and this bug seems to be fixed from my point of view. I'll close it next Sunday if I don't get any show stopper from your side.
Comment 16 Viktor Pal 2015-03-19 23:09:07 UTC
In that case I think that it is completely okay to close the bug and consider it as fixed.
The disconnects are really very rare. For example I had none in the last 2-3 days while my laptop was running all night long.
However, I restarted my laptop today and had one disconnect after 3 hours. This is what I saw in dmesg after the disconnect (maybe it helps to decide whether it is related to what we have seen before):
[10916.367646] cfg80211: Calling CRDA to update world regulatory domain
[10916.440559] cfg80211: World regulatory domain updated:
[10916.440564] cfg80211:  DFS Master region: unset
[10916.440566] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp), (dfs_cac_time)
[10916.440569] cfg80211:   (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm), (N/A)
[10916.440572] cfg80211:   (2457000 KHz - 2482000 KHz @ 40000 KHz), (300 mBi, 2000 mBm), (N/A)
[10916.440573] cfg80211:   (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm), (N/A)
[10916.440575] cfg80211:   (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm), (N/A)
[10916.440577] cfg80211:   (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm), (N/A)

Thanks for sorting this out and fixing it finally.
That is a great help and great support, and it is much appreciated.
Comment 17 Viktor Pal 2015-03-21 19:28:18 UTC
Sorry for coming back to this thread again.
I again experienced the frequent reconnects that I had before.
I presumed (based on what you said about the patch) that suspending the laptop and waking it up again would stop these reconnects, and it actually did.
So it seems that whatever causes this (either the firmware or something else) is still "hiding" there, but the suspend > wakeup cycle helped, maybe because of the re-calibration.
Just wanted to add this here so you have accurate information.
Comment 18 Emmanuel Grumbach 2015-03-21 20:44:50 UTC
yeah - so I am not really surprised.
What it means is that the firmware can't really "stay calibrated" for too long.
This is clearly a firmware bug and won't be fixed.
What you can do is suspend and resume, or manually crash the firmware so that it re-calibrates. To do so:

echo 1 > /sys/kernel/debug/iwlwifi/0000\:03\:00.0/iwldvm/debug/fw_restart 


Of course, you'd need to replace the 03\:... with your device's PCI address.
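One way to avoid typing the PCI address by hand is to derive it from the network interface in sysfs. The helpers below are only a sketch: the function names are made up for illustration, SYSFS_ROOT is a hypothetical override for testing, and the debugfs file is assumed to exist only when debugfs is mounted and the driver was built with its debugfs support.

```shell
# Print the PCI address (e.g. 0000:09:00.0) of the device behind a network
# interface by resolving its sysfs "device" symlink.
# SYSFS_ROOT is a hypothetical override; it defaults to the real /sys.
iface_pci_addr() {
    basename "$(readlink -f "${SYSFS_ROOT:-/sys}/class/net/$1/device")"
}

# Build the path of the fw_restart debugfs file for a given PCI address,
# following the layout used in the command above.
fw_restart_path() {
    printf '/sys/kernel/debug/iwlwifi/%s/iwldvm/debug/fw_restart\n' "$1"
}
```

With these, the restart could be triggered as root with `echo 1 > "$(fw_restart_path "$(iface_pci_addr wlan0)")"`. For the card in this report, lshw lists the bus info as pci@0000:09:00.0, so the address would be 0000:09:00.0.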
Comment 19 Viktor Pal 2015-11-18 10:03:32 UTC
Just to follow up on this.
I upgraded to Ubuntu 15.10 a few weeks ago and it looks like this bug is completely fixed.

It has the following kernel:
Linux 4.2.0-18-generic #22-Ubuntu SMP Fri Nov 6 18:25:50 UTC 2015 x86_64 GNU/Linux

WiFi is stable without any additional tweaks.

Happy that this was fixed, thanks.