Bug 116891 - iwlwifi: 7260: fail to run INIT ucode
Summary: iwlwifi: 7260: fail to run INIT ucode
Status: CLOSED PATCH_ALREADY_AVAILABLE
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless (show other bugs)
Hardware: Intel Linux
: P1 normal
Assignee: DO NOT USE - assign "network-wireless-intel" component instead
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-04-21 23:12 UTC by cartwright
Modified: 2016-09-22 05:47 UTC (History)
2 users (show)

See Also:
Kernel Version: 4.6.4-1-ARCH
Subsystem:
Regression: No
Bisected commit-id:


Attachments
DMESG output showing a failed boot and successful reloading of iwlwifi (66.91 KB, text/plain)
2016-04-21 23:12 UTC, cartwright
Details
Debug Log Output (885.62 KB, text/plain)
2016-05-19 05:17 UTC, cartwright
Details
Updated log (850.58 KB, text/plain)
2016-05-27 04:09 UTC, cartwright
Details
Complete boot log (1.10 MB, text/plain)
2016-05-27 04:12 UTC, cartwright
Details
DMESG output after applying patches (194.30 KB, text/plain)
2016-08-03 04:18 UTC, cartwright
Details

Description cartwright 2016-04-21 23:12:05 UTC
Created attachment 213651 [details]
DMESG output showing a failed boot and successful reloading of iwlwifi

I have a Dell Precision M3800 Laptop and most of the time when it boots, the loading the iwlwifi module fails.  If I reload the module after a failed booting it always works.

I can't figure out why it does this.  The main error appears to be

Apr 14 21:26:13 leda.scit.us kernel: iwlwifi 0000:06:00.0: Failed to run INIT ucode: -110

The firmware version is 16.242414.0

Searching for this error hasn't produced a successful, permanent fix.

Successful Boot
--------------
Apr 14 21:34:57 leda.scit.us kernel: iwlwifi 0000:06:00.0: enabling device (0000 -> 0002)
Apr 14 21:34:57 leda.scit.us kernel: iwlwifi 0000:06:00.0: Direct firmware load for iwlwifi-7260-17.ucode failed with error -2
Apr 14 21:34:57 leda.scit.us kernel: iwlwifi 0000:06:00.0: loaded firmware version 16.242414.0 op_mode iwlmvm
Apr 14 21:34:57 leda.scit.us kernel: iwlwifi 0000:06:00.0: Detected Intel(R) Dual Band Wireless AC 7260, REV=0x144
Apr 14 21:34:57 leda.scit.us kernel: iwlwifi 0000:06:00.0: L1 Enabled - LTR Enabled
Apr 14 21:34:57 leda.scit.us kernel: iwlwifi 0000:06:00.0: L1 Enabled - LTR Enabled
Apr 14 21:35:01 leda.scit.us kernel: iwlwifi 0000:06:00.0 wlp6s0: renamed from wlan0
Apr 14 21:35:01 leda.scit.us NetworkManager[797]: <info>  rfkill2: found WiFi radio killswitch (at /sys/devices/pci0000:00/0000:00:1c.2/0000:06:00.0/ieee80211/phy0/rfkill2) (driver iwlwifi)
Apr 14 21:35:01 leda.scit.us NetworkManager[797]: <info>  (wlp6s0): new 802.11 WiFi device (carrier: UNKNOWN, driver: 'iwlwifi', ifindex: 2)
Apr 14 21:35:01 leda.scit.us kernel: iwlwifi 0000:06:00.0: L1 Enabled - LTR Enabled
Apr 14 21:35:01 leda.scit.us kernel: iwlwifi 0000:06:00.0: L1 Enabled - LTR Enabled
Apr 14 21:35:02 leda.scit.us kernel: iwlwifi 0000:06:00.0: L1 Enabled - LTR Enabled
Apr 14 21:35:02 leda.scit.us kernel: iwlwifi 0000:06:00.0: L1 Enabled - LTR Enabled

Failed Boot
-----------
Apr 14 21:26:08 leda.scit.us kernel: iwlwifi 0000:06:00.0: enabling device (0000 -> 0002)
Apr 14 21:26:08 leda.scit.us kernel: iwlwifi 0000:06:00.0: Direct firmware load for iwlwifi-7260-17.ucode failed with error -2
Apr 14 21:26:08 leda.scit.us kernel: iwlwifi 0000:06:00.0: loaded firmware version 16.242414.0 op_mode iwlmvm
Apr 14 21:26:08 leda.scit.us kernel: iwlwifi 0000:06:00.0: Detected Intel(R) Dual Band Wireless AC 7260, REV=0x144
Apr 14 21:26:08 leda.scit.us kernel: iwlwifi 0000:06:00.0: L1 Enabled - LTR Enabled
Apr 14 21:26:08 leda.scit.us kernel: iwlwifi 0000:06:00.0: L1 Enabled - LTR Enabled
Apr 14 21:26:13 leda.scit.us kernel: iwlwifi 0000:06:00.0: Failed to run INIT ucode: -110
Apr 14 21:26:13 leda.scit.us kernel: iwlwifi 0000:06:00.0: L1 Enabled - LTR Enabled

Reloading Module Fixes It
-------------------------
Opening up a terminal and issue the following command always fixes the wifi.

# sudo rmmod iwlmvm iwlwifi && sudo modprobe iwlwifi

Apr 14 21:27:33 leda.scit.us kernel: iwlwifi 0000:06:00.0: Direct firmware load for iwlwifi-7260-17.ucode failed with error -2
Apr 14 21:27:33 leda.scit.us kernel: iwlwifi 0000:06:00.0: loaded firmware version 16.242414.0 op_mode iwlmvm
Apr 14 21:27:33 leda.scit.us kernel: iwlwifi 0000:06:00.0: Detected Intel(R) Dual Band Wireless AC 7260, REV=0x144
Apr 14 21:27:33 leda.scit.us kernel: iwlwifi 0000:06:00.0: L1 Enabled - LTR Enabled
Apr 14 21:27:33 leda.scit.us kernel: iwlwifi 0000:06:00.0: L1 Enabled - LTR Enabled
Apr 14 21:27:33 leda.scit.us NetworkManager[854]: <info>  rfkill2: found WiFi radio killswitch (at /sys/devices/pci0000:00/0000:00:1c.2/0000:06:00.0/ieee80211/phy1/rfkill2) (driver iwlwifi)
Apr 14 21:27:33 leda.scit.us kernel: iwlwifi 0000:06:00.0 wlp6s0: renamed from wlan0
Apr 14 21:27:33 leda.scit.us NetworkManager[854]: <info>  (wlan0): new 802.11 WiFi device (carrier: UNKNOWN, driver: 'iwlwifi', ifindex: 2)
Apr 14 21:27:33 leda.scit.us kernel: iwlwifi 0000:06:00.0: L1 Enabled - LTR Enabled
Apr 14 21:27:33 leda.scit.us kernel: iwlwifi 0000:06:00.0: L1 Enabled - LTR Enabled
Apr 14 21:27:34 leda.scit.us kernel: iwlwifi 0000:06:00.0: L1 Enabled - LTR Enabled
Apr 14 21:27:34 leda.scit.us kernel: iwlwifi 0000:06:00.0: L1 Enabled - LTR Enabled
Comment 1 Emmanuel Grumbach 2016-04-22 04:17:36 UTC
Please reproduce with debug=0xffffffff as a module parameter to iwlwifi.
Comment 2 cartwright 2016-04-22 05:15:53 UTC
How do I do that?  I added iwlwifi.debug=0xffffffff as a kernel parameter on boot, but I get the message "iwlwifi: unknown parameter 'debug' ignored".  I tried looking on https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging but the links under "How to debug?" go nowhere.
Comment 3 Emmanuel Grumbach 2016-05-01 06:55:13 UTC
You need to have CONFIG_IWLWIFI_DEBUG enabled.
Comment 4 cartwright 2016-05-19 05:17:02 UTC
Created attachment 216671 [details]
Debug Log Output

I finally figure out how to recompile the module for my machine.

Attached is a log of

1) iwlwifi failing to initialize on boot
2) reloading the driver after boot, which succeeds
3) reloading again to turn off debug info since I don't need it anymore this boot.
Comment 5 Emmanuel Grumbach 2016-05-19 06:12:38 UTC
Can you try to apply this patch and try again?

I really can't understand what is going on here.

Thanks.

diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/ops.c b/drivers/net/wireless/intel/iwlwifi/mvm/ops.c
index 80a0c3d..377ea55 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/ops.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/ops.c
@@ -1269,6 +1269,8 @@ static bool iwl_mvm_set_hw_rfkill_state(struct iwl_op_mode *op_mode, bool state)
        struct iwl_mvm *mvm = IWL_OP_MODE_GET_MVM(op_mode);
        bool calibrating = ACCESS_ONCE(mvm->calibrating);
 
+       pr_err("%s %d %d\n", __func__, __LINE__, state);
+
        if (state)
                set_bit(IWL_MVM_STATUS_HW_RFKILL, &mvm->status);
        else
diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
index 15bb7ae..cf1b179 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
@@ -1124,6 +1124,8 @@ static void _iwl_trans_pcie_stop_device(struct iwl_trans *trans, bool low_power)
 
        lockdep_assert_held(&trans_pcie->mutex);
 
+       dump_stack();
+
        if (trans_pcie->is_down)
                return;
 
@@ -1370,8 +1372,10 @@ void iwl_trans_pcie_rf_kill(struct iwl_trans *trans, bool state)
 
        lockdep_assert_held(&trans_pcie->mutex);
 
-       if (iwl_op_mode_hw_rf_kill(trans->op_mode, state))
+       if (iwl_op_mode_hw_rf_kill(trans->op_mode, state)) {
+               pr_err("%s %d %d\n", __func__, __LINE__, state);        
                _iwl_trans_pcie_stop_device(trans, true);
+       }
 }
 
 static void iwl_trans_pcie_d3_suspend(struct iwl_trans *trans, bool test,
Comment 6 cartwright 2016-05-27 04:09:11 UTC
Created attachment 217821 [details]
Updated log

I couldn't apply your patch directly, but I manually made the changes to the kernel 4.5.4 source code that I'm using.

Here is an updated log file made with `journalctl -b | grep iwlwifi`.
Comment 7 cartwright 2016-05-27 04:12:38 UTC
Created attachment 217831 [details]
Complete boot log

Since the previous log appears to be missing some of the new output, I've attached the full boot log.
Comment 8 Emmanuel Grumbach 2016-05-27 05:08:56 UTC
something looks really weird here.
I'll need to get back to you with another patch.... after my week end :)
Comment 9 Emmanuel Grumbach 2016-07-11 05:44:38 UTC
I am very sorry for the time it took to get back to you.
I am not sure how much time I will have to work on this bug, but I'd like you to try our latest code. We saw (and fixed) issues that could very well be the same as the issue you are facing.

You can find our latest code as a backport tree here:
https://git.kernel.org/cgit/linux/kernel/git/iwlwifi/backport-iwlwifi.git/

Please take the master branch and let us know.

We found very subtle races in the firmware loading flow which is exactly the flow that is failing for you.

Thank you!
Comment 10 cartwright 2016-07-12 04:17:43 UTC
Do you have instructions on how to properly compile the module from the backport tree?  When I try doing it, I get the following error:

$ uname -a
Linux leda.scit.us 4.6.3-1-ARCH #1 SMP PREEMPT Fri Jun 24 21:19:13 CEST 2016 x86_64 GNU/Linux

$ make menuconfig
$ make M=drivers/net/wireless/intel/iwlwifi

make[5]: 'conf' is up to date.
#
# configuration written to .config
#
Building backport-include/backport/autoconf.h ... done.
  CC [M]  /home/reed/Kernel/backport-iwlwifi/compat/main.o
  CC [M]  /home/reed/Kernel/backport-iwlwifi/compat/backport-4.7.o
In file included from include/linux/sched.h:57:0,
                 from include/linux/kasan.h:4,
                 from include/linux/slab.h:118,
                 from /home/reed/Kernel/backport-iwlwifi/backport-include/linux/slab.h:3,
                 from include/linux/textsearch.h:8,
                 from include/linux/skbuff.h:30,
                 from /home/reed/Kernel/backport-iwlwifi/backport-include/linux/skbuff.h:3,
                 from /home/reed/Kernel/backport-iwlwifi/compat/backport-4.7.c:11:
include/net/scm.h: In function ‘scm_recv’:
/home/reed/Kernel/backport-iwlwifi/backport-include/linux/cred.h:7:44: error: ‘struct nsproxy’ has no member named ‘user_ns’; did you mean ‘uts_ns’?
 #define current_user_ns() (current->nsproxy->user_ns)
                                            ^
include/net/scm.h:121:39: note: in expansion of macro ‘current_user_ns’
   struct user_namespace *current_ns = current_user_ns();
                                       ^~~~~~~~~~~~~~~
make[6]: *** [scripts/Makefile.build:292: /home/reed/Kernel/backport-iwlwifi/compat/backport-4.7.o] Error 1
make[5]: *** [scripts/Makefile.build:440: /home/reed/Kernel/backport-iwlwifi/compat] Error 2
make[4]: *** [Makefile:1429: _module_/home/reed/Kernel/backport-iwlwifi] Error 2
make[3]: *** [Makefile.build:6: modules] Error 2
make[2]: *** [Makefile.real:100: modules] Error 2
make[1]: *** [Makefile:43: modules] Error 2
make: *** [Makefile:30: default] Error 2
Comment 11 Emmanuel Grumbach 2016-07-12 06:05:20 UTC
This can happen when the upstream backport project breaks or when distribution merge patches that change APIs.

I can suggest you move to a stable vanilla kernel, or to try to fix the compilation error manually.

All the fixes I was thinking about are included in iwlwifi-next.git, but I wouldn't recommend you run that kernel as is.
Comment 12 cartwright 2016-07-13 04:27:59 UTC
I don't think I have the time to figure out how to get this to work with Arch Linux. I guess I will wait for Arch to pull in the updates.
Comment 14 cartwright 2016-07-19 09:33:07 UTC
I installed those patches on the 4.6.4 kernel.

Fingers crossed.
Comment 15 cartwright 2016-07-30 03:50:07 UTC
With the patches installed on the 4.6.4 kernel, I'm still experiencing the same intermittent failures.
Comment 16 Emmanuel Grumbach 2016-07-30 18:13:31 UTC
Same reproduction rate?

Can you share the dmesg output?
Comment 17 cartwright 2016-08-01 17:45:29 UTC
Same reproduction rate.

Do you need to DEBUG information or just the dmesg output?
Comment 18 Emmanuel Grumbach 2016-08-01 17:46:58 UTC
let's start by the dmesg output, although it will surely not suffice.
Comment 19 cartwright 2016-08-03 04:18:43 UTC
Created attachment 227421 [details]
DMESG output after applying patches
Comment 20 Emmanuel Grumbach 2016-08-03 05:28:46 UTC
Ok, as expected, can you try again with the patches with debug=0xffffffff ?
Comment 21 Emmanuel Grumbach 2016-08-16 16:00:33 UTC
Any news?
Comment 22 cartwright 2016-08-20 04:27:10 UTC
I'm still working on finding time to install patched drivers with debug mode enabled.  Maybe sometime next week.
Comment 23 Luca Coelho 2016-09-22 05:25:16 UTC
Ping?
Comment 24 cartwright 2016-09-22 05:30:10 UTC
I've been running the 4.7.2+ kernel for a few weeks and haven't noticed it. I suspect the issue has been fixed.
Comment 25 Luca Coelho 2016-09-22 05:46:37 UTC
Yes, we have a fix in v4.8, commit f16c3ebfa64f ("iwlwifi: pcie: fix a race in firmware loading flow"), which was backported to 4.7.2 as commit 3de99eb07b92.

Unfortunately, neither 4.5 (where the bug was introduced) nor 4.6 were being updated anymore when the backport reached stable releases.  So, the only recommendation we can give for those kernels is to upgrade to 4.7.2+.

Note You need to log in before you can comment on or make changes to this bug.