Created attachment 213651 [details] DMESG output showing a failed boot and successful reloading of iwlwifi I have a Dell Precision M3800 Laptop and most of the time when it boots, the loading the iwlwifi module fails. If I reload the module after a failed booting it always works. I can't figure out why it does this. The main error appears to be Apr 14 21:26:13 leda.scit.us kernel: iwlwifi 0000:06:00.0: Failed to run INIT ucode: -110 The firmware version is 16.242414.0 Searching for this error hasn't produced a successful, permanent fix. Successful Boot -------------- Apr 14 21:34:57 leda.scit.us kernel: iwlwifi 0000:06:00.0: enabling device (0000 -> 0002) Apr 14 21:34:57 leda.scit.us kernel: iwlwifi 0000:06:00.0: Direct firmware load for iwlwifi-7260-17.ucode failed with error -2 Apr 14 21:34:57 leda.scit.us kernel: iwlwifi 0000:06:00.0: loaded firmware version 16.242414.0 op_mode iwlmvm Apr 14 21:34:57 leda.scit.us kernel: iwlwifi 0000:06:00.0: Detected Intel(R) Dual Band Wireless AC 7260, REV=0x144 Apr 14 21:34:57 leda.scit.us kernel: iwlwifi 0000:06:00.0: L1 Enabled - LTR Enabled Apr 14 21:34:57 leda.scit.us kernel: iwlwifi 0000:06:00.0: L1 Enabled - LTR Enabled Apr 14 21:35:01 leda.scit.us kernel: iwlwifi 0000:06:00.0 wlp6s0: renamed from wlan0 Apr 14 21:35:01 leda.scit.us NetworkManager[797]: <info> rfkill2: found WiFi radio killswitch (at /sys/devices/pci0000:00/0000:00:1c.2/0000:06:00.0/ieee80211/phy0/rfkill2) (driver iwlwifi) Apr 14 21:35:01 leda.scit.us NetworkManager[797]: <info> (wlp6s0): new 802.11 WiFi device (carrier: UNKNOWN, driver: 'iwlwifi', ifindex: 2) Apr 14 21:35:01 leda.scit.us kernel: iwlwifi 0000:06:00.0: L1 Enabled - LTR Enabled Apr 14 21:35:01 leda.scit.us kernel: iwlwifi 0000:06:00.0: L1 Enabled - LTR Enabled Apr 14 21:35:02 leda.scit.us kernel: iwlwifi 0000:06:00.0: L1 Enabled - LTR Enabled Apr 14 21:35:02 leda.scit.us kernel: iwlwifi 0000:06:00.0: L1 Enabled - LTR Enabled Failed Boot ----------- Apr 14 21:26:08 leda.scit.us kernel: iwlwifi 0000:06:00.0: enabling device (0000 -> 0002) Apr 14 21:26:08 leda.scit.us kernel: iwlwifi 0000:06:00.0: Direct firmware load for iwlwifi-7260-17.ucode failed with error -2 Apr 14 21:26:08 leda.scit.us kernel: iwlwifi 0000:06:00.0: loaded firmware version 16.242414.0 op_mode iwlmvm Apr 14 21:26:08 leda.scit.us kernel: iwlwifi 0000:06:00.0: Detected Intel(R) Dual Band Wireless AC 7260, REV=0x144 Apr 14 21:26:08 leda.scit.us kernel: iwlwifi 0000:06:00.0: L1 Enabled - LTR Enabled Apr 14 21:26:08 leda.scit.us kernel: iwlwifi 0000:06:00.0: L1 Enabled - LTR Enabled Apr 14 21:26:13 leda.scit.us kernel: iwlwifi 0000:06:00.0: Failed to run INIT ucode: -110 Apr 14 21:26:13 leda.scit.us kernel: iwlwifi 0000:06:00.0: L1 Enabled - LTR Enabled Reloading Module Fixes It ------------------------- Opening up a terminal and issue the following command always fixes the wifi. # sudo rmmod iwlmvm iwlwifi && sudo modprobe iwlwifi Apr 14 21:27:33 leda.scit.us kernel: iwlwifi 0000:06:00.0: Direct firmware load for iwlwifi-7260-17.ucode failed with error -2 Apr 14 21:27:33 leda.scit.us kernel: iwlwifi 0000:06:00.0: loaded firmware version 16.242414.0 op_mode iwlmvm Apr 14 21:27:33 leda.scit.us kernel: iwlwifi 0000:06:00.0: Detected Intel(R) Dual Band Wireless AC 7260, REV=0x144 Apr 14 21:27:33 leda.scit.us kernel: iwlwifi 0000:06:00.0: L1 Enabled - LTR Enabled Apr 14 21:27:33 leda.scit.us kernel: iwlwifi 0000:06:00.0: L1 Enabled - LTR Enabled Apr 14 21:27:33 leda.scit.us NetworkManager[854]: <info> rfkill2: found WiFi radio killswitch (at /sys/devices/pci0000:00/0000:00:1c.2/0000:06:00.0/ieee80211/phy1/rfkill2) (driver iwlwifi) Apr 14 21:27:33 leda.scit.us kernel: iwlwifi 0000:06:00.0 wlp6s0: renamed from wlan0 Apr 14 21:27:33 leda.scit.us NetworkManager[854]: <info> (wlan0): new 802.11 WiFi device (carrier: UNKNOWN, driver: 'iwlwifi', ifindex: 2) Apr 14 21:27:33 leda.scit.us kernel: iwlwifi 0000:06:00.0: L1 Enabled - LTR Enabled Apr 14 21:27:33 leda.scit.us kernel: iwlwifi 0000:06:00.0: L1 Enabled - LTR Enabled Apr 14 21:27:34 leda.scit.us kernel: iwlwifi 0000:06:00.0: L1 Enabled - LTR Enabled Apr 14 21:27:34 leda.scit.us kernel: iwlwifi 0000:06:00.0: L1 Enabled - LTR Enabled
Please reproduce with debug=0xffffffff as a module parameter to iwlwifi.
How do I do that? I added iwlwifi.debug=0xffffffff as a kernel parameter on boot, but I get the message "iwlwifi: unknown parameter 'debug' ignored". I tried looking on https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging but the links under "How to debug?" go nowhere.
You need to have CONFIG_IWLWIFI_DEBUG enabled.
Created attachment 216671 [details] Debug Log Output I finally figure out how to recompile the module for my machine. Attached is a log of 1) iwlwifi failing to initialize on boot 2) reloading the driver after boot, which succeeds 3) reloading again to turn off debug info since I don't need it anymore this boot.
Can you try to apply this patch and try again? I really can't understand what is going on here. Thanks. diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/ops.c b/drivers/net/wireless/intel/iwlwifi/mvm/ops.c index 80a0c3d..377ea55 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/ops.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/ops.c @@ -1269,6 +1269,8 @@ static bool iwl_mvm_set_hw_rfkill_state(struct iwl_op_mode *op_mode, bool state) struct iwl_mvm *mvm = IWL_OP_MODE_GET_MVM(op_mode); bool calibrating = ACCESS_ONCE(mvm->calibrating); + pr_err("%s %d %d\n", __func__, __LINE__, state); + if (state) set_bit(IWL_MVM_STATUS_HW_RFKILL, &mvm->status); else diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c index 15bb7ae..cf1b179 100644 --- a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c +++ b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c @@ -1124,6 +1124,8 @@ static void _iwl_trans_pcie_stop_device(struct iwl_trans *trans, bool low_power) lockdep_assert_held(&trans_pcie->mutex); + dump_stack(); + if (trans_pcie->is_down) return; @@ -1370,8 +1372,10 @@ void iwl_trans_pcie_rf_kill(struct iwl_trans *trans, bool state) lockdep_assert_held(&trans_pcie->mutex); - if (iwl_op_mode_hw_rf_kill(trans->op_mode, state)) + if (iwl_op_mode_hw_rf_kill(trans->op_mode, state)) { + pr_err("%s %d %d\n", __func__, __LINE__, state); _iwl_trans_pcie_stop_device(trans, true); + } } static void iwl_trans_pcie_d3_suspend(struct iwl_trans *trans, bool test,
Created attachment 217821 [details] Updated log I couldn't apply your patch directly, but I manually made the changes to the kernel 4.5.4 source code that I'm using. Here is an updated log file made with `journalctl -b | grep iwlwifi`.
Created attachment 217831 [details] Complete boot log Since the previous log appears to be missing some of the new output, I've attached the full boot log.
something looks really weird here. I'll need to get back to you with another patch.... after my week end :)
I am very sorry for the time it took to get back to you. I am not sure how much time I will have to work on this bug, but I'd like you to try our latest code. We saw (and fixed) issues that could very well be the same as the issue you are facing. You can find our latest code as a backport tree here: https://git.kernel.org/cgit/linux/kernel/git/iwlwifi/backport-iwlwifi.git/ Please take the master branch and let us know. We found very subtle races in the firmware loading flow which is exactly the flow that is failing for you. Thank you!
Do you have instructions on how to properly compile the module from the backport tree? When I try doing it, I get the following error: $ uname -a Linux leda.scit.us 4.6.3-1-ARCH #1 SMP PREEMPT Fri Jun 24 21:19:13 CEST 2016 x86_64 GNU/Linux $ make menuconfig $ make M=drivers/net/wireless/intel/iwlwifi make[5]: 'conf' is up to date. # # configuration written to .config # Building backport-include/backport/autoconf.h ... done. CC [M] /home/reed/Kernel/backport-iwlwifi/compat/main.o CC [M] /home/reed/Kernel/backport-iwlwifi/compat/backport-4.7.o In file included from include/linux/sched.h:57:0, from include/linux/kasan.h:4, from include/linux/slab.h:118, from /home/reed/Kernel/backport-iwlwifi/backport-include/linux/slab.h:3, from include/linux/textsearch.h:8, from include/linux/skbuff.h:30, from /home/reed/Kernel/backport-iwlwifi/backport-include/linux/skbuff.h:3, from /home/reed/Kernel/backport-iwlwifi/compat/backport-4.7.c:11: include/net/scm.h: In function ‘scm_recv’: /home/reed/Kernel/backport-iwlwifi/backport-include/linux/cred.h:7:44: error: ‘struct nsproxy’ has no member named ‘user_ns’; did you mean ‘uts_ns’? #define current_user_ns() (current->nsproxy->user_ns) ^ include/net/scm.h:121:39: note: in expansion of macro ‘current_user_ns’ struct user_namespace *current_ns = current_user_ns(); ^~~~~~~~~~~~~~~ make[6]: *** [scripts/Makefile.build:292: /home/reed/Kernel/backport-iwlwifi/compat/backport-4.7.o] Error 1 make[5]: *** [scripts/Makefile.build:440: /home/reed/Kernel/backport-iwlwifi/compat] Error 2 make[4]: *** [Makefile:1429: _module_/home/reed/Kernel/backport-iwlwifi] Error 2 make[3]: *** [Makefile.build:6: modules] Error 2 make[2]: *** [Makefile.real:100: modules] Error 2 make[1]: *** [Makefile:43: modules] Error 2 make: *** [Makefile:30: default] Error 2
This can happen when the upstream backport project breaks or when distribution merge patches that change APIs. I can suggest you move to a stable vanilla kernel, or to try to fix the compilation error manually. All the fixes I was thinking about are included in iwlwifi-next.git, but I wouldn't recommend you run that kernel as is.
I don't think I have the time to figure out how to get this to work with Arch Linux. I guess I will wait for Arch to pull in the updates.
In that case, I am afraid I'll have to close the bug. You can re-open it if you need. The fixes you needed are: https://git.kernel.org/cgit/linux/kernel/git/iwlwifi/iwlwifi-next.git/commit/?id=f16c3ebfa64fdf0e2dc88e6baa72da95ab70ffd7 https://git.kernel.org/cgit/linux/kernel/git/iwlwifi/iwlwifi-next.git/commit/?id=2aabdbdc17b7c53490337bfc58de3409c84d85d2
I installed those patches on the 4.6.4 kernel. Fingers crossed.
With the patches installed on the 4.6.4 kernel, I'm still experiencing the same intermittent failures.
Same reproduction rate? Can you share the dmesg output?
Same reproduction rate. Do you need to DEBUG information or just the dmesg output?
let's start by the dmesg output, although it will surely not suffice.
Created attachment 227421 [details] DMESG output after applying patches
Ok, as expected, can you try again with the patches with debug=0xffffffff ?
Any news?
I'm still working on finding time to install patched drivers with debug mode enabled. Maybe sometime next week.
Ping?
I've been running the 4.7.2+ kernel for a few weeks and haven't noticed it. I suspect the issue has been fixed.
Yes, we have a fix in v4.8, commit f16c3ebfa64f ("iwlwifi: pcie: fix a race in firmware loading flow"), which was backported to 4.7.2 as commit 3de99eb07b92. Unfortunately, neither 4.5 (where the bug was introduced) nor 4.6 were being updated anymore when the backport reached stable releases. So, the only recommendation we can give for those kernels is to upgrade to 4.7.2+.