Bug 106991 - iwlwifi: 7260: Can't load the firmware when interrupted by RFKILL - MWG100249477
Summary: iwlwifi: 7260: Can't load the firmware when interrupted by RFKILL - MWG100249477
Status: CLOSED WILL_NOT_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless
Hardware: Intel Linux
Importance: P1 normal
Assignee: DO NOT USE - assign "network-wireless-intel" component instead
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-11-01 08:53 UTC by Shawn Starr
Modified: 2016-01-31 13:47 UTC
CC List: 2 users

See Also:
Kernel Version: 4.3.0-rc7
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
Kernel bootup (dmesg) (124.58 KB, text/x-log)
2015-11-01 19:05 UTC, Shawn Starr
Failed attempts to bring up wifi card (dmesg) (158.37 KB, text/x-log)
2015-11-01 22:22 UTC, Shawn Starr
add more prints (1.12 KB, patch)
2015-11-08 07:58 UTC, Emmanuel Grumbach
good bootup (74.37 KB, text/plain)
2015-12-06 10:12 UTC, Shawn Starr

Description Shawn Starr 2015-11-01 08:53:34 UTC
Someone else with a Dell laptop reported this here: http://ubuntuforums.org/showthread.php?t=2301239

The same workaround applies for me also.

Latest BIOS from October 2015, Precision M6800:

03:00.0 Network controller: Intel Corporation Wireless 7260 (rev 73)
        Subsystem: Intel Corporation Dual Band Wireless-AC 7260
        Flags: fast devsel
        Memory at f7d00000 (64-bit, non-prefetchable) [size=8K]
        Capabilities: [c8] Power Management version 3
        Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [40] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Device Serial Number 48-51-b7-ff-ff-bf-c2-b4
        Capabilities: [14c] Latency Tolerance Reporting
        Capabilities: [154] Vendor Specific Information: ID=cafe Rev=1 Len=014 <?>
Comment 1 Shawn Starr 2015-11-01 09:06:22 UTC
[    9.162741] cfg80211: World regulatory domain updated:
[    9.163041] cfg80211:  DFS Master region: unset
[    9.163305] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp), (dfs_cac_time)
[    9.163860] cfg80211:   (2402000 KHz - 2472000 KHz @ 40000 KHz), (N/A, 2000 mBm), (N/A)
[    9.164300] cfg80211:   (2457000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm), (N/A)
[    9.164757] cfg80211:   (2474000 KHz - 2494000 KHz @ 20000 KHz), (N/A, 2000 mBm), (N/A)
[    9.165207] cfg80211:   (5170000 KHz - 5250000 KHz @ 80000 KHz, 160000 KHz AUTO), (N/A, 2000 mBm), (N/A)
[    9.165715] cfg80211:   (5250000 KHz - 5330000 KHz @ 80000 KHz, 160000 KHz AUTO), (N/A, 2000 mBm), (0 s)
[    9.166241] cfg80211:   (5490000 KHz - 5730000 KHz @ 160000 KHz), (N/A, 2000 mBm), (0 s)
[    9.166675] cfg80211:   (5735000 KHz - 5835000 KHz @ 80000 KHz), (N/A, 2000 mBm), (N/A)
[    9.167146] cfg80211:   (57240000 KHz - 63720000 KHz @ 2160000 KHz), (N/A, 0 mBm), (N/A)
[    9.221272] Intel(R) Wireless WiFi driver for Linux
[    9.222364] Copyright(c) 2003- 2015 Intel Corporation
[    9.223580] iwlwifi 0000:03:00.0: enabling device (0000 -> 0002)
[    9.226354] iwlwifi 0000:03:00.0: Direct firmware load for iwlwifi-7260-17.ucode failed with error -2
[    9.227832] iwlwifi 0000:03:00.0: Direct firmware load for iwlwifi-7260-16.ucode failed with error -2
[    9.229180] iwlwifi 0000:03:00.0: Direct firmware load for iwlwifi-7260-15.ucode failed with error -2
[    9.230489] iwlwifi 0000:03:00.0: Direct firmware load for iwlwifi-7260-14.ucode failed with error -2
[    9.343361] iwlwifi 0000:03:00.0: loaded firmware version 25.30.13.0 op_mode iwlmvm

[   10.210424] iwlwifi 0000:03:00.0: Detected Intel(R) Dual Band Wireless AC 7260, REV=0x144
[   10.211680] iwlwifi 0000:03:00.0: L1 Enabled - LTR Enabled
[   10.213043] iwlwifi 0000:03:00.0: RF_KILL bit toggled to disable radio.
[   10.213083] iwlwifi 0000:03:00.0: L1 Enabled - LTR Enabled
....

[   15.214883] iwlwifi 0000:03:00.0: Failed to load firmware chunk!
[   15.216079] iwlwifi 0000:03:00.0: Could not load the [0] uCode section
[   15.217151] iwlwifi 0000:03:00.0: Failed to start INIT ucode: -110
[   15.218559] iwlwifi 0000:03:00.0: Failed to run INIT ucode: -110
[   15.219674] iwlwifi 0000:03:00.0: L1 Enabled - LTR Enabled

 rfkill list
1: dell-wifi: Wireless LAN
        Soft blocked: yes
        Hard blocked: no
2: dell-bluetooth: Bluetooth
        Soft blocked: no
        Hard blocked: no
3: hci0: Bluetooth
        Soft blocked: no
        Hard blocked: no

The firmware does not always load; sometimes it works.
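A note on the numeric codes in the log above: on Linux, -2 is ENOENT (the requested firmware file was not found) and -110 is ETIMEDOUT. The short Python sketch below pulls those codes out of dmesg-style lines. The function name, the regular expression, and the hard-coded table are illustrative assumptions, not part of any kernel tooling.

```python
import re

# Linux errno values seen in the log above. Hard-coded (an assumption of this
# sketch) so the result is the same on hosts where the numbers differ.
LINUX_ERRNO = {
    2: "ENOENT",      # No such file or directory
    110: "ETIMEDOUT", # Connection timed out
}

# Matches "... failed with error -2" and "... INIT ucode: -110".
ERROR_RE = re.compile(r"(?:failed with error|ucode:)\s*(-\d+)")

def decode_errors(dmesg_text):
    """Return a list of (errno_number, errno_name) found in the given log text."""
    found = []
    for line in dmesg_text.splitlines():
        match = ERROR_RE.search(line)
        if match:
            code = -int(match.group(1))
            found.append((code, LINUX_ERRNO.get(code, "UNKNOWN")))
    return found

SAMPLE_LOG = """\
[    9.226354] iwlwifi 0000:03:00.0: Direct firmware load for iwlwifi-7260-17.ucode failed with error -2
[   15.217151] iwlwifi 0000:03:00.0: Failed to start INIT ucode: -110
"""

if __name__ == "__main__":
    print(decode_errors(SAMPLE_LOG))
```

The -2 lines are harmless here, since the log shows the driver going on to load firmware version 25.30.13.0; the -110 is the actual failure.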
Comment 2 Shawn Starr 2015-11-01 09:13:03 UTC
If I remove iwlmvm and remove iwlwifi then reload iwlmvm:


[  491.884765] cfg80211: Regulatory domain changed to country: CA
[  491.885064] cfg80211:  DFS Master region: FCC
[  491.885279] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp), (dfs_cac_time)
[  491.885786] cfg80211:   (2402000 KHz - 2472000 KHz @ 40000 KHz), (N/A, 3000 mBm), (N/A)
[  491.886189] cfg80211:   (5170000 KHz - 5250000 KHz @ 80000 KHz, 160000 KHz AUTO), (N/A, 1700 mBm), (N/A)
[  491.886697] cfg80211:   (5250000 KHz - 5330000 KHz @ 80000 KHz, 160000 KHz AUTO), (N/A, 2400 mBm), (0 s)
[  491.887215] cfg80211:   (5490000 KHz - 5730000 KHz @ 160000 KHz), (N/A, 2400 mBm), (0 s)
[  491.887603] cfg80211:   (5735000 KHz - 5835000 KHz @ 80000 KHz), (N/A, 3000 mBm), (N/A)
[  507.685276] iwlwifi 0000:03:00.0: L1 Enabled - LTR Enabled
[  519.836377] Intel(R) Wireless WiFi driver for Linux
[  519.836650] Copyright(c) 2003- 2015 Intel Corporation
[  519.837477] iwlwifi 0000:03:00.0: Direct firmware load for iwlwifi-7260-17.ucode failed with error -2
[  519.837960] iwlwifi 0000:03:00.0: Direct firmware load for iwlwifi-7260-16.ucode failed with error -2
[  519.838446] iwlwifi 0000:03:00.0: Direct firmware load for iwlwifi-7260-15.ucode failed with error -2
[  519.838905] iwlwifi 0000:03:00.0: Direct firmware load for iwlwifi-7260-14.ucode failed with error -2
[  519.839585] iwlwifi 0000:03:00.0: loaded firmware version 25.30.13.0 op_mode iwlmvm
[  519.893973] iwlwifi 0000:03:00.0: Detected Intel(R) Dual Band Wireless AC 7260, REV=0x144
[  519.894459] iwlwifi 0000:03:00.0: L1 Enabled - LTR Enabled
[  519.895029] iwlwifi 0000:03:00.0: RF_KILL bit toggled to disable radio.
[  519.895397] iwlwifi 0000:03:00.0: L1 Enabled - LTR Enabled
[  519.909669] ieee80211 phy5: Selected rate control algorithm 'iwl-mvm-rs'
[  519.911584] iwlwifi 0000:03:00.0 wlp3s0: renamed from wlan0
[  519.926568] IPv6: ADDRCONF(NETDEV_UP): wlp3s0: link is not ready

# rfkill list
1: dell-wifi: Wireless LAN
        Soft blocked: no
        Hard blocked: no
2: dell-bluetooth: Bluetooth
        Soft blocked: no
        Hard blocked: no
3: hci0: Bluetooth
        Soft blocked: no
        Hard blocked: no
5: phy5: Wireless LAN
        Soft blocked: yes
        Hard blocked: yes
Comment 3 Emmanuel Grumbach 2015-11-01 17:32:19 UTC
You seem to have a race between driver load and RFKILL interrupt. What is triggering the RFKILL state?

What is the reproduction rate?
Comment 4 Shawn Starr 2015-11-01 18:28:27 UTC
It happens more than 50% of the time. I can power the laptop off and on again; sometimes it loads the firmware, sometimes not.
Comment 5 Shawn Starr 2015-11-01 18:30:03 UTC
I don't know what's triggering the RFKILL state.
Comment 6 Emmanuel Grumbach 2015-11-01 18:37:14 UTC
What happens when you load iwlwifi manually?
Comment 7 Shawn Starr 2015-11-01 18:43:49 UTC
It does not always work; it throws the "Failed to load firmware chunk!" error.

If you use IRC, is there a channel you're on?
Comment 8 Emmanuel Grumbach 2015-11-01 18:54:08 UTC
No irc :)

Please load iwlwifi with debug=0xffffffff

And send the full dmesg output when it fails. Thank you.
Comment 9 Shawn Starr 2015-11-01 19:05:03 UTC
Created attachment 191801
Kernel bootup (dmesg)

Initially it booted up with the firmware loaded OK. I then removed the driver and reloaded it, which showed the failure (this test was from a cold boot).
Comment 10 Emmanuel Grumbach 2015-11-01 21:40:02 UTC
So this clearly shows that the RFKILL interrupt is happening during the boot up of the device. This shouldn't be a problem, but apparently, it is...
I don't see why we don't get the interrupt from the DMA engine to load the firmware.
I'll dig a bit and try to reproduce.
Comment 11 Emmanuel Grumbach 2015-11-01 21:45:37 UTC
Even worse... when we re-enable the interrupt, we do get the DMA interrupt right away...

Please re-try with that patch:

diff --git a/drivers/net/wireless/iwlwifi/pcie/trans.c b/drivers/net/wireless/iwlwifi/pcie/trans.c
index c590fd8..7c104e1 100644
--- a/drivers/net/wireless/iwlwifi/pcie/trans.c
+++ b/drivers/net/wireless/iwlwifi/pcie/trans.c
@@ -657,6 +657,7 @@ static int iwl_pcie_load_firmware_chunk(struct iwl_trans *trans, u32 dst_addr,
                                 trans_pcie->ucode_write_complete, 5 * HZ);
        if (!ret) {
                IWL_ERR(trans, "Failed to load firmware chunk!\n");
+               iwl_pcie_dump_csr(trans);
                return -ETIMEDOUT;
        }
Comment 12 Shawn Starr 2015-11-01 21:52:02 UTC
Building kernel now and will do same steps as before for debug.
Comment 13 Emmanuel Grumbach 2015-11-01 21:53:48 UTC
you can use our backport tree to save time:

https://git.kernel.org/cgit/linux/kernel/git/iwlwifi/backport-iwlwifi.git/
Comment 14 Shawn Starr 2015-11-01 21:57:09 UTC
I have linux-4.3.0-0.rc7.git2.2.fc23.x86_64 mostly ccached so building isn't an issue, I can however build from that git branch next
Comment 15 Shawn Starr 2015-11-01 22:22:30 UTC
Created attachment 191811
Failed attempts to bring up wifi card (dmesg)

The card failed to come up after repeatedly unloading and reloading the driver with debugging on.
Comment 16 Shawn Starr 2015-11-01 22:23:13 UTC
Emmanuel, I'll now try your git kernel branch once you let me know if you find something odd in my output.
Comment 17 Emmanuel Grumbach 2015-11-02 06:47:29 UTC
INT_MASK is clear. This explains why you don't get the interrupt.
The question is why the interrupt mask is clear.
I'll dig into the code a bit later.
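The failure mode discussed here can be illustrated with a small simulation. The patch context in comment 11 shows the driver waiting on `ucode_write_complete` with a `5 * HZ` timeout and returning `-ETIMEDOUT` when the wait expires, which matches the five-second gap in the original log. The Python sketch below is only a model of that pattern, not the driver code: `load_chunk`, `interrupt_enabled`, and the delays are hypothetical names and values chosen for illustration.

```python
import threading

ETIMEDOUT = 110  # Linux errno value, hard-coded for portability

def load_chunk(interrupt_enabled, timeout_s=0.5, dma_delay_s=0.01):
    """Model one firmware-chunk load: start the DMA, then wait for completion.

    Returns 0 on success or -ETIMEDOUT if the completion flag is never set.
    """
    write_complete = threading.Event()

    def dma_finished():
        # Stands in for the interrupt handler. If the interrupt is masked
        # (INT_MASK clear), the handler never runs and the flag stays unset.
        if interrupt_enabled:
            write_complete.set()

    # The simulated DMA transfer completes shortly after being started.
    timer = threading.Timer(dma_delay_s, dma_finished)
    timer.start()

    # Event.wait returns True if the flag was set, False on timeout.
    completed = write_complete.wait(timeout_s)
    timer.join()
    return 0 if completed else -ETIMEDOUT

if __name__ == "__main__":
    print("interrupt enabled:", load_chunk(True))
    print("interrupt masked: ", load_chunk(False))
```

In this model the DMA transfer itself finishes in both cases; only the notification is lost when the interrupt is masked, so the waiter times out even though the hardware did its part.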
Comment 18 Emmanuel Grumbach 2015-11-08 07:58:17 UTC
Created attachment 192381
add more prints

Can you please reproduce with the patch attached?

Unfortunately, I can't figure out what is happening for now and need more prints.
Thanks.
Comment 19 Shawn Starr 2015-11-17 01:47:54 UTC
I will get back to you on this tomorrow, with 4.4-rc0/4.4-rc1 (if sources hit Fedora).
Comment 20 Emmanuel Grumbach 2015-11-23 07:28:49 UTC
ping? :)
Comment 21 Shawn Starr 2015-11-23 07:54:44 UTC
Let me get back to you on this today (Monday). I also noticed systemd has an rfkill service; I'm unsure whether that is causing some sort of issue. I will disable it too.
Comment 22 Emmanuel Grumbach 2015-11-23 08:00:51 UTC
systemd is surely causing some issue :)
You are getting an RFKILL interrupt at a very unexpected timing.
Comment 23 Shawn Starr 2015-11-25 09:44:51 UTC
I have masked systemd-rfkill. So far it's not failing to load the firmware, but I might be lucky. I think you are right that it is systemd doing it.

Let's keep this open for a bit more before closing.
Comment 24 Emmanuel Grumbach 2015-11-25 09:49:08 UTC
First, I'd like you to show me how to play with rfkill in systemd. I can learn by myself, but if you already know.. :)

Second, systemd is doing something that is weird, but it uncovered a race I'd like to debug further. So can you please provide the info I requested?
If you are tired with it, I won't insist.
Comment 25 Shawn Starr 2015-11-25 10:16:09 UTC
For sure I can, race conditions are ugly. I'll unmask systemd-rfkill again (at least I have a workaround).

I'll get you this output today.
Comment 26 Shawn Starr 2015-11-25 10:26:59 UTC
The systemd service just captures the rfkill state; NetworkManager manages the radio. NM disables the WiFi hardware radio when ethernet is connected (unless you force it on).

Since this laptop has a hardware switch, there are two WiFi states, and NetworkManager manages both:

When the HW switch is turned on, the driver says: iwlwifi 0000:03:00.0: RF_KILL bit toggled to enable radio.

[root@segfault spstarr]# nmcli r 
WIFI-HW  WIFI      WWAN-HW  WWAN     
enabled  disabled  enabled  disabled 

And off:

[root@segfault spstarr]# nmcli r 
WIFI-HW   WIFI      WWAN-HW  WWAN     
disabled  disabled  enabled  disabled 


HW switch turned off:

[root@segfault spstarr]# rfkill list all
2: dell-wifi: Wireless LAN
        Soft blocked: no
        Hard blocked: yes
3: dell-bluetooth: Bluetooth
        Soft blocked: no
        Hard blocked: yes
4: phy0: Wireless LAN
        Soft blocked: no
        Hard blocked: yes

On:

2: dell-wifi: Wireless LAN
        Soft blocked: no
        Hard blocked: no
3: dell-bluetooth: Bluetooth
        Soft blocked: no
        Hard blocked: no
4: phy0: Wireless LAN
        Soft blocked: no
        Hard blocked: no
9: hci0: Bluetooth
        Soft blocked: no
        Hard blocked: no

I also tried using rfkill.master_switch_mode=2 which forces all devices to be unblocked with rfkill.
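To compare the soft and hard block states across captures like the ones above, the `rfkill list` text can be turned into structured records. The following Python sketch assumes the output layout shown in this report (an "index: name: type" header followed by "Soft blocked" and "Hard blocked" lines); the function names are hypothetical.

```python
import re

HEADER_RE = re.compile(r"^(\d+):\s*([^:]+):\s*(.+)$")
STATE_RE = re.compile(r"^(Soft|Hard) blocked:\s*(yes|no)$")

def parse_rfkill_list(text):
    """Parse `rfkill list` output into a list of dicts, one per device."""
    devices = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = HEADER_RE.match(line)
        if header:
            current = {
                "index": int(header.group(1)),
                "name": header.group(2).strip(),
                "type": header.group(3).strip(),
                "soft": None,
                "hard": None,
            }
            devices.append(current)
            continue
        state = STATE_RE.match(line)
        if state and current is not None:
            # Store True when the device reports "yes" for this block type.
            current[state.group(1).lower()] = state.group(2) == "yes"
    return devices

def blocked_devices(devices):
    """Return the names of devices that are soft- or hard-blocked."""
    return [d["name"] for d in devices if d["soft"] or d["hard"]]

SAMPLE = """\
1: dell-wifi: Wireless LAN
        Soft blocked: no
        Hard blocked: no
5: phy5: Wireless LAN
        Soft blocked: yes
        Hard blocked: yes
"""

if __name__ == "__main__":
    print(blocked_devices(parse_rfkill_list(SAMPLE)))
```

The sample is taken from the output in comment 2, where the platform device (dell-wifi) reports unblocked while the iwlwifi phy reports both soft and hard blocked.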
Comment 27 Shawn Starr 2015-11-25 10:32:24 UTC
In addition to unmasking systemd-rfkill I'll remove the rfkill kernel option so we have the same state as before.
Comment 28 Shawn Starr 2015-11-25 21:07:12 UTC
It's not systemd-rfkill; it showed the same error with it disabled on a reboot today...

I will get you the debug info. Are there any known issues with this hardware revision? Maybe Dell will have to provide either a BIOS fix or a new card (if this is not kernel driver specific). If Windows shows the same problem, that would point to firmware; I can test that.
Comment 29 Emmanuel Grumbach 2015-11-25 21:12:08 UTC
I doubt windows will have the problem. The driver flows are completely different and the OEM drivers as well.
Comment 30 Emmanuel Grumbach 2015-12-04 08:21:29 UTC
Please provide the info with the patch attached.
I will close the bug on Sunday otherwise.
Comment 31 Shawn Starr 2015-12-04 17:03:21 UTC
Will collect this for you tonight, sorry about the delay.
Comment 32 Shawn Starr 2015-12-06 06:55:45 UTC
Sunday is today... we're in different timezones :) I'll get this today (it's 1:55am).
Comment 33 Emmanuel Grumbach 2015-12-06 07:32:58 UTC
There is at least one tonight in 48 hours.

Please reopen with data.
Comment 34 Shawn Starr 2015-12-06 08:06:22 UTC
Your patch failed.. fixing it..

drivers/net/wireless/iwlwifi/pcie/trans.c: In function 'iwl_pcie_load_firmware_chunk':
drivers/net/wireless/iwlwifi/pcie/trans.c:653:3: error: implicit declaration of function 'dump_csr' [-Werror=implicit-function-declaration]
   dump_csr(trans);
Comment 35 Shawn Starr 2015-12-06 08:07:32 UTC
I think you meant iwl_pcie_dump_csr()?
Comment 36 Emmanuel Grumbach 2015-12-06 08:08:52 UTC
(In reply to Shawn Starr from comment #35)
> I think you meant iwl_pcie_dump_csr()?

yes - sorry.
Comment 37 Shawn Starr 2015-12-06 08:37:13 UTC
Building again using Fedora's kernel source... stock 4.4-rc3 panics on bootup, which is not good (scsi_lib.c: 1096).
Comment 38 Emmanuel Grumbach 2015-12-06 08:39:23 UTC
not related to this bug.
Comment 39 Shawn Starr 2015-12-06 10:12:49 UTC
Created attachment 196481
good bootup

This is a good bootup
Comment 40 Shawn Starr 2015-12-06 10:30:57 UTC
I'm unable to get a bad bootup right now. Oddly, when iwlwifi is successful, the other oops doesn't happen. I suspect a cascading bug.

I'll build with https://git.kernel.org/cgit/linux/kernel/git/iwlwifi/backport-iwlwifi.git/

Since I need a stable kernel to get you proper information now.
Comment 41 Shawn Starr 2015-12-06 10:40:55 UTC
Can't build your git module with my kernel...

  CC [M]  /root/Sources/backport-iwlwifi/net/mac80211/iface.o
/root/Sources/backport-iwlwifi/net/mac80211/iface.c: In function ‘ieee80211_if_add’:
/root/Sources/backport-iwlwifi/net/mac80211/iface.c:1802:806: error: expected ‘)’ before ‘;’ token

Which kernel is this git branch based on? 4.4 doesn't seem to be working with it?
Comment 42 Emmanuel Grumbach 2015-12-06 11:15:43 UTC
Right.
I am now pushing a backport update that may help.
Backports typically don't work well with the bleeding edge of Linus's tree.
Comment 43 Shawn Starr 2015-12-06 22:35:22 UTC
Same compile error. I'm looking to see if I can find the changed function.
Comment 44 Shawn Starr 2015-12-07 00:56:41 UTC
OK, I've got it working. Here are some steps for anyone finding this via Google :)

1) rpm -ivh /root/kernel-4.4.0-0.rc3.git3.2.fc24.src.rpm
2) rpmbuild -bp --target=$(uname -m) kernel.spec
3) cd $HOME/rpmbuild/BUILD/kernel-4.3.fc23/linux-4.4.0-0.rc3.git3.2.fc23.x86_64
4) vi Makefile --> add the proper matching kernel version '-0.rc3.git3.2.fc24.x86_64'
5) cp /boot/config-4.4.0-0.rc3.git3.2.fc24.x86_64 .config
6) cp /lib/modules/4.4.0-0.rc3.git3.2.fc24.x86_64/build/Module.symvers .
7) cd certs && cp ~/rpmbuild/SOURCES/x509.genkey .
8) cd .. && make certs
9) make modules_prepare
10) make M=drivers/net/wireless/iwlwifi modules
11) ./scripts/sign-file sha256 certs/signing_key.pem certs/signing_key.x509 drivers/net/wireless/iwlwifi/iwlwifi.ko
12) ./scripts/sign-file sha256 certs/signing_key.pem certs/signing_key.x509 drivers/net/wireless/iwlwifi/mvm/iwlmvm.ko
13) ./scripts/sign-file sha256 certs/signing_key.pem certs/signing_key.x509 drivers/net/wireless/iwlwifi/dvm/iwldvm.ko
14) compress kernel modules with xz
15) Replace/move originals out of way
16) depmod -a

Reboot

I will have your output once I get it to fail...
Comment 45 Emmanuel Grumbach 2015-12-07 06:57:18 UTC
you haven't replaced mac80211.ko / cfg80211.ko?
Something looks weird...
Comment 46 Shawn Starr 2015-12-07 08:17:03 UTC
Replaced mac80211.ko and the cfg80211/lib80211 modules, now we'll find out when I get this to fail (today?)
Comment 47 Emmanuel Grumbach 2015-12-16 16:31:43 UTC
So?
Comment 48 Shawn Starr 2015-12-17 01:57:31 UTC
For some reason, I cannot trigger this still with my frozen kernel, no update yet. Unless something we're adding to debug is somehow preventing this race condition?
Comment 49 Shawn Starr 2015-12-17 02:49:21 UTC
Of note, new iwlwifi firmware (-16 ucode) has now been pushed to Fedora.
Comment 50 Emmanuel Grumbach 2015-12-17 06:52:54 UTC
(In reply to Shawn Starr from comment #48)
> For some reason, I cannot trigger this still with my frozen kernel, no
> update yet. Unless something we're adding to debug is somehow preventing
> this race condition?

Don't think so.
Comment 51 Shawn Starr 2015-12-25 10:15:33 UTC
I have discovered something interesting in the Dell BIOS:

Wireless Radio Control - 'If Enabled, this feature will sense the connection of the system to a wired network and subsequently disable the selected wireless radios (WLAN/WWLAN). Upon disconnection from the wired network, the selected radios will be re-enabled'

I have this option set to ON, and I believe it would confuse iwlwifi if the BIOS cut the radio *while* the ethernet port is still connected, which it is.

I have also updated BIOS to A16 as of now (released December 2015).

Let's close this. I suspect iwlwifi has issues with the BIOS yanking the radio without being told.
Comment 52 Emmanuel Grumbach 2015-12-25 14:07:27 UTC
Iwlwifi should handle this nicely. But I don't have the data I need to debug...
Comment 53 Shawn Starr 2015-12-25 22:31:01 UTC
Unfortunately, it is not failing anymore no matter what I do, so I can't give you any further info either way. It's working, so this is good.
