Bug 114501 - iwlwifi: 7265D: WDG_NMI in lmpmBtCoexHalIsBtIdle() - WIFILNX-10
Status: CLOSED UNREPRODUCIBLE
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless
Hardware: Intel Linux
Importance: P1 high
Assignee: DO NOT USE - assign "network-wireless-intel" component instead
URL:
Keywords:
Duplicates: 179901
Depends on:
Blocks:
 
Reported: 2016-03-13 14:45 UTC by Michał Gawron
Modified: 2018-06-17 05:16 UTC
CC List: 6 users

See Also:
Kernel Version: 4.4.5
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg log (98.42 KB, application/octet-stream)
2016-03-13 14:45 UTC, Michał Gawron
Kernel log after startup (59.64 KB, application/octet-stream)
2016-03-14 16:31 UTC, Michał Gawron
Full dmesg (364.64 KB, application/octet-stream)
2016-03-16 17:03 UTC, Michał Gawron
latest BT firmware - D0 Patch Version: 35 (37.39 KB, application/octet-stream)
2016-03-23 20:27 UTC, Emmanuel Grumbach
Full dmesg 2016-04-14 (874.89 KB, text/plain)
2016-04-14 16:08 UTC, Michał Gawron
Core14 - 7260 FW with a potential fix (1.00 MB, application/octet-stream)
2016-11-29 13:06 UTC, Emmanuel Grumbach
New 7265D firmware with a potential fix. (1.32 MB, application/octet-stream)
2016-11-29 13:10 UTC, Luca Coelho
"journalctl -b 0 | grep iwl" after resuming (90.97 KB, text/plain)
2016-12-11 09:55 UTC, Niklas Sombert
7260 FW with potential fix (1.00 MB, application/octet-stream)
2017-02-12 15:48 UTC, Emmanuel Grumbach
7265D FW with potential fix (1.32 MB, application/octet-stream)
2017-02-12 15:48 UTC, Emmanuel Grumbach

Description Michał Gawron 2016-03-13 14:45:02 UTC
Created attachment 208831 [details]
dmesg log

After several hours, WiFi stops functioning. ifconfig -a no longer shows the wireless interface. A reboot is needed to get WiFi working again. (I tried unloading the iwl-* modules, but they were reported as "in use".)

I've verified that it is stable with the following packages:
  linux-firmware-20150904.6ebf5d5-1
  linux-4.2.5

It stopped working when I upgraded my system to the following versions:
  linux-firmware-20160113.40e9ae8-1
  linux-4.4.5-1

The bug was presumably introduced somewhere between these versions.
The distribution I use is Arch Linux.

lspci:
00:00.0 Host bridge: Intel Corporation Broadwell-U Host Bridge -OPI (rev 09)
00:02.0 VGA compatible controller: Intel Corporation Broadwell-U Integrated Graphics (rev 09)
00:03.0 Audio device: Intel Corporation Broadwell-U Audio Controller (rev 09)
00:16.0 Communication controller: Intel Corporation Wildcat Point-LP MEI Controller #1 (rev 03)
00:19.0 Ethernet controller: Intel Corporation Ethernet Connection (3) I218-V (rev 03)
00:1b.0 Audio device: Intel Corporation Wildcat Point-LP High Definition Audio Controller (rev 03)
00:1c.0 PCI bridge: Intel Corporation Wildcat Point-LP PCI Express Root Port #2 (rev e3)
00:1c.1 PCI bridge: Intel Corporation Wildcat Point-LP PCI Express Root Port #3 (rev e3)
00:1d.0 USB controller: Intel Corporation Wildcat Point-LP USB EHCI Controller (rev 03)
00:1f.0 ISA bridge: Intel Corporation Wildcat Point-LP LPC Controller (rev 03)
00:1f.2 SATA controller: Intel Corporation Wildcat Point-LP SATA Controller [AHCI Mode] (rev 03)
00:1f.3 SMBus: Intel Corporation Wildcat Point-LP SMBus Controller (rev 03)
00:1f.6 Signal processing controller: Intel Corporation Wildcat Point-LP Thermal Management Controller (rev 03)
04:00.0 Network controller: Intel Corporation Wireless 7265 (rev 59)

dmesg log attached as file.
Comment 1 Emmanuel Grumbach 2016-03-14 15:30:30 UTC
Please attach the kernel log. The dmesg buffer was filled with warnings, so I can't see where the problem starts.
Thanks
Comment 2 Michał Gawron 2016-03-14 16:31:01 UTC
Created attachment 209121 [details]
Kernel log after startup

Adding kernel log from after reboot.
Comment 3 Emmanuel Grumbach 2016-03-14 17:58:37 UTC
Yes, but I'd like to see the beginning of the errors.
The logs you have here are either completely clean or start when the system is already messed up.
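One way to locate where the errors begin in a large saved log is sketched below. This is only an illustration: the helper name first_match_with_context is made up for this note, and the search string that marks the first failure is left for the reader to choose.

```python
from collections import deque


def first_match_with_context(lines, needle, before=20):
    """Return the first line containing `needle`, preceded by up to
    `before` lines of context. Returns an empty list if nothing matches.

    Hypothetical helper for illustration; not part of any kernel tool.
    """
    context = deque(maxlen=before)  # keeps only the most recent lines
    for line in lines:
        if needle in line:
            return list(context) + [line]
        context.append(line)
    return []
```

It could be fed the lines of a saved kernel log, with a search string such as "iwlwifi", so that the attached excerpt starts shortly before the first error instead of in the middle of the warnings.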
Comment 4 Michał Gawron 2016-03-14 18:01:09 UTC
OK. But the first dmesg was taken right after WiFi stopped working. ;-) I'll see if the kernel log is dumped to disk somewhere…
Comment 5 Michał Gawron 2016-03-16 17:03:41 UTC
Created attachment 209471 [details]
Full dmesg
Comment 6 Emmanuel Grumbach 2016-03-16 18:20:24 UTC
Thanks. I'll take a look. Do you use Bluetooth?
Comment 8 Michał Gawron 2016-03-16 20:17:34 UTC
Hmm. It looks like I have Bluetooth disabled, and I don't use it with any devices.
Comment 9 Emmanuel Grumbach 2016-03-16 21:41:07 UTC
Thanks - this is good data. I will open a ticket with the firmware team.
Would you be able to test Core18 firmware with a backport-based driver?

https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/core_release#core_release

Note: you have a 7265D device.

You can also check newer firmware without changing the driver by testing firmware versions in the table on the same wiki page.
4.4 supports -19.ucode.
Comment 10 Emmanuel Grumbach 2016-03-17 07:54:03 UTC
I also noticed that the upgrade you made changed a lot of components at the same time: WiFi driver, WiFi firmware and Bluetooth firmware.

Can you try to remove iwlwifi-7265D-16.ucode from /lib/firmware (or rename it) and try to see if the problem goes away? You'll use an older firmware then.
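The fallback this suggestion relies on can be sketched as follows: the driver asks for iwlwifi-<device>-<api>.ucode starting at the highest API version it supports and works downward until a file is found. The function below is a simplified model for illustration only; pick_ucode is a made-up name, and the api_min value used in the usage note is an assumption, not taken from the 4.4 driver.

```python
import re


def pick_ucode(available, device, api_max, api_min):
    """Return the firmware file name that would be loaded, or None.

    Simplified model (assumption): try API versions from api_max down to
    api_min and take the first file present in `available`.
    """
    pattern = re.compile(r"^iwlwifi-" + re.escape(device) + r"-(\d+)\.ucode$")
    present = set()
    for name in available:
        match = pattern.match(name)
        if match:
            present.add(int(match.group(1)))
    for api in range(api_max, api_min - 1, -1):
        if api in present:
            return f"iwlwifi-{device}-{api}.ucode"
    return None
```

For example, with -16 and -13 files for 7265D installed, api_max=19 and an illustrative api_min=13, the sketch selects the -16 file; once that file is renamed or removed, the same call falls back to -13.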
Comment 11 Michał Gawron 2016-03-18 06:38:59 UTC
I'm going to test by removing iwlwifi-7265D-16.ucode first.
Comment 12 Michał Gawron 2016-03-19 16:08:43 UTC
Removing -16 didn't help. I'll test the newest firmware and kernel modules.
Comment 13 Emmanuel Grumbach 2016-03-19 16:59:25 UTC
It is probably due to the Bluetooth firmware update then. You already have the latest firmware. You need to roll back Bluetooth firmware.
Comment 14 Michał Gawron 2016-03-23 17:47:48 UTC
I moved /usr/lib/firmware/intel/ibt-hw-37.8.10-fw-1.10.3.11.e.bseq out of that directory, and it seems to have helped. In the couple of days since, I haven't had any more problems with WiFi.
Comment 15 Emmanuel Grumbach 2016-03-23 17:56:58 UTC
Thanks. I already involved the BT team. Thank you.
Comment 16 Emmanuel Grumbach 2016-03-23 20:27:33 UTC
Created attachment 210471 [details]
latest BT firmware - D0 Patch Version: 35

Please try this firmware - thank you.
Comment 17 Emmanuel Grumbach 2016-03-27 08:28:49 UTC
Did you get a chance to try?
Comment 18 Michał Gawron 2016-04-13 18:00:51 UTC
Sorry for the delay. I'll test the firmware now.
Comment 19 Michał Gawron 2016-04-14 16:07:13 UTC
I just got yet another stack trace, and WiFi/BT stopped working again. Attaching a new kernel log.
Comment 20 Michał Gawron 2016-04-14 16:08:02 UTC
Created attachment 212701 [details]
Full dmesg 2016-04-14
Comment 21 Emmanuel Grumbach 2016-04-15 12:49:23 UTC
Thanks. I'll inform the BT team.
Comment 22 Emmanuel Grumbach 2016-10-22 16:46:48 UTC
*** Bug 179901 has been marked as a duplicate of this bug. ***
Comment 23 Emmanuel Grumbach 2016-11-29 13:06:58 UTC
Created attachment 246251 [details]
Core14 - 7260 FW with a potential fix

Hello,

Our firmware team would like you to test the attached firmware. Can you please test it and report back?

Please remember to re-enable the BT firmware if you had disabled it.

Thanks!
Comment 26 Emmanuel Grumbach 2016-12-06 04:58:24 UTC
Any news?
Comment 27 Niklas Sombert 2016-12-09 15:26:42 UTC
The new firmware you have uploaded says 7260. Will it work with a 7265D?
If so, then I could test this next week.
Comment 28 Luca Coelho 2016-12-09 15:44:52 UTC
Comment on attachment 246261 [details]
New 7265D firmware with a potential fix.

Ah, it's a mistake.  The 7260 FW will not work with 7265D.  I had uploaded the one for 7265D at the same time as Emmanuel uploaded the one for 7260, so I hid my attachment in favor of his.  I'm making it visible again so you can try it.
Comment 29 Niklas Sombert 2016-12-09 17:14:15 UTC
So, how do I confirm whether this firmware will work?

1. Reinstall and boot the latest kernel that my distribution ships, and reinstall the latest firmware files, to confirm that the problem (still) happens.
2. Put your iwlwifi-7265D-17.ucode in /lib/firmware/ (overwriting the existing one).
3. Delete /lib/firmware/iwlwifi-7265D-22.ucode and /lib/firmware/iwlwifi-7265D-21.ucode.
4. Reboot.
5. Try suspend and resume.

Is this the correct way to go?
Comment 30 Emmanuel Grumbach 2016-12-10 17:28:26 UTC
Yes - this looks correct.
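Step 3 of the procedure above could also be done non-destructively by moving the newer firmware files aside rather than deleting them, so they can be restored after the test. The sketch below is an illustration under stated assumptions: sideline_newer_ucode and the backup directory are hypothetical, and on a real system the script would need permission to write to the firmware directory.

```python
import re
import shutil
from pathlib import Path


def sideline_newer_ucode(firmware_dir, device, keep_api, backup_dir):
    """Move ucode files for `device` whose API version is above `keep_api`
    into `backup_dir`, and return the names of the files that were moved.

    Hypothetical helper for illustration; not an official tool.
    """
    firmware_dir, backup_dir = Path(firmware_dir), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    pattern = re.compile(r"^iwlwifi-" + re.escape(device) + r"-(\d+)\.ucode$")
    moved = []
    # sorted() materializes the listing before any file is moved
    for path in sorted(firmware_dir.iterdir()):
        match = pattern.match(path.name)
        if match and int(match.group(1)) > keep_api:
            shutil.move(str(path), str(backup_dir / path.name))
            moved.append(path.name)
    return moved
```

Called with device "7265D" and keep_api=17, this would move the -21 and -22 files out of the way and leave the test -17 firmware as the newest one present.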
Comment 31 Niklas Sombert 2016-12-11 09:55:52 UTC
Created attachment 247411 [details]
"journalctl -b 0 | grep iwl" after resuming
Comment 32 Niklas Sombert 2016-12-11 09:58:43 UTC
This didn't work for me.

The only remaining workarounds are deleting /lib/firmware/intel/ibt-hw-37.8.10-fw-1.10.3.11.e.bseq or reverting http://kernel.ubuntu.com/git/ubuntu/ubuntu-wily.git/commit/?id=8d193ca26cc28019e760b77830295a0c349d90dc.
Comment 33 Luca Coelho 2017-01-11 12:01:52 UTC
We will try to check why reverting this patch actually helps.  Thanks for informing us of that.
Comment 34 Luca Coelho 2017-01-23 09:24:01 UTC
Sorry, this is still pending on our side.  I'm trying to push it forward.
Comment 35 Emmanuel Grumbach 2017-02-12 15:48:04 UTC
Created attachment 254707 [details]
7260 FW with potential fix

Hi,

Our firmware team has provided another firmware with a potential fix. Can you give it a try?

thanks.
Comment 36 Emmanuel Grumbach 2017-02-12 15:48:54 UTC
Created attachment 254709 [details]
7265D FW with potential fix
Comment 37 Niklas Sombert 2017-02-15 10:26:30 UTC
I'm sorry, I wasn't able to reproduce this.

I tried both with my current setup and with the same software I used when I first encountered the bug.
Comment 38 Emmanuel Grumbach 2017-02-15 10:46:32 UTC
Are you sure you haven't removed the BT firmware?
Comment 39 Niklas Sombert 2017-02-16 14:04:21 UTC
I've tried it both with my current setup (WiFi: 22.361476.0; Bluetooth: ibt-hw-37.8.10-fw-1.10.3.11.e.bseq; kernel: 4.8.0-37) and with a live USB stick which is close to the software I ran last year (WiFi: 16.242414.0; Bluetooth: ibt-hw-37.8.10-fw-1.10.3.11.e.bseq; kernel: 4.4.0-31).

I really don't know why, but it seems to work now.
Comment 40 Emmanuel Grumbach 2017-02-16 14:09:20 UTC
This is quite possible. The bug seems to be a race between WiFi and BT loading their firmware at the exact same time...

Anyone else before I close the bug?
Comment 41 Emmanuel Grumbach 2017-02-19 07:17:02 UTC
Ok - I am closing the bug here.
I'd be happy to hear from other users whether they can reproduce the issue on the original firmware and whether the new firmware fixes it.
