Bug 197061

Summary: iwlwifi: 8260: ASSERT 0x307C - WIFILNX-1474
Product: Drivers Reporter: Francisco Cribari (cribari)
Component: network-wireless Assignee: DO NOT USE - assign "network-wireless-intel" component instead (linuxwifi)
Status: CLOSED UNREPRODUCIBLE    
Severity: high CC: cribari, goodmirek, jeremy, joss, luca
Priority: P1    
Hardware: x86-64   
OS: Linux   
See Also: https://bugzilla.kernel.org/show_bug.cgi?id=197039
Kernel Version: 4.14 RC1 Subsystem:
Regression: Yes Bisected commit-id:
Attachments: ifw-fw-error dump
dmesg_34ucode_4.14rc8_BT_intermittent.tar.gz
iwlwifi-8000C-31.ucode with usniffer enabled
iwlwifi-8000C-34.ucode with usniffer enabled

Description Francisco Cribari 2017-09-28 10:31:39 UTC
Created attachment 258625 [details]
ifw-fw-error dump

I run Manjaro Linux KDE (an Arch derivative with the KDE Plasma desktop environment) on a Samsung notebook that has an Intel 8260 wifi/bluetooth chipset. I have been experiencing crashes when the latest kernels (4.13.X and 4.14 RC1) are used together with the latest firmware (20170907.a61ac5c-1). Interestingly, I do not experience crashes when: (i) I use the latest firmware (20170907.a61ac5c-1) together with the LTS kernel (4.9.X), (ii) I use the previous firmware (20170622.7d2c913-1) with the most recent kernels (4.13.X and 4.14 RC1), or (iii) I use the latest firmware together with the latest kernels, but with bluetooth coexistence disabled (bt_coex_active=0). The crashes take place when I use the latest firmware together with the latest kernels and wifi and bluetooth are used simultaneously with bluetooth coexistence enabled.
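
For anyone wanting to try the workaround in (iii), the option can be made persistent through a modprobe configuration file. This is only a minimal sketch: the helper name `write_coex_conf` and the file name `iwlwifi-coex.conf` are hypothetical, and `/etc/modprobe.d` is assumed to be the distribution's modprobe configuration directory.

```shell
#!/bin/sh
# Write a modprobe options file that disables iwlwifi Bluetooth coexistence.
# The directory argument defaults to /etc/modprobe.d (writing there needs
# root); the file name "iwlwifi-coex.conf" is an arbitrary choice.
write_coex_conf() {
    dir="${1:-/etc/modprobe.d}"
    printf 'options iwlwifi bt_coex_active=0\n' > "$dir/iwlwifi-coex.conf"
}
```

After calling `write_coex_conf` as root, the module has to be reloaded (or the machine rebooted) before the option takes effect.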

I initially thought that the problem was bluetooth-related and filed the following bug report: 

https://bugzilla.kernel.org/show_bug.cgi?id=197039

It was made clear to me, however, that the problem is wifi-related (not bluetooth-related). I was asked to file a new bug, and that's what I am doing. Mr Emmanuel Grumbach provided me with firmware with debug enabled. After compiling the kernel with the appropriate flags and enabling bluetooth coexistence, I used it following the directions available at 

https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging#firmware_debugging

(I created the udev rule described in the above instructions.) 

dmesg info after crash is available at 

https://pastebin.com/kSwpjMME

I am attaching ifw-fw-error dump files (zip). 

The hardware: 

[cribari@darwin5 ~]$ inxi -Fxzc0
System:    Host: darwin5 Kernel: 4.13.3-2-MANJARO x86_64 bits: 64 gcc: 7.2.0
           Desktop: KDE Plasma 5.10.5 (Qt 5.9.1) Distro: Manjaro Linux
Machine:   Device: laptop System: SAMSUNG product: 900X3L v: P05AFN serial: N/A
           Mobo: SAMSUNG model: NP900X3L-KW1BR v: SGL8776A06-C01-G001-S0001+10.0.10586 serial: N/A
           UEFI [Legacy]: American Megatrends v: P05AFN.035.160331.PS date: 03/31/2016
Battery    BAT1: charge: 28.0 Wh 92.0% condition: 30.4/30.0 Wh (101%) model: SAMSUNG SR Real status: Charging
CPU:       Dual core Intel Core i7-6500U (-HT-MCP-) arch: Skylake rev.3 cache: 4096 KB
           flags: (lm nx sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx) bmips: 10372
           clock speeds: max: 3100 MHz 1: 2600 MHz 2: 2600 MHz 3: 2600 MHz 4: 2600 MHz
Graphics:  Card: Intel HD Graphics 520 bus-ID: 00:02.0
           Display Server: x11 (X.Org 1.19.3 ) driver: modesetting Resolution: 1920x1080@60.00hz
           OpenGL: renderer: Mesa DRI Intel HD Graphics 520 (Skylake GT2)
           version: 4.5 Mesa 17.2.1 Direct Render: Yes
Audio:     Card Intel Sunrise Point-LP HD Audio driver: snd_hda_intel bus-ID: 00:1f.3
           Sound: Advanced Linux Sound Architecture v: k4.13.3-2-MANJARO
Network:   Card-1: Intel Wireless 8260 driver: iwlwifi bus-ID: 01:00.0
           IF: wlp1s0 state: up mac: <filter>
           Card-2: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
           driver: r8168 v: 8.044.02-NAPI port: e000 bus-ID: 02:00.0
           IF: enp2s0 state: down mac: <filter>
Drives:    HDD Total Size: 256.1GB (29.2% used)
           ID-1: /dev/sda model: LITEON_CV1 size: 256.1GB
Partition: ID-1: / size: 108G used: 62G (61%) fs: ext4 dev: /dev/sda3
           ID-2: swap-1 size: 9.44GB used: 0.00GB (0%) fs: swap dev: /dev/sda5
Sensors:   System Temperatures: cpu: 42.0C mobo: 41.0C
           Fan Speeds (in rpm): cpu: N/A
Info:      Processes: 220 Uptime: 10:05 Memory: 4053.4/7902.4MB Init: systemd Gcc sys: 7.2.0
           Client: Shell (bash 4.4.121) inxi: 2.3.39

For additional information, see the original bug report: 

https://bugzilla.kernel.org/show_bug.cgi?id=197039
Comment 1 Luca Coelho 2017-10-11 09:02:57 UTC
This is under investigation by our firmware team and they have found a possible culprit.  We'll come back to you soon with more info.
Comment 2 Jeremy Cline 2017-10-12 15:43:59 UTC
Possibly also seen in Fedora as https://bugzilla.redhat.com/show_bug.cgi?id=1501313
Comment 3 Emmanuel Grumbach 2017-10-12 16:31:46 UTC
(In reply to Jeremy Cline from comment #2)
> Possibly also seen in Fedora as
> https://bugzilla.redhat.com/show_bug.cgi?id=1501313

Not related.
Comment 4 GoodMirek 2017-10-30 20:24:09 UTC
The same bug has also affected my system for over a month. I currently use the workaround bt_coex_active=0. The system is otherwise up to date, with the Fedora 26 Test Updates repo. The issue happens with any 4.14 and 4.13 kernel; it does not happen with 4.12 kernels.


inxi -Fxzc0
System:    Host: laptop Kernel: 4.14.0-0.rc6.git3.1.fc28.x86_64 x86_64 bits: 64 gcc: 7.2.1 Desktop: Gnome 3.24.3
           Distro: Fedora release 26 (Twenty Six)
Machine:   Device: laptop System: HP product: HP EliteBook 850 G4 serial: <filter>
           Mobo: HP model: 828C v: KBC Version 45.3C serial: <filter> UEFI: HP v: P78 Ver. 01.08 date: 10/17/2017
Battery    BAT0: charge: 48.0 Wh 100.0% condition: 48.0/48.0 Wh (100%)
           model: Hewlett-Packard Primary status: Full
CPU:       Dual core Intel Core i5-7200U (-HT-MCP-) arch: Kaby Lake rev.9 cache: 3072 KB
           flags: (lm nx sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx) bmips: 10848
           clock speeds: max: 3100 MHz 1: 2700 MHz 2: 2700 MHz 3: 2700 MHz 4: 2700 MHz
Graphics:  Card: Intel HD Graphics 620 bus-ID: 00:02.0
           Display Server: x11 (X.org 1.19.3 ) driver: i915 Resolution: 1920x1080@60.05hz
           OpenGL: renderer: Mesa DRI Intel HD Graphics 620 (Kaby Lake GT2)
           version: 4.5 Mesa 17.2.2 Direct Render: Yes
Audio:     Card Intel Sunrise Point-LP HD Audio driver: snd_hda_intel bus-ID: 00:1f.3
           Sound: Advanced Linux Sound Architecture v: k4.14.0-0.rc6.git3.1.fc28.x86_64
Network:   Card-1: Intel Ethernet Connection (4) I219-V driver: e1000e v: 3.2.6-k bus-ID: 00:1f.6
           IF: enp0s31f6 state: down mac: <filter>
           Card-2: Intel Wireless 8265 / 8275 driver: iwlwifi bus-ID: 02:00.0
           IF: wlp2s0 state: up mac: <filter>
Drives:    HDD Total Size: 256.1GB (39.1% used)
           ID-1: /dev/nvme0n1 model: SAMSUNG_MZVLW256HEHP size: 256.1GB temp: 32C
Partition: ID-1: / size: 120G used: 93G (78%) fs: xfs dev: /dev/dm-2
           ID-2: /boot size: 1018M used: 426M (42%) fs: xfs dev: /dev/nvme0n1p6
RAID:      No RAID devices: /proc/mdstat, md_mod kernel module present
Sensors:   System Temperatures: cpu: 32.5C mobo: 0.0C
           Fan Speeds (in rpm): cpu: N/A
Info:      Processes: 317 Uptime: 11 min Memory: 3927.2/15806.8MB Init: systemd runlevel: 5 Gcc sys: 7.2.1
           Client: Shell (bash 4.4.121) inxi: 2.3.40


iwl firmware:
rpm -qf /lib/firmware/iwlwifi-8265-31.ucode 
iwl7260-firmware-25.30.13.0-77.fc26.noarch

I have just tried iwlwifi-8265-34.ucode; this one crashes immediately after boot for me.
https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/linux-firmware.git/plain/iwlwifi-8265-34.ucode
Comment 5 Emmanuel Grumbach 2017-10-30 20:57:14 UTC
@goodmirek, please share dmesg and open a new bug for the issues with 34.ucode.
Comment 6 GoodMirek 2017-10-30 21:29:50 UTC
34.ucode does work with kernel 4.13.10, but the bug reported here is not solved; the workaround of disabling coexistence is still needed.

34.ucode crashes with 4.14.0 rc6, opening another bug for that.
Comment 7 Emmanuel Grumbach 2017-10-30 22:00:45 UTC
(In reply to goodmirek from comment #6)
> 
> 34.ucode crashes with 4.14.0 rc6, opening another bug for that.

Please make sure to CC linuxwifi@intel.com
Comment 8 GoodMirek 2017-11-11 23:31:46 UTC
The issue is still present. I am not able to debug the driver as I lack the necessary skills. Is there anything else I can do to facilitate the fix?
Comment 9 Emmanuel Grumbach 2017-11-12 04:29:09 UTC
You mean, still present in 34.ucode?
Comment 10 GoodMirek 2017-11-13 12:26:50 UTC
Actually, I do not know. I just realized that kernel 4.13 does not load firmware 34.ucode.
I do not know how to build the firmware with the patch described in https://bugzilla.kernel.org/show_bug.cgi?id=197591 in order to be able to test whether 34.ucode fixes this issue.
Comment 11 Luca Coelho 2017-11-13 12:48:11 UTC
v4.13 loads FW -33.ucode.  v4.14 loads FW -34.ucode.

I'll comment more in bug 197591.
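
To confirm which firmware a given kernel actually picked up, the version can be pulled out of the kernel log. A rough sketch, assuming the driver's usual "loaded firmware version" log line; the function name `loaded_fw_version` is made up for illustration.

```shell
#!/bin/sh
# Print the firmware version(s) that iwlwifi reports having loaded.
# Reads kernel log text on stdin, so it works on live dmesg output or on a
# saved log file alike.
loaded_fw_version() {
    sed -n 's/.*iwlwifi.*loaded firmware version \([^ ]*\).*/\1/p'
}
# Typical use: dmesg | loaded_fw_version
```

The major number at the start of the printed version corresponds to the -NN suffix of the .ucode file name.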
Comment 12 GoodMirek 2017-11-13 18:01:25 UTC
Disclaimer: This is the first time I have compiled anything related to the Linux kernel.

The system under test is the same as in comment 4, with a newer kernel and BT coexistence still switched off.
First, I tried to compile the module from the vendor-provided source code (Fedora). I succeeded and reproduced the issue described in bug 197591. The only difference I observed was the warning that the kernel is tainted due to the iwlwifi module.
Then I applied the patch provided in bug 197591, using `git am <patch>`.
I rebuilt iwlwifi with the patch.
As a baseline, I used:
```
uname -r
4.14.0-0.rc8.git3.1.fc28.x86_64
```
This is not actually vanilla 4.14 RC8, as Fedora regularly adds a number of patches on top of the vanilla kernel.

I copied the build artifacts to the appropriate location:
```
iwlwifi.ko
iwlmvm.ko
iwldvm.ko
```
It was possible to load the `iwlwifi` module with `modprobe`, after unloading `iwlmvm` and `iwlwifi` first.
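
The unload/reload sequence can be sketched roughly as below. The `run` wrapper and the `DRY_RUN` variable are hypothetical conveniences for this example: they let the sequence be printed and inspected without root. The module order is an assumption based on `iwlmvm` depending on `iwlwifi`.

```shell
#!/bin/sh
# Run a command, or just print it when DRY_RUN is set to a non-empty value.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

# Unload iwlmvm first (it depends on iwlwifi), then iwlwifi, then load
# iwlwifi again so the freshly copied .ko files are used.
reload_iwlwifi() {
    run modprobe -r iwlmvm &&
    run modprobe -r iwlwifi &&
    run modprobe iwlwifi
}
```

Calling `DRY_RUN=1 reload_iwlwifi` prints the three commands; without `DRY_RUN` the function needs root.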
After reboot, the kernel again complains about the tainted iwlwifi module. The Intel firmware in use is 34.
Wifi works.
Bluetooth does not work well.

dmesg shows failures with status 0x0c.
After boot, the BT mouse does not connect, even though it is already paired and connects if I boot kernel 4.13. It is possible to pair and connect the mouse by switching BT off and on from the GUI (Gnome settings), waiting until an error is logged in `dmesg` (which also causes BT to go off again), and then switching BT on again.

Once BT coexistence was switched on, the behavior did not change much. The observed difference is that the mouse stops working a few seconds after successful pairing. It works for those few seconds; after that, Gnome settings still shows it as connected, but it does not work.
Also, bluetoothd segfaulted during troubleshooting and I had to restart it via systemd.
The dmesg outputs captured during troubleshooting, showing the above-mentioned errors, are attached.

If I boot exactly the same system with 4.13, it works without any issue.
Comment 13 GoodMirek 2017-11-13 18:10:45 UTC
Created attachment 260635 [details]
dmesg_34ucode_4.14rc8_BT_intermittent.tar.gz
Comment 14 GoodMirek 2017-11-13 18:42:25 UTC
Not sure whether the following is relevant.
My system suffers from a GPE storm, which happens under both 4.13 and 4.14.

find /sys/firmware/acpi/interrupts/* -exec sh -c 'echo -n {}" "; cat {}' \; 
<truncated>
/sys/firmware/acpi/interrupts/gpe6E   539882  EN     enabled      unmasked
<truncated>
/sys/firmware/acpi/interrupts/gpe_all   539916
/sys/firmware/acpi/interrupts/sci   539916
/sys/firmware/acpi/interrupts/sci_not       81

The offending ACPI interrupt was not disabled during the tests described in the previous comment.
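
To pick the storming GPE out of the full listing, the output of the find command above can be filtered. A small sketch; the function name `busiest_gpe` is hypothetical, and it assumes each line has the form "path count flags" as shown in the excerpt.

```shell
#!/bin/sh
# Print the individual GPE with the highest interrupt count.
# Expects "<path> <count> ..." lines on stdin; gpe_all and the sci totals
# do not match the two-hex-digit gpeXX pattern and are therefore skipped.
busiest_gpe() {
    awk '$1 ~ /\/gpe[0-9A-Fa-f][0-9A-Fa-f]$/ && $2 + 0 > max { max = $2 + 0; name = $1 }
         END { if (name != "") print name, max }'
}
```

The ACPI sysfs interface accepts `echo disable > /sys/firmware/acpi/interrupts/gpe6E` as root to switch off a single GPE, here the one from the listing above. Whether doing so is safe depends on what the platform uses that GPE for.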

With BT coexistence off and kernel 4.13, the BT mouse sometimes lags, while a USB mouse connected at the same time does not lag. Modifying the following settings on the AP does not help:
- TX power
- frequency
- bandwidth, switching extended channels off/on (20/40MHz Ce/eC)
I have not yet had a chance to move WiFi to channel 14 to minimize interference with bluetooth, though.
The BT mouse lag seems to happen under heavy WiFi load. When I disabled the ACPI interrupt causing the GPE storm, the lag seemed to disappear. However, I tested that for just a few hours, so I am not 100% sure it helped.

With BT coexistence on, ucode 34 and kernel 4.14, the mouse works for just a few seconds (about 30-60), as described in the previous comment. During that time the mouse feels laggy even with the ACPI interrupt disabled, that is, without the GPE storm.
In this situation, if I switch BT coexistence off, reload the iwlwifi module and keep the GPE storm off, the BT mouse is still laggy. With BT coexistence off the BT mouse does not disconnect; the GPE storm does not influence the disconnect behavior.

My bluez:
bluez-5.47-2.fc27.x86_64

Wifi works well all the time.
Comment 15 GoodMirek 2017-11-13 20:05:02 UTC
I have successfully applied the patch from bug 197591 on kernel 4.14.0 (4.14.0-1.fc28.x86_64).
WiFi works with the patch and firmware 34.ucode.
BT works well until WiFi connects. As soon as the WiFi connection is established, the BT mouse stops working.
This time it is not possible to work around the behavior via the BT coexistence option. The behavior is the same with coexistence on and off.
Comment 16 Emmanuel Grumbach 2017-11-13 21:11:37 UTC
I can't see any occurrences of the firmware crash mentioned in the subject.

What you are describing looks more like https://bugzilla.kernel.org/show_bug.cgi?id=197807
Comment 17 Luca Coelho 2017-11-17 12:58:29 UTC
We need to get some firmware traces.  Can you please install one of the two firmwares I'll attach and follow the instructions from our wiki[1]?

You can choose either -31 or -34, depending on which version of the kernel you are using.

And please make sure you read and understand our privacy policy[2].

[1] https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging#firmware_debugging

[2] https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging#privacy_aspects
Comment 18 Luca Coelho 2017-11-17 13:01:54 UTC
Created attachment 260699 [details]
iwlwifi-8000C-31.ucode with usniffer enabled
Comment 19 Luca Coelho 2017-11-17 13:02:26 UTC
Created attachment 260701 [details]
iwlwifi-8000C-34.ucode with usniffer enabled
Comment 20 Francisco Cribari 2017-11-17 16:56:08 UTC
I've been testing the iwlwifi-8000C-31.ucode firmware in attachment 260699 [details] with BT coexistence on. After a few hours, bluetooth crashed: 

[cribari@darwin5 ~]$ systemctl status bluetooth
● bluetooth.service - Bluetooth service
   Loaded: loaded (/usr/lib/systemd/system/bluetooth.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2017-11-17 10:25:17 -03; 3h 25min ago
     Docs: man:bluetoothd(8)
 Main PID: 595 (bluetoothd)
   Status: "Running"
    Tasks: 1 (limit: 4915)
   CGroup: /system.slice/bluetooth.service
           └─595 /usr/lib/bluetooth/bluetoothd

nov 17 10:25:17 darwin5 systemd[1]: Starting Bluetooth service...
nov 17 10:25:17 darwin5 systemd[1]: Started Bluetooth service.
nov 17 10:25:17 darwin5 bluetoothd[595]: Bluetooth daemon 5.47
nov 17 10:25:17 darwin5 bluetoothd[595]: Starting SDP server
nov 17 10:25:17 darwin5 bluetoothd[595]: Bluetooth management interface 1.14 initialized
nov 17 10:25:26 darwin5 bluetoothd[595]: Endpoint registered: sender=:1.50 path=/MediaEndpoint/A2DPSource
nov 17 10:25:26 darwin5 bluetoothd[595]: Endpoint registered: sender=:1.50 path=/MediaEndpoint/A2DPSink
nov 17 11:28:31 darwin5 bluetoothd[595]: /org/bluez/hci0/dev_10_B7_F6_23_F0_B8/fd0: fd(42) ready
nov 17 13:48:18 darwin5 bluetoothd[595]: Can't get HIDP connection info
nov 17 13:48:24 darwin5 bluetoothd[595]: connect error: Host is down (112)

From dmesg: 

[11997.944947] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=13407 end=13408) time 193 us, min 1073, max 1079, scanline start 1072, end 1085

dmesg dump available at: 

https://pastebin.com/4JfLbpaV

The kernel I am using: 

[cribari@darwin5 ~]$ uname -a
Linux darwin5 4.13.12-1-MANJARO #1 SMP PREEMPT Wed Nov 8 10:52:53 UTC 2017 x86_64 GNU/Linux
Comment 21 Luca Coelho 2017-11-17 17:06:20 UTC
Can you collect a FW dump just when BT crashes? You can follow the instructions in the wiki I linked earlier.  You can manually trigger the dump with the fw_dbg_collect instructions we have there.

I don't know if the Bluetooth crash has anything to do with the original bug, but we may see something in the wifi FW dump.

You didn't see the 0x307C assert, did you?
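
For reference, a manual dump trigger can be sketched as below. The debugfs path layout (`iwlwifi/<device>/iwlmvm/fw_dbg_collect`) is an assumption here, so please check it against the wiki instructions. The function name `trigger_fw_dump` and its optional root argument are hypothetical; the argument only exists so the sketch can be tried against a scratch directory.

```shell
#!/bin/sh
# Ask iwlmvm to produce a firmware dump on demand by writing to every
# fw_dbg_collect file found under the given debugfs root.
# On a real system debugfs must be mounted and the caller must be root.
trigger_fw_dump() {
    root="${1:-/sys/kernel/debug}"
    for f in "$root"/iwlwifi/*/iwlmvm/fw_dbg_collect; do
        [ -e "$f" ] || continue   # glob did not match: no such device
        echo 1 > "$f"
    done
}
```

With the udev rule from the wiki in place, the resulting dump should then be saved in the same way as an automatic one.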
Comment 22 Francisco Cribari 2017-11-18 15:07:44 UTC
@Luca Coelho No, I did not see the 0x307C assert. The good news is that I have been unable to replicate the crash. It hasn't happened again (so far).
Comment 23 Francisco Cribari 2017-11-19 17:20:57 UTC
I experienced a wifi/bluetooth crash today while using the iwlwifi-8000C-31.ucode firmware with usniffer enabled. The FW dump is available at 

https://www.dropbox.com/s/gq75a3wfq1igk51/iwl-fw-error_2017-11-19_12-11-24.dump?dl=0
Comment 24 Emmanuel Grumbach 2017-11-19 19:26:44 UTC
Did you have the assert 0x307C?

If not, let's close the bug and track this problem elsewhere.
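
A quick way to answer this from a saved log is to search for the assert code itself. The sketch below does not rely on the exact layout of the driver's error dump lines; it only looks for the hex value, with or without zero padding. The function name `has_assert_307c` is made up for this example.

```shell
#!/bin/sh
# Succeed (exit status 0) if the kernel log on stdin mentions firmware
# assert 0x307C. Matching is case-insensitive, tolerates leading zeros
# (e.g. 0x0000307C), and rejects longer hex numbers that merely start
# with the same digits.
has_assert_307c() {
    grep -Eiq '0x0*307C([^0-9A-Fa-f]|$)'
}
# Typical use: dmesg | has_assert_307c && echo "0x307C seen"
```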
Comment 25 Francisco Cribari 2017-11-19 19:54:33 UTC
@Emmanuel Grumbach I did not see the 0x307C assert.
Comment 26 Emmanuel Grumbach 2017-11-19 20:11:44 UTC
Then I'll close this bug.
There is bug 197147 about Coex issues.