Bug 197061
Summary: | iwlwifi: 8260: ASSERT 0x307C - WIFILNX-1474 | ||
---|---|---|---|
Product: | Drivers | Reporter: | Francisco Cribari (cribari) |
Component: | network-wireless | Assignee: | DO NOT USE - assign "network-wireless-intel" component instead (linuxwifi) |
Status: | CLOSED UNREPRODUCIBLE | ||
Severity: | high | CC: | cribari, goodmirek, jeremy, joss, luca |
Priority: | P1 | ||
Hardware: | x86-64 | ||
OS: | Linux | ||
See Also: | https://bugzilla.kernel.org/show_bug.cgi?id=197039 | ||
Kernel Version: | 4.14 RC1 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: |
ifw-fw-error dump
dmesg_34ucode_4.14rc8_BT_intermittent.tar.gz
iwlwifi-8000C-31.ucode with usniffer enabled
iwlwifi-8000C-34.ucode with usniffer enabled |
Description
Francisco Cribari
2017-09-28 10:31:39 UTC
This is under investigation by our firmware team and they have found a possible culprit. We'll come back to you soon with more info.

Possibly also seen in Fedora as https://bugzilla.redhat.com/show_bug.cgi?id=1501313

(In reply to Jeremy Cline from comment #2)
> Possibly also seen in Fedora as
> https://bugzilla.redhat.com/show_bug.cgi?id=1501313

Not related.

The same bug has also affected my system for over a month. I currently use the workaround bt_coex_active=0. The system is otherwise up-to-date, with the Fedora 26 Test Updates repo. The issue happens with any 4.14 and 4.13 kernel and does not happen with 4.12 kernels.

inxi -Fxzc0
System: Host: laptop Kernel: 4.14.0-0.rc6.git3.1.fc28.x86_64 x86_64 bits: 64 gcc: 7.2.1 Desktop: Gnome 3.24.3 Distro: Fedora release 26 (Twenty Six)
Machine: Device: laptop System: HP product: HP EliteBook 850 G4 serial: <filter> Mobo: HP model: 828C v: KBC Version 45.3C serial: <filter> UEFI: HP v: P78 Ver. 01.08 date: 10/17/2017
Battery BAT0: charge: 48.0 Wh 100.0% condition: 48.0/48.0 Wh (100%) model: Hewlett-Packard Primary status: Full
CPU: Dual core Intel Core i5-7200U (-HT-MCP-) arch: Kaby Lake rev.9 cache: 3072 KB flags: (lm nx sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx) bmips: 10848 clock speeds: max: 3100 MHz 1: 2700 MHz 2: 2700 MHz 3: 2700 MHz 4: 2700 MHz
Graphics: Card: Intel HD Graphics 620 bus-ID: 00:02.0 Display Server: x11 (X.org 1.19.3 ) driver: i915 Resolution: 1920x1080@60.05hz OpenGL: renderer: Mesa DRI Intel HD Graphics 620 (Kaby Lake GT2) version: 4.5 Mesa 17.2.2 Direct Render: Yes
Audio: Card Intel Sunrise Point-LP HD Audio driver: snd_hda_intel bus-ID: 00:1f.3 Sound: Advanced Linux Sound Architecture v: k4.14.0-0.rc6.git3.1.fc28.x86_64
Network: Card-1: Intel Ethernet Connection (4) I219-V driver: e1000e v: 3.2.6-k bus-ID: 00:1f.6 IF: enp0s31f6 state: down mac: <filter>
Card-2: Intel Wireless 8265 / 8275 driver: iwlwifi bus-ID: 02:00.0 IF: wlp2s0 state: up mac: <filter>
Drives: HDD Total Size: 256.1GB (39.1% used) ID-1: /dev/nvme0n1 model: SAMSUNG_MZVLW256HEHP size: 256.1GB temp: 32C
Partition: ID-1: / size: 120G used: 93G (78%) fs: xfs dev: /dev/dm-2
ID-2: /boot size: 1018M used: 426M (42%) fs: xfs dev: /dev/nvme0n1p6
RAID: No RAID devices: /proc/mdstat, md_mod kernel module present
Sensors: System Temperatures: cpu: 32.5C mobo: 0.0C Fan Speeds (in rpm): cpu: N/A
Info: Processes: 317 Uptime: 11 min Memory: 3927.2/15806.8MB Init: systemd runlevel: 5 Gcc sys: 7.2.1 Client: Shell (bash 4.4.121) inxi: 2.3.40

iwl firmware:
rpm -qf /lib/firmware/iwlwifi-8265-31.ucode
iwl7260-firmware-25.30.13.0-77.fc26.noarch

I have just tried iwlwifi-8265-34.ucode; this one crashes immediately after boot for me.
https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/linux-firmware.git/plain/iwlwifi-8265-34.ucode

@goodmirek, please share dmesg and open a new bug for the issues with 34.ucode.

34.ucode does work with kernel 4.13.10, but the bug reported in this bug report is not solved; the disabled-coexistence workaround is still needed. 34.ucode crashes with 4.14.0 rc6, opening another bug for that.

(In reply to goodmirek from comment #6)
> > 34.ucode crashes with 4.14.0 rc6, opening another bug for that.

Please make sure to CC linuxwifi@intel.com

The issue is still present. I am not able to debug the driver as I lack the necessary skills. Is there anything else I can do to facilitate the fix?

You mean, still present in 34.ucode?

Actually, I do not know. I just realized that kernel 4.13 does not load firmware 34.ucode. I do not know how to build the firmware with the patch described in https://bugzilla.kernel.org/show_bug.cgi?id=197591 in order to be able to test whether 34.ucode fixes this issue.

v4.13 loads FW -33.ucode. v4.14 loads FW -34.ucode. I'll comment more in bug 197591.

Disclaimer: This is the first time I have compiled anything related to the Linux kernel. The system under test is the same as in comment 4, with a newer kernel and BT coexistence still switched off.
First, I tried to compile the module from the vendor-provided source code (Fedora). I succeeded and reproduced the issue described in bug 197591. The only difference I observed was the warning that the kernel is tainted due to module iwlwifi.

Then I applied the patch provided in bug 197591 using `git am <patch>` and rebuilt iwlwifi with the patch. As baseline, I used:
```
uname -r
4.14.0-0.rc8.git3.1.fc28.x86_64
```
This is actually not vanilla 4.14 RC8, as Fedora regularly adds a number of patches on top of the vanilla kernel. I copied the build artifacts to the appropriate location:
```
iwlwifi.ko
iwlmvm.ko
iwldvm.ko
```
It was possible to load the `iwlwifi` module with `modprobe`, after unloading `iwlmvm` and `iwlwifi` first. After reboot, the kernel again complains about the tainted iwlwifi module. The Intel firmware in use is 34. Wifi works. Bluetooth does not work well; dmesg shows failures with status 0x0c.

After boot, the BT mouse does not connect, even though it is already paired and it connects if I boot kernel 4.13. It is possible to pair and connect the mouse by switching BT off and on from the GUI (Gnome settings), waiting until an error is logged in `dmesg`, which also causes BT to go off again, and then switching BT on again.

Once BT coexistence was switched on, the behavior did not change much. The observed difference is that the mouse stops working a few seconds after successful pairing. For those few seconds it works; afterwards Gnome settings still shows it as connected, but it does not work. Also, bluetoothd segfaulted during troubleshooting and I had to restart it via systemd. dmesg outputs captured during troubleshooting, showing the above-mentioned errors, are attached.

If I boot exactly the same system with 4.13, it works without any issue.

Created attachment 260635 [details]
dmesg_34ucode_4.14rc8_BT_intermittent.tar.gz
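For readers who want to try the coexistence workaround mentioned in this thread, the following is a minimal sketch of how `bt_coex_active=0` could be made persistent and the driver reloaded. The file name in `CONF` and the two function names are hypothetical choices for illustration; the thread itself does not say how the option was set.

```shell
#!/bin/sh
# Sketch only: persist the bt_coex_active workaround and reload the driver.
# CONF is a hypothetical file name; modprobe reads any *.conf in /etc/modprobe.d.
CONF="${CONF:-/etc/modprobe.d/iwlwifi-btcoex.conf}"

write_coex_option() {
    # $1 = 0 (BT coexistence off) or 1 (BT coexistence on)
    printf 'options iwlwifi bt_coex_active=%s\n' "$1" > "$CONF"
}

reload_iwlwifi() {
    # iwlmvm depends on iwlwifi, so both are removed before iwlwifi is loaded again.
    modprobe -r iwlmvm iwlwifi && modprobe iwlwifi
}

# Usage (as root):
#   write_coex_option 0 && reload_iwlwifi
#   cat /sys/module/iwlwifi/parameters/bt_coex_active
```

Reloading the module drops the WiFi connection for a moment, so this is best done from a local console rather than over a wireless SSH session.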
Not sure whether the following is relevant. My system suffers from a GPE storm, which happens both under 4.13 and 4.14.

find /sys/firmware/acpi/interrupts/* -exec sh -c 'echo -n {}" "; cat {}' \;
<truncated>
/sys/firmware/acpi/interrupts/gpe6E 539882 EN enabled unmasked
<truncated>
/sys/firmware/acpi/interrupts/gpe_all 539916
/sys/firmware/acpi/interrupts/sci 539916
/sys/firmware/acpi/interrupts/sci_not 81

The offending ACPI interrupt was not disabled during the tests described in the previous comment.

With BT coexistence off and kernel 4.13, the BT mouse sometimes lags, while a USB mouse connected at the same time does not lag. Modifying various settings on the AP does not help:
- TX power
- frequency
- bandwidth, switching off/on extended channels (20/40MHz Ce/eC)

I have not had a chance to move WiFi to channel 14 to minimize interference with bluetooth, though. It seems the BT mouse lag happens under heavy WiFi load. If I disable the ACPI interrupt causing the GPE storm, the lag seems to disappear. However, I tested that for just a few hours, so I am not 100% sure it helped.

With BT coexistence on, ucode 34 and kernel 4.14, the mouse works for just a few seconds (about 30-60), as described in the previous comment. During that time the mouse feels laggy even with the ACPI interrupt disabled, thus without the GPE storm. In this situation, if I switch BT coexistence off, reload the iwlwifi module and keep the GPE storm off, the BT mouse is still laggy. With BT coexistence off the BT mouse does not disconnect; the GPE storm does not influence the disconnect behavior.

My bluez: bluez-5.47-2.fc27.x86_64
Wifi works well all the time.

I have successfully applied the patch from bug 197591 on kernel 4.14.0 (4.14.0-1.fc28.x86_64). WiFi works with the patch and firmware 34.ucode. BT works well until WiFi connects. As soon as the WiFi connection is established, the BT mouse stops working. This time it is not possible to work around the behavior via the BT coexistence option.
The behavior is the same for both coexistence on and off.

I can't see any occurrences of the firmware crash mentioned in the subject. What you are describing looks more like https://bugzilla.kernel.org/show_bug.cgi?id=197807

We need to get some firmware traces. Can you please install one of the two firmwares I'll attach and follow the instructions from our wiki[1]? You can choose either -31 or -34, depending on which version of the kernel you are using. And please make sure you read and understand our privacy policy[2].

[1] https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging#firmware_debugging
[2] https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging#privacy_aspects

Created attachment 260699 [details]
iwlwifi-8000C-31.ucode with usniffer enabled
Created attachment 260701 [details]
iwlwifi-8000C-34.ucode with usniffer enabled
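The GPE storm reported earlier in this thread was found by dumping every counter under `/sys/firmware/acpi/interrupts`. As a rough sketch, the same counters can be ranked so the busiest GPE stands out. `top_gpes` and the `GPE_DIR` override are hypothetical names introduced here, and the commented `disable` command is only one possible way to silence a GPE; the reporter did not say which method was used.

```shell
#!/bin/sh
# Sketch only: list ACPI GPE counters, busiest first.
# GPE_DIR can be overridden to run the function on a copy of the counter files.
GPE_DIR="${GPE_DIR:-/sys/firmware/acpi/interrupts}"

top_gpes() {
    # $1 = number of entries to show (default 5).
    # Each counter file starts with the count, followed by status flags.
    for f in "$GPE_DIR"/gpe[0-9A-F]*; do
        [ -r "$f" ] || continue
        count=""
        read -r count rest < "$f"
        # Skip empty, non-numeric and zero counters.
        [ "$count" -gt 0 ] 2>/dev/null && printf '%s %s\n' "$count" "${f##*/}"
    done | sort -rn | head -n "${1:-5}"
}

# Usage:
#   top_gpes 3
# Disabling a single GPE until the next reboot (as root, at your own risk):
#   echo disable > /sys/firmware/acpi/interrupts/gpe6E
```

The glob deliberately skips `gpe_all`, which is a running total rather than an individual GPE.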
I've been testing the iwlwifi-8000C-31.ucode firmware in attachment 260699 [details] with BT coexistence on. After a few hours, bluetooth crashed:

[cribari@darwin5 ~]$ systemctl status bluetooth
● bluetooth.service - Bluetooth service
Loaded: loaded (/usr/lib/systemd/system/bluetooth.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2017-11-17 10:25:17 -03; 3h 25min ago
Docs: man:bluetoothd(8)
Main PID: 595 (bluetoothd)
Status: "Running"
Tasks: 1 (limit: 4915)
CGroup: /system.slice/bluetooth.service
└─595 /usr/lib/bluetooth/bluetoothd
nov 17 10:25:17 darwin5 systemd[1]: Starting Bluetooth service...
nov 17 10:25:17 darwin5 systemd[1]: Started Bluetooth service.
nov 17 10:25:17 darwin5 bluetoothd[595]: Bluetooth daemon 5.47
nov 17 10:25:17 darwin5 bluetoothd[595]: Starting SDP server
nov 17 10:25:17 darwin5 bluetoothd[595]: Bluetooth management interface 1.14 initialized
nov 17 10:25:26 darwin5 bluetoothd[595]: Endpoint registered: sender=:1.50 path=/MediaEndpoint/A2DPSource
nov 17 10:25:26 darwin5 bluetoothd[595]: Endpoint registered: sender=:1.50 path=/MediaEndpoint/A2DPSink
nov 17 11:28:31 darwin5 bluetoothd[595]: /org/bluez/hci0/dev_10_B7_F6_23_F0_B8/fd0: fd(42) ready
nov 17 13:48:18 darwin5 bluetoothd[595]: Can't get HIDP connection info
nov 17 13:48:24 darwin5 bluetoothd[595]: connect error: Host is down (112)

From dmesg:
[11997.944947] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=13407 end=13408) time 193 us, min 1073, max 1079, scanline start 1072, end 1085

dmesg dump available at: https://pastebin.com/4JfLbpaV

The kernel I am using:
[cribari@darwin5 ~]$ uname -a
Linux darwin5 4.13.12-1-MANJARO #1 SMP PREEMPT Wed Nov 8 10:52:53 UTC 2017 x86_64 GNU/Linux

Can you collect a FW dump just when BT crashes? You can follow the instructions in the wiki I linked earlier. You can manually trigger the dump with the fw_dbg_collect instructions we have there.
I don't know if the Bluetooth crash has anything to do with the original bug, but we may see something in the wifi FW dump. You didn't see the 0x307C assert, did you?

@Luca Coelho No, I did not see the 0x307C assert. The good news is that I have been unable to replicate the crash. It hasn't happened again (so far).

I experienced a wifi/bluetooth crash today while using the iwlwifi-8000C-31.ucode firmware with usniffer enabled. The FW dump is available at https://www.dropbox.com/s/gq75a3wfq1igk51/iwl-fw-error_2017-11-19_12-11-24.dump?dl=0

Did you have the assert 0x307C? If not, let's close the bug and track this problem elsewhere.

@Emmanuel Grumbach I did not see the 0x307C assert.

Then I'll close this bug. There is 197147 about Coex issues.
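Since the closing question in this thread is whether the 0x307C assert actually appeared, here is a small sketch for checking a saved kernel log for a given assert code. `has_fw_assert` is a hypothetical helper; it makes no assumption about the exact wording of the driver's log lines and only matches the hex value, with optional leading zeros and in either letter case.

```shell
#!/bin/sh
# Sketch only: report whether a saved kernel log mentions a firmware assert code.
has_fw_assert() {
    # $1 = log file, $2 = assert code without the 0x prefix (default 307C).
    # The trailing group rejects longer hex numbers that merely start with the code.
    grep -qiE "0x0*${2:-307C}([^0-9A-Fa-f]|\$)" "$1"
}

# Usage:
#   dmesg > /tmp/dmesg.txt
#   if has_fw_assert /tmp/dmesg.txt; then
#       echo "assert 0x307C present"
#   else
#       echo "assert 0x307C not found"
#   fi
```

A negative result only says the code is absent from that particular log; the kernel ring buffer may already have wrapped, so saving `dmesg` output soon after a crash is safer.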