Bug 193141

Summary: ath10k broke suspend to RAM in 4.9
Product: Drivers
Component: network-wireless
Reporter: Enrico Tagliavini (enrico.tagliavini)
Assignee: drivers_network-wireless (drivers_network-wireless)
Status: NEW
Severity: normal
Priority: P1
CC: bugz, joost, rootkit85
Hardware: All
OS: Linux
Kernel Version: 4.9.4, 4.9.5, 4.9.6, 4.9.7, 4.9.8, 4.9.9, 4.9.12, 4.10
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments:
  messages when attempting the suspend to ram with ath10k_pci loaded
  `journalctl -b0` after ath10k_pci firmware crash on suspend on Acer Aspire V13

Description Enrico Tagliavini 2017-01-22 16:55:11 UTC
Created attachment 252811 [details]
messages when attempting the suspend to ram with ath10k_pci loaded

This is on Fedora 24. After updating from kernel 4.8.16 to 4.9.4 I can no longer suspend to RAM. When I attempt to suspend, the screen goes black, but after a few seconds it comes back instead of suspending. I think the fault is in the ath10k driver, because if I run modprobe -r ath10k_pci and then attempt to suspend, everything works as expected; having ath10k_pci loaded prevents the suspend. In dmesg I can also see the driver complaining that the firmware crashed (see the attachment for a snippet of the messages). After the failed suspend the OS is left in a zombie state: programs do not close correctly and usually stay in the defunct state indefinitely. I usually use the SysRq S, U, B sequence (sync, remount read-only, reboot) to reboot at this point.
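
For anyone who wants to automate the workaround described above (unloading ath10k_pci before suspending), here is a minimal sketch of a systemd sleep hook in Python. It assumes the usual systemd convention that executables placed in /usr/lib/systemd/system-sleep/ are called with "pre" or "post" as the first argument and the sleep action as the second. The `command_for` helper is hypothetical, and this only hides the regression rather than fixing it. Note also that unloading the module can itself trigger a firmware crash, as reported elsewhere in this thread.

```python
#!/usr/bin/env python3
"""Hypothetical sleep hook: unload ath10k_pci before suspend, reload it after."""
import subprocess
import sys

MODULE = "ath10k_pci"  # driver named in this report


def command_for(phase, action):
    """Return the modprobe command for one hook invocation, or None."""
    if action != "suspend":
        return None  # leave hibernate and other actions alone
    if phase == "pre":
        return ["modprobe", "-r", MODULE]  # unload before suspending
    if phase == "post":
        return ["modprobe", MODULE]  # reload after resume
    return None


def main(argv):
    # systemd passes exactly two arguments: phase and action.
    if len(argv) != 3:
        return 0
    cmd = command_for(argv[1], argv[2])
    if cmd is not None:
        # check=False: a failing modprobe should not abort the hook.
        subprocess.run(cmd, check=False)
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

The decision logic is kept in a pure function so it can be checked without actually touching the module; the hook itself would need to be installed as an executable file and run as root by systemd.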

Might be related: even when I simply modprobe -r ath10k_pci I get a firmware crash:

[   61.104169] ath10k_pci 0000:03:00.0: firmware crashed! (uuid 7f9700df-93ab-4d1b-8c6d-aea24b60d170)
[   61.104175] ath10k_pci 0000:03:00.0: qca6174 hw2.1 target 0x05010000 chip_id 0x003405ff sub 1a56:1525
[   61.104176] ath10k_pci 0000:03:00.0: kconfig debug 0 debugfs 1 tracing 0 dfs 0 testmode 0
[   61.104450] ath10k_pci 0000:03:00.0: firmware ver SW_RM.1.1.1-00157-QCARMSWPZ-1 api 5 features ignore-otp,no-4addr-pad crc32 10bf8e08
[   61.104596] ath10k_pci 0000:03:00.0: board_file api 2 bmi_id N/A crc32 ae2e275a
[   61.104598] ath10k_pci 0000:03:00.0: htt-ver 3.1 wmi-op 4 htt-op 3 cal otp max-sta 32 raw 0 hwcrypto 1
[   61.105151] ath10k_pci 0000:03:00.0: firmware register dump:
[   61.105151] ath10k_pci 0000:03:00.0: [00]: 0x05010000 0x00000000 0x0092E4DC 0x365591B9
[   61.105151] ath10k_pci 0000:03:00.0: [04]: 0x0092E4DC 0x00060130 0x00000018 0x0041A760
[   61.105151] ath10k_pci 0000:03:00.0: [08]: 0x365591A5 0x00400000 0x00000000 0x000A5C88
[   61.105151] ath10k_pci 0000:03:00.0: [12]: 0x00000009 0x00000000 0x0096C09C 0x0096C0A7
[   61.105151] ath10k_pci 0000:03:00.0: [16]: 0x0096BDBC 0x009BFC42 0x00000000 0x009287BD
[   61.105151] ath10k_pci 0000:03:00.0: [20]: 0x4092E4DC 0x0041A710 0x00000000 0x0F000000
[   61.105151] ath10k_pci 0000:03:00.0: [24]: 0x809432A7 0x0041A770 0x0040D400 0xC092E4DC
[   61.105151] ath10k_pci 0000:03:00.0: [28]: 0x80942BC4 0x0041A790 0x365591A5 0x00400000
[   61.105151] ath10k_pci 0000:03:00.0: [32]: 0x80947BA7 0x0041A7B0 0x00404D88 0x00413980
[   61.105151] ath10k_pci 0000:03:00.0: [36]: 0x809BDECC 0x0041A7D0 0x00404D88 0x00413980
[   61.105151] ath10k_pci 0000:03:00.0: [40]: 0x8099638C 0x0041A7F0 0x00404D88 0x00000000
[   61.105151] ath10k_pci 0000:03:00.0: [44]: 0x80992076 0x0041A810 0x004084F0 0x00405244
[   61.105151] ath10k_pci 0000:03:00.0: [48]: 0x80996BD3 0x0041A830 0x004084F0 0x00000000
[   61.105151] ath10k_pci 0000:03:00.0: [52]: 0x800B4405 0x0041A850 0x00422318 0x00005002
[   61.105151] ath10k_pci 0000:03:00.0: [56]: 0x809A6C34 0x0041A8E0 0x0042932C 0x0042CA20
[   61.111014] ath10k_pci 0000:03:00.0: could not suspend target (-108)
[   61.178604] ath10k_pci 0000:03:00.0: cannot restart a device that hasn't been started
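
To make logs like the one above easier to compare across reports, here is a small Python sketch that pulls out the crash UUID and decodes the negative error codes. The function name `summarize` and the regular expressions are my own, not part of any ath10k tooling. The errno names are the standard Linux values (2 is ENOENT, 108 is ESHUTDOWN), hardcoded so the result does not depend on the platform running the script.

```python
import re

# Linux errno values, hardcoded for portability of this sketch.
LINUX_ERRNO = {2: "ENOENT", 108: "ESHUTDOWN"}

CRASH_RE = re.compile(r"firmware crashed! \(uuid ([0-9a-f-]+)\)")
# Matches driver messages that end in a negative code, e.g. "... (-108)".
ERR_RE = re.compile(r"ath10k_pci \S+: (.+) \(-(\d+)\)\s*$")


def summarize(log_text):
    """Return (crash_uuids, [(message, errno_name), ...]) from dmesg text."""
    crashes, errors = [], []
    for line in log_text.splitlines():
        m = CRASH_RE.search(line)
        if m:
            crashes.append(m.group(1))
            continue
        m = ERR_RE.search(line)
        if m:
            code = int(m.group(2))
            name = LINUX_ERRNO.get(code, "errno %d" % code)
            errors.append((m.group(1), name))
    return crashes, errors


# Three lines copied from the log in this report.
SAMPLE = """\
[   61.104169] ath10k_pci 0000:03:00.0: firmware crashed! (uuid 7f9700df-93ab-4d1b-8c6d-aea24b60d170)
[   61.111014] ath10k_pci 0000:03:00.0: could not suspend target (-108)
[   61.178604] ath10k_pci 0000:03:00.0: cannot restart a device that hasn't been started
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

On the sample, the "could not suspend target" message decodes to ESHUTDOWN, which on Linux means "cannot send after transport endpoint shutdown".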

Firmware package version:
root@alientux ~ # rpm -qa | grep linux-firmware
linux-firmware-20160923-68.git42ad5367.fc24.noarch

The computer is an Alienware 15 from 2015, BIOS A06.
Wireless card: 
root@alientux ~ # lspci | grep Atheros
03:00.0 Network controller: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter (rev 20)
Comment 1 Enrico Tagliavini 2017-01-25 20:10:10 UTC
Same with kernel 4.9.5 and a newer linux-firmware. Here is a dmesg snippet for reference, showing the firmware version and hardware model:

enrico@alientux ~ $ dmesg | grep ath
[   14.126074] ath10k_pci 0000:03:00.0: enabling device (0000 -> 0002)
[   14.126636] ath10k_pci 0000:03:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0
[   14.340810] ath10k_pci 0000:03:00.0: Direct firmware load for ath10k/pre-cal-pci-0000:03:00.0.bin failed with error -2
[   14.340821] ath10k_pci 0000:03:00.0: Direct firmware load for ath10k/cal-pci-0000:03:00.0.bin failed with error -2
[   14.342532] ath10k_pci 0000:03:00.0: qca6174 hw2.1 target 0x05010000 chip_id 0x003405ff sub 1a56:1525
[   14.342534] ath10k_pci 0000:03:00.0: kconfig debug 0 debugfs 1 tracing 0 dfs 0 testmode 0
[   14.342901] ath10k_pci 0000:03:00.0: firmware ver SW_RM.1.1.1-00157-QCARMSWPZ-1 api 5 features ignore-otp,no-4addr-pad crc32 10bf8e08
[   14.409452] ath10k_pci 0000:03:00.0: board_file api 2 bmi_id N/A crc32 ae2e275a
[   15.637263] ath10k_pci 0000:03:00.0: htt-ver 3.1 wmi-op 4 htt-op 3 cal otp max-sta 32 raw 0 hwcrypto 1
[   15.709438] ath: EEPROM regdomain: 0x6c
[   15.709439] ath: EEPROM indicates we should expect a direct regpair map
[   15.709440] ath: Country alpha2 being used: 00
[   15.709441] ath: Regpair used: 0x6c
[   15.717913] ath10k_pci 0000:03:00.0 wlp3s0: renamed from wlan0
enrico@alientux ~ $ rpm -qa | grep linux-firmware
linux-firmware-20161205-69.git91ddce49.fc24.noarch
Comment 2 Enrico Tagliavini 2017-02-07 16:30:22 UTC
With 4.9.6 I don't get the firmware crash anymore if I simply modprobe -r. However the crash still happens when I try to suspend to RAM, which still fails.
Comment 3 Enrico Tagliavini 2017-02-08 19:43:06 UTC
Scratch my last comment: I still get firmware crashes even when I just modprobe -r ath10k_pci:

[ 3114.768363] wlp3s0: deauthenticating from dc:53:7c:b9:c1:88 by local choice (Reason: 3=DEAUTH_LEAVING)
[ 3114.774009] ath10k_pci 0000:03:00.0: firmware crashed! (uuid 051b84a0-623f-4b0e-a2e8-80ece240ced8)
[ 3114.774009] ath10k_pci 0000:03:00.0: qca6174 hw2.1 target 0x05010000 chip_id 0x003405ff sub 1a56:1525
[ 3114.774009] ath10k_pci 0000:03:00.0: kconfig debug 0 debugfs 1 tracing 0 dfs 0 testmode 0
[ 3114.778175] ath10k_pci 0000:03:00.0: firmware ver SW_RM.1.1.1-00157-QCARMSWPZ-1 api 5 features ignore-otp,no-4addr-pad crc32 10bf8e08
[ 3114.778320] ath10k_pci 0000:03:00.0: board_file api 2 bmi_id N/A crc32 ae2e275a
[ 3114.778321] ath10k_pci 0000:03:00.0: htt-ver 3.1 wmi-op 4 htt-op 3 cal otp max-sta 32 raw 0 hwcrypto 1
[ 3114.780359] ath10k_pci 0000:03:00.0: firmware register dump:
[ 3114.780360] ath10k_pci 0000:03:00.0: [00]: 0x05010000 0x00000000 0x0092E4DC 0xBBC6E1B9
[ 3114.780362] ath10k_pci 0000:03:00.0: [04]: 0x0092E4DC 0x00060130 0x00000018 0x0041A760
[ 3114.780363] ath10k_pci 0000:03:00.0: [08]: 0xBBC6E1A5 0x00400000 0x00000000 0x000A5C88
[ 3114.780364] ath10k_pci 0000:03:00.0: [12]: 0x00000009 0x00000000 0x0096C09C 0x0096C0A7
[ 3114.780365] ath10k_pci 0000:03:00.0: [16]: 0x0096BDBC 0x009287BD 0x00000000 0x009287BD
[ 3114.780365] ath10k_pci 0000:03:00.0: [20]: 0x4092E4DC 0x0041A710 0x00000000 0x0F000000
[ 3114.780366] ath10k_pci 0000:03:00.0: [24]: 0x809432A7 0x0041A770 0x0040D400 0xC092E4DC
[ 3114.780367] ath10k_pci 0000:03:00.0: [28]: 0x80942BC4 0x0041A790 0xBBC6E1A5 0x00400000
[ 3114.780368] ath10k_pci 0000:03:00.0: [32]: 0x80947BA7 0x0041A7B0 0x00404D88 0x00413980
[ 3114.780369] ath10k_pci 0000:03:00.0: [36]: 0x809BDECC 0x0041A7D0 0x00404D88 0x00413980
[ 3114.780370] ath10k_pci 0000:03:00.0: [40]: 0x8099638C 0x0041A7F0 0x00404D88 0x00000000
[ 3114.780371] ath10k_pci 0000:03:00.0: [44]: 0x80992076 0x0041A810 0x004084F0 0x00405244
[ 3114.780372] ath10k_pci 0000:03:00.0: [48]: 0x80996BD3 0x0041A830 0x004084F0 0x00000000
[ 3114.780373] ath10k_pci 0000:03:00.0: [52]: 0x800B4405 0x0041A850 0x00422318 0x00005002
[ 3114.780374] ath10k_pci 0000:03:00.0: [56]: 0x809A6C34 0x0041A8E0 0x0042932C 0x0042CA44
[ 3114.780473] ath10k_pci 0000:03:00.0: could not suspend target (-108)
[ 3114.847671] ath10k_pci 0000:03:00.0: cannot restart a device that hasn't been started

Maybe not every time.

Anyway, just to be clear, it always fails to suspend to RAM. I'll try with 4.9.7, which should be in Fedora stable for Fedora 24 pretty soon.
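
To see which words change between two firmware register dumps, a rough Python sketch like the following may help. It assumes that the bracketed number on each row is the index of the first word on that row, as the step of 4 between rows suggests. The names `parse_dump` and `diff_dumps` are hypothetical, and the sample uses only the first two rows of two dumps from this report.

```python
import re

# A row looks like "[00]: 0x05010000 0x00000000 0x0092E4DC 0x365591B9".
ROW_RE = re.compile(r"\[(\d+)\]:((?:\s+0x[0-9A-Fa-f]{8})+)")


def parse_dump(text):
    """Map word index -> hex string for every register row in the text."""
    regs = {}
    for line in text.splitlines():
        m = ROW_RE.search(line)
        if not m:
            continue
        base = int(m.group(1))
        for offset, word in enumerate(m.group(2).split()):
            regs[base + offset] = word
    return regs


def diff_dumps(first, second):
    """List (index, first_value, second_value) for words that differ."""
    a, b = parse_dump(first), parse_dump(second)
    return [(i, a[i], b[i]) for i in sorted(a.keys() & b.keys()) if a[i] != b[i]]


FIRST = """\
[   61.105151] ath10k_pci 0000:03:00.0: [00]: 0x05010000 0x00000000 0x0092E4DC 0x365591B9
[   61.105151] ath10k_pci 0000:03:00.0: [04]: 0x0092E4DC 0x00060130 0x00000018 0x0041A760
"""
SECOND = """\
[ 3114.780360] ath10k_pci 0000:03:00.0: [00]: 0x05010000 0x00000000 0x0092E4DC 0xBBC6E1B9
[ 3114.780362] ath10k_pci 0000:03:00.0: [04]: 0x0092E4DC 0x00060130 0x00000018 0x0041A760
"""

if __name__ == "__main__":
    for index, old, new in diff_dumps(FIRST, SECOND):
        print("word %d: %s -> %s" % (index, old, new))
```

For these two rows, only word 3 differs between the two crashes.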
Comment 4 Y. G. (theYinYeti) 2017-05-11 07:29:39 UTC
Same here. My laptop is an Acer Aspire V13 (aka. V3-371), with ath10k_pci/qca6174 for the WiFi.

Closing the lid triggers suspend-to-RAM, which makes the WiFi firmware crash.
Consequences: the WiFi hardware becomes unresponsive, which leaves wpa_supplicant and NetworkManager frozen (not even kill -9 can kill the processes), and power-off takes ages!

Please find attached the output of `journalctl -b0`; problem starts at line 1562.

Note however that the _first_ suspend seems to work, and the problem appears at the second suspend (look for regex “Lid [co]”):

```
firmware crashed! (uuid aa12624f-8faa-4113-a852-5a04505c824b)
qca6174 hw2.1 target 0x05010000 chip_id 0x003405ff sub 11ad:0804
kconfig debug 0 debugfs 1 tracing 0 dfs 0 testmode 0
firmware ver SW_RM.1.1.1-00157-QCARMSWPZ-1 api 5 features ignore-otp,no-4addr-pad crc32 10bf8e08
board_file api 2 bmi_id N/A crc32 ae2e275a
htt-ver 3.1 wmi-op 4 htt-op 3 cal otp max-sta 32 raw 0 hwcrypto 1
firmware register dump:
[00]: 0x05010000 0x00000000 0x0092E4DC 0xD5B8816B
[04]: 0x0092E4DC 0x00060130 0x00000018 0x0041A760
[08]: 0xD5B88157 0x00400000 0x00000000 0x000A5C88
[12]: 0x00000009 0x00000000 0x0096C09C 0x0096C0A7
[16]: 0x0096BDBC 0x0099BA03 0x00000000 0x009287BD
[20]: 0x4092E4DC 0x0041A710 0x00000000 0x0F000000
[24]: 0x809432A7 0x0041A770 0x0040D400 0xC092E4DC
[28]: 0x80942BC4 0x0041A790 0xD5B88157 0x00400000
[32]: 0x80947BA7 0x0041A7B0 0x004050A8 0x00413980
[36]: 0x809BDECC 0x0041A7D0 0x004050A8 0x00413980
[40]: 0x8099638C 0x0041A7F0 0x004050A8 0x00000000
[44]: 0x80992076 0x0041A810 0x004084F0 0x00405244
[48]: 0x80996BD3 0x0041A830 0x004084F0 0x00000000
[52]: 0x800B4405 0x0041A850 0x00422318 0x00005002
[56]: 0x809A6C34 0x0041A8E0 0x0042932C 0x0042CA20
failed to submit keepalive on vdev 0: -108
failed to disable keepalive on vdev 0: -108
```

This log is from the Arch Linux 4.10 kernel. The laptop was previously running mainline kernel 4.5, with which suspend worked fine.
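
To check on which suspend cycle the crash appears, the "Lid [co]" search mentioned above can be scripted. Here is a minimal Python sketch that counts lid-close events seen before the first "firmware crashed!" line. It assumes the lid events appear in the journal as "Lid closed." and "Lid opened."; the sample lines are made up for illustration and are not taken from the attachment, and `suspends_before_crash` is a hypothetical name.

```python
import re

LID_RE = re.compile(r"Lid (closed|opened)")
CRASH_RE = re.compile(r"firmware crashed!")


def suspends_before_crash(journal_text):
    """Count 'Lid closed' events up to the first firmware crash.

    Returns None when the text contains no crash at all.
    """
    closes = 0
    for line in journal_text.splitlines():
        m = LID_RE.search(line)
        if m:
            if m.group(1) == "closed":
                closes += 1
        elif CRASH_RE.search(line):
            return closes
    return None


# Made-up journal lines, only to illustrate the expected shape.
SAMPLE = """\
systemd-logind[500]: Lid closed.
systemd-logind[500]: Lid opened.
systemd-logind[500]: Lid closed.
kernel: ath10k_pci 0000:03:00.0: firmware crashed!
"""

if __name__ == "__main__":
    print(suspends_before_crash(SAMPLE))
```

A result of 2 on a real journal would match the observation that the first suspend works and the second one crashes the firmware.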
Comment 5 Y. G. (theYinYeti) 2017-05-11 07:35:39 UTC
Created attachment 256367 [details]
`journalctl -b0` after ath10k_pci firmware crash on suspend on Acer Aspire V13
Comment 6 Y. G. (theYinYeti) 2017-05-11 08:05:41 UTC
See also: https://bugzilla.kernel.org/show_bug.cgi?id=195487
Comment 7 Matteo Croce 2017-09-12 09:46:53 UTC
I can confirm this on 4.12 at least.