Bug 217239 - ath11k: WCN6855: firmware -3.6510.23 (and later) breaks suspend on certain setups
Status: CLOSED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless
Hardware: AMD Linux
Importance: P1 blocking
Assignee: Kalle Valo
URL:
Keywords:
Duplicates: 218209
Depends on:
Blocks:
 
Reported: 2023-03-24 09:55 UTC by Vlad
Modified: 2024-02-09 16:07 UTC

See Also:
Kernel Version: 6.5.5
Subsystem:
Regression: No
Bisected commit-id:


Attachments
log from amd_s2idle.py script (125.76 KB, text/plain)
2023-05-04 10:56 UTC, Vlad

Description Vlad 2023-03-24 09:55:09 UTC
The driver prevents the system from entering suspend: it wakes up immediately. Unloading the module with modprobe -r ath11k_pci before suspend helps. Suspend worked before linux-firmware was updated to version 20230310-148. It now fails every time.

Distro: Fedora 37, Fedora 38.
Laptop: HONOR MagicBook 15
APU: Ryzen 5500U
Wifi adapter: Qualcomm QCNFA765 Wireless Network Adapter

Logs.

uname -a:
Linux fedora 6.2.7-300.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Mar 17 16:02:49 UTC 2023 x86_64 GNU/Linux

lspci -mnn:
00:00.0 "Host bridge [0600]" "Advanced Micro Devices, Inc. [AMD] [1022]" "Renoir/Cezanne Root Complex [1630]" -p00 "QUANTA Computer Inc [152d]" "Device [1319]"
00:00.2 "IOMMU [0806]" "Advanced Micro Devices, Inc. [AMD] [1022]" "Renoir/Cezanne IOMMU [1631]" -p00 "QUANTA Computer Inc [152d]" "Device [1319]"
00:01.0 "Host bridge [0600]" "Advanced Micro Devices, Inc. [AMD] [1022]" "Renoir PCIe Dummy Host Bridge [1632]" -p00 "" ""
00:02.0 "Host bridge [0600]" "Advanced Micro Devices, Inc. [AMD] [1022]" "Renoir PCIe Dummy Host Bridge [1632]" -p00 "" ""
00:02.2 "PCI bridge [0604]" "Advanced Micro Devices, Inc. [AMD] [1022]" "Renoir/Cezanne PCIe GPP Bridge [1634]" -p00 "QUANTA Computer Inc [152d]" "Device [1319]"
00:02.4 "PCI bridge [0604]" "Advanced Micro Devices, Inc. [AMD] [1022]" "Renoir/Cezanne PCIe GPP Bridge [1634]" -p00 "QUANTA Computer Inc [152d]" "Device [1319]"
00:08.0 "Host bridge [0600]" "Advanced Micro Devices, Inc. [AMD] [1022]" "Renoir PCIe Dummy Host Bridge [1632]" -p00 "" ""
00:08.1 "PCI bridge [0604]" "Advanced Micro Devices, Inc. [AMD] [1022]" "Renoir Internal PCIe GPP Bridge to Bus [1635]" -p00 "QUANTA Computer Inc [152d]" "Device [1319]"
00:14.0 "SMBus [0c05]" "Advanced Micro Devices, Inc. [AMD] [1022]" "FCH SMBus Controller [790b]" -r51 -p00 "QUANTA Computer Inc [152d]" "Device [1319]"
00:14.3 "ISA bridge [0601]" "Advanced Micro Devices, Inc. [AMD] [1022]" "FCH LPC Bridge [790e]" -r51 -p00 "QUANTA Computer Inc [152d]" "Device [1319]"
00:18.0 "Host bridge [0600]" "Advanced Micro Devices, Inc. [AMD] [1022]" "Renoir Device 24: Function 0 [1448]" -p00 "" ""
00:18.1 "Host bridge [0600]" "Advanced Micro Devices, Inc. [AMD] [1022]" "Renoir Device 24: Function 1 [1449]" -p00 "" ""
00:18.2 "Host bridge [0600]" "Advanced Micro Devices, Inc. [AMD] [1022]" "Renoir Device 24: Function 2 [144a]" -p00 "" ""
00:18.3 "Host bridge [0600]" "Advanced Micro Devices, Inc. [AMD] [1022]" "Renoir Device 24: Function 3 [144b]" -p00 "" ""
00:18.4 "Host bridge [0600]" "Advanced Micro Devices, Inc. [AMD] [1022]" "Renoir Device 24: Function 4 [144c]" -p00 "" ""
00:18.5 "Host bridge [0600]" "Advanced Micro Devices, Inc. [AMD] [1022]" "Renoir Device 24: Function 5 [144d]" -p00 "" ""
00:18.6 "Host bridge [0600]" "Advanced Micro Devices, Inc. [AMD] [1022]" "Renoir Device 24: Function 6 [144e]" -p00 "" ""
00:18.7 "Host bridge [0600]" "Advanced Micro Devices, Inc. [AMD] [1022]" "Renoir Device 24: Function 7 [144f]" -p00 "" ""
01:00.0 "Network controller [0280]" "Qualcomm Technologies, Inc [17cb]" "QCNFA765 Wireless Network Adapter [1103]" -r01 -p00 "Foxconn International, Inc. [105b]" "Device [e0ca]"
02:00.0 "Non-Volatile memory controller [0108]" "Sandisk Corp [15b7]" "WD Blue SN550 NVMe SSD [5009]" -r01 -p02 "Sandisk Corp [15b7]" "WD Blue SN550 NVMe SSD [5009]"
03:00.0 "VGA compatible controller [0300]" "Advanced Micro Devices, Inc. [AMD/ATI] [1002]" "Lucienne [164c]" -rc2 -p00 "QUANTA Computer Inc [152d]" "Device [1410]"
03:00.1 "Audio device [0403]" "Advanced Micro Devices, Inc. [AMD/ATI] [1002]" "Renoir Radeon High Definition Audio Controller [1637]" -p00 "QUANTA Computer Inc [152d]" "Device [1319]"
03:00.2 "Encryption controller [1080]" "Advanced Micro Devices, Inc. [AMD] [1022]" "Family 17h (Models 10h-1fh) Platform Security Processor [15df]" -p00 "QUANTA Computer Inc [152d]" "Device [1319]"
03:00.3 "USB controller [0c03]" "Advanced Micro Devices, Inc. [AMD] [1022]" "Renoir/Cezanne USB 3.1 [1639]" -p30 "QUANTA Computer Inc [152d]" "Device [1319]"
03:00.4 "USB controller [0c03]" "Advanced Micro Devices, Inc. [AMD] [1022]" "Renoir/Cezanne USB 3.1 [1639]" -p30 "QUANTA Computer Inc [152d]" "Device [1319]"
03:00.5 "Multimedia controller [0480]" "Advanced Micro Devices, Inc. [AMD] [1022]" "ACP/ACP3X/ACP6x Audio Coprocessor [15e2]" -r01 -p00 "QUANTA Computer Inc [152d]" "Device [1319]"
03:00.6 "Audio device [0403]" "Advanced Micro Devices, Inc. [AMD] [1022]" "Family 17h/19h HD Audio Controller [15e3]" -p00 "QUANTA Computer Inc [152d]" "Device [1319]"

find /lib/firmware/ath11k/ -type f | xargs md5sum:
3137b3e52626593296473ee16afb3baf  /lib/firmware/ath11k/IPQ5018/hw1.0/Notice.txt.xz
4f0aeb7f3b2e5690e8a8c09929967573  /lib/firmware/ath11k/IPQ5018/hw1.0/board-2.bin.xz
8740b99fb61e978f87967fa9e1f85fcd  /lib/firmware/ath11k/IPQ5018/hw1.0/m3_fw.b00.xz
f16b3ce3246f7705e80c24b99f43975f  /lib/firmware/ath11k/IPQ5018/hw1.0/m3_fw.b01.xz
c59da31bfbe8538f93a7deb564f9cf08  /lib/firmware/ath11k/IPQ5018/hw1.0/m3_fw.b02.xz
9e88fcc1a07de0438f21b9c493c5bfef  /lib/firmware/ath11k/IPQ5018/hw1.0/m3_fw.flist.xz
c0aafc9c1f06aba7bcb592abac9dfc98  /lib/firmware/ath11k/IPQ5018/hw1.0/m3_fw.mdt.xz
c549b368d7727741fd135b3e30d5abc6  /lib/firmware/ath11k/IPQ5018/hw1.0/q6_fw.b00.xz
6bf63411912a293f53a5e11dce20112b  /lib/firmware/ath11k/IPQ5018/hw1.0/q6_fw.b01.xz
2ed668ffe8a5fd829900f6a151cf51a7  /lib/firmware/ath11k/IPQ5018/hw1.0/q6_fw.b02.xz
21503f14f04467ebb513170ddbfdd5b1  /lib/firmware/ath11k/IPQ5018/hw1.0/q6_fw.b03.xz
1bb35a37aefb331919365eb9f1a747b5  /lib/firmware/ath11k/IPQ5018/hw1.0/q6_fw.b04.xz
39370b1e0f4d2c7c9ba82802cfd35830  /lib/firmware/ath11k/IPQ5018/hw1.0/q6_fw.b05.xz
16a3531627ce3dbd59b7573fa9fc5432  /lib/firmware/ath11k/IPQ5018/hw1.0/q6_fw.b07.xz
1082519c851404c4a06fa13c864fe106  /lib/firmware/ath11k/IPQ5018/hw1.0/q6_fw.b08.xz
b3822ee68812281f9260f47a7a08b5b4  /lib/firmware/ath11k/IPQ5018/hw1.0/q6_fw.b09.xz
6cb61159924728db1c7469f25e203817  /lib/firmware/ath11k/IPQ5018/hw1.0/q6_fw.b10.xz
94c050e055f8bccec92dfbacba223ffd  /lib/firmware/ath11k/IPQ5018/hw1.0/q6_fw.b11.xz
112336431b6b092743d1f76da74e5939  /lib/firmware/ath11k/IPQ5018/hw1.0/q6_fw.b13.xz
3539f9a108bc7c29b060936136996cf0  /lib/firmware/ath11k/IPQ5018/hw1.0/q6_fw.b14.xz
63895167a38bd7176e669cabbc52e242  /lib/firmware/ath11k/IPQ5018/hw1.0/q6_fw.flist.xz
e9f7787d9355b9dec449d30f6f766d60  /lib/firmware/ath11k/IPQ5018/hw1.0/q6_fw.mdt.xz
3137b3e52626593296473ee16afb3baf  /lib/firmware/ath11k/IPQ6018/hw1.0/Notice.txt.xz
fb4cc27dab20aad56f4050c8476e5bbf  /lib/firmware/ath11k/IPQ6018/hw1.0/board-2.bin.xz
9e80f0cd50571d06f667a7e15799f0a3  /lib/firmware/ath11k/IPQ6018/hw1.0/m3_fw.b00.xz
67cf279ad77670ccf2499392e08fe50a  /lib/firmware/ath11k/IPQ6018/hw1.0/m3_fw.b01.xz
5121c5c1529f8509d1966dae73ddec2b  /lib/firmware/ath11k/IPQ6018/hw1.0/m3_fw.b02.xz
b705372f21c2e2c89d8ec456c18e1cc2  /lib/firmware/ath11k/IPQ6018/hw1.0/m3_fw.flist.xz
a7b9bbbbaa8a49b863eb06da1fc82af4  /lib/firmware/ath11k/IPQ6018/hw1.0/m3_fw.mdt.xz
807826d6df871cfc464377f0cd55e1bc  /lib/firmware/ath11k/IPQ6018/hw1.0/q6_fw.b00.xz
3dfffc87c0232a200e921b8d500cddbc  /lib/firmware/ath11k/IPQ6018/hw1.0/q6_fw.b01.xz
c010d9478fad3b40bbcc10837df26c5c  /lib/firmware/ath11k/IPQ6018/hw1.0/q6_fw.b02.xz
f01839ab1c38959399af3acc6ba9015c  /lib/firmware/ath11k/IPQ6018/hw1.0/q6_fw.b03.xz
211659cf209b06211751980b482ca35e  /lib/firmware/ath11k/IPQ6018/hw1.0/q6_fw.b04.xz
7f12d3d53156d75ebca50bae8913362f  /lib/firmware/ath11k/IPQ6018/hw1.0/q6_fw.b05.xz
bfbc83bf5f694a1e1edc06b70b2c6758  /lib/firmware/ath11k/IPQ6018/hw1.0/q6_fw.b07.xz
ceff8e218f026b0afc6973a07d152a12  /lib/firmware/ath11k/IPQ6018/hw1.0/q6_fw.b08.xz
0340d651128f833aae391f4f46d53f79  /lib/firmware/ath11k/IPQ6018/hw1.0/q6_fw.flist.xz
64a79ea34805b85f58863423492faa1e  /lib/firmware/ath11k/IPQ6018/hw1.0/q6_fw.mdt.xz
3137b3e52626593296473ee16afb3baf  /lib/firmware/ath11k/IPQ8074/hw2.0/Notice.txt.xz
75a55a5b68fc7e3d93b77efc78de5787  /lib/firmware/ath11k/IPQ8074/hw2.0/board-2.bin.xz
e3e870cc4e9c27c217f2b2c8a7422727  /lib/firmware/ath11k/IPQ8074/hw2.0/m3_fw.b00.xz
a90eb005d94faf76d0f3e9f336326d6b  /lib/firmware/ath11k/IPQ8074/hw2.0/m3_fw.b01.xz
c77bfebbca12800a4292945ff7b7e2c9  /lib/firmware/ath11k/IPQ8074/hw2.0/m3_fw.b02.xz
634b5521539681dfbed9625782a0fa07  /lib/firmware/ath11k/IPQ8074/hw2.0/m3_fw.flist.xz
0d0de5abc558fd0786efe318cdfae418  /lib/firmware/ath11k/IPQ8074/hw2.0/m3_fw.mdt.xz
9c2c0cd4e890f448933d2c5016df0e5d  /lib/firmware/ath11k/IPQ8074/hw2.0/q6_fw.b00.xz
ed37c681ed978feed154ca0451e0c349  /lib/firmware/ath11k/IPQ8074/hw2.0/q6_fw.b01.xz
542a89fe5454bcc7a8353a5948f47ff5  /lib/firmware/ath11k/IPQ8074/hw2.0/q6_fw.b02.xz
bbe9d91114db8343650fbe1d72877189  /lib/firmware/ath11k/IPQ8074/hw2.0/q6_fw.b03.xz
91d66e56b232b58c55d85258cb0ff58e  /lib/firmware/ath11k/IPQ8074/hw2.0/q6_fw.b04.xz
bcf379e0e008664c0dce8b719a98a2c0  /lib/firmware/ath11k/IPQ8074/hw2.0/q6_fw.b05.xz
481794868c2e44c6eb4b3b07f88963c9  /lib/firmware/ath11k/IPQ8074/hw2.0/q6_fw.b07.xz
67069281dc198147b45ab6c30b728ef1  /lib/firmware/ath11k/IPQ8074/hw2.0/q6_fw.b08.xz
bb69a475d73f4012ef97060ea5415d5c  /lib/firmware/ath11k/IPQ8074/hw2.0/q6_fw.flist.xz
7abdcc194cc8c560b753052f9e76f83b  /lib/firmware/ath11k/IPQ8074/hw2.0/q6_fw.mdt.xz
0bd91848c55b8980d288a7c42464f19f  /lib/firmware/ath11k/QCA6390/hw2.0/Notice.txt.xz
08f6bd67666917dc88e4626d006cd64c  /lib/firmware/ath11k/QCA6390/hw2.0/amss.bin.xz
7fdf7054861672ed3db5dc4c273830cb  /lib/firmware/ath11k/QCA6390/hw2.0/board-2.bin.xz
c9498e2021e12b95612572e4a9104704  /lib/firmware/ath11k/QCA6390/hw2.0/m3.bin.xz
3137b3e52626593296473ee16afb3baf  /lib/firmware/ath11k/QCN9074/hw1.0/Notice.txt.xz
67c06948add663c11580e1d2d372e090  /lib/firmware/ath11k/QCN9074/hw1.0/amss.bin.xz
692733948778a4fd161db3e4adb12e5e  /lib/firmware/ath11k/QCN9074/hw1.0/board-2.bin.xz
6528db410e5c659e61c083b909f732f5  /lib/firmware/ath11k/QCN9074/hw1.0/m3.bin.xz
795d7e608177237bc71043b78c6ec2aa  /lib/firmware/ath11k/WCN6750/hw1.0/Notice.txt.xz
424ced991d6957cfa9a45c4406515955  /lib/firmware/ath11k/WCN6750/hw1.0/board-2.bin.xz
899cacf1070a4ccb0cab06b3f232c840  /lib/firmware/ath11k/WCN6750/hw1.0/wpss.b00.xz
8109a191ee442d99b1bff600b30c21df  /lib/firmware/ath11k/WCN6750/hw1.0/wpss.b01.xz
49d8f3d6e8973f17fd47893d091c889a  /lib/firmware/ath11k/WCN6750/hw1.0/wpss.b02.xz
c6d7fb6cbc457ba7378d21bd9caa2ea2  /lib/firmware/ath11k/WCN6750/hw1.0/wpss.b03.xz
33a6c6dd702359735d03f445abc7d087  /lib/firmware/ath11k/WCN6750/hw1.0/wpss.b04.xz
da294c8f00bfc0fa549c7d325d91cfa0  /lib/firmware/ath11k/WCN6750/hw1.0/wpss.b05.xz
848236197a1369e9ec39e7a7e9dba745  /lib/firmware/ath11k/WCN6750/hw1.0/wpss.b06.xz
24dc6f2f619d85fb78981d8a1a7c9e0b  /lib/firmware/ath11k/WCN6750/hw1.0/wpss.b07.xz
a39e73f232d88ed23a6821d1ede10d39  /lib/firmware/ath11k/WCN6750/hw1.0/wpss.b08.xz
4ab0297e3c840f0b8ac3aa178bd0264a  /lib/firmware/ath11k/WCN6750/hw1.0/wpss.mdt.xz
a0ec40099f15ba93b0ba6de3f939b148  /lib/firmware/ath11k/WCN6855/hw2.0/Notice.txt.xz
3ffe318752c22395c04c56d27328dad0  /lib/firmware/ath11k/WCN6855/hw2.0/amss.bin.xz
e86d650894182c8df254cd113e7ddaa9  /lib/firmware/ath11k/WCN6855/hw2.0/board-2.bin.xz
3c8634cfea019c810fe9744c50e85bb2  /lib/firmware/ath11k/WCN6855/hw2.0/m3.bin.xz
e6fc32d432478538fe117bcfca7efadf  /lib/firmware/ath11k/WCN6855/hw2.0/regdb.bin.xz

dmesg | grep ath11k:
[    6.713249] ath11k_pci 0000:01:00.0: BAR 0: assigned [mem 0xd0000000-0xd01fffff 64bit]
[    6.713267] ath11k_pci 0000:01:00.0: enabling device (0000 -> 0002)
[    6.713827] ath11k_pci 0000:01:00.0: MSI vectors: 32
[    6.713835] ath11k_pci 0000:01:00.0: wcn6855 hw2.0
[    7.706819] ath11k_pci 0000:01:00.0: chip_id 0x2 chip_family 0xb board_id 0xff soc_id 0x400c0200
[    7.706828] ath11k_pci 0000:01:00.0: fw_version 0x110b196e fw_build_timestamp 2022-12-22 12:54 fw_build_id WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.23
[    8.065447] ath11k_pci 0000:01:00.0 wlp1s0: renamed from wlan0
[  200.812572] ath11k_pci 0000:01:00.0: msdu_done bit in attention is not set
[ 1081.710207] ath11k_pci 0000:01:00.0: failed to enqueue rx buf: -28
[ 1408.849132] ath11k_pci 0000:01:00.0: Failed to set the requested Country regulatory setting
[ 1408.849474] ath11k_pci 0000:01:00.0: Failed to set the requested Country regulatory setting
[ 2115.691602] ath11k_pci 0000:01:00.0: BAR 0: assigned [mem 0xd0000000-0xd01fffff 64bit]
[ 2115.692061] ath11k_pci 0000:01:00.0: MSI vectors: 32
[ 2115.692068] ath11k_pci 0000:01:00.0: wcn6855 hw2.0
[ 2116.682002] ath11k_pci 0000:01:00.0: chip_id 0x2 chip_family 0xb board_id 0xff soc_id 0x400c0200
[ 2116.682010] ath11k_pci 0000:01:00.0: fw_version 0x110b196e fw_build_timestamp 2022-12-22 12:54 fw_build_id WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.23
[ 2117.068573] ath11k_pci 0000:01:00.0 wlp1s0: renamed from wlan0
Comment 1 Kalle Valo 2023-03-24 10:20:34 UTC
Could you verify that this release works correctly:

https://github.com/kvalo/ath11k-firmware/tree/master/WCN6855/hw2.0/1.1/WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.16

Just copy amss.bin and m3.bin to the directory:

/lib/firmware/ath11k/WCN6855/hw2.0/

And then reboot. Back up the original files before copying. Verify that the firmware release is correct with 'dmesg | grep ath11k'.
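The manual install described above could be scripted roughly as follows. This is only a sketch: the function name install_ath11k_fw, the backup directory naming and the optional destination argument are assumptions made for illustration. On distributions that ship compressed firmware (such as the .xz files in the listing above), the uncompressed copies are expected to take precedence, but that is an assumption, so the dmesg check remains the real verification.

```shell
#!/bin/sh
# install_ath11k_fw SRC_DIR [DEST_DIR]
# Copies amss.bin and m3.bin from SRC_DIR into the firmware directory,
# after backing up whatever is currently installed there.
# Prints the backup directory on success. (Hypothetical helper.)
install_ath11k_fw() {
    src="$1"
    dest="${2:-/lib/firmware/ath11k/WCN6855/hw2.0}"
    backup="$dest/backup-$(date +%Y%m%d-%H%M%S)"

    # Refuse to touch anything unless both new files are present.
    for f in amss.bin m3.bin; do
        if [ ! -f "$src/$f" ]; then
            echo "missing $src/$f" >&2
            return 1
        fi
    done

    mkdir -p "$backup" || return 1

    for f in amss.bin m3.bin; do
        # Back up the installed file, compressed or not.
        for old in "$dest/$f" "$dest/$f.xz" "$dest/$f.zst"; do
            if [ -e "$old" ]; then
                cp -p "$old" "$backup/" || return 1
            fi
        done
        cp "$src/$f" "$dest/$f" || return 1
    done

    echo "$backup"
}
```

Run it as root with the directory that holds the downloaded amss.bin and m3.bin as the first argument, then reboot and check 'dmesg | grep ath11k'.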
Comment 2 Vlad 2023-03-24 14:05:17 UTC
(In reply to Kalle Valo from comment #1)
> Could you verify that this release works correctly:
> 
> https://github.com/kvalo/ath11k-firmware/tree/master/WCN6855/hw2.0/1.1/WLAN.
> HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.16
> 
> Just copy amss.bin and m3.bin to the directory:
> 
> /lib/firmware/ath11k/WCN6855/hw2.0/
> 
> And then reboot. Just backup the original files first before the copy.
> Verify that the firmware release is correct with 'dmesg | grep ath11k'.

Yes, it works great with this release!
Comment 3 Kalle Valo 2023-03-24 18:07:48 UTC
Thanks for confirmation, so this is a regression in the latest firmware release WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.23. I have reported this to the firmware team.
Comment 4 Vlad 2023-04-23 13:04:29 UTC
Latest firmware update version 20230404 still contains this bug and it is still version WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.23 with this suspend regression.
Comment 5 Mario Limonciello (AMD) 2023-05-03 15:12:46 UTC
This also got reported into AMD's DRM bug tracker because it collided with a separate warning regression.

https://gitlab.freedesktop.org/drm/amd/-/issues/2539

FWIW this WCN6855 firmware update fixes a resume problem where that GPIO is active causing spurious wakeups, so just reverting back to the old one is trading one issue for another.

If you can generate a report with https://gitlab.freedesktop.org/drm/amd/-/blob/master/scripts/amd_s2idle.py I can suggest a workaround to add to the kernel command line until the firmware is fixed.
Comment 6 Jürg Billeter 2023-05-04 09:17:17 UTC
If I work around the suspend issue as per the AMD DRM issue with:

acpi_mask_gpe=0x0e gpiolib_acpi.ignore_interrupt=AMDI0030:00@18

WLAN frequently breaks on resume with the following messages (can provide full log if needed):

mhi mhi0: Did not enter M0 state, MHI state: M3, PM state: SYS ERROR Detect
ath11k_pci 0000:02:00.0: failed to resume mhi: -5
ath11k_pci 0000:02:00.0: failed to resume hif during resume: -5
ath11k_pci 0000:02:00.0: failed to resume core: -5
ath11k_pci 0000:02:00.0: PM: dpm_run_callback(): pci_pm_resume+0x0/0xe0 returns -5
ath11k_pci 0000:02:00.0: PM: failed to resume async: error -5
ath11k_pci 0000:02:00.0: wmi command 16387 timeout
ath11k_pci 0000:02:00.0: failed to send WMI_PDEV_SET_PARAM cmd
ath11k_pci 0000:02:00.0: failed to enable dynamic bw: -11
Hardware became unavailable upon resume. This could be a software issue prior to suspend or a hardware issue.
WARNING: CPU: 10 PID: 9 at net/mac80211/util.c:2553 ieee80211_reconfig+0xa0/0x1760 [mac80211]

And this even leads to a system reset and WLAN was still unavailable after reboot. It works again after a power cycle.

I've switched back to the old firmware (3.6510.9), which seems to be working fine so far even with Linux 6.2.14.
Comment 7 Vlad 2023-05-04 10:56:21 UTC
Created attachment 304213 [details]
log from amd_s2idle.py script

(In reply to Mario Limonciello (AMD) from comment #5)
> This also got reported into AMD's DRM bug tracker because it collided with a
> separate warning regression.
> 
> https://gitlab.freedesktop.org/drm/amd/-/issues/2539
> 
> FWIW this WCN6855 firmware update fixes a resume problem where that GPIO is
> active causing spurious wakeups, so just reverting back to the old one is
> trading one issue for another.
> 
> If you can generate a report with
> https://gitlab.freedesktop.org/drm/amd/-/blob/master/scripts/amd_s2idle.py I
> can suggest a workaround to run the kernel command line until the firmware
> is fixed.

I ran the script with two suspend cycles: the first one woke up immediately, the second one suspended fine. But I found out today that sometimes it wakes up even with the module disabled, only after 10-20 minutes. It is super annoying.

Here is the log generated by the script:
Comment 8 Vlad 2023-05-04 12:43:05 UTC
The simplest workaround until the bug is fixed is to create a systemd service that unloads the module on suspend and loads it again on wakeup.

Create the file /etc/systemd/system/root-suspend-fix.service:

[Unit]
Description=Suspend fix for ath11k_pci
Before=sleep.target
StopWhenUnneeded=yes

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=-modprobe -r ath11k_pci
ExecStop=-modprobe ath11k_pci

[Install]
WantedBy=sleep.target

Enable service:
sudo systemctl enable --now root-suspend-fix.service
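The same unload/reload idea can also be sketched as a systemd system-sleep hook. This is a different technique from the unit above, shown only as an alternative: systemd runs the executables in /usr/lib/systemd/system-sleep/ with "pre" or "post" as the first argument. The file name ath11k-reload and the MODPROBE override (useful for a dry run) are assumptions.

```shell
#!/bin/sh
# Hypothetical hook file: /usr/lib/systemd/system-sleep/ath11k-reload
# (must be executable). systemd passes "pre" before sleeping and "post"
# after waking as $1; $2 names the sleep action and is ignored here.

# Allow a dry run, e.g. MODPROBE=echo, without touching the real module.
MODPROBE="${MODPROBE:-modprobe}"

ath11k_sleep_hook() {
    case "${1:-}" in
        pre)  $MODPROBE -r ath11k_pci ;;   # unload before suspend
        post) $MODPROBE ath11k_pci ;;      # load again after resume
        *)    return 0 ;;                  # anything else: do nothing
    esac
}

ath11k_sleep_hook "$@"
```

Use either the service or the hook, not both, otherwise the module is unloaded and reloaded twice.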
Comment 9 Mario Limonciello (AMD) 2023-05-04 13:23:36 UTC
> Hardware became unavailable upon resume. This could be a software issue prior
> to suspend or a hardware issue.

@Kalle - the fact that ignoring the pin leads to this behavior makes me wonder if ath11k_pci is missing a check in the suspend path related to active firmware state?

> I've switched back to the old firmware (3.6510.9), which seems to be working
> fine so far even with Linux 6.2.14.

FYI going back to the older firmware will lead to the system waking up on lid close.

> There is log generated by script:

The same kernel command line workaround to ignore the pin would apply to your system, but as Jürg mentioned some negative side effects I wouldn't suggest it anymore.
Comment 10 bzsbzs 2023-05-29 07:10:23 UTC
Any updates on this? linux-firmware-20230515 still contains fw_version 0x110b196e fw_build_timestamp 2022-12-22 12:54 fw_build_id WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.23
Comment 11 Vlad 2023-10-03 17:03:45 UTC
Little update. Problem still exists:
- Kernel 6.5.5
- linux-firmware 20230919-1
- fw_version 0x110b196e fw_build_timestamp 2022-12-22 12:54 fw_build_id WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.23
Comment 12 Philipp 2023-10-09 18:52:25 UTC
> I have reported this to the firmware team.

Could you provide a pointer to where you reported this?

What is the typical process of such a report? Is the code where this regression happened open-sourced? Can we diff the changes between the last good and known broken version?

Is there anything affected users can do to help? Do you need more information or more experiments to be done?
Comment 13 bouke 2023-10-14 09:40:24 UTC
TLDR: I was affected by this issue, but I might have found a workaround/fix by changing the power settings from 'on' to 'auto': 

> echo 'auto' > '/sys/bus/pci/devices/0000:02:00.0/power/control'

More info:

Current system:
Distro: Ubuntu 22.04
Laptop: Thinkpad T14 Gen 3 AMD
CPU: AMD 6850U
Kernel: 6.2.0
Wifi adapter: Qualcomm Atheros QCNFA765
Driver: ath11k_pci 0000:02:00.0: fw_version 0x110b196e fw_build_timestamp 2022-12-22 12:54 fw_build_id WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.23

Problems with the laptop suspend that led me to this ticket:

- System waking from suspend randomly, in bag on the go or just at rest on my desk.
- After waking from suspend (automatically or intended), total system freeze.

I tried various versions of the driver as suggested by Kalle in comment 1, but couldn't get consistent behavior. I then worked with the systemd service suggested by Vlad in comment 8. This prevented the random wakes, but the system freeze would happen any time I loaded the ath11k_pci driver again through modprobe. So I removed the ExecStop clause, and after suspend I would have no wifi until reboot.

I was investigating power consumption with powertop the other day, and one of the tunables that were `BAD` was the "Runtime PM for PCI Device Qualcomm Atheros QCNFA765". So I toggled it to `GOOD`, which executes "echo 'auto' > '/sys/bus/pci/devices/0000:02:00.0/power/control'". I disabled the systemd service that removes the driver on suspend, and haven't had suspend problems since.

Now I'm not knowledgeable enough to be sure that changing this setting fixes anything, it could also be coincidence. Some system update might've fixed the problem recently, and I just happened to discover that because the powertop suggestion prompted me to disable the workaround systemd service and try things out again. Supporting that theory, the power setting is automatically set to 'on' when plugging in the power adapter (system switches to 'performance' mode), and then suspend still works correctly. I just hope this information might help someone out or contribute to a fix, because this issue can be really annoying!
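The runtime PM toggle described above can be wrapped in a small helper. This is a sketch under assumptions: the function name set_runtime_pm and the optional sysfs root argument (there only so the function can be tried against a scratch directory) are not from the report, and the PCI address differs per machine (0000:02:00.0 here, 0000:01:00.0 in the original description). The setting does not persist across reboots.

```shell
#!/bin/sh
# set_runtime_pm PCI_ADDR MODE [SYSFS_ROOT]
# Writes 'on' or 'auto' to the device's power/control file and prints
# the value read back. (Hypothetical helper.)
set_runtime_pm() {
    ctl="${3:-/sys}/bus/pci/devices/$1/power/control"

    # Only the two values discussed in this report are accepted here.
    case "$2" in
        on|auto) ;;
        *) echo "mode must be 'on' or 'auto'" >&2; return 2 ;;
    esac

    if [ ! -w "$ctl" ]; then
        echo "cannot write $ctl (wrong address, or not root?)" >&2
        return 1
    fi

    printf '%s\n' "$2" > "$ctl" && cat "$ctl"
}
```

For example, as root: set_runtime_pm 0000:02:00.0 auto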
Comment 14 Mario Limonciello (AMD) 2023-10-15 18:57:28 UTC
> Now I'm not knowledgeable enough to be sure that changing this setting fixes
> anything, it could also be coincidence.

I've noticed an "easy" way to trigger this issue is to use the amd_s2idle.py script with a short period of time between cycles and a lot of cycles. For example, 4 seconds between 30-second cycles, running for hours.

If your theory is correct, the policy should continue to apply between cycles and you can see if you can still break it.
Comment 15 Jürg Billeter 2023-11-03 10:27:09 UTC
(In reply to bouke from comment #13)
> TLDR: I was affected by this issue, but I might have found a workaround/fix
> by changing the power settings from 'on' to 'auto': 
> 
> > echo 'auto' > '/sys/bus/pci/devices/0000:02:00.0/power/control'

This doesn't seem to help on my laptop. It's still waking up immediately when trying to suspend.
Comment 16 Kalle Valo 2023-12-20 07:43:13 UTC
(In reply to Philipp from comment #12)
> > I have reported this to the firmware team.
> 
> Could you provide a pointer to where you reported this?

I contacted them privately but I'll ping them again.
 
> What is the typical process of such a report? Is the code where this
> regression happened open-sourced? Can we diff the changes between the last
> good and known broken version?

Unfortunately the firmware is closed source, so there is not much the community can do other than test different firmware versions and provide results.

> Is there anything affected users can do to help? Do you need more
> information or more experiments to be done?

The most helpful thing is to provide information on how to reproduce this. The challenge here is that only some portion of users seem to see this and the reason for that is unclear. If people who are experiencing the bug could provide detailed information about their hardware (laptop make, model, BIOS version and so on), hopefully that gives us some leads.

Just to clarify: this bug report is *ONLY* about a firmware regression where suspend works without issues with WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.16 but fails every time with release WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.23 (or later). Any other issues should be filed on a new report. This is very important: if there are multiple different issues on the same report, it becomes difficult to make sense of it all.
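Collecting the details asked for above could look roughly like this. It is a sketch, not an official reporting tool: the function names are made up, and it assumes the usual DMI files under /sys/class/dmi/id exist on the machine. The optional root argument is only there so the function can be pointed at a scratch directory.

```shell
#!/bin/sh
# collect_hw_info [ROOT]
# Prints laptop make, model and BIOS version from DMI, plus the kernel
# release. (Hypothetical helper.)
collect_hw_info() {
    dmi="${1:-}/sys/class/dmi/id"
    for f in sys_vendor product_name product_version bios_version; do
        if [ -r "$dmi/$f" ]; then
            printf '%s: %s\n' "$f" "$(cat "$dmi/$f")"
        fi
    done
    printf 'kernel: %s\n' "$(uname -r)"
}

# fw_build_id_from_log
# Reads dmesg-style lines on stdin and prints the last fw_build_id seen.
fw_build_id_from_log() {
    sed -n 's/.*fw_build_id \([^ ]*\).*/\1/p' | tail -n 1
}
```

Typical use: run collect_hw_info, then 'dmesg | grep ath11k_pci | fw_build_id_from_log' (reading dmesg may require root), and paste both outputs into the comment.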
Comment 17 Kalle Valo 2023-12-20 08:12:06 UTC
(In reply to bouke from comment #13)
> TLDR: I was affected by this issue, but I might have found a workaround/fix
> by changing the power settings from 'on' to 'auto': 
> 
> > echo 'auto' > '/sys/bus/pci/devices/0000:02:00.0/power/control'
> 
> More info:
> 
> Current system:
> Distro: Ubuntu 22.04
> Laptop: Thinkpad T14 Gen 3 AMD
> CPU: AMD 6850U
> Kernel: 6.2.0
> Wifi adapter: Qualcomm Atheros QCNFA765
> Driver: ath11k_pci 0000:02:00.0: fw_version 0x110b196e fw_build_timestamp
> 2022-12-22 12:54 fw_build_id
> WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.23
> 
> Problems with the laptop suspend that led me to this ticket:
> 
> - System waking from suspend randomly, in bag on the go or just at rest on
> my desk.
> - After waking from suspend (automatically or intended), total system freeze.
> 
> I tried various versions of the driver as suggested by Kalle in comment 1,
> but couldnt get consistent behavior.

This sounds like a different problem, please file a new report. Let's try to keep the bug reports clean, otherwise it's so difficult to work with them. Only one issue per report, please.
Comment 18 Kalle Valo 2023-12-20 08:15:25 UTC
(In reply to Mario Limonciello (AMD) from comment #9)
> > Hardware became unavailable upon resume. This could be a software issue
> prior
> > to suspend or a hardware issue.
> 
> @Kalle - the fact that ignoring the pin leads to this behavior makes me
> wonder if ath11k_pci is missing a check in the suspend path related to
> active firmware state?
You mean that there would be a race condition in the ath11k suspend handler? That is quite possible. But on the other hand, if the reporter sees that one firmware release always works and another release always fails, that does not sound like a race condition to me. This is a tricky problem.
Comment 19 Kalle Valo 2023-12-20 17:53:34 UTC
The firmware team reports that one possible issue was fixed recently. Could people seeing the suspend regression test this firmware release to see if the bug still happens?

https://github.com/kvalo/ath11k-firmware/tree/master/WCN6855/hw2.0/1.1/WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.33

This is just a shot in the dark but it's good to test it. See my instructions in comment #1 for how to install the release manually. Do let me know the results and also detailed hardware and software info (laptop, kernel version etc.)
Comment 20 Kalle Valo 2023-12-21 13:15:37 UTC
(In reply to Kalle Valo from comment #19)
> The firmware team reports that one possible issue was fixed recently, could
> people seeing the suspend regression test this firmware release to see if
> the bug still happens:
> 
> https://github.com/kvalo/ath11k-firmware/tree/master/WCN6855/hw2.0/1.1/WLAN.
> HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.33
> 
> This is just a shot in the dark but it's good to test it. See my
> instructions in comment #1 how to install the release manually. Do let me
> know the results and also detailed hardware and software info (laptop,
> kernel version etc.)

Actually there's a new version now available, please test this one instead:

https://github.com/kvalo/ath11k-firmware/tree/master/WCN6855/hw2.0/1.1/WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.36
Comment 21 Vlad 2023-12-21 18:03:57 UTC
(In reply to Kalle Valo from comment #20)
> Actually there's a new version now available, please test this one instead:
> 
> https://github.com/kvalo/ath11k-firmware/tree/master/WCN6855/hw2.0/1.1/WLAN.
> HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.36


I've just tested it. Problem still exists. My workaround with systemd service still works.


Laptop: HONOR Magicbook 15 BMH-WCX9
CPU: AMD Ryzen™ 5 5500U
OS: Fedora Silverblue
Kernel: Linux 6.6.6-200.fc39.x86_64
Display server: Wayland
DE: GNOME 45.2
Comment 22 Jürg Billeter 2023-12-21 18:23:05 UTC
Mostly the same here. Suspend actually worked the first time on the first boot with the 3.6510.36 firmware. However, after that, suspend consistently failed again (waking up immediately after suspend). Couldn't get it working a second time even after another reboot.

Suspend works fine with the systemd service workaround here as well, although Wifi takes longer to initialize on resume (compared to the old firmware without the systemd service workaround) but I guess that's to be expected.

I've been using the old 3.6510.9 firmware for many months and haven't seen any issues with that.

Lenovo ThinkPad T14 Gen 3 AMD (6850U)
Linux 6.6.7
Comment 23 Kalle Valo 2023-12-21 19:06:04 UTC
(In reply to Jürg Billeter from comment #22)
> I've been using the old 3.6510.9 firmware for many months and haven't seen
> any issues with that.
What about this release:

https://github.com/kvalo/ath11k-firmware/tree/master/WCN6855/hw2.0/1.1/WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.16

Vlad reported in comment 2 that this release works for him. It would be good to know if you, Jürg, see the same or not.
Comment 24 Jürg Billeter 2023-12-22 06:28:46 UTC
The 3.6510.16 firmware seems to work fine as well in initial tests.

With 3.6510.36 and the modprobe service workaround, my laptop froze on resume after being suspended overnight. Possibly during reinitialization of wifi after modprobe, but that's just a guess. I don't remember this laptop ever freezing on resume with 3.6510.9 (but I also don't use modprobe on resume with that firmware).

I'll keep using 3.6510.16 for now.
Comment 25 Kalle Valo 2023-12-22 07:57:21 UTC
> The 3.6510.16 firmware seems to work fine as well in initial tests.
Thanks, good to know. So this confirms that you both are seeing the same
issue.

I'm talking with the firmware team about how we could find more information to help
get this issue fixed.
Comment 26 Ulf Winkelvos 2024-01-04 20:59:33 UTC
Laptop: Lenovo Z16
Wifi: Qualcomm Technologies, Inc QCNFA765 Wireless Network Adapter (rev 01)
Kernel: 6.6.9-arch1-1

I can also confirm that 09 & 16 work while 23 & 36 fail on my system. Strangely enough I did not notice this until very recently, although the 23 firmware has been installed on my system since 04/23. The only thing I remember is that the wifi completely broke down every couple of weeks after a suspend/resume cycle and only came back to life after a poweroff/start cycle. For some time now, though, I have been seeing the immediate wake-up after suspend issue.
Comment 28 Peter Robinson 2024-01-16 14:52:32 UTC
Note there is a new firmware for WCN6855 in the latest tagged (20240115) linux-firmware, not sure if that helps here too.
Comment 29 Ulf Winkelvos 2024-01-16 15:07:00 UTC
That seems to be the 36 firmware that was already available in Kalle Valo's repo.
Comment 32 Dimitar Atanasov 2024-01-19 13:32:14 UTC
Same on my Blade 14. I found that it is enough to use rfkill before suspend for it to work as expected.
Comment 39 Kalle Valo 2024-01-26 17:09:52 UTC
I just uploaded new firmware WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.37 but I'm not optimistic that it will fix this issue. Though there is one lead which looks like this issue, I'll get back about that once I know more.
Comment 40 Vlad 2024-01-26 17:45:48 UTC
(In reply to Kalle Valo from comment #39)
> I just uploaded new firmware
> WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.37 but I'm not
> optimistic that it will fix this issue. Though there is one lead which looks
> like this issue, I'll get back about that once I know more.

I just tested new version and IT WORKS GREAT!!! Tried several times with different duration between suspends. Laptop successfully suspends.

Currently my distro ships version 36 of the firmware. When the update becomes available in the distro package, I'll test it and report the status here.

Great work!

Laptop: HONOR Magicbook 15 BMH-WCX9
CPU: AMD Ryzen™ 5 5500U
OS: Fedora Silverblue 39
Kernel: Linux 6.6.12-200.fc39.x86_64
Display server: Wayland
DE: GNOME 45.3
Comment 41 gaelic 2024-01-26 19:09:15 UTC
For me it also works, now using kernel 6.7.1 Thanks for the release.

PS: I'm baffled that my previous comments got deleted. Amazing.
Comment 42 Ulf Winkelvos 2024-01-26 23:00:50 UTC
So the 37 works perfectly fine with kernel 6.6.13. No wake-ups, no crashes so far. Thanks for the update @Kalle! With kernel 6.7.1 and 37 firmware also no wake-ups and no hard system freeze anymore, but an issue that the wifi stack crashes and I have to "modprobe -r ath11k_pci; modprobe ath11k_pci" before I can use the card again. This seems to be correlated, as the behavior is different to older kernel versions, but the wakeup problem seems to be fixed with the 37 firmware. I'll try to bisect this problem and open a separate bug for this.
Comment 43 Kalle Valo 2024-01-29 11:06:42 UTC
> For me it also works, now using kernel 6.7.1 Thanks for the release.
This was a big surprise that the new release solved the issue for many,
that's really good news.

So I see there were three people saying that the issue is fixed now. Now
my next question is: are there still people who still see this
firmware issue with firmware
WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.37? It could be
that there are actually multiple firmware issues so please do report
here if suspend works with firmware-3.6510.23 but still fails with
-3.6510.37.

> PS: I'm baffled that my previous comments got deleted. Amazing.
To keep the report clean I marked some of the unrelated comments
private, none were deleted. For example, reporting a different problem
or using a distro kernel falls into that category.
Comment 44 Ulf Winkelvos 2024-01-29 16:25:24 UTC
(In reply to Kalle Valo from comment #43)
> So I see there were three people saying that the issue is fixed now. Now
> my next question is: are there still people who still see this
> firmware issue with firmware
> WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.37? It could be
> that there are actually multiple firmware issues so please do report
> here if suspend works with firmware-3.6510.23 but still fails with
> -3.6510.37.
For me 23 was the first version NOT to work correctly. So 09, 16 and now 37 are fine, but 23, 33 & 36 were broken.
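
The results reported in this comment can be captured in a small lookup, which may help when checking which firmware build a machine is running. This is a rough sketch only: the table reflects what was reported here for one system, not an authoritative compatibility list, and the names REPORTED_STATUS, firmware_patch and reported_status are made up for illustration.

```python
import re

# Suspend results as reported in this comment, keyed by the last
# component of the firmware version (3.6510.<n>).
REPORTED_STATUS = {
    9: "works",
    16: "works",
    37: "works",
    23: "broken",
    33: "broken",
    36: "broken",
}


def firmware_patch(build_id):
    """Extract <n> from a string ending in '3.6510.<n>', or return None."""
    match = re.search(r"(?:^|-)3\.6510\.(\d+)$", build_id.strip())
    return int(match.group(1)) if match else None


def reported_status(build_id):
    """Return 'works', 'broken' or 'unknown' for a firmware build id."""
    return REPORTED_STATUS.get(firmware_patch(build_id), "unknown")
```

For example, reported_status("WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.37") gives "works", while a version not mentioned here gives "unknown".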
Comment 45 gaelic 2024-01-30 17:11:22 UTC
@ulf

I also had the "inconvenience" regarding the wifi. Today I installed kernel 6.8_rc2 and now both problems (suspend and wifi) are gone.
Comment 46 Jürg Billeter 2024-02-01 09:15:27 UTC
The new firmware 3.6510.37 seems to work fine so far for me as well with Linux 6.6.10. Thanks for the update. Same as Ulf, firmware versions 23-36 were broken for me while 09, 16 and 37 seem to work fine.

I just saw a kernel crash on first suspend on Linux 6.7.3 but I assume that's not a firmware issue and related to https://bugzilla.kernel.org/show_bug.cgi?id=218364

Lenovo ThinkPad T14 Gen 3 AMD (6850U)
Comment 47 Kalle Valo 2024-02-01 13:29:46 UTC
(In reply to Ulf Winkelvos from comment #44)
> For me 23 was the first version NOT to work correctly. So 09, 16 and now 37
> are fine, but 23, 33 & 36 were broken.
Sorry, I got the version numbers wrong. I'll try again with fixed question:

Now my next question is: are there still people who still see
this firmware issue with firmware
WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.37? It
could be that there are actually multiple firmware issues so
please do report here if suspend works with firmware-3.6510.16
but fails with -3.6510.37.
Comment 48 Marco Trevisan (Treviño) 2024-02-02 11:56:06 UTC
I've tried new firmware too, but still failing for me with 6.7.1 (it works on 6.6.11 though as the older ones used to):

feb 02 12:34:22 tricky kernel: BUG: kernel NULL pointer dereference, address: 00000000000000a0
feb 02 12:34:22 tricky kernel: #PF: supervisor write access in kernel mode
feb 02 12:34:22 tricky kernel: #PF: error_code(0x0002) - not-present page
feb 02 12:34:22 tricky kernel: PGD 0 P4D 0 
feb 02 12:34:22 tricky kernel: Oops: 0002 [#1] PREEMPT SMP NOPTI
feb 02 12:34:22 tricky kernel: CPU: 3 PID: 4505 Comm: NetworkManager Tainted: G           OE      6.7.1-zabbly+ #ubuntu22.04
feb 02 12:34:22 tricky kernel: Hardware name: LENOVO 21K5CTO1WW/21K5CTO1WW, BIOS R2FET36W (1.16 ) 10/24/2023
feb 02 12:34:22 tricky kernel: RIP: 0010:down_write+0x21/0x80
feb 02 12:34:22 tricky kernel: Code: 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 54 49 89 fc e8 bd c2 ff ff 65 ff 05 2e f2 b1 5f 31 c0 ba 01 00 00 00 <f0> 49 0f b1 14 24 75 29 65 48 8b 04 25 c0 2b 03 00 49 89 44 24 08
feb 02 12:34:22 tricky kernel: RSP: 0018:ffffa76a065934a0 EFLAGS: 00010246
feb 02 12:34:22 tricky kernel: RAX: 0000000000000000 RBX: ffff8b4a98022068 RCX: 0000000000000000
feb 02 12:34:22 tricky kernel: RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000000
feb 02 12:34:22 tricky kernel: RBP: ffffa76a065934a8 R08: ffff8b4a9c9bd498 R09: 0000000000000000
feb 02 12:34:22 tricky kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000000a0
feb 02 12:34:22 tricky kernel: R13: ffff8b4a98021c38 R14: 00000000000000a0 R15: ffff8b4a996d1fc0
feb 02 12:34:22 tricky kernel: FS:  00007f2c8eee64c0(0000) GS:ffff8b58e1ec0000(0000) knlGS:0000000000000000
feb 02 12:34:22 tricky kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
feb 02 12:34:22 tricky kernel: CR2: 00000000000000a0 CR3: 000000011bfa8000 CR4: 0000000000750ef0
feb 02 12:34:22 tricky kernel: PKRU: 55555554
feb 02 12:34:22 tricky kernel: Call Trace:
feb 02 12:34:22 tricky kernel:  <TASK>
feb 02 12:34:22 tricky kernel:  ? show_regs+0x72/0x90
feb 02 12:34:22 tricky kernel:  ? __die+0x25/0x80
feb 02 12:34:22 tricky kernel:  ? page_fault_oops+0x154/0x4c0
feb 02 12:34:22 tricky kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
feb 02 12:34:22 tricky kernel:  ? raw_spin_rq_unlock+0x10/0x40
feb 02 12:34:22 tricky kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
feb 02 12:34:22 tricky kernel:  ? finish_task_switch.isra.0+0x84/0x2b0
feb 02 12:34:22 tricky kernel:  ? do_user_addr_fault+0x30e/0x6e0
feb 02 12:34:22 tricky kernel:  ? exc_page_fault+0x84/0x1b0
feb 02 12:34:22 tricky kernel:  ? asm_exc_page_fault+0x27/0x30
feb 02 12:34:22 tricky kernel:  ? down_write+0x21/0x80
feb 02 12:34:22 tricky kernel:  simple_recursive_removal+0xaa/0x2c0
feb 02 12:34:22 tricky kernel:  ? __pfx_remove_one+0x10/0x10
feb 02 12:34:22 tricky kernel:  debugfs_remove+0x45/0x80
feb 02 12:34:22 tricky kernel:  ath11k_debugfs_remove_interface+0x1e/0x40 [ath11k]
feb 02 12:34:22 tricky kernel:  ath11k_mac_op_remove_interface+0x1a5/0x2f0 [ath11k]
feb 02 12:34:22 tricky kernel:  drv_remove_interface+0xe0/0x1a0 [mac80211]
feb 02 12:34:22 tricky kernel:  ieee80211_do_stop+0x651/0x960 [mac80211]
feb 02 12:34:22 tricky kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
feb 02 12:34:22 tricky kernel:  ieee80211_stop+0x59/0x190 [mac80211]
feb 02 12:34:22 tricky kernel:  __dev_close_many+0x9f/0x130
feb 02 12:34:22 tricky kernel:  __dev_change_flags+0xe6/0x230
feb 02 12:34:22 tricky kernel:  dev_change_flags+0x26/0x80
feb 02 12:34:22 tricky kernel:  do_setlink+0x2b0/0x1230
feb 02 12:34:22 tricky kernel:  ? fib_table_dump+0xc0/0x3b0
feb 02 12:34:22 tricky kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
feb 02 12:34:22 tricky kernel:  ? __nla_validate_parse+0x5d/0xfc0
feb 02 12:34:22 tricky kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
feb 02 12:34:22 tricky kernel:  ? sched_clock_noinstr+0x9/0x10
feb 02 12:34:22 tricky kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
feb 02 12:34:22 tricky kernel:  ? sched_clock+0x10/0x30
feb 02 12:34:22 tricky kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
feb 02 12:34:22 tricky kernel:  ? select_task_rq_fair+0x19e/0x20b0
feb 02 12:34:22 tricky kernel:  __rtnl_newlink+0x5d0/0xb10
feb 02 12:34:22 tricky kernel:  rtnl_newlink+0x49/0x80
feb 02 12:34:22 tricky kernel:  rtnetlink_rcv_msg+0x179/0x450
feb 02 12:34:22 tricky kernel:  ? ep_autoremove_wake_function+0x12/0x40
feb 02 12:34:22 tricky kernel:  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
feb 02 12:34:22 tricky kernel:  netlink_rcv_skb+0x59/0x110
feb 02 12:34:22 tricky kernel:  rtnetlink_rcv+0x15/0x30
feb 02 12:34:22 tricky kernel:  netlink_unicast+0x247/0x360
feb 02 12:34:22 tricky kernel:  netlink_sendmsg+0x25d/0x510
feb 02 12:34:22 tricky kernel:  ? __check_object_size+0x6d/0x310
feb 02 12:34:22 tricky kernel:  ____sys_sendmsg+0x3e9/0x420
feb 02 12:34:22 tricky kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
feb 02 12:34:22 tricky kernel:  ___sys_sendmsg+0x88/0xe0
feb 02 12:34:22 tricky kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
feb 02 12:34:22 tricky kernel:  ? ttwu_queue_wakelist+0x139/0x1c0
feb 02 12:34:22 tricky kernel:  ? eventfd_write+0xcf/0x1e0
feb 02 12:34:22 tricky kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
feb 02 12:34:22 tricky kernel:  ? try_to_wake_up+0x271/0x6d0
feb 02 12:34:22 tricky kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
feb 02 12:34:22 tricky kernel:  ? __fget_light+0xa0/0x150
feb 02 12:34:22 tricky kernel:  __sys_sendmsg+0x69/0xd0
feb 02 12:34:22 tricky kernel:  __x64_sys_sendmsg+0x1d/0x30
feb 02 12:34:22 tricky kernel:  do_syscall_64+0x5c/0xf0
feb 02 12:34:22 tricky kernel:  ? syscall_exit_to_user_mode+0x38/0x60
feb 02 12:34:22 tricky kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
feb 02 12:34:22 tricky kernel:  ? do_syscall_64+0x6b/0xf0
feb 02 12:34:22 tricky kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
feb 02 12:34:22 tricky kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
feb 02 12:34:22 tricky kernel:  ? exit_to_user_mode_prepare+0x39/0x190
feb 02 12:34:22 tricky kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
feb 02 12:34:22 tricky kernel:  ? syscall_exit_to_user_mode+0x38/0x60
feb 02 12:34:22 tricky kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
feb 02 12:34:22 tricky kernel:  ? do_syscall_64+0x6b/0xf0
feb 02 12:34:22 tricky kernel:  ? sysvec_apic_timer_interrupt+0x4e/0xb0
feb 02 12:34:22 tricky kernel:  entry_SYSCALL_64_after_hwframe+0x6e/0x76
feb 02 12:34:22 tricky kernel: RIP: 0033:0x7f2c8ff2799d
feb 02 12:34:22 tricky kernel: Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 6a 90 f6 ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 44 24 08 e8 ae 90 f6 ff 48
feb 02 12:34:22 tricky kernel: RSP: 002b:00007ffe9ca7f550 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
feb 02 12:34:22 tricky kernel: RAX: ffffffffffffffda RBX: 000000000000002f RCX: 00007f2c8ff2799d
feb 02 12:34:22 tricky kernel: RDX: 0000000000000000 RSI: 00007ffe9ca7f590 RDI: 000000000000000c
feb 02 12:34:22 tricky kernel: RBP: 000055c7bc7a8030 R08: 0000000000000000 R09: 0000000000000000
feb 02 12:34:22 tricky kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
feb 02 12:34:22 tricky kernel: R13: 00007ffe9ca7f6e0 R14: 00007ffe9ca7f6dc R15: 0000000000000000
feb 02 12:34:22 tricky kernel:  </TASK>
feb 02 12:34:22 tricky kernel: Modules linked in: xt_comment iptable_raw iptable_mangle iptable_nat bpfilter vhost_vsock vmw_vsock_virtio_transport_common vhost vhost_iotlb nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype vmw_vsock_vmci_transport vsock vmw_vmci ccm michael_mic snd_seq_dummy snd_hrtimer vboxnetadp(OE) vboxnetflt(OE) xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 vboxdrv(OE) nf_tables nfnetlink rfcomm cmac algif_hash algif_skcipher af_alg bnep overlay btusb btrtl btintel btbcm btmtk bluetooth ecdh_generic ecc uvcvideo videobuf2_vmalloc uvc videobuf2_memops videobuf2_v4l2 videodev videobuf2_common qrtr_mhi mc sunrpc binfmt_misc nls_iso8859_1 sch_fq_codel intel_rapl_msr snd_soc_dmic snd_ps_pdm_dma snd_soc_ps_mach intel_rapl_common edac_mce_amd snd_ctl_led snd_hda_codec_realtek snd_hda_codec_generic snd_sof_amd_acp63 snd_sof_amd_vangogh snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci
feb 02 12:34:22 tricky kernel:  snd_sof_xtensa_dsp snd_hda_codec_hdmi snd_sof snd_sof_utils snd_soc_core snd_hda_intel joydev snd_compress snd_intel_dspcfg snd_intel_sdw_acpi ac97_bus thinkpad_acpi rapl snd_pcm_dmaengine snd_hda_codec nvram qrtr ledtrig_audio snd_hda_core snd_hwdep platform_profile snd_pci_ps snd_seq_midi ath11k_pci snd_rpl_pci_acp6x snd_seq_midi_event snd_acp_pci ath11k snd_rawmidi snd_acp_legacy_common qmi_helpers input_leds snd_seq snd_pci_acp6x mac80211 serio_raw snd_pcm think_lmi snd_seq_device snd_pci_acp5x hid_multitouch firmware_attributes_class snd_timer snd_rn_pci_acp3x cfg80211 snd_acp_config wmi_bmof snd_soc_acpi snd libarc4 k10temp snd_pci_acp3x soundcore mhi mac_hid amd_pmc kvm_amd ccp kvm irqbypass iptable_filter ip6table_filter ip6_tables br_netfilter bridge stp llc arp_tables pkcs8_key_parser cuse msr parport_pc ppdev lp parport efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq libcrc32c dm_crypt amdgpu amdxcp drm_exec gpu_sched drm_buddy i2c_algo_bit drm_suballoc_helper
feb 02 12:34:22 tricky kernel:  drm_ttm_helper ttm drm_display_helper crct10dif_pclmul crc32_pclmul polyval_clmulni hid_generic cec polyval_generic ghash_clmulni_intel rc_core i2c_hid_acpi sha256_ssse3 sha1_ssse3 i2c_hid nvme drm_kms_helper ucsi_acpi hid r8169 typec_ucsi xhci_pci video psmouse thunderbolt i2c_piix4 nvme_core xhci_pci_renesas realtek typec drm wmi aesni_intel crypto_simd cryptd
feb 02 12:34:22 tricky kernel: CR2: 00000000000000a0
feb 02 12:34:22 tricky kernel: ---[ end trace 0000000000000000 ]---
feb 02 12:34:22 tricky kernel: RIP: 0010:down_write+0x21/0x80
feb 02 12:34:22 tricky kernel: Code: 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 54 49 89 fc e8 bd c2 ff ff 65 ff 05 2e f2 b1 5f 31 c0 ba 01 00 00 00 <f0> 49 0f b1 14 24 75 29 65 48 8b 04 25 c0 2b 03 00 49 89 44 24 08
feb 02 12:34:22 tricky kernel: RSP: 0018:ffffa76a065934a0 EFLAGS: 00010246
feb 02 12:34:22 tricky kernel: RAX: 0000000000000000 RBX: ffff8b4a98022068 RCX: 0000000000000000
feb 02 12:34:22 tricky kernel: RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000000
feb 02 12:34:22 tricky kernel: RBP: ffffa76a065934a8 R08: ffff8b4a9c9bd498 R09: 0000000000000000
feb 02 12:34:22 tricky kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000000a0
feb 02 12:34:22 tricky kernel: R13: ffff8b4a98021c38 R14: 00000000000000a0 R15: ffff8b4a996d1fc0
feb 02 12:34:22 tricky kernel: FS:  00007f2c8eee64c0(0000) GS:ffff8b58e1ec0000(0000) knlGS:0000000000000000
feb 02 12:34:22 tricky kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
feb 02 12:34:22 tricky kernel: CR2: 00000000000000a0 CR3: 000000011bfa8000 CR4: 0000000000750ef0
feb 02 12:34:22 tricky kernel: PKRU: 55555554
Comment 49 Kalle Valo 2024-02-02 14:00:58 UTC
> I've tried new firmware too, but still failing for me with 6.7.1 (it works on
> 6.6.11 though as the older ones used to):
>
> feb 02 12:34:22 tricky kernel: BUG: kernel NULL pointer dereference, address:
This is a different problem, see bug #218364.
Comment 50 Mario Limonciello (AMD) 2024-02-05 03:03:04 UTC
*** Bug 218209 has been marked as a duplicate of this bug. ***
Comment 51 Roland Ruckerbauer 2024-02-07 09:45:33 UTC
I finally found this thread after suddenly having lots of issues with my Thinkpad T16 AMD.

Here is the issue I had:

Scenario 1:
Closing the lid, the thinkpad led keeps going for ~30 seconds, after that it starts blinking. Meaning standby was delayed? I guess it has to do with the wakeups from the wifi module.

Scenario 2:
Closing the lid, the led keeps going, but this time forever. Opening up the lid again shows the monitor is off. The laptop seems completely frozen, although the led shows it's not actually in standby.
I guess it got woken up by the wifi module immediately, and before the kernel could wake up everything (e.g. amdgpu), it crashed. Unfortunately I am unable to get any logs from this state. The disk probably did not wake up before the crash.


I kind of started to suspect that my suspend issues were wifi related, after I discovered, that my laptop does not freeze any more, if I keep pinging it over the network when connected to wifi, while trying to suspend / resume.

I tried out firmware WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.37, which seems to fix Scenario 1 completely, and Scenario 2 partially. Also, testing with amd_s2idle.py shows there are no spurious wakeups any more. The laptop always goes to sleep immediately according to its led.

The remaining issue I am now facing is, that sometimes when waking up (led stops blinking and is on again), I have a frozen system again. I only tested it now for ~5 times, it happened 4 out of 5 times so pretty consistently. I guess the freeze really is resume related, and probably a totally different bug?
Comment 52 Roland Ruckerbauer 2024-02-07 09:50:21 UTC
(In reply to Roland Ruckerbauer from comment #51)

I forgot to mention: the freezing of the system happens very often when suspend / resume is triggered by me opening and closing the lid. I'd say it reproduces 80% of the time. When standby is triggered from linux with systemctl suspend, it almost never happens. With the power button of the thinkpad, it's maybe somewhere between the two other triggers?

This is really strange, I have no idea why the different triggers should make a difference. Maybe this can be a clue, like maybe a race between turning on/off the screen / gpu and s2idle?
Comment 53 Kalle Valo 2024-02-07 13:30:05 UTC
> I guess the freeze really is resume related, and probably a totally
> different bug?
Please file a new bug report for your scenario 2. See also:

https://wireless.wiki.kernel.org/en/users/drivers/ath11k/bugreport
Comment 54 Mario Limonciello (AMD) 2024-02-07 14:01:48 UTC
> Maybe this can be a clue, like maybe a race between turning on/off the screen
> / gpu and s2idle?

That's exactly right.  There's another regression in the GPU driver.  It's discussed in https://gitlab.freedesktop.org/drm/amd/-/issues/3132

> So I see there were three people saying that the issue is fixed now. Now
> my next question is: are there still people who still see this
> firmware issue with firmware
> WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.37?

Overwhelmingly this firmware seems to improve things for people.  I suggest closing this bug now.
Comment 55 Kalle Valo 2024-02-07 18:20:43 UTC
(In reply to Mario Limonciello (AMD) from comment #54)
> Overwhelmingly this firmware seems to improve things for people.  I suggest
> closing this bug now.
Agreed, I'll close this finally.

Thank you everyone for helping fix this! If someone still has suspend problems please file a new report.
Comment 56 Warren Togami 2024-02-09 13:55:35 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=2262577#c20
I tested firmware 37 with kernel-6.7.4 and kernel-6.8.0-rc3.  I believe the suspend problems are not fixed. Details here.
Comment 57 Warren Togami 2024-02-09 13:58:07 UTC
Suspend works with zero problems with firmware 23 but not 37.
Comment 58 Warren Togami 2024-02-09 14:02:42 UTC
disregard the previous post

- kernel-6.6.14 has no suspend problems with any firmware version
- kernel-6.7.4 has suspend problems with firmware 23 and 37
- kernel-6.8.0-rc3 has suspend problems with firmware 23 and 37
- kernel-6.7.4 can deadlock the kernel while reloading the ath11k_pci driver after a successful resume
- kernel-6.8.0-rc3 can deadlock the kernel while reloading the ath11k_pci driver after a successful resume
Comment 59 Mario Limonciello (AMD) 2024-02-09 14:04:14 UTC
I believe you're exposed to a separate GPU suspend/resume bug that appeared in kernel 6.7. Please see the link above.
Comment 60 Daniel Kolesa 2024-02-09 14:18:11 UTC
With firmware 37 (kernel 6.7.1, haven't tried newer ones yet) I get working manual suspend without toggling off wifi (i.e. via the gnome suspend button), but lid suspend hangs (unless I disable wifi beforehand, then lid suspend also works). To unhang the computer with lid suspend, I have to hold the power button for a few seconds, which makes it suspend for real, and then I can resume it and get a desktop again. Thinkpad P14S gen4 AMD.
