Bug 217239
Summary: ath11k: WCN6855: firmware -3.6510.23 (and later) breaks suspend on certain setups
Product: Drivers
Component: network-wireless
Status: CLOSED CODE_FIX
Severity: blocking
Priority: P1
Hardware: AMD
OS: Linux
Kernel Version: 6.5.5
Regression: No
Reporter: Vlad (mrvladus)
Assignee: Kalle Valo (kvalo)
CC: aprilgrimoire, boukekrom, gaelic, gordan.kresic, idozo, j, kernel, kvalo, leminhman0312, linux, mail, mail, marekrusinowski, mario.limonciello, mpearson-lenovo, mrvladus, pbrobinson, philipp, phusho, ulf, warren
Attachments: log from amd_s2idle.py script
Description
Vlad
2023-03-24 09:55:09 UTC
Could you verify that this release works correctly: https://github.com/kvalo/ath11k-firmware/tree/master/WCN6855/hw2.0/1.1/WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.16 Just copy amss.bin and m3.bin to the directory: /lib/firmware/ath11k/WCN6855/hw2.0/ And then reboot. Just backup the original files first before the copy. Verify that the firmware release is correct with 'dmesg | grep ath11k'. (In reply to Kalle Valo from comment #1) > Could you verify that this release works correctly: > > https://github.com/kvalo/ath11k-firmware/tree/master/WCN6855/hw2.0/1.1/WLAN. > HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.16 > > Just copy amss.bin and m3.bin to the directory: > > /lib/firmware/ath11k/WCN6855/hw2.0/ > > And then reboot. Just backup the original files first before the copy. > Verify that the firmware release is correct with 'dmesg | grep ath11k'. Yes, it works great with this release! Thanks for confirmation, so this is a regression in the latest firmware release WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.23. I have reported this to the firmware team. Latest firmware update version 20230404 still contains this bug and it is still version WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.23 with this suspend regression. This also got reported into AMD's DRM bug tracker because it collided with a separate warning regression. https://gitlab.freedesktop.org/drm/amd/-/issues/2539 FWIW this WCN6855 firmware update fixes a resume problem where that GPIO is active causing spurious wakeups, so just reverting back to the old one is trading one issue for another. If you can generate a report with https://gitlab.freedesktop.org/drm/amd/-/blob/master/scripts/amd_s2idle.py I can suggest a workaround to run the kernel command line until the firmware is fixed. 
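The manual firmware swap described in comment #1 (back up the installed files, copy amss.bin and m3.bin from the release, then reboot) could be scripted roughly as follows. This is only a sketch: the function name swap_firmware and the ".orig" backup suffix are made up for illustration, and the target directory is a parameter so the logic can be tried on a scratch directory before touching /lib/firmware (which needs root).

```python
import shutil
from pathlib import Path

# Directory and file names are the ones given in comment #1.
FIRMWARE_DIR = Path("/lib/firmware/ath11k/WCN6855/hw2.0")
FIRMWARE_FILES = ("amss.bin", "m3.bin")


def swap_firmware(release_dir, firmware_dir=FIRMWARE_DIR, backup_suffix=".orig"):
    """Back up the installed files once, then copy the test release over them.

    swap_firmware and backup_suffix are hypothetical names for this sketch.
    """
    release_dir, firmware_dir = Path(release_dir), Path(firmware_dir)

    # Check the release is complete before changing anything on disk.
    for name in FIRMWARE_FILES:
        if not (release_dir / name).is_file():
            raise FileNotFoundError(f"{release_dir / name} is missing from the release")

    copied = []
    for name in FIRMWARE_FILES:
        target = firmware_dir / name
        backup = target.with_name(target.name + backup_suffix)
        # Keep only the first backup so repeated test runs never overwrite
        # the files that were originally installed.
        if target.exists() and not backup.exists():
            shutil.copy2(target, backup)
        shutil.copy2(release_dir / name, target)
        copied.append(target)
    return copied
```

After a reboot, 'dmesg | grep ath11k' should show the fw_build_id of the release that was copied in.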
If I work around the suspend issue as per the AMD DRM issue with: acpi_mask_gpe=0x0e gpiolib_acpi.ignore_interrupt=AMDI0030:00@18 WLAN frequently breaks on resume with the following messages (can provide full log if needed): mhi mhi0: Did not enter M0 state, MHI state: M3, PM state: SYS ERROR Detect ath11k_pci 0000:02:00.0: failed to resume mhi: -5 ath11k_pci 0000:02:00.0: failed to resume hif during resume: -5 ath11k_pci 0000:02:00.0: failed to resume core: -5 ath11k_pci 0000:02:00.0: PM: dpm_run_callback(): pci_pm_resume+0x0/0xe0 returns -5 ath11k_pci 0000:02:00.0: PM: failed to resume async: error -5 ath11k_pci 0000:02:00.0: wmi command 16387 timeout ath11k_pci 0000:02:00.0: failed to send WMI_PDEV_SET_PARAM cmd ath11k_pci 0000:02:00.0: failed to enable dynamic bw: -11 Hardware became unavailable upon resume. This could be a software issue prior to suspend or a hardware issue. WARNING: CPU: 10 PID: 9 at net/mac80211/util.c:2553 ieee80211_reconfig+0xa0/0x1760 [mac80211] And this even leads to a system reset and WLAN was still unavailable after reboot. It works again after a power cycle. I've switched back to the old firmware (3.6510.9), which seems to be working fine so far even with Linux 6.2.14. Created attachment 304213 [details] log from amd_s2idle.py script (In reply to Mario Limonciello (AMD) from comment #5) > This also got reported into AMD's DRM bug tracker because it collided with a > separate warning regression. > > https://gitlab.freedesktop.org/drm/amd/-/issues/2539 > > FWIW this WCN6855 firmware update fixes a resume problem where that GPIO is > active causing spurious wakeups, so just reverting back to the old one is > trading one issue for another. > > If you can generate a report with > https://gitlab.freedesktop.org/drm/amd/-/blob/master/scripts/amd_s2idle.py I > can suggest a workaround to run the kernel command line until the firmware > is fixed. I ran the script. 
I set two suspend cycles: the first one woke up immediately, the second one suspended OK. But I found out today that sometimes it wakes up even with the module disabled, though only after 10-20 minutes. It is super annoying. Here is the log generated by the script:

The simplest workaround until the bug is fixed is to create a systemd service that unloads the module on suspend and loads it again on wakeup.
Create the file /etc/systemd/system/root-suspend-fix.service:
    [Unit]
    Description=Suspend fix for ath11k_pci
    Before=sleep.target
    StopWhenUnneeded=yes
    [Service]
    Type=oneshot
    RemainAfterExit=yes
    ExecStart=-modprobe -r ath11k_pci
    ExecStop=-modprobe ath11k_pci
    [Install]
    WantedBy=sleep.target
Enable the service:
    sudo systemctl enable --now root-suspend-fix.service

> Hardware became unavailable upon resume. This could be a software issue prior
> to suspend or a hardware issue.
@Kalle - the fact that ignoring the pin leads to this behavior makes me wonder if ath11k_pci is missing a check in the suspend path related to active firmware state?
> I've switched back to the old firmware (3.6510.9), which seems to be working
> fine so far even with Linux 6.2.14.
FYI, going back to the older firmware will lead to the system waking up on lid close.
> There is log generated by script:
The same kernel command line workaround to ignore the pin would apply to your system, but as Jürg mentioned some negative side effects I wouldn't suggest it anymore.

Any updates on this? linux-firmware-20230515 still contains fw_version 0x110b196e fw_build_timestamp 2022-12-22 12:54 fw_build_id WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.23

Little update. The problem still exists:
- Kernel 6.5.5
- linux-firmware 20230919-1
- fw_version 0x110b196e fw_build_timestamp 2022-12-22 12:54 fw_build_id WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.23

> I have reported this to the firmware team.
Could you provide a pointer to where you reported this?
What is the typical process of such a report? Is the code where this regression happened open-sourced? Can we diff the changes between the last good and known broken version?
Is there anything affected users can do to help? Do you need more information or more experiments to be done?
TLDR: I was affected by this issue, but I might have found a workaround/fix by changing the power settings from 'on' to 'auto': > echo 'auto' > '/sys/bus/pci/devices/0000:02:00.0/power/control' More info: Current system: Distro: Ubuntu 22.04 Laptop: Thinkpad T14 Gen 3 AMD CPU: AMD 6850U Kernel: 6.2.0 Wifi adapter: Qualcomm Atheros QCNFA765 Driver: ath11k_pci 0000:02:00.0: fw_version 0x110b196e fw_build_timestamp 2022-12-22 12:54 fw_build_id WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.23 Problems with the laptop suspend that led me to this ticket: - System waking from suspend randomly, in bag on the go or just at rest on my desk. - After waking from suspend (automatically or intended), total system freeze. I tried various versions of the driver as suggested by Kalle in comment 1, but couldnt get consistent behavior. I then worked with the systemd service suggested by Vlad in comment 8. This prevented the random wakes, and the system freeze would happen anytime I would load the ath11k_pci driver again through modprobe. So I removed the ExecStop clause, and after suspend I would have no wifi until reboot. I was investigating power consumption with powertop the other day, and one of the tunables that were `BAD` was the "Runtime PM for PCI Device Qualcomm Atheros QCNFA765". So I toggled it to `GOOD`, which executes "echo 'auto' > '/sys/bus/pci/devices/0000:02:00.0/power/control'". I disabled the systemd service that removes the driver on suspend, and haven't had suspend problems since. Now I'm not knowledgeable enough to be sure that changing this setting fixes anything, it could also be coincidence. Some system update might've fixed the problem recently, and I just happened to discover that because the powertop suggestion prompted me to disable the workaround systemd service and try things out again. 
Supporting that theory, the power setting is automatically set to 'on' when plugging in the power adapter (system switches to 'performance' mode), and then suspend still works correctly. I just hope this information might help someone out or contribute to a fix, because this issue can be really annoying! > Now I'm not knowledgeable enough to be sure that changing this setting fixes
> anything, it could also be coincidence.
I've noticed that an "easy" way to trigger this issue is to run the amd_s2idle.py script for many cycles with a short wait between them, for example 30-second cycles with 4 seconds in between, running for hours.
If your theory is correct, the policy should continue to apply between cycles and you can see if you can still break it.
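To check whether the runtime PM policy really stays on 'auto' between cycles, the sysfs attribute from comment #13 can be read back before and after each run. A minimal sketch, assuming the device address 0000:02:00.0 reported above; the helper names are made up for illustration, and the sysfs root is a parameter only so the helpers can be exercised on a scratch directory. Writing the attribute needs root.

```python
from pathlib import Path

SYSFS_PCI_DEVICES = Path("/sys/bus/pci/devices")


def power_control_path(pci_address, sysfs_root=SYSFS_PCI_DEVICES):
    """Path of the runtime PM policy attribute for one PCI device."""
    return Path(sysfs_root) / pci_address / "power" / "control"


def read_power_control(pci_address, sysfs_root=SYSFS_PCI_DEVICES):
    """Return the current policy string, 'auto' or 'on'."""
    return power_control_path(pci_address, sysfs_root).read_text().strip()


def set_power_control(pci_address, value, sysfs_root=SYSFS_PCI_DEVICES):
    """Write the policy, the same effect as the echo command in comment #13."""
    if value not in ("auto", "on"):
        raise ValueError(f"power/control accepts 'auto' or 'on', got {value!r}")
    power_control_path(pci_address, sysfs_root).write_text(value + "\n")


if __name__ == "__main__":
    # Device address taken from the dmesg lines quoted in this report.
    print(read_power_control("0000:02:00.0"))
```

If the value has flipped back to 'on' after a cycle (for example because a power profile change rewrote it), a failed suspend in that cycle says nothing about the 'auto' theory.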
(In reply to bouke from comment #13) > TLDR: I was affected by this issue, but I might have found a workaround/fix > by changing the power settings from 'on' to 'auto': > > > echo 'auto' > '/sys/bus/pci/devices/0000:02:00.0/power/control' This doesn't seem to help on my laptop. It's still waking up immediately when trying to suspend. (In reply to Philipp from comment #12) > > I have reported this to the firmware team. > > Could you provide a pointer to where you reported this? I contacted them privately but I'll ping them again. > What is the typical process of such a report? Is the code where this > regression happened open-sourced? Can we diff the changes between the last > good and known broken version? Unfortunately the firmware is closed source so there is not much the community can do than to test different firmware versions and provide results. > Is there anything affected users can do to help? Do you need more > information or more experiments to be done? The most helpful is to provide information how to reproduce this. The challenge here is that only some portion of users seem to see this and the reason for that is unclear. If people who experiencing the bug could provide detailed information about their hardware (laptop make, model, BIOS version and so on) hopefully that gives us some leads. Just to clarify: this bug report is *ONLY* about a firmware regression where suspend works without issues with WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.16 but always fails every time with release WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.23 (or later). Any other issues should be filed on a new report. This is very important, if there are multiple different issues on the same report it will become difficult to make sense all of it. 
(In reply to bouke from comment #13) > TLDR: I was affected by this issue, but I might have found a workaround/fix > by changing the power settings from 'on' to 'auto': > > > echo 'auto' > '/sys/bus/pci/devices/0000:02:00.0/power/control' > > More info: > > Current system: > Distro: Ubuntu 22.04 > Laptop: Thinkpad T14 Gen 3 AMD > CPU: AMD 6850U > Kernel: 6.2.0 > Wifi adapter: Qualcomm Atheros QCNFA765 > Driver: ath11k_pci 0000:02:00.0: fw_version 0x110b196e fw_build_timestamp > 2022-12-22 12:54 fw_build_id > WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.23 > > Problems with the laptop suspend that led me to this ticket: > > - System waking from suspend randomly, in bag on the go or just at rest on > my desk. > - After waking from suspend (automatically or intended), total system freeze. > > I tried various versions of the driver as suggested by Kalle in comment 1, > but couldnt get consistent behavior. This sounds like a different problem, please file a new report. Let's try to keep the bug reports clean, otherwise it's so difficult to work with them. Only one issue per report, please. (In reply to Mario Limonciello (AMD) from comment #9) > > Hardware became unavailable upon resume. This could be a software issue > prior > > to suspend or a hardware issue. > > @Kalle - the fact that ignoring the pin leads to this behavior makes me > wonder if ath11k_pci is missing a check in the suspend path related to > active firmware state? You mean that there would be a race condition in ath11k suspend handler? That is very well possible. But on the other hand if the reporter sees that one firmware release always works and another release always fails that does not sound like a race condition to me. This is a tricky problem. 
The firmware team reports that one possible issue was fixed recently, could people seeing the suspend regression test this firmware release to see if the bug still happens: https://github.com/kvalo/ath11k-firmware/tree/master/WCN6855/hw2.0/1.1/WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.33 This is just a shot in the dark but it's good to test it. See my instructions in comment #1 how to install the release manually. Do let me know the results and also detailed hardware and software info (laptop, kernel version etc.) (In reply to Kalle Valo from comment #19) > The firmware team reports that one possible issue was fixed recently, could > people seeing the suspend regression test this firmware release to see if > the bug still happens: > > https://github.com/kvalo/ath11k-firmware/tree/master/WCN6855/hw2.0/1.1/WLAN. > HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.33 > > This is just a shot in the dark but it's good to test it. See my > instructions in comment #1 how to install the release manually. Do let me > know the results and also detailed hardware and software info (laptop, > kernel version etc.) Actually there's a new version now available, please test this one instead: https://github.com/kvalo/ath11k-firmware/tree/master/WCN6855/hw2.0/1.1/WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.36 (In reply to Kalle Valo from comment #20) > Actually there's a new version now available, please test this one instead: > > https://github.com/kvalo/ath11k-firmware/tree/master/WCN6855/hw2.0/1.1/WLAN. > HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.36 I've just tested it. Problem still exists. My workaround with systemd service still works. Laptop: HONOR Magicbook 15 BMH-WCX9 CPU: AMD Ryzen™ 5 5500U OS: Fedora Silverblue Kernel: Linux 6.6.6-200.fc39.x86_64 Display server: Wayland DE: GNOME 45.2 Mostly the same here. Suspend actually worked the first time on the first boot with the 3.6510.36 firmware. 
However, after that, suspend consistently failed again (waking up immediately after suspend). Couldn't get it working a second time even after another reboot. Suspend works fine with the systemd service workaround here as well, although Wifi takes longer to initialize on resume (compared to the old firmware without the systemd service workaround) but I guess that's to be expected. I've been using the old 3.6510.9 firmware for many months and haven't seen any issues with that. Lenovo ThinkPad T14 Gen 3 AMD (6850U) Linux 6.6.7 (In reply to Jürg Billeter from comment #22) > I've been using the old 3.6510.9 firmware for many months and haven't seen > any issues with that. What about this release: https://github.com/kvalo/ath11k-firmware/tree/master/WCN6855/hw2.0/1.1/WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.16 Vlad reported comment 2 that this release works for him. It would be good to know if you Jürg see the same or not. The 3.6510.16 firmware seems to work fine as well in initial tests. With 3.6510.36 and the modprobe service workaround, my laptop froze on resume after being suspended over night. Possibly during reinitialization of wifi after modprobe but that's just a guess. I don't remember this laptop ever freezing on resume with 3.6510.9 (but I also don't use modprobe on resume with that firmware). I'll keep using 3.6510.16 for now. > The 3.6510.16 firmware seems to work fine as well in initial tests.
Thanks, good to know. So this confirms that you both are seeing the same
issue.
I'm talking with the firmware team about how we could find more information to help
get this issue fixed.
Laptop: Lenovo Z16
Wifi: Qualcomm Technologies, Inc QCNFA765 Wireless Network Adapter (rev 01)
Kernel: 6.6.9-arch1-1
I can also confirm that 09 & 16 work while 23 & 36 fail on my system. Strangely enough I did not notice this until very recently, although the 23 firmware has been installed on my system since 04/23. The only thing I remember is that the wifi completely broke down every couple of weeks after a suspend/resume cycle and only came back to life after a poweroff/start cycle. For some time now, though, I have been seeing the immediate wake-up after suspend issue.

Note that there is a new firmware for WCN6855 in the latest tagged (20240115) linux-firmware, not sure if that helps here too.

That seems to be the 36 firmware that was already available on Kalle Valo's repo.

Same on my Blade 14. I found that it is enough to use rfkill before suspend for it to work as expected.

I just uploaded new firmware WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.37 but I'm not optimistic that it will fix this issue. Though there is one lead which looks like this issue, I'll get back about that once I know more.

(In reply to Kalle Valo from comment #39)
> I just uploaded new firmware
> WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.37 but I'm not
> optimistic that it will fix this issue. Though there is one lead which looks
> like this issue, I'll get back about that once I know more.
I just tested the new version and IT WORKS GREAT!!! Tried several times with different durations between suspends. The laptop suspends successfully. Currently my distro ships version 36 of the firmware. When the update is available in the distro package, I'll test it and report the status here. Great work!
Laptop: HONOR Magicbook 15 BMH-WCX9
CPU: AMD Ryzen™ 5 5500U
OS: Fedora Silverblue 39
Kernel: Linux 6.6.12-200.fc39.x86_64
Display server: Wayland
DE: GNOME 45.3

For me it also works, now using kernel 6.7.1. Thanks for the release. PS: I'm baffled that my previous comments got deleted. Amazing.
So the 37 works perfectly fine with kernel 6.6.13. No wake-ups, no crashes so far. Thanks for the update @Kalle! With kernel 6.7.1 and the 37 firmware there are also no wake-ups and no hard system freezes anymore, but there is an issue where the wifi stack crashes and I have to "modprobe -r ath11k_pci; modprobe ath11k_pci" before I can use the card again. This seems to be correlated, as the behavior is different from older kernel versions, but the wakeup problem seems to be fixed with the 37 firmware. I'll try to bisect this problem and open a separate bug for this.

> For me it also works, now using kernel 6.7.1 Thanks for the release.
It was a big surprise that the new release solved the issue for many, that's really good news. So I see there were three people saying that the issue is fixed now. Now my next question is: are there still people who see this firmware issue with firmware WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.37? It could be that there are actually multiple firmware issues so please do report here if suspend works with firmware-3.6510.23 but still fails with -3.6510.37.
> PS: I'm baffled that my previous comments got deleted. Amazing.
To keep the report clean I marked some of the unrelated comments private, none were deleted. For example, reporting a different problem or using a distro kernel falls into that category.

(In reply to Kalle Valo from comment #43)
> So I see there were three people saying that the issue is fixed now. Now
> my next question is are there still people who see still see this
> firmware issue with firmware
> WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.37? It could be
> that there are actually multiple firmware issues so please do report
> here if suspend works with firmware-3.6510.23 but still fails with
> -3.6510.37.
For me 23 was the first version NOT to work correctly. So 09, 16 and now 37 are fine, but 23, 33 & 36 were broken.

@ulf I also had the "inconvenience" regarding the wifi. Today I installed kernel 6.8_rc2 and now both problems (suspend and wifi) are gone.

The new firmware 3.6510.37 seems to work fine so far for me as well with Linux 6.6.10. Thanks for the update. Same as Ulf, firmware versions 23-36 were broken for me while 09, 16 and 37 seem to work fine. I just saw a kernel crash on first suspend on Linux 6.7.3 but I assume that's not a firmware issue and related to https://bugzilla.kernel.org/show_bug.cgi?id=218364
Lenovo ThinkPad T14 Gen 3 AMD (6850U)

(In reply to Ulf Winkelvos from comment #44)
> For me 23 was the first version NOT to work correctly. so 09,16 and now 37
> are fine, but 23, 33 & 36 were broken.
Sorry, I got the version numbers wrong. I'll try again with a fixed question: are there still people who see this firmware issue with firmware WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.37? It could be that there are actually multiple firmware issues so please do report here if suspend works with firmware-3.6510.16 but fails with -3.6510.37.
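For anyone unsure which bucket their installed firmware falls into, the release number can be pulled out of the fw_build_id line that 'dmesg | grep ath11k' prints and compared with the results collected in this report. A rough sketch: the two sets below only restate what commenters reported here for the suspend regression (they are not an official compatibility list), and the function names are made up for illustration.

```python
import re

# Results as reported by commenters in this bug, for the suspend regression only.
REPORTED_WORKING = {"3.6510.9", "3.6510.16", "3.6510.37"}
REPORTED_BROKEN = {"3.6510.23", "3.6510.33", "3.6510.36"}

# Matches the trailing release number of a build id such as
# WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.23
BUILD_ID_RE = re.compile(r"fw_build_id\s+\S+-(\d+\.\d+\.\d+)(?!\S)")


def firmware_release(dmesg_line):
    """Return the release number from an ath11k dmesg line, or None if absent."""
    match = BUILD_ID_RE.search(dmesg_line)
    return match.group(1) if match else None


def reported_status(release):
    """Map a release number to what was reported in this bug."""
    if release in REPORTED_WORKING:
        return "reported working"
    if release in REPORTED_BROKEN:
        return "reported broken"
    return "not reported in this bug"


if __name__ == "__main__":
    line = ("ath11k_pci 0000:02:00.0: fw_version 0x110b196e "
            "fw_build_timestamp 2022-12-22 12:54 fw_build_id "
            "WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.23")
    release = firmware_release(line)
    print(release, "->", reported_status(release))
```

A release that is not in either set simply has no test results here, so it should not be read as good or bad.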
I've tried new firmware too, but still failing for me with 6.7.1 (it works on 6.6.11 though as the older ones used to): feb 02 12:34:22 tricky kernel: BUG: kernel NULL pointer dereference, address: 00000000000000a0 feb 02 12:34:22 tricky kernel: #PF: supervisor write access in kernel mode feb 02 12:34:22 tricky kernel: #PF: error_code(0x0002) - not-present page feb 02 12:34:22 tricky kernel: PGD 0 P4D 0 feb 02 12:34:22 tricky kernel: Oops: 0002 [#1] PREEMPT SMP NOPTI feb 02 12:34:22 tricky kernel: CPU: 3 PID: 4505 Comm: NetworkManager Tainted: G OE 6.7.1-zabbly+ #ubuntu22.04 feb 02 12:34:22 tricky kernel: Hardware name: LENOVO 21K5CTO1WW/21K5CTO1WW, BIOS R2FET36W (1.16 ) 10/24/2023 feb 02 12:34:22 tricky kernel: RIP: 0010:down_write+0x21/0x80 feb 02 12:34:22 tricky kernel: Code: 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 54 49 89 fc e8 bd c2 ff ff 65 ff 05 2e f2 b1 5f 31 c0 ba 01 00 00 00 <f0> 49 0f b1 14 24 75 29 65 48 8b 04 25 c0 2b 03 00 49 89 44 24 08 feb 02 12:34:22 tricky kernel: RSP: 0018:ffffa76a065934a0 EFLAGS: 00010246 feb 02 12:34:22 tricky kernel: RAX: 0000000000000000 RBX: ffff8b4a98022068 RCX: 0000000000000000 feb 02 12:34:22 tricky kernel: RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000000 feb 02 12:34:22 tricky kernel: RBP: ffffa76a065934a8 R08: ffff8b4a9c9bd498 R09: 0000000000000000 feb 02 12:34:22 tricky kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000000a0 feb 02 12:34:22 tricky kernel: R13: ffff8b4a98021c38 R14: 00000000000000a0 R15: ffff8b4a996d1fc0 feb 02 12:34:22 tricky kernel: FS: 00007f2c8eee64c0(0000) GS:ffff8b58e1ec0000(0000) knlGS:0000000000000000 feb 02 12:34:22 tricky kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 feb 02 12:34:22 tricky kernel: CR2: 00000000000000a0 CR3: 000000011bfa8000 CR4: 0000000000750ef0 feb 02 12:34:22 tricky kernel: PKRU: 55555554 feb 02 12:34:22 tricky kernel: Call Trace: feb 02 12:34:22 tricky kernel: <TASK> feb 02 12:34:22 tricky kernel: ? 
show_regs+0x72/0x90 feb 02 12:34:22 tricky kernel: ? __die+0x25/0x80 feb 02 12:34:22 tricky kernel: ? page_fault_oops+0x154/0x4c0 feb 02 12:34:22 tricky kernel: ? srso_alias_return_thunk+0x5/0xfbef5 feb 02 12:34:22 tricky kernel: ? raw_spin_rq_unlock+0x10/0x40 feb 02 12:34:22 tricky kernel: ? srso_alias_return_thunk+0x5/0xfbef5 feb 02 12:34:22 tricky kernel: ? finish_task_switch.isra.0+0x84/0x2b0 feb 02 12:34:22 tricky kernel: ? do_user_addr_fault+0x30e/0x6e0 feb 02 12:34:22 tricky kernel: ? exc_page_fault+0x84/0x1b0 feb 02 12:34:22 tricky kernel: ? asm_exc_page_fault+0x27/0x30 feb 02 12:34:22 tricky kernel: ? down_write+0x21/0x80 feb 02 12:34:22 tricky kernel: simple_recursive_removal+0xaa/0x2c0 feb 02 12:34:22 tricky kernel: ? __pfx_remove_one+0x10/0x10 feb 02 12:34:22 tricky kernel: debugfs_remove+0x45/0x80 feb 02 12:34:22 tricky kernel: ath11k_debugfs_remove_interface+0x1e/0x40 [ath11k] feb 02 12:34:22 tricky kernel: ath11k_mac_op_remove_interface+0x1a5/0x2f0 [ath11k] feb 02 12:34:22 tricky kernel: drv_remove_interface+0xe0/0x1a0 [mac80211] feb 02 12:34:22 tricky kernel: ieee80211_do_stop+0x651/0x960 [mac80211] feb 02 12:34:22 tricky kernel: ? srso_alias_return_thunk+0x5/0xfbef5 feb 02 12:34:22 tricky kernel: ieee80211_stop+0x59/0x190 [mac80211] feb 02 12:34:22 tricky kernel: __dev_close_many+0x9f/0x130 feb 02 12:34:22 tricky kernel: __dev_change_flags+0xe6/0x230 feb 02 12:34:22 tricky kernel: dev_change_flags+0x26/0x80 feb 02 12:34:22 tricky kernel: do_setlink+0x2b0/0x1230 feb 02 12:34:22 tricky kernel: ? fib_table_dump+0xc0/0x3b0 feb 02 12:34:22 tricky kernel: ? srso_alias_return_thunk+0x5/0xfbef5 feb 02 12:34:22 tricky kernel: ? __nla_validate_parse+0x5d/0xfc0 feb 02 12:34:22 tricky kernel: ? srso_alias_return_thunk+0x5/0xfbef5 feb 02 12:34:22 tricky kernel: ? sched_clock_noinstr+0x9/0x10 feb 02 12:34:22 tricky kernel: ? srso_alias_return_thunk+0x5/0xfbef5 feb 02 12:34:22 tricky kernel: ? sched_clock+0x10/0x30 feb 02 12:34:22 tricky kernel: ? 
srso_alias_return_thunk+0x5/0xfbef5 feb 02 12:34:22 tricky kernel: ? select_task_rq_fair+0x19e/0x20b0 feb 02 12:34:22 tricky kernel: __rtnl_newlink+0x5d0/0xb10 feb 02 12:34:22 tricky kernel: rtnl_newlink+0x49/0x80 feb 02 12:34:22 tricky kernel: rtnetlink_rcv_msg+0x179/0x450 feb 02 12:34:22 tricky kernel: ? ep_autoremove_wake_function+0x12/0x40 feb 02 12:34:22 tricky kernel: ? __pfx_rtnetlink_rcv_msg+0x10/0x10 feb 02 12:34:22 tricky kernel: netlink_rcv_skb+0x59/0x110 feb 02 12:34:22 tricky kernel: rtnetlink_rcv+0x15/0x30 feb 02 12:34:22 tricky kernel: netlink_unicast+0x247/0x360 feb 02 12:34:22 tricky kernel: netlink_sendmsg+0x25d/0x510 feb 02 12:34:22 tricky kernel: ? __check_object_size+0x6d/0x310 feb 02 12:34:22 tricky kernel: ____sys_sendmsg+0x3e9/0x420 feb 02 12:34:22 tricky kernel: ? srso_alias_return_thunk+0x5/0xfbef5 feb 02 12:34:22 tricky kernel: ___sys_sendmsg+0x88/0xe0 feb 02 12:34:22 tricky kernel: ? srso_alias_return_thunk+0x5/0xfbef5 feb 02 12:34:22 tricky kernel: ? ttwu_queue_wakelist+0x139/0x1c0 feb 02 12:34:22 tricky kernel: ? eventfd_write+0xcf/0x1e0 feb 02 12:34:22 tricky kernel: ? srso_alias_return_thunk+0x5/0xfbef5 feb 02 12:34:22 tricky kernel: ? try_to_wake_up+0x271/0x6d0 feb 02 12:34:22 tricky kernel: ? srso_alias_return_thunk+0x5/0xfbef5 feb 02 12:34:22 tricky kernel: ? __fget_light+0xa0/0x150 feb 02 12:34:22 tricky kernel: __sys_sendmsg+0x69/0xd0 feb 02 12:34:22 tricky kernel: __x64_sys_sendmsg+0x1d/0x30 feb 02 12:34:22 tricky kernel: do_syscall_64+0x5c/0xf0 feb 02 12:34:22 tricky kernel: ? syscall_exit_to_user_mode+0x38/0x60 feb 02 12:34:22 tricky kernel: ? srso_alias_return_thunk+0x5/0xfbef5 feb 02 12:34:22 tricky kernel: ? do_syscall_64+0x6b/0xf0 feb 02 12:34:22 tricky kernel: ? srso_alias_return_thunk+0x5/0xfbef5 feb 02 12:34:22 tricky kernel: ? srso_alias_return_thunk+0x5/0xfbef5 feb 02 12:34:22 tricky kernel: ? exit_to_user_mode_prepare+0x39/0x190 feb 02 12:34:22 tricky kernel: ? 
srso_alias_return_thunk+0x5/0xfbef5 feb 02 12:34:22 tricky kernel: ? syscall_exit_to_user_mode+0x38/0x60 feb 02 12:34:22 tricky kernel: ? srso_alias_return_thunk+0x5/0xfbef5 feb 02 12:34:22 tricky kernel: ? do_syscall_64+0x6b/0xf0 feb 02 12:34:22 tricky kernel: ? sysvec_apic_timer_interrupt+0x4e/0xb0 feb 02 12:34:22 tricky kernel: entry_SYSCALL_64_after_hwframe+0x6e/0x76 feb 02 12:34:22 tricky kernel: RIP: 0033:0x7f2c8ff2799d feb 02 12:34:22 tricky kernel: Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 6a 90 f6 ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 44 24 08 e8 ae 90 f6 ff 48 feb 02 12:34:22 tricky kernel: RSP: 002b:00007ffe9ca7f550 EFLAGS: 00000293 ORIG_RAX: 000000000000002e feb 02 12:34:22 tricky kernel: RAX: ffffffffffffffda RBX: 000000000000002f RCX: 00007f2c8ff2799d feb 02 12:34:22 tricky kernel: RDX: 0000000000000000 RSI: 00007ffe9ca7f590 RDI: 000000000000000c feb 02 12:34:22 tricky kernel: RBP: 000055c7bc7a8030 R08: 0000000000000000 R09: 0000000000000000 feb 02 12:34:22 tricky kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000 feb 02 12:34:22 tricky kernel: R13: 00007ffe9ca7f6e0 R14: 00007ffe9ca7f6dc R15: 0000000000000000 feb 02 12:34:22 tricky kernel: </TASK> feb 02 12:34:22 tricky kernel: Modules linked in: xt_comment iptable_raw iptable_mangle iptable_nat bpfilter vhost_vsock vmw_vsock_virtio_transport_common vhost vhost_iotlb nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype vmw_vsock_vmci_transport vsock vmw_vmci ccm michael_mic snd_seq_dummy snd_hrtimer vboxnetadp(OE) vboxnetflt(OE) xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 vboxdrv(OE) nf_tables nfnetlink rfcomm cmac algif_hash algif_skcipher af_alg bnep overlay btusb btrtl btintel btbcm btmtk bluetooth ecdh_generic ecc uvcvideo videobuf2_vmalloc uvc videobuf2_memops videobuf2_v4l2 videodev 
videobuf2_common qrtr_mhi mc sunrpc binfmt_misc nls_iso8859_1 sch_fq_codel intel_rapl_msr snd_soc_dmic snd_ps_pdm_dma snd_soc_ps_mach intel_rapl_common edac_mce_amd snd_ctl_led snd_hda_codec_realtek snd_hda_codec_generic snd_sof_amd_acp63 snd_sof_amd_vangogh snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci feb 02 12:34:22 tricky kernel: snd_sof_xtensa_dsp snd_hda_codec_hdmi snd_sof snd_sof_utils snd_soc_core snd_hda_intel joydev snd_compress snd_intel_dspcfg snd_intel_sdw_acpi ac97_bus thinkpad_acpi rapl snd_pcm_dmaengine snd_hda_codec nvram qrtr ledtrig_audio snd_hda_core snd_hwdep platform_profile snd_pci_ps snd_seq_midi ath11k_pci snd_rpl_pci_acp6x snd_seq_midi_event snd_acp_pci ath11k snd_rawmidi snd_acp_legacy_common qmi_helpers input_leds snd_seq snd_pci_acp6x mac80211 serio_raw snd_pcm think_lmi snd_seq_device snd_pci_acp5x hid_multitouch firmware_attributes_class snd_timer snd_rn_pci_acp3x cfg80211 snd_acp_config wmi_bmof snd_soc_acpi snd libarc4 k10temp snd_pci_acp3x soundcore mhi mac_hid amd_pmc kvm_amd ccp kvm irqbypass iptable_filter ip6table_filter ip6_tables br_netfilter bridge stp llc arp_tables pkcs8_key_parser cuse msr parport_pc ppdev lp parport efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq libcrc32c dm_crypt amdgpu amdxcp drm_exec gpu_sched drm_buddy i2c_algo_bit drm_suballoc_helper feb 02 12:34:22 tricky kernel: drm_ttm_helper ttm drm_display_helper crct10dif_pclmul crc32_pclmul polyval_clmulni hid_generic cec polyval_generic ghash_clmulni_intel rc_core i2c_hid_acpi sha256_ssse3 sha1_ssse3 i2c_hid nvme drm_kms_helper ucsi_acpi hid r8169 typec_ucsi xhci_pci video psmouse thunderbolt i2c_piix4 nvme_core xhci_pci_renesas realtek typec drm wmi aesni_intel crypto_simd cryptd feb 02 12:34:22 tricky kernel: CR2: 00000000000000a0 feb 02 12:34:22 tricky kernel: ---[ end trace 0000000000000000 ]--- feb 02 12:34:22 tricky kernel: RIP: 0010:down_write+0x21/0x80 feb 02 12:34:22 tricky kernel: Code: 90 90 90 90 
90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 54 49 89 fc e8 bd c2 ff ff 65 ff 05 2e f2 b1 5f 31 c0 ba 01 00 00 00 <f0> 49 0f b1 14 24 75 29 65 48 8b 04 25 c0 2b 03 00 49 89 44 24 08 feb 02 12:34:22 tricky kernel: RSP: 0018:ffffa76a065934a0 EFLAGS: 00010246 feb 02 12:34:22 tricky kernel: RAX: 0000000000000000 RBX: ffff8b4a98022068 RCX: 0000000000000000 feb 02 12:34:22 tricky kernel: RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000000 feb 02 12:34:22 tricky kernel: RBP: ffffa76a065934a8 R08: ffff8b4a9c9bd498 R09: 0000000000000000 feb 02 12:34:22 tricky kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000000a0 feb 02 12:34:22 tricky kernel: R13: ffff8b4a98021c38 R14: 00000000000000a0 R15: ffff8b4a996d1fc0 feb 02 12:34:22 tricky kernel: FS: 00007f2c8eee64c0(0000) GS:ffff8b58e1ec0000(0000) knlGS:0000000000000000 feb 02 12:34:22 tricky kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 feb 02 12:34:22 tricky kernel: CR2: 00000000000000a0 CR3: 000000011bfa8000 CR4: 0000000000750ef0 feb 02 12:34:22 tricky kernel: PKRU: 55555554 > I've tried new firmware too, but still failing for me with 6.7.1 (it works on > 6.6.11 though as the older ones used to): > > feb 02 12:34:22 tricky kernel: BUG: kernel NULL pointer dereference, address: This is a different problem, see bug #218364. *** Bug 218209 has been marked as a duplicate of this bug. *** I finally found this thread after suddenly having lots of issues with my Thinkpad T16 AMD. Here is the issue I had: Scenario 1: Closing the lid, the thinkpad led keeps going for ~30 seconds, after that it starts blinking. Meaning standby was delayed? I guess it has to do with the wakeups from the wifi module. Scenario 2: Closing the lid, the led keeps going, but this time forever. Opening up the lid again shows the monitor is off. The laptop seems completely frozen, although the led shows its not actually in standby. 
I guess it got woken up by the wifi module immediately, and before the kernel could wake up everything (e.g. amdgpu), it crashed. Unfortunately I am unable to get any logs from this state. The disk probably did not wake up before the crash.

I kind of started to suspect that my suspend issues were wifi related after I discovered that my laptop does not freeze any more if I keep pinging it over the network when connected to wifi, while trying to suspend / resume.

I tried out firmware WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.37, which seems to fix Scenario 1 completely, and Scenario 2 partially. Also testing with amd_s2idle.py shows there are no spurious wakeups any more. The laptop always goes to sleep immediately according to its led.

The remaining issue I am now facing is that sometimes when waking up (led stops blinking and is on again), I have a frozen system again. I only tested it now for ~5 times, it happened 4 out of 5 times so pretty consistently. I guess the freeze really is resume related, and probably a totally different bug?

(In reply to Roland Ruckerbauer from comment #51)
> I finally found this thread after suddenly having lots of issues with my
> Thinkpad T16 AMD.
>
> Here is the issue I had:
>
> Scenario 1:
> Closing the lid, the thinkpad led keeps going for ~30 seconds, after that it
> starts blinking. Meaning standby was delayed? I guess it has to do with the
> wakeups from the wifi module.
>
> Scenario 2:
> Closing the lid, the led keeps going, but this time forever. Opening up the
> lid again shows the monitor is off. The laptop seems completely frozen,
> although the led shows its not actually in standby.
> I guess it got woken up by the wifi module immediately, and before the
> kernel could wake up everything (e.g. amdgpu), it crashed. Unfortunately I
> am unable to get any logs from this state. The disk probably did not wake up
> before the crash.
>
> I kind of started to suspect that my suspend issues were wifi related after
> I discovered that my laptop does not freeze any more if I keep pinging it
> over the network when connected to wifi, while trying to suspend / resume.
>
> I tried out firmware
> WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.37, which seems to
> fix Scenario 1 completely, and Scenario 2 partially. Also testing with
> amd_s2idle.py shows there are no spurious wakeups any more. The laptop
> always goes to sleep immediately according to its led.
>
> The remaining issue I am now facing is that sometimes when waking up (led
> stops blinking and is on again), I have a frozen system again. I only tested
> it now for ~5 times, it happened 4 out of 5 times so pretty consistently. I
> guess the freeze really is resume related, and probably a totally different
> bug?

I forgot to mention:
The freezing of the system happens very often when suspend / resume is triggered by me opening and closing the lid. I'd say it reproduces 80% of the time. When standby is triggered from linux with systemctl suspend it almost never happens. With the power button of the thinkpad, it's maybe a bit between the two other triggers?

This is really strange, I have no idea why the different triggers should make a difference. Maybe this can be a clue, like maybe a race between turning on/off the screen / gpu and s2idle?

> I guess the freeze really is resume related, and probably a totally
> different bug?

Please file a new bug report for your scenario 2. See also:

https://wireless.wiki.kernel.org/en/users/drivers/ath11k/bugreport

> Maybe this can be a clue, like maybe a race between turning on/off the screen
> / gpu and s2idle?

That's exactly right. There's another regression in the GPU driver. It's discussed in https://gitlab.freedesktop.org/drm/amd/-/issues/3132

> So I see there were three people saying that the issue is fixed now.
Now my next question is: are there still people who still see this firmware issue with firmware WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.37?

Overwhelmingly this firmware seems to improve things for people. I suggest closing this bug now.

(In reply to Mario Limonciello (AMD) from comment #54)
> Overwhelmingly this firmware seems to improve things for people. I suggest
> closing this bug now.

Agreed, I'll close this finally. Thank you everyone for helping fix this! If someone still has suspend problems please file a new report.

https://bugzilla.redhat.com/show_bug.cgi?id=2262577#c20
I tested firmware 37 with kernel-6.7.4 and kernel-6.8.0-rc3. I believe the suspend problems are not fixed. Details here. Suspend works with zero problems with firmware 23 but not 37.

Disregard the previous post.
- kernel-6.6.14 has no suspend problems with any firmware version
- kernel-6.7.4 has suspend problems with firmware 23 and 37
- kernel-6.8.0-rc3 has suspend problems with firmware 23 and 37
- kernel-6.7.4 can deadlock the kernel while reloading the ath11k_pci driver after a successful resume
- kernel-6.8.0-rc3 can deadlock the kernel while reloading the ath11k_pci driver after a successful resume

I believe you're exposed to a separate GPU suspend resume bug that happened in kernel 6.7. Please reference the above link.

With firmware 37 (kernel 6.7.1, haven't tried newer ones yet) I get working manual suspend without toggling off wifi (i.e. via the gnome suspend button) but lid suspend hangs (unless I disable wifi beforehand, then lid suspend also works). To unhang the computer with lid suspend, I have to hold the power button for a few seconds which makes it suspend for real, and then I can resume it and get a desktop again.
Thinkpad P14S gen4 AMD.

Neither Firmware 37 nor 16 worked on my machine.
(Linux AprilGrimoire-HONOR-laptop 6.6.27-1-lts #1 SMP PREEMPT_DYNAMIC Sat, 13 Apr 2024 11:50:59 +0000 x86_64 GNU/Linux)
Network controller: Qualcomm Technologies, Inc QCNFA765 Wireless Network Adapter (rev 01)

You should open a new ticket to dig into specifics for your machine and situation.
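
For anyone comparing results before filing a new report, here is a minimal Python sketch that pulls the firmware release number out of saved `dmesg | grep ath11k` output and looks it up against what was reported in this bug. It assumes the build ID appears in the log in the same `WLAN.HSP...-3.6510.NN` form quoted in the comments above; the function names and the sample log line are hypothetical, and the status table only summarizes reports from this thread, not an official compatibility list.

```python
import re
from typing import Optional

# Build IDs quoted in this bug look like
# WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.37
# Assumption: the number after "3.6510." identifies the firmware release.
BUILD_RE = re.compile(r"WLAN\.HSP\.[\w.\-]*?-3\.6510\.(\d+)")

# Summary of what commenters in this bug reported. Not authoritative:
# some machines behaved differently (see the last comments).
REPORTED = {
    16: "reported to suspend correctly",
    23: "reported suspend regression",
    37: "reported to fix the regression for most users",
}


def firmware_release(log_text: str) -> Optional[int]:
    """Return the release number found in log_text, or None if absent."""
    match = BUILD_RE.search(log_text)
    return int(match.group(1)) if match else None


def describe(log_text: str) -> str:
    """Describe the firmware release found in log_text using REPORTED."""
    release = firmware_release(log_text)
    if release is None:
        return "no WCN6855 build ID found"
    status = REPORTED.get(release, "not discussed in this bug")
    return f"3.6510.{release}: {status}"


if __name__ == "__main__":
    # Hypothetical log line; paste your own dmesg output here instead.
    sample = (
        "ath11k_pci 0000:03:00.0: fw_build_id "
        "WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.37"
    )
    print(describe(sample))
```

Including the resulting release number together with the kernel version in a new report makes it easier to tell this firmware regression apart from the separate kernel 6.7 GPU suspend issue mentioned above.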