Bug 214091
Summary: | Failed to do s2idle on AMD Cezanne platform | ||
---|---|---|---|
Product: | Platform Specific/Hardware | Reporter: | KaiChuan-Hsieh (kaichuan.hsieh) |
Component: | x86-64 | Assignee: | platform_x86_64 (platform_x86_64) |
Status: | RESOLVED CODE_FIX | ||
Severity: | blocking | CC: | fabriziobertocci, koba.ko, mario.limonciello, vicamo, yihunglin |
Priority: | P1 | ||
Hardware: | x86-64 | ||
OS: | Linux | ||
Kernel Version: | 5.14-rc6 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | kernel log with capture after device hang; 5.14-rc7 s2idle hang kernel log; 5.14-rc7 s2idle dyndbg enable; acpidump of the system; boot log with acpi dyndbg enabled; lspci log of the system | ||

Please try 5.14-rc7 or later; 5.11 doesn't have all the s2idle patches. Specifically, rc7 contains commit https://github.com/torvalds/linux/commit/4753b46e16073c3100551a61024989d50f5e4874 which may be important for this system.

Created attachment 298435 [details]
5.14-rc7 s2idle hang kernel log
With the 5.14-rc7 kernel, there are fewer log messages than with 5.14-rc6.
I can see the s2idle entry, and then the device hangs. I need to press the power key to boot.
Aug 23 22:07:51 u-Inspiron-15-3525 kernel: [ 12.832308] Bluetooth: RFCOMM ver 1.11
Aug 23 22:07:51 u-Inspiron-15-3525 kernel: [ 13.009304] rfkill: input handler enabled
Aug 23 22:07:53 u-Inspiron-15-3525 kernel: [ 14.306839] rfkill: input handler disabled
Aug 23 22:09:07 u-Inspiron-15-3525 kernel: [ 88.802924] wlp2s0: deauthenticating from 24:4b:fe:25:a7:ec by local choice (Reason: 3=DEAUTH_LEAVING)
Aug 23 22:09:07 u-Inspiron-15-3525 kernel: [ 88.830493] ath10k_pci 0000:02:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000b address=0xffef5e50 flags=0x0070]
Aug 23 22:09:08 u-Inspiron-15-3525 kernel: [ 89.907796] PM: suspend entry (s2idle)
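The AMD-Vi `IO_PAGE_FAULT` entry above can be triaged mechanically, and so can the many IOMMU failures reported later in the boot log. The hypothetical helper below counts faults per PCI device. Its regular expression is built only from the single line quoted here, so other kernels may format the event differently. It does not attempt to decode the `flags` field.

```python
import re
from collections import Counter

# Pattern built from the single AMD-Vi line quoted in this report; other kernel
# versions may word the event differently, so treat it as an assumption.
FAULT_RE = re.compile(
    r"(?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): AMD-Vi: Event logged "
    r"\[IO_PAGE_FAULT domain=(?P<domain>0x[0-9a-f]+) "
    r"address=(?P<addr>0x[0-9a-f]+) flags=(?P<flags>0x[0-9a-f]+)\]",
    re.IGNORECASE,
)


def parse_io_page_faults(lines):
    """Return (pci_device, domain, address, flags) for each IO_PAGE_FAULT line."""
    faults = []
    for line in lines:
        m = FAULT_RE.search(line)
        if m:
            faults.append((
                m.group("dev"),
                int(m.group("domain"), 16),
                int(m.group("addr"), 16),
                int(m.group("flags"), 16),
            ))
    return faults


def faults_per_device(lines):
    """Count IO_PAGE_FAULT events per PCI device to spot the noisiest one."""
    return Counter(fault[0] for fault in parse_io_page_faults(lines))


if __name__ == "__main__":
    import sys

    # Usage: python3 iommu_faults.py saved-kernel.log [more logs...]
    for path in sys.argv[1:]:
        with open(path, errors="replace") as fh:
            for dev, count in faults_per_device(fh).most_common():
                print(f"{path}: {dev}: {count} IO_PAGE_FAULT event(s)")
```

Grouping by device makes it easy to tell whether a single card (here the ath10k at 0000:02:00.0) accounts for most of the faults.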
I've tried adding the module parameter amd_pmc.dyndbg=+pt, but there is still no significant error message. The system hangs so quickly that journald can't even capture the log.
Is there a way to dump more error information for debugging?
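One way to get more output is to enable debug callsites by source file as well as by module. The sketch below assumes the kernel has `CONFIG_DYNAMIC_DEBUG` and that debugfs is mounted at `/sys/kernel/debug`. drivers/acpi/x86/s2idle.c shows up under `[acpi]` in the control listing further down, which suggests it is built into the kernel. A boot-time `dyndbg=` query would therefore be needed to catch its boot-only messages. The helper names are hypothetical; the query strings follow the kernel's dynamic debug syntax.

```python
from pathlib import Path

# Dynamic debug control file; needs CONFIG_DYNAMIC_DEBUG and a mounted debugfs.
CONTROL = Path("/sys/kernel/debug/dynamic_debug/control")


def file_query(path, flags="+p"):
    """Query enabling every pr_debug()/dev_dbg() callsite in one source file."""
    return f"file {path} {flags}"


def module_query(module, flags="+p"):
    """Query enabling every callsite in one loadable module."""
    return f"module {module} {flags}"


def boot_args(path, module, flags="+p"):
    """Kernel command-line form, so messages printed during boot are kept too."""
    return [f'dyndbg="{file_query(path, flags)}"', f"{module}.dyndbg={flags}"]


def apply(query, control=CONTROL):
    """Write one query at runtime (needs root). Returns False if the write fails."""
    try:
        with open(control, "w") as fh:
            fh.write(query + "\n")
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # Print the settings rather than applying them, so the sketch has no side effects.
    print(" ".join(boot_args("drivers/acpi/x86/s2idle.c", "amd_pmc", "+pt")))
```

At runtime, root could call `apply(file_query("drivers/acpi/x86/s2idle.c", "+pt"))`. The printed arguments are meant to be appended to the kernel command line instead.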
>Can see s2idle entry, then the device hang. Need to press power key to boot.

When you say it's hanging, do you know that it's actually hung, or that it can't "wake up"? The difference is whether the problem is on the way down or on the way back up. Can you try other wakeup sources, like the lid, keyboard, or XHCI?

>Aug 23 22:09:07 u-Inspiron-15-3525 kernel: [ 88.830493] ath10k_pci
>0000:02:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000b
>address=0xffef5e50 flags=0x0070]

As the ath10k card causes a page fault right before suspend, can you remove the ath10k card from the system and see whether the hang still happens?

>Aug 23 21:48:48 u-Inspiron-15-3525 kernel: [ 0.738388] ACPI Error: Aborting
>method \_SB.GPIO._EVT due to previous error (AE_NOT_EXIST)
>(20210604/psparse-529)

This is particularly worrying: if the ASL GPIO events have an interpreter problem, they might not be configured properly.

>Do you have way to dump more error for debugging?

Can you turn on dynamic debugging for uPEP (drivers/acpi/x86/s2idle.c) to see whether all those events are sent properly and which method they use?

Created attachment 298437 [details]
5.14-rc7 s2idle dyndbg enable
I tried blacklisting the ath10k_pci module and enabling all dynamic debug output for s2idle.c:
drivers/acpi/x86/s2idle.c:399 [acpi]lps0_device_attach =pt "_DSM Using AMD method\012"
drivers/acpi/x86/s2idle.c:395 [acpi]lps0_device_attach =pt "_DSM UUID %s: Adjusted function mask: 0x%x\012"
drivers/acpi/x86/s2idle.c:357 [acpi]validate_dsm =pt "_DSM UUID %s rev %d function mask: 0x%x\012"
drivers/acpi/x86/s2idle.c:351 [acpi]validate_dsm =pt "_DSM UUID %s rev %d function 0 evaluation failed\012"
drivers/acpi/x86/s2idle.c:331 [acpi]acpi_sleep_run_lps0_dsm =pt "_DSM function %u evaluation %s\012"
drivers/acpi/x86/s2idle.c:301 [acpi]lpi_check_constraints =pt "LPI: required min power state:%s current power state:%s\012"
drivers/acpi/x86/s2idle.c:284 [acpi]lpi_device_get_constraints =pt "LPI: constraints list end\012"
drivers/acpi/x86/s2idle.c:276 [acpi]lpi_device_get_constraints =pt "Incomplete constraint defined\012"
drivers/acpi/x86/s2idle.c:265 [acpi]lpi_device_get_constraints =pt "uid:%d min_dstate:%s\012"
drivers/acpi/x86/s2idle.c:240 [acpi]lpi_device_get_constraints =pt "index:%d Name:%s\012"
drivers/acpi/x86/s2idle.c:201 [acpi]lpi_device_get_constraints =pt "LPI: constraints list begin:\012"
drivers/acpi/x86/s2idle.c:189 [acpi]lpi_device_get_constraints =pt "_DSM function 1 eval %s\012"
drivers/acpi/x86/s2idle.c:174 [acpi]lpi_device_get_constraints_amd =pt "LPI: constraints list end\012"
drivers/acpi/x86/s2idle.c:164 [acpi]lpi_device_get_constraints_amd =pt "Incomplete constraint defined\012"
drivers/acpi/x86/s2idle.c:158 [acpi]lpi_device_get_constraints_amd =pt "Name:%s\012"
drivers/acpi/x86/s2idle.c:119 [acpi]lpi_device_get_constraints_amd =pt "LPI: constraints list begin:\012"
drivers/acpi/x86/s2idle.c:102 [acpi]lpi_device_get_constraints_amd =pt "_DSM function 1 eval %s\012"
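To confirm which of these callsites are actually printing, the control file can be parsed. The sketch below assumes the `<file>:<line> [<module>]<function> =<flags> "<format>"` layout shown in the listing above. In that layout, `p` among the flags means the callsite prints and `_` means no flags are set. `Callsite`, `parse_control` and `enabled_functions` are hypothetical names.

```python
import re
from typing import NamedTuple

# Format of one control-file line, as in the listing above:
#   <file>:<line> [<module>]<function> =<flags> "<format>"
LINE_RE = re.compile(
    r'^(?P<file>\S+):(?P<line>\d+) \[(?P<module>[^\]]+)\](?P<func>\S+) '
    r'=(?P<flags>\S+) "(?P<fmt>.*)"'
)


class Callsite(NamedTuple):
    file: str
    line: int
    module: str
    func: str
    flags: str
    fmt: str

    @property
    def enabled(self) -> bool:
        # 'p' makes the callsite print; '_' means no flags are set.
        return "p" in self.flags


def parse_control(text):
    """Parse control-file text into Callsite records, skipping the '#' header."""
    sites = []
    for raw in text.splitlines():
        m = LINE_RE.match(raw.strip())
        if m:
            sites.append(Callsite(m["file"], int(m["line"]), m["module"],
                                  m["func"], m["flags"], m["fmt"]))
    return sites


def enabled_functions(text, path):
    """Sorted names of functions in `path` with at least one printing callsite."""
    return sorted({s.func for s in parse_control(text)
                   if s.enabled and s.file == path})
```

For example, the output of `grep s2idle.c /sys/kernel/debug/dynamic_debug/control` can be passed in as `text` to check that a `+p` query took effect.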
However, the kernel log still has no significant error.
I also tried using the keyboard and touchpad to wake the system, but both failed. I don't have to long-press the power key to force a shutdown; I just press the power key once as usual, and the system boots after the hang. It seems that suspending makes the system shut down directly.

>I try to blacklist ath10k_pci module

Can you physically remove the card, or is it soldered to the board? If you can, please remove it physically.

> and enable all dynamic log for s2idle.c.

I don't see any output related to uPEP in your logs. Is the uPEP device missing from the ACPI tables?

Created attachment 298439 [details]
acpidump of the system

I tried to dump the ACPI tables, but I didn't see the uPEP device you mentioned in dsdt.dsl. Could you tell me which file should contain it? Would you please check whether the ACPI tables are configured correctly to support s2idle? The tables can be extracted from the acpi.log I attached with the acpixtract tool, as described at http://alexhungdmz.blogspot.com/2012/05/how-to-dump-acpi-tables-in-ubuntu.html

Thanks,

It's present in ssdt15.dat in your attachment. It looks like the system should effectively be using the Microsoft UUID 11e00d56-ce64-47ce-837b-1f898f9aa461, which is supported in 5.14-rc7. I also see that FACP sets low power idle to 1, so the function should be initializing.

Can you please turn on dynamic debugging for s2idle.c at boot? Some of those messages are only printed at boot.

Lastly, do you have a SKU with NVMe? Does the failure only happen on SATA, and does it work on NVMe?

Created attachment 298449 [details]
boot log with acpi dyndbg enabled
Hello,
This is the boot log with ACPI dynamic debug enabled. I saw a lot of IOMMU failures. Are they related to the BIOS or to a kernel driver?
Thanks,
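A boot log captured with s2idle.c debugging enabled should contain `validate_dsm` lines of the form `_DSM UUID %s rev %d function mask: 0x%x`, as in the format strings listed earlier. The sketch below decodes such a mask using the generic ACPI `_DSM` convention for function 0. The function names in `MS_LPS0_FUNCS` are our reading of the Microsoft-UUID constants in drivers/acpi/x86/s2idle.c and should be treated as an assumption. The helper names are hypothetical.

```python
import re

# Names for the Microsoft LPS0 _DSM functions as we understand them from
# drivers/acpi/x86/s2idle.c; an assumption, so check your kernel tree.
MS_LPS0_FUNCS = {
    1: "get device constraints",
    3: "screen off",
    4: "screen on",
    5: "LPS0 entry",
    6: "LPS0 exit",
    7: "modern standby entry",
    8: "modern standby exit",
}

MASK_RE = re.compile(r"_DSM UUID (\S+) rev (\d+) function mask: (0x[0-9a-fA-F]+)")


def parse_mask_line(line):
    """Return (uuid, revision, mask) from a validate_dsm debug line, or None."""
    m = MASK_RE.search(line)
    if not m:
        return None
    return m.group(1), int(m.group(2)), int(m.group(3), 16)


def supported_functions(mask):
    """Function indices advertised by a _DSM function-0 bitmap.

    Per the ACPI _DSM convention, bit 0 set means "functions other than 0 are
    supported" and bit N set means function N is supported.
    """
    if not mask & 1:
        return []
    return [bit for bit in range(1, mask.bit_length()) if (mask >> bit) & 1]


def describe(mask, names=MS_LPS0_FUNCS):
    """Human-readable list of supported functions for one mask."""
    return [f"{i}: {names.get(i, 'unknown')}" for i in supported_functions(mask)]
```

For instance, an illustrative mask of 0x79 (not a value taken from this report) decodes to functions 3 through 6.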
Created attachment 298451 [details]
lspci log of the system
My OS is installed on NVMe, not a SATA HDD, and the hang after entering suspend still reproduces. Please check the lspci output of the system.
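The FACP check mentioned earlier in the thread (low power idle set to 1) can be repeated against a binary table. Examples are facp.dat produced by acpixtract, or /sys/firmware/acpi/tables/FACP on the running system. This sketch assumes the ACPI FADT layout, in which the 32-bit Flags field sits at byte offset 112 and bit 21 is LOW_POWER_S0_IDLE_CAPABLE. The function names are hypothetical.

```python
import struct
from pathlib import Path

FADT_FLAGS_OFFSET = 112             # 32-bit little-endian Flags field in the FADT
LOW_POWER_S0_IDLE_CAPABLE = 1 << 21


def fadt_flags(table: bytes) -> int:
    """Return the FADT Flags field, validating signature and length first."""
    if table[:4] != b"FACP":
        raise ValueError("not a FADT (signature is not 'FACP')")
    if len(table) < FADT_FLAGS_OFFSET + 4:
        raise ValueError("table too short to contain the Flags field")
    (flags,) = struct.unpack_from("<I", table, FADT_FLAGS_OFFSET)
    return flags


def low_power_s0_idle(table: bytes) -> bool:
    """True if the firmware advertises low-power S0 idle (needed for s2idle)."""
    return bool(fadt_flags(table) & LOW_POWER_S0_IDLE_CAPABLE)


def check_running_system(path=Path("/sys/firmware/acpi/tables/FACP")) -> bool:
    # Reading the live table normally requires root.
    return low_power_s0_idle(path.read_bytes())


if __name__ == "__main__":
    try:
        print("LOW_POWER_S0_IDLE_CAPABLE:", check_running_system())
    except (OSError, ValueError) as exc:
        print("could not read FACP:", exc)
```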
>My OS is installed on NVME but not SATA HDD. It can sitll reproduce the hang
>after entering suspend. Please check the lspci result of the system.

Sorry, the internal ticket on this indicated it was SATA.

>This is the boot log with acpi dynamic debug enabled. I saw a lot of IOMMU
>failed. May I know if it is related to BIOS or kernel driver?

I think you'll need to open some internal tickets so the appropriate team can dig into this.

>boot log with acpi dyndbg enabled

It does confirm that the Microsoft UUID is used. However, your debug log doesn't show a call into s2idle or the functions it calls, which would be needed to spot any issues there.

Are you running this test on battery or on the AC adapter? Can you please check both? There are some community reports of problems when suspend is issued on battery without AC, and this might be the same issue. If so, it will need deeper firmware debugging on an internal ticket.

Filed another bug, reproduced with an AMD Barcelo CRB and a SATA device: https://bugzilla.kernel.org/show_bug.cgi?id=214365

Something I noted in the logs:

>Nov 29 02:37:02 u-Inspiron-15-3525 kernel: [ 39.653393] amd_pmc AMDI0005:00:
>SMU response timed out
>Nov 29 02:37:02 u-Inspiron-15-3525 kernel: [ 39.653399] amd_pmc AMDI0005:00:
>suspend failed
>Nov 29 02:37:02 u-Inspiron-15-3525 kernel: [ 39.653400] PM:
>dpm_run_callback(): acpi_subsys_suspend_noirq+0x0/0x50 returns -110
>Nov 29 02:37:02 u-Inspiron-15-3525 kernel: [ 39.653407] amd_pmc AMDI0005:00:
>PM: failed to suspend noirq: error -110

This reminds me of another reported issue that led to the amd-pmc timeout being extended. You might check whether https://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86.git/commit/?h=fixes&id=3c3c8e88c8712bfe06cd10d7ca77a94a33610cd6 helps.

@Mario, I tried this patch and it didn't help. After suspending the machine, I couldn't wake it up.

@Mario: I have a similar system with the same problem.
In my case, the problem occurs only when running on battery power. If the laptop is connected to AC, s2idle works well. Comparing the system logs from when the failure occurs, it seems that my system simply fails to go to sleep and starts resuming immediately, then hangs with a black screen.

@Fabrizio, can you please open your own issue with all of the details of your system and configuration? Preferably here instead: https://gitlab.freedesktop.org/drm/amd/-/issues/
There has been a lot of movement in recent kernels, and we need to look more closely at individual issues. Thanks,

The original SATA issue is resolved by https://github.com/torvalds/linux/commit/7c5f641a5914ce0303b06bcfcd7674ee64aeebe9 when SATA is properly configured for DEVSLP.
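The failure appears to depend on the power source, so it helps to record that source with each test run. The sketch below assumes the standard power_supply sysfs class, where an AC adapter reports `type` as `Mains` and `online` as 1 or 0. The helper names are hypothetical, and a system that exposes no `Mains` supply yields `None`.

```python
from pathlib import Path

POWER_SUPPLY = Path("/sys/class/power_supply")


def read_attr(dev, name):
    """Read one sysfs attribute, or None if it is missing or unreadable."""
    try:
        return (dev / name).read_text().strip()
    except OSError:
        return None


def read_supplies(root=POWER_SUPPLY):
    """(type, online) pairs for every power supply exposed under `root`."""
    if not root.is_dir():
        return []
    return [(read_attr(dev, "type"), read_attr(dev, "online"))
            for dev in sorted(root.iterdir())]


def ac_online(supplies):
    """True/False for AC present/absent; None if no 'Mains' supply is exposed."""
    mains = [online for kind, online in supplies if kind == "Mains"]
    if not mains:
        return None
    return any(online == "1" for online in mains)


if __name__ == "__main__":
    state = ac_online(read_supplies())
    print({True: "on AC", False: "on battery", None: "unknown"}[state])
```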
Created attachment 298343 [details]
kernel log with capture after device hang

The AMD Cezanne platform fails to suspend when mem_sleep is set to s2idle. The system hangs, and the power key must be pressed to boot it again. Some drivers report errors when trying to enter s2idle:

kernel: amd_pmc AMDI0005:00: SMU response timed out
kernel: amd_pmc AMDI0005:00: suspend failed
kernel: PM: dpm_run_callback(): acpi_subsys_suspend_noirq+0x0/0x50 returns -110
kernel: amd_pmc AMDI0005:00: PM: failed to suspend noirq: error -110
kernel: PM: noirq suspend of devices failed
kernel: pci 0000:00:00.2: can't derive routing for PCI INT A
kernel: pci 0000:00:00.2: PCI INT A: no GSI
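The failure signature above can be checked mechanically. This sketch uses hypothetical helper names. It reads the active mode from /sys/power/mem_sleep, where the selected entry is shown in brackets, and extracts the errno from the noirq failure line. On Linux, -110 is -ETIMEDOUT, which is consistent with the SMU timeout message. The log patterns come from the lines quoted in this report and may differ on other kernels.

```python
import errno
import re

# Pattern taken from the "failed to suspend noirq" line quoted in this report.
NOIRQ_RE = re.compile(
    r"(?P<driver>\S+) (?P<dev>\S+): PM: failed to suspend noirq: error (?P<err>-?\d+)"
)


def selected_mem_sleep(text):
    """/sys/power/mem_sleep lists the modes with the active one in brackets,
    e.g. "[s2idle] deep"; return the bracketed mode, or None."""
    m = re.search(r"\[(\w+)\]", text)
    return m.group(1) if m else None


def noirq_failures(lines):
    """(driver, device, errno name) for each 'failed to suspend noirq' line."""
    out = []
    for line in lines:
        m = NOIRQ_RE.search(line)
        if m:
            code = abs(int(m.group("err")))  # the kernel prints a negative errno
            out.append((m.group("driver"), m.group("dev"),
                        errno.errorcode.get(code, str(code))))
    return out


def smu_timeouts(lines):
    """Number of amd_pmc 'SMU response timed out' messages."""
    return sum("SMU response timed out" in line for line in lines)


if __name__ == "__main__":
    try:
        with open("/sys/power/mem_sleep") as fh:
            print("mem_sleep:", selected_mem_sleep(fh.read()))
    except OSError as exc:
        print("cannot read /sys/power/mem_sleep:", exc)
```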