Was trying out 6.9rc1 to give perfcore a whirl but I've noticed that when I go to s2idle I cannot wake up, the power button does not wake the laptop up again and stays flashing and requires a full reboot to become responsive again. Arch linux, linux-mainline. >>options root=/dev/nvme0n1p3 rw resume=UUID=x-x-x-x-x >>rtc_cmos.use_acpi_alarm=1 pcie_aspm.policy=powersupersave amdgpu.abmlevel=3 >>iomem=relaxed amdgpu.ppfeaturemask=0xffffffff amdgpu.sg_display=0 AMD Ryzen 5 7640U w/ Radeon 760M Graphics. Works fine on 6.8.1.
sorry, prefcore!
AMD Dev Mario Limonciello seems to think they're onto to something over at the Framework forums. https://community.frame.work/t/fyi-amd-framework-13-suspend-resume-regression-in-6-9rc1/47697/8?u=david_markey
I finished up bisecting and it comes down to this commit: ``` commit 7ee988770326fca440472200c3eb58935fe712f6 Author: Anna-Maria Behnsen <anna-maria@linutronix.de> Date: Thu Feb 22 11:37:10 2024 +0100 timers: Implement the hierarchical pull model ``` When the issue happens, the system isn't able to be woken up from the GPIO controller anymore. However, it can still be woken up from programming a wakeup timer. Here is a snippet of the logs when it occurs: ``` Mar 25 21:30:14 fedora kernel: PM: suspend-to-idle Mar 25 21:30:14 fedora kernel: ACPI: EC: ACPI EC GPE status set Mar 25 21:30:14 fedora kernel: ACPI: PM: Rearming ACPI SCI for wakeup Mar 25 21:30:14 fedora kernel: amd_pmc: SMU idlemask s0i3: 0x8fff9eb5 Mar 25 21:30:14 fedora kernel: Timekeeping suspended for 2.303 seconds Mar 25 21:30:14 fedora kernel: PM: Triggering wakeup from IRQ 7 Mar 25 21:30:14 fedora kernel: PM: Triggering wakeup from IRQ 9 Mar 25 21:30:14 fedora kernel: ACPI: EC: ACPI EC GPE status set Mar 25 21:30:14 fedora kernel: ACPI: EC: ACPI EC GPE dispatched Mar 25 21:30:14 fedora kernel: ACPI: EC: ACPI EC work flushed Mar 25 21:30:14 fedora kernel: ACPI: PM: Wakeup after ACPI Notify sync Mar 25 21:30:14 fedora kernel: PM: resume from suspend-to-idle Mar 25 21:30:14 fedora kernel: PM: Triggering wakeup from IRQ 0 Mar 25 21:30:14 fedora kernel: ACPI Error: Aborting method \_SB.PCI0.LPC0.EC0.EMBF due to previous error (AE_AML_LOOP_TIMEOUT) (20230628/psparse-529) Mar 25 21:30:14 fedora kernel: ACPI Error: Aborting method \_SB.PEP._DSM due to previous error (AE_AML_LOOP_TIMEOUT) (20230628/psparse-529) Mar 25 21:30:14 fedora kernel: ACPI: \_SB_.PEP_: failed to evaluate _DSM 560de011-64ce-ce47-837b-1f898f9aa461 (0x3021) Mar 25 21:30:14 fedora kernel: ACPI: \_SB_.PEP_: Failed to transitioned to state lps0 exit Mar 25 21:30:14 fedora kernel: clocksource: Long readout interval, skipping watchdog check: cs_nsec: 8010624773 wd_nsec: 8010615594 Mar 25 21:30:14 fedora kernel: ACPI: \_SB_.PEP_: Successfully transitioned to state lps0 ms exit Mar 25 21:30:14 fedora kernel: ACPI: \_SB_.PEP_: Successfully transitioned to state screen on Mar 25 21:30:14 fedora kernel: ACPI: EC: interrupt unblocked Mar 25 21:30:14 fedora kernel: PM: noirq resume of devices complete after 64.225 msecs Mar 25 21:30:14 fedora kernel: GPIO 0 is active: 0x3805780f Mar 25 21:30:14 fedora kernel: GPIO 8 is active: 0x18147b00 ``` Conversely here's a snippet of logs from 7ee988770326fca440472200c3eb58935fe712f6~1: ``` Mar 25 21:20:34 fedora kernel: PM: suspend-to-idle Mar 25 21:20:34 fedora kernel: ACPI: EC: ACPI EC GPE status set Mar 25 21:20:34 fedora kernel: ACPI: PM: Rearming ACPI SCI for wakeup Mar 25 21:20:34 fedora kernel: amd_pmc: SMU idlemask s0i3: 0x8fff9eb5 Mar 25 21:20:34 fedora kernel: Timekeeping suspended for 1.835 seconds Mar 25 21:20:34 fedora kernel: PM: Triggering wakeup from IRQ 9 Mar 25 21:20:34 fedora kernel: PM: Triggering wakeup from IRQ 7 Mar 25 21:20:34 fedora kernel: ACPI: EC: ACPI EC GPE status set Mar 25 21:20:34 fedora kernel: ACPI: EC: ACPI EC GPE dispatched Mar 25 21:20:34 fedora kernel: ACPI: EC: ACPI EC work flushed Mar 25 21:20:34 fedora kernel: ACPI: PM: Wakeup after ACPI Notify sync Mar 25 21:20:34 fedora kernel: PM: resume from suspend-to-idle Mar 25 21:20:34 fedora kernel: PM: Triggering wakeup from IRQ 0 Mar 25 21:20:34 fedora kernel: ACPI: \_SB_.PEP_: Successfully transitioned to state lps0 exit Mar 25 21:20:34 fedora kernel: ACPI: \_SB_.PEP_: Successfully transitioned to state lps0 ms exit Mar 25 21:20:34 fedora kernel: ACPI: \_SB_.PEP_: Successfully transitioned to state screen on Mar 25 21:20:34 fedora kernel: ACPI: EC: interrupt unblocked Mar 25 21:20:34 fedora kernel: PM: noirq resume of devices complete after 64.161 msecs Mar 25 21:20:34 fedora kernel: GPIO 0 is active: 0x3805780f ```
Created attachment 306042 [details] revert against 6.9-rc1 Reverting the commit I posted above against 6.9-rc1 fixes it for me. Can you see if that works for you too?
Thanks for the Bug report! I have access to a similar machine where this behavior is reproducible. So I should be able to investigate it the next days. If I have an update, I'll come back to you. Thanks
Created attachment 306077 [details] Patch which adds trace printks to locate the timeout
Created attachment 306078 [details] script to enable all required trace options
Sorry for the late reply - Easter vacation... I was able to reproduce it. But in contrast to the above, the laptop always returned from suspend after quite a time (~50 sec). This behavior was seen when pressing the power button and also using a wake timer. During the tests, I also had two (accidental) working resume from suspend when programming a wake timer. I was also able to track it down where this timeout occurs. At the moment if have no access to the machine which I used last week, so I would be happy to get some help to investigate it further: - Could you please wait some more time to see if your laptop resumes from suspend as well after some time? - Could you please apply the patch which adds trace printks to locate the timeout and run a script (which enables trace points and function tracers) before you send the system into suspend? When it comes back from suspend (doesn't matter whether it is a wakeup timer or pressing the butting), please cat the trace into a file and share it with me.
I've applied your patch and turned on tracing before suspend. Here is the trace from /sys/kernel/trace captured on a Framework 13 running 6.9-rc2. https://drive.google.com/file/d/1jb6_lTyUPTCl_1ii_fQWOzxFHA_p3U_h/view?usp=sharing It was too big to put on a gist or attach here. I'd like to point out a few other observations. 1. I don't believe the notification made it over to the EC to exit suspend because the LED is still blinking and the keyboard doesn't work properly. I had to get this stuff from SSH. 2. Here is the kernel log matching the wakeup this trace was captured after. [ 67.372518] PM: suspend-to-idle [ 67.393190] amd_pmc: SMU idlemask s0i3: 0x3ffbbebd [ 69.680668] Timekeeping suspended for 0.881 seconds [ 69.680715] PM: Triggering wakeup from IRQ 1 [ 69.680721] PM: Triggering wakeup from IRQ 9 [ 69.680840] ACPI: EC: ACPI EC GPE status set [ 69.680907] ACPI: EC: ACPI EC GPE dispatched [ 72.008808] PM: Triggering wakeup from IRQ 0 [ 102.488319] ACPI Error: Aborting method \_SB.PCI0.LPC0.EC0.SIQ1 due to previous error (AE_AML_LOOP_TIMEOUT) (20230628/psparse-529) [ 102.488348] ACPI Error: Aborting method \_SB.PCI0.LPC0.EC0._Q0D due to previous error (AE_AML_LOOP_TIMEOUT) (20230628/psparse-529) [ 102.488450] ACPI: EC: ACPI EC work flushed [ 102.488463] ACPI: PM: Wakeup after ACPI Notify sync [ 102.488465] PM: resume from suspend-to-idle [ 102.490133] ACPI: \_SB_.PEP_: Successfully transitioned to state lps0 exit [ 134.487330] ACPI Error: Aborting method \_SB.PCI0.LPC0.EC0.SIQ1 due to previous error (AE_AML_LOOP_TIMEOUT) (20230628/psparse-529) [ 134.487360] ACPI Error: Aborting method \_SB.PEP._DSM due to previous error (AE_AML_LOOP_TIMEOUT) (20230628/psparse-529) [ 134.487395] ACPI: \_SB_.PEP_: failed to evaluate _DSM 560de011-64ce-ce47-837b-1f898f9aa461 (0x3021) [ 134.487403] ACPI: \_SB_.PEP_: Failed to transitioned to state lps0 ms exit [ 134.488319] ACPI: \_SB_.PEP_: Successfully transitioned to state screen on [ 134.488645] ACPI: EC: interrupt unblocked [ 134.489049] clocksource: Long readout interval, skipping watchdog check: cs_nsec: 64629783981 wd_nsec: 64629428453 [ 134.701727] PM: noirq resume of devices complete after 213.402 msecs [ 134.701790] GPIO 8 is active: 0x18147b00 [ 134.701805] PM: Triggering wakeup from IRQ 0 [ 134.776787] PM: early resume of devices complete after 2.452 msecs [ 134.777893] GPIO 8 is active: 0x18141b00 [ 134.779407] GPIO 8 is active: 0x18141b00
Github is truncating, here is a second captured trace. https://drive.google.com/file/d/1LnfWJ6FaKtZ_xu6kDDttTra6LbBvKCQa/view?usp=sharing
Created attachment 306085 [details] Fix - please try Thanks for the trace. Can you please test this patch (should work on top of linus tree) and see, whether suspend/resume works again? I will write a detailed explanation for the 'why' later on...
Thank you for the Patch! Tested it on Framework13 AMD and suspend / resume is now working as it should...
Works for me too on F13 AMD, BIOS 3.05. Thanks! Tested-by: Mario Limonciello <mario.limonciello@amd.com>
Here you can find the patch (with detailed explanation): https://lore.kernel.org/r/20240405083410.4896-1-anna-maria@linutronix.de/
Hey, sorry for the delay. I tested the patch against 6.9 rc3, The laptop wakes up but the screen remains Frozen and I need to hard reboot.
I didn't test against rc3, I tested against rc2. Can you see if it works on rc2 for you? If so there might be a separate regression in rc3.
(In reply to David Markey from comment #17) > Hey, sorry for the delay. I tested the patch against 6.9 rc3, The laptop > wakes up but the screen remains Frozen and I need to hard reboot. I can not confirm this behavior. Applied the patch against 6.9rc3 five days ago. I've suspended / resumed my machine at least 30 times since and had no problems with it. Suspend / Resume works as intended.
So to be sure I tried building rc2 and ruling out another patch related to a separate hibernation issue, but this new issue s2idle is still present. Suspend, you can wake up, but screen immediately freezes. I guess the only only thing is I'm on the latest BIOS from Framework which Mario mightn't be on.
I am on the latest BIOS as well. I suggest running the s2idle script and see if you get any other hints of what's going on if it dumps on resume before you freeze.
https://gist.github.com/dmarkey/ceef070ded9635828499fd1694e3b39f Here is an s2idle script report.
The only thing that stands out different to me in that report is mine doesn't have Intel WiFi and I have been testing in balanced mode.
Hmm tried quickly there with iwlmvm and iwlmvm unloaded, same result, and balanced mode selected. There are 2 FW 13 AMD models though, one with more cores, mine is the 12 thread model.. could there be a slight discrepancy there?
* and iwlwifi unloaded.
David, I'm wondering if it might be a good idea to rule out other problems on your machine by testing these two things: * retry if 7ee988770326fca440472200c3eb58935fe712f6^ (the parent of the culprit) really works * checkout 7ee988770326fca440472200c3eb58935fe712f6 and applying the fix on top of it to see if it fixes this problem.
Created attachment 306159 [details] 6.9-rc4 report showing it working I did double check 6.9-rc4 with no patches applied. Works for me now.
David, if you really still have some problem, it's a different one -- hence please open a new bug and afterwards drop a link here, as things otherwise quickly can become confusing.
Alright, I'll try to isolate this new problem and maybe raise a new bug. Thanks.
So this seems to be that my disk disappears on resume, opened a new bug here: https://bugzilla.kernel.org/show_bug.cgi?id=218736 Thanks for addressing this one, all.