Bug 216516

Summary: s2ram freezes screen (Ryzen-5650U incl. Radeon GPU)
Product: Drivers Reporter: kolAflash (kolAflash)
Component: Platform_x86Assignee: drivers_platform_x86 (drivers_platform_x86)
Status: RESOLVED DOCUMENTED    
Severity: normal CC: mario.limonciello
Priority: P1    
Hardware: AMD   
OS: Linux   
Kernel Version: 6.0-rc6 Subsystem:
Regression: No Bisected commit-id:
Attachments: kernel log: v5.18-rc7 and v5.18-rc7-1-g7123d39dc24d
kernel log: v6.0-rc6 and v6.0-rc6 with 7123d39dc reverted
kernel log for s2idle: v6.0-rc6 and v6.0-rc6 with 7123d39dc reverted
kernel log for s2idle: v6.0-rc6 with CONFIG_AMD_PMC enabled
kernel log for s2idle: v6.0-rc6 with debug options
kernel log for s2idle: v6.0-rc7 with bug 216516 comment #14 patches
kernel log for s2idle: v6.0-rc7 with bug 216516 comment #14 patches - 23 minutes of s2idle
kernel log for s2idle: v6.0-rc7 with bug 216516 comment #25 patches
kernel log for s2idle: v6.0-rc7 with bug 216516 comment #25 patches - 30 minutes of s2idle
kernel log for s2idle with ec_no_wakeup: v6.0-rc7 with bug 216516 comment #25 patches
HP EliteBook 845 G8: dmidecode and lshw
kernel log for s2both with s2idle (hybrid sleep): v6.0

Description kolAflash 2022-09-23 02:05:01 UTC
Beginning with this commit (regression) s2ram freezes the screen and I have to restart the XServer.
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=7123d39dc24dcd21ff23d75f46f926b15269b9da
git describe: 5.18-rc7-1-g7123d39dc24d
s2ram => systemctl start suspend.target

Before v5.18-rc7 the system sometimes stays awake instead of entering S3 sleep. But in that case the system keeps running stable and the screen doesn't freeze. And on the second try the system then successfully enters S3. You can see this behaviour in the attached linux-v5.18-rc7.txt for 5.18.0-rc7-v5.18-rc7.
This "stays awake" behaviour is probably a bug on it's own.
(I may may open another bug report when this one is solved)

Later on in the attached linux-v5.18-rc7.txt you'll see how 5.18.0-rc7-v5.18-rc7-1-g7123d39dc24d behaves when the screen is getting frozen.


Reverting commit 7123d39dc on v5.19.10 or v6.0-rc7 makes the problem disappear.


= System =
Model: HP EliteBook 845 G8 (notebook)
CPU+GPU: Ryzen 5650U incl. Radeon GPU
OS: openSUSE-15.4

The HP EliteBook 845 G8 uses s0ix/s2idle by default. But I changed the ACPI BIOS settings to enable classic S3 s2ram.
Comment 1 kolAflash 2022-09-23 02:15:05 UTC
Created attachment 301849 [details]
kernel log: v5.18-rc7 and v5.18-rc7-1-g7123d39dc24d
Comment 2 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-09-23 07:23:50 UTC
Mario, what's your opinion on this? 

@kolAflash, FWIW, this in not my area of expertise, but this one AFAICS is tricky, as 7123d39dc24d is a fix for a regression; one that iirc bothered a few people. So we can't simply revert it, as that would cause a regressions for other people. And I wonder if the real problem here is the firmware, maybe its s2ram codepaths are the root of the problem here.
Comment 3 kolAflash 2022-09-23 09:16:37 UTC
@Thorsten
Sure I understand that simply reverting is not an option.
Is there anything I can do to track this down further?
Maybe enabling some debug log options to get more informative output?

Or is there a kernel/driver option I can set so workaround this behavior, so I don't have to patch my kernel?


I'm using the firmware (/lib/firmware/amdgpu/) as provided by the openSUSE Linux distro.
https://download.opensuse.org/update/leap/15.4/sle/noarch/kernel-firmware-amdgpu-20220509-150400.4.8.1.noarch.rpm
Comment 4 Mario Limonciello (AMD) 2022-09-23 12:13:03 UTC
> This "stays awake" behaviour is probably a bug on it's own.
> (I may may open another bug report when this one is solved)

This is actually the root of the problem I suspect. This might actually be fixed by b8c824a869f220c6b46df724f85794349bafbf23.
 
> Mario, what's your opinion on this? 

So to further complicate things 7123d39dc24d went back to stable and is in 5.15.y and such.

I think we need to better understand the failure mode and actions associated.  The workaround in daf8de0874ab5b specifically helped with "aborted suspends".  7123d39dc24d fixed the behavior that regression that happened on more systems as a result of that.

> Is there anything I can do to track this down further?

Here's a few things that would at least help me understand the situation:
1) Does this still exist in a fully unpatched 6.0-rc6? What's the behavior?
2) What bisect points did you use to conclude that this problem on that particular commit?  By chance do you have the bisect log?
3) Why don't you just use s2idle instead in BIOS setup?  It's very mature on a modern kernel. Most systems with AMD APUs shipped from OEMs don't offer a toggle even, which is why I'm surprised you have one.

BTW - I wasn't aware that they offered a BIOS setting for s2ram on 845.  Are you patching the BIOS somehow or just changing a setting?  If you change it back to s2idle does all this go away?
Comment 5 kolAflash 2022-09-23 15:46:11 UTC
Created attachment 301854 [details]
kernel log: v6.0-rc6 and v6.0-rc6 with 7123d39dc reverted

@Mario


(In reply to Mario Limonciello (AMD) from comment #4)
> [...]
> 1) Does this still exist in a fully unpatched 6.0-rc6? What's the behavior?

Yes.
That's why I've set the bugs kernel version to 6.0-rc6 ;-)

I've attached kernel logs for v6.0-rc6 and for v6.0-rc6 with 7123d39dc reverted.
Reverting 7123d39dc on v6.0-rc6 makes the bug disappear.



> 2) What bisect points did you use to conclude that this problem on that
> particular commit?  By chance do you have the bisect log?

Sorry, no bisect log saved.
But I've explicitly tested 7123d39dc and it's predecessor 7123d39dc^, which is v5.18-rc7.
And 7123d39dc is definitely broken and v5.18-rc7 is working fine.



> 3) Why don't you just use s2idle instead in BIOS setup?  It's very mature on
> a modern kernel. Most systems with AMD APUs shipped from OEMs don't offer a
> toggle even, which is why I'm surprised you have one.

Tested s2idle with the openSUSE-15.4 stock kernel v5.14 and it consumed MASSIVELY more power than S3.
I may give s2idle it another try with linux-6.0.

QUESTION:
Is "echo s2idle > /sys/power/mem_sleep" equivalent to enabling s2idle via ACPI?
So I wouldn't need to reboot to change ACPI for testing s2idle.

I also thought about using s2disk. But my employer requires me to use a very long LUKS password. So I prefer not to enter the LUKS password every time I wake my notebook.



> BTW - I wasn't aware that they offered a BIOS setting for s2ram on 845.  Are
> you patching the BIOS somehow or just changing a setting?  If you change it
> back to s2idle does all this go away?

Actually S3 is very well hidden. I don't know a way to enable it directly in the BIOS. But it's in the ACPI tables.

cat /sys/firmware/acpi/tables/DSDT > dsdt.dat
iasl -d dsdt.dat
==========
[...]
    If (!MSCE)
    {
        Name (_S3, Package (0x04)  // _S3_: S3 System State
        {
            0x03, 
            0x03, 
            Zero, 
            Zero
        })
    }
[...]
==========
I changed that "if" statement, recompiled and put the ACPI table into the initrd for the kernel to load. 
See here for details:
https://www.reddit.com/r/AMDLaptops/comments/lg0knl/comment/ikpwpb8/

Side note:
If you have an idea how to set that MSCE variable to false without recompiling the kernel, I'd be quite happy :-)
Comment 6 Mario Limonciello (AMD) 2022-09-23 15:52:31 UTC
> Tested s2idle with the openSUSE-15.4 stock kernel v5.14 and it consumed
> MASSIVELY more power than S3.

TONS and TONS of fixes have landed for s2idle since then.  5.15.y is the first kernel that you should be confident to use it.

Please re-test with s2idle using 6.0-rc6.  If that doesn't work let's fix it.

> Is "echo s2idle > /sys/power/mem_sleep" equivalent to enabling s2idle via
> ACPI?
So I wouldn't need to reboot to change ACPI for testing s2idle.

No, on AMD systems there are Pre-OS changes that occur when you use modern standby/s2idle.  If you use s2idle when the firmware is configured for S3 you will never get into the proper power state.


> Actually S3 is very well hidden. I don't know a way to enable it directly in
> the BIOS. But it's in the ACPI tables.

As you're patching the firmware this is not a regression.  It takes more to properly enable S3 in Pre-OS code than changing a variable evaluated by the Linux kernel's ACPI interpreter..

Thorsten - this is not a regression, patching the BIOS like this is completely off the beating path.  We can't possibly support S3 in Linux if the OEM doesn't offer it in their BIOS.

kolAflash - Let's focus on fixing s2idle for your system.  With 6.0-rc6 and an "unpatched" BIOS let's see if you have any remaining problems with it.
Comment 7 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-09-23 16:07:00 UTC
(In reply to Mario Limonciello (AMD) from comment #6)
>
> Thorsten - this is not a regression, 

don't worry, I already had feared something like this might be the case; I for a moment had even considered asking about the S3 support status myself in my initial comment, but then settled on "stay out of this, Mario will known better what to do" :-D
Comment 8 kolAflash 2022-09-23 18:33:30 UTC
Created attachment 301855 [details]
kernel log for s2idle: v6.0-rc6 and v6.0-rc6 with 7123d39dc reverted

@Mario

Sadly I've to report, that the problem stays the same with s2idle using v6.0-rc6.
And reverting 7123d39dc also makes the bug disappear when using s2idle.
Find the logs attached.

As you said, I've reverted my ACPI modifications and /sys/power/mem_sleep now only contains s2idle.
Comment 9 Mario Limonciello (AMD) 2022-09-23 18:41:50 UTC
You're missing CONFIG_AMD_PMC.

/var/log/messages:2022-09-23T20:19:35.187256+02:00 myhost kernel: [   85.619055][ T3150] amdgpu 0000:05:00.0: amdgpu: Power consumption will be higher as the kernel has not been compiled with CONFIG_AMD_PMC.
Comment 10 kolAflash 2022-09-25 09:29:40 UTC
Created attachment 301870 [details]
kernel log for s2idle: v6.0-rc6 with CONFIG_AMD_PMC enabled

(In reply to Mario Limonciello (AMD) from comment #9)
> You're missing CONFIG_AMD_PMC.

Compiled with CONFIG_AMD_PMC.

Going to s2idle two times.
(first time just for a few seconds)


The good news:
With CONFIG_AMD_PMC in s2idle the screen doesn't freeze.

The bad news:
After less than 8 hours in s2idle the battery went from 60 % to 0 % !!!
That's why there's no second wakeup.
(system was down because battery was drained)

With S3 the battery will loose less than 5 % charge in the same time.
And the notebook is brand new. So there should be no battery degradation.


... just did another s2idle test and battery went from 67 % to 58 % in one hour.



(In reply to Mario Limonciello (AMD) from comment #6)
> [...]
> As you're patching the firmware this is not a regression.  It takes more to
> properly enable S3 in Pre-OS code than changing a variable evaluated by the
> Linux kernel's ACPI interpreter..
> 
> Thorsten - this is not a regression, patching the BIOS like this is
> completely off the beating path.  We can't possibly support S3 in Linux if
> the OEM doesn't offer it in their BIOS.

I think you exaggerate a little.
I'm just enabling existing OEM code!

Also I wouldn't say this is patching the firmware, because it's just the ACPI data, not the BIOS software.
Just changing the existing "if" statement.
See: comment #5
I'm pretty sure there's a way to enable this OEM code without recompiling the ACPI data. Probably by setting the MSCE variable with a hidden BIOS option or something else I haven't found yet.
Comment 11 kolAflash 2022-09-25 09:50:00 UTC
P.S.

I think using S3 is somehow intended by the OEM for the HP EliteBook 845 G8. I consider the S3 option just being hidden. You just need to know how to change that MSCE variable without changing the ACPI data.

In general I'd say that s2idle is absolutely fine. s2idle just should be more energy efficient.
S3 is using less than 1 % battery per hour. So if there's a way to get s2idle down to 2 % per hour that would be a good solution.

Until linux-5.18-rc7 S3 is working very well. So I thought this would be the best solution for an energy efficient standby mode on this hardware.
Comment 12 Mario Limonciello (AMD) 2022-09-25 23:02:42 UTC
> Compiled with CONFIG_AMD_PMC.
> Going to s2idle two times.
> (first time just for a few seconds)

It looks better to me now.

> After less than 8 hours in s2idle the battery went from 60 % to 0 % !!!
> That's why there's no second wakeup.
> (system was down because battery was drained)

I suspect one of two things happened:
1) A spurious event actually woke up the SOC the second time but it never re-entered.
2) You entered back to back and didn't get into the deepest state due to a timing issue.

I know of a potential patch I can offer you for this, but we need to confirm the root cause before I write it up.

> With S3 the battery will loose less than 5 % charge in the same time.
> And the notebook is brand new. So there should be no battery degradation.
> ... just did another s2idle test and battery went from 67 % to 58 % in one
> hour.

It's up to the OEM's design how quickly it will lose battery over Modern Standby/s2idle, it has a lot of factors involved.  I'll describe some additional debugging tactics later in this response.

> I think you exaggerate a little.
> I'm just enabling existing OEM code!

> Also I wouldn't say this is patching the firmware, because it's just the ACPI 
> data, not the BIOS software.
> Just changing the existing "if" statement.
> See: comment #5
> I'm pretty sure there's a way to enable this OEM code without recompiling the 
> ACPI data. Probably by setting the MSCE variable with a hidden BIOS option or
> something else I haven't found yet.

What you're seeing in there is a remnant of AGESA reference code.  I see the same thing in AMD's reference designs and in other OEM designs.

There is definitely OEM effort involved with enabling S3, and like I said there are lots of other firmware components involved that are invisible to the OS. 

The other thing I want to point out - if they use Modern Standby in Windows, this is the path that has gotten lots of validation.  A non-validated path is likely to have unique bugs.

> s2idle just should be more energy efficient.
> S3 is using less than 1 % battery per hour. So if there's a way to get s2idle 
> down to 2 % per hour that would be a good solution.

For most mobile designs the power consumption is better for s2idle than S3.
I suspect you're hitting a bug with wakeups, or s2idle re-entries.

For the next step in debugging, please do all of the following:
0) 6.0-rc6 kernel
1) Add to your kernel command line:
acpi.dyndbg='file drivers/acpi/x86/s2idle.c +p' amd_pmc.dyndbg='+p'
2) set /sys/power/pm_print_times to 1.
3) set /sys/power/pm_debug_messages to 1 

With all those debugging things put in place, run a sample where you put it to sleep for a short period of time (say 10 minutes), and then wakeup and share the log.  Please note how you woke up the system.  Hopefully based upon that log it will be clearer what is causing your increased power consumption.
Comment 13 kolAflash 2022-09-26 18:33:14 UTC
Created attachment 301872 [details]
kernel log for s2idle: v6.0-rc6 with debug options

@Mario

(In reply to Mario Limonciello (AMD) from comment #12)
> [...]
> For the next step in debugging, please do all of the following:
> 0) 6.0-rc6 kernel
> 1) Add to your kernel command line:
> acpi.dyndbg='file drivers/acpi/x86/s2idle.c +p' amd_pmc.dyndbg='+p'
> 2) set /sys/power/pm_print_times to 1.
> 3) set /sys/power/pm_debug_messages to 1
> [...]

Thanks!

Here's the requested log.


I woke the notebook by pressing the power button.

Before standby I'm always doing:
echo disabled > /sys/devices/platform/i8042/serio0/power/wakeup
echo disabled > /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0D:00/power/wakeup
echo XHC1 > /proc/acpi/wakeup
Else the notebook wakes on every keyboard, mouse, lid open and even lid close action.


Can you tell if the power consumption is completely up to the kernel?
Or should I have a look at the userspace too?
Comment 14 Mario Limonciello (AMD) 2022-09-26 18:49:33 UTC
> Here's the requested log.

OK let me pull out a few pieces and explain what they mean.

> 2022-09-26T19:01:47.104247+02:00 myhost kernel: [ 2218.148069][    T0]
> Timekeeping suspended for 341.998 seconds
> 2022-09-26T19:01:47.104247+02:00 myhost kernel: [ 2218.148164][ T5798] ACPI:
> EC: ACPI EC GPE status set
> 2022-09-26T19:01:47.104247+02:00 myhost kernel: [ 2218.148190][ T5798] ACPI:
> EC: ACPI EC GPE dispatched
> 2022-09-26T19:01:47.104248+02:00 myhost kernel: [ 2218.148541][ T5798] ACPI:
> EC: ACPI EC work flushed
> 2022-09-26T19:01:47.104248+02:00 myhost kernel: [ 2218.149018][ T5798] ACPI:
> PM: Rearming ACPI SCI for wakeup
> 2022-09-26T19:01:47.104249+02:00 myhost kernel: [ 2568.100347][    T0]
> Timekeeping suspended for 349.998 seconds
> 2022-09-26T19:01:47.104249+02:00 myhost kernel: [ 2568.100442][ T5798] ACPI:
> EC: ACPI EC GPE status set
> 2022-09-26T19:01:47.104250+02:00 myhost kernel: [ 2568.100468][ T5798] ACPI:
> EC: ACPI EC GPE dispatched
> 2022-09-26T19:01:47.104250+02:00 myhost kernel: [ 2568.100817][ T5798] ACPI:
> EC: ACPI EC work flushed
> 2022-09-26T19:01:47.104251+02:00 myhost kernel: [ 2568.101326][ T5798] ACPI:
> PM: Rearming ACPI SCI for wakeup

The SOC was asleep for some number of seconds, and then it got woken up from the EC GPE.  It may or may not have gone back to the deepest state during each of these attempts (it's unclear with the way logging works in the kernel).

During the duration of these attempts you can see this happened a LOT of times.

As an educated guess; it might have been that the EC was trying to update new battery life values or it can be illustrative of another bug.  The easiest way to determine the cause of these GPEs do the following.

1) Collect the output of 
# grep -v foo /sys/firmware/acpi/interrupts/*
2) Run your suspend
3) Capture again the output of 
# grep -v foo /sys/firmware/acpi/interrupts/*
4) Compare the two to see which incremented

> 2022-09-26T19:01:47.104625+02:00 myhost kernel: [ 4600.944320][ T5798]
> amd_pmc AMDI0005:00: Last suspend in deepest state for 16419799us

This means the SOC was only in the deepest state for a total of 16 seconds of your last s2idle run.  

> echo disabled > /sys/devices/platform/i8042/serio0/power/wakeup
> echo disabled > /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0D:00/power/wakeup
> echo XHC1 > /proc/acpi/wakeup

At least for experimentations sake, can you avoid making these changes in your next attempt?  Would like to confirm they weren't /part/ of this problem.

> Can you tell if the power consumption is completely up to the kernel?

From the logs you've shared, this is entirely influence by the kernel and firmware code for your case.  I would not be worried about userspace at all.

If the issue persists even without those wakeup sources then I'll look through your GPEs.

The next debugging step will be applying the following patches to 6.0-rc:

https://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86.git/commit/?h=review-hans&id=811d59fdf56a17c66742578fe8be8a6a841af448

https://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86.git/commit/?h=review-hans&id=e24faabf5f368c031ad50f0d915a01e1b591f536

https://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86.git/commit/?h=review-hans&id=90bec2855c566b6d07cc3e2bb47befb6266cf1ec

https://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86.git/commit/?h=review-hans&id=db55fb8a06f241e168a4f275970f2701d52040c6

Then add this patch:

diff --git a/drivers/platform/x86/amd/pmc.c b/drivers/platform/x86/amd/pmc.c
index e47e54b095af..d3a3e0df6e77 100644
--- a/drivers/platform/x86/amd/pmc.c
+++ b/drivers/platform/x86/amd/pmc.c
@@ -750,6 +750,8 @@ static void amd_pmc_s2idle_check(void)
        struct amd_pmc_dev *pdev = &pmc;
        int rc;

+       usleep_range(10000, 20000);
+
        rc = amd_pmc_write_stb(pdev, AMD_PMC_STB_S2IDLE_CHECK);
        if (rc)
                dev_err(pdev->dev, "error writing to STB: %d\n", rc);
Comment 15 kolAflash 2022-09-27 13:15:41 UTC
I booted the Kubuntu-22.10 beta which comes with linux-5.19 (ran the live image from USB storage). And the power consumption was at about 2 % per hour. The kernel is pretty much the same to what I'm using with openSUSE, so in some way the userspace must make a difference. At least in the way the userspace configures the kernel via sysfs.

Then I went back to my openSUSE-15.4 system, running the linux-6.0-rc6 kernel I compiled. And I disabled all sysfs wakeup triggers from userspace.
  find /sys/devices/ -type f -name wakeup -exec bash -c 'echo disabled > "{}"' \;
And indeed, power usage went down to 2 % per hour! :-)
So should I open a bug for s2idle in openSUSE instead!?




I also tried other Linux distros a kernel >= 5.19. But I couldn't get the s2idle power usage significantly below 2 % per hour. So S3 is still about 4 times better with around 0.5 % battery usage per hour.

Tomorrow I'm planning to do the debugging steps described by Mario in comment #14.
So maybe with that the consumption can be brought further down for s2idle.

But I also still the idea of using S3 (especially because of the very low power consumption).
---> So I'd still like to see a fix for the commit initially mentioned. Or maybe not a fix, but a kernel command line option to disable that behavior.
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=7123d39dc24dcd21ff23d75f46f926b15269b9da
Comment 16 Mario Limonciello (AMD) 2022-09-27 18:00:54 UTC
> And indeed, power usage went down to 2 % per hour! :-)
> So should I open a bug for s2idle in openSUSE instead!?

Eventually; yes.  But I think you need to further root cause why this is happening in openSUSE before opening a bug asking them to change a policy or userspace software.  Narrow down which wakeup trigger is causing the spurious events, and confirm that in Ubuntu 22.10 that wakeup trigger is disabled.

> But I couldn't get the s2idle power usage significantly below 2 % per hour. 

When the power consumption is at this level in your test environment, can you please double check with amd_pmc.dyndbg enabled matches a high percentage of the time you were suspended? 

> Tomorrow I'm planning to do the debugging steps described by Mario in comment
> #14.
> So maybe with that the consumption can be brought further down for s2idle.

Yeah let's see if that series of patches changes anything.  Like I suggested on the otherwise unpatched configuration you should also correlate the number you get for time (in microseconds) you were in the deepest state against the amount of time you were in suspend.

> But I also still the idea of using S3 (especially because of the very low
> power consumption).
> So I'd still like to see a fix for the commit initially mentioned. 
> Or maybe not a fix, but a kernel command line option to disable that
> behavior.

s2idle should be less power consumption than S3 on AMD SoCs and also have a much faster suspend and wakeup process.
I suspect what is happening here is that HP's EC is issuing wakeups too frequently and causing a wasted power consumption.  Let's see the information from my other debugging steps and proposed patch if they help.

For your request I sent up https://patchwork.freedesktop.org/patch/504971/ for discussion.  You can feel free to leave a Tested-by: comment there if that works for you.
Comment 17 Alex Deucher 2022-09-27 18:05:03 UTC
(In reply to kolAflash from comment #15)
> 
> But I also still the idea of using S3 (especially because of the very low
> power consumption).
> ---> So I'd still like to see a fix for the commit initially mentioned. Or
> maybe not a fix, but a kernel command line option to disable that behavior.
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/
> ?id=7123d39dc24dcd21ff23d75f46f926b15269b9da

You should not need to reset the GPU for S3 if the device is properly powered down on suspend.  The reset on suspend was to workaround the case where suspend failed and we needed a to put the GPU into a clean state to resume without the GPU having been powered down.  I think the reason this helped you was because the platform was not actually getting properly powered down so the GPU was never powered off, thus on resume the GPU was in an unknown state.  SO you were not actually getting proper suspend in the first place.
Comment 18 Mario Limonciello (AMD) 2022-09-27 18:07:17 UTC
> I think the reason this helped you was because the platform was not actually
> getting properly powered down so the GPU was never powered off, thus on
> resume the GPU was in an unknown state

Which also re-affirms my point that the firmware wasn't properly enabled for S3 in the first place.
Comment 19 kolAflash 2022-09-28 15:55:58 UTC
Created attachment 301886 [details]
kernel log for s2idle: v6.0-rc7 with bug 216516 comment #14 patches

Mystery solved:
Guess what I'm always doing after running "systemctl start suspend.target".
Yes, I'm closing the notebook lid.

--> And closing the notebook lid is what's waking the CPU!

I did a few tests, closing the lid after random times. And "deepest state for" always matched that time.




If I don't close the lid, I also get 2 % battery usage per hour with openSUSE-15.4 and v6.0-rc7.

I have to do at least
  echo disabled > /sys/devices/platform/i8042/serio0/power/wakeup
Else the notebook completely wakes after a random time. (e.g. 20 minutes)
I guess something like a slight mouse movement then wakes it.

With i8042 wakeup disabled I can close the lid without the notebook waking up completely. I can see this, because the keyboard backlight stays dark when I slightly look below the lid. And also programs don't start running again and dmesg also agrees.
But it looks like nevertheless the CPU is waking from the "deepest state".




About my usage behavior:
I don't like my notebook to suspend automatically when closing the lid. So I always disable the lid close action in my desktop environment.
As a result, I always trigger s2ram manually and close the lid afterwards.




About the Ubuntu-22.10 beta and the "echo disabled" wakeup tests:
--> Forget about these!
I didn't closed the lid when testing Ubuntu-22.10 beta. And I just checked, that Ubuntu-22.10 beta also completely wakes when closing the lid, unless i8042 wakeup is disabled.
And I couldn't reproduce the good consumption values after running
  find /sys/devices/ -type f -name wakeup -exec bash -c 'echo disabled > "{}"' \;
when closing the lid. (see comment 15)
I probably just didn't closed the lid when doing that test for comment 15.




(In reply to Mario Limonciello (AMD) from comment #14)
> [...]
> 1) Collect the output of 
> # grep -v foo /sys/firmware/acpi/interrupts/*
> 2) Run your suspend
> 3) Capture again the output of 
> # grep -v foo /sys/firmware/acpi/interrupts/*
> 4) Compare the two to see which incremented
> [...]
> The next debugging step will be applying the following patches to 6.0-rc:
> [...]

I did this and applied the mentioned patches. (4 commits + usleep_range patch)
Find the result in the attached log.
I closed the lod 22 seconds after s2idle started. Which matches exactly the dmesg output:

  Last suspend in deepest state for 22133270us




OBJECTIVES:

1. Stay in deepest sleep state when closing the lid. (or immediately return to deepest sleep)

2. Identify when else the system might unintentionally leaves the deepest sleep state while the screen stays off.
E.g. certain buttons, plug in/out events (USB devices, USB-C docking station, AC power, HDMI, ...)
I'll try to think of some tests I can run for this.
You'll find the output of
  find /sys/devices/ -name wakeup -type f -print -exec cat '{}' \;
in the attachment.

3. Bring the s2idle power consumption further down below 2 % per hour.
Comment 20 kolAflash 2022-09-28 16:29:11 UTC
2.
I did some tests.


Opening the lid or pressing keyboard buttons (e.g. shift) is exactly the same as closing the lid. Also varying it's behavior on the state of
  /sys/devices/platform/i8042/serio0/power/wakeup


And unplugging the AC power also shows the same behavior.
If
  /sys/devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0003:00/power_supply/AC/power/wakeup
is set to enabled, the notebook fully wakes.
If it's set to disabled, the notebook just stops the "deepest state" of sleep.
(notebook can be fully woken by pressing the power button)

Note:
The EliteBook 845 G8 has two power sockets. A classical one and an USB-C port. But the wakeup behavior is the equal for both.
Comment 21 Mario Limonciello (AMD) 2022-09-28 16:36:27 UTC
> --> And closing the notebook lid is what's waking the CPU!

Glad it's clarified what is going on.

> I did this and applied the mentioned patches. (4 commits + usleep_range
> patch)
> I closed the lod 22 seconds after s2idle started. Which matches exactly the
> dmesg output:

OK good.

> 1. Stay in deepest sleep state when closing the lid. (or immediately return
> to deepest sleep)

When you close the lid we need to know whether the kernel leaves the suspend loop or not.  If it didn't leave the suspend loop then my hypothesis is the act of going into suspend and then closing the lid causes a Linux specific race condition that prevents going into the deepest state.  That's why I proposed by usleep_range patch.  It will only help with power consumption if the kernel is still in the s2idle loop.

> 3. Bring the s2idle power consumption further down below 2 % per hour.

That last usleep_range patch is what I'm most interested in to fix this.  If you run a longer suspend, I suspect it might help with going to sleep from an EC triggered event for battery life notification.
Comment 22 kolAflash 2022-09-28 16:57:33 UTC
(In reply to Mario Limonciello (AMD) from comment #21)
> > --> And closing the notebook lid is what's waking the CPU!
> [...]
> > 1. Stay in deepest sleep state when closing the lid. (or immediately return
> > to deepest sleep)
> 
> When you close the lid we need to know whether the kernel leaves the suspend
> loop or not.  If it didn't leave the suspend loop then my hypothesis is the
> act of going into suspend and then closing the lid causes a Linux specific
> race condition that prevents going into the deepest state.  That's why I
> proposed by usleep_range patch.  It will only help with power consumption if
> the kernel is still in the s2idle loop.

I don't fully understand what you want me to do now. Please clarify.

The logs I attached where made with a v6.0-rc7 kernel containing all patches you mentioned in comment 14. Including the usleep_range patch.
Does these logs help you?


> > 3. Bring the s2idle power consumption further down below 2 % per hour.
> 
> That last usleep_range patch is what I'm most interested in to fix this.  If
> you run a longer suspend, I suspect it might help with going to sleep from
> an EC triggered event for battery life notification.

So I should put the notebook to sleep for one hour, without closing the lid?

OK, I'll do
  sleep 10; systemctl start suspend.target
just before closing the lid.
(then put the notebook into my bag and go home from work)
(using the comment 14 patched kernel)
Comment 23 Mario Limonciello (AMD) 2022-09-28 17:06:30 UTC
All of these events are so intermingled, we need to decouple and characterize log events to understand which are issues we're hitting or not.

From what you've described there is at a minimum a firmware bug causing a spurious IRQ1.  I suspect this when paired with any other number of wakeup events is causing either a system wakeup or not getting into the deepest state.

So your "disabling keyboard wakeup" is probably a good idea until HP can fix that firmware bug.

I also hypothesize my patch will help battery life when there are these types of spurious events but I don't really know for sure.

So with the same debugging lines I suggested in comment #12 and my patches suggested we need you to issue a suspend and keep track of what you did when so we can correlate it to the events shown.

>   sleep 10; systemctl start suspend.target
> just before closing the lid.

That sounds good to me.  Then we'll know if the EC had spurious events that woke the system and whether it went back to deepest state and how long.
Comment 24 kolAflash 2022-09-28 23:45:57 UTC
Created attachment 301894 [details]
kernel log for s2idle: v6.0-rc7 with bug 216516 comment #14 patches - 23 minutes of s2idle

@Mario
Yes, it's definitely an intermingled mess...
If you have an idea how I can help to untangle this, please give me detailed instructions which tests I should run.

If you think this will help, we can have a (video) call about this. Just send me an email with date + time. (I live in UTC+2)
Just tell me which (video) call tool you like to use. I'm open for pretty much everything (Jitsi-Meet, Skype, Mumble, XMPP, Matrix, ...)


Something about the sysfs and procfs settings I'm doing:

This is needed to disable wakeup from USB events in S3 suspend.
  echo XHC1 > /proc/acpi/wakeup
Actually I wouldn't need to set this for s2idle.
In future I'll leave this out for s2idle tests.

This is needed to prevent the system from waking on keyboard events and on closing the lid.
  echo disabled > /sys/devices/platform/i8042/serio0/power/wakeup
And additionally this is needed to prevent the system from waking on opening the lid.
  echo disabled > /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0D:00/power/wakeup
(both, i8042 and LNXSYBUS:00/PNP0C0D:00, must be disabled to prevent wake on opening the lid)

Additionally, if i8042 and LNXSYBUS:00/PNP0C0D:00 are not disabled, the system tends to wake after some minutes from s2idle. Maybe something between 10 or 60 minutes.
I haven't figured out yet if this is related to just one or both settings.
So beside the lid open/close, keyboard and AC power events, there must be something else interrupting the suspend. Something which is totally without user interaction, because these wakes also appear if I leave the lid open and don't touch the notebook at all.






I rebooted the system and then put it into s2idle for a little more than 23 minutes (1400 seconds).
See these lines:
  Thu Sep 29 00:09:19 CEST 2022
  Thu Sep 29 00:33:12 CEST 2022

Everything, including all sysfs setting I made, is documented in the log.
And I didn't closed the lid this time. I didn't even touched the notebook during that 23 minutes.



These lines suggest, that the system didn't stay in "deepest state" of sleep the whole time.

[   92.340524] PM: suspend-to-idle
[   92.340549] ACPI: EC: ACPI EC GPE status set
[   92.340562] ACPI: PM: Rearming ACPI SCI for wakeup
[   92.364660] Timekeeping suspended for 759.694 seconds
[   92.364660] ACPI: EC: ACPI EC GPE status set
[   92.364680] ACPI: EC: ACPI EC GPE dispatched
[   92.364987] ACPI: EC: ACPI EC work flushed
[   92.364987] ACPI: PM: Rearming ACPI SCI for wakeup
[   92.383963] ACPI: EC: ACPI EC GPE status set
[   92.383981] ACPI: PM: Rearming ACPI SCI for wakeup
[   92.408334] Timekeeping suspended for 331.956 seconds
[   92.408334] ACPI: EC: ACPI EC GPE status set
[   92.408334] ACPI: EC: ACPI EC GPE dispatched
[   92.408340] ACPI: EC: ACPI EC work flushed
[   92.408340] ACPI: PM: Rearming ACPI SCI for wakeup
[   92.429810] Timekeeping suspended for 328.978 seconds
[   92.429810] ACPI: EC: ACPI EC GPE status set
[   92.429810] ACPI: EC: ACPI EC GPE dispatched
[   92.429815] ACPI: EC: ACPI EC work flushed
[   92.429815] ACPI: PM: Rearming ACPI SCI for wakeup
[   92.451340] Timekeeping suspended for 8.978 seconds
[...]
[   92.451340] amd_pmc AMDI0005:00: Last suspend in deepest state for 758373565us

To it looks pretty much like the deepest sleep was only for the first 758 seconds.
The remaining time of the altogether ~ 1400 seconds the system wasn't in the deepest state.

I guess that the deepest state has been interrupted by the remnant of what completely wakes the system if i8042 and LNXSYBUS:00/PNP0C0D:00 are not disabled (see above).



I set the BIOS to limit the battery to max. 80 %. This usually drastically increases the overall lifetime of lithium-ion batteries. That's what makes the difference between energy-full and energy-full-design in the log. 

These are the power consumption values:
  energy-full-design:  53.0145 Wh
  22.2222/(53.0145/100) = 41.9 %
  20.8824/(53.0145/100) = 39.4 %
So about 2.5 % of power (1.3398 Wh) was consumed.
If the system was in perfect s2idle, this should have been less than 1% for 23 minutes.
Bug 2.5 % pretty much matches about 700 seconds of "bad" s2idle after the initial 700 "good" s2idle seconds in deepest state.
Comment 25 Mario Limonciello (AMD) 2022-09-29 21:14:38 UTC
> So about 2.5 % of power (1.3398 Wh) was consumed.
> If the system was in perfect s2idle, this should have been less than 1% for
> 23 minutes.
> Bug 2.5 % pretty much matches about 700 seconds of "bad" s2idle after the
> initial 700 "good" s2idle seconds in deepest state

Yup so what we're seeing here is that basically after an event comes in from the EC waking up the SOC, it never goes back to sleep properly.  I suspect there is some underlying firmware issue leading to that behavior.

Can you please do the following:

1) add right after the usleep_range() you added before this:

amd_pmc_idlemask_read(pdev, pdev->dev, NULL);

2) Add https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?h=bleeding-edge&id=cb3e7d624c3ff34a300587929c82af7364cf5c09

That will get us some more debugging information at least when this happens.

> I guess that the deepest state has been interrupted by the remnant of what 
> completely wakes the system if i8042 and LNXSYBUS:00/PNP0C0D:00 are not
> disabled (see above).

I think you're right.  The underlying firmware issue is probably the same as https://bugzilla.kernel.org/show_bug.cgi?id=215770

As a W/A can you have a try with acpi.ec_no_wakeup=1 on your kernel command line?  That should stop EC from issuing any wakeups and triggering this other firmware bug.
Comment 26 kolAflash 2022-09-29 23:29:55 UTC
Created attachment 301896 [details]
kernel log for s2idle: v6.0-rc7 with bug 216516 comment #25 patches

@Mario
Thank you for all the great help!!!




(In reply to Mario Limonciello (AMD) from comment #25)
> [...]
> Can you please do the following:
> 
> 1) add right after the usleep_range() you added before this:
> 
> amd_pmc_idlemask_read(pdev, pdev->dev, NULL);
> 
> 2) Add
> https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/
> ?h=bleeding-edge&id=cb3e7d624c3ff34a300587929c82af7364cf5c09
> 
> That will get us some more debugging information at least when this happens.

Compiled with all patches from comment #14, 1) and 2).

Here are the first results.

1. s2idle was without sysfs settings. System was woken by closing the lid.

2. s2idle was with sysfs settings (see log). Directly after sleep I (un)plugged the power, opened/closed the lid and pressed keys on the keyboard. System was finally fully woken by pressing the power button.

3. s2idle was with sysfs settings (see log). System was going to sleep while the lid was closed. Lid was opened after about 18 seconds. System was fully woken by pressing the power button.

I'll now do another test for at least 30 minutes. So we can see the power consumption.




> [...]
> As a W/A can you have a try with acpi.ec_no_wakeup=1 on your kernel command
> line?  That should stop EC from issuing any wakeups and triggering this
> other firmware bug.

That workaround works very well! Thanks :-)

Didn't make any sysfs of procfs settings. Opening/closing the lid, pressing keyboard keys and disconnecting AC power didn't wake the system. And the "deepest state" log reported the full time being in that state, so it didn't return from "deepest state" early.
I didn't have time to do a long time test with the ec_no_wakeup workaround. Tested it for max. 90 seconds.
(workaround is not in the attached log)
Comment 27 Mario Limonciello (AMD) 2022-09-29 23:52:15 UTC
> Compiled with all patches from comment #14, 1) and 2).

OK, I also sent that new idlemask patch I suggested to the mailing list for the future to make debugging easier.
https://patchwork.kernel.org/project/platform-driver-x86/patch/20220929215042.745-1-mario.limonciello@amd.com/

I don't have any evidence that the usleep_range() change helped in any way, so I won't send that for now.

If you discover that is required for good battery life on your side let me know and I'll cleanup a more proper variation of it to send out.

> Here are the first results.

OK so the idle mask doesn't "change" from one time to another in any unexpected ways to me.

> [   90.048438] PM: Triggering wakeup from IRQ 9
> [   90.048801] PM: Triggering wakeup from IRQ 1

When you see two IRQs like this; it's a firmware bug as described in the other report I mentioned above.  HP will need to issue a new BIOS to fix this.

This is a laptop certified with Ubuntu (see https://ubuntu.com/certified/202206-30367).  You should contact HP support and ask them for a BIOS fix.

> [  332.466583] PM: Triggering wakeup from IRQ 9

When you're seeing these alone in the middle of the sleep it's because of the EC GPE firing to do $something.  In this case I believe it's the EC trying to notify a new battery level.  Due to the above bug I mentioned I think these together will "prevent" it from going back to the deepest state.

> That workaround works very well! Thanks :-)

Sure.  This is going to limit you quite a bit in terms of wakeup sources, but IMO I think it's a worthwhile trade off for this problem.  You should hopefully be getting much better power consumption now over s2idle.
Comment 28 kolAflash 2022-09-30 00:37:15 UTC
Created attachment 301897 [details]
kernel log for s2idle: v6.0-rc7 with bug 216516 comment #25 patches - 30 minutes of s2idle

(In reply to Mario Limonciello (AMD) from comment #27)
> [...]
> I don't have any evidence that the usleep_range() change helped in any way,
> so I won't send that for now.
> 
> If you discover that is required for good battery life on your side let me
> know and I'll cleanup a more proper variation of it to send out.

I'll try to run some tests.
Just to get this right:
Is the usleep_range() patch about helping with the buggy interrupts?

Or is is about further lowering the power consumption below 2 % per hour?
(2 % per hour is the best I got until now, when the deepest state was active all the time)




> > Here are the first results.
> 
> OK so the idle mask doesn't "change" from one time to another in any
> unexpected ways to me.
> 
> > [   90.048438] PM: Triggering wakeup from IRQ 9
> > [   90.048801] PM: Triggering wakeup from IRQ 1
> 
> When you see two IRQs like this; it's a firmware bug as described in the
> other report I mentioned above.  HP will need to issue a new BIOS to fix
> this.

All tests where made with BIOS version 01.10.00 Rev.A (released 2022-08-09 / build 2022-07-13)
This is the most recent BIOS version available.
https://support.hp.com/us-en/drivers/selfservice/swdetails/hp-elitebook-845-g8-notebook-pc/38492638/model/2100000127/swItemId/ob-294554-1?sku=5Z621EA
See "Revision history":
  Embedded Controller (EC), version 43.2D.00
  PSP Firmware, version 0.11.0.79
  SMU Firmware, version 64.61.0
  AMD GOP EFI Driver AMD GOP X64 Release Driver, version 2.16.0.17.10
  Cypress Power Delivery (PD) Firmware, version 0.7.0

I got no Windows installed. But I'm wondering how Windows is handling this firmware bug?




> This is a laptop certified with Ubuntu (see
> https://ubuntu.com/certified/202206-30367).  You should contact HP support
> and ask them for a BIOS fix.

Does "certified with Ubuntu" mean HP is officially supporting it?
Or does this mean, that just Ubuntu says it's working?

I can try posting the HP support for HP support forum or calling the HP business support hotline. But most of the it's quite hard to get real help from them.
https://h30434.www3.hp.com/t5/Business-Notebooks/bd-p/General
Wouldn't it be better, if you - as official AMD employee - report this to HP!?




> > [  332.466583] PM: Triggering wakeup from IRQ 9
> 
> When you're seeing these alone in the middle of the sleep it's because of
> the EC GPE firing to do $something.  In this case I believe it's the EC
> trying to notify a new battery level.  Due to the above bug I mentioned I
> think these together will "prevent" it from going back to the deepest state.

EC means "Embedded Controller"?
GPE means "General Purpose Event"?

Maybe I can more explicitly test if it's the battery level.
Removing the battery is possible. It's just a little more effort.
But instead I could fully charge the battery and put the notebook into s2idle with AC power connected. In that case the battery level shouldn't change. So no event should happen, if I DON'T I press the keyboard, (un)plug the ac power or open/close the lid.


I attached another log:

System was in s2idle for 30 minutes without interruption (deepest state the whole time). System was on battery power (no ac power connected). And comment #25 (including comment #14) patches where applied. I just opened and closed the lid 30 seconds before waking the system using the power button. So the system wasn't in deepest state of sleep.

A little more than 1 % of battery was consumed. And I'm wondering if this might contradicts your theory, about the EC trying to notify about a new battery level.




> > That workaround works very well! Thanks :-)
> 
> Sure.  This is going to limit you quite a bit in terms of wakeup sources,
> but IMO I think it's a worthwhile trade off for this problem.  You should
> hopefully be getting much better power consumption now over s2idle.

I'm a late riser person. And I also like my notebook to stay asleep, unless I press the power button ;-)

So I'll go to bed now. And I'll also put the notebook on battery (no ac power connected) and into s2idle with the ec_no_wakeup workaround enabled. So tomorrow morning I'll see how much power it has consumed using the ec_no_wakeup workaround :-)
Comment 29 Mario Limonciello (AMD) 2022-09-30 01:01:22 UTC
> Is the usleep_range() patch about helping with the buggy interrupts?
> Or is is about further lowering the power consumption below 2 % per hour?
> (2 % per hour is the best I got until now, when the deepest state was active
> all the time)

It's for trying to keep it in deepest state for more of the kernel's suspend cycle when it has been interrupted. If you're already in the deepest state for most of the suspend cycle it won't do anything.

>   SMU Firmware, version 64.61.0

This is the BIOS component with the bug.  The fix is in 64.66.0.

> I got no Windows installed. But I'm wondering how Windows is handling this
> firmware bug?

Windows does Modern Standby a little bit differently.

> Does "certified with Ubuntu" mean HP is officially supporting it?
> Or does this mean, that just Ubuntu says it's working?

I can't put words in their mouths on what it means for them.

> Wouldn't it be better, if you - as official AMD employee - report this to
> HP!?

I'm not their customer; you are.

> EC means "Embedded Controller"?
> GPE means "General Purpose Event"?

Yes

> A little more than 1 % of battery was consumed. And I'm wondering if this
> might contradicts your theory, about the EC trying to notify about a new
> battery level.

That amount of time in deep sleep matches how it should be behaving.  If that *wasn't* with acpi.ec_no_wakeup=1 then the usleep_range() change is helping and I should clean up and send a patch for it too.

But to your question, I don't have insight into the exact nature of what HP's EC does; so I hypothesize.  I only know the AMD side of it.

> So tomorrow morning I'll see how much power it has consumed using the
> ec_no_wakeup workaround :-)

OK.
Comment 30 kolAflash 2022-09-30 10:38:30 UTC
Created attachment 301904 [details]
kernel log for s2idle with ec_no_wakeup: v6.0-rc7 with bug 216516 comment #25 patches

@Mario
Thanks for answering all my questions!




Here's a new log. ec_no_wakeup=Y and nearly 8 hours of s2idle.
That's 27663 seconds of which 27659 where in the deepest state.

Battery went from 96 % to 80 %. (battery limited to 80 % total power)
That's pretty good compared to without ec_no_wakeup, when the firmware bug wakes the Linux Kernel. But it's still about 4 times worse than what S3 consumes.

I stumbled upon this commit for the "ThinkPad X1 Carbon 6th" notebook.
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=8195a655e5ce09550aff81b2573d9b015d520cb9
It's Intel based. But it seems to have a similar problem. So the commit adds a quirk which sets ec_no_wakeup=true on that notebook.
https://www.thinkwiki.org/wiki/Category:X1_Carbon_(6th_Gen)




(In reply to Mario Limonciello (AMD) from comment #27)
> [...]
> This is a laptop certified with Ubuntu (see
> https://ubuntu.com/certified/202206-30367).  You should contact HP support
> and ask them for a BIOS fix.

(In reply to Mario Limonciello (AMD) from comment #29)
> [...]
> >   SMU Firmware, version 64.61.0
> 
> This is the BIOS component with the bug.  The fix is in 64.66.0.

I also called the HP business support. But they couldn't help me much. They said "soon" there will be a new BIOS update. But they couldn't tell if it will contain a new SMU Firmware version.

@Mario
QUESTION:
If there will be a new BIOS version. Should I immediately update? Or should I keep the current BIOS version for the moment, so we can run a few more tests?
(I'm not sure if a BIOS downgrade would be possible)




> [...]
> (In reply to kolAflash from comment #28)
> > A little more than 1 % of battery was consumed. And I'm wondering if this
> > might contradicts your theory, about the EC trying to notify about a new
> > battery level.
> 
> That amount of time in deep sleep matches how it should be behaving.  If
> that *wasn't* with acpi.ec_no_wakeup=1 then the usleep_range() change is
> helping and I should clean up and send a patch for it too.

ec_no_wakeup was not set when I ran that test. But I think I saw the system staying in deep sleep before. Even for hours if I didn't opened/closed the lid, (un)plugged the power, ...
I just feel very uncertain about when this EC GPE background event fires. So let me run some further checks before committing that usleep_range() patch.




So I see mostly two things remaining now:

1.
Can the power consumption in s2idle be lowered further?
You (Mario) said "For most mobile designs the power consumption is better for s2idle than S3." So is there something the kernel can do do to bring the power consumption further down?
S3 is at around 0,5 % battery per hour, because of which I initially tried to use S3 instead of s2idle. And with ec_no_wakeup=Y I'm getting between 2 % and 3 % battery per hour in s2idle.
(without ec_no_wakeup=Y it was > 10 % per hour if the system left the deepest state of sleep)

2.
What to do if there's no BIOS update containing the new SMU Firmware?
At least for other Linux users with the EliteBook 845 G8, who don't know anything about this and don't update their BIOS.
(unfortunately the notebook doesn't support fwupd, so no automatic BIOS updates when running Linux)
Is ec_no_wakeup=Y the best solution? If yes, maybe add a quirk to the kernel.
Comment 31 Mario Limonciello (AMD) 2022-09-30 12:32:17 UTC
> Can the power consumption in s2idle be lowered further?

Once the amd-pmc driver is reporting nearly the whole sleep in the deepest state, I don't know anything else that can be done from the Linux side.

External things to Linux that come to mind:
- Anything USB plugged in over the sleep can stay powered and sip some battery.  Unplug these.  A good example here is a wireless USB dongle for a mouse or keyboard.
- Do you use any manageability features like AIM-T?  These can keep some parts of the SOC doing things.  If so; you can try to disable those.

> If there will be a new BIOS version. Should I immediately update? Or should I 
> keep the current BIOS version for the moment, so we can run a few more tests?

I don't see any other valuable tests. I would upgrade when it's available.

> Is ec_no_wakeup=Y the best solution? If yes, maybe add a quirk to the kernel.

If this is to be quirked by the kernel by default then it needs to be tied to this BIOS version, as hopefully this will be fixed in a future HP BIOS.
Comment 32 kolAflash 2022-09-30 15:50:06 UTC
Created attachment 301909 [details]
HP EliteBook 845 G8: dmidecode and lshw

(In reply to Mario Limonciello (AMD) from comment #31)
> > Can the power consumption in s2idle be lowered further?
> 
> Once the amd-pmc driver is reporting nearly the whole sleep in the deepest
> state, I don't know anything else that can be done from the Linux side.
> 
> External things to Linux that come to mind:
> - Anything USB plugged in over the sleep can stay powered and sip some
> battery.  Unplug these.  A good example here is a wireless USB dongle for a
> mouse or keyboard.
> - Do you use any manageability features like AIM-T?  These can keep some
> parts of the SOC doing things.  If so; you can try to disable those.

No external devices connected.


I'm currently *experimenting* with additional stuff.

This kernel command line options: pcie_aspm=force iommu=pt
(looks like iommu=pt may already be the default, but I guess adding it should have no disadvantage)

And also setting sysfs options before entering s2idle.
  powersupersave > /sys/module/pcie_aspm/parameters/policy
  find /sys/devices/ -type f -name wakeup -exec bash -c 'echo disabled > "{}"' \;




> > Is ec_no_wakeup=Y the best solution? If yes, maybe add a quirk to the
> kernel.
> 
> If this is to be quirked by the kernel by default then it needs to be tied
> to this BIOS version, as hopefully this will be fixed in a future HP BIOS.

I guess you'll need dmidecode or lshw for that. So I attached it.


Here you can see which BIOS versions contain which AMD SMU Firmware version.
https://support.hp.com/us-en/drivers/selfservice/swdetails/hp-elitebook-845-g8-notebook-pc/38492638/model/2100000127/swItemId/ob-294554-1?sku=5Z621EA
BIOS:
  01.10.00 Rev.A
  01.09.10 Rev.A
  01.09.00 Rev.A
SMU Firmware: 64.61.0

BIOS:
  01.08.20 Rev.A
  01.08.03 Rev.A
SMU Firmware: 64.60.0

BIOS:
  01.07.01 Rev.A
SMU Firmware: 64.48.0

Do you have internal information if all these SMU Firmware versions are affected by the bug? If yes, I'd say the quirk can be enabled for all of them. 


I'm wondering if there could be a way to dynamically detect broken EC GPE's. And such an GPE apears, automatically set
  /sys/module/acpi/parameters/ec_no_wakeup = Y
Not for myself, but for other users affected by this bug.

Because it seems that not just the EliteBook 845 G8 is affected by this. (see above for the ThinkPad Carbon X1 6th)
So an automatic detection may spare a lot of future work.


I already thought about doing this in userspace.
Systemd service:
==========
[Unit]
After=suspend.target hybrid-sleep.target suspend-then-hibernate.target
[Install]
WantedBy=suspend.target hybrid-sleep.target suspend-then-hibernate.target
==========
And the service could call a script, which checks if there are two IRQs in the dmesg like this:
  [   90.048438] PM: Triggering wakeup from IRQ 9
  [   90.048801] PM: Triggering wakeup from IRQ 1
And if yes, ec_no_wakeup=Y is being set. Or at least there's a message telling the user to enable ec_no_wakeup=Y.
If this would be in the kernel, that message could appear in dmesg.

But I'd guess a general detection wouldn't be that simple!?...
Comment 33 Mario Limonciello (AMD) 2022-10-03 17:21:53 UTC
> I guess you'll need dmidecode or lshw for that. So I attached it.

The problem with generically issuing a patch here is that some people may prefer to have high power consumption but those EC triggered wake sources working.

I don't want to make decisions on their behalf by quirking this.  You prefer the lower power consumption and don't care for those other sources so it makes sense to keep it in your kernel command line configuration.

> So an automatic detection may spare a lot of future work.

I don't believe there is a safe way to do it.  I've added various debugging mechanisms into the kernel related to this situation (that I've referenced in this bug and others).  We'll have to rely upon those to determine what's happening.
Comment 34 kolAflash 2022-10-12 17:39:07 UTC
Created attachment 302983 [details]
kernel log for s2both with s2idle (hybrid sleep): v6.0

@Mario

I've got another related problem.
(just tell me if I should open another bug for that)

The system doesn't correctly enter s2both with s2idle. (a.k.a. "hybrid sleep")
Instead the system seems to do simply a s2disk.
So the system simply shuts down completely. And resumes from swapfile at power on.

Correct behavior would be to write the memory to swapfile (like s2disk), but then enter s2idle.
(the idea behind s2both is to prevent data loss if the battery is getting drained while being in some suspend to memory)



What I do:
echo suspend > /sys/power/disk
echo disk > /sys/power/state

$ cat /sys/power/mem_sleep 
[s2idle]

Behavior is independent of the settings in these files:
/sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0D:00/power/wakeup
/sys/devices/platform/i8042/serio0/power/wakeup
/sys/devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0003:00/power_supply/AC/power/wakeup
/sys/module/acpi/parameters/ec_no_wakeup

Please see the attached kernel log.



I use s2both on a lot of systems, especially with AMD CPU+GPU. And it usually works flawlessly. But these systems all use S3 instead of s2idle. And I couldn't find if there's something extra to consider when using s2both with s2idle.
Comment 35 Mario Limonciello (AMD) 2022-10-12 17:53:36 UTC
I'm admittedly unfamiliar with "s2both", but this sounds like its own bug report.
Comment 36 Mario Limonciello (AMD) 2022-10-12 18:04:50 UTC
BTW AFAICT upstream is dead for "s2both".  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=954061.
Comment 37 kolAflash 2022-10-13 01:17:57 UTC
@Mario

I think you might interpreted too much in the term "s2both" I used.
s2both is just a handy name uswsusp uses.
But I don't use uswsusp. (I just like the name "s2both")

The same technique uswsusp calls s2both is also known as "Hybrid Sleep" by Systemd and as "Hybrid System Suspend" by the kernel documentation.
See here:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation/admin-guide/pm/sleep-states.rst?h=v6.0.1#n220



I opened a new bug for this Hybrid System Suspend problem.
Would you have a look at it?

Hybrid System Suspend broken HP EliteBook 845 G8 (a.k.a. Hybrid Sleep / s2both) (s2idle Notebook)
https://bugzilla.kernel.org/show_bug.cgi?id=216574

I still suspect that the problem is related to s2idle. Because Hybrid System Suspend usually works by writing the memory to the swap file (like hibernate to disk), and then entering "suspend to memory". So for "suspend to memory" so far "deep" S3 has been used. But with this new hardware I guess "s2idle" must be used instead.