Bug 219346

Summary: [BISECTED] Disable ACPI PM Timer breaks suspend on all Amber Lake machines
Product: Drivers
Reporter: Todd Brandt (todd.e.brandt)
Component: Platform_x86
Assignee: drivers_platform_x86 (drivers_platform_x86)
Status: RESOLVED CODE_FIX
Severity: high
CC: daniel.lezcano, jwrdegoede
Priority: P3
Hardware: All
OS: Linux
Kernel Version:
Subsystem:
Regression: Yes
Bisected commit-id: e86c8186d03a6ba018e881ed45f0962ad553e861
Bug Depends on:
Bug Blocks: 178231

Description Todd Brandt 2024-10-03 10:17:18 UTC
We have two Amber Lake machines in our lab that have started hanging on freeze in 6.12-rc1. I've bisected the hang to this commit:

commit e86c8186d03a6ba018e881ed45f0962ad553e861 (refs/bisect/bad)
Author: Marek Maslanka <mmaslanka@google.com>
Date:   Mon Aug 12 18:42:00 2024 +0000

    platform/x86:intel/pmc: Enable the ACPI PM Timer to be turned off when suspended

    Allow to disable ACPI PM Timer on suspend and enable on resume. A
    disabled timer helps optimise power consumption when the system is
    suspended. On resume the timer is only reactivated if it was activated
    prior to suspend, so unless the ACPI PM timer is enabled in the BIOS,
    this won't change anything.

    The ACPI PM timer is used by Intel's iTCO/wdat_wdt watchdog to drive the
    watchdog, so it doesn't need to run during suspend.

    Signed-off-by: Marek Maslanka <mmaslanka@google.com>
    Reviewed-by: Hans de Goede <hdegoede@redhat.com>
    Link: https://lore.kernel.org/r/20240812184208.1080710-1-mmaslanka@google.com
    Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

I understand there may already be a potential fix in the pipeline, but I'd like to track this issue here.
Comment 1 Todd Brandt 2024-10-03 10:47:20 UTC
Oh, one further note: this bug affects all three power modes (freeze, mem, and disk), so it's not limited to freeze. All three modes hang.
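
For readers following the thread: the mode names used here (freeze, mem, disk) are the keywords the kernel accepts in /sys/power/state, while later comments use S2idle, S3, and S4 for the same modes. A minimal sketch of that mapping follows; the helper names are hypothetical and not part of this report, and the code only inspects strings rather than suspending anything.

```python
# Hypothetical helper for illustration only; names are not from the bug report.
# Maps the sleep-mode names used in this thread to /sys/power/state keywords.
SLEEP_MODES = {
    "S2idle": "freeze",  # suspend-to-idle
    "S3": "mem",         # suspend-to-RAM (assumes /sys/power/mem_sleep selects "deep")
    "S4": "disk",        # hibernate
}


def state_keyword(mode: str) -> str:
    """Return the /sys/power/state keyword for a sleep-mode name."""
    return SLEEP_MODES[mode]


def supported_modes(state_file_contents: str) -> list[str]:
    """List the modes whose keyword appears in the given /sys/power/state text."""
    available = set(state_file_contents.split())
    return [mode for mode, keyword in SLEEP_MODES.items() if keyword in available]
```

On a real machine, the argument to `supported_modes` would be the text read from /sys/power/state; which keywords appear there depends on the platform and kernel configuration.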
Comment 2 Hans de Goede 2024-10-03 11:02:28 UTC
Thank you for reporting this. There is a potential fix here:

https://patchwork.kernel.org/project/platform-driver-x86/patch/20240919165349.235777-1-hdegoede@redhat.com/

I plan to merge it soon. It should at least fix the mem case, but I'm not sure if it will help with the others.

Maybe, instead of skipping the ACPI PM Timer disable only when using S3 suspend, we need to skip it entirely on Kaby Lake / Amber Lake?
Comment 3 Todd Brandt 2024-10-03 11:46:49 UTC
I will do a build with your patch and try it. Also, please note that this issue causes a full hang: pressing a key immediately after S2idle, S3, or S4 doesn't bring it back.
Comment 4 Todd Brandt 2024-10-03 12:50:24 UTC
OK, I just tested with your patch. It fixed S3, but S2idle and S4 are still broken and still hang the system. The only difference is that when S2idle or S4 hangs, the system displays this on the screen:

[ OK ] Started Show Plymouth Boot Screen
[ OK ] Started Forward Password R#s to Plymouth Directory Watch.plymouth-start.service
[ OK ] Reached target Local Encrypted Volumes.
systemd-journal-flush.service
[ OK ] Finished Flush Journal to Persistent Storage
[ OK ] Created slice Slice /system/systemd-backlight
       Starting Load/Save Screen #f backlight:intel_backlight...
[ OK ] Finished Load/Save Screen # of backlight:intel_backlight.
systemd-backlight@backlight:intel_backlight.service

And it hangs: it doesn't respond to any keypresses, it just sits there. So, in short, the only two changes are:
1) fixes S3
2) leaves S2idle and S4 broken, but displays a weird status on the screen on hang

So this is clearly affecting more than just S3; something on Amber Lake depends on that timer and breaks when the timer is shut down.
Comment 5 Todd Brandt 2024-10-03 13:11:40 UTC
And yeah, if just disabling this change completely for Kaby Lake and Amber Lake is what it takes, that's fine. These are older systems. Perhaps in the future we can figure out what the issue is and re-enable it for these two platforms.
Comment 6 Hans de Goede 2024-10-03 20:29:22 UTC
Thank you for testing my original patch.

I have now submitted a patch which skips disabling the ACPI PM timer entirely on any system with a Sunrise Point or Union Point PCH:

https://lore.kernel.org/platform-driver-x86/20241003202614.17181-1-hdegoede@redhat.com/

Please give this a test and let me know if it fixes things.

Once I have confirmation that this patch works better I'll send it on its way to Linus.
Comment 7 Todd Brandt 2024-10-03 22:47:42 UTC
Building it now. I'll have it tested in an hour or so, thanks.
Comment 8 Todd Brandt 2024-10-03 23:59:16 UTC
OK, I did a full stress run of S2idle, S3, and S4, and all three work just fine on both our AML machines. Looks like it works. Thanks. Go ahead and add me as Tested-by.
Comment 9 Hans de Goede 2024-10-06 11:09:10 UTC
Thank you for testing. The fix is part of this fixes pull request, which I just sent to Linus: https://lore.kernel.org/platform-driver-x86/280a792b-ec54-419d-8cca-17b020a38d3f@redhat.com/