We have two Amber Lake machines in our lab that have started hanging on freeze in 6.12-rc1. I've bisected to this commit:

commit e86c8186d03a6ba018e881ed45f0962ad553e861 (refs/bisect/bad)
Author: Marek Maslanka <mmaslanka@google.com>
Date:   Mon Aug 12 18:42:00 2024 +0000

    platform/x86:intel/pmc: Enable the ACPI PM Timer to be turned off when suspended

    Allow to disable ACPI PM Timer on suspend and enable on resume. A
    disabled timer helps optimise power consumption when the system is
    suspended. On resume the timer is only reactivated if it was activated
    prior to suspend, so unless the ACPI PM timer is enabled in the BIOS,
    this won't change anything.

    The ACPI PM timer is used by Intel's iTCO/wdat_wdt watchdog to drive the
    watchdog, so it doesn't need to run during suspend.

    Signed-off-by: Marek Maslanka <mmaslanka@google.com>
    Reviewed-by: Hans de Goede <hdegoede@redhat.com>
    Link: https://lore.kernel.org/r/20240812184208.1080710-1-mmaslanka@google.com
    Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

I understand there may already be a potential fix in the pipeline, but I'd like to track this issue here.
One further note: this bug affects all three power modes (freeze, mem, and disk), so it's not limited to freeze. All three modes hang.
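For readers matching the mode names above to the S-state labels used later in this thread, here is a minimal Python sketch. The helper names are hypothetical and not part of any tool mentioned here. It also assumes that "mem" corresponds to S3, which holds only when /sys/power/mem_sleep selects "deep".

```python
# Strings the kernel accepts in /sys/power/state, mapped to the labels used
# in this thread. Assumption: "mem" means S3 only when /sys/power/mem_sleep
# selects "deep"; otherwise "mem" may resolve to s2idle.
STATE_LABELS = {"freeze": "S2idle", "mem": "S3", "disk": "S4"}


def supported_labels(power_state_text):
    """Translate the contents of /sys/power/state into thread labels."""
    names = power_state_text.split()
    return [STATE_LABELS[name] for name in names if name in STATE_LABELS]


def read_supported_labels(path="/sys/power/state"):
    """Read the sysfs file and report which of the three modes are offered."""
    with open(path) as f:
        return supported_labels(f.read())
```

Calling read_supported_labels() on a test machine shows which of the three modes that machine exposes before any suspend is attempted.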
Thank you for reporting this. There is a potential fix here: https://patchwork.kernel.org/project/platform-driver-x86/patch/20240919165349.235777-1-hdegoede@redhat.com/ which I plan to merge soon. It should at least fix the mem case, but I'm not sure whether it will help with the others. Maybe instead of skipping the ACPI PM Timer disable only when using S3 suspend, we need to skip it entirely on Kaby / Amber Lake?
I will do a build with your patch and try it. Also, please note that this issue causes a full hang: pressing a key immediately after entering S2idle, S3, or S4 doesn't bring the system back.
OK, I just tested with your patch. It fixed S3, but S2idle and S4 are still broken and still hang the system. The only difference is that when S2idle or S4 hangs, the system displays this on the screen:

[ OK ] Started Show Plymouth Boot Screen
[ OK ] Started Forward Password R#s to Plymouth Directory Watch.plymouth-start.service
[ OK ] Reached target Local Encrypted Volumes. systemd-journal-flush.service
[ OK ] Finished Flush Journal to Persistent Storange
[ OK ] Created slice Slice /system/systemd-backlight
Starting Load/Save Screen #f backlight:intel_backlight...
[ OK ] Finished Load/Save Screen # of backlight:intel_backlight. systemd-backlight@backlight:intel_backlight.service

Then it hangs and doesn't respond to any keypresses; it just sits there. So in short, the patch makes only two changes:

1) it fixes S3
2) it leaves S2idle and S4 broken, but displays a weird status on the screen on hang

So this is clearly affecting more than just S3. That timer has some purpose on Amber Lake, and something breaks when it is shut down.
And yes, if disabling the timer shutdown completely for Kaby Lake and Amber Lake is what it takes, that's fine. These are older systems. Perhaps in the future we can figure out what the issue is and re-enable it for these two platforms.
Thank you for testing my original patch. I have now submitted a patch which skips disabling the ACPI PM timer entirely on any system with a Sunrise or Union Point PCH: https://lore.kernel.org/platform-driver-x86/20241003202614.17181-1-hdegoede@redhat.com/ Please give this a test and let me know if it fixes things. Once I have confirmation that this patch works better, I'll send it on its way to Linus.
Building it now. I'll have it tested in an hour or so, thanks.
OK, I did a full stress run of S2idle, S3, and S4, and all three work just fine on both our AML machines. Looks like it works. Thanks. Go ahead and add me as Tested-by.
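For anyone who wants to repeat a similar check, here is a minimal Python sketch of a suspend/resume cycle loop. It is not the procedure used for the test run above, which the thread does not describe. It assumes the util-linux rtcwake tool is installed, that the script runs as root, and that a 30-second RTC wake alarm is acceptable. The function names, cycle count, and timeout are all hypothetical.

```python
import subprocess

# Mode names as accepted by /sys/power/state and by "rtcwake -m".
MODES = ("freeze", "mem", "disk")


def build_cycle_commands(modes=MODES, cycles=3, wake_after_s=30):
    """Build one rtcwake command per mode per cycle (nothing is executed)."""
    return [
        ["rtcwake", "-m", mode, "-s", str(wake_after_s)]
        for _ in range(cycles)
        for mode in modes
    ]


def run_cycles(commands, dry_run=True):
    """Print the commands, or run them in order when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the loop if rtcwake itself reports an error.
            subprocess.run(cmd, check=True)
```

Calling run_cycles(build_cycle_commands()) only prints the commands. Passing dry_run=False actually suspends the machine. A hang like the one reported here cannot be caught by the script itself: it shows up as the machine never coming back after the wake alarm.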
Thank you for testing. The fix is part of this fixes pull-request, which I just sent to Linus: https://lore.kernel.org/platform-driver-x86/280a792b-ec54-419d-8cca-17b020a38d3f@redhat.com/