Bug 208733
Summary: | s2idle freeze wakes from timekeeping_freeze at least 100 times with ec_no_wakeup enabled | ||
---|---|---|---|
Product: | ACPI | Reporter: | Todd Brandt (todd.e.brandt) |
Component: | EC | Assignee: | acpi_ec |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | CC: | todd.e.brandt |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 5.8.0-rc1 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 178231 | ||
Attachments: |
issue.def
issue.def issue.def issue.def example_of_infinite_loop_in_amber_lake |
Created attachment 290681 [details]
issue.def
Created attachment 290783 [details]
issue.def
Created attachment 290895 [details]
issue.def
I've found 5 systems that were/are affected by this behavior in our lab. Two of which are still affected in 5.8.0: the otcpl-aml-y (amber lake) and the otcpl-cml-u (comet lake). A key piece of the puzzle appears to be the acpi.ec_no_wakeup=1 kernel argument. This issue only occurs on these systems when this argument is used. Without it both machines pass with flying colors and achieve full S0iX coverage. So the simple solution is to not use acpi.ec_no_wakeup=1, but I'm still looking into why this is. The whole purpose of this flag is to make S2idle more stable and prevent unnecessary wakeups. In these two specific systems it appears to achieve the exact opposite effect for a significant part of the stress testing time. Upon further debug, I believe that cause of the error is a bug in the drivers/acpi/ec.c library. The ec_no_wakeup parameter has 3 major effects: 1) disable EC GPE before suspend, and re-enable EC GPE after resume 2) prevent EC GPE from being flagged as a wakeup source 3) abort the acpi_ec_dispatch_gpe call if an EC GPE is triggered When these infinite loops occur, something else re-enables the EC GPE just after ec_no_wakeup disables it for suspend. Thus the EC GPE is erroneously dispatched. However when the EC GPE reaches acpi_ec_dispatch_gpe, the ec_no_wakeup flag prevents it from being processed and completed. Thus it just keeps re-dispatching over and over in an infinite loop until resume clears everything. The simplest solution to this issue is this: @@ -2011,9 +2040,17 @@ bool acpi_ec_dispatch_gpe(void) if (acpi_any_gpe_status_set(first_ec->gpe)) return true; - if (ec_no_wakeup) - return false; - It allows erroneous EC GPEs to be completed, thus preventing the loop. However if does not solve the mystery of why they're being re-enabled at all. Created attachment 292147 [details]
example_of_infinite_loop_in_amber_lake
This appears fixed with the following: commit e0e9ce390d7bc6a705653d4a8aa4ea92c9a65e53 Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Date: Mon Oct 5 19:13:46 2020 +0200 ACPI: EC: PM: Drop ec_no_wakeup check from acpi_ec_dispatch_gpe() It turns out that in some cases there are EC events to flush in acpi_ec_dispatch_gpe() even though the ec_no_wakeup kernel parameter is set and the EC GPE is disabled while sleeping, so drop the ec_no_wakeup check that prevents those events from being processed from acpi_ec_dispatch_gpe(). Reported-by: Todd Brandt <todd.e.brandt@linux.intel.com> Cc: 5.4+ <stable@vger.kernel.org> # 5.4+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
Created attachment 290663 [details] issue.def When sleepgraph is run it is expected to enter s2idle once and leave s2idle once at the triggering of rtcwake. Sometimes wakeups occur which pull it out of timekeeping_freeze but still within the s2idle_enter function. This can affect power and performance. This issue monitors it for occurrences.