Bug 208733 - s2idle freeze wakes from timekeeping_freeze at least 100 times with ec_no_wakeup enabled
Summary: s2idle freeze wakes from timekeeping_freeze at least 100 times with ec_no_wak...
Status: RESOLVED CODE_FIX
Alias: None
Product: ACPI
Classification: Unclassified
Component: EC (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: acpi_ec
URL:
Keywords:
Depends on:
Blocks: 178231
  Show dependency tree
 
Reported: 2020-07-29 02:02 UTC by Todd Brandt
Modified: 2020-11-19 23:07 UTC (History)
1 user (show)

See Also:
Kernel Version: 5.8.0-rc1
Subsystem:
Regression: No
Bisected commit-id:


Attachments
issue.def (341 bytes, text/plain)
2020-07-29 02:02 UTC, Todd Brandt
Details
issue.def (338 bytes, text/plain)
2020-07-30 01:47 UTC, Todd Brandt
Details
issue.def (342 bytes, text/plain)
2020-08-05 16:07 UTC, Todd Brandt
Details
issue.def (384 bytes, text/plain)
2020-08-14 03:29 UTC, Todd Brandt
Details
example_of_infinite_loop_in_amber_lake (2.21 MB, text/html)
2020-08-24 18:43 UTC, Todd Brandt
Details

Description Todd Brandt 2020-07-29 02:02:04 UTC
Created attachment 290663 [details]
issue.def

When sleepgraph is run it is expected to enter s2idle once and leave s2idle once at the triggering of rtcwake. Sometimes wakeups occur which pull it out of timekeeping_freeze but still within the s2idle_enter function. This can affect power and performance. This issue monitors it for occurrences.
Comment 1 Todd Brandt 2020-07-30 01:47:25 UTC
Created attachment 290681 [details]
issue.def
Comment 2 Todd Brandt 2020-08-05 16:07:03 UTC
Created attachment 290783 [details]
issue.def
Comment 3 Todd Brandt 2020-08-14 03:29:53 UTC
Created attachment 290895 [details]
issue.def
Comment 4 Todd Brandt 2020-08-14 03:35:20 UTC
I've found 5 systems that were/are affected by this behavior in our lab. Two of which are still affected in 5.8.0: the otcpl-aml-y (amber lake) and the otcpl-cml-u (comet lake). A key piece of the puzzle appears to be the acpi.ec_no_wakeup=1 kernel argument. This issue only occurs on these systems when this argument is used. Without it both machines pass with flying colors and achieve full S0iX coverage.

So the simple solution is to not use acpi.ec_no_wakeup=1, but I'm still looking into why this is. The whole purpose of this flag is to make S2idle more stable and prevent unnecessary wakeups. In these two specific systems it appears to achieve the exact opposite effect for a significant part of the stress testing time.
Comment 5 Todd Brandt 2020-08-24 18:31:16 UTC
Upon further debug, I believe that cause of the error is a bug in the drivers/acpi/ec.c library. The ec_no_wakeup parameter has 3 major effects:

1) disable EC GPE before suspend, and re-enable EC GPE after resume
2) prevent EC GPE from being flagged as a wakeup source
3) abort the acpi_ec_dispatch_gpe call if an EC GPE is triggered

When these infinite loops occur, something else re-enables the EC GPE just after ec_no_wakeup disables it for suspend. Thus the EC GPE is erroneously dispatched. However when the EC GPE reaches acpi_ec_dispatch_gpe, the ec_no_wakeup flag prevents it from being processed and completed. Thus it just keeps re-dispatching over and over in an infinite loop until resume clears everything.

The simplest solution to this issue is this:

@@ -2011,9 +2040,17 @@ bool acpi_ec_dispatch_gpe(void)
        if (acpi_any_gpe_status_set(first_ec->gpe))
                return true;
 
-       if (ec_no_wakeup)
-               return false;
-

It allows erroneous EC GPEs to be completed, thus preventing the loop. However if does not solve the mystery of why they're being re-enabled at all.
Comment 6 Todd Brandt 2020-08-24 18:43:05 UTC
Created attachment 292147 [details]
example_of_infinite_loop_in_amber_lake
Comment 7 Todd Brandt 2020-11-19 23:07:15 UTC
This appears fixed with the following:

commit e0e9ce390d7bc6a705653d4a8aa4ea92c9a65e53
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date:   Mon Oct 5 19:13:46 2020 +0200

    ACPI: EC: PM: Drop ec_no_wakeup check from acpi_ec_dispatch_gpe()
    
    It turns out that in some cases there are EC events to flush in
    acpi_ec_dispatch_gpe() even though the ec_no_wakeup kernel parameter
    is set and the EC GPE is disabled while sleeping, so drop the
    ec_no_wakeup check that prevents those events from being processed
    from acpi_ec_dispatch_gpe().
    
    Reported-by: Todd Brandt <todd.e.brandt@linux.intel.com>
    Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Note You need to log in before you can comment on or make changes to this bug.