Bug 206413

Summary: [regression] Wakeup from s2idle automatically, caused by EC evaluations - Surface Book 1
Product: Power Management Reporter: Tsuchiya Yuto (kitakar5525) (kitakar)
Component: Hibernation/SuspendAssignee: Rafael J. Wysocki (rjw)
Status: CLOSED CODE_FIX    
Severity: normal CC: rui.zhang, todd.e.brandt, yu.c.chen
Priority: P1    
Hardware: Intel   
OS: Linux   
Kernel Version: 5.4 Subsystem:
Regression: Yes Bisected commit-id:
Attachments: s2idle: wakeup automaticallly after 5.4
acpidump from my SB1
s2idle: unload battery module before suspend
the whole dmesg log after a fresh boot (5.5.1)
contents of /sys/firmware/acpi/interrupts/ before s2idle
contents of /sys/firmware/acpi/interrupts/ after s2idle
dmesg: s2idle cycle with pm_debug_messages
Suspend-to-idle wakeup debug patch
dmesg: s2idle-wake-debug
PM: s2idle: ACPI: Rework the handling of EC wakeup events
dmesg: acpi-s2idle-rework-wake
PM: s2idle: ACPI: Rework the handling of EC wakeup events
ACPI: PM: Check status of all GPEs in acpi_s2idle_wake()
dmesg: s2idle with patches from c18 and c19 (plus debug patch from c13)

Description Tsuchiya Yuto (kitakar5525) 2020-02-04 18:01:06 UTC
Created attachment 287113 [details]
s2idle: wakeup automaticallly after 5.4

Hello,

After kernel 5.4, Surface Book 1 (and also Surface Pro 4, both skylake and
I think it's similar hardware) will wakeup from s2idle automatically.

When I unload battery module before going into s2idle, this issue will not
happen. So, I suspect some EC evaluations regarding battery causes
wakeup on those devices.

I attached the dmesg log where the wakeup happened with some debug output
regarding ACPI methods evaluation.

I think one of these commits are related to this issue but I haven't
bisected yet.
- https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1b531e55c52bbe2927e2c3490d47c82113d303e7
  (Merge suspend-to-idle rework material for v5.4.)

I'd be glad if I can find what is causing this issue.

Regards,
Tsuchiya Yuto
Comment 1 Tsuchiya Yuto (kitakar5525) 2020-02-04 18:03:02 UTC
Created attachment 287115 [details]
acpidump from my SB1
Comment 2 Tsuchiya Yuto (kitakar5525) 2020-02-04 18:05:22 UTC
Created attachment 287117 [details]
s2idle: unload battery module before suspend

Here is a log where I unloaded battery module before going into s2idle.
No automatic wakeup issue.
Comment 3 Tsuchiya Yuto (kitakar5525) 2020-02-04 18:48:49 UTC
Created attachment 287119 [details]
the whole dmesg log after a fresh boot (5.5.1)

Here is the whole dmesg log after fresh boot (does not contain suspend/resume
sequence), for what it's worth.
Comment 4 Rafael J. Wysocki 2020-02-06 10:42:12 UTC
Yes, it looks like the system is woken up by battery events while suspended, but this doesn't seem to be related to the EC GPE.  It looks like the battery events are triggered by another GPE which is also a wakeup one.

Please run this:

$ grep -r '.*' /sys/firmware/acpi/interrupts/

before and after a suspend-to-idle cycle and (if the system wakes up from it immediately due to a battery event) attach the output of it from both runs.
Comment 5 Tsuchiya Yuto (kitakar5525) 2020-02-06 11:47:44 UTC
Created attachment 287167 [details]
contents of /sys/firmware/acpi/interrupts/ before s2idle
Comment 6 Tsuchiya Yuto (kitakar5525) 2020-02-06 11:51:43 UTC
Created attachment 287169 [details]
contents of /sys/firmware/acpi/interrupts/ after s2idle

Data was collected after s2idle when an automatic wakeup occurred.
Comment 7 Tsuchiya Yuto (kitakar5525) 2020-02-06 11:54:30 UTC
Thank you for looking into this issue.

And here is the diff of the contents of /sys/firmware/acpi/interrupts/
before and after a suspend-to-idle cycle

--- acpi_interrupts_before_s2idle.txt	2020-02-06 20:40:12.380455383 +0900
+++ acpi_interrupts_after_s2idle.txt	2020-02-06 20:40:49.436671990 +0900
@@ -33,7 +33,7 @@
 /sys/firmware/acpi/interrupts/gpe76:       0         invalid      unmasked
 /sys/firmware/acpi/interrupts/gpe38:       0         invalid      unmasked
 /sys/firmware/acpi/interrupts/gpe0B:       0         invalid      unmasked
-/sys/firmware/acpi/interrupts/gpe66:      10  EN     enabled      unmasked
+/sys/firmware/acpi/interrupts/gpe66:      18  EN     enabled      unmasked
 /sys/firmware/acpi/interrupts/gpe28:       0         invalid      unmasked
 /sys/firmware/acpi/interrupts/gpe56:       0         invalid      unmasked
 /sys/firmware/acpi/interrupts/gpe18:       0         invalid      unmasked
@@ -46,7 +46,7 @@
 /sys/firmware/acpi/interrupts/error:       0
 /sys/firmware/acpi/interrupts/gpe54:       0         invalid      unmasked
 /sys/firmware/acpi/interrupts/gpe16:       0         invalid      unmasked
-/sys/firmware/acpi/interrupts/sci:    1058
+/sys/firmware/acpi/interrupts/sci:    1708
 /sys/firmware/acpi/interrupts/gpe44:       0         invalid      unmasked
 /sys/firmware/acpi/interrupts/gpe7E:       0         invalid      unmasked
 /sys/firmware/acpi/interrupts/gpe72:       0         invalid      unmasked
@@ -70,7 +70,7 @@
 /sys/firmware/acpi/interrupts/gpe2E:       0         invalid      unmasked
 /sys/firmware/acpi/interrupts/gpe22:       0         invalid      unmasked
 /sys/firmware/acpi/interrupts/gpe5C:       0         invalid      unmasked
-/sys/firmware/acpi/interrupts/gpe50:    1048  EN     enabled      unmasked
+/sys/firmware/acpi/interrupts/gpe50:    1692  EN     enabled      unmasked
 /sys/firmware/acpi/interrupts/gpe1E:       0         invalid      unmasked
 /sys/firmware/acpi/interrupts/gpe79:       0         invalid      unmasked
 /sys/firmware/acpi/interrupts/gpe12:       0         invalid      unmasked
@@ -97,7 +97,7 @@
 /sys/firmware/acpi/interrupts/gpe67:       0         invalid      unmasked
 /sys/firmware/acpi/interrupts/gpe00:       0         invalid      unmasked
 /sys/firmware/acpi/interrupts/gpe3A:       0         invalid      unmasked
-/sys/firmware/acpi/interrupts/gpe_all:    1058
+/sys/firmware/acpi/interrupts/gpe_all:    1710
 /sys/firmware/acpi/interrupts/gpe29:       0         invalid      unmasked
 /sys/firmware/acpi/interrupts/gpe57:       0         invalid      unmasked
 /sys/firmware/acpi/interrupts/gpe2A:       0         invalid      unmasked
@@ -132,6 +132,6 @@
 /sys/firmware/acpi/interrupts/gpe05:       0         invalid      unmasked
 /sys/firmware/acpi/interrupts/gpe3F:       0         invalid      unmasked
 /sys/firmware/acpi/interrupts/gpe33:       0         invalid      unmasked
-/sys/firmware/acpi/interrupts/gpe6D:       0  EN     enabled      unmasked
+/sys/firmware/acpi/interrupts/gpe6D:       0         disabled     unmasked
 /sys/firmware/acpi/interrupts/ff_slp_btn:       0         invalid      unmasked
 /sys/firmware/acpi/interrupts/gpe61:       0  EN     enabled      unmasked
Comment 8 Rafael J. Wysocki 2020-02-06 16:07:17 UTC
Thanks!

Please also do (as root)

# echo 1 > /sys/power/pm_debug_messages

without enabling the ACPI debug and attach the output of dmesg run after a suspend-to-idle cycle in which the immediate wakeup has occurred.
Comment 9 Tsuchiya Yuto (kitakar5525) 2020-02-06 16:49:40 UTC
Created attachment 287179 [details]
dmesg: s2idle cycle with pm_debug_messages

The PC does not necessarily wake up immediately. Sometimes it wakes up immediately,
sometimes after about 30 seconds and sometimes wakes up after minutes.

The sure way of waking up the PC automatically is plugging/unplugging AC adapter.
So, this time, I plugged the AC adapter into PC to cause the automatic wake up.
Comment 10 Rafael J. Wysocki 2020-02-06 21:20:53 UTC
According to the information collected so far, the premature wakeup is caused by the EC.  Now, this means that setting the ec_no_wakeup parameter of the acpi module should make the problem go away, so please try to do that.

The ec_no_wakeup module parameter is writable, so you should be able to set it at run time (as root) by echoing 1 to /sys/modules/acpi/parameters/ec_no_wakeup.
Comment 11 Rafael J. Wysocki 2020-02-06 21:24:09 UTC
Created attachment 287201 [details]
Suspend-to-idle wakeup debug patch

Nevertheless, let's try to find out why calling pm_system_cancel_wakeup() from acpi_s2idle_wake() is not sufficient.

Please apply this patch, retest and attach the output of dmesg including at least one suspend-to-idle and resume cycle (without ACPI debug and with pm_debug_messages set to 1).
Comment 12 Rafael J. Wysocki 2020-02-06 21:26:54 UTC
(In reply to Rafael J. Wysocki from comment #10)
> According to the information collected so far, the premature wakeup is
> caused by the EC.  Now, this means that setting the ec_no_wakeup parameter
> of the acpi module should make the problem go away, so please try to do that.
> 
> The ec_no_wakeup module parameter is writable, so you should be able to set
> it at run time (as root) by echoing 1 to
> /sys/modules/acpi/parameters/ec_no_wakeup.

That should be:

/sys/module/acpi/parameters/ec_no_wakeup

Sorry for the typo.
Comment 13 Tsuchiya Yuto (kitakar5525) 2020-02-07 01:15:05 UTC
Created attachment 287203 [details]
dmesg: s2idle-wake-debug

I guess you intended to do the following thing instead
(I just removed the second hunk):

Index: linux-pm/drivers/base/power/wakeup.c
===================================================================
--- linux-pm.orig/drivers/base/power/wakeup.c
+++ linux-pm/drivers/base/power/wakeup.c
@@ -924,11 +924,18 @@ bool pm_wakeup_pending(void)
 	return ret || atomic_read(&pm_abort_suspend) > 0;
 }
 
-void pm_system_wakeup(void)
+void __pm_system_wakeup(void)
 {
 	atomic_inc(&pm_abort_suspend);
 	s2idle_wake();
 }
+
+void pm_system_wakeup(void)
+{
+	__pm_system_wakeup();
+
+	dump_stack();
+}
 EXPORT_SYMBOL_GPL(pm_system_wakeup);
 
 void pm_system_cancel_wakeup(void)

---
With the above patch, I did s2idle two times.
- The first time, I observed the automatic wakeup.
- The socond time, with ec_no_wakeup, and no automatic wakeup can be seen
  (when I plug/unplug the AC adapter). I woke up the PC by lid (Power
  buttons also uses EC, I can't wake it up by the button anymore).
Comment 14 Rafael J. Wysocki 2020-02-07 12:00:28 UTC
(In reply to Tsuchiya Yuto (kitakar5525) from comment #13)
> Created attachment 287203 [details]
> dmesg: s2idle-wake-debug
> 
> I guess you intended to do the following thing instead
> (I just removed the second hunk):

I did exactly what I wanted, but never mind, your modification probably saved us a round of back-and-forth comments. :-)
Comment 15 Rafael J. Wysocki 2020-02-07 12:06:02 UTC
Created attachment 287223 [details]
PM: s2idle: ACPI: Rework the handling of EC wakeup events

So the theory goes that the EC GPE may re-trigger immediately after acpi_s2idle_sync() and when the SCI is re-armed after that, it will re-trigger too right away and that may happen before the pm_wakeup_pending() check in s2idle_loop() and the wakeup may be counted as a genuine one.

So I have this patch (which I only have tested very lightly), so please try it (with ec_no_wakeup unset) and report back.
Comment 16 Tsuchiya Yuto (kitakar5525) 2020-02-07 15:56:12 UTC
(In reply to Rafael J. Wysocki from comment #14)
> (In reply to Tsuchiya Yuto (kitakar5525) from comment #13)
> > Created attachment 287203 [details]
> > dmesg: s2idle-wake-debug
> > 
> > I guess you intended to do the following thing instead
> > (I just removed the second hunk):
> 
> I did exactly what I wanted, but never mind, your modification probably
> saved us a round of back-and-forth comments. :-)

(Ah, I got it. The non-underscored function "pm_system_wakeup()" will be
called from another location, too and you expected it. I missed the fact.
But why I removed the second hunk is that, When I tried the debug patch from
Comment 11 without modification, the "dump_stack()" was not called when I
observed the automatic wakeup.)

(I'll try your patch from Comment 15 later!)
Comment 17 Tsuchiya Yuto (kitakar5525) 2020-02-07 17:18:35 UTC
Created attachment 287229 [details]
dmesg: acpi-s2idle-rework-wake

dmesg log with patch "PM: s2idle: ACPI: Rework the handling of EC wakeup events"
from Comment 15 (with s2idle-wake-debug patch from Comment 13).

Hmm. It still automatically wakes up with the same dmesg output as before.
Comment 18 Rafael J. Wysocki 2020-02-10 01:12:30 UTC
Created attachment 287267 [details]
PM: s2idle: ACPI: Rework the handling of EC wakeup events

This is a replacement for the patch from comment #15 that I'm going to apply.

The race addressed by it is real, so it needs fixing regardless of how unlikely it is to occur in practice.
Comment 19 Rafael J. Wysocki 2020-02-10 01:16:44 UTC
Created attachment 287269 [details]
ACPI: PM: Check status of all GPEs in acpi_s2idle_wake()

And here's another patch to try on top of the one from comment #18.

It is possible that the platform triggers a spurious SCI without any GPEs set, in which case it will be treated as a genuine wakeup, so avoid that by checking if any GPEs are set at all before returning 'true' from acpi_s2idle_wake().
Comment 20 Tsuchiya Yuto (kitakar5525) 2020-02-10 03:36:12 UTC
Created attachment 287273 [details]
dmesg: s2idle with patches from c18 and c19 (plus debug patch from c13)

I applied patches from Comment 18 and Comment 19 (plus debug patch from
Comment 13) on Linux 5.5.2 and 5.6-rc1 and I can confirm that no automatic
wakeup happened at least when I plugged/unplugged AC adapter several times.

I attached dmesg log on 5.6-rc1 for reference.

I'll put SB1 into s2idle overnight to see if the automatic wakeup occurs
and report back to you.
Comment 21 Rafael J. Wysocki 2020-02-10 10:10:30 UTC
Thanks! Please also verify that the lid wakeup still works with these patches applied.
Comment 22 Tsuchiya Yuto (kitakar5525) 2020-02-10 10:52:12 UTC
(In reply to Rafael J. Wysocki from comment #21)
> Thanks! Please also verify that the lid wakeup still works with these
> patches applied.

Yes, it still works (with the non-upstream lid driver that I mentioned in
dmesg log attached in Comment 13).
Comment 23 Tsuchiya Yuto (kitakar5525) 2020-02-11 06:24:14 UTC
Hmm... Actually, I observed automatic wakeup two times with patches
from Comment 18, Comment 19 and non-upstream patches (including
the lid driver).

I put SB1 into s2idle overnight with only the patches from c18 and c19
(plus debug patch from c13), but the automatic wakeup did not occur.
So, it might be possible that enabling wakeup from lid might be related.

I couldn't get useful dmesg log at the time when the automatic wakeup
occurred. I'll try to collect useful dmesg log, but it may take some time
because I can't find the way to reproduce the issue yet.
Comment 24 Rafael J. Wysocki 2020-02-11 19:08:16 UTC
So I've posted these two patches to replace the patch from  comment 19:

https://patchwork.kernel.org/patch/11375931/
https://patchwork.kernel.org/patch/11375931/

that (together) should be functionally equivalent to it, and I'm going to push them for merging, because the "spurious SCI wakeup" problem needs to be addressed anyway.

Now, if there are no premature wakeup issues on your system any more with these patches applied on top of the upstream code, I will be inclined to mark this bug as "resolved".
Comment 25 Rafael J. Wysocki 2020-02-11 19:09:29 UTC
Sorry, pasted the same URL twice in the previous comment.  The other patch is at:

https://patchwork.kernel.org/patch/11375923/
Comment 26 Tsuchiya Yuto (kitakar5525) 2020-02-13 13:44:35 UTC
No premature wakeup was observed with this two-day use (5.6-rc1 + your
patches + some non-upstream drivers including the lid driver).

So, I'm still not sure why I encountered the wakeup two times
(Comment 23).  I'll report back here if I see the wakeup again and also if
the wakeup seems to be still related to this bug that I originally filed.

Anyway, thank you for your work!
Comment 27 Todd Brandt 2020-02-22 12:48:39 UTC
Hi Rafael, while stress testing on 5.6.0-rc2 I've discovered that the kernel hangs on freeze (it refuses to wake up from rtc, wake on wlan, keypress, or power press), and upon git bisect the culprit appears to be this patch:

commit fdde0ff8590b4c1c41b3227f5ac4265fccccb96b
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date:   Tue Feb 11 17:53:52 2020 +0100

    ACPI: PM: s2idle: Prevent spurious SCIs from waking up the system
    
    If the platform triggers a spurious SCI even though the status bit
    is not set for any GPE when the system is suspended to idle, it will
    be treated as a genuine wakeup, so avoid that by checking if any GPEs
    are active at all before returning 'true' from acpi_s2idle_wake().
    
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=206413
    Fixes: 56b991849009 ("PM: sleep: Simplify suspend-to-idle control flow")
    Reported-by: Tsuchiya Yuto <kitakar@gmail.com>
    Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

If I reverse the patch and rebuild 5.6-rc2 without it freeze starts working again, so this patch clearly has an issue.
Comment 28 Tsuchiya Yuto (kitakar5525) 2020-06-23 08:01:16 UTC
follow-up:
It's difficult to say for sure that my original issue (premature wakeups)
has been completely resolved or not because I don't have any way to confirm
this issue other than suspend computer longer and longer.

However, at least I can say that I haven't observed premature
wakeups even once with overnight s2idle suspends on recent kernel versions
(with that aforementioned Lid driver).

Previously, even with commit fdde0ff8590b4 ("ACPI: PM: s2idle: Prevent
spurious SCIs from waking up the system"), I sometimes (but rare case)
observed wakeups (Comment 23).

After the commit, several commits that have reference to the commit have
been made:

    $ git log --pretty=format:"%h - %s" --grep "fdde0ff8"
    7b301750f7f8f - ACPI: EC: PM: Avoid premature returns from acpi_s2idle_wake()
    767191db8220d - platform/x86: intel_int0002_vgpio: Use acpi_register_wakeup_handler()
    ddfd9dcf270ce - ACPI: PM: Add acpi_[un]register_wakeup_handler()
    d5406284ff803 - ACPI: PM: s2idle: Refine active GPEs check
    63fb9623427fb - ACPI: PM: s2idle: Check fixed wakeup events in acpi_s2idle_wake()

I guess those commits additionally improved the situation for
Surface Book 1, too.

(In reply to Todd Brandt from comment #27)
> Hi Rafael, while stress testing on 5.6.0-rc2 I've discovered that the kernel
> hangs on freeze (it refuses to wake up from rtc, wake on wlan, keypress, or
> power press), and upon git bisect the culprit appears to be this patch:
> 
> commit fdde0ff8590b4c1c41b3227f5ac4265fccccb96b
> Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Date:   Tue Feb 11 17:53:52 2020 +0100
> 
>     ACPI: PM: s2idle: Prevent spurious SCIs from waking up the system
> [...]
> 
> If I reverse the patch and rebuild 5.6-rc2 without it freeze starts working
> again, so this patch clearly has an issue.

What about the stress test with the recent kernel version?
Comment 29 Zhang Rui 2020-06-29 11:42:44 UTC
I think the following commit has fixed the issue found by stress test.
commit 63fb9623427fbb44e3782233b6e4714057b76ff2
Author:     Rafael J. Wysocki <rafael.j.wysocki@intel.com>
AuthorDate: Fri Feb 21 01:46:18 2020 +0100
Commit:     Linus Torvalds <torvalds@linux-foundation.org>
CommitDate: Fri Feb 21 10:01:25 2020 -0800

    ACPI: PM: s2idle: Check fixed wakeup events in acpi_s2idle_wake()
    
    Commit fdde0ff8590b ("ACPI: PM: s2idle: Prevent spurious SCIs from
    waking up the system") overlooked the fact that fixed events can wake
    up the system too and broke RTC wakeup from suspend-to-idle as a
    result.
    
    Fix this issue by checking the fixed events in acpi_s2idle_wake() in
    addition to checking wakeup GPEs and break out of the suspend-to-idle
    loop if the status bits of any enabled fixed events are set then.
    
    Fixes: fdde0ff8590b ("ACPI: PM: s2idle: Prevent spurious SCIs from waking up the system")
    Reported-and-tested-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


Bug closed.