Bug 218265 - Immediate wake after S2 suspend, intermittently, with all /proc/acpi/wakeup sources disabled
Summary: Immediate wake after S2 suspend, intermittently, with all /proc/acpi/wakeup s...
Status: NEW
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Sleep-Wake (show other bugs)
Hardware: AMD Linux
: P3 normal
Assignee: acpi_power-sleep-wake
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-12-15 04:01 UTC by Mark Haun
Modified: 2024-01-20 20:38 UTC (History)
3 users (show)

See Also:
Kernel Version: 6.7-rc6
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg from suspend which immediately wakes (2.89 KB, text/plain)
2023-12-15 04:01 UTC, Mark Haun
Details
dmesg from suspend which works correctly (3.03 KB, text/plain)
2023-12-15 04:02 UTC, Mark Haun
Details
amd_s2idle.py log (45.38 KB, text/plain)
2023-12-24 18:39 UTC, Mark Haun
Details
Full dmesg (108.06 KB, text/plain)
2023-12-24 18:40 UTC, Mark Haun
Details

Description Mark Haun 2023-12-15 04:01:54 UTC
Created attachment 305599 [details]
dmesg from suspend which immediately wakes

On a new Lenovo Thinkpad P16s with kernel 6.6.6, suspend is immediately followed by wake, about half of the time.  (If sleep lasts for three seconds, it will last indefinitely.)  There is no apparent difference between lid close, `zzz`, and `loginctl suspend`.  I see nothing suspicious in dmesg, and I see no difference in the messages between the fail case (immediate wake-up) and the success case (wake-up later).

All ACPI wake sources are already set to "disabled" in /proc/acpi/wakeup.  All potential wake options, like wake on LAN, are disabled in the BIOS.  Observability is poor because I know of no way to log or trace the actual wake event.  E.g. even with "everything" disabled, the laptop's touchpad or keyboard still wake the system, but I only know that by human observation, not proper logging.  Is there some way to make the kernel log the wake source?
Comment 1 Mark Haun 2023-12-15 04:02:53 UTC
Created attachment 305600 [details]
dmesg from suspend which works correctly
Comment 2 Bagas Sanjaya 2023-12-15 06:22:08 UTC
(In reply to Mark Haun from comment #1)
> Created attachment 305600 [details]
> dmesg from suspend which works correctly

From what version?
Comment 3 Mark Haun 2023-12-15 15:31:59 UTC
(In reply to Bagas Sanjaya from comment #2)
> (In reply to Mark Haun from comment #1)
> > Created attachment 305600 [details]
> > dmesg from suspend which works correctly
> 
> From what version?

I'm sorry, I don't understand.  Version of what?
Comment 4 Bagas Sanjaya 2023-12-16 08:37:00 UTC
On 12/15/23 22:31, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=218265
> 
> --- Comment #3 from Mark Haun (haunma@keteu.org) ---
> (In reply to Bagas Sanjaya from comment #2)
>> (In reply to Mark Haun from comment #1)
>>> Created attachment 305600 [details]
>>> dmesg from suspend which works correctly
>>
>> From what version?
> 
> I'm sorry, I don't understand.  Version of what?
> 

From what kernel version you attached that dmesg?

Confused...
Comment 5 Mark Haun 2023-12-17 02:11:33 UTC
(In reply to Bagas Sanjaya from comment #4)
> On 12/15/23 22:31, bugzilla-daemon@kernel.org wrote:
> > https://bugzilla.kernel.org/show_bug.cgi?id=218265
> > 
> > --- Comment #3 from Mark Haun (haunma@keteu.org) ---
> > (In reply to Bagas Sanjaya from comment #2)
> >> (In reply to Mark Haun from comment #1)
> >>> Created attachment 305600 [details]
> >>> dmesg from suspend which works correctly
> >>
> >> From what version?
> > 
> > I'm sorry, I don't understand.  Version of what?
> 
> From what kernel version you attached that dmesg?
> Confused...

6.6.6 like it says in the original report---why?  Is there something unexpected in the dmesg?  It would be great if there were.  I can't see anything wrong, unfortunately :(
Comment 6 Bagas Sanjaya 2023-12-18 05:56:49 UTC
Can you also check current mainline (v6.7-rc6)?
Comment 7 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-12-20 10:03:12 UTC
Did this work better with earlier kernel version? Or did you just get the machine?

And can you please attach a full dmesg (e.g. with everything since boot in it).
Comment 8 Mark Haun 2023-12-22 06:15:25 UTC
I just got the machine (and Phoenix 7840 U/S/HS APUs are fairly new anyway), so can't comment on earlier kernels.  I am now running 6.7-rc6 and need to run a few more days to see if the problem will recur.  Suspend seems to work fine many times in a row (after manually disabling all wakeups), then the problem pops up again.  We shall see.

Coming back to the original report, is there any way to log/trace ACPI wakeup events?  Some option I could turn on to start logging them in dmesg?  It seems weird that the only way to test is "circumstantial" i.e. experimentally testing which actions cause a wake and which don't.  This is obviously problematic for any intermittent failure.

I have observed that the built-in trackpad and/or keyboard will usually---but not always!---wake the system even when all /proc/acpi/wakeup sources show as disabled.  The power button will *always* wake the system when all /proc/acpi/wakeup sources show as disabled... one of which is "sleep button".  It would sure be nice to know which interrupt/event actually causes the wakeup, unless this is obfuscated by the BIOS?
Comment 9 Mark Haun 2023-12-22 20:27:14 UTC
Investigating some more today while I wait to see if there are still failures.

Here is my /proc/acpi/wakeup (after manually disabling everything):

Device	S-state	  Status   Sysfs node
GPP0	  S0	*disabled
GPP1	  S4	*disabled
GPP2	  S3	*disabled
GPP5	  S0	*disabled  pci:0000:00:02.1
GPP6	  S4	*disabled  pci:0000:00:02.2
GP11	  S4	*disabled
SWUS	  S4	*disabled
GP12	  S4	*disabled  pci:0000:00:04.1
SWUS	  S4	*disabled
XHC0	  S3	*disabled  pci:0000:64:00.3
XHC1	  S4	*disabled  pci:0000:64:00.4
XHC2	  S4	*disabled  pci:0000:66:00.0
NHI0	  S3	*disabled
XHC3	  S3	*disabled  pci:0000:66:00.3
NHI1	  S4	*disabled  pci:0000:66:00.6
XHC4	  S3	*disabled  pci:0000:66:00.4
LID	  S4	*disabled  platform:PNP0C0D:00
SLPB	  S3	*disabled  platform:PNP0C0E:00

I see that my touchpad is on an I2C bus.  Here is the entry in /sys/bus/hid:

0018:04F3:320B.0008 -> ../../../devices/platform/AMDI0010:01/i2c-19/i2c-ELAN0688:00/0018:04F3:320B.0008

I can't tell if the controller is on the PCI bus.  Does "AMDI" stand for one of these (lspci output):

65:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device 14ec
66:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device 14ec

If the touchpad I2C is not in /proc/acpi/wakeup, that would explain why I [usually] am able to wake the system by vigorously wiping the touchpad.  If there are ever any spurious interrupts, could this also explain unwanted wakes?
Comment 10 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-12-23 08:38:21 UTC
(In reply to Mark Haun from comment #8)
> I just got the machine (and Phoenix 7840 U/S/HS APUs are fairly new anyway),
> so can't comment on earlier kernels. 

Then this is nothing for me. But FWIW, devs sometimes use this script to get down to the root of the problem:

https://gitlab.freedesktop.org/drm/amd/-/blob/master/scripts/amd_s2idle.py
Comment 11 Mark Haun 2023-12-24 18:39:04 UTC
Created attachment 305650 [details]
amd_s2idle.py log
Comment 12 Mark Haun 2023-12-24 18:40:10 UTC
Created attachment 305651 [details]
Full dmesg
Comment 13 Mark Haun 2023-12-24 19:01:56 UTC
Update:  I have not seen a recurrence of the immediate wake issue on 6.7-rc6 as long as all of the /proc/acpi/wakeup sources are set to disabled.  The issue does happen most of the time with the boot-up defaults, so I guess it may be worth documenting/logging the results.  Maybe someone can comment if that is appropriate here, or in a new bug report, or not considered a bug?

At Thorstens suggestion I tried running the amd_s2idle.py script.  Somehow it is able to tell me which interrupt/event caused the wake, information not available from dmesg, so that was helpful.

In my case, this output

○ Wakeups triggered from IRQs: [7]
○ Woke up from IRQ 7 (IR-IO-APIC 7-fasteoi pinctrl_amd)
○ gpe07 increased from 13618 to 13632
❌ Userspace suspended for 0:00:03.870541 (< minimum expected 0:00:09)

led me to examine /sys/firmware/acpi/interrupts/gpe07:
$ cat /sys/firmware/acpi/interrupts/gpe07
   12843  EN     enabled      unmasked

I spent some time testing with gpe07 masked/unmasked and disabled/enabled but the results were inconclusive.  Sometimes the system remains in suspend and sometimes it doesn't.  In *both* cases, the script usually reports that gpe07 was the cause of the wakeup, regardless of the mask or enable status of /sys/firmware/acpi/interrupts/gpe07.  It seems like that is just a placebo AFAICT.

Hope this helps.
Comment 14 Mark Haun 2023-12-24 22:55:17 UTC
In further testing, I've discovered some interactions with userspace.  Specifically, I am normally running elogind (on a Void Linux runit-based system), and "sudo zzz" often (but not always) fails to stay in suspend if elogind is running.

But elogind isn't the whole story.  With elogind disabled, zzz seems quite reliable at a VT after boot.  But after starting X (with a window manager, no DE) and firefox, zzz no longer works consistently, even without elogind running.  So my theory that elogind was doing something nefarious in the background doesn't quite hold up :(

In any case, I think this is still in the kernel-bug category, as suspend should never fail due to unrelated userspace processes (excepting those which may be attempting to manipulate power states).
Comment 15 Ulf Winkelvos 2024-01-20 20:38:28 UTC
as the demsg shows:
---
[    7.100182] ath11k_pci 0000:02:00.0: fw_version 0x110b196e fw_build_timestamp 2022-12-22 12:54 fw_build_id WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.23
---
this is possibly related to: https://bugzilla.kernel.org/show_bug.cgi?id=217239

Note You need to log in before you can comment on or make changes to this bug.