Bug 216101 - lost acpi events after resume from suspend - AMD Ryzen 6800H
Status: CLOSED CODE_FIX
Alias: None
Product: ACPI
Classification: Unclassified
Component: Other
Hardware: AMD Linux
Importance: P1 normal
Assignee: Mario Limonciello (AMD)
Reported: 2022-06-09 10:26 UTC by Catalin
Modified: 2022-12-05 21:32 UTC
Kernel Version: 5.18.2
Regression: No


Attachments
patch 1/4 (3.62 KB, application/mbox)
2022-09-08 13:04 UTC, Mario Limonciello (AMD)
patch 2/4 (1.21 KB, application/mbox)
2022-09-08 13:04 UTC, Mario Limonciello (AMD)
patch 3/4 (2.15 KB, application/mbox)
2022-09-08 13:05 UTC, Mario Limonciello (AMD)
patch 4/4 (1.85 KB, application/mbox)
2022-09-08 13:05 UTC, Mario Limonciello (AMD)

Description Catalin 2022-06-09 10:26:56 UTC
Hello,

I have an ASUS TUF Gaming A17 FA707RE laptop.
Ryzen 6800H, NVIDIA dGPU.

There are some problems with ACPI event detection.
After a clean start almost everything works (Fn keys, battery status), except the lid-close action.
The kernel detects the closed/open state, but no ACPI event shows up in acpi_listen.

After resuming from standby (which always works!), no ACPI events are generated for the Fn keys or the lid, and plugging or unplugging the charger is not detected. When plugged in, the battery percentage seems to be correct, but acpi reports the battery as discharging.
The power button and sound off events are the only ones detected.
I tried reloading the asus_nb_wmi module, with no success.
There are no useful errors in the logs.

The behavior is pretty much the same on 5.17.11 and 5.19-rc1. The 5.19-rc1 kernel does not detect the lid state.

I will supply any information you need.

Thank you,
Catalin
Comment 1 Catalin 2022-06-09 13:56:56 UTC
Correction: on 5.18 the lid-close action works after a fresh start; it no longer works after resume.

OS: Linux Mint 20.3
Comment 2 Mario Limonciello (AMD) 2022-08-24 05:23:56 UTC
Can you please share your kernel config and your full kernel log, captured with /sys/power/pm_debug_messages set before you suspend?
Comment 3 Catalin 2022-09-07 14:03:08 UTC
Sorry, I missed your reply ...

I found same issue, also Ryzen 6000 here: https://bbs.archlinux.org/viewtopic.php?id=279102

I only have s2idle not S3.

Here you have config for 5.19.7:

https://antebit.com/kernel-5.19.7-config.txt

and this is the kernel log after
echo 1 > /sys/power/pm_debug_messages && dmesg -c && systemctl suspend:

https://antebit.com/pm_debug_messages.txt


Thank you,
Catalin
Comment 4 Mario Limonciello (AMD) 2022-09-07 14:42:07 UTC
> I found same issue, also Ryzen 6000 here:
> https://bbs.archlinux.org/viewtopic.php?id=279102

Interesting finding.

> I only have s2idle not S3.

Right - to be expected.

> Here you have config for 5.19.7:

OK good, you have the CONFIG_PINCTRL_AMD driver in place.

And by testing with 5.19.7 you've picked up the commit I was hoping was there (https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/pinctrl?h=linux-5.19.y&id=4d8e2fa66adb5514380f5b680796ab0586140447)

Can you share your whole kernel log and an acpidump too?  Please attach them to this bug report so that they stay accessible in case your host goes down.
Comment 5 Mario Limonciello (AMD) 2022-09-07 14:46:26 UTC
Maybe this is the same on another system too: https://forums.lenovo.com/t5/Other-Linux-Discussions/Firmware-regression-No-more-udev-power-supply-events/m-p/5166407
Comment 6 Catalin 2022-09-07 16:06:50 UTC
This is the log, fresh restart plus one suspend:

https://antebit.com/kernel_log.txt (the text is too large to post here)

My laptop is an Asus, not a Lenovo. To be clearer: after resume, the widget that reads the charging status stops working, although the laptop is charging.
Also, when I close/open the lid I need to rely on /proc/acpi/button/lid/*/state and put the laptop to sleep with a script.
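
The workaround script itself was not posted. A minimal sketch of what such a script might look like, assuming the state file has the usual `state:      closed` layout; the function names are hypothetical:

```python
import glob
import subprocess


def parse_lid_state(text):
    """Return the state string ('open' or 'closed' in practice), or None.

    Assumes the contents look like 'state:      closed'.
    """
    key, sep, value = text.partition(":")
    if sep and key.strip() == "state":
        return value.strip() or None
    return None


def any_lid_closed(pattern="/proc/acpi/button/lid/*/state"):
    """True if any lid state file matching the pattern reports 'closed'."""
    for path in glob.glob(pattern):
        with open(path) as f:
            if parse_lid_state(f.read()) == "closed":
                return True
    return False


def suspend_if_closed():
    """Suspend through systemd when the lid is closed (may need privileges)."""
    if any_lid_closed():
        subprocess.run(["systemctl", "suspend"], check=False)
```

Calling `suspend_if_closed()` periodically (for example from a timer) would approximate the missing lid event; polling is only a stopgap until the ACPI event is delivered again.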

power_supply events seem to be ok:

sudo udevadm monitor --subsystem-match="power_supply"
monitor will print the received events for:
UDEV - the event which udev sends out after rule processing
KERNEL - the kernel uevent

KERNEL[212.640913] change   /devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0003:00/power_supply/ACAD (power_supply)
UDEV  [212.643702] change   /devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0003:00/power_supply/ACAD (power_supply)
KERNEL[212.658193] change   /devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0003:00/power_supply/ACAD (power_supply)
UDEV  [212.659753] change   /devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0003:00/power_supply/ACAD (power_supply)
KERNEL[223.347727] change   /devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0003:00/power_supply/ACAD (power_supply)
UDEV  [223.356580] change   /devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0003:00/power_supply/ACAD (power_supply)
KERNEL[223.372205] change   /devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0003:00/power_supply/ACAD (power_supply)
UDEV  [223.378661] change   /devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0003:00/power_supply/ACAD (power_supply)

acpi_listen, plug and unplug the power - fresh restart:

ac_adapter ACPI0003:00 00000002 00000000
ac_adapter ACPI0003:00 00000080 00000000
ac_adapter ACPI0003:00 00000080 00000000
 0B3CBB35-E3C2- 000000ff 00000000  ============ acpi event
battery PNP0C0A:00 00000081 00000001
battery PNP0C0A:00 00000080 00000001
ac_adapter ACPI0003:00 00000002 00000001
ac_adapter ACPI0003:00 00000080 00000001
ac_adapter ACPI0003:00 00000080 00000001
 0B3CBB35-E3C2- 000000ff 00000000
battery PNP0C0A:00 00000081 00000001
battery PNP0C0A:00 00000080 00000001
battery PNP0C0A:00 00000081 00000001
battery PNP0C0A:00 00000080 00000001


after suspend:

ac_adapter ACPI0003:00 00000002 00000000
ac_adapter ACPI0003:00 00000080 00000000
ac_adapter ACPI0003:00 00000002 00000001
ac_adapter ACPI0003:00 00000080 00000001
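
To make the difference between the two captures explicit, here is a small sketch that lists which event sources appear before suspend but not after. The captures are abridged from the output above, and the regular expression is an assumption about the acpi_listen line layout (a source followed by two 8-digit hex fields):

```python
import re

# Source (class and/or bus id) followed by two 8-digit hex fields.
EVENT_RE = re.compile(r"^\s*(.*?)\s+([0-9a-fA-F]{8})\s+([0-9a-fA-F]{8})\b")


def event_sources(capture):
    """Return the set of event sources seen in an acpi_listen capture."""
    sources = set()
    for line in capture.splitlines():
        match = EVENT_RE.match(line)
        if match:
            sources.add(match.group(1))
    return sources


# Abridged from the "fresh restart" capture.
before = """\
ac_adapter ACPI0003:00 00000002 00000000
ac_adapter ACPI0003:00 00000080 00000000
 0B3CBB35-E3C2- 000000ff 00000000
battery PNP0C0A:00 00000081 00000001
battery PNP0C0A:00 00000080 00000001
"""

# The "after suspend" capture.
after = """\
ac_adapter ACPI0003:00 00000002 00000000
ac_adapter ACPI0003:00 00000080 00000000
ac_adapter ACPI0003:00 00000002 00000001
ac_adapter ACPI0003:00 00000080 00000001
"""

missing = event_sources(before) - event_sources(after)
print(sorted(missing))
```

On these captures the sources missing after suspend are the battery device and the `0B3CBB35-E3C2-` entry; only the ac_adapter events survive.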

acpi_dump: https://antebit.com/acpi_dump.txt, server is always up, posts are too large...

I did not pick the commit for CONFIG_PINCTRL_AMD; it was in the .config file from Ubuntu.


Catalin
Comment 7 Mario Limonciello (AMD) 2022-09-07 21:28:25 UTC
OK I think I see what might be going on.  In the PEP device there is an extra call in the 11e00d56-ce64-47ce-837b-1f898f9aa461 case for modern standby exit:

                        Case (0x08)
                        {
                            M000 (0x3E08)
                            If (CondRefOf (\_SB.PCI0.GPP7.DEV0))
                            {
                                M460 ("    Notify (\\_SB.PCI0.GPP7.DEV0, 0x1)\n", Zero, Zero, Zero, Zero, Zero, Zero)
                                Notify (\_SB.PCI0.GPP7.DEV0, One) // Device Check
                            }

                            Return (Zero)
                        }

In the e3f32452-febc-43ce-9039-932122d37721 case (which is used by default in Linux) I don't see that call in the matching exit routines:
                        Case (0x03)
                        {
                            M000 (0x3E05)
                            Return (Zero)
                        }
or 
                        Case (0x05)
                        {
                            M000 (0x3E03)
                            Return (Zero)
                        }

I would hypothesize this is the reason for the problem.

Please have a try with this change.  It's not upstreamable like this, but given it's a firmware bug it would at least prove the correct root cause and we can think about how to do it better.

diff --git a/drivers/acpi/x86/s2idle.c b/drivers/acpi/x86/s2idle.c
index f9ac12b778e6..c9a7dd474892 100644
--- a/drivers/acpi/x86/s2idle.c
+++ b/drivers/acpi/x86/s2idle.c
@@ -394,11 +394,6 @@ static int lps0_device_attach(struct acpi_device *adev,
                        lps0_dsm_func_mask = (lps0_dsm_func_mask << 1) | 0x1;
                        acpi_handle_debug(adev->handle, "_DSM UUID %s: Adjusted function mask: 0x%x\n",
                                          ACPI_LPS0_DSM_UUID_AMD, lps0_dsm_func_mask);
-               } else if (lps0_dsm_func_mask_microsoft > 0 &&
-                               (!strcmp(hid, "AMDI0007") ||
-                                !strcmp(hid, "AMDI0008"))) {
-                       lps0_dsm_func_mask_microsoft = -EINVAL;
-                       acpi_handle_debug(adev->handle, "_DSM Using AMD method\n");
                }
        } else {
                rev_id = 1;
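
As an illustration only (a toy Python model, not kernel code), the effect of the removed branch can be sketched as below. The function name, the `branch_present` flag and the example mask value are hypothetical, and the condition of the preceding `if` is outside the diff context, so the sketch simply assumes the `else if` is reached:

```python
import errno

AMD_HIDS_IN_REMOVED_BRANCH = ("AMDI0007", "AMDI0008")


def microsoft_mask_after_attach(hid, microsoft_mask, branch_present):
    """Model the 'else if' branch removed by the diff above.

    Simplification: the preceding 'if' condition is not visible in the
    diff, so this assumes it was false and the 'else if' is evaluated.
    """
    if (branch_present and microsoft_mask > 0
            and hid in AMD_HIDS_IN_REMOVED_BRANCH):
        # Microsoft UUID is invalidated ("_DSM Using AMD method").
        return -errno.EINVAL
    # Microsoft UUID function mask is left as discovered.
    return microsoft_mask


example_mask = 0x1FE  # hypothetical value, not taken from the report
print(microsoft_mask_after_attach("AMDI0007", example_mask, True))
print(microsoft_mask_after_attach("AMDI0007", example_mask, False))
```

With the branch present the Microsoft mask is invalidated for AMDI0007/AMDI0008; with the branch removed it stays usable, which is what lets the firmware's Microsoft-UUID exit path (the one containing the extra Notify) run.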
Comment 8 Catalin 2022-09-08 05:55:42 UTC
Those 2 lines were missing on my side:

> -                               (!strcmp(hid, "AMDI0007") ||
> -                                !strcmp(hid, "AMDI0008"))) {


But the patch works!!!
Very well: all ACPI events are detected (Fn keys, lid close/open, power plug/unplug), so I can get rid of the script that handled the lid. Suspend/resume works flawlessly and very fast, and battery drain in suspend is 2% per hour.

Extra:
I would add that asus_wmi_ec_sensors (which is set to be removed from the kernel) must be removed or blacklisted since 5.19.x. It causes problems even though the module is not used: on my side, and also on another Asus laptop model, the kernel no longer detects the lid close action via /proc/acpi/button/lid/*/state, and there are some other battery problems as well: https://bugs.archlinux.org/task/75653
I do not know if there is any connection with the issue here.

Thank you!
Comment 9 Catalin 2022-09-08 08:36:59 UTC
I found another small issue, do not know if it's related or is with nvidia (515 driver) or should I open another bug here, but when I have external monitors attached, hdmi or usb-c, which are connected to nvidia chip, laptop very quickly resume from standby.

In dmesg I have:

[ 1917.472805] PM: suspend-to-idle == suspend
[ 1918.094089] ACPI: PM: Wakeup unrelated to ACPI SCI  ==== here is the wakeup
[ 1918.094094] PM: resume from suspend-to-idle
[ 1918.131991] ACPI: EC: interrupt unblocked
[ 1918.651205] PM: noirq resume of devices complete after 519.472 msecs
[ 1918.655315] PM: early resume of devices complete after 3.986 msecs
[ 1918.655574] asus_wmi: Unknown key code 0xc0
[ 1918.656635] Timekeeping suspended for 0.381 seconds
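
A small sketch for measuring how quickly the wakeup follows the suspend in a log like the one above. The helper name is hypothetical, the annotations added by hand are dropped, and the result is only a rough indication because kernel log timestamps are not guaranteed to advance while the system is suspended:

```python
import re

STAMP_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def find_stamp(log, needle):
    """Return the timestamp (seconds) of the first line containing needle."""
    for line in log.splitlines():
        match = STAMP_RE.match(line)
        if match and needle in match.group(2):
            return float(match.group(1))
    return None


log = """\
[ 1917.472805] PM: suspend-to-idle
[ 1918.094089] ACPI: PM: Wakeup unrelated to ACPI SCI
[ 1918.094094] PM: resume from suspend-to-idle
"""

entered = find_stamp(log, "PM: suspend-to-idle")
woke = find_stamp(log, "Wakeup unrelated to ACPI SCI")
print(f"log timestamps differ by {woke - entered:.3f} s")
```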
Comment 10 Mario Limonciello (AMD) 2022-09-08 13:04:34 UTC
Created attachment 301770 [details]
patch 1/4
Comment 11 Mario Limonciello (AMD) 2022-09-08 13:04:50 UTC
Created attachment 301771 [details]
patch 2/4
Comment 12 Mario Limonciello (AMD) 2022-09-08 13:05:08 UTC
Created attachment 301772 [details]
patch 3/4
Comment 13 Mario Limonciello (AMD) 2022-09-08 13:05:23 UTC
Created attachment 301773 [details]
patch 4/4
Comment 14 Mario Limonciello (AMD) 2022-09-08 13:10:45 UTC
> But the patch works!!!

Well that's great news.  It confirms this is an ASUS BIOS bug.

For now I've attached a series that adds a quirk for this bug that I think is more likely upstreamable.  If you can please test it on top of 6.0-rc4 and see if things work now?

If they don't, I probably got the DMI data for your system wrong.  In that case, please add this to your kernel command line:

acpi.prefer_microsoft_guid=1 pm_debug_messages acpi.dyndbg='file drivers/acpi/x86/s2idle.c +p'

and try again.

If that works, please share your dmidecode output so I can get the correct strings and respin patch 4/4.

If that doesn't work, please share your full kernel log.

> Extra:

I don't see any connection; this is a separate issue that you should work on with the owners of those drivers.

> but when I have external monitors attached, hdmi or usb-c, which are
> connected to nvidia chip,

You mean that connecting an external monitor causes the system to wake up?
Comment 15 Catalin 2022-09-08 15:20:40 UTC
> For now I've attached a series that adds a quirk for this bug that I think
> is more likely upstreamable.  If you can please test it on top of 6.0-rc4
> and see if things work now?

Works as expected! 
I'll keep it under observation for a few days, just in case.

> You mean that connecting an external monitor causes the system to wake up?
No. 

When external monitor is connected ( signal is coming from nvidia chip )
and want to put it in standby it comes back right away.
When is on hdmi, when it comes back there is always signal on monitor.
When it is on usb-c ( display port) when it comes back monitor is off. If I try again works, because it does not know that the monitor is connected.
I need to plug again to have signal on usb-c.

In logs I have

[ 1917.472805] PM: suspend-to-idle == suspend
[ 1918.094089] ACPI: PM: Wakeup unrelated to ACPI SCI  ==== here is the wakeup
[ 1918.094094] PM: resume from suspend-to-idle
[ 1918.131991] ACPI: EC: interrupt unblocked
[ 1918.651205] PM: noirq resume of devices complete after 519.472 msecs
[ 1918.655315] PM: early resume of devices complete after 3.986 msecs
[ 1918.655574] asus_wmi: Unknown key code 0xc0
[ 1918.656635] Timekeeping suspended for 0.381 seconds
Comment 16 Mario Limonciello (AMD) 2022-09-08 15:25:49 UTC
> Works as expected! 
> I'll keep it under observation for a few days, just in case.

OK.  Let me review that series with some other guys.


> When external monitor is connected ( signal is coming from nvidia chip )
> and want to put it in standby it comes back right away.
> When is on hdmi, when it comes back there is always signal on monitor.
> When it is on usb-c ( display port) when it comes back monitor is off. If I
> try again works, because it does not know that the monitor is connected.
> I need to plug again to have signal on usb-c.

This is a different unrelated issue.  You won't get any help on the kernel bug tracker for out of tree modules.  If you can reproduce it with nouveau, you should open a separate bug for it for that.
Comment 17 Catalin 2022-09-08 15:32:48 UTC
> This is a different unrelated issue.  You won't get any help on the kernel
> bug tracker for out of tree modules.  If you can reproduce it with nouveau,
> you should open a separate bug for it for that.

It's not so important, maybe the fix will come one day...
Nouveau does not work at all on external monitors...

Thank you once again for your support!
Comment 18 Travis Glenn Hansen 2022-09-11 17:00:37 UTC
I have a Lenovo Slim 7 ProX 14ARH7 that I think may be experiencing this same issue.

I have applied the patch on top of 6.0-rc4; however, it does not seem to solve the issue for me (I added s2idle.prefer_microsoft_guid=1 to boot options).

This thread https://www.reddit.com/r/Lenovo/comments/w57eeg/comment/in64lbw/ mentions /sys/firmware/acpi/interrupts/gpe09 not incrementing after suspend.
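
A sketch of how that counter could be compared before and after a suspend cycle. It assumes the file starts with the event count followed by status flags; the helper names and the sample line are illustrative and not taken from either report:

```python
def parse_gpe_count(text):
    """Return the leading event count from a GPE counter file's contents."""
    fields = text.split()
    if not fields or not fields[0].isdigit():
        raise ValueError(f"unexpected counter format: {text!r}")
    return int(fields[0])


def read_gpe_count(path="/sys/firmware/acpi/interrupts/gpe09"):
    """Read and parse the counter from sysfs."""
    with open(path) as f:
        return parse_gpe_count(f.read())


def gpe_advanced(count_before, count_after):
    """True if the counter increased between the two readings."""
    return count_after > count_before


# Illustrative sample line, not from the report.
sample = "     174  EN     enabled      unmasked\n"
print(parse_gpe_count(sample))
```

Reading the counter with `read_gpe_count()` once before suspend and again after triggering an event post-resume, then passing both values to `gpe_advanced()`, would show whether the GPE still fires.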

Some other (perhaps useless) info here: https://bbs.archlinux.org/viewtopic.php?pid=2056912#p2056912
Comment 19 Mario Limonciello (AMD) 2022-09-11 20:52:46 UTC
Please open your own issue and let's debug it there. If it's the same root cause we can add you to the quirk list, but there is zero information to indicate so right now.
Comment 20 Travis Glenn Hansen 2022-09-12 03:19:38 UTC
I have opened this: https://bugzilla.kernel.org/show_bug.cgi?id=216473

I think there are some strong similarities for sure:

- the symptoms are nearly identical (works on boot, fails after suspend)
- the same keys/events/etc fail (plug/unplug may function in my case)
- both are Rembrandt
Comment 21 Mario Limonciello (AMD) 2022-09-26 13:36:11 UTC
The kernel solution has been queued up for 6.1 (https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?h=bleeding-edge&id=d0f61e89f08dd46a090da50f5d747204673f70ea)
Comment 22 Paul 2022-12-03 01:12:15 UTC
I have the 15.6" version of this laptop with an AMD Ryzen 6800H that exhibits the same behaviour, but I notice that the patch for 6.1 applies only to the "ASUS TUF Gaming A17". Understandable at this point of course.

The model of my laptop is:

Asus TUF A15 FA507RM
Comment 23 Mario Limonciello (AMD) 2022-12-03 01:32:22 UTC
Paul - can you please test with 6.1-rc7 and the latest BIOS from ASUS?

If you're still affected please:
1) open your own issue and CC me.
2) Attach to the issue a dmesg log, acpidump and dmidecode output

We will determine next steps after that.
Comment 24 Paul 2022-12-03 16:48:29 UTC
Thanks Mario, I have submitted a new issue here:

https://bugzilla.kernel.org/show_bug.cgi?id=216768
