Bug 215770

Summary: Spurious wakeup from s2idle - AMD Ryzen 7 5825U with Radeon Graphics
Product: ACPI
Reporter: Kai-Heng Feng (kai.heng.feng)
Component: Power-Sleep-Wake
Assignee: acpi_power-sleep-wake
Status: RESOLVED DOCUMENTED
Severity: normal
CC: mario.limonciello, rjw, rui.zhang, superm1
Priority: P1
Hardware: AMD
OS: Linux
Kernel Version: mainline, linux-next
Subsystem:
Regression: No
Bisected commit-id:
Attachments: dmesg
             patch to add more verbose logging
             dmesg with the patch applied

Description Kai-Heng Feng 2022-03-29 11:40:26 UTC
Originally I filed this at the AMDGPU GitLab [1], but it now seems to be EC related, so I am filing it here:
[   60.737155] PM: suspend-to-idle
[   60.737181] PM: ACPI EC GPE status set
[   60.737194] PM: Rearming ACPI SCI for wakeup
[16445.761061] PM: Timekeeping suspended for 16384.387 seconds
[16445.761176] PM: ACPI EC GPE status set
[16445.761198] PM: ACPI EC GPE dispatched
[16445.763354] PM: ACPI EC work flushed
[16445.769146] PM: Wakeup after ACPI Notify sync
[16445.769147] PM: resume from suspend-to-idle
[16445.771786] ACPI: EC: interrupt unblocked
[16445.809717] PM: noirq resume of devices complete after 38.173 msecs
[16445.810392] PM: early resume of devices complete after 0.569 msecs
[16445.811040] pci 0000:00:00.2: can't derive routing for PCI INT A
[16445.811045] pci 0000:00:00.2: PCI INT A: no GSI
[16445.811688] i8042: [15041] Interrupt 1, without any data

$ cat /sys/power/pm_wakeup_irq 
1

... and that is the IRQ for i8042.

Because the keyboard wasn't pressed, "Interrupt 1, without any data" is pretty accurate. If the keyboard had been used to wake up the laptop, some data would be shown.
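
To confirm which driver owns the reported wakeup IRQ, the number from /sys/power/pm_wakeup_irq can be matched against /proc/interrupts. The following is a minimal sketch, not part of the original report. It assumes the usual x86 column layout of /proc/interrupts (per-CPU counters, chip name, hwirq-trigger, handler names), and the function names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def handlers_for_irq(interrupts_text: str, irq: str) -> list[str]:
    """Return the handler names listed for `irq` in /proc/interrupts text."""
    for line in interrupts_text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep or label.strip() != irq:
            continue
        fields = rest.split()
        # Skip the per-CPU counters that follow the IRQ label.
        first = 0
        while first < len(fields) and fields[first].isdigit():
            first += 1
        # Assumed x86 layout: chip name, "hwirq-trigger", then handler names.
        names = " ".join(fields[first + 2:])
        return [name for name in names.split(", ") if name]
    return []


def read_wakeup_irq(path: str = "/sys/power/pm_wakeup_irq") -> str | None:
    """Return the last wakeup IRQ as a string, or None if none is recorded."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        # The read fails when no wakeup IRQ has been recorded yet.
        return None
```

On the affected laptop, passing the contents of /proc/interrupts and the value returned by read_wakeup_irq() to handlers_for_irq() would be expected to list i8042 for IRQ 1.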

[1] https://gitlab.freedesktop.org/drm/amd/-/issues/1951
Comment 1 Kai-Heng Feng 2022-03-29 11:42:09 UTC
Not sure if this matters:
[    0.663660] ACPI Error: No handler for Region [ECRM] (00000000dee7d46d) [EmbeddedControl] (20211217/evregion-130)
[    0.663679] ACPI Error: Region EmbeddedControl (ID=3) has no handler (20211217/exfldio-261)

[    0.663692] No Local Variables are initialized for Method [_EVT]

[    0.663693] Initialized Arguments for Method [_EVT]:  (1 arguments defined for method invocation)
[    0.663693]   Arg0:   00000000fdd19f23 <Obj>           Integer 0000000000000000

[    0.663697] ACPI Error: Aborting method \_SB.GPIO._EVT due to previous error (AE_NOT_EXIST) (20211217/psparse-529)
Comment 2 Kai-Heng Feng 2022-03-29 11:42:51 UTC
Created attachment 300637 [details]
dmesg
Comment 3 Mario Limonciello (AMD) 2022-07-25 18:43:57 UTC
Can you please see if this still occurs with https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?h=linux-next&id=9946e39fe8d0a5da9eb947d8e40a7ef204ba016e applied?
Comment 4 Kai-Heng Feng 2022-07-26 01:35:49 UTC
(In reply to Mario Limonciello (AMD) from comment #3)
> Can you please see if this still occurs with
> https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/
> ?h=linux-next&id=9946e39fe8d0a5da9eb947d8e40a7ef204ba016e applied?

The issue persists with the patch applied.
Comment 5 Mario Limonciello (AMD) 2022-07-26 01:38:42 UTC
Created attachment 301485 [details]
patch to add more verbose logging

Exactly the same?  IRQ 1 shows at /sys/power/pm_wakeup_irq?

Can you please add this patch as well to your kernel and share full log with pm_debug_messages set?  I wonder if in your circumstance you have more than 2 sources.
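
For anyone collecting such a log: the extra PM messages appear in dmesg once 1 has been written to /sys/power/pm_debug_messages. The sketch below is only an illustration of how the relevant lines could be pulled out of a saved dmesg; the function name and the returned dictionary keys are hypothetical, and the message format is assumed to match the excerpt in the description.

```python
import re

# "[16445.761061] PM: ..." style lines, as in the dmesg excerpt above.
TIMESTAMPED = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")
SUSPENDED = re.compile(r"Timekeeping suspended for (\d+\.\d+) seconds")


def summarize_s2idle(dmesg_text: str) -> dict:
    """Collect the PM: messages and the reported suspend duration."""
    pm_messages = []
    suspended_seconds = None
    for line in dmesg_text.splitlines():
        match = TIMESTAMPED.match(line.strip())
        if not match:
            continue
        message = match.group(2)
        if not message.startswith("PM: "):
            continue
        pm_messages.append(message[len("PM: "):])
        duration = SUSPENDED.search(message)
        if duration:
            suspended_seconds = float(duration.group(1))
    return {"pm_messages": pm_messages, "suspended_seconds": suspended_seconds}
```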
Comment 6 Kai-Heng Feng 2022-07-26 02:34:13 UTC
(In reply to Mario Limonciello (AMD) from comment #5)
> Created attachment 301485 [details]
> patch to add more verbose logging
> 
> Exactly the same?  IRQ 1 shows at /sys/power/pm_wakeup_irq?

Yes. Some laptops show IRQ 1, some show IRQ 9. But as you can see in the log, IRQ 1 is the one that triggers the wakeup.

As soon as i8042 wakeup is disabled the issue goes away.
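
For reference, disabling the keyboard controller as a wakeup source is normally done through the device's power/wakeup attribute in sysfs. This is a minimal sketch, assuming the keyboard port is serio0 under the i8042 platform device; the port name can differ between machines, the helper name is hypothetical, and writing to the real attribute requires root.

```python
from pathlib import Path

# Assumed location of the keyboard port's wakeup switch (may differ per machine).
I8042_WAKEUP = Path("/sys/devices/platform/i8042/serio0/power/wakeup")


def set_wakeup(enabled: bool, path: Path = I8042_WAKEUP) -> str:
    """Write 'enabled' or 'disabled' to a power/wakeup attribute.

    Returns the previous value so that it can be restored later.
    """
    previous = path.read_text().strip()
    path.write_text("enabled" if enabled else "disabled")
    return previous
```

Calling set_wakeup(False) before suspending corresponds to the experiment described above. It only hides the symptom and also prevents the keyboard from waking the system.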

> 
> Can you please add this patch as well to your kernel and share full log with
> pm_debug_messages set?  I wonder if in your circumstance you have more than
> 2 sources.
Comment 7 Kai-Heng Feng 2022-07-26 02:34:37 UTC
Created attachment 301486 [details]
dmesg with the patch applied
Comment 8 Mario Limonciello (AMD) 2022-07-26 02:39:37 UTC
Thanks.  It was a bit of a long shot that this helped this design, but that commit does fix some IRQ1 related problems on other designs.

I'll send up this patch for the extra debugging message separately for Rafael to take a look at, I think it's useful for issues like this.

I still do think this issue at its core is a platform firmware issue not a kernel issue.  One of 3 things to me:
1) Either the EC asserting i8042
2) The polarity is wrong for IRQ1 (like I mentioned for that coreboot design in AMD gitlab issue).
3) Some other source in this design asserting IRQ1 that is not EC.
Comment 9 Kai-Heng Feng 2022-07-26 05:46:27 UTC
(In reply to Mario Limonciello (AMD) from comment #8)
> Thanks.  It was a bit of a long shot that this helped this design, but that
> commit does fix some IRQ1 related problems on other designs.
> 
> I'll send up this patch for the extra debugging message separately for
> Rafael to take a look at, I think it's useful for issues like this.

Yes this will be quite useful.

> 
> I still do think this issue at its core is a platform firmware issue not a
> kernel issue.  One of 3 things to me:
> 1) Either the EC asserting i8042

Or maybe it's from the IO-APIC? The EC folks guaranteed that i8042 doesn't raise the IRQ.

> 2) The polarity is wrong for IRQ1 (like I mentioned for that coreboot design
> in AMD gitlab issue).

If the polarity were wrong, the keyboard wouldn't work at all. So I don't think that's the case here.

> 3) Some other source in this design asserting IRQ1 that is not EC.
I think AMD Taipei is trying to find the root cause here.
Comment 10 Mario Limonciello (AMD) 2022-07-26 05:50:04 UTC
> If polarity is wrong, the keyboard won't work at all. So I don't think it's
> the case here.

Actually - It can be set in either direction as long as it's consistent with the rest of the platform firmware configuration.  If another source that is part of the AND gate to IRQ1 has a different polarity coming out of s0i3 then that could lead to this mismatch.  That is why it makes sense to investigate the IO-APIC configuration.
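
As a purely illustrative toy model of that argument (the real routing on this platform is not described in this thread): assume every source feeding IRQ1 is active-low and the sources are combined with an AND gate, so the line reads as asserted whenever any input is low. The source names below are hypothetical.

```python
def irq1_asserted(source_levels: dict) -> bool:
    """Toy model: active-low inputs combined by an AND gate.

    Each value is the electrical level of one source (True = high).
    The line is asserted when the AND of all inputs is low.
    """
    return not all(source_levels.values())


# All sources idle high: nothing is asserted.
idle = {"ec_keyboard": True, "other_source": True}

# One source idles low after s0i3 because its polarity does not match
# the rest of the configuration: IRQ1 looks asserted with no key pressed.
mismatched = {"ec_keyboard": True, "other_source": False}
```

In this model irq1_asserted(idle) is False while irq1_asserted(mismatched) is True, which is the kind of mismatch that would show up as a keyboard interrupt carrying no data.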
Comment 11 Kai-Heng Feng 2022-07-27 01:23:27 UTC
(In reply to Mario Limonciello (AMD) from comment #10)
> > If polarity is wrong, the keyboard won't work at all. So I don't think it's
> > the case here.
> 
> Actually - It can be set in either direction as long as it's consistent with
> rest of platform firmware configuration.  If another source that is part of
> the AND gate to IRQ1 has a different polarity coming out of s0i3 then that
> could lead to this mismatch.  That is why it makes sense to investigate the
> IO-APIC configuration.

Shouldn't IRQ1 be exclusive to i8042? Does APIC use the same hardware IRQ line for different IRQs?
Comment 12 Mario Limonciello (AMD) 2022-07-27 01:38:57 UTC
> Shouldn't IRQ1 be exclusive to i8042? Does APIC use the same hardware IRQ
> line for different IRQs?

It depends on OEM's design.
Comment 13 Kai-Heng Feng 2022-07-28 02:50:44 UTC
(In reply to Mario Limonciello (AMD) from comment #12)
> > Shouldn't IRQ1 be exclusive to i8042? Does APIC use the same hardware IRQ
> > line for different IRQs?
> 
> It depends on OEM's design.

I was told that the issue also happens on AMD's CRB, so what is the other interrupt source that shares IRQ 1?
Comment 14 Mario Limonciello (AMD) 2022-07-28 05:27:01 UTC
That's not as straightforward a question as you may think.  A bunch of non-obvious devices can also generate IRQ 1 such as PCI SMBUS controller.

CRB doesn't generate EC events during S0i3 very frequently due to using _BTP (which many HP designs don't use).  I've not seen it first hand on CRB myself.

If it can also be reproduced on the CRB, it needs to be dug into by the BIOS guys, who can look more closely.
Comment 15 Kai-Heng Feng 2022-07-28 23:48:44 UTC
(In reply to Mario Limonciello (AMD) from comment #14)
> That's not as straightforward a question as you may think.  A bunch of
> non-obvious devices can also generate IRQ 1 such as PCI SMBUS controller.

I didn't know that.

So when PCI SMBUS controller raises IRQ, the kernel calls i8042's IRQ handler as usual?

> 
> CRB doesn't generate EC events during S0i3 very frequently due to using _BTP
> (which many HP designs don't use).  I've not seen it first hand on CRB
> myself.
> 
> If it can also be reproduced on the CRB, it needs to be dug into by the BIOS
> guys, who can look more closely.

I think they are getting close to the root cause, are you in the discussion loop?
Comment 16 Mario Limonciello (AMD) 2022-07-29 00:08:55 UTC
> So when PCI SMBUS controller raises IRQ, the kernel calls i8042's IRQ handler
> as usual?

In this case, that's what I would expect to happen.  It's just like a shared IRQ in the kernel between two drivers.
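
A toy model of shared-IRQ dispatch, for illustration only (this is not kernel code, and all names are hypothetical): every handler registered on the line is called, and each one checks its own device and reports whether the interrupt was its own. The i8042 handler here loosely mirrors the driver's behaviour of logging "without any data" when its output buffer is empty.

```python
from typing import Callable, Dict, List

Handler = Callable[[], bool]  # returns True if its device raised the IRQ


def make_i8042_handler(output_buffer_full: bool, log: List[str]) -> Handler:
    """Build a handler that claims the IRQ only if keyboard data is pending."""
    def handler() -> bool:
        if not output_buffer_full:
            log.append("i8042: Interrupt 1, without any data")
            return False
        return True
    return handler


def dispatch_shared_irq(handlers: Dict[str, Handler]) -> Dict[str, bool]:
    """Call every handler sharing the line and record which ones claimed it."""
    return {name: handler() for name, handler in handlers.items()}
```

With an empty keyboard buffer and a second source that did raise the line, the i8042 handler still runs, logs the message, and reports that the interrupt was not its own.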

> I think they are getting close to the root cause, are you in the discussion
> loop?

No, but I'll reach out to some of them to find out more.
Comment 17 Mario Limonciello (AMD) 2022-07-29 22:44:55 UTC
Having looked over that discussion, I'm confident this is a hardware/firmware bug; there is nothing for Linux to do here unless a workaround is bandaged in for this system.
Comment 18 Kai-Heng Feng 2022-07-30 06:36:55 UTC
(In reply to Mario Limonciello (AMD) from comment #17)
> Having looked over that discussion, I'm confident this is a
> hardware/firmware bug; there is nothing for Linux to do here unless a
> workaround is bandaged in for this system.

Is it possible to have an erratum for this issue?
Comment 19 Mario Limonciello (AMD) 2022-07-31 13:58:48 UTC
I don't know.  It depends on where the bug ends up living.  All I'm saying is that it's not a Linux bug, so I'm closing this issue.