Bug 216295
Summary: | Spurious wakeup from s2idle caused by AER | ||
---|---|---|---|
Product: | Drivers | Reporter: | Kai-Heng Feng (kai.heng.feng) |
Component: | PCI | Assignee: | drivers_pci (drivers_pci) |
Status: | NEW --- | ||
Severity: | normal | CC: | bjorn, mario.limonciello, wse |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | mainline, linux-next | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
dmesg
dmesg with debug message
lspci
dmesg
lspci
dmesg with target power state reordered
proposed patch |
Created attachment 301498 [details]
dmesg with debug message
Test setup: pm_async = 0, pm_debug_messages = 1, dynamic debug enabled for PCI, IRQ numbers printed in pm_system_irq_wakeup(), and pci_aer_clear_status() removed from pci_restore_state().
Spurious IRQ when root port 01.0 is set to D3cold:
[ 105.756581] pcieport 0000:00:01.0: PME# enabled
[ 106.233587] PM: DEBUG: pm_system_irq_wakeup 122 0
[ 106.324125] pcieport 0000:00:01.0: power state changed by ACPI to D3cold
[ 106.324135] pcieport 0000:00:01.0: PCI PM: Suspend power state: D3cold
The ACPI SCI event shouldn't wake the system up, but since IRQ 122 is already pending, a spurious wakeup occurred:
[ 106.327529] PM: DEBUG: pm_system_irq_wakeup 122 9
[ 106.329297] PM: suspend-to-idle
[ 106.329456] ACPI: EC: ACPI EC GPE status set
[ 106.329475] ACPI: PM: Wakeup after ACPI Notify sync
[ 106.329476] PM: resume from suspend-to-idle
[ 106.330920] ACPI: EC: interrupt unblocked
The error printed by the AER service's ISR:
[ 106.808712] pcieport 0000:00:01.0: AER: Corrected error received: 0000:00:01.0
[ 106.808727] pcieport 0000:00:01.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[ 106.808731] pcieport 0000:00:01.0: device [8086:4c01] error status/mask=00000001/00002000
[ 106.808737] pcieport 0000:00:01.0: [ 0] RxErr
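For readers who want to decode the "error status/mask" pair in the AER lines above, here is a minimal Python sketch. It assumes the bit positions of the PCIe Correctable Error Status register and the short names the kernel's AER driver prints for them (such as "RxErr"); the function names are made up for illustration and are not part of any kernel tooling.

```python
import re

# Bit position -> short name, as printed by the kernel AER driver for
# correctable errors. Unlisted bits are reported as "bitN".
CORRECTABLE_BITS = {
    0: "RxErr",
    6: "BadTLP",
    7: "BadDLLP",
    8: "Rollover",
    12: "Timeout",
    13: "NonFatalErr",
    14: "CorrIntErr",
    15: "HeaderOF",
}

STATUS_MASK_RE = re.compile(
    r"error status/mask=([0-9a-fA-F]{8})/([0-9a-fA-F]{8})"
)


def decode_correctable(word):
    """Return the names of all bits set in a 32-bit register value."""
    return [
        CORRECTABLE_BITS.get(bit, f"bit{bit}")
        for bit in range(32)
        if word & (1 << bit)
    ]


def parse_status_mask(line):
    """Decode the status and mask fields of one dmesg line.

    Returns (status_names, mask_names), or None if the line does not
    contain an "error status/mask=" field.
    """
    m = STATUS_MASK_RE.search(line)
    if m is None:
        return None
    status = int(m.group(1), 16)
    mask = int(m.group(2), 16)
    return decode_correctable(status), decode_correctable(mask)
```

Applied to the line from this log, the status decodes to RxErr only, and the mask has only the advisory non-fatal bit set, so the receiver error is not masked.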
Created attachment 301499 [details]
lspci
Created attachment 304799 [details]
dmesg
In addition to D3cold, the issue can also be observed in the D3hot case.
Created attachment 304800 [details]
lspci
$ cat /sys/power/pm_wakeup_irq
122

[ 0.838030] pcieport 0000:00:1c.0: PME: Signaling with IRQ 122
[ 0.838127] pcieport 0000:00:1c.0: AER: enabled with IRQ 122
...
[ 697.102377] PM: Triggering wakeup from IRQ 122

Can you double check the PEP constraints for 0000:00:1c.0? You can turn on dynamic debugging for drivers/acpi/x86/s2idle.c on kernel command line and they'll be printed in your dmesg.

It's unlikely; but if any of them are /not/ aiming for D3hot/D3cold at suspend then my patch series for a similar issue of wrong states at suspend https://lore.kernel.org/linux-pci/20230809185453.40916-1-mario.limonciello@amd.com/T/#t may help.

And yes I saw that some of that is enabled in your most recent dmesg, but the constraints enumeration happens at startup, so it needs to be on your kernel command line.

https://github.com/torvalds/linux/blob/v6.5-rc5/drivers/acpi/x86/s2idle.c#L270

(In reply to Mario Limonciello (AMD) from comment #6)
> Can you double check the PEP constraints for 0000:00:1c.0? You can turn on
> dynamic debugging for drivers/acpi/x86/s2idle.c on kernel command line and
> they'll be printed in your dmesg.
>
> It's unlikely; but if any of them are /not/ aiming for D3hot/D3cold at
> suspend then my patch series for a similar issue of wrong states at suspend
> https://lore.kernel.org/linux-pci/20230809185453.40916-1-mario.limonciello@amd.com/T/#t
> may help.

This series doesn't help.

Thanks for confirming. It was a long shot for your issue.

(In reply to Mario Limonciello (AMD) from comment #7)
> And yes I saw that some of that is enabled in your most recent dmesg, but
> the constraints enumeration happens at startup, so it needs to be on your
> kernel command line.
>
> https://github.com/torvalds/linux/blob/v6.5-rc5/drivers/acpi/x86/s2idle.c#L270

[ 0.760646] ACPI: \_SB_.PEPD: index:2 Name:\_SB.PR02
[ 0.760647] ACPI: \_SB_.PEPD: uid:255 min_dstate:D0

Interesting... Is that device considered 'ACPI power manageable' by the kernel?

If it is then re-ordering the last patch in the series to prefer constraints as first choice might actually change things as it would prevent it from going into D3 (the constraints don't say it needs to).

(In reply to Mario Limonciello (AMD) from comment #11)
> Interesting... Is that device considered 'ACPI power manageable' by the
> kernel?

Yes, the root port has _PS0 and _PS3 methods so it's considered power manageable.

> If it is then re-ordering the last patch in the series to prefer constraints
> as first choice might actually change things as it would prevent it from
> going into D3 (the constraints don't say it needs to).

Reordering can keep the root port at D0. However the same issue can still be observed.

Created attachment 304815 [details]
dmesg with target power state reordered
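The link between the pm_wakeup_irq value and the port services quoted earlier in this thread (PME and AER both registered on IRQ 122) can be cross-checked from a saved dmesg. The following is a rough Python sketch, assuming the message formats match the lines quoted in this report; the helper names are hypothetical.

```python
import re

# "pcieport <bdf>: PME: Signaling with IRQ N" / "... AER: enabled with IRQ N"
SERVICE_RE = re.compile(r"pcieport (\S+): (PME|AER): .*\bIRQ (\d+)")
# "PM: Triggering wakeup from IRQ N"
WAKEUP_RE = re.compile(r"PM: Triggering wakeup from IRQ (\d+)")


def services_by_irq(dmesg_text):
    """Map each IRQ number to the (device, service) pairs registered on it."""
    table = {}
    for line in dmesg_text.splitlines():
        m = SERVICE_RE.search(line)
        if m:
            table.setdefault(int(m.group(3)), []).append(
                (m.group(1), m.group(2))
            )
    return table


def wakeup_sources(dmesg_text):
    """List every reported wakeup IRQ with the port services sharing it."""
    table = services_by_irq(dmesg_text)
    sources = []
    for line in dmesg_text.splitlines():
        m = WAKEUP_RE.search(line)
        if m:
            irq = int(m.group(1))
            sources.append((irq, table.get(irq, [])))
    return sources
```

A wakeup IRQ that lists both PME and AER for the same port only shows that the two services share the vector; it does not by itself tell which of them raised the interrupt.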
Comment on attachment 304815 [details]
dmesg with target power state reordered

Thanks, so this isn't the solution for your issue then. I'm curious though; with it re-ordered and your AER patch in place, do you get to deepest state?

It would keep several of your root ports at D0, and if that still works for you I might change the series as well.

> [ 1.148467] pcieport 0000:00:1c.0: AER: Corrected error received:
> 0000:01:00.0

Looking at the log, I notice that you have AER happening even at bootup. Is something wrong with the card reader or card reader driver perhaps?

> It would keep several of your root ports at D0, and if that still works for
> you I might change the series as well.

But FWIW if it does work for you, it at least needs some more consideration for my systems. I've found that moving it earlier leads to some devices that should be in D3cold over s2idle being put into D3hot which causes major problems.
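To check which PEP constraints are "not aiming for D3hot/D3cold", as asked earlier in this thread, the constraint lines from a boot-time dmesg can be paired up and filtered. This is only a sketch in Python: it assumes each constraint is printed as an "index:… Name:…" line followed by a "uid:… min_dstate:…" line, as in the excerpt quoted above, and that D3 states are spelled "D3hot" and "D3cold". The function names are illustrative.

```python
import re

NAME_RE = re.compile(r"index:(\d+) Name:(\S+)")
STATE_RE = re.compile(r"uid:(\d+) min_dstate:(\S+)")


def parse_constraints(dmesg_text):
    """Pair each constraint name with the min_dstate line that follows it."""
    constraints = []
    pending_name = None
    for line in dmesg_text.splitlines():
        m = NAME_RE.search(line)
        if m:
            pending_name = m.group(2)
            continue
        m = STATE_RE.search(line)
        if m and pending_name is not None:
            constraints.append((pending_name, m.group(2)))
            pending_name = None
    return constraints


def not_aiming_for_d3(constraints):
    """Keep only constraints whose minimum D-state is not D3hot or D3cold."""
    return [
        (name, state)
        for name, state in constraints
        if state not in ("D3hot", "D3cold")
    ]
```

For the two lines posted in this report, the result is a single entry, \_SB.PR02 with min_dstate D0.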
*** Bug 217082 has been marked as a duplicate of this bug. ***

Coming from https://bugzilla.kernel.org/show_bug.cgi?id=217082

Let me know if I can help debug this.

(In reply to Werner Sembach [TUXEDO] from comment #17)
> Coming from https://bugzilla.kernel.org/show_bug.cgi?id=217082
>
> Let me know if I can help debug this.

Is your case caused by NVIDIA GFX?

It is caused by "PEG1" but I don't know what that device actually is.

Created attachment 306474 [details]
proposed patch
Proposed patch for this issue, based on v6.10-rc1. Would love to hear any testing results.
Confirmed the patch solves my issue. |
Created attachment 301497 [details]
dmesg

[ 248.265121] PM: suspend-to-idle
[ 248.265280] ACPI: EC: ACPI EC GPE status set
[ 248.265303] ACPI: PM: Wakeup after ACPI Notify sync
[ 248.265305] PM: resume from suspend-to-idle
[ 248.269770] ACPI: EC: interrupt unblocked