Bug 204413

Summary: [bisected][regression] "PCI: Add missing link delays" causes regression on resume from suspend to ram
Product: Drivers Reporter: Matthias Andree (matthias.andree)
Component: PCIAssignee: drivers_pci (drivers_pci)
Status: NEW ---    
Severity: normal CC: kairllur, mika.westerberg, nicholas.johnson-opensource, pmenzel+bugzilla.kernel.org
Priority: P1    
Hardware: All   
OS: Linux   
See Also: https://bugzilla.redhat.com/show_bug.cgi?id=1737046
Kernel Version: 5.1.20, 5.2.5, 5.3-rc3 Subsystem:
Regression: Yes Bisected commit-id:
Attachments: root user's lspci -vv on modified 5.2.5 kernel.
Nicholas Johnson - "sudo lspci -vv" output - by request of Mika Westerberg
Nicholas Johnson - dmesg with "sudo lspci -vv" by request of Mika Westerberg
Patch to force 100ms always for the delay
Don't use link active reporting for the delay
pci_info() instrumentation spray all over PCIe link delay code
/proc/interrupts before/after the suspend with unmodified 5.3.0-rc3.
lspci -vv on 5.3.0-rc3 before/after, as .tar.gz
.tar.xz (!) tarball of dmesg-after-suspend, and before/after lspci and /proc/interrupts
Disable PME interrupt during suspend/resume
interrupts before/after suspend
PME debugging
debug logs with 284245
Updated PME debugging
Fixed updated PME debugging
usual debug logs w/ 5.3-rc3 and patch 284253
tree lspci and inxi from affected system

Description Matthias Andree 2019-08-02 16:26:45 UTC
Description of problem:
vanilla 5.1.20 on x86_64 fails to wake from suspend (STR),
Fedora and vanilla 5.1.19 and prior were fine. 
5.2.5 (from Fedora's-200.fc30) also fails in a different way (spurious PME interrupts on pcie).

How reproducible:
always

Steps to Reproduce:
1. boot Fedora 30 and log into GNOME desktop
2. click pause symbol to suspend the computer to RAM, wait until suspended
3. press key on keyboard, or power button

Actual results:
computer tries to wake up, HDD LED blinks a bit, but console does not wake. Other computer on network cannot ping the waking computer.

"sync" hangs in D deep sleep for long amounts of time. 

Expected results:
computer wakes up properly and continues to use its devices.

Additional info:
PM tracing was enabled, the next boot returned
[    0.827930] PM:   hash matches drivers/base/power/main.c:1021

It appears that suspend to disk still works.

Computer has an NVIDIA GeForce 1060 PCIe graphics board, but 5.1.19 and prior would suspend properly, and the 5.1.20 and 5.2.5 suspend issues also occur if nvidia kernel modules are renamed out of the way and nouveau remains blocked, so it's not an nvidia driver issue.

I have "git bisect"ed this on the vanilla stable kernel, the stable/linux-5.1.y branch (because I have had starting points 5.1.19 and 5.1.20 there).
The failure-inducing commit on the branch is 3c795a8e3481e4dec071b5956e7177e816f6e7f1 (see below), which got picked from 
master's c2bf1fc212f7e6f25ace1af8f0b3ac061ea48ba5, (merged through cf2d213e49fdf47e4c10dc629a3659e0026a54b8, v5.3-rc1~167)
and also got picked to stable/linux-5.2.y 5817d78eba34f6c86f5462ae2c5212f80a013357 (v5.2.3~291).

Sasha Levin's signoff is only on the stable branches, not on master.

------------------------------------------------------------
commit 3c795a8e3481e4dec071b5956e7177e816f6e7f1 (refs/bisect/bad)
Author: Mika Westerberg <mika.westerberg@linux.intel.com>  2019-06-12 12:57:38
Committer: Greg Kroah-Hartman <gregkh@linuxfoundation.org>  2019-07-26 09:12:37
Parent: 70cc29dba925b8a99a4917c2b5fa6702d0d496d1 (bpf: fix callees pruning callers)
Child:  a98c15177f72ae3c0a736bb324e66c279bf94899 (net: netsec: initialize tx ring on ndo_open)
Branch: remotes/stable/linux-5.1.y
Follows: v5.1.19
Precedes: v5.1.20

    PCI: Add missing link delays required by the PCIe spec
    
    [ Upstream commit c2bf1fc212f7e6f25ace1af8f0b3ac061ea48ba5 ]
    
    Currently Linux does not follow PCIe spec regarding the required delays
    after reset. A concrete example is a Thunderbolt add-in-card that
    consists of a PCIe switch and two PCIe endpoints:
    
      +-1b.0-[01-6b]----00.0-[02-6b]--+-00.0-[03]----00.0 TBT controller
                                      +-01.0-[04-36]-- DS hotplug port
                                      +-02.0-[37]----00.0 xHCI controller
                                      \-04.0-[38-6b]-- DS hotplug port
    
    The root port (1b.0) and the PCIe switch downstream ports are all PCIe
    gen3 so they support 8GT/s link speeds.
    
    We wait for the PCIe hierarchy to enter D3cold (runtime):
    
      pcieport 0000:00:1b.0: power state changed by ACPI to D3cold
    
    When it wakes up from D3cold, according to the PCIe 4.0 section 5.8 the
    PCIe switch is put to reset and its power is re-applied. This means that
    we must follow the rules in PCIe 4.0 section 6.6.1.
[...]
    Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 drivers/pci/pci.c               | 29 +++++++++++++++++++----------
 drivers/pci/pci.h               |  1 +
 drivers/pci/pcie/portdrv_core.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 86 insertions(+), 10 deletions(-)
Comment 1 Matthias Andree 2019-08-02 17:37:56 UTC
$ lspci -vvt
-[0000:00]-+-00.0  Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Root Complex
           +-00.2  Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) I/O Memory Management Unit
           +-01.0  Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
           +-01.3-[03-25]--+-00.0  Advanced Micro Devices, Inc. [AMD] X370 Series Chipset USB 3.1 xHCI Controller
           |               +-00.1  Advanced Micro Devices, Inc. [AMD] X370 Series Chipset SATA Controller
           |               \-00.2-[16-25]--+-00.0-[17]--
           |                               +-01.0-[21]----00.0  Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
           |                               +-02.0-[22]--
           |                               +-03.0-[23]--
           |                               +-04.0-[24]--
           |                               \-08.0-[25]----00.0  ASMedia Technology Inc. ASM2142 USB 3.1 Host Controller
           +-02.0  Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
           +-03.0  Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
           +-03.1-[26]--+-00.0  NVIDIA Corporation GP106 [GeForce GTX 1060 6GB]
           |            \-00.1  NVIDIA Corporation GP106 High Definition Audio Controller
           +-04.0  Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
           +-07.0  Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
           +-07.1-[27]--+-00.0  Advanced Micro Devices, Inc. [AMD] Zeppelin/Raven/Raven2 PCIe Dummy Function
           |            +-00.2  Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Platform Security Processor
           |            \-00.3  Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) USB 3.0 Host Controller
           +-08.0  Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
           +-08.1-[28]--+-00.0  Advanced Micro Devices, Inc. [AMD] Zeppelin/Renoir PCIe Dummy Function
           |            +-00.2  Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode]
           |            \-00.3  Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) HD Audio Controller
           +-14.0  Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller
           +-14.3  Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge
           +-18.0  Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 0
           +-18.1  Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 1
           +-18.2  Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 2
           +-18.3  Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 3
           +-18.4  Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 4
           +-18.5  Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 5
           +-18.6  Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 6
           \-18.7  Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7


$ lspci -vv
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Root Complex
	Subsystem: Micro-Star International Co., Ltd. [MSI] Device 7a33
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) I/O Memory Management Unit
	Subsystem: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) I/O Memory Management Unit
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin ? routed to IRQ 27
	Capabilities: <access denied>

00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

00:01.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin ? routed to IRQ 28
	Bus: primary=00, secondary=03, subordinate=25, sec-latency=0
	I/O behind bridge: 0000f000-0000ffff [size=4K]
	Memory behind bridge: f7500000-f77fffff [size=3M]
	Prefetchable memory behind bridge: None
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
	BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: <access denied>
	Kernel driver in use: pcieport

00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

00:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

00:03.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin ? routed to IRQ 29
	Bus: primary=00, secondary=26, subordinate=26, sec-latency=0
	I/O behind bridge: 0000e000-0000efff [size=4K]
	Memory behind bridge: f6000000-f70fffff [size=17M]
	Prefetchable memory behind bridge: 00000000e0000000-00000000f1ffffff [size=288M]
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR+ NoISA- VGA+ VGA16+ MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: <access denied>
	Kernel driver in use: pcieport

00:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

00:07.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

00:07.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 30
	Bus: primary=00, secondary=27, subordinate=27, sec-latency=0
	I/O behind bridge: None
	Memory behind bridge: f7200000-f74fffff [size=3M]
	Prefetchable memory behind bridge: None
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: <access denied>
	Kernel driver in use: pcieport

00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 31
	Bus: primary=00, secondary=28, subordinate=28, sec-latency=0
	I/O behind bridge: None
	Memory behind bridge: f7800000-f78fffff [size=1M]
	Prefetchable memory behind bridge: None
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: <access denied>
	Kernel driver in use: pcieport

00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 59)
	Subsystem: Micro-Star International Co., Ltd. [MSI] Device 7a33
	Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap- 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Kernel modules: sp5100_tco

00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
	Subsystem: Micro-Star International Co., Ltd. [MSI] Device 7a33
	Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0

00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 0
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 1
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 2
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 3
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Kernel driver in use: k10temp
	Kernel modules: k10temp

00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 4
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 5
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 6
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

03:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] X370 Series Chipset USB 3.1 xHCI Controller (rev 02) (prog-if 30 [XHCI])
	Subsystem: ASMedia Technology Inc. Device 1142
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 44
	Region 0: Memory at f77a0000 (64-bit, non-prefetchable) [size=32K]
	Capabilities: <access denied>
	Kernel driver in use: xhci_hcd

03:00.1 SATA controller: Advanced Micro Devices, Inc. [AMD] X370 Series Chipset SATA Controller (rev 02) (prog-if 01 [AHCI 1.0])
	Subsystem: ASMedia Technology Inc. Device 1062
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin B routed to IRQ 41
	Region 5: Memory at f7780000 (32-bit, non-prefetchable) [size=128K]
	Expansion ROM at f7700000 [disabled] [size=512K]
	Capabilities: <access denied>
	Kernel driver in use: ahci

03:00.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] X370 Series Chipset PCIe Upstream Port (rev 02) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin C routed to IRQ 32
	Bus: primary=03, secondary=16, subordinate=25, sec-latency=0
	I/O behind bridge: 0000f000-0000ffff [size=4K]
	Memory behind bridge: f7500000-f76fffff [size=2M]
	Prefetchable memory behind bridge: None
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: <access denied>
	Kernel driver in use: pcieport

16:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port (rev 02) (prog-if 00 [Normal decode])
	DeviceName: Broadcom 5762
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 33
	Bus: primary=16, secondary=17, subordinate=17, sec-latency=0
	I/O behind bridge: None
	Memory behind bridge: None
	Prefetchable memory behind bridge: None
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: <access denied>
	Kernel driver in use: pcieport

16:01.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port (rev 02) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 35
	Bus: primary=16, secondary=21, subordinate=21, sec-latency=0
	I/O behind bridge: 0000f000-0000ffff [size=4K]
	Memory behind bridge: f7600000-f76fffff [size=1M]
	Prefetchable memory behind bridge: None
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: <access denied>
	Kernel driver in use: pcieport

16:02.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port (rev 02) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 36
	Bus: primary=16, secondary=22, subordinate=22, sec-latency=0
	I/O behind bridge: None
	Memory behind bridge: None
	Prefetchable memory behind bridge: None
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: <access denied>
	Kernel driver in use: pcieport

16:03.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port (rev 02) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 38
	Bus: primary=16, secondary=23, subordinate=23, sec-latency=0
	I/O behind bridge: None
	Memory behind bridge: None
	Prefetchable memory behind bridge: None
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: <access denied>
	Kernel driver in use: pcieport

16:04.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port (rev 02) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 39
	Bus: primary=16, secondary=24, subordinate=24, sec-latency=0
	I/O behind bridge: None
	Memory behind bridge: None
	Prefetchable memory behind bridge: None
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: <access denied>
	Kernel driver in use: pcieport

16:08.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port (rev 02) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 40
	Bus: primary=16, secondary=25, subordinate=25, sec-latency=0
	I/O behind bridge: None
	Memory behind bridge: f7500000-f75fffff [size=1M]
	Prefetchable memory behind bridge: None
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: <access denied>
	Kernel driver in use: pcieport

21:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
	Subsystem: Micro-Star International Co., Ltd. [MSI] Device 7a33
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 34
	Region 0: I/O ports at f000 [size=256]
	Region 2: Memory at f7604000 (64-bit, non-prefetchable) [size=4K]
	Region 4: Memory at f7600000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: <access denied>
	Kernel driver in use: r8169
	Kernel modules: r8169

25:00.0 USB controller: ASMedia Technology Inc. ASM2142 USB 3.1 Host Controller (prog-if 30 [XHCI])
	Subsystem: Micro-Star International Co., Ltd. [MSI] Device 7a33
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 45
	Region 0: Memory at f7500000 (64-bit, non-prefetchable) [size=32K]
	Capabilities: <access denied>
	Kernel driver in use: xhci_hcd

26:00.0 VGA compatible controller: NVIDIA Corporation GP106 [GeForce GTX 1060 6GB] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: ZOTAC International (MCO) Ltd. Device 1438
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 56
	Region 0: Memory at f6000000 (32-bit, non-prefetchable) [size=16M]
	Region 1: Memory at e0000000 (64-bit, prefetchable) [size=256M]
	Region 3: Memory at f0000000 (64-bit, prefetchable) [size=32M]
	Region 5: I/O ports at e000 [size=128]
	[virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel driver in use: nvidia
	Kernel modules: nvidia_drm, nvidia

26:00.1 Audio device: NVIDIA Corporation GP106 High Definition Audio Controller (rev a1)
	Subsystem: ZOTAC International (MCO) Ltd. Device 1438
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin B routed to IRQ 53
	Region 0: Memory at f7080000 (32-bit, non-prefetchable) [size=16K]
	Capabilities: <access denied>
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel

27:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Raven/Raven2 PCIe Dummy Function
	Subsystem: Advanced Micro Devices, Inc. [AMD] Zeppelin/Raven/Raven2 PCIe Dummy Function
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Capabilities: <access denied>

27:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Platform Security Processor
	Subsystem: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Platform Security Processor
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin B routed to IRQ 49
	Region 2: Memory at f7300000 (32-bit, non-prefetchable) [size=1M]
	Region 5: Memory at f7400000 (32-bit, non-prefetchable) [size=8K]
	Capabilities: <access denied>
	Kernel driver in use: ccp
	Kernel modules: ccp

27:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) USB 3.0 Host Controller (prog-if 30 [XHCI])
	Subsystem: Micro-Star International Co., Ltd. [MSI] Device 7a33
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin C routed to IRQ 47
	Region 0: Memory at f7200000 (64-bit, non-prefetchable) [size=1M]
	Capabilities: <access denied>
	Kernel driver in use: xhci_hcd

28:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Renoir PCIe Dummy Function
	Subsystem: Advanced Micro Devices, Inc. [AMD] Zeppelin/Renoir PCIe Dummy Function
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Capabilities: <access denied>

28:00.2 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51) (prog-if 01 [AHCI 1.0])
	Subsystem: Micro-Star International Co., Ltd. [MSI] Device 7a33
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin B routed to IRQ 43
	Region 5: Memory at f7808000 (32-bit, non-prefetchable) [size=4K]
	Capabilities: <access denied>
	Kernel driver in use: ahci

28:00.3 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) HD Audio Controller
	DeviceName: Realtek ALC898
	Subsystem: Micro-Star International Co., Ltd. [MSI] Device fa33
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin C routed to IRQ 55
	Region 0: Memory at f7800000 (32-bit, non-prefetchable) [size=32K]
	Capabilities: <access denied>
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel
Comment 2 Matthias Andree 2019-08-02 18:28:37 UTC
This issue also affects 5.3-rc2, which, after resume from STR, spams my logs with an infinite loop of these at few-ms distance (while interspersing missing hundreds of missed kernel messages):

pcieport 0000:00:01.3: PME: Spurious native interrupt!



00:01.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0, IRQ 28
	Bus: primary=00, secondary=03, subordinate=25, sec-latency=0
	I/O behind bridge: 0000f000-0000ffff [size=4K]
	Memory behind bridge: f7500000-f77fffff [size=3M]
	Prefetchable memory behind bridge: None
	Capabilities: [50] Power Management version 3
	Capabilities: [58] Express Root Port (Slot+), MSI 00
	Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [c0] Subsystem: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge
	Capabilities: [c8] HyperTransport: MSI Mapping Enable+ Fixed+
	Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
	Capabilities: [150] Advanced Error Reporting
	Capabilities: [270] Secondary PCI Express <?>
	Capabilities: [2a0] Access Control Services
	Capabilities: [370] L1 PM Substates
	Capabilities: [3c4] Designated Vendor-Specific <?>
	Kernel driver in use: pcieport
Comment 3 Matthias Andree 2019-08-02 23:16:37 UTC
Sorry for being unclear:
- 5.1.20, 5.2.5, 5.3-rc2 all become unusable in a similar vein after wakeup, disk and net I/O stall (no more writes after wakeup, no ICMP ping) 
- 5.2.5 and 5.3-rc2 additionally log the PME interrupt storm to console.

Reverting the relevant commit of the "PCI: Add missing link delays required by the PCIe spec" 'fixes' resuming from ACPI S3/STR suspend for all three kernel versions for me, and fixes the interrupt (log) storm.
Comment 4 Mika Westerberg 2019-08-05 12:45:41 UTC
Can you attach output of 'sudo lspci -vv' to the bug? The dump in comment #1 misses capabilities etc.
Comment 5 Matthias Andree 2019-08-05 13:56:10 UTC
Created attachment 284193 [details]
root user's lspci -vv on modified 5.2.5 kernel.
Comment 6 Matthias Andree 2019-08-05 13:57:16 UTC
This is 5.3-rc2 with only one modification, reverted  00ebf1348cb332941dab52948f29480592bfbe6a as proposed on linux-pci@ by Mika:

[    0.698739] pcieport 0000:00:01.3: Signaling PME with IRQ 28
[    0.698799] pcieport 0000:00:01.3: AER: enabled with IRQ 28
[    0.698966] pcieport 0000:00:03.1: Signaling PME with IRQ 29
[    0.699017] pcieport 0000:00:03.1: AER: enabled with IRQ 29
[    0.699188] pcieport 0000:00:07.1: Signaling PME with IRQ 30
[    0.699230] pcieport 0000:00:07.1: AER: enabled with IRQ 30
[    0.699816] pcieport 0000:00:08.1: Signaling PME with IRQ 31
[    0.699860] pcieport 0000:00:08.1: AER: enabled with IRQ 31
[  119.637492] pcieport 0000:00:03.1: waiting downstream link for 100 ms
[  119.649285] pcieport 0000:00:08.1: waiting downstream link for 100 ms
[  119.649287] pcieport 0000:00:07.1: waiting downstream link for 100 ms
[  119.649376] pcieport 0000:00:01.3: waiting downstream link for 100 ms
[  119.803025] pcieport 0000:16:08.0: waiting downstream link for 100 ms
[  119.803031] pcieport 0000:16:01.0: waiting downstream link for 100 ms
Comment 7 Nicholas Johnson 2019-08-05 14:17:25 UTC
Created attachment 284195 [details]
Nicholas Johnson - "sudo lspci -vv" output - by request of Mika Westerberg

I did sudo lspci -vv a couple of times, the second time at about 44 seconds in.
Comment 8 Nicholas Johnson 2019-08-05 14:19:43 UTC
Created attachment 284199 [details]
Nicholas Johnson - dmesg with "sudo lspci -vv" by request of Mika Westerberg
Comment 9 Mika Westerberg 2019-08-05 16:07:52 UTC
Created attachment 284203 [details]
Patch to force 100ms always for the delay

Can you try this patch and see if it makes any difference?
Comment 10 Matthias Andree 2019-08-05 19:40:12 UTC
(In reply to Mika Westerberg from comment #9)
> Created attachment 284203 [details]
> Patch to force 100ms always for the delay
> 
> Can you try this patch and see if it makes any difference?

Mika, I tried the patch from said attachment on top of 5.3-rc3, no improvement (and I did not expect it to - we've had 100 ms before, but I'm not the expert, so I tried).

Again, 100 ms wait. 
Again, "sync &" just hangs in D (along with jbd2/dm-whatever) and ethernet remains broken. I can configure the interface, the route is brought up properly, but communication to the directly connected router remains impossible, as before.

pcieport logging at 130.501... s of uptime, this is only the resume part:

[  128.772900] ACPI: Low-level resume complete
[  128.773267] PM: Restoring platform NVS memory
[  128.774065] Enabling non-boot CPUs ...
[  128.774392] x86: Booting SMP configuration:
[  128.774714] smpboot: Booting Node 0 Processor 1 APIC 0x1
[  128.775261] microcode: CPU1: patch_level=0x08001137
[  128.777758] [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
[  128.781621] CPU1 is up
[  128.781833] smpboot: Booting Node 0 Processor 2 APIC 0x2
[  128.782344] microcode: CPU2: patch_level=0x08001137
[  128.784613] [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
[  128.788451] CPU2 is up
[  128.788650] smpboot: Booting Node 0 Processor 3 APIC 0x3
[  128.789157] microcode: CPU3: patch_level=0x08001137
[  128.791427] [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
[  128.795282] CPU3 is up
[  128.795482] smpboot: Booting Node 0 Processor 4 APIC 0x4
[  128.796130] microcode: CPU4: patch_level=0x08001137
[  128.798404] [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
[  128.802273] CPU4 is up
[  128.802476] smpboot: Booting Node 0 Processor 5 APIC 0x5
[  128.814093] microcode: CPU5: patch_level=0x08001137
[  128.816374] [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
[  128.904909] CPU5 is up
[  128.916068] smpboot: Booting Node 0 Processor 6 APIC 0x6
[  128.927780] microcode: CPU6: patch_level=0x08001137
[  128.930074] [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
[  129.023614] CPU6 is up
[  129.035949] smpboot: Booting Node 0 Processor 7 APIC 0x7
[  129.048582] microcode: CPU7: patch_level=0x08001137
[  129.050862] [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
[  129.148796] CPU7 is up
[  129.161426] smpboot: Booting Node 0 Processor 8 APIC 0x8
[  129.174621] microcode: CPU8: patch_level=0x08001137
[  129.176917] [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
[  129.280952] CPU8 is up
[  129.294652] smpboot: Booting Node 0 Processor 9 APIC 0x9
[  129.308926] microcode: CPU9: patch_level=0x08001137
[  129.311206] [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
[  129.425418] CPU9 is up
[  129.440405] smpboot: Booting Node 0 Processor 10 APIC 0xa
[  129.456123] microcode: CPU10: patch_level=0x08001137
[  129.458420] [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
[  129.582087] CPU10 is up
[  129.598231] smpboot: Booting Node 0 Processor 11 APIC 0xb
[  129.614990] microcode: CPU11: patch_level=0x08001137
[  129.617267] [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
[  129.751382] CPU11 is up
[  129.768662] smpboot: Booting Node 0 Processor 12 APIC 0xc
[  129.786485] microcode: CPU12: patch_level=0x08001137
[  129.788789] [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
[  129.927899] CPU12 is up
[  129.945624] smpboot: Booting Node 0 Processor 13 APIC 0xd
[  129.963689] microcode: CPU13: patch_level=0x08001137
[  129.966010] [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
[  130.105903] CPU13 is up
[  130.123671] smpboot: Booting Node 0 Processor 14 APIC 0xe
[  130.141781] microcode: CPU14: patch_level=0x08001137
[  130.144094] [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
[  130.284339] CPU14 is up
[  130.302152] smpboot: Booting Node 0 Processor 15 APIC 0xf
[  130.320288] microcode: CPU15: patch_level=0x08001137
[  130.322588] [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
[  130.463218] CPU15 is up
[  130.482721] ACPI: Waking up from system sleep state S3
[  130.501733] pcieport 0000:00:03.1: waiting downstream link for 100 ms
[  130.513506] pcieport 0000:00:07.1: waiting downstream link for 100 ms
[  130.513508] pcieport 0000:00:08.1: waiting downstream link for 100 ms
[  130.513602] pcieport 0000:00:01.3: waiting downstream link for 100 ms
[  130.668044] pcieport 0000:16:08.0: waiting downstream link for 100 ms
[  130.668216] pcieport 0000:16:01.0: waiting downstream link for 100 ms
[  130.808622] usb usb1: root hub lost power or was reset
[  130.809074] sd 0:0:0:0: [sda] Starting disk
[  130.809075] sd 2:0:0:0: [sdb] Starting disk
[  130.809248] usb usb2: root hub lost power or was reset
[  130.888838] r8169 0000:21:00.0 eth0: Link is Down
[  131.112856] ata4: SATA link down (SStatus 0 SControl 300)
[  131.112913] ata9: SATA link down (SStatus 0 SControl 300)
[  131.113657] ata2: SATA link down (SStatus 0 SControl 300)
[  131.114806] ata8: SATA link down (SStatus 0 SControl 300)
[  131.190861] usb 2-4: reset SuperSpeed Gen 1 USB device number 3 using xhci_hcd
[  131.267802] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  131.268742] ata1.00: supports DRM functions and may not be fully accessible
[  131.273538] ata1.00: supports DRM functions and may not be fully accessible
[  131.275805] ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[  131.276908] ata1.00: configured for UDMA/133
[  131.277465] ata1.00: Enabling discard_zeroes_data
[  131.283867] ata7.00: configured for UDMA/100
[  131.323796] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  131.340466] ata3.00: configured for UDMA/133
[  131.463910] usb 1-13: reset low-speed USB device number 3 using xhci_hcd
[  131.843780] ata6: failed to resume link (SControl 0)
[  131.844317] ata6: SATA link down (SStatus 0 SControl 0)
[  131.844844] ata5: failed to resume link (SControl 0)
[  131.845367] ata5: SATA link down (SStatus 0 SControl 0)
[  132.080040] usb 1-11: reset low-speed USB device number 2 using xhci_hcd
[  132.465069] PM: resume devices took 1.657 seconds
[  132.465599] OOM killer enabled.
[  132.466122] Restarting tasks ... done.
[  132.470885] rfkill: input handler enabled
[  132.474008] PM: suspend exit
[  132.641171] rfkill: input handler disabled
[  137.987906] rfkill: input handler enabled
[  170.889270] rfkill: input handler disabled
Comment 11 Nicholas Johnson 2019-08-06 04:34:06 UTC
Regarding attachment 284203 [details]: Patch to force 100ms always for the delay

It did not help at all.

$ time lspci -vt 1>/dev/null

real	0m2.313s
user	0m0.005s
sys	0m0.006s

$ time cat /sys/bus/pci/devices/0000\:05\:00.0/config 1>/dev/null

real	0m1.166s
user	0m0.001s
sys	0m0.000s

Bus 05 is under Thunderbolt NHI.

The delay can disappear for short intervals (presumably when the controller is not in a sleep state perhaps). After removing the Thunderbolt controller from bus 03 and rescanning, the delay was gone for the first ten seconds and then it came back.
Comment 12 Mika Westerberg 2019-08-06 09:37:55 UTC
Thanks for checking. Given the amount of issues the commit in question causes, I asked Rafael and Bjorn to revert it for now.
Comment 13 Mika Westerberg 2019-08-06 12:16:05 UTC
Created attachment 284215 [details]
Don't use link active reporting for the delay

Matthias, can you also try the attached patch? I would still like to understand where the issue might be.
Comment 14 Matthias Andree 2019-08-06 13:43:34 UTC
Created attachment 284219 [details]
pci_info() instrumentation spray all over PCIe link delay code

(In reply to Mika Westerberg from comment #13)
> Created attachment 284215 [details]
> Don't use link active reporting for the delay
> 
> Matthias, can you also try the attached patch? I would still like to
> understand where the issue might be.

I have tried 284215, but there is no improvement after resume.
Have we inadvertently removed a delay in another place so that we stall the hardware?

I have also tried a different approach, adding a truckload of pci_info() all over the place to get debug information, with the attached patch. 

Here's what I got (on second attempts, the first one with /sys/power/pm_async == 1 as default, and since that was a mess WRT readability, I made a second attempt with pm_async set to 0), shown here - this is logging after "systemctl suspend". 

Not sure if that helps any or gives us new debugging ideas.

[  103.585190] r8169 0000:21:00.0 eth0: Link is Down
[  103.598086] PM: suspend entry (deep)
[  103.607649] Filesystems sync: 0.009 seconds
[  103.759020] Freezing user space processes ... 
[  103.760071] rfkill: input handler enabled
[  103.761849] (elapsed 0.002 seconds) done.
[  103.761956] OOM killer disabled.
[  103.761970] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[  103.769806] sd 9:0:0:0: [sdc] Synchronizing SCSI cache
[  103.813670] sd 2:0:0:0: [sdb] Synchronizing SCSI cache
[  103.814034] sd 2:0:0:0: [sdb] Stopping disk
[  104.957655] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[  104.957997] sd 0:0:0:0: [sda] Stopping disk
[  105.095695] pcieport 0000:16:04.0: WFDL PCI_EXP_TYPE=6, disconnected=0
[  105.095728] pcieport 0000:16:04.0: WFDL subordinate? 1 list_empty? 1 bridge_d3? 1
[  105.107682] pcieport 0000:16:03.0: WFDL PCI_EXP_TYPE=6, disconnected=0
[  105.107712] pcieport 0000:16:03.0: WFDL subordinate? 1 list_empty? 1 bridge_d3? 1
[  105.119681] pcieport 0000:16:02.0: WFDL PCI_EXP_TYPE=6, disconnected=0
[  105.119710] pcieport 0000:16:02.0: WFDL subordinate? 1 list_empty? 1 bridge_d3? 1
[  105.131681] pcieport 0000:16:00.0: WFDL PCI_EXP_TYPE=6, disconnected=0
[  105.131710] pcieport 0000:16:00.0: WFDL subordinate? 1 list_empty? 1 bridge_d3? 1
[  105.132387] PM: suspend devices took 1.369 seconds
[  105.348770] ACPI: Preparing to enter system sleep state S3
[  105.651892] PM: Saving platform NVS memory
[  105.651955] Disabling non-boot CPUs ...
[  105.653792] smpboot: CPU 1 is now offline
!! ...boring CPU sleeps elided...
[  105.717257] smpboot: CPU 15 is now offline
[  105.720490] ACPI: Low-level resume complete
[  105.720858] PM: Restoring platform NVS memory
[  105.721662] Enabling non-boot CPUs ...
[  105.721991] x86: Booting SMP configuration:
[  105.722328] smpboot: Booting Node 0 Processor 1 APIC 0x1
[  105.722859] microcode: CPU1: patch_level=0x08001137
[  105.725385] [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
[  105.729255] CPU1 is up
!! ...boring CPU boots elided...
[  107.504390] smpboot: Booting Node 0 Processor 15 APIC 0xf
[  107.522545] microcode: CPU15: patch_level=0x08001137
[  107.524838] [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
[  107.665605] CPU15 is up
[  107.684969] ACPI: Waking up from system sleep state S3
[  107.715191] pcieport 0000:00:01.3: WFDL PCI_EXP_TYPE=4, disconnected=0
[  107.733296] pcieport 0000:00:01.3: WFDL subordinate? 1 list_empty? 0 bridge_d3? 1
[  107.751365] pcieport 0000:00:01.3: WFDL subordinate delay 100 ms
[  107.769246] pcieport 0000:00:01.3: waiting downstream link for 100 ms
[  107.769248] pcieport 0000:00:01.3: WFDL pcie_wait_for_link_delay(100)
[  107.917194] pcieport 0000:00:03.1: WFDL PCI_EXP_TYPE=4, disconnected=0
[  107.917784] pcieport 0000:00:03.1: WFDL subordinate? 1 list_empty? 0 bridge_d3? 1
[  107.918322] pcieport 0000:00:03.1: WFDL subordinate delay 100 ms
[  107.918853] pcieport 0000:00:03.1: waiting downstream link for 100 ms
[  107.918855] pcieport 0000:00:03.1: WFDL pcie_wait_for_link_delay(100)
[  108.057178] pcieport 0000:00:07.1: WFDL PCI_EXP_TYPE=4, disconnected=0
[  108.057757] pcieport 0000:00:07.1: WFDL subordinate? 1 list_empty? 0 bridge_d3? 1
[  108.058280] pcieport 0000:00:07.1: WFDL subordinate delay 100 ms
[  108.058796] pcieport 0000:00:07.1: waiting downstream link for 100 ms
[  108.058797] pcieport 0000:00:07.1: WFDL pcie_wait_for_link_delay(100)
[  108.201178] pcieport 0000:00:08.1: WFDL PCI_EXP_TYPE=4, disconnected=0
[  108.201742] pcieport 0000:00:08.1: WFDL subordinate? 1 list_empty? 0 bridge_d3? 1
[  108.202250] pcieport 0000:00:08.1: WFDL subordinate delay 100 ms
[  108.202755] pcieport 0000:00:08.1: waiting downstream link for 100 ms
[  108.202756] pcieport 0000:00:08.1: WFDL pcie_wait_for_link_delay(100)
[  108.369122] pcieport 0000:03:00.2: WFDL PCI_EXP_TYPE=5, disconnected=0
[  108.381134] pcieport 0000:16:00.0: WFDL PCI_EXP_TYPE=6, disconnected=0
[  108.381680] pcieport 0000:16:00.0: WFDL subordinate? 1 list_empty? 1 bridge_d3? 1
[  108.394134] pcieport 0000:16:01.0: WFDL PCI_EXP_TYPE=6, disconnected=0
[  108.394677] pcieport 0000:16:01.0: WFDL subordinate? 1 list_empty? 0 bridge_d3? 1
[  108.395159] pcieport 0000:16:01.0: WFDL subordinate delay 100 ms
[  108.395630] pcieport 0000:16:01.0: waiting downstream link for 100 ms
[  108.395632] pcieport 0000:16:01.0: WFDL PCIE_SPEED=21 link_active_reporting=1 msleep(100)
[  108.513134] pcieport 0000:16:02.0: WFDL PCI_EXP_TYPE=6, disconnected=0
[  108.513669] pcieport 0000:16:02.0: WFDL subordinate? 1 list_empty? 1 bridge_d3? 1
[  108.526134] pcieport 0000:16:03.0: WFDL PCI_EXP_TYPE=6, disconnected=0
[  108.526660] pcieport 0000:16:03.0: WFDL subordinate? 1 list_empty? 1 bridge_d3? 1
[  108.539136] pcieport 0000:16:04.0: WFDL PCI_EXP_TYPE=6, disconnected=0
[  108.539659] pcieport 0000:16:04.0: WFDL subordinate? 1 list_empty? 1 bridge_d3? 1
[  108.552134] pcieport 0000:16:08.0: WFDL PCI_EXP_TYPE=6, disconnected=0
[  108.552657] pcieport 0000:16:08.0: WFDL subordinate? 1 list_empty? 0 bridge_d3? 1
[  108.553114] pcieport 0000:16:08.0: WFDL subordinate delay 100 ms
[  108.553565] pcieport 0000:16:08.0: waiting downstream link for 100 ms
[  108.553567] pcieport 0000:16:08.0: WFDL pcie_wait_for_link_delay(100)
[  108.749377] usb usb1: root hub lost power or was reset
[  108.749844] usb usb2: root hub lost power or was reset
[  108.888074] r8169 0000:21:00.0 eth0: Link is Down
[  109.771660] ata9: SATA link down (SStatus 0 SControl 300)
[  109.772299] ata2: SATA link down (SStatus 0 SControl 300)
[  109.772885] ata8: SATA link down (SStatus 0 SControl 300)
[  109.773515] ata6: SATA link down (SStatus 0 SControl 300)
[  109.774003] ata4: SATA link down (SStatus 0 SControl 300)
[  109.789390] sd 0:0:0:0: [sda] Starting disk
[  109.925038] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  109.933040] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  109.933511] ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[  109.934610] ata1.00: supports DRM functions and may not be fully accessible
[  109.939143] ata1.00: supports DRM functions and may not be fully accessible
[  109.941407] ata7.00: configured for UDMA/100
[  109.942361] ata1.00: configured for UDMA/133
[  109.942803] ata1.00: Enabling discard_zeroes_data
[  109.943279] ata3.00: configured for UDMA/133
[  110.064170] usb 1-3: reset high-speed USB device number 2 using xhci_hcd
[  110.501036] ata5: failed to resume link (SControl 0)
[  110.501487] ata5: SATA link down (SStatus 0 SControl 0)
[  110.601152] usb 1-11: reset low-speed USB device number 3 using xhci_hcd
[  110.985772] sd 2:0:0:0: [sdb] Starting disk
[  111.241141] usb 1-13: reset low-speed USB device number 4 using xhci_hcd
[  111.836531] PM: resume devices took 3.087 seconds
[  111.836996] OOM killer enabled.
[  111.837442] Restarting tasks ... done.
[  111.842429] PM: suspend exit
[  111.991396] rfkill: input handler disabled
Comment 15 Mika Westerberg 2019-08-06 14:27:57 UTC
Thanks for the log. Unfortunately I don't see anything suspicious there that could explain.

Are you able to check /proc/interrupts on resume? Maybe the PME storm makes the system unresponsive?
Comment 16 Mika Westerberg 2019-08-06 14:35:48 UTC
Also another experiment would be to comment out both delays in wait_for_downstream_link() and see if that changes anything. If it starts working then we are adding too much delay somewhere.
Comment 17 Matthias Andree 2019-08-06 15:40:17 UTC
Created attachment 284223 [details]
/proc/interrupts before/after the suspend with unmodified 5.3.0-rc3.

(In reply to Mika Westerberg from comment #15)
> Thanks for the log. Unfortunately I don't see anything suspicious there that
> could explain.
> 
> Are you able to check /proc/interrupts on resume? Maybe the PME storm makes
> the system unresponsive?

Yes, I can with the help of a script. 5.3.0-rc3 unmodified. 

dmesg doesn't provide anything useful because I didn't raise the buffer, the PCI driver spams with rate-unlimited interrupts... I see a report rate of 10792/s, and I guess that's not all.  /proc/interrupts attached in a .tar.gz tarball because the lines are so long.

$ nl dmesg6.txt  | head -n5
     1	[   95.856742] pcieport 0000:00:01.3: PME: Spurious native interrupt!
     2	[   95.856829] pcieport 0000:00:01.3: PME: Spurious native interrupt!
     3	[   95.856916] pcieport 0000:00:01.3: PME: Spurious native interrupt!
     4	[   95.857003] pcieport 0000:00:01.3: PME: Spurious native interrupt!
     5	[   95.857091] pcieport 0000:00:01.3: PME: Spurious native interrupt!
$ nl dmesg6.txt  | tail -n5
  2537	[   96.091739] pcieport 0000:00:01.3: PME: Spurious native interrupt!
  2538	[   96.091825] pcieport 0000:00:01.3: PME: Spurious native interrupt!
  2539	[   96.091913] pcieport 0000:00:01.3: PME: Spurious native interrupt!
  2540	[   96.092000] pcieport 0000:00:01.3: PME: Spurious native interrupt!
  2541	[   96.092088] pcieport 0000:00:01.3: PME: Spurious native interrupt!
Comment 18 Matthias Andree 2019-08-06 15:46:22 UTC
(In reply to Mika Westerberg from comment #16)
> Also another experiment would be to comment out both delays in
> wait_for_downstream_link() and see if that changes anything. If it starts
> working then we are adding too much delay somewhere.

Masking the if...else... block containing the msleep() and the wait-for-link-then-delay with the preprocessor (#if 0... #endif) yields a working system after suspend.  With what command (debugfs writing or whatnot) can I enable "full debug" on everything PCI-related? I'd try to kill the PME message above and see if anything else complains about missing devices or such to pinpoint the issue.

(Note that the pci.c delay apparently doesn't trigger on my system, with my debugger patch mentioned earlier, I never saw the added pci_info(dev, ...) message.)

I am wondering about the 0000:00:01.3 PME interrupts which are also gone without those delays.
Comment 19 Matthias Andree 2019-08-06 16:03:42 UTC
Created attachment 284225 [details]
lspci -vv on 5.3.0-rc3 before/after, as .tar.gz

I am also attaching lspci -vv on 5.3.0-rc3 before/after (this is taken under PME IRQ storm, apparently the system is responsive enough to write this out to the USB key, and gdm would try and let me in had not my test script rebooted automatically after 30 s...).
Comment 20 Matthias Andree 2019-08-06 16:21:59 UTC
Created attachment 284227 [details]
.tar.xz (!) tarball of dmesg-after-suspend, and before/after lspci and /proc/interrupts

Finally, this is 5.3.0-rc3 with lots of PCI and ACPI dyndbg enabled, and the PME Spurious interrupt message removed: 
dmesg (from where I called systemctl suspend), lspci -vv, /proc/interrupts.  This time as .tar.xz archive.

Perhaps this helps.
Comment 21 Mika Westerberg 2019-08-06 16:31:56 UTC
Created attachment 284229 [details]
Disable PME interrupt during suspend/resume

Thanks for the logs! The /proc/interrupts shows that it only gets one PME interrupt but the driver seems to spin in the worker function trying to identify the originating device.

I'm attaching a patch to disable PME interrupt over suspend can you try whether it makes the system more responsive?
Comment 22 Matthias Andree 2019-08-06 17:34:11 UTC
Created attachment 284231 [details]
interrupts before/after suspend

(In reply to Mika Westerberg from comment #21)
> Created attachment 284229 [details]
> Disable PME interrupt during suspend/resume
> 
> Thanks for the logs! The /proc/interrupts shows that it only gets one PME
> interrupt but the driver seems to spin in the worker function trying to
> identify the originating device.
> 
> I'm attaching a patch to disable PME interrupt over suspend can you try
> whether it makes the system more responsive?

It helps with debugging, but does not make the computer usable. 
Same PME spam. 

If I demote that PME Spurious interrupt to pci_dbg(), I am rid of the PME Spurious interrupt loging (/proc/interrupts before/after attached), but the system still winds itself up unusable after a short time, with processes stuck in state D again, and kworker/0:6+events eating 100% of one core (6.5 % sys makes perfect sense on 8 cores x 2 threads), see top:

top - 19:15:35 up 7 min,  1 user,  load average: 3,66, 2,00, 0,94
Tasks: 347 total,   3 running, 344 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0,5 us,  6,5 sy,  0,0 ni, 86,6 id,  6,2 wa,  0,1 hi,  0,1 si,  0,0 st
MiB Mem :  32125,6 total,  30148,2 free,    891,4 used,   1086,0 buff/cache
MiB Swap:  16160,0 total,  16160,0 free,      0,0 used.  30816,8 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 1058 root      20   0       0      0      0 R  99,7   0,0   6:30.41 kworker/0:6+events
 6092 root      20   0  226336   4436   3252 S   0,7   0,0   0:00.10 tmux: server
 1169 root      20   0       0      0      0 I   0,3   0,0   0:00.10 kworker/4:4-events
 6349 root      20   0  227364   4648   3824 R   0,3   0,0   0:00.04 top

This core fires up its turbo and runs at an average of ~3.6 GHz and burns almost 19 Watt, per turbostat, which is ¼...⅓ of its design power...

Core	CPU	Avg_MHz	Busy%	Bzy_MHz	TSC_MHz	IRQ	POLL	C1	C2	POLL%	C1%	C2%	CorWatt	PkgWatt
-	-	239	6.85	3496	3000	16824	1	5385	3596	0.00	6.10	87.15	4.76	18.81
0	0	3586	100.00	3586	3000	5051	0	0	0	0.00	0.00	0.00	3.86	18.81
0	1	6	0.17	3241	3000	1367	0	269	640	0.00	5.07	94.79
1	2	4	0.32	1393	3000	897	1	268	347	0.00	3.31	96.45	0.06
1	3	2	0.15	1389	3000	264	0	146	105	0.00	2.66	97.25
...
Comment 23 Mika Westerberg 2019-08-07 10:34:27 UTC
Created attachment 284245 [details]
PME debugging

It looks like it is the PME worker function that spins interrupts disabled causing the unresponsiveness. Can you try the attached patch? It should add bit more debugging so we hopefully understand more why that happens. Please also enable PCI debugging as you did previously.
Comment 24 Matthias Andree 2019-08-07 12:17:12 UTC
Created attachment 284249 [details]
debug logs with 284245

(In reply to Mika Westerberg from comment #23)
> Created attachment 284245 [details]
> PME debugging
> 
> It looks like it is the PME worker function that spins interrupts disabled
> causing the unresponsiveness. Can you try the attached patch? It should add
> bit more debugging so we hopefully understand more why that happens. Please
> also enable PCI debugging as you did previously.

Find the usual debug dumps attached as .tar.xz, 5.3-rc3 with only your patch on top. This was with debugging info per: echo 'module *pci* +p' >/sys/kernel/debug/dynamic_debug/control
Comment 25 Mika Westerberg 2019-08-07 13:36:18 UTC
Created attachment 284251 [details]
Updated PME debugging

Thanks for the logs! It looks like either the root port PME status is not cleared or xHCI continues to send PMEs even if disabled. I've updated the patch a bit so that it logs bit more and also bails out from the loop after 5000 iterations. Can you give it a try as well?
Comment 26 Mika Westerberg 2019-08-07 13:39:08 UTC
Created attachment 284253 [details]
Fixed updated PME debugging

Please try this one instead of the previous buggy one.
Comment 27 Matthias Andree 2019-08-07 16:22:45 UTC
Created attachment 284257 [details]
usual debug logs w/ 5.3-rc3 and patch 284253

(In reply to Mika Westerberg from comment #26)
> Created attachment 284253 [details]
> Fixed updated PME debugging
> 
> Please try this one instead of the previous buggy one.

Hum... the dmesg4.txt isn't really useful because it misses the time right after resume, please also check the journalctl.txt (note: 11 MB, choose your viewer wisely) which I took from journalctl -b-1 after next boot.
Comment 28 Oleg Chernovskiy 2019-08-16 09:17:40 UTC
Created attachment 284459 [details]
tree lspci and inxi from affected system

I seem to be affected by this bug, on 5.2.8. Same story, xHCI controller and "PME: Spurious native interrupt!" spam in dmesg, filling journal and making machine unusable. You can throw patches at me as well.
Comment 29 Mika Westerberg 2019-08-16 09:26:27 UTC
The fix (revert) should be already in v5.3-rc4 and possibly in latest 5.2.x as well.