Description of problem: vanilla 5.1.20 on x86_64 fails to wake from suspend (STR), Fedora and vanilla 5.1.19 and prior were fine. 5.2.5 (from Fedora's-200.fc30) also fails in a different way (spurious PME interrupts on pcie). How reproducible: always Steps to Reproduce: 1. boot Fedora 30 and log into GNOME desktop 2. click pause symbol to suspend the computer to RAM, wait until suspended 3. press key on keyboard, or power button Actual results: computer tries to wake up, HDD LED blinks a bit, but console does not wake. Other computer on network cannot ping the waking computer. "sync" hangs in D deep sleep for long amounts of time. Expected results: computer wakes up properly and continues to use its devices. Additional info: PM tracing was enabled, the next boot returned [ 0.827930] PM: hash matches drivers/base/power/main.c:1021 It appears that suspend to disk still works. Computer has an NVIDIA GeForce 1060 PCIe graphics board, but 5.1.19 and prior would suspend properly, and the 5.1.20 and 5.2.5 suspend issues also occur if nvidia kernel modules are renamed out of the way and nouveau remains blocked, so it's not an nvidia driver issue. I have "git bisect"ed this on the vanilla stable kernel, the stable/linux-5.1.y branch (because I have had starting points 5.1.19 and 5.1.20 there). The failure-inducing commit on the branch is 3c795a8e3481e4dec071b5956e7177e816f6e7f1 (see below), which got picked from master's c2bf1fc212f7e6f25ace1af8f0b3ac061ea48ba5, (merged through cf2d213e49fdf47e4c10dc629a3659e0026a54b8, v5.3-rc1~167) and also got picked to stable/linux-5.2.y 5817d78eba34f6c86f5462ae2c5212f80a013357 (v5.2.3~291). Sasha Levin's signoff is only on the stable branches, not on master. ------------------------------------------------------------ commit 3c795a8e3481e4dec071b5956e7177e816f6e7f1 (refs/bisect/bad) Author: Mika Westerberg <mika.westerberg@linux.intel.com> 2019-06-12 12:57:38 Committer: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 2019-07-26 09:12:37 Parent: 70cc29dba925b8a99a4917c2b5fa6702d0d496d1 (bpf: fix callees pruning callers) Child: a98c15177f72ae3c0a736bb324e66c279bf94899 (net: netsec: initialize tx ring on ndo_open) Branch: remotes/stable/linux-5.1.y Follows: v5.1.19 Precedes: v5.1.20 PCI: Add missing link delays required by the PCIe spec [ Upstream commit c2bf1fc212f7e6f25ace1af8f0b3ac061ea48ba5 ] Currently Linux does not follow PCIe spec regarding the required delays after reset. A concrete example is a Thunderbolt add-in-card that consists of a PCIe switch and two PCIe endpoints: +-1b.0-[01-6b]----00.0-[02-6b]--+-00.0-[03]----00.0 TBT controller +-01.0-[04-36]-- DS hotplug port +-02.0-[37]----00.0 xHCI controller \-04.0-[38-6b]-- DS hotplug port The root port (1b.0) and the PCIe switch downstream ports are all PCIe gen3 so they support 8GT/s link speeds. We wait for the PCIe hierarchy to enter D3cold (runtime): pcieport 0000:00:1b.0: power state changed by ACPI to D3cold When it wakes up from D3cold, according to the PCIe 4.0 section 5.8 the PCIe switch is put to reset and its power is re-applied. This means that we must follow the rules in PCIe 4.0 section 6.6.1. [...] Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org> drivers/pci/pci.c | 29 +++++++++++++++++++---------- drivers/pci/pci.h | 1 + drivers/pci/pcie/portdrv_core.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 86 insertions(+), 10 deletions(-)
$ lspci -vvt -[0000:00]-+-00.0 Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Root Complex +-00.2 Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) I/O Memory Management Unit +-01.0 Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge +-01.3-[03-25]--+-00.0 Advanced Micro Devices, Inc. [AMD] X370 Series Chipset USB 3.1 xHCI Controller | +-00.1 Advanced Micro Devices, Inc. [AMD] X370 Series Chipset SATA Controller | \-00.2-[16-25]--+-00.0-[17]-- | +-01.0-[21]----00.0 Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller | +-02.0-[22]-- | +-03.0-[23]-- | +-04.0-[24]-- | \-08.0-[25]----00.0 ASMedia Technology Inc. ASM2142 USB 3.1 Host Controller +-02.0 Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge +-03.0 Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge +-03.1-[26]--+-00.0 NVIDIA Corporation GP106 [GeForce GTX 1060 6GB] | \-00.1 NVIDIA Corporation GP106 High Definition Audio Controller +-04.0 Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge +-07.0 Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge +-07.1-[27]--+-00.0 Advanced Micro Devices, Inc. [AMD] Zeppelin/Raven/Raven2 PCIe Dummy Function | +-00.2 Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Platform Security Processor | \-00.3 Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) USB 3.0 Host Controller +-08.0 Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge +-08.1-[28]--+-00.0 Advanced Micro Devices, Inc. [AMD] Zeppelin/Renoir PCIe Dummy Function | +-00.2 Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] | \-00.3 Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) HD Audio Controller +-14.0 Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller +-14.3 Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge +-18.0 Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 0 +-18.1 Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 1 +-18.2 Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 2 +-18.3 Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 3 +-18.4 Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 4 +-18.5 Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 5 +-18.6 Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 6 \-18.7 Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7 $ lspci -vv 00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Root Complex Subsystem: Micro-Star International Co., Ltd. [MSI] Device 7a33 Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- 00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) I/O Memory Management Unit Subsystem: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) I/O Memory Management Unit Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Interrupt: pin ? routed to IRQ 27 Capabilities: <access denied> 00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- 00:01.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin ? routed to IRQ 28 Bus: primary=00, secondary=03, subordinate=25, sec-latency=0 I/O behind bridge: 0000f000-0000ffff [size=4K] Memory behind bridge: f7500000-f77fffff [size=3M] Prefetchable memory behind bridge: None Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR- BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: <access denied> Kernel driver in use: pcieport 00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- 00:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- 00:03.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin ? routed to IRQ 29 Bus: primary=00, secondary=26, subordinate=26, sec-latency=0 I/O behind bridge: 0000e000-0000efff [size=4K] Memory behind bridge: f6000000-f70fffff [size=17M] Prefetchable memory behind bridge: 00000000e0000000-00000000f1ffffff [size=288M] Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR- BridgeCtl: Parity- SERR+ NoISA- VGA+ VGA16+ MAbort- >Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: <access denied> Kernel driver in use: pcieport 00:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- 00:07.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- 00:07.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 30 Bus: primary=00, secondary=27, subordinate=27, sec-latency=0 I/O behind bridge: None Memory behind bridge: f7200000-f74fffff [size=3M] Prefetchable memory behind bridge: None Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR- BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: <access denied> Kernel driver in use: pcieport 00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- 00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 31 Bus: primary=00, secondary=28, subordinate=28, sec-latency=0 I/O behind bridge: None Memory behind bridge: f7800000-f78fffff [size=1M] Prefetchable memory behind bridge: None Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR- BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: <access denied> Kernel driver in use: pcieport 00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 59) Subsystem: Micro-Star International Co., Ltd. [MSI] Device 7a33 Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap- 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Kernel modules: sp5100_tco 00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51) Subsystem: Micro-Star International Co., Ltd. [MSI] Device 7a33 Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 0 Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- 00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 1 Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- 00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 2 Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- 00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 3 Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Kernel driver in use: k10temp Kernel modules: k10temp 00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 4 Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- 00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 5 Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- 00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 6 Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- 00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7 Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- 03:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] X370 Series Chipset USB 3.1 xHCI Controller (rev 02) (prog-if 30 [XHCI]) Subsystem: ASMedia Technology Inc. Device 1142 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 44 Region 0: Memory at f77a0000 (64-bit, non-prefetchable) [size=32K] Capabilities: <access denied> Kernel driver in use: xhci_hcd 03:00.1 SATA controller: Advanced Micro Devices, Inc. [AMD] X370 Series Chipset SATA Controller (rev 02) (prog-if 01 [AHCI 1.0]) Subsystem: ASMedia Technology Inc. Device 1062 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin B routed to IRQ 41 Region 5: Memory at f7780000 (32-bit, non-prefetchable) [size=128K] Expansion ROM at f7700000 [disabled] [size=512K] Capabilities: <access denied> Kernel driver in use: ahci 03:00.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] X370 Series Chipset PCIe Upstream Port (rev 02) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin C routed to IRQ 32 Bus: primary=03, secondary=16, subordinate=25, sec-latency=0 I/O behind bridge: 0000f000-0000ffff [size=4K] Memory behind bridge: f7500000-f76fffff [size=2M] Prefetchable memory behind bridge: None Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR- BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: <access denied> Kernel driver in use: pcieport 16:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port (rev 02) (prog-if 00 [Normal decode]) DeviceName: Broadcom 5762 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 33 Bus: primary=16, secondary=17, subordinate=17, sec-latency=0 I/O behind bridge: None Memory behind bridge: None Prefetchable memory behind bridge: None Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR- BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: <access denied> Kernel driver in use: pcieport 16:01.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port (rev 02) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 35 Bus: primary=16, secondary=21, subordinate=21, sec-latency=0 I/O behind bridge: 0000f000-0000ffff [size=4K] Memory behind bridge: f7600000-f76fffff [size=1M] Prefetchable memory behind bridge: None Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR- BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: <access denied> Kernel driver in use: pcieport 16:02.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port (rev 02) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 36 Bus: primary=16, secondary=22, subordinate=22, sec-latency=0 I/O behind bridge: None Memory behind bridge: None Prefetchable memory behind bridge: None Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR- BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: <access denied> Kernel driver in use: pcieport 16:03.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port (rev 02) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 38 Bus: primary=16, secondary=23, subordinate=23, sec-latency=0 I/O behind bridge: None Memory behind bridge: None Prefetchable memory behind bridge: None Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR- BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: <access denied> Kernel driver in use: pcieport 16:04.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port (rev 02) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 39 Bus: primary=16, secondary=24, subordinate=24, sec-latency=0 I/O behind bridge: None Memory behind bridge: None Prefetchable memory behind bridge: None Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR- BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: <access denied> Kernel driver in use: pcieport 16:08.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port (rev 02) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 40 Bus: primary=16, secondary=25, subordinate=25, sec-latency=0 I/O behind bridge: None Memory behind bridge: f7500000-f75fffff [size=1M] Prefetchable memory behind bridge: None Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR- BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: <access denied> Kernel driver in use: pcieport 21:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15) Subsystem: Micro-Star International Co., Ltd. [MSI] Device 7a33 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 34 Region 0: I/O ports at f000 [size=256] Region 2: Memory at f7604000 (64-bit, non-prefetchable) [size=4K] Region 4: Memory at f7600000 (64-bit, non-prefetchable) [size=16K] Capabilities: <access denied> Kernel driver in use: r8169 Kernel modules: r8169 25:00.0 USB controller: ASMedia Technology Inc. ASM2142 USB 3.1 Host Controller (prog-if 30 [XHCI]) Subsystem: Micro-Star International Co., Ltd. [MSI] Device 7a33 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 45 Region 0: Memory at f7500000 (64-bit, non-prefetchable) [size=32K] Capabilities: <access denied> Kernel driver in use: xhci_hcd 26:00.0 VGA compatible controller: NVIDIA Corporation GP106 [GeForce GTX 1060 6GB] (rev a1) (prog-if 00 [VGA controller]) Subsystem: ZOTAC International (MCO) Ltd. Device 1438 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Interrupt: pin A routed to IRQ 56 Region 0: Memory at f6000000 (32-bit, non-prefetchable) [size=16M] Region 1: Memory at e0000000 (64-bit, prefetchable) [size=256M] Region 3: Memory at f0000000 (64-bit, prefetchable) [size=32M] Region 5: I/O ports at e000 [size=128] [virtual] Expansion ROM at 000c0000 [disabled] [size=128K] Capabilities: <access denied> Kernel driver in use: nvidia Kernel modules: nvidia_drm, nvidia 26:00.1 Audio device: NVIDIA Corporation GP106 High Definition Audio Controller (rev a1) Subsystem: ZOTAC International (MCO) Ltd. Device 1438 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin B routed to IRQ 53 Region 0: Memory at f7080000 (32-bit, non-prefetchable) [size=16K] Capabilities: <access denied> Kernel driver in use: snd_hda_intel Kernel modules: snd_hda_intel 27:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Raven/Raven2 PCIe Dummy Function Subsystem: Advanced Micro Devices, Inc. [AMD] Zeppelin/Raven/Raven2 PCIe Dummy Function Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Capabilities: <access denied> 27:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Platform Security Processor Subsystem: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Platform Security Processor Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin B routed to IRQ 49 Region 2: Memory at f7300000 (32-bit, non-prefetchable) [size=1M] Region 5: Memory at f7400000 (32-bit, non-prefetchable) [size=8K] Capabilities: <access denied> Kernel driver in use: ccp Kernel modules: ccp 27:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) USB 3.0 Host Controller (prog-if 30 [XHCI]) Subsystem: Micro-Star International Co., Ltd. [MSI] Device 7a33 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin C routed to IRQ 47 Region 0: Memory at f7200000 (64-bit, non-prefetchable) [size=1M] Capabilities: <access denied> Kernel driver in use: xhci_hcd 28:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Renoir PCIe Dummy Function Subsystem: Advanced Micro Devices, Inc. [AMD] Zeppelin/Renoir PCIe Dummy Function Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Capabilities: <access denied> 28:00.2 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51) (prog-if 01 [AHCI 1.0]) Subsystem: Micro-Star International Co., Ltd. [MSI] Device 7a33 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin B routed to IRQ 43 Region 5: Memory at f7808000 (32-bit, non-prefetchable) [size=4K] Capabilities: <access denied> Kernel driver in use: ahci 28:00.3 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) HD Audio Controller DeviceName: Realtek ALC898 Subsystem: Micro-Star International Co., Ltd. [MSI] Device fa33 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin C routed to IRQ 55 Region 0: Memory at f7800000 (32-bit, non-prefetchable) [size=32K] Capabilities: <access denied> Kernel driver in use: snd_hda_intel Kernel modules: snd_hda_intel
This issue also affects 5.3-rc2, which, after resume from STR, spams my logs with an infinite loop of these at few-ms distance (while interspersing missing hundreds of missed kernel messages): pcieport 0000:00:01.3: PME: Spurious native interrupt! 00:01.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0, IRQ 28 Bus: primary=00, secondary=03, subordinate=25, sec-latency=0 I/O behind bridge: 0000f000-0000ffff [size=4K] Memory behind bridge: f7500000-f77fffff [size=3M] Prefetchable memory behind bridge: None Capabilities: [50] Power Management version 3 Capabilities: [58] Express Root Port (Slot+), MSI 00 Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+ Capabilities: [c0] Subsystem: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge Capabilities: [c8] HyperTransport: MSI Mapping Enable+ Fixed+ Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?> Capabilities: [150] Advanced Error Reporting Capabilities: [270] Secondary PCI Express <?> Capabilities: [2a0] Access Control Services Capabilities: [370] L1 PM Substates Capabilities: [3c4] Designated Vendor-Specific <?> Kernel driver in use: pcieport
Sorry for being unclear: - 5.1.20, 5.2.5, 5.3-rc2 all become unusable in a similar vein after wakeup, disk and net I/O stall (no more writes after wakeup, no ICMP ping) - 5.2.5 and 5.3-rc2 additionally log the PME interrupt storm to console. Reverting the relevant commit of the "PCI: Add missing link delays required by the PCIe spec" 'fixes' resuming from ACPI S3/STR suspend for all three kernel versions for me, and fixes the interrupt (log) storm.
Can you attach output of 'sudo lspci -vv' to the bug? The dump in comment #1 misses capabilities etc.
Created attachment 284193 [details] root user's lspci -vv on modified 5.2.5 kernel.
This is 5.3-rc2 with only one modification, reverted 00ebf1348cb332941dab52948f29480592bfbe6a as proposed on linux-pci@ by Mika: [ 0.698739] pcieport 0000:00:01.3: Signaling PME with IRQ 28 [ 0.698799] pcieport 0000:00:01.3: AER: enabled with IRQ 28 [ 0.698966] pcieport 0000:00:03.1: Signaling PME with IRQ 29 [ 0.699017] pcieport 0000:00:03.1: AER: enabled with IRQ 29 [ 0.699188] pcieport 0000:00:07.1: Signaling PME with IRQ 30 [ 0.699230] pcieport 0000:00:07.1: AER: enabled with IRQ 30 [ 0.699816] pcieport 0000:00:08.1: Signaling PME with IRQ 31 [ 0.699860] pcieport 0000:00:08.1: AER: enabled with IRQ 31 [ 119.637492] pcieport 0000:00:03.1: waiting downstream link for 100 ms [ 119.649285] pcieport 0000:00:08.1: waiting downstream link for 100 ms [ 119.649287] pcieport 0000:00:07.1: waiting downstream link for 100 ms [ 119.649376] pcieport 0000:00:01.3: waiting downstream link for 100 ms [ 119.803025] pcieport 0000:16:08.0: waiting downstream link for 100 ms [ 119.803031] pcieport 0000:16:01.0: waiting downstream link for 100 ms
Created attachment 284195 [details] Nicholas Johnson - "sudo lspci -vv" output - by request of Mika Westerberg I did sudo lspci -vv a couple of times, the second time at about 44 seconds in.
Created attachment 284199 [details] Nicholas Johnson - dmesg with "sudo lspci -vv" by request of Mika Westerberg
Created attachment 284203 [details] Patch to force 100ms always for the delay Can you try this patch and see if it makes any difference?
(In reply to Mika Westerberg from comment #9) > Created attachment 284203 [details] > Patch to force 100ms always for the delay > > Can you try this patch and see if it makes any difference? Mika, I tried the patch from said attachment on top of 5.3-rc3, no improvement (and I did not expect it to - we've had 100 ms before, but I'm not the expert, so I tried). Again, 100 ms wait. Again, "sync &" just hangs in D (along with jbd2/dm-whatever) and ethernet remains broken. I can configure the interface, the route is brought up properly, but communication to the directly connected router remains impossible, as before. pcieport logging at 130.501... s of uptime, this is only the resume part: [ 128.772900] ACPI: Low-level resume complete [ 128.773267] PM: Restoring platform NVS memory [ 128.774065] Enabling non-boot CPUs ... [ 128.774392] x86: Booting SMP configuration: [ 128.774714] smpboot: Booting Node 0 Processor 1 APIC 0x1 [ 128.775261] microcode: CPU1: patch_level=0x08001137 [ 128.777758] [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0) [ 128.781621] CPU1 is up [ 128.781833] smpboot: Booting Node 0 Processor 2 APIC 0x2 [ 128.782344] microcode: CPU2: patch_level=0x08001137 [ 128.784613] [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0) [ 128.788451] CPU2 is up [ 128.788650] smpboot: Booting Node 0 Processor 3 APIC 0x3 [ 128.789157] microcode: CPU3: patch_level=0x08001137 [ 128.791427] [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0) [ 128.795282] CPU3 is up [ 128.795482] smpboot: Booting Node 0 Processor 4 APIC 0x4 [ 128.796130] microcode: CPU4: patch_level=0x08001137 [ 128.798404] [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0) [ 128.802273] CPU4 is up [ 128.802476] smpboot: Booting Node 0 Processor 5 APIC 0x5 [ 128.814093] microcode: CPU5: patch_level=0x08001137 [ 128.816374] [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0) [ 128.904909] CPU5 is up [ 128.916068] smpboot: Booting Node 0 Processor 6 APIC 0x6 [ 128.927780] microcode: CPU6: patch_level=0x08001137 [ 128.930074] [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0) [ 129.023614] CPU6 is up [ 129.035949] smpboot: Booting Node 0 Processor 7 APIC 0x7 [ 129.048582] microcode: CPU7: patch_level=0x08001137 [ 129.050862] [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0) [ 129.148796] CPU7 is up [ 129.161426] smpboot: Booting Node 0 Processor 8 APIC 0x8 [ 129.174621] microcode: CPU8: patch_level=0x08001137 [ 129.176917] [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0) [ 129.280952] CPU8 is up [ 129.294652] smpboot: Booting Node 0 Processor 9 APIC 0x9 [ 129.308926] microcode: CPU9: patch_level=0x08001137 [ 129.311206] [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0) [ 129.425418] CPU9 is up [ 129.440405] smpboot: Booting Node 0 Processor 10 APIC 0xa [ 129.456123] microcode: CPU10: patch_level=0x08001137 [ 129.458420] [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0) [ 129.582087] CPU10 is up [ 129.598231] smpboot: Booting Node 0 Processor 11 APIC 0xb [ 129.614990] microcode: CPU11: patch_level=0x08001137 [ 129.617267] [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0) [ 129.751382] CPU11 is up [ 129.768662] smpboot: Booting Node 0 Processor 12 APIC 0xc [ 129.786485] microcode: CPU12: patch_level=0x08001137 [ 129.788789] [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0) [ 129.927899] CPU12 is up [ 129.945624] smpboot: Booting Node 0 Processor 13 APIC 0xd [ 129.963689] microcode: CPU13: patch_level=0x08001137 [ 129.966010] [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0) [ 130.105903] CPU13 is up [ 130.123671] smpboot: Booting Node 0 Processor 14 APIC 0xe [ 130.141781] microcode: CPU14: patch_level=0x08001137 [ 130.144094] [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0) [ 130.284339] CPU14 is up [ 130.302152] smpboot: Booting Node 0 Processor 15 APIC 0xf [ 130.320288] microcode: CPU15: patch_level=0x08001137 [ 130.322588] [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0) [ 130.463218] CPU15 is up [ 130.482721] ACPI: Waking up from system sleep state S3 [ 130.501733] pcieport 0000:00:03.1: waiting downstream link for 100 ms [ 130.513506] pcieport 0000:00:07.1: waiting downstream link for 100 ms [ 130.513508] pcieport 0000:00:08.1: waiting downstream link for 100 ms [ 130.513602] pcieport 0000:00:01.3: waiting downstream link for 100 ms [ 130.668044] pcieport 0000:16:08.0: waiting downstream link for 100 ms [ 130.668216] pcieport 0000:16:01.0: waiting downstream link for 100 ms [ 130.808622] usb usb1: root hub lost power or was reset [ 130.809074] sd 0:0:0:0: [sda] Starting disk [ 130.809075] sd 2:0:0:0: [sdb] Starting disk [ 130.809248] usb usb2: root hub lost power or was reset [ 130.888838] r8169 0000:21:00.0 eth0: Link is Down [ 131.112856] ata4: SATA link down (SStatus 0 SControl 300) [ 131.112913] ata9: SATA link down (SStatus 0 SControl 300) [ 131.113657] ata2: SATA link down (SStatus 0 SControl 300) [ 131.114806] ata8: SATA link down (SStatus 0 SControl 300) [ 131.190861] usb 2-4: reset SuperSpeed Gen 1 USB device number 3 using xhci_hcd [ 131.267802] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [ 131.268742] ata1.00: supports DRM functions and may not be fully accessible [ 131.273538] ata1.00: supports DRM functions and may not be fully accessible [ 131.275805] ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 300) [ 131.276908] ata1.00: configured for UDMA/133 [ 131.277465] ata1.00: Enabling discard_zeroes_data [ 131.283867] ata7.00: configured for UDMA/100 [ 131.323796] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [ 131.340466] ata3.00: configured for UDMA/133 [ 131.463910] usb 1-13: reset low-speed USB device number 3 using xhci_hcd [ 131.843780] ata6: failed to resume link (SControl 0) [ 131.844317] ata6: SATA link down (SStatus 0 SControl 0) [ 131.844844] ata5: failed to resume link (SControl 0) [ 131.845367] ata5: SATA link down (SStatus 0 SControl 0) [ 132.080040] usb 1-11: reset low-speed USB device number 2 using xhci_hcd [ 132.465069] PM: resume devices took 1.657 seconds [ 132.465599] OOM killer enabled. [ 132.466122] Restarting tasks ... done. [ 132.470885] rfkill: input handler enabled [ 132.474008] PM: suspend exit [ 132.641171] rfkill: input handler disabled [ 137.987906] rfkill: input handler enabled [ 170.889270] rfkill: input handler disabled
Regarding attachment 284203 [details]: Patch to force 100ms always for the delay It did not help at all. $ time lspci -vt 1>/dev/null real 0m2.313s user 0m0.005s sys 0m0.006s $ time cat /sys/bus/pci/devices/0000\:05\:00.0/config 1>/dev/null real 0m1.166s user 0m0.001s sys 0m0.000s Bus 05 is under Thunderbolt NHI. The delay can disappear for short intervals (presumably when the controller is not in a sleep state perhaps). After removing the Thunderbolt controller from bus 03 and rescanning, the delay was gone for the first ten seconds and then it came back.
Thanks for checking. Given the amount of issues the commit in question causes, I asked Rafael and Bjorn to revert it for now.
Created attachment 284215 [details] Don't use link active reporting for the delay Matthias, can you also try the attached patch? I would still like to understand where the issue might be.
Created attachment 284219 [details] pci_info() instrumentation spray all over PCIe link delay code (In reply to Mika Westerberg from comment #13) > Created attachment 284215 [details] > Don't use link active reporting for the delay > > Matthias, can you also try the attached patch? I would still like to > understand where the issue might be. I have tried 284215, but there is no improvement after resume. Have we inadvertently removed a delay in another place so that we stall the hardware? I have also tried a different approach, adding a truckload of pci_info() all over the place to get debug information, with the attached patch. Here's what I got (on second attempts, the first one with /sys/power/pm_async == 1 as default, and since that was a mess WRT readability, I made a second attempt with pm_async set to 0), shown here - this is logging after "systemctl suspend". Not sure if that helps any or gives us new debugging ideas. [ 103.585190] r8169 0000:21:00.0 eth0: Link is Down [ 103.598086] PM: suspend entry (deep) [ 103.607649] Filesystems sync: 0.009 seconds [ 103.759020] Freezing user space processes ... [ 103.760071] rfkill: input handler enabled [ 103.761849] (elapsed 0.002 seconds) done. [ 103.761956] OOM killer disabled. [ 103.761970] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. [ 103.769806] sd 9:0:0:0: [sdc] Synchronizing SCSI cache [ 103.813670] sd 2:0:0:0: [sdb] Synchronizing SCSI cache [ 103.814034] sd 2:0:0:0: [sdb] Stopping disk [ 104.957655] sd 0:0:0:0: [sda] Synchronizing SCSI cache [ 104.957997] sd 0:0:0:0: [sda] Stopping disk [ 105.095695] pcieport 0000:16:04.0: WFDL PCI_EXP_TYPE=6, disconnected=0 [ 105.095728] pcieport 0000:16:04.0: WFDL subordinate? 1 list_empty? 1 bridge_d3? 1 [ 105.107682] pcieport 0000:16:03.0: WFDL PCI_EXP_TYPE=6, disconnected=0 [ 105.107712] pcieport 0000:16:03.0: WFDL subordinate? 1 list_empty? 1 bridge_d3? 1 [ 105.119681] pcieport 0000:16:02.0: WFDL PCI_EXP_TYPE=6, disconnected=0 [ 105.119710] pcieport 0000:16:02.0: WFDL subordinate? 1 list_empty? 1 bridge_d3? 1 [ 105.131681] pcieport 0000:16:00.0: WFDL PCI_EXP_TYPE=6, disconnected=0 [ 105.131710] pcieport 0000:16:00.0: WFDL subordinate? 1 list_empty? 1 bridge_d3? 1 [ 105.132387] PM: suspend devices took 1.369 seconds [ 105.348770] ACPI: Preparing to enter system sleep state S3 [ 105.651892] PM: Saving platform NVS memory [ 105.651955] Disabling non-boot CPUs ... [ 105.653792] smpboot: CPU 1 is now offline !! ...boring CPU sleeps elided... [ 105.717257] smpboot: CPU 15 is now offline [ 105.720490] ACPI: Low-level resume complete [ 105.720858] PM: Restoring platform NVS memory [ 105.721662] Enabling non-boot CPUs ... [ 105.721991] x86: Booting SMP configuration: [ 105.722328] smpboot: Booting Node 0 Processor 1 APIC 0x1 [ 105.722859] microcode: CPU1: patch_level=0x08001137 [ 105.725385] [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0) [ 105.729255] CPU1 is up !! ...boring CPU boots elided... [ 107.504390] smpboot: Booting Node 0 Processor 15 APIC 0xf [ 107.522545] microcode: CPU15: patch_level=0x08001137 [ 107.524838] [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0) [ 107.665605] CPU15 is up [ 107.684969] ACPI: Waking up from system sleep state S3 [ 107.715191] pcieport 0000:00:01.3: WFDL PCI_EXP_TYPE=4, disconnected=0 [ 107.733296] pcieport 0000:00:01.3: WFDL subordinate? 1 list_empty? 0 bridge_d3? 1 [ 107.751365] pcieport 0000:00:01.3: WFDL subordinate delay 100 ms [ 107.769246] pcieport 0000:00:01.3: waiting downstream link for 100 ms [ 107.769248] pcieport 0000:00:01.3: WFDL pcie_wait_for_link_delay(100) [ 107.917194] pcieport 0000:00:03.1: WFDL PCI_EXP_TYPE=4, disconnected=0 [ 107.917784] pcieport 0000:00:03.1: WFDL subordinate? 1 list_empty? 0 bridge_d3? 1 [ 107.918322] pcieport 0000:00:03.1: WFDL subordinate delay 100 ms [ 107.918853] pcieport 0000:00:03.1: waiting downstream link for 100 ms [ 107.918855] pcieport 0000:00:03.1: WFDL pcie_wait_for_link_delay(100) [ 108.057178] pcieport 0000:00:07.1: WFDL PCI_EXP_TYPE=4, disconnected=0 [ 108.057757] pcieport 0000:00:07.1: WFDL subordinate? 1 list_empty? 0 bridge_d3? 1 [ 108.058280] pcieport 0000:00:07.1: WFDL subordinate delay 100 ms [ 108.058796] pcieport 0000:00:07.1: waiting downstream link for 100 ms [ 108.058797] pcieport 0000:00:07.1: WFDL pcie_wait_for_link_delay(100) [ 108.201178] pcieport 0000:00:08.1: WFDL PCI_EXP_TYPE=4, disconnected=0 [ 108.201742] pcieport 0000:00:08.1: WFDL subordinate? 1 list_empty? 0 bridge_d3? 1 [ 108.202250] pcieport 0000:00:08.1: WFDL subordinate delay 100 ms [ 108.202755] pcieport 0000:00:08.1: waiting downstream link for 100 ms [ 108.202756] pcieport 0000:00:08.1: WFDL pcie_wait_for_link_delay(100) [ 108.369122] pcieport 0000:03:00.2: WFDL PCI_EXP_TYPE=5, disconnected=0 [ 108.381134] pcieport 0000:16:00.0: WFDL PCI_EXP_TYPE=6, disconnected=0 [ 108.381680] pcieport 0000:16:00.0: WFDL subordinate? 1 list_empty? 1 bridge_d3? 1 [ 108.394134] pcieport 0000:16:01.0: WFDL PCI_EXP_TYPE=6, disconnected=0 [ 108.394677] pcieport 0000:16:01.0: WFDL subordinate? 1 list_empty? 0 bridge_d3? 1 [ 108.395159] pcieport 0000:16:01.0: WFDL subordinate delay 100 ms [ 108.395630] pcieport 0000:16:01.0: waiting downstream link for 100 ms [ 108.395632] pcieport 0000:16:01.0: WFDL PCIE_SPEED=21 link_active_reporting=1 msleep(100) [ 108.513134] pcieport 0000:16:02.0: WFDL PCI_EXP_TYPE=6, disconnected=0 [ 108.513669] pcieport 0000:16:02.0: WFDL subordinate? 1 list_empty? 1 bridge_d3? 1 [ 108.526134] pcieport 0000:16:03.0: WFDL PCI_EXP_TYPE=6, disconnected=0 [ 108.526660] pcieport 0000:16:03.0: WFDL subordinate? 1 list_empty? 1 bridge_d3? 1 [ 108.539136] pcieport 0000:16:04.0: WFDL PCI_EXP_TYPE=6, disconnected=0 [ 108.539659] pcieport 0000:16:04.0: WFDL subordinate? 1 list_empty? 1 bridge_d3? 1 [ 108.552134] pcieport 0000:16:08.0: WFDL PCI_EXP_TYPE=6, disconnected=0 [ 108.552657] pcieport 0000:16:08.0: WFDL subordinate? 1 list_empty? 0 bridge_d3? 1 [ 108.553114] pcieport 0000:16:08.0: WFDL subordinate delay 100 ms [ 108.553565] pcieport 0000:16:08.0: waiting downstream link for 100 ms [ 108.553567] pcieport 0000:16:08.0: WFDL pcie_wait_for_link_delay(100) [ 108.749377] usb usb1: root hub lost power or was reset [ 108.749844] usb usb2: root hub lost power or was reset [ 108.888074] r8169 0000:21:00.0 eth0: Link is Down [ 109.771660] ata9: SATA link down (SStatus 0 SControl 300) [ 109.772299] ata2: SATA link down (SStatus 0 SControl 300) [ 109.772885] ata8: SATA link down (SStatus 0 SControl 300) [ 109.773515] ata6: SATA link down (SStatus 0 SControl 300) [ 109.774003] ata4: SATA link down (SStatus 0 SControl 300) [ 109.789390] sd 0:0:0:0: [sda] Starting disk [ 109.925038] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [ 109.933040] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [ 109.933511] ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 300) [ 109.934610] ata1.00: supports DRM functions and may not be fully accessible [ 109.939143] ata1.00: supports DRM functions and may not be fully accessible [ 109.941407] ata7.00: configured for UDMA/100 [ 109.942361] ata1.00: configured for UDMA/133 [ 109.942803] ata1.00: Enabling discard_zeroes_data [ 109.943279] ata3.00: configured for UDMA/133 [ 110.064170] usb 1-3: reset high-speed USB device number 2 using xhci_hcd [ 110.501036] ata5: failed to resume link (SControl 0) [ 110.501487] ata5: SATA link down (SStatus 0 SControl 0) [ 110.601152] usb 1-11: reset low-speed USB device number 3 using xhci_hcd [ 110.985772] sd 2:0:0:0: [sdb] Starting disk [ 111.241141] usb 1-13: reset low-speed USB device number 4 using xhci_hcd [ 111.836531] PM: resume devices took 3.087 seconds [ 111.836996] OOM killer enabled. [ 111.837442] Restarting tasks ... done. [ 111.842429] PM: suspend exit [ 111.991396] rfkill: input handler disabled
Thanks for the log. Unfortunately I don't see anything suspicious there that could explain. Are you able to check /proc/interrupts on resume? Maybe the PME storm makes the system unresponsive?
Also another experiment would be to comment out both delays in wait_for_downstream_link() and see if that changes anything. If it starts working then we are adding too much delay somewhere.
Created attachment 284223 [details] /proc/interrupts before/after the suspend with unmodified 5.3.0-rc3. (In reply to Mika Westerberg from comment #15) > Thanks for the log. Unfortunately I don't see anything suspicious there that > could explain. > > Are you able to check /proc/interrupts on resume? Maybe the PME storm makes > the system unresponsive? Yes, I can with the help of a script. 5.3.0-rc3 unmodified. dmesg doesn't provide anything useful because I didn't raise the buffer, the PCI driver spams with rate-unlimited interrupts... I see a report rate of 10792/s, and I guess that's not all. /proc/interrupts attached in a .tar.gz tarball because the lines are so long. $ nl dmesg6.txt | head -n5 1 [ 95.856742] pcieport 0000:00:01.3: PME: Spurious native interrupt! 2 [ 95.856829] pcieport 0000:00:01.3: PME: Spurious native interrupt! 3 [ 95.856916] pcieport 0000:00:01.3: PME: Spurious native interrupt! 4 [ 95.857003] pcieport 0000:00:01.3: PME: Spurious native interrupt! 5 [ 95.857091] pcieport 0000:00:01.3: PME: Spurious native interrupt! $ nl dmesg6.txt | tail -n5 2537 [ 96.091739] pcieport 0000:00:01.3: PME: Spurious native interrupt! 2538 [ 96.091825] pcieport 0000:00:01.3: PME: Spurious native interrupt! 2539 [ 96.091913] pcieport 0000:00:01.3: PME: Spurious native interrupt! 2540 [ 96.092000] pcieport 0000:00:01.3: PME: Spurious native interrupt! 2541 [ 96.092088] pcieport 0000:00:01.3: PME: Spurious native interrupt!
(In reply to Mika Westerberg from comment #16) > Also another experiment would be to comment out both delays in > wait_for_downstream_link() and see if that changes anything. If it starts > working then we are adding too much delay somewhere. Masking the if...else... block containing the msleep() and the wait-for-link-then-delay with the preprocessor (#if 0... #endif) yields a working system after suspend. With what command (debugfs writing or whatnot) can I enable "full debug" on everything PCI-related? I'd try to kill the PME message above and see if anything else complains about missing devices or such to pinpoint the issue. (Note that the pci.c delay apparently doesn't trigger on my system, with my debugger patch mentioned earlier, I never saw the added pci_info(dev, ...) message.) I am wondering about the 0000:00:01.3 PME interrupts which are also gone without those delays.
Created attachment 284225 [details] lspci -vv on 5.3.0-rc3 before/after, as .tar.gz I am also attaching lspci -vv on 5.3.0-rc3 before/after (this is taken under PME IRQ storm, apparently the system is responsive enough to write this out to the USB key, and gdm would try and let me in had not my test script rebooted automatically after 30 s...).
Created attachment 284227 [details] .tar.xz (!) tarball of dmesg-after-suspend, and before/after lspci and /proc/interrupts Finally, this is 5.3.0-rc3 with lots of PCI and ACPI dyndbg enabled, and the PME Spurious interrupt message removed: dmesg (from where I called systemctl suspend), lspci -vv, /proc/interrupts. This time as .tar.xz archive. Perhaps this helps.
Created attachment 284229 [details] Disable PME interrupt during suspend/resume Thanks for the logs! The /proc/interrupts shows that it only gets one PME interrupt but the driver seems to spin in the worker function trying to identify the originating device. I'm attaching a patch to disable PME interrupt over suspend can you try whether it makes the system more responsive?
Created attachment 284231 [details] interrupts before/after suspend (In reply to Mika Westerberg from comment #21) > Created attachment 284229 [details] > Disable PME interrupt during suspend/resume > > Thanks for the logs! The /proc/interrupts shows that it only gets one PME > interrupt but the driver seems to spin in the worker function trying to > identify the originating device. > > I'm attaching a patch to disable PME interrupt over suspend can you try > whether it makes the system more responsive? It helps with debugging, but does not make the computer usable. Same PME spam. If I demote that PME Spurious interrupt to pci_dbg(), I am rid of the PME Spurious interrupt loging (/proc/interrupts before/after attached), but the system still winds itself up unusable after a short time, with processes stuck in state D again, and kworker/0:6+events eating 100% of one core (6.5 % sys makes perfect sense on 8 cores x 2 threads), see top: top - 19:15:35 up 7 min, 1 user, load average: 3,66, 2,00, 0,94 Tasks: 347 total, 3 running, 344 sleeping, 0 stopped, 0 zombie %Cpu(s): 0,5 us, 6,5 sy, 0,0 ni, 86,6 id, 6,2 wa, 0,1 hi, 0,1 si, 0,0 st MiB Mem : 32125,6 total, 30148,2 free, 891,4 used, 1086,0 buff/cache MiB Swap: 16160,0 total, 16160,0 free, 0,0 used. 30816,8 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 1058 root 20 0 0 0 0 R 99,7 0,0 6:30.41 kworker/0:6+events 6092 root 20 0 226336 4436 3252 S 0,7 0,0 0:00.10 tmux: server 1169 root 20 0 0 0 0 I 0,3 0,0 0:00.10 kworker/4:4-events 6349 root 20 0 227364 4648 3824 R 0,3 0,0 0:00.04 top This core fires up its turbo and runs at an average of ~3.6 GHz and burns almost 19 Watt, per turbostat, which is ¼...⅓ of its design power... Core CPU Avg_MHz Busy% Bzy_MHz TSC_MHz IRQ POLL C1 C2 POLL% C1% C2% CorWatt PkgWatt - - 239 6.85 3496 3000 16824 1 5385 3596 0.00 6.10 87.15 4.76 18.81 0 0 3586 100.00 3586 3000 5051 0 0 0 0.00 0.00 0.00 3.86 18.81 0 1 6 0.17 3241 3000 1367 0 269 640 0.00 5.07 94.79 1 2 4 0.32 1393 3000 897 1 268 347 0.00 3.31 96.45 0.06 1 3 2 0.15 1389 3000 264 0 146 105 0.00 2.66 97.25 ...
Created attachment 284245 [details] PME debugging It looks like it is the PME worker function that spins interrupts disabled causing the unresponsiveness. Can you try the attached patch? It should add bit more debugging so we hopefully understand more why that happens. Please also enable PCI debugging as you did previously.
Created attachment 284249 [details] debug logs with 284245 (In reply to Mika Westerberg from comment #23) > Created attachment 284245 [details] > PME debugging > > It looks like it is the PME worker function that spins interrupts disabled > causing the unresponsiveness. Can you try the attached patch? It should add > bit more debugging so we hopefully understand more why that happens. Please > also enable PCI debugging as you did previously. Find the usual debug dumps attached as .tar.xz, 5.3-rc3 with only your patch on top. This was with debugging info per: echo 'module *pci* +p' >/sys/kernel/debug/dynamic_debug/control
Created attachment 284251 [details] Updated PME debugging Thanks for the logs! It looks like either the root port PME status is not cleared or xHCI continues to send PMEs even if disabled. I've updated the patch a bit so that it logs bit more and also bails out from the loop after 5000 iterations. Can you give it a try as well?
Created attachment 284253 [details] Fixed updated PME debugging Please try this one instead of the previous buggy one.
Created attachment 284257 [details] usual debug logs w/ 5.3-rc3 and patch 284253 (In reply to Mika Westerberg from comment #26) > Created attachment 284253 [details] > Fixed updated PME debugging > > Please try this one instead of the previous buggy one. Hum... the dmesg4.txt isn't really useful because it misses the time right after resume, please also check the journalctl.txt (note: 11 MB, choose your viewer wisely) which I took from journalctl -b-1 after next boot.
Created attachment 284459 [details] tree lspci and inxi from affected system I seem to be affected by this bug, on 5.2.8. Same story, xHCI controller and "PME: Spurious native interrupt!" spam in dmesg, filling journal and making machine unusable. You can throw patches at me as well.
The fix (revert) should be already in v5.3-rc4 and possibly in latest 5.2.x as well.