Created attachment 303500
lspci -vv output without pcie_aspm=off

I recently purchased a Thunderbolt 4 dock (CalDigit TS4) and started getting millions of these warnings in my logs after resuming from sleep. I previously didn't have any Thunderbolt peripherals. The device is a ThinkPad X1 Extreme Gen 5 (BIOS 1.12 N3JET28W, EC 1.08 N3JHT21W).

Dec 29 18:51:05 FredArch systemd[1]: Starting System Suspend...
Dec 29 18:51:05 FredArch systemd-sleep[31007]: Entering sleep state 'suspend'...
Dec 29 18:51:05 FredArch kernel: PM: suspend entry (s2idle)
Dec 29 18:51:07 FredArch kernel: Filesystems sync: 1.566 seconds
Dec 29 18:52:30 FredArch kernel: Freezing user space processes ... (elapsed 0.001 seconds) done.
Dec 29 18:52:30 FredArch kernel: OOM killer disabled.
Dec 29 18:52:30 FredArch kernel: Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
Dec 29 18:52:30 FredArch kernel: printk: Suspending console(s) (use no_console_suspend to debug)
Dec 29 18:52:30 FredArch kernel: ACPI: EC: interrupt blocked
Dec 29 18:52:30 FredArch kernel: ACPI: EC: interrupt unblocked
Dec 29 18:52:30 FredArch kernel: pcieport 0000:00:1d.0: AER: Multiple Corrected error received: 0000:21:01.0
Dec 29 18:52:30 FredArch kernel: pcieport 0000:21:01.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
Dec 29 18:52:30 FredArch kernel: pcieport 0000:21:01.0: device [8086:1136] error status/mask=00001100/00002000
Dec 29 18:52:30 FredArch kernel: pcieport 0000:21:01.0: [ 8] Rollover
Dec 29 18:52:30 FredArch kernel: pcieport 0000:21:01.0: [12] Timeout
Dec 29 18:52:30 FredArch kernel: pcieport 0000:21:01.0: AER: Error of this Agent is reported first
Dec 29 18:52:30 FredArch kernel: pcieport 0000:23:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
Dec 29 18:52:30 FredArch kernel: pcieport 0000:23:00.0: device [8086:0b26] error status/mask=00001000/00002000
Dec 29 18:52:30 FredArch kernel: pcieport 0000:23:00.0: [12] Timeout
Dec 29 18:52:30 FredArch kernel: pcieport 0000:00:1d.0: AER: Corrected error received: 0000:21:01.0
Dec 29 18:52:30 FredArch kernel: pcieport 0000:21:01.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
Dec 29 18:52:30 FredArch kernel: pcieport 0000:21:01.0: device [8086:1136] error status/mask=00001100/00002000
Dec 29 18:52:30 FredArch kernel: pcieport 0000:21:01.0: [ 8] Rollover
Dec 29 18:52:30 FredArch kernel: pcieport 0000:21:01.0: [12] Timeout

$ cat /proc/version
Linux version 6.1.1-arch1-1 (linux@archlinux) (gcc (GCC) 12.2.0, GNU ld (GNU Binutils) 2.39.0) #1 SMP PREEMPT_DYNAMIC Wed, 21 Dec 2022 22:27:55 +0000

$ lspci -nn
00:00.0 Host bridge [0600]: Intel Corporation 12th Gen Core Processor Host Bridge/DRAM Registers [8086:4641] (rev 02)
00:01.0 PCI bridge [0604]: Intel Corporation 12th Gen Core Processor PCI Express x16 Controller #1 [8086:460d] (rev 02)
00:04.0 Signal processing controller [1180]: Intel Corporation Alder Lake Innovation Platform Framework Processor Participant [8086:461d] (rev 02)
00:06.0 PCI bridge [0604]: Intel Corporation 12th Gen Core Processor PCI Express x4 Controller #0 [8086:464d] (rev 02)
00:08.0 System peripheral [0880]: Intel Corporation 12th Gen Core Processor Gaussian & Neural Accelerator [8086:464f] (rev 02)
00:0a.0 Signal processing controller [1180]: Intel Corporation Platform Monitoring Technology [8086:467d] (rev 01)
00:14.0 USB controller [0c03]: Intel Corporation Alder Lake PCH USB 3.2 xHCI Host Controller [8086:51ed] (rev 01)
00:14.2 RAM memory [0500]: Intel Corporation Alder Lake PCH Shared SRAM [8086:51ef] (rev 01)
00:14.3 Network controller [0280]: Intel Corporation Alder Lake-P PCH CNVi WiFi [8086:51f0] (rev 01)
00:15.0 Serial bus controller [0c80]: Intel Corporation Alder Lake PCH Serial IO I2C Controller #0 [8086:51e8] (rev 01)
00:16.0 Communication controller [0780]: Intel Corporation Alder Lake PCH HECI Controller [8086:51e0] (rev 01)
00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:51b8] (rev 01)
00:1c.7 PCI bridge [0604]: Intel Corporation Alder Lake PCH-P PCI Express Root Port #9 [8086:51bf] (rev 01)
00:1d.0 PCI bridge [0604]: Intel Corporation Device [8086:51b0] (rev 01)
00:1f.0 ISA bridge [0601]: Intel Corporation Alder Lake PCH eSPI Controller [8086:5182] (rev 01)
00:1f.3 Multimedia audio controller [0401]: Intel Corporation Alder Lake PCH-P High Definition Audio Controller [8086:51c8] (rev 01)
00:1f.4 SMBus [0c05]: Intel Corporation Alder Lake PCH-P SMBus Host Controller [8086:51a3] (rev 01)
00:1f.5 Serial bus controller [0c80]: Intel Corporation Alder Lake-P PCH SPI Controller [8086:51a4] (rev 01)
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA103M [GeForce RTX 3080 Ti Mobile] [10de:2420] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:2288] (rev a1)
04:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd Device [144d:a80c]
0a:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS5261 PCI Express Card Reader [10ec:5261] (rev 01)
20:00.0 PCI bridge [0604]: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] [8086:1136] (rev 02)
21:00.0 PCI bridge [0604]: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] [8086:1136] (rev 02)
21:01.0 PCI bridge [0604]: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] [8086:1136] (rev 02)
21:02.0 PCI bridge [0604]: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] [8086:1136] (rev 02)
21:03.0 PCI bridge [0604]: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] [8086:1136] (rev 02)
22:00.0 USB controller [0c03]: Intel Corporation Thunderbolt 4 NHI [Maple Ridge 4C 2020] [8086:1137]
23:00.0 PCI bridge [0604]: Intel Corporation Thunderbolt 4 Bridge [Goshen Ridge 2020] [8086:0b26] (rev 03)
24:00.0 PCI bridge [0604]: Intel Corporation Thunderbolt 4 Bridge [Goshen Ridge 2020] [8086:0b26] (rev 03)
24:01.0 PCI bridge [0604]: Intel Corporation Thunderbolt 4 Bridge [Goshen Ridge 2020] [8086:0b26] (rev 03)
24:02.0 PCI bridge [0604]: Intel Corporation Thunderbolt 4 Bridge [Goshen Ridge 2020] [8086:0b26] (rev 03)
24:03.0 PCI bridge [0604]: Intel Corporation Thunderbolt 4 Bridge [Goshen Ridge 2020] [8086:0b26] (rev 03)
24:04.0 PCI bridge [0604]: Intel Corporation Thunderbolt 4 Bridge [Goshen Ridge 2020] [8086:0b26] (rev 03)
55:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller (2) I225-LMvP [8086:5502] (rev 03)
56:00.0 USB controller [0c03]: Intel Corporation Thunderbolt 4 USB Controller [Maple Ridge 4C 2020] [8086:1138]

It happens every time after resuming from sleep. pcie_aspm=off solved the issue for me.

Some related posts I found: [1][2]. Maybe we need some quirk patches like [3]?

[1] https://bbs.archlinux.org/viewtopic.php?id=274935
[2] https://askubuntu.com/questions/1394924/35-gb-day-of-pcie-bus-error-severity-corrected-type-data-link-layer-in-sy
[3] https://lkml.iu.edu/hypermail/linux/kernel/2008.0/01418.html
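
For anyone reading the "error status/mask" fields in the log above, here is a minimal Python sketch that decodes them into named bits. It assumes the values are the AER Correctable Error Status and Mask registers and uses the short names the kernel's AER driver prints for each bit; the function name decode_status_mask is made up for this illustration and is not part of any tool.

```python
# Short names for the AER Correctable Error Status bits, as printed by the
# kernel's AER driver. Bit positions follow the PCIe AER capability layout.
CORRECTABLE_BITS = {
    0: "RxErr",
    6: "BadTLP",
    7: "BadDLLP",
    8: "Rollover",
    12: "Timeout",
    13: "NonFatalErr",
    14: "CorrIntErr",
    15: "HeaderOF",
}


def decode_status_mask(field):
    """Turn a 'status/mask' hex pair into two lists of bit names.

    decode_status_mask is a hypothetical helper written for this example.
    """
    status_hex, mask_hex = field.split("/")

    def names(value):
        # Unknown or reserved bits are reported as "bitN" instead of dropped.
        return [
            CORRECTABLE_BITS.get(bit, f"bit{bit}")
            for bit in range(32)
            if (value >> bit) & 1
        ]

    return names(int(status_hex, 16)), names(int(mask_hex, 16))


if __name__ == "__main__":
    # The two status/mask values that appear in the log above.
    for field in ("00001100/00002000", "00001000/00002000"):
        status, mask = decode_status_mask(field)
        print(field, "status:", status, "masked:", mask)
```

With the values from the log, the status bits decode to Rollover and Timeout (matching the "[ 8]" and "[12]" lines), and the only masked bit is the advisory non-fatal one.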
Created attachment 303501
lspci -vv output with pcie_aspm=off
I just realised that pcie_aspm=off broke most of my dock's functions. I still had Ethernet but wake-on-lan stopped working. The dock's Thunderbolt ports, USB Type-A/C data ports, and SD card slots all stopped working too (nothing at all was logged after plugging things in).

Then I tested pcie_aspm.policy=performance. The dock started working again but the warning logs were also back.

I also tried applying quirk_disable_aspm_l0s_l1 to the Thunderbolt bridges, but unfortunately the warnings were still logged.

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 285acc4aaccc..495e976606b6 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -2393,8 +2393,11 @@ static void quirk_disable_aspm_l0s_l1(struct pci_dev *dev)
  * disable both L0s and L1 for now to be safe.
  */
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ASMEDIA, 0x1080, quirk_disable_aspm_l0s_l1);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x1136, quirk_disable_aspm_l0s_l1);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x0b26, quirk_disable_aspm_l0s_l1);
+

 /*
  * Some Pericom PCIe-to-PCI bridges in reverse mode need the PCIe Retrain
  * Link bit cleared after starting the link retrain process to allow this
  * process to finish.

I also noticed that the warning logs stopped once I plugged something into the dock (an NVMe enclosure or an SD card), and started again once I ran `udisksctl power-off`. This was without any kernel parameters or patches.
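
When comparing boots with different pcie_aspm.policy= values, it can help to confirm which policy the kernel actually ended up with. A minimal Python sketch follows, assuming the usual sysfs file /sys/module/pcie_aspm/parameters/policy, which lists the available policies with the active one in square brackets. The helper name parse_policy and the sample string in the usage note are illustrative only, not output captured from this machine.

```python
from pathlib import Path

# Standard location of the ASPM policy parameter on kernels built with ASPM
# support. Whether it is present on a given boot is not checked here.
POLICY_PATH = Path("/sys/module/pcie_aspm/parameters/policy")


def parse_policy(text):
    """Return (active_policy, all_policies) from the sysfs file contents.

    parse_policy is a hypothetical helper written for this example.
    """
    words = text.split()
    # The active policy is the one wrapped in square brackets.
    active = next(
        (w[1:-1] for w in words if w.startswith("[") and w.endswith("]")),
        None,
    )
    available = [w.strip("[]") for w in words]
    return active, available


if __name__ == "__main__":
    try:
        active, available = parse_policy(POLICY_PATH.read_text())
        print("active:", active, "available:", available)
    except OSError as exc:
        print(f"cannot read {POLICY_PATH}: {exc}")
```

For example, parse_policy("default [performance] powersave powersupersave") returns "performance" as the active policy together with the full list of four.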
I noticed that disabling ACPI wakeup for the root port above the Thunderbolt controller (RP09, 00:1d.0) stops the warning log flooding.

$ lspci -t
-[0000:00]-+-00.0
           +-01.0-[01]--+-00.0
           |            \-00.1
           +-04.0
           +-06.0-[04]----00.0
           +-08.0
           +-0a.0
           +-14.0
           +-14.2
           +-14.3
           +-15.0
           +-16.0
           +-1c.0-[08]--
           +-1c.7-[0a]----00.0
           +-1d.0-[20-89]----00.0-[21-89]--+-00.0-[22]----00.0
           |                               +-01.0-[23-55]----00.0-[24-55]--+-00.0-[25]--
           |                               |                               +-01.0-[26-34]--
           |                               |                               +-02.0-[35-43]--
           |                               |                               +-03.0-[44-54]--
           |                               |                               \-04.0-[55]----00.0
           |                               +-02.0-[56]----00.0
           |                               \-03.0-[57-89]--
           +-1f.0
           +-1f.3
           +-1f.4
           \-1f.5

$ cat /proc/acpi/wakeup
Device  S-state  Status     Sysfs node
PEG0    S4       *enabled   pci:0000:00:06.0
PEGP    S4       *disabled  pci:0000:04:00.0
PEG1    S4       *enabled   pci:0000:00:01.0
PEGP    S4       *disabled  pci:0000:01:00.0
PEG2    S4       *disabled
PEGP    S4       *disabled
XHCI    S3       *enabled   pci:0000:00:14.0
XDCI    S4       *disabled
HDAS    S4       *disabled  pci:0000:00:1f.3
CNVW    S4       *disabled  pci:0000:00:14.3
RP01    S4       *enabled   pci:0000:00:1c.0
PXSX    S4       *disabled
RP02    S4       *disabled
PXSX    S4       *disabled
RP03    S4       *disabled
PXSX    S4       *disabled
RP04    S4       *disabled
PXSX    S4       *disabled
PXSX    S4       *disabled
RP06    S4       *disabled
PXSX    S4       *disabled
RP07    S4       *disabled
PXSX    S4       *disabled
RP08    S4       *enabled   pci:0000:00:1c.7
PXSX    S4       *disabled  pci:0000:0a:00.0
                 *disabled  platform:rtsx_pci_sdmmc.0
RP09    S4       *enabled   pci:0000:00:1d.0
PXSX    S4       *enabled   pci:0000:20:00.0
RP10    S4       *disabled
PXSX    S4       *disabled
RP11    S4       *disabled
PXSX    S4       *disabled
RP12    S4       *disabled
PXSX    S4       *disabled
RP13    S4       *disabled
PXSX    S4       *disabled
RP14    S4       *disabled
PXSX    S4       *disabled
RP15    S4       *disabled
PXSX    S4       *disabled
RP16    S4       *disabled
PXSX    S4       *disabled
RP17    S4       *disabled
PXSX    S4       *disabled
RP18    S4       *disabled
PXSX    S4       *disabled
RP19    S4       *disabled
PXSX    S4       *disabled
RP20    S4       *disabled
PXSX    S4       *disabled
RP21    S4       *disabled
PXSX    S4       *disabled
RP22    S4       *disabled
PXSX    S4       *disabled
RP23    S4       *disabled
PXSX    S4       *disabled
RP24    S4       *disabled
PXSX    S4       *disabled
RP25    S4       *disabled
PXSX    S4       *disabled
RP26    S4       *disabled
PXSX    S4       *disabled
RP27    S4       *disabled
PXSX    S4       *disabled
RP28    S4       *disabled
PXSX    S4       *disabled
AWAC    S4       *enabled   platform:ACPI000E:00
SLPB    S3       *enabled   platform:PNP0C0E:00
LID     S4       *enabled   platform:PNP0C0D:00

$ echo RP09 | sudo tee /proc/acpi/wakeup
RP09
$ grep RP09 /proc/acpi/wakeup
RP09    S4       *disabled  pci:0000:00:1d.0

Wake-on-LAN from S3 stopped working (as expected) though.
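
To find which /proc/acpi/wakeup entry belongs to a given PCI device without reading the table by eye, something like the following Python sketch could be used. It only parses text in the format shown above; the helper names (parse_wakeup, find_device) are hypothetical, and SAMPLE is a shortened excerpt of the table from this comment, not a full dump.

```python
# Shortened excerpt of the /proc/acpi/wakeup table from this report.
SAMPLE = """\
Device S-state Status Sysfs node
RP08 S4 *enabled pci:0000:00:1c.7
PXSX S4 *disabled pci:0000:0a:00.0
 *disabled platform:rtsx_pci_sdmmc.0
RP09 S4 *enabled pci:0000:00:1d.0
PXSX S4 *enabled pci:0000:20:00.0
RP10 S4 *disabled
"""


def parse_wakeup(text):
    """Parse /proc/acpi/wakeup text into a list of dicts (hypothetical helper)."""
    entries = []
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Continuation rows carry only a status and a node, so they have
        # fewer than three fields and are skipped here.
        if len(fields) < 3 or not fields[2].startswith("*"):
            continue
        entries.append({
            "device": fields[0],
            "s_state": fields[1],
            "enabled": fields[2] == "*enabled",
            "node": fields[3] if len(fields) > 3 else None,
        })
    return entries


def find_device(entries, node):
    """Return the ACPI device name bound to a sysfs node, or None."""
    for entry in entries:
        if entry["node"] == node:
            return entry["device"]
    return None


if __name__ == "__main__":
    entries = parse_wakeup(SAMPLE)
    name = find_device(entries, "pci:0000:00:1d.0")
    # Writing the name to /proc/acpi/wakeup toggles it, as in the commands above.
    print(f"root port entry: {name}")
    print("enabled:", [e["device"] for e in entries if e["enabled"]])
```

On a live system the same functions can be fed the contents of /proc/acpi/wakeup instead of SAMPLE. The sketch deliberately does not write to the file, since each write toggles the state rather than setting it.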
Frederick, do you think this could be a problem with the adapter itself? Are you able to test with another device, maybe one from the IT group or a friend?