Bug 215453

Summary: Uncorrected errors reported from Thunderbolt ports
Product: Drivers
Reporter: Kai-Heng Feng (kai.heng.feng)
Component: PCI
Assignee: drivers_pci (drivers_pci)
Status: NEW
Severity: normal
CC: bjorn, frederick888, koba.ko, makagucci, mika.westerberg
Priority: P1
Hardware: All
OS: Linux
Kernel Version: mainline, linux-next
Subsystem:
Regression: No
Bisected commit-id:
Attachments: dmesg, lspci -vvnn, dmesg_20220105

Description Kai-Heng Feng 2022-01-05 06:03:56 UTC
[   30.100211] pcieport 0000:00:1d.0: AER: Uncorrected (Non-Fatal) error received: 0000:00:1d.0
[   30.100251] pcieport 0000:00:1d.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[   30.100256] pcieport 0000:00:1d.0:   device [8086:7ab0] error status/mask=00100000/00004000
[   30.100262] pcieport 0000:00:1d.0:    [20] UnsupReq               (First)
[   30.100267] pcieport 0000:00:1d.0: AER:   TLP Header: 34000000 08000052 00000000 00000000
[   30.100372] thunderbolt 0000:0a:00.0: AER: can't recover (no error_detected callback)
[   30.100401] xhci_hcd 0000:3e:00.0: AER: can't recover (no error_detected callback)
[   30.100427] pcieport 0000:00:1d.0: AER: device recovery failed
Comment 1 Kai-Heng Feng 2022-01-05 06:04:30 UTC
Created attachment 300227: dmesg
Comment 2 Kai-Heng Feng 2022-01-05 06:04:46 UTC
Created attachment 300228: lspci -vvnn
Comment 3 KobaKo 2022-01-06 03:47:20 UTC
After hotplugging devices on the Thunderbolt ports multiple times, the errors keep appearing.
See the attached log, dmesg_20220105.
Comment 4 KobaKo 2022-01-06 03:47:56 UTC
Created attachment 300233: dmesg_20220105
Comment 5 KobaKo 2022-01-06 04:09:15 UTC
Following up on comment 3: the kernel did not disable AER.
Comment 6 Frederick Zhang 2022-11-11 06:46:37 UTC
I wonder if [1] is the same issue?

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1990272
Comment 7 Bjorn Helgaas 2022-11-11 15:52:32 UTC
Yes, it looks like the same issue.  This should be resolved by https://git.kernel.org/linus/c01163dbd1b8 ("PCI/PM: Always disable PTM for all devices during suspend"), which appeared in v6.1-rc1.

The critical piece is:

  TLP Header: 34...... ......52

This means the TLP was a PTM Request from a downstream device.  The recipient logs an Unsupported Request error if it receives this request while PTM is disabled (as it is in low-power states).
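
For anyone who wants to check a similar log by hand, here is a minimal Python sketch of how the first two DWORDs of the logged TLP header can be decoded. It assumes the usual PCIe Message header layout (Fmt in DW0 bits 31:29, Type in bits 28:24, Requester ID in DW1 bits 31:16, Message Code in DW1 bits 7:0). The function and table names are hypothetical and are not kernel identifiers. The sketch only targets Message TLPs.

```python
# Illustrative decoder for the first two DWORDs of a logged TLP header.
# Field positions are assumed from the PCIe Message header layout; all
# names here are hypothetical, not kernel identifiers.

# Message routing subfield, Type[2:0], valid when Type[4:3] == 0b10.
ROUTING = {
    0b000: "Routed to Root Complex",
    0b001: "Routed by Address",
    0b010: "Routed by ID",
    0b011: "Broadcast from Root Complex",
    0b100: "Local - Terminate at Receiver",
    0b101: "Gathered and routed to Root Complex",
}

# Only the PTM message codes relevant to this report are listed.
MESSAGE_CODES = {
    0x52: "PTM Request",
    0x53: "PTM Response",
}


def decode_tlp_header(dw0: int, dw1: int) -> dict:
    """Decode Fmt/Type from DW0 and Requester ID/Message Code from DW1."""
    fmt = (dw0 >> 29) & 0x7            # DW0 bits 31:29
    tlp_type = (dw0 >> 24) & 0x1F      # DW0 bits 28:24
    is_message = (tlp_type >> 3) == 0b10

    req_id = (dw1 >> 16) & 0xFFFF      # DW1 bits 31:16
    bus = req_id >> 8
    dev = (req_id >> 3) & 0x1F
    fn = req_id & 0x7

    routing = None
    message = None
    if is_message:
        routing = ROUTING.get(tlp_type & 0x7, "reserved")
        message = MESSAGE_CODES.get(dw1 & 0xFF, "other")  # DW1 bits 7:0

    return {
        "fmt": fmt,
        "is_message": is_message,
        "routing": routing,
        "requester": f"{bus:02x}:{dev:02x}.{fn}",
        "message": message,
    }


# Header taken from the dmesg line in the description.
header = decode_tlp_header(0x34000000, 0x08000052)
print(header)
```

For the header in this report (34000000 08000052), the sketch reports a Message TLP with local routing, requester 08:00.0, and message code PTM Request, which matches the "34...... ......52" pattern called out above.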