Bug 218556
| Summary: | high number of messages "PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)" | | |
|---|---|---|---|
| Product: | Drivers | Reporter: | Harald Dunkel (harri) |
| Component: | PCI | Assignee: | drivers_pci (drivers_pci) |
| Status: | RESOLVED ANSWERED | | |
| Severity: | normal | | |
| Priority: | P3 | | |
| Hardware: | AMD | | |
| OS: | Linux | | |
| Kernel Version: | | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
| Attachments: | dmesg -T of a few days; lspci -vxxxxxx | | |
Created attachment 305955 [details]
lspci -vxxxxxx
Try booting with pci=noaer, or turn off ASPM in the BIOS. This is not limited to Linux: https://www.reddit.com/r/intel/comments/17qftj1/whea_corrected_errors_event_id_17_every_once_in_a/

You could also try disabling ASPM for just this device: https://bbs.archlinux.org/viewtopic.php?id=264364

Anyway, it's a hardware issue that needs to be reported to Intel.

There is no BIOS option to turn ASPM off on this host, but I can move control over ASPM from the BIOS to the operating system and boot Linux with pcie_aspm=off. This seems to have worked; the warning is gone.
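To confirm after a reboot that one of the workarounds mentioned above is actually active, the running kernel's command line can be inspected. The following is only a minimal sketch: the helper names `parse_cmdline` and `aspm_workarounds` are made up for illustration, and the parser assumes simple whitespace-separated parameters (it does not handle quoted values containing spaces).

```python
from pathlib import Path


def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None.

    Assumption: parameters are whitespace-separated and contain no quoted spaces.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def aspm_workarounds(cmdline):
    """Report which of the two boot options discussed in this bug are present."""
    params = parse_cmdline(cmdline)
    # "pci=" takes a comma-separated option list, e.g. "pci=nomsi,noaer".
    pci_opts = (params.get("pci") or "").split(",")
    return {
        "pcie_aspm=off": params.get("pcie_aspm") == "off",
        "pci=noaer": "noaer" in pci_opts,
    }


if __name__ == "__main__":
    try:
        # /proc/cmdline holds the command line of the running kernel.
        current = Path("/proc/cmdline").read_text()
    except OSError:
        current = ""  # not on Linux, or /proc is unavailable
    print(aspm_workarounds(current))
```

Note that pci=noaer only stops the kernel from handling and reporting AER events, whereas pcie_aspm=off disables ASPM, which is the change that made the warning go away for the reporter.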
Created attachment 305954 [details]
dmesg -T of a few days

I get a pretty high number of messages:

[Mon Mar 4 00:00:58 2024] pcieport 0000:00:06.0: AER: Corrected error message received from 0000:00:06.0
[Mon Mar 4 00:00:58 2024] pcieport 0000:00:06.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[Mon Mar 4 00:00:58 2024] pcieport 0000:00:06.0: device [8086:a74d] error status/mask=00000001/00002000
[Mon Mar 4 00:00:58 2024] pcieport 0000:00:06.0: [ 0] RxErr (First)

dmesg and lspci are attached. Platform is Debian 12 amd64, self-built kernel 6.7.6. I have seen these messages using Debian's backports kernel 6.5.10 and the default kernel 6.1.76 as well.

Between Jan 11th and Mar 4th I got >2000 messages about this, all for "device [8086:a74d]".
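For anyone wanting to quantify how often these messages occur, a short script can tally them per device from saved `dmesg -T` output. This is only a sketch: the regular expression is derived from the four log lines quoted above and may need adjusting for other kernel versions, and the names `AER_LINE` and `count_aer_errors` are illustrative.

```python
import re
from collections import Counter

# Pattern based on the "PCIe Bus Error" line quoted in this report
# (assumption: other kernels may format the line differently).
AER_LINE = re.compile(
    r"\S+ (?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"PCIe Bus Error: severity=(?P<severity>[^,]+), type=(?P<type>[^,]+)"
)


def count_aer_errors(lines):
    """Count AER bus error lines, keyed by (device address, severity, type)."""
    counts = Counter()
    for line in lines:
        match = AER_LINE.search(line)
        if match:
            counts[(match["bdf"], match["severity"], match["type"])] += 1
    return counts


# The four lines from the report, used here as sample input.
SAMPLE = """\
[Mon Mar 4 00:00:58 2024] pcieport 0000:00:06.0: AER: Corrected error message received from 0000:00:06.0
[Mon Mar 4 00:00:58 2024] pcieport 0000:00:06.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[Mon Mar 4 00:00:58 2024] pcieport 0000:00:06.0: device [8086:a74d] error status/mask=00000001/00002000
[Mon Mar 4 00:00:58 2024] pcieport 0000:00:06.0: [ 0] RxErr (First)
"""

if __name__ == "__main__":
    for key, n in count_aer_errors(SAMPLE.splitlines()).items():
        print(key, n)
```

Only the "PCIe Bus Error" line is counted, so each reported event contributes once even though the kernel prints four lines for it.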