Bug 208667
Summary: | PCIe AER errors on ASMedia ASM1083/1085 PCIe to PCI bridge with ASPM enabled | ||
---|---|---|---|
Product: | Platform Specific/Hardware | Reporter: | Robert Hancock (hancockrwd) |
Component: | x86-64 | Assignee: | platform_x86_64 (platform_x86_64) |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | ||
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 5.7.9 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
dmesg output (with pci=noaer set)
lspci -vvnnxxx output
Patch to disable ASPM on ASM1083/1085
lspci -vvnnxxx output under Windows 10 build 2004
Windows system error log entry |
Description
Robert Hancock
2020-07-23 00:39:15 UTC
Created attachment 290465 [details]
lspci -vvnnxxx output
Created attachment 290467 [details]
Patch to disable ASPM on ASM1083/1085
Here is a patch I wrote up which seems to fix the issue by disabling ASPM on these devices. I came across this page on ASMedia's site:
https://www.asmedia.com.tw/eng/e_show_products.php?cate_index=169&item=114
which indicates this device has "No PCIe ASPM support". It's not clear why this problem isn't occurring on Windows, however: either Windows is not enabling ASPM, ASPM somehow doesn't cause issues with the PCIe link there, or it is causing issues and Windows just doesn't notify the user in any way.
Created attachment 290471 [details]
lspci -vvnnxxx output under Windows 10 build 2004
It appears that Windows 10 (build 2004) has ASPM L0s enabled for this device according to this lspci output. In the Power Options window, the Balanced plan is selected, and under PCI Express, Link State Power Management, it's set to "Moderate power savings" which the tooltip says enables L0s.
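For anyone wanting to check the same thing from the raw register dump rather than the decoded lspci text, here is a minimal sketch of how the ASPM fields are laid out. It assumes the standard PCIe layout (ASPM Support in bits 11:10 of Link Capabilities, ASPM Control in bits 1:0 of Link Control). The function names and the register values in the usage lines are made up for illustration and are not taken from the attachments.

```python
# Names for the 2-bit ASPM fields defined by the PCIe spec.
ASPM_STATES = {0b00: "none", 0b01: "L0s", 0b10: "L1", 0b11: "L0s L1"}


def aspm_supported(lnkcap: int) -> str:
    """ASPM Support: bits 11:10 of the Link Capabilities register."""
    return ASPM_STATES[(lnkcap >> 10) & 0x3]


def aspm_enabled(lnkctl: int) -> str:
    """ASPM Control: bits 1:0 of the Link Control register."""
    return ASPM_STATES[lnkctl & 0x3]


if __name__ == "__main__":
    # Hypothetical register values, chosen only to exercise the decoder.
    print("supported:", aspm_supported(0x0C11))
    print("enabled:  ", aspm_enabled(0x0041))
```

With these example values the device would advertise both L0s and L1 while only L0s is actually enabled, which is the situation described above for Windows.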
Created attachment 290473 [details]
Windows system error log entry
However, Windows seems to have the exact same issue with AER errors occurring on this device. There are WHEA correctable hardware error event entries being logged for the PCI Express Root Port in the system event log (see screenshot).
The commit below has now been merged into mainline and should be in the next release after 5.8-rc7.

commit b361663c5a40c8bc758b7f7f2239f7a192180e7c
Author: Robert Hancock <hancockrwd@gmail.com>
Date: Tue Jul 21 20:18:03 2020 -0600

    PCI/ASPM: Disable ASPM on ASMedia ASM1083/1085 PCIe-to-PCI bridge

    Recently ASPM handling was changed to allow ASPM on PCIe-to-PCI/PCI-X
    bridges. Unfortunately the ASMedia ASM1083/1085 PCIe to PCI bridge device
    doesn't seem to function properly with ASPM enabled. On an Asus PRIME
    H270-PRO motherboard, it causes errors like these:

      pcieport 0000:00:1c.0: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
      pcieport 0000:00:1c.0: AER: device [8086:a292] error status/mask=00003000/00002000
      pcieport 0000:00:1c.0: AER: [12] Timeout
      pcieport 0000:00:1c.0: AER: Corrected error received: 0000:00:1c.0
      pcieport 0000:00:1c.0: AER: can't find device of ID00e0

    In addition to flooding the kernel log, this also causes the machine to
    wake up immediately after suspend is initiated.

    The device advertises ASPM L0s and L1 support in the Link Capabilities
    register, but the ASMedia web page for ASM1083 [1] claims "No PCIe ASPM
    support".

    Windows 10 (build 2004) enables L0s, but it also logs correctable PCIe
    errors.

    Add a quirk to disable ASPM for this device.

    [1] https://www.asmedia.com.tw/eng/e_show_products.php?cate_index=169&item=114

    [bhelgaas: commit log]
    Fixes: 66ff14e59e8a ("PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges")
    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=208667
    Link: https://lore.kernel.org/r/20200722021803.17958-1-hancockrwd@gmail.com
    Signed-off-by: Robert Hancock <hancockrwd@gmail.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
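For reference, the numbers in the AER lines quoted in the commit log can be decoded by hand. The sketch below is only an illustration, not part of the patch. It assumes the standard bit layout of the AER Correctable Error Status register and uses the short names the kernel prints for those bits. The helper names `decode_correctable` and `decode_requester_id` are made up here.

```python
# Bit positions in the AER Correctable Error Status/Mask registers,
# with the short names the Linux AER driver uses when printing them.
COR_STATUS_BITS = {
    0: "RxErr",
    6: "BadTLP",
    7: "BadDLLP",
    8: "Rollover",
    12: "Timeout",       # Replay Timer Timeout
    13: "NonFatalErr",   # Advisory Non-Fatal Error
    14: "CorrIntErr",
    15: "HeaderOF",
}


def decode_correctable(status: int, mask: int):
    """Return (bit, name, masked) for every known bit set in status."""
    return [
        (bit, name, bool((mask >> bit) & 1))
        for bit, name in sorted(COR_STATUS_BITS.items())
        if (status >> bit) & 1
    ]


def decode_requester_id(rid: int) -> str:
    """Split a 16-bit PCIe requester ID into bus:device.function."""
    bus = (rid >> 8) & 0xFF
    dev = (rid >> 3) & 0x1F
    fn = rid & 0x7
    return f"{bus:02x}:{dev:02x}.{fn}"


if __name__ == "__main__":
    # Values taken from "error status/mask=00003000/00002000" and "ID00e0".
    for bit, name, masked in decode_correctable(0x00003000, 0x00002000):
        print(f"[{bit:2d}] {name}{' (masked)' if masked else ''}")
    print(decode_requester_id(0x00E0))
```

Status 00003000 has bits 12 and 13 set, and mask 00002000 masks bit 13, which is consistent with the log showing only the "[12] Timeout" line. ID 00e0 works out to 00:1c.0, the root port itself.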