I noticed that my desktop's network card was slow.
I was getting ~40 Mbit/s instead of the expected ~933 Mbit/s when connecting to machines on the internet (~6 hops); locally everything was fine.
After some investigation, with help from Alexander Duyck, we found that disabling L1 ASPM on the network card fixed it.
That led me to dig through the code, where I eventually found that the ASPM latency check was incorrect; correcting it also fixes my issue as a side effect.
Created attachment 293047
lspci without my patches
Created attachment 293049
lspci with my patches
With the fix, L1 ASPM is disabled on the 0000:01:00.0-0000:00:01.2 link while handling 04:00.0, and both devices are connected behind that link.
The PCIe paths for the devices are:
00:01.2/01:00.0/02:03.0/03:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
00:01.2/01:00.0/02:04.0/04:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. Device 816e (rev 1a)
So they share 01:00.0 as a switch.
                   Exit latency        Acceptable latency
Tree:              L1       L0s        L1       L0s
----------         -------  -----      -------  ------
00:01.2            <32 us   -
| 01:00.0          <32 us   -
|- 02:03.0         <32 us   -
| \03:00.0         <16 us   <2us       <64 us   <512ns
|- 02:04.0         <32 us   -
  \04:00.0         <64 us   unlimited  <64 us   <512ns
04:00.0 has its own maximum acceptable latency; as we walk the path, the first switch pushes the total past that limit, which is currently not detected.
For my system the unlimited L0s could also be a problem with bus stalls...
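To make the walk described above concrete, here is a rough Python sketch of the check as I understand it. It is not the kernel code. The assumptions: each "<N us" bound from the table is treated as N us, a link's L1 exit latency is the larger of its two ends, and every switch on the way up adds 1 us. The function and variable names are made up for illustration.

```python
# Sketch only, not the kernel implementation.
SWITCH_LATENCY_US = 1  # assumed extra L1 latency added per switch on the path


def links_to_disable_l1(path_links, acceptable_us):
    """Walk from the endpoint towards the root and return the links where
    L1 would have to be disabled.

    path_links: list of (link_name, l1_exit_us), endpoint link first.
    acceptable_us: the endpoint's acceptable L1 latency.
    """
    disabled = []
    max_exit_us = 0  # worst exit latency seen so far on the path
    switch_us = 0    # latency accumulated from switches already passed
    for name, exit_us in path_links:
        max_exit_us = max(max_exit_us, exit_us)
        if max_exit_us + switch_us > acceptable_us:
            disabled.append(name)
        switch_us += SWITCH_LATENCY_US
    return disabled


# Link latency = max of both ends, taken from the table above.
path_04 = [("02:04.0-04:00.0", 64), ("00:01.2-01:00.0", 32)]
path_03 = [("02:03.0-03:00.0", 32), ("00:01.2-01:00.0", 32)]

print(links_to_disable_l1(path_04, 64))  # ['00:01.2-01:00.0']
print(links_to_disable_l1(path_03, 64))  # []
```

Under these assumptions only the walk from 04:00.0 trips the limit (64 + 1 > 64), and it does so on the shared 00:01.2-01:00.0 link. A check that looked only at that link's own latency (32 + 1) would not catch it.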
ASUS Pro WS X570-ACE with AMD Ryzen 9 3900X
BIOS Version: 2206, Release Date: 08/13/2020
Apparently there is a newer BIOS available (Version 3003, 2020/12/07).
Ian was able to work around the I211 NIC performance issue with:
# echo 0 > /sys/devices/pci0000:00/0000:00:01.2/0000:01:00.0/link/l1_aspm
The I211 NIC is at 03:00.0 in his system, and the path to it is:
00:01.2 --- 01:00.0 -- 02:03.0 --- 03:00.0
The shell command above disables ASPM L1 on the link from 00:01.2 to 01:00.0. The "l1_aspm" sysfs file was added in v5.5.
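For anyone who wants to script the same workaround, here is a small Python sketch that builds the sysfs path from the device chain and writes to it. The helper names are hypothetical, and it assumes PCI domain 0000 with root bus 00 as in the command above. Writing the file needs root and a kernel that has the "l1_aspm" attribute.

```python
from pathlib import Path


def l1_aspm_path(bdf_chain, domain="0000", sysfs_root="/sys/devices"):
    """Return the l1_aspm sysfs path for the last device in bdf_chain.

    bdf_chain: bus:device.function strings from the root port downwards,
    e.g. ["00:01.2", "01:00.0"]. Root bus 00 is an assumption.
    """
    path = Path(sysfs_root) / f"pci{domain}:00"
    for bdf in bdf_chain:
        path = path / f"{domain}:{bdf}"
    return path / "link" / "l1_aspm"


def set_l1_aspm(bdf_chain, enabled, **kwargs):
    """Write 1 (enable) or 0 (disable) to the l1_aspm attribute."""
    l1_aspm_path(bdf_chain, **kwargs).write_text("1" if enabled else "0")


print(l1_aspm_path(["00:01.2", "01:00.0"]))
```

Calling set_l1_aspm(["00:01.2", "01:00.0"], False) as root should be equivalent to the echo command above.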
More details in the thread at