Created attachment 241651 [details] Device specific ACS enable skipping RR enabling

The Syba dual-port NIC, model SD-PEX24033[1], uses a Pericom switch (PI7C9X2G404SLAFDE-1331GT[2]) with two Realtek NIC controllers (RTL8111F-E411631-GE28). With an IOMMU enabled (on either Intel or AMD x86_64 hosts), the device fails to work, while it works normally with the IOMMU disabled (iommu=pt makes no difference). No DMA faults are reported in the non-working case.

In the non-working case the NIC is simply unable to transmit packets. A DHCP request times out, ifconfig reports zero TX packets, and a tcpdump from another NIC shows no evidence of requests from the device. The RX count on the 8111F is non-zero and a tcpdump on the NIC itself does see incoming packets.

Enabling ACS on the downstream switch ports appears to be the cause. Using setpci to clear the ACS control word on each downstream port allows the downstream NIC to operate normally. Through trial and error, I've found that enabling P2P Request Redirect (RR) is specifically to blame.

Creating a device specific ACS enable function, as shown in the attached patch, avoids this problem at the cost of reduced isolation for devices downstream of these ports. Without this patch each RTL8111F NIC appears in its own IOMMU group (but of course doesn't work); with it, each NIC shares a group with the downstream switch port above it. If the NIC were a multifunction device, all functions would be grouped together.

I'm attempting to contact both Pericom and Syba support to check whether this is a known issue with this switch or particular to this configuration. I'm also not sure whether the device ID 0x2404 is used more widely on other Pericom products, which may or may not exhibit this issue (I hope that 0x2404 is based on the product ID, but without access to more devices or the datasheets, which are available only on request, I can only guess).
The product page[2] does indicate that a 'B' rev chip is available, which is newer than the one I have, but the PCN does not mention this particular issue.

 +-03.0-[02-06]----00.0-[03-06]--+-01.0-[04]----00.0  Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
 |                               +-02.0-[05]--
 |                               \-03.0-[06]----00.0  Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller

[1] http://www.sybausa.com/index.php?route=product/product&product_id=131
[2] https://www.pericom.com/products/pcie-switch/part/PI7C9X2G404SL
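For anyone reading without the attachment, here is a rough user-space sketch of the idea behind a device specific ACS enable that skips RR. It is not the attached patch: the function name is hypothetical, and it assumes that the standard enable path turns on Source Validation, Request Redirect, Completion Redirect and Upstream Forwarding wherever the capability register advertises them. The bit values are the ACS control bits from the PCIe spec.

```c
#include <stdint.h>

/* ACS capability/control bits (PCIe spec; same values as the
 * PCI_ACS_* definitions in Linux's pci_regs.h). */
#define ACS_SV 0x01u /* Source Validation */
#define ACS_TB 0x02u /* Translation Blocking */
#define ACS_RR 0x04u /* P2P Request Redirect */
#define ACS_CR 0x08u /* P2P Completion Redirect */
#define ACS_UF 0x10u /* Upstream Forwarding */

/*
 * Hypothetical helper: given the ACS capability word and the current
 * ACS control word of a downstream port, return the control word to
 * write back. Every supported isolation feature is enabled except
 * P2P Request Redirect, which is forced off.
 */
uint16_t acs_ctrl_without_rr(uint16_t cap, uint16_t ctrl)
{
    ctrl |= cap & ACS_SV;        /* enable only if advertised */
    ctrl |= cap & ACS_CR;
    ctrl |= cap & ACS_UF;
    ctrl &= (uint16_t)~ACS_RR;   /* assumption: clear RR if already set */
    return ctrl;
}
```

In a real quirk the two inputs would come from config space reads of the ACS capability and control registers on the downstream port, and the result would be written back to the control register.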
Created attachment 241661 [details] lspci -nnvvv from device components
Note that the lspci output shows ACS RR disabled because it was captured while running with the attached patch, which disables it. The same can be achieved manually with 'setpci -s $DEV 226.w=0:4' for each downstream switch port.

Note that once the device is in the non-working state, re-configuring ACS does not seem to fix it; a reboot is required. The manual fixup should be applied before loading the r8169 driver, or at least before configuring the interface up.

(Feel free to assign this bug to me; I just want to document it so I don't lose it while I wait to see if I get any info from the vendor.)
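To spell out what the setpci invocation above does: setpci's 'reg=value:mask' form changes only the bits set in the mask, so '226.w=0:4' clears bit 2 (P2P Request Redirect) of the 16-bit word at offset 0x226 and leaves the other bits alone. Offset 0x226 is presumably the ACS control register on these ports (ACS capability base plus 6). A minimal model of that read-modify-write, with a hypothetical function name:

```c
#include <stdint.h>

/*
 * Hypothetical model of setpci's "reg=value:mask" semantics for a
 * 16-bit register: bits selected by 'mask' take their value from
 * 'value', all other bits keep their old contents.
 */
uint16_t masked_write16(uint16_t old, uint16_t value, uint16_t mask)
{
    return (uint16_t)((old & ~mask) | (value & mask));
}
```

For example, a control word of 0x1d (SV, RR, CR and UF enabled) becomes 0x19 after applying value 0 with mask 4, which is RR cleared and everything else untouched.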
Another question is whether it's worthwhile to enable ACS at all if we cannot enable RR. A simpler patch might be an enable function that is a no-op.