Bug 177471 - PCIe ACS incompatibility with Pericom device ID 0x2404 4-port PCIe switch
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_pci@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-10-13 20:18 UTC by Alex Williamson
Modified: 2017-06-12 15:41 UTC

See Also:
Kernel Version: v4.8
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Device specific ACS enable skipping RR enabling (1.11 KB, patch)
2016-10-13 20:18 UTC, Alex Williamson
lspci -nnvvv from device components (21.20 KB, text/plain)
2016-10-13 20:19 UTC, Alex Williamson

Description Alex Williamson 2016-10-13 20:18:58 UTC
Created attachment 241651 [details]
Device specific ACS enable skipping RR enabling

The Syba dual-port NIC, model SD-PEX24033[1], uses a Pericom switch (PI7C9X2G404SLAFDE-1331GT[2]) with two Realtek NIC controllers (RTL8111F-E411631-GE28).  With an IOMMU enabled (on either Intel or AMD x86_64 hosts), the device fails to work, while it works normally with the IOMMU disabled (iommu=pt makes no difference).  No DMA faults are reported in the non-working case.  The failure mode is that the NIC is simply unable to transmit packets: a DHCP request times out, ifconfig reports zero TX packets, and a tcpdump from another NIC shows no evidence of requests from the device.  The receive side appears unaffected: the RX count on the 8111F is non-zero and a tcpdump on the NIC itself is able to see packets.

Enabling ACS on the downstream switch ports appears to be the cause of this.  Clearing the ACS control word on each downstream port with setpci allows the downstream NIC to operate normally.  Through trial and error, I've found that enabling P2P Request Redirect (RR) is specifically to blame.  Creating a device-specific ACS enable function, as shown in the attached patch, avoids this problem at the cost of reduced isolation for devices downstream of these ports.  Without this patch, each RTL8111F NIC appears in its own IOMMU group (but of course doesn't work).  With it, each NIC shares a group with the downstream switch port above it.  If the NIC were a multifunction device, all functions would be grouped together.
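
As a rough illustration of the idea behind the attached patch (not the patch itself), the following Python sketch models how an ACS control word could be computed with Request Redirect left out.  The bit values match the kernel's pci_regs.h definitions.  The set of bits turned on by the standard enable path (SV, RR, CR, UF) is my reading of the v4.8-era code and should be checked against the source, and the capability value 0x1D is a hypothetical example, not taken from the attached lspci output.

```python
# ACS control register bits, as defined in the kernel's pci_regs.h
PCI_ACS_SV = 0x01  # Source Validation
PCI_ACS_TB = 0x02  # Translation Blocking
PCI_ACS_RR = 0x04  # P2P Request Redirect
PCI_ACS_CR = 0x08  # P2P Completion Redirect
PCI_ACS_UF = 0x10  # Upstream Forwarding

# Assumption: bits the standard enable path tries to turn on.
STANDARD_ENABLE = PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF


def acs_ctrl(cap, ctrl, skip=0):
    """Return a new ACS control word.

    Sets every bit of STANDARD_ENABLE that the capability register
    advertises, except the bits in `skip`, which are forced clear.
    """
    wanted = STANDARD_ENABLE & ~skip
    ctrl &= ~skip & 0xFFFF  # make sure skipped bits end up clear
    return ctrl | (cap & wanted)


# Hypothetical capability register advertising SV, RR, CR and UF.
cap = 0x1D
standard = acs_ctrl(cap, 0)                   # standard enable path
quirked = acs_ctrl(cap, 0, skip=PCI_ACS_RR)   # device-specific: no RR
print(hex(standard), hex(quirked))
```

With RR skipped, the switch no longer redirects peer-to-peer requests upstream, which is why the NIC and the downstream port above it end up in the same IOMMU group.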

I'm attempting to contact both Pericom and Syba support to check whether this is a known issue with this switch or particular to this configuration.  I'm also not sure whether the device ID 0x2404 is used more widely on other Pericom products, which may or may not exhibit this issue (I hope that 0x2404 is based on the product ID, but without access to more devices or the datasheets, which are available only on request, I can only guess).  The product page[2] does indicate that a 'B' rev chip is available, which is newer than the one I have, but the PCN does not mention this particular issue.

 +-03.0-[02-06]----00.0-[03-06]--+-01.0-[04]----00.0  Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
 |                               +-02.0-[05]--
 |                               \-03.0-[06]----00.0  Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
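
The IOMMU grouping described above can be checked on a running system by reading sysfs.  This is a minimal sketch, assuming the usual /sys/kernel/iommu_groups/<N>/devices/ layout; the function name is only illustrative.

```python
import os


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted PCI addresses it contains."""
    groups = {}
    if not os.path.isdir(root):
        return groups  # IOMMU disabled or sysfs not available
    for name in os.listdir(root):
        devdir = os.path.join(root, name, "devices")
        if name.isdigit() and os.path.isdir(devdir):
            groups[int(name)] = sorted(os.listdir(devdir))
    return groups


if __name__ == "__main__":
    for group, devices in sorted(iommu_groups().items()):
        print(group, " ".join(devices))
```

A NIC listed in the same group as the downstream port above it corresponds to the patched case; a NIC alone in its group corresponds to the unpatched case.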

[1] http://www.sybausa.com/index.php?route=product/product&product_id=131
[2] https://www.pericom.com/products/pcie-switch/part/PI7C9X2G404SL
Comment 1 Alex Williamson 2016-10-13 20:19:37 UTC
Created attachment 241661 [details]
lspci -nnvvv from device components
Comment 2 Alex Williamson 2016-10-13 20:25:17 UTC
Note that the lspci output shows ACS RR disabled because it was captured while running with the attached patch, which disables it.  The same can be achieved manually with 'setpci -s $DEV 226.w=0:4' for each downstream switch port.  Note that once the device is in a non-working state, re-configuring ACS does not seem to fix it; a reboot is required.  The manual fixup should be applied before loading the r8169 driver, or at least before configuring the interface up.
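
For readers unfamiliar with the VALUE:MASK form of setpci, this small Python sketch models what '226.w=0:4' does to a 16-bit register: only the bits set in the mask (here 0x4, the RR bit) are changed, and they take the corresponding bits of the value (here 0).  Offset 0x226 would correspond to the ACS control register at +0x06 from an ACS capability at 0x220, which is an inference from the command and is specific to this device.  The starting value 0x1D is hypothetical.

```python
PCI_ACS_RR = 0x04  # P2P Request Redirect bit in the ACS control register


def setpci_write(current, value, mask, width_bits=16):
    """Model a setpci VALUE:MASK write: only bits set in mask are changed."""
    full = (1 << width_bits) - 1
    return ((current & ~mask) | (value & mask)) & full


before = 0x1D  # hypothetical control word with SV, RR, CR and UF set
after = setpci_write(before, value=0x0, mask=0x4)
print(hex(before), "->", hex(after))
```

All other ACS control bits are left as they were, so this clears RR without touching the rest of the ACS configuration.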

(feel free to assign this bug to me, I just want to document it so I don't lose it while I wait to see if I get any info from the vendor)
Comment 3 Alex Williamson 2016-10-13 20:31:07 UTC
Another question is whether it's worthwhile to enable ACS at all if we cannot enable RR.  A simpler patch might be an enable function that is a no-op.
