Bug 214259 - Discrete Thunderbolt Controller 8086:1137 throws DMAR and XHCI errors and is only partially functional
Summary: Discrete Thunderbolt Controller 8086:1137 throws DMAR and XHCI errors and is ...
Status: RESOLVED DUPLICATE of bug 206459
Alias: None
Product: Drivers
Classification: Unclassified
Component: USB
Hardware: All Linux
Importance: P1 normal
Assignee: Default virtual assignee for Drivers/USB
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-09-01 13:44 UTC by wse
Modified: 2022-05-23 16:33 UTC

See Also:
Kernel Version: 5.13.12 and 5.14-rc7
Tree: Mainline
Regression: No


Attachments
dmesg of boot without tb dock connected (81.47 KB, text/plain)
2021-09-01 13:45 UTC, wse
dmesg after connecting tb dock (81.70 KB, text/plain)
2021-09-01 13:45 UTC, wse
dmesg of boot with tb dock connected (91.15 KB, text/plain)
2021-09-01 13:46 UTC, wse
lspci of boot without tb dock connected (88.79 KB, text/plain)
2021-09-01 13:47 UTC, wse
lspci after connecting tb dock (88.78 KB, text/plain)
2021-09-01 13:48 UTC, wse
lspci of boot with tb dock connected (121.05 KB, text/plain)
2021-09-01 13:48 UTC, wse
dmesg of boot without tb dock connected (5.13.12) (82.44 KB, text/plain)
2021-09-01 13:52 UTC, wse
dmesg after connecting tb dock (5.13.12) (97.54 KB, text/plain)
2021-09-01 13:52 UTC, wse
dmesg of boot with tb dock connected (5.13.12) (92.45 KB, text/plain)
2021-09-01 13:52 UTC, wse
lspci of boot without tb dock connected (5.13.12) (88.79 KB, text/plain)
2021-09-01 13:53 UTC, wse
lspci after connecting tb dock (5.13.12) (120.37 KB, text/plain)
2021-09-01 13:53 UTC, wse
lspci of boot with tb dock connected (5.13.12) (121.05 KB, text/plain)
2021-09-01 13:53 UTC, wse

Description wse 2021-09-01 13:44:11 UTC
Affected devices: Clevo X170KM Barebone (I have one here) and, according to this Reddit thread that describes the exact same problem, a Thunderbolt PCIe expansion card: https://www.reddit.com/r/Thunderbolt/comments/ohjakr/asus_thunderboltex_4_linux_compatability/

The Clevo does not use the CPU's built-in Thunderbolt controller but a discrete Thunderbolt controller, which seems to be the exact same one as on that expansion card, with the PCI ID 8086:1137.

High-level problem description: I have a Thunderbolt dock with DP-out, USB ports, and an Ethernet port. When I plug it in, only the DP port works. When it is plugged in before boot, Ethernet also works. The USB ports on the dock never work.

dmesg shows several errors regarding DMAR and xhci. Since the DMAR errors show up first, I suspect them to be the root cause that makes everything afterwards fail as well.

The error is slightly different between 5.13 and 5.14:

5.14-rc7:
[    3.148557] DMAR: DRHD: handling fault status reg 2
[    3.148561] DMAR: [DMA Write NO_PASID] Request device [0x04:0x00.0] fault addr 0x69974000 [fault reason 0x05] PTE Write access is not set

5.13.12:
[    3.737783] DMAR: DRHD: handling fault status reg 2
[    3.737790] DMAR: [DMA Write] Request device [04:00.0] PASID ffffffff fault addr 69974000 [fault reason 05] PTE Write access is not set

04:00.0 is the Thunderbolt controller:
04:00.0 USB controller [0c03]: Intel Corporation Thunderbolt 4 NHI [Maple Ridge 4C 2020] [8086:1137] (prog-if 40 [USB4 Host Interface])
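
The two DMAR fault lines quoted above differ only in formatting (0x prefixes, position of the PASID field). As a minimal sketch for comparing logs from both kernels, the following Python snippet normalizes either form into the same fields. The regular expression and the function name parse_dmar_fault are illustrative assumptions derived only from the two lines in this report; other kernel versions may format the message differently.

```python
import re

# Pattern built from the two log lines quoted in this report (an assumption,
# not an official format description):
#   5.14: ... Request device [0x04:0x00.0] fault addr 0x69974000 [fault reason 0x05] ...
#   5.13: ... Request device [04:00.0] PASID ffffffff fault addr 69974000 [fault reason 05] ...
DMAR_FAULT = re.compile(
    r"DMAR: \[DMA (?P<op>Read|Write)[^\]]*\] "
    r"Request device \[(?:0x)?(?P<bus>[0-9a-fA-F]{2}):"
    r"(?:0x)?(?P<dev>[0-9a-fA-F]{2})\.(?P<fn>[0-7])\]"
    r".*?fault addr (?:0x)?(?P<addr>[0-9a-fA-F]+) "
    r"\[fault reason (?:0x)?(?P<reason>[0-9a-fA-F]+)\]\s*(?P<text>.*)"
)


def parse_dmar_fault(line):
    """Return a dict describing a DMAR fault line, or None if it does not match."""
    m = DMAR_FAULT.search(line)
    if m is None:
        return None
    return {
        "op": m["op"],
        # Normalized to the bus:device.function form that lspci prints.
        "device": f"{m['bus']}:{m['dev']}.{m['fn']}".lower(),
        "addr": int(m["addr"], 16),
        "reason": int(m["reason"], 16),
        "text": m["text"].strip(),
    }
```

Feeding it each line of a saved dmesg log and keeping the non-None results gives the faulting device in the same notation as the lspci output, so it can be matched against 04:00.0 directly.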
Comment 1 wse 2021-09-01 13:45:20 UTC
Created attachment 298567
dmesg of boot without tb dock connected
Comment 2 wse 2021-09-01 13:45:58 UTC
Created attachment 298569
dmesg after connecting tb dock
Comment 3 wse 2021-09-01 13:46:41 UTC
Created attachment 298571
dmesg of boot with tb dock connected
Comment 4 wse 2021-09-01 13:47:45 UTC
Created attachment 298573
lspci of boot without tb dock connected
Comment 5 wse 2021-09-01 13:48:30 UTC
Created attachment 298575
lspci after connecting tb dock
Comment 6 wse 2021-09-01 13:48:57 UTC
Created attachment 298577
lspci of boot with tb dock connected
Comment 7 wse 2021-09-01 13:50:01 UTC
These logs are all for kernel 5.14-rc7
Comment 8 wse 2021-09-01 13:52:01 UTC
Created attachment 298579
dmesg of boot without tb dock connected (5.13.12)
Comment 9 wse 2021-09-01 13:52:22 UTC
Created attachment 298581
dmesg after connecting tb dock (5.13.12)
Comment 10 wse 2021-09-01 13:52:43 UTC
Created attachment 298583
dmesg of boot with tb dock connected (5.13.12)
Comment 11 wse 2021-09-01 13:53:12 UTC
Created attachment 298585
lspci of boot without tb dock connected (5.13.12)
Comment 12 wse 2021-09-01 13:53:42 UTC
Created attachment 298587
lspci after connecting tb dock (5.13.12)
Comment 13 wse 2021-09-01 13:53:59 UTC
Created attachment 298589
lspci of boot with tb dock connected (5.13.12)
Comment 14 wse 2021-09-01 17:40:00 UTC
New info: The intel_iommu=off boot flag makes the DMAR errors go away. However, the xhci errors remain and the USB ports of the dock are still dysfunctional.
So they are separate issues after all.
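
When repeating this test, it helps to confirm that the flag actually reached the running kernel. A minimal Python sketch follows, assuming the usual /proc/cmdline location; the function names intel_iommu_values and read_cmdline are hypothetical helpers, not part of any existing tool.

```python
def intel_iommu_values(cmdline):
    """Return every value passed as intel_iommu=... on a kernel command line."""
    values = []
    for token in cmdline.split():
        if token.startswith("intel_iommu="):
            values.append(token.split("=", 1)[1])
    return values


def read_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (path is an assumption)."""
    with open(path) as f:
        return f.read()
```

Calling intel_iommu_values(read_cmdline()) on the booted system should list "off" if the flag was applied; an empty list means the parameter was not passed at all.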
Comment 15 wse 2021-09-28 16:24:03 UTC
For reference: a Reddit thread discussing the discrete Asus Thunderbolt PCIe card failing in the exact same way: https://www.reddit.com/r/Thunderbolt/comments/ohjakr/asus_thunderboltex_4_linux_compatability/
Comment 16 wse 2021-10-08 08:32:37 UTC
Found a preexisting hack, originally written for TB3, that also fixes the issue on this TB4 controller: https://bugzilla.kernel.org/show_bug.cgi?id=206459#c59
Comment 17 wse 2021-10-08 08:33:24 UTC

*** This bug has been marked as a duplicate of bug 206459 ***
Comment 18 Hans de Goede 2022-05-19 18:32:52 UTC
Thank you for your bug report.

I've prepared a patch series fixing bug 206459 as well as this bug:

https://lore.kernel.org/linux-pci/20220519152150.6135-1-hdegoede@redhat.com/T/#t

This series is using DMI matching to identify affected systems and to enable the workaround only on affected systems.

I've used DMI_MATCH(DMI_BOARD_NAME, "X170KM-G") as match for this Clevo Barebone.

Can you confirm that:

cat /sys/class/dmi/id/board_name

outputs "X170KM-G"?

Or even better, give this patch series a try? Note that the series is based on top of:
https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/log/?h=pci/resource
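
For anyone who wants to script the board-name check requested above, here is a minimal Python sketch. It reads the same sysfs file and tests whether the expected string is contained in it, which is meant to mirror the substring comparison that DMI_MATCH performs in the kernel. The function name board_name_matches and its parameters are assumptions for illustration and are not taken from the patch series.

```python
from pathlib import Path

# Board name quoted in the comment above.
AFFECTED_BOARD = "X170KM-G"


def board_name_matches(expected=AFFECTED_BOARD,
                       path="/sys/class/dmi/id/board_name"):
    """Return True if the DMI board name contains the expected string."""
    try:
        actual = Path(path).read_text().strip()
    except OSError:
        # File missing or unreadable, e.g. on a system without DMI data.
        return False
    return expected in actual
```

On the affected Clevo barebone, board_name_matches() would be expected to return True; on any other machine it should return False.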
Comment 19 wse 2022-05-23 13:25:17 UTC
Thank you for the patch.

Yes, "X170KM-G" is the exact match for /sys/class/dmi/id/board_name on the affected device.

The kernel with the patch series is compiling at the moment. I will add another post on whether or not it worked.
Comment 20 wse 2022-05-23 16:33:28 UTC
Successfully tested the patchset: Works like a charm.
