Bug 218525 - Thunderbolt eGPU bad performance
Summary: Thunderbolt eGPU bad performance
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: USB
Hardware: All Linux
Importance: P3 normal
Assignee: Default virtual assignee for Drivers/USB
URL: https://gitlab.freedesktop.org/drm/am...
Keywords:
Depends on:
Blocks:
 
Reported: 2024-02-24 10:14 UTC by gipawu
Modified: 2024-09-08 12:53 UTC
CC List: 3 users

See Also:
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:


Attachments
RTX 4090 Unigine Windows Benchmark (2.68 KB, text/html)
2024-02-29 01:50 UTC, Kaukov
RTX 4090 Unigine Linux Benchmark (2.59 KB, text/html)
2024-02-29 01:50 UTC, Kaukov
RTX 4090 dmesg log (113.28 KB, text/plain)
2024-02-29 01:55 UTC, Kaukov
RTX 4090 lspci log (101.75 KB, text/plain)
2024-02-29 01:55 UTC, Kaukov
RTX 4090 dmesg log (112.39 KB, text/plain)
2024-03-01 09:00 UTC, Kaukov
RTX 4090 lspci log (101.17 KB, text/plain)
2024-03-01 09:00 UTC, Kaukov

Description gipawu 2024-02-24 10:14:18 UTC
I opened this bug report because I think there is a performance problem in the Thunderbolt stack.

I have an R43SG-TB3 eGPU adapter, which can be connected either via a Thunderbolt 3/4 port or directly via a PCIe M.2 x4 link.

With the eGPU connected via Thunderbolt, game performance is often abysmal, especially in games that use DXVK.

The problem is described at length here: https://gitlab.freedesktop.org/drm/amd/-/issues/2885

but in summary, the main symptoms that indicate a performance problem with a Thunderbolt eGPU are:

- Low and erratic fps, GPU clock speeds, and GPU usage (one way to watch these live is sketched below).
- Little or no difference when lowering the resolution or reducing graphics settings.
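
For reference, a minimal way to sample clocks, utilization, and power while a game runs, assuming the Nvidia proprietary driver (which ships nvidia-smi); MangoHud shows the same values as an in-game overlay:

    # one line of power/utilization/clock samples per second
    nvidia-smi dmon -s puc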

Using the direct M.2 connection instead, performance is as expected, almost at the level of Windows.

On Windows there is a performance gap of 5-10% between the Thunderbolt and M.2 connections, but on Linux the gap is much wider, sometimes more than 50%. Very few games seem to be unaffected by the problem (e.g. Shadow of the Tomb Raider and Doom Eternal).

I tested with an Nvidia RTX 2060 Super GPU, paired with an Intel NUC 13 Pro (i5-1340P, Thunderbolt 4) and with a Dell XPS 9570 (i7-8750H, Thunderbolt 3), both running Fedora 39 with kernel 6.7.5. I tried kernel 6.8.0-rc5 too, with no difference.

Since the problem emerges with both Nvidia and AMD GPUs (as evident from the linked report) and is not reproducible when the GPU is connected directly via M.2, bypassing the Thunderbolt port, I suspect a performance issue in the Thunderbolt stack, perhaps in bandwidth allocation.

I hope the information given here, and in the linked report, helps to resolve the problem.

I am of course available to provide additional information and logs, and to test any patches if needed. Thank you.
Comment 1 Artem S. Tashkinov 2024-02-24 11:45:04 UTC
CC'ing Mika Westerberg as requested by Mario Limonciello.
Comment 2 Mika Westerberg 2024-02-26 06:35:09 UTC
I'm not a graphics expert, but have you tried running a benchmark that works the same on both Windows and Linux? I understand that most of the games you play on Linux use some sort of "emulation", and that might have some effect here (which would then be an issue to take up with the folks who develop that part, if it turns out to be the problem).

I did some testing back then with Unigine Heaven: 

https://benchmark.unigine.com/heaven 

Can you perhaps run that (or anything similar) and see what the difference is between the two OSes on the same hardware?

Then we need to understand the hardware configuration: can you attach the full dmesg and the output of 'sudo lspci -vv' with the eGPU plugged in?
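
For reference, a minimal way to capture both logs with the eGPU plugged in (sudo may be needed for dmesg depending on kernel.dmesg_restrict):

    sudo dmesg > dmesg.txt
    sudo lspci -vv > lspci.txt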

The third thing that comes to mind: do you have any PCIe ASPM states enabled? That should be visible in the dmesg and in the lspci dump.
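
For reference, the kernel-wide ASPM policy and the per-link state can be checked with something like:

    # active policy is shown in brackets
    cat /sys/module/pcie_aspm/parameters/policy
    # per-device link control/status, including the ASPM state
    sudo lspci -vv | grep -E 'ASPM|LnkSta'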

Also, is the monitor connected directly to the eGPU outputs or to the host?
Comment 3 Kaukov 2024-02-29 01:49:43 UTC
I would also like to chime in regarding eGPUs on Linux. I'm using Nvidia, and my current setup has an RTX 4090. I ran the Unigine Heaven Benchmark 4.0 on the Extreme preset on both Windows 11 Pro and Gentoo (kernel 6.6.16, Nvidia proprietary 550.54.14 drivers). I will attach both benchmark results shortly.

The Windows run scored 238 points higher; its minimum fps was 21.2 higher, its maximum fps 23.8 higher, and its average fps 9.4 higher.

The benchmark results seem OK, but real-world performance is different.

On Nvidia, the whole system stutters when opening any hardware-accelerated program. I think this is an Nvidia-only bug, so it can be dismissed here.

Some native Linux games outright crash or run very poorly on the eGPU. My latest encounter is Last Epoch, where I got ~20 fps with the native port and ~50-60 fps via DXVK.

The worst offenders are Unity3D engine games and MMORPGs. The only game that got close to Windows performance was World of Warcraft, and only after ticking or unticking the "Triple Buffering" graphics option, even though vsync isn't used in-game. This somehow refreshes the game/renderer, and it starts rendering at a high frame rate. FFXIV is stuck at 40-70 fps. Guild Wars 2 struggles to go beyond 20 fps, even though the RTX 3050 Ti dGPU of the same laptop manages 40-60 fps.

Baldur's Gate 3 is also a prime example of the issue. On Linux I get 35-40 fps on the character-select screen and 2-12 fps in-game, while the 3050 Ti dGPU manages 60-80 fps. Both runs use DXVK.

The eGPU runs at the expected PCIe 3.0 x4 speed (LnkSta: 8GT/s).
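
For reference, that status line can be read with something like the following (06:00.0 is just an example address; substitute the eGPU's address from lspci):

    sudo lspci -s 06:00.0 -vv | grep LnkSta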

Wayland is completely unusable on the eGPU, while it is perfectly fine on the dGPU.

I'll also try uploading dmesg and lspci logs here.
Comment 4 Kaukov 2024-02-29 01:50:14 UTC
Created attachment 305922 [details]
RTX 4090 Unigine Windows Benchmark
Comment 5 Kaukov 2024-02-29 01:50:36 UTC
Created attachment 305923 [details]
RTX 4090 Unigine Linux Benchmark
Comment 6 Kaukov 2024-02-29 01:55:23 UTC
Created attachment 305924 [details]
RTX 4090 dmesg log
Comment 7 Kaukov 2024-02-29 01:55:39 UTC
Created attachment 305925 [details]
RTX 4090 lspci log
Comment 8 Artem S. Tashkinov 2024-02-29 09:07:22 UTC
> RTX 4090 Unigine Linux Benchmark
> RTX 4090 Unigine Windows Benchmark

There's only about a 3% difference between the Windows and Linux results for this benchmark, though Windows has twice the minimum fps.
Comment 9 Mika Westerberg 2024-03-01 06:42:10 UTC
Okay, I see a couple of things on the HW side that may have an effect (or may not). First, the real PCIe link to the 06:00 eGPU device is running at limited bandwidth:

		LnkSta:	Speed 5GT/s (downgraded), Width x4 (downgraded)
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-


I don't know why this is, but it could be that the graphics driver tunes this down because it sees the virtual links as only supporting gen 1 speeds. However, that is not the case: the virtual PCIe links over Thunderbolt can reach up to 90% of the 40G link if there is no other tunneling (such as DisplayPort).

The second thing is that ASPM L1 is enabled on this same real PCIe link:

		LnkCtl:	ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
			ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-

I suggest trying to disable it. Passing something like "pcie_aspm.policy=performance" should make the above show ASPM Disabled.

I also suggest experimenting with the IOMMU disabled (pass intel_iommu=off on the kernel command line).
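
For reference, a typical way to apply both parameters persistently on a GRUB-based distro such as Fedora (file locations vary by distribution and boot setup):

    # append to the existing parameters in /etc/default/grub
    GRUB_CMDLINE_LINUX="... pcie_aspm.policy=performance intel_iommu=off"
    # then regenerate the config, e.g. on Fedora:
    sudo grub2-mkconfig -o /boot/grub2/grub.cfg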
Comment 10 Kaukov 2024-03-01 08:59:47 UTC
> There's only about a 3% difference between the Windows and Linux results
> for this benchmark, though Windows has twice the minimum fps.

Yes, but that is a one-off case. Performance is otherwise abysmal.

> First, the real PCIe link to the 06:00 eGPU device is running at
> limited bandwidth:

This adjusts automatically to 8GT/s, width x4 when the eGPU is actively used. On AMD I couldn't get it to run past 2.5GT/s, but I'll retest with the new kernel parameters when I have an AMD GPU at hand.

After further testing with `pcie_aspm.policy=performance` and `intel_iommu=off` set, nothing changed; performance in games is still abysmal and unplayable. I'll attach my lspci and dmesg logs again, taken after running Baldur's Gate 3 via Proton Experimental on DX11.

Could this be a Wine/DXVK/Vulkan issue rather than a kernel issue? Then again, the OP stated that no issues occur when running via a PCIe M.2 x4 link.
Comment 11 Kaukov 2024-03-01 09:00:25 UTC
Created attachment 305934 [details]
RTX 4090 dmesg log
Comment 12 Kaukov 2024-03-01 09:00:42 UTC
Created attachment 305935 [details]
RTX 4090 lspci log
Comment 13 Mika Westerberg 2024-03-01 09:39:28 UTC
Okay, with this you have ASPM disabled:

		LnkCtl:	ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
			ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s (downgraded), Width x4 (downgraded)
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-

However, the link speed is now at gen 1, though I guess that gets adjusted as you mentioned. I suggest checking that it actually runs at 8GT/s x4 while you play an affected game.
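
For reference, one way to watch the link state live while a game runs (again, 06:00.0 is an example address):

    sudo watch -n1 'lspci -s 06:00.0 -vv | grep LnkSta'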

One more thing comes to mind, though you have probably checked it already: the game may not be using the eGPU and is instead running on the internal one. With the Unigine benchmark it clearly used the eGPU.
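
For reference, a couple of ways to confirm which device is rendering; vulkaninfo is part of vulkan-tools, and DXVK_HUD is a DXVK environment variable:

    # list the Vulkan devices visible to applications
    vulkaninfo --summary
    # or show the active GPU in-game via DXVK's HUD (e.g. as a Steam launch option)
    DXVK_HUD=devinfo %command%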
Comment 14 Kaukov 2024-03-01 09:46:37 UTC
I tested further, again with Baldur's Gate 3 as that's the only game I have installed currently.

The game runs on the eGPU, as confirmed by MangoHud, DXVK_HUD, and the in-game settings.

The GPU running at 8GT/s is also confirmed. Nvidia has some power settings I can't change, where the GPU switches to a lower link speed when it's not being used intensively, e.g. when I'm on the desktop and only my wallpaper is drawn, as was the case when I took the logs. When running a game, a benchmark, or a YouTube video, it runs at 8GT/s.

One thing I noticed while testing now is the power draw. In Unigine the GPU drew ~280W and ran fine. In Baldur's Gate 3 it drew ~120W in the menus but only 70-80W in-game, where it should be drawing at least 150W. On Windows, BG3 draws 120-160W in the menus and 160-200W in-game, with the same settings on both systems.
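
For reference, a simple once-per-second readout of power draw and clocks while playing, assuming the Nvidia proprietary driver:

    nvidia-smi --query-gpu=power.draw,clocks.gr,clocks.mem,utilization.gpu --format=csv -l 1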
Comment 15 Benjamin 2024-03-06 02:18:57 UTC
(In reply to Mika Westerberg from comment #13)
> Okay, with this you have ASPM disabled:
> 
>               LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
>                       ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
>               LnkSta: Speed 2.5GT/s (downgraded), Width x4 (downgraded)
>                       TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> 
> However, the link speed is now at gen 1, though I guess that gets adjusted
> as you mentioned. I suggest checking that it actually runs at 8GT/s x4
> while you play an affected game.
> 
> One more thing comes to mind, though you have probably checked it already:
> the game may not be using the eGPU and is instead running on the internal
> one. With the Unigine benchmark it clearly used the eGPU.

Hello. I'm just confirming the issue over TB4 on a Lenovo Z13 laptop with an RX 7600 XT in an eGPU setup, using the same kernel parameters as workarounds (amd_iommu=off instead). It never pulls more than ~50 watts, with maybe ~2000 MHz on the GPU and ~400 MHz on the VRAM.
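
For reference, on amdgpu these values can be sampled from sysfs; the card index and hwmon number vary per system, so treat the paths below as examples:

    # instantaneous board power (microwatts)
    cat /sys/class/drm/card0/device/hwmon/hwmon*/power1_average
    # GPU core clock levels, with * marking the active one
    cat /sys/class/drm/card0/device/pp_dpm_sclk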
Comment 16 Kaukov 2024-03-08 08:17:08 UTC
I did more testing today with the game Last Epoch.

When running the native Linux build, it ran on the 4090 per the MangoHud overlay, but I got 15-20 fps at most. Switching to the Windows build via Proton Experimental with DXVK gave me 60+ fps (still far lower than what I get on Windows), but with abysmal fps drops when some enemies spawn/move underground.

I don't know how to debug this though, so any tips are appreciated!
Comment 17 Kaukov 2024-09-08 12:53:11 UTC
I just upgraded to a Framework 13 with an AMD Ryzen 7 7840U. I'm using the same Gentoo installation I've used for the past few years (rebuilt for the new CPU).

Baldur's Gate 3 runs flawlessly, with slightly lower FPS than on Windows but above 60 and sometimes above 100.

This leads me to believe the issue is specific to Intel platforms.
