I opened this bug report because I think there is a performance problem in the Thunderbolt stack. I have an R43SG-TB3 eGPU adapter, which can be connected either through a Thunderbolt 3/4 port or directly via a PCIe M.2 x4 link. With the eGPU connected via Thunderbolt, game performance is often abysmal, especially in games that use DXVK. The problem is described at length here: https://gitlab.freedesktop.org/drm/amd/-/issues/2885 but in summary, the main symptoms that point to a performance problem with the Thunderbolt eGPU are:

- Low and erratic fps, GPU clock speeds and GPU usage.
- Little or no difference when lowering the resolution or reducing graphics settings.

Using the direct M.2 connection instead, performance is as good as expected, almost on the level of Windows. On Windows there is a performance gap of 5-10% between the Thunderbolt and M.2 connections, but on Linux the gap is much wider, sometimes more than 50%. Very few games seem to be unaffected by the problem (e.g. Shadow of the Tomb Raider and Doom Eternal).

I tested with an Nvidia RTX 2060 Super GPU, paired with an Intel NUC 13 Pro (i5-1340P, Thunderbolt 4) and with a Dell XPS 9570 (i7-8750H, Thunderbolt 3), both running Fedora 39 and kernel 6.7.5. I also tried kernel 6.8.0-rc5, with no difference.

Since the problem shows up with both Nvidia and AMD GPUs (as evident from the linked report) and is not reproducible when the GPU is connected directly via M.2, bypassing the Thunderbolt port, I suspect a performance issue in the Thunderbolt stack, perhaps in bandwidth allocation.

I hope the information given here, and in the linked report, helps to resolve the problem. I am of course available to provide additional information and logs, and to test any patches if needed. Thank you.
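In case it helps with reproducing, here is a rough sketch of the commands I use to confirm how the card is attached (this assumes the bolt daemon is installed; 06:00.0 is only a placeholder address, substitute whatever `lspci | grep -i vga` reports on your system):

  # List authorized Thunderbolt devices; the eGPU enclosure shows up here when
  # connected over TB3/TB4 and is absent when using the direct M.2 link.
  boltctl list

  # Negotiated PCIe link speed/width of the GPU itself.
  sudo lspci -vv -s 06:00.0 | grep -E 'LnkCap|LnkSta'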
CC'ing Mika Westerberg as requested by Mario Limonciello.
I'm not a graphics expert, but I wonder if you have tried running some benchmark that works the same way on both Windows and Linux? I understand that most of the games you play on Linux use some sort of "emulation", and that might have an effect here (which would then be an issue to take up with the folks who develop that part, if it turns out to be the problem). I did some testing a while back with Unigine Heaven: https://benchmark.unigine.com/heaven Can you perhaps run that (or anything similar) and see what the difference is between the two OSes on the same hardware?

Then we need to understand the hardware configuration: can you add the full dmesg and the output of 'sudo lspci -vv' with the eGPU plugged in?

The third thing that comes to mind is: do you have any PCIe ASPM states enabled? That should be visible in the dmesg and also in the lspci dump.

Also, is the monitor connected directly to the eGPU outputs or to the host?
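Something like this should capture the requested information (just the commands I would run myself, assuming default sysfs paths):

  # Kernel log and verbose PCI dump taken with the eGPU plugged in
  sudo dmesg > dmesg.txt
  sudo lspci -vv > lspci.txt

  # Currently active ASPM policy (default/performance/powersave/powersupersave)
  cat /sys/module/pcie_aspm/parameters/policy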
I would also like to chime in regarding eGPUs on Linux. I'm using Nvidia and my current setup uses an RTX 4090. I ran the Unigine Heaven Benchmark 4.0 on the Extreme preset on both Windows 11 Pro and Gentoo with kernel 6.6.16 and Nvidia's proprietary 550.54.14 drivers. I will attempt to attach both benchmark results later. Windows scored 238 points higher, its minimum fps was 21.2 higher and its maximum fps 23.8 higher; the average fps is 9.4 higher on Windows.

The benchmark results seem OK, but the real-world performance is different. On Nvidia, when opening a program that is hardware-accelerated, the whole system stutters; I think this is an Nvidia-only bug, so it can be dismissed. Some native Linux games crash outright or run very poorly on the eGPU. My latest encounter is Last Epoch, where I got ~20 fps in the native port and ~50-60 fps via DXVK. The worst offenders are Unity3D engine games and MMORPGs. The only game that came close to Windows performance was World of Warcraft, and only after ticking or unticking the "Triple Buffering" graphics option, even though vsync isn't used in-game; this somehow refreshes the game/renderer and it starts rendering at a high frame rate. FFXIV is stuck at 40-70 fps. Guild Wars 2 struggles to get beyond 20 fps, even though the RTX 3050 Ti dGPU of the same laptop manages 40-60 fps. Baldur's Gate 3 is also a prime example of the issue: on Linux I get 35-40 fps on the character select screen and 2-12 fps in-game, while I can get 60-80 fps on the 3050 Ti dGPU, both running via DXVK.

The eGPU runs at the expected PCIe 3.0 x4 speed (LnkSta: 8GT/s). Wayland is completely unusable on the eGPU while being perfectly fine on the dGPU. I'll also try uploading dmesg and lspci logs here.
Created attachment 305922 [details] RTX 4090 Unigine Windows Benchmark
Created attachment 305923 [details] RTX 4090 Unigine Linux Benchmark
Created attachment 305924 [details] RTX 4090 dmesg log
Created attachment 305925 [details] RTX 4090 lspci log
> RTX 4090 Unigine Linux Benchmark
> RTX 4090 Unigine Windows Benchmark

There's like a 3% difference between Windows and Linux results for this benchmark though Windows has two times better min fps.
Okay, I see a couple of things on the HW side that may matter (or may not). First, the real PCIe link to the 06:00 eGPU device is running at limited bandwidth:

  LnkSta: Speed 5GT/s (downgraded), Width x4 (downgraded)
          TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-

I don't know why this is, but it could be that the graphics driver tunes this down because it sees the virtual links only supporting gen 1 speeds. However, that is not true; the virtual PCIe links over Thunderbolt can get up to 90% of the 40G link if there is no other tunneling (such as DisplayPort) going on.

The second thing is that ASPM L1 is enabled on this same real PCIe link:

  LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
          ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-

I suggest trying to disable it. Passing something like "pcie_aspm.policy=performance" should make the above turn into ASPM Disabled.

Also, I suggest experimenting with disabling the IOMMU (pass intel_iommu=off on the kernel command line).
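For reference, these parameters can be added roughly like this; the exact steps vary per distro, so treat the paths below as examples (Fedora-style grubby and GRUB2 shown):

  # Fedora/RHEL style: append the parameters to all installed kernels
  sudo grubby --update-kernel=ALL \
      --args="pcie_aspm.policy=performance intel_iommu=off"

  # Generic GRUB2: add them to GRUB_CMDLINE_LINUX in /etc/default/grub,
  # then regenerate the config
  sudo grub2-mkconfig -o /boot/grub2/grub.cfg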
> There's like a 3% difference between Windows and Linux results for this
> benchmark though Windows has two times better min fps.

Yes, but that is a one-off case. The performance otherwise is abysmal.

> First, the real PCIe link to the 06:00 eGPU device is running at
> limited bandwidth:

This automatically adjusts to 8GT/s, Width x4 when actively using the eGPU. On AMD I couldn't get it to run past 2.5GT/s, but I'll retest with the new kernel parameters when I have an AMD GPU at hand.

After further testing with `pcie_aspm.policy=performance` and `intel_iommu=off` set, nothing changed. The performance in games is still abysmal and unplayable. I'll attach my lspci and dmesg logs again, taken after running Baldur's Gate 3 via Proton Experimental on DX11.

Could this be a Wine/DXVK/Vulkan issue rather than a kernel issue? Although the OP stated that no issues occur when running via a PCIe M.2 x4 link.
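For what it's worth, this is how I checked that the parameters actually took effect (nothing distro-specific assumed):

  # The parameters should appear on the running kernel's command line
  cat /proc/cmdline

  # ASPM policy should now report [performance]
  cat /sys/module/pcie_aspm/parameters/policy

  # With the IOMMU disabled this directory should be empty
  ls /sys/kernel/iommu_groups/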
Created attachment 305934 [details] RTX 4090 dmesg log
Created attachment 305935 [details] RTX 4090 lspci log
Okay, with this you have ASPM disabled:

  LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
          ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
  LnkSta: Speed 2.5GT/s (downgraded), Width x4 (downgraded)
          TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-

However, the link speed is now at gen 1, but I guess that gets adjusted as you mentioned. I suggest checking, while you play an affected game, that the link is actually running at 8GT/s x4.

One more thing comes to mind, but you have probably checked it already: the game may not be using the eGPU at all and may instead be running on the internal one. With the Unigine benchmark it clearly used the eGPU.
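A simple way to watch the link while the game is running, in case it helps (06:00.0 being the eGPU address from the attached lspci dump):

  # Poll the live link status once per second while the game is running
  watch -n 1 "sudo lspci -vv -s 06:00.0 | grep LnkSta:"

(sudo may prompt for a password inside watch, so it is easiest to run this from a root shell.)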
I tested further, again with Baldur's Gate 3, as that's the only game I have installed currently. The game runs on the eGPU, as confirmed by MangoHud, DXVK_HUD and the in-game settings. The GPU running at 8GT/s is also confirmed. Nvidia has some power settings, which I can't change, where the GPU switches to a lower link speed when it's not being used intensively, i.e. when I'm on the desktop and only my wallpaper is drawn, as was the case when I took the logs. When running a game, a benchmark or a YouTube video, it runs at 8GT/s.

One thing I noticed while testing now is the power draw. In Unigine the GPU drew ~280W and was running fine. Baldur's Gate 3 drew ~120W in the menus, but only 70-80W in-game, where it should be drawing at least 150W. On Windows, BG3 draws 120-160W in the menus and 160-200W in-game, with the same settings used on both systems.
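If it helps, the power draw and the live link speed can be polled with something like this (these query fields exist in recent proprietary driver releases; adjust to your version):

  # Sample power draw, clocks and live PCIe link gen/width once per second
  nvidia-smi --query-gpu=power.draw,clocks.gr,clocks.mem,pcie.link.gen.current,pcie.link.width.current \
             --format=csv -l 1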
(In reply to Mika Westerberg from comment #13)
> Okay, with this you have ASPM disabled:
>
>   LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
>           ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
>   LnkSta: Speed 2.5GT/s (downgraded), Width x4 (downgraded)
>           TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>
> However, the link speed is now at gen 1, but I guess that gets adjusted as
> you mentioned. I suggest checking, while you play an affected game, that
> the link is actually running at 8GT/s x4.
>
> One more thing comes to mind, but you have probably checked it already: the
> game may not be using the eGPU at all and may instead be running on the
> internal one. With the Unigine benchmark it clearly used the eGPU.

Hello. I'm just confirming the issue over TB4 on a Lenovo Z13 laptop with an RX 7600 XT in an eGPU setup, using the same kernel parameters as workarounds (amd_iommu=off instead). The card never pulls more than ~50W, and the clocks top out at maybe ~2000 MHz on the GPU and ~400 MHz on the VRAM.
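If it's useful, on the amdgpu side the same readings can be pulled from sysfs, roughly like this (card1 is just an assumption for the eGPU; pick the right entry under /sys/class/drm):

  # Shader and memory clock levels; the currently active level is marked with *
  cat /sys/class/drm/card1/device/pp_dpm_sclk
  cat /sys/class/drm/card1/device/pp_dpm_mclk

  # Average board power in microwatts via the hwmon interface
  cat /sys/class/drm/card1/device/hwmon/hwmon*/power1_average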
I did more testing today with the game Last Epoch. When running the native Linux build, it was running on the 4090 as per the MangoHud overlay, but I got 15-20 fps at most. Switching to the Windows build via Proton Experimental with DXVK gave me 60+ fps (still well below what I get on Windows), but with abysmal fps drops whenever certain enemies spawn or move underground. I don't know how to debug this further, so any tips are appreciated!
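For anyone wanting to double-check which Vulkan device a game ends up on, a rough sketch (the adapter name is just an example, and the second line goes into the Steam launch options for the Proton/DXVK case):

  # List the Vulkan devices the loader sees (eGPU, dGPU, iGPU)
  vulkaninfo --summary | grep -i deviceName

  # Force DXVK onto the eGPU by (partial) adapter name and show its HUD
  DXVK_FILTER_DEVICE_NAME="RTX 4090" DXVK_HUD=devinfo,fps %command%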
I just upgraded to a Framework 13 with an AMD Ryzen 7 7840U. I'm using the same Gentoo installation I've used for the past few years (rebuilt for the new CPU). Baldur's Gate 3 runs flawlessly, with a few fps less than on Windows, but above 60 and sometimes above 100. This leads me to believe the issue comes from Intel, and only Intel.
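If it's useful for comparison, the PCIe topology and tunnel setup behind the Thunderbolt/USB4 port on the two machines can be dumped like this (nothing machine-specific assumed):

  # Tree view of the PCIe hierarchy the eGPU sits behind
  sudo lspci -tv

  # Kernel messages from the Thunderbolt/USB4 driver, for comparing how the
  # tunnel is brought up on the Intel and AMD hosts
  sudo dmesg | grep -iE 'thunderbolt|usb4'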