Bug 196409

Summary: kvm_amd nested pagetable gpu passthrough performance oddities
Product: Virtualization
Component: kvm
Status: NEW
Severity: high
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 4.10.8-1
Subsystem:
Regression: No
Bisected commit-id:
Reporter: efeu (efeu)
Assignee: virtualization_kvm
CC: andizhigh, andri, benjamin, bierhumpen, bj, bonzini, clemens, dark.vvorth, dennis, durok, eldiablodivino, en.yaye, ephemient, eric, ethan.palmer, georg.markowitsch, hzzhan9, ian.montgomery, imatiimba, itvend, iwanovich, john, kalinda, manuel, marco_silva85, martin, maspe36, mfadom99, michael.osmolski, patrickett, pavel.kondjukov, pousaduarte, sarnex, sgsdxzy, thbeck84, xrevolver

Description efeu 2017-07-18 10:39:59 UTC
The hardware I was testing with:
AMD Ryzen R7 1700
Gigabyte GA-AX370-Gaming 5
different GPUs
Windows10 x64 Guest

But this bug is reproducible on the AMD FX series too.

The community has been discussing this bug for a while, but I haven't found it reported here. This report is based on our discussion here:
http://www.spinics.net/lists/kvm/msg149446.html

While npt is enabled, a passed-through GPU delivers much lower performance than expected. Here are some test results the community has already collected:

First Heaven benchmark with ultra settings on 1920x1080:

- DirectX 11:
  - npt=0: 87.0 fps
  - npt=1: 78.4 fps (10% drop)
- DirectX 9:
  - npt=0: 100.0 fps
  - npt=1: 66.4 fps (33% drop)
- OpenGL:
  - npt=0: 82.5 fps
  - npt=1: 35.2 fps (58% drop)

Heaven Benchmark again, this time with low settings on 1280x720:

- DirectX 11:
  - npt=0: 182.5 fps
  - npt=1: 140.1 fps (25% drop)
- DirectX 9:
  - npt=0: 169.2 fps
  - npt=1: 74.1 fps (56% drop)
- OpenGL:
  - npt=0: 202.8 fps
  - npt=1: 45.0 fps (78% drop)

PerformanceTest 9.0 3d benchmark:

- DirectX 9:
  - npt=0: 157 fps
  - npt=1: 13 fps (92% drop)
- DirectX 10:
  - npt=0: 220 fps
  - npt=1: 212 fps (4% drop)
- DirectX 11:
  - npt=0: 234 fps
  - npt=1: 140 fps (40% drop)
- DirectX 12:
  - npt=0: 88 fps (scored 35 because of the penalized FPS of not being able to run at 4k)
  - npt=1: 4.5 fps (scored 1, 95% drop)
- GPU Compute:
  - Mandel:
    - npt=0: ~= 2000 fps
    - npt=1: ~= 2000 fps
  - Bitonic Sort:
    - npt=0: ~= 153583696.0 elements/sec
    - npt=1: ~= 106233376.0 elements/sec (31% drop)
  - QJulia4D:
    - npt=0: ~= 1000 fps
    - npt=1: ~= 1000 fps
  - OpenCL:
    - npt=0: ~= 750 fps
    - npt=1: ~= 220 fps

Some more data from 3DMark benchmarks:

Time Spy (DirectX 12):
- Graphics test 1:
  - npt=0: 37.65 FPS
  - npt=1: 24.22 FPS (36% drop)
- Graphics test 2:
  - npt=0: 33.05 FPS
  - npt=1: 29.65 FPS (10% drop)
- CPU test:
  - npt=0: 17.35 FPS
  - npt=1: 12.03 FPS (31% drop)

Fire Strike (DirectX 11):
- Graphics test 1:
  - npt=0: 80.56 FPS
  - npt=1: 41.89 FPS (49% drop)
- Graphics test 2:
  - npt=0: 70.64 FPS
  - npt=1: 60.75 FPS (14% drop)
- Physics test:
  - npt=0: 50.14 FPS
  - npt=1: 5.78 FPS (89% drop)
- Combined test:
  - npt=0: 32.83 FPS
  - npt=1: 17.70 FPS (47% drop)

Sky Diver (DirectX 11):
- Graphics test 1:
  - npt=0: 248.81 FPS
  - npt=1: 173.63 FPS (31% drop)
- Graphics test 2:
  - npt=0: 250.49 FPS
  - npt=1: 124.84 FPS (51% drop)
- Physics test:
  - 8 threads:
    - npt=0: 140.93 FPS
    - npt=1: 119.08 FPS (15% drop)
  - 24 threads:
    - npt=0: 110.22 FPS
    - npt=1: 74.55 FPS (33% drop)
  - 48 threads:
    - npt=0: 71.56 FPS
    - npt=1: 45.93 FPS (36% drop)
  - 96 threads:
    - npt=0: 41.04 FPS
    - npt=1: 24.81 FPS (40% drop)
- Combined test:
  - npt=0: 75.65 FPS
  - npt=1: 50.45 FPS (33% drop)

I compared the performance with Xen and found no performance impact there, so the bug should be in the nested page table implementation of the kvm_amd module and not a hardware-related issue in AMD-Vi.
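
For anyone adding their own measurements, the percentage drops quoted above can be recomputed with a few lines of Python. This is only a sketch: the function name `percent_drop` is made up for illustration, and the sample values are copied from the Heaven (ultra, 1920x1080) results in this report. The percentages in the report appear to be rounded, so a recomputed value may differ by a point or so.

```python
def percent_drop(baseline_fps: float, measured_fps: float) -> float:
    """Return the relative drop from baseline to measured, in percent."""
    if baseline_fps <= 0:
        raise ValueError("baseline must be positive")
    return (baseline_fps - measured_fps) / baseline_fps * 100.0


# (npt=0 fps, npt=1 fps) pairs copied from the Heaven ultra results above.
results = {
    "DirectX 11": (87.0, 78.4),
    "DirectX 9": (100.0, 66.4),
    "OpenGL": (82.5, 35.2),
}

for api, (npt_off, npt_on) in results.items():
    print(f"{api}: {percent_drop(npt_off, npt_on):.1f}% drop with npt=1")
```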
Comment 1 Michael 2017-07-20 18:53:32 UTC
I too have this NPT issue with a Ryzen 1800X and KVM on the latest kernel.
It's incredible that no one has managed to resolve this yet.
GPU performance is absolutely dire with npt=1, and CPU performance is just as bad with npt=0. So it is a lose/lose situation.
Comment 2 Thomas 2017-07-26 07:33:44 UTC
Please look into this well documented issue, it makes the new AMD platform very unattractive for advanced virtualization. 

This is very unfortunate because the nature of Ryzen (many cores, competitive IPC, good value, unofficial ECC support) would be excellent for this.
Comment 3 imatiimba 2017-07-26 08:59:47 UTC
I'm also having this issue.  

AMD Ryzen R7 1700  
Asus Prime B350-Plus  
Nvidia 970  
Host: Arch Linux Kernel 4.11.9  
Guest: Windows 10 x64  

Adding to the info collected by efeu, it seems the bug isn't specific to Ryzen.  
The first bug report is 9 years old.  
https://sourceforge.net/p/kvm/bugs/230/
Comment 4 durok 2017-07-26 13:52:34 UTC
Same problem here.

AMD Ryzen R7 1700
Gigabyte GA-AX370-Gaming 5
GT 570
R9 290x
Host: Arch
Guest: Windows 8.1/10
Comment 5 Andrey 2017-07-26 16:34:51 UTC
Confirmed with:

AMD Ryzen 5 1600
Gigabyte AB350 Gaming 3
GTX 750 for host
GTX 1050 Ti for guest
Arch Linux as a host, Windows 10 as a guest.
Comment 6 Michael 2017-07-27 10:40:25 UTC
Threadripper is just around the corner and no one here knows the answer to the KVM/NPT/AMD issue. Shame really, I think quite a few folk will switch to Intel if this isn't fixed soon.
Comment 7 mfadom99 2017-08-01 05:10:24 UTC
Same issue with kernel 4.10 - 4.12 here.

CPU - AMD Ryzen 7 1700
MB - Asus Prime x370 Pro
Host GPU - NVidia GTX 750ti
VFIO GPU - AMD Radeon RX480
Host OS - Manjaro 17.2
Guest OS - Windows 10

With npt=1, everything is fast except the GPU: 10-15 fps in Skyrim.
With npt=0, everything is slow, but the GPU reaches ~30 fps in Skyrim.
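
When comparing npt=0 and npt=1 runs, it helps to confirm which setting the host is actually using. A minimal sketch, assuming the kvm_amd module is loaded and exposes the parameter under `/sys/module/kvm_amd/parameters/npt`; the helper name `npt_enabled` is hypothetical, and the value format (`1`/`0` or `Y`/`N`) is assumed to vary by kernel version. The setting itself is normally chosen when the module is loaded, for example with an `options kvm_amd npt=0` line in a modprobe configuration file.

```python
from pathlib import Path

# Assumed sysfs location of the kvm_amd "npt" module parameter.
NPT_PARAM = Path("/sys/module/kvm_amd/parameters/npt")


def npt_enabled(param_path: Path = NPT_PARAM):
    """Return True/False for the npt setting, or None if it cannot be determined."""
    try:
        raw = param_path.read_text().strip()
    except OSError:
        return None  # module not loaded, or file not readable
    if raw in ("1", "Y", "y"):
        return True
    if raw in ("0", "N", "n"):
        return False
    return None  # unexpected format; do not guess


if __name__ == "__main__":
    print("npt enabled:", npt_enabled())
```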
Comment 8 Symon 2017-08-18 06:48:33 UTC
Same Problems here :(
Comment 9 JohnSmith 2017-08-21 18:12:07 UTC
Same problem here. This issue should be a very high priority. It basically destroys GPU passthrough on Ryzen CPUs.
Comment 10 maspe 2017-08-25 14:33:07 UTC
Same problem for me :(

Ubuntu 17.04 - all mainline kernels up to 4.13-rc6

CPU - AMD Ryzen 7 1700
MB - Asus Prime x370 Pro
Host GPU - NVidia GTX 1050ti
passthrough GPU - AMD Radeon RX480
Host OS - Ubuntu 17.04
Guest OS - Windows 10
Comment 11 Duarte Pousa 2017-08-27 10:09:39 UTC
Behaviour confirmed in OpenSUSE Tumbleweed, both in the distribution provided kernels and mainline.

Current Kernel Version: 4.12
Tested Kernel Versions: 4.11,4.12

CPU - AMD Ryzen 7 1700
MB - MSI x370 SLI PLUS PRO
Host GPU - AMD Radeon RX 560
Guest GPU - NVIDIA GTX 750 ti
Host OS - OpenSUSE Tumbleweed
Guest OS - Windows 10 LTSB 2016
Comment 12 Dennis 2017-08-27 15:58:08 UTC
I have the same problems.

Current Kernel Version: 4.12.8
Tested Kernel Versions: 4.10, 4.11, 4.12

CPU - AMD Ryzen 7 1700
MB - ASRock X370 Taichi
Host GPU - Nvidia GT1030
Guest GPU - Nvidia GTX 980 Ti
Host OS - Fedora 26 Workstation
Guest OS - Windows 10 Education
Comment 13 Georg 2017-08-28 21:11:05 UTC
Same problem here

CPU - AMD Ryzen 7 1700
MB - ASRock X370 Killer SLI
Host GPU - AMD Radeon HD6850
Guest GPU - Nvidia GTX 1080 Ti
Host OS - Fedora 26 Workstation
Guest OS - Windows 10 Pro
Comment 14 John Kelley 2017-08-30 08:36:02 UTC
I am also experiencing the above mentioned issue with very similar performance impact. I am currently not using the virtualization features due to this performance bug with AMD Nested Page Tables.

CPU - AMD Ryzen 7 1800x
MB - Asus PRIME X370-PRO
Host GPU - AMD Radeon 480
Guest GPU - AMD Radeon r290
Host OS - Arch Linux, 4.11.9 kernel (Previously 4.9, 4.10)
Guest OS - Windows 10 Pro Anniversary Edition
Comment 15 ML 2017-08-31 11:58:06 UTC
Also the same problem.

CPU - AMD Ryzen 7 1700 & TR 1950X
MB - Gigabyte Gaming 5 & Asrock X399 Fatality Pro
Host GPU - AMD RX 570 & RX 460
Guest GPU - Nvidia 980Ti & 1080ti
Host OS - Fedora 26 Workstation
Guest OS - Windows 10 Pro
Comment 16 sddearjack 2017-09-01 13:50:22 UTC
I had to reply: I am affected by the same problem. It is annoying as hell, because I can't find a solution or a way to work around this bug.

CPU: R7 1700x
MB: X370 Taichi
Host GPU: RX 550
Guest GPU: 980 ti
Host OS: Gentoo, Arch, Fedora.
Guest OS: Windows 10 Pro
Comment 17 EPalmer 2017-09-02 08:50:07 UTC
I'm also having this issue, currently still using my old machine in the meantime.

CPU:		Ryzen 1700
MB:		Gigabyte X370 Gaming 5
Host GPU:	Radeon 7870 (PITCAIRN)
Guest GPU: 	Radeon 290X (HAWAII)
Host OS: 	Ubuntu 16.04 (Kernel 4.11.0-14-generic Ubuntu)
Guest OS: 	Windows 10/Windows 8.1/Windows 7
Comment 18 Iwanovich 2017-09-02 18:44:54 UTC
Dealing with the same problem here.

CPU - AMD Ryzen 5 1600
Motherboard - Gigabyte AX370 Gaming 5
Host GPU - MSI Nvidia GT710 2GB
Guest GPU - Gainward Nvidia GTX 1050Ti 4GB
Host OS - Fedora 26, kernel 4.12.8.
Guest OS - Microsoft Windows 10
Comment 19 Tiit 2017-09-05 15:31:32 UTC
Dealing with the exact same problem here.

CPU - AMD Ryzen 7 1700X
Motherboard - MSI X370 Gaming Carbon Pro
Host GPU - Any Random GPU
Guest GPU - MSI GTX 1080
Host OS - Fedora 26, kernel 4.13
Guest OS - Microsoft Windows 10
Comment 20 Fruhwirth Clemens 2017-09-08 09:58:24 UTC
For a fix, I am publicly offering a bounty expiring 2017/11/01 of 0.02 Bitcoin (~100$ at the current price) redeemable by the person on the respective git commit.

Maybe a good starting point is grepping through the code for "npt_enabled" and doing some random guesswork, and/or introducing more performance counters at host or guest to increase visibility. Maybe we can take apart a benchmark and identify the operation that is slow.
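
The grep suggested above could be sketched in Python as follows. This is an illustration only: the helper name `find_symbol` is hypothetical, and the directory passed to it (for example `arch/x86/kvm` inside a local kernel checkout) is an assumption about where the relevant sources live.

```python
from pathlib import Path


def find_symbol(root, symbol="npt_enabled", suffixes=(".c", ".h")):
    """Return (path, line number, stripped line) for each line containing symbol."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        # Only look at regular C source and header files.
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(errors="replace")
        except OSError:
            continue  # unreadable file; skip it
        for lineno, line in enumerate(text.splitlines(), start=1):
            if symbol in line:
                hits.append((str(path), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Assumed location; adjust to the local kernel source tree.
    for path, lineno, line in find_symbol("arch/x86/kvm"):
        print(f"{path}:{lineno}: {line}")
```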
Comment 21 Andrey 2017-09-09 04:16:44 UTC
(In reply to Fruhwirth Clemens from comment #20)
> For a fix, I am publicly offering a bounty expiring 2017/11/01 of 0.02
> Bitcoin (~100$ at the current price) redeemable by the person on the
> respective git commit.
> 

Another $100 from me.
Comment 22 Allison 2017-09-13 20:57:21 UTC
(In reply to Fruhwirth Clemens and Andrey from comment #20-21)
>> For a fix, I am publicly offering a bounty expiring 2017/11/01 of 0.02
>> Bitcoin (~100$ at the current price) redeemable by the person on the
>> respective git commit.
> 
>Another $100 from me.

Another $25 from me.
In bitcoin that is, and sorry I couldn't give more. Will update if I can get more.
Comment 23 Paolo Bonzini 2017-09-13 21:27:44 UTC
Guys, stop spamming the bug.  AMD is looking at it.
Comment 24 efeu 2017-09-15 07:30:11 UTC
(In reply to Paolo Bonzini from comment #23)
> Guys, stop spamming the bug.  AMD is looking at it.

Source?
Comment 25 efeu 2017-09-24 08:09:20 UTC
@Paolo
There is neither an official AMD statement nor a confirmation on this issue. Have you spoken to an AMD developer? What did they tell you?

I mean the only thing of interest is how long it will take to be fixed:
1 month? 1 year? Another 10 years? Never?

This issue is also not only about GPU passthrough: all PCI-E devices I've tested are affected. Those are:
-SATA-Storage-/SAS-Raid-Controller-Cards (slow-downs/hangs in transfer)
-PCI-E SSDs (slow-downs/hangs in transfer)
-Network-Cards (slow-downs/hangs in transfer)
-TV-Cards (hangs in (hardware decoded) streams)
Comment 26 hzzhan9 2017-09-24 10:43:31 UTC
(In reply to efeu from comment #24)
> (In reply to Paolo Bonzini from comment #23)
> > Guys, stop spamming the bug.  AMD is looking at it.
> 
> Source?

I guess this one https://community.amd.com/thread/215931. I saw an id "ray_m" with the AMD logo and the title "Technical Support Engineer" replied "We are researching the issue ..." several months ago.
Comment 27 efeu 2017-09-24 11:48:44 UTC
Well, this is my thread on the AMD board and I have talked to ray_m directly. He "escalated" the issue, and that's it. But ray_m did not receive anything from his developers, so this suggests to me that no one is working on this issue.
Comment 28 sddearjack 2017-09-24 12:25:43 UTC
The main concern is whether it is on the software or hardware side. Some sources say that the nested page table performance problem (AMD) is a 10-year-old bug. So are they unable to fix it, is someone being paid not to look into this problem, or is it not a major problem for the consumer market?
Comment 29 Dennis 2017-09-24 12:28:31 UTC
(In reply to sddearjack from comment #28)
> The main concern is whether it is on the software or hardware side. Some
> sources say that the nested page table performance problem (AMD) is a
> 10-year-old bug. So are they unable to fix it, is someone being paid not to
> look into this problem, or is it not a major problem for the consumer market?

Since there doesn't seem to be a problem in Xen or ESXi, doesn't that point to a software problem?
Comment 30 efeu 2017-09-24 12:59:06 UTC
We already know that this is in the kvm_amd module.
Comment 31 Fawwal 2017-09-26 14:12:30 UTC
Came here to give my support for this. I'm looking to make one box to do it all, and I think Ryzen is a perfect choice. I'm going to hold off for now and wait a few months for a solution. If y'all think it will take longer than that, I may just go for a typical two-processor Xeon build.
Comment 32 badOne 2017-10-09 10:19:49 UTC
I also noticed heavy performance loss with my Ryzen 1700X and KVM. Disabling npt completely is no option either, it just slows down the CPU a ton (which totally makes sense).

My specs are:
 - Ryzen 1700X
 - ASUS Prime x370 Pro
 - GPU0: GTX 1050 (host)
 - GPU1: GTX 1070 (guest)

I also experimented with CPU pinning, mounting a whole SSD instead of just using a virtual disk, trying different drivers, settings, etc.

Nothing of the above seemed to change anything. With npt=1 some games are running fine (e.g. League of Legends), some are running ok (e.g. Path of Exile) and some are running extremely bad (e.g. Player Unknown's Battlegrounds, GTA 5, CS:GO, etc.)

I looked into the code but I'm not able to dive into the project due to the lack of time.

This is a major issue in my opinion and AMD + the guys from KVM should look into this together. Intel got it running, too. In addition with Xen it's working perfectly but I really don't like using Xen, I like KVM a lot more.

Please fix this issue. If help is needed, just contact me; I will help when time allows.
Comment 33 Benjamin 2017-10-13 18:44:21 UTC
Came here to give my support for this.

My setup
Ryzen 1700X
MSI x370 gaming pro carbon
GPU0: Asus RX550
GPU1: MSI RX580


Hope this will be fixed soon.
Comment 34 Nick Sarnie 2017-10-24 23:42:42 UTC
Hi all,

Just a quick update. A workaround has been found. Please see the following post:
https://lists.linuxfoundation.org/pipermail/iommu/2017-October/024823.html

Please note that this patch is only a workaround, is incorrect in some cases, and cannot be merged upstream in this form. But it's a start, and you can make this change locally to work around the GPU performance hit.


Thanks,
Sarnex
Comment 35 Benjamin 2017-10-25 10:44:10 UTC
That is awesome news!
Comment 36 Nick Sarnie 2017-10-25 18:46:05 UTC
Another update, a better patch from Paolo is here:

https://marc.info/?l=kvm&m=150891016802546&w=2

Sarnex
Comment 37 Nick Sarnie 2017-10-26 15:11:52 UTC
Paolo has submitted a final patch here:

https://patchwork.kernel.org/patch/10027525/

People who put up a bounty should prepare to pay out (to Geoff because Paolo said so) :)

Sarnex
Comment 38 Fruhwirth Clemens 2017-10-27 08:34:13 UTC
As promised, I paid out my bounty here:
https://blockchain.info/tx/91077ba4e7be1ab303b591efa268f194ab186112caa014576a65a26d69eb1dd2
More payout information for geoff@ is posted here:
https://www.reddit.com/r/VFIO/comments/78i3jx/possible_fix_for_the_npt_issue_discussed_on_iommu/doxtwyn/

Thanks again for working on this.