Bug 196409
| Summary: | kvm_amd nested pagetable gpu passthrough performance oddities | | |
|---|---|---|---|
| Product: | Virtualization | Reporter: | efeu (efeu) |
| Component: | kvm | Assignee: | virtualization_kvm |
| Status: | NEW --- | | |
| Severity: | high | CC: | andizhigh, andri, benjamin, bierhumpen, bj, bonzini, clemens, dark.vvorth, dennis, durok, eldiablodivino, en.yaye, ephemient, eric, ethan.palmer, georg.markowitsch, hzzhan9, ian.montgomery, imatiimba, itvend, iwanovich, john, kalinda, manuel, marco_silva85, martin, maspe36, mfadom99, michael.osmolski, patrickett, pavel.kondjukov, pousaduarte, sarnex, sgsdxzy, thbeck84, xrevolver |
| Priority: | P1 | | |
| Hardware: | x86-64 | | |
| OS: | Linux | | |
| Kernel Version: | 4.10.8-1 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
Description
efeu
2017-07-18 10:39:59 UTC
I too have this NPT issue with a Ryzen 1800X and the latest KVM kernel. It's incredible that no one has managed to resolve this yet. GPU performance is absolutely dire with npt=1, and CPU performance is very bad with npt=0, so it's a lose/lose situation. Please look into this well-documented issue; it makes the new AMD platform very unattractive for advanced virtualization. This is very unfortunate, because the nature of Ryzen (many cores, competitive IPC, good value, unofficial ECC support) would make it excellent for this.

I'm also having this issue.
AMD Ryzen R7 1700
Asus Prime B350-Plus
Nvidia 970
Host: Arch Linux, kernel 4.11.9
Guest: Windows 10 x64

Adding to the info collected by efeu, it seems the bug isn't specific to Ryzen. The first bug report is 9 years old: https://sourceforge.net/p/kvm/bugs/230/

Same problem here.
AMD Ryzen R7 1700
Gigabyte GA-AX370-Gaming 5
GT 570
R9 290x
Host: Arch
Guest: Windows 8.1/10

Confirmed with:
AMD Ryzen 5 1600
Gigabyte AB350 Gaming 3
GTX 750 for host
GTX 1050 Ti for guest
Arch Linux as host, Windows 10 as guest.

Threadripper is just around the corner and no one here knows the answer to the KVM/NPT/AMD issue. A shame really; I think quite a few folks will switch to Intel if this isn't fixed soon.

Same issue with kernels 4.10 - 4.12 here.
CPU - AMD Ryzen 7 1700
MB - Asus Prime x370 Pro
Host GPU - NVidia GTX 750ti
VFIO GPU - AMD Radeon RX480
Host OS - Manjaro 17.2
Guest OS - Windows 10
With npt=1, everything is fast, except GPU fps is 10-15 in Skyrim.
With npt=0, everything is slow, but GPU fps is ~30 in Skyrim.

Same problems here :(

Same problem here. This issue should be a very high priority. It basically destroys GPU passthrough on Ryzen CPUs.

Same problem for me :(
Ubuntu 17.04 - all mainline kernels up to 4.13-rc6
CPU - AMD Ryzen 7 1700
MB - Asus Prime x370 Pro
Host GPU - NVidia GTX 1050ti
Passthrough GPU - AMD Radeon RX480
Host OS - Ubuntu 17.04
Guest OS - Windows 10

Behaviour confirmed in OpenSUSE Tumbleweed, both in the distribution-provided kernels and mainline.
Current kernel version: 4.12
Tested kernel versions: 4.11, 4.12
CPU - AMD Ryzen 7 1700
MB - MSI x370 SLI PLUS PRO
Host GPU - AMD Radeon RX 560
Guest GPU - NVIDIA GTX 750 ti
Host OS - OpenSUSE Tumbleweed
Guest OS - Windows 10 LTSB 2016

I have the same problems.
Current kernel version: 4.12.8
Tested kernel versions: 4.10, 4.11, 4.12
CPU - AMD Ryzen 7 1700
MB - ASRock X370 Taichi
Host GPU - Nvidia GT1030
Guest GPU - Nvidia GTX 980 Ti
Host OS - Fedora 26 Workstation
Guest OS - Windows 10 Education

Same problem here.
CPU - AMD Ryzen 7 1700
MB - ASRock X370 Killer SLI
Host GPU - AMD Radeon HD6850
Guest GPU - Nvidia GTX 1080 Ti
Host OS - Fedora 26 Workstation
Guest OS - Windows 10 Pro

I am also experiencing the above-mentioned issue, with a very similar performance impact. I am currently not using the virtualization features because of this performance bug with AMD Nested Page Tables.
CPU - AMD Ryzen 7 1800X
MB - Asus PRIME X370-PRO
Host GPU - AMD Radeon 480
Guest GPU - AMD Radeon r290
Host OS - Arch Linux, 4.11.9 kernel (previously 4.9, 4.10)
Guest OS - Windows 10 Pro Anniversary Edition

Also the same problem.
CPU - AMD Ryzen 7 1700 & TR 1950X
MB - Gigabyte Gaming 5 & Asrock X399 Fatality Pro
Host GPU - AMD RX 570 & RX 460
Guest GPU - Nvidia 980Ti & 1080ti
Host OS - Fedora 26 Workstation
Guest OS - Windows 10 Pro

Had to reply; I'm affected by the same problem. It's annoying as hell, because I can't find a solution or a way to work around this bug.
CPU: R7 1700x
MB: X370 Taichi
Host GPU: RX 550
Guest GPU: 980 ti
Host OS: Gentoo, Arch, Fedora
Guest OS: Windows 10 Pro

I'm also having this issue and am still using my old machine in the meantime.
CPU: Ryzen 1700
MB: Gigabyte X370 Gaming 5
Host GPU: Radeon 7870 (PITCAIRN)
Guest GPU: Radeon 290X (HAWAII)
Host OS: Ubuntu 16.04 (kernel 4.11.0-14-generic)
Guest OS: Windows 10/Windows 8.1/Windows 7

Dealing with the same problem here.
CPU - AMD Ryzen 5 1600
Motherboard - Gigabyte AX370 Gaming 5
Host GPU - MSI Nvidia GT710 2GB
Guest GPU - Gainward Nvidia GTX 1050Ti 4GB
Host OS - Fedora 26, kernel 4.12.8
Guest OS - Microsoft Windows 10

Dealing with the exact same problem here.
CPU - AMD Ryzen 7 1700X
Motherboard - MSI X370 Gaming Carbon Pro
Host GPU - Any random GPU
Guest GPU - MSI GTX 1080
Host OS - Fedora 26, kernel 4.13
Guest OS - Microsoft Windows 10

For a fix, I am publicly offering a bounty, expiring 2017/11/01, of 0.02 Bitcoin (~$100 at the current price), redeemable by the person named on the respective git commit. A good starting point may be grepping through the code for "npt_enabled" and doing some educated guesswork, and/or adding more performance counters on the host or guest to increase visibility. Maybe we can take apart a benchmark and identify the operation that is slow.

(In reply to Fruhwirth Clemens from comment #20)
> For a fix, I am publicly offering a bounty expiring 2017/11/01 of 0.02
> Bitcoin (~100$ at the current price) redeemable by the person on the
> respective git commit.
>

Another $100 from me.

(In reply to Fruhwirth Clemens and Andrey from comment #20-21)
>> For a fix, I am publicly offering a bounty expiring 2017/11/01 of 0.02
>> Bitcoin (~100$ at the current price) redeemable by the person on the
>> respective git commit.
>
>Another $100 from me.

Another $25 from me, in bitcoin that is. Sorry I couldn't give more; I will update if I can.

Guys, stop spamming the bug. AMD is looking at it.

(In reply to Paolo Bonzini from comment #23)
> Guys, stop spamming the bug. AMD is looking at it.

Source?
@Paolo There is neither an official AMD statement nor a confirmation of this issue. Have you spoken to an AMD developer? What did they tell you? The only thing of interest is how long it will take to be fixed: 1 month? 1 year? Another 10 years? Never?
This issue is not only about GPU passthrough; all PCI-E devices I've tested are affected:
- SATA storage / SAS RAID controller cards (slow-downs/hangs in transfer)
- PCI-E SSDs (slow-downs/hangs in transfer)
- Network cards (slow-downs/hangs in transfer)
- TV cards (hangs in hardware-decoded streams)

(In reply to efeu from comment #24)
> (In reply to Paolo Bonzini from comment #23)
> > Guys, stop spamming the bug. AMD is looking at it.
>
> Source?

I guess this one: https://community.amd.com/thread/215931. I saw a user "ray_m" with the AMD logo and the title "Technical Support Engineer" reply "We are researching the issue ..." several months ago.

Well, that is my thread on the AMD board, and I have talked to ray_m directly. He "escalated" the issue, and that's it. ray_m has not received anything from his developers, which suggests to me that no one is working on this issue.

The main concern is whether it is on the software or the hardware side. Some sources say that the AMD nested page tables performance problem is a 10-year-old bug. So are they unable to fix it, is someone being paid not to look into this problem, or is it just not a major problem for the consumer market?

(In reply to sddearjack from comment #28)
> Main concern is it software or hardware side. Some sources tell that Nested
> page tables performance(AMD) is 10 years old *bug. So they are unable to fix
> it, someone is being paid by someone not to look in this problem, not a
> major problem for consumer market?

Since there doesn't seem to be a problem in Xen or ESXi, doesn't that point to a software problem?

We already know that this is in the kvm_amd module.

Came here to give my support for this. I'm looking to build one box to do it all, and I think Ryzen is a perfect choice. I'm going to hold off for now and wait a few months for a solution. If y'all think it will take longer than that, I may just go for a typical dual-processor Xeon build.
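Most reports in this thread hinge on the kvm_amd `npt` module parameter (npt=1 vs npt=0). As a minimal sketch, assuming the parameter is exposed at /sys/module/kvm_amd/parameters/npt while kvm_amd is loaded, the helper below reports which mode a host is currently in. Depending on the kernel version the file may read 1/0 or Y/N, so both forms are accepted. The names `parse_npt` and `read_npt` are hypothetical, chosen for illustration only.

```python
from pathlib import Path
from typing import Optional

# Assumed sysfs location of the kvm_amd "npt" module parameter; the file
# only exists while the kvm_amd module is loaded.
NPT_PARAM = Path("/sys/module/kvm_amd/parameters/npt")


def parse_npt(raw: str) -> bool:
    """Interpret the parameter text: int-style kernels print 1/0, bool-style print Y/N."""
    value = raw.strip().upper()
    if value in ("1", "Y"):
        return True
    if value in ("0", "N"):
        return False
    raise ValueError(f"unexpected npt value: {raw!r}")


def read_npt(path: Path = NPT_PARAM) -> Optional[bool]:
    """Return True if nested paging is on, False if off, None if kvm_amd is not loaded."""
    try:
        return parse_npt(path.read_text())
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    state = read_npt()
    if state is None:
        print("kvm_amd is not loaded (no npt parameter found)")
    elif state:
        print("npt=1: nested page tables enabled")
    else:
        print("npt=0: shadow paging in use")
```

To switch modes, the usual routes are a line such as `options kvm_amd npt=0` in a file under /etc/modprobe.d/ or `kvm_amd.npt=0` on the kernel command line, followed by reloading the module (with no VMs running) or rebooting.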
I also noticed a heavy performance loss with my Ryzen 1700X and KVM. Disabling npt completely is no option either; it just slows down the CPU a ton (which totally makes sense). My specs are:
- Ryzen 1700X
- ASUS Prime x370 Pro
- GPU0: GTX 1050 (host)
- GPU1: GTX 1070 (guest)
I also experimented with CPU pinning, using a whole SSD instead of just a virtual disk, and trying different drivers, settings, etc. None of it seemed to change anything. With npt=1 some games run fine (e.g. League of Legends), some run OK (e.g. Path of Exile), and some run extremely badly (e.g. PlayerUnknown's Battlegrounds, GTA 5, CS:GO, etc.). I looked into the code but couldn't dig into the project due to lack of time. This is a major issue in my opinion, and AMD and the KVM developers should look into it together. It works on Intel, and it also works perfectly with Xen, but I really don't like using Xen; I like KVM a lot more. Please fix this issue. If help is needed, just contact me; I will help when time allows.

Came here to give my support for this. My setup:
Ryzen 1700X
MSI x370 gaming pro carbon
GPU0: Asus RX550
GPU1: MSI RX580
Hope this will be fixed soon.

Hi all,
Just a quick update. A workaround has been found. Please see the following post:
https://lists.linuxfoundation.org/pipermail/iommu/2017-October/024823.html
Please note that this patch is only a workaround, is incorrect in some cases, and cannot be merged upstream in this form. But it's a start, and you can apply this change locally to work around the GPU performance hit.
Thanks,
Sarnex

That is awesome news!
Another update: a better patch from Paolo is here:
https://marc.info/?l=kvm&m=150891016802546&w=2
Sarnex

Paolo has submitted a final patch here:
https://patchwork.kernel.org/patch/10027525/
People who put up a bounty should prepare to pay out (to Geoff, because Paolo said so) :)
Sarnex

As promised, I paid out my bounty here:
https://blockchain.info/tx/91077ba4e7be1ab303b591efa268f194ab186112caa014576a65a26d69eb1dd2
More payout information for geoff@ is posted here:
https://www.reddit.com/r/VFIO/comments/78i3jx/possible_fix_for_the_npt_issue_discussed_on_iommu/doxtwyn/
Thanks again for working on this.