Created attachment 307382 [details]
dmesg

Device: Asus Zephyrus GA402RJ
CPU: Ryzen 7 6800HS
GPU: RX 6700S
Kernel: 6.13.0-rc3-g8faabc041a00

Problem:
Launching games or GPU benchmarking tools in a QEMU Windows 11 VM causes screen artifacts; eventually QEMU pauses with an unrecoverable error.

Commit:
f9e54c3a2f5b79ecc57c7bc7d0d3521e461a2101 is the first bad commit
commit f9e54c3a2f5b79ecc57c7bc7d0d3521e461a2101
Author: Alex Williamson <alex.williamson@redhat.com>
Date:   Mon Aug 26 16:43:53 2024 -0400

    vfio/pci: implement huge_fault support

    With the addition of pfnmap support in vmf_insert_pfn_{pmd,pud}() we can
    take advantage of PMD and PUD faults to PCI BAR mmaps and create more
    efficient mappings.  PCI BARs are always a power of two and will typically
    get at least PMD alignment without userspace even trying.  Userspace
    alignment for PUD mappings is also not too difficult.

    Consolidate faults through a single handler with a new wrapper for
    standard single page faults.  The pre-faulting behavior of commit
    d71a989cf5d9 ("vfio/pci: Insert full vma on mmap'd MMIO fault") is removed
    in this refactoring since huge_fault will cover the bulk of the faults and
    results in more efficient page table usage.  We also want to avoid that
    pre-faulted single page mappings preempt huge page mappings.

    Link: https://lkml.kernel.org/r/20240826204353.2228736-20-peterx@redhat.com
    Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    Signed-off-by: Peter Xu <peterx@redhat.com>
    Cc: Alexander Gordeev <agordeev@linux.ibm.com>
    Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Gavin Shan <gshan@redhat.com>
    Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
    Cc: Heiko Carstens <hca@linux.ibm.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Jason Gunthorpe <jgg@nvidia.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Niklas Schnelle <schnelle@linux.ibm.com>
    Cc: Paolo Bonzini <pbonzini@redhat.com>
    Cc: Ryan Roberts <ryan.roberts@arm.com>
    Cc: Sean Christopherson <seanjc@google.com>
    Cc: Sven Schnelle <svens@linux.ibm.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Vasily Gorbik <gor@linux.ibm.com>
    Cc: Will Deacon <will@kernel.org>
    Cc: Zi Yan <ziy@nvidia.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

 drivers/vfio/pci/vfio_pci_core.c | 60 ++++++++++++++++++++++++++++------------
 1 file changed, 43 insertions(+), 17 deletions(-)
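
For readers following the alignment remark in the commit message, here is a small arithmetic sketch of the physical side only. It is not kernel code. It assumes x86-64 mapping sizes (2 MiB for PMD, 1 GiB for PUD), the function names are made up for illustration, and the BAR base address is hypothetical. It relies on the fact that a PCI BAR of size 2^n is naturally aligned to its size. It does not model the virtual address that userspace gets from mmap, which has to be aligned as well.

```python
# Illustrative sketch, not kernel code. Assumes x86-64 mapping sizes.
PMD_SIZE = 2 << 20    # 2 MiB
PUD_SIZE = 1 << 30    # 1 GiB


def is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0


def is_aligned(value: int, size: int) -> bool:
    # Valid because size is a power of two.
    return value & (size - 1) == 0


def bar_alignment_report(bar_base: int, bar_size: int) -> dict:
    """Report which huge mapping sizes the physical BAR range could support."""
    if not is_power_of_two(bar_size):
        raise ValueError("PCI BAR sizes are powers of two")
    if not is_aligned(bar_base, bar_size):
        raise ValueError("a PCI BAR is naturally aligned to its size")
    return {
        "pmd": bar_size >= PMD_SIZE and is_aligned(bar_base, PMD_SIZE),
        "pud": bar_size >= PUD_SIZE and is_aligned(bar_base, PUD_SIZE),
    }


# Hypothetical 16G BAR placed at a 16G-aligned physical address.
report = bar_alignment_report(bar_base=0x7C00000000, bar_size=16 << 30)
print(report)
```

Any BAR of at least 2 MiB passes the PMD check and any BAR of at least 1 GiB passes the PUD check, so on the physical side the mapping size is limited only by the BAR size.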

Created attachment 307424 [details]
Host dmesg, 6.13-rc2 stock, 16G BAR

Kernel 6.13-rc2 (MANJARO stock), 16G BAR for 6700XT's VRAM.
Kernel log from opening a Linux VM with the GPU passed through. In this case, the VM fails to init the GPU.
vfio-pci-core has its new huge_fault handler.
Created with `echo "file vfio_pci_core.c +p" > /sys/kernel/debug/dynamic_debug/control`

See mailing list thread: https://lore.kernel.org/regressions/20241222223604.GA3735586@bhelgaas/

Created attachment 307425 [details]
Host dmesg, 6.13-rc2 vfio-pci-core without huge_fault, 16G BAR

Kernel 6.13-rc2 (MANJARO, patched vfio-pci-core), 16G BAR for 6700XT's VRAM.
Kernel log from opening a Linux VM with the GPU passed through. In this case, the VM successfully inits the GPU.
vfio-pci-core is patched only to remove its new huge_fault handler.
Created with `echo "file vfio_pci_core.c +p" > /sys/kernel/debug/dynamic_debug/control`

See mailing list thread: https://lore.kernel.org/regressions/20241222223604.GA3735586@bhelgaas/

Created attachment 307429 [details]
Host dmesgs, 6.13-rc2, QEMU 9.1.2/9.2.0, vfio-pci-core huge_fault PUD/PMD/both

Kernel log excerpts, opening a Linux VM with the GPU passed through.
Kernel 6.13-rc2 (MANJARO), 16G BAR for 6700XT's VRAM.
QEMU 9.1.2 vs. QEMU 9.2.0.
vfio-pci-core huge_fault support set to either PUD only ('no2Mpages') / PMD only ('no1Gpages') / both ('stock') using the patches by Alex: https://lore.kernel.org/regressions/20241230182737.154cd33a.alex.williamson@redhat.com/

Configurations where the guest fails to initialize the GPU: QEMU 9.1.2 'stock'/'no2Mpages'
Working configurations: QEMU 9.1.2 'no1Gpages', QEMU 9.2.0 all

Created with `echo "file vfio_pci_core.c +p" > /sys/kernel/debug/dynamic_debug/control`.

Created attachment 307432 [details]
Host dmesgs, 6.12.4/6.13-rc2, QEMU 9.1.2/9.2.0, vfio-pci-core huge_fault alignment patch

Kernel log excerpts, opening a Linux VM with the GPU passed through.
Kernels 6.12.4, 6.13-rc2 (MANJARO), 16G BAR for 6700XT's VRAM.
QEMU 9.1.2, additionally QEMU 9.2.0 for 6.12.4.
Logs are with the vfio-pci-core patch by Alex: https://lore.kernel.org/regressions/20241231090733.5cc5504a.alex.williamson@redhat.com/

All configurations work as expected, with QEMU 9.2.0 getting the 1G mappings (as before) and 9.1.2 now falling back to 2M.

Created with `echo "file vfio_pci_core.c +p" > /sys/kernel/debug/dynamic_debug/control`.

Created attachment 307444 [details]
Host dmesgs, 6.12.4, QEMU 9.1.2/9.2.0, vfio-pci-core submitted huge_fault alignment patch

Kernel log excerpts, opening a Linux VM with the GPU passed through.
Kernel 6.12.4 (MANJARO, mostly stock), 16G BAR for 6700XT's VRAM.
QEMU 9.1.2 and QEMU 9.2.0.
Logs are with the submitted vfio-pci-core patch by Alex Williamson: https://lore.kernel.org/lkml/2025010322-overblown-symptom-d4cd@gregkh/T/#t

All configurations work as expected, with QEMU 9.2.0 getting the 1G mappings (as before) and 9.1.2 falling back to 2M (which it didn't before, causing the bug).

Created with `echo "file vfio_pci_core.c +p" > /sys/kernel/debug/dynamic_debug/control`.
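
To illustrate the 1G versus 2M outcome described in these logs, the following is a rough model of the fallback, not the submitted patch. It assumes a huge mapping is only usable when both the virtual address and the physical address are aligned to the mapping size, and it assumes x86-64 sizes. The function name and every address are hypothetical; the two virtual addresses merely stand in for one mmap placement that is 1G-aligned and one that is only 2M-aligned.

```python
# Illustrative model only, not the vfio-pci-core patch.
# Assumes x86-64 sizes; all addresses below are hypothetical.
SIZES = (("pud", 1 << 30), ("pmd", 2 << 20), ("pte", 4 << 10))


def count_mappings(vaddr: int, paddr: int, length: int) -> dict:
    """Greedily cover [vaddr, vaddr + length) with the largest mapping whose
    virtual and physical addresses are both aligned and which still fits."""
    counts = {name: 0 for name, _ in SIZES}
    offset = 0
    while offset < length:
        for name, size in SIZES:
            aligned = ((vaddr + offset) % size == 0
                       and (paddr + offset) % size == 0)
            if aligned and offset + size <= length:
                counts[name] += 1
                offset += size
                break
        else:
            # No size matched, not even a single 4 KiB page.
            raise ValueError("addresses and length must be 4 KiB aligned")
    return counts


BAR_SIZE = 16 << 30
BAR_PHYS = 0x7C00000000                    # hypothetical, 16G-aligned
VA_1G_ALIGNED = 0x7F0000000000             # hypothetical mmap address
VA_2M_ALIGNED = VA_1G_ALIGNED + (2 << 20)  # shifted by 2M: only 2M-aligned

print(count_mappings(VA_1G_ALIGNED, BAR_PHYS, BAR_SIZE))
print(count_mappings(VA_2M_ALIGNED, BAR_PHYS, BAR_SIZE))
```

In this model the 1G-aligned placement is covered entirely by 1G mappings, while the 2M-aligned placement never has both addresses 1G-aligned at the same offset and so uses 2M mappings throughout.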