Bug 107921 - Regression with KVM virtual machine using VFIO for PCI passthrough.
Summary: Regression with KVM virtual machine using VFIO for PCI passthrough.
Status: RESOLVED DUPLICATE of bug 107561
Alias: None
Product: Virtualization
Classification: Unclassified
Component: kvm
Hardware: x86-64 Linux
Importance: P1 high
Assignee: virtualization_kvm
Reported: 2015-11-15 19:38 UTC by Jasen Borisov
Modified: 2015-11-22 23:51 UTC
CC List: 1 user

Kernel Version: 4.2
Regression: No


Attachments
Kernel config for my 4.1.13 test build. (107.54 KB, text/plain)
2015-11-15 19:39 UTC, Jasen Borisov
Kernel config for my 4.2.6 test build. (109.68 KB, text/plain)
2015-11-15 19:39 UTC, Jasen Borisov
Kernel config for my 4.3.0 test build. (111.12 KB, text/plain)
2015-11-15 19:40 UTC, Jasen Borisov
`dmesg` output from 4.1.13. (87.40 KB, text/plain)
2015-11-15 19:40 UTC, Jasen Borisov
`dmesg` output from 4.2.6 (93.11 KB, text/plain)
2015-11-15 19:41 UTC, Jasen Borisov
`dmesg` output from 4.3.0 (93.23 KB, text/plain)
2015-11-15 19:42 UTC, Jasen Borisov
libvirt virtual machine domain configuration (5.37 KB, application/xml)
2015-11-15 19:42 UTC, Jasen Borisov
`lspci -vv` output from the host system (76.72 KB, text/plain)
2015-11-15 19:43 UTC, Jasen Borisov

Description Jasen Borisov 2015-11-15 19:38:41 UTC
A KVM virtual machine that uses host PCIe devices via VFIO hangs on kernels 4.2 and later. It works correctly (the guest boots and is fully usable) on 4.1.x kernels. The hang occurs shortly after the virtual machine is started, during the TianoCore firmware boot splash, before the guest OS even starts booting.

I have a KVM virtual machine which uses three of my host PCIe devices via VFIO: my graphics card, its audio component, and an XHCI USB controller.

The virtual machine works flawlessly with the 4.1.x kernel series. However, after I upgraded to 4.2, the virtual machine started hanging during boot and became unusable.

I tried removing various combinations of VFIO PCI devices from my virtual machine configuration (passing through only the XHCI controller, only the graphics card, etc.), and the problem persisted, so I know it is not caused by any one particular device I am passing through.

The guest OS in the virtual machine is Windows 10, but that should not make any difference with regard to this bug, as the hang happens earlier during VM startup (before Windows even starts booting).

I am using pure UEFI firmware (no CSM) on both the host and the guest, so there is no VGA involved anywhere, and the graphics card is passed through as a regular PCIe device.

I am using Gentoo Linux and have tested kernels 4.1.13, 4.2.6, and 4.3.0, all built from mainline sources. Of these, only 4.1.13 works correctly. I created this virtual machine while running an earlier 4.1.x revision, and it worked there, too. I first noticed this issue with 4.2.0 when it was released (which is why I stayed on 4.1.x), and I am sorry for not reporting it earlier.

Relevant hardware on my system:
- Intel i7 Extreme 5960X CPU
- ASRock X99M Extreme4 motherboard
- 32GB DDR4 2666MHz RAM
- NVIDIA GeForce GTX 980 graphics card (that I am passing through to the VM)
- AMD Radeon R7 250 (for graphics on the host)
- Generic USB3 (XHCI) PCIe expansion card (based on VIA hardware according to `lspci`)

I have attached: 1) kernel configs for each kernel version above, 2) `lspci -vv` output, 3) my libvirt virtual machine configuration, 4) `dmesg` output from each kernel version, taken after attempting to start the virtual machine straight after booting the system.

My kernel command line (same on each kernel version tried):
"resume=/dev/nvme0n1 resume_offset=6681926 rw root=/dev/nvme0n1 rootfstype=f2fs intel_iommu=on iommu=pt vfio-pci.ids=10de:13c0,10de:0fbb,1106:3483 vfio-pci.disable_vga=1 kvm.ignore_msrs=1 hugepages=6000"
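As an illustration only, the command line above can be split into parameters to confirm which vendor:device IDs are handed to vfio-pci. This is a minimal sketch assuming Python 3 and simple whitespace-separated `name=value` tokens; the helpers `parse_cmdline` and `vfio_ids` are hypothetical and much simpler than the kernel's own command-line parser.

```python
import shlex
from typing import Dict, List, Optional, Tuple

# Kernel command line copied from the report.
CMDLINE = (
    "resume=/dev/nvme0n1 resume_offset=6681926 rw root=/dev/nvme0n1 "
    "rootfstype=f2fs intel_iommu=on iommu=pt "
    "vfio-pci.ids=10de:13c0,10de:0fbb,1106:3483 vfio-pci.disable_vga=1 "
    "kvm.ignore_msrs=1 hugepages=6000"
)

# PCI vendor IDs that appear in vfio-pci.ids (0x10de = NVIDIA, 0x1106 = VIA).
VENDORS = {"10de": "NVIDIA", "1106": "VIA"}


def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Hypothetical helper: split a command line into {param: value}.

    Bare flags such as "rw" map to None; only the first "=" separates
    the name from the value (so "root=/dev/nvme0n1" keeps its path).
    """
    params: Dict[str, Optional[str]] = {}
    for token in shlex.split(cmdline):
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def vfio_ids(params: Dict[str, Optional[str]]) -> List[Tuple[str, str]]:
    """Hypothetical helper: return (vendor, device) pairs from vfio-pci.ids."""
    raw = params.get("vfio-pci.ids") or ""
    pairs: List[Tuple[str, str]] = []
    for entry in raw.split(","):
        if entry:
            vendor, _, device = entry.partition(":")
            pairs.append((vendor, device))
    return pairs


if __name__ == "__main__":
    params = parse_cmdline(CMDLINE)
    for vendor, device in vfio_ids(params):
        print(f"{vendor}:{device} ({VENDORS.get(vendor, 'unknown vendor')})")
```

Running it prints one line per ID: two NVIDIA IDs (the GTX 980 and its audio function) and one VIA ID (the XHCI expansion card), matching the three devices passed through to the VM.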

I am booting my kernel with EFISTUB; no separate bootloader present.

My QEMU version is 2.4.1, and my libvirt version is 1.2.21.

This is my first kernel bug report, so please let me know if there is any additional information I should provide to help diagnose/resolve the issue.
Comment 1 Jasen Borisov 2015-11-15 19:39:15 UTC
Created attachment 193051
Kernel config for my 4.1.13 test build.
Comment 2 Jasen Borisov 2015-11-15 19:39:46 UTC
Created attachment 193061
Kernel config for my 4.2.6 test build.
Comment 3 Jasen Borisov 2015-11-15 19:40:19 UTC
Created attachment 193071
Kernel config for my 4.3.0 test build.
Comment 4 Jasen Borisov 2015-11-15 19:40:58 UTC
Created attachment 193081
`dmesg` output from 4.1.13.
Comment 5 Jasen Borisov 2015-11-15 19:41:44 UTC
Created attachment 193091
`dmesg` output from 4.2.6
Comment 6 Jasen Borisov 2015-11-15 19:42:13 UTC
Created attachment 193101
`dmesg` output from 4.3.0
Comment 7 Jasen Borisov 2015-11-15 19:42:57 UTC
Created attachment 193111
libvirt virtual machine domain configuration
Comment 8 Jasen Borisov 2015-11-15 19:43:22 UTC
Created attachment 193121
`lspci -vv` output from the host system
Comment 9 Paolo Bonzini 2015-11-22 23:51:03 UTC
UEFI issues with kernels 4.2+ have been fixed. The remaining issues are a duplicate.

*** This bug has been marked as a duplicate of bug 107561 ***
