Bug 219085
| Summary: | kvm_spurious_fault in L1 when running a nested kvm instance on AMD Opteron_G5_qemu L0 | | |
|---|---|---|---|
| Product: | Virtualization | Reporter: | ununpta |
| Component: | kvm | Assignee: | virtualization_kvm |
| Status: | RESOLVED INVALID | | |
| Severity: | normal | | |
| Priority: | P3 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 6.10.0 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
Description
ununpta
2024-07-22 18:50:22 UTC
Command I used on L0 AMD Ryzen:

`qemu-system-x86_64.exe -m 4096 -machine q35 -accel whpx -smp 1 -cpu Opteron_G5,check,+svm -hda c:\debian.qcow2`

It's reproducible in 100% of cases.

On Mon, Jul 22, 2024, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=219085
>
> --- Comment #1 from ununpta@mailto.plus ---
> Command I used on L0 AMD Ryzen:
> qemu-system-x86_64.exe -m 4096 -machine q35 -accel whpx -smp 1 -cpu
> Opteron_G5,check,+svm -hda c:\debian.qcow2
>
> It's reproducible in 100% of cases

This is likely an issue in the L0 hypervisor, which in this case is Hyper-V. KVM (L1) hits a #GP when trying to enable EFER.SVME, which leads to the #UD on VMSAVE (SVM isn't enabled).

[ 355.714362] unchecked MSR access error: WRMSR to 0xc0000080 (tried to write 0x0000000000001d01) at rIP: 0xffffffff9228a274 (native_write_msr+0x4/0x20)

Do you see the same behavior on other kernel (L1) versions? Have you changed any other components (especially in L0)?

> Do you see the same behavior on other kernel (L1) versions? Have you
> changed any other components (especially in L0)?

Thank you for your help. What I tried:

* Opened the Hyper-V Manager built into Windows and created the "Ubuntu 22.04 LTS" VM that is available by default.
* Opened a PowerShell console and ran `Set-VMProcessor -VMName "Ubuntu 22.04 LTS" -ExposeVirtualizationExtensions $true` to allow nested virtualization in Hyper-V. I have to note, though, that even without `ExposeVirtualizationExtensions $true`, KVM inside the Hyper-V VM didn't crash as it did under qemu; bash just printed a warning that nested virtualization is restricted.
* Booted into "Ubuntu 22.04 LTS", installed qemu, and `qemu-system-x86_64 -accel kvm` was successful - the BIOS showed up.

The default kernel was vmlinuz-5.15.0-27-generic. After the qemu launch, the only kvm-related messages were:

[2.485820] kvm: Nested Virtualization enabled
[2.485822] SVM: kvm: Nested Paging enabled
[2.485823] SVM: kvm: Hyper-V enlightened NPT TLB flush enabled
[2.485824] SVM: kvm: Hyper-V Direct TLB flush enabled
[2.485828] SVM: Virtual VMLOAD VMSAVE supported

Then I recompiled the latest kernel and installed it, with the same successful KVM-accelerated qemu BIOS boot. With vmlinuz-6.10.0, the only kvm-related messages after the qemu launch are:

[1.701988] kvm_amd: TSC scaling supported
[1.701992] kvm_amd: Nested Virtualization enabled
[1.701993] kvm_amd: Nested Paging enabled
[1.701996] kvm_amd: kvm_amd: Hyper-V enlightened NPT TLB flush enabled
[1.701997] kvm_amd: kvm_amd: Hyper-V Direct TLB flush enabled
[1.701999] kvm_amd: Virtual VMLOAD VMSAVE supported
[1.702000] kvm_amd: PMU virtualization is disabled

I have to guess how to allow `Set-VMProcessor -VMName "Ubuntu 22.04 LTS" -ExposeVirtualizationExtensions $true` for third-party software, not only for machines created by the Hyper-V Manager. Maybe qemu has to be run under admin privileges as well.
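For reference, a minimal sketch of how one might confirm, from inside the L1 guest, that SVM and KVM nested support are actually exposed. The `/sys/module/kvm_amd/parameters/nested` path and the dmesg filter are assumptions based on a standard kvm_amd setup, not taken from this report:

```bash
# Run inside the L1 guest (the Ubuntu VM).

# Is the SVM CPU flag visible to the L1 kernel?
grep -q -w svm /proc/cpuinfo && echo "svm flag: present" || echo "svm flag: missing"

# Is nested virtualization enabled in kvm_amd? (requires the module to be loaded)
cat /sys/module/kvm_amd/parameters/nested 2>/dev/null   # 1 means enabled

# What did kvm_amd report at module load time?
dmesg | grep -i -E 'kvm_amd|kvm:' | tail -n 20
```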
I also saw a claim from Peter Maydell, a qemu developer, who said this about the qemu command line parameter `-cpu _processor_type_`:

> using a specific cpu type will only work with KVM if the host CPU really is
> that exact CPU type, otherwise, use "-cpu host" or "-cpu max".
> This is a restriction in the kernel's KVM handling, and not something that
> can be worked around in the QEMU side.

Per https://gitlab.com/qemu-project/qemu/-/issues/239

I was somewhat confused by this claim, because:

> --- Comment #1 from ununpta@mailto.plus ---
> Command I used on L0 AMD Ryzen:
> qemu-system-x86_64.exe -m 4096 -machine q35 -accel whpx -smp 1 -cpu
> Opteron_G5

Let me ask you a few questions.
Q1: Can one use an older CPU (but one still supporting SVM), not the actual bare-metal one, on the qemu command line for nested virtualization, or will KVM crash due to the restriction in the kernel's KVM handling?
Q2: Is there a command in the bare kernel/KVM console to figure out whether the EFER.SVME register/bit is writeable? If not,
Q3: Can you recommend any package to figure it out?

On Tue, Jul 23, 2024, bugzilla-daemon@kernel.org wrote:
> I also saw a claim from Peter Maydell, a qemu developer, who said this about
> the qemu command line parameter `-cpu _processor_type_`:
> > using a specific cpu type will only work with KVM if the host CPU really is
> > that exact CPU type, otherwise, use "-cpu host" or "-cpu max".

This generally isn't true. KVM is very capable of running older vCPU models on newer hardware. What won't work (at least, not well) is cross-vendor virtualization, i.e. advertising AMD on Intel and vice versa, but that's not what you're doing.

> > This is a restriction in the kernel's KVM handling, and not something that
> > can be worked around in the QEMU side.
> Per https://gitlab.com/qemu-project/qemu/-/issues/239
>
> I was somewhat confused by this claim, because:
> > --- Comment #1 from ununpta@mailto.plus ---
> > Command I used on L0 AMD Ryzen:
> > qemu-system-x86_64.exe -m 4096 -machine q35 -accel whpx -smp 1 -cpu
> > Opteron_G5
>
> Let me ask you a few questions.
> Q1: Can one use an older CPU (but one still supporting SVM), not the actual
> bare-metal one, on the qemu command line for nested virtualization, or will
> KVM crash due to the restriction in the kernel's KVM handling?

Yes. There might be caveats, but AFAIK, QEMU's predefined vCPU models should always work. If it doesn't work, and you have decent evidence that it's a KVM problem, definitely feel free to file a KVM bug.

> Q2: Is there a command in the bare kernel/KVM console to figure out whether
> the EFER.SVME register/bit is writeable? If not,

`grep -q svm /proc/cpuinfo`

SVM can be disabled by firmware via MSR_VM_CR (0xc0010114) even if SVM is reported in raw CPUID, but the kernel accounts for that and clears the "svm" flag from the CPU data that's reported in /proc/cpuinfo.

> Q3: Can you recommend any package to figure it out?

Sorry, I don't follow this question.
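A minimal sketch combining the two checks mentioned above (the `svm` flag in /proc/cpuinfo and MSR_VM_CR). It assumes root, the msr-tools package, and that the `msr` module can be loaded; the SVMDIS bit position (bit 4) comes from the AMD manual, not from this thread:

```bash
#!/bin/bash
# Check whether SVM is usable on this kernel.

if grep -q -w svm /proc/cpuinfo; then
    echo "svm flag present: SVM usable (the kernel already accounts for VM_CR.SVMDIS)"
else
    echo "svm flag missing: SVM not advertised or disabled by firmware"
fi

# Optionally inspect VM_CR (0xc0010114) directly; bit 4 is SVMDIS (SVME_DISABLE),
# and 1 means firmware has locked SVM off.
modprobe msr 2>/dev/null
if vm_cr=$(rdmsr 0xc0010114 2>/dev/null); then
    echo "VM_CR.SVMDIS = $(( (0x$vm_cr >> 4) & 1 ))"
fi
```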
Sean, after looking into the AMD documentation via https://unix.stackexchange.com/questions/74376 I think it's clear why KVM in L1 crashes. AMD says:

> Secure Virtual Machine Enable (SVME) Bit. Bit 12, read/write. Enables the SVM
> extensions. When this bit is zero, the SVM instructions cause #UD exceptions.
> EFER.SVME defaults to a reset value of zero.
> The effect of turning off EFER.SVME while a guest is running is undefined;
> therefore, the VMM should always prevent guests from writing EFER.
> SVM extensions can be disabled by setting VM_CR.SVME_DISABLE.

The command to read EFER.SVME is `sudo rdmsr 0xC0000080 #EFER`. On both the non-working and the working machine this command returns d01. d01 is 1101 0000 0001 in binary. The crashing command from Comment #1 did `WRMSR to 0xc0000080 (tried to write 0x0000000000001d01)`; 1d01 is 0001 1101 0000 0001 in binary, and the leftmost 0001 is bit 12. So the crashing command in L1 tries to set bit 12 to avoid the #UD. A nested VM is impossible without bit 12.

Writing this bit needs ring-0 privileges; guests cannot do this, but the VM manager can. The VM manager hooks into the write operation, checks whether VM_CR.SVME_DISABLE == 0 and, if so, sets bit 12 itself with L0 privileges, then returns success to the guest. This is what happens on Windows when KVM (L1) runs on top of the native Windows Hyper-V manager (L0). Qemu on Windows does not hook into the write, so the guest tries to write the bit with user privileges, which of course fails.

My questions are:

* How does the processor determine who is trying to write - L0 or L1?
* Does KVM determine in its source code whether it is running on top of Hyper-V or on top of another KVM?
* Should qemu hook into the `WRMSR to 0xc0000080 (tried to write 0x0000000000001d01)` coming from KVM when qemu is accelerated by Hyper-V on L0 and KVM is L1?

> Sorry, I don't follow this question.

I figured out that the commands I had been trying to describe are `sudo rdmsr 0xC0000080 #EFER` and `sudo rdmsr 0xC0010114 #VM_CR`. The package is called msr-tools :)

Closed as invalid since it is a qemu bug.
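For completeness, a small sketch that reproduces the bit-12 arithmetic from the analysis above. The two EFER values are the ones quoted in the thread, and the live readings use the same msr-tools commands (root and the msr module required):

```bash
# Bit 12 of EFER is SVME.
svme() { echo $(( ($1 >> 12) & 1 )); }

echo "EFER=0xd01  -> SVME=$(svme 0xd01)"     # 0: value read back by rdmsr, SVM not enabled
echo "EFER=0x1d01 -> SVME=$(svme 0x1d01)"    # 1: value the crashing WRMSR tried to write

# Live readings, as used in the thread:
# sudo rdmsr 0xC0000080   # EFER
# sudo rdmsr 0xC0010114   # VM_CR
```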