Created attachment 301050 [details] Guest hypervisor to reproduce this bug (xz compressed) CPU model I am running: Intel(R) Core(TM) i7-4510U CPU @ 2.00GHz Host kernel version: 5.17.8-200.fc35.x86_64 Host kernel arch: x86_64 Guest: a hypervisor I wrote myself, 32-bits, compressed and attached as c.img.xz. QEMU command line: qemu-system-x86_64 -m 512M -smp 1 -cpu Haswell,vmx=yes -enable-kvm -serial stdio -drive media=disk,file=c.img,index=1 The problem does not go away if using -machine kernel_irqchip=off Since the guest is a hypervisor, -accel tcg cannot be used (TCG does not support nested virtualization) Actual behavior: serial port shows: ... CR0 = 0x0000000080000015 CR0 fixed0 = 0x0000000080000021 CR0 fixed1 = 0x00000000ffffffff VMXON succeeds Expected behaivor: serial port shows: ... CR0 = 0x0000000080000015 CR0 fixed0 = 0x0000000080000021 CR0 fixed1 = 0x00000000ffffffff [00]: unhandled exception 13 (0xd), halting! [00]: error code: 0x00000000 [00]: state dump follows... [00] CS:EIP ... ... Explanation: When the guest hypervisor starts VMX using the VMXON instruction, the guest hypervisor's CR0 is not legal. IA32_VMX_CR0_FIXED0 = 0x0000000080000021. The 0x20 bit in this MSR is 1, which indicates that the 0x20 bit in CR0 must be 1 when executing VMXON. However, my hypervisor uses CR0 = 0x80000015 (the 0x20 bit is 0). According to SDM 29.3, if "the values of CR0 and CR4 are not supported in VMX operation", then a general protection exception (#GP(0)) should be raised. This happens on real hardware, but not on KVM. The relevant code in my hypervisor is: https://github.com/lxylxy123456/uberxmhf/blob/770bdaa7afce560b9f46348bee5a05e2c680de06/xmhf/src/xmhf-core/xmhf-runtime/xmhf-startup/lhv-vmx.c#L250 The pseudo code is print the hypervisor's CR0 to serial port print MSR value IA32_VMX_CR0_FIXED0 to serial port print MSR value IA32_VMX_CR0_FIXED1 to serial port sleep for 3 seconds run VMXON instruction If succeed, write "VMXON succeeds" to serial port. If VMXON receives an exception, write exception details to serial port ("[00]: unhandled exception...") To fix this bug, handle_vmon() in arch/x86/kvm/vmx/nested.c needs to be updated. The check to CR0 and CR4 against IA32_VMX_CR0_FIXED0 etc need to be added.
Ugh, KVM is comically wrong. It _deliberately_ avoids checking CR0/CR4 with a comment saying that "most faulting conditions have already been checked by hardware", but the SDM pseudocode makes it very clear that only the (CR0.PE = 0) or (CR4.VMXE = 0) or (RFLAGS.VM = 1) or (IA32_EFER.LMA = 1 and CS.L = 0) checks are performed before the VM-Exit occurs. /* * The Intel VMX Instruction Reference lists a bunch of bits that are * prerequisite to running VMXON, most notably cr4.VMXE must be set to * 1 (see vmx_is_valid_cr4() for when we allow the guest to set this). * Otherwise, we should fail with #UD. But most faulting conditions * have already been checked by hardware, prior to the VM-exit for * VMXON. We do test guest cr4.VMXE because processor CR4 always has * that bit set to 1 in non-root mode. */ if (!kvm_read_cr4_bits(vcpu, X86_CR4_VMXE)) { kvm_queue_exception(vcpu, UD_VECTOR); return 1; }
@Sean Christopherson Thanks for submitting the fix to this bug in https://lore.kernel.org/lkml/20220607213604.3346000-4-seanjc@google.com/ . However, I recently tested this fix and the behavior is not as expected. According to Intel's SDM, VMXON may generate 2 types of exceptions: IF (register operand) or (CR0.PE = 0) or (CR4.VMXE = 0) or ... THEN #UD; ELSIF not in VMX operation THEN IF (CPL > 0) or (in A20M mode) or (the values of CR0 and CR4 are not supported in VMX operation ... THEN #GP(0); For example, when CR4 value is incorrect, different exceptions may be generated depending on which bit is incorrect. If CR4.VMXE = 0, #UD should be generated. Otherwise, #GP(0) should be generated. However, after the fix, #UD is always generated.
On Fri, Sep 02, 2022, bugzilla-daemon@kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=216033 > > --- Comment #2 from Eric Li (ercli@ucdavis.edu) --- > @Sean Christopherson Thanks for submitting the fix to this bug in > https://lore.kernel.org/lkml/20220607213604.3346000-4-seanjc@google.com/ . > However, I recently tested this fix and the behavior is not as expected. > > According to Intel's SDM, VMXON may generate 2 types of exceptions: > > IF (register operand) or (CR0.PE = 0) or (CR4.VMXE = 0) or ... > THEN #UD; > ELSIF not in VMX operation > THEN > IF (CPL > 0) or (in A20M mode) or > (the values of CR0 and CR4 are not supported in VMX operation ... > THEN #GP(0); > > For example, when CR4 value is incorrect, different exceptions may be > generated > depending on which bit is incorrect. If CR4.VMXE = 0, #UD should be > generated. > Otherwise, #GP(0) should be generated. However, after the fix, #UD is always > generated. /facepalm All that and I overlooked that the other CR0/CR4 checks take a #GP. On the bright side, it does mean I can blame Jim at least a little bit for commit 70f3aac964ae ("kvm: nVMX: Remove superfluous VMX instruction fault checks"). Untested, but this should do the trick. --- arch/x86/kvm/vmx/nested.c | 23 +++++++++++++++-------- 1 file changed, 15 insertions(+), 8 deletions(-) diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index ddd4367d4826..86ee2ab8a497 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -4936,25 +4936,32 @@ static int handle_vmxon(struct kvm_vcpu *vcpu) | FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX; /* - * Note, KVM cannot rely on hardware to perform the CR0/CR4 #UD checks - * that have higher priority than VM-Exit (see Intel SDM's pseudocode - * for VMXON), as KVM must load valid CR0/CR4 values into hardware while - * running the guest, i.e. KVM needs to check the _guest_ values. + * Note, KVM cannot rely on hardware to perform the CR0.PE and CR4.VMXE + * #UD checks that have higher priority than VM-Exit (see Intel SDM's + * pseudocode for VMXON), as KVM must load valid CR0/CR4 values into + * hardware while running the guest, i.e. KVM needs to check the _guest_ + * values. * * Rely on hardware for the other two pre-VM-Exit checks, !VM86 and * !COMPATIBILITY modes. KVM may run the guest in VM86 to emulate Real * Mode, but KVM will never take the guest out of those modes. */ + if (!kvm_read_cr4_bits(vcpu, X86_CR4_VMXE) || + !kvm_read_cr0_bits(vcpu, X86_CR0_PE)) { + kvm_queue_exception(vcpu, UD_VECTOR); + return 1; + } + + /* + * All other checks that are lower priority than VM-Exit must be + * checked manually, including the other CR0/CR4 reserved bit checks. + */ if (!nested_host_cr0_valid(vcpu, kvm_read_cr0(vcpu)) || !nested_host_cr4_valid(vcpu, kvm_read_cr4(vcpu))) { kvm_queue_exception(vcpu, UD_VECTOR); return 1; } - /* - * CPL=0 and all other checks that are lower priority than VM-Exit must - * be checked manually. - */ if (vmx_get_cpl(vcpu)) { kvm_inject_gp(vcpu, 0); return 1; base-commit: 476d5fb78ea6438941559af4814a2795849cb8f0
> if (!nested_host_cr0_valid(vcpu, kvm_read_cr0(vcpu)) || > !nested_host_cr4_valid(vcpu, kvm_read_cr4(vcpu))) { > kvm_queue_exception(vcpu, UD_VECTOR); > return 1; > } Thanks for the reply. I think there is still a typo. Do you mean the following? if (!nested_host_cr0_valid(vcpu, kvm_read_cr0(vcpu)) || !nested_host_cr4_valid(vcpu, kvm_read_cr4(vcpu))) { kvm_queue_exception_e(vcpu, GP_VECTOR, 0); return 1; } Or maybe: if (!nested_host_cr0_valid(vcpu, kvm_read_cr0(vcpu)) || !nested_host_cr4_valid(vcpu, kvm_read_cr4(vcpu))) { kvm_inject_gp(vcpu, 0); return 1; } I am not familiar with KVM code so not sure which one should be used. Thanks again!
On Fri, Sep 02, 2022, bugzilla-daemon@kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=216033 > > --- Comment #4 from Eric Li (ercli@ucdavis.edu) --- > > if (!nested_host_cr0_valid(vcpu, kvm_read_cr0(vcpu)) || > > !nested_host_cr4_valid(vcpu, kvm_read_cr4(vcpu))) { > > kvm_queue_exception(vcpu, UD_VECTOR); > > return 1; > > } > > Thanks for the reply. I think there is still a typo. Do you mean the > following? Yes, yes I did. > if (!nested_host_cr0_valid(vcpu, kvm_read_cr0(vcpu)) || > !nested_host_cr4_valid(vcpu, kvm_read_cr4(vcpu))) { > kvm_queue_exception_e(vcpu, GP_VECTOR, 0); > return 1; > } > > Or maybe: > > if (!nested_host_cr0_valid(vcpu, kvm_read_cr0(vcpu)) || > !nested_host_cr4_valid(vcpu, kvm_read_cr4(vcpu))) { > kvm_inject_gp(vcpu, 0); > return 1; > } > > I am not familiar with KVM code Heh, for all the good that being familiar with KVM is doing me. > so not sure which one should be used. Thanks again!