Created attachment 305601
Boot up 8 Windows VM script

System Environment
=======
Platform: Sapphire Rapids Platform
Host OS: CentOS Stream 9
Kernel: 6.7.0-rc1 (commit: 8ed26ab8d59111c2f7b86d200d1eb97d2a458fd1)
Qemu: QEMU emulator version 8.1.94 (v8.2.0-rc4) (commit: 039afc5ef7367fbc8fb475580c291c2655e856cb)
Host Kernel cmdline: BOOT_IMAGE=/kvm-vmlinuz root=/dev/mapper/cs_spr--2s2-root ro crashkernel=auto console=tty0 console=ttyS0,115200,8n1 3 intel_iommu=on disable_mtrr_cleanup

Bug detailed description
=======
We boot 8 Windows VMs on the host (total vCPUs > pCPUs) and run random applications in each VM, such as WPS editing. After a short while, some of the Windows guests hang and the console reports "KVM internal error. Suberror: 3".

Tip: We add "-cpu host,host-cache-info=on,migratable=on,hv-time=on,hv-relaxed=on,hv-vapic=on,hv-spinlocks=0x1fff" to the QEMU parameters when booting the VMs. With these parameters, some of the VMs hang easily.

Reproduce Steps
==============
1. Boot up 8 Windows VMs on the host:

for ((i=1;i<=8;i++)); do
    qemu-img create -b /home/guoqiang/win2k16_vdi_local.qcow2 -F qcow2 -f qcow2 /home/guoqiang/win2016$i.qcow2
    sleep 1
    qemu-system-x86_64 -accel kvm \
        -cpu host,host-cache-info=on,migratable=on,hv-time=on,hv-relaxed=on,hv-vapic=on,hv-spinlocks=0x1fff \
        -smp 30 \
        -drive file=/home/guoqiang/win2016$i.qcow2,if=none,id=virtio-disk0 \
        -device virtio-blk-pci,drive=virtio-disk0,bootindex=0 \
        -m 4096 -daemonize -vnc :$i \
        -device virtio-net-pci,netdev=nic0 \
        -netdev tap,id=nic0,br=virbr0,helper=/usr/local/libexec/qemu-bridge-helper,vhost=on
    sleep 5
done

2. Wait a moment and the VMs hang.

Host error log:

KVM internal error. Suberror: 3
extra data[0]: 0x000000008000002f
extra data[1]: 0x0000000000000020
extra data[2]: 0x0000000000000d83
extra data[3]: 0x0000000000000038
RAX=0000000000000000 RBX=0000000000000000 RCX=0000000040000070 RDX=0000000000000000
RSI=0000000000000000 RDI=ffffc58dcf552010 RBP=fffff801ed48e100 RSP=fffff801ed48e060
R8 =00000000ffffffff R9 =0000000000000000 R10=00000000ffffffff R11=0000000000000000
R12=000000133fd128fc R13=0000000000000046 R14=0000000000000000 R15=0000000000000000
RIP=fffff801eb94fd7c RFL=00000046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =002b 0000000000000000 ffffffff 00c0f300 DPL=3 DS [-WA]
CS =0010 0000000000000000 00000000 00209b00 DPL=0 CS64 [-RA]
SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
DS =002b 0000000000000000 ffffffff 00c0f300 DPL=3 DS [-WA]
FS =0053 000000000059b000 00003c00 0040f300 DPL=3 DS [-WA]
GS =002b fffff801ebb3f000 ffffffff 00c0f300 DPL=3 DS [-WA]
LDT=0000 0000000000000000 ffffffff 00c00000
TR =0040 fffff801ed486070 00000067 00008b00 DPL=0 TSS64-busy
GDT= fffff801ed485000 0000006f
IDT= fffff801ed485070 00000fff
CR0=80050031 CR2=0000000000000030 CR3=00000000001aa000 CR4=001506f8
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000d01
Code=25 88 61 00 00 b9 70 00 00 40 0f ba 32 00 72 06 33 c0 8b d0 <0f> 30 5a 58 59 c3 cc cc cc cc cc cc 0f 1f 84 00 00 00 00 00 48 81 ec 38 01 00 00 48 8d 84

KVM internal error. Suberror: 3
extra data[0]: 0x000000008000002f
extra data[1]: 0x0000000000000020
extra data[2]: 0x0000000000000d81
extra data[3]: 0x00000000000000a2
RAX=0000000000000000 RBX=0000000000000000 RCX=0000000040000070 RDX=0000000000000000
RSI=0000000000000000 RDI=ffffdf86659d07b0 RBP=ffff96806225b100 RSP=ffff96806225b060
R8 =00000000ffffffff R9 =0000000000000000 R10=00000000ffffffff R11=0000000000000000
R12=00000013e153ce49 R13=0000000000000046 R14=0000000000000000 R15=0000000000000000
RIP=fffff8001f1ddd7c RFL=00000046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =002b 0000000000000000 ffffffff 00c0f300 DPL=3 DS [-WA]
CS =0010 0000000000000000 00000000 00209b00 DPL=0 CS64 [-RA]
SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
DS =002b 0000000000000000 ffffffff 00c0f300 DPL=3 DS [-WA]
FS =0053 0000000000604000 00007c00 0040f300 DPL=3 DS [-WA]
GS =002b ffff968062230000 ffffffff 00c0f300 DPL=3 DS [-WA]
LDT=0000 0000000000000000 ffffffff 00c00000
TR =0040 ffff968062236ac0 00000067 00008b00 DPL=0 TSS64-busy
GDT= ffff96806223db80 0000006f
IDT= ffff96806223dbf0 00000fff
CR0=80050031 CR2=0000000000000030 CR3=00000000001aa000 CR4=001506f8
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000fffe07f0 DR7=0000000000000400
EFER=0000000000000d01
Code=25 88 61 00 00 b9 70 00 00 40 0f ba 32 00 72 06 33 c0 8b d0 <0f> 30 5a 58 59 c3 cc cc cc cc cc cc 0f 1f 84 00 00 00 00 00 48 81 ec 38 01 00 00 48 8d 84

KVM internal error. Suberror: 3
extra data[0]: 0x000000008000002f
extra data[1]: 0x0000000000000020
extra data[2]: 0x0000000000000f82
extra data[3]: 0x000000000000004b
KVM internal error. Suberror: 3
extra data[0]: 0x000000008000002f
extra data[1]: 0x0000000000000020
extra data[2]: 0x0000000000000f82
extra data[3]: 0x000000000000004b
RAX=0000000000000000 RBX=0000000000000000 RCX=0000000040000070 RDX=0000000000000000
RSI=0000000000000000 RDI=ffffe7885a932010 RBP=fffff802a5a8e100 RSP=fffff802a5a8e060
R8 =00000000ffffffff R9 =0000000000000000 R10=00000000ffffffff R11=0000000000000000
R12=000000144b0a7258 R13=0000000000000046 R14=0000000000000000 R15=0000000000000000
RIP=fffff802a3f60d7c RFL=00000046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =002b 0000000000000000 ffffffff 00c0f300 DPL=3 DS [-WA]
CS =0010 0000000000000000 00000000 00209b00 DPL=0 CS64 [-RA]
SS =0018 0000000000000000 00000000 00409300 DPL=0 DS [-WA]
DS =002b 0000000000000000 ffffffff 00c0f300 DPL=3 DS [-WA]
FS =0053 0000000013b70000 00003c00 0040f300 DPL=3 DS [-WA]
GS =002b fffff802a4150000 ffffffff 00c0f300 DPL=3 DS [-WA]
LDT=0000 0000000000000000 ffffffff 00c00000
TR =0040 fffff802a5a86070 00000067 00008b00 DPL=0 TSS64-busy
GDT= fffff802a5a85000 0000006f
IDT= fffff802a5a85070 00000fff
CR0=80050031 CR2=0000000000000030 CR3=00000000001aa000 CR4=001506f8
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000d01
Code=25 88 61 00 00 b9 70 00 00 40 0f ba 32 00 72 06 33 c0 8b d0 <0f> 30 5a 58 59 c3 cc cc cc cc cc cc 0f 1f 84 00 00 00 00 00 48 81 ec 38 01 00 00 48 8d 84
On Fri, Dec 15, 2023, bugzilla-daemon@kernel.org wrote:
> Platform: Sapphire Rapids Platform
>
> Host OS: CentOS Stream 9
>
> Kernel:6.7.0-rc1 (commit:8ed26ab8d59111c2f7b86d200d1eb97d2a458fd1)

...

> Qemu: QEMU emulator version 8.1.94 (v8.2.0-rc4)
> (commit:039afc5ef7367fbc8fb475580c291c2655e856cb)
>
> Host Kernel cmdline:BOOT_IMAGE=/kvm-vmlinuz root=/dev/mapper/cs_spr--2s2-root
> ro crashkernel=auto console=tty0 console=ttyS0,115200,8n1 3 intel_iommu=on
> disable_mtrr_cleanup
>
> Bug detailed description
> =======
> We boot up 8 Windows VMs (total vCPUs > pCPUs) in host, random run application
> on each VM such as WPS editing etc, and wait for a moment, then Some of the
> Windows Guest hang and console reports "KVM internal error. Suberror: 3".

...

> Code=25 88 61 00 00 b9 70 00 00 40 0f ba 32 00 72 06 33 c0 8b d0 <0f> 30 5a 58
> 59 c3 cc cc cc cc cc cc 0f 1f 84 00 00 00 00 00 48 81 ec 38 01 00 00 48 8d 84
>
> KVM internal error. Suberror: 3
> extra data[0]: 0x000000008000002f <= Vectoring IRQ 47 (decimal)
> extra data[1]: 0x0000000000000020 <= WRMSR VM-Exit
> extra data[2]: 0x0000000000000f82
> extra data[3]: 0x000000000000004b

KVM exits with an internal error because the CPU indicates that IRQ 47 was being delivered/vectored when the VM-Exit occurred, but the VM-Exit is due to WRMSR. A WRMSR VM-Exit is supposed to only occur on an instruction boundary, i.e. can't occur while delivering an IRQ (or any exception/event), and so KVM kicks out to userspace because something has gone off the rails.

  b9 70 00 00 40          mov    ecx, 0x40000070
  0f ba 32 00             btr    DWORD PTR [rdx], 0x0
  72 06                   jb     0x16
  33 c0                   xor    eax, eax
  8b d0                   mov    edx, eax
  0f 30                   wrmsr

FWIW, the MSR in question is Hyper-V's synthetic EOI, a.k.a. HV_X64_MSR_EOI, though I doubt the exact MSR matters.

Have you tried an older host kernel? If not, can you try something like v6.1? Note, if you do, use base v6.1, *not* the stable tree in case a bug was backported.
There was a recent change to relevant code, commit 50011c2a2457 ("KVM: VMX: Refresh available regs and IDT vectoring info before NMI handling"), though I don't see any obvious bugs. But I'm pretty sure the only alternative explanation is a CPU/ucode bug, so it's definitely worth checking older versions of KVM.
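To make the reading of the first two "extra data" words concrete, here is a minimal user-space sketch in C that splits them into fields. The mask values follow the IDT-vectoring information layout in the Intel SDM (vector in bits 7:0, type in bits 10:8, valid in bit 31) and the WRMSR basic exit reason (32); the struct and function names are hypothetical helpers for illustration, not KVM code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IDT-vectoring information field layout, per the Intel SDM. */
#define VECTORING_INFO_VECTOR_MASK 0x000000ffu /* bits 7:0  */
#define VECTORING_INFO_TYPE_MASK   0x00000700u /* bits 10:8 */
#define VECTORING_INFO_VALID_MASK  0x80000000u /* bit 31    */

#define EXIT_REASON_MSR_WRITE 32u /* basic exit reason for WRMSR */

static_assert((VECTORING_INFO_VALID_MASK >> 31) == 1u, "valid bit is bit 31");

/* Hypothetical helper type: a decoded view of the raw 32-bit field. */
struct vectoring_info {
	bool valid;      /* an event was being delivered at VM-Exit */
	unsigned type;   /* 0 = external interrupt */
	unsigned vector; /* interrupt/exception vector */
};

/* Split the raw field (extra data[0] in the log) into its parts. */
struct vectoring_info decode_vectoring_info(uint32_t raw)
{
	struct vectoring_info v = {
		.valid  = (raw & VECTORING_INFO_VALID_MASK) != 0,
		.type   = (raw & VECTORING_INFO_TYPE_MASK) >> 8,
		.vector = raw & VECTORING_INFO_VECTOR_MASK,
	};
	return v;
}

/* The basic exit reason is the low 16 bits of extra data[1]. */
uint16_t basic_exit_reason(uint32_t raw)
{
	return (uint16_t)(raw & 0xffffu);
}
```

Feeding it the values from the log, 0x8000002f decodes to a valid external interrupt with vector 47, and 0x20 is the WRMSR exit, which is the combination described above as impossible on correctly behaving hardware.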
Has there been any progress on this issue? I see the same error on Windows 2008 R2, but the same virtual machine works fine on an Ice Lake CPU.
This is not considered a Linux/KVM issue. Guoqiang, could you close this ticket?

Yuxiating, I assume you are using APICv and also have "hv-vapic" in the QEMU command line. For now, you can remove "hv-vapic" to work around this issue.

Note that APICv outperforms Hyper-V's synthetic MSRs; regardless of this bug, removing "hv-vapic" is recommended if KVM enables APICv.
On Mon, Apr 08, 2024, bugzilla-daemon@kernel.org wrote:
> This is not considered a Linux/KVM issue.

Can you elaborate? E.g. if this is an SPR ucode/CPU bug, it would be nice to know what's going wrong, so that at the very least we can more easily triage issues.
(In reply to Chao Gao from comment #3)
> Note that, APICv outperforms Hyper-V's synthetic MSRs; regardless of this
> bug, it is recommended to remove "hv-vapic" if KVM enables APICv.

'hv-vapic' is a prerequisite for some other Hyper-V features, e.g. Enlightened VMCS, so disabling it may not be desired. Also, there's the 'hv-apicv' (AKA 'hv-avic') feature, which prevents AutoEOI from being advertised (AutoEOI can't work with APICv, and thus KVM inhibits APICv with APICV_INHIBIT_REASON_HYPERV). AFAIR, newer Windows versions don't use AutoEOI either way, but Win8/Win7 may.
On Mon, 2024-04-08 at 17:22 +0000, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=218267
>
> --- Comment #4 from Sean Christopherson (seanjc@google.com) ---
> On Mon, Apr 08, 2024, bugzilla-daemon@kernel.org wrote:
> > This is not considered a Linux/KVM issue.
>
> Can you elaborate? E.g. if this an SPR ucode/CPU bug, it would be nice to know
> what's going wrong, so that at the very least we can more easily triage issues.
>

Hi!

Any update on this? We seem to hit this bug as well but so far I don't have new details on what is going on.

Best regards,
	Maxim Levitsky
Hi Maxim,

I was told the erratum writeup and microcode fix would be released this month. I just checked the microcode release https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/releases. The microcode fix hasn't been released yet, but the erratum is already in the SPR/EMR specification. E.g., for SPR:

SPR141. VM Exit Following MOV to CR8 Instruction May Lead to Unexpected IDT Vectoring-Information

Problem: Under certain conditions, a VM exit following execution of the MOV to CR8 instruction may unexpectedly result in setting the Valid bit (bit 31) of the IDT-Vectoring Information Field in the Virtual Machine Control Structure (VMCS).

Implication: Depending on the operation of the virtual-machine monitor (VMM), this may result in unexpected VM behavior.

Workaround: It may be possible for the BIOS to contain a workaround for this erratum.
Thanks Chao!

Until the ucode update is available, I think we can work around the issue in KVM by clearing VECTORING_INFO_VALID_MASK _immediately_ after exit, i.e. before queueing the event for re-injection, if it should be impossible for the exit to have occurred while vectoring. I'm not sure I want to carry something like this long-term since a ucode fix is imminent, but at the least it can hopefully unblock end users.

The below uses a fairly conservative list of exits (a false positive could be quite painful). A slightly less conservative approach would be to also include:

	case EXIT_REASON_EXTERNAL_INTERRUPT:
	case EXIT_REASON_TRIPLE_FAULT:
	case EXIT_REASON_INIT_SIGNAL:
	case EXIT_REASON_SIPI_SIGNAL:
	case EXIT_REASON_INTERRUPT_WINDOW:
	case EXIT_REASON_NMI_WINDOW:

as those exits should all be recognized only at instruction boundaries.

Compile tested only...

---
 arch/x86/kvm/vmx/vmx.c | 66 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 893366e53732..7240bd72b5f2 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -147,6 +147,9 @@ module_param_named(preemption_timer, enable_preemption_timer, bool, S_IRUGO);
 extern bool __read_mostly allow_smaller_maxphyaddr;
 module_param(allow_smaller_maxphyaddr, bool, S_IRUGO);
 
+static bool __ro_after_init enable_spr141_erratum_workaround = true;
+module_param(enable_spr141_erratum_workaround, bool, S_IRUGO);
+
 #define KVM_VM_CR0_ALWAYS_OFF (X86_CR0_NW | X86_CR0_CD)
 #define KVM_VM_CR0_ALWAYS_ON_UNRESTRICTED_GUEST X86_CR0_NE
 #define KVM_VM_CR0_ALWAYS_ON				\
@@ -7163,8 +7166,67 @@ static void __vmx_complete_interrupts(struct kvm_vcpu *vcpu,
 	}
 }
 
+static bool is_vectoring_on_exit_impossible(struct vcpu_vmx *vmx)
+{
+	switch (vmx->exit_reason.basic) {
+	case EXIT_REASON_CPUID:
+	case EXIT_REASON_HLT:
+	case EXIT_REASON_INVD:
+	case EXIT_REASON_INVLPG:
+	case EXIT_REASON_RDPMC:
+	case EXIT_REASON_RDTSC:
+	case EXIT_REASON_VMCALL:
+	case EXIT_REASON_VMCLEAR:
+	case EXIT_REASON_VMLAUNCH:
+	case EXIT_REASON_VMPTRLD:
+	case EXIT_REASON_VMPTRST:
+	case EXIT_REASON_VMREAD:
+	case EXIT_REASON_VMRESUME:
+	case EXIT_REASON_VMWRITE:
+	case EXIT_REASON_VMOFF:
+	case EXIT_REASON_VMON:
+	case EXIT_REASON_CR_ACCESS:
+	case EXIT_REASON_DR_ACCESS:
+	case EXIT_REASON_IO_INSTRUCTION:
+	case EXIT_REASON_MSR_READ:
+	case EXIT_REASON_MSR_WRITE:
+	case EXIT_REASON_MSR_LOAD_FAIL:
+	case EXIT_REASON_MWAIT_INSTRUCTION:
+	case EXIT_REASON_MONITOR_TRAP_FLAG:
+	case EXIT_REASON_MONITOR_INSTRUCTION:
+	case EXIT_REASON_PAUSE_INSTRUCTION:
+	case EXIT_REASON_TPR_BELOW_THRESHOLD:
+	case EXIT_REASON_GDTR_IDTR:
+	case EXIT_REASON_LDTR_TR:
+	case EXIT_REASON_INVEPT:
+	case EXIT_REASON_RDTSCP:
+	case EXIT_REASON_PREEMPTION_TIMER:
+	case EXIT_REASON_INVVPID:
+	case EXIT_REASON_WBINVD:
+	case EXIT_REASON_XSETBV:
+	case EXIT_REASON_APIC_WRITE:
+	case EXIT_REASON_RDRAND:
+	case EXIT_REASON_INVPCID:
+	case EXIT_REASON_VMFUNC:
+	case EXIT_REASON_ENCLS:
+	case EXIT_REASON_RDSEED:
+	case EXIT_REASON_XSAVES:
+	case EXIT_REASON_XRSTORS:
+	case EXIT_REASON_UMWAIT:
+	case EXIT_REASON_TPAUSE:
+		return true;
+	}
+
+	return false;
+}
+
 static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 {
+	if ((vmx->idt_vectoring_info & VECTORING_INFO_VALID_MASK) &&
+	    enable_spr141_erratum_workaround &&
+	    is_vectoring_on_exit_impossible(vmx))
+		vmx->idt_vectoring_info &= ~VECTORING_INFO_VALID_MASK;
+
 	__vmx_complete_interrupts(&vmx->vcpu, vmx->idt_vectoring_info,
 				  VM_EXIT_INSTRUCTION_LEN,
 				  IDT_VECTORING_ERROR_CODE);
@@ -8487,6 +8549,10 @@ __init int vmx_hardware_setup(void)
 	if (!enable_apicv || !cpu_has_vmx_ipiv())
 		enable_ipiv = false;
 
+	if (boot_cpu_data.x86_vfm != INTEL_SAPPHIRERAPIDS_X &&
+	    boot_cpu_data.x86_vfm != INTEL_EMERALDRAPIDS_X)
+		enable_spr141_erratum_workaround = false;
+
 	if (cpu_has_vmx_tsc_scaling())
 		kvm_caps.has_tsc_control = true;
 

base-commit: 50e5669285fc2586c9f946c1d2601451d77cb49e
--
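For readers who want to poke at the logic outside the kernel, the following is a small user-space model of the idea in the patch, written as a sketch rather than a port. The `toy_vcpu` struct and the `toy_*` functions are hypothetical stand-ins for `struct vcpu_vmx` and the patched functions; the exit-reason numbers are the basic exit reasons from the Intel SDM, and only a handful of the patch's cases are reproduced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VECTORING_INFO_VALID_MASK 0x80000000u /* bit 31 */

/* A small subset of basic exit reasons (values per the Intel SDM). */
enum {
	EXIT_REASON_EXCEPTION_NMI = 0,
	EXIT_REASON_CPUID         = 10,
	EXIT_REASON_HLT           = 12,
	EXIT_REASON_MSR_READ      = 31,
	EXIT_REASON_MSR_WRITE     = 32,
	EXIT_REASON_EPT_VIOLATION = 48,
};

/* Hypothetical stand-in for the two vcpu_vmx fields the patch touches. */
struct toy_vcpu {
	uint16_t exit_reason;        /* basic exit reason */
	uint32_t idt_vectoring_info; /* raw IDT-vectoring information */
};

/* Subset of the patch's list: instruction-boundary exits only. */
bool toy_vectoring_on_exit_impossible(uint16_t basic_reason)
{
	switch (basic_reason) {
	case EXIT_REASON_CPUID:
	case EXIT_REASON_HLT:
	case EXIT_REASON_MSR_READ:
	case EXIT_REASON_MSR_WRITE:
		return true;
	}
	return false;
}

/*
 * Clear a "valid" bit that cannot be genuine for this exit.
 * Returns true if the bit was cleared, false if the field was left alone.
 */
bool toy_sanitize_vectoring_info(struct toy_vcpu *vcpu, bool workaround_enabled)
{
	assert(vcpu != NULL);

	if (!(vcpu->idt_vectoring_info & VECTORING_INFO_VALID_MASK))
		return false;
	if (!workaround_enabled ||
	    !toy_vectoring_on_exit_impossible(vcpu->exit_reason))
		return false;

	vcpu->idt_vectoring_info &= ~VECTORING_INFO_VALID_MASK;
	return true;
}
```

With the values from the report (exit reason 32, vectoring info 0x8000002f), the model clears bit 31 and leaves 0x0000002f, so nothing would be queued for re-injection. An EPT violation with the same vectoring info is left untouched, because that exit can legitimately occur during event delivery.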
On Mon, Dec 16, 2024 at 07:08:13PM +0000, bugzilla-daemon@kernel.org wrote:
>https://bugzilla.kernel.org/show_bug.cgi?id=218267
>
>--- Comment #8 from Sean Christopherson (seanjc@google.com) ---
>Thanks Chao!
>
>Until the ucode update is available, I think we can workaround the issue in KVM
>by clearing VECTORING_INFO_VALID_MASK _immediately_ after exit, i.e. before
>queueing the event for re-injection, if it should be impossible for the exit to
>have occurred while vectoring. I'm not sure I want to carry something like

Yes. I tried a similar workaround (i.e., clearing the "valid" bit only for EXIT_REASON_MSR_WRITE) and our tests showed that it works well.

Strictly speaking, this issue also impacts VM exits that can legitimately occur during event delivery, because they might be reported as occurring during event delivery even if they didn't. KVM won't notice this, and the guest will receive an extra event due to event re-injection. I wrote a kselftest to demonstrate this.

Clearing the valid bit works in practice, but there is no ideal software workaround for all cases. Disabling APICv or intercepting MOV-to-CR8 can eliminate the issue, but neither is ideal due to the performance impact.

>this long-term since a ucode fix is imminent, but at the least it can hopefully
>unblock end users.
>
>The below uses a fairly conservative list of exits (a false positive could be
>quite painful). A slightly less conservative approach would be to also include:
>
>case EXIT_REASON_EXTERNAL_INTERRUPT:

We need to include EXTERNAL_INTERRUPT because we observed it in real workloads on affected CPUs.

>case EXIT_REASON_TRIPLE_FAULT:
>case EXIT_REASON_INIT_SIGNAL:
>case EXIT_REASON_SIPI_SIGNAL:
>case EXIT_REASON_INTERRUPT_WINDOW:
>case EXIT_REASON_NMI_WINDOW:
>
>as those exits should all be recognized only at instruction boundaries.
>
>Compile tested only...
>
>---

...

>@@ -8487,6 +8549,10 @@ __init int vmx_hardware_setup(void)
>        if (!enable_apicv || !cpu_has_vmx_ipiv())
>                enable_ipiv = false;
>
>+       if (boot_cpu_data.x86_vfm != INTEL_SAPPHIRERAPIDS_X &&
>+           boot_cpu_data.x86_vfm != INTEL_EMERALDRAPIDS_X)
>+               enable_spr141_erratum_workaround = false;

RaptorLake has the same issue.

https://cdrdv2.intel.com/v1/dl/getContent/740518

>+
>        if (cpu_has_vmx_tsc_scaling())
>                kvm_caps.has_tsc_control = true;
>
>
>base-commit: 50e5669285fc2586c9f946c1d2601451d77cb49e
>--
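The distinction drawn in this reply can be sketched as a three-way classification. This is an illustrative user-space model only: the `toy_verdict` enum and `toy_classify_exit()` are hypothetical names, the exit-reason numbers are the Intel SDM basic exit reasons, and the "fixable" set contains just the two exits this reply mentions (MSR_WRITE and EXTERNAL_INTERRUPT).

```c
#include <assert.h>
#include <stdint.h>

#define VECTORING_INFO_VALID_MASK 0x80000000u /* bit 31 */

static_assert((VECTORING_INFO_VALID_MASK >> 31) == 1u, "valid bit is bit 31");

/* A few basic exit reasons (values per the Intel SDM). */
enum {
	EXIT_REASON_EXCEPTION_NMI      = 0,
	EXIT_REASON_EXTERNAL_INTERRUPT = 1,
	EXIT_REASON_TASK_SWITCH        = 9,
	EXIT_REASON_MSR_WRITE          = 32,
	EXIT_REASON_EPT_VIOLATION      = 48,
};

/* Hypothetical verdicts for what software can conclude about an exit. */
enum toy_verdict {
	TOY_NO_EVENT,    /* valid bit clear: nothing to re-inject */
	TOY_CLEAR_VALID, /* valid bit cannot be genuine: safe to clear */
	TOY_AMBIGUOUS,   /* genuine vectoring or erratum: cannot tell */
};

enum toy_verdict toy_classify_exit(uint16_t basic_reason, uint32_t info)
{
	if (!(info & VECTORING_INFO_VALID_MASK))
		return TOY_NO_EVENT;

	switch (basic_reason) {
	case EXIT_REASON_MSR_WRITE:
	case EXIT_REASON_EXTERNAL_INTERRUPT:
		/* Recognized only at instruction boundaries. */
		return TOY_CLEAR_VALID;
	default:
		/*
		 * E.g. EPT violations, exceptions, and task switches can
		 * really occur while an event is being delivered.
		 */
		return TOY_AMBIGUOUS;
	}
}
```

The `TOY_AMBIGUOUS` case is the one with no clean software answer: re-injecting may deliver an extra event if the erratum was hit, while dropping may lose a real one.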
Hi Gao,

(In reply to Chao Gao from comment #7)
> I just checked the microcode release
> https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/releases.
> The microcode fix hasn't been released yet, but the erratum is already in
> SPR/EMR specification. e.g., for SPR,
>
> SPR141. VM Exit Following MOV to CR8 Instruction May Lead to Unexpected
> IDT Vectoring-Information
> Problem: Under certain conditions, a VM exit following execution of the MOV
> to CR8 instruction may unexpectedly result in setting the Valid bit (bit 31)
> of the IDTVectoring Information Field in the Virtual Machine Control
> Structure (VMCS).
> Implication: Depending on the operation of the virtual-machine monitor
> (VMM), this may result in unexpected VM behavior.
> Workaround: It may be possible for the BIOS to contain a workaround for this
> erratum

Can this bug be resolved with a BIOS update alone, if the update includes a fix for it? If so, what is the ticket number for that BIOS update? Is it SPR141?

Thanks,
Hidehiko Matsumoto
On Mon, 2024-12-16 at 19:08 +0000, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=218267
>
> --- Comment #8 from Sean Christopherson (seanjc@google.com) ---
> Thanks Chao!
>
> Until the ucode update is available, I think we can workaround the issue in KVM
> by clearing VECTORING_INFO_VALID_MASK _immediately_ after exit, i.e. before
> queueing the event for re-injection, if it should be impossible for the exit to
> have occurred while vectoring. I'm not sure I want to carry something like
> this long-term since a ucode fix is imminent, but at the least it can hopefully
> unblock end users.
>
> The below uses a fairly conservative list of exits (a false positive could be
> quite painful). A slightly less conservative approach would be to also include:
>
> case EXIT_REASON_EXTERNAL_INTERRUPT:
> case EXIT_REASON_TRIPLE_FAULT:
> case EXIT_REASON_INIT_SIGNAL:
> case EXIT_REASON_SIPI_SIGNAL:
> case EXIT_REASON_INTERRUPT_WINDOW:
> case EXIT_REASON_NMI_WINDOW:
>
> as those exits should all be recognized only at instruction boundaries.
>
> Compile tested only...

...

Do we plan to move forward with this workaround, or do you think it adds too much complexity to KVM?

Best regards,
	Maxim Levitsky
I'm not terribly concerned about the complexity, I'm more concerned about the efficacy of a software workaround, and to a lesser extent the risk of doing more harm than good (this seems unlikely though). E.g. if an exit that _can_ occur during vectoring collides with the bug, then KVM will inject a spurious fault into the guest. And if our list of "impossible" exits is wrong, KVM could incorrectly suppress an exception. I suppose we could mitigate the efficacy concerns by emitting a pr_err_once() to suggest a ucode update if the erratum is hit.
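As a rough illustration of the "warn once" idea, here is a user-space analogue of what `pr_err_once()` does in the kernel: print a message the first time a condition is hit and stay silent afterwards. The function name and message text are hypothetical, and unlike the kernel macro this sketch makes no attempt at thread safety.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Emit a one-time hint that the erratum workaround fired.
 * Returns true if the message was printed by this call, false if it
 * had already been printed earlier.
 */
bool toy_warn_once_spr141(FILE *out)
{
	static bool warned; /* zero-initialized: false until first call */

	assert(out != NULL);

	if (warned)
		return false;

	warned = true;
	fprintf(out,
		"toy-kvm: bogus IDT-vectoring info detected, "
		"consider updating CPU microcode\n");
	return true;
}
```

A caller would invoke this from the path that clears the bogus valid bit, so that an affected host gets exactly one log line pointing at the microcode update rather than a message per VM-Exit.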