Bug 218267

Summary: [Sapphire Rapids][Upstream]Boot up multiple Windows VMs hang
Product: Virtualization
Reporter: guoqiang (qiangx.guo)
Component: kvm
Assignee: virtualization_kvm
Status: NEW
Severity: high
CC: chao.gao, seanjc, yuxiating
Priority: P3
Hardware: Intel
OS: Linux
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:
Attachments: Boot up 8 Windows VM script

Description guoqiang 2023-12-15 08:23:46 UTC
Created attachment 305601
Boot up 8 Windows VM script

System Environment
=======

Platform: Sapphire Rapids Platform

Host OS: CentOS Stream 9

Kernel: 6.7.0-rc1 (commit: 8ed26ab8d59111c2f7b86d200d1eb97d2a458fd1)
Qemu: QEMU emulator version 8.1.94 (v8.2.0-rc4) (commit: 039afc5ef7367fbc8fb475580c291c2655e856cb)

Host Kernel cmdline: BOOT_IMAGE=/kvm-vmlinuz root=/dev/mapper/cs_spr--2s2-root ro crashkernel=auto console=tty0 console=ttyS0,115200,8n1 3 intel_iommu=on disable_mtrr_cleanup

Bug detailed description
=======
We boot 8 Windows VMs on the host (total vCPUs > pCPUs), run random applications in each VM (e.g. WPS editing), and wait a while. Some of the Windows guests then hang, and the console reports "KVM internal error. Suberror: 3".

Tip: We pass "-cpu host,host-cache-info=on,migratable=on,hv-time=on,hv-relaxed=on,hv-vapic=on,hv-spinlocks=0x1fff" in the QEMU parameters when booting the VMs. With these parameters, some of the VMs hang easily.
 

Reproduce Steps
==============
1. Boot up 8 Windows VMs on the host:

for ((i=1;i<=8;i++)); do
    qemu-img create -b /home/guoqiang/win2k16_vdi_local.qcow2 -F qcow2 -f qcow2 /home/guoqiang/win2016$i.qcow2
    sleep 1
    qemu-system-x86_64 -accel kvm \
        -cpu host,host-cache-info=on,migratable=on,hv-time=on,hv-relaxed=on,hv-vapic=on,hv-spinlocks=0x1fff \
        -smp 30 \
        -drive file=/home/guoqiang/win2016$i.qcow2,if=none,id=virtio-disk0 \
        -device virtio-blk-pci,drive=virtio-disk0,bootindex=0 \
        -m 4096 -daemonize -vnc :$i \
        -device virtio-net-pci,netdev=nic0 \
        -netdev tap,id=nic0,br=virbr0,helper=/usr/local/libexec/qemu-bridge-helper,vhost=on
    sleep 5
done

2. Wait a moment; some of the VMs hang.

Host error log:
KVM internal error. Suberror: 3

extra data[0]: 0x000000008000002f

extra data[1]: 0x0000000000000020

extra data[2]: 0x0000000000000d83

extra data[3]: 0x0000000000000038

RAX=0000000000000000 RBX=0000000000000000 RCX=0000000040000070 RDX=0000000000000000

RSI=0000000000000000 RDI=ffffc58dcf552010 RBP=fffff801ed48e100 RSP=fffff801ed48e060

R8 =00000000ffffffff R9 =0000000000000000 R10=00000000ffffffff R11=0000000000000000

R12=000000133fd128fc R13=0000000000000046 R14=0000000000000000 R15=0000000000000000

RIP=fffff801eb94fd7c RFL=00000046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0

ES =002b 0000000000000000 ffffffff 00c0f300 DPL=3 DS [-WA]

CS =0010 0000000000000000 00000000 00209b00 DPL=0 CS64 [-RA]

SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]

DS =002b 0000000000000000 ffffffff 00c0f300 DPL=3 DS [-WA]

FS =0053 000000000059b000 00003c00 0040f300 DPL=3 DS [-WA]

GS =002b fffff801ebb3f000 ffffffff 00c0f300 DPL=3 DS [-WA]

LDT=0000 0000000000000000 ffffffff 00c00000

TR =0040 fffff801ed486070 00000067 00008b00 DPL=0 TSS64-busy

GDT= fffff801ed485000 0000006f

IDT= fffff801ed485070 00000fff

CR0=80050031 CR2=0000000000000030 CR3=00000000001aa000 CR4=001506f8

DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000

DR6=00000000ffff0ff0 DR7=0000000000000400

EFER=0000000000000d01

Code=25 88 61 00 00 b9 70 00 00 40 0f ba 32 00 72 06 33 c0 8b d0 <0f> 30 5a 58 59 c3 cc cc cc cc cc cc 0f 1f 84 00 00 00 00 00 48 81 ec 38 01 00 00 48 8d 84

KVM internal error. Suberror: 3

extra data[0]: 0x000000008000002f

extra data[1]: 0x0000000000000020

extra data[2]: 0x0000000000000d81

extra data[3]: 0x00000000000000a2

RAX=0000000000000000 RBX=0000000000000000 RCX=0000000040000070 RDX=0000000000000000

RSI=0000000000000000 RDI=ffffdf86659d07b0 RBP=ffff96806225b100 RSP=ffff96806225b060

R8 =00000000ffffffff R9 =0000000000000000 R10=00000000ffffffff R11=0000000000000000

R12=00000013e153ce49 R13=0000000000000046 R14=0000000000000000 R15=0000000000000000

RIP=fffff8001f1ddd7c RFL=00000046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0

ES =002b 0000000000000000 ffffffff 00c0f300 DPL=3 DS [-WA]

CS =0010 0000000000000000 00000000 00209b00 DPL=0 CS64 [-RA]

SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]

DS =002b 0000000000000000 ffffffff 00c0f300 DPL=3 DS [-WA]

FS =0053 0000000000604000 00007c00 0040f300 DPL=3 DS [-WA]

GS =002b ffff968062230000 ffffffff 00c0f300 DPL=3 DS [-WA]

LDT=0000 0000000000000000 ffffffff 00c00000

TR =0040 ffff968062236ac0 00000067 00008b00 DPL=0 TSS64-busy

GDT= ffff96806223db80 0000006f

IDT= ffff96806223dbf0 00000fff

CR0=80050031 CR2=0000000000000030 CR3=00000000001aa000 CR4=001506f8

DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000

DR6=00000000fffe07f0 DR7=0000000000000400

EFER=0000000000000d01

Code=25 88 61 00 00 b9 70 00 00 40 0f ba 32 00 72 06 33 c0 8b d0 <0f> 30 5a 58 59 c3 cc cc cc cc cc cc 0f 1f 84 00 00 00 00 00 48 81 ec 38 01 00 00 48 8d 84

KVM internal error. Suberror: 3

extra data[0]: 0x000000008000002f

extra data[1]: 0x0000000000000020

extra data[2]: 0x0000000000000f82

extra data[3]: 0x000000000000004b

KVM internal error. Suberror: 3

extra data[0]: 0x000000008000002f

extra data[1]: 0x0000000000000020

extra data[2]: 0x0000000000000f82

extra data[3]: 0x000000000000004b

RAX=0000000000000000 RBX=0000000000000000 RCX=0000000040000070 RDX=0000000000000000

RSI=0000000000000000 RDI=ffffe7885a932010 RBP=fffff802a5a8e100 RSP=fffff802a5a8e060

R8 =00000000ffffffff R9 =0000000000000000 R10=00000000ffffffff R11=0000000000000000

R12=000000144b0a7258 R13=0000000000000046 R14=0000000000000000 R15=0000000000000000

RIP=fffff802a3f60d7c RFL=00000046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0

ES =002b 0000000000000000 ffffffff 00c0f300 DPL=3 DS [-WA]

CS =0010 0000000000000000 00000000 00209b00 DPL=0 CS64 [-RA]

SS =0018 0000000000000000 00000000 00409300 DPL=0 DS [-WA]

DS =002b 0000000000000000 ffffffff 00c0f300 DPL=3 DS [-WA]

FS =0053 0000000013b70000 00003c00 0040f300 DPL=3 DS [-WA]

GS =002b fffff802a4150000 ffffffff 00c0f300 DPL=3 DS [-WA]

LDT=0000 0000000000000000 ffffffff 00c00000

TR =0040 fffff802a5a86070 00000067 00008b00 DPL=0 TSS64-busy

GDT= fffff802a5a85000 0000006f

IDT= fffff802a5a85070 00000fff

CR0=80050031 CR2=0000000000000030 CR3=00000000001aa000 CR4=001506f8

DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000

DR6=00000000ffff0ff0 DR7=0000000000000400

EFER=0000000000000d01

Code=25 88 61 00 00 b9 70 00 00 40 0f ba 32 00 72 06 33 c0 8b d0 <0f> 30 5a 58 59 c3 cc cc cc cc cc cc 0f 1f 84 00 00 00 00 00 48 81 ec 38 01 00 00 48 8d 84
Comment 1 Sean Christopherson 2023-12-18 17:54:16 UTC
On Fri, Dec 15, 2023, bugzilla-daemon@kernel.org wrote:
> Platform: Sapphire Rapids Platform
> 
> Host OS: CentOS Stream 9
> 
> Kernel:6.7.0-rc1 (commit:8ed26ab8d59111c2f7b86d200d1eb97d2a458fd1)

...

> Qemu: QEMU emulator version 8.1.94 (v8.2.0-rc4)
> (commit:039afc5ef7367fbc8fb475580c291c2655e856cb)
> 
> Host Kernel cmdline:BOOT_IMAGE=/kvm-vmlinuz root=/dev/mapper/cs_spr--2s2-root
> ro crashkernel=auto console=tty0 console=ttyS0,115200,8n1 3 intel_iommu=on
> disable_mtrr_cleanup
> 
> Bug detailed description
> =======
> We boot up 8 Windows VMs (total vCPUs > pCPUs) in host, random run
> application
> on each VM such as WPS editing etc, and wait for a moment, then Some of the
> Windows Guest hang and console reports "KVM internal error. Suberror: 3".

...

> Code=25 88 61 00 00 b9 70 00 00 40 0f ba 32 00 72 06 33 c0 8b d0 <0f> 30 5a
> 58
> 59 c3 cc cc cc cc cc cc 0f 1f 84 00 00 00 00 00 48 81 ec 38 01 00 00 48 8d 84
> 
> KVM internal error. Suberror: 3
> extra data[0]: 0x000000008000002f  <= Vectoring IRQ 47 (decimal)
> extra data[1]: 0x0000000000000020  <= WRMSR VM-Exit
> extra data[2]: 0x0000000000000f82
> extra data[3]: 0x000000000000004b

KVM exits with an internal error because the CPU indicates that IRQ 47 was being
delivered/vectored when the VM-Exit occurred, but the VM-Exit is due to WRMSR.
A WRMSR VM-Exit is supposed to only occur on an instruction boundary, i.e. can't
occur while delivering an IRQ (or any exception/event), and so KVM kicks out to
userspace because something has gone off the rails.
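
For anyone triaging similar dumps, here is a minimal sketch of how the first two "extra data" words can be decoded, assuming (as annotated in the quoted log) that extra data[0] is the VMX IDT-vectoring information field and extra data[1] is the VM-exit reason. The bit layout follows the Intel SDM; the function names and the partial exit-reason table are illustrative only and are not taken from KVM or QEMU.

```python
# Interruption types used by the IDT-vectoring information field (bits 10:8).
INTR_TYPES = {
    0: "external interrupt",
    2: "NMI",
    3: "hardware exception",
    4: "software interrupt",
    5: "privileged software exception",
    6: "software exception",
}

# Partial table of basic VM-exit reasons; only the entries needed here.
EXIT_REASONS = {1: "external interrupt", 31: "RDMSR", 32: "WRMSR"}


def decode_idt_vectoring(info):
    """Split an IDT-vectoring information word into its main fields."""
    return {
        "valid": bool((info >> 31) & 1),                       # bit 31
        "type": INTR_TYPES.get((info >> 8) & 0x7, "reserved"),  # bits 10:8
        "vector": info & 0xFF,                                  # bits 7:0
    }


def decode_exit_reason(raw):
    """Name the basic exit reason held in bits 15:0 of the exit-reason word."""
    basic = raw & 0xFFFF
    return EXIT_REASONS.get(basic, f"basic reason {basic}")


print(decode_idt_vectoring(0x8000002F))
print(decode_exit_reason(0x20))
```

With the values from the log, extra data[0] decodes to a valid external interrupt with vector 47, and extra data[1] decodes to a WRMSR exit, which is the inconsistent combination described above.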

   b9 70 00 00 40          mov    ecx, 0x40000070
   0f ba 32 00             btr    DWORD PTR [rdx], 0x0
   72 06                   jb     0x16
   33 c0                   xor    eax, eax
   8b d0                   mov    edx, eax
   0f 30                   wrmsr

FWIW, the MSR in question is Hyper-V's synthetic EOI, a.k.a. HV_X64_MSR_EOI, though
I doubt the exact MSR matters.
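
As a rough aid for reading the register dumps, the sketch below pulls the bytes at the faulting RIP out of a QEMU "Code=" line (QEMU wraps the byte at RIP in angle brackets) and maps RCX to a Hyper-V synthetic MSR name. The helper name and the small MSR table are hypothetical and cover only the three APIC-related synthetic MSRs; this is not how KVM or QEMU identify the instruction.

```python
# A few Hyper-V synthetic MSR indices (APIC-related subset only).
HV_SYNTHETIC_MSRS = {
    0x40000070: "HV_X64_MSR_EOI",
    0x40000071: "HV_X64_MSR_ICR",
    0x40000072: "HV_X64_MSR_TPR",
}

WRMSR_OPCODE = bytes([0x0F, 0x30])


def faulting_bytes(code_line, count=2):
    """Return `count` bytes starting at the byte marked <..> in a Code= line."""
    tokens = code_line.split("=", 1)[1].split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return bytes(int(tok.strip("<>"), 16) for tok in tokens[start:start + count])


line = "Code=25 88 61 00 00 b9 70 00 00 40 0f ba 32 00 72 06 33 c0 8b d0 <0f> 30 5a 58 59 c3"
rcx = 0x40000070  # RCX value from the register dump

print(faulting_bytes(line) == WRMSR_OPCODE)
print(HV_SYNTHETIC_MSRS.get(rcx, "unknown MSR"))
```

All three dumps in this report show the same RCX value and the same bytes at RIP, so each guest was executing WRMSR to HV_X64_MSR_EOI when the exit was reported.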

Have you tried an older host kernel?  If not, can you try something like v6.1?
Note, if you do, use base v6.1, *not* the stable tree in case a bug was backported.

There was a recent change to relevant code, commit 50011c2a2457 ("KVM: VMX: Refresh
available regs and IDT vectoring info before NMI handling"), though I don't see
any obvious bugs.  But I'm pretty sure the only alternative explanation is a
CPU/ucode bug, so it's definitely worth checking older versions of KVM.
Comment 2 yuxiating 2024-03-27 11:59:26 UTC
Do you have any progress on this issue?

I have the same error on Windows 2008R2, but the same virtual machine works fine on an Ice Lake CPU.
Comment 3 Chao Gao 2024-04-08 05:21:38 UTC
This is not considered a Linux/KVM issue.

Guoqiang, could you close this ticket?

Yuxiating, I assume you are using APICv and also have "hv-vapic" in the QEMU command line. For now, you can remove "hv-vapic" to work around this issue. Note that APICv outperforms Hyper-V's synthetic MSRs; regardless of this bug, it is recommended to remove "hv-vapic" if KVM enables APICv.
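
A minimal sketch of that workaround applied to the -cpu string from the reproducer: it drops one property from a comma-separated QEMU -cpu argument and leaves the rest untouched. The helper name drop_cpu_flag is hypothetical, and the sketch only rewrites the string; it does not check whether APICv is actually enabled on the host.

```python
def drop_cpu_flag(cpu_arg, flag):
    """Remove `flag` (with or without '=value') from a QEMU -cpu argument."""
    model, *props = cpu_arg.split(",")
    # Compare only the property name, so "hv-vapic=on" and "hv-vapic" both match.
    kept = [p for p in props if p.split("=", 1)[0] != flag]
    return ",".join([model, *kept])


original = ("host,host-cache-info=on,migratable=on,hv-time=on,"
            "hv-relaxed=on,hv-vapic=on,hv-spinlocks=0x1fff")
print(drop_cpu_flag(original, "hv-vapic"))
```

On Intel hosts, whether KVM has APICv enabled can usually be read from /sys/module/kvm_intel/parameters/enable_apicv, though that is worth confirming on the kernel in use.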
Comment 4 Sean Christopherson 2024-04-08 17:22:24 UTC
On Mon, Apr 08, 2024, bugzilla-daemon@kernel.org wrote:
> This is not considered a Linux/KVM issue.

Can you elaborate?  E.g. if this is an SPR ucode/CPU bug, it would be nice to know
what's going wrong, so that at the very least we can more easily triage issues.