Bug 218267 - [Sapphire Rapids][Upstream]Boot up multiple Windows VMs hang
Summary: [Sapphire Rapids][Upstream]Boot up multiple Windows VMs hang
Status: NEW
Alias: None
Product: Virtualization
Classification: Unclassified
Component: kvm
Hardware: Intel Linux
Importance: P3 high
Assignee: virtualization_kvm
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-12-15 08:23 UTC by guoqiang
Modified: 2024-04-08 17:22 UTC
CC List: 3 users

See Also:
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Boot up 8 Windows VM script (880 bytes, text/plain)
2023-12-15 08:23 UTC, guoqiang

Description guoqiang 2023-12-15 08:23:46 UTC
Created attachment 305601
Boot up 8 Windows VM script

System Environment
=======

Platform: Sapphire Rapids Platform

Host OS: CentOS Stream 9

Kernel: 6.7.0-rc1 (commit: 8ed26ab8d59111c2f7b86d200d1eb97d2a458fd1)
Qemu: QEMU emulator version 8.1.94 (v8.2.0-rc4) (commit: 039afc5ef7367fbc8fb475580c291c2655e856cb)

Host Kernel cmdline: BOOT_IMAGE=/kvm-vmlinuz root=/dev/mapper/cs_spr--2s2-root ro crashkernel=auto console=tty0 console=ttyS0,115200,8n1 3 intel_iommu=on disable_mtrr_cleanup

Bug detailed description
=======
We boot 8 Windows VMs on the host (total vCPUs > pCPUs), run random applications in each VM (WPS document editing, etc.), and wait for a while; some of the Windows guests then hang and the console reports "KVM internal error. Suberror: 3".

Tips: We add "-cpu host,host-cache-info=on,migratable=on,hv-time=on,hv-relaxed=on,hv-vapic=on,hv-spinlocks=0x1fff" to the QEMU parameters when booting the VMs; with these flags, some of the VMs hang easily.
 

Reproduce Steps
==============
1. Boot up 8 Windows VMs on the host:

for ((i=1; i<=8; i++)); do
    qemu-img create -b /home/guoqiang/win2k16_vdi_local.qcow2 -F qcow2 -f qcow2 /home/guoqiang/win2016$i.qcow2

    sleep 1

    qemu-system-x86_64 -accel kvm \
        -cpu host,host-cache-info=on,migratable=on,hv-time=on,hv-relaxed=on,hv-vapic=on,hv-spinlocks=0x1fff \
        -smp 30 \
        -drive file=/home/guoqiang/win2016$i.qcow2,if=none,id=virtio-disk0 \
        -device virtio-blk-pci,drive=virtio-disk0,bootindex=0 \
        -m 4096 -daemonize -vnc :$i \
        -device virtio-net-pci,netdev=nic0 \
        -netdev tap,id=nic0,br=virbr0,helper=/usr/local/libexec/qemu-bridge-helper,vhost=on

    sleep 5
done

2. Wait a moment; some of the VMs hang (see the overcommit check below).
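
For reference, the hang condition depends on vCPU overcommit (total vCPUs > pCPUs). A quick host-side sanity check, assuming the 8 VMs x "-smp 30" layout from the script above:

    pcpus=$(nproc)
    vcpus=$((8 * 30))     # 8 VMs, 30 vCPUs each, per the reproduce script
    echo "pCPUs=$pcpus vCPUs=$vcpus overcommitted=$([ "$vcpus" -gt "$pcpus" ] && echo yes || echo no)"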

Host error log:
KVM internal error. Suberror: 3
extra data[0]: 0x000000008000002f
extra data[1]: 0x0000000000000020
extra data[2]: 0x0000000000000d83
extra data[3]: 0x0000000000000038
RAX=0000000000000000 RBX=0000000000000000 RCX=0000000040000070 RDX=0000000000000000
RSI=0000000000000000 RDI=ffffc58dcf552010 RBP=fffff801ed48e100 RSP=fffff801ed48e060
R8 =00000000ffffffff R9 =0000000000000000 R10=00000000ffffffff R11=0000000000000000
R12=000000133fd128fc R13=0000000000000046 R14=0000000000000000 R15=0000000000000000
RIP=fffff801eb94fd7c RFL=00000046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =002b 0000000000000000 ffffffff 00c0f300 DPL=3 DS [-WA]
CS =0010 0000000000000000 00000000 00209b00 DPL=0 CS64 [-RA]
SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
DS =002b 0000000000000000 ffffffff 00c0f300 DPL=3 DS [-WA]
FS =0053 000000000059b000 00003c00 0040f300 DPL=3 DS [-WA]
GS =002b fffff801ebb3f000 ffffffff 00c0f300 DPL=3 DS [-WA]
LDT=0000 0000000000000000 ffffffff 00c00000
TR =0040 fffff801ed486070 00000067 00008b00 DPL=0 TSS64-busy
GDT= fffff801ed485000 0000006f
IDT= fffff801ed485070 00000fff
CR0=80050031 CR2=0000000000000030 CR3=00000000001aa000 CR4=001506f8
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000d01
Code=25 88 61 00 00 b9 70 00 00 40 0f ba 32 00 72 06 33 c0 8b d0 <0f> 30 5a 58 59 c3 cc cc cc cc cc cc 0f 1f 84 00 00 00 00 00 48 81 ec 38 01 00 00 48 8d 84

KVM internal error. Suberror: 3
extra data[0]: 0x000000008000002f
extra data[1]: 0x0000000000000020
extra data[2]: 0x0000000000000d81
extra data[3]: 0x00000000000000a2
RAX=0000000000000000 RBX=0000000000000000 RCX=0000000040000070 RDX=0000000000000000
RSI=0000000000000000 RDI=ffffdf86659d07b0 RBP=ffff96806225b100 RSP=ffff96806225b060
R8 =00000000ffffffff R9 =0000000000000000 R10=00000000ffffffff R11=0000000000000000
R12=00000013e153ce49 R13=0000000000000046 R14=0000000000000000 R15=0000000000000000
RIP=fffff8001f1ddd7c RFL=00000046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =002b 0000000000000000 ffffffff 00c0f300 DPL=3 DS [-WA]
CS =0010 0000000000000000 00000000 00209b00 DPL=0 CS64 [-RA]
SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
DS =002b 0000000000000000 ffffffff 00c0f300 DPL=3 DS [-WA]
FS =0053 0000000000604000 00007c00 0040f300 DPL=3 DS [-WA]
GS =002b ffff968062230000 ffffffff 00c0f300 DPL=3 DS [-WA]
LDT=0000 0000000000000000 ffffffff 00c00000
TR =0040 ffff968062236ac0 00000067 00008b00 DPL=0 TSS64-busy
GDT= ffff96806223db80 0000006f
IDT= ffff96806223dbf0 00000fff
CR0=80050031 CR2=0000000000000030 CR3=00000000001aa000 CR4=001506f8
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000fffe07f0 DR7=0000000000000400
EFER=0000000000000d01
Code=25 88 61 00 00 b9 70 00 00 40 0f ba 32 00 72 06 33 c0 8b d0 <0f> 30 5a 58 59 c3 cc cc cc cc cc cc 0f 1f 84 00 00 00 00 00 48 81 ec 38 01 00 00 48 8d 84

KVM internal error. Suberror: 3
extra data[0]: 0x000000008000002f
extra data[1]: 0x0000000000000020
extra data[2]: 0x0000000000000f82
extra data[3]: 0x000000000000004b
KVM internal error. Suberror: 3
extra data[0]: 0x000000008000002f
extra data[1]: 0x0000000000000020
extra data[2]: 0x0000000000000f82
extra data[3]: 0x000000000000004b
RAX=0000000000000000 RBX=0000000000000000 RCX=0000000040000070 RDX=0000000000000000
RSI=0000000000000000 RDI=ffffe7885a932010 RBP=fffff802a5a8e100 RSP=fffff802a5a8e060
R8 =00000000ffffffff R9 =0000000000000000 R10=00000000ffffffff R11=0000000000000000
R12=000000144b0a7258 R13=0000000000000046 R14=0000000000000000 R15=0000000000000000
RIP=fffff802a3f60d7c RFL=00000046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =002b 0000000000000000 ffffffff 00c0f300 DPL=3 DS [-WA]
CS =0010 0000000000000000 00000000 00209b00 DPL=0 CS64 [-RA]
SS =0018 0000000000000000 00000000 00409300 DPL=0 DS [-WA]
DS =002b 0000000000000000 ffffffff 00c0f300 DPL=3 DS [-WA]
FS =0053 0000000013b70000 00003c00 0040f300 DPL=3 DS [-WA]
GS =002b fffff802a4150000 ffffffff 00c0f300 DPL=3 DS [-WA]
LDT=0000 0000000000000000 ffffffff 00c00000
TR =0040 fffff802a5a86070 00000067 00008b00 DPL=0 TSS64-busy
GDT= fffff802a5a85000 0000006f
IDT= fffff802a5a85070 00000fff
CR0=80050031 CR2=0000000000000030 CR3=00000000001aa000 CR4=001506f8
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000d01
Code=25 88 61 00 00 b9 70 00 00 40 0f ba 32 00 72 06 33 c0 8b d0 <0f> 30 5a 58 59 c3 cc cc cc cc cc cc 0f 1f 84 00 00 00 00 00 48 81 ec 38 01 00 00 48 8d 84
Comment 1 Sean Christopherson 2023-12-18 17:54:16 UTC
On Fri, Dec 15, 2023, bugzilla-daemon@kernel.org wrote:
> Platform: Sapphire Rapids Platform
> 
> Host OS: CentOS Stream 9
> 
> Kernel:6.7.0-rc1 (commit:8ed26ab8d59111c2f7b86d200d1eb97d2a458fd1)

...

> Qemu: QEMU emulator version 8.1.94 (v8.2.0-rc4)
> (commit:039afc5ef7367fbc8fb475580c291c2655e856cb)
> 
> Host Kernel cmdline:BOOT_IMAGE=/kvm-vmlinuz root=/dev/mapper/cs_spr--2s2-root
> ro crashkernel=auto console=tty0 console=ttyS0,115200,8n1 3 intel_iommu=on
> disable_mtrr_cleanup
> 
> Bug detailed description
> =======
> We boot up 8 Windows VMs (total vCPUs > pCPUs) in host, random run
> application
> on each VM such as WPS editing etc, and wait for a moment, then Some of the
> Windows Guest hang and console reports "KVM internal error. Suberror: 3".

...

> Code=25 88 61 00 00 b9 70 00 00 40 0f ba 32 00 72 06 33 c0 8b d0 <0f> 30 5a
> 58
> 59 c3 cc cc cc cc cc cc 0f 1f 84 00 00 00 00 00 48 81 ec 38 01 00 00 48 8d 84
> 
> KVM internal error. Suberror: 3
> extra data[0]: 0x000000008000002f  <= Vectoring IRQ 47 (decimal)
> extra data[1]: 0x0000000000000020  <= WRMSR VM-Exit
> extra data[2]: 0x0000000000000f82
> extra data[3]: 0x000000000000004b

KVM exits with an internal error because the CPU indicates that IRQ 47 was being
delivered/vectored when the VM-Exit occurred, but the VM-Exit is due to WRMSR.
A WRMSR VM-Exit is supposed to only occur on an instruction boundary, i.e. can't
occur while delivering an IRQ (or any exception/event), and so KVM kicks out to
userspace because something has gone off the rails.
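
FWIW, a rough shell sketch for decoding the extra data by hand.  The field
layout is assumed from arch/x86/kvm/vmx/vmx.c for suberror 3
(KVM_INTERNAL_ERROR_DELIVERY_EV): data[0] is the IDT-vectoring info field,
data[1] the full exit reason, data[2] the exit qualification, and data[3]
the pCPU of the last VM-Enter.

   # Values from the first dump in this report.
   info=0x8000002f     # extra data[0]: IDT-vectoring info field
   reason=0x20         # extra data[1]: VM-Exit reason
   echo "valid:  $(( (info >> 31) & 1 ))"            # bit 31: vectoring info is valid
   echo "vector: $(( info & 0xff ))"                 # bits 7:0: 0x2f = IRQ 47
   echo "type:   $(( (info >> 8) & 7 ))"             # bits 10:8: 0 = external interrupt
   echo "basic exit reason: $(( reason & 0xffff ))"  # 32 = WRMSR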

   b9 70 00 00 40          mov    ecx, 0x40000070
   0f ba 32 00             btr    DWORD PTR [rdx], 0x0
   72 06                   jb     0x16
   33 c0                   xor    eax, eax
   8b d0                   mov    edx, eax
   0f 30                   wrmsr

FWIW, the MSR in question is Hyper-V's synthetic EOI, a.k.a. HV_X64_MSR_EOI, though
I doubt the exact MSR matters.

Have you tried an older host kernel?  If not, can you try something like v6.1?
Note, if you do, use base v6.1, *not* the stable tree, in case a bug was backported.
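
E.g., something like this, assuming a clone of the mainline tree and an
existing .config (sketch only, paths/config are up to you):

   git checkout v6.1          # base release tag, not a linux-6.1.y stable branch
   make olddefconfig
   make -j"$(nproc)"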

There was a recent change to relevant code, commit 50011c2a2457 ("KVM: VMX: Refresh
available regs and IDT vectoring info before NMI handling"), though I don't see
any obvious bugs.  But I'm pretty sure the only alternative explanation is a
CPU/ucode bug, so it's definitely worth checking older versions of KVM.
Comment 2 yuxiating 2024-03-27 11:59:26 UTC
Has there been any progress on this issue?

I hit the same error with Windows 2008R2, but the same virtual machine works fine on an Ice Lake CPU.
Comment 3 Chao Gao 2024-04-08 05:21:38 UTC
This is not considered a Linux/KVM issue.

Guoqiang, could you close this ticket?

Yuxiating, I assume you are using APICv and also have "hv-vapic" in the qemu cmdline. At this point, you can remove "hv-vapic" to work around this issue. Note that APICv outperforms Hyper-V's synthetic MSRs; regardless of this bug, it is recommended to remove "hv-vapic" when KVM enables APICv.
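
For example, a sketch of the reporter's command line with only "hv-vapic" dropped (all other options unchanged, shown here for VM 1; the first line is just a host-side check that APICv is enabled):

cat /sys/module/kvm_intel/parameters/enable_apicv   # expect Y
qemu-system-x86_64 -accel kvm \
    -cpu host,host-cache-info=on,migratable=on,hv-time=on,hv-relaxed=on,hv-spinlocks=0x1fff \
    -smp 30 \
    -drive file=/home/guoqiang/win20161.qcow2,if=none,id=virtio-disk0 \
    -device virtio-blk-pci,drive=virtio-disk0,bootindex=0 \
    -m 4096 -daemonize -vnc :1 \
    -device virtio-net-pci,netdev=nic0 \
    -netdev tap,id=nic0,br=virbr0,helper=/usr/local/libexec/qemu-bridge-helper,vhost=on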
Comment 4 Sean Christopherson 2024-04-08 17:22:24 UTC
On Mon, Apr 08, 2024, bugzilla-daemon@kernel.org wrote:
> This is not considered a Linux/KVM issue.

Can you elaborate?  E.g., if this is an SPR ucode/CPU bug, it would be nice to know
what's going wrong, so that at the very least we can more easily triage issues.
