Bug 217307
| Summary: | windows guest entering boot loop when nested virtualization enabled and hyperv installed | | |
|---|---|---|---|
| Product: | Virtualization | Reporter: | Michał Zegan (webczat) |
| Component: | kvm | Assignee: | virtualization_kvm |
| Status: | NEW | | |
| Severity: | high | CC: | 780553323, seanjc, vkuznets, xuli |
| Priority: | P1 | | |
| Hardware: | Intel | | |
| OS: | Linux | | |
| Kernel Version: | | Subsystem: | |
| Regression: | No | Bisected commit-id: | |

Attachments:
- This is a qemu command line for the vm which experiences the problem
- This is host's cpuinfo which shows the host cpu model and information.
- This is a partial kvm trace for the vm's reboot, i set it to shutdown after reboot.
Description
Michał Zegan
2023-04-06 10:39:27 UTC
Created attachment 304091 [details]
This is a qemu command line for the vm which experiences the problem
Created attachment 304092 [details]
This is host's cpuinfo which shows the host cpu model and information.
Created attachment 304093 [details]
This is a partial kvm trace for the vm's reboot, i set it to shutdown after reboot.
There isn't much to go on in the trace. The guest is "voluntarily" rebooting by writing I/O port 0xcf9, e.g. it's not a triple fault shutdown due to KVM injecting an exception that the guest doesn't expect. My best (but nearly blind) guess would be that Windows expects functionality to exist, e.g. is querying CPUID and MSRs to enumerate platform features, and goes into recovery mode when the expected feature(s) aren't found. But that's very much a wild guess.

Unfortunately, trace_kvm_exit doesn't provide guest GPRs, so it's impossible to glean information from the CPUID, RDMSR, and WRMSR exits, e.g. to see what Windows appears to be doing.

The easiest way to debug this is probably to get the guest into a debugger, even a rudimentary one like QEMU's interactive monitor. That would hopefully provide some insight into why Windows decides to reboot.

Hello. It's sad that it's not that easy to figure out what the ... is going on here. The problem with using the qemu debugger is that it probably doesn't really allow me to break anywhere. But even with a non-qemu debugger, I am unsure how I would debug that usefully, that is, stop at the right moment to capture anything useful. Of course, the best would be if someone with the necessary skills could repro it.

Ping? Can anyone help me to at least push this forward a little? This is a really annoying bug, and I would at least gather some info if I knew what to look for.

Assuming this is not a KVM/QEMU regression, I'd suggest exploring two options:

1) Change "-cpu host" to a named CPU model. I don't see "alderlake" CPU models in QEMU so I'd start with something like "Skylake-Client-v4". Remove all other CPU options you have, like "rtm=off,mpx=off,host-cache-info=on,l3-cache=off". Try to find the exact CPU option which breaks things. There were similar but reversed (works with '-cpu host', doesn't work with a named model) issues in the past, e.g.
https://lore.kernel.org/qemu-devel/20220308113445.859669-21-pbonzini@redhat.com/

2) Try disabling certain Hyper-V enlightenments, starting with "hv-evmcs". In theory, things should work (slowly, but still) without any Hyper-V enlightenments.

Hello, the problem is, nothing helps. Currently I have the following changes relative to my previous config (qemu cmdline below):
- Set cpu to Skylake-Client-noTSX-IBRS (note I also tried nehalem or qemu64 randomly and nothing worked, including qemu64 not even booting at all, but unsure why).
- Removed/commented out all devices except the ones I need, like sound/video/disks.
- Disabled things like smm, secureboot and everything else, just in case.
- Also disabled any and all enlightenments I could find.

The effect is all the same (note I enable the vmx feature in cpu settings; if it is not enabled, the system boots without it).

Command line is:

/usr/bin/qemu-system-x86_64 \
-name guest=win11,debug-threads=on \
-S \
-object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-22-win11/master-key.aes"}' \
-blockdev '{"driver":"file","filename":"/usr/share/edk2/ovmf/OVMF_CODE.fd","node-name":"libvirt-pflash0-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-pflash0-format","read-only":true,"driver":"raw","file":"libvirt-pflash0-storage"}' \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/qemu/nvram/win11_VARS.fd","node-name":"libvirt-pflash1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-pflash1-format","read-only":false,"driver":"raw","file":"libvirt-pflash1-storage"}' \
-machine pc-q35-7.0,usb=off,smm=off,kernel_irqchip=on,dump-guest-core=off,memory-backend=pc.ram,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format \
-accel kvm \
-cpu Skylake-Client-noTSX-IBRS,mpx=off,vmx=on,kvm-pv-unhalt=off,kvm-pv-ipi=off,pmu=off \
-m 8192 \
-object '{"qom-type":"memory-backend-memfd","id":"pc.ram","share":true,"x-use-canonical-path-for-ramblock-id":false,"size":8589934592}' \
-overcommit mem-lock=off \
-smp 8,sockets=1,dies=1,cores=8,threads=1 \
-uuid 589e17db-9ea9-49ac-8a66-c75bbc39ddd3 \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=29,server=on,wait=off \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=localtime,clock=vm,driftfix=slew \
-no-shutdown \
-global ICH9-LPC.disable_s3=1 \
-global ICH9-LPC.disable_s4=1 \
-boot strict=on \
-device '{"driver":"pcie-root-port","port":16,"chassis":1,"id":"pci.1","bus":"pcie.0","multifunction":true,"addr":"0x2"}' \
-device '{"driver":"pcie-root-port","port":17,"chassis":2,"id":"pci.2","bus":"pcie.0","addr":"0x2.0x1"}' \
-device '{"driver":"pcie-root-port","port":18,"chassis":3,"id":"pci.3","bus":"pcie.0","addr":"0x2.0x2"}' \
-device '{"driver":"pcie-root-port","port":19,"chassis":4,"id":"pci.4","bus":"pcie.0","addr":"0x2.0x3"}' \
-device '{"driver":"pcie-root-port","port":20,"chassis":5,"id":"pci.5","bus":"pcie.0","addr":"0x2.0x4"}' \
-device '{"driver":"pcie-root-port","port":21,"chassis":6,"id":"pci.6","bus":"pcie.0","addr":"0x2.0x5"}' \
-device '{"driver":"pcie-root-port","port":22,"chassis":7,"id":"pci.7","bus":"pcie.0","addr":"0x2.0x6"}' \
-device '{"driver":"pcie-pci-bridge","id":"pci.8","bus":"pci.1","addr":"0x0"}' \
-device '{"driver":"pcie-root-port","port":23,"chassis":9,"id":"pci.9","bus":"pcie.0","addr":"0x2.0x7"}' \
-device '{"driver":"pcie-root-port","port":24,"chassis":10,"id":"pci.10","bus":"pcie.0","multifunction":true,"addr":"0x3"}' \
-device '{"driver":"pcie-root-port","port":25,"chassis":11,"id":"pci.11","bus":"pcie.0","addr":"0x3.0x1"}' \
-device '{"driver":"qemu-xhci","id":"usb","bus":"pci.2","addr":"0x0"}' \
-device '{"driver":"virtio-scsi-pci","iommu_platform":true,"packed":true,"id":"scsi0","num_queues":8,"bus":"pci.4","addr":"0x0"}' \
-blockdev '{"driver":"host_device","filename":"/dev/pool/win11","node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-2-format","read-only":false,"driver":"raw","file":"libvirt-2-storage"}' \
-device '{"driver":"scsi-hd","bus":"scsi0.0","channel":0,"scsi-id":0,"lun":0,"device_id":"drive-scsi0-0-0-0","drive":"libvirt-2-format","id":"scsi0-0-0-0","bootindex":1}' \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/cdroms/virtio-win-0.1.225.iso","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":true,"driver":"raw","file":"libvirt-1-storage"}' \
-device '{"driver":"ide-cd","bus":"ide.1","drive":"libvirt-1-format","id":"sata0-0-1"}' \
-object '{"qom-type":"input-linux","id":"input0","evdev":"/dev/input/by-id/usb-MOSART_Semi._2.4G_INPUT_DEVICE-event-kbd","repeat":true,"grab_all":true,"grab-toggle":"ctrl-ctrl"}' \
-object '{"qom-type":"input-linux","id":"input1","evdev":"/dev/input/by-path/platform-i8042-serio-0-event-kbd","repeat":true,"grab_all":true,"grab-toggle":"ctrl-ctrl"}' \
-object '{"qom-type":"input-linux","id":"input2","evdev":"/dev/input/by-path/pci-0000:00:15.0-platform-i2c_designware.0-event-mouse"}' \
-object '{"qom-type":"input-linux","id":"input3","evdev":"/dev/input/by-path/platform-i8042-serio-1-event-mouse"}' \
-audiodev '{"id":"audio1","driver":"spice"}' \
-spice port=0,disable-ticketing=on,seamless-migration=on \
-device '{"driver":"qxl-vga","id":"video0","max_outputs":1,"ram_size":67108864,"vram_size":67108864,"vram64_size_mb":0,"vgamem_mb":16,"bus":"pcie.0","addr":"0x1"}' \
-device '{"driver":"ich9-intel-hda","id":"sound0","bus":"pcie.0","addr":"0x1b"}' \
-device '{"driver":"hda-duplex","id":"sound0-codec0","bus":"sound0.0","cad":0,"audiodev":"audio1"}' \
-device '{"driver":"virtio-balloon-pci","id":"balloon0","bus":"pci.3","addr":"0x0"}' \
-device '{"driver":"vmcoreinfo"}' \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on

Correction: it was kvm64 which didn't work at all. qemu64 worked when disabling svm, but despite the vmx feature being enabled, Windows insists that virtualization is disabled. Investigating further.

Okay, I have done more tests. Generally, host-passthrough does not work; broadwell-notsx-ibrs, skylake-notsx-ibrs and nehalem do not work; kvm64 doesn't work; qemu64 works. However, qemu64, even when I enable vmx and disable svm, boots without hyperv. Windows shows that second level address translation is disabled. I tried to use libvirt's xml files and even the qemu sources to see the difference between the qemu64 and broadwell-notsx-ibrs features (I mean cpuid features), then enabled them all; the effect is exactly the same. For these tests, everything like devices, secureboot, tpm and hyperv enlightenments was enabled.

Just FYI, there are other people with this issue.
https://forums.unraid.net/topic/131838-windows-11-virtual-machine-platform-wsl2-boot-loop/
https://www.reddit.com/r/VFIO/comments/xxe8ud/hyperv_making_vm_bootloop_on_i712700k/
The common thing is that all of them have intel core 12th gen or later, so most likely anyone with a 12th gen or 13th gen intel host should be able to repro this. I have also tried things described in one of these posts: changing cpu features, changing the machine model from q35, disabling secureboot/tpm/other devices. I would try to install in legacy bios mode, but win11 probably won't make it easy, so I am not likely to be able to do this.

Maybe just disable the sgx feature. I use virt-manager myself, so I just added <feature policy="disable" name="sgx"/> to the cpu section.

Unfortunately that does not help. Also, 12th gen intel client cpus don't have sgx.
<cpu mode="host-passthrough" check="none" migratable="on">
  <topology sockets="1" dies="1" cores="4" threads="1"/>
  <feature policy="require" name="sgxlc"/>
  <feature policy="require" name="intel-pt"/>
  <feature policy="require" name="ibrs-all"/>
  <feature policy="require" name="dtes64"/>
  <feature policy="require" name="monitor"/>
  <feature policy="require" name="ds_cpl"/>
  <feature policy="require" name="vmx"/>
  <feature policy="require" name="smx"/>
  <feature policy="require" name="est"/>
  <feature policy="require" name="tm2"/>
  <feature policy="require" name="xtpr"/>
  <feature policy="require" name="pdcm"/>
  <feature policy="require" name="ssbd"/>
  <feature policy="require" name="ibpb"/>
  <feature policy="require" name="stibp"/>
  <feature policy="require" name="tsc_adjust"/>
  <feature policy="disable" name="sgx"/>
  <feature policy="require" name="avx2"/>
  <feature policy="require" name="clflushopt"/>
  <feature policy="require" name="xsaves"/>
  <feature policy="require" name="md-clear"/>
</cpu>

That is my cpu config. Maybe there is something else that is also needed.

What is your host's cpu? From what I read in other posts on forums etc., it seems that this only affects 12th gen intel cpus and above.

i7 9700 actually. Before switching to host mode, the config below also worked for me.

<cpu mode='custom' match='exact' check='partial'>
  <model fallback='allow'>Skylake-Client-noTSX-IBRS</model>
  <feature policy='require' name='hypervisor'/>
  <feature policy='require' name='vmx'/>
</cpu>

Well, no matter what problems you had before with nested virtualization, according to some forum posts I was reading lately it seems that my problem is something that exclusively affects 12th gen and above. And no toggling of features helps, except for features which would just disable nested virtualization: disabling the sep, vme, or vmx features makes the vm boot without the hypervisor. Any other combination results in a boot loop, and I was even crazy enough to both disable and enable them one by one. Nothing comes close to working.
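For anyone who wants to repeat the one-feature-at-a-time test described above, here is a minimal sketch that generates one QEMU "-cpu" argument per candidate feature. The helper name cpu_args and the candidate list are made up for illustration; only the model name and the feature=on/off syntax come from the command line earlier in this report.

```python
def cpu_args(model, base, candidates):
    """Yield one -cpu string per candidate, turning off only that feature.

    model:      CPU model name, e.g. "Skylake-Client-noTSX-IBRS"
    base:       features that stay fixed in every run, mapped to True/False
    candidates: feature names to disable one at a time (hypothetical list)
    """
    fixed = ",".join(
        f"{name}={'on' if enabled else 'off'}" for name, enabled in base.items()
    )
    for feature in candidates:
        parts = [model]
        if fixed:
            parts.append(fixed)
        parts.append(f"{feature}=off")
        yield ",".join(parts)


if __name__ == "__main__":
    # vmx stays on in every run so that nested virtualization is still offered.
    for arg in cpu_args("Skylake-Client-noTSX-IBRS", {"vmx": True}, ["mpx", "pmu"]):
        print("-cpu", arg)
```

Each printed string would replace the -cpu option of the command line for one boot attempt; whether a given feature name is accepted depends on the QEMU version in use.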
> it seems that my problem is something that exclusively affects 12th gen and
> above
Can you try running a single vCPU VM, and pin that vCPU to a pCPU on a P-Core? If this is indeed specific to 12th gen CPUs, then my guess is that hybrid CPUs are to blame. E.g. KVM already disables vPMU (commit 4d7404e5ee00 "KVM: x86/pmu: Disable vPMU support on hybrid CPUs (host PMUs)"), I wouldn't be at all surprised if there are more problems lurking.
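A rough sketch of how the P-core pinning could be set up. It assumes that the host kernel exposes the P-core CPU list at /sys/devices/cpu_core/cpus on hybrid hosts (the file is absent otherwise), and that the libvirt domain is named win11 as in the command line above. The function names are made up for this example, and the script only prints the virsh command rather than running it.

```python
from pathlib import Path

# Assumption: hybrid Intel hosts list their P-core CPUs here; the file does
# not exist on non-hybrid hosts or on kernels without hybrid PMU support.
P_CORE_CPUS = Path("/sys/devices/cpu_core/cpus")


def parse_cpulist(text):
    """Expand a kernel cpulist string such as "0-3,8,10-11" into a sorted list."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def p_core_cpus(path=P_CORE_CPUS):
    """Return the host CPU numbers that belong to P-cores, or [] if unknown."""
    try:
        return parse_cpulist(path.read_text())
    except OSError:
        return []


def vcpupin_command(domain, vcpu, host_cpu):
    """Build the argv for `virsh vcpupin`; the caller decides whether to run it."""
    return ["virsh", "vcpupin", domain, str(vcpu), str(host_cpu)]


if __name__ == "__main__":
    cpus = p_core_cpus()
    if cpus:
        print(" ".join(vcpupin_command("win11", 0, cpus[0])))
    else:
        print("no P-core list found; is this a hybrid CPU host?")
```

With a single-vCPU guest, the resulting pinning can then be checked with virsh vcpuinfo.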
To be honest, I have tried with no luck, even though someone suggested that it also works. It might be that I've done it wrong, but I was checking virsh vcpuinfo to confirm the assignment. Someone said using one core should work; it didn't. I tried pinning to cpu0, which is one of my P-core's threads, and no. When I have a chance I might re-test, but for now it's a no.