Bug 217307 - windows guest entering boot loop when nested virtualization enabled and hyperv installed
Status: NEW
Alias: None
Product: Virtualization
Classification: Unclassified
Component: kvm
Hardware: Intel Linux
Importance: P1 high
Assignee: virtualization_kvm
 
Reported: 2023-04-06 10:39 UTC by Michał Zegan
Modified: 2023-07-10 19:50 UTC
CC: 4 users

Regression: No


Attachments
The QEMU command line for the VM that experiences the problem (7.49 KB, text/plain)
2023-04-06 10:58 UTC, Michał Zegan
The host's cpuinfo, showing the host CPU model and features. (34.17 KB, text/plain)
2023-04-06 11:00 UTC, Michał Zegan
A partial KVM trace of the VM's reboot; I set it to shut down after the reboot. (57 bytes, text/plain)
2023-04-06 11:14 UTC, Michał Zegan

Description Michał Zegan 2023-04-06 10:39:27 UTC
Environment:
My host is Fedora 37, currently running Linux 6.2.5, but the problem has been present for a long time.
The CPU is an Intel Alder Lake Core i7-12700H.
The host has the kvm_intel module loaded with the nested=y parameter, so nested virtualization is enabled.
The guest is a q35 x86_64 VM with cpu=host, running on QEMU 7.0.0 with accel=kvm and SMM, UEFI, Secure Boot, and TPM enabled.
The QEMU command line is attached to the bug.
The VM runs Windows 11 Pro 64-bit.
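(For reference, a sketch of how the nested setting can be checked and made persistent on the host — the module parameter and sysfs path are the standard ones for kvm_intel, but reloading the module requires all VMs to be shut down first:)

```shell
# Check whether nested virtualization is currently enabled for kvm_intel.
cat /sys/module/kvm_intel/parameters/nested   # prints Y or N

# Make it persistent, then reload the module (or reboot) for it to take effect.
echo "options kvm_intel nested=1" | sudo tee /etc/modprobe.d/kvm-nested.conf
sudo modprobe -r kvm_intel && sudo modprobe kvm_intel
```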

What happens is that the moment I install any Hyper-V feature on the Windows 11 guest and then reboot, it does not boot again.
It reboots itself once and goes into recovery. Because I am blind I cannot say whether it shows a blue screen of death before rebooting, but I don't think so.
The only thing that makes it work again is disabling nested virtualization. There is no known workaround that leaves it enabled, short of disabling the vmx CPU feature, but that defeats the purpose: my goal is to run/test WSL2 and to play with Docker.
Comment 1 Michał Zegan 2023-04-06 10:58:36 UTC
Created attachment 304091 [details]
The QEMU command line for the VM that experiences the problem
Comment 2 Michał Zegan 2023-04-06 11:00:34 UTC
Created attachment 304092 [details]
The host's cpuinfo, showing the host CPU model and features.
Comment 3 Michał Zegan 2023-04-06 11:14:09 UTC
Created attachment 304093 [details]
A partial KVM trace of the VM's reboot; I set it to shut down after the reboot.
Comment 4 Sean Christopherson 2023-05-22 22:14:58 UTC
There isn't much to go on in the trace.  The guest is "voluntarily" rebooting by writing I/O port 0xcf9, e.g. it's not a triple fault shutdown due to KVM injecting an exception that the guest doesn't expect.

My best (but nearly blind) guess would be that Windows expects functionality to exist, e.g. is querying CPUID and MSRs to enumerate platform features, and goes into recovery mode when the expected feature(s) aren't found.  But that's very much a wild guess.  Unfortunately, trace_kvm_exit doesn't provide guest GPRs, so it's impossible to glean information from the CPUID, RDMSR, and WRMSR exits, e.g. to see what Windows appears to be doing.

The easiest way to debug this is probably to get the guest into a debugger, even a rudimentary one like QEMU's interactive monitor.  That would hopefully provide some insight into why Windows decides to reboot.
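(To expand on that suggestion, a sketch using QEMU's built-in gdbstub — the flags are standard QEMU/gdb options, but where exactly to break is unknown here, so this only shows how to attach:)

```shell
# Start the guest with a gdb server on port 1234 (-s) and paused (-S);
# "..." stands for the rest of the VM's normal command line.
qemu-system-x86_64 -accel kvm -s -S ...

# In another terminal, attach gdb and let the guest run; interrupt with
# Ctrl-C near the reboot to inspect register state and guest memory.
gdb -ex 'target remote :1234' -ex 'continue'
```

Alternatively, the QEMU monitor's `info registers` and `stop`/`cont` commands give a cruder view without gdb.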
Comment 5 Michał Zegan 2023-05-23 07:38:53 UTC
Hello.
It's unfortunate that it's not easy to figure out what is going on here.
The problem with using the QEMU debugger is that it probably doesn't let me break anywhere useful. And even with a non-QEMU debugger, I am unsure how I would debug this usefully, i.e. stop at the right moment to capture anything. Of course the best outcome would be if someone with the necessary skills could reproduce it.
Comment 6 Michał Zegan 2023-06-25 19:12:23 UTC
Ping? Can anyone help me push this forward at least a little? This is a really annoying bug, and I would gladly gather more info if I knew what to look for.
Comment 7 vkuznets 2023-06-27 08:28:19 UTC
Assuming this is not a KVM/QEMU regression, I'd suggest exploring two options:
1) Change "-cpu host" to a named CPU model. I don't see "alderlake" CPU models in QEMU so I'd start with something like "Skylake-Client-v4". Remove all other CPU options you have, like "rtm=off,mpx=off,host-cache-info=on,l3-cache=off". Try to find the exact CPU option which breaks things. There were similar but reversed (works with '-cpu host', doesn't work with a named model) issues in the past, e.g. https://lore.kernel.org/qemu-devel/20220308113445.859669-21-pbonzini@redhat.com/
2) Try disabling certain Hyper-V enlightenments, start with "hv-evmcs". In theory, things should work (slowly, but still) without any Hyper-V enlightenments.
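(As a sketch, option 1 could look like this on the QEMU command line — Skylake-Client-v4 is a named QEMU CPU model, and vmx=on keeps nested virtualization available; the idea is then to re-add the removed options one at a time to find the culprit:)

```shell
# Replace "-cpu host,rtm=off,mpx=off,host-cache-info=on,l3-cache=off,..."
# with a named model plus only what is strictly needed:
-cpu Skylake-Client-v4,vmx=on
```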
Comment 8 Michał Zegan 2023-06-27 16:39:36 UTC
Hello,
the problem is, nothing helps.
Currently I have the following changes relative to my previous config (QEMU cmdline below):
- Set the CPU to Skylake-Client-noTSX-IBRS (note I also tried Nehalem and qemu64 at random and nothing worked, including qemu64 not even booting at all, though I'm unsure why).
- Removed/commented out all devices except the ones I need, like sound/video/disks.
- Disabled things like SMM, Secure Boot, and everything else, just in case.
- Also disabled any and all enlightenments I could find.

The effect is all the same. (Note that I enable the vmx feature in the CPU settings; if it is not enabled, the system boots, just without it.)

Command line is:

/usr/bin/qemu-system-x86_64 \
-name guest=win11,debug-threads=on \
-S \
-object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-22-win11/master-key.aes"}' \
-blockdev '{"driver":"file","filename":"/usr/share/edk2/ovmf/OVMF_CODE.fd","node-name":"libvirt-pflash0-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-pflash0-format","read-only":true,"driver":"raw","file":"libvirt-pflash0-storage"}' \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/qemu/nvram/win11_VARS.fd","node-name":"libvirt-pflash1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-pflash1-format","read-only":false,"driver":"raw","file":"libvirt-pflash1-storage"}' \
-machine pc-q35-7.0,usb=off,smm=off,kernel_irqchip=on,dump-guest-core=off,memory-backend=pc.ram,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format \
-accel kvm \
-cpu Skylake-Client-noTSX-IBRS,mpx=off,vmx=on,kvm-pv-unhalt=off,kvm-pv-ipi=off,pmu=off \
-m 8192 \
-object '{"qom-type":"memory-backend-memfd","id":"pc.ram","share":true,"x-use-canonical-path-for-ramblock-id":false,"size":8589934592}' \
-overcommit mem-lock=off \
-smp 8,sockets=1,dies=1,cores=8,threads=1 \
-uuid 589e17db-9ea9-49ac-8a66-c75bbc39ddd3 \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=29,server=on,wait=off \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=localtime,clock=vm,driftfix=slew \
-no-shutdown \
-global ICH9-LPC.disable_s3=1 \
-global ICH9-LPC.disable_s4=1 \
-boot strict=on \
-device '{"driver":"pcie-root-port","port":16,"chassis":1,"id":"pci.1","bus":"pcie.0","multifunction":true,"addr":"0x2"}' \
-device '{"driver":"pcie-root-port","port":17,"chassis":2,"id":"pci.2","bus":"pcie.0","addr":"0x2.0x1"}' \
-device '{"driver":"pcie-root-port","port":18,"chassis":3,"id":"pci.3","bus":"pcie.0","addr":"0x2.0x2"}' \
-device '{"driver":"pcie-root-port","port":19,"chassis":4,"id":"pci.4","bus":"pcie.0","addr":"0x2.0x3"}' \
-device '{"driver":"pcie-root-port","port":20,"chassis":5,"id":"pci.5","bus":"pcie.0","addr":"0x2.0x4"}' \
-device '{"driver":"pcie-root-port","port":21,"chassis":6,"id":"pci.6","bus":"pcie.0","addr":"0x2.0x5"}' \
-device '{"driver":"pcie-root-port","port":22,"chassis":7,"id":"pci.7","bus":"pcie.0","addr":"0x2.0x6"}' \
-device '{"driver":"pcie-pci-bridge","id":"pci.8","bus":"pci.1","addr":"0x0"}' \
-device '{"driver":"pcie-root-port","port":23,"chassis":9,"id":"pci.9","bus":"pcie.0","addr":"0x2.0x7"}' \
-device '{"driver":"pcie-root-port","port":24,"chassis":10,"id":"pci.10","bus":"pcie.0","multifunction":true,"addr":"0x3"}' \
-device '{"driver":"pcie-root-port","port":25,"chassis":11,"id":"pci.11","bus":"pcie.0","addr":"0x3.0x1"}' \
-device '{"driver":"qemu-xhci","id":"usb","bus":"pci.2","addr":"0x0"}' \
-device '{"driver":"virtio-scsi-pci","iommu_platform":true,"packed":true,"id":"scsi0","num_queues":8,"bus":"pci.4","addr":"0x0"}' \
-blockdev '{"driver":"host_device","filename":"/dev/pool/win11","node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-2-format","read-only":false,"driver":"raw","file":"libvirt-2-storage"}' \
-device '{"driver":"scsi-hd","bus":"scsi0.0","channel":0,"scsi-id":0,"lun":0,"device_id":"drive-scsi0-0-0-0","drive":"libvirt-2-format","id":"scsi0-0-0-0","bootindex":1}' \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/cdroms/virtio-win-0.1.225.iso","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":true,"driver":"raw","file":"libvirt-1-storage"}' \
-device '{"driver":"ide-cd","bus":"ide.1","drive":"libvirt-1-format","id":"sata0-0-1"}' \
-object '{"qom-type":"input-linux","id":"input0","evdev":"/dev/input/by-id/usb-MOSART_Semi._2.4G_INPUT_DEVICE-event-kbd","repeat":true,"grab_all":true,"grab-toggle":"ctrl-ctrl"}' \
-object '{"qom-type":"input-linux","id":"input1","evdev":"/dev/input/by-path/platform-i8042-serio-0-event-kbd","repeat":true,"grab_all":true,"grab-toggle":"ctrl-ctrl"}' \
-object '{"qom-type":"input-linux","id":"input2","evdev":"/dev/input/by-path/pci-0000:00:15.0-platform-i2c_designware.0-event-mouse"}' \
-object '{"qom-type":"input-linux","id":"input3","evdev":"/dev/input/by-path/platform-i8042-serio-1-event-mouse"}' \
-audiodev '{"id":"audio1","driver":"spice"}' \
-spice port=0,disable-ticketing=on,seamless-migration=on \
-device '{"driver":"qxl-vga","id":"video0","max_outputs":1,"ram_size":67108864,"vram_size":67108864,"vram64_size_mb":0,"vgamem_mb":16,"bus":"pcie.0","addr":"0x1"}' \
-device '{"driver":"ich9-intel-hda","id":"sound0","bus":"pcie.0","addr":"0x1b"}' \
-device '{"driver":"hda-duplex","id":"sound0-codec0","bus":"sound0.0","cad":0,"audiodev":"audio1"}' \
-device '{"driver":"virtio-balloon-pci","id":"balloon0","bus":"pci.3","addr":"0x0"}' \
-device '{"driver":"vmcoreinfo"}' \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
Comment 9 Michał Zegan 2023-06-27 18:20:23 UTC
Correction: it was kvm64 that didn't work at all.
qemu64 worked when I disabled svm, but despite the feature being enabled, Windows insists that virtualization is disabled. Investigating further.
Comment 10 Michał Zegan 2023-06-27 21:37:02 UTC
Okay, I have done more tests.
Generally: host-passthrough does not work; Broadwell-noTSX-IBRS, Skylake-noTSX-IBRS, and Nehalem do not work; kvm64 doesn't work; qemu64 works.
However qemu64, even when I enable vmx and disable svm, boots without Hyper-V. Windows shows that second-level address translation is disabled.

I used libvirt's XML files and even the QEMU sources to find the differences in CPUID features between qemu64 and Broadwell-noTSX-IBRS, then enabled them all; the effect is exactly the same.

For these tests, everything — devices, Secure Boot, TPM, Hyper-V enlightenments — was enabled.
Comment 11 Michał Zegan 2023-07-01 09:44:41 UTC
Just FYI, there are other people with this issue:
https://forums.unraid.net/topic/131838-windows-11-virtual-machine-platform-wsl2-boot-loop/
https://www.reddit.com/r/VFIO/comments/xxe8ud/hyperv_making_vm_bootloop_on_i712700k/
The common thread is that all of them have 12th-gen or later Intel Core CPUs, so anyone with a 12th- or 13th-gen Intel host should most likely be able to reproduce this.
I have also tried the things described in those posts: changing CPU features, changing the machine type away from q35, disabling Secure Boot/TPM/other devices. I would try installing in legacy BIOS mode, but Windows 11 probably won't make that easy, so I'm not likely to be able to.
Comment 12 Prob1d 2023-07-05 14:57:26 UTC
Maybe just disable the sgx feature.
I use virt-manager myself, so I just added <feature policy="disable" name="sgx"/> to the cpu section.
Comment 13 Michał Zegan 2023-07-05 15:05:18 UTC
Unfortunately that does not help. Also, 12th-gen Intel client CPUs don't have SGX.
Comment 14 Prob1d 2023-07-05 15:15:51 UTC
  <cpu mode="host-passthrough" check="none" migratable="on">
    <topology sockets="1" dies="1" cores="4" threads="1"/>
    <feature policy="require" name="sgxlc"/>
    <feature policy="require" name="intel-pt"/>
    <feature policy="require" name="ibrs-all"/>
    <feature policy="require" name="dtes64"/>
    <feature policy="require" name="monitor"/>
    <feature policy="require" name="ds_cpl"/>
    <feature policy="require" name="vmx"/>
    <feature policy="require" name="smx"/>
    <feature policy="require" name="est"/>
    <feature policy="require" name="tm2"/>
    <feature policy="require" name="xtpr"/>
    <feature policy="require" name="pdcm"/>
    <feature policy="require" name="ssbd"/>
    <feature policy="require" name="ibpb"/>
    <feature policy="require" name="stibp"/>
    <feature policy="require" name="tsc_adjust"/>
    <feature policy="disable" name="sgx"/>
    <feature policy="require" name="avx2"/>
    <feature policy="require" name="clflushopt"/>
    <feature policy="require" name="xsaves"/>
    <feature policy="require" name="md-clear"/>
  </cpu>
That is my cpu config. Maybe there is something else in it that is also needed.
Comment 15 Michał Zegan 2023-07-05 15:18:22 UTC
What is your host's CPU? From what I've read in other posts on forums etc., it seems that this affects only 12th-gen Intel CPUs and above.
Comment 16 Prob1d 2023-07-05 15:23:46 UTC
An i7 9700, actually. Before switching to host-passthrough mode, the config below also worked for me.
<cpu mode='custom' match='exact' check='partial'>
 <model fallback='allow'>Skylake-Client-noTSX-IBRS</model>
 <feature policy='require' name='hypervisor'/>
 <feature policy='require' name='vmx'/>
</cpu>
Comment 17 Michał Zegan 2023-07-05 15:27:37 UTC
Well, whatever problems you had with nested virtualization before, according to some forum posts I've been reading lately my problem seems to exclusively affect 12th gen and above. And no toggling of features helps, except for features whose removal simply disables nested virtualization: with sep, vme, or vmx disabled, the VM boots, just without the hypervisor. Any other combination results in a boot loop, and I was even crazy enough to disable and enable them one by one. Nothing comes close to working.
Comment 18 Sean Christopherson 2023-07-10 19:47:04 UTC
> it seems that my problem is something that exclusively affects 12th gen and
> above

Can you try running a single vCPU VM, and pin that vCPU to a pCPU on a P-Core?  If this is indeed specific to 12th gen CPUs, then my guess is that hybrid CPUs are to blame.  E.g. KVM already disables vPMU (commit 4d7404e5ee00 "KVM: x86/pmu: Disable vPMU support on hybrid CPUs (host PMUs)"), I wouldn't be at all surprised if there are more problems lurking.
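(A sketch of how that test could be set up with libvirt — the sysfs path is how hybrid Intel hosts expose the P-core/E-core split, and the CPU numbers are illustrative for this host, so check the output before pinning:)

```shell
# On a hybrid host, list which logical CPUs are P-core threads.
cat /sys/devices/cpu_core/cpus    # e.g. "0-15"; E-cores are under cpu_atom

# Configure the guest for a single vCPU and pin vCPU 0 to a P-core thread.
virsh setvcpus win11 1 --config
virsh vcpupin win11 0 0
virsh vcpuinfo win11              # confirm the pinning took effect
```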
Comment 19 Michał Zegan 2023-07-10 19:49:47 UTC
To be honest, I have tried that with no luck, even though someone suggested it works. I might have done it wrong, but I checked virsh vcpuinfo to confirm the assignment. Someone said using a single core should work; it didn't. I tried pinning to CPU 0, which is one of my P-core's threads, and no.
Comment 20 Michał Zegan 2023-07-10 19:50:58 UTC
When I have a chance I might re-test, but for now it's a no.
