I have a virtual machine running the old Windows Server 2003. On kernels 6.1.44 and 6.1.45, the QEMU VNC window stays dark, not switching to any of the guest's video modes and the VM process uses only ~64 MB of RAM of the assigned 2 GB, indefinitely. It's like the VM is paused/halted/stuck before even starting. The process can be killed successfully and then restarted again (with the same result), so it is not deadlocked in kernel or the like.
Kernel 6.1.43 works fine.
I have also tried downgrading CPU microcode from 20230808 to 20230719, but that did not help.
The CPU is AMD Ryzen 5900. I suspect some of the newly added mitigations may be the culprit?
Booting the kernel with "spec_rstack_overflow=off" solves the problem.
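To confirm which state the mitigation is actually in after a reboot, something like the following may help. It is only a sketch: it assumes the kernel exposes the usual sysfs vulnerabilities entry for this mitigation, and the function name `srso_status` and its optional arguments are made up here so that the paths can be overridden.

```shell
# Report the SRSO mitigation state and whether the boot-time override is set.
# Both arguments are optional; they exist only so other files can be substituted.
srso_status() {
    vuln_file="${1:-/sys/devices/system/cpu/vulnerabilities/spec_rstack_overflow}"
    cmdline_file="${2:-/proc/cmdline}"

    if [ -r "$vuln_file" ]; then
        echo "state: $(cat "$vuln_file")"
    else
        # Kernels without the mitigation do not have this file.
        echo "state: unknown (no sysfs entry)"
    fi

    # -w matches the parameter as a whole word on the kernel command line.
    if grep -qw 'spec_rstack_overflow=off' "$cmdline_file" 2>/dev/null; then
        echo "override: spec_rstack_overflow=off"
    else
        echo "override: none"
    fi
}

srso_status
```

Running it once on the default boot and once with the parameter set gives a quick before/after comparison to attach to a report.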
(In reply to Roman Mamedov from comment #0)
> I have a virtual machine running the old Windows Server 2003. On kernels
> 6.1.44 and 6.1.45, the QEMU VNC window stays dark, not switching to any of
> the guest's video modes and the VM process uses only ~64 MB of RAM of the
> assigned 2 GB, indefinitely. It's like the VM is paused/halted/stuck before
> even starting. The process can be killed successfully and then restarted
> again (with the same result), so it is not deadlocked in kernel or the like.
> Kernel 6.1.43 works fine.
> I have also tried downgrading CPU microcode from 20230808 to 20230719, but
> that did not help.
> The CPU is AMD Ryzen 5900. I suspect some of the newly added mitigations may
> be the culprit?
Can you bisect between v6.1.43 (last good) and v6.1.44 (first bad) to find the
specific mitigation that has this regression?
Unfortunately I am not in a position to easily do bisects.
But as noted above, setting "spec_rstack_overflow=off" is enough to solve it.
Further info, trying with an XP x64 install ISO provided by Microsoft:
With "spec_rstack_overflow=off", it works fine. But in the default state of this new mitigation (which is "safe RET, no microcode" on my machine), the install ISO hangs at the "Setup is starting Windows" message. So if anyone wants to reproduce on their local machine, there is now a quick and legal way to do so.
My QEMU command-line:
kvm -cpu host -m 2048 -machine pc,mem-merge=on,accel=kvm -vnc [::]:24 -device ide-hd,drive=drive0,bus=ide.0 -drive if=none,id=drive0,cache=writeback,aio=threads,format=raw,discard=unmap,detect-zeroes=off,file=xp.img -rtc base=localtime -cdrom xp64ce.iso -boot d
I should add that when a VM is in this stuck state, the CPU load of the QEMU process is 0% (not 100%).
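For anyone who wants to check the same symptom (idle process, small resident memory) on their own machine, a rough sketch follows. It reads the standard `State:` and `VmRSS:` fields from a `/proc/<pid>/status` file. The helper name `vm_proc_summary` is hypothetical, and the process name in the usage comment is an assumption about what the `kvm` wrapper ends up running.

```shell
# Print the scheduler state and resident memory (in kB) from a /proc/<pid>/status file.
vm_proc_summary() {
    status_file="$1"
    awk '/^State:/ { state = $2 }
         /^VmRSS:/ { rss = $2 }
         END { printf "state=%s rss_kb=%s\n", state, rss }' "$status_file"
}

# Usage (assumes the QEMU binary is named qemu-system-x86_64):
#   for pid in $(pgrep -f qemu-system-x86_64); do
#       vm_proc_summary "/proc/$pid/status"
#   done
```

A sleeping state (`S`) together with a resident size far below the assigned guest memory would match what is described above.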
And I am not sure why the default mitigation state says "no microcode", as I use a 2023-08-08 updated microcode package from Debian.
# dmesg | grep microcode
[ 0.401618] Speculative Return Stack Overflow: IBPB-extending microcode not applied!
[ 0.401618] Speculative Return Stack Overflow: Mitigation: safe RET, no microcode
[ 1.051941] microcode: CPU0: patch_level=0x0a201016
[ 1.051947] microcode: CPU1: patch_level=0x0a201016
[ 1.051953] microcode: CPU2: patch_level=0x0a201016
[ 1.051960] microcode: CPU3: patch_level=0x0a201016
[ 1.051967] microcode: CPU4: patch_level=0x0a201016
[ 1.051973] microcode: CPU5: patch_level=0x0a201016
[ 1.051981] microcode: CPU6: patch_level=0x0a201016
[ 1.051989] microcode: CPU7: patch_level=0x0a201016
[ 1.051996] microcode: CPU8: patch_level=0x0a201016
[ 1.052003] microcode: CPU9: patch_level=0x0a201016
[ 1.052010] microcode: CPU10: patch_level=0x0a201016
[ 1.052018] microcode: CPU11: patch_level=0x0a201016
[ 1.052024] microcode: CPU12: patch_level=0x0a201016
[ 1.052030] microcode: CPU13: patch_level=0x0a201016
[ 1.052036] microcode: CPU14: patch_level=0x0a201016
[ 1.052041] microcode: CPU15: patch_level=0x0a201016
[ 1.052046] microcode: CPU16: patch_level=0x0a201016
[ 1.052052] microcode: CPU17: patch_level=0x0a201016
[ 1.052058] microcode: CPU18: patch_level=0x0a201016
[ 1.052064] microcode: CPU19: patch_level=0x0a201016
[ 1.052070] microcode: CPU20: patch_level=0x0a201016
[ 1.052076] microcode: CPU21: patch_level=0x0a201016
[ 1.052082] microcode: CPU22: patch_level=0x0a201016
[ 1.052088] microcode: CPU23: patch_level=0x0a201016
[ 1.052092] microcode: Microcode Update Driver: v2.2.
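The per-CPU lines above all report the same patch level, so they can be condensed when pasting into a report. A minimal sketch, assuming the `patch_level=0x...` format shown in this log (the function name `patch_level_summary` is made up for illustration):

```shell
# Read dmesg-style text on stdin and print "<count> patch_level=<value>"
# for each distinct microcode patch level found.
patch_level_summary() {
    grep -o 'patch_level=0x[0-9a-f]*' | sort | uniq -c | awk '{ print $1, $2 }'
}

# Usage:
#   dmesg | patch_level_summary
```

More than one output line would indicate that not all logical CPUs are running the same microcode revision.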
Borislav, as you are the author of the patch adding the Speculative RAS Overflow mitigation, could you maybe take a look at what could be wrong here? Thanks.
Windows XP-era 64-bit guest VMs in KVM no longer work with it enabled.
Windows 7 (and likely newer) does work.
As pointed out by Vitaly, this is probably the guest RFLAGS corruption bug[*], especially since it's XP specific (more likely to trigger emulation). The fix should make its way to Linus' tree this week, and hopefully to stable kernels shortly thereafter. Though if you can manually apply and test the fix before then, that would be very helpful.
Indeed, this patch appears to fix it. I built 6.1.46 with it added, and the
issue is no longer present. Thanks!