Bug 216002

Summary: When a breakpoint is set, nested virtualization sees "kvm_queue_exception: Assertion `!env->exception_has_payload' failed."
Product: Virtualization
Reporter: Eric Li (lixiaoyi13691419520)
Component: kvm
Assignee: virtualization_kvm
Status: RESOLVED MOVED
Severity: normal
CC: jmattson, lixiaoyi13691419520
Priority: P1
Hardware: Intel
OS: Linux
Kernel Version: 5.17.6-200.fc35.x86_64
Subsystem:
Regression: No
Bisected commit-id:
Attachments: Archive file that contains 1.img and 2.img

Description Eric Li 2022-05-19 23:53:18 UTC
Created attachment 301001
Archive file that contains 1.img and 2.img

One configuration that reproduces this bug:
CPU model: Intel(R) Core(TM) i7-4510U CPU @ 2.00GHz
Host kernel version: 5.17.6-200.fc35.x86_64
Host kernel arch: x86_64
Guest: I am running a microhypervisor called XMHF, which is 32-bit. I am using the microhypervisor to launch LHV, a nested guest OS I wrote myself.
This bug still reproduces with -machine kernel_irqchip=off.
It is impossible to test this bug with -accel tcg, because TCG does not support nested virtualization.

How to reproduce:

This bug happens when the guest is being debugged, so first start GDB:
gdb --ex 'target remote :::1234' --ex 'hb *0' --ex c
The command above simply sets a hardware breakpoint in the guest. The address of the breakpoint (0 in this case) is arbitrary.

Then, in another shell, run QEMU:
qemu-system-i386 -m 512M -gdb tcp::1234 -smp 2 -cpu Haswell,vmx=yes -enable-kvm -serial stdio -drive media=disk,file=1.img,index=1 -drive media=disk,file=2.img,index=2

1.img and 2.img are attached as a.tar.xz in this bug report. If interested, 1.img's source code is https://github.com/lxylxy123456/uberxmhf/tree/a8610d2f9e69263c014b5e48270e42690b73b85d . 2.img's source code is https://github.com/lxylxy123456/uberxmhf/tree/10afe107cbeadb1c4dbe7f9b8e41c2a50c47bda5 .
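
The two steps above can be combined into one script. The following is only a sketch, not part of the original report: the `repro` function and the `DRY_RUN` variable are hypothetical names, the `-batch` flag is added so GDB can run unattended in the background, and 1.img and 2.img are assumed to be already unpacked into the current directory. It also relies on GDB retrying a refused TCP connection for a short time (its `tcp auto-retry` setting), which is what allows GDB to be started before QEMU.

```shell
#!/bin/sh
# Hypothetical wrapper around the two commands from this report.
# DRY_RUN=1 (the default here) only prints the commands; DRY_RUN=0 runs them.

# GDB command from the report, plus -batch (an addition made for this sketch).
GDB_CMD="gdb -batch --ex 'target remote :::1234' --ex 'hb *0' --ex c"

# QEMU command copied from the report; expects 1.img and 2.img in the cwd.
QEMU_CMD="qemu-system-i386 -m 512M -gdb tcp::1234 -smp 2 -cpu Haswell,vmx=yes -enable-kvm -serial stdio -drive media=disk,file=1.img,index=1 -drive media=disk,file=2.img,index=2"

repro() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        # Print the commands, one per line, without running anything.
        printf '%s\n' "$GDB_CMD" "$QEMU_CMD"
        return 0
    fi
    # Start GDB first, in the background; it keeps retrying the connection
    # until QEMU's gdbstub is listening on port 1234.
    eval "$GDB_CMD" </dev/null &
    # Run QEMU in the foreground so the serial output appears on stdio.
    eval "$QEMU_CMD"
}

repro
```

Running the script with `DRY_RUN=0` in the directory that holds the two images should execute the same commands as the manual steps. Leaving out the GDB line gives the control run in which only QEMU runs.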

After running QEMU and GDB as above, XMHF and LHV will print a lot of messages to the serial port:

...
CPU #0: vcpu_vaddr_ptr=0x01e06080, esp=0x01e11000
CPU #1: vcpu_vaddr_ptr=0x01e06540, esp=0x01e15000
BSP(0x00): Rallying APs...
BSP(0x00): APs ready, doing DRTM...
LAPIC base and status=0xfee00900
Sending INIT IPI to all APs...

Then I see an assertion error:

qemu-system-i386: ../target/i386/kvm/kvm.c:645: kvm_queue_exception: Assertion `!env->exception_has_payload' failed.

Expected result: KVM should not crash. The behavior should be the same as when only QEMU runs (i.e., GDB is not running).
Comment 1 Jim Mattson 2022-05-20 02:41:27 UTC
KVM did not crash. Had it crashed, it would have brought your host down with it. The failing assertion is in QEMU. QEMU crashed.
Comment 2 Eric Li 2022-05-29 07:24:35 UTC
Thanks Jim. I think this is more likely a QEMU bug. I have filed https://gitlab.com/qemu-project/qemu/-/issues/1045 . I am marking this bug as resolved now.