Bug 213781
| Summary: | KVM: x86/svm: The guest (#vcpu>1) can't boot up with QEMU "-overcommit cpu-pm=on" | | |
|---|---|---|---|
| Product: | Virtualization | Reporter: | Like Xu (like.xu.linux) |
| Component: | kvm | Assignee: | virtualization_kvm |
| Status: | NEW --- | | |
| Severity: | blocking | CC: | maximlevitsky |
| Priority: | P1 | | |
| Hardware: | x86-64 | | |
| OS: | Linux | | |
| Kernel Version: | 5.19.0-rc1+ | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
Description
Like Xu
2021-07-19 10:08:44 UTC
I sadly know exactly why this happens, and yes, this commit is technically to blame. But the root cause is the non-atomic memslot updates that QEMU does. I hope it will be fixed one way or another.

Hi Maxim,

Do we have any updates on this issue? Can you provide more details about the "non-atomic memslot updates made by QEMU" so I can try to fix it?

For all practical purposes you can just revert this commit. The fix for the root cause is not simple, and I will work on it when I get to it.

The issue still exists on AMD after we revert commit 31c25585695a.

Just confirmed that it's caused by non-atomic accesses to the memslot:
- __do_insn_fetch_bytes() from the prot32 code page #NPF;
- kvm_vm_ioctl_set_memory_region() from user space;

Considering that the expected result of [selftests::test_zero_memory_regions on x86_64] is that the guest triggers an internal KVM error, because the initial code fetch encounters a non-existent memslot and emulation fails, more similar cases will gradually emerge.

I'm not sure whether KVM has documentation pointing out this restriction on memslot updates (fixing only one application, QEMU, may be one-sided), or whether there is any need to add something unwise like a gfn_to_memslot(kvm, gpa_to_gfn(cr2_or_gpa)) check in x86_emulate_instruction().

Any other suggestions?

On Wed, 2022-06-22 at 12:49 +0000, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=213781
>
> Like Xu (like.xu.linux@gmail.com) changed:
>
>            What |Removed     |Added
> ----------------------------------------------------------------------------
>  Kernel Version|5.14.0-rc1+ |5.19.0-rc1+
>
> --- Comment #4 from Like Xu (like.xu.linux@gmail.com) ---
> The issue still exists on AMD after we revert commit 31c25585695a.
>
> Just confirmed that it's caused by non-atomic accesses to the memslot:
> - __do_insn_fetch_bytes() from the prot32 code page #NPF;
> - kvm_vm_ioctl_set_memory_region() from user space;
>
> Considering that the expected result of
> [selftests::test_zero_memory_regions on x86_64] is that the guest triggers
> an internal KVM error, because the initial code fetch encounters a
> non-existent memslot and emulation fails, more similar cases will
> gradually emerge.
>
> I'm not sure whether KVM has documentation pointing out this restriction
> on memslot updates (fixing only one application, QEMU, may be one-sided),
> or whether there is any need to add something unwise like a
> gfn_to_memslot(kvm, gpa_to_gfn(cr2_or_gpa)) check in
> x86_emulate_instruction().
>
> Any other suggestions?
>

Yep, agree. This has to be fixed at both the QEMU and KVM level: KVM needs a new API to atomically upload a set of memslot changes (the easy part), and QEMU needs code to batch the memslot updates when it does SMM-related memslot updates.

Best regards,
Maxim Levitsky
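To illustrate the race discussed above, here is a minimal single-threaded sketch in C. It is a toy model only: `toy_vm`, `toy_slot`, `toy_apply_one_by_one()`, `toy_apply_batch()` and the probe helper are hypothetical names invented for this example and are not KVM or QEMU code. Each change in the one-by-one path stands in for one memslot update from user space, and the observer callback stands in for a vCPU instruction fetch that happens to run between two updates. The batched path shows the general copy-then-publish idea behind an atomic set of changes; it is not the API Maxim has in mind, which the thread does not specify.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_SLOTS 8

/* Toy stand-in for a memslot: guest frames [base_gfn, base_gfn + npages). */
struct toy_slot {
    uint64_t base_gfn;
    uint64_t npages;              /* 0 means "slot not present" */
};

struct toy_slots {
    struct toy_slot slot[MAX_SLOTS];
};

/* Hypothetical VM: lookups always go through 'active', as a vCPU would. */
struct toy_vm {
    struct toy_slots tables[2];
    struct toy_slots *active;
};

struct toy_change {
    size_t id;
    uint64_t base_gfn;
    uint64_t npages;              /* 0 deletes the slot */
};

/* Observer: models a vCPU access that runs between two update steps. */
typedef void (*observer_fn)(const struct toy_vm *vm, void *arg);

struct toy_probe {
    uint64_t gfn;
    unsigned misses;
    unsigned hits;
};

void toy_vm_init(struct toy_vm *vm)
{
    memset(vm, 0, sizeof(*vm));
    vm->active = &vm->tables[0];
}

const struct toy_slot *toy_lookup(const struct toy_vm *vm, uint64_t gfn)
{
    for (size_t i = 0; i < MAX_SLOTS; i++) {
        const struct toy_slot *s = &vm->active->slot[i];
        if (s->npages && gfn >= s->base_gfn && gfn - s->base_gfn < s->npages)
            return s;
    }
    return NULL;                  /* models "no memslot for this gfn" */
}

void toy_probe_observe(const struct toy_vm *vm, void *arg)
{
    struct toy_probe *p = arg;
    if (toy_lookup(vm, p->gfn))
        p->hits++;
    else
        p->misses++;
}

/* Non-atomic path: every change is visible as soon as it is made. */
void toy_apply_one_by_one(struct toy_vm *vm, const struct toy_change *c,
                          size_t n, observer_fn observe, void *arg)
{
    for (size_t i = 0; i < n; i++) {
        assert(c[i].id < MAX_SLOTS);
        vm->active->slot[c[i].id].base_gfn = c[i].base_gfn;
        vm->active->slot[c[i].id].npages = c[i].npages;
        if (observe)
            observe(vm, arg);     /* may see a half-finished update */
    }
}

/*
 * Batched path: edit a private copy, then publish it with one pointer
 * store. This toy is single-threaded; real concurrent code would also
 * need proper publication and a grace period before reusing the old table.
 */
void toy_apply_batch(struct toy_vm *vm, const struct toy_change *c,
                     size_t n, observer_fn observe, void *arg)
{
    struct toy_slots *next = (vm->active == &vm->tables[0])
                                 ? &vm->tables[1] : &vm->tables[0];
    *next = *vm->active;
    for (size_t i = 0; i < n; i++) {
        assert(c[i].id < MAX_SLOTS);
        next->slot[c[i].id].base_gfn = c[i].base_gfn;
        next->slot[c[i].id].npages = c[i].npages;
        if (observe)
            observe(vm, arg);     /* still sees the old, complete table */
    }
    vm->active = next;
}
```

With a "delete slot 0, then re-create slot 0" sequence, a probe inside the slot's range records one miss on the one-by-one path (the window between the two steps) and no misses on the batched path. The miss corresponds to the situation described in the thread, where a code fetch lands on a gfn that temporarily has no memslot.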