Created attachment 185241
Test program (C99)

Amusingly enough, I found this while trying to come up with a minimal test program for #103131.

Running ioctl(KVM_CREATE_VCPU) _after_ ioctl(KVM_SET_USER_MEMORY_REGION) with certain address/size combinations may generate a null pointer dereference.

dmesg after running the test program:

[11557.519426] BUG: unable to handle kernel NULL pointer dereference at 000000000000005f
[11557.520561] IP: [<ffffffffa045b2f5>] vmx_fpu_activate+0x5/0x20 [kvm_intel]
[11557.521716] PGD 13841a067 PUD 13857c067 PMD 0
[11557.522891] Oops: 0000 [#25] PREEMPT SMP
[11557.524073] Modules linked in: [REDACTED]
[11557.534572] CPU: 5 PID: 4295 Comm: tcc Tainted: P D O 4.1.5-1-ARCH #1
[11557.536451] Hardware name: [REDACTED]
[11557.538361] task: ffff880068425180 ti: ffff880138784000 task.ti: ffff880138784000
[11557.540331] RIP: 0010:[<ffffffffa045b2f5>] [<ffffffffa045b2f5>] vmx_fpu_activate+0x5/0x20 [kvm_intel]
[11557.542367] RSP: 0018:ffff880138787da0 EFLAGS: 00010292
[11557.544411] RAX: ffffffffa0476160 RBX: ffffffffffffffef RCX: 0000000000000000
[11557.546476] RDX: 0000000000001f85 RSI: ffff88014b15e8b0 RDI: ffffffffffffffef
[11557.548553] RBP: ffff880138787db8 R08: 000000000001e8b0 R09: ffffffffa045cbf3
[11557.550605] R10: ffffea00027eee00 R11: ffff88014b157348 R12: 0000000000000000
[11557.552637] R13: 0000000000000000 R14: 000000000000ae41 R15: 0000000000000000
[11557.554691] FS: 00007fba3936d700(0000) GS:ffff88014b140000(0000) knlGS:0000000000000000
[11557.556796] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[11557.558914] CR2: 000000000000005f CR3: 000000013857d000 CR4: 00000000000426e0
[11557.561092] Stack:
[11557.563213] ffffffffa03deaf1 0000000000000000 ffff8800a52fc000 ffff880138787e78
[11557.565412] ffffffffa03ca6d8 ffff880138787de8 ffffffff81175b5b ffff88011edffb80
[11557.567650] 0000000000000000 00000000fffbc000 0000000000044000 00007fba39371000
[11557.569906] Call Trace:
[11557.572169] [<ffffffffa03deaf1>] ? kvm_arch_vcpu_create+0x51/0x70 [kvm]
[11557.574476] [<ffffffffa03ca6d8>] kvm_vm_ioctl+0x1c8/0x7a0 [kvm]
[11557.576773] [<ffffffff81175b5b>] ? lru_cache_add_active_or_unevictable+0x2b/0xb0
[11557.579118] [<ffffffff811f4646>] do_vfs_ioctl+0x2c6/0x4d0
[11557.581470] [<ffffffff811f48d1>] SyS_ioctl+0x81/0xa0
[11557.583841] [<ffffffff8158bf2e>] system_call_fastpath+0x12/0x71
[11557.586265] Code: 00 e8 20 bf ff ff 5b 41 5c 5d c3 0f 1f 00 48 8b 05 31 85 fc ff ff 90 b8 00 00 00 eb 87 66 0f 1f 84 00 00 00 00 00 66 66 66 66 90 <8b> 47 70 85 c0 75 0a 55 48 89 e5 e8 3b ff ff ff 5d f3 c3 0f 1f
[11557.592112] RIP [<ffffffffa045b2f5>] vmx_fpu_activate+0x5/0x20 [kvm_intel]
[11557.594990] RSP <ffff880138787da0>
[11557.597859] CR2: 000000000000005f
[11557.600786] ---[ end trace b28b93d27b3449c9 ]---

When I move ioctl(KVM_CREATE_VCPU) immediately below ioctl(KVM_CREATE_VM) there is no oops, but a later KVM_RUN exits with KVM_EXIT_INTERNAL_ERROR, subcode KVM_INTERNAL_ERROR_EMULATION.

The crashes also stop when I decrease umr.memory_size below what I specified in the attached test program.
The below commit can fix it.

commit 370777daab3f024f1645177039955088e2e9ae73
Author: Radim Krčmář <rkrcmar@redhat.com>
Date:   Fri Jul 3 15:49:28 2015 +0200

    KVM: VMX: fix vmwrite to invalid VMCS

    fpu_activate is called outside of vcpu_load(), which means it should not
    touch VMCS, but fpu_activate needs to. Avoid the call by moving it to a
    point where we know that the guest needs eager FPU and VMCS is loaded.

    This will get rid of the following trace

     vmwrite error: reg 6800 value 0 (err 1)
      [<ffffffff8162035b>] dump_stack+0x19/0x1b
      [<ffffffffa046c701>] vmwrite_error+0x2c/0x2e [kvm_intel]
      [<ffffffffa045f26f>] vmcs_writel+0x1f/0x30 [kvm_intel]
      [<ffffffffa04617e5>] vmx_fpu_activate.part.61+0x45/0xb0 [kvm_intel]
      [<ffffffffa0461865>] vmx_fpu_activate+0x15/0x20 [kvm_intel]
      [<ffffffffa0560b91>] kvm_arch_vcpu_create+0x51/0x70 [kvm]
      [<ffffffffa0548011>] kvm_vm_ioctl+0x1c1/0x760 [kvm]
      [<ffffffff8118b55a>] ? handle_mm_fault+0x49a/0xec0
      [<ffffffff811e47d5>] do_vfs_ioctl+0x2e5/0x4c0
      [<ffffffff8127abbe>] ? file_has_perm+0xae/0xc0
      [<ffffffff811e4a51>] SyS_ioctl+0xa1/0xc0
      [<ffffffff81630949>] system_call_fastpath+0x16/0x1b

    (Note: we also unconditionally activate FPU in vmx_vcpu_reset(), so the
    removed code added nothing.)

    Fixes: c447e76b4cab ("kvm/fpu: Enable eager restore kvm FPU for MPX")
    Cc: <stable@vger.kernel.org>
    Reported-by: Vlastimil Holer <vlastimil.holer@gmail.com>
    Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Created attachment 185681
Test program 2 (C99)

You mean "can" as in "I think it does" or "it did for me"?

And anyway, it seems to fix only the most proximate cause of the crash. My biggest worry is that KVM_SET_USER_MEMORY_REGION ioctls with guest_phys_addr in roughly the 0xfff00000 to 0xffff0000 range seem not to "register"; when the VM starts, it behaves as if the region had never been placed there.

I attach test program 2. Running it on my system with 0x44000 as the argument outputs "halted" (as expected), but 0x45000 and larger multiples of 0x1000 give "internal error, subcode 1".
Created attachment 185691
Test program 2 (C99) [non-oopsing version]
We hit the same issue with kernel 3.18.19. After some debugging, I see that the first test program that felix attached causes kvm_x86_ops->vcpu_create to return -EEXIST (as an error pointer) instead of a valid vcpu pointer. As a result, the call to kvm_x86_ops->fpu_activate dereferences that invalid pointer, which produces the NULL pointer dereference oops.

The suggested fix was delivered in kernel 4.2. Although it was tagged for "stable", I don't see that it was backported to earlier kernels.

As I understand it, the fix targets a different issue, in which the vcpu pointer is valid but a later VMCS write fails. Of course, it also addresses the issue that felix reported, because it removes the fpu_activate call. For the latter alone, though, a simpler fix would suffice:

--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7012,20 +7012,24 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
 						unsigned int id)
 {
 	struct kvm_vcpu *vcpu;
 
 	if (check_tsc_unstable() && atomic_read(&kvm->online_vcpus) != 0)
 		printk_once(KERN_WARNING
 		"kvm: SMP vm created on host with unstable TSC; "
 		"guest TSC will not be reliable\n");
 
 	vcpu = kvm_x86_ops->vcpu_create(kvm, id);
+	if (IS_ERR(vcpu)) {
+		pr_err("kvm_x86_ops->vcpu_create id=%u err=%ld\n", id, PTR_ERR(vcpu));
+		return vcpu;
+	}
 
 	/*
 	 * Activate fpu unconditionally in case the guest needs eager FPU.  It will be
 	 * deactivated soon if it doesn't.
 	 */
 	kvm_x86_ops->fpu_activate(vcpu);
 	return vcpu;
 }
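
For anyone who wants to check the -EEXIST reading against the oops in the first comment, here is a small userspace sketch. It is not kernel code: err_ptr(), ptr_err(), is_err() and field_address() are hypothetical stand-ins I wrote to mimic ERR_PTR(), PTR_ERR() and IS_ERR() from include/linux/err.h, and it assumes a 64-bit build. EEXIST is 17 on Linux, and the 0x70 offset is taken from the faulting instruction bytes "<8b> 47 70" in the Code: line (a load from 0x70(%rdi)); which structure field lives at that offset is not something I have verified.

```c
#include <stdbool.h>
#include <stdint.h>

/* Largest errno value the kernel encodes in a pointer (include/linux/err.h). */
#define MAX_ERRNO 4095

/* Userspace stand-in for ERR_PTR(): store a negative errno in a pointer. */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Userspace stand-in for PTR_ERR(): recover the errno from the pointer. */
static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Userspace stand-in for IS_ERR(): true for the top MAX_ERRNO addresses. */
static inline bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/*
 * Address that gets touched when a field at `offset` is read through `base`.
 * Unsigned arithmetic wraps, just as the CPU's address calculation does.
 */
static inline uint64_t field_address(const void *base, uint64_t offset)
{
	return (uint64_t)(uintptr_t)base + offset;
}
```

With these helpers, err_ptr(-17) is 0xffffffffffffffef, the value in RBX and RDI in the oops, and field_address(err_ptr(-17), 0x70) wraps around to 0x5f, the address reported in CR2. So the "NULL pointer dereference" is really a dereference of an unchecked error pointer, which is what the IS_ERR() check in the diff above guards against.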