Created attachment 107077
serial log of L1 guest

Environment:
------------
Host OS (ia32/ia32e/IA64): ia32e
Guest OS (ia32/ia32e/IA64): ia32e
Guest OS Type (Linux/Windows): Linux
kvm.git next Commit: bf640876e21fe603f7f52b0c27d66b7716da0384
qemu-kvm uq/master Commit: 0779caeb1a17f4d3ed14e2925b36ba09b084fb7b
Host Kernel Version: 3.11.0-rc1
Hardware: SNB-EP

Bug detailed description:
--------------------------
Create an L1 guest with "-cpu host", then create an L2 guest inside it. The L1 kernel prints a call trace (oops) and the L2 guest can't boot up.

Note:
1. When the L1 guest is created with "-cpu qemu64,+vmx", the L2 guest works fine.
2. This should be a kvm bug.

kvm      + qemu-kvm = result
bf640876 + 0779caeb = bad
6d128e1e + 0779caeb = good

The first bad commit is:

commit 21feb4eb64e21f8dc91136b91ee886b978ce6421
Author: Arthur Chunqi Li <yzt356@gmail.com>
Date:   Mon Jul 15 16:04:08 2013 +0800

    KVM: nVMX: Set segment infomation of L1 when L2 exits

Reproduce steps:
----------------
1. Create the L1 guest:
   qemu-system-x86_64 -enable-kvm -m 4G -smp 4 -net nic,macaddr=00:12:46:09:13:56 -net tap,script=/etc/kvm/qemu-ifup nested-kvm.qcow
2. Create the L2 guest (inside L1):
   qemu-system-x86_64 -enable-kvm -m 1024 -smp 2 -net none rhel6u4.img

Current result:
----------------
L1 prints a call trace; L2 can't boot up.
Expected result:
----------------
L1 and L2 work fine.

Basic root-causing log: (in L1 guest)
----------------------
[ 94.585378] BUG: unable to handle kernel NULL pointer dereference at 0000000000000034
[ 94.586002] IP: [<ffffffffa010bb26>] write_segment_descriptor+0x66/0xa0 [kvm]
[ 94.586002] PGD 0
[ 94.586002] Oops: 0000 [#1] SMP
[ 94.586002] Modules linked in: fuse nfsv3 nfs_acl nfsv4 auth_rpcgss nfs fscache dns_resolver lockd sunrpc 8021q garp stp llc binfmt_misc uinput ppdev parport_pc parport kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode pcspkr e1000 i2c_piix4 floppy(F) cirrus(F) ttm(F) drm_kms_helper(F) drm(F) i2c_core(F)
[ 94.586002] CPU 0
[ 94.586002] Pid: 2132, comm: qemu-system-x86 Tainted: GF 3.8.5 #4 Bochs Bochs
[ 94.586002] RIP: 0010:[<ffffffffa010bb26>] [<ffffffffa010bb26>] write_segment_descriptor+0x66/0xa0 [kvm]
[ 94.586002] RSP: 0018:ffff880118d2bac8 EFLAGS: 00010246
[ 94.586002] RAX: 0000000000000000 RBX: ffff880106a79540 RCX: 0000000000000000
[ 94.586002] RDX: 0000000000001000 RSI: 0000000000000009 RDI: 00000000000000a0
[ 94.586002] RBP: ffff880118d2baf8 R08: 0000000000000008 R09: 00000000000000a0
[ 94.586002] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880118d2bb38
[ 94.586002] R13: 0000000000000001 R14: 0000000000000008 R15: 0000000000000008
[ 94.586002] FS: 00007f83d3bb5700(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000
[ 94.586002] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 94.586002] CR2: 0000000000000034 CR3: 00000001069bf000 CR4: 00000000001427f0
[ 94.586002] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 94.586002] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 94.586002] Process qemu-system-x86 (pid: 2132, threadinfo ffff880118d2a000, task ffff8801086e2e80)
[ 94.586002] Stack:
[ 94.586002] 0000000091800027 ffff880106a70000 ffff880106a79540 0000000000000001
[ 94.586002] 0000000000000008 ffff880118d2bb38 ffff880118d2bb78 ffffffffa010be5f
[ 94.586002] ffff880106a79700 0000000800009568 ffff880106a78000 0000000000009188
[ 94.586002] Call Trace:
[ 94.586002] [<ffffffffa010be5f>] load_segment_descriptor+0x2ff/0x330 [kvm]
[ 94.586002] [<ffffffffa010d4c8>] em_jmp_far+0x38/0x70 [kvm]
[ 94.586002] [<ffffffffa0108bd0>] ? check_cr_read+0x40/0x40 [kvm]
[ 94.586002] [<ffffffffa010c211>] x86_emulate_insn+0x261/0x1430 [kvm]
[ 94.586002] [<ffffffffa010d490>] ? em_lldt+0x30/0x30 [kvm]
[ 94.586002] [<ffffffffa00f5ff8>] x86_emulate_instruction+0x98/0x420 [kvm]
[ 94.586002] [<ffffffffa016a72d>] vmx_handle_exit+0x20d/0x780 [kvm_intel]
[ 94.586002] [<ffffffffa0165dfc>] ? vmx_vcpu_run+0x38c/0x5b0 [kvm_intel]
[ 94.586002] [<ffffffffa0110a28>] ? kvm_apic_has_interrupt+0x28/0xd0 [kvm]
[ 94.586002] [<ffffffffa01621b0>] ? vmx_invpcid_supported+0x20/0x20 [kvm_intel]
[ 94.586002] [<ffffffffa00f3982>] kvm_arch_vcpu_ioctl_run+0x8c2/0x1140 [kvm]
[ 94.586002] [<ffffffffa00ef607>] ? kvm_arch_vcpu_load+0x57/0x1e0 [kvm]
[ 94.586002] [<ffffffffa00dfcde>] kvm_vcpu_ioctl+0x37e/0x540 [kvm]
[ 94.586002] [<ffffffff8109a870>] ? __dequeue_entity+0x30/0x50
[ 94.586002] [<ffffffff811b074a>] do_vfs_ioctl+0x9a/0x550
[ 94.586002] [<ffffffff8164a817>] ? __schedule+0x3d7/0x7b0
[ 94.586002] [<ffffffff811b0ca1>] sys_ioctl+0xa1/0xb0
[ 94.586002] [<ffffffff81654659>] system_call_fastpath+0x16/0x1b
[ 94.586002] Code: 41 0f b7 f5 0f b7 55 d0 c1 e6 03 8d 46 07 39 c2 7d 33 41 81 e6 fc ff 00 00 c6 43 28 0d c6 43 29 01 66 44 89 73 2a b8 02 00 00 00 <48> 8b 5d e0 4c 8b 65 e8 4c 8b 6d f0 4c 8b 75 f8 c9 c3 0f 1f 84
[ 94.586002] RIP [<ffffffffa010bb26>] write_segment_descriptor+0x66/0xa0 [kvm]
[ 94.586002] RSP <ffff880118d2bac8>
[ 94.586002] CR2: 0000000000000034
[ 94.637540] ---[ end trace d23673089dd8f566 ]---
[ 94.639336] ------------[ cut here ]------------
[ 94.640276] kernel BUG at arch/x86/kernel/traps.c:643!
[ 94.640276] invalid opcode: 0000 [#2] SMP
[ 94.640276] Modules linked in: fuse nfsv3 nfs_acl nfsv4 auth_rpcgss nfs fscache dns_resolver lockd sunrpc 8021q garp stp llc binfmt_misc uinput ppdev parport_pc parport kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode pcspkr e1000 i2c_piix4 floppy(F) cirrus(F) ttm(F) drm_kms_helper(F) drm(F) i2c_core(F)
[ 94.640276] CPU 0
[ 94.640276] Pid: 2132, comm: qemu-system-x86 Tainted: GF D 3.8.5 #4 Bochs Bochs
[ 94.640276] RIP: 0010:[<ffffffff8164ca16>] [<ffffffff8164ca16>] do_device_not_available+0x16/0x30
[ 94.640276] RSP: 0018:ffffffff81c01d18 EFLAGS: 00010002
[ 94.640276] RAX: 000000008164c301 RBX: 0000000000000001 RCX: ffffffff8164c32c
[ 94.640276] RDX: 00000000ffffffff RSI: 0000000000000000 RDI: ffffffff81c01d28
[ 94.640276] RBP: ffffffff81c01d18 R08: ffff8801086e2ef0 R09: 000000000000e910
[ 94.640276] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801086e2e80
[ 94.640276] R13: ffffffff81c148b0 R14: 0000000000000000 R15: ffff88011fc11980
[ 94.640276] FS: 0000000000000000(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000
[ 94.640276] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 94.640276] CR2: 0000000000000034 CR3: 00000001069bf000 CR4: 00000000001427f0
[ 94.640276] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 94.640276] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 94.640276] Process qemu-system-x86 (pid: 2132, threadinfo ffff880118d2a000, task ffff8801086e2e80)
[ 94.640276] Stack:
[ 94.640276] ffffffff81c01e28 ffffffff8165578e ffff88011fc11980 0000000000000000
[ 94.640276] ffffffff81c148b0 ffff8801086e2e80 ffffffff81c01e28 ffffffff81c14420
[ 94.640276] 0000000000000000 0000000000000000 000000000000e910 ffff8801086e2ef0
[ 94.640276] Call Trace:
[ 94.640276] Code: f0 c9 c3 66 90 48 2d a8 00 00 00 48 89 87 98 00 00 00 eb c8 90 55 48 89 e5 0f 1f 44 00 00 b0 01 84 c0 75 07 e8 9c 83 9c ff c9 c3 <0f> 0b 0f 1f 84 00 00 00 00 00 eb f6 66 66 66 66 66 2e 0f 1f 84
[ 94.640276] RIP [<ffffffff8164ca16>] do_device_not_available+0x16/0x30
[ 94.640276] RSP <ffffffff81c01d18>
[ 94.640276] ---[ end trace d23673089dd8f567 ]---
[ 94.640276] Fixing recursive fault but reboot is needed!
[ 125.299855] ------------[ cut here ]------------
[ 125.299855] WARNING: at kernel/watchdog.c:246 watchdog_overflow_callback+0x98/0xc0()
[ 125.299855] Hardware name: Bochs
[ 125.299855] Watchdog detected hard LOCKUP on cpu 1
[ 125.299855] Modules linked in: fuse nfsv3 nfs_acl nfsv4 auth_rpcgss nfs fscache dns_resolver lockd sunrpc 8021q garp stp llc binfmt_misc uinput ppdev parport_pc parport kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode pcspkr e1000 i2c_piix4 floppy(F) cirrus(F) ttm(F) drm_kms_helper(F) drm(F) i2c_core(F)
[ 125.299855] Pid: 0, comm: swapper/1 Tainted: GF D 3.8.5 #4
[ 125.299855] Call Trace:
[ 125.299855] <NMI> [<ffffffff8106062f>] warn_slowpath_common+0x7f/0xc0
[ 125.299855] [<ffffffff81060726>] warn_slowpath_fmt+0x46/0x50
[ 125.299855] [<ffffffff810f00c8>] watchdog_overflow_callback+0x98/0xc0
[ 125.299855] [<ffffffff8112c0fc>] __perf_event_overflow+0x9c/0x220
[ 125.299855] [<ffffffff8102500a>] ? x86_perf_event_set_period+0xda/0x170
[ 125.299855] [<ffffffff8112c9a4>] perf_event_overflow+0x14/0x20
[ 125.299855] [<ffffffff8102b2a4>] intel_pmu_handle_irq+0x1c4/0x330
[ 125.299855] [<ffffffff8164db61>] perf_event_nmi_handler+0x21/0x30
[ 125.299855] [<ffffffff8164d2fa>] nmi_handle+0x5a/0x80
[ 125.299855] [<ffffffff8164d41d>] do_nmi+0xfd/0x360
[ 125.299855] [<ffffffff8164c901>] end_repeat_nmi+0x1e/0x2e
[ 125.299855] [<ffffffff8164bfa2>] ? _raw_spin_lock+0x22/0x30
[ 125.299855] [<ffffffff8164bfa2>] ? _raw_spin_lock+0x22/0x30
[ 125.299855] [<ffffffff8164bfa2>] ? _raw_spin_lock+0x22/0x30
[ 125.299855] <<EOE>> <IRQ> [<ffffffff810a43f5>] sched_rt_period_timer+0x105/0x320
[ 125.299855] [<ffffffff81088ec0>] __run_hrtimer+0x70/0x1d0
[ 125.299855] [<ffffffff810a42f0>] ? enqueue_rt_entity+0x80/0x80
[ 125.299855] [<ffffffff81089296>] hrtimer_interrupt+0xf6/0x230
[ 125.299855] [<ffffffff816562d9>] smp_apic_timer_interrupt+0x69/0x99
[ 125.299855] [<ffffffff8165521d>] apic_timer_interrupt+0x6d/0x80
[ 125.299855] <EOI> [<ffffffff81045606>] ? native_safe_halt+0x6/0x10
[ 125.299855] [<ffffffff8101d5cf>] default_idle+0x4f/0x1a0
[ 125.299855] [<ffffffff8101ce99>] cpu_idle+0xd9/0x120
[ 125.299855] [<ffffffff81644245>] start_secondary+0x24c/0x24e
[ 125.299855] ---[ end trace d23673089dd8f568 ]---
The following commit fixed the bug:

commit 205befd9a5c701b56f569434045821f413f08f6d
Author: Gleb Natapov <gleb@redhat.com>
Date:   Sun Aug 4 15:08:06 2013 +0300

    KVM: nVMX: correctly set tr base on nested vmexit emulation

    After commit 21feb4eb64e21f8dc91136b91ee886b978ce6421 tr base is zeroed
    during vmexit. Set it to L1's HOST_TR_BASE. This should fix
    https://bugzilla.kernel.org/show_bug.cgi?id=60679

    Reported-by: Yongjie Ren <yongjie.ren@intel.com>
    Reviewed-by: Arthur Chunqi Li <yzt356@gmail.com>
    Tested-by: Yongjie Ren <yongjie.ren@intel.com>
    Signed-off-by: Gleb Natapov <gleb@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
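
For readers who want the idea behind the fix without reading the kernel source, here is a toy model in Python. It is only a sketch of what the commit message describes and is not the kernel code: the class and function names (`Segment`, `Vmcs12Host`, `tr_after_vmexit_buggy`, `tr_after_vmexit_fixed`) and the example selector and base values are made up for illustration. The only details taken from the report are that the TR base ended up zeroed on nested vmexit and that the fix sets it from L1's HOST_TR_BASE. The limit 0x67 and type 11 (busy TSS) are the values the Intel SDM gives for TR after a VM exit.

```python
from dataclasses import dataclass

# Values the Intel SDM specifies for TR after a VM exit.
TSS_LIMIT = 0x67
TSS_TYPE_BUSY = 11


@dataclass
class Segment:
    """Hypothetical stand-in for a cached segment register."""
    selector: int
    base: int
    limit: int
    type: int
    present: bool


@dataclass
class Vmcs12Host:
    """Hypothetical subset of the host-state fields L1 wrote into its VMCS."""
    host_tr_selector: int
    host_tr_base: int


def tr_after_vmexit_buggy(vmcs12: Vmcs12Host) -> Segment:
    # Assumed model of the regression: TR is rebuilt for L1 on the
    # emulated vmexit, but its base is left at zero.
    return Segment(vmcs12.host_tr_selector, 0, TSS_LIMIT, TSS_TYPE_BUSY, True)


def tr_after_vmexit_fixed(vmcs12: Vmcs12Host) -> Segment:
    # Model of the fix: the base comes from L1's HOST_TR_BASE field.
    return Segment(vmcs12.host_tr_selector, vmcs12.host_tr_base,
                   TSS_LIMIT, TSS_TYPE_BUSY, True)


if __name__ == "__main__":
    # Example values are arbitrary, chosen only for the demonstration.
    l1_host = Vmcs12Host(host_tr_selector=0x40, host_tr_base=0xFFFF88011FC04000)
    print("buggy TR base:", hex(tr_after_vmexit_buggy(l1_host).base))
    print("fixed TR base:", hex(tr_after_vmexit_fixed(l1_host).base))
```

In the buggy variant L1 resumes with a TR that no longer points at its TSS, which is consistent with L1 misbehaving once an L2 guest has run. How exactly that leads to the oops in write_segment_descriptor is not established by this report.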