Bug 72381 - [Nested] L1 call trace when create windows 7 guest as L2 guest.
Alias: None
Product: Virtualization
Classification: Unclassified
Component: kvm
Hardware: All Linux
Importance: P1 normal
Assignee: virtualization_kvm
Depends on:
Reported: 2014-03-18 07:43 UTC by Robert Ho
Modified: 2014-08-08 05:38 UTC

See Also:
Kernel Version: 3.14.0-rc3
Tree: Mainline
Regression: No

L1 serial (80.03 KB, text/plain)
2014-03-18 07:43 UTC, Robert Ho

Description Robert Ho 2014-03-18 07:43:26 UTC
Created attachment 129911
L1 serial

Host OS (ia32/ia32e/IA64): ia32e
Guest OS (ia32/ia32e/IA64): ia32e
Guest OS Type (Linux/Windows): linux
kvm.git Commit: 94b3ffcd41a90d2cb0b32ca23aa58a01111d5dc0
qemu.git Commit: 087edb503afebf184f07078900efc26c73035e98
Host Kernel Version: 3.14.0-rc3

Bug detailed description:
When a Windows 7 guest is created as the L2 guest, the L2 guest fails to boot and the L1 guest prints a call trace. Pinging the L1 guest still succeeds, but SSH to the L1 guest fails.

When a rhel6u4 guest is created as the L2 guest, both the L1 and L2 guests work fine.

Reproduce steps:
1. Create the L1 guest:
qemu-system-x86_64 -enable-kvm -m 6G -smp 4 -net nic,macaddr=00:12:52:13:46:67 -net tap,script=/etc/kvm/qemu-ifup ia32e_nested_kvm.img -cpu host,level=9
2. Create the L2 guest:
qemu-system-x86_64 -enable-kvm -m 1G -smp 2 -net none ia32e_win7.img
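Step 2 runs KVM inside the L1 guest, which presupposes that nested virtualization is enabled on the L0 host. The report does not show how that was verified. The following is a minimal Python sketch, assuming an Intel host where the kvm_intel module exposes its `nested` parameter under sysfs; the helper names are hypothetical and not part of the original report.

```python
from pathlib import Path
from typing import Optional

# Assumption: Intel host with kvm_intel loaded. This sysfs file is not
# mentioned in the report itself.
NESTED_PARAM = Path("/sys/module/kvm_intel/parameters/nested")


def parse_nested_flag(text: str) -> bool:
    """Interpret the module parameter value ("Y"/"N" or "1"/"0")."""
    return text.strip().upper() in {"Y", "1"}


def read_nested_flag(path: Path = NESTED_PARAM) -> Optional[bool]:
    """Return True/False for the flag, or None if the file is unreadable."""
    try:
        return parse_nested_flag(path.read_text())
    except OSError:
        return None
```

Calling `read_nested_flag()` on L0 before step 1 should return `True` on a host prepared for this test; `None` suggests that kvm_intel is not loaded.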

Current result:
L1 call trace

Expected result:
L1 and L2 guests boot up fine.

Basic root-causing log:
sending NMI to all CPUs:
NMI backtrace for cpu 3
CPU: 3 PID: 4186 Comm: qemu-system-x86 Not tainted 3.12.0-rc5 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff8801b890c040 ti: ffff8801b77b8000 task.ti: ffff8801b77b8000
RIP: 0010:[<ffffffff8145dac4>]  [<ffffffff8145dac4>] _raw_spin_lock+0x20/0x24
RSP: 0018:ffff8801b77b9bc8  EFLAGS: 00000293
RAX: 0000000000006e6b RBX: ffff8801b8170080 RCX: 000000000018a337
RDX: 000000000000006e RSI: 0000000000000000 RDI: ffff8801b77dc000
RBP: ffff8801b77b9bc8 R08: 0000000000000007 R09: ffff8801b77b9c20
R10: 0000000000002900 R11: 0000000000002931 R12: 0000000000000000
R13: 000000001278f000 R14: 0000000000000000 R15: 0000000000000000
FS:  00007ff63bfff700(0000) GS:ffff8801bfd80000(0000) knlGS:fffff880009e6000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: fffff8a0000e7000 CR3: 00000000b1e6d000 CR4: 00000000000026e0
 ffff8801b77b9c58 ffffffffa01c8cb3 ffff8801b77b9c27 0000000000000001
 ffff8801b77b9bf8 00ffffffa022d1bf 0000000000001b66 00000002a01b481b
 000000000001278f 000000000018a337 00000001b77b9c88 01ffffff00000000
Call Trace:
 [<ffffffffa01c8cb3>] tdp_page_fault+0x146/0x1dc [kvm]
 [<ffffffffa01c9024>] kvm_mmu_page_fault+0x22/0xc5 [kvm]
 [<ffffffffa022f10c>] handle_ept_violation+0x13d/0x149 [kvm_intel]
 [<ffffffffa02314c9>] vmx_handle_exit+0x171/0x193 [kvm_intel]
 [<ffffffffa022b14e>] ? vmx_invpcid_supported+0x18/0x18 [kvm_intel]
 [<ffffffffa01bc5ea>] vcpu_enter_guest+0x65a/0x696 [kvm]
 [<ffffffff81064030>] ? __cond_resched+0x25/0x30
 [<ffffffffa01bc6cb>] __vcpu_run+0xa5/0x262 [kvm]
 [<ffffffffa01c015c>] kvm_arch_vcpu_ioctl_run+0xef/0x1ac [kvm]
 [<ffffffffa01af04f>] kvm_vcpu_ioctl+0x121/0x4b1 [kvm]
 [<ffffffff81085b3d>] ? futex_wake+0xeb/0xfd
 [<ffffffff81464530>] ? ret_from_fork+0xb0/0xb0
 [<ffffffff8112a5a1>] do_vfs_ioctl+0x2ad/0x2c9
 [<ffffffffa01ba29b>] ? kvm_on_user_return+0x4f/0x51 [kvm]
 [<ffffffff8112a616>] SyS_ioctl+0x59/0x7d
 [<ffffffff814645a2>] system_call_fastpath+0x16/0x1b
Code: 07 38 d0 74 04 f3 90 eb f6 c9 c3 55 48 89 e5 b8 00 01 00 00 f0 66 0f c1 07 89 c2 66 c1 ea 08 38 c2 74 0a 8a 07 38 d0 74 04 f3 90 <eb> f6 c9 c3 55 48 89 e5 9c 58 fa ba 00 01 00 00 f0 66 0f c1 17
Comment 1 Paolo Bonzini 2014-03-24 17:38:41 UTC
What guest kernel is this?  Can you try pairing 3.13.6 with 94b3ffcd (first 3.13.6 in the host, then 3.13.6 in the guest)?
Comment 2 Robert Ho 2014-03-26 08:38:24 UTC
Tried this case: L0 94b3ffcd + L1 94b3ffcd; L1 panics.
Comment 3 Zhou, Chao 2014-08-08 05:35:10 UTC
kvm.git + qemu.git:c77dcacb_69f87f71
kernel version: 3.16.0
Tested on Romley_EP: when a Windows 7 guest is created as the L2 guest, both L2 and L1 work fine.
Comment 4 Zhou, Chao 2014-08-08 05:36:00 UTC
This commit fixed the bug:
commit 9242b5b60df8b13b469bc6b7be08ff6ebb551ad3
Author: Bandan Das <bsd@redhat.com>
Date:   Tue Jul 8 00:30:23 2014 -0400

    KVM: x86: Check for nested events if there is an injectable interrupt

    With commit b6b8a1451fc40412c57d1 that introduced
    vmx_check_nested_events, checks for injectable interrupts happen
    at different points in time for L1 and L2 that could potentially
    cause a race. The regression occurs because KVM_REQ_EVENT is always
    set when nested_run_pending is set even if there's no pending interrupt.
    Consequently, there could be a small window when check_nested_events
    returns without exiting to L1, but an interrupt comes through soon
    after and it incorrectly, gets injected to L2 by inject_pending_event
    Fix this by adding a call to check for nested events too when a check
    for injectable interrupt returns true

    Signed-off-by: Bandan Das <bsd@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
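
The race described in the commit message can be illustrated with a toy model. This is only a sketch in Python, not kernel code: `ToyVcpu`, the boolean flags, and the simplified `check_nested_events` and `inject_pending_event` functions are hypothetical stand-ins that borrow the names from the commit message. The assumption is that an interrupt arrives after the first nested-event check but before the injection decision.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyVcpu:
    in_l2: bool = True          # vCPU is currently running the L2 guest
    irq_pending: bool = False   # an interrupt meant for L1 is pending


def check_nested_events(vcpu: ToyVcpu) -> None:
    """Toy stand-in: a pending interrupt forces an exit from L2 to L1."""
    if vcpu.in_l2 and vcpu.irq_pending:
        vcpu.in_l2 = False


def inject_pending_event(vcpu: ToyVcpu, irq_in_window: bool,
                         recheck: bool) -> Optional[str]:
    """Return which level receives the interrupt, or None if there is none."""
    check_nested_events(vcpu)        # early check: nothing pending yet
    if irq_in_window:
        vcpu.irq_pending = True      # interrupt arrives after that check
    if not vcpu.irq_pending:
        return None
    if recheck:
        check_nested_events(vcpu)    # the extra check described in the fix
    vcpu.irq_pending = False
    return "L2" if vcpu.in_l2 else "L1"
```

In this model, `inject_pending_event(ToyVcpu(), irq_in_window=True, recheck=False)` delivers the interrupt to L2, which is the incorrect behaviour the commit message describes, while the same call with `recheck=True` exits to L1 first.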
