53681 – nVMX: Rare crash on shadow-on-shadow case

Summary: nVMX: Rare crash on shadow-on-shadow case

Status:	NEW

Alias:	None

Product:	Virtualization
Classification:	Unclassified
Component:	kvm
Hardware:	All Linux

Importance:	P1 low
Assignee:	virtualization_kvm

URL:
Keywords:

Depends on:
Blocks:	94971 53601

Reported:	2013-02-12 08:24 UTC by Nadav Har'El
Modified:	2015-03-17 03:53 UTC
CC List:	0 users

See Also:
Kernel Version:
Subsystem:
Regression:	No
Bisected commit-id:

Description Nadav Har'El 2013-02-12 08:24:20 UTC

I ran the following nested VMX stress test (on an April 2011 codebase, so this bug needs to be verified again!): L0 and L1 both run KVM, and L0, L1 and L2 all run Ubuntu. L0 has 16 hardware threads and runs a parallel compilation ("make -j16") in a loop. L1 and L2 each get one vcpu and run "make -j3". This test is especially heavy on context switches (which happen at all levels) and on memory management (since each of the many processes has its own page tables).
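For anyone re-verifying this, the compile loop at each level can be sketched roughly as below. This is only an illustration, not the script used for the original test: the helper name compile_loop, the optional clean step between builds, and the stop-on-first-failure behavior are all assumptions.

```python
import subprocess
import sys


def compile_loop(build_cmd, clean_cmd=None, iterations=None, cwd=None):
    """Run build_cmd repeatedly and return how many builds succeeded.

    Hypothetical helper for illustration only.
    iterations=None keeps looping until a build fails.
    """
    done = 0
    while iterations is None or done < iterations:
        if clean_cmd is not None:
            # Assumed step: clean first so every pass does a full rebuild.
            subprocess.run(clean_cmd, cwd=cwd, check=False)
        result = subprocess.run(build_cmd, cwd=cwd)
        if result.returncode != 0:
            break  # stop at the first failed build
        done += 1
    return done


# Assumed usage, run from the top of some source tree:
#   L0:        compile_loop(["make", "-j16"], clean_cmd=["make", "clean"])
#   L1 and L2: compile_loop(["make", "-j3"], clean_cmd=["make", "clean"])
```

Which source tree is compiled is not specified above, so the working directory is left to the caller.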

With the default nested MMU virtualization, shadow-on-EPT, things appear to work fine, and this stress test runs happily for 24 hours without incident.

However, with the non-recommended, slower shadow-on-shadow mode (i.e., ept=0 in L0), L0 suddenly died after a couple of hours of successful compilation, with the following oops:


BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
IP: [<ffffffffa0015414>] mark_unsync+0x0/0x2a [kvm]
PGD 1746df067 PUD 174f39067 PMD 0 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/devices/system/cpu/cpu9/cpufreq/scaling_governor
CPU 15 
Modules linked in: kvm_intel kvm [last unloaded: kvm]

Pid: 3353, comm: qemu-system-x86 Tainted: G    B       2.6.37mx-66117-gb966170 #234 49Y6498     /IBM System x -[794692G]-
RIP: 0010:[<ffffffffa0015414>]  [<ffffffffa0015414>] mark_unsync+0x0/0x2a [kvm]
RSP: 0018:ffff880101131760  EFLAGS: 00010256
RAX: 0000000000000000 RBX: ffff880171ce87c0 RCX: 0000000000000001
RDX: 0000000000000001 RSI: ffff880000000ff7 RDI: 0000000000000000
RBP: ffff880101131798 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000000000000 R11: ffffea0000000000 R12: 0000000000000008
R13: ffffea0000000000 R14: ffff880171ce8798 R15: ffff880000000ff7
FS:  00007fabf2b02910(0000) GS:ffff88007d5e0000(0000) knlGS:ffffffff80872980
CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 0000000000000030 CR3: 000000017a59a000 CR4: 00000000000026f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process qemu-system-x86 (pid: 3353, threadinfo ffff880101130000, task ffff88007de87080)
Stack:
 ffffffffa0014aae ffff8801011317c8 ffff88006a9ea130 ffff880162618040
 ffff880076373068 0000000000056a0d 800000010b838203 ffff8801011317a8
 ffffffffa001543c ffff8801011317e8 ffffffffa0014a72 ffff8801011317c8
Call Trace:
 [<ffffffffa0014aae>] ? T.927+0x84/0xae [kvm]
 [<ffffffffa001543c>] mark_unsync+0x28/0x2a [kvm]
 [<ffffffffa0014a72>] T.927+0x48/0xae [kvm]
 [<ffffffffa001543c>] mark_unsync+0x28/0x2a [kvm]
 [<ffffffffa0014a72>] T.927+0x48/0xae [kvm]
 [<ffffffffa00156bd>] set_spte+0x27f/0x349 [kvm]
 [<ffffffffa0015882>] mmu_set_spte+0xfb/0x328 [kvm]
 [<ffffffffa0015c5f>] __direct_pte_prefetch+0x1b0/0x1ff [kvm]
 [<ffffffffa0011954>] ? gfn_to_rmap+0x12/0x4d [kvm]
 [<ffffffffa0017473>] paging64_page_fault+0x450/0x6b3 [kvm]
 [<ffffffffa00141fd>] kvm_mmu_page_fault+0x24/0x7f [kvm]
 [<ffffffffa0c3d6b4>] handle_exception+0x19f/0x31f [kvm_intel]
 [<ffffffffa000167d>] ? kvm_vcpu_block+0x31/0xa9 [kvm]
 [<ffffffffa0c40745>] vmx_handle_exit+0x5e4/0x613 [kvm_intel]
 [<ffffffffa000e698>] kvm_arch_vcpu_ioctl_run+0xa13/0xd92 [kvm]
 [<ffffffffa000e5fe>] ? kvm_arch_vcpu_ioctl_run+0x979/0xd92 [kvm]
 [<ffffffffa0c3eda6>] ? vmx_vcpu_load+0x2e/0x180 [kvm_intel]
 [<ffffffffa000d3d0>] ? kvm_arch_vcpu_load+0x8f/0x10b [kvm]
 [<ffffffffa000344f>] kvm_vcpu_ioctl+0x113/0x4e4 [kvm]
 [<ffffffffa0002d9d>] ? kvm_vm_ioctl+0x362/0x38b [kvm]
 [<ffffffff810add27>] do_vfs_ioctl+0x4a8/0x4f7
 [<ffffffff810a0d5a>] ? fget_light+0xdd/0xeb
 [<ffffffff810a0ccf>] ? fget_light+0x52/0xeb
 [<ffffffff810addb8>] sys_ioctl+0x42/0x65
 [<ffffffff81001f7b>] system_call_fastpath+0x16/0x1b
Code: 08 41 bc 01 00 00 00 eb 10 48 8b b3 70 03 00 00 48 89 df ff 93 20 03 00 00 48 83 c4 38 44 89 e0 5b 41 5c 41 5d 41 5e 41 5f c9 c3 <48> 2b 77 30 55 48 c1 ee 03 48 89 e5 0f ab 77 60 19 f6 85 f6 75
RIP  [<ffffffffa0015414>] mark_unsync+0x0/0x2a [kvm]
 RSP <ffff880101131760>
CR2: 0000000000000030
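
As a cross-check on the oops above, the faulting bytes marked in the Code: line (<48> 2b 77 30) can be decoded by hand. The sketch below is a deliberately narrow decoder, not a general disassembler: the function name decode_modrm_disp8 is made up for this illustration, and it only handles a REX.W-prefixed "op reg, [base+disp8]" form with no SIB byte and no extended registers.

```python
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]
# Only a few "op r64, r/m64" opcodes are listed; this is not a full table.
OPCODES = {0x03: "add", 0x2B: "sub", 0x8B: "mov"}


def decode_modrm_disp8(insn: bytes):
    """Decode REX.W + opcode + ModRM(mod=01) + disp8 into a tuple.

    Returns (mnemonic, dest_reg, base_reg, displacement).
    """
    rex, opcode, modrm, disp = insn[0], insn[1], insn[2], insn[3]
    if rex & 0xF8 != 0x48 or rex & 0x07:
        raise ValueError("expected plain REX.W prefix (0x48)")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("only [base+disp8] without SIB is handled")
    if disp >= 0x80:
        disp -= 0x100  # disp8 is sign-extended
    return OPCODES[opcode], REGS[reg], REGS[rm], disp


mnemonic, dest, base, disp = decode_modrm_disp8(bytes.fromhex("482b7730"))

# Register values taken from the dump above.
registers = {"rdi": 0x0, "rsi": 0xFFFF880000000FF7}
fault_addr = registers[base] + disp

print(f"{mnemonic} {dest}, [{base}+{disp:#x}]  -> accesses {fault_addr:#x}")
```

The bytes decode to sub rsi, [rdi+0x30]. With RDI = 0 in the register dump, the memory operand is address 0x30, which matches CR2 and suggests that the pointer passed in RDI at the entry of mark_unsync was NULL.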