BUG: unable to handle kernel paging request at ffff87ffffffffff
IP: [<ffffffffa03311b7>] __direct_map.clone.86+0xa7/0x240 [kvm]
PGD 0
Oops: 0000 [#1] PREEMPT SMP
CPU 0
Modules linked in: tun cdc_ether usbnet cdc_acm fuse usbmon pci_stub kvm_intel kvm netconsole configfs cfq_iosched blk_cgroup snd_seq_oss snd_seq_midi_event snd_seq bridge snd_seq_device ipv6 snd_pcm_oss snd_mixer_oss stp llc coretemp hwmon usblp snd_hda_codec_hdmi snd_hda_codec_realtek usb_storage ftdi_sio usbserial usbhid hid snd_hda_intel i915 snd_hda_codec drm_kms_helper snd_hwdep drm snd_pcm firewire_ohci tpm_tis 8139too tpm firewire_core xhci_hcd i2c_algo_bit snd_timer 8250_pci 8250_pnp ehci_hcd usbcore snd e1000e 8250 tpm_bios crc_itu_t serial_core snd_page_alloc sg rtc_cmos psmouse i2c_i801 mii usb_common video evdev ata_generic pata_acpi button

Pid: 9995, comm: qemu-system-x86 Not tainted 3.2.2-gentoo #1  /DQ67SW
RIP: 0010:[<ffffffffa03311b7>]  [<ffffffffa03311b7>] __direct_map.clone.86+0xa7/0x240 [kvm]
RSP: 0018:ffff88010bc39b08  EFLAGS: 00010293
RAX: ffff87ffffffffff RBX: 000ffffffffff000 RCX: 0000000000000027
RDX: 0000000029b55000 RSI: 0000000000000004 RDI: 0000000000000003
RBP: ffff88010bc39bb8 R08: ffff87ffffffffff R09: 0000000000113661
R10: 00000000c174f000 R11: 080000000000d974 R12: ffff880000000000
R13: ffff8803b7e6c240 R14: 0000000000000001 R15: 0000000000000001
FS:  0000000000000000(0000) GS:ffff88043e200000(0063) knlGS:00000000f5ffab70
CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: ffff87ffffffffff CR3: 00000001027f1000 CR4: 00000000000426e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process qemu-system-x86 (pid: 9995, threadinfo ffff88010bc38000, task ffff88000bc154f0)
Stack:
 ffff8803b7e6c240 ffff88010bc39bf0 0000000000000000 0000000000029b55
 ffff88010bc39b38 ffffffffa031ae14 00ff88010bc39bb8 0000000000000000
 0000000000113661 0000000000029b55 0000000029b55000 ffffffffffffffff
Call Trace:
 [<ffffffffa031ae14>] ? gfn_to_pfn_prot+0x14/0x20 [kvm]
 [<ffffffffa03316c0>] tdp_page_fault+0x1a0/0x1e0 [kvm]
 [<ffffffffa032d2e2>] kvm_mmu_page_fault+0x32/0xb0 [kvm]
 [<ffffffffa0362bec>] handle_ept_violation+0x4c/0xd0 [kvm_intel]
 [<ffffffffa0368ff4>] vmx_handle_exit+0xb4/0x6f0 [kvm_intel]
 [<ffffffff8103afad>] ? sub_preempt_count+0x9d/0xd0
 [<ffffffffa0329e23>] kvm_arch_vcpu_ioctl_run+0x473/0xf40 [kvm]
 [<ffffffff8103afad>] ? sub_preempt_count+0x9d/0xd0
 [<ffffffffa03197c2>] kvm_vcpu_ioctl+0x392/0x5e0 [kvm]
 [<ffffffffa031a3ed>] ? kvm_vm_ioctl+0x9d/0x410 [kvm]
 [<ffffffff81315529>] ? sys_sendto+0x119/0x140
 [<ffffffffa0319a65>] kvm_vcpu_compat_ioctl+0x55/0x100 [kvm]
 [<ffffffff810f81df>] ? fget_light+0x8f/0xf0
 [<ffffffff8113ee2e>] compat_sys_ioctl+0x8e/0xff0
 [<ffffffff8105df3c>] ? posix_ktime_get_ts+0xc/0x10
 [<ffffffff8105f190>] ? sys_clock_gettime+0x90/0xb0
 [<ffffffff810860db>] ? compat_sys_clock_gettime+0x7b/0x90
 [<ffffffff813c34c9>] sysenter_dispatch+0x7/0x27
Code: 89 d0 8d 4c ff 0c 4d 89 e0 48 d3 e8 4c 03 45 a8 25 ff 01 00 00 41 39 f6 89 45 bc 89 c0 49 8d 04 c0 48 89 45 b0 0f 84 e1 00 00 00 <4c> 8b 00 41 f6 c0 01 74 40 4c 8b 0d 89 80 01 00 4d 89 c2 4d 21
RIP  [<ffffffffa03311b7>] __direct_map.clone.86+0xa7/0x240 [kvm]
 RSP <ffff88010bc39b08>
CR2: ffff87ffffffffff
---[ end trace 4db76b33c09285f5 ]---
note: qemu-system-x86[9995] exited with preempt_count 1
usb 2-1.2: USB disconnect, device number 77
INFO: rcu_preempt detected stall on CPU 3 (t=60000 jiffies)
Pid: 3610, comm: kwin Tainted: G      D      3.2.2-gentoo #1
Call Trace:
 <IRQ>  [<ffffffff810a2949>] __rcu_pending+0x1d9/0x420
 [<ffffffff8106f920>] ? tick_nohz_handler+0xe0/0xe0
 [<ffffffff810a2f62>] rcu_check_callbacks+0x122/0x1a0
 [<ffffffff810504c3>] update_process_times+0x43/0x80
 [<ffffffff8106f97b>] tick_sched_timer+0x5b/0xa0
 [<ffffffff81063873>] __run_hrtimer.clone.30+0x63/0x140
 [<ffffffff810641af>] hrtimer_interrupt+0xdf/0x210
 [<ffffffff8101d643>] smp_apic_timer_interrupt+0x63/0xa0
 [<ffffffff813c2b8b>] apic_timer_interrupt+0x6b/0x70
 <EOI>  [<ffffffff810b69a2>] ? __pagevec_free+0x22/0x30
 [<ffffffff813c1862>] ? _raw_spin_lock+0x32/0x40
 [<ffffffff813c1846>] ? _raw_spin_lock+0x16/0x40
 [<ffffffffa0319c3c>] kvm_mmu_notifier_invalidate_page+0x3c/0x90 [kvm]
 [<ffffffff810e31c8>] __mmu_notifier_invalidate_page+0x48/0x60
 [<ffffffff810d6ce5>] try_to_unmap_one+0x3c5/0x3f0
 [<ffffffff810d762d>] try_to_unmap_anon+0x9d/0xe0
 [<ffffffff810d7715>] try_to_unmap+0x55/0x70
 [<ffffffff810e8d21>] migrate_pages+0x2f1/0x4d0
 [<ffffffff810e1ec0>] ? suitable_migration_target+0x50/0x50
 [<ffffffff810e271f>] compact_zone+0x44f/0x7a0
 [<ffffffff810e2c07>] try_to_compact_pages+0x197/0x1f0
 [<ffffffff810b7026>] __alloc_pages_direct_compact+0xc6/0x1c0
 [<ffffffff810b74f9>] __alloc_pages_nodemask+0x3d9/0x7a0
 [<ffffffff813c14b0>] ? _raw_spin_unlock+0x10/0x40
 [<ffffffff810cd2fb>] ? handle_pte_fault+0x3bb/0x9f0
 [<ffffffff810ec831>] do_huge_pmd_anonymous_page+0x131/0x350
 [<ffffffff810cdcae>] handle_mm_fault+0x21e/0x300
 [<ffffffff81027dad>] do_page_fault+0x12d/0x430
 [<ffffffff810d3854>] ? do_mmap_pgoff+0x344/0x380
 [<ffffffff813c1cef>] page_fault+0x1f/0x30
   0:	89 d0                	mov    %edx,%eax
   2:	8d 4c ff 0c          	lea    0xc(%rdi,%rdi,8),%ecx
   6:	4d 89 e0             	mov    %r12,%r8
   9:	48 d3 e8             	shr    %cl,%rax
   c:	4c 03 45 a8          	add    -0x58(%rbp),%r8
  10:	25 ff 01 00 00       	and    $0x1ff,%eax
  15:	41 39 f6             	cmp    %esi,%r14d
  18:	89 45 bc             	mov    %eax,-0x44(%rbp)
  1b:	89 c0                	mov    %eax,%eax
  1d:	49 8d 04 c0          	lea    (%r8,%rax,8),%rax
  21:	48 89 45 b0          	mov    %rax,-0x50(%rbp)
  25:	0f 84 e1 00 00 00    	je     0x10c
  2b:	4c 8b 00             	mov    (%rax),%r8
  2e:	41 f6 c0 01          	test   $0x1,%r8b
  32:	74 40                	je     0x74
  34:	4c 8b 0d 89 80 01 00 	mov    0x18089(%rip),%r9        # 0x180c4
  3b:	4d 89 c2             	mov    %r8,%r10

Appears to be __direct_map()'s

	if (!is_shadow_present_pte(*iterator.sptep)) {
		u64 base_addr = iterator.addr;

%rax is 0xffff87ffffffffff. That is one less than the base of the direct map of all physical memory. So it looks like the code

	static bool shadow_walk_okay(struct kvm_shadow_walk_iterator *iterator)
	{
		if (iterator->level < PT_PAGE_TABLE_LEVEL)
			return false;

		iterator->index = SHADOW_PT_INDEX(iterator->addr, iterator->level);
		iterator->sptep = ((u64 *)__va(iterator->shadow_addr)) + iterator->index;
		return true;
	}

saw iterator->shadow_addr == -1ULL. That might be INVALID_PAGE assigned to pae_root (but that is masked out in shadow_walk_init()), or a stray -1 due to a completely unrelated bug.

Anything interesting about how this was triggered?
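For reference, a minimal userspace sketch of the arithmetic (not kernel code; the PAGE_OFFSET value is an assumption taken from R12/CR2 in the dump above): __va() on x86-64 is essentially phys + PAGE_OFFSET, so a shadow_addr of -1ULL wraps to exactly the faulting address.

	/* Minimal sketch, not kernel code: shows why shadow_addr == -1ULL
	 * lands on the address in CR2.  PAGE_OFFSET here is an assumption
	 * based on R12 in the register dump above. */
	#include <stdio.h>
	#include <stdint.h>

	#define PAGE_OFFSET 0xffff880000000000ULL	/* x86-64 direct-map base */

	/* __va() on x86-64 is, in effect, phys + PAGE_OFFSET */
	static uint64_t fake_va(uint64_t phys)
	{
		return phys + PAGE_OFFSET;
	}

	int main(void)
	{
		uint64_t shadow_addr = ~0ULL;	/* -1ULL */
		/* wraps to 0xffff87ffffffffff -- the RAX/CR2 value in the oops */
		printf("%#llx\n", (unsigned long long)fake_va(shadow_addr));
		return 0;
	}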
IIRC, it was pretty out of the blue. I might have had one or both of two KVMs running in the background at the time:
- 64-bit Gentoo with a Radeon 5850 passthrough'd (VT-d)
- 32-bit Ubuntu with a nested 32-bit KVM
You're a brave one. It wasn't the nested one (at least, it wasn't running in the guest's guest at the moment of the crash), but it might be related.
I suppose I should mention I'd been running both of these stably for at least a month now (and the GPU passthrough for nearly a full year). One factor that might (or might not) be related: the GPU fan recently died. When this crash took me down, I removed the GPU, so I won't be able to do any further testing with that setup (unless I find another similar GPU at a good price).
vcpu_enter_guest()
  kvm_mmu_reload()                // now root_hpa is valid
  inject_pending_event()
    vmx_interrupt_allowed()
      nested_vmx_vmexit()
        load_vmcs12_host_state()
          kvm_mmu_reset_context() // root_hpa now invalid
  kvm_guest_enter()
  ... page fault because root_hpa is invalid, oops
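For illustration, a toy model of that ordering problem (stand-in names only; this is not the real KVM code and makes no claim about the exact contents of the attached patch): if event injection can tear down root_hpa via a nested vmexit, the MMU roots have to be reloaded after injection rather than before.

	/* Toy model (userspace, stubbed): the names mirror the call sequence
	 * above but are stand-ins, not the real KVM internals. */
	#include <stdbool.h>
	#include <stdio.h>

	static bool root_valid;

	static void kvm_mmu_reload(void)        { root_valid = true;  }
	static void kvm_mmu_reset_context(void) { root_valid = false; }

	/* Injection may be intercepted and cause a nested vmexit, which
	 * resets the MMU context (root_hpa becomes invalid). */
	static void inject_pending_event(bool nested_vmexit)
	{
		if (nested_vmexit)
			kvm_mmu_reset_context();
	}

	static void vcpu_enter_guest_sketch(bool nested_vmexit)
	{
		/* Buggy order: reload first, inject second -- the guest can
		 * then be entered with an invalid root.  One way to close the
		 * race: inject first, reload after, so the root is rebuilt
		 * if injection tore it down. */
		inject_pending_event(nested_vmexit);
		kvm_mmu_reload();
		printf("entering guest, root_valid=%d\n", root_valid); /* 1 */
	}

	int main(void)
	{
		vcpu_enter_guest_sketch(true);
		return 0;
	}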
Created attachment 73244: Fix

Please test the attached patch.
Is there anything I can do to reproduce the problem condition for the test? It seems to only occur about once every 6 months normally.
Try running 'while :; do :; done' in the nested (L2) guest, and 'ping -f' the L1 guest from the host.
The while/ping thing doesn't reproduce it even before the patch. :(
For what it's worth, no crashes in over a month. But the crash was never frequent enough to rule out coincidence either...
A patch referencing this bug report has been merged in Linux v3.5-rc1:

commit d8368af8b46b904def42a0f341d2f4f29001fa77
Author: Avi Kivity <avi@redhat.com>
Date:   Mon May 14 18:07:56 2012 +0300

    KVM: Fix mmu_reload() clash with nested vmx event injection
Sorry I didn't report it sooner, but I have had the same crash since June, with this patch. :(
Which kernel?
I'm not sure if it was 3.4.0, 3.4.3, or 3.4.4. Since May 17, I have been building all my kernels (including those) with this patch applied.
3.4.0: http://luke.dashjr.org/tmp/code/20120624_002.jpg
Thanks
I just hit this.

Host: Intel DQ67SW, Core i7 2600, 24GB RAM
BIOS: SWQ6710H.86A.0065.2012.0917.1519
Host OS: Fedora 17
  kernel-3.6.6-1.fc17.x86_64
  qemu-kvm-1.2.0-20.fc17.x86_64
L1 Guest OS: RHEL 6.3
  kernel-2.6.32-279.14.1.el6.x86_64
  qemu-kvm-rhev-0.12.1.2-2.295.el6_3.5.x86_64
L2 Guest OS: RHEL 6.3
  kernel-2.6.32-279.14.1.el6.x86_64

I was running a Pulp sync between a couple of L2 guests when this occurred, which presumably generated quite a bit of traffic across the virtual bridges. I am using Open vSwitch for all of the bridges on the host OS. The virtualized RHEV hypervisors use standard Linux bridges.

Please let me know if I can provide any additional information to help track this down.
(In reply to comment #11)
> A patch referencing this bug report has been merged in Linux v3.5-rc1:
>
> commit d8368af8b46b904def42a0f341d2f4f29001fa77
> Author: Avi Kivity <avi@redhat.com>
> Date:   Mon May 14 18:07:56 2012 +0300
>
>     KVM: Fix mmu_reload() clash with nested vmx event injection

Silly question. Is this patch applicable to the physical host, the L1 guest (virtualized hypervisor), or both?
> Silly question. Is this patch applicable to the physical host, the L1 guest
> (virtualized hypervisor), or both?

The physical host. If you want to run a hypervisor in L2, you need to apply it to L1 as well.
(In reply to comment #19)
> The physical host. If you want to run a hypervisor in L2, you need to
> apply it to L1 as well.

OK. If I'm parsing that correctly, it sounds like backporting the patch to the RHEL 6 kernel, so I could run it in the L1 hypervisors, wouldn't help anything. Bummer.

Any ideas on how I can make this environment stable? I see that Luke-Jr is also on a DQ67SW, and he's doing PCI passthrough. I do have VT-d enabled, although I'm not actually doing any PCI passthrough. Is that something that could be related to this?
I just hit this again (I think). Pretty much out of the blue, with a bunch of VMs running, including at least 2 nested guests. I have been trying to get a kdump of this, and I believe that I was at least somewhat successful. The system didn't dump automatically, but I was able to get it to do so by hitting alt-sysrq-c. The vmcore file is 3.7G, so suggestions as to a place to post it publicly would be appreciated.
Please test against a newer kernel. This bug seems obsolete to me as of the kernel versions released in the 2014 time frame.

Cheers
Nick