Bug 42980 - BUG in gfn_to_pfn_prot
Summary: BUG in gfn_to_pfn_prot
Status: REOPENED
Alias: None
Product: Virtualization
Classification: Unclassified
Component: kvm
Hardware: All
OS: Linux
Importance: P1 blocking
Assignee: virtualization_kvm
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-03-22 21:28 UTC by Luke-Jr
Modified: 2016-02-15 21:37 UTC
CC List: 6 users

See Also:
Kernel Version: 3.4
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Fix (1.76 KB, patch)
2012-05-10 10:53 UTC, Avi Kivity

Description Luke-Jr 2012-03-22 21:28:37 UTC
BUG: unable to handle kernel paging request at ffff87ffffffffff
IP: [<ffffffffa03311b7>] __direct_map.clone.86+0xa7/0x240 [kvm]
PGD 0 
Oops: 0000 [#1] PREEMPT SMP 
CPU 0 
Modules linked in: tun cdc_ether usbnet cdc_acm fuse usbmon pci_stub kvm_intel kvm netconsole configfs cfq_iosched blk_cgroup snd_seq_oss snd_seq_midi_event snd_seq bridge snd_seq_device ipv6 snd_pcm_oss snd_mixer_oss stp llc coretemp hwmon usblp snd_hda_codec_hdmi snd_hda_codec_realtek usb_storage ftdi_sio usbserial usbhid hid snd_hda_intel i915 snd_hda_codec drm_kms_helper snd_hwdep drm snd_pcm firewire_ohci tpm_tis 8139too tpm firewire_core xhci_hcd i2c_algo_bit snd_timer 8250_pci 8250_pnp ehci_hcd usbcore snd e1000e 8250 tpm_bios crc_itu_t serial_core snd_page_alloc sg rtc_cmos psmouse i2c_i801 mii usb_common video evdev ata_generic pata_acpi button

Pid: 9995, comm: qemu-system-x86 Not tainted 3.2.2-gentoo #1                  /DQ67SW
RIP: 0010:[<ffffffffa03311b7>]  [<ffffffffa03311b7>] __direct_map.clone.86+0xa7/0x240 [kvm]
RSP: 0018:ffff88010bc39b08  EFLAGS: 00010293
RAX: ffff87ffffffffff RBX: 000ffffffffff000 RCX: 0000000000000027
RDX: 0000000029b55000 RSI: 0000000000000004 RDI: 0000000000000003
RBP: ffff88010bc39bb8 R08: ffff87ffffffffff R09: 0000000000113661
R10: 00000000c174f000 R11: 080000000000d974 R12: ffff880000000000
R13: ffff8803b7e6c240 R14: 0000000000000001 R15: 0000000000000001
FS:  0000000000000000(0000) GS:ffff88043e200000(0063) knlGS:00000000f5ffab70
CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: ffff87ffffffffff CR3: 00000001027f1000 CR4: 00000000000426e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process qemu-system-x86 (pid: 9995, threadinfo ffff88010bc38000, task ffff88000bc154f0)
Stack:
 ffff8803b7e6c240 ffff88010bc39bf0 0000000000000000 0000000000029b55
 ffff88010bc39b38 ffffffffa031ae14 00ff88010bc39bb8 0000000000000000
 0000000000113661 0000000000029b55 0000000029b55000 ffffffffffffffff
Call Trace:
 [<ffffffffa031ae14>] ? gfn_to_pfn_prot+0x14/0x20 [kvm]
 [<ffffffffa03316c0>] tdp_page_fault+0x1a0/0x1e0 [kvm]
 [<ffffffffa032d2e2>] kvm_mmu_page_fault+0x32/0xb0 [kvm]
 [<ffffffffa0362bec>] handle_ept_violation+0x4c/0xd0 [kvm_intel]
 [<ffffffffa0368ff4>] vmx_handle_exit+0xb4/0x6f0 [kvm_intel]
 [<ffffffff8103afad>] ? sub_preempt_count+0x9d/0xd0
 [<ffffffffa0329e23>] kvm_arch_vcpu_ioctl_run+0x473/0xf40 [kvm]
 [<ffffffff8103afad>] ? sub_preempt_count+0x9d/0xd0
 [<ffffffffa03197c2>] kvm_vcpu_ioctl+0x392/0x5e0 [kvm]
 [<ffffffffa031a3ed>] ? kvm_vm_ioctl+0x9d/0x410 [kvm]
 [<ffffffff81315529>] ? sys_sendto+0x119/0x140
 [<ffffffffa0319a65>] kvm_vcpu_compat_ioctl+0x55/0x100 [kvm]
 [<ffffffff810f81df>] ? fget_light+0x8f/0xf0
 [<ffffffff8113ee2e>] compat_sys_ioctl+0x8e/0xff0
 [<ffffffff8105df3c>] ? posix_ktime_get_ts+0xc/0x10
 [<ffffffff8105f190>] ? sys_clock_gettime+0x90/0xb0
 [<ffffffff810860db>] ? compat_sys_clock_gettime+0x7b/0x90
 [<ffffffff813c34c9>] sysenter_dispatch+0x7/0x27
Code: 89 d0 8d 4c ff 0c 4d 89 e0 48 d3 e8 4c 03 45 a8 25 ff 01 00 00 41 39 f6 89 45 bc 89 c0 49 8d 04 c0 48 89 45 b0 0f 84 e1 00 00 00 <4c> 8b 00 41 f6 c0 01 74 40 4c 8b 0d 89 80 01 00 4d 89 c2 4d 21 
RIP  [<ffffffffa03311b7>] __direct_map.clone.86+0xa7/0x240 [kvm]
 RSP <ffff88010bc39b08>
CR2: ffff87ffffffffff
---[ end trace 4db76b33c09285f5 ]---
note: qemu-system-x86[9995] exited with preempt_count 1
usb 2-1.2: USB disconnect, device number 77
INFO: rcu_preempt detected stall on CPU 3 (t=60000 jiffies)
Pid: 3610, comm: kwin Tainted: G      D      3.2.2-gentoo #1
Call Trace:
 <IRQ>  [<ffffffff810a2949>] __rcu_pending+0x1d9/0x420
 [<ffffffff8106f920>] ? tick_nohz_handler+0xe0/0xe0
 [<ffffffff810a2f62>] rcu_check_callbacks+0x122/0x1a0
 [<ffffffff810504c3>] update_process_times+0x43/0x80
 [<ffffffff8106f97b>] tick_sched_timer+0x5b/0xa0
 [<ffffffff81063873>] __run_hrtimer.clone.30+0x63/0x140
 [<ffffffff810641af>] hrtimer_interrupt+0xdf/0x210
 [<ffffffff8101d643>] smp_apic_timer_interrupt+0x63/0xa0
 [<ffffffff813c2b8b>] apic_timer_interrupt+0x6b/0x70
 <EOI>  [<ffffffff810b69a2>] ? __pagevec_free+0x22/0x30
 [<ffffffff813c1862>] ? _raw_spin_lock+0x32/0x40
 [<ffffffff813c1846>] ? _raw_spin_lock+0x16/0x40
 [<ffffffffa0319c3c>] kvm_mmu_notifier_invalidate_page+0x3c/0x90 [kvm]
 [<ffffffff810e31c8>] __mmu_notifier_invalidate_page+0x48/0x60
 [<ffffffff810d6ce5>] try_to_unmap_one+0x3c5/0x3f0
 [<ffffffff810d762d>] try_to_unmap_anon+0x9d/0xe0
 [<ffffffff810d7715>] try_to_unmap+0x55/0x70
 [<ffffffff810e8d21>] migrate_pages+0x2f1/0x4d0
 [<ffffffff810e1ec0>] ? suitable_migration_target+0x50/0x50
 [<ffffffff810e271f>] compact_zone+0x44f/0x7a0
 [<ffffffff810e2c07>] try_to_compact_pages+0x197/0x1f0
 [<ffffffff810b7026>] __alloc_pages_direct_compact+0xc6/0x1c0
 [<ffffffff810b74f9>] __alloc_pages_nodemask+0x3d9/0x7a0
 [<ffffffff813c14b0>] ? _raw_spin_unlock+0x10/0x40
 [<ffffffff810cd2fb>] ? handle_pte_fault+0x3bb/0x9f0
 [<ffffffff810ec831>] do_huge_pmd_anonymous_page+0x131/0x350
 [<ffffffff810cdcae>] handle_mm_fault+0x21e/0x300
 [<ffffffff81027dad>] do_page_fault+0x12d/0x430
 [<ffffffff810d3854>] ? do_mmap_pgoff+0x344/0x380
 [<ffffffff813c1cef>] page_fault+0x1f/0x30
Comment 1 Avi Kivity 2012-03-28 13:03:25 UTC
   0:	89 d0                	mov    %edx,%eax
   2:	8d 4c ff 0c          	lea    0xc(%rdi,%rdi,8),%ecx
   6:	4d 89 e0             	mov    %r12,%r8
   9:	48 d3 e8             	shr    %cl,%rax
   c:	4c 03 45 a8          	add    -0x58(%rbp),%r8
  10:	25 ff 01 00 00       	and    $0x1ff,%eax
  15:	41 39 f6             	cmp    %esi,%r14d
  18:	89 45 bc             	mov    %eax,-0x44(%rbp)
  1b:	89 c0                	mov    %eax,%eax
  1d:	49 8d 04 c0          	lea    (%r8,%rax,8),%rax
  21:	48 89 45 b0          	mov    %rax,-0x50(%rbp)
  25:	0f 84 e1 00 00 00    	je     0x10c
  2b:	4c 8b 00             	mov    (%rax),%r8
  2e:	41 f6 c0 01          	test   $0x1,%r8b
  32:	74 40                	je     0x74
  34:	4c 8b 0d 89 80 01 00 	mov    0x18089(%rip),%r9        # 0x180c4
  3b:	4d 89 c2             	mov    %r8,%r10

Appears to be __direct_map()'s

		if (!is_shadow_present_pte(*iterator.sptep)) {
			u64 base_addr = iterator.addr;

%rax is 0xffff87ffffffffff. That is one less than the base of the direct map of all physical memory.  So it looks like the code


static bool shadow_walk_okay(struct kvm_shadow_walk_iterator *iterator)
{
	if (iterator->level < PT_PAGE_TABLE_LEVEL)
		return false;

	iterator->index = SHADOW_PT_INDEX(iterator->addr, iterator->level);
	iterator->sptep	= ((u64 *)__va(iterator->shadow_addr)) + iterator->index;
	return true;
}

saw iterator->shadow_addr == -1ULL.
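
For reference, the arithmetic behind that conclusion: __va() on x86_64 adds the direct-map base PAGE_OFFSET, 0xffff880000000000 on kernels of this era, so __va(-1ULL) works out to exactly the faulting address. A minimal userspace sketch of that arithmetic, assuming that PAGE_OFFSET value:

/* Illustration only, not kernel code: __va(x) on x86_64 is x + PAGE_OFFSET. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t page_offset = 0xffff880000000000ULL;	/* assumed direct-map base */
	uint64_t shadow_addr = ~0ULL;			/* -1ULL, i.e. INVALID_PAGE */

	/* Wraps around to 0xffff87ffffffffff, the address seen in %rax/CR2. */
	printf("%#llx\n", (unsigned long long)(page_offset + shadow_addr));
	return 0;
}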

That might be INVALID_PAGE assigned to pae_root (but that is masked out in shadow_walk_init()) or a stray -1 due to a completely unrelated bug.
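
For context, shadow_walk_init() at the time looked roughly like this (an approximate reconstruction, so details may differ); note that only the pae_root path masks the value, while the top-level root_hpa is copied into shadow_addr unmasked:

static void shadow_walk_init(struct kvm_shadow_walk_iterator *iterator,
			     struct kvm_vcpu *vcpu, u64 addr)
{
	iterator->addr = addr;
	iterator->shadow_addr = vcpu->arch.mmu.root_hpa;	/* not masked */
	iterator->level = vcpu->arch.mmu.shadow_root_level;

	if (iterator->level == PT32E_ROOT_LEVEL) {
		iterator->shadow_addr
			= vcpu->arch.mmu.pae_root[(addr >> 30) & 3];
		iterator->shadow_addr &= PT64_BASE_ADDR_MASK;	/* masked here */
		--iterator->level;
		if (!iterator->shadow_addr)
			iterator->level = 0;
	}
}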

Anything interesting about how this was triggered?
Comment 2 Luke-Jr 2012-03-28 13:37:53 UTC
IIRC, it was pretty out of the blue. I might have had one or both of two KVMs running in the background at the time:
- 64-bit Gentoo with a Radeon 5850 passthrough'd (VT-d)
- 32-bit Ubuntu with a nested 32-bit KVM
Comment 3 Avi Kivity 2012-03-28 13:45:25 UTC
You're a brave one.

It wasn't the nested one (at least, it wasn't running in the guest's guest at the moment of the crash), but it might be related.
Comment 4 Luke-Jr 2012-03-28 13:49:26 UTC
I suppose I should mention I'd been running both of these stable for at least a month now (and the GPU passthrough for nearly a full year). One factor that might (or might not) be related - the GPU fan recently died. When this crash took me down, I removed the GPU, so I won't be able to do any further testing with that setup (unless I find another similar GPU at a good price).
Comment 5 Avi Kivity 2012-03-28 15:07:25 UTC
vcpu_enter_guest()
  kvm_mmu_reload() // now root_hpa is valid
  inject_pending_event()
    vmx_interrupt_allowed()
      nested_vmx_vmexit()
        load_vmcs12_host_state()
          kvm_mmu_reset_context() // root_hpa now invalid
  kvm_guest_enter()
  ... page fault because root_hpa is invalid, oops
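
In other words, event injection can tear the MMU context down after kvm_mmu_reload() has already validated it. A schematic of one way to avoid the ordering hazard, in illustrative pseudocode only (the real vcpu_enter_guest() has far more going on, and this is not the eventual patch):

/* Sketch only: reload the MMU *after* anything that can reset it. */
static int vcpu_enter_guest_sketch(struct kvm_vcpu *vcpu)
{
	int r;

	/* May trigger a nested vmexit, which resets the MMU context
	 * and marks root_hpa invalid. */
	inject_pending_event(vcpu);

	/* Reload only after injection, so root_hpa is valid on entry. */
	r = kvm_mmu_reload(vcpu);
	if (unlikely(r))
		return r;	/* bail out rather than fault in __direct_map() */

	/* ... actual guest entry elided ... */
	return 0;
}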
Comment 6 Avi Kivity 2012-05-10 10:53:48 UTC
Created attachment 73244 [details]
Fix

Please test the attached patch.
Comment 7 Luke-Jr 2012-05-10 13:17:17 UTC
Is there anything I can do to reproduce the problem condition for the test? It seems to only occur about once every 6 months normally.
Comment 8 Avi Kivity 2012-05-10 13:30:36 UTC
Try running 

  while :; do :; done

in the nested (L2) guest, and ping -f the L1 guest from the host.
Comment 9 Luke-Jr 2012-05-17 20:58:50 UTC
The while/ping thing doesn't reproduce it even before the patch. :(
Comment 10 Luke-Jr 2012-06-16 03:16:44 UTC
For what it's worth, no crashes in over a month. But it wasn't common enough to begin with, so that could still be coincidence...
Comment 11 Florian Mickler 2012-07-01 09:46:47 UTC
A patch referencing this bug report has been merged in Linux v3.5-rc1:

commit d8368af8b46b904def42a0f341d2f4f29001fa77
Author: Avi Kivity <avi@redhat.com>
Date:   Mon May 14 18:07:56 2012 +0300

    KVM: Fix mmu_reload() clash with nested vmx event injection
Comment 12 Luke-Jr 2012-08-15 22:24:36 UTC
Sorry I didn't report it sooner, but I have had the same crash since June, with this patch. :(
Comment 13 Alan 2012-08-15 22:34:02 UTC
Which kernel ?
Comment 14 Luke-Jr 2012-08-15 22:38:45 UTC
I'm not sure if it was 3.4.0, 3.4.3, or 3.4.4. Since May 17, I have been building all my kernels (including those) with this patch applied.
Comment 15 Luke-Jr 2012-08-15 22:47:39 UTC
3.4.0: http://luke.dashjr.org/tmp/code/20120624_002.jpg
Comment 16 Alan 2012-08-16 09:32:17 UTC
Thanks
Comment 17 Ian Pilcher 2012-11-17 22:00:39 UTC
I just hit this.

Host:  Intel DQ67SW, Core i7 2600, 24GB RAM
BIOS:  SWQ6710H.86A.0065.2012.0917.1519

Host OS:  Fedora 17
          kernel-3.6.6-1.fc17.x86_64
          qemu-kvm-1.2.0-20.fc17.x86_64

L1 Guest OS:  RHEL 6.3
              kernel-2.6.32-279.14.1.el6.x86_64
              qemu-kvm-rhev-0.12.1.2-2.295.el6_3.5.x86_64

L2 Guest OS:  RHEL 6.3
              kernel-2.6.32-279.14.1.el6.x86_64

I was running a Pulp sync between a couple of L2 guests when this occurred, which presumably generated quite a bit of traffic across the virtual bridges.  I am using Open vSwitch for all of the bridges on the host OS.  The virtualized RHEV hypervisors use standard Linux bridges.

Please let me know if I can provide any additional information to help track this down.
Comment 18 Ian Pilcher 2012-11-17 22:10:45 UTC
(In reply to comment #11)
> A patch referencing this bug report has been merged in Linux v3.5-rc1:
> 
> commit d8368af8b46b904def42a0f341d2f4f29001fa77
> Author: Avi Kivity <avi@redhat.com>
> Date:   Mon May 14 18:07:56 2012 +0300
> 
>     KVM: Fix mmu_reload() clash with nested vmx event injection

Silly question.  Is this patch applicable to the physical host, the L1 guest (virtualized hypervisor), or both?
Comment 19 Avi Kivity 2012-11-18 14:15:41 UTC
(In reply to comment #18)
> Silly question.  Is this patch applicable to the physical host, the L1 guest
> (virtualized hypervisor), or both?

The physical host.  If you want to run a hypervisor in L2, you need to
apply it to L1 as well.
Comment 20 Ian Pilcher 2012-11-18 17:06:36 UTC
(In reply to comment #19)
> The physical host.  If you want to run a hypervisor in L2, you need to
> apply it to L1 as well.

OK.  If I'm parsing that correctly, it sounds like backporting the patch to the RHEL 6 kernel, so I could run it in the L1 hypervisors, wouldn't help anything.

Bummer.

Any ideas on how I can make this environment stable?

I see that Luke-Jr is also on a DQ67SW, and he's doing PCI passthrough.  I do have VT-d enabled, although I'm not actually doing any PCI passthrough.  Is that something that could be related to this?
Comment 21 Ian Pilcher 2012-12-08 20:50:25 UTC
I just hit this again (I think).  Pretty much out of the blue, with a bunch of VMs running, including at least 2 nested guests.

I have been trying to get a kdump of this, and I believe that I was at least somewhat successful.  The system didn't dump automatically, but I was able to get it to do so by hitting alt-sysrq-c.  The vmcore file is 3.7G, so suggestions as to a place to post it publicly would be appreciated.
Comment 22 xerofoify 2014-06-25 02:11:39 UTC
Please test against a newer kernel. This bug seems obsolete to me as of kernel
versions released in the 2014 time frame.
Cheers Nick
