Bug 42703
Summary: | random hangs on virtualization host | ||
---|---|---|---|
Product: | Virtualization | Reporter: | Bram De Wilde (gbramdewilde) |
Component: | kvm | Assignee: | Avi Kivity (avi) |
Status: | RESOLVED OBSOLETE | ||
Severity: | normal | CC: | alan, avi, gleb |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 3.0.0-16 | Subsystem: | |
Regression: | No | Bisected commit-id: |
Description
Bram De Wilde
2012-01-31 16:55:25 UTC
> Jan 27 13:41:28 cmggcn01 kernel: [871350.761867] general protection fault: 0000 [#2] SMP
> Jan 27 13:41:28 cmggcn01 kernel: [871350.790117] CPU 14
> Jan 27 13:41:28 cmggcn01 kernel: [871350.790387] Modules linked in: btrfs zlib_deflate libcrc32c ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs reiserfs ebt_arp ebt_ip 8021q garp ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp iptable_filter ip_tables x_tables bridge stp kvm_intel kvm nbd vesafb ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi psmouse dcdbas dm_multipath serio_raw joydev ghes hed acpi_power_meter bonding lp parport i7core_edac edac_core ses enclosure usbhid hid megaraid_sas bnx2
> Jan 27 13:41:28 cmggcn01 kernel: [871351.005151]
> Jan 27 13:41:28 cmggcn01 kernel: [871351.036187] Pid: 90, comm: kswapd0 Tainted: G D 3.0.0-14-server #23-Ubuntu Dell Inc. PowerEdge R710/0MD99X
> Jan 27 13:41:28 cmggcn01 kernel: [871351.072809] RIP: 0010:[<ffffffffa01a5890>] [<ffffffffa01a5890>] kvm_unmap_rmapp+0x20/0x60 [kvm]
> Jan 27 13:41:28 cmggcn01 kernel: [871351.105190] RSP: 0018:ffff8817f3e27a60 EFLAGS: 00010202
> Jan 27 13:41:28 cmggcn01 kernel: [871351.141329] RAX: 00008817f5d067f8 RBX: ffffc9001fd41ff8 RCX: ffffffffa01a58d0
> Jan 27 13:41:28 cmggcn01 kernel: [871351.179076] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00008817f5d067f8
> Jan 27 13:41:28 cmggcn01 kernel: [871351.212086] RBP: ffff8817f3e27a80 R08: ffff8817f315b3e0 R09: 0000000000000100
> Jan 27 13:41:28 cmggcn01 kernel: [871351.245788] R10: 000000000000000e R11: 0000000000000002 R12: ffff8817f2f0c000
> Jan 27 13:41:28 cmggcn01 kernel: [871351.277514] R13: 0000000000000000 R14: ffff880be235e000 R15: 00000000000d3cff
> Jan 27 13:41:28 cmggcn01 kernel: [871351.308421] FS: 0000000000000000(0000) GS:ffff88183fce0000(0000) knlGS:0000000000000000
> Jan 27 13:41:28 cmggcn01 kernel: [871351.339685] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> Jan 27 13:41:28 cmggcn01 kernel: [871351.370089] CR2: 00007f8836442000 CR3: 0000000001c03000 CR4: 00000000000026e0
> Jan 27 13:41:28 cmggcn01 kernel: [871351.399771] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Jan 27 13:41:28 cmggcn01 kernel: [871351.428208] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Jan 27 13:41:28 cmggcn01 kernel: [871351.456153] Process kswapd0 (pid: 90, threadinfo ffff8817f3e26000, task ffff8817f63ac560)
> Jan 27 13:41:28 cmggcn01 kernel: [871351.484943] Stack:
> Jan 27 13:41:28 cmggcn01 kernel: [871351.512984] 0000000000000000 ffffc9001fd41ff8 0000000000000001 00007f834a87e000
> Jan 27 13:41:28 cmggcn01 kernel: [871351.542025] ffff8817f3e27aa0 ffffffffa01a5945 ffff880be235e060 0000000000000001
> Jan 27 13:41:28 cmggcn01 kernel: [871351.571050] ffff8817f3e27b10 ffffffffa01a1dd9 ffff8817f3e27ae0 ffffffffa01a58d0
<snip>
> Jan 27 13:41:28 cmggcn01 kernel: [871352.080582] Code: e7 d0 e8 e0 66 90 e9 a2 fe ff ff 55 48 89 e5 41 55 41 54 53 48 83 ec 08 66 66 66 66 90 45 31 ed 49 89 fc 48 89 f3 eb 20 0f 1f 00 <f6> 00 01 74 35 48 8b 15 74 7a 02 00 48 89 c6 4c 89 e7 41 bd 01

   0: e8 e0 66 90 e9          callq  0xffffffffe99066e5
   5: a2 fe ff ff 55 48 89    mov    %al,0x41e5894855fffffe
   c: e5 41
   e: 55                      push   %rbp
   f: 41 54                   push   %r12
  11: 53                      push   %rbx
  12: 48 83 ec 08             sub    $0x8,%rsp
  16: 66 66 66 66 90          data32 data32 data32 xchg %ax,%ax
  1b: 45 31 ed                xor    %r13d,%r13d
  1e: 49 89 fc                mov    %rdi,%r12
  21: 48 89 f3                mov    %rsi,%rbx
  24: eb 20                   jmp    0x46
  26: 0f 1f 00                nopl   (%rax)
  29: f6 00 01                testb  $0x1,(%rax)
      ^ dies here, %rax is non-canonical.
  2c: 74 35                   je     0x63
  2e: 48 8b 15 74 7a 02 00    mov    0x27a74(%rip),%rdx        # 0x27aa9
  35: 48 89 c6                mov    %rax,%rsi
  38: 4c 89 e7                mov    %r12,%rdi

static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
                           unsigned long data)
{
        u64 *spte;
        int need_tlb_flush = 0;

        while ((spte = rmap_next(kvm, rmapp, NULL))) {
                BUG_ON(!(*spte & PT_PRESENT_MASK));
                ^ here, when fetching *spte.
                rmap_printk("kvm_rmap_unmap_hva: spte %p %llx\n", spte, *spte);
                drop_spte(kvm, spte);
                need_tlb_flush = 1;
        }
        return need_tlb_flush;

Looks like a use-after-free with the two bytes at offset 6 zeroed. If this is reproducible, please rerun with the host kernel parameter slub_debug=FZPU.

Are you using bridge and netfilter? Can you disable both? This looks similar to https://bugzilla.kernel.org/show_bug.cgi?id=27052

(In reply to comment #2)
> Are you using bridge and netfilter? Can you disable both? This looks similar to
> https://bugzilla.kernel.org/show_bug.cgi?id=27052

Indeed, I'm running both. Both are required to run VMs in the OpenStack configuration, so just disabling them would break my cloud config, I guess? Any alternatives? I have meanwhile upgraded to the 3.2.2 kernel to see if the problem persists, and will reboot with "slub_debug=FZPU" on the next crash.
lsmod:

Module Size Used by
des_generic 21415 0
md4 12595 0
nls_utf8 12557 1
cifs 281484 2
ebt_arp 12585 108
ebt_ip 12538 36
8021q 24151 0
garp 14313 1 8021q
ip6table_filter 12815 0
ip6_tables 27617 1 ip6table_filter
ebtable_nat 12807 1
ebtables 30966 1 ebtable_nat
ipt_MASQUERADE 12759 3
xt_state 12578 25
ipt_REJECT 12576 2
xt_CHECKSUM 12549 1
iptable_mangle 12695 1
xt_tcpudp 12603 61
iptable_nat 13182 1
nf_nat 25545 2 ipt_MASQUERADE,iptable_nat
nf_conntrack_ipv4 19588 28 iptable_nat,nf_nat
nf_conntrack 81527 5 ipt_MASQUERADE,xt_state,iptable_nat,nf_nat,nf_conntrack_ipv4
nf_defrag_ipv4 12729 1 nf_conntrack_ipv4
iptable_filter 12810 1
kvm_intel 136560 61
ip_tables 27227 3 iptable_mangle,iptable_nat,iptable_filter
x_tables 29727 14 ebt_arp,ebt_ip,ip6table_filter,ip6_tables,ebtables,ipt_MASQUERADE,xt_state,ipt_REJECT,xt_CHECKSUM,iptable_mangle,xt_tcpudp,iptable_nat,iptable_filter,ip_tables
bridge 90674 0
kvm 404475 1 kvm_intel
stp 12931 2 garp,bridge
nbd 17712 0
ib_iser 38366 0
rdma_cm 43625 1 ib_iser
ib_cm 47663 1 rdma_cm
iw_cm 18705 1 rdma_cm
ib_sa 28854 2 rdma_cm,ib_cm
ib_mad 47570 2 ib_cm,ib_sa
ib_core 82371 6 ib_iser,rdma_cm,ib_cm,iw_cm,ib_sa,ib_mad
ib_addr 14109 1 rdma_cm
iscsi_tcp 18447 0
libiscsi_tcp 20862 1 iscsi_tcp
libiscsi 57321 3 ib_iser,iscsi_tcp,libiscsi_tcp
scsi_transport_iscsi 53383 4 ib_iser,iscsi_tcp,libiscsi
ext2 73217 1
bonding 108597 0
psmouse 73859 0
dcdbas 14438 0
serio_raw 13211 0
joydev 17597 0
i7core_edac 27864 0
edac_core 53411 4 i7core_edac
lp 17789 0
dm_multipath 23141 0
parport 46360 1 lp
mac_hid 13205 0
acpi_power_meter 18139 0
ses 17385 0
enclosure 15209 1 ses
usbhid 46754 0
hid 99171 1 usbhid
megaraid_sas 87049 2
bnx2 85274 0

Given the silence from January I assume this is fixed
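
For readers following the analysis in this thread, the two observations about %rax ("non-canonical" and "the two bytes at offset 6 zeroed") can be checked with a few lines of user-space C. This is only a sketch under stated assumptions: it assumes 48-bit virtual addressing (as on this generation of x86-64 hardware), the helper names `is_canonical` and `zero_bytes_le` are made up for illustration and are not kernel functions, and the "original" pointer value 0xffff8817f5d067f8 is inferred from the faulting RAX value rather than taken from the log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): with 48-bit virtual addresses,
 * an x86-64 address is canonical only if bits 63..47 are all equal,
 * i.e. the 17 top bits are either all zeros or all ones.
 */
static bool is_canonical(uint64_t addr)
{
    uint64_t top = addr >> 47;          /* keep bits 63..47 (17 bits) */
    return top == 0 || top == 0x1ffff;
}

/*
 * Hypothetical helper: clear `count` bytes starting at byte `offset` of the
 * little-endian in-memory image of `value`. Byte k of that image holds
 * bits 8k..8k+7, so offsets 6 and 7 are the two most significant bytes.
 */
static uint64_t zero_bytes_le(uint64_t value, unsigned offset, unsigned count)
{
    assert(offset + count <= 8);
    for (unsigned i = 0; i < count; i++)
        value &= ~(0xffULL << (8 * (offset + i)));
    return value;
}
```

With these definitions, `is_canonical(0x00008817f5d067f8ULL)` is false for the RAX value in the oops, while `zero_bytes_le(0xffff8817f5d067f8ULL, 6, 2)` yields exactly that RAX value. That is consistent with a valid kernel pointer whose top two bytes were overwritten with zeros, although it does not by itself prove how the corruption happened.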