Bug 42703 - random hangs on virtualization host
Summary: random hangs on virtualization host
Status: RESOLVED OBSOLETE
Alias: None
Product: Virtualization
Classification: Unclassified
Component: kvm
Hardware: All Linux
Importance: P1 normal
Assignee: Avi Kivity
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-01-31 16:55 UTC by Bram De Wilde
Modified: 2012-08-30 14:21 UTC

See Also:
Kernel Version: 3.0.0-16
Subsystem:
Regression: No
Bisected commit-id:


Description Bram De Wilde 2012-01-31 16:55:25 UTC
Hi,

We are experiencing random system crashes on our Ubuntu 11.10 server running an OpenStack / KVM cloud. Unfortunately, we have not observed any specific behavior that triggers the problem, but I have collected a couple of stack traces.
The host is a Dell PowerEdge R710 with dual Intel Xeon X5650 CPUs.

Please let me know if I can provide more information.

Kind regards
bram

Jan 13 17:06:46 cmggcn01 kernel: [876590.142455] general protection fault: 0000 [#1] SMP 
Jan 13 17:06:46 cmggcn01 kernel: [876590.165966] CPU 4 
Jan 13 17:06:46 cmggcn01 kernel: [876590.166135] Modules linked in: ebt_arp ebt_ip 8021q garp ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp iptable_filter ip_tables x_tables bridge stp kvm_intel kvm nbd vesafb ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi dcdbas dm_multipath psmouse serio_raw ghes hed acpi_power_meter bonding i7core_edac lp joydev parport edac_core ses enclosure usbhid hid megaraid_sas bnx2
Jan 13 17:06:46 cmggcn01 kernel: [876590.313703] 
Jan 13 17:06:46 cmggcn01 kernel: [876590.337109] Pid: 93, comm: ksmd Not tainted 3.0.0-14-server #23-Ubuntu Dell Inc. PowerEdge R710/0MD99X
Jan 13 17:06:46 cmggcn01 kernel: [876590.362347] RIP: 0010:[<ffffffffa01b3411>]  [<ffffffffa01b3411>] kvm_set_pte_rmapp+0x51/0x130 [kvm]
Jan 13 17:06:46 cmggcn01 kernel: [876590.386424] RSP: 0018:ffff8817f4129bc0  EFLAGS: 00010202
Jan 13 17:06:46 cmggcn01 kernel: [876590.411497] RAX: 000088050d943ff8 RBX: 000088050d943ff8 RCX: ffffffffa01b33c0
Jan 13 17:06:46 cmggcn01 kernel: [876590.435831] RDX: ffff8817f4129c88 RSI: 0000000000000000 RDI: 000088050d943ff8
Jan 13 17:06:46 cmggcn01 kernel: [876590.460571] RBP: ffff8817f4129c00 R08: ffff880bf6012960 R09: 0000000000000100
Jan 13 17:06:46 cmggcn01 kernel: [876590.484258] R10: 00000000000000ab R11: 0000000000000002 R12: ffffc900288c3ff8
Jan 13 17:06:46 cmggcn01 kernel: [876590.507470] R13: ffff8817f4129c88 R14: ffff880b917d8000 R15: 00000000002f7279
Jan 13 17:06:46 cmggcn01 kernel: [876590.531508] FS:  0000000000000000(0000) GS:ffff88183fc40000(0000) knlGS:0000000000000000
Jan 13 17:06:46 cmggcn01 kernel: [876590.554747] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jan 13 17:06:46 cmggcn01 kernel: [876590.577706] CR2: 00007f3ac168a000 CR3: 0000000001c03000 CR4: 00000000000026e0
Jan 13 17:06:46 cmggcn01 kernel: [876590.602479] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 13 17:06:46 cmggcn01 kernel: [876590.624973] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jan 13 17:06:46 cmggcn01 kernel: [876590.647214] Process ksmd (pid: 93, threadinfo ffff8817f4128000, task ffff8817f63fdc80)
Jan 13 17:06:46 cmggcn01 kernel: [876590.669444] Stack:
Jan 13 17:06:46 cmggcn01 kernel: [876590.690742]  ffff8817f4129c10 ffffffffa01b3409 000000000000008f ffff880b6bea70b0
Jan 13 17:06:46 cmggcn01 kernel: [876590.713167]  0000000000000002 00007f54c4050000 ffff880b6bea7000 000000000010d9ff
Jan 13 17:06:46 cmggcn01 kernel: [876590.735595]  ffff8817f4129c70 ffffffffa01b0dd9 ffff8817f4129c80 ffffffffa01b33c0
Jan 13 17:06:46 cmggcn01 kernel: [876590.758191] Call Trace:
Jan 13 17:06:46 cmggcn01 kernel: [876590.780877]  [<ffffffffa01b3409>] ? kvm_set_pte_rmapp+0x49/0x130 [kvm]
Jan 13 17:06:46 cmggcn01 kernel: [876590.804088]  [<ffffffffa01b0dd9>] kvm_handle_hva+0x99/0x180 [kvm]
Jan 13 17:06:46 cmggcn01 kernel: [876590.829628]  [<ffffffffa01b33c0>] ? rmap_write_protect+0x150/0x150 [kvm]
Jan 13 17:06:46 cmggcn01 kernel: [876590.853510]  [<ffffffffa01b7ad1>] kvm_set_spte_hva+0x21/0x30 [kvm]
Jan 13 17:06:46 cmggcn01 kernel: [876590.876966]  [<ffffffffa019591d>] kvm_mmu_notifier_change_pte+0x5d/0x90 [kvm]
Jan 13 17:06:46 cmggcn01 kernel: [876590.901395]  [<ffffffff8114d87e>] __mmu_notifier_change_pte+0x3e/0x80
Jan 13 17:06:46 cmggcn01 kernel: [876590.925191]  [<ffffffff8114e07f>] write_protect_page+0x10f/0x170
Jan 13 17:06:46 cmggcn01 kernel: [876590.950156]  [<ffffffff8114e2df>] ? replace_page+0x1ff/0x280
Jan 13 17:06:46 cmggcn01 kernel: [876590.974177]  [<ffffffff8114e3e0>] try_to_merge_one_page+0x80/0x220
Jan 13 17:06:46 cmggcn01 kernel: [876590.999032]  [<ffffffff8114e5f7>] try_to_merge_with_ksm_page+0x77/0xc0
Jan 13 17:06:46 cmggcn01 kernel: [876591.022944]  [<ffffffff8114f616>] cmp_and_merge_page+0xe6/0x260
Jan 13 17:06:46 cmggcn01 kernel: [876591.046993]  [<ffffffff8114f83f>] ksm_scan_thread+0xaf/0x2a0
Jan 13 17:06:46 cmggcn01 kernel: [876591.070807]  [<ffffffff81081660>] ? add_wait_queue+0x60/0x60
Jan 13 17:06:46 cmggcn01 kernel: [876591.094345]  [<ffffffff8114f790>] ? cmp_and_merge_page+0x260/0x260
Jan 13 17:06:46 cmggcn01 kernel: [876591.118363]  [<ffffffff81080bbc>] kthread+0x8c/0xa0
Jan 13 17:06:46 cmggcn01 kernel: [876591.147851]  [<ffffffff81609164>] kernel_thread_helper+0x4/0x10
Jan 13 17:06:46 cmggcn01 kernel: [876591.178092]  [<ffffffff81080b30>] ? flush_kthread_worker+0xa0/0xa0
Jan 13 17:06:46 cmggcn01 kernel: [876591.203396]  [<ffffffff81609160>] ? gs_change+0x13/0x13
Jan 13 17:06:46 cmggcn01 kernel: [876591.226913] Code: 0f 85 e8 00 00 00 48 89 f8 66 66 66 90 49 8b 3c 24 49 89 c7 31 f6 49 c1 e7 12 49 c1 ef 1e e8 f7 fd ff ff 48 85 c0 48 89 c3 74 77 
Jan 13 17:06:46 cmggcn01 kernel: [876591.299573] RIP  [<ffffffffa01b3411>] kvm_set_pte_rmapp+0x51/0x130 [kvm]
Jan 13 17:06:46 cmggcn01 kernel: [876591.323386]  RSP <ffff8817f4129bc0>
Jan 13 17:06:46 cmggcn01 kernel: [876591.394334] ---[ end trace a6f88f15bc3d2aa0 ]---
Jan 13 17:07:46 cmggcn01 kernel: [876651.245759] INFO: rcu_sched_state detected stall on CPU 14 (t=15000 jiffies)
Jan 13 17:07:46 cmggcn01 kernel: [876651.249812] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 14} (detected by 22, t=15002 jiffies)




Jan 16 12:03:42 cmggcn01 kernel: [232444.624348] general protection fault: 0000 [#1] SMP 
Jan 16 12:03:42 cmggcn01 kernel: [232444.624791] CPU 14 
Jan 16 12:03:42 cmggcn01 kernel: [232444.624971] Modules linked in: ebt_arp ebt_ip 8021q garp ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp iptable_filter ip_tables x_tables bridge stp kvm_intel kvm nbd ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi vesafb bonding psmouse ghes acpi_power_meter dcdbas dm_multipath joydev hed i7core_edac serio_raw edac_core lp parport ses enclosure usbhid hid megaraid_sas bnx2
Jan 16 12:03:42 cmggcn01 kernel: [232444.630415] 
Jan 16 12:03:42 cmggcn01 kernel: [232444.630541] Pid: 92, comm: ksmd Not tainted 3.0.0-14-server #23-Ubuntu Dell Inc. PowerEdge R710/0MD99X
Jan 16 12:03:42 cmggcn01 kernel: [232444.631356] RIP: 0010:[<ffffffff8114ecee>]  [<ffffffff8114ecee>] remove_rmap_item_from_tree+0x9e/0x150
Jan 16 12:03:42 cmggcn01 kernel: [232444.632148] RSP: 0018:ffff8817f3e1fe10  EFLAGS: 00010286
Jan 16 12:03:42 cmggcn01 kernel: [232444.632589] RAX: ffff8817bbee8c30 RBX: ffff880bf5847fc0 RCX: ffff880bf56f8463
Jan 16 12:03:42 cmggcn01 kernel: [232444.633188] RDX: 0000880ba0f4d030 RSI: 0000000000020072 RDI: ffffea0028d70730
Jan 16 12:03:42 cmggcn01 kernel: [232444.633785] RBP: ffff8817f3e1fe30 R08: ffffea0028d70738 R09: ffff880c3fff6928
Jan 16 12:03:42 cmggcn01 kernel: [232444.634383] R10: 00000000000000b7 R11: ffffea0028e44180 R12: ffff880bf56f8460
Jan 16 12:03:42 cmggcn01 kernel: [232444.634980] R13: ffffea0028d70730 R14: ffff8817f3e1fe98 R15: ffff8817f3e20000
Jan 16 12:03:42 cmggcn01 kernel: [232444.635406] FS:  0000000000000000(0000) GS:ffff88183fce0000(0000) knlGS:0000000000000000
Jan 16 12:03:42 cmggcn01 kernel: [232444.635858] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jan 16 12:03:42 cmggcn01 kernel: [232444.636340] CR2: 0000000001c1be08 CR3: 0000000001c03000 CR4: 00000000000026e0
Jan 16 12:03:42 cmggcn01 kernel: [232444.636939] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 16 12:03:42 cmggcn01 kernel: [232444.637535] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jan 16 12:03:42 cmggcn01 kernel: [232444.638134] Process ksmd (pid: 92, threadinfo ffff8817f3e1e000, task ffff8817f3e20000)
Jan 16 12:03:42 cmggcn01 kernel: [232444.638949] Stack:
Jan 16 12:03:42 cmggcn01 kernel: [232444.639208]  ffff880bf60e4980 ffff8817f3e20000 ffffea0022feb118 ffff880bf5847fc0
Jan 16 12:03:42 cmggcn01 kernel: [232444.640235]  ffff8817f3e1fe70 ffffffff8114f55a ffff8817f3e1fe60 0000000000000000
Jan 16 12:03:42 cmggcn01 kernel: [232444.640953]  ffff8817f3e20000 000000000000005b ffff8817f3e20000 ffff8817f3e1fe98
Jan 16 12:03:42 cmggcn01 kernel: [232444.641622] Call Trace:
Jan 16 12:03:42 cmggcn01 kernel: [232444.641828]  [<ffffffff8114f55a>] cmp_and_merge_page+0x2a/0x260
Jan 16 12:03:42 cmggcn01 kernel: [232444.642325]  [<ffffffff8114f83f>] ksm_scan_thread+0xaf/0x2a0
Jan 16 12:03:42 cmggcn01 kernel: [232444.642796]  [<ffffffff81081660>] ? add_wait_queue+0x60/0x60
Jan 16 12:03:42 cmggcn01 kernel: [232444.643269]  [<ffffffff8114f790>] ? cmp_and_merge_page+0x260/0x260
Jan 16 12:03:42 cmggcn01 kernel: [232444.643785]  [<ffffffff81080bbc>] kthread+0x8c/0xa0
Jan 16 12:03:42 cmggcn01 kernel: [232444.644198]  [<ffffffff81609164>] kernel_thread_helper+0x4/0x10
Jan 16 12:03:42 cmggcn01 kernel: [232444.644690]  [<ffffffff81080b30>] ? flush_kthread_worker+0xa0/0xa0
Jan 16 12:03:42 cmggcn01 kernel: [232444.645207]  [<ffffffff81609160>] ? gs_change+0x13/0x13
Jan 16 12:03:42 cmggcn01 kernel: [232444.645642] Code: 28 4c 89 e7 e8 84 fc ff ff 48 85 c0 49 89 c5 74 d2 f0 0f ba 28 00 19 c0 85 c0 0f 85 a1 00 00 00 48 8b 43 30 48 8b 53 38 48 85 c0 
Jan 16 12:03:42 cmggcn01 kernel: [232444.648047] RIP  [<ffffffff8114ecee>] remove_rmap_item_from_tree+0x9e/0x150
Jan 16 12:03:42 cmggcn01 kernel: [232444.648721]  RSP <ffff8817f3e1fe10>
Jan 16 12:03:42 cmggcn01 kernel: [232444.695308] ---[ end trace 1466b29b5c8949e3 ]---



Jan 18 11:52:42 cmggcn01 kernel: [89218.740228] general protection fault: 0000 [#1] SMP 
Jan 18 11:52:42 cmggcn01 kernel: [89218.740662] CPU 3 
Jan 18 11:52:42 cmggcn01 kernel: [89218.740823] Modules linked in: ebt_arp ebt_ip 8021q garp ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp iptable_filter ip_tables x_tables bridge stp kvm_intel kvm nbd vesafb ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi psmouse dcdbas dm_multipath serio_raw joydev ghes hed acpi_power_meter bonding lp parport i7core_edac edac_core ses enclosure usbhid hid megaraid_sas bnx2
Jan 18 11:52:42 cmggcn01 kernel: [89218.745746] 
Jan 18 11:52:42 cmggcn01 kernel: [89218.745869] Pid: 92, comm: ksmd Not tainted 3.0.0-14-server #23-Ubuntu Dell Inc. PowerEdge R710/0MD99X
Jan 18 11:52:42 cmggcn01 kernel: [89218.746672] RIP: 0010:[<ffffffff812edf23>]  [<ffffffff812edf23>] rb_insert_color+0x43/0x150
Jan 18 11:52:42 cmggcn01 kernel: [89218.747377] RSP: 0018:ffff8817f3e4ddb8  EFLAGS: 00010206
Jan 18 11:52:42 cmggcn01 kernel: [89218.748058] RAX: ffff88135bb7ffe8 RBX: ffff88135bb7ffe8 RCX: 0000000000000008
Jan 18 11:52:42 cmggcn01 kernel: [89218.748982] RDX: ffff8809af4934e8 RSI: ffffffff81ee9840 RDI: ffff88098e40d9e8
Jan 18 11:52:42 cmggcn01 kernel: [89218.749617] RBP: ffff8817f3e4dde0 R08: 0000000000000079 R09: 0000000000000029
Jan 18 11:52:42 cmggcn01 kernel: [89218.750208] R10: ffff880aac5be000 R11: 0000000000000001 R12: ffff8809af4934e8
Jan 18 11:52:42 cmggcn01 kernel: [89218.750794] R13: 00008809ae4a5ae8 R14: ffff88098e40d9e8 R15: ffffffff81ee9840
Jan 18 11:52:42 cmggcn01 kernel: [89218.751386] FS:  0000000000000000(0000) GS:ffff880c3fc20000(0000) knlGS:0000000000000000
Jan 18 11:52:42 cmggcn01 kernel: [89218.752054] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jan 18 11:52:42 cmggcn01 kernel: [89218.752526] CR2: 00007f819ff52000 CR3: 0000000001c03000 CR4: 00000000000026e0
Jan 18 11:52:42 cmggcn01 kernel: [89218.753116] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 18 11:52:42 cmggcn01 kernel: [89218.753708] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jan 18 11:52:42 cmggcn01 kernel: [89218.754296] Process ksmd (pid: 92, threadinfo ffff8817f3e4c000, task ffff8817f63a9720)
Jan 18 11:52:42 cmggcn01 kernel: [89218.754956] Stack:
Jan 18 11:52:42 cmggcn01 kernel: [89218.755120]  ffffea00255b4190 ffff88098e40d9c0 ffffea00264a9c80 ffff8809af4934f0
Jan 18 11:52:42 cmggcn01 kernel: [89218.755785]  0000000000000000 ffff8817f3e4de30 ffffffff8114e855 ffff8809af4934e8
Jan 18 11:52:42 cmggcn01 kernel: [89218.756276]  ffff8817f3e4de48 ffff8817f3e4de30 ffffea0052a42d28 ffffea00255b4190
Jan 18 11:52:42 cmggcn01 kernel: [89218.756711] Call Trace:
Jan 18 11:52:42 cmggcn01 kernel: [89218.756892]  [<ffffffff8114e855>] unstable_tree_search_insert+0xe5/0x150
Jan 18 11:52:42 cmggcn01 kernel: [89218.757499]  [<ffffffff8114f690>] cmp_and_merge_page+0x160/0x260
Jan 18 11:52:42 cmggcn01 kernel: [89218.758284]  [<ffffffff8114f83f>] ksm_scan_thread+0xaf/0x2a0
Jan 18 11:52:42 cmggcn01 kernel: [89218.759010]  [<ffffffff81081660>] ? add_wait_queue+0x60/0x60
Jan 18 11:52:42 cmggcn01 kernel: [89218.759524]  [<ffffffff8114f790>] ? cmp_and_merge_page+0x260/0x260
Jan 18 11:52:42 cmggcn01 kernel: [89218.760037]  [<ffffffff81080bbc>] kthread+0x8c/0xa0
Jan 18 11:52:42 cmggcn01 kernel: [89218.760438]  [<ffffffff81609164>] kernel_thread_helper+0x4/0x10
Jan 18 11:52:42 cmggcn01 kernel: [89218.760927]  [<ffffffff81080b30>] ? flush_kthread_worker+0xa0/0xa0
Jan 18 11:52:42 cmggcn01 kernel: [89218.761439]  [<ffffffff81609160>] ? gs_change+0x13/0x13
Jan 18 11:52:42 cmggcn01 kernel: [89218.761862] Code: 0f 1f 84 00 00 00 00 00 49 83 e4 fc 74 4a 49 8b 04 24 a8 01 75 42 48 89 c3 48 83 e3 fc 4c 8b 6b 10 4d 39 e5 74 7a 4d 85 ed 74 45 
Jan 18 11:52:42 cmggcn01 kernel: [89218.764260] RIP  [<ffffffff812edf23>] rb_insert_color+0x43/0x150
Jan 18 11:52:42 cmggcn01 kernel: [89218.764772]  RSP <ffff8817f3e4ddb8>
Jan 18 11:52:42 cmggcn01 kernel: [89218.811669] ---[ end trace e5a6d0f7fdfec15f ]---



Jan 27 13:41:28 cmggcn01 kernel: [871350.761867] general protection fault: 0000 [#2] SMP 
Jan 27 13:41:28 cmggcn01 kernel: [871350.790117] CPU 14 
Jan 27 13:41:28 cmggcn01 kernel: [871350.790387] Modules linked in: btrfs zlib_deflate libcrc32c ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs reiserfs ebt_arp ebt_ip 8021q garp ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp iptable_filter ip_tables x_tables bridge stp kvm_intel kvm nbd vesafb ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi psmouse dcdbas dm_multipath serio_raw joydev ghes hed acpi_power_meter bonding lp parport i7core_edac edac_core ses enclosure usbhid hid megaraid_sas bnx2
Jan 27 13:41:28 cmggcn01 kernel: [871351.005151] 
Jan 27 13:41:28 cmggcn01 kernel: [871351.036187] Pid: 90, comm: kswapd0 Tainted: G      D     3.0.0-14-server #23-Ubuntu Dell Inc. PowerEdge R710/0MD99X
Jan 27 13:41:28 cmggcn01 kernel: [871351.072809] RIP: 0010:[<ffffffffa01a5890>]  [<ffffffffa01a5890>] kvm_unmap_rmapp+0x20/0x60 [kvm]
Jan 27 13:41:28 cmggcn01 kernel: [871351.105190] RSP: 0018:ffff8817f3e27a60  EFLAGS: 00010202
Jan 27 13:41:28 cmggcn01 kernel: [871351.141329] RAX: 00008817f5d067f8 RBX: ffffc9001fd41ff8 RCX: ffffffffa01a58d0
Jan 27 13:41:28 cmggcn01 kernel: [871351.179076] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00008817f5d067f8
Jan 27 13:41:28 cmggcn01 kernel: [871351.212086] RBP: ffff8817f3e27a80 R08: ffff8817f315b3e0 R09: 0000000000000100
Jan 27 13:41:28 cmggcn01 kernel: [871351.245788] R10: 000000000000000e R11: 0000000000000002 R12: ffff8817f2f0c000
Jan 27 13:41:28 cmggcn01 kernel: [871351.277514] R13: 0000000000000000 R14: ffff880be235e000 R15: 00000000000d3cff
Jan 27 13:41:28 cmggcn01 kernel: [871351.308421] FS:  0000000000000000(0000) GS:ffff88183fce0000(0000) knlGS:0000000000000000
Jan 27 13:41:28 cmggcn01 kernel: [871351.339685] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jan 27 13:41:28 cmggcn01 kernel: [871351.370089] CR2: 00007f8836442000 CR3: 0000000001c03000 CR4: 00000000000026e0
Jan 27 13:41:28 cmggcn01 kernel: [871351.399771] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 27 13:41:28 cmggcn01 kernel: [871351.428208] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jan 27 13:41:28 cmggcn01 kernel: [871351.456153] Process kswapd0 (pid: 90, threadinfo ffff8817f3e26000, task ffff8817f63ac560)
Jan 27 13:41:28 cmggcn01 kernel: [871351.484943] Stack:
Jan 27 13:41:28 cmggcn01 kernel: [871351.512984]  0000000000000000 ffffc9001fd41ff8 0000000000000001 00007f834a87e000
Jan 27 13:41:28 cmggcn01 kernel: [871351.542025]  ffff8817f3e27aa0 ffffffffa01a5945 ffff880be235e060 0000000000000001
Jan 27 13:41:28 cmggcn01 kernel: [871351.571050]  ffff8817f3e27b10 ffffffffa01a1dd9 ffff8817f3e27ae0 ffffffffa01a58d0
Jan 27 13:41:28 cmggcn01 kernel: [871351.600455] Call Trace:
Jan 27 13:41:28 cmggcn01 kernel: [871351.628903]  [<ffffffffa01a5945>] kvm_age_rmapp+0x75/0x90 [kvm]
Jan 27 13:41:28 cmggcn01 kernel: [871351.659242]  [<ffffffffa01a1dd9>] kvm_handle_hva+0x99/0x180 [kvm]
Jan 27 13:41:28 cmggcn01 kernel: [871351.687386]  [<ffffffffa01a58d0>] ? kvm_unmap_rmapp+0x60/0x60 [kvm]
Jan 27 13:41:28 cmggcn01 kernel: [871351.716053]  [<ffffffffa01a8af7>] kvm_age_hva+0x17/0x20 [kvm]
Jan 27 13:41:28 cmggcn01 kernel: [871351.746652]  [<ffffffffa018a4dd>] kvm_mmu_notifier_clear_flush_young+0x4d/0x90 [kvm]
Jan 27 13:41:28 cmggcn01 kernel: [871351.774490]  [<ffffffff8114d7b8>] __mmu_notifier_clear_flush_young+0x48/0x60
Jan 27 13:41:28 cmggcn01 kernel: [871351.801948]  [<ffffffff81138f1b>] page_referenced_one+0x18b/0x1f0
Jan 27 13:41:28 cmggcn01 kernel: [871351.827654]  [<ffffffff8113a8a5>] page_referenced_anon+0xd5/0x130
Jan 27 13:41:28 cmggcn01 kernel: [871351.852371]  [<ffffffff8113a9c8>] page_referenced+0xc8/0xf0
Jan 27 13:41:28 cmggcn01 kernel: [871351.875896]  [<ffffffff8111cbe9>] shrink_active_list.isra.50+0x1d9/0x370
Jan 27 13:41:28 cmggcn01 kernel: [871351.899681]  [<ffffffff811146cd>] ? throttle_vm_writeout+0x3d/0xa0
Jan 27 13:41:28 cmggcn01 kernel: [871351.922135]  [<ffffffff8111dd8b>] balance_pgdat+0x16b/0x6f0
Jan 27 13:41:28 cmggcn01 kernel: [871351.944598]  [<ffffffff8111e3fa>] kswapd+0xea/0x1f0
Jan 27 13:41:28 cmggcn01 kernel: [871351.967441]  [<ffffffff8111e310>] ? balance_pgdat+0x6f0/0x6f0
Jan 27 13:41:28 cmggcn01 kernel: [871351.989825]  [<ffffffff81080bbc>] kthread+0x8c/0xa0
Jan 27 13:41:28 cmggcn01 kernel: [871352.012498]  [<ffffffff81609164>] kernel_thread_helper+0x4/0x10
Jan 27 13:41:28 cmggcn01 kernel: [871352.035112]  [<ffffffff81080b30>] ? flush_kthread_worker+0xa0/0xa0
Jan 27 13:41:28 cmggcn01 kernel: [871352.057481]  [<ffffffff81609160>] ? gs_change+0x13/0x13
Jan 27 13:41:28 cmggcn01 kernel: [871352.080582] Code: e7 d0 e8 e0 66 90 e9 a2 fe ff ff 55 48 89 e5 41 55 41 54 53 48 83 ec 08 66 66 66 66 90 45 31 ed 49 89 fc 48 89 f3 eb 20 0f 1f 00 <f6> 00 01 74 35 48 8b 15 74 7a 02 00 48 89 c6 4c 89 e7 41 bd 01 
Jan 27 13:41:28 cmggcn01 kernel: [871352.128618] RIP  [<ffffffffa01a5890>] kvm_unmap_rmapp+0x20/0x60 [kvm]
Jan 27 13:41:28 cmggcn01 kernel: [871352.152218]  RSP <ffff8817f3e27a60>
Jan 27 13:41:28 cmggcn01 kernel: [871352.221414] ---[ end trace e5a6d0f7fdfec160 ]---
Jan 27 13:42:28 cmggcn01 kernel: [871412.095485] INFO: rcu_sched_state detected stall on CPU 0 (t=15000 jiffies)
Jan 27 13:42:28 cmggcn01 kernel: [871412.095493] INFO: rcu_sched_state detected stall on CPU 17 (t=15000 jiffies)
Jan 27 13:45:29 cmggcn01 kernel: [871591.757675] INFO: rcu_sched_state detected stall on CPU 17 (t=60030 jiffies)
Jan 27 13:45:29 cmggcn01 kernel: [871591.757686] INFO: rcu_sched_state detected stall on CPU 0 (t=60030 jiffies)
Jan 27 13:48:29 cmggcn01 kernel: [871771.419867] INFO: rcu_sched_state detected stall on CPU 0 (t=105060 jiffies)
Jan 27 13:48:29 cmggcn01 kernel: [871771.419878] INFO: rcu_sched_state detected stall on CPU 17 (t=105060 jiffies)
Comment 1 Avi Kivity 2012-01-31 17:15:48 UTC
> Jan 27 13:41:28 cmggcn01 kernel: [871350.761867] general protection fault:
> 0000 [#2] SMP 
> Jan 27 13:41:28 cmggcn01 kernel: [871350.790117] CPU 14 
> Jan 27 13:41:28 cmggcn01 kernel: [871350.790387] Modules linked in: btrfs
> zlib_deflate libcrc32c ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs
> reiserfs ebt_arp ebt_ip 8021q garp ip6table_filter ip6_tables ebtable_nat
> ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4
> xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp
> iptable_filter ip_tables x_tables bridge stp kvm_intel kvm nbd vesafb ib_iser
> rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp
> libiscsi scsi_transport_iscsi psmouse dcdbas dm_multipath serio_raw joydev
> ghes hed acpi_power_meter bonding lp parport i7core_edac edac_core ses
> enclosure usbhid hid megaraid_sas bnx2
> Jan 27 13:41:28 cmggcn01 kernel: [871351.005151] 
> Jan 27 13:41:28 cmggcn01 kernel: [871351.036187] Pid: 90, comm: kswapd0
> Tainted: G      D     3.0.0-14-server #23-Ubuntu Dell Inc. PowerEdge
> R710/0MD99X
> Jan 27 13:41:28 cmggcn01 kernel: [871351.072809] RIP:
> 0010:[<ffffffffa01a5890>]  [<ffffffffa01a5890>] kvm_unmap_rmapp+0x20/0x60
> [kvm]
> Jan 27 13:41:28 cmggcn01 kernel: [871351.105190] RSP: 0018:ffff8817f3e27a60 
> EFLAGS: 00010202
> Jan 27 13:41:28 cmggcn01 kernel: [871351.141329] RAX: 00008817f5d067f8 RBX:
> ffffc9001fd41ff8 RCX: ffffffffa01a58d0
> Jan 27 13:41:28 cmggcn01 kernel: [871351.179076] RDX: 0000000000000000 RSI:
> 0000000000000000 RDI: 00008817f5d067f8
> Jan 27 13:41:28 cmggcn01 kernel: [871351.212086] RBP: ffff8817f3e27a80 R08:
> ffff8817f315b3e0 R09: 0000000000000100
> Jan 27 13:41:28 cmggcn01 kernel: [871351.245788] R10: 000000000000000e R11:
> 0000000000000002 R12: ffff8817f2f0c000
> Jan 27 13:41:28 cmggcn01 kernel: [871351.277514] R13: 0000000000000000 R14:
> ffff880be235e000 R15: 00000000000d3cff
> Jan 27 13:41:28 cmggcn01 kernel: [871351.308421] FS:  0000000000000000(0000)
> GS:ffff88183fce0000(0000) knlGS:0000000000000000
> Jan 27 13:41:28 cmggcn01 kernel: [871351.339685] CS:  0010 DS: 0000 ES: 0000
> CR0: 000000008005003b
> Jan 27 13:41:28 cmggcn01 kernel: [871351.370089] CR2: 00007f8836442000 CR3:
> 0000000001c03000 CR4: 00000000000026e0
> Jan 27 13:41:28 cmggcn01 kernel: [871351.399771] DR0: 0000000000000000 DR1:
> 0000000000000000 DR2: 0000000000000000
> Jan 27 13:41:28 cmggcn01 kernel: [871351.428208] DR3: 0000000000000000 DR6:
> 00000000ffff0ff0 DR7: 0000000000000400
> Jan 27 13:41:28 cmggcn01 kernel: [871351.456153] Process kswapd0 (pid: 90,
> threadinfo ffff8817f3e26000, task ffff8817f63ac560)
> Jan 27 13:41:28 cmggcn01 kernel: [871351.484943] Stack:
> Jan 27 13:41:28 cmggcn01 kernel: [871351.512984]  0000000000000000
> ffffc9001fd41ff8 0000000000000001 00007f834a87e000
> Jan 27 13:41:28 cmggcn01 kernel: [871351.542025]  ffff8817f3e27aa0
> ffffffffa01a5945 ffff880be235e060 0000000000000001
> Jan 27 13:41:28 cmggcn01 kernel: [871351.571050]  ffff8817f3e27b10
> ffffffffa01a1dd9 ffff8817f3e27ae0 ffffffffa01a58d0
<snip>

> Jan 27 13:41:28 cmggcn01 kernel: [871352.080582] Code: e7 d0 e8 e0 66 90 e9
> a2 fe ff ff 55 48 89 e5 41 55 41 54 53 48 83 ec 08 66 66 66 66 90 45 31 ed 49
> 89 fc 48 89 f3 eb 20 0f 1f 00 <f6> 00 01 74 35 48 8b 15 74 7a 02 00 48 89 c6
> 4c 89 e7 41 bd 01 

   0:    e8 e0 66 90 e9           callq  0xffffffffe99066e5
   5:    a2 fe ff ff 55 48 89     mov    %al,0x41e5894855fffffe
   c:    e5 41
   e:    55                       push   %rbp
   f:    41 54                    push   %r12
  11:    53                       push   %rbx
  12:    48 83 ec 08              sub    $0x8,%rsp
  16:    66 66 66 66 90           data32 data32 data32 xchg %ax,%ax
  1b:    45 31 ed                 xor    %r13d,%r13d
  1e:    49 89 fc                 mov    %rdi,%r12
  21:    48 89 f3                 mov    %rsi,%rbx
  24:    eb 20                    jmp    0x46
  26:    0f 1f 00                 nopl   (%rax)
  29:    f6 00 01                 testb  $0x1,(%rax)

^ dies here, %rax is non-canonical.

  2c:    74 35                    je     0x63
  2e:    48 8b 15 74 7a 02 00     mov    0x27a74(%rip),%rdx        # 0x27aa9
  35:    48 89 c6                 mov    %rax,%rsi
  38:    4c 89 e7                 mov    %r12,%rdi
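
As a cross-check on the listing above, the faulting byte can also be located directly in the quoted Code: line. The sketch below is illustrative only: the names parse_oops_code and oops_code are made up for this example and are not kernel APIs. It parses the hex dump, records the index of the byte printed in angle brackets (the byte RIP pointed at), and lets the caller step back by the 0x20 offset reported in "RIP: kvm_unmap_rmapp+0x20/0x60" to find the function entry inside the dump.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Code: bytes copied from the Jan 27 oops quoted above. */
static const char oops_code[] =
    "e7 d0 e8 e0 66 90 e9 a2 fe ff ff 55 48 89 e5 41 55 41 54 53 48 83 ec 08 "
    "66 66 66 66 90 45 31 ed 49 89 fc 48 89 f3 eb 20 0f 1f 00 <f6> 00 01 74 35 "
    "48 8b 15 74 7a 02 00 48 89 c6 4c 89 e7 41 bd 01";

/*
 * Hypothetical helper (not a kernel function): parse an oops "Code:" hex
 * dump into out[]. The index of the byte shown in angle brackets is stored
 * in *fault_idx, or -1 if the dump has no marker.
 * Returns the number of bytes parsed.
 */
static size_t parse_oops_code(const char *s, unsigned char *out, size_t cap,
                              long *fault_idx)
{
    size_t n = 0;

    *fault_idx = -1;
    while (*s != '\0' && n < cap) {
        if (*s == '<') {
            *fault_idx = (long)n;   /* next byte is the one at RIP */
            s++;
        } else if (isxdigit((unsigned char)*s)) {
            char *end;

            out[n++] = (unsigned char)strtoul(s, &end, 16);
            s = end;
        } else {
            s++;                    /* skip spaces and the closing '>' */
        }
    }
    return n;
}
```

For this dump the marked byte f6 sits at index 43, and 43 - 0x20 = 11 is the function entry, where the byte is 55 (push %rbp). The listing above starts decoding at dump index 2, which is why the faulting instruction shows up there at offset 0x29; it also suggests that the first few lines of the listing are decoded out of step with the real instruction boundaries.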


static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
               unsigned long data)
{
    u64 *spte;
    int need_tlb_flush = 0;

    while ((spte = rmap_next(kvm, rmapp, NULL))) {
        BUG_ON(!(*spte & PT_PRESENT_MASK));

^ here, when fetching *spte.

        rmap_printk("kvm_rmap_unmap_hva: spte %p %llx\n", spte, *spte);
        drop_spte(kvm, spte);
        need_tlb_flush = 1;
    }
    return need_tlb_flush;
}

Looks like a use-after-free with the two bytes at offset 6 zeroed.
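
To make the "non-canonical" and "two bytes at offset 6 zeroed" observations concrete, here is a minimal sketch. It assumes 48-bit x86-64 virtual addressing, where bits 63..47 of a valid address must all be equal. The helper names is_canonical48 and restore_top_bytes are invented for illustration and are not kernel functions.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if addr is canonical under 48-bit virtual addressing (assumed here):
 * bits 63..47 must be all zeros or all ones.
 */
static bool is_canonical48(uint64_t addr)
{
    uint64_t top = addr >> 47;      /* the 17 bits 63..47 */

    return top == 0 || top == 0x1ffff;
}

/*
 * Hypothetical "undo" of the suspected corruption: set the two most
 * significant bytes (offsets 6 and 7 of the little-endian 8-byte value)
 * back to 0xff.
 */
static uint64_t restore_top_bytes(uint64_t addr)
{
    return addr | 0xffff000000000000ULL;
}
```

Applied to RAX from the trace, 00008817f5d067f8 is rejected as non-canonical, while restoring the top two bytes gives ffff8817f5d067f8, which falls in the same ffff88... range as the other kernel pointers in the register dump (for example R08: ffff8817f315b3e0). Registers in the three earlier traces hold values of the same shape (RAX 000088050d943ff8, RDX 0000880ba0f4d030, R13 00008809ae4a5ae8), although whether those are the faulting addresses in each case is an assumption.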

If this is reproducible, please rerun with the host kernel parameter
slub_debug=FZPU.
Comment 2 Gleb 2012-01-31 18:15:32 UTC
Are you using bridge and netfilter? Can you disable both? This looks similar to https://bugzilla.kernel.org/show_bug.cgi?id=27052
Comment 3 Bram De Wilde 2012-01-31 19:04:34 UTC
(In reply to comment #2)
> Are you using bridge and netfilter? Can you disable both? This looks similar
> to
> https://bugzilla.kernel.org/show_bug.cgi?id=27052

Indeed, I'm running both. Both are required to run VMs in the OpenStack configuration, so simply disabling them would break my cloud config... I guess? Are there any alternatives?
Meanwhile, I have upgraded to the 3.2.2 kernel to see whether the problem persists. I will reboot with "slub_debug=FZPU" on the next crash.

lsmod:
Module                  Size  Used by
des_generic            21415  0 
md4                    12595  0 
nls_utf8               12557  1 
cifs                  281484  2 
ebt_arp                12585  108 
ebt_ip                 12538  36 
8021q                  24151  0 
garp                   14313  1 8021q
ip6table_filter        12815  0 
ip6_tables             27617  1 ip6table_filter
ebtable_nat            12807  1 
ebtables               30966  1 ebtable_nat
ipt_MASQUERADE         12759  3 
xt_state               12578  25 
ipt_REJECT             12576  2 
xt_CHECKSUM            12549  1 
iptable_mangle         12695  1 
xt_tcpudp              12603  61 
iptable_nat            13182  1 
nf_nat                 25545  2 ipt_MASQUERADE,iptable_nat
nf_conntrack_ipv4      19588  28 iptable_nat,nf_nat
nf_conntrack           81527  5 ipt_MASQUERADE,xt_state,iptable_nat,nf_nat,nf_conntrack_ipv4
nf_defrag_ipv4         12729  1 nf_conntrack_ipv4
iptable_filter         12810  1 
kvm_intel             136560  61 
ip_tables              27227  3 iptable_mangle,iptable_nat,iptable_filter
x_tables               29727  14 ebt_arp,ebt_ip,ip6table_filter,ip6_tables,ebtables,ipt_MASQUERADE,xt_state,ipt_REJECT,xt_CHECKSUM,iptable_mangle,xt_tcpudp,iptable_nat,iptable_filter,ip_tables
bridge                 90674  0 
kvm                   404475  1 kvm_intel
stp                    12931  2 garp,bridge
nbd                    17712  0 
ib_iser                38366  0 
rdma_cm                43625  1 ib_iser
ib_cm                  47663  1 rdma_cm
iw_cm                  18705  1 rdma_cm
ib_sa                  28854  2 rdma_cm,ib_cm
ib_mad                 47570  2 ib_cm,ib_sa
ib_core                82371  6 ib_iser,rdma_cm,ib_cm,iw_cm,ib_sa,ib_mad
ib_addr                14109  1 rdma_cm
iscsi_tcp              18447  0 
libiscsi_tcp           20862  1 iscsi_tcp
libiscsi               57321  3 ib_iser,iscsi_tcp,libiscsi_tcp
scsi_transport_iscsi    53383  4 ib_iser,iscsi_tcp,libiscsi
ext2                   73217  1 
bonding               108597  0 
psmouse                73859  0 
dcdbas                 14438  0 
serio_raw              13211  0 
joydev                 17597  0 
i7core_edac            27864  0 
edac_core              53411  4 i7core_edac
lp                     17789  0 
dm_multipath           23141  0 
parport                46360  1 lp
mac_hid                13205  0 
acpi_power_meter       18139  0 
ses                    17385  0 
enclosure              15209  1 ses
usbhid                 46754  0 
hid                    99171  1 usbhid
megaraid_sas           87049  2 
bnx2                   85274  0
Comment 4 Alan 2012-08-30 14:21:44 UTC
Given the silence since January, I assume this is fixed.