Bug 68051 - Kernel stack trace when attempt to boot into 3.13.0-0.rc6.git0.1 results in "kernel BUG at mm/page_alloc.c:2788"
Summary: Kernel stack trace when attempt to boot into 3.13.0-0.rc6.git0.1 results in "...
Status: RESOLVED CODE_FIX
Alias: None
Product: Virtualization
Classification: Unclassified
Component: kvm (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: virtualization_kvm
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-01-02 14:35 UTC by Kashyap Chamarthy
Modified: 2014-06-20 04:47 UTC (History)
2 users (show)

See Also:
Kernel Version: 3.13.0-0.rc6.git0.1.fc21.x86_64+debug
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Complete stdout of dmesg (66.98 KB, text/plain)
2014-01-02 14:36 UTC, Kashyap Chamarthy
Details
Complete stdout of dmidecode (14.28 KB, text/plain)
2014-01-02 14:37 UTC, Kashyap Chamarthy
Details

Description Kashyap Chamarthy 2014-01-02 14:35:23 UTC
Description of problem
----------------------

Booting into 3.13.0-0.rc6.git0.1.fc21.x86_64+debug Kernel causes the below Stack trace (refer further below for complete trace):
========
.
.
.
Jan  2 13:17:59 tesla kernel: [54532.561299] ------------[ cut here ]------------
Jan  2 13:17:59 tesla kernel: [54532.561342] kernel BUG at mm/page_alloc.c:2788!
Jan  2 13:17:59 tesla kernel: [54532.561357] invalid opcode: 0000 [#1] SMP 
.
.
.
Jan  2 13:17:59 tesla kernel: [54532.562180]  [<ffffffffa0766f66>] free_loaded_vmcs+0x26/0x30 [kvm_intel]
Jan  2 13:17:59 tesla kernel: [54532.562203]  [<ffffffffa076dea3>] vmx_free_vcpu+0x33/0x70 [kvm_intel]
Jan  2 13:17:59 tesla kernel: [54532.562232]  [<ffffffffa05c2700>] kvm_arch_vcpu_free+0x50/0x60 [kvm]
Jan  2 13:17:59 tesla kernel: [54532.562259]  [<ffffffffa05c3312>] kvm_arch_destroy_vm+0x102/0x240 [kvm]
Jan  2 13:17:59 tesla kernel: [54532.562281]  [<ffffffff810ed77d>] ? synchronize_srcu+0x1d/0x20
Jan  2 13:17:59 tesla kernel: [54532.562300]  [<ffffffff811d3a8e>] ? mmu_notifier_unregister+0xee/0x130
Jan  2 13:17:59 tesla kernel: [54532.562331]  [<ffffffffa05a8be1>] kvm_put_kvm+0xe1/0x190 [kvm]
Jan  2 13:17:59 tesla kernel: [54532.562353]  [<ffffffffa05a8cc8>] kvm_vcpu_release+0x18/0x20 [kvm]
.
.
.
========



Version info
-------------

  $ uname -r; rpm -q qemu-system-x86
  3.13.0-0.rc6.git0.1.fc21.x86_64+debug
  qemu-system-x86-1.7.0-3.fc21.x86_64


Stack trace from logs
---------------------

========
.
.
.
Jan  2 13:17:59 tesla kernel: [54532.561299] ------------[ cut here ]------------
Jan  2 13:17:59 tesla kernel: [54532.561342] kernel BUG at mm/page_alloc.c:2788!
Jan  2 13:17:59 tesla kernel: [54532.561357] invalid opcode: 0000 [#1] SMP 
Jan  2 13:17:59 tesla kernel: [54532.561374] Modules linked in: cdc_acm usb_storage vfat fat mmc_block vhost_net vhost macvtap macvlan xt_CHECKSUM tun
 bridge stp llc ebtable_nat fuse nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_nat nf_nat_ipv6 ip6table_mangle ip6table_secur
ity ip6table_raw ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle iptable_security iptable_raw nf_conntrack_
ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables openvswitch vxlan ip_tunnel gre libcrc32c dm_crypt iT
CO_wdt iTCO_vendor_support x86_pkg_temp_thermal coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel microcode snd_hd
a_codec_hdmi snd_hda_codec_conexant uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core serio_raw videodev media arc4 btusb iwldvm bluetooth ma
c80211 snd_hda_intel snd_hda_codec iwlwifi snd_hwdep snd_seq sdhci_pci lpc_ich sdhci snd_seq_device i2c_i801 mfd_core mmc_core cfg80211 snd_pcm e1000e
 ptp pps_core snd_page_alloc mei_me shpchp snd_timer mei wmi thinkpad_acpi snd soundcore rfkill binfmt_misc uinput i915 i2c_algo_bit drm_kms_helper dr
m i2c_core video
Jan  2 13:17:59 tesla kernel: [54532.561783] CPU: 0 PID: 3158 Comm: qemu-system-x86 Not tainted 3.13.0-0.rc6.git0.1.fc21.x86_64+debug #1
Jan  2 13:17:59 tesla kernel: [54532.561817] Hardware name: LENOVO 4291IQ1/4291IQ1, BIOS 8DET63WW (1.33 ) 07/19/2012
Jan  2 13:17:59 tesla kernel: [54532.561840] task: ffff8801e2f325f0 ti: ffff8801df3fe000 task.ti: ffff8801df3fe000
Jan  2 13:17:59 tesla kernel: [54532.561863] RIP: 0010:[<ffffffff8118a21d>]  [<ffffffff8118a21d>] free_pages+0x6d/0x70
Jan  2 13:17:59 tesla kernel: [54532.561891] RSP: 0018:ffff8801df3ffb78  EFLAGS: 00010246
Jan  2 13:17:59 tesla kernel: [54532.561907] RAX: 0000000000000000 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000001
Jan  2 13:17:59 tesla kernel: [54532.561929] RDX: 6b6b6b6beb6b6b6b RSI: 0000000000000000 RDI: 6b6be36b6b6b6b6b
Jan  2 13:17:59 tesla kernel: [54532.561950] RBP: ffff8801df3ffb88 R08: ffffea0002edce60 R09: 0000000000000000
Jan  2 13:17:59 tesla kernel: [54532.561971] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
Jan  2 13:17:59 tesla kernel: [54532.561993] R13: ffff8801d79002c0 R14: ffff8801d79002d8 R15: ffff8801d79030c0
Jan  2 13:17:59 tesla kernel: [54532.562014] FS:  00007ffc1d663700(0000) GS:ffff880214c00000(0000) knlGS:0000000000000000
Jan  2 13:17:59 tesla kernel: [54532.562038] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan  2 13:17:59 tesla kernel: [54532.562055] CR2: 00007fdf33f94100 CR3: 0000000001c0c000 CR4: 00000000000427e0
Jan  2 13:17:59 tesla kernel: [54532.562076] Stack:
Jan  2 13:17:59 tesla kernel: [54532.562084]  00000000df3ffb88 ffff880009390638 ffff8801df3ffba0 ffffffffa0766f66
Jan  2 13:17:59 tesla kernel: [54532.562113]  ffff8800af0d8000 ffff8801df3ffbb8 ffffffffa076dea3 ffff8800af0d8000
Jan  2 13:17:59 tesla kernel: [54532.562140]  ffff8801df3ffbd0 ffffffffa05c2700 ffff8801d7900000 ffff8801df3ffc20
Jan  2 13:17:59 tesla kernel: [54532.562167] Call Trace:
Jan  2 13:17:59 tesla kernel: [54532.562180]  [<ffffffffa0766f66>] free_loaded_vmcs+0x26/0x30 [kvm_intel]
Jan  2 13:17:59 tesla kernel: [54532.562203]  [<ffffffffa076dea3>] vmx_free_vcpu+0x33/0x70 [kvm_intel]
Jan  2 13:17:59 tesla kernel: [54532.562232]  [<ffffffffa05c2700>] kvm_arch_vcpu_free+0x50/0x60 [kvm]
Jan  2 13:17:59 tesla kernel: [54532.562259]  [<ffffffffa05c3312>] kvm_arch_destroy_vm+0x102/0x240 [kvm]
Jan  2 13:17:59 tesla kernel: [54532.562281]  [<ffffffff810ed77d>] ? synchronize_srcu+0x1d/0x20
Jan  2 13:17:59 tesla kernel: [54532.562300]  [<ffffffff811d3a8e>] ? mmu_notifier_unregister+0xee/0x130
Jan  2 13:17:59 tesla kernel: [54532.562331]  [<ffffffffa05a8be1>] kvm_put_kvm+0xe1/0x190 [kvm]
Jan  2 13:17:59 tesla kernel: [54532.562353]  [<ffffffffa05a8cc8>] kvm_vcpu_release+0x18/0x20 [kvm]
Jan  2 13:17:59 tesla kernel: [54532.562372]  [<ffffffff811ff525>] __fput+0xf5/0x2c0
Jan  2 13:17:59 tesla kernel: [54532.562388]  [<ffffffff811ff73e>] ____fput+0xe/0x10
Jan  2 13:17:59 tesla kernel: [54532.562403]  [<ffffffff8109ca04>] task_work_run+0xb4/0xe0
Jan  2 13:17:59 tesla kernel: [54532.562421]  [<ffffffff81077664>] do_exit+0x2e4/0xcf0
Jan  2 13:17:59 tesla kernel: [54532.562437]  [<ffffffff810780fc>] do_group_exit+0x4c/0xc0
Jan  2 13:17:59 tesla kernel: [54532.562454]  [<ffffffff8108ae11>] get_signal_to_deliver+0x2d1/0x930
Jan  2 13:17:59 tesla kernel: [54532.562474]  [<ffffffff81019518>] do_signal+0x48/0x610
Jan  2 13:17:59 tesla kernel: [54532.562491]  [<ffffffff811018a6>] ? do_futex+0xe6/0xd00
Jan  2 13:17:59 tesla kernel: [54532.562509]  [<ffffffff810ab2df>] ? finish_task_switch+0x3f/0x120
Jan  2 13:17:59 tesla kernel: [54532.562528]  [<ffffffff810d1c8d>] ? trace_hardirqs_on+0xd/0x10
Jan  2 13:17:59 tesla kernel: [54532.563423]  [<ffffffff81019b50>] do_notify_resume+0x70/0xa0
Jan  2 13:17:59 tesla kernel: [54532.564274]  [<ffffffff81771562>] int_signal+0x12/0x17
Jan  2 13:17:59 tesla kernel: [54532.565134] Code: 00 00 00 ea ff ff 48 01 d3 48 0f 42 05 0d 8e a8 00 48 01 c3 48 c1 eb 0c 48 c1 e3 06 48 01 df e8 2a ff ff ff 48 83 c4 08 5b 5d c3 <0f> 0b 90 66 66 66 66 90 55 48 85 ff 48 89 e5 41 55 49 89 fd 41 
Jan  2 13:17:59 tesla kernel: [54532.567034] RIP  [<ffffffff8118a21d>] free_pages+0x6d/0x70
Jan  2 13:17:59 tesla kernel: [54532.567926]  RSP <ffff8801df3ffb78>
Jan  2 13:17:59 tesla kernel: [54532.572384] ---[ end trace ae34a7cb7620f953 ]---
Jan  2 13:17:59 tesla kernel: [54532.572389] Fixing recursive fault but reboot is needed!
========
Comment 1 Kashyap Chamarthy 2014-01-02 14:36:35 UTC
Created attachment 120651 [details]
Complete stdout of dmesg
Comment 2 Kashyap Chamarthy 2014-01-02 14:37:09 UTC
Created attachment 120661 [details]
Complete stdout of dmidecode
Comment 3 Kashyap Chamarthy 2014-01-02 18:22:37 UTC
I should add some more context: I noticed this issue when I found an un-killable defunct qemu process (I ended up with it when I tried to force power-off the guest via `virsh`), and I had to reboot the host.

  $ virsh destroy ostack-compute 
  error: Failed to destroy domain 3
  error: Failed to terminate process 3152 with SIGKILL: Device or resource busy

NOTE - The disk image is *not* on NFS mount. It's on local disk:

 $ virsh domblklist ostack-compute
  Target     Source
  ------------------------------------------------
  vda        /home/kashyap/vmimages/ostack-compute.qcow2


Try to destroy the guest again, with LIBVIRT_DEBUG enabled, that's what I see in a loop:
=============
.
.
.
2014-01-02 12:42:54.012+0000: 27462: debug : virKeepAliveCheckMessage:395 : Got keepalive request from client 0x7f4dfaca75d0
2014-01-02 12:42:54.012+0000: 27462: debug : virNetMessageNew:44 : msg=0x7f4dfaca7770 tracked=0
2014-01-02 12:42:54.012+0000: 27462: debug : virNetMessageEncodePayloadEmpty:479 : Encode length as 28
2014-01-02 12:42:54.012+0000: 27462: debug : virKeepAliveMessage:101 : Sending keepalive response to client 0x7f4dfaca75d0
2014-01-02 12:42:54.012+0000: 27462: debug : virKeepAliveMessage:104 : RPC_KEEPALIVE_SEND: ka=0x7f4dfaca78c0 client=0x7f4dfaca75d0 prog=1801807216 vers=1 proc=2
2014-01-02 12:42:54.012+0000: 27462: debug : virNetClientQueueNonBlocking:1926 : RPC_CLIENT_MSG_TX_QUEUE: client=0x7f4dfaca75d0 len=28 prog=1801807216 vers=1 proc=2 type=2 status=0 serial=0
2014-01-02 12:42:54.012+0000: 27462: debug : virNetClientCallNew:1905 : New call 0x7f4dfaca7ae0: msg=0x7f4dfaca7770, expectReply=0, nonBlock=1
2014-01-02 12:42:54.012+0000: 27462: debug : virNetMessageClear:55 : msg=0x7f4dfaca7638 nfds=0
2014-01-02 12:42:54.012+0000: 27462: debug : virNetClientIOEventLoopRemoveDone:1379 : Removing completed call 0x7f4dfaca7ae0
2014-01-02 12:42:59.172+0000: 27462: debug : virNetMessageDecodeLength:149 : Got length, now need 28 total (24 more)
2014-01-02 12:42:59.172+0000: 27462: debug : virNetClientCallDispatch:1123 : RPC_CLIENT_MSG_RX: client=0x7f4dfaca75d0 len=28 prog=1801807216 vers=1 proc=1 type=2 status=0 serial=0
2014-01-02 12:42:59.172+0000: 27462: debug : virKeepAliveCheckMessage:374 : ka=0x7f4dfaca78c0, client=0x7f4dfaca75d0, msg=0x7f4dfaca7638
2014-01-02 12:42:59.172+0000: 27462: debug : virKeepAliveCheckMessage:391 : RPC_KEEPALIVE_RECEIVED: ka=0x7f4dfaca78c0 client=0x7f4dfaca75d0 prog=1801807216 vers=1 proc=1
.
.
.
=============

And:

  $ ps -ef | grep qemu
  qemu      3152     1 30  2013 ?        13:30:42 [qemu-system-x86] <defunct>

  $ pstree 3152
  qemu-system-x86───{qemu-system-x86}


Libvirt & QEMU version on the host:

  $ rpm -q libvirt-daemon-kvm qemu-system-x86
  libvirt-daemon-kvm-1.2.0-1.fc21.x86_64
  qemu-system-x86-1.7.0-3.fc21.x86_64
Comment 4 Marcelo Tosatti 2014-01-08 21:13:25 UTC
Fixed by http://comments.gmane.org/gmane.comp.emulators.kvm.devel/117837.
Comment 5 xerofoify 2014-06-20 04:05:21 UTC
Seems fixed after reading the comments here. Can you close this 
as the bug seems to be close still in June of 2014.
Thanks Nick
Comment 6 Kashyap Chamarthy 2014-06-20 04:47:34 UTC
Thanks Marcelo, Xerofoify. I missed to update this. Yes, this can be closed. With current Fedora Rawhide kernels (3.15.0-0.rc7.git4.2.fc21.x86_64) works in my current nested environment.

For reference, here's the associated Fedora Kernel bug (that's closed) -- https://bugzilla.redhat.com/show_bug.cgi?id=1047892

Note You need to log in before you can comment on or make changes to this bug.