Bug 196717 - CPU: 0 PID: 5405 at arch/x86/kvm/mmu.c:717 mmu_spte_clear_track_bits+0xe7/0x100 [kvm]
Summary: CPU: 0 PID: 5405 at arch/x86/kvm/mmu.c:717 mmu_spte_clear_track_bits+0xe7/0x1...
Status: NEW
Alias: None
Product: Virtualization
Classification: Unclassified
Component: kvm (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: virtualization_kvm
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-08-21 09:18 UTC by Mikhail
Modified: 2017-08-23 01:39 UTC (History)
2 users (show)

See Also:
Kernel Version: 4.13.0-0.rc5.git4.1
Subsystem:
Regression: No
Bisected commit-id:


Attachments
demfloro's dmesg (140.38 KB, text/plain)
2017-08-22 10:20 UTC, Dmitrii Tcvetkov
Details
full dmesg from boot (2.02 MB, text/plain)
2017-08-22 15:09 UTC, Jeff Cook
Details

Description Mikhail 2017-08-21 09:18:29 UTC
[83855.778147] ------------[ cut here ]------------
[83855.778169] WARNING: CPU: 0 PID: 5405 at arch/x86/kvm/mmu.c:717 mmu_spte_clear_track_bits+0xe7/0x100 [kvm]
[83855.778169] Modules linked in: xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nls_utf8 isofs rfcomm fuse nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables bnep sunrpc vfat fat xfs libcrc32c snd_usb_audio snd_usbmidi_lib snd_rawmidi hid_logitech_hidpp huawei_cdc_ncm btusb cdc_wdm cdc_ncm btrtl gspca_zc3xx gspca_main v4l2_common videodev btbcm uas option usbnet usb_storage joydev btintel usb_wwan media hid_logitech_dj
[83855.778198]  bluetooth ecdh_generic rfkill intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp btrfs kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate xor snd_hda_codec_realtek snd_hda_codec_generic iTCO_wdt snd_hda_codec_hdmi snd_hda_codec_ca0132 iTCO_vendor_support ppdev snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep raid6_pq snd_seq intel_uncore snd_seq_device snd_pcm intel_rapl_perf snd_timer mei_me snd mei soundcore lpc_ich i2c_i801 tpm_infineon tpm_tis shpchp parport_pc parport tpm_tis_core tpm binfmt_misc i915 i2c_algo_bit drm_kms_helper drm crc32c_intel r8169 mii video bfq
[83855.778224] CPU: 0 PID: 5405 Comm: CPU 0/KVM Not tainted 4.13.0-0.rc5.git4.1.fc28.x86_64 #1
[83855.778225] Hardware name: Gigabyte Technology Co., Ltd. Z87M-D3H/Z87M-D3H, BIOS F11 08/12/2014
[83855.778226] task: ffff9601fc104c80 task.stack: fffface3a05a8000
[83855.778238] RIP: 0010:mmu_spte_clear_track_bits+0xe7/0x100 [kvm]
[83855.778239] RSP: 0018:fffface3a05abb08 EFLAGS: 00010246
[83855.778240] RAX: 0000000000000000 RBX: 00000004b4e50f77 RCX: ffffd27089cc0f9f
[83855.778241] RDX: 0000000000000000 RSI: ffff96016cb26280 RDI: ffffd27092d39400
[83855.778241] RBP: fffface3a05abb20 R08: 0000000000000100 R09: 000000000000000f
[83855.778242] R10: ffff9601ee550008 R11: ffff9601ee550000 R12: 00000000004b4e50
[83855.778243] R13: ffff9601fc1c0000 R14: ffff96016cb8b5e8 R15: 0000000000000000
[83855.778244] FS:  00007f4bc9745700(0000) GS:ffff96045e200000(0000) knlGS:0000000000000000
[83855.778245] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[83855.778246] CR2: 00000062a3526168 CR3: 00000005bc12d000 CR4: 00000000001426f0
[83855.778247] Call Trace:
[83855.778260]  drop_spte+0x1a/0xb0 [kvm]
[83855.778270]  mmu_page_zap_pte+0x9d/0xe0 [kvm]
[83855.778280]  kvm_mmu_prepare_zap_page+0x65/0x330 [kvm]
[83855.778291]  kvm_mmu_invalidate_zap_all_pages+0xeb/0x140 [kvm]
[83855.778310]  kvm_mmu_invalidate_zap_pages_in_memslot+0xe/0x10 [kvm]
[83855.778322]  kvm_page_track_flush_slot+0x59/0x80 [kvm]
[83855.778332]  kvm_arch_flush_shadow_memslot+0xe/0x10 [kvm]
[83855.778342]  __kvm_set_memory_region+0x807/0x8d0 [kvm]
[83855.778352]  kvm_set_memory_region+0x2b/0x40 [kvm]
[83855.778361]  kvm_vm_ioctl+0x49b/0x8c0 [kvm]
[83855.778364]  ? __handle_mm_fault+0xb0d/0x1070
[83855.778367]  do_vfs_ioctl+0xa5/0x600
[83855.778369]  SyS_ioctl+0x79/0x90
[83855.778372]  entry_SYSCALL_64_fastpath+0x1a/0xa5
[83855.778373] RIP: 0033:0x7f4bd8936307
[83855.778374] RSP: 002b:00007f4bc97443d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[83855.778375] RAX: ffffffffffffffda RBX: 00000000000a0000 RCX: 00007f4bd8936307
[83855.778376] RDX: 00007f4bc9744440 RSI: 000000004020ae46 RDI: 000000000000000d
[83855.778376] RBP: 00000000000a0000 R08: 000000000000da78 R09: 00000062a58061b8
[83855.778377] R10: 0000000003150000 R11: 0000000000000246 R12: 0000000000010000
[83855.778378] R13: 00000000000b0000 R14: 00000062a351b0a0 R15: 00007f4b44600000
[83855.778379] Code: 6a 04 00 48 85 c0 75 1c 4c 89 e7 e8 94 3d fe ff 48 8b 05 dd 6a 04 00 48 85 c0 74 c1 48 85 c3 0f 95 c3 eb bf 48 85 c3 74 e7 eb dd <0f> ff eb 9e 4c 89 e7 e8 6d 3d fe ff eb a4 90 66 2e 0f 1f 84 00
[83855.778404] ---[ end trace 261a9d6d5fed67c0 ]---
[83855.778416] ------------[ cut here ]------------
[83855.778425] WARNING: CPU: 0 PID: 5405 at arch/x86/kvm/mmu.c:717 mmu_spte_clear_track_bits+0xe7/0x100 [kvm]
[83855.778426] Modules linked in: xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nls_utf8 isofs rfcomm fuse nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables bnep sunrpc vfat fat xfs libcrc32c snd_usb_audio snd_usbmidi_lib snd_rawmidi hid_logitech_hidpp huawei_cdc_ncm btusb cdc_wdm cdc_ncm btrtl gspca_zc3xx gspca_main v4l2_common videodev btbcm uas option usbnet usb_storage joydev btintel usb_wwan media hid_logitech_dj
[83855.778446]  bluetooth ecdh_generic rfkill intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp btrfs kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate xor snd_hda_codec_realtek snd_hda_codec_generic iTCO_wdt snd_hda_codec_hdmi snd_hda_codec_ca0132 iTCO_vendor_support ppdev snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep raid6_pq snd_seq intel_uncore snd_seq_device snd_pcm intel_rapl_perf snd_timer mei_me snd mei soundcore lpc_ich i2c_i801 tpm_infineon tpm_tis shpchp parport_pc parport tpm_tis_core tpm binfmt_misc i915 i2c_algo_bit drm_kms_helper drm crc32c_intel r8169 mii video bfq
[83855.778464] CPU: 0 PID: 5405 Comm: CPU 0/KVM Tainted: G        W       4.13.0-0.rc5.git4.1.fc28.x86_64 #1
[83855.778465] Hardware name: Gigabyte Technology Co., Ltd. Z87M-D3H/Z87M-D3H, BIOS F11 08/12/2014
[83855.778465] task: ffff9601fc104c80 task.stack: fffface3a05a8000
[83855.778474] RIP: 0010:mmu_spte_clear_track_bits+0xe7/0x100 [kvm]
[83855.778474] RSP: 0018:fffface3a05abb08 EFLAGS: 00010246
[83855.778475] RAX: 0000000000000000 RBX: 00000004b4e51e77 RCX: dead0000000000ff
[83855.778476] RDX: 0000000000000000 RSI: ffff96016cb26288 RDI: ffffd27092d39440
[83855.778477] RBP: fffface3a05abb20 R08: 0000000000000100 R09: 000000000000000f
[83855.778477] R10: ffff9601ee550008 R11: ffff9601ee550000 R12: 00000000004b4e51
[83855.778478] R13: ffff9601fc1c0000 R14: ffff96016cb8b5e8 R15: 0000000000000000
[83855.778479] FS:  00007f4bc9745700(0000) GS:ffff96045e200000(0000) knlGS:0000000000000000
[83855.778480] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[83855.778481] CR2: 00000062a3526168 CR3: 00000005bc12d000 CR4: 00000000001426f0
[83855.778481] Call Trace:
[83855.778491]  drop_spte+0x1a/0xb0 [kvm]
[83855.778500]  mmu_page_zap_pte+0x9d/0xe0 [kvm]
[83855.778509]  kvm_mmu_prepare_zap_page+0x65/0x330 [kvm]
[83855.778520]  kvm_mmu_invalidate_zap_all_pages+0xeb/0x140 [kvm]
[83855.778530]  kvm_mmu_invalidate_zap_pages_in_memslot+0xe/0x10 [kvm]
[83855.778541]  kvm_page_track_flush_slot+0x59/0x80 [kvm]
[83855.778552]  kvm_arch_flush_shadow_memslot+0xe/0x10 [kvm]
[83855.778561]  __kvm_set_memory_region+0x807/0x8d0 [kvm]
[83855.778571]  kvm_set_memory_region+0x2b/0x40 [kvm]
[83855.778581]  kvm_vm_ioctl+0x49b/0x8c0 [kvm]
[83855.778583]  ? __handle_mm_fault+0xb0d/0x1070
[83855.778585]  do_vfs_ioctl+0xa5/0x600
[83855.778586]  SyS_ioctl+0x79/0x90
[83855.778588]  entry_SYSCALL_64_fastpath+0x1a/0xa5
[83855.778589] RIP: 0033:0x7f4bd8936307
[83855.778590] RSP: 002b:00007f4bc97443d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[83855.778591] RAX: ffffffffffffffda RBX: 00000000000a0000 RCX: 00007f4bd8936307
[83855.778592] RDX: 00007f4bc9744440 RSI: 000000004020ae46 RDI: 000000000000000d
[83855.778592] RBP: 00000000000a0000 R08: 000000000000da78 R09: 00000062a58061b8
[83855.778593] R10: 0000000003150000 R11: 0000000000000246 R12: 0000000000010000
[83855.778594] R13: 00000000000b0000 R14: 00000062a351b0a0 R15: 00007f4b44600000
[83855.778595] Code: 6a 04 00 48 85 c0 75 1c 4c 89 e7 e8 94 3d fe ff 48 8b 05 dd 6a 04 00 48 85 c0 74 c1 48 85 c3 0f 95 c3 eb bf 48 85 c3 74 e7 eb dd <0f> ff eb 9e 4c 89 e7 e8 6d 3d fe ff eb a4 90 66 2e 0f 1f 84 00
[83855.778620] ---[ end trace 261a9d6d5fed67c1 ]---
Comment 1 Dmitrii Tcvetkov 2017-08-22 10:20:34 UTC
Created attachment 258043 [details]
demfloro's dmesg

Have same issue on linux 4.13-rc5, reproducible by starting qemu-system-x86_64 and happens during guest OS (Arch Linux) loading.

Guest hard drive is an LVM volume.
Comment 2 Jeff Cook 2017-08-22 15:03:54 UTC
I've seen this on 4.12 and now on 4.13-rc6. On 4.12, the system-wide impact is very significant and the guests and host slow down or stop responding all together. On 4.13-rc6, only one guest crashes and the rest of the system seems to continue to operate as expected.

I initially get

[68470.767034] ------------[ cut here ]------------
[68470.767064] WARNING: CPU: 30 PID: 239 at arch/x86/kvm/mmu.c:717 mmu_spte_clear_track_bits+0xf0/0x100 [kvm]
[...]
[68470.767237] CPU: 30 PID: 239 Comm: khugepaged Tainted: P           O    4.13.0-rc6-g14ccee78fc82 #5
[68470.767246] Hardware name: Supermicro SYS-7038A-I/X10DAI, BIOS 2.0a 11/09/2016
[68470.767255] task: ffff88085b6bc9c0 task.stack: ffffc90006b80000
[68470.767267] RIP: 0010:mmu_spte_clear_track_bits+0xf0/0x100 [kvm]
[68470.767271] RSP: 0018:ffffc90006b83bd8 EFLAGS: 00010246
[68470.767275] RAX: 0000000000000000 RBX: 0000000117f13f77 RCX: dead0000000000ff
[68470.767279] RDX: 0000000000000000 RSI: ffff8802caff9140 RDI: ffffea00045fc4c0
[68470.767283] RBP: ffffc90006b83bf0 R08: 0000000000000001 R09: 0000000000000000
[68470.767287] R10: ffff8803ed9d0008 R11: ffff8803ed9d0000 R12: 0000000000117f13
[68470.767291] R13: ffff880401f10000 R14: ffffffffa07a91f0 R15: ffff8803ed9d0008
[68470.767296] FS:  0000000000000000(0000) GS:ffff88105d580000(0000) knlGS:0000000000000000
[68470.767302] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[68470.767306] CR2: 000000000d070004 CR3: 0000000001a09000 CR4: 00000000003426e0
[68470.767310] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[68470.767314] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[68470.767318] Call Trace:
[68470.767332]  drop_spte+0x1a/0xb0 [kvm]
[68470.767342]  kvm_zap_rmapp+0x3b/0x70 [kvm]
[68470.767352]  kvm_unmap_rmapp+0xe/0x20 [kvm]
[68470.767361]  kvm_handle_hva_range+0x139/0x1b0 [kvm]
[68470.767373]  kvm_unmap_hva_range+0x17/0x20 [kvm]
[68470.767382]  kvm_mmu_notifier_invalidate_range_start+0x52/0x90 [kvm]
[68470.767389]  __mmu_notifier_invalidate_range_start+0x55/0x80
[68470.767395]  khugepaged+0x1eb7/0x1ee0
[68470.767403]  ? wait_woken+0x80/0x80
[68470.767408]  kthread+0x125/0x140
[68470.767413]  ? khugepaged_scan_abort.part.6+0x60/0x60
[68470.767417]  ? kthread_create_on_node+0x70/0x70
[68470.767423]  ret_from_fork+0x25/0x30
[68470.767427] Code: 5f 04 00 48 85 c0 75 1c 4c 89 e7 e8 9b 2d fe ff 48 8b 05 d4 5f 04 00 48 85 c0 74 be 48 85 c3 0f 95 c3 eb bc 48 85 c3 74 e7 eb dd <0f> ff eb 9b 4c 89 e7 e8 74 2d fe ff eb a1 66 90 0f 1f 44 00 00 
[68470.767463] ---[ end trace 249e3dbfe7765567 ]---
[68470.767478] ------------[ cut here ]------------


Followed by many messages like this:

68627.864783] BUG: Bad page state in process khugepaged  pfn:126aa9
[68627.864795] page:ffffea00049aaa40 count:0 mapcount:0 mapping:          (null) index:0x1
[68627.864802] flags: 0x17fff0000000014(referenced|dirty)
[68627.864808] raw: 017fff0000000014 0000000000000000 0000000000000001 00000000ffffffff
[68627.864814] raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000
[68627.864819] page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag set
[68627.864823] bad because of flags: 0x14(referenced|dirty)

Full dmesg on rc6 forthcoming.
Comment 3 Jeff Cook 2017-08-22 15:05:30 UTC
Also, see potentially related thread at https://lkml.org/lkml/2016/12/13/667.
Comment 4 Jeff Cook 2017-08-22 15:09:22 UTC
Created attachment 258053 [details]
full dmesg from boot

dmesg from boot onward, demonstrates this error condition. I've run into this multiple times on 4.12.3, 4.12.5, 4.13-rc5+git and 4.13-rc6. It does not seem to occur on 4.11. On 4.13-rc5+git and 4.13-rc6, system impact is greatly reduced (one QEMU guest remains hung/broken, instead of entire system).

I am using PCI passthrough, etc., as the dmesg shows.
Comment 5 Jeff Cook 2017-08-22 15:14:25 UTC
More similarities. 1. this seems to affect my Arch Linux guest, as the OP also noted. Ubuntu 16.04 and Windows guests remain unharmed on 4.13-rc6. It does not occur on boot for me, but may involve some interaction between KVM and later kernels.

and 2. I am hosting my VM out of an LVM thin volume.

Sorry to spam the bug.
Comment 6 Jeff Cook 2017-08-23 01:39:21 UTC
After a little bit of digging, it seems that the patchset "KVM: nVMX: nested EPT improvements and A/D bits, RDRAND and RDSEED exits" (described at https://lkml.org/lkml/2017/3/8/586) is the likely origin point of this error. This was first merged in 4.12, which is when I first started encountering this error.

Something about the "accessed_dirty" flag causes the warning on line 717 of arch/x86/kvm/mmu.c to trigger:

    WARN_ON(!kvm_is_reserved_pfn(pfn) && !page_count(pfn_to_page(pfn)));

and then later, the system refuses to allocate the memory requested because when checking if the page is safe to use, it encounters a dirty flag:

    [94449.442437] BUG: Bad page state in process makepkg  pfn:2a401a
    [94449.442447] flags: 0x17fff0000000014(referenced|dirty)
    [...]
    [94449.442462] page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag set
    [94449.442465] bad because of flags: 0x14(referenced|dirty)
    [...]
    [94449.442615] Call Trace:
    [...]
    [94449.442907]  bad_page+0xce/0x130
    [94449.442912]  check_new_page_bad+0x67/0x80
    [94449.442916]  get_page_from_freelist+0x979/0xad0
    [...]

As the diff for kvm/mmu.c between 4.11 and 4.12 is quite small, it seems likely that this change is the origin point of the bug.

Perhaps something in along the way has not been updated to account for this flag? I would revert to test, but it appears that this patchset has grown several dependents. If someone wants to suggest a series of commits to revert or a patch to test, I am happy to try that.

Note You need to log in before you can comment on or make changes to this bug.