Bug 104091

Summary: [bisected] Starting a VM causes the host to halt and create Machine Check Exceptions
Product: Virtualization Reporter: Michael Long (harn-solo)
Component: kvm Assignee: virtualization_kvm
Status: NEW ---    
Severity: normal CC: frederik.schwan, v12aml
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 4.2 Subsystem:
Regression: Yes Bisected commit-id:
Attachments: dmesg output after starting the VM
VM configuration of the VM causing the freeze
VM configuration of the VM that is still working

Description Michael Long 2015-09-06 10:03:56 UTC
Created attachment 186851 [details]
dmesg output after starting the VM

With kernel 4.2, starting one of my VMs instantly freezes the host system and raises Machine Check Exceptions on the CPUs dedicated to that particular VM:

[12316.171917] mce: [Hardware Error]: CPU 3: Machine Check Exception: 5 Bank 17: be2000000003110a
[12316.171917] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff813217fd> {intel_idle+0xbd/0x120}
[12316.171917] mce: [Hardware Error]: TSC 76fd7352bf6 ADDR fa137140 MISC 30f0083884509086 
[12316.171917] mce: [Hardware Error]: PROCESSOR 0:306f2 TIME 1441130705 SOCKET 0 APIC 6 microcode 2d
[12316.171917] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
...

A bisection revealed that commit fd717f11015f673487ffc826e59b2bad69d20fe5 introduced the problem:

KVM: x86: apply guest MTRR virtualization on host reserved pages

Currently guest MTRR is avoided if kvm_is_reserved_pfn returns true.
However, the guest could prefer a different page type than UC for
such pages. A good example is that pass-throughed VGA frame buffer is
not always UC as host expected.

This patch enables full use of virtual guest MTRRs.

One could argue that the following warning is an obvious hint:
[12311.584431] pmd_set_huge: Cannot satisfy [mem 0x383fe0000000-0x383fe0200000] with a huge-page mapping due to MTRR override.

but I'm able to run another VM without problems despite that warning.

Please let me know if you need additional information.
Comment 1 Michael Long 2015-09-06 10:05:51 UTC
Created attachment 186861 [details]
VM configuration of the VM causing the freeze
Comment 2 Michael Long 2015-09-06 10:08:36 UTC
Created attachment 186871 [details]
VM configuration of the VM that is still working
Comment 3 Michael Long 2015-09-08 04:13:27 UTC
mcelog: Family 6 Model 3f CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 17 
MISC 4f0083884501086 ADDR fa000200 
TIME 1441663568 Tue Sep  8 00:06:08 2015
MCG status:
MCi status:
Uncorrected error
Error enabled
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS be2000000003110a MCGSTATUS 0
MCGCAP 7000c16 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 63
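
The STATUS value above can be cross-checked by hand. The following is a minimal sketch, not mcelog's implementation: it assumes the architectural IA32_MCi_STATUS layout from the Intel SDM (flag bits 63 to 57, model-specific code in bits 31:16, MCA error code in bits 15:0) and the compound "cache hierarchy" encoding 000F 0001 RRRR TTLL. The function name `decode_mci_status` and the lookup tables are hypothetical names chosen for illustration.

```python
# Sketch: decode the architectural part of an IA32_MCi_STATUS value.
# Bit positions follow the Intel SDM machine-check chapter; names are illustrative.

FLAGS = {
    63: "VAL",    # register contents are valid
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MCi_MISC register valid
    58: "ADDRV",  # MCi_ADDR register valid
    57: "PCC",    # processor context corrupt
}

LEVELS = {0: "L0", 1: "L1", 2: "L2", 3: "generic"}
TRANSACTIONS = {0: "instruction", 1: "data", 2: "generic"}
REQUESTS = {
    0: "generic error", 1: "generic read", 2: "generic write",
    3: "data read", 4: "data write", 5: "instruction fetch",
    6: "prefetch", 7: "eviction", 8: "snoop",
}


def decode_mci_status(status):
    """Return the set flags and, for cache hierarchy errors, the decoded fields."""
    flags = [name for bit, name in sorted(FLAGS.items(), reverse=True)
             if (status >> bit) & 1]
    mca = status & 0xFFFF
    result = {
        "flags": flags,
        "mca_code": mca,
        "model_specific_code": (status >> 16) & 0xFFFF,
    }
    # Cache hierarchy errors match 000F 0001 RRRR TTLL (F = filtering bit 12).
    if (mca & 0xEF00) == 0x0100:
        result["filtered"] = bool(mca & 0x1000)
        result["request"] = REQUESTS.get((mca >> 4) & 0xF, "reserved")
        result["transaction"] = TRANSACTIONS.get((mca >> 2) & 0x3, "reserved")
        result["level"] = LEVELS[mca & 0x3]
    return result


if __name__ == "__main__":
    for key, value in decode_mci_status(0xBE2000000003110A).items():
        print(key, value)
```

For be2000000003110a this yields the flags VAL, UC, EN, MISCV, ADDRV and PCC, plus a filtered, generic Level-2 cache error, which agrees with the mcelog lines above.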
Comment 4 frederik 2015-10-01 11:03:05 UTC
This seems to be related to my problem. Please try limiting the freezing VM to 2 cores or fewer.