Bug 58771

Summary: VM performance degradation after KVM QEMU migration or save/restore with Intel EPT enabled
Product: Virtualization Reporter: Chris (ccormier)
Component: kvm Assignee: virtualization_kvm
Severity: high CC: bonzini
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 3.0+ Tree: Mainline
Regression: Yes

Description Chris 2013-05-24 15:34:26 UTC
Once a VM has been migrated to another hypervisor, or has been saved and restored, its performance is immediately degraded: disk access is slower and shows increased latency.

This only affects KVM QEMU hypervisors with Intel EPT-capable CPUs, running kernel 3.0 or later with EPT enabled in the kvm_intel kernel module.
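Whether EPT is active on a given host can be checked through the kvm_intel module parameter in sysfs. The following is a minimal sketch, assuming the usual /sys/module/kvm_intel/parameters/ept location; the function name ept_enabled and its optional path argument are illustrative and not part of any tool mentioned in this report.

```shell
# Report whether the kvm_intel module has EPT enabled.
# The optional first argument overrides the sysfs path (hypothetical
# convenience so the function can be exercised on a host without kvm_intel).
ept_enabled() {
    param_file="${1:-/sys/module/kvm_intel/parameters/ept}"
    if [ ! -r "$param_file" ]; then
        echo "unknown"      # module not loaded, or not an Intel host
        return 2
    fi
    case "$(cat "$param_file")" in
        Y|y|1) echo "enabled";  return 0 ;;
        *)     echo "disabled"; return 1 ;;
    esac
}
```

Calling ept_enabled with no argument prints enabled, disabled, or unknown for the current host.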

Test Setup:
A hypervisor with an Intel CPU that has the EPT feature is required.
(reproduced with Fedora and Ubuntu Distros)

-Ubuntu 12.04 (with any Kernel 3.0-3.9)
-Ubuntu 12.04

Steps to Reproduce:
Save/restore procedure on a single hypervisor:
-using virsh to manage VMs
-create a running VM
-save the VM's running state ("virsh save <domid> savefile")
-restore the VM's running state ("virsh restore savefile")

Alternatively, using the virsh live migration or migration option will also reproduce this bug.
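The save/restore steps above can be wrapped in a small helper so that a benchmark can be run before and after the cycle. This is only a sketch: the function name save_restore_cycle is made up for illustration, and the domain and save file path are whatever the caller supplies.

```shell
# Save a running domain to a file and restore it from that file,
# mirroring the two virsh commands in the steps above.
# Arguments: domain name or ID, path of the save file.
save_restore_cycle() {
    domain="$1"
    savefile="$2"
    # Stop here if the save fails, so a stale file is never restored.
    virsh save "$domain" "$savefile" || return 1
    virsh restore "$savefile"
}
```

A typical run would be: benchmark inside the guest, call save_restore_cycle with the domain and a scratch file, then benchmark again.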

Actual Results:
IO-intensive applications in the guest VM perform slower.

Expected Results:
Guest VM IO performance consistent before and after save/restore.

Build Date & Platform:
-Hypervisor - Ubuntu 12.04 3.2.0-35-generic #55-Ubuntu SMP Wed Dec 5 17:42:16 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
-Guest - Ubuntu 12.04 3.2.0-35-generic #55-Ubuntu SMP Wed Dec 5 17:42:16 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Additional Builds and Platforms:
Ubuntu 12.04, 13.04 with kernels 3.0-3.9
Fedora 18 (stock kernel)
Doesn't occur with tested Kernels 2.8.32, 2.8.39 squeezed into Ubuntu 12.04 (Fedora not tested with 2.8 kernels)

Additional Information:
Performance can be measured using various tools and benchmarks.
-lmbench will show latencies
-some timed compilation benchmarks
-some disk benchmarks

Here are some examples of my before and after benchmarks.
Before:
Simple read: 0.1356 microseconds
Simple write: 0.1086 microseconds
Simple open/close: 1.0265 microseconds
After:
Simple read: 0.2125 microseconds
Simple write: 0.1913 microseconds
Simple open/close: 1.4482 microseconds

Before: 2808
After: 1893
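
To put the figures above in relative terms, the percent change can be computed with awk. This sketch assumes the first group of latencies is the "before" measurement and the second group is the "after" measurement; pct_change is a hypothetical helper name.

```shell
# Percent change from a "before" value to an "after" value,
# printed with one decimal place.
pct_change() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (after - before) / before * 100 }'
}

pct_change 0.1356 0.2125   # simple read latency:       56.7
pct_change 0.1086 0.1913   # simple write latency:      76.2
pct_change 1.0265 1.4482   # simple open/close latency: 41.1
pct_change 2808 1893       # unlabeled score:           -32.6
```

Under that assumption the latencies rise by roughly 41% to 76%, and the Before/After score drops by about a third.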

Comment 1 Chris 2013-05-27 18:37:46 UTC
Apologies, mentions of Kernel 2.8.x above should be 2.6.

Comment 2 Paolo Bonzini 2013-11-27 14:38:26 UTC
This is a QEMU bug.  It is fixed by commit fc1c4a5 (migration: drop MADVISE_DONT_NEED for incoming zero pages, 2013-10-24) which will be in QEMU 1.7.

Distros can backport that commit to QEMU 1.6.  It cannot be applied alone to QEMU 1.5 and older.
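
As a rough first check, a QEMU version string can be compared against 1.7. This is a sketch only: qemu_version_has_fix is a hypothetical helper, it relies on GNU sort -V, and it cannot detect a distro backport of the commit to QEMU 1.6.

```shell
# Succeed if the given QEMU version string is 1.7 or newer.
# Only the upstream version number is compared; a 1.6 build carrying
# a backported fix will still be reported as lacking it.
qemu_version_has_fix() {
    version="$1"
    # The lower of the two versions sorts first; if that is 1.7,
    # the given version is at least 1.7.
    lowest=$(printf '%s\n%s\n' "1.7" "$version" | sort -V | head -n 1)
    [ "$lowest" = "1.7" ]
}
```

For example, qemu_version_has_fix 1.6.2 fails and qemu_version_has_fix 2.0.0 succeeds; for a distro package, the package changelog remains the reliable source.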