Summary: VM performance degradation after KVM QEMU migration or save/restore with Intel EPT enabled
Description Chris 2013-05-24 15:34:26 UTC
Overview:
Once a VM has been migrated to another hypervisor, or has been saved and restored, its performance is immediately impacted: disk access is slower and latency is higher. This only affects KVM QEMU hypervisors with Intel EPT-capable CPUs, on kernel 3.0 and higher, with EPT enabled in the kvm_intel kernel module.

Test Setup:
A hypervisor with an Intel CPU that has the EPT feature is required (reproduced with Fedora and Ubuntu distros).
- Hypervisor - Ubuntu 12.04 (with any kernel 3.0-3.9)
- Guest - Ubuntu 12.04

Steps to Reproduce:
Save/restore procedure on a single hypervisor:
- using virsh to manage VMs
- create a running VM
- save the VM's running state ("virsh save <domid> savefile")
- restore the VM's running state ("virsh restore savefile")
Alternatively, the virsh live migration or migration option will also reproduce this bug.

Actual Results:
IO-intensive applications in the guest VM perform slower.

Expected Results:
Guest VM IO performance is consistent before and after save/restore.

Build Date & Platform:
- Hypervisor - Ubuntu 12.04 3.2.0-35-generic #55-Ubuntu SMP Wed Dec 5 17:42:16 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
- Guest - Ubuntu 12.04 3.2.0-35-generic #55-Ubuntu SMP Wed Dec 5 17:42:16 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Additional Builds and Platforms:
- Ubuntu 12.04, 13.04 with kernels 3.0-3.9
- Fedora 18 (stock kernel)
- Doesn't occur with tested kernels 2.8.32, 2.8.39 squeezed into Ubuntu 12.04 (Fedora not tested with 2.8 kernels)

Additional Information:
Performance can be measured using various tools and benchmarks.
- lmbench will show latencies
- some timed compilation benchmarks
- some disk benchmarks
Here are some examples of my before and after benchmarks.
LMBENCH:
Before:
Simple read: 0.1356 microseconds
Simple write: 0.1086 microseconds
Simple open/close: 1.0265 microseconds
After:
Simple read: 0.2125 microseconds
Simple write: 0.1913 microseconds
Simple open/close: 1.4482 microseconds

PostMark:
Before: 2808
After: 1893
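To put the figures above on a common scale, the relative change for each measurement can be computed with a short script. This is only an illustrative sketch: the numbers are copied from this report, the helper name `percent_change` is made up for the example, and the unit of the PostMark figure is not stated in the report, so it is treated here as a plain number.

```python
# Before/after figures copied from the report.
# lmbench values are latencies in microseconds (lower is better).
LMBENCH_US = {
    "Simple read": (0.1356, 0.2125),
    "Simple write": (0.1086, 0.1913),
    "Simple open/close": (1.0265, 1.4482),
}
# PostMark figure; the report does not state its unit.
POSTMARK = (2808, 1893)


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    for name, (before, after) in LMBENCH_US.items():
        print(f"{name}: {percent_change(before, after):+.1f}% latency")
    print(f"PostMark: {percent_change(*POSTMARK):+.1f}%")
```

With these inputs the lmbench latencies come out roughly 57%, 76% and 41% higher after the save/restore, and the PostMark figure roughly 33% lower.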
Comment 1 Chris 2013-05-27 18:37:46 UTC
Apologies, mentions of Kernel 2.8.x above should be 2.6.
Comment 2 Paolo Bonzini 2013-11-27 14:38:26 UTC
This is a QEMU bug. It is fixed by commit fc1c4a5 (migration: drop MADVISE_DONT_NEED for incoming zero pages, 2013-10-24) which will be in QEMU 1.7. Distros can backport that commit to QEMU 1.6. It cannot be applied alone to QEMU 1.5 and older.
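For anyone triaging hosts against the comment above, the version rule can be written down as a small helper. This is a rough sketch, not part of QEMU or libvirt: the function name `fix_status` and the returned strings are invented for illustration, and it assumes a plain dotted upstream version string such as "1.6.2". A distro package may already carry the backport, which a version check like this cannot detect.

```python
def fix_status(qemu_version: str) -> str:
    """Classify an upstream QEMU version against commit fc1c4a5.

    Assumes a dotted "major.minor[.patch]" string; anything else
    (for example release-candidate suffixes) is not handled here.
    """
    major, minor = (int(part) for part in qemu_version.split(".")[:2])
    if (major, minor) >= (1, 7):
        return "fixed upstream"
    if (major, minor) == (1, 6):
        return "needs backport of fc1c4a5"
    # 1.5 and older: per the comment, the commit cannot be applied alone.
    return "fc1c4a5 cannot be applied alone"


if __name__ == "__main__":
    for version in ("1.5.0", "1.6.2", "1.7.0"):
        print(version, "->", fix_status(version))
```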