Bug 43501 - all 32bit binaries produce "Illegal Instruction" after KVM migration from AMD -> Intel host
Status: RESOLVED INVALID
Alias: None
Product: Virtualization
Classification: Unclassified
Component: kvm
Hardware: All Linux
Importance: P1 high
Assignee: virtualization_kvm
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-06-17 01:04 UTC by Paul Zimdars
Modified: 2012-08-29 14:22 UTC
CC List: 2 users

See Also:
Kernel Version: 2.6.32-220.17.1.el6.x86_64
Subsystem:
Regression: No
Bisected commit-id:



Description Paul Zimdars 2012-06-17 01:04:01 UTC
We recently added an AMD KVM host (AMD Opteron 6276) to our existing Intel-only (E5630) KVM host pool. We migrated multiple Linux guests (CentOS 5.8 / 6.1 / 6.2) from the Intel hosts to the AMD host without issue. We then rebooted several guests (5.8 / 6.2) on the AMD host and migrated them back to the Intel hosts, after which certain programs stopped working. We isolated the issue to all 32-bit binaries failing to run with "Illegal Instruction" errors. The guest console also reports "trap invalid opcode":

glibc_post_upgr[1812] trap invalid opcode ip:ae1423 sp:ff9e8d28 error:0
pango-querymodu[1825] trap invalid opcode ip:93c423 sp:ffe19604 error:0
java[1877] trap invalid opcode ip:f77a2423 sp:fffecb18 error:0
[..snip..]
(all 32bit applications)

If we migrate the VM guest back to the AMD host, the problem goes away. If we reboot the failing VM guest(s) on the Intel host (after migrating from the AMD host), the 32-bit binaries work again. We tried removing CPU features that are presented to the KVM guests but still hit the problem. Here are the flags we have enabled on the guest:

flags           : fpu de tsc msr pae cx8 apic cmov clflush fxsr sse sse2 lm unfair_spinlock hypervisor lahf_lm

This is reproducible every time. We tried different OS installs (CentOS 5.8, RedHat 6.2, SL 6.2), each with its own kernel, and the issue is present on all of them. We also tried different <models> and created our own custom model. Nothing made a difference.
Comment 1 Avi Kivity 2012-06-17 09:56:57 UTC
What host kernel are you using?
Comment 2 Avi Kivity 2012-06-17 10:05:40 UTC
Try the following command line (on both hosts):

  qemu -cpu phenom,vendor=AuthenticAMD

Note that 32-on-64 applications (32-bit binaries on a 64-bit guest kernel) will suffer a performance penalty when using cross-vendor migration. You can mitigate this by putting vdso32=0 on the guest kernel command line, but that is slower than the default when you are not doing cross-vendor migration.
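
The vdso32=0 setting only helps if it is actually present on the command line the guest booted with. A minimal sketch for checking that from inside the guest, assuming Python 3 is available there; the helper names has_kernel_param and booted_with are illustrative and not part of any existing tool:

```python
def has_kernel_param(cmdline, param):
    """Return True if `param` appears as a whole token in a kernel command line."""
    # The kernel command line is whitespace-separated, so compare whole
    # tokens; a substring match would also accept e.g. "vdso32=01".
    return param in cmdline.split()


def booted_with(param, path="/proc/cmdline"):
    """Check the command line the running kernel was booted with."""
    with open(path) as f:
        return has_kernel_param(f.read(), param)
```

Inside the guest, booted_with("vdso32=0") reads /proc/cmdline and reports whether the option was applied at boot.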
Comment 3 Paul Zimdars 2012-06-17 23:28:24 UTC
With the above options we cannot start the guest on the AMD host at all, so we cannot test migration. The error says that the guest CPU is not compatible with the host CPU.

The only method that works is creating our own profile and adding it to the cpu-model definitions on both the AMD and Intel hosts:

[cpudef]
   name = "cpu64-rhel6-dsio"
   level = "4"
   vendor = "AuthenticAMD"
   family = "6"
   model = "13"
   stepping = "3"
   feature_edx = "sse2 sse fxsr mmx clflush pse36 cmov mca pge mtrr apic cx8 mce pae msr tsc pse de fpu"
   feature_ecx = "cx16"
   extfeature_edx = "lm fxsr mmx nx cmov pge syscall apic cx8 mce pae msr tsc pse de fpu"
   extfeature_ecx = "lahf_lm"
   xlevel = "0x8000000A"
   model_id = "QEMU Virtual CPU version (cpu64-rhel6)"

With this profile we can migrate the VM, but we still encounter the same issue.
Comment 4 Paul Zimdars 2012-06-18 02:52:06 UTC
(In reply to comment #1)
> What host kernel are you using?

2.6.32-220.17.1.el6.x86_64 (SL 6.2)
Comment 5 Avi Kivity 2012-06-18 11:22:21 UTC
I'm not sure that the vendor string is passed correctly.

Please verify that when starting the guest on an Intel host, /proc/cpuinfo shows AuthenticAMD for vendor_id.
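
The check can also be scripted so it is easy to repeat on every guest. A minimal sketch, assuming Python 3 inside the guest; the function names vendor_ids and guest_vendor_ids are illustrative only:

```python
def vendor_ids(cpuinfo_text):
    """Return the set of vendor_id values found in /proc/cpuinfo-style text."""
    vendors = set()
    for line in cpuinfo_text.splitlines():
        # Each entry looks like "vendor_id<TAB>: AuthenticAMD".
        key, sep, value = line.partition(":")
        if sep and key.strip() == "vendor_id":
            vendors.add(value.strip())
    return vendors


def guest_vendor_ids(path="/proc/cpuinfo"):
    """Read the vendor_id values that the running guest actually sees."""
    with open(path) as f:
        return vendor_ids(f.read())
```

If guest_vendor_ids() returns anything other than {"AuthenticAMD"} on a guest started on an Intel host, the vendor override did not reach the guest.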
Comment 6 Avi Kivity 2012-06-18 12:44:18 UTC
(In reply to comment #4)
> (In reply to comment #1)
> > What host kernel are you using?
> 
> 2.6.32-220.17.1.el6.x86_64 (SL 6.2)

This bugzilla is for upstream kernels.  Use the vendor bugzilla for vendor kernels.
