Bug 25262 - Can't run qemu-kvm with recent kernels in PAE mode on AMD
Status: RESOLVED CODE_FIX
Alias: None
Product: Virtualization
Classification: Unclassified
Component: kvm
Hardware: i386 Linux
Importance: P1 normal
Assignee: Avi Kivity
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-12-20 01:25 UTC by Dan H
Modified: 2011-02-01 18:23 UTC

See Also:
Kernel Version: 2.6.36.2
Subsystem:
Regression: No
Bisected commit-id:



Description Dan H 2010-12-20 01:25:27 UTC
Overview:

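Can't run qemu-kvm with recent kernels in PAE mode on AMD: guests fail to boot when KVM is enabled on a PAE host kernel. The same guest boots with KVM disabled on the PAE kernel, and with KVM enabled on a non-PAE kernel.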


Steps to Reproduce:

1. Boot kernel 2.6.36.2 with PAE on AMD (see hardware info below), load kvm_amd module.

2. Run "qemu -hda linux-0.2.img -enable-kvm". This is the QEMU Linux test image.
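For an A/B comparison, the two invocations (with and without KVM) can be generated from one place. This is only a minimal sketch: the helper name qemu_command is hypothetical, and the binary name, image name and the -enable-kvm / -no-kvm flags are taken from this report as they apply to qemu-kvm 0.13.0.

```python
import shlex


def qemu_command(image, use_kvm):
    """Return the argv list for one boot attempt of the given disk image."""
    # -enable-kvm / -no-kvm are the two flags compared in this report.
    accel_flag = "-enable-kvm" if use_kvm else "-no-kvm"
    return ["qemu", "-hda", image, accel_flag]


if __name__ == "__main__":
    # Print both command lines so they can be run one after the other.
    for use_kvm in (True, False):
        print(shlex.join(qemu_command("linux-0.2.img", use_kvm)))
```

Running the failing and the working variant back to back on the same host kernel keeps everything except the accelerator flag identical.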


Actual Results:

Right after the "Uncompressing Linux..." message the VM reboots, and it keeps rebooting in a loop. Changing "-enable-kvm" to "-no-kvm" in step 2 makes the image boot fine.


Expected Results:

The guest boots normally.


Build Date & Platform:

Package "qemu-kvm" info:
Name           : qemu-kvm
Version        : 0.13.0-1
URL            : http://www.linux-kvm.org
Licenses       : GPL2  LGPL2.1
Groups         : None
Provides       : qemu
Depends On     : libjpeg  libpng  libsasl  curl  sdl  alsa-lib  esound  gnutls>=2.4.1  bluez  vde2  util-linux-ng
Optional Deps  : None
Required By    : qemu-launcher
Conflicts With : qemu
Replaces       : kvm
Installed Size : 6484.00 K
Packager       : Tobias Powalowski <tpowa@archlinux.org>
Architecture   : i686
Build Date     : Sun 31 Oct 2010 03:36:52 AM CDT
Install Date   : Sun 28 Nov 2010 02:39:39 AM CST
Install Reason : Explicitly installed
Install Script : Yes
Description    : Latest KVM QEMU is a generic and open source processor emulator which achieves a good emulation speed by using dynamic translation.

$ uname -a
Linux xxx.yyy.com 2.6.36-pae #1 SMP PREEMPT Wed Dec 15 18:48:58 CST 2010 i686 AMD Turion(tm) II Dual-Core Mobile M500 AuthenticAMD GNU/Linux

Package "kernel26-pae" info:
Name           : kernel26-pae
Version        : 2.6.36.2-1
URL            : http://www.kernel.org
Licenses       : GPL2
Groups         : base
Provides       : None
Depends On     : coreutils  linux-firmware  module-init-tools  mkinitcpio>=0.5.20
Optional Deps  : crda: to set the correct wireless channels of your country
Required By    : None
Conflicts With : None
Replaces       : kernel24  kernel24-scsi  kernel26-scsi  alsa-driver  ieee80211  hostap-driver26  pwc  nforce  squashfs  unionfs  ivtv  zd1211  kvm-modules  iwlwifi
                 rt2x00-cvs  gspcav1  atl2  wlan-ng26  rt2500  nouveau-drm
Installed Size : 89932.00 K
Packager       : Dan Higgins <xxx@yyy.com>
Architecture   : i686
Build Date     : Wed 15 Dec 2010 06:56:43 PM CST
Install Date   : Wed 15 Dec 2010 07:49:05 PM CST
Install Reason : Explicitly installed
Install Script : Yes
Description    : The Linux Kernel and modules with PAE support (HIGHMEM64G)


Additional Builds and Platforms:

1. From some point after kernel 2.6.33 (approximate, I'm not sure of the exact version) up to around 2.6.36.1, neither kvm_intel nor kvm_amd worked (I have 2 different machines, one Intel and one AMD).

2. As of 2.6.36.2, kvm_intel seems to run ok (at least with my VMs) with PAE+KVM.


Additional Information:

1. AFTER 2.6.33 but BEFORE 2.6.36.1, with kvm_amd or kvm_intel, it looked like right after POST the screen would just stay black with a blinking text cursor at top-left, with CPU at 100%.

2. At 2.6.36.1, a Windows XP SP2 32-bit image says "Booting from Hard Disk..." and then reboots immediately. The QEMU Linux test image (linux-0.2.img) reboots immediately after "Uncompressing Linux...".

3. My system is Arch Linux with all latest updates as of this writing. My PAE kernel is from a trustworthy Arch repository, based on the vanilla non-PAE kernel that Arch provides, with only the CONFIG_HIGHMEM64G compile option turned on.

4. Hardware info.
$ cat /proc/cpuinfo
processor    : 0
vendor_id    : AuthenticAMD
cpu family    : 16
model        : 6
model name    : AMD Turion(tm) II Dual-Core Mobile M500
stepping    : 2
cpu MHz        : 2194.355
cache size    : 512 KB
physical id    : 0
siblings    : 2
core id        : 0
cpu cores    : 2
apicid        : 0
initial apicid    : 0
fdiv_bug    : no
hlt_bug        : no
f00f_bug    : no
coma_bug    : no
fpu        : yes
fpu_exception    : yes
cpuid level    : 5
wp        : yes
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt 
pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc nonstop_tsc extd_apicid 
pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm 
sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt npt lbrv svm_lock 
nrip_save
bogomips    : 4390.57
clflush size    : 64
cache_alignment    : 64
address sizes    : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

processor    : 1
vendor_id    : AuthenticAMD
cpu family    : 16
model        : 6
model name    : AMD Turion(tm) II Dual-Core Mobile M500
stepping    : 2
cpu MHz        : 2194.355
cache size    : 512 KB
physical id    : 0
siblings    : 2
core id        : 1
cpu cores    : 2
apicid        : 1
initial apicid    : 1
fdiv_bug    : no
hlt_bug        : no
f00f_bug    : no
coma_bug    : no
fpu        : yes
fpu_exception    : yes
cpuid level    : 5
wp        : yes
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt 
pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc nonstop_tsc extd_apicid 
pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm 
sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt npt lbrv svm_lock 
nrip_save
bogomips    : 4390.69
clflush size    : 64
cache_alignment    : 64
address sizes    : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate
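The flags that matter for this report are pae, svm and npt. The sketch below pulls them out of /proc/cpuinfo text, including a flags list wrapped over several lines as in the paste above. The function names are hypothetical, and the presence of these flags only shows that kvm_amd with nested paging is possible on the CPU, not that a given kernel is affected.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of flags listed for the first processor."""
    flags = set()
    collecting = False
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            collecting = True
            flags.update(value.split())
        elif collecting and not sep and line.strip():
            # Continuation line of a flags list that was wrapped when pasted.
            flags.update(line.split())
        elif collecting:
            # Next "key : value" field or a blank line ends the flags list.
            break
    return flags


def relevant_flags(cpuinfo_text):
    """Report whether the CPU advertises PAE, AMD SVM and nested paging."""
    flags = cpu_flags(cpuinfo_text)
    return {name: name in flags for name in ("pae", "svm", "npt")}


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        print(relevant_flags(f.read()))
```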

$ cat /proc/meminfo 
MemTotal:        3859204 kB
MemFree:          183552 kB
Buffers:          172164 kB
Cached:          3024792 kB
SwapCached:            0 kB
Active:           598264 kB
Inactive:        2989492 kB
Active(anon):     304736 kB
Inactive(anon):    93928 kB
Active(file):     293528 kB
Inactive(file):  2895564 kB
Unevictable:           0 kB
Mlocked:               0 kB
HighTotal:       3017480 kB
HighFree:          99512 kB
LowTotal:         841724 kB
LowFree:           84040 kB
SwapTotal:       4000148 kB
SwapFree:        4000148 kB
Dirty:                48 kB
Writeback:             0 kB
AnonPages:        390652 kB
Mapped:            74852 kB
Shmem:              7864 kB
Slab:              46540 kB
SReclaimable:      32584 kB
SUnreclaim:        13956 kB
KernelStack:        2504 kB
PageTables:         5468 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     5929748 kB
Committed_AS:    1111432 kB
VmallocTotal:     122880 kB
VmallocUsed:       20528 kB
VmallocChunk:      39636 kB
HardwareCorrupted:     0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       32760 kB
DirectMap2M:      880640 kB
Comment 1 Avi Kivity 2010-12-26 08:17:53 UTC
Doesn't reproduce on kvm.git next.
Comment 2 Avi Kivity 2010-12-26 08:50:39 UTC
2.6.37-rc7 good, 2.6.36.2 bad
Comment 3 Avi Kivity 2010-12-26 10:02:14 UTC
Fix:

commit f87f928882d080eaec8b0d76aecff003d664697d
Author: Joerg Roedel <joerg.roedel@amd.com>
Date:   Thu Sep 2 17:29:45 2010 +0200

    KVM: MMU: Fix 32 bit legacy paging with NPT
    
    This patch fixes 32 bit legacy paging with NPT enabled. The
    mmu_check_root call on the top-level of the loop causes
    root_gfn to take values (in the tdp_enabled path) which are
    outside of guest memory. So the mmu_check_root call fails at
    some point in the loop iteration causing the guest to
    triple-fault.
    This patch changes the mmu_check_root calls to the places
    where they are really necessary. As a side-effect it
    introduces a check for the root of a pae page table too.
    
    Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
    Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
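The failure mode described in the commit message can be illustrated with a toy model. This is not the kernel code: the guest size, the helper names and the exact value written to root_gfn in the TDP path (i << 30 here) are assumptions chosen only to show how a check at the top of the loop can trip over a value left behind by the previous iteration.

```python
GUEST_PAGES = 32 * 1024  # assumed guest size: 128 MiB in 4 KiB pages


def check_root_fails(root_gfn):
    """Stand-in for mmu_check_root: True if the gfn lies outside guest memory."""
    return root_gfn >= GUEST_PAGES


def alloc_roots_before(tdp_enabled):
    """Check at the top of the loop. Returns the failing iteration or None."""
    root_gfn = 0  # 32-bit legacy paging: nothing in the loop reloads this
    for i in range(4):
        if check_root_fails(root_gfn):
            return i  # in the real system the guest would triple-fault here
        if tdp_enabled:
            root_gfn = i << 30  # assumed TDP value, not a guest root at all
    return None


def alloc_roots_after(tdp_enabled):
    """Check only the guest-provided root, never the TDP-path value."""
    guest_root_gfn = 0
    if check_root_fails(guest_root_gfn):
        return 0
    for i in range(4):
        # The per-slot value is used for root allocation but is not checked.
        _slot_gfn = (i << 30) if tdp_enabled else guest_root_gfn
    return None


if __name__ == "__main__":
    print("before, NPT on :", alloc_roots_before(True))
    print("before, NPT off:", alloc_roots_before(False))
    print("after,  NPT on :", alloc_roots_after(True))
```

In this model the pre-fix loop only fails with TDP (NPT) enabled, which matches the report that the same guest boots with -no-kvm and fails with kvm_amd.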
Comment 4 Dan H 2011-01-11 07:32:41 UTC
Based on the changelog, this fix doesn't seem to have made it into 2.6.36.3. Any chance of it getting into 2.6.36.4? If not, how long until 2.6.37?
Comment 5 Dan H 2011-02-01 18:23:06 UTC
I have confirmed that 2.6.37 fixes this issue. Thanks.
