Bug 43068 - Operation restricted to levels L0-2 - kerneloops when booting L3
Status: NEW
Alias: None
Product: Virtualization
Classification: Unclassified
Component: kvm
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: virtualization_kvm
URL:
Keywords:
Depends on:
Blocks: 94971 53601
Reported: 2012-04-07 17:36 UTC by josef
Modified: 2015-03-17 03:53 UTC

See Also:
Kernel Version: 3.2.0-2-amd64
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Kernel oops backtrace (1.47 KB, text/plain)
2012-04-07 17:36 UTC, josef

Description josef 2012-04-07 17:36:22 UTC
Created attachment 72845
Kernel oops backtrace

Trying to run KVM with a high level of nesting works fine for the first two levels of virtual machines (L1 and L2) but leads to the attached oops when attempting to boot into L3.

The host system has 24 Xeon X5675 cores and runs Ubuntu 12.04 with kernel 3.2.0-21-generic. The nested VMs are each restricted to one of these cores. They all run resized clones of a Debian wheezy (testing) image with kernel 3.2.0-2-amd64.

In the Turtles paper, nested virtualisation is presented as supporting Ln, and the diagrams include L3, but the evaluation only went up to L2. The kernel documentation nested-vmx.txt doesn't list this as a restriction, but it doesn't mention L3 or higher either. Hence, it is not clear whether the implementation is supposed to support L3 at all, either now or in the foreseeable future.

Comment 1 Nadav Har'El 2012-04-08 08:18:32 UTC
Hi, indeed, theoretically L3 should work, and in practice it did work in my tests (albeit very, very slowly). I'll need to look into this issue and check why this bug is happening now.

You should just be aware that even if it works, L3 will be extremely slow, likely to the point of not being useful. One of the reasons is the lack of nested EPT in the upstream version. An even bigger problem is the exponential explosion of exits described in the Turtles paper, which is made much worse by one particular part of the existing implementation: right now, on every nested entry from L1 to L2, L0 recreates all the fields of the vmcs (see prepare_vmcs02()), doing a few dozen VMWRITEs. When L0 does it, it's fine - but when we are nested yet deeper, and there is an entry from L2 into L3 and *L1* needs to do prepare_vmcs02(), L1 now issues a lot of VMWRITEs, they all cause exits, and all of this is extremely slow.
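
To make the scale of the exit explosion described above concrete, here is a rough toy model, not a measurement of KVM. It assumes that the hypervisor handling a nested entry issues a fixed number of VMWRITEs, that these run without trapping only when that hypervisor is L0, and that otherwise each one costs as much as a full entry handled one level further down. The function name, the recurrence, and the value 30 (standing in for "a few dozen") are all assumptions made for illustration.

```python
def exits_for_entry(level: int, vmwrites_per_entry: int) -> int:
    """Toy count of hardware exits for one nested entry handled at `level`.

    level 0: L0 rebuilds the VMCS itself; its VMWRITEs do not trap,
             so only the exit that triggered the entry is counted.
    level n: every VMWRITE issued by the hypervisor at level n traps
             and is (by assumption) as costly as an entry handled at
             level n - 1.
    """
    if level < 0:
        raise ValueError("level must be non-negative")
    if level == 0:
        return 1
    return 1 + vmwrites_per_entry * exits_for_entry(level - 1, vmwrites_per_entry)


if __name__ == "__main__":
    # 30 is a hypothetical stand-in for "a few dozen" VMWRITEs.
    for level in range(3):
        print(f"entry handled by L{level}: {exits_for_entry(level, 30)} exits")
```

Under these assumptions the count goes from 1 exit when L0 handles the entry to 31 when L1 does, and 931 one level deeper, which is the kind of growth that makes L3 impractically slow.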

There are two things we could do in the future to solve this performance problem - if people really need to use L3 (currently, even L2 isn't very popular ;-)). One is to have some sort of exit-less VMREAD/VMWRITE, with or without hardware support for this feature. The second is to emulate VMWRITE differently, writing to vmcs02 immediately and doing far fewer VMWRITEs on nested entries (doing this correctly is harder than it might seem, as I can explain in a separate thread).
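
The second idea can be illustrated with a small counting sketch. It is only a hypothetical model of the trade-off, not KVM code: the class names, the field count of 30, and the workload are assumptions, and it deliberately ignores the correctness issues mentioned above. One model rebuilds every field on each nested entry; the other writes a field through at the moment the guest hypervisor changes it.

```python
class RebuildOnEntry:
    """Hypothetical model: guest VMWRITEs only update a software copy,
    and every field is written to the hardware VMCS on each entry."""

    def __init__(self, num_fields: int) -> None:
        self.shadow = {field: 0 for field in range(num_fields)}
        self.hw_writes = 0

    def guest_vmwrite(self, field: int, value: int) -> None:
        self.shadow[field] = value  # no hardware write yet

    def nested_entry(self) -> None:
        self.hw_writes += len(self.shadow)  # rewrite all fields


class WriteThrough:
    """Hypothetical model: each guest VMWRITE is applied immediately,
    so a nested entry needs no bulk rewrite."""

    def __init__(self, num_fields: int) -> None:
        self.hw = {field: 0 for field in range(num_fields)}
        self.hw_writes = 0

    def guest_vmwrite(self, field: int, value: int) -> None:
        self.hw[field] = value
        self.hw_writes += 1

    def nested_entry(self) -> None:
        pass  # nothing left to copy


def simulate(model, entries: int, writes_between_entries: int) -> int:
    """Run `entries` nested entries, with a few guest VMWRITEs before each."""
    for _ in range(entries):
        for field in range(writes_between_entries):
            model.guest_vmwrite(field, 1)
        model.nested_entry()
    return model.hw_writes


if __name__ == "__main__":
    print("rebuild on entry:", simulate(RebuildOnEntry(30), 100, 2))
    print("write through:   ", simulate(WriteThrough(30), 100, 2))
```

With 30 fields and two guest writes between entries, the rebuild model performs 3000 hardware writes over 100 entries while the write-through model performs 200. The saving matters most when the writer is itself a nested hypervisor and each write causes an exit.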

Nadav Har'El
