Created attachment 72845: Kernel oops backtrace

Running KVM with several levels of nesting works fine for the first two levels of virtual machines (L1 and L2), but attempting to boot L3 leads to the attached oops.

The host system has 24 Xeon X5675 cores running Ubuntu 12.04 with kernel 3.2.0-21-generic. The nested VMs are restricted to one of these cores. They all run resized clones of a Debian wheezy (testing) image with kernel 3.2.0-2-amd64.

The Turtles paper presents nested virtualisation as supporting an arbitrary level Ln and includes L3 in its diagrams, but its evaluation only goes up to L2. The kernel documentation (nested-vmx.txt) doesn't list this as a restriction, but it doesn't mention L3 or higher either. Hence, it is not clear whether the implementation is supposed to support L3 at all, either now or in the foreseeable future.
Hi,

Indeed, L3 should work in theory, and also in practice: in my tests it did work (albeit very, very slowly). I'll need to look into this issue and check why this bug is happening now.

You should be aware, though, that even if it works, L3 will be extremely slow, likely to the point of not being useful. One reason is the lack of nested EPT in the upstream version. An even bigger problem is the exponential explosion of exits described in the Turtles paper, which is made much worse by one particular part of the existing implementation. Right now, on every nested entry from L1 to L2, L0 recreates all the fields of the VMCS (see prepare_vmcs02()), doing a few dozen VMWRITEs. When L0 does this, it's fine. But when we are nested deeper and there is an entry from L2 into L3, *L1* needs to do prepare_vmcs02(). Now L1 issues a lot of VMWRITEs, each of which causes an exit to L0, and all of this is extremely slow.

There are two things we could do in the future to solve this performance problem, if people really need to use L3 (currently, even L2 isn't very popular ;-)). One is to have some sort of exit-less VMREAD/VMWRITE, with or without hardware support for this feature. The second is to emulate VMWRITE differently, writing to vmcs02 immediately and doing far fewer VMWRITEs on nested entries (doing this correctly is harder than it might seem, as I can explain in a separate thread).

Nadav Har'El
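A deliberately simplified toy model can make the exponential explosion of exits concrete. This is not code from KVM or from the Turtles paper. It assumes that every privileged instruction executed by L1 costs exactly one exit to L0. It also assumes that a privileged instruction executed at a deeper level Lk traps to L0, which reflects it to L(k-1), and that L(k-1) handles it with roughly `writes_per_entry` privileged instructions of its own, each costing as many exits as an instruction one level down. The function name `exits_to_l0` and the parameter `writes_per_entry` are hypothetical: the parameter stands in for the "few dozen VMWRITEs" of prepare_vmcs02(). The model ignores nested EPT and every other source of exits.

```python
def exits_to_l0(level: int, writes_per_entry: int) -> int:
    """Toy estimate (hypothetical model) of how many exits to L0 a single
    privileged instruction, such as VMRESUME, causes when executed at `level`.

    Assumptions (not taken from KVM):
      * L1's privileged instructions trap straight to L0: one exit each.
      * At deeper levels, the trap goes to L0, which reflects it to the level
        below; that level handles it with `writes_per_entry` privileged
        instructions (e.g. the VMWRITEs in prepare_vmcs02()), each of which
        costs as much as a privileged instruction one level down.
    """
    if level < 1:
        raise ValueError("level must be >= 1 (L0 runs natively)")
    if level == 1:
        return 1
    # One exit for the trap itself, plus the handler's own privileged work.
    return 1 + writes_per_entry * exits_to_l0(level - 1, writes_per_entry)


if __name__ == "__main__":
    # Compare a "few dozen" VMWRITEs per nested entry with a much smaller count.
    for writes in (40, 5):
        costs = [exits_to_l0(level, writes) for level in (1, 2, 3, 4)]
        print(f"writes_per_entry={writes}: L1..L4 -> {costs}")
```

With `writes_per_entry=40`, a privileged instruction at L2 (such as the VMRESUME that enters L3) costs 41 exits, and one at L3 costs 1641. With 5, those numbers drop to 6 and 31. Under this model, cost grows roughly as `writes_per_entry` raised to the power `level - 1`. This is why doing far fewer VMWRITEs per nested entry attacks the base of the exponent. Making the VMWRITEs exit-less corresponds, in this model, to setting `writes_per_entry` to 0, which leaves one exit per instruction at every level.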