Bug 53611

Summary: nVMX: Add nested EPT
Product: Virtualization
Reporter: Nadav Har'El (nyh)
Component: kvm
Assignee: virtualization_kvm
Status: RESOLVED CODE_FIX
Severity: normal
CC: bonzini
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.19
Subsystem:
Regression: No
Bisected commit-id:
Bug Depends on:
Bug Blocks: 94971, 53601
Attachments: Nested EPT patches, v2

Description Nadav Har'El 2013-02-11 12:49:05 UTC
Created attachment 93101
Nested EPT patches, v2

Nested EPT means emulating EPT for an L1 guest, allowing it to use EPT when
running a nested guest L2. When L1 uses EPT, the L2 guest can set its own cr3
and take its own page faults without either L0 or L1 getting involved. In many
workloads this significantly improves L2's performance over the two previous
alternatives (shadow page tables over EPT, and shadow page tables over shadow
page tables). As an example, I measured a single-threaded "make", which has a
lot of context switches and page faults, with each of the three options:

 shadow over shadow: 105 seconds
 shadow over EPT: 87 seconds      (this is the default currently)
 EPT over EPT: 29 seconds

 single-level virtualization (with EPT): 25 seconds
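The measured times translate into the following ratios (plain arithmetic on the numbers above, no additional data):

```latex
\begin{aligned}
\frac{T_{\text{shadow over EPT}}}{T_{\text{EPT over EPT}}} &= \frac{87\,\text{s}}{29\,\text{s}} = 3.0 \\
\frac{T_{\text{shadow over shadow}}}{T_{\text{EPT over EPT}}} &= \frac{105\,\text{s}}{29\,\text{s}} \approx 3.6 \\
\frac{T_{\text{EPT over EPT}}}{T_{\text{single-level}}} &= \frac{29\,\text{s}}{25\,\text{s}} = 1.16
\end{aligned}
```

In other words, on this workload EPT over EPT is about 3 times faster than the current default and runs only about 16% longer than single-level virtualization.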

So clearly nested EPT would be a big win for such workloads.
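The composition behind nested EPT can be sketched as follows. This is a minimal sketch, assuming a toy one-level table per EPT (real EPT is a four-level radix tree) and a single page size; the names toy_ept, ept02_result and build_ept02_entry are hypothetical and come neither from KVM nor from the attached patches. It shows that each EPT02 entry, which the hardware uses while L2 runs, is the composition of an EPT12 lookup (L1's table) and an EPT01 lookup (L0's table).

```c
#include <assert.h>
#include <stdint.h>

#define NFRAMES 16                 /* toy guest-physical address space, in pages */
#define INVALID_FRAME UINT32_MAX   /* marks an unmapped entry */

/* Hypothetical stand-in for one EPT: a flat array mapping a guest frame
 * number to the frame number one level down. Real EPT is a 4-level tree. */
typedef struct {
    uint32_t map[NFRAMES];
} toy_ept;

typedef enum {
    EPT02_OK,             /* entry built; L2 can be resumed */
    EPT02_REFLECT_TO_L1,  /* EPT12 has no mapping: L1 must see an EPT violation */
    EPT02_L0_FAULT        /* EPT01 has no mapping: L0 must map the page itself */
} ept02_result;

static void toy_ept_init(toy_ept *t) {
    for (int i = 0; i < NFRAMES; i++)
        t->map[i] = INVALID_FRAME;
}

static uint32_t toy_ept_lookup(const toy_ept *t, uint32_t gfn) {
    return gfn < NFRAMES ? t->map[gfn] : INVALID_FRAME;
}

/* Build one EPT02 entry on demand, e.g. after an EPT violation from L2:
 *   L2 frame --EPT12 (L1's table)--> L1 frame --EPT01 (L0's table)--> host frame
 * The hardware then uses the EPT02 entry directly while L2 runs. */
static ept02_result build_ept02_entry(toy_ept *ept02, const toy_ept *ept12,
                                      const toy_ept *ept01, uint32_t l2_gfn) {
    assert(l2_gfn < NFRAMES);
    uint32_t l1_gfn = toy_ept_lookup(ept12, l2_gfn);
    if (l1_gfn == INVALID_FRAME)
        return EPT02_REFLECT_TO_L1;
    uint32_t host_fn = toy_ept_lookup(ept01, l1_gfn);
    if (host_fn == INVALID_FRAME)
        return EPT02_L0_FAULT;
    ept02->map[l2_gfn] = host_fn;
    return EPT02_OK;
}
```

An EPT12 miss is reflected to L1 because L1 owns that table, while an EPT01 miss is L0's own business and stays invisible to L1.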

I attach the patch set I worked on, which allowed me to measure the above results. This is the same patch set I sent to the KVM mailing list on August 1st, 2012, titled "nEPT v2: Nested EPT support for Nested VMX".

This patch set still needs some work: it is known to work only in some setups and not in others, and the file "announce" in the attached tar lists 5 things which definitely need to be done. There were a few additional comments on the mailing list; see http://comments.gmane.org/gmane.comp.emulators.kvm.devel/95395
Comment 1 Nadav Har'El 2013-02-27 08:14:13 UTC
In addition to the known issues listed in the "announce" file attached above, I thought of several more issues that should be considered:

1. When switching back and forth between L1 and L2, it would be wasteful to throw
away the EPT table already built. So I hope (need to check...) that the EPT
table is cached. But what is the cache key: the cr3? cr3 has a different
meaning in L2 and in L1, so it might not be correct to use it as the key.
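One way to avoid the ambiguity raised in item 1 is to key the cache on more than a raw root address. This is a minimal sketch, assuming a fixed-size array cache and a hypothetical root_key that pairs the guest-physical address of the root table with a flag saying whether it is L1's own root or a root that L1 provides for L2; root_key, cache_lookup and cache_insert are illustrative names, not KVM's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 8

/* Hypothetical cache key: the same numeric root value means different
 * things in L1's context and in the context L1 sets up for L2. */
typedef struct {
    uint64_t root_gpa;   /* guest-physical address of the guest's root table */
    bool     nested;     /* true: root provided by L1 for L2; false: L1's own cr3 */
} root_key;

typedef struct {
    bool     used;
    root_key key;
    int      table_id;   /* stands in for a pointer to an already-built table */
} cache_slot;

static cache_slot cache[CACHE_SLOTS];   /* zero-initialized: all slots unused */

static bool key_eq(root_key a, root_key b) {
    return a.root_gpa == b.root_gpa && a.nested == b.nested;
}

/* Returns the cached table id, or -1 on a miss. */
static int cache_lookup(root_key k) {
    for (size_t i = 0; i < CACHE_SLOTS; i++)
        if (cache[i].used && key_eq(cache[i].key, k))
            return cache[i].table_id;
    return -1;
}

/* Inserts into the first free slot; returns false if the cache is full
 * (a real implementation would evict an entry instead). */
static bool cache_insert(root_key k, int table_id) {
    assert(cache_lookup(k) == -1);
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        if (!cache[i].used) {
            cache[i] = (cache_slot){ .used = true, .key = k, .table_id = table_id };
            return true;
        }
    }
    return false;
}
```

The interesting case is the same numeric root value appearing in both contexts: with the flag in the key, the two lookups no longer collide.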

2. When L0 swaps out pages, it needs to remove the corresponding entries from all
EPT tables, including cached EPT02 tables that are not currently in use. Does
this happen correctly?
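Item 2 can be illustrated with a sketch that scans every cached EPT02 table, not only the loaded one, when L0 evicts a page. It assumes, hypothetically, that each EPT02 entry records the L1 frame it was derived from; ept02_table and zap_l1_frame are illustrative names, not KVM functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NFRAMES 16
#define NTABLES 4
#define NO_FRAME UINT32_MAX

/* Hypothetical EPT02 table: each entry remembers which L1 frame it was
 * derived from, so it can be found again when that frame goes away. */
typedef struct {
    uint32_t hfn[NFRAMES];      /* L2 frame -> host frame, or NO_FRAME */
    uint32_t src_l1[NFRAMES];   /* L1 frame the entry was built from */
} ept02_table;

/* Called when L0 evicts the host page backing l1_gfn. Every cached table is
 * scanned, whether or not it is currently loaded, because an inactive table
 * may be reused later without being rebuilt. Returns the number of entries
 * removed. */
static size_t zap_l1_frame(ept02_table tables[], size_t ntables, uint32_t l1_gfn) {
    assert(tables != NULL || ntables == 0);
    size_t zapped = 0;
    for (size_t t = 0; t < ntables; t++) {
        for (size_t i = 0; i < NFRAMES; i++) {
            if (tables[t].hfn[i] != NO_FRAME && tables[t].src_l1[i] == l1_gfn) {
                tables[t].hfn[i] = NO_FRAME;
                zapped++;
            }
        }
    }
    return zapped;
}
```

A linear scan keeps the sketch short; a real implementation would more likely keep a reverse map from L1 frames to the entries derived from them, so that eviction does not have to walk every table.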

3. If L1 uses EPT ("nested EPT") and gives us a malformed EPT12 table, we may
need to inject an EPT_MISCONFIGURATION exit when building the merged EPT02
entry. Typically we build this entry (see "fetch" in paging_tmpl.h) when
handling an EPT violation exit from L2, so if we encounter this problem,
instead of reentering L2 immediately we should exit to L1 with an EPT
misconfiguration. I'm not sure exactly how to detect this problem. Perhaps the
page-table walking code, which in our case walks EPT12, already notices a
problem and does something (#GP perhaps?), and we need to make it do the EPT
misconfig instead. But we may also need to add checks that are not done for
normal page tables, in particular regarding reserved bits, and especially
bit 5 (in EPT it is reserved, in normal page tables it is the accessed bit).
This issue is low priority, as it only deals with the error path; a
well-written L1 will not cause EPT misconfigurations anyway.
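For item 3, the extra checks could look roughly like the sketch below. It covers only a simplified subset of the EPT-misconfiguration conditions in the Intel SDM (writable but not readable, execute-only without CPU support, reserved memory types 2, 3 and 7 in leaf entries, and reserved bits 6:3 in non-leaf entries, which include bit 5); physical-address bits above MAXPHYADDR, large-page rules and accessed/dirty flags are left out. The function name ept12_entry_misconfigured and the exec_only_ok parameter are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified subset of the SDM's EPT-misconfiguration rules (see lead-in). */
#define EPT_READ         (1ull << 0)
#define EPT_WRITE        (1ull << 1)
#define EPT_EXEC         (1ull << 2)
#define EPT_NONLEAF_RSVD 0x78ull   /* bits 6:3, includes bit 5 */

static bool ept12_entry_misconfigured(uint64_t e, bool leaf, bool exec_only_ok) {
    /* Not-present entries (bits 2:0 all clear) cause EPT violations,
     * not misconfigurations, so nothing else is checked for them. */
    if ((e & (EPT_READ | EPT_WRITE | EPT_EXEC)) == 0)
        return false;
    /* Writable but not readable is always a misconfiguration. */
    if ((e & EPT_WRITE) && !(e & EPT_READ))
        return true;
    /* Execute-only is allowed only if the CPU supports it. */
    if ((e & EPT_EXEC) && !(e & EPT_READ) && !exec_only_ok)
        return true;
    if (leaf) {
        /* Bits 5:3 hold the memory type; values 2, 3 and 7 are reserved. */
        unsigned mt = (unsigned)((e >> 3) & 7);
        return mt == 2 || mt == 3 || mt == 7;
    }
    /* Non-leaf: bits 6:3 are reserved. Bit 5 is the accessed bit in ordinary
     * x86 page tables, so the ordinary walker's checks do not catch it. */
    return (e & EPT_NONLEAF_RSVD) != 0;
}
```

Returning a plain boolean keeps the sketch focused on which bits matter; the caller would turn a true result into the EPT misconfiguration exit to L1 described above instead of reentering L2.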
Comment 2 Paolo Bonzini 2015-04-08 09:02:34 UTC
Fixed by commit afa61f752ba6 (Advertise the support of EPT to the L1 guest, through the appropriate MSR., 2013-08-07)