Created attachment 93101
Nested EPT patches, v2

Nested EPT means emulating EPT for an L1 guest, allowing it to use EPT when running a nested guest L2. When L1 uses EPT, the L2 guest can set its own cr3 and take its own page faults without involving either L0 or L1. In many workloads this significantly improves L2's performance over the previous two alternatives (shadow page tables over EPT, and shadow page tables over shadow page tables).

As an example, I measured a single-threaded "make", which has a lot of context switches and page faults, on the three options, plus single-level virtualization as a baseline:

shadow over shadow: 105 seconds
shadow over EPT: 87 seconds (this is currently the default)
EPT over EPT: 29 seconds
single-level virtualization (with EPT): 25 seconds

So nested EPT would clearly be a big win for such workloads: EPT over EPT is about 3x faster than the current default (87 vs. 29 seconds), and only about 16% slower than single-level virtualization (29 vs. 25 seconds).

I am attaching the patch set I worked on, which I used to obtain the above measurements. It is the same patch set I sent to the KVM mailing list on August 1st, 2012, titled "nEPT v2: Nested EPT support for Nested VMX".

This patch set still needs some work: it is known to work only in some setups and not in others, and the file "announce" in the attached tar lists 5 things that definitely need to be done. There were a few additional comments on the mailing list; see http://comments.gmane.org/gmane.comp.emulators.kvm.devel/95395
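To make the idea concrete, here is a toy sketch of how a merged entry could be built; it is not the code in the attached patches. L1's EPT table (EPT12) maps L2 guest-physical pages to L1 guest-physical pages, L0's own table (EPT01) maps L1 guest-physical pages to host pages, and the table the hardware walks while L2 runs (EPT02) is built by L0 by composing the two. The flat arrays, the names, and the three-way result are hypothetical simplifications: real EPT tables are multi-level, and a real merged entry would also have to combine the permissions of both levels.

```c
#include <assert.h>
#include <stdint.h>

#define NPAGES 8                 /* toy guest size, in pages */
#define NOT_PRESENT UINT64_MAX   /* marks an empty table slot */

/* Possible outcomes when L0 handles an EPT violation taken while running L2. */
enum ept02_result {
    EPT02_FILLED,         /* merged entry installed; L2 can be re-entered */
    EPT02_REFLECT_TO_L1,  /* EPT12 has no mapping: L1 must see the violation */
    EPT02_HANDLE_IN_L0    /* EPT01 has no mapping: L0 must fault in L1's page */
};

/* Toy single-level tables, indexed by page frame number (hypothetical). */
static uint64_t ept12[NPAGES]; /* L2 guest-physical -> L1 guest-physical (owned by L1) */
static uint64_t ept01[NPAGES]; /* L1 guest-physical -> host-physical (owned by L0) */
static uint64_t ept02[NPAGES]; /* L2 guest-physical -> host-physical (built by L0) */

static void reset_tables(void)
{
    for (int i = 0; i < NPAGES; i++)
        ept12[i] = ept01[i] = ept02[i] = NOT_PRESENT;
}

/* Compose EPT12 and EPT01 to fill the EPT02 slot for one L2 frame. */
static enum ept02_result build_ept02_entry(uint64_t l2_gfn)
{
    assert(l2_gfn < NPAGES);

    uint64_t l1_gfn = ept12[l2_gfn];
    if (l1_gfn == NOT_PRESENT)
        return EPT02_REFLECT_TO_L1;
    assert(l1_gfn < NPAGES);   /* toy: L1 only maps frames it owns */

    uint64_t host_pfn = ept01[l1_gfn];
    if (host_pfn == NOT_PRESENT)
        return EPT02_HANDLE_IN_L0;

    ept02[l2_gfn] = host_pfn;
    return EPT02_FILLED;
}
```

Only the REFLECT outcome requires an exit to L1; in the other two cases L0 can fix things up itself and re-enter L2 directly.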
In addition to the known issues listed in the "announce" file attached above, I thought of several more issues that should be considered:

1. When switching back and forth between L1 and L2, it would be wasteful to throw away the EPT table already built. So I hope (need to check...) that the EPT table is cached. But what is the cache key? The cr3? cr3 has a different meaning in L2 and L1, so it might not be correct to use it as the key.

2. When L0 swaps out pages, it needs to remove the corresponding entries from all EPT tables, including a cached EPT02 even if it is not currently in use. Does this happen correctly?

3. If L1 uses EPT ("nested EPT") and gives us a malformed EPT12 table, we may need to inject an EPT_MISCONFIGURATION exit when building the merged EPT02 entry. Typically we do this building (see "fetch" in paging_tmpl.h) when handling an EPT violation exit from L2, so if we encounter this problem, instead of re-entering L2 immediately we should exit to L1 with an EPT misconfiguration.
   I'm not sure exactly how to notice this problem. Perhaps the page-table walking code, which in our case walks EPT12, already notices a problem and does something (#GP perhaps?), and we need to have it do the EPT misconfiguration exit instead. But we may also need to add checks that are not done for normal page tables, in particular regarding reserved bits, and especially bit 5 (in EPT it is reserved, in normal page tables it is the accessed bit).
   This issue is low priority, as it only deals with the error path; a well-written L1 will not cause EPT misconfigurations anyway.
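For issue 3, a simplified sketch of the extra checks, based on the EPT misconfiguration conditions in the Intel SDM rather than on KVM's code (the function and macro names are hypothetical). An EPT entry is present if any of bits 2:0 is set, and a present entry is misconfigured if it is writable but not readable, is execute-only on a CPU that does not support that, has an address bit at or above MAXPHYADDR set, or, for non-leaf entries, has any of bits 7:3 set. In leaf entries bits 5:3 hold the memory type instead, and types 2, 3 and 7 are reserved, so bit 5 is reserved only in non-leaf entries. The sketch ignores large-page alignment bits and the accessed/dirty flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for EPT entry fields (bit positions per the Intel SDM). */
#define EPT_READ              (1ULL << 0)
#define EPT_WRITE             (1ULL << 1)
#define EPT_EXEC              (1ULL << 2)
#define EPT_PERM_MASK         (EPT_READ | EPT_WRITE | EPT_EXEC)
#define EPT_MEMTYPE_SHIFT     3
#define EPT_MEMTYPE_MASK      (7ULL << EPT_MEMTYPE_SHIFT)  /* bits 5:3, leaf only */
#define EPT_NONLEAF_RSVD_MASK (0x1FULL << 3)               /* bits 7:3, non-leaf */

/* Bits 51:maxphyaddr of the address field must be zero. */
static uint64_t phys_rsvd_mask(unsigned maxphyaddr)
{
    assert(maxphyaddr <= 52);
    uint64_t below_52 = (1ULL << 52) - 1;
    uint64_t below_max = (1ULL << maxphyaddr) - 1;
    return below_52 & ~below_max;
}

/* Return true if an EPT12 entry should cause an EPT misconfiguration exit to L1. */
static bool ept_entry_misconfigured(uint64_t entry, bool is_leaf,
                                    unsigned maxphyaddr, bool exec_only_ok)
{
    uint64_t perm = entry & EPT_PERM_MASK;

    if (perm == 0)
        return false;  /* not present: an EPT violation, not a misconfiguration */
    if ((perm & EPT_WRITE) && !(perm & EPT_READ))
        return true;   /* writable but not readable (010b or 110b) */
    if (perm == EPT_EXEC && !exec_only_ok)
        return true;   /* execute-only without CPU support */
    if (entry & phys_rsvd_mask(maxphyaddr))
        return true;   /* address beyond the physical-address width */
    if (!is_leaf)
        return (entry & EPT_NONLEAF_RSVD_MASK) != 0;  /* includes bit 5 */

    /* Leaf entry: memory types 2, 3 and 7 are reserved. */
    unsigned memtype = (unsigned)((entry & EPT_MEMTYPE_MASK) >> EPT_MEMTYPE_SHIFT);
    return memtype == 2 || memtype == 3 || memtype == 7;
}
```

The same entry value with bit 5 set (for example 0x27) is therefore a misconfiguration as a non-leaf entry but a valid write-through leaf entry, which is one reason the normal page-table walker's reserved-bit checks cannot simply be reused for EPT12.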
Fixed by commit afa61f752ba6 (Advertise the support of EPT to the L1 guest, through the appropriate MSR., 2013-08-07)