Bug 208067
| Summary: | Reboot fails when VMX enabled, but not in VMX operation | | |
|---|---|---|---|
| Product: | Platform Specific/Hardware | Reporter: | David P. Reed (dpreed) |
| Component: | x86-64 | Assignee: | platform_x86_64 (platform_x86_64) |
| Status: | NEW | | |
| Severity: | normal | CC: | dpreed |
| Priority: | P1 | | |
| Hardware: | x86-64 | | |
| OS: | Linux | | |
| Kernel Version: | 5.6.15, 5.8-rc1 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
| Attachments: | Proposed patch | | |
Description
David P. Reed
2020-06-04 20:26:17 UTC
After a fair bit of research and thinking: if CR4.VMXE is set, it is possible to avoid a trap occurring here by first checking some feature control bits in MSRs and then executing VMXON (which mostly fails by setting the C or Z flags). This determines whether a VMXOFF is needed: VMXON either indicates that the processor is already in VMX root mode, or, if it isn't, enters VMX root mode, or fails for some other reason. The logic to do this without the possibility of either a GP fault or a UD fault is pretty straightforward. As far as I can tell, VMXON is the only instruction that detects whether the processor is already in VMX root mode without generating an undefined opcode fault.

It would be weird-looking, but it should work. I may code up a patch for this, because it doesn't involve changing anything else, and it achieves the goal of ensuring that the processor is not in VMX root mode.

[If running under a Virtual Machine Monitor, that is, in VMX non-root operation, executing VMXON will cause a VM exit. But that case must be handled in any event by a proper Virtual Machine Monitor. I'm not sure, for example, what KVM does on a VMXON, but it would already be exiting on VMXOFF in the same way at this point in reboot. Presumably KVM emulates VMXON by making it fail with a flag set, as the actual hardware does when CR4.VMXE is set.]

Since this can be fixed with one line of code, I will be submitting a patch shortly.

It took a few lines, carefully placed, to do the cleanest possible patch for this issue. I will attach the proposed patch, which I have already submitted to the maintainers of the relevant files (as listed by the get_maintainer.pl script).

I've done a series of tests that touch the code paths involved (different panic and reboot scenarios execute the problematic inline cpu_xoff() differently), both with VMX enabled and with VMX disabled (as is normal). I've also exercised the error trap path. Everything seems to be fixed by this patch, and no regressions seem to occur.
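For anyone following the VMXON-probe idea from the first comment, here is a rough model of the decision logic in plain C. It is only a sketch and is not taken from the attached patch. The function names, the enum, and the flag parameters are all made up for illustration; the two bit positions are the lock bit and the "VMXON outside SMX" bit of IA32_FEATURE_CONTROL. It assumes the probe runs at CPL 0 with CR0/CR4 values acceptable for VMX, and that VMXON is given a valid VMXON region, so that a flagged failure can only mean "already in VMX root mode".

```c
#include <stdbool.h>
#include <stdint.h>

/* IA32_FEATURE_CONTROL bits relevant to VMXON outside SMX operation. */
#define FEAT_CTL_LOCKED            (1ULL << 0)
#define FEAT_CTL_VMXON_OUTSIDE_SMX (1ULL << 2)

/* Hypothetical result type for this sketch. */
enum vmx_action {
    VMX_SKIP_VMXOFF, /* CPU cannot be in VMX operation; VMXOFF would fault */
    VMX_DO_VMXOFF    /* CPU is in VMX root mode after the probe */
};

/*
 * Hypothetical helper: may VMXON be attempted as a probe?
 * With CR4.VMXE clear, any VMX instruction raises #UD. With the lock bit
 * or the enable bit clear, VMXON raises #GP, and the CPU cannot have
 * entered VMX operation in the first place. Other #GP conditions (CPL,
 * CR0/CR4 fixed bits) are assumed to be satisfied here.
 */
static inline bool vmxon_probe_allowed(bool cr4_vmxe, uint64_t feat_ctl)
{
    const uint64_t need = FEAT_CTL_LOCKED | FEAT_CTL_VMXON_OUTSIDE_SMX;

    if (!cr4_vmxe)
        return false;
    return (feat_ctl & need) == need;
}

/*
 * Hypothetical decision function. cf and zf stand for the carry and zero
 * flags observed after VMXON; they are ignored when the probe is not
 * allowed, because VMXON is never executed in that case.
 */
static inline enum vmx_action vmxoff_decision(bool cr4_vmxe, uint64_t feat_ctl,
                                              bool cf, bool zf)
{
    if (!vmxon_probe_allowed(cr4_vmxe, feat_ctl))
        return VMX_SKIP_VMXOFF;

    if (!cf && !zf)
        return VMX_DO_VMXOFF; /* VMXON succeeded: just entered root mode */

    /*
     * VMXON failed by setting a flag. Assuming a valid VMXON region was
     * supplied, this is taken to mean the CPU was already in root mode.
     */
    return VMX_DO_VMXOFF;
}
```

In this model, once the probe is allowed, both outcomes leave the processor in VMX root mode, so a following VMXOFF is expected to be legal. The only cases where VMXOFF is skipped are the ones where it would have raised a fault.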
Created attachment 289599
Proposed patch
This is the patch mentioned in the comment.