Bug 208067 - Reboot fails when VMX enabled, but not in VMX operation
Summary: Reboot fails when VMX enabled, but not in VMX operation
Status: NEW
Alias: None
Product: Platform Specific/Hardware
Classification: Unclassified
Component: x86-64 (show other bugs)
Hardware: x86-64 Linux
: P1 normal
Assignee: platform_x86_64@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-06-04 20:26 UTC by David P. Reed
Modified: 2020-06-10 18:35 UTC (History)
1 user (show)

See Also:
Kernel Version: 5.6.15, 5.8-rc1
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Proposed patch (7.21 KB, patch)
2020-06-10 18:35 UTC, David P. Reed
Details | Diff

Description David P. Reed 2020-06-04 20:26:17 UTC
If CR4.VMXE is set, but VMXON has not been executed yet, or VMXOFF has been successfully executed once, a crash or reboot will encounter an #UD fault in cpu_emergency_vmxoff(), called from many reboot/crash paths. This #UD fault is incorrect behavior, since rebooting only requires CR4.VMXE to be reset (and maybe not even that is needed!).

Simple way to reproduce this: 

1. write a simple device driver that sets CR4.VMXE to 1 in one or more cpus.
2. Load the driver, setting the flag. System will continue to operate. 
3. execute command "sudo reboot".
System will panic with an #UD exception in cpu_emergency_vmxoff()'s inline expansion..

If it is thought that VMXOFF should be attempted, because there is no standard way to know if the processor is in VMX root operation or not, the #UD exception should be disabled during the execution of VMXOFF. If it were me, that is what I would do to fix this bug. It would be resilient in all cases.

This bug can be quite annoying when debugging virtualization code (whether it is KVM or any other code that requires CR4.VMXE to be set in order to work.) I've been dealing with the consequences randomly for months now.

I'd submit a patch, but the reboot/crash code path is quite subtle, and others are far more expert on the issues in that path (including kexec).

[was the author of this line of code confused into thinking CR4.VMXE indicated that the processor was in VMX operation? That might be likely, but not definitely true. A clean path to reboot seems important which avoids bugs like this when the system is not actually falling apart.]
Comment 1 David P. Reed 2020-06-04 22:14:05 UTC
After a fair bit of research and thinking, if CR4.VMEX is set, it is possible to avoid a trap occurring here by checking some feature control bits in MSRs first, and then executing VMXON (which mostly fails by setting the C or Z flags.)
So this can determine whether a VMXOFF is needed, by indicating that the processor is already in VMX root mode, or if it isn't, by entering VMX root mode, or failing for some other reason.

The logic is pretty straightforward to do this without the possibility of either a GP fault or UD fault.

As far as I can tell, VMXON is the only instruction that detects whether the processor is in VMX root mode already, without generating an undefined opcode fault.

It would be weird-looking, but it should work. I may code up a patch for this, because it doesn't involve changing anything else, and achieves the goal of ensuring that the processor is not in VMX root mode.

[if running under a Virtual Machine Monitor, that is, in VMX non-root operation, executing VMXON will do a VMEXIT. But that case must be handled in any event by a proper Virtual Machine Monitor. I'm not sure, for example, what KVM does on a VMXON, but it would already be exiting on VMXOFF in the same way at this point in reboot. Presumably KVM emulates VMXON by making it fail setting a flag, as the actual hardware does when CR4.VMXE is set. 
]
Comment 2 David P. Reed 2020-06-08 18:05:07 UTC
Since this can be fixed with one line of code, I will be submitting a parch shortly.
Comment 3 David P. Reed 2020-06-10 18:31:26 UTC
It took a few lines, carefully placed, to do the cleanest possible patch for this issue. I will attach the proposed patch, which I already submitted to maintainers of the relevant files (from get_maintainer.pl script). I've done a series of tests that touch the code paths involved (different panic and reboot scenarios execute the problematic inline cpu_xoff() differently), both with VMX enabled and with VMX disabled (as is normal). And I've exercised the error trap path.
Everything seems to be fixed by this patch, and no regressions seem to occur.
Comment 4 David P. Reed 2020-06-10 18:35:04 UTC
Created attachment 289599 [details]
Proposed patch

This is the patch mentioned in the comment.

Note You need to log in before you can comment on or make changes to this bug.