Bug 11219
| Summary: | KVM modules break emergency reboot | | |
|---|---|---|---|
| Product: | Virtualization | Reporter: | Rafael J. Wysocki (rjw) |
| Component: | kvm | Assignee: | virtualization_kvm |
| Status: | CLOSED CODE_FIX | | |
| Severity: | normal | CC: | alan, avi, zdenek.kabelac |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 2.6.27-rc1 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
| Attachments: | disable vmx on reboot | | |
Description
Rafael J. Wysocki
2008-08-01 14:56:24 UTC
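As a response to Rafael's email for rechecking the status of this bug - it still applies to my kernel built from commit 10fec20ef5eec1c91913baec1225400f0d02df40. Sysrq+B will end in the int3 deadlock when kvm modules are loaded.

(copying mingo)

(context: sysrq-B with kvm-intel.ko loaded doesn't work. on my machine,
it kills the sata interface, but the processor and network keeps working)

Strangely, the specs say:

> • The INIT signal is blocked whenever a logical processor is in VMX root operation. It is not blocked in VMX non-root operation. Instead, INITs cause VM exits (see Section 21.3, “Other Causes of VM Exits”).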
* Avi Kivity <avi@qumranet.com> wrote:

> (copying mingo)
>
> (context: sysrq-B with kvm-intel.ko loaded doesn't work. on my machine,
> it kills the sata interface, but the processor and network keeps working)
>
> Strangely, the specs say:
>
>> • The INIT signal is blocked whenever a logical processor is in VMX
>> root operation.
>> It is not blocked in VMX non-root operation. Instead, INITs cause VM
>> exits (see
>> Section 21.3, “Other Causes of VM Exits”).
>
> So INIT (which is wired to the triple-fault processor output, it seems,
> rather than RESET) is blocked and the machine is not reset completely.
>
> So we need to disable vmx during native_machine_emergency_restart().
> There are at least three ways of doing this:
>
> - add a vmxoff sequence (with an exception handler) to
> native_machine_emergency_restart(). while simplest, this will not
> unblock INIT for other cpus
>
> - add an emergency_restart notifier_block, and have kvm subscribe. This
> has the disadvantage of being slightly complex, opening a tiny race
> (emergency restart during kvm module initialization), and requiring IPIs
> during emergency restart.
>
> - move vmxon/vmxoff management out of the kvm module and into x86 core.
> Bloats the core but reduces complexity. IPIs still required.
>
> I think the notifier block is the way to go. Ingo, let me know what you
> prefer.

notifier should be OK i think - sysrq-b is an emergency mechanism after all.

btw., "echo b > /proc/sysrq-trigger" never worked reliably for me with KVM also loaded.

Ingo

Simple workaround: boot with 'reboot=a' kernel parameter.

proposed as a patch for 2.6.1[78].

btw, this isn't a recent regression. the problem has been present since 2.6.20.

Removed from the list, thanks.

Well, I have no good news - reboot=a doesn't solve my problem - the machine doesn't emergency reboot with this flag. Actually, I forgot to mention in the initial post that I've already tried, I think, all those reboot parameters.
So for T61 and kvm modules loaded - ACPI reboot will not fix the problem. Currently tested with kernel: 1941246dd98089dd637f44d3bd4f6cc1c61aa9e4

Created attachment 17477
disable vmx on reboot
Please test the attached patch. Watch out for not all processors coming back online after the reboot.
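
For readers following the notifier-block option discussed earlier in the thread, here is a minimal userspace sketch of that pattern in C. It is not the attached patch and not kernel code: the names `emergency_notifier_register`, `emergency_notifier_unregister`, `kvm_emergency_disable`, `vmx_on` and the two-CPU `NR_CPUS` are all hypothetical, and the per-CPU "VMX root operation" state is just a boolean array. It only illustrates the idea that a subscriber must leave VMX on every CPU before the reset can go through.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 2 /* assumption: a dual-core machine, as in the report */

/* Simulated per-CPU state: true while that CPU is in VMX root operation. */
static bool vmx_on[NR_CPUS];

/* One entry in a singly linked chain of callbacks (hypothetical type). */
struct notifier {
    void (*call)(void);
    struct notifier *next;
};

static struct notifier *emergency_chain;

/* Add a callback to the head of the chain. */
static void emergency_notifier_register(struct notifier *n)
{
    assert(n != NULL && n->call != NULL);
    n->next = emergency_chain;
    emergency_chain = n;
}

/* Remove a callback, e.g. when the subscribing module is unloaded. */
static void emergency_notifier_unregister(struct notifier *n)
{
    struct notifier **p = &emergency_chain;

    while (*p != NULL) {
        if (*p == n) {
            *p = n->next;
            return;
        }
        p = &(*p)->next;
    }
}

/* Stand-in for the hypervisor's callback: leave VMX on every CPU.
 * A real kernel would need IPIs to reach the other CPUs; here it is a loop. */
static void kvm_emergency_disable(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        vmx_on[cpu] = false;
}

static struct notifier kvm_notifier = { .call = kvm_emergency_disable };

/* Run the chain, then report whether the simulated reset would succeed,
 * i.e. whether no CPU is still blocking INIT. */
static bool emergency_restart(void)
{
    for (struct notifier *n = emergency_chain; n != NULL; n = n->next)
        n->call();

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (vmx_on[cpu])
            return false;
    return true;
}
```

In this model, calling `emergency_restart()` with a CPU still marked in `vmx_on` and no subscriber returns `false`; after `emergency_notifier_register(&kvm_notifier)` it returns `true`. The window between module load and registration corresponds to the "tiny race" mentioned in the discussion.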
Handled-By : Avi Kivity <avi@qumranet.com>

There is some progress - usually I first check whether the emergency reboot works in runlevel 1 with SysRQ+SUB. With your attached patch the kernel finally reboots - with ACPI or with reboot=kbd. The trouble is - if I start my usual runlevel 5 - the emergency reboot turns again into a plain deadlock - I could see 'reseting' written on the console with a blinking cursor - both with ACPI & KBD. Hopefully I've not made any mistakes during the tests, as I've tried to double check them - but I have no explanation for this behavior. Any idea what I should check as a potential source of trouble here? Maybe the vmx switch has to be applied to both CPUs?

Probably the difference was which cpu executed the reset. Try (from runlevel 3):

    taskset 1 emergency-reboot
    taskset 2 emergency-reboot

(where emergency-reboot is a script that does 'echo b > /proc/sysrq-trigger')

Great, this has finally made it work, now it really reboots. So hopefully it will now be possible to make a workable patch for this.

Which one really worked? taskset 1 or taskset 2?

Hmm - I thought that I should run them both at the same time, so I've actually put them into a script file. I've made an extra test - each double checked. When I run only taskset 1 or taskset 2 - the reboot will not happen. And there is a minor difference - with taskset 2 the machine still looks somewhat 'alive' for a while - i.e. I could switch consoles for some time. Only when I execute a shell script with both taskset commands does the reboot succeed.

Just a minor respin for this bug - anything new?

A fix is queued for 2.6.29.
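
A note on the `taskset 1` / `taskset 2` commands used in the test above: `taskset` takes a hexadecimal CPU bitmask, so mask 1 pins the command to CPU 0 and mask 2 to CPU 1. The small C helper below (the name `mask_to_cpus` is hypothetical, written only for illustration) decodes such a mask into CPU indices.

```c
#include <assert.h>
#include <stddef.h>

/* Decode an affinity bitmask into CPU indices.
 * Writes at most `max` indices into `out` and returns how many were written.
 * Bit 0 of the mask corresponds to CPU 0, bit 1 to CPU 1, and so on. */
static size_t mask_to_cpus(unsigned long mask, int *out, size_t max)
{
    size_t n = 0;

    assert(out != NULL);
    for (int cpu = 0; mask != 0 && n < max; cpu++, mask >>= 1)
        if (mask & 1UL)
            out[n++] = cpu;
    return n;
}
```

Read this way, the two commands exercise the reset path once from each core of a dual-core machine, which is why they were suggested for finding out which CPU's reset attempt succeeds.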