Bug 11219 - KVM modules break emergency reboot
Summary: KVM modules break emergency reboot
Status: CLOSED CODE_FIX
Alias: None
Product: Virtualization
Classification: Unclassified
Component: kvm
Hardware: All Linux
Importance: P1 normal
Assignee: virtualization_kvm
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-08-01 14:56 UTC by Rafael J. Wysocki
Modified: 2012-05-22 13:09 UTC

See Also:
Kernel Version: 2.6.27-rc1
Tree: Mainline
Regression: No


Attachments
disable vmx on reboot (842 bytes, patch)
2008-08-27 03:17 UTC, Avi Kivity

Description Rafael J. Wysocki 2008-08-01 14:56:24 UTC
Subject    : Re: Sysrq+B doesn't work on my box
Submitter  : "Zdenek Kabelac" <zdenek.kabelac@gmail.com>
Date       : 2008-08-01 20:25
References : http://marc.info/?l=linux-kernel&m=121762241105336&w=4

This entry is being used for tracking a regression from 2.6.26.  Please don't
close it until the problem is fixed in the mainline.
Comment 1 Zdenek Kabelac 2008-08-12 02:19:50 UTC
In response to Rafael's email asking to recheck the status of this bug: it still applies to my kernel built from commit 10fec20ef5eec1c91913baec1225400f0d02df40.

Sysrq+B ends in an int3 deadlock when the kvm modules are loaded.
Comment 2 Avi Kivity 2008-08-18 06:39:47 UTC
(copying mingo)

(context: sysrq-B with kvm-intel.ko loaded doesn't work. on my machine,
it kills the sata interface, but the processor and network keep working)

Strangely, the specs say:

> 
Comment 3 Ingo Molnar 2008-08-18 07:04:59 UTC
* Avi Kivity <avi@qumranet.com> wrote:

> (copying mingo)
>
> (context: sysrq-B with kvm-intel.ko loaded doesn't work. on my machine,
> it kills the sata interface, but the processor and network keep working)
>
> Strangely, the specs say:
>
>> • The INIT signal is blocked whenever a logical processor is in VMX  
>> root operation.
>> It is not blocked in VMX non-root operation. Instead, INITs cause VM  
>> exits (see
>> Section 21.3, “Other Causes of VM Exits”).
>
> So INIT (which is wired to the triple-fault processor output, it seems,  
> rather than RESET) is blocked and the machine is not reset completely.
>
> So we need to disable vmx during native_machine_emergency_restart().  
> There are at least three ways of doing this:
>
> - add a vmxoff sequence (with an exception handler) to  
> native_machine_emergency_restart(). while simplest, this will not  
> unblock INIT for other cpus
>
> - add an emergency_restart notifier_block, and have kvm subscribe. This  
> has the disadvantage of being slightly complex, opening a tiny race  
> (emergency restart during kvm module initialization), and requiring IPIs  
> during emergency restart.
>
> - move vmxon/vmxoff management out of the kvm module and into x86 core.  
> Bloats the core but reduces complexity. IPIs still required.
>
> I think the notifier block is the way to go. Ingo, let me know what you  
> prefer.

notifier should be OK i think - sysrq-b is an emergency mechanism after 
all.

btw., "echo b > /proc/sysrq-trigger" never worked reliably for me with 
KVM also loaded.

	Ingo
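
The notifier option discussed above can be pictured with a small toy model. This is only a sketch in Python, not kernel code: the names Cpu, Machine and kvm_disable_vmx are hypothetical, and the rule that the reset completes only when no CPU is left in VMX root operation is an assumption drawn from the quoted spec text and from the remark that a local vmxoff will not unblock INIT for other cpus.

```python
# Toy model only: hypothetical names, not kernel APIs.
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Cpu:
    cpu_id: int
    vmx_root: bool = False  # True while this CPU is in VMX root operation


@dataclass
class Machine:
    cpus: List[Cpu]
    # Callbacks run at the start of an emergency restart (the "notifier block").
    restart_notifiers: List[Callable[["Machine"], None]] = field(default_factory=list)

    def emergency_restart(self) -> bool:
        """Run the notifiers, then report whether the reset would complete."""
        for callback in self.restart_notifiers:
            callback(self)
        # Assumption: INIT is blocked on every CPU still in VMX root operation,
        # so the machine resets fully only if no CPU is left in that state.
        return not any(cpu.vmx_root for cpu in self.cpus)


def kvm_disable_vmx(machine: Machine) -> None:
    """What a subscribed module would do: leave VMX on every CPU.

    The loop stands in for the IPIs mentioned in the thread.
    """
    for cpu in machine.cpus:
        cpu.vmx_root = False


if __name__ == "__main__":
    machine = Machine(cpus=[Cpu(0, vmx_root=True), Cpu(1, vmx_root=True)])
    print("reset completes without notifier:", machine.emergency_restart())
    machine.restart_notifiers.append(kvm_disable_vmx)
    print("reset completes with notifier:", machine.emergency_restart())
```

In this model the restart path knows nothing about VMX; it only walks the list of subscribers, which is the appeal of the notifier approach over putting a vmxoff sequence directly into the restart code.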
Comment 4 Avi Kivity 2008-08-25 09:28:02 UTC
Simple workaround:

  boot with 'reboot=a' kernel parameter.

proposed as a patch for 2.6.2[78].
Comment 5 Avi Kivity 2008-08-25 09:51:26 UTC
btw, this isn't a recent regression.  the problem has been present since 2.6.20.
Comment 6 Rafael J. Wysocki 2008-08-25 14:23:37 UTC
Removed from the list, thanks.
Comment 7 Zdenek Kabelac 2008-08-27 02:57:33 UTC
Well, I have no good news: reboot=a doesn't solve my problem. The machine doesn't emergency reboot with this flag. I forgot to mention in the initial post that I had already tried, I think, all of those reboot parameters.

So for a T61 with the kvm modules loaded, ACPI reboot will not fix the problem.

Currently tested with kernel: 1941246dd98089dd637f44d3bd4f6cc1c61aa9e4
Comment 8 Avi Kivity 2008-08-27 03:17:54 UTC
Created attachment 17477
disable vmx on reboot

Please test the attached patch.  Watch out for not all processors coming back online after the reboot.
Comment 9 Rafael J. Wysocki 2008-08-27 03:43:23 UTC
Handled-By : Avi Kivity <avi@qumranet.com>
Comment 10 Zdenek Kabelac 2008-08-27 07:37:45 UTC
There is some progress. I usually first check whether the emergency reboot works in runlevel 1 with SysRQ+SUB.

With your attached patch the kernel finally reboots, with ACPI or with reboot=kbd.

The trouble is that if I start my usual runlevel 5, the emergency reboot again turns into a plain deadlock. I can see 'reseting' written on the console with a blinking cursor, both with ACPI and KBD.

Hopefully I haven't made any mistakes during the tests, as I tried to double check them, but I have no explanation for this behavior.

Any idea what I should check as a potential source of trouble here?
Maybe the vmx switch has to be applied to both CPUs?
Comment 11 Avi Kivity 2008-08-27 07:50:11 UTC
Probably the difference was which cpu executed the reset.

Try (from runlevel 3):

  taskset 1 emergency-reboot

  taskset 2 emergency-reboot

(where emergency-reboot is a script that does 'echo b > /proc/sysrq-trigger')
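
For readers unfamiliar with the syntax: taskset reads a bare numeric argument as a hexadecimal CPU affinity mask, so 'taskset 1' pins the command to CPU 0 and 'taskset 2' pins it to CPU 1. A minimal Python sketch of that decoding follows; the helper name cpus_from_mask is made up for illustration and is not part of any tool.

```python
from typing import List


def cpus_from_mask(mask: str) -> List[int]:
    """Return the CPU numbers selected by a taskset-style hexadecimal mask."""
    value = int(mask, 16)  # accepts "3" as well as "0x3"
    # Bit N set in the mask means CPU N is allowed.
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


if __name__ == "__main__":
    for mask in ("1", "2", "3"):
        print("taskset", mask, "-> CPUs", cpus_from_mask(mask))
```

Running the two commands from separate masks therefore triggers the reset from each CPU of a dual-core machine in turn.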
Comment 12 Zdenek Kabelac 2008-08-27 12:04:24 UTC
Great, this finally made it work; now it really reboots.

So hopefully it will now be possible to make a workable patch for this.
Comment 13 Avi Kivity 2008-08-27 13:06:36 UTC
Which one really worked? taskset 1 or taskset 2?
Comment 14 Zdenek Kabelac 2008-08-27 14:43:18 UTC
Hmm - I thought that I should run them both at the same time,
so I actually put them both into a script file.
Comment 15 Zdenek Kabelac 2008-08-27 15:07:43 UTC
I've made an extra test, each case double checked.

When I run only taskset 1 or only taskset 2, the reboot does not happen.
There is a minor difference: with taskset 2 the machine still looks somewhat 'alive' for a while, i.e. I can switch consoles for some time.

Only when I execute a shell script with both taskset commands does the reboot succeed.
Comment 16 Zdenek Kabelac 2008-12-23 16:29:36 UTC
Just a minor respin for this bug - anything new?
Comment 17 Avi Kivity 2008-12-23 23:55:40 UTC
A fix is queued for 2.6.29.
