Kernel Bug Tracker – Bug 9787
Deadlock on _any_ ACPI event with nmi_watchdog=1
Last modified: 2008-02-15 11:20:19 UTC
Latest working kernel version: 2.6.23 for sure (but I had some 2.6.24 kernel working)
Earliest failing kernel version: current 2.6.24, up to a7da60f41551abb3c520b03d42ec05dd7decfc7f mainline commit
Hardware Environment: x86_64
It's impossible to use any recent kernel with ACPI compiled in: _any_ ACPI event hangs my laptop without any [net]console output or the like.
"Any ACPI event" means: closing the lid, removing/plugging in the AC adapter, pressing a button.
Needless to say, a laptop is hardly usable without ACPI:
no battery info, no automatic handling of AC adapter events, no hibernate on button press.
Steps to reproduce:
It can be reproduced easily, almost 100% of the time. I just boot a kernel with ACPI and remove the AC adapter, and the laptop locks up.
I'm now trying 2.6.24-rc5 and earlier to find the release that introduced the problem.
But I can boot 2.6.23 anytime, and the problem does not occur there.
*** Bug 9788 has been marked as a duplicate of this bug. ***
*** Bug 9789 has been marked as a duplicate of this bug. ***
Just finished trying all the kernels down to 2.6.24-rc1: same thing.
So, marking the bug as a regression.
Created attachment 14512 [details]
On Mon, 21 Jan 2008 08:52:25 -0800 (PST) firstname.lastname@example.org wrote:
> Summary: Deadlock on _any_ ACPI event
> Product: ACPI
> Version: 2.5
> KernelVersion: 2.6.24-rc8
> Platform: All
> OS/Version: Linux
> Tree: Mainline
> Status: NEW
> Severity: high
> Priority: P1
> Component: ACPICA-Core
> AssignedTo: email@example.com
> ReportedBy: firstname.lastname@example.org
> Latest working kernel version: 2.6.23 for sure (but I had some 2.6.24 kernel
A box-killing post-2.6.23 regression.
> Earliest failing kernel version: current 2.6.24, up to
> a7da60f41551abb3c520b03d42ec05dd7decfc7f mainline commit
> Hardware Environment: x86_64
> Problem Description:
> It's impossible to use any recent kernel with ACPI compiled in: _any_ ACPI event
> hangs my laptop without any [net]console output or the like.
> "Any ACPI event" means: closing the lid, removing/plugging in the AC adapter, pressing a button.
> Guess, needless to say that it's impossible to use laptop without ACPI:
> no battery info, no automating AC adapter events, no hibernate on the button
> Steps to reproduce:
> It can be reproduced easily and almost 100%. I'm just booting ACPI kernel and
> removing AC adapter - laptop is locked.
> I'm trying 2.6.24-rc5 and lower now to try to find out the problem release.
> But I can boot 2.6.23 anytime - and there is no the problem there.
Yes, please do try to pinpoint which change broke it.
Created attachment 14513 [details]
dmesg of the problem 2.6.24-rc8 (HEAD is a7da60f41551abb3c520b03d42ec05dd7decfc7f)
I found that every 2.6.24-rcX has the problem.
Somewhere between 2.6.24-rc7 and -rc8 I merged the kgdb git branch and tried to connect to the locked system when the problem appeared, but it didn't let me in.
Even kgdb can't interrupt this.
Can you attach the dmesg output of a working kernel, say 2.6.23?
Just found a backed-up working 2.6.24-rc6 (!!)
Attaching its config and dmesg output.
Created attachment 14521 [details]
2.6.24-rc6 working config
Created attachment 14522 [details]
Working dmesg output
2.6.24-rc6 working dmesg output
The thing is, there is no ACPI-related difference between the configs
(except CONFIG_ACPI_SYSFS_POWER, which is a no-op for me: the problem doesn't depend on the state of that option).
So, the cause is something non-obvious.
I'm trying the different options (from the config diff) to find it.
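For anyone repeating this comparison, here is a minimal sketch of how two kernel .config files could be compared and narrowed to options whose name contains a given substring (for example "ACPI"). The function names `parse_config` and `diff_configs` are made up for illustration and are not part of any kernel tooling; the sketch assumes the usual .config layout of `CONFIG_X=value` lines and `# CONFIG_X is not set` comments.

```python
def parse_config(text):
    """Map option name -> value; '# CONFIG_X is not set' is recorded as 'n'."""
    suffix = " is not set"
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
        elif line.startswith("# CONFIG_") and line.endswith(suffix):
            # Drop the leading "# " and the trailing " is not set".
            options[line[2:-len(suffix)]] = "n"
    return options


def diff_configs(old_text, new_text, substring=""):
    """Return {name: (old_value, new_value)} for options that differ.

    Options missing from one file show up with None on that side.
    Only names containing `substring` are reported (default: all).
    """
    old, new = parse_config(old_text), parse_config(new_text)
    changed = {}
    for name in sorted(set(old) | set(new)):
        before, after = old.get(name), new.get(name)
        if before != after and substring in name:
            changed[name] = (before, after)
    return changed
```

Calling `diff_configs(open("config.rc6").read(), open("config.rc8").read(), "ACPI")` would list only the ACPI-named differences; the two file names here are placeholders, not the attachments above.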
>[ 0.000000] Linux version 2.6.24-rc8 (root@knote) (gcc version 4.2.2 (Gentoo 4.2.2 p1.0)) #31 SMP PREEMPT Mon Jan 21 21:23:44 EET 2008
>[ 0.000000] Command line: ro root=/dev/sda2 nmi_watchdog=1
Why do you add the "nmi_watchdog=1" boot option?
Is there any difference if you remove it?
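To confirm whether a given boot actually had the option set, a minimal sketch along these lines could help. It parses a kernel command line such as the one in the dmesg above; the helper names are hypothetical, reading /proc/cmdline assumes a Linux system, and the simple whitespace split does not handle quoted parameter values.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def nmi_watchdog_setting(cmdline):
    """Return the nmi_watchdog value as a string, or None if it is absent."""
    return parse_cmdline(cmdline).get("nmi_watchdog")


def read_running_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as f:
        return f.read().strip()
```

On the affected machine, `nmi_watchdog_setting(read_running_cmdline())` would show whether the option was passed for the current boot.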
I added it to find out whether it would reveal a deadlock with interrupts disabled or something similar.
But in fact it didn't change anything.
BTW, I have an update.
I've built rc8 and rc6 with the working config... and I got the same problem(!).
So, I guess the problem is in my own environment.
Rechecking/cleaning everything and trying again.
Congrats Zhang :)
You asked the right question in fact.
I added nmi_watchdog to track down some problems during the IPVS customization I was doing before I hit this bug. Now I can see why I hit it...
Without nmi_watchdog everything is OK :)
So the problem is in fact low priority.
But we are finally left with an interesting question: why does an ACPI event hang the system when nmi_watchdog is enabled?
In fact, the problem also appears in 2.6.23, so this bug doesn't block meta-bug #9243 and is probably not a regression (I'm not sure since which version it exists). Removing both attributes.
Per Linus, nmi_watchdog was disabled to prevent issues like this:
closing as a duplicate of bug 7839 -- use nmi_watchdog at your own risk.
Please re-open if you find that nmi_watchdog worked
on a previous kernel and then stopped working.
*** This bug has been marked as a duplicate of bug 7839 ***
Sorry for wasting all your time.