Bug 9787 - Deadlock on _any_ ACPI event with nmi_watchdog=1
Summary: Deadlock on _any_ ACPI event with nmi_watchdog=1
Status: CLOSED DUPLICATE of bug 7839
Alias: None
Product: ACPI
Classification: Unclassified
Component: Config-Interrupts
Hardware: All Linux
Importance: P1 low
Assignee: Zhang Rui
URL:
Keywords:
Duplicates: 9788 9789
Depends on:
Blocks:
 
Reported: 2008-01-21 08:52 UTC by Nick
Modified: 2008-02-15 11:20 UTC

See Also:
Kernel Version: 2.6.24-rc8
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
Problem config (44.21 KB, text/plain)
2008-01-21 10:43 UTC, Nick
dmesg of the problem 2.6.24-rc8 (HEAD is a7da60f41551abb3c520b03d42ec05dd7decfc7f) (27.65 KB, text/plain)
2008-01-21 11:52 UTC, Nick
working config (43.41 KB, text/plain)
2008-01-21 18:38 UTC, Nick
Working dmesg output (27.95 KB, text/plain)
2008-01-21 18:40 UTC, Nick

Description Nick 2008-01-21 08:52:24 UTC
Latest working kernel version: 2.6.23 for sure (though I have had some 2.6.24 kernels working)
Earliest failing kernel version: current 2.6.24, up to mainline commit a7da60f41551abb3c520b03d42ec05dd7decfc7f

Hardware Environment: x86_64
Problem Description:
It's impossible to use any recent kernel with ACPI compiled in: _any_ ACPI event hangs my laptop without any console or netconsole output.
"Any ACPI event" means: closing the lid, unplugging or plugging in the AC adapter, pressing a button.

Needless to say, a laptop is unusable without ACPI:
no battery info, no automatic handling of AC adapter events, no hibernate on button press.

Steps to reproduce:
It can be reproduced easily, almost 100% of the time. I just boot a kernel with ACPI and unplug the AC adapter - the laptop locks up.

I'm now trying 2.6.24-rc5 and earlier to find the release that introduced the problem.
But I can boot 2.6.23 at any time, and the problem does not occur there.
Comment 1 Nick 2008-01-21 10:36:59 UTC
*** Bug 9788 has been marked as a duplicate of this bug. ***
Comment 2 Nick 2008-01-21 10:37:12 UTC
*** Bug 9789 has been marked as a duplicate of this bug. ***
Comment 3 Nick 2008-01-21 10:38:55 UTC
Just finished trying all the kernels down to 2.6.24-rc1 - the same thing.
So, marking the bug as a regression.
Comment 4 Nick 2008-01-21 10:43:20 UTC
Created attachment 14512 [details]
Problem config
Comment 5 Anonymous Emailer 2008-01-21 11:36:35 UTC
Reply-To: akpm@linux-foundation.org

On Mon, 21 Jan 2008 08:52:25 -0800 (PST) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=9787
> 
>            Summary: Deadlock on _any_ ACPI event
>            Product: ACPI
>            Version: 2.5
>      KernelVersion: 2.6.24-rc8
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: high
>           Priority: P1
>          Component: ACPICA-Core
>         AssignedTo: acpi_acpica-core@kernel-bugs.osdl.org
>         ReportedBy: gentuu@gmail.com
> 
> 
> Latest working kernel version: 2.6.23 for sure (though I have had some 2.6.24
> kernels working)

A box-killing post-2.6.23 regression.

> Earliest failing kernel version: current 2.6.24, up to mainline commit
> a7da60f41551abb3c520b03d42ec05dd7decfc7f
> 
> Hardware Environment: x86_64
> Problem Description:
> It's impossible to use any recent kernel with ACPI compiled in: _any_ ACPI
> event hangs my laptop without any console or netconsole output.
> "Any ACPI event" means: closing the lid, unplugging or plugging in the AC
> adapter, pressing a button.
> 
> Needless to say, a laptop is unusable without ACPI:
> no battery info, no automatic handling of AC adapter events, no hibernate on
> button press.
> 
> Steps to reproduce:
> It can be reproduced easily, almost 100% of the time. I just boot a kernel
> with ACPI and unplug the AC adapter - the laptop locks up.
> 
> I'm now trying 2.6.24-rc5 and earlier to find the release that introduced the
> problem. But I can boot 2.6.23 at any time, and the problem does not occur there.
> 
> 

Yes, please do try to pinpoint which change broke it.
Comment 6 Nick 2008-01-21 11:52:13 UTC
Created attachment 14513 [details]
dmesg of the problem 2.6.24-rc8 (HEAD is a7da60f41551abb3c520b03d42ec05dd7decfc7f)
Comment 7 Nick 2008-01-21 11:53:38 UTC
I found that any 2.6.24-rcX has the problem.
Comment 8 Nick 2008-01-21 11:58:00 UTC
Somewhere between 2.6.24-rc7 and -rc8 I merged the kgdb git branch and tried to connect to the locked-up system when the problem appeared - it didn't let me in.
Even kgdb can't interrupt this.
A total lockup.
Comment 9 Zhang Rui 2008-01-21 18:12:09 UTC
Can you attach the dmesg output of a working kernel, say 2.6.23?
Comment 10 Nick 2008-01-21 18:37:45 UTC
Surprise-surprise!

Just found a backed up working 2.6.24-rc6 (!!)
Attaching its config and dmesg output.
Comment 11 Nick 2008-01-21 18:38:42 UTC
Created attachment 14521 [details]
working config

2.6.24-rc6 working config
Comment 12 Nick 2008-01-21 18:40:34 UTC
Created attachment 14522 [details]
Working dmesg output

2.6.24-rc6 working dmesg output
Comment 13 Nick 2008-01-21 18:44:38 UTC
The thing is, there is no ACPI-related difference between the two configs
(except CONFIG_ACPI_SYSFS_POWER, which makes no difference for me: the problem occurs regardless of that option's state).

So the cause must be something non-obvious.

I'm trying the other options that differ between the configs to track it down.
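
For anyone repeating this kind of config comparison, here is a minimal Python sketch of listing the options that differ between two kernel configs. It is not part of the original report: the helper names `parse_config` and `diff_configs` are hypothetical, and it assumes the usual `.config` line forms `CONFIG_FOO=value` and `# CONFIG_FOO is not set`.

```python
import re

# Line forms assumed for a kernel .config file.
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map option name -> value; explicitly unset options map to "n"."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        match = _SET.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = "n"
    return options


def diff_configs(working_text, failing_text):
    """Return {option: (working_value, failing_value)} for differing options.

    An option that is absent from one file is reported with the value None.
    """
    working = parse_config(working_text)
    failing = parse_config(failing_text)
    return {
        name: (working.get(name), failing.get(name))
        for name in sorted(set(working) | set(failing))
        if working.get(name) != failing.get(name)
    }
```

Each differing option can then be flipped one at a time in the failing config to see which one changes the behaviour. In this bug the configs turned out not to be the cause, so a comparison like this only rules options out.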
Comment 14 Zhang Rui 2008-01-21 19:24:12 UTC
>[    0.000000] Linux version 2.6.24-rc8 (root@knote) (gcc version 4.2.2
>(Gentoo 4.2.2 p1.0)) #31 SMP PREEMPT Mon Jan 21 21:23:44 EET 2008
>[    0.000000] Command line: ro root=/dev/sda2 nmi_watchdog=1
Why did you add the "nmi_watchdog=1" boot option?
Is there any difference if you remove it?
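
A minimal Python sketch of checking a kernel command line for this option is shown here for illustration. It is an addition to the thread, not something from the original report; the helper names `boot_params` and `nmi_watchdog_setting` are hypothetical, and the parser assumes parameters are separated by whitespace with no quoted values.

```python
def boot_params(cmdline):
    """Split a kernel command line into {name: value}.

    Flags without "=" (such as "ro") map to None. Quoted values that
    contain spaces are not handled in this sketch.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def nmi_watchdog_setting(cmdline):
    """Return the nmi_watchdog value as a string, or None if it is absent."""
    return boot_params(cmdline).get("nmi_watchdog")
```

With the command line from the dmesg quoted above, `nmi_watchdog_setting("ro root=/dev/sda2 nmi_watchdog=1")` returns `"1"`. On a running Linux system the same string can be read from `/proc/cmdline`.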
Comment 15 Nick 2008-01-21 19:39:23 UTC
I added it to see whether it would reveal a deadlock with interrupts disabled, or something similar.
But in fact it didn't change anything.


BTW, I have an update.
I've built rc8 and rc6 with the working config...  and I got the same problem(!).
So, I guess the problem is in my own environment.
Rechecking/cleaning everything and trying again.
Comment 16 Nick 2008-01-21 20:22:21 UTC
Congrats Zhang :)

You asked the right question in fact.
I added nmi_watchdog to track down some problems during the IPVS customization I was doing before I hit the problem in this bug's subject. Now I can see why I hit it...

Without nmi_watchdog everything is OK :)
So the problem is in fact low priority.

But we are finally left with an interesting question: why does an ACPI event hang the system when nmi_watchdog is enabled?
Comment 17 Nick 2008-01-21 20:36:04 UTC
In fact, the problem also appears in 2.6.23, so this bug doesn't block meta-bug #9243 and is probably not a regression (I'm not sure since which version it has existed). Removing both attributes.
Comment 18 Len Brown 2008-01-21 21:05:40 UTC
Per Linus, nmi_watchdog was disabled to prevent issues like this:

http://lkml.org/lkml/2007/3/5/303

Closing as a duplicate of bug 7839 -- use nmi_watchdog at your own risk.

Please re-open if you find that nmi_watchdog worked
on a previous kernel and then stopped working.


*** This bug has been marked as a duplicate of bug 7839 ***
Comment 19 Nick 2008-01-22 03:45:38 UTC
Thanks Len!

Sorry for wasting all your time.
