Bug 6647 - second suspend fails if nmi watchdog is enabled: Thinkpad X60 Core Duo
Summary: second suspend fails if nmi watchdog is enabled: Thinkpad X60 Core Duo
Status: REJECTED INSUFFICIENT_DATA
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Sleep-Wake
Hardware: i386 Linux
Importance: P2 normal
Assignee: platform_i386
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-06-05 00:16 UTC by Jeremy Fitzhardinge
Modified: 2007-04-28 12:49 UTC
CC List: 6 users

See Also:
Kernel Version: 2.6.17-rc5-mm3
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
.config (41.44 KB, text/plain)
2006-06-05 00:17 UTC, Jeremy Fitzhardinge
dmesg output of boot (39.70 KB, text/plain)
2006-06-05 00:17 UTC, Jeremy Fitzhardinge
Picture of oops message (256.58 KB, image/jpeg)
2006-06-05 01:15 UTC, Jeremy Fitzhardinge
dmesg output before suspend (32.61 KB, text/plain)
2006-06-05 14:12 UTC, Jeremy Fitzhardinge
grep NMI /proc/interrupts before suspend (28 bytes, text/plain)
2006-06-05 14:12 UTC, Jeremy Fitzhardinge
dmesg output after resume (46.67 KB, text/plain)
2006-06-05 14:13 UTC, Jeremy Fitzhardinge
grep NMI /proc/interrupts after resume (28 bytes, text/plain)
2006-06-05 14:13 UTC, Jeremy Fitzhardinge

Description Jeremy Fitzhardinge 2006-06-05 00:16:16 UTC
Most recent kernel where this bug did not occur: 2.6.17-rc5
Distribution: FC5
Hardware Environment: Thinkpad X60: 1.83 GHz Intel Core Duo
Software Environment:
Problem Description:
When suspending to RAM via ACPI with the NMI watchdog enabled, the machine crashes
on the second suspend with a BUG in arch/i386/kernel/nmi.c:174.  The stack backtrace is:


    release_evntsel_nmi
    stop_apic_nmi_watchdog
    on_each_cpu
    disable_lapic_nmi_watchdog
    lapic_nmi_suspend
    sysdev_suspend
    device_power_down
    suspend_enter
    enter_state
    state_store
    subsys_attr_store
    sysfs_write_file
    vfs_write
    sys_write
    sysenter_past_esp 
Steps to reproduce:
1. Boot with nmi_watchdog at its default setting.
2. Suspend to RAM, then resume.
3. Suspend to RAM again: the machine crashes during suspend.

Booting with nmi_watchdog=0 on the kernel command line works around the problem.
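
Whether the workaround is active on a running system can be checked from
userspace. The following is a minimal sketch, not part of the original report:
it assumes the kernel command line is readable from /proc/cmdline, and the
helper name nmi_watchdog_setting is hypothetical.

```python
def nmi_watchdog_setting(cmdline):
    """Return the value of the last nmi_watchdog= parameter, or None if absent.

    `cmdline` is the kernel command line as one string, for example the
    contents of /proc/cmdline (assumed location, not stated in the report).
    """
    value = None
    for token in cmdline.split():
        if token.startswith("nmi_watchdog="):
            # Keep scanning so that the last occurrence is the one reported.
            value = token.split("=", 1)[1]
    return value
```

Passing it the text read from /proc/cmdline returns "0" when the workaround is
in effect and None when the parameter is absent, in which case the kernel's
default setting applies.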
Comment 1 Jeremy Fitzhardinge 2006-06-05 00:17:20 UTC
Created attachment 8251
.config
Comment 2 Jeremy Fitzhardinge 2006-06-05 00:17:57 UTC
Created attachment 8252
dmesg output of boot
Comment 3 Jeremy Fitzhardinge 2006-06-05 00:27:44 UTC
It appears that patch
x86_64-mm-add-performance-counter-reservation-framework-for-up-kernels.patch
(dzickus@redhat.com) introduces the BUG_ON in question.  This patch is
apparently intended to be UP only, though clearly its code gets into SMP kernels...
Comment 4 Jeremy Fitzhardinge 2006-06-05 00:36:15 UTC
Er, obviously there are a couple of follow-on patches to make this work for SMP
i386.
Comment 5 Andrew Morton 2006-06-05 00:41:17 UTC
I see no BUG_ONs in that dmesg output?
Comment 6 Jeremy Fitzhardinge 2006-06-05 00:57:42 UTC
No, that's just a normal bootup.  The machine locks hard after the crash, so I
can't get dmesg output (and netconsole doesn't work during suspend either).

I'll attach the screenshots.
Comment 7 Jeremy Fitzhardinge 2006-06-05 01:15:14 UTC
Created attachment 8253
Picture of oops message
Comment 8 Don Zickus 2006-06-05 09:57:51 UTC
Can you attach the dmesg output of a normal bootup with the nmi watchdog enabled?
The one you attached has it disabled (nmi_watchdog=0).

Also, could I see the output of 'cat /proc/interrupts |grep NMI' before you
suspend and after you resume (the first time)?

Thanks.
Comment 9 Jeremy Fitzhardinge 2006-06-05 14:12:08 UTC
Created attachment 8260
dmesg output before suspend
Comment 10 Jeremy Fitzhardinge 2006-06-05 14:12:54 UTC
Created attachment 8261
grep NMI /proc/interrupts before suspend
Comment 11 Jeremy Fitzhardinge 2006-06-05 14:13:23 UTC
Created attachment 8262
dmesg output after resume
Comment 12 Jeremy Fitzhardinge 2006-06-05 14:13:56 UTC
Created attachment 8263
grep NMI /proc/interrupts after resume
Comment 13 Jeremy Fitzhardinge 2006-06-05 14:15:58 UTC
The NMI count on CPU1 is increasing at about 10-20/sec (variable) after resume.
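
A per-CPU NMI rate like the one quoted above can be derived from two samples
of the NMI line in /proc/interrupts. The following is a minimal sketch, not
taken from the report: the function names are hypothetical, and any sample
values used with it are illustrative rather than the contents of the
attachments.

```python
def parse_nmi_counts(interrupts_text):
    """Return the per-CPU NMI counts from /proc/interrupts text.

    Returns an empty list if the text has no NMI line.
    """
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            counts = []
            for field in fields[1:]:
                if not field.isdigit():
                    break  # some kernels append a text description
                counts.append(int(field))
            return counts
    return []


def nmi_rates(before, after, seconds):
    """Per-CPU NMIs per second between two samples taken `seconds` apart."""
    return [(b - a) / seconds for a, b in zip(before, after)]
```

Reading /proc/interrupts twice, a known number of seconds apart, and passing
both parsed samples to nmi_rates gives one rate per CPU, which makes it easy
to see whether only one CPU's count is advancing after resume.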
Comment 14 Pavel Machek 2006-09-29 04:22:34 UTC
Is the problem still there with 2.6.18? We may need to touch_nmi_watchdog or
something...
Comment 15 Daniel Smolik 2006-10-03 11:43:26 UTC
But why is this patch not included in 2.6.18?
Comment 16 Daniel Smolik 2006-10-03 11:52:16 UTC
I have now compiled 2.6.18. I can suspend once, but after that the suspend LED
blinks and I cannot suspend again.
Comment 17 Daniel Smolik 2006-10-03 12:02:08 UTC
Sorry, I wrote to the wrong bug :-(
Comment 18 Rafael J. Wysocki 2006-10-30 09:46:11 UTC
Could anyone please say what the current status of this bug is?
Comment 19 Adrian Bunk 2006-12-03 09:42:46 UTC
Please reopen this bug if it's still present in kernel 2.6.19.
