Bug 1206

Summary: NMI during poweroff - 2.4 only - S7505VB2
Product: ACPI Reporter: Len Brown (lenb)
Component: Power-Off Assignee: Len Brown (lenb)
Status: CLOSED CODE_FIX    
Severity: normal CC: acpi-bugzilla
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.4.22, 2.4.27 Subsystem:
Regression: --- Bisected commit-id:
Attachments: serial console messages
patch against 2.4.28

Description Len Brown 2003-09-09 22:48:21 UTC
Distribution: RHL9.0
Hardware Environment: S7505VB2, 2GB RAM, SATA, 5 NICs
Software Environment: 2.4.22 kernel with ACPI enabled
Problem Description:
# init 0
provokes an NMI
The system sometimes powers off, and sometimes resets.

...
Turning off swap:
Turning off quotas:
Unmounting file systems:
Halting system...
flushing ide devices: hdc hde
Power down.
Uhhuh. NMI received. Dazed and

Reproducible: yes
Comment 1 Len Brown 2003-09-09 22:50:10 UTC
Created attachment 860
serial console messages
Comment 2 Len Brown 2003-09-11 14:20:00 UTC
Reproduced on a 2nd system, this one running BIOS v1.01 and 1 physical
processor. Power-off worked, but the NMI is still there.

Turning off swap:
Turning off quotas:
Unmounting file systems:
Halting system...
flushing ide devices: hda hdc
Power down.
Uhhuh. NMI received. Dazed and confused, but trying to continue
e100: config WOL failed
You probably have a hardware problem with your RAM chips
Uhhuh. NMI received for unknown reason 35.
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
 hwsleep-0257 [16] acpi_enter_sleep_state: Entering sleep state [S5]
Comment 3 Len Brown 2003-09-11 17:44:44 UTC
No difference with the latest BIOS (1.06) and with the kernel
simplified to UP noapic, though UP re-orders the messages and noapic
causes the reason code to be 25 rather than 35. The power-down failure
is only seen on the 2 (physical) processor box at this point, but the
NMI is seen in all configs.

Power down.
Uhhuh. NMI received. Dazed and confused, but trying to continue
You probably have a hardware problem with your RAM chips
Uhhuh. NMI received for unknown reason 25.
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
e100: config WOL failed
 hwsleep-0257 [21] acpi_enter_sleep_state: Entering sleep state [S5]
Comment 4 Len Brown 2003-09-12 01:47:53 UTC
> e100: config WOL failed
is from e100_do_wol() -- called from e100_suspend().
In a box with four pro/100's, I get 4 of these.
I configure WOL off by default; perhaps the e100 driver doesn't expect that?

> Uhhuh. NMI received. Dazed and confused, but trying to continue
> You probably have a hardware problem with your RAM chips
is from mem_parity_error(), called from do_nmi() when reason & 0x80


> Uhhuh. NMI received for unknown reason 25.
> Dazed and confused, but trying to continue
> Do you have a strange power saving mode enabled?
is from unknown_nmi_error(), called from do_nmi() when !(reason & 0xc0).
Decimal 25 and 35 are 0x19 and 0x23, neither of which has the 0xc0 bits set
(nor do 0x25 and 0x35, if the printed values are already in hex).



Comment 5 Luming Yu 2003-09-17 23:50:37 UTC
Would you please give the workaround patch in bug 1141 a try? Thanks a lot!
Comment 6 Len Brown 2004-11-15 10:39:05 UTC
2.6 powers down normally, w/ no NMI message 
	I tested 2.6.5/FC2, 2.6.8.1 and 2.6.9
2.4 powers down normally, but still gets the NMI 
	I tested as recently as 2.4.28-rc2
 
BIOS is the latest -- 1.10 9/7/04 
 
Comment 7 Len Brown 2004-11-15 13:52:45 UTC
Created attachment 4032
patch against 2.4.28

The NMI is provoked by pci_pm_suspend_bus()
called by pci_pm_suspend()
called by acpi_system_save_state()
called by acpi_power_off().

Unclear why acpi_system_save_state()
was added to acpi_power_off() in 2.4.25
but evidently it was not a good idea.
This patch removes the offending call.
Comment 8 Len Brown 2004-11-17 01:09:24 UTC
Shipped in 2.4.28-rc4; closing.