Bug 4041 - a kexec patch causes poweroff to hang on some computers
Status: CLOSED CODE_FIX
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Off
Hardware: i386 Linux
Importance: P2 high
Assignee: Alexey Starikovskiy
URL:
Keywords:
Duplicates: 4160
Depends on:
Blocks:
 
Reported: 2005-01-14 09:10 UTC by Barry K. Nathan
Modified: 2005-07-27 17:49 UTC

See Also:
Kernel Version: 2.6.11-rc1-mm1
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
"powerforce" (tiny poweroff program) (313 bytes, text/plain)
2005-01-14 09:13 UTC, Barry K. Nathan
reasonably minimal .config for test purposes (11.80 KB, text/plain)
2005-01-14 09:33 UTC, Barry K. Nathan
serial console capture of a working shutdown (6.11 KB, text/plain)
2005-01-14 09:36 UTC, Barry K. Nathan
output from broken shutdown (5.86 KB, text/plain)
2005-01-14 09:44 UTC, Barry K. Nathan
output of "dmidecode" (13.90 KB, text/plain)
2005-01-14 09:48 UTC, Barry K. Nathan
ACPI power-off cleanup patch (4.36 KB, patch)
2005-03-01 04:06 UTC, Alexey Starikovskiy
ACPI power-off cleanup patch (7.31 KB, patch)
2005-03-03 08:36 UTC, Alexey Starikovskiy
minimal, possibly ugly fix for attachment 4649 for !CONFIG_ACPI_SLEEP (892 bytes, patch)
2005-03-04 08:01 UTC, Barry K. Nathan
modify magic sysrq poweroff infrastructure (881 bytes, patch)
2005-03-07 10:19 UTC, Barry K. Nathan
allow magic sysrq to use a different poweroff handler (1.80 KB, patch)
2005-03-08 06:38 UTC, Barry K. Nathan
change ACPI to handle Alt-SysRQ-O differently from regular poweroff (1.08 KB, patch)
2005-03-08 06:48 UTC, Barry K. Nathan
Combined patch (6.15 KB, patch)
2005-03-09 05:47 UTC, Alexey Starikovskiy
Fixed combined patch (8.60 KB, patch)
2005-03-15 04:02 UTC, Alexey Starikovskiy

Description Barry K. Nathan 2005-01-14 09:10:18 UTC
Distribution: Fedora Core 2 or 3, or a highly stripped down "distribution"
(which I will attach to this bug)
Hardware Environment: Intel D815EGEW motherboard; IOGear 2-head KVM switch (need
to retest without the KVM, but I'll do that after I file this bug)
Software Environment: Minimal test environment consists of an ext2 partition
with two directories, /lost+found and /sbin, and one file, /sbin/init
Problem Description:

Some computers hang instead of properly shutting down when the following patch
is applied to the kernel tree:
kexec-i8259-shutdowni386.patch

(That is, the problem disappears from 2.6.11-rc1-mm1 once this patch is
reverted, and it appears on 2.6.11-rc1 plus just this patch.)

This problem was pretty widely seen with 2.6.9-based Fedora kernels, which
included kexec patches from 2.6.9-mm:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=132761


Steps to reproduce:

1. Install kernel
2. Boot without disabling ACPI.
3. Run "shutdown -h now".

In more detail, for a minimal test environment:
1. Compile a minimal test kernel, preferably one that does not use modules and
that will fit onto a single floppy disk. I will be attaching one of the .configs
that I used.
2. Run "rdev arch/i386/boot/bzImage /dev/hda2", where "/dev/hda2" is the ext2
partition on which you will install the minimal /sbin/init.
3. Prepare a floppy disk with the SYSLINUX boot loader (I used version 2.08).
4. Create a syslinux.cfg file on the floppy with the following line:
default bzimage
5. Add appropriate arguments for a serial console if needed.
6. Copy arch/i386/boot/bzImage onto the floppy.
7. gcc -Os -static -o powerforce powerforce.c (I will attach the source)
8. strip powerforce
9. Copy powerforce onto the ext2 partition as /sbin/init.
10. Reboot from the floppy.
11. See whether the computer shuts down properly or not.
12. To continue testing, reconfigure/repatch/change your kernel and/or
syslinux.cfg as needed, and repeat steps 1, 2, 6, 10-12 (or so) as needed.

Give me some time to attach everything (i.e. if you see this bug and not all the
attachments are here, come back in half an hour or something).
Comment 1 Barry K. Nathan 2005-01-14 09:13:21 UTC
Created attachment 4398
"powerforce" (tiny poweroff program)
Comment 2 Barry K. Nathan 2005-01-14 09:33:06 UTC
Created attachment 4399
reasonably minimal .config for test purposes
Comment 3 Barry K. Nathan 2005-01-14 09:36:15 UTC
Created attachment 4400
serial console capture of a working shutdown

Note that the two "atkbd.c" lines near the bottom do not appear if I boot
without plugging a keyboard in. However, leaving the keyboard unplugged (or
plugging a keyboard in directly instead of via a KVM) makes no difference as to
whether the computer successfully shuts down.
Comment 4 Barry K. Nathan 2005-01-14 09:44:15 UTC
Created attachment 4401
output from broken shutdown

This is the output from a broken shutdown (i.e. it hangs).
Comment 5 Barry K. Nathan 2005-01-14 09:48:25 UTC
Created attachment 4402
output of "dmidecode"

Just in case this helps figure anything out...
Comment 6 Barry K. Nathan 2005-01-14 09:50:26 UTC
Ok, I think that's it as far as the attachments go. If there's any more info you
need from me, just ask.
Comment 7 Barry K. Nathan 2005-01-14 09:53:28 UTC
FWIW I just got an e-mail from davej saying that the kexec patches weren't at
fault for all of the Fedora 2.6.9 ACPI shutdown bug reports. Just mentioning
this here for the sake of completeness.
Comment 8 Len Brown 2005-01-24 20:18:49 UTC
As this happens in ACPI mode, but not with acpi=off,
I'm moving it to the ACPI category.
Comment 9 Barry K. Nathan 2005-01-24 21:34:15 UTC
That reminds me, I need to see if it still happens after the recent kexec
overhaul (i.e. I need to test again with a newer -mm kernel). I don't know if
I'll be able to get to that tonight or if I'll have to put it off a few days.
Comment 10 Len Brown 2005-01-24 22:57:56 UTC
hmmm, applied 
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11-rc1/2.6.11-rc1-mm1/broken-out/kexec-i8259-shutdowni386.patch 
but haven't found a failing machine yet...
 
Re: why the failure? 
My first guess is that we're confusing the BIOS -- or some BIOSes anyway...
Comment 11 Barry K. Nathan 2005-01-25 01:14:42 UTC
Well, the triggering patch's name has changed to x86-i8259-shutdown.patch, and
the changelog entry now references this bug, but the effect is still the same
for me: ACPI poweroff works without that patch and doesn't work with that patch.


Regarding the motherboard/BIOS version, quoting from the dmidecode output I
attached earlier:

	BIOS Information
		Vendor: Intel Corp.
		Version: EW81510A.86A.0046.P05.0203250259
		Release Date: 03/25/2002
[...]
	Base Board Information
		Manufacturer: Intel Corporation              
		Product Name: D815EGEW                       
		Version: AAA69600-201                   
		Serial Number: AZEW21914723

IIRC, "D815EGEW" is what the motherboard's packaging said on it.

To download the BIOS, visit the following web site and choose "OS Independent"
as the operating system:
http://downloadfinder.intel.com/scripts-df/Product_Filter.asp?ProductID=783

Does this information help?
Comment 12 Barry K. Nathan 2005-01-25 14:51:49 UTC
Eric Biederman thinks he's found the cause of the problem:
http://marc.theaimsgroup.com/?l=linux-kernel&m=110665405402747&w=2

He posted a patch for testing:
http://marc.theaimsgroup.com/?l=linux-kernel&m=110665542929525&w=2

That patch is filled with typos ("apci" and "offf"), but once those are fixed,
my computer shuts down properly again.

Eric's latest message hasn't hit MARC yet, so I'll quote it here instead:

> Thanks.  Now I just need to come up with the good version unless one of         
> the acpi guys wants to volunteer.
Comment 13 Barry K. Nathan 2005-01-26 22:18:21 UTC
BTW, someone else (on LKML) tested Eric's patch and found that it broke
Alt-SysRq-O, but not regular shutdown...
Comment 14 Barry K. Nathan 2005-02-09 20:31:27 UTC
http://www.ussg.iu.edu/hypermail/linux/kernel/0501.3/0869.html

This LKML post is relevant to comment #13...
Comment 15 Pavel Machek 2005-03-01 02:53:23 UTC
Unfortunately, that patch breaks powerdown at the end of swsusp, and is very ugly
anyway (see mail "Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown"
on lkml).
Comment 16 Alexey Starikovskiy 2005-03-01 04:06:02 UTC
Created attachment 4620
ACPI power-off cleanup patch

Could you please try this patch?
Comment 17 Barry K. Nathan 2005-03-01 08:05:04 UTC
I'll test it as soon as I can. Hopefully that will be later today, but I can't
promise that.
Comment 18 Barry K. Nathan 2005-03-02 08:39:08 UTC
  LD      vmlinux
drivers/built-in.o(.text+0x2a309): In function `acpi_power_off':
: undefined reference to `pm_ops'
drivers/built-in.o(.text+0x47843): In function `sysdev_shutdown':
: undefined reference to `pm_ops'
make: *** [vmlinux] Error 1


This is with (2.6.11-rc5-mm1 - acpi_power_off-bug-fix.patch + attachment 4620),
but I get the same failure with (2.6.11 + x86-i8259-shutdown.patch + attachment
4620).
Comment 19 Alexey Starikovskiy 2005-03-03 06:25:36 UTC
Could you try to enable CONFIG_PM in .config?
Comment 20 Alexey Starikovskiy 2005-03-03 08:36:49 UTC
Created attachment 4649
ACPI power-off cleanup patch

Pavel,
I've tried to make sysdev "acpi" as you suggested. Please take a look.
Comment 21 Barry K. Nathan 2005-03-03 08:56:20 UTC
Re: comment #19

I'll try that when I get a chance. That might be a couple of days though.
Comment 22 Barry K. Nathan 2005-03-04 06:42:30 UTC
Attachment 4620 doesn't work, even with CONFIG_PM enabled (it compiles, but it
doesn't really help matters; I get a kernel panic instead of a total freezeup).
I was going to attach output from booting with that patch, but since it's been
marked obsolete (I just noticed that), I guess there's no point to doing that.

I'll try the new patch now.
Comment 23 Barry K. Nathan 2005-03-04 07:37:46 UTC
Attachment 4649 fails to compile if CONFIG_ACPI_SLEEP is disabled.
(CONFIG_ACPI_SLEEP is automatically disabled if CONFIG_EXPERIMENTAL is disabled.)

Here's how it blows up:

  CC      drivers/acpi/sleep/poweroff.o
drivers/acpi/sleep/poweroff.c: In function `acpi_sleep_prepare':
drivers/acpi/sleep/poweroff.c:22: error: `acpi_wakeup_address' undeclared (first
use in this function)
drivers/acpi/sleep/poweroff.c:22: error: (Each undeclared identifier is reported
only once
drivers/acpi/sleep/poweroff.c:22: error: for each function it appears in.)
drivers/acpi/sleep/poweroff.c: In function `acpi_poweroff_init':
drivers/acpi/sleep/poweroff.c:80: warning: ISO C90 forbids mixed declarations
and code
make[3]: *** [drivers/acpi/sleep/poweroff.o] Error 1
make[2]: *** [drivers/acpi/sleep] Error 2
make[1]: *** [drivers/acpi] Error 2
make: *** [drivers] Error 2

Perhaps I'll try fixing this error later, but first I'll enable
CONFIG_EXPERIMENTAL and CONFIG_ACPI_SLEEP and see if that compiles and works.
Comment 24 Barry K. Nathan 2005-03-04 08:01:27 UTC
Created attachment 4662
minimal, possibly ugly fix for attachment 4649 for !CONFIG_ACPI_SLEEP

With this patch on top of attachment 4649, CONFIG_ACPI_SLEEP can be disabled
and it will still compile, and it will still shut the computer down.

Note that I wrote this patch without taking the time to understand the code, so
it may not be technically correct.
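
For readers without the attachment at hand, the general shape of such a fix can
be sketched in plain userspace C. This is only an illustration of guarding a
reference to a symbol that exists solely under a config option, not the
contents of attachment 4662. CONFIG_ACPI_SLEEP and acpi_wakeup_address are the
names from the error log in comment 23; sleep_prepare_sketch() and the value
stored in the variable are hypothetical.

```c
/* Userspace sketch: compile with or without -DCONFIG_ACPI_SLEEP. */
#ifdef CONFIG_ACPI_SLEEP
/* Stand-in for the kernel symbol; it only exists when the option is on. */
unsigned long acpi_wakeup_address = 0x1000;	/* hypothetical value */
#endif

/*
 * Hypothetical stand-in for acpi_sleep_prepare().
 * Returns 1 if a wakeup address was consulted, -1 if none was available,
 * and 0 if the sleep-only code was compiled out.
 */
int sleep_prepare_sketch(void)
{
#ifdef CONFIG_ACPI_SLEEP
	if (!acpi_wakeup_address)
		return -1;
	return 1;
#else
	/* Assumption: plain power-off never resumes, so no wakeup vector. */
	return 0;
#endif
}
```

Built without -DCONFIG_ACPI_SLEEP, the function never mentions the missing
symbol, which is the property the compile error above was complaining about.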
Comment 25 Barry K. Nathan 2005-03-04 09:24:28 UTC
It looks like 4649+4662 only works if CONFIG_PM is enabled. If CONFIG_PM is
disabled, then I get the same freeze as before.

Does CONFIG_ACPI without CONFIG_PM even make sense? Maybe ACPI should depend on
PM. (I'm going to look back in changesets to see if I can find any history on this.)
Comment 26 Alexey Starikovskiy 2005-03-05 01:05:02 UTC
ACPI works not only for power management, but also for reporting machine
configuration to OS, and for the second purpose it does not need CONFIG_PM.
Maybe we should not register the ability to power off the machine when
CONFIG_PM is not enabled.
Comment 27 Barry K. Nathan 2005-03-05 01:54:29 UTC
> ACPI works not only for power management, but also for reporting machine
> configuration to OS, and for the second purpose it does not need CONFIG_PM.

Ok, I was aware of the second purpose but wasn't sure whether it also needed to
be supported without CONFIG_PM.

> Maybe we should not register the ability to power off the machine when
> CONFIG_PM is not enabled.

Yeah, that's what I'm thinking.

BTW, the help text for CONFIG_PM makes it sound like you can't use ACPI without
CONFIG_PM. I guess that needs to be fixed.
Comment 28 Barry K. Nathan 2005-03-06 03:18:52 UTC
Ok, now that I have figured out what was causing my sporadic swsusp failures
(bug 4298) and how to work around it, I can test patches to see if they break
swsusp. (Might be another day or so before I get around to this, however.)
Comment 29 Barry K. Nathan 2005-03-06 07:04:27 UTC
swsusp works, but Alt-SysRq-O seems broken. The old quick-and-dirty patch broke
both swsusp (according to LKML posts; I didn't try it myself) and Alt-SysRq-O,
so this is progress.
Comment 30 Alexey Starikovskiy 2005-03-07 02:15:03 UTC
The magic key sequence does not shut down devices; it only calls the power-off
function. We should either call prepare-to-shutdown before power-off or
shut down the devices. I do not know which to prefer, as I guess there was a
reason not to call device-shutdown in the magic-key handler.
Comment 31 Barry K. Nathan 2005-03-07 03:08:33 UTC
IIRC, Eric Biederman was wondering the same thing about magic SysRQ.

AFAIK the SysRQ shutdown is really intended for emergency situations, so it
should be robust and work whenever other stuff is messed up, rather than being a
fully clean shutdown.
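
To make the difference between the two paths concrete, here is a toy userspace
model. It assumes, as comment 30 describes, that the regular path runs a
device-shutdown/prepare step before the power-off function, while the magic-key
path calls the power-off function directly. All names in it (prepared,
shutdown_devices(), power_off(), regular_poweroff(), sysrq_poweroff()) are made
up for illustration and are not the real kernel functions.

```c
#include <stdbool.h>

static bool prepared;	/* set once the prepare-to-shutdown step has run */

/* Hypothetical stand-in for device shutdown, which also prepares ACPI. */
void shutdown_devices(void)
{
	prepared = true;
}

/* Hypothetical power-off hook: succeeds only if the prepare step ran. */
bool power_off(void)
{
	return prepared;	/* false models the failed power-off */
}

/* Regular "shutdown -h now" path: devices first, then power-off. */
bool regular_poweroff(void)
{
	shutdown_devices();
	return power_off();
}

/* Magic-key path: jumps straight to the power-off hook. */
bool sysrq_poweroff(void)
{
	return power_off();
}
```

In this model sysrq_poweroff() fails on a freshly booted system because nothing
has set prepared, which is the trade-off under discussion: either the emergency
path does its own minimal preparation, or it pulls in the full device shutdown
it was designed to avoid.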
Comment 32 Barry K. Nathan 2005-03-07 09:59:59 UTC
I think I may have figured out a way to fix Alt-SysRQ-O. I'll see if I can have
a patch for testing later today.
Comment 33 Barry K. Nathan 2005-03-07 10:19:31 UTC
Created attachment 4686
modify magic sysrq poweroff infrastructure

I haven't tested this patch yet, but what it should do (if it works) is let
platform drivers define a pm_power_off_magic_sysrq() function as well as a
pm_power_off() function. That way, poweroff can be done differently for sysrq.
If pm_power_off_magic_sysrq() does not get defined/set, then plain
pm_power_off() should be used.

This patch alone should not change any behavior, but it should be possible to
use this as the basis for a patch that fixes Alt-SysRQ-O for ACPI, I hope...
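
The intended fallback can be modelled in a few lines of userspace C. This is
not the attached patch, just a sketch of the behavior described above. The two
hook names come from this comment; sysrq_do_poweroff(), regular_hook(),
sysrq_hook() and last_called are hypothetical and exist only for the demo.

```c
#include <stddef.h>

/* Userspace stand-ins for the two hooks described above. */
void (*pm_power_off)(void);
void (*pm_power_off_magic_sysrq)(void);

static int last_called;	/* 1 = regular hook, 2 = sysrq hook (demo only) */

void regular_hook(void) { last_called = 1; }
void sysrq_hook(void)   { last_called = 2; }

/*
 * Hypothetical helper for the sysrq path: prefer the sysrq-specific hook,
 * fall back to the plain one, and do nothing if neither is set.
 * Returns 0 if no hook was available, otherwise the id of the hook it ran.
 */
int sysrq_do_poweroff(void)
{
	void (*fn)(void) = pm_power_off_magic_sysrq ? pm_power_off_magic_sysrq
						    : pm_power_off;

	if (fn == NULL)
		return 0;
	fn();
	return last_called;
}
```

A platform that never sets pm_power_off_magic_sysrq keeps its existing
behavior, which matches the claim that the infrastructure change alone should
not alter anything.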
Comment 34 Barry K. Nathan 2005-03-07 11:11:21 UTC
Comment on attachment 4686
modify magic sysrq poweroff infrastructure

*sigh* This patch is broken (it may be necessary but it's not sufficient). I
plan to make a fixed version, but that may not happen today.
Comment 35 Barry K. Nathan 2005-03-07 11:28:02 UTC
I have what I think is a fixed version (it compiles and it doesn't break
Alt-SysRQ-O anymore as far as I can tell) but I won't have it ready for public
consumption until later today.
Comment 36 Barry K. Nathan 2005-03-08 06:38:47 UTC
Created attachment 4692
allow magic sysrq to use a different poweroff handler
Comment 37 Barry K. Nathan 2005-03-08 06:48:27 UTC
Created attachment 4693
change ACPI to handle Alt-SysRQ-O differently from regular poweroff

This depends on attachment 4692.

BTW, I forgot to mention this when attaching 4692, but that patch only includes
the needed changes for the core kernel and for i386 (it may not even cover all
i386 subarches). I can create patches for other architectures (and for other
i386 subarches if needed) after the overall design gets reviewed and approved.

Also, FWIW, an alternate approach to attachment 4692 would be to add a
parameter to pm_power_off, rather than adding the pm_power_off_magic_sysrq
function, for instance:
void (*pm_power_off)(int called_from_magic_sysrq)
but that would involve touching all callers of pm_power_off, so it would be
more invasive for no gain IMO.


I tested 4692 and this patch, and it seems to be completely working -- but the
kernel I tested turned out to have other problems; it seems it was miscompiled
(I think it's something I messed up when setting up distcc). I *think* all the
miscompiles affected modules and not the core kernel, but to be safe I'm
recompiling it now and will retest right after it finishes.
Comment 38 Barry K. Nathan 2005-03-08 08:21:24 UTC
Actually, it looks like it's not miscompiling, so much as I'm getting massive
XFS filesystem corruption on my test machine (on a kernel without *any* of these
patches). My compile cluster is actually working properly.

The upshot is that the patches probably work, but the test results are suspect.
I need to get to the bottom of the FS corruption issues first before I can do
fully proper testing. :( Hopefully this won't take *too* long.
Comment 39 Barry K. Nathan 2005-03-08 09:49:51 UTC
After running xfs_repair, I can't reproduce the XFS filesystem problems anymore.
Oh well.

Anyway, I can now report that with the fully patched kernel, poweroff via both
"shutdown -h now" and magic sysrq is working. Yay! (This is with CONFIG_PM and
CONFIG_ACPI_SLEEP enabled.)

Fully patched means applying all of the following in this order:
x86-i8259-shutdown.patch [this used to cause shutdown freezes]
attachment 4649
attachment 4662
attachment 4692
attachment 4693

I just realized that I now need to test with CONFIG_ACPI_SLEEP disabled, so I'm
recompiling with that configuration now. If the tests with that work too, then I
think it will be time to start getting this stuff reviewed.


By the way, regarding attachment 4662: What purpose does the
"ACPI_FLUSH_CPU_CACHE();" serve in
drivers/acpi/sleep/poweroff.c:acpi_sleep_prepare()? I'm wondering if that really
needs to be two #ifdef's or if it could be combined into one.
Comment 40 Barry K. Nathan 2005-03-08 10:40:15 UTC
Ok, I've tested with CONFIG_ACPI_SLEEP disabled. That too works, both for sysrq
and shutdown -h.

I guess the next step is to e-mail relevant people/lists and let them know about
the progress on this bug, but I'm having a hard time thinking about the details
of that right now, so I'll revisit this when I get a chance (maybe this evening,
or maybe in a couple of days if I happen to be really busy).
Comment 41 Alexey Starikovskiy 2005-03-09 05:47:20 UTC
Created attachment 4696
Combined patch

Combined previous 4 patches and changed patches #3&4 to be less intrusive.
Comment 42 Barry K. Nathan 2005-03-12 04:41:35 UTC
I'll test the patch later this weekend. Right now I can't test it because I'm
too busy with stuff unrelated to Linux; that is also why I waited this long
before responding at all.
Comment 43 Barry K. Nathan 2005-03-13 08:21:21 UTC
I tested the version of the patch in 2.6.11-mm3 (although I patched it into a
Fedora kernel, much like my previous testing). It works. Cool!
Comment 44 Alexey Starikovskiy 2005-03-14 01:42:58 UTC
*** Bug 3862 has been marked as a duplicate of this bug. ***
Comment 45 Alexey Starikovskiy 2005-03-14 01:43:26 UTC
*** Bug 4160 has been marked as a duplicate of this bug. ***
Comment 46 Alexey Starikovskiy 2005-03-14 01:43:51 UTC
*** Bug 4244 has been marked as a duplicate of this bug. ***
Comment 47 Julien HENRY 2005-03-15 01:09:35 UTC
I tried the patch last night in order to solve my problem (Bug 4244), but first
I got a compilation error caused by a mistake in a variable name:
in poweroff.c, function acpi_sleep_prepare, the variable sleep_prepared is
declared, but shutdown_prepared is used afterwards.

After correcting that, the compilation works, but it doesn't solve my problem.
Comment 48 Barry K. Nathan 2005-03-15 02:53:34 UTC
(Just some general notes for anyone reading this bug)

I tested by applying both of the following patches in this order, FWIW
(IOW, this is what fixes it for me):

http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11/2.6.11-mm3/broken-out/acpi-poweroff-fix.patch
http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11/2.6.11-mm3/broken-out/acpi-poweroff-fix-fix.patch

Also, I'm changing the bug back to RESOLVED/CODE_FIX. That does not
necessarily imply that the code has been tested to work (that's VERIFIED or
CLOSED, not RESOLVED), but it means that a new patch was written to fix this
bug (which is in fact true).

BTW, the exact ACPI shutdown bug that I reported can be worked around by
reverting this patch (which is in -mm but not in mainline):
http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11/2.6.11-mm3/broken-out/x86-i8259-shutdown.patch

If you're having ACPI shutdown problems without the x86-i8259-shutdown
patch in your kernel, then that means your ACPI shutdown problem is not
exactly the same one that I reported -- but the patches to fix my problem
could fix other problems too, so you should still test them.
Comment 49 Alexey Starikovskiy 2005-03-15 04:02:47 UTC
Created attachment 4722
Fixed combined patch

Patch with indentation fixes and fixes suggested by Andrew Morton.
Comment 50 Len Brown 2005-03-18 13:21:41 UTC
applied to acpi-test tree 
Comment 51 Len Brown 2005-07-27 17:49:41 UTC
shipped in 2.6.13-rc3 -- closing.
