Distribution: Fedora Core 2 or 3, or a highly stripped down "distribution" (which I will attach to this bug)
Hardware Environment: Intel D815EGEW motherboard; IOGear 2-head KVM switch (need to retest without the KVM, but I'll do that after I file this bug)
Software Environment: Minimal test environment consists of an ext2 partition with two directories, /lost+found and /sbin, and one file, /sbin/init

Problem Description:
Some computers hang instead of properly shutting down when the following patch is applied to the kernel tree:
kexec-i8259-shutdowni386.patch
(That is, the problem disappears from 2.6.11-rc1-mm1 once this patch is reverted, and it appears on 2.6.11-rc1 plus just this patch.)

This problem was pretty widely seen with 2.6.9-based Fedora kernels, which included kexec patches from 2.6.9-mm:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=132761

Steps to reproduce:
1. Install kernel
2. Boot without disabling ACPI.
3. Run "shutdown -h now".

In more detail, for a minimal test environment:
1. Compile a minimal test kernel, preferably one that does not use modules and that will fit onto a single floppy disk. I will be attaching one of the .configs that I used.
2. Run "rdev arch/i386/boot/bzImage /dev/hda2", where "/dev/hda2" is the ext2 partition on which you will install the minimal /sbin/init.
3. Prepare a floppy disk with the SYSLINUX boot loader (I used version 2.08).
4. Create a syslinux.cfg file on the floppy with the following line:
   default bzimage
5. Add appropriate arguments for a serial console if needed.
6. Copy arch/i386/boot/bzImage onto the floppy.
7. gcc -Os -static -o powerforce powerforce.c (I will attach the source)
8. strip powerforce
9. Copy powerforce onto the ext2 partition as /sbin/init.
10. Reboot from the floppy.
11. See whether the computer shuts down properly or not.
12. To continue testing, reconfigure/repatch/change your kernel and/or syslinux.cfg as needed, and repeat steps 1, 2, 6, 10-12 (or so) as needed.
Give me some time to attach everything (i.e. if you see this bug and not all the attachments are here, come back in half an hour or something).
Created attachment 4398
"powerforce" (tiny poweroff program)
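For readers who cannot open the attachment, a program of this kind can be very small. The following is a minimal sketch and not the attached powerforce.c: it assumes the program only needs to flush buffers and call reboot(2) with RB_POWER_OFF. The function name power_off_now() and the POWERFORCE_AS_INIT guard are made up for illustration; the guard exists only so that the file does not become a machine-halting binary by accident.

```c
/* Hypothetical sketch of a tiny "power off right away" init.
 * Not the attached powerforce.c; names and structure are assumptions. */
#include <stdio.h>
#include <unistd.h>
#include <sys/reboot.h>

/* Flush dirty buffers, then ask the kernel to power the machine off.
 * Returns only if the request failed (for example, when not run as root). */
int power_off_now(void)
{
    sync();
    return reboot(RB_POWER_OFF);
}

#ifdef POWERFORCE_AS_INIT   /* safety guard (assumption): define only for the real /sbin/init */
int main(void)
{
    if (power_off_now() < 0)
        perror("reboot");
    /* PID 1 must never exit, so idle forever if the request failed. */
    for (;;)
        pause();
}
#endif
```

Under these assumptions, step 7 of the reproduction recipe would become "gcc -Os -static -DPOWERFORCE_AS_INIT -o powerforce powerforce.c". Because the program runs as PID 1 and goes straight to the power-off request, no userspace shutdown scripts are involved, which keeps the test focused on the kernel's power-off path.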
Created attachment 4399
reasonably minimal .config for test purposes
Created attachment 4400
serial console capture of a working shutdown

Note that the two "atkbd.c" lines near the bottom do not appear if I boot without plugging a keyboard in. However, leaving the keyboard unplugged (or plugging a keyboard in directly instead of via a KVM) makes no difference as to whether the computer successfully shuts down.
Created attachment 4401
output from broken shutdown

This is the output from a broken shutdown (i.e. it hangs).
Created attachment 4402
output of "dmidecode"

Just in case this helps figure anything out...
Ok, I think that's it as far as the attachments go. If there's any more info you need from me, just ask.
FWIW I just got an e-mail from davej saying that the kexec patches weren't at fault for all of the Fedora 2.6.9 ACPI shutdown bug reports. Just mentioning this here for the sake of completeness.
As this happens in ACPI mode but not with acpi=off, I'm moving it to the ACPI category.
That reminds me, I need to see if it still happens after the recent kexec overhaul (i.e. I need to test again with a newer -mm kernel). I don't know if I'll be able to get to that tonight or if I'll have to put it off a few days.
Hmmm, applied
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11-rc1/2.6.11-rc1-mm1/broken-out/kexec-i8259-shutdowni386.patch
but haven't found a failing machine yet...

Re: why the failure? My first guess is that we're confusing the BIOS -- or some BIOSes, anyway...
Well, the triggering patch's name has changed to x86-i8259-shutdown.patch, and the changelog entry now references this bug, but the effect is still the same for me: ACPI poweroff works without that patch and doesn't work with that patch.

Regarding the motherboard/BIOS version, quoting from the dmidecode output I attached earlier:

BIOS Information
    Vendor: Intel Corp.
    Version: EW81510A.86A.0046.P05.0203250259
    Release Date: 03/25/2002
[...]
Base Board Information
    Manufacturer: Intel Corporation
    Product Name: D815EGEW
    Version: AAA69600-201
    Serial Number: AZEW21914723

IIRC, "D815EGEW" is what the motherboard's packaging said on it.

To download the BIOS, visit the following web site and choose "OS Independent" as the operating system:
http://downloadfinder.intel.com/scripts-df/Product_Filter.asp?ProductID=783

Does this information help?
Eric Biederman thinks he's found the cause of the problem:
http://marc.theaimsgroup.com/?l=linux-kernel&m=110665405402747&w=2

He posted a patch for testing:
http://marc.theaimsgroup.com/?l=linux-kernel&m=110665542929525&w=2

That patch is filled with typos ("apci" and "offf"), but once those are fixed, my computer shuts down properly again.

Eric's latest message hasn't hit MARC yet, so I'll quote it here instead:
> Thanks. Now I just need to come up with the good version unless one of
> the acpi guys wants to volunteer.
BTW, someone else (on LKML) tested Eric's patch and found that it broke Alt-SysRq-O, but not regular shutdown...
http://www.ussg.iu.edu/hypermail/linux/kernel/0501.3/0869.html
This LKML post is relevant to comment #13...
Unfortunately, that patch breaks powerdown at the end of swsusp, and is very ugly anyway (see mail "Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown" on lkml).
Created attachment 4620
ACPI power-off cleanup patch

Could you please try this patch?
I'll test it as soon as I can. Hopefully that will be later today, but I can't promise that.
  LD      vmlinux
drivers/built-in.o(.text+0x2a309): In function `acpi_power_off':
: undefined reference to `pm_ops'
drivers/built-in.o(.text+0x47843): In function `sysdev_shutdown':
: undefined reference to `pm_ops'
make: *** [vmlinux] Error 1

This is with (2.6.11-rc5-mm1 - acpi_power_off-bug-fix.patch + attachment 4620), but I get the same failure with (2.6.11 + x86-i8259-shutdown.patch + attachment 4620).
Could you try to enable CONFIG_PM in .config?
Created attachment 4649
ACPI power-off cleanup patch

Pavel, I've tried to make sysdev "acpi" as you suggested. Please take a look.
Re: comment #19 I'll try that when I get a chance. That might be a couple of days though.
Attachment 4620 doesn't work, even with CONFIG_PM enabled: it compiles, but it doesn't really help matters, since I get a kernel panic instead of a total freezeup. I was going to attach output from booting with that patch, but since it's been marked obsolete (I just noticed that), I guess there's no point in doing that. I'll try the new patch now.
Attachment 4649 fails to compile if CONFIG_ACPI_SLEEP is disabled. (CONFIG_ACPI_SLEEP is automatically disabled if CONFIG_EXPERIMENTAL is disabled.) Here's how it blows up:

  CC      drivers/acpi/sleep/poweroff.o
drivers/acpi/sleep/poweroff.c: In function `acpi_sleep_prepare':
drivers/acpi/sleep/poweroff.c:22: error: `acpi_wakeup_address' undeclared (first use in this function)
drivers/acpi/sleep/poweroff.c:22: error: (Each undeclared identifier is reported only once
drivers/acpi/sleep/poweroff.c:22: error: for each function it appears in.)
drivers/acpi/sleep/poweroff.c: In function `acpi_poweroff_init':
drivers/acpi/sleep/poweroff.c:80: warning: ISO C90 forbids mixed declarations and code
make[3]: *** [drivers/acpi/sleep/poweroff.o] Error 1
make[2]: *** [drivers/acpi/sleep] Error 2
make[1]: *** [drivers/acpi] Error 2
make: *** [drivers] Error 2

Perhaps I'll try fixing this error later, but first I'll enable CONFIG_EXPERIMENTAL and CONFIG_ACPI_SLEEP and see if that compiles and works.
Created attachment 4662
minimal, possibly ugly fix for attachment 4649 for !CONFIG_ACPI_SLEEP

With this patch on top of attachment 4649, CONFIG_ACPI_SLEEP can be disabled and it will still compile, and it will still shut the computer down. Note that I wrote this patch without taking the time to understand the code, so it may not be technically correct.
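The general shape of such a fix is to compile the code that touches a sleep-only symbol only when that symbol exists. The following userspace sketch illustrates the pattern; it is not attachment 4662 and does not reproduce the real acpi_sleep_prepare(). MODEL_ACPI_SLEEP stands in for CONFIG_ACPI_SLEEP, model_wakeup_address stands in for acpi_wakeup_address, and model_sleep_prepare() is a made-up name.

```c
/* Userspace model of guarding a reference to a symbol that is only
 * declared when a config option is on. All names here are hypothetical. */

/* Define MODEL_ACPI_SLEEP to model a build with CONFIG_ACPI_SLEEP enabled. */
#ifdef MODEL_ACPI_SLEEP
unsigned long model_wakeup_address = 0x1000;  /* stand-in for acpi_wakeup_address */
#endif

/* Returns 1 if the wakeup-address step ran, 0 if it was compiled out,
 * and -1 if sleep support is built in but no wakeup address is set. */
int model_sleep_prepare(void)
{
#ifdef MODEL_ACPI_SLEEP
    if (!model_wakeup_address)
        return -1;
    return 1;
#else
    /* Without sleep support the symbol does not exist, so this branch
     * must not mention it at all; otherwise the build fails as above. */
    return 0;
#endif
}
```

Built without -DMODEL_ACPI_SLEEP, the function compiles and simply skips the wakeup step, which mirrors the goal of compiling and still shutting down with CONFIG_ACPI_SLEEP disabled.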
It looks like 4649+4662 only works if CONFIG_PM is enabled. If CONFIG_PM is disabled, then I get the same freeze as before. Does CONFIG_ACPI without CONFIG_PM even make sense? Maybe ACPI should depend on PM. (I'm going to look back in changesets to see if I can find any history on this.)
ACPI is used not only for power management but also for reporting the machine configuration to the OS, and for the second purpose it does not need CONFIG_PM. Maybe we should not register the ability to power off the machine when CONFIG_PM is not enabled.
> ACPI is used not only for power management but also for reporting the machine
> configuration to the OS, and for the second purpose it does not need CONFIG_PM.

Ok, I was aware of the second purpose but wasn't sure whether it also needed to be supported without CONFIG_PM.

> Maybe we should not register the ability to power off the machine when
> CONFIG_PM is not enabled.

Yeah, that's what I'm thinking.

BTW, the help text for CONFIG_PM makes it sound like you can't use ACPI without CONFIG_PM. I guess that needs to be fixed.
Ok, now that I have figured out what was causing my sporadic swsusp failures (bug 4298) and how to work around it, I can test patches to see if they break swsusp. (Might be another day or so before I get around to this, however.)
swsusp works, but Alt-SysRq-O seems broken. The old quick-and-dirty patch broke both swsusp (according to LKML posts; I didn't try it myself) and Alt-SysRq-O, so this is progress.
The magic key sequence does not shut down devices; it only calls the power-off function. We should either call prepare-to-shutdown before power-off or shut down the devices. I do not know which to prefer, as I guess there was a reason not to call device shutdown in the magic-key handler.
IIRC, Eric Biederman was wondering the same thing about magic SysRQ. AFAIK the SysRQ shutdown is really intended for emergency situations, so it should be robust and work whenever other stuff is messed up, rather than being a fully clean shutdown.
I think I may have figured out a way to fix Alt-SysRQ-O. I'll see if I can have a patch for testing later today.
Created attachment 4686
modify magic sysrq poweroff infrastructure

I haven't tested this patch yet, but what it should do (if it works) is let platform drivers define a pm_power_off_magic_sysrq() function as well as a pm_power_off() function. That way, poweroff can be done differently for sysrq. If pm_power_off_magic_sysrq() does not get defined/set, then plain pm_power_off() should be used.

This patch alone should not change any behavior, but it should be possible to use this as the basis for a patch that fixes Alt-SysRQ-O for ACPI, I hope...
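The fallback rule described here can be modelled in ordinary userspace C. This is a sketch of the idea only, not the attached patch: pm_power_off and pm_power_off_magic_sysrq are named as in the comment, while poweroff_fn, sysrq_poweroff_handler(), sysrq_poweroff(), the demo_* handlers and last_called are hypothetical names introduced for illustration.

```c
#include <stddef.h>

typedef void (*poweroff_fn)(void);

/* Hooks a platform driver may set; both start out unset. */
poweroff_fn pm_power_off = NULL;
poweroff_fn pm_power_off_magic_sysrq = NULL;

/* Demo handlers that only record which one ran (1 = regular, 2 = sysrq). */
int last_called = 0;
void demo_regular_off(void) { last_called = 1; }
void demo_sysrq_off(void)   { last_called = 2; }

/* Pick the handler for the SysRq-O path: prefer the sysrq-specific hook,
 * otherwise fall back to the regular one (which may itself be NULL). */
poweroff_fn sysrq_poweroff_handler(void)
{
    return pm_power_off_magic_sysrq ? pm_power_off_magic_sysrq : pm_power_off;
}

/* Model of the SysRq-O action. Returns 0 if a handler ran, -1 if none is set. */
int sysrq_poweroff(void)
{
    poweroff_fn handler = sysrq_poweroff_handler();

    if (!handler)
        return -1;
    handler();
    return 0;
}
```

With only pm_power_off set, sysrq_poweroff() runs the regular handler, so platforms that never define the new hook keep their current behavior. Once a driver also sets pm_power_off_magic_sysrq, the sysrq path uses that handler instead.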
Comment on attachment 4686
modify magic sysrq poweroff infrastructure

*sigh* This patch is broken (it may be necessary but it's not sufficient). I plan to make a fixed version, but that may not happen today.
I have what I think is a fixed version (it compiles and it doesn't break Alt-SysRQ-O anymore as far as I can tell) but I won't have it ready for public consumption until later today.
Created attachment 4692
allow magic sysrq to use a different poweroff handler
Created attachment 4693
change ACPI to handle Alt-SysRQ-O differently from regular poweroff

This depends on attachment 4692.

BTW, I forgot to mention this when attaching 4692, but that patch only includes the needed changes for the core kernel and for i386 (it may not even cover all i386 subarches). I can create patches for other architectures (and for other i386 subarches if needed) after the overall design gets reviewed and approved.

Also, FWIW, an alternate approach to attachment 4692 would be to add a parameter to pm_power_off, rather than adding the pm_power_off_magic_sysrq function, for instance:
    void (*pm_power_off)(int called_from_magic_sysrq)
but that would involve touching all callers of pm_power_off, so it would be more invasive for no gain IMO.

I tested 4692 and this patch, and it seems to be completely working -- but the kernel I tested turned out to have other problems; it seems it was miscompiled (I think it's something I messed up when setting up distcc). I *think* all the miscompiles affected modules and not the core kernel, but to be safe I'm recompiling it now and will retest right after it finishes.
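For comparison, the single-hook alternative mentioned in this comment could look roughly like the userspace sketch below. Only the hook signature comes from the comment; regular_poweroff(), sysrq_poweroff(), demo_acpi_power_off(), enum poweroff_path and last_path are hypothetical names, and the handler merely records which path was requested.

```c
#include <stddef.h>

/* Single hook taking a flag, as in the alternative described above. */
void (*pm_power_off)(int called_from_magic_sysrq) = NULL;

/* Which kind of power-off the demo handler was last asked to perform. */
enum poweroff_path { PATH_NONE, PATH_FULL, PATH_EMERGENCY };
enum poweroff_path last_path = PATH_NONE;

/* Demo handler: one function serves both paths and branches on the flag. */
void demo_acpi_power_off(int called_from_magic_sysrq)
{
    last_path = called_from_magic_sysrq ? PATH_EMERGENCY : PATH_FULL;
}

/* Every caller now has to say which path it is on. */
void regular_poweroff(void)
{
    if (pm_power_off)
        pm_power_off(0);
}

void sysrq_poweroff(void)
{
    if (pm_power_off)
        pm_power_off(1);
}
```

The sketch shows the cost noted above: each existing caller and each existing handler has to change to pass or accept the flag, whereas the two-hook design leaves them untouched.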
Actually, it looks like it's not miscompiling, so much as I'm getting massive XFS filesystem corruption on my test machine (on a kernel without *any* of these patches). My compile cluster is actually working properly. The upshot is that the patches probably work, but the test results are suspect. I need to get to the bottom of the FS corruption issues first before I can do fully proper testing. :( Hopefully this won't take *too* long.
After running xfs_repair, I can't reproduce the XFS filesystem problems anymore. Oh well.

Anyway, I can now report that with the fully patched kernel, poweroff via both "shutdown -h now" and magic sysrq is working. Yay! (This is with CONFIG_PM and CONFIG_ACPI_SLEEP enabled.)

Fully patched means applying all of the following in this order:
x86-i8259-shutdown.patch [this used to cause shutdown freezes]
attachment 4649
attachment 4662
attachment 4692
attachment 4693

I just realized that I now need to test with CONFIG_ACPI_SLEEP disabled, so I'm recompiling with that configuration now. If the tests with that work too, then I think it will be time to start getting this stuff reviewed.

By the way, regarding attachment 4662: What purpose does the "ACPI_FLUSH_CPU_CACHE();" serve in drivers/acpi/sleep/poweroff.c:acpi_sleep_prepare()? I'm wondering if that really needs to be two #ifdef's or if it could be combined into one.
Ok, I've tested with CONFIG_ACPI_SLEEP disabled. That too works, both for sysrq and shutdown -h. I guess the next step is to e-mail relevant people/lists and let them know about the progress on this bug, but I'm having a hard time thinking about the details of that right now, so I'll revisit this when I get a chance (maybe this evening, or maybe in a couple of days if I happen to be really busy).
Created attachment 4696
Combined patch

Combined previous 4 patches and changed patches #3&4 to be less intrusive.
I'll test the patch later this weekend. Right now I can't test it because I'm too busy with stuff unrelated to Linux; that is also why I waited this long before responding at all.
I tested the version of the patch in 2.6.11-mm3 (although I patched it into a Fedora kernel, much like my previous testing). It works. Cool!
*** Bug 3862 has been marked as a duplicate of this bug. ***
*** Bug 4160 has been marked as a duplicate of this bug. ***
*** Bug 4244 has been marked as a duplicate of this bug. ***
I tried the patch last night in order to solve my problem (Bug 4244), but first I got a compilation error caused by a mistaken variable name: in poweroff.c, function acpi_sleep_prepare, the variable sleep_prepared is declared, but shutdown_prepared is used afterwards. After correcting that, it compiles, but it doesn't solve my problem.
(Just some general notes for anyone reading this bug)

I tested by applying both of the following patches in this order, FWIW (IOW, this is what fixes it for me):
http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11/2.6.11-mm3/broken-out/acpi-poweroff-fix.patch
http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11/2.6.11-mm3/broken-out/acpi-poweroff-fix-fix.patch

Also, I'm changing the bug back to RESOLVED/CODE_FIX. That does not necessarily imply that the code has been tested to work (that's VERIFIED or CLOSED, not RESOLVED), but it means that a new patch was written to fix this bug (which is in fact true).

BTW, the exact ACPI shutdown bug that I reported can be worked around by reverting this patch (which is in -mm but not in mainline):
http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11/2.6.11-mm3/broken-out/x86-i8259-shutdown.patch

If you're having ACPI shutdown problems without the x86-i8259-shutdown patch in your kernel, then that means your ACPI shutdown problem is not exactly the same one that I reported -- but the patches to fix my problem could fix other problems too, so you should still test them.
Created attachment 4722
Fixed combined patch

Patch with indentation fixes and fixes suggested by Andrew Morton
applied to acpi-test tree
shipped in 2.6.13-rc3 -- closing.