Bug 3862

Summary: poweroff fails if "lapic" forced on
Product: ACPI
Reporter: Stas Sergeev (stsp2)
Component: Power-Off
Assignee: Alexey Starikovskiy (astarikovskiy)
Status: CLOSED PATCH_ALREADY_AVAILABLE
Severity: normal
CC: acpi-bugzilla
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.10-rc2-mmX
Subsystem:
Regression: ---
Bisected commit-id:
Attachments: dmesg from -rc2-mm4
dmesg for -rc2-bk13
config for -rc2-mm4
config for -rc2-bk13
IRQ handling cleanup
patch to revert from mm2 to mm1
ACPI power off cleanup patch
ACPI power-off cleanup patch, take 2.

Description Stas Sergeev 2004-12-04 10:25:20 UTC
Distribution: FC2
Hardware Environment: logs attached
Software Environment: configs attached
Problem Description:
It seems like power-off doesn't work
for me any more in rc2-mm2 and -mm4.
It still works with rc2-bk13 though.
The last lines I see on the screen
are as follows:
---
Halting system...
Shutdown: hda
Power down.
acpi_power_off called
---
At this point the PC should switch off,
but this does not happen these days. 

Steps to reproduce:
$ sudo poweroff
Comment 1 Stas Sergeev 2004-12-04 10:26:37 UTC
Created attachment 4218 [details]
dmesg from -rc2-mm4
Comment 2 Stas Sergeev 2004-12-04 10:27:07 UTC
Created attachment 4219 [details]
dmesg for -rc2-bk13
Comment 3 Stas Sergeev 2004-12-04 10:27:46 UTC
Created attachment 4220 [details]
config for -rc2-mm4
Comment 4 Stas Sergeev 2004-12-04 10:28:17 UTC
Created attachment 4221 [details]
config for -rc2-bk13
Comment 5 Len Brown 2004-12-05 22:32:46 UTC
Please try 2.6.10-rc3, which has been reported to fix
poweroff for others.
Comment 6 Stas Sergeev 2004-12-07 09:25:14 UTC
This problem was -mm specific, but not any more.
Now plain -rc3 fails as well (-rc2 was OK).
Is there any more info I should provide to help
fix this?
Comment 7 Shaohua 2004-12-07 17:48:15 UTC
How about disabling the local APIC (booting without the "lapic" option)?
Comment 8 Stas Sergeev 2004-12-09 08:59:00 UTC
That helps! Why?
Power-off used to work, and lapic used to work too.
I use NMIs to debug some stuff, so this work-around
is not very good for me.
Thanks.
Comment 9 Stas Sergeev 2004-12-11 00:53:47 UTC
In case I wasn't clear: booting without "lapic" helps with power-off.
Comment 10 Shaohua 2004-12-12 17:29:37 UTC
If running without the local APIC works for you, please do that.
>Local APIC disabled by BIOS -- reenabling.
>Found and enabled local APIC!
This generally means BIOS can't handle local APIC shutdown correctly.
Comment 11 Stas Sergeev 2004-12-13 01:58:12 UTC
> If no loapic works for you, please use it.
> This generally means BIOS can't handle local APIC shutdown correctly.
Yes, but it worked with the earlier kernels...
If not for that small fact, I'd of course just do what you say.
But as it is, I suppose there might be some other way to get
it working, right?
Comment 12 Alexey Starikovskiy 2004-12-14 04:28:44 UTC
Stas, you can try this patch with "lapic". It should appear in 2.6.10...
http://bugme.osdl.org/attachment.cgi?id=4187&action=view
Comment 13 Stas Sergeev 2004-12-15 09:37:01 UTC
> http://bugme.osdl.org/attachment.cgi?id=4187&action=view
Yes, exactly, that patch is the problem.
Reverting it makes things work properly again.
Are there any fixes I should try, or:

> It should appear in 2.6.10...
probably means it should *disappear* from 2.6.10?
Comment 14 Shaohua 2004-12-15 16:49:46 UTC
Do you mean removing the patch in
http://bugme.osdl.org/attachment.cgi?id=4187&action=view resolves your problem?
Oh, this is quite interesting. The patch is actually designed to fix the
poweroff issue :(.
Comment 15 Stas Sergeev 2004-12-15 20:36:05 UTC
> Do you mean removing the patch in
> http://bugme.osdl.org/attachment.cgi?id=4187&action=view resolves your problem?
Yes, I mean exactly that. It is now there in -rc3, which
stopped working (-rc2 was OK for me). Reverting it from
-rc3 helps. I haven't tried reverting it from -mm (where I
first noticed the problem), but I bet that will work as well.

> patch is designed to fix poweroff issue actually :(.
Then it definitely needs more work:)
If there are any updates to that patch, I can test them.
Comment 16 Len Brown 2004-12-20 20:48:35 UTC
I'm glad to know that the system works correctly in its default 
configuration -- LAPIC disabled. 
 
The fix in bug 3643 is aimed exactly at the case where the 
LAPIC is forced on.  Without it, high volume systems 
such as the D600 do not poweroff if "lapic" is forced on. 
Curiously, your system seems to behave exactly the opposite 
of the D600. 
 
The offending patch was the 1st one in bug 3643. 
It has been replaced by the 2nd patch in bug 3643, 
which is present in the latest 2.6.10 bk tree: 
http://linux.bkbits.net:8080/linux-2.6/cset@41ae14advqqMGMgR3rtIQw0iN6c29w 
 
It would be good to make sure that the latest still fails. 
I expect that it will, as all we did was move the lapic_shutdown 
later -- it still will run on your configuration. 
 
 
Comment 17 Stas Sergeev 2004-12-20 21:14:32 UTC
> I'm glad to know that the system works correctly in its default 
> configuration -- LAPIC disabled.
I understand, but can't share that feeling:)

> It would be good to make sure that the latest still fails.
OK, I'll test it, but most likely this is exactly the
one that is in -rc3, right? And since -rc3 fails, I suppose
it is that patch.
I never applied the first one by hand, I only
reverted it. So I suppose it was the second one causing
the problems for me from the very beginning, and the first
one happened to revert it without rejects (but with
offsets and fuzz).
Comment 18 Stas Sergeev 2004-12-21 09:03:18 UTC
> OK, I'll test it
Done, nothing unexpected. Same results.

> LAPIC is forced on.  Without it, high volume systems 
> such as the D600 do not poweroff if "lapic" is forced on. 
> Curiously, your system seem to behave exactly the opposite 
> of the D600.
I wonder why not enabling it works, but enabling and then
disabling - doesn't.
Comment 19 Len Brown 2004-12-22 21:49:14 UTC
> I wonder why not enabling it works, but enabling and then
> disabling - doesn't.

Only the BIOS knows.

Linux by default used to force the LAPIC to be enabled.
This caused many systems to not power-off because when
we power-off we enter the BIOS SMM mode and apparently
confuse the BIOS code there that expects the LAPIC
to still be off.  (indeed, it told us one wasn't there)

So we changed Linux's default configuration to do what
the BIOS says -- not force the LAPIC on when the BIOS
tells us one is not present.  This is why you must
supply "lapic" on your box to get the LAPIC
when you didn't need it before.

On the D600 we found that when the LAPIC was forced
on with "lapic", we could successfully poweroff
with lapic_shutdown.  Unclear why forcing the LAPIC
to be enabled on your system works without lapic_shutdown
and fails with lapic_shutdown.

Probably the root cause is something to do with
handling SMIs -- something Linux has no visibility into.

I'm not sure what to do with this issue.
By definition it will affect only those who have
forced the LAPIC on with "lapic".
Maybe the right thing to do is to provide yet
another kernel parameter to disable lapic_shutdown?
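
The opt-out suggested above could be modelled roughly as follows. This is a user-space sketch only: "nolapic_shutdown" is a hypothetical name, not an existing kernel parameter, and both helper functions are invented for illustration. A real implementation would hook into the kernel's boot-parameter handling rather than scan a string by hand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if `opt` appears in `cmdline` as a whole space-separated word. */
bool cmdline_has_option(const char *cmdline, const char *opt)
{
    assert(cmdline != NULL && opt != NULL);

    size_t len = strlen(opt);
    const char *p = cmdline;

    if (len == 0)
        return false;

    while ((p = strstr(p, opt)) != NULL) {
        bool starts_word = (p == cmdline) || (p[-1] == ' ');
        bool ends_word = (p[len] == '\0') || (p[len] == ' ');
        if (starts_word && ends_word)
            return true;
        p += len;   /* partial match inside another word, keep looking */
    }
    return false;
}

/* Decide whether the power-off path should run lapic_shutdown. */
bool should_shutdown_lapic(const char *cmdline)
{
    /* Only relevant when the LAPIC was forced on with "lapic". */
    if (!cmdline_has_option(cmdline, "lapic"))
        return false;
    /* "nolapic_shutdown" is a hypothetical opt-out, not a real parameter. */
    return !cmdline_has_option(cmdline, "nolapic_shutdown");
}
```

With this model, a D600-style box booted with "lapic" keeps the shutdown step, while a box like the one in this report could add the opt-out word to skip it.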
Comment 20 Stas Sergeev 2004-12-23 10:33:04 UTC
> another kernel parameter to disable lapic_shutdown?
That will work for me at least.
Maybe it is somehow possible to do more than one shutdown
attempt? Say, try to shutdown with lapic enabled, and if
that fails - disable lapic and retry.
This is most probably nonsense and can't be done; it is just
that after the "acpi_power_off" message I can still type
characters on the console, so I assume the kernel is still
functional at that point. And if the kernel is alive, why
can't several shutdown attempts be made?
Comment 21 Alexey Starikovskiy 2004-12-24 08:16:44 UTC
Stas, does it make any difference if you comment out everything before the if (...)
statement in disable_local_APIC() in arch/i386/kernel/apic.c? It looks like the
current code tries to disable the LAPIC twice...
Or change it to look like this:

void disable_local_APIC(void)
{
	unsigned long value;
	/*
	 * Disable APIC (implies clearing of registers
	 * for 82489DX!).
	 */
	if (enabled_via_apicbase) {
		unsigned int l, h;
		rdmsr(MSR_IA32_APICBASE, l, h);
		l &= ~MSR_IA32_APICBASE_ENABLE;
		wrmsr(MSR_IA32_APICBASE, l, h);
	} else {
		clear_local_APIC();
		value = apic_read(APIC_SPIV);
		value &= ~APIC_SPIV_APIC_ENABLED;
		apic_write_around(APIC_SPIV, value);
	}
}
Comment 22 Stas Sergeev 2004-12-27 09:28:23 UTC
> Or change it to look like this:
Unfortunately this changed nothing visible :(
Comment 23 Alexey Starikovskiy 2004-12-30 08:28:22 UTC
Stas, could you try this version? It still works for me, but does not do a hard
LAPIC shutdown... In this software-off state it still responds to NMI, so maybe
it will work for you.

void disable_local_APIC(void)
{
        unsigned long value;

        clear_local_APIC();

        /*
         * Disable APIC (implies clearing of registers
         * for 82489DX!).
         */
        value = apic_read(APIC_SPIV);
        value &= ~APIC_SPIV_APIC_ENABLED;
        apic_write_around(APIC_SPIV, value);

/*      if (enabled_via_apicbase) {
                unsigned int l, h;
                rdmsr(MSR_IA32_APICBASE, l, h);
                l &= ~MSR_IA32_APICBASE_ENABLE;
                wrmsr(MSR_IA32_APICBASE, l, h);
        }
*/
}
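
The difference between the two disable paths discussed in the last few comments can be sketched in user space. The bit positions are the ones behind MSR_IA32_APICBASE_ENABLE (bit 11 of the APIC base MSR) and APIC_SPIV_APIC_ENABLED (bit 8 of the spurious-interrupt vector register); struct fake_apic and both helpers are hypothetical stand-ins, and the sketch does not touch real hardware or reproduce clear_local_APIC().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define APICBASE_ENABLE   (1u << 11)  /* global ("hard") enable in the APIC base MSR */
#define SPIV_APIC_ENABLED (1u << 8)   /* software enable in the SPIV register */

/* Hypothetical stand-in for the CPU's APIC state; not a kernel structure. */
struct fake_apic {
    uint32_t apicbase_lo;  /* low 32 bits of the APIC base MSR */
    uint32_t spiv;         /* spurious-interrupt vector register */
};

/* Soft disable: clear only the SPIV enable bit (the version in this comment). */
void soft_disable(struct fake_apic *a)
{
    assert(a != NULL);
    a->spiv &= ~SPIV_APIC_ENABLED;
}

/* Hard disable: clear the global enable bit, as the rdmsr/wrmsr branch does. */
void hard_disable(struct fake_apic *a)
{
    assert(a != NULL);
    a->apicbase_lo &= ~APICBASE_ENABLE;
}
```

Starting from a base value of 0xFEE00900 and an SPIV of 0x1FF, the soft path leaves the MSR untouched and only drops SPIV to 0xFF, while the hard path clears bit 11 and yields 0xFEE00100.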
Comment 24 Stas Sergeev 2004-12-30 09:07:41 UTC
Actually I did that already.
In your previous version I tried inverting the if condition
to "if (!enabled_via_apicbase)" and that changed nothing,
and apparently it was effectively the same as what you now
want me to try.
Only bypassing both "disables" makes the shutdown work.
Looking at the APIC startup code, so many things are
done there that I am almost sure it ends up in a different
state than it was in initially. Maybe disabling some of the
startup code would help to narrow that down?
Comment 25 Len Brown 2005-01-17 09:00:28 UTC
any luck if the local apic is enabled, but nmi_watchdog is not enabled?
Comment 26 Stas Sergeev 2005-01-18 10:01:01 UTC
> any luck if the local apic is enabled, but nmi_watchdog is not enabled?
Unfortunately no luck.
Comment 27 Stas Sergeev 2005-02-14 10:27:51 UTC
Hmm, it looks like the bug has fixed itself and no longer appears in the
latest -mm patches. What has changed? Does it still work on your D600?
Very strange...
Closing?
Comment 28 Alexey Starikovskiy 2005-02-15 02:49:53 UTC
There is ongoing work on making the ACPI interpreter compliant with the spec, and
I hope that is it... Thanks for your responses!
Comment 29 Stas Sergeev 2005-02-15 09:14:17 UTC
Hmm, actually the bug did not fix itself completely; it has left some
work for the programmers still.
"poweroff" now works perfectly, that's great, but Alt-PrtSc-o is still
broken. It just locks up the machine until the NMI Oopser gets nervous,
and then it crashes.
Can this please be fixed too? The NMI Oops is not in the logs, but if
you want to have a look, I can write it down by hand.

> There is work of making ACPI interpreter compliant with spec, and I hape that is
> it...
In case you are wondering, the fix happened between the 2.6.11-rc3-mm1
and 2.6.11-rc3-mm2.
Comment 30 Len Brown 2005-02-15 09:29:31 UTC
> In case you are wondering, the fix happened between the 2.6.11-rc3-mm1
> and 2.6.11-rc3-mm2.

So it is fixed in the -mm tree only, and the base still fails?
Comment 31 Alexey Starikovskiy 2005-02-15 09:45:55 UTC
Created attachment 4551 [details]
IRQ handling cleanup

Stas, could you try this patch on mm2 or any other version that failed?
mm3 introduced a bad hack which hopefully will never get into the base kernel...
Comment 32 Stas Sergeev 2005-02-15 10:04:26 UTC
> mm3 introduced bad hack which hopefully will never get into base kernel...
What kind of mm3?
I was talking about 2.6.11-rc3, which, by the time of writing this, doesn't
have the -mm3 patch AFAICS.
I'll try it on -mm1 I guess (tomorrow probably) and will let you know as
well as how the -rc4 goes.
Comment 33 Stas Sergeev 2005-02-21 09:47:23 UTC
Hello.

> So it is fixed in the -mm tree only, and the base still fails?
Exactly. 2.6.11-rc4 still fails.

> Stas, could you try this patch on mm2 or any other version that failed?
That patch makes no difference - the failed -mm1 still fails.
Looking into the patch I have difficulties imagining how it
could help. It is a cleanup, why would it help?
Comment 34 Alexey Starikovskiy 2005-02-22 04:50:54 UTC
Stas,
I think that http://www.ussg.iu.edu/hypermail/linux/kernel/0501.3/0467.html
is what cured your problem. The following piece of code tries to do something similar.
===== drivers/acpi/sleep/poweroff.c 1.3 vs edited =====
--- 1.3/drivers/acpi/sleep/poweroff.c	2004-12-01 09:05:02 +03:00
+++ edited/drivers/acpi/sleep/poweroff.c	2005-02-14 15:51:34 +03:00
@@ -18,8 +18,12 @@
 	/* Some SMP machines only can poweroff in boot CPU */
 	set_cpus_allowed(current, cpumask_of_cpu(0));
 	acpi_wakeup_gpe_poweroff_prepare();
-	acpi_enter_sleep_state_prep(ACPI_STATE_S5);
-	ACPI_DISABLE_IRQS();
+	if (!irqs_disabled()) {
+		acpi_enter_sleep_state_prep(ACPI_STATE_S5);
+		ACPI_DISABLE_IRQS();
+	} else {
+		printk(KERN_WARNING "IRQs disabled. Skipping ACPI sleep prepare.\n");
+	}
 	acpi_enter_sleep_state(ACPI_STATE_S5);
 }
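
The branch structure of the change above can be modelled in user space as follows. This is only a sketch: the trace buffer, the step names and model_power_off() are hypothetical, and each recorded step stands in for the kernel call named in its comment. The model shows that the prepare step and the IRQ disable are skipped together when interrupts are already off, while the final transition to S5 is always attempted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Steps the modelled power-off path can take, in the order they occur. */
enum step { STEP_PREP, STEP_DISABLE_IRQS, STEP_WARN_SKIP, STEP_ENTER_S5 };

/* Hypothetical trace buffer; the real code calls into ACPI instead. */
struct trace {
    enum step steps[4];
    size_t count;
};

static void record(struct trace *t, enum step s)
{
    assert(t->count < sizeof(t->steps) / sizeof(t->steps[0]));
    t->steps[t->count++] = s;
}

/* Mirrors the if/else added by the patch to poweroff.c. */
void model_power_off(bool irqs_already_disabled, struct trace *t)
{
    assert(t != NULL);
    t->count = 0;
    if (!irqs_already_disabled) {
        record(t, STEP_PREP);          /* acpi_enter_sleep_state_prep(ACPI_STATE_S5) */
        record(t, STEP_DISABLE_IRQS);  /* ACPI_DISABLE_IRQS() */
    } else {
        record(t, STEP_WARN_SKIP);     /* printk warning, prepare skipped */
    }
    record(t, STEP_ENTER_S5);          /* acpi_enter_sleep_state(ACPI_STATE_S5) */
}
```

Calling model_power_off(false, &t) records three steps (prepare, disable IRQs, enter S5); calling it with true records only the warning and the S5 entry.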
Comment 35 Stas Sergeev 2005-02-22 05:54:21 UTC
> is that cured your problem. Following piece of code tries to do someting
> similar.
OK, I see. Though since your patch (which I tried) doesn't
fix the problem, your guess is probably wrong.
If you really need to find the patch that fixes the problem,
you could point me to a few patches from the -mm "broken out" set, and I
will revert them one by one from the working kernel to find out
which one does the trick.
Since the mainline kernel still fails, I imagine it may be important
for you to figure that out.
But as for me, I am satisfied with -mm :) I have to use it anyway,
since it also has other bugfixes that are important to me.
Comment 36 Alexey Starikovskiy 2005-02-22 06:55:56 UTC
It will help me a lot if you remove the patch mentioned in the e-mail discussion
above and report whether the problem returns.
Comment 37 Stas Sergeev 2005-02-22 07:34:36 UTC
Do you mean the entire patch from Eric?
I'll do that within a few days (the problematic PC is at work,
so I can't do the tests very quickly), and in the meantime it
would be nice if you attached the patch here. It looks like it was
inlined in the message, so the formatting is probably lost, and
it has typos, as another tester reported in that discussion. I
can try applying it anyway, but if you have that patch handy,
attaching it here would help :)
Comment 38 Alexey Starikovskiy 2005-02-22 10:32:27 UTC
Created attachment 4593 [details]
patch to revert from mm2 to mm1

These are the changes in question from mm1 to mm2. Apply with -R.
Comment 39 Stas Sergeev 2005-02-24 08:56:33 UTC
Confirmed, this patch fixes the problem. I applied it to -rc4 and it works.
Comment 40 Alexey Starikovskiy 2005-02-28 05:01:57 UTC
Created attachment 4609 [details]
ACPI power off cleanup patch

Stas, could you please try this patch?
Comment 41 Stas Sergeev 2005-03-02 09:18:01 UTC
The patch was tested and works. Why do you call it a cleanup if it makes
the problem go away? :)
However, it doesn't fix the Alt-PrtSc-o problem: that still does
nothing more than lock up the machine.
Comment 42 Alexey Starikovskiy 2005-03-03 08:34:28 UTC
Created attachment 4648 [details]
ACPI power-off cleanup patch, take 2.

Problem is, these patches are ugly and will not pass through maintainers.
Please try one more (less ugly) patch... :)
Comment 43 Stas Sergeev 2005-03-04 09:40:02 UTC
This patch works too.
Comment 44 Alexey Starikovskiy 2005-03-09 05:49:04 UTC
http://bugme.osdl.org/attachment.cgi?id=4696&action=view

Stas, bug #4041 seems to be relevant to your case; could you please check that
the final patch works for you as well?
Comment 45 Stas Sergeev 2005-03-10 11:39:59 UTC
That patch contains typos and bugs, it doesn't compile.
However, after some obvious fixing it works.
Comment 46 Alexey Starikovskiy 2005-03-11 09:42:18 UTC
Is the Alt-PrtSc-o problem fixed too? 
Comment 47 Stas Sergeev 2005-03-11 09:52:16 UTC
Haven't checked, sorry.
Until Monday.
Comment 48 Alexey Starikovskiy 2005-03-14 01:42:54 UTC
Marking this bug as duplicate of #4041. Thanks for your patience.

*** This bug has been marked as a duplicate of 4041 ***
Comment 49 Stas Sergeev 2005-06-08 09:45:18 UTC
OK, guys, this bug is still present in 2.6.12-rc6 and
2.6.12-rc5-mm2, AFAICS.
You marked this entry as a duplicate of #4041, and #4041 is
now closed too, but in all that time nothing has been fixed,
as far as I can tell.
I think I have to re-open this entry.
What is the real state of that problem currently?
Comment 50 Alexey Starikovskiy 2005-06-19 17:32:27 UTC
#4041 is marked resolved, not closed, which means that a patch is available. It
can be closed only once the patch appears in a stable kernel.
Andrew Morton dropped the acpi-test patch from his -mm tree, and thus you see this
bug again. Len Brown, maintainer of the acpi-test patch, hopes to be able to put
all the patches back for inclusion into stable 2.6.13.
Comment 51 Len Brown 2005-07-27 17:49:05 UTC
shipped in 2.6.13-rc3 -- closing.