Bug 3862
Summary: | poweroff fails if "lapic" forced on | ||
---|---|---|---|
Product: | ACPI | Reporter: | Stas Sergeev (stsp2) |
Component: | Power-Off | Assignee: | Alexey Starikovskiy (astarikovskiy) |
Status: | CLOSED PATCH_ALREADY_AVAILABLE | ||
Severity: | normal | CC: | acpi-bugzilla |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.10-rc2-mmX | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: |
dmesg from -rc2-mm4
dmesg for -rc2-bk13
config for -rc2-mm4
config for -rc2-bk13
IRQ handling cleanup
patch to revert from mm2 to mm1
ACPI power off cleanup patch
ACPI power-off cleanup patch, take 2. |
Description
Stas Sergeev
2004-12-04 10:25:20 UTC
Created attachment 4218
dmesg from -rc2-mm4
Created attachment 4219
dmesg for -rc2-bk13
Created attachment 4220
config for -rc2-mm4
Created attachment 4221
config for -rc2-bk13
Please try 2.6.10-rc3, which has been reported to fix poweroff for others.

This problem was -mm specific, but not any more. Now the plain -rc3 fails too (-rc2 was OK). Is there any more info I can provide to help fix this?

How about disabling the local APIC (booting without the "lapic" option)?

That helps! Why? It used to work, and "lapic" used to work too. I use NMIs to debug some stuff, so this workaround is not very good for me. Thanks.

In case I wasn't clear: booting without "lapic" helps with power-off.

If no lapic works for you, please use it.
>Local APIC disabled by BIOS -- reenabling.
>Found and enabled local APIC!
This generally means BIOS can't handle local APIC shutdown correctly.
> If no lapic works for you, please use it.
> This generally means BIOS can't handle local APIC shutdown correctly.
Yes, but it worked with the earlier kernels...
If not for that small fact, I'd of course just do what you say.
But as it is now, there might be some other way to get
it working, I suppose, isn't there?
Stas, you can try this patch with "lapic". It should appear in 2.6.10...
http://bugme.osdl.org/attachment.cgi?id=4187&action=view

> http://bugme.osdl.org/attachment.cgi?id=4187&action=view

Yes, exactly, that patch is the problem. Reverting it makes things work properly again. Are there any fixes I should try, or:

> It should appear in 2.6.10...

probably means it should *disappear* from 2.6.10?

Do you mean removing the patch in http://bugme.osdl.org/attachment.cgi?id=4187&action=view resolves your problem? Oh, this is quite interesting. The patch is actually designed to fix a poweroff issue :(.

> Do you mean removing the patch in http://bugme.osdl.org/attachment.cgi?id=4187&action=view
> resolves your problem?

Yes, I mean exactly that. It is now in -rc3, which stopped working (-rc2 was OK for me). Reverting it from -rc3 helps. I haven't tried reverting it from -mm (where I noticed the problem first), but I bet that will work too.

> The patch is actually designed to fix a poweroff issue :(.

Then it definitely needs more work :) If there are any updates to that patch, I can test them.

I'm glad to know that the system works correctly in its default configuration -- LAPIC disabled. The fix in bug 3643 is aimed exactly at the case where the LAPIC is forced on. Without it, high-volume systems such as the D600 do not power off if "lapic" is forced on. Curiously, your system seems to behave exactly the opposite of the D600.

The offending patch was the 1st one in bug 3643. It has been replaced by the 2nd patch in bug 3643, which is present in the latest 2.6.10 bk tree:
http://linux.bkbits.net:8080/linux-2.6/cset@41ae14advqqMGMgR3rtIQw0iN6c29w
It would be good to make sure that the latest still fails. I expect that it will, as all we did was move the lapic_shutdown later -- it will still run on your configuration.

> I'm glad to know that the system works correctly in its default
> configuration -- LAPIC disabled.
I understand, but I can't share that feeling :)

> It would be good to make sure that the latest still fails.

OK, I'll test it, but most likely this is exactly the one that is in -rc3, right? And since -rc3 fails, I suppose it is that patch. I never applied the first one by hand; I was only reverting it. So I suppose it was the second one causing the problems for me from the very beginning, and the first one just happened to revert without rejects (but with offsets and fuzz).

> OK, I'll test it

Done, nothing unexpected. Same results.

> LAPIC is forced on. Without it, high-volume systems
> such as the D600 do not power off if "lapic" is forced on.
> Curiously, your system seems to behave exactly the opposite
> of the D600.

I wonder why not enabling it works, but enabling and then disabling - doesn't.

> I wonder why not enabling it works, but enabling and then
> disabling - doesn't.
Only the BIOS knows.
Linux by default used to force the LAPIC to be enabled.
This caused many systems to fail to power off, because when
we power off we enter the BIOS SMM code and apparently
confuse it, since it expects the LAPIC to still be off
(indeed, the BIOS told us one wasn't there).
So we changed Linux's default configuration to do what
the BIOS says -- not force the LAPIC on when the BIOS
tells us one is not present. This is why you must
supply "lapic" on your box to get the LAPIC
when you didn't need it before.
On the D600 we found that when the LAPIC was forced
on with "lapic", we could successfully power off
with lapic_shutdown. It is unclear why, on your system,
forcing the LAPIC on works without lapic_shutdown
and fails with it.
Probably the root cause has something to do with
handling SMIs -- something Linux has no visibility into.
I'm not sure what to do with this issue.
By definition it will affect only those who have
forced the LAPIC on with "lapic".
Maybe the right thing to do is to provide yet
another kernel parameter to disable lapic_shutdown?
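The policy described above (follow the BIOS by default, force the LAPIC on only with "lapic", and run lapic_shutdown at power-off only when Linux itself forced it on), together with the proposed opt-out parameter, can be modelled as a small user-space sketch. This is not kernel code: `struct lapic_state`, `boot_lapic()`, `want_lapic_shutdown()` and the `nolapic_shutdown` option are hypothetical names for illustration, and the real kernel parses boot options with its own setup hooks rather than `strstr()`.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the LAPIC policy; not kernel code. */
struct lapic_state {
    bool bios_enabled; /* BIOS left the LAPIC enabled */
    bool forced_on;    /* Linux re-enabled it because "lapic" was given */
};

/* True if `token` appears as a whole space-separated word in `cmdline`.
 * `token` must be non-empty. */
static bool cmdline_has(const char *cmdline, const char *token)
{
    size_t n = strlen(token);
    const char *p = cmdline;

    while ((p = strstr(p, token)) != NULL) {
        bool starts = (p == cmdline) || p[-1] == ' ';
        bool ends = p[n] == '\0' || p[n] == ' ';
        if (starts && ends)
            return true;
        p += n; /* skip a partial match, e.g. "lapic" inside "nolapic" */
    }
    return false;
}

/* Boot: follow the BIOS unless "lapic" forces the LAPIC on. */
static struct lapic_state boot_lapic(bool bios_enabled, const char *cmdline)
{
    struct lapic_state s = { bios_enabled, false };

    if (!bios_enabled && cmdline_has(cmdline, "lapic"))
        s.forced_on = true;
    return s;
}

/* Power-off: run lapic_shutdown() only if Linux forced the LAPIC on,
 * unless the hypothetical "nolapic_shutdown" option opts out. */
static bool want_lapic_shutdown(struct lapic_state s, const char *cmdline)
{
    return s.forced_on && !cmdline_has(cmdline, "nolapic_shutdown");
}
```

With "ro lapic" on a box whose BIOS disabled the LAPIC, `want_lapic_shutdown()` is true; adding the hypothetical "nolapic_shutdown" makes it false, which is the escape hatch this reporter would need. Systems that never pass "lapic" are unaffected either way.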
> another kernel parameter to disable lapic_shutdown?
That will work for me at least.
Maybe it is somehow possible to do more than one shutdown
attempt? Say, try to shutdown with lapic enabled, and if
that fails - disable lapic and retry.
Most probably this is crap and can't be done; it is just
that after the "acpi_power_off" message I can still type
characters on the console, so I assume the kernel is still
functional at that point. And if the kernel is alive, why
can't several shutdown attempts be performed?
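The retry idea can be sketched in user space, under the assumption (suggested only by the console still responding after the "acpi_power_off" message) that a failed S5 entry returns control to the kernel instead of hanging in SMM. `enter_s5_fn`, `poweroff_with_fallback()` and the two model firmware functions are hypothetical: one stands for the box in this report, the other for the D600.

```c
#include <stdbool.h>

/* Hypothetical hook: try to enter ACPI S5 with the LAPIC in the given
 * state; returns true if the (simulated) machine powered off. */
typedef bool (*enter_s5_fn)(bool lapic_enabled);

/* Firmware that only powers off with the LAPIC left on (this report). */
static bool firmware_needs_lapic_on(bool lapic_enabled)
{
    return lapic_enabled;
}

/* Firmware that only powers off after lapic_shutdown() (the D600 case). */
static bool firmware_needs_lapic_off(bool lapic_enabled)
{
    return !lapic_enabled;
}

/* Returns the attempt number that succeeded, or -1 if all attempts failed. */
static int poweroff_with_fallback(enter_s5_fn enter_s5, bool lapic_enabled)
{
    if (enter_s5(lapic_enabled))
        return 1;
    if (!lapic_enabled)
        return -1; /* nothing left to change */
    /* Stand-in for lapic_shutdown(): disable the LAPIC and retry. */
    return enter_s5(false) ? 2 : -1;
}
```

Trying with the LAPIC still enabled first, then falling back to disabling it, would cover both kinds of firmware seen in this thread. Whether real firmware returns cleanly after a failed attempt is exactly the open question.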
Stas, does it make any difference if you comment out everything before the if (...) statement in disable_local_APIC() in arch/i386/kernel/apic.c? It looks like the current code tries to disable the LAPIC twice...

Or change it to look like this:

void disable_local_APIC(void)
{
	unsigned long value;

	/*
	 * Disable APIC (implies clearing of registers
	 * for 82489DX!).
	 */
	if (enabled_via_apicbase) {
		unsigned int l, h;
		rdmsr(MSR_IA32_APICBASE, l, h);
		l &= ~MSR_IA32_APICBASE_ENABLE;
		wrmsr(MSR_IA32_APICBASE, l, h);
	} else {
		clear_local_APIC();
		value = apic_read(APIC_SPIV);
		value &= ~APIC_SPIV_APIC_ENABLED;
		apic_write_around(APIC_SPIV, value);
	}
}

> Or change it to look like this:
Unfortunately this changed nothing visible :(
Stas, could you try this version? It still works for me, but does not do a hard LAPIC shutdown... In this software-off state it still responds to NMI, so maybe it will work for you.

void disable_local_APIC(void)
{
	unsigned long value;

	clear_local_APIC();

	/*
	 * Disable APIC (implies clearing of registers
	 * for 82489DX!).
	 */
	value = apic_read(APIC_SPIV);
	value &= ~APIC_SPIV_APIC_ENABLED;
	apic_write_around(APIC_SPIV, value);

	/*
	if (enabled_via_apicbase) {
		unsigned int l, h;
		rdmsr(MSR_IA32_APICBASE, l, h);
		l &= ~MSR_IA32_APICBASE_ENABLE;
		wrmsr(MSR_IA32_APICBASE, l, h);
	}
	*/
}

Actually I did that already. In your previous version I tried to invert the if condition, like "if (!enabled_via_apicbase)", and that changed nothing; apparently it was effectively the same as what you now want me to try. Only bypassing both "disables" makes the shutdown work. Looking at the APIC startup code, so many things are done there that I am almost sure it ends up in a different state than it was in initially. Maybe disabling some of the startup code will help narrow that down?

Any luck if the local APIC is enabled, but nmi_watchdog is not enabled?

> any luck if the local apic is enabled, but nmi_watchdog is not enabled?
Unfortunately no luck.
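The two versions of disable_local_APIC() above differ in which enable bit they clear. A minimal user-space sketch of that bit arithmetic, with plain integers standing in for rdmsr()/wrmsr() and apic_read()/apic_write_around(); the macro values mirror the kernel's MSR_IA32_APICBASE_ENABLE (bit 11 of IA32_APIC_BASE, the global enable) and APIC_SPIV_APIC_ENABLED (bit 8 of the spurious-interrupt vector register, the software enable), while the function names are hypothetical:

```c
#include <stdint.h>

/* Same bit positions as the kernel's MSR_IA32_APICBASE_ENABLE and
 * APIC_SPIV_APIC_ENABLED macros. */
#define APICBASE_ENABLE   (UINT32_C(1) << 11) /* IA32_APIC_BASE: global enable */
#define SPIV_APIC_ENABLED (UINT32_C(1) << 8)  /* SPIV: software enable */

/* "Hard" disable, as in the enabled_via_apicbase branch: clear the
 * global enable bit in the low word of the IA32_APIC_BASE MSR. */
static uint32_t apicbase_hard_disable(uint32_t apicbase_lo)
{
    return apicbase_lo & ~APICBASE_ENABLE;
}

/* "Soft" disable, as in the second version: clear the software enable
 * bit in the spurious-interrupt vector register; the MSR is untouched. */
static uint32_t spiv_soft_disable(uint32_t spiv)
{
    return spiv & ~SPIV_APIC_ENABLED;
}
```

For example, an IA32_APIC_BASE low word of 0xFEE00900 (base 0xFEE00000, BSP flag, global enable) becomes 0xFEE00100 after the hard disable, while the soft disable leaves that MSR alone and only clears SPIV bit 8; per the comment above, the LAPIC in that software-off state still responds to NMI.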
Hmm, it looks like the bug has fixed itself and no longer appears in the latest -mm patches. What has changed? Does it still work on your D600?

Very strange... Closing? There is work on making the ACPI interpreter compliant with the spec, and I hope that is it... Thanks for your responses!

Hmm, actually the bug didn't fix itself completely; it has left some
work for the programmers still.
"poweroff" now works perfectly, that's great, but Alt-PrtSc-o is still
broken. It just locks up the machine until the NMI oopser gets nervous,
then it crashes.
Can this please be fixed too? The NMI oops is not in the logs, but if
you want to have a look, I can write it down by hand.
> There is work on making the ACPI interpreter compliant with the spec, and I hope that is
> it...
In case you are wondering, the fix happened between the 2.6.11-rc3-mm1
and 2.6.11-rc3-mm2.
> In case you are wondering, the fix happened between the 2.6.11-rc3-mm1
> and 2.6.11-rc3-mm2.
So it is fixed in the -mm tree only, and the base still fails?
Created attachment 4551
IRQ handling cleanup
Stas, could you try this patch on mm2 or any other version that failed?
mm3 introduced a bad hack which hopefully will never get into the base kernel...
> mm3 introduced a bad hack which hopefully will never get into the base kernel...
What kind of mm3?
I was talking about 2.6.11-rc3, which, by the time of writing this, doesn't
have the -mm3 patch AFAICS.
I'll try it on -mm1, I guess (probably tomorrow), and will let you know,
as well as how -rc4 goes.
Hello.

> So it is fixed in the -mm tree only, and the base still fails?

Exactly. 2.6.11-rc4 still fails.

> Stas, could you try this patch on mm2 or any other version that failed?

That patch makes no difference - the failed -mm1 still fails. Looking at the patch, I have difficulty imagining how it could help. It is a cleanup, so why would it help?

Stas, I think that http://www.ussg.iu.edu/hypermail/linux/kernel/0501.3/0467.html is what cured your problem. The following piece of code tries to do something similar.

===== drivers/acpi/sleep/poweroff.c 1.3 vs edited =====
--- 1.3/drivers/acpi/sleep/poweroff.c	2004-12-01 09:05:02 +03:00
+++ edited/drivers/acpi/sleep/poweroff.c	2005-02-14 15:51:34 +03:00
@@ -18,8 +18,12 @@
 	/* Some SMP machines only can poweroff in boot CPU */
 	set_cpus_allowed(current, cpumask_of_cpu(0));
 	acpi_wakeup_gpe_poweroff_prepare();
-	acpi_enter_sleep_state_prep(ACPI_STATE_S5);
-	ACPI_DISABLE_IRQS();
+	if (!irqs_disabled()) {
+		acpi_enter_sleep_state_prep(ACPI_STATE_S5);
+		ACPI_DISABLE_IRQS();
+	} else {
+		printk(KERN_WARNING "IRQs disabled. Skipping ACPI sleep prepare.\n");
+	}
 	acpi_enter_sleep_state(ACPI_STATE_S5);
 }

> is what cured your problem. The following piece of code tries to do something
> similar.
OK, I see. But since your patch (which I tried) doesn't
fix the problem, your guess is probably wrong.
In case you really need to find out which patch fixes the problem,
I think you can point me to a few patches from the -mm "broken-out" set, and I
will revert them one by one from the working kernel to find out
which one does the trick.
Since the mainline kernel still fails, I imagine it may be important
for you to figure that out.
But as for me, I am satisfied with -mm :) I have to use it anyway,
since it also has other bugfixes that are important for me.
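The irqs_disabled() guard in the poweroff.c change quoted above only decides whether the S5 preparation step runs; S5 is entered either way. A user-space model of that control flow, recording step names in a string, may make this easier to follow; `model_acpi_power_off()` and `record()` are hypothetical, and the step names merely label the real calls in the diff.

```c
#include <stdbool.h>
#include <string.h>

/* Append `step` to the comma-separated `trace` without overflowing `cap`. */
static void record(char *trace, size_t cap, const char *step)
{
    if (trace[0] != '\0')
        strncat(trace, ",", cap - strlen(trace) - 1);
    strncat(trace, step, cap - strlen(trace) - 1);
}

/* Hypothetical model of acpi_power_off() after the patch: each step name
 * stands for the call in the diff; nothing touches real hardware. */
static void model_acpi_power_off(bool irqs_disabled, char *trace, size_t cap)
{
    trace[0] = '\0';
    record(trace, cap, "bind_cpu0");        /* set_cpus_allowed(..., cpu 0) */
    record(trace, cap, "gpe_prepare");      /* acpi_wakeup_gpe_poweroff_prepare() */
    if (!irqs_disabled) {
        record(trace, cap, "prep_S5");      /* acpi_enter_sleep_state_prep(S5) */
        record(trace, cap, "disable_irqs"); /* ACPI_DISABLE_IRQS() */
    } else {
        record(trace, cap, "warn_skip");    /* printk(KERN_WARNING ...) */
    }
    record(trace, cap, "enter_S5");         /* acpi_enter_sleep_state(S5) */
}
```

Called with interrupts enabled, the trace is "bind_cpu0,gpe_prepare,prep_S5,disable_irqs,enter_S5"; with interrupts already disabled, the preparation is skipped and only the warning is added before entering S5.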
It would help me a lot if you removed the patch mentioned in the e-mail discussion above and reported whether the problem returns.

Do you mean the entire patch from Eric? I'll do that within a few days (the problematic PC is at work, so I can't run the tests very quickly), and in the meantime it would be nice if you attached the patch here. It looks like it was inlined in the message, so the formatting is probably lost, and it has typos, as another tester reported in that discussion. I can try patching it anyway, but if you have that patch handy, attaching it here will help :)

Created attachment 4593
patch to revert from mm2 to mm1
These are the changes in question from mm1 to mm2. Apply with -R.
Confirmed, this patch fixes the problem. I applied it to -rc4 and it works.

Created attachment 4609
ACPI power off cleanup patch
Stas, could you please try this patch?
The patch was tested and works. Why do you call it a cleanup if it makes the problem go away? :) However, it doesn't fix the Alt-PrtSc-o problem: that still does nothing but lock up the machine.

Created attachment 4648
ACPI power-off cleanup patch, take 2.
Problem is, these patches are ugly and will not pass through maintainers.
Please try one more (less ugly) patch... :)
This patch works too.

http://bugme.osdl.org/attachment.cgi?id=4696&action=view
Stas, bug #4041 seems to be relevant to your case; could you please check that the final patch works for you as well?

That patch contains typos and bugs; it doesn't compile. However, after some obvious fixing, it works.

Is the Alt-PrtSc-o problem fixed too?

Haven't checked, sorry. Till Monday.

Marking this bug as a duplicate of #4041. Thanks for your patience.

*** This bug has been marked as a duplicate of 4041 ***

OK, guys, this bug is still present in 2.6.12-rc6 and 2.6.12-rc5-mm2, AFAICS. You marked this entry as a duplicate of #4041, and #4041 is now closed too, but all this time nothing has been fixed, as far as I can tell. I think I have to re-open this entry. What is the real state of the problem currently?

#4041 is marked RESOLVED, not CLOSED, which means that a patch is available. It can be closed only when the patch appears in a stable kernel. Andrew Morton dropped the acpi-test patch from his -mm tree, and thus you see this bug again. Len Brown, maintainer of the acpi-test patch, hopes to be able to put all the patches back for inclusion in stable 2.6.13.

Shipped in 2.6.13-rc3 -- closing.