Latest working kernel version: Never happened when using a x86 (32bit) kernel
Earliest failing kernel version: 126.96.36.199 (x86_64)
Hardware Environment: Acer 4520 laptop, Turion X2 processor, Phoenix BIOS
When I try to shut down the computer, everything goes fine until the very end. It looks like it is about to power off (i.e. the keyboard LEDs blink), but then it just freezes. If I then press a key or touch the touchpad (an interrupt?), it shows 'Power off' and goes down.
After that, when I try to turn the computer on, it powers up but doesn't boot (it doesn't even show the first BIOS screen). I have to hit the power button again to make it go down, then turn it on once more, and then everything goes fine.
Nothing wrong happens when I reboot.
Steps to reproduce:
Shut down the computer and ta-da!
It started to happen when I moved from the 32-bit to the 64-bit kernel (2.6.27 series). I normally use the Gentoo patched kernel, on which the 'nolapic' parameter seems to solve the problem (noapic doesn't solve it).
The info I'll post here is all based on the original Linux kernel (vanilla), though. No external modules are used with it. I installed it and configured it based on default values (i.e. not using my old .config). For some reason, nolapic makes it fail at boot, due to some error with the SATA controller, which also happens when I use this .config with the Gentoo patched kernel.
I forgot to say, it's an NVIDIA MCP67 chipset
Created attachment 20228 [details]
Kernel config for original 188.8.131.52 kernel
Created attachment 20229 [details]
Kernel config for gentoo 2.6.28-r1 kernel (on which nolapic works)
Created attachment 20230 [details]
dmesg from original kernel 184.108.40.206
Created attachment 20231 [details]
lsmod from original 220.127.116.11 kernel
Created attachment 20232 [details]
I don't understand why you are trying to use "nolapic"
on an SMP system with an IOAPIC. Why is there any
mention of "nolapic" in this bug report?
No 32-bit kernel ever failed? What, exactly was the latest run?
No 64-bit kernel ever worked? What, exactly was the earliest attempted?
why is CONFIG_FB=y and CONFIG_DRM=n -- you don't run X?
(In reply to comment #7)
> I don't understand why you are trying to use "nolapic"
> on an SMP system with an IOAPIC. Why is there any
> mention of "nolapic" in this bug report?
> No 32-bit kernel ever failed? What, exactly was the latest run?
> No 64-bit kernel ever worked? What, exactly was the earliest attempted?
> why is CONFIG_FB=y and CONFIG_DRM=n -- you don't run X?
I just mentioned it because it solved the reported issue (probably by adding more issues, but the intention was to give a possible hint).
The latest 32-bit kernel I used was 2.6.27-gentoo-r8, which was also the first 64-bit kernel I used.
CONFIG_DRM=n is because I use the NVIDIA proprietary driver which, just to make it clear, is NOT installed on the vanilla kernel I used to make this report.
the key-press-to-continue thing sounds like whatever
timer the system is using for timeouts may have stopped
(or its IRQ stopped working)
The fact that the subsequent boot gets snarled suggests
that SMM is getting confused by the state of your system
on poweroff, and is leaving some bad state someplace
that does harm on subsequent power-on.
It is possible that the 32 and 64-bit kernels are using
If you can still boot the 32-bit kernel (say from a live CD)
that may give us a clue.
Some knobs to experiment with:
Created attachment 20234 [details]
dmesg using nmi_watchdog=0
With this option I have to hit a key 2 times for the computer to shut down
Created attachment 20235 [details]
dmesg using nolapic_timer
Fixed the problem
Created attachment 20236 [details]
dmesg using hpet=force
Didn't change anything (apparently)
Created attachment 20237 [details]
dmesg using idle=poll
Fixed the problem
For some reason, the power-on bug is not happening with either the vanilla or the Gentoo kernel, with or without any kernel parameter
From the tests, it seems that the box can be shut down correctly if C-states are disabled or if it works in tick mode (the box can't be switched to NOHZ mode when the boot option "nolapic_timer" is added).
Is this issue related with AMD C1E?
Does the issue still occur if CONFIG_CPU_IDLE is unset in the kernel configuration?
(In reply to comment #16)
> Hi, Tiago
> How about this issue if the CONFIG_CPU_IDLE is unset in kernel
Right on the spot, unsetting CONFIG_CPU_IDLE makes the bug disappear.
I tested it disabled on vanilla 18.104.22.168 and shut down 2 times successfully, then enabled it again and shutdown failed.
I downloaded an Ubuntu 32-bit live CD here, would a dmesg from it be useful?
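When comparing the attached configs, a quick way to confirm whether an option such as CONFIG_CPU_IDLE is set could look like this. It is a minimal sketch that assumes the usual .config layout (lines such as "CONFIG_FOO=y" and "# CONFIG_FOO is not set"); the function name config_value is hypothetical.

```python
def config_value(config_text, option):
    """Return the value of a kernel config option, or None if unset/absent."""
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options are recorded as a comment in .config files.
        if line == "# " + option + " is not set":
            return None
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None


# Example input written for illustration, not taken from the attachments.
sample = "CONFIG_CPU_IDLE=y\n# CONFIG_DRM is not set\nCONFIG_FB=y\n"
print(config_value(sample, "CONFIG_CPU_IDLE"))
print(config_value(sample, "CONFIG_DRM"))
```

Running it over both the failing and the working .config would show at a glance whether they differ on this option.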
Yakui to generate a C1E cleanup patch.
Created attachment 20360 [details]
patch: use the saved pm_idle function in course of shutdown
Will you please try the debug patch on the latest kernel and see whether the problem still exists?
With the debug patch, the saved pm_idle function is used during shutdown/suspend/resume.
(In reply to comment #19)
> Created an attachment (id=20360) [details]
> patch: use the saved pm_idle function in course of shutdown
> Will you please try the debug patch on the latest kernel and see whether the
> problem still exists?
> In the debug patch the saved pm_idle function will be used in course of
I applied it to kernel 2.6.29-r6, but it didn't fix the problem.
Is there any useful output from it that I can give you?
Thanks for the test.
Will you please double-check whether the problem is fixed when CONFIG_ACPI_IDLE is unset in the kernel configuration? (This should be done on 22.214.171.124)
In fact, when CONFIG_ACPI_IDLE is unset, the saved pm_idle (c1e_idle) is used to enter CPU idle during suspend/resume/shutdown. The debug patch uses the same pm_idle function during suspend/resume/shutdown, but the result is different.
Created attachment 20369 [details]
patch: use the idle_poll to enter CPU idle in course of suspend/resume/shutdown
Will you please try the updated patch and see whether it works for you?
With this patch, idle_poll is used to enter the idle state during suspend/resume/shutdown.
Sorry for the delay, I've been a little busy lately.
I just applied your patch to vanilla sources 2.6.29-r8 and it FIXES the bug. For some reason it didn't in 2.6.29-r6, probably because your patch was written against a later version of the file in question. Anyway, the bug seems to be solved; I tested it four times and they all worked flawlessly.
Thank you, Yakui.
Thanks for the test.
Two different patches are attached. Which patch did you use in your test? The one in comment #19 or the one in comment #22?
(In reply to comment #24)
> Hi, Tiago
> Thanks for the test.
> The two different patches are attached. Which patch is used in your test?
> In comment #19 or #comment #22?
the one from comment #19
> using C1E aware idle routine
> System has AMD C1E enabled
> CPU1: AMD Turion(tm) 64 X2 Mobile Technology TL-52 stepping 02
> Brought up 2 CPUs
2.6.27 added explicit AMD C1E support, which is supposed to deal
with the fact that AMD C1E breaks the TSC and LAPIC timer,
and to make "nolapic_timer" unnecessary on these systems.
But this system needs "nolapic_timer" in order to shut down cleanly,
so this new support appears to have a hole.
This doesn't look like an ACPI bug, it looks like an AMD timer bug.
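For anyone trying to confirm on a similar machine, the two C1E messages quoted above can be picked out of a saved dmesg with a few lines of script. This is only an illustrative sketch: the function name c1e_lines is hypothetical, and the marker strings are simply the ones quoted in this report, so other kernel versions may word them differently.

```python
# Marker strings copied from the dmesg lines quoted in this report.
C1E_MARKERS = ("using C1E aware idle routine", "System has AMD C1E enabled")


def c1e_lines(dmesg_text):
    """Return the dmesg lines that mention AMD C1E handling."""
    return [
        line.strip()
        for line in dmesg_text.splitlines()
        if any(marker in line for marker in C1E_MARKERS)
    ]


# Example input assembled from the lines quoted above.
sample_dmesg = (
    "using C1E aware idle routine\n"
    "System has AMD C1E enabled\n"
    "Brought up 2 CPUs\n"
)
for line in c1e_lines(sample_dmesg):
    print(line)
```

An empty result would suggest that the kernel did not select the C1E aware idle path on that boot, at least under the assumption that the message wording matches.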
I just installed the latest kernel (2.6.29) and this bug seems to be solved. Although we didn't reach the source of it, no one else except me confirmed it, so since it's not happening here anymore, I think we should close it, right?