Bug 12700 - poweroff hang unless "nolapic_timer" - Acer 4520 laptop, NVidia MCP67, Turion X2 processor
Status: RESOLVED CODE_FIX
Alias: None
Product: Timers
Classification: Unclassified
Component: Interval Timers
Hardware: All Linux
Importance: P1 normal
Assignee: Thomas Gleixner
URL:
Keywords:
Depends on:
Blocks: 56331
Reported: 2009-02-13 10:16 UTC by Tiago Santos
Modified: 2013-04-09 06:23 UTC

See Also:
Kernel Version: 2.6.28.4
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Kernel config for original 2.6.28.4 kernel (53.48 KB, application/octet-stream)
2009-02-13 10:20 UTC, Tiago Santos
Kernel config for gentoo 2.6.28-r1 kernel (on which nolapic works) (54.65 KB, application/octet-stream)
2009-02-13 10:21 UTC, Tiago Santos
dmesg from original kernel 2.6.28.4 (31.35 KB, text/x-log)
2009-02-13 10:21 UTC, Tiago Santos
lsmod from original 2.6.28.4 kernel (808 bytes, text/x-log)
2009-02-13 10:21 UTC, Tiago Santos
lspci (2.10 KB, text/x-log)
2009-02-13 10:21 UTC, Tiago Santos
dmesg using nmi_watchdog=0 (31.38 KB, text/x-log)
2009-02-13 13:15 UTC, Tiago Santos
dmesg using nolapic_timer (31.56 KB, text/x-log)
2009-02-13 13:15 UTC, Tiago Santos
dmesg using hpet=force (31.37 KB, text/x-log)
2009-02-13 13:16 UTC, Tiago Santos
dmesg using idle=poll (31.19 KB, text/x-log)
2009-02-13 13:16 UTC, Tiago Santos
patch: use the saved pm_idle function in course of shutdown (1.54 KB, patch)
2009-02-25 00:06 UTC, ykzhao
patch: use the idle_poll to enter CPU idle in course of suspend/resume/shutdown (1.15 KB, patch)
2009-02-25 18:15 UTC, ykzhao

Description Tiago Santos 2009-02-13 10:16:40 UTC
Latest working kernel version: Never happened when using a x86 (32bit) kernel
Earliest failing kernel version: 2.6.28.4 (x86_64)
Distribution: gentoo
Hardware Environment: Acer 4520 laptop, Turion X2 processor, Phoenix BIOS
Software Environment:

Problem Description:
When I try to shut down the computer, everything goes fine until the very end. It looks like it is about to power off (i.e. the keyboard LEDs blink), but then it just freezes. If I then press a key or touch the touchpad (an interrupt?), it shows 'Power off' and goes down.
After that, when I try to turn the computer on, it powers up but doesn't boot (it doesn't even show the first BIOS screen). I must hit the power button again for it to go down and turn on again, and then everything goes fine.
Nothing wrong happens when I reboot.

Steps to reproduce:
Shutdown the computer and ta-da!

Additional info:
It started to happen when I moved from the 32-bit to the 64-bit kernel (2.6.27 series). I normally use the gentoo patched kernel, on which the 'nolapic' parameter seems to solve the problem (noapic doesn't solve it).

The info I'll post here is all based on the original (vanilla) Linux kernel, though. No external modules are used with it. I installed it and configured it based on default values (i.e. not using my old .config). For some reason, nolapic makes it fail at boot due to some error with the SATA controller, which also happens when I use this .config with the gentoo patched kernel.
Comment 1 Tiago Santos 2009-02-13 10:18:08 UTC
I forgot to say, it's an NVidia MCP67 chipset.
Comment 2 Tiago Santos 2009-02-13 10:20:28 UTC
Created attachment 20228 [details]
Kernel config for original 2.6.28.4 kernel
Comment 3 Tiago Santos 2009-02-13 10:21:00 UTC
Created attachment 20229 [details]
Kernel config for gentoo 2.6.28-r1 kernel (on which nolapic works)
Comment 4 Tiago Santos 2009-02-13 10:21:21 UTC
Created attachment 20230 [details]
dmesg from original kernel 2.6.28.4
Comment 5 Tiago Santos 2009-02-13 10:21:37 UTC
Created attachment 20231 [details]
lsmod from original 2.6.28.4 kernel
Comment 6 Tiago Santos 2009-02-13 10:21:50 UTC
Created attachment 20232 [details]
lspci
Comment 7 Len Brown 2009-02-13 11:12:21 UTC
I don't understand why you are trying to use "nolapic"
on an SMP system with an IOAPIC.  Why is there any
mention of "nolapic" in this bug report?

No 32-bit kernel ever failed?  What, exactly was the latest run?
No 64-bit kernel ever worked?  What, exactly was the earliest attempted?

why is CONFIG_FB=y and CONFIG_DRM=n -- you don't run X?
Comment 8 Tiago Santos 2009-02-13 11:25:01 UTC
(In reply to comment #7)
> I don't understand why you are trying to use "nolapic"
> on an SMP system with an IOAPIC.  Why is there any
> mention of "nolapic" in this bug report?
> 
> No 32-bit kernel ever failed?  What, exactly was the latest run?
> No 64-bit kernel ever worked?  What, exactly was the earliest attempted?
> 
> why is CONFIG_FB=y and CONFIG_DRM=n -- you don't run X?
> 

I just mentioned it because it solved the reported issue (probably by adding more issues, but the intention was to give a possible hint).
The latest 32-bit kernel I used was 2.6.27-gentoo-r8, which was also the first 64-bit kernel I used.
CONFIG_DRM=n is because I use the NVidia proprietary driver which, just to make it clear, is NOT installed on the vanilla kernel I used to make this report.
Comment 9 Len Brown 2009-02-13 12:15:26 UTC
the key-press-to-continue thing sounds like whatever
timer the system is using for timeouts may have stopped
(or its IRQ stopped working)

The fact that the subsequent boot gets snarled suggests
that SMM is getting confused by the state of your system
on poweroff, and is leaving some bad state someplace
that does harm on subsequent power-on.

It is possible that the 32 and 64-bit kernels are using
different timers.

If you can still boot the 32-bit kernel (say from a live CD)
that may give us a clue.

Some knobs to experiment with:

nmi_watchdog=0
nolapic_timer
hpet=force
idle=poll
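
When testing options like these one at a time, it helps to confirm which of them the running kernel was actually booted with. The following is only a minimal sketch, assuming the boot parameters are readable from /proc/cmdline; the helper name `active_knobs` is hypothetical and not part of any kernel tooling.

```python
# The four options suggested above, matched as whole command-line tokens.
KNOBS = ("nmi_watchdog=0", "nolapic_timer", "hpet=force", "idle=poll")

def active_knobs(cmdline, knobs=KNOBS):
    """Return the knobs that appear as whole tokens in a kernel command line."""
    tokens = cmdline.split()
    return [k for k in knobs if k in tokens]

if __name__ == "__main__":
    try:
        # /proc/cmdline holds the parameters the running kernel was booted with.
        with open("/proc/cmdline") as f:
            print(active_knobs(f.read()))
    except OSError:
        print("could not read /proc/cmdline")
```

Matching whole tokens keeps "nolapic" from being mistaken for "nolapic_timer", which matters here because the two options behave very differently.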
Comment 10 Tiago Santos 2009-02-13 13:15:18 UTC
Created attachment 20234 [details]
dmesg using nmi_watchdog=0

With this option I have to hit a key 2 times for the computer to shut down
Comment 11 Tiago Santos 2009-02-13 13:15:53 UTC
Created attachment 20235 [details]
dmesg using nolapic_timer

Fixed the problem
Comment 12 Tiago Santos 2009-02-13 13:16:32 UTC
Created attachment 20236 [details]
dmesg using hpet=force

Didn't change anything (apparently)
Comment 13 Tiago Santos 2009-02-13 13:16:50 UTC
Created attachment 20237 [details]
dmesg using idle=poll

Fixed the problem
Comment 14 Tiago Santos 2009-02-13 13:18:13 UTC
For some reason, the power-on bug is no longer happening, with either the vanilla or the gentoo kernel, with any or no kernel parameter.
Comment 15 ykzhao 2009-02-15 17:45:46 UTC
Hi, Venki
    From the test it seems that the box can be shut down correctly if the C-state is disabled or if it works in tick mode (the box can't be switched to NOHZ mode when the boot option "nolapic_timer" is added).
    Is this issue related to AMD C1E?
    Thanks.
    
Comment 16 ykzhao 2009-02-15 17:46:51 UTC
Hi, Tiago
    How about this issue if the CONFIG_CPU_IDLE is unset in kernel configuration?
    Thanks.
Comment 17 Tiago Santos 2009-02-15 21:37:51 UTC
(In reply to comment #16)
> Hi, Tiago
>     How about this issue if the CONFIG_CPU_IDLE is unset in kernel
> configuration?
>     Thanks.
> 

Right on the spot, unsetting CONFIG_CPU_IDLE makes the bug disappear.
I tested with it disabled on vanilla 2.6.28.5 and shut down twice successfully, then enabled it again and shutdown failed.
I downloaded an Ubuntu 32-bit live CD here; would a dmesg from it be useful?
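
To double check which way an option such as CONFIG_CPU_IDLE is set in a given .config before comparing shutdown results, a small parser can help. This is a minimal sketch, assuming the usual kernel .config syntax ("CONFIG_X=y" for a set option, "# CONFIG_X is not set" for an unset one); the function name `config_value` is hypothetical.

```python
def config_value(config_text, option):
    """Return the value of a kernel config option, or None if it is unset or absent."""
    for line in config_text.splitlines():
        line = line.strip()
        # A set option looks like "CONFIG_CPU_IDLE=y"; the "=" guards against
        # matching longer names such as "CONFIG_CPU_IDLE_GOV_LADDER".
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
        # An explicitly unset option is written as a comment line.
        if line == "# " + option + " is not set":
            return None
    return None
```

For example, `config_value(open(".config").read(), "CONFIG_CPU_IDLE")` would return "y" for the failing configuration described above and None for the one where the option was unset.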
Comment 18 Zhang Rui 2009-02-23 17:15:29 UTC
yakui to generate a c1e cleanup patch.
Comment 19 ykzhao 2009-02-25 00:06:36 UTC
Created attachment 20360 [details]
patch: use the saved pm_idle function in course of shutdown

Will you please try the debug patch on the latest kernel and see whether the problem still exists?
   In the debug patch the saved pm_idle function will be used in course of shutdown/suspend/resume.
    Thanks.
Comment 20 Tiago Santos 2009-02-25 12:33:27 UTC
(In reply to comment #19)
> Created an attachment (id=20360) [details]
> patch: use the saved pm_idle function in course of shutdown
> 
> Will you please try the debug patch on the latest kernel and see whether the
> problem still exists?
>    In the debug patch the saved pm_idle function will be used in course of
> shutdown/suspend/resume.
>     Thanks.
> 

I applied it on kernel 2.6.29-r6, but it didn't fix the problem.
Is there any useful output from it that I can give you?
Comment 21 ykzhao 2009-02-25 18:05:35 UTC
Hi, Tiago
    Thanks for the test.
    Will you please double check whether the problem is fixed when CONFIG_ACPI_IDLE is unset in the kernel configuration? (This should be done on 2.6.28.5.)
    In fact, when CONFIG_ACPI_IDLE is unset, the saved pm_idle (c1e_idle) is used to enter CPU idle in the course of suspend/resume/shutdown. The debug patch uses the same pm_idle function in the course of suspend/resume/shutdown, yet the result is different.
    Thanks.
Comment 22 ykzhao 2009-02-25 18:15:07 UTC
Created attachment 20369 [details]
patch: use the idle_poll to enter CPU idle in course of suspend/resume/shutdown

Will you please try the updated patch and see whether it works for you?
   In this patch the idle_poll will be used to enter the idle state in course of suspend/resume/shutdown.
   Thanks.
Comment 23 Tiago Santos 2009-03-14 15:12:43 UTC
Sorry for the delay, I've been a little busy lately.

I just applied your patch to vanilla sources 2.6.29-r8 and it FIXES the bug. For some reason it didn't in 2.6.29-r6, probably because your patch was written against a later version of the file in question. Anyway, the bug seems to be solved; I tested it four times and they all worked flawlessly.
Thank you, Yakui.

Tiago
Comment 24 ykzhao 2009-03-15 18:04:03 UTC
Hi, Tiago
    Thanks for the test.
    Two different patches are attached. Which patch was used in your test? The one in comment #19 or comment #22?
    Thanks.
Comment 25 Tiago Santos 2009-03-15 18:08:45 UTC
(In reply to comment #24)
> Hi, Tiago
>     Thanks for the test.
>     The two different patches are attached. Which patch is used in your test?
> In comment #19 or #comment #22?
>     Thanks.
> 
the one from comment #19
Comment 26 Len Brown 2009-03-15 19:03:21 UTC
> using C1E aware idle routine
...
> System has AMD C1E enabled
> CPU1: AMD Turion(tm) 64 X2 Mobile Technology TL-52 stepping 02
> Brought up 2 CPUs

2.6.27 added explicit AMD C1E support, which is supposed to deal
with the fact that AMD C1E breaks the TSC and LAPIC timer,
and make "nolapic_timer" unnecessary on these systems.

But this system needs "nolapic_timer" in order to shut down cleanly,
so this new support appears to have a hole.

This doesn't look like an ACPI bug, it looks like an AMD timer bug.
Comment 27 Tiago Santos 2009-03-27 02:26:42 UTC
I just installed the latest kernel (2.6.29) and this bug seems to be solved. Although we didn't find the source of it, no one except me ever confirmed it, so since it's not happening here anymore, I think we should close it, right?
