Bug 12700
Summary: | poweroff hang unless "nolapic_timer" - Acer 4520 laptop, NVidia MCP67, Turion X2 processor | ||
---|---|---|---|
Product: | Timers | Reporter: | Tiago Santos (ircalf) |
Component: | Interval Timers | Assignee: | Thomas Gleixner (tglx) |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | CC: | acpi-bugzilla, venki |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.28.4 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 56331 | ||
Attachments:
Kernel config for original 2.6.28.4 kernel
Kernel config for gentoo 2.6.28-r1 kernel (on which nolapic works)
dmesg from original kernel 2.6.28.4
lsmod from original 2.6.28.4 kernel
lspci
dmesg using nmi_watchdog=0
dmesg using nolapic_timer
dmesg using hpet=force
dmesg using idle=poll
patch: use the saved pm_idle function in course of shutdown
patch: use the idle_poll to enter CPU idle in course of suspend/resume/shutdown
Description
Tiago Santos
2009-02-13 10:16:40 UTC
I forgot to say, it's an NVidia MCP67 chipset.

Created attachment 20228 [details]
Kernel config for original 2.6.28.4 kernel
Created attachment 20229 [details]
Kernel config for gentoo 2.6.28-r1 kernel (on which nolapic works)
Created attachment 20230 [details]
dmesg from original kernel 2.6.28.4
Created attachment 20231 [details]
lsmod from original 2.6.28.4 kernel
Created attachment 20232 [details]
lspci
I don't understand why you are trying to use "nolapic"
on an SMP system with an IOAPIC. Why is there any
mention of "nolapic" in this bug report?

No 32-bit kernel ever failed? What, exactly was the latest run?
No 64-bit kernel ever worked? What, exactly was the earliest attempted?

why is CONFIG_FB=y and CONFIG_DRM=n -- you don't run X?

(In reply to comment #7)
> I don't understand why you are trying to use "nolapic"
> on an SMP system with an IOAPIC. Why is there any
> mention of "nolapic" in this bug report?
>
> No 32-bit kernel ever failed? What, exactly was the latest run?
> No 64-bit kernel ever worked? What, exactly was the earliest attempted?
>
> why is CONFIG_FB=y and CONFIG_DRM=n -- you don't run X?
>
Just mentioned because it solved the reported issue (probably by adding more issues, but the intention was to give a possible hint).

The latest 32-bit kernel I used was 2.6.27-gentoo-r8, which was also the first 64-bit kernel I used.

CONFIG_DRM=no is because I use the NVidia proprietary driver which, just to make clear, is NOT installed on the vanilla kernel I used to make this report.

the key-press-to-continue thing sounds like whatever timer the system is using for timeouts may have stopped (or its IRQ stopped working).

The fact that the subsequent boot gets snarled suggests that SMM is getting confused by the state of your system on poweroff, and is leaving some bad state someplace that does harm on subsequent power-on.

It is possible that the 32 and 64-bit kernels are using different timers. If you can still boot the 32-bit kernel (say from a live CD) that may give us a clue.

Some knobs to experiment with:
nmi_watchdog=0
nolapic_timer
hpet=force
idle=poll

Created attachment 20234 [details]
dmesg using nmi_watchdog=0
With this option I have to hit a key 2 times for the computer to shut down
Created attachment 20235 [details]
dmesg using nolapic_timer
Fixed the problem
Created attachment 20236 [details]
dmesg using hpet=force
Didn't change anything (apparently)
Created attachment 20237 [details]
dmesg using idle=poll
Fixed the problem
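For anyone repeating these tests: the four knobs tried above are kernel boot parameters, so it helps to confirm which ones were actually in effect for a given boot before comparing dmesg output. A minimal Python sketch, assuming a Linux system where the running kernel's command line can be read from /proc/cmdline; the helper name `active_knobs` and the `KNOBS` tuple are illustrative only and not part of any kernel tooling.

```python
# Boot parameters suggested in this thread. The tuple and the helper
# name below are hypothetical, chosen only for this illustration.
KNOBS = ("nmi_watchdog=0", "nolapic_timer", "hpet=force", "idle=poll")


def active_knobs(cmdline, knobs=KNOBS):
    """Return the knobs that appear as whole tokens on a kernel command line."""
    # Split on whitespace so "nolapic" does not match "nolapic_timer".
    tokens = cmdline.split()
    return [k for k in knobs if k in tokens]


# Example with a literal command line (the root= value is made up):
print(active_knobs("root=/dev/sda1 nolapic_timer idle=poll"))
```

On a live system the same function can be fed the contents of /proc/cmdline, for example `active_knobs(open("/proc/cmdline").read())`.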
For some reason, the power-on bug is no longer happening, with either the vanilla or the gentoo kernel, with or without any kernel parameter.

Hi, Venki
From the test it seems that the box can be shut down correctly if the C-state is disabled or it works in tick mode (the box can't be switched to NOHZ mode when the boot option "nolapic_timer" is added).
Is this issue related to AMD C1E?
Thanks.

Hi, Tiago
How about this issue if the CONFIG_CPU_IDLE is unset in kernel configuration?
Thanks.

(In reply to comment #16)
> Hi, Tiago
> How about this issue if the CONFIG_CPU_IDLE is unset in kernel
> configuration?
> Thanks.
>
Right on the spot, unsetting CONFIG_CPU_IDLE makes the bug disappear.
Tested with it disabled on vanilla 2.6.28.5 and shut down 2 times successfully, then enabled it again and shutdown failed.

I downloaded an ubuntu 32-bit live-cd here, would a dmesg of it be useful?

yakui to generate a c1e cleanup patch.

Created attachment 20360 [details]
patch: use the saved pm_idle function in course of shutdown
Will you please try the debug patch on the latest kernel and see whether the problem still exists?
In the debug patch the saved pm_idle function will be used in course of shutdown/suspend/resume.
Thanks.
(In reply to comment #19)
> Created an attachment (id=20360) [details]
> patch: use the saved pm_idle function in course of shutdown
>
> Will you please try the debug patch on the latest kernel and see whether the
> problem still exists?
> In the debug patch the saved pm_idle function will be used in course of
> shutdown/suspend/resume.
> Thanks.
>
I applied it on kernel 2.6.29-r6, but it didn't fix the problem.
Is there any useful output from it I can give you?

Hi, Tiago
Thanks for the test.
Will you please double check whether the problem can be fixed when the CONFIG_ACPI_IDLE is unset in kernel configuration? (This should be done on 2.6.28.5)
In fact when the CONFIG_ACPI_IDLE is unset, the saved pm_idle (c1e_idle) is used to enter the CPU idle in course of suspend/resume/shutdown. And in the debug patch the same pm_idle function is used in course of suspend/resume/shutdown. But the result is different.
Thanks.

Created attachment 20369 [details]
patch: use the idle_poll to enter CPU idle in course of suspend/resume/shutdown
Will you please try the updated patch and see whether it works for you?
In this patch the idle_poll will be used to enter the idle state in course of suspend/resume/shutdown.
Thanks.
Sorry for the delay, I've been a little busy lately.
I just applied your patch to vanilla sources 2.6.29-r8 and it FIXES the bug. For some reason it didn't in 2.6.29-r6, probably because your patch was written against a later version of the file in question.
Anyway, the bug seems to be solved, I tested it four times and they all worked flawlessly.
Thank you, Yakui.
Tiago

Hi, Tiago
Thanks for the test.
The two different patches are attached. Which patch is used in your test?
In comment #19 or comment #22?
Thanks.

(In reply to comment #24)
> Hi, Tiago
> Thanks for the test.
> The two different patches are attached. Which patch is used in your test?
> In comment #19 or comment #22?
> Thanks.
>
the one from comment #19

> using C1E aware idle routine ...
> System has AMD C1E enabled
> CPU1: AMD Turion(tm) 64 X2 Mobile Technology TL-52 stepping 02
> Brought up 2 CPUs

2.6.27 added explicit AMD C1E support, which is supposed to deal with the fact that AMD C1E breaks the TSC and LAPIC timer, and make "nolapic_timer" unnecessary on these systems.

But this system needs "nolapic_timer" in order to shut down cleanly, so this new support appears to have a hole.

This doesn't look like an ACPI bug, it looks like an AMD timer bug.

I just installed the latest kernel (2.6.29) and this bug seems to be solved. Although we didn't reach the source of it, no one else except me confirmed it, so since it's not happening here anymore, I think we should close it, right?