Kernel Bug Tracker – Bug 21952
resume hangs unless intel_idle.max_cstate=3 or maxcpus=1 - Samsung N145, N148, N150, N210, Lenovo S10-3
Last modified: 2013-08-15 21:39:41 UTC
Samsung N150 Fails to Resume after Suspend/Hibernate on Kernel Version 2.6.35.
Please see https://bugs.launchpad.net/ubuntu/+source/linux/+bug/640100 for further information.
Also affects Samsung N148 Plus (same hardware as in N150 Plus but without preinstalled Windows)
Ditto for Samsung N210 netbook. Very difficult to debug, too.
Suspend (to RAM) appears to work, but the wakeup simply hangs with a dark screen. No serial ports, no way to attach one either (USB is no good for this).
Same problem for Samsung N150 Plus.
With 18.104.22.168 (Kubuntu Maverick) suspend does not work, but with 22.214.171.124 it works. Is there some way to get useful information about the problem?
I tried the procedure described here: https://wiki.ubuntu.com/DebuggingKernelSuspend
but it reports a different hash every time.
If you need someone to run a test on this platform, I offer my support.
Problem exists with 2.6.34-2.6.36 kernels.
On 2.6.33 no hang-ups detected for now.
2.6.27 not tested (since it is not released).
I don't know if this can help to solve the problem, but on the Ubuntu bug tracking system I found a bug that covered the same problem in June. https://bugs.launchpad.net/ubuntu/+source/linux/+bug/594885
The focus was on the setting CONFIG_PM_DISABLE_CONSOLE.
This email contains the Ubuntu patch that creates the problem:
In this thread the Ubuntu kernel team dropped this patch from the kernel build, but now we suffer the same problem...
Davide, that bug is definitely not the same: in bug 594885 the only problem is that no picture is shown on the display while everything else works, whereas the current bug is about a completely hung system where nothing works except the power button. It is a much bigger problem, with a different (still unknown) cause.
Well, on my N210 at least, that's what happens -- no backlight, and total system lockup on resume. ALT-SYSRQ-X buttons don't work either.
Right Юрий Чудновский! Sorry for the mistake...
Re-tested the 2.6.34 kernel - it seems to work well. Maybe I mistakenly tested a kernel that was not from the 2.6.34 branch earlier. I confirm the regression begins within the 2.6.35 branch.
Looks very similar to the symptoms from this more generic bug:
Also affects Samsung N210 Plus (Ubuntu 10.10 network edition, kernel 2.6.35-23-generic; dual-boot with Windows 7 Starter)
I had this issue on my netbook samsung n210 with 126.96.36.199 kernel.
Appending intel_idle.max_cstate=0 to the kernel boot parameters solves the issue. I wonder what this means and how it solves the issue! Can anyone throw some light?
Appending intel_idle.max_cstate=0 to the kernel boot parameters solved the issue for my samsung n210 as well (using 188.8.131.52 kernel now).
Setting max_cstate=0 essentially disables aggressive power-saving on the CPU. So battery life will likely be reduced.
Why does this help? Dunno, but it does indicate some kind of race condition in the kernel. With normal settings, operations can have higher latency as the CPU transitions from lower (slower) cstates up to cstate-0 (the fastest).
Disabling cstates means the timing is always fast, and predictable.
So.. possibly a better fix is to have the suspend script read and save the current max_cstate value, then set it to zero before suspending. Then on resume, the resume script could restore the old value.
This way, everything ought to work well enough. Ideally, I expect this should really be done in-kernel, since that's where the bug is.
I am able to set intel_idle.max_cstate=3 on my Samsung N150+ without triggering this bug, and it's not until intel_idle.max_cstate=4 that I run into problems. That indicates to me that the problem isn't so much that the intel_idle driver needs to be disabled, but rather that cstate-4 is glitchy.
Power management is a mystery to me so I don't know whether there's much to be gained by running with max_cstate=3 rather than max_cstate=0, but I hope this can at least help someone more knowledgeable narrow down the problem.
Linux tinytronix 2.6.35-24-generic #43~ppa1~loms~maverick-Ubuntu SMP Fri Dec 24 18:15:40 UTC 2010 i686 GNU/Linux
BOOT_IMAGE=/boot/vmlinuz-2.6.35-24-generic root=UUID=66817f18-a5c7-4d9e-8a29-3220974cb618 ro intel_idle.max_cstate=3 quiet splash
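When comparing reports it helps to confirm which idle-related parameters a given boot really used. The following is only a minimal sketch, assuming a POSIX shell; the helper name get_boot_param is made up for illustration and is not part of any distribution tool. It returns the first match, which may differ from the kernel's own handling if a parameter is given twice.

```shell
#!/bin/sh
# Hypothetical helper: print the value of a kernel boot parameter.
# Usage: get_boot_param NAME [CMDLINE]
# If CMDLINE is omitted, the running kernel's /proc/cmdline is read.
# The body runs in a subshell so that "set -f" and the variables stay local.
get_boot_param() (
    name=$1
    if [ $# -ge 2 ]; then
        cmdline=$2
    else
        cmdline=$(cat /proc/cmdline)
    fi
    set -f                      # no pathname expansion while word-splitting
    for word in $cmdline; do
        case $word in
            "$name"=*)
                # Strip everything up to and including the first "="
                printf '%s\n' "${word#*=}"
                return 0
                ;;
        esac
    done
    return 1                    # parameter not present
)
```

Running `get_boot_param intel_idle.max_cstate` on the system above would be expected to print the limit that was passed at boot, and to return a non-zero status when the parameter was not given.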
Is the problem still present in 2.6.37?
I am experiencing the problem with my Samsung N145 Plus and kernel 2.6.37, too. I noticed that disabling Hyper Threading in BIOS or at runtime (echo 0 > /sys/devices/system/cpu/cpu1/online) helps. Can someone confirm this? Furthermore all the devices mentioned in topic seem to have Hyper Threading capable processors.
Confirmed, disabling hyperthreading in BIOS resolves the issue without requiring intel_idle.max_cstate to be specified, on 2.6.35 as well.
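For anyone who wants to repeat the runtime variant of this test, here is a small sketch of the sysfs write mentioned above. The function name set_cpu_online and the optional third argument (an alternative sysfs root, useful for dry runs) are assumptions made for illustration. Which logical CPU is the hyperthread sibling is also an assumption; on the single-core netbooks in this report it is taken to be cpu1.

```shell
#!/bin/sh
# Hypothetical helper: take a logical CPU offline (0) or bring it back (1).
# Usage: set_cpu_online CPU 0|1 [SYSFS_CPU_ROOT]
set_cpu_online() {
    cpu=$1
    state=$2
    root=${3:-/sys/devices/system/cpu}
    ctl="$root/cpu$cpu/online"

    case $state in
        0|1) ;;
        *) echo "state must be 0 or 1" >&2; return 2 ;;
    esac

    # The control file is missing for CPUs that cannot be hot-plugged,
    # and is only writable by root.
    if [ ! -w "$ctl" ]; then
        echo "cannot write $ctl" >&2
        return 1
    fi

    printf '%s\n' "$state" > "$ctl"
}
```

As root, `set_cpu_online 1 0` before suspending and `set_cpu_online 1 1` after resume would correspond to the workaround described here.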
please show the output from
dmesg |grep idle
grep . /sys/devices/system/cpu/cpu*/cpuidle/*/*
in the default configuration, and also when
booting with intel_idle.max_cstate=0 (to fall back to ACPI).
does booting with "maxcpus=1" also work around the problem?
does booting with "nolapic_timer" help?
If yes, does the kernel being tested include the
patch from bug #21032?
please show the lspci output for each of the failing systems.
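The idle data requested above could be gathered in one go with something like the sketch below. The function names dump_cpuidle and collect_idle_report are made up for illustration, and the optional argument (an alternative sysfs root) exists only so the helper can be tried against a copy of the tree. The extra /dev/null argument makes grep print file names even when only one file matches.

```shell
#!/bin/sh
# Hypothetical helper: print every cpuidle attribute as "path:value".
# Returns non-zero when there is no cpuidle data below the given root.
dump_cpuidle() {
    root=${1:-/sys/devices/system/cpu}
    grep . "$root"/cpu*/cpuidle/*/* /dev/null 2>/dev/null
}

# Hypothetical helper: collect the pieces asked for in this comment.
collect_idle_report() {
    echo "== dmesg | grep idle =="
    dmesg 2>/dev/null | grep idle \
        || echo "(no matching dmesg lines, or dmesg not readable)"
    echo "== cpuidle sysfs =="
    dump_cpuidle "$@" || echo "(no cpuidle data in sysfs)"
    echo "== /proc/cmdline =="
    cat /proc/cmdline
}
```

Running `collect_idle_report > idle-default.txt` once with the default configuration and once with intel_idle.max_cstate=0 would give two files that can be attached here.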
Created attachment 44232 [details]
lspci -v (Samsung N150+)
Created attachment 44242 [details]
default boot parameters
Created attachment 44252 [details]
default boot parameters, hyperthreading disabled in BIOS
Created attachment 44262 [details]
Created attachment 44272 [details]
Created attachment 44282 [details]
Created attachment 44292 [details]
'maxcpus=1' and 'nolapic_timer' both work around the issue on 2.6.35. I don't know whether the patch has been applied.
> 00:00.0 Host bridge: Intel Corporation N10 Family DMI Bridge
Intel NM10. Good to know.
> intel_idle: lapic_timer_reliable_states 0x6
This should be 0x2
This means that your kernel (vmlinuz-2.6.35-24-generic)
is older than upstream 184.108.40.206, which is when the patch
from bug #21032 shipped.
However, I don't think that patch will actually help here.
That is because comment #27 shows that intel_idle.max_cstate=3
fixes your system, and that disables ATM-C4, yet leaves
So the problem here is with ATM-C4.
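To read the lapic_timer_reliable_states value quoted above, here is a small sketch that lists the set bits. It assumes that bit n of the mask stands for C-state n, which is how I understand the intel_idle driver to use it; the function name decode_cstate_mask is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: list the C-states whose bits are set in a mask.
# Assumption: bit n of the mask corresponds to C-state n.
# Usage: decode_cstate_mask 0x6
decode_cstate_mask() {
    mask=$(( $1 ))      # accepts decimal or 0x-prefixed hex
    bit=0
    out=
    while [ "$mask" -gt 0 ]; do
        if [ $(( mask & 1 )) -eq 1 ]; then
            out="$out${out:+ }C$bit"
        fi
        mask=$(( mask >> 1 ))
        bit=$(( bit + 1 ))
    done
    printf '%s\n' "$out"
}
```

Under that assumption 0x6 decodes to "C1 C2" and 0x2 to "C1", i.e. the older kernel treats the LAPIC timer as reliable in one more state than the patched one does.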
> intel_idle.max_cstate=0 (the acpi_idle case)
No output from "# grep . /sys/devices/system/cpu/cpu*/cpuidle/*/*" ?
If that is the case, then ACPI C-states deeper than C1
are somehow disabled on this system.
Is it running with default BIOS SETUP settings?
Do you see the same thing when on AC vs on Battery?
> nolapic_timer output in comment #28
The output shows that with nolapic_timer you are not entering
*any* c-states, not even c1; instead you are polling.
It looks like something is broken related to the nolapic_timer
option -- need to look into that; because it would otherwise
implicate the lapic timer; but here it nukes all the c-states,
which doesn't tell us anything about the lapic timer.
Created attachment 44302 [details]
patch vs 2.6.37
please try Shaohua's broadcast clock event patch from 2.6.38-rc1
This version should apply cleanly to 2.6.37
Created attachment 44312 [details]
patch vs 2.6.36
Here is the same patch, back-ported to apply cleanly to 220.127.116.11
Created attachment 44322 [details]
patch vs 18.104.22.168
Here is the same patch, back-ported to apply cleanly to 22.214.171.124
Created attachment 44382 [details]
lspci -v output on Samsung N145 Plus
Created attachment 44392 [details]
dmesg|grep idle and cpuidle sysfs with default configuration
Created attachment 44402 [details]
dmesg|grep idle with intel_idle.max_cstate=0
there's no data for cpuidle in sysfs
Created attachment 44412 [details]
cpuidle sysfs with nolapic_timer
The patch from bug #21032 should already be applied, as this is 2.6.37.
maxcpus=1 as well as nolapic_timer (obviously because it disables the c-states) help.
The patch from #31 does not fix the problem in my case.
The patch from #33 didn't fix it for me, with 126.96.36.199.
I am not sure this is the same issue or not (pls ignore if not) but I just wanted to make you aware of a similar 2.6.35 regression that went away in 2.6.36:
2.6.37 is still good on the hardware in question.
I've been debugging this issue a little with an N150. It's still present as of v2.6.38-rc6.
In addition to what's been reported here, I've observed that acpi_skip_timer_override and nohpet seem to make this issue go away. I've traced it down to something going wrong when bringing the secondary logical CPU online. When the hpet code receives the CPU_ONLINE notification the primary CPU schedules some work on the secondary CPU and waits for it to complete, but the work is never getting executed. The secondary CPU is coming online and executing instructions, and I haven't isolated exactly where it hangs.
I've also noticed that this problem seems to be timing sensitive, so it's entirely possible that some of the command-line options that "fix" the issue just alter the timing enough to mask it.
Let me know if there's anything you want me to try, and I'll post any further findings here as well.
Created attachment 49672 [details]
Debug output with intel_idle.max_cstate=0
Attached requested data with intel_idle.max_cstate=0. I got some data from the cpuidle sysfs nodes, it just seemed to take a while after boot before they appeared for some reason. I still see the same nolapic_timer behavior with 2.6.38.
I also note that acpi_idle only seems to utilize C3 and higher, so if this is a problem with C4 it makes sense that disabling intel_idle eliminates the issue.
Does anyone get rare (let's say, once per day) sudden hang-ups on 2.6.35+ kernels (even with intel_idle.max_cstate=3 etc.)? If so, it may be the same regression, because I don't remember any hang-ups on 2.6.32.
I got some time to look into this a little more. I have some more information, but still no clear answer.
The secondary CPU starts executing and hits idle at least once. It hangs after coming out of idle and re-enabling irqs -- I can see that it makes it as far as local_irq_enable() in intel_idle(), but no farther in that function. Seeing where it goes from there is more of a challenge, given the limited debug capabilities in this state. However, I don't see it hitting smp_reschedule_interrupt() which is expected from the schedule_delayed_work_on() call from hpet_cpuhp_notify().
I have the same problem with a samsung n220 with kernel 188.8.131.52. I tried to change intel_idle.max_cstate to 0 (I also tried 1,2 and 3) and I still can't resume from suspend.
re: comment #43
/sys/devices/system/cpu/cpu0/cpuidle/state1/desc:ACPI FFH INTEL MWAIT 0x0
/sys/devices/system/cpu/cpu0/cpuidle/state2/desc:ACPI FFH INTEL MWAIT 0x10
it seems that when running acpi_idle via "intel_idle.max_cstate=0"
that only C1 and C2 are exposed.
Is this on AC? Please try it on DC to see if additional
C-states show up under ACPI.
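The MWAIT hints in those desc strings can be mapped to hardware C-state numbers with a sketch like the one below. It assumes the usual Intel encoding, where bits 7:4 of the hint hold the target C-state minus one and the value 0xF in that field means C0. The function name mwait_hint_to_cstate is made up for illustration, and the hardware numbering need not match the ACPI state index.

```shell
#!/bin/sh
# Hypothetical helper: map an MWAIT hint to a hardware C-state name.
# Assumption: bits 7:4 of the hint are (target C-state - 1), 0xF meaning C0.
# Usage: mwait_hint_to_cstate 0x10
mwait_hint_to_cstate() {
    hint=$(( $1 ))
    field=$(( (hint >> 4) & 0xF ))
    if [ "$field" -eq 15 ]; then
        echo "C0"
    else
        echo "C$(( field + 1 ))"
    fi
}
```

With that reading, 0x0 is C1 and 0x10 is C2, which matches the two states listed above.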
Sorry, I'm not in possession of that machine any more so I'll be unable to do any more testing.
FYI: With Linux 3.0.1 (didn't try 3.0) on my Samsung N145P netbook suspend is working fine now.
What could be the patch, that fixed it?
same thing for me, resume now works without any workaround with linux 3.0.1 (Samsung N220)
Good to know.
Please reopen. I was a bit overhasty: sometimes suspend works fine on my Samsung N145P, but sometimes it still fails in the same way as before.
I have been using and learning Ubuntu since Maverick, and ever since then the system can't wake up normally after I initiate suspend. The disk activity indicator just flashes for a short time and the screen remains black.
DistroRelease: Ubuntu 12.04
Package: linux-image-3.2.0-20-generic 3.2.0-20.33
ProcVersionSignature: Ubuntu 3.2.0-20.33-generic 3.2.12
Uname: Linux 3.2.0-20-generic i686
MachineType: LENOVO S10-3
Proc: Intel Atom N450 1.66 GHz
Motherboard: Intel NM10
Video: Intel Graphics Media Accelerator (GMA) 3150
Network: Realtek PCIe GBE Family Controller (10/100/1000MBit), Atheros AR9285 Wireless Network Adapter (bgn), 2.1+EDR Bluetooth
Just updated system with update manager and installed Linux kernel 3.3.1 to test - problem remains. My Lenovo can't get up from suspend.
Like Viktor, I have the same issue with a Lenovo S10. After suspend, the screen remains black. I have no idea how to debug this, since I can't even access the netbook through SSH.
(In reply to comment #54)
> Just updated system with update manager and installed Linux kernel 3.3.1 to
> test - problem remains. My Lenovo can't get up from suspend.
I saw a similar problem on my Lenovo s10-3t, but my machine can resume from suspend, sometimes after 120 seconds, sometimes after 150 or 300 seconds.
So could you try to wait 6 minutes to see whether it could come back.
(In reply to comment #56)
> So could you try to wait 6 minutes to see whether it could come back.
No, without intel_idle.max_cstate=3 it doesn't wake even after 10 min.
intel_idle.max_cstate=3 makes the computer wake up fast (but it badly affects the browser: Firefox starts to load the memory and processor).
Regarding the Lenovo S10-3...
originally its resume problem was fixed by this patch in 2.6.36:
Author: Len Brown <firstname.lastname@example.org>
Date: Fri Sep 24 21:02:27 2010 -0400
intel_idle: PCI quirk to prevent Lenovo Ideapad s10-3 boot hang
You can tell if that quirk is running b/c it spews a dmesg line:
[ 0.624375] pci 0000:00:1f.0: [Firmware Bug]: TigerPoint LPC.BM_STS cleared
The way that original issue was debugged was finding the difference
between the working acpi_idle and the failing intel_idle.
But today the failure is different.
I have access to a Lenovo S10-3
I just dropped FC17 on it, which is 3.5.2, and resume hangs
with a black screen. The quirk above is in place.
intel_idle.max_cstate=3 allows resume to work.
But some cmdline params that fail are surprising:
intel_idle.max_cstate=0 processor.max_cstate=2 (gives MWAIT 0x10)
cpuidle.off=1 crashes on boot
here is a clue, after leaving the system "failed" for about 5 minutes
it actually resumed, and dmesg says this:
[ 118.624575] PM: Syncing filesystems ... done.
[ 118.627338] PM: Preparing system for mem sleep
[ 118.779349] Freezing user space processes ... (elapsed 0.01 seconds) done.
[ 118.791287] Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
[ 118.802295] PM: Entering mem sleep
[ 118.802338] Suspending console(s) (use no_console_suspend to debug)
[ 118.803318] sd 1:0:0:0: [sda] Synchronizing SCSI cache
[ 118.803577] sd 1:0:0:0: [sda] Stopping disk
[ 119.169169] ACPI handle has no context!
[ 119.365118] PM: suspend of devices complete after 562.255 msecs
[ 119.365590] PM: late suspend of devices complete after 0.459 msecs
[ 119.377247] pcieport 0000:00:1c.0: wake-up capability enabled by ACPI
[ 119.388261] ehci_hcd 0000:00:1d.7: wake-up capability enabled by ACPI
[ 119.399244] uhci_hcd 0000:00:1d.3: wake-up capability enabled by ACPI
[ 120.757907] uhci_hcd 0000:00:1d.1: wake-up capability enabled by ACPI
[ 120.758006] uhci_hcd 0000:00:1d.0: wake-up capability enabled by ACPI
[ 120.758137] PM: noirq suspend of devices complete after 1392.536 msecs
[ 120.758174] ACPI: Preparing to enter system sleep state S3
[ 120.784273] PM: Saving platform NVS memory
[ 120.784330] Disabling non-boot CPUs ...
[ 120.786548] CPU 1 is now offline
[ 120.787428] Extended CMOS year: 2000
[ 120.787428] ACPI: Low-level resume complete
[ 120.787428] PM: Restoring platform NVS memory
[ 120.787428] CPU0: Thermal monitoring handled by SMI
[ 120.787428] Extended CMOS year: 2000
[ 120.787428] microcode: CPU0 updated to revision 0x107, date = 2009-08-25
[ 120.792528] Enabling non-boot CPUs ...
[ 120.792749] Booting Node 0 Processor 1 APIC 0x1
[ 120.807014] microcode: CPU1 updated to revision 0x107, date = 2009-08-25
[ 120.812158] CPU1 is up
[ 120.812539] ACPI: Waking up from system sleep state S3
[ 424.700077] ACPI Exception: AE_TIME, Returned by Handler for [EmbeddedControl] (20120320/evregion-501)
[ 424.700185] ACPI Error: Method parse/execution failed [\_SB_.PCI0.LPCB.EC0_.DSSV] (Node ffff88003d1d0488), AE_TIME (20120320/psparse-536)
[ 424.700228] ACPI Error: Method parse/execution failed [\_WAK] (Node ffff88003d1cbaf0), AE_TIME (20120320/psparse-536)
[ 424.700402] ACPI Exception: AE_TIME, While executing method \_WAK (20120320/hwesleep-82)
[ 424.721458] uhci_hcd 0000:00:1d.0: wake-up capability disabled by ACPI
[ 424.728327] uhci_hcd 0000:00:1d.1: wake-up capability disabled by ACPI
[ 424.732428] uhci_hcd 0000:00:1d.3: wake-up capability disabled by ACPI
[ 424.732540] ehci_hcd 0000:00:1d.7: wake-up capability disabled by ACPI
[ 424.733523] PM: noirq resume of devices complete after 21.159 msecs
[ 424.733927] PM: early resume of devices complete after 0.323 msecs
So it appears that we had some kind of time-out in the EC while evaluating _WAK
This is not intel_idle specific, and it looks like ACPI,
so moving bug categories.
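The length of the stall can be read off the timestamps in the trace above: 424.700077 - 120.812539 = 303.887538 seconds between "Waking up from system sleep state S3" and the first AE_TIME message. A sketch for finding such a gap in a saved dmesg follows; the function name largest_dmesg_gap is made up for illustration, and it assumes the default "[seconds.microseconds]" timestamp prefix.

```shell
#!/bin/sh
# Hypothetical helper: report the largest gap between consecutive
# timestamped dmesg lines, and the line after which it occurs.
# Usage: dmesg | largest_dmesg_gap      or      largest_dmesg_gap FILE
largest_dmesg_gap() {
    awk '
        # Only consider lines starting with a "[ seconds.fraction]" stamp
        match($0, /^\[ *[0-9]+\.[0-9]+\]/) {
            ts = substr($0, 2, RLENGTH - 2) + 0
            if (seen && ts - prev > gap) {
                gap = ts - prev
                after = prevline
            }
            prev = ts
            prevline = $0
            seen = 1
        }
        END {
            if (seen) printf "%.6f s after: %s\n", gap, after
        }
    ' "$@"
}
```

This only locates where the time went; it says nothing about why the EC handler timed out.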
hmm, we get the _WAK EC timeout also in the working intel_idle.max_cstate=3 case:
[ 62.018118] PM: Syncing filesystems ... done.
[ 63.634734] PM: Preparing system for mem sleep
[ 63.768209] Freezing user space processes ... (elapsed 0.01 seconds) done.
[ 63.779276] Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
[ 63.790285] PM: Entering mem sleep
[ 63.790330] Suspending console(s) (use no_console_suspend to debug)
[ 63.791278] sd 1:0:0:0: [sda] Synchronizing SCSI cache
[ 63.791622] sd 1:0:0:0: [sda] Stopping disk
[ 64.157146] ACPI handle has no context!
[ 64.392113] PM: suspend of devices complete after 601.259 msecs
[ 64.392513] PM: late suspend of devices complete after 0.389 msecs
[ 64.403259] pcieport 0000:00:1c.0: wake-up capability enabled by ACPI
[ 64.414241] ehci_hcd 0000:00:1d.7: wake-up capability enabled by ACPI
[ 64.425214] uhci_hcd 0000:00:1d.3: wake-up capability enabled by ACPI
[ 65.783698] uhci_hcd 0000:00:1d.1: wake-up capability enabled by ACPI
[ 65.783791] uhci_hcd 0000:00:1d.0: wake-up capability enabled by ACPI
[ 65.783900] PM: noirq suspend of devices complete after 1391.379 msecs
[ 65.783936] ACPI: Preparing to enter system sleep state S3
[ 65.811197] PM: Saving platform NVS memory
[ 65.811251] Disabling non-boot CPUs ...
[ 65.813252] CPU 1 is now offline
[ 65.814435] Extended CMOS year: 2000
[ 65.814435] ACPI: Low-level resume complete
[ 65.814435] PM: Restoring platform NVS memory
[ 65.814435] CPU0: Thermal monitoring handled by SMI
[ 65.814435] Extended CMOS year: 2000
[ 65.814435] microcode: CPU0 updated to revision 0x107, date = 2009-08-25
[ 65.819131] Enabling non-boot CPUs ...
[ 65.819449] Booting Node 0 Processor 1 APIC 0x1
[ 65.853992] microcode: CPU1 updated to revision 0x107, date = 2009-08-25
[ 65.858078] CPU1 is up
[ 65.858479] ACPI: Waking up from system sleep state S3
[ 69.862782] ACPI Exception: AE_TIME, Returned by Handler for [EmbeddedControl] (20120320/evregion-501)
[ 69.862803] ACPI Error: Method parse/execution failed [\_SB_.PCI0.LPCB.EC0_.DSSV] (Node ffff88003d1d0488), AE_TIME (20120320/psparse-536)
[ 69.862829] ACPI Error: Method parse/execution failed [\_WAK] (Node ffff88003d1cbaf0), AE_TIME (20120320/psparse-536)
[ 69.862861] ACPI Exception: AE_TIME, While executing method \_WAK (20120320/hwesleep-82)
[ 69.864039] Clocksource tsc unstable (delta = 1099511324104 ns)
[ 69.864227] Switching to clocksource hpet
The resume problems on the s10-3 are of 2 types:
1. the resume hangs forever
2. the resume hangs for 2-5 minutes, and then comes back to life.
For the 2nd one I have a debug patch that fixes it, please check bugzilla 41932
Apparently I'm mostly seeing failure #1, because the patch
in bug 41932 doesn't seem to help. (applied w/ typo fixed to 3.5.2)
suspend seems to always work on my lenovo s10-3
when I use intel_idle.max_cstate=3 on Linux-3.5.2, and I've
not the foggiest idea why.
The same c-state accessed by acpi-idle doesn't work,
and shallower c-states don't work. bizarre.
(In reply to comment #62)
> Apparently I'm mostly seeing failure #1, because the patch
> in bug 41932 doesn't seem to help. (applied w/ typo fixed to 3.5.2)
I see, my machine is a s10-3t, which is different from your s10-3.
the bios version is Rev 0.25, released on 05/26/2010
Len, any idea/progress on this?
Resume fails on my Lenovo s10-3 running Ubuntu 13.04 (Linux 3.8.0-27-generic),
but the recent upstream kernels I try all work fine:
For the newest one, I tested AC, DC, intel_idle.max_cstate=0 -- all OK.
Ubuntu 13.04's kernel still fails always --
no matter if running intel_idle, acpi_idle, or even idle=halt
I grabbed the latest -- 3.8.0-29-generic from raring-proposed, but no joy.
So I installed Ubuntu 13.10's daily build -- 3.11.0-2-generic (Aug 12th)
and suspend/resume on the Lenovo Ideapad S10-3 works fine.
I don't know what 13.04's problem was, but since upstream is
working and 13.10 is working, we seem to be done here.