Kernel Bug Tracker – Bug 5555
suspend/resume unstable between 2.6.11 and 2.6.12/13/14
Last modified: 2007-06-30 15:41:26 UTC
I have a Dell 600m x86 laptop here and I am using suspend-to-ram ("suspend").
With kernel 2.6.11, everything works fine for months at a time: the machine
suspends and resumes without any problem.
However, kernels 2.6.12, 2.6.13, and 2.6.14 have all exhibited the following
problem: every few days the machine will not resume from mem sleep. I open the
lid, it powers up, hard drive light flashes once or twice, and nothing else
happens. The screen backlight will not even come back on. It's not like the
screen doesn't wake up - the whole thing is completely frozen. It does not
respond to ACPI events like hitting the power button, or ssh. The only thing I
can do at this point is to completely power it off.
This is not an issue with other software on the system. After 2.6.12 refused to
resume, I went back to 2.6.11 and everything worked fine again. When 2.6.13 came
out, I built it and again, sometimes it will not resume. I went back to 2.6.11
again and everything worked fine again up until I switched to 2.6.14. It worked
fine for two days, and now it too locked up during resume.
I don't want to be stuck on kernel 2.6.11 until the end of time. There must have
been some significant changes between 2.6.11 and 2.6.12 that introduced this...
Does anyone have any ideas? My 2.6.14 config file is basically the same as the
working 2.6.11 config file, just mainly defaults selected for the new options.
Here is a link to my .config for 2.6.14 if anyone wants to look at it:
I do not have any binary modules loaded. I have tried this with and without
unloading various modules before suspending, but still no luck.
Since I don't have the same laptop, I can't reproduce it here. Could you
please narrow the problem down? Between 2.6.11 and 2.6.12 there are six -rc
releases (2.6.12-rc1 to 2.6.12-rc6). You can get the patches from
http://www.kernel.org/pub/linux/kernel/v2.6/testing/. Could you please
figure out which release is the first one that breaks your system? Thanks!
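The narrowing-down request above amounts to a search over the ordered list of releases. A minimal sketch in Python, assuming the oldest entry is known good and the newest is known bad; `first_bad` and the `is_bad` callback are hypothetical names used for illustration only. With an intermittent failure like this one, `is_bad` can only be trusted after many suspend/resume cycles on each kernel.

```python
from typing import Callable, Sequence


def first_bad(versions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad version in an oldest-to-newest list.

    Assumes versions[0] is known good and versions[-1] is known bad,
    and that every release after the first bad one is also bad.
    """
    lo, hi = 0, len(versions) - 1  # lo: last known good, hi: first known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid  # the breakage is at mid or earlier
        else:
            lo = mid  # the breakage is after mid
    return versions[hi]


# Release list taken from the comment above: 2.6.11, six -rc kernels, 2.6.12.
RELEASES = ["2.6.11"] + [f"2.6.12-rc{n}" for n in range(1, 7)] + ["2.6.12"]
```

Bisecting eight releases this way needs at most three kernels to be built and tested, instead of all six -rc kernels.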
All right. Given the intermittent nature of this bug, I'll test all of those. It
usually happens once every few days, so it will probably take a while. I just
downloaded RC1 and will try building it. Hopefully we'll find out soon enough...
I think I'm hitting this too, the same symptoms and completely erratic
behaviour. I had 2.6.12-rc6 going through literally hundreds of suspends
(stress testing for this specific issue, with or without X, USB, C3, etc.) and
then failing. I tested every -bk snapshot from 2.6.12-rc4 (which, at that
time, seemed stable) to 2.6.12 final, only to come to the conclusion that no
kernel version (including those older than 2.6.11) has ever been 100% stable
on my machine. Some versions are better (lasting hundreds of suspends), some
worse (15% of resumes fail), and sometimes a new build of the same version
behaves differently than the previous one. I finally gave up and moved to
2.6.15-rc,
which still seems to exhibit this problem.
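Because the failure is intermittent, a run of clean resumes only rules out failure rates above some threshold. A rough sketch of that arithmetic in Python, assuming each suspend/resume cycle fails independently with the same probability (an assumption, and possibly a poor one for this bug); `cycles_needed` is a hypothetical helper, not part of any tool mentioned here.

```python
import math


def cycles_needed(failure_rate: float, confidence: float = 0.95) -> int:
    """Clean cycles needed to rule out `failure_rate` at `confidence`.

    If every cycle fails independently with probability p, then n clean
    cycles in a row occur with probability (1 - p) ** n. This returns the
    smallest n for which that probability is at most 1 - confidence.
    """
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))
```

Under these assumptions, a kernel that fails 15% of resumes is exposed by about 19 cycles, while a 1% failure rate takes roughly 300 clean cycles to rule out. That fits a build surviving hundreds of suspends and still failing later.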
I'm getting the impression that the bug is obscurely related to the compiler
version, code alignment, or something similar. For the record, I'm using gcc
3.3.6, but I remember that 3.4 didn't really help much. I also need
acpi_sleep=s3_bios for my backlight to work.
David: any idea on how to debug this? Serial console? Is this likely a
Created attachment 6935 [details]
So both of your issues are S3 stress-test failures. The attached patch emulates
the S3 process. Let's see whether it can pass your stress test.
You might also check whether the lspci -vv output is significantly different
before suspend and after resume in a real S3 cycle.
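One way to make the before/after comparison concrete is to save both lspci -vv dumps to text and diff them. A minimal sketch using Python's standard difflib; `diff_lspci` and the two labels are hypothetical names chosen for this example.

```python
import difflib


def diff_lspci(before: str, after: str) -> list[str]:
    """Return unified-diff lines between two saved `lspci -vv` dumps.

    An empty list means the two dumps are identical.
    """
    return list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile="before-suspend",  # label only, not a real path
            tofile="after-resume",      # label only, not a real path
            lineterm="",
        )
    )
```

Lines starting with "-" appear only in the pre-suspend dump, and lines starting with "+" only in the post-resume dump. Changed register values show up as such a pair.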
Switched to kernel 2.6.15, this is still happening. First resume freeze happened
2 days after making the switch. Again, 2.6.11 works just fine, months without
errors. I will try to get a serial console going to try to see if it produces
any output, but I do not even know what to look for.
Is anyone still working on this? Any ideas why this happens?
Created attachment 7514 [details]
lspci output, 2.6.11, after bootup
Created attachment 7515 [details]
lspci output, 2.6.11, after first suspend/resume cycle
Created attachment 7516 [details]
lspci output, 2.6.15, after bootup
Created attachment 7517 [details]
lspci output, 2.6.15, after first suspend/resume cycle
I know this is hard, but can you give me the lspci -vv output from just before
the failed suspend/resume cycle? We got several failure reports (two IIRC)
caused by 2.6.11 - 2.6.12 changes, but I looked at the changesets and there
aren't any significant suspend/resume changes.
Will do. This will take time, and hopefully I will never have to post it!
I will add a command to echo the output of lspci -vv to a file and sync right
before suspending. Next time it fails, I will post the output.
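The pre-suspend hook described above could look roughly like the following Python sketch. It assumes lspci is on the PATH and that suspend-to-RAM is triggered by writing "mem" to /sys/power/state; `save_snapshot` and `suspend_to_ram` are hypothetical helper names, and the log path in the usage note is only an example.

```python
import os
import subprocess


def save_snapshot(path, command=("lspci", "-vv")):
    """Run `command`, write its stdout to `path`, and force it to disk.

    Returns the number of characters written. The fsync matters here:
    if the resume hangs, data still sitting in the page cache is lost.
    """
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    with open(path, "w") as handle:
        handle.write(result.stdout)
        handle.flush()
        os.fsync(handle.fileno())
    return len(result.stdout)


def suspend_to_ram(state_file="/sys/power/state"):
    """Request suspend-to-RAM (needs root). Assumes the sysfs interface."""
    with open(state_file, "w") as handle:
        handle.write("mem")
```

A suspend script would call save_snapshot("/var/log/lspci-before-suspend.txt") and then suspend_to_ram(), so the last dump on disk always belongs to the most recent suspend attempt.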
It's been about a week and sure enough, when I opened my laptop this morning, I
was greeted by a locked up system. Open it up, the HD spins up, and nothing else
happens. LCD backlight does not turn on, no HD activity during resume (usually
the light blinks a few times during resume). I am posting an output of lspci -vv
right before the bad suspend cycle.
I should probably mention that my video card (Radeon Mobility M9000) does not
get POSTed by the BIOS during resume. This is instead done by the Radeon driver
in Xorg. I highly doubt this is responsible for the lockups, as this all works
fine with kernel 2.6.11. Still... anyways, hope the output helps.
Created attachment 7571 [details]
lspci output, 2.6.15, right before a bad suspend 'cycle'
Downstream bug: http://bugs.gentoo.org/126051
(for my reference only, no useful info to add)
Switched from letting X wake up the graphics card to using vbetool. Same kind of
behavior still. I open the lid, power light goes solid, but the hard drive
doesn't even click a few times (I guess this happens as tasks begin to resume).
I upgraded from 2.6.15 to 2.6.16, thinking "well, maybe they fixed it."
In 2.6.15, I've been using a serial console to look at the output of various
things before and after suspend. Of course, if the suspend freeze happened
before the console was resumed, then that would be a problem.
Well, problem is, in 2.6.16 the serial console is not properly restored after
suspend until the machine is fully resumed and a userspace command is issued.
This is Bug 6259. How would I capture/store any debug messages that occur during
a failed suspend/resume cycle if getting the console working is contingent on a
successful resume in 2.6.16?
Is this still an issue on the latest kernel, currently 2.6.21?
please reopen when responding to comment #16