I have a Dell 600m x86 laptop here and I am using suspend-to-ram ("suspend"). With kernel 2.6.11, everything works fine for months at a time: the machine suspends and resumes without any problem. However, kernels 2.6.12, 2.6.13, and 2.6.14 have all exhibited the following problem: every few days the machine will not resume from mem sleep. I open the lid, it powers up, the hard drive light flashes once or twice, and nothing else happens. The screen backlight does not even come back on. It is not just that the screen fails to wake up; the whole machine is completely frozen. It does not respond to ACPI events such as pressing the power button, or to ssh. The only thing I can do at this point is power it off completely. This is not an issue with other software on the system. After 2.6.12 refused to resume, I went back to 2.6.11 and everything worked fine again. When 2.6.13 came out, I built it, and again it sometimes would not resume. I went back to 2.6.11 and everything worked fine until I switched to 2.6.14. That worked fine for two days, and now it too has locked up during resume. I don't want to be stuck on kernel 2.6.11 until the end of time. There must have been some significant change between 2.6.11 and 2.6.12 that introduced this... Does anyone have any ideas? My 2.6.14 config file is basically the same as the working 2.6.11 config file, mainly with defaults selected for the new options. Here is a link to my .config for 2.6.14 if anyone wants to look at it: http://wam.umd.edu/~stevenm/config2614 I do not have any binary modules loaded. I have tried this with and without unloading various modules before suspending, but still no luck.
Since I don't have the same laptop, I can't reproduce it here. Could you please narrow the problem down? Between 2.6.11 and 2.6.12, there are 6 -rc releases (2.6.12-rc1 to 2.6.12-rc6). You can get the patches from http://www.kernel.org/pub/linux/kernel/v2.6/testing/. Could you please figure out which one is the first release that breaks your system? Thanks!
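The narrowing requested above amounts to a binary search over the ordered list of releases. A minimal Python sketch follows; `first_bad_release` and `is_bad` are hypothetical names introduced for illustration, and the sketch assumes that 2.6.11 is good, 2.6.12 is bad, and every release after the first bad one is also bad. With an intermittent failure, a "good" verdict is only as reliable as the time spent testing that kernel.

```python
from typing import Callable, Sequence


def first_bad_release(releases: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest release for which is_bad() is true.

    Assumes releases[0] is known good, releases[-1] is known bad, and that
    once a release is bad, every later release is bad as well.
    """
    lo, hi = 0, len(releases) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(releases[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return releases[hi]


# Release order taken from the comment above: 2.6.11, six -rc releases, 2.6.12.
RELEASES = ["2.6.11"] + [f"2.6.12-rc{n}" for n in range(1, 7)] + ["2.6.12"]
```

In practice `is_bad` would be a manual step, for example a function that prompts for the outcome after a kernel has been run through enough suspend cycles. Three kernels need testing at most for this list of eight releases.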
All right. Given the intermittent nature of this bug, I'll test all of those. It usually happens once every few days, so it will probably take a while. I just downloaded RC1 and will try building it. Hopefully we'll find out soon enough...
I think I'm hitting this too, with the same symptoms and completely erratic behaviour. I had 2.6.12-rc6 going through literally hundreds of suspends (stress testing for this specific issue, with or without X, USB, C3, etc.) and then failing. I tested every -bk snapshot between 2.6.12-rc4 (which, at that time, seemed stable) and 2.6.12 final, only to come to the conclusion that no kernel version (including those older than 2.6.11) has ever been 100% stable on my machine. Some versions are better (lasting hundreds of suspends), some worse (15% of resumes fail), and sometimes a new build of the same version behaves differently from the previous one. I finally gave up and moved to 2.6.15-rc, which still seems to exhibit this problem. I'm getting the impression that the bug is obscurely related to the compiler version, code alignment, or something similar. For the record, I'm using gcc 3.3.6, but I remember that 3.4 didn't really help much. I also need acpi_sleep=s3_bios for my backlight to work. David: any idea on how to debug this? Serial console? Is this likely a hardware problem?
Created attachment 6935 [details] debug So both of your issues are S3 stress-test failures. The attached patch emulates the S3 process. Let's see whether it passes your stress test. You might also check whether the lspci -vv output differs significantly between before suspend and after resume in a real S3 cycle.
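One way to do the lspci comparison suggested above is to save the output before suspend and after resume and diff the two dumps. A minimal Python sketch using the standard `difflib` module; `lspci_diff` is a hypothetical helper name, and the two arguments are assumed to be the full text of saved `lspci -vv` outputs.

```python
import difflib
from typing import List


def lspci_diff(before: str, after: str) -> List[str]:
    """Return unified-diff lines between two saved `lspci -vv` dumps.

    An empty list means the two dumps are identical.
    """
    return list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile="before-suspend",
            tofile="after-resume",
            lineterm="",
        )
    )
```

Lines starting with `-` appear only in the pre-suspend dump and lines starting with `+` only in the post-resume dump, so changed PCI configuration registers show up as paired `-`/`+` lines.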
Hello again. Switched to kernel 2.6.15, this is still happening. First resume freeze happened 2 days after making the switch. Again, 2.6.11 works just fine, months without errors. I will try to get a serial console going to try to see if it produces any output, but I do not even know what to look for. Is anyone still working on this? Any ideas why this happens?
Created attachment 7514 [details] lspci output, 2.6.11, after bootup
Created attachment 7515 [details] lspci output, 2.6.11, after first suspend/resume cycle
Created attachment 7516 [details] lspci output, 2.6.15, after bootup
Created attachment 7517 [details] lspci output, 2.6.15, after first suspend/resume cycle
I know this is hard, but can you give me the lspci -vv output from just before the failed suspend/resume cycle? We have received several failure reports (two, IIRC) caused by changes between 2.6.11 and 2.6.12, but I looked at the changesets and there aren't any significant suspend/resume changes.
Will do. This will take time, and hopefully I will never have to post it! I will add a command to echo the output of lspci -vv to a file and sync right before suspending. Next time it fails, I will post the output.
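The "write lspci output to a file and sync before suspending" step described above could look roughly like the following Python sketch. `snapshot` is a hypothetical helper and the output path in the usage note is only an example; the important part is forcing the data to disk, since a failed resume ends in a hard power-off.

```python
import os
import subprocess
from typing import Sequence


def snapshot(cmd: Sequence[str], path: str) -> None:
    """Run cmd, write its stdout to path, and force the data to disk."""
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    with open(path, "w") as f:
        f.write(out)
        f.flush()             # push Python's buffer to the kernel
        os.fsync(f.fileno())  # ask the kernel to write the file to disk
```

Calling something like `snapshot(["lspci", "-vv"], "/root/lspci-before-suspend.txt")` from the suspend script, immediately before the machine goes to sleep, would leave a dump on disk from the last cycle even if the resume hangs. Full `lspci -vv` detail generally requires running as root.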
It's been about a week and sure enough, when I opened my laptop this morning, I was greeted by a locked up system. Open it up, the HD spins up, and nothing else happens. LCD backlight does not turn on, no HD activity during resume (usually the light blinks a few times during resume). I am posting an output of lspci -vv right before the bad suspend cycle. I should probably mention that my video card (Radeon Mobility M9000) does not get POSTed by the BIOS during resume. This is instead done by the Radeon driver in Xorg. I highly doubt this is responsible for the lockups, as this all works fine with kernel 2.6.11. Still... anyways, hope the output helps.
Created attachment 7571 [details] lspci output, 2.6.15, right before a bad suspend 'cycle'
Downstream bug: http://bugs.gentoo.org/126051 (for my reference only, no useful info to add)
Switched from letting X wake up the graphics card to using vbetool. Still the same kind of behavior. I open the lid and the power light goes solid, but the hard drive doesn't even click a few times (which I guess normally happens as tasks begin to resume). I upgraded from 2.6.15 to 2.6.16, thinking "well, maybe they fixed it." In 2.6.15, I had been using a serial console to look at the output of various things before and after suspend. Of course, if the freeze happened before the console was resumed, that would be a problem. Well, the problem is that in 2.6.16 the serial console is not properly restored after suspend until the machine is fully resumed and a userspace command is issued. This is Bug 6259. How would I capture/store any debug messages that occur during a failed suspend/resume cycle if getting the console working is contingent on a successful resume in 2.6.16?
Is this still an issue on the latest kernel, currently 2.6.21?
Please reopen when responding to comment #16.