Bug 12998

Summary: S3 suspend: power shuts off completely every 20 or so suspends (T60)
Product: ACPI
Reporter: Sanjoy Mahajan (sanjoy)
Component: Power-Sleep-Wake
Assignee: acpi_power-sleep-wake
Status: REJECTED INVALID
Severity: normal
CC: rjw, rui.zhang, yakui.zhao
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.29
Subsystem:
Regression: No
Bisected commit-id:
Bug Depends on:
Bug Blocks: 7216
Attachments:
    acpidump output
    dmesg from current bootup (from /var/log/dmesg)
    /var/log/messages showing a few S3 suspend/resumes and a reboot
    dmesg showing several successful suspend/resume cycles
    lspci -vxxx output
    dmesg_after after resuming (booted with acpi_sleep=s3_beep)
    final dmesg (all worked)

Description Sanjoy Mahajan 2009-04-02 03:50:40 UTC
Created attachment 20773 [details]
acpidump output

Every 20 or S3 suspends, the machine shuts completely off: It goes to sleep fine, the crescent-moon LED turns on, I put it in my backpack to go home, and then a few hours later when I open the lid to wake it up, nothing happens.  At that point I notice that the crescent-moon LED is no longer on, and the machine needs to be rebooted.

My first theory was that the battery ran out.  But that was never the case.  Almost always the battery (checked after rebooting) is around 90% or 95%.

I don't know how, but maybe it oopsed a while after going to sleep? There's nothing in the log files, of course, so it's been hard to debug. 

The hardware is a Thinkpad T60 with Intel graphics, wireless (untainted kernel).  The machine runs Debian unstable but with the vanilla 2.6.29 kernel. 

I had noticed this problem with vanilla 2.6.27.4, and upgraded to 2.6.29 in the hope that it would go away.  But if anything it has become more frequent.
Comment 1 Sanjoy Mahajan 2009-04-02 03:52:21 UTC
Created attachment 20774 [details]
dmesg from current bootup (from /var/log/dmesg)
Comment 2 Sanjoy Mahajan 2009-04-02 03:55:40 UTC
I put the machine into suspend using just

echo mem > /sys/power/state

(i.e. not with s2ram or acpid scripts) and wake it up by opening the lid or pushing the Fn key if the lid is already open.
Comment 3 Zhang Rui 2009-04-02 05:55:28 UTC
please attach the dmesg output after a successful suspend/resume cycle.
does this problem exist in hibernation?
Comment 4 Zhang Rui 2009-04-02 05:59:51 UTC
> Every 20 or S3 suspends, the machine shuts completely off: 

every 20 what?

> the crescent-moon LED turns on,

how many LEDs are on in all?

> I had noticed this problem with vanilla 2.6.27.4, and upgraded to 2.6.29 in
> the
> hope that it would go away.  But if anything it has become more frequent.

what do you mean more frequent?
This bug is not always reproducible in every S3, is it?

I suspect this may be a thermal problem. Please attach the /var/log/messages file, and make sure no fan is spinning while the laptop is suspended.
Comment 5 ykzhao 2009-04-02 06:08:54 UTC
Will you please attach the output of lspci -vxxx?
Will you also please run the following test and see whether the box can be resumed with the power button?
    a. Boot the system with the boot option "acpi_sleep=s3_beep".
    b. Kill the process that is using /proc/acpi/event (use the command "lsof /proc/acpi/event" to find it).
    c. echo mem > /sys/power/state; dmesg > dmesg_after;
    d. After the system enters the suspended state, press the power button and see whether the box resumes. Please also confirm whether the beep can be heard.
    e. If the box can't be resumed, please reboot the system and check whether the file dmesg_after exists. If it does, please attach it.

Thanks.
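
Steps a and b of this procedure can be checked from a shell before suspending. The following is only a sketch: the function and variable names are made up for illustration, and the file paths can be overridden so the checks can be tried without touching the real system. It assumes lsof is installed and uses its -t option, which prints only process IDs.

```shell
#!/bin/sh
# Hypothetical helper names; only the paths and the lsof call come from the steps above.
CMDLINE_FILE=${CMDLINE_FILE:-/proc/cmdline}
EVENT_FILE=${EVENT_FILE:-/proc/acpi/event}

# Step a: succeed only if the running kernel was booted with acpi_sleep=s3_beep
# (the option may appear inside a comma-separated acpi_sleep= list).
booted_with_s3_beep() {
    grep -q 'acpi_sleep=[^ ]*s3_beep' "$CMDLINE_FILE"
}

# Step b: print the PIDs of processes that hold the ACPI event file open.
acpi_event_readers() {
    lsof -t "$EVENT_FILE" 2>/dev/null
}

# Refuse to continue until both preconditions hold.
s3_beep_precheck() {
    booted_with_s3_beep || { echo "not booted with acpi_sleep=s3_beep" >&2; return 1; }
    pids=$(acpi_event_readers)
    if [ -n "$pids" ]; then
        echo "still reading $EVENT_FILE: $pids" >&2
        return 1
    fi
}
```

If s3_beep_precheck succeeds, step c can be run as written above.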
Comment 6 Sanjoy Mahajan 2009-04-02 14:55:53 UTC
Created attachment 20777 [details]
/var/log/messages showing a few S3 suspend/resumes and a reboot
Comment 7 Sanjoy Mahajan 2009-04-02 14:57:07 UTC
>> Every 20 or S3 suspends, the machine shuts completely off: 

> every 20 what?

Sorry, I meant to type "every 20 or so S3 suspends...".

>> the crescent-moon LED turns on,

> how many LEDs are on in all?

If I have the AC plugged in while suspending, all three LEDs (on the
outside of the case) are on, including the crescent moon.  As soon as I
unplug the AC, only the crescent moon remains on.

>> I had noticed this problem with vanilla 2.6.27.4, and upgraded to
>> 2.6.29 in the hope that it would go away.  But if anything it has
>> become more frequent.

> what do you mean more frequent?

With 2.6.27.4, it happened maybe once per 50 suspends; with 2.6.29 it
happens more frequently, maybe once per 20 suspends.

> This bug is not always reproducible in every S3, is it?

No, unfortunately.  It happens about once every 20 cycles.  It seems
more likely to happen if I have it suspended for longer.  I've never
seen it happen with a 5- or 10-minute suspend; only with suspends
lasting, say, an hour or longer.

> I suspect this may be a thermal problem, please attach the
> /var/log/messages file.

I will attach a current /var/log/messages showing a few suspend/resume
cycles as well as a reboot (I think I rebooted due to one of these
resume failures).

> and make sure there is no fan spinning when the laptop is suspended.

The fan is almost never spinning when I suspend; but when it is, it
always shuts off during the suspend (as far as I remember).  But I'll
listen carefully from now on to be sure.

If the fan is spinning, I cannot turn it off by hand because the T60 fan
is controlled only by the BIOS.  There is nothing under /proc/acpi/fan/
for example (there are two thermal zones under
/proc/acpi/thermal_zone/).
Comment 8 Sanjoy Mahajan 2009-04-02 14:58:31 UTC
Created attachment 20778 [details]
dmesg showing several successful suspend/resume cycles
Comment 9 Sanjoy Mahajan 2009-04-02 15:00:01 UTC
> please attach the dmesg output after a successful suspend/resume
> cycle.

I've just attached it.

> does this problem exist in hibernation?

I haven't ever tested hibernation (I use only S3 suspend/resume).
Comment 10 Sanjoy Mahajan 2009-04-02 15:02:26 UTC
Created attachment 20779 [details]
lspci -vxxx output
Comment 11 Sanjoy Mahajan 2009-04-02 15:04:08 UTC
> Will you please attach the output of lspci -vxxx?

Attached.

I'll next try the test you suggest.
Comment 12 Sanjoy Mahajan 2009-04-02 15:25:43 UTC
Created attachment 20781 [details]
dmesg_after after resuming (booted with acpi_sleep=s3_beep)
Comment 13 Sanjoy Mahajan 2009-04-02 15:25:49 UTC
> Will you please do the following test and see whether the box can be
> resumed by power button?

I did that test.  acpid was using /proc/acpi/event, so I killed it.  I
heard the beep during the suspend.  It resumed using the power button,
making several beeps in the process.  For completeness, I've attached
the dmesg_after file.
Comment 14 Zhang Rui 2009-04-03 01:00:17 UTC
please
1. set CONFIG_PM_DEBUG and rebuild
2. echo core > /sys/power/pm_test
3. echo mem > /sys/power/state
4. run this test 50 times and see if the problem is reproducible.
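
Before step 2 it may be worth confirming that the rebuilt kernel actually offers the requested test mode. The sketch below assumes that reading /sys/power/pm_test returns the available modes separated by spaces, with the active one in square brackets; the function names are hypothetical and the path can be overridden for a dry run.

```shell
#!/bin/sh
# Hypothetical helpers; PM_TEST_FILE can point at a scratch file for a dry run.
PM_TEST_FILE=${PM_TEST_FILE:-/sys/power/pm_test}

# Succeed if the file is readable (kernel built with the debug option)
# and the requested mode is among the listed ones.
pm_test_supports() {
    [ -r "$PM_TEST_FILE" ] &&
        tr -d '[]' < "$PM_TEST_FILE" | tr ' ' '\n' | grep -qx "$1"
}

# Print the currently selected mode, i.e. the bracketed entry.
pm_test_current() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$PM_TEST_FILE"
}

# Select a mode only if the kernel lists it.
select_pm_test() {
    pm_test_supports "$1" || { echo "pm_test mode '$1' not available" >&2; return 1; }
    echo "$1" > "$PM_TEST_FILE"
}
```

Calling select_pm_test core corresponds to step 2; pm_test_current can then be used to confirm the selection before step 3.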
Comment 15 Sanjoy Mahajan 2009-04-04 03:42:44 UTC
> 4. run this test for 50 times and see if the problem is reproducible.

After recompiling and "echo core > /sys/power/pm_test", I did

for n in `seq 50`; do 
   echo ==== RUN $n ====; 
   echo mem > /sys/power/state
   sleep 2
done

and all 50 suspend/resume cycles worked fine.
Comment 16 Zhang Rui 2009-04-07 06:55:38 UTC
what about this test:

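# switch pm_test back to normal (full) suspend
echo none > /sys/power/pm_test
for n in `seq 50`; do
   # print and clear the ring buffer so each dmesg-$n covers one cycle only
   dmesg -c
   echo mem > /sys/power/state
   dmesg > dmesg-$n
   sleep 2
done

when the S3 fails, please attach the latest dmesg output.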
Comment 17 Sanjoy Mahajan 2009-04-11 13:24:32 UTC
Created attachment 20938 [details]
final dmesg (all worked)

I ran that test, and (unfortunately) it suspended and resumed fine all 50 times.  For completeness, here is the last dmesg.

Yesterday the bug repeated itself with vanilla 2.6.29 (without PM_DEBUG), but there was nothing useful in the logs.

If I keep running the PM_DEBUG kernel, is there a better chance of finding something in the logs if the problem recurs?
Comment 18 Zhang Rui 2009-04-13 02:19:39 UTC
(In reply to comment #17)
> Created an attachment (id=20938) [details]
> final dmesg (all worked)
> 
> I ran that test, and (unfortunately) it suspended and resumed fine all 50
> times.  For completeness, here is the last dmesg.
> 
> Yesterday the bug repeated itself with vanilla 2.6.29 (without PM_DEBUG), but
> there was nothing useful in the logs.
> 
so you got that dmesg output and there is nothing abnormal?
will you please attach that dmesg output?

> If I keep running the PM_DEBUG kernel, is there a better chance of finding
> something in the logs if the problem recurs?

I don't think so, unless you can reproduce this bug in PM_DEBUG kernel.
Comment 19 Sanjoy Mahajan 2009-04-13 02:31:27 UTC
> so you get that dmesg output and there is nothing abnormal?  will you
> please attach that dmesg output please?

(It wasn't a PM_DEBUG kernel.)  Because the system crashed, I had to
reboot, which reset the kernel ring buffer.  And because it crashed
before resuming, there was also nothing from the suspend (or the resume)
in the syslog.

> I don't think so, unless you can reproduce this bug in PM_DEBUG
> kernel.

I'll keep running with this PM_DEBUG kernel and see what happens if the
bug reproduces itself in ordinary use (as it has in the past).
Comment 20 Zhang Rui 2009-04-13 02:43:52 UTC
okay, but I think you'd better use a script to save the dmesg output before suspending, every time you want to do an S3. :)
Comment 21 Sanjoy Mahajan 2009-04-13 03:34:14 UTC
> okay, but I think you'd better use a script to save the dmesg before suspend
> every time when you want to do a S3.

Good idea.  But if I use a script like

dmesg > /root/s3hang/before.dmesg
echo mem > /sys/power/state

how will it work?  The before.dmesg file will contain all dmesgs, but
only before the S3 suspend started.  The messages generated during the
suspend won't be saved anywhere.  Or is there a way to save those too?

The obvious solutions like

dmesg > /root/s3hang/before.dmesg
echo mem > /sys/power/state
dmesg > /root/s3hang/after.dmesg

won't work since the second dmesg won't run until resume, and even then,
it won't save the file until a sync.

Or am I missing a clever method?
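
One possible shape for such a script is sketched below. It is not a way to capture messages from a suspend that never resumes; it only forces the "before" file to disk with an explicit sync so that it survives a crash, and saves a matching "after" file when the resume succeeds. The function name, the timestamped file names, and the overridable LOG_DIR and STATE_FILE variables are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper; defaults follow the paths used in the scripts above.
LOG_DIR=${LOG_DIR:-/root/s3hang}
STATE_FILE=${STATE_FILE:-/sys/power/state}

logged_suspend() {
    stamp=$(date +%Y%m%d-%H%M%S)
    mkdir -p "$LOG_DIR" || return 1

    # Save the ring buffer and flush it to disk before the machine sleeps.
    dmesg > "$LOG_DIR/before-$stamp.dmesg"
    sync

    echo mem > "$STATE_FILE" || return 1

    # Reached only if the machine resumes.
    dmesg > "$LOG_DIR/after-$stamp.dmesg"
    sync
}
```

After a failed resume, the newest before-*.dmesg file without a matching after-*.dmesg file is the one from the cycle that crashed.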
Comment 22 Zhang Rui 2009-04-13 08:46:17 UTC
well, it's difficult to get the dmesg output when the system hangs during suspend.
As this bug is not reproducible in every S3, what we can do is to see if there is anything abnormal before the failed suspend, e.g. any device that is in an incorrect state before suspend, etc.
Comment 23 Zhang Rui 2009-04-27 07:55:52 UTC
ping Sanjoy
Comment 24 Sanjoy Mahajan 2009-04-27 11:04:12 UTC
I've been running the 2.6.29 kernel w/ PM_DEBUG since my last report.

My current theory is that the laptop gets squeezed by books in my
backpack, pushing the power button and shutting off the machine.  I've
therefore been careful over the last few weeks to stand the backpack
upright in order to reduce the chance of that happening.

Since starting that habit, I haven't been able to reproduce the problem.
Which means it may be my fault to begin with, and not an ACPI or even a
hardware problem (except that the T60 lid isn't as sturdy as I would
like).

Sorry for the likely noise.
Comment 25 Zhang Rui 2009-04-28 02:55:15 UTC
hah, thanks for finding out the ROOT CAUSE of the problem. :p

close this bug.