Bug 8680

Summary: Suspend/resume fails/stalls until keyboard interrupt occurs
Product: Power Management
Reporter: Markus Schaub (linux)
Component: Hibernation/Suspend
Assignee: Rafael J. Wysocki (rjwysocki)
Status: REJECTED UNREPRODUCIBLE
Severity: normal
CC: acpi-bugzilla, bunk, jesse.brandeburg, protasnb, rjwysocki, tglx, venki
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.22-rc6
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 7216
Attachments:
  my dmesg during suspend/resume
  my .config
  CPU0/power on AC
  CPU0/power on DC
  timer_list on AC
  timer_list on DC

Description Markus Schaub 2007-06-26 13:09:35 UTC
Most recent kernel where this bug did not occur: 2.6.20
Hardware Environment: IBM Thinkpad T43
Problem Description:
Suspend and resume to/from disk randomly fail. Sometimes pressing any key during suspend or resume leads to a successful suspend/resume.
Comment 1 Andrew Morton 2007-06-26 13:16:13 UTC
Subject: Re: [Bugme-new]  New: Suspend/resume fails/stalls until
 keyboard interrupt occurs

On Tue, 26 Jun 2007 13:05:44 -0700 (PDT)
bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=8680
> 
>            Summary: Suspend/resume fails/stalls until keyboard interrupt
>                     occurs
>            Product: Power Management
>            Version: 2.5
>      KernelVersion: 2.6.22-rc6
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Hibernation/Suspend
>         AssignedTo: power-management_other@kernel-bugs.osdl.org
>         ReportedBy: linux@markus-schaub.de
> 
> 
> Most recent kernel where this bug did not occur: 2.6.20
> Hardware Environment: IBM Thinkpad T43
> Problem Description:
> Suspend and resume to/from disk randomly fail. Sometimes pressing any key
> during suspend or resume leads to a successful suspend/resume.
> 

(please respond via emailed reply-to-all)

This might be a post-2.6.21 regression.  Are you able to find out if 2.6.21
had the same bug?

Thanks.
Comment 2 Jesse Brandeburg 2007-07-30 09:54:42 UTC
I think I'm seeing this same bug on 2.6.23-rc1+git, with my T43.

As I was coming out of S3, I noticed a very slowly flashing cursor, so I waited and pressed keys. About 5 minutes later the laptop finished coming out of suspend, and I'm working on it now.

There seem to be no interesting messages in dmesg, but I'll attach mine + config.  What can I do to help debug?
Comment 3 Jesse Brandeburg 2007-07-30 09:56:16 UTC
Created attachment 12205 [details]
my dmesg during suspend/resume
Comment 4 Jesse Brandeburg 2007-07-30 09:56:58 UTC
Created attachment 12206 [details]
my .config
Comment 5 Rafael J. Wysocki 2007-08-08 09:57:36 UTC
(In reply to comment #4)
> my .config

Can you please try without CONFIG_NO_HZ and CONFIG_HIGH_RES_TIMERS?
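
A minimal sketch of how one might check a saved .config for these two options. It assumes the usual kernel convention that an enabled option appears as `CONFIG_FOO=y` and a disabled one as `# CONFIG_FOO is not set`. The helper name `config_option_state` and the `SAMPLE` text are made up for illustration; they are not taken from the attached .config.

```python
import re


def config_option_state(config_text, option):
    """Return the value of `option` in a kernel .config, or 'n' if unset or absent."""
    set_re = re.compile(r"^%s=(.*)$" % re.escape(option))
    unset_re = re.compile(r"^# %s is not set$" % re.escape(option))
    for line in config_text.splitlines():
        line = line.strip()
        match = set_re.match(line)
        if match:
            return match.group(1)
        if unset_re.match(line):
            return "n"
    # An option that is not mentioned at all is treated as disabled.
    return "n"


# Hypothetical .config excerpt, for illustration only.
SAMPLE = """\
CONFIG_NO_HZ=y
# CONFIG_HIGH_RES_TIMERS is not set
CONFIG_HZ=250
"""

for opt in ("CONFIG_NO_HZ", "CONFIG_HIGH_RES_TIMERS"):
    print(opt, config_option_state(SAMPLE, opt))
```

To run it against a real build, read the text from the kernel's .config file instead of `SAMPLE`.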
Comment 6 Jesse Brandeburg 2007-08-28 11:28:01 UTC
It happened once more, but I've had many suspend/resume cycles where it was fine.  Just wanted to add a note that it is difficult to reproduce.
Comment 7 Jesse Brandeburg 2007-08-28 11:28:34 UTC
I still have to try CONFIG_NO_HZ=n.
Comment 8 Rafael J. Wysocki 2007-09-23 05:34:19 UTC
There is a fix that may be related to this issue in Linus' current tree (2.6.23-rc7-git4 as of today).  Please test it.
Comment 9 Jesse Brandeburg 2007-09-23 19:35:24 UTC
I tried with the latest git, rc7+, and still reproduced this issue with CONFIG_NO_HZ=y.

All I did was boot into X, log in, run powertop, and suspend/resume a few times, first with the power cord plugged in; I ran into the problem with the power cord unplugged on the third or fourth resume.  It's not immediately apparent whether key presses help or hurt; I just can't tell.  It took 4 or more minutes to resume, with a very slowly flashing cursor in the upper left.

btw, I'm running in vesafb mode 0x318 on console, don't know if that matters or not.
Comment 10 Rafael J. Wysocki 2007-09-24 05:00:03 UTC
Well, please try CONFIG_NO_HZ=n and CONFIG_HIGH_RES_TIMERS=n, then.
Comment 11 Jesse Brandeburg 2007-09-24 10:21:52 UTC
It seems it is *probably* related to CONFIG_NO_HZ and CONFIG_HIGH_RES_TIMERS.

I've done 10 S3/resume cycles with varying states of plugged in/unplugged, and have not been able to repro.

I'll reboot again to the old kernel with NO_HZ and HRT enabled and try once more to repro, just to make sure I'm using a valid test.
Comment 12 Jesse Brandeburg 2007-09-27 08:32:43 UTC
Ah, bugzilla is back.  With the NO_HZ kernel my laptop eventually had the problem again, but after 7-10 S3/resume cycles.

Are there some more debugging options that I can turn on?  Maybe timestamping dmesg?  I'm willing to try some more stuff, and I'm sorry that I haven't been able to nail down a test case to 100% repro.

unrelated to this bug but FYI, the hot-keys only work once to put the machine to sleep, I suppose it could be my user-space.
Comment 13 Rafael J. Wysocki 2007-09-27 08:58:45 UTC
(In reply to comment #12)
> Ah, bugzilla is back.  With the NO_HZ kernel my laptop eventually had the
> problem again, but after 7-10 S3/resume cycles.
> 
> Are there some more debugging options that I can turn on?

Well, nothing obviously useful comes to mind.

> Maybe timestamping dmesg?

Yes, you can try that.

> I'm willing to try some more stuff, and I'm sorry that I haven't been able
> to nail down a test case to 100% repro.

There's nothing to be sorry about. :-)  Thanks for your patience.

> unrelated to this bug but FYI, the hot-keys only work once to put the machine
> to sleep, I suppose it could be my user-space.

Yes, it could be, but it may also be an ACPI problem.  Please try to rule out the user space and create a separate Bugzilla entry if that turns out to be a kernel problem.
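
As a rough sketch of what timestamped dmesg output could be used for: assuming the kernel prefixes each line with `[seconds.microseconds]` (as it does when built with CONFIG_PRINTK_TIME), the snippet below lists unusually long pauses between consecutive messages. The function name `find_gaps`, the 5-second threshold, and the sample lines are illustrative assumptions, not data from this report. Whether the printk clock keeps counting during a stalled resume is not something this sketch verifies.

```python
import re

# Matches lines such as "[  123.456789] some message".
TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")


def find_gaps(lines, threshold=5.0):
    """Return (gap_seconds, previous_message, next_message) for gaps above threshold."""
    gaps = []
    prev_ts = prev_msg = None
    for line in lines:
        match = TS_RE.match(line)
        if not match:
            continue  # skip lines without a timestamp prefix
        ts, msg = float(match.group(1)), match.group(2)
        if prev_ts is not None and ts - prev_ts > threshold:
            gaps.append((ts - prev_ts, prev_msg, msg))
        prev_ts, prev_msg = ts, msg
    return gaps


# Made-up log lines, for illustration only.
SAMPLE = [
    "[  100.000000] first message",
    "[  100.500000] second message",
    "[  340.500000] third message",
]

for gap, before, after in find_gaps(SAMPLE):
    print("%.1f s between %r and %r" % (gap, before, after))
```

On a real system the input would be the lines of saved `dmesg` output rather than `SAMPLE`.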
Comment 14 Thomas Gleixner 2007-09-27 09:17:39 UTC
Some questions:

- Is this ever happening, when you do suspend/resume while AC is plugged in ?

- Did you unplug AC while the box was in suspend ?

- Can you please try "clocksource=acpi_pm" on the kernel command line ?

Please provide the output of /proc/acpi/processor/CPU0/power for AC plugged and unplugged. Also the output of /proc/timer_list for both states.

Thanks,

      tglx
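
One way to collect the requested files for both power states is sketched below. It copies /proc/acpi/processor/CPU0/power and /proc/timer_list into a directory under a label such as "ac" or "dc". The function name `snapshot`, the labels, and the output directory name are assumptions made for illustration; files that do not exist or cannot be read on a given kernel are skipped.

```python
import os

# The two files requested above.
PROC_FILES = ("/proc/acpi/processor/CPU0/power", "/proc/timer_list")


def snapshot(label, out_dir, paths=PROC_FILES):
    """Copy each readable file to out_dir/<label>-<basename>.txt; return the paths written."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for path in paths:
        try:
            with open(path, "r") as src:
                data = src.read()
        except OSError as err:
            # Missing file or insufficient permissions: report and move on.
            print("skipping %s: %s" % (path, err))
            continue
        dest = os.path.join(out_dir, "%s-%s.txt" % (label, os.path.basename(path)))
        with open(dest, "w") as dst:
            dst.write(data)
        written.append(dest)
    return written


# Example use (hypothetical labels and directory):
#   snapshot("ac", "pm-debug")   # run with AC plugged in
#   snapshot("dc", "pm-debug")   # run again on battery
```

Running it once per power state leaves one labelled copy of each file, ready to attach.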
Comment 15 Jesse Brandeburg 2007-09-27 09:27:11 UTC
bugme-daemon@bugzilla.kernel.org wrote:
> - Is this ever happening, when you do suspend/resume while AC is
> plugged in ? 

Well, honestly I'm not sure.  I think it has happened in the past when
plugged into AC.
 
> - Did you unplug AC while the box was in suspend ?

in the interests of trying various stuff when attempting to repro, yes,
but I don't know if it is required as it is very difficult to repro and
I was trying all sorts of stuff.

It appears I just need to go in and out of suspend with wireless
enabled, until I hit the problem.
 
> - Can you please try "clocksource=acpi_pm" on the kernel command line
> ? 

okay, that's up next, I'm assuming you mean with the NO_HZ kernel.

> Please provide the output of /proc/acpi/processor/CPU0/power for AC
> plugged and unplugged. Also the output of /proc/timer_list for both
> states. 

I'll attach that in a minute.
Comment 16 Thomas Gleixner 2007-09-27 09:42:17 UTC
> It appears I just need to go in and out of suspend with wireless
> enabled, until I hit the problem.

Hmm.

> > - Can you please try "clocksource=acpi_pm" on the kernel command line
> > ? 
> 
> okay, that's up next, I'm assuming you mean with the NO_HZ kernel.

Yes.
Comment 17 Jesse Brandeburg 2007-09-27 14:21:22 UTC
okay, with clocksource=acpi_pm:
15 S3/resume cycles on AC

suspend, unplug AC
1 Successful wake
suspend
then repro again!

It's not immediately clear that pressing any keys helps it wake up faster, but it will eventually come back (the cursor flashes really, really slowly).
Comment 18 Jesse Brandeburg 2007-09-27 14:22:14 UTC
Created attachment 12969 [details]
CPU0/power on AC
Comment 19 Jesse Brandeburg 2007-09-27 14:22:36 UTC
Created attachment 12970 [details]
CPU0/power on DC
Comment 20 Jesse Brandeburg 2007-09-27 14:22:56 UTC
Created attachment 12971 [details]
timer_list on AC
Comment 21 Jesse Brandeburg 2007-09-27 14:23:25 UTC
Created attachment 12972 [details]
timer_list on DC
Comment 22 Thomas Gleixner 2007-11-14 14:56:11 UTC
Hmm, that's odd:

active state:            C2 (with AC)
active state:            C3 (w/o AC)

How can this happen? This is a UP machine and it is in C2(3) while it reads out /proc/acpi/...../CPU0/power.

Jesse, is this problem still there with .23 or later ?
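
For comparing the two attachments, a small sketch that pulls the "active state:" value out of the text of a CPU0/power file. It relies only on the line format quoted above; the helper name `active_cstate` is made up for illustration, and the rest of the file's layout is not assumed.

```python
import re

# Matches a line such as "active state:            C2".
ACTIVE_RE = re.compile(r"^active state:\s*(C\d+)", re.MULTILINE)


def active_cstate(power_text):
    """Return the C-state named on the 'active state:' line, or None if absent."""
    match = ACTIVE_RE.search(power_text)
    return match.group(1) if match else None


print(active_cstate("active state:            C2\n"))  # with AC
print(active_cstate("active state:            C3\n"))  # without AC
```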
Comment 23 Jesse Brandeburg 2007-12-14 10:25:17 UTC
I haven't seen it lately. I'm sorry, I'm only able to test infrequently.
Comment 24 Jesse Brandeburg 2007-12-14 20:51:09 UTC
I tested 11 S3/wake cycles when on battery with wireless enabled, I'm using 2.6.24-rc4, with HRT and NO_HZ.

I believe it is fixed at this point.  I have to say suspend/resume is much more usable now that I can reliably wake the T43 with the Fn key, power button, or lid switch.

Thanks for your attention.  I'll have to leave it to someone else to mark this fixed because I didn't file the bug.
Comment 25 Len Brown 2007-12-14 20:59:10 UTC
Thanks for testing, Jesse.