Bug 15262 - Bizarre reproducible resume failure on MBP
Status: CLOSED CODE_FIX
Alias: None
Product: Power Management
Classification: Unclassified
Component: Hibernation/Suspend
Hardware: All Linux
Importance: P1 normal
Assignee: power-management_other
URL:
Keywords:
Depends on:
Blocks: 7216
 
Reported: 2010-02-09 16:33 UTC by jwatzman
Modified: 2011-01-17 22:25 UTC

See Also:
Kernel Version: 2.6.33-rc7
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Configuration for original 2.6.32.6 build (75.26 KB, application/octet-stream)
2010-02-09 20:54 UTC, jwatzman
"diff -Nur" between 2.6.32.6 debs build from tarball and git (9.29 KB, application/octet-stream)
2010-02-09 20:55 UTC, jwatzman

Description jwatzman 2010-02-09 16:33:14 UTC
(Reported on LKML yesterday, but this is probably the correct place to report it -- sorry! I've never tried to track down a kernel bug before :))

I'm having some weird issues where my MacBook Pro fails to resume from suspend. The failure mode is: the backlight comes on and the CD-ROM drive makes its "reset" noise, but nothing else happens. The keyboard does not respond; for example, Control-Alt-F1 followed by Control-Alt-Delete should reboot the machine, but has no effect.

The following kernels consistently FAIL when resuming:
 - 2.6.32.7 built from kernel.org 2.6.32 tarball + 2.6.32.7 patch
 - 2.6.32.6 built from
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.32.y.git
 - 2.6.32.7 built from
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.32.y.git

The following kernel consistently SUCCEEDS when resuming:
 - 2.6.32.6 built from kernel.org 2.6.32 tarball + 2.6.32.6 patch

Note that 2.6.32.6 either fails or succeeds depending upon how I build it! To make sure, I built the two variations of .6 on the same day, no system changes in between, with the same .config, and did a diff of the source -- everything was the same AFAICT, but the git one fails and the tarball one is fine! The failures and successes have been consistent, so either the issue is intermittent and I'm *really* unlucky or something really strange is going on.

Google suggested to turn on the PM debug options and try a couple of things; I did the following on the tty of a 2.6.32.6 git build with .config modified only to turn on PM debugging:
 - run through the /sys/power/pm_test options; it resumes successfully from everything except "none"
 - echo 1 > /sys/power/pm_trace and use the magic number. dmesg on the next boot reported the number as "0:168:740" corresponding to "drivers/base/power/main.c:430" -- it did not report a device.

I have no idea how to debug this further.
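
The pm_test pass described above could be scripted roughly as follows. This is only a sketch: PM_DIR and SUSPEND_CMD are illustrative variables introduced here (defaulting to /sys/power and pm-suspend), and the list of modes is read from the pm_test file itself rather than assumed. It has to run as root on a kernel built with PM debugging.

```shell
#!/bin/sh
# Sketch: run one suspend cycle per pm_test mode, ending with "none"
# (a real suspend). PM_DIR and SUSPEND_CMD are illustrative variables,
# not part of any tool.
PM_DIR="${PM_DIR:-/sys/power}"
SUSPEND_CMD="${SUSPEND_CMD:-pm-suspend}"

# Print the modes listed in pm_test, one per line, with the brackets that
# mark the active mode removed and "none" moved to the end.
pm_test_modes() {
    modes=$(sed 's/[][]//g' "$PM_DIR/pm_test")
    for m in $modes; do
        [ "$m" = none ] || echo "$m"
    done
    echo none
}

# Select each mode in turn and trigger a suspend; stop at the first error.
run_pm_tests() {
    for m in $(pm_test_modes); do
        echo "testing pm_test=$m"
        echo "$m" > "$PM_DIR/pm_test" || return 1
        $SUSPEND_CMD || return 1
    done
}
```

Calling run_pm_tests from a root shell walks through every test mode; if the machine hangs, the last "testing pm_test=..." line printed shows which mode was active.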

Specs:
MacBookPro2,2; Debian sid (not quite up-to-date as I keep things consistent to track this down); Core 2 Duo 2.33 GHz; 2 GB RAM; 1 GB swapfile
Comment 1 Rafael J. Wysocki 2010-02-09 19:59:33 UTC
> - 2.6.32.6 built from
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.32.y.git
>
> The following kernel consistently SUCCEEDS when resuming:
> - 2.6.32.6 built from kernel.org 2.6.32 tarball + 2.6.32.6 patch

Aren't these two the same thing? If they aren't, what's the difference?
Comment 2 jwatzman 2010-02-09 20:54:10 UTC
Created attachment 24978
Configuration for original 2.6.32.6 build
Comment 3 jwatzman 2010-02-09 20:55:29 UTC
Created attachment 24979
"diff -Nur" between 2.6.32.6 debs build from tarball and git
Comment 4 jwatzman 2010-02-09 20:56:22 UTC
A recursive diff shows that the sources are the same, hence my extreme confusion on how builds from one consistently fail and builds from the other consistently work!

I've attached a diff of the resulting deb directory structure and my .config in case they might be interesting/enlightening.
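
A source comparison of this kind could be done along the following lines. This is a sketch under assumptions: the function name trees_identical and both directory arguments are placeholders, and the .git metadata directory is excluded so that a git checkout can be compared against an unpacked tarball.

```shell
#!/bin/sh
# Sketch: succeed only if two source trees have identical contents,
# ignoring git metadata. Both directory arguments are placeholders.
trees_identical() {
    # -N treats absent files as empty, -u gives unified output,
    # -r recurses, -x skips anything named .git
    diff -Nur -x .git "$1" "$2" > /dev/null
}
```

For example, `trees_identical linux-tarball linux-git && echo same` prints "same" only when no file differs between the two (hypothetical) directories.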
Comment 5 jwatzman 2010-02-09 21:46:33 UTC
Okay, this just got a bit less insane. The tarball build of 2.6.32.6 I made yesterday just failed on me a few times in a row (it didn't ever when I tested it yesterday).

Assuming I just had really, really bad luck with an intermittent suspend bug, what's the next step in tracking it down? Should I retry the pm_test and pm_trace steps on 2.6.32.7?
Comment 6 Rafael J. Wysocki 2010-02-09 21:51:42 UTC
No, please test if you can reproduce with 2.6.33-rc7.
Comment 7 jwatzman 2010-02-10 00:46:32 UTC
Reproduced twice in a row, same symptoms as before.

 - boot into single user mode root prompt
 - cd /sys/power
 - echo core > pm_test
 - pm-suspend (repeat 3-4 times, works fine)
 - echo none > pm_test
 - echo 1 > pm_trace
 - pm-suspend (machine hangs on resume same as before)
 - hard poweroff
 - boot, reports magic number "0:839:740" corresponding to "drivers/base/power/main.c:514" (no device reported)
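
The sequence above could be wrapped in a small script like the one below. It is a sketch, not a tested tool: PM_DIR, SUSPEND_CMD and WARMUP_RUNS are illustrative names introduced here, and the defaults simply mirror the steps listed in this comment. Note that pm_trace stores its hash in the real-time clock, so the system time is expected to be wrong after the following boot.

```shell
#!/bin/sh
# Sketch of the reproduction sequence. PM_DIR, SUSPEND_CMD and
# WARMUP_RUNS are illustrative names, not part of any tool.
PM_DIR="${PM_DIR:-/sys/power}"
SUSPEND_CMD="${SUSPEND_CMD:-pm-suspend}"
WARMUP_RUNS="${WARMUP_RUNS:-3}"

repro_resume_hang() {
    # Warm-up: "core" test mode, which resumed fine in the report.
    echo core > "$PM_DIR/pm_test" || return 1
    i=0
    while [ "$i" -lt "$WARMUP_RUNS" ]; do
        $SUSPEND_CMD || return 1
        i=$((i + 1))
    done

    # Real suspend with tracing enabled. pm_trace keeps its hash in the
    # RTC, so the clock will be off after the next boot.
    echo none > "$PM_DIR/pm_test" || return 1
    echo 1 > "$PM_DIR/pm_trace" || return 1
    $SUSPEND_CMD
}
```

After a hang and a hard power-off, the magic number should appear in dmesg on the next boot, as in the last step of the list.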
Comment 8 Rafael J. Wysocki 2011-01-16 22:18:46 UTC
Is the problem still present in 2.6.37?
Comment 9 jwatzman 2011-01-17 16:07:22 UTC
No, everything has worked fine for a while and I forgot about this bug report. Thanks for the follow-up.
Comment 10 Rafael J. Wysocki 2011-01-17 22:25:24 UTC
No problem, thanks for the update.  Closing.
