Bug 24882

Summary: PM/Hibernate: Memory corruption patch introduces regression (2.6.36.2)
Product: Power Management
Reporter: akwatts
Component: Hibernation/Suspend
Assignee: power-management_other
Status: CLOSED UNREPRODUCIBLE
Severity: high
CC: florian, hughd, maciej.rutecki, rjw
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.36.2
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 7216, 21782

Description akwatts 2010-12-14 04:00:31 UTC
The 'PM / Hibernate: Fix memory corruption related to swap' code introduced in 2.6.36.2 (commit 53e87163a135b1c868f31327c7f0b34feb605506) prevents proper resume after hibernation.

======================
From 53e87163a135b1c868f31327c7f0b34feb605506 Mon Sep 17 00:00:00 2001
From: Rafael J. Wysocki <rjw@sisk.pl>
Date: Fri, 3 Dec 2010 22:57:45 +0100
Subject: [PATCH] PM / Hibernate: Fix memory corruption related to swap
======================

Tests indicate that "devices" works but "platform" hangs on resume.
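
For reference, the "devices" and "platform" names match the test levels exposed through /sys/power/pm_test on kernels built with CONFIG_PM_DEBUG. Assuming that is the mechanism meant here (the report does not say so explicitly), such a test run could be sketched roughly as follows. PM_ROOT and the function names are hypothetical helpers made up for illustration; they are not part of any kernel tooling.

```shell
#!/bin/sh
# Sketch only: exercise the hibernation code path at a chosen pm_test level.
# PM_ROOT is a hypothetical override so the functions can be dry-run
# against a scratch directory instead of the real sysfs tree.
PM_ROOT="${PM_ROOT:-/sys/power}"

# Print the test levels the running kernel offers (needs CONFIG_PM_DEBUG).
pm_test_levels() {
    cat "$PM_ROOT/pm_test"
}

# Select a test level (for example "devices" or "platform"), then start
# a hibernation cycle. In test mode the kernel stops at the selected
# stage, waits a few seconds and resumes without writing an image.
hibernate_with_test() {
    level="$1"
    printf '%s\n' "$level" > "$PM_ROOT/pm_test" || return 1
    printf 'disk\n' > "$PM_ROOT/state"
}
```

On a real system this would be run as root, for example `hibernate_with_test platform`. Writing `none` to pm_test afterwards returns the kernel to normal hibernation behavior.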

The state after the image has been read to 100% varies: sometimes there is a blinking underscore on an otherwise black screen with limited keyboard input, and other times just a black screen with no response to the keyboard. The logs show nothing of interest.

Reverting 53e87163a135b1c868f31327c7f0b34feb605506 (c9e664f1fdf34aa8cede047b206deaa8f1945af0 upstream) corrects the behavior and hibernate->resume once again works.

~ Andy
Comment 1 Rafael J. Wysocki 2010-12-14 19:47:14 UTC
What platform is this?  Do you use the built-in hibernation or s2disk?
Comment 2 Rafael J. Wysocki 2010-12-14 19:48:42 UTC
BTW, is the problem 100% reproducible?
Comment 3 Rafael J. Wysocki 2010-12-14 23:12:33 UTC
Also, can you check if the problem is reproducible with the current mainline
kernel, please?
Comment 4 Florian Mickler 2011-02-09 05:34:13 UTC
*ping*
Comment 6 akwatts 2011-03-04 19:57:29 UTC
Excuse the delay, illness kept me away.

This has turned out to be a very difficult problem to diagnose intelligently (at least for me) because I cannot reproduce it systematically. So to answer your first question above, it is *not* 100% reproducible.

To answer your second question, the method I use is the built-in (in-kernel) hibernation (i.e. echo disk > /sys/power/state).
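
As a rough illustration of the built-in path, the sketch below checks that the kernel advertises "disk" in /sys/power/state and reports which hibernation mode is currently selected in /sys/power/disk (the bracketed entry) before a cycle is triggered. PM_ROOT and both function names are hypothetical and exist only for this example.

```shell
#!/bin/sh
# Sketch only: inspect the in-kernel hibernation interface.
# PM_ROOT is a hypothetical override for dry runs outside real sysfs.
PM_ROOT="${PM_ROOT:-/sys/power}"

# Succeed if "disk" is among the sleep states the kernel lists.
supports_hibernation() {
    case " $(cat "$PM_ROOT/state") " in
        *" disk "*) return 0 ;;
        *)          return 1 ;;
    esac
}

# Print the currently selected hibernation mode, i.e. the entry shown
# in square brackets in /sys/power/disk (for example "platform").
current_disk_mode() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$PM_ROOT/disk"
}
```

If `supports_hibernation` succeeds, the cycle itself is started as described above, by writing `disk` to /sys/power/state as root.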

It also doesn't seem related to memory load since the hangs on resume can occur if I hibernate just after starting X (with minimal load on RAM) or after starting lots of memory intensive applications in 4 virtual desktops.

Breakdown of problem by kernel version:

2.6.35.2                        NO PROBLEMS
2.6.36.1                        NO PROBLEMS*
2.6.36.2                        PROBLEMS
2.6.36.2 (w/o commit 53e87163)  NO PROBLEMS
2.6.37-2.6.37.2                 NO PROBLEMS*

*I did not do as much testing on these as with the 2.6.35 branch but I have not been able to trigger a resume hang yet.

By the way, I noticed an incredible speed improvement in both the snapshot generation/saving and the resumption from hibernation as of 2.6.37. Congratulations; it is now a bit painful to go back to the .36 branch.

I will try the commit mentioned above, but I noticed that it was introduced in 2.6.37.2, and both 2.6.37 and 2.6.37.1 seem to be OK (tough to say for sure on 2.6.37.1 since I only had it on my system for a short time). Is it still worth testing this particular commit given this information?

Also, I do not envision returning to the 2.6.36.x branch since so far 2.6.37.x is working great. So, my personal stake in the bug is reduced but I am quite willing to continue helping to debug for the benefit of others.

Please let me know how best to proceed. Thanks.

~ Andy
Comment 7 Rafael J. Wysocki 2011-03-06 12:24:36 UTC
2.6.36.y is EOL now, so if the bug is not visible in 2.6.37.y any more,
I'm closing this entry as "unreproducible".