Bug 96111

Summary: Unreliable hibernation on Lenovo X230
Product: Power Management
Reporter: rhn (dvyfkebuac.rhn)
Component: Hibernation/Suspend
Assignee: Rafael J. Wysocki (rjw)
Status: CLOSED CODE_FIX
Severity: normal
CC: aaron.lu, jlee, lenb, rui.zhang, yu.c.chen
Priority: P1
Hardware: Intel
OS: Linux
Kernel Version: 3.17
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments: dmesg - broken commit, fresh boot
dmesg - broken commit, boot->hibernate->failed resume
dmesg - working kernel, boot->hibernate->successful resume
dmesg - 4.2-rc3 + patch
total solution to fix panic and failor during hibernation
dmesg - 4.3-rc2 + patch with oops
dmesg-4.3-rc3 + patch no oops

Description rhn 2015-04-03 15:52:54 UTC
Created attachment 173061
dmesg - broken commit, fresh boot

Between kernel 3.16 and 3.17, a regression has been introduced where the first hibernation after regular shutdown always fails to resume. Subsequent hibernations succeed.

The system is a Lenovo x230 with an Intel i5, booting with EFI, with the hibernate partition located on a secondary SSD drive. The installed system is Fedora 20; hibernation and reboots were issued using the KDE shutdown dialog.

I have tracked the problem down to the commit where it first appears:
e67ee10190e69332f929bdd6594a312363321a66	Merge branches 'pm-sleep', 'pm-cpufreq' and 'pm-cpuidle'

The failure mode looks similar to the one specified by commit
84c91b7ae07c62cf6dee7fde3277f4be21331f85	PM / hibernate: avoid unsafe pages in e820 reserved regions
and reverting this commit seems to solve the problem.
Comment 1 rhn 2015-04-03 15:53:48 UTC
Created attachment 173071
dmesg - broken commit, boot->hibernate->failed resume
Comment 2 rhn 2015-04-03 15:54:38 UTC
Created attachment 173081
dmesg - working kernel, boot->hibernate->successful resume
Comment 3 rhn 2015-04-04 08:15:06 UTC
Additional results based on commit 8f778bbc542ddf8f6243b21d6aca087e709cabdc:
8f778bb : bad
8f778bb + reverted 84c91b7 : good
8f778bb + patch [1] : good

patch [1]:
x86: Kill E820_RESERVED_KERN  https://lkml.org/lkml/2015/3/4/434
Comment 4 Lee, Chun-Yi 2015-04-05 08:00:35 UTC
Based on the dmesg from the bug description, I can confirm that the e820 table is split by setup_data, which is reserved as E820_RESERVED_KERN regions:

[    0.000000] BIOS-e820: [mem 0x000000005baff000-0x00000000d684ffff] usable

[    0.000000] e820: update [mem 0x9d3e0018-0x9d3f0057] usable ==> usable

[    0.000000] reserve setup_data: [mem 0x000000005baff000-0x000000009d3e0017] usable
[    0.000000] reserve setup_data: [mem 0x000000009d3e0018-0x000000009d3f0057] usable
[    0.000000] reserve setup_data: [mem 0x000000009d3f0058-0x00000000d684ffff] usable

[    0.000000] PM: Registered nosave memory: [mem 0x9d3e0000-0x9d3e0fff]
[    0.000000] PM: Registered nosave memory: [mem 0x9d3f0000-0x9d3f0fff]

The E820_RESERVED_KERN regions cause some regions in the e820 table not to be page-aligned, so the code that registers nosave memory misjudges the non-page-aligned space as a "hole" and adds it to the nosave regions.
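
The effect described here can be reproduced with a small model. The following Python sketch is not kernel code: it is a simplified approximation of how nosave registration walks the e820 map, assuming each region's start is rounded up and its end is rounded down to a page boundary, and that any page left between two regions is treated as a hole. The names `pfn_up`, `pfn_down`, and `nosave_gaps` are chosen here for illustration. The addresses are taken from the dmesg lines quoted in this comment.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages, as on x86


def pfn_up(addr):
    """Page frame number of addr, rounded up to the next page boundary."""
    return (addr + (1 << PAGE_SHIFT) - 1) >> PAGE_SHIFT


def pfn_down(addr):
    """Page frame number of addr, rounded down to a page boundary."""
    return addr >> PAGE_SHIFT


def nosave_gaps(regions):
    """Return the (start, end) byte ranges that the model treats as holes.

    regions is a sorted list of (start, end_inclusive) usable ranges.
    Memory below the first region is ignored to keep the example short.
    """
    gaps = []
    pfn = pfn_up(regions[0][0])
    for start, end in regions:
        # A region starting inside a page leaves that page looking like a hole.
        if pfn < pfn_up(start):
            gaps.append((pfn << PAGE_SHIFT, (pfn_up(start) << PAGE_SHIFT) - 1))
        pfn = pfn_down(end + 1)
    return gaps


# The single usable range as reported by "BIOS-e820".
unsplit = [(0x5BAFF000, 0xD684FFFF)]

# The same range after being split around setup_data ("reserve setup_data").
split = [
    (0x5BAFF000, 0x9D3E0017),
    (0x9D3E0018, 0x9D3F0057),
    (0x9D3F0058, 0xD684FFFF),
]

for name, regions in (("unsplit", unsplit), ("split", split)):
    for lo, hi in nosave_gaps(regions):
        print(f"{name}: nosave [mem {lo:#x}-{hi:#x}]")
```

With the three split ranges, the model reports the same two pages as the "PM: Registered nosave memory" lines above. With the single unsplit range it reports none.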

This issue should be fixed by Yinghai Lu's patches, which clean up the e820 code and remove the E820_RESERVED_KERN regions: the kernel already reserves setup_data through memblock, so it should not change the e820 table.

The 84c91b7ae patch should be reverted from the v4.0-rc kernel until Yinghai Lu's patches are merged into the v4.1 kernel. I will resend the 84c91b7ae patch once Yinghai Lu's patches are merged.
Comment 5 Lee, Chun-Yi 2015-04-07 07:59:26 UTC
Tracking Yinghai Lu's patches:
  x86: Kill E820_RESERVED_KERN  https://lkml.org/lkml/2015/3/4/434

Mail thread for "Unreliable hibernation on Lenovo x230":
  https://lkml.org/lkml/2015/4/5/30
Comment 6 Rafael J. Wysocki 2015-04-13 12:52:38 UTC
Fixed by commit f82daee49c09 (Revert "PM / hibernate: avoid unsafe pages in e820 reserved regions").
Comment 7 Chen Yu 2015-07-20 01:49:14 UTC
Hi, kebuac.rhn
Would it be convenient for you to apply the following patch on the latest kernel (e.g. 4.2-rc2) to see whether hibernation/resume works? Please also attach the log.
https://patchwork.kernel.org/patch/6697191/
I would much appreciate it if you could do that.
Yu
Comment 8 rhn 2015-07-21 15:50:26 UTC
Hi, Chen Yu
I used 4.2-rc3, and it seems that the patch doesn't affect my system in any negative way. Actually, I wasn't able to see any difference at all in dmesg on the x230.
You should find dmesg attached.
rhn
Comment 9 rhn 2015-07-21 15:52:33 UTC
Created attachment 183231
dmesg - 4.2-rc3 + patch

sequence: boot 4.0.7.fc21 -> boot 4.2-rc3+patch->hibernate->resume -> *dmesg*
Comment 10 Chen Yu 2015-09-02 10:17:36 UTC
Created attachment 186491
total solution to fix panic and failor during hibernation
Comment 11 Chen Yu 2015-09-16 15:18:56 UTC
Hello, kebuac.rhn
Do you have time to test the patch attached in comment 10? Thanks a lot.
Yu
Comment 12 rhn 2015-09-16 19:31:50 UTC
Hi Yu,
Yes, I will test it in the next week or two. Sorry for not doing it earlier; I thought it was the same patch as before.
rhn
Comment 13 rhn 2015-11-15 23:52:35 UTC
I tested the patch with the 4.3.0-rc2 kernel (I had it built for a while, but didn't get a chance to reboot).

Hibernation works both with and without the patch. The patched version gave me a display-related oops on one boot, but that may be unrelated.
Comment 14 rhn 2015-11-15 23:53:48 UTC
Created attachment 193151
dmesg - 4.3-rc2 + patch with oops
Comment 15 rhn 2015-11-15 23:54:29 UTC
Created attachment 193161
dmesg-4.3-rc3 + patch no oops