Bug 201855 - Hibernate and suspend fails with "Image not found (code -22)"
Summary: Hibernate and suspend fails with "Image not found (code -22)"
Status: CLOSED WILL_NOT_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 normal
Assignee: Chen Yu
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-12-03 15:58 UTC by djl
Modified: 2020-11-04 08:18 UTC

See Also:
Kernel Version: 4.19
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
debug patch to track why swap header is invalid (1.33 KB, text/plain)
2019-03-26 12:24 UTC, Chen Yu

Description djl 2018-12-03 15:58:57 UTC
Steps to reproduce:

1. Install 4.19
2. Run `echo suspend > /sys/power/disk; echo disk > /sys/power/state`

Expected results:

The machine should write the hibernation image to swap then suspend to RAM as documented here: https://www.kernel.org/doc/Documentation/power/swsusp.txt

Actual results:

The hibernation image is correctly written to swap, but the machine never suspends to RAM. The kernel log shows "Image not found (code -22)" immediately after issuing the above commands. Rebooting the machine manually correctly resumes from the swap image.

Other info:

Doesn't happen with the last 4.18 release. Started with 4.19 and persists through 4.19.6.
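
The reproduction steps could be wrapped in a small script that first confirms the kernel offers the `suspend` hibernation mode. This is only a sketch: the function names `mode_offered` and `try_hybrid_sleep` are made up for illustration, and the optional file arguments exist only so the logic can be exercised without touching the real sysfs files.

```shell
# mode_offered MODE LINE
# Succeeds if MODE appears in LINE, the text read from /sys/power/disk.
# The currently selected mode is shown in [brackets], so strip those first.
mode_offered() {
    mode=$1
    line=$(printf '%s' "$2" | tr -d '[]')
    for m in $line; do
        [ "$m" = "$mode" ] && return 0
    done
    return 1
}

# try_hybrid_sleep [DISK_FILE] [STATE_FILE]
# Selects the "suspend" hibernation mode and then triggers hibernation.
# Defaults to the real sysfs files, which requires root.
try_hybrid_sleep() {
    disk_file=${1:-/sys/power/disk}
    state_file=${2:-/sys/power/state}
    if ! mode_offered suspend "$(cat "$disk_file")"; then
        echo "suspend mode is not offered by this kernel" >&2
        return 1
    fi
    echo suspend > "$disk_file" && echo disk > "$state_file"
}
```

Called as root with no arguments, `try_hybrid_sleep` performs the same two writes as step 2. On an affected kernel, the "Image not found (code -22)" line can then be looked for in the `dmesg` output.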
Comment 1 Chen Yu 2018-12-12 15:27:31 UTC
(In reply to djl from comment #0)
> Steps to reproduce:
> 
> 1. Install 4.19
> 2. Run `echo suspend > /sys/power/disk; echo disk > /sys/power/state`
> 
> Expected results:
> 
> The machine should write the hibernation image to swap then suspend to RAM
> as documented here: https://www.kernel.org/doc/Documentation/power/swsusp.txt
> 
> Actual results:
> 
> The hibernation image is correctly written to swap but never suspends to
> RAM. The kernel log shows "Image not found (code -22)" immediately after
> issuing the above commands. 
Does this mean the system returns to the shell after running the above commands,
or does it hang there?
> Rebooting the machine manually correctly resumes
Please define 'manually': does it mean pressing the power button (reset button)?
> from the swap image.
> 
Do other modes work, such as platform, shutdown, test_resume, etc.?
> Other info:
> 
> Doesn't happen with the last 4.18 release. Started with 4.19 and persists
> through 4.19.6.
Previously there was an issue in Debian when using a customized device offset for hibernation, but it should have been fixed. If it still happens, could you please do a git bisect to find the bad commit?
Comment 2 djl 2018-12-15 20:39:17 UTC
(In reply to Chen Yu from comment #1)
> Does it mean, the system returns to the shell after running above commands?
> Or it hangs there?

The system hangs here.

> Please define 'manually': does it mean, pressing the power button(reset
> button)?

Yes. I need to hold the power button until it shuts off.

> Do other modes work? such as platform, shutdown, test_resume , etc

platform and shutdown both suspend to disk (but not RAM) as expected. test_resume seems to exhibit the same behavior as "suspend".

> Previously there is a issue in debian when trying to use customized dev
> offset when doing hibernation, but it should have been fixed. And if it still
> happens, could you please help do a git bisect to find the bad commit?

Is this still relevant even if I'm not using Debian?

I've tried a git bisect but after a few days of testing I haven't been able to find the problematic commit. If I find some more time I'll spin up some VMs to speed the process along.
Comment 3 Chen Yu 2018-12-18 03:31:01 UTC
(In reply to djl from comment #0)
> Steps to reproduce:
> 
> 1. Install 4.19
> 2. Run `echo suspend > /sys/power/disk; echo disk > /sys/power/state`
> 
> Expected results:
> 
> The machine should write the hibernation image to swap then suspend to RAM
> as documented here: https://www.kernel.org/doc/Documentation/power/swsusp.txt
> 
> Actual results:
> 
> The hibernation image is correctly written to swap but never suspends to
> RAM. The kernel log shows "Image not found (code -22)" immediately after
> issuing the above commands.
This confuses me, since this message should only be printed in two cases:
1. During a normal hibernation resume, or
2. During the test_resume phase.
Since you are testing 'suspend', this message should not pop up.
> Rebooting the machine manually correctly resumes
> from the swap image.
> 
> Other info:
> 
> Doesn't happen with the last 4.18 release. Started with 4.19 and persists
> through 4.19.6.
Comment 4 Chen Yu 2018-12-18 03:32:45 UTC
(In reply to djl from comment #2)
> (In reply to Chen Yu from comment #1)
> > Does it mean, the system returns to the shell after running above commands?
> > Or it hangs there?
> 
> The system hangs here.
> 
> > Please define 'manually': does it mean, pressing the power button(reset
> > button)?
> 
> Yes. I need to hold the power button until it shuts off.
> 
> > Do other modes work? such as platform, shutdown, test_resume , etc
> 
> platform and shutdown both suspend to disk (but not RAM) as expected.
> test_resume seems to exhibit the same behavior as "suspend".
> 
> > Previously there is a issue in debian when trying to use customized dev
> > offset when doing hibernation, but it should have been fixed. And if it
> > still happens, could you please help do a git bisect to find the bad commit?
> 
> Is this still relevant even if I'm not using Debian?
> 
Not sure, it depends.
> I've tried a git bisect but after a few days of testing I haven't been able
> to find the problematic commit. If I find some more time I'll spin up some
> VMs to speed the process along.
How about reverting the following commit:
commit 355064675f1c997cea017ea64c8f2c216e5425d9
Author: Mario Limonciello <mario.limonciello@dell.com>
Date:   Wed Mar 28 12:01:09 2018 -0500

    PM / hibernate: Make passing hibernate offsets more friendly
Comment 5 djl 2018-12-18 12:34:04 UTC
(In reply to Chen Yu from comment #4)
> How about revert the following commit:
> commit 355064675f1c997cea017ea64c8f2c216e5425d9
> Author: Mario Limonciello <mario.limonciello@dell.com>
> Date:   Wed Mar 28 12:01:09 2018 -0500
> 
>     PM / hibernate: Make passing hibernate offsets more friendly

No luck, unfortunately. Reverting that commit still shows exactly the same problem.
Comment 6 Chen Yu 2018-12-27 09:17:11 UTC
(In reply to djl from comment #5)
> (In reply to Chen Yu from comment #4)
> > How about revert the following commit:
> > commit 355064675f1c997cea017ea64c8f2c216e5425d9
> > Author: Mario Limonciello <mario.limonciello@dell.com>
> > Date:   Wed Mar 28 12:01:09 2018 -0500
> > 
> >     PM / hibernate: Make passing hibernate offsets more friendly
> 
> No luck, unfortunately. Reverting that commit still shows exactly the same
> problem.
Thanks, David.
Could you please describe the symptom in more detail? You wrote:
"The kernel log shows "Image not found (code -22)" immediately after issuing
the above commands": echo suspend > /sys/power/disk; echo disk > /sys/power/state
As I understand it, the message "Image not found (code -22)" should not appear during the hibernate/suspend phase; it should only be triggered during resume or in test_resume mode.

1. Please switch to the latest upstream kernel.
2. Provide the output of:
   # grep . /sys/power/*
   # swapon -s
   # cat /proc/partitions
Then try again:
3. echo test_resume > /sys/power/disk; echo disk > /sys/power/state
4. echo suspend > /sys/power/disk; echo disk > /sys/power/state

If that does not work either, the best way to figure it out is to
use git bisect.
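
The information requested in step 2 could be gathered in one go. A minimal sketch, assuming a hypothetical helper named `collect_pm_info`; it reads `/proc/swaps` directly, which is the file `swapon -s` summarizes, and the optional arguments are only there so the paths can be overridden.

```shell
# collect_pm_info [POWER_DIR] [SWAPS_FILE] [PARTITIONS_FILE]
# Prints the power-management settings, the active swap areas and the
# partition table, each under a small header.
collect_pm_info() {
    power_dir=${1:-/sys/power}
    swaps=${2:-/proc/swaps}
    parts=${3:-/proc/partitions}

    echo "== $power_dir =="
    # Some attributes can return an error when read; report it and keep going.
    grep . "$power_dir"/* 2>&1 || true

    echo "== $swaps =="
    cat "$swaps"

    echo "== $parts =="
    cat "$parts"
}
```

Running `collect_pm_info > pm-info.txt` before each test would give a single file to attach to the bug (the file name is just an example).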
Comment 7 djl 2018-12-31 16:05:40 UTC
> Could you please describe more about the symptom:
> "The kernel log shows "Image not found (code -22)" immediately after issuing
> above commands:  echo suspend > /sys/power/disk; echo disk > /sys/power/state
> According to my understanding, the message '"Image not found (code -22)"' should not appear  during hibernate/suspend phase, but only triggered during resume or test_resume mode.

Is there anything specific you'd like me to describe? So far I've given all the information I've been able to gather. I've run out of things to say :)

But it appears I'm not the only one with this issue: https://www.reddit.com/r/archlinux/comments/a1xzh5/systemctl_hybridsleep_broken_in_linux419x/


> 1. Please switch to the latest upstream kernel.

Switched to 4.20.


> 2. provide:
>    # grep . /sys/power/*

    /sys/power/autosleep:off
    /sys/power/disk:[platform] shutdown reboot suspend test_resume 
    /sys/power/image_size:13451300864
    /sys/power/mem_sleep:s2idle [deep]
    /sys/power/pm_async:1
    /sys/power/pm_debug_messages:0
    /sys/power/pm_freeze_timeout:20000
    /sys/power/pm_print_times:0
    /sys/power/pm_test:[none] core processors platform devices freezer
    /sys/power/pm_trace:0
    /sys/power/pm_trace_dev_match:ieee80211
    /sys/power/pm_trace_dev_match:leds
    /sys/power/pm_trace_dev_match:usbhid
    /sys/power/pm_trace_dev_match:acpi
    grep: /sys/power/pm_wakeup_irq: No data available
    /sys/power/reserved_size:1048576
    /sys/power/resume:253:2
    /sys/power/resume_offset:0
    /sys/power/state:freeze mem disk
    /sys/power/wakeup_count:0


>    # swapon -s

    Filename                Type        Size        Used    Priority
    /dev/dm-2               partition   16777212    0       -2


>    # cat /proc/partitions

    major minor  #blocks  name

       8        0 1953514584 sda
       8       16 1953514584 sdb
       8       32  250059096 sdc
       8       33     524288 sdc1
       8       34    1048576 sdc2
       8       35  247436615 sdc3
       8       48  250059096 sdd
       8       49     524288 sdd1
       8       50    1048576 sdd2
       8       51  248485191 sdd3
       8       80 3907018584 sdf
       8       81 3907017543 sdf1
       8       64  976762584 sde
       8       65      16384 sde1
       8       66  976744448 sde2
       9      127  247305536 md127
       9        1 1953383488 md1
     253        0  247303488 dm-0
     253        1  230522880 dm-1
     253        2   16777216 dm-2
     253        3 1953381440 dm-3


> 3. echo test_resume > /sys/power/disk; echo disk > /sys/power/state
> 4. echo suspend > /sys/power/disk; echo disk > /sys/power/state

These commands still show the same behavior under 4.20.

> If it does not work neither, then the best way to figure it out is to
> use git bisect.

I'm still working on this but since this is a workstation, it's difficult to find time to constantly build and reboot.
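
The `/sys/power/resume` value above (253:2) can be checked against the `/proc/partitions` listing to confirm that it names the active swap device. A sketch, with `devname_for` as a hypothetical helper name:

```shell
# devname_for MAJ:MIN [PARTITIONS_FILE]
# Prints the device name that /proc/partitions lists for the given
# major:minor pair; fails if no row matches.
devname_for() {
    awk -v want="$1" '
        # Data rows have four fields: major, minor, #blocks, name.
        NF == 4 && ($1 ":" $2) == want { print $4; found = 1 }
        END { exit !found }
    ' "${2:-/proc/partitions}"
}
```

With the listing above, `devname_for 253:2` prints `dm-2`, which matches the `/dev/dm-2` entry reported by `swapon -s`, so the configured resume device and the swap device agree.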
Comment 8 Chen Yu 2019-03-26 12:24:10 UTC
Created attachment 282033 [details]
debug patch to track why swap header is invalid

Hi, 
Since I have no idea why resume was triggered during hibernation, here's a debug patch to track down why this happened.
Could you please test it on the latest upstream kernel and provide the log displayed
during the hibernation hang?
PS1: Please provide the output of ls -l /dev/dm-1 before running hibernation.
PS2: Please use echo disk > /sys/power/state to trigger the hibernation.
Comment 9 djl 2019-03-26 15:54:44 UTC
Thanks for keeping up with this ticket!

> PS1 : Please provide ls -l /dev/dm-1 before running hibernation.

```
$ ls -l /dev/dm-1
brw-rw---- 1 root disk 253, 1 Mar 26 15:39 /dev/dm-1
```


> PS2: Please use echo disk > /sys/power/state to trigger the hibernation.

Here's the new output from the patch:

```
PM: Image not found (code -22), resume device (253:2)
Call Trace:
 dump_stack+0x5c/0x80
 swsusp_check+0xdd/0x170
 software_resume+0xec/0x210
 resume_store+0x7d/0xa0
 kernfs_fop_write+0x116/0x190
 __vfs_write+0x36/0x1b0
 ? handle_mm_fault+0x10a/0x250
 vfs_write+0xa9/0x1a0
 ksys_write+0x4f/0xb0
 do_syscall_64+0x5b/0x170
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7eff5db5e818
Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 25 6d 0d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
RSP: 002b:00007fff8f9f5d18 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007eff5db5e818
RDX: 0000000000000005 RSI: 000055b8dc3ed120 RDI: 0000000000000001
RBP: 000055b8dc3ed120 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007eff5dc315c0
R13: 0000000000000005 R14: 00007eff5dc2c5c0 R15: 0000000000000005
```
Comment 10 Chen Yu 2019-07-01 09:43:35 UTC
(In reply to djl from comment #9)
> Thanks for keeping up with this ticket!
> 
> > PS1 : Please provide ls -l /dev/dm-1 before running hibernation.
> 
> ```
> $ ls -l /dev/dm-1
> brw-rw---- 1 root disk 253, 1 Mar 26 15:39 /dev/dm-1
> ```
> 
> 
> > PS2: Please use echo disk > /sys/power/state to trigger the hibernation.
> 
> Here's the new output from the patch:
> 
> ```
> PM: Image not found (code -22), resume device (253:2)
> Call Trace:
>  dump_stack+0x5c/0x80
>  swsusp_check+0xdd/0x170
>  software_resume+0xec/0x210
>  resume_store+0x7d/0xa0
>  kernfs_fop_write+0x116/0x190
>  __vfs_write+0x36/0x1b0
>  ? handle_mm_fault+0x10a/0x250
>  vfs_write+0xa9/0x1a0
>  ksys_write+0x4f/0xb0
>  do_syscall_64+0x5b/0x170
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> RIP: 0033:0x7eff5db5e818
> Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48
> 8d 05 25 6d 0d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff
> 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
> RSP: 002b:00007fff8f9f5d18 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007eff5db5e818
> RDX: 0000000000000005 RSI: 000055b8dc3ed120 RDI: 0000000000000001
> RBP: 000055b8dc3ed120 R08: 0000000000000001 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 00007eff5dc315c0
> R13: 0000000000000005 R14: 00007eff5dc2c5c0 R15: 0000000000000005
> ```

Okay, I got some time to look at this. 
It appears that user space (Debian, in this case) tries to trigger a resume while the system is entering hibernation. To narrow this down, how about booting the system with "init=/bin/bash" appended to the kernel command line in GRUB, and then doing the S4 test?
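
Before running the S4 test, it may be worth confirming that the machine really booted with the extra parameter. A sketch that looks up a key in the kernel command line; `cmdline_value` is a hypothetical helper, and the optional file argument defaults to `/proc/cmdline`:

```shell
# cmdline_value KEY [CMDLINE_FILE]
# Prints the value of the first KEY=value word on the kernel command line;
# fails if the key is not present.
cmdline_value() {
    awk -v key="$1" '
        {
            for (i = 1; i <= NF; i++) {
                # Match only words that start with "KEY=".
                if (index($i, key "=") == 1) {
                    print substr($i, length(key) + 2)
                    found = 1
                    exit
                }
            }
        }
        END { exit !found }
    ' "${2:-/proc/cmdline}"
}
```

In such a boot, `cmdline_value init` should print `/bin/bash`. Depending on the initramfs, `/proc` and `/sys` may need to be mounted by hand in that shell (for example `mount -t proc proc /proc` and `mount -t sysfs sysfs /sys`) before `/sys/power/state` is available.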
Comment 11 djl 2019-07-05 11:19:26 UTC
> Okay, I got some time to look at this. 
> This appears to be that Debian in user space tries to trigger the resume when the system is falling to hibernation. In order to narrow down, how about booting the system with the following command line append in grub: "init=/bin/bash" , then do the S4 test.

I'm not running Debian on the affected machine (it runs Arch Linux), so this may not be relevant to me, but I went ahead and ran the test anyway and noticed something new.

Now it correctly suspends and hibernates but no longer resumes. After waking, the display is blank. Like the original problem, I need to force the machine to power down and reboot before the display works again.

This happens both with and without "init=/bin/bash".

I actually noticed this a few weeks ago and assumed that it was a problem with my personal configuration. Since this still happens with "init=/bin/bash", this is unlikely to be a configuration problem on my end.

My kernel version is 5.1.15.
Comment 12 Chen Yu 2019-07-05 14:57:10 UTC
(In reply to djl from comment #11)
> > Okay, I got some time to look at this. 
> > This appears to be that Debian in user space tries to trigger the resume
> > when the system is falling to hibernation. In order to narrow down, how
> > about booting the system with the following command line append in grub:
> > "init=/bin/bash" , then do the S4 test.
> 
> I'm not running Debian on the affected machine (but Arch Linux) so this may
> not be relevant to me but went ahead and ran the test anyway and noticed
> something new.
> 
> Now it correctly suspends and hibernates but no longer resumes. After
> waking, the display is blank. Like the original problem, I need to force the
> machine to power down and reboot before the display works again.
> 
This seems to be a different issue now. Is it possible that the system has resumed but there is no display because of a graphics driver issue? Could you please blacklist the graphics driver and, after resuming, blindly type 'reboot' to check whether it works? Also, without the graphics driver loaded, does test_resume mode work?
> This happens both with and without "init=/bin/bash".
> 
> I actually noticed this a few weeks ago and assumed that it was a problem
> with my personal configuration. Since this still happens "init=/bin/bash"
> this is unlikely to be a configuration problem on my end.
> 
> My kernel version is 5.1.15.
Comment 13 djl 2019-07-06 09:34:17 UTC
> This seems to be another issue now. Is it possible that the system has
> resumed, however there's no display due to graphic driver issue? Could
> you please help check if blacklist the graphic driver and then after
> resumed blindly type 'reboot' if it works?

Booting without a graphics driver doesn't work. Typing "reboot" after resuming doesn't reboot the machine.

> Also w/o graphic driver loaded, does test_resume mode works?

Without a graphics driver, test_resume *does* work.
Comment 14 Chen Yu 2019-09-09 09:23:27 UTC
(In reply to djl from comment #13)
> > This seems to be another issue now. Is it possible that the system has
> > resumed, however there's no display due to graphic driver issue? Could
> > you please help check if blacklist the graphic driver and then after
> > resumed blindly type 'reboot' if it works?
> 
> Booting without a graphics driver doesn't work. Typing "reboot" after
> resuming doesn't reboot the machine.
> 
> > Also w/o graphic driver loaded, does test_resume mode works?
> 
> Without a graphics driver, test_resume *does* work.
Okay, to summarize: the 'Image not found' error is gone, but the system hangs when the graphics driver resumes. Is it the i915 driver?
Comment 15 djl 2019-09-10 14:31:04 UTC
> Okay, to summary, the 'Image not found' error was gone, but it hangs when graphic driver resumes.

Correct.

> Is it i915 driver?

No, this happens with the (proprietary) Nvidia drivers. It also happened with the Nouveau drivers but that was quite a while ago and I haven't been able to test with more recent versions yet. I'll try to find time this weekend to test this again.
Comment 16 djl 2020-07-21 10:19:02 UTC
Sorry for the super long delay :/

This turned out to be a bug in the Nvidia driver, both proprietary and Nouveau. I've recently had to move to an AMD GPU and this is no longer a problem.

Given that I seem to be the only one with this problem - or at least the only one reporting a problem - it's entirely possible that my old Nvidia GPU was faulty.
