Bug 201855
Summary: | Hibernate and suspend fails with "Image not found (code -22)" | ||
---|---|---|---|
Product: | Drivers | Reporter: | djl |
Component: | Other | Assignee: | Chen Yu (yu.c.chen) |
Status: | CLOSED WILL_NOT_FIX | ||
Severity: | normal | CC: | djl, yu.c.chen |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 4.19 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: | debug patch to track why swap header is invalid |
Description
djl
2018-12-03 15:58:57 UTC
Steps to reproduce:

1. Install 4.19
2. Run `echo suspend > /sys/power/disk; echo disk > /sys/power/state`

Expected results:

The machine should write the hibernation image to swap then suspend to RAM as documented here: https://www.kernel.org/doc/Documentation/power/swsusp.txt

Actual results:

The hibernation image is correctly written to swap but never suspends to RAM. The kernel log shows "Image not found (code -22)" immediately after issuing the above commands. Rebooting the machine manually correctly resumes from the swap image.

Other info:

Doesn't happen with the last 4.18 release. Started with 4.19 and persists through 4.19.6.

Chen Yu

(In reply to djl from comment #0)
> The kernel log shows "Image not found (code -22)" immediately after
> issuing the above commands.

Does this mean the system returns to the shell after running the above commands, or does it hang there?

> Rebooting the machine manually correctly resumes from the swap image.

Please define "manually": do you mean pressing the power (reset) button?

Do other modes work, such as platform, shutdown, test_resume, etc.?

> Doesn't happen with the last 4.18 release. Started with 4.19 and persists
> through 4.19.6.

Previously there was an issue in Debian when using a customized device offset for hibernation, but it should have been fixed. If it still happens, could you please do a git bisect to find the bad commit?

djl

(In reply to Chen Yu from comment #1)
> Does this mean the system returns to the shell after running the above
> commands, or does it hang there?

The system hangs here.

> Please define "manually": do you mean pressing the power (reset) button?

Yes. I need to hold the power button until it shuts off.

> Do other modes work, such as platform, shutdown, test_resume, etc.?

platform and shutdown both suspend to disk (but not RAM) as expected. test_resume seems to exhibit the same behavior as "suspend".

> Previously there was an issue in Debian when using a customized device
> offset for hibernation, but it should have been fixed. If it still happens,
> could you please do a git bisect to find the bad commit?

Is this still relevant even if I'm not using Debian?
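The reproduction steps in the description, and the mode comparison in the replies (suspend versus platform, shutdown and test_resume), all come down to writing a mode to /sys/power/disk and then `disk` to /sys/power/state. A minimal POSIX sh sketch of that follows, assuming it is run as root on a kernel that exposes these sysfs files. The function name and its optional second argument (a directory to use instead of /sys/power, so the function can be tried on a scratch copy) are hypothetical, not part of the report.

```shell
#!/bin/sh
# Sketch: select a hibernation mode, then start hibernation, mirroring
#   echo <mode> > /sys/power/disk; echo disk > /sys/power/state
# Must run as root. The optional second argument (hypothetical, for trying
# the function on a scratch directory) replaces /sys/power.
hibernate_with_mode() {
    mode=$1
    power_dir=${2:-/sys/power}

    # /sys/power/disk lists the available modes with the current one in
    # brackets, e.g. "[platform] shutdown reboot suspend test_resume".
    if [ -z "$mode" ] || ! grep -qwF -- "$mode" "$power_dir/disk"; then
        echo "hibernate_with_mode: '$mode' not listed in $power_dir/disk" >&2
        return 1
    fi

    # Select the mode; stop if the write is rejected, so we never
    # hibernate with whatever mode was selected before.
    echo "$mode" > "$power_dir/disk" || return 1

    # Start hibernation. On real sysfs the machine powers down or suspends
    # here, and the write returns only after resume or on error.
    echo disk > "$power_dir/state"
}
```

As root, `hibernate_with_mode test_resume` would repeat the test_resume comparison and `hibernate_with_mode suspend` the original reproduction. The up-front check against /sys/power/disk gives a clearer message for a typo; the kernel also rejects an unknown mode, and the `|| return 1` after that write keeps the function from hibernating with the previously selected mode.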
I've tried a git bisect but after a few days of testing I haven't been able to find the problematic commit. If I find some more time I'll spin up some VMs to speed the process along.

Chen Yu

(In reply to djl from comment #0)
> The hibernation image is correctly written to swap but never suspends to
> RAM. The kernel log shows "Image not found (code -22)" immediately after
> issuing the above commands.

This confuses me, since this message is only printed in two cases:
1. during a normal resume from hibernation, or
2. during the test_resume phase.
Since you are testing 'suspend', this message should not appear.

Chen Yu

(In reply to djl from comment #2)
> Is this still relevant even if I'm not using Debian?

Not sure, it depends.

> I've tried a git bisect but after a few days of testing I haven't been able
> to find the problematic commit. If I find some more time I'll spin up some
> VMs to speed the process along.

How about reverting the following commit:

commit 355064675f1c997cea017ea64c8f2c216e5425d9
Author: Mario Limonciello <mario.limonciello@dell.com>
Date:   Wed Mar 28 12:01:09 2018 -0500

    PM / hibernate: Make passing hibernate offsets more friendly

djl

(In reply to Chen Yu from comment #4)
> How about reverting the following commit:
> commit 355064675f1c997cea017ea64c8f2c216e5425d9
> PM / hibernate: Make passing hibernate offsets more friendly

No luck, unfortunately. Reverting that commit still shows exactly the same problem.

Chen Yu

(In reply to djl from comment #5)
> No luck, unfortunately. Reverting that commit still shows exactly the same
> problem.

Thanks, David. Could you please describe the symptom in more detail? You wrote: "The kernel log shows "Image not found (code -22)" immediately after issuing the above commands: echo suspend > /sys/power/disk; echo disk > /sys/power/state". As far as I understand, the message "Image not found (code -22)" should not appear during the hibernate/suspend phase; it is only triggered during resume or in test_resume mode.

1. Please switch to the latest upstream kernel.
2. Provide the output of:
   # grep . /sys/power/*
   # swapon -s
   # cat /proc/partitions
Then try again:
3. echo test_resume > /sys/power/disk; echo disk > /sys/power/state
4. echo suspend > /sys/power/disk; echo disk > /sys/power/state

If it still does not work, the best way to figure it out is to use git bisect.
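Item 2 of the request above is essentially a consistency check: /sys/power/resume holds the MAJ:MIN pair of the configured resume device, and /proc/partitions and `swapon -s` show which device that is. A sketch that does the lookup in POSIX sh with awk; the function name and the optional file arguments (for running it on saved copies of the two files) are hypothetical.

```shell
#!/bin/sh
# Sketch: name the block device that /sys/power/resume points at by
# looking its MAJ:MIN up in /proc/partitions. Both file arguments are
# optional (hypothetical, so saved copies can be inspected).
resume_dev_name() {
    resume_file=${1:-/sys/power/resume}
    partitions=${2:-/proc/partitions}

    # /sys/power/resume holds "MAJ:MIN", e.g. "253:2"; "0:0" means unset.
    majmin=$(cat "$resume_file") || return 1
    maj=${majmin%%:*}
    min=${majmin#*:}

    # /proc/partitions rows are: major minor #blocks name.
    # Print the name of the matching row; exit non-zero if there is none.
    awk -v maj="$maj" -v min="$min" '
        NF >= 4 && $1 == maj && $2 == min { print $4; found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$partitions"
}
```

With the values posted in the reply below (resume `253:2`, and a `253 2 16777216 dm-2` row in /proc/partitions) this prints `dm-2`, the same device `swapon -s` lists as the active swap partition. A value of `0:0` matches no row, so the function returns non-zero.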
djl

> Could you please describe the symptom in more detail?
> As far as I understand, the message "Image not found (code -22)" should not
> appear during the hibernate/suspend phase; it is only triggered during
> resume or in test_resume mode.

Is there anything specific you'd like me to describe? So far I've given all the information I've been able to gather. I've run out of things to say :) But it appears I'm not the only one with this issue: https://www.reddit.com/r/archlinux/comments/a1xzh5/systemctl_hybridsleep_broken_in_linux419x/

> 1. Please switch to the latest upstream kernel.

Switched to 4.20.

> 2. Provide the output of:
> # grep . /sys/power/*

/sys/power/autosleep:off
/sys/power/disk:[platform] shutdown reboot suspend test_resume
/sys/power/image_size:13451300864
/sys/power/mem_sleep:s2idle [deep]
/sys/power/pm_async:1
/sys/power/pm_debug_messages:0
/sys/power/pm_freeze_timeout:20000
/sys/power/pm_print_times:0
/sys/power/pm_test:[none] core processors platform devices freezer
/sys/power/pm_trace:0
/sys/power/pm_trace_dev_match:ieee80211
/sys/power/pm_trace_dev_match:leds
/sys/power/pm_trace_dev_match:usbhid
/sys/power/pm_trace_dev_match:acpi
grep: /sys/power/pm_wakeup_irq: No data available
/sys/power/reserved_size:1048576
/sys/power/resume:253:2
/sys/power/resume_offset:0
/sys/power/state:freeze mem disk
/sys/power/wakeup_count:0

> # swapon -s

Filename    Type       Size      Used  Priority
/dev/dm-2   partition  16777212  0     -2

> # cat /proc/partitions

major minor  #blocks  name

   8        0 1953514584 sda
   8       16 1953514584 sdb
   8       32  250059096 sdc
   8       33     524288 sdc1
   8       34    1048576 sdc2
   8       35  247436615 sdc3
   8       48  250059096 sdd
   8       49     524288 sdd1
   8       50    1048576 sdd2
   8       51  248485191 sdd3
   8       80 3907018584 sdf
   8       81 3907017543 sdf1
   8       64  976762584 sde
   8       65      16384 sde1
   8       66  976744448 sde2
   9      127  247305536 md127
   9        1 1953383488 md1
 253        0  247303488 dm-0
 253        1  230522880 dm-1
 253        2   16777216 dm-2
 253        3 1953381440 dm-3

> 3. echo test_resume > /sys/power/disk; echo disk > /sys/power/state
> 4. echo suspend > /sys/power/disk; echo disk > /sys/power/state

These commands still show the same behavior under 4.20.

> If it still does not work, the best way to figure it out is to use git bisect.

I'm still working on this, but since this is a workstation, it's difficult to find time to constantly build and reboot.

Chen Yu

Created attachment 282033
debug patch to track why swap header is invalid
Hi,
Since I have no idea why a resume was triggered during hibernation, here is a debug patch to track down why this happens.
Could you please test it on the latest upstream kernel and provide the log displayed
during the hibernation hang?
PS1: Please provide the output of `ls -l /dev/dm-1` before starting hibernation.
PS2: Please use `echo disk > /sys/power/state` to trigger the hibernation.
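PS1 asks for `ls -l` on the device node; the two numbers it prints before the date (`253, 1` in the reply) are the major and minor numbers, the same pair /sys/power/resume stores as MAJ:MIN. A small sketch that prints them directly in that form, assuming GNU coreutils `stat`, whose `%t` and `%T` formats give the major and minor numbers in hexadecimal; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: print a device node's major and minor numbers as decimal
# "MAJ:MIN", the format /sys/power/resume uses. Assumes GNU coreutils stat,
# where %t and %T print the major and minor numbers in hex; -L follows
# symlinks such as those under /dev/mapper.
dev_majmin() {
    nums=$(stat -L -c '%t %T' -- "$1") || return 1
    set -- $nums                       # split e.g. "fd 2" into $1 and $2
    printf '%d:%d\n' "0x$1" "0x$2"     # printf accepts 0x-prefixed hex
}
```

Comparing `dev_majmin /dev/dm-2` with `cat /sys/power/resume` would show directly whether a node is the configured resume device; on the reporter's system /proc/partitions lists dm-2 as 253:2.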
djl

Thanks for keeping up with this ticket!

> PS1: Please provide the output of `ls -l /dev/dm-1` before starting hibernation.

```
$ ls -l /dev/dm-1
brw-rw---- 1 root disk 253, 1 Mar 26 15:39 /dev/dm-1
```

> PS2: Please use `echo disk > /sys/power/state` to trigger the hibernation.

Here's the new output from the patch:

```
PM: Image not found (code -22), resume device (253:2)
Call Trace:
 dump_stack+0x5c/0x80
 swsusp_check+0xdd/0x170
 software_resume+0xec/0x210
 resume_store+0x7d/0xa0
 kernfs_fop_write+0x116/0x190
 __vfs_write+0x36/0x1b0
 ? handle_mm_fault+0x10a/0x250
 vfs_write+0xa9/0x1a0
 ksys_write+0x4f/0xb0
 do_syscall_64+0x5b/0x170
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7eff5db5e818
Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 25 6d 0d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
RSP: 002b:00007fff8f9f5d18 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007eff5db5e818
RDX: 0000000000000005 RSI: 000055b8dc3ed120 RDI: 0000000000000001
RBP: 000055b8dc3ed120 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007eff5dc315c0
R13: 0000000000000005 R14: 00007eff5dc2c5c0 R15: 0000000000000005
```

Chen Yu

(In reply to djl from comment #9)
> PM: Image not found (code -22), resume device (253:2)
> Call Trace:
>  dump_stack+0x5c/0x80
>  swsusp_check+0xdd/0x170
>  software_resume+0xec/0x210
>  resume_store+0x7d/0xa0
>  kernfs_fop_write+0x116/0x190

Okay, I got some time to look at this. It appears that Debian user space tries to trigger a resume while the system is entering hibernation. To narrow it down, how about booting the system with "init=/bin/bash" appended to the kernel command line in GRUB, and then doing the S4 test?

djl

> Okay, I got some time to look at this.
> It appears that Debian user space tries to trigger a resume while the
> system is entering hibernation. To narrow it down, how about booting the
> system with "init=/bin/bash" appended to the kernel command line in GRUB,
> and then doing the S4 test?
I'm not running Debian on the affected machine (it runs Arch Linux), so this may not be relevant to me, but I went ahead and ran the test anyway and noticed something new.
Now it correctly suspends and hibernates but no longer resumes. After waking, the display is blank. Like the original problem, I need to force the machine to power down and reboot before the display works again.
This happens both with and without "init=/bin/bash".
I actually noticed this a few weeks ago and assumed that it was a problem with my personal configuration. Since this still happens with "init=/bin/bash", it is unlikely to be a configuration problem on my end.
My kernel version is 5.1.15.
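The call trace from the debug patch runs resume_store() -> software_resume() -> swsusp_check(). resume_store() handles writes to /sys/power/resume, and storing a MAJ:MIN there makes the kernel call software_resume(), which looks for a hibernation image on that device; this is why Chen suspected user space. Besides the init=/bin/bash test, one rough way to look for such a writer is a text search for the path. A sketch follows; the function name is hypothetical, and which directories to search is left to the caller because the report never identifies the writer.

```shell
#!/bin/sh
# Sketch: list files under the given paths whose text mentions
# /sys/power/resume itself (resume_offset is deliberately not matched).
# Writing MAJ:MIN to that file runs software_resume(), the path seen in
# the call trace. Errors (e.g. unreadable files) are discarded.
find_resume_writers() {
    grep -rlE -- '/sys/power/resume([^_]|$)' "$@" 2>/dev/null
}
```

For example, `find_resume_writers /etc /usr/lib` (directories chosen only as a guess). A text search cannot find programs that build the path at run time, so an empty result does not rule anything out.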
Chen Yu

(In reply to djl from comment #11)
> Now it correctly suspends and hibernates but no longer resumes. After
> waking, the display is blank. Like the original problem, I need to force the
> machine to power down and reboot before the display works again.

This seems to be a different issue now. Is it possible that the system has resumed, but there is no display output because of a graphics driver issue? Could you please check by blacklisting the graphics driver and then, after resuming, blindly typing 'reboot' to see if it works? Also, without the graphics driver loaded, does test_resume mode work?

djl

> This seems to be a different issue now. Is it possible that the system has
> resumed, but there is no display output because of a graphics driver issue?
> Could you please check by blacklisting the graphics driver and then, after
> resuming, blindly typing 'reboot' to see if it works?

Booting without a graphics driver doesn't work. Typing "reboot" after resuming doesn't reboot the machine.

> Also, without the graphics driver loaded, does test_resume mode work?

Without a graphics driver, test_resume *does* work.

Chen Yu

(In reply to djl from comment #13)
> Booting without a graphics driver doesn't work. Typing "reboot" after
> resuming doesn't reboot the machine.
>
> Without a graphics driver, test_resume *does* work.

Okay, to summarize: the 'Image not found' error is gone, but the system hangs when the graphics driver resumes. Is it the i915 driver?

djl

> Okay, to summarize: the 'Image not found' error is gone, but the system
> hangs when the graphics driver resumes.

Correct.

> Is it the i915 driver?

No, this happens with the (proprietary) Nvidia drivers. It also happened with the Nouveau drivers, but that was quite a while ago and I haven't been able to test with more recent versions yet. I'll try to find time this weekend to test this again.

djl

Sorry for the super long delay :/ This turned out to be a bug in the Nvidia driver, both proprietary and Nouveau. I've recently had to move to an AMD GPU and this is no longer a problem. Given that I seem to be the only one with this problem (or at least the only one reporting it), it's entirely possible that my old Nvidia GPU was faulty.
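Chen's i915 question comes down to which GPU kernel module is loaded. A sketch that reads /proc/modules (the first column is the module name) and prints any of a few GPU drivers it finds. The module list is an assumption covering the drivers named in this report (i915, nouveau, nvidia) plus two AMD ones (amdgpu, radeon), and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: print which of a few GPU kernel modules appear in /proc/modules
# (first column = module name). The module list is an assumption; extend
# it as needed. The optional argument (hypothetical) reads a saved copy.
loaded_gpu_drivers() {
    awk '$1 ~ /^(i915|nouveau|nvidia|amdgpu|radeon)$/ { print $1 }' \
        "${1:-/proc/modules}"
}
```

To test without one of these drivers, as Chen suggested, the kernel command line option `modprobe.blacklist=nouveau` (documented in modprobe.d(5)) is one way to keep modprobe from loading a module.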