I have a problem that has existed for a long time. As far as I can remember, it exists in all or nearly all 3.x versions, and maybe even in 2.6.38 or so...
The problem is that SOMETIMES the system reboots instead of resuming from suspend to disk (with both swsusp and uswsusp). It reads the image normally until 100%, but then just reboots, and the next boot proceeds as if there were no hibernation image.
MOST of the time resume works without problems...
Can you suggest how to collect more information on this problem?
Could you try the latest v3.16-rc6 kernel?
Please read /etc/fstab and find your swap partition, say /dev/sdax, then boot the kernel with the option "resume=/dev/sdax". Can you still reproduce the problem then?
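A minimal sketch of that check, assuming a conventional whitespace-separated /etc/fstab and that the boot command line is readable from /proc/cmdline. The function names swap_devices and resume_param are made up for illustration:

```shell
# Print the device field of every swap entry in an fstab-style file.
# Defaults to /etc/fstab; comment lines are skipped.
swap_devices() {
    awk '$1 !~ /^#/ && $3 == "swap" { print $1 }' "${1:-/etc/fstab}"
}

# Print the resume= parameter from a kernel command line file, if present.
# Defaults to /proc/cmdline; prints nothing if no resume= option was given.
resume_param() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" | grep '^resume='
}
```

Running swap_devices and then resume_param shows whether the two agree. Note that fstab may list the swap area as UUID=... rather than a /dev/sdaX path, in which case the two strings will not match literally.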
Oops. Sorry. I've probably missed the email notifications about the previous comments...
I'm now on the 3.16 kernel on both laptops, and both of them (one newer Samsung 880Z5E with IVB Core i7-3635QM, one older Dell Studio XPS 16 with Core i7 Q 720) have this problem.
The resume=/dev/sdaX option is of course already on the kernel command line. The problem isn't related to finding the resume image; the resume image is found and read successfully, but after that something goes wrong and the laptop reboots.
I don't know what conditions lead to it - there's nothing special going on when the problem occurs.
(It reproduces randomly, maybe every 10th time or so...)
Please refer to https://www.kernel.org/doc/Documentation/power/basic-pm-debugging.txt. Basically, you can do:
# cd /sys/power
# echo devices > pm_test
# echo disk > state
multiple times and see if this triggers any error.
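To repeat the test without retyping, the commands above can be wrapped in a loop. This is only a sketch: the function name run_pm_test_cycles and the POWER_DIR override are made up for illustration, and POWER_DIR exists only so the loop can be dry-run against a scratch directory.

```shell
# Directory holding the pm_test and state files (normally /sys/power).
POWER_DIR="${POWER_DIR:-/sys/power}"

# Run the "devices" hibernation test the given number of times (needs root).
run_pm_test_cycles() {
    count="$1"
    i=1
    while [ "$i" -le "$count" ]; do
        echo "cycle $i of $count"
        # Select the test level, then start a (test-mode) hibernation.
        echo devices > "$POWER_DIR/pm_test" || return 1
        echo disk > "$POWER_DIR/state" || return 1
        i=$((i + 1))
    done
    # Switch back to normal, non-test hibernation.
    echo none > "$POWER_DIR/pm_test"
}
```

For example, run_pm_test_cycles 20 as root, then look through dmesg for new errors.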
I tried the suggestion from comment 5 on the Samsung 20 times. Nothing happened, and there were no errors in dmesg except:
[ 3058.621722] radeon 0000:01:00.0: ring 5 stalled for more than 10000msec
[ 3058.621726] radeon 0000:01:00.0: GPU lockup (waiting for 0x000000000000002e last fence id 0x000000000000002c on ring 5)
[ 3058.621729] [drm:uvd_v1_0_ib_test] *ERROR* radeon: fence wait failed (-35).
[ 3058.621733] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on ring 5 (-35).
but the primary card is the Intel HD4000, so this didn't affect the system. It also happened every time, so it can't be the cause of an error that only reproduces sometimes.
Meanwhile, it seems I was wrong about the older Dell XPS 16 - it had ~40 days of uptime, so the error didn't reproduce for those 40 days :) It could have had even more, but after the last suspend/resume the radeon GPU stalled and the screen went white, so I turned it off by holding the power button for 5 seconds.
The error still reproduces sometimes on the Samsung, though.
Then try pm_trace, which may give us some hint: https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-power
From your description, it seems that the image is restored, but then something goes wrong and the system isn't resumed correctly. pm_trace should catch that.
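A rough sketch of the pm_trace procedure follows. It assumes the kernel was built with PM trace support and that the results show up in the kernel log as "Magic number" and "hash matches" lines after the reboot. The function names and the POWER_DIR override are made up for illustration. Keep in mind that pm_trace stores its data in the RTC, so the system clock will be wrong after the test.

```shell
# Enable PM tracing before the next hibernation attempt (needs root).
# POWER_DIR is only an override for dry runs; it normally is /sys/power.
enable_pm_trace() {
    echo 1 > "${POWER_DIR:-/sys/power}/pm_trace"
}

# After the failed resume and the reboot, filter a kernel log given on
# stdin for the lines pm_trace is expected to print.
pm_trace_matches() {
    grep -E 'Magic number|hash matches'
}
```

Call enable_pm_trace, hibernate, and if the machine reboots instead of resuming, run dmesg | pm_trace_matches right after that boot. Any "hash matches" line should point at the device or source location reached last.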
Hm, I tried to suspend/resume after doing the pm_test runs (with pm_test set back to none), and on the second attempt got something new - a kernel panic with the error message visible on the screen... I took a photo with my phone :)
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffffa006b61f>] usb_device_match+0x2f/0x80 [usbcore]
Oops: 0002 [#1] SMP
Modules linked in: ........
CPU: 2 PID: 4565 Comm: systemd-sleep Not tainted 3.16-1-amd64 #1 Debian 3.16.2-3
Hardware name: SAMSUNG ELECTRONICS CO., LTD. 870Z5E/880Z5E/680Z5E/NP880Z5E-X01UB, BIOS P02ADH.008.130604.SK 06/04/2013
This was on the screen for several seconds, then some NMI dumps showed up. The system didn't resume, but it didn't reboot either.
I don't know if it's the same error or not; I've not seen this before.
Can you always reproduce this NULL pointer dereference bug?
If yes, I think it would be great to fix that first and see if it makes any difference.
BTW, it would be good to try to reproduce the problem in the latest upstream kernel.
Bug closed as there has been no response from the bug reporter for more than a month.
Please feel free to reopen it if you can reproduce the problem in the latest upstream kernel and provide the information requested in comment #10.
It seems it's in fact fixed in newer kernels (3.18? 3.19? 4.0?), because I've not seen it for several months. So everything is ok now, thanks :)