Bug 81191
| Summary: | Sometimes the laptop reboots just after resume from suspend to disk | | |
|---|---|---|---|
| Product: | Power Management | Reporter: | Vitaliy Filippov (vitalif) |
| Component: | Hibernation/Suspend | Assignee: | Zhang Rui (rui.zhang) |
| Status: | CLOSED INSUFFICIENT_DATA | | |
| Severity: | normal | CC: | aaron.lu, rui.zhang, tianyu.lan |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 3.14, 3.16 debian, most 3.x versions | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
Description
Vitaliy Filippov
2014-07-27 07:41:09 UTC
Could you try the latest v3.16-rc6 kernel?

Please read /etc/fstab and find out your swap partition, say, /dev/sdax, and then reboot the kernel with the boot option "resume=/dev/sdax". Can you reproduce this problem then?

Oops. Sorry. I've probably missed the email notifications about the previous comments... I'm now on the 3.16 kernel on both laptops, and both of them (a newer Samsung 880Z5E with an IVB Core i7-3635QM and an older Dell Studio XPS 16 with a Core i7 Q 720) have this problem. The resume=/dev/sdaX option is of course there in the command line. The problem isn't related to finding the resume image; the resume image is found and read successfully, but after that something goes wrong and the laptop reboots. I don't know what conditions lead to it - there's nothing special when the problem reproduces. (It reproduces randomly, maybe every 10th time or so...)

Please refer to https://www.kernel.org/doc/Documentation/power/basic-pm-debugging.txt. Basically, you can do:

    # cd /sys/power
    # echo devices > pm_test
    # echo disk > state

multiple times and see if this triggers any error.

ping...

Tried the suggestion from comment 5 20 times on the Samsung. Nothing happened, and there were no errors in dmesg except

    [ 3058.621722] radeon 0000:01:00.0: ring 5 stalled for more than 10000msec
    [ 3058.621726] radeon 0000:01:00.0: GPU lockup (waiting for 0x000000000000002e last fence id 0x000000000000002c on ring 5)
    [ 3058.621729] [drm:uvd_v1_0_ib_test] *ERROR* radeon: fence wait failed (-35).
    [ 3058.621733] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on ring 5 (-35).

but the primary card is the Intel HD4000, so this didn't affect the system; also, this happened every time, so it couldn't cause an error which only reproduces sometimes.
Meanwhile, it seems I was incorrect about the older Dell XPS 16 - it had ~40 days of uptime, so the error didn't reproduce for those 40 days :) It could have had even more, but after the last suspend/resume the radeon GPU stalled and the screen became white, so I turned it off by holding the power button for 5 seconds. The error still does reproduce sometimes on the Samsung though.

Then trying pm_trace may give us some hint: https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-power From your description, it seems that the image is restored and then something goes wrong and the system isn't resumed correctly, which should be caught by pm_trace.

Hm, I've tried to suspend/resume after doing pm_test (having it set to none) and on the second attempt got something new - a kernel panic with the error message visible on the screen... took a photo on my mobile :)

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [<ffffffffa006b61f>] usb_device_match+0x2f/0x80 [usbcore]
    PGD 0
    Oops: 0002 [#1] SMP
    Modules linked in: ........
    CPU: 2 PID: 4565 Comm: systemd-sleep Not tainted 3.16-1-amd64 #1 Debian 3.16.2-3
    Hardware name: SAMSUNG ELECTRONICS CO., LTD. 870Z5E/880Z5E/680Z5E/NP880Z5E-X01UB, BIOS P02ADH.008.130604.SK 06/04/2013
    ...registers...
    Call Trace:
    __device_attach+0x22/0x40
    bus_for_each_drv+0x53/0x90
    device_attach+0x98/0xc0
    rebind_marked_interfaces.isra.12+0x75/0xb0 [usbcore]
    usb_resume_complete+0x18/0x20 [usbcore]
    dpm_complete+0x11a/0x370
    hibernation_snapshot+0x18e/0x370
    hibernate+0x152/0x200
    state_store+0xcc/0xe0
    kernfs_fop_write+0xda/0x150
    vfs_write+0xb2/0x1f0
    SyS_write+0x42/0xa0
    page_fault+0x28/0x30
    system_call_fast_compare_end+0x10/0x15

This was on the screen for several seconds, then some NMI dumps showed up. The system didn't resume, but didn't reboot either. I don't know if it's the same error or not; I've not seen this before.

Can you always reproduce this NULL pointer dereference bug? If yes, I think it would be great to fix that first and see if there is any difference.
BTW, it would be good to try to reproduce the problem in the latest upstream kernel.

Ping

Bug closed as there has been no response from the bug reporter for more than a month. Please feel free to reopen it if you can reproduce the problem in the latest upstream kernel and provide the information requested in comment #10.

It seems it's in fact fixed in newer kernels (3.18? 3.19? 4.0?), because I've not seen it for several months. So everything is ok now, thanks :)
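
For anyone retrying the pm_test procedure suggested earlier in this thread, the "multiple times" part can be scripted. The following is only a minimal sketch: the function name `run_pm_test`, its directory argument and the default repeat count are illustrative assumptions, not something taken from the kernel documentation. It writes `devices` to `pm_test` and `disk` to `state` in a loop, then sets `pm_test` back to `none`.

```shell
#!/bin/sh
# Hypothetical helper: repeat the "devices" level hibernation test.
#   $1  directory holding pm_test and state (assumed default: /sys/power)
#   $2  number of test cycles (assumed default: 10)
run_pm_test() {
    pm_dir=${1:-/sys/power}
    count=${2:-10}
    i=1
    while [ "$i" -le "$count" ]; do
        # Select the test level, then start a hibernation test cycle.
        echo devices > "$pm_dir/pm_test" || return 1
        echo disk > "$pm_dir/state" || return 1
        i=$((i + 1))
    done
    # Switch test mode off again so the next hibernation is a real one.
    echo none > "$pm_dir/pm_test"
}
```

It would be called as root, for example `run_pm_test /sys/power 20`, and dmesg checked afterwards for new errors. The function stops at the first failed write, so a failing cycle is not silently skipped.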