Bug 66741 - suspend to disk(hibernate) resumes the first time but black screen on the second time
Summary: suspend to disk(hibernate) resumes the first time but black screen on the sec...
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - Intel)
Hardware: All Linux
Importance: P1 normal
Assignee: Imre Deak
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-12-07 09:28 UTC by Giorgos aka shad0w
Modified: 2014-04-11 14:38 UTC

See Also:
Kernel Version: 3.11.10
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg.output (209.96 KB, text/plain)
2013-12-07 09:28 UTC, Giorgos aka shad0w
commit bisect log (2.76 KB, text/plain)
2014-01-07 20:36 UTC, b3nmore
bisect.log 2nd try (4.70 KB, text/plain)
2014-01-11 20:00 UTC, b3nmore
revert bisected patch (597 bytes, patch)
2014-01-14 10:48 UTC, Daniel Vetter

Description Giorgos aka shad0w 2013-12-07 09:28:39 UTC
Created attachment 117781
dmesg.output

I have a Haswell i5-4570 CPU with the Intel HD 4600 GPU. Resuming from suspend to disk works the first time, but the second time I get a black screen and the monitor goes to standby.

I have attached the dmesg output from doing this twice.
Comment 1 Daniel Vetter 2013-12-09 10:43:58 UTC
Is this a regression? Have you tried older kernels?

Can you please try with intel_iommu=off added to your kernel cmdline? There's a WARN backtrace in dmesg about this.
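After rebooting with the extra parameter, it can be worth confirming that the running kernel actually picked it up by reading /proc/cmdline. A minimal sketch, assuming Python is available on the test machine; the helper names `kernel_param` and `iommu_disabled` are hypothetical, and the simple whitespace split ignores quoted values that contain spaces:

```python
def kernel_param(cmdline, name):
    """Return the value of `name` in a kernel command line string.

    Returns "" for a bare flag, the text after '=' for name=value,
    and None if the parameter is absent. If the parameter is given
    more than once, this sketch keeps the last one (an assumption).
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name:
            value = val if sep else ""
    return value


def iommu_disabled(cmdline):
    # True only if the command line carries intel_iommu=off.
    return kernel_param(cmdline, "intel_iommu") == "off"
```

On the affected machine the check would be `iommu_disabled(open("/proc/cmdline").read())`.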

Finally please test latest upstream kernels, preferably drm-intel-nightly from http://cgit.freedesktop.org/~danvet/drm-intel/

Also, is the system otherwise working (ssh access or so)? Or what's the reason you've filed this against the gfx driver?
Comment 2 b3nmore 2014-01-07 20:35:09 UTC
I'm having the same issue on an i5-4570S, and dmesg shows the same two warnings as in the attached dmesg.output (encoder's hw state doesn't match sw tracking (expected 1, found 0); crtc's computed active state doesn't match tracked active state (expected 1, found 0)). So I believe I'm seeing the same issue and can fill in some details.
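For anyone comparing their own logs against this report, the two warnings quoted above can be searched for mechanically. A rough sketch, assuming the dmesg output has been saved to a text file and each warning sits on a single line; the function name `find_state_mismatches` is hypothetical, and the match strings are copied from the warnings quoted in this comment:

```python
import re

# Substrings taken from the two warnings quoted in this report.
STATE_WARNINGS = (
    "encoder's hw state doesn't match sw tracking",
    "crtc's computed active state doesn't match tracked active state",
)

_EXPECTED_FOUND = re.compile(r"\(expected (\d+), found (\d+)\)")


def find_state_mismatches(dmesg_text):
    """Return (warning, expected, found) tuples for matching log lines."""
    hits = []
    for line in dmesg_text.splitlines():
        for warning in STATE_WARNINGS:
            if warning in line:
                m = _EXPECTED_FOUND.search(line)
                if m:
                    expected, found = int(m.group(1)), int(m.group(2))
                else:
                    expected = found = None
                hits.append((warning, expected, found))
    return hits
```

An empty result after the second resume would suggest a different failure than the one tracked here.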

I'll try to summarize what I've found out so far. The full history can be found at https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1250498 . Note that I need to set /sys/power/disk to [shutdown] to get consistent behavior (but this may be another issue).

Starting with (ubuntu mainline) kernel version 3.10-rc1, resume from S4 only works correctly once. The second time the monitor goes into powersave mode, but ssh is still possible. dmesg shows the two mentioned warnings at drivers/gpu/drm/i915/intel_display.c.
Kernel 3.9.11 does not have this issue, so I guess it's a regression. A full commit bisect showed that 5aa1c98862d3f365d9cf6d0833d5dc127d2a76e7 is the first bad commit (I did this with /sys/power/disk=[platform], however, but I think the result should still be valid).

The latest mainline kernel I've tested, which also failed, was 3.13-rc6.
Comment 3 b3nmore 2014-01-07 20:36:54 UTC
Created attachment 121261
commit bisect log
Comment 4 Daniel Vetter 2014-01-08 16:35:49 UTC
If this bisect hasn't gone awry it would point at an issue or at least interaction with disk drivers. b3nmore, can you please double check that the preceding commits to 5aa1c98862d3f365d9cf6d0833d5dc127d2a76e7 really work and that the commit itself is broken?

It's a merge commit, so you can't just revert it to validate the bisect result.
Comment 5 b3nmore 2014-01-09 22:19:20 UTC
> If this bisect hasn't gone awry ...

It did. I guess the mode of /sys/power/disk has a bigger influence than I had anticipated. However, I believe the assertion that we have a regression between 3.9.11 and 3.10-rc1 is still valid.
So I'm going to do the bisect again. I will count the following as a successful test: set /sys/power/disk to [reboot] and resume two times (by writing disk to /sys/power/state). I will skip any build that does not resume to either a sane state or a system that works apart from video output.
Hopefully this approach will lead to better results (if someone thinks otherwise, please let me know; I'm doing this for the first, well, the second time).
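The test cycle described in this comment (select the [reboot] mode, then write disk to /sys/power/state) could be scripted roughly as follows. This is only a sketch under stated assumptions: it must run as root, the function names `selected_disk_mode` and `hibernate_once` are hypothetical, and it relies on /sys/power/disk marking the active mode with square brackets, as in the [reboot] and [platform] notation used in this thread:

```python
import re
from pathlib import Path


def selected_disk_mode(contents):
    """Return the bracketed (currently selected) entry of /sys/power/disk."""
    m = re.search(r"\[(\w+)\]", contents)
    return m.group(1) if m else None


def hibernate_once(mode="reboot", power_dir="/sys/power"):
    """Select a hibernation mode, then trigger suspend to disk (needs root).

    The write to `state` returns only after the machine has resumed.
    """
    power = Path(power_dir)
    (power / "disk").write_text(mode)
    # Read the file back to make sure the kernel accepted the mode.
    chosen = selected_disk_mode((power / "disk").read_text())
    if chosen != mode:
        raise RuntimeError(f"kernel selected {chosen!r}, wanted {mode!r}")
    (power / "state").write_text("disk")
```

A build would then be judged by calling `hibernate_once()` twice and checking video output and dmesg after each resume.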
Comment 6 Daniel Vetter 2014-01-10 07:41:40 UTC
If we're very unlucky the bug has been there always and something just made it much easier to hit. That then often results in very strange bisect runs ...

Otherwise I'd just redo the bisect and keep the bisect log around (git bisect log) at the end. In case something goes awry you can then retest certain kernels.
Comment 7 b3nmore 2014-01-11 20:00:07 UTC
Created attachment 121651
bisect.log 2nd try

Finished bisecting. Judging from the commit message, this time it converged on a more reasonable commit:

24576d23976746cb52e7700c4cadbf4bc1bc3472 is the first bad commit
commit 24576d23976746cb52e7700c4cadbf4bc1bc3472
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date:   Tue Mar 26 09:25:45 2013 -0700

    drm/i915: enable VT switchless resume v3
    
    With the other bits in place, we can do this safely.
    
    v2: disable backlight on suspend to prevent premature enablement on resume
    v3: disable CRTCs on suspend to allow RTD3 (Kristen)
    
    Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

:040000 040000 72fb279ebc4c48824015f43eac459d046dde3204 6797fd617ffd75b6e2a457aaabf5e90f9a4021c0 M	drivers
Comment 8 b3nmore 2014-01-11 20:10:31 UTC
Just a note: I had to skip several commits during the bisect (cf. bisect.log) because the resulting kernels didn't fit my test case from #5 (they wouldn't boot at all, or the system hung completely after the 1st/2nd resume). So there is still a chance that we caught the wrong commit.
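To get a feel for how much the skips may have weakened the result, the verdicts recorded in a saved bisect log can be tallied. A small sketch, assuming the file holds the output of `git bisect log`, where each verdict is recorded as a `git bisect good|bad|skip <rev>` line; the function name `bisect_tally` is hypothetical:

```python
from collections import Counter


def bisect_tally(log_text):
    """Count good/bad/skip verdicts recorded in `git bisect log` output."""
    tally = Counter()
    for line in log_text.splitlines():
        parts = line.split()
        # Verdict lines look like: git bisect <good|bad|skip> <rev>
        # Comment lines (starting with '#') and 'git bisect start' are ignored.
        if len(parts) >= 3 and parts[:2] == ["git", "bisect"]:
            if parts[2] in ("good", "bad", "skip"):
                tally[parts[2]] += 1
    return tally
```

A high skip count relative to good and bad verdicts is a hint that the reported first bad commit deserves a second check. The saved log can also be re-applied with `git bisect replay <logfile>` to resume from the same state.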
Comment 9 Daniel Vetter 2014-01-14 10:48:16 UTC
Created attachment 122011
revert bisected patch

Yeah, that makes much more sense. To make triple-sure we're chasing the right thing the attached patch should revert the effects of the bisected commit. Please test.
Comment 10 b3nmore 2014-01-14 17:36:02 UTC
I applied the patch to v3.13-rc8, but it didn't solve the issue. Just to be sure I manually patched the first bad commit (24576d23976746cb52e7700c4cadbf4bc1bc3472, intel_fbdev.c was intel_fb.c at this time), but got the same error as usual.

I guess this leaves us with the changes in i915_drv.c.
Comment 11 b3nmore 2014-02-11 07:14:01 UTC
Whatever fixes/changes went into 3.14-rc1, something among them seems to have fixed this issue. Even the weird inconsistency when using different /sys/power/disk modes is gone.

I'm going to monitor it for any regressions as 3.14 matures, but I guess we can close this bug.
Comment 12 b3nmore 2014-04-01 06:25:25 UTC
I didn't find any regression for this issue in any of the 3.14-rcX kernels or the final release.

So this issue is definitively fixed for me now.
Comment 13 Daniel Vetter 2014-04-11 14:38:45 UTC
Yay! Thanks a lot for the update and reporting this bug. Please reopen if this issue resurfaces.
