Bug 119991

Summary: Screen is corrupt upon resume from hybrid-sleep with Radeon HD5xxx cards
Product: Drivers
Reporter: Jani-Markus Maunus (j.markus.maunus)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: NEW
Severity: normal
CC: felix.schwarz
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 4.6.1
Subsystem:
Regression: No
Bisected commit-id:

Description Jani-Markus Maunus 2016-06-11 02:41:35 UTC
When resuming from hybrid-sleep, the screen is blank/corrupt (usually white-ish, resembling static with some random colours, but sometimes black or a pattern of black/white boxes) and the GUI is essentially unusable as nothing is visible. The system is still responsive, however; I can switch to/from virtual consoles without issue (and ssh into the box). 

I can reproduce this every time, but only on resume from hybrid sleep - normal suspend works fine, as does hibernate. In fact, after resuming from hybrid sleep I can "fix" the corruption by hibernating the system, and then resuming - some corruption remains, but is quickly cleared by doing something simple such as alt-tabbing or moving/resizing the topmost window, and the system appears to work normally.

This does not happen with my HD 4670, but does appear with HD 5750 and HD 5850.

A possibly related symptom: sometimes both hibernate and hybrid-sleep take extraordinarily long (hard disks spin down normally, but the system remains powered with fans running for a considerable time, anywhere from ~10 seconds to several minutes after the disks have stopped). This happens with both 5-series cards as well, but not on the 4670. However, I have had this happen even on kernels which do not exhibit screen corruption after resume from hybrid sleep, and I can't consistently reproduce it.

I did a git bisect on Linus' tree and the first bad commit is b9729b17a414f99c61f4db9ac9f9ed987fa0cbfe
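
For readers unfamiliar with bisecting, the search behind the result above can be sketched roughly as follows. This is only an illustration: it assumes a linear history, and `is_bad` is a hypothetical callback standing in for "build this commit, boot it, hybrid-sleep, resume, and look at the screen". The real `git bisect` also copes with merges and skipped commits.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is True.

    Assumes `commits` is ordered oldest to newest, commits[0] is known
    good, commits[-1] is known bad, and every commit after the first bad
    one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the culprit is at mid or earlier
        else:
            lo = mid  # the culprit is after mid
    return commits[hi]


# Hypothetical history: the names are placeholders, not real commits.
history = [f"c{i}" for i in range(8)]
culprit = first_bad(history, lambda c: int(c[1:]) >= 5)
```

Each step halves the remaining range, so even a few thousand commits between two releases need only a dozen or so build-and-resume cycles.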
Comment 1 Felix Schwarz 2016-06-12 10:26:24 UTC
This commit seems to be a funny one, as it was reverted later on (and we even have a revert of the revert). If you revert b9729b17 on top of Linus' tree, does the problem go away for you?

Hopefully this isn't too much noise, as it might be a different problem, but I see similar symptoms with an HD6450 (also with regular suspend, though there it happens in most cases rather than always), and the problem goes away if I revert that commit.


Just for reference:
commit d57c0edfe00d3274b50f91ce3076ed0e82d28782
Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Wed Jul 8 14:08:12 2015 -0400

    Revert "Revert "drm/radeon: dont switch vt on suspend""
    
    This reverts commit ac9134906b3f5c2b45dc80dab0fee792bd516d52.
    
    We've fixed the underlying problem with cursors, so re-enable
    this.

commit ac9134906b3f5c2b45dc80dab0fee792bd516d52
Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Mon Jun 29 11:09:11 2015 -0400

    Revert "drm/radeon: dont switch vt on suspend"
    
    This reverts commit b9729b17a414f99c61f4db9ac9f9ed987fa0cbfe.
    
    This seems to break the cursor on resume for lots of systems.
    
    Cc: stable@vger.kernel.org
Comment 2 Jani-Markus Maunus 2016-06-12 13:17:13 UTC
The current master has fixed both the screen corruption and the abnormally slow hibernate/hybrid suspend, even without the revert. Both were still present as of 4.6.2 stable.
Comment 3 Jani-Markus Maunus 2016-06-13 07:20:17 UTC
I did a bisect out of curiosity, and in my case the following commit fixes both issues.

commit 274ad65c9d02bdcbee9bae045517864c3521d530
Author: Jérome Glisse <jglisse@redhat.com>
Date:   Fri Mar 18 16:58:39 2016 +0100

    drm/radeon: hard reset r600 and newer GPU when hibernating.
    
    Some GPU block like UVD and VCE require hard reset to be properly
    resume if there is no real powerdown of the asic like during various
    hibernation step. This patch perform such hard reset.
    
    Reviewed-by: Christian König <christian.koenig@amd.com>
    Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
    Cc: Alex Deucher <alexander.deucher@amd.com>
    Cc: Christian König <christian.koenig@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Also, to clarify, reverting b9729b17a4 on master made no difference, but reverting it on the commit immediately preceding the above also fixed both issues. In any case, the bug appears fixed for me.

I might be just rambling at this point, but 4.6.1 stable was definitely bugged for me, yet this commit is dated well before 4.6.1 release - any idea what's up with that?
Comment 4 Felix Schwarz 2016-06-13 12:13:45 UTC
(In reply to Jani-Markus Maunus from comment #3)
> I might be just rambling at this point, but 4.6.1 stable was definitely
> bugged for me, yet this commit is dated well before 4.6.1 release - any idea
> what's up with that?

Well, it seems to me that Jérôme's patch was merged via Alex's 'drm-next' and Dave's 'drm-next-4.7' branches, so it just missed the boat for 4.6. Also, the commit you pointed out above has no "stable" tag, so it won't be picked up automatically for point releases.
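
To illustrate why the author date does not decide which release contains a commit, here is a minimal sketch using a made-up, heavily simplified history. The commit names and the shape of the graph are assumptions for illustration only. What matters is whether the commit is reachable from the release tag through parent links; on a real tree, `git tag --contains <commit>` answers the same question.

```python
from typing import Dict, List, Set


def ancestors(parents: Dict[str, List[str]], tip: str) -> Set[str]:
    """Return `tip` plus every commit reachable from it via parent links."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen


# Hypothetical graph (placeholder names, not the real kernel history):
#   base -> v4.6 -> v4.6.1        release line
#   base -> fix  -> merge-4.7     fix authored early, merged after v4.6
parents = {
    "base": [],
    "v4.6": ["base"],
    "v4.6.1": ["v4.6"],
    "fix": ["base"],
    "merge-4.7": ["v4.6", "fix"],
}

fix_in_461 = "fix" in ancestors(parents, "v4.6.1")
fix_in_47 = "fix" in ancestors(parents, "merge-4.7")
```

In this toy graph the fix is older than both releases by date, yet only the 4.7 merge can reach it.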

If you really care about 4.6, you might want to ping the authors and/or the reviewers, who might be willing to nominate the patch for the stable series.
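
As a rough way to see the difference between the commits quoted in this thread, the sketch below scans a commit message for a "Cc: stable" trailer. The helper name `has_stable_tag` and the regular expression are assumptions made for this example, not part of any kernel tooling, and the two messages are abbreviated from the logs quoted above.

```python
import re

# Matches a trailer line such as "Cc: stable@vger.kernel.org",
# allowing leading indentation as printed by `git log`.
STABLE_CC = re.compile(
    r"^[ \t]*Cc:.*\bstable@(vger\.)?kernel\.org\b",
    re.IGNORECASE | re.MULTILINE,
)


def has_stable_tag(message: str) -> bool:
    """Return True if the commit message carries a stable Cc trailer."""
    return STABLE_CC.search(message) is not None


# Abbreviated from commit ac9134906b3f as quoted in comment 1.
revert_msg = (
    'Revert "drm/radeon: dont switch vt on suspend"\n'
    "\n"
    "    This seems to break the cursor on resume for lots of systems.\n"
    "\n"
    "    Cc: stable@vger.kernel.org\n"
)

# Abbreviated from commit 274ad65c9d02 as quoted in comment 3.
hard_reset_msg = (
    "drm/radeon: hard reset r600 and newer GPU when hibernating.\n"
    "\n"
    "    Cc: Alex Deucher <alexander.deucher@amd.com>\n"
    "    Cc: Christian König <christian.koenig@amd.com>\n"
)

revert_tagged = has_stable_tag(revert_msg)
hard_reset_tagged = has_stable_tag(hard_reset_msg)
```

Only the first message carries the trailer, which matches the observation that the hard-reset patch was not picked up for the 4.6 point releases.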