Bug 206393 - amdgpu: garbled screen after resume
Summary: amdgpu: garbled screen after resume
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-02-02 20:42 UTC by Bjoern Franke
Modified: 2020-04-20 10:04 UTC

See Also:
Kernel Version: 5.5.1-arch1, 5.5-arch1
Tree: Mainline
Regression: Yes


Attachments
garbled screen (1.72 MB, image/jpeg)
2020-02-02 20:42 UTC, Bjoern Franke
dmesg (78.37 KB, text/plain)
2020-02-02 20:42 UTC, Bjoern Franke
lspci (14.34 KB, text/plain)
2020-02-02 20:43 UTC, Bjoern Franke

Description Bjoern Franke 2020-02-02 20:42:14 UTC
Created attachment 287081
garbled screen

On a ThinkPad A275 with a Radeon R5/R6/R7 GPU (verbose lspci output attached), the screen is garbled after resume. First the screen goes black, then more and more white pixels appear, spreading from the border toward the middle of the screen.

Restarting the X session via SSH does not help, and the VTs are also inaccessible.
Comment 1 Bjoern Franke 2020-02-02 20:42:46 UTC
Created attachment 287083
dmesg
Comment 2 Bjoern Franke 2020-02-02 20:43:08 UTC
Created attachment 287085
lspci
Comment 3 Alex Deucher 2020-02-03 14:59:45 UTC
Is this a regression?  If so, can you bisect?
Comment 4 Bjoern Franke 2020-02-03 15:04:56 UTC
It's a regression; it did not occur with 5.4.15. The error message "[    1.549880] [drm:dm_helpers_parse_edid_caps [amdgpu]] *ERROR* Couldn't read SADs: -2" near the start of dmesg also does not appear with 5.4.15.

I will try to bisect it (my last bisect was 8 years ago).
Comment 5 Bjoern Franke 2020-02-03 21:48:53 UTC
Bisected to
1ea8751bd28d1ec2b36a56ec6bc1ac28903d09b4 is the first bad commit
commit 1ea8751bd28d1ec2b36a56ec6bc1ac28903d09b4
Author: Noah Abradjian <noah.abradjian@amd.com>
Date:   Fri Sep 27 16:30:57 2019 -0400

    drm/amd/display: Make clk mgr the only dto update point
    
    [Why]
    
    * Clk Mgr DTO update point did not cover all needed updates, as it included a
      check for plane_state which does not exist yet when the updater is called on
      driver startup
    * This resulted in another update path in the pipe programming sequence, based
      on a dppclk update flag
    * However, this alternate path allowed for stray DTO updates, some of which would
      occur in the wrong order during dppclk lowering and cause underflow
    
    [How]
    
    * Remove plane_state check and use of plane_res.dpp->inst, getting rid
      of sequence dependencies (this results in extra dto programming for unused
      pipes but that doesn't cause issues and is a small cost)
    * Allow DTOs to be updated even if global clock is equal, to account for
      edge case exposed by diags tests
    * Remove update_dpp_dto call in pipe programming sequence (leave update to
      dppclk_control there, as that update is necessary and shouldn't occur in clk
      mgr)
    * Remove call to optimize_bandwidth when committing state, as it is not needed
      and resulted in sporadic underflows even with other fixes in place
    
    Signed-off-by: Noah Abradjian <noah.abradjian@amd.com>
    Reviewed-by: Jun Lei <Jun.Lei@amd.com>
    Acked-by: Leo Li <sunpeng.li@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

 .../gpu/drm/amd/display/dc/clk_mgr/dcn20/dcn20_clk_mgr.c   | 14 +++++++++-----
 drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c  |  3 ++-
 drivers/gpu/drm/amd/display/dc/core/dc.c                   |  4 ----
 drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c         |  8 +-------
 4 files changed, 12 insertions(+), 17 deletions(-)

In contrast to bug #205915, it is not "fixable" by setting dpm_force_performance_level to high or low.
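
For anyone repeating the bisect, the summary line can be picked out of saved `git bisect` output programmatically. This is a minimal Python sketch, assuming the output was captured as text. The helper name `first_bad_commit` and the `sample` string are illustrative only and are not part of git or of any tooling used in this report.

```python
import re
from typing import Optional

# Matches the summary line that `git bisect` prints when it finishes.
FIRST_BAD_RE = re.compile(
    r"^([0-9a-f]{40}) is the first bad commit$", re.MULTILINE
)


def first_bad_commit(bisect_output: str) -> Optional[str]:
    """Return the full hash of the first bad commit, or None if absent."""
    match = FIRST_BAD_RE.search(bisect_output)
    return match.group(1) if match else None


# Sample text copied from the bisect result in this comment.
sample = """\
1ea8751bd28d1ec2b36a56ec6bc1ac28903d09b4 is the first bad commit
commit 1ea8751bd28d1ec2b36a56ec6bc1ac28903d09b4
Author: Noah Abradjian <noah.abradjian@amd.com>
"""

if __name__ == "__main__":
    print(first_bad_commit(sample))
```

The pattern is anchored to a full 40-character hash so that abbreviated hashes elsewhere in a log are not mistaken for the result.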
Comment 7 Alexander Jones 2020-02-25 21:33:02 UTC
I can also confirm this bug; I'm seeing the exact same problem on my ThinkPad A275. My glitchy screen looks like this: https://imgur.com/tKAxlI7

I'm also not able to restart X or switch VTs, but I can confirm that user applications keep running, as I heard KMail issue a notification while the screen was glitchy. Other functions of the laptop, like fan speed control and keyboard illumination, still work normally after waking up (if I press the shortcut to light up the keyboard, it changes intensity properly). Switching back to an older kernel, 5.4.10 in my case, also fixes the problem.

Curiously enough, I checked my dmesg for the SAD message after suspending with the older kernel, and it now reads this:

[    5.450379] [drm] SADs count is: -2, don't need to read it

So it seems to be related to reading whatever the SAD is.
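
To compare the SAD lines between a 5.4.x boot and a 5.5.x boot, saved dmesg output can be filtered with a short script. This is a minimal Python sketch, assuming the default `[ seconds.micros] message` dmesg timestamp format. The function name `sad_messages` is hypothetical, and the sample lines are the two messages quoted in this report.

```python
import re

# Default dmesg line layout: "[    1.549880] message text"
DMESG_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def sad_messages(lines):
    """Return (timestamp, message) pairs for dmesg lines that mention SADs."""
    found = []
    for line in lines:
        match = DMESG_RE.match(line)
        if match and "SAD" in match.group(2):
            found.append((float(match.group(1)), match.group(2)))
    return found


# The two SAD-related lines quoted in this bug report.
sample = [
    "[    1.549880] [drm:dm_helpers_parse_edid_caps [amdgpu]] *ERROR* Couldn't read SADs: -2",
    "[    5.450379] [drm] SADs count is: -2, don't need to read it",
]

if __name__ == "__main__":
    for timestamp, message in sad_messages(sample):
        print(timestamp, message)
```

Lines that do not start with a bracketed timestamp are skipped, so the function can be fed a whole saved log.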
Comment 8 Alex Deucher 2020-02-25 21:37:20 UTC
The SAD message is harmless and unrelated.  The issue is with the patch in comment 5.
Comment 9 Bjoern Franke 2020-02-28 18:08:04 UTC
Issue seems to be fixed in 5.6.0-rc3, so it will be hopefully backported into the 5.5 branch.
Comment 10 Alex Deucher 2020-02-28 21:56:43 UTC
(In reply to Bjoern Franke from comment #9)
> Issue seems to be fixed in 5.6.0-rc3, so it will be hopefully backported
> into the 5.5 branch.

Can you identify what patch fixed it for you?
Comment 11 Bjoern Franke 2020-03-01 10:01:01 UTC
(In reply to Alex Deucher from comment #10)
> (In reply to Bjoern Franke from comment #9)
> > Issue seems to be fixed in 5.6.0-rc3, so it will be hopefully backported
> > into the 5.5 branch.
> 
> Can you identify what patch fixed it for you?

Bisected to 5622b2d68d0a6e2fd960f2129704dc3c561608b2. Does this make sense to you?
