Bug 205915

Summary: AMDGPU: Screen flicker after resume from suspend
Product: Drivers Reporter: onil (onil)
Component: Video(DRI - non Intel)Assignee: drivers_video-dri
Status: NEW ---    
Severity: normal CC: aarad.russell, akamch, alexdeucher, bjo, dbulashevich, erik.labianca, harry.wentland, kernel, labre
Priority: P1    
Hardware: x86-64   
OS: Linux   
Kernel Version: 5.5.0-rc2 Subsystem:
Regression: No Bisected commit-id:
Attachments: dmesg after resume from suspend
Xorg log on commit 50575eb5b339683ee148189beae5eb53ccb3158b
Xorg log of good commit 50575eb5b339683ee148189beae5eb53ccb3158b
Kernel log of good commit 50575eb5b339683ee148189beae5eb53ccb3158b
Xorg log of bad commit 1ea8751bd28d1ec2b36a56ec6bc1ac28903d09b4
Kernel log of bad commit 1ea8751bd28d1ec2b36a56ec6bc1ac28903d09b4
Call optimize bandwidth again.

Description onil 2019-12-19 15:37:46 UTC
After resume from suspend screen flickers, flickering intensifies during mouse movement. reverting to 5.4 fixes the issue.

workaround: changing dpm_force_performance_level to "high" (and optionally afterwards to "low") resolves the issue while "auto" introduces it again.

OS: Fedora 31 on gnome wayland
GPU: 5700XT Pulse
Monitor: LG 27UK650W on DP 1.4
Mesa: 19.3

no info/error message on syslog.
Comment 1 Alex Deucher 2019-12-19 15:52:59 UTC
Please attach your xorg log (if using X) and dmesg output.
Comment 2 onil 2019-12-19 16:23:15 UTC
Created attachment 286367 [details]
dmesg after resume from suspend
Comment 3 onil 2019-12-20 15:37:07 UTC
Problem only occurs when dpm_force_performance_level is set to "low" before suspend. if dpm_force_performance_level is "auto" or "high" there is no flickering after resume from suspend.
Comment 4 onil 2019-12-22 11:31:39 UTC
Please disregard comment #3, flickering still occurs when performance level is "auto", after manually switching it to "low" or "high" flickering stops.
Comment 5 Manuel Ullmann 2020-01-26 21:26:38 UTC
Confirmed on navi14. Bisected it to 1ea8751bd28d1ec2b36a56ec6bc1ac28903d09b4. Log is indicating exactly nothing. I’ll attach them nonetheless, if you happen to see more than me.
It’s just a wild guess, but could it be, that your indicator for a safe lowering of the clock is not always a good indicator? I mean you know your code a lot better than me. A clock, that is raised too high while voltage being too low could however be a reason for the flickering on OpenGL operations, that I’m seeing. Have not seen those on Vulkan or framebuffer though.
Anyway, have a look for yourself:

commit 1ea8751bd28d1ec2b36a56ec6bc1ac28903d09b4
Author: Noah Abradjian <noah.abradjian@amd.com>
Date:   Fri Sep 27 16:30:57 2019 -0400

    drm/amd/display: Make clk mgr the only dto update point

    [Why]

    * Clk Mgr DTO update point did not cover all needed updates, as it included a
      check for plane_state which does not exist yet when the updater is called on
      driver startup
    * This resulted in another update path in the pipe programming sequence, based
      on a dppclk update flag
    * However, this alternate path allowed for stray DTO updates, some of which would
      occur in the wrong order during dppclk lowering and cause underflow

    [How]

    * Remove plane_state check and use of plane_res.dpp->inst, getting rid
      of sequence dependencies (this results in extra dto programming for unused
      pipes but that doesn't cause issues and is a small cost)
    * Allow DTOs to be updated even if global clock is equal, to account for
      edge case exposed by diags tests
    * Remove update_dpp_dto call in pipe programming sequence (leave update to
      dppclk_control there, as that update is necessary and shouldn't occur in clk
      mgr)
    * Remove call to optimize_bandwidth when committing state, as it is not needed
      and resulted in sporadic underflows even with other fixes in place

    Signed-off-by: Noah Abradjian <noah.abradjian@amd.com>
    Reviewed-by: Jun Lei <Jun.Lei@amd.com>
    Acked-by: Leo Li <sunpeng.li@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

 .../gpu/drm/amd/display/dc/clk_mgr/dcn20/dcn20_clk_mgr.c   | 14 +++++++++-----
 drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c  |  3 ++-
 drivers/gpu/drm/amd/display/dc/core/dc.c                   |  4 ----
 drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c         |  8 +-------
    4 files changed, 12 insertions(+), 17 deletions(-)

modified   drivers/gpu/drm/amd/display/dc/clk_mgr/dcn20/dcn20_clk_mgr.c
@@ -108,11 +108,12 @@ void dcn20_update_clocks_update_dpp_dto(struct clk_mgr_internal *clk_mgr,
 	for (i = 0; i < clk_mgr->base.ctx->dc->res_pool->pipe_count; i++) {
 		int dpp_inst, dppclk_khz;
 
-		if (!context->res_ctx.pipe_ctx[i].plane_state)
-			continue;
-
-		dpp_inst = context->res_ctx.pipe_ctx[i].plane_res.dpp->inst;
+		/* Loop index will match dpp->inst if resource exists,
+		 * and we want to avoid dependency on dpp object
+		 */
+		dpp_inst = i;
 		dppclk_khz = context->res_ctx.pipe_ctx[i].plane_res.bw.dppclk_khz;
+
 		clk_mgr->dccg->funcs->update_dpp_dto(
 				clk_mgr->dccg, dpp_inst, dppclk_khz);
 	}
@@ -235,6 +236,7 @@ void dcn2_update_clocks(struct clk_mgr *clk_mgr_base,
 
 		update_dispclk = true;
 	}
+
 	if (dc->config.forced_clocks == false || (force_reset && safe_to_lower)) {
 		if (dpp_clock_lowered) {
 			// if clock is being lowered, increase DTO before lowering refclk
@@ -244,10 +246,12 @@ void dcn2_update_clocks(struct clk_mgr *clk_mgr_base,
 			// if clock is being raised, increase refclk before lowering DTO
 			if (update_dppclk || update_dispclk)
 				dcn20_update_clocks_update_dentist(clk_mgr);
-			if (update_dppclk)
+			// always update dtos unless clock is lowered and not safe to lower
+			if (new_clocks->dppclk_khz >= dc->current_state->bw_ctx.bw.dcn.clk.dppclk_khz)
 				dcn20_update_clocks_update_dpp_dto(clk_mgr, context);
 		}
 	}
+
 	if (update_dispclk &&
 			dmcu && dmcu->funcs->is_dmcu_initialized(dmcu)) {
 		/*update dmcu for wait_loop count*/
modified   drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c
@@ -171,7 +171,8 @@ void rn_update_clocks(struct clk_mgr *clk_mgr_base,
 		// if clock is being raised, increase refclk before lowering DTO
 		if (update_dppclk || update_dispclk)
 			rn_vbios_smu_set_dppclk(clk_mgr, clk_mgr_base->clks.dppclk_khz);
-		if (update_dppclk)
+		// always update dtos unless clock is lowered and not safe to lower
+		if (new_clocks->dppclk_khz >= dc->current_state->bw_ctx.bw.dcn.clk.dppclk_khz)
 			dcn20_update_clocks_update_dpp_dto(clk_mgr, context);
 	}
 
modified   drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -1241,10 +1241,6 @@ static enum dc_status dc_commit_state_no_check(struct dc *dc, struct dc_state *c
 
 	dc_enable_stereo(dc, context, dc_streams, context->stream_count);
 
-	if (!dc->optimize_seamless_boot)
-		/* pplib is notified if disp_num changed */
-		dc->hwss.optimize_bandwidth(dc, context);
-
 	for (i = 0; i < context->stream_count; i++)
 		context->streams[i]->mode_changed = false;
 
modified   drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
@@ -1202,15 +1202,9 @@ static void dcn20_update_dchubp_dpp(
 	struct dpp *dpp = pipe_ctx->plane_res.dpp;
 	struct dc_plane_state *plane_state = pipe_ctx->plane_state;
 
-	if (pipe_ctx->update_flags.bits.dppclk) {
+	if (pipe_ctx->update_flags.bits.dppclk)
 		dpp->funcs->dpp_dppclk_control(dpp, false, true);
 
-		dc->res_pool->dccg->funcs->update_dpp_dto(
-				dc->res_pool->dccg,
-				dpp->inst,
-				pipe_ctx->plane_res.bw.dppclk_khz);
-	}
-
 	/* TODO: Need input parameter to tell current DCHUB pipe tie to which OTG
 	 * VTG is within DCHUBBUB which is commond block share by each pipe HUBP.
 	 * VTG is 1:1 mapping with OTG. Each pipe HUBP will select which VTG
Comment 6 Manuel Ullmann 2020-01-26 21:28:49 UTC
Created attachment 286981 [details]
Xorg log on commit 50575eb5b339683ee148189beae5eb53ccb3158b
Comment 7 Manuel Ullmann 2020-01-26 21:30:10 UTC
Created attachment 286983 [details]
Xorg log of good commit 50575eb5b339683ee148189beae5eb53ccb3158b
Comment 8 Manuel Ullmann 2020-01-26 21:30:37 UTC
Created attachment 286985 [details]
Kernel log of good commit 50575eb5b339683ee148189beae5eb53ccb3158b
Comment 9 Manuel Ullmann 2020-01-26 21:31:07 UTC
Created attachment 286987 [details]
Xorg log of bad commit 1ea8751bd28d1ec2b36a56ec6bc1ac28903d09b4
Comment 10 Manuel Ullmann 2020-01-26 21:31:31 UTC
Created attachment 286989 [details]
Kernel log of bad commit 1ea8751bd28d1ec2b36a56ec6bc1ac28903d09b4
Comment 11 Manuel Ullmann 2020-01-26 22:10:07 UTC
Reverting 1ea8751bd28d1ec2b36a56ec6bc1ac28903d09b4 on HEAD, currently 5.5-rc7+, seems to fix it. The revert should probably be limited to the 2 hunks in the clock manager files to preserve the underflow avoidance:
-			if (update_dppclk)
+			// always update dtos unless clock is lowered and not safe to lower
+			if (new_clocks->dppclk_khz >= dc->current_state->bw_ctx.bw.dcn.clk.dppclk_khz)

For the sake of comleteness:
Steps to reproduce:
1. Suspend with Navi and activated display core.
2. Resume.
3. Start glxgears.

Actual results:
Animation flickers. This happens on any OpenGL accelerated task, like Firefox with hardware acceleration or similar.

Expected results:
Flicker-free rendering.
Comment 12 Manuel Ullmann 2020-02-17 17:31:33 UTC
Created attachment 287449 [details]
Call optimize bandwidth again.

I pinned the issue down to the four lines, which call optimize_bandwidth. Unfortunately the commit message indicates, that its removal fixed other issues, so it can not be simply put back in.

Maybe this one is also vendor specific voltage lowering. Are there some sys nodes, which expose vendor adaptations? I’ve got one from MSI [1] with 130 Watt TDP.

[1]: https://us.msi.com/Graphics-card/Radeon-RX-5500-XT-GAMING-X-8G/Specification
Comment 13 Alex Deucher 2020-02-17 19:08:35 UTC
Does this series help?
https://patchwork.freedesktop.org/patch/353367/
Comment 15 Manuel Ullmann 2020-02-18 22:13:44 UTC
That fixed it. Thanks.