Bug 217892

Summary: [amdgpu]: system freezes when trying to turn back on monitor
Product: Drivers
Reporter: Michael Mair-Keimberger (mmk+bugs)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: RESOLVED ANSWERED
Severity: normal
CC: mark.blakeney
Priority: P3
Hardware: All
OS: Linux
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:
Attachments: signature.asc

Description Michael Mair-Keimberger 2023-09-09 10:19:36 UTC
Hi,

My setup is two 4K/144Hz monitors running sway. Both monitors are connected via DP to an Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 [Radeon RX 7900 XT/7900 XTX]. Usually, if I don't change monitor settings, everything works as expected. The monitors also wake up flawlessly after the system has been idle.

However, I sometimes turn off the second monitor (for example, when playing games). For that I made a shortcut in sway, which looks like this:
> bindsym $mod+Shift+F12 output DP-2 toggle
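The same toggle can be issued from a terminal, which may make repeated off/on cycles easier to script when reproducing the problem. The following is only a sketch: it assumes `swaymsg` is on the PATH and that the second monitor is `DP-2` as in the binding above; the helper names `output_command` and `run_output_command` are made up for illustration.

```python
import shlex
import subprocess

# Actions accepted by sway's "output" command that are relevant here.
VALID_ACTIONS = ("toggle", "enable", "disable")


def output_command(output_name, action="toggle"):
    """Build the swaymsg argument list for an output action (hypothetical helper)."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return ["swaymsg", "output", output_name, action]


def run_output_command(output_name, action="toggle"):
    """Run the command; only works inside a running sway session."""
    return subprocess.run(output_command(output_name, action), check=True)


# Show the command line that the key binding above corresponds to.
print(shlex.join(output_command("DP-2")))
```

Calling `run_output_command("DP-2")` twice, with a pause in between, would mirror pressing the shortcut to turn the monitor off and back on.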

Turning the monitor off works as expected. However, when turning it back on I encounter the following errors/problems:
The main workspace (desktop) freezes while the second monitor tries to turn on (the monitor LED lights up).
After some time (around 10-15 seconds) the main desktop works again and the second screen goes off again.
At that point I usually have to reboot the system to get the second monitor back.

In dmesg I see the following entries:
[ 8623.325357] [drm] enabling link 1 failed: 15
[ 8623.382238] [drm] REG_WAIT timeout 10us * 5000 tries - enc32_stream_encoder_dp_unblank line:348
[ 8623.437493] [drm] REG_WAIT timeout 10us * 5000 tries - enc32_stream_encoder_dp_unblank line:357
[ 8638.435963] [drm:amdgpu_dm_atomic_check] *ERROR* [CRTC:81:crtc-3] hw_done or flip_done timed out
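
To check whether a later kernel still shows the same failure, the entries above can be searched for in saved dmesg output. The following is a minimal sketch, not an official tool: the regular expressions are derived only from the four lines quoted above, and the category names and the `scan_dmesg` helper are invented for this example.

```python
import re

# Patterns derived from the dmesg excerpt in this report (names are hypothetical).
PATTERNS = {
    "link_enable_failed": re.compile(r"\[drm\] enabling link \d+ failed: \d+"),
    "reg_wait_timeout": re.compile(r"\[drm\] REG_WAIT timeout .* - \S+ line:\d+"),
    "flip_done_timeout": re.compile(
        r"\*ERROR\* \[CRTC:\d+:[^\]]+\] hw_done or flip_done timed out"
    ),
}

# Kernel timestamp at the start of a line, e.g. "[ 8623.325357]".
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")

SAMPLE = """\
[ 8623.325357] [drm] enabling link 1 failed: 15
[ 8623.382238] [drm] REG_WAIT timeout 10us * 5000 tries - enc32_stream_encoder_dp_unblank line:348
[ 8623.437493] [drm] REG_WAIT timeout 10us * 5000 tries - enc32_stream_encoder_dp_unblank line:357
[ 8638.435963] [drm:amdgpu_dm_atomic_check] *ERROR* [CRTC:81:crtc-3] hw_done or flip_done timed out
"""


def scan_dmesg(lines):
    """Return (timestamp, category) for every line matching a known pattern."""
    hits = []
    for line in lines:
        ts = TIMESTAMP.match(line)
        if ts is None:
            continue
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((float(ts.group(1)), name))
                break
    return hits


hits = scan_dmesg(SAMPLE.splitlines())
for ts, name in hits:
    print(f"{ts:.6f} {name}")

# Time between the first link failure and the final flip_done timeout.
stall = hits[-1][0] - hits[0][0]
print(f"stall: {stall:.1f} s")
```

On the excerpt above, the gap between the first and last entry is about 15.1 seconds, which is consistent with the 10-15 second freeze described earlier.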

This can be reproduced quite easily. However, sometimes it works almost without problems (in that case the monitor comes back, but the desktop on the main monitor looks distorted/corrupted; maximizing an application fixes that).

This also seems to be a regression. With kernel 6.2 and 6.3 this worked as expected.

I'm using the following kernel:
Linux x2 6.5.2-gentoo #1 SMP PREEMPT_DYNAMIC Sat Sep  9 00:29:42 CEST 2023 x86_64 AMD Ryzen 9 7950X3D 16-Core Processor AuthenticAMD GNU/Linux

As soon as a linux-6.6 kernel is available in Gentoo, I'll try that one too.
Comment 1 Bagas Sanjaya 2023-09-09 11:00:57 UTC
Created attachment 305074 [details]
signature.asc

You may try compiling your own kernel instead of having to wait for the kernel
package to be updated. See
Documentation/admin-guide/quickly-build-trimmed-linux.rst for full
instructions.

In any case, please also report to freedesktop tracker [1].

Thanks.

[1]: https://gitlab.freedesktop.org/drm/amd/-/issues
Comment 2 Michael Mair-Keimberger 2023-09-09 15:09:52 UTC
Hello Bagas,

Thanks for the hint. 
I'll wait for the rc1 and then try with the latest kernel.

FYI: I've opened a freedesktop issue [1]

[1] https://gitlab.freedesktop.org/drm/amd/-/issues/2836

BTW, I forgot to mention: I think the regression started with 6.4. I already said that it worked with 6.2 and 6.3, but I already had problems with 6.4 too.
Comment 3 Artem S. Tashkinov 2023-09-10 11:02:18 UTC
AMDGPU driver bug tracker is here: https://gitlab.freedesktop.org/drm/amd/-/issues