Bug 215727 - drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout when using firefox, chrome or icaclient
Summary: drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout when using f...
Status: RESOLVED ANSWERED
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: Intel Linux
Importance: P1 high
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-03-22 20:30 UTC by Wojciech Krol
Modified: 2022-03-25 00:01 UTC

See Also:
Kernel Version: 5.16.15-arch1-1
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Dmesg (103.24 KB, text/plain)
2022-03-22 20:30 UTC, Wojciech Krol
Partial umr -R gfx_0.0.0 (240.00 KB, text/plain)
2022-03-22 20:33 UTC, Wojciech Krol

Description Wojciech Krol 2022-03-22 20:30:18 UTC
Created attachment 300599
Dmesg

Hi,

Symptoms:
I have installed an AMD Radeon RX 6700 XT card and started getting the following random crashes when using a browser or icaclient (Citrix client):
[   85.861734] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=13365, emitted seq=13367
[   85.862162] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process kwin_x11 pid 819 thread kwin_x11:cs0 pid 838
The display hangs or becomes glitched.

Steps to reproduce:
It happens randomly when using a browser (tested with firefox and Chrome-based browsers) or icaclient.
I get this error several times every day.
It happens under both Xorg and Wayland.
The process mentioned in the error is not always the window manager (kwin_x11). Sometimes it is Xorg (or Xwayland), sometimes an application (e.g. firefox).
System: Arch Linux (linux-firmware 20220309.cd01f85-1)
DE: KDE Plasma 5.24.3 / Mesa 21.3.7

Logs:
In the attached dmesg I was using kwin on Xorg and had just started firefox (hardware acceleration was on). The same thing happens when using icaclient (very frequent crashes, but hard to reproduce on demand).
Afterwards, I also tried collecting gfx_0.0.0 data with umr:
umr -R gfx_0.0.0

This also resulted in a crash:
[  171.047397] BUG: unable to handle page fault for address: ffffb34e820ffffc

(The full log is at the end of the attached dmesg.)
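
Since the process named in the error varies between occurrences, it may help to tally the timeouts from a saved dmesg capture. The following is only a minimal sketch: the regular expressions are modelled on the two error lines quoted in this report and may not match other kernel versions, and the names summarize_timeouts and SAMPLE are hypothetical helpers, not part of amdgpu or umr.

```python
import re
from collections import Counter

# Patterns modelled on the two amdgpu_job_timedout lines quoted in this
# report; other kernel versions may format these messages differently.
TIMEOUT_RE = re.compile(
    r"\*ERROR\* ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)
PROCESS_RE = re.compile(
    r"\*ERROR\* Process information: process (?P<process>\S+) pid (?P<pid>\d+)"
)


def summarize_timeouts(dmesg_text):
    """Return (timeouts, process_counts) parsed from dmesg text."""
    timeouts = []
    processes = Counter()
    for line in dmesg_text.splitlines():
        m = TIMEOUT_RE.search(line)
        if m:
            timeouts.append({
                "ring": m["ring"],
                "signaled": int(m["signaled"]),
                "emitted": int(m["emitted"]),
            })
            continue
        m = PROCESS_RE.search(line)
        if m:
            processes[m["process"]] += 1
    return timeouts, processes


# The two lines from this report, used as sample input.
SAMPLE = """\
[   85.861734] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=13365, emitted seq=13367
[   85.862162] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process kwin_x11 pid 819 thread kwin_x11:cs0 pid 838
"""

if __name__ == "__main__":
    timeouts, processes = summarize_timeouts(SAMPLE)
    for t in timeouts:
        # Difference between the emitted and signaled sequence numbers.
        print(t["ring"], "emitted - signaled:", t["emitted"] - t["signaled"])
    print(processes.most_common())
```

To run it against a real capture, the text of a saved dmesg file can be passed to summarize_timeouts in place of SAMPLE.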

If you need additional data I can reproduce this error.
Comment 1 Wojciech Krol 2022-03-22 20:33:37 UTC
Created attachment 300600
Partial umr -R gfx_0.0.0
Comment 2 Artem S. Tashkinov 2022-03-25 00:01:58 UTC
Please repost here: https://gitlab.freedesktop.org/drm/amd/-/issues
