Bug 205675 - Display locks up. AMDGPU timeout
Summary: Display locks up. AMDGPU timeout
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: x86-64 Linux
Importance: P1 high
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-11-27 03:47 UTC by freddyreimer
Modified: 2021-01-03 16:49 UTC
CC List: 8 users

See Also:
Kernel Version: 5.5.2
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg tail from immediately after a lockup (847 bytes, text/plain)
2019-11-27 03:47 UTC, freddyreimer
Newer dmesg tail from a lockup on 5.5.2 (882 bytes, text/plain)
2020-02-09 15:34 UTC, freddyreimer

Description freddyreimer 2019-11-27 03:47:41 UTC
Created attachment 286079
dmesg tail from immediately after a lockup

I have been encountering issues with the AMDGPU driver completely failing when loading games. After loading into a game and making one click or moving the mouse, the display freezes completely. I can't tab out or switch to a TTY at all. I can SSH into the box and do things there, such as getting the attached dmesg tail, but even killing the process doesn't unfreeze the display, which keeps showing a still image of the game. Only rebooting unlocks it.

Basically it just seems to time out and then can't recover. This happens all the time in certain games, but it is inconsistent as to which environment it happens in. Some lock it up on Xorg but work fine on Wayland. Some work fine on Wayland but break on Xorg. Some never work at all. My graphics card is a Navi10, RX5700. I'm on the 5.4 kernel, but this was happening on 5.3 as well.
Comment 1 Pierre-Eric Pelloux-Prayer 2019-11-27 14:56:38 UTC
Thanks for the bug report.

The sdma0 timeout issue (from your dmesg) has already been reported. The most active bug report is: https://gitlab.freedesktop.org/drm/amd/issues/892

Note that sdma usage for Navi is disabled for Mesa 19.3 and 19.2.5 so this issue shouldn't occur if you use one of these releases.

Other related issues:
 - https://bugzilla.kernel.org/show_bug.cgi?id=205169 - has a patch, but it needs to be applied manually until it makes it into an upstream release
 - gfx timeout issues: those are likely to be game specific and are probably a bug in Mesa (https://gitlab.freedesktop.org/mesa/mesa/issues)
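
For triage, the split above (sdma timeouts versus gfx timeouts) can be sketched as a small filter over dmesg lines. This is only an illustration: the `ring <name> timeout` wording it matches is an assumption about how the driver prints these messages, the sample lines are made up rather than taken from the attachment, and `classify_timeout` is a hypothetical helper name.

```python
import re

# Assumption: amdgpu timeout messages contain "ring <name> timeout".
RING_TIMEOUT = re.compile(r"ring\s+(\S+)\s+timeout", re.IGNORECASE)


def classify_timeout(line):
    """Return 'sdma', 'gfx', 'other', or None if the line is not a ring timeout."""
    match = RING_TIMEOUT.search(line)
    if match is None:
        return None
    ring = match.group(1).lower()
    if ring.startswith("sdma"):
        return "sdma"   # the sdma0 timeout issue mentioned above
    if ring.startswith("gfx"):
        return "gfx"    # likely game specific, probably a Mesa bug
    return "other"


if __name__ == "__main__":
    # Hypothetical sample lines, not copied from the attached dmesg tail.
    samples = [
        "[drm] ring sdma0 timeout, signaled seq=1, emitted seq=2",
        "[drm] ring gfx timeout, signaled seq=3, emitted seq=4",
        "usb 1-1: new high-speed USB device",
    ]
    for sample in samples:
        print(classify_timeout(sample), "<-", sample)
```

Piping `dmesg` output through a filter like this over SSH would show which of the two categories a given lockup falls into, assuming the message wording matches.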
Comment 2 freddyreimer 2019-11-27 18:17:59 UTC
Hello! Glad to see there are others looking at this. My mesa is on 19.3.0_rc4, but the issue is still happening. The related bug in your second link mentions that the 5.4 kernel is out but doesn't have the fix. Is this something that might show up in 5.4.1?

I might try going to mesa 19.2.5 or 19.2.6 specifically later though, in case the mesa-side disable isn't in rc4 for some reason. Seems like it might be a kernel issue though.
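
The version question here can be made concrete with a short check. It is a sketch under stated assumptions: that "19.3" in comment 1 means the final 19.3.0 release, that later final releases keep the Mesa-side disable, and that a release candidate cannot be confirmed either way. `sdma_disable_status` and `parse_mesa_version` are hypothetical helper names, not part of Mesa.

```python
import re

# Versions look like "19.2.5" or "19.3.0_rc4" (the format the package manager reports).
VERSION = re.compile(r"(\d+)\.(\d+)(?:\.(\d+))?(?:[_-]rc(\d+))?")


def parse_mesa_version(text):
    """Split '19.3.0_rc4' into ((19, 3, 0), 4); a final release has rc None."""
    match = VERSION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognised Mesa version: {text!r}")
    major, minor, patch, rc = match.groups()
    release = (int(major), int(minor), int(patch or 0))
    return release, (int(rc) if rc else None)


def sdma_disable_status(text):
    """Return 'yes', 'no', or 'unconfirmed' for the Mesa-side SDMA disable."""
    release, rc = parse_mesa_version(text)
    if release < (19, 2, 5):
        return "no"
    if rc is not None:
        # Assumption: a release candidate may predate the change.
        return "unconfirmed"
    # Assumption: every final release from 19.2.5 onwards carries it.
    return "yes"


if __name__ == "__main__":
    for version in ("19.2.4", "19.2.5", "19.3.0_rc4", "19.3.0"):
        print(version, "->", sdma_disable_status(version))
```

Under these assumptions 19.3.0_rc4 comes out as unconfirmed, which is why testing 19.2.5 or 19.2.6 specifically is a reasonable next step.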
Comment 3 freddyreimer 2020-02-09 15:33:16 UTC
Update. Roughly around the time of the last update to this, I manually added that fix and it was working out for me. However, I ran some updates to both mesa and the kernel itself and now it appears the issue is back.

I have updated this issue with my current specifications. I'm on the 5.5.2 kernel now, with my package manager reporting the mesa version as 20.0.0_rc1. llvm is on 9.0.1.

I'll also add in an attachment with a more recent dmesg tail. I did try checking to see if I could manually re-add the patch to the file again, but it looks like those lines of code are already there, yet this issue still persists.
Comment 4 freddyreimer 2020-02-09 15:34:18 UTC
Created attachment 287263
Newer dmesg tail from a lockup on 5.5.2
