Bug 115051

Summary: HD 5870, GPU lockup
Product: Drivers
Reporter: Quentin Deldycke (quentindeldycke)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: NEW
Severity: normal
CC: julien.isorce, quentindeldycke, scorp, TigerLiu.ee
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 4.4;4.5-rc7
Subsystem:
Regression: No
Bisected commit-id:

Description Quentin Deldycke 2016-03-21 12:44:10 UTC
Hi,


I'm using Debian, currently with the Linux 4.5-rc7 kernel (the bug is also present in 4.4).

When I use DRI_PRIME=1, with DRI3 enabled or not, and run the Unigine Valley benchmark, the computer freezes.
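
For anyone trying to reproduce this, here is a minimal sketch of how one might check that DRI_PRIME=1 really selects the offload GPU before starting the benchmark. It assumes `glxinfo` is installed and on PATH; the helper names (`prime_env`, `renderer_from_glxinfo`, `query_renderer`) are made up for illustration and are not part of any tool.

```python
import os
import re
import subprocess

def prime_env(offload=True):
    """Return a copy of the environment with DRI_PRIME set for GPU offloading."""
    env = dict(os.environ)
    env["DRI_PRIME"] = "1" if offload else "0"
    return env

def renderer_from_glxinfo(text):
    """Extract the 'OpenGL renderer string' value from glxinfo output, or None."""
    match = re.search(r"^OpenGL renderer string:\s*(.+)$", text, re.MULTILINE)
    return match.group(1).strip() if match else None

def query_renderer(offload=True):
    """Run glxinfo (assumed to be installed) and return the renderer it reports."""
    result = subprocess.run(
        ["glxinfo"],
        env=prime_env(offload),
        capture_output=True,
        text=True,
        check=True,
    )
    return renderer_from_glxinfo(result.stdout)
```

Comparing `query_renderer(offload=False)` with `query_renderer(offload=True)` should show two different renderer strings if offloading is active; if both are identical, the benchmark is probably not running on the second card at all.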

Here is the only kernel log available (as the whole PC freezes, I enabled netconsole to capture it):


[ 3023.618712] radeon 0000:02:00.0: ring 0 stalled for more than 20040msec
[ 3023.618719] radeon 0000:02:00.0: GPU lockup (current fence id 0x000000000001bef2 last fence id 0x000000000001bf04 on ring 0)
[ 3023.618769] radeon 0000:02:00.0: failed to get a new IB (-35)
[ 3023.618797] [drm:radeon_cs_ioctl [radeon]] *ERROR* Failed to get ib !
[ 3024.181910] radeon 0000:02:00.0: Saved 567 dwords of commands on ring 0.
[ 3024.181925] radeon 0000:02:00.0: GPU softreset: 0x00000009
[ 3024.181945] radeon 0000:02:00.0:   GRBM_STATUS               = 0xF5700828
[ 3024.181947] radeon 0000:02:00.0:   GRBM_STATUS_SE0           = 0x88000003
[ 3024.181950] radeon 0000:02:00.0:   GRBM_STATUS_SE1           = 0xFC000001
[ 3024.181952] radeon 0000:02:00.0:   SRBM_STATUS               = 0x20000CC0
[ 3024.181956] radeon 0000:02:00.0:   SRBM_STATUS2              = 0x00000000
[ 3024.181969] radeon 0000:02:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[ 3024.181972] radeon 0000:02:00.0:   R_008678_CP_STALLED_STAT2 = 0x400C0000
[ 3024.181974] radeon 0000:02:00.0:   R_00867C_CP_BUSY_STAT     = 0x00048006
[ 3024.181976] radeon 0000:02:00.0:   R_008680_CP_STAT          = 0x80268647
[ 3024.181979] radeon 0000:02:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[ 3024.447603] radeon 0000:02:00.0: Wait for MC idle timedout !
[ 3024.447609] radeon 0000:02:00.0: GRBM_SOFT_RESET=0x00007F6B
[ 3024.447662] radeon 0000:02:00.0: SRBM_SOFT_RESET=0x00000100
[ 3024.448818] radeon 0000:02:00.0:   GRBM_STATUS               = 0x00003828
[ 3024.448821] radeon 0000:02:00.0:   GRBM_STATUS_SE0           = 0x00000007
[ 3024.448823] radeon 0000:02:00.0:   GRBM_STATUS_SE1           = 0x00000007
[ 3024.448825] radeon 0000:02:00.0:   SRBM_STATUS               = 0x20000CC0
[ 3024.448827] radeon 0000:02:00.0:   SRBM_STATUS2              = 0x00000000
[ 3024.448847] radeon 0000:02:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[ 3024.448851] radeon 0000:02:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[ 3024.448853] radeon 0000:02:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[ 3024.448855] radeon 0000:02:00.0:   R_008680_CP_STAT          = 0x00000000
[ 3024.448858] radeon 0000:02:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[ 3024.448878] radeon 0000:02:00.0: GPU reset succeeded, trying to resume
[ 3024.652028] [drm] PCIE gen 2 link speeds already enabled
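
To make logs like the one above easier to compare between reports, here is a rough sketch that pulls out the two fence ids and diffs the register dumps taken before and after the soft reset. The regular expressions are written against the exact line layout shown above and are an assumption about other kernels' output; the function names are hypothetical.

```python
import re

# Matches the "GPU lockup (current fence id ... last fence id ...)" line.
FENCE_RE = re.compile(
    r"current fence id (0x[0-9a-fA-F]+) last fence id (0x[0-9a-fA-F]+)"
)
# Matches register lines of the form "radeon <pci addr>:   NAME   = 0xVALUE".
REG_RE = re.compile(r"radeon [0-9a-f:.]+:\s+(\w+)\s+= (0x[0-9A-Fa-f]+)")

def pending_fences(log):
    """Return (current, last, last - current) from the first lockup line, or None."""
    m = FENCE_RE.search(log)
    if not m:
        return None
    current, last = int(m.group(1), 16), int(m.group(2), 16)
    return current, last, last - current

def register_dumps(log):
    """Split register lines into successive dumps; a repeated name starts a new dump."""
    dumps, current = [], {}
    for name, value in REG_RE.findall(log):
        if name in current:
            dumps.append(current)
            current = {}
        current[name] = int(value, 16)
    if current:
        dumps.append(current)
    return dumps

def changed_registers(before, after):
    """Return {name: (before, after)} for registers whose value differs."""
    return {
        name: (before[name], after[name])
        for name in before
        if name in after and before[name] != after[name]
    }
```

Applied to the log above, the fence ids differ by 0x12, which I read as 18 fences emitted but not yet signalled when the lockup was detected. The register diff shows that SRBM_STATUS, SRBM_STATUS2, CP_STALLED_STAT1 and DMA_STATUS_REG keep the same values after the soft reset, while the GRBM and the other CP status registers change.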


I tried the following kernel parameters:
radeon.hard_reset=1 radeon.audio=0 radeon.dpm=1 radeon.lockup_timeout=20000

But none of them seems to have any effect. Note that, unlike the other bugs I could find on the tracker, here the computer is completely frozen.
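
Since the parameters appear to have no effect, it may be worth confirming that the loaded module actually picked them up. The sketch below compares the `radeon.*` options on a kernel command line with what the module reports under `/sys/module/radeon/parameters`. It assumes that directory is readable on the running system, and the function names are hypothetical.

```python
from pathlib import Path

def radeon_params_from_cmdline(cmdline):
    """Return {name: value} for every radeon.<name>=<value> token on a command line."""
    params = {}
    for token in cmdline.split():
        if token.startswith("radeon.") and "=" in token:
            key, value = token[len("radeon."):].split("=", 1)
            params[key] = value
    return params

def radeon_params_from_sysfs(root="/sys/module/radeon/parameters"):
    """Return the values the loaded module reports (empty dict if unavailable)."""
    base = Path(root)
    if not base.is_dir():
        return {}
    values = {}
    for entry in base.iterdir():
        try:
            values[entry.name] = entry.read_text().strip()
        except OSError:
            continue  # skip entries that cannot be read
    return values

def mismatches(requested, effective):
    """Return {name: (requested, effective or None)} where the two disagree."""
    return {
        name: (value, effective.get(name))
        for name, value in requested.items()
        if effective.get(name) != value
    }
```

A typical check would be `mismatches(radeon_params_from_cmdline(Path("/proc/cmdline").read_text()), radeon_params_from_sysfs())`; an empty result means every requested option is in effect. For what it's worth, a "stalled for more than 20040msec" message would be consistent with lockup_timeout=20000 having been applied, if the log was captured with these parameters.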

The card is an HD 5870 Vapor-X. It runs in a x4 PCIe slot (the x16 slot is used for a VGA passthrough configuration, but was not in use during this test).

Here is the relevant information:
Extended renderer info (GLX_MESA_query_renderer):
    Vendor: X.Org (0x1002)
    Device: AMD CYPRESS (DRM 2.43.0, LLVM 3.7.1) (0x6898)
    Version: 11.2.0
    Accelerated: yes
    Video memory: 1024MB
    Unified memory: no
    Preferred profile: core (0x1)
    Max core profile version: 4.1
    Max compat profile version: 3.0
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.0
OpenGL vendor string: X.Org
OpenGL renderer string: Gallium 0.4 on AMD CYPRESS (DRM 2.43.0, LLVM 3.7.1)
OpenGL core profile version string: 4.1 (Core Profile) Mesa 11.2.0-rc3
OpenGL core profile shading language version string: 4.10
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:

xorg-server 2:1.18.2-1

Radeon module info:
[     4.422] (II) Loading /usr/lib/xorg/modules/drivers/radeon_drv.so
[     4.425] (II) Module radeon: vendor="X.Org Foundation"
[     4.425]    compiled for 1.18.0, module version = 7.6.1
[     4.425]    Module class: X.Org Video Driver
[     4.425]    ABI class: X.Org Video Driver, version 20.0


I know I'm using an unstable Mesa version. It was a last resort, but the bug also appears in Mesa 11.1.2.
Comment 1 Tiger 2016-03-24 09:03:40 UTC
Does a lower Xorg version work better?
Comment 2 Quentin Deldycke 2016-03-28 11:51:30 UTC
Right now, downgrading is a bit of a pain. I will possibly try with a live CD.
Comment 3 Filipp Andjelo 2016-05-05 10:35:36 UTC
Hi, same problem here on a Radeon HD 7850. Unfortunately it seems rather unpredictable so far: sometimes I can run a game for hours and it is rock solid, but sometimes the system freezes several times in a row and I have to restart it each time. However, I have the feeling that more demanding graphics make the GPU freeze sooner and more often. I also connected via ssh from another machine to be able to analyse it. My system is currently running xorg-server 1.18.3, but older versions froze as well. With DRI3 enabled, lockups seem to occur more often, but as already mentioned, this is not a very reliable observation. I'll try to do a more accurate analysis; please tell me if you have a hint about where to look first.

Extended renderer info (GLX_MESA_query_renderer):
    Vendor: X.Org (0x1002)
    Device: AMD PITCAIRN (DRM 2.43.0, LLVM 3.7.1) (0x6819)
    Version: 11.2.1
    Accelerated: yes
    Video memory: 1024MB
    Unified memory: no
    Preferred profile: core (0x1)
    Max core profile version: 4.1
    Max compat profile version: 3.0
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.0
OpenGL vendor string: X.Org
OpenGL renderer string: Gallium 0.4 on AMD PITCAIRN (DRM 2.43.0, LLVM 3.7.1)
OpenGL core profile version string: 4.1 (Core Profile) Mesa 11.2.1
OpenGL core profile shading language version string: 4.10
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
Comment 4 Filipp Andjelo 2016-05-05 10:37:11 UTC
Maybe someone should change the bug subject, because this problem doesn't seem to be limited to the HD 5870.
Comment 5 Filipp Andjelo 2016-05-05 10:39:33 UTC
I forgot the most important part :)

I'm using Arch Linux with the latest stock kernel:
Linux hoth 4.5.1-1-ARCH #1 SMP PREEMPT Thu Apr 14 19:19:32 CEST 2016 x86_64 GNU/Linux