Bug 102831

Summary: Some Flash-based video players cause GPU lockups on radeon.
Product: Drivers
Reporter: Jon Arne Jørgensen (jonjon.arnearne)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: NEW
Severity: normal
CC: szg0000
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 4.2-rc6
Tree: Mainline
Regression: No
Attachments: Complete Xorg.log
Full dmesg

Description Jon Arne Jørgensen 2015-08-13 18:16:49 UTC
Created attachment 184851 [details]
Complete Xorg.log

When I try to play videos from allmyvideos.net in Chromium, the GPU locks up and my computer is left in an unusable state.

This lockup seems to be tied to specific Flash players: I have tried several other sites with Flash players, and they don't cause lockups.

The specific lockup from dmesg:
[  380.103559] radeon 0000:02:00.0: ring 0 stalled for more than 10037msec
[  380.103568] radeon 0000:02:00.0: GPU lockup (current fence id 0x0000000000006078 last fence id 0x00000000000061d4 on ring 0)
[  380.103668] radeon 0000:02:00.0: failed to get a new IB (-35)
[  380.103713] [drm:radeon_cs_ioctl [radeon]] *ERROR* Failed to get ib !
[  380.109844] radeon 0000:02:00.0: failed to get a new IB (-35)
[  380.109882] [drm:radeon_uvd_suspend [radeon]] *ERROR* Error destroying UVD (-35)!
[  380.125967] radeon 0000:02:00.0: Saved 11161 dwords of commands on ring 0.
[  380.125977] radeon 0000:02:00.0: GPU softreset: 0x00000008
[  380.125979] radeon 0000:02:00.0:   R_008010_GRBM_STATUS      = 0xA0003028
[  380.125981] radeon 0000:02:00.0:   R_008014_GRBM_STATUS2     = 0x00000002
[  380.125983] radeon 0000:02:00.0:   R_000E50_SRBM_STATUS      = 0x200028C0
[  380.125985] radeon 0000:02:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[  380.125986] radeon 0000:02:00.0:   R_008678_CP_STALLED_STAT2 = 0x00010002
[  380.125988] radeon 0000:02:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000086
[  380.125990] radeon 0000:02:00.0:   R_008680_CP_STAT          = 0x80018647
[  380.125992] radeon 0000:02:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[  380.189434] radeon 0000:02:00.0: R_008020_GRBM_SOFT_RESET=0x00004001
[  380.189488] radeon 0000:02:00.0: SRBM_SOFT_RESET=0x00000100
[  380.191628] radeon 0000:02:00.0:   R_008010_GRBM_STATUS      = 0x00003028
[  380.191630] radeon 0000:02:00.0:   R_008014_GRBM_STATUS2     = 0x00000002
[  380.191631] radeon 0000:02:00.0:   R_000E50_SRBM_STATUS      = 0x200000C0
[  380.191633] radeon 0000:02:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[  380.191635] radeon 0000:02:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[  380.191637] radeon 0000:02:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[  380.191638] radeon 0000:02:00.0:   R_008680_CP_STAT          = 0x00000000
[  380.191640] radeon 0000:02:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[  380.191649] radeon 0000:02:00.0: GPU reset succeeded, trying to resume
[  380.211635] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
[  380.213937] [drm] PCIE GART of 1024M enabled (table at 0x000000000025E000).
[  380.213981] radeon 0000:02:00.0: WB enabled
[  380.213984] radeon 0000:02:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff880227b7cc00
[  380.213985] radeon 0000:02:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffff880227b7cc0c
[  380.214840] radeon 0000:02:00.0: fence driver on ring 5 use gpu addr 0x000000000005c598 and cpu addr 0xffffc9000101c598
[  380.261529] [drm] ring test on 0 succeeded in 1 usecs
[  380.261537] [drm] ring test on 3 succeeded in 3 usecs
[  380.437798] [drm] ring test on 5 succeeded in 1 usecs
[  380.437802] [drm] UVD initialized successfully.
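
The "GPU lockup" line above reports two fence ids, and their difference gives a rough idea of how much work was still queued on the ring when it stalled. The following is a minimal sketch for extracting that number from a dmesg line. It assumes that "current fence id" is the last fence the GPU signaled and "last fence id" is the last one emitted by the driver; the helper name `pending_fences` is hypothetical and not part of any existing tool.

```python
import re

# Matches the radeon "GPU lockup" message format seen in the dmesg above.
LOCKUP_RE = re.compile(
    r"GPU lockup \(current fence id (0x[0-9a-fA-F]+) "
    r"last fence id (0x[0-9a-fA-F]+) on ring (\d+)\)"
)


def pending_fences(line):
    """Return (ring, pending) for a 'GPU lockup' line, or None if no match.

    Assumption: pending = last emitted fence - last signaled fence.
    """
    m = LOCKUP_RE.search(line)
    if m is None:
        return None
    current = int(m.group(1), 16)
    last = int(m.group(2), 16)
    ring = int(m.group(3))
    return ring, last - current


line = (
    "[  380.103568] radeon 0000:02:00.0: GPU lockup "
    "(current fence id 0x0000000000006078 "
    "last fence id 0x00000000000061d4 on ring 0)"
)
print(pending_fences(line))  # (0, 348)
```

Under that assumption, ring 0 had 348 fences outstanding at the time of the first lockup.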

Sometimes when this happens I'm able to switch to a virtual terminal (CTRL-ALT-F1) or ssh in from my phone, while other times the computer just locks up.

I've compiled the kernel with PANIC_ON_HANG, so I can get to kexec-kdump to get the dmesg.
Comment 1 Jon Arne Jørgensen 2015-08-13 18:19:06 UTC
Created attachment 184861 [details]
Full dmesg
Comment 2 Jon Arne Jørgensen 2015-08-13 18:22:09 UTC
I wrote PANIC_ON_HANG; that should be CONFIG_BOOTPARAM_HUNG_TASK_PANIC.
Comment 3 Jon Arne Jørgensen 2015-08-13 18:27:17 UTC
When the lockup fails to crash the kernel, I end up with an unusable system, with messages like these filling my dmesg buffer:
[...snip...]
[ 1196.506527] radeon 0000:02:00.0: ring 5 stalled for more than 232970msec
[ 1196.506535] radeon 0000:02:00.0: GPU lockup (current fence id 0x0000000000000006 last fence id 0x0000000000000007 on ring 5)
[ 1197.007312] radeon 0000:02:00.0: ring 5 stalled for more than 233471msec
[ 1197.007319] radeon 0000:02:00.0: GPU lockup (current fence id 0x0000000000000006 last fence id 0x0000000000000007 on ring 5)
[ 1197.508102] radeon 0000:02:00.0: ring 5 stalled for more than 233972msec
[ 1197.508110] radeon 0000:02:00.0: GPU lockup (current fence id 0x0000000000000006 last fence id 0x0000000000000007 on ring 5)
[ 1198.008887] radeon 0000:02:00.0: ring 5 stalled for more than 234473msec
[ 1198.008896] radeon 0000:02:00.0: GPU lockup (current fence id 0x0000000000000006 last fence id 0x0000000000000007 on ring 5)
[ 1198.509671] radeon 0000:02:00.0: ring 5 stalled for more than 234974msec
[ 1198.509679] radeon 0000:02:00.0: GPU lockup (current fence id 0x0000000000000006 last fence id 0x0000000000000007 on ring 5)
[ 1199.010474] radeon 0000:02:00.0: ring 5 stalled for more than 235475msec
[ 1199.010482] radeon 0000:02:00.0: GPU lockup (current fence id 0x0000000000000006 last fence id 0x0000000000000007 on ring 5)
[ 1199.511267] radeon 0000:02:00.0: ring 5 stalled for more than 235976msec
[ 1199.511274] radeon 0000:02:00.0: GPU lockup (current fence id 0x0000000000000006 last fence id 0x0000000000000007 on ring 5)
[...snip...]
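
Each "stalled for more than N msec" line carries enough information to estimate when the ring stopped making progress: subtract the reported stall duration from the line's timestamp. The sketch below does that for one line. It assumes the bracketed dmesg timestamp is in seconds since boot and that the reported duration is only a lower bound, so the result is approximate; the helper name `stall_start` is hypothetical.

```python
import re

# Matches the radeon "ring N stalled" message format seen in the dmesg above.
STALL_RE = re.compile(
    r"\[\s*(\d+\.\d+)\] radeon \S+ ring (\d+) stalled for more than (\d+)msec"
)


def stall_start(line):
    """Return (ring, approx_start_seconds) for a stall line, or None."""
    m = STALL_RE.search(line)
    if m is None:
        return None
    timestamp = float(m.group(1))  # seconds since boot
    ring = int(m.group(2))
    stalled_msec = int(m.group(3))
    return ring, timestamp - stalled_msec / 1000.0


line = (
    "[ 1196.506527] radeon 0000:02:00.0: "
    "ring 5 stalled for more than 232970msec"
)
ring, start = stall_start(line)
print(ring, round(start, 1))  # 5 963.5
```

Applied to the lines above, every message points back to roughly the same moment, about 963.5 seconds after boot, which suggests a single stall on ring 5 being reported repeatedly rather than a series of new lockups.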