Bug 201957

Summary: amdgpu: ring gfx timeout
Product: Drivers
Reporter: e88z4 (felix.adrianto)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: NEW
Severity: blocking
CC: alexdeucher, anode.dev, au1064, bjo, delentef, diego.viola, dushistov, j.cordoba, janpieter.sollie, jmstylr, juan.zenos, keramidasceid, kernel, lekto, lukasz, mh, panospolychronis, perk11, pierre-eric.pelloux-prayer, poseidon+o1zah, postix, randyk161, reuben_p, rmalinverni, sellis, shallowaloe, udovdh, ungu_93, wdmlist, zzyxpaw
Priority: P1
Hardware: Intel
OS: Linux
Kernel Version: 4.19.8, 4.20-rc5
Tree: Mainline
Regression: No
Attachments: 5 second video clip that triggers a crash
attachment-25111-0.html
kernel config 5.4.7 Fiji
/proc/cpuinfo
lspci output

Description e88z4 2018-12-11 04:52:52 UTC
Error message: 
[Dec 5 22:08] amdgpu 0000:23:00.0: GPU fault detected: 146 0x0000480c for process yuzu pid 2920 thread yuzu:cs0 pid 2935
[  +0.000005] amdgpu 0000:23:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[  +0.000002] amdgpu 0000:23:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0604800C
[  +0.000003] amdgpu 0000:23:00.0: VM fault (0x0c, vmid 3, pasid 32770) at page 0, read from 'TC4' (0x54433400) (72)
[ +10.053011] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=37241, emitted seq=37244
[  +0.000007] [drm] GPU recovery disabled.
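
A note on reading the VM fault line above: the log prints the faulting memory client both as a short tag ('TC4') and as a raw value (0x54433400). Judging from those two values, the raw number looks like the tag packed as ASCII bytes with NUL padding. The Python sketch below checks that reading. The helper name `decode_mc_client` is made up for illustration and is not part of any amdgpu tooling.

```python
def decode_mc_client(raw: int) -> str:
    """Unpack a 32-bit client ID into its ASCII tag.

    Assumption: the value is four big-endian ASCII bytes, padded with NULs,
    as the 'TC4' (0x54433400) pair in the log suggests.
    """
    return raw.to_bytes(4, "big").rstrip(b"\x00").decode("ascii")


if __name__ == "__main__":
    print(decode_mc_client(0x54433400))  # TC4
```

This only helps when a log shows the raw value without the tag, or when comparing client IDs across several fault reports.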


How to reproduce the issue:
1. Run yuzu-emulator.
2. Load Super Mario Odyssey.
3. Start a new game.
4. When Mario is about to jump for the first time after being woken up by Cappy, the bug always occurs.

During the issue, the following occurred:
1. Graphics locked up.
2. The system could still be accessed through SSH.

System specification:
Debian Sid
Radeon RX 580

I have tried the following combinations:
1. Kernel 4.17, 4.18, 4.19, 4.20, drm-next-4.21.wip
2. Mesa 18.2, 18.3, 19.0-development branch

None of these combinations fixes the issue. Let me know if you need more information or more testing from me.
Comment 1 Alex Deucher 2018-12-11 14:57:43 UTC
This is more likely a mesa issue than a kernel issue.
Comment 2 e88z4 2018-12-11 18:18:17 UTC
I will try to test with amdgpu-pro sometime this week, using the kernels that I mentioned above. If the application works as expected, it could be a Mesa OpenGL bug.
Comment 3 Dev Bazilio 2019-03-07 05:20:32 UTC
(In reply to Alex Deucher from comment #1)
> This is more likely a mesa issue than a kernel issue.

No, the 4.14 kernel with the latest Mesa libs works very well without any hangs.
But from 4.20.4 on, in all later kernels (including 5.0), the OS freezes for about 30 s every 30 s to 1 min when browsing YouTube with HW acceleration enabled (UVD) or when playing a game. RX550, Arch, vanilla kernel.

[  365.021164] amdgpu: [powerplay] 
                last message was failed ret is 0
[  365.045198] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
[  365.570667] amdgpu: [powerplay] 
                failed to send message 133 ret is 0 
[  366.115228] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=9365, emitted seq=9365
[  366.115377] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
[  366.115388] [drm] Timeout, but no hardware hang detected.
[  366.689407] amdgpu: [powerplay] 
                last message was failed ret is 0
[  367.232287] amdgpu: [powerplay] 
                failed to send message 306 ret is 0 
[  367.787043] amdgpu: [powerplay] 
                last message was failed ret is 0
[  368.320138] amdgpu: [powerplay] 
                failed to send message 5e ret is 0 
[  369.367739] amdgpu: [powerplay] 
                last message was failed ret is 0
[  369.907559] amdgpu: [powerplay] 
                failed to send message 145 ret is 0 
[  370.994478] amdgpu: [powerplay] 
                last message was failed ret is 0
[  371.538753] amdgpu: [powerplay] 
                failed to send message 146 ret is 0 
[  372.075079] amdgpu: [powerplay] 
                last message was failed ret is 0
[  372.598565] amdgpu: [powerplay] 
                failed to send message 148 ret is 0 
[  373.657188] amdgpu: [powerplay] 
                last message was failed ret is 0
[  374.198637] amdgpu: [powerplay] 
                failed to send message 145 ret is 0 
[  375.075076] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
[  375.284948] amdgpu: [powerplay] 
                last message was failed ret is 0
[  375.830347] amdgpu: [powerplay] 
                failed to send message 146 ret is 0 
[  376.138428] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=10113, emitted seq=10113
[  376.138783] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
[  376.138797] [drm] IP block:sdma_v3_0 is hung!
[  376.138809] [drm] GPU recovery disabled.
[  376.394657] amdgpu: [powerplay] 
                last message was failed ret is 0
[  376.934375] amdgpu: [powerplay] 
                failed to send message 16a ret is 0 
[  377.463230] amdgpu: [powerplay] 
                last message was failed ret is 0
[  377.977725] amdgpu: [powerplay] 
                failed to send message 186 ret is 0 
[  378.518406] amdgpu: [powerplay] 
                last message was failed ret is 0
[  379.060098] amdgpu: [powerplay] 
                failed to send message 54 ret is 0 
[  379.556880] amdgpu: [powerplay] 
                last message was failed ret is 0
[  380.075217] amdgpu: [powerplay] 
                failed to send message 26b ret is 0 
[  380.605976] amdgpu: [powerplay] 
                last message was failed ret is 0
[  381.134301] amdgpu: [powerplay] 
                failed to send message 13d ret is 0 
[  381.657486] amdgpu: [powerplay] 
                last message was failed ret is 0
[  382.204551] amdgpu: [powerplay] 
                failed to send message 14f ret is 0 
[  382.741827] amdgpu: [powerplay] 
                last message was failed ret is 0
[  383.281165] amdgpu: [powerplay] 
                failed to send message 151 ret is 0 
[  383.824923] amdgpu: [powerplay] 
                last message was failed ret is 0
[  384.362266] amdgpu: [powerplay] 
                failed to send message 135 ret is 0 
[  384.903686] amdgpu: [powerplay] 
                last message was failed ret is 0
[  385.101515] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
[  385.461515] amdgpu: [powerplay] 
                failed to send message 190 ret is 0 
[  386.014015] amdgpu: [powerplay] 
                last message was failed ret is 0
[  386.164818] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=10761, emitted seq=10761
[  386.164970] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
[  386.164985] [drm] Timeout, but no hardware hang detected.
Comment 4 Alex Deucher 2019-03-07 05:24:36 UTC
Can you bisect?
Comment 5 Cameron 2019-03-12 13:15:31 UTC
I'm having a very similar issue, running Linux Mint 19.1. The issue has persisted since at least 4.15; I'm currently running 5.0.1 and it remains.

Here is the latest syslog of the error:

[37258.615599] gmc_v9_0_process_interrupt: 10 callbacks suppressed
[37258.615608] amdgpu 0000:06:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768, for process Xorg pid 1287 thread Xorg:cs0 pid 1317)
[37258.615615] amdgpu 0000:06:00.0:   in page starting at address 0x0000800107805000 from 27
[37258.615619] amdgpu 0000:06:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00301031
[37258.615629] amdgpu 0000:06:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768, for process Xorg pid 1287 thread Xorg:cs0 pid 1317)
[37258.615633] amdgpu 0000:06:00.0:   in page starting at address 0x0000800107807000 from 27
[37258.615636] amdgpu 0000:06:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[37258.615645] amdgpu 0000:06:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768, for process Xorg pid 1287 thread Xorg:cs0 pid 1317)
[37258.615648] amdgpu 0000:06:00.0:   in page starting at address 0x0000800107801000 from 27
[37258.615651] amdgpu 0000:06:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[37258.615660] amdgpu 0000:06:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768, for process Xorg pid 1287 thread Xorg:cs0 pid 1317)
[37258.615663] amdgpu 0000:06:00.0:   in page starting at address 0x0000800107803000 from 27
[37258.615666] amdgpu 0000:06:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[37258.615675] amdgpu 0000:06:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768, for process Xorg pid 1287 thread Xorg:cs0 pid 1317)
[37258.615678] amdgpu 0000:06:00.0:   in page starting at address 0x0000800107809000 from 27
[37258.615681] amdgpu 0000:06:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[37258.615689] amdgpu 0000:06:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768, for process Xorg pid 1287 thread Xorg:cs0 pid 1317)
[37258.615692] amdgpu 0000:06:00.0:   in page starting at address 0x000080010780b000 from 27
[37258.615695] amdgpu 0000:06:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[37258.615704] amdgpu 0000:06:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768, for process Xorg pid 1287 thread Xorg:cs0 pid 1317)
[37258.615707] amdgpu 0000:06:00.0:   in page starting at address 0x0000800107805000 from 27
[37258.615710] amdgpu 0000:06:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[37258.615740] amdgpu 0000:06:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768, for process Xorg pid 1287 thread Xorg:cs0 pid 1317)
[37258.615743] amdgpu 0000:06:00.0:   in page starting at address 0x0000800107807000 from 27
[37258.615746] amdgpu 0000:06:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[37258.615756] amdgpu 0000:06:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768, for process Xorg pid 1287 thread Xorg:cs0 pid 1317)
[37258.615759] amdgpu 0000:06:00.0:   in page starting at address 0x0000800107801000 from 27
[37258.615762] amdgpu 0000:06:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[37258.615771] amdgpu 0000:06:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768, for process Xorg pid 1287 thread Xorg:cs0 pid 1317)
[37258.615774] amdgpu 0000:06:00.0:   in page starting at address 0x0000800107803000 from 27
[37258.615777] amdgpu 0000:06:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[37268.712339] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37268.712387] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37268.712389] [drm] GPU recovery disabled.
[37278.952537] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37278.952624] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37278.952628] [drm] GPU recovery disabled.
[37289.192390] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37289.192478] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37289.192481] [drm] GPU recovery disabled.
[37299.432447] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37299.432534] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37299.432538] [drm] GPU recovery disabled.
[37309.676431] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37309.676518] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37309.676522] [drm] GPU recovery disabled.
[37319.912444] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37319.912536] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37319.912541] [drm] GPU recovery disabled.
[37330.156619] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37330.156706] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37330.156710] [drm] GPU recovery disabled.
[37340.392424] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37340.392511] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37340.392515] [drm] GPU recovery disabled.
[37350.632424] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37350.632511] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37350.632514] [drm] GPU recovery disabled.
[37360.872417] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37360.872508] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37360.872511] [drm] GPU recovery disabled.
[37371.112436] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37371.112523] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37371.112527] [drm] GPU recovery disabled.
[37381.352427] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37381.352514] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37381.352517] [drm] GPU recovery disabled.
[37391.592410] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37391.592497] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37391.592500] [drm] GPU recovery disabled.
[37401.836426] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37401.836513] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37401.836517] [drm] GPU recovery disabled.
[37412.072433] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37412.072520] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37412.072524] [drm] GPU recovery disabled.
[37422.312442] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37422.312528] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37422.312532] [drm] GPU recovery disabled.
[37432.552428] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37432.552515] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37432.552519] [drm] GPU recovery disabled.
[37442.792418] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37442.792506] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37442.792510] [drm] GPU recovery disabled.
[37453.032397] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37453.032483] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37453.032487] [drm] GPU recovery disabled.
[37463.272534] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37463.272621] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37463.272624] [drm] GPU recovery disabled.
[37473.512589] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37473.512676] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37473.512680] [drm] GPU recovery disabled.
[37483.752954] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37483.753041] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37483.753044] [drm] GPU recovery disabled.
[37493.992566] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37493.992654] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37493.992657] [drm] GPU recovery disabled.
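
For anyone skimming the log above, the repeated timeout lines can be summarized with a small script. This is a rough Python sketch, not an official tool: the regex is written against the message format as it appears in this report, and the reading of "signaled seq" as the last completed fence and "emitted seq" as the last submitted one is an assumption. The function name `summarize_timeouts` is hypothetical.

```python
import re

# Matches lines such as:
# [37268.712339] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
#   signaled seq=602475, emitted seq=602478
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\].*ring (?P<ring>\w+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def summarize_timeouts(log_text):
    """Collect ring timeout events and report whether the ring makes progress."""
    events = []
    for line in log_text.splitlines():
        m = TIMEOUT_RE.search(line)
        if m:
            events.append(
                (float(m["ts"]), m["ring"], int(m["signaled"]), int(m["emitted"]))
            )
    if not events:
        return None
    # Seconds between consecutive timeout reports.
    intervals = [round(b[0] - a[0], 2) for a, b in zip(events, events[1:])]
    return {
        "count": len(events),
        # Fences submitted but not yet completed at the last report (assumed meaning).
        "pending": events[-1][3] - events[-1][2],
        # True when the signaled sequence number never advances between reports.
        "stalled": len({e[2] for e in events}) == 1,
        "intervals": intervals,
    }


if __name__ == "__main__":
    sample = """\
[37268.712339] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37278.952537] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
[37289.192390] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=602475, emitted seq=602478
"""
    print(summarize_timeouts(sample))
```

On the lines pasted above, the signaled sequence number stays at 602475 in every report while the emitted one stays at 602478, and the reports arrive a little over 10 seconds apart. That is consistent with the gfx ring being stuck on the same job the whole time rather than timing out on new work.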

During this time the laptop continues to operate (it plays music and I can SSH in), however the display and any input (keyboard / mouse) do not respond. The caps lock light, for example, does not toggle. The only way to recover is a forced reboot by holding the power button.

I'm unable to provide steps to reproduce, as the issue happens at completely random times, while performing different tasks or while the machine is left idle. 

System specs:
Lenovo ThinkPad A485
AMD Ryzen 7 PRO 2700U with Radeon Vega Mobile Gfx
Linux Mint 19.1
Kernel 5.0.1 (installed via ukuu)
Comment 6 Dev Bazilio 2019-04-01 18:20:39 UTC
Tried linux-amd-staging-drm-next-git-5.1.811103.2acb851ad43b and dmesg still has a lot of warnings. I also tested YouTube in Chrome with UVD and got a minor freeze and a long (~30 sec) freeze of the system.

Apr 01 21:01:03 kernel: amdgpu 0000:03:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring uvd_enc0 test failed (-110)
Apr 01 21:01:03 kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <uvd_v6_0> failed -110
Apr 01 21:01:03 kernel: [drm:amdgpu_device_resume [amdgpu]] *ERROR* amdgpu_device_ip_resume failed (-110).


Apr 01 20:26:59 kernel: [drm] amdgpu kernel modesetting enabled.
Apr 01 20:26:59 kernel: vga_switcheroo: detected switching method \_SB_.PCI0.VGA_.ATPX handle
Apr 01 20:26:59 kernel: [drm] initializing kernel modesetting (CARRIZO 0x1002:0x9874 0x1025:0x1201 0xCA).
Apr 01 20:26:59 kernel: [drm] register mmio base: 0xD1500000
Apr 01 20:26:59 kernel: [drm] register mmio size: 262144
Apr 01 20:26:59 kernel: [drm] add ip block number 0 <vi_common>
Apr 01 20:26:59 kernel: [drm] add ip block number 1 <gmc_v8_0>
Apr 01 20:26:59 kernel: [drm] add ip block number 2 <cz_ih>
Apr 01 20:26:59 kernel: [drm] add ip block number 3 <gfx_v8_0>
Apr 01 20:26:59 kernel: [drm] add ip block number 4 <sdma_v3_0>
Apr 01 20:26:59 kernel: [drm] add ip block number 5 <powerplay>
Apr 01 20:26:59 kernel: [drm] add ip block number 6 <dm>
Apr 01 20:26:59 kernel: [drm] add ip block number 7 <uvd_v6_0>
Apr 01 20:26:59 kernel: [drm] add ip block number 8 <vce_v3_0>
Apr 01 20:26:59 kernel: [drm] add ip block number 9 <acp_ip>
Apr 01 20:26:59 kernel: [drm] UVD is enabled in physical mode
Apr 01 20:26:59 kernel: [drm] VCE enabled in physical mode
Apr 01 20:26:59 kernel: ATOM BIOS: 113-C91400-007
Apr 01 20:26:59 kernel: [drm] RAS INFO: ras initialized successfully, hardware ability[0] ras_mask[0]
Apr 01 20:26:59 kernel: [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
Apr 01 20:26:59 kernel: amdgpu 0000:00:01.0: VRAM: 512M 0x000000F400000000 - 0x000000F41FFFFFFF (512M used)
Apr 01 20:26:59 kernel: amdgpu 0000:00:01.0: GART: 1024M 0x000000FF00000000 - 0x000000FF3FFFFFFF
Apr 01 20:26:59 kernel: [drm] Detected VRAM RAM=512M, BAR=512M
Apr 01 20:26:59 kernel: [drm] RAM width 64bits UNKNOWN
Apr 01 20:26:59 kernel: [TTM] Zone  kernel: Available graphics memory: 3804974 KiB
Apr 01 20:26:59 kernel: [TTM] Zone   dma32: Available graphics memory: 2097152 KiB
Apr 01 20:26:59 kernel: [TTM] Initializing pool allocator
Apr 01 20:26:59 kernel: [TTM] Initializing DMA pool allocator
Apr 01 20:26:59 kernel: [drm] amdgpu: 512M of VRAM memory ready
Apr 01 20:26:59 kernel: [drm] amdgpu: 3072M of GTT memory ready.
Apr 01 20:26:59 kernel: [drm] GART: num cpu pages 262144, num gpu pages 262144
Apr 01 20:26:59 kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000F4007E9000).
Apr 01 20:26:59 kernel: [drm] Found UVD firmware Version: 1.91 Family ID: 11
Apr 01 20:26:59 kernel: [drm] UVD ENC is disabled
Apr 01 20:26:59 kernel: [drm] Found VCE firmware Version: 52.4 Binary ID: 3
Apr 01 20:26:59 kernel: smu version 27.17.00
Apr 01 20:26:59 kernel: [drm] DM_PPLIB: values for Engine clock
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         300000
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         480000
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         533340
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         576000
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         626090
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         685720
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         720000
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         757900
Apr 01 20:26:59 kernel: [drm] DM_PPLIB: Validation clocks:
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:    engine_max_clock: 75790
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:    memory_max_clock: 93300
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:    level           : 8
Apr 01 20:26:59 kernel: [drm] DM_PPLIB: values for Display clock
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         300000
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         400000
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         496560
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         626090
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         685720
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         757900
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         800000
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         847060
Apr 01 20:26:59 kernel: [drm] DM_PPLIB: Validation clocks:
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:    engine_max_clock: 75790
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:    memory_max_clock: 93300
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:    level           : 8
Apr 01 20:26:59 kernel: [drm] DM_PPLIB: values for Memory clock
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         667000
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         933000
Apr 01 20:26:59 kernel: [drm] DM_PPLIB: Validation clocks:
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:    engine_max_clock: 75790
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:    memory_max_clock: 93300
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:    level           : 8
Apr 01 20:26:59 kernel: [drm:construct [amdgpu]] *ERROR* construct: Invalid Connector ObjectID from Adapter Service for connector index:2! type 0 expected 3
Apr 01 20:26:59 kernel: [drm] Display Core initialized with v3.2.24!
Apr 01 20:26:59 kernel: [drm] SADs count is: -2, don't need to read it
Apr 01 20:26:59 kernel: [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
Apr 01 20:26:59 kernel: [drm] Driver supports precise vblank timestamp query.

Apr 01 20:26:59 kernel: [drm] UVD initialized successfully.
Apr 01 20:26:59 kernel: [drm] VCE initialized successfully.
Apr 01 20:26:59 kernel: kfd kfd: Allocated 3969056 bytes on gart
Apr 01 20:26:59 kernel: Topology: Add APU node [0x9874:0x1002]
Apr 01 20:26:59 kernel: kfd kfd: added device 1002:9874
Apr 01 20:26:59 kernel: [drm] fb mappable at 0x21FDCD000
Apr 01 20:26:59 kernel: [drm] vram apper at 0x21F000000
Apr 01 20:26:59 kernel: [drm] size 8294400
Apr 01 20:26:59 kernel: [drm] fb depth is 24
Apr 01 20:26:59 kernel: [drm]    pitch is 7680
Apr 01 20:26:59 kernel: fbcon: amdgpudrmfb (fb0) is primary device
Apr 01 20:26:59 kernel: Console: switching to colour frame buffer device 240x67
Apr 01 20:26:59 kernel: amdgpu 0000:00:01.0: fb0: amdgpudrmfb frame buffer device
Apr 01 20:26:59 kernel: [drm] Initialized amdgpu 3.31.0 20150101 for 0000:00:01.0 on minor 0
Apr 01 20:26:59 kernel: amdgpu 0000:03:00.0: enabling device (0002 -> 0003)
Apr 01 20:26:59 kernel: [drm] initializing kernel modesetting (POLARIS12 0x1002:0x699F 0x1025:0x1210 0xC3).
Apr 01 20:26:59 kernel: [drm] register mmio base: 0xD1200000
Apr 01 20:26:59 kernel: [drm] register mmio size: 262144
Apr 01 20:26:59 kernel: [drm] add ip block number 0 <vi_common>
Apr 01 20:26:59 kernel: [drm] add ip block number 1 <gmc_v8_0>
Apr 01 20:26:59 kernel: [drm] add ip block number 2 <tonga_ih>
Apr 01 20:26:59 kernel: [drm] add ip block number 3 <gfx_v8_0>
Apr 01 20:26:59 kernel: [drm] add ip block number 4 <sdma_v3_0>
Apr 01 20:26:59 kernel: [drm] add ip block number 5 <powerplay>
Apr 01 20:26:59 kernel: [drm] add ip block number 6 <dm>
Apr 01 20:26:59 kernel: [drm] add ip block number 7 <uvd_v6_0>
Apr 01 20:26:59 kernel: [drm] add ip block number 8 <vce_v3_0>
Apr 01 20:26:59 kernel: kfd kfd: skipped device 1002:699f, PCI rejects atomics
Apr 01 20:26:59 kernel: [drm] UVD is enabled in VM mode
Apr 01 20:26:59 kernel: [drm] UVD ENC is enabled in VM mode
Apr 01 20:26:59 kernel: [drm] VCE enabled in VM mode
Apr 01 20:26:59 kernel: vga_switcheroo: enabled
Apr 01 20:26:59 kernel: ATOM BIOS: SWBRT23054.001
Apr 01 20:26:59 kernel: [drm] GPU posting now...
Apr 01 20:26:59 kernel: [drm] RAS INFO: ras initialized successfully, hardware ability[0] ras_mask[0]
Apr 01 20:26:59 kernel: [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
Apr 01 20:26:59 kernel: amdgpu 0000:03:00.0: VRAM: 2048M 0x000000F400000000 - 0x000000F47FFFFFFF (2048M used)
Apr 01 20:26:59 kernel: amdgpu 0000:03:00.0: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
Apr 01 20:26:59 kernel: [drm] Detected VRAM RAM=2048M, BAR=256M
Apr 01 20:26:59 kernel: [drm] RAM width 128bits GDDR5
Apr 01 20:26:59 kernel: [drm] amdgpu: 2048M of VRAM memory ready
Apr 01 20:26:59 kernel: [drm] amdgpu: 3072M of GTT memory ready.
Apr 01 20:26:59 kernel: [drm] GART: num cpu pages 65536, num gpu pages 65536
Apr 01 20:26:59 kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
Apr 01 20:26:59 kernel: [drm] Chained IB support enabled!
Apr 01 20:26:59 kernel: [drm] Found UVD firmware Version: 1.130 Family ID: 16
Apr 01 20:26:59 kernel: [drm] Found VCE firmware Version: 53.26 Binary ID: 3
Apr 01 20:26:59 kernel: amdgpu: [powerplay] Voltage value looks like a Leakage ID but it's not patched 
Apr 01 20:26:59 kernel: amdgpu: [powerplay] Voltage value looks like a Leakage ID but it's not patched 
Apr 01 20:26:59 kernel: amdgpu: [powerplay] Voltage value looks like a Leakage ID but it's not patched 
Apr 01 20:26:59 kernel: amdgpu: [powerplay] Voltage value looks like a Leakage ID but it's not patched 
Apr 01 20:26:59 kernel: amdgpu: [powerplay] Voltage value looks like a Leakage ID but it's not patched 
Apr 01 20:26:59 kernel: amdgpu: [powerplay] Voltage value looks like a Leakage ID but it's not patched 
Apr 01 20:26:59 kernel: amdgpu: [powerplay] Voltage value looks like a Leakage ID but it's not patched 
Apr 01 20:26:59 kernel: [drm] DM_PPLIB: values for Engine clock
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         214000
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         551000
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         734000
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         921000
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         980000
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         1046000
Apr 01 20:26:59 kernel: [drm] DM_PPLIB: Validation clocks:
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:    engine_max_clock: 104600
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:    memory_max_clock: 125000
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:    level           : 8
Apr 01 20:26:59 kernel: [drm] DM_PPLIB: values for Memory clock
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         300000
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         625000
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         1250000
Apr 01 20:26:59 kernel: [drm] DM_PPLIB: Validation clocks:
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:    engine_max_clock: 104600
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:    memory_max_clock: 125000
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:    level           : 8
Apr 01 20:26:59 kernel: [drm] Display Core initialized with v3.2.24!
Apr 01 20:26:59 kernel: [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
Apr 01 20:26:59 kernel: [drm] Driver supports precise vblank timestamp query.
Apr 01 20:26:59 kernel: [drm] UVD and UVD ENC initialized successfully.
Apr 01 20:26:59 kernel: [drm] VCE initialized successfully.
Apr 01 20:26:59 kernel: [drm] Initialized amdgpu 3.31.0 20150101 for 0000:03:00.0 on minor 1
Apr 01 20:26:59 kernel: amdgpu: [powerplay] 
                                failed to send message 15b ret is 0 
Apr 01 20:26:59 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 0
Apr 01 20:26:59 kernel: amdgpu: [powerplay] 
                                failed to send message 15a ret is 0 
Apr 01 20:26:59 kernel: [drm:amdgpu_device_ip_late_init_func_handler [amdgpu]] *ERROR* ib ring test failed (-110).
Apr 01 20:26:59 kernel: EXT4-fs (sda3): mounted filesystem with ordered data mode. Opts: (null)
Apr 01 20:26:59 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 0
Apr 01 20:26:59 kernel: amdgpu: [powerplay] 
                                failed to send message 155 ret is 0 
Apr 01 20:26:59 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 0
Apr 01 20:26:59 kernel: amdgpu: [powerplay] 
                                failed to send message 15b ret is 0
Apr 01 20:27:48 kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
Apr 01 20:27:48 kernel: amdgpu: [powerplay] 
                                failed to send message 154 ret is 0 
Apr 01 20:27:49 kernel: amdgpu 0000:03:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring uvd_enc0 test failed (-110)
Apr 01 20:27:49 kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <uvd_v6_0> failed -110
Apr 01 20:27:49 kernel: [drm:amdgpu_device_resume [amdgpu]] *ERROR* amdgpu_device_ip_resume failed (-110).
Apr 01 20:27:50 kernel: amdgpu: [powerplay]
Apr 01 20:28:30 kernel: amdgpu: [powerplay] 
                                failed to send message 261 ret is 0 
Apr 01 20:28:31 kernel: [drm] Fence fallback timer expired on ring sdma0
Apr 01 20:28:31 kernel: [drm] Fence fallback timer expired on ring sdma0
Apr 01 20:28:32 kernel: [drm] Fence fallback timer expired on ring sdma0
Apr 01 20:28:32 kernel: [drm] Fence fallback timer expired on ring sdma0
Apr 01 20:28:33 kernel: [drm] Fence fallback timer expired on ring sdma0
Apr 01 20:28:33 kernel: [drm] Fence fallback timer expired on ring sdma0
Apr 01 20:28:34 kernel: [drm] Fence fallback timer expired on ring sdma0
Apr 01 20:28:34 kernel: [drm] Fence fallback timer expired on ring sdma0
Apr 01 20:28:35 kernel: [drm] Fence fallback timer expired on ring sdma0
Apr 01 20:28:35 kernel: [drm] Fence fallback timer expired on ring sdma0
Apr 01 20:28:36 kernel: [drm] Fence fallback timer expired on ring sdma0
Apr 01 20:28:36 kernel: [drm] Fence fallback timer expired on ring sdma0
Apr 01 20:28:37 kernel: [drm] Fence fallback timer expired on ring sdma0
Apr 01 20:28:37 kernel: [drm] Fence fallback timer expired on ring sdma0
Apr 01 20:28:38 kernel: [drm] Fence fallback timer expired on ring sdma0
Apr 01 20:28:39 kernel: amdgpu: [powerplay] 

Apr 01 20:29:12 kernel: amdgpu: [powerplay] 
                                failed to send message 154 ret is 0 
Apr 01 20:29:13 kernel: amdgpu 0000:03:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring uvd_enc0 test failed (-110)
Apr 01 20:29:13 kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <uvd_v6_0> failed -110
Apr 01 20:29:13 kernel: [drm:amdgpu_device_resume [amdgpu]] *ERROR* amdgpu_device_ip_resume failed (-110).

Apr 01 20:30:06 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 0
Apr 01 20:30:06 kernel: amdgpu: [powerplay] 
                                failed to send message 135 ret is 0 
Apr 01 20:30:07 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 0
Apr 01 20:30:07 kernel: amdgpu: [powerplay] 
                                failed to send message 190 ret is 0 
Apr 01 20:30:08 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 0
Apr 01 20:30:08 kernel: amdgpu: [powerplay] 
                                failed to send message 63 ret is 0 
Apr 01 20:30:09 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 0
Apr 01 20:30:09 kernel: amdgpu: [powerplay] 
                                failed to send message 84 ret is 0 
Apr 01 20:30:09 kernel: amdgpu 0000:03:00.0: GPU pci config reset
Apr 01 20:34:17 kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
Apr 01 20:34:18 kernel: amdgpu: [powerplay] 
                                failed to send message 154 ret is 0 
Apr 01 20:34:18 kernel: amdgpu 0000:03:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring uvd_enc0 test failed (-110)
Apr 01 20:34:18 kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <uvd_v6_0> failed -110
Apr 01 20:34:18 kernel: [drm:amdgpu_device_resume [amdgpu]] *ERROR* amdgpu_device_ip_resume failed (-110).
Comment 7 Dev Bazilio 2019-04-01 18:44:22 UTC
(In reply to Alex Deucher from comment #4)
> Can you bisect?

Unfortunately this is not possible: all recent kernels now ship with Display Core (DC) enabled by default. As I said, the vanilla 4.14 kernel works like a charm on the same hardware and with the same Mesa libraries: no lag, no stalls or freezes, and none of the warnings listed above. So running "git bisect" makes no sense, because the problem is not a single commit that misbehaves with the GPU. DC is completely new functionality that replaces the old amdgpu code.
Comment 8 jens harms 2019-08-20 15:06:55 UTC
Hi, I have a very similar problem. My system works with 4.15 and with 5.1.16, but not with other 5.x kernels:

The system does not boot with those other 5.x kernels. With 5.1.16 the GUI sometimes freezes, but sshd and the mouse still work.


CPU: Ryzen 5 2400g, BOARD: AORUS B450 I PRO WIFI, X Server 1.19.6

Kernel 5.0.x not working (blank screen after boot)
Kernel 5.2.x ( x <= 9 ) is not working (blank screen after boot)

but Kernel 5.1.16 is working (mostly)!


Error LOG with 5.1.16:
[Mi Aug 14 14:22:21 2019] amdgpu 0000:09:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[Mi Aug 14 14:22:21 2019] amdgpu 0000:09:00.0: [gfxhub] no-retry page fault (src_id:0 ring:24 vmid:3 pasid:32768, for process Xorg pid 1848 thread Xorg:cs0 pid 1849)
[Mi Aug 14 14:22:21 2019] amdgpu 0000:09:00.0:   in page starting at address 0x000080010c205000 from 27
[Mi Aug 14 14:22:21 2019] amdgpu 0000:09:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[Mi Aug 14 14:22:31 2019] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=840738, emitted seq=840740
[Mi Aug 14 14:22:31 2019] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1848 thread Xorg:cs0 pid 1849
[Mi Aug 14 14:22:31 2019] [drm] GPU recovery disabled.
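
When comparing reports like the one above, the gap between the signaled and emitted sequence numbers shows how many fences were still outstanding on the ring when the job timed out. A minimal Python sketch follows. The helper name `parse_ring_timeouts` is hypothetical, and the regular expression only assumes the message format quoted in this report.

```python
import re

# Matches lines such as:
#   ... *ERROR* ring gfx timeout, signaled seq=840738, emitted seq=840740
# The pattern is an assumption based on the log lines quoted in this report.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, signaled seq=(?P<signaled>\d+), "
    r"emitted seq=(?P<emitted>\d+)"
)


def parse_ring_timeouts(log_text):
    """Return one dict per ring timeout line found in log_text."""
    results = []
    for line in log_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match is None:
            continue
        signaled = int(match.group("signaled"))
        emitted = int(match.group("emitted"))
        results.append({
            "ring": match.group("ring"),
            "signaled": signaled,
            "emitted": emitted,
            # Fences emitted to the ring but not yet signaled.
            "pending": emitted - signaled,
        })
    return results


if __name__ == "__main__":
    sample = (
        "[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, "
        "signaled seq=840738, emitted seq=840740"
    )
    for entry in parse_ring_timeouts(sample):
        print(entry)
```

Lines of the "ring gfx timeout, but soft recovered" form carry no sequence numbers and are deliberately skipped by this pattern.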
Comment 9 Ungureanu Alexandru 2019-09-11 08:36:57 UTC
Just got something similar while playing Left 4 Dead. The system simply froze with altered colors on the screen and the sound just looping over the last second or so. Cannot confirm SSH access.


journalctl -b -1 ends with


[drm:gfx_v8_0_priv_reg_irq [amdgpu]] *ERROR* Illegal register access in command stream
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=2225992, emitted seq=2225993
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process hl2_linux pid 12532 thread hl2_


OS: Ubuntu 19.04 on 
Kernel: 5.0.0-27-generic
GPU: Radeon RX580
CPU: Ryzen 5 1600x

Thanks!
Comment 10 Dev Bazilio 2019-09-20 11:37:07 UTC
(In reply to Ungureanu Alexandru from comment #9)
> Just got something similar while playing Left 4 Dead. The system simply
> froze with altered colors on the screen and the sound just looping over the
> last second or so. Cannot confirm SSH access.

> Kernel: 5.0.0-27-generic
> GPU: Radeon RX580
> CPU: Ryzen 5 1600x

5.0 is a very outdated kernel; use the latest from kernel.org.

As for me, everything works perfectly in 5.3 (Polaris chip, RX540).
Finally I no longer get errors like these:
- ERROR* resume of IP block <uvd_v6_0> failed -110
- [drm] Fence fallback timer expired on ring sdma0
- last message was failed ret is **
- [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq...
- IP block:sdma_v3_0 is hung!
- Timeout, but no hardware hang detected.

Tested on YouTube with HW-accelerated video and in several games.
Thanks a lot to the guys from AMD; I had to wait over a year to get these bugs fixed.
Comment 11 lekto 2019-10-02 10:39:30 UTC
Same problem here. It happens when I run looking-glass [1], but not every time. I tried downgrading my kernel from 5.3.1 to 5.2.11 (I'm pretty sure it worked then), downgrading Mesa from 19.2.0 to 19.1.7 (I'm sure it worked with 19.2.0-rc) and downgrading my firmware to 2019-09-23 (the oldest in the repo).

When it happens, looking-glass starts blinking and sometimes my other monitor gets stuck so that I can only move the cursor on it.

Spec:
Gentoo ~amd64
Ryzen 1600 (others have Ryzen too, coincidence?)
Linux GPU: R7 240 (with radeon driver)
Windows GPU: RX580
ASRock X370 Gaming X


[1] https://looking-glass.hostfission.com/
Comment 12 Matthias Heinz 2019-10-11 22:00:39 UTC
Hi,

I think I have the same bug and opened https://bugzilla.kernel.org/show_bug.cgi?id=204683.

At first it looked a bit different, because in newer kernels the error message has changed. But as you can see I did some testing and this seems to go way back. Sadly I couldn't test a 4.18 kernel.

Can somebody mark my report as a duplicate? Because I think it is.

Would some more debug info help?
Comment 13 Matthias Heinz 2019-10-14 17:18:59 UTC
*** Bug 204683 has been marked as a duplicate of this bug. ***
Comment 14 Konstantin Pereiaslov 2019-10-24 16:39:38 UTC
I'm also experiencing this with a Radeon RX 5700 XT and amdgpu 19.1.0+git1910111930.b467d2~oibaf~b.

The GPU wasn't under any heavy load.

First, some artifacts appeared on the Plasma Hard Disk Monitor and CPU Load widgets (here is a screenshot: https://i.perk11.info/20191024_193152_kernel.png) while the PC was idle and the screen was locked, but everything else continued to work fine.

I checked the logs for the period when this could have happened, but the only logs from that period are from KScreen and start like this:

Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper: RRNotify_OutputProperty (ignored)
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:         Output:  88
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:         Property:  EDID
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:         State (newValue, Deleted):  1
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper: RRNotify_OutputProperty (ignored)
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:         Output:  88
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:         Property:  EDID
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:         State (newValue, Deleted):  1
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper: RRNotify_OutputChange
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:         Output:  88
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:         CRTC:  81
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:         Mode:  97
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:         Rotation:  "Rotate_0"
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:         Connection:  "Disconnected"
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:         Subpixel Order:  0
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper: RRScreenChangeNotify
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:         Window: 18874373
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:         Root: 1744
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:         Rotation:  "Rotate_0"
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:         Size ID: 65535
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:         Size:  7280 1440
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:         SizeMM:  1926 381
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper: RRNotify_OutputChange
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:         Output:  88
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:         CRTC:  81
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:         Mode:  97
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:         Rotation:  "Rotate_0"
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:         Connection:  "Disconnected"
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:         Subpixel Order:  0
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xrandr: XRandROutput 88 update
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]:          m_connected: 0
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]:          m_crtc XRandRCrtc(0x5655577da9f0)
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]:          CRTC: 81
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]:          MODE: 97
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]:          Connection: 1
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]:          Primary: false
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xrandr: Output 88 : connected = false , enabled = true
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xrandr: XRandROutput 88 update
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]:          m_connected: 1
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]:          m_crtc XRandRCrtc(0x5655577da9f0)
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]:          CRTC: 81
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]:          MODE: 97
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]:          Connection: 1
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]:          Primary: false



90 minutes later, the system became unresponsive while I was typing a message in Skype. The audio I had playing in Audacity continued to play, and the cron jobs kept running normally for a few minutes while I tried, without success, to get the system unstuck without rebooting it.

Here are the errors:

Oct 24 19:04:10 perk11-home kernel: [drm:amdgpu_dm_commit_planes.constprop.0 [amdgpu]] *ERROR* Waiting for fences timed out or interrupted!
Oct 24 19:04:10 perk11-home kernel: [drm:amdgpu_dm_commit_planes.constprop.0 [amdgpu]] *ERROR* Waiting for fences timed out or interrupted!
Oct 24 19:04:15 perk11-home kernel: [drm:amdgpu_dm_commit_planes.constprop.0 [amdgpu]] *ERROR* Waiting for fences timed out or interrupted!
Oct 24 19:04:15 perk11-home kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=3485981, emitted seq=3485983
Oct 24 19:04:15 perk11-home kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 2469 thread Xorg:cs0 pid 2491
Oct 24 19:04:15 perk11-home kernel: [drm] GPU recovery disabled.
Comment 15 Konstantin Pereiaslov 2019-10-24 16:40:26 UTC
My kernel version is 5.3.7-050307-generic running KDE Neon User edition with latest updates.
Comment 16 shallowaloe 2019-10-27 18:44:36 UTC
Created attachment 285665 [details]
5 second video clip that triggers a crash

Hi,

I think I'm having the same problem as you guys.  I run a mythbackend where I record cable television and those recordings often crash my system when hardware decoding is enabled.  Usually it's just the screen that freezes and I can still ssh to it.  

Kernel 5.1.6 was an exception for me too, with that kernel I'm able to restart the display manager and recover without having to reboot.

Attached is a short video that crashes my system.  I can trigger the alert by running:

mpv --vo=vaapi out.ts

I'm wondering if it crashes your systems too and if it's related.
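
To check whether a reproducer like the command above leaves new amdgpu errors in the kernel log, one option is to save the log before and after the run and compare the two captures. The sketch below is in Python. `new_amdgpu_errors` is a hypothetical helper, and it only compares text you supply (for example saved `dmesg` output), so it assumes nothing about how the log was collected.

```python
def new_amdgpu_errors(before, after):
    """Return amdgpu '*ERROR*' lines present in `after` but not in `before`.

    Both arguments are plain kernel-log text. Lines are compared verbatim,
    so the captures should include timestamps to keep repeated messages
    distinct from one another.
    """
    seen = set(before.splitlines())
    fresh = []
    for line in after.splitlines():
        if line in seen:
            continue
        if "amdgpu" in line and "*ERROR*" in line:
            fresh.append(line)
    return fresh


if __name__ == "__main__":
    before = "[  10.0] [drm] amdgpu kernel modesetting enabled."
    after = before + (
        "\n[  50.1] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* "
        "ring gfx timeout, but soft recovered"
    )
    for line in new_amdgpu_errors(before, after):
        print(line)
```

An empty result after playing the clip would suggest the run did not trigger a fresh timeout on that system.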
Comment 17 jmstylr 2019-11-10 07:11:29 UTC
(In reply to shallowaloe from comment #16)
> Created attachment 285665 [details]
> 5 second video clip that triggers a crash
> 
> Hi,
> 
> I think I'm having the same problem as you guys.  I run a mythbackend where
> I record cable television and those recordings often crash my system when
> hardware decoding is enabled.  Usually it's just the screen that freezes and
> I can still ssh to it.  
> 
> Kernel 5.1.6 was an exception for me too, with that kernel I'm able to
> restart the display manager and recover without having to reboot.
> 
> Attached is a short video that crashes my system.  I can trigger the alert
> by running:
> 
> mpv --vo=vaapi out.ts
> 
> I'm wondering if it crashes your systems too and if it's related.


Just to add a data point, I tried running `mpv --vo=vaapi out.ts` against your file, and while it crashed the application, it did not freeze the system. 

My hardware is a Ryzen 3700X with a Radeon RX 5700, running Ubuntu 19.10 with default kernel (5.3.0-19-generic).

The command did result in the following lines in /var/log/syslog repeated every 5 seconds:

Nov 10 07:04:23 redacted kernel: [ 2266.802162] gmc_v10_0_process_interrupt: 23900 callbacks suppressed
Nov 10 07:04:23 redacted kernel: [ 2266.802166] amdgpu 0000:0b:00.0: [mmhub] VMC page fault (src_id:0 ring:158 vmid:0 pasid:0)
Nov 10 07:04:23 redacted kernel: [ 2266.802170] amdgpu 0000:0b:00.0:   at page 0x0000000000000000 from 18
Nov 10 07:04:23 redacted kernel: [ 2266.802171] amdgpu 0000:0b:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0000213D
Nov 10 07:04:23 redacted kernel: [ 2266.802176] amdgpu 0000:0b:00.0: [mmhub] VMC page fault (src_id:0 ring:158 vmid:0 pasid:0)
Nov 10 07:04:23 redacted kernel: [ 2266.802178] amdgpu 0000:0b:00.0:   at page 0x0000000000000000 from 18
Nov 10 07:04:23 redacted kernel: [ 2266.802179] amdgpu 0000:0b:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Nov 10 07:04:23 redacted kernel: [ 2266.802566] amdgpu 0000:0b:00.0: [mmhub] VMC page fault (src_id:0 ring:158 vmid:0 pasid:0)
Nov 10 07:04:23 redacted kernel: [ 2266.802568] amdgpu 0000:0b:00.0:   at page 0x0000000000000000 from 18
Nov 10 07:04:23 redacted kernel: [ 2266.802569] amdgpu 0000:0b:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0000213D
Nov 10 07:04:23 redacted kernel: [ 2266.802573] amdgpu 0000:0b:00.0: [mmhub] VMC page fault (src_id:0 ring:158 vmid:0 pasid:0)
Nov 10 07:04:23 redacted kernel: [ 2266.802575] amdgpu 0000:0b:00.0:   at page 0x0000000000000000 from 18
Nov 10 07:04:23 redacted kernel: [ 2266.802576] amdgpu 0000:0b:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Nov 10 07:04:23 redacted kernel: [ 2266.802984] amdgpu 0000:0b:00.0: [mmhub] VMC page fault (src_id:0 ring:158 vmid:0 pasid:0)
Nov 10 07:04:23 redacted kernel: [ 2266.802985] amdgpu 0000:0b:00.0:   at page 0x0000000000000000 from 18
Nov 10 07:04:23 redacted kernel: [ 2266.802987] amdgpu 0000:0b:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0000213D
Nov 10 07:04:23 redacted kernel: [ 2266.802993] amdgpu 0000:0b:00.0: [mmhub] VMC page fault (src_id:0 ring:158 vmid:0 pasid:0)
Nov 10 07:04:23 redacted kernel: [ 2266.802994] amdgpu 0000:0b:00.0:   at page 0x0000000000000000 from 18
Nov 10 07:04:23 redacted kernel: [ 2266.802995] amdgpu 0000:0b:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Nov 10 07:04:23 redacted kernel: [ 2266.803403] amdgpu 0000:0b:00.0: [mmhub] VMC page fault (src_id:0 ring:158 vmid:0 pasid:0)
Nov 10 07:04:23 redacted kernel: [ 2266.803404] amdgpu 0000:0b:00.0:   at page 0x0000000000000000 from 18
Nov 10 07:04:23 redacted kernel: [ 2266.803406] amdgpu 0000:0b:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0000213D
Nov 10 07:04:23 redacted kernel: [ 2266.803410] amdgpu 0000:0b:00.0: [mmhub] VMC page fault (src_id:0 ring:158 vmid:0 pasid:0)
Nov 10 07:04:23 redacted kernel: [ 2266.803411] amdgpu 0000:0b:00.0:   at page 0x0000000000000000 from 18
Nov 10 07:04:23 redacted kernel: [ 2266.803412] amdgpu 0000:0b:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Nov 10 07:04:23 redacted kernel: [ 2266.803822] amdgpu 0000:0b:00.0: [mmhub] VMC page fault (src_id:0 ring:158 vmid:0 pasid:0)
Nov 10 07:04:23 redacted kernel: [ 2266.803824] amdgpu 0000:0b:00.0:   at page 0x0000000000000000 from 18
Nov 10 07:04:23 redacted kernel: [ 2266.803825] amdgpu 0000:0b:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0000213D
Nov 10 07:04:23 redacted kernel: [ 2266.803831] amdgpu 0000:0b:00.0: [mmhub] VMC page fault (src_id:0 ring:158 vmid:0 pasid:0)
Nov 10 07:04:23 redacted kernel: [ 2266.803833] amdgpu 0000:0b:00.0:   at page 0x0000000000000000 from 18
Nov 10 07:04:23 redacted kernel: [ 2266.803834] amdgpu 0000:0b:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
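
Because the same three lines repeat many times, a summary is easier to compare across systems than the raw log. The Python sketch below counts page faults per hub and tallies the distinct `VM_L2_PROTECTION_FAULT_STATUS` values. The function name `summarize_page_faults` is hypothetical, and the patterns assume the exact line format shown above (they would not match the older "no-retry page fault" wording from comment 8).

```python
import re
from collections import Counter

# Patterns assumed from the syslog excerpt above.
FAULT_RE = re.compile(r"\[(?P<hub>\w+)\] VMC page fault")
STATUS_RE = re.compile(
    r"VM_L2_PROTECTION_FAULT_STATUS:(?P<status>0x[0-9A-Fa-f]+)"
)


def summarize_page_faults(log_text):
    """Count VMC page faults per hub and occurrences of each status value."""
    hubs = Counter()
    statuses = Counter()
    for line in log_text.splitlines():
        fault = FAULT_RE.search(line)
        if fault:
            hubs[fault.group("hub")] += 1
            continue
        status = STATUS_RE.search(line)
        if status:
            # Normalize hex case so 0x0000213D and 0x0000213d count together.
            statuses[status.group("status").lower()] += 1
    return hubs, statuses


if __name__ == "__main__":
    sample = "\n".join([
        "amdgpu 0000:0b:00.0: [mmhub] VMC page fault (src_id:0 ring:158 vmid:0 pasid:0)",
        "amdgpu 0000:0b:00.0:   at page 0x0000000000000000 from 18",
        "amdgpu 0000:0b:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0000213D",
    ])
    hubs, statuses = summarize_page_faults(sample)
    print(dict(hubs), dict(statuses))
```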
Comment 18 Matthias Heinz 2019-11-25 09:43:45 UTC
Hi,

I recently built a 5.4.0-rc7 from drm-next (my HEAD was 17eee668b3cad423a47c090fe2275733c55db910) and also updated Mesa to 19.3.0-RC1.

Since then I didn't get any crashes. I have tested this for a few hours now, but it's entirely possible that I just didn't run into the bug for some reason, although it usually appeared after half an hour.

If possible please try this setup and see if it is fixed.
Comment 19 j.cordoba 2019-12-03 15:53:49 UTC
Hi,

This issue is still present in the latest kernels:

5.4.1, 5.4, 5.3.14

Last usable kernel for me is 4.20.17

System Specs

- Gigabyte b450-ds3h
- Ryzen 5 3400G (with RX Vega 11)
- Mesa 19.1.2 - padoka PPA (Stable)
- Ubuntu 18.04.3 LTS
Comment 20 Matthias Heinz 2019-12-03 16:07:15 UTC
Dear j.cordoba,

is it possible that you try to build 5.4.0-rc7 from drm-next and give it a test as I mentioned in Comment 18?

I'm running on this for some time now and the bug should have appeared by now, so I'm getting more confident that it is fixed.

Best regards
Matthias
Comment 21 Łukasz Żarnowiecki 2019-12-03 21:34:03 UTC
Same is happening to me on 5.4.1.  No issue with 4.9.

[   44.172714] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
[   49.292694] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
[   58.469316] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
[   63.586055] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
[  156.606591] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
Comment 22 Pierre-Eric Pelloux-Prayer 2019-12-04 09:54:47 UTC
(In reply to shallowaloe from comment #16)
> Created attachment 285665 [details]
> 5 second video clip that triggers a crash
> 
> Hi,
> 
> I think I'm having the same problem as you guys.  I run a mythbackend where
> I record cable television and those recordings often crash my system when
> hardware decoding is enabled.  Usually it's just the screen that freezes and
> I can still ssh to it.  
> 
> Kernel 5.1.6 was an exception for me too, with that kernel I'm able to
> restart the display manager and recover without having to reboot.
> 
> Attached is a short video that crashes my system.  I can trigger the alert
> by running:
> 
> mpv --vo=vaapi out.ts
> 
> I'm wondering if it crashes your systems too and if it's related.


This one is probably a Mesa issue, see https://gitlab.freedesktop.org/mesa/mesa/issues/2177

What Mesa version are you using?
Comment 23 shallowaloe 2019-12-08 17:32:49 UTC
Created attachment 286227 [details]
attachment-25111-0.html

Thanks for the link to the bug. I'm running an Ubuntu-based system and am using the oibaf PPA. The current version is 20.0.



Comment 24 Janpieter Sollie 2020-01-02 08:30:11 UTC
Hi everyone,

I have the same issue with a Fiji Nano GPU: UVD6 and VCE3 time out in the ring buffer test at boot with the amdgpu driver. Other rings seem to work correctly.
To make sure the hardware functions as it should and that this is not a hardware error, where (in the amdgpu driver) can I increase the timeout value?
Comment 25 Janpieter Sollie 2020-01-02 09:11:40 UTC
Created attachment 286575 [details]
kernel config 5.4.7 Fiji

Some additional info for my case:
- Running kernel 5.4.7 (vanilla), firmware 20191108 on gentoo
- Dmesg | grep -E "(drm)|(amdgpu)":
[    3.930023] [drm] amdgpu kernel modesetting enabled.
[    3.930217] amdgpu 0000:0a:00.0: remove_conflicting_pci_framebuffers: bar 0: 0xe0000000 -> 0xefffffff
[    3.930219] amdgpu 0000:0a:00.0: remove_conflicting_pci_framebuffers: bar 2: 0xf0000000 -> 0xf01fffff
[    3.930221] amdgpu 0000:0a:00.0: remove_conflicting_pci_framebuffers: bar 5: 0xfce00000 -> 0xfce3ffff
[    3.930224] fb0: switching to amdgpudrmfb from EFI VGA
[    3.930475] [drm] initializing kernel modesetting (FIJI 0x1002:0x7300 0x1002:0x0B36 0xCA).
[    3.930486] [drm] register mmio base: 0xFCE00000
[    3.930486] [drm] register mmio size: 262144
[    3.930495] [drm] add ip block number 0 <vi_common>
[    3.930495] [drm] add ip block number 1 <gmc_v8_0>
[    3.930496] [drm] add ip block number 2 <tonga_ih>
[    3.930497] [drm] add ip block number 3 <gfx_v8_0>
[    3.930498] [drm] add ip block number 4 <sdma_v3_0>
[    3.930498] [drm] add ip block number 5 <powerplay>
[    3.930499] [drm] add ip block number 6 <dm>
[    3.930500] [drm] add ip block number 7 <uvd_v6_0>
[    3.930500] [drm] add ip block number 8 <vce_v3_0>
[    3.930715] [drm] UVD is enabled in physical mode
[    3.930715] [drm] VCE enabled in physical mode
[    3.930743] [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
[    3.930751] amdgpu 0000:0a:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[    3.930753] amdgpu 0000:0a:00.0: GART: 1024M 0x000000FF00000000 - 0x000000FF3FFFFFFF
[    3.930758] [drm] Detected VRAM RAM=4096M, BAR=256M
[    3.930759] [drm] RAM width 512bits HBM
[    3.930838] [drm] amdgpu: 4096M of VRAM memory ready
[    3.930841] [drm] amdgpu: 4096M of GTT memory ready.
[    3.930860] [drm] GART: num cpu pages 262144, num gpu pages 262144
[    3.930928] [drm] PCIE GART of 1024M enabled (table at 0x000000F4001D5000).
[    3.934174] [drm] Chained IB support enabled!
[    3.940198] amdgpu: [powerplay] hwmgr_sw_init smu backed is fiji_smu
[    3.941748] [drm] Found UVD firmware Version: 1.91 Family ID: 12
[    3.941752] [drm] UVD ENC is disabled
[    3.943542] [drm] Found VCE firmware Version: 55.2 Binary ID: 3
[    4.009146] [drm] dce110_link_encoder_construct: Failed to get encoder_cap_info from VBIOS with error code 4!
[    4.040084] [drm] Display Core initialized with v3.2.48!
[    4.040542] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    4.040543] [drm] Driver supports precise vblank timestamp query.
[    4.067774] [drm] UVD initialized successfully.
[    4.168780] [drm] VCE initialized successfully.
[    4.170163] [drm] Cannot find any crtc or sizes
[    4.171948] [drm] Initialized amdgpu 3.35.0 20150101 for 0000:0a:00.0 on minor 0
[    7.280062] amdgpu 0000:0a:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on uvd (-110).
[    8.400365] amdgpu 0000:0a:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on vce0 (-110).
[    8.400370] [drm:process_one_work] *ERROR* ib ring test failed (-110).
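
The `-110` in these messages is a negated Linux errno value; 110 is `ETIMEDOUT`, so the tests are reporting a timeout rather than some other failure. The Python sketch below extracts such codes from "failed" lines. `decode_failures` and the small lookup table are hypothetical names for illustration, and the pattern assumes the code appears at the end of the line as in the logs quoted in this report.

```python
import re

# Linux errno values seen in this report. 110 is ETIMEDOUT on Linux;
# other platforms may number it differently, so the value is hardcoded here.
LINUX_ERRNO_NAMES = {110: "ETIMEDOUT"}

# Matches a trailing negative code, with or without parentheses, e.g.
#   "IB test failed on uvd (-110)."
#   "resume of IP block <uvd_v6_0> failed -110"
CODE_RE = re.compile(r"failed.*?\(?-(?P<code>\d+)\)?\.?\s*$")


def decode_failures(log_text):
    """Return (code, name) tuples for each 'failed ... -N' line found."""
    decoded = []
    for line in log_text.splitlines():
        match = CODE_RE.search(line)
        if match is None:
            continue
        code = int(match.group("code"))
        decoded.append((code, LINUX_ERRNO_NAMES.get(code, "unknown")))
    return decoded


if __name__ == "__main__":
    sample = (
        "amdgpu 0000:0a:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] "
        "*ERROR* IB test failed on uvd (-110)."
    )
    print(decode_failures(sample))
```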
Comment 26 F. Delente 2020-01-19 17:03:37 UTC
Hello, I have the same problem on a Huawei Matebook D laptop; the processor is an AMD Ryzen 5 with an integrated Radeon Vega Mobile GPU.

I use Fedora 31. The problem appeared when upgrading from the 5.3.16 kernel to the 5.4.6 kernel. Reverting to 5.3.16 solved the issue.

At some moments the UI (XFCE) freezes for about 5 seconds; I can move the mouse cursor but I can't get any keyboard input (not in X, not by switching console). Each time the freeze occurs, dmesg shows these messages:

[   45.530374] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
[   50.139408] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered

I include /proc/cpuinfo and lspci outputs.
Comment 27 F. Delente 2020-01-19 17:04:19 UTC
Created attachment 286899 [details]
/proc/cpuinfo
Comment 28 F. Delente 2020-01-19 17:04:47 UTC
Created attachment 286901 [details]
lspci output
Comment 29 Matthias Heinz 2020-01-19 17:13:00 UTC
Hi. I have already reported this bug here: https://gitlab.freedesktop.org/drm/amd/issues/953

If possible try a 5.5-rc kernel and see if it's fixed there. It's fixed - at least for me - in the drm-tree.

Best regards
Matthias
Comment 30 Steven Ellis 2020-04-04 21:54:20 UTC
I'm seeing the same issue on Ubuntu 18.04 with

Upstream PPA "sudo add-apt-repository ppa:oibaf/graphics-drivers"

[  321.412530] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out or interrupted!
[  326.286306] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=4447, emitted seq=4449
[  326.286395] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process mythfrontend.re pid 2410 thread mythfronte:cs0 pid 2880


AMDGPUPRO driver 19.50-967956

[20913.330563] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
[20918.450513] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
[20923.570306] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
[20928.690699] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
Comment 31 Matthias Heinz 2020-05-01 09:03:18 UTC
Hi,

for me this bug is fixed with a 5.5 kernel. And I'm wondering if this is fixed for all of you, too.

Best
Matthias
Comment 32 j.cordoba 2020-05-01 19:52:47 UTC
I agree. Fixed for me too
Comment 33 udo 2020-05-25 12:21:12 UTC
I still see them on 5.6.13:

[191571.372560] sd 11:0:0:0: [sde] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=0x00
[205796.424607] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=4518280, emitted seq=4518282
[205796.424637] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process mpv pid 488243 thread mpv:cs0 pid 488257
[205796.424640] amdgpu 0000:0a:00.0: GPU reset begin!
[205800.840504] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
[205800.937565] amdgpu 0000:0a:00.0: GPU reset succeeded, trying to resume
[205800.938060] [drm] PCIE GART of 1024M enabled (table at 0x000000F400900000).
[205800.938849] [drm] PSP is resuming...
[205800.958729] [drm] reserve 0x400000 from 0xf47f800000 for PSP TMR
[205800.972414] [drm] psp command (0x5) failed and response status is (0xFFFF0007)
[205801.176411] amdgpu 0000:0a:00.0: RAS: ras ta ucode is not available
[205801.460775] [drm] kiq ring mec 2 pipe 1 q 0
[205801.460986] amdgpu 0000:0a:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0002 address=0x800002300 flags=0x0000]
[205801.516698] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[205801.516709] amdgpu 0000:0a:00.0: ring gfx uses VM inv eng 0 on hub 0
[205801.516713] amdgpu 0000:0a:00.0: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[205801.516717] amdgpu 0000:0a:00.0: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[205801.516720] amdgpu 0000:0a:00.0: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[205801.516724] amdgpu 0000:0a:00.0: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[205801.516727] amdgpu 0000:0a:00.0: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[205801.516730] amdgpu 0000:0a:00.0: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[205801.516733] amdgpu 0000:0a:00.0: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[205801.516736] amdgpu 0000:0a:00.0: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[205801.516740] amdgpu 0000:0a:00.0: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[205801.516743] amdgpu 0000:0a:00.0: ring sdma0 uses VM inv eng 0 on hub 1
[205801.516746] amdgpu 0000:0a:00.0: ring vcn_dec uses VM inv eng 1 on hub 1
[205801.516749] amdgpu 0000:0a:00.0: ring vcn_enc0 uses VM inv eng 4 on hub 1
[205801.516752] amdgpu 0000:0a:00.0: ring vcn_enc1 uses VM inv eng 5 on hub 1
[205801.516755] amdgpu 0000:0a:00.0: ring jpeg_dec uses VM inv eng 6 on hub 1
[205801.525996] [drm] recover vram bo from shadow start
[205801.525998] [drm] recover vram bo from shadow done
[205801.526008] [drm] Skip scheduling IBs!
[205801.526051] amdgpu 0000:0a:00.0: GPU reset(1) succeeded!
[205802.536444] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=4518342, emitted seq=4518344
[205802.536523] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process gnome-shell pid 3825 thread gnome-shel:cs0 pid 3834
[205802.536531] amdgpu 0000:0a:00.0: GPU reset begin!
[205806.728558] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
[205806.821326] amdgpu 0000:0a:00.0: GPU reset succeeded, trying to resume
[205806.821578] [drm] PCIE GART of 1024M enabled (table at 0x000000F400900000).
[205806.821899] [drm] PSP is resuming...
[205806.841769] [drm] reserve 0x400000 from 0xf47f800000 for PSP TMR
[205806.856213] [drm] psp command (0x5) failed and response status is (0xFFFF0007)
[205807.072210] amdgpu 0000:0a:00.0: RAS: ras ta ucode is not available
[205807.355997] [drm] kiq ring mec 2 pipe 1 q 0
[205807.356308] amdgpu 0000:0a:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0002 address=0x800072f00 flags=0x0000]
[205807.409389] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[205807.409401] amdgpu 0000:0a:00.0: ring gfx uses VM inv eng 0 on hub 0
[205807.409406] amdgpu 0000:0a:00.0: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[205807.409410] amdgpu 0000:0a:00.0: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[205807.409415] amdgpu 0000:0a:00.0: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[205807.409418] amdgpu 0000:0a:00.0: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[205807.409422] amdgpu 0000:0a:00.0: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[205807.409425] amdgpu 0000:0a:00.0: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[205807.409429] amdgpu 0000:0a:00.0: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[205807.409432] amdgpu 0000:0a:00.0: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[205807.409436] amdgpu 0000:0a:00.0: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[205807.409440] amdgpu 0000:0a:00.0: ring sdma0 uses VM inv eng 0 on hub 1
[205807.409444] amdgpu 0000:0a:00.0: ring vcn_dec uses VM inv eng 1 on hub 1
[205807.409447] amdgpu 0000:0a:00.0: ring vcn_enc0 uses VM inv eng 4 on hub 1
[205807.409451] amdgpu 0000:0a:00.0: ring vcn_enc1 uses VM inv eng 5 on hub 1
[205807.409454] amdgpu 0000:0a:00.0: ring jpeg_dec uses VM inv eng 6 on hub 1
[205807.418547] [drm] recover vram bo from shadow start
[205807.418549] [drm] recover vram bo from shadow done
[205807.418567] [drm] Skip scheduling IBs!
[205807.418569] [drm] Skip scheduling IBs!
[205807.418592] [drm] Skip scheduling IBs!
[205807.418613] amdgpu 0000:0a:00.0: GPU reset(2) succeeded!
[205808.428469] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
[205809.458201] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=11463546, emitted seq=11463549
[205809.458282] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 3513 thread Xorg:cs0 pid 3514
[205809.458289] amdgpu 0000:0a:00.0: GPU reset begin!
[205812.872123] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
[205812.981471] amdgpu 0000:0a:00.0: GPU reset succeeded, trying to resume
[205812.981823] [drm] PCIE GART of 1024M enabled (table at 0x000000F400900000).
[205812.982264] [drm] PSP is resuming...
[205813.002134] [drm] reserve 0x400000 from 0xf47f800000 for PSP TMR
[205813.012088] [drm] psp command (0x5) failed and response status is (0xFFFF0007)
[205813.208005] amdgpu 0000:0a:00.0: RAS: ras ta ucode is not available
[205813.497603] [drm] kiq ring mec 2 pipe 1 q 0
[205813.551494] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[205813.551506] amdgpu 0000:0a:00.0: ring gfx uses VM inv eng 0 on hub 0
[205813.551510] amdgpu 0000:0a:00.0: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[205813.551514] amdgpu 0000:0a:00.0: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[205813.551517] amdgpu 0000:0a:00.0: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[205813.551520] amdgpu 0000:0a:00.0: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[205813.551524] amdgpu 0000:0a:00.0: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[205813.551526] amdgpu 0000:0a:00.0: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[205813.551529] amdgpu 0000:0a:00.0: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[205813.551532] amdgpu 0000:0a:00.0: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[205813.551535] amdgpu 0000:0a:00.0: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[205813.551538] amdgpu 0000:0a:00.0: ring sdma0 uses VM inv eng 0 on hub 1
[205813.551541] amdgpu 0000:0a:00.0: ring vcn_dec uses VM inv eng 1 on hub 1
[205813.551543] amdgpu 0000:0a:00.0: ring vcn_enc0 uses VM inv eng 4 on hub 1
[205813.551546] amdgpu 0000:0a:00.0: ring vcn_enc1 uses VM inv eng 5 on hub 1
[205813.551549] amdgpu 0000:0a:00.0: ring jpeg_dec uses VM inv eng 6 on hub 1
[205902.384966] traps: Bluez D-Bus thr[409727] trap invalid opcode ip:555cd19202af sp:7f265cf9de10 error:0 in skypeforlinux[555ccfa02000+542a000]
Comment 34 Panagiotis Polychronis 2020-06-19 19:11:45 UTC
The problem still exists with Linux kernel 5.8-rc1 built from git. My graphics card is a Radeon RX 5600 XT.


[20581.087159] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=2768656, emitted seq=2768658
[20581.087212] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process DOOMEternalx64v pid 8875 thread DOOMEternalx64v pid 8875
[20581.087217] amdgpu 0000:29:00.0: amdgpu: GPU reset begin!
[20583.381257] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
[20585.087232] amdgpu 0000:29:00.0: amdgpu: failed to suspend display audio
[20585.156036] snd_hda_codec_hdmi hdaudioC0D0: HDMI: ELD buf size is 0, force 128
[20585.156052] snd_hda_codec_hdmi hdaudioC0D0: HDMI: invalid ELD data byte 0
[20585.463157] amdgpu 0000:29:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
[20585.463205] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
[20585.694999] amdgpu 0000:29:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
[20585.695047] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
[20585.926951] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
[20588.045497] amdgpu 0000:29:00.0: amdgpu: GPU reset succeeded, trying to resume
[20588.045605] [drm] PCIE GART of 512M enabled (table at 0x0000008000E10000).
[20588.045682] [drm] VRAM is lost due to GPU reset!
[20588.048023] [drm] PSP is resuming...
[20588.218089] [drm] reserve 0x900000 from 0x817e400000 for PSP TMR
[20588.287093] amdgpu 0000:29:00.0: amdgpu: RAS: optional ras ta ucode is not available
[20588.293101] amdgpu: SMU is resuming...
[20588.295088] amdgpu: SMU is resumed successfully!
[20588.413155] [drm] kiq ring mec 2 pipe 1 q 0
[20588.417493] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[20588.417632] [drm] JPEG decode initialized successfully.
[20588.417690] amdgpu 0000:29:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[20588.417693] amdgpu 0000:29:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[20588.417697] amdgpu 0000:29:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[20588.417700] amdgpu 0000:29:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[20588.417703] amdgpu 0000:29:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[20588.417707] amdgpu 0000:29:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[20588.417709] amdgpu 0000:29:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[20588.417713] amdgpu 0000:29:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[20588.417716] amdgpu 0000:29:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[20588.417719] amdgpu 0000:29:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[20588.417721] amdgpu 0000:29:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[20588.417724] amdgpu 0000:29:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
[20588.417726] amdgpu 0000:29:00.0: amdgpu: ring vcn_dec uses VM inv eng 0 on hub 1
[20588.417728] amdgpu 0000:29:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 1 on hub 1
[20588.417730] amdgpu 0000:29:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 4 on hub 1
[20588.417732] amdgpu 0000:29:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
[20588.421588] [drm] recover vram bo from shadow start
[20588.427530] [drm] recover vram bo from shadow done
[20588.427534] [drm] Skip scheduling IBs!
[20588.427537] [drm] Skip scheduling IBs!
[20588.427565] [drm] Skip scheduling IBs!
[20588.427573] [drm] Skip scheduling IBs!
[20588.427583] [drm] Skip scheduling IBs!
[20588.427591] [drm] Skip scheduling IBs!
[20588.427597] [drm] Skip scheduling IBs!
[20588.427649] [drm] Skip scheduling IBs!
[20588.427669] [drm] Skip scheduling IBs!
[20588.427680] [drm] Skip scheduling IBs!
[20588.427692] [drm] Skip scheduling IBs!
[20588.427693] [drm] Skip scheduling IBs!
[20588.427699] [drm] Skip scheduling IBs!
[20588.427703] [drm] Skip scheduling IBs!
[20588.427710] [drm] Skip scheduling IBs!
[20588.427714] amdgpu 0000:29:00.0: amdgpu: GPU reset(2) succeeded!
[20588.427719] [drm] Skip scheduling IBs!
[20588.427721] [drm] Skip scheduling IBs!
[20588.427724] [drm] Skip scheduling IBs!
[20588.427726] [drm] Skip scheduling IBs!
[20600.095254] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=2768668, emitted seq=2768669
[20600.095404] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process plasmashell pid 1570 thread plasmashel:cs0 pid 1713
[20600.095413] amdgpu 0000:29:00.0: amdgpu: GPU reset begin!
[20604.095435] amdgpu 0000:29:00.0: amdgpu: failed to suspend display audio
[20604.448799] amdgpu 0000:29:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
[20604.448848] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
[20604.681029] amdgpu 0000:29:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
[20604.681078] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
[20604.913262] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
[20605.288303] Disabling lock debugging due to kernel taint
[20605.288325] mce: [Hardware Error]: Machine check events logged
[20605.288327] [Hardware Error]: Uncorrected, software restartable error.
[20605.288330] [Hardware Error]: CPU:1 (17:8:2) MC0_STATUS[-|UE|MiscV|AddrV|-|-|-|UECC|-|Poison|-]: 0xbc002800000c0135
[20605.288335] [Hardware Error]: Error Addr: 0x00000000e8ac0000
[20605.288337] [Hardware Error]: IPID: 0x000000b000000000
[20605.288339] [Hardware Error]: Load Store Unit Ext. Error Code: 12, DC Data error type 1 and poison consumption.
[20605.288341] [Hardware Error]: cache level: L1, tx: DATA, mem-tx: DRD
[20605.288345] mce: Uncorrected hardware memory error in user-access at e8ac0000
[20605.288347] Memory failure: 0xe8ac0: memory outside kernel control
[20605.288348] mce: Memory error not recovered
[20605.288361] amdgpu 0000:29:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0003 address=0x8ac0000 flags=0x0000]
[20605.288375] amdgpu 0000:29:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0003 address=0x8ac0000 flags=0x0000]
[20607.031477] amdgpu 0000:29:00.0: amdgpu: GPU reset succeeded, trying to resume
[20607.031591] [drm] PCIE GART of 512M enabled (table at 0x0000008000E10000).
[20607.031613] [drm] VRAM is lost due to GPU reset!
[20607.034094] [drm] PSP is resuming...
[20607.204092] [drm] reserve 0x900000 from 0x817e400000 for PSP TMR
[20607.273093] amdgpu 0000:29:00.0: amdgpu: RAS: optional ras ta ucode is not available
[20607.279097] amdgpu: SMU is resuming...
[20607.281035] amdgpu: SMU is resumed successfully!
[20607.397649] [drm] kiq ring mec 2 pipe 1 q 0
[20607.402090] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[20607.402494] [drm] JPEG decode initialized successfully.
[20607.402540] amdgpu 0000:29:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[20607.402542] amdgpu 0000:29:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[20607.402544] amdgpu 0000:29:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[20607.402546] amdgpu 0000:29:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[20607.402548] amdgpu 0000:29:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[20607.402549] amdgpu 0000:29:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[20607.402551] amdgpu 0000:29:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[20607.402553] amdgpu 0000:29:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[20607.402554] amdgpu 0000:29:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[20607.402556] amdgpu 0000:29:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[20607.402558] amdgpu 0000:29:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[20607.402559] amdgpu 0000:29:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
[20607.402561] amdgpu 0000:29:00.0: amdgpu: ring vcn_dec uses VM inv eng 0 on hub 1
[20607.402563] amdgpu 0000:29:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 1 on hub 1
[20607.402564] amdgpu 0000:29:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 4 on hub 1
[20607.402566] amdgpu 0000:29:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
[20607.405742] [drm] recover vram bo from shadow start
[20607.409317] [drm] recover vram bo from shadow done
[20607.409320] [drm] Skip scheduling IBs!
[20607.409410] amdgpu 0000:29:00.0: amdgpu: GPU reset(4) succeeded!
[20607.493800] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* got no status for stream 00000000fbb3d792 on acrtc00000000bb69f545
[20607.494597] ------------[ cut here ]------------
[20607.494599] WARNING: CPU: 10 PID: 999 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:7429 amdgpu_dm_atomic_commit_tail+0x1ada/0x22b0 [amdgpu]
[20607.494599] Modules linked in: snd_seq_dummy snd_hrtimer snd_seq fuse joydev mousedev input_leds hid_generic usbhid hid uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 snd_usb_audio videobuf2_common videodev snd_usbmidi_lib snd_rawmidi snd_seq_device mc rfkill squashfs nls_iso8859_1 snd_hda_codec_realtek nls_cp437 vfat snd_hda_codec_generic fat ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg loop snd_hda_codec edac_mce_amd amd_energy snd_hda_core kvm_amd snd_hwdep kvm wmi_bmof snd_pcm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel snd_timer aesni_intel r8169 snd crypto_simd realtek cryptd ccp glue_helper sp5100_tco k10temp soundcore libphy i2c_piix4 rng_core pcspkr wmi evdev mac_hid pinctrl_amd gpio_amdpt acpi_cpufreq uinput sg crypto_user ip_tables x_tables xhci_pci xhci_pci_renesas xhci_hcd amdgpu gpu_sched i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc_core drm
[20607.494633] CPU: 10 PID: 999 Comm: Xorg Tainted: G   M              5.8.0-rc1-MANJARO+ #2
[20607.494634] Hardware name: Micro-Star International Co., Ltd. MS-7B79/X470 GAMING PLUS (MS-7B79), BIOS A.G0 11/11/2019
[20607.494635] RIP: 0010:amdgpu_dm_atomic_commit_tail+0x1ada/0x22b0 [amdgpu]
[20607.494636] Code: 8b bd e8 fc ff ff e8 d5 7f 10 00 48 85 c0 0f 85 23 e9 ff ff 49 8b b5 e8 01 00 00 4c 89 e2 48 c7 c7 e0 5c 91 c0 e8 f6 74 d0 ff <0f> 0b 49 8b 4f 08 e9 10 e9 ff ff 49 8b 45 00 48 8d b8 78 01 00 00
[20607.494637] RSP: 0018:ffffa6b781987838 EFLAGS: 00010246
[20607.494638] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[20607.494639] RDX: 0000000000000000 RSI: ffffffffaaf63047 RDI: 00000000ffffffff
[20607.494640] RBP: ffffa6b781987ba8 R08: 000000000000053e R09: 0000000000000001
[20607.494641] R10: 0000000000000000 R11: 0000000000000001 R12: ffff941201964000
[20607.494641] R13: ffff9410db79d400 R14: ffff94110b71bc00 R15: ffff9410fcc69880
[20607.494642] FS:  00007f87fbe2be80(0000) GS:ffff94120ea80000(0000) knlGS:0000000000000000
[20607.494643] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[20607.494644] CR2: 0000000000fb1fe8 CR3: 0000000402700000 CR4: 00000000003406e0
[20607.494644] Call Trace:
[20607.494644]  ? sched_clock+0x5/0x10
[20607.494645]  ? irqtime_account_irq+0x90/0xc0
[20607.494646]  ? preempt_count_add+0x68/0xa0
[20607.494646]  commit_tail+0x94/0x130 [drm_kms_helper]
[20607.494647]  drm_atomic_helper_commit+0x113/0x140 [drm_kms_helper]
[20607.494648]  drm_atomic_helper_update_plane+0xe9/0x140 [drm_kms_helper]
[20607.494648]  drm_mode_cursor_universal+0x128/0x240 [drm]
[20607.494649]  drm_mode_cursor_common+0x102/0x230 [drm]
[20607.494650]  ? drm_mode_cursor_ioctl+0x70/0x70 [drm]
[20607.494650]  drm_ioctl_kernel+0xb2/0x100 [drm]
[20607.494651]  drm_ioctl+0x208/0x360 [drm]
[20607.494651]  ? drm_mode_cursor_ioctl+0x70/0x70 [drm]
[20607.494652]  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[20607.494652]  ksys_ioctl+0x82/0xc0
[20607.494653]  __x64_sys_ioctl+0x16/0x20
[20607.494653]  do_syscall_64+0x44/0x70
[20607.494654]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[20607.494655] RIP: 0033:0x7f87fca658eb
[20607.494655] Code: Bad RIP value.
[20607.494656] RSP: 002b:00007ffc20a98628 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[20607.494657] RAX: ffffffffffffffda RBX: 00007ffc20a98660 RCX: 00007f87fca658eb
[20607.494658] RDX: 00007ffc20a98660 RSI: 00000000c02464bb RDI: 000000000000000d
[20607.494659] RBP: 00000000c02464bb R08: 000055c87121c270 R09: 000000000000007f
[20607.494659] R10: 0000000000000a00 R11: 0000000000000246 R12: 000055c87109aad0
[20607.494660] R13: 000000000000000d R14: 0000000000000004 R15: 000055c87109b210
[20607.494661] ---[ end trace 96f7cc95700c9634 ]---
[20610.652685] GpuWatchdog[5225]: segfault at 0 ip 000055f7f6e6f76d sp 00007fa63e0b05d0 error 6 in chrome[55f7f27c2000+785b000]
[20610.652696] Code: Bad RIP value.
[20610.652994] audit: type=1701 audit(1592593154.666:113): auid=1000 uid=1000 gid=1000 ses=2 subj==unconfined pid=5147 comm="GpuWatchdog" exe="/opt/google/chrome/chrome" sig=11 res=1
[20610.674438] audit: type=1334 audit(1592593154.687:114): prog-id=15 op=LOAD
[20610.674597] audit: type=1334 audit(1592593154.687:115): prog-id=16 op=LOAD
[20610.675951] audit: type=1130 audit(1592593154.688:116): pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=systemd-coredump@0-10631-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[20611.663071] audit: type=1131 audit(1592593155.675:117): pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=systemd-coredump@0-10631-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[20611.701231] audit: type=1334 audit(1592593155.714:118): prog-id=16 op=UNLOAD
[20611.701236] audit: type=1334 audit(1592593155.714:119): prog-id=15 op=UNLOAD
[20617.685151] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:62:crtc-0] flip_done timed out
[20617.694549] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[20627.925351] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:62:crtc-0] flip_done timed out
[20638.165634] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CONNECTOR:80:DP-2] flip_done timed out
[20648.405154] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:55:plane-5] flip_done timed out
[20658.645157] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:61:plane-7] flip_done timed out
[20658.646471] ------------[ cut here ]------------
[20658.646473] WARNING: CPU: 10 PID: 999 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:7016 amdgpu_dm_atomic_commit_tail+0x2139/0x22b0 [amdgpu]
[20658.646474] Modules linked in: snd_seq_dummy snd_hrtimer snd_seq fuse joydev mousedev input_leds hid_generic usbhid hid uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 snd_usb_audio videobuf2_common videodev snd_usbmidi_lib snd_rawmidi snd_seq_device mc rfkill squashfs nls_iso8859_1 snd_hda_codec_realtek nls_cp437 vfat snd_hda_codec_generic fat ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg loop snd_hda_codec edac_mce_amd amd_energy snd_hda_core kvm_amd snd_hwdep kvm wmi_bmof snd_pcm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel snd_timer aesni_intel r8169 snd crypto_simd realtek cryptd ccp glue_helper sp5100_tco k10temp soundcore libphy i2c_piix4 rng_core pcspkr wmi evdev mac_hid pinctrl_amd gpio_amdpt acpi_cpufreq uinput sg crypto_user ip_tables x_tables xhci_pci xhci_pci_renesas xhci_hcd amdgpu gpu_sched i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc_core drm
[20658.646503] CPU: 10 PID: 999 Comm: Xorg Tainted: G   M    W         5.8.0-rc1-MANJARO+ #2
[20658.646504] Hardware name: Micro-Star International Co., Ltd. MS-7B79/X470 GAMING PLUS (MS-7B79), BIOS A.G0 11/11/2019
[20658.646505] RIP: 0010:amdgpu_dm_atomic_commit_tail+0x2139/0x22b0 [amdgpu]
[20658.646506] Code: 22 ef ff ff 41 8b 4c 24 60 48 c7 c2 20 bc 89 c0 bf 02 00 00 00 48 c7 c6 88 58 91 c0 e8 e0 6d d0 ff 49 8b 4f 08 e9 8f e0 ff ff <0f> 0b e9 0a f0 ff ff 0f 0b 0f 0b e9 21 f0 ff ff 48 8b 85 f0 fc ff
[20658.646506] RSP: 0018:ffffa6b781987948 EFLAGS: 00010002
[20658.646507] RAX: 0000000000000286 RBX: 0000000000000bfc RCX: 0000000000000000
[20658.646508] RDX: 0000000000000002 RSI: 0000000000000206 RDI: 0000000000000000
[20658.646509] RBP: ffffa6b781987cb8 R08: 0000000000000005 R09: 0000000000000000
[20658.646509] R10: ffffa6b7819878b0 R11: ffffa6b7819878b4 R12: 0000000000000286
[20658.646510] R13: ffff941201964000 R14: ffff9410db79c000 R15: ffff9410fcc69600
[20658.646511] FS:  00007f87fbe2be80(0000) GS:ffff94120ea80000(0000) knlGS:0000000000000000
[20658.646511] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[20658.646512] CR2: 00001a0ee45cb008 CR3: 0000000402700000 CR4: 00000000003406e0
[20658.646512] Call Trace:
[20658.646513]  commit_tail+0x94/0x130 [drm_kms_helper]
[20658.646514]  drm_atomic_helper_commit+0x113/0x140 [drm_kms_helper]
[20658.646514]  drm_mode_obj_set_property_ioctl+0x156/0x320 [drm]
[20658.646515]  ? drm_mode_obj_find_prop_id+0x40/0x40 [drm]
[20658.646515]  drm_ioctl_kernel+0xb2/0x100 [drm]
[20658.646516]  drm_ioctl+0x208/0x360 [drm]
[20658.646516]  ? drm_mode_obj_find_prop_id+0x40/0x40 [drm]
[20658.646517]  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[20658.646517]  ksys_ioctl+0x82/0xc0
[20658.646518]  __x64_sys_ioctl+0x16/0x20
[20658.646518]  do_syscall_64+0x44/0x70
[20658.646519]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[20658.646519] RIP: 0033:0x7f87fca658eb
[20658.646520] Code: Bad RIP value.
[20658.646520] RSP: 002b:00007ffc20a995c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[20658.646521] RAX: ffffffffffffffda RBX: 00007ffc20a99600 RCX: 00007f87fca658eb
[20658.646522] RDX: 00007ffc20a99600 RSI: 00000000c01864ba RDI: 000000000000000d
[20658.646523] RBP: 00000000c01864ba R08: 000000000000006c R09: 00000000cccccccc
[20658.646523] R10: 0000000000000fff R11: 0000000000000246 R12: 000055c87121db90
[20658.646524] R13: 000000000000000d R14: 0000000000000000 R15: 0000000000000003
[20658.646525] ---[ end trace 96f7cc95700c9635 ]---
[20658.646525] ------------[ cut here ]------------
[20658.646526] WARNING: CPU: 10 PID: 999 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:6613 amdgpu_dm_atomic_commit_tail+0x2142/0x22b0 [amdgpu]
[20658.646527] Modules linked in: snd_seq_dummy snd_hrtimer snd_seq fuse joydev mousedev input_leds hid_generic usbhid hid uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 snd_usb_audio videobuf2_common videodev snd_usbmidi_lib snd_rawmidi snd_seq_device mc rfkill squashfs nls_iso8859_1 snd_hda_codec_realtek nls_cp437 vfat snd_hda_codec_generic fat ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg loop snd_hda_codec edac_mce_amd amd_energy snd_hda_core kvm_amd snd_hwdep kvm wmi_bmof snd_pcm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel snd_timer aesni_intel r8169 snd crypto_simd realtek cryptd ccp glue_helper sp5100_tco k10temp soundcore libphy i2c_piix4 rng_core pcspkr wmi evdev mac_hid pinctrl_amd gpio_amdpt acpi_cpufreq uinput sg crypto_user ip_tables x_tables xhci_pci xhci_pci_renesas xhci_hcd amdgpu gpu_sched i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc_core drm
[20658.646556] CPU: 10 PID: 999 Comm: Xorg Tainted: G   M    W         5.8.0-rc1-MANJARO+ #2
[20658.646557] Hardware name: Micro-Star International Co., Ltd. MS-7B79/X470 GAMING PLUS (MS-7B79), BIOS A.G0 11/11/2019
[20658.646557] RIP: 0010:amdgpu_dm_atomic_commit_tail+0x2142/0x22b0 [amdgpu]
[20658.646558] Code: 48 c7 c2 20 bc 89 c0 bf 02 00 00 00 48 c7 c6 88 58 91 c0 e8 e0 6d d0 ff 49 8b 4f 08 e9 8f e0 ff ff 0f 0b e9 0a f0 ff ff 0f 0b <0f> 0b e9 21 f0 ff ff 48 8b 85 f0 fc ff ff 48 8d 8d 64 fd ff ff 48
[20658.646559] RSP: 0018:ffffa6b781987948 EFLAGS: 00010082
[20658.646560] RAX: 0000000000000001 RBX: 0000000000000bfc RCX: 0000000000000000
[20658.646561] RDX: 0000000000000002 RSI: 0000000000000206 RDI: 0000000000000000
[20658.646561] RBP: ffffa6b781987cb8 R08: 0000000000000005 R09: 0000000000000000
[20658.646562] R10: ffffa6b7819878b0 R11: ffffa6b7819878b4 R12: 0000000000000286
[20658.646563] R13: ffff941201964000 R14: ffff9410db79c000 R15: ffff9410fcc69600
[20658.646563] FS:  00007f87fbe2be80(0000) GS:ffff94120ea80000(0000) knlGS:0000000000000000
[20658.646564] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[20658.646564] CR2: 00001a0ee45cb008 CR3: 0000000402700000 CR4: 00000000003406e0
[20658.646565] Call Trace:
[20658.646565]  commit_tail+0x94/0x130 [drm_kms_helper]
[20658.646566]  drm_atomic_helper_commit+0x113/0x140 [drm_kms_helper]
[20658.646567]  drm_mode_obj_set_property_ioctl+0x156/0x320 [drm]
[20658.646567]  ? drm_mode_obj_find_prop_id+0x40/0x40 [drm]
[20658.646568]  drm_ioctl_kernel+0xb2/0x100 [drm]
[20658.646568]  drm_ioctl+0x208/0x360 [drm]
[20658.646569]  ? drm_mode_obj_find_prop_id+0x40/0x40 [drm]
[20658.646569]  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[20658.646570]  ksys_ioctl+0x82/0xc0
[20658.646570]  __x64_sys_ioctl+0x16/0x20
[20658.646571]  do_syscall_64+0x44/0x70
[20658.646571]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[20658.646572] RIP: 0033:0x7f87fca658eb
[20658.646572] Code: Bad RIP value.
[20658.646573] RSP: 002b:00007ffc20a995c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[20658.646574] RAX: ffffffffffffffda RBX: 00007ffc20a99600 RCX: 00007f87fca658eb
[20658.646574] RDX: 00007ffc20a99600 RSI: 00000000c01864ba RDI: 000000000000000d
[20658.646575] RBP: 00000000c01864ba R08: 000000000000006c R09: 00000000cccccccc
[20658.646576] R10: 0000000000000fff R11: 0000000000000246 R12: 000055c87121db90
[20658.646576] R13: 000000000000000d R14: 0000000000000000 R15: 0000000000000003
[20658.646577] ---[ end trace 96f7cc95700c9636 ]---
[20668.885142] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:62:crtc-0] flip_done timed out
[20684.245559] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:62:crtc-0] flip_done timed out
[20694.485139] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:61:plane-7] flip_done timed out
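
For anyone comparing logs across the reports in this bug, here is a minimal sketch (not part of the original report) that pulls the ring name and fence sequence numbers out of the amdgpu_job_timedout lines. The function name parse_ring_timeouts and the "pending" field are hypothetical. "pending" is simply emitted minus signaled, which I am assuming corresponds to the number of fences still outstanding when the timeout fired.

```python
import re

# Matches the hard-timeout lines printed by amdgpu_job_timedout, for example:
# "... *ERROR* ring gfx_0.0.0 timeout, signaled seq=2768656, emitted seq=2768658"
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def parse_ring_timeouts(log_text):
    """Return one dict per ring timeout line found in log_text.

    Hypothetical helper for triage only. Lines such as
    "ring gfx timeout, but soft recovered" carry no sequence numbers
    and are skipped because they do not match the pattern.
    """
    events = []
    for line in log_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match is None:
            continue
        signaled = int(match.group("signaled"))
        emitted = int(match.group("emitted"))
        events.append(
            {
                "ring": match.group("ring"),
                "signaled": signaled,
                "emitted": emitted,
                # Assumption: the gap is the count of unsignaled fences.
                "pending": emitted - signaled,
            }
        )
    return events
```

Fed the first timeout line of the log above, this returns a single entry for ring gfx_0.0.0 with pending equal to 2. It only summarizes what the kernel already printed and says nothing about the root cause.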
Comment 35 Randune 2020-08-10 23:49:21 UTC
I've been getting "ring gfx timeout" errors for some time, mostly when the computer has not had any input for a while (while I'm away from it). When it freezes I can still SSH into it, but when I run "shutdown -h now" it boots me out of SSH as it should, yet the computer never seems to actually shut down. The screen stays frozen with whatever was on the display when it froze. Any help would be greatly appreciated. Here is my info:

Motherboard: ASRock AB350 Pro4 (UEFI 5.80)
Video card: Sapphire Nitro+ RX580 (8GB)
Distro: Manjaro
Kernel: 5.7.9-1-MANJARO

Aug 09 21:33:06.054857 kernel: pcieport 0000:00:03.1: AER: Multiple Uncorrected (Non-Fatal) error received: 0000:00:00.0
Aug 09 21:33:06.068305 kernel: pcieport 0000:00:03.1: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver ID)
Aug 09 21:33:06.068636 kernel: pcieport 0000:00:03.1: AER:   device [1022:1453] error status/mask=00200000/00000000
Aug 09 21:33:06.068863 kernel: pcieport 0000:00:03.1: AER:    [21] ACSViol                (First)
Aug 09 21:33:06.069137 kernel: amdgpu 0000:0a:00.0: AER: can't recover (no error_detected callback)
Aug 09 21:33:06.069421 kernel: snd_hda_intel 0000:0a:00.1: AER: can't recover (no error_detected callback)
Aug 09 21:33:06.069633 kernel: pcieport 0000:00:03.1: AER: device recovery failed
Aug 09 21:33:16.258283 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=9087, emitted seq=9089
Aug 09 21:33:16.258412 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Aug 09 21:33:16.258446 kernel: amdgpu 0000:0a:00.0: GPU reset begin!
Aug 09 21:33:16.258741 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
Aug 09 21:33:16.258773 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.258803 kernel: amdgpu: [powerplay] 
                                failed to send message 261 ret is 65535 
Aug 09 21:33:16.258835 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.258869 kernel: amdgpu: [powerplay] 
                                failed to send message 261 ret is 65535 
Aug 09 21:33:16.258896 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.258925 kernel: amdgpu: [powerplay] 
                                failed to send message 261 ret is 65535 
Aug 09 21:33:16.258951 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.258977 kernel: amdgpu: [powerplay] 
                                failed to send message 261 ret is 65535 
Aug 09 21:33:16.259009 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.259035 kernel: amdgpu: [powerplay] 
                                failed to send message 261 ret is 65535 
Aug 09 21:33:16.259060 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.259084 kernel: amdgpu: [powerplay] 
                                failed to send message 261 ret is 65535 
Aug 09 21:33:16.259108 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.259131 kernel: amdgpu: [powerplay] 
                                failed to send message 261 ret is 65535 
Aug 09 21:33:16.259156 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.259186 kernel: amdgpu: [powerplay] 
                                failed to send message 261 ret is 65535 
Aug 09 21:33:16.259213 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.259242 kernel: amdgpu: [powerplay] 
                                failed to send message 261 ret is 65535 
Aug 09 21:33:16.259272 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.259298 kernel: amdgpu: [powerplay] 
                                failed to send message 261 ret is 65535 
Aug 09 21:33:16.259324 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.259350 kernel: amdgpu: [powerplay] 
                                failed to send message 261 ret is 65535 
Aug 09 21:33:16.259373 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.259400 kernel: amdgpu: [powerplay] 
                                failed to send message 261 ret is 65535 
Aug 09 21:33:16.259426 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.259456 kernel: amdgpu: [powerplay] 
                                failed to send message 261 ret is 65535 
Aug 09 21:33:16.259483 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.259509 kernel: amdgpu: [powerplay] 
                                failed to send message 261 ret is 65535 
Aug 09 21:33:16.259540 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.259566 kernel: amdgpu: [powerplay] 
                                failed to send message 261 ret is 65535 
Aug 09 21:33:16.259592 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.259617 kernel: amdgpu: [powerplay] 
                                failed to send message 261 ret is 65535 
Aug 09 21:33:16.259642 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.259671 kernel: amdgpu: [powerplay] 
                                failed to send message 261 ret is 65535 
Aug 09 21:33:16.259697 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.259723 kernel: amdgpu: [powerplay] 
                                failed to send message 306 ret is 65535 
Aug 09 21:33:16.259754 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.259785 kernel: amdgpu: [powerplay] 
                                failed to send message 5e ret is 65535 
Aug 09 21:33:16.259816 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.259860 kernel: amdgpu: [powerplay] 
                                failed to send message 145 ret is 65535 
Aug 09 21:33:16.259913 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.259947 kernel: amdgpu: [powerplay] 
                                failed to send message 146 ret is 65535 
Aug 09 21:33:16.259976 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.260003 kernel: amdgpu: [powerplay] 
                                failed to send message 148 ret is 65535 
Aug 09 21:33:16.260034 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.260061 kernel: amdgpu: [powerplay] 
                                failed to send message 145 ret is 65535 
Aug 09 21:33:16.260088 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.260114 kernel: amdgpu: [powerplay] 
                                failed to send message 146 ret is 65535 
Aug 09 21:33:16.291929 kernel: [drm] REG_WAIT timeout 10us * 3000 tries - dce110_stream_encoder_dp_blank line:955
Aug 09 21:33:16.292012 kernel: ------------[ cut here ]------------
Aug 09 21:33:16.292044 kernel: WARNING: CPU: 3 PID: 154 at drivers/gpu/drm/amd/amdgpu/../display/dc/dc_helper.c:526 generic_reg_wait.cold+0x26/0x2d [amdgpu]
Aug 09 21:33:16.292070 kernel: Modules linked in: snd_seq_dummy snd_hrtimer snd_seq fuse nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_mangle iptable_raw iptable_security nfnetlink ip6table_filter ip6_tables iptable_filter squashfs loop nls_iso8859_1 nls_cp437 vfat fat uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common snd_usb_audio videodev snd_usbmidi_lib snd_rawmidi snd_seq_device mc joydev mousedev input_leds wmi_bmof amdgpu snd_hda_codec_realtek snd_hda_codec_generic wl(POE) ledtrig_audio snd_hda_codec_hdmi snd_hda_intel gpu_sched i2c_algo_bit edac_mce_amd snd_intel_dspcfg ttm snd_hda_codec kvm_amd drm_kms_helper r8169 snd_hda_core kvm cfg80211 snd_hwdep snd_pcm cec realtek irqbypass rc_core snd_timer libphy syscopyarea snd rfkill sysfillrect k10temp
Aug 09 21:33:16.292112 kernel:  pcspkr sysimgblt sp5100_tco i2c_piix4 fb_sys_fops soundcore wmi evdev mac_hid gpio_amdpt pinctrl_amd acpi_cpufreq drm uinput sg crypto_user agpgart ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 dm_crypt dm_mod uas usb_storage hid_logitech ff_memless hid_generic usbhid hid crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper ccp xhci_pci mpt3sas rng_core xhci_hcd raid_class scsi_transport_sas
Aug 09 21:33:16.292141 kernel: CPU: 3 PID: 154 Comm: kworker/3:1 Tainted: P           OE     5.7.9-1-MANJARO #1
Aug 09 21:33:16.292164 kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./AB350 Pro4, BIOS P5.80 06/14/2019
Aug 09 21:33:16.292188 kernel: Workqueue: events drm_sched_job_timedout [gpu_sched]
Aug 09 21:33:16.292213 kernel: RIP: 0010:generic_reg_wait.cold+0x26/0x2d [amdgpu]
Aug 09 21:33:16.292240 kernel: Code: a7 41 fd ff 44 8b 44 24 24 48 8b 4c 24 18 89 ee 48 c7 c7 08 14 cd c1 8b 54 24 20 e8 7a 91 d2 f9 83 7b 20 01 0f 84 c3 52 fd ff <0f> 0b e9 bc 52 fd ff 48 c7 c7 fd 4c c8 c1 e8 f3 c2 12 fa e8 4a 29
Aug 09 21:33:16.292263 kernel: RSP: 0018:ffffab9b806c3610 EFLAGS: 00010297
Aug 09 21:33:16.292284 kernel: RAX: 0000000000000052 RBX: ffff92334ad7fa40 RCX: 0000000000000000
Aug 09 21:33:16.292306 kernel: RDX: 0000000000000000 RSI: ffff92334e8d9ac8 RDI: 00000000ffffffff
Aug 09 21:33:16.292335 kernel: RBP: 000000000000000a R08: 0000000000000561 R09: 0000000000000001
Aug 09 21:33:16.292356 kernel: R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
Aug 09 21:33:16.292376 kernel: R13: 0000000000010000 R14: 0000000000004ea4 R15: 0000000000000bb9
Aug 09 21:33:16.292398 kernel: FS:  0000000000000000(0000) GS:ffff92334e8c0000(0000) knlGS:0000000000000000
Aug 09 21:33:16.292421 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 09 21:33:16.292446 kernel: CR2: 00007f494fc04000 CR3: 00000003af1ce000 CR4: 00000000003406e0
Aug 09 21:33:16.292466 kernel: Call Trace:
Aug 09 21:33:16.292485 kernel:  dce110_stream_encoder_dp_blank+0xea/0x140 [amdgpu]
Aug 09 21:33:16.292507 kernel:  core_link_disable_stream+0x9c/0x200 [amdgpu]
Aug 09 21:33:16.292525 kernel:  dce110_reset_hw_ctx_wrap+0xbe/0x240 [amdgpu]
Aug 09 21:33:16.292543 kernel:  dce110_apply_ctx_to_hw+0x4f/0x570 [amdgpu]
Aug 09 21:33:16.292560 kernel:  ? hwmgr_handle_task+0x98/0xf0 [amdgpu]
Aug 09 21:33:16.292578 kernel:  ? pp_dpm_dispatch_tasks+0x45/0x60 [amdgpu]
Aug 09 21:33:16.292598 kernel:  ? dm_pp_apply_display_requirements+0x19e/0x1c0 [amdgpu]
Aug 09 21:33:16.292619 kernel:  dc_commit_state+0x323/0x970 [amdgpu]
Aug 09 21:33:16.292640 kernel:  amdgpu_dm_atomic_commit_tail+0x38c/0x2310 [amdgpu]
Aug 09 21:33:16.292662 kernel:  ? free_one_page+0x57/0xd0
Aug 09 21:33:16.292680 kernel:  ? kfree+0x219/0x250
Aug 09 21:33:16.292698 kernel:  ? bw_calcs+0xa30/0x4380 [amdgpu]
Aug 09 21:33:16.292718 kernel:  ? dc_validate_global_state+0x2f2/0x390 [amdgpu]
Aug 09 21:33:16.292736 kernel:  commit_tail+0x94/0x130 [drm_kms_helper]
Aug 09 21:33:16.292757 kernel:  drm_atomic_helper_commit+0x113/0x140 [drm_kms_helper]
Aug 09 21:33:16.292776 kernel:  drm_atomic_helper_disable_all+0x175/0x190 [drm_kms_helper]
Aug 09 21:33:16.292792 kernel:  drm_atomic_helper_suspend+0x78/0x150 [drm_kms_helper]
Aug 09 21:33:16.292810 kernel:  dm_suspend+0x1c/0x60 [amdgpu]
Aug 09 21:33:16.292869 kernel:  amdgpu_device_ip_suspend_phase1+0x83/0xe0 [amdgpu]
Aug 09 21:33:16.292889 kernel:  ? _raw_spin_lock+0x13/0x30
Aug 09 21:33:16.292908 kernel:  amdgpu_device_ip_suspend+0x1c/0x60 [amdgpu]
Aug 09 21:33:16.292926 kernel:  amdgpu_device_pre_asic_reset+0x16b/0x182 [amdgpu]
Aug 09 21:33:16.292944 kernel:  amdgpu_device_gpu_recover.cold+0x42a/0xc74 [amdgpu]
Aug 09 21:33:16.292962 kernel:  amdgpu_job_timedout+0x105/0x130 [amdgpu]
Aug 09 21:33:16.292981 kernel:  drm_sched_job_timedout+0x64/0xe0 [gpu_sched]
Aug 09 21:33:16.293001 kernel:  process_one_work+0x1da/0x3d0
Aug 09 21:33:16.293017 kernel:  worker_thread+0x4d/0x3e0
Aug 09 21:33:16.293036 kernel:  ? rescuer_thread+0x3f0/0x3f0
Aug 09 21:33:16.293057 kernel:  kthread+0x13e/0x160
Aug 09 21:33:16.293074 kernel:  ? __kthread_bind_mask+0x60/0x60
Aug 09 21:33:16.293097 kernel:  ret_from_fork+0x22/0x40
Aug 09 21:33:16.293123 kernel: ---[ end trace aa4b924a702f7188 ]---
Aug 09 21:33:26.298272 kernel: [drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for more than 10secs aborting
Aug 09 21:33:26.298425 kernel: [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing DB6E (len 824, WS 0, PS 0) @ 0xDCEE
Aug 09 21:33:26.298470 kernel: [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing DA28 (len 326, WS 0, PS 0) @ 0xDB18
Aug 09 21:33:26.298505 kernel: [drm:dce110_link_encoder_disable_output [amdgpu]] *ERROR* dce110_link_encoder_disable_output: Failed to execute VBIOS command table!
Aug 09 21:33:26.298535 kernel: ------------[ cut here ]------------
Aug 09 21:33:26.298571 kernel: WARNING: CPU: 3 PID: 154 at drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_link_encoder.c:1099 dce110_link_encoder_disable_output+0x141/0x150 [amdgpu]
Aug 09 21:33:26.298607 kernel: Modules linked in: snd_seq_dummy snd_hrtimer snd_seq fuse nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_mangle iptable_raw iptable_security nfnetlink ip6table_filter ip6_tables iptable_filter squashfs loop nls_iso8859_1 nls_cp437 vfat fat uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common snd_usb_audio videodev snd_usbmidi_lib snd_rawmidi snd_seq_device mc joydev mousedev input_leds wmi_bmof amdgpu snd_hda_codec_realtek snd_hda_codec_generic wl(POE) ledtrig_audio snd_hda_codec_hdmi snd_hda_intel gpu_sched i2c_algo_bit edac_mce_amd snd_intel_dspcfg ttm snd_hda_codec kvm_amd drm_kms_helper r8169 snd_hda_core kvm cfg80211 snd_hwdep snd_pcm cec realtek irqbypass rc_core snd_timer libphy syscopyarea snd rfkill sysfillrect k10temp
Aug 09 21:33:26.298656 kernel:  pcspkr sysimgblt sp5100_tco i2c_piix4 fb_sys_fops soundcore wmi evdev mac_hid gpio_amdpt pinctrl_amd acpi_cpufreq drm uinput sg crypto_user agpgart ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 dm_crypt dm_mod uas usb_storage hid_logitech ff_memless hid_generic usbhid hid crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper ccp xhci_pci mpt3sas rng_core xhci_hcd raid_class scsi_transport_sas
Aug 09 21:33:26.298691 kernel: CPU: 3 PID: 154 Comm: kworker/3:1 Tainted: P        W  OE     5.7.9-1-MANJARO #1
Aug 09 21:33:26.298722 kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./AB350 Pro4, BIOS P5.80 06/14/2019
Aug 09 21:33:26.298753 kernel: Workqueue: events drm_sched_job_timedout [gpu_sched]
Aug 09 21:33:26.298783 kernel: RIP: 0010:dce110_link_encoder_disable_output+0x141/0x150 [amdgpu]
Aug 09 21:33:26.298811 kernel: Code: 44 24 38 65 48 2b 04 25 28 00 00 00 75 20 48 83 c4 40 5b 5d 41 5c c3 48 c7 c6 60 4a c4 c1 48 c7 c7 30 f2 cb c1 e8 4f 2c bd fe <0f> 0b eb d0 e8 76 01 db f9 66 0f 1f 44 00 00 0f 1f 44 00 00 41 57
Aug 09 21:33:26.298840 kernel: RSP: 0018:ffffab9b806c3600 EFLAGS: 00010246
Aug 09 21:33:26.298865 kernel: RAX: 0000000000000000 RBX: 0000000000000020 RCX: 0000000000000000
Aug 09 21:33:26.298896 kernel: RDX: 0000000000000000 RSI: 0000000000000086 RDI: 00000000ffffffff
Aug 09 21:33:26.298926 kernel: RBP: ffff923349574720 R08: 0000000000000598 R09: 0000000000000001
Aug 09 21:33:26.298954 kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffffab9b806c3604
Aug 09 21:33:26.298979 kernel: R13: 0000000000000000 R14: ffff923251500000 R15: ffff92334c016900
Aug 09 21:33:26.299004 kernel: FS:  0000000000000000(0000) GS:ffff92334e8c0000(0000) knlGS:0000000000000000
Aug 09 21:33:26.299032 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 09 21:33:26.299059 kernel: CR2: 00007f494fc04000 CR3: 000000038dd62000 CR4: 00000000003406e0
Aug 09 21:33:26.299087 kernel: Call Trace:
Aug 09 21:33:26.299111 kernel:  dp_disable_link_phy+0x83/0x150 [amdgpu]
Aug 09 21:33:26.299142 kernel:  disable_link+0x4f/0xa0 [amdgpu]
Aug 09 21:33:26.299170 kernel:  core_link_disable_stream+0xd6/0x200 [amdgpu]
Aug 09 21:33:26.299203 kernel:  dce110_reset_hw_ctx_wrap+0xbe/0x240 [amdgpu]
Aug 09 21:33:26.299231 kernel:  dce110_apply_ctx_to_hw+0x4f/0x570 [amdgpu]
Aug 09 21:33:26.299258 kernel:  ? hwmgr_handle_task+0x98/0xf0 [amdgpu]
Aug 09 21:33:26.299283 kernel:  ? pp_dpm_dispatch_tasks+0x45/0x60 [amdgpu]
Aug 09 21:33:26.299309 kernel:  ? dm_pp_apply_display_requirements+0x19e/0x1c0 [amdgpu]
Aug 09 21:33:26.299361 kernel:  dc_commit_state+0x323/0x970 [amdgpu]
Aug 09 21:33:26.299392 kernel:  amdgpu_dm_atomic_commit_tail+0x38c/0x2310 [amdgpu]
Aug 09 21:33:26.299421 kernel:  ? free_one_page+0x57/0xd0
Aug 09 21:33:26.299448 kernel:  ? kfree+0x219/0x250
Aug 09 21:33:26.299476 kernel:  ? bw_calcs+0xa30/0x4380 [amdgpu]
Aug 09 21:33:26.299502 kernel:  ? dc_validate_global_state+0x2f2/0x390 [amdgpu]
Aug 09 21:33:26.299532 kernel:  commit_tail+0x94/0x130 [drm_kms_helper]
Aug 09 21:33:26.299555 kernel:  drm_atomic_helper_commit+0x113/0x140 [drm_kms_helper]
Aug 09 21:33:26.299581 kernel:  drm_atomic_helper_disable_all+0x175/0x190 [drm_kms_helper]
Aug 09 21:33:26.299606 kernel:  drm_atomic_helper_suspend+0x78/0x150 [drm_kms_helper]
Aug 09 21:33:26.299633 kernel:  dm_suspend+0x1c/0x60 [amdgpu]
Aug 09 21:33:26.299660 kernel:  amdgpu_device_ip_suspend_phase1+0x83/0xe0 [amdgpu]
Aug 09 21:33:26.299685 kernel:  ? _raw_spin_lock+0x13/0x30
Aug 09 21:33:26.299710 kernel:  amdgpu_device_ip_suspend+0x1c/0x60 [amdgpu]
Aug 09 21:33:26.299736 kernel:  amdgpu_device_pre_asic_reset+0x16b/0x182 [amdgpu]
Aug 09 21:33:26.299761 kernel:  amdgpu_device_gpu_recover.cold+0x42a/0xc74 [amdgpu]
Aug 09 21:33:26.299787 kernel:  amdgpu_job_timedout+0x105/0x130 [amdgpu]
Aug 09 21:33:26.299818 kernel:  drm_sched_job_timedout+0x64/0xe0 [gpu_sched]
Aug 09 21:33:26.299844 kernel:  process_one_work+0x1da/0x3d0
Aug 09 21:33:26.299872 kernel:  worker_thread+0x4d/0x3e0
Aug 09 21:33:26.299898 kernel:  ? rescuer_thread+0x3f0/0x3f0
Aug 09 21:33:26.299925 kernel:  kthread+0x13e/0x160
Aug 09 21:33:26.299951 kernel:  ? __kthread_bind_mask+0x60/0x60
Aug 09 21:33:26.299979 kernel:  ret_from_fork+0x22/0x40
Aug 09 21:33:26.300004 kernel: ---[ end trace aa4b924a702f7189 ]---
Aug 09 21:33:36.301609 kernel: [drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for more than 10secs aborting
Aug 09 21:33:36.301729 kernel: [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing C51A (len 62, WS 0, PS 0) @ 0xC536
Aug 09 21:33:36.334815 kernel: [drm] REG_WAIT timeout 10us * 3000 tries - dce110_stream_encoder_dp_blank line:955
Aug 09 21:33:46.338270 kernel: [drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for more than 10secs aborting
Aug 09 21:33:46.338400 kernel: [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing DB6E (len 824, WS 0, PS 0) @ 0xDCEE
Aug 09 21:33:46.338434 kernel: [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing DA28 (len 326, WS 0, PS 0) @ 0xDB18
Aug 09 21:33:46.338466 kernel: [drm:dce110_link_encoder_disable_output [amdgpu]] *ERROR* dce110_link_encoder_disable_output: Failed to execute VBIOS command table!
Aug 09 21:33:56.339196 plasmashell[1403]: qrc:/plasma/plasmoids/org.kde.plasma.volume/contents/ui/ListItemBase.qml:151: TypeError: Cannot read property 'ports' of undefined
Aug 09 21:33:56.346378 kernel: [drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for more than 10secs aborting
Aug 09 21:33:56.346481 kernel: [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing C51A (len 62, WS 0, PS 0) @ 0xC536
Aug 09 21:33:56.346519 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:56.346572 kernel: amdgpu: [powerplay] 
                                failed to send message 148 ret is 65535 
Aug 09 21:33:56.346606 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:56.346632 kernel: amdgpu: [powerplay] 
                                failed to send message 145 ret is 65535 
Aug 09 21:33:56.346659 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:56.346692 kernel: amdgpu: [powerplay] 
                                failed to send message 146 ret is 65535 
Aug 09 21:33:56.345571 plasmashell[1403]: qrc:/plasma/plasmoids/org.kde.plasma.volume/contents/ui/main.qml:550:39: QML DeviceListItem: Binding loop detected for property "width"
Aug 09 21:33:56.591481 kernel: [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <vce_v3_0> failed -110
Aug 09 21:33:57.054823 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:57.054914 kernel: amdgpu: [powerplay] 
                                failed to send message 133 ret is 65535 
Aug 09 21:33:57.054952 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:57.054971 kernel: amdgpu: [powerplay] 
                                failed to send message 306 ret is 65535 
Aug 09 21:33:57.054990 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:57.055010 kernel: amdgpu: [powerplay] 
                                failed to send message 5e ret is 65535 
Aug 09 21:33:57.055027 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:57.055047 kernel: amdgpu: [powerplay] 
                                failed to send message 145 ret is 65535 
Aug 09 21:33:57.055064 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:57.055080 kernel: amdgpu: [powerplay] 
                                failed to send message 146 ret is 65535 
Aug 09 21:33:57.055097 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:57.055113 kernel: amdgpu: [powerplay] 
                                failed to send message 148 ret is 65535 
Aug 09 21:33:57.055134 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:57.055151 kernel: amdgpu: [powerplay] 
                                failed to send message 145 ret is 65535 
Aug 09 21:33:57.055165 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:57.055180 kernel: amdgpu: [powerplay] 
                                failed to send message 146 ret is 65535 
Aug 09 21:33:57.055195 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:57.055208 kernel: amdgpu: [powerplay] 
                                failed to send message 16a ret is 65535 
Aug 09 21:33:57.055225 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:57.055238 kernel: amdgpu: [powerplay] 
                                failed to send message 186 ret is 65535 
Aug 09 21:33:57.055253 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:57.055267 kernel: amdgpu: [powerplay] 
                                failed to send message 54 ret is 65535 
Aug 09 21:33:57.558146 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:57.558240 kernel: amdgpu: [powerplay] 
                                failed to send message 26b ret is 65535 
Aug 09 21:33:57.558260 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:57.558279 kernel: amdgpu: [powerplay] 
                                failed to send message 13d ret is 65535 
Aug 09 21:33:57.558297 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:57.558313 kernel: amdgpu: [powerplay] 
                                failed to send message 14f ret is 65535 
Aug 09 21:33:57.558329 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:57.558342 kernel: amdgpu: [powerplay] 
                                failed to send message 151 ret is 65535 
Aug 09 21:33:57.558356 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:57.558369 kernel: amdgpu: [powerplay] 
                                failed to send message 135 ret is 65535 
Aug 09 21:33:57.558384 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:57.558398 kernel: amdgpu: [powerplay] 
                                failed to send message 190 ret is 65535 
Aug 09 21:33:57.558415 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:57.558428 kernel: amdgpu: [powerplay] 
                                failed to send message 63 ret is 65535 
Aug 09 21:33:57.558442 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:57.558454 kernel: amdgpu: [powerplay] 
                                failed to send message 84 ret is 65535 
Aug 09 21:33:57.558468 kernel: amdgpu: [powerplay] Failed to force to switch arbf0!
Aug 09 21:33:57.558485 kernel: amdgpu: [powerplay] [disable_dpm_tasks] Failed to disable DPM!
Aug 09 21:33:57.558502 kernel: [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <powerplay> failed -22
Aug 09 21:33:57.811494 kernel: amdgpu 0000:0a:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Aug 09 21:33:57.811816 kernel: [drm:gfx_v8_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
Aug 09 21:33:58.314928 kernel: cp is busy, skip halt cp
Aug 09 21:33:58.564823 kernel: rlc is busy, skip halt rlc
Aug 09 21:33:58.818145 kernel: amdgpu 0000:0a:00.0: GPU BACO reset
Aug 09 21:34:59.601512 kernel: [drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for more than 10secs aborting
Aug 09 21:34:59.601664 kernel: [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing C51A (len 62, WS 0, PS 0) @ 0xC536
Aug 09 21:34:59.601700 kernel: [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing ADA0 (len 142, WS 0, PS 8) @ 0xADBB
Aug 09 21:34:59.601732 kernel: [drm] asic atom init failed!
Aug 09 21:34:59.601767 kernel: amdgpu 0000:0a:00.0: GPU reset succeeded, trying to resume
Aug 09 21:34:59.851491 kernel: amdgpu 0000:0a:00.0: Wait for MC idle timedout !
Aug 09 21:35:00.101588 kernel: amdgpu 0000:0a:00.0: Wait for MC idle timedout !
Aug 09 21:35:00.104823 kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F4007E9000).
Aug 09 21:35:00.104893 kernel: [drm] VRAM is lost due to GPU reset!
Aug 09 21:35:00.121493 kernel: amdgpu: [powerplay] Failed to send Message.
Aug 09 21:35:00.121580 kernel: amdgpu: [powerplay] SMC address must be 4 byte aligned.
Aug 09 21:35:00.121616 kernel: amdgpu: [powerplay] [AVFS][Polaris10_SetupGfxLvlStruct] Problems copying VRConfig value over to SMC
Aug 09 21:35:00.121645 kernel: amdgpu: [powerplay] [AVFS][Polaris10_AVFSEventMgr] Could not Copy Graphics Level table over to SMU
Aug 09 21:35:00.121672 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:35:00.121706 kernel: amdgpu: [powerplay] 
                                failed to send message 252 ret is 65535 
Aug 09 21:35:00.121740 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:35:00.121767 kernel: amdgpu: [powerplay] 
                                failed to send message 253 ret is 65535 
Aug 09 21:35:00.121796 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:35:00.121822 kernel: amdgpu: [powerplay] 
                                failed to send message 250 ret is 65535 
Aug 09 21:35:00.121853 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:35:00.121879 kernel: amdgpu: [powerplay] 
                                failed to send message 251 ret is 65535 
Aug 09 21:35:00.121911 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:35:00.121940 kernel: amdgpu: [powerplay] 
                                failed to send message 254 ret is 65535 
Aug 09 21:35:00.374824 kernel: [drm] Timeout wait for RLC serdes 0,0
Aug 09 21:35:00.624828 kernel: amdgpu 0000:0a:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring gfx test failed (-110)
Aug 09 21:35:00.625100 kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <gfx_v8_0> failed -110
Aug 09 21:35:00.625130 kernel: [drm] Skip scheduling IBs!
Aug 09 21:35:00.625152 kernel: [drm] Skip scheduling IBs!
Aug 09 21:35:00.625166 kernel: [drm] Skip scheduling IBs!
Aug 09 21:35:00.625180 kernel: amdgpu 0000:0a:00.0: GPU reset(2) failed
Aug 09 21:35:00.625307 kernel: [drm] Skip scheduling IBs!
Aug 09 21:35:00.625320 kernel: amdgpu 0000:0a:00.0: GPU reset end with ret = -110
Aug 09 21:35:10.818142 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=9089, emitted seq=9089
Aug 09 21:35:10.818255 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Aug 09 21:35:10.818280 kernel: amdgpu 0000:0a:00.0: GPU reset begin!
Comment 36 Evgeniy A. Dushistov 2020-09-01 14:00:26 UTC
I hit the same problem today with Linux kernel 5.4.61 (amd64) and a Radeon RX 560:

[86631.543134] [drm] Fence fallback timer expired on ring gfx
[86642.133543] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=1349762, emitted seq=1349767
[86642.133628] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 8032 thread Xorg:cs0 pid 8155
[86642.133634] amdgpu 0000:41:00.0: GPU reset begin!
[86642.134073] amdgpu: [powerplay] 
                last message was failed ret is 65535
[86642.134075] amdgpu: [powerplay] 
                failed to send message 281 ret is 65535 

I have never seen a similar problem before.
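
For anyone comparing logs across the reports here, below is a minimal sketch (not part of any driver tooling) that pulls the "ring ... timeout" lines out of a saved dmesg or journal dump. The function name `parse_ring_timeouts` and the `pending` field are my own, chosen for illustration. It assumes the line format shown in the logs above; `pending` is simply emitted minus signaled, which I read as the number of fences submitted but not yet signaled when the timeout fired.

```python
import re

# Line format assumed from the amdgpu_job_timedout messages quoted in this report.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def parse_ring_timeouts(log_text):
    """Return one dict per ring-timeout line found in log_text (hypothetical helper)."""
    results = []
    for match in TIMEOUT_RE.finditer(log_text):
        signaled = int(match.group("signaled"))
        emitted = int(match.group("emitted"))
        results.append(
            {
                "ring": match.group("ring"),
                "signaled": signaled,
                "emitted": emitted,
                # Assumption: the difference is the count of outstanding fences.
                "pending": emitted - signaled,
            }
        )
    return results
```

Calling `parse_ring_timeouts()` on the text of a saved log returns a list that can be compared between kernels or cards, for example to see whether the hang always hits the gfx ring.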
Comment 37 juan.zenos 2020-09-13 11:14:47 UTC
I have this problem with two different brand-new RX 580s, in a brand-new ASUS Prime-P X570 and an old ASUS P9X79, with various Ubuntu 20.04 kernels 5.4.x - 5.8.x - ...

I wanted to play these games on Linux so badly; the heartbreaking solution is to purchase a Windows license... ;_;