Bug 191281

Summary: [drm:amdgpu_ib_ring_tests] *ERROR* amdgpu: failed testing IB on ring 12 (-110)
Product: Drivers
Reporter: Johannes Hirte (johannes.hirte)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: RESOLVED CODE_FIX
Severity: normal
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 4.10-rc1
Subsystem:
Regression: No
Bisected commit-id:
Attachments: full dmesg output
             dmesg output with RX460

Description Johannes Hirte 2016-12-27 21:33:48 UTC
With kernel 4.10-rc1 I get the following error on Carrizo:

[    5.414764] Console: switching to colour frame buffer device 240x67
[    5.419628] amdgpu 0000:00:01.0: fb0: amdgpudrmfb frame buffer device
[    5.426001] [drm] ib test on ring 0 succeeded
[    5.426315] [drm] ib test on ring 1 succeeded
[    5.426384] [drm] ib test on ring 2 succeeded
[    5.426426] [drm] ib test on ring 3 succeeded
[    5.426464] [drm] ib test on ring 4 succeeded
[    5.426506] [drm] ib test on ring 5 succeeded
[    5.426545] [drm] ib test on ring 6 succeeded
[    5.426583] [drm] ib test on ring 7 succeeded
[    5.426623] [drm] ib test on ring 8 succeeded
[    5.426657] [drm] ib test on ring 9 succeeded
[    5.426688] [drm] ib test on ring 10 succeeded
[    6.453373] [drm] ib test on ring 11 succeeded
[    7.688045] [drm:amdgpu_vce_ring_test_ib] *ERROR* amdgpu: IB test timed out.
[    7.688088] [drm:amdgpu_ib_ring_tests] *ERROR* amdgpu: failed testing IB on ring 12 (-110).
[    7.688122] [drm:amdgpu_device_init] *ERROR* ib ring test failed (-110).
[    7.688268] [ powerplay ] min_core_set_clock not set
[    8.397417] [drm] Initialized amdgpu 3.9.0 20150101 for 0000:00:01.0 on minor 0

Bisecting was a pain in the ass this time because of three other bugs, but I was able to track this down to:

commit ecc2cf7cc8baa1fdb73a7bb9495f6befbcac8cd8
Author: Maruthi Srinivas Bayyavarapu <Maruthi.Bayyavarapu@amd.com>
Date:   Thu Nov 17 17:29:50 2016 +0530

    drm/amdgpu: enable VCE clockgating in Polaris-10/11
    
    VCE clocks are set to be disabled, when not in use.
    
    Signed-off-by: Maruthi Bayyavarapu <maruthi.bayyavarapu@amd.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


Simply reverting this commit wasn't possible due to conflicts, but reverting only this part

diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c b/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
index 39f03f137a56..6b3293a1c7b8 100644
--- a/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
@@ -134,7 +134,7 @@ static void vce_v3_0_set_vce_sw_clock_gating(struct amdgpu_device *adev,
           accessible but the firmware will throttle the clocks on the
           fly as necessary.
        */
-       if (gated) {
+       if (!gated) {
                data = RREG32(mmVCE_CLOCK_GATING_B);
                data |= 0x1ff;
                data &= ~0xef0000;

made my system boot as before.
Comment 1 Johannes Hirte 2016-12-27 21:35:12 UTC
Created attachment 248741
full dmesg output
Comment 2 fin4478 2017-01-07 13:15:18 UTC
Created attachment 250711
dmesg output with RX460

I get the same errors with a Gigabyte RX460 and a kernel from ~agd5f/linux/log/drivers/gpu/drm/amd?h=drm-next-4.10-wip that I cloned today. The computer seems to work normally, but booting is 3 seconds slower because of this and some CPU firmware bug traces.
Comment 3 Johannes Hirte 2017-01-08 21:29:40 UTC
With amdgpu.dpm=0 this doesn't occur. I also tested amdgpu.powerplay=0, but it didn't help. I don't know what the values applied in vce_v3_0_set_vce_sw_clock_gating() mean, but simply inverting the "if (gated)" check looks wrong to me.
Comment 4 Johannes Hirte 2017-01-10 19:14:11 UTC
I can confirm that https://lists.freedesktop.org/archives/amd-gfx/2017-January/004537.html fixes the boot problem for me. Tested on top of linux-4.10.0-rc3-00029-gbd5d7428f5e5.
Comment 5 Johannes Hirte 2017-06-21 18:57:49 UTC
fixed -> closing