Bug 202263

Summary: AMDGPU: DRM couldn't schedule ib on ring (-22)
Product: Drivers
Reporter: Emre Işık (e.isik27)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: RESOLVED CODE_FIX
Severity: high
CC: alexdeucher, e.isik27
Priority: P1
Hardware: Intel
OS: Linux
Kernel Version: 4.19.12-1-default
Subsystem:
Regression: No
Bisected commit-id:
Attachments: dmesg.log, Xorg.0.log, possible fix, after_patch_dmesg.log

Description Emre Işık 2019-01-14 15:17:10 UTC
Hello everyone!

I am having trouble with the amdgpu driver.

I get kernel errors when I force the use of the discrete GPU.
My system has hybrid graphics (Lenovo laptop: Intel HD and AMD Radeon 530).

First, I cleared the dmesg log.

Then I executed:

DRI_PRIME=1 glxinfo | grep OpenGL

and then I got these errors:

https://pastebin.com/iaejbQ2t (for full dmesg)

[  422.628116] amdgpu 0000:01:00.0: couldn't schedule ib on ring <sdma0>
[  422.628146] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[  422.628147] amdgpu 0000:01:00.0: couldn't schedule ib on ring <sdma0>
[  422.628182] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[  422.628185] amdgpu 0000:01:00.0: couldn't schedule ib on ring <sdma0>
[  422.628212] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[  422.628214] amdgpu 0000:01:00.0: couldn't schedule ib on ring <sdma0>
[  422.628241] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[  422.628243] amdgpu 0000:01:00.0: couldn't schedule ib on ring <sdma0>
[  422.628271] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[  422.628273] amdgpu 0000:01:00.0: couldn't schedule ib on ring <sdma0>
[  422.628300] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[  422.628301] amdgpu 0000:01:00.0: couldn't schedule ib on ring <sdma0>
[  422.628328] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[  422.628330] amdgpu 0000:01:00.0: couldn't schedule ib on ring <sdma0>
[  422.628357] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[  422.629694] amdgpu 0000:01:00.0: couldn't schedule ib on ring <sdma0>
[  422.629743] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)

But when I set DRI_PRIME to 0, these errors do not appear.
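
For anyone reading these logs: the (-22) in the lines above is a negative errno value, and errno 22 is EINVAL on Linux. Below is a minimal Python sketch for counting such lines in a saved dmesg dump and decoding the code. The function name `summarize_ib_errors` and the regular expressions are illustrative assumptions built from the two message formats quoted above; nothing here is taken from the amdgpu driver itself.

```python
import errno
import re
from collections import Counter

# Matches lines such as:
# [  422.628116] amdgpu 0000:01:00.0: couldn't schedule ib on ring <sdma0>
RING_RE = re.compile(r"couldn't schedule ib on ring <(?P<ring>[^>]+)>")

# Matches lines such as:
# [  422.628146] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
CODE_RE = re.compile(r"Error scheduling IBs \((?P<code>-?\d+)\)")


def summarize_ib_errors(dmesg_text):
    """Count failed IB submissions per ring and per symbolic errno name."""
    rings = Counter()
    codes = Counter()
    for line in dmesg_text.splitlines():
        ring_match = RING_RE.search(line)
        if ring_match:
            rings[ring_match.group("ring")] += 1
            continue
        code_match = CODE_RE.search(line)
        if code_match:
            value = int(code_match.group("code"))
            # The kernel reports -errno; look up the name of the positive value.
            codes[errno.errorcode.get(abs(value), "UNKNOWN")] += 1
    return rings, codes


# Four lines copied from the log excerpt in this report.
sample = """\
[  422.628116] amdgpu 0000:01:00.0: couldn't schedule ib on ring <sdma0>
[  422.628146] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[  422.628147] amdgpu 0000:01:00.0: couldn't schedule ib on ring <sdma0>
[  422.628182] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
"""

rings, codes = summarize_ib_errors(sample)
print(dict(rings), dict(codes))
```

To use it on a real log, read the saved file into a string and pass it in, for example `summarize_ib_errors(open("dmesg.log").read())`.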

I also tried with DRI_PRIME=1 glxgears.

There I get tons of errors like these:
[  422.628301] amdgpu 0000:01:00.0: couldn't schedule ib on ring <sdma0>
[  422.628328] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)

I am using the latest version of openSUSE Tumbleweed.
Kernel: 4.19.12-1-default

amdgpu is installed, and so is kernel-firmware.

Thanks!!
Comment 1 Alex Deucher 2019-01-14 15:24:46 UTC
Please attach your full dmesg output from boot (so it doesn't get lost when the pastebin goes away) and Xorg log (if using X).
Comment 2 Emre Işık 2019-01-15 07:07:16 UTC
Created attachment 280479 [details]
dmesg.log
Comment 3 Emre Işık 2019-01-15 07:07:49 UTC
Created attachment 280481 [details]
Xorg.0.log
Comment 4 Emre Işık 2019-01-15 07:08:48 UTC
(In reply to Alex Deucher from comment #1)
> Please attach your full dmesg output from boot (so it doesn't get lost when
> the pastebin goes away) and Xorg log (if using X).

Okay, I have uploaded both log files here.
Thanks for the fast reply!
Comment 5 Emre Işık 2019-01-15 08:30:58 UTC
I found this line in my dmesg logs:

> [drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for more than
> 5secs aborting

So I searched and found some threads about problems with TLP and AMDGPU.
People there said they had removed TLP, so I removed TLP from my laptop (just for testing), and the errors went away.
It worked, but TLP should not have to be removed, because it is responsible for power management. Should I just install "tuned" or another TLP-like program, or is this error a kernel issue?

Thanks, again!
Comment 6 Alex Deucher 2019-01-15 17:11:44 UTC
Created attachment 280505 [details]
possible fix

Does this patch fix the issue?
Comment 7 Emre Işık 2019-01-16 08:41:10 UTC
Created attachment 280531 [details]
after_patch_dmesg.log

Thanks Alex for your support!

I recompiled kernel 4.19.12 with your patch,
reinstalled TLP, and rebooted the system.

When I execute DRI_PRIME=1 glxgears, everything works!
glxinfo says that I am using the AMD GPU.
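
One way to confirm which GPU is rendering is to read the "OpenGL renderer string" line that glxinfo prints. Here is a rough Python sketch, assuming glxinfo is on the PATH. The helper names `renderer_from_glxinfo` and `query_renderer` and the sample output text are made up for illustration; real renderer strings depend on the hardware and Mesa version.

```python
import os
import subprocess


def renderer_from_glxinfo(output):
    """Return the value of the 'OpenGL renderer string' line, or None."""
    prefix = "OpenGL renderer string:"
    for line in output.splitlines():
        if line.startswith(prefix):
            return line[len(prefix):].strip()
    return None


def query_renderer(dri_prime):
    """Run glxinfo with DRI_PRIME set and return the reported renderer.

    Assumes glxinfo is installed; raises CalledProcessError if it fails.
    """
    env = dict(os.environ, DRI_PRIME=str(dri_prime))
    result = subprocess.run(
        ["glxinfo"], env=env, capture_output=True, text=True, check=True
    )
    return renderer_from_glxinfo(result.stdout)


# Hypothetical glxinfo output, used only to exercise the parser.
sample = """\
OpenGL vendor string: Example Vendor
OpenGL renderer string: Example AMD GPU (hypothetical)
OpenGL version string: 0.0 (hypothetical)
"""

print(renderer_from_glxinfo(sample))
```

On a hybrid system, comparing `query_renderer(0)` with `query_renderer(1)` should show whether DRI_PRIME actually switches the renderer.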

And dmesg no longer shows any errors.
See attachment.

Thanks for the patch.
Are you going to commit this patch to the official kernel?
Maybe others have the same error as me.

Thanks, again!
Comment 8 Alex Deucher 2019-01-16 16:02:57 UTC
(In reply to Emre Işık from comment #7)
> Thanks for the patch.
> Are you going to commit this patch to the official kernel?
> Maybe others have the same error as me.

Yes, I'll make sure the patches get upstream and into stable kernels.  Thanks for the report and testing.
Comment 9 Emre Işık 2019-01-16 16:12:55 UTC
Thank you, too, for writing the patch so quickly.
It has been a great pleasure to cooperate with you, Alex Deucher.

Helping other people is a great thing.

I wish you a nice day!

Best regards,
Emre Isik