Bug 209457 - AMDGPU resume fail with RX 580 GPU
Summary: AMDGPU resume fail with RX 580 GPU
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-10-01 17:15 UTC by Robert M. Muncrief
Modified: 2021-12-17 12:14 UTC
CC List: 12 users

See Also:
Kernel Version: 5.8.12 to 5.10-rc1
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg resume fail output with kernel 5.9.0-rc6 (177.08 KB, text/plain)
2020-10-01 17:15 UTC, Robert M. Muncrief

Resume failure, full dmesg output from kernel 5.8.5 (89.47 KB, text/plain)
2020-10-01 18:36 UTC, Robert M. Muncrief

Full dmesg resume fail output for kernel 5.8.12 (195.11 KB, text/plain)
2020-10-02 01:25 UTC, Robert M. Muncrief

Full Xorg resume fail output for kernel 5.8.12 (52.87 KB, text/plain)
2020-10-02 01:26 UTC, Robert M. Muncrief

dmesg output when booting with kernel 5.9, suspending, then resuming (137.32 KB, text/plain)
2020-11-11 16:54 UTC, fawz

Kernel crash log for kernel 5.10.x (23.52 KB, text/plain)
2021-06-16 20:57 UTC, Leandro Jacques

amdgpu crash log for kernel 5.4.126 (99.29 KB, text/plain)
2021-06-18 18:22 UTC, Leandro Jacques

Linux Firmware version info (1.09 KB, text/plain)
2021-06-22 17:16 UTC, Leandro Jacques

Linux Firmware version info 20210511.7685cf4 (12.88 KB, text/plain)
2021-07-14 14:51 UTC, Leandro Jacques

Linux Firmware version info 20210511.7685cf4 (1.34 KB, text/plain)
2021-07-14 14:54 UTC, Leandro Jacques

Kernel crash log for linux firmware version 20210511.7685cf4 (12.88 KB, text/plain)
2021-07-14 14:55 UTC, Leandro Jacques

Description Robert M. Muncrief 2020-10-01 17:15:29 UTC
Created attachment 292739
dmesg resume fail output with kernel 5.9.0-rc6

I've been having random resume problems since around kernel 5.5, and they persist up to 5.9-rc6. When this occurs I can still log in via SSH and issue a reboot command, but although SSH disconnects, my computer doesn't reboot and I have to press the reset button.
  
I have an ASUS Gaming TUF X570 motherboard, R7 3700X CPU, RX 580 GPU, and 16GB of RAM.  
  
The primary error recorded over and over in dmesg is:  
  
[xxxxx.xxxxxx] amdgpu:  
                failed to send message 201 ret is 65535  
[xxxxx.xxxxxx] amdgpu:  
                last message was failed ret is 65535  
  
I've included the part of dmesg beginning with suspend event through the resume failure for kernel 5.9-rc6.
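The repeated error can be tallied per message ID and return code to see whether a log contains only this failure or a mix of SMU errors. A minimal POSIX sh sketch, assuming the log was saved to a file (e.g. `dmesg > resume-fail.log`); the function name `smu_failures` is hypothetical. For reference, 65535 is 0xFFFF (all bits set), which is consistent with, though not proof of, reads from a device that is not responding.

```shell
# Hypothetical helper: tally amdgpu "failed to send message N ret is R"
# lines in a saved kernel log. Prints "<message> <ret> <count>" per pair.
# "last message was failed ret is R" lines are not counted.
smu_failures() {
    awk 'match($0, /failed to send message [0-9]+ ret is [0-9]+/) {
             split(substr($0, RSTART, RLENGTH), f, " ")
             count[f[5] " " f[8]]++
         }
         END { for (k in count) print k, count[k] }' "$1" | sort
}
```

Running `smu_failures resume-fail.log` prints one line per distinct message/return-code pair, so a log dominated by a single pair (such as 201/65535 here) stands out immediately.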
Comment 1 Alex Deucher 2020-10-01 17:56:45 UTC
Please attach your full dmesg output from boot.  Can you bisect?
Comment 2 Robert M. Muncrief 2020-10-01 18:36:05 UTC
Created attachment 292741
Resume failure, full dmesg output from kernel 5.8.5

The last full dmesg output I have is from kernel 5.8.5, and I've attached it to this response. However, the messages haven't changed since then.
  
Going forward would you rather I run the current 5.8 (on arch it's 5.8.12) or the 5.9 RC release candidates (currently 5.9-rc6) to capture the next event?  
  
I can bisect, but don't know how to bisect a random issue like this. It's difficult to say how often it happens, but I'd estimate one out of seven to twelve times.  
  
Some time ago I deliberately ran multiple suspend/resume cycles in hopes of gathering more info for a bug report, but got to 20 cycles with no errors, so I gave up. The issue seems to occur only after my computer has been suspended for a long period; so far it has happened only after overnight suspends.
  
It's also significant to note that I have two identical XFX Radeon RX 580 GTS XXX Edition GPUs, and one is passed through via VFIO at boot.  
  
In any case, I'll be happy to assist on this issue in any way I can. I've seen multiple complaints about it online, but I assumed other existing bug reports were already addressing it; otherwise I would have filed a report sooner. I wasn't aware of my error until this morning.
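Since the failure seems to need a long suspend rather than many short ones, one way to make it reproducible enough to bisect is to automate long suspend/resume cycles. A minimal sketch, assuming util-linux `rtcwake` is installed and the script runs as root; `suspend_cycles`, `count_errors`, `SUSPEND_CMD` and `LOG_CMD` are hypothetical names, and the two variables exist only so the loop can be dry-run without actually suspending.

```shell
# Hypothetical stress loop: suspend to RAM, wake via the RTC after a fixed
# delay, and stop at the first cycle that adds a new SMU error to the log.
# Assumes util-linux rtcwake and root privileges (for rtcwake and dmesg).
: "${SUSPEND_CMD:=rtcwake -m mem -s}"  # expanded unquoted below so it splits into arguments
: "${LOG_CMD:=dmesg}"

# Number of SMU send failures currently in the kernel log (0 if none).
count_errors() {
    $LOG_CMD | grep -c 'failed to send message' || true
}

# Usage: suspend_cycles <cycles> <seconds-asleep>
# Returns 0 if no new error appeared, 1 on the first failing cycle,
# 2 if the suspend command itself failed.
suspend_cycles() {
    cycles=$1 seconds=$2 i=1
    base=$(count_errors)
    while [ "$i" -le "$cycles" ]; do
        $SUSPEND_CMD "$seconds" > /dev/null || return 2
        if [ "$(count_errors)" -gt "$base" ]; then
            echo "failure after cycle $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no failure in $cycles cycles"
}
```

For example, `suspend_cycles 5 28800` runs five eight-hour suspends. If the machine hangs hard on resume, the loop simply stops, which is itself the signal to check over SSH.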
Comment 3 Alex Deucher 2020-10-01 18:56:08 UTC
Looks like you attached the wrong file?  Can you elaborate on how you use your GPUs?  If you take vfio out of the picture, do you still have the issues?
Comment 4 Robert M. Muncrief 2020-10-01 19:36:54 UTC
You are correct, the restored 5.8.5 dmesg output doesn't have the full output either, and it's the only other output I can find in my backups. I apologize for my error.  
  
Unfortunately I can't remove my VFIO setup for any extended period, because I'm working on a project with other musicians that requires me to use my Windows 10 VM daily for software that has no Linux alternative. There is other, almost equivalent software that could have been used (which I actually prefer), but the other musicians aren't willing to switch to Linux. In their defense, they all tried quite a while ago, but it was just too difficult for them, and their frustration ended up causing anger and contention within our group.
  
In any case, here's my VFIO passthrough setup:
  
/etc/default/grub boot command line:  
  
GRUB_CMDLINE_LINUX_DEFAULT="quiet loglevel=3 video=efifb:off audit=0 acpi_enforce_resources=lax rd.modules-load=vfio-pci amd_iommu=on iommu=pt"
  
  
/etc/modprobe.d/kvm.conf:  
  
options kvm_amd avic=1  
  
  
/etc/modprobe.d/vfio.conf:  
  
options vfio-pci disable_vga=1  
softdep amdgpu pre: vfio-pci  
softdep radeon pre: vfio-pci  
softdep ahci pre: vfio-pci  
softdep xhci_pci pre: vfio-pci  
install vfio-pci /usr/local/bin/vfio-pci-override.sh  
  
  
/usr/local/bin/vfio-pci-override.sh:
  
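```
#!/bin/sh
# Mark the passthrough RX 580 (GPU function .0 and its HDMI audio
# function .1) so that only vfio-pci will bind to them.
DEVS="0000:0b:00.0 0000:0b:00.1"

# Only set the override when an IOMMU is active (/sys/class/iommu non-empty).
if [ -n "$(ls -A /sys/class/iommu)" ]; then
    for DEV in $DEVS; do
        echo "vfio-pci" > "/sys/bus/pci/devices/$DEV/driver_override"
    done
fi

# -i ignores the "install" line in vfio.conf, so this loads the real module
# instead of re-running this script.
modprobe -i vfio-pci
```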
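After boot it is worth confirming that the override took effect, i.e. that 0b:00.0 and 0b:00.1 are bound to vfio-pci while the host card stays on amdgpu. A minimal sketch that reads the `driver` symlink sysfs exposes for each PCI device; `bound_driver` and the `SYSFS` override (used only for testing) are hypothetical.

```shell
# Hypothetical helper: print the driver currently bound to a PCI device,
# or "none". SYSFS defaults to /sys and is overridable only for testing.
bound_driver() {
    link="${SYSFS:-/sys}/bus/pci/devices/$1/driver"
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"
    else
        echo none
    fi
}
```

Usage: `for d in 0000:0b:00.0 0000:0b:00.1; do echo "$d $(bound_driver "$d")"; done`. `lspci -k` reports the same information in its "Kernel driver in use" lines.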
Comment 5 Robert M. Muncrief 2020-10-02 01:25:09 UTC
Created attachment 292753
Full dmesg resume fail output for kernel 5.8.12

I suspended my computer during dinner, and when I tried to resume, it failed. I've attached the full dmesg output to this message; the full Xorg log will follow.
Comment 6 Robert M. Muncrief 2020-10-02 01:26:12 UTC
Created attachment 292755
Full Xorg resume fail output for kernel 5.8.12

Here is the Xorg.0.log output for the resume failure.
Comment 7 Robert M. Muncrief 2020-10-14 20:07:46 UTC
This bug still persists with kernel 5.9.0. I didn't attach new logs because the bug output is identical to the 5.8 kernel series.
Comment 8 Alex Deucher 2020-10-14 20:36:26 UTC
[ 3399.070651] pcieport 0000:03:02.0: can't change power state from D3hot to D0 (config space inaccessible)
[ 3399.073473] amdgpu 0000:05:00.0: can't change power state from D3hot to D0 (config space inaccessible)
[ 3399.136581] snd_hda_intel 0000:05:00.1: can't change power state from D3hot to D0 (config space inaccessible)

Seems like the card never gets powered back up by the platform on resume.
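As far as I understand the PCI core, the "config space inaccessible" message is printed when a config read comes back as all ones. One quick way to check a device by hand, e.g. over SSH after a failed resume, is to read its vendor ID from sysfs: 1002 is AMD/ATI, while ffff means config reads are not reaching the card. A minimal sketch; `vendor_id` and the `SYSFS` test override are hypothetical.

```shell
# Hypothetical helper: print a device's PCI vendor ID (config offset 0x00).
# Config space is little-endian, so the two bytes are swapped explicitly to
# keep the result independent of the host's byte order.
# SYSFS defaults to /sys and is overridable only for testing.
vendor_id() {
    cfg="${SYSFS:-/sys}/bus/pci/devices/$1/config"
    # od prints the first two bytes as hex, e.g. " 02 10" for AMD (0x1002).
    set -- $(od -A n -t x1 -N 2 "$cfg")
    echo "$2$1"
}
```

For the devices in the log above this would be `vendor_id 0000:05:00.0` and `vendor_id 0000:03:02.0`.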
Comment 9 Robert M. Muncrief 2020-10-14 21:08:40 UTC
The same type of problem also occurred when I had my old R9 390 and GT 710 GPUs, FX-6300 CPU, and Gigabyte GA-990FXA-UD5 motherboard. However, if I put the GT 710 in the primary PCIe slot, the resume problem never occurred.

I can't be certain it was exactly the same problem, though, because there were a lot of AMDGPU resume problems and I just assumed they were due to my hardware being so old. And since AMDGPU support for my R9 390 was considered experimental, I figured I had to live with the issue.

So I really hoped it would go away when I got my two new RX 580 GPUs, R7 3700X CPU, and X570 motherboard, but unfortunately the resume problem still occurs. And I gave away my GT 710 so I can't check to see if it still alleviates the issue.
Comment 10 dark_sylinc 2020-10-15 22:54:29 UTC
I'm having the same problem. I'm using Ubuntu 18.04 LTS, and whatever was backported into kernel 5.4.0-51-generic started causing it; the problem goes away with 5.4.0-48-generic (Ubuntu flavors).

I have more information:

 - Card is Radeon RX 560 Series (POLARIS11, DRM 3.35.0, 5.4.0-48-generic, LLVM 10.0.1)
 - The bug sometimes also triggers when plugging or unplugging an HDMI TV. (this may be https://bugzilla.kernel.org/show_bug.cgi?id=204241 ?)
 - The keyboard locks up, but I can still login via SSH
 - 'sudo shutdown now' will never finish. The kernel is stuck
 - In my case, neither dmesg nor xorg.log shows that anything went wrong
 - Trying to kill X reveals the following:

[ 1571.941734] Call Trace:
[ 1571.941747]  __schedule+0x293/0x720
[ 1571.941752]  ? __queue_work+0x14c/0x400
[ 1571.941758]  schedule+0x33/0xa0
[ 1571.941765]  rpm_resume+0x108/0x780
[ 1571.941769]  ? __switch_to_asm+0x40/0x70
[ 1571.941776]  ? wait_woken+0x80/0x80
[ 1571.941782]  __pm_runtime_resume+0x4e/0x80
[ 1571.941939]  amdgpu_drm_ioctl+0x39/0x80 [amdgpu]
[ 1571.941944]  do_vfs_ioctl+0xa9/0x640
[ 1571.941950]  ? __schedule+0x29b/0x720
[ 1571.941954]  ksys_ioctl+0x75/0x80
[ 1571.941957]  __x64_sys_ioctl+0x1a/0x20
[ 1571.941964]  do_syscall_64+0x57/0x190
[ 1571.941968]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1571.941973] RIP: 0033:0x7f746d5a96d7
[ 1571.941982] Code: Bad RIP value.
[ 1571.941985] RSP: 002b:00007fff1ec6a7a8 EFLAGS: 00003246 ORIG_RAX: 0000000000000010
[ 1571.941990] RAX: ffffffffffffffda RBX: 00007fff1ec6a7e0 RCX: 00007f746d5a96d7
[ 1571.941992] RDX: 00007fff1ec6a7e0 RSI: 00000000c06864a2 RDI: 000000000000000d
[ 1571.941994] RBP: 00007fff1ec6a7e0 R08: 0000000000000000 R09: 0000000000000000
[ 1571.941996] R10: 0000000000000000 R11: 0000000000003246 R12: 00000000c06864a2
[ 1571.941998] R13: 000000000000000d R14: 000055f52f391780 R15: 000055f52f2176a0
[ 1571.942021] INFO: task chrome:shlo0:2563 blocked for more than 120 seconds.
[ 1571.942026]       Tainted: G           OE     5.4.0-51-generic #56~18.04.1-Ubuntu
[ 1571.942029] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.


[ 1692.774402] python3:disk$2  D    0  6187      1 0x80004002
[ 1692.774404] Call Trace:
[ 1692.774410]  __schedule+0x293/0x720
[ 1692.774414]  ? __switch_to_asm+0x40/0x70
[ 1692.774419]  schedule+0x33/0xa0
[ 1692.774424]  schedule_preempt_disabled+0xe/0x10
[ 1692.774429]  __mutex_lock.isra.9+0x26d/0x4e0
[ 1692.774436]  __mutex_lock_slowpath+0x13/0x20
[ 1692.774441]  ? __mutex_lock_slowpath+0x13/0x20
[ 1692.774446]  mutex_lock+0x2f/0x40
[ 1692.774472]  drm_release+0x2e/0xd0 [drm]
[ 1692.774476]  __fput+0xc6/0x260
[ 1692.774481]  ____fput+0xe/0x10
[ 1692.774485]  task_work_run+0x9d/0xc0
[ 1692.774491]  do_exit+0x382/0xb80
[ 1692.774496]  ? mem_cgroup_try_charge+0x75/0x190
[ 1692.774503]  do_group_exit+0x43/0xa0
[ 1692.774506]  get_signal+0x14f/0x860
[ 1692.774512]  do_signal+0x34/0x6d0
[ 1692.774515]  ? strlcpy+0x32/0x50
[ 1692.774519]  ? __x64_sys_futex+0x13f/0x190
[ 1692.774525]  exit_to_usermode_loop+0x90/0x130
[ 1692.774530]  do_syscall_64+0x170/0x190
[ 1692.774534]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1692.774536] RIP: 0033:0x7f7f31d789f3
[ 1692.774541] Code: Bad RIP value.
[ 1692.774543] RSP: 002b:00007f7ef49abd10 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
[ 1692.774546] RAX: fffffffffffffe00 RBX: 0000000002041e80 RCX: 00007f7f31d789f3
[ 1692.774548] RDX: 0000000000000000 RSI: 0000000000000080 RDI: 0000000002041ea8
[ 1692.774549] RBP: 0000000002041ea4 R08: 0000000000000000 R09: 0000000000000000
[ 1692.774551] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000002041ea8
[ 1692.774553] R13: 0000000000000000 R14: 0000000002041e58 R15: 0000000000000002
[ 1692.774558] INFO: task kworker/4:1:6532 blocked for more than 241 seconds.
[ 1692.774561]       Tainted: G           OE     5.4.0-51-generic #56~18.04.1-Ubuntu
[ 1692.774563] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1692.774566] kworker/4:1     D    0  6532      2 0x80004000
Comment 11 Robert M. Muncrief 2020-10-29 23:11:31 UTC
This bug still exists in kernel 5.10-rc1.
Comment 12 dark_sylinc 2020-10-29 23:35:29 UTC
Btw I reported that I was experiencing this too in my RX 560, however for me it went away with 5.8.15

I think my problem was unrelated to the one in this ticket, sorry.

Btw it may be worth writing down whether the GPU requires an extra PCIE power plug, as this may be relevant.
My RX560 requires one (and is plugged).
Comment 13 Robert M. Muncrief 2020-10-29 23:41:15 UTC
(In reply to dark_sylinc from comment #12)
> Btw I reported that I was experiencing this too in my RX 560, however for me
> it went away with 5.8.15
> 
> I think my problem was unrelated to the one in this ticket, sorry.
> 
> Btw it may be worth writing down whether the GPU requires an extra PCIE
> power plug, as this may be relevant.
> My RX560 requires one (and is plugged).

I have two XFX Radeon RX 580 GTS XXX cards, one for Linux and one for a KVM Windows VM. They have a single 8 pin power connector.
Comment 14 fawz 2020-11-11 16:51:57 UTC
Hi all,

I'm not sure if I'm experiencing the same bug, but the outcome and some of the log messages seem to be the same. 

For me, resuming from suspend is broken with kernel 5.9.0-2 and works when I boot with 5.7.0-1, keeping the rest of my system the same. I'm on debian sid.

For SEO, in dmesg I see messages like

    nov. 11 17:27:35 [  202.045603] amdgpu: [powerplay]
    nov. 11 17:27:35                 failed to send message 146 ret is 0

    nov. 11 17:27:35 [  203.073392] [drm:uvd_v4_2_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!

    nov. 11 17:27:35 [  216.242177] amdgpu 0000:01:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring uvd test failed (-110)
    nov. 11 17:27:35 [  216.242245] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <uvd_v4_2> failed -110
    nov. 11 17:27:35 [  216.242312] [drm:amdgpu_device_resume [amdgpu]] *ERROR* amdgpu_device_ip_resume failed (-110).

    nov. 11 17:27:44 [  224.963014] [drm] Fence fallback timer expired on ring sdma0


My hardware:

- CPU: AMD FX-6200 Six-Core Processor
- VGA: [AMD/ATI] Hawaii PRO [Radeon R9 290/390]
- Motherboard: GA-990FXA-UD3 Firmware F9

Software:
- debian sid
- firmware-amd-graphics: 20200918-1
- libdrm-amdgpu1: 2.4.102-1
- mesa: 20.2.1-1


To reiterate: simply booting kernel 5.7 instead of 5.9 makes resume from suspend work, with the above software unchanged.
Comment 15 fawz 2020-11-11 16:53:21 UTC
Comment 16 fawz 2020-11-11 16:54:05 UTC
Created attachment 293629
dmesg output when booting with kernel 5.9, suspending, then resuming
Comment 17 Илья Индиго 2020-12-12 10:43:12 UTC
I have the same problem with the Radeon HD 7770.
Comment 18 Илья Индиго 2020-12-12 11:21:36 UTC
(In reply to Илья Индиго from comment #17)
> I have the same problem with the Radeon HD 7770.
This also happens with the amdgpu and radeonsi drivers.
It enters S1 mode (although in the BIOS I specified to use only S3) and does not exit it.
With my old video card, an 8600 GT using nouveau, the system entered and exited S3 normally.
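Which sleep variant Linux actually requests is independent of the BIOS setting and can be read from `/sys/power/mem_sleep`, which lists the supported variants with the active one in brackets. A minimal sketch; the function name `mem_sleep_mode` is hypothetical.

```shell
# Hypothetical helper: print the variant the kernel uses for "mem" sleep.
# /sys/power/mem_sleep lists the supported variants with the active one in
# brackets, e.g. "s2idle [deep]". On ACPI systems "deep" is S3 and
# "shallow" is S1 (standby); "s2idle" is suspend-to-idle.
mem_sleep_mode() {
    sed -n 's/.*\[\([a-z0-9]*\)].*/\1/p' "${1:-/sys/power/mem_sleep}"
}
```

If it reports anything other than `deep`, running `echo deep > /sys/power/mem_sleep` as root (or booting with the `mem_sleep_default=deep` kernel parameter) selects S3, provided the firmware advertises it.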
Comment 19 xrootware 2020-12-23 18:15:31 UTC
I have the same bug with Vega 3 on Fedora 33 with kernel 5.9 and newer.
Comment 20 Sven Neumann 2021-03-05 14:35:54 UTC
I am experiencing what appears to be the same problem. My hardware is a Lenovo ThinkPad 14s with an AMD Ryzen 4750U. The notebook quite frequently doesn't come out of suspend. Or rather, it seems to come out of suspend but cannot initialize the graphics hardware, resulting in a black screen:

Mar 05 13:31:23 zapp systemd[1]: Starting Suspend...
Mar 05 13:31:23 zapp systemd-sleep[4072]: Suspending system...
Mar 05 13:31:23 zapp kernel: PM: suspend entry (s2idle)
Mar 05 13:31:23 zapp kernel: Filesystems sync: 0.009 seconds
Mar 05 13:31:38 zapp kernel: rfkill: input handler enabled
Mar 05 13:31:38 zapp kernel: Freezing user space processes ... (elapsed 0.003 seconds) done.
Mar 05 13:31:38 zapp kernel: OOM killer disabled.
Mar 05 13:31:38 zapp kernel: Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
Mar 05 13:31:38 zapp kernel: [drm] free PSP TMR buffer
Mar 05 13:31:38 zapp kernel: ACPI: EC: interrupt blocked
Mar 05 13:31:38 zapp kernel: ACPI: button: The lid device is not compliant to SW_LID.
Mar 05 13:31:38 zapp kernel: ACPI: EC: interrupt unblocked
Mar 05 13:31:38 zapp kernel: pci 0000:00:00.2: can't derive routing for PCI INT A
Mar 05 13:31:38 zapp kernel: pci 0000:00:00.2: PCI INT A: no GSI
Mar 05 13:31:38 zapp kernel: usb usb2: root hub lost power or was reset
Mar 05 13:31:38 zapp kernel: usb usb3: root hub lost power or was reset
Mar 05 13:31:38 zapp kernel: xhci_hcd 0000:05:00.0: Zeroing 64bit base registers, expecting fault
Mar 05 13:31:38 zapp kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000F400900000).
Mar 05 13:31:38 zapp kernel: [drm] PSP is resuming...
Mar 05 13:31:38 zapp kernel: [drm] reserve 0x400000 from 0xf41f800000 for PSP TMR
Mar 05 13:31:38 zapp kernel: amdgpu 0000:06:00.0: amdgpu: SMU is resuming...
Mar 05 13:31:38 zapp kernel: amdgpu 0000:06:00.0: amdgpu: dpm has been disabled
Mar 05 13:31:38 zapp kernel: amdgpu 0000:06:00.0: amdgpu: SMU is resumed successfully!
Mar 05 13:31:38 zapp kernel: usb 2-2: reset high-speed USB device number 2 using xhci_hcd
Mar 05 13:31:38 zapp kernel: [drm] kiq ring mec 2 pipe 1 q 0
Mar 05 13:31:38 zapp kernel: [drm] DMUB hardware initialized: version=0x00000001
Mar 05 13:31:38 zapp kernel: [drm] VCN decode and encode initialized successfully(under DPG Mode).
Mar 05 13:31:38 zapp kernel: [drm] JPEG decode initialized successfully.
Mar 05 13:31:38 zapp kernel: amdgpu 0000:06:00.0: amdgpu: ring gfx uses VM inv eng 0 on hub 0
Mar 05 13:31:38 zapp kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Mar 05 13:31:38 zapp kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Mar 05 13:31:38 zapp kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
Mar 05 13:31:38 zapp kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
Mar 05 13:31:38 zapp kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
Mar 05 13:31:38 zapp kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
Mar 05 13:31:38 zapp kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
Mar 05 13:31:38 zapp kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
Mar 05 13:31:38 zapp kernel: amdgpu 0000:06:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
Mar 05 13:31:38 zapp kernel: amdgpu 0000:06:00.0: amdgpu: ring sdma0 uses VM inv eng 0 on hub 1
Mar 05 13:31:38 zapp kernel: amdgpu 0000:06:00.0: amdgpu: ring vcn_dec uses VM inv eng 1 on hub 1
Mar 05 13:31:38 zapp kernel: amdgpu 0000:06:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 4 on hub 1
Mar 05 13:31:38 zapp kernel: amdgpu 0000:06:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 5 on hub 1
Mar 05 13:31:38 zapp kernel: amdgpu 0000:06:00.0: amdgpu: ring jpeg_dec uses VM inv eng 6 on hub 1
Mar 05 13:31:38 zapp kernel: amdgpu 0000:06:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on gfx (-110).
Mar 05 13:31:38 zapp kernel: fbcon: Taking over console
Mar 05 13:31:38 zapp kernel: [drm:amdgpu_device_delayed_init_work_handler [amdgpu]] *ERROR* ib ring test failed (-110).
Mar 05 13:31:38 zapp kernel: [drm] Failed to add display topology, DTM TA is not initialized.


The most relevant part seems to be

[drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on gfx (-110).

This is on 5.12-rc1+, compiled from master this morning. But I have seen this problem with Ubuntu mainline kernels 5.10.17, 5.10.20 and 5.11.3 as well.
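The negative numbers in these messages are kernel errno values. A minimal sketch that lists the distinct parenthesised codes in a saved log and names them; only the codes that appear in this report are mapped (110 is ETIMEDOUT and 22 is EINVAL in Linux's generic errno numbering), and `errno_name`/`log_errnos` are hypothetical names.

```shell
# Hypothetical helper: name the negative errno values seen in these logs.
# Only the codes in this report are mapped (Linux asm-generic numbering).
errno_name() {
    case "${1#-}" in
        110) echo ETIMEDOUT ;;
        22) echo EINVAL ;;
        *) echo "unmapped" ;;
    esac
}

# List each distinct "(-N)" code in a saved log with its name.
# Codes written without parentheses (e.g. "failed -110") are not matched.
log_errnos() {
    grep -o '(-[0-9][0-9]*)' "$1" | tr -d '()' | sort -u |
        while read -r code; do
            echo "$code $(errno_name "$code")"
        done
}
```

Read this way, "IB test failed on gfx (-110)" means the test timed out waiting for the GPU, rather than being rejected outright.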
Comment 21 Alex Deucher 2021-03-05 15:34:57 UTC
Unless you have a polaris board please file your own bug.
Comment 22 Marius 2021-05-15 17:48:14 UTC
With 5.12.3, monitor remains blank after resume.
Relevant log:

```
May 15 20:21:37 fedora kernel: amdgpu 0000:04:00.0: amdgpu: failed to write reg 28b4 wait reg 28c6
May 15 20:21:37 fedora kernel: amdgpu 0000:04:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706
May 15 20:21:37 fedora kernel: amdgpu 0000:04:00.0: amdgpu: SMU is resuming...
May 15 20:21:37 fedora kernel: amdgpu 0000:04:00.0: amdgpu: dpm has been disabled
May 15 20:21:37 fedora kernel: amdgpu 0000:04:00.0: amdgpu: SMU is resumed successfully!
May 15 20:21:37 fedora kernel: amdgpu 0000:04:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring sdma0 test failed (-110)
May 15 20:21:37 fedora kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <sdma_v4_0> failed -110
May 15 20:21:37 fedora kernel: amdgpu 0000:04:00.0: amdgpu: amdgpu_device_ip_resume failed (-110).
May 15 20:21:37 fedora kernel: amdgpu 0000:04:00.0: PM: failed to resume async: error -110
May 15 20:21:37 fedora kernel: amdgpu 0000:04:00.0: amdgpu: couldn't schedule ib on ring <sdma0>
May 15 20:21:37 fedora kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
May 15 20:21:37 fedora kernel: amdgpu 0000:04:00.0: amdgpu: couldn't schedule ib on ring <sdma0>
May 15 20:21:37 fedora kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
May 15 20:21:47 fedora kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=139973, emitted seq=139977
May 15 20:21:47 fedora kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process gnome-shell pid 1895 thread gnome-shel:cs0 pid 1926
May 15 20:21:47 fedora kernel: amdgpu 0000:04:00.0: amdgpu: GPU reset begin!
May 15 20:21:47 fedora kernel: amdgpu 0000:04:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
May 15 20:21:47 fedora kernel: amdgpu 0000:04:00.0: amdgpu: MODE2 reset
May 15 20:21:47 fedora kernel: amdgpu 0000:04:00.0: amdgpu: GPU reset succeeded, trying to resume
May 15 20:21:48 fedora kernel: amdgpu 0000:04:00.0: amdgpu: RAS: optional ras ta ucode is not available
May 15 20:21:48 fedora kernel: amdgpu 0000:04:00.0: amdgpu: RAP: optional rap ta ucode is not available
May 15 20:21:48 fedora kernel: amdgpu 0000:04:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
May 15 20:21:48 fedora kernel: amdgpu 0000:04:00.0: amdgpu: SMU is resuming...
May 15 20:21:48 fedora kernel: amdgpu 0000:04:00.0: amdgpu: dpm has been disabled
May 15 20:21:48 fedora kernel: amdgpu 0000:04:00.0: amdgpu: SMU is resumed successfully!
May 15 20:21:48 fedora kernel: amdgpu 0000:04:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring sdma0 test failed (-110)
May 15 20:21:48 fedora kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <sdma_v4_0> failed -110
May 15 20:21:48 fedora kernel: amdgpu 0000:04:00.0: amdgpu: GPU reset(2) failed
May 15 20:21:48 fedora kernel: amdgpu 0000:04:00.0: amdgpu: GPU reset end with ret = -110
May 15 20:21:58 fedora kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
May 15 20:22:08 fedora kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
May 15 20:22:12 fedora kernel: amdgpu 0000:04:00.0: amdgpu: couldn't schedule ib on ring <sdma0>
May 15 20:22:12 fedora kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
May 15 20:22:12 fedora kernel: amdgpu 0000:04:00.0: amdgpu: couldn't schedule ib on ring <sdma0>
```

AMD Ryzen 7 4700U with Radeon Graphics, Lenovo Ideapad 5.
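In the `ring ... timeout` lines, `signaled seq` is, as I read it, the last fence the ring completed and `emitted seq` the last one submitted, so their difference is the number of jobs still outstanding on that ring when the timeout fired. A minimal sketch to extract this from a saved log; the name `ring_timeouts` is hypothetical.

```shell
# Hypothetical helper: for each "ring X timeout, signaled seq=A, emitted
# seq=B" line in a saved log, print the ring and B - A (fences not yet
# signaled). "timeout, but soft recovered" lines carry no sequence numbers
# and are skipped.
ring_timeouts() {
    sed -n 's/.*ring \([a-z0-9_.]*\) timeout, signaled seq=\([0-9]*\), emitted seq=\([0-9]*\).*/\1 \2 \3/p' "$1" |
        while read -r ring signaled emitted; do
            echo "$ring pending=$((emitted - signaled))"
        done
}
```

For the gfx timeout in the log above (signaled 139973, emitted 139977) this gives four outstanding jobs.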
Comment 23 Leandro Jacques 2021-06-01 17:51:13 UTC
I'm facing exactly the same issue with a Ryzen 7 with integrated Vega 10 Graphics. My kernel log is below. It began happening after kernel 5.4; I had to downgrade to 5.4-lts from the AUR, and the system has now run for three days without any GPU reset event.

Kernel crash log in amdgpu driver:

mai 26 16:39:14 S145 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=26777, emitted seq=26778
mai 26 16:39:14 S145 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
mai 26 16:39:14 S145 kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
mai 26 16:39:14 S145 kernel: amdgpu 0000:03:00.0: amdgpu: MODE2 reset
mai 26 16:39:14 S145 kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset succeeded, trying to resume
mai 26 16:39:14 S145 kernel: amdgpu 0000:03:00.0: amdgpu: RAS: optional ras ta ucode is not available
mai 26 16:39:14 S145 kernel: amdgpu 0000:03:00.0: amdgpu: RAP: optional rap ta ucode is not available
mai 26 16:39:15 S145 kernel: amdgpu 0000:03:00.0: amdgpu: ring gfx uses VM inv eng 0 on hub 0
mai 26 16:39:15 S145 kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
mai 26 16:39:15 S145 kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
mai 26 16:39:15 S145 kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
mai 26 16:39:15 S145 kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
mai 26 16:39:15 S145 kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
mai 26 16:39:15 S145 kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
mai 26 16:39:15 S145 kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
mai 26 16:39:15 S145 kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
mai 26 16:39:15 S145 kernel: amdgpu 0000:03:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
mai 26 16:39:15 S145 kernel: amdgpu 0000:03:00.0: amdgpu: ring sdma0 uses VM inv eng 0 on hub 1
mai 26 16:39:15 S145 kernel: amdgpu 0000:03:00.0: amdgpu: ring vcn_dec uses VM inv eng 1 on hub 1
mai 26 16:39:15 S145 kernel: amdgpu 0000:03:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 4 on hub 1
mai 26 16:39:15 S145 kernel: amdgpu 0000:03:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 5 on hub 1
mai 26 16:39:15 S145 kernel: amdgpu 0000:03:00.0: amdgpu: ring jpeg_dec uses VM inv eng 6 on hub 1
mai 26 16:39:15 S145 kernel: amdgpu 0000:03:00.0: amdgpu: recover vram bo from shadow start
mai 26 16:39:15 S145 kernel: amdgpu 0000:03:00.0: amdgpu: recover vram bo from shadow done
mai 26 16:39:15 S145 kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset(1) succeeded!
mai 26 16:39:15 S145 kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x10b480b00 flags=0x0070]
mai 26 16:39:15 S145 kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x10b480b40 flags=0x0070]
mai 26 16:39:15 S145 kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x10b480b20 flags=0x0070]
mai 26 16:39:15 S145 kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x10b480b60 flags=0x0070]
mai 26 16:39:15 S145 kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x10b480b80 flags=0x0070]
mai 26 16:39:15 S145 kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x10b480bc0 flags=0x0070]
mai 26 16:39:15 S145 kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x10b480ba0 flags=0x0070]
mai 26 16:39:15 S145 kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x10b480c00 flags=0x0070]
mai 26 16:39:15 S145 kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x10b480be0 flags=0x0070]
mai 26 16:39:15 S145 kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x10b480c40 flags=0x0070]
mai 26 16:39:25 S145 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
mai 26 16:39:35 S145 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=2117313, emitted seq=2117316
mai 26 16:39:35 S145 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process plasmashell pid 1137 thread plasmashel:cs0 pid 1234
mai 26 16:39:35 S145 kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
mai 26 16:39:35 S145 kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x10b485b40 flags=0x0070]
mai 26 16:39:35 S145 kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x10b485b60 flags=0x0070]
mai 26 16:39:35 S145 kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x10b485b80 flags=0x0070]
mai 26 16:39:35 S145 kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x10b485ba0 flags=0x0070]
mai 26 16:39:35 S145 kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x10b485bc0 flags=0x0070]
mai 26 16:39:35 S145 kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x10b485be0 flags=0x0070]
mai 26 16:39:35 S145 kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x10b485c20 flags=0x0070]
mai 26 16:39:35 S145 kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x10b485c00 flags=0x0070]
mai 26 16:39:35 S145 kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x10b485c40 flags=0x0070]
mai 26 16:39:35 S145 kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x10b485c60 flags=0x0070]
mai 26 16:39:36 S145 kernel: amdgpu 0000:03:00.0: amdgpu: MODE2 reset
mai 26 16:39:36 S145 kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset succeeded, trying to resume
mai 26 16:39:36 S145 kernel: amdgpu 0000:03:00.0: amdgpu: RAS: optional ras ta ucode is not available
mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: RAP: optional rap ta ucode is not available
mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring sdma0 test failed (-110)
mai 26 16:39:37 S145 kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <sdma_v4_0> failed -110
mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset(4) failed
mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule ib on ring <sdma0>
mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset end with ret = -110
mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule ib on ring <sdma0>
mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[the two lines above repeat 91 more times, all at 16:39:37]
mai 26 16:39:47 S145 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
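The log above reports kernel return codes as negative errno values: `-110` from the sdma0 ring test, the `sdma_v4_0` IP-block resume and the reset itself, and `-22` from `amdgpu_job_run`. On x86-64 Linux these are `ETIMEDOUT` and `EINVAL`. Below is a minimal Python sketch, a hypothetical helper that is not part of amdgpu or any existing tool, for summarizing a journal excerpt like this one. It strips the `journalctl` date/host/`kernel:` prefix, counts repeated messages, and decodes a trailing negative errno. The prefix pattern assumes the default short `journalctl` format seen here, and the errno table is hardcoded from Linux's asm-generic headers so the result does not depend on the OS running the script; it only covers the two codes that appear in this log.

```python
import re
from collections import Counter

# Linux errno values (include/uapi/asm-generic/errno-base.h and errno.h).
# Hardcoded so decoding does not depend on the OS running this script;
# only the two codes seen in this log are covered.
LINUX_ERRNO = {
    22: ("EINVAL", "Invalid argument"),
    110: ("ETIMEDOUT", "Connection timed out"),
}

# Assumed journalctl short format: "<month> <day> <HH:MM:SS> <host> kernel: <message>"
PREFIX = re.compile(r"^\S+\s+\d+\s+\d{2}:\d{2}:\d{2}\s+\S+\s+kernel:\s+")
# A trailing negative return code, e.g. "(-110)" or "failed -110"
TRAILING_ERRNO = re.compile(r"-(\d+)\)?\s*$")


def strip_prefix(line: str) -> str:
    """Drop the date/host/'kernel:' prefix so identical messages compare equal."""
    return PREFIX.sub("", line).strip()


def count_messages(log_text: str) -> Counter:
    """Count each distinct kernel message, ignoring timestamps and host."""
    return Counter(m for m in map(strip_prefix, log_text.splitlines()) if m)


def decode_errno(message: str):
    """Return (symbol, description) for a trailing negative errno, else None."""
    match = TRAILING_ERRNO.search(message)
    if match is None:
        return None
    code = int(match.group(1))
    return LINUX_ERRNO.get(code, ("unknown", f"errno {code}"))


if __name__ == "__main__":
    # A few lines copied from the log above.
    sample = """\
mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring sdma0 test failed (-110)
mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset end with ret = -110
mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule ib on ring <sdma0>
mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule ib on ring <sdma0>
mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
"""
    for message, n in count_messages(sample).most_common():
        decoded = decode_errno(message)
        suffix = f"  [{decoded[0]}: {decoded[1]}]" if decoded else ""
        print(f"{n:3d}x {message}{suffix}")
```

Feeding it the output of `journalctl -k` for the affected boot gives one line per distinct message with its repeat count, which makes long runs like the `couldn't schedule ib` / `Error scheduling IBs` pair easier to read. The errno decoding only names the code; it does not say why the SDMA ring stopped responding after the MODE2 reset.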
Comment 24 Leandro Jacques 2021-06-01 20:21:16 UTC
I forgot to mention the kernel version I was using when it crashed: 5.10.x.

(In reply to Leandro Jacques from comment #23)
> I'm facing exactly the same issue with a Ryzen 7 with integrated Vega 10
> Graphics. My kernel log is below. It began to happen after kernel 5.4; I had
> to downgrade my kernel to 5.4-lts from the AUR, and it has now been 3 days
> without any GPU reset event.
> 
> Kernel crash log in amdgpu driver:
> 
> mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule
> ib on ring <sdma0>
> mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
> scheduling IBs (-22)
> mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule
> ib on ring <sdma0>
> mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
> scheduling IBs (-22)
> mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule
> ib on ring <sdma0>
> mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
> scheduling IBs (-22)
> mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule
> ib on ring <sdma0>
> mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
> scheduling IBs (-22)
> mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule
> ib on ring <sdma0>
> mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
> scheduling IBs (-22)
> mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule
> ib on ring <sdma0>
> mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
> scheduling IBs (-22)
> mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule
> ib on ring <sdma0>
> mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
> scheduling IBs (-22)
> mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule
> ib on ring <sdma0>
> mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
> scheduling IBs (-22)
> mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule
> ib on ring <sdma0>
> mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
> scheduling IBs (-22)
> mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule
> ib on ring <sdma0>
> mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
> scheduling IBs (-22)
> mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule
> ib on ring <sdma0>
> mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
> scheduling IBs (-22)
> mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule
> ib on ring <sdma0>
> mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
> scheduling IBs (-22)
> mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule
> ib on ring <sdma0>
> mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
> scheduling IBs (-22)
> mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule
> ib on ring <sdma0>
> mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
> scheduling IBs (-22)
> mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule
> ib on ring <sdma0>
> mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
> scheduling IBs (-22)
> mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule
> ib on ring <sdma0>
> mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
> scheduling IBs (-22)
> mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule
> ib on ring <sdma0>
> mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
> scheduling IBs (-22)
> mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule
> ib on ring <sdma0>
> mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
> scheduling IBs (-22)
> mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule
> ib on ring <sdma0>
> mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
> scheduling IBs (-22)
> mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule
> ib on ring <sdma0>
> mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
> scheduling IBs (-22)
> mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule
> ib on ring <sdma0>
> mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
> scheduling IBs (-22)
> mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule
> ib on ring <sdma0>
> mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
> scheduling IBs (-22)
> mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule
> ib on ring <sdma0>
> mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
> scheduling IBs (-22)
> mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule
> ib on ring <sdma0>
> mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
> scheduling IBs (-22)
> mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule
> ib on ring <sdma0>
> mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
> scheduling IBs (-22)
> mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule
> ib on ring <sdma0>
> mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
> scheduling IBs (-22)
> mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule
> ib on ring <sdma0>
> mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
> scheduling IBs (-22)
> mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule
> ib on ring <sdma0>
> mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
> scheduling IBs (-22)
> mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule
> ib on ring <sdma0>
> mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
> scheduling IBs (-22)
> mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule
> ib on ring <sdma0>
> mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
> scheduling IBs (-22)
> mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule
> ib on ring <sdma0>
> mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
> scheduling IBs (-22)
> mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule
> ib on ring <sdma0>
> mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
> scheduling IBs (-22)
> mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule
> ib on ring <sdma0>
> mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
> scheduling IBs (-22)
> mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule
> ib on ring <sdma0>
> mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
> scheduling IBs (-22)
> mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule
> ib on ring <sdma0>
> mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
> scheduling IBs (-22)
> mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule
> ib on ring <sdma0>
> mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
> scheduling IBs (-22)
> mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule
> ib on ring <sdma0>
> mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
> scheduling IBs (-22)
> mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule
> ib on ring <sdma0>
> mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
> scheduling IBs (-22)
> mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule
> ib on ring <sdma0>
> mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
> scheduling IBs (-22)
> mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule
> ib on ring <sdma0>
> mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
> scheduling IBs (-22)
> mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule
> ib on ring <sdma0>
> mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
> scheduling IBs (-22)
> mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule
> ib on ring <sdma0>
> mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
> scheduling IBs (-22)
> mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule
> ib on ring <sdma0>
> mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
> scheduling IBs (-22)
> mai 26 16:39:37 S145 kernel: amdgpu 0000:03:00.0: amdgpu: couldn't schedule
> ib on ring <sdma0>
> mai 26 16:39:37 S145 kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
> scheduling IBs (-22)
> mai 26 16:39:47 S145 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring
> gfx timeout, but soft recovered
Comment 25 Leandro Jacques 2021-06-15 22:08:48 UTC
I don't have this issue with kernel 5.12.10
Comment 26 Leandro Jacques 2021-06-16 20:57:32 UTC
Created attachment 297415 [details]
Kernel crash log for kernel 5.10.x
Comment 27 Leandro Jacques 2021-06-16 20:58:13 UTC
Comment on attachment 297415 [details]
Kernel crash log for kernel 5.10.x

I had to downgrade to kernel 5.4 LTS to get rid of the problems.
Comment 28 Leandro Jacques 2021-06-18 18:22:26 UTC
Created attachment 297465 [details]
amdgpu crash log for kernel 5.4.126

Another problem appeared in kernel 5.4.126, as the attached log shows. Before version 5.4.126 I had no problems.
Comment 29 Leandro Jacques 2021-06-22 17:16:25 UTC
Created attachment 297567 [details]
Linux Firmware version info

I tried downgrading the kernel to 5.4.123 and it didn't help; I had the same issues. So I downgraded linux-firmware to see if the problem disappears. I'm now using linux-firmware 20210315.
Comment 30 Leandro Jacques 2021-06-30 19:01:59 UTC
(In reply to Leandro Jacques from comment #29)

No problems so far. So the problem is with newer firmware versions: I have been running version 20210315 without any issues since 2021-06-22 17:16:25 UTC.
Comment 31 Leandro Jacques 2021-07-05 16:56:25 UTC
How do I file a bug against the linux-firmware project for the amdgpu driver? Since the downgrade I haven't experienced any more issues.
Comment 32 Alex Deucher 2021-07-06 13:44:18 UTC
Can you narrow down which specific firmware file (ce, me, smc, etc.) causes the problem?
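One way to answer this, sketched here under assumptions rather than as an official procedure: capture amdgpu's firmware report from debugfs (assumed here to be readable as root at `/sys/kernel/debug/dri/0/amdgpu_firmware_info`, with DRI minor 0 being the AMD GPU) once with a good linux-firmware package and once with a bad one, then diff the two captures per firmware block. The parser assumes each relevant line looks like `ME feature version: 35, firmware version: 0x000000a1`; lines in any other shape (e.g. the VBIOS line) are ignored. The function names and the script name `fwdiff.py` are hypothetical.

```python
import re
import sys

# Assumed line shape of the debugfs firmware report, e.g.
#   "ME feature version: 35, firmware version: 0x000000a1"
LINE_RE = re.compile(
    r"^(?P<name>.+?) feature version: (?P<feature>\d+), "
    r"firmware version: (?P<fw>0x[0-9a-fA-F]+)"
)


def parse_fw_info(text):
    """Map each firmware block name (ME, SMC, SDMA0, ...) to (feature, firmware)."""
    versions = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            versions[m.group("name")] = (
                int(m.group("feature")),
                int(m.group("fw"), 16),
            )
    return versions


def diff_fw_info(good_text, bad_text):
    """Return {name: (good, bad)} for blocks whose versions differ or are missing."""
    good = parse_fw_info(good_text)
    bad = parse_fw_info(bad_text)
    changed = {}
    for name in sorted(set(good) | set(bad)):
        if good.get(name) != bad.get(name):
            changed[name] = (good.get(name), bad.get(name))
    return changed


def fmt(version):
    """Render a (feature, firmware) pair, or 'missing' if the block is absent."""
    if version is None:
        return "missing"
    return f"feature {version[0]}, fw 0x{version[1]:08x}"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python fwdiff.py good.txt bad.txt
    with open(sys.argv[1]) as g, open(sys.argv[2]) as b:
        for name, (old, new) in diff_fw_info(g.read(), b.read()).items():
            print(f"{name}: {fmt(old)} -> {fmt(new)}")
```

Only the blocks that the script reports as changed are candidates; swapping those individual firmware files back one at a time would then isolate the culprit.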
Comment 33 Leandro Jacques 2021-07-06 17:31:22 UTC
(In reply to Alex Deucher from comment #32)
Sorry, but when I had the problem I wasn't paying attention to the firmware version; I thought it was a kernel problem. I only focused on the linux-firmware package when I saw other people complaining about the same issues I was having and blaming linux-firmware. I saw a post saying that the last linux-firmware version that worked well was 20210315, so I downgraded to that version and the problem went away. Since the downgrade on 2021-06-22 I have locked that package so it is no longer upgraded. I'll try upgrading it again to see whether the latest version also solves the problem, but for now I can only guarantee that version 20210315 works without issues.
Comment 34 Leandro Jacques 2021-07-14 14:51:06 UTC
Created attachment 297851 [details]
Linux Firmware version info 20210511.7685cf4
Comment 35 Leandro Jacques 2021-07-14 14:54:21 UTC
Created attachment 297853 [details]
Linux Firmware version info 20210511.7685cf4

Firmware version when crashed
Comment 36 Leandro Jacques 2021-07-14 14:55:29 UTC
Created attachment 297855 [details]
Kernel crash log for linux firmware version 20210511.7685cf4

Kernel log when crashed.
Comment 37 Leandro Jacques 2021-07-14 17:56:09 UTC
(In reply to Alex Deucher from comment #32)
As you asked about the firmware version details, I upgraded my linux-firmware package to see if the problem would come back, and it did. So this time I was able to attach the amdgpu kernel log and the amdgpu firmware version details as of the crash event, to help narrow down the issue. For now, I'll return to the older version to make my system stable again.
Comment 38 Alex Deucher 2021-07-14 18:43:43 UTC
(In reply to Leandro Jacques from comment #37)
> (In reply to Alex Deucher from comment #32)
> As you asked about the firmware version details, I upgraded my
> linux-firmware package to see if the problem would come back and it came
> back. So, this time, I could attach the kernel log for the amdgpu driver
> and the amdgpu firmware versions details as of the crash event to narrow
> down the issue. For now, I'll return to the older version to make my system
> stable again.

You have a Picasso system.  The original bug was about an RX 580.  I don't think this is the same issue.  Sounds like you are seeing this issue:
https://lists.freedesktop.org/archives/amd-gfx/2021-July/066452.html
Comment 39 Leandro Jacques 2021-07-15 13:35:52 UTC
(In reply to Alex Deucher from comment #38)
> 
> You have a Picasso system.  The original bug was about an RX 580.  I don't
> think this is the same issue.  Sounds like you are seeing this issue:
> https://lists.freedesktop.org/archives/amd-gfx/2021-July/066452.html

No, the error message is exactly the same as in this one:
https://bugzilla.kernel.org/show_bug.cgi?id=213391
