Bug 214587

Summary: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=5900910, emitted seq=5900912
Product: Drivers
Reporter: Lahfa Samy (samy)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: NEW ---
Severity: normal
CC: serg
Priority: P1
Hardware: Intel
OS: Linux
Kernel Version: 5.14.7-arch1-1
Subsystem:
Regression: No
Bisected commit-id:

Description Lahfa Samy 2021-09-30 21:36:58 UTC
Hi,

I recently hit this issue on Arch Linux with kernel 5.14.7-arch1-1 and linux-firmware 20210919.d526e04-1, on a ThinkPad T495 with an AMD Ryzen 7 3700U and its integrated Radeon RX Vega 10 graphics. I was using hashcat to brute-force hashes, with hashcat driving the GPU through OpenCL, when the computer froze and a GPU reset occurred. See the following logs.

Logs from dmesg:
[87507.678904] [drm] Fence fallback timer expired on ring gfx
[87512.691933] [drm] Fence fallback timer expired on ring gfx
[87517.572033] [drm] Fence fallback timer expired on ring gfx
[87523.012214] [drm] Fence fallback timer expired on ring gfx
[87533.129069] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=5900910, emitted seq=5900912
[87533.129518] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 2937 thread Xorg:cs0 pid 3143
[87533.129957] amdgpu 0000:06:00.0: amdgpu: GPU reset begin!
[87533.138994] amdgpu 0000:06:00.0: amdgpu: Guilty job already signaled, skipping HW reset
[87533.139056] amdgpu 0000:06:00.0: amdgpu: GPU reset(2) succeeded!
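
For anyone comparing several of these reports, here is a rough sketch of how the timeout line above could be parsed to see how many fences were still outstanding when the job timed out. The regular expression is written against the exact line format in this log only, and the function name parse_ring_timeout is just an illustrative helper, not part of any tool. Reading "emitted minus signaled" as the number of pending fences is my assumption about what the two counters mean.

```python
import re

# Matches the amdgpu ring-timeout line as it appears in the dmesg excerpt above.
# Assumption: other kernel versions may word this line differently.
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm:amdgpu_job_timedout \[amdgpu\]\] \*ERROR\* "
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def parse_ring_timeout(line):
    """Return the fields of a ring-timeout line as a dict, or None if it does not match."""
    match = TIMEOUT_RE.search(line)
    if match is None:
        return None
    signaled = int(match.group("signaled"))
    emitted = int(match.group("emitted"))
    return {
        "timestamp": float(match.group("ts")),
        "ring": match.group("ring"),
        "signaled": signaled,
        "emitted": emitted,
        # Assumed interpretation: fences submitted to the ring but not yet completed.
        "pending": emitted - signaled,
    }


if __name__ == "__main__":
    sample = (
        "[87533.129069] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, "
        "signaled seq=5900910, emitted seq=5900912"
    )
    print(parse_ring_timeout(sample))
```

Run against the line in this report, the sketch gives a difference of 2 between the emitted and signaled sequence numbers on the gfx ring.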
Comment 1 Lahfa Samy 2021-09-30 21:38:10 UTC
The computer did unfreeze after the GPU reset, but hashcat can no longer use the GPU. I'm not sure why, but I think I need to reboot my machine.