Bug 210201 - [amdgpu] crash when playing after suspend/resume
Summary: [amdgpu] crash when playing after suspend/resume
Status: RESOLVED WILL_NOT_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-11-14 16:45 UTC by Artur Bac
Modified: 2021-01-06 22:10 UTC
CC List: 1 user

See Also:
Kernel Version: 5.9.8, 5.6.19, 5.8.18
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Full dmesg after video crash (199.41 KB, text/plain)
2020-11-14 16:45 UTC, Artur Bac

Description Artur Bac 2020-11-14 16:45:51 UTC
Created attachment 293669
Full dmesg after video crash

When I play Vulkan games such as Kerbal Space Program or Red Dead Redemption 2 (via Proton) after resuming from suspend, the graphics driver crashes after 30 minutes to 1 hour of play.



OS: Gentoo 
Kernel: x86_64 Linux 5.9.8
Resolution: 7680x2160 (2 monitors attached 4K free sync)
DE: KDE 5.75.0 / Plasma 5.20.3
WM: KWin
GTK Theme: Adwaita [GTK2/3]
CPU: AMD Ryzen 9 3900X 12-Core @ 24x 3.8GHz
GPU: AMD Radeon RX 5700 XT (NAVI10, DRM 3.39.0, 5.9.8, LLVM 10.0.1) Mesa 20.2.2
RAM: 32038MiB

Full dmesg attached.

[104307.850190] amdgpu 0000:0f:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[104307.850192] amdgpu 0000:0f:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[104307.850194] amdgpu 0000:0f:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[104307.850195] amdgpu 0000:0f:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[104307.850196] amdgpu 0000:0f:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[104307.850198] amdgpu 0000:0f:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[104307.850199] amdgpu 0000:0f:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[104307.850201] amdgpu 0000:0f:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[104307.850202] amdgpu 0000:0f:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[104307.850203] amdgpu 0000:0f:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[104307.850205] amdgpu 0000:0f:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[104307.850206] amdgpu 0000:0f:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
[104307.850208] amdgpu 0000:0f:00.0: amdgpu: ring vcn_dec uses VM inv eng 0 on hub 1
[104307.850209] amdgpu 0000:0f:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 1 on hub 1
[104307.850210] amdgpu 0000:0f:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 4 on hub 1
[104307.850212] amdgpu 0000:0f:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
[104307.852128] [drm] recover vram bo from shadow start
[104307.872340] [drm] recover vram bo from shadow done
[104307.872342] [drm] Skip scheduling IBs!
[104307.872343] [drm] Skip scheduling IBs!
[104307.872357] [drm] Skip scheduling IBs!
[104307.872362] amdgpu 0000:0f:00.0: amdgpu: GPU reset(2) succeeded!
[104307.872373] [drm] Skip scheduling IBs!
[repeated many times...]
[104307.872440] [drm] Skip scheduling IBs!
[104314.769174] [drm:amdgpu_gem_va_ioctl [amdgpu]] *ERROR* Couldn't update BO_VA (-16)
[104314.795600] [drm:amdgpu_gem_va_ioctl [amdgpu]] *ERROR* Couldn't update BO_VA (-16)
[104314.847946] amdgpu 0000:0f:00.0: amdgpu: failed to clear page tables on GEM object close (-16)
[repeated many times...]
[104315.300254] amdgpu 0000:0f:00.0: amdgpu: failed to clear page tables on GEM object close (-16)
[104325.487235] GpuWatchdog[731778]: segfault at 0 ip 00007f9be2bf92dd sp 00007f9bd77ed670 error 6 in libcef.so[7f9bdee73000+69a4000]
[104325.488266] Code: 00 79 09 48 8b 7d a0 e8 21 80 c1 02 41 8b 85 00 01 00 00 85 c0 0f 84 ab 00 00 00 49 8b 45 00 4c 89 ef be 01 00 00 00 ff 50 58 <c7> 04 25 00 00 00 00 37 13 00 00 c6 05 c1 a5 37 03 01 80 bd 7f ff
[104335.590624] GpuWatchdog[731809]: segfault at 0 ip 00007f494f6142dd sp 00007f4944208670 error 6 in libcef.so[7f494b88e000+69a4000]
[104335.590631] Code: 00 79 09 48 8b 7d a0 e8 21 80 c1 02 41 8b 85 00 01 00 00 85 c0 0f 84 ab 00 00 00 49 8b 45 00 4c 89 ef be 01 00 00 00 ff 50 58 <c7> 04 25 00 00 00 00 37 13 00 00 c6 05 c1 a5 37 03 01 80 bd 7f ff
[104345.690401] GpuWatchdog[731833]: segfault at 0 ip 00007fcbb4c722dd sp 00007fcba9866670 error 6 in libcef.so[7fcbb0eec000+69a4000]
[104345.692176] Code: 00 79 09 48 8b 7d a0 e8 21 80 c1 02 41 8b 85 00 01 00 00 85 c0 0f 84 ab 00 00 00 49 8b 45 00 4c 89 ef be 01 00 00 00 ff 50 58 <c7> 04 25 00 00 00 00 37 13 00 00 c6 05 c1 a5 37 03 01 80 bd 7f ff
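
The trailing "(-16)" in the BO_VA and page-table messages above is a negated kernel error code; on Linux, 16 is EBUSY. A minimal Python sketch for decoding such codes from dmesg lines follows. The helper name decode_errors and the regular expression are illustrative assumptions, not part of any amdgpu tooling, and the sample lines are copied from the log above.

```python
import errno
import os
import re

# Two sample lines copied from the dmesg excerpt in this report.
LOG_LINES = [
    "[104314.769174] [drm:amdgpu_gem_va_ioctl [amdgpu]] *ERROR* Couldn't update BO_VA (-16)",
    "[104314.847946] amdgpu 0000:0f:00.0: amdgpu: failed to clear page tables on GEM object close (-16)",
]

# Assumption: the error code is the last token on the line, written as "(-N)".
ERR_RE = re.compile(r"\((-\d+)\)\s*$")


def decode_errors(lines):
    """Return (code, symbolic name, description) for each line ending in "(-N)"."""
    decoded = []
    for line in lines:
        match = ERR_RE.search(line)
        if match is None:
            continue  # line carries no trailing error code
        code = -int(match.group(1))  # kernel reports the negated errno
        name = errno.errorcode.get(code, "UNKNOWN")
        decoded.append((code, name, os.strerror(code)))
    return decoded


for code, name, text in decode_errors(LOG_LINES):
    print(code, name, text)
```

To check a full log, the same function can be fed the lines of the attached dmesg file instead of LOG_LINES.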
Comment 1 Artur Bac 2020-11-19 17:17:07 UTC
Interestingly, people report a similar crash after about 1 hour on Windows. Does amdgpu share code with the Windows driver?

https://www.reddit.com/r/AMDHelp/comments/jx4660/crash_rx5700xt
Comment 2 Artur Bac 2021-01-06 22:10:17 UTC
I can confirm this bug occurs only with a clang-compiled kernel.
A kernel compiled with GNU GCC works fine.
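
For anyone trying to reproduce this, one way to check which compiler built the running kernel is to look at /proc/version, which normally includes the compiler banner. A rough Python sketch, assuming the banner contains the word "clang" or "gcc" (the function names and the sample strings in the comments are hypothetical, and a hostname containing either word would confuse this simple check):

```python
def kernel_compiler(version_text):
    """Guess the kernel compiler from the text of /proc/version.

    Assumption: a clang build mentions "clang" and a GCC build mentions
    "gcc" somewhere in the banner, e.g. (hypothetical strings)
    "Linux version 5.9.8 (user@host) (clang version 11.0.0) #1 SMP" or
    "Linux version 5.9.8 (user@host) (gcc (Gentoo 10.2.0) 10.2.0) #1 SMP".
    """
    text = version_text.lower()
    if "clang" in text:
        return "clang"
    if "gcc" in text:
        return "gcc"
    return "unknown"


def read_proc_version(path="/proc/version"):
    """Return the contents of /proc/version, or an empty string if unreadable."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


print(kernel_compiler(read_proc_version()))
```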
