Bug 214197

Summary: [Asus G713QY] RX6800M not usable after exiting Vulkan application
Product: Drivers
Reporter: velemas
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: RESOLVED OBSOLETE
Severity: normal
CC: alexdeucher, waltercool
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 5.13.13
Subsystem:
Regression: No
Bisected commit-id:
Attachments: full dmesg output

Description velemas 2021-08-27 10:44:35 UTC
The Asus ROG Strix G17 Advantage Edition (G713QY) has hybrid graphics with an RX6800M dGPU. After exiting any Vulkan application, the dGPU becomes unusable. vulkaninfo lists the dGPU before a Vulkan app is run and no longer lists the RX6800M afterwards.

After the Vulkan app closes, dmesg reports:

[  154.385749] amdgpu 0000:03:00.0: amdgpu: RAS: optional ras ta ucode is not available
[  154.401405] amdgpu 0000:03:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[  154.401409] amdgpu 0000:03:00.0: amdgpu: SMU is resuming...
[  159.038150] amdgpu 0000:03:00.0: amdgpu: message:        RunDcBtc (54)       param: 0x00000000 is timeout (no response)
[  159.038154] amdgpu 0000:03:00.0: amdgpu: Failed to setup smc hw!
[  159.038156] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62
[  159.038220] amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_resume failed (-62).
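
For anyone checking their own logs for the same failure, the lines above can be matched with a short script. This is only a sketch: the helper name find_resume_failures is hypothetical, and the pattern assumes the exact message wording quoted in this report, so it may miss differently worded kernel messages.

```python
import re

# Pattern modelled on the "amdgpu_device_ip_resume failed" line quoted above.
# Assumption: other kernel versions may word this message differently.
DEVICE_RESUME_FAILURE = re.compile(
    r"amdgpu (\S+): amdgpu: amdgpu_device_ip_resume failed \((-\d+)\)"
)


def find_resume_failures(dmesg_text):
    """Return (pci_address, error_code) pairs for failed amdgpu resumes.

    Hypothetical helper, not part of any driver tooling.
    """
    failures = []
    for line in dmesg_text.splitlines():
        match = DEVICE_RESUME_FAILURE.search(line)
        if match:
            # group(1) is the PCI address, group(2) the negative errno value
            failures.append((match.group(1), int(match.group(2))))
    return failures
```

The input text can be the output of dmesg or the contents of an attached log file. On Linux, error code -62 corresponds to ETIME (timer expired), which is consistent with the SMU message timeout in the log.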

Using the amdgpu.runpm=0 parameter fixes the issue.
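
To confirm whether the workaround is in effect, the runtime power management state of the dGPU can be read from sysfs. The sketch below is hypothetical: the helper name read_runtime_pm is made up, the PCI address is the one from the dmesg lines above, and it assumes that the amdgpu runpm parameter is exposed under /sys/module/amdgpu/parameters/ in addition to the standard PCI power/control and power/runtime_status files.

```python
from pathlib import Path


def read_runtime_pm(pci_address, sysfs_root="/sys"):
    """Read runtime power management state for a PCI device.

    Returns a dict with the device's power/control and
    power/runtime_status values plus the amdgpu runpm module
    parameter. An entry is None when its file cannot be read.
    Hypothetical helper; the runpm path is an assumption.
    """
    root = Path(sysfs_root)
    power_dir = root / "bus" / "pci" / "devices" / pci_address / "power"
    files = {
        "control": power_dir / "control",
        "runtime_status": power_dir / "runtime_status",
        "runpm": root / "module" / "amdgpu" / "parameters" / "runpm",
    }
    state = {}
    for name, path in files.items():
        try:
            state[name] = path.read_text().strip()
        except OSError:
            # File missing or unreadable (e.g. the module is not loaded)
            state[name] = None
    return state
```

Calling read_runtime_pm("0000:03:00.0") before and after running a Vulkan application would show whether the device is being runtime-suspended between runs. With amdgpu.runpm=0 the device would presumably never report a suspended status.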
Comment 1 Alex Deucher 2021-08-27 15:05:20 UTC
Please attach your full dmesg output from boot through the problematic case.
Comment 3 velemas 2021-08-28 17:48:13 UTC
Created attachment 298505 [details]
full dmesg output
Comment 4 velemas 2021-08-28 17:49:40 UTC
(In reply to Alex Deucher from comment #2)
> Does this patch fix the issue?
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=202ead5a3c589b0594a75cb99f080174f6851fed

Kernel 5.13.13 has this patch already. So apparently it does not fix the problem.
It occurs with radv, amdvlk, and amdvlk-pro. An external monitor is attached via HDMI (although it happens without an external monitor too).

Sometimes dmesg does not contain the lines mentioned above, but the dGPU is still unusable. Sometimes DXVK returns VK_ERROR_DEVICE_LOST even while the application is running.
Comment 5 Pablo Cholaky 2021-10-19 05:36:46 UTC
I can confirm this issue as well on an MSI Delta with an RX6700M, which rules out any "laptop-specific issue". Both are Zen3 Navi cards.

Now, while it doesn't break GPU usage, it's a waste of power resources.

This issue is fairly common, even with kernel 5.15.0-rc5. Sadly, I don't have any steps to reproduce.
Comment 7 velemas 2021-10-21 08:57:24 UTC
Kernel 5.14.14 already has it, but the issue is not fixed. I get mostly the same dmesg messages, with slight differences:

[  367.167527] amdgpu 0000:03:00.0: amdgpu: RAS: optional ras ta ucode is not available
[  367.183399] amdgpu 0000:03:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[  367.183406] amdgpu 0000:03:00.0: amdgpu: SMU is resuming...
[  371.863082] amdgpu 0000:03:00.0: amdgpu: Failed to setup smc hw!
[  371.863085] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62
[  371.863147] amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_resume failed (-62).
Comment 8 velemas 2022-01-30 11:56:36 UTC
Recent kernels in 5.15.* and 5.16.* fix the issue for me.