The Asus ROG Strix G17 Advantage Edition (G713QY) has hybrid graphics with an RX6800M dGPU. After exiting any Vulkan application, the dGPU becomes unusable: vulkaninfo sees the dGPU before the Vulkan application runs and no longer sees the RX6800M afterwards.

After the Vulkan application closes, dmesg reports:

[ 154.385749] amdgpu 0000:03:00.0: amdgpu: RAS: optional ras ta ucode is not available
[ 154.401405] amdgpu 0000:03:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[ 154.401409] amdgpu 0000:03:00.0: amdgpu: SMU is resuming...
[ 159.038150] amdgpu 0000:03:00.0: amdgpu: message: RunDcBtc (54) param: 0x00000000 is timeout (no response)
[ 159.038154] amdgpu 0000:03:00.0: amdgpu: Failed to setup smc hw!
[ 159.038156] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62
[ 159.038220] amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_resume failed (-62).

Using the amdgpu.runpm=0 parameter fixes the issue.
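For anyone trying the amdgpu.runpm=0 workaround, here is a minimal Python sketch that checks whether the option actually reached the kernel command line. The helper name `runpm_disabled` is made up for illustration, and the sketch assumes that the last occurrence of the option wins. It only inspects /proc/cmdline, so it will not notice the parameter if it was set some other way (for example through a modprobe configuration file).

```python
from pathlib import Path


def runpm_disabled(cmdline: str) -> bool:
    """Return True if the given kernel command line sets amdgpu.runpm=0.

    Hypothetical helper for illustration. Assumes the last occurrence
    of the option is the one that takes effect.
    """
    value = None
    for token in cmdline.split():
        if token.startswith("amdgpu.runpm="):
            value = token.split("=", 1)[1]
    return value == "0"


if __name__ == "__main__":
    # /proc/cmdline holds the command line the running kernel was booted with.
    cmdline = Path("/proc/cmdline").read_text()
    print("amdgpu.runpm=0 on kernel command line:", runpm_disabled(cmdline))
```

If this prints False after a reboot, the bootloader configuration most likely did not pick up the change.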
Please attach your full dmesg output from boot through the problematic case.
Does this patch fix the issue? https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=202ead5a3c589b0594a75cb99f080174f6851fed
Created attachment 298505: full dmesg output
(In reply to Alex Deucher from comment #2)
> Does this patch fix the issue?
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=202ead5a3c589b0594a75cb99f080174f6851fed

Kernel 5.13.13 already has this patch, so apparently it does not fix the problem. The problem occurs with radv, amdvlk, and amdvlk-pro. An external monitor is attached via HDMI, although it also happens without an external monitor. Sometimes dmesg does not contain the lines mentioned above, but the dGPU is still unusable. Sometimes DXVK returns VK_ERROR_DEVICE_LOST even while the application is still running.
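Since the dmesg lines are not always present, it may be more reliable to check whether Vulkan still enumerates the dGPU. The following Python sketch parses the output of `vulkaninfo --summary` and looks for a device name containing a given substring. The function names are hypothetical, and the substring to search for is an assumption the user must supply, because the reported device name differs between drivers (radv and amdvlk do not necessarily name the card the same way).

```python
import re
import subprocess
import sys


def vulkan_device_names(summary_text):
    """Extract every 'deviceName = ...' value from vulkaninfo summary text."""
    pattern = re.compile(r"^[ \t]*deviceName[ \t]*=[ \t]*(.+)$", re.MULTILINE)
    return [m.group(1).strip() for m in pattern.finditer(summary_text)]


def dgpu_visible(summary_text, needle):
    """Return True if any device name contains `needle` (case-insensitive)."""
    return any(needle.lower() in name.lower()
               for name in vulkan_device_names(summary_text))


if __name__ == "__main__":
    # The substring identifying the dGPU is passed on the command line,
    # since the exact device name depends on the Vulkan driver in use.
    needle = sys.argv[1]
    result = subprocess.run(["vulkaninfo", "--summary"],
                            capture_output=True, text=True, check=False)
    print("devices:", vulkan_device_names(result.stdout))
    print("dGPU visible:", dgpu_visible(result.stdout, needle))
```

Running it once before and once after a Vulkan application should show whether the dGPU has disappeared from the device list.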
I can confirm this issue on an MSI Delta with an RX6700M as well, which rules out a laptop-specific issue. Both laptops combine Zen3 with Navi cards. While it doesn't break GPU usage, it is a waste of power. The issue is fairly common, even with kernel 5.15.0-rc5. Sadly, I don't have any steps to reproduce.
Does this patch help? https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=60b78ed088ebe1a872ee1320b6c5ad6ee2c4bd9a
Kernel 5.14.14 already has it, but the issue is not fixed. I get mostly the same dmesg output, with slight differences:

[ 367.167527] amdgpu 0000:03:00.0: amdgpu: RAS: optional ras ta ucode is not available
[ 367.183399] amdgpu 0000:03:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[ 367.183406] amdgpu 0000:03:00.0: amdgpu: SMU is resuming...
[ 371.863082] amdgpu 0000:03:00.0: amdgpu: Failed to setup smc hw!
[ 371.863085] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62
[ 371.863147] amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_resume failed (-62).
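The two logs in this report differ in that the RunDcBtc timeout line is missing from the second one, while the final two error lines appear in both. A rough Python sketch for spotting the failure in a saved dmesg log could therefore key on those two common lines. The function name and the pattern list are illustrative, built only from the messages quoted in this report, and other kernels may word them differently. On Linux, error code -62 corresponds to ETIME.

```python
import re
import sys

# Patterns built from the error lines quoted in this report (assumption:
# other kernel versions print the same wording).
SMU_FAILURE_PATTERNS = [
    re.compile(r"resume of IP block <smu> failed (-\d+)"),
    re.compile(r"amdgpu_device_ip_resume failed \((-\d+)\)"),
]


def find_smu_resume_failures(dmesg_text):
    """Return (error_code, line) tuples for each matching dmesg line."""
    hits = []
    for line in dmesg_text.splitlines():
        for pattern in SMU_FAILURE_PATTERNS:
            match = pattern.search(line)
            if match:
                hits.append((int(match.group(1)), line.strip()))
                break  # one hit per line is enough
    return hits


if __name__ == "__main__":
    # Example: dmesg | python3 this_script.py
    for code, line in find_smu_resume_failures(sys.stdin.read()):
        print(code, line)
```

An empty result does not prove the dGPU is healthy, since the failure has also been seen without these lines in dmesg.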
Recent kernels in 5.15.* and 5.16.* fix the issue for me.