Bug 212655

Summary: AMDGPU crashes when resuming from suspend when amd_iommu=on
Product: Drivers
Reporter: felipejfc (fjfcavalcanti)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: NEW
Severity: high
CC: jurik.phys, xiehuanjun
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 5.11.10-1
Subsystem:
Regression: No
Bisected commit-id:

Description felipejfc 2021-04-12 17:22:06 UTC
So, my setup is the following:

OS: Manjaro Linux on kernel 5.11.10 (also tested on Pop!_OS, where the same thing happens)
Motherboard: MSI B450 Tomahawk
CPU: Ryzen 5 3600
GPU: Radeon RX 5700 (PowerColor Red Devil)

I tried multiple kernels from 5.9 to 5.12 and all had the same issue: if I turn on the IOMMU, AMDGPU crashes during resume and I have to hard-reset the system (I can't reboot it with shutdown -r, for example).
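
Since the crash is tied to the IOMMU setting, it can help to confirm which IOMMU parameters the running kernel was actually booted with. Below is a minimal sketch, not part of the original report: the function name `iommu_params` is hypothetical, and it only looks for the `amd_iommu` and `iommu` keys on a command-line string.

```python
def iommu_params(cmdline: str) -> dict:
    """Return the IOMMU-related parameters found on a kernel command line."""
    wanted = ("amd_iommu", "iommu")
    found = {}
    for token in cmdline.split():
        # Split "key=value"; a bare flag without "=" gets an empty value.
        key, _, value = token.partition("=")
        if key in wanted:
            found[key] = value
    return found


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as f:
        return f.read()
```

On an affected machine, `iommu_params(read_cmdline())` shows whether `amd_iommu=on` is really in effect for the boot being tested.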

What I see in dmesg after resume is the following:

[   36.492418] amdgpu 0000:28:00.0: amdgpu: failed send message:     RunBtc (58) 	param: 0x00000000 response 0xffffffc2
[   36.492420] amdgpu 0000:28:00.0: amdgpu: RunBtc failed!
[   36.492421] amdgpu 0000:28:00.0: amdgpu: Failed to setup smc hw!
[   36.492422] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62
[   36.492515] amdgpu 0000:28:00.0: amdgpu: amdgpu_device_ip_resume failed (-62).
[   36.492516] PM: dpm_run_callback(): pci_pm_resume+0x0/0xe0 returns -62
[   36.492520] PM: Device 0000:28:00.0 failed to resume async: error -62
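
For comparing logs across systems, the failing IP block and its error code can be pulled out of the `resume of IP block <...> failed` line. This is a rough sketch only: the regular expression is written against the line format shown above, and the name `failed_ip_blocks` is made up for illustration.

```python
import re

# Matches lines such as:
#   ... *ERROR* resume of IP block <smu> failed -62
RESUME_FAIL = re.compile(
    r"resume of IP block <(?P<block>[^>]+)> failed (?P<code>-?\d+)"
)


def failed_ip_blocks(log_text: str) -> list:
    """Return (ip_block, error_code) pairs for every failed IP block resume."""
    return [
        (m.group("block"), int(m.group("code")))
        for m in RESUME_FAIL.finditer(log_text)
    ]


sample = (
    "[   36.492421] amdgpu 0000:28:00.0: amdgpu: Failed to setup smc hw!\n"
    "[   36.492422] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* "
    "resume of IP block <smu> failed -62\n"
)
print(failed_ip_blocks(sample))  # [('smu', -62)]
```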
Comment 1 coxackie 2021-09-02 08:10:39 UTC
I have the same problem on an Alienware Aurora R10 with an AMD Radeon RX 5700 GPU, running Arch Linux, currently on kernel 5.13. I have not turned on the IOMMU, but the crash happens all the same. The monitor does not get any signal, so the display does not come back on after suspend.


logs:

```
amdgpu 0000:0d:00.0: amdgpu: message:          RunBtc (58)         param: 0x00000000 is timeout (no response)
amdgpu 0000:0d:00.0: amdgpu: RunBtc failed!
amdgpu 0000:0d:00.0: amdgpu: Failed to setup smc hw!
[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62
amdgpu 0000:0d:00.0: amdgpu: amdgpu_device_ip_resume failed (-62).
PM: dpm_run_callback(): pci_pm_resume+0x0/0xe0 returns -62
amdgpu 0000:0d:00.0: PM: failed to resume async: error -62
amdgpu: Move buffer fallback to memcpy unavailable
[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -19!
```
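
As a reading aid, the negative numbers in these logs are kernel error codes. On Linux, 62 is ETIME and 19 is ENODEV, so -62 is consistent with the "is timeout (no response)" line, and -19 with the device being unavailable afterwards. The small table below is hard-coded from the Linux errno definitions, because errno numbers differ between platforms; the helper name `describe` is just for illustration.

```python
# Linux errno values, fixed here so the lookup does not depend on the
# platform running the script. Only the codes seen in this report are listed.
LINUX_ERRNO = {
    19: ("ENODEV", "No such device"),
    62: ("ETIME", "Timer expired"),
}


def describe(code: int) -> str:
    """Map a (negative) kernel return code to its Linux errno name."""
    name, text = LINUX_ERRNO.get(abs(code), ("UNKNOWN", "not in this table"))
    return f"{code}: {name} ({text})"


for code in (-62, -19):
    print(describe(code))
```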
Comment 2 Yury O. 2023-06-04 11:08:41 UTC
I have the same problem, but the resume error occurs only sometimes, not every time.

OS: Debian GNU/Linux 11 (bullseye)
Kernel from backports (the native kernel also has this problem): 6.1.0-0.deb11.7-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.20-2~bpo11+1
GPU: RX5500XT (MSI).

I noticed that if the graphics card's RGB backlight goes out when going to sleep, then resume from the sleep state succeeds.

But if the backlight remains active when going to sleep, then the resume fails. In this case the systemd journal shows the following:

amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000003A SMN_C2PMSG_82:0x00000000
amdgpu 0000:03:00.0: amdgpu: RunBtc failed!
amdgpu 0000:03:00.0: amdgpu: Failed to setup smc hw! 
[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62 
amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_resume failed (-62).
amdgpu 0000:03:00.0: PM: dpm_run_callback(): pci_pm_resume+0x0/0xe0 returns -62 
amdgpu 0000:03:00.0: PM: failed to resume async: error -62
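
The register dump in the first line can be related to the other logs with a little arithmetic: 0x3A is 58 in decimal, the same number printed next to RunBtc in the earlier comments. The sketch below only decodes the hex values; reading SMN_C2PMSG_66 as the message index and SMN_C2PMSG_82 as the parameter is an assumption based on how the numbers line up, not something stated in this report.

```python
import re

# Matches "SMN_C2PMSG_<n>:0x<hex>" pairs as printed in the journal line.
REG = re.compile(r"(SMN_C2PMSG_\d+):(0x[0-9A-Fa-f]+)")


def decode_registers(line: str) -> dict:
    """Return the register names in the line mapped to integer values."""
    return {name: int(value, 16) for name, value in REG.findall(line)}


line = (
    "amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous "
    "command: SMN_C2PMSG_66:0x0000003A SMN_C2PMSG_82:0x00000000"
)
regs = decode_registers(line)
print(regs)  # {'SMN_C2PMSG_66': 58, 'SMN_C2PMSG_82': 0}

# Assumption: 58 is the RunBtc message index seen in the other logs.
print(regs["SMN_C2PMSG_66"] == 58)  # True
```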