Created attachment 284485
journalctl: amdgpu lockup on resume from sleep.
My system locks up when trying to wake from sleep (by opening the lid). The screen stays black and does not respond to keyboard or mouse input. I can still ssh in from another machine, and I have attached the output of journalctl -b. The log shows these errors scrolling continuously:
kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
kernel: amdgpu 0000:05:00.0: couldn't schedule ib on ring <gfx>
This is a Lenovo E585 laptop with an AMD R5 2500U APU.
If this is a regression, can you bisect?
The problem was introduced after v5.1 and before v5.2. It's very reproducible on v5.2 but may become less frequent as the bisect progresses. My attempts so far have driven me into the weeds, but I'm still trying.
It looks like another user reported the same issue here:
During my bisect, I was seeing visual artifacts without the lockup, so I believe they're separate issues.
I'm still trying to bisect the problem, but it's been challenging. Following the advice at https://01.org/blogs/rzhang/2015/best-practice-debug-linux-suspend/hibernate-issues, I turned on the initcall_debug and no_console_suspend boot options.
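To confirm that both options actually took effect after rebooting, one can check the command line the running kernel was booted with, which Linux exposes in /proc/cmdline. Below is a minimal Python sketch of that check; the helper name `missing_debug_options` is hypothetical, and it assumes the options are passed as plain whitespace-separated tokens.

```python
from __future__ import annotations

from pathlib import Path

# The two boot options recommended by the suspend/resume debugging guide.
DEBUG_OPTIONS = ("initcall_debug", "no_console_suspend")


def missing_debug_options(cmdline: str) -> list[str]:
    """Return the debug options that do not appear in a kernel command line.

    Hypothetical helper: tokens are compared by name only, ignoring any
    "=value" part, so "no_console_suspend=1" would also count as present.
    """
    present = {token.split("=", 1)[0] for token in cmdline.split()}
    return [opt for opt in DEBUG_OPTIONS if opt not in present]


if __name__ == "__main__":
    # /proc/cmdline holds the command line the running kernel was booted with.
    proc_cmdline = Path("/proc/cmdline")
    if proc_cmdline.exists():
        missing = missing_debug_options(proc_cmdline.read_text())
        print("missing:", ", ".join(missing) if missing else "none")
```

For example, `missing_debug_options("quiet splash initcall_debug")` returns `['no_console_suspend']`, which would mean the second option never reached the kernel.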
I then see the following messages in the boot log after bringing the system back up.
> Sep 15 17:36:39 mobile kernel: [drm] reserve 0x400000 from 0xf400c00000 for PSP TMR SIZE
> Sep 15 17:36:39 mobile kernel: [drm] psp command failed and response status is (0)
> Sep 15 17:36:39 mobile kernel: [drm:psp_hw_start [amdgpu]] *ERROR* PSP load tmr failed!
> Sep 15 17:36:39 mobile kernel: [drm:psp_resume [amdgpu]] *ERROR* PSP resume
> Sep 15 17:36:39 mobile kernel: [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* resume of IP block <psp> failed -22
> Sep 15 17:36:39 mobile kernel: [drm:amdgpu_device_resume [amdgpu]] *ERROR* amdgpu_device_ip_resume failed (-22).
> Sep 15 17:36:39 mobile kernel: PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -22
> Sep 15 17:36:39 mobile kernel: amdgpu 0000:05:00.0: pci_pm_resume+0x0/0x90 returned -22 after 19543535 usecs
> Sep 15 17:36:39 mobile kernel: PM: Device 0000:05:00.0 failed to resume async: error -22
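The -22 that propagates from the PSP block up to pci_pm_resume is a negated Linux errno value: 22 is EINVAL ("Invalid argument"). As a sketch for comparing logs across machines, the Python snippet below scans unwrapped journal text for the PM core's "failed to resume" summary line quoted above and translates the code with the standard `errno` module. The helper name `resume_failures` and the regular expression are assumptions based only on the lines quoted in this report, not on any documented log format.

```python
from __future__ import annotations

import errno
import re

# Matches the summary line seen in this report, e.g.
# "PM: Device 0000:05:00.0 failed to resume async: error -22"
RESUME_FAILURE = re.compile(
    r"PM: Device (?P<device>\S+) failed to resume(?: async)?: error (?P<code>-?\d+)"
)


def resume_failures(log_text: str) -> list[tuple[str, str]]:
    """Return (PCI device, errno name) for each failed-resume line in the log."""
    failures = []
    for match in RESUME_FAILURE.finditer(log_text):
        code = abs(int(match.group("code")))  # the kernel reports negative errno values
        name = errno.errorcode.get(code, f"errno {code}")
        failures.append((match.group("device"), name))
    return failures


if __name__ == "__main__":
    sample = (
        "Sep 15 17:36:39 mobile kernel: PM: Device 0000:05:00.0 "
        "failed to resume async: error -22\n"
    )
    print(resume_failures(sample))  # [('0000:05:00.0', 'EINVAL')]
```

Feeding it the full `journalctl -b` output from each affected laptop should make it quick to check whether they all fail on the same GPU device with the same EINVAL code.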
I've been able to narrow the problem down a bit.
The first commit where I get the scrolling amdgpu errors is
Unfortunately that's a merge commit.
One of the parents appears to be good.
The other parent
causes lockups that leave no journal messages after going to sleep. I've tried bisecting this back to v5.1-rc1 (good), but the lockups become much less consistent.
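Because the lockup becomes less consistent further back in history, a single clean suspend/resume cycle is weak evidence that a commit is good. One common mitigation is to mark a commit bad on the first lockup but require several clean cycles before marking it good. The Python sketch below shows that verdict rule inside a plain binary search; it assumes a linear list of commits (unlike `git bisect`, which also walks merge history), and `reproduces`, `bisect_flaky`, and `trials` are hypothetical stand-ins for a manual suspend/resume test. In practice each verdict would be fed to `git bisect good` or `git bisect bad` by hand.

```python
from typing import Callable, Sequence


def is_bad(commit: str, reproduces: Callable[[str], bool], trials: int) -> bool:
    """Treat a commit as bad as soon as any of `trials` suspend/resume attempts locks up."""
    return any(reproduces(commit) for _ in range(trials))


def bisect_flaky(
    commits: Sequence[str],
    reproduces: Callable[[str], bool],
    trials: int = 5,
) -> str:
    """Return the first bad commit, assuming commits[0] is good and commits[-1] is bad.

    A false "good" verdict (the lockup simply did not trigger) steers the
    search the wrong way, so more trials lower that risk at the cost of time.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid], reproduces, trials):
            bad = mid
        else:
            good = mid
    return commits[bad]
```

The trade-off is time: each "good" verdict costs `trials` suspend/resume cycles, which is why the number is worth tuning to how often the lockup reproduces at v5.2.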
I have the same problem on a ThinkPad X395 with a Ryzen 5 3500U. I have a downstream bug report at https://bugzilla.redhat.com/show_bug.cgi?id=1731915
Created attachment 285365
journalctl output on ThinkPad X395
Same for a ThinkPad E585 with a Ryzen 5 2500U:
kernel: [drm:psp_hw_start [amdgpu]] *ERROR* PSP load tmr failed!
kernel: [drm:psp_resume [amdgpu]] *ERROR* PSP resume failed
kernel: [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* resume of IP block <psp> failed -22
kernel: [drm:amdgpu_device_resume [amdgpu]] *ERROR* amdgpu_device_ip_resume failed (-22).
kernel: PM: dpm_run_callback(): pci_pm_resume+0x0/0x80 returns -22
kernel: PM: Device 0000:05:00.0 failed to resume async: error -22
kernel: acpi LNXPOWER:01: Turning OFF
kernel: OOM killer enabled.
kernel: Restarting tasks ...
The issue is also present on a Lenovo E585 with an "AMD Ryzen 7 2700U with Radeon Vega Mobile Gfx".
I can provide debugging information on request, availability permitting; I've omitted it for now because it is substantially similar to Vic Luo's. I'm not just posting this as a 'me too': I'll try to make myself available to help out in whatever ways I can.
Created attachment 289265
journalctl amdgpu fails on resume
Confirming the bug on an
AMD Ryzen 5 3500U with Radeon Vega Mobile Gfx.
Resume presumably fails (the fan is still active).
Setting iommu=pt and amd_iommu=on does not help, and neither does disabling pageflip.
The latest BIOS version from HP is installed.
Created attachment 289269
dmesg output when switching to console
Update: when I switch to a console (Ctrl+Alt+F4) before suspending, the PC wakes up again.
Switching directly back to Wayland freezes the PC.
If I instead restart GDM from the console, the computer can resume in Wayland again (it took two logins).
I've attached the dmesg output of suspend and resume in console mode.
Created attachment 289653
I am also affected by this issue on a Dell Inspiron 14 2-in-1 7405 with a Ryzen 7 4700U. I am also willing to help debug and test, but unfortunately I cannot help bisect, because amdgpu did not support my GPU at all when the regression occurred.