Bug 204611
Summary: | amdgpu error scheduling IBs when waking from sleep | ||
---|---|---|---|
Product: | Drivers | Reporter: | tones111 |
Component: | Video(DRI - non Intel) | Assignee: | drivers_video-dri |
Status: | NEW | ||
Severity: | normal | CC: | aeon.descriptor, alexdeucher, bastian, bjo, carmen, danielrparks, dario.tislar, sevenever, thierry.monnier5, vicluo96 |
Priority: | P1 | ||
Hardware: | x86-64 | ||
OS: | Linux | ||
Kernel Version: | 5.2.9 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: | journalctl: amdgpu lockup on resume from sleep; journalctl output on Thinkpad X395; journalctl amdgpu fails on resume; dmesg output when switching to console; 4700u journal; attachment-29192-0.html; attachment-26652-0.html | ||
If this is a regression, can you bisect?

The problem was introduced after v5.1 and before v5.2. It's very reproducible on v5.2, but it might become less frequent as the bisect progresses. Attempts have driven me into the weeds, but I'm still trying.

It looks like another user reported the same issue here: https://bugzilla.kernel.org/show_bug.cgi?id=204227

During my bisect I was seeing visual artifacts without the lockup, so I believe they're separate issues.

I'm still working on bisecting the problem, but it's been challenging. Following the advice at https://01.org/blogs/rzhang/2015/best-practice-debug-linux-suspend/hibernate-issues, I turned on the initcall_debug and no_console_suspend boot options. After bringing the system back up, I see the following messages in the boot log:

> Sep 15 17:36:39 mobile kernel: [drm] reserve 0x400000 from 0xf400c00000 for PSP TMR SIZE
> ...
> Sep 15 17:36:39 mobile kernel: [drm] psp command failed and response status is (0)
> Sep 15 17:36:39 mobile kernel: [drm:psp_hw_start [amdgpu]] *ERROR* PSP load tmr failed!
> Sep 15 17:36:39 mobile kernel: [drm:psp_resume [amdgpu]] *ERROR* PSP resume failed
> Sep 15 17:36:39 mobile kernel: [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* resume of IP block <psp> failed -22
> Sep 15 17:36:39 mobile kernel: [drm:amdgpu_device_resume [amdgpu]] *ERROR* amdgpu_device_ip_resume failed (-22).
> Sep 15 17:36:39 mobile kernel: PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -22
> Sep 15 17:36:39 mobile kernel: amdgpu 0000:05:00.0: pci_pm_resume+0x0/0x90 returned -22 after 19543535 usecs
> Sep 15 17:36:39 mobile kernel: PM: Device 0000:05:00.0 failed to resume async: error -22

I've been able to narrow the problem down a bit. The first commit where I get the scrolling amdgpu errors is 4f8b49092c37cf0c87c43bb2698d43c71cf0e4e5. Unfortunately, that's a merge commit. One of its parents, ceacbc0e145e3b27d8b12eecb881f9d87702765a, appears to be good. The other parent, 5dd6c49339126c2c8df2179041373222362d6e49, causes lockups that leave no journal messages after going to sleep. I've tried bisecting this back to v5.1-rc1 (good), but the lockups reproduce much less consistently.

I have the same problem on a Thinkpad X395, Ryzen 5 3500U. I have a downstream bug report at https://bugzilla.redhat.com/show_bug.cgi?id=1731915

Created attachment 285365 [details]
journalctl output on Thinkpad X395
Same for Thinkpad E585 with Ryzen 5 2500U.

kernel: [drm:psp_hw_start [amdgpu]] *ERROR* PSP load tmr failed!
kernel: [drm:psp_resume [amdgpu]] *ERROR* PSP resume failed
kernel: [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* resume of IP block <psp> failed -22
kernel: [drm:amdgpu_device_resume [amdgpu]] *ERROR* amdgpu_device_ip_resume failed (-22).
kernel: PM: dpm_run_callback(): pci_pm_resume+0x0/0x80 returns -22
kernel: PM: Device 0000:05:00.0 failed to resume async: error -22
kernel: acpi LNXPOWER:01: Turning OFF
kernel: OOM killer enabled.
kernel: Restarting tasks ...

Issue also present on Lenovo E585 -> "AMD Ryzen 7 2700U with Radeon Vega Mobile Gfx". I can provide debugging information upon request, availability permitting; I've omitted it for now, as it is substantially similar to Vic Luo's. I'm not just posting this as a "me too"; I'll try to make myself available to help out in whatever ways I can.

Created attachment 289265 [details]
journalctl amdgpu fails on resume
Confirming the bug on:
AMD Ryzen 5 3500U with Radeon Vega Mobile Gfx
Fedora 32
Kernel: 5.6.14-300.fc32.x86_64
Resume presumably fails (the fan is still active).
Setting iommu=pt and amd_iommu=on does not help, and disabling pageflip does not help either.
The latest BIOS version from HP is installed.
Created attachment 289269 [details]
dmesg output when switching to console
Update: when I switch to a console (Ctrl+Alt+F4) before suspending, the PC wakes up again.
Switching directly back to Wayland freezes the PC.
When I instead restart GDM from the console, the computer can resume again in Wayland (it took two logins).
Attached is the dmesg output of suspend and resume in console mode.
Created attachment 289653 [details]
4700u journal
I am also affected by this issue on a Dell Inspiron 14 2-in-1 7405 with a Ryzen 7 4700U. I am also willing to help debug and test, but unfortunately I cannot help bisect, because amdgpu did not support my GPU at all when the regression occurred.
I haven't seen problems resuming from sleep in some time. Is anyone still experiencing this problem on newer kernels? If not, I'd like to propose that this issue be marked as resolved.

Created attachment 300344 [details]
attachment-29192-0.html

I still have this issue, but I'm using the latest Ubuntu 20.04 patched kernels, so I don't know how "latest" that is. What kernel versions work? I could try them out.

On Sat, Jan 29, 2022, 2:55 PM <bugzilla-daemon@kernel.org> wrote:
> --- Comment #12 from tones111@hotmail.com ---

Created attachment 300346 [details]
attachment-26652-0.html

No problems here for a long time either. I think it's solved.

On Sat, Jan 29, 2022 at 23:59, <bugzilla-daemon@kernel.org> wrote:
> --- Comment #13 from aeon.descriptor@gmail.com ---
Created attachment 284485 [details]
journalctl: amdgpu lockup on resume from sleep.

My system locks up when trying to wake from sleep (opening the lid). The screen remains black and is unresponsive to keyboard/mouse input. I'm able to ssh in from another machine and have attached the output of journalctl -b. The log shows scrolling errors:

kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
kernel: amdgpu 0000:05:00.0: couldn't schedule ib on ring <gfx>

This is a Lenovo E585 laptop with an AMD R5 2500U APU.
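Two distinct symptoms appear in the logs quoted in this report: the scrolling "Error scheduling IBs (-22)" messages and the PSP firmware resume failure ending in -22. When comparing journals from different machines or bisect steps, a small script can count which of these signatures a saved log contains. The following is a minimal sketch under stated assumptions: the regexes are copied from the messages quoted here, and the category names `ib_scheduling` and `psp_resume`, the function `classify_resume_log`, and the file names in the comment are hypothetical. It only matches text and does not diagnose the driver.

```python
from __future__ import annotations

import re
import sys

# Failure signatures copied from the kernel messages quoted in this report.
# The two category names are hypothetical labels chosen for this sketch.
SIGNATURES = {
    # Scrolling errors seen after a lockup on resume (original report).
    "ib_scheduling": [
        re.compile(r"Error scheduling IBs \(-22\)"),
        re.compile(r"couldn't schedule ib on ring <gfx>"),
    ],
    # PSP firmware resume failure seen with initcall_debug/no_console_suspend.
    "psp_resume": [
        re.compile(r"PSP load tmr failed"),
        re.compile(r"PSP resume failed"),
        re.compile(r"resume of IP block <psp> failed -22"),
    ],
}


def classify_resume_log(text: str) -> dict[str, int]:
    """Count the lines of `text` that match each failure signature."""
    counts = {name: 0 for name in SIGNATURES}
    for line in text.splitlines():
        for name, patterns in SIGNATURES.items():
            # A line counts at most once per category, even if several
            # patterns in that category match it.
            if any(p.search(line) for p in patterns):
                counts[name] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example (hypothetical names): journalctl -b > resume.log
    #                               python3 scan.py resume.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        print(classify_resume_log(f.read()))
```

Run without arguments, the script does nothing. A zero count does not rule out the bug: the lockups seen with parent 5dd6c49339126c2c8df2179041373222362d6e49 left no journal messages at all.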