Since Linux 6.0.18, hibernation no longer works; 6.0.17 was working fine. When running systemctl start hibernate.target, the screen turns black, but the system doesn't power down. The same happens with "platform" and "shutdown" as HibernateMode in /etc/systemd/sleep.conf. After a forced reboot, GRUB skips the boot menu (as it normally does when resuming from hibernation), but the system just boots up fresh instead of restoring the previous state.

= System =
Model: HP EliteBook 845 G8 (notebook)
CPU+GPU: Ryzen 5650U incl. Radeon GPU
OS: openSUSE-15.4
Kernel: compiled from kernel.org

The HP EliteBook 845 G8 uses s0ix/s2idle.

Sadly, I don't know how to provide helpful logs. After reboot there's nothing helpful in /var/log/messages, just this:

2023-01-11T18:45:51.208584+01:00 myhost systemd[1]: Reached target Sleep.
2023-01-11T18:45:51.224615+01:00 myhost systemd[1]: Starting Hibernate...
2023-01-11T18:45:51.253804+01:00 myhost systemd-sleep[1998]: INFO: running /usr/lib/systemd/system-sleep/grub2.sleep for hibernate
2023-01-11T18:45:51.253875+01:00 myhost systemd-sleep[1998]: INFO: Running prepare-grub ..
2023-01-11T18:45:51.322933+01:00 myhost systemd-sleep[1998]: running kernel is grub menu entry openSUSE Leap 15.4 (vmlinuz-6.0.18-v6.0.18-myhost)
2023-01-11T18:45:51.323010+01:00 myhost systemd-sleep[1998]: preparing boot-loader: selecting entry openSUSE Leap 15.4, kernel /boot/6.0.18-v6.0.18-myhost
2023-01-11T18:45:51.331546+01:00 myhost systemd-sleep[1998]: running /usr/sbin/grub2-once "openSUSE Leap 15.4"
2023-01-11T18:45:51.585503+01:00 myhost systemd-sleep[1998]: time needed for sync: 0.0 seconds, time needed for grub: 0.2 seconds.
2023-01-11T18:45:51.585585+01:00 myhost systemd-sleep[1998]: INFO: Done.
2023-01-11T18:45:51.586273+01:00 myhost systemd-sleep[1996]: Entering sleep state 'hibernate'...
2023-01-11T18:45:51.588025+01:00 myhost kernel: [ 39.640758][ T1996] PM: hibernation: hibernation entry
I've narrowed the problem down to somewhere between 6fc4c0cd9 (last known good) and v6.0.18. Linux 6.1.4 is fine (I just can't use it productively because of https://gitlab.freedesktop.org/drm/amd/-/issues/2171).
If it's between 6fc4c0cd9 and v6.0.18, a bisect would be best, but my first educated guess would be:

306df163069e ("drm/amdgpu: make display pinning more flexible (v2)")

If you revert that, does it start working again? It's peculiar that 6.1.4 is fine; that fix is also in 6.1.4, but we might need something else.

> (i just can't use it productively because of
> https://gitlab.freedesktop.org/drm/amd/-/issues/2171 )

Yeah; hopefully that's fixed soon.
Perfect guess! Indeed, 306df163069e is broken and its predecessor is fine. Reverting 306df163069e on v6.0.18 also made the problem disappear.

Last good: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=306df163069e78160e7a534b892c5cd6fefdd537^
First bad: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=306df163069e78160e7a534b892c5cd6fefdd537

Just wanted to say THANK YOU for all your help in the last couple of months!!! Both of my Ryzen notebooks wouldn't work as well as they do without you and Alex.
> Perfect guess!

OK... so we need to find out why this works in 6.1.y and not in 6.0.y. There are some fairly severe bugs it fixed.

Is it a 100% failure rate on 6.0.y? Since you mentioned that you couldn't effectively use 6.1.y because of the MST issue, are you only finding it on 6.0.y when connected to a dock, or with anything else unique?

> Sadly I don't know how to provide helpful logs. After reboot there's nothing
> helpful in /var/log/messages

Can you check /var/lib/systemd/pstore? Perhaps there was a kernel crash that got saved into NVRAM and restored by systemd on the next boot.

> Just wanted to say THANK YOU for all your help in the last couple of month!!!

:)
Can you attach your dmesg output?
Created attachment 303585
6.1.4 dmesg after hibernation

(In reply to Mario Limonciello (AMD) from comment #4)
> [...]
> Is it 100% failure rate on 6.0.y?

Yes.

> Since you mentioned that you couldn't effectively use 6.1.y because of the
> MST issue, are you only finding it on 6.0.y when connected to a dock or
> anything else unique?

No. It happens with the dock, with plain USB-C power (no dock), and on battery.

> > Sadly I don't know how to provide helpful logs. After reboot there's nothing
> > helpful in /var/log/messages
>
> Can you check /var/lib/systemd/pstore? Perhaps there was a kernel crash
> that got saved into NVRAM and restored by systemd on the next boot.

Sadly, that doesn't exist. There are some files in /sys/fs/pstore/, but nothing from today.

(In reply to Alex Deucher from comment #5)
> Can you attach your dmesg output?

I don't know how to get logs (including dmesg) when hibernation has failed. As I said, after reboot there's nothing new in /var/log/messages. Instead, I attached dmesg after hibernation with v6.1.4. Is that helpful?

Another thing: is it important that my swap is a file (/swap) on an ext4 partition inside a LUKS partition?
Do you still have the problem with:
CONFIG_DRM_FBDEV_EMULATION=n
in your .config?

Does reverting a6250bdb6c4677ee77d699b338e077b900f94c0c fix it?
(In reply to Alex Deucher from comment #7)
> do you still have the problem with:
> CONFIG_DRM_FBDEV_EMULATION=n
> in your .config?

The problem unfortunately still exists with CONFIG_DRM_FBDEV_EMULATION=n (and I get a black screen on the virtual console).

> Does reverting a6250bdb6c4677ee77d699b338e077b900f94c0c fix it?

No, that doesn't help either. I'm sorry. Anything else I can try?
FWIW, I just wanted to add this to the regression tracking, but 6.0.y is EOL now and it seems 6.1.y works. Greg might do another fixup release, but maybe investigating this further is not worth it.
It looks like the display issue with Linux 6.1.y is on a good track. Hibernation still works fine with the latest revert commit by Mario & Wayne, which I tested here:
https://gitlab.freedesktop.org/drm/amd/-/issues/2171#note_1720281

So from my point of view this bug isn't relevant anymore, at least as long as it doesn't reappear on newer kernels.
Just for the record, in case someone cares or lands here some time in the future: there is another report about hibernation problems with Ryzen CPUs in 6.0.18 here:
https://lore.kernel.org/all/2d59ed2b-ba8f-6695-9764-fd3b109acd4c@mailbox.org/
A bisection result is included (drm/amdgpu: make display pinning more flexible (v2)).
(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #9)
> FWIW, I just wanted to add this to the regression tracking, but 6.0.y is EOL
> now; and it seems 6.1.y works. Greg might do another fixup release, but
> maybe investigating this further is not worth it.

I beg to differ. Longterm kernels 5.15.87/88, and probably all other LTS kernels to which commit 306df163069e78160e7a534b892c5cd6fefdd537 has been backported, are also affected.

As hibernate is a basic, reliable feature and a hard reset (as in my case) may result in data loss, I see only two options: either revert the commit in the longterm kernels or try to find out quickly what would make it work for them.

The diff between 6.0.18 and 6.1.4 (where it was introduced) shows that only 86 files in /drivers/gpu/drm/amd/amdgpu have been modified. Probably only a few of them are relevant here, so for the experts it should not be too hard to figure out a solution.
Can we please confirm it's actually broken in 5.15.y before going through that effort?
> and a hard reset (as in my case) may result

Sorry, I meant specifically: please confirm that reverting the backported commit fixes your case. If so, then yes, we should see if there is anything else obvious to backport to help it.
(In reply to Mario Limonciello (AMD) from comment #13)
> Can we please confirm it's actually broken in 5.15.y before going through
> that effort?

I have tested this with 5.15.87/88. Error messages and symptoms were the same as with 6.0.18. Spared me the bisecting this time, though.
(In reply to Rainer Fiebig from comment #15)
> I have tested this with 5.15.87/88. Error messages and symptoms were the
> same as with 6.0.18. Spared me the bisecting this time, though.

Can you verify that reverting the change in 5.15.y fixes it?
(In reply to Alex Deucher from comment #16)
> Can you verify that reverting the change in 5.15.y fixes it?

Will do it.
(In reply to Alex Deucher from comment #16)
> Can you verify that reverting the change in 5.15.y fixes it?

Alright, I can confirm that reverting commit 306df163069e78160e7a534b892c5cd6fefdd537 ("drm/amdgpu: make display pinning more flexible (v2)") solves the problem with hibernate and resume in 5.15.88. To me it seems that this patch cannot be backported in isolation.
Assuming it's within amdgpu and not the DRM helpers, it's still ~800 commits to sift through. Even though 6.0.y is EOL now, I think it would be easier to find the missing commit(s) to backport from there. We can worry about 5.15.y after that.

Can you see if this series from 6.1 on top of 6.0.19 helps?

https://patchwork.freedesktop.org/series/106027/
(In reply to Mario Limonciello (AMD) from comment #19)
> Can you see if this series from 6.1 on top of 6.0.19 helps?
>
> https://patchwork.freedesktop.org/series/106027/

Yes, but may take a while.
(In reply to Mario Limonciello (AMD) from comment #19)
> Can you see if this series from 6.1 on top of 6.0.19 helps?
>
> https://patchwork.freedesktop.org/series/106027/

No, those patches didn't help. Hibernation was always fine, but resume always failed in the same way as described in my original mail to "stable". Note that I'm not going to test 800 commits in this manner. ;)

So long!
Thanks for trying. Another idea that might be feasible to do to identify it is a proper bisect between v6.0 and v6.1 but manually applying '306df163069e78160e7a534b892c5cd6fefdd537 ("drm/amdgpu: make display pinning more flexible (v2)")' on each test point.
I'll just revert it. It is more important for kernels with the drm_buddy changes.
(In reply to Alex Deucher from comment #23)
> I'll just revert it. It is more important for kernels with the
> drm_buddy changes.

That's the right thing to do for now, I guess. If I can find a way to identify the commit(s) between 6.0.19 and 6.1 that fix the problem, I'll report it here.

Thanks.
Rainer
(In reply to Alex Deucher from comment #23)
> I'll just revert it. It is more important for kernels with the
> drm_buddy changes.

Would the following be equivalent to what you intended with your commit? It looks a bit awkward, but hibernate/resume work with it on 6.0.19 (and a Ryzen 5600G):

    uint32_t amdgpu_bo_get_preferred_domain(struct amdgpu_device *adev,
                                            uint32_t domain)
    {
            if (domain == (AMDGPU_GEM_DOMAIN_VRAM | AMDGPU_GEM_DOMAIN_GTT)) {
                    domain = AMDGPU_GEM_DOMAIN_VRAM;
                    if ((adev->asic_type == CHIP_CARRIZO) ||
                        (adev->asic_type == CHIP_STONEY)) {
                            if (adev->gmc.real_vram_size <= AMDGPU_SG_THRESHOLD)
                                    domain = AMDGPU_GEM_DOMAIN_GTT;
                    }
            }
            return domain;
    }

Let me know whether this is worth pursuing. I could then test it with 5.15.88 and 6.1.6.
(In reply to Rainer Fiebig from comment #25)
> Would the following be equivalent to what you intended with your commit?
> [...]

Nope. What my patch does is allow display buffers to be in either system memory (GTT) or carve-out (VRAM), depending on what is available. Without the patch, the driver picks either VRAM or GTT depending on how much VRAM is available on the system. This can lead to memory exhaustion in some cases with multiple high-resolution monitors, depending on memory fragmentation.

What your patch does is just always use VRAM unless the chip is Carrizo or Stoney. So it is effectively just reverting the commit (depending on how much VRAM your system has).
(In reply to Alex Deucher from comment #26)
> Nope. What my patch does is allow display buffers to be in either system
> memory (GTT) or carve out (VRAM) depending on what is available.
> [...]

I see. Thanks a lot for the explanation!