Created attachment 287441 [details]
dmesg output for resume-from-suspend, linux 5.5.4

OS: Arch Linux
GPU: (MSI) Radeon R9 380

On `systemctl suspend` and subsequent resume, the monitors display "no signal". The machine is responsive, commands can be typed on the keyboard, SSH'ing is also possible.

Somewhat unexpectedly, resume from hibernation works fine (i.e. there is signal).

This started happening a few weeks ago, seemingly when `linux` v5.5.2 was installed. Was also present on v5.5.3 and v5.5.4 (current).

`linux-lts` v5.4.20 does not exhibit this behaviour; it's a regression.
Created attachment 287443 [details] dmesg output for resume-from-suspend, linux 5.4.20
Created attachment 287445 [details] dmesg output for resume-from-hibernate, linux 5.5.4
Created attachment 287447 [details] lspci output
Can you bisect?
I'm not able to bisect at the moment. Will try by the end of the workweek.

-----

User `muncrief` has recently reported something similar in a different bug report, here:

https://bugzilla.kernel.org/show_bug.cgi?id=204241#c48

... for Radeon R9 390, ever since linux 5.5-rc1. They were advised to open a new issue, but a search on bugzilla shows they never did.
I have the same graphics card and the same problem. Do you need additional dmesg outputs from kernel 5.4.20 and 5.5.4?

I don't know if this helps but I diffed my `amdgpu` filtered dmesg outputs:

```
--- 5.4.20-1-lts_amdgpu_wo_uptime.txt 2020-02-18 18:38:07.393633705 +0100
+++ 5.5.4-arch1-1_amdgpu_wo_uptime.txt 2020-02-18 18:38:32.714488497 +0100
@@ -1,7 +1,4 @@
 [drm] amdgpu kernel modesetting enabled.
-amdgpu 0000:01:00.0: remove_conflicting_pci_framebuffers: bar 0: 0xd0000000 -> 0xdfffffff
-amdgpu 0000:01:00.0: remove_conflicting_pci_framebuffers: bar 2: 0xe0000000 -> 0xe01fffff
-amdgpu 0000:01:00.0: remove_conflicting_pci_framebuffers: bar 5: 0xefe00000 -> 0xefe3ffff
 fb0: switching to amdgpudrmfb from VESA VGA
 amdgpu 0000:01:00.0: vgaarb: deactivate vga console
 amdgpu 0000:01:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
@@ -9,7 +6,9 @@
 [drm] amdgpu: 4096M of VRAM memory ready
 [drm] amdgpu: 4096M of GTT memory ready.
 amdgpu: [powerplay] hwmgr_sw_init smu backed is tonga_smu
+[drm:dm_helpers_parse_edid_caps [amdgpu]] *ERROR* Couldn't read SADs: -2
 fbcon: amdgpudrmfb (fb0) is primary device
 amdgpu 0000:01:00.0: fb0: amdgpudrmfb frame buffer device
-[drm] Initialized amdgpu 3.35.0 20150101 for 0000:01:00.0 on minor 0
+[drm] Initialized amdgpu 3.36.0 20150101 for 0000:01:00.0 on minor 0
 snd_hda_intel 0000:01:00.1: bound 0000:01:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
+[drm:dm_helpers_parse_edid_caps [amdgpu]] *ERROR* Couldn't read SADs: -2
```

The last line from the diff `[drm:dm_helpers_parse_edid_caps [amdgpu]] *ERROR* Couldn't read SADs: -2` happens after resuming (with blank screen).
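For anyone who wants to produce a comparable diff from their own logs, here is a minimal sketch of the "filter for `amdgpu`, drop the uptime prefix, then diff" approach described above. It is not the script used in the comment; the function names and the default file labels are hypothetical, and it assumes the default dmesg format where each line starts with a bracketed uptime such as `[    1.234567]`.

```python
import difflib
import re

# Matches the "[   12.345678] " uptime prefix that dmesg prints by default.
# Tags such as "[drm]" are left alone because they contain no digits.
_UPTIME = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def normalize(log_text, keyword="amdgpu"):
    """Strip uptime prefixes and keep only lines that mention `keyword`."""
    kept = []
    for raw in log_text.splitlines():
        line = _UPTIME.sub("", raw)
        if keyword in line:
            kept.append(line)
    return kept


def diff_logs(old_text, new_text, old_name="old.txt", new_name="new.txt"):
    """Return a unified diff (list of lines) of the two normalized logs."""
    return list(
        difflib.unified_diff(
            normalize(old_text),
            normalize(new_text),
            fromfile=old_name,
            tofile=new_name,
            lineterm="",
        )
    )
```

Feeding it the text of two saved `dmesg` captures, e.g. `print("\n".join(diff_logs(open("a.txt").read(), open("b.txt").read())))`, gives output in the same shape as the diff above. Stripping the uptime first matters because otherwise nearly every line differs between boots.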
`git bisect log` output at: https://gist.github.com/veox/36aeb77acfbcaea9c4ba1cc70052329a

Had to `skip` a few because of system instability on v5.5.4 (cause unknown, likely unrelated to this bug); switched to v5.4.20 halfway-in to avoid.

Result as follows (e-mails changed).

1ea8751bd28d1ec2b36a56ec6bc1ac28903d09b4 is the first bad commit
commit 1ea8751bd28d1ec2b36a56ec6bc1ac28903d09b4
Author: Noah Abradjian <spam@gmail.com>
Date:   Fri Sep 27 16:30:57 2019 -0400

    drm/amd/display: Make clk mgr the only dto update point

    [Why]

    * Clk Mgr DTO update point did not cover all needed updates, as it included a
      check for plane_state which does not exist yet when the updater is called on
      driver startup
    * This resulted in another update path in the pipe programming sequence, based
      on a dppclk update flag
    * However, this alternate path allowed for stray DTO updates, some of which would
      occur in the wrong order during dppclk lowering and cause underflow

    [How]

    * Remove plane_state check and use of plane_res.dpp->inst, getting rid
      of sequence dependencies (this results in extra dto programming for unused
      pipes but that doesn't cause issues and is a small cost)
    * Allow DTOs to be updated even if global clock is equal, to account for
      edge case exposed by diags tests
    * Remove update_dpp_dto call in pipe programming sequence (leave update to
      dppclk_control there, as that update is necessary and shouldn't occur in clk
      mgr)
    * Remove call to optimize_bandwidth when committing state, as it is not needed
      and resulted in sporadic underflows even with other fixes in place

    Signed-off-by: Noah Abradjian <spam@gmail.com>
    Reviewed-by: Jun Lei <spam@gmail.com>
    Acked-by: Leo Li <spam@gmail.com>
    Signed-off-by: Alex Deucher <spam@gmail.com>

 .../gpu/drm/amd/display/dc/clk_mgr/dcn20/dcn20_clk_mgr.c | 14 +++++++++-----
 drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c | 3 ++-
 drivers/gpu/drm/amd/display/dc/core/dc.c | 4 ----
 drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c | 8 +-------
 4 files changed, 12 insertions(+), 17 deletions(-)
The `*ERROR* Couldn't read SADs: -2` message in my and Thomas' logs is unrelated to the issue, I believe, and pertains to sound (HDMI sound?..). The same message string appears in drivers/gpu/drm/radeon/radeon_audio.c, although the log prefix here points at `dm_helpers_parse_edid_caps` in `amdgpu`; SADs are the Short Audio Descriptors read from the monitor's EDID.

Anyway, I've seen the error on resume-from-suspend for commits that managed to "signal up" properly in the bisect above, as well as the "blank screen" cases.
Some of the commits that got skipped in that `git bisect log` of mine actually come before the one above when viewing `git log`. :/ Guess I'll try the bisect again in coming days.
That came in negative. Looks like it's 1ea8751bd28d1ec2b36a56ec6bc1ac28903d09b4 indeed.
I can confirm Noel's finding. Reverting 1ea8751bd28d1ec2b36a56ec6bc1ac28903d09b4 brings back the screen output after resume for me as well.
This issue is also present on a discrete R9 290, also on Arch Linux. So it seems the issue affects any 200 or 300 series card.

Likewise, there's a thread on Reddit about this issue as well: https://www.reddit.com/r/archlinux/comments/f7oti1/issue_with_resume_from_suspend_black_backlit/
I have the same problem. My graphics card is an AMD R9 285, and since a few kernel updates ago resume from suspend doesn't work: the screen stays black, but the system and keyboard respond fine.

My distro is Anarchy Linux with the KDE desktop and SDDM as display manager.

Regards.
Looks like this has been corrected in 5.6... is there any intent to include the fix in any 5.5 kernel or will we just have to wait for 5.6?
(In reply to Joe Ramsey from comment #14)
> Looks like this has been corrected in 5.6... is there any intent to include
> the fix in any 5.5 kernel or will we just have to wait for 5.6?

Can you identify the fix?
(In reply to Alex Deucher from comment #15)
> (In reply to Joe Ramsey from comment #14)
> > Looks like this has been corrected in 5.6... is there any intent to include
> > the fix in any 5.5 kernel or will we just have to wait for 5.6?
>
> Can you identify the fix?

If I understood Noel Maersk's and Thomas Frank's posts correctly, reverting 1ea8751bd28d1ec2b36a56ec6bc1ac28903d09b4 resolves the issue. The Reddit thread that was referenced (https://www.reddit.com/r/archlinux/comments/f7oti1/issue_with_resume_from_suspend_black_backlit/) seems to indicate that it's resolved in 5.6. I was wondering whether the fix that was applied to 5.6 would also be applied to 5.5. Could be I've completely misunderstood things.

I'm running Slackware and have been using the -current kernel packages (currently at 5.4.25), but the kernel modules for virtualbox don't seem to be compiling under that kernel for some reason. I tried several of the recent 5.5 releases (5.5.8-5.5.10), and can get the virtualbox kernel modules to compile under them, but they all seem to have this bug. Was hoping to get one kernel that would allow my laptop to suspend and also compile the virtualbox modules. :^)
If you could verify that 5.6 works for you, you could bisect to see what commit fixed it.
(In reply to Alex Deucher from comment #17)
> If you could verify that 5.6 works for you, you could bisect to see what
> commit fixed it.

OK, I'm about to reveal my ignorance. I just got a chance to compile 5.6-rc7 to confirm that resume from suspend worked (it did), but I have no idea how to bisect. Googled for it and it looks like I need to be using git, but I'm just downloading the tarball from kernel.org to compile my kernel.

Is this even worth messing with given that it looks like we may have a stable 5.6 in the near future?
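To illustrate what bisecting for a fix means, here is a simplified sketch of the underlying binary search. It is not how `git bisect` is implemented: it assumes a linear list of commits in which the behaviour flips exactly once from "old" (still broken) to "new" (fixed), and it ignores merges and skipped commits. The function name and the commit labels are hypothetical. With real git, the equivalent is to mark each tested build with `git bisect old` or `git bisect new` (alternatives to `good`/`bad` that read better when hunting a fix).

```python
def first_new_commit(commits, is_new):
    """Find the first commit for which is_new(commit) is True.

    Assumes commits[0] is known "old", commits[-1] is known "new", and the
    property flips exactly once along the list. Returns (commit, steps),
    where steps is the number of commits that had to be built and tested.
    """
    lo, hi = 0, len(commits) - 1  # lo: last known old, hi: first known new
    steps = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        steps += 1  # one build + suspend/resume test per iteration
        if is_new(commits[mid]):
            hi = mid  # fix is at mid or earlier
        else:
            lo = mid  # fix is after mid
    return commits[hi], steps


if __name__ == "__main__":
    # Hypothetical history of 1000 commits where the fix lands at index 700.
    history = [f"c{i}" for i in range(1000)]
    fixed_from = 700
    commit, steps = first_new_commit(history, lambda c: int(c[1:]) >= fixed_from)
    print(commit, steps)
```

Each step halves the remaining range, so the number of builds grows with the base-2 logarithm of the range size; a range on the order of 2^15 (about 32,768) commits needs about 15 steps. In practice, commits that cannot be tested at all have to be skipped, which can add many extra steps.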
I can confirm that this issue is solved in the 5.6 kernel, but sadly I will continue using the lts kernel because I still have problems with my webcam's fps and microphone's bitrate on other kernels.
I'll do a bisect to identify the fix. Roughly 15 steps.
Created attachment 288351 [details] git bisect log to find the culprit Attaching original git bisect log (was posted to github previously).
(In reply to Duncan from comment #19)
> I can confirm that this issue was solved on 5.6 kernel, but sadly I will
> continue using lts kernel because I still have problems with my webcam's fps
> and microphone's bitrate on others kernels.

My webcam and microphone issues seem to be resolved in this new kernel, but I will keep an lts kernel in case I ever have to use it again.
Created attachment 288445 [details]
git bisect log to find the fix (failed)

After >45 steps, I gave up the bisect. There's a different bug that prevents the initramfs from loading at all, making it impossible to check if the issue-at-hand is still present.

After having `skip`ped the first time this happened, I made the bad call of "maybe I'll tag this condition `old`, too"; I did this just once, but it might've had a negative effect on the outcome.

I'm attaching the bisect log anyway.
(In reply to Alex Deucher from comment #17)
> If you could verify that 5.6 works for you, you could bisect to see what
> commit fixed it.

I'm not 100% sure about the bug-closing process for the kernel; is this strictly required to mark the bug as resolved? The issue is no longer there, and the fix seems difficult to pin down. :(
Go ahead and close it. You can always open a new one if you see further issues.
Will close as resolved shortly.

I did run a second bisect successfully, showing:

f2988e67144a263e33aa3b916457bf3095288c94 is the first new commit
commit f2988e67144a263e33aa3b916457bf3095288c94
Author: Yongqiang Sun <yongqiang.sun@amd.com>
Date:   Fri Oct 18 18:24:59 2019 -0400

    drm/amd/display: optimize bandwidth after commit streams.

    [Why]
    System is unable to enter S0i3 due to DISPLAY_OFF_MASK not asserted
    in SMU.

    [How]
    Optimized bandwidth should be called paired and to resolve unplug
    display underflow issue, optimize bandwidth after commit streams is
    moved to next page flip, in case of S0i3, there is a change for no
    flip coming causing display count is 1 in SMU side.
    Add optimize bandwidth after commit stream.

    Signed-off-by: Yongqiang Sun <yongqiang.sun@amd.com>
    Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
    Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

 drivers/gpu/drm/amd/display/dc/core/dc.c | 4 ++++
 1 file changed, 4 insertions(+)
Created attachment 288483 [details]
git bisect log to find the fix (successful)

Attaching successful git bisect log.
Closing as resolved - fix already in tree and released versions.