Bug 215203

Summary: Monitors no longer remain in a sleep state
Product: EFI Reporter: Brandon Nielsen (nielsenb)
Component: VideoAssignee: EFI Virtual User (efi)
Status: NEW ---    
Severity: normal CC: alexdeucher, dwagelaar, ego.cordatus, imre.deak, jan.public, mitch9654, pmenzel+bugzilla.kernel.org, regressions
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 5.16-rc3 Subsystem:
Regression: No Bisected commit-id:
Attachments: possible fix
possible fix 2
possible fix
patch 1/2
patch 2/2
patch 1/2
patch 2/2
Alternative patch 1/2
Alternative patch 2/2
possible fix

Description Brandon Nielsen 2021-12-02 20:49:48 UTC
Monitors no longer remain in sleep state after a screen lock, instead "waking up" and displaying a black screen.

5.14.17 and 5.14.18 do not have the issue. 5.15.4 does.

I have performed a bisect, first "bad" commit to master is 55285e21f04517939480966164a33898c34b2af2, the same change made it into the 5.15 branch as e3b39825ed0813f787cb3ebdc5ecaa5131623647. I have verified the issue exists in latest master (a51e3ac43ddbad891c2b1a4f3aa52371d6939570).


Steps to reproduce:

  1. Boot system (Fedora Workstation 35 in this case)
  2. Log in
  3. Lock screen (after a few seconds, monitors will enter power save "sleep" state)
  4. Wait (usually no more than 30 seconds, sometimes up to a few minutes)
  5. Observer monitor leaving "sleep" state (backlight comes back on), but nothing is displayed


The system in question is a desktop using the amdgpu GPU driver, two monitors connected via DP->HDMI adapters.

This is an upstream copy of a downstream bug: https://bugzilla.redhat.com/show_bug.cgi?id=2028613
Comment 1 Imre Deak 2021-12-07 17:47:04 UTC
Hi Brandon,

from
https://gist.github.com/belarusaed/7f21e2532cba9e30d9023b474c4bca77

referenced from the redhat ticket I presume the display is connected to the
0b:00.0

graphics card, and by
"5. Observer monitor leaving "sleep" state (backlight comes back on), but nothing is displayed"

you mean that the screen contents do get restored after some input events (so it doesn't stay blank).

I suspect that the result of the commit you bisected will enable runtime PM on that card (as for any PCI graphics card that would take over the EFI framebuffer) and this somehow reveals the issue you observe. Could you verify this by checking with and without the commit

cat /sys/devices/pci0000:00/0000:0b:00.0/power/control

after having booted up?

If the above shows 'auto' with the commit applied, could you set this to 'on' and check if the problem still reproduces?
Comment 2 Brandon Nielsen 2021-12-07 21:57:40 UTC
(In reply to Imre Deak from comment #1)
> Hi Brandon,
> 
> from
> https://gist.github.com/belarusaed/7f21e2532cba9e30d9023b474c4bca77
> 
> referenced from the redhat ticket I presume the display is connected to the
> 0b:00.0



That is not mine, that bug report turned into a bit of a mess of "me-to" comments, and was started well before this commit, I probably shouldn't have reopened it. I can attach fpaste output to this ticket if you want me to. Display on my machine is at 01:00.0.


> graphics card, and by
> "5. Observer monitor leaving "sleep" state (backlight comes back on), but
> nothing is displayed"
> 
> you mean that the screen contents do get restored after some input events
> (so it doesn't stay blank).



Yes, that is correct.

 
> I suspect that the result of the commit you bisected will enable runtime PM
> on that card (as for any PCI graphics card that would take over the EFI
> framebuffer) and this somehow reveals the issue you observe. Could you
> verify this by checking with and without the commit
> 
> cat /sys/devices/pci0000:00/0000:0b:00.0/power/control
> 
> after having booted up?



Before the commit, kernel git revision fac956710ab0:


[nielsenb@boleyn ~]$ cat /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/power/control
auto


After the commit, kernel git revision 55285e21f045:

[nielsenb@boleyn kernel]$ cat /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/power/control
auto


> If the above shows 'auto' with the commit applied, could you set this to
> 'on' and check if the problem still reproduces?


With the commit applied (git revision 55285e21f045):

Forcing on:

[nielsenb@boleyn linux]$ sudo sh -c "echo on > /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/power/control"

Makes the problem go away, returning to auto:

[nielsenb@boleyn linux]$ sudo sh -c "echo auto > /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/power/control"

Makes the problem return.
Comment 3 Imre Deak 2021-12-07 22:55:55 UTC
Thanks, so runtime PM being enabled for the AMD gfx driver seems to trigger the problem.

About power/control being 'auto' at fac956710ab0: I suppose that tree doesn't trigger the problem. It doesn't contain 55285e21f045 so I wonder how runtime PM there can be enabled after the EFI FB->drm FB takeover (IOW how the issue in that tree doesn't trigger if we consider the root cause being RPM being enabled). 

If you could double-check power/control in fac956710ab0 and it's really 'auto', there could be multiple issues at hand with the same symptom and it would be better to just take v5.15.6, and check power/control and if the issue triggers there before and after reverting 55285e21f045.

In any case I suspect AMD folks will need to look at this, as 55285e21f045 seems to reveal a yet unknown issue.
Comment 4 Brandon Nielsen 2021-12-08 20:11:20 UTC
(In reply to Imre Deak from comment #3)
> Thanks, so runtime PM being enabled for the AMD gfx driver seems to trigger
> the problem.
> 
> About power/control being 'auto' at fac956710ab0: I suppose that tree
> doesn't trigger the problem. It doesn't contain 55285e21f045 so I wonder how
> runtime PM there can be enabled after the EFI FB->drm FB takeover (IOW how
> the issue in that tree doesn't trigger if we consider the root cause being
> RPM being enabled). 
> 
> If you could double-check power/control in fac956710ab0 and it's really
> 'auto', there could be multiple issues at hand with the same symptom and it
> would be better to just take v5.15.6, and check power/control and if the
> issue triggers there before and after reverting 55285e21f045.

Just double checked, it really is set to 'auto' in fac956710ab0.

Switching to the stable branch, with 5.15.6 the issue does occur, PM state is 'auto'.

After reverting e3b39825ed0813f787cb3ebdc5ecaa5131623647 in v5.15.6, the issue does not occur, PM state is 'auto'. Switching PM state to 'on' makes the issue come back but only for the first screen lock. The second time, after a lock the monitors will take longer to go to sleep (~15 sec. vs the usual ~3 sec.), but will not wake up again until an input event. The third time, after a lock, everything works as expected (~3 sec. after lock for monitors to go to sleep, monitors stay asleep until an input event). I am not sure if this behavior is consistent with previous kernels, I only just noticed it.
Comment 5 Imre Deak 2021-12-08 20:46:46 UTC
Ok. Could you check in 5.15.6 if setting power/control to 'on' solves the problem, at least at the second and subsequent screen blankings?

Also it's actually power/usage_count that would show if the device runtime suspends or not, so could you test once more on an ssh shell the content of power/usage_count after blanking the screen (and waiting until the monitor is supposed to be in power saving state) both in 5.15.6 and then reverting e3b39825ed081?

Thanks.
Comment 6 Brandon Nielsen 2021-12-09 01:19:54 UTC
(In reply to Imre Deak from comment #5)
> Ok. Could you check in 5.15.6 if setting power/control to 'on' solves the
> problem, at least at the second and subsequent screen blankings?

With v5.15.6, forcing power/control to 'on' does not seem to solve the problem, even on second and subsequent blankings. This is different than I observed in comment 2 with mainline commit 55285e21f045. Perhaps 'on' isn't always a fix, and just makes the actual problem less likely to occur?

> Also it's actually power/usage_count that would show if the device runtime
> suspends or not, so could you test once more on an ssh shell the content of
> power/usage_count after blanking the screen (and waiting until the monitor
> is supposed to be in power saving state) both in 5.15.6 and then reverting
> e3b39825ed081?
> 
> Thanks.

I don't have a power/usage_count. I have:

[nielsenb@boleyn ~]$ ls /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/power/*count*
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/power/wakeup_abort_count
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/power/wakeup_active_count
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/power/wakeup_count
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/power/wakeup_expire_count
Comment 7 Imre Deak 2021-12-09 02:27:44 UTC
Sorry, I meant power/runtime_usage.

Not sure how 55285e21f045 makes a difference if disabling runtime PM on the Gfx PCI device doesn't work around the problem.

Maybe 0000:00:01.0/0000:01:00.0 is not the correct PCI device? Could you run

find \( -path \*power/control -o -path \*power/runtime_usage \) -exec grep -H . {} \;

in

/sys/devices/pci0000:00/0000:00:01.0

while in the suspended state?
Comment 8 Imre Deak 2021-12-09 02:48:41 UTC
You could also check the PCI device in interest and its control/runtime_usage corresponding to the DRM driver/DRM FB by

# ls -l /sys/class/drm/card0/device
# cat /sys/class/drm/card0/device/power/{control,runtime_usage}
Comment 9 Brandon Nielsen 2021-12-09 20:39:11 UTC
(In reply to Imre Deak from comment #7)
> Sorry, I meant power/runtime_usage.

Don't have that either:

[nielsenb@boleyn ~]$ ll /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/power/
total 0
-rw-r--r--. 1 root root 4096 Dec  9 14:26 autosuspend_delay_ms
-rw-r--r--. 1 root root 4096 Dec  9 14:26 control
-r--r--r--. 1 root root 4096 Dec  9 14:26 runtime_active_time
-r--r--r--. 1 root root 4096 Dec  9 14:26 runtime_status
-r--r--r--. 1 root root 4096 Dec  9 14:26 runtime_suspended_time
-rw-r--r--. 1 root root 4096 Dec  9 14:26 wakeup
-r--r--r--. 1 root root 4096 Dec  9 14:26 wakeup_abort_count
-r--r--r--. 1 root root 4096 Dec  9 14:26 wakeup_active
-r--r--r--. 1 root root 4096 Dec  9 14:26 wakeup_active_count
-r--r--r--. 1 root root 4096 Dec  9 14:26 wakeup_count
-r--r--r--. 1 root root 4096 Dec  9 14:26 wakeup_expire_count
-r--r--r--. 1 root root 4096 Dec  9 14:26 wakeup_last_time_ms
-r--r--r--. 1 root root 4096 Dec  9 14:26 wakeup_max_time_ms
-r--r--r--. 1 root root 4096 Dec  9 14:26 wakeup_total_time_ms

> Not sure how 55285e21f045 makes a difference if disabling runtime PM on the
> Gfx PCI device doesn't work around the problem.
 
Okay, I double checked everything, and I think the issue was I was confused by some of the behavior I was seeing. Specifically that forcing power/control to 'on' can sometimes result in the monitors not sleeping for 15 or so seconds and I'm impatient.

55285e21f045 - Displays issue, power/control 'auto' by default. Forcing power/control to 'on' fixes the issue (with sometimes delayed monitor sleep, as mentioned above).

v5.15.6 - Same as above. Displays issue, power/control 'auto' by default. Forcing power/control to 'on' fixes the issue (with sometimes delayed monitor sleep, as mentioned above). When I performed my test from comment 6, I probably didn't wait for the monitor to sleep.

v5.15.6 with e3b39825ed0813f787cb3ebdc5ecaa5131623647 reverted - Does not display issue. power/control 'auto' by default, no monitor sleep delay. Forcing power/control to 'on' introduces the monitor sleep delay but otherwise does not display the issue.

> Maybe 0000:00:01.0/0000:01:00.0 is not the correct PCI device? Could you run
> 
> find \( -path \*power/control -o -path \*power/runtime_usage \) -exec grep
> -H . {} \;
> 
> in
> 
> /sys/devices/pci0000:00/0000:00:01.0
> 
> while in the suspended state?

[nielsenb@boleyn ~]$ cd /sys/devices/pci0000:00/0000:00:01.0
[nielsenb@boleyn 0000:00:01.0]$ find \( -path \*power/control -o -path \*power/runtime_usage \) -exec grep -H . {} \;
./power/control:auto
./0000:01:00.0/graphics/fb0/power/control:auto
./0000:01:00.0/power/control:auto
./0000:01:00.0/hwmon/hwmon1/power/control:auto
./0000:01:00.0/drm/renderD128/power/control:auto
./0000:01:00.0/drm/card0/card0-HDMI-A-1/power/control:auto
./0000:01:00.0/drm/card0/card0-DVI-D-1/power/control:auto
./0000:01:00.0/drm/card0/power/control:auto
./0000:01:00.0/drm/card0/card0-DP-2/power/control:auto
./0000:01:00.0/drm/card0/card0-DP-2/drm_dp_aux1/power/control:auto
./0000:01:00.0/drm/card0/card0-DP-3/power/control:auto
./0000:01:00.0/drm/card0/card0-DP-3/drm_dp_aux2/power/control:auto
./0000:01:00.0/drm/card0/card0-DP-1/power/control:auto
./0000:01:00.0/drm/card0/card0-DP-1/drm_dp_aux0/power/control:auto
./pci_bus/0000:01/power/control:auto
./0000:01:00.1/power/control:auto
./0000:01:00.1/hdaudioC1D0/power/control:auto
./0000:01:00.1/sound/card1/input9/event9/power/control:auto
./0000:01:00.1/sound/card1/input9/power/control:auto
./0000:01:00.1/sound/card1/hwC1D0/power/control:auto
./0000:01:00.1/sound/card1/input14/power/control:auto
./0000:01:00.1/sound/card1/input14/event14/power/control:auto
./0000:01:00.1/sound/card1/pcmC1D9p/power/control:auto
./0000:01:00.1/sound/card1/pcmC1D8p/power/control:auto
./0000:01:00.1/sound/card1/pcmC1D7p/power/control:auto
./0000:01:00.1/sound/card1/input12/power/control:auto
./0000:01:00.1/sound/card1/input12/event12/power/control:auto
./0000:01:00.1/sound/card1/pcmC1D3p/power/control:auto
./0000:01:00.1/sound/card1/input10/power/control:auto
./0000:01:00.1/sound/card1/input10/event10/power/control:auto
./0000:01:00.1/sound/card1/power/control:auto
./0000:01:00.1/sound/card1/input13/event13/power/control:auto
./0000:01:00.1/sound/card1/input13/power/control:auto
./0000:01:00.1/sound/card1/controlC1/power/control:auto
./0000:01:00.1/sound/card1/input11/power/control:auto
./0000:01:00.1/sound/card1/input11/event11/power/control:auto
./0000:01:00.1/sound/card1/pcmC1D11p/power/control:auto
./0000:01:00.1/sound/card1/pcmC1D10p/power/control:auto
Comment 10 Imre Deak 2021-12-09 21:48:56 UTC
Ok, thanks for the tests.

+Alex, could you take a look at this? I think e3b39825ed0813 revealed another issue in the AMD Gfx driver which should be root-caused/fixed (as the bisected commit only permits all Gfx devices to runtime suspend whenever EFI FB is used).

The reasoning based on all of Brandon's tests so far:

On v5.15.6 (containing  e3b39825ed0813) after inactivity the display (on /sys/devices/pci0000:00/0000:00:01.0) will blank/runtime suspend, but after a while will wake up prematurely without any user input.

Reverting e3b39825ed0813 or force-disabling runtime PM on the device (by setting power/control to "on") seems to work around the issue. Brandon noticed that disabling RPM adds some extra delay before the monitor enters the sleep state. Not sure about the reason for that, power/autosuspend_delay_ms could be checked/lowered, though normally it would only have an effect when RPM is enabled.

More direct ways to check that the device does in fact runtime suspend (on v5.15.6 with power/control left at 'auto'):
- Check if power/runtime_suspended_time increases across multiple display sleep/wake-ups.
In case the above counter doesn't increment:
- Build the kernel with CONFIG_PM_ADVANCED=y and check power/runtime_usage at the point the monitor goes to sleep.
or
- Enable some debug option specific to the AMD gfx driver indicating the runtime suspend/resume events in dmesg.
Comment 11 Alex Deucher 2021-12-09 22:31:07 UTC
Yes, please check to see if the device has actually runtime suspended (runtime_status and runtime_suspended_time).  Note that the displays blanking are independent of runtime suspend.  Runtime suspend actually turns off the card when it's completely idle.  The display blanking is controlled by the desktop compositor, but the card may still be active if you have background compute work or something still using the GPU.  Also, in these kernels, runtime pm shouldn't kick in at all if you have drm fbdev support enabled and a console bound for this device because the console maintains a mmap to the framebuffer which increments the runtime pm ref count.

If runtime pm is active, any access to the device will wake the card (e.g., IOCTL, hwmon sysfs query, etc.).  You may have some process running that queries the card on a (ir)regular basis.
Comment 12 Brandon Nielsen 2021-12-10 00:29:38 UTC
With 5.15.6, power/control on 'auto', I can confirm approximately around the time the screens blank, /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/power/runtime_status goes to 'suspended'.

Suspended time increments as well:

[nielsenb@boleyn ~]$ cat /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/power/runtime_suspended_time
6409

At around the time the screens wake up, runtime_status returns to 'active', it never seems to return to 'suspended'.
Comment 13 The Linux kernel's regression tracker (Thorsten Leemhuis) 2021-12-20 04:15:15 UTC
Lo, this is your Linux kernel regression tracker speaking. This bug about a regression on 5.16-rc looks stalled right before the holiday season, as there hasn't been any activity since ten days now.

@Alex, could you give a status update please? Was the information provided by Brandon what you needed? What's needed to get this moving again?
Comment 14 Alex Deucher 2021-12-21 17:18:08 UTC
(In reply to Brandon Nielsen from comment #12)
> With 5.15.6, power/control on 'auto', I can confirm approximately around the
> time the screens blank,
> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/power/runtime_status goes
> to 'suspended'.
> 
> Suspended time increments as well:
> 
> [nielsenb@boleyn ~]$ cat
> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/power/
> runtime_suspended_time
> 6409
> 
> At around the time the screens wake up, runtime_status returns to 'active',
> it never seems to return to 'suspended'.

It seems that shortly after the device goes into runtime power management, something wakes it again.  Perhaps some userspace process is querying the driver for something (e.g., the the compositor is polling for display changes or some process is polling GPU temperature) on a regular basis?
Comment 15 The Linux kernel's regression tracker (Thorsten Leemhuis) 2021-12-21 17:29:24 UTC
Imre, Alex, as we are at rc6 already and the festive season is coming up; Linus said that he will release a rc8, but nevertheless:

Is reverting the culprit an option? And if it is: is it a good idea to buy us more time?
Comment 16 Alex Deucher 2021-12-21 17:54:37 UTC
Runtime power management has been enabled in amdgpu long before either 55285e21f0451 or a6c0fd3d5a8b were applied to efifb.  runtime pm has been enabled for radeon GPUs in laptops for close to a decade, and it's been enabled on desktop GPUs for almost 2 years.  It seems like perhaps there was some change to some userspace component in the meantime which seems to regularly wake the GPU and perhaps the changes in 55285e21f0451 or a6c0fd3d5a8b papered over that.  E.g., some desktop application is polling the GPU for display changes or GPU temperature.  Can you repro the issue on bare bones X server for example without a full gnome or kde environment loaded?
Comment 17 Imre Deak 2021-12-21 18:37:57 UTC
@Thorsten, reverting 55285e21f0451 alone is not acceptable from my POV since it fixed an actual problem leaking a runtime PM reference from the EFI FB driver. That would block runtime PM power saving for both the AMD and Intel graphics devices.
Comment 18 Alex Deucher 2021-12-21 18:39:38 UTC
Created attachment 300103 [details]
possible fix

Does this patch help?
Comment 19 Brandon Nielsen 2021-12-21 20:33:43 UTC
(In reply to Alex Deucher from comment #18)
> Created attachment 300103 [details]
> possible fix
> 
> Does this patch help?

I can confirm it fixes the issue of the monitors waking from sleep, but the device no longer seems to runtime suspend ('runtime_suspended_time' no longer increments).
Comment 20 Brandon Nielsen 2021-12-21 20:37:32 UTC
(In reply to Alex Deucher from comment #16)
> Runtime power management has been enabled in amdgpu long before either
> 55285e21f0451 or a6c0fd3d5a8b were applied to efifb.  runtime pm has been
> enabled for radeon GPUs in laptops for close to a decade, and it's been
> enabled on desktop GPUs for almost 2 years.  It seems like perhaps there was
> some change to some userspace component in the meantime which seems to
> regularly wake the GPU and perhaps the changes in 55285e21f0451 or
> a6c0fd3d5a8b papered over that.  E.g., some desktop application is polling
> the GPU for display changes or GPU temperature.  Can you repro the issue on
> bare bones X server for example without a full gnome or kde environment
> loaded?

I don't have any particular system monitoring services configured that I'm away of, but I am open to the idea this is just a strange confluence of issues that 55285e21f0451 happens to expose as 'monitors not sleeping'. There is certainly precedent for it really being a GNOME / mutter issue[0].

If it is still of interest, I can try to figure out how to boot this system into bare X and see what happens.

[0] - https://gitlab.gnome.org/GNOME/mutter/-/issues/1021
Comment 21 Brandon Nielsen 2021-12-21 21:08:21 UTC
Booted into twm (ah memories), but I can't figure out how to suspend in a fashion that would enter runtime power management.

[nielsenb@boleyn ~]$ xlock -dpmsoff 1

Makes the monitors sleep, but I don't see 'runtime_suspended_time' incrementing. If this test is of interest, I think I'm going to need more guidance.
Comment 22 Imre Deak 2021-12-21 21:17:24 UTC
$ xset dpms force off ?
Comment 23 Alex Deucher 2021-12-21 21:23:45 UTC
(In reply to Brandon Nielsen from comment #19)
> (In reply to Alex Deucher from comment #18)
> > Created attachment 300103 [details]
> > possible fix
> > 
> > Does this patch help?
> 
> I can confirm it fixes the issue of the monitors waking from sleep, but the
> device no longer seems to runtime suspend ('runtime_suspended_time' no
> longer increments).

I think that issue may be separate.  It sounds like something is polling the GPU on a regular basis.  If the frequency of that polling is <5 seconds, the device won't runtime suspend as the default autosuspend timeout set by the driver is 5 seconds.
Comment 24 Alex Deucher 2021-12-21 21:49:11 UTC
See also: https://gitlab.freedesktop.org/drm/amd/-/issues/1840
Comment 25 Alex Deucher 2021-12-21 21:51:15 UTC
Created attachment 300109 [details]
possible fix 2

Does this patch also help?
Comment 26 Brandon Nielsen 2021-12-21 22:18:45 UTC
(In reply to Imre Deak from comment #22)
> $ xset dpms force off ?

Same as '$ xlock -dpmsoff 1' without the thrill of 5 seconds of your favorite random retro screensaver. Monitors sleep correctly, and stay sleeping, but 'runtime_suspended_time' does not increment.
Comment 27 Brandon Nielsen 2021-12-22 00:19:20 UTC
(In reply to Alex Deucher from comment #25)
> Created attachment 300109 [details]
> possible fix 2
> 
> Does this patch also help?

Yes. Monitors stay in sleep state, `runtime_suspended_time` increases.

I will be gone until sometime early next week, so I won't be around to test until then.
Comment 28 Alex Deucher 2021-12-22 15:40:21 UTC
(In reply to Brandon Nielsen from comment #27)
> (In reply to Alex Deucher from comment #25)
> > Created attachment 300109 [details]
> > possible fix 2
> > 
> > Does this patch also help?
> 
> Yes. Monitors stay in sleep state, `runtime_suspended_time` increases.
> 
> I will be gone until sometime early next week, so I won't be around to test
> until then.

Thanks.  Unfortunately, it doesn't fix it for https://gitlab.freedesktop.org/drm/amd/-/issues/1840 .
Comment 29 Alex Deucher 2021-12-22 15:41:30 UTC
Created attachment 300111 [details]
possible fix

This patch should fix the issue at least for this stage of the kernel.  We can revisit it more thoroughly for the next cycle.
Comment 30 Alex Deucher 2021-12-23 04:03:39 UTC
Created attachment 300121 [details]
patch 1/2
Comment 31 Alex Deucher 2021-12-23 04:04:35 UTC
Created attachment 300123 [details]
patch 2/2

How about these patches?  They should restore the previous behavior.
Comment 32 Imre Deak 2021-12-23 09:09:42 UTC
@Alex, shouldn't amdgpu_is_conflicting_framebuffer() be called before drm_aperture_remove_conflicting_pci_framebuffers() in amdgpu_pci_probe()? I think the RPM refcount is still dropped then by drm_aperture_remove_conflicting_pci_framebuffers(), not sure if that's an issue and the driver should get it's own ref before that.
Comment 33 Alex Deucher 2021-12-24 06:23:06 UTC
Created attachment 300135 [details]
patch 1/2
Comment 34 Alex Deucher 2021-12-24 06:23:33 UTC
Created attachment 300137 [details]
patch 2/2
Comment 35 Alex Deucher 2021-12-24 06:24:14 UTC
Thanks.  Fixed up patches attached.
Comment 36 Brandon Nielsen 2021-12-27 17:42:22 UTC
(In reply to Alex Deucher from comment #35)
> Thanks.  Fixed up patches attached.

I can confirm they change 'power/control' to 'on', and that monitors sleep correctly.
Comment 37 Alex Deucher 2021-12-28 22:37:55 UTC
Created attachment 300177 [details]
Alternative patch 1/2

These new patches are an alternative approach which should handle things better if there is no efifb configured in your kernel.
Comment 38 Alex Deucher 2021-12-28 22:38:26 UTC
Created attachment 300178 [details]
Alternative patch 2/2
Comment 39 Alex Deucher 2021-12-29 17:16:06 UTC
(In reply to Alex Deucher from comment #37)
> Created attachment 300177 [details]
> Alternative patch 1/2
> 
> These new patches are an alternative approach which should handle things
> better if there is no efifb configured in your kernel.

hmm, looking at the runtime pm documentation more closely, I don't think these patches will help.  If the driver enables autosuspend, the idle callback is not used.
Comment 40 Alex Deucher 2022-01-03 15:03:57 UTC
Created attachment 300213 [details]
possible fix

Does this patch work?
Comment 41 Brandon Nielsen 2022-01-03 19:33:52 UTC
(In reply to Alex Deucher from comment #40)
> Created attachment 300213 [details]
> possible fix
> 
> Does this patch work?

No. Same behavior as described in comment #12. 'power/control' is 'auto'. On the first screen lock, monitors suspend, and '/power/runtime_suspended_time' increments. After a few seconds the screens wake up, but blank.

Subsequent lockings behave similarly, but '/power/runtime_suspended_time' never increments further.
Comment 42 Paul Menzel 2022-10-12 11:50:35 UTC
(In reply to Imre Deak from comment #17)
> @Thorsten, reverting 55285e21f0451 alone is not acceptable from my POV since
> it fixed an actual problem leaking a runtime PM reference from the EFI FB
> driver. That would block runtime PM power saving for both the AMD and Intel
> graphics devices.

So reverting commit 55285e21f045 (fbdev/efifb: Release PCI device's runtime PM ref during FB destroy) and a6c0fd3d5a8b (efifb: Ensure graphics device for efifb stays at PCI D0) would need to be reverted?
Comment 43 Alex Deucher 2022-10-12 13:23:37 UTC
I think this bug can be closed.  The original behavior was restored in amdgpu with several iterations of patches, ultimately in:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4020c2280233279ea682a7f2f24b54426416d91d