Bug 210683 - Nasty amdgpu powersave regression Navi14
Summary: Nasty amdgpu powersave regression Navi14
Status: RESOLVED OBSOLETE
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: x86-64 Linux
Importance: P1 high
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-12-14 13:08 UTC by siyia
Modified: 2021-01-16 04:33 UTC
CC List: 4 users

See Also:
Kernel Version: 5.10
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
dmesg outputs for 5.9.14 and 5.10.1. (47.90 KB, application/gzip)
2020-12-20 03:26 UTC, David Mak
git bisect for Navi 21. (2.00 KB, text/plain)
2020-12-22 08:29 UTC, David Mak

Description siyia 2020-12-14 13:08:57 UTC
Regression compared to kernel 5.9: after upgrading to kernel 5.10-rc7, GPU idle power went from 5-6 W to a constant 15-16 W, resulting in higher temperatures at idle.
Comment 1 siyia 2020-12-14 13:22:24 UTC
Just tested kernel 5.10 stable, and idle power saving on the GPU is still broken compared to kernel 5.9.
Comment 2 siyia 2020-12-14 13:40:42 UTC
vddgfx also fluctuates between 600 mV and 700 mV at idle, compared to a steady 600 mV in kernel 5.9.
Comment 3 Alex Deucher 2020-12-14 18:22:43 UTC
Please attach your dmesg output.  Can you bisect?
Comment 4 David Mak 2020-12-20 03:26:23 UTC
Created attachment 294239 [details]
dmesg outputs for 5.9.14 and 5.10.1.

I can reproduce the issue on the RX 6800 (Navi 21 XL).

I use Radeontop to inspect the memory/GPU clock of my GPU.

When using Linux 5.9.14:
- In both KDE Plasma and tty2, Memory Clock hovers at around 100MHz.
- GPU Power reported by lm_sensors is around 5-7W.

When using Linux 5.10.1:
- In tty2, Memory Clock hovers at around 100MHz and GPU Power reported by lm_sensors is around 5-7W.
- In KDE Plasma, Memory Clock is usually around 1GHz (100%), although it can be down to ~470MHz, and GPU Power reported by lm_sensors is around 30W.
- Disconnecting one of my two monitors does not change the memory clock.

I am trying to bisect the commit, but many revisions either give a blank screen or fail to load the amdgpu module. (I suspect I am not building the kernel properly.)

Tested linux-firmware versions: 20201120.bc9cd0b, 20201218.646f159
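
For anyone who wants to compare the same two numbers without Radeontop or lm_sensors, the following is a minimal sketch that reads them from sysfs. It assumes the GPU is `card0`, that the driver exposes `pp_dpm_mclk` with the active level marked by `*`, and that the hwmon node provides `power1_average` in microwatts. These paths may differ per system and kernel version, and the helper names are made up for illustration.

```python
import re
from pathlib import Path

# Assumption: the amdgpu device is card0; adjust on multi-GPU systems.
DEVICE = Path("/sys/class/drm/card0/device")

# Matches the active level in pp_dpm_mclk, e.g. "0: 100Mhz *".
LEVEL_RE = re.compile(r"^\s*\d+:\s*(\d+)\s*mhz\s*\*\s*$", re.IGNORECASE)


def active_clock_mhz(dpm_text):
    """Return the clock in MHz of the level marked with '*', or None."""
    for line in dpm_text.splitlines():
        match = LEVEL_RE.match(line)
        if match:
            return int(match.group(1))
    return None


def microwatts_to_watts(raw):
    """Convert a raw hwmon power reading (microwatts) to watts."""
    return int(raw.strip()) / 1_000_000


def read_idle_state(device=DEVICE):
    """Return (memory clock in MHz, power in W); either may be None."""
    mclk_file = device / "pp_dpm_mclk"
    mclk = active_clock_mhz(mclk_file.read_text()) if mclk_file.exists() else None
    # Assumption: the first hwmon node under the device belongs to the GPU.
    power_files = sorted(device.glob("hwmon/hwmon*/power1_average"))
    power = microwatts_to_watts(power_files[0].read_text()) if power_files else None
    return mclk, power
```

Calling `read_idle_state()` once on 5.9.14 and once on 5.10.1, with the desktop idle, gives figures that can be compared directly with the ones listed above.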
Comment 5 David Mak 2020-12-22 08:29:32 UTC
Created attachment 294289 [details]
git bisect for Navi 21.

Not sure if I am doing this properly, but I performed a git bisect between 5.9 and 5.10 on drivers/gpu/drm/amd.

Note that none of the supposedly "good" commits actually fixed the issue. I marked them as "good" only because those commits either cannot modeset to my monitor's resolution, or the kernel fails to write certain registers to my GPU, which causes my display to lose signal and go to standby. So technically the "first bad commit" is just the first commit on which I can boot into SDDM/KDE and also reproduce the issue.

Please let me know if there is anything else I can help with. I have a spare Vega 64 (Vega 10) card lying around, but it has its own memory clock issues as far as I remember =/
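
For revisions that cannot be tested at all (blank screen, no modeset), git provides `git bisect skip`, which keeps them out of both the "good" and the "bad" set. The sketch below shows one way to turn an observation into a verdict. The function name and the 15 W threshold are assumptions chosen for illustration, roughly between the 5-7 W and 30 W readings reported in comment 4.

```python
# Assumption: idle power above this value counts as the regression.
IDLE_POWER_LIMIT_W = 15.0


def bisect_verdict(desktop_usable, idle_power_w, limit_w=IDLE_POWER_LIMIT_W):
    """Map one test boot to a git bisect subcommand: good, bad or skip."""
    # A revision that never reaches a usable desktop says nothing about
    # idle power, so it is skipped instead of being marked good.
    if not desktop_usable or idle_power_w is None:
        return "skip"
    return "bad" if idle_power_w > limit_w else "good"


def bisect_command(desktop_usable, idle_power_w):
    """Return the command line to run for the current revision."""
    return "git bisect " + bisect_verdict(desktop_usable, idle_power_w)
```

For example, `bisect_command(False, None)` yields `git bisect skip`. If the skipped revisions end up next to the culprit, git reports a range of candidate commits instead of a single first bad commit.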
Comment 6 Mershl 2020-12-24 11:25:58 UTC
This seems to affect APUs as well. I can reproduce the issue on Raven (3500U).
Comment 7 Andreas Prittwitz 2020-12-26 16:14:07 UTC
I can confirm this with an RX 5700 XT.

GPU-power and temperatures with running fans @1100 rpm:

with kernel 5.9.14 during idle = 10-11 W, temps around 32-35 deg C
with kernel 5.10.1 and 5.10.2 during idle = 35 W, temps around 42-45 deg C
Comment 8 siyia 2020-12-26 16:15:54 UTC
I've found a workaround here: https://gitlab.freedesktop.org/drm/amd/-/issues/1407

Read the last two comments there.
Comment 9 Andreas Prittwitz 2020-12-27 16:24:22 UTC
I have done some additional testing.
I am running an up-to-date openSUSE Tumbleweed with KDE Plasma.
My monitor is capable of running at 60, 100, 120 and 144 Hz refresh rate.

For some reason unknown to me it was set to 60 Hz after the last update.
With this setting it idled at 35 W instead of the usual 11 W.

After setting it back to 144 Hz, wattage at idle came back down to 11 W.
The memory/GPU frequencies also came back down to where they used to be and should be.
Setting it back to 60 Hz refresh rate lets the wattage come back up to 35 W.
This is reproducible any time.

Setting it to 100 or 120 Hz also lets the graphics card idle at 11 W.

It seems that, at least on my system, this bug only occurs when the
monitor is set to a 60 Hz refresh rate.
Comment 10 Andreas Prittwitz 2020-12-27 16:26:48 UTC
I have done some additional testing.
I am running an up-to-date openSUSE Tumbleweed with KDE Plasma.
My monitor is capable of running at 60, 100, 120 and 144 Hz refresh rate.

For some reason unknown to me it was set to 60 Hz after the last update.
With this setting it idled at 35 W instead of the usual 11 W.

After setting it back to 144 Hz, wattage at idle came back down to 11 W.
The memory/GPU frequencies also came back down to where they used to be and should be.

Setting it back to 60 Hz refresh rate lets the wattage come back up to 35 W.
This is reproducible any time.

Setting it to 100 or 120 Hz also lets the graphics card idle at 11 W.

It seems that, at least on my system, this bug only occurs when the
monitor is set to a 60 Hz refresh rate.
