Bug 208115

Summary: amdgpu (likely) - power management and display connection problems with an RX590 card
Product: Drivers
Reporter: Adarion from userland (h_mailinglists)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: NEW
Severity: normal
CC: juippis, paananen.olli
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 5.x.x
Subsystem:
Regression: No
Bisected commit-id:
Attachments: excerpt from dmesg grepping amdgpu

Description Adarion from userland 2020-06-09 20:59:03 UTC
Created attachment 289583
excerpt from dmesg grepping amdgpu

Bug report - power management and display connection problems with an RX590 card

Hello developer team,
Please bear with me, this is my first bug report against the kernel itself.

It _might_ partially be related to
https://bugzilla.kernel.org/show_bug.cgi?id=201139


background / generic info:
I have an AMD RX 590 which is giving me severe trouble.

I have had a multitude of ATI/AMD cards/APUs in use for years, mostly under Gentoo Linux, a few under Debian derivatives and W32:
RX 590 (PCIe)
RX 560 (PCIe)
HD 5770 (PCIe)
HD 5670 (PCIe)
HD 5450 and the likes (PCI, PCIe)
HD 3870 (PCIe)
Kabini (Athlon 5350) (AM1)
Kabini E-2100 (soldered/BGA)
E-350 (soldered/BGA)
Geode LX ;-)  (soldered/BGA / companion chip)
and more

the very chip/card in question:

Sapphire Nitro+ Radeon RX 590 8G 50th Anniversary, 8192 MB GDDR5
(the golden one)

the following setup is currently dysfunctional:

RX 590
Zen+ 2700
MSI PC-Mate B-350 (latest FW)
16 GiB RAM
PSU BeQuiet DarkPowerPro 550 (should be strong enough, and problems are on the low power state side)
Monitor: Eizo EV2436W hooked up via DP

The setup works _nicely_ with a different GPU (e.g. HD 5450; okay, that's not the amdgpu driver, but anyway).
My other amdgpu card, the RX 560 (Polaris 11), works like a charm in an FX 6300 setup.
The very same (Eizo) screen also works flawlessly on my Kabini (though there I have to use an HDMI-to-DVI adapter); an old Geode LX also runs fairly well via VGA.


software
(Gentoo) Linux (5.x.x kernel; tried various versions over time, didn't really get much better), libdrm 2.4.9x / 2.4.10x, mesa 19.3.2 or later, xf86-video-amdgpu 19.1.0


I built a box based on a Ryzen Zen+ 2700 and an MSI PC-Mate B-350 mainboard.
While I was setting it up I ran my elderly HD 5670 in it and everything was fine.
All other cards I have tried in that Zen+ system so far worked like a charm. Heavy video transcoding (CPU based), just "desktopping around", heavy compiling (<-Gentoo): No problem! Power management? No problem!

With the RX 590 it's a sheer pain.


problems:
* GPU not coming back once the monitor goes into power saving
* link lost on every second power save (screen blanking / suspend / off / BACO)
	relation to #201139 ?
* a message about problems reading the EDID, which I found once in dmesg, could be a hint (but it seems all other cards and boxes can obtain the EDID)
* Sometimes it seems I can still send commands via keyboard / work blindly, so I might try to start an xrandr script to switch the digital outputs off/on ("reset")?
* occasionally switching to a VT (and back) helps; sometimes it does not and the hardware is frozen, and even REISUB (!) won't work.
* once I also got it back - but - in max. 800 x 600 resolution
* sometimes I can re-gain a signal by
    replugging the cable
    switching monitor on/off
    
* freezes (which seem power management related)
e.g. running a standard compile job
host system had little to do, compilation was running inside a chroot env. (amd64 on amd64)
next morning: LEDs on mainboard/GPU still glowing, fans spinning, system entirely frozen, not even REISUB would help
nothing in the logs
from /var/log/emerge.log it must have stopped somewhere in the middle of a harmless compile (iirc it was sys-fs/fuse or something), and I don't use strange CFLAGS which might produce illegal opcodes or something

    
* power consumption is too high during idle
* strange power readings in "sensors": at least 33 W (should be about 10 W when idling and 3 W in BACO / zero core)
* hint: the W32 / W64 blob also showed quite high consumption during desktop idle (AMD blob / GPU-Z)
* measured at the wall it might be slightly better, but the whole system (Zen+, GPU, 2 SSDs and one HDD, one BD/DVD/CDROM drive, hardly any USB peripherals, no other cards in the slots) never drops below 55 W; it is usually higher
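
The xrandr "reset" idea from the list above could look roughly like the following minimal sketch. The output name `DisplayPort-0` and the two-second pause are assumptions for illustration; the real output name has to be taken from the output of `xrandr --query` on the affected box. Only the standard `--output`, `--off` and `--auto` options of xrandr are used.

```python
import subprocess
import time


def build_reset_commands(output="DisplayPort-0"):
    """Return the two xrandr invocations that turn an output off and on again.

    The default output name is a placeholder, not taken from the report.
    """
    return [
        ["xrandr", "--output", output, "--off"],
        ["xrandr", "--output", output, "--auto"],
    ]


def reset_output(output="DisplayPort-0", pause=2.0, run=subprocess.run):
    """Switch the output off, wait a moment, then re-enable it."""
    off_cmd, on_cmd = build_reset_commands(output)
    run(off_cmd, check=True)
    time.sleep(pause)  # assumed delay so the link can settle
    run(on_cmd, check=True)


if __name__ == "__main__":
    reset_output()
```

Bound to a keyboard shortcut, such a script could be triggered blindly while the screen shows no signal. Whether it actually brings the link back on this card is untested.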


Is there something I might have missed?
Should I try to obtain more verbose logs? Is there any "x-trace" tool that I could run? Radeontop information outputs?


I'll attach one of the few logs I could obtain, which might contain some hints towards what is happening.
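
An excerpt like the attached one can be reproduced with a small filter over dmesg. This is only a sketch: the keyword list (`amdgpu`, `[drm]`, `EDID`) is an assumption about which lines are relevant, and reading dmesg may require root on some systems.

```python
import re
import subprocess

# Assumed set of keywords; extend it as needed.
PATTERN = re.compile(r"amdgpu|\[drm\]|EDID", re.IGNORECASE)


def filter_lines(lines):
    """Keep only the kernel log lines that mention one of the keywords."""
    return [line for line in lines if PATTERN.search(line)]


if __name__ == "__main__":
    log = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=True
    ).stdout
    for line in filter_lines(log.splitlines()):
        print(line)
```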

on my to-do list: 
* try a different monitor (though that very EIZO monitor worked like a charm with everything else I threw at it)
* try HDMI instead of DP, but I think I don't have HDMI monitors at hand
* try the RX590 in a different box (e.g. my FX 6300 unit, which currently runs flawlessly with an RX 560) - and see if it still misbehaves...

Sorry for the wall of text.

keywords: link lost, power management problems, powerplay, device reset reinitialization, system freeze, x86-64 amd64, amdgpu, AMD RX 590 RX590 Polaris
Comment 1 Adarion from userland 2020-08-18 19:46:09 UTC
Sadly I have not yet found the time for long-term tests in my production systems.

a quick check:
RX 590 in my FX-6300 based box (Asus M5A78L-M PLUS/USB3)
I got one successful return from DPMS, but that was just one, and with a different (newly purchased) monitor (Asus PA24A) hooked up via HDMI.
Kernel was 5.7.x.

idle KDE session: ~69 W (wall, "empty system" just mainboard, CPU, fans, RAM and one SSD to boot from)
early DPMS off: 66 W
BACO(?): jumps between 62...77 W (there shouldn't have been much system activity, I wonder what caused the fluctuations)

"sensors" readings still high, show something around 30W idle.

(The RX 560 however, is down to 56 W in idle KDE, but also shows some variance during DPMS off/BACO.)

Expected: The RX 590 should be roughly 9-12 W idle, like its smaller siblings.
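
To log the GPU's own power reading independently of "sensors", the hwmon sysfs file can be read directly. A minimal sketch, assuming the card is `card0` and that the driver exposes `power1_average` (reported in microwatts, as is usual for hwmon power attributes); both are assumptions about the affected system.

```python
import glob


def microwatts_to_watts(raw):
    """Convert a raw hwmon power value (microwatts, as text) to watts."""
    return int(raw.strip()) / 1_000_000


def read_gpu_power(card="card0"):
    """Return {sysfs path: watts} for every power1_average file found.

    The card name is an assumption; an empty dict means nothing matched.
    """
    pattern = f"/sys/class/drm/{card}/device/hwmon/hwmon*/power1_average"
    readings = {}
    for path in glob.glob(pattern):
        with open(path) as handle:
            readings[path] = microwatts_to_watts(handle.read())
    return readings


if __name__ == "__main__":
    for path, watts in read_gpu_power().items():
        print(f"{path}: {watts:.1f} W")
```

Run in a loop while the desktop is idle and again after DPMS off, this would show whether the reported value ever approaches the expected idle range.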

I am currently running the Zen+ with an old HD 5450; the RX 590 is gathering dust. (I can't risk crashing my production system every 10 minutes.)
Comment 2 op 2020-10-30 15:21:43 UTC
I see similar behaviour with an RX 5700 XT and two outputs, one DisplayPort (primary) and one HDMI (TV).

If I plug in the HDMI cable after the system has booted, everything works fine. Idle power consumption is at ~10-12 watts -> plug in and enable the HDMI output, it goes up to ~36 watts; disable it or unplug the cable -> it drops back to ~10-12 watts as expected.

But if the HDMI cable is already connected when I boot the system, the problem appears: power consumption stays at ~36 watts all the time, no matter whether I disable HDMI with xrandr or even unplug the HDMI cable.
Comment 3 Adarion from userland 2021-03-01 17:39:09 UTC
A little (and very late, sorry, but I have a very stressful "real life") update from my side.
I recently found the time to plug the RX 590 back into the Zen+ setup. (It was running with the HD 5450 (radeonhd) in the meantime.)
I am on kernel 5.10.x / 5.11 there, with recent libdrm, mesa etc., and the most recent BIOS I could obtain for the MSI B350 PCMate.

I plugged the DP cable into a different port this time. I need to test thoroughly, but now I am on something that seems normal! Screen wakeup seems okay so far: it wakes up from power saving (BACO?) and regains screen control correctly, without resolution drop or distortion. Fully operable.
S2RAM works incl. wakeup (tested only once so far, but that gave me hope).
And: wall power measurement looks as expected: 51...55 watts for the whole box. This is on par with the HD 5450 and what I would expect at idle.

I'll keep an eye on it, esp. once I start using it as my main box (so far it was mostly used for chroot compiling Gentoo installations and occasional video encoding), but maybe things have settled?
It's sad that I don't know the real reason for the change (different DP port on GPU side? kernel update (PM things should be in the kernel so I don't think it's mesa related), mainboard BIOS update?).
If I find the time I'll test it on different mainboards and try to find out whether the GPU-side video output port influences the behaviour. (But I have very full weeks ahead.)