Bug 199655 - amdgpu: XFX Radeon RX 580 runs its fans only in dangerously low speeds and ignores temperature
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: x86-64 Linux
Importance: P1 high
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-05-08 13:29 UTC by Sergey Kondakov
Modified: 2019-05-11 13:25 UTC
CC List: 3 users

See Also:
Kernel Version: 4.16.7
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg_2018-05-08-quircks (86.33 KB, text/plain)
2018-05-08 13:29 UTC, Sergey Kondakov
lspci_2018-05-08-quircks (49.73 KB, text/plain)
2018-05-08 13:30 UTC, Sergey Kondakov

Description Sergey Kondakov 2018-05-08 13:29:36 UTC
Created attachment 275835
dmesg_2018-05-08-quircks

On Windows, the AMD drivers like to ignore the VGA BIOS fan control settings, along with their own "Wattman", and keep the fans off until the core temperature approaches 50-60 degrees; even at full load the fans don't go over 2600 RPM there. On Linux, however, the fans get stuck at 800-900 RPM by default (if pwm1_enable is not tampered with), at 1300-1400 RPM if pwm1_enable is set to '2', and at 3500 RPM if it is set to '0'; only manual control with '1' works as expected. The temp1_* settings are outright ignored, failing with the error "permission denied". Unless manual control is used, the GPU core may overheat to >70 degrees (I don't even want to know what's happening on the VRMs) at >90% load. This is madness.
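Since manual mode (pwm1_enable = '1') is the only mode reported to behave correctly, a user-space fan curve is one possible workaround. The following is a minimal sketch, not a tested tool: the curve points are made-up illustrative values, the function names are hypothetical, and the hwmon directory (typically something like /sys/class/drm/card0/device/hwmon/hwmonN, where N varies per system) is an assumption that must be checked on the actual machine. It assumes pwm1 accepts a duty value in the 0-255 range and temp1_input reports millidegrees Celsius, as in the generic hwmon sysfs convention. Writing to these files requires root.

```python
from pathlib import Path

# Assumed curve points (temperature in degrees C -> PWM duty 0-255).
# These values are illustrative only and not tuned for any real card.
CURVE = [(40, 80), (55, 128), (70, 200), (80, 255)]


def pwm_for_temp(temp_c, curve=CURVE):
    """Linearly interpolate a PWM duty value from a temperature."""
    if temp_c <= curve[0][0]:
        return curve[0][1]
    for (t0, p0), (t1, p1) in zip(curve, curve[1:]):
        if temp_c <= t1:
            return round(p0 + (p1 - p0) * (temp_c - t0) / (t1 - t0))
    return curve[-1][1]


def apply_fan_curve(hwmon_dir):
    """Read the core temperature and set the fan duty once, in manual mode."""
    hwmon = Path(hwmon_dir)
    # hwmon reports temperatures in millidegrees Celsius.
    temp_c = int((hwmon / "temp1_input").read_text()) / 1000
    pwm = pwm_for_temp(temp_c)
    (hwmon / "pwm1_enable").write_text("1")  # '1' = manual control
    (hwmon / "pwm1").write_text(str(pwm))
    return temp_c, pwm
```

Calling apply_fan_curve() from a loop every few seconds would approximate a temperature-driven curve; the sketch deliberately leaves out the loop, error handling, and restoring the previous pwm1_enable value on exit.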

On Linux at idle (~0% load in radeontop) the GPU doesn't go lower than 41 degrees even with the fans at 1300-1400 RPM, even though under Windows it drops to 35-40 with the fans completely off (which I don't want to allow anyway because I don't know how safe the VRMs are).

An easy way to overheat it is to use 'FSRCNNX_x2_r1_16-0-4-1.glsl' from https://github.com/igv/FSRCNN-TensorFlow/releases with a ≤720p video.
https://bugs.freedesktop.org/show_bug.cgi?id=103401#c2 - my modded BIOS with more aggressive cooling and lower frequency than stock.
Comment 1 Sergey Kondakov 2018-05-08 13:30:13 UTC
Created attachment 275837
lspci_2018-05-08-quircks
Comment 2 Sergey Kondakov 2018-07-03 07:07:49 UTC
Same thing with the 4.17 kernel. It seems that any kind of fan speed control happens only when pwm1_enable was left completely untouched and there is high GPU load. However, it never changes the memory frequency, so at idle it reports 40-44W of power usage instead of 12-15W on Windows. Now, in the middle of a hellish summer, the GPU temperature quickly rises from 35 at cold boot to 50-55 and stays there because of all that. Even my crappy FX-6100 CPU with a 9mm fan isn't that hot at idle.
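For anyone trying to confirm the stuck memory clock and idle power draw described above, the following is a rough sketch of how the two values could be read together. It assumes the device directory exposes pp_dpm_mclk with one level per line and the active level marked by a trailing '*' (for example "2: 2000Mhz *"), and that the hwmon directory exposes power1_average in microwatts; both the file availability and the exact format may differ by kernel version, and the function names are hypothetical.

```python
import re
from pathlib import Path


def active_mclk_mhz(pp_dpm_mclk_text):
    """Return the clock in MHz of the level marked with '*', or None."""
    for line in pp_dpm_mclk_text.splitlines():
        # Assumed line format: "<level>: <clock>Mhz" with '*' on the active level.
        m = re.match(r"\s*\d+:\s*(\d+)\s*MHz\s*\*", line, re.IGNORECASE)
        if m:
            return int(m.group(1))
    return None


def idle_snapshot(device_dir, hwmon_dir):
    """Read the active memory clock (MHz) and average power (W) once."""
    mclk = active_mclk_mhz((Path(device_dir) / "pp_dpm_mclk").read_text())
    # power1_average is assumed to be reported in microwatts.
    watts = int((Path(hwmon_dir) / "power1_average").read_text()) / 1_000_000
    return mclk, watts
```

If the memory clock reported at idle is always the highest level, that would be consistent with the elevated idle power figure.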
Comment 3 Uladzimir Panasiuk 2019-01-28 12:55:36 UTC
Same bug with 4.19.16 and Strix RX 470.
