Bug 199655

Summary: amdgpu: XFX Radeon RX 580 runs its fans only at dangerously low speeds and ignores temperature
Product: Drivers Reporter: Sergey Kondakov (virtuousfox)
Component: Video(DRI - non Intel) Assignee: drivers_video-dri
Status: NEW ---    
Severity: high CC: 0xe2.0x9a.0x9b, v.s.panasyuk, xomachiner
Priority: P1    
Hardware: x86-64   
OS: Linux   
Kernel Version: 4.16.7 Subsystem:
Regression: No Bisected commit-id:
Attachments: dmesg_2018-05-08-quircks
lspci_2018-05-08-quircks

Description Sergey Kondakov 2018-05-08 13:29:36 UTC
Created attachment 275835
dmesg_2018-05-08-quircks

On Windows, AMD drivers like to ignore the VGA BIOS fan control settings, along with their own "Wattman", and keep the fans off until core temperature approaches 50-60 degrees; even at full load the fans don't go over 2600 RPM there. On Linux, however, they get stuck at 800-900 RPM by default (if pwm1_enable is not tampered with), at 1300-1400 RPM if pwm1_enable is set to '2', and at 3500 RPM if it is set to '0'. Only manual control with '1' works as expected. temp1_* settings are outright ignored, with a "permission denied" error. Unless manual control is used, the GPU core may overheat to >70 degrees (I don't even want to know what's happening on the VRMs) at >90% load. This is madness.
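
For anyone trying to reproduce this, the sysfs state described above can be read with a short script. This is only a minimal sketch, assuming the card shows up as card0 and exposes the usual amdgpu hwmon files (pwm1_enable, pwm1, fan1_input, temp1_input). The helper names and the curve points in CURVE are hypothetical and are not taken from the driver or the VBIOS; the PWM range of 0-255 is also an assumption. Writing pwm1_enable and pwm1 needs root.

```python
from pathlib import Path

# Meaning of pwm1_enable as documented for the amdgpu hwmon interface.
PWM_MODES = {0: "no fan speed control", 1: "manual", 2: "automatic"}

# Hypothetical (temperature in C, PWM 0-255) points for a manual fan curve.
CURVE = ((40, 80), (60, 150), (75, 255))


def read_int(path):
    """Read a single integer from a sysfs-style file."""
    return int(Path(path).read_text().strip())


def fan_state(hwmon_dir):
    """Collect fan mode, PWM, RPM and core temperature from one hwmon directory."""
    d = Path(hwmon_dir)
    mode = read_int(d / "pwm1_enable")
    return {
        "mode": PWM_MODES.get(mode, f"unknown ({mode})"),
        "pwm": read_int(d / "pwm1"),
        "rpm": read_int(d / "fan1_input"),
        # temp1_input is reported in millidegrees Celsius.
        "temp_c": read_int(d / "temp1_input") / 1000.0,
    }


def pwm_for_temp(temp_c, curve=CURVE):
    """Linearly interpolate a PWM value between the curve points."""
    if temp_c <= curve[0][0]:
        return curve[0][1]
    for (t0, p0), (t1, p1) in zip(curve, curve[1:]):
        if temp_c <= t1:
            return round(p0 + (p1 - p0) * (temp_c - t0) / (t1 - t0))
    return curve[-1][1]


def set_manual_pwm(hwmon_dir, pwm):
    """Switch to manual control ('1') and write a PWM value clamped to 0-255."""
    d = Path(hwmon_dir)
    (d / "pwm1_enable").write_text("1")
    (d / "pwm1").write_text(str(max(0, min(255, int(pwm)))))


if __name__ == "__main__":
    # Assumption: the GPU is card0; adjust the path on multi-GPU systems.
    dirs = sorted(Path("/sys/class/drm/card0/device/hwmon").glob("hwmon*"))
    if not dirs:
        print("no hwmon directory found for card0")
    else:
        state = fan_state(dirs[0])
        print(state, "-> suggested pwm:", pwm_for_temp(state["temp_c"]))
```

Calling set_manual_pwm(dirs[0], pwm_for_temp(state["temp_c"])) in a loop would amount to the manual control with '1' mentioned above; it is a workaround, not a fix for the automatic modes.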

On Linux at idle (~0% load in radeontop) the GPU doesn't go lower than 41 degrees even with the fans at 1300-1400 RPM, even though under Windows it goes down to 35-40 with the fans completely off (which I don't want to allow anyway, because I don't know how safe that is for the VRMs).

An easy way to overheat it is to use 'FSRCNNX_x2_r1_16-0-4-1.glsl' from https://github.com/igv/FSRCNN-TensorFlow/releases with a ≤720p video.
https://bugs.freedesktop.org/show_bug.cgi?id=103401#c2 - my modded BIOS with more aggressive cooling and lower frequency than stock.
Comment 1 Sergey Kondakov 2018-05-08 13:30:13 UTC
Created attachment 275837
lspci_2018-05-08-quircks
Comment 2 Sergey Kondakov 2018-07-03 07:07:49 UTC
Same thing with the 4.17 kernel. It seems that any kind of fan speed control happens only when pwm1_enable is left completely untouched and there is high GPU load. However, the memory frequency never changes, so at idle the card reports 40-44W of power usage instead of 12-15W on Windows. Now, in the middle of a hellish summer, GPU temperature quickly rises from a cold-boot 35 to 50-55 and stays there because of all that. Even my crappy FX-6100 CPU with a 9mm fan isn't that hot at idle.
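
The memory clock and power figures can be checked with something like the following. It is a rough sketch, assuming the amdgpu device directory exposes pp_dpm_mclk (with the active level marked by a trailing '*') and that the hwmon directory exposes power1_average in microwatts; whether power1_average exists depends on the kernel version, and the function names are made up for illustration.

```python
from pathlib import Path


def current_mclk(device_dir):
    """Return the active memory clock level from pp_dpm_mclk, e.g. '2000Mhz'."""
    text = (Path(device_dir) / "pp_dpm_mclk").read_text()
    for line in text.splitlines():
        # Lines look like "2: 2000Mhz *"; the asterisk marks the active level.
        if line.rstrip().endswith("*"):
            return line.split(":", 1)[1].replace("*", "").strip()
    return None


def average_power_watts(hwmon_dir):
    """Convert power1_average (assumed to be in microwatts) to watts."""
    return int((Path(hwmon_dir) / "power1_average").read_text()) / 1_000_000
```

If current_mclk() keeps returning the highest level while radeontop shows ~0% load, that would match the idle power draw described here.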
Comment 3 Uladzimir Panasiuk 2019-01-28 12:55:36 UTC
Same bug with 4.19.16 and Strix RX 470.