Bug 119211 - amdgpu disables fan by default
Summary: amdgpu disables fan by default
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-05-29 22:09 UTC by Stas Sergeev
Modified: 2024-04-03 08:35 UTC
CC List: 2 users

See Also:
Kernel Version: 6.7.7
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg (138.21 KB, text/plain)
2016-08-09 20:45 UTC, Stas Sergeev
Xorg log (42.25 KB, text/plain)
2016-08-09 20:47 UTC, Stas Sergeev

Description Stas Sergeev 2016-05-29 22:09:54 UTC
I have a Radeon R9 380.
After the KMS driver activates, the fans on
the GPU stop. They can be activated again by
properly setting up fancontrol, but this wasn't
configured on my PC. As a result, over the last
few years I have replaced many motherboards, all
of which developed bad capacitors around the video
card. But only now have I noticed that the fans
are not rotating... :(

The driver should activate the fans by default, or
not touch the initial settings (the fans are rotating
before Linux has started), but not stop them by default.
Comment 1 Vedran Miletić 2016-05-29 22:24:06 UTC
You should produce some GPU load before the fan activates.
Comment 2 Stas Sergeev 2016-05-29 22:30:19 UTC
Even though the GPU was already so hot that I
couldn't even touch it?
Comment 3 Stas Sergeev 2016-05-29 23:01:44 UTC
Essentially, when the driver initializes, it writes 0
to /sys/class/hwmon/hwmon0/pwm1. And unless you
set up fancontrol (which is a major pita), this
0 remains there, no matter how you load your GPU.
It should write some other value there, like 50 or
more. On my system 50 is the minimum value needed
to get the GPU fan rotating.
Comment 4 Jimi 2016-08-09 17:44:49 UTC
Is this bug still happening? With my R9 Fury on amdgpu, cat /sys/class/hwmon/hwmon0/pwm1 (well, in my case, it's hwmon2 because I have another card), returns 35 on idle, not 0, but the fans are not running. Even when I'm running a AAA game, my card doesn't even reach 40°C, so my cooling system is too good for me to be able to actually see if the fans turn on when they should. When I first started my computer up, pwm1 was giving me 56, but it went down to 35 before I could finish opening my case and has stayed there no matter what I do. When the card is bound to vfio-pci instead of amdgpu (for a virtual machine), the fan is on all the time, even though the card's low idle temperatures must be similar.
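Since the hwmon index differs between machines (hwmon0 in one report, hwmon2 in the other), the directory can be located through its `name` attribute instead of being hard-coded. A minimal sketch, not part of the original comment, assuming the amdgpu driver registers its hwmon device under the name `amdgpu`; the function name `find_amdgpu_hwmon` and its optional root argument are hypothetical and exist only for this example:

```shell
# find_amdgpu_hwmon [ROOT]
# Print the first hwmon directory whose "name" file reads "amdgpu".
# ROOT defaults to the real sysfs location; passing another directory
# is only meant for trying the function out without hardware.
find_amdgpu_hwmon() {
    root="${1:-/sys/class/hwmon}"
    for dir in "$root"/hwmon*; do
        # Skip entries without a readable name (also covers an unmatched glob).
        [ -r "$dir/name" ] || continue
        if [ "$(cat "$dir/name")" = "amdgpu" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    # No amdgpu hwmon device found.
    return 1
}
```

With two amdgpu cards installed this prints only the first match, so the result would still need to be checked against the card in question, for example with `cat "$(find_amdgpu_hwmon)/pwm1"`.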
Comment 5 Jimi 2016-08-09 17:53:37 UTC
I managed to check the fans while pwm1 was giving values like 68, 61, and 56, and they were not turned on. I don't know if that means anything, because the card was still <40 degrees and definitely not too hot to touch.
Comment 6 Stas Sergeev 2016-08-09 19:25:29 UTC
> Is this bug still happening?

For me it is happening like hell.
And because the fancontrol service also doesn't
work on my PC (I've filed other reports
about it), the problems are very real.

> I managed to check the fans while pwm1 was giving values like

You can write the values there, too.
In fact, I wonder who changes them for you.
Do you have fancontrol set up and running?
$ systemctl status fancontrol
Comment 7 Jimi 2016-08-09 19:31:53 UTC
I do not have fancontrol set up or running (it's inactive on my system). I don't know anything about fancontrol at all. I'm running Arch Linux, so I'm pretty much only running services that I know about.

I tried writing values myself with echo, like 'echo 50 > /sys/class/hwmon/hwmon0/pwm1', but that didn't affect it at all. Is that not how you're supposed to change it?
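For reference, the generic hwmon convention is that pwm1 takes a value from 0 to 255 and is only honoured while pwm1_enable is set to 1 (manual). A minimal sketch of that sequence, not part of the original comment; the function name `set_manual_pwm` is hypothetical, and whether the writes take effect on the cards discussed here is exactly what this bug is about, so the function reads both files back rather than assuming success:

```shell
# set_manual_pwm DIR VALUE
# Switch the fan to manual mode, then write VALUE (0-255) to pwm1.
# Needs root when DIR is a real sysfs directory.
set_manual_pwm() {
    dir="$1"
    value="$2"
    # Standard hwmon convention: 1 selects manual control.
    echo 1 > "$dir/pwm1_enable" || return 1
    echo "$value" > "$dir/pwm1" || return 1
    # Read both back so a write that did not stick is visible.
    printf 'pwm1_enable=%s pwm1=%s\n' \
        "$(cat "$dir/pwm1_enable")" "$(cat "$dir/pwm1")"
}
```

Run it from a root shell, for example `set_manual_pwm /sys/class/hwmon/hwmon0 50`. Prefixing a plain `echo 50 > .../pwm1` with sudo is not enough, because the redirection is still performed by the unprivileged shell.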
Comment 8 Alex Deucher 2016-08-09 19:46:37 UTC
By default the hw controls the fan based on temperature, etc.

Not all cards have a fan control. If you do, then the following standard HWMON
pwm attributes should be available:

 * pwm1_enable: Current fan management mode (MANUAL or AUTO)
 * pwm1: Current PWM value (power percentage)
 * pwm1_min: The minimum PWM speed allowed
 * pwm1_max: The maximum PWM speed allowed (bypassed when hitting Fan_boost)

The fan can be driven in different modes:

 * 1: The fan can be driven in manual mode (use pwm1 to change the speed);
 * 2: The fan is driven automatically depending on the temperature.

See:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c#n644
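To check which of these attributes a given card actually exposes, and what they currently contain, something like the following could be used. It is a minimal sketch, not part of the original comment; the function name `show_fan_state` is hypothetical, and the directory has to be the hwmon directory that belongs to the GPU:

```shell
# show_fan_state DIR
# Print the four pwm attributes listed above, one per line,
# marking those that the card does not provide.
show_fan_state() {
    dir="$1"
    for attr in pwm1_enable pwm1 pwm1_min pwm1_max; do
        if [ -r "$dir/$attr" ]; then
            printf '%s=%s\n' "$attr" "$(cat "$dir/$attr")"
        else
            printf '%s=<not available>\n' "$attr"
        fi
    done
}
```

Reading these files does not require root, so `show_fan_state /sys/class/hwmon/hwmon0` is a safe first diagnostic step before writing anything.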
Comment 9 Stas Sergeev 2016-08-09 20:20:39 UTC
(In reply to Alex Deucher from comment #8)
> By default the hw controls the fan based on temperature, etc.
Not for me.

> Not all cards have a fan control. If you do, then the following standard
> HWMON
> pwm attributes should be available:
> 
>  * pwm1_enable: Current fan management mode (MANUAL or AUTO)
I always have '1' there.
Trying to write 0 or 2 there still leaves 1.
It simply doesn't change.

>  * pwm1: Current PWM value (power percentage)
Always 0, unless manually written.

> The fan can be driven in different modes:
> 
>  * 1: The fan can be driven in manual (use pwm1 to change the speed);
>  * 2: The fan is driven automatically depending on the temperature.
What should I write to pwm1_enable? '2'? It doesn't change.
Comment 10 Alex Deucher 2016-08-09 20:34:10 UTC
Please attach your dmesg output and xorg log (if running X).
Comment 11 Stas Sergeev 2016-08-09 20:45:30 UTC
Created attachment 228111
dmesg
Comment 12 Stas Sergeev 2016-08-09 20:47:12 UTC
Created attachment 228121
Xorg log
Comment 13 Jimi 2016-08-10 01:50:49 UTC
That's interesting. My pwm1_enable returns 1, and trying to change it to 0 or 2 does nothing, but my pwm1 value does indeed change on its own, and I've never seen it be 0. It sounds like I don't have this bug but do have some other less major one? If pwm1 is changing on its own, can I trust that the fan will turn on if my card ever gets too hot?
Comment 14 Jimi 2016-08-10 01:53:58 UTC
I should mention, my pwm1_min is 0 and pwm1_max is 255.
Comment 15 Stas Sergeev 2016-08-10 05:39:20 UTC
> I should mention, my pwm1_min is 0 and pwm1_max is 255.

Same here.
IMHO pwm1_min should contain the value that
keeps the fan rotating at a minimal safe speed.
Putting 0 there makes it entirely useless.
Comment 16 Jimi 2016-08-10 08:59:22 UTC
Not necessarily. Fewer fans means less power usage, which means money saved, and as we can see with my computer, you can keep the card cool without its own fans. I have 4 case fans that are plugged directly into power and so are less competent at knowing when to turn off. Something needs to be done about your fan not activating when it should, though.
Comment 17 Stas Sergeev 2016-08-10 18:58:41 UTC
> Not necessarily. Fewer fans means less power usage, which means money saved

You can set up fancontrol or put 0 into pwm1 manually
to stop the fan. But putting 0 into pwm1_min is IMHO
quite useless; it might as well not exist at all.
But if it contained the minimum _safe_ value, then
it could well be used.
Currently fancontrol has to "evaluate" the minimal
safe value by hand. It lowers the pwm1 value and watches
for when the fan stops by checking the value of
fan1_input, if that exists. And it doesn't exist for
amdgpu, so you need to do such a probe by hand.
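The probing procedure described above can be sketched as follows. This is an illustration only, not fancontrol's actual code and not part of the original comment: the function name `probe_min_pwm`, its arguments and the default step and settle time are all made up. It uses fan1_input when the file exists and otherwise asks the person at the keyboard, which is the "by hand" case.

```shell
# probe_min_pwm DIR [START] [STEP] [SETTLE_SECONDS]
# Lower pwm1 from START in steps of STEP and print the lowest value
# at which the fan was still spinning. Needs root on real hardware.
probe_min_pwm() {
    dir="$1"
    start="${2:-255}"
    step="${3:-5}"
    settle="${4:-5}"
    pwm="$start"
    last_ok=""

    echo 1 > "$dir/pwm1_enable" || return 1   # manual mode
    while [ "$pwm" -ge 0 ]; do
        echo "$pwm" > "$dir/pwm1" || return 1
        sleep "$settle"                        # give the fan time to react
        if [ -r "$dir/fan1_input" ]; then
            # Tachometer available: any non-zero RPM counts as spinning.
            if [ "$(cat "$dir/fan1_input")" -gt 0 ]; then
                spinning=y
            else
                spinning=n
            fi
        else
            # No tachometer: ask the user to look at the fan.
            printf 'pwm1=%s: is the fan still spinning? [y/n] ' "$pwm" >&2
            read -r spinning || break
        fi
        [ "$spinning" = y ] || break
        last_ok="$pwm"
        pwm=$((pwm - step))
    done

    # Do not leave the fan stopped: go back to the starting value.
    echo "$start" > "$dir/pwm1"
    printf '%s\n' "$last_ok"
}
```

The printed number is only the point where the fan stops while slowing down. A stopped fan may need a higher value to start again, so the result should be treated as a lower bound rather than as a safe minimum.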
Comment 18 Stas Sergeev 2024-04-03 08:35:39 UTC
I figured out something very interesting
regarding this bug.

Writing 2 to pwm1_enable causes the
fan to rotate for about 10 seconds.
Note that the old value of pwm1_enable
is also 2, so it doesn't change, but
the mere fact of writing has an effect!

And this is not all!
Now if you periodically READ from pwm1,
then the fan doesn't stop! I can do:
while :; do cat /sys/class/hwmon/hwmon0/pwm1; sleep 1; done
And with this, the fan keeps rotating
forever! But if you stop that script
for something like 10 seconds, then
the fan stops and pwm1 reads always
return 0. You need to start again by
writing 2 to pwm1_enable (even if there
is already 2!), and quickly start reading
from pwm1, and you have your fan finally
rotating. :)
A bit of a hand-written fancontrol script. :)

Alex Deucher, can you make any sense
of that?
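Put together, the workaround described in this comment could look like the sketch below. It only automates the observed behaviour (write 2 to pwm1_enable, then keep reading pwm1) and says nothing about why it works. The function name `keep_fan_alive`, its arguments and the rule "a reading of 0 means the fan has stopped, so write 2 again" are assumptions made for this example.

```shell
# keep_fan_alive DIR [INTERVAL_SECONDS] [COUNT]
# COUNT=0 (the default) loops forever; a positive COUNT limits the
# number of reads and is only useful for trying the function out.
keep_fan_alive() {
    dir="$1"
    interval="${2:-1}"
    count="${3:-0}"
    i=0
    # The write that starts the fan, even if the file already holds 2.
    echo 2 > "$dir/pwm1_enable" || return 1
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        value=$(cat "$dir/pwm1") || return 1
        printf '%s\n' "$value"
        # Assumption: a reading of 0 means the fan has stopped again,
        # so repeat the write, as is done when restarting by hand.
        if [ "$value" = 0 ]; then
            echo 2 > "$dir/pwm1_enable" || return 1
        fi
        sleep "$interval"
        i=$((i + 1))
    done
}
```

Started from a root shell as `keep_fan_alive /sys/class/hwmon/hwmon0`, it reads pwm1 once per second until interrupted.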
