Bug 212107 - Temperature increase by 15°C on radeon gpu
Summary: Temperature increase by 15°C on radeon gpu
Status: REOPENED
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: x86-64 Linux
Importance: P1 blocking
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-03-07 18:29 UTC by Martin
Modified: 2021-06-24 21:03 UTC
CC List: 3 users

See Also:
Kernel Version: 5.11
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg.log (53.93 KB, text/plain)
2021-03-07 18:29 UTC, Martin
kernel config (117.65 KB, text/plain)
2021-03-07 18:30 UTC, Martin

Description Martin 2021-03-07 18:29:33 UTC
Created attachment 295701
dmesg.log

Since upgrading my kernel from 5.10.16 to 5.11.3, I noticed an increase in temperature on my AMD GPU (Radeon RX 550). I later tried both 5.10.20 and 5.11.4, and I see the increase in temperature only on the 5.11 kernels.

In addition to the temperature, I noticed that the GPU fan spins up to maximum RPM for a second or two right after waking the PC from sleep. I've never noticed such behaviour before.

I can't see any errors in the logs, and the system seems to be running normally. No crashes or degraded performance either.

I check the temperature using the sensors utility. For 5.11.4 it shows the following:

1st run:

amdgpu-pci-0100
Adapter: PCI adapter
vddgfx:      962.00 mV 
fan1:         963 RPM  (min =    0 RPM, max = 3500 RPM)
edge:         +54.0°C  (crit = +97.0°C, hyst = -273.1°C)
power1:        9.13 W  (cap =  36.00 W)


2nd run, a few minutes later:

amdgpu-pci-0100
Adapter: PCI adapter
vddgfx:      825.00 mV 
fan1:         978 RPM  (min =    0 RPM, max = 3500 RPM)
edge:         +47.0°C  (crit = +97.0°C, hyst = -273.1°C)
power1:        7.19 W  (cap =  36.00 W)


5.11.3:
amdgpu-pci-0100
Adapter: PCI adapter
vddgfx:      962.00 mV 
fan1:         991 RPM  (min =    0 RPM, max = 3500 RPM)
edge:         +57.0°C  (crit = +97.0°C, hyst = -273.1°C)
power1:        9.08 W  (cap =  36.00 W)


These two are on 5.10.16:

amdgpu-pci-0100
Adapter: PCI adapter
vddgfx:      787.00 mV 
fan1:         976 RPM  (min =    0 RPM, max = 3500 RPM)
edge:         +39.0°C  (crit = +97.0°C, hyst = -273.1°C)
power1:        6.11 W  (cap =  36.00 W)

amdgpu-pci-0100
Adapter: PCI adapter
vddgfx:      962.00 mV 
fan1:         976 RPM  (min =    0 RPM, max = 3500 RPM)
edge:         +40.0°C  (crit = +97.0°C, hyst = -273.1°C)
power1:        8.26 W  (cap =  36.00 W)
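Grouping the edge readings above by kernel makes the size of the jump easier to see. The following is a small Python sketch: it pulls the `edge:` value out of `sensors` output with a regular expression and averages it per kernel. The regex only assumes the line layout shown in this report, and two or three idle samples per kernel are far too few for a firm number.

```python
import re
from statistics import mean

# Matches sensors lines such as:
#   edge:         +54.0°C  (crit = +97.0°C, hyst = -273.1°C)
EDGE_RE = re.compile(r"^edge:\s+([+-]?\d+(?:\.\d+)?)°C", re.MULTILINE)


def edge_temps(sensors_output: str) -> list[float]:
    """Return every amdgpu edge temperature (°C) found in `sensors` output."""
    return [float(value) for value in EDGE_RE.findall(sensors_output)]


# Only the edge lines from the runs quoted above, grouped by kernel.
RUNS = {
    "5.10.16": "edge:         +39.0°C\nedge:         +40.0°C\n",
    "5.11.3": "edge:         +57.0°C\n",
    "5.11.4": "edge:         +54.0°C\nedge:         +47.0°C\n",
}

averages = {kernel: mean(edge_temps(text)) for kernel, text in RUNS.items()}
baseline = averages["5.10.16"]

for kernel, avg in averages.items():
    print(f"{kernel}: mean edge {avg:.1f} °C ({avg - baseline:+.1f} °C vs 5.10.16)")
```

Run as-is, it reports a mean of 39.5 °C for 5.10.16, 57.0 °C for 5.11.3 (+17.5 °C) and 50.5 °C for 5.11.4 (+11.0 °C).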



I'm attaching the parts of the dmesg log I thought relevant.
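sensors (via libsensors) reads these values from the card's hwmon directory in sysfs, so the same numbers can be read directly. A minimal Python sketch, assuming the standard amdgpu hwmon attributes (`temp1_input` in millidegrees Celsius, `fan1_input` in RPM, `in0_input` in millivolts, `power1_average` in microwatts); which files exist depends on the GPU and kernel version, so the reader skips missing ones. `card0` comes from this report; the hwmonN index under it can vary.

```python
from pathlib import Path

# hwmon attribute -> (sensors label, divisor to convert the raw integer).
ATTRIBUTES = {
    "temp1_input": ("edge", 1000.0),            # millidegrees Celsius -> °C
    "fan1_input": ("fan1", 1.0),                # RPM
    "in0_input": ("vddgfx", 1.0),               # millivolts
    "power1_average": ("power1", 1_000_000.0),  # microwatts -> W
}


def find_hwmon(card: str = "card0") -> Path:
    """Return the hwmon directory of a DRM card; the hwmonN index may vary."""
    base = Path(f"/sys/class/drm/{card}/device/hwmon")
    matches = sorted(base.glob("hwmon*")) if base.is_dir() else []
    if not matches:
        raise FileNotFoundError(f"no hwmon directory found for {card}")
    return matches[0]


def read_hwmon(hwmon_dir: Path) -> dict[str, float]:
    """Read whichever of the attributes above exist, in sensors' units."""
    values = {}
    for filename, (label, divisor) in ATTRIBUTES.items():
        path = hwmon_dir / filename
        if path.exists():
            values[label] = int(path.read_text().strip()) / divisor
    return values
```

On the affected machine, `print(read_hwmon(find_hwmon()))` should print the same edge, fan1, vddgfx and power1 values that sensors shows.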
Comment 1 Martin 2021-03-07 18:30:56 UTC
Created attachment 295703
kernel config
Comment 2 Dieter Nützel 2021-03-08 00:18:44 UTC
It could be the ZeroCore thing, which has finally landed in 5.11.
Please verify that your GFX fans are stopped with 5.11 and running with all kernels below 5.11.
Comment 3 Martin 2021-03-08 09:04:17 UTC
(In reply to Dieter Nützel from comment #2)
> It could be the ZeroCore thing, which has finally landed with 5.11.
> Please verify, that your gfx fans stopped with 5.11 and running with all
> kernels below 5.11.

Bloody hell, you're right. On 5.11.4 the GPU fan stops completely, even though sensors reports it as spinning.

I suppose I'm lucky I didn't fry my GPU ._.

How is that even possible?
Comment 4 Alex Deucher 2021-03-08 21:59:23 UTC
The driver turns off the fans for acoustic reasons if the OEM enabled support for the feature in the VBIOS. They will still turn on when the temperature gets high enough.
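As a rough illustration of the behaviour described above, a fan-stop policy acts like a thermostat with hysteresis: the fan stays off while the GPU is cool, starts once the temperature crosses an upper threshold, and stops again only after it drops below a lower one. The Python sketch below is a toy model, not driver code; `ZeroRpmFan` and its 60 °C / 50 °C thresholds are hypothetical, and on real hardware the policy is applied by the amdgpu driver (and, presumably, the GPU's power-management firmware) according to what the OEM enabled in the VBIOS.

```python
from dataclasses import dataclass


@dataclass
class ZeroRpmFan:
    """Toy fan-stop controller with hysteresis; thresholds are hypothetical."""

    start_c: float = 60.0  # hypothetical: spin up at or above this edge temperature
    stop_c: float = 50.0   # hypothetical: stop again at or below this temperature
    spinning: bool = False

    def update(self, edge_c: float) -> bool:
        """Feed one temperature sample and return whether the fan should spin."""
        if not self.spinning and edge_c >= self.start_c:
            self.spinning = True
        elif self.spinning and edge_c <= self.stop_c:
            self.spinning = False
        return self.spinning


fan = ZeroRpmFan()
samples = [47.0, 54.0, 57.0, 61.0, 55.0, 49.0]
print([fan.update(t) for t in samples])
```

With these made-up thresholds the fan stays off through the idle-range readings, turns on at 61 °C, keeps running at 55 °C, and stops at 49 °C. The gap between the two thresholds is what keeps the fan from toggling on every small temperature change.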
Comment 5 Martin 2021-03-09 15:43:24 UTC
(In reply to Alex Deucher from comment #4)
> The driver turns off the fans for acoustic reasons if the OEM enabled
> support for the feature in the vbios.  They will still go on when the
> temperature gets high enough.

OK. I checked again, and the fan does turn on when playing a game (GZDoom).

Too bad it's the quietest fan in my PC :)

I guess I panicked.

If this is the expected behaviour, then this bug can be closed.

Thank you.
Comment 6 Dieter Nützel 2021-03-09 16:06:43 UTC
(In reply to Martin from comment #5)
> (In reply to Alex Deucher from comment #4)
> > The driver turns off the fans for acoustic reasons if the OEM enabled
> > support for the feature in the vbios.  They will still go on when the
> > temperature gets high enough.
> 
> Ok. I checked again and the fan does turn on when playing a game (gzdoom).
> 
> Too bad it's the quietest fan in my PC :)
> 
> I guess I panicked.
> 
> If this is the expected behaviour then this bug can be closed.
> 
> Thank you.

It _is_ expected, and we waited (very) long for it. ;-)
(I tested it regularly with the amd-staging-drm-next kernel.)

You can close it.
(First 'resolved', later 'closed'.)

Greetings
Dieter
Comment 7 Dieter Nützel 2021-03-09 16:11:36 UTC
Addendum (@Alex)
Maybe we could do something about the reported fan speed:
report zero (0) when the fan is stopped.

@Martin
You can verify that the fan speed rises if you put load on your GFX card.
Comment 8 Martin 2021-03-09 20:24:19 UTC
(In reply to Dieter Nützel from comment #7)
> Addendum (@Alex)
> Maybe we could do someting about the reported fan speed.
> Zero (0) if stopped.
> 
> @Martin
> You can verify the fan speed (raise) if you put load on your gfx card.

I've just rebooted into 5.11.5. The GPU fan went to max speed for a second or so. After the computer finished booting, sensors still showed over 3000 RPM, even though at that point the fan was already off:

amdgpu-pci-0100
Adapter: PCI adapter
vddgfx:      825.00 mV
fan1:        3601 RPM  (min =    0 RPM, max = 3500 RPM)
edge:         +47.0°C  (crit = +97.0°C, hyst = -273.1°C)
power1:        7.15 W  (cap =  36.00 W)
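The 3601 RPM reading above lies outside the 0-3500 RPM range advertised on the same line, which hints that the value is not a live tachometer measurement. A small Python sketch of that check; `check_fan_line` is a hypothetical helper that only understands the `sensors` line format shown in this report.

```python
import re
from typing import Optional

# Matches sensors fan lines such as:
#   fan1:        3601 RPM  (min =    0 RPM, max = 3500 RPM)
FAN_RE = re.compile(
    r"^(?P<name>fan\d+):\s+(?P<rpm>\d+) RPM\s+"
    r"\(min =\s*(?P<min>\d+) RPM, max =\s*(?P<max>\d+) RPM\)"
)


def check_fan_line(line: str) -> Optional[str]:
    """Return a warning if the reported speed is outside the advertised min/max."""
    match = FAN_RE.match(line.strip())
    if match is None:
        return None  # not a fan line in the expected format
    rpm, low, high = (int(match[group]) for group in ("rpm", "min", "max"))
    if low <= rpm <= high:
        return None
    return f"{match['name']}: {rpm} RPM is outside the advertised {low}-{high} RPM range"


print(check_fan_line("fan1:        3601 RPM  (min =    0 RPM, max = 3500 RPM)"))
```

This only catches out-of-range values. A stale reading that happens to fall inside the range, such as the roughly 900 RPM reported while the fan was off, passes the check.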

I waited a couple of minutes and then watched a 4K video. The fan turned on, and sensors started showing this:

amdgpu-pci-0100
Adapter: PCI adapter
vddgfx:      962.00 mV
fan1:        1004 RPM  (min =    0 RPM, max = 3500 RPM)
edge:         +57.0°C  (crit = +97.0°C, hyst = -273.1°C)
power1:       12.03 W  (cap =  36.00 W)


So there is a change in the reported fan speed. After turning off the video, the fan turned off again, but the reported fan speed stayed at roughly 900 RPM.

I played a bit of Xonotic. Again, the fan turned on, but the reported fan speed remained roughly the same, around 900 RPM.

I have no way of measuring the actual fan speed while playing a game or watching a video so I don't know if what sensors are reporting is accurate.

sensors --version outputs the following:

sensors version 3.6.0 with libsensors version 3.6.0

For the CPU fan, it does seem to report the actual speed. I recently switched to a new CPU cooler with a new fan, and both the new fan and the old one reported speed changes that corresponded to the actual CPU fan speed. The old cooler had manual fan control, so I could watch the reported speed change live.



P.S. I'm in UTC+1 and probably won't be able to post more tonight.
Comment 9 Martin 2021-04-29 17:01:31 UTC
Hello,

is it possible to return to the behaviour from version 5.10?
Back then my GPU was cool and quiet.

I'm currently running 5.11.17, and the GPU temperature gets to 70°C while the fan is at about 300 RPM.

The above is without touching anything in /sys/class/drm/card0/device/hwmon/hwmon1

When I disable fan control by writing 0 to /sys/class/drm/card0/device/hwmon/hwmon1/pwm1_enable, the fan spins up to top speed. Writing 2 keeps it running at 2000 RPM, and it's loud. That is strange, because the value after booting is already 2.
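For reference, `pwm1_enable` follows the generic hwmon convention, which the amdgpu documentation also lists: 0 means no fan speed control (fan at full speed), 1 means manual control through `pwm1`, and 2 means automatic control. That matches the full-speed behaviour seen when writing 0. The Python sketch below reads and writes the mode; the `hwmon1` path from this report is only an example, since the hwmonN index can change, and writing to the real sysfs file requires root.

```python
from pathlib import Path

# pwm1_enable values per the hwmon sysfs convention.
PWM_MODES = {0: "no control (full speed)", 1: "manual", 2: "automatic"}


def get_fan_mode(hwmon_dir: Path) -> str:
    """Return a human-readable name for the current pwm1_enable value."""
    value = int((hwmon_dir / "pwm1_enable").read_text().strip())
    return PWM_MODES.get(value, f"unknown ({value})")


def set_fan_mode(hwmon_dir: Path, value: int) -> None:
    """Write a pwm1_enable value; on a real sysfs file this needs root."""
    if value not in PWM_MODES:
        raise ValueError(f"unsupported pwm1_enable value: {value}")
    (hwmon_dir / "pwm1_enable").write_text(f"{value}\n")
```

For example, `get_fan_mode(Path("/sys/class/drm/card0/device/hwmon/hwmon1"))` shows which mode is active after boot, and `set_fan_mode(..., 2)` hands control back to the automatic policy after experimenting with 0 or 1.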

Ideally, I would like to return to how it worked on 5.10.
Comment 10 Martin 2021-05-02 07:18:22 UTC
(In reply to Martin from comment #9)
> 
> I'm running 5.11.17 currently and temperature on the GPU gets to 70°C but
> fan is at like 300rpm.
> 

This isn't always reproducible. I thought it might be related to suspending my PC, but over the last few days the temperature has stayed around 55°C.
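To check whether the high readings follow a suspend/resume cycle, one option is to log temperature and fan speed at a fixed interval and compare the timestamps with when the PC was suspended. A minimal Python sketch; the hwmon path and the sampling interval are assumptions, and as noted earlier in this report, `fan1_input` may not reflect the real fan speed while the fan is stopped, so the temperature trend is the more useful column.

```python
import time
from pathlib import Path
from typing import Iterator, Tuple


def read_int(path: Path) -> int:
    """Read a sysfs attribute that holds a single integer."""
    return int(path.read_text().strip())


def sample(hwmon_dir: Path, count: int, interval_s: float) -> Iterator[Tuple[str, float, int]]:
    """Yield (local time, edge °C, fan RPM) count times, interval_s seconds apart."""
    for i in range(count):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        temp_c = read_int(hwmon_dir / "temp1_input") / 1000  # millidegrees -> °C
        rpm = read_int(hwmon_dir / "fan1_input")
        yield stamp, temp_c, rpm
        if i + 1 < count:
            time.sleep(interval_s)
```

For example, iterating over `sample(Path("/sys/class/drm/card0/device/hwmon/hwmon1"), count=1440, interval_s=60)` and printing each row covers one day at one-minute resolution. Because `sample` is a generator, each row can be printed or written to a file as soon as it is taken.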
Comment 11 miloog 2021-06-24 21:03:58 UTC
I can confirm.

But in a different scenario: I'm using Debian bullseye with the LTS kernel and the latest amdgpu firmware. I don't change any fan control settings.

5.10.44 and 5.10.45 work fine, but on 5.10.46, if I only start sway (a Wayland window manager), my GPU usage is at 100% without my doing anything.

It's a Vega 56.
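The amdgpu driver exposes GPU load as `gpu_busy_percent` in the device's sysfs directory, which gives a tool-independent way to compare 5.10.45 and 5.10.46 with only sway running. The Python sketch below averages a few samples; `card0` and the sampling parameters are assumptions (on a multi-GPU system the card index may differ).

```python
import time
from pathlib import Path


def busy_percent(device_dir: Path) -> int:
    """Read the instantaneous GPU load (0-100) reported by amdgpu."""
    return int((device_dir / "gpu_busy_percent").read_text().strip())


def average_busy(device_dir: Path, samples: int = 10, interval_s: float = 0.5) -> float:
    """Average several gpu_busy_percent readings taken interval_s seconds apart."""
    readings = []
    for i in range(samples):
        readings.append(busy_percent(device_dir))
        if i + 1 < samples:
            time.sleep(interval_s)
    return sum(readings) / len(readings)
```

Running `average_busy(Path("/sys/class/drm/card0/device"))` on an idle sway session under each kernel would show whether the 100% figure comes from the driver itself or from the monitoring tool.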
