Ever since upgrading to kernel 5.11, and later 5.12, the fan speed on my Radeon RX550 has been erratic, causing the temperature to reach a dangerous level.

sensors output:

amdgpu-pci-0100
Adapter: PCI adapter
vddgfx:      825.00 mV
fan1:        200 RPM  (min = 0 RPM, max = 3500 RPM)
edge:        +69.0°C  (crit = +97.0°C, hyst = -273.1°C)
power1:      7.03 W  (cap = 36.00 W)

I'm afraid it will eventually kill my GPU. I've already reported another bug for 5.11: https://bugzilla.kernel.org/show_bug.cgi?id=212107

From what I gather, there were changes to fan control in 5.11. Is it possible to disable those changes? There were no issues on 5.10: the fan ran at roughly 1000 RPM, and the GPU was cool and quiet. The behaviour from 5.11 onward is dangerous and can cause hardware destruction.
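For anyone who wants to log the fan speed and edge temperature while comparing kernel versions, here is a minimal sketch that reads the same values that sensors reports, straight from sysfs. It assumes the amdgpu hwmon attributes fan1_input, temp1_input, temp1_crit and pwm1_enable, and it assumes the card sits at /sys/class/drm/card0, which may differ on your system. The function name read_hwmon is only illustrative.

```python
from pathlib import Path


def read_hwmon(hwmon_dir):
    """Return fan speed and edge temperature from one hwmon directory."""
    d = Path(hwmon_dir)

    def read_int(name):
        # Not every attribute exists on every GPU, so a missing or
        # unreadable file yields None instead of raising.
        try:
            return int((d / name).read_text().strip())
        except (OSError, ValueError):
            return None

    def to_celsius(millideg):
        # hwmon reports temperatures in millidegrees Celsius.
        return None if millideg is None else millideg / 1000.0

    return {
        "fan_rpm": read_int("fan1_input"),
        "edge_c": to_celsius(read_int("temp1_input")),
        "crit_c": to_celsius(read_int("temp1_crit")),
        # 0 = no fan control, 1 = manual, 2 = automatic
        "pwm_mode": read_int("pwm1_enable"),
    }


if __name__ == "__main__":
    # Assumed location; the card index and hwmon number vary between systems.
    base = Path("/sys/class/drm/card0/device/hwmon")
    for path in sorted(base.glob("hwmon*")):
        print(path, read_hwmon(path))
```

Running this in a loop on 5.10 and again on 5.11 or later should show whether the fan really stays near 200 RPM as the temperature climbs. The script only reads values and does not change any fan setting.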
I can confirm, but in a different scenario. I'm using Debian bullseye with the LTS kernel and the latest amdgpu firmware. I haven't changed any fan control mechanism. 5.10.44 and 5.10.45 work fine, but with 5.10.46, if I only start sway (a Wayland window manager), my GPU usage is at 100% without doing anything. It's a Vega 56.
In my case it was watching a video that made the GPU reach 70°C.
This is a legitimate bug that has been present since 5.12.13, and it is said to have been fixed as of 5.13-rc8. I wanted to reassure you that a 70°C edge temperature cannot damage that GPU. Note "crit = +97.0°C", which is the throttle temperature. The computer should shut down at the "emerg" temperature, which is not present in your sensors output but should be 5.0°C above "crit" for your GPU.
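To put numbers on that, here is a small sketch that compares an edge temperature against "crit" and against an emergency limit. The +5.0°C offset for "emerg" is taken from the comment above and is an assumption, not a value read from the driver. The function name thermal_status is only illustrative.

```python
def thermal_status(edge_c, crit_c, emerg_offset_c=5.0):
    """Classify an edge temperature against crit and an assumed emerg limit."""
    # Assumption: emerg sits a fixed offset above crit (5.0 degC by default).
    emerg_c = crit_c + emerg_offset_c
    if edge_c >= emerg_c:
        state = "emergency"
    elif edge_c >= crit_c:
        state = "critical"
    else:
        state = "ok"
    return {
        "state": state,
        "headroom_c": crit_c - edge_c,  # margin left before throttling
        "emerg_c": emerg_c,
    }


# Values from the sensors output in the original report.
print(thermal_status(69.0, 97.0))
```

With the reported values, the edge temperature is 28°C below "crit", and the assumed "emerg" limit works out to 102°C.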
(In reply to miloog from comment #1)
> I can confirm.
>
> But in a different scenario. I'm using debian bullseye with lts kernel and
> latest amdgpu firmware. I don't change any fan control mechanism.
>
> 5.10.44 and 5.10.45 works fine but 5.10.46 if i'm only start sway (wayland
> window manager) my gpu usage is at 100% without doing anything.
>
> It's a vega 56.

You have probably been hit by a recent regression introduced with kernel 5.10.46 and 5.12.13 (cf. https://bugzilla.kernel.org/show_bug.cgi?id=213561), for which patches are on their way (https://lists.freedesktop.org/archives/amd-gfx/2021-June/065612.html). I presume this is not related to the original bug report here.
(In reply to James from comment #3)
> This is a legitimate bug which is present starting 5.12.13 and the issue was
> said to have been fixed starting 5.13-rc8. I wanted to comment out of
> reassurance that 70°C edge temperature for that GPU cannot damage it. Notice
> "crit = +97.0°C" which is the throttle temperature.
>
> The computer should shut down at the "emerg" temperature which is not
> present in your sensors output, but should be +5.0°C over "crit" for your
> GPU.

Thank you for the explanation. I had never seen 70°C on my GPU before, so it looked scary to me. Before those changes landed in 5.11, the usual temperature of my GPU was around 40°C. The fan ran at around 1000 RPM, which on my GPU doesn't produce any perceptible sound.