Bug 213569
Summary: | Amdgpu temperature reaching dangerous levels | | |
---|---|---|---|
Product: | Drivers | Reporter: | Martin (martin.tk) |
Component: | Video(DRI - non Intel) | Assignee: | drivers_video-dri |
Status: | NEW --- | | |
Severity: | blocking | CC: | mileikasjos, mrjameshennig |
Priority: | P1 | | |
Hardware: | x86-64 | | |
OS: | Linux | | |
Kernel Version: | 5.13 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Description
Martin
2021-06-24 15:32:43 UTC
I can confirm, but in a different scenario. I'm using Debian bullseye with the LTS kernel and the latest amdgpu firmware. I don't change any fan control mechanism.

5.10.44 and 5.10.45 work fine, but with 5.10.46, if I only start sway (a Wayland window manager), my GPU usage is at 100% without doing anything.

It's a Vega 56.

In my case it was watching a video that made the GPU reach 70°C.

This is a legitimate bug which is present starting with 5.12.13, and the issue was said to have been fixed starting with 5.13-rc8. I wanted to comment out of reassurance that a 70°C edge temperature cannot damage that GPU. Notice "crit = +97.0°C", which is the throttle temperature.

The computer should shut down at the "emerg" temperature, which is not present in your sensors output, but should be +5.0°C over "crit" for your GPU.

(In reply to miloog from comment #1)
> I can confirm.
>
> But in a different scenario. I'm using debian bullseye with lts kernel and
> latest amdgpu firmware. I don't change any fan control mechanism.
>
> 5.10.44 and 5.10.45 works fine but 5.10.46 if i'm only start sway (wayland
> window manager) my gpu usage is at 100% without doing anything.
>
> It's a vega 56.

You are probably hit by a recent regression introduced with kernel 5.10.46 and 5.12.13 (cf. https://bugzilla.kernel.org/show_bug.cgi?id=213561), for which patches are on their way (https://lists.freedesktop.org/archives/amd-gfx/2021-June/065612.html). This is not related to the original bug report here, I presume.

(In reply to James from comment #3)
> This is a legitimate bug which is present starting 5.12.13 and the issue was
> said to have been fixed starting 5.13-rc8. I wanted to comment out of
> reassurance that 70°C edge temperature for that GPU cannot damage it. Notice
> "crit = +97.0°C" which is the throttle temperature.
>
> The computer should shut down at the "emerg" temperature which is not
> present in your sensors output, but should be +5.0°C over "crit" for your
> GPU.

Thank you for the explanation.
I've never seen 70°C on my GPU before, so to me it looked scary. Before those changes landed in 5.11, the usual temperature on my GPU was around 40°C. The fan would be around 1000 rpm, which on my GPU doesn't produce any perceivable sound.
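
For anyone who wants to compare the edge temperature against the "crit" and "emerg" thresholds mentioned above without parsing `sensors` output, here is a minimal sketch that reads the values from hwmon sysfs. It assumes the usual hwmon layout (`/sys/class/hwmon/hwmon*/name` containing `amdgpu`, and `temp1_input`, `temp1_crit`, `temp1_emergency` in millidegrees Celsius). The helper names (`find_amdgpu_hwmon`, `read_celsius`, `describe`) are made up for illustration, and any threshold file may be missing on a given card, as "emerg" is in this report.

```python
from pathlib import Path
from typing import Optional


def read_celsius(path: Path) -> Optional[float]:
    """Read a hwmon temperature file (millidegrees C); None if missing or unreadable."""
    try:
        return int(path.read_text().strip()) / 1000.0
    except (OSError, ValueError):
        return None


def find_amdgpu_hwmon(root: Path = Path("/sys/class/hwmon")) -> Optional[Path]:
    """Return the first hwmon directory whose 'name' file says 'amdgpu'."""
    if not root.is_dir():
        return None
    for hwmon in sorted(root.glob("hwmon*")):
        try:
            if (hwmon / "name").read_text().strip() == "amdgpu":
                return hwmon
        except OSError:
            continue
    return None


def describe(hwmon: Path) -> str:
    """Summarise the edge temperature relative to the crit/emergency thresholds."""
    edge = read_celsius(hwmon / "temp1_input")
    crit = read_celsius(hwmon / "temp1_crit")
    emerg = read_celsius(hwmon / "temp1_emergency")  # absent on some cards
    if edge is None:
        return "no edge temperature reading"
    if emerg is not None and edge >= emerg:
        return f"edge {edge:.1f}°C: at or above emerg ({emerg:.1f}°C)"
    if crit is not None and edge >= crit:
        return f"edge {edge:.1f}°C: at or above crit ({crit:.1f}°C)"
    if crit is not None:
        return f"edge {edge:.1f}°C: {crit - edge:.1f}°C below crit ({crit:.1f}°C)"
    return f"edge {edge:.1f}°C: no crit threshold exposed"


if __name__ == "__main__":
    hwmon_dir = find_amdgpu_hwmon()
    print(describe(hwmon_dir) if hwmon_dir else "no amdgpu hwmon device found")
```

With the numbers from this report (70°C edge, 97°C crit, no emerg file), `describe` would report the card as 27°C below crit.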