Bug 213823 - Broken power management for amdgpu
Summary: Broken power management for amdgpu
Status: RESOLVED OBSOLETE
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: All Linux
Importance: P1 high
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-07-22 20:03 UTC by Bruno Pagani
Modified: 2022-10-02 17:42 UTC
CC List: 2 users

See Also:
Kernel Version: 5.13.4
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
amdgpu dmesg output on 5.12 (1.17 KB, text/plain)
2021-07-22 20:03 UTC, Bruno Pagani
amdgpu dmesg output on 5.13 (1.37 KB, text/plain)
2021-07-22 20:04 UTC, Bruno Pagani
dmesg 5.12 (81.72 KB, text/plain)
2021-07-22 22:06 UTC, Bruno Pagani
dmesg 5.13 (82.03 KB, text/plain)
2021-07-22 22:06 UTC, Bruno Pagani

Description Bruno Pagani 2021-07-22 20:03:34 UTC
Created attachment 298003
amdgpu dmesg output on 5.12

After upgrading to kernel 5.13.4 (from 5.12.15, on Arch Linux), I noticed that my AMD dGPU no longer powers off, which results in increased power consumption, heat, and noise (because the fan keeps running to dissipate the heat).

I’ve compared the dmesg output of both kernels and found the following amdgpu-related differences:

@@ -1,4 +1,6 @@
 [drm] amdgpu kernel modesetting enabled.
+amdgpu: CRAT table not found
+amdgpu: Virtual CRAT table created for CPU
 amdgpu: Topology: Add CPU node
 fb0: switching to amdgpudrmfb from EFI VGA
 amdgpu 0000:01:00.0: enabling device (0006 -> 0007)
@@ -14,7 +16,10 @@ amdgpu 0000:01:00.0: amdgpu: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
 [drm] amdgpu: 4096M of VRAM memory ready
 [drm] amdgpu: 4096M of GTT memory ready.
 amdgpu: hwmgr_sw_init smu backed is vegam_smu
+kfd kfd: amdgpu: Allocated 3969056 bytes on gart
+amdgpu: Virtual CRAT table created for GPU
 amdgpu: Topology: Add dGPU node [0x694f:0x1002]
+kfd kfd: amdgpu: added device 1002:694f
 amdgpu 0000:01:00.0: amdgpu: SE 4, SH per SE 1, CU per SH 6, active_cu_number 20
-amdgpu 0000:01:00.0: amdgpu: Using ATPX for runtime pm
-[drm] Initialized amdgpu 3.40.0 20150101 for 0000:01:00.0 on minor 1
+amdgpu 0000:01:00.0: amdgpu: Using BOCO for runtime pm
+[drm] Initialized amdgpu 3.41.0 20150101 for 0000:01:00.0 on minor 1

I’ve attached both excerpts matching the diff above.
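
A comparison like the one above can be reproduced with a short script. The following is a minimal Python sketch, assuming both logs were saved as plain text (for example by redirecting `dmesg` to a file). The function names `amdgpu_lines` and `diff_dmesg` and the `dmesg-5.12`/`dmesg-5.13` labels are illustrative only, and the simple substring filter on "amdgpu" is an assumption about which lines are relevant.

```python
import difflib
import re

# Matches the leading "[   12.345678] " timestamp that dmesg prints by default.
# Tags such as "[drm]" are left alone because they contain no digits.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def amdgpu_lines(dmesg_text):
    """Return the lines mentioning amdgpu, with timestamps stripped."""
    lines = []
    for line in dmesg_text.splitlines():
        line = TIMESTAMP.sub("", line)
        if "amdgpu" in line:
            lines.append(line)
    return lines


def diff_dmesg(old_text, new_text):
    """Return a unified diff (list of lines) of the amdgpu-related messages."""
    return list(
        difflib.unified_diff(
            amdgpu_lines(old_text),
            amdgpu_lines(new_text),
            fromfile="dmesg-5.12",  # illustrative label
            tofile="dmesg-5.13",    # illustrative label
            lineterm="",
        )
    )
```

Stripping timestamps first keeps the diff limited to message changes, since boot timings differ on every run. Calling `diff_dmesg(open("old.txt").read(), open("new.txt").read())` with two saved logs (file names hypothetical) and printing the result line by line gives output in the same style as the diff above.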

FWIW, this is a Dell Precision 5530 2-in-1 with a Kaby Lake-G CPU, which has an Intel HD 630 iGPU as well as an AMD Polaris 22 MGL XL [Radeon Pro WX Vega M GL] dGPU.

Please tell me if there is anything else that I can provide in order to fix this.
Comment 1 Bruno Pagani 2021-07-22 20:04:12 UTC
Created attachment 298005
amdgpu dmesg output on 5.13
Comment 2 Alex Deucher 2021-07-22 20:16:54 UTC
Please attach your full dmesg outputs.  Can you bisect?
Comment 3 Bruno Pagani 2021-07-22 22:06:28 UTC
Created attachment 298009
dmesg 5.12
Comment 4 Bruno Pagani 2021-07-22 22:06:47 UTC
Created attachment 298011
dmesg 5.13
Comment 5 Bruno Pagani 2021-07-22 22:14:01 UTC
(In reply to Alex Deucher from comment #2)
> Please attach your full dmesg outputs.  Can you bisect?

Done. Unfortunately, I can’t bisect right now: I’ve never done it before, so while I expect to be technically able to do it, I guess it will take me some time to set up (I have never compiled a kernel myself either), and time is something I definitely lack at the moment (several deadlines to meet each week until the end of August).

Since I can live with a 5.12 kernel (or even 5.10 LTS), I’m fine with this waiting until I have time to set up bisecting, if need be.
Comment 6 Bruno Pagani 2021-11-14 14:03:47 UTC
So while I still don’t have time to set up bisecting, I’m now affected even on the LTS kernel. Also, I’ve been in touch with other users who have a similar laptop (the XPS version instead of the Precision, but still Kaby Lake-G), and they don’t seem to be affected. Thus I’m no longer sure this is a kernel issue (or whether BOCO vs ATPX is relevant). Where should I seek guidance in understanding why my dGPU stays stuck in D0 instead of going into D3? Is this or https://gitlab.freedesktop.org/drm/amd an appropriate place?
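
For anyone trying to confirm the same symptom, the kernel's runtime power management state for a PCI device is exposed under the device's `power/` directory in sysfs. The following is a minimal Python sketch, assuming the dGPU sits at PCI address `0000:01:00.0` as in the dmesg output above; the helper name `read_runtime_pm` is illustrative only.

```python
from pathlib import Path

# Standard runtime PM attributes under <device>/power/ in sysfs.
ATTRIBUTES = (
    "control",                 # "auto" allows runtime suspend, "on" forbids it
    "runtime_status",          # e.g. "active" or "suspended"
    "runtime_active_time",     # milliseconds spent active
    "runtime_suspended_time",  # milliseconds spent suspended
)


def read_runtime_pm(device_dir):
    """Read runtime PM attributes from a device's sysfs directory.

    Attributes that are missing or unreadable are reported as None.
    """
    power = Path(device_dir) / "power"
    result = {}
    for name in ATTRIBUTES:
        try:
            result[name] = (power / name).read_text().strip()
        except OSError:
            result[name] = None
    return result
```

Calling `read_runtime_pm("/sys/bus/pci/devices/0000:01:00.0")` on an idle system and seeing `runtime_status` remain `active` while `control` is `auto` would suggest that the device is not being runtime suspended. It does not by itself identify what is keeping the device awake.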
Comment 7 Bruno Pagani 2022-10-02 17:42:24 UTC
Closing this old bug: the issue seems to be gone on newer kernels.
