Created attachment 279303 [details] reversing this commit makes my dGPU power down again I have an Alienware 15R3 with hybrid Intel/AMD (RX470) graphics. I noticed in the development of the 4.19 kernel that at some point dpm for my dGPU started to break - as in: it wouldn't power down anymore, when not in use. I finally got around to properly bisecting the issue and finally identified: From 37a3a98ef601f89100e3bb657fb0e190b857028c Mon Sep 17 00:00:00 2001 From: Takashi Iwai <tiwai@suse.de> Date: Mon, 10 Sep 2018 16:20:25 +0200 Subject: ALSA: hda - Enable runtime PM only for discrete GPU as the commit that broke dpm on my system. Any kernel past this commit does not power down the dGPU when it is idle, reverting this commit restores dpm. I'll be happy to assist in further investigations.
Check whether azx_vs_gpu_bound gets called on your system, and see which client_id value is passed. The flag for runtime PM for discrete GPU must be enabled (cleared) there.
Hi Takashi, thank you for your prompt answer. I am sorry though, you'll have be be a little more ELI5 with me - I do know how to bisect a kernel bug, but other than that I'll need some handholding to do the debugging that you need. Back over here (https://bugs.freedesktop.org/show_bug.cgi?id=106597) Lukas Wunner did a very good job leading me through the necessary steps (just to give you an idea of my level of ...competence...). So, where would I look for azx_vs_gpu_bound? Should I look in both the 'vanilla' kernel and the one where I reverted your patch? Thank you for your help!
For checking the call of azx_vs_gpu_bound(), just put a debug print like: --- a/sound/pci/hda/hda_intel.c +++ b/sound/pci/hda/hda_intel.c @@ -1330,6 +1330,7 @@ static void azx_vs_gpu_bound(struct pci_dev *pci, struct azx *chip = card->private_data; struct hda_intel *hda = container_of(chip, struct hda_intel, chip); + dev_info(&pci->dev, "XXX azx_vs_gpu_bound: client_id=%d\n", client_id); if (client_id == VGA_SWITCHEROO_DIS) hda->need_eld_notify_link = 0; setup_vga_switcheroo_runtime_pm(chip); ... onto the original code (no revert), and reboot and check the kernel message. I guess this print won't be triggered. If my guess is correct, a patch below might fix the issue. Please check it out.
Created attachment 279309 [details] Test fix patch
Created attachment 279317 [details] dmesg of kernel 4.19 with both patches applied OK, seems that you were right on the first count - I'm not seeing that debug message printed in the dmesg. However, even with your proposed patch applied, I'm still getting this: │ File: /sys/kernel/debug/dri/1/amdgpu_pm_info ───────┼────────────────────────────────────────────── 1 │ Clock Gating Flags Mask: 0x3fbcf 2 │ Graphics Medium Grain Clock Gating: On 3 │ Graphics Medium Grain memory Light Sleep: On 4 │ Graphics Coarse Grain Clock Gating: On 5 │ Graphics Coarse Grain memory Light Sleep: On 6 │ Graphics Coarse Grain Tree Shader Clock Gating: Off 7 │ Graphics Coarse Grain Tree Shader Light Sleep: Off 8 │ Graphics Command Processor Light Sleep: On 9 │ Graphics Run List Controller Light Sleep: On 10 │ Graphics 3D Coarse Grain Clock Gating: Off 11 │ Graphics 3D Coarse Grain memory Light Sleep: Off 12 │ Memory Controller Light Sleep: On 13 │ Memory Controller Medium Grain Clock Gating: On 14 │ System Direct Memory Access Light Sleep: Off 15 │ System Direct Memory Access Medium Grain Clock Gating: On 16 │ Bus Interface Medium Grain Clock Gating: Off 17 │ Bus Interface Light Sleep: On 18 │ Unified Video Decoder Medium Grain Clock Gating: On 19 │ Video Compression Engine Medium Grain Clock Gating: On 20 │ Host Data Path Light Sleep: On 21 │ Host Data Path Medium Grain Clock Gating: On 22 │ Digital Right Management Medium Grain Clock Gating: Off 23 │ Digital Right Management Light Sleep: Off 24 │ Rom Medium Grain Clock Gating: On 25 │ Data Fabric Medium Grain Clock Gating: Off 26 │ 27 │ GFX Clocks and Power: 28 │ 0 MHz (MCLK) 29 │ 0 MHz (SCLK) 30 │ 300 MHz (PSTATE_SCLK) 31 │ 300 MHz (PSTATE_MCLK) 32 │ 900 mV (VDDGFX) 33 │ 10.57 W (average GPU) 34 │ 35 │ GPU Temperature: 49 C 36 │ GPU Load: 0 % 37 │ 38 │ UVD: Disabled 39 │ 40 │ VCE: Disabled
Thanks for testing. I see the problem in my test patch. vga_switcheroo_ready() isn't suitable there, and we should call it there unconditionally. The revised patch is below. Please give it a try.
Created attachment 279319 [details] Revised test patch
OK, with the revised patch from comment 7 and your debug print from earlier applied on top of 4.20-rc1 I am now getting a kernel print: > $ dmesg | grep -i azx > [14.320229] snd_hda_intel 0000:01:00.1: XXX azx_vs_gpu_bound: client_id=1 Also: File: /sys/kernel/debug/dri/1/amdgpu_pm_info ───────┼─────────────────────────────────────── 1 │ Clock Gating Flags Mask: 0x3fbcf 2 │ Graphics Medium Grain Clock Gating: On 3 │ Graphics Medium Grain memory Light Sleep: On 4 │ Graphics Coarse Grain Clock Gating: On 5 │ Graphics Coarse Grain memory Light Sleep: On 6 │ Graphics Coarse Grain Tree Shader Clock Gating: Off 7 │ Graphics Coarse Grain Tree Shader Light Sleep: Off 8 │ Graphics Command Processor Light Sleep: On 9 │ Graphics Run List Controller Light Sleep: On 10 │ Graphics 3D Coarse Grain Clock Gating: Off 11 │ Graphics 3D Coarse Grain memory Light Sleep: Off 12 │ Memory Controller Light Sleep: On 13 │ Memory Controller Medium Grain Clock Gating: On 14 │ System Direct Memory Access Light Sleep: Off 15 │ System Direct Memory Access Medium Grain Clock Gating: On 16 │ Bus Interface Medium Grain Clock Gating: Off 17 │ Bus Interface Light Sleep: On 18 │ Unified Video Decoder Medium Grain Clock Gating: On 19 │ Video Compression Engine Medium Grain Clock Gating: On 20 │ Host Data Path Light Sleep: On 21 │ Host Data Path Medium Grain Clock Gating: On 22 │ Digital Right Management Medium Grain Clock Gating: Off 23 │ Digital Right Management Light Sleep: Off 24 │ Rom Medium Grain Clock Gating: On 25 │ Data Fabric Medium Grain Clock Gating: Off 26 │ 27 │ PX asic powered off I guess that counts as a success?
Looks good, yeah. I'm going to submit the proper patch (attached below) for review to the upstream. Thanks!
Created attachment 279323 [details] Fix patch (proeprly formatted)
Thank you for the quick resolution!
The fix was merged to sound git tree and will be likely included in 4.20-rc2, then backported to 4.19.y stable eventually later.