Bug 201615
Summary: | [Regression][bisected] commit 37a3a98ef601f89100e3bb657fb0e190b857028c breaks dpm for my hybrid Intel/AMD laptop's dGPU | ||
---|---|---|---|
Product: | Drivers | Reporter: | taijian |
Component: | Sound(ALSA) | Assignee: | Jaroslav Kysela (perex) |
Status: | RESOLVED CODE_FIX | ||
Severity: | high | CC: | taijian, tiwai |
Priority: | P1 | ||
Hardware: | x86-64 | ||
OS: | Linux | ||
Kernel Version: | 4.19-rc4+ | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: |
reversing this commit makes my dGPU power down again
Test fix patch dmesg of kernel 4.19 with both patches applied Revised test patch Fix patch (proeprly formatted) |
Description
taijian
2018-11-03 21:52:02 UTC
Check whether azx_vs_gpu_bound gets called on your system, and see which client_id value is passed. The flag for runtime PM for discrete GPU must be enabled (cleared) there. Hi Takashi, thank you for your prompt answer. I am sorry though, you'll have be be a little more ELI5 with me - I do know how to bisect a kernel bug, but other than that I'll need some handholding to do the debugging that you need. Back over here (https://bugs.freedesktop.org/show_bug.cgi?id=106597) Lukas Wunner did a very good job leading me through the necessary steps (just to give you an idea of my level of ...competence...). So, where would I look for azx_vs_gpu_bound? Should I look in both the 'vanilla' kernel and the one where I reverted your patch? Thank you for your help! For checking the call of azx_vs_gpu_bound(), just put a debug print like: --- a/sound/pci/hda/hda_intel.c +++ b/sound/pci/hda/hda_intel.c @@ -1330,6 +1330,7 @@ static void azx_vs_gpu_bound(struct pci_dev *pci, struct azx *chip = card->private_data; struct hda_intel *hda = container_of(chip, struct hda_intel, chip); + dev_info(&pci->dev, "XXX azx_vs_gpu_bound: client_id=%d\n", client_id); if (client_id == VGA_SWITCHEROO_DIS) hda->need_eld_notify_link = 0; setup_vga_switcheroo_runtime_pm(chip); ... onto the original code (no revert), and reboot and check the kernel message. I guess this print won't be triggered. If my guess is correct, a patch below might fix the issue. Please check it out. Created attachment 279309 [details]
Test fix patch
Created attachment 279317 [details]
dmesg of kernel 4.19 with both patches applied
OK, seems that you were right on the first count - I'm not seeing that debug message printed in the dmesg. However, even with your proposed patch applied, I'm still getting this:
│ File: /sys/kernel/debug/dri/1/amdgpu_pm_info
───────┼──────────────────────────────────────────────
1 │ Clock Gating Flags Mask: 0x3fbcf
2 │ Graphics Medium Grain Clock Gating: On
3 │ Graphics Medium Grain memory Light Sleep: On
4 │ Graphics Coarse Grain Clock Gating: On
5 │ Graphics Coarse Grain memory Light Sleep: On
6 │ Graphics Coarse Grain Tree Shader Clock Gating: Off
7 │ Graphics Coarse Grain Tree Shader Light Sleep: Off
8 │ Graphics Command Processor Light Sleep: On
9 │ Graphics Run List Controller Light Sleep: On
10 │ Graphics 3D Coarse Grain Clock Gating: Off
11 │ Graphics 3D Coarse Grain memory Light Sleep: Off
12 │ Memory Controller Light Sleep: On
13 │ Memory Controller Medium Grain Clock Gating: On
14 │ System Direct Memory Access Light Sleep: Off
15 │ System Direct Memory Access Medium Grain Clock Gating: On
16 │ Bus Interface Medium Grain Clock Gating: Off
17 │ Bus Interface Light Sleep: On
18 │ Unified Video Decoder Medium Grain Clock Gating: On
19 │ Video Compression Engine Medium Grain Clock Gating: On
20 │ Host Data Path Light Sleep: On
21 │ Host Data Path Medium Grain Clock Gating: On
22 │ Digital Right Management Medium Grain Clock Gating: Off
23 │ Digital Right Management Light Sleep: Off
24 │ Rom Medium Grain Clock Gating: On
25 │ Data Fabric Medium Grain Clock Gating: Off
26 │
27 │ GFX Clocks and Power:
28 │ 0 MHz (MCLK)
29 │ 0 MHz (SCLK)
30 │ 300 MHz (PSTATE_SCLK)
31 │ 300 MHz (PSTATE_MCLK)
32 │ 900 mV (VDDGFX)
33 │ 10.57 W (average GPU)
34 │
35 │ GPU Temperature: 49 C
36 │ GPU Load: 0 %
37 │
38 │ UVD: Disabled
39 │
40 │ VCE: Disabled
Thanks for testing. I see the problem in my test patch. vga_switcheroo_ready() isn't suitable there, and we should call it there unconditionally. The revised patch is below. Please give it a try. Created attachment 279319 [details]
Revised test patch
OK, with the revised patch from comment 7 and your debug print from earlier applied on top of 4.20-rc1 I am now getting a kernel print: > $ dmesg | grep -i azx > [14.320229] snd_hda_intel 0000:01:00.1: XXX azx_vs_gpu_bound: client_id=1 Also: File: /sys/kernel/debug/dri/1/amdgpu_pm_info ───────┼─────────────────────────────────────── 1 │ Clock Gating Flags Mask: 0x3fbcf 2 │ Graphics Medium Grain Clock Gating: On 3 │ Graphics Medium Grain memory Light Sleep: On 4 │ Graphics Coarse Grain Clock Gating: On 5 │ Graphics Coarse Grain memory Light Sleep: On 6 │ Graphics Coarse Grain Tree Shader Clock Gating: Off 7 │ Graphics Coarse Grain Tree Shader Light Sleep: Off 8 │ Graphics Command Processor Light Sleep: On 9 │ Graphics Run List Controller Light Sleep: On 10 │ Graphics 3D Coarse Grain Clock Gating: Off 11 │ Graphics 3D Coarse Grain memory Light Sleep: Off 12 │ Memory Controller Light Sleep: On 13 │ Memory Controller Medium Grain Clock Gating: On 14 │ System Direct Memory Access Light Sleep: Off 15 │ System Direct Memory Access Medium Grain Clock Gating: On 16 │ Bus Interface Medium Grain Clock Gating: Off 17 │ Bus Interface Light Sleep: On 18 │ Unified Video Decoder Medium Grain Clock Gating: On 19 │ Video Compression Engine Medium Grain Clock Gating: On 20 │ Host Data Path Light Sleep: On 21 │ Host Data Path Medium Grain Clock Gating: On 22 │ Digital Right Management Medium Grain Clock Gating: Off 23 │ Digital Right Management Light Sleep: Off 24 │ Rom Medium Grain Clock Gating: On 25 │ Data Fabric Medium Grain Clock Gating: Off 26 │ 27 │ PX asic powered off I guess that counts as a success? Looks good, yeah. I'm going to submit the proper patch (attached below) for review to the upstream. Thanks! Created attachment 279323 [details]
Fix patch (proeprly formatted)
Thank you for the quick resolution! The fix was merged to sound git tree and will be likely included in 4.20-rc2, then backported to 4.19.y stable eventually later. |