Bug 205277 - [amd powerplay] vega10: soc voltage for power state 7 is not changed by overdrive.
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_video-dri
Reported: 2019-10-20 15:59 UTC by Pelle van Gils
Modified: 2019-10-31 05:54 UTC
CC List: 4 users

Kernel Version: 5.4.0-rc4
Regression: No


Attachments
debug patch (1.26 KB, application/mbox)
2019-10-20 15:59 UTC, Pelle van Gils
proposed patch (1.48 KB, patch)
2019-10-20 17:11 UTC, Pelle van Gils
proposed patch v2 (1.56 KB, patch)
2019-10-24 16:03 UTC, Pelle van Gils

Description Pelle van Gils 2019-10-20 15:59:39 UTC
Created attachment 285583 [details]
debug patch

Using Overdrive to set voltage and frequency on a vega10 card does not set the voltage for the highest power state (state 7).

To reproduce:

boot with kernel parameter 'amdgpu.ppfeaturemask=0xffffffff'


cat pp_od_clk_voltage on boot:

OD_SCLK:
0:        852Mhz        800mV
1:        991Mhz        900mV
2:       1138Mhz        950mV
3:       1269Mhz       1000mV
4:       1312Mhz       1050mV
5:       1474Mhz       1100mV
6:       1538Mhz       1150mV
7:       1590Mhz       1200mV
OD_MCLK:
0:        167Mhz        800mV
1:        500Mhz        800mV
2:        700Mhz        900mV
3:        800Mhz        950mV
OD_RANGE:
SCLK:     852MHz       2400MHz
MCLK:     167MHz       1500MHz
VDDC:     800mV        1200mV


set pp_od_clk_voltage:

# cd /sys/class/drm/card0/device

# echo "s 2 1138 910" > pp_od_clk_voltage
# echo "s 3 1269 920" > pp_od_clk_voltage
# echo "s 4 1312 930" > pp_od_clk_voltage
# echo "s 5 1474 940" > pp_od_clk_voltage
# echo "s 6 1538 950" > pp_od_clk_voltage
# echo "s 7 1590 980" > pp_od_clk_voltage

# echo "c" > pp_od_clk_voltage
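
The command sequence above can also be driven from a small program. The following is only a minimal sketch: the helper names `format_sclk_cmd` and `write_od_cmd` are hypothetical, and the "s <level> <MHz> <mV>" layout is taken from the echo lines in this report. Each command is written in its own open/write/close cycle, mirroring one echo per line.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: build one SCLK overdrive line in the same
 * "s <level> <MHz> <mV>" form as the echo commands in this report.
 * Returns the string length, or -1 if buf is too small. */
int format_sclk_cmd(char *buf, size_t len,
                    unsigned level, unsigned mhz, unsigned mv)
{
    assert(buf != NULL);
    int n = snprintf(buf, len, "s %u %u %u", level, mhz, mv);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Hypothetical helper: write a single command plus newline to the
 * given file, as `echo "<cmd>" > path` would. Returns 0 on success. */
int write_od_cmd(const char *path, const char *cmd)
{
    assert(path != NULL && cmd != NULL);
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%s\n", cmd) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

A caller would format each state in turn, pass the result to `write_od_cmd` with the pp_od_clk_voltage path, and finish by writing "c" to commit, as in the shell session above.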


cat pp_od_clk_voltage:

OD_SCLK:
0:        852Mhz        800mV
1:        991Mhz        900mV
2:       1138Mhz        910mV
3:       1269Mhz        920mV
4:       1269Mhz        920mV
5:       1474Mhz        940mV
6:       1538Mhz        950mV
7:       1590Mhz        980mV
OD_MCLK:
0:        167Mhz        800mV
1:        500Mhz        800mV
2:        700Mhz        900mV
3:        800Mhz        950mV
OD_RANGE:
SCLK:     852MHz       2400MHz
MCLK:     167MHz       1500MHz
VDDC:     800mV        1200mV


This all seems fine. The voltages are set for all the power states.
However, when stressing the GPU it still uses its default of 1200mV for power state 7, as can be observed in amdgpu_pm_info:

# cat /sys/kernel/debug/dri/0/amdgpu_pm_info
...
GFX Clocks and Power:
	800 MHz (MCLK)
	1484 MHz (SCLK)
	1269 MHz (PSTATE_SCLK)
	700 MHz (PSTATE_MCLK)
	1200 mV (VDDGFX)
	260.0 W (average GPU)
...


With the attached debug patch applied, which prints the voltages actually written to the vddc_lookup_table, the dmesg output is:

...
[  521.364502] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 0 vddc: 800
[  521.364504] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 1 vddc: 900
[  521.364504] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 2 vddc: 910
[  521.364505] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 3 vddc: 920
[  521.364505] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 4 vddc: 920
[  521.364506] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 5 vddc: 940
[  521.364506] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 6 vddc: 950
...


State 7 is _not_ printed, so it appears the vddc value for state 7 is never set in the new lookup table.
Comment 1 Pelle van Gils 2019-10-20 17:11:23 UTC
Created attachment 285585 [details]
proposed patch

Added a proposed fix.
Comment 2 Pelle van Gils 2019-10-20 17:38:58 UTC
(In reply to Pelle van Gils from comment #1)
> Created attachment 285585 [details]
> proposed patch
> 
> added proposed fix

With this patch applied (together with the debug patch), the dmesg output is:
...
[  107.149105] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 0 vddc: 800
[  107.149107] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 1 vddc: 900
[  107.149108] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 2 vddc: 910
[  107.149109] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 3 vddc: 920
[  107.149109] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 4 vddc: 930
[  107.149110] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 5 vddc: 940
[  107.149111] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 6 vddc: 950
[  107.149112] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 7 vddc: 980
...


And the soc voltage under stress stays at the set value:

# cat /sys/kernel/debug/dri/0/amdgpu_pm_info
...
GFX Clocks and Power:
	800 MHz (MCLK)
	1541 MHz (SCLK)
	1269 MHz (PSTATE_SCLK)
	700 MHz (PSTATE_MCLK)
	981 mV (VDDGFX)
	161.0 W (average GPU)
...
Comment 3 haro41 2019-10-24 11:10:00 UTC
Did you debug this issue? I think the problem could be outside this code. 

I would comment out the if-statement following the for-loop in your proposed patch, because otherwise 'i' points outside the array boundaries there.
Comment 4 stefanspr94 2019-10-24 12:27:16 UTC
(In reply to haro41 from comment #3)
> Did you debug this issue? I think the problem could be outside this code. 
> 
> I would comment out the if-statement following the for-loop in your proposed
> patch, because otherwise 'i' points outside the array boundaries there.

I think the if-statement is fine, as od_vddc_lookup_table->entries[] and podn_vdd_dep->entries[] both hold MAX_REGULAR_DPM_NUMBER members, which is 8, so accessing entries[7] is not out of bounds.

Btw, the patch works for me as well. The card behaves as it should after loading my pp_table, which was not the case before.
Comment 5 haro41 2019-10-24 14:54:21 UTC
In the (now obsolete) proposed code, the variable 'i' will be 8 when the for-loop is done, so the following if-statement will access something outside the array memory.

Something like this may work without problems, but it can also trigger a new problem.
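
The index argument above can be checked in isolation. This is a minimal sketch, not the driver code: `struct level_entry`, `NUM_LEVELS` and `copy_all_levels` are hypothetical stand-ins, and the table size of 8 is taken from the earlier comment. It shows only that a loop running over all n entries leaves its index at n, one past the last valid element.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_LEVELS 8 /* assumed size: the 8 DPM levels discussed above */

/* Hypothetical stand-in for one table entry. */
struct level_entry {
    unsigned vddc; /* voltage in mV */
};

/* Copy all n entries and return the loop index as it stands after
 * the loop has finished. */
size_t copy_all_levels(struct level_entry *dst,
                       const struct level_entry *src, size_t n)
{
    size_t i;

    assert(dst != NULL && src != NULL);
    for (i = 0; i < n; i++)
        dst[i].vddc = src[i].vddc;

    /* Here i == n. The last valid element is dst[n - 1], so any
     * access to dst[i] or src[i] at this point is out of bounds. */
    return i;
}
```

With n equal to `NUM_LEVELS`, the returned index is 8 while the highest valid index is 7, so a statement placed after the loop must not use `i` as a subscript.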
Comment 6 Pelle van Gils 2019-10-24 16:03:20 UTC
Created attachment 285633 [details]
proposed patch v2
Comment 7 Pelle van Gils 2019-10-24 16:10:20 UTC
(In reply to haro41 from comment #3)
> Did you debug this issue? I think the problem could be outside this code. 
> 
> I would comment out the if-statement following the for-loop in your proposed
> patch, because otherwise 'i' points outside the array boundaries there.

Thank you for your reply. I have uploaded a new patch with your suggestion.

It looks to me now that this is not so much a bug as intended behaviour. I would still like to see it changed, though.
Comment 8 haro41 2019-10-24 19:17:00 UTC
I have to agree: the code in its current state only allows overvolting for DPM level 7.

Since the highest performance level is the most interesting one when it comes to undervolting, energy saving and performance maximization, this should be fixed ASAP.

Thanks for your effort, btw.
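
The difference between the two behaviours can be modelled in a few lines. This is only a sketch based on the description in this thread, not the driver source; the function names and the plain `unsigned` arrays are hypothetical. The first function copies every level except the top one and raises the top level only if the request is higher. The second copies all levels unconditionally.

```c
#include <assert.h>
#include <stddef.h>

/* Model of the behaviour described above: levels 0..n-2 take the
 * requested voltage, the top level can only be raised. */
void update_raise_only_top(unsigned *table, const unsigned *requested,
                           size_t n)
{
    size_t i;

    assert(table != NULL && requested != NULL && n > 0);
    for (i = 0; i + 1 < n; i++)
        table[i] = requested[i];

    /* i == n - 1 here, the top level: still inside the array. */
    if (table[i] < requested[i])
        table[i] = requested[i];
}

/* Model of the desired behaviour: every level, including the top
 * one, takes the requested voltage. */
void update_all_levels(unsigned *table, const unsigned *requested,
                       size_t n)
{
    size_t i;

    assert(table != NULL && requested != NULL);
    for (i = 0; i < n; i++)
        table[i] = requested[i];
}
```

Fed the default table and the requested values from the description, the first model leaves level 7 at 1200 mV while the second sets it to 980 mV, which matches the before and after readings of amdgpu_pm_info reported above.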
Comment 9 Alex Deucher 2019-10-28 13:57:20 UTC
(In reply to Pelle van Gils from comment #6)
> Created attachment 285633 [details]
> proposed patch v2

Applied. Thanks!
