Created attachment 285583 [details] debug patch Using Overdrive to set voltage and frequency on a vega10 card does not set the voltage for the highest power state (state 7). To reproduce: boot with kernel parameter 'amdgpu.ppfeaturemask=0xffffffff' cat pp_od_clk_voltage on boot: OD_SCLK: 0: 852Mhz 800mV 1: 991Mhz 900mV 2: 1138Mhz 950mV 3: 1269Mhz 1000mV 4: 1312Mhz 1050mV 5: 1474Mhz 1100mV 6: 1538Mhz 1150mV 7: 1590Mhz 1200mV OD_MCLK: 0: 167Mhz 800mV 1: 500Mhz 800mV 2: 700Mhz 900mV 3: 800Mhz 950mV OD_RANGE: SCLK: 852MHz 2400MHz MCLK: 167MHz 1500MHz VDDC: 800mV 1200mV set pp_od_clk_voltage: # cd /sys/class/drm/card0/device # echo "s 2 1138 910" > pp_od_clk_voltage # echo "s 3 1269 920" > pp_od_clk_voltage # echo "s 4 1312 930" > pp_od_clk_voltage # echo "s 5 1474 940" > pp_od_clk_voltage # echo "s 6 1538 950" > pp_od_clk_voltage # echo "s 7 1590 980" > pp_od_clk_voltage # echo "c" > pp_od_clk_voltage cat pp_od_clk_voltage: OD_SCLK: 0: 852Mhz 800mV 1: 991Mhz 900mV 2: 1138Mhz 910mV 3: 1269Mhz 920mV 4: 1269Mhz 920mV 5: 1474Mhz 940mV 6: 1538Mhz 950mV 7: 1590Mhz 980mV OD_MCLK: 0: 167Mhz 800mV 1: 500Mhz 800mV 2: 700Mhz 900mV 3: 800Mhz 950mV OD_RANGE: SCLK: 852MHz 2400MHz MCLK: 167MHz 1500MHz VDDC: 800mV 1200mV This all seems fine. The voltages are set for all the power states. But when stressing the gpu it still uses it's default of 1200mV for power state 7, as can be observed in amdgpu_pm_info: # cat /sys/kernel/debug/dri/0/amdgpu_pm_info ... GFX Clocks and Power: 800 MHz (MCLK) 1484 MHz (SCLK) 1269 MHz (PSTATE_SCLK) 700 MHz (PSTATE_MCLK) 1200 mV (VDDGFX) 260.0 W (average GPU) ... Using the attached patch to print the voltages that are actually being set to the vddc_lookup_table the output to dmesg is: ... [ 521.364502] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 0 vddc: 800 [ 521.364504] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 1 vddc: 900 [ 521.364504] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 2 vddc: 910 [ 521.364505] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 3 vddc: 920 [ 521.364505] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 4 vddc: 920 [ 521.364506] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 5 vddc: 940 [ 521.364506] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 6 vddc: 950 ... _Not_ printing state 7. So it appears the vddc value for state 7 is never set in the new lookup table.
Created attachment 285585 [details] proposed patch added proposed fix
(In reply to Pelle van Gils from comment #1) > Created attachment 285585 [details] > proposed patch > > added proposed fix with this patch applied (and the debug patch) dmesg output is: ... [ 107.149105] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 0 vddc: 800 [ 107.149107] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 1 vddc: 900 [ 107.149108] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 2 vddc: 910 [ 107.149109] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 3 vddc: 920 [ 107.149109] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 4 vddc: 930 [ 107.149110] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 5 vddc: 940 [ 107.149111] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 6 vddc: 950 [ 107.149112] amdgpu: [powerplay] vega10 SCLK vddc_lookup_table state: 7 vddc: 980 ... And the soc voltage under stress stays at the set value: # cat /sys/kernel/debug/dri/0/amdgpu_pm_info ... GFX Clocks and Power: 800 MHz (MCLK) 1541 MHz (SCLK) 1269 MHz (PSTATE_SCLK) 700 MHz (PSTATE_MCLK) 981 mV (VDDGFX) 161.0 W (average GPU) ...
Did you debug this issue? I think the problem could be outside this code. I would outcomment the if-statement following for-loop in your proposed patch, because otherwise 'i' points outside the array boundarys here.
(In reply to haro41 from comment #3) > Did you debug this issue? I think the problem could be outside this code. > > I would outcomment the if-statement following for-loop in your proposed > patch, because otherwise 'i' points outside the array boundarys here. I think the if statement is fine as both od_vddc_lookup_table->entries[] and podn_vdd_dep->entries[] both hold MAX_REGULAR_DPM_NUMBER members, which is 8, so accessing entries[7] is not out of bounds. Btw, the patch works for me aswell. Card behaves as it should after loading my pp_table, which was not the case before.
In the (now obsolete) proposed code, the variable 'i' will become 8, when the for-loop is done. The following if-statement will access something outside the array memory. Something like this may work without problems, but it can trigger a new problem too.
Created attachment 285633 [details] proposed patch v2
(In reply to haro41 from comment #3) > Did you debug this issue? I think the problem could be outside this code. > > I would outcomment the if-statement following for-loop in your proposed > patch, because otherwise 'i' points outside the array boundarys here. Thank you for your reply. I have uploaded a new patch with your suggestion. It looks to me now that this is not so much a bug but intended beviour. I would still like to see this changed though.
I have to agree, the code in its current state, only allows overvolting for dpm level 7. Since the highest performance level is the most interesting one, if it comes to undervolting, energy saving and performance maximization, that should be fixed asap. Thanks for your effort, btw.
(In reply to Pelle van Gils from comment #6) > Created attachment 285633 [details] > proposed patch v2 Applied. thanks!