Bug 219348

Summary: [6.12] amd cpb boost disabling does not lock frequency
Product: Power Management
Reporter: Peter Jung (ptr1337)
Component: cpufreq
Assignee: Mario Limonciello (AMD) (mario.limonciello)
Status: NEW
Severity: high
CC: christian, Dhananjay.Ugwekar, mario.limonciello, Perry.Yuan
Priority: P3
Hardware: All
OS: Linux
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:
Attachments: Visulizing the problem
possible patch (v1)
possible patch (v2)
Values of 6.12rc1 clean, 6.12rc1 + v2 patches, 6.12rc1 + revert
possible patch (v3)
cpufreq grep values
possible patch (v4)
possible patch (v5)

Description Peter Jung 2024-10-03 16:22:03 UTC
Created attachment 306956 [details]
Visulizing the problem

Hi all,

The 6.12 kernel introduces an amd-pstate regression that makes it impossible to disable boost.

With boost disabled, cpupower reports the correct max frequency (4.3 GHz), but the CPU still reaches its maximum boost frequency; the limit is not enforced.
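
A rough way to check this from user space is to compare each core's current frequency against the expected limit. The following is only a sketch: it assumes the usual cpufreq sysfs layout (`cpuN/cpufreq/scaling_cur_freq`, values in kHz), and the helper names `read_cur_freqs` and `cores_over_limit` are made up for illustration.

```python
import glob
import os
import re


def read_cur_freqs(base="/sys/devices/system/cpu"):
    """Return {cpu number: current frequency in kHz} read from cpufreq sysfs."""
    freqs = {}
    pattern = os.path.join(base, "cpu[0-9]*", "cpufreq", "scaling_cur_freq")
    for path in glob.glob(pattern):
        # Only look at the part after `base` so "cpu" in the base path is ignored.
        match = re.search(r"cpu(\d+)", path[len(base):])
        if match is None:
            continue
        with open(path) as handle:
            freqs[int(match.group(1))] = int(handle.read().strip())
    return freqs


def cores_over_limit(freqs_khz, limit_khz):
    """Return the cores whose sampled frequency is above limit_khz."""
    return {cpu: f for cpu, f in sorted(freqs_khz.items()) if f > limit_khz}
```

Calling `cores_over_limit(read_cur_freqs(), 4_300_000)` under load with boost disabled would list any cores above 4.3 GHz. `scaling_cur_freq` is an instantaneous sample, so it is worth polling a few times.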

I have bisected it down to the following commit: https://lore.kernel.org/lkml/20240905163007.1350840-9-superm1@kernel.org/

```
cpufreq: amd-pstate: Merge amd_pstate_highest_perf_set() into amd_get_boost_ratio_numerator()

The special case in amd_pstate_highest_perf_set() is the value used
for calculating the boost numerator.  Merge this into
amd_get_boost_ratio_numerator() and then use that to calculate boost
ratio.

This allows dropping more special casing of the highest perf value.

Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.sheoy@amd.com>
```

Reverting this commit fixes the frequency limits that are actually **used**.
The attached picture visualizes the issue.

Info:
```
Operating System: CachyOS Linux 
KDE Plasma Version: 6.1.90
KDE Frameworks Version: 6.6.0
Qt Version: 6.8.0
Kernel Version: 6.12.0-rc1 (64-bit)
Graphics Platform: Wayland
Processors: 32 × AMD Ryzen 9 9950X 16-Core Processor
Memory: 62,4 GiB of RAM
Graphics Processor: NVIDIA GeForce RTX 4070 SUPER/PCIe/SSE2
Manufacturer: ASRock
Product Name: X670E Pro RS
```
Comment 1 Mario Limonciello (AMD) 2024-10-03 20:37:39 UTC
This is the mainline commit ID matching what you bisected down to.

https://git.kernel.org/pub/scm/linux/kernel/git/superm1/linux.git/commit/?id=ad4caad58d91d3293880f8074f7ad125490ce636

I believe the issue is that refresh_frequency_limits() and amd_pstate_cpu_boost_update() both end up writing the CPPC register.  Let me reproduce and see what makes sense to do here.
Comment 2 Mario Limonciello (AMD) 2024-10-03 21:31:42 UTC
Created attachment 306960 [details]
possible patch (v1)

I think I got a handle on what's going on.  There are two problems I found.

1) The wrong upper limit is used when EPP limits are rewritten.
2) The CPPC value is written twice with different values.

Please try the attached mbox (it's two patches).
Comment 3 Peter Jung 2024-10-03 21:57:20 UTC
Hi Mario,

I have tested these two patches applied on top of the 6.12.0-rc1 kernel, but the issue is still present.
Comment 4 Mario Limonciello (AMD) 2024-10-04 04:07:38 UTC
Created attachment 306962 [details]
possible patch (v2)

OK, let me try again.  See if this helps.  If it doesn't help then I would ask if you can please capture the MSR values using rdmsr for the following cases:

1) 6.12-rc1 at bootup (no changes; boost will be on)
2) 6.12-rc1 at bootup + turn off boost
3) 6.12-rc1 + revert at bootup (no changes; boost will be on)
4) 6.12-rc1 + revert at bootup + turn off boost

Thanks!
Comment 5 Peter Jung 2024-10-04 08:49:05 UTC
Hi Mario, 

You can find the values below. I've used -a because the values differ between some cores:

```
6.12 rc1 boost enabled

800058b0
800055ab
800062c4
800064c9
800053a6
80005fbf
80005ab5
80005dba
800069d3
800067ce
80006cd8
800073e7
80006edd
800071e2
800076ec
800076ec
800058b0
800055ab
800062c4
800064c9
800053a6
80005fbf
80005ab5
80005dba
800069d3
800067ce
80006cd8
800073e7
80006edd
800071e2
800076ec
800076ec

6.12rc1 boost disabled:
ff00117e
ff00117a
ff00138c
ff001490
ff001177
ff001388
ff001281
ff001285
ff001597
ff001493
ff00159a
ff0017a5
ff00169e
ff0016a2
ff0017a9
ff0017a9
ff00117e
ff00117a
ff00138c
ff001490
ff001177
ff001388
ff001281
ff001285
ff001597
ff001493
ff00159a
ff0017a5
ff00169e
ff0016a2
ff0017a9
ff0017a9

6.12rc1 + revert boost enabled:

800053b0
800052ab
800053c4
800053c9
800053a6
800053bf
800052b5
800053ba
800053d3
800052ce
800053d8
800053e7
800053dd
800053e2
800053ec
800053ec
800053b0
800052ab
800053c4
800053c9
800053a6
800053bf
800052b5
800053ba
800053d3
800052ce
800053d8
800053e7
800053dd
800053e2
800053ec
800053ec

6.12rc1 + revert boost disabled:

ff001177
ff001177
ff001177
ff001177
ff001177
ff001177
ff001177
ff001177
ff001177
ff001177
ff001177
ff001177
ff001177
ff001177
ff001177
ff001177
ff001177
ff001177
ff001177
ff001177
ff001177
ff001177
ff001177
ff001177
ff001177
ff001177
ff001177
ff001177
ff001177
ff001177
ff001177
ff001177

```

Thanks :)
Comment 6 Mario Limonciello (AMD) 2024-10-05 00:53:25 UTC
OK, can you get me the same MSRs with 6.12-rc1 + v2 patch (boost both enabled and disabled)?

I see some problems in your above output.
1) It seems like the EPP value isn't written correctly when boost is disabled, with or without the revert.  I think that's fixed in my 6.12-rc1.
2) It seems that the max perf for all CPUs is the same with the revert.  I am not sure this is really correct.  Do all CPUs have the same "nominal" frequency?  I don't think so.


So I'd also like you to fully characterize the situation with the v2 patches.  Are cores really going above nominal freq?
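
For anyone following along, the dumps above can be split into fields. This is a sketch that assumes the values are reads of the amd-pstate CPPC request register with max perf in bits 0-7, min perf in bits 8-15, desired perf in bits 16-23 and EPP in bits 24-31; the thread does not name the register, and `decode_cppc_req` is a hypothetical helper.

```python
from typing import NamedTuple


class CppcRequest(NamedTuple):
    max_perf: int
    min_perf: int
    des_perf: int
    epp: int


def decode_cppc_req(value: int) -> CppcRequest:
    """Split a 32-bit request value into its four 8-bit fields (assumed layout)."""
    return CppcRequest(
        max_perf=value & 0xFF,
        min_perf=(value >> 8) & 0xFF,
        des_perf=(value >> 16) & 0xFF,
        epp=(value >> 24) & 0xFF,
    )


if __name__ == "__main__":
    # First-CPU values copied from the dumps in comment 5.
    samples = (
        ("6.12-rc1, boost enabled", 0x800058B0),
        ("6.12-rc1, boost disabled", 0xFF00117E),
        ("6.12-rc1 + revert, boost disabled", 0xFF001177),
    )
    for label, raw in samples:
        req = decode_cppc_req(raw)
        print(f"{label}: max={req.max_perf:#04x} min={req.min_perf:#04x} "
              f"des={req.des_perf:#04x} epp={req.epp:#04x}")
```

Under that assumed layout, the top byte is the EPP field discussed in 1) and the low byte is the max perf discussed in 2).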
Comment 7 Peter Jung 2024-10-05 13:38:43 UTC
Created attachment 306973 [details]
Values of 6.12rc1 clean, 6.12rc1 + v2 patches, 6.12rc1 + revert
Comment 8 Mario Limonciello (AMD) 2024-10-07 19:02:33 UTC
With the revert, the minimum perf on the first CPU seems to be 0x55, whereas without the revert it is 0x58.  Is that possibly because of some lowest nonlinear freq related patches or the PPD behavior?

Just want to make sure I'm not mixing things up.

At least for your first CPU, can you get me the output of:
# grep . /sys/bus/cpu/devices/cpu0/cpufreq/*

w/ 6.12-rc as well as 6.12-rc with the revert?

I want to see what else changes.
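
The request above amounts to printing every cpufreq attribute of cpu0 as `path:value`. If a scriptable variant is handy for diffing two kernels, a minimal Python sketch could look like this (`dump_cpufreq` is a hypothetical helper, and the default directory is simply the one from the request):

```python
import os


def dump_cpufreq(directory="/sys/bus/cpu/devices/cpu0/cpufreq"):
    """Return 'path:value' lines for each readable regular file in directory."""
    lines = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        try:
            with open(path) as handle:
                content = handle.read()
        except OSError:
            continue  # some attributes are write-only or need root
        for line in content.splitlines():
            if line.strip():
                lines.append(f"{path}:{line}")
    return lines
```

Saving `"\n".join(dump_cpufreq())` once per kernel and running a plain `diff` over the two files shows which attributes change.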
Comment 9 Mario Limonciello (AMD) 2024-10-09 18:51:25 UTC
Created attachment 306993 [details]
possible patch (v3)

I have a guess at what's going on, here's another patch to try.

I would still like that data though even if this doesn't help.
Comment 10 Peter Jung 2024-10-09 20:00:44 UTC
Created attachment 306994 [details]
cpufreq grep values

In the attachment you can find the values of the grep command.

I have also tested v3 and disabling boost is still not working.
Comment 11 Mario Limonciello (AMD) 2024-10-09 20:24:56 UTC
Well the revert definitely isn't the right way to go.  It's causing a totally wrong max freq calculation when boost is on.

/sys/bus/cpu/devices/cpu0/cpufreq/cpuinfo_max_freq:8180000
/sys/bus/cpu/devices/cpu0/cpufreq/scaling_max_freq:8180000

V3 at least looks more right to me.  Can I see the MSRs for that?
Comment 12 Peter Jung 2024-10-09 20:27:37 UTC
I don't see these high values on 6.11, though. Maybe it's due to the partial revert.

RDMSR for 6.12.0rc3 + v3:

Boost on:
800058ab
800056a6
800065c4
800068c9
80005bb0
800060ba
800062bf
80005db5
80006dd3
80006ace
80006fd8
800077e7
800072dd
800075e2
80007aec
80007aec
800058ab
800056a6
800065c4
800068c9
80005bb0
800060ba
800062bf
80005db5
80006dd3
80006ace
80006fd8
800077e7
800072dd
800075e2
80007aec
80007aec


Boost off:
8000587f
8000567c
80006592
80006896
80005b83
8000608b
8000628e
80005d87
80006d9d
80006a99
80006fa1
800077ac
800072a5
800075a8
80007ab0
80007ab0
8000587f
8000567c
80006592
80006896
80005b83
8000608b
8000628e
80005d87
80006d9d
80006a99
80006fa1
800077ac
800072a5
800075a8
80007ab0
80007ab0
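
One observation from the boost-off dumps: with the revert every core shows the same low byte (0x77), while with v3 the low byte differs per core. A small sketch to summarize that, assuming the low byte of each value is the max perf field (an assumption, the register layout is not stated in this thread); `distinct_max_perf` is a hypothetical helper:

```python
def distinct_max_perf(raw_values):
    """Return the sorted set of low-byte (assumed max perf) values."""
    return sorted({int(value, 16) & 0xFF for value in raw_values})


if __name__ == "__main__":
    # First four cores, boost off, copied from the dumps in this thread.
    v3_boost_off = ["8000587f", "8000567c", "80006592", "80006896"]
    revert_boost_off = ["ff001177", "ff001177", "ff001177", "ff001177"]
    print("v3:", [hex(v) for v in distinct_max_perf(v3_boost_off)])
    print("revert:", [hex(v) for v in distinct_max_perf(revert_boost_off)])
```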
Comment 13 Mario Limonciello (AMD) 2024-10-12 02:09:38 UTC
Created attachment 307002 [details]
possible patch (v4)

OK, I spent some time tracing this and I think I've got a solution.

The reason I was having a hard time reproducing it is because it's specific to the calculation of the upper boundary for preferred core systems.

Assuming v4 works I'll plan to get this into 6.12-rc, but I still want to pull some of my earlier cleanups too for 6.13 as follow ups.
Comment 14 Mario Limonciello (AMD) 2024-10-12 03:05:41 UTC
Gah; nope it seems like this regressed in the other direction now, it's not reaching the highest perf.

Back to the drawing board :/
Comment 15 Mario Limonciello (AMD) 2024-10-12 04:16:22 UTC
Created attachment 307003 [details]
possible patch (v5)

OK, here's an updated series.  The first patch in the series I expect fixes the issue, and that would go for 6.12-rc.  The others are cleanups that I would push to 6.13.
Comment 16 Peter Jung 2024-10-12 13:44:29 UTC
Thanks Mario, the v5 series did fix this issue on my 9950X.

Boost is correctly disabled and everything is working correctly; see the screenshot:
https://imgur.com/a/EjbBk6h