Bug 219348
| Summary: | [6.12] amd cpb boost disabling does not lock frequency | | |
|---|---|---|---|
| Product: | Power Management | Reporter: | Peter Jung (ptr1337) |
| Component: | cpufreq | Assignee: | Mario Limonciello (AMD) (mario.limonciello) |
| Status: | NEW --- | | |
| Severity: | high | CC: | christian, Dhananjay.Ugwekar, mario.limonciello, Perry.Yuan |
| Priority: | P3 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | | Subsystem: | |
| Regression: | No | Bisected commit-id: | |

Attachments:
Visualizing the problem
possible patch (v1)
possible patch (v2)
Values of 6.12rc1 clean, 6.12rc1 + v2 patches, 6.12rc1 + revert
possible patch (v3)
cpufreq grep values
possible patch (v4)
possible patch (v5)
Description
Peter Jung
2024-10-03 16:22:03 UTC
This is the mainline commit ID matching what you bisected down to.
https://git.kernel.org/pub/scm/linux/kernel/git/superm1/linux.git/commit/?id=ad4caad58d91d3293880f8074f7ad125490ce636
I believe the issue is that refresh_frequency_limits() and amd_pstate_cpu_boost_update() both end up writing the CPPC register. Let me reproduce and see what makes sense to do here.

Created attachment 306960 [details]
possible patch (v1)
I think I got a handle on what's going on. There are two problems I found.
1) The wrong upper limit is used when EPP limits are rewritten.
2) The CPPC value is written twice with different values.
Please try the attached mbox (it's two patches).
Hi Mario,
I have tested these two patches on top of the 6.12.0rc1 kernel, but the issue is still present.

Created attachment 306962 [details]
possible patch (v2)
OK, let me try again. See if this helps. If it doesn't help then I would ask if you can please capture the MSR values using rdmsr for the following cases:
1) 6.12-rc1 at bootup (no changes; boost will be on)
2) 6.12-rc1 at bootup + turn off boost
3) 6.12-rc1 + revert at bootup (no changes; boost will be on)
4) 6.12-rc1 + revert at bootup + turn off boost
Thanks!
Hi Mario,
You can find the values below. I've used -a because there were some differences on some cores:
```
6.12 rc1 boost enabled
800058b0 800055ab 800062c4 800064c9 800053a6 80005fbf 80005ab5 80005dba
800069d3 800067ce 80006cd8 800073e7 80006edd 800071e2 800076ec 800076ec
800058b0 800055ab 800062c4 800064c9 800053a6 80005fbf 80005ab5 80005dba
800069d3 800067ce 80006cd8 800073e7 80006edd 800071e2 800076ec 800076ec

6.12rc1 boost disabled:
ff00117e ff00117a ff00138c ff001490 ff001177 ff001388 ff001281 ff001285
ff001597 ff001493 ff00159a ff0017a5 ff00169e ff0016a2 ff0017a9 ff0017a9
ff00117e ff00117a ff00138c ff001490 ff001177 ff001388 ff001281 ff001285
ff001597 ff001493 ff00159a ff0017a5 ff00169e ff0016a2 ff0017a9 ff0017a9

6.12rc1 + revert boost enabled:
800053b0 800052ab 800053c4 800053c9 800053a6 800053bf 800052b5 800053ba
800053d3 800052ce 800053d8 800053e7 800053dd 800053e2 800053ec 800053ec
800053b0 800052ab 800053c4 800053c9 800053a6 800053bf 800052b5 800053ba
800053d3 800052ce 800053d8 800053e7 800053dd 800053e2 800053ec 800053ec

6.12rc1 + revert boost disabled:
ff001177 ff001177 ff001177 ff001177 ff001177 ff001177 ff001177 ff001177
ff001177 ff001177 ff001177 ff001177 ff001177 ff001177 ff001177 ff001177
ff001177 ff001177 ff001177 ff001177 ff001177 ff001177 ff001177 ff001177
ff001177 ff001177 ff001177 ff001177 ff001177 ff001177 ff001177 ff001177
```
Thanks :)

OK, can you get me the same MSRs with 6.12-rc1 + v2 patch (both enabled and disabled)?
I see some problems in your above output.
1) It seems like the EPP value isn't written correctly when boost is disabled, revert or not. I think that's fixed in my 6.12-rc1.
2) It seems that the max perf for all CPUs is the same with the revert. I am not sure this is really correct. Do all CPUs have the same "nominal" frequency? I don't think so.
So I'd also like it if you can fully characterize the situation with the v2 patches. Are cores really going above nominal freq?

Created attachment 306973 [details]
Values of 6.12rc1 clean, 6.12rc1 + v2 patches, 6.12rc1 + revert
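To make the EPP, min perf and max perf observations above easier to follow, here is a minimal Python sketch that splits one of the posted values into fields. It assumes the values come from the AMD CPPC request register and that the field layout matches the Linux kernel's AMD_CPPC_* macros (max perf in bits 7:0, min perf in 15:8, desired perf in 23:16, EPP in 31:24); the thread itself does not name the register. The helper name decode_cppc_req is hypothetical.

```python
# Assumption: layout follows the Linux kernel's AMD_CPPC_* macros
# (max perf [7:0], min perf [15:8], desired perf [23:16], EPP [31:24]).
def decode_cppc_req(value: int) -> dict:
    """Split a 32-bit CPPC request value into its four 8-bit fields."""
    return {
        "max_perf": value & 0xFF,
        "min_perf": (value >> 8) & 0xFF,
        "des_perf": (value >> 16) & 0xFF,
        "epp": (value >> 24) & 0xFF,
    }


# First value of each case, copied from the rdmsr output in the thread.
samples = {
    "6.12rc1, boost enabled": 0x800058B0,
    "6.12rc1, boost disabled": 0xFF00117E,
    "6.12rc1 + revert, boost enabled": 0x800053B0,
    "6.12rc1 + revert, boost disabled": 0xFF001177,
}

for label, raw in samples.items():
    fields = decode_cppc_req(raw)
    text = " ".join(f"{name}=0x{val:02x}" for name, val in fields.items())
    print(f"{label}: {text}")
```

Under that assumed layout, the leading ff in both boost-disabled cases is the EPP byte, and the trailing 77 shared by every core with the revert is the max perf byte.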
The revert seems to change the minimum perf to 0x55 (on the first CPU), while the non-revert kernel has 0x58. Is that possibly because of some lowest nonlinear freq related patches or the PPD behavior? Just want to make sure I'm not mixing things up.
At least for your first CPU can you get me output for:
# grep -v /sys/bus/cpu/devices/cpu0/cpufreq/*
w/ 6.12-rc as well as 6.12-rc with the revert? I want to see what else changes.

Created attachment 306993 [details]
possible patch (v3)
I have a guess at what's going on, here's another patch to try.
I would still like that data though even if this doesn't help.
Created attachment 306994 [details]
cpufreq grep values
In the attachment you can find the values of the grep command.
I have also tested v3 and disabling boost is still not working.
Well the revert definitely isn't the right way to go. It's causing a totally wrong max freq calculation when boost is on.
/sys/bus/cpu/devices/cpu0/cpufreq/cpuinfo_max_freq:8180000
/sys/bus/cpu/devices/cpu0/cpufreq/scaling_max_freq:8180000
V3 at least looks more right to me. Can I see the MSRs for that?

I don't see these high values on 6.11 though. Maybe it's due to the partial revert.
RDMSR for 6.12.0rc3 + v3:
Boost on:
800058ab 800056a6 800065c4 800068c9 80005bb0 800060ba 800062bf 80005db5
80006dd3 80006ace 80006fd8 800077e7 800072dd 800075e2 80007aec 80007aec
800058ab 800056a6 800065c4 800068c9 80005bb0 800060ba 800062bf 80005db5
80006dd3 80006ace 80006fd8 800077e7 800072dd 800075e2 80007aec 80007aec
Boost off:
8000587f 8000567c 80006592 80006896 80005b83 8000608b 8000628e 80005d87
80006d9d 80006a99 80006fa1 800077ac 800072a5 800075a8 80007ab0 80007ab0
8000587f 8000567c 80006592 80006896 80005b83 8000608b 8000628e 80005d87
80006d9d 80006a99 80006fa1 800077ac 800072a5 800075a8 80007ab0 80007ab0

Created attachment 307002 [details]
possible patch (v4)
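As a rough cross-check of the v3 "Boost off" readings posted earlier, the sketch below extracts the low byte of each value and compares it with 0x77. Two assumptions are made here and are not confirmed in the thread: the low byte is the max perf field (as in the kernel's AMD_CPPC_MAX_PERF macro), and 0x77, the value every core reported with the revert and boost disabled, is a usable stand-in for the non-boost ceiling. Only the first 16 values are listed because the second 16 in the posted output repeat them.

```python
# First 16 "Boost off" values for 6.12.0rc3 + v3, copied from the thread.
V3_BOOST_OFF = """
8000587f 8000567c 80006592 80006896 80005b83 8000608b 8000628e 80005d87
80006d9d 80006a99 80006fa1 800077ac 800072a5 800075a8 80007ab0 80007ab0
""".split()

# Assumption: 0x77 (seen on every core with the revert and boost off)
# stands in for the non-boost ceiling; the thread does not confirm this.
REFERENCE_MAX_PERF = 0x77


def max_perf(raw_hex: str) -> int:
    """Return bits 7:0 of a hex string, assumed to be the max perf field."""
    return int(raw_hex, 16) & 0xFF


# Pair each position in the rdmsr output with its max perf byte.
above = [
    (index, max_perf(raw))
    for index, raw in enumerate(V3_BOOST_OFF)
    if max_perf(raw) > REFERENCE_MAX_PERF
]

for index, perf in above:
    print(f"entry {index}: max_perf=0x{perf:02x} exceeds 0x{REFERENCE_MAX_PERF:02x}")
print(f"{len(above)} of {len(V3_BOOST_OFF)} entries exceed the reference")
```

With these inputs every entry lands above the reference, which is consistent with the report that disabling boost still had no effect with v3.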
OK, I spent some time tracing this and I think I've got a solution.
The reason I was having a hard time reproducing it is because it's specific to the calculation of the upper boundary for preferred core systems.
Assuming v4 works I'll plan to get this into 6.12-rc, but I still want to pull some of my earlier cleanups too for 6.13 as follow ups.
Gah; nope it seems like this regressed in the other direction now, it's not reaching the highest perf. Back to the drawing board :/

Created attachment 307003 [details]
possible patch (v5)
OK, here's an updated series. The first patch in the series I expect fixes the issue, and that would go for 6.12-rc. The others are cleanups that I would push to 6.13.
Thanks Mario, the v5 series did fix this issue on my 9950X. Boost is correctly disabled and everything is working correctly, see screenshot: https://imgur.com/a/EjbBk6h