Bug 219640

Summary: [REGRESSION, BISECTED] Preferred cores working incorrectly for Zen3 CPU (regression)
Product: Power Management
Reporter: Sebastian Obrusiewicz (sobrus)
Component: cpufreq
Assignee: Mario Limonciello (AMD) (mario.limonciello)
Status: RESOLVED CODE_FIX
Severity: normal
CC: mario.limonciello
Priority: P3
Hardware: AMD
OS: Linux
URL: https://discuss.cachyos.org/t/preferred-cores-stopped-working-for-zen3-cpu/5108/6
Kernel Version: 6.13rc4
Subsystem:
Regression: Yes
Bisected commit-id: 39311a230e04eab2fe7e257ad79922040bfdaf1c
Attachments:
- Outputs comparing affected and unaffected kernels
- potential patch (v1)

Description Sebastian Obrusiewicz 2024-12-30 09:16:21 UTC
Created attachment 307415 [details]
Outputs comparing affected and unaffected kernels

Hi Kernel Team,

I've noticed a preferred-cores regression while using the most recent CachyOS kernels on a Zen3 platform with the amd_pstate_epp driver (5950X, amd_pstate=active).
The regression is clearly visible in ryzen_monitor when running low-thread workloads: instead of sticking to the best cores, the kernel appears to pick cores at random for each subsequent workload run.

A CachyOS team developer has bisected the issue to commit 39311a230e04eab2fe7e257ad79922040bfdaf1c

Original CachyOS forum thread:
https://discuss.cachyos.org/t/preferred-cores-stopped-working-for-zen3-cpu/5108/6

Steps to reproduce:
- use ryzen_monitor or some kind of system monitor to see which cores are being utilized
- run "stress -c 2" or other software that generates a low-thread workload
- observe core utilization, frequency and consumed power
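
As a complement to watching a monitor, the ranking the driver itself exposes can be read from sysfs. The sketch below assumes the per-CPU `amd_pstate_prefcore_ranking` attribute described in the kernel's amd-pstate admin guide is present (larger values are preferred); the function names and the `cpu_root` parameter are illustrative only and are not part of any tool mentioned in this report.

```python
from pathlib import Path


def read_prefcore_ranking(cpu_root="/sys/devices/system/cpu"):
    """Return {logical_cpu: ranking} for CPUs that expose the attribute.

    Assumes amd-pstate publishes 'amd_pstate_prefcore_ranking' under each
    cpuN/cpufreq directory; CPUs without it are skipped silently.
    """
    rankings = {}
    for cpu_dir in Path(cpu_root).glob("cpu[0-9]*"):
        attr = cpu_dir / "cpufreq" / "amd_pstate_prefcore_ranking"
        try:
            rankings[int(cpu_dir.name[3:])] = int(attr.read_text().strip())
        except (OSError, ValueError):
            continue  # attribute missing or unreadable on this CPU
    return rankings


def preferred_cpus(rankings, count=4):
    """Logical CPUs ordered by descending ranking, ties broken by CPU number."""
    return sorted(rankings, key=lambda cpu: (-rankings[cpu], cpu))[:count]


if __name__ == "__main__":
    ranks = read_prefcore_ranking()
    for cpu in preferred_cpus(ranks, count=len(ranks)):
        print(f"cpu{cpu}: {ranks[cpu]}")
```

Running this on both the affected and the unaffected kernel and comparing the output may show whether the rankings themselves differ. Note that these are logical CPU numbers, which do not necessarily match the core numbers shown by ryzen_monitor.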

Below I'm attaching detailed commit info, output of cpupower frequency-info, output of amd-pstate-triage.py and output of ryzen_monitor while running "stress -c 4" for both affected and unaffected kernels.

The differences I've noticed are:
- affected kernel has ITMT:	Y instead of 1
- affected kernel highest performance reported by cpupower frequency-info is 166 instead of 196.

Hardware is a Ryzen 5950X, undervolted using per-core PBO and voltage offset, limited to 4750 MHz and 80 °C, AGESA 1.2.0.7, chipset B550.

Commit info:

commit 39311a230e04eab2fe7e257ad79922040bfdaf1c
Author: Mario Limonciello <mario.limonciello@amd.com>
Date:   Mon Dec 9 12:52:34 2024 -0600

    cpufreq/amd-pstate: Store the boost numerator as highest perf again

    commit ad4caad58d91d ("cpufreq: amd-pstate: Merge
    amd_pstate_highest_perf_set() into amd_get_boost_ratio_numerator()")
    changed the semantics for highest perf and commit 18d9b52271213
    ("cpufreq/amd-pstate: Use nominal perf for limits when boost is disabled")
    worked around those semantic changes.

    This however is a confusing result and furthermore makes it awkward to
    change frequency limits and boost due to the scaling differences. Restore
    the boost numerator to highest perf again.

    Suggested-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
    Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
    Fixes: ad4caad58d91 ("cpufreq: amd-pstate: Merge amd_pstate_highest_perf_set() into amd_get_boost_ratio_numerator()")
    Link: https://lore.kernel.org/r/20241209185248.16301-2-mario.limonciello@amd.com
    Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>

 Documentation/admin-guide/pm/amd-pstate.rst |  4 +---
 drivers/cpufreq/amd-pstate.c                | 25 +++++++++++++++----------
 2 files changed, 16 insertions(+), 13 deletions(-)

Comment 1 Mario Limonciello (AMD) 2025-01-02 02:11:29 UTC
Created attachment 307438 [details]
potential patch (v1)

FWIW that commit ID is a little bit different than what landed in mainline.  This is the mainline commit ID:

https://git.kernel.org/torvalds/c/50a062a762005

Looking at the changes in the commit you identified, I have a patch that might help.  Can you please give it a try?

If that doesn't work, then to help isolate the problem, can you please tell me whether you can also reproduce this issue using the 'linux-next' branch here:

https://git.kernel.org/pub/scm/linux/kernel/git/superm1/linux.git/log/?h=linux-next

While also adding this patch series on top of it?

https://lore.kernel.org/lkml/20241223043407.1611-1-kprateek.nayak@amd.com/#t

In case you're not aware, you can use the tool 'b4' to download and apply that series:

```
b4 shazam https://lore.kernel.org/lkml/20241223043407.1611-1-kprateek.nayak@amd.com/#t
```

Comment 2 Sebastian Obrusiewicz 2025-01-02 10:17:09 UTC
Hi! This patch seems to be working fine (patched kernel build provided by Naim)

Linux version 6.12.7-5-cachyos-test (linux-cachyos-test@cachyos) (clang version 18.1.8, LLD 18.1.8) #1 SMP PREEMPT_DYNAMIC Thu, 02 Jan 2025 09:27:15 +0000

│  Core 0 │   Sleeping |  1.848 W | 1.325 V |  49.62 C | C0:   3.2 % | C1:  96.8 % | C6:   0.0 % │
│  Core 1 │   4750 MHz |  8.691 W | 1.325 V |  65.41 C | C0: 100.0 % | C1:   0.0 % | C6:   0.0 % │
│  Core 2 │   4750 MHz |  8.749 W | 1.325 V |  66.34 C | C0: 100.0 % | C1:   0.0 % | C6:   0.0 % │
│  Core 3 │   Sleeping |  0.699 W | 0.598 V |  47.98 C | C0:   1.2 % | C1:  34.2 % | C6:  64.6 % │
│  Core 4 │   Sleeping |  0.357 W | 0.353 V |  50.10 C | C0:   0.9 % | C1:  12.7 % | C6:  86.4 % │
│  Core 5 │   Sleeping |  0.320 W | 0.330 V |  47.51 C | C0:   1.2 % | C1:  10.3 % | C6:  88.4 % │
│  Core 6 │   4750 MHz |  8.778 W | 1.325 V |  66.78 C | C0: 100.0 % | C1:   0.0 % | C6:   0.0 % │
│  Core 7 │   4750 MHz |  8.709 W | 1.325 V |  65.68 C | C0: 100.0 % | C1:   0.0 % | C6:   0.0 % │
│  Core 8 │   Sleeping |  0.036 W | 0.204 V |  39.22 C | C0:   0.1 % | C1:   0.3 % | C6:  99.6 % │
│  Core 9 │   Sleeping |  0.026 W | 0.200 V |  34.78 C | C0:   0.0 % | C1:   0.0 % | C6: 100.0 % │
│ Core 10 │   Sleeping |  0.030 W | 0.200 V |  39.01 C | C0:   0.0 % | C1:   0.0 % | C6: 100.0 % │
│ Core 11 │   Sleeping |  0.027 W | 0.201 V |  34.86 C | C0:   0.1 % | C1:   0.1 % | C6:  99.9 % │
│ Core 12 │   Sleeping |  0.029 W | 0.200 V |  38.71 C | C0:   0.0 % | C1:   0.0 % | C6: 100.0 % │
│ Core 13 │   Sleeping |  0.039 W | 0.217 V |  34.82 C | C0:   0.1 % | C1:   1.5 % | C6:  98.5 % │
│ Core 14 │   Sleeping |  0.029 W | 0.200 V |  38.34 C | C0:   0.0 % | C1:   0.0 % | C6: 100.0 % │
│ Core 15 │   Sleeping |  0.035 W | 0.216 V |  34.71 C | C0:   0.0 % | C1:   1.4 % | C6:  98.6 % │
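
To compare runs like the one above without reading the table by eye, the rows can be parsed for their C0 residency. This is a minimal sketch that assumes the row layout shown above ("Core N" followed later by "C0: x %"); the `busy_cores` name and the 90 % threshold are arbitrary choices for illustration, not part of ryzen_monitor.

```python
import re

# Matches the core number and its C0 residency in one table row.
ROW = re.compile(r"Core\s+(\d+)\s.*?C0:\s*([\d.]+)\s*%")


def busy_cores(text, threshold=90.0):
    """Return the core numbers whose C0 residency is at least `threshold` percent."""
    cores = []
    for line in text.splitlines():
        match = ROW.search(line)
        if match and float(match.group(2)) >= threshold:
            cores.append(int(match.group(1)))
    return cores
```

Passing the captured table text to `busy_cores` for several consecutive runs and checking whether the returned list stays the same gives a quick indication of whether the workload keeps landing on the same cores.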

Comment 3 Mario Limonciello (AMD) 2025-01-02 16:53:39 UTC
Sent the patch for review.

https://lore.kernel.org/linux-pm/20250102141204.3413202-1-superm1@kernel.org/T/#u