Bug 211305
Summary: | schedutil selects low P-States on AMD EPYC with frequency invariance | ||
---|---|---|---|
Product: | Power Management | Reporter: | Giovanni Gherdovich (ggherdovich) |
Component: | cpufreq | Assignee: | linux-pm (linux-pm) |
Status: | RESOLVED PATCH_ALREADY_AVAILABLE | ||
Severity: | normal | CC: | gardotd426, ggherdovich, rjw, rric |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | v5.11-rc1 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
plot of mpstat activity data
plot of frequency requests from the tracepoint power:cpu_frequency
plot of frequency data from hardware feedback (APERF, MPERF)
plot of PELT root runqueues utilization
cpufreq: ACPI: Set cpuinfo.max_freq directly if max boost is known
cpufreq: ACPI: Set cpuinfo.max_freq directly if max boost is known (v2) |
Description
Giovanni Gherdovich
2021-01-21 00:55:45 UTC
Created attachment 294791 [details]
plot of mpstat activity data
Activity data of good and bad kernel. The plot shows that the test is CPU-bound.
Created attachment 294793 [details]
plot of frequency requests from the tracepoint power:cpu_frequency
The tracepoint shows that, on the bad kernel, schedutil almost exclusively requests the minimum P-State.
Created attachment 294795 [details]
plot of frequency data from hardware feedback (APERF, MPERF)
"cpupower monitor" shows that the bad kernel actually runs at the minimum P-State.
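For readers unfamiliar with the hardware feedback counters: the average effective frequency over an interval is commonly derived as the base frequency multiplied by the ratio of the APERF and MPERF deltas. The Python sketch below only illustrates that arithmetic. The function name, the 2 GHz base frequency, and the counter deltas are hypothetical and are not taken from the attached plot.

```python
def average_frequency_khz(base_khz, delta_aperf, delta_mperf):
    """Estimate the average effective frequency over a sampling interval.

    MPERF advances at a fixed reference rate while the CPU is not idle,
    APERF advances at the actual clock rate, so their ratio scales the
    reference (base) frequency. All inputs here are illustrative.
    """
    if delta_mperf <= 0:
        raise ValueError("MPERF did not advance; the CPU may have been idle")
    return base_khz * delta_aperf / delta_mperf


if __name__ == "__main__":
    # Hypothetical sample: APERF advanced half as much as MPERF,
    # so the CPU averaged half of the (assumed) 2 GHz base frequency.
    print(average_frequency_khz(2_000_000, 500, 1000))
```

A ratio below 1 indicates the CPU ran below the reference frequency during the interval, which is the kind of evidence the plot relies on.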
Created attachment 294797 [details]
plot of PELT root runqueues utilization
The PELT utilization for root runqueues on the bad kernel is roughly half of what it was on the good kernel (~450 vs ~825).
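To see why a halved utilization matters: when frequency invariance is active, schedutil computes its request as roughly 1.25 * max_freq * util / 1024. The Python sketch below applies that formula to the two utilization values quoted above. The 2 GHz max_freq is a hypothetical placeholder (not the EPYC's actual value), and the final clamp is a simplification of how the cpufreq core bounds requests to the policy limits.

```python
SCHED_CAPACITY_SCALE = 1024  # full-scale utilization in the scheduler


def schedutil_next_freq_khz(util, max_freq_khz, capacity=SCHED_CAPACITY_SCALE):
    """Approximate the frequency schedutil requests for a given utilization.

    Mirrors the integer form (freq + freq / 4) * util / capacity, i.e.
    1.25 * max_freq * util / capacity, then clamps to max_freq
    (the clamp is an assumption made to keep the sketch self-contained).
    """
    raw = (max_freq_khz + (max_freq_khz >> 2)) * util // capacity
    return min(raw, max_freq_khz)


if __name__ == "__main__":
    max_freq = 2_000_000  # hypothetical cpuinfo.max_freq in kHz
    for label, util in (("bad kernel", 450), ("good kernel", 825)):
        print(label, util, schedutil_next_freq_khz(util, max_freq))
```

Under these assumptions a utilization of ~825 is enough to request the maximum frequency, while ~450 asks for only a little over half of it, which the driver then has to map onto one of the few available P-States.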
A candidate fix for this problem has been posted to LKML:
https://lore.kernel.org/lkml/20210122204038.3238-1-ggherdovich@suse.cz

So, the replacement patch from Rafael causes Zen 3 frequency reporting to be ALL jacked up.

Before the patch, core frequencies in /proc/cpuinfo as well as using tools like nmon seemed accurate. After testing Rafael's patch, my core frequencies are all up around 6 GHz (!), and even external tools like Geekbench report my 5800X's BASE clock as 6.0 GHz (https://browser.geekbench.com/v5/cpu/6466982)

I'm sure this isn't intended behavior. The patch was merged like yesterday into the mainline kernel, so should I file an actual bug report?

On Fri, Feb 12, 2021 at 6:29 PM <bugzilla-daemon@bugzilla.kernel.org> wrote:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=211305
>
> Matt McDonald (gardotd426@gmail.com) changed:
>
> What |Removed |Added
> ----------------------------------------------------------------------------
> CC| |gardotd426@gmail.com
>
> --- Comment #6 from Matt McDonald (gardotd426@gmail.com) ---
> So, the replacement patch from Rafael causes Zen 3 frequency reporting to be
> ALL jacked up.
>
> Before the patch, core frequencies in /proc/cpuinfo as well as using tools
> like nmon seemed accurate. After testing Rafael's patch, my core frequencies
> are all up around 6 GHz (!), and even external tools like Geekbench report
> my 5800X's BASE clock as 6.0 GHz (https://browser.geekbench.com/v5/cpu/6466982)
>
> I'm sure this isn't intended behavior.

If the reported frequencies are like that all the time, then it isn't.

What is there in scaling_cur_freq in sysfs if the system is idle?

> The patch was merged like yesterday into
> the mainline kernel, so should I file an actual bug report?

It doesn't particularly matter, because I have seen this comment from you.

Created attachment 295255 [details]
cpufreq: ACPI: Set cpuinfo.max_freq directly if max boost is known
Attached is a tentative fix on top of commit 3c55e94c0ade ("cpufreq: ACPI: Extend frequency tables to cover boost frequencies").
Please give it a go and report back.
Comment on attachment 295255 [details]
cpufreq: ACPI: Set cpuinfo.max_freq directly if max boost is known
Yeah sure thing, I'm building now.
Okay so that's *way* worse.

Everything's limited and locked to 2.2GHz. And yes, it's actually running at 2.2GHz, it's not misreporting. My Geekbench score was less than a third of what it should be

cat /proc/cpuinfo | grep MHz
cpu MHz : 2200.000
cpu MHz : 2200.088
cpu MHz : 2200.000
cpu MHz : 2200.000
cpu MHz : 2200.000
cpu MHz : 2199.982
cpu MHz : 2200.000
cpu MHz : 2200.000
cpu MHz : 2200.000
cpu MHz : 2200.000
cpu MHz : 2200.000
cpu MHz : 2200.000
cpu MHz : 2200.000
cpu MHz : 2200.000
cpu MHz : 2200.000
cpu MHz : 2200.000

cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq
2199680
2199981
2195932
2199979
2195634
2198726
2199437
2195587
2197662
2198924
2198856
2195535
2196402
2199234
2199880
2195064

analyzing CPU 0:
  driver: acpi-cpufreq
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency: Cannot determine or is not supported.
  hardware limits: 2.20 GHz - 6.00 GHz
  available frequency steps: 3.80 GHz, 2.80 GHz, 2.20 GHz
  available cpufreq governors: performance schedutil
  current policy: frequency should be within 2.20 GHz and 2.20 GHz.
                  The governor "schedutil" may decide which speed to use
                  within this range.
  current CPU frequency: 2.20 GHz (asserted by call to hardware)
  boost state support:
    Supported: yes
    Active: no
    Boost States: 0
    Total States: 3
    Pstate-P0: 1000MHz
    Pstate-P1: 700MHz
    Pstate-P2: 500MHz

Both schedutil and performance governors had no effect.

But I do see in that cpupower output that it says the hardware limits happen to be 2.20GHz to 6.0GHz.

Oh, I can also add that the previous patch that was turned down and replaced with this patchset doesn't cause this issue, cpu frequency and frequency reporting work as expected with that patch, and I'm able to boost up to 4750MHz under full load and 5GHz under single-core load.

(In reply to Matt McDonald from comment #10)
> Okay so that's *way* worse.
>
> Everything's limited and locked to 2.2GHz. And yes, it's actually running at
> 2.2GHz, it's not misreporting. My Geekbench score was less than a third of
> what it should be
>
> cat /proc/cpuinfo | grep MHz
> cpu MHz : 2200.000
> cpu MHz : 2200.088
> cpu MHz : 2200.000
> cpu MHz : 2200.000
> cpu MHz : 2200.000
> cpu MHz : 2199.982
> cpu MHz : 2200.000
> cpu MHz : 2200.000
> cpu MHz : 2200.000
> cpu MHz : 2200.000
> cpu MHz : 2200.000
> cpu MHz : 2200.000
> cpu MHz : 2200.000
> cpu MHz : 2200.000
> cpu MHz : 2200.000
> cpu MHz : 2200.000

This actually doesn't mean that the CPUs are running at the given frequency.

>
> cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq
> 2199680
> 2199981
> 2195932
> 2199979
> 2195634
> 2198726
> 2199437
> 2195587
> 2197662
> 2198924
> 2198856
> 2195535
> 2196402
> 2199234
> 2199880
> 2195064

And so this.

> analyzing CPU 0:
> driver: acpi-cpufreq
> CPUs which run at the same hardware frequency: 0
> CPUs which need to have their frequency coordinated by software: 0
> maximum transition latency: Cannot determine or is not supported.
> hardware limits: 2.20 GHz - 6.00 GHz
> available frequency steps: 3.80 GHz, 2.80 GHz, 2.20 GHz
> available cpufreq governors: performance schedutil
> current policy: frequency should be within 2.20 GHz and 2.20 GHz.
> The governor "schedutil" may decide which speed to use
> within this range.
> current CPU frequency: 2.20 GHz (asserted by call to hardware)
> boost state support:
> Supported: yes
> Active: no
> Boost States: 0
> Total States: 3
> Pstate-P0: 1000MHz
> Pstate-P1: 700MHz
> Pstate-P2: 500MHz
>
>
> Both schedutil and performance governors had no effect.
>
> But I do see in that cpupower output that it says the hardware limits happen
> to be 2.20GHz to 6.0GHz.

That's as expected.

(In reply to Matt McDonald from comment #11)
> Oh, I can also add that the previous patch that was turned down and replaced
> with this patchset doesn't cause this issue, cpu frequency and frequency
> reporting work as expected with that patch, and I'm able to boost up to
> 4750MHz under full load and 5GHz under single-core load.

So what do you see in /proc/cpuinfo and scaling_cur_freq with commits 3c55e94c0ade and d11a1d08a082 reverted?

Also can you please enable dynamic debug in freq_table.c, unload acpi-cpufreq, load it again and attach the output of dmesg?

Created attachment 295295 [details]
cpufreq: ACPI: Set cpuinfo.max_freq directly if max boost is known (v2)
I found a mistake in the previous version of the fix patch: it didn't initialize policy->max properly.
Please test this one instead. There is no need to provide the information requested in the previous comments (at least not ATM).
Thanks!
Haha I'd just typed my response and bugzilla stopped me from submitting. That's a cool feature.

Yeah, I'll build and test now.

That does seem to have fixed it:

cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq
4854354
3823787
3647266
4016171
3576030
3974600
3816628
3590646
3919312
3626692
3618178
3597246
4367040
3599805
3837612
3874146

cat /proc/cpuinfo | grep MHz
cpu MHz : 3800.000
cpu MHz : 3800.000
cpu MHz : 3800.000
cpu MHz : 3800.000
cpu MHz : 3800.000
cpu MHz : 3800.000
cpu MHz : 3800.000
cpu MHz : 3800.000
cpu MHz : 4193.751
cpu MHz : 3800.000
cpu MHz : 3800.000
cpu MHz : 3800.000
cpu MHz : 3800.000
cpu MHz : 3800.000
cpu MHz : 3800.000
cpu MHz : 3800.000

sudo cpupower frequency-info
[sudo] password for matt:
analyzing CPU 0:
  driver: acpi-cpufreq
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency: Cannot determine or is not supported.
  hardware limits: 2.20 GHz - 6.00 GHz
  available frequency steps: 3.80 GHz, 2.80 GHz, 2.20 GHz
  available cpufreq governors: performance schedutil
  current policy: frequency should be within 2.20 GHz and 3.80 GHz.
                  The governor "performance" may decide which speed to use
                  within this range.
  current CPU frequency: 3.80 GHz (asserted by call to hardware)
  boost state support:
    Supported: yes
    Active: no
    Boost States: 0
    Total States: 3
    Pstate-P0: 1000MHz
    Pstate-P1: 700MHz
    Pstate-P2: 500MHz

Everything is back to how it should be, only now with assumingly better schedutil performance (I'll run some benchmarks later). No 6.0GHz reporting and no being stuck at 2.20GHz. CPU performance under the "performance" governor is back to where it should be, and I'm boosting up to 4.9-5.0 in single core and 4.8 all-core.

OK, thanks for testing!

Let me post the last patch for verification.
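As a small aid for comparing the two sets of scaling_cur_freq readings in this thread (the values are in kHz), the Python sketch below checks whether every reading sits at a given limit and reports the observed range in GHz. The function names and the 1% tolerance are hypothetical choices for illustration; the sample values are copied from the comments above, and 2200000 kHz is the lower hardware limit reported by cpupower.

```python
def pinned_at(samples_khz, limit_khz, tolerance=0.01):
    """Return True if every sample lies within `tolerance` of `limit_khz`."""
    return all(abs(s - limit_khz) <= tolerance * limit_khz for s in samples_khz)


def range_ghz(samples_khz):
    """Return (min, max) of the samples converted from kHz to GHz, rounded to 2 places."""
    return (round(min(samples_khz) / 1e6, 2), round(max(samples_khz) / 1e6, 2))


# Readings reported with the first version of the fix applied.
FIRST_PATCH = [2199680, 2199981, 2195932, 2199979, 2195634, 2198726, 2199437,
               2195587, 2197662, 2198924, 2198856, 2195535, 2196402, 2199234,
               2199880, 2195064]

# Readings reported with the v2 patch applied.
V2_PATCH = [4854354, 3823787, 3647266, 4016171, 3576030, 3974600, 3816628,
            3590646, 3919312, 3626692, 3618178, 3597246, 4367040, 3599805,
            3837612, 3874146]

if __name__ == "__main__":
    for name, samples in (("first patch", FIRST_PATCH), ("v2 patch", V2_PATCH)):
        print(name, pinned_at(samples, 2_200_000), range_ghz(samples))
```

Note that, as pointed out earlier in the thread, these reported values alone do not prove the frequency the CPUs actually ran at; the check only summarizes what sysfs reported.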