Bug 75121
Description
Doug Smythies
2014-04-29 23:47:53 UTC
Created attachment 134311 [details]
The phoronix ffmpeg test is an easy way to show the issue
Various kernels, showing the increase in test execution time as of and since the commit referenced herein.
For reference, some acpi-cpufreq modes are included, as is an intel_pstate performance mode run.
Created attachment 134321 [details]
CPU 7 frequency vs load. Turbo on. various kernels
This graph is more generic, and shows the CPU not ramping up in CPU frequency as the load becomes significant.
Created attachment 134331 [details]
CPU 7 frequency vs load / idle frequency. Turbo on. various kernels
This graph is also more generic and shows the frequency response (meaning the load / idle frequency response) under constant load.
Created attachment 134341 [details]
The phoronix ffmpeg test graph again with one more data point
I forgot to add the results from my "doug" test kernel to the phoronix graph before I posted it, so adding it now.
Hi Doug,
I am trying to get phoronix installed to reproduce this (without success so far).
Could you run:
perf record -a -c 1 -e power:pstate_sample phoronix-test-suite ffmpeg
And attach the output of:
perf script
Created attachment 134771 [details]
output from perf script as requested
I also had trouble with phoronix when I tried to run it under "perf record". It seemed to think it needed to re-install, and then I couldn't find where the test profile was, so that I could change it to 1 run instead of the default 3. (I typically run it 10 times, and always discard the first run because it has an inconsistent run time due to the extra time needed to load the file, which thereafter is cached.)
In the end I ran this:
sudo /home/doug/bin/perf record -a -c 1 -e power:pstate_sample phoronix-test-suite benchmark pts/ffmpeg
The output from "perf script" has been truncated to just one of the 3 tests.
Myself, I find it much easier to interpret perf data captured for the "consume" program that was used to make the other graphs. I'll attach some of that data also.
Created attachment 134781 [details]
Perf data for Kernel 3.15RC3 CPU at 85% load at 200 Hertz load / noload frequency.
Under these conditions, the CPU is never in the C0 state for an entire intel_pstate sample time. In my opinion the C0 inclusion unduly biases the target p-state downwards.
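To make the bias concrete, here is a minimal user-space sketch, not the driver code. It assumes the C0 inclusion multiplies the busy percentage (aperf / mperf) by the C0 fraction (mperf / tsc), as in the code posted later in this report. The function name busy_with_c0 and the use of floating point are illustrative assumptions; the driver itself uses fixed-point helpers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative only: floating-point stand-in for the driver's
 * fixed-point math. aperf, mperf and tsc are hypothetical counter
 * deltas over one sample period.
 */
static double busy_with_c0(uint64_t aperf, uint64_t mperf, uint64_t tsc)
{
	assert(mperf != 0 && tsc != 0);

	/* Percent of maximum non-turbo speed while the core was unhalted. */
	double core_pct = 100.0 * (double)aperf / (double)mperf;

	/* Fraction of the sample period spent unhalted (in C0). */
	double c0 = (double)mperf / (double)tsc;

	/* The C0 inclusion: scale the busy percentage by the C0 fraction. */
	return core_pct * c0;
}
```

With made-up deltas of aperf = mperf = 850 and tsc = 1000 (unhalted for 85% of the sample), the result is about 85 even though the core ran at full non-turbo speed whenever it was awake. Since the C0 fraction can never reach 1 when the CPU idles during every sample period, the scaled value always sits below the unscaled one.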
Created attachment 134791 [details]
Perf data for Kernel 3.15RC3 CPU at 10% load at 200 Hertz load / noload frequency.
Under these conditions, earlier kernels would already have started to ramp up the CPU frequency.
Created attachment 134801 [details]
Perf data for Kernel 3.15RC3-doug CPU at 10% load at 200 Hertz load / noload frequency.
This time with the C0 inclusion removed.
Created attachment 134811 [details]
Perf data for Kernel 3.15RC3 CPU at 0.5% load at 200 Hertz load / noload frequency.
The purpose of this data is to show two things: the math seems to have underflowed (or something similar) a couple of times, and much of the data is missing; there should be more samples.
Created attachment 134821 [details]
Perf data for Kernel 3.15RC3-doug2 CPU at 85% load at 200 Hertz load / noload frequency.
For this one, I tried the following, which still includes C0 but applies it to core_pct on a scale starting from min_perf_pct rather than from 0. However, it made no difference, as the C0 inclusion still dominates.
static inline void intel_pstate_calc_busy(struct cpudata *cpu,
					struct sample *sample)
{
	int32_t core_pct;
	int32_t c0_pct;
	int32_t temp;

	// need to figure out how to do fractional weight
#define C0_WEIGHT 1

	core_pct = div_fp(int_tofp((sample->aperf)),
			int_tofp((sample->mperf)));
	core_pct = mul_fp(core_pct, int_tofp(100));
	FP_ROUNDUP(core_pct);

	c0_pct = div_fp(int_tofp(sample->mperf), int_tofp(sample->tsc));

	sample->freq = fp_toint(
		mul_fp(int_tofp(cpu->pstate.max_pstate * 1000), core_pct));

	// sample->core_pct_busy = core_pct;
	// sample->core_pct_busy = mul_fp(core_pct, c0_pct);
	temp = core_pct - limits.min_perf_pct;
	c0_pct = int_tofp(1) - mul_fp((int_tofp(1) - c0_pct), int_tofp(C0_WEIGHT));
	temp = mul_fp(temp, c0_pct);
	sample->core_pct_busy = temp + limits.min_perf_pct;
}
Created attachment 134831 [details]
CPU 7 frequency vs load. Turbo on. Modified C0 inclusion
There was a stupid mistake in the code change I made in the previous post. Here it is again, fixed.
static inline void intel_pstate_calc_busy(struct cpudata *cpu,
					struct sample *sample)
{
	int32_t core_pct;
	int32_t c0_pct;
	int32_t temp;

	// 0.25 as fixed point with 6 FRAC_BITS: (1 << FRAC_BITS) / 4
#define C0_WEIGHT 16

	core_pct = div_fp(int_tofp((sample->aperf)),
			int_tofp((sample->mperf)));
	core_pct = mul_fp(core_pct, int_tofp(100));
	FP_ROUNDUP(core_pct);

	c0_pct = div_fp(int_tofp(sample->mperf), int_tofp(sample->tsc));

	sample->freq = fp_toint(
		mul_fp(int_tofp(cpu->pstate.max_pstate * 1000), core_pct));

	// sample->core_pct_busy = core_pct;
	// sample->core_pct_busy = mul_fp(core_pct, c0_pct);
	temp = core_pct - int_tofp(limits.min_perf_pct);
	c0_pct = int_tofp(1) - mul_fp((int_tofp(1) - c0_pct), C0_WEIGHT);
	temp = mul_fp(temp, c0_pct);
	sample->core_pct_busy = temp + int_tofp(limits.min_perf_pct);
}
And the graph shows the effect, for C0_WEIGHT of 1 (Doug3) and 0.25 (Doug 4).
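The weighted calculation in the corrected function can be tried outside the kernel. What follows is a user-space sketch under stated assumptions: the helpers are re-implemented here and only modeled on the driver's int_tofp, mul_fp and div_fp, FRAC_BITS is taken as 6 from the comment in the code above, and weighted_busy with its parameter names is a hypothetical wrapper, not a driver function.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 6 fractional bits, as stated in the posted code comment. */
#define FRAC_BITS 6
#define int_tofp(x) ((int32_t)(x) << FRAC_BITS)
#define fp_toint(x) ((x) >> FRAC_BITS)

/* User-space stand-ins modeled on the driver's fixed-point helpers. */
static int32_t mul_fp(int32_t x, int32_t y)
{
	return (int32_t)(((int64_t)x * (int64_t)y) >> FRAC_BITS);
}

static int32_t div_fp(int32_t x, int32_t y)
{
	assert(y != 0);
	return (int32_t)(((int64_t)x << FRAC_BITS) / y);
}

/*
 * Hypothetical wrapper around the weighted C0 step:
 *   busy = min + (core_pct - min) * (1 - weight * (1 - c0))
 * core_pct_fp, c0_fp and weight_fp are fixed point; min_pct is a plain
 * integer percentage and is converted here, as in the corrected code.
 */
static int32_t weighted_busy(int32_t core_pct_fp, int32_t c0_fp,
			     int32_t min_pct, int32_t weight_fp)
{
	int32_t above_min = core_pct_fp - int_tofp(min_pct);
	int32_t scale = int_tofp(1) - mul_fp(int_tofp(1) - c0_fp, weight_fp);

	return mul_fp(above_min, scale) + int_tofp(min_pct);
}
```

As a worked case with made-up inputs (core_pct of 100, C0 fraction of 85/100, min_pct of 42): a weight of 1.0 (64 in fixed point) gives 5820, about 90.9%, and a weight of 0.25 (16 in fixed point) gives 6284, about 98.2%. A weight of 0 leaves core_pct unchanged. Note that with only 6 fractional bits the C0 fraction 0.85 is stored as 54/64, so some precision is lost before the weighting is applied.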
Created attachment 134841 [details]
The phoronix ffmpeg test graph again with the C0 weight 25% data point
Just showing some improvement in the phoronix ffmpeg test with the code mods shown previously.
Created attachment 134851 [details]
CPU 7 frequency vs load / idle frequency. Turbo on. Modified C0 inclusion. Various kernels
Adding C0 weight 25% data to the previously posted load / idle frequency graph. Actually, it turned out better than I thought it would.
Created attachment 135151 [details]
Just another method I tried. Phoronix ffmpeg test is as fast as performance mode with this method.
Hi Dirk: Did you ever figure out for certain why some had the CPU freq remains high after suspend issue? It remains unclear to me, and I do not know how to even try to re-create here on my test computer.
References (some but not all):
https://bugzilla.kernel.org/show_bug.cgi?id=66581#c21
https://bugzilla.kernel.org/show_bug.cgi?id=66581#c26
(In reply to Doug Smythies from comment #15)
> Created attachment 135151 [details]
> Just another method I tried. Phoronix ffmpeg test is as fast as performance
> mode with this method.
>
> Hi Dirk: Did you ever figure out for certain why some had the CPU freq
> remains high after suspend issue? It remains unclear to me, and I do not
> know how to even try to re-create here on my test computer.
>
I could make it happen occasionally on my ivy bridge laptop test system.
I think it has to do with the hardware coordination on the chip but I haven't proven that.
> References (some but not all):
> https://bugzilla.kernel.org/show_bug.cgi?id=66581#c21
> https://bugzilla.kernel.org/show_bug.cgi?id=66581#c26
(In reply to Dirk Brandewie from comment #16)
> (In reply to Doug Smythies from comment #15)
> > Hi Dirk: Did you ever figure out for certain why some had the CPU freq
> > remains high after suspend issue? It remains unclear to me, and I do not
> > know how to even try to re-create here on my test computer.
> >
>
> I could make it happen occasionally on my ivy bridge laptop test system.
>
> I think it has to do with the hardware coordination on the chip but I haven't
> proven that.
>
I have figured out how to "suspend" my test computer. I have tried a few times, but so far haven't been able to re-create the issue. I wanted to be able to re-create the issue, as a base line reference, so that I would know how far I can adjust C0_WEIGHT or C0_MINIMUM and still have the issue never occur.
It still doesn't make sense to me that the intel_pstate driver would work fine before but not after a "suspend". If the root issue is some hardware coordination on the chip, then shouldn't that be fixed (if possible via whatever re-initialization) rather than messing with the intel_pstate servo loop? If it is not some hardware coordination on the chip, but rather due to some flaw in the servo loop, then we should be able to re-create the issue without any intervening "suspend".
Created attachment 135491 [details]
CPU 7 frequency vs load. Turbo on. Dirk patch applied
Reference: https://lkml.org/lkml/2014/5/8/574
For the Phoronix ffmpeg test I get the exact same numbers as Dirk (we have the same CPU) so am not posting a new graph.
For CPU 7 freq. Vs. load, this is a new graph with Dirk's patches applied (I have removed some previous test data to reduce clutter).
I'll post a new CPU 7 frequency vs load / idle frequency in a moment.
Created attachment 135501 [details]
CPU 7 frequency vs load / idle frequency. Turbo on. Dirk patch applied.
Fix sent to LKML, along with the fix to the FP_ROUNDUP() macro:
https://lkml.org/lkml/2014/5/8/574
Created attachment 137641 [details]
Just showing Performance mode CPU frequency response curve
The graph just shows the performance mode response curve (done twice), along with the powersave response curve with min set to 100% (done twice), and the normal powersave mode response curve with the C0 reduced or removed (two versions).
I mentioned I would post this graph in http://www.spinics.net/lists/cpufreq/msg10167.html
re: comment #20
There is no longer an FP_ROUNDUP macro in intel_pstate.c
It was removed here:
commit f0fe3cd7e12d8290c82284b5c8aee723cbd0371a
Author: Dirk Brandewie <dirk.j.brandewie@intel.com>
Date: Thu May 29 09:32:23 2014 -0700
intel_pstate: Correct rounding in busy calculation
which shipped in Linux-3.15
Please re-open this report if the issue here was not fixed by that change.