Bug 59481

Summary: Intel Pstates driver issues, including CPU frequency not stepping up enough
Product: Power Management    Reporter: Doug Smythies (dsmythies)
Component: cpufreq    Assignee: Dirk Brandewie (dirk.brandewie)
Status: CLOSED CODE_FIX    
Severity: normal CC: aaron.lu, gmesalazar, lenb
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 3.10.0-rc4+ Subsystem:
Regression: No Bisected commit-id:
Attachments: 1 of 4 - CPU always under 100% load between samples
2 of 4 - CPU always under 100% load - sanity check test
3 of 4 - CPU unloaded between samples, sample 5 seconds after 100% load. Turbo off.
4 of 4 - CPU unloaded between samples. Turbo on.
1 of 4 - CPU always under 100% load between samples (correct graph)
Kernel Config
script for 1000 samples CPU
Script to ramp frequency up and down 12 times
forever CPU loading time waster
CPU loading time waster, but exits after a while
re-test of the "3 of 4" attachment from above with setpoint = 108
Sometimes frequency jumps fast, sometimes slower
re-test of the "3 of 4" attachment from above with code patch of today
re-test of the "4 of 4" attachment from above with code patch of yesterday
intel_pstate versus old driver comparison
A frequency (process sleep frequency) ramp test
A detailed frequency area and 3 methods of pstate driver
Phoronix ffmpeg test results with various kernels and configs

Description Doug Smythies 2013-06-08 23:37:15 UTC
Created attachment 103961 [details]
1 of 4 - CPU always under 100% load between samples

Kernel (from: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git)
doug@s15:~/temp$ uname -a
Linux s15 3.10.0-rc4+ #1 SMP Sat Jun 8 08:48:15 PDT 2013 x86_64 x86_64 x86_64 GNU/Linux
However, much of my testing work was done on a distro specific version of 3.10rc4, and then I confirmed the same results on the above kernel.

Processor:
vendor_id       : GenuineIntel
cpu family      : 6
model           : 42
model name      : Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
stepping        : 7
microcode       : 0x26

maximum frequency, turbo on = 3.8 GHz = 100%
maximum frequency, turbo off = 3.4 GHz = 100% (when turbo disabled, apparently)
minimum frequency = 1.6 GHz = 42% of max, turbo on or 47% of max, turbo off (see comments about min freq later)

Note 1: Thermal throttling is not the issue in any of these tests.
Note 2: I realize that I can use intel_pstate=disable to be able to do some things mentioned herein, but I am just trying to help with this driver.
Note 3: I have read everything I could find about this new driver, but that doesn't mean I didn't miss something important.

Issue 1 (the main issue):

pstate driver conditions:
"powersave"
/sys/devices/system/cpu/intel_pstate/max_perf_pct = 100
/sys/devices/system/cpu/intel_pstate/min_perf_pct = 42
/sys/devices/system/cpu/intel_pstate/no_turbo = 0 or 1, it does not matter

Note: problem does not occur under these conditions:
"performance"
/sys/devices/system/cpu/intel_pstate/max_perf_pct = 100
/sys/devices/system/cpu/intel_pstate/min_perf_pct = 100
/sys/devices/system/cpu/intel_pstate/no_turbo = 0 (1 not tested)

Sometimes (approximately 20% of the time; sample size 1000 tests for each of the pstate driver conditions listed above), when a CPU-intensive single-thread task is started, the CPU only ramps up to 1.768 GHz instead of the maximum frequency. When this occurs, more load can be added and added until all CPUs are fully loaded, and the CPU frequency will never increase.

Example (issue occurring):

doug@s15:~/temp$ sudo cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq
1768000
1768000
1768000
1768000
1768000
1768000
1768000
1768000

top - 09:56:37 up 51 min,  3 users,  load average: 8.00, 7.99, 6.84
Tasks: 154 total,   9 running, 145 sleeping,   0 stopped,   0 zombie
Cpu0  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  : 99.7%us,  0.3%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu4  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu5  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu7  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

Example (issue not occurring, single CPU (7) loaded):

doug@s15:~/temp$ sudo cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq
3672000
3672000
3638000
3774000
3672000
3502000
3604000
3774000  <<<< The active fully loaded CPU

top - 10:00:22 up 55 min,  3 users,  load average: 2.75, 6.29, 6.49
Tasks: 147 total,   2 running, 145 sleeping,   0 stopped,   0 zombie
Cpu0  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  :  0.0%us,  0.0%sy,  0.0%ni, 99.7%id,  0.3%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu4  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu5  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu7  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

Issue 2:

When turbo is turned off, it seems that max_perf_pct = 100 now means 3.4 GHz (see test result graphs). So, shouldn't min_perf_pct now become 47 percent (for my case)? Or should min always be 47, and max be 100 for turbo off and 112 for turbo on?

Issue 3:

Sometimes it is desirable to lock the CPU frequencies at some value, at least when it is active. Now, these settings work fine:

"performance"
/sys/devices/system/cpu/intel_pstate/max_perf_pct = 42
/sys/devices/system/cpu/intel_pstate/min_perf_pct = 42
/sys/devices/system/cpu/intel_pstate/no_turbo = 0 (or 1 with both of the above set to 47, but leaving them at 42 works also)

But some other settings give odd results, which also seem to depend on whether the numbers are increasing or decreasing and whether the CPU was unloaded or loaded before the change. This is best described by graphs of the experiments done. Attached.
Potentially of particular interest are the cases where the CPU frequency is always 1.768 GHz when sampled (5 seconds after the CPU becomes 100% loaded). For example:

"performance"
/sys/devices/system/cpu/intel_pstate/max_perf_pct = 69
/sys/devices/system/cpu/intel_pstate/min_perf_pct = 69
/sys/devices/system/cpu/intel_pstate/no_turbo = 0

Where we would expect something around 2.6 GHz. I also tested loading other CPUs at the same time, and could never get any other frequency (tested about 100 times, in addition to the 24 done to make the graphs). (Correction: a few times the CPU kicked up in frequency, but just after the sample was taken, and I think because of the wakeup to take the sample or some such.)

Sub-experiment: Take another frequency sample after another couple of seconds, to get an accurate count of how many times out of 100 it kicks up after the first sample: Results: 3 times out of 100.
Sub-sub-experiment: Does the CPU kick up because the controlling script came out of sleep, or is it because it reads /sys/devices/system/cpu/cpu7/cpufreq/cpuinfo_cur_freq? Result: Inconclusive (Will try to think of a better test and test more later, if required. Might require a larger sample size).

To be clear, it is not best efficiency that is wanted here; it is knowing that the active CPU frequency will always be the same. An example application is the test procedure to make sure that reported load averages are correct.

Issue 4:

It appears as though there is a slight rounding down effect in the reported frequencies. Wouldn't we at least expect turbostat to give the same numbers? Example:

doug@s15:~/temp$ sudo cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq
3468000
3468000
3468000
3468000
3468000
3468000
3468000
3468000

doug@s15:~/temp$ sudo ./turbostat
 cr CPU    %c0  GHz  TSC    %c1    %c3    %c6    %c7  %pc2  %pc3  %pc6  %pc7
         100.00 3.51 3.41   0.00   0.00   0.00   0.00  0.00  0.00  0.00  0.00
   0   0 100.00 3.51 3.41   0.00   0.00   0.00   0.00  0.00  0.00  0.00  0.00
   0   4 100.00 3.51 3.41   0.00   0.00   0.00   0.00  0.00  0.00  0.00  0.00
   1   1 100.00 3.51 3.41   0.00   0.00   0.00   0.00  0.00  0.00  0.00  0.00
   1   5 100.00 3.51 3.41   0.00   0.00   0.00   0.00  0.00  0.00  0.00  0.00
   2   2 100.00 3.51 3.41   0.00   0.00   0.00   0.00  0.00  0.00  0.00  0.00
   2   6 100.00 3.51 3.41   0.00   0.00   0.00   0.00  0.00  0.00  0.00  0.00
   3   3 100.00 3.51 3.41   0.00   0.00   0.00   0.00  0.00  0.00  0.00  0.00
   3   7 100.00 3.51 3.41   0.00   0.00   0.00   0.00  0.00  0.00  0.00  0.00

Issue 5: (actually just an opinion, and at the risk of raising an issue that seems to have been brought up a lot)

Why wasn't what is now called "powersave" called "ondemand" (or "conservative" if preferred), as it is similar? Then "powersave" could have been as it was, namely:
"powersave"
/sys/devices/system/cpu/intel_pstate/max_perf_pct = 42
/sys/devices/system/cpu/intel_pstate/min_perf_pct = 42
/sys/devices/system/cpu/intel_pstate/no_turbo = 0 (or 1 with both of the above set to 47)
Comment 1 Doug Smythies 2013-06-08 23:39:06 UTC
Created attachment 103971 [details]
2 of 4 - CPU always under 100% load - sanity check test
Comment 2 Doug Smythies 2013-06-08 23:41:14 UTC
Created attachment 103981 [details]
3 of 4 - CPU unloaded between samples, sample 5 seconds after 100% load. Turbo off.
Comment 3 Doug Smythies 2013-06-08 23:42:23 UTC
Created attachment 103991 [details]
4 of 4 - CPU unloaded between samples. Turbo on.
Comment 4 Doug Smythies 2013-06-08 23:53:41 UTC
Created attachment 104001 [details]
1 of 4 - CPU always under 100% load between samples (correct graph)
Comment 5 Doug Smythies 2013-06-09 06:48:56 UTC
I always forget to restore my machine settings when I finish stuff. So after my posting earlier, it was left as:

"performance"
/sys/devices/system/cpu/intel_pstate/max_perf_pct = 69
/sys/devices/system/cpu/intel_pstate/min_perf_pct = 69
/sys/devices/system/cpu/intel_pstate/no_turbo = 0

No problem, I'll just set it back to default. I don't know how, but it seems to "remember" the max_perf_pct = 69 setting. Example (notice how the 69% comes back after being auto-set to 100%):

doug@s15:~/temp$ sudo ./set_cpu_performance
[sudo] password for doug:
powersave
powersave
powersave
powersave
powersave
powersave
powersave
powersave
performance
performance
performance
performance
performance
performance
performance
performance
doug@s15:~/temp$ cat /sys/devices/system/cpu/intel_pstate/*
100
100
0
doug@s15:~/temp$ sudo ./set_cpu_powersave
performance
performance
performance
performance
performance
performance
performance
performance
powersave
powersave
powersave
powersave
powersave
powersave
powersave
powersave
doug@s15:~/temp$ cat /sys/devices/system/cpu/intel_pstate/*
69
42
0

Details of the two simple script calls:

doug@s15:~/temp$ cat set_cpu_performance
#! /bin/bash
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

for file in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo "performance" > $file; done

cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

doug@s15:~/temp$ cat set_cpu_powersave
#! /bin/bash
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

for file in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo "powersave" > $file; done

cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
Comment 6 Doug Smythies 2013-06-10 06:31:07 UTC
Updated kernel to:
doug@s15:~/temp$ uname -a
Linux s15 3.10.0-rc5 #2 SMP Sun Jun 9 11:08:41 PDT 2013 x86_64 x86_64 x86_64 GNU/Linux

I didn't see anything in the change log to suggest anything would be different, but tested anyhow:

Did 1000 tests of issue 1 from a fresh boot (powersave, max=100, min=42, turbo on):

do 1000 times{
  load cpu 7 to 100%
  sleep for 5 seconds
  read CPU 7 frequency
  sleep for 12 seconds << allows cpu loading program to terminate, under slow or fast clock speeds and also enough time for plenty of idle before next loop.
}

Results: freq = 1.768 GHz 283 times (28.3%); freq = 3.196 GHz once; freq = 3.57 GHz 3 times (assume 2 other cores busy); freq = 3.672 GHz 66 times (assume 1 other core busy); freq = 3.706 GHz once; freq = 3.74 GHz twice; freq = 3.774 GHz 643 times.
Comment 7 Dirk Brandewie 2013-06-10 17:51:43 UTC
What application are you using to load the cpu?

Could you attach the the script you are using for your test?

Please attach your kernel config.

In your tests did you explicitly set min/max to 69?  If not, it was likely the thermal driver acpi_thermal that did this.

When you select the performance scaling governor, intel_pstate will request the highest available p-state and never change it, so if you care about power/heat it is not particularly useful IMHO.
Comment 8 Doug Smythies 2013-06-10 19:26:57 UTC
(In reply to comment #7)
> What application are you using to load the cpu?

Just a CPU time waster that I wrote myself. I will post it. There are two versions: one runs forever and prints stuff after N loops (I edit and re-compile to change N); the other exits after K loops, about 5.5 seconds if my CPU is at 3.8 GHz and about 13 seconds if my CPU is at 1.6 GHz. By observing the times in the printouts, I know what frequency the CPU is running at, and whether it kicks up.

> 
> Could you attach the the script you are using for your test?

Yes, I will. Note it has a mistake, but I didn't bother to fix it: I didn't care, and it actually provided info as to when the frequency sample was being taken (and, yes, there are more professional ways to get that info, i.e. prepare to be unimpressed).

> 
> Please attach your kernel config.

Will do.

> 
> In your tests did you explicitly set min/max to 69?  If not, it was likely the
> thermal driver acpi_thermal that did this.

I specifically set the min/max to 69. For all of my tests where min/max was varied, I specifically did it. As far as I know, I should never have incurred thermal throttling (and I was watching CPU temps, although not often).

> 
> When you select the performance scaling governor, intel_pstate will request
> the highest available p-state and never change it, so if you care about
> power/heat it is not particularly useful IMHO.

I absolutely agree, except when I specifically want that scenario. For example, when I am testing the reported load averages are correct. The test takes about 61 hours and checks an array of conditions (sleep frequencies, number of simultaneous threads, cpu load per thread). The test can only work if I know the frequency any active CPU will run at. Because I don't care how fast, and to keep things cooler I usually run the test at minimum CPU frequency. I actually have not run that test for some time now. I went off exploring these scenarios for this bug report in hopes of finding more useful information towards the root issue.

By the way, test from last night: Turbo = off; max=min=74% (the center of the "dead spot" in attachment 3 [details] of 4). 1000 loops. Results: freq = 1.768 GHz 1000 times (100%).
Comment 9 Doug Smythies 2013-06-10 19:31:01 UTC
Created attachment 104211 [details]
Kernel Config
Comment 10 Doug Smythies 2013-06-10 19:33:28 UTC
Created attachment 104221 [details]
script for 1000 samples CPU
Comment 11 Doug Smythies 2013-06-10 19:36:33 UTC
Created attachment 104231 [details]
Script to ramp frequency up and down 12 times
Comment 12 Doug Smythies 2013-06-10 19:42:18 UTC
Created attachment 104241 [details]
forever CPU loading time waster
Comment 13 Doug Smythies 2013-06-10 19:43:23 UTC
Created attachment 104251 [details]
CPU loading time waster, but exits after a while
Comment 14 Dirk Brandewie 2013-06-13 18:00:58 UTC
For issue #1:

I have reproduced the issue and I am working on a fix since it triggers kind of a weird corner case in the driver.

You can try the workaround of changing the setpoint of the driver with

   echo 108 > /sys/kernel/debug/pstate_snb/setpoint

This is NOT a general fix but should unwedge your test case. It does on my test system; YMMV.

For issue #2:

The values of min/max are percent of available performance and don't reflect turbo being on/off.  Disabling turbo changes the absolute value of what 100% of available performance is, but the percentage is still accurate.

For issue #3:
Still investigating

For issue #4:
I will look at this with the maintainer of turbostat and see where the math failure is. I claim mine is more accurate, but we will see :-)

For issue #5:
These names are chosen by the cpufreq core, not by intel_pstate.  This comes from the type of scaling driver interface the driver exposes.
Comment 15 Doug Smythies 2013-06-13 21:32:00 UTC
(In reply to comment #14)
> For issue #1:
> I have reproduced the issue and I am working on a fix since it triggers kind
> of a weird corner case in the driver.
> can try the workaround to change the setpoint of the driver with
>    echo 108 > /sys/kernel/debug/pstate_snb/setpoint

I had finished trying to acquire additional meaningful information via further tests, and had started to look at the code. I had already determined that the system is extremely sensitive to the "setpoint", and can easily be made unstable by small changes. I did not know about the cool debug way to change the "setpoint", and had been re-compiling to change it. ... Anyway, my test results:

Test 1: Control sample; sample size 100; turbo on; 30% had the issue.
Test 2: Control sample; sample size 100; turbo off; 22% had the issue.
Test 3: setpoint = 108; sample size 100; turbo on; 0% had the issue.
Test 4: setpoint = 108; sample size 100; turbo off; 0% had the issue.

> This is NOT a general fix but should unwedge your test case. it does on my
> test system YMMV.

Understood. I assume you are just wanting extra test cases, which is good and I am willing to provide such tests anytime.

> For issue #2: 
> The values of min/max are percent of available performance and don't reflect
> turbo being on/off.  Disabling turbo changes the absolute value of what 100%
> of available is but is still accurate.

Yes, but then the min is not accurate. It stays at 42%, but should change to 47%
(using my computer as an example only).

> For issue #3:
> Still investigating

Me too, although the setpoint change seems to fix it also (based on only 1 quick test with turbo off and min=max=74%, which had always (100.00%) ended up at the wrong frequency). I'll do much more complete testing, but it will take time.
Comment 16 Doug Smythies 2013-06-14 06:22:23 UTC
Created attachment 104711 [details]
re-test of the "3 of 4" attachment from above with setpoint = 108

The exact same test was run as was done for the "3 of 4" graph above.
Turbo was off. min=max was ramped from 47% to 100% and then back down from 100% to 47%. The ramp up and ramp down was repeated 12 times. For each operating point, CPU 7 was loaded to 100%, the frequency was sampled 5 seconds later, and then the controlling script slept for 12 seconds to allow the loading program to finish and to allow things to get to a steady-state idle with previous history forgotten.

Summary: The setpoint=108 temporary workaround also fixes this issue, as was thought from the brief test earlier today.
Comment 17 Dirk Brandewie 2013-06-26 18:04:35 UTC
Hi Doug,

Could you try the following patch and let me know what you find in your testing

commit fd0b60726700123aa7c68a0e7a6d075a242761d1
Author: Dirk Brandewie <dirk.j.brandewie@intel.com>
Date:   Wed Jun 26 08:30:19 2013 -0700

    Change to scale off of max frequency
    
    Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
---
 drivers/cpufreq/intel_pstate.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index d1fc492..51a2696 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -105,10 +105,10 @@ struct pstate_adjust_policy {
 static struct pstate_adjust_policy default_policy = {
 	.sample_rate_ms = 10,
 	.deadband = 0,
-	.setpoint = 109,
-	.p_gain_pct = 17,
+	.setpoint = 97,
+	.p_gain_pct = 20,
 	.d_gain_pct = 0,
-	.i_gain_pct = 4,
+	.i_gain_pct = 0,
 };
 
 struct perf_limits {
@@ -471,12 +471,12 @@ static inline void intel_pstate_set_sample_time(struct cpudata *cpu)
 static inline int intel_pstate_get_scaled_busy(struct cpudata *cpu)
 {
 	int32_t busy_scaled;
-	int32_t core_busy, turbo_pstate, current_pstate;
+	int32_t core_busy, max_pstate, current_pstate;
 
 	core_busy = int_tofp(cpu->samples[cpu->sample_ptr].core_pct_busy);
-	turbo_pstate = int_tofp(cpu->pstate.turbo_pstate);
+	max_pstate = int_tofp(cpu->pstate.max_pstate);
 	current_pstate = int_tofp(cpu->pstate.current_pstate);
-	busy_scaled = mul_fp(core_busy, div_fp(turbo_pstate, current_pstate));
+	busy_scaled = mul_fp(core_busy, div_fp(max_pstate, current_pstate));
 
 	return fp_toint(busy_scaled);
 }
Comment 18 Doug Smythies 2013-06-26 21:42:05 UTC
Created attachment 106111 [details]
Sometimes frequency jumps fast, sometimes slower

O.K. It makes sense to me that the controller is now just a proportional controller (I and D gains of 0, in PID terms).

So far, I have done issue #1 tests, at 100 samples each for turbo on and off. No issue. I'll start the issue #3 tests, now. I think I recall they take about 6 hours per test, so I'll report back later.

The attached graph (and I have many, many similar) shows how sometimes (with the previous code) the frequency jumps up and actually overshoots, and sometimes comes up slower over a few samples. It is just an observation; I haven't figured out why.

The other thing I was thinking of trying, after these tests are done, is more proportional gain, but slew rate limiting on the output. I.E.:

diff -u intel_pstate.c.original intel_pstate.c.doug02
--- intel_pstate.c.original     2013-06-11 16:37:53.408641517 -0700
+++ intel_pstate.c.doug02       2013-06-18 23:19:20.752768433 -0700
@@ -491,10 +491,14 @@
        ctl = pid_calc(pid, busy_scaled);

        steps = abs(ctl);
+/* Doug test               */
+/* force slew rate control */
        if (ctl < 0)
-               intel_pstate_pstate_increase(cpu, steps);
+//             intel_pstate_pstate_increase(cpu, steps);
+               intel_pstate_pstate_increase(cpu, 1);
        else
-               intel_pstate_pstate_decrease(cpu, steps);
+//             intel_pstate_pstate_decrease(cpu, steps);
+               intel_pstate_pstate_decrease(cpu, 1);
 }

 static void intel_pstate_timer_func(unsigned long __data)
Comment 19 Doug Smythies 2013-06-27 05:16:36 UTC
Created attachment 106121 [details]
re-test of the "3 of 4" attachment from above with code patch of today

Initially this graph looks the same as the re-test with setpoint = 108.
However, it has a couple of subtle, and unimportant, differences due to the scale change. For example look at the 60% requested frequency spot.

This was the turbo off test. The turbo on test (not done for the setpoint = 108 case) will be started shortly. Expect the resulting graph in about 10 hours.
Comment 20 Doug Smythies 2013-06-27 14:24:47 UTC
Created attachment 106141 [details]
re-test of the "4 of 4" attachment from above with code patch of yesterday

Looks good.

There are a bunch of other tests that I have done, but never posted results because they didn't show anything abnormal. Various numbers of processes, loads, sleep frequencies, ...
Some of those tests I will re-do also.
Comment 21 Doug Smythies 2013-06-27 14:27:57 UTC
Created attachment 106151 [details]
intel_pstate versus old driver comparison

This graph (done with the older code and setpoint =108) is just for readers possible interest.
Comment 22 Doug Smythies 2013-06-28 21:31:45 UTC
Created attachment 106241 [details]
A frequency (process sleep frequency) ramp test

The attached graph (actually several graphs in one attachment) is for information only. It shows CPU 7 frequency during a test where one process is running at what would be a load of 0.40 if the CPU frequency were locked at 1.6 GHz; it isn't, so the actual load is less. The frequency at which the CPU (#7) alternates between working and sleeping is ramped from 1 hertz to 500 hertz, with a 2 second gap between each frequency step. CPU 0 was used to sample the CPU 7 frequency. I was just looking for any odd behavior.
Comment 23 Doug Smythies 2013-06-28 21:43:02 UTC
Dirk,
I struggle to understand some things about the code, and I wonder if I could ask some questions:

For: intel_pstate_timer_func : It seems to kick up to the max p-state if there have been 5 consecutive min p-states. Why?

For: intel_pstate_sample : There seems to be a sample array, but I can't see where it is ever used, i.e. it is not clear to me why an array is used there. Is there some reason?

For intel_pstate_get : The function is never called from anything anywhere in the entire kernel. Why is that function there?
Comment 24 Dirk Brandewie 2013-07-09 15:57:50 UTC
(In reply to Doug Smythies from comment #23)
> Dirk,
> I struggle to understand some things about the code, and I wonder if I could
> ask some questions:
> 
> For: intel_pstate_timer_func : It seems to kick up to max p-state if there
> have been 5 continuous min pstates. Why?

This is there to fix issues with some workloads where you have two processes
that scale their work based on the progress of the other process and ping-pong
between each other much faster than our sample time.  This makes it look like the
processor is not very busy, because each process is limiting itself based on the
progress of the other.  If the core is really idle most of the time, or the load
is really low, this doesn't hurt: if it is idle the effective frequency is
zero, and if the load is low we will fix the frequency on the next sample.
 
> 
> For: intel_pstate_sample : There seems to be some sample array, but I cann't
> see where it is ever used. i.e. It is not clear to me why an array is is
> used there. Is there some reason?

Historical at the moment, left over from earlier implementations/experiments; it is on my list to clean it up.

> 
> For intel_pstate_get : The function is never called from anything anywhere
> in the entire kernel. Why is that function there?

This function is called by the cpufreq core to get the current frequency.
Comment 25 Doug Smythies 2013-07-10 23:50:48 UTC
Created attachment 106869 [details]
A detailed frequency area and 3 methods of pstate driver

Thanks for the explanations of the code. I might try to repeat the ffmpeg test for my own interest.

The attachment is a more detailed frequency (process sleep frequency) ramp test, focused on some conditions where the internal governor reacts to CPU idle times and the intel_pstate driver seems to also react.

The attached graph (actually several graphs in one attachment) compares three different methods for the pstate driver: the driver as it was in kernel 3.10RC7, but with setpoint=108; the driver with Dirk's patch; and the driver with Dirk's patch plus output slew rate control added (as per comment 18), with the gain set to 30 (a number picked without any thought). The graphs show CPU 7 frequency during a test where one process is running at what would be a load of 0.40 if the CPU frequency were locked at 1.6 GHz; it isn't, so the actual load is less. The frequency at which the CPU (#7) alternates between working and sleeping is ramped from 50 hertz to 53 hertz, in steps of 0.1 Hz, with a 2 second gap between each frequency step. CPU 0 was used to sample the CPU 7 frequency. The first 3 graphs are just an overview showing things getting "noisy" when the CPU idle times alias and overlap the pstate driver sample times (I think). The next 3 graphs are zoomed-in details. I am suggesting that the pstate driver can overreact to feedback from the internal governor, and that output slew rate control might help. Since the effect seems very asymmetric, it might be worth considering asymmetric slew rate control as well (i.e. only implement slew rate control for the falling pstate direction).
Comment 26 Doug Smythies 2013-07-12 03:28:19 UTC
Created attachment 106875 [details]
Phoronix ffmpeg test results with various kernels and configs

I have tried to re-create the ping-pong issue by making kernels without the "kick" code and using the Phoronix ffmpeg test. I have not been able to re-create it. The only problem observed is the same problem as already covered in this bug report. Is there something I am missing in how to re-create the ping-pong failure scenario?

Tests were done, and results are in the attachment, for these conditions:

1.) Kernel 3.10 with no changes: Reference sample 1 of 3: driver = acpi-cpufreq and mode = ondemand
2.) Kernel 3.10 with no changes: Reference sample 2 of 3: driver = acpi-cpufreq and mode = performance
3.) Kernel 3.10 with no changes: Reference sample 3 of 3: driver = acpi-cpufreq and mode = powersave
4.) Kernel 3.10 with no changes:
5.) Kernel 3.10 with setpoint 108:
6.) Kernel 3.10 with "kick" removed
7.) Kernel 3.10 with "kick" removed and setpoint 108:
8.) Kernel 3.10 with Dirk patch (from comment 17):
9.) Kernel 3.10 with Dirk patch (from comment 17) with "kick" removed:
10.) Kernel 3.10 with Dirk patch (from comment 17) and output slew rate limited:
11.) Kernel 3.10 with Dirk patch (from comment 17) with "kick" removed and output slew rate limited:
12.) Much longer test (500 loops): Kernel 3.10 with "kick" removed. (Result = 7.6% frequency lockup.)
Comment 27 Doug Smythies 2013-07-12 23:14:42 UTC
Addendum to comment 26: An additional test was run:

13.) Much longer test (500 loops): Kernel 3.10 with Dirk patch (from comment 17) with "kick" removed:

Results: No problems.

I have searched and searched and have not been able to find any test results or bug reports or discussions or whatever about this "ping-pong" problem. I have only been able to find the comments related to the code patch submission that it was needed. My purpose in attempting to re-create the issue, was to test if the solution might no longer be needed with the changes discussed in this thread. However, since I have been unable to re-create the problem, I am unable to proceed as I have no control sample.
Comment 28 Dirk Brandewie 2013-07-18 15:04:23 UTC
(In reply to Doug Smythies from comment #27)
> Addendum to comment 26: An additional test was run:
> 
> 13.) Much longer test (500 loops): Kernel 3.10 with Dirk patch (from comment
> 17) with "kick" removed:
> 
> Results: No problems.
> 
> I have searched and searched and have not been able to find any test results
> or bug reports or discussions or whatever about this "ping-pong" problem. I
> have only been able to find the comments related to the code patch
> submission that it was needed. My purpose in attempting to re-create the
> issue, was to test if the solution might no longer be needed with the
> changes discussed in this thread. However, since I have been unable to
> re-create the problem, I am unable to proceed as I have no control sample.

Sorry for slow response I have been out for a few days.

There were no filed bug reports, since I found it in my testing.  I will go back and look at whether the "kick" is still needed.

I have been thinking about slew rate control in both the up and down directions.
In the up direction it can save some power; in the down direction it can save some performance. These would need to be tunables in debugfs, since there is no hope IMHO that I could come up with settings that would fit every set of performance/power requirements.

I will submit my patch upstream today.  Feel free to send me email if you would like to continue talking about the slew rate control.  Thank you for all the time and effort you have spent on this issue.
Comment 29 Guilherme Salazar 2013-08-01 19:39:38 UTC
Hi,

I've just tested the patch and, even though it works (the frequency now sits around 1.3-1.5 GHz), I noticed that if I suspend the laptop and then resume it, the abnormal behavior (frequency sitting above 2 GHz) returns.

Before suspending, the value in /sys/devices/system/cpu/intel_pstate/min_perf_pct was 37; afterwards, 100.

cpuinfo: http://pastebin.com/C6zR90HC
config: http://pastebin.com/XqLXn9Y6

I'd be glad to give any further details.

Best,
Guilherme
Comment 30 Doug Smythies 2013-11-02 19:03:30 UTC
(In reply to Doug Smythies from comment #19)
> Created attachment 106121 [details]
> re-test of the "3 of 4" attachment from above with code patch of today
> 
> Initially this graph looks the same as the re-test with setpoint = 108.
> However, it has a couple of subtle, and unimportant, differences due to the
> scale change. For example look at the 60% requested frequency spot.
> 
> This was the turbo off test. The turbo on test (not done for the setpoint =
> 108 case) will be started shortly. Expect the resulting graph in about 10
> hours.

In Kernel 3.12RC7 there is a patch by Brennan Shacklett "cpufreq: intel_pstate: Improve accuracy by not truncating until final result", which eliminates the slight remaining "jitter" seen on the attachment. Now the requested frequency ramp up and ramp down traces exactly and repeatedly.
Comment 31 Len Brown 2014-10-28 05:34:55 UTC
d253d2a52676cfa3d89b8f0737a08ce7db665207
Author: Brennan Shacklett <brennan@genyes.org>  2013-10-21 12:20:32
Follows: v3.12-rc6
Precedes: v3.12-rc7
intel_pstate: Improve accuracy by not truncating until final result

closed.