Bug 59481
Created attachment 103971 [details]
2 of 4 - CPU always under 100% load - sanity check test
Created attachment 103981 [details]
3 of 4 - CPU unloaded between samples, sample 5 seconds after 100% load. Turbo off.
Created attachment 103991 [details]
4 of 4 - CPU unloaded between samples. Turbo on.
Created attachment 104001 [details]
1 of 4 - CPU always under 100% load between samples (correct graph)
I always forget to restore my machine settings when I finish stuff. So after my posting earlier, it was left as: "performance" /sys/devices/system/cpu/intel_pstate/max_perf_pct = 69 /sys/devices/system/cpu/intel_pstate/min_perf_pct = 69 /sys/devices/system/cpu/intel_pstate/no_turbo = 0 No, problem, I'll just set it back to default. I don't know how, but it seems to "remember" the max_perf_pct = 69 setting. Example (notice how the 69% comes back after being auto set to 100%): doug@s15:~/temp$ sudo ./set_cpu_performance [sudo] password for doug: powersave powersave powersave powersave powersave powersave powersave powersave performance performance performance performance performance performance performance performance doug@s15:~/temp$ cat /sys/devices/system/cpu/intel_pstate/* 100 100 0 doug@s15:~/temp$ sudo ./set_cpu_powersave performance performance performance performance performance performance performance performance powersave powersave powersave powersave powersave powersave powersave powersave doug@s15:~/temp$ cat /sys/devices/system/cpu/intel_pstate/* 69 42 0 Details of the two simple script calls: doug@s15:~/temp$ cat set_cpu_performance #! /bin/bash cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor for file in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo "performance" > $file; done cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor doug@s15:~/temp$ cat set_cpu_powersave #! /bin/bash cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor for file in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo "powersave" > $file; done cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor Updated kernel to: doug@s15:~/temp$ uname -a Linux s15 3.10.0-rc5 #2 SMP Sun Jun 9 11:08:41 PDT 2013 x86_64 x86_64 x86_64 GNU/Linux I didn't see anything in the change log to think anything would be different but tested anyhow: Did 1000 tests of issue 1 from a fresh boot (powersave, max=100, min=42, turbo on): do 1000 times{ load cpu 7 to 100% sleep for 5 seconds read CPU 7 frequency sleep for 12 seconds << allows cpu loading program to terminate, under slow or fast clock speeds and also enough time for plenty of idle before next loop. } Results: freq = 1.786 GHz 283 times (28.3%); freq = 3.196 GHz once; freq = 3.57GHz 3 times (assume 2 other cores busy); freq = 3.672 GHz 66 times (assume 1 other core busy); freq = 3.706 GHz once; freq = 3.74 GHz twice; freq = 3.774 GHZ 643 times. What application are you using to load the cpu? Could you attach the the script you are using for your test? Please attach your kernel config. In your tests did you explicitly set min/max to 69? If not is was likely the thermal driver acpi_thermal will to this, When up select the performance scaling governor it equivalent to the performance governor intel_pstate will request the highest available p-state and never change it so if you care about power/heat it is not particularly useful IMHO. (In reply to comment #7) > What application are you using to load the cpu? Just a CPU time waster that I wrote myself. I will post it. There are two versions, one that runs forever, and prints stuff after N loops. I edit and re-compile to change N. The other one, exits after K loops, about 5.5 seconds if my CPU is at 3.8 GHZ and about 13 seconds if my CPU is at 1.6 GHz. I know by observing the times in the prints outs what frequency the CPU is running at, and if it kicks up. > > Could you attach the the script you are using for your test? Yes, I will. Note it has a mistake, but I didn't bother to fix it because: I didn't care; It actually provided info as to when the frequency sample was being taken (and, yes, there are more professional ways to get that info, i.e. prepare to be unimpressed). > > Please attach your kernel config. Will do. > > In your tests did you explicitly set min/max to 69? If not is was likely the > thermal driver acpi_thermal will to this, I specifically set the min/max to 69. For all of my tests where min/max was varied, I specifically did it. As far as I know, I should never have incurred thermal throttling (and I was watching CPU temps, although not often). > > When up select the performance scaling governor it equivalent to the > performance governor intel_pstate will request the highest available p-state > and never change it so if you care about power/heat it is not particularly > useful IMHO. I absolutely agree, except when I specifically want that scenario. For example, when I am testing the reported load averages are correct. The test takes about 61 hours and checks an array of conditions (sleep frequencies, number of simultaneous threads, cpu load per thread). The test can only work if I know the frequency any active CPU will run at. Because I don't care how fast, and to keep things cooler I usually run the test at minimum CPU frequency. I actually have not run that test for some time now. I went off exploring these scenarios for this bug report in hopes of finding more useful information towards the root issue. By the way, test from last night: Turbo = off; max=min=74% (the center of the "dead spot" in attachment 3 [details] of 4. 1000 loops: Results: freq = 1.786 GHz 1000 times (100%); Created attachment 104211 [details]
Kernel Config
Created attachment 104221 [details]
script for 1000 samples CPU
Created attachment 104231 [details]
Script to ramp frequency up and down 12 times
Created attachment 104241 [details]
forever CPU loading time waster
Created attachment 104251 [details]
CPU loading time waster, but exits after a while
For issue #1: I have reproduced the issue and I am working on a fix since it triggers kind of a weird corner case in the driver. can try the workaround to change the setpoint of the driver with echo 108 > /sys/kernel/debug/pstate_snb/setpoint This is NOT a general fix but should unwedge your test case. it does on my test system YMMV. For issue #2: The values of min/max are percent of available performance and don't reflect turbo being on/off. Disabling turbo changes the absolute value of what 100% of available is but is still accurate. For issue #3: Still investigating For issue #4: I will look this with the maintainer of turbostat and see where the math fail is I claim mine is more accurate but we will see :-) For issue #5: These names are chosen but the cpufreq core not by intel_pstate. This comes from the type of scaling driver interface the driver exposes. (In reply to comment #14) > For issue #1: > I have reproduced the issue and I am working on a fix since it triggers kind > > of a weird corner case in the driver. > can try the workaround to change the setpoint of the driver with > echo 108 > /sys/kernel/debug/pstate_snb/setpoint I had finished trying to acquire additional meaningful information via further tests, and had started to look at the code. I had already determined that the system is extremely sensitive to the "setpoint", and can easily be made unstable by small changes. I did not know about the cool debug way to change the "setpoint", and had been re-compiling to change it. ... Anyway, my test results: Test 1: Control sample; Sample size 100; Turbo on; 30% had the issue. Test 2; Control sample; Sample size 100; Turbo off; 22% had the issue. Test 3; setpoint = 108; Sample size 100; Turbo on: 0% had the issue. Test 4; setpoint = 108; Sample size 100; Turbo off: 0% had the issue. > This is NOT a general fix but should unwedge your test case. it does on my > test system YMMV. Understood. I assume you are just wanting extra test cases, which is good and I am willing to provide such tests anytime. > For issue #2: > The values of min/max are percent of available performance and don't reflect > turbo being on/off. Disabling turbo changes the absolute value of what 100% > > of available is but is still accurate. Yes, but then the min is not accurate. It stays at 42%, but should change to 47% (using my computer as an example only). > For issue #3: > Still investigating Me to, although the setpoint change seems to fix it also. (based on only 1 quick test of turbo off, and min=max=74%, that always (100.00%) ended up at the wrong frequency). I'll do much more complete testing, but it will take time. Created attachment 104711 [details]
re-test of the "3 of 4" attachment from above with setpoint = 108
The exact same test was run as was done for the "3 of 4" graph above.
Turbo was off. min=max was ramped from 47% to 100% and then back down from 100% to 47%. The ramp up and ramp down was repeated 12 times. For each operating point, CPU 7 was loaded to 100% then the frequency sampled 5 seconds later, then the controlling script slept for 12 seconds to allow the loading program to finish and to allow things to get to get a steady state idle with previous history forgotten.
Summary: The setpoint=108 temporary workaround also fixes this issue, as was thought from the brief test earlier today.
Hi Doug, Could you try the following patch and let me know what you find in your testing commit fd0b60726700123aa7c68a0e7a6d075a242761d1 Author: Dirk Brandewie <dirk.j.brandewie@intel.com> Date: Wed Jun 26 08:30:19 2013 -0700 Change to scale off of max frequency Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> --- drivers/cpufreq/intel_pstate.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index d1fc492..51a2696 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -105,10 +105,10 @@ struct pstate_adjust_policy { static struct pstate_adjust_policy default_policy = { .sample_rate_ms = 10, .deadband = 0, - .setpoint = 109, - .p_gain_pct = 17, + .setpoint = 97, + .p_gain_pct = 20, .d_gain_pct = 0, - .i_gain_pct = 4, + .i_gain_pct = 0, }; struct perf_limits { @@ -471,12 +471,12 @@ static inline void intel_pstate_set_sample_time(struct cpudata *cpu) static inline int intel_pstate_get_scaled_busy(struct cpudata *cpu) { int32_t busy_scaled; - int32_t core_busy, turbo_pstate, current_pstate; + int32_t core_busy, max_pstate, current_pstate; core_busy = int_tofp(cpu->samples[cpu->sample_ptr].core_pct_busy); - turbo_pstate = int_tofp(cpu->pstate.turbo_pstate); + max_pstate = int_tofp(cpu->pstate.max_pstate); current_pstate = int_tofp(cpu->pstate.current_pstate); - busy_scaled = mul_fp(core_busy, div_fp(turbo_pstate, current_pstate)); + busy_scaled = mul_fp(core_busy, div_fp(max_pstate, current_pstate)); return fp_toint(busy_scaled); } Created attachment 106111 [details]
Sometimes fequency jumps fast, sometimes slower
O.K. It makes sense to me that the controller be just a proportional controller (i and d gains of 0, for PID).
So far, I have done issue #1 tests, at 100 samples each for turbo on and off. No issue. I'll start the issue #3 tests, now. I think I recall they take about 6 hours per test, so I'll report back later.
The attached graph (and I have many many similar) shows how sometimes (previous code) the frequency jumps up and actually overshoots and sometimes comes up slower over a few samples. It is just an observation, I haven't figured out why.
The other thing I was thinking of trying, after these tests are done, is more proportional gain, but slew rate limiting on the output. I.E.:
diff -u intel_pstate.c.original intel_pstate.c.doug02
--- intel_pstate.c.original 2013-06-11 16:37:53.408641517 -0700
+++ intel_pstate.c.doug02 2013-06-18 23:19:20.752768433 -0700
@@ -491,10 +491,14 @@
ctl = pid_calc(pid, busy_scaled);
steps = abs(ctl);
+/* Doug test */
+/* force slew rate control */
if (ctl < 0)
- intel_pstate_pstate_increase(cpu, steps);
+// intel_pstate_pstate_increase(cpu, steps);
+ intel_pstate_pstate_increase(cpu, 1);
else
- intel_pstate_pstate_decrease(cpu, steps);
+// intel_pstate_pstate_decrease(cpu, steps);
+ intel_pstate_pstate_decrease(cpu, 1);
}
static void intel_pstate_timer_func(unsigned long __data)
Created attachment 106121 [details]
re-test of the "3 of 4" attachment from above with code patch of today
Initially this graph looks the same as the re-test with setpoint = 108.
However, it has a couple of subtle, and unimportant, differences due to the scale change. For example look at the 60% requested frequency spot.
This was the turbo off test. The turbo on test (not done for the setpoint = 108 case) will be started shortly. Expect the resulting graph in about 10 hours.
Created attachment 106141 [details]
re-test of the "4 of 4" attachment from above with code patch of yesterday
Look good.
There are a bunch of other tests that I have done, but never posted results because they didn't show anything abnormal. Various numbers of processes, loads, sleep frequencies, ...
Some of those tests I will re-do also.
Created attachment 106151 [details]
intel_pstate verses old driver compare
This graph (done with the older code and setpoint =108) is just for readers possible interest.
Created attachment 106241 [details]
A frequency (process sleep frequency) ramp test
The attached graph (actually several graphs in one attachment) is for information only. It shows CPU 7 frequency during a test where one process is running at what would be a load of 0.40 if the CPU frequency were locked at 1.6 GHz, which it isn't so the actual load is less. The frequency of the CPU (#7) working verses sleeping is ramped from 1 hertz to 500 hertz, with a 2 second gap between each frequency step. CPU 0 was used to sample the CPU 7 frequency. I was just looking for any odd behavior.
Dirk, I struggle to understand some things about the code, and I wonder if I could ask some questions: For: intel_pstate_timer_func : It seems to kick up to max p-state if there have been 5 continuous min pstates. Why? For: intel_pstate_sample : There seems to be some sample array, but I cann't see where it is ever used. i.e. It is not clear to me why an array is is used there. Is there some reason? For intel_pstate_get : The function is never called from anything anywhere in the entire kernel. Why is that function there? (In reply to Doug Smythies from comment #23) > Dirk, > I struggle to understand some things about the code, and I wonder if I could > ask some questions: > > For: intel_pstate_timer_func : It seems to kick up to max p-state if there > have been 5 continuous min pstates. Why? This is there to fix issues with some workloads where you have two processes that scale their work based on the progress of the other process and ping pong between each other much faster than our sample time. This makes it look like the processor is not very busy because the process is limiting itself based on the progress of the other. If the core is really idle most of the time or the load is really low this doesn't hurt since if it is idle the effective frequency is zero and if the load is low we will fix the frequency on the next sample. > > For: intel_pstate_sample : There seems to be some sample array, but I cann't > see where it is ever used. i.e. It is not clear to me why an array is is > used there. Is there some reason? Historical ATM from earlier implementations/experiments it is on my list to clean it up. > > For intel_pstate_get : The function is never called from anything anywhere > in the entire kernel. Why is that function there? This function is called by the cpufreq core to get the current frequency. Created attachment 106869 [details] A detailed frequency area and 3 methods of pstate driver Thanks for the explanations of the code. I might try to repeat the ffmpeg test for my own interest. The attachment is a more detailed frequency (process sleep frequency) ramp test, focused on some conditions where the internal governor reacts to CPU idle times and the intel_pstate driver seems to also react. The attached graph (actually several graphs in one attachment) compares three different methods for the pstate driver: The driver as it was in kernel 3.10RC7, but setpoint=108; The driver with Dirk's patch; The driver with Dirk's patch and output slew rate control added (as per comment 18), and the gain set to 30 (a number picked without any thought). The grapahs show CPU 7 frequency during a test where one process is running at what would be a load of 0.40 if the CPU frequency were locked at 1.6 GHz, which it isn't so the actual load is less. The frequency of the CPU (#7) working verses sleeping is ramped from 50 hertz to 53 hertz, in steps of 0.1 Hz, with a 2 second gap between each frequency step. CPU 0 was used to sample the CPU 7 frequency. The first 3 graphs are just an overview showing things getting "noisey" when the CPU idle times alias and overlap the pstate driver sample times (I think). The next 3 graphs are zoomed in details. I am suggesting that the pstate driver can over react to feedback from the internal governor, and that output slew rate control might help. Since the effect seems very asymetric, it might be worth considering asymetric slew rate control also (i.e. only implement slew rate control for the falling pstate direction). Created attachment 106875 [details] Phoronix ffmpeg test results with various kernels and configs I have tried to re-create the ping-pong issue by making kernels without the "kick" code and using the Phoronix ffmpeg test. I have not been able to re-create it. The only problem observed is the same problem as already covered in this bug report. Is there something I am missing in how to re-create the ping-pong failure scenario? Tests were done, and results are in the attachment, for these conditions: 1.) Kernel 3.10 with no changes: Reference sample 1 of 3: driver = acpi-cpufreq and mode = ondemand 2.) Kernel 3.10 with no changes: Reference sample 2 of 3: driver = acpi-cpufreq and mode = performance 3.) Kernel 3.10 with no changes: Reference sample 3 of 3: driver = acpi-cpufreq and mode = powersave 4.) Kernel 3.10 with no changes: 5.) Kernel 3.10 with setpoint 108: 6.) Kernel 3.10 with "kick" removed 7.) Kernel 3.10 with "kick" removed and setpoint 108: 8.) Kernel 3.10 with Dirk patch (from comment 17): 9.) Kernel 3.10 with Dirk patch (from comment 17) with "kick" removed: 10.) Kernel 3.10 with Dirk patch (from comment 17) and output slew rate limited: 11.) Kernel 3.10 with Dirk patch (from comment 17) with "kick" removed and output slew rate limited: 12.) Much longer test (500 loops): Kernel 3.10 with "kick" removed. (Result = 7.6% frequency lockup.) Addendum to comment 26: An additional test was run: 13.) Much longer test (500 loops): Kernel 3.10 with Dirk patch (from comment 17) with "kick" removed: Results: No problems. I have searched and searched and have not been able to find any test results or bug reports or discussions or whatever about this "ping-pong" problem. I have only been able to find the comments related to the code patch submission that it was needed. My purpose in attempting to re-create the issue, was to test if the solution might no longer be needed with the changes discussed in this thread. However, since I have been unable to re-create the problem, I am unable to proceed as I have no control sample. (In reply to Doug Smythies from comment #27) > Addendum to comment 26: An additional test was run: > > 13.) Much longer test (500 loops): Kernel 3.10 with Dirk patch (from comment > 17) with "kick" removed: > > Results: No problems. > > I have searched and searched and have not been able to find any test results > or bug reports or discussions or whatever about this "ping-pong" problem. I > have only been able to find the comments related to the code patch > submission that it was needed. My purpose in attempting to re-create the > issue, was to test if the solution might no longer be needed with the > changes discussed in this thread. However, since I have been unable to > re-create the problem, I am unable to proceed as I have no control sample. Sorry for slow response I have been out for a few days. There were no filed bug reports since I found it in my testing. I will go back and look it the "kick" is still needed. I have been thinking about slew rate control both on the up and down direction. On the up direction it can save some power on the down can save some performance. These would need to be a tunable in debugfs since there is no hope IMHO that I could come up with settings that would fit every set of performance/power requirements. I will submit my patch upstream today. Feel free to send me email if you would like to continue talking about the slew rate control. Thank you for all the time and effort you have spent on this issue. Hi, I've just tested the patch and, even though it works (the frequency now sits around 1.3-1.5 GHz), I noticed that if I suspend the laptop and then resume it, the abnormal behavior (frequency sitting above 2 GHz) returns. Before suspending, the value in /sys/devices/system/cpu/intel_pstate/min_perf_pct was 37; afterwards, 100. cpuinfo: http://pastebin.com/C6zR90HC config: http://pastebin.com/XqLXn9Y6 I'd be glad to give any further details. Best, Guilherme (In reply to Doug Smythies from comment #19) > Created attachment 106121 [details] > re-test of the "3 of 4" attachment from above with code patch of today > > Initially this graph looks the same as the re-test with setpoint = 108. > However, it has a couple of subtle, and unimportant, differences due to the > scale change. For example look at the 60% requested frequency spot. > > This was the turbo off test. The turbo on test (not done for the setpoint = > 108 case) will be started shortly. Expect the resulting graph in about 10 > hours. In Kernel 3.12RC7 there is a patch by Brennan Shacklett "cpufreq: intel_pstate: Improve accuracy by not truncating until final result", which eliminates the slight remaining "jitter" seen on the attachment. Now the requested frequency ramp up and ramp down traces exactly and repeatedly. d253d2a52676cfa3d89b8f0737a08ce7db665207 Author: Brennan Shacklett <brennan@genyes.org> 2013-10-21 12:20:32 Follows: v3.12-rc6 Precedes: v3.12-rc7 intel_pstate: Improve accuracy by not truncating until final result closed. |
Created attachment 103961 [details] 1 of 4 - CPU always under 100% load between samples Kernel (from: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git) doug@s15:~/temp$ uname -a Linux s15 3.10.0-rc4+ #1 SMP Sat Jun 8 08:48:15 PDT 2013 x86_64 x86_64 x86_64 GNU/Linux However, much of my testing work was done on a distro specific version of 3.10rc4, and then I confirmed the same results on the above kernel. Processor: vendor_id : GenuineIntel cpu family : 6 model : 42 model name : Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz stepping : 7 microcode : 0x26 maximum frequency, turbo on = 3.8 GHz = 100% maximum frequency, turbo off = 2.4 GHz = 100% (when turbo disabled, apparently) minimum frequency = 1.6 GHz = 42% of max, turbo on or 47% of max, turbo off (see comments about min freq later) Note 1: Thermal throttling is not the issue in any of these tests. Note 2: I realize that can use intel_pstate=disable to be able to do some things mentioned herein, but I am just trying to help with this driver. Note 3: I have read everything I could find about this new driver, but that doesn't mean I didn't miss something important. Issue 1 (the main issue): pstate driver conditions: "powersave" /sys/devices/system/cpu/intel_pstate/max_perf_pct = 100 /sys/devices/system/cpu/intel_pstate/min_perf_pct = 42 /sys/devices/system/cpu/intel_pstate/no_turbo = 0 or 1, it does not matter Note: problem does not occur under these conditions: "performance" /sys/devices/system/cpu/intel_pstate/max_perf_pct = 100 /sys/devices/system/cpu/intel_pstate/min_perf_pct = 100 /sys/devices/system/cpu/intel_pstate/no_turbo = 0 (1 not tested) Sometimes (approximately 20% of the time, sample size 1000 tests for each pstate driver conditions listed above), when a CPU intensive single thread task is started, the CPU only ramps up to 1.768 GHz instead of the maximum frequency. When this occurs, more load can be added and added until all CPUs are fully loaded and the CPU frequency will never increase. Example (issue occurring): doug@s15:~/temp$ sudo cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq 1768000 1768000 1768000 1768000 1768000 1768000 1768000 1768000 top - 09:56:37 up 51 min, 3 users, load average: 8.00, 7.99, 6.84 Tasks: 154 total, 9 running, 145 sleeping, 0 stopped, 0 zombie Cpu0 :100.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu1 : 99.7%us, 0.3%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu2 :100.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu3 :100.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu4 :100.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu5 :100.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu6 :100.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu7 :100.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Example (issue not occurring, single CPU (7) loaded): doug@s15:~/temp$ sudo cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq 3672000 3672000 3638000 3774000 3672000 3502000 3604000 3774000 <<<< The active fully loaded CPU top - 10:00:22 up 55 min, 3 users, load average: 2.75, 6.29, 6.49 Tasks: 147 total, 2 running, 145 sleeping, 0 stopped, 0 zombie Cpu0 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu1 : 0.0%us, 0.0%sy, 0.0%ni, 99.7%id, 0.3%wa, 0.0%hi, 0.0%si, 0.0%st Cpu2 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu3 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu4 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu5 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu6 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu7 :100.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Issue 2: When turbo is turned off, it seems that max_perf_pct = 100 now means 3.4 GHz (see test result graphs). So, shouldn't min_perf_pct now become 47 percent (for my case)? Or, min should always be 47 and max should be 100 for turbo off and 112 for turbo on? Issue 3: Sometimes it is desirable to lock the CPU frequencies at some value, at least when it is active. Now, these settings work fine: "performance" /sys/devices/system/cpu/intel_pstate/max_perf_pct = 42 /sys/devices/system/cpu/intel_pstate/min_perf_pct = 42 /sys/devices/system/cpu/intel_pstate/no_turbo = 0 (or 1 with both of the above set to 47, but leaving them at 42 works also) But some other settings give odd results, and also seem to depend on if the numbers are increasing or decreasing and if the CPU was unload or loaded before the change. This is best described by graphs of the experiments done. Attached. Potentially of particular interest are the cases were the CPU frequency is always 1.768 GHz when sampled (5 seconds after the CPU becomes 100% loaded). For example: "performance" /sys/devices/system/cpu/intel_pstate/max_perf_pct = 69 /sys/devices/system/cpu/intel_pstate/min_perf_pct = 69 /sys/devices/system/cpu/intel_pstate/no_turbo = 0 Where we would expect something around 2.6 GHz. I also tested loading other CPUs at the same time, and could never get any other frequency (tested about 100 times, in addition to the 24 done to make the graphs)(Correction: a few times the CPU kicked up in frequency, but just after the sample was taken, and I think because of the action of either waking up to take the sample or whatever). Sub-experiment: Take another frequency sample after another couple of seconds, to get an accurate count of how many times out of 100 it kicks up after the first sample: Results: 3 times out of 100. Sub-sub-experiment: Does the CPU kick up because the controlling script came out of sleep, or is it because it reads /sys/devices/system/cpu/cpu7/cpufreq/cpuinfo_cur_freq? Result: Inconclusive (Will try to think of a better test and test more later, if required. Might require a larger sample size). To be clear, it is not best effieciency wanted here, it is knowing that the active CPU frequency will always be the same. An example application is the test procedure to make sure that reported load averages are correct. Issue 4: It appears as though there is a slight rounding down effect in the reported frequencies. Wouldn't we at least expect turbostat to give the same numbers? Example: doug@s15:~/temp$ sudo cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq 3468000 3468000 3468000 3468000 3468000 3468000 3468000 3468000 doug@s15:~/temp$ sudo ./turbostat cr CPU %c0 GHz TSC %c1 %c3 %c6 %c7 %pc2 %pc3 %pc6 %pc7 100.00 3.51 3.41 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0 0 100.00 3.51 3.41 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0 4 100.00 3.51 3.41 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1 1 100.00 3.51 3.41 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1 5 100.00 3.51 3.41 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2 2 100.00 3.51 3.41 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2 6 100.00 3.51 3.41 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 3 3 100.00 3.51 3.41 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 3 7 100.00 3.51 3.41 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Issue 5: (actually just an opinion, and at the risk of raising an issue that seems to have been brought up a lot) Why wasn't what is now called "powersave" called "ondemand" (or "conservative" if preferred), as it is similar? Then "powersave" could have been as it was, namely: "powersave" /sys/devices/system/cpu/intel_pstate/max_perf_pct = 42 /sys/devices/system/cpu/intel_pstate/min_perf_pct = 42 /sys/devices/system/cpu/intel_pstate/no_turbo = 0 (or 1 with both of the above set to 47)