Bug 64261 - Intel Pstate driver truncates to pstate instead of rounding to nearest pstate
Summary: Intel Pstate driver truncates to pstate instead of rounding to nearest pstate
Status: CLOSED CODE_FIX
Alias: None
Product: Power Management
Classification: Unclassified
Component: intel_pstate
Hardware: All
OS: Linux
Importance: P1 normal
Assignee: Dirk Brandewie
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-11-02 21:52 UTC by Doug Smythies
Modified: 2015-07-21 19:05 UTC
CC List: 3 users

See Also:
Kernel Version: 3.12rc7
Subsystem:
Regression: No
Bisected commit-id:


Attachments
shows rounded pstate and current actual for frequency Vs. requested. (34.67 KB, image/png)
2013-11-02 21:52 UTC, Doug Smythies
Shows rounded pstate and current actual frequency Vs. requested - Turbo on (32.96 KB, image/png)
2013-11-04 07:17 UTC, Doug Smythies
Test done with min=max percent and min left at 42 percent (32.01 KB, image/png)
2013-11-05 15:06 UTC, Doug Smythies
CPU 7 frequency Vs. requested percent with rounding. Turbo off. (30.66 KB, image/png)
2013-11-06 05:34 UTC, Doug Smythies
CPU 7 frequency vs load. Turbo off. With and without rounding. (32.12 KB, image/png)
2013-11-06 05:38 UTC, Doug Smythies
CPU 7 frequency vs load. Turbo on. With and without rounding. (35.86 KB, image/png)
2013-11-06 05:43 UTC, Doug Smythies
CPU 7 frequency vs load. Turbo on. With and without rounding. 2 (34.13 KB, image/png)
2013-11-06 05:48 UTC, Doug Smythies
The rounding code. (988 bytes, text/plain)
2013-11-06 06:04 UTC, Doug Smythies
Shows rounded pstate and actual for freq. Vs. Requested. Kernel 3.15rc1 (37.76 KB, image/png)
2014-04-22 00:20 UTC, Doug Smythies
CPU 7 frequency vs load. Turbo on. Kernels 3.15RC2 and 3.12 (31.98 KB, image/png)
2014-04-22 00:30 UTC, Doug Smythies
Sleep / load frequency sweep from 2 to 250 Hertz. Kernel 3.12 and 3.15RC2 (31.28 KB, image/png)
2014-04-22 00:38 UTC, Doug Smythies

Description Doug Smythies 2013-11-02 21:52:12 UTC
Created attachment 113151 [details]
shows rounded pstate and current actual for frequency Vs. requested.

The Intel Pstate driver seems to truncate its calculated target to the next lower integer pstate. The suggestion is that it should round to the nearest pstate instead. Recent (kernel 3.12RC7) math improvements have made achieving 100% frequency better, but rounding would make it more robust.

The attachment demonstrates the effect.
Comment 1 Doug Smythies 2013-11-04 07:17:18 UTC
Created attachment 113231 [details]
Shows rounded pstate and current actual frequency Vs. requested - Turbo on

Just adding a turbo-on version of the same attachment provided in the original posting, which was done with turbo off.
Comment 2 Dirk Brandewie 2013-11-04 16:14:42 UTC
Can you attach the script you are using for this test?  Is the requested P state in response to a load?
Comment 3 Doug Smythies 2013-11-04 16:54:15 UTC
Dirk: The script is the same one that was used and posted back in bug 59481. However, now that that bug's main issue is fixed, using the script unmodified is excessive, as a run takes about 10 hours. In a few hours I will post a modified version, as there is no reason not to speed it up a lot now.

Yes, the requested Pstate is in response to a full load, but both min and max percent have been set to whatever is shown on the x-axis.
Comment 4 Dirk Brandewie 2013-11-04 18:39:41 UTC
So you are trying to emulate the userspace governor here; not really the goal of intel_pstate, but OK :-)

The ability to select a single P state was not the intended usage of {min,max}_perf_pct; the intent was to allow users to select a floor and ceiling for the range of available P states. The absolute meaning of a percentage of available performance changes based on the SKU of the part (P states available).

The driver can only select integer values 16->38 in your turbo-on test case.

The P states are 2.6315789 percent wide in terms of the turbo frequency and 2.9411765 percent wide in terms of the non-turbo max on your CPU.

.42 * 38 = 15.96
.43 * 38 = 16.34
.44 * 38 = 16.72
.45 * 38 = 17.1

45% is where your test goes from 16->17.

For the 3 percent shelves in the turbo-off case:
.50 * 34 = 17
.51 * 34 = 17.34
.52 * 34 = 17.68
.53 * 34 = 18.03 (so we are off by 3 percent of a pstate due to truncation)
Comment 5 Doug Smythies 2013-11-04 22:19:52 UTC
Hi Dirk,

There are cases, e.g. when investigating errors in load averages, where one wants to lock the CPU at a particular frequency. I realize it was not the goal of intel_pstate, but it still needs to be allowed for. Regardless, I am merely using this as a way to easily demonstrate the issue.

".53 * 34 = 18.03 (so we are off by 3 percent of a pstate due to truncation)"
No, I am arguing that it is off by 103 percent of a pstate due to truncation.

I am also arguing that, with rounding, there will never be more than half a pstate of discrepancy between desired and actual, instead of up to a full pstate. It will also help at the 100% end, where right now the driver might struggle to get to 100% on some processors.

For your examples, I am saying it should be:

Turbo:
int(.42 * 38 + 0.5) = 16
int(.43 * 38 + 0.5) = 16
int(.44 * 38 + 0.5) = 17
int(.45 * 38 + 0.5) = 17

Turbo off:
int(.50 * 34 + 0.5) = 17
int(.51 * 34 + 0.5) = 17
int(.52 * 34 + 0.5) = 18
int(.53 * 34 + 0.5) = 18
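
As a quick stand-alone check of the two mappings above (illustrative only, not the driver's code; note that the driver additionally clamps to the hardware minimum pstate, 16 on this CPU):

/* Compare truncation (current behaviour) with nearest rounding (proposed)
 * for the percent -> pstate mapping.  max_pstate is 38 (turbo on) or
 * 34 (turbo off) on this CPU; neither value is read from the driver.
 */
#include <stdio.h>

static int pstate_trunc(int pct, int max_pstate)
{
    return (pct * max_pstate) / 100;        /* floor */
}

static int pstate_round(int pct, int max_pstate)
{
    return (pct * max_pstate + 50) / 100;   /* round to nearest */
}

int main(void)
{
    int pct;

    for (pct = 42; pct <= 45; pct++)        /* turbo on */
        printf("%d%% of 38: trunc %d, round %d\n",
               pct, pstate_trunc(pct, 38), pstate_round(pct, 38));
    for (pct = 50; pct <= 53; pct++)        /* turbo off */
        printf("%d%% of 34: trunc %d, round %d\n",
               pct, pstate_trunc(pct, 34), pstate_round(pct, 34));
    return 0;
}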
Comment 6 Dirk Brandewie 2013-11-05 01:28:23 UTC
(In reply to Doug Smythies from comment #5)
> Hi Dirk,
> 
> There are cases, i.e. when investigating error in load averages where one
> wants to lock the CPU at whatever frequency. I realize it was not the goal
> of intel_pstate, but still needs to be allowed for.

It is allowed for; clearly you are using the mechanism. The percentage values needed to get to a given frequency are SKU dependent. We could change it to make your graph look the way you want on your system; then someone else comes along and says it goes to the higher P state too soon on their system.

In the normal case, where intel_pstate is being used to save energy, being conservative is a good thing.

If you can't get to a selected (measured) frequency with the current interface, then that is a bug.

  
> Regardless, I am merely
> using this as a way to easily demonstrate the issue.
> 
> ".53 * 34 = 18.03 (so we are off by 3 percent of a pstate due to truncation)"
> No, I am arguing that it is off by 103 percent of a pstate due to truncation.

int(.53 * 34 + 0.5) = 18

How is it off by a whole P state?
> 
> I am also arguing that if rounding is used there will never be more than a
> half of a pstate discrepancy between desired and actual instead of 1 pstate.
> I am also arguing that it will help at the 100% end, where right now it
> might struggle to get to 100% on some processors.
> 
> For your examples, I am saying it should be:
> 
> Turbo:
> int(.42 * 38 + 0.5) = 16
> int(.43 * 38 + 0.5) = 16
> int(.44 * 38 + 0.5) = 17
> int(.45 * 38 + 0.5) = 17
> 
> Turbo off:
> int(.50 * 34 + 0.5) = 17
> int(.51 * 34 + 0.5) = 17
> int(.52 * 34 + 0.5) = 18
> int(.53 * 34 + 0.5) = 18
Comment 7 Doug Smythies 2013-11-05 07:21:34 UTC
(In reply to Dirk Brandewie from comment #6)
> The percentage
> values to get to a given frequency are SKU dependent.

Yes, of course.

> int(.53 * 34 + 0.5) = 18
> 
> How is it off by a whole P state?

Because it actually goes to 1.7 GHz not 1.8 GHz:
CPU 7 is fully loaded:

doug@s15:~/temp$ cat /sys/devices/system/cpu/intel_pstate/*
53
42
1
doug@s15:~/temp$ cat /sys/devices/system/cpu/cpu7/cpufreq/cpuinfo_cur_freq
1699867

By rounding, we stay away from finite math issues at integer boundaries.
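
For what it is worth, the 1.7 GHz result above is consistent with truncation in 8-bit fixed-point arithmetic of the kind intel_pstate.c uses (FRAC_BITS = 8). The stand-alone sketch below only mimics that arithmetic with simplified helpers and is not the driver's exact code path:

/* 53% of max_pstate 34 is 18.02 in real arithmetic, but truncating at each
 * fixed-point step lands on pstate 17 (1.7 GHz).  Adding half an LSB before
 * dropping the fraction gives 18.  Sketch only.
 */
#include <stdio.h>
#include <stdint.h>

#define FRAC_BITS 8
#define int_tofp(X) ((int64_t)(X) << FRAC_BITS)
#define fp_toint(X) ((X) >> FRAC_BITS)

static int64_t mul_fp(int64_t x, int64_t y) { return (x * y) >> FRAC_BITS; }
static int64_t div_fp(int64_t x, int64_t y) { return (x << FRAC_BITS) / y; }

int main(void)
{
    int64_t max_perf = div_fp(int_tofp(53), int_tofp(100)); /* 135/256 = 0.5273 */
    int64_t scaled = mul_fp(int_tofp(34), max_perf);        /* 4590/256 = 17.93 */

    printf("truncated: %d\n", (int)fp_toint(scaled));        /* prints 17 */
    printf("rounded:   %d\n",
           (int)((scaled + (1 << (FRAC_BITS - 1))) >> FRAC_BITS)); /* prints 18 */
    return 0;
}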
Comment 8 Dirk Brandewie 2013-11-05 07:35:55 UTC
This is NOT how you pin a single P state; make max == min. With the config above, the driver is free to select any P state between 42-53%.
Comment 9 Doug Smythies 2013-11-05 08:12:08 UTC
I have done it both ways and get the same result, provided the CPU is fully loaded.
All of my tests until this morning were done with max == min.
But O.K.:

doug@s15:~/temp$ sudo ./set_cpu_turbo_off
doug@s15:~/temp$ echo "53" | sudo tee /sys/devices/system/cpu/intel_pstate/max_perf_pct
53
doug@s15:~/temp$ echo "53" | sudo tee /sys/devices/system/cpu/intel_pstate/min_perf_pct
53
doug@s15:~/temp$ cat /sys/devices/system/cpu/intel_pstate/*
53
53
1
doug@s15:~/temp$ cat /sys/devices/system/cpu/cpu7/cpufreq/cpuinfo_cur_freq
1700000
Comment 10 Doug Smythies 2013-11-05 15:06:54 UTC
Created attachment 113491 [details]
Test done with min=max percent and min left at 42 percent

This was the test I did so that I knew I could leave min percent at 42 percent without an unknown side effect (as long as CPU 7 was under full load, of course).
Comment 11 Doug Smythies 2013-11-06 05:34:38 UTC
Created attachment 113581 [details]
CPU 7 frequency Vs. requested percent with rounding. Turbo off.

I added generic rounding to intel_pstate.c on the kernel 3.12 build for my test computer.

This attachment is similar to previous attachments, but with a new line added using the new code. Turbo off. The two spans of 4 percent samples without a frequency step are still there, just moved. I'll look into that sometime.
Comment 12 Doug Smythies 2013-11-06 05:38:54 UTC
Created attachment 113591 [details]
CPU 7 frequency vs load. Turbo off. With and without rounding.

The load on cpu 7 varies from 0.005 to 0.995 in steps of 0.005 at 10 seconds per step. The frequency is monitored at 10 Hertz.
Comment 13 Doug Smythies 2013-11-06 05:43:55 UTC
Created attachment 113601 [details]
CPU 7 frequency vs load. Turbo on. With and without rounding.

The load on cpu 7 varies from 0.005 to 0.995 in steps of 0.005 at 10 seconds per step. The frequency is monitored at 10 Hertz.

Test 1 of 2. A subsequent test, where the load was held at 0.83 and the load/sleep frequency was varied from 50 hertz to 300 hertz (graph not posted herein, as it was uneventful between rounding and no rounding), showed much less frequency jitter for the un-rounded case. Therefore it was decided to repeat this test.
Comment 14 Doug Smythies 2013-11-06 05:48:00 UTC
Created attachment 113611 [details]
CPU 7 frequency vs load. Turbo on. With and without rounding. 2

The load on cpu 7 varies from 0.005 to 0.995 in steps of 0.005 at 10 seconds per step. The frequency is monitored at 10 Hertz.

Turbo on test 2 of 2. The significant difference for the un-rounded data at the higher frequencies is not understood. I'll look into it.

Note that gains and setpoints have not been altered, yet.
Comment 15 Doug Smythies 2013-11-06 06:04:54 UTC
Created attachment 113621 [details]
The rounding code.

The changes to the code. The formatting is probably not quite right.
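
For anyone not opening the attachment: generic rounding of this kind usually amounts to adding half of the fixed-point LSB before converting back to an integer pstate. A sketch (not necessarily the attached change, and using the driver's FRAC_BITS naming) might look like:

/* Truncating vs. rounding fixed-point-to-integer conversion.  Values here
 * are non-negative.  Sketch only; see the attachment for the real change.
 */
#define FRAC_BITS 8
#define fp_toint(X)  ((X) >> FRAC_BITS)                            /* truncates */
#define fp_round(X)  (((X) + (1 << (FRAC_BITS - 1))) >> FRAC_BITS) /* rounds */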
Comment 16 Dirk Brandewie 2013-11-06 18:58:01 UTC
(In reply to Doug Smythies from comment #12)
> Created attachment 113591 [details]
> CPU 7 frequency vs load. Turbo off. With and without rounding.
> 
> The load on cpu 7 varies from 0.005 to 0.995 in steps of 0.005 at 10 seconds
> per step. The frequency is monitored at 10 Hertz.

Are you still setting {min/max}_perf_pct, or are we changing horses and talking about using the driver in "normal" mode?
Comment 17 Doug Smythies 2013-11-06 19:44:45 UTC
(In reply to Dirk Brandewie from comment #16)
> (In reply to Doug Smythies from comment #12)
> > Created attachment 113591 [details]
> > CPU 7 frequency vs load. Turbo off. With and without rounding.
> > 
> > The load on cpu 7 varies from 0.005 to 0.995 in steps of 0.005 at 10
> seconds
> > per step. The frequency is monitored at 10 Hertz.
> 
> Are you still setting {min/max}_perf_pct or are we changing horses and
> talking about using the driver in "normal' mode?

No, I am not setting min=max=whatever.
Yes, I was "changing horses" here. The settings are all defaults from boot-up, except with respect to turbo on or off. I wanted to try to demonstrate the improvement from rounding in a real operational sense, and as the frequency got closer to 100%. Test 1, turbo on, did show improvement and much less jitter. However, Test 2, turbo on, did not. I still need to go back and investigate why those two tests, which should have been the same, weren't.
Sorry, I should have been clearer in my description.

I have two methods for loading CPUs to various levels and working/sleeping frequencies:

One uses a program called "consume", originally from Peter Zijlstra of the kernel.org sched maintainers. It applies the desired load at the desired work/sleep frequency, regardless of the CPU frequency. I.e. it does not respond to the CPU frequency going up, but rather adjusts its work load to hold to what was asked for, unlike a real system.

The other is a program called "waiter" (the name is from the textbook I started it from). It spins out the desired number of processes at the desired load and the desired work/sleep frequency, but what it actually does depends on the CPU frequency and the number of running processes. I.e. it responds to the CPU frequency going up by getting its work done faster, just like a real system. The user interface for waiter is NOT good, and I tend to use another program to create scripts that provide the operational parameters.

I used "consume" for these graphs.
Comment 18 Doug Smythies 2014-04-22 00:20:36 UTC
Created attachment 133231 [details]
Shows rounded pstate and actual for freq. Vs. Requested. Kernel 3.15rc1

I saw that this bug was set to resolved, and so I made new versions of graphs previously posted herein. It looks to me as though the rounding is not even as good as it was.

However, it is the next graph, which I will post in a moment, that is cause for concern.
Comment 19 Doug Smythies 2014-04-22 00:30:19 UTC
Created attachment 133241 [details]
CPU 7 frequency vs load. Turbo on. Kernels 3.15RC2 and 3.12

This graph, similar to others posted herein, shows old data from Kernel 3.12 as a reference and adds data from Kernel 3.15RC2. With Kernel 3.15RC2 the CPU 7 frequency never increases even though the load gets as high as 99%.

The sleep frequency was fixed at 200 Hertz for this test, meaning that for a 99% load CPU 7 is busy for 4.95 milliseconds and idle for 0.05 milliseconds each cycle.

Note: I do not know when this change occurred, but I tried Kernel 3.13 and it seems similar to kernel 3.15RC2 in this respect. It is unlikely that this has anything to do with truncating; it is just that I had similar graphs already in this bug report.

There will be one more graph...
Comment 20 Doug Smythies 2014-04-22 00:38:07 UTC
Created attachment 133251 [details]
Sleep / load frequency sweep from 2 to 250 Hertz. Kernel 3.12 and 3.15RC2

In this graph CPU 7 load was always 85%; however, the sleep/load frequency was swept from 2 to 250 hertz. At 2 hertz, CPU 7 is busy for 425 milliseconds and idle for 75 milliseconds (we might expect to see some CPU frequency oscillations at this low sleep/work frequency). At 250 hertz, CPU 7 is busy for 3.4 milliseconds and idle for 0.6 milliseconds.
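
(The busy/idle numbers above follow directly from the load fraction and the sweep frequency; a throwaway helper, not part of any test script used here, shows the arithmetic:)

/* Busy/idle time per work/sleep period for a given load fraction. */
#include <stdio.h>

static void duty(double load, double freq_hz)
{
    double period_ms = 1000.0 / freq_hz;

    printf("%5.1f Hz at %2.0f%% load: busy %6.2f ms, idle %5.2f ms\n",
           freq_hz, load * 100.0, load * period_ms, (1.0 - load) * period_ms);
}

int main(void)
{
    duty(0.85, 2.0);    /* 425 ms busy, 75 ms idle */
    duty(0.85, 250.0);  /* 3.4 ms busy, 0.6 ms idle */
    duty(0.99, 200.0);  /* 4.95 ms busy, 0.05 ms idle (comment 19) */
    return 0;
}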

Note the dramatic difference in sleep/work frequency response between kernel 3.12 and 3.15 RC2.
Comment 21 Len Brown 2014-11-03 22:44:56 UTC
I don't see the suggested patch upstream or in linux-next,
so it seems that Resolved/Code_fix isn't the right state for
this report.  Re-opening, though I'd not be surprised if this one
gets explained and then closed as Documented...

Is the issue of rounding vs truncation of target frequency
in the driver independent of use of the min_perf_pct and max_perf_pct
sysfs interface?

If that is the case, it seems that intel_pstate might be choosing
"the next lower p-state" more often than if it had the luxury of
actually using floating point arithmetic.  Is that what this report is about?
If yes, that is an interesting realization, and perhaps a good suggestion
for optimization.

If that is not the case and this is about the precision of control using
the sysfs interface, then I think you got what you got.
For better or for worse, it is defined in terms of percent,
not in terms of tenths or hundredths of a percent, so high
precision on the right side of the decimal place is not implied.
If there is a need for more precise control via sysfs,
please share the use-case.
Comment 22 Doug Smythies 2014-11-03 23:38:14 UTC
Hi Len,

The intel_pstate driver has changed a lot since I entered this bug report. Extra precision bits are now maintained throughout the calculations. Rounding was added, but in the end it took a different form than the example given in this bug report.

In the end, I myself was O.K. with this one being closed as resolved.
My memory is vague, but I believe I re-did several of the tests and was O.K. with the results.

Yes, my original concerns were about struggling to get to the max pstate and about servo response oddities that might result from digital anomalies. (Some of my concerns turned out to be unfounded.)
