Bug 77771 - Intel P-State: Constantly changing CPU frequencies on idle system.
Status: CLOSED CODE_FIX
Alias: None
Product: Power Management
Classification: Unclassified
Component: intel_pstate
Hardware: Intel Linux
Importance: P1 normal
Assignee: Kristen
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-06-13 05:49 UTC by Michael Long
Modified: 2015-07-22 01:11 UTC
CC List: 7 users

See Also:
Kernel Version: 3.14.7, 3.15-rc8 - 3.15
Subsystem:
Regression: No
Bisected commit-id:


Attachments
turbostat look at idle system (1.09 KB, text/plain)
2014-06-15 00:44 UTC, Doug Smythies
turbostat results while running simple grep (819 bytes, text/plain)
2014-06-16 19:19 UTC, Michael Long
two turbostats runs with grep running (2.81 KB, text/plain)
2014-06-17 03:34 UTC, Doug Smythies
Kernel config 3.15 (86.63 KB, application/octet-stream)
2014-06-19 16:25 UTC, Michael Long
powertop report on idle system (43.34 KB, text/html)
2014-06-20 16:10 UTC, Michael Long
powertop report with a grep in a loop (52.25 KB, text/html)
2014-06-20 16:11 UTC, Michael Long

Description Michael Long 2014-06-13 05:49:56 UTC
Since kernel 3.14.7, and around the release of 3.15-rc8, the CPU frequency of each core on an idle system has been changing erratically.

Before the patch the output of grep MHz /proc/cpuinfo looked like this:

cpu MHz         : 813.625
cpu MHz         : 813.625
cpu MHz         : 813.625
cpu MHz         : 813.625

After the patch:

cpu MHz         : 1600.207
cpu MHz         : 1600.207
cpu MHz         : 1599.847
cpu MHz         : 2342.675

The CPU clocks are wildly changing over the complete spectrum of available frequencies. I can reproduce the phenomenon on at least three systems:

Desktop PC with Sandy-Bridge CPU: Intel i7-2600 (non-K)
Lenovo T530 with Ivy-Bridge CPU: i7-3610QM
Sony VAIO Pro 13 (example above) with Haswell CPU: i5-4200U

It is not clear to me whether this is expected behavior and I just hit some kind of heisenbug, or whether something is going wrong. I am reporting it primarily because I noticed a significant, sudden change. I haven't yet done any tests regarding performance, battery run-time or heat implications.

Note: I've seen this kind of behavior in earlier kernel releases before, but with 3.14 it vanished completely. So this might be a regression.
Comment 1 Doug Smythies 2014-06-13 06:20:56 UTC
Please define "an idle system". I ask because an "idle" Linux server can be quite different from an "idle" Linux computer with some sort of GUI desktop, the latter being, in reality, considerably less "idle".

That being said, your frequencies do seem a little high now. Your previous frequencies were unduly being forced low and there was a change with respect to that issue.
Comment 2 Yuyang Du 2014-06-13 08:18:29 UTC
(In reply to Michael Long from comment #0)
> With the introduction of the kernel patch 3.14.7 and around the release of
> 3.15-rc8 the idle CPU frequency of each core in an idle system is changing
> erratically.
> 

This behavior is expected, see:

http://marc.info/?l=linux-pm&m=140141648726863&w=2

Nothing more to say.

Thanks,
Yuyang
Comment 3 Michael Long 2014-06-13 09:09:00 UTC
To comment #1:

I ran my tests on desktop systems with a loaded KDE instance. Aside from "the usual" services running in a typical Fedora 20 default installation, there are no other active processes producing considerable load over a longer time.

I just started up, logged into KDE, let it sit for a while (at least 5 minutes) and then looked at the frequencies. No browser started, no games loaded, no update processes running, etc.; just an empty desktop.

There is nothing running on those systems that would justify one or more cores sitting at high clocks, even deep in turbo mode, when the frequency stats are read.
Comment 4 Doug Smythies 2014-06-15 00:44:39 UTC
Created attachment 139771
turbostat look at idle system

Perhaps provide the output from "turbostat sleep 60" or "turbostat -J sleep 60".
I have attached the result from my system, where my CPUs are all at low frequency (lowest pstate). My system is a server with no GUI stuff.

I did it as an attachment rather than in-line, so as to (hopefully) not mess up the formatting.
Comment 5 Michael Long 2014-06-16 19:15:58 UTC
Thanks for the hint about the turbostat utility. Without starting a desktop environment I got similar stats: all frequencies are very close to the lowest state. Logging into KDE showed different results. Eventually I found the cause of those high clocks:

A superkaramba desktop-widget. This widget basically does a grep on /proc/cpuinfo and checks the load internally each second. Disabling this widget gets the clock down. The same behavior can be reproduced just by running "for i in {1..99}; do grep MHz /proc/cpuinfo; sleep 1; done".
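
For anyone who wants to repeat the sampling without the widget, the shell loop above can be approximated in Python. This is only a minimal sketch, not what superkaramba does internally: the names `parse_cpu_mhz` and `sample` are made up for illustration, and the example text is copied from the description. Like the shell loop, it is itself a once-per-second workload, so it may influence the very frequencies it reports.

```python
import re
import time

# Matches lines such as "cpu MHz         : 813.625" from /proc/cpuinfo.
CPU_MHZ_RE = re.compile(r"^cpu MHz\s*:\s*([0-9.]+)\s*$", re.MULTILINE)


def parse_cpu_mhz(cpuinfo_text):
    """Return the per-CPU "cpu MHz" values found in /proc/cpuinfo text."""
    return [float(value) for value in CPU_MHZ_RE.findall(cpuinfo_text)]


def sample(count=99, interval_s=1.0, path="/proc/cpuinfo"):
    """Print the reported frequencies once per interval (hypothetical helper)."""
    for _ in range(count):
        with open(path) as f:
            print(parse_cpu_mhz(f.read()))
        time.sleep(interval_s)


# Values copied from the "After the patch" listing in the description.
example = (
    "cpu MHz         : 1600.207\n"
    "cpu MHz         : 1600.207\n"
    "cpu MHz         : 1599.847\n"
    "cpu MHz         : 2342.675\n"
)
print(parse_cpu_mhz(example))
```

Calling `sample()` on a Linux system mirrors the `for i in {1..99}` loop; the parsing step can be checked on any machine using the example text.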

Admittedly this might be a typical layer-8 problem; however, why is a simple grep every second pounding the CPU so hard that it remains at higher clocks, even in turbo mode? Especially when it didn't before the patch, or when just using plain old acpi-cpufreq. If this is still just expected behavior, sorry for the unnecessary noise.
Comment 6 Michael Long 2014-06-16 19:19:06 UTC
Created attachment 140001
turbostat results while running simple grep
Comment 7 Yuyang Du 2014-06-17 01:46:02 UTC
(In reply to Michael Long from comment #5)
> Thanks for the hint about the turbostat utility. Without starting a desktop
> environment I got similar stats, all frequencies are very close around the
> lowest state. Logging into KDE showed different results. Eventually I found
> the cause of those high clocks: 
> 
> A superkaramba desktop-widget. This widget basically does a grep on
> /proc/cpuinfo and checks the load internally each second. Disabling this
> widget gets the clock down. The same behavior can be reproduced just by
> running "for i in {1..99}; do grep MHz /proc/cpuinfo; sleep 1; done".
> 
> Admittedly this might be a typical layer8-problem, however why is a simple
> grep every second pounding the CPU so hard that it remains in higher clocks,
> even in turbo mode? Especially when it didn't before the patch or just using
> plain old acpi-cpufreq. If this is still just expected behavior sorry for
> the unnecessary noise.

Expected, but should not happen. So clearly something is wrong...

Yuyang
Comment 8 Doug Smythies 2014-06-17 03:34:18 UTC
Created attachment 140031
two turbostats runs with grep running

Is your turbostat listing from your Haswell computer, i.e. the one where the minimum CPU frequency is about 800 MHz?

I am unable to repeat your "for i in {1..99}; do grep MHz /proc/cpuinfo; sleep 1; done" results on my computer. I wish that I could. Attached are two turbostat runs done while that command was running (the 2nd one is probably of no use to anyone else). My minimum CPU frequency is about 1600 MHz.
Comment 9 Michael Long 2014-06-19 16:25:06 UTC
Created attachment 140441
Kernel config 3.15

Sorry for the delay. Yes, the turbostat results are all from the same Haswell i5-4200U with 800-1600 MHz (turbo 2.6 GHz) [1].

I've no clue why this ultrabook is so sensitive, so I've attached my current kernel config in case I've misconfigured something.

In the meantime I tried to find a better method to reproduce the effect, but I had no real luck when testing on my other quad-core systems. There, the effect is much less severe.



[1] http://ark.intel.com/products/75459/Intel-Core-i5-4200U-Processor-3M-Cache-up-to-2_60-GHz?q=i5-4200U
Comment 10 Dirk Brandewie 2014-06-19 16:34:33 UTC
Can you run "powertop --html sleep 30" and attach powertop.html?
Comment 11 Michael Long 2014-06-20 16:10:44 UTC
Created attachment 140501
powertop report on idle system
Comment 12 Michael Long 2014-06-20 16:11:52 UTC
Created attachment 140511
powertop report with a grep in a loop
Comment 13 Henry Gebhardt 2014-06-23 15:20:03 UTC
I believe I have the same problem on Arch with a Haswell i7-4750HQ. A git bisect found that it was introduced with this patch:

# git bisect bad
cba64e6cbf312042e124dbf669e0a1e1dee72522 is the first bad commit
commit cba64e6cbf312042e124dbf669e0a1e1dee72522
Author: Dirk Brandewie <dirk.j.brandewie@intel.com>
Date:   Thu May 29 09:32:22 2014 -0700

    intel_pstate: Remove C0 tracking
    
    commit adacdf3f2b8e65aa441613cf61c4f598e9042690 upstream.
    
    Commit fcb6a15c (intel_pstate: Take core C0 time into account for core
    busy calculation) introduced a regression referenced below.  The issue
    with "lockup" after suspend that this commit was addressing is now dealt
    with in the suspend path.
    
    Fixes: fcb6a15c2e7e (intel_pstate: Take core C0 time into account for core busy calculation)
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=66581
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=75121
    Reported-by: Doug Smythies <dsmythies@telus.net>
    Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

:040000 040000 838e9246fe6c53d840ff84f405e68e3645bb0f7d 5e5fba17fb18725cab01762ac87ac9344f95aec7 M	drivers
Comment 14 Doug Smythies 2014-06-23 23:30:16 UTC
@ Henry: Yes, thanks. We know that it is the removal of the C0 stuff.

The issue is, and as mentioned in my comment #1 above, the C0 stuff was forcing the CPU frequencies hard to the minimum value. Indeed, so hard that they often would not increase, even for heavy workloads.

I guess people got used to seeing their CPU frequencies always at minimum (which was what was wrong) and now are worried that the reported frequencies are varying so much (which is O.K. and expected).

Please do not mistake a higher reported CPU frequency from "grep MHz /proc/cpuinfo" for a problem. That number does not tell the whole story, as it provides no information about how much time the CPU was active in the C0 state. The CPU might well be asleep, and whatever number is in /proc/cpuinfo might be stale. (Such an assertion is supported by both the powertop and turbostat postings, where each CPU is spending, by far, the majority of its time in a state where the clock is not running.)

However, and as I also mentioned in comment #1 above, for the workload as I understand it, those listed frequencies (stale or not) do seem a little high. Are we worried about it? No. If it is desired to dig deeper, then we would need to acquire some trace data to analyze using "perf record". Let us know if you want to do that.
Comment 15 Len Brown 2014-07-03 03:44:24 UTC
I added the field Avg_MHz to turbostat output because many users
assumed that idle time was included in the calculation of frequency.
I re-named the previous GHz column to be Bzy_MHz for those
who want to know what the frequency is when actually running --
something that often matches a selected P-state, for example.

The difference is dramatic in cases like this, when the system is, say,
0.03% busy (99.97% idle).
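
To make that difference concrete, here is a rough sketch of how an average frequency and a busy frequency can be derived from cycle counters over one interval. It assumes the usual counter semantics (TSC ticks at a constant rate, MPERF ticks at that same rate only while the CPU is in C0, APERF counts actual cycles while in C0). The function name and the counter deltas are invented for illustration; this is not a copy of the turbostat source.

```python
def avg_and_busy_mhz(delta_aperf, delta_mperf, delta_tsc, interval_s):
    """Return (avg_mhz, bzy_mhz, busy_fraction) for one measurement interval.

    Assumed semantics, labeled as such:
      delta_tsc   - reference ticks over the whole interval (idle included)
      delta_mperf - reference ticks counted only while the CPU was in C0
      delta_aperf - actual core cycles counted only while the CPU was in C0
    """
    tsc_mhz = delta_tsc / interval_s / 1e6
    busy_fraction = delta_mperf / delta_tsc
    # Average frequency: idle time stays in the denominator.
    avg_mhz = delta_aperf / interval_s / 1e6
    # Busy frequency: only the time spent in C0 counts.
    bzy_mhz = tsc_mhz * delta_aperf / delta_mperf
    return avg_mhz, bzy_mhz, busy_fraction


# Hypothetical 60 s interval: 2000 MHz TSC, 0.03% busy, running at 1600 MHz
# whenever it is awake. None of these numbers come from the attachments.
avg_mhz, bzy_mhz, busy = avg_and_busy_mhz(
    delta_aperf=28_800_000,
    delta_mperf=36_000_000,
    delta_tsc=120_000_000_000,
    interval_s=60.0,
)
print(f"busy {busy:.4%}  Avg_MHz {avg_mhz:.2f}  Bzy_MHz {bzy_mhz:.0f}")
```

With these made-up inputs the average comes out at roughly 0.48 MHz while the busy frequency is 1600 MHz, which is the kind of gap described above.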

I don't see any issues with the frequency, Avg_MHz, or Bzy_MHz,
as reported by turbostat in this bug report.  If there are some,
please point them out to me.

So it seems that the issue is with what users are seeing in
/proc/cpuinfo, and it isn't what they expect.

Note that turbostat in this case is run over 60 seconds.
So the denominator in the cycles/time calculation is large.
If you run it with a parameter such that it returns in
a time much shorter than 1 second, that math becomes less reliable.
That is because the counter collection is not atomic, and the
denominator is small, which will magnify any jitter in the calculation
of the cycles elapsed -- which may itself be very small...
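
A small worked example of that magnification, as a sketch only: the 10,000-cycle jitter is an invented figure standing in for non-atomic counter collection, and `relative_cycle_error` is a hypothetical helper, not anything from turbostat.

```python
def relative_cycle_error(busy_mhz, busy_fraction, interval_s, jitter_cycles):
    """Fraction by which a fixed cycle-count jitter distorts the measurement."""
    # Cycles actually executed during the interval (mostly idle system).
    true_cycles = busy_mhz * 1e6 * busy_fraction * interval_s
    return jitter_cycles / true_cycles


# Same hypothetical system in both cases: 1600 MHz when awake, 0.03% busy.
long_run = relative_cycle_error(1600, 0.0003, 60.0, 10_000)
short_run = relative_cycle_error(1600, 0.0003, 0.01, 10_000)
print(f"60 s interval:  {long_run:.4%} error")
print(f"10 ms interval: {short_run:.0%} error")
```

Under these assumptions the same jitter is negligible over 60 seconds but exceeds the whole measured cycle count over 10 ms.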

So there are two questions.

First, should intel_pstate include idle time or not
in the frequency that it presents in /proc/cpuinfo.
I think the principle of least surprise leads to
the answer "no", since people are accustomed to seeing
"most recently requested p-state" here, regardless
of whether the CPU is busy or not.

Second, what is the minimum duration_us used in the
calculation of frequency -- is the math not working, or
is this a symptom of un-expected frequency selections
by intel_pstate?  If the math is working, then the question
is if the states selected are wise choices for
performance and power -- neither of which has yet
been mentioned in this bug report.
Comment 16 Yuyang Du 2014-07-03 04:04:22 UTC
(In reply to Len Brown from comment #15)

> Second, what is the minimum duration_us used in the
> calculation of frequency -- is the math not working, or
> is this a symptom of un-expected frequency selections
> by intel_pstate?  If the math is working, then the question
> is if the states selected are wise choices for
> performance and power -- neither of which has yet
> been mentioned in this bug report.

Thanks, Len. This is the first question that should be answered, before any symptoms are reported, because if the math does not work, how can the result ever be right?

From my understanding of the current situation, the math is totally wrong, according to the following comments:

http://marc.info/?l=linux-pm&m=139962897623086&w=2
http://marc.info/?l=linux-pm&m=140126641926395&w=2
http://marc.info/?l=linux-pm&m=140141648726863&w=2

I would really hope that someone can address these directly...

Thanks,
Yuyang
Comment 17 Shawn Starr 2014-12-23 07:51:21 UTC
I am noticing problems with P-states on my i7-4910MQ at 2.9 GHz (turbo 3.9 GHz). This is a laptop, a Dell Precision M6800, with the latest BIOS, version A11 from 11/20/2014.

I notice two differences, and one of them concerns overheating, even with the latest thermald.

I have C-states enabled in the BIOS. When using either intel_pstate or acpi-cpufreq, turbostat does show CPU cores entering the C7 state; however, PkgTmp never drops below Pkg%pc2 (a Pkg%pc7 level is listed there).

My concern with P-states is that the frequency idles much higher than the lowest frequency, the processor runs a lot hotter, and the fans are on constantly, even when only one process pegs one logical processor at 100%.

When using acpi-cpufreq with the ondemand governor, the processor remains at a low frequency, CPU temperatures are cooler, and the laptop is quieter.

Is this by design? I've had a number of thermal events with P-states, hitting the overheat threshold.

If it isn't, please tell me what info you want and I'll collect it accordingly.

Thanks,
Shawn
Comment 18 Doug Smythies 2015-02-26 17:25:28 UTC
I observe from the posted .config file that the kernel is a 300 Hz kernel. It turns out that there seems to be a very interesting and dramatic manifestation of the tendency to drive up the target pstate for no good reason when the sample rate is 13.3333 ms (as the 300 Hz kernel will default to) and desktop GUI stuff is running.
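
As a quick sanity check on that figure, 13.3333 ms is exactly four timer ticks of a 300 Hz kernel. The sketch below shows only that arithmetic; it does not derive why the driver's sample period ends up at four ticks, and `tick_period_ms` is a name made up for illustration.

```python
from fractions import Fraction


def tick_period_ms(hz):
    """Length of one timer tick in milliseconds for a kernel built with HZ=hz."""
    return Fraction(1000, hz)


one_tick = tick_period_ms(300)
four_ticks = 4 * one_tick
print(float(one_tick))    # one tick at 300 Hz, in ms
print(float(four_ticks))  # four ticks at 300 Hz, in ms
```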

Len: It is not an issue with too short a duration giving maths issues.

Yuyang: Yes, some C0 weighting needs to be re-introduced.

Please see also: Bug 93521
https://bugzilla.kernel.org/show_bug.cgi?id=93521
Comment 19 Shawn Starr 2015-02-26 18:13:55 UTC
Please disregard my comments in this BZ. The issue was with the intel_iommu setting: with VT-d you must disable the IOMMU for GFX, or you won't get to the PC6 state at all.
Comment 20 Kristen 2015-05-07 16:41:32 UTC
Thanks, Shawn. Closing, as it sounds like the original issue was already resolved as well.
