With the introduction of the kernel patch 3.14.7 and around the release of 3.15-rc8, the idle CPU frequency of each core in an idle system is changing erratically.
Before the patch the output of grep MHz /proc/cpuinfo looked like this:
cpu MHz : 813.625
cpu MHz : 813.625
cpu MHz : 813.625
cpu MHz : 813.625
After the patch:
cpu MHz : 1600.207
cpu MHz : 1600.207
cpu MHz : 1599.847
cpu MHz : 2342.675
The CPU clocks are wildly changing over the complete spectrum of available frequencies. I can reproduce the phenomenon on at least three systems:
Desktop PC with Sandy-Bridge CPU: Intel i7-2600 (non-K)
Lenovo T530 with Ivy-Bridge CPU: i7-3610QM
Sony VAIO Pro 13 (example above) with Haswell CPU: i5-4200U
For me it is not clear whether this is expected behavior and I just hit some kind of heisenbug or something is going wrong. Primarily I report this because I noticed a significant sudden change. I haven't done any tests regarding performance, battery run-time or heat implications yet.
Note: I've seen this kind of behavior in earlier kernel releases before, but with 3.14 it vanished completely. So this might be a regression.
Please define "an idle system". I ask because an "idle" Linux server can be quite different from an "idle" Linux computer with some sort of GUI desktop, the latter being, in reality, considerably less "idle". That being said, your frequencies do seem a little high now. Your previous frequencies were unduly being forced low, and there was a change with respect to that issue.
(In reply to Michael Long from comment #0)
> With the introduction of the kernel patch 3.14.7 and around the release of
> 3.15-rc8 the idle CPU frequency of each core in an idle system is changing
> erratically.
>
This behavior is expected, see: http://marc.info/?l=linux-pm&m=140141648726863&w=2
Nothing more to say.
Thanks, Yuyang
To comment #1: I made my tests on desktop systems with a loaded KDE instance. Aside from "the usual" services running in a typical Fedora 20 default installation, there are no other processes active that produce considerable load over a longer time. I just started up, logged into KDE, let it sit for a while (at least 5 mins), and then looked at the frequencies. No browser started, no games loaded, no update processes running, etc.; just an empty desktop. Nothing running on those systems justifies having one or more cores clocked up, even deep into turbo mode, when the frequency stats are read.
Created attachment 139771 [details] turbostat look at idle system Perhaps provide the output from "turbostat sleep 60" or "turbostat -J sleep 60". I have attached the result from my system, where my CPUs are all at low frequency (lowest pstate). My system is a server with no GUI stuff. I did it as an attachment rather than in-line, so as to (hopefully) not mess up the formatting.
Thanks for the hint about the turbostat utility. Without starting a desktop environment I got similar stats: all frequencies are very close to the lowest state. Logging into KDE showed different results. Eventually I found the cause of those high clocks: a superkaramba desktop widget. This widget basically does a grep on /proc/cpuinfo and checks the load internally each second. Disabling this widget gets the clock down. The same behavior can be reproduced just by running "for i in {1..99}; do grep MHz /proc/cpuinfo; sleep 1; done". Admittedly this might be a typical layer 8 problem; however, why is a simple grep every second pounding the CPU so hard that it remains at higher clocks, even in turbo mode? Especially when it didn't before the patch, or when just using plain old acpi-cpufreq. If this is still just expected behavior, sorry for the unnecessary noise.
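The grep loop above can also be sketched in Python to make the sampling explicit. This is only an illustration, assuming the usual "cpu MHz : <value>" line format shown at the top of this report; the names parse_cpu_mhz and poll are hypothetical. Note that the polling itself wakes a CPU once per interval, which is exactly the kind of load being discussed here.

```python
import time


def parse_cpu_mhz(cpuinfo_text):
    """Return the 'cpu MHz' values found in /proc/cpuinfo-style text."""
    values = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only keep lines whose key is exactly "cpu MHz".
        if sep and key.strip() == "cpu MHz":
            values.append(float(value))
    return values


def poll(samples=99, interval_s=1.0, path="/proc/cpuinfo"):
    """Print one list of per-CPU frequencies per interval, like the grep loop."""
    for _ in range(samples):
        with open(path) as f:
            print(parse_cpu_mhz(f.read()))
        time.sleep(interval_s)
```

Calling poll() on a Linux system would mimic the widget's behavior; the numbers it prints are subject to the same staleness caveat as the grep output.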
Created attachment 140001 [details] turbostat results while running simple grep
(In reply to Michael Long from comment #5)
> Thanks for the hint about the turbostat utility. Without starting a desktop
> environment I got similar stats, all frequencies are very close around the
> lowest state. Logged into KDE showed different results. Eventually I found
> the cause of those high clocks:
>
> A superkaramba desktop-widget. This widget basically does a grep on
> /proc/cpuinfo and checks the load internally each second. Disabling this
> widget gets the clock down. The same behavior can be reproduced just by
> running "for i in {1..99}; do grep MHz /proc/cpuinfo; sleep 1; done".
>
> Admittedly this might be a typical layer8-problem, however why is a simple
> grep every second pounding the CPU so hard that it remains in higher clocks,
> even in turbo mode? Especially when it didn't before the patch or just using
> plain old acpi-cpufreq. If this is still just expected behavior sorry for
> the unnecessary noise.
Expected, but should not happen. So clearly something is wrong...
Yuyang
Created attachment 140031 [details] two turbostats runs with grep running Is your turbostat listing from your Haswell computer, i.e. the one where the minimum CPU frequency is about 800 MHz? I am unable to reproduce your "for i in {1..99}; do grep MHz /proc/cpuinfo; sleep 1; done" results on my computer. I wish that I could. Attached are two turbostat runs done while that command was running (the 2nd one is probably of no use to anyone else). My minimum CPU frequency is about 1600 MHz.
Created attachment 140441 [details] Kernel config 3.15 Sorry for the delay. Yes, the turbostat results are all from the same Haswell i5-4200U with 800-1600 MHz (turbo 2.6 GHz) [1]. I have no clue why this ultrabook is so sensitive, so I've attached my current kernel config in case I've misconfigured something. In the meantime I tried to find a better method to reproduce the effect, but I had no real luck when testing on my other quad-core systems. There, the effect is much less severe. [1] http://ark.intel.com/products/75459/Intel-Core-i5-4200U-Processor-3M-Cache-up-to-2_60-GHz?q=i5-4200U
Can you run "powertop --html sleep 30" and attach the resulting powertop.html?
Created attachment 140501 [details] powertop report on idle system
Created attachment 140511 [details] powertop report with a grep in a loop
I believe I have the same problem on Arch with a Haswell i7-4750HQ. A git bisect found that it was introduced with this patch:
# git bisect bad
cba64e6cbf312042e124dbf669e0a1e1dee72522 is the first bad commit
commit cba64e6cbf312042e124dbf669e0a1e1dee72522
Author: Dirk Brandewie <dirk.j.brandewie@intel.com>
Date: Thu May 29 09:32:22 2014 -0700
    intel_pstate: Remove C0 tracking
    commit adacdf3f2b8e65aa441613cf61c4f598e9042690 upstream.
    Commit fcb6a15c (intel_pstate: Take core C0 time into account for core busy calculation) introduced a regression referenced below. The issue with "lockup" after suspend that this commit was addressing is now dealt with in the suspend path.
    Fixes: fcb6a15c2e7e (intel_pstate: Take core C0 time into account for core busy calculation)
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=66581
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=75121
    Reported-by: Doug Smythies <dsmythies@telus.net>
    Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
:040000 040000 838e9246fe6c53d840ff84f405e68e3645bb0f7d 5e5fba17fb18725cab01762ac87ac9344f95aec7 M drivers
@ Henry: Yes, thanks. We know that it is the removal of the C0 stuff. The issue is, as mentioned in my comment #1 above, that the C0 stuff was forcing the CPU frequencies hard to the minimum value. Indeed, so hard that they often would not increase, even for heavy workloads. I guess people got used to seeing their CPU frequencies always at minimum (which was what was wrong) and are now worried that the reported frequencies are varying so much (which is O.K. and expected). Please do not mistake a higher reported CPU frequency from "grep MHz /proc/cpuinfo" for a problem, because that does not tell the whole story: it does not provide any information as to how much time the CPU was active in the C0 state. The CPU might well be asleep, and whatever number is in /proc/cpuinfo might be stale. (Such an assertion is supported by both the powertop and turbostat postings, where each CPU is spending, by far, the majority of its time in a state where the clock is not running.) However, as I also mentioned in comment #1 above, for the workload as I understand it, those listed frequencies (stale or not) do seem a little high. Are we worried about it? No. If it is desired to dig deeper, then we would need to acquire some trace data to analyze using "perf record". Let us know if you want to do that.
I added the field Avg_MHz to turbostat output because many users assumed that idle time was included in the calculation of frequency. I renamed the previous GHz column to Bzy_MHz for those who want to know what the frequency is when actually running -- something that often matches a selected P-state, for example. The difference is dramatic in cases like this, when the system is, say, 0.03% busy (99.97% idle). I don't see any issues with the frequency, Avg_MHz, or Bzy_MHz, as reported by turbostat in this bug report. If there are some, please point them out to me. So it seems that the issue is with what users are seeing in /proc/cpuinfo, and that it isn't what they expect. Note that turbostat in this case is run over 60 seconds, so the denominator in the cycles/time calculation is large. If you run it with a parameter such that it returns in a time much shorter than 1 second, that math becomes less reliable. That is because the counter collection is not atomic and the denominator is small, which will magnify any jitter in the calculation of the cycles elapsed -- which may itself be very small... So there are two questions. First, should intel_pstate include idle time or not in the frequency that it presents in /proc/cpuinfo? I think the principle of least surprise leads to the answer "no", since people are accustomed to seeing "most recently requested p-state" here, regardless of whether the CPU is busy or not. Second, what is the minimum duration_us used in the calculation of frequency -- is the math not working, or is this a symptom of unexpected frequency selections by intel_pstate? If the math is working, then the question is whether the states selected are wise choices for performance and power -- neither of which has yet been mentioned in this bug report.
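To make the difference between the two columns concrete, here is a small arithmetic sketch. It assumes the simplified relationship Avg_MHz = Bzy_MHz x (fraction of time busy) and uses made-up figures: a 1600 MHz busy frequency together with the 0.03% busy figure mentioned above. It is not turbostat's actual code. The second function illustrates the jitter point: a fixed timing error, here a hypothetical 50 microseconds, is a far larger share of a short interval than of a 60 second one.

```python
def avg_mhz(bzy_mhz, busy_percent):
    """Average frequency over the whole interval, idle time included."""
    return bzy_mhz * busy_percent / 100.0


def relative_error(jitter_s, interval_s):
    """Share of the measurement interval taken up by timing jitter."""
    return jitter_s / interval_s


# A core that runs at 1600 MHz while busy, but is busy only 0.03% of the time.
print(avg_mhz(1600.0, 0.03))        # about 0.48 MHz

# The same (assumed) 50 us of jitter over a 60 s run and over a 10 ms run.
print(relative_error(50e-6, 60.0))  # negligible
print(relative_error(50e-6, 0.01))  # several orders of magnitude larger
```

Seen this way, a high Bzy_MHz alongside a tiny Avg_MHz is not a contradiction: the core clocks up briefly and then spends almost all of the interval idle.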
(In reply to Len Brown from comment #15)
> Second, what is the minimum duration_us used in the
> calculation of frequency -- is the math not working, or
> is this a symptom of un-expected frequency selections
> by intel_pstate? If the math is working, then the question
> is if the states selected are wise choices for
> performance and power -- neither of which have yet to
> be mentioned in this bug report.
Thanks, Len. This question is the first one that should be answered, before any symptoms are reported, because if the math does not work, how can the result ever be right? From my understanding of the current situation, the math is totally wrong according to the following comments:
http://marc.info/?l=linux-pm&m=139962897623086&w=2
http://marc.info/?l=linux-pm&m=140126641926395&w=2
http://marc.info/?l=linux-pm&m=140141648726863&w=2
I would really like someone to address these directly...
Thanks, Yuyang
I am noticing problems with P-states on my i7-4910MQ 2.9 GHz (turbo 3.9 GHz). This is a laptop, a Dell Precision M6800, with the latest BIOS version A11 from 11/20/2014. Two things I notice differ, and one concerns overheating even with the latest thermald. I have C-states enabled in the BIOS. When using either intel_pstate or acpi-cpufreq, turbostat does show CPU cores entering the C7 state; however, PkgTmp never drops below Pkg%pc2 (there lists Pkg%pc7 level). My concern with intel_pstate is that the frequency idles much higher than the lowest frequency, the processor runs a lot hotter, and the fans are on constantly, even when just one process pegs 1 logical processor at 100%. When using acpi-cpufreq with the ondemand governor, the processor remains at a low frequency, CPU temperatures are cooler, and the laptop is quieter. Is this by design? I've had a number of thermal events with intel_pstate, hitting the overheat threshold. If it isn't, please tell me what info you want and I'll collect it accordingly. Thanks, Shawn
I observe from the posted .config file that the kernel is a 300 Hz kernel. It turns out that there seems to be a very interesting and dramatic manifestation of the tendency to drive up the target pstate for no good reason when the sample period is 13.3333 ms (as the 300 Hz kernel will default to) and desktop GUI stuff is running. Len: It is not an issue with too short a duration giving maths issues. Yuyang: Yes, some C0 weighting needs to be re-introduced. Please see also: Bug 93521 https://bugzilla.kernel.org/show_bug.cgi?id=93521
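The 13.3333 ms figure lines up with the tick length of a 300 Hz kernel. The arithmetic below is only a sketch, assuming the sample period ends up being a whole number of ticks; the 4-tick count is inferred from the number quoted above, not taken from the driver source.

```python
def tick_ms(hz):
    """Length of one kernel tick (jiffy) in milliseconds."""
    return 1000.0 / hz


def period_ms(ticks, hz):
    """Duration of a timer period expressed in whole ticks."""
    return ticks * tick_ms(hz)


print(round(tick_ms(300), 4))       # 3.3333
print(round(period_ms(4, 300), 4))  # 13.3333
```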
Please disregard my comments in this BZ. The issue was with the intel_iommu setting: with VT-d you must disable GFX with the IOMMU, or you won't get to the PC6 state at all.
Thanks, Shawn. Closing, as it sounds like the original issue was already resolved as well.