Bug 93521
Description
da_audiophile
2015-02-19 20:28:58 UTC
Actually, I need to change my original report: the driver appears to work when I am in a TTY without lxdm running at all. However, if I keep lxdm running and open a TTY, I do not see the frequency change much from the 4.40 GHz value. I have attached the i7z logs taken under X and after booting with lxdm.service disabled.
Under X = X-cpu_freq_log
Under TTY = tty-cpu_freq_log
Created attachment 167591 [details]
i7z log under X
Created attachment 167601 [details]
i7z log in a TTY
The attached PNG image was generated by booting into Xorg (shown in blue) or into a TTY (shown in green), running `i7z -w a`, and plotting the resulting frequencies. You can clearly see the bug I am reporting. The top plot is the distribution on the Haswell CPU. The mean frequency at idle under Xorg is 3,401 MHz, compared to 1,201 MHz in a text TTY without Xorg on the Haswell. I also included the same experiment on an Ivy Bridge CPU. Here the two mean values are the same.
Created attachment 167801 [details]
histograms
I just realized that the i7-4790K is technically a "Haswell Refresh", and after collecting the same data on an older "Haswell" CPU, this bug seems to affect only the "Haswell Refresh" CPUs: the i3-4130T shows nearly the same mean value at idle under either Xorg or a text TTY (mean Xorg = 1,089 MHz and mean TTY = 1,029 MHz).

Here is an independent report from another i7-4790K user confirming this behavior: https://bbs.archlinux.org/viewtopic.php?id=184817

How crazy is this: tick rates of either 250 Hz or 1000 Hz both give idle frequencies comparable to those seen on Ubuntu or Fedora. A setting of 300 Hz is to blame for the odd behavior. Nothing else.
Original Arch config (300 Hz): median value on idle in Xorg = 3,316 MHz
Changing only the tick rate from the original Arch config...
To 250 Hz tick rate: median value on idle in Xorg = 889 MHz
To 1000 Hz tick rate: median value on idle in Xorg = 1,012 MHz
What's going on with the 300 Hz value and this particular processor?

Created attachment 168311 [details]
excerpt from a post processed "perf record"
John (da_audiophile@yahoo.com) has discovered a very interesting use case: his distribution (Arch) uses a 300 Hertz kernel. Due to integer math, the default sample time of 10 milliseconds turns into 13.33333 ms for this case. For whatever reason, it turns out that this particular sample interval and the video frame rate of 59.95 Hertz interact with the Xorg desktop stuff in such a way as to dramatically increase the manifestation of the pre-existing condition of driving up the target pstate under very light loads. If the sample time is changed, the magnitude of the issue is greatly reduced.

Attached is a small excerpt from a "perf record" session John did on this otherwise "idle" computer. (Note: he had to apply the patch we have been using for many months so that our post-processing tools would work.)
Notice how the smallest perturbation grows such that the target pstate ends up at the turbo maximum, and the various CPUs basically chase each other around, all with the load being less than 1 percent.

There are a few issues with the intel_pstate driver. One is this tendency to drive up the target pstate for no good reason (in most manifestations the target pstate doesn't get driven up too far and also comes back down fairly quickly). Another is the tendency to drive down the target pstate when it should be higher, due to excessive deferral of the deferrable timers, thus incorrectly engaging the duration reduction code.

Before Dirk moved on, the little working group was working towards eliminating the duration method and re-introducing some C0 time inclusion to the calculation of the target pstate. Indeed, we had some very nice CPU frequency versus load response curves, with what we considered to be excellent tradeoffs between performance, energy savings, and flexibility. While the group agreed on the big picture, and its urgency, we disagreed on some of the exact implementation details.

For my part of it, I'll try to revive and test my patch set, but it'll probably take a couple of weeks. In the meantime, the proposed workaround is to set the sample rate to 19 ms for these 300 Hz kernels, which will result in an actual sample rate of 20 ms. (Note that the working group was tending towards a longer default sample rate anyway.)

@Doug, I have the same problem on my ArchLinux/64, i7-4790K, Asus H97-Plus. I tried your suggestion and tweaked my sample rate with the command below; note that my kernel is the stock Arch kernel configured with a tick of 300 Hz:
echo 19 | sudo tee /sys/kernel/debug/pstate_snb/sample_rate_ms
Looking at the C-states with i7z-git, I can see that the CPU stays in C7 most of the time, but the frequency is modulated in the upper 4-4.4 GHz range if Turbo Mode is enabled, while it just sits at 3.9/4 GHz without turbo. I can see some spikes toward 2.9 GHz, *for one single core at a time*, but that's it; it does not really go down for more than one second. I also tried different settings for the sample rate, such as 5, 20, 21, 22, 25, 50 and 100, but the result stays the same.

@manuel: Very interesting. Your computer is otherwise "idle", right? What kernel version are you using? Are you adept at compiling the kernel? It would be great to acquire some "perf record" data, but to do so requires the addition of a yet-to-be-released patch.

I'm using the latest stock kernel, 3.18.6-1-ARCH #1 SMP PREEMPT x86_64 GNU/Linux. It was completely idle: at some point I also killed Xorg, but it didn't make any difference; I'll retry it this evening and see if I notice any major changes. I may be able to recompile it and give it a try, but I have a lot of work to finish and I'm not sure I'll have the time. Please let me know what to tweak in case I can do it.

"at some point i also killed Xorg but didn't make any difference"
Then your use case is different than John's use case. It would be better to use kernel 3.19 or 4.0RC1 for tests. However, I do have the older version of the patches for kernel 3.18. With your computer otherwise idle, could you post the output from:
sudo turbostat sleep 300

Ok, I was just idling in the GNOME desktop, and even though some minor activity is there, the C7 state was clearly the winner.
Also, I'm running an NVIDIA GTX 970 on 346.35, so GFX wattage will be null:
Core CPU Avg_MHz %Busy Bzy_MHz TSC_MHz SMI CPU%c1 CPU%c3 CPU%c6 CPU%c7 CoreTmp PkgTmp Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 PkgWatt CorWatt GFXWatt
- - 38 0.96 3959 4000 0 3.01 0.22 0.08 95.73 33 35 0.00 0.00 0.00 0.00 12.54 1.06 0.00
0 0 16 0.40 3997 4000 0 4.45 0.38 0.04 94.72 31 35 0.00 0.00 0.00 0.00 12.54 1.06 0.00
0 4 74 1.91 3859 4000 0 2.95
1 1 18 0.46 3994 4000 0 3.06 0.14 0.05 96.29 32
1 5 57 1.42 3997 4000 0 2.10
2 2 19 0.48 3992 4000 0 2.13 0.16 0.03 97.20 32
2 6 27 0.68 3993 4000 0 1.93
3 3 29 0.73 3988 4000 0 4.15 0.20 0.20 94.72 33
3 7 63 1.57 3989 4000 0 3.31
300.000566 sec

The previous turbostat results were taken with a sample_rate_ms of 10 while idling. The next set of results was taken with a sample_rate_ms of 19 and light activity, such as browsing (Chromium) and checking mail (Thunderbird):
Core CPU Avg_MHz %Busy Bzy_MHz TSC_MHz SMI CPU%c1 CPU%c3 CPU%c6 CPU%c7 CoreTmp PkgTmp Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 PkgWatt CorWatt GFXWatt
- - 230 5.75 3998 4000 0 18.57 4.50 1.71 69.47 34 37 0.00 0.00 0.00 0.00 18.72 7.02 0.00
0 0 294 7.37 3997 4000 0 16.52 5.24 2.01 68.86 34 37 0.00 0.00 0.00 0.00 18.72 7.02 0.00
0 4 208 5.21 3999 4000 0 18.68
1 1 287 7.19 3997 4000 0 17.58 4.20 1.59 69.44 34
1 5 200 4.99 3998 4000 0 19.78
2 2 263 6.57 3997 4000 0 18.83 4.24 1.68 68.68 33
2 6 190 4.76 3998 4000 0 20.64
3 3 249 6.22 3997 4000 0 17.00 4.33 1.54 70.91 34
3 7 148 3.70 3999 4000 0 19.52
300.000576 sec

@manuel: Your CPU has a lot going on, even when the system is "idle". Could you do a test with Xorg turned off and the system otherwise "idle"?

@manuel: I e-mailed you the patches, both for before 3.18RC1 and for 3.19RC1 onwards. Your "idle" load is much higher than John's; however, your CPU frequencies do seem high for the conditions. Was your computer ever put into suspend before these tests? If yes, that is a different bug report. Your intel_pstate governor is in powersave mode, correct?

Thank you for the patches. Yes, I set the governor to powersave:
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
intel_pstate
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
powersave
Created attachment 168431 [details]
Performance data sampled from patched Arch kernel 3.18.6-1
Sampled perf data from the ArchLinux/64 stock kernel 3.18.6-1 patched with "Add-tsc" and "Move-tracepoint".
@doug, I recompiled the kernel after applying both the "Add-tsc" and "Move-tracepoint" patches and attached the sampled perf data to this bug report. I noticed I missed some questions before:
> Was your computer ever put into suspend before these tests?
Nope, never suspended before the tests.
> Your intel_pstate governor is in powersave mode correct?
Yes, always set to powersave.

@manuel: I am having trouble with the data. I get:
magic/endian check failed
incompatible file format
I will e-mail you the post processing tools, and maybe you could try.
@John: did you have to compile and install new versions of perf and trace when you did this?

Not sure why, but even after compiling the perf tools I get the "incompatible file format" error too.

Ouch! Got it: I should use "tar Jxvf perf.data.xz" to decompress that, not the GNOME archive manager o_O
I'm now attaching the "manuel01.tar.gz" results/ folder.
Created attachment 168461 [details]
Post-processed perf data manuel01
O.K. great, thanks. Turbo seems to be turned off, correct? There seems to be one bad data point, but it doesn't matter. It will be hours before I can comment in more detail, but at a quick glance, your computer has frequent bursts of very heavy CPU load, and therefore I think it is behaving as expected. I'll comment more later today.

Nice, I'll keep an eye out for updates here.
> Turbo seems to be turned off, correct?
Yes, I'm limiting the ratio to 40x.
> your computer has frequent bursts of very heavy CPU loads, and therefore I
> think it is behaving as expected
This is odd, at least. I stopped GDM and probably had mysqld/apache running, but with my old i7-920 (cpupower/ondemand), logged into the GNOME desktop, the frequency scaling would burst to 2.6 GHz when needed and just stayed at 800/1600 most of the time.

> did you have to compile and install new versions of perf and trace when you
> did this?
Nope, but I believe you already figured out the compression thing.
As mentioned earlier, the Manuel data contains jumps to the maximum CPU frequency, for good reason. However, the frequency lingers at a high level for much, much longer than it should. In addition to the issues we already know about, it's as though CPUs that have gone into the C7 state are still casting their target p_state vote into the processor PLL decision stuff, whereas I thought those CPUs were supposed to lose their vote. To gain additional insight, I need to write a program to re-process the normalize.csv file, adding some additional information and calculations. I also need to read up some more about when the CPUs' votes count and when they don't, as I think I am just going by what Dirk (the previous intel_pstate maintainer) always told me.

@John: @manuel: Would both of you do the following experiment, the purpose of which is to determine if the issues are C7 state specific? Note that my i7-2600K doesn't go into C7 anyway, but I did test syntax and such by limiting it to C3.

Eliminate the system's ability to go into C7:
Edit /etc/default/grub and change the GRUB_CMDLINE_LINUX_DEFAULT line:
GRUB_CMDLINE_LINUX_DEFAULT="intel_idle.max_cstate=6"
If there is already other stuff on the line, the new parameter becomes additional. Example:
GRUB_CMDLINE_LINUX_DEFAULT="ipv6.disable=1 intel_pstate=enable intel_idle.max_cstate=3 crashkernel=384M-:128M"
Update: "sudo update-grub"
After a re-boot, observe the limit is hit (3 in my test case):
doug@s15:~/temp$ dmesg | grep -i " max_cstate "
[ 1.004557] intel_idle: max_cstate 3 reached
doug@s15:~/temp$ grep -i " max_cstate " /var/log/kern.log
Mar 1 09:17:35 s15 kernel: [ 1.004557] intel_idle: max_cstate 3 reached
Observe with turbostat:
doug@s15:~/temp$ sudo ./turbostat sleep 20
Core CPU Avg_MHz %Busy Bzy_MHz TSC_MHz SMI CPU%c1 CPU%c3 CPU%c6 CPU%c7 CoreTmp PkgTmp Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 PkgWatt CorWatt GFXWatt
- - 61 3.80 1605 3411 0 3.80 92.39 0.00 0.00 29 29 0.18 69.39 0.00 0.00 6.28 2.63 0.23
0 0 140 8.69 1605 3411 0 3.81 87.50 0.00 0.00 29 29 0.18 69.39 0.00 0.00 6.28 2.63 0.23
0 4 61 3.79 1605 3411 0 8.72
1 1 1 0.03 1606 3411 0 0.03 99.93 0.00 0.00 29
1 5 0 0.00 1608 3411 0 0.06
2 2 129 8.05 1605 3411 0 6.97 84.98 0.00 0.00 29
2 6 111 6.94 1605 3411 0 8.08
3 3 29 1.80 1605 3411 0 1.04 97.17 0.00 0.00 29
3 7 18 1.10 1605 3411 0 1.73
20.001256 sec
On an otherwise "idle" system, do the "perf record" as previously:
sudo perf record -a --event=power:pstate_sample sleep 600
and post back here the resulting perf.data file. There is no need for you to post process it yourself, other than for general interest, as I am now using a modified normalize.py script.

OK, dmesg confirms, and Arch doesn't log to /var/log/kern.log:
% dmesg -t | grep -i " max_cstate "
intel_idle: max_cstate 6 reached
From grub.cfg:
linux /boot/vmlinuz-linux root=/dev/sdb3 rw intel_idle.max_cstate=6
Problem is, both turbostat and i7z are showing C7 most of the time. I emailed you the perf.data anyway. Note the kernel I am using is the default Arch one with a 300 Hz tick rate; the only exception is that your patch has been applied.

> it's as though CPUs that have gone into the C7 state are still
> casting their target p_state vote into the processor PLL
> decision stuff, whereas I thought those CPU were supposed
> to lose their vote.
I should have said "gone out of the C0 state" rather than "gone into the C7 state", as that is all I can deduce so far, and even that isn't conclusive yet.
I have now looked sample by sample over a lot of data. Manuel's data shows this issue more clearly than John's data, but John's does have it.
As a sanity test I got some data from my i7-2600K, and it is O.K., meaning that CPUs that are not in the C0 state do lose their ability to influence the clock frequency that the PLL will target (at least as best as I can determine).
I am unable to process John's data from earlier today, as it appears to be from an unpatched kernel instead of a patched kernel.
Sorry about that, Doug. I am recompiling now and will run this test ASAP. I will attempt to limit to C6, and if that fails as I reported, I will try C3.
Created attachment 169101 [details]
John i7z example
For the John i7z data, the attachment details an example where the CPU frequency is being held too high by a CPU that is not in the C0 state.
Notice that CPU 5 has a target pstate of 41, but it is not in C0, and when the intel_pstate driver finally does run on CPU 5 it has been 3.290 seconds since the last time the driver was run. Meanwhile, CPU 1 has done several passes through the driver.
Starting with line 1430, observe that CPU 1 exits that pass with a target pstate of 8, and no other CPU has an elevated target pstate other than CPU 5, which I think is not in the C0 state.
Now, observe the frequency calculation for CPU 1 for its next pass (line 1431): it is 4.1 GHz. Why? It should be around 800 MHz. Due to the known issue where C0 weighting is needed, a target pstate of 44 is calculated for a very low load situation.
Jump ahead to line 1434, where CPU 1 sorts itself out again, exiting with a target pstate of 8. Then, 16.68 milliseconds (5 jiffies) later, on line 1436, the frequency is calculated as 4.10 GHz, whereas, again, it should have been 800 MHz, because CPU 5 should not be casting a vote in the PLL decision making.
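The jiffy figures quoted in this thread can be sanity-checked with a few lines of Python. This is only an illustrative sketch: `jiffies_to_ms` is a made-up helper, and the 4 and 6 jiffy counts are an assumption inferred from the 13.33 ms and 20 ms sample times mentioned earlier, not values read out of the kernel's conversion code.

```python
def jiffies_to_ms(jiffies: int, hz: int) -> float:
    """Duration of `jiffies` timer ticks, in milliseconds, at tick rate `hz`."""
    return jiffies * 1000.0 / hz


HZ = 300  # Arch kernel tick rate discussed in this report

# 4 and 6 jiffies are assumed counts for the 10 ms and 19 ms settings;
# 5 jiffies is the interval quoted for lines 1434 to 1436.
for n in (4, 5, 6):
    print(f"{n} jiffies at {HZ} Hz = {jiffies_to_ms(n, HZ):.2f} ms")
```

Five jiffies come out at about 16.67 ms, which is in line with the 16.68 ms measured between the two passes of CPU 1.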
Created attachment 169111 [details]
manuel data - example of CPU frequency influence
This example is from the manuel data.
Start with line 1264, where CPU 7 does a pass through the intel_pstate driver, having not done one for 2.63 seconds. It drops its target pstate from 40 to 21. Now only CPU 5 has a target pstate of 40, and the next highest target is 21. CPU 5 is likely not in C0; it eventually goes into C0 and does a pass through the driver on line 1267, after 2.79 seconds. Meanwhile CPU 0 has done a couple of passes through the driver, and on line 1268 calculates a frequency of 3.58 GHz. In this case, and mainly because I made a mistake in which screen to take a shot of, we cannot deduce anything. It is possible that CPU 5 was in C0 at 4 GHz long enough to affect CPU 0.
However, observe CPU 6 in the lines above. It calculates 4 GHz, while it appears that both CPUs 5 and 7 are not in C0 and should not be influencing the PLL decisions.
Created attachment 169121 [details]
john 300 Hz kernel example
This example is from John's 300 Hertz kernel capture of several days ago.
In the area of concern, CPU 3 has the highest target pstate of 25. When CPU 3 does finally pass through the driver again, it has been 1.34 seconds since its last trip through. Note that there have been 1.25 million clock cycles in that time, so this is not a great example. Why not? Because we don't really know if CPU 3 woke up and went into C0 several times in those 1.25 million clock cycles. All it has to do to avoid a pass through the intel_pstate driver is to not be in the C0 state when the jiffy boundary occurs. Anyway...
Notice on lines 2128, 2129 and 2131 CPU 0 is at 2.5 GHz, as influenced by CPU 3.
Created attachment 169131 [details]
Doug 250 Hz kernel example
From Doug's 250 Hz kernel with two tasks running: one at a 7 Hertz work / sleep frequency and a load of 17.5% (25 milliseconds per work period, 7 times per second); and one at a 3 Hertz work / sleep frequency and a load of 7.5% (25 milliseconds per work period, 3 times per second).
Notice how the frequency of CPU 1 drops when CPU 3 and other CPUs are not in C0, as expected, even though some of their target pstates are higher than CPU 1's.
Created attachment 169151 [details]
From the John C3 data sample
This one shows the CPU 0 frequency being influenced by CPUs 4 and 3 when they appear to not be in C0 and are in a long gap between runs of the driver (3.3 and 4.0 seconds).
Notice in line 1750 how everything is finally O.K.
Thanks for all the work you are putting into this, Doug, and for publishing your observations with data and formulating a hypothesis.
Kristen - What do you think about these recent data?

Sorry for not having time to follow this much better, but thanks for all the hard work you are putting into solving this problem.

Where are we and how do we move forward?
. I believe the 300 Hz versus 250 Hz versus 1000 Hz difference to be a red herring. It just so happens that the 300 Hz kernel has a higher probability of interacting with the Xorg stuff in such a way as to increase the manifestation of the issues.
. The i7z program itself exacerbates the issues.
. Unduly increased CPU frequency is largely due to the known issue where C0 weight needs to be re-introduced into the driver.
. On the particular processor in question (i7-4790K) the issue is exacerbated because, apparently, CPUs not in the C0 state do not lose their vote contribution to the PLL decision making. While its magnitude would be greatly reduced, unwarranted higher CPU frequencies from this issue would still occur even after adding the C0 weighting inclusion fix to the driver. (Think of CPU N going out of C0 for a period of seconds (never seen more than 4.00 seconds on any processor) while at a target pstate of 44.)
. The only thing within the processor I can think of is the IA32_ENERGY_PERF_BIAS register (section 14.3.2 of Vol 3B of the IA-32 SDM, http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf), which doesn't mean there are not other possibilities. I don't know how to investigate this further.
. Suggested experiment: Purpose: To give more definitive, easier to analyze data. Disable all GUI stuff and non-essential processes, and do a more controlled load test using the consume program (will be attached later).
. Suggested experiment: Purpose: To determine if the PLL vote issue is somehow due to the Arch kernel configuration. John and / or Manuel should compile a kernel using an Ubuntu config file. I can attach my kernel 4.0RC2 config file (basically Ubuntu), if John is willing to compile a 4.0RC2 test kernel with it. I no longer have my 3.19 config file, but I could come up with one.
Created attachment 169251 [details]
Program to apply load at specified work / sleep rate
Please use the attached program to do an experiment.
Don't worry about the calibration, as we are not using fixed work packets mode.
Disable any Xorg or other GUI stuff.
Disable anything else you can think of that isn't needed.
The goal is to have a system as "idle" as possible.
Run:
./consume 17.5 7.0 1000 &
./consume 7.5 3.0 1000 &
and run perf record as normal (requires the patched kernel):
sudo perf record -a --event=power:pstate_sample sleep 600
Send me the resulting perf.data file.
This data should be easier to analyze and more definitive.
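For reference, both consume loads above work out to the same 25 millisecond work chunk per period. A minimal sketch, assuming the first argument is the load in percent and the second is the work / sleep frequency in Hertz, as in the 7 Hz and 3 Hz example described earlier. `work_ms_per_period` is a hypothetical helper, and the third argument to consume is not modeled here.

```python
def work_ms_per_period(load_percent: float, rate_hz: float) -> float:
    """Milliseconds of work in each work / sleep period for a given average load."""
    period_ms = 1000.0 / rate_hz
    return period_ms * load_percent / 100.0


# The two background loads used in the experiment: (load %, work/sleep Hz).
for load, rate in ((17.5, 7.0), (7.5, 3.0)):
    print(f"{load}% at {rate} Hz -> {work_ms_per_period(load, rate):.1f} ms per period")
```

Because the work chunks are identical and only the repetition rate differs, the two tasks are easy to tell apart in the perf data.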
Created attachment 169381 [details]
John consume data sample
Even though I screwed up the operating conditions for John's test using the consume program, I was able to extract a clear example that shows the influence of a CPU not in the C0 state on CPUs that are.
Notice line 941 where CPU 2 exits the driver with a target pstate of 25.
Now to fit everything in one screen shot, I deleted some lines.
Notice the next pass of CPU 2 through the driver is 4 seconds later at line 960.
Now observe CPU 0, highlighted in yellow. In line 951 the frequency is 2.5 GHz, but CPU 2 should not be influencing it. At worst, there might be some influence from CPU 1, but it has finished its work chunk and gone out of C0 also.
Now observe lines 954 and 955, not highlighted. CPU 0 has finished its work chunk and will have gone out of C0 by then. CPU 1 is still influenced by the CPU 0 target pstate. It shouldn't be.
O.K. At this point, I think I have enough proof about this part.
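The hypothesis can be written down as a toy model. This is a sketch under stated assumptions, not a description of the actual hardware: it assumes the shared clock simply follows the highest target pstate among the CPUs that currently have a vote, and `package_mhz` is a hypothetical helper. The numbers are the ones from the John i7z example (CPU 1 with a target of 8, CPU 5 out of C0 with a target of 41).

```python
PSTATE_STEP_MHZ = 100  # turbostat in this thread reports pstate * 100 = MHz


def package_mhz(target_pstate, in_c0, idle_cpus_keep_vote):
    """Toy model: clock follows the highest target pstate among voting CPUs.

    Assumes at least one CPU is in C0 (or that idle CPUs keep their vote).
    """
    voters = [
        pstate
        for cpu, pstate in target_pstate.items()
        if in_c0[cpu] or idle_cpus_keep_vote
    ]
    return max(voters) * PSTATE_STEP_MHZ


targets = {1: 8, 5: 41}   # target pstate per CPU
c0 = {1: True, 5: False}  # CPU 5 is not in the C0 state

print(package_mhz(targets, c0, idle_cpus_keep_vote=False))  # idle CPUs lose their vote
print(package_mhz(targets, c0, idle_cpus_keep_vote=True))   # idle CPUs keep their vote
```

With idle CPUs losing their vote the model gives 800 MHz, which is what was expected for CPU 1; with idle CPUs keeping their vote it gives 4100 MHz, which is what the data shows.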
Created attachment 169441 [details]
Suggested kernel config file
The suggested kernel config file for a 4.0RC2 kernel.
For the experiment to compare Arch kernel config with Ubuntu config with regards to this issue.
From my comment 42 above:
. The only thing within the processor I can think of is the IA32_ENERGY_PERF_BIAS register (section 14.3.2 of Vol 3B of the IA-32 SDM, http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf), which doesn't mean there are not other possibilities. I don't know how to investigate this further.

It turns out that turbostat reads the register and lists its contents in verbose mode, i.e.
sudo turbostat -v sleep 1
gives (edited):
cpu0: MSR_IA32_ENERGY_PERF_BIAS: 0x00000006 (balanced)
John and / or manuel: what do you get?

% sudo turbostat -v sleep 1
turbostat v3.7 Feb 6, 2014 - Len Brown <lenb@kernel.org>
CPUID(0): GenuineIntel 13 CPUID levels; family:model:stepping 0x6:3c:3 (6:60:3)
CPUID(6): APERF, DTS, PTM
RAPL: 2979 sec. Joule Counter Range, at 88 Watts
cpu0: MSR_NHM_PLATFORM_INFO: 0x80838f3012800
8 * 100 = 800 MHz max efficiency
40 * 100 = 4000 MHz TSC frequency
cpu0: MSR_IA32_POWER_CTL: 0x0000005d (C1E auto-promotion: DISabled)
cpu0: MSR_NHM_SNB_PKG_CST_CFG_CTL: 0x1e008400 (UNdemote-C3, UNdemote-C1, demote-C3, demote-C1, locked: pkg-cstate-limit=0: pc0)
cpu0: MSR_NHM_TURBO_RATIO_LIMIT: 0x2c2c2c2c
44 * 100 = 4400 MHz max turbo 4 active cores
44 * 100 = 4400 MHz max turbo 3 active cores
44 * 100 = 4400 MHz max turbo 2 active cores
44 * 100 = 4400 MHz max turbo 1 active cores
cpu0: MSR_RAPL_POWER_UNIT: 0x000a0e03 (0.125000 Watts, 0.000061 Joules, 0.000977 sec.)
cpu0: MSR_PKG_POWER_INFO: 0x000002c0 (88 W TDP, RAPL 0 - 0 W, 0.000000 sec.)
cpu0: MSR_PKG_POWER_LIMIT: 0x42ffff001dffff (UNlocked)
cpu0: PKG Limit #1: ENabled (4095.875000 Watts, 16.000000 sec, clamp ENabled)
cpu0: PKG Limit #2: ENabled (4095.875000 Watts, 0.002441* sec, clamp DISabled)
cpu0: MSR_PP0_POLICY: 0
cpu0: MSR_PP0_POWER_LIMIT: 0x00000000 (UNlocked)
cpu0: Cores Limit: DISabled (0.000000 Watts, 0.000977 sec, clamp DISabled)
cpu0: MSR_PP1_POLICY: 0
cpu0: MSR_PP1_POWER_LIMIT: 0x00000000 (UNlocked)
cpu0: GFX Limit: DISabled (0.000000 Watts, 0.000977 sec, clamp DISabled)
cpu0: MSR_IA32_TEMPERATURE_TARGET: 0x00641400 (100 C)
cpu0: MSR_IA32_PACKAGE_THERM_STATUS: 0x88430800 (33 C)
cpu0: MSR_IA32_THERM_STATUS: 0x884a0800 (26 C +/- 1)
cpu1: MSR_IA32_THERM_STATUS: 0x88490800 (27 C +/- 1)
cpu2: MSR_IA32_THERM_STATUS: 0x88480800 (28 C +/- 1)
cpu3: MSR_IA32_THERM_STATUS: 0x884a0800 (26 C +/- 1)
Core CPU Avg_MHz %Busy Bzy_MHz TSC_MHz SMI CPU%c1 CPU%c3 CPU%c6 CPU%c7 CoreTmp PkgTmp Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 PkgWatt CorWatt GFXWatt
- - 6 0.14 4020 4000 0 1.03 0.14 0.08 98.61 27 31 0.00 0.00 0.00 0.00 5.05 0.23 0.03
0 0 16 0.39 4043 4000 0 1.11 0.00 0.00 98.50 27 31 0.00 0.00 0.00 0.00 5.05 0.23 0.03
0 4 4 0.10 3862 4000 0 1.40
1 1 10 0.25 4027 4000 0 1.62 0.00 0.00 98.13 27
1 5 2 0.06 4354 4000 0 1.81
2 2 6 0.17 3892 4000 0 0.16 0.43 0.00 99.25 26
2 6 2 0.04 4345 4000 0 0.29
3 3 5 0.12 3968 4000 0 0.86 0.12 0.32 98.57 24
3 7 1 0.02 4078 4000 0 0.96
1.000814 sec

So, I guess John doesn't have the register. Not all processors have it.

$ sudo turbostat -v sleep 1
[sudo] password for manuel:
turbostat v3.7 Feb 6, 2014 - Len Brown <lenb@kernel.org>
CPUID(0): GenuineIntel 13 CPUID levels; family:model:stepping 0x6:3c:3 (6:60:3)
CPUID(6): APERF, DTS, PTM, EPB
RAPL: 2979 sec. Joule Counter Range, at 88 Watts
cpu0: MSR_NHM_PLATFORM_INFO: 0x80838f3012800
8 * 100 = 800 MHz max efficiency
40 * 100 = 4000 MHz TSC frequency
cpu0: MSR_IA32_POWER_CTL: 0x0104005d (C1E auto-promotion: DISabled)
cpu0: MSR_NHM_SNB_PKG_CST_CFG_CTL: 0x1e000400 (UNdemote-C3, UNdemote-C1, demote-C3, demote-C1, UNlocked: pkg-cstate-limit=0: pc0)
cpu0: MSR_NHM_TURBO_RATIO_LIMIT: 0x2a2b2c2c
42 * 100 = 4200 MHz max turbo 4 active cores
43 * 100 = 4300 MHz max turbo 3 active cores
44 * 100 = 4400 MHz max turbo 2 active cores
44 * 100 = 4400 MHz max turbo 1 active cores
cpu0: MSR_IA32_ENERGY_PERF_BIAS: 0x00000006 (balanced)
cpu0: MSR_RAPL_POWER_UNIT: 0x000a0e03 (0.125000 Watts, 0.000061 Joules, 0.000977 sec.)
cpu0: MSR_PKG_POWER_INFO: 0x000002c0 (88 W TDP, RAPL 0 - 0 W, 0.000000 sec.)
cpu0: MSR_PKG_POWER_LIMIT: 0x428370001a82c0 (UNlocked)
cpu0: PKG Limit #1: ENabled (88.000000 Watts, 8.000000 sec, clamp DISabled)
cpu0: PKG Limit #2: ENabled (110.000000 Watts, 0.002441* sec, clamp DISabled)
cpu0: MSR_PP0_POLICY: 0
cpu0: MSR_PP0_POWER_LIMIT: 0x00000000 (UNlocked)
cpu0: Cores Limit: DISabled (0.000000 Watts, 0.000977 sec, clamp DISabled)
cpu0: MSR_PP1_POLICY: 0
cpu0: MSR_PP1_POWER_LIMIT: 0x00000000 (UNlocked)
cpu0: GFX Limit: DISabled (0.000000 Watts, 0.000977 sec, clamp DISabled)
cpu0: MSR_IA32_TEMPERATURE_TARGET: 0x00641400 (100 C)
cpu0: MSR_IA32_PACKAGE_THERM_STATUS: 0x88440800 (32 C)
cpu0: MSR_IA32_THERM_STATUS: 0x88490000 (27 C +/- 1)
cpu1: MSR_IA32_THERM_STATUS: 0x884b0000 (25 C +/- 1)
cpu2: MSR_IA32_THERM_STATUS: 0x88480000 (28 C +/- 1)
cpu3: MSR_IA32_THERM_STATUS: 0x88470000 (29 C +/- 1)
Core CPU Avg_MHz %Busy Bzy_MHz TSC_MHz SMI CPU%c1 CPU%c3 CPU%c6 CPU%c7 CoreTmp PkgTmp Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 PkgWatt CorWatt GFXWatt
- - 38 0.94 4095 3991 0 8.49 0.24 0.23 90.10 28 31 0.00 0.00 0.00 0.00 8.24 1.38 0.00
0 0 34 0.85 3974 3990 0 5.12 0.11 0.28 93.64 26 31 0.00 0.00 0.00 0.00 8.24 1.38 0.00
0 4 2 0.05 3996 3990 0 5.92
1 1 98 2.31 4268 3990 0 2.65 0.04 0.07 94.93 27
1 5 1 0.03 4002 3991 0 4.93
2 2 54 1.33 4050 3991 0 1.46 0.51 0.08 96.62 28
2 6 1 0.01 4019 3991 0 2.79
3 3 116 2.89 4018 3991 0 21.08 0.29 0.51 75.23 28
3 7 1 0.03 3946 3991 0 23.94
1.000750 sec

Both John and Manuel have the exact same processor and are using the exact same version of turbostat, yet only one has the readout for the MSR_IA32_ENERGY_PERF_BIAS register. Weird.
The only other thing I notice is this:
"pkg-cstate-limit=0: pc0"
and the related 0% entries in the package C levels, whereas on my system it is:
"pkg-cstate-limit=3: pc6"
and the related 97.8% entry in the pc6 column. I don't know if this is significant or not.
Created attachment 169871 [details]
Demonstrates failure to raise target pstate under heavy load. 1 of 2.
While not directly related to this bug report, this attachment and the next one demonstrate what is called "the excessive duration effect". The load on CPU 5 is just under 90%, yet the target pstate is not raised, because it just so happens that the CPU is generally not in the C0 state on the jiffy boundary and so the interval (duration) between passes through the intel_pstate driver is very long. The result is that the duration code is triggered incorrectly, forcing the target pstate downwards.
This demonstration was rigged for dramatic effect. Under real-life conditions, and very dependent on the use case, the excessive duration effect occurs anywhere from a few to a few thousand times per hour.
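The effect can be illustrated with a small sketch. It assumes the duration code scales the busy figure by sample time divided by actual duration whenever the duration exceeds a few sample times; the threshold of 3 and the helper `scaled_busy` are assumptions made for illustration, not a copy of the driver code.

```python
def scaled_busy(core_busy: float, duration_ms: float,
                sample_ms: float = 10.0, threshold: int = 3) -> float:
    """Assumed duration scaling: shrink busy when the sample interval was long."""
    if duration_ms > threshold * sample_ms:
        # A long gap is treated as if the CPU had been idle for most of it.
        return core_busy * sample_ms / duration_ms
    return core_busy


# A CPU that is about 90% busy, sampled on time versus sampled after 300 ms
# because it was never in C0 on a jiffy boundary.
print(scaled_busy(90.0, duration_ms=10.0))
print(scaled_busy(90.0, duration_ms=300.0))
```

Under these assumptions the on-time sample keeps its busy value of 90, while the late sample is scaled down to 3, so the driver sees a nearly idle CPU and lowers the target pstate even though the real load is heavy.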
Created attachment 169881 [details]
Demonstrates failure to raise target pstate under heavy load. 2 of 2.
Attachment 2 of 2. See previous comment.

(In reply to Doug Smythies from comment #42)
> Where are we and how do we move forward?
> . Suggested experiment: Purpose: To determine if the PLL vote issue is
> somehow due to Arch kernel configuration: John and / or Manuel should
> compile a kernel but using an ubuntu config file. I can attach my kernel
> 4.0RC2 config file (basically Ubuntu), if John is willing to compile a
> 4.0RC2 test kernel with it. I do not still have my 3.19 config file, but I
> could come up with one.
John created a kernel using an Ubuntu config file and supplied the perf record data. The results were the same. Therefore the conclusion is that this is not a kernel configuration issue. I have not attached a screen shot of my spreadsheet, but I could if needed.
I am moving on to reviving the test intel_pstate driver code from June / July. It will take a while.

By the way, on attachment 169881 [details] (2 of 2 above), observe lines 8377 through 8380, where the load is 88.6 percent but the target pstate remains locked at 26, even though scaled busy is 99. The target pstate really should increase under the load. Note that on those lines the CPU is in the C0 state on jiffy boundaries. This is another known issue with the intel_pstate driver: the incredibly finicky tradeoff between integer math and gain factors, and underdamped and overdamped servo system response. The test code we were using in June / July dealt with this scenario, but some work does remain to be done.
With what we have discovered herein, where some processors never lose their vote into the PLL decision as to what CPU frequency to generate, and this known potential lock up scenario, this might be a contributing factor with bug 90421.

O.K., it is taking me longer than originally expected to revive my code from June / July. However, as I go, I am re-developing the algorithms and fixing some things that we wanted to change but were too busy for at the time.
Meanwhile, the very important issue uncovered herein is the issue where it seems some processors never lose their vote into the PLL decision as to what CPU frequency to generate. I do not know what to do about that part of the issue. However, it does reveal an issue that existed anyhow; it was just more rarely manifested with processors that behave properly in giving up their vote. This code:
/*
 * core_busy is the ratio of actual performance to max
 * max_pstate is the max non turbo pstate available
 * current_pstate was the pstate that was requested during
 * the last sample period.
 *
 * We normalize core_busy, which was our actual percent
 * performance to what we requested during the last sample
 * period. The result will be a percentage of busy at a
 * specified pstate.
 */
core_busy = cpu->sample.core_pct_busy;
max_pstate = int_tofp(cpu->pstate.max_pstate);
current_pstate = int_tofp(cpu->pstate.current_pstate);
core_busy = mul_fp(core_busy, div_fp(max_pstate, current_pstate));
is incorrect.
Why? Because core_busy is being calculated based on what was asked for and not what was actually done. With processors that properly lose their vote, the code mostly works because one mostly gets what one asked for. With processors that do not lose their vote, there is a dramatic increase in the number of occurrences of the actual operating pstate not being the requested current_pstate for this CPU. Thus ridiculous core_busy numbers are calculated, unduly driving up the target pstate. For example, see attachment 169101 [details]
I am saying that this:
core_busy = mul_fp(core_busy, div_fp(max_pstate, current_pstate));
should be something like this (not actually real code):
core_busy = mul_fp(core_busy, div_fp(max_pstate * 100MHz , measured_frequency));
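To make the difference concrete, here is a minimal user-space sketch in C. It is not the driver code and not the patch set: the fixed-point helpers are only modelled on the driver's int_tofp / mul_fp / div_fp (8 fractional bits is an assumption), the function names scale_by_requested and scale_by_measured are hypothetical, and 100 MHz per pstate is assumed as in the pseudo code above.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed-point helpers modelled on the driver's; FRAC_BITS = 8 is an assumption. */
#define FRAC_BITS 8
#define int_tofp(X) ((int64_t)(X) << FRAC_BITS)
#define fp_toint(X) ((X) >> FRAC_BITS)

static inline int64_t mul_fp(int64_t x, int64_t y)
{
    return (x * y) >> FRAC_BITS;
}

static inline int64_t div_fp(int64_t x, int64_t y)
{
    return (x << FRAC_BITS) / y;
}

/* Existing approach: normalize by the pstate that was requested. */
static int scale_by_requested(int core_pct_busy, int max_pstate,
                              int requested_pstate)
{
    assert(requested_pstate > 0);
    int64_t ratio = div_fp(int_tofp(max_pstate), int_tofp(requested_pstate));
    return (int)fp_toint(mul_fp(int_tofp(core_pct_busy), ratio));
}

/* Suggested idea (hypothetical form): normalize by the measured frequency. */
static int scale_by_measured(int core_pct_busy, int max_pstate,
                             int measured_mhz)
{
    assert(measured_mhz > 0);
    /* Assumes 100 MHz per pstate step. */
    int64_t ratio = div_fp(int_tofp(max_pstate * 100), int_tofp(measured_mhz));
    return (int)fp_toint(mul_fp(int_tofp(core_pct_busy), ratio));
}
```

With a max pstate of 34, a requested pstate of 8, and a core that was actually running at 3400 MHz (core_pct_busy of 100), scale_by_requested(100, 34, 8) gives 425, while scale_by_measured(100, 34, 3400) gives 100. The first number is the kind of inflated value that drives the target pstate up.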
I have submitted my patch set for review. I have also asked John and Manuel to try the patch set. I have feedback from one GUI type (Ubuntu, 250 Hz kernel) user with an i5-4690K processor. On an otherwise idle system, the average CPU frequency goes from 2.6 GHz without my patch set to 1.1 GHz with it. There is also a package energy savings of about 0.52 watts.
Created attachment 174141 [details]
Compare some response curves - fixed load method
As we start to get some feedback from John and Manuel, there seem to be some use cases where the default response curve of my patch set might be holding the target pstate down a little too hard. For example, Manuel seems to have a situation where his game (Dying Light) thinks his computer does not have enough CPU power, so it drops the FPS (Frames Per Second) rate. Meanwhile, the patched intel_pstate driver thinks there isn't enough demand on the CPUs to warrant raising the target pstate. The game seems to switch CPUs in an odd manner, perhaps similar to the Phoronix ffmpeg test.
The attached graph shows a few CPU frequency versus load (fixed load method) response curves. The patched response has been pushed a long way over, such that the Phoronix ffmpeg test average time is the same as for the unpatched kernel and the acpi_cpufreq driver.
Created attachment 174151 [details]
Compare some response curves - fixed work packet method
The attached graph compares CPU frequency versus load response curves for the fixed amount of work method (more like a real life scenario). The load is roughly normalized to the max non-turbo clock frequency. While it would be annoying, we may have to consider moving the default response curve somewhat, so as to cover some odd use cases.
Created attachment 174211 [details]
CPU frequencies during game - with patch set - default settings
Graph 1 of 2.
From Manuel: CPU frequencies during game (Dying Light) play with the patch set at default settings.
Note: this game is a real challenge, because it changes CPUs at a high rate.
Created attachment 174221 [details]
CPU frequencies during game - with patch set - adjusted settings
Graph 2 of 2
From Manuel: CPU frequencies during game play with the default setting adjusted as per the shifted response graphs. They are way higher and Manuel reports the FPS (Frames Per Second) is back to normal.
However, the "idle" frequencies are up some.
I have the same problem. My c7 runs constantly between 70-95%. I tried adding "intel_idle.max_cstate=6" to my GRUB_CMD_LINE. It runs a little cooler now. Just a little though. Did you come to a solution? I read all the posts, but it is not really clear to me. If I can help somehow I'd be glad to do so. I need this laptop to study and this heat and noise are driving me nuts. I'm an experienced Linux user, but my coding skills cover just basic C programming.
Status: While "idle" (which isn't really idle on a desktop) frequencies are up a little with the patch set and with the shifted curve, after a number of tests, there has been no measurable increase in processor power consumption at "idle". This is a good thing. In general, for "idle" and for some processors, there is about 1/2 a watt of package power saved versus the un-patched version of the intel_pstate driver. For other processors, there seems to be no difference in package power at "idle". Just yesterday, I even managed to obtain some test results from a very low power device: Intel(R) Pentium(R) CPU N3530 @ 2.16GHz
With such a low IIR filter gain of 5%, the rise time is considerably longer than with 10%. While I am convinced there must be an operational scenario where this would be detrimental, I have yet to find it. Of course, Intel (Kristen) has to do a bunch of testing of various workflows on various platforms. My patch set is now out of sync with Kernel 4.1RC1, and I am just attempting to re-base it now (I am not very git savvy).
Jan: Please provide your exact processor model number and the Linux distribution you use. Are you comfortable compiling the kernel? I am looking for test results from anyone willing to try, and I have made kernels for one person (Ubuntu kernel configuration, but kernel.org source + my patch set). John and Manuel are able to compile their own kernels.
Jan: your comments about C7 time and "intel_idle.max_cstate=6" don't really make sense to me.
Regardless, we have moved past that idea.
Jan: as a temporary work around, you might want to try going back to the acpi-cpufreq driver. Just add "intel_pstate=disable" to your GRUB_CMDLINE_LINUX_DEFAULT line.
I have an Intel(R) Core(TM) i5-3317U CPU @ 1.70GHz. It is possible that they don't make sense. I'm not that much of a pro yet, but I'm working on it. What I meant is that I used the i7z tool to monitor my CPU. I don't know what the c1-c7 mean. I guessed they were cores, but I don't think that I've got 7 of them. Anyway, when I run i7z the c7 column is around 80% and the others are under 10-20%. I'm running Antergos Linux, which is based on Arch. This is the first time that I use this system, due to the ease of reinstalling. I was experiencing the EXACT SAME PROBLEMS under Arch. Since the kernel is the same, it would be possible to try out your compilation. I don't assume that it could break my laptop, could it? I have never compiled a kernel before. What do I need to do? I tried disabling intel_pstate, but it didn't change anything, like all the other solutions on the web. Plus, disabling c7 made my laptop a little cooler, so I'm going to keep it until there is a more sophisticated solution.
@Jan: Thanks for reporting back. Your problem is something different than what this particular bug report is covering. Please also note that i7z is garbage, and it prints CPU frequencies that are not possible. The various C states are different levels of CPU idle. C7 is the deepest, lowest power (i.e. zero power) state, and the more time spent there the better.
Doug - Can you update us on the status of this work?
Median idle freq under Arch in Xorg (300 Hz tick rate): 1,237 MHz
Median idle freq under Ubuntu 15.04 live CD: 886 MHz
Median idle freq under Arch in Xorg (1000 Hz tick rate): 956 MHz
Median idle freq under Arch without Xorg running (1000 Hz tick rate): 831 MHz
(In reply to da_audiophile from comment #66)
> Doug - Can you update us on the status of this work?
>
> Median idle freq under Arch in Xorg (300 Hz tick rate): 1,237 MHz
> Median idle freq under Ubuntu 15.04 live CD: 886 MHz
> Median idle freq under Arch in Xorg (1000 Hz tick rate): 956 MHz
> Median idle freq under Arch without Xorg running (1000 Hz tick rate): 831 MHz
Hi John, Everything is on hold and will not make it into kernel 4.2RC1. The maintainer is working on a method for calculating C0 time. While the maintainer is in favor of eliminating the pid controller, what she is not sure about yet is whether she likes my new algorithm. Your median idle frequencies all look O.K. to me.
So.. can the maintainer update us on this?
Created attachment 183041 [details]
Phoronix ffmpeg test compare
Hi Manuel, Just this morning, I was going to ask if you would be willing to try the most recent version of my patch set. It is the same as before, with one small, yet significant, change to deal with comment 55 above. With the change I was able to both back off the response curve somewhat and increase the IIR filter gain, and still maintain equal or better performance compared to the acpi-cpufreq driver on the Phoronix ffmpeg test, regardless of the number of CPUs allocated. (I'll attach a graph.) Recall that the ffmpeg test represents a particularly annoying challenge to frequency scaling drivers. The patch set is based on the Kernel 4.2 release candidate series. What I want you to try is your Dying Light game with the patch set with the operating parameters set to: iir_gain 10%; c0_floor 20%; c0_ceiling 65%. Furthermore, I would like you to try +/- N% to both the floor and the ceiling, keeping a 45% difference between them, to find the point where your Dying Light game starts to reduce the frame rate. I.e. I want to find how much margin, if any, we would have with the proposed parameter settings. I know I am asking you for a lot of work, but I do not know of a way to simulate the Dying Light game situation.
If you are agreeable, we will go to e-mails to proceed and then report findings back here.
@Manuel: never mind. I had some disappointing test results today, and will have to re-think my attempt to deal with comment 55.
@Manuel: I would still like to determine some things with your Dying Light game: First, whether it works O.K. with iir_gain 10%, c0_floor 15%, c0_ceiling 50%; and if that works, then second, at what point the frames per second begin to drop (modifying c0_ceiling upwards from there). The Kernel 4.3RC series patch set can be found at: double u double u double dot smythies dot com /~doug/linux/intel_pstate/build22/
Example for setting the parameters:
doug@s15:~/temp$ sudo su
root@s15:/home/doug/temp# echo "150" > /sys/kernel/debug/pstate_snb/c0_floor
root@s15:/home/doug/temp# echo "500" > /sys/kernel/debug/pstate_snb/c0_ceiling
root@s15:/home/doug/temp# echo "10" > /sys/kernel/debug/pstate_snb/iir_gain_pct
root@s15:/home/doug/temp# cat /sys/kernel/debug/pstate_snb/c0_floor
150
root@s15:/home/doug/temp# cat /sys/kernel/debug/pstate_snb/c0_ceiling
500
root@s15:/home/doug/temp# cat /sys/kernel/debug/pstate_snb/iir_gain_pct
10
root@s15:/home/doug/temp# exit
exit
doug@s15:~/temp$
(In reply to manuel.bua from comment #68)
> So.. can the maintainer update us on this?
Manuel: The maintainer asked me to re-base my patch set to the kernel 4.2RC series. She is working on an alternate method for calculating C0 time or utilization. I merely brought back the method previously used, but it might not work for all architectures. I am not aware of a timeline for this work. Meanwhile I was hoping you could do those Dying Light tests, as I hope to be able to adjust the proposed default operating parameters a little. I did go back and review some previous Dying Light trace data. It rotates through CPUs at various loads very quickly.
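For readers trying to picture what the three parameters do, here is a rough sketch in C. It is only an illustration and is not taken from the patch set: the linear shape of the curve, the function names target_from_load and iir_filter, and the choice of units (tenths of a percent for load, floor and ceiling, which is what the 150 and 500 values in the example above suggest) are all assumptions.

```c
#include <assert.h>

/*
 * Hypothetical response curve: at or below the floor the target is the
 * minimum pstate, at or above the ceiling it is the maximum pstate, and
 * in between it rises linearly. Load, floor and ceiling share one unit
 * (assumed here to be tenths of a percent of C0 time).
 */
static int target_from_load(int load, int floor, int ceiling,
                            int min_pstate, int max_pstate)
{
    assert(ceiling > floor);
    if (load <= floor)
        return min_pstate;
    if (load >= ceiling)
        return max_pstate;
    return min_pstate +
           (load - floor) * (max_pstate - min_pstate) / (ceiling - floor);
}

/*
 * First-order IIR filter: each new sample moves the output gain_pct
 * percent of the way from the previous output towards the sample.
 */
static int iir_filter(int previous, int sample, int gain_pct)
{
    return previous + (sample - previous) * gain_pct / 100;
}
```

With a floor of 150, a ceiling of 500, and pstates from 8 to 34, a load of 325 lands halfway up the curve at pstate 21. A lower gain makes the filtered value follow a step change more slowly: starting from 0 with a sample of 100, a 10% gain gives 10 after one pass, and a 5% gain gives 5.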
(In reply to Doug Smythies from comment #72)
> Meanwhile I was hoping you could do those Dying Light tests, as I hope to be
> able to adjust the proposed default operating parameters a little.
Just F.Y.I. In the absence of any additional information from Manuel with his Dying Light game, the next time I have to re-base or update the patch set for the maintainer, in addition to fixing a couple of typos, I will be setting the following recommended default parameters: c0_floor: 15% c0_ceiling: 58% iir_gain: 10%
Recall our findings where it appeared as though sometimes, when the CPUs were in C states above, I think it is, 1, their target pstate vote into the PLL was not being dropped as it should be. There is another case of exactly the same thing. This time the processor is an Intel(R) Core(TM) i7-5500U CPU @ 2.40GHz. There is an added twist in that something else is defining the minimum pstate, meaning that if all 4 CPUs set a target pstate of 5, the actual CPU frequency will be higher: 800 MHz in the particular trace example that I have, but it isn't always the same.
Created attachment 187781 [details]
An example of the target pstates going to maximum with virtually no load
The extract is from a "perf record" session for this computer where, for whatever reason, the CPU frequency never goes down to the minimum for the processor.
It demonstrates how the current control algorithm has troubles when the CPU frequency is not what it thinks it should be.
The main issues are highlighted in red.
The issue starts on line 1354, where CPU 3 has a short duration pass through the driver, even though there is very little load (0.71%). Now, because the CPU frequency was not 500 MHz, the math goes mental and it ends up asking for a target pstate of 18. More typically, low load situations result in long durations and so the long duration adjustment to scaled_busy kicks in, but not this time.
Thus begins a saga where, with virtually no load, the system drives target pstates up for no reason.
Eventually the system does settle down, due to the long duration thing kicking in over all CPUs.
Created attachment 187791 [details]
Why does this system not go to 500MHz?
An extract detailing a quiet period where all target pstates are 5, and the system should settle to a CPU frequency in the 500 MHz range. However, it doesn't; it settles to the 800 MHz range. Why?
Note the blue highlighted cells. In that case the higher CPU frequency is O.K. because the previous pass of CPU 1 was so long ago (745 milliseconds), that the sample has some content from when frequencies were supposed to be higher.
For the cells highlighted in yellow, the CPU frequency should be in the 500 MHz range, but it is not.
Created attachment 187801 [details]
An example where CPU 0 does not give up its vote into the PLL
In this extract from the trace data, it appears as though CPU 0 does not give up its influence on the PLL when it is in a high C state. The duration is 985 milliseconds and there are very few clock cycles on CPU 0 during that very long time (134470), so it is hard to imagine that it did not go above C2 in that time.
The suggestion is that all the light red highlighted cells should have had a much lower CPU frequency.
The cells that are not highlighted, but within the duration window are indeterminate due to partial sample times outside of the duration window.
@Doug Smythies, I believe I am experiencing the same issue described in this bug report. My CPU on idle constantly sits at max frequency, around 3.9 GHz.
"sudo cpupower frequency-info" reports that the available governors are powersave and performance, the available frequencies are 800 MHz - 3.4 GHz, and the current governor is powersave.
"sudo i7z" shows the 4 cores entering the C1, C2, C3 states about 1% of the time and C7 around 98/99% of the time. Almost never in C6.
I have tried "intel_pstate=disable" in /etc/default/grub and "sudo update-grub". It successfully goes back to the acpi_cpufreq driver. The CPU sits on idle at 800 MHz and VCore around 0.6968 V. But the acpi_cpufreq driver has no Turbo, and max frequency on 100% load is only 3.4 GHz.
I have tried stress testing the CPU with mprime (from the Arch AUR), with test type=2. On 100% load, on the stock heatsink, with intel_pstate the CPU reaches 100 degrees C at around the 10 minute mark. Thermal throttling kicks in and reduces the CPU frequency to as low as 2.2 GHz on the powersave governor and 2.9 GHz on performance, but keeps the VCore voltage unnecessarily high, around 1.1210 V. intel_pstate almost never reduces VCore, even on idle, according to i7z, but keeps the idle frequency at the highest!
Observed with CoreTemp, win7 with the Intel chipset driver can manage Haswell much better. Running the same prime95 test type=2, on 100% load with turbo at 3.9 GHz, it reaches 100 deg C in 10 minutes. Thermal throttling pulls back the VCore and frequency at the same time, lowering the temperature more quickly. It also keeps the 4 cores at a constant 3.5 GHz at 100 deg C from that point onward.
Manjaro, being Arch-like, has a tickrate of 300 Hz. Above you have already established that setting 250 Hz or 1000 Hz tickrates will only mask the underlying problem and not solve it. Are there any tests or experiments I can run that might help you to further deduce the cause of this issue?
My H/W, S/W config is as follows:
Hardware: Intel Core i7 4770 (Haswell) Revision: C0 Base Frequency: 3.4 GHz Turbo: 3.9 GHz 4 Cores / 8 Threads (aka Hyper-threading)
Kernel: Linux laptop1-manjaro 4.1.6-3-MANJARO #1 SMP PREEMPT Sat Sep 5 10:57:06 UTC 2015 x86_64 GNU/Linux
Distribution: LSB Version: n/a Distributor ID: ManjaroLinux Description: Manjaro Linux Release: 15.09-rc2 Codename: Bellatrix
Desktop Environment: default Manjaro xorg, openbox, xfce4 environment.
(In reply to Anon Y. from comment #78)
> My CPU on idle constantly sits at max frequency around 3.9Ghz.
What is your definition of "idle"? I ask because "idle" is not really idle on a desktop with a GUI and such. On a server, "idle" can be much more idle. Anyway, it would be good to understand why your CPU frequency is so high, as from your C state data it seems it should be lower. I can only think of acquiring some trace data to get better detail as to what is going on. I think your kernel is recent enough.
> I have tried "intel_pstate=disable" in /etc/default/grub and "sudo
> update-grub". Successfully goes back to acpi_cpufreq driver. CPU sits on
> idle at 800Mhz and VCore around 0.6968v. But acpi_cpufreq driver has no
> Turbo, and max frequency on 100% load is only 3.4Ghz.
? acpi-cpufreq supports turbo. Under 100% load do "grep MHz /proc/cpuinfo"; if it lists 3401, then it is in turbo mode. Note that the acpi-cpufreq scaling driver lists what CPU frequency was asked for and not what the CPU frequency actually is.
@Anon Y. The trace data you sent me is very consistent with what John was getting at the beginning of this bug report. I can not say much more because I had to use older manual methods to post process the data, as I was wrong and your kernel does not have the added trace information needed by our post processing tools. The older manual methods are simply too time consuming, so I have only post processed a couple of CPUs' data.
I can not say conclusively if your system is exhibiting this failure to give up PLL votes issue or not, but it seems likely. Keep in mind that CPU frequency alone does not tell the whole story. One has to consider overall C states and such for an overall view of things.
Created attachment 188101 [details]
CPU Frequencies for Anon Y
Just adding an overview CPU frequencies graph from Anon Y's second trace data set. The actual CPU loads are almost always very, very low. In this case, it is less clear from the spreadsheet data that the CPUs should be giving up their votes into the PLL. However, the cores are spending about 95% of their time in the C7 state. Also, there does seem to be some high frequency stuff going on, so maybe there is enough activity that the PLL doesn't have time to drop down in frequency in between (just speculating).
Created attachment 188141 [details]
CPU loads for Anon Y
Just adding the corresponding CPU loads graph for the 2nd trace data from Anon Y.
Originally I never posted it here, but for the Anon Y case, and based on not a big sample space, the power cost of this stuff was about 1/2 a watt.
Readers: Please try kernel 4.6-rc1 (or more recent, once available), and report back. (Use the intel_pstate CPU frequency scaling driver and the "powersave" frequency governor.)
@Doug - Using my i7-4790K for testing, I compared idle frequencies under 4.6-rc3 and found a much higher median frequency and higher idle temps than under 4.5.0. Script I used to generate the statistics: https://gist.github.com/graysky2/7d0447e04dcf2ac638cb
% stats.sh ~/4.5.0.txt
median : 1004.181763
mean : 1324.87
min : 787.353760
max : 4403.496582
count : 327
% stats.sh ~/4.6rc3.txt
median : 4153.16
mean : 3969.87
min : 968.414856
max : 4682.961914
count : 452
@John (da_audiophile@yahoo.com): Thanks very much (oh, and nice to hear from you after so long). Your results are what was expected. For reference see bug 115771. If you are willing (please be willing), please try the patch in the bug report (version 5 from comment 106). Alternatively, you could wait for kernel 4.6-rc4, as I think that patch will be included (not sure yet). However, it would be much, much better if you could try now, as in addition to just a test, I am wondering if the current threshold will be good enough for the sufferers of this thread's issue. ... Doug
OK. I patched rc3 per your suggestion. The results here are in line with those acquired under 4.5.0. However, as I was watching i7z collect the data, I noticed that the chip stayed in the C1 state for much longer than under 4.5.0 just sitting idle under X. Sometimes a single core would be in C1 for a high percentage; other times, all 4 would be for some small percentage.
% stats.sh cpu_freq_log.txt
median : 1081.34
mean : 1214.47
min : 775.959229
max : 4355.350098
count : 296
@John: Thanks very much. Your findings are consistent with bug 115771.
If it were just up to me, I think I might have the threshold a little higher, but it seems to be good enough.
@John: if you are willing: On the kernel you used in comment 87, please run "turbostat --debug". Let it run for about 10 intervals. Attach the output back here.
Note 1: Please use a very recent turbostat, so that IRQ information will be acquired. I am using "turbostat version 4.11 27 Feb 2016".
Note 2: Please do not run i7z while acquiring the test data. Otherwise have your system the same as for the comment 87 test.
Created attachment 212541 [details]
turbostat under 4.5.0
Created attachment 212551 [details]
turbostat under 4.6rc3
@Doug - Attached in #90 and #91. What do you make of these data?
Created attachment 212561 [details]
histogram comparing percent time in c7 sleep while idle
If I plot the distribution of %C7 state (attachment 212561 [details]) I can see the same effect as I reported from the i7z dataset: the 4.6rc3 kernel spends less idle time in C7 state than the 4.5.0 kernel does.
Created attachment 212571 [details]
mean values based on turbostat
You'll have to give me a direction with regard to understanding the output of these. Attachment 212571 [details] just provides the mean values for each parameter for each kernel. Again, 4.6rc3 spends less time in C7 and more in C1 (seemingly less power efficient). If I understand the PkgWatt column, 4.6rc3 is consuming more power than 4.5.0 so it's all consistent.
@John: Thanks very much for the turbostat data you provided. It is more or less what was expected. I think, as also mentioned in comment 42, that running i7z affects the system we are attempting to measure. The power difference is a concern. The patch that I mentioned in comment 86 would be in kernel 4.6-rc4 isn't. Maybe it will be in kernel 4.6-rc5.
@John: I wonder if we could look at energy again, over a longer sample time. I am thinking something like:
sudo turbostat -J --debug sleep 2000
with the system left alone for the 2000 seconds (33 minutes and 20 seconds). Recall that, in general, sufferers of the issues on this thread also seemed to consume, on average, about an extra 1/2 watt of power. It is not clear, to me at least, what is going on with energy from John's short samples.
Created attachment 213531 [details]
turbostat under 4.5.2
Created attachment 213541 [details]
turbostat under 4.6rc4 using Doug's patch
@Doug - Attached per your request.
(In reply to da_audiophile from comment #100)
> Created attachment 213541 [details]
> turbostat under 4.6rc4 using Doug's patch
You mean Rafael's patch version 5, right? (from bug 115771) Anyway, 2% energy increase, or 31 milliwatts, between 4.5.2 and 4.6rc4+patch at idle. O.K.
@Doug - It's the same patch you mentioned from comment #86 that modifies drivers/cpufreq/intel_pstate.c adding 4 lines. Also, both kernels on my system are using the desktop 1000 HZ tickrate if that matters.
(In reply to da_audiophile from comment #103)
> @Doug - It's the same patch you mentioned from comment #86 that modifies
> drivers/cpufreq/intel_pstate.c adding 4 lines. Also, both kernels on my
> system are using the desktop 1000 HZ tickrate if that matters.
Yes, that is the correct patch. 1000Hz is fine. Originally we found that 300Hz on an Arch system seemed to exemplify the issue best, but I wouldn't re-do the work.
(In reply to Doug Smythies from comment #84)
> Readers: Please try kernel 4.6-rc1 (or more recent, once available), and
> report back. (use intel_pstate CPU frequency scaling driver and the
> "powersave" frequency governor.)
Hi Doug, for many reasons, i need to stay on kernel Linux tm 4.1.17. Do you think the related changes from 4.6 could be easily backported there ? Thanks
(In reply to The Troll from comment #105)
> (In reply to Doug Smythies from comment #84)
> > Readers: Please try kernel 4.6-rc1 (or more recent, once available), and
> > report back. (use intel_pstate CPU frequency scaling driver and the
> > "powersave" frequency governor.)
>
> Hi Doug,
>
> for many reasons, i need to stay on kernel Linux tm 4.1.17.
> Do you think the related changes from 4.6 could be easily backported there ?
>
> Thanks
That was a trick question. The expectation was that the results would be bad, and this was later confirmed by John's tests. Anyway, Rafael's patch, which at least partially addresses the issue, will (for sure) be in kernel 4.6-rc5.
As to how and when that might be backported to a series 4.1 kernel, I don't know. It is a trivial patch; you could add it yourself. However, note that it was the change from timers to utilization in kernel 4.6 that significantly exacerbated the original issue of this thread, making the root issue much more obvious. You may or may not notice significant changes on a 4.1 series kernel.
Hi, thanks for getting back so quickly! The patch from http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ffb810563c0c049872a504978e06c8892104fb6c uses cpu->sample.tsc but tsc is not a member of sample in 4.1... Is there a way to workaround that ? Or do we need a larger backport ? thx
(In reply to The Troll from comment #107)
> Hi,
>
> thanks for getting back so quickly!
> The patch from
>
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/
> ?id=ffb810563c0c049872a504978e06c8892104fb6c
>
> uses cpu->sample.tsc but tsc is not a member of sample in 4.1...
>
> Is there a way to workaround that ? Or do we need a larger backport ?
>
> thx
Oh! I see the tsc sample stuff was only added as of kernel 4.2-rc1 (see commit 4055fad34086dcf5229c43846e0a3cf0fb3692e3). I am not an expert on backporting and such, and don't know what to recommend.
Hi, for now, I switched back to acpi_cpufreq. At least it throttles the CPU clock :) Though:
- it doesn't go over 4 GHz (instead of 4.4)
- it has no real impact on CPU temp
I have Arch installed on my Intel Haswell i7 4790 non K edition. Under Arch it has always been near turbo and at higher heat compared to the Ubuntu flavors, LTS and non LTS, and this from the day Arch introduced intel_pstate in its kernel. Initially, after testing done by Colin King of Canonical, they found higher power consumption and kept it off till Ubuntu kernel 3.19. Thankfully the temps and frequency scaled for Haswell. Currently an interesting observation regarding this.
I have recently installed Ubuntu 16.04 with kernel 4.4, and now the frequencies of my Haswell scale down far less than with the 4.2 kernel in Ubuntu 14.04.4 LTS, where some cores would touch close to 800 MHz at idle. But temps and voltage are far better controlled than on my Arch installation, which runs 3-4 degrees higher and at a steady 1.13 V Vcore at idle, compared to Ubuntu's 0.88 V.
I have the same problem on my "Intel(R) Core(TM) i7-4710HQ CPU @ 2.50GHz" (which turbos up to 3.5GHz almost constantly). It's a gaming laptop. I fear this is prematurely wearing it out. The thought of Windows doing this better is frightening. I have been using Ubuntu full time for years now. The following commands seemed to cause my CPU to throttle back a lot more often:
$ sudo -Hi
# echo 95 > /sys/devices/system/cpu/intel_pstate/max_perf_pct
# echo 100 > /sys/devices/system/cpu/intel_pstate/max_perf_pct
# exit
$
It still stays throttled up too much, but I've seen it throttle all the way down to 800MHz, and it throttles down quite a bit more often. I had never seen that before. I know that this shouldn't make any difference. It's hard to tell whether this is my imagination or it really did something. Immediately after the write, it seems to really throttle back properly for a while. Worth a try. Perhaps something touched by the code path that handles those writes provokes it to throttle properly again. With luck this might lead to a fix. It has been months since any news on this bug. :(
@Doug Gale: From your description, I am having a bit of trouble understanding your exact scenario. It sounds as though you have significant load all the time on your computer and you have thermal throttling issues, which would be beyond this bug report's scope. I'd be happy to post process and analyze a trace from your computer, if you want (see comment #31 and comment #43, and do it on an otherwise idle system).
The root issue of this bug report would not lead to premature wear out of your computer, because the bottom line is not much extra overall power consumption (the extra active power is offset by extra sleep time).
As for progress on this issue: There was an RFC/RFT (Request For Comments / Request For Testing) for a patch set that would solve the root issue of this bug report (the clock modulation issue also), as part of the kernel 4.8rc series. I think there are 3 of us working on it, and a great deal of effort has been expended on it. However, there are issues, and any particular timeline remains unclear.
For anyone following this: I never specifically stated it, but I hope it is now clear to all. When I mentioned things like "it's as though CPUs that have gone into the C7 state are still casting their target p_state vote into the processor PLL decision stuff, whereas I thought those CPUs were supposed to lose their vote", it turns out what was really happening is that there were several extremely short interrupts on those CPUs, such that their vote was still correctly included in the PLL decisions.
@Doug Smythies. I'm pretty sure you are correct. I did some further digging, using `sudo perf top -g` and found that an excessive amount of CPU time was being spent reading the HPET coming from XOrg on an otherwise completely idle system sitting at the desktop (I forget the exact kernel symbol, but it was obviously HPET access). Rebooting resolved it. I haven't seen my CPU sitting at 3.5GHz nonstop at all since yesterday. It hovers around 2.5 to 2.8GHz when idle. When operating normally, the top function in perf top is about 3% in _raw_spin_lock_irqsave, which seems reasonable. I am using the non-free nvidia driver from the ubuntu repository, nvidia-352-updates-dev. Sorry for posting misleading information.
Hi, Doug
(In reply to Doug Gale from comment #113)
> @Doug Smythies. I'm pretty sure you are correct.
>
> I did some further digging, using `sudo perf top -g` and found that an
> excessive amount of CPU time was being spent reading the HPET coming from
> XOrg on an otherwise completely idle system sitting at the desktop (I forget
> the exact kernel symbol, but it was obviously HPET access).
>
I have not read the whole thread. So the real problem is that bogus HPET reading from Xorg results in high CPU frequency, right? We have a couple of different bug reporters in this thread and I'm not sure they all have exactly the same problem. da_audiophile@yahoo.com, please confirm if the problem still exists in the latest upstream kernel. Doug, please make sure you're describing the same problem as da_audiophile@yahoo.com; if not, what you should do is open a new bug report.
(In reply to Zhang Rui from comment #114)
> Doug, please make sure you're saying the same problem with
> da_audiophile@yahoo.com, if no, what you should do is to open a new bug
> report.
I thought it was the same issue but later figured out that XOrg was at fault and the system was (correctly) keeping the CPU throttled up. Sorry for adding noise.
(In reply to Zhang Rui from comment #114)
> da_audiophile@yahoo.com
> please confirm if the problem still exists in the latest upstream kernel.
>
@Zhang: The problem still exists in the latest kernel. However, since my computer doesn't really suffer from the problem (I can sort of create it), I do not have good numbers for how bad the problem is. It never really was very expensive in terms of extra power consumption, at least from the data I got from others over the couple of years of this saga. See also bug 115771, where the issue is partially corrected by the patch from that bug report.
For the original report from da_audiophile@yahoo.com, I found a related commit:

commit ffb810563c0c049872a504978e06c8892104fb6c
intel_pstate: Avoid getting stuck in high P-states when idle

It is described here: https://bugzilla.kernel.org/show_bug.cgi?id=115771

I wonder whether this commit has fixed the problem, or whether we are talking about other issues.

(In reply to Doug Smythies from comment #116)
> (In reply to Zhang Rui from comment #114)
>
> > da_audiophile@yahoo.com
> > please confirm if the problem still exists in the latest upstream kernel.
> >
>
> @Zhang: The problem still exists in the latest kernel.
> However, and since my computer doesn't really suffer from the problem (I can
> sort of create it), I do not have good numbers for how bad the problem is.
> It never really was very expensive in terms of extra power consumption, at
> least from the data I got from others over the couple of years of this saga.
>
> See also bug 115771, where the issue is partially corrected by the patch
> from that bug report.

Ah, yes, it should be this one. I wonder whether we see this issue on the same platform that da_audiophile@yahoo.com reported; if not, I think we can open a new bug report for that, because as the thread grows longer and longer I sometimes lose track of it :(.

@da_audiophile@yahoo.com are you still looking at this thread?

@Chen - Sorry, it fell off my radar. I will review and post if needed shortly.

Yes, the problem still exists for this hardware under the 4.10.13 kernel. I am measuring the CPU frequency using this bash script[1] that reads once per second from /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq and writes it out to a file. It then plots a nice histogram and some stats of the data.

I sampled the CPU frequency under 2 conditions:
1) With X running (lxdm and xfce4 sitting at the desktop otherwise idle).
2) Without X running, just sitting at the prompt idle.
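For anyone who wants to reproduce the measurement without the linked script, the sampling method described above (one reading per second from scaling_cur_freq, then a histogram with summary statistics) could be sketched roughly as follows. This is not the bash script from the report; the function names, the 10-bucket default and the use of the population standard deviation are assumptions made for illustration.

```python
import statistics
import time
from pathlib import Path

# Path taken from the comment above; the kernel reports this value in kHz.
FREQ_PATH = Path("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq")


def read_freq_mhz(path=FREQ_PATH):
    """Return the current frequency of one CPU in MHz."""
    return int(Path(path).read_text().strip()) / 1000.0


def collect(seconds, interval=1.0, path=FREQ_PATH):
    """Take one sample per interval, `seconds` times (hypothetical helper)."""
    samples = []
    for _ in range(seconds):
        samples.append(read_freq_mhz(path))
        time.sleep(interval)
    return samples


def histogram(samples, buckets=10):
    """Count samples in equal-width buckets spanning min..max."""
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / buckets or 1.0  # avoid zero width if all samples match
    counts = [0] * buckets
    for s in samples:
        # The maximum value belongs to the last bucket, not to a new one.
        idx = min(int((s - lo) / width), buckets - 1)
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, c)
            for i, c in enumerate(counts)]


def report(samples, buckets=10):
    """Format summary statistics and a text histogram."""
    lines = [
        "# NumSamples = %d; Min = %.2f; Max = %.2f"
        % (len(samples), min(samples), max(samples)),
        # Assumption: population SD; the original script may use sample SD.
        "# Mean = %.2f; SD = %.2f; Median = %.2f"
        % (statistics.mean(samples), statistics.pstdev(samples),
           statistics.median(samples)),
    ]
    for start, end, count in histogram(samples, buckets):
        pct = 100.0 * count / len(samples)
        lines.append("%10.4f - %10.4f [%4d]: %s (%.2f%%)"
                     % (start, end, count, "#" * count, pct))
    return "\n".join(lines)
```

Calling `print(report(collect(120)))` once under X and once from a TTY would give two reports comparable in shape to the ones posted in this thread. Only cpu0 is sampled, as in the description above.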
I still see significantly higher mean idle frequency under X:

Logged into X, median frequency is 3,948 MHz
Logged into a TTY, median frequency is 2,500 MHz

Would others please test in a like fashion (script linked below) and post the results here? Again, my hardware is technically a Haswell Refresh if that matters... as noted in comment #7, doing the same two (under X or from a TTY) experiments on a i3-4130T (Haswell) results in nearly identical median values for CPU frequency (median 813 MHz under X and median 800 under TTY).

Full results for the Haswell Refresh CPU first under X logged in with lxdm at idle:

# NumSamples = 120; Min = 799.80; Max = 4401.30
# Mean = 3246.355000; Variance = 1653274.818975; SD = 1285.797348; Median 3948.050000
# each ∎ represents a count of 1
 799.8000 - 1159.9500 [ 17]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ (14.17%)
1159.9500 - 1520.1000 [  5]: ∎∎∎∎∎ (4.17%)
1520.1000 - 1880.2500 [  1]: ∎ (0.83%)
1880.2500 - 2240.4000 [  3]: ∎∎∎ (2.50%)
2240.4000 - 2600.5500 [ 10]: ∎∎∎∎∎∎∎∎∎∎ (8.33%)
2600.5500 - 2960.7000 [ 11]: ∎∎∎∎∎∎∎∎∎∎∎ (9.17%)
2960.7000 - 3320.8500 [  7]: ∎∎∎∎∎∎∎ (5.83%)
3320.8500 - 3681.0000 [  2]: ∎∎ (1.67%)
3681.0000 - 4041.1500 [  6]: ∎∎∎∎∎∎ (5.00%)
4041.1500 - 4401.3000 [ 58]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ (48.33%)

Results logged in to a TTY at idle (no X running):

# NumSamples = 120; Min = 799.80; Max = 4405.70
# Mean = 2521.885833; Variance = 1611401.717383; SD = 1269.409988; Median 2500.000000
# each ∎ represents a count of 1
 799.8000 - 1160.3900 [ 25]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ (20.83%)
1160.3900 - 1520.9800 [ 11]: ∎∎∎∎∎∎∎∎∎∎∎ (9.17%)
1520.9800 - 1881.5700 [ 10]: ∎∎∎∎∎∎∎∎∎∎ (8.33%)
1881.5700 - 2242.1600 [  4]: ∎∎∎∎ (3.33%)
2242.1600 - 2602.7500 [ 22]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ (18.33%)
2602.7500 - 2963.3400 [  8]: ∎∎∎∎∎∎∎∎ (6.67%)
2963.3400 - 3323.9300 [  5]: ∎∎∎∎∎ (4.17%)
3323.9300 - 3684.5200 [  7]: ∎∎∎∎∎∎∎ (5.83%)
3684.5200 - 4045.1100 [  1]: ∎ (0.83%)
4045.1100 - 4405.7000 [ 27]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ (22.50%)

1. https://github.com/graysky2/bin/blob/master/cpufreq_histogram.sh

Hi John,

Your data seems a bit inconsistent with the turbostat stuff you posted (turbostat-4.5.2-ref.log and turbostat-4.6rc4.log, and even the two before those). We can never just look at CPU frequencies in isolation, we also need to look at load. The use of the load based code path within the intel_pstate driver will be greatly expanded as of kernel 4.12-rc1 (in a few weeks) so this whole saga might become academic.

Anyway, for my main test computer (i7-2600K, min pstate = 16, max pstate = 38, kernel 4.11-rc7) the data using your method is:

$ ./cpufreq_histogram.sh 300
Collecting data for 300 seconds...
# NumSamples = 300; Min = 1500.00; Max = 3900.00
# Mean = 1599.786333; Variance = 0.261980; SD = 0.511840; Median 1599.700000
# each ∎ represents a count of 3
1500.0000 - 1600.0000 [ 296]: ∎∎∎∎...∎∎∎∎ (98.67%)
1600.0000 - 1700.0000 [   4]: ∎ (1.33%)
1700.0000 - 1800.0000 [   0]: (0.00%)
1800.0000 - 1900.0000 [   0]: (0.00%)
1900.0000 - 2000.0000 [   0]: (0.00%)
2000.0000 - 2100.0000 [   0]: (0.00%)
2100.0000 - 2200.0000 [   0]: (0.00%)
2200.0000 - 2300.0000 [   0]: (0.00%)
2300.0000 - 2400.0000 [   0]: (0.00%)
2400.0000 - 2500.0000 [   0]: (0.00%)
2500.0000 - 2600.0000 [   0]: (0.00%)
2600.0000 - 2700.0000 [   0]: (0.00%)
2700.0000 - 2800.0000 [   0]: (0.00%)
2800.0000 - 2900.0000 [   0]: (0.00%)
2900.0000 - 3000.0000 [   0]: (0.00%)
3000.0000 - 3100.0000 [   0]: (0.00%)
3100.0000 - 3200.0000 [   0]: (0.00%)
3200.0000 - 3300.0000 [   0]: (0.00%)
3300.0000 - 3400.0000 [   0]: (0.00%)
3400.0000 - 3500.0000 [   0]: (0.00%)
3500.0000 - 3600.0000 [   0]: (0.00%)
3600.0000 - 3700.0000 [   0]: (0.00%)
3700.0000 - 3800.0000 [   0]: (0.00%)
3800.0000 - 3900.0000 [   0]: (0.00%)

@Doug - Yes, sandybridge (i7-2600k) seems unaffected.

Hi John,

I think the difference between our two results has more to do with differences in "idle" than the processor itself.
When we communicated via e-mail on November 13th, I said a trace was needed to really determine what was going on on your system. That is still true, however the preferred way to acquire and post process trace data has changed. From your previous turbostat postings (the ones where IRQ totals were working), your "idle" system seems to have about twice as many IRQs per second as mine.

Moving forward, I would suggest waiting for, and then trying, kernel 4.12-rc1. Then your system will likely, but not for certain, use the load based code path through the intel_pstate driver. (At least I think that is the plan.)

I can't explain this result as it is inconsistent with my testing under the older kernel versions, but with 4.10.13, if I drop the tickrate from 1000 to 300, I get much reduced idle frequencies:

# NumSamples = 180; Min = 799.80; Max = 4401.60
# Mean = 1889.847222; Variance = 1381154.325937; SD = 1175.225223; Median 1326.400000
# each ∎ represents a count of 1
 799.8000 - 1159.9800 [ 73]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎...∎∎∎∎∎∎∎∎∎∎∎∎∎∎ (40.56%)
1159.9800 - 1520.1600 [ 26]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ (14.44%)
1520.1600 - 1880.3400 [ 16]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ (8.89%)
1880.3400 - 2240.5200 [ 10]: ∎∎∎∎∎∎∎∎∎∎ (5.56%)
2240.5200 - 2600.7000 [ 11]: ∎∎∎∎∎∎∎∎∎∎∎ (6.11%)
2600.7000 - 2960.8800 [  6]: ∎∎∎∎∎∎ (3.33%)
2960.8800 - 3321.0600 [  7]: ∎∎∎∎∎∎∎ (3.89%)
3321.0600 - 3681.2400 [  9]: ∎∎∎∎∎∎∎∎∎ (5.00%)
3681.2400 - 4041.4200 [  4]: ∎∎∎∎ (2.22%)
4041.4200 - 4401.6000 [ 18]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ (10.00%)

It looks like the issue was due to too many periodic ticks during 'idle'? And I wonder whether the system is really idle (if it were idle, there would be no big frequency increase anymore, because intel_pstate is now controlled by the utilization provided by the scheduler). I think the intel_pstate driver has also changed its frequency prediction recently (it dropped the PID algorithm), so I suggest checking on the latest upstream 4.14-rc1, and also using the latest turbostat -i 10, to see whether the problem still exists. Thanks.
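Since both the IRQ rate and the question of whether the system is "really idle" came up above, a rough cross-check that needs no extra tools is to take two snapshots of /proc/stat and compare them. The sketch below is only an illustration and not a replacement for turbostat or a trace; the function names are hypothetical, and it assumes the usual /proc/stat layout (aggregate `cpu` line with user, nice, system, idle, iowait, irq, softirq, steal, and an `intr` line whose first number is the total interrupt count).

```python
def parse_proc_stat(text):
    """Return (busy_ticks, total_ticks, total_interrupts) from /proc/stat text."""
    busy = total = interrupts = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "cpu":
            # user, nice, system, idle, iowait, irq, softirq, steal.
            # Guest time is skipped because it is already counted in user.
            values = [int(v) for v in fields[1:9]]
            idle = values[3] + values[4]  # idle + iowait
            total = sum(values)
            busy = total - idle
        elif fields[0] == "intr":
            interrupts = int(fields[1])  # first number is the grand total
    return busy, total, interrupts


def idle_check(before_text, after_text, elapsed_s):
    """Return (load_fraction, irqs_per_second) between two snapshots."""
    b0, t0, i0 = parse_proc_stat(before_text)
    b1, t1, i1 = parse_proc_stat(after_text)
    ticks = t1 - t0
    load = (b1 - b0) / ticks if ticks else 0.0
    return load, (i1 - i0) / elapsed_s
```

One way to use it would be to read /proc/stat, sleep for 10 seconds, read it again, and pass both strings with `elapsed_s=10`. Comparing the two numbers under X and from a TTY on the same machine would show whether the two "idle" conditions actually differ in load or interrupt rate.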
I'm closing this as there's no response for some time. Please feel free to reopen it if you'd like me to further track this issue.