Bug 197469
Summary: | CPU runs at higher minimum frequencies on kernel 4.13 than in previous versions | ||
---|---|---|---|
Product: | Power Management | Reporter: | Erikas Rudinskas (erikmnkl) |
Component: | intel_pstate | Assignee: | Zhang Rui (rui.zhang) |
Status: | CLOSED UNREPRODUCIBLE | ||
Severity: | normal | CC: | dsmythies, erikmnkl, fcastillousfq, lenb, rui.zhang |
Priority: | P1 | ||
Hardware: | Intel | ||
OS: | Linux | ||
Kernel Version: | 4.13 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: | Logs for CPU usage, turbostat and grepped CPU stats; Logs as per request in comment #7; Idle PowerNightmare test | ||
Description
Erikas Rudinskas
2017-10-27 09:20:54 UTC
Please:
1. attach the output of "grep . /sys/devices/system/cpu/cpu*/cpufreq/*"
2. attach the turbostat.out file produced by running "sudo turbostat --debug --interval 5 --out turbostat.out" for 10 seconds on:
   a. the 4.12 kernel, while the CPU frequency is at ~600 MHz (idle)
   b. the 4.13 kernel, while the CPU frequency is at ~800 MHz (idle)

Created attachment 260429
Logs for CPU usage, turbostat and grepped CPU stats
Submitting requested logs.
I extracted the "* CPU usage.txt" files with this command:

    cpupower frequency-info | grep asserted | awk -v date="$(date +"%Y-%m-%d %r : ")" '{ print date $4$5 }'

You can clearly see the difference between the two versions :)

The 4.12 turbostat and the 4.13 turbostat both show the frequency getting down into the mid-500 MHz range. Package C-states also look similar. PkgWatt looks like 1.7 W in both cases. Are you sure these observations were made running different kernels?

Yes, I confirm these log files were created on different kernel versions. There was no mistake, and the log files are not mixed up. In my opinion, kernel 4.13 is simply not stable overall. My VM crashes/hangs on it, and my home desktop with Manjaro sometimes does not wake up from sleep. Could this be related? I've also raised a separate bug for the VM issue: https://bugzilla.kernel.org/show_bug.cgi?id=197449
I am on 4.12 now, which seems to be stable.

1) The issue appears between the latest 4.12.x and the first 4.13.0 kernel.
2) The latest available 4.13.11 is still affected.
3) I haven't had a chance to test 4.14; I am still on the 4.12 kernel.

(In reply to Erikas Rudinskas from comment #5)
> Yes, i confirm these log files were created on different kernel versions.
> There was no mistake made and log files are not mixed up.

Well, we'd prefer to use turbostat to see the difference when the CPU is running at 500 MHz on 4.12 and at 800 MHz on 4.13, but your previous attachment does not show this difference. So what I'd like to see is something like this:

    on 4.12:
    if (cpupower output < 600 MHz) {
        turbostat --debug --out turbostat-4-12.out
        grep . /sys/devices/system/cpu/cpu*/cpufreq/* > sysfs-4-12.out
        cpupower frequency-info > cpupower-4-12.out
    }

    on 4.13:
    if (cpupower output > 700 MHz && cpupower output < 900 MHz) {
        turbostat --debug --out turbostat-4-13.out
        grep . /sys/devices/system/cpu/cpu*/cpufreq/* > sysfs-4-13.out
        cpupower frequency-info > cpupower-4-13.out
    }

The cpupower and sysfs outputs are used to confirm the problem described in this bug, and the turbostat output tells us what happens when the bug is reproduced. So please capture the turbostat output while cpupower shows the difference described in this report.

(In reply to Erikas Rudinskas from comment #5)
> In my opinion, the kernel 4.13 is in overall not simply stable. My VM
> crashes/hungs on it, my home desktop with Manjaro time to time does not wake
> up from sleep. Could this be related?

TBH, I don't know. Let's stick with this issue first and see whether the other problems still exist once it is resolved.

Created attachment 260613
Logs as per request in comment #7

See the commands.txt file inside for the commands used. One command didn't work, and another one didn't stop...

(In reply to Len Brown from comment #4)
> The 4.12 turbostat and the 4.13 turbostat both show the frequency getting
> down into the mid 500MHz range.
>
> package C-states also look similar.
>
> PkgWatt looks like 1.7 Watts in both cases.
>
> are you sure these observations were made running the different kernels?

My fan spins faster and more often on 4.13 than on 4.12, and cpupower frequency-info shows higher CPU usage, so it's not just "my imagination". Any other ideas what causes such higher CPU frequencies?

One way to get to the bottom of what changed may be this: for each kernel, working and failing, set the frequency to a constant value in sysfs cpufreq and watch the utilization (e.g. with turbostat). The cause of the problem could simply be that the utilization has gone up, and cpufreq is simply responding to that...
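The conditional capture requested in comment #7 can be turned into a runnable script roughly as follows. This is a minimal sketch, not something posted in this bug: the function names, the MHz window arguments and the 10-second `turbostat ... sleep 10` run are assumptions. It parses the frequency from the same "asserted" line and fields ($4 = value, $5 = unit) as the reporter's awk command, and uses the same `cpufreq/*` glob as the first log request. It must be run as root.

```shell
#!/usr/bin/env bash
# Sketch of the comment #7 procedure: capture logs only while the frequency
# reported by "cpupower frequency-info" lies inside a given MHz window.
# Function names and arguments are hypothetical; run as root.

# Read cpupower output on stdin and print the "asserted" frequency in MHz.
# Uses the same fields as the reporter's awk command: $4 = value, $5 = unit,
# e.g. "current CPU frequency: 1.20 GHz (asserted by call to kernel)".
# Only the first "asserted" line is used.
freq_mhz_from_cpupower() {
  LC_ALL=C awk '/asserted/ {
         v = $4
         if ($5 == "GHz") v *= 1000
         else if ($5 == "kHz") v /= 1000
         printf "%d\n", v + 0.5
         exit
       }'
}

# capture_if_in_window LOW_MHZ HIGH_MHZ TAG
# Returns 1 (and captures nothing) if the frequency is outside the window.
capture_if_in_window() {
  local low=$1 high=$2 tag=$3 mhz
  mhz=$(cpupower frequency-info | freq_mhz_from_cpupower)
  if [ -z "$mhz" ] || [ "$mhz" -lt "$low" ] || [ "$mhz" -gt "$high" ]; then
    echo "frequency ${mhz:-unknown} MHz outside ${low}-${high} MHz, nothing captured" >&2
    return 1
  fi
  cpupower frequency-info > "cpupower-${tag}.out"
  grep . /sys/devices/system/cpu/cpu*/cpufreq/* > "sysfs-${tag}.out"
  # turbostat forks "sleep 10" and reports statistics gathered while it runs.
  turbostat --debug --out "turbostat-${tag}.out" sleep 10
}
```

On 4.12 this would be called as `capture_if_in_window 0 599 4-12`, and on 4.13 as `capture_if_in_window 701 899 4-13`, retrying until it returns 0. Because the awk stops at the first "asserted" line, only one CPU is checked (CPU 0 in cpupower's usual default output).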
This is from the 4.13 kernel:
/sys/devices/system/cpu/cpu2/cpufreq/energy_performance_preference:balance_performance

This is from the 4.12 kernel:
/sys/devices/system/cpu/cpu1/cpufreq/energy_performance_preference:balance_power

But according to commit 3cedbc5a6d7f7c5539e139f89ec9f6e1ed668418:

    Author:     Len Brown <len.brown@intel.com>
    AuthorDate: Mon May 1 23:06:08 2017 -0400
    Commit:     Len Brown <len.brown@intel.com>
    CommitDate: Thu May 11 21:27:53 2017 -0400

    intel_pstate: use updated msr-index.h HWP.EPP values

    intel_pstate exports sysfs attributes for setting and observing HWP.EPP.
    These attributes use strings to describe 4 operating states, and
    inside the driver, these strings are mapped to numerical register
    values.

    The authorative mapping between the strings and numerical HWP.EPP values
    are now globally defined in msr-index.h, replacing the out-dated
    mapping that were open-coded into intel_pstate.c

    new  old  string
    ---  ---  -------------------
      0    0  performance
    128   64  balance_performance
    192  128  balance_power
    255  192  power

    Note that the HW and BIOS default value on most system is 128,
    which intel_pstate will now call "balance_performance"
    while it used to call it "balance_power".

    Signed-off-by: Len Brown <len.brown@intel.com>

it seems that this does not bring any functional change: the default register value 128 was reported as "balance_power" before this commit and as "balance_performance" after it, so the two strings above may well describe the same EPP value.

Thanks for the findings! Is there anything else I need to do or provide?

You should still try kernel 4.14. We should also figure out whether bug 197945 and this one are duplicates of each other.

Tried 4.14.0: nothing has changed for me. CPU frequencies still stay higher.

O.K., so this is not the same as bug 197945.

There do seem to be some significant differences between kernels 4.12 and 4.13 in some benchmarks.
See: https://www.phoronix.com/scan.php?page=news_item&px=Linux-4.10-4.15-Kernel-Tests
However, the idle power test did not show much change:
https://www.phoronix.com/scan.php?page=news_item&px=Linux-4.6-4.15-Battery-Power

Also note that the idle power on my test computer (a server with no GUI and a bunch of services disabled, i.e. my "idle" is far more idle than on any computer with a GUI) has been consistent at 3.82 +- ~0.05 watts for as long as I can recall.

Using kernel 4.14.0, please try disabling idle state 0. I use this script (run as root via sudo):

    $ cat ./idle_state0_disable
    #!/bin/bash
    echo "idle state 0: before:"
    cat /sys/devices/system/cpu/cpu*/cpuidle/state0/disable
    for file in /sys/devices/system/cpu/cpu*/cpuidle/state0/disable; do
        echo "1" > "$file"
    done
    echo "idle state 0: after:"
    cat /sys/devices/system/cpu/cpu*/cpuidle/state0/disable

(In reply to Doug Smythies from comment #16)
> Also note that the idle power on my test computer
> has been consistent at 3.82 +- ~0.05 watts ever since I
> can recall.

Unless I get what we are calling "PowerNightmares", where shallow idle states basically get forgotten about for a very long time (up to 4 seconds) and can consume significant extra energy. That is why I asked for the idle state 0 disable test in comment 17. I'll attach a graph of an idle PowerNightmare test from my system in a moment.

Created attachment 261047
Idle PowerNightmare test
The graph is from an idle PowerNightmare test on my system.
Notice the lower average package power consumption with idle state 0 disabled, and a further lowering with more idle states disabled.
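To check whether time is piling up in shallow idle states (the "PowerNightmare" pattern described above), the per-state counters that cpuidle exposes in sysfs can be compared before and after disabling state 0. The helper below is a hypothetical sketch, not the tool used to produce the graph: it prints `name`, `disable`, `usage` (number of entries) and `time` (total residency in microseconds) for each idle state of one CPU. `CPUIDLE_SYSFS` is an assumed override so it can be pointed at a copy of the sysfs tree.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the cpuidle counters of every idle state of one CPU.
# cpuidle_report [CPU]  (default CPU 0)
# CPUIDLE_SYSFS is an assumed override; by default the live sysfs is read.
cpuidle_report() {
  local cpu=${1:-0}
  local base="${CPUIDLE_SYSFS:-/sys/devices/system/cpu}/cpu${cpu}/cpuidle"
  local dir
  printf '%-8s %-10s %-8s %12s %16s\n' state name disable usage time_us
  for dir in "$base"/state*; do
    [ -d "$dir" ] || continue   # skip if the glob matched nothing
    # usage = number of times the state was entered,
    # time  = cumulative residency in microseconds
    printf '%-8s %-10s %-8s %12s %16s\n' "${dir##*/}" \
      "$(cat "$dir/name")" "$(cat "$dir/disable")" \
      "$(cat "$dir/usage")" "$(cat "$dir/time")"
  done
}
```

Running `cpuidle_report 0` twice a few seconds apart and subtracting the `time_us` columns gives the residency per state over that interval; a large share in state 0 or 1 on an otherwise idle system would be consistent with the behaviour described above.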
Something has definitely changed from version 4.12 to 4.13 (increased CPU usage, Windows 10 crashes in KVM). I am currently testing the latest available stable kernel, 4.14.4; since yesterday I am not yet 100% sure whether everything is completely fine with this version. Over the next few days I'll do some tests and share the findings.

I measured CPU statistics for 2 minutes on each kernel (an equal 2 minutes each time, thanks to a simple but effective bash script) using these commands:

    CPUSPEED=`cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq`
    CPUTEMP=`cat /sys/devices/platform/coretemp.0/hwmon/hwmon*/temp1_input`

scaling_cur_freq is reported in kHz and temp1_input in millidegrees Celsius, so 920994.7 means ~921 MHz and 33681 means ~33.7 °C.

===============================
Kernel   Average CPU speed (kHz)   Average CPU temp (m°C)
4.12.8   920994.7375               33681.06312
4.13.6   912368.7276               34299.00332
4.14.4   917715.0432               33691.0299
===============================
Top 20 lowest CPU speeds (kHz) on kernel 4.12.8:
836169 894836 896649 896649 897967 899450 899450 899780 899780 899780 899780 899780 899780 899780 899780 899780 899780 899945 899945 899945

Top 20 lowest CPU speeds (kHz) on kernel 4.13.6:
898502 898802 899267 899790 900000 900000 900000 900043 900049 900053 900053 900055 900059 900060 900071 900072 900074 900076 900089 900093

Top 20 lowest CPU speeds (kHz) on kernel 4.14.4:
899080 899466 899673 899700 899703 899711 899713 899738 899756 899771 899773 899782 899807 899821 899823 899824 899831 899832 899835 899840
===============================
Number of samples at each temperature (°C); each row totals 301 samples:
Kernel   37°C  36°C  35°C  34°C  33°C
4.12.8      1     1    15   168   116
4.13.6      4    27    73   148    49
4.14.4      0     0    20   168   113
===============================

In conclusion, the issue seems to be gone on the 4.14.4 kernel, but the results still show that something got slightly broken in kernel 4.13. I haven't yet tested whether the Windows 10 VM (KVM/QEMU) still crashes on 4.14.4.
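The reporter's measurement script was not posted. A hypothetical re-creation, assuming one sample per fixed interval and reading the same two sysfs files as the commands above, could look like this; it also converts the raw units (kHz and millidegrees Celsius) into MHz and °C. `FREQ_FILE`, `TEMP_FILE` and the argument defaults are assumptions, not part of the original report.

```shell
#!/usr/bin/env bash
# Hypothetical re-creation of the reporter's sampling loop (original not posted).
# sample_cpu_stats [SAMPLES] [INTERVAL_SECONDS]
# Reads cpu0's scaling_cur_freq (kHz) and coretemp temp1_input (millidegrees C)
# once per interval, then prints the averages in MHz and degrees Celsius.
# FREQ_FILE / TEMP_FILE are assumed overrides for the two input files.
sample_cpu_stats() {
  local samples=${1:-120} interval=${2:-1} i
  local temps=(/sys/devices/platform/coretemp.0/hwmon/hwmon*/temp1_input)
  local freq_file=${FREQ_FILE:-/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq}
  local temp_file=${TEMP_FILE:-${temps[0]}}
  for ((i = 0; i < samples; i++)); do
    printf '%s %s\n' "$(cat "$freq_file")" "$(cat "$temp_file")"
    sleep "$interval"
  done | LC_ALL=C awk '{ f += $1; t += $2; n++ }
    END { if (n) printf "samples=%d avg_freq_MHz=%.1f avg_temp_C=%.2f\n", n, f / n / 1000, t / n / 1000 }'
}
```

`sample_cpu_stats 120 1` collects roughly two minutes of samples at one-second intervals. Averages hide the distribution, which is why the lowest-speed lists and temperature histograms above are useful alongside them.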
I just rebooted the VM many times, wrote an essay in MS Word, kept 30+ tabs open in Firefox, etc. Not a single crash! Still stable after a few hours. The computer can be fully suspended and resumed multiple times while the VM is running, still with no issues at all. I believe this bug can be closed, as it is no longer reproducible in 4.14.4 (and possibly later versions).