Bug 197469

Summary: CPU runs at higher minimum frequencies on kernel 4.13 than in previous versions
Product: Power Management Reporter: Erikas Rudinskas (erikmnkl)
Component: intel_pstate Assignee: Zhang Rui (rui.zhang)
Status: CLOSED UNREPRODUCIBLE    
Severity: normal CC: dsmythies, erikmnkl, fcastillousfq, lenb, rui.zhang
Priority: P1    
Hardware: Intel   
OS: Linux   
Kernel Version: 4.13 Subsystem:
Regression: Yes Bisected commit-id:
Attachments: Logs for CPU usage, turbostat and greped CPU stats
Logs as per request in comment #7
Idle PowerNightmare test

Description Erikas Rudinskas 2017-10-27 09:20:54 UTC
I have a laptop with a Kaby Lake CPU (i5-7200U) and I noticed that the fan spins more often than it did before a system update. I checked what frequency the CPU stays at when idle and found that it basically never goes below 800 MHz. On 4.12 and earlier versions it used to stay at about 550-650 MHz (minimum reported frequency: 400 MHz). Moreover, in the public Arch Linux Facebook group at least one other person with a Kaby Lake CPU confirmed this behaviour.

Operating system: Arch Linux (64bit)
Kernel: 4.13.9
Hardware: Asus Zenbook UX430UQ

See more details on Facebook post: https://www.facebook.com/photo.php?fbid=2002853513308006
Comment 1 Zhang Rui 2017-10-30 03:03:02 UTC
Please
1. attach the output of "grep . /sys/devices/system/cpu/cpu*/cpufreq/*"
2. attach the turbostat.out file after running "sudo turbostat --debug --interval 5 --out turbostat.out" for 10 seconds
on
a. the 4.12 kernel, when the CPU frequency is at ~600 MHz (idle)
b. the 4.13 kernel, when the CPU frequency is at ~800 MHz (idle)
Comment 2 Erikas Rudinskas 2017-10-30 06:52:35 UTC
Created attachment 260429 [details]
Logs for CPU usage, turbostat and greped CPU stats

Submitting requested logs.
Comment 3 Erikas Rudinskas 2017-10-30 06:53:56 UTC
I've extracted "* CPU usage.txt" files using this command:

cpupower frequency-info | grep asserted | awk -v date="$(date +"%Y-%m-%d %r : ")" '{ print date $4$5 }'

You can clearly see the difference between 2 versions :)
Comment 4 Len Brown 2017-10-30 23:19:58 UTC
The 4.12 turbostat and the 4.13 turbostat both show the frequency getting down into the mid 500MHz range.

package C-states also look similar.

PkgWatt looks like 1.7 Watts in both cases.

are you sure these observations were made running the different kernels?
Comment 5 Erikas Rudinskas 2017-10-30 23:28:54 UTC
Yes, I confirm these log files were created on different kernel versions. There was no mistake made and the log files are not mixed up.

In my opinion, kernel 4.13 is simply not stable overall. My VM crashes/hangs on it, and my home desktop with Manjaro sometimes does not wake up from sleep. Could this be related? I've also raised another bug for the VM issue: https://bugzilla.kernel.org/show_bug.cgi?id=197449

I am on 4.12 now which seems to be stable.
Comment 6 Erikas Rudinskas 2017-11-03 22:57:46 UTC
1) The issue is between the latest 4.12.X and the first 4.13.0 kernel versions. 
2) Latest available 4.13.11 is still affected with this issue.
3) Didn't have a chance to test 4.14.

I am still on 4.12 kernel.
Comment 7 Zhang Rui 2017-11-05 07:22:14 UTC
(In reply to Erikas Rudinskas from comment #5)
> Yes, i confirm these log files were created on different kernel versions.
> There was no mistake made and log files are not mixed up.

Well, we'd like turbostat to show the difference between the CPU running at ~500 MHz on 4.12 and ~800 MHz on 4.13, but your previous attachment doesn't show that difference.

so what I'd like to see is something like below
in 4.12,
if (cpupower output < 600MHZ) {
   turbostat --debug --out turbostat-4-12.out
   grep . /sys/devices/system/cpu/cpu*/cpufreq/ > sysfs-4-12.out
   cpupower frequency-info > cpupower-4-12.out
}
in 4.13,
if (cpupower output > 700MHZ && cpupower output < 900MHZ) {
   turbostat --debug --out turbostat-4-13.out
   grep . /sys/devices/system/cpu/cpu*/cpufreq/ > sysfs-4-13.out
   cpupower frequency-info > cpupower-4-13.out
}

the cpupower.out and sysfs.out are used to confirm the problem you mentioned in this bug.
and the turbostat.out is used to tell us what happens when the bug is reproduced.

so please do capture the turbostat output when the cpupower shows the difference you described in this bug report.
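
The request above could be automated roughly as follows. This is a minimal sketch, not the procedure the developers used. The function names and the CPU_ROOT variable are hypothetical. It assumes that cpu0's scaling_cur_freq (in kHz) is an acceptable stand-in for the cpupower reading. Two details differ from the pseudocode on purpose: grep needs the trailing /* (as in comment 1) so that it reads the files rather than the directory, and turbostat is given a command to run so that it exits by itself instead of printing intervals until interrupted.

```shell
#!/usr/bin/env bash
# Sketch only: function and variable names are illustrative and do not come
# from the bug report. CPU_ROOT can be overridden for dry runs.
CPU_ROOT="${CPU_ROOT:-/sys/devices/system/cpu}"

# freq_in_window LOW_KHZ HIGH_KHZ
# Succeeds when cpu0's scaling_cur_freq (reported in kHz) lies strictly
# between the two bounds.
freq_in_window() {
    local low_khz=$1 high_khz=$2 cur_khz
    cur_khz=$(cat "$CPU_ROOT/cpu0/cpufreq/scaling_cur_freq") || return 2
    [ "$cur_khz" -gt "$low_khz" ] && [ "$cur_khz" -lt "$high_khz" ]
}

# capture_logs TAG
# Writes sysfs-TAG.out, cpupower-TAG.out and turbostat-TAG.out (run as root).
capture_logs() {
    local tag=$1
    grep . "$CPU_ROOT"/cpu*/cpufreq/* > "sysfs-$tag.out"
    cpupower frequency-info > "cpupower-$tag.out"
    # Passing a command makes turbostat measure for that command's lifetime
    # and then exit.
    turbostat --debug --out "turbostat-$tag.out" sleep 10
}

# wait_and_capture TAG LOW_KHZ HIGH_KHZ
# Polls once per second until the frequency is inside the window, then
# captures the logs.
wait_and_capture() {
    local tag=$1 low_khz=$2 high_khz=$3
    [ -r "$CPU_ROOT/cpu0/cpufreq/scaling_cur_freq" ] || return 1
    until freq_in_window "$low_khz" "$high_khz"; do
        sleep 1
    done
    capture_logs "$tag"
}
```

After sourcing the file as root, "wait_and_capture 4-12 0 600000" would be run on the 4.12 kernel and "wait_and_capture 4-13 700000 900000" on the 4.13 kernel.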
Comment 8 Zhang Rui 2017-11-05 07:24:12 UTC
(In reply to Erikas Rudinskas from comment #5)
> In my opinion, kernel 4.13 is simply not stable overall. My VM
> crashes/hangs on it, and my home desktop with Manjaro sometimes does not
> wake up from sleep. Could this be related?

TBH, I don't know.
Let's stick with this issue first, and see if the other problems still exist after this issue is resolved.
Comment 9 Erikas Rudinskas 2017-11-11 16:23:13 UTC
Created attachment 260613 [details]
Logs as per request in comment #7

See the commands.txt file inside for the commands used. One command didn't work, another one didn't stop...
Comment 10 Erikas Rudinskas 2017-11-11 16:29:29 UTC
(In reply to Len Brown from comment #4)
> The 4.12 turbostat and the 4.13 turbostat both show the frequency getting
> down into the mid 500MHz range.
> 
> package C-states also look similar.
> 
> PkgWatt looks like 1.7 Watts in both cases.
> 
> are you sure these observations were made running the different kernels?

My fan spins faster and more often on 4.13 than on 4.12 + cpupower frequency-info shows higher CPU usage, so it's not just "my imagination".

Any other ideas what causes such higher CPU frequencies?
Comment 11 Len Brown 2017-11-20 23:16:50 UTC
one way to get to the bottom of what changed may be to....

for each kernel -- working and failing...
set the frequency to a constant value in sysfs cpufreq

and watch the utilization (eg with turbostat)

because the cause of the problem could simply be that the utilization has gone up -- and cpufreq is simply responding to that...
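
One way to hold the frequency constant through sysfs cpufreq is to set scaling_min_freq and scaling_max_freq to the same value on every CPU. The sketch below assumes that approach; pin_freq and CPU_ROOT are hypothetical names, and with intel_pstate and HWP the hardware may not follow the limits exactly. The limit that needs to move out of the way is written first, so that the minimum is never set above the current maximum.

```shell
#!/usr/bin/env bash
# Sketch only: pin_freq and CPU_ROOT are illustrative names.
CPU_ROOT="${CPU_ROOT:-/sys/devices/system/cpu}"

# pin_freq TARGET_KHZ
# Sets scaling_min_freq and scaling_max_freq of every CPU to TARGET_KHZ.
pin_freq() {
    local target_khz=$1 policy cur_max
    for policy in "$CPU_ROOT"/cpu[0-9]*/cpufreq; do
        [ -d "$policy" ] || continue
        cur_max=$(cat "$policy/scaling_max_freq")
        if [ "$target_khz" -gt "$cur_max" ]; then
            # Raise the ceiling before the floor.
            echo "$target_khz" > "$policy/scaling_max_freq"
            echo "$target_khz" > "$policy/scaling_min_freq"
        else
            # Lower the floor before the ceiling.
            echo "$target_khz" > "$policy/scaling_min_freq"
            echo "$target_khz" > "$policy/scaling_max_freq"
        fi
    done
}
```

Run as root on each kernel, for example "pin_freq 800000", then compare the Busy% column in turbostat between the working and failing kernels.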
Comment 12 Zhang Rui 2017-11-21 01:30:47 UTC
this is from 4.13 kernel
/sys/devices/system/cpu/cpu2/cpufreq/energy_performance_preference:balance_performance
this is from 4.12 kernel
/sys/devices/system/cpu/cpu1/cpufreq/energy_performance_preference:balance_power

But according to
commit 3cedbc5a6d7f7c5539e139f89ec9f6e1ed668418
Author:     Len Brown <len.brown@intel.com>
AuthorDate: Mon May 1 23:06:08 2017 -0400
Commit:     Len Brown <len.brown@intel.com>
CommitDate: Thu May 11 21:27:53 2017 -0400

    intel_pstate: use updated msr-index.h HWP.EPP values
    
    intel_pstate exports sysfs attributes for setting and observing HWP.EPP.
    These attributes use strings to describe 4 operating states, and
    inside the driver, these strings are mapped to numerical register
    values.
    
    The authorative mapping between the strings and numerical HWP.EPP values
    are now globally defined in msr-index.h, replacing the out-dated
    mapping that were open-coded into intel_pstate.c
    
    new old string
    --- --- ------
      0   0 performance
    128  64 balance_performance
    192 128 balance_power
    255 192 power
    
    Note that the HW and BIOS default value on most system is 128,
    which intel_pstate will now call "balance_performance"
    while it used to call it "balance_power".
    
    Signed-off-by: Len Brown <len.brown@intel.com>

It seems that this does not bring any functional change.
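
The renaming can be checked against the table in the commit message. The sketch below encodes that table in a hypothetical helper, epp_label. The scheme names "old" and "new" are labels chosen here, not kernel identifiers. It shows that the same raw HWP.EPP value of 128 is reported under two different strings.

```shell
#!/usr/bin/env bash
# Sketch only: epp_label is an illustrative helper built from the table in
# the commit message quoted above.

# epp_label SCHEME VALUE
# Prints the sysfs string for a raw HWP.EPP value.
# SCHEME is "old" (before the commit) or "new" (after it).
epp_label() {
    local scheme=$1 value=$2
    case "$scheme:$value" in
        old:0|new:0)     echo performance ;;
        old:64|new:128)  echo balance_performance ;;
        old:128|new:192) echo balance_power ;;
        old:192|new:255) echo power ;;
        *)               echo unknown; return 1 ;;
    esac
}

for scheme in old new; do
    echo "EPP 128 under $scheme mapping: $(epp_label "$scheme" 128)"
done
```

This prints balance_power for the old mapping and balance_performance for the new one, which matches the two sysfs lines quoted from 4.12 and 4.13.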
Comment 13 Erikas Rudinskas 2017-11-21 03:45:03 UTC
Thanks for the findings! Is there anything that needs to be done/provided from my side?
Comment 14 Doug Smythies 2017-11-25 16:17:56 UTC
You should still try kernel 4.14.
And we should figure out if bug 197945 and this one are duplicates of each other.
Comment 15 Erikas Rudinskas 2017-11-26 14:14:56 UTC
Tried 4.14.0 - nothing has changed for me. CPU frequencies still stay higher.
Comment 16 Doug Smythies 2017-11-26 19:41:06 UTC
O.K. so this is not the same as bug 197945.
There do seem to be some significant differences between kernel 4.12 and 4.13 for some benchmarks. See:
https://www.phoronix.com/scan.php?page=news_item&px=Linux-4.10-4.15-Kernel-Tests
However the idle power test did not show much change:
https://www.phoronix.com/scan.php?page=news_item&px=Linux-4.6-4.15-Battery-Power
Also note that the idle power on my test computer (a server with no GUI, and a bunch of services disabled. i.e. my "idle" is way way more "idle" than any computer with a GUI) has been consistent at 3.82 +- ~0.05 watts ever since I can recall.
Comment 17 Doug Smythies 2017-11-30 03:17:53 UTC
Using kernel 4.14.0, please try disabling idle state 0.

I use a script (run as sudo):

$ cat ./idle_state0_disable
#! /bin/bash
echo "idle state 0: before:"
cat /sys/devices/system/cpu/cpu*/cpuidle/state0/disable

for file in /sys/devices/system/cpu/cpu*/cpuidle/state0/disable; do echo "1" > "$file"; done

echo "idle state 0: after:"
cat /sys/devices/system/cpu/cpu*/cpuidle/state0/disable
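
A parameterized variant of the script above may be handy for undoing the change afterwards or for trying other idle states. This is only a sketch: set_idle_state and CPU_ROOT are hypothetical names, and it assumes the same cpuidle sysfs layout that the script above uses.

```shell
#!/usr/bin/env bash
# Sketch only: set_idle_state and CPU_ROOT are illustrative names.
CPU_ROOT="${CPU_ROOT:-/sys/devices/system/cpu}"

# set_idle_state STATE VALUE
# Writes VALUE (1 = disabled, 0 = enabled) to the "disable" attribute of
# idle state STATE on every CPU that exposes it.
set_idle_state() {
    local state=$1 value=$2 file
    for file in "$CPU_ROOT"/cpu[0-9]*/cpuidle/state"$state"/disable; do
        [ -e "$file" ] || continue
        echo "$value" > "$file"
    done
}
```

Run as root, "set_idle_state 0 1" disables idle state 0 and "set_idle_state 0 0" enables it again.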
Comment 18 Doug Smythies 2017-12-06 22:04:55 UTC
(In reply to Doug Smythies from comment #16)
> Also note that the idle power on my test computer
> has been consistent at 3.82 +- ~0.05 watts ever since I
> can recall.

Unless I get what we are calling "PowerNightmares", where shallow idle states basically get forgotten about for a very long time (up to 4 seconds), and can consume significant extra energy.

That is why I asked for the idle state 0 disable test in comment 17.

I'll attach a graph of an idle PowerNightmare test from my system in a moment.
Comment 19 Doug Smythies 2017-12-06 22:10:42 UTC
Created attachment 261047 [details]
Idle PowerNightmare test

The graph is from an idle PowerNightmare test on my system.
Notice the lower average package power consumption with idle state 0 disabled, and a further lowering with more idle states disabled.
Comment 20 Erikas Rudinskas 2017-12-11 11:59:08 UTC
Something has definitely changed from version 4.12 to 4.13 (increased CPU usage, Windows 10 crashes in KVM).

I have been testing the latest available stable kernel (4.14.4) since yesterday and am not yet 100% sure that everything is completely fine with this version.

In the next few days I'll do some tests and share the findings.
Comment 21 Erikas Rudinskas 2017-12-12 00:45:53 UTC
I measured CPU statistics for 2 minutes on each kernel (exactly 2 minutes each, thanks to a simple but effective bash script) using these commands:

CPUSPEED=`cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq`
CPUTEMP=`cat /sys/devices/platform/coretemp.0/hwmon/hwmon*/temp1_input`

===============================

4.12.8
Average CPU speed: 920994.7375 kHz
Average CPU temp: 33681.06312 millidegrees C

4.13.6
Average CPU speed: 912368.7276 kHz
Average CPU temp: 34299.00332 millidegrees C

4.14.4
Average CPU speed: 917715.0432 kHz
Average CPU temp: 33691.0299 millidegrees C

===============================

Top 20 lowest CPU speeds on kernel: 4.12.8
836169
894836
896649
896649
897967
899450
899450
899780
899780
899780
899780
899780
899780
899780
899780
899780
899780
899945
899945
899945

Top 20 lowest CPU speeds on kernel: 4.13.6
898502
898802
899267
899790
900000
900000
900000
900043
900049
900053
900053
900055
900059
900060
900071
900072
900074
900076
900089
900093

Top 20 lowest CPU speeds on kernel: 4.14.4
899080
899466
899673
899700
899703
899711
899713
899738
899756
899771
899773
899782
899807
899821
899823
899824
899831
899832
899835
899840

===============================

Temperature:count (kernel: 4.12.8)
37:1
36:1
35:15
34:168
33:116

Temperature:count (kernel: 4.13.6)
37:4
36:27
35:73
34:148
33:49

Temperature:count (kernel: 4.14.4)
37:0
36:0
35:20
34:168
33:113

===============================

In conclusion, the issue seems to be gone on kernel 4.14.4, but the results still show that something was slightly broken in kernel 4.13. I haven't yet tested whether the Windows 10 VM (KVM/QEMU) still crashes on kernel 4.14.4.
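
The measurement script itself is not included in this comment. Statistics of the kind shown above could be produced along the following lines. This is a sketch under assumptions: all function names are hypothetical, the sampling interval is a guess, and the log format (one "frequency temperature" pair per line, in kHz and millidegrees Celsius as the two sysfs files report them) is chosen here for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: all function names are illustrative.

# sample_cpu LOG COUNT [INTERVAL_SECONDS]
# Appends COUNT lines of "freq_khz temp_millidegrees" to LOG.
sample_cpu() {
    local log=$1 count=$2 interval=${3:-1} i=0 freq temp
    while [ "$i" -lt "$count" ]; do
        freq=$(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq)
        temp=$(cat /sys/devices/platform/coretemp.0/hwmon/hwmon*/temp1_input)
        echo "$freq $temp" >> "$log"
        sleep "$interval"
        i=$((i + 1))
    done
}

# average_column LOG COLUMN: mean of one column of the log.
average_column() {
    awk -v col="$2" '{ sum += $col; n++ } END { if (n) printf "%.4f\n", sum / n }' "$1"
}

# lowest_speeds LOG [N]: the N lowest frequencies, ascending (default 20).
lowest_speeds() {
    awk '{ print $1 }' "$1" | sort -n | head -n "${2:-20}"
}

# temp_histogram LOG: "degrees:count" lines, hottest first, with the
# temperature truncated to whole degrees Celsius.
temp_histogram() {
    awk '{ count[int($2 / 1000)]++ } END { for (t in count) print t ":" count[t] }' "$1" |
        sort -t: -k1,1nr
}
```

For example, "sample_cpu stats-4.14.4.log 120" followed by "average_column stats-4.14.4.log 1", "lowest_speeds stats-4.14.4.log" and "temp_histogram stats-4.14.4.log" would give one kernel's block of results.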
Comment 22 Erikas Rudinskas 2017-12-12 01:40:50 UTC
I just did lots of VM reboots, essay writing in MS Word, 30+ tabs in Firefox, etc. Not a single crash! A few hours in, it is still stable. The computer can be fully suspended and resumed multiple times while the VM is running, with no issues at all.

I believe this bug can be closed as it is no longer reproducible in 4.14.4 (and possibly later versions).