Bug 66581 - intel_pstate/powersave - cpu frequency remains at high level (Turbo) after suspend/resume
Status: CLOSED UNREPRODUCIBLE
Alias: None
Product: Power Management
Classification: Unclassified
Component: intel_pstate
Hardware: Intel Linux
Importance: P1 normal
Assignee: Chen Yu
URL:
Keywords:
Duplicates: 58801 65301
Depends on:
Blocks:
 
Reported: 2013-12-05 09:36 UTC by Lado Kumsiashvili
Modified: 2016-05-22 22:48 UTC (History)
15 users

See Also:
Kernel Version: 3.12.2-gentoo
Subsystem:
Regression: No
Bisected commit-id:


Attachments
output of cpupower (2.90 KB, text/plain)
2013-12-05 09:36 UTC, Lado Kumsiashvili
Details
/usr/src/linux/.config (94.13 KB, text/plain)
2013-12-05 09:37 UTC, Lado Kumsiashvili
Details
ps aux (19.65 KB, text/plain)
2013-12-05 09:37 UTC, Lado Kumsiashvili
Details
cat /proc/cpuinfo (7.98 KB, text/plain)
2013-12-05 09:38 UTC, Lado Kumsiashvili
Details
powertop --html before resume (92.44 KB, text/html)
2013-12-05 19:35 UTC, Lado Kumsiashvili
Details
powertop --html after resume (86.08 KB, text/html)
2013-12-05 19:35 UTC, Lado Kumsiashvili
Details
powertop --html --workload=sleep 300 before suspend (64.21 KB, text/html)
2014-01-18 10:16 UTC, Mykal Valentine
Details
powertop --html --workload=sleep 300 after resume (61.10 KB, text/html)
2014-01-18 10:21 UTC, Mykal Valentine
Details
/usr/bin/perf record -a -c 1 -e power:pstate_sample sleep 5 from before suspend (6.17 KB, text/plain)
2014-01-18 20:10 UTC, Mykal Valentine
Details
/usr/bin/perf record -a -c 1 -e power:pstate_sample sleep 5 after resume (14.21 KB, text/plain)
2014-01-18 20:11 UTC, Mykal Valentine
Details
perf diff perf.data.old perf.data (238 bytes, text/plain)
2014-01-18 20:17 UTC, Mykal Valentine
Details
/usr/bin/perf record -a -c 1 -e power:pstate_sample sleep 5 after resume (1.65 MB, text/plain)
2014-01-21 06:01 UTC, Mykal Valentine
Details
Patch to take non-idle time into account for core busy calculation (2.54 KB, patch)
2014-01-28 23:10 UTC, Dirk Brandewie
Details | Diff
Perf script output after patch (9.96 KB, text/plain)
2014-01-29 03:45 UTC, Mykal Valentine
Details
screenshot (723.40 KB, image/png)
2014-02-06 21:09 UTC, tomasi
Details
powertop --html (62.92 KB, text/html)
2014-03-16 21:37 UTC, Joost
Details
powertop --html after boot from complete power off, no suspend done (56.53 KB, text/html)
2014-03-19 07:20 UTC, Joost
Details
An example turbostat capture (1012 bytes, text/plain)
2015-10-20 22:42 UTC, Doug Smythies
Details

Description Lado Kumsiashvili 2013-12-05 09:36:53 UTC
Created attachment 117541 [details]
output of cpupower

After resume, all the cores show the max turbo frequency for my CPU.
This is unrealistic, as the max (non-turbo) frequency is 2.80 GHz and all the cores cannot run at the max turbo frequency at once.
Comment 1 Lado Kumsiashvili 2013-12-05 09:37:34 UTC
Created attachment 117551 [details]
/usr/src/linux/.config
Comment 2 Lado Kumsiashvili 2013-12-05 09:37:50 UTC
Created attachment 117561 [details]
ps aux
Comment 3 Lado Kumsiashvili 2013-12-05 09:38:15 UTC
Created attachment 117571 [details]
cat /proc/cpuinfo
Comment 4 Dirk Brandewie 2013-12-05 14:42:56 UTC
Could you run "powertop --html"  before and after resume and attach the results.
Comment 5 Lado Kumsiashvili 2013-12-05 19:35:17 UTC
Created attachment 117671 [details]
powertop --html before resume
Comment 6 Lado Kumsiashvili 2013-12-05 19:35:42 UTC
Created attachment 117681 [details]
powertop --html after resume
Comment 7 Lado Kumsiashvili 2013-12-05 19:45:20 UTC
OK, attached. Another strange thing, which I had not noticed before because I have only had this laptop for a couple of days: after booting, cpufreq showed 2.80 GHz on all cores without any load. Then I executed

max_perf_pct showed 100

and min_perf_pct showed 21

It never went below 2.80 GHz


After I have executed

echo 21 > /sys/devices/system/cpu/intel_pstate/max_perf_pct 

it went down to 800 MHz, as expected, in the cpupower output

genlap ~ # cpupower -c 0-4 frequency-info | grep current

  current policy: frequency should be within 800 MHz and 3.80 GHz.
  current CPU frequency is 800 MHz (asserted by call to hardware).
  current policy: frequency should be within 800 MHz and 3.80 GHz.
  current CPU frequency is 800 MHz (asserted by call to hardware).
  current policy: frequency should be within 800 MHz and 3.80 GHz.
  current CPU frequency is 800 MHz (asserted by call to hardware).
  current policy: frequency should be within 800 MHz and 3.80 GHz.
  current CPU frequency is 800 MHz (asserted by call to hardware).
  current policy: frequency should be within 800 MHz and 3.80 GHz.
  current CPU frequency is 800 MHz (asserted by call to hardware).

  current CPU frequency is 800 MHz (asserted by call to hardware).

When I write the value 100 to max_perf_pct, the frequency now varies between 2.80 GHz and somewhere around 3.30 GHz, and changes every time I run cpupower.


But before I touched max_perf_pct after boot, it was constantly at 2.80 GHz.

With regards,

Lado
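
The numbers in this comment can be cross-checked with a little arithmetic. The sketch below assumes that min_perf_pct and max_perf_pct are percentages of the maximum turbo frequency (3.80 GHz, the upper policy limit in the cpupower output above); pct_to_mhz is a hypothetical helper for illustration, not part of any tool.

```python
def pct_to_mhz(pct, turbo_mhz):
    """Frequency that a *_perf_pct value corresponds to.

    Assumption: the percentage is relative to the maximum turbo frequency.
    """
    return pct * turbo_mhz / 100


if __name__ == "__main__":
    turbo_mhz = 3800  # upper policy limit reported by cpupower
    for pct in (21, 100):
        print(f"{pct:3d}% -> {pct_to_mhz(pct, turbo_mhz):.0f} MHz")
```

Under that assumption, 21% of 3800 MHz is 798 MHz, which is consistent with the 800 MHz that cpupower reported after max_perf_pct was set to 21.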
Comment 8 tomasi 2013-12-06 21:02:18 UTC
Looks the same as bug 65301.
https://bugzilla.kernel.org/show_bug.cgi?id=65301
Comment 9 Lan Tianyu 2013-12-09 05:52:34 UTC
*** Bug 65301 has been marked as a duplicate of this bug. ***
Comment 10 tomasi 2014-01-14 21:05:48 UTC
just checking..
Is there any progress on that?
Comment 11 Jeremy A 2014-01-16 20:04:50 UTC
Also affected by this bug on Dell Inspiron 7000.

Linux jitte 3.12.7-2-ARCH #1 SMP PREEMPT Sun Jan 12 13:09:09 CET 2014 x86_64 GNU/Linux

Would be happy to help if you need any feedback.

When the laptop starts, the governor seems to do the job right, with the minimum of 800 MHz hit most of the time when idle. After suspend/resume, the clock is always at the maximum: 3000 MHz.
Comment 12 Dirk Brandewie 2014-01-17 16:51:05 UTC
(In reply to Lado Kumsiashvili from comment #7)
> OK, attached. Another strange thing. I think I have never noticed before,
> because I have this laptop since a couple of days. After I have booted
> cpufreq showed 2.80ghz at all cores without any last. THen I have executed
> 

The before and after powertop outputs are nothing alike in terms of what is running on the system before and after resume. Can you get powertop output that has the same processes running before and after resume?

> max_perf_pct showed 100
> 
> and min_perf_pct showed 21
> 
> I has never gone below 2.80Ghz
> 
> 
> After I have executed
> 
> echo 21 > /sys/devices/system/cpu/intel_pstate/max_perf_pct 
> 
> it went down to 800Mhz as expected in cpupower tool view
> 
> genlap ~ # cpupower -c 0-4 frequency-info | grep current
> 
>   current policy: frequency should be within 800 MHz and 3.80 GHz.
>   current CPU frequency is 800 MHz (asserted by call to hardware).
>   current policy: frequency should be within 800 MHz and 3.80 GHz.
>   current CPU frequency is 800 MHz (asserted by call to hardware).
>   current policy: frequency should be within 800 MHz and 3.80 GHz.
>   current CPU frequency is 800 MHz (asserted by call to hardware).
>   current policy: frequency should be within 800 MHz and 3.80 GHz.
>   current CPU frequency is 800 MHz (asserted by call to hardware).
>   current policy: frequency should be within 800 MHz and 3.80 GHz.
>   current CPU frequency is 800 MHz (asserted by call to hardware).
> 
>   current CPU frequency is 800 MHz (asserted by call to hardware).
> 
> When I put the value 100 in the max_perf_pct it goes now between 2.80 and
> somewhere 3.30Ghz and changes all the time as I execute cpupower. 
> 
> 
> But, before I have touched this max_perf_pct  after the boot it was
> constantly on 2.80.
> 
> With regards,
> 
> Lado
Comment 13 Dirk Brandewie 2014-01-17 17:09:14 UTC
(In reply to Jeremy A from comment #11)
> Also affected by this bug on Dell Inspiron 7000.
> 
> Linux jitte 3.12.7-2-ARCH #1 SMP PREEMPT Sun Jan 12 13:09:09 CET 2014 x86_64
> GNU/Linux
> 
> Would be happy to help if you need any feedback.
> 
> When laptop starts, governor seems to do the job right with mimimum: 800Mhz
> hit most of the time in idle. After suspend/resume, the clock is always max:
> 3000Mhz.

Could you boot the system and run "powertop --html --workload=sleep 300" before and after a suspend/resume cycle? It is best to wait a couple of minutes after boot to let the system settle.
Comment 14 Dirk Brandewie 2014-01-17 17:24:01 UTC
Here is a patch, queued for 3.14, that allows you to use perf to collect
the internal state of the intel_pstate driver.

Could you apply the patch and run:
   perf record -a -c 1 -e power:pstate_sample sleep 5

before and after a suspend/resume cycle.

You can get human readable output with:
   perf script


commit e44be3a0b78e301abe46cc7f34bcf88736e43f12
Author: Dirk Brandewie <dirk.j.brandewie@intel.com>
Date:   Fri Jan 17 08:45:50 2014 -0800

    intel_pstate: Add trace point to report internal state.
    
    Add perf trace event "power:pstate_sample" to report driver state to
    aid in diagnosing issues reported against intel_pstate.
    
    Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
---
 drivers/cpufreq/intel_pstate.c | 26 +++++++++++++++++++++
 include/trace/events/power.h   | 53 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 79 insertions(+)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 99d8ab5..8e2f31a 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -47,6 +47,8 @@ static inline int32_t div_fp(int32_t x, int32_t y)
 	return div_s64((int64_t)x << FRAC_BITS, (int64_t)y);
 }
 
+static u64 energy_divisor;
+
 struct sample {
 	int32_t core_pct_busy;
 	u64 aperf;
@@ -449,6 +451,7 @@ static inline void intel_pstate_sample(struct cpudata *cpu)
 
 	rdmsrl(MSR_IA32_APERF, aperf);
 	rdmsrl(MSR_IA32_MPERF, mperf);
+
 	cpu->sample_ptr = (cpu->sample_ptr + 1) % SAMPLE_COUNT;
 	cpu->samples[cpu->sample_ptr].aperf = aperf;
 	cpu->samples[cpu->sample_ptr].mperf = mperf;
@@ -493,6 +496,7 @@ static inline void intel_pstate_adjust_busy_pstate(struct cpudata *cpu)
 	ctl = pid_calc(pid, busy_scaled);
 
 	steps = abs(ctl);
+
 	if (ctl < 0)
 		intel_pstate_pstate_increase(cpu, steps);
 	else
@@ -502,10 +506,17 @@ static inline void intel_pstate_adjust_busy_pstate(struct cpudata *cpu)
 static void intel_pstate_timer_func(unsigned long __data)
 {
 	struct cpudata *cpu = (struct cpudata *) __data;
+	struct sample *sample;
+	u64 energy;
 
 	intel_pstate_sample(cpu);
+
+	sample = &cpu->samples[cpu->sample_ptr];
+	rdmsrl(MSR_PKG_ENERGY_STATUS, energy);
+
 	intel_pstate_adjust_busy_pstate(cpu);
 
+
 	if (cpu->pstate.current_pstate == cpu->pstate.min_pstate) {
 		cpu->min_pstate_count++;
 		if (!(cpu->min_pstate_count % 5)) {
@@ -514,6 +525,15 @@ static void intel_pstate_timer_func(unsigned long __data)
 	} else
 		cpu->min_pstate_count = 0;
 
+	trace_pstate_sample(fp_toint(sample->core_pct_busy),
+			fp_toint(intel_pstate_get_scaled_busy(cpu)),
+			cpu->pstate.current_pstate,
+			sample->mperf,
+			sample->aperf,
+			div64_u64(energy, energy_divisor),
+			sample->freq);
+
+
 	intel_pstate_set_sample_time(cpu);
 }
 
@@ -707,6 +727,8 @@ static int __init intel_pstate_init(void)
 {
 	int cpu, rc = 0;
 	const struct x86_cpu_id *id;
+	u64 units;
+
 
 	if (no_load)
 		return -ENODEV;
@@ -728,8 +750,12 @@ static int __init intel_pstate_init(void)
 	if (rc)
 		goto out;
 
+	rdmsrl(MSR_RAPL_POWER_UNIT, units);
+	energy_divisor = 1 << ((units >> 8) & 0x1f); /* bits{12:8} */
+
 	intel_pstate_debug_expose_params();
 	intel_pstate_sysfs_expose_params();
+
 	return rc;
 out:
 	get_online_cpus();
diff --git a/include/trace/events/power.h b/include/trace/events/power.h
index cda100d..9e9475c 100644
--- a/include/trace/events/power.h
+++ b/include/trace/events/power.h
@@ -35,6 +35,59 @@ DEFINE_EVENT(cpu, cpu_idle,
 	TP_ARGS(state, cpu_id)
 );
 
+TRACE_EVENT(pstate_sample,
+
+	TP_PROTO(u32 core_busy,
+		u32 scaled_busy,
+		u32 state,
+		u64 mperf,
+		u64 aperf,
+		u32 energy,
+		u32 freq
+		),
+
+	TP_ARGS(core_busy,
+		scaled_busy,
+		state,
+		mperf,
+		aperf,
+		energy,
+		freq
+		),
+
+	TP_STRUCT__entry(
+		__field(u32, core_busy)
+		__field(u32, scaled_busy)
+		__field(u32, state)
+		__field(u64, mperf)
+		__field(u64, aperf)
+		__field(u32, energy)
+		__field(u32, freq)
+
+	),
+
+	TP_fast_assign(
+		__entry->core_busy = core_busy;
+		__entry->scaled_busy = scaled_busy;
+		__entry->state = state;
+		__entry->mperf = mperf;
+		__entry->aperf = aperf;
+		__entry->energy = energy;
+		__entry->freq = freq;
+		),
+
+	TP_printk("core_busy=%lu scaled=%lu state=%lu mperf=%llu aperf=%llu energy=%lu freq=%lu ",
+		(unsigned long)__entry->core_busy,
+		(unsigned long)__entry->scaled_busy,
+		(unsigned long)__entry->state,
+		(unsigned long long)__entry->mperf,
+		(unsigned long long)__entry->aperf,
+		(unsigned long)__entry->energy,
+		(unsigned long)__entry->freq
+		)
+
+);
+
 /* This file can get included multiple times, TRACE_HEADER_MULTI_READ at top */
 #ifndef _PWR_EVENT_AVOID_DOUBLE_DEFINING
 #define _PWR_EVENT_AVOID_DOUBLE_DEFINING
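
For comparing the "perf script" output before and after resume, a small parser can help. The following is a minimal sketch that relies only on the field names in the TP_printk format string of the patch above; the function name and the sample line (including its prefix) are made up for illustration and are not real captured output.

```python
import re

# Field names taken from the TP_printk format string of power:pstate_sample.
FIELD_RE = re.compile(r"(core_busy|scaled|state|mperf|aperf|energy|freq)=(\d+)")


def parse_pstate_sample(line):
    """Return the pstate_sample fields found in one output line as ints."""
    return {name: int(value) for name, value in FIELD_RE.findall(line)}


if __name__ == "__main__":
    # Hypothetical line; the text before the fields is an assumption.
    line = ("kworker 0 [001] 12.345: power:pstate_sample: core_busy=97 "
            "scaled=95 state=38 mperf=1000 aperf=970 energy=12 freq=3700 ")
    print(parse_pstate_sample(line))
```

Lines without any of the fields yield an empty dict, so the function can be mapped over a whole capture and the empty results filtered out.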
Comment 15 Mykal Valentine 2014-01-18 10:15:32 UTC
I tried using that patch, but I still didn't see the pstate events.  Maybe I'm missing some other patches.  The tree I put the patch in was 3.13-rc8.

Anyway, I did the powertop outputs for before and after the resume.
Comment 16 Mykal Valentine 2014-01-18 10:16:40 UTC
Created attachment 122471 [details]
powertop --html --workload=sleep 300 before suspend
Comment 17 Mykal Valentine 2014-01-18 10:21:05 UTC
Created attachment 122481 [details]
powertop --html --workload=sleep 300 after resume
Comment 18 Mykal Valentine 2014-01-18 20:10:53 UTC
Created attachment 122501 [details]
/usr/bin/perf record -a -c 1 -e power:pstate_sample sleep 5  from before suspend
Comment 19 Mykal Valentine 2014-01-18 20:11:24 UTC
Created attachment 122511 [details]
/usr/bin/perf record -a -c 1 -e power:pstate_sample sleep 5 after resume
Comment 20 Mykal Valentine 2014-01-18 20:17:38 UTC
Created attachment 122521 [details]
perf diff perf.data.old perf.data

I'm not sure if the diff is useful at all...  First arg is before suspend, second is after.
Comment 21 Dirk Brandewie 2014-01-20 17:18:43 UTC
(In reply to Mykal Valentine from comment #19)
> Created attachment 122511 [details]
> /usr/bin/perf record -a -c 1 -e power:pstate_sample sleep 5 after resume

So intel_pstate is working correctly AFAICT.  Something is putting a load on the
system that intel_pstate is reacting to.

Does the situation persist for a long time after resume, or does it "calm down" after a while?

It is strange that powertop is measuring 2W less power in the "after resume" case.

I don't know what is causing the load after resume. Could you try shutting down everything but the desktop, including the WLAN connection, and grab perf data again?
Comment 22 tomasi 2014-01-20 18:16:54 UTC
(In reply to Dirk Brandewie from comment #21) 
> Does the situation persist for a long time after resume or does it "calm
> down" after a while.

it persists forever (until next reboot).
Comment 23 Jeremy A 2014-01-20 21:09:48 UTC
Yup, same on my machine.
Comment 24 Mykal Valentine 2014-01-21 06:01:49 UTC
Created attachment 122871 [details]
/usr/bin/perf record -a -c 1 -e power:pstate_sample sleep 5 after resume

Okay, I did it again with most everything turned off including wifi.
Comment 25 Jeremy A 2014-01-25 20:43:36 UTC
Does anyone know of a hack to reset the intel_pstate load? The temperature of the laptop tends to be a bit high after resume due to the constant high frequency.
Comment 26 Dirk Brandewie 2014-01-28 23:10:20 UTC
Created attachment 123711 [details]
Patch to take non-idle time into account for core busy calculation

I think I have figured out what is happening. Could you try this patch to see if it fixes the issue?
Comment 27 Dirk Brandewie 2014-01-28 23:14:47 UTC
The msr-tools package from https://01.org/msr-tools/downloads makes it easy to see what P state is being requested for each core with the command:
   sudo rdmsr -a   0x199

The requested P state is in the upper eight bits.
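
The hex values printed by rdmsr can be decoded as sketched below. This assumes that "upper eight bits" refers to bits 15:8 of the value and that the bus clock is 100 MHz, so that the ratio times 100 approximates the requested frequency in MHz; the function name and the sample values are hypothetical.

```python
def requested_ratio(perf_ctl_hex):
    """Extract the requested P-state ratio (bits 15:8) from a hex string
    as printed by 'rdmsr -a 0x199'."""
    value = int(perf_ctl_hex, 16)
    return (value >> 8) & 0xFF


if __name__ == "__main__":
    for raw in ("0800", "2600"):  # made-up sample values
        ratio = requested_ratio(raw)
        # Assumption: 100 MHz bus clock, so ratio * 100 is roughly the MHz.
        print(f"0x{raw}: ratio {ratio}, about {ratio * 100} MHz")
```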
Comment 28 Mykal Valentine 2014-01-29 03:45:34 UTC
Created attachment 123721 [details]
Perf script output after patch

That seems to have done the trick, as far as I can tell.

At least, the cores are no longer stuck at turbo frequency.


Reading that MSR gave values of 800 for most cores and 1700 for one, mostly.

I've included the new perf script output.
Comment 29 tomasi 2014-01-30 14:30:45 UTC
Can anybody provide me with deb packages for amd64?
Comment 30 Dirk Brandewie 2014-02-03 16:41:26 UTC
*** Bug 58801 has been marked as a duplicate of this bug. ***
Comment 31 Jeremy A 2014-02-05 19:50:51 UTC
I would like to try this patch on Arch Linux. I checked out the kernel sources on github but can't find any trace of the patch yet.
Comment 32 Dirk Brandewie 2014-02-05 22:20:39 UTC
The patch is queued for the v3.14-rc2 merge and was marked for stable so it should end up in the stable tree.

In the meantime you can find it in Rafael's tree:
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git


commit fcb6a15c2e7e76d493e6f91ea889ab40e1c643a4
Comment 33 tomasi 2014-02-06 21:08:07 UTC
Hi Dirk,
I just compiled linux-pm with the patch.
The problem still persists.
Comment 34 tomasi 2014-02-06 21:09:12 UTC
Created attachment 124931 [details]
screenshot
Comment 35 tomasi 2014-02-06 21:10:10 UTC
Comment on attachment 124931 [details]
screenshot

after resume -> see right part - conky app
Comment 36 Dirk Brandewie 2014-02-10 21:02:17 UTC
Still searching for the difference between normal operation and resume :-(
Could someone that is still seeing the problem run the command:
   echo 0 > /sys/kernel/debug/pstate_snb/d_gain_pct
as root and tell me if there is any change?

This command (in fact, any write to the debugfs files) will force a
reset of the PID controllers on all CPUs.

The only difference I can find between the normal and the resume path is that CPU 0
is not re-initialized, due to the core treating it as the boot CPU.
Comment 37 kepler 2014-02-10 22:38:04 UTC
(In reply to Dirk Brandewie from comment #36)
> Still searching for the difference between normal operation and resume :-(
> Could someone that is still seeing the problem run the command:
>    echo 0 > /sys/kernel/debug/pstate_snb/d_gain_pct
> as root and tell me if there is any change?

No change for me. Still at 3.1-3.2 GHz (turbo) on all cores. Oddly enough, powertop reports 100% idle on the frequency stats (while top and powertop idle stats report ~97% idle).

running on 3.13.1 w/ the patch from comment #26,

$rdmsr -a 0x199
2100
1b00
1b00
1b00
Comment 38 Dirk Brandewie 2014-02-11 06:40:50 UTC
Commit 91a4cd4f3 ("intel_pstate: Remove periodic P state boost")
solves the issue seen in Mykal's perf output. The commit went in for v3.14-rc1. I have submitted it to stable but haven't seen it added to the patch queue yet.
Comment 39 kepler 2014-02-11 09:23:35 UTC
"works for me" w/ 3.14-rc2
Comment 40 Jeremy A 2014-02-16 21:37:46 UTC
Hi all,

I just installed linux-mainline (3.14-rc2) from Arch Linux AUR but the problem still persists for me. Both commits: `91a4cd4f3` and `fcb6a15c2` are present.

> uname -a
Linux machine 3.14.0-1-mainline #1 SMP PREEMPT Sat Feb 15 00:15:45 CET 2014 x86_64 GNU/Linux

> sudo i7z
...
Socket [0] - [physical cores=2, logical cores=4, max online cores ever=2]
  TURBO ENABLED on 2 Cores, Hyper Threading ON
  Max Frequency without considering Turbo 1894.72 MHz (99.72 x [19])
  Max TURBO Multiplier (if Enabled) with 1/2/3/4 Cores is  30x/27x/27x/27x
  Real Current Frequency 2904.83 MHz [99.72 x 29.13] (Max of below)
        Core [core-id]  :Actual Freq (Mult.)      C0%   Halt(C1)%  C3 %   C6 %  Temp      VCore
        Core 1 [0]:       2904.83 (29.13x)         1    96.6       1       1    50      0.9441
        Core 2 [2]:       2860.04 (28.68x)         1    96.7       1       1    47      0.9468
Comment 41 Jeremy A 2014-02-17 22:14:33 UTC
Never mind my last comment. I uninstalled tlp and now everything is fine. Cores are running low 800Mhz on idle now, even after suspend/resume. It seems that those PM tools (laptop-tools, tlp, pm-utils) now do more harm than good considering the progress in kernel power management. Thanks Dirk.
Comment 42 Joost 2014-03-16 21:36:01 UTC
Hi everybody,

I fear I am seeing similar problems on my T440s. After suspend, my CPU cores are stuck on 3,2 GHz. Attached is a powertop html.
Am I right that this is related? Any further info needed?

Joost
Comment 43 Joost 2014-03-16 21:37:07 UTC
Created attachment 129651 [details]
powertop --html
Comment 44 Joost 2014-03-16 21:38:03 UTC
Btw., this is on 3.14 rc 6 from here:

http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.14-rc6-trusty/
Comment 45 Joost 2014-03-19 07:20:53 UTC
Created attachment 130021 [details]
powertop --html after boot from complete power off, no suspend done

After boot, before the first suspend cycle is performed, it seems like the cores are stuck between 2,1 GHz and approx. 3,3GHz and do not switch to frequencies below 2,1 GHz.
Comment 46 Jeremy A 2014-03-19 20:18:52 UTC
So this seems unrelated to this bug, are you running laptop-tools or TLP?
Comment 47 Joost 2014-03-20 09:29:00 UTC
No, both are not installed.
Should I open a new bug?
Comment 48 Alex Lochmann 2015-01-07 18:48:14 UTC
I can confirm this bug still exists.
I filed another bug report related to the intel pstate driver: https://bugzilla.kernel.org/show_bug.cgi?id=90421

I tried uninstalling tlp, which had no effect. Furthermore, I ran the perf example mentioned above. After resuming from suspend, the CPU reports busy values above 100, leading to a high frequency.
Comment 49 dflogeras2 2015-10-18 15:47:17 UTC
I too see this behaviour. Before the first suspend my CPU frequency goes as low as 800 MHz; after the first suspend it never seems to dip below 2.3 GHz.

Sony Vaio Pro 13
i5-4200U

Tested on both 4.0.5 and 4.2.3
I have both X86_INTEL_PSTATE and X86_ACPI_CPUFREQ compiled in.  I also use the sony platform driver which lets you set a thermal profile, not sure if that is interfering?  (But you'd think it would interfere before suspend too).
Comment 50 dflogeras2 2015-10-18 15:51:41 UTC
I also find it odd that though I have only the following governors compiled in:
CONSERVATIVE
PERFORMANCE
USERSPACE
ONDEMAND

and the default is set to CONSERVATIVE,

when I query scaling_available_governors it says:
performance powersave

and that scaling_governor says I am using powersave (which isn't in my kernel)
Comment 51 Doug Smythies 2015-10-18 16:30:53 UTC
@dflogeras2: The intel_pstate driver only has two governors, powersave and performance.

After your first suspend, did the governor settings restore properly? For me, and for a lot of people, they don't. Check via, for example (not suspended yet):

doug@s15:~/temp-k-git/linux$ grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu1/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu2/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu3/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu4/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu5/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu6/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu7/cpufreq/scaling_governor:powersave

Even after restoring the governors to powersave, I and a lot of other people are finding that the CPU frequencies no longer go down to the minimum value.
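
The same governor check can be scripted, which is handy for running it before and after each suspend. This is a minimal sketch built on the sysfs layout shown in the grep output above; the function names are hypothetical, and the sysfs root is a parameter only so that the code can be tried against a scratch directory.

```python
import glob
import os


def read_governors(sysfs_root="/sys/devices/system/cpu"):
    """Map each cpuN directory name to its current scaling governor."""
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cpufreq", "scaling_governor")
    governors = {}
    for path in sorted(glob.glob(pattern)):
        cpu = path.split(os.sep)[-3]  # .../cpuN/cpufreq/scaling_governor
        with open(path) as f:
            governors[cpu] = f.read().strip()
    return governors


def not_powersave(governors):
    """Return the CPUs whose governor is anything other than powersave."""
    return sorted(cpu for cpu, gov in governors.items() if gov != "powersave")


if __name__ == "__main__":
    print("CPUs not in powersave:", not_powersave(read_governors()))
```

An empty list after resume means the governors were restored; any CPUs listed have been switched to something else.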
Comment 52 dflogeras2 2015-10-18 19:52:28 UTC
@Doug

Thanks for the clarification.  So is it stupid to have the ACPI_CPUFREQ at all?  It seems the intel pstate takes precedence.

After a resume my governors are all still showing powersave.  I attempted toggling them to performance and back to powersave, but the cpu_freq seems to never come below 2.3-2.4GHz, while min_freq is 800MHz.  (And on first boot, it does drop down to around 800MHz)

I am not running laptop-mode or TLP.

I run KDE4 on Gentoo, but I think the days that KDE mucked with the cpufreq are long gone.
Comment 53 Doug Smythies 2015-10-20 00:21:22 UTC
(In reply to dflogeras2 from comment #52)
> So is it stupid to have the ACPI_CPUFREQ at all?

I don't know. I use both, going back and forth for comparative test results.

> It seems the intel pstate takes precedence.

For most distributions, I think yes. Myself, I force what I want in the command line in grub.
 
> After a resume my governors are all still showing powersave.

Sorry, I wasn't clear. The issue depends on distribution and if systemd or the old way is used and the method used to suspend.

For example, if I suspend my test server via "sudo pm-suspend", it will restore the governor states properly. If I suspend via (as su) "echo mem > /sys/power/state", they won't. If I run an up to date live ubuntu 15.10 desktop USB on my wife's Laptop, close the lid and then open the lid, the governors will not restore properly. ...

> I attempted
> toggling them to performance and back to powersave, but the cpu_freq seems
> to never come below 2.3-2.4GHz, while min_freq is 800MHz.  (And on first
> boot, it does drop down to around 800MHz)

There are a few versions of this issue, and it would help to figure out which one you have.

First, the "min_perf_pct is at 100 in powersave mode" issue (unlikely given what you described for your case; distro dependent; a fix has been accepted and will be in kernel 4.4-rc1, I think). Check in powersave mode via:

grep . /sys/devices/system/cpu/intel_pstate/*_perf_*

/sys/devices/system/cpu/intel_pstate/max_perf_pct:100
/sys/devices/system/cpu/intel_pstate/min_perf_pct:42

(mine is O.K.)

Second, CPU frequencies do not go below some value after resume from suspend, no matter how idle the system is. The CPUs still go into deep C states, though (probable for your case). The minimum CPU frequency seems to vary between processors and systems. Typically, package power consumption increases a little.
Check that all the settings are O.K.
Check that the MHz never goes down to the minimum for the processor when the system is idle. This can be difficult on a desktop system, because such systems always seem to have all kinds of stuff to do and are never really idle.
Check what target P-states are being asked for and what the processor is giving. Example:

What is being asked for:
# rdmsr --bitfield 15:8 -d -a 0x199
16
16
16
16
16
16
16
16

What is being given:
# rdmsr --bitfield 15:8 -d -a 0x198
16
16
16
16
16
16
16
16

(mine is O.K.)

Third, very similar to the second case, except that the package never goes into deep C states. Typically package power consumption increases considerably and package temperature goes up (in my case 10 degrees).
In my case my CPU frequencies never go below 2.4 GHz:

Before suspend:
# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
/sys/devices/system/cpu/intel_pstate/max_perf_pct:100
/sys/devices/system/cpu/intel_pstate/min_perf_pct:42
# grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu1/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu2/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu3/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu4/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu5/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu6/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu7/cpufreq/scaling_governor:powersave
#
# echo mem > /sys/power/state
#
After suspend:
# grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu1/cpufreq/scaling_governor:performance
/sys/devices/system/cpu/cpu2/cpufreq/scaling_governor:performance
/sys/devices/system/cpu/cpu3/cpufreq/scaling_governor:performance
/sys/devices/system/cpu/cpu4/cpufreq/scaling_governor:performance
/sys/devices/system/cpu/cpu5/cpufreq/scaling_governor:performance
/sys/devices/system/cpu/cpu6/cpufreq/scaling_governor:performance
/sys/devices/system/cpu/cpu7/cpufreq/scaling_governor:performance
# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
/sys/devices/system/cpu/intel_pstate/max_perf_pct:100
/sys/devices/system/cpu/intel_pstate/min_perf_pct:100
# for file in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo "powersave" > $file; done
# grep . /sys/devices/system/cpu/intel_pstate/*_perf_*
/sys/devices/system/cpu/intel_pstate/max_perf_pct:100
/sys/devices/system/cpu/intel_pstate/min_perf_pct:42
# grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu1/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu2/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu3/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu4/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu5/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu6/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu7/cpufreq/scaling_governor:powersave
# grep MHz /proc/cpuinfo
cpu MHz         : 2399.921
cpu MHz         : 2399.921
cpu MHz         : 2399.921
cpu MHz         : 2401.117
cpu MHz         : 2399.921
cpu MHz         : 2399.921
cpu MHz         : 2400.054
cpu MHz         : 2399.921

What is being given:
# rdmsr --bitfield 15:8 -d -a 0x198
24
24
24
24
24
24
24
24

What is being asked for:
# rdmsr --bitfield 15:8 -d -a 0x199
16
16
16
16
16
16
16
16
 
(The above was just an example. In my case the issue is kernel 4.3 specific, and has been isolated to a specific commit, but I have seen others with similar problems)
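
The "asked for" versus "given" comparison can be automated along these lines. This is a sketch only: it assumes the two lists hold the per-CPU decimal values printed by "rdmsr --bitfield 15:8 -d -a" for 0x199 and 0x198 respectively, and compare_pstates is a hypothetical name.

```python
def compare_pstates(requested, delivered):
    """Return (cpu, requested, delivered) for every CPU that is running
    at a higher P-state than the one the driver asked for."""
    return [
        (cpu, req, got)
        for cpu, (req, got) in enumerate(zip(requested, delivered))
        if got > req
    ]


if __name__ == "__main__":
    requested = [16] * 8  # 0x199 values from the example above
    delivered = [24] * 8  # 0x198 values from the example above
    for cpu, req, got in compare_pstates(requested, delivered):
        print(f"cpu{cpu}: asked for {req}, given {got}")
```

With the example values above, every CPU is reported, since each is given 24 while only 16 is asked for.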
Comment 54 dflogeras2 2015-10-20 21:12:42 UTC
I am using Gentoo, without systemd so I think what you refer to as the old way.  KDE uses the old udev/upower pm-utils package to manage suspend.

So for your tests, after a suspend/resume:

1) My min_perf_pct is not stuck on 100.  It was 30, max_perf_pct was 100

2) All my cpuX cores are still showing they are set to use powersave governor

3) # grep MHz /proc/cpuinfo shows (only 4 'cores') but similar to your example: everything stuck at ~2.4GHz

4) rdmsr --bitfield 15:8 -d -a 0x199
8
8
8
8

or occasionally something like:
8
8
8
12



5) rdmsr --bitfield 15:8 -d -a 0x198
23
23
23
26

or sometimes:
23
23
23
23

I have no idea what the rdmsr values mean, but it seems that it is not giving what is being asked for in your language?
Comment 55 Doug Smythies 2015-10-20 22:42:35 UTC
Created attachment 190661 [details]
An example turbostat capture

O.K. thanks. Your situation is likely the "second" scenario. You can use turbostat (see: tools/power/x86/turbostat/turbostat.c) to distinguish for sure between the "second" and "third" scenarios. The attachment is an example of the "third" scenario, with notes at the end as to what to look for.

I am not the expert, but recently I have seen many computers that suffer from at least the "second" scenario. The "third" scenario is less common.
Comment 56 dflogeras2 2015-10-20 23:26:33 UTC
Thanks for the introduction to turbostat, this is great!

Yes, it seems that on the laptop (CPU 0x06:45:1), when it settles down (pre-suspend), it averages ~1GHz, spending the lion's share [98%] in the c7 state (with very small bits of c1 and c3).

After a suspend/resume, the cores are staying ~2.4Ghz, but the distributions of c1, c3 and c7 seem about the same.



I tried this on my Desktop (CPU 0x06:3c:3) and saw no such behaviour.  Pre and post suspend both clocked down to sub GHz (and both spent most time in state c7)
Comment 57 Zhang Rui 2016-05-18 08:27:22 UTC
Can you please check if the problem still exists in the latest upstream kernel, say 4.6?
Comment 58 Alex Lochmann 2016-05-19 07:08:55 UTC
As far as I can tell, the changes introduced by 4.6 make things way better.
According to i7z, the frequency is still a bit higher compared to a 4.4 kernel with a patchset from Doug Smythies. But it seems the CPU makes heavy use of the sleep states. As a result, the CPU temperature stays at a moderate level, and the fan is very quiet. I think we can cautiously consider this issue to be solved.... Maybe.
Comment 59 dflogeras2 2016-05-20 23:30:47 UTC
I also am seeing an improvement from past behaviour on the same hardware.  One thing I am noticing from turbostat is that prior to suspend, the GFXMHZ is reported as 500. After suspend/resume, it is locked at 1000.
Comment 60 dflogeras2 2016-05-20 23:40:13 UTC
Actually on closer inspection after a bunch of runs, it seems like turbostat just reports the first value of GFXMHZ when it is invoked and never updates it.

If I start turbostat after the laptop is idle for a while it might report a value of 200, but never changes even if I do something graphics intensive.  Similarly, if I am doing something graphical when I start turbostat it might report 1000, but never goes down if I let the machine idle.  Only after restarting turbostat.  Possibly a turbostat bug?
Comment 61 Doug Smythies 2016-05-22 22:48:05 UTC
(In reply to Alex Lochmann from comment #58)
> As far as i can tell, the changes introduced by 4.6 make things way better.
> According to i7z, the frequency is still a bit higher compared to a 4.4
> kernel with a patchset from Doug Smythies. But it seems the cpu makes
> heavily use of the sleep states. As a result, the cpu temperature stays at a
> moderate level, and the fan is very quiet. I think we can carefully consider
> this issue to be solved.... Maybe.

Alex: As far as I know, the improvements you are seeing are more related to bug 93521 and bug 115771 than this one.

This one was closed as "UNREPRODUCIBLE", but I can easily reproduce it (at least some of the stuff I mentioned in comment 53). That being said, I have moved on to newer versions of Ubuntu, and those issues no longer exist, so I wouldn't re-open this bug. I don't know whether it was ever a kernel problem or some sort of initialization problem.
