Bug 80651

Summary: [intel_pstate] cpu sticks at high freq after a round of cpu offline/online Haswell / i7-4700HQ
Product: Power Management Reporter: Tobias Jakobi (liquid.acid)
Component: intel_pstate Assignee: Chen Yu (yu.c.chen)
Status: CLOSED INVALID    
Severity: normal CC: alexey.brodkin, dsmythies, jgeboski, kadir, kristen.c.accardi, lenb, rui.zhang, tianyu.lan, yu.c.chen
Priority: P1    
Hardware: Intel   
OS: Linux   
Kernel Version: 3.15.6 Subsystem:
Regression: No Bisected commit-id:
Attachments: dmesg output
kernel config
i7z output
Patch to not lose settings on cpu offline
output of: grep . /sys/devices/system/cpu/cpu*/cpufreq/*
turbostat before suspend
turbostat after resume
Compare powers, CPUs loaded 100% and CPUs offline through a suspend

Description Tobias Jakobi 2014-07-18 22:37:22 UTC
Created attachment 143431 [details]
dmesg output

Hello,

The patches that went into 3.15.6 made the intel_pstate driver finally usable for me. At least as long as I don't suspend/resume the system.

After resume the multiplier is locked to the highest one. I have to change min_perf_pct and max_perf_pct to some other values and then back to restore the behaviour from before the suspend.

CPU is a Haswell / i7-4700HQ.

With best wishes,
Tobias
Comment 1 Tobias Jakobi 2014-07-18 22:38:19 UTC
Created attachment 143441 [details]
kernel config
Comment 2 Tobias Jakobi 2014-07-18 22:41:54 UTC
Created attachment 143451 [details]
i7z output

This is some generic output by i7z when the cpu is in 'low' perf mode (max_perf_pct=55, min_perf_pct=23, no_turbo=1).

After resume, although the settings above didn't change, the multiplier sticks at around 24.5.
Comment 3 Tobias Jakobi 2014-07-20 10:43:44 UTC
I realized that changing and restoring the values doesn't quite do it. The issue seems related to the fact that I usually have two of the four cores disabled (by offlining the cpus that are associated with core ids 2 and 3).

I have to bring all cores online again, then set the values, and then I can take core 2 and 3 offline again.
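
The workaround described above could be scripted roughly as follows. This is only a sketch: the function names and the CPU_BASE variable are made up for illustration, the logical CPU numbers must match the cores that are kept offline on the machine in question, and on a real system it has to run as root.

```shell
# Hypothetical helpers; CPU_BASE is overridable only to allow dry runs
# against a scratch directory instead of the real sysfs tree.
CPU_BASE=${CPU_BASE:-/sys/devices/system/cpu}

# set_cpus_online STATE CPU...
# Writes STATE (0 or 1) to the "online" file of each listed logical CPU.
set_cpus_online() {
    state=$1
    shift
    for n in "$@"; do
        echo "$state" > "$CPU_BASE/cpu$n/online"
    done
}

# reapply_limits MIN_PCT MAX_PCT CPU...
# Brings the listed CPUs online, rewrites the intel_pstate limits,
# then takes the same CPUs offline again.
reapply_limits() {
    min=$1
    max=$2
    shift 2
    set_cpus_online 1 "$@"
    echo "$min" > "$CPU_BASE/intel_pstate/min_perf_pct"
    echo "$max" > "$CPU_BASE/intel_pstate/max_perf_pct"
    set_cpus_online 0 "$@"
}
```

After a resume this would be called as, for example, `reapply_limits 23 55 2 3 6 7`, using the limits from comment 2 and the logical CPUs that belong to cores 2 and 3.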
Comment 4 Dirk Brandewie 2014-07-21 14:04:12 UTC
Created attachment 143721 [details]
Patch to not lose settings on cpu offline

Can you try this patch? It should preserve your sysfs settings.
Comment 5 Tobias Jakobi 2014-07-21 15:28:53 UTC
Hello Dirk, the patch doesn't fix the issue.

I'd like to point out that there seems to be a misunderstanding here: The problem is not that CPUs that are offline lose their settings on suspend/resume. It's the CPUs that are still online that lose their settings. Or rather, something seems to overwrite the settings, effectively locking their multiplier to the highest setting.
Comment 6 Dirk Brandewie 2014-07-21 15:54:19 UTC
Can you send the steps to reproduce and the output of
   grep . /sys/devices/system/cpu/intel_pstate/*
before and after suspend?
Comment 7 Dirk Brandewie 2014-07-21 15:57:26 UTC
(In reply to Dirk Brandewie from comment #6)
> Can you send the steps to reproduce and the output of
>    grep . /sys/devices/system/cpu/intel_pstate/*
> before and after suspend?
Also 
grep . /sys/devices/system/cpu/cpu*/cpufreq/*
Comment 8 Tobias Jakobi 2014-07-21 16:05:35 UTC
To reproduce:

# setup pstate
echo -n 23 > /sys/devices/system/cpu/intel_pstate/min_perf_pct
echo -n 55 > /sys/devices/system/cpu/intel_pstate/max_perf_pct
echo -n 1 > /sys/devices/system/cpu/intel_pstate/no_turbo

# take cores 2 and 3 offline
for arg in 2 3 6 7; do
  echo -n 0 > /sys/devices/system/cpu/cpu${arg}/online
done

------------------------------

On my system:
Core 2 provides cpu2 and cpu6
Core 3 provides cpu3 and cpu7
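
The core-to-cpu mapping can differ between machines. A minimal sketch for looking it up from sysfs, assuming a single-socket system and that all CPUs are online when it runs (the topology directory may be missing for offline CPUs); the function name is made up for illustration:

```shell
# cpus_of_core CORE_ID [CPU_DIR]
# Prints the logical CPUs (cpuN) whose topology/core_id equals CORE_ID.
# CPU_DIR defaults to the real sysfs location and is only a parameter
# so the function can be tried against a scratch directory.
cpus_of_core() {
    core=$1
    base=${2:-/sys/devices/system/cpu}
    for dir in "$base"/cpu[0-9]*; do
        f="$dir/topology/core_id"
        # Skip entries without a readable core_id (e.g. offline CPUs).
        [ -r "$f" ] || continue
        if [ "$(cat "$f")" = "$core" ]; then
            printf '%s\n' "${dir##*/}"
        fi
    done
}
```

Running `cpus_of_core 2` on the system above should list cpu2 and cpu6, given the mapping stated in this comment.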
Comment 9 Tobias Jakobi 2014-07-21 16:07:05 UTC
Created attachment 143771 [details]
output of: grep . /sys/devices/system/cpu/cpu*/cpufreq/*
Comment 10 Tobias Jakobi 2014-07-21 16:10:03 UTC
grep . /sys/devices/system/cpu/intel_pstate/*
/sys/devices/system/cpu/intel_pstate/max_perf_pct:55
/sys/devices/system/cpu/intel_pstate/min_perf_pct:23
/sys/devices/system/cpu/intel_pstate/no_turbo:1

--------------------------

The output is the same before and after suspend.
Comment 11 Dirk Brandewie 2014-07-22 16:02:54 UTC
I have reproduced this.  intel_pstate is requesting a lower P state but it is being ignored. I am trying to find the person that can explain to me how this can happen.
Comment 12 uhkeller 2014-09-05 07:41:33 UTC
I have the same problem after resume, but for me the multiplier is stuck at the lowest value (8), very rarely rising to 10 or 12, but not higher. No such problems with acpi-cpufreq.

Arch Linux x86-64, Kernel 3.16.1, Sandy Bridge i7-2620M CPU @ 2.70GHz
Comment 13 Len Brown 2014-09-09 16:23:15 UTC
It would be interesting to see turbostat output for the idle system
before and after the suspend.  If the package RAPL counter says
we're using more power after, then it seems that the offline cpus
are really not idle, but are busy in the BIOS.
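
Besides turbostat, the package energy counter can also be sampled through the powercap sysfs interface. The sketch below assumes the intel_rapl driver is loaded and exposes /sys/class/powercap/intel-rapl:0/energy_uj; that path is an assumption about the system, not something stated in this report, and the function name is made up. Counter wraparound is not handled.

```shell
# avg_watts START_UJ END_UJ SECONDS
# Prints the average power in watts (two decimals) between two readings
# of an energy counter in microjoules, using integer arithmetic only.
avg_watts() {
    delta=$(( $2 - $1 ))
    # microjoules / seconds = microwatts; 10000 microwatts = 1 centiwatt
    centiwatts=$(( delta / ($3 * 10000) ))
    printf '%d.%02d\n' $((centiwatts / 100)) $((centiwatts % 100))
}

# Example on a real system (run as root, path is an assumption):
#   f=/sys/class/powercap/intel-rapl:0/energy_uj
#   a=$(cat "$f"); sleep 10; b=$(cat "$f")
#   avg_watts "$a" "$b" 10
```

Taking one sample on the idle system before the suspend and one after the resume gives the same comparison as the turbostat package power column.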
Comment 14 Tobias Jakobi 2014-09-14 19:51:29 UTC
Created attachment 150301 [details]
turbostat before suspend

Hello,

Here's the turbostat output before the suspend. Package watt is around 5.6, CPU mostly in C7, package state mostly in pc2 (anyway, why doesn't this go lower?).
Comment 15 Tobias Jakobi 2014-09-14 19:53:09 UTC
Created attachment 150311 [details]
turbostat after resume

And here's the output after the resume. Package watt is now suddenly over 14 and also the package state counter seems to be broken.
Comment 16 Tobias Jakobi 2014-09-23 16:42:06 UTC
So what's with this NEEDINFO status? Am I supposed to set it to something else now that I've provided the info?
Comment 17 Tobias Jakobi 2014-11-16 19:56:29 UTC
This is still an issue with vanilla 3.17.3.

PkgWatt before suspend: 5.61
After suspend: 14.24
Comment 18 Kristen 2014-11-18 18:26:26 UTC
The problem is somewhere in the suspend flow itself with the manual offlining of the cpus.  We don't believe this is an intel_pstate problem.  Will discuss with Rafael to see how to disposition this further.
Comment 19 Tobias Jakobi 2014-11-18 21:16:14 UTC
Let me know if I can test something more on this side.
Comment 20 Alexey Brodkin 2015-04-24 19:40:40 UTC
I'm seeing this issue (multiplier stuck with value 8 after suspend) even on 3.19.3-200.fc21.x86_64.
Comment 21 Doug Smythies 2015-09-04 06:39:02 UTC
(In reply to Kristen from comment #18)
> The problem is somewhere in the suspend flow itself with the manual
> offlining of the cpus.  We don't believe this is an intel_pstate problem. 
> Will discuss with Rafael to see how to disposition this further.

Question: If true, then shouldn't the same issue occur when using the acpi-cpufreq scaling driver?

Answer: Yes, the same issue occurs when using the acpi-cpufreq driver instead.

Theory: After a suspend, nobody told those offline CPUs to be offline, and they actually are not offline, although the system thinks they are offline. Therefore they are holding the PLL at the non turbo max frequency, regardless of what the other CPU target pstates are set to. They are also responsible for burning all the excess power, as they are 100% in the C0 state, doing what I do not know.

The only evidence I have to support the theory is that the turbostat power numbers all make sense on my system. I can also get things back to normal by bringing those CPUs back online (as seen by the system). And the turbostat power numbers continue to make sense as I do so.
Comment 22 Doug Smythies 2015-09-08 04:32:28 UTC
Created attachment 187021 [details]
Compare powers, CPUs loaded 100% and CPUs offline through a suspend

I guess my comment 21 was just supporting Len's comment 13 (which somehow I missed before).

Please note, as of kernel 4.2 pm-suspend does not work if the highest numbered CPU is offline. You have to edit /usr/lib/pm-utils/sleep.d/94cpufreq and force a return code of 0 for the subroutine hibernate_cpufreq() to make it work again (thanks Rafael for the suggestion).

There seem to be two types of problem covered in this bug report: CPU frequencies stuck high when 1 or more CPUs are offline during a suspend/resume, and CPU frequencies stuck low (or lowish) after a suspend/resume. Users suffering from stuck low CPU frequencies after suspend should try kernel 4.2 and report back.
Comment 23 Doug Smythies 2015-09-08 04:39:49 UTC
Addendum: For the previous graph, I forgot to mention that loading of CPUs and number of CPUs offline was done core by core. Meaning, for my CPU, CPU loading was done as 0, then add 4, completing core 0, then add 1 then add 5, completing core 1 ... Similarly for the offline stuff.
Comment 24 uhkeller 2015-10-02 07:30:05 UTC
My system exhibits the "stuck low" problem with intel_pstate, but not with acpi-cpufreq. Still does on 4.2. It's a Dell e6420 laptop running Arch Linux x86-64, Kernel 4.2.2, with a Sandy Bridge i7-2620M CPU @ 2.70GHz.

The problem appears ONLY when resuming on battery power (not sure if this was the case before 4.2, I've been using acpi-cpufreq), and ONLY affects the powersave governor.

It does not matter which governor is used at the moment the system suspends/resumes. If I suspend while using performance and resume (on battery), frequency scaling works. But when I then switch to powersave, the multiplier is stuck at 8. If I then resume on AC, powersave works again. Disconnecting AC while using powersave (without suspending/resuming) does not result in the "stuck low" problem.

The values in /sys/devices/system/cpu/intel_pstate/* are not changed when the "stuck low" problem is present. Manually changing them does not seem to have any effect.

(While testing I discovered that the system draws slightly less power on pstate/performance than on cpufreq/ondemand even when idling, so I'll stick with pstate/performance for now.)
Comment 25 Doug Smythies 2015-10-04 16:54:59 UTC
@uhkeller: O.K. thanks. When you say the multiplier is stuck at 8, does that mean your CPU frequencies are in the 800 MHz range? Or are you observing lower CPU frequencies, typically in the 600 MHz range? I ask because your issue sounds very much like the Clock Modulation issue.

You can further check by reading the MSRs directly. However, this becomes a bit of a saga for the Arch Linux distribution, because it is my understanding that there is no msr-tools package, so you need to compile it yourself. Someone on an Arch forum [1] has "made an AUR arch package for it" available at [2].

In the CPU frequencies stuck low (below 800 MHz) state, run:

sudo rdmsr -a 0x19a

And post the results.
If bit 4 is set, you can also try to clear the issue via:

sudo wrmsr -a 0x19a 0x0
and check it:
sudo rdmsr -a 0x19a
Are the CPU frequencies O.K. now?

If your issue turns out to be not due to Clock Modulation, then I think we will want to acquire some trace data with your 4.2.2 kernel.
If your issue is Clock Modulation, then please chime in on [3] and complain.

[1] https://bbs.archlinux.org/viewtopic.php?id=199922
[2] https://aur.archlinux.org/packages/msr-tools/
[3] http://en.community.dell.com/support-forums/laptop/f/3518/t/19634759
Comment 26 uhkeller 2015-10-05 03:52:15 UTC
@Doug Smythies: Thank you very much. Clock modulation seems to be the issue, "rdmsr -a 0x19a" returned "1e" for each core, and luckily it is fixed by "wrmsr -a 0x19a 0x0".

I will chime in at the Dell support site you linked to (thanks again), even though it seems Dell is not particularly interested.

Just out of curiosity: is it expected that the clock modulation problem affects only the powersave governor, not the performance governor?
Comment 27 Doug Smythies 2015-10-05 13:57:29 UTC
(In reply to uhkeller from comment #26)
> 
> Just out of curiosity: is it expected that the clock modulation problem
> affects only the powersave governor, not the performance governor?

Yes. However, you will find your highest CPU frequency in performance mode is actually 87.5% of max for "1e" in register "19a". The same is true for the acpi-cpufreq driver, with any governor. It is just that most users don't notice the issue when they are using the acpi-cpufreq driver, whereas with intel_pstate they do, because in its current form the intel_pstate driver is not compatible with Clock Modulation.
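
To see where the 87.5% comes from, the value printed by rdmsr can be decoded by hand: bit 4 is the enable bit and bits 3:1 hold the duty cycle in eighths. The sketch below assumes that basic layout (it ignores the extended duty-cycle bit 0 that some processors support), and the function name is made up for illustration.

```shell
# decode_clock_mod HEXVALUE
# Decodes a value read from MSR 0x19a as printed by rdmsr (hex, no 0x prefix).
# Assumes the basic layout: bit 4 = enable, bits 3:1 = duty cycle in eighths.
decode_clock_mod() {
    val=$((0x$1))
    enabled=$(( (val >> 4) & 1 ))
    duty=$(( (val >> 1) & 7 ))
    if [ "$enabled" -eq 0 ]; then
        echo "disabled"
    elif [ "$duty" -eq 0 ]; then
        echo "enabled, reserved duty-cycle encoding"
    else
        # duty/8 expressed in tenths of a percent: duty * 125
        tenths=$(( duty * 125 ))
        echo "enabled, duty cycle $((tenths / 10)).$((tenths % 10))%"
    fi
}
```

`decode_clock_mod 1e` reports an enabled modulation with a duty cycle of 87.5%, and `decode_clock_mod 0` reports it as disabled, which is the state after the wrmsr workaround.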
Comment 28 Chen Yu 2015-11-18 12:36:59 UTC
Hi Doug,
Are you planning to send the patch to calculate busy_scale using clock modulation to the mailing list?
And according to your #Comment 27, does acpi-cpufreq also need this fix too?
thanks.

Yu
Comment 29 Doug Smythies 2015-11-18 16:09:07 UTC
(In reply to Chen Yu from comment #28)
> Hi Doug,
> Are you planning to send the patch to calculate busy_scale using clock
> modulation to the mailing list?

Hi Yu,
I saw your on-list e-mail of 2015.11.12, asking if I was going to submit a formal version of the test patch. I haven't yet.
Myself, ultimately I think some sort of real load calculation is needed in the intel_pstate frequency scaling driver, which would eliminate the need for this patch. However, since progress is slow on that front, perhaps this patch should be submitted in the interim.

> And according to your #Comment 27, does acpi-cpufreq also need this fix too?

No. The acpi-cpufreq driver works, in my opinion, properly with clock modulation. The test patch I used makes the intel_pstate driver respond to clock modulation the same way the acpi-cpufreq driver already does. The following chart shows CPU 7 frequency for various levels of clock modulation for my processor ("Yu norm." is using your normalization suggestion, "doug" is based on a load based patch set I submitted on 2015.04.11 (recently rejected)):

CPU 7 100% load - Frequency measured with turbostat
Kernel: 4.3 : Processor: Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz

Clock Modulation  intel_pstate  intel_pstate  intel_pstate  acpi-cpufreq  Expected
                  current       Yu norm.      doug          acpi
(percent)         (MHz)         (MHz)         (MHz)         (MHz)         (MHz)
Disabled          3806          3800          3810          3811          3812
87.50%            1405          3300          3312          3311          3335.5
75.00%            1204          2800          2811          2809          2859
62.50%            1003          2300          2309          2308          2382.5
50.00%            803           1900          1907          1906          1906
37.50%            611           1400          1405          1405          1429.5
25.00%            409           900           903           903           953
12.50%            210           507           508           508           476.5
reserved          803           N.A.          1907          1906          ??
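
The "Expected" column appears to be the unmodulated frequency (3812 MHz) multiplied by the duty cycle. A small sketch to reproduce it, with a made-up function name and the duty cycle given in eighths:

```shell
# expected_mhz BASE_MHZ EIGHTHS
# Prints BASE_MHZ * EIGHTHS / 8 with one decimal, using integer arithmetic.
expected_mhz() {
    tenths=$(( $1 * $2 * 10 / 8 ))
    printf '%d.%d\n' $((tenths / 10)) $((tenths % 10))
}

# Print the expected frequency for 87.5% down to 12.5%:
#   for n in 7 6 5 4 3 2 1; do expected_mhz 3812 "$n"; done
```

For 87.5% (7 eighths) this gives 3335.5, matching the table; the "current" intel_pstate column is far below these values, which is the incompatibility being discussed.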
Comment 30 Chen Yu 2016-05-11 08:48:24 UTC
@Tobias Jakobi, since the history of this thread is a little long, could you please check with the latest 4.6.0-rc7? I assume you don't have the Clock Modulation issue? There have been quite a lot of changes and fixes in recent intel_pstate.

@Doug Smythies, for Clock Modulation issue, how about restoring the Clock Modulation  to previous one before suspend, in intel_pstate.resume callback? like this commit:
commit	ba41e1bc28bd862089b0fc00e8136aa258a62b21
cpufreq: intel_pstate: Fix HWP on boot CPU after system resume
Comment 31 Chen Yu 2016-05-11 08:50:15 UTC
Clock Modulation should be transparent to intel_pstate, right?
Comment 32 Doug Smythies 2016-05-18 15:43:06 UTC
(In reply to Chen Yu from comment #30)
> @Doug Smythies, for Clock Modulation issue, how about restoring the Clock
> Modulation  to previous one before suspend, in intel_pstate.resume callback?
> like this commit:
> commit        ba41e1bc28bd862089b0fc00e8136aa258a62b21
> cpufreq: intel_pstate: Fix HWP on boot CPU after system resume

I do not think (but do not know for sure) that would solve the case where a Dell laptop was booted on battery power.

(In reply to Chen Yu from comment #31)
> Clock Modulation should be transparent to intel_pstate, right?

I'm not sure what you mean. In its current form, the intel_pstate driver is incompatible with any use of Clock Modulation. (And with recent changes, I mean "get_target_pstate_use_performance". While I didn't test it, "get_target_pstate_use_cpu_load" should be O.K.).
Comment 33 Tobias Jakobi 2016-06-19 10:43:43 UTC
I no longer own the system. Closing as 'obsolete'.
Comment 34 Doug Smythies 2016-06-19 15:00:26 UTC
The original issue reported herein persists through kernel 4.7-rc3. Just because Tobias (the OP) no longer owns his related system, is no reason to close this. Re-opening.
Comment 35 Doug Smythies 2016-06-19 15:02:08 UTC
I guess I don't have powers that enable me to re-open this. Someone please re-open it.
Comment 36 Tobias Jakobi 2016-06-19 15:44:33 UTC
The original issue might persist, but I'm not interested in status reports from this bug anyway. Please open your own bug.
Comment 37 Doug Smythies 2016-06-23 21:09:18 UTC
@Chen Yu: Do you have sufficient privileges to re-open this, and maybe delete Tobias? I do not want to lose the history herein, nor make a new bug report.
Comment 38 Chen Yu 2016-06-24 00:38:29 UTC
reopen because this problem exists on Doug's system.
Comment 39 Tobias Jakobi 2016-06-24 13:26:31 UTC
Please stop spamming my e-mail address!
Comment 40 Kadir 2016-06-26 10:51:18 UTC
I have a Dell e6320 and I also have this problem. I have used the "solution" provided in this bugreport ever since https://bugzilla.kernel.org/show_bug.cgi?id=90041

Right now I am on Fedora 24 (4.5.7-300.fc24.x86_64) and the bug is still there.
Comment 41 Doug Smythies 2016-06-27 14:57:46 UTC
@Chen Yu: Since Tobias keeps closing this one, I have entered bug 121051 to replace this one. Kristen still gets assigned intel_pstate bug reports, which is no longer correct. Will you take it?

@Kadir: Are you sure you have the original problem of this bug report? There was some cross posting into this one about Clock Modulation, which is not really what this one was about.
Comment 42 Kadir 2016-06-27 15:12:29 UTC
@Doug: You are correct. The bug that my laptop has, is covered in the report at https://bugzilla.kernel.org/show_bug.cgi?id=90041

I have been following both bug reports for quite some time in search of a fix. I hope a fix is coming soon, so that I don't have to do the workaround as described by you in the other bug report.