Bug 80651
Summary: | [intel_pstate] cpu sticks at high freq after a round of cpu offline/online Haswell / i7-4700HQ | ||
---|---|---|---|
Product: | Power Management | Reporter: | Tobias Jakobi (liquid.acid) |
Component: | intel_pstate | Assignee: | Chen Yu (yu.c.chen) |
Status: | CLOSED INVALID | ||
Severity: | normal | CC: | alexey.brodkin, dsmythies, jgeboski, kadir, kristen.c.accardi, lenb, rui.zhang, tianyu.lan, yu.c.chen |
Priority: | P1 | ||
Hardware: | Intel | ||
OS: | Linux | ||
Kernel Version: | 3.15.6 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
dmesg output; kernel config; i7z output; Patch to not lose settings on cpu offline; output of: grep . /sys/devices/system/cpu/cpu*/cpufreq/*; turbostat before suspend; turbostat after resume; Compare powers, CPUs loaded 100% and CPUs offline through a suspend |
Created attachment 143441 [details]
kernel config
Created attachment 143451 [details]
i7z output
This is some generic output by i7z when the cpu is in 'low' perf mode (max_perf_pct=55, min_perf_pct=23, no_turbo=1).
After resume, although these settings have not changed, the multiplier sticks at around 24.5.
I realized that changing and restoring the values doesn't quite do it. The issue seems related to the fact that I usually have two of the four cores disabled (by offlining the CPUs associated with core IDs 2 and 3). I have to bring all cores online again, then set the values, and then I can take cores 2 and 3 offline again.
Created attachment 143721 [details]
Patch to not lose settings on cpu offline
Can you try this patch? It should preserve your sysfs settings.
Hello Dirk, the patch doesn't fix the issue. I'd like to point out that there seems to be a misunderstanding here: the problem is not that CPUs that are offline lose their settings on suspend/resume. It's the CPUs that are still online that lose their settings. Or rather, something seems to overwrite the settings, effectively locking their multiplier to the highest setting.

Can you send the steps to reproduce and the output of grep . /sys/devices/system/cpu/intel_pstate/* before and after suspend?

(In reply to Dirk Brandewie from comment #6)
> Can you send the steps to reproduce and the output of
> grep . /sys/devices/system/cpu/intel_pstate/*
> before and after suspend?

Also grep . /sys/devices/system/cpu/cpu*/cpufreq/*

To reproduce:

# setup pstate
echo -n 23 > /sys/devices/system/cpu/intel_pstate/min_perf_pct
echo -n 55 > /sys/devices/system/cpu/intel_pstate/max_perf_pct
echo -n 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
# take cores 2 and 3 offline
for arg in 2 3 6 7; do
    echo -n 0 > /sys/devices/system/cpu/cpu${arg}/online
done
------------------------------
On my system:
Core 2 provides cpu2 and cpu6
Core 3 provides cpu3 and cpu7

Created attachment 143771 [details]
output of: grep . /sys/devices/system/cpu/cpu*/cpufreq/*
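The reproduction steps above and Tobias's manual workaround (bring cores 2 and 3 back online, set the values, take them offline again) can be collected into a small POSIX shell script. This is a minimal sketch, not something posted in the thread: the function names and the SYSFS/CPUS overrides are hypothetical, and the CPU list "2 3 6 7" is only correct for the topology Tobias describes. Per his original description, re-writing identical values may not be enough; if so, change them to other values and back first.

```shell
#!/bin/sh
# Sketch of the reproduction steps and the manual workaround from this report.
# Assumptions: SYSFS may be overridden (e.g. for a dry run against a scratch
# directory), and CPUS lists the logical CPUs of cores 2 and 3 on this
# i7-4700HQ (core 2 -> cpu2/cpu6, core 3 -> cpu3/cpu7). Other machines may
# number their hyperthread siblings differently.

SYSFS=${SYSFS:-/sys/devices/system/cpu}
CPUS=${CPUS:-"2 3 6 7"}

# Write the three intel_pstate knobs: $1 = min_perf_pct, $2 = max_perf_pct,
# $3 = no_turbo.
set_pstate() {
    printf '%s' "$1" > "$SYSFS/intel_pstate/min_perf_pct"
    printf '%s' "$2" > "$SYSFS/intel_pstate/max_perf_pct"
    printf '%s' "$3" > "$SYSFS/intel_pstate/no_turbo"
}

# Bring every CPU listed in $CPUS online ($1 = 1) or offline ($1 = 0).
set_cpus_online() {
    for cpu in $CPUS; do
        printf '%s' "$1" > "$SYSFS/cpu$cpu/online"
    done
}

# Reproduction: "low" perf mode, then cores 2 and 3 offline; then suspend.
reproduce() {
    set_pstate 23 55 1
    set_cpus_online 0
}

# Workaround after resume: online all cores, re-apply the values,
# then take cores 2 and 3 offline again.
workaround() {
    set_cpus_online 1
    set_pstate 23 55 1
    set_cpus_online 0
}
```

Nothing runs when the file is sourced. As root, call `reproduce` before suspending and `workaround` after resume, e.g. `sudo sh -c '. ./pstate-offline.sh && workaround'` (the file name is hypothetical).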
grep . /sys/devices/system/cpu/intel_pstate/*
/sys/devices/system/cpu/intel_pstate/max_perf_pct:55
/sys/devices/system/cpu/intel_pstate/min_perf_pct:23
/sys/devices/system/cpu/intel_pstate/no_turbo:1
--------------------------
The output is the same before and after suspend.

I have reproduced this. intel_pstate is requesting a lower P-state but it is being ignored. I am trying to find the person who can explain to me how this can happen.

I have the same problem after resume, but for me the multiplier is stuck at the lowest value (8), very rarely rising to 10 or 12, but not higher. No such problems with acpi-cpufreq. Arch Linux x86-64, Kernel 3.16.1, Sandy Bridge i7-2620M CPU @ 2.70GHz

It would be interesting to see turbostat output for the idle system before and after the suspend. If the package RAPL counter says we're using more power after, then it seems that the offline CPUs are really not idle, but are busy in the BIOS.

Created attachment 150301 [details]
turbostat before suspend
Hello,
here's the turbostat output before the suspend. Package watt is around 5.6, CPU mostly in C7, package state mostly in pc2 (anyway, why doesn't this go lower?).
Created attachment 150311 [details]
turbostat after resume
And here's the output after the resume. Package watt is now suddenly over 14 and also the package state counter seems to be broken.
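Len's check compares the package RAPL energy counter before and after suspend; turbostat's PkgWatt column is derived from that counter. As an alternative sketch (an assumption, not something used in this thread), the same counter is exposed by the intel_rapl powercap driver as a microjoule value in sysfs, so average package power over an interval is just the energy difference divided by the elapsed time. The path assumes package 0 is `intel-rapl:0`; reading it may require root on newer kernels, and counter wraparound is not handled.

```shell
#!/bin/sh
# Sketch: average package power from the RAPL energy counter exposed by the
# intel_rapl powercap driver. Assumptions: the driver is loaded, package 0
# is intel-rapl:0, and energy_uj is in microjoules. Wraparound between the
# two samples is not handled.

RAPL=${RAPL:-/sys/class/powercap/intel-rapl:0/energy_uj}

# Watts from two energy readings in microjoules taken $3 seconds apart.
rapl_watts() {
    awk -v a="$1" -v b="$2" -v t="$3" 'BEGIN { printf "%.2f\n", (b - a) / 1e6 / t }'
}

# Sample the counter twice, $1 seconds apart (default 5), and print watts.
sample_pkg_watts() {
    t=${1:-5}
    e1=$(cat "$RAPL")
    sleep "$t"
    e2=$(cat "$RAPL")
    rapl_watts "$e1" "$e2" "$t"
}
```

Running `sample_pkg_watts 10` on the idle system before suspend and again after resume gives a rough counterpart to the PkgWatt numbers reported here.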
So what's with this NEEDINFO status? Am I supposed to set it to something else now that I've provided the info?

This is still an issue with vanilla 3.17.3.
PkgWatt before suspend: 5.61
After suspend: 14.24

The problem is somewhere in the suspend flow itself with the manual offlining of the cpus. We don't believe this is an intel_pstate problem. Will discuss with Rafael to see how to disposition this further.

Let me know if I can test something more on this side.

I'm seeing this issue (multiplier stuck with value 8 after suspend) even on 3.19.3-200.fc21.x86_64.

(In reply to Kristen from comment #18)
> The problem is somewhere in the suspend flow itself with the manual
> offlining of the cpus. We don't believe this is an intel_pstate problem.
> Will discuss with Rafael to see how to disposition this further.

Question: If true, then shouldn't the same issue occur when using the acpi-cpufreq scaling driver?
Answer: Yes, the same issue occurs when using the acpi-cpufreq driver instead.
Theory: After a suspend, nobody told those offline CPUs to be offline, and they actually are not offline, although the system thinks they are. Therefore they are holding the PLL at the non-turbo max frequency, regardless of what the other CPUs' target P-states are set to. They are also responsible for burning all the excess power, as they are 100% in the C0 state, doing what I do not know.
The only evidence I have to support the theory is that the turbostat power numbers all make sense on my system. I can also get things back to normal by bringing those CPUs back online (as seen by the system), and the turbostat power numbers continue to make sense as I do so.

Created attachment 187021 [details] Compare powers, CPUs loaded 100% and CPUs offline through a suspend

I guess my comment 21 was just supporting Len's comment 13 (which somehow I missed before).

Please note, as of kernel 4.2 pm-suspend does not work if the highest numbered CPU is offline.
You have to edit /usr/lib/pm-utils/sleep.d/94cpufreq and force a return code of 0 for the subroutine hibernate_cpufreq() to make it work again (thanks to Rafael for the suggestion).

There seem to be two types of problem covered in this bug report: CPU frequencies stuck high when one or more CPUs are offline during a suspend/resume, and CPU frequencies stuck low (or lowish) after a suspend/resume. Users suffering from stuck-low CPU frequencies after suspend should try kernel 4.2 and report back.

Addendum: For the previous graph, I forgot to mention that loading of CPUs and the number of CPUs offline was done core by core. Meaning, for my CPU, CPU loading was done as 0, then add 4, completing core 0, then add 1, then add 5, completing core 1 ... Similarly for the offline stuff.

My system exhibits the "stuck low" problem with intel_pstate, but not with acpi-cpufreq. It still does on 4.2. It's a Dell e6420 laptop running Arch Linux x86-64, Kernel 4.2.2, with a Sandy Bridge i7-2620M CPU @ 2.70GHz. The problem appears ONLY when resuming on battery power (not sure if this was the case before 4.2, I've been using acpi-cpufreq), and ONLY affects the powersave governor. It does not matter which governor is used at the moment the system suspends/resumes. If I suspend while using performance and resume (on battery), frequency scaling works. But when I then switch to powersave, the multiplier is stuck at 8. If I then resume on AC, powersave works again. Disconnecting AC while using powersave (without suspending/resuming) does not result in the "stuck low" problem. The values in /sys/devices/system/cpu/intel_pstate/* are not changed when the "stuck low" problem is present. Manually changing them does not seem to have any effect.

(While testing I discovered that the system draws slightly less power on pstate/performance than on cpufreq/ondemand even when idling, so I'll stick with pstate/performance for now.)

@uhkeller: O.K. thanks.
When you say "the multiplier is stuck at 8", does that mean your CPU frequencies are in the 800 MHz range, or are you observing lower CPU frequencies, typically in the 600 MHz range? I ask because your issue sounds very much like the Clock Modulation issue.

You can further check by reading the MSRs directly. However, this becomes a bit of a saga for the Arch Linux distribution, because it is my understanding that there is no msr-tools package, so you need to compile it yourself. Someone on an Arch forum [1] has "made an AUR arch package for it" available at [2].

In the CPU-frequencies-stuck-low (below 800 MHz) state do:
sudo rdmsr -a 0x19a
and post the results. If bit 4 is set, you can also try to clear the issue via:
sudo wrmsr -a 0x19a 0x0
and check it:
sudo rdmsr -a 0x19a
Are the CPU frequencies O.K. now?

If your issue turns out not to be due to Clock Modulation, then I think we will want to acquire some trace data with your 4.2.2 kernel. If your issue is Clock Modulation, then please chime in on [3] and complain.

[1] https://bbs.archlinux.org/viewtopic.php?id=199922
[2] https://aur.archlinux.org/packages/msr-tools/
[3] http://en.community.dell.com/support-forums/laptop/f/3518/t/19634759

@Doug Smythies: Thank you very much. Clock modulation seems to be the issue: "rdmsr -a 0x19a" returned "1e" for each core, and luckily it is fixed by "wrmsr -a 0x19a 0x0". I will chime in at the Dell support site you linked to (thanks again), even though it seems Dell is not particularly interested. Just out of curiosity: is it expected that the clock modulation problem affects only the powersave governor, not the performance governor?

(In reply to uhkeller from comment #26)
>
> Just out of curiosity: is it expected that the clock modulation problem
> affects only the powersave governor, not the performance governor?

Yes. However, you will find your highest CPU frequency in performance mode is actually 87.5% of max for "1e" in register "19a".
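For reference, the "1e" value can be decoded by hand. In Intel's documented layout of IA32_CLOCK_MODULATION (MSR 0x19a), bit 4 enables on-demand clock modulation and bits 3:1 select the duty cycle in 12.5% steps, so 0x1e (binary 11110) means enabled at 7 x 12.5% = 87.5%. The sketch below decodes a value as printed by `rdmsr` and estimates the resulting frequency ceiling as the unmodulated frequency times the duty cycle. The function names are hypothetical, and bit 0 (used only by extended clock modulation on some CPUs) is ignored.

```shell
#!/bin/sh
# Sketch: decode IA32_CLOCK_MODULATION (MSR 0x19a) as printed by
# "rdmsr -a 0x19a" (hex digits, e.g. "1e"; a leading "0x" is also accepted).
# Assumed layout: bit 4 = on-demand clock modulation enable,
# bits 3:1 = duty cycle in 12.5% steps (000b is reserved).
# Bit 0 (extended clock modulation, some CPUs only) is ignored.

# Print "disabled", "enabled reserved" or "enabled <duty>%".
decode_clock_mod() {
    val=$((0x${1#0x}))
    steps=$(( (val >> 1) & 7 ))
    if [ $(( (val >> 4) & 1 )) -eq 0 ]; then
        echo "disabled"
    elif [ "$steps" -eq 0 ]; then
        echo "enabled reserved"
    else
        awk -v s="$steps" 'BEGIN { printf "enabled %.1f%%\n", s * 12.5 }'
    fi
}

# Estimated frequency ceiling in MHz: unmodulated MHz ($1) times the duty
# cycle of the MSR value ($2). Prints $1 unchanged when modulation is
# disabled and "unknown" for the reserved encoding.
expected_mhz() {
    val=$((0x${2#0x}))
    steps=$(( (val >> 1) & 7 ))
    if [ $(( (val >> 4) & 1 )) -eq 0 ]; then
        echo "$1"
    elif [ "$steps" -eq 0 ]; then
        echo "unknown"
    else
        awk -v f="$1" -v s="$steps" 'BEGIN { printf "%.1f\n", f * s * 12.5 / 100 }'
    fi
}
```

For example, `decode_clock_mod 1e` prints `enabled 87.5%`, and `expected_mhz 3812 1e` prints 3335.5, which matches the "Expected" value Doug lists for 87.50% later in this thread. To check every CPU: `sudo rdmsr -a 0x19a | while read v; do decode_clock_mod "$v"; done`.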
The same is true for the acpi-cpufreq driver, with any governor. It is just that most users don't notice the issue when they are using the acpi-cpufreq driver, but because in its current form the intel_pstate driver is not compatible with Clock Modulation, users notice.

Hi Doug, are you planning to send the patch to calculate busy_scale using clock modulation to the mailing list? And according to your comment #27, does acpi-cpufreq also need this fix too? Thanks. Yu

(In reply to Chen Yu from comment #28)
> Hi Doug,
> Are you planning to send the patch to calculate busy_scale using clock
> modulation to the mailing list?

Hi Yu, I saw your on-list e-mail of 2015.11.12, asking if I was going to submit a formal version of the test patch. I haven't yet. Myself, ultimately I think some sort of real load calculation is needed in the intel_pstate frequency scaling driver, which would eliminate the need for this patch. However, since progress is slow on that front, perhaps this patch should be submitted in the interim.

> And according to your comment #27, does acpi-cpufreq also need this fix too?

No. The acpi-cpufreq driver works, in my opinion, properly with clock modulation. The test patch I used makes the intel_pstate driver respond to clock modulation the same way the acpi-cpufreq driver already does. The following chart shows CPU 7 frequency for various levels of clock modulation for my processor ("Yu norm." is using your normalization suggestion, "doug" is based on a load-based patch set I submitted on 2015.04.11 (recently rejected)):

CPU 7 100% load - Frequency measured with turbostat
Kernel: 4.3 : Processor: Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz

Clock Modulation (percent) | intel_pstate current (MHz) | intel_pstate Yu norm. (MHz) | intel_pstate doug (MHz) | acpi-cpufreq acpi (MHz) | Expected (MHz)
---|---|---|---|---|---
Disabled | 3806 | 3800 | 3810 | 3811 | 3812
87.50% | 1405 | 3300 | 3312 | 3311 | 3335.5
75.00% | 1204 | 2800 | 2811 | 2809 | 2859
62.50% | 1003 | 2300 | 2309 | 2308 | 2382.5
50.00% | 803 | 1900 | 1907 | 1906 | 1906
37.50% | 611 | 1400 | 1405 | 1405 | 1429.5
25.00% | 409 | 900 | 903 | 903 | 953
12.50% | 210 | 507 | 508 | 508 | 476.5
reserved | 803 | N.A. | 1907 | 1906 | ??

@Tobias Jakobi, since the history of this thread is a little long, could you please check with the latest 4.6.0-rc7? I assume you don't have the Clock Modulation issue? There have been quite a lot of changes/fixes in intel_pstate recently.

@Doug Smythies, for the Clock Modulation issue, how about restoring Clock Modulation to its pre-suspend setting in the intel_pstate resume callback, like this commit:
commit ba41e1bc28bd862089b0fc00e8136aa258a62b21
cpufreq: intel_pstate: Fix HWP on boot CPU after system resume

Clock Modulation should be transparent to intel_pstate, right?

(In reply to Chen Yu from comment #30)
> @Doug Smythies, for the Clock Modulation issue, how about restoring Clock
> Modulation to its pre-suspend setting in the intel_pstate resume callback,
> like this commit:
> commit ba41e1bc28bd862089b0fc00e8136aa258a62b21
> cpufreq: intel_pstate: Fix HWP on boot CPU after system resume

I do not think (but do not know for sure) that would solve the case where a Dell laptop was booted on battery power.

(In reply to Chen Yu from comment #31)
> Clock Modulation should be transparent to intel_pstate, right?

I'm not sure what you mean. In its current form, the intel_pstate driver is incompatible with any use of Clock Modulation. (And with recent changes, I mean "get_target_pstate_use_performance". While I didn't test it, "get_target_pstate_use_cpu_load" should be O.K.)

I no longer own the system. Closing as 'obsolete'.

The original issue reported herein persists through kernel 4.7-rc3. Just because Tobias (the OP) no longer owns his related system is no reason to close this. Re-opening.
I guess I don't have powers that enable me to re-open this. Someone please re-open it.

The original issue might persist, but I'm not interested in status reports from this bug anyway. Please open your own bug.

@Chen Yu: Do you have sufficient privileges to re-open this, and maybe delete Tobias? I do not want to lose the history herein, nor make a new bug report.

Reopening because this problem exists on Doug's system.

Please stop spamming my e-mail address!

I have a Dell e6320 and I also have this problem. I have used the "solution" provided in this bug report ever since https://bugzilla.kernel.org/show_bug.cgi?id=90041 Right now I am on Fedora 24 (4.5.7-300.fc24.x86_64) and the bug is still there.

@Chen Yu: Since Tobias keeps closing this one, I have entered bug 121051 to replace this one. Kristen still gets assigned intel_pstate bug reports, which is no longer correct. Will you take it?

@Kadir: Are you sure you have the original problem of this bug report? There was some cross-posting into this one about Clock Modulation, which is not really what this one was about.

@Doug: You are correct. The bug that my laptop has is covered in the report at https://bugzilla.kernel.org/show_bug.cgi?id=90041 I have been following both bug reports for quite some time in search of a fix. I hope a fix is coming soon, so that I don't have to do the workaround described by you in the other bug report. |
Created attachment 143431 [details]
dmesg output
Hello,
the patches that went into 3.15.6 made the intel_pstate driver finally usable for me, at least as long as I don't suspend/resume the system. After resume the multiplier is locked to the highest one. I have to change min_perf_pct and max_perf_pct to some other values and then back to restore the behaviour from before the suspend.
CPU is a Haswell / i7-4700HQ.
With best wishes,
Tobias