Created attachment 86591 [details] /proc/cpuinfo With 3.7-rc5 and -rc6, sometimes (often?) after a reboot my PC's cores (on an i7-2630QM) all remain running at 2GHz, even after their governors are changed to ondemand (this is done by the /etc/init.d/ondemand script approximately one minute after it runs at boot). I'm not running anything especially taxing at this stage, just an Ubuntu Unity desktop that is idling away. Note that when this happens, /proc/cpuinfo incorrectly indicates that the cores are running at 800MHz (does the ondemand governor use this to determine if it needs to change speed?) but that this disagrees with the readings from powertop, cpufreq-info and (eg) /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq. In /sys/devices/system/cpu/cpu0/cpufreq/, scaling_governor indicates ondemand is being used, and cpuinfo_min_freq is 800000. powertop indicates power usage of 35W (compared to a more normal 22W), and acpi -t shows 75C instead of the more normal 45-55C. Changing the governor to performance and back again doesn't help, but a suspend/resume cycle seems to fix it. I'll attach the output from cat /proc/cpuinfo and cpufreq-info when the problem is being exhibited.
Created attachment 86601 [details] cpufreq-info output
Please show the output from turbostat when this problem is happening. You can find turbostat in the kernel source tree under tools/power/x86/turbostat. Also, please verify with top that nothing is running when the system is idle.
I definitely verified with top that nothing was using significant CPU the last time this happened. I'll try running turbostat next time I see it (it doesn't happen on every reboot).
Created attachment 87151 [details] Log output from turbostat and other commands Here's the output from turbostat, top, and cpuinfo after I booted the system and the cores stayed stuck in max frequency. I also included the output from turbostat after a suspend/resume cycle, when the cores had successfully clocked down to their min frequencies. One point of strangeness is that the first turbostat indicates frequencies up to 2.45GHz, which I thought was higher than my system allows. The subsequent turbostat shows the cores running at around 2GHz. I suppose it is possible that I ran the first command while the system was still using the performance governor - could this be why the cores report 2.45GHz? The second turbostat definitely was running when the ondemand governor was running, as you can see from the 800MHz reported in cpuinfo, and my frequency applet also was reporting that the system was running in low performance mode.
re: /proc/cpuinfo MHz value
I don't think that is used for anything, or means anything. We really should delete that, as all it does is confuse people.
re: turbostat output
Yes, it looks like you are stuck at high frequency even when mostly idle, but this condition goes away after suspend/resume. Does the system behave normally after it breaks out of the initial problem?
> re: /proc/cpuinfo MHz value
Yes, it surely must confuse people! I found several sites recommending it. Additionally, when my system isn't experiencing this bug, it does seem to be accurate.
> Does the system behave normally after it breaks out of the initial problem?
Yes, after suspend/resume the system works just fine, as far as I can tell.
This is still a problem in 3.7-rc7. Is there any other info I can provide or things I can try?
Created attachment 88361 [details] turbostat after boot showing too-high GHz readings This is still a problem in 3.7-rc8. I ran turbostat (log attached) just after reboot, before the governor tried to drop the frequency down to lowest (in fact the governor dropped to ondemand just before the last reading in the log). Here turbostat also reports frequency readings over my CPU's max of 2.0GHz (up to 2.6 GHz in some cases). Does that mean that turbostat is also providing unreliable information? The cores are definitely running at a higher frequency, because the fan goes on high and the power usage is right up.
I've just encountered this again in kernel 3.7.1. turbostat indicates the cores are running over their max 2.0GHz frequency (it says core 1 in particular is running at up to 2.4GHz). Is there anything I can try that might help to debug this?
turbostat -v will tell you the max turbo for this processor. Yes, it is normal to run above the TSC frequency -- indeed, that is exactly what turbo-mode is for... Can you force the processors to go slower by writing to sysfs? E.g., as root:
    cd /sys/devices/system/cpu/cpu0/cpufreq
    cat scaling_max_freq
    cat scaling_min_freq > scaling_max_freq
    cat scaling_max_freq
and see if you can get that CPU to slow down, as shown by turbostat.
No, the command doesn't work. I ran into the issue again with 3.7.2, and although the command does set scaling_max_freq equal to scaling_min_freq, turbostat still shows the core running at a much higher frequency:
root@sierra:/sys/devices/system/cpu/cpu0/cpufreq# cat scaling_min_freq > scaling_max_freq
root@sierra:/sys/devices/system/cpu/cpu0/cpufreq# cat scaling_max_freq
800000
root@sierra:/sys/devices/system/cpu/cpu0/cpufreq# turbostat
cor CPU    %c0  GHz  TSC    %c1    %c3    %c6    %c7   %pc2   %pc3   %pc6   %pc7
          0.54 2.04 1.99   0.99   0.04   0.00  98.44   0.00   0.00   0.00   0.00
  0   0   1.09 2.00 1.99   0.78   0.06   0.00  98.06   0.00   0.00   0.00   0.00
  0   1   0.04 1.99 1.99   1.83
  1   2   0.52 2.00 1.99   0.38   0.00   0.00  99.10
  1   3   0.08 1.99 1.99   0.81
  2   4   1.14 2.17 1.99   0.26   0.01   0.00  98.59
  2   5   0.01 1.99 1.99   1.39
  3   6   1.42 1.99 1.99   0.51   0.07   0.00  98.00
  3   7   0.01 2.00 1.99   1.91
cor CPU    %c0  GHz  TSC    %c1    %c3    %c6    %c7   %pc2   %pc3   %pc6   %pc7
          0.31 2.01 1.99   0.70   0.02   0.00  98.97   0.00   0.00   0.00   0.00
  0   0   0.92 1.99 1.99   0.68   0.03   0.00  98.37   0.00   0.00   0.00   0.00
  0   1   0.02 1.99 1.99   1.58
  1   2   0.27 1.99 1.99   0.36   0.00   0.00  99.37
  1   3   0.10 1.99 1.99   0.53
  2   4   0.57 2.05 1.99   0.15   0.01   0.00  99.27
  2   5   0.01 2.00 1.99   0.71
  3   6   0.47 1.99 1.99   0.64   0.02   0.00  98.87
  3   7   0.12 1.99 1.99   0.99
On a positive note, I haven't encountered this issue yet at all on the 3.8-rc kernel series.
Starting with kernel 3.7.0 I have no frequency scaling with my AMD Phenom. There is no folder /sys/bus/cpu/devices/cpu0/cpufreq
Got absolutely the same behaviour with the 3.7.5 kernel on an i7-2620M CPU. As far as I can see, it occurs after long sleeps (> 5-10 mins). It occurs 100% of the time after a long sleep (going to work -- it happens; going home in the evening -- it happens again). cpufreq-info shows that it is set to the lowest possible frequency, but once I try the "-w" switch it shows a real hardware clock of 2.7GHz. I tried the plain vanilla 3.7.10 kernel with no luck. Can't try 3.8.x yet, because there are no pf-patches for it (I *need* BFS ;)). Will try to downgrade to 3.6.x and post results here...
cpufreq-info as an unprivileged user shows 800MHz:
csl linux/tools/power/x86/turbostat # sudo -u mocksoul cpufreq-info| grep 'current CPU'
current CPU frequency is 800 MHz.
current CPU frequency is 800 MHz.
current CPU frequency is 800 MHz.
current CPU frequency is 800 MHz.
While root can read hw freq and it shows 2.7GHz :(
mcsl linux/tools/power/x86/turbostat # cpufreq-info| grep 'current CPU'
current CPU frequency is 2.70 GHz (asserted by call to hardware).
current CPU frequency is 2.70 GHz (asserted by call to hardware).
current CPU frequency is 2.70 GHz (asserted by call to hardware).
current CPU frequency is 2.70 GHz (asserted by call to hardware).
Turbostat (as well as powertop 1.x) shows that my CPU is mostly in the c7 state:
mcsl linux/tools/power/x86/turbostat # ./turbostat
cor CPU    %c0  GHz  TSC    %c1    %c3    %c6    %c7   %pc2   %pc3   %pc6   %pc7
          0.42 2.69 2.69   1.34   0.03   0.00  98.22   0.00   0.00   0.00   0.00
  0   0   0.86 2.69 2.69   2.41   0.05   0.00  96.67   0.00   0.00   0.00   0.00
  0   1   0.63 2.69 2.69   2.64
  1   2   0.15 2.69 2.69   0.09   0.00   0.00  99.76
  1   3   0.02 2.69 2.69   0.22
Yep, in 3.6.12 kernel I have no issues anymore!
I just encountered this again on 3.9-rc4 but this time after a resume rather than a reboot. All eight cores were stuck at 2.00GHz. Suspending and resuming again made them drop back to the more normal 0.8GHz. This bug is marked as needinfo. What more info is required?
(In reply to comment #16)
> All eight cores were stuck at 2.00GHz.
This is probably a new bug; try the following patch: https://lkml.org/lkml/2013/3/24/164
My 3.9-rc6 kernel tells me I've already got that patch applied, and I just experienced all eight cores stuck at 2GHz again after a resume. So it looks like it wasn't that particular new bug - is it likely to be this one or a different new one?
The 3.9 kernel also suffers from this bug (after a suspend/resume). A suspend/resume cycle fixes it. Is there something that might fix it without having to suspend/resume? This bug is still marked NEEDINFO: is there any other information I can provide?
Please attach the output of "grep . /sys/devices/system/cpu/cpu0/cpufreq/*" in the latest kernel, both with and without the problem reproduced.
Created attachment 101801 [details] output of grep command when CPUs were stuck at 2GHz This is using kernel 3.9.2. The CPUs were stuck at 2GHz after a resume.
Created attachment 101811 [details] output of grep command when CPUs were ok The only difference between the two greps (CPUs ok and CPUs locked) is:
-/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq:800000
+/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq:2000000
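For anyone repeating this comparison, here is a minimal Python sketch (not part of the original report) of how two saved outputs of the grep command could be compared. The function names are made up for illustration, and it assumes every file in the cpufreq directory holds a single line, so each grep line has the form path:value.

```python
def parse_grep_output(text):
    """Turn `grep . dir/*` output (one `path:value` line per file) into a dict."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first colon only; the sysfs paths here contain no colon.
        path, _, value = line.partition(":")
        values[path] = value
    return values


def changed_entries(good_text, bad_text):
    """Return {path: (good_value, bad_value)} for entries that differ."""
    good = parse_grep_output(good_text)
    bad = parse_grep_output(bad_text)
    return {
        path: (good.get(path), bad.get(path))
        for path in sorted(set(good) | set(bad))
        if good.get(path) != bad.get(path)
    }


# Abbreviated sample using values quoted in this report.
prefix = "/sys/devices/system/cpu/cpu0/cpufreq/"
cpus_ok = prefix + "cpuinfo_cur_freq:800000\n" + prefix + "scaling_governor:ondemand\n"
cpus_stuck = prefix + "cpuinfo_cur_freq:2000000\n" + prefix + "scaling_governor:ondemand\n"

for path, (ok, stuck) in changed_entries(cpus_ok, cpus_stuck).items():
    print(f"{path}: {ok} -> {stuck}")
```

With the two attachments saved to files, the same function can be fed the file contents instead of the inline sample strings.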
Created attachment 102611 [details] output of grep command when cores were stuck at 2GHz - 3.10-rc2 Here's an output from grep for the 3.10-rc2 kernel. This time the problem didn't immediately resolve itself after a suspend/resume - it took six attempts before the cores went down to normal.
It seems to be happening quite frequently with 3.10-rc2 and -rc3 - I just had it happen again on -rc3 and it took three suspend/resumes to make it go away. Is there any other info I can provide?
(In reply to comment #24)
> It seems to be happening quite frequently with 3.10-rc2 and -rc3 - I just had it happen again on -rc3 and it took three suspend/resumes to make it go away.
> Is there any other info I can provide?
Your output looked strange, so I checked it on my laptop and saw the same thing.
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq:2000000
/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq:800000
Why are these two different? I checked this on my ARM SoC and they are the same.
(In reply to comment #25)
> (In reply to comment #24)
> > It seems to be happening quite frequently with 3.10-rc2 and -rc3 - I just had it happen again on -rc3 and it took three suspend/resumes to make it go away.
> > Is there any other info I can provide?
>
> Your output looked strange, so I checked it on my laptop and saw the same thing.
>
> /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq:2000000
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq:800000
>
> Why are these two different? I checked this on my ARM SoC and they are the same.
Dirk, do you have an answer for this?
Looks like there is a disconnect between the governor and the scaling driver. cpuinfo_cur_freq comes from a query to the scaling driver. scaling_cur_freq is reporting the current frequency from the governor's point of view.
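To check for this kind of disconnect on every core at once, something like the following Python sketch could be used. It is not from the thread: the function names and the tolerance_khz parameter are assumptions for illustration, and it only compares the two sysfs files discussed above. Reading cpuinfo_cur_freq usually needs root.

```python
import glob
import os


def read_khz(path):
    """Return the integer kHz value stored in a cpufreq sysfs file."""
    with open(path) as f:
        return int(f.read().strip())


def find_disconnects(sysfs_root="/sys/devices/system/cpu", tolerance_khz=0):
    """Return a list of (cpu, driver_khz, governor_khz) where the two views disagree.

    driver_khz comes from cpuinfo_cur_freq (query to the scaling driver),
    governor_khz from scaling_cur_freq (the governor's point of view).
    tolerance_khz is a hypothetical knob to ignore small differences.
    """
    results = []
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cpufreq")
    for cpufreq_dir in sorted(glob.glob(pattern)):
        cpu = os.path.basename(os.path.dirname(cpufreq_dir))
        driver_khz = read_khz(os.path.join(cpufreq_dir, "cpuinfo_cur_freq"))
        governor_khz = read_khz(os.path.join(cpufreq_dir, "scaling_cur_freq"))
        if abs(driver_khz - governor_khz) > tolerance_khz:
            results.append((cpu, driver_khz, governor_khz))
    return results
```

Calling find_disconnects() as root on an affected system would be expected to list every core whose two values differ; an empty list means the two views agree at the moment of reading.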
Can any of the reporters please check if using the intel_pstate driver instead of cpufreq makes any difference?
I've been using it for a few days, and so far I haven't experienced any scaling issue. That said, even with cpufreq I could sometimes go a few days without any, so it might not be conclusive just yet. Definitely looks better though.
Created attachment 104471 [details] cpufreq grep and turbostat output using intel_pstate I've done a bit of testing using intel_pstate in 3.10-rc5 over the last day, and it looks like intel_pstate also suffers from the 'CPU stuck' issue. Out of four suspend/resume cycles the CPUs have twice got stuck. (Two out of four, however, is a better result than acpi-cpufreq manages in 3.10-rc5. Yesterday I had to suspend/resume six times in a row using acpi-cpufreq before the CPUs dropped to their minimum frequency!) Unlike acpi-cpufreq, where the CPUs get stuck at 2.0GHz, with intel_pstate they get stuck near their max turbo 2.9GHz frequency. The attached log shows both output from a 'bad' suspend/resume, where the CPUs get locked near their maximum turbo frequency, and from a 'good' suspend/resume, where the CPUs drop to somewhere in the 1-2GHz range, which seems to be normal for intel_pstate. Is that expected? Shouldn't they be dropping down near the min frequency of 800 MHz?
I just had the CPUs lock in the 2.7-2.9GHz range with five straight suspend/resume cycles using intel_pstate. At one point, I tried changing /sys/devices/system/cpu/intel_pstate/max_perf_pct to 50 instead of 100 but it made no difference. When the system is locked in this range, powertop reports using 33W instead of the more normal 17W-18W. Is there any other info I can provide to help debug this? This is happening often enough to be very annoying.
I'm trying intel_pstate in conjunction with CONFIG_NO_HZ_FULL in 3.10-rc7 now and it's most definitely still a problem (I just had to suspend/resume five times before the CPUs settled down, and according to top, there's nothing using more than a few percent CPU). Looking at the turbostat logs when using pstate, the frequencies DO change slightly (ie typically in the 2.7-2.9 GHz turbo range), so it seems that the driver is still able to alter the frequencies, and that it's choosing high frequencies on purpose. Has no-one any suggestions for what to try or how to debug this? Does either the pstate or cpufreq driver give any information about why it chooses such high frequencies, ie what workload is causing it?
I also tried intel_pstate. The problem still occurs, but differently. Typically I have 11-12W usage on my laptop (Lenovo T520, medium to high brightness). Sometimes after waking the laptop I have 20+W usage. And according to the ksysguard graphs the CPU frequency is LOWER than usual (1GHz to 1.4GHz). It will go higher under high CPU usage, but power consumption is still MUCH higher. If I run this:
for x in 0 1 2 3; do cpufreq-set -c $x -g powersave -d 0.8Ghz -u 3.0Ghz; done
the frequencies immediately hover around 2GHz and power consumption drops to the regular 11-13W (slowly, since I'm watching the battery info).
(In reply to comment #32)
> I'm trying intel_pstate in conjunction with CONFIG_NO_HZ_FULL in 3.10-rc7 now and it's most definitely still a problem (I just had to suspend/resume five times before the CPUs settled down, and according to top, there's nothing using more than a few percent CPU).
>
> Looking at the turbostat logs when using pstate, the frequencies DO change slightly (ie typically in the 2.7-2.9 GHz turbo range), so it seems that the driver is still able to alter the frequencies, and that it's choosing high frequencies on purpose.
>
> Has no-one any suggestions for what to try or how to debug this? Does either the pstate or cpufreq driver give any information about why it chooses such high frequencies, ie what workload is causing it?
CONFIG_NO_HZ_FULL with intel_pstate is a known issue. The problem is that CONFIG_NO_HZ_FULL causes a regression in the menu idle governor that causes the system to look busy to intel_pstate, so it raises the frequency. The maintainers of the menu governor and the authors of NO_HZ_FULL are aware of the issue and are looking at it. A short-term workaround is to unset CONFIG_NO_HZ, which will make you drop back to the ladder idle governor. This will get you at least running if you want to try out NO_HZ_FULL.
Thanks for the info, Dirk. But could that be causing this particular problem after resume from RAM? With my current config, the CPU frequency scaling works just fine *except* when this suspend/resume bug occurs, and the bug occurs with or without intel_pstate and with or without CONFIG_NO_HZ_FULL. (I have configured "General setup --> Timers subsystem --> unset Old Idle dynticks" as Vi0L0 said you recommended at https://bbs.archlinux.org/viewtopic.php?pid=1282810, and the frequencies are fine after I reboot, and fine in approximately 1 out of every 5 times that I suspend then resume the laptop.)
(In reply to comment #35)
> Thanks for the info, Dirk. But could that be causing this particular problem after resume from RAM? With my current config, the CPU frequency scaling works just fine *except* when this suspend/resume bug occurs, and the bug occurs with or without intel_pstate and with or without CONFIG_NO_HZ_FULL.
>
> (I have configured "General setup --> Timers subsystem --> unset Old Idle dynticks" as Vi0L0 said you recommended at https://bbs.archlinux.org/viewtopic.php?pid=1282810, and the frequencies are fine after I reboot, and fine in approximately 1 out of every 5 times that I suspend then resume the laptop.)
I don't believe so. The CONFIG_NO_HZ_FULL and the suspend issues are separate, AFAICT. I think https://bugzilla.kernel.org/show_bug.cgi?id=58971 is the root of this issue with resume, since it is killing ondemand and intel_pstate.
Thanks for the link to bug 58971. I've tried a suggestion from there (pause between i915 irq enable and modeset init) and posted back the results, which so far look good.
Ok, I've reverted back to the last kernel I used with the acpi-cpufreq driver (3.8). After a few suspend-resume cycles (5 or so) I ended up with the same scaling issues:
cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq:2400000
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq:800000
This went away after one suspend-resume cycle. On the other hand, with a 3.9 kernel using the pstate driver I've never had any problem, and I must be close to a hundred suspend-resume cycles so far. I've got output from turbostat for both configs, with and without the scaling issues on the 3.8-acpi-cpufreq one; do you want me to post it? I can also post my config for each of the kernels used, or whatever else you might find useful. As for https://bugzilla.kernel.org/show_bug.cgi?id=58971, it does seem similar, though I remember having these issues way before 3.8.7 (at least 3.6 I think), so reading the comments there I think mine was a different issue. Btw I've got an i7-2760QM, not a 2630QM like Rocko.
FWIW, I have not seen this bug a single time since I added the delay between i915 irq enable and modeset init. I also tried a 3.10-generic kernel *without* the delay and it locked up the CPU frequencies on the very first resume. Perhaps it would be a good idea to add the delay as an interim workaround, pending a proper resolution?
(In reply to rocko from comment #39)
> FWIW, I have not seen this bug a single time since I added the delay between i915 irq enable and modeset init. I also tried a 3.10-generic kernel *without* the delay and it locked up the CPU frequencies on the very first resume.
>
> Perhaps it would be a good idea to add the delay as an interim workaround, pending a proper resolution?
Hi, from this result, this is an i915 issue, so I am reassigning it to the intel-gfx category. Experts there will find the final solution.
Hmm. When I booted into linux-3.11-rc2 just now (using intel pstate) the CPUs were all stuck at near their turbo frequency despite top showing the highest CPU task at around 4%. Is that likely to be an i915 issue or should I open another bug?
(In reply to Lan Tianyu from comment #40)
> (In reply to rocko from comment #39)
> > FWIW, I have not seen this bug a single time since I added the delay between i915 irq enable and modeset init. I also tried a 3.10-generic kernel *without* the delay and it locked up the CPU frequencies on the very first resume.
> >
> > Perhaps it would be a good idea to add the delay as an interim workaround, pending a proper resolution?
>
> Hi, from this result, this is an i915 issue, so I am reassigning it to the intel-gfx category. Experts there will find the final solution.
Apparently we haven't. Does the problem persist with recent kernels?
Ping for the re-test.
Sorry for the delay in getting back. I haven't seen this issue in 3.16 or 3.17-rc6 using intel pstate. Is there a specific re-test I can do?
In case it helps, I have seen the cores get stuck at their max frequency on a couple of occasions recently (in 3.17.2). On both occasions I killed a long-running CPU-intensive ruby task that was using mysqld, and afterwards the cores continued running at their turbo frequency even though the original process no longer showed up with ps -xa and mysqld was reporting zero percent activity (according to top). I had to kill -9 the mysqld task and restart it to fix the issue.
Long time no updates, closing. If the problem persists with latest kernels, please file a bug at the freedesktop.org bugzilla [1], referencing this bug. Thank you. [1] https://bugs.freedesktop.org/enter_bug.cgi?product=DRI&component=DRM/Intel