Created attachment 32502 [details]
Output of acpidump

Hi,

I own a brand new ASUS U45JC laptop with an Intel i5-450M CPU. During the first compile on it, I noticed that the CPU reached at most 1.33 GHz with the "ondemand" governor. Loading the CPU with a kernel compile using "make -j4" didn't help either; the maximum frequency stayed pinned at 1.33 GHz. Both the latest stable 2.6.35.7 and the release candidate 2.6.36-rc6-git2 are affected.

After two days of investigating (I'm not primarily a kernel programmer/developer), I came across a solution: there was already a patch available from Thomas Renninger @suse, which I have included in this mail. My guess is that a lot of people own a notebook with an Intel i5 inside and could have the same problem. Is there any chance to get this included in the kernel, or is it just me who encounters this problem?

The patch applies cleanly against the latest rc, and I can confirm that it also fixes the problem in 2.6.35.7 (I backported and tested it).

Attached are the output of acpidump and dmesg, and the patch from Thomas Renninger which fixes the problem for me. The BIOS is the latest version, 207, from Asus.

Thanks, Heinz.
Created attachment 32512 [details] dmesg output
Created attachment 32522 [details] Patch from Thomas R.
Cpufreq info is embedded in dynamically loaded ACPI tables, which you need to dump manually. Please run:

    acpidump --addr 0xAADA3918 --length 0x3FB >/tmp/CPU0IST
    acpidump --addr 0xAADA2A98 --length 0x303 >/tmp/APIST
    acpidump --addr 0xAADA1018 --length 0x8A9 >/tmp/CPU0CST
    acpidump --addr 0xAADA0D98 --length 0x119 >/tmp/APCST

and attach the (zipped?) output files. Thanks.
Hi, the output of these commands is attached here. Thanks, -heinz
Created attachment 32902 [details] Output of the commands asked by Thomas R.
The BIOS latency values look sane, so I wonder why my patch helps. Ah, I was blind: this is about reducing up_threshold, and it is a duplicate of bug #17001. Can you try to lower up_threshold to 53 or below as mentioned there, and then run

    cat /dev/zero >/dev/null &

multiple times to load the CPUs? It looks like something broke at some point before 2.6.35.7. Do you know of a kernel version that did not show this?
I am having a similar problem, but changing the CPU governor or threshold doesn't help. I wrote a little script to test my CPU, and it appears that the BIOS is limiting the CPU at high workloads, but not at low workloads. After some time at full BIOS limiting, ACPI throttling begins. The temperature never gets high enough for throttling. I am using the stock kernel from Ubuntu 10.10. This behavior was not present in Ubuntu 10.04, and is not present when running Windows.
Created attachment 33562 [details] Simple CPU tester and reporter. The results from running the test on my computer are attached to the bottom of the script in comments.
Mobusby: You have a different problem, which is BIOS related. Best check for the latest BIOS. The BIOS limits the frequency on purpose; you have to find out why. Also check your BIOS settings (ideally after an upgrade). Some things to check: temperature might be related, AC vs. battery, a dirty fan slot, or does it only happen after suspend (disk/ram)? If nothing helps, processor.ignore_ppc=1 will ignore the limit from the BIOS. If you have tried all that, then please open a new bug, attach dmesg and acpidump output, and assign it to me.
Mobusby: Here is a similar bug (but the root cause in BIOS is probably different): https://bugzilla.kernel.org/show_bug.cgi?id=16362
Hi Thomas, lowering up_threshold helps. The default after booting an unpatched kernel is 95; with your patch it is set to 40. I haven't installed a lot of different kernels yet, because the machine is brand new, but I noticed to my surprise that 2.6.36-rc7-git3 seems to have cured the problem. There's no cpufreq related patch in there, as far as I could see, so I wonder what the f*ck is going on here :-)
Thanks! So it looks like we have a regression in the 2.6.35 kernel which got fixed between 2.6.36-rc6-git2 and 2.6.36-rc7-git3. It affects idle/busy/io CPU accounting in a way that keeps the cpufreq ondemand governor from switching the frequency up (it only does so with the workaround of dramatically decreasing the up_threshold tunable). The fix does not seem to be in any cpufreq related code (according to Heinz; I didn't double check, but I haven't seen anything on the cpufreq list lately). Rafael/Ingo/Peter: Do you have an idea which patch could have solved this issue? This should probably go to .35 stable...
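To make the role of up_threshold concrete, here is a minimal sketch of the kind of decision an ondemand-style governor makes. It is a simplified model and not the kernel's code: the function names and the sample numbers are assumptions chosen for illustration. It shows how a window that is wrongly accounted as partly idle never crosses the default threshold of 95 but does cross a threshold of 40.

```python
def load_percent(wall_delta, idle_delta):
    """Busy share of one sampling window, in percent (0-100)."""
    if wall_delta <= 0:
        return 0
    busy = max(wall_delta - idle_delta, 0)
    return 100 * busy // wall_delta


def wants_max_frequency(wall_delta, idle_delta, up_threshold=95):
    """Simplified model: ramp up only when the load exceeds up_threshold."""
    return load_percent(wall_delta, idle_delta) > up_threshold


# Hypothetical window: the task is busy, but 45% of the window is
# (wrongly) accounted as idle, so the computed load is only 55%.
print(wants_max_frequency(1000, 450, up_threshold=95))  # False
print(wants_max_frequency(1000, 450, up_threshold=40))  # True
```

If the accounting were correct, the same busy task would report a load close to 100 and exceed either threshold.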
*** Bug 17001 has been marked as a duplicate of this bug. ***
In bug #17001 it's stated:

> As far as I can tell, the problem exists at least since kernel version 2.6.30

But then this would be something machine specific; otherwise someone should have reported it by now. It could also be that the above statement is wrong, that something went wrong when testing 2.6.30, and that this is, as mentioned, a regression introduced in 2.6.35 and fixed somewhere between 2.6.36-rc6-git2 and 2.6.36-rc7-git3...
Hi all,

I have an Intel Core i5 520 M (Thinkpad T410). The problem became visible when I switched from Ubuntu 10.04 (2.6.32.15) to Ubuntu 10.10 (2.6.35.4).

* With the stock kernel of Ubuntu 10.04 (ondemand WORKS), cpufreq-info said for all virtual CPUs:

    hardware limits: 1.20 GHz - 2.40 GHz
    current policy: frequency should be within 1.20 GHz and 2.40 GHz.
                    The governor "ondemand" may decide which speed to use
                    within this range.
    current CPU frequency is 1.20 GHz.
    cpufreq stats: 2.40 GHz:3.26%, 2.40 GHz:0.02%, 2.27 GHz:0.05%, 2.13 GHz:0.05%, 2.00 GHz:0.03%, 1.87 GHz:0.02%, 1.73 GHz:0.03%, 1.60 GHz:0.01%, 1.47 GHz:0.03%, 1.33 GHz:0.03%, 1.20 GHz:96.48% (235)

* With the stock kernel of Ubuntu 10.10 (ondemand FAILS), cpufreq-info said for all virtual CPUs:

    hardware limits: 1.20 GHz - 2.40 GHz
    current policy: frequency should be within 1.20 GHz and 1.20 GHz.
    current CPU frequency is 1.20 GHz.
    cpufreq stats: 2.40 GHz:0.00%, 2.40 GHz:0.00%, 2.27 GHz:0.00%, 2.13 GHz:0.00%, 2.00 GHz:0.00%, 1.87 GHz:0.00%, 1.73 GHz:0.00%, 1.60 GHz:0.00%, 1.47 GHz:0.00%, 1.33 GHz:0.00%, 1.20 GHz:100.00%

I took my custom reference configuration (.config) for my laptop, which works perfectly with the 2.6.32.15 Ubuntu kernel, and tested it with several kernel versions from kernel.org, to rule out the influence of Ubuntu patches.

* Results:

    2.6.32.15 : Ondemand scaling OK
    2.6.32.16 : Ondemand scaling OK
    2.6.32.17 : Ondemand scaling OK
    2.6.32.18 : Ondemand scaling OK
    2.6.32.19 : Ondemand scaling OK
    2.6.32.20 : Ondemand scaling OK
    2.6.32.21 : Ondemand scaling OK
    2.6.32.22 : Ondemand scaling OK
    2.6.32.23 : Ondemand scaling OK
    2.6.32.24 : Ondemand scaling OK

So, at this point, can we conclude that the problem appeared after 2.6.32? I'm not so sure, because, as was pointed out earlier in this thread, this problem may well be machine specific. In any case, this is the second report affecting a Core i5 CPU.
On Ubuntu Launchpad, someone with a Core 2 Duo T7200 CPU is affected by this problem too (2.6.35.4 from Ubuntu 10.10). I have also tested a clean 2.6.35.7 with and without the patch from Thomas R. In both cases the result is the same: ondemand scaling KO. I will report what happens with >= 2.6.33 kernels.
This could have been related to cpuidle and the new intel_idle driver, but I see no recent fixes there either. If it is true that this was fixed between 2.6.36-rc6-git2 and 2.6.36-rc7-git3, it shouldn't be that hard to bisect on those rather stable versions. Be aware that you have to swap "good" and "bad" if you search for a fix and not for a regression/bug. That would be:

    git bisect start <bad> <good>
    git bisect start 2.6.36-rc7-git3 2.6.36-rc6-git2

If a version works well, go for "git bisect bad"; if it shows the bug, go for "git bisect good". The gitX snapshots are not tagged in git, though; I have no idea how one can find the git ID of these subcommits/merges. Heinz: How sure are you that it got fixed in 2.6.36-rc7-git3?
If someone wants to test whether this has to do with the intel_idle driver, you should be able to check which driver is in use (new intel_idle vs. acpi_idle):

    cat /sys/devices/system/cpu/cpuidle/current_driver

The new cpupowerutils (formerly cpufrequtils) show this via:

    cpuidle-info

If intel_idle is used, the boot param

    intel_idle.max_cstate=0

should make the machine fall back to acpi_idle. Just an idea: the idle driver sounds possibly related to wrong idle accounting, and this is an easy check compared to compiling/bisecting.
I just did some tests with >= 2.6.33 kernels.

* Results:

    2.6.33   : KERNEL_PANIC (Ouch!)
    2.6.33.1 : Ondemand scaling KO (my laptop freezes after a few minutes; I did not investigate deeply)

In any case, just before the freeze, I had enough time to check cpufreq-info and the cpuidle state.

* cpufreq-info printed the same "hardware limits" and "cpufreq stats" that I reported earlier in this thread for the case where the ondemand governor failed to scale. So the regression may well have appeared in 2.6.33.
* The current cpuidle driver was "acpi_idle".

So, if the intel_idle driver is not to blame, we can check the differences between the 2.6.32.24 and 2.6.33[.1] kernel trees which may cause this regression.

* Comparing the two "drivers/cpufreq" directories, there are some changes in:
  - cpufreq.c (adds BIOS limit reading and releases the rwsem around the governor)
  - cpufreq_conservative.c (various changes, but I never use this governor)
  - cpufreq_ondemand.c (very few: adds a condition to read the new min policy)
  - freq_table.c (some function name refactoring)

I'm not a kernel hacker and I do not (yet) know how the cpufreq API works. I hope this information will help you.
Hi all, I just tested my fresh and unstable 2.6.33.1 kernel with functional cpufreq driver from 2.6.32.24. * Result : Ondemand scaling KO So the regression may come from somewhere else... !
I possibly know the problem. What does

    cat /sys/devices/system/cpu/cpu0/cpufreq/{affected_cpus,related_cpus}

say? I expect (and from the ACPI table info it's very likely the case for Heinz's system) that ACPI_PDC_SMP_P_HWCOORD is used. Compare with 8.4.4.5 _PSD (P-State Dependency) in the ACPI spec:

    CoordType: DWordConst
    The type of coordination that exists (hardware) or is required (software)
    as a result of the underlying hardware dependency. Could be either 0xFC
    (SW_ALL), 0xFD (SW_ANY) or 0xFE (HW_ALL) indicating whether OSPM is
    responsible for coordinating the P-state transitions among processors
    with dependencies (and needs to initiate the transition on all or any
    processor in the domain) or whether the hardware will perform this
    coordination.

Heinz's BIOS evaluates _PDC (OS capabilities) and exports HW_ALL if the kernel tells the BIOS it can handle it (and it does). I remember another machine (Jean Delvare's) where frequency switching was totally messed up in that case. So this may not be a real kernel bug. I can provide a patch so that the HW_COORD OS capability is not set; that should help Heinz, and we can verify whether that's the cause.

It's hard to check which coordination type is used at runtime. I once looked into it a bit, and if affected_cpus and related_cpus are not the same, it must be hardware coordination (HW_ALL), iirc.
Created attachment 34092 [details]
Patch introducing boot param to disable HW_COORD type

Please try the boot param

    processor.disable_hw_coord=1

with this patch. Hm, _PDC is called rather early these days. The param won't be considered for the first _PDC call in arch/x86/kernel/acpi/boot.c. But that should not matter, and I hope it gets used later; otherwise it must be an early param... I'll try to come up with another patch that exports the coordination type so that this can be checked.
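The check described above can be sketched in a few lines. This is an illustration only: the helper names are made up, the CoordType values are the three listed in the quoted ACPI text, and the "lists differ" heuristic is the one recalled from memory in this comment, not a documented interface.

```python
# CoordType values as listed in the quoted _PSD description.
COORD_TYPES = {0xFC: "SW_ALL", 0xFD: "SW_ANY", 0xFE: "HW_ALL"}


def coord_type_name(value):
    """Map a _PSD CoordType value to its name."""
    return COORD_TYPES.get(value, "unknown (0x%02X)" % value)


def parse_cpu_list(text):
    """Parse a space-separated CPU list as printed by the sysfs files."""
    return sorted(int(token) for token in text.split())


def lists_differ(affected_text, related_text):
    """True if affected_cpus and related_cpus name different CPU sets."""
    return parse_cpu_list(affected_text) != parse_cpu_list(related_text)


print(coord_type_name(0xFE))            # HW_ALL
print(lists_differ("0", "0 1 2 3"))     # True
```

The two strings passed to lists_differ would be the contents of the affected_cpus and related_cpus files of one CPU.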
Created attachment 34102 [details] Provide /sys/devices/system/cpu/cpu*/cpufreq/shared_type This should be: CPUFREQ_SHARED_TYPE_ALL (2) if you added the boot param provided in the previous patch and CPUFREQ_SHARED_TYPE_HW (1) by default.
Concerning bug 17001, which I filed, I have to report that the issue has not magically disappeared for me with the 2.6.36-rc kernels. I finally managed to compile 2.6.36-rc7 and -rc8 (had an error building the i915 driver before) and Core2 Duo T7300 CPU still is stuck at the lowest frequency without lowering up_threshold to 53 or lower. Would it help if I provided you with some ACPI information? What do you need?
> Would it help if I provided you with some ACPI information? What do you need?

acpidump output for now. I need to look up the addresses of the dynamic tables. Possibly you can look them up yourself. Do:

    acpixtract acpidump
    iasl -d *.dat
    grep -i load *.dsl

You may then see something like this:

    SSDT.dsl: Load (IST0, HI0)
    SSDT.dsl: Load (CST0, HC0)
    SSDT.dsl: Load (CST1, HC1)
    SSDT.dsl: Load (IST1, HI1)

The operation region gives the address/length of the dynamically loaded table. On this system there is:

    OperationRegion (IST0, SystemMemory, DerefOf (Index (SSDT, One)), DerefOf (Index (SSDT, 0x02)))

and

    Name (SSDT, Package (0x0C)
    {
        "CPU0IST ", 0xAADA3918, 0x000003FB,
        "APIST   ", 0xAADA2A98, 0x00000303,
        "CPU0CST ", 0xAADA1018, 0x000008A9,
        "APCST   ", 0xAADA0D98, 0x00000119
    })

These are the names/addresses/lengths of the tables you need to extract manually with:

    acpidump --addr 0xAADA3918 --length 0x000003FB >CPU0IST.dat
    acpidump --addr 0xAADA2A98 --length 0x00000303 >APIST.dat

They can possibly already be found in /sys/firmware/acpi/tables/, I'm not sure.

But you could also just give my two patches a try and show us:

    cat /sys/devices/system/cpu/cpu*/cpufreq/shared_type

If it shows CPUFREQ_SHARED_TYPE_ALL (2), it's worth trying the boot param mentioned in comment #21.
Here's what I have:

    # ls /sys/firmware/acpi/tables/
    APIC  DSDT     FACP  HPET  SLIC   SSDT2  SSDT4  SSDT6  SSDT8
    BOOT  dynamic  FACS  MCFG  SSDT1  SSDT3  SSDT5  SSDT7  TCPA

I guess you want the SSDT* files. Do you need any other files from there?

Further, I applied your two patches to 2.6.36-rc8, and without the boot parameter I'm getting:

    # cat /sys/devices/system/cpu/cpu*/cpufreq/shared_type
    2
    2

With processor.disable_hw_coord=1 I'm getting the exact same thing:

    # uname -rs
    Linux 2.6.36-rc8-pgzh
    # dmesg | grep hw_coord
    Kernel command line: BOOT_IMAGE=/vmlinuz.exp root=/dev/sda3 ro processor.disable_hw_coord=1
    centauri:/home/pgzh# cat /sys/devices/system/cpu/cpu*/cpufreq/shared_type
    2
    2

There's no change in CPUFREQ behavior either: I'm stuck at the lowest frequency unless I lower up_threshold.

In the CPUFREQ drivers section in kconfig I have the following:

    # CONFIG_X86_PCC_CPUFREQ is not set

Should this be enabled? I only have the ACPI driver enabled right now (CONFIG_X86_ACPI_CPUFREQ=m).
Hi all,

So, with my Core i5 520 M (2 physical cores, 4 logical cores):

* 2.6.32.15 (ubuntu), 2.6.35.7 (kernel.org), 2.6.35.7 (kernel.org + 2 patches):
  - The values of /sys/devices/system/cpu/cpu*/cpufreq/affected_cpus are "0", "1", "2" and "3" for cpu0, cpu1, cpu2 and cpu3 respectively.
  - The output of /sys/devices/system/cpu/cpu*/cpufreq/related_cpus is always "0 1 2 3".
  No problem.

* 2.6.35.7 (kernel.org + 2 patches), with and without the boot parameter "processor.disable_hw_coord=1":
  - The output of /sys/devices/system/cpu/cpu*/cpufreq/shared_type is always 1, the default value (CPUFREQ_SHARED_TYPE_HW). So, Thomas, according to what you said, there is something wrong here...
  - cpufreq-info returns the same result as before (lowest frequency).
  - /sys/devices/system/cpu/cpufreq/ondemand/up_threshold is 95 by default.
  - Lowering up_threshold has no effect for me.
I expect acpi-cpufreq is fundamentally broken with respect to HW_ALL coordination. The only thing the acpi-cpufreq driver or the cpufreq subsystem takes into account in the HW_ALL case is to make sure that the same governor is running on all dependent CPUs if CONFIG_HOTPLUG_CPU is set in .config. Otherwise the dependent CPUs are only shown as "related" in sysfs, that's all. The only thing which makes me wonder is: why has this not come up earlier?

From the ACPI spec it's impossible to guess how the OS should deal with HW_ALL. Googling it leads to this bug :) and one interesting discussion:
http://www.mail-archive.com/linux-acpi@vger.kernel.org/msg11682.html
-> adding Len and Rui to CC.
But from there it's also not 100% clear. While SW_ALL is very clear, I could imagine the difference between HW_ALL and SW_ANY is that the (MSR/HW) status registers may or may not get updated. Or that the HW may or may not transition the other core(s) into the same state and SW has to re-evaluate (which would be rather stupid; the SW_ALL algorithm should apply then).

Hmm, I found "14.2 P-STATE HARDWARE COORDINATION" in the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3A: System Programming Guide, Part 1. But the code in there is so poor. Essentially it tells us to use aperf/mperf for switching decisions. The part saying that aperf/mperf should get reset to 0 should vanish from this document: it's possible to write an algorithm which can handle register overflows, so setting them back to zero is wrong. And this comment there:

    // This example does not cover the additional logic or algorithms
    // necessary to coordinate multiple logical processors to a target P-state.

makes me wonder whether we also miss this additional logic in acpi-cpufreq/ondemand. The ondemand governor must know about the dependency and look at the utilization of all dependent cores when making decisions, which is not the case.
I expect Heinz and Peter get "half way correct" switching at an up_threshold of about 52 because they have two "related" cores. Vyncere, you have 4 dependent cores; if your core(s) are switched up with an up_threshold of 25, I expect you see the same bug.

My patches, or rather workaround, may help for Heinz's BIOS, but may not for others. Also, this must get fixed properly. For this, some input from Intel people would be great on how HW_ALL must be handled, or better, how it's done on Windows.

You could try to find out how your HW behaves by not loading any cpufreq driver, getting the msr-tools package, setting the frequency on single cores manually and then reading out the status MSR to see whether the dependent cores switched as well (the question is whether the status is true then; this may be CPU specific and may have nothing to do with how the OS must implement the HW_ALL algorithm). The MSRs are:

    #define MSR_IA32_PERF_STATUS 0x00000198
    #define MSR_IA32_PERF_CTL    0x00000199

You have to be careful that only 16 bits (0-15, compare with chapter 14.3.2.2 in the above mentioned document) are used when you write to PERF_CTL; you have to read out and keep the others and write them back as well.

Long story short (I provided my whole research so that others can comment on whether I have a thinko): HW_ALL is about taking aperf/mperf into account. We do not rely on the BIOS, but determine that already by reading the aperf/mperf capabilities of the CPU directly via cpuid. Still, the governor must consider all dependent cores, which is currently not the case and which is a major bug. The question still is whether all cores (like SW_ALL) or only one core (like SW_ANY) is enough to switch. SW_ALL should work for sure -> I will provide a patch.
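Turning the name/address/length triples from such a Package into acpidump invocations is mechanical, so here is a small sketch of it. The helper name and the ".dat" output naming are assumptions for illustration; the --addr and --length options are the ones used in the commands quoted in this thread.

```python
def acpidump_commands(package):
    """Build one acpidump command per (name, address, length) triple.

    `package` is the flat list as it appears in the ASL Package:
    [name, addr, length, name, addr, length, ...].
    """
    commands = []
    for i in range(0, len(package) - 2, 3):
        name, addr, length = package[i:i + 3]
        commands.append(
            "acpidump --addr 0x%08X --length 0x%08X >%s.dat"
            % (addr, length, name.strip())
        )
    return commands


# The Package from the comment above.
ssdt = [
    "CPU0IST ", 0xAADA3918, 0x000003FB,
    "APIST ", 0xAADA2A98, 0x00000303,
    "CPU0CST ", 0xAADA1018, 0x000008A9,
    "APCST ", 0xAADA0D98, 0x00000119,
]
for command in acpidump_commands(ssdt):
    print(command)
```

Generating the commands from the table keeps each output file name paired with the address it was dumped from.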
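The point about register overflows can be illustrated with a short sketch: if the counters are treated as free-running 64-bit values, the difference between two readings stays correct across a wraparound, so nothing needs to be reset to zero. This is not the kernel's implementation; the function names and the reference_khz parameter (the frequency that an aperf/mperf ratio of 1 is assumed to correspond to) are assumptions.

```python
MASK64 = (1 << 64) - 1


def counter_delta(previous, current):
    """Difference of two 64-bit counter readings, correct across one wrap."""
    return (current - previous) & MASK64


def average_khz(reference_khz, aperf_prev, aperf_now, mperf_prev, mperf_now):
    """Average frequency over the interval, scaled by the aperf/mperf ratio."""
    d_aperf = counter_delta(aperf_prev, aperf_now)
    d_mperf = counter_delta(mperf_prev, mperf_now)
    if d_mperf == 0:
        return None  # no reference ticks elapsed, nothing to report
    return reference_khz * d_aperf // d_mperf


# The first reading is just below the 64-bit limit, the second just after it.
print(counter_delta(MASK64 - 9, 10))               # 20
print(average_khz(2400000, 0, 500, 0, 1000))       # 1200000
```

A governor that looks at all dependent cores would compute this per core and then combine the results; how to combine them is exactly the open question in this comment.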
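The read-modify-write rule for PERF_CTL can be sketched as a pure function. The register values below are made up and are not real P-state encodings; only the masking logic is the point. Actually reading and writing the MSR (for example with msr-tools) is left out.

```python
MASK64 = (1 << 64) - 1
PERF_CTL_LOW16 = 0xFFFF


def merge_perf_ctl(current_value, new_low16):
    """Return the value to write back: new bits 0-15, all other bits kept."""
    if not 0 <= new_low16 <= PERF_CTL_LOW16:
        raise ValueError("new_low16 must fit into 16 bits")
    preserved = current_value & (MASK64 ^ PERF_CTL_LOW16)
    return preserved | new_low16


# Hypothetical current register content with bit 32 set.
current = 0x0000000100000A12
print(hex(merge_perf_ctl(current, 0x1234)))  # 0x100001234
```

The value passed as current_value must come from a fresh read of the register, otherwise the preserved bits may be stale.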
Created attachment 35352 [details] CPUFREQ: Fix HW_ALL core dependencies Please give it a try...
I applied the patch "CPUFREQ: Fix HW_ALL core dependencies" to a vanilla 2.6.36 (without any of the previously posted patches) and it did NOT fix the problem for me. I'm still stuck at the lowest freq until I lower up_threshold.
Thanks. I still think not considering core dependencies in the HW_ALL case is wrong, but this needs further/separate discussion and evaluation.

Peter, as you said, with exactly the same HW and kernel version/.config, this one is broken:

    Intel Core2 Duo T7300 2.0 GHz processor

and this one is not:

    Intel Core2 Quad Q9550

Could you show us:

    cat /sys/devices/system/clocksource/clocksource0/available_clocksource

and

    cat /sys/devices/system/clocksource/clocksource0/current_clocksource

Does it make a difference to switch, e.g. to hpet?

    echo xy >/sys/devices/system/clocksource/clocksource0/current_clocksource

Ok, I checked Heinz's and Peter's dmesg. Heinz explicitly enables hpet via boot param (what is this for?):

    clocksource=hpet acpi_skip_timer_override

And Peter has:

    Marking TSC unstable due to TSC halts in idle
    Switching to clocksource hpet

It's always only CPU0 not switching up, right? It looks like something is wrong with hpet.
I also wonder: if you bind a task that uses 100% CPU to CPU0:

    numactl --physcpubind=0 cat /dev/zero >/dev/null

does top (hit "1" to see each CPU's utilization) really show 100% (sy + us) running and 0% idle time?
    numactl --physcpubind=0 cat /dev/zero >/dev/null

produces about 95% sys and 5% us load on _ONE_ core. Idle goes down to 0.0% immediately. This behavior does not change when changing up_threshold.

    # cat /sys/devices/system/clocksource/clocksource0/available_clocksource
    hpet acpi_pm
    # cat /sys/devices/system/clocksource/clocksource0/current_clocksource
    hpet

So my system is already using the HPET clocksource. Changing to acpi_pm has no effect on CPU frequency scaling (with the default up_threshold, of course). The system did switch to acpi_pm, as dmesg reveals:

    Switching to clocksource acpi_pm

Concerning HPET and TSC, I got the following from dmesg:

    ACPI: HPET 000000007f736b66 00038 (v01 TOSHIB A0054 20070816 TASM 04010000)
    ACPI: HPET id: 0x8086a201 base: 0xfed00000
    hpet clockevent registered
    Fast TSC calibration failed
    TSC: PIT calibration matches HPET. 1 loops
    HPET: 3 timers in total, 0 timers will be used for per-cpu timer
    hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
    hpet0: 3 comparators, 64-bit 14.318180 MHz counter
    Switching to clocksource tsc
    Marking TSC unstable due to TSC halts in idle
    Switching to clocksource hpet
    rtc0: alarms up to one year, 114 bytes nvram, hpet irqs
    CE: hpet increased min_delta_ns to 7500 nsec
    CE: hpet increased min_delta_ns to 11250 nsec

The "hpet increased min_delta_ns" message shows up on the system with the Core2 Quad Q9550 as well, so evidently it does not affect CPU frequency scaling either. To me everything else looks fine.

BTW, do you still need the ACPI tables you mentioned in Comment #24? If so, please tell me which of the ones I have (see Comment #25) you would like.
I wonder why Heinz has:

    clocksource=hpet acpi_skip_timer_override

Peter's machine has an unstable TSC. That is strange, because the cpuflags show constant_tsc. So it seems it fails some boot check and is found not to increment constantly. Hmm, you both seem to have problems with the TSC... aperf/mperf may be fed by the same internal clock as the TSC. If the average frequency calculated via aperf/mperf is very low, the frequency would never get ramped up. It's hard to believe, because you have such different CPUs, but it looks like you both have broken TSC/aperf/mperf timers on CPU 0? At least Peter has?

Another patch to try. Please use the boot param

    acpi_cpufreq.disable_average=1

and check dmesg to confirm that it got applied:

    "acpi-cpufreq: average (aperf/mperf) accounting disabled by user"
Created attachment 35492 [details] Patch to workaround the possible HW defect If this really helps, please also try cpufreq-aperf tool from a latest cpufrequtils package and check what kind of broken values aperf/mperf provide on CPU0.
    # dmesg | grep acpi-cpufreq
    acpi-cpufreq: average (aperf/mperf) accounting disabled by user
    acpi-cpufreq: average (aperf/mperf) accounting disabled by user

This patch actually *FIXED* the problem for me.

Please give me some instructions on how to use cpufreq-aperf. When running it, it gives me output like this (with the patched kernel):

    # cpufreq-aperf
    CPU  Average freq(KHz)  Time in C0      Time in Cx      C0 percentage
    000  0500250            00 sec 041 ms   00 sec 958 ms   04
    001  0460230            00 sec 056 ms   00 sec 943 ms   05
    000  0500250            00 sec 045 ms   00 sec 954 ms   04
    001  0620310            00 sec 056 ms   00 sec 943 ms   05
    000  0480240            00 sec 033 ms   00 sec 966 ms   03
    001  0520260            00 sec 054 ms   00 sec 945 ms   05
    000  0580290            00 sec 109 ms   00 sec 890 ms   10
    001  0580290            00 sec 123 ms   00 sec 876 ms   12
    000  0560280            00 sec 179 ms   00 sec 820 ms   17
    001  0540270            00 sec 239 ms   00 sec 760 ms   23

Do I have to create load on one or both CPU cores? Do I have to pass particular options? And if I just have to run it, how long should I run it?
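For comparing runs like the one above, a small parser can help. This is a sketch that assumes the row layout shown in this thread (CPU, average frequency in kHz, time in C0, time in Cx, C0 percentage); the function names are made up and other cpufreq-aperf versions may print a different layout.

```python
import re

# One data row: CPU, average kHz, "NN sec NNN ms" twice, C0 percentage.
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+sec\s+(\d+)\s+ms"
    r"\s+(\d+)\s+sec\s+(\d+)\s+ms\s+(\d+)\s*$"
)


def parse_row(line):
    """Parse one data row; return None for headers or anything else."""
    match = ROW.match(line)
    if match is None:
        return None
    cpu, khz, c0_s, c0_ms, cx_s, cx_ms, c0_pct = (int(g) for g in match.groups())
    return {
        "cpu": cpu,
        "avg_khz": khz,
        "c0_ms": c0_s * 1000 + c0_ms,
        "cx_ms": cx_s * 1000 + cx_ms,
        "c0_percent": c0_pct,
    }


def mean_khz_per_cpu(lines):
    """Average the reported frequency per CPU over all parsed rows."""
    totals = {}
    for line in lines:
        row = parse_row(line)
        if row is None:
            continue
        total, count = totals.get(row["cpu"], (0, 0))
        totals[row["cpu"]] = (total + row["avg_khz"], count + 1)
    return {cpu: total // count for cpu, (total, count) in totals.items()}
```

Feeding it the captured output of an idle run and of a loaded run gives two per-CPU averages that are easy to put side by side.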
Hi everyone, Hi Thomas,

Here are some results on my Thinkpad T410, Intel Core i5 520 M.

* 2.6.32.15 (reference kernel with functional cpufreq)
  - Boot params: None

    # cat /sys/devices/system/clocksource/clocksource0/available_clocksource
    tsc hpet acpi_pm
    # cat /sys/devices/system/clocksource/clocksource0/current_clocksource
    tsc

    # While CPU is idling: cpufreq-aperf
    CPU  Average freq(KHz)  Time in C0      Time in Cx      C0 percentage
    000  1824000            00 sec 067 ms   00 sec 932 ms   06
    001  1392000            00 sec 007 ms   00 sec 992 ms   00
    002  1272000            00 sec 010 ms   00 sec 989 ms   01
    003  1488000            00 sec 005 ms   00 sec 994 ms   00
    000  1656000            00 sec 052 ms   00 sec 947 ms   05
    001  2232000            00 sec 016 ms   00 sec 983 ms   01
    002  1272000            00 sec 009 ms   00 sec 990 ms   00
    003  1464000            00 sec 005 ms   00 sec 994 ms   00
    000  1248000            00 sec 026 ms   00 sec 973 ms   02
    001  1992000            00 sec 043 ms   00 sec 956 ms   04
    002  1272000            00 sec 011 ms   00 sec 988 ms   01
    003  1488000            00 sec 006 ms   00 sec 993 ms   00

    # While kernel is compiling (make -j 3): cpufreq-aperf
    CPU  Average freq(KHz)  Time in C0      Time in Cx      C0 percentage
    000  2664000            00 sec 997 ms   00 sec 002 ms   99
    001  2664000            00 sec 512 ms   00 sec 487 ms   51
    002  2664000            00 sec 812 ms   00 sec 187 ms   81
    003  2664000            00 sec 876 ms   00 sec 123 ms   87
    000  2664000            00 sec 771 ms   00 sec 228 ms   77
    001  2664000            00 sec 857 ms   00 sec 142 ms   85
    002  2664000            00 sec 814 ms   00 sec 185 ms   81
    003  2664000            00 sec 731 ms   00 sec 268 ms   73
    000  2664000            00 sec 446 ms   00 sec 553 ms   44
    001  2664000            00 sec 885 ms   00 sec 114 ms   88
    002  2664000            00 sec 962 ms   00 sec 037 ms   96
    003  2664000            00 sec 990 ms   00 sec 009 ms   99

    # cpufreq-info
    current policy: frequency should be within 1.20 GHz and 2.40 GHz.
* 2.6.36 (+ 2 Patches HW_COORD, SHARED_TYPE)
  - Boot params: None

    # cat /sys/devices/system/clocksource/clocksource0/available_clocksource
    tsc hpet acpi_pm
    # cat /sys/devices/system/clocksource/clocksource0/current_clocksource
    tsc

    # While CPU is idling: cpufreq-aperf
    CPU  Average freq(KHz)  Time in C0      Time in Cx      C0 percentage
    000  1200000            00 sec 066 ms   00 sec 933 ms   06
    001  1224000            00 sec 038 ms   00 sec 961 ms   03
    002  1368000            00 sec 002 ms   00 sec 997 ms   00
    003  1320000            00 sec 002 ms   00 sec 997 ms   00
    000  1200000            00 sec 055 ms   00 sec 944 ms   05
    001  1224000            00 sec 033 ms   00 sec 966 ms   03
    002  1344000            00 sec 002 ms   00 sec 997 ms   00
    003  1272000            00 sec 001 ms   00 sec 998 ms   00
    000  1224000            00 sec 057 ms   00 sec 942 ms   05
    001  1224000            00 sec 033 ms   00 sec 966 ms   03
    002  1392000            00 sec 002 ms   00 sec 997 ms   00
    003  1344000            00 sec 001 ms   00 sec 998 ms   00

    # While kernel is compiling (make -j 3): cpufreq-aperf
    CPU  Average freq(KHz)  Time in C0      Time in Cx      C0 percentage
    000  1176000            00 sec 585 ms   00 sec 414 ms   58
    001  1176000            00 sec 719 ms   00 sec 280 ms   71
    002  1200000            00 sec 825 ms   00 sec 174 ms   82
    003  1176000            00 sec 951 ms   00 sec 048 ms   95
    000  1200000            00 sec 874 ms   00 sec 125 ms   87
    001  1176000            00 sec 864 ms   00 sec 135 ms   86
    002  1200000            00 sec 776 ms   00 sec 223 ms   77
    003  1200000            00 sec 586 ms   00 sec 413 ms   58
    000  1176000            00 sec 903 ms   00 sec 096 ms   90
    001  1200000            00 sec 841 ms   00 sec 158 ms   84
    002  1200000            00 sec 682 ms   00 sec 317 ms   68
    003  1176000            00 sec 702 ms   00 sec 297 ms   70

    # cpufreq-info
    current policy: frequency should be within 1.20 GHz and 1.20 GHz.
* 2.6.36 (+ 3 Patches HW_COORD, SHARED_TYPE, HW_ALL)
  - Boot params: None

    # cat /sys/devices/system/clocksource/clocksource0/available_clocksource
    tsc hpet acpi_pm
    # cat /sys/devices/system/clocksource/clocksource0/current_clocksource
    tsc

    # While CPU is idling: cpufreq-aperf
    CPU  Average freq(KHz)  Time in C0      Time in Cx      C0 percentage
    000  1224000            00 sec 053 ms   00 sec 946 ms   05
    001  1248000            00 sec 022 ms   00 sec 977 ms   02
    002  1320000            00 sec 002 ms   00 sec 997 ms   00
    003  1344000            00 sec 002 ms   00 sec 997 ms   00
    000  1224000            00 sec 061 ms   00 sec 938 ms   06
    001  1272000            00 sec 018 ms   00 sec 981 ms   01
    002  1344000            00 sec 002 ms   00 sec 997 ms   00
    003  1344000            00 sec 003 ms   00 sec 996 ms   00
    000  1224000            00 sec 060 ms   00 sec 939 ms   06
    001  1248000            00 sec 015 ms   00 sec 984 ms   01
    002  1416000            00 sec 002 ms   00 sec 997 ms   00
    003  1368000            00 sec 002 ms   00 sec 997 ms   00

    # While kernel is compiling (make -j 3): cpufreq-aperf
    CPU  Average freq(KHz)  Time in C0      Time in Cx      C0 percentage
    000  1200000            00 sec 101 ms   00 sec 898 ms   10
    001  1200000            00 sec 079 ms   00 sec 920 ms   07
    002  1200000            00 sec 201 ms   00 sec 798 ms   20
    003  1200000            00 sec 828 ms   00 sec 171 ms   82
    000  1200000            00 sec 112 ms   00 sec 887 ms   11
    001  1200000            00 sec 515 ms   00 sec 484 ms   51
    002  1200000            00 sec 006 ms   00 sec 993 ms   00
    003  1200000            00 sec 528 ms   00 sec 471 ms   52
    000  1200000            00 sec 567 ms   00 sec 432 ms   56
    001  1200000            00 sec 275 ms   00 sec 724 ms   27
    002  1176000            00 sec 253 ms   00 sec 746 ms   25
    003  1176000            00 sec 120 ms   00 sec 879 ms   12

    # cpufreq-info
    current policy: frequency should be within 1.20 GHz and 1.20 GHz.
* 2.6.36 (+ 4 Patches HW_COORD, SHARED_TYPE, HW_ALL, HW_STATISTICS)
  - Boot params for patch 4: acpi_cpufreq.disable_average=1

    # dmesg | grep cpufreq
    acpi-cpufreq: average (aperf/mperf) accounting disabled by user
    # cat /sys/devices/system/clocksource/clocksource0/available_clocksource
    tsc hpet acpi_pm
    # cat /sys/devices/system/clocksource/clocksource0/current_clocksource
    tsc

    # While CPU is idling: cpufreq-aperf
    CPU  Average freq(KHz)  Time in C0      Time in Cx      C0 percentage
    000  1200000            00 sec 084 ms   00 sec 915 ms   08
    001  1248000            00 sec 016 ms   00 sec 983 ms   01
    002  1368000            00 sec 011 ms   00 sec 988 ms   01
    003  1344000            00 sec 003 ms   00 sec 996 ms   00
    000  1224000            00 sec 056 ms   00 sec 943 ms   05
    001  1224000            00 sec 059 ms   00 sec 940 ms   05
    002  1320000            00 sec 010 ms   00 sec 989 ms   01
    003  1296000            00 sec 004 ms   00 sec 995 ms   00
    000  1224000            00 sec 054 ms   00 sec 945 ms   05
    001  1224000            00 sec 033 ms   00 sec 966 ms   03
    002  1368000            00 sec 010 ms   00 sec 989 ms   01
    003  1368000            00 sec 003 ms   00 sec 996 ms   00

    # While kernel is compiling (make -j 3): cpufreq-aperf
    CPU  Average freq(KHz)  Time in C0      Time in Cx      C0 percentage
    000  1176000            00 sec 815 ms   00 sec 184 ms   81
    001  1200000            00 sec 532 ms   00 sec 467 ms   53
    002  1176000            00 sec 997 ms   00 sec 002 ms   99
    003  1200000            00 sec 997 ms   00 sec 002 ms   99
    000  1200000            00 sec 501 ms   00 sec 498 ms   50
    001  1200000            00 sec 731 ms   00 sec 268 ms   73
    002  1176000            00 sec 997 ms   00 sec 002 ms   99
    003  1200000            00 sec 997 ms   00 sec 002 ms   99
    000  1176000            00 sec 714 ms   00 sec 285 ms   71
    001  1176000            00 sec 889 ms   00 sec 110 ms   88
    002  1176000            00 sec 934 ms   00 sec 065 ms   93
    003  1176000            00 sec 778 ms   00 sec 221 ms   77

    # cpufreq-info
    current policy: frequency should be within 1.20 GHz and 1.20 GHz.

It's very interesting. With the good old 2.6.32 kernel (with working cpufreq), while the CPU is idling, according to cpufreq-aperf, the clock speeds fluctuate between 1.20 GHz and 1.80 GHz, sometimes up to 2.40 GHz. Hmm... that's not very good for power saving...
It may explain why my CPU is always near 50°C. It's slightly better with the 2.6.36 kernel (thanks to intel_idle, maybe? I don't know). I don't know whether these are really the true clock speeds, since my conky monitor always shows me the 4 virtual cores at 1.2 GHz... But I think that cpufreq-aperf is more accurate than anything else. cpufreq-info always reports a 1.20 GHz max limit.

In my case, the problem is not solved. With the 3 different kernel configurations (2.6.36 + patches), under high CPU load, the clock speeds still remain at the lowest state. Ironically, cpufreq-aperf shows that the frequencies never exceed 1.20 GHz at full load, contrary to idle time! It's an upside-down world...
vyncere: Your problem is different. Could it be that

    cat /sys/devices/system/cpu/cpu0/cpufreq/bios_limit

does not show the max frequency? Then it is related to:
https://bugzilla.kernel.org/show_bug.cgi?id=14771
But in contrast to Peter's problem it's BIOS related (and the root cause may be different and related to ACPI). If the above (bios_limit) is true, best update your BIOS and check related BIOS options. If the processor.ignore_ppc=1 workaround and BIOS fiddling don't help, please open a new bug and assign it to the ACPI component.

Peter: Your CPU counters (tsc, aperf, mperf) are broken. Heinz may have a similar problem, and if this is more common, a check could be added that tests whether the getavg values are far away from min/max, in which case getavg is not considered anymore. As this is a hot path, this could be implemented similarly to the way the TSC is checked at boot, e.g. test 3 times over 100 ms whether getavg is inside the ~min/~max limits, and if not, don't use it. If this normally never happens, it might not be worth it and just the module param could be added.
-> Waiting for a test from Heinz; either the boot param and/or a runtime timer check should be added.
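The proposed boot-time sanity check could look roughly like the following sketch. It only models the decision, not the sampling: the function name, the 10% tolerance and the sample values are assumptions, and the three samples would in practice come from measuring getavg over roughly 100 ms each.

```python
def getavg_looks_sane(samples_khz, min_khz, max_khz, tolerance=0.10, required=3):
    """Accept getavg only if `required` samples all lie near [min, max]."""
    if len(samples_khz) < required:
        return False
    low = min_khz * (1 - tolerance)
    high = max_khz * (1 + tolerance)
    return all(low <= sample <= high for sample in samples_khz[:required])


# Hypothetical limits of 1.20-2.40 GHz with plausible and implausible samples.
print(getavg_looks_sane([1200000, 1800000, 2400000], 1200000, 2400000))  # True
print(getavg_looks_sane([500250, 460230, 500250], 1200000, 2400000))     # False
```

If the check fails, the driver would simply fall back to not using the aperf/mperf average, which is what the disable_average boot param does by hand. A real implementation would also need a tolerance that allows for frequencies above the nominal maximum, which this sketch does not address.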
Some results with :
- acpi_cpufreq.disable_average=1 clocksource=hpet
and
- acpi_cpufreq.disable_average=1 clocksource=hpet acpi_skip_timer_override

* 2.6.36 (+ 4 Patches HW_COORD, SHARED_TYPE, HW_ALL, HW_STATISTICS)
- Boot params for patch 4 : acpi_cpufreq.disable_average=1
- Boot params : clocksource=hpet

# dmesg | grep cpufreq
acpi-cpufreq: average (aperf/mperf) accounting disabled by user

# cat /sys/devices/system/clocksource/clocksource0/available_clocksource
tsc hpet acpi_pm

# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
hpet

# While CPU is idling : cpufreq-aperf
CPU  Average freq(KHz)  Time in C0     Time in Cx     C0 percentage
000  2688000            00 sec 033 ms  00 sec 966 ms  03
001  1944000            00 sec 017 ms  00 sec 982 ms  01
002  1776000            00 sec 001 ms  00 sec 998 ms  00
003  1536000            00 sec 002 ms  00 sec 997 ms  00
000  2496000            00 sec 028 ms  00 sec 971 ms  02
001  1656000            00 sec 032 ms  00 sec 967 ms  03
002  1536000            00 sec 018 ms  00 sec 981 ms  01
003  1416000            00 sec 003 ms  00 sec 996 ms  00
000  2712000            00 sec 036 ms  00 sec 963 ms  03
001  2064000            00 sec 014 ms  00 sec 985 ms  01
002  1416000            00 sec 001 ms  00 sec 998 ms  00
003  1392000            00 sec 001 ms  00 sec 998 ms  00
000  2688000            00 sec 034 ms  00 sec 965 ms  03
001  1920000            00 sec 017 ms  00 sec 982 ms  01
002  1584000            00 sec 001 ms  00 sec 998 ms  00
003  1536000            00 sec 002 ms  00 sec 997 ms  00
000  2712000            00 sec 041 ms  00 sec 958 ms  04
001  1968000            00 sec 023 ms  00 sec 976 ms  02
002  1512000            00 sec 008 ms  00 sec 991 ms  00
003  1560000            00 sec 004 ms  00 sec 995 ms  00
000  2784000            00 sec 040 ms  00 sec 959 ms  04
001  1896000            00 sec 022 ms  00 sec 977 ms  02
002  1416000            00 sec 008 ms  00 sec 991 ms  00
003  1488000            00 sec 003 ms  00 sec 996 ms  00
000  2736000            00 sec 042 ms  00 sec 957 ms  04
001  2160000            00 sec 018 ms  00 sec 981 ms  01
002  1512000            00 sec 007 ms  00 sec 992 ms  00
003  1608000            00 sec 003 ms  00 sec 996 ms  00

# While kernel is compiling (make -j 3) : cpufreq-aperf
CPU  Average freq(KHz)  Time in C0     Time in Cx     C0 percentage
000  2640000            00 sec 767 ms  00 sec 232 ms  76
001  2304000            00 sec 960 ms  00 sec 039 ms  96
002  2112000            00 sec 641 ms  00 sec 358 ms  64
003  2256000            00 sec 886 ms  00 sec 113 ms  88
000  2640000            00 sec 709 ms  00 sec 290 ms  70
001  2208000            00 sec 969 ms  00 sec 030 ms  96
002  2016000            00 sec 677 ms  00 sec 322 ms  67
003  2160000            00 sec 881 ms  00 sec 118 ms  88
000  2640000            00 sec 844 ms  00 sec 155 ms  84
001  2400000            00 sec 937 ms  00 sec 062 ms  93
002  2328000            00 sec 695 ms  00 sec 304 ms  69
003  2376000            00 sec 866 ms  00 sec 133 ms  86
000  2640000            00 sec 882 ms  00 sec 117 ms  88
001  2424000            00 sec 756 ms  00 sec 243 ms  75
002  2400000            00 sec 674 ms  00 sec 325 ms  67
003  2472000            00 sec 991 ms  00 sec 008 ms  99

# cpufreq-info
current policy: frequency should be within 1.20 GHz and 1.20 GHz.

* 2.6.36 (+ 4 Patches HW_COORD, SHARED_TYPE, HW_ALL, HW_STATISTICS)
- Boot params for patch 4 : acpi_cpufreq.disable_average=1
- Boot params : clocksource=hpet acpi_skip_timer_override

# dmesg | grep cpufreq
acpi-cpufreq: average (aperf/mperf) accounting disabled by user

# cat /sys/devices/system/clocksource/clocksource0/available_clocksource
tsc hpet acpi_pm

# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
hpet

# dmesg | grep apic
ACPI: Core revision 20100702
Setting APIC routing to flat
..TIMER: vector=0x30 apic1=0 pin1=0 apic2=-1 pin2=-1
..MP-BIOS bug: 8254 timer not connected to IO-APIC
...trying to set up timer (IRQ0) through the 8259A ...
..... (found apic 0 pin 0) ...
....... works.
# dmesg | grep intel_idle
intel_idle: MWAIT substates: 0x1120
intel_idle: v0.4 model 0x25
intel_idle: lapic_timer_reliable_states 0xffffffff
ACPI: acpi_idle yielding to intel_idle

# While CPU is idling : cpufreq-aperf
CPU  Average freq(KHz)  Time in C0     Time in Cx     C0 percentage
000  2184000            00 sec 030 ms  00 sec 969 ms  03
001  1320000            00 sec 038 ms  00 sec 961 ms  03
002  1392000            00 sec 032 ms  00 sec 967 ms  03
003  1536000            00 sec 002 ms  00 sec 997 ms  00
000  1752000            00 sec 031 ms  00 sec 968 ms  03
001  1680000            00 sec 051 ms  00 sec 948 ms  05
002  1488000            00 sec 015 ms  00 sec 984 ms  01
003  1272000            00 sec 015 ms  00 sec 984 ms  01
000  2640000            00 sec 034 ms  00 sec 965 ms  03
001  1632000            00 sec 027 ms  00 sec 972 ms  02
002  1392000            00 sec 003 ms  00 sec 996 ms  00
003  1464000            00 sec 002 ms  00 sec 997 ms  00
000  2712000            00 sec 038 ms  00 sec 961 ms  03
001  1776000            00 sec 020 ms  00 sec 979 ms  02
002  1464000            00 sec 002 ms  00 sec 997 ms  00
003  1488000            00 sec 003 ms  00 sec 996 ms  00

# cpufreq-info
current policy: frequency should be within 1.20 GHz and 1.20 GHz.

In all cases, cpufreq-info always reports a 1.20GHz max limit, but cpufreq-aperf shows the opposite: during high loads, frequencies manage to jump up to 2.64GHz. It took the same amount of time to compile the kernel as with my reference kernel (2.6.32) and the CPU reached 72°C. I can conclude that the CPU states reported by cpufreq-info are wrong. (My conky desktop monitor applet does not seem to read info from cpufreq-aperf or equivalent, because it always reports frequencies at 1.20GHz like cpufreq-info.)

With the hpet clocksource, average frequencies at idle time are very high, higher than with the tsc clocksource (which are already strangely high for idle time), especially for CPU 0 (2.7 GHz for 3% load!), but it does not seem to raise the temperature. I expect to have all my virtual cores at 1.20GHz (or less) at idle time, but even with my reference kernel, this ideal state was never reached.
With the acpi_skip_timer_override parameter, the kernel prints some extra IO-APIC related messages at boot time.
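The mismatch reported here (cpufreq-aperf averages above the limit that cpufreq-info shows) can be spotted mechanically. A minimal sketch, assuming the cpufreq-aperf column layout pasted in this report; rows_above_limit is a hypothetical helper name, not part of cpufrequtils.

```shell
#!/bin/sh
# Hypothetical helper: read cpufreq-aperf rows on stdin and print the CPUs
# whose measured average frequency (column 2, in kHz) exceeds LIMIT_KHZ.
rows_above_limit() {
    limit_khz=$1
    awk -v limit="$limit_khz" '
        # Data rows start with a three-digit CPU number; the header does not.
        $1 ~ /^[0-9][0-9][0-9]$/ && ($2 + 0) > (limit + 0) {
            printf "cpu %s: %d kHz\n", $1, $2
        }'
}
```

One possible use is to save some cpufreq-aperf output to a file and feed it in with the policy limit, for example `rows_above_limit "$(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq)" < aperf.log`, where aperf.log is an assumed file name.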
Thomas :

* 2.6.32.15 (reference kernel with functional cpufreq)
- Boot params : None
--> No /sys/devices/system/cpu/cpu0/cpufreq/bios_limit

* 2.6.36 (+ 4 Patches HW_COORD, SHARED_TYPE, HW_ALL, HW_STATISTICS)
- Boot params for patch 4 : acpi_cpufreq.disable_average=1
- Boot params : clocksource=hpet

# cat /sys/devices/system/cpu/cpu0/cpufreq/bios_limit
1199000

So, 1.20GHz. Thank you. I will investigate the BIOS/ACPI track.
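For anyone repeating this check, the comparison between bios_limit and the hardware maximum can be written as a small shell sketch. bios_limit_status is a hypothetical helper, and the 2400000 kHz maximum in the example is an assumption based on the 2.40GHz figure mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helper: bios_limit_status BIOS_LIMIT_KHZ CPUINFO_MAX_KHZ
# Reports whether the BIOS caps the frequency below the hardware maximum.
bios_limit_status() {
    if [ "$1" -lt "$2" ]; then
        echo "limited by BIOS: $1 kHz of $2 kHz"
    else
        echo "not limited by BIOS"
    fi
}

# On a live system the two values would come from sysfs, e.g.:
#   d=/sys/devices/system/cpu/cpu0/cpufreq
#   bios_limit_status "$(cat "$d/bios_limit")" "$(cat "$d/cpuinfo_max_freq")"
```

Called as `bios_limit_status 1199000 2400000`, it reports the limited case seen above.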
Seems like I am lucky in picking the broken CPUs... the Core2 Quad Q9550 I have also has a minor flaw: it has broken temperature sensors. Those are stuck at one temperature, no matter how much load I put on it :-P BTW, I never overclocked or fiddled with any of those CPUs and bought them unused and new...

So, do you need anything else from me? If it helps, with the patched kernel CPU frequency scaling tends to be a little slower to ramp the freq up and scales down a little earlier compared to the unpatched kernel with lowered up_threshold. So system responsiveness is a bit worse. You notice that, but it's not too bad; it definitely WORKS as it's supposed to. I'm now probably saving a bit of power compared to before.

Thanks for looking into this again! If you want me to try more sophisticated patches just let me know.
> So, do you need anything else from me?
Eh, yes, I found something else: You have the constant_tsc feature, but not the non-stop tsc feature. CPU idle drivers mark tsc unstable if C-states are used. If C-states also affect aperf/mperf timers, they must not be used either in this case.

Can you try the
idle=halt
boot param?

Argh, the intel idle driver does not recognize idle= overrides. Please try idle=halt with another patch I'll provide. Or try both params (then the patch should not be needed):
intel_idle.max_cstate=0 idle=halt

Does cpufreq-aperf then give you sane average frequency values in the range of min/max freq, and does the cpufreq subsystem work as expected?
Created attachment 35852 [details]
intel_idle: Do not load if user overrides idle function via idle= boot param

This test only makes sense if a cpuidle driver was in use before and the machine supports C2 and deeper sleep states. You can check (without the idle=halt override) whether your machine uses deeper sleep states via:
cat /sys/devices/system/cpu/cpu*/cpuidle/state*/name
> CPU idle drivers mark tsc unstable if C-states are used. If C-states also
> affect aperf/mperf timers, they must not used as well in this case.
That is wrong, sorry about that: aperf/mperf timers always stop in C-states, but the resulting average frequency during C0 (not idle) must still be correct.

Also the fact that only one CPU is affected very much points to defective HW. As long as this only shows up on one machine it's also not worth adding extra code to the kernel. If you want to work around this in self-built kernels, the best option is to remove the aperfmperf capabilities line from
arch/x86/kernel/include/asm/cpufeatures.h
as it's also used in the scheduler.

Still waiting some days for feedback from Heinz before closing this one as "documented". As it's a totally different CPU it's probably something else.
> Also the fact that only one CPU is affected very much points to defect HW.
I got only one CPU, but two CPU cores - by CPU you meant "CPU core", right?

> remove the aperfmperf capabilties line from arch/x86/kernel/include/asm/cpufeatures.h
I'll try that with a vanilla kernel to see if I run into other problems with that. (That will remove the capability system-wide, not only for the cpufreq subsystem, right?)

If it helps: I installed Windows XP on another partition on my affected laptop, frequency scaling works just fine there...
Another possibility that does not require kernel rebuilding is to use the userspace governor and a cpufreq userspace daemon. I expect they do not use aperf/mperf. Here is a nice overview of packages that are out there (section "Other CPU Speed Utilities"):
http://www.gentoo.org/doc/en/power-management-guide.xml

> by CPU you meant "CPU core", right?
Yes.
Concerning the changes to arch/x86/kernel/include/asm/cpufeatures.h:

I can't just kill the line
#define X86_FEATURE_APERFMPERF (3*32+28) /* APERFMPERF */
since this will prevent the kernel from being compiled.

For now I changed the line to
#define X86_FEATURE_APERFMPERF (6*32+11) /* APERFMPERF */
which changes the checked feature to SSE5 instead of APERFMPERF. My CPU does not support SSE5, so this should report a non-set bit. Not the most elegant solution and of course not really portable to newer CPUs, but it sure does the trick.

Do you have a better idea to hard-code that feature to zero in the kernel code? Is there a CPUID bit that's 0 by default for all processors? I looked at related kernel code, but I don't think there's an easier or better fix at another place in the code, as the way I chose makes the feature reported as non-present for all code in the kernel.
> since this will prevent the kernel from being compiled.
Yep, you would need to touch the other places where it gets evaluated as well.

Easiest for you should be to take the patch you verified working from comment #34 (with the boot/module param described somewhere...):
acpi-cpufreq: Provide param to disable average HW statistics for broken timers

If you want to make sure the scheduler also does not make use of the timers you can additionally add something like:
struct cpuinfo_x86 *c;
...
c = &cpu_data(cpu);
clear_cpu_cap(c, X86_FEATURE_APERFMPERF);

This is an ugly hack, but should work for you.
I modified the patch like you suggested:

+    if (cpu_has(c, X86_FEATURE_APERFMPERF)) {
+        if (disable_average) {
+            printk(KERN_INFO "acpi-cpufreq: average (aperf/mperf) "
+                   "accounting disabled by user\n");
+            clear_cpu_cap(c, X86_FEATURE_APERFMPERF);
+        }
+        else
+            acpi_cpufreq_driver.getavg = cpufreq_get_measured_perf;
+    }

Will the scheduler adapt to the changed cpu features with only that single change? Since the sched code probably gets initialized earlier, I wonder if it will notice the change.

Anyway, the system feels a bit more responsive with the modification to the patch, but that might just be psychological, since I know that I made that change...
Hi all,

Thank you very much Thomas for your hint. As you suggested, I checked the BIOS limit (which was 1.20 GHz in my case). This morning I set the "Performance" profile in my BIOS for all modes (AC/DC Power + Battery), instead of the "Power-saving" one. Now the BIOS limit is 2.40GHz, and cpufreq manages to do its job without any patches (2.6.36).

Thank you very much again, tracking this bug was very instructive. ^^
> Will the scheduler adapt to the changed cpu features with only that single
> change?
Yes, the only part I found using it always checks for:
if (has_cpu_cap(X86_FEATURE_APERFMPERF))
    use_it()
So unsetting the cap effectively disables usage there on the next schedule. This may change in the future; as said, it's a hack...
A boot param disable_cpu_cap=xy would be perfect, but as most of these caps are evaluated early it's hard to implement.

vyncere: Looks like Linux works correctly according to your BIOS settings :)

Not sure whether I should set this to invalid. I set it to documented: if there are more machines with broken aperf/mperf timers, their owners might find it, and if this turns out to be a more common issue, a fix/workaround for the kernel can still be considered.
> if there are more machines with broken aperf/mperf timers Thomas, In what way are the aperf/mperf timers broken on this box?
Something already seems to go wrong with the TSC at early boot:
Fast TSC calibration failed
TSC: PIT calibration matches HPET. 1 loops

Most interesting are comments 33, 34 and 35. cpufreq-aperf (measuring the average freq using aperf/mperf) shows frequencies around 500 MHz, which is wrong (afaik the CPU only supports 800 MHz and higher freqs). That is the reason why the cpufreq subsystem, which takes aperf values into account to calculate the next frequency, never raises the frequency.

Removing the cpufreq code that looks at aperf/mperf values to calculate the next desired frequency fixed the problem for Peter, and the machine starts switching frequencies as expected.

Be aware that vyncere's problem is something totally different, but that came out later in the bug.

Just a guess: Could it be that the BIOS misconfigured some clock multiplier and tsc and mperf are running too slow? I didn't look at the rate tsc/mperf/aperf are really running at.
I believe I'm having a similar problem. My hardware is a Dell Latitude E5420 with an i5-2520M CPU. I have the latest Dell BIOS for this system.

I'm currently on the Ubuntu kernel 2.6.38-10-generic, and after I resume from suspend, CPU frequency scaling no longer works. Before suspending, cpufreq-aperf shows sane values. Afterwards, it gives frequencies of about 625MHz:

cpufreq-aperf
CPU  Average freq(KHz)  Time in C0     Time in Cx     C0 percentage
000  0625250            00 sec 077 ms  00 sec 922 ms  07
001  0625250            00 sec 007 ms  00 sec 992 ms  00
002  0600240            00 sec 098 ms  00 sec 901 ms  09
003  0625250            00 sec 002 ms  00 sec 997 ms  00

I believe that, because of this, the cpufreq scaling won't bump things up.

"That is the reason why the cpufreq subsystem, taking aperf values to calculate the next frequency into account never raises the frequency."

Any idea why it would only occur after a suspend/resume cycle and what I can do to fix the issue?
An update on my post above. Disabling TurboBoost in the BIOS seems to have resolved the issue. I wonder if TurboBoost somehow causes mis-reported aperf statistics. Interestingly enough, the ratios are the same as the turboboost ratio on this CPU. That is, 3.2GHz (Turbo)/2.5 GHz (Stock) == 800 MHz (Real minimum)/625 MHz (reported minimum via aperf). I don't know if it matters, just thought it was notable.
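The ratio observation can be checked with integer arithmetic: 3200 * 625 and 800 * 2500 both equal 2000000, so the two ratios match exactly (1.28). A small sketch that rescales a reported value by the turbo/stock ratio follows; rescale_khz is a hypothetical helper for illustration, not something cpufrequtils provides.

```shell
#!/bin/sh
# Hypothetical helper: rescale_khz REPORTED_KHZ TURBO_MHZ BASE_MHZ
# Multiplies a reported average frequency by the turbo/stock ratio,
# using integer arithmetic (the result is truncated).
rescale_khz() {
    echo $(( $1 * $2 / $3 ))
}
```

For example, `rescale_khz 625250 3200 2500` gives 800320, i.e. roughly the 800 MHz real minimum.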
You could use tools/power/x86/turbostat.c from the latest mainline kernel and replace two lines:
print_counters(cnt_delta);
with
dump_cnt(cnt_delta);
and compare with/without turboboost.

You could also use tools/power/cpupower/ with the debug option compiled in (Makefile) and then run
cpupower -d monitor -m Mperf
but this won't be that nicely formatted.

You may be able to disable turboboost at runtime via an MSR read, changing one bit and writing the value back. According to chapter
14.3.2.2 OS Control of Opportunistic Processor Performance Operation
of the Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3A, the bit is bit 32 (starting from 0) of the IA32_PERF_CTL MSR (0199H).

You have to make sure the msr driver is compiled in or loaded as a module (modprobe msr), then you can use msr-tools:
rdmsr 0x199
will show you the 64 bit register. Setting bit 32 disengages turbo mode, so if you boot with turboboost enabled you should find bit 32 clear, otherwise set.

If I haven't overlooked something you can enable/disable turbo mode via (rdmsr prints hex without a 0x prefix, hence the 0x in front):
IA32_PERF_CTL=0x`rdmsr 0x199`
# disable
wrmsr -a 0x199 $(((1 << 32) | $IA32_PERF_CTL))
# enable
wrmsr -a 0x199 $((~(1 << 32) & $IA32_PERF_CTL))

The -a option only exists in the latest msr-tools git version, which can be found here:
git://git.kernel.org/pub/scm/utils/cpu/msr-tools/msr-tools.git

Something to play with..., hopefully you find out something pointing to the root cause...
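The read-modify-write on bit 32 can be tried out on plain numbers before touching a real MSR. The sketch below only does the bit arithmetic; set_bit32, clear_bit32 and bit32_is_set are hypothetical helper names, the value 0x1a00 is made up, and which direction actually turns turbo off should be confirmed against the SDM before writing anything with wrmsr.

```shell
#!/bin/sh
# Pure bit arithmetic on a 64-bit register value; nothing here touches an MSR.
# All three helpers are hypothetical and take a value with a 0x prefix.

# Print the value with bit 32 set.
set_bit32() {
    printf '0x%x\n' "$(( $1 | (1 << 32) ))"
}

# Print the value with bit 32 cleared.
clear_bit32() {
    printf '0x%x\n' "$(( $1 & ~(1 << 32) ))"
}

# Succeed (exit status 0) if bit 32 is set in the value.
bit32_is_set() {
    [ "$(( ($1 >> 32) & 1 ))" -eq 1 ]
}
```

Since rdmsr prints hex without a prefix by default, a live value would be passed as something like `set_bit32 "0x$(rdmsr 0x199)"`.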
You can verify whether turboboost is enabled/disabled by:
- utilizing one core:
  cat /dev/zero >/dev/null &
- running turbostat or "cpupower monitor -m Mperf" and double checking whether the average frequency of the utilized core got boosted
The issue reported by the original poster is gone. Other people have come in and out of this bug report, some leaving with their issue solved, some not. If yours is not fixed, please open a new bug, because this one is closed.