Latest working kernel version:
Earliest failing kernel version: 2.6.27.10
Distribution: Gentoo
Hardware Environment: ThinkPad T41
Software Environment:
Problem Description:

Starting with kernel 2.6.27.10 I have sometimes observed that my ThinkPad T41 stays at a CPU frequency of 600 MHz when I run a lot of processes at nice level 3 and a few more at 19, even if I have some foreground jobs which normally force the CPU to go to 1700 MHz.

A typical scenario is to compile glibc on a Gentoo system with "make -j 2" and then run another make job at nice level 0, or try to start firefox (whose start time is OTOH slow enough even at 1.7 GHz).

I can quantify this observation with a quick & dirty command line like:

$> time factor 819734028463158891

I would expect a real value of 6-7 seconds and a user value of 5-6 seconds. However, when I ran it twice in a row I got:

tfoerste@n22 ~ $ time factor 819734028463158891
819734028463158891: 3 273244676154386297

real 0m51.658s
user 0m15.691s
sys 0m0.013s

tfoerste@n22 ~ $ time factor 819734028463158891
819734028463158891: 3 273244676154386297

real 0m19.136s
user 0m6.944s
sys 0m0.044s

BTW during startup I set:

$> echo 1 > /sys/devices/system/cpu/cpu0/cpufreq/ondemand/ignore_nice_load

My system:

tfoerste@n22 ~ $ uname -a
Linux n22 2.6.27-gentoo-r7 #12 Sun Dec 28 18:26:57 CET 2008 i686 Intel(R) Pentium(R) M processor 1700MHz GenuineIntel GNU/Linux

The relevant kernel config values are:

tfoerste@n22 ~/devel/wireshark/docbook $ zgrep -e GOV -e FREQ /proc/config.gz | grep -v '#'
CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_TABLE=y
CONFIG_CPU_FREQ_STAT=m
CONFIG_CPU_FREQ_STAT_DETAILS=y
CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
CONFIG_CPU_FREQ_GOV_POWERSAVE=m
CONFIG_CPU_FREQ_GOV_ONDEMAND=y
CONFIG_X86_ACPI_CPUFREQ=m
CONFIG_CPU_IDLE_GOV_LADDER=y
CONFIG_CPU_IDLE_GOV_MENU=y

Steps to reproduce:
Are you able to identify any earlier kernel version which didn't have this problem?
What is the latest working kernel version? Is it 2.6.27.9? And what is the load when you observe this? (Use top: user/system/idle load %.)
I'll test other kernel versions at the weekend, but FWIW if I run this as user root:

$> cd /usr/portage/sys-libs/glibc && nice -n 3 ebuild glibc-2.6.1.ebuild compile

and also run factor as root, then the CPU frequency is immediately changed to max. However, running factor as a normal user doesn't increase the CPU frequency.
Yes. ignore_nice_load was broken in recent ondemand. This was reported earlier in bugzilla #12310, and I have a test patch there. Can you check whether that resolves the issue?
(In reply to comment #4)
> Yes. ignore_nice_load was broken in recent ondemand. Reported earlier here
> bugzilla #12310 and I have a test patch there. Can you check whether that
> resolves the issue.
>

I get this while applying it to 2.6.27.10:

n22 /usr/src/linux # patch -p1 <../ondemand_ignore_nice_fix.patch
patching file drivers/cpufreq/cpufreq_ondemand.c
Hunk #1 succeeded at 107 with fuzz 2 (offset -10 lines).
Hunk #2 FAILED at 123.
Hunk #3 FAILED at 288.
Hunk #4 FAILED at 391.
Hunk #5 FAILED at 564.
4 out of 5 hunks FAILED -- saving rejects to file drivers/cpufreq/cpufreq_ondemand.c.rej

:-(
2.6.26-gentoo-r4 (== 2.6.26.8) works fine. 2.6.28 gives "real" values between 15 and 19 seconds and "user" values between 8 and 10 seconds.

Running the command "watch -n 1 cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq" in parallel as user root shows that the current frequency alternates between 600 and 1700 MHz.

BTW:

tfoerste@n22 ~ $ zgrep HZ /proc/config.gz | grep -v '#'
CONFIG_NO_HZ=y
CONFIG_HZ_1000=y
CONFIG_HZ=1000
tfoerste@n22 ~ $
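The alternation seen with watch could be put into numbers by sampling the frequency at a fixed interval and counting how often each value shows up. The following is only a minimal sketch: the function names sample_freq and summarize_freq are made up for illustration, and the default path is the cpuinfo_cur_freq file quoted above, which in this report was read as root.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any existing tool.

# sample_freq COUNT [INTERVAL] [FILE]
# Print COUNT readings of the current CPU frequency (kHz), one per line.
sample_freq() (
    count=$1
    interval=${2:-1}
    file=${3:-/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq}
    i=0
    while [ "$i" -lt "$count" ]; do
        cat "$file"
        i=$((i + 1))
        # Do not sleep after the last sample.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
)

# summarize_freq
# Read kHz values on stdin and print "<MHz> MHz: <n> samples",
# sorted by ascending frequency.
summarize_freq() {
    awk 'NF { n[$1]++ }
         END { for (f in n) printf "%d MHz: %d samples\n", f / 1000, n[f] }' |
        sort -n
}

# Possible use while the foreground job is running (as root):
#   sample_freq 30 1 | summarize_freq
```

On a machine showing the problem, most samples taken during the foreground run would be expected to land on the 600 MHz line.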
Sorry. I was wrong earlier. This problem is not the same as bug #12310. This looks to be a different problem.

Can you make sure the cpufreq_stats module is loaded and dump

# grep . /sys/devices/system/cpu/cpu*/cpufreq/*/*

before and after your good (root) and bad (normal user) runs and attach the output here? Thanks.
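The before/after dump requested above can be wrapped so that both snapshots come from exactly the same command and are easy to diff. This is a sketch only: snapshot_cpufreq is a hypothetical name, and the optional BASE argument exists purely so the function can be pointed at a different directory tree.

```shell
#!/bin/sh
# Hypothetical helper; the glob is the one from the request above.

# snapshot_cpufreq [BASE]
# Print every non-empty line of every cpufreq attribute as "file:content".
snapshot_cpufreq() (
    base=${1:-/sys/devices/system/cpu}
    # /dev/null as an extra operand makes grep prefix each line with its
    # file name even when the glob matches a single file.
    grep . "$base"/cpu*/cpufreq/*/* /dev/null 2>/dev/null
)

# Possible use around one test run:
#   snapshot_cpufreq > before.txt
#   time factor 819734028463158891
#   snapshot_cpufreq > after.txt
#   diff before.txt after.txt
```

With cpufreq_stats loaded, the stats/ entries (for example time_in_state) should be the lines that differ between the two files.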
While emerging the ffmpeg package at nice level 3, I ran the factor command (alias "perf") in parallel as a normal user (tfoerste) and got:

tfoerste@n22 ~ $ perf
Thu Feb 5 10:54:44 CET 2009
819734028463158891: 3 273244676154386297

real 0m15.374s
user 0m8.983s
sys 0m0.009s

During that time these were the values you requested:

tfoerste@n22 ~/devel/wireshark $ echo tfoerste
tfoerste
tfoerste@n22 ~/devel/wireshark $ gov.sh; echo; grep . /sys/devices/system/cpu/cpu*/cpufreq/*/*
governor : ondemand
min_freq : 600000
max_freq : 1700000
cur_freq : 600000

/sys/devices/system/cpu/cpu0/cpufreq/ondemand/ignore_nice_load:1
/sys/devices/system/cpu/cpu0/cpufreq/ondemand/powersave_bias:0
/sys/devices/system/cpu/cpu0/cpufreq/ondemand/sampling_rate:500000
/sys/devices/system/cpu/cpu0/cpufreq/ondemand/sampling_rate_max:250000000
/sys/devices/system/cpu/cpu0/cpufreq/ondemand/sampling_rate_min:250000
/sys/devices/system/cpu/cpu0/cpufreq/ondemand/up_threshold:80

After that I ran the same perf alias as user root and got:

n22 ~ # perf
Thu Feb 5 10:55:07 CET 2009
819734028463158891: 3 273244676154386297

real 0m25.030s
user 0m15.741s
sys 0m0.039s

while the values seem to be the same:

tfoerste@n22 ~/devel/wireshark $ echo root
root
tfoerste@n22 ~/devel/wireshark $ gov.sh; echo; grep . /sys/devices/system/cpu/cpu*/cpufreq/*/*
governor : ondemand
min_freq : 600000
max_freq : 1700000
cur_freq : 600000

/sys/devices/system/cpu/cpu0/cpufreq/ondemand/ignore_nice_load:1
/sys/devices/system/cpu/cpu0/cpufreq/ondemand/powersave_bias:0
/sys/devices/system/cpu/cpu0/cpufreq/ondemand/sampling_rate:500000
/sys/devices/system/cpu/cpu0/cpufreq/ondemand/sampling_rate_max:250000000
/sys/devices/system/cpu/cpu0/cpufreq/ondemand/sampling_rate_min:250000
/sys/devices/system/cpu/cpu0/cpufreq/ondemand/up_threshold:80
Sorry, I hadn't loaded the stats module. I loaded it and re-ran the two tests, but now I got the same results for both the normal user and root:

tfoerste@n22 ~ $ perf
Thu Feb 5 11:04:07 CET 2009
819734028463158891: 3 273244676154386297

real 0m24.915s
user 0m15.766s
sys 0m0.003s

tfoerste@n22 ~ $ su -
Password:
n22 ~ # perf
Thu Feb 5 11:04:39 CET 2009
819734028463158891: 3 273244676154386297

real 0m26.369s
user 0m15.816s
sys 0m0.011s

I redirected the stats to these files:

tfoerste@n22 ~/devel/wireshark $ grep . /sys/devices/system/cpu/cpu*/cpufreq/*/* > tfoerste
tfoerste@n22 ~/devel/wireshark $ grep . /sys/devices/system/cpu/cpu*/cpufreq/*/* > root

which I'll attach with the next 2 replies.
Created attachment 20121
stats for root
Created attachment 20122
stats for tfoerste
BTW, here's a simple scenario to reproduce the behaviour without the unwanted side effects of a niced compile job.

Run in one terminal:

$> while [[ true ]]; do nice -3 factor 819734028463158891; done

and in another terminal:

$> time factor 819734028463158891
I could narrow the issue down to a change between 2.6.19 and 2.6.20 (sure!). With 2.6.19 ondemand works fine with both speedstep-centrino and acpi-cpufreq. With 2.6.20 ondemand shows the wrong behaviour of staying at low frequency on my ThinkPad T41 (Pentium M).

I tried to bisect it last night, but too often a bisected version couldn't be tested (modprobe failed), so I cannot give more details. I'll attach the config which I used for bisecting.
Created attachment 20206
.config to reproduce the issue
Created attachment 20221
.config to reproduce the issue

With the attached config I bisected again and eventually found this bad commit:

commit dde9f7ba60adac0cade262ab9b17654e93c626e2
Author: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Date: Tue Oct 3 12:33:14 2006 -0700

    [CPUFREQ][3/8] acpi-cpufreq: Pull in MSR based transition support

    Add in the support for Intel Enhanced Speedstep - MSR based transitions.
    With this change, the ACPI based support in speedstep-centrino can be
    deprecated and duplicate code in that driver can be marked for removal.
    Much easier to maintain and support this way. This also reduces the user
    misconfigurations and questions on which driver is to be used under which
    CPUs to support Enhanced Speedstep.

    Signed-off-by: Denis Sadykov <denis.m.sadykov@intel.com>
    Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
    Signed-off-by: Alexey Starikovskiy <alexey.y.starikovskiy@intel.com>
    Signed-off-by: Dave Jones <davej@redhat.com>

For the bisect I compiled both acpi-cpufreq and speedstep-centrino as modules but always tried to load acpi-cpufreq first. That wasn't always successful, because modprobe sometimes failed with a fatal error. In such a case I modprobed speedstep-centrino instead (which always succeeded) to test the ondemand governor.

The bisect tests themselves were done in this manner. The following command was run on console 1:

$> while [[ true ]]; do nice -3 factor 273244676154386297; done

Then I logged in on console 2 and ran:

$> time factor 273244676154386297

If the 2nd command needed much longer than 5.5 sec (usually 15 seconds), the commit was marked as bad. This always happened with acpi-cpufreq. If, however, a particular commit was marked as good, that result was sometimes obtained using the speedstep-centrino module.
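The good/bad decision described above can be written down as a small helper, which makes the criterion explicit when repeating the bisect. This is a sketch under assumptions: the function names are invented, and the 10 second default cutoff is an assumed value chosen to sit between the roughly 5.5 s good case and the roughly 15 s bad case; it does not come from the report. The niced background loop still has to be started separately on another console.

```shell
#!/bin/sh
# Hypothetical helpers for scoring one bisect step.

# elapsed_seconds CMD [ARGS...]
# Run CMD and print its wall-clock duration in whole seconds.
elapsed_seconds() (
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
)

# classify_run SECONDS [LIMIT]
# Print "bad" if SECONDS exceeds LIMIT, otherwise "good".
# LIMIT defaults to 10, an assumed cutoff (see the note above).
classify_run() {
    awk -v t="$1" -v limit="${2:-10}" \
        'BEGIN { if (t + 0 > limit + 0) print "bad"; else print "good" }'
}

# Possible use on console 2 while the niced loop runs on console 1:
#   git bisect "$(classify_run "$(elapsed_seconds factor 273244676154386297)")"
```

Because date +%s has one-second resolution, a run close to the cutoff should be repeated rather than trusted.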
This issue now occurs in the current stable kernel 2.6.28.8 too.
And FWIW, running the BOINC client at nice level 19 results in a periodic alternation of the CPU frequency between 600 MHz and 1700 MHz: http://forums.gentoo.org/viewtopic-t-747140-highlight-.html
It looks like an issue with the timer. The currently running kernel runs fine and does not show this issue. However, the same kernel version booted on another day did show it. The only visible difference in dmesg is this:

tfoerste@n22 ~ $ grep calibration dmesg-2.6.27-gentoo-r10 tmp/dmesg-2.6.27-gentoo-r10
dmesg-2.6.27-gentoo-r10:TSC: PIT calibration confirmed by PMTIMER.
dmesg-2.6.27-gentoo-r10:TSC: using PIT calibration value
tmp/dmesg-2.6.27-gentoo-r10:TSC: PIT calibration confirmed by PMTIMER.
tmp/dmesg-2.6.27-gentoo-r10:TSC: using PMTIMER calibration value

The file in ~/tmp is from a running kernel which had the issue; the dmesg file in my home directory is from the current kernel (which runs fine).

BTW IIRC Linus posted a patch some days ago in a thread whose topic was related to HPET timers ...
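To check across several boots whether the calibration source really tracks the good and bad cases, the comparison above can be scripted. A minimal sketch, assuming dmesg output has been saved to one file per boot; the function names are hypothetical and the grep pattern is based only on the two message forms quoted above.

```shell
#!/bin/sh
# Hypothetical helpers for comparing saved dmesg files.

# calibration_lines FILE
# Print the TSC calibration messages found in a saved dmesg file.
calibration_lines() {
    grep 'TSC:.*calibration' "$1"
}

# same_calibration FILE_A FILE_B
# Succeed (exit 0) if both files contain identical calibration messages.
same_calibration() (
    a=$(calibration_lines "$1")
    b=$(calibration_lines "$2")
    [ "$a" = "$b" ]
)

# Possible use:
#   same_calibration dmesg-good dmesg-bad || echo "calibration differs"
```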
The hardware has gone to ThinkPad paradise in the meantime, and the current kernel works fine.