Bug 12385 - CPU ondemand governor doesn't work for ThinkPad T41
Summary: CPU ondemand governor doesn't work for ThinkPad T41
Status: CLOSED PATCH_ALREADY_AVAILABLE
Alias: None
Product: Power Management
Classification: Unclassified
Component: cpufreq
Hardware: All Linux
Importance: P1 normal
Assignee: Venkatesh Pallipadi
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-01-08 02:14 UTC by Toralf Förster
Modified: 2011-07-30 05:51 UTC
CC List: 3 users

See Also:
Kernel Version: 2.6.28.8
Subsystem:
Regression: No
Bisected commit-id:


Attachments
stats for root (1.79 KB, text/plain)
2009-02-05 02:07 UTC, Toralf Förster
stats for tfoerste (1.79 KB, text/plain)
2009-02-05 02:08 UTC, Toralf Förster
.config to reproduce the issue (20.71 KB, text/plain)
2009-02-12 01:14 UTC, Toralf Förster
.config to reproduce the issue (19.46 KB, text/plain)
2009-02-12 11:14 UTC, Toralf Förster

Description Toralf Förster 2009-01-08 02:14:05 UTC
Latest working kernel version:
Earliest failing kernel version: 2.6.27.10
Distribution: Gentoo
Hardware Environment: ThinkPad T41
Software Environment:
Problem Description:
Starting with kernel 2.6.27.10, I have sometimes observed that my ThinkPad T41 stays at a CPU frequency of 600 MHz when I run many processes at nice level 3 and a few more at nice level 19, even when foreground jobs are running that would normally drive the CPU to 1700 MHz.
A typical scenario is compiling glibc on a Gentoo system with "make -j 2" and then running another make job at nice level 0, or trying to start Firefox (whose start-up is slow enough even at 1.7 GHz). I can quantify this observation with a quick-and-dirty command line like:

$> time factor 819734028463158891

I would expect a real value of 6-7 seconds and a user value of 5-6 seconds.
However, when I ran it twice in a row I got:

tfoerste@n22 ~ $ time factor 819734028463158891
819734028463158891: 3 273244676154386297

real    0m51.658s
user    0m15.691s
sys     0m0.013s

tfoerste@n22 ~ $ time factor 819734028463158891
819734028463158891: 3 273244676154386297

real    0m19.136s
user    0m6.944s
sys     0m0.044s

BTW, during startup I set:

$> echo 1 > /sys/devices/system/cpu/cpu0/cpufreq/ondemand/ignore_nice_load
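A minimal sketch of setting and verifying this tunable from a startup script, assuming the per-CPU ondemand directory shown above (other kernel versions may place the ondemand tunables elsewhere). The function name set_ignore_nice and the ONDEMAND_DIR override are hypothetical, and writing the real sysfs file requires root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write 0 or 1 to the ondemand ignore_nice_load tunable and
# print the value read back. ONDEMAND_DIR defaults to the per-CPU path used in
# this report; override it for other kernel layouts (or for testing).
ONDEMAND_DIR="${ONDEMAND_DIR:-/sys/devices/system/cpu/cpu0/cpufreq/ondemand}"

set_ignore_nice() {
    local value="$1"
    local file="$ONDEMAND_DIR/ignore_nice_load"
    # 1 = do not count niced processes as load, 0 = count them.
    case "$value" in
        0|1) ;;
        *) echo "usage: set_ignore_nice 0|1" >&2; return 2 ;;
    esac
    if [ ! -w "$file" ]; then
        echo "cannot write $file (not root, or ondemand not the active governor?)" >&2
        return 1
    fi
    echo "$value" > "$file"
    # Read back what the kernel reports so the setting can be verified.
    cat "$file"
}
```

Run as root, set_ignore_nice 1 should print 1. With this setting, CPU time of niced processes is not counted as load, so niced background jobs alone should not raise the frequency, while un-niced foreground work still should.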

My system:

tfoerste@n22 ~ $ uname -a
Linux n22 2.6.27-gentoo-r7 #12 Sun Dec 28 18:26:57 CET 2008 i686 Intel(R) Pentium(R) M processor 1700MHz GenuineIntel GNU/Linux

The relevant kernel config values are:

tfoerste@n22 ~/devel/wireshark/docbook $ zgrep -e GOV -e FREQ /proc/config.gz | grep -v '#'
CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_TABLE=y
CONFIG_CPU_FREQ_STAT=m
CONFIG_CPU_FREQ_STAT_DETAILS=y
CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
CONFIG_CPU_FREQ_GOV_POWERSAVE=m
CONFIG_CPU_FREQ_GOV_ONDEMAND=y
CONFIG_X86_ACPI_CPUFREQ=m
CONFIG_CPU_IDLE_GOV_LADDER=y
CONFIG_CPU_IDLE_GOV_MENU=y

Steps to reproduce:
Comment 1 Andrew Morton 2009-01-08 02:22:20 UTC
Are you able to identify any earlier kernel version which didn't have this problem?
Comment 2 Éric Piel 2009-01-08 02:23:41 UTC
What is the latest working kernel version? Is it 2.6.27.9? 

And what is the load when you observe this? (Use top and report the user/system/idle load %.)
Comment 3 Toralf Förster 2009-01-08 12:08:18 UTC
I'll test other kernel versions at the weekend, but FWIW, if I run this as root:

$> cd /usr/portage/sys-libs/glibc && nice -n 3 ebuild glibc-2.6.1.ebuild compile

and also run factor as root, then the CPU frequency immediately changes to the maximum. However, running factor as a normal user doesn't increase the CPU frequency.
Comment 4 Venkatesh Pallipadi 2009-01-08 12:13:23 UTC
Yes. ignore_nice_load was broken in recent ondemand. It was reported earlier in
bugzilla #12310, and I have a test patch there. Can you check whether that resolves the issue?
Comment 5 Toralf Förster 2009-01-08 12:28:23 UTC
(In reply to comment #4)
> Yes. ignore_nice_load was broken in recent ondemand. Reported earlier here
> bugzilla #12310 and I have a test patch there. Can you check whether that
> resolves the issue.
> 

I get this when applying it to 2.6.27.10:

n22 /usr/src/linux # patch -p1 <../ondemand_ignore_nice_fix.patch
patching file drivers/cpufreq/cpufreq_ondemand.c
Hunk #1 succeeded at 107 with fuzz 2 (offset -10 lines).
Hunk #2 FAILED at 123.
Hunk #3 FAILED at 288.
Hunk #4 FAILED at 391.
Hunk #5 FAILED at 564.
4 out of 5 hunks FAILED -- saving rejects to file drivers/cpufreq/cpufreq_ondemand.c.rej

:-(
Comment 6 Toralf Förster 2009-01-10 01:44:53 UTC
2.6.26-gentoo-r4 (== 2.6.26.8) works fine; 2.6.28 gives "real" values between 15 and 19 seconds and "user" values between 8 and 10 seconds.

Running the command "watch -n 1 cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq" in parallel as root shows that the current frequency alternates between 600 and 1700 MHz.
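Instead of watch, a small loop that timestamps each sample makes the 600/1700 MHz alternation easier to line up with the factor runs. A sketch, assuming the cpu0 path used above; the function name sample_freq and its arguments (sample count, interval, file) are hypothetical. cpuinfo_cur_freq is normally readable only by root, which matches running the command as root.

```shell
#!/usr/bin/env bash
# Hypothetical sampler: print "<seconds-since-start> <kHz>" once per interval.
# Arguments: number of samples, interval in seconds, and the frequency file.
sample_freq() {
    local count="${1:-10}" interval="${2:-1}"
    local file="${3:-/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq}"
    local start=$SECONDS i
    for ((i = 0; i < count; i++)); do
        # The file holds the current frequency in kHz, e.g. 600000 or 1700000.
        printf '%d %s\n' "$((SECONDS - start))" "$(cat "$file")"
        # Do not sleep after the last sample.
        if ((i + 1 < count)); then sleep "$interval"; fi
    done
}
```

For example, sample_freq 60 1 > freq.log run as root while the timed factor command runs gives a one-minute trace that can be compared with its "real" time.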

BTW:

tfoerste@n22 ~ $ zgrep HZ /proc/config.gz  | grep -v '#'
CONFIG_NO_HZ=y
CONFIG_HZ_1000=y
CONFIG_HZ=1000
tfoerste@n22 ~ $
Comment 7 Venkatesh Pallipadi 2009-02-04 14:42:26 UTC
Sorry. I was wrong earlier. This problem is not the same as bug #12310.

This looks like a different problem. Can you make sure the cpufreq_stats module is loaded, dump the output of
# grep . /sys/devices/system/cpu/cpu*/cpufreq/*/*
before and after your good (root) and bad (normal user) runs, and attach it here?

Thanks.
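One way to collect these dumps consistently is to save a labeled snapshot before and after each run and diff the pairs. A sketch; the function name snapshot_cpufreq, its label and output-directory arguments, and the CPUFREQ_GLOB override are hypothetical. The stats directory only appears once the cpufreq_stats module is loaded.

```shell
#!/usr/bin/env bash
# Hypothetical helper: save "grep . <cpufreq files>" output under a label, so that
# e.g. root-before / root-after / user-before / user-after can be compared later.
snapshot_cpufreq() {
    local label="$1" outdir="${2:-.}"
    # Same glob as the requested command; cpufreq_stats adds the stats/ directory.
    local glob="${CPUFREQ_GLOB:-/sys/devices/system/cpu/cpu*/cpufreq/*/*}"
    # $glob is intentionally unquoted so the shell expands the wildcard.
    # grep . prefixes each line with its file name; -s hides unreadable-file errors.
    grep -s . $glob > "$outdir/cpufreq-$label.txt"
    echo "$outdir/cpufreq-$label.txt"
}
```

For example: snapshot_cpufreq user-before; time factor 819734028463158891; snapshot_cpufreq user-after; then diff cpufreq-user-before.txt cpufreq-user-after.txt, and repeat with root labels.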
Comment 8 Toralf Förster 2009-02-05 01:59:53 UTC
While emerging the ffmpeg package at nice level 3, I ran the factor command (aliased as "perf") in parallel as the normal user (tfoerste) and got:

tfoerste@n22 ~ $ perf
Thu Feb  5 10:54:44 CET 2009
819734028463158891: 3 273244676154386297

real    0m15.374s
user    0m8.983s
sys     0m0.009s

During that time, these were the values you requested:

tfoerste@n22 ~/devel/wireshark $ echo tfoerste
tfoerste
tfoerste@n22 ~/devel/wireshark $ gov.sh; echo; grep . /sys/devices/system/cpu/cpu*/cpufreq/*/*
governor : ondemand
min_freq : 600000
max_freq : 1700000
cur_freq : 600000

/sys/devices/system/cpu/cpu0/cpufreq/ondemand/ignore_nice_load:1
/sys/devices/system/cpu/cpu0/cpufreq/ondemand/powersave_bias:0
/sys/devices/system/cpu/cpu0/cpufreq/ondemand/sampling_rate:500000
/sys/devices/system/cpu/cpu0/cpufreq/ondemand/sampling_rate_max:250000000
/sys/devices/system/cpu/cpu0/cpufreq/ondemand/sampling_rate_min:250000
/sys/devices/system/cpu/cpu0/cpufreq/ondemand/up_threshold:80


After that, I ran the same perf alias as root and got:
n22 ~ # perf
Thu Feb  5 10:55:07 CET 2009
819734028463158891: 3 273244676154386297

real    0m25.030s
user    0m15.741s
sys     0m0.039s


while the stats values seem to be identical:

tfoerste@n22 ~/devel/wireshark $ echo root
root
tfoerste@n22 ~/devel/wireshark $ gov.sh; echo; grep . /sys/devices/system/cpu/cpu*/cpufreq/*/*
governor : ondemand
min_freq : 600000
max_freq : 1700000
cur_freq : 600000

/sys/devices/system/cpu/cpu0/cpufreq/ondemand/ignore_nice_load:1
/sys/devices/system/cpu/cpu0/cpufreq/ondemand/powersave_bias:0
/sys/devices/system/cpu/cpu0/cpufreq/ondemand/sampling_rate:500000
/sys/devices/system/cpu/cpu0/cpufreq/ondemand/sampling_rate_max:250000000
/sys/devices/system/cpu/cpu0/cpufreq/ondemand/sampling_rate_min:250000
/sys/devices/system/cpu/cpu0/cpufreq/ondemand/up_threshold:80
Comment 9 Toralf Förster 2009-02-05 02:06:58 UTC
Sorry, I hadn't loaded the stats module. I loaded it and re-ran the 2 tests,
but now I get the same results for both the normal user and root:
tfoerste@n22 ~ $ perf
Thu Feb  5 11:04:07 CET 2009
819734028463158891: 3 273244676154386297

real    0m24.915s
user    0m15.766s
sys     0m0.003s
tfoerste@n22 ~ $ su -
Password:
n22 ~ # perf
Thu Feb  5 11:04:39 CET 2009
819734028463158891: 3 273244676154386297

real    0m26.369s
user    0m15.816s
sys     0m0.011s

I redirected the stats to these files:
tfoerste@n22 ~/devel/wireshark $ grep . /sys/devices/system/cpu/cpu*/cpufreq/*/* > tfoerste
tfoerste@n22 ~/devel/wireshark $ grep . /sys/devices/system/cpu/cpu*/cpufreq/*/* > root
which I'll attach in the next 2 replies.
Comment 10 Toralf Förster 2009-02-05 02:07:50 UTC
Created attachment 20121 [details]
stats for root
Comment 11 Toralf Förster 2009-02-05 02:08:15 UTC
Created attachment 20122 [details]
stats for tfoerste
Comment 12 Toralf Förster 2009-02-05 06:47:33 UTC
BTW, here's a simple scenario that reproduces the behaviour without the unwanted side effects of a niced compile job:

Run in one terminal:

$> while [[ true ]]; do nice -3 factor 819734028463158891; done

and in another terminal:

$> time factor 819734028463158891
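The two-terminal recipe can also be scripted so that the niced background load is started and stopped automatically. A sketch, assuming bash and coreutils factor; the function name run_repro and its optional arguments (number, nice level) are hypothetical. Current coreutils factor finishes this number almost instantly, so reproducing the load on today's machines would need a different CPU-bound job in the loop.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for the two-terminal recipe: a niced factor loop runs in
# the background while one un-niced factor run is timed in the foreground.
run_repro() {
    local number="${1:-819734028463158891}" nice_level="${2:-3}"
    local bg_pid
    # Format for bash's time keyword: wall-clock seconds with 3 decimals.
    local TIMEFORMAT='real %3R s'

    # Background load, like: while [[ true ]]; do nice -3 factor N; done
    ( while true; do nice -n "$nice_level" factor "$number"; done ) > /dev/null 2>&1 &
    bg_pid=$!

    # Timed foreground run at the default nice level (time reports on stderr).
    time factor "$number"

    # Stop the background loop; an in-flight factor child simply runs to completion.
    kill "$bg_pid" 2>/dev/null || true
    wait "$bg_pid" 2>/dev/null || true
}
```

The "real" line it prints corresponds to the real value of the time command in the reports above.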
Comment 13 Toralf Förster 2009-02-12 01:13:52 UTC
I narrowed the issue down to a change between 2.6.19 and 2.6.20 (for sure!).
With 2.6.19, ondemand works fine with both speedstep-centrino and acpi-cpufreq.
With 2.6.20, ondemand shows the wrong behaviour of staying at a low frequency on my ThinkPad T41 (Pentium M).
I tried to bisect it last night, but because too many bisected versions couldn't be tested (modprobe failed), I cannot give more details.
I'll attach the config I used for bisecting.
Comment 14 Toralf Förster 2009-02-12 01:14:26 UTC
Created attachment 20206 [details]
.config to reproduce the issue
Comment 15 Toralf Förster 2009-02-12 11:14:28 UTC
Created attachment 20221 [details]
.config to reproduce the issue

With the attached config, I bisected again and eventually found this bad commit:

commit dde9f7ba60adac0cade262ab9b17654e93c626e2
Author: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Date:   Tue Oct 3 12:33:14 2006 -0700

    [CPUFREQ][3/8] acpi-cpufreq: Pull in MSR based transition support

    Add in the support for Intel Enhanced Speedstep - MSR based transitions.
    With this change, the ACPI based support in speedstep-centrino can be
    deprecated and duplicate code in that driver can be marked for removal.
    Much easier to maintain and support this way. This also reduces the
    user misconfigurations and questions on which driver is to be used
    under which CPUs to support Enhanced Speedstep.

    Signed-off-by: Denis Sadykov <denis.m.sadykov@intel.com>
    Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
    Signed-off-by: Alexey Starikovskiy <alexey.y.starikovskiy@intel.com>
    Signed-off-by: Dave Jones <davej@redhat.com>

For the bisect, I compiled both acpi-cpufreq and speedstep-centrino as modules but always tried to load acpi-cpufreq first. That wasn't always successful, because modprobe sometimes failed with a fatal error. In such cases, I loaded speedstep-centrino instead (which always succeeded) to test the ondemand governor.

The bisect tests themselves were done as follows:

The following command was run on console 1:
$> while [[ true ]]; do nice -3 factor 273244676154386297; done

Then I logged in on console 2 and ran:

$> time factor 273244676154386297

If the 2nd command took much longer than 5.5 s (usually about 15 s), the commit was marked as bad. This always happened with acpi-cpufreq.

However, when a particular commit was marked as good, that result was sometimes obtained using the speedstep-centrino module.
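The good/bad rule used at each bisect step can be written down as a small check that prints the verdict to pass to git bisect. A sketch: the roughly 5.5 s (good) and 15 s (bad) timings come from this report, while the function name classify_run and its default cutoff of 8 s are assumptions chosen between those two values. Because every step needs a reboot into the freshly built kernel, the verdict is applied by hand with git bisect good or git bisect bad rather than through git bisect run.

```shell
#!/usr/bin/env bash
# Hypothetical bisect helper: turn a measured "real" time (seconds) into the
# git bisect verdict. Good runs took about 5.5 s and bad runs about 15 s;
# the default cutoff of 8 s is an assumed value in between.
classify_run() {
    local elapsed="$1" cutoff="${2:-8}"
    # awk does the floating-point comparison that [ ] cannot.
    if awk -v t="$elapsed" -v c="$cutoff" 'BEGIN { exit !((t + 0) > (c + 0)) }'; then
        echo bad
    else
        echo good
    fi
}
```

For example, classify_run 15.374 prints bad and classify_run 5.4 prints good.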
Comment 16 Toralf Förster 2009-03-17 03:15:17 UTC
This issue now occurs in the current stable kernel 2.6.28.8 too.
Comment 17 Toralf Förster 2009-03-17 06:30:38 UTC
And FWIW, running the BOINC client at nice level 19 results in a periodic alternation of the CPU frequency between 600 MHz and 1700 MHz: http://forums.gentoo.org/viewtopic-t-747140-highlight-.html
Comment 18 Toralf Förster 2009-03-24 05:13:49 UTC
It looks like an issue with the timer. The currently running kernel runs fine and does not show this issue. However, the same kernel version booted on another day did show the issue. The only difference I can see in dmesg is this:

tfoerste@n22 ~ $ grep calibration dmesg-2.6.27-gentoo-r10  tmp/dmesg-2.6.27-gentoo-r10
dmesg-2.6.27-gentoo-r10:TSC: PIT calibration confirmed by PMTIMER.
dmesg-2.6.27-gentoo-r10:TSC: using PIT calibration value

tmp/dmesg-2.6.27-gentoo-r10:TSC: PIT calibration confirmed by PMTIMER.
tmp/dmesg-2.6.27-gentoo-r10:TSC: using PMTIMER calibration value

The file in ~/tmp was from a running kernel that had the issue; the dmesg file in my home directory is from the current kernel (which runs fine).

BTW, IIRC Linus posted a patch a few days ago in a thread related to HPET timers ...
Comment 19 Toralf Förster 2010-01-07 15:14:08 UTC
In the meantime, the hardware has gone to ThinkPad paradise, and the current kernel works fine.
