Bug 4379 - Default sampling rates for ondemand governor are too high on amd64
Status: REJECTED WILL_NOT_FIX
Alias: None
Product: Power Management
Classification: Unclassified
Component: cpufreq
Hardware: i386 Linux
Importance: P2 normal
Assignee: cpufreq
 
Reported: 2005-03-21 04:21 UTC by Gunther Piez
Modified: 2009-03-24 05:39 UTC
CC List: 7 users

Kernel Version: 2.6.11, 2.6.12-rc5

Attachments
Reduce the default sampling rate to check more often (1.05 KB, patch)
2006-02-05 02:46 UTC, Éric Piel
Scheduler-driven very rapid increase in CPU speed in response to load spikes. (4.09 KB, patch)
2009-02-03 18:15 UTC, Jim Bray

Description Gunther Piez 2005-03-21 04:21:48 UTC
Distribution: Gentoo   
Hardware Environment: AMD Athlon(tm) 64 Processor 3200+, Clawhammer, 754   
Software Environment:   
Problem Description: The default sampling rates for the ondemand governor are too high
on amd64. This leads to poor responsiveness for desktop apps.
   
Steps to reproduce: 
 
I gave the ondemand governor a try and noticed that the system felt slow (poor
responsiveness).
  
The frequencies/voltages seemed to switch fine, so I wondered where this slow 
feeling came from.  
 
I tried  
  
# watch -n 0.1 cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq  
  
and watched the output while starting "kdevelop" (huge, bloated, very nice
app). At max frequency it needs about 3 seconds to start if
everything is in the disk cache. Now it needed 5-6 seconds. I noticed the
frequency didn't go up at the moment I started it, but about 1-2 seconds
later, when the app was almost loaded.
Investigating further, I inspected the values
at /sys/devices/system/cpu/cpu0/cpufreq/ondemand.
  
It turns out that on my system (Clawhammer 754, 2 GHz, 1 MB 2nd level cache)
the default sampling rates are quite high:
 
ondemand # cat sampling_rate  
1240000  
ondemand # cat sampling_down_factor  
10  
  
This means that for an upward frequency transition the CPU usage is sampled
every 1.24 seconds, and for scaling downwards every 12.4 (!) seconds. So
every action which needs less than about a second (for instance opening a
konqueror window) is likely to be done at slow speed (800 MHz in my case).
Others, which take longer than the sampling period, get full speed only after a
noticeable latency and so need up to 3 seconds longer to complete (1.24
seconds * 2.5 speed factor). This is exactly what I observed.
 
Additionally, scaling down takes 12.4 seconds, and during this time the
CPU will burn at 2 GHz doing essentially nothing. This is unnecessary
power consumption.
 
I think sampling_rate_min and sampling_rate_max are way too high. What use may
a 620 second sampling time have? On the other hand, it can't be faster than
0.62 s, which reduces the lag to 1.5 seconds. For me this is not sufficient.
 
The Intel CPUs seem to have faster transition times (by a factor of 10); in
that case the effect is probably not noticeable.
 
I suggest a fixed default sampling time (say 50 ms). Making it a multiple
(currently 1000) of cpuinfo.transition_latency leads to bad behavior if the
transition latency is not in the "Intel range". The value of sampling_rate_min
should also be adjusted downwards.
Comment 1 Andrew Morton 2005-05-25 16:42:46 UTC
Is this problem still present in 2.6.12-rc5?
Comment 2 Gunther Piez 2005-05-27 04:50:17 UTC
Yes, no change at all.  
 
Instead of using some arbitrary value for the sampling rate (1000 times the
transition latency; where does the 1000 come from?), I am using some other
arbitrary value (50 ms, because there is no noticeable delay anymore) :-)

I can't imagine a scenario where it is useful to have a sampling rate
dependent on the transition time. If the latter is very slow, this governor
wouldn't work anyway; if it is very fast, it will eat up CPU cycles.

For 2 months I have been using a fixed sampling time of 50 ms; this works fine for
me.
 
 
  
--- /usr/src/linux-2.6.12-rc5/drivers/cpufreq/cpufreq_ondemand.c.orig   2005-05-27 13:39:48.000000000 +0200
+++ /usr/src/linux-2.6.12-rc5/drivers/cpufreq/cpufreq_ondemand.c        2005-05-27 13:41:50.000000000 +0200
@@ -42,20 +42,17 @@ 
 #define MAX_FREQUENCY_DOWN_THRESHOLD           (100) 
 
 /* 
- * The polling frequency of this governor depends on the capability of 
- * the processor. Default polling frequency is 1000 times the transition 
- * latency of the processor. The governor will work on any processor with 
- * transition latency <= 10mS, using appropriate sampling 
- * rate. 
+ * The default polling frequency is 50 ms. The governor will 
+ * work on any processor with transition latency <= 10mS, 
+ * using an appropriate sampling rate. 
  * For CPUs with transition latency > 10mS (mostly drivers with CPUFREQ_ETERNAL)
  * this governor will not work. 
  * All times here are in uS. 
  */ 
-static unsigned int                            def_sampling_rate; 
-#define MIN_SAMPLING_RATE                      (def_sampling_rate / 2) 
-#define MAX_SAMPLING_RATE                      (500 * def_sampling_rate) 
-#define DEF_SAMPLING_RATE_LATENCY_MULTIPLIER   (1000) 
-#define DEF_SAMPLING_DOWN_FACTOR               (10) 
+#define DEF_SAMPLING_RATE                      (50 * 1000) 
+#define MIN_SAMPLING_RATE                      (DEF_SAMPLING_RATE / 5) 
+#define MAX_SAMPLING_RATE                      (200 * DEF_SAMPLING_RATE) 
+#define DEF_SAMPLING_DOWN_FACTOR               (4) 
 #define TRANSITION_LATENCY_LIMIT               (10 * 1000) 
 #define sampling_rate_in_HZ(x)                 (((x * HZ) < (1000 * 1000))?1:((x * HZ) / (1000 * 1000)))
 
@@ -412,16 +409,8 @@ 
                 * is used for first time 
                 */ 
                if (dbs_enable == 1) { 
-                       unsigned int latency; 
-                       /* policy latency is in nS. Convert it to uS first */ 
 
-                       latency = policy->cpuinfo.transition_latency; 
-                       if (latency < 1000) 
-                               latency = 1000; 
- 
-                       def_sampling_rate = (latency / 1000) * 
-                                       DEF_SAMPLING_RATE_LATENCY_MULTIPLIER; 
-                       dbs_tuners_ins.sampling_rate = def_sampling_rate; 
+                       dbs_tuners_ins.sampling_rate = DEF_SAMPLING_RATE; 
 
                        dbs_timer_init(); 
                } 
 
Comment 3 Dominik Brodowski 2005-11-15 13:53:49 UTC
Is this problem still present in 2.6.14?
Comment 4 Éric Piel 2006-02-05 02:46:00 UTC
Created attachment 7248
Reduce the default sampling rate to check more often

Hello,

Could you try the attached patch against 2.6.15 or 2.6.16 and confirm it solves
your problem? As there is now a check to avoid polling too fast, it should not
be dangerous to check more often.

Eric
Comment 5 Natalie Protasevich 2007-11-17 23:02:08 UTC
Eric, what about commit df8b59be0976c56820453730078bef99a8d1dbda, "Avoid the ondemand cpufreq governor to use a too high frequency for stats" - does it resolve this problem?
Thanks.
Comment 6 Éric Piel 2007-11-19 05:17:41 UTC
(Sorry for answering late; I didn't receive any email directly and just saw the message on the mailing list.)

What I was talking about in comment #4 was this commit. Its purpose is to keep the frequency low enough compared to the scheduler. The attached patch, on the contrary, _increases_ the frequency.

Unfortunately Gunther hasn't confirmed that this patch helps with his problem. However, the sampling down is now done as often as the sampling up, so it should be less problematic anyway. 1000 is a very conservative value, but nowadays, with the great hunt for reducing wakeups, that can probably be considered fine. It would be nice if Gunther could confirm that the behavior is currently better (slow down happens fast enough)...
Comment 7 Fionn Behrens 2008-05-14 06:39:32 UTC
This problem has been HUGELY WORSENED in kernel 2.6.25 since someone apparently opted to RAISE the minimum sampling rate to a whopping 2500000.
Even with the minimum settings, the ondemand governor now acts so sluggish that it is basically unusable. I had to switch to the userspace governor and powernowd, which I consider a significant step backwards.

Would someone with sufficient rights please add 2.6.25 to the kernel version in bugzilla? Otherwise I'll open a new bug for that version soon.
Comment 8 Fionn Behrens 2008-05-14 06:44:57 UTC
Argh, sorry. My last comment does NOT apply to the ondemand governor but to the conservative governor. I'll open a separate bug for that.

Sorry again. I tried not to duplicate anything but should have read more carefully.
Comment 9 Thomas Renninger 2008-08-06 07:26:33 UTC
For latest kernels:
cpufreq is rather broken with tickless idle from what I saw.
Venkatesh posted some patches on cpufreq recently (not included in .27-rcX (yet?)).
This may be the reason for:
> This problem has been HUGELY WORSENED in kernel 2.6.25
and nohz=off might help?
Comment 10 Jarl Friis 2008-08-07 04:43:00 UTC
(In reply to comment #9)
> This may be the reason for:
> > This problem has been HUGELY WORSENED in kernel 2.6.25
> and nohz=off might help?

In comment 8 Fionn Behrens writes that comment 7, which contains this statement, was not intended for this bug. So I interpret that as Fionn taking back the statement "This problem has been HUGELY WORSENED in kernel 2.6.25".

Jarl
Comment 11 Thomas Renninger 2008-08-11 15:22:28 UTC
I do not get the idea of MAX_SAMPLING_RATE.
It limits the value so that user space cannot set sampling rate values that are too high. But why should this be limited?
Only MIN_SAMPLING_RATE really makes sense: it covers the case where the HW cannot switch fast enough. But the user should be able to set the value as high as he likes, and I do not see why limiting it according to HW latency would make sense.

Next, if the latency really is reported that high by the HW (which may be wrong!), as in the description (from 2005...), the default sampling rate calculated from it should not be higher than, say, 333 ms (taken from a userspace governor), which is known to work with all/most HW.

I am too lazy to create attachments, and this way the patches can also be read more easily on the list:
This one is a first step towards removing sampling_rate_max, since IMO it makes no sense to limit user space here (how/where should the ondemand/sampling_rate_max sysfs file be marked deprecated so that it can be removed later?):

--- x/drivers/cpufreq/cpufreq_ondemand.c
+++ y/drivers/cpufreq/cpufreq_ondemand.c
@@ -45,7 +45,7 @@ static unsigned int def_sampling_rate;
                        (MIN_SAMPLING_RATE_RATIO * jiffies_to_usecs(10))
 #define MIN_SAMPLING_RATE                      \
                        (def_sampling_rate / MIN_SAMPLING_RATE_RATIO)
-#define MAX_SAMPLING_RATE                      (500 * def_sampling_rate)
+#define MAX_SAMPLING_RATE                      ~0U
 #define DEF_SAMPLING_RATE_LATENCY_MULTIPLIER   (1000)
 #define TRANSITION_LATENCY_LIMIT               (10 * 1000 * 1000)


This one limits the maximum default sampling rate. Latency might be wrongly exported by the HW. Setting the default sampling rate higher than 333 ms does not make sense (userspace daemons work with all known HW and also do not need to define higher values):

--- x/drivers/cpufreq/cpufreq_ondemand.c
+++ y/drivers/cpufreq/cpufreq_ondemand.c
@@ -39,6 +39,8 @@
  * All times here are in uS.
  */
 static unsigned int def_sampling_rate;
+/* Do not set sampling rate above 333 ms by default */
+#define MAX_DEF_SAMPLING_RATE                  (333 * 1000)
 #define MIN_SAMPLING_RATE_RATIO                        (2)
 /* for correct statistics, we need at least 10 ticks between each measure */
 #define MIN_STAT_SAMPLING_RATE                         \
@@ -547,6 +549,9 @@ static int cpufreq_governor_dbs(struct c
                        if (def_sampling_rate < MIN_STAT_SAMPLING_RATE)
                                def_sampling_rate = MIN_STAT_SAMPLING_RATE;

+                       if (def_sampling_rate > MAX_DEF_SAMPLING_RATE)
+                               def_sampling_rate = MAX_DEF_SAMPLING_RATE;
+
                        dbs_tuners_ins.sampling_rate = def_sampling_rate;
                }
                dbs_timer_init(this_dbs_info);


---
These are suggested patches, not compile tested.
Does/should this fix this issue?
Comment 12 Jim Bray 2009-01-19 18:10:05 UTC
(In reply to comment #9)
> For latest kernels:
> cpufreq is rather broken with tickless idle from what I saw.
> Venkatesh posted some patches on cpufreq recently (not included in .27-rcX
> (yet?)).
> This may be the reason for:
> > This problem has been HUGELY WORSENED in kernel 2.6.25
> and nohz=off might help?
> 
  I ran into that:
http://bugzilla.kernel.org/show_bug.cgi?id=12310

  Venkatesh fixed it (patch works great) but it hasn't made it into .29 as of -rc2.

  BTW, I just did a little hacking in ondemand and elsewhere and came up with a proposal to treat SCHED_IDLE time as idle time so niced processes could up the cpu speed without having deep-background jobs like boinc and seti do it. I'm running this and it works as described. Comments appreciated. The other scheduling policies such as SCHED_BATCH might also merit special treatment.

http://bugzilla.kernel.org/show_bug.cgi?id=12482
Comment 13 Jim Bray 2009-01-19 18:50:32 UTC
  Just a quick thought: if the scheduler were aware of the min, max and current cpu frequencies, it might be persuaded to schedule a designated cpufreq process whenever there is a great mismatch between load and cpu frequency, giving very rapid response and allowing slower polling to handle the gradual adjustments. Cluttering up the scheduler is undesirable, but the information needed by the cpufreq code comes from the scheduler, and a case could be made for integrating cpu frequency control into the scheduler. In that light, adding just enough code to someplace like find_idlest_cpu() or the rebalancing code to know when the cpufreq needs checking might be considered permissible.
Comment 14 Jim Bray 2009-02-03 18:15:22 UTC
Created attachment 20103
 Scheduler-driven very rapid increase in CPU speed in response to load spikes.

  The idea is that in sched.c::update_cpu_load(struct rq *this_rq),

+	/*
+	 * If load is increasing steeply, go straight to max CPU speed.
+	 * Use the smoothest and the middle-smoothed loads for comparison.
+	 * Use the prio_to_weight table to guesstimate the weight of one
+	 * full-cpu-burning normal priority task for steepness threshold.
+	 */
+		
+	if ( min(this_load, this_rq->cpu_load[max(0,CPU_LOAD_IDX_MAX-3)]) > 
+		this_rq->cpu_load[max(0,CPU_LOAD_IDX_MAX-1)] + prio_to_weight[20] ) {

  This condition is a question of careful tuning, which I am still working on. If it is too sensitive, the cpu speed immediately jumps if one just moves the mouse a bit, which is nice, except that it tends to keep randomly jumping to high speed and spends too much time there. If it isn't sensitive enough, it does nothing. The present settings probably err on the conservative side, but when it works it makes a noticeable difference in responsiveness.

  Any polling approach that doesn't poll too rapidly will always fight the last war, jacking up the speed too late to be useful unless one is starting a batch job. An event-driven approach can take full advantage of what I gather is the 200 microsecond frequency switching time of recent AMD processors (I have AMD 64 X2). The result is a nice increase in speed for things that require a quick burst of CPU, such as starting any sort of GUI program that isn't disk-bound while starting.
Comment 15 Jim Bray 2009-02-13 17:36:30 UTC
Comment on attachment 20103
 Scheduler-driven very rapid increase in CPU speed in response to load spikes.

Update: I think Venkatesh did a little more tweaking to the patch he made in
response to http://bugzilla.kernel.org/show_bug.cgi?id=12310 . The original
patch caused ondemand to become very sluggish in response when there was a high
nice load. At first I thought this was intrinsic to ondemand and made the
fast-switching patch, then I thought it was caused by my SCHED_IDLE patch, but
2.6.29-rc5 just came out with the released version of the ondemand patch, and
ondemand is now very fast with or without a high nice load.
Comment 16 Thomas Renninger 2009-03-24 05:39:10 UTC
This got fixed for powernow-k8 in 2.6.29 by this commit:
732553e567c2700ba5b9bccc6ec885c75779a94b
