
Bug 12482 - Change cpufreq_ondemand to treat SCHED_IDLE time as idle time

Summary: Change cpufreq_ondemand to treat SCHED_IDLE time as idle time

Status:	CLOSED DOCUMENTED

Alias:	None

Product:	Power Management
Classification:	Unclassified
Component:	cpufreq
Hardware:	All Linux

Importance:	P1 enhancement
Assignee:	cpufreq

URL:
Keywords:

Depends on:
Blocks:

Reported:	2009-01-18 19:54 UTC by Jim Bray
Modified:	2011-01-18 08:21 UTC
CC List:	2 users

See Also:
Kernel Version:	2.6.29-rc2
Subsystem:
Regression:	No
Bisected commit-id:

Attachments
patch to kernel_stat.h, sched.c, cpufreq_ondemand.c (3.07 KB, patch) 2009-01-18 20:05 UTC, Jim Bray
patch-cpu-freq-sched-idle (3.08 KB, patch) 2009-02-03 17:47 UTC, Jim Bray
Very rapid cpu speed increase when load increases (4.09 KB, patch) 2009-02-03 17:51 UTC, Jim Bray

Description Jim Bray 2009-01-18 19:54:15 UTC

I run boinc-client, so I always have background jobs that are ioniced and running at SCHED_IDLE priority. They are also niced, so cpufreq_ondemand as it stands ignores them if I set ignore_nice. The other day I was building a kernel, which I niced, and it occurred to me that I would like processes niced below a certain level to be able to raise the cpu speed. At first I tried that approach, turning ignore_nice into a nice-fence value, but this got very ugly, requiring looking at tasks and so forth, and I decided it was a bad idea. Then I realised I could get the effect I wanted by getting sched.c to keep separate track of SCHED_IDLE time and then treating that as nice time unconditionally, regardless of the ignore_nice flag. I could then turn ignore_nice off and have my niced kernel builds speed up the cpu. It seems to work fine. I hope I can attach the patch after I submit this enhancement suggestion.

 I hated having to touch anything in /kernel, but there seemed to be no other way around it, and having cpustat keep separate track of time in SCHED_IDLE might be useful for something else.

Comment 1 Jim Bray 2009-01-18 20:05:44 UTC

Created attachment 19883
patch to kernel_stat.h, sched.c, cpufreq_ondemand.c

  Been a long time since I did this, apologies if I screw things up. Hopefully the patch will attach. It seems to work (I'm running it as I write). The nice-level idea is nice, but would really require sched.c to keep track of time at different nice levels. cpufreq_ondemand.c really shouldn't be including sched.h and rooting around in scheduling data structures.
  If anyone out there actually wants their SCHED_IDLE tasks to jack up the cpu speed, I guess another sysfs variable would have to be added, but I'd be surprised if anyone does (switching those tasks to SCHED_BATCH would make sense in that case).

Comment 2 Jim Bray 2009-01-19 08:44:16 UTC

I did some searching (boinc ignore_nice) and found that there are some people who actually want boinc processes to run the cpu up to top speed, and who were having problems with this because gnome, in its infinite wisdom, apparently sets ignore_nice_load automatically (something that in my opinion a GUI has no business doing) and of course keeps this preference in a human-unreadable format.

I got curious about SCHED_BATCH and switched my boinc processes to it. The system seems just as responsive, and setting ignore_nice_load to zero causes the cpu to go to max freq if the boinc processes are SCHED_BATCH. SCHED_IDLE is the ultimate nice: a strong case can be made for ignoring SCHED_IDLE time in cpufreq_ondemand, given that SCHED_BATCH is clearly more appropriate for boinc-type jobs if maximum niced performance is desired, and possibly even as the default, given that throughput increases of up to 300% are possible according to `man schedtool` (see below).

It would be trivial to extend my patch to keep track of SCHED_BATCH time and allow cpufreq_ondemand to treat that as desired, optionally adding variables to sysfs to control this.

SCHED_BATCH [ since 2.6.16 in mainline ] SCHED_BATCH was designed for non-interactive, CPU-bound applications. It uses longer timeslices (to better exploit the cache), but can be interrupted anytime by other processes in other classes to guarantee interaction of the system. Processes in this class are selected last but may result in a considerable speed-up (up to 300%).

SCHED_IDLEPRIO [ patch needed ] SCHED_IDLEPRIO is similar to SCHED_BATCH, but was explicitly designed to consume only the time the CPU is idle. No interactive boosting is done. If you used SCHED_BATCH in the -ck kernels, this is what you want since 2.6.16.

Comment 3 Dave Jones 2009-01-22 08:35:10 UTC

I recommend bringing this up for discussion on Linux-kernel. The scheduler developers will more likely weigh in there.

Comment 4 Jim Bray 2009-01-22 15:46:57 UTC

I even had a wild notion of giving the scheduler some idea of the existence of the cpufreq code (see below). I was on lkml years and years ago and had to get off because of the volume. Do I really need to join LKML, or is there another route I might take that doesn't involve deleting hundreds of emails every day? Thanks for the comment, anyway.

 From http://bugzilla.kernel.org/show_bug.cgi?id=4379:

 Just a quick thought: if the scheduler was aware of min, max and current cpu
frequencies, it might be persuaded to schedule a designated cpufreq process
whenever there was a great mismatch between load and cpu frequency, giving very
rapid response and allowing slower polling to handle the gradual adjustments.
While cluttering up the scheduler is undesirable, the information needed by
cpufreq code comes from the scheduler and a case could be made for integrating
cpu frequency control into the scheduler, in light of which adding just enough
code to someplace like maybe find_idlest_cpu() or rebalancing code to know when
the cpufreq needs checking might be considered permissible.

Comment 5 Jim Bray 2009-02-03 17:47:42 UTC

Created attachment 20099
patch-cpu-freq-sched-idle

This is the same as the obsolete patch except for cosmetics (changed the name of the #ifdef).

Comment 6 Jim Bray 2009-02-03 17:51:03 UTC

Created attachment 20100
Very rapid cpu speed increase when load increases

  This implements (not necessarily the Right Way) the idea I mentioned of having the scheduler kick the speed up when called for.

Comment 7 Jim Bray 2009-02-03 18:17:11 UTC

PS the rapid-speed-increase hack is cross-posted and explained in more detail here:

http://bugzilla.kernel.org/show_bug.cgi?id=4379

Comment 8 Jim Bray 2009-02-13 17:32:20 UTC

Comment on attachment 20100
Very rapid cpu speed increase when load increases

Update: I think Venkatesh did a little more tweaking to the patch he made in response to http://bugzilla.kernel.org/show_bug.cgi?id=12310 . The original patch made ondemand very sluggish to respond when there was a high nice load. At first I thought this was intrinsic to ondemand and made the fast-switching patch; then I thought it was caused by my SCHED_IDLE patch. But 2.6.29-rc5 just came out with the released version of the ondemand patch, and ondemand is now very fast with or without a high nice load.

Comment 9 Jim Bray 2009-03-23 13:14:19 UTC

  AFAIK, this has not been picked up, so closing it as "CODE_FIX" doesn't seem quite right. "WILL_NOT_FIX" would seem to be the thing if there is no interest in it. If the idea has been picked up, please let me know. I just checked RC8 and didn't see it there.

Comment 10 Robert Bradbury 2010-06-25 07:15:41 UTC

Comments on Gentoo Bug #287463 and Kernel Bugs #14066, #14771 and #16072 may be related to this. What is clear from my experience with cpufreq_ondemand and acpi-cpufreq since Linux 2.6.30 is that very few people really understand the issues involved here. The general tone from the "kernel" developers seems to be that cpufreq_ondemand (and indirectly p4-clockmod.c) are "deprecated" in favor of using acpi-cpufreq. This is without taking into account that for acpi-cpufreq to work correctly it needs an Enhanced Intel Speedstep processor *and* a compatible ACPI BIOS with _PCT code(*) -- and there are a *lot* (presumably millions) of machines in the world for which this is not the case.

* This only applies to machines with Intel Pentium class processors presumably.

Comment 11 Len Brown 2011-01-18 08:20:48 UTC

re: comment #10
cpufreq has two parts, the policy governor, and the platform driver.
It needs one of each to function.

available governors include ondemand, performance, powersave, userspace
available drivers include acpi-cpufreq, p4-clockmod, and some amd-specific ones

ondemand is a fully supported governor
acpi-cpufreq is a fully supported driver

p4-clockmod is generally used by mistake when it is used,
and it generally does more harm than it does good.
The only reason that it hasn't been deleted from the source tree
is that some people use it to lower the temperature on celeron processors.

re: comment #9

proposals for enhancing the algorithms used in the ondemand governor
should be done on the list, rather than filed as bugs in bugzilla,
particularly proposals that involve scheduling classes etc.
that may have a variety of uses and require vetting from
a variety of experts (who will not see bugzilla entries).

So I'm going to close this sighting as DOCUMENTED.
Please submit your cool ideas to the cpufreq list.
