Bug 100771 - When activating ignore_nice_load with governor ondemand performance drops for normal processes.
Status: CLOSED DOCUMENTED
Alias: None
Product: Power Management
Classification: Unclassified
Component: cpufreq
Hardware: Intel Linux
Importance: P1 normal
Assignee: Chen Yu
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-07-01 16:24 UTC by Carlos Alberto Lopez Perez
Modified: 2016-05-16 21:54 UTC
CC List: 6 users

See Also:
Kernel Version: 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt9-3~deb8u1
Subsystem:
Regression: No
Bisected commit-id:



Description Carlos Alberto Lopez Perez 2015-07-01 16:24:58 UTC
1) Let's monitor the CPU frequencies:

$ watch "grep 'cpu MHz' /proc/cpuinfo"


2) Set the cpufreq governor to ondemand and enable ignore_nice_load:

$ echo 1 | sudo tee /sys/devices/system/cpu/cpufreq/ondemand/ignore_nice_load


3) Launch a bunch of processes at normal nice to saturate the CPU:

$ openssl speed -multi $(( $(nproc) * 4 ))


4) Check the frequency of the CPU. Everything is OK (the frequency is at the maximum).


5) Now launch the previous command in another terminal, but with nice:

$ nice -n1 openssl speed -multi $(( $(nproc) * 4 ))


6) Check how the frequency goes down. IMHO that shouldn't happen, because there
are processes with normal priority waiting. The performance of the normal
processes drops because some unrelated low-priority processes took over the
processors at a lower frequency. The frequency shouldn't be lowered while
normal processes are waiting; otherwise the waiting time increases, because
the nice processes are not finished ASAP.
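
Step 2 says to set the governor to ondemand but only shows the ignore_nice_load command. A minimal sketch of the missing part, assuming each CPU exposes the usual cpufreq/scaling_governor file in sysfs. The function names and the optional directory argument (handy for a dry run against a scratch directory) are only illustrative:

```shell
# Write the given governor name to every CPU's scaling_governor file.
# $1: governor name (e.g. ondemand)
# $2: CPU sysfs directory (assumed default: /sys/devices/system/cpu)
set_governor() {
    governor=$1
    cpu_dir=${2:-/sys/devices/system/cpu}
    for f in "$cpu_dir"/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip CPUs without cpufreq support (or an unmatched glob).
        [ -e "$f" ] || continue
        printf '%s\n' "$governor" > "$f"
    done
}

# Print the governor currently selected for each CPU.
show_governors() {
    cpu_dir=${1:-/sys/devices/system/cpu}
    for f in "$cpu_dir"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -e "$f" ] || continue
        printf '%s: %s\n' "$f" "$(cat "$f")"
    done
}
```

Writing to the real sysfs files needs root, so on a live system these would be called from a root shell, for example `set_governor ondemand` followed by `show_governors`.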
Comment 1 Carlos Alberto Lopez Perez 2015-07-01 16:37:15 UTC
I have just done the following benchmark:

1) Disable ignore_nice_load and launch at the same time the following processes:
a -> openssl speed -multi 32 ecdsap521
b -> nice -n1 openssl speed -multi 32 ecdsap521

I got the following performance numbers.

a->
                              sign    verify    sign/s verify/s
 521 bit ecdsa (nistp521)   0.0002s   0.0004s   4136.8   2261.3

b->
                              sign    verify    sign/s verify/s
 521 bit ecdsa (nistp521)   0.0003s   0.0005s   3164.1   1907.4



2) Enable ignore_nice_load and repeat the experiment. Then I get:

a->
                              sign    verify    sign/s verify/s
 521 bit ecdsa (nistp521)   0.0003s   0.0005s   3885.3   1841.9

b->
                              sign    verify    sign/s verify/s
 521 bit ecdsa (nistp521)   0.0003s   0.0007s   2989.4   1499.2



As you can see, the performance of the normal process (a) has dropped because of the unrelated niced process (b).
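
To put numbers on the drop, the relative change between the two runs can be computed from the sign/s and verify/s columns above. A small sketch (the helper name pct_drop is made up for illustration):

```shell
# Print the percentage drop from $1 (before) to $2 (after), one decimal.
pct_drop() {
    LC_ALL=C awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (before - after) * 100 / before }'
}

pct_drop 4136.8 3885.3   # (a) sign/s
pct_drop 2261.3 1841.9   # (a) verify/s
pct_drop 3164.1 2989.4   # (b) sign/s
pct_drop 1907.4 1499.2   # (b) verify/s
```

For (a) this works out to a drop of about 6.1% in sign/s and about 18.5% in verify/s.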
Comment 2 Aaron Lu 2015-07-08 05:58:31 UTC
Looks like a cpufreq issue to me; moving it there.
Comment 3 Chen Yu 2015-12-16 15:11:33 UTC
This is normal behavior, IMO.

1. task_b having nice = 1 does not mean that task_a will always occupy the time slice. This is the CFS scheduler, which means that even a task with nice = 19 will get a chance to run.
2. When you enable ignore_nice_load, task_b with nice = 1 is considered idle: the total time task_b runs is added to the current CPU's idle time, which causes the cpufreq governor to see a lower CPU load than when ignore_nice_load is disabled.

Yu
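
The accounting described in point 2 can be illustrated with a little arithmetic. This is a simplified sketch, not the kernel code: it assumes the governor computes the load as 100 * (wall - idle) / wall over one sampling period and that, with ignore_nice_load set, the time spent in niced tasks is added to the idle time. The function name and the sample numbers are invented for illustration.

```shell
# Load in percent as the governor would see it for one sampling period.
# $1: wall time, $2: idle time, $3: time spent running niced tasks
# (all in the same unit), $4: ignore_nice_load setting (0 or 1)
governor_load() {
    wall=$1 idle=$2 nice=$3 ignore_nice=$4
    if [ "$ignore_nice" -eq 1 ]; then
        # Niced run time is accounted as if the CPU had been idle.
        idle=$((idle + nice))
    fi
    echo $(( 100 * (wall - idle) / wall ))
}

# A fully busy CPU that spent 445 of 1000 time units in task_b (nice = 1):
governor_load 1000 0 445 0   # ignore_nice_load disabled
governor_load 1000 0 445 1   # ignore_nice_load enabled
```

With these sample numbers the load is 100 with ignore_nice_load disabled and 55 with it enabled, even though the CPU never idled. The figure of 445 is chosen to roughly match the share a nice = 1 task gets against a single nice = 0 task under CFS, taking their weights to be 820 and 1024.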
Comment 4 calvaris 2015-12-16 16:26:43 UTC
(In reply to Chen Yu from comment #3)
> 2. When you enable ignore_nice_load, task_b with nice = 1 is considered
> idle: the total time task_b runs is added to the current CPU's idle time,
> which causes the cpufreq governor to see a lower CPU load than when
> ignore_nice_load is disabled.

I understand the rationale you explain if we apply it only to idle tasks. What I don't agree with is that tasks with idle priority can slow down tasks with normal priority.
Comment 5 Chen Yu 2015-12-17 02:27:35 UTC
(In reply to calvaris from comment #4)
Do you mean idle tasks = task_b in the above context? task_b slows down the normal-priority task because the CPU frequency decreases due to task_b's high nice value, and task_a is scheduled on the same CPU.
Comment 6 calvaris 2015-12-17 08:33:20 UTC
(In reply to Chen Yu from comment #5)
> Do you mean idle tasks = task_b in the above context? task_b slows down the
> normal-priority task because the CPU frequency decreases due to task_b's
> high nice value, and task_a is scheduled on the same CPU.

Yes, I understand that this is what's happening, and I think it shouldn't happen because, IMHO, it doesn't make sense. I think it should be exactly the opposite: when an idle task is scheduled on the same CPU as a higher-priority one, it should run at the frequency of the higher-priority one.
Comment 7 Chen Yu 2015-12-17 09:09:55 UTC
(In reply to calvaris from comment #6)

Well, the cpufreq framework is currently based on the total load of one CPU, not on a single task or a group of tasks. What you are asking for looks like 'cgroup'-style cpufreq scheduling. That would be a new semantic, IMO.
Comment 8 Carlos Alberto Lopez Perez 2015-12-23 13:27:49 UTC
Zhang Rui: could you clarify why you are closing this as invalid without even giving an explanation?


I'm commenting below on Chen's explanation:

(In reply to Chen Yu from comment #3)
> This is normal behavior, IMO.
> 
> 1. task_b having nice = 1 does not mean that task_a will always occupy the
> time slice. This is the CFS scheduler, which means that even a task with
> nice = 19 will get a chance to run.
> 2. When you enable ignore_nice_load, task_b with nice = 1 is considered
> idle: the total time task_b runs is added to the current CPU's idle time,
> which causes the cpufreq governor to see a lower CPU load than when
> ignore_nice_load is disabled.
> 

Would it be possible to change the behaviour of point 2 to:

2. When you enable ignore_nice_load, task_b with nice = 1 will be considered idle **only if there are no tasks with nice <= 0 scheduled on the same CPU?** ....

???
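
In terms of the load arithmetic, the proposed rule could look roughly like the sketch below. It is purely hypothetical: the function name, its arguments, and the idea of passing the normal-priority run time separately are assumptions made for illustration, not existing kernel behaviour.

```shell
# Hypothetical load calculation for the proposed rule.
# $1: wall time, $2: idle time, $3: run time of tasks with nice > 0,
# $4: run time of tasks with nice <= 0 (all in the same unit)
proposed_load() {
    wall=$1 idle=$2 nice_time=$3 normal_time=$4
    if [ "$normal_time" -eq 0 ]; then
        # Only niced tasks ran: keep treating their time as idle.
        idle=$((idle + nice_time))
    fi
    echo $(( 100 * (wall - idle) / wall ))
}

proposed_load 1000 0 445 555    # normal tasks present
proposed_load 1000 0 1000 0     # only niced tasks ran
```

The first call prints 100, so the frequency would stay up while a normal task is runnable; the second prints 0, so niced load on its own would still be ignored.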
Comment 9 Chen Yu 2015-12-23 16:13:22 UTC
(In reply to Carlos Alberto Lopez Perez from comment #8)
NACK. It might break the original meaning of ignore_nice_load and impact other systems; I don't think it would get accepted by the community. As a matter of fact, if ignore_nice_load is disabled, task_b with nice=1 seldom has a chance to run, so the final CPU frequency is mainly determined by task_a with nice=0, and it looks as if task_b has already been ignored, which is nearly what you asked for.

But if you really want to completely ignore a specific task_b, you might need a new entry like /proc/task_b_pid/ignore_cpuload. If enabled, this task's running time would be excluded from both the busy time and the sample period, as if the task had never been scheduled. But this feature might need a big change to the current code, and it is not related to ignore_nice_load.
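
The effect of such an entry on the load calculation can be sketched as follows. The ignore_cpuload entry does not exist; the function below only illustrates the arithmetic described above (leaving the task's run time out of both the busy time and the sample period), with invented names and numbers.

```shell
# Load in percent when one task's run time is left out entirely.
# $1: sample period, $2: busy time, $3: run time of the ignored task
load_excluding_task() {
    period=$1 busy=$2 ignored=$3
    period=$((period - ignored))
    busy=$((busy - ignored))
    if [ "$period" -le 0 ]; then
        # Nothing but the ignored task ran: report no load.
        echo 0
        return
    fi
    echo $(( 100 * busy / period ))
}

load_excluding_task 1000 1000 445   # CPU shared by task_a and task_b
load_excluding_task 1000 445 445    # only task_b ran, rest idle
```

The first call prints 100 because task_a fills all of the remaining period; the second prints 0 because nothing else ran.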

So I suggested that Rui close the current thread; you can open a new one if you want a task-based cpufreq governor.
Comment 10 Zhang Rui 2015-12-28 07:23:33 UTC
(In reply to Carlos Alberto Lopez Perez from comment #8)
> Zhang Rui: could you clarify why you are closing this as invalid without
> even giving an explanation?
> 
I closed this bug based on Yu's offline feedback.
Comment 11 Chen Yu 2015-12-29 01:49:00 UTC
Hi Carlos and calvaris,
I'm closing this thread for the reason mentioned in comment #9.
If you want this 'exclusive cpufreq governor', please send an email to
linux-pm@vger.kernel.org and CC rjw@rjwysocki.net, and let's wait for feedback from the community. Or you can wait one or two weeks until after the holidays, and I'll sync with the power management maintainers at our internal meeting.
Thanks,
Yu
