Bug 10361

Summary: Compiling with CONFIG_RT_GROUP_SCHED breaks pam limits.conf rtprio assignment
Product: Process Management
Reporter: Viktor Radnai (viktor.radnai)
Component: Scheduler
Assignee: Ingo Molnar (mingo)
Status: RESOLVED OBSOLETE
Severity: normal
CC: a.p.zijlstra, aicacaten, alan, birthdaystock, viktor.radnai
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.32
Subsystem:
Regression: No
Bisected commit-id:

Description Viktor Radnai 2008-03-30 03:06:27 UTC
Latest working kernel version: 2.6.24
Earliest failing kernel version: 2.6.25
Distribution: Ubuntu (probably all)
Hardware Environment: amd64 (probably all)
Software Environment:
Problem Description: Compiling a kernel with CONFIG_RT_GROUP_SCHED enabled breaks realtime priority assignment for non-root users when it is done through PAM with limits set in limits.conf. This is currently used mostly by people who want to run audio software with realtime priorities, and the kernel option breaks this "best practice" method with no immediately obvious explanation (sched_setscheduler returns -EPERM, aka "Permission denied"; I had to read the sched_setscheduler code to find out what I was doing wrong).

The offending code is from line 4573 of kernel/sched.c (on 2.6.25-rc7):
#ifdef CONFIG_RT_GROUP_SCHED
        /*
         * Do not allow realtime tasks into groups that have no runtime
         * assigned.
         */
        if (rt_policy(policy) && task_group(p)->rt_runtime == 0)
                return -EPERM;
#endif

Some googling tells me that this was committed by Peter Zijlstra <a.p.zijlstra@chello.nl>; I'm attempting to CC him as well.


Steps to reproduce:
1. create "audio" group and make test user a member
2. echo "@audio          hard    rtprio          99" >> /etc/security/limits.conf
3. log in with test user
4. run chrt -v -r 20 bash
5. result: success -- you get a shell with realtime priority

6. Compile new kernel with CONFIG_GROUP_SCHED=y and CONFIG_RT_GROUP_SCHED=y
7. boot new kernel
8. log in with test user
9. run chrt -v -r 20 bash
10. result: failure -- you get "permission denied" with no indication of what you did wrong
Comment 1 Peter Zijlstra 2008-03-30 04:27:28 UTC
This is _NOT_ a bug but expected behaviour, one you asked for by enabling
RT group scheduling.

RT group scheduling means you have to assign a bandwidth to the group before it will accept tasks.

By default, all bandwidth is assigned to the root group. If you want to assign bandwidth to another group, reduce the root group's bandwidth and assign some or all of the difference to that group.

As RT scheduling is all about determinism, a group has to be able to rely on the amount of bandwidth being constant, hence the kernel cannot change this for you when the group configuration changes; you really have to do this yourself.
Comment 2 Viktor Radnai 2008-03-30 14:49:39 UTC
Hi,

OK, thanks for making that clear. I know that this is a new feature in 2.6.25, but I haven't found any explanation for the strange behaviour (which took me several days to debug), so I decided to report it.

I am still a bit concerned about breaking (or modifying, I should say) functionality with only a generic error (EPERM). Maybe documentation on group scheduling will help with this, but I would have appreciated some kind of clue (dmesg, whatever) on what the "Permission denied" meant in this case (even if you read the source, sched_setscheduler has five EPERM conditions). I can imagine that this would really stump someone who gets a kernel with this feature enabled by their distribution without knowing about it.

Can you please think of some way to protect the innocent (and the clueless), some way to make it more obvious to the user what's wrong?

Thanks in advance.

Regards,
Vik
Comment 3 Peter Zijlstra 2008-03-31 03:11:22 UTC
On Sun, 2008-03-30 at 14:49 -0700, bugme-daemon@bugzilla.kernel.org
wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=10361
> ------- Comment #2 from viktor.radnai@gmail.com  2008-03-30 14:49 -------
> Hi,
> 
> OK, thanks for making that clear. I know that this is a new feature in 2.6.25
> but I haven't found any explanation to the strange behaviour (that took me
> several days to debug) so I decided to report it. 

Documentation/sched-rt-group.txt

> I am still a bit concerned about breaking (or modifying, I should say)
> functionality with only a generic error (EPERM). Maybe documentation on group
> scheduling will help with this, but I would have appreciated some kind of
> clue (dmesg, whatever) on what the "Permission denied" meant in this case
> (even if you read the source sched_setscheduler has five EPERM conditions).
> I can imagine that this would really stump someone who gets a kernel with
> this feature enabled by their distribution without knowing about it.
> 
> Can you please think of some way to protect the innocent (and the clueless),
> some way to make it more obvious to the user what's wrong?

I see your point, however I'm not sure dmesg is the correct way; we'd
set a precedent and eventually end up explaining every failing syscall.

What we need is a better error value; one that signifies failure due to
lack of resources, something like -ENOSPC and -ENOMEM. Perhaps -EBUSY
can be used to signify the lack of CPU resources?
Comment 4 Viktor Radnai 2008-03-31 07:44:24 UTC
> Documentation/sched-rt-group.txt
Yes, I read that before filing the bug, and it still didn't make sense at the time. It makes more sense now, but it's still not clear how to enable it (that's a lack of background knowledge on my part). Anyway, this is just a matter of extending the documentation (which I would like to help with, once I have tried out this feature), and it doesn't belong in Bugzilla.

Oh, a very important realisation hit me just now. As a sysadmin looking after Java stuff, I'm mentally conditioned to understand "runtime" as something totally different from what you mean (run + time, i.e. the time the process group has been given to run). Much of my confusion has been caused by this :)

> I see your point, however I'm not sure dmesg is the correct way; we'd
> set a precedent and eventually end up explaining every failing syscall.
>
> What we need is a better error value; one that signifies failure due to
> lack of resources, something like -ENOSPC and -ENOMEM. Perhaps -EBUSY
> can be used to signify the lack of CPU resources?

I agree with both.

I actually considered other error codes myself, but at the time I couldn't find anything in include/asm-generic/errno-base.h that was better than your choice. I was hoping that those smarter than me would come up with a good way to notify userspace in a meaningful manner :)

Now, also looking through include/asm-generic/errno.h, EDQUOT grabbed my attention, but that explicitly means "disk quota", not *any* quota. I suppose all of the above mean that some resource has run out, but none of them is applicable to this case. But I need to learn more about realtime process groups before I can suggest a more meaningful alternative. Is there a chance of getting some new error codes added to include/asm-generic/errno.h? (I can see a bunch of subsystem-specific ones in there already.)
Comment 5 Alan 2010-01-19 17:17:22 UTC
Still present - did anyone decide whether to change it or close it ?
Comment 6 Anonymous Emailer 2010-01-19 18:43:48 UTC
Reply-To: peterz@infradead.org

On Tue, 2010-01-19 at 17:17 +0000, bugzilla-daemon@bugzilla.kernel.org 
> --- Comment #5 from Alan <alan@lxorguk.ukuu.org.uk>  2010-01-19 17:17:22 ---
> Still present - did anyone decide whether to change it or close it ?

I think it mostly depends on CONFIG_USER_SCHED, which places each user
in a separate group, but since that code is deprecated and will
hopefully soon go away, this issue should go away too.
Comment 7 Alan 2012-05-18 10:41:47 UTC
Please re-open against a current kernel if this is still a problem