Bug 16417 - Slow context switches with SMP and CONFIG_FAIR_GROUP_SCHED
Summary: Slow context switches with SMP and CONFIG_FAIR_GROUP_SCHED
Status: RESOLVED CODE_FIX
Alias: None
Product: Process Management
Classification: Unclassified
Component: Scheduler
Hardware: All
OS: Linux
Importance: P1 normal
Assignee: Ingo Molnar
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-07-19 14:38 UTC by Pierre Bourdon
Modified: 2011-01-19 12:01 UTC
CC List: 0 users

See Also:
Kernel Version: 2.6.34.1
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description Pierre Bourdon 2010-07-19 14:38:04 UTC
Hello,

We have been experiencing slow context switches when using a large number of cgroups (around 600 groups) with CONFIG_FAIR_GROUP_SCHED. This causes increased system time usage in context-switch-heavy processes (measured with pidstat -w) and a drop in timer interrupt handling.

This problem only appears on SMP: when booting with nosmp, the issue does not appear. From maxcpus=2 to maxcpus=8 we were able to reproduce it consistently.

Steps to reproduce:
- mount the cgroup filesystem in /dev/cgroup
- cd /dev/cgroup && for i in $(seq 1 5000); do mkdir test_group_$i; done
- launch lat_ctx from lmbench, for instance ./lat_ctx -N 200 100
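Collected into one script, the steps above look like this (a sketch only: it assumes root, a kernel built with CONFIG_FAIR_GROUP_SCHED, and lmbench's lat_ctx somewhere in $PATH):

```shell
#!/bin/sh
# Reproduction sketch; requires root and CONFIG_FAIR_GROUP_SCHED.
set -e
mkdir -p /dev/cgroup
mount -t cgroup -o cpu none /dev/cgroup        # cgroup fs with the cpu controller
for i in $(seq 1 5000); do
    mkdir "/dev/cgroup/test_group_$i"          # empty groups are enough to trigger it
done
lat_ctx -N 200 100                             # lmbench context-switch benchmark
```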

The results from lat_ctx (context-switch latency, in microseconds) were the following:
- SMP enabled, no cgroups: 2.65
- SMP enabled, 1000 cgroups: 3.40
- SMP enabled, 6000 cgroups: 3957.36
- SMP disabled, 6000 cgroups: 1.58

We can see that beyond a certain number of cgroups, context switching starts taking a lot of time. Another way to reproduce this problem:
- launch cat /dev/zero | pv -L 1G > /dev/null
- look at the CPU usage (about 40% here)
- cd /dev/cgroup && for i in $(seq 1 5000); do mkdir test_group_$i; done
- look at the CPU usage (about 80% here)

Also note that when a lot of cgroups are present, the system spends a lot of time in softirqs, and fewer timer interrupts are handled than normal (according to our graphs).
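The softirq / timer-interrupt observation can be checked directly from /proc (a sketch; the counter names are the usual Linux ones, e.g. LOC is the local APIC timer on x86 and may differ on other architectures):

```shell
# Per-CPU softirq counts: the TIMER and SCHED rows are the relevant ones here.
grep -iE 'timer|sched' /proc/softirqs
# Local timer interrupts per CPU; sample twice and diff to estimate the rate.
grep 'LOC:' /proc/interrupts
```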

Regards,
Pierre Bourdon
Comment 1 Andrew Morton 2010-07-22 22:53:32 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

sched suckage!  Do we have a linear search in there?


On Mon, 19 Jul 2010 14:38:09 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=16417
> 
>            Summary: Slow context switches with SMP and
>                     CONFIG_FAIR_GROUP_SCHED
>            Product: Process Management
>            Version: 2.5
>     Kernel Version: 2.6.34.1
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Scheduler
>         AssignedTo: mingo@elte.hu
>         ReportedBy: pbourdon@excellency.fr
>         Regression: No
> 
> 
> [...]
Comment 2 Pierre Bourdon 2010-08-02 11:26:34 UTC
On Mon, 02 Aug 2010 10:58:41 +0200, Peter Zijlstra <peterz@infradead.org>
wrote:
> Does: echo NO_LB_SHARES_UPDATE > /debug/sched_features
> (or wherever you mounted debugfs) help things?

It does not, sorry. Latency with lat_ctx is still high, and CPU usage with
cat | pv is still high too.

Regards,
Comment 3 Pierre Bourdon 2011-01-19 12:01:38 UTC
The bug seems to be fixed in 2.6.38-rc1, thanks a lot!
