Bug 42789

Summary: cpuset cgroup: when a CPU goes offline, it is removed from every cgroup's cpuset.cpus, but when it comes back online, it is restored only to the root cpuset.cpus
Product: Process Management
Component: Other
Status: NEEDINFO
Severity: normal
Priority: P1
Hardware: All
OS: Linux
Kernel Version: v3.2.6
Subsystem:
Regression: No
Bisected commit-id:
Reporter: bugs-kernel (bugs-kernel.8eaf7cd8e5128d8191fe)
Assignee: process_other
CC: alan, spartacus06, srivatsa

Description bugs-kernel@spamgourmet.com 2012-02-17 02:15:55 UTC
This was first noticed when my containers kept switching to using only CPU #0 after hibernation.  The bug is present in vanilla v3.2.6, the latest 3.2.x release as of this writing.  A minimal test case follows (demonstrated on a 2-CPU system, but larger systems are affected as well):

# cd /sys/fs
# mount -o cpuset -t cgroup cgroup cgroup
# cd cgroup
## Prepare a child cgroup.
# mkdir sub
# cat cpuset.cpus > sub/cpuset.cpus
# cat cpuset.mems > sub/cpuset.mems
# cat cpuset.cpus
0-1
# cat sub/cpuset.cpus
0-1
## Disable CPU #1.
# echo 0 > /sys/devices/system/cpu/cpu1/online
## OK, now CPU #1 is disallowed for both cgroups.
# cat cpuset.cpus
0
# cat sub/cpuset.cpus
0
## Enable CPU #1.
# echo 1 > /sys/devices/system/cpu/cpu1/online
# cat cpuset.cpus
0-1
# cat sub/cpuset.cpus
0

Everything is fine until this last step.  After CPU #1 is brought back online, the parent cgroup regains the ability to use both CPU #0 and CPU #1, but the child cgroup remains bound to CPU #0 alone.  As a result, all processes in the child cgroup are restricted to CPU #0.  The reproduction was done on a system booted to an initramfs prompt, so that no background processes were present that could perform concurrent cgroup modifications.
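Until the kernel is fixed, the child's mask can be restored by hand once the CPU is back online, for example by re-copying the root mask into the child (a workaround sketch continuing the session above; if sub had deliberately been pinned to a narrower mask, that mask would need to be re-applied instead):

## Workaround: re-copy the root mask into the child cgroup.
# cat cpuset.cpus > sub/cpuset.cpus
# cat sub/cpuset.cpus
0-1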
Comment 1 bugs-kernel@spamgourmet.com 2012-03-10 05:36:41 UTC
This was independently reported by Prashanth Nageshappa <prashanth@linux.vnet.ibm.com> and fixed in commit 8f2f748b0656257153bcf0941df8d6060acc5ca6, but that fix was subsequently reverted by Linus in commit 4293f20c19f44ca66e5ac836b411d25e14b9f185.  According to the revert's commit message, the fix caused regressions elsewhere.
Comment 2 Seth Jennings 2012-05-07 15:53:27 UTC
There are bug reports in the Red Hat and Ubuntu trackers for this issue, as it impacts libvirt:

https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/993354

https://bugzilla.redhat.com/show_bug.cgi?id=714271

In this particular case, the effect is that all VM vCPUs are pinned to the boot CPU (CPU 0), because the cpusets aren't maintained properly across suspend/resume, which, as I understand it, uses CPU hotplug.

This thread has a userspace workaround:

https://www.redhat.com/archives/libvir-list/2012-April/msg00777.html

This is a pretty significant kernel issue.  I can see a lot of people hitting this now that the distros are using versions of libvirt that utilize cgroups.
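For reference, here is a rough sketch of the kind of script such a workaround could use; the /sys/fs/cgroup mount point and the blanket top-down re-copy are my assumptions, not necessarily what the linked thread does:

#!/bin/sh
## Hypothetical workaround sketch: after CPUs come back online, re-copy
## each cpuset's mask down to its children, top-down, so that every
## descendant regains the CPUs the kernel restored only to the root.
## Assumes the cpuset hierarchy is mounted at /sys/fs/cgroup.
ROOT=/sys/fs/cgroup

propagate() {
    ## $1 is a cpuset directory whose cpuset.cpus is already correct.
    for child in "$1"/*/; do
        [ -f "${child}cpuset.cpus" ] || continue
        ## A cpuset may not contain CPUs its parent lacks, so widen the
        ## child from its parent before descending into it.
        cat "$1/cpuset.cpus" > "${child}cpuset.cpus"
        propagate "${child%/}"
    done
}

propagate "$ROOT"

Note that this blindly widens every descendant cpuset; the libvirt workaround instead re-applies each VM's intended pinning, which is the right behavior when guests are deliberately restricted.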
Comment 3 Srivatsa S. Bhat 2012-05-08 06:49:03 UTC
Hi,

Recently, I posted a new set of patches to fix this issue in the kernel.

Link: http://thread.gmane.org/gmane.linux.documentation/4805

The discussion around that patchset is still ongoing...

Regards,
Srivatsa S. Bhat
Comment 4 Srivatsa S. Bhat 2012-05-08 06:53:57 UTC
By the way, AFAICT, Thomas Gleixner's efforts to rework CPU hotplug are not aimed at fixing this cpuset issue. His work addresses a very different problem (cleaning up the duplication of CPU hotplug code across arch/ implementations, and possibly improving the performance of CPU hotplug itself).
Comment 5 Alan 2012-08-30 14:43:20 UTC
Is this now resolved?