Kernel Bug Tracker – Bug 9135
top displaying 9999% CPU usage
Last modified: 2008-08-21 13:05:05 UTC
References : http://lkml.org/lkml/2007/10/3/123
Submitter : Frans Pop <email@example.com>
Handled-By : Chuck Ebbert <firstname.lastname@example.org>
Christian Borntraeger <email@example.com>
A patch has been proposed: http://lkml.org/lkml/2007/10/4/389
Another patch has been proposed: http://lkml.org/lkml/2007/10/4/405
The second proposed patch has been shown to not completely fix the issue:
The first proposed patch is basically a reversion to 2.6.22 behavior and has been tested to fix the regression, although it may not be the best solution in the long run as it reduces accuracy:
Fix committed in Linus' git tree for 2.6.24 with the following two commits:
Author: Peter Zijlstra <firstname.lastname@example.org>
sched: keep utime/stime monotonic
Author: Balbir Singh <email@example.com>
sched: fix /proc/<PID>/stat stime/utime monotonicity, part 2
Currently waiting for the stable team to respond to a request to include these changes in the next point release.
Unfortunately commits 73a2bcb0edb9ffb0b007b3546b430e2c6e415eee and 9301899be75b464ef097f0b5af7af6d9bd8f68a7 do not appear to completely fix this bug. I am still seeing this, at least in certain circumstances.
I am seeing this behavior with all 2.6.23+ kernels, including 220.127.116.11 and 2.6.27-rc4, on x86_64 SMP Xeon and Opteron machines. I can replicate it reliably by running apache with the worker MPM and running a simple apache benchmark, for example:
ab -r -n 100000 -c 1000 http://localhost/
Watching stime and utime for the apache children, I see both values jump around a bit and decrease quite frequently.
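For anyone who wants to check this on their own machine, here is a rough sketch of how stime and utime could be watched for backwards steps. It is not the script used for the observations above. The helper names (parse_utime_stime, find_decreases, watch) and the sampling interval are made up for illustration. The field positions (utime is field 14 and stime is field 15 of /proc/<PID>/stat) come from proc(5).

```python
import time


def parse_utime_stime(stat_line):
    """Return (utime, stime) in clock ticks from one /proc/<PID>/stat line."""
    # The comm field (field 2) is wrapped in parentheses and may itself
    # contain spaces or parentheses, so split after the LAST ')'.
    rest = stat_line[stat_line.rindex(")") + 2:].split()
    # rest[0] is field 3 (state), so field 14 (utime) is rest[11]
    # and field 15 (stime) is rest[12].
    return int(rest[11]), int(rest[12])


def find_decreases(samples):
    """Return (index, field, previous, current) for every backwards step."""
    decreases = []
    for i in range(1, len(samples)):
        for name, prev, cur in zip(("utime", "stime"), samples[i - 1], samples[i]):
            if cur < prev:
                decreases.append((i, name, prev, cur))
    return decreases


def watch(pid, interval=0.1, count=100):
    """Sample /proc/<pid>/stat `count` times and report any decreases.

    The interval and count defaults are arbitrary illustrative values.
    """
    samples = []
    for _ in range(count):
        with open(f"/proc/{pid}/stat") as f:
            samples.append(parse_utime_stime(f.read()))
        time.sleep(interval)
    return find_decreases(samples)
```

Calling watch() with the PID of one apache child while the ab run is in progress should return an empty list on a kernel where the counters are monotonic. Per-thread counters are in /proc/<PID>/task/<TID>/stat in the same format, so the same parser should work there as well.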
The first patch from this bug, which reverts to the 2.6.22 behavior, appears to correct the problem.
I have been able to reliably duplicate this bug using x86_64 defconfig on all 2.6.23+ kernels. I have also tried many timer- and clock-related config options (with and without NO_HZ, HIGH_RES_TIMERS, different RTC options, etc.) and tried changing the clocksource (tsc, hpet, jiffies), but none of these made the problem go away. I never see any complaints about the clock or timer, even when booting with report_lost_ticks.
I am unable to duplicate this when running with nosmp, and I have only seen it on threaded processes. However, I cannot say for sure whether this occurs only on SMP and with threaded processes, or whether my testing is simply insufficient to replicate it elsewhere.