Bug 9135 - top displaying 9999% CPU usage
Summary: top displaying 9999% CPU usage
Status: RESOLVED PATCH_ALREADY_AVAILABLE
Alias: None
Product: Process Management
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 normal
Assignee: process_other
URL:
Keywords:
Depends on:
Blocks: 9056
Reported: 2007-10-08 12:46 UTC by Rafael J. Wysocki
Modified: 2008-08-21 13:05 UTC

See Also:
Kernel Version: 2.6.23-rc9
Subsystem:
Regression: Yes
Bisected commit-id:


Description Rafael J. Wysocki 2007-10-08 12:46:23 UTC
References      : http://lkml.org/lkml/2007/10/3/123
Submitter       : Frans Pop <elendil@planet.nl>
Handled-By      : Chuck Ebbert <cebbert@redhat.com>
                  Christian Borntraeger <borntraeger@de.ibm.com>
Comment 1 Rafael J. Wysocki 2007-10-08 12:47:11 UTC
A patch has been proposed: http://lkml.org/lkml/2007/10/4/389
Comment 2 Rafael J. Wysocki 2007-10-08 12:47:56 UTC
Another patch has been proposed: http://lkml.org/lkml/2007/10/4/405
Comment 3 Frans Pop 2007-10-12 13:22:49 UTC
The second proposed patch has been shown to not completely fix the issue:
http://lkml.org/lkml/2007/10/5/128

The first proposed patch is basically a revert to the 2.6.22 behavior and has been confirmed by testing to fix the regression, although it may not be the best long-term solution because it reduces accuracy:
http://lkml.org/lkml/2007/10/5/150
Comment 4 Frans Pop 2007-11-05 14:02:13 UTC
Fix committed in Linus' git tree for 2.6.24 with the following two commits:

commit 73a2bcb0edb9ffb0b007b3546b430e2c6e415eee
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
    sched: keep utime/stime monotonic

commit 9301899be75b464ef097f0b5af7af6d9bd8f68a7
Author: Balbir Singh <balbir@linux.vnet.ibm.com>
    sched: fix /proc/<PID>/stat stime/utime monotonicity, part 2

Currently waiting for stable team to respond to request to include these changes in next point release.
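
The commit titles above describe keeping utime/stime monotonic. As a rough illustration of that general idea only (this is not the kernel code, and the class and method names are hypothetical), a reported counter can be clamped so that it never drops below the last value it returned:

```python
class MonotonicCounter:
    """Report a counter that never goes backwards, even if raw samples do.

    Hypothetical helper for illustration; not taken from the kernel patches.
    """

    def __init__(self):
        self.prev = 0

    def report(self, raw):
        # Remember the largest value reported so far. A smaller raw sample
        # is hidden instead of being shown to the reader as a decrease.
        self.prev = max(self.prev, raw)
        return self.prev
```

Feeding the raw samples 10, 12, 11, 15 through `report()` yields 10, 12, 12, 15. The trade-off is that a temporarily low sample is masked rather than corrected.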
Comment 5 Spencer Candland 2008-08-21 13:05:05 UTC
Unfortunately, commits 73a2bcb0edb9ffb0b007b3546b430e2c6e415eee and 9301899be75b464ef097f0b5af7af6d9bd8f68a7 do not appear to completely fix this bug.  I am still seeing the problem, at least in certain circumstances.

I am seeing this behavior with all 2.6.23+ kernels, including 2.6.26.3 and 2.6.27-rc4, on x86_64 SMP Xeon and Opteron machines.  I am able to replicate this reliably by running Apache with the worker MPM and running a simple Apache benchmark, for example:

ab -r -n 100000 -c 1000 http://localhost/

Watching stime and utime for the Apache children, I see both jump around a bit and decrease quite frequently.
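
One way such watching could be done is sketched below. It is not the method used for the observations above: it assumes Python 3 on Linux, polls `/proc/<PID>/task/*/stat` for a given process, and prints a line whenever a thread's utime or stime (fields 14 and 15, in clock ticks) is lower than in the previous sample. The function names and the 0.1 s polling interval are arbitrary choices for illustration.

```python
import glob
import time


def parse_stat_line(line):
    """Return (utime, stime) in clock ticks from one /proc stat line."""
    # The comm field (2nd) is parenthesised and may contain spaces, so split
    # after the last ')'. What follows starts at field 3 (state).
    rest = line.rsplit(")", 1)[1].split()
    return int(rest[11]), int(rest[12])  # fields 14 (utime) and 15 (stime)


def find_decreases(prev, cur):
    """List (tid, name, old, new) for counters that went backwards."""
    out = []
    for tid, (utime, stime) in cur.items():
        if tid not in prev:
            continue  # new thread, nothing to compare against yet
        old_utime, old_stime = prev[tid]
        if utime < old_utime:
            out.append((tid, "utime", old_utime, utime))
        if stime < old_stime:
            out.append((tid, "stime", old_stime, stime))
    return out


def sample(pid):
    """Read utime/stime for every thread of pid, skipping exited threads."""
    result = {}
    for path in glob.glob(f"/proc/{pid}/task/*/stat"):
        try:
            with open(path) as f:
                # path is /proc/<pid>/task/<tid>/stat, so element 4 is the tid
                result[path.split("/")[4]] = parse_stat_line(f.read())
        except (OSError, IndexError, ValueError):
            pass  # thread went away or the line was incomplete
    return result


def watch(pid, interval=0.1):
    """Poll forever and print every decrease that is observed."""
    prev = sample(pid)
    while True:
        time.sleep(interval)
        cur = sample(pid)
        for tid, name, old, new in find_decreases(prev, cur):
            print(f"tid {tid}: {name} went from {old} to {new}")
        prev = cur
```

Calling `watch(pid)` with the PID of one Apache child while the benchmark runs would print a line for each decrease seen between two samples. Decreases that happen and recover between two polls are missed.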

The first patch from this bug, which reverts to the 2.6.22 behavior, appears to correct the problem.

I have been able to reliably duplicate this bug using x86_64 defconfig on all 2.6.23+ kernels.  I have also tried many timer and clock related config options (with and without NO_HZ, HIGH_RES_TIMERS, different RTC options, etc.) and have tried changing the clocksource (tsc, hpet, jiffies), but with no success.  I never see any complaints about the clock or timer, even when booting with report_lost_ticks.

I am unable to duplicate this when running with nosmp, and I have only seen it on threaded processes. However, I am unable to say for sure whether this only occurs on SMP and with threaded processes, or whether my testing is simply insufficient to replicate it elsewhere.
