Bug 23652 - procfs: /proc/<pid>/stat utime stops incrementing, stime starts incrementing
Summary: procfs: /proc/<pid>/stat utime stops incrementing, stime starts incrementing
Status: RESOLVED INVALID
Alias: None
Product: File System
Classification: Unclassified
Component: Other (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: fs_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-11-23 18:23 UTC by Mark Stemm
Modified: 2012-08-14 13:21 UTC (History)
2 users (show)

See Also:
Kernel Version: 2.6.32-25-generic
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description Mark Stemm 2010-11-23 18:23:50 UTC
I noticed that for a cpu-bound test program I had written, getrusage(RUSAGE_SELF, ...)'s ru_utime would stop incrementing, reliably, after between 261 and 262 seconds of cpu time.

Here is a simple test program that exhibits this behavior:

#include <stdio.h>
#include <assert.h>
#include <sys/time.h>
#include <sys/resource.h>

int main(int argc, char *argv[])
{
    struct rusage before;
    int count = 0;

    while(1) {
        count++;
        if (count > 5000000) {
            count=0;
            assert(getrusage(RUSAGE_SELF, &before)==0);
            printf("*****Before=%d sec %d usec\n", before.ru_utime.tv_sec, before.ru_utime.tv_usec);
        }
    }
}

In the program's output, I see that ru_utime stops incrementing with sec=261, and usec=some variable number, for example:

*****Before=261 sec 796361 usec
*****Before=261 sec 804361 usec
*****Before=261 sec 812362 usec
*****Before=261 sec 820362 usec
*****Before=261 sec 828363 usec
*****Before=261 sec 836363 usec
*****Before=261 sec 844364 usec
*****Before=261 sec 844364 usec
*****Before=261 sec 844364 usec
*****Before=261 sec 844364 usec
*****Before=261 sec 844364 usec
*****Before=261 sec 844364 usec


When I run this program and periodically cat /proc/<pid>/stat, what I see is that utime stops incrementing and stime starts incrementing, for example:


$ while true; do sleep 1; cat /proc/5926/stat; done
5926 (foo) R 936 5926 936 34833 5926 4194304 163 0 0 0 25387 46 0 0 20 0 1 0 43456557 1679360 91 4294967295 134512640 134514112 3213945872 3213945600 134513719 0 0 0 0 0 0 0 17 1 0 0 0 0 0
5926 (foo) R 936 5926 936 34833 5926 4194304 163 0 0 0 25487 46 0 0 20 0 1 0 43456557 1679360 91 4294967295 134512640 134514112 3213945872 3213945600 134513719 0 0 0 0 0 0 0 17 1 0 0 0 0 0
5926 (foo) R 936 5926 936 34833 5926 4194304 163 0 0 0 25585 47 0 0 20 0 1 0 43456557 1679360 91 4294967295 134512640 134514112 3213945872 3213945600 134513719 0 0 0 0 0 0 0 17 1 0 0 0 0 0
5926 (foo) R 936 5926 936 34833 5926 4194304 163 0 0 0 25684 48 0 0 20 0 1 0 43456557 1679360 91 4294967295 134512640 134514112 3213945872 3213945600 134513720 0 0 0 0 0 0 0 17 1 0 0 0 0 0
5926 (foo) R 936 5926 936 34833 5926 4194304 163 0 0 0 25784 48 0 0 20 0 1 0 43456557 1679360 91 4294967295 134512640 134514112 3213945872 3213945600 134513733 0 0 0 0 0 0 0 17 1 0 0 0 0 0
5926 (foo) R 936 5926 936 34833 5926 4194304 163 0 0 0 25884 48 0 0 20 0 1 0 43456557 1679360 91 4294967295 134512640 134514112 3213945872 3213945600 134513719 0 0 0 0 0 0 0 17 1 0 0 0 0 0
5926 (foo) R 936 5926 936 34833 5926 4194304 163 0 0 0 25984 48 0 0 20 0 1 0 43456557 1679360 91 4294967295 134512640 134514112 3213945872 3213945600 134513733 0 0 0 0 0 0 0 17 1 0 0 0 0 0
5926 (foo) R 936 5926 936 34833 5926 4194304 163 0 0 0 26085 48 0 0 20 0 1 0 43456557 1679360 91 4294967295 134512640 134514112 3213945872 3213945600 134513719 0 0 0 0 0 0 0 17 1 0 0 0 0 0
5926 (foo) R 936 5926 936 34833 5926 4194304 163 0 0 0 26184 50 0 0 20 0 1 0 43456557 1679360 91 4294967295 134512640 134514112 3213945872 3213945600 134513719 0 0 0 0 0 0 0 17 1 0 0 0 0 0
5926 (foo) R 936 5926 936 34833 5926 4194304 163 0 0 0 26184 149 0 0 20 0 1 0 43456557 1679360 91 4294967295 134512640 134514112 3213945872 3213945600 134513719 0 0 0 0 0 0 0 17 1 0 0 0 0 0
5926 (foo) R 936 5926 936 34833 5926 4194304 163 0 0 0 26184 250 0 0 20 0 1 0 43456557 1679360 91 4294967295 134512640 134514112 3213945872 3213945600 134513720 0 0 0 0 0 0 0 17 1 0 0 0 0 0
5926 (foo) R 936 5926 936 34833 5926 4194304 163 0 0 0 26184 350 0 0 20 0 1 0 43456557 1679360 91 4294967295 134512640 134514112 3213945872 3213945600 134513733 0 0 0 0 0 0 0 17 1 0 0 0 0 0
5926 (foo) R 936 5926 936 34833 5926 4194304 163 0 0 0 26184 450 0 0 20 0 1 0 43456557 1679360 91 4294967295 134512640 134514112 3213945872 3213945600 134513725 0 0 0 0 0 0 0 17 1 0 0 0 0 0
5926 (foo) R 936 5926 936 34833 5926 4194304 163 0 0 0 26184 550 0 0 20 0 1 0 43456557 1679360 91 4294967295 134512640 134514112 3213945872 3213945600 134513733 0 0 0 0 0 0 0 17 1 0 0 0 0 0
5926 (foo) R 936 5926 936 34833 5926 4194304 163 0 0 0 26184 651 0 0 20 0 1 0 43456557 1679360 91 4294967295 134512640 134514112 3213945872 3213945600 134513733 0 0 0 0 0 0 0 17 1 0 0 0 0 0
5926 (foo) R 936 5926 936 34833 5926 4194304 163 0 0 0 26184 751 0 0 20 0 1 0 43456557 1679360 91 4294967295 134512640 134514112 3213945872 3213945600 134513733 0 0 0 0 0 0 0 17 1 0 0 0 0 0
5926 (foo) R 936 5926 936 34833 5926 4194304 163 0 0 0 26184 852 0 0 20 0 1 0 43456557 1679360 91 4294967295 134512640 134514112 3213945872 3213945600 134513719 0 0 0 0 0 0 0 17 1 0 0 0 0 0

This is the only major program running, and it's not swapping or anything.

My kernel information is:

$ cat /proc/version
Linux version 2.6.32-25-generic (buildd@palmer) (gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) ) #45-Ubuntu SMP Sat Oct 16 19:48:22 UTC 2010

This is not running as a virtual machine.

Let me know if you need more information, or if I'm misunderstanding the allocation between user and system time, or if I should be filing this bug with ubuntu instead.
Comment 1 Yill Din 2012-08-03 15:49:51 UTC
I have reproduced the same kind of problem with a Linux 2.6.34:
Linux controlblade 2.6.34.7-ELinOS-189-Noah #1 SMP PREEMPT Fri Aug 3 16:33:41 CEST 2012 i686 unknown

I have made a small program that is using both user and system CPU. After several seconds, the CPU user time is stalled to 0 and the CPU system is counting both the time spent by system and user:
Loop 71, at 1344008485, CPU user time=414100 (delta=5600), system time=457100 (delta=6500), delta sum=12100
Loop 72, at 1344008486, CPU user time=419800 (delta=5700), system time=463600 (delta=6500), delta sum=12200
Loop 73, at 1344008487, CPU user time=425600 (delta=5800), system time=469900 (delta=6300), delta sum=12100
Loop 74, at 1344008488, CPU user time=431100 (delta=5500), system time=476500 (delta=6600), delta sum=12100
Loop 75, at 1344008489, CPU user time=437100 (delta=6000), system time=482700 (delta=6200), delta sum=12200
Loop 76, at 1344008491, CPU user time=442800 (delta=5700), system time=489100 (delta=6400), delta sum=12100
Loop 77, at 1344008492, CPU user time=448600 (delta=5800), system time=495500 (delta=6400), delta sum=12200
Loop 78, at 1344008493, CPU user time=448600 (delta=0), system time=507600 (delta=12100), delta sum=12100
Loop 79, at 1344008494, CPU user time=448600 (delta=0), system time=519800 (delta=12200), delta sum=12200
Loop 80, at 1344008496, CPU user time=448600 (delta=0), system time=531900 (delta=12100), delta sum=12100

So to be more precised, it seems that after a certain period (or may be when CPU user time reaches a certain value), the CPU user time is counted as system time instead of user time.
Comment 2 Alan 2012-08-14 13:21:48 UTC
Not reproducable on modern kernels with the code example - corrected for the warnings (gcc -Wall)

Note You need to log in before you can comment on or make changes to this bug.