Bug 96871 - /proc/loadavg inconsistent / run queue length nowhere in /proc?
Status: NEW
Alias: None
Product: Other
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 normal
Assignee: other_other
 
Reported: 2015-04-19 03:04 UTC by Gustavo Homem
Modified: 2015-04-19 03:04 UTC

Kernel Version: 2.6.32-358.el6.x86_64
Regression: No


Attachments
1m load average calculated with ps + shell script versus kernel 1m load average (60.04 KB, image/png)
2015-04-19 03:04 UTC, Gustavo Homem

Description Gustavo Homem 2015-04-19 03:04:35 UTC
Created attachment 174411
1m load average calculated with ps + shell script versus kernel 1m load average

From "man proc" we have, regarding /proc/loadavg:

"The fourth field consists of two numbers separated by a slash (/).  The first  of  these  is  the number  of  currently runnable kernel scheduling entities (processes, threads)." 

I would expect the "currently runnable kernel scheduling entities (processes, threads)" mentioned in the man page to be the "active tasks" that make up the instantaneous load, which in turn is used to calculate the three load averages.
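
To make the field layout quoted from the man page concrete, here is a minimal shell sketch that splits one /proc/loadavg line into its parts. The function name `parse_loadavg` is made up for illustration and is not part of any existing tool; the sample line is the first one from the output shown later in this report.

```shell
#!/bin/sh
# parse_loadavg LINE
# Split one /proc/loadavg line and print the two halves of the fourth field.
# (Hypothetical helper name, used only for this illustration.)
parse_loadavg() {
    # Fields: 1m avg, 5m avg, 15m avg, runnable/total, last PID
    set -- $1
    tasks=$4
    echo "running=${tasks%/*} total=${tasks#*/}"
}

parse_loadavg "32.20 15.46 12.17 2/459 10149"
# prints: running=2 total=459
```

On a live system the same function can be fed the real file with `parse_loadavg "$(cat /proc/loadavg)"`.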

However, the following output suggests otherwise:

$ while true;do cat /proc/loadavg ; sleep 0.5 ; done
32.20 15.46 12.17 2/459 10149
32.20 15.46 12.17 1/459 10151
32.20 15.46 12.17 1/459 10153
32.19 15.73 12.27 1/458 10155
32.19 15.73 12.27 1/458 10157
32.19 15.73 12.27 1/458 10159
32.19 15.73 12.27 1/458 10164
32.19 15.73 12.27 1/458 10166
32.19 15.73 12.27 1/458 10168
32.19 15.73 12.27 1/458 10170
32.19 15.73 12.27 1/458 10172
32.19 15.73 12.27 1/458 10174
32.17 16.00 12.38 1/458 10176
32.17 16.00 12.38 1/458 10178
32.17 16.00 12.38 1/458 10180

It seems that /proc/loadavg is not actually reporting the number of active tasks in the fourth field: it stays at 1 or 2 while the 1-minute load average is above 32.

We know that this system has, from time to time, bursts of processes that stay briefly in state D and cause the high load values displayed above. And the ps command seems to get it right:

$ while true;do  ps -eL h -o state | egrep "R|D" | wc -l ; sleep 0.5 ; done
15
19
17
18
17
17
17
17
16
12
13
10
11
11
13
15
25
21
21
21
20
21
19
15
15
15
15
15
15
14
14
10
9
9
9
9

We know that ps is "right" because we plotted the kernel load average against our own calculation, using the instant numbers from ps, and the values match perfectly. So /proc/loadavg must be wrong... or displaying something different.
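
A calculation of this kind can be sketched roughly as follows. This is a minimal illustration, not the exact script used for the attached plot. It assumes the kernel's usual scheme of sampling the active task count every 5 seconds and damping the 1-minute average by a factor of exp(-5/60) per sample. The function names `ema_step`, `count_active`, and `track_load` are hypothetical.

```shell
#!/bin/sh
# One update step of a 1-minute exponentially damped average.
# Assumption: 5-second sampling interval, decay factor exp(-5/60).
# ema_step PREVIOUS_LOAD ACTIVE_TASKS
ema_step() {
    awk -v prev="$1" -v n="$2" 'BEGIN {
        e = exp(-5 / 60)
        printf "%.2f\n", prev * e + n * (1 - e)
    }'
}

# Count threads currently in state R or D, as in the ps loop above.
# Note that the ps process itself is usually counted as one R entry.
count_active() {
    ps -eL h -o state | grep -c '[RD]'
}

# Sample ITERATIONS times, 5 seconds apart, printing the running estimate.
# track_load ITERATIONS
track_load() {
    load=0
    i=0
    while [ "$i" -lt "$1" ]; do
        load=$(ema_step "$load" "$(count_active)")
        echo "$load"
        sleep 5
        i=$((i + 1))
    done
}

# A steady input leaves the estimate unchanged:
ema_step 10 10
# prints: 10.00
```

Running `track_load 12` would print one estimate every 5 seconds for a minute. The estimate starts from 0, so it needs a few minutes of samples before it is comparable with the first field of /proc/loadavg.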

Questions:

1- is there a problem with /proc/loadavg or is it displaying what's intended?
2- if it is working as intended, what is /proc/loadavg displaying in the fourth field?
3- is there a place in /proc where the real instant load, as obtained with ps, can be found? (/proc/stat doesn't work because procs_blocked doesn't include threads)
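
For comparison, the /proc/stat counters mentioned in question 3 can be summed with a short sketch like the one below. The function name `stat_active` is hypothetical, and the sample input is invented for illustration rather than taken from the affected system.

```shell
#!/bin/sh
# stat_active: read /proc/stat-formatted text on stdin and print
# procs_running + procs_blocked. (Hypothetical helper name.)
stat_active() {
    awk '$1 == "procs_running" { r = $2 }
         $1 == "procs_blocked" { b = $2 }
         END { print r + b }'
}

# Invented sample input; on a live system use: stat_active < /proc/stat
printf 'procs_running 3\nprocs_blocked 2\n' | stat_active
# prints: 5
```

Printing this sum next to the ps-based count in the same 0.5 second loop would show how far the two diverge on a given system.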
