Bug 45471

Summary: Increasing load average over several kernel revisions
Product: Process Management
Component: Other
Status: RESOLVED INVALID
Severity: normal
Priority: P1
Hardware: All
OS: Linux
Kernel Version:
Subsystem:
Regression: Yes
Bisected commit-id:
Reporter: Konstantin Svist (fry.kun)
Assignee: process_other
CC: alan
Attachments: lspci -nnn -vvv

Description Konstantin Svist 2012-08-03 01:08:12 UTC
I have a bunch of servers with the same hardware running different kernel revisions (I'm trying to upgrade them gradually). What I've noticed is that the latest revisions seem to have a much higher load average (see below).
To be fair, the first machine is Fedora 14, but the last two are running the exact same software (except for the kernel version) -- I don't think there should be this much performance impact.


Linux hst624 2.6.35.14-96.fc14.x86_64 #1 SMP Thu Sep 1 11:59:56 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux


top - 17:56:58 up 32 days, 19:15,  1 user,  load average: 0.98, 0.95, 0.87
Tasks: 833 total,   7 running, 825 sleeping,   0 stopped,   1 zombie
Cpu(s): 32.3%us, 13.4%sy,  0.0%ni, 52.3%id,  0.0%wa,  0.0%hi,  2.0%si,  0.0%st
Mem:  132292864k total, 100893356k used, 31399508k free,   157460k buffers
Swap:  8191996k total,    11804k used,  8180192k free, 19326420k cached


Linux hst641 3.3.7-1.fc16.x86_64 #1 SMP Tue May 22 13:59:39 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

top - 17:57:38 up 15 min,  2 users,  load average: 5.30, 4.07, 2.76
Tasks: 569 total,   9 running, 560 sleeping,   0 stopped,   0 zombie
Cpu(s): 24.6%us, 10.0%sy,  0.0%ni, 61.8%id,  0.0%wa,  2.0%hi,  1.6%si,  0.0%st
Mem:  132032624k total, 78058084k used, 53974540k free,    69580k buffers
Swap:  8191996k total,        0k used,  8191996k free, 11310772k cached


Linux hst623 3.4.6-1.fc16.x86_64 #1 SMP Fri Jul 20 12:58:04 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

top - 17:58:05 up  2:17,  2 users,  load average: 24.82, 25.66, 26.34
Tasks: 576 total,   6 running, 570 sleeping,   0 stopped,   0 zombie
Cpu(s): 25.6%us, 10.4%sy,  0.0%ni, 60.2%id,  0.0%wa,  2.2%hi,  1.6%si,  0.0%st
Mem:  132032544k total, 91066796k used, 40965748k free,    84332k buffers
Swap:  8191996k total,        0k used,  8191996k free, 11397696k cached
Comment 1 Konstantin Svist 2012-08-03 02:00:17 UTC
Created attachment 76721
lspci -nnn -vvv
Comment 2 Konstantin Svist 2012-08-03 20:41:33 UTC
Version 3.4.2-1 still has a load of ~5-6, so the latest regression is somewhere between that and 3.4.6-1.
Still checking other versions.
Comment 3 Konstantin Svist 2012-08-03 22:43:54 UTC
The jump in load from ~1 to ~6 seems to originate somewhere between 2.6.38.6-26.rc1.fc15 and 2.6.43.8-1.fc15 -- so far I can't find easily installable versions in between those.
Comment 4 Konstantin Svist 2012-08-08 01:42:39 UTC
After testing many different kernel versions, I'm more confused than ever.

The same kernel version on F14 and F16 (2.6.35.14-97.fc14.x86_64) gives me very different results. The main culprit seems to be the nginx process, which (according to sysprof) calls __close_nocancel -> kernel. On F14 it takes up ~5%, but on F16 (same kernel!) it takes up ~57%, most of which is spent in __mutex_lock_common.clone.5 (35%) and do_raw_spin_lock (13%).

Load averages:
f14: 4.70, 5.60, 5.95
f16: 36.99, 36.37, 32.72

Can anyone help?
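
If it helps, here is a minimal reproducer sketch that hammers the open()/close() path from several threads (the thread and iteration counts are arbitrary); timing it on both installs should show whether the close() path itself accounts for the difference:

#include <pthread.h>
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

#define THREADS 8        /* arbitrary; match the machine's core count */
#define ITERS   100000   /* arbitrary */

static void *worker(void *arg)
{
        int i;
        (void)arg;
        for (i = 0; i < ITERS; i++) {
                int fd = open("/dev/null", O_RDONLY);
                if (fd >= 0)
                        close(fd);   /* the path sysprof flags as contended */
        }
        return NULL;
}

int main(void)
{
        pthread_t t[THREADS];
        int i;

        for (i = 0; i < THREADS; i++)
                pthread_create(&t[i], NULL, worker, NULL);
        for (i = 0; i < THREADS; i++)
                pthread_join(t[i], NULL);
        printf("done: %d threads x %d open/close cycles\n", THREADS, ITERS);
        return 0;
}

Build with "gcc -O2 -pthread repro.c" and compare "time ./a.out" on the two machines.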
Comment 5 Alan 2012-08-09 13:33:06 UTC

*** This bug has been marked as a duplicate of bug 45001 ***
Comment 6 Konstantin Svist 2012-08-09 16:39:58 UTC
Excuse me, but I saw that bug and it's not my case at all -- my system is NOT idle and the load actually IS high, as shown by top and sysprof.
Comment 7 Konstantin Svist 2013-01-22 20:02:13 UTC
Looking at vmstat output, it seems in retrospect that old kernels calculated the load average incorrectly (or at least differently): with 40 tasks running, the load average was reported in the single digits.
The latest kernel versions (3.6, at least) seem to average the number of running tasks over the sampling period and report that as the load average.
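
For reference, my understanding is that the kernel computes the load averages as exponentially damped moving averages of the number of runnable (plus uninterruptible) tasks, sampled every 5 seconds. Below is a minimal user-space sketch of that update, using the fixed-point constants from the kernel's CALC_LOAD macro; the constant 40-task input is just illustrative:

#include <stdio.h>

#define FSHIFT   11                /* bits of fixed-point precision */
#define FIXED_1  (1 << FSHIFT)     /* 1.0 in fixed point */
#define EXP_1    1884              /* 1/exp(5s/1min) in fixed point */

/* One 5-second update of a load average, as in the kernel's CALC_LOAD. */
static unsigned long calc_load(unsigned long load, unsigned long exp,
                               unsigned long active)
{
        load *= exp;
        load += active * (FIXED_1 - exp);
        return load >> FSHIFT;
}

int main(void)
{
        unsigned long avenrun = 0;   /* 1-minute average, fixed point */
        int tick;

        /* Feed in 40 runnable tasks for 5 minutes (60 samples of 5s)
         * and watch the 1-minute average converge toward 40. */
        for (tick = 0; tick < 60; tick++)
                avenrun = calc_load(avenrun, EXP_1, 40UL * FIXED_1);

        printf("load average: %lu.%02lu\n", avenrun >> FSHIFT,
               ((avenrun & (FIXED_1 - 1)) * 100) >> FSHIFT);
        return 0;
}

With this scheme, a sustained 40 runnable tasks has to show up as a load average near 40, which matches what the 3.x kernels report -- so the single-digit numbers from the old kernels are the ones that look wrong.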