I have a bunch of servers with the same hardware running different kernel revisions (I'm trying to upgrade them gradually). What I've noticed is that the latest revisions seem to have a much higher load average (see below). To be fair, the first machine is Fedora 14, but the last two are running the exact same software (except for the kernel version) -- I don't think there should be this much performance impact.

Linux hst624 2.6.35.14-96.fc14.x86_64 #1 SMP Thu Sep 1 11:59:56 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

top - 17:56:58 up 32 days, 19:15, 1 user, load average: 0.98, 0.95, 0.87
Tasks: 833 total, 7 running, 825 sleeping, 0 stopped, 1 zombie
Cpu(s): 32.3%us, 13.4%sy, 0.0%ni, 52.3%id, 0.0%wa, 0.0%hi, 2.0%si, 0.0%st
Mem: 132292864k total, 100893356k used, 31399508k free, 157460k buffers
Swap: 8191996k total, 11804k used, 8180192k free, 19326420k cached

Linux hst641 3.3.7-1.fc16.x86_64 #1 SMP Tue May 22 13:59:39 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

top - 17:57:38 up 15 min, 2 users, load average: 5.30, 4.07, 2.76
Tasks: 569 total, 9 running, 560 sleeping, 0 stopped, 0 zombie
Cpu(s): 24.6%us, 10.0%sy, 0.0%ni, 61.8%id, 0.0%wa, 2.0%hi, 1.6%si, 0.0%st
Mem: 132032624k total, 78058084k used, 53974540k free, 69580k buffers
Swap: 8191996k total, 0k used, 8191996k free, 11310772k cached

Linux hst623 3.4.6-1.fc16.x86_64 #1 SMP Fri Jul 20 12:58:04 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

top - 17:58:05 up 2:17, 2 users, load average: 24.82, 25.66, 26.34
Tasks: 576 total, 6 running, 570 sleeping, 0 stopped, 0 zombie
Cpu(s): 25.6%us, 10.4%sy, 0.0%ni, 60.2%id, 0.0%wa, 2.2%hi, 1.6%si, 0.0%st
Mem: 132032544k total, 91066796k used, 40965748k free, 84332k buffers
Swap: 8191996k total, 0k used, 8191996k free, 11397696k cached
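For reference (this is not from the original report, just a quick sanity check), the reported load average can be compared against the actual number of runnable and uninterruptible tasks at any given moment:

  # The three averages, plus running/total tasks and the last PID used:
  cat /proc/loadavg
  # Count tasks currently in R (running) or D (uninterruptible sleep) state,
  # which are what the Linux load average is meant to track:
  ps -eo state= | grep -c '[RD]'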
Created attachment 76721 [details] lspci -nnn -vvv
Version 3.4.2-1 still has load ~5-6, so the latest regression is somewhere between that and 3.4.6-1. Still checking other versions.
The jump in load from ~1 to ~6 seems to originate somewhere between 2.6.38.6-26.rc1.fc15 and 2.6.43.8-1.fc15 -- I can't find easily installable versions in between those so far.
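Since prebuilt Fedora packages for the intermediate versions are hard to come by, one option (a sketch, not something from the original thread) is to bisect the upstream stable tree directly. The tags below use the 3.4.2 -> 3.4.6 upstream range, which roughly corresponds to the Fedora versions mentioned above:

  git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
  cd linux-stable
  git bisect start
  git bisect bad v3.4.6     # load ~25 here
  git bisect good v3.4.2    # load ~5-6 here
  # For each step: build, install, boot, observe the load, then run
  #   git bisect good   or   git bisect bad
  # until bisect points at the offending commit.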
After testing many different kernel versions, I'm more confused than ever. The same kernel version on F14 and F16 (2.6.35.14-97.fc14.x86_64) gives me very different results. The main culprit seems to be the nginx process, which (according to sysprof) calls __close_nocancel -> kernel. On F14 it takes up ~5%, but on F16 (same kernel!) it takes up ~57%, most of which is spent in __mutex_lock_common.clone.5 (35%) and do_raw_spin_lock (13%).

Load averages:
F14: 4.70, 5.60, 5.95
F16: 36.99, 36.37, 32.72

Can anyone help?
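One way to cross-check the sysprof numbers (not part of the original comment, and assuming perf is installed for the running kernel) is a short system-wide profile with call graphs:

  # Profile the whole system for ~30 seconds with call-graph recording:
  perf record -a -g -- sleep 30
  # Then see where kernel time goes, grouped by process and symbol:
  perf report --sort comm,symbol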
*** This bug has been marked as a duplicate of bug 45001 ***
Excuse me, I saw that bug and it's not my case at all -- my system is NOT idle and the load actually IS high, as shown by top and sysprof.
Looking at vmstat output, it seems in retrospect that the old kernels calculated the load average incorrectly (or at least differently). With ~40 tasks running, the load average was reported in the single digits. The latest kernel versions (3.6, at least) seem to average the number of running tasks over the sampling period and report that as the load average.
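A quick way to see the difference directly (again, just a suggestion, not from the original comment) is to sample the runnable-task count alongside the reported load average on both kernels and compare:

  # vmstat's first column ("r") is the number of runnable tasks per sample:
  vmstat 5
  # In another terminal, watch what the kernel reports as load average:
  watch -n 5 cat /proc/loadavg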