Bug 13750

Summary: Load average flatlines after returning from hibernate
Product: Power Management
Reporter: Duncan (1i5t5.duncan)
Component: Hibernation/Suspend
Assignee: power-management_other
Status: CLOSED CODE_FIX
Severity: normal
CC: rjw
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.30-5659-g300df7d
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 13615
Attachments: kernel config (2.6.31-rc2 version)

Description Duncan 2009-07-09 15:14:27 UTC
I haven't bisected this yet.  I did a bug search here, but I don't know whether anyone has already posted it on LKML.  If it's a new bug, tell me and I'll bisect.

I first tried the 2.6.31 tree pre-rc1, with 2.6.30-5659-g300df7d.  From that point on, thru 2.6.31-rc2-214-g34f2547 at least, everything works well until first hibernate.  Returning from hibernate, load average drops toward zero as the pre-hibernate average scrolls out of the window, until it's zero.  There it stays.
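
The decay described above can be illustrated with a small simulation.  This is only a sketch, assuming the fixed-point constants and the 5-second update interval from the kernel's include/linux/sched.h; the simulate() helper and its parameters are made up for illustration.  It shows that if the count of active tasks fed into the average is stuck at zero, the 1-minute figure drains to exactly zero and stays there, which matches the symptom.

```python
# Fixed-point constants as used by the kernel's load average code.
FSHIFT = 11                # bits of fractional precision
FIXED_1 = 1 << FSHIFT      # 1.0 in fixed point (2048)
EXP_1 = 1884               # 1/exp(5sec/1min) in fixed point


def calc_load(load, exp, active):
    """One update step; load and active are fixed-point values."""
    load = load * exp + active * (FIXED_1 - exp)
    return load >> FSHIFT


def simulate(start_load, active_tasks, seconds, exp=EXP_1):
    """Hypothetical helper: run the update every 5 seconds for `seconds`.

    active_tasks is held constant, which is a simplification.
    """
    load = int(start_load * FIXED_1)
    active = active_tasks * FIXED_1
    for _ in range(seconds // 5):
        load = calc_load(load, exp, active)
    return load / FIXED_1


if __name__ == "__main__":
    # Steady state: four active tasks keep the average where it is.
    print(simulate(4.0, 4, 600))
    # Active count stuck at zero: the old average scrolls out of the window.
    for secs in (60, 300, 1200):
        print(secs, simulate(4.0, 0, secs))
```

With the active count at zero, every step multiplies the old value by roughly 0.92 and rounds down, so the average cannot recover no matter how busy the machine really is.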

I first noticed this while recompiling the kernel after my next git pull a few days later.  I was using makeopts of "-j -l25" or some such, and of course with a zeroed load average and no job limit apart from that[1], the full force of the kernel's massively parallelizable make bore down upon me, sending me way into swap even with 8 gigs RAM.  As I run Gentoo, I had the same issues emerging programs.  Luckily I run 4-spindle striped swap, so it wasn't /too/ bad, tho everything did basically stop responding for a few minutes until the jobs quit spawning as fast as swap could absorb them.
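
To make the interaction between "-j" and "-l" concrete, here is a rough model of the decision a load-limited job runner makes.  It is a sketch, not make's actual implementation: may_start_job() and jobs_started() are hypothetical names, and the load average is held fixed for the whole run.  The one rule taken from make's documentation is that "-l" only blocks a new job when other jobs are already running and the load average is at or above the limit.

```python
def may_start_job(running, load_1min, max_jobs=None, max_load=None):
    """Hypothetical model: may one more job be started right now?

    max_jobs=None models a bare "-j" (no job count limit).
    max_load=None models the absence of "-l".
    """
    if max_jobs is not None and running >= max_jobs:
        return False
    # The load limit only applies once at least one job is running.
    if max_load is not None and running > 0 and load_1min >= max_load:
        return False
    return True


def jobs_started(pending, load_1min, max_jobs=None, max_load=None):
    """Count how many of `pending` jobs start at a fixed load reading."""
    running = 0
    while running < pending and may_start_job(
        running, load_1min, max_jobs, max_load
    ):
        running += 1
    return running


if __name__ == "__main__":
    # "-j -l25" with a load average that always reads zero: no brake at all.
    print(jobs_started(500, 0.0, max_jobs=None, max_load=25))
    # "-j40 -l25" with the same broken reading: capped by the job limit.
    print(jobs_started(500, 0.0, max_jobs=40, max_load=25))
    # "-j -l25" with a working reading above the limit: one job at a time.
    print(jobs_started(500, 30.0, max_jobs=None, max_load=25))
```

In this model a load average pinned at zero makes "-l25" a no-op, so the job count is bounded only by whatever "-j" limit is given.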

Today I finally got around to confirming that it was the hibernate that did it, and that 2.6.30 release was free of the regression, while everything beyond my first 2.6.31 series git pull has it.

Hardware info: Tyan s2885 mobo, x86_64 dual Opteron 290 (so dual-dual-core), 8 gigs RAM, 4 x 300-gig SATA drives, Radeon 9200 AGP graphics, dual LCDs @ 1920x1200 native each, USB connected Logitech wireless mouse/keyboard.

Software info: Gentoo/~amd64, gcc-4.4.0, mainline kernel, SMP, NUMA, normally working hibernate, those 4 SATA drives in md/mdp/RAID (personalities 0,1,6, main system as RAID-6), LVM2, no initramfs/initrd, rootfs is kernel command-line-assembled mdp/RAID, reiserfs, radeonfb @ native but WITHOUT the new KMS enabled.

Once I get a reply confirming it's a new bug, I'll attach full .config and can attach boot log if necessary.  I'll also do a git bisect to nail it down within that first 5659 commits, but if it's already done, as it was the last time I filed a bug, I can skip all those cycles of recompile, reboot, hibernate, and resume to test. =:^)
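
For a sense of the cost of that bisect, a back-of-the-envelope estimate follows.  It assumes a plain binary search over 5659 commits; bisect_steps() is a made-up helper, and because real git history is not a straight line, the number of steps git actually asks for can differ somewhat.

```python
import math


def bisect_steps(commits):
    """Approximate build-and-test cycles needed to isolate one commit,
    assuming an ideal binary search over a linear history."""
    return math.ceil(math.log2(commits))


if __name__ == "__main__":
    print(bisect_steps(5659))  # 13, since 2**12 < 5659 <= 2**13
```

So the search should take on the order of a dozen recompile, reboot, hibernate, and resume cycles.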

BTW, is there a good way to get on the regressions tracking list without filing a bug?   I've been on the regression list CC after reporting a bug, but would love to get it routinely, either by subscribing to a regression list somewhere, or with a link to the web-page version of it.

.....
[1] I wasn't running a -j limit as make's jobserver apparently had a race a version or two ago, and would occasionally error out complaining about job token count mismatches.  Drop the job count limit and depend entirely on the -l load average limit, and those errors go away!  I think it's fixed now, or at least I've not yet encountered it after adding back a -j limit.  Now I'm running -j40 -l25 for the kernel, which at least caps the job count at something sane when the load average is unusable.  Running multi-hundred parallel jobs and going gigs into swap was fun the first few times I tried it years ago, but it has long since lost its novelty.
Comment 1 Duncan 2009-07-12 14:27:50 UTC
Created attachment 22324 [details]
kernel config (2.6.31-rc2 version)

OK, did a bisect.  Here's the git whatchanged -1 output for the result.  The commit does what it says on the label: it affects the load average. =:^)

I'm attaching my config (2.6.31-rc2 version) as well.  If you need anything else or want me to run a debug patch or something, let me know.

commit dce48a84adf1806676319f6f480e30a6daa012f9
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Sat Apr 11 10:43:41 2009 +0200

    sched, timers: move calc_load() to scheduler

    Dimitri Sivanich noticed that xtime_lock is held write locked across
    calc_load() which iterates over all online CPUs. That can cause long
    latencies for xtime_lock readers on large SMP systems.

    The load average calculation is an rough estimate anyway so there is
    no real need to protect the readers vs. the update. It's not a problem
    when the avenrun array is updated while a reader copies the values.

    Instead of iterating over all online CPUs let the scheduler_tick code
    update the number of active tasks shortly before the avenrun update
    happens. The avenrun update itself is handled by the CPU which calls
    do_timer().

    [ Impact: reduce xtime_lock write locked section ]

    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Acked-by: Peter Zijlstra <peterz@infradead.org>

:100644 100644 b4c38bc... 6eb4892... M  include/linux/sched.h
:100644 100644 8908d19... f4eb881... M  kernel/sched.c
:100644 100644 8a21a2e... 499672c... M  kernel/sched_idletask.c
:100644 100644 687dff4... 52a8bf8... M  kernel/time/timekeeping.c
:100644 100644 cffffad... 6a21d7a... M  kernel/timer.c
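
The design the commit message describes can be sketched roughly as follows.  This is a simplified model under stated assumptions, not the kernel code: the class and method names are illustrative, the field names are only modeled on the kernel's, and locking, atomics, and timing are left out.  Each CPU folds the change in its own active task count into one global counter from its tick, so the CPU that updates avenrun reads a single number instead of iterating over all online CPUs.

```python
class RunQueue:
    """Per-CPU state (illustrative; fields modeled on the kernel's)."""

    def __init__(self):
        self.nr_running = 0
        self.nr_uninterruptible = 0
        self.calc_load_active = 0  # count last folded into the global sum


class LoadAccounting:
    """Hypothetical model of delta-based active task accounting."""

    def __init__(self, ncpus):
        self.rqs = [RunQueue() for _ in range(ncpus)]
        self.calc_load_tasks = 0  # global count of active tasks

    def account_active(self, rq):
        # Fold only the change since this CPU's last update.
        nr_active = rq.nr_running + rq.nr_uninterruptible
        delta = nr_active - rq.calc_load_active
        rq.calc_load_active = nr_active
        self.calc_load_tasks += delta

    def scheduler_tick(self):
        # In the real design each CPU does this from its own tick;
        # here all CPUs are walked in turn purely for demonstration.
        for rq in self.rqs:
            self.account_active(rq)
        return self.calc_load_tasks


if __name__ == "__main__":
    acc = LoadAccounting(ncpus=4)
    acc.rqs[0].nr_running = 3
    acc.rqs[2].nr_uninterruptible = 1
    print(acc.scheduler_tick())  # global count the avenrun update would read
    acc.rqs[0].nr_running = 1
    print(acc.scheduler_tick())
```

One property of delta-based accounting like this is that the global counter is only as accurate as the per-CPU snapshots it was built from.  The sketch does not attempt to reproduce the actual bug.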
Comment 2 Duncan 2009-07-27 09:15:40 UTC
As Peterz indicated in the mail, it's now fixed (late in the rc3 cycle, for rc4).

From the mail:

commit 6301cb95c119ebf324bb96ee226fa9ddffad80a7
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Fri Jul 17 14:15:47 2009 +0200

    sched: fix nr_uninterruptible accounting of frozen tasks really


commit a468d389349a7560249b355cdb6d2097ea1616c9
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Fri Jul 17 14:15:46 2009 +0200

    sched: fix load average accounting vs. cpu hotplug

Incidentally, I guess this should go in as a bugzilla.kernel.org bug: the status description link is apparently still generic Bugzilla, as it lists FIXED, etc, not the kernel-bugs specific CODE_FIX, PATCH_ALREADY_AVAILABLE, etc.  I was unsure of which to use for resolution, and the status link, being generic, isn't any help.  So I picked CODE_FIX.
Comment 3 Duncan 2009-07-27 09:40:11 UTC
Re: Comment #2:

> CODE_FIX, PATCH_ALREADY_AVAILABLE, etc.  I was unsure of which to use for resolution, and the status link, being generic, isn't any help.

Bug filed: Bug 13851
Comment 4 Rafael J. Wysocki 2009-07-27 22:30:53 UTC
On Monday 27 July 2009, Peter Zijlstra wrote:
> On Sun, 2009-07-26 at 22:28 +0200, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> > 
> > The following bug entry is on the current list of known regressions
> > from 2.6.30.  Please verify if it still should be listed and let me know
> > (either way).
> > 
> > 
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13750
> > Subject   : Load average flatlines after returning from hibernate
> > Submitter : Duncan <1i5t5.duncan@cox.net>
> > Date      : 2009-07-09 15:14 (18 days old)
> > 
> > 
> 
> commit 6301cb95c119ebf324bb96ee226fa9ddffad80a7
> Author: Thomas Gleixner <tglx@linutronix.de>
> Date:   Fri Jul 17 14:15:47 2009 +0200
> 
>     sched: fix nr_uninterruptible accounting of frozen tasks really
> 
> 
> commit a468d389349a7560249b355cdb6d2097ea1616c9
> Author: Thomas Gleixner <tglx@linutronix.de>
> Date:   Fri Jul 17 14:15:46 2009 +0200
> 
>     sched: fix load average accounting vs. cpu hotplug