Bug 12005

Summary: Regression: 2.6.28-rc2 thru -rc4 Hibernation fails -- no write to disk
Product: Power Management
Reporter: Duncan (1i5t5.duncan)
Component: Hibernation/Suspend
Assignee: Rafael J. Wysocki (rjw)
Status: CLOSED DUPLICATE
Severity: normal
CC: akpm, jbarnes, tj
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.28-rc2
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 7216, 11808
Attachments: 2.6.28-rc3 .config

Description Duncan 2008-11-10 10:30:08 UTC
Latest working kernel version: 2.6.27
Earliest failing kernel version: 2.6.28-rc2

But, I've git bisected, see below...

Distribution: Gentoo/~amd64

Hardware Environment: Tyan s2885 dual Opteron 290, 8 gigs RAM (AMD 8151 GART IOMMU), 4 SATA drives on SATA_SIL as JBOD, ATI Radeon 9250 AGP based video.

Software Environment: As mentioned, 64-bit Gentoo/~amd64 (~ indicates unstable).  GCC 4.3.2.  I normally build everything in that I use routinely, so normally don't load any modules tho stuff like vfat and floppy are built as modules in case I need them.

I'm running kernel/mdp RAID-0/1/6 (one of each) on the 4 SATA drives.  I run my own hibernate script, with the image saved to the 4 gig swap partition on one of the four SATA drives mentioned above.  (Each drive is laid out identically, with a 4-gig swap on each, normally set equal priority for striping.)

Problem Description: From 2.6.28-rc2 (I didn't try -rc1), hibernate fails to write the image to disk, so of course it can't resume.  2.6.27 worked fine.  I suspected the framebuffer changes, but recompiled with that disabled and still had the issue.  According to the git bisect, the problem seems to be not the framebuffer but the DRM changes, namely:

commit 0a3e67a4caac273a3bfc4ced3da364830b1ab241
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date:   Tue Sep 30 12:14:26 2008 -0700

    drm: Rework vblank-wait handling to allow interrupt reduction.

    Previously, drivers supporting vblank interrupt waits would run the interrupt
    all the time, or all the time that any 3d client was running, preventing the
    CPU from sleeping for long when the system was otherwise idle.  Now, interrupts
    are disabled any time that no client is waiting on a vblank event. The new
    method uses vblank counters on the chipsets when the interrupts are turned
    off, rather than counting interrupts, so that we can continue to present
    accurate vblank numbers.

    Co-author: Michel Dänzer <michel@tungstengraphics.com>
    Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    Signed-off-by: Eric Anholt <eric@anholt.net>
    Signed-off-by: Dave Airlie <airlied@redhat.com>

I'll attach my .config. I'm not sure the boot log will help at this point, so will wait on that until asked.  If you want debug info, tell me what to enable/try/attach.

PS:  I like the new compiled in kernel command line options.  With the RAID options, I was getting quite a string of them, and it's very nice not to have to worry about it on the grub command line, now, tho of course I put them back in grub for the bisect.

PPS:  I /was/ using the 7x14 font with radeonfb on 2.6.27 and previous.  It doesn't work with 2.6.28 now, and I had to go back to the default 8x16 font.  Of course there's warnings that the 7x14 may not work, but it worked before.  Is that considered a bug/regression as well, or a deliberate change?  In any case, there may be additional reports as .28 heads toward release, so I thought it worth mentioning.  If it's considered a bug I can open a separate one for it, if desired.
Comment 1 Duncan 2008-11-10 10:33:47 UTC
Created attachment 18785
2.6.28-rc3 .config
Comment 2 Andrew Morton 2008-11-10 10:37:33 UTC
It's strange that a DRM change would have that effect, but not the strangest!
I'll add Jesse to the Cc.
Comment 3 Rafael J. Wysocki 2008-11-10 10:39:56 UTC
The second issue is a separate one, so please open another bug entry for it.

Now, I have no idea what the problem may be.  If you can carry out a bisection, please do so.
Comment 4 Jesse Barnes 2008-11-10 11:18:16 UTC
Are you sure your hibernation is completing?  We have another, possibly related bug open that was causing the ATI DRM driver to panic at suspend time (depending on your configuration), I wonder if that's what you're hitting?
Comment 5 Duncan 2008-11-10 12:04:45 UTC
(In reply to comment #4)
> Are you sure your hibernation is completing?  We have another, possibly
> related bug open that was causing the ATI DRM driver to panic at suspend time
> (depending on your configuration), I wonder if that's what you're hitting?

Well, I did say it's failing to write the image to disk, so of course it can't resume.  To me, that's the same as saying the hibernate isn't completing, but perhaps you are making a distinction between the quiescing, etc., and the image write that I didn't make.

It could indeed be the same bug, yes, at least if it wasn't there with 2.6.27, same as mine.  Is his still there as of -rc4, as is mine?  How far has the diagnosis gotten and if it was bisected, was his result identical to mine?  Do you have either a bug number or a list archive message URL?
Comment 6 Jesse Barnes 2008-11-10 12:35:32 UTC
I was thinking of 11891.  When I said "completing" I really meant whether the system was going down gracefully.  If your image isn't getting written I suppose we can assume that's not the case, but there's always the possibility that something is going wrong with the write path and everything else is working fine.

And yeah, the fix hasn't been pushed yet afaik so the failure is still present.  And judging by his last add, it may only be a partial fix.  I'm trying to get one of the radeon guys to look at it.
Comment 7 Rafael J. Wysocki 2008-11-10 13:34:50 UTC
Tejun, can you have a look at this entry, please?
Comment 8 Duncan 2008-11-10 15:43:18 UTC
(In reply to comment #6)
> I was thinking of 11891.

If kernel bugzi works like some do, it should auto-link bug number references on the web page if they are given like this: bug #11891 .  If not and/or for mail, that's http://bugzilla.kernel.org/show_bug.cgi?id=11891 .

That one's bisecting to the same commit, so I'd say it's almost certainly the same.  I don't have time to check it further or check the patch ATM, but I will later.  I'll mark this one a dup of it if it looks so after I get a chance to take that closer look.  A second report of it confirming the fix should be a good thing in any case.
Comment 9 Tejun Heo 2008-11-10 17:49:25 UTC
Rafael, looks like drm problem.  What am I supposed to look at?
Comment 10 Rafael J. Wysocki 2008-11-11 05:22:55 UTC
Sorry, I thought it was breaking while writing the image.


*** This bug has been marked as a duplicate of bug 11891 ***
Comment 11 Duncan 2008-11-17 05:27:15 UTC
Just confirming, 2.6.28-rc5 does work for me, so presumably it IS a dup and the fix for the other fixed it here as well.