Kernel Bug Tracker – Bug 12005
Regression: 2.6.28-rc2 thru -rc4 Hibernation fails -- no write to disk
Last modified: 2008-11-17 05:27:15 UTC
Latest working kernel version: 2.6.27
Earliest failing kernel version: 2.6.28-rc2
But, I've git bisected, see below...
Hardware Environment: Tyan s2885 dual Opteron 290, 8 gigs RAM (AMD 8151 GART IOMMU), 4 SATA drives on SATA_SIL as JBOD, ATI Radeon 9250 AGP based video.
Software Environment: As mentioned, 64-bit Gentoo/~amd64 (~ indicates unstable). GCC 4.3.2. I normally build everything in that I use routinely, so normally don't load any modules tho stuff like vfat and floppy are built as modules in case I need them.
I'm running kernel/mdp RAID-0/1/6 (one of each) on the 4 SATA drives. I run my own hibernate script, with the image saved to the 4 gig swap partition on one of the four SATA drives mentioned above. (Each drive is laid out identically, with a 4-gig swap on each, normally set equal priority for striping.)
Problem Description: From 2.6.28-rc2 (I didn't try -rc1), hibernate fails to write the image to disk, so of course can't resume. 2.6.27 worked fine. I suspected the framebuffer changes, but recompiled with that disabled and still had the issue. Since the git bisect, it seems the problem isn't the framebuffer but the DRM changes, namely:
Author: Jesse Barnes <firstname.lastname@example.org>
Date: Tue Sep 30 12:14:26 2008 -0700
drm: Rework vblank-wait handling to allow interrupt reduction.
Previously, drivers supporting vblank interrupt waits would run the interrupt
all the time, or all the time that any 3d client was running, preventing the
CPU from sleeping for long when the system was otherwise idle. Now, interrupts
are disabled any time that no client is waiting on a vblank event. The new
method uses vblank counters on the chipsets when the interrupts are turned
off, rather than counting interrupts, so that we can continue to present
accurate vblank numbers.
Co-author: Michel Dänzer <email@example.com>
Signed-off-by: Jesse Barnes <firstname.lastname@example.org>
Signed-off-by: Eric Anholt <email@example.com>
Signed-off-by: Dave Airlie <firstname.lastname@example.org>
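For readers unfamiliar with the scheme that commit message describes, here is a minimal sketch of the idea in C. It is an illustration only, not the DRM code: `struct vblank_model`, `hw_frame`, `vblank_get`, `vblank_put` and `vblank_count` are hypothetical names, and the hardware frame counter is simulated by a plain field. The interrupt is enabled only while at least one client is waiting, and frames that pass while it is off are recovered from the hardware counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one display pipe; not the real DRM structures. */
struct vblank_model {
    int refcount;        /* clients currently waiting on a vblank */
    bool irq_enabled;    /* whether the vblank interrupt is switched on */
    uint32_t sw_count;   /* software vblank count as of the last sync */
    uint32_t last_hw;    /* hardware counter value at the last sync */
    uint32_t hw_counter; /* stand-in for the chipset's frame counter */
};

/* Simulate one frame: the hardware counter always ticks, but the
 * software count only advances if the interrupt is enabled. */
static void hw_frame(struct vblank_model *v)
{
    v->hw_counter++;
    if (v->irq_enabled) {
        v->sw_count++;
        v->last_hw = v->hw_counter;
    }
}

/* First waiter: catch up on frames missed while the interrupt was
 * off, then enable it. Unsigned subtraction handles counter wrap. */
static void vblank_get(struct vblank_model *v)
{
    if (v->refcount++ == 0) {
        v->sw_count += v->hw_counter - v->last_hw;
        v->last_hw = v->hw_counter;
        v->irq_enabled = true;
    }
}

/* Last waiter gone: remember the hardware counter and disable the
 * interrupt so an idle CPU is not woken every frame. */
static void vblank_put(struct vblank_model *v)
{
    assert(v->refcount > 0);
    if (--v->refcount == 0) {
        v->last_hw = v->hw_counter;
        v->irq_enabled = false;
    }
}

/* Accurate count either way: interrupt-driven while enabled,
 * derived from the hardware counter while disabled. */
static uint32_t vblank_count(const struct vblank_model *v)
{
    if (v->irq_enabled)
        return v->sw_count;
    return v->sw_count + (v->hw_counter - v->last_hw);
}
```

The sketch ignores locking and the suspend/resume path entirely. It only shows why a count can stay accurate with the interrupt off.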
I'll attach my .config. I'm not sure the boot log will help at this point, so will wait on that until asked. If you want debug info, tell me what to enable/try/attach.
PS: I like the new compiled in kernel command line options. With the RAID options, I was getting quite a string of them, and it's very nice not to have to worry about it on the grub command line, now, tho of course I put them back in grub for the bisect.
PPS: I /was/ using the 7x14 font with radeonfb on 2.6.27 and previous. It doesn't work with 2.6.28 now, and I had to go back to the default 8x16 font. Of course there's warnings that the 7x14 may not work, but it worked before. Is that considered a bug/regression as well, or a deliberate change? In any case, there may be additional reports as .28 heads toward release, so I thought it worth mentioning. If it's considered a bug I can open a separate one for it, if desired.
Created attachment 18785
It's strange that a DRM change would have that effect, but not the strangest!
I'll add Jesse to the Cc.
The second issue is a separate one, so please open another bug entry for it.
Now, I have no idea what the problem may be. If you can carry out a bisection, please do so.
Are you sure your hibernation is completing? We have another, possibly related bug open that was causing the ATI DRM driver to panic at suspend time (depending on your configuration), I wonder if that's what you're hitting?
(In reply to comment #4)
> Are you sure your hibernation is completing? We have another, possibly related
> bug open that was causing the ATI DRM driver to panic at suspend time
> (depending on your configuration), I wonder if that's what you're hitting?
Well, I did say it's failing to write the image to disk, so of course it can't resume. To me, that's the same as saying the hibernate isn't completing, but perhaps you are making a distinction between the quiescing, etc and the image write that I didn't make.
It could indeed be the same bug, yes, at least if it wasn't there with 2.6.27, same as mine. Is his still there as of -rc4, as is mine? How far has the diagnosis gotten and if it was bisected, was his result identical to mine? Do you have either a bug number or a list archive message URL?
I was thinking of 11891. When I said "completing" I really meant whether the system was going down gracefully. If your image isn't getting written I suppose we can assume that's not the case, but there's always the possibility that something is going wrong with the write path and everything else is working fine.
And yeah, the fix hasn't been pushed yet afaik so the failure is still present. And judging by his last add, it may only be a partial fix. I'm trying to get one of the radeon guys to look at it.
Tejun, can you have a look at this entry, please?
(In reply to comment #6)
> I was thinking of 11891.
If kernel bugzi works like some do, it should auto-link bug number references on the web page if they are given like this: bug #11891 . If not and/or for mail, that's http://bugzilla.kernel.org/show_bug.cgi?id=11891 .
That one's bisecting to the same commit, so I'd say it's almost certainly the same. I don't have time to check it further or check the patch ATM, but I will later. I'll mark this one a dup of it if it looks so after I get a chance to take that closer look. A second report of it confirming the fix should be a good thing in any case.
Rafael, looks like drm problem. What am I supposed to look at?
Sorry, I thought it was breaking while writing the image.
*** This bug has been marked as a duplicate of bug 11891 ***
Just confirming, 2.6.28-rc5 does work for me, so presumably it IS a dup and the fix for the other fixed it here as well.