I have a Sony Vaio VGN-Z540 laptop. This machine was able to suspend/resume without problems in 2.6.31-rc2, but in 2.6.31-rc3 it started freezing when I try to suspend. The system freezes with a blank screen and the keyboard LEDs start blinking; I have to use the power button to shut it down. The problem is not limited to suspend, though: the same kind of freeze occurs when I try to switch from X to a console (Ctrl-Alt-F1). The same problem exists in 2.6.31-rc4 (head 4be3bd7849165e7efa6b0b35a23d6a3598d97465). When I boot without gdm and trigger suspend-to-disk with "echo disk > /sys/power/state", it works fine.
I was able to bisect the problem to the commit below:
Merge: eee33ab... 0611254...
Author: Linus Torvalds <firstname.lastname@example.org>
Date: Fri Jul 10 19:14:48 2009 -0700
Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6
* 'linux-next' of git://git.infradead.org/ubifs-2.6:
UBIFS: fix corruption dump
UBIFS: clean up free space checking
UBIFS: small amendments in the LEB scanning code
UBIFS: dump a little more in case of corruptions
MAINTAINERS: update ahunter's e-mail address
UBIFS: allow more than one volume to be mounted
UBIFS: fix assertion warning
UBIFS: minor spelling and grammar fixes
UBIFS: fix 64-bit divisions in debug print
UBIFS: few spelling fixes
UBIFS: set write-buffer timout to 3-5 seconds
UBIFS: slightly optimize write-buffer timer usage
UBIFS: improve debugging messaged
UBIFS: fix integer overflow warning
I'm not sure how accurate this is, though: as a sanity check I tested a tree with this commit as HEAD and then reset the same tree to HEAD^, and the problem was present in both.
The commit result seems unlikely to be correct. Are you actually using UBIFS?
I thought the same thing, which is why I did that sanity check (and it failed). My kernel is not compiled with CONFIG_UBIFS_FS.
I am starting a fresh bisect now and will report back with the results when it is complete.
I think I know why my previous bisect result was wrong. During the bisect I had to test a revision that did not boot on my system; I guessed it was "bad" and proceeded. At the next step the kernel version changed to 2.6.31-rc1, and since I knew that 2.6.31-rc2 worked, I changed my guess to "good" to get bisect to go back to 2.6.31-rc2. I now see that this was wrong: the new bisect also tests revisions that report 2.6.31-rc1, even though 2.6.31-rc2 works fine. It must have something to do with how the trees are merged (a branch forked from -rc1 still reports -rc1 when bisect visits it, even if it was merged after -rc2).
Anyway, I reran the bisect with a different good commit and was able to get a reliable first bad commit. This commit makes more sense, as I am using i915:
Author: email@example.com <firstname.lastname@example.org>
Date: Thu Jun 25 10:59:22 2009 +0800
drm/i915: Set SSC frequency for 8xx chips correctly
All 8xx class chips have the 66/48 split, not just 855.
Signed-off-by: Ma Ling <email@example.com>
Reviewed-by: Jesse Barnes <firstname.lastname@example.org>
Signed-off-by: Eric Anholt <email@example.com>
00:02.0 0300: 8086:2a42 (rev 07)
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Interrupt: pin A routed to IRQ 31
Region 0: Memory at e8400000 (64-bit, non-prefetchable) [size=4M]
Region 2: Memory at d0000000 (64-bit, prefetchable) [size=256M]
Region 4: I/O ports at 8130 [size=8]
Capabilities: <access denied>
OK, thanks, I'll reassign it to DRI.
This is a post-2.6.31-rc2 regression.
I would like to run 2.6.31-rc4 and still be able to use X, so I reverted this commit from 2.6.31-rc4, but the freeze still occurs. The above commit was identified clearly by a git bisect that ran without problems, yet reverting it does not fix the issue. This is weird. Is there a way to obtain logs that would help debug this?
This problem has me very confused. I have tried bisecting it three times
now and every time I end up with a patch from a merge of the
'drm-intel-next' branch, but I do not always get the same commit from
that branch as the "first bad commit". I went ahead and "rolled my own"
bisect by using rc4 and reverting all the patches from that branch
merge. As expected, that gave me a working setup again. I then did a
manual bisect and found that "drm/i915: enable error detection & state
collection" was the bad commit. I wanted to confirm this with a sanity
check, but could not revert it on its own; I had to revert the following
commits to get a working setup based on the current linux-2.6:
drm/i915: Don't update display FIFO watermark on IGDNG
drm/i915: add FIFO watermark support
drm/i915: enable error detection & state collection
Same problem in 2.6.31-rc6. Unfortunately, the patches I previously reverted to get a working system no longer revert cleanly.
On Thursday 20 August 2009, reinette chatre wrote:
> On Wed, 2009-08-19 at 13:26 -0700, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> > The following bug entry is on the current list of known regressions
> > from 2.6.30. Please verify if it still should be listed and let me know
> > (either way).
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13819
> > Subject : system freeze when switching to console
> > Submitter : Reinette Chatre <firstname.lastname@example.org>
> > Date : 2009-07-23 17:57 (28 days old)
> This issue is still present in 2.6.31-rc6. Unfortunately the patches I
> reverted to get a working system no longer revert cleanly.
This is fixed by:
Author: Linus Torvalds <email@example.com>
Date: Tue Sep 8 17:09:24 2009 -0700
i915: disable interrupts before tearing down GEM state
Reinette Chatre reports a frozen system (with blinking keyboard LEDs)
when switching from graphics mode to the text console, or when
suspending (which does the same thing). With netconsole, the oops
turned out to be
BUG: unable to handle kernel NULL pointer dereference at 0000000000000084
IP: [<ffffffffa03ecaab>] i915_driver_irq_handler+0x26b/0xd20 [i915]
and it's due to the i915_gem.c code doing drm_irq_uninstall() after
having done i915_gem_idle(). And the i915_gem_idle() path will do
dev_priv->hw_status_page = NULL;
but if an i915 interrupt comes in after this stage, it may want to
access that hw_status_page, and gets the above NULL pointer dereference.
And since the NULL pointer dereference happens from within an interrupt,
and with the screen still in graphics mode, the common end result is
simply a silently hung machine.
Fix it by simply uninstalling the irq handler before idling rather than after.
Reported-and-tested-by: Reinette Chatre <firstname.lastname@example.org>
Acked-by: Jesse Barnes <email@example.com>
Signed-off-by: Linus Torvalds <firstname.lastname@example.org>
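The ordering problem described in the fix can be modelled in a few lines of userspace C. This is a hypothetical sketch, not i915 code: `struct model_dev`, `model_irq_handler()`, `model_gem_idle()`, `model_irq_uninstall()` and the teardown helpers are invented stand-ins for `dev_priv->hw_status_page`, the i915 interrupt handler, `i915_gem_idle()` and `drm_irq_uninstall()`. The "late" interrupt is raised deterministically inside the window instead of arriving asynchronously, and the NULL access is counted rather than actually dereferenced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified model of the driver state (not real i915 structures). */
struct model_dev {
    volatile uint32_t *hw_status_page; /* stands in for dev_priv->hw_status_page */
    bool irq_installed;                /* stands in for an installed IRQ handler */
    int null_derefs;                   /* counts accesses that would have oopsed */
};

static uint32_t status_page_storage[1024];

/* Models the interrupt handler: it assumes the status page is still mapped. */
static void model_irq_handler(struct model_dev *dev)
{
    if (dev->hw_status_page == NULL) {
        /* In the kernel this is the NULL pointer dereference from the oops. */
        dev->null_derefs++;
        return;
    }
    (void)dev->hw_status_page[0];
}

/* Models the hardware raising an interrupt; it reaches the handler only while installed. */
static void model_raise_irq(struct model_dev *dev)
{
    if (dev->irq_installed)
        model_irq_handler(dev);
}

/* Models i915_gem_idle() clearing the status page pointer. */
static void model_gem_idle(struct model_dev *dev) { dev->hw_status_page = NULL; }

/* Models drm_irq_uninstall(): after this, no handler runs. */
static void model_irq_uninstall(struct model_dev *dev) { dev->irq_installed = false; }

static void model_init(struct model_dev *dev)
{
    assert(dev != NULL);
    dev->hw_status_page = status_page_storage;
    dev->irq_installed = true;
    dev->null_derefs = 0;
}

/* Order before the fix: idle first, then uninstall; an IRQ in between hits NULL. */
static void teardown_before_fix(struct model_dev *dev)
{
    model_gem_idle(dev);
    model_raise_irq(dev);      /* late interrupt lands in the window */
    model_irq_uninstall(dev);
}

/* Order after the fix: uninstall the handler first, then idle. */
static void teardown_after_fix(struct model_dev *dev)
{
    model_irq_uninstall(dev);
    model_raise_irq(dev);      /* same late interrupt, but no handler is installed */
    model_gem_idle(dev);
}
```

Calling `teardown_before_fix()` on a `struct model_dev` prepared with `model_init()` leaves `null_derefs` at 1, while `teardown_after_fix()` leaves it at 0. In the real driver there is no such counter: the handler simply oopses in interrupt context with the screen still in graphics mode, which is why the machine appeared to hang silently.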