Bug 11947 - 2.6.28-rc VC switching with Intel graphics broken
Status: CLOSED CODE_FIX
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: All Linux
Importance: P1 high
Assigned To: Bernhard Schmidt
Duplicates: 11984
Depends on:
Blocks: 11808
Reported: 2008-11-03 12:10 UTC by Romano Giannetti
Modified: 2008-12-08 13:12 UTC

See Also:
Kernel Version: 2.6.28-rc3
Tree: Mainline
Regression: Yes


Attachments
syslog after the crash (620.59 KB, text/plain)
2008-11-03 14:17 UTC, Romano Giannetti
boot dmesg with 2.6.27.4 kernel (56.24 KB, text/plain)
2008-11-03 14:18 UTC, Romano Giannetti
boot dmesg with 2.6.28-rc3 kernel (90.98 KB, text/plain)
2008-11-03 14:19 UTC, Romano Giannetti
config for 2.6.28-rc3 (72.19 KB, text/plain)
2008-11-03 14:20 UTC, Romano Giannetti
lsmd with 2.6.27.4 (3.91 KB, text/plain)
2008-11-03 14:21 UTC, Romano Giannetti
dmesg at boot with 2.6.28-rc3-drm-intel (91.27 KB, text/plain)
2008-11-04 01:53 UTC, Romano Giannetti
syslog during the session with 2.6.28-rc3-drm-intel (175.09 KB, text/plain)
2008-11-04 01:54 UTC, Romano Giannetti
real patch applied on top v2.6.28-rc6-7-ged31348 fixing VC switch (2.95 KB, patch)
2008-11-25 01:45 UTC, Romano Giannetti
2.6.28-rc7 freeze2hang patch (492 bytes, patch)
2008-12-03 16:02 UTC, Guillaume Ayoub

Description Romano Giannetti 2008-11-03 12:10:18 UTC
Latest working kernel version: 2.6.27.4
Earliest failing kernel version: 2.6.28-rc3
Distribution: Ubuntu 8.10 Intrepid Ibex
Hardware Environment:  Intel 945GM Chipset
Software Environment:  xserver-xorg-video-intel  2:2.4.1-1ubuntu10

Problem Description: X does not come back after a VC switch; there is just a black screen with the cursor on it. The laptop itself seems to keep working; only the screen (and keyboard) is dead.

Steps to reproduce: Switch to a VC (ctrl-alt-F1). Switch back (alt-f7, f8 or whatever). You have just a black screen. No switch back to another VC. No ctrl-alt-backspace. No ctrl-alt-del. Need to reboot with SysRq-B.
Comment 1 Jesse Barnes 2008-11-03 13:40:40 UTC
Can you capture the dmesg from after the failure, maybe by ssh'ing into the box before you crash it?
Comment 2 Jesse Barnes 2008-11-03 14:14:46 UTC
Also, this is filed against console/framebuffers.  Are you running intelfb or vesafb drivers?  Did you notice them as the problem somehow?
Comment 3 Romano Giannetti 2008-11-03 14:15:59 UTC
I have collected dmesg (at boot) for the two kernels, see dmesg-*.txt file
attached. I have collected a syslog file too, during the lock, by doing a
sysrq-t sysrq-s before rebooting, see syslog-*.txt. It seems that the problem
starts around line 1761:

[drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 0

but that's only a wild guess. I think I have neither vesafb nor intelfb loaded; I'll attach lsmod output too.
Comment 4 Romano Giannetti 2008-11-03 14:17:26 UTC
Created attachment 18639 [details]
syslog after the crash
Comment 5 Romano Giannetti 2008-11-03 14:18:21 UTC
Created attachment 18640 [details]
boot dmesg with 2.6.27.4 kernel
Comment 6 Romano Giannetti 2008-11-03 14:19:10 UTC
Created attachment 18641 [details]
boot dmesg with 2.6.28-rc3 kernel
Comment 7 Romano Giannetti 2008-11-03 14:20:28 UTC
Created attachment 18642 [details]
config for 2.6.28-rc3
Comment 8 Romano Giannetti 2008-11-03 14:21:06 UTC
Created attachment 18643 [details]
lsmd with 2.6.27.4
Comment 9 Romano Giannetti 2008-11-03 14:22:49 UTC
Sorry, I do not have an lsmod with 2.6.28-rc3 now, so I attached a config (it was made by make oldconfig from the 2.6.27.4 one, modulo some lock debug).

Have to go now (late in the night here), tomorrow I will try to help some more. 
Comment 10 Romano Giannetti 2008-11-03 14:58:34 UTC
Hmmm. It seems like bug#10892 could be related... I have been booting with nosplash since then, but in this case it didn't help (and I think it is a slightly different issue: it now happens only after a VC switch. But the symptoms are very similar).

Another very similar issue was with bug#10620. 

Comment 11 Jesse Barnes 2008-11-03 15:04:28 UTC
Ok, so it looks like Xorg is waiting for a vblank event:

Nov  3 19:39:23 rukbat kernel: [  345.123844] Xorg          S f65fdde8     0  5211   5205
Nov  3 19:39:23 rukbat kernel: [  345.123844]  f65fde00 00003046 00000002 f65fdde8 f65fddf0 00000000 f65fdda4 00003046
Nov  3 19:39:23 rukbat kernel: [  345.123844]  00000000 c04223c0 c04bd500 f65fddf0 f65fddec f65fdde8 c04fbc00 00003286
Nov  3 19:39:23 rukbat kernel: [  345.123844]  f6609120 f6609274 00000000 00002bdc 00000000 f65fddc4 c015152e c04fbc00
Nov  3 19:39:23 rukbat kernel: [  345.123844] Call Trace:
Nov  3 19:39:23 rukbat kernel: [  345.123844]  [<c015152e>] ? trace_hardirqs_on_caller+0x10e/0x160
Nov  3 19:39:23 rukbat kernel: [  345.123844]  [<c033b029>] ? _spin_unlock_irqrestore+0x39/0x70
Nov  3 19:39:23 rukbat kernel: [  345.123844]  [<c0136c12>] ? __mod_timer+0xc2/0x110
Nov  3 19:39:23 rukbat kernel: [  345.123844]  [<c0338d6f>] schedule_timeout+0x7f/0xe0
Nov  3 19:39:23 rukbat kernel: [  345.123844]  [<c033b045>] ? _spin_unlock_irqrestore+0x55/0x70
Nov  3 19:39:23 rukbat kernel: [  345.123844]  [<c0136580>] ? process_timeout+0x0/0x10
Nov  3 19:39:23 rukbat kernel: [  345.123844]  [<c0338d6a>] ? schedule_timeout+0x7a/0xe0
Nov  3 19:39:23 rukbat kernel: [  345.123844]  [<f84688b6>] drm_wait_vblank+0x186/0x410 [drm]
Nov  3 19:39:23 rukbat kernel: [  345.123844]  [<c010b6fe>] ? restore_i387_fxsave+0x6e/0x80
Nov  3 19:39:23 rukbat kernel: [  345.123844]  [<c015152e>] ? trace_hardirqs_on_caller+0x10e/0x160
Nov  3 19:39:23 rukbat kernel: [  345.123844]  [<c015158b>] ? trace_hardirqs_on+0xb/0x10
Nov  3 19:39:23 rukbat kernel: [  345.123844]  [<c01245e0>] ? default_wake_function+0x0/0x10
Nov  3 19:39:23 rukbat kernel: [  345.123844]  [<f84662b8>] drm_ioctl+0xe8/0x2f0 [drm]
Nov  3 19:39:23 rukbat kernel: [  345.123844]  [<c010643a>] ? show_interrupts+0x1ea/0x4e0
Nov  3 19:39:23 rukbat kernel: [  345.123844]  [<f8468730>] ? drm_wait_vblank+0x0/0x410 [drm]
Nov  3 19:39:23 rukbat kernel: [  345.123844]  [<c01a4961>] vfs_ioctl+0x71/0x80
Nov  3 19:39:23 rukbat kernel: [  345.123844]  [<c010643a>] ? show_interrupts+0x1ea/0x4e0
Nov  3 19:39:23 rukbat kernel: [  345.123844]  [<c01a49ce>] do_vfs_ioctl+0x5e/0x490
Nov  3 19:39:23 rukbat kernel: [  345.123844]  [<c01039f0>] ? restore_sigcontext+0x100/0x150
Nov  3 19:39:23 rukbat kernel: [  345.123844]  [<c0103bee>] ? sys_sigreturn+0xce/0x170
Nov  3 19:39:23 rukbat kernel: [  345.123844]  [<c0137ced>] ? sys_rt_sigaction+0x6d/0xa0
Nov  3 19:39:23 rukbat kernel: [  345.123844]  [<c01a4e39>] sys_ioctl+0x39/0x70
Nov  3 19:39:23 rukbat kernel: [  345.123844]  [<c0103f15>] sysenter_do_call+0x12/0x35
Nov  3 19:39:23 rukbat kernel: [  345.123844]  [<c010643a>] ? show_interrupts+0x1ea/0x4e0

You might try building a kernel from the Intel DRM git tree to see if the problem still happens there:
git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel.git branch drm-intel-next

I think 10892 is probably a separate bug related to mode programming in general.
Comment 12 Romano Giannetti 2008-11-03 15:10:22 UTC
Hmmm... I will need a bit more git guidance. Now I have cloned Linus' tree and i git pull it. Should I

git clone <your tree>
git checkout drm-intel-next

and compile it (I mean, it's a complete kernel) or should I make some git remote magic? 
Comment 13 Romano Giannetti 2008-11-03 15:10:52 UTC
Well, tomorrow now. I have really to go...
Comment 14 Romano Giannetti 2008-11-03 15:13:43 UTC
But hey, reading it up, bug#10620 really seems the same problem.
Comment 15 Jesse Barnes 2008-11-03 16:44:28 UTC
Right:
  $ git clone <url>
  $ cd drm-intel
  $ git checkout -b drm-intel-next --track origin/drm-intel-next
then build it with your favorite .config.

As for 10620 there could be a relationship, but please try the drm-intel bits; this area has seen many changes & fixes.
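
For anyone repeating this test, the steps above can be sketched as a small script. This is only a sketch under assumptions: the `OLD_CONFIG` path and the `run`/`build_drm_intel` helper names are hypothetical, and the plain `make` invocation stands in for whatever build procedure you normally use. With `DRY_RUN=1` the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch only: fetch the drm-intel-next branch and build it with an existing .config.
# With DRY_RUN=1 the commands are printed instead of executed.

REPO_URL="git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel.git"
BRANCH="drm-intel-next"
# Assumption: a known-good config is available at this (hypothetical) path.
OLD_CONFIG="${OLD_CONFIG:-$HOME/config-2.6.27.4}"

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Clone the tree, create a local tracking branch, and build with the old config.
build_drm_intel() {
    run git clone "$REPO_URL" drm-intel || return 1
    run cd drm-intel || return 1
    run git checkout -b "$BRANCH" --track "origin/$BRANCH" || return 1
    run cp "$OLD_CONFIG" .config || return 1
    run make oldconfig || return 1
    run make || return 1
}
```

Running `DRY_RUN=1 build_drm_intel` first shows the exact command sequence before anything is cloned or built.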
Comment 16 Romano Giannetti 2008-11-04 01:52:38 UTC
Done. No joy.

The behavior is very similar. The only differences are: 

* after a long while from hitting ctrl-alt-f1, this time I can switch back to the 
  VC and shut down from there
* sysrq-t produces nothing (? I used the same config as before...)

This time the screen is not black every time; at one point I had a couple of windows drawn, but X was still inoperative.

I'll attach dmesg and syslog of this test. 
Comment 17 Romano Giannetti 2008-11-04 01:53:30 UTC
Created attachment 18650 [details]
dmesg at boot with 2.6.28-rc3-drm-intel
Comment 18 Romano Giannetti 2008-11-04 01:54:27 UTC
Created attachment 18651 [details]
syslog during the session with 2.6.28-rc3-drm-intel

This one shows the same VBLANK error message...
Comment 19 Rafael J. Wysocki 2008-11-09 10:49:39 UTC
Handled-By : Jesse Barnes <jbarnes@virtuousgeek.org>
Comment 20 Bernhard Schmidt 2008-11-09 12:44:18 UTC
I have pretty much the same issue with a GM965 chipset on Ubuntu Intrepid 8.10 with more recent Xorg intel drivers (2.5.0~git20081023).

My report is tracked as bug#11984
Comment 21 Romano Giannetti 2008-11-10 00:51:41 UTC
I tested -rc3 and the -drm-intel tip. Should I test -rc4, or is it better to wait for a patch to test? I have both trees now, so I can test patches against -linus or -drm-intel.
Comment 22 Rafael J. Wysocki 2008-11-11 05:53:46 UTC
On Tuesday, 11 of November 2008, Romano Giannetti wrote:
> Rafael J. Wysocki wrote:
> > 
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11947
> > Subject		: 2.6.28-rc VC switching with Intel graphics broken
> > Submitter	: Romano Giannetti <romano.giannetti@gmail.com>
> > Date		: 2008-11-03 12:10 (7 days old)
> > Handled-By	: Jesse Barnes <jbarnes@virtuousgeek.org>
> 
> Still here in 2.6.28-rc4. Complete lock switching back from a VC to X.

Comment 23 devsk 2008-11-15 16:47:24 UTC
Is this thing fixed in RC5?
Comment 24 Johan Bilien 2008-11-16 05:56:27 UTC
(In reply to comment #23)
> Is this thing fixed in RC5?
> 

Yes, this is still happening with RC5, on an Apple MacBook with Intel GM945 graphics.

When switching back to X from a VT, only the cursor on a black screen is shown.
Comment 25 Rafael J. Wysocki 2008-11-16 09:33:36 UTC
*** Bug 11984 has been marked as a duplicate of this bug. ***
Comment 26 Rafael J. Wysocki 2008-11-16 09:34:25 UTC
Notify-Also : Bernhard Schmidt <berni@birkenwald.de>
Comment 27 Rafael J. Wysocki 2008-11-16 09:35:08 UTC
Notify-Also : Johan Bilien <jobi@via.ecp.fr>
Comment 28 Romano Giannetti 2008-11-17 06:21:14 UTC
More data: it locks in 2.6.28-rc5 too, but I have narrowed it down to 3D. If I start X and then disable "Visual effects" (no composite, no compiz), it works ok: I can switch back and forth from the VC. Enabling visual effects again causes the lock.
Comment 29 Romano Giannetti 2008-11-17 06:40:31 UTC
Last-minute news: it's not a lock, it's simply a 2-minute delay. I discovered this because I had to go do something else, and X came back automagically. So: there is a 2-minute delay when switching back from a VC when 3D is on. I got these messages over 3 tries:

(0)rukbat:~% dmesg -s 20000000 | grep i915
[   41.027994] [drm] Initialized i915 1.6.0 20080730 on minor 0
[  162.718798] [drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 0
[  281.616778] [drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 0
[  425.802808] [drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 0
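
To make reports like the one above easier to compare, the timestamps of the vblank error can be pulled out of the kernel log with a small filter. This is a minimal sketch, assuming the message format shown in this bug; `vblank_errors` is a hypothetical helper name, not part of any tool.

```shell
# Read kernel log text on stdin and print the timestamp (seconds since boot)
# of every "trying to get vblank count" error from i915_get_vblank_counter.
# Works on raw dmesg output and on syslog lines that carry the same prefix.
vblank_errors() {
    sed -n 's/^.*\[ *\([0-9][0-9]*\.[0-9][0-9]*\)] \[drm:i915_get_vblank_counter] \*ERROR\* trying to get vblank count.*$/\1/p'
}
```

For example, `dmesg -s 20000000 | vblank_errors` lists one timestamp per occurrence, and appending `| wc -l` counts them.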

Comment 30 Romano Giannetti 2008-11-17 06:57:52 UTC
On the other hand, it completely locks on resume from RAM. Sigh.
Comment 31 Andreas Mohr 2008-11-17 07:39:57 UTC
Experienced same VC switch issue once, on A110L on -rc5, also full machine lockup on resume (not sure whether caused by this or something else).
Comment 32 devsk 2008-11-17 18:30:15 UTC
(In reply to comment #29)
> Last hour notice: it's not a lock. It's simply 2 minutes of delay. I discovered
> it because I had to do another thing and X came back automagically. So: it's a
> 2 minutes delay when switching back from a VC when 3D is on. I have these
> messages over 3 tries:
> 
> (0)rukbat:~% dmesg -s 20000000 | grep i915
> [   41.027994] [drm] Initialized i915 1.6.0 20080730 on minor 0
> [  162.718798] [drm:i915_get_vblank_counter] *ERROR* trying to get vblank count
> for disabled pipe 0
> [  281.616778] [drm:i915_get_vblank_counter] *ERROR* trying to get vblank count
> for disabled pipe 0
> [  425.802808] [drm:i915_get_vblank_counter] *ERROR* trying to get vblank count
> for disabled pipe 0
> 

I am seeing the same messages in dmesg. My problem seems to be intermittent. I get it while switching VTs or while logging out of KDE 3.5, and even sysrq refuses to work.

I am not using any KDE effects, vesafb or uvesafb. Both vesafb and uvesafb give a repeatable lockup if i915 is loaded and VT switch is done.

Without vesafb/uvesafb, I am not seeing the lockup after resume from RAM. Only intermittent lockup while switching VTs or logging out of KDE.
Comment 33 Romano Giannetti 2008-11-21 00:49:57 UTC
Changed to video/DRI, because I think it's drm related. Compiling -rc6 now, but I do not think a lot changed in this field.
Comment 34 Romano Giannetti 2008-11-21 04:13:35 UTC
Still here with us for -rc6. Black screen on VC switch. 
Comment 35 Rafael J. Wysocki 2008-11-22 13:29:06 UTC
Notify-Also : Andreas Mohr <andi@lisas.de>
Notify-Also : devsk <kernel-bugs.dev1world@spamgourmet.com>
Comment 36 Andreas Mohr 2008-11-23 02:54:29 UTC
Yup, -rc6 behaviour seems identical to -rc5: Have to wait a random 30 to 45 seconds after switch back to X until blank screen with cursor is gone.
GNOME/Compiz setup with PAT and MTRR cleanup enabled here (MTRR cleanup _does_ help here BTW, only 6 entries allocated instead of overcapacity error!).

dmesg sometimes shows
[   79.971895] [drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 0

Xorg.0.log shows:

(II) AIGLX: Suspending AIGLX clients for VT switch
(II) intel(0): xf86UnbindGARTMemory: unbind key 0
(II) intel(0): xf86UnbindGARTMemory: unbind key 1
(II) intel(0): xf86UnbindGARTMemory: unbind key 2
(II) intel(0): xf86UnbindGARTMemory: unbind key 3
(II) intel(0): xf86UnbindGARTMemory: unbind key 4

^^^ this is logged until the moment we end up in non-X tty...

...then when switching back to X:

(II) Open ACPI successful (/var/run/acpid.socket)
(II) AIGLX: Resuming AIGLX clients after VT switch
(II) intel(0): xf86BindGARTMemory: bind key 0 at 0x00800000 (pgoffset 2048)
(II) intel(0): xf86BindGARTMemory: bind key 1 at 0x00c00000 (pgoffset 3072)
(II) intel(0): xf86BindGARTMemory: bind key 2 at 0x01800000 (pgoffset 6144)
(II) intel(0): xf86BindGARTMemory: bind key 3 at 0x01c00000 (pgoffset 7168)
(II) intel(0): xf86BindGARTMemory: bind key 4 at 0x02000000 (pgoffset 8192)
(II) intel(0): Fixed memory allocation layout:
(II) intel(0): 0x00000000-0x0001ffff: ring buffer (128 kB)
(II) intel(0): 0x00020000-0x0061ffff: compressed frame buffer (6144 kB, 0x000000005f820000 physical
)
(II) intel(0): 0x00620000-0x00620fff: compressed ll buffer (4 kB, 0x000000005fe20000 physical
)
(II) intel(0): 0x00621000-0x0062afff: HW cursors (40 kB, 0x000000005fe21000 physical
)
(II) intel(0): 0x0062b000-0x00632fff: logical 3D context (32 kB)
(II) intel(0): 0x00633000-0x00633fff: overlay registers (4 kB, 0x000000005fe33000 physical
)
(II) intel(0): 0x007bf000:            end of stolen memory
(II) intel(0): 0x00800000-0x00bfffff: front buffer (4096 kB) X tiled
(II) intel(0): 0x00c00000-0x017fffff: exa offscreen (12288 kB)
(II) intel(0): 0x01800000-0x01bfffff: back buffer (4096 kB) X tiled
(II) intel(0): 0x01c00000-0x01ffffff: depth buffer (4096 kB) X tiled
(II) intel(0): 0x02000000-0x03ffffff: classic textures (32768 kB)
(II) intel(0): 0x10000000:            end of aperture
(WW) intel(0): ESR is 0x00000001, instruction error
(WW) intel(0): Existing errors found in hardware state.
(II) intel(0): using SSC reference clock of 96 MHz
(II) intel(0): Selecting standard 18 bit TMDS pixel format.
(II) intel(0): Output configuration:
(II) intel(0):   Pipe A is off
(II) intel(0):   Display plane A is now disabled and connected to pipe A.
(II) intel(0):   Pipe B is on
(II) intel(0):   Display plane B is now enabled and connected to pipe B.
(II) intel(0):   Output VGA is connected to pipe none
(II) intel(0):   Output LVDS is connected to pipe B
(II) intel(0): [drm] dma control initialized, using IRQ 16
(II) Synaptics Touchpad: x-axis range 1472 - 5472
(II) Synaptics Touchpad: y-axis range 1408 - 4448
(--) Synaptics Touchpad touchpad found
(II) SynPS/2 Synaptics TouchPad: x-axis range 1472 - 5472
(II) SynPS/2 Synaptics TouchPad: y-axis range 1408 - 4448
(WW) SynPS/2 Synaptics TouchPad can't grab event device, errno=16
(--) SynPS/2 Synaptics TouchPad touchpad found
(II) AT Translated Set 2 keyboard: Device reopened after 10 attempts.
(II) Video Bus: Device reopened after 10 attempts.



Resume is a complete no-go as usual: I end up with a completely locked-up box, no backlight, no keyboard, nothing.
Comment 37 Rafael J. Wysocki 2008-11-24 05:32:03 UTC
On Monday, 24 of November 2008, Romano Giannetti wrote:
> 
> Rafael J. Wysocki wrote:
> > 
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11947
> > Subject		: 2.6.28-rc VC switching with Intel graphics broken
> > Submitter	: Romano Giannetti <romano.giannetti@gmail.com>
> > Date		: 2008-11-03 12:10 (20 days old)
> > Handled-By	: Jesse Barnes <jbarnes@virtuousgeek.org>
> > 
> 
> Still with us on -rc6, nasty, reproducible, no clues on what can be.
> 
> VC switch locks or delay the systems minutes, resuming from suspend locks hard 
> the machine.
> 
> 2.6.28-rc is unusable here.

Comment 38 Jesse Barnes 2008-11-24 10:54:18 UTC
This sounds a lot like a bug that was fixed by:
http://lists.freedesktop.org/archives/intel-gfx/2008-November/000614.html

I'll ping Eric to make sure the fix makes it upstream quickly.
Comment 39 Romano Giannetti 2008-11-25 01:26:40 UTC
 That patch does not apply cleanly to 2.6.28-rc6, because in
i915_irq.c::i915_driver_irq_postinstall there is an additional line, namely 

/* Set initial unmasked IRQs to just the selected vblank pipes. */      
        dev_priv->irq_mask_reg = ~0; 

So I applied it manually, deleting this line too; I hope I am doing things ok.
Compiling now.
Comment 40 Romano Giannetti 2008-11-25 01:43:46 UTC
It works! VC switching is fixed. I will attach the patch I have used on top of
v2.6.28-rc6-7-ged31348.

Tested-by: Romano Giannetti <romano.giannetti@gmail.com>

Will test later and report about resume lock...
Comment 41 Romano Giannetti 2008-11-25 01:45:25 UTC
Created attachment 19013 [details]
real patch applied on top v2.6.28-rc6-7-ged31348 fixing VC switch
Comment 42 Romano Giannetti 2008-11-25 01:48:47 UTC
And it fixes resume from ram too! 

Tested-again-by-happy-user: Romano Giannetti <romano.giannetti@gmail.com>

Please push upstream. 
Comment 43 Andreas Mohr 2008-11-25 06:51:37 UTC
OK, success as well (if partial), on A110L: with Romano's "cooked" patch on -rc6, VC switching fully works, however S2R resume weirdly still fails (complete lockup of box).
I'm suspecting that it's for a different reason, though. Now that the DRM resume issue is fixed, maybe I can properly detect the _other_ reason for the lockup (unload several drivers and retry).
So a thumbs up from me for pushing this mainline, quickly.
Comment 44 Andreas Mohr 2008-11-25 09:34:36 UTC
OK, resume lockup for me was a different regression after all (Intel microcode module, see bug #12100).
Tried -rc5 again (with microcode module unloaded this time), VC switching was broken there, but DRM resume lockup did NOT occur with this unrepaired version for me! (just to submit this minor/unimportant piece of data).
Comment 45 Guillaume Ayoub 2008-11-26 07:31:55 UTC
Patch applied on top of rc6 fixes VC switch for me, good job.

Unfortunately, S2R and S2D still fail: I get a black screen with a cursor and sometimes a monitor icon and a blue cross. Everything else seems to work (SSH server up, dmesg and xorg logs silent). Killing X over SSH works, but after relaunching X, VT switching is impossible (black screen or random color pixels; killing X again is the only solution).

I use no microcode module.

It looks like I am not alone: http://lists.freedesktop.org/archives/intel-gfx/2008-November/000648.html

I have a GM965, I use compiz. Please ask me if you want more information.
Comment 46 Andreas Mohr 2008-11-29 21:39:09 UTC
Still strange issues here with the fixed -rc6, after another resume (maybe the third one?): extreme load (up to 4, with X.org, compiz and some other X apps taking as much CPU as they could get) and everything bloody slow (to be measured in dozens of seconds). I then switched to tty1, where the load settled down and appeared normal. After switching back, it was _VERY_ unresponsive again, and trying to switch back to tty1 didn't even work. I got annoyed, so I killed the box and rebooted
(I would have attached gdb otherwise; maybe a remote login would have been a good idea).
Comment 47 Rafael J. Wysocki 2008-11-30 05:32:17 UTC
Can anyone having this problem carry out bisection to identify the exact commit that caused it to happen?  That would greatly help identify the root cause of the problem.
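
As a rough guide for anyone attempting this, the bisection can be laid out as a short checklist of git commands. This is only a sketch: `bisect_plan` is a hypothetical helper that prints the commands rather than running them, and the good/bad revisions you pass in are assumptions you must confirm on your own hardware first.

```shell
# Print the git commands for bisecting between a known-good and a known-bad kernel.
# Nothing is executed; run the printed steps by hand inside a clone of Linus' tree.
# An optional third argument restricts the bisection to commits touching that path.
bisect_plan() {
    good="$1"
    bad="$2"
    path="${3:-}"
    if [ -z "$good" ] || [ -z "$bad" ]; then
        echo "usage: bisect_plan GOOD BAD [PATH]" >&2
        return 1
    fi
    if [ -n "$path" ]; then
        echo "git bisect start $bad $good -- $path"
    else
        echo "git bisect start $bad $good"
    fi
    echo "# build and boot the suggested commit, try a VC switch, then mark it:"
    echo "git bisect good   # or: git bisect bad"
    echo "# repeat until git names the first bad commit, then clean up:"
    echo "git bisect reset"
}
```

For instance, `bisect_plan v2.6.27 v2.6.28-rc3` would print the starting command for the range reported in this bug, assuming those tags behave as good and bad on the machine being tested.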
Comment 48 Guillaume Ayoub 2008-11-30 10:59:41 UTC
I have had this problem since 2.6.28-rc1. I now use a patched 2.6.28-rc6 (the patch totally fixed VT switching *before* S2R or S2D). All the 2.6.27.x releases work for me, with VT switching, S2R and S2D.

Here is exactly what happens:

VT switching works before suspend. After wake up, I have a black screen with unmovable cursor, sometimes a monitor icon under the cursor. Everything else works. I can kill X (-9) from SSH, X restarts. After this, switching to a VT gives me random color pixels, even from gdm without composite. X can be killed again from SSH. Killing X with ctrl+alt+backspace gives me random color pixels during a couple of seconds, then reboot.

I tried to keep logs. dmesg is silent. I thought that Xorg was silent too, but I do in fact have these lines:

[normal start]
[…]
(II) intel(0): EDID vendor "AUO", prod id 33140
(II) intel(0): Printing DDC gathered Modelines:
(II) intel(0): Modeline "1280x800"x0.0   71.11  1280 1328 1360 1440  800 803 809 823 -hsync -vsync (49.4 kHz)
(II) intel(0): EDID vendor "AUO", prod id 33140
[suspend and wake up]
(II) AIGLX: Suspending AIGLX clients for VT switch
(II) intel(0): xf86UnbindGARTMemory: unbind key 0
(II) intel(0): xf86UnbindGARTMemory: unbind key 1
(WW) Open ACPI failed (/var/run/acpid.socket) (No such file or directory)
(II) No APM support in BIOS or kernel
(II) AIGLX: Resuming AIGLX clients after VT switch
(II) intel(0): xf86BindGARTMemory: bind key 0 at 0x0cd40000 (pgoffset 52544)
(II) intel(0): xf86BindGARTMemory: bind key 1 at 0x0e000000 (pgoffset 57344)
(II) intel(0): Fixed memory allocation layout:
(II) intel(0): 0x00000000-0x00000fff: power context (4 kB)
(II) intel(0): 0x0077f000:            end of stolen memory
(II) intel(0): 0x0077f000-0x0cd3ffff: DRI memory manager (202500 kB)
(II) intel(0): 0x0cd40000-0x0dffffff: exa offscreen (19200 kB)
(II) intel(0): 0x0e000000-0x0fffffff: classic textures (32768 kB)
(II) intel(0): 0x10000000:            end of aperture
(II) intel(0): BO memory allocation layout:
(II) intel(0): 0x0077f000:            start of memory manager
(II) intel(0): 0x0079f000-0x00ddefff: depth buffer (6400 kB) Y tiled
(II) intel(0): 0x00f9f000-0x015defff: back buffer (6400 kB) X tiled
(II) intel(0): 0x01800000-0x01e3ffff: front buffer (6400 kB) X tiled
(II) intel(0): 0x0179f000-0x0179ffff: overlay registers (4 kB)
(II) intel(0): 0x017a0000-0x017b5fff: exa G965 state buffer (88 kB)
(II) intel(0): 0x017c0000-0x017c7fff: logical 3D context (32 kB)
(II) intel(0): 0x017c8000-0x017d1fff: HW cursors (40 kB)
(II) intel(0): 0x0cd40000:            end of memory manager
(WW) intel(0): PRB0_CTL (0x0001f001) indicates ring buffer enabled
(WW) intel(0): Existing errors found in hardware state.
(II) intel(0): using SSC reference clock of 96 MHz
(II) intel(0): Selecting standard 18 bit TMDS pixel format.
(II) intel(0): Output configuration:
(II) intel(0):   Pipe A is off
(II) intel(0):   Display plane A is now disabled and connected to pipe A.
(II) intel(0):   Pipe B is on
(II) intel(0):   Display plane B is now enabled and connected to pipe B.
(II) intel(0):   Output VGA is connected to pipe none
(II) intel(0):   Output LVDS is connected to pipe B
(II) intel(0):   Output TV is connected to pipe none
(II) intel(0): [drm] mapped front buffer at 0x41800000, handle = 0x18300000
(--) AlpsPS/2 ALPS GlidePoint touchpad found

Note that I always had "Open ACPI failed (/var/run/acpid.socket)", even before 2.6.28.
Comment 49 Rafael J. Wysocki 2008-11-30 15:22:53 UTC
Do these problems exist in the current Linus' tree?  It contains several important DRM fixes.
Comment 50 Guillaume Ayoub 2008-11-30 17:21:50 UTC
Wow, the latest updates about VT switching and DRM were added 15 minutes ago… I'll test these fixes over the next few days and report back.
Comment 51 Guillaume Ayoub 2008-12-01 03:57:37 UTC
Linus' tree now includes the VT switching patch:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=52440211dcdc52c0b757f8b34d122e11b12cdd50

But bad news, I can't switch back from VT to X even *before* suspending. There are lots of drm changes and I don't really know where to find the source of the problem.

Can anybody try Linus' tree to confirm that our original VT switching bug is not fixed even if the patch is applied?

I'm now trying to apply some of the new DRM patches to the rc6…
Comment 52 Romano Giannetti 2008-12-01 07:17:42 UTC
I tried it. For me v2.6.28-rc6-184-gd9d060a is working ok: VC switching before and after S2R, and S2R itself are ok. I didn't try S2D (since ages). 
Comment 53 Rafael J. Wysocki 2008-12-01 12:39:32 UTC
Guillaume, did you test the Linus' tree, actually?
Comment 54 Bernhard Schmidt 2008-12-01 16:06:50 UTC
current git head works fine for me, both for VC switching and S2R
Comment 55 Guillaume Ayoub 2008-12-02 01:38:54 UTC
I confirm that VT switching, for me:

- 2.6.28-rc6 does not work
- 2.6.28-rc6 + "patch" works
- 2.6.28-rc7 does not work

By "patch", I mean the patch posted here, committed to Linus' tree with the label "drm: move drm vblank initialization/cleanup to driver load/unload".

The code causing my (new?) bug is in the commit "drm/i915: Manage PIPESTAT to control vblank interrupts instead of IMR". rc6 + "drm vblank init patch" works. rc6 + "drm vblank init patch" + "pipestat patch" does not work.

Sorry :)
Comment 56 devsk 2008-12-03 12:41:01 UTC
The lockup, which for me happens mostly when logging out of KDE, has become so annoying that I have decided to just go without i915 for now.
Comment 57 Guillaume Ayoub 2008-12-03 15:59:55 UTC
Well, well, well… Quite good news!

2.6.28-rc7 does not work for me, X "freezes" after a VT switch (including suspend). But here is a little patch transforming the freeze to a hang up. I have to wait about 1 minute (actually between 30 and 100 seconds, random) with a black screen and movable cursor, then the screen is redrawn and everything works fine.

S2R works the same way, and I suppose S2D does too. I can switch again and again, as long as I wait 1 minute each time.

The lines added in the patch had been removed between rc6 and rc7.

I don't know why it works, don't ask me (I'm just damned lucky). This is not a solution, just an ugly workaround. I'd be glad if an intel-guru-like-guy found a real solution, I can give anything to help (dmesg, Xorg logs, credit card number, hardware config, whatever).
Comment 58 Guillaume Ayoub 2008-12-03 16:02:05 UTC
Created attachment 19135 [details]
2.6.28-rc7 freeze2hang patch
Comment 59 Guillaume Ayoub 2008-12-03 17:16:14 UTC
Last two things:

- S2R works but S2D gives me xorg freeze with monitor-and-blue-cross icon or kernel panic.

- @Romano: you reported a 2-minute delay in comment #29, which sounds like my situation now. Do you still have this delay?

I need to sleep a little bit…
Comment 60 Romano Giannetti 2008-12-04 02:50:18 UTC
For me it works ok with v2.6.28-rc7-105-gfeaf384 (note that I skipped the nasty watchdog bug that crept into -rc7).

@Guillaume: no, no delay now, since the vblank patch. 

Comment 61 Guillaume Ayoub 2008-12-07 14:22:26 UTC
I'm tired… 2.6.28-rc7 works very well with metacity (2.24, composite) instead of compiz. VT switch and S2R are OK. I'm not sure that my bug is a kernel bug. I'll test compiz 0.8 later, I keep metacity now.

Thanks a lot for your help. We may close that bug, resolved with the vblank patch in rc7.
Comment 62 Romano Giannetti 2008-12-08 13:08:34 UTC
Hi,

I think that, on the one hand, the original problem is solved for me; on the other hand, if Guillaume has a configuration that works on 2.6.27 and fails now, it's a regression in my opinion. So... I think Jesse and Rafael will decide on it, but if we close this bug, we should open another one for x+compiz+2.6.28.

Just my 2 cents. 
Comment 63 Rafael J. Wysocki 2008-12-08 13:12:30 UTC
I'm going to close it now, and Guillaume, if the problem you're observing with compiz turns out to be a kernel issue, please open a separate bug entry for it.
