Bug 10620

Summary: VT switching broken - X does not resume (intel chipset)
Product: Power Management
Reporter: Romano Giannetti (romano.giannetti)
Component: Hibernation/Suspend
Assignee: power-management_other
Status: CLOSED CODE_FIX
Severity: high
CC: acpi-bugzilla, airlied, akpm, jbarnes, rjw, romano.giannetti
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.26-rc1-110-ga153063
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 7216, 10492
Attachments:
  Config
  lshw output
  good suspend/resume log with 2.6.25.1
  Log of two failed resume with new kernel

Description Romano Giannetti 2008-05-08 06:53:01 UTC
Latest working kernel version: 2.6.25.1
Earliest failing kernel version: 2.6.26-rc1
Distribution: Ubuntu 8.04 Hardy
Hardware Environment: see attached lshw.txt
Software Environment: xserver-xorg-video-intel 2:2.2.1-1ubuntu12
Problem Description:

Suspend to ram, using the distribution scripts or just echo mem > /sys/power/state. 

On resume, the screen stays blank (most of the time), with just the cursor shown (sometimes you can move it, sometimes not). The system is nevertheless working. Sometimes you can recover by hitting ctrl-alt-backspace and re-entering X, sometimes not.
 
I can't test on the console because after X starts, the console is busted for me. That's not a regression: it has been like that since at least 2.6.23, and it was decided it was an X driver problem, but it's still not fixed for me; see https://bugs.launchpad.net/bugs/182865.

I attach .config, lshw output, and two syslogs. syslog_2.6.25.txt shows a successful suspend/resume cycle in 2.6.25.1.

syslog_new.txt shows a boot of the new kernel, then a suspend/resume which ended with a blank screen that was recovered with ctrl-alt-backspace, and finally a second suspend/resume which ended with a blank screen that could not be recovered with ctrl-alt-backspace, so sysrq-b was needed.

Strange things I see: 

There is a WARN_ON at line 1485 of the log (sysdev_driver_register()) but seems unrelated.

A lot of "ACPI handle has no context!" during suspend/resume

A very scary "trying to get vblank count for disabled pipe 0" that smells like a possible culprit...
Comment 1 Romano Giannetti 2008-05-08 06:54:38 UTC
Created attachment 16066 [details]
Config
Comment 2 Romano Giannetti 2008-05-08 06:55:02 UTC
Created attachment 16067 [details]
lshw output
Comment 3 Romano Giannetti 2008-05-08 06:56:04 UTC
Created attachment 16068 [details]
good suspend/resume log with 2.6.25.1
Comment 4 Romano Giannetti 2008-05-08 06:57:03 UTC
Created attachment 16069 [details]
Log of two failed resume with new kernel

There are comments in the log, search for "*****"
Comment 5 Andrew Morton 2008-05-08 10:20:45 UTC
Oh Dear.

Len, Rafael, Dave: I'm not even sure which subsystem might have caused
this.  Perhaps acpi?  Can you take a look, please?

Thanks.
Comment 6 Romano Giannetti 2008-05-08 10:35:22 UTC
Maybe someone on the intel group could be Cc:ed... Jesse for example. He was quite helpful when trying to solve bug#10319 (which I feel could be related, but that's just a feeling...)
Comment 7 Jesse Barnes 2008-05-08 12:04:19 UTC
Can you try reproducing this with Dave's latest DRM patch applied?  It reverts the new vblank code, which may be the culprit...
Comment 8 Romano Giannetti 2008-05-08 14:05:24 UTC
On Thu, 2008-05-08 at 12:04 -0700, bugme-daemon@bugzilla.kernel.org
wrote:
> Can you try reproducing this with Dave's latest DRM patch applied?  It
> reverts
> the new vblank code, which may be the culprit...

It's in current -git? Otherwise, can you point me to the patch?

Thanks,
Comment 9 Romano Giannetti 2008-05-09 02:33:28 UTC
v2.6.26-rc1-279-g28a4acb: nothing has changed. Still buggy.
Comment 10 Romano Giannetti 2008-05-09 13:50:08 UTC
Maybe-related-to: http://lkml.org/lkml/2008/5/8/378
Comment 11 Rafael J. Wysocki 2008-05-09 13:56:49 UTC
Regressions list annotation:
References : http://lkml.org/lkml/2008/5/8/378
Comment 12 Theodore Tso 2008-05-14 13:55:57 UTC
Note that my problem was solved with Hugh's patch here: http://lkml.org/lkml/2008/5/13/188

My symptoms were quite different from what he experienced (and from what is described in this bug report): the X server ignored keyboard/button input shortly after a suspend/resume, and subsequent restarts of the X server totally malfunctioned. I could reproduce the problem on a kernel without his patch, and it went away on another kernel where the only difference was the application of his patch.

So you might want to try to see if this patch solves your problem too.  (My system was an X61s with an Intel video chipset).
Comment 13 Rafael J. Wysocki 2008-05-15 14:42:35 UTC
Romano, can you test 2.6.26-rc2-git5 when it's out, please?
Comment 14 Anonymous Emailer 2008-05-16 17:52:59 UTC
Reply-To: romano@dea.icai.upcomillas.es


> Romano, can you test 2.6.26-rc2-git5 when it's out, please?

As soon as I can. I'm in the middle of a flight, sitting on the floor in
Vancouver airport... :-)
Comment 15 Romano Giannetti 2008-05-18 10:48:05 UTC
Tested with today's git, v2.6.26-rc2-433-gf26a398, which as far as I know is
-git5 or later.

No joy: things are maybe worse. After resume I have the same symptom as before (black screen with the pointer in the upper left corner). If I can recover the system with ctrl-alt-backspace, X now no longer recognizes the card (I get a dialog saying that the card is not recognized and asking whether I want to run the laptop in 640x480 mode).

As a side note, I run i915resolution at boot. Should I try leaving it out? Last time I tried, it was necessary to get the 1280-wide display.
Comment 16 Romano Giannetti 2008-05-19 02:04:17 UTC
Nice. I discovered that you do not need a suspend/resume: just switching VC can cause the exact same symptoms.

To reproduce, I simply switch to VC 1 (which is busted with this X driver; I have to use "setupcon" to make it work), and then switch back to the X VC. Black screen. If I then switch again to VC 1, there is a 10-second delay, then the X screen flashes on for a moment, and I have the console back. Killing X makes it restart OK, but then the same pattern repeats.

So: the problem is VC switching. 

Time to start a bisect run? I will try to downgrade the X driver as per 
https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/182865/comments/19 , then I'll start bisect. 
Comment 17 Romano Giannetti 2008-05-19 04:27:26 UTC
1888 revs to go... I noticed that I have PAT=y in my .config. Could that be a hint? Should I try with PAT=n?

Bisecting...
Comment 18 Romano Giannetti 2008-05-19 06:22:46 UTC
Uf, this seems more complex than ever. I have hit a build error twice...

ERROR: "__locks_copy_lock" [fs/lockd/lockd.ko] undefined!
WARNING: modpost: Found 14 section mismatch(es).
To see full details build your kernel with:
'make CONFIG_DEBUG_SECTION_MISMATCH=y'
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2

I'm trying to jump around to see if I can get past it, but it seems quite nasty.
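
One common way to "jump around" commits that cannot be built is git bisect skip, or exit code 125 from a script passed to git bisect run. The following is only a sketch on a throwaway toy repository, not the kernel tree: the file name "state", the commit subjects and the helper "step" are all hypothetical, made up for illustration.

```shell
#!/bin/sh
# Toy repository: every name and file content here is hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# step <state> <subject>: append a state line to the file and commit it.
step() {
    echo "$1" >> state
    git add state
    git commit -q -m "$2"
}

step ok     "c1: works"
good=$(git rev-parse HEAD)
step ok     "c2: works"
step broken "c3: does not build"
step ok     "c4: builds again, works"
step bad    "c5: introduces the regression"
step bad    "c6: regression still present"
bad=$(git rev-parse HEAD)

git bisect start "$bad" "$good" >/dev/null
# Exit status 125 tells "git bisect run" that the commit cannot be tested
# and must be skipped; 1 marks it bad, 0 marks it good.
git bisect run sh -c '
    case "$(tail -n 1 state)" in
        broken) exit 125 ;;
        bad)    exit 1 ;;
        *)      exit 0 ;;
    esac' >/dev/null
# After a finished bisection, refs/bisect/bad points at the first bad commit.
culprit=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "first bad commit: $culprit"
```

In a manual bisection the same effect is obtained by typing "git bisect skip" whenever the checked-out revision fails to build. Note that if the skipped commits sit right next to the real culprit, git can only report a range of candidates instead of a single commit.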
 
Comment 19 Romano Giannetti 2008-05-19 07:13:02 UTC
Hmmm. Additional thing: the black screen, switching VT or doing s2ram, happens only if gnome is started. If the X screen displayed is the gdm one, no problem. 

Uff. Now I know that a workaround for the lockd bug is to compile lockd in; will do. 648 revs to go.

Is there anybody out there :-)? 
Comment 20 Romano Giannetti 2008-05-19 10:35:26 UTC
...maybe it's the most painful bisect ever...

It failed to compile in two other points...

Now at 10c993a6b5418cb1026775765ba4c70ffb70853d (which is good), compiling the
kernel started to corrupt files with a 0xf0 in them... I rebooted into a known good
kernel, did a git reset --hard, and started compiling again. We'll see.
Comment 21 Romano Giannetti 2008-05-19 14:17:00 UTC
OK, I do not think that I can do much more than this... not tonight at least. I have narrowed it down to this:

good 50704516f334d5036c09b0ecc0064598f7c5596f
bad  d9c04d678418fe42646de641f499209ca00fd94f

but then 2c14f28be2a3f2a2e9861b156d64fbe2bc7000c3 makes my laptop oops on boot. I have to sleep now... the log is promising (all video related things):

d9c04d678418fe42646de641f499209ca00fd94f Merge branch 'agp-patches' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/agp-2.6
4d9c55e44336602f8b2880b972fb55f67bc51dd0 Merge branch 'drm-patches' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
09aa356b5584090aab6810ec8002936d710cd4ac agp: convert drivers/char/agp/frontend.c to use unlocked_ioctl
4ab92bcf773e7b9e1367897047d5fa4d151d9e90 agp: fix shadowed variable warning in amd-k7-agp.c
b74e2082f8e7b8f37af3fc39e8ee0dd0d218c589 drm: _end is shadowing real _end, just rename it.
ac741ab71bb39e6977694ac0cc26678d8673cda4 drm/vbl rework: rework how the drm deals with vblank.
2c14f28be2a3f2a2e9861b156d64fbe2bc7000c3 drm: reorganise minor number handling using backported modesetting code.
7b832b56bd971348329c3f4c753ca0abfdf3a3d1 drm/i915: Handle tiled buffers in vblank tasklet
a36b7dcc05bc4c4580f11cf78e95edfefa86b8a6 drm/i965: On I965, use correct 3DSTATE_DRAWING_RECTANGLE command in vblank
f1c3e67eb73a4a1db31e235883156ac098e29ff6 drm: Remove unneeded dma sync in ATI pcigart alloc
5ff64611333fd282793ff8997e02138aa2f6aab9 drm: Fix mismerge of non-coherent DMA patch

I hope this helps. 

See you tomorrow...
Comment 22 Diego Calleja 2008-05-19 14:44:45 UTC
Notice that git bisect accepts a path, so that it will only bisect between the commits that affect a given path. If you are sure that it's a drm-related thing, you can try to do the bisection on drivers/char/drm
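
A minimal sketch of path-limited bisection, again on a throwaway toy repository rather than the kernel tree. The directories "drm" and "acpi", the file names, the commit subjects and the helper "commit" are hypothetical, chosen only to mirror the situation in this report.

```shell
#!/bin/sh
# Toy repository: every name and file content here is hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
mkdir drm acpi

# commit <file> <content> <subject>: append a line to a file and commit it.
commit() {
    echo "$2" >> "$1"
    git add "$1"
    git commit -q -m "$3"
}

commit drm/vblank.c "ok"   "drm: initial"
good=$(git rev-parse HEAD)
commit acpi/sleep.c "ok"   "acpi: unrelated 1"
commit drm/vblank.c "ok 2" "drm: first change"
commit acpi/sleep.c "ok 2" "acpi: unrelated 2"
commit drm/vblank.c "BAD"  "drm: second change (toy bug)"
commit acpi/sleep.c "ok 3" "acpi: unrelated 3"
bad=$(git rev-parse HEAD)

# Everything after "--" is a path: only commits touching drm/ are candidates.
git bisect start "$bad" "$good" -- drm >/dev/null
# The test command exits 0 (good) while the toy bug marker is absent.
git bisect run sh -c '! grep -q BAD drm/vblank.c' >/dev/null
# After a finished bisection, refs/bisect/bad points at the first bad commit.
culprit=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "first bad commit: $culprit"
```

Against the kernel tree, the corresponding invocation for the range reported above would presumably be "git bisect start d9c04d678418fe42646de641f499209ca00fd94f 50704516f334d5036c09b0ecc0064598f7c5596f -- drivers/char/drm". The caveat is that commits outside the given path are never tested, so the result is only meaningful if the culprit really touches that path.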
Comment 23 Romano Giannetti 2008-05-19 23:39:28 UTC
On Mon, 2008-05-19 at 14:44 -0700, bugme-daemon@bugzilla.kernel.org
wrote:
> Notice that git bisect accepts a path, so that it will only bisect between
> the commits that affect a given path. If you are sure that it's a
> drm-related thing, you can try to do the bisection on drivers/char/drm

I know. The problem was that I didn't know if this was due to the drm,
video, acpi or x86... but it seems to be quite narrowed down now.

To summarize (see http://bugzilla.kernel.org/show_bug.cgi?id=10620 )

- VT switching is broken after gnome has started (acceleration?)

- the symptom is that the X VT stays black, with just the cursor (and
sometimes some residual bit of the panel) shown.

- it happens almost all the time with suspend/resume

- sometimes you can kill the X server with ctrl-alt-backspace, sometimes
you have to reboot.

My (painful) bisect stopped here:

good 50704516f334d5036c09b0ecc0064598f7c5596f
bad  d9c04d678418fe42646de641f499209ca00fd94f

but then 2c14f28be2a3f2a2e9861b156d64fbe2bc7000c3 makes my laptop oops on boot.

Romano
Comment 24 Romano Giannetti 2008-05-20 03:05:08 UTC
Fixed in v2.6.26-rc3-119-g8033c6e. I'll close this bug. 
Thanks to all!