Bug 27062

Summary: [855GM] fb console scrolling anomaly and slow
Product: Drivers
Reporter: Ferenc Wágner (wferi)
Component: Video(DRI - Intel)
Assignee: drivers_video-dri-intel (drivers_video-dri-intel)
Severity: normal
CC: chris, daniel, florian, jbarnes
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.3
Subsystem:
Regression: No
Bisected commit-id:
Attachments:
  screenshot before suspend
  screenshot after suspending the devices
  screenshot during resume with screen corruption (partial scrolling) in the upper right part
  screenshot after resume and VT switchback (about to start the next iteration)
  console corruption during resume
  scrolling the picture left with the keyboard in the Geeqie image viewer
  the final (stationary) result after scrolling the picture to extreme left

Description Ferenc Wágner 2011-01-19 10:33:01 UTC
Since upgrading my user space (including Xorg drivers) to Debian squeeze, my previously perfectly working 2.6.36 kernel became pretty much unusable because of frequent crashes (#26042, #26582). Upgrading to vanilla 2.6.37 did not help. The only problem I can reliably reproduce is display corruption during the suspend-resume cycle. During bootup I load i915 with modeset=1. Then if I do

# cd /sys/power
# echo devices >pm_test
# while echo mem >state; do sleep 1; done

the alternate VT detailing the process gets corrupted during scrolling: the last part of some lines stays in place until the VT switchback, as illustrated by the attached screenshot series. This happens on every cycle; the loop above was only for getting good screenshots.
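The loop above runs until a suspend attempt fails. For repeatable testing, a bounded variant may be handier. Below is a minimal POSIX sh sketch, assuming root, a kernel built with CONFIG_PM_DEBUG (which provides /sys/power/pm_test), and that "devices" is an accepted test level. The function run_pm_test_cycles and its arguments are hypothetical, introduced only for this sketch.

```shell
#!/bin/sh
# Sketch: a bounded variant of the pm_test suspend loop from the report.
# Assumptions: run as root on a kernel with CONFIG_PM_DEBUG (which provides
# /sys/power/pm_test), and "devices" is an accepted test level.
# run_pm_test_cycles is a hypothetical helper name.

# run_pm_test_cycles DIR CYCLES DELAY
#   DIR    - sysfs power directory, normally /sys/power
#   CYCLES - number of suspend/resume test cycles to run
#   DELAY  - seconds to sleep between cycles
run_pm_test_cycles() {
    dir=$1 cycles=$2 delay=$3
    # Stop each suspend attempt after the device suspend phase.
    echo devices > "$dir/pm_test" || return 1
    done_cycles=0
    while [ "$done_cycles" -lt "$cycles" ]; do
        # Each write triggers one test suspend/resume; stop early on failure.
        echo mem > "$dir/state" || break
        done_cycles=$((done_cycles + 1))
        sleep "$delay"
    done
    # Restore normal (non-test) suspend behaviour.
    echo none > "$dir/pm_test"
    echo "completed $done_cycles cycle(s)"
}

# Example (as root): run_pm_test_cycles /sys/power 5 1
```

Resetting pm_test to none at the end returns the machine to normal suspend behaviour. Passing a scratch directory instead of /sys/power lets you dry-run the logic without suspending anything.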

The machine is an IBM ThinkPad R50e with an Intel 855GM chipset.
I'm running with no_console_suspend and the suspend test delay reduced to 1 second, FWIW.
Comment 1 Ferenc Wágner 2011-01-19 10:34:23 UTC
Created attachment 44152
screenshot before suspend
Comment 2 Ferenc Wágner 2011-01-19 10:35:35 UTC
Created attachment 44162
screenshot after suspending the devices
Comment 3 Ferenc Wágner 2011-01-19 10:37:32 UTC
Created attachment 44172
screenshot during resume with screen corruption (partial scrolling) in the upper right part
Comment 4 Ferenc Wágner 2011-01-19 10:39:45 UTC
Created attachment 44182
screenshot after resume and VT switchback (about to start the next iteration)
Comment 5 Florian Mickler 2011-03-04 22:37:40 UTC
Is there a modesetting capable kernel which does not exhibit these problems? 
(>v2.6.29 are candidates, but I guess for testing I would actually go backwards from 2.6.36)
Comment 6 Ferenc Wágner 2011-03-07 18:37:28 UTC
Created attachment 50262
console corruption during resume

Under 2.6.38-rc7+, I can't observe the scrolling corruption anymore, but this other type of corruption is still very prominent. It is not new either: it appeared together with the scrolling corruption before, but the scrolling corruption was easier to capture. I can probably capture this one now because I'm testing with a monolithic kernel, and the ATA driver halts the resume process for a second before flipping back to the original framebuffer contents. This picture is fully reproducible by stopping in the initramfs phase, then

(initramfs) cd /sys/power
(initramfs) echo core >pm_test
(initramfs) echo mem >state

However, only the first echo mem >state shows the issue; subsequent ones don't.
Comment 7 Ferenc Wágner 2011-03-07 18:48:50 UTC
Created attachment 50272
scrolling the picture left with the keyboard in the Geeqie image viewer

Possibly related: when submitting attachment 50262, I noticed display artifacts (white vertical lines) while scrolling the zoomed fullscreen image in the Geeqie image viewer program (an X application). This screenshot was taken while the image was actually scrolling, hence the apparent tiling.
Comment 8 Ferenc Wágner 2011-03-07 18:53:26 UTC
Created attachment 50282
the final (stationary) result after scrolling the picture to extreme left

This is the end result, after the view hit the left border of the image. The increasing spacing of the vertical lines corresponds to the acceleration of the scrolling.
I reckon this may well be an application bug; I'm just showing it here in case it's related.
Comment 9 Ferenc Wágner 2011-03-10 10:44:24 UTC
(In reply to comment #5)
> Is there a modesetting capable kernel which does not exhibit these problems? 
> (>v2.6.29 are candidates, but I guess for testing I would actually go
> backwards from 2.6.36)

I went the other way and found that 2.6.30 and 2.6.29 definitely exhibit this problem. (Since suspend was broken on my machine before b690e96c, I could test these only by inserting a break statement after /* Next, disable display pipes */ in intel_display.c.) I didn't test 2.6.31 through 2.6.36, but wouldn't expect them to behave any better.

I also tested 79e53945 (DRM: i915: add mode setting support), which didn't even switch into a high-res framebuffer mode but still exhibited the problem. Then I thought maybe it wasn't related to i915 at all and disabled CONFIG_DRM_I915, but then I didn't even get my original console back after resume, although I didn't see screen corruption either. (Blindly issuing the reset command rebooted the machine all right.)
Comment 10 Daniel Vetter 2012-03-25 14:00:29 UTC
Please retest with 3.3; 2.6.37 is rather ancient in graphics-land.
Comment 11 Ferenc Wágner 2012-03-27 08:05:27 UTC
Yes, under 3.3 I can still see console corruption in the timestamp area during resume before the original screen contents are restored, and scrolling in Geeqie still leaves white lines behind.
Comment 12 Jesse Barnes 2012-04-18 21:41:17 UTC
Ooh 855, I bet Chris wants this one. :)
Comment 13 Chris Wilson 2012-04-18 21:46:49 UTC
Oh crikey. Can you turn on all the memory debugging options under kernel hacking and try testing with nomodeset? The former because there is nothing special about the fb at that time; it is just a coherent mapping onto the scanout, so I presume something is going badly wrong in memory. The latter will help confirm whether it is i915.ko that is at fault.
Comment 14 Ferenc Wágner 2012-04-23 09:11:06 UTC
The good news: 3.3.1 fixed the console corruption issues during STR resume!

And the bad: the permanent corruption after scrolling the fullscreen image in Geeqie stayed with me even under 3.3.3. Interestingly, scrolling down and left works fine, while scrolling up and right introduces occasional white lines. Also, scrolling left via the keyboard is unbearably slow.

(Am I supposed to change the bug status now?)
Comment 15 Chris Wilson 2012-04-23 09:24:29 UTC
I think the scrolling bug will be fixed (by removing the ability to scroll!) with

commit 62fb376e214d3c1bfdf6fbb77dac162f6da04d7e
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Mar 26 21:15:53 2012 +0100

    drm: Validate requested virtual size against allocated fb size

    mplayer -vo fbdev tries to create a screen that is twice as tall as the
    allocated framebuffer for "doublebuffering". By default, and all in-tree
    users, only sufficient memory is allocated and mapped to satisfy the
    smallest framebuffer and the virtual size is no larger than the actual.
    For these users, we should therefore reject any userspace request to
    create a screen that requires a buffer larger than the framebuffer
    originally allocated.

    References: https://bugs.freedesktop.org/show_bug.cgi?id=38138
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    Cc: stable@kernel.org
    Signed-off-by: Dave Airlie <airlied@redhat.com>

which landed upstream in v3.4-rc2 and in the 3.3.2 and 3.3.3 stable releases. That also explains the speed: Geeqie is doing a full transfer for every scroll, which also suggests where the root cause is. We don't intend to accelerate fbdev, as it is a must-not-fail output for panics. Is there a reason why you need to do it this way?
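The commit's idea can be illustrated with simple arithmetic. Assuming 32 bits per pixel, a single 1024x768 screen needs 1024 * 768 * 4 = 3145728 bytes, so a doubled virtual height (1024x1536), as mplayer requests for double buffering, needs twice the allocation and is rejected. With no virtual area beyond the visible screen, there is presumably nothing left to pan over, which is what "removing the ability to scroll" refers to. The sketch below is a simplified byte-count check, not the kernel's actual implementation; it assumes a tightly packed stride with no padding, and the function name virtual_size_fits is hypothetical.

```shell
#!/bin/sh
# Simplified illustration of "drm: Validate requested virtual size against
# allocated fb size". NOT the kernel's actual check: it assumes a tightly
# packed framebuffer (stride = xres_virtual * bytes per pixel) and compares
# the bytes a requested virtual size needs with the bytes allocated.
# virtual_size_fits is a hypothetical helper name.

# virtual_size_fits XRES_VIRTUAL YRES_VIRTUAL BITS_PER_PIXEL ALLOCATED_BYTES
# Succeeds (exit status 0) if the requested virtual screen fits.
virtual_size_fits() {
    needed=$(( $1 * $2 * ($3 / 8) ))
    [ "$needed" -le "$4" ]
}

# Allocation sized for exactly one 1024x768 screen at 32 bpp (assumed).
allocated=$((1024 * 768 * 4))

# The second request doubles the height, as for mplayer's double buffering.
for req in "1024 768" "1024 1536"; do
    set -- $req
    if virtual_size_fits "$1" "$2" 32 "$allocated"; then
        echo "${1}x${2}: accepted"
    else
        echo "${1}x${2}: rejected"
    fi
done
```

Run as a script, it prints "1024x768: accepted" followed by "1024x1536: rejected".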
Comment 16 Ferenc Wágner 2012-04-23 11:49:43 UTC
I'm afraid I created some confusion here. This report was mostly about the framebuffer console corruption during STR resume, which is present in 3.3 but fixed in 3.3.1. Meanwhile I added some info about screen corruption under Geeqie, but I run Geeqie under X.Org 1.7.7 with intel driver module version 2.13.0, not on the framebuffer. So this part may well be a totally unrelated userspace bug, but I can't judge this.

Let me add one more detail. I'm running with two monitors now, xrandr reports:

Screen 0: minimum 320 x 200, current 1920 x 1848, maximum 2048 x 2048
LVDS1 connected 1024x768+0+1080 (normal left inverted right x axis y axis) 0mm x 0mm
   1024x768       60.0*+   85.0     75.0     70.1     60.0*    43.5  
VGA1 connected 1920x1080+0+0 (normal left inverted right x axis y axis) 477mm x 268mm
   1920x1080      60.0*+

And if I switch Geeqie into fullscreen mode on LVDS1, the top row of pixels leaks into the bottom row of VGA1. This happens similarly in other screen configurations, too. MPlayer also leaks a single row of overlay blue into the other screen if I switch into fullscreen. So at least Geeqie isn't alone. Could this be an X driver bug?
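In the xrandr output above, the two outputs are stacked inside one 1920x1848 screen: VGA1 (1920x1080+0+0) covers rows 0 to 1079 and LVDS1 (1024x768+0+1080) covers rows 1080 to 1847, so the top row of LVDS1 lies immediately below the bottom row of VGA1. The shell sketch below derives those row ranges from the WxH+X+Y geometry strings. It assumes both outputs scan out of the single shared screen buffer that the "Screen 0" line suggests; rows_of is a hypothetical helper.

```shell
#!/bin/sh
# Derive the first and last screen rows covered by an xrandr output from
# its WxH+X+Y geometry string. Assumes all outputs share one screen buffer.
# rows_of is a hypothetical helper name.
rows_of() {
    geom=$1
    h=${geom#*x}      # drop "W" and the "x": leaves "H+X+Y"
    h=${h%%+*}        # keep everything before the first "+": "H"
    y=${geom##*+}     # keep everything after the last "+": "Y"
    echo "$y $((y + h - 1))"
}

rows_of 1920x1080+0+0      # VGA1
rows_of 1024x768+0+1080    # LVDS1
```

It prints "0 1079" for VGA1 and "1080 1847" for LVDS1, and 1847 + 1 matches the 1848-row screen height reported by xrandr.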
Comment 17 Chris Wilson 2012-04-23 11:54:59 UTC
The top line wrapping around onto the bottom of the display is still likely to be an issue in the kernel modesetting. On the other hand, the Geeqie corruption definitely sounds like a xf86-video-intel driver bug; 2.13.0 is quite old and a number of corruption bugs have been fixed since then.
Comment 18 Chris Wilson 2012-04-23 11:55:28 UTC
(And if you have an example of a slow X application, I also want to hear about it :)
Comment 19 Chris Wilson 2012-04-23 11:57:35 UTC
Ferenc, if you are happy that the fb corruption is resolved, can you open a new bug for the bad modesetting (and, if you want to try fresh xf86-video-intel drivers, a bug for Xorg Driver/Intel on bugs.freedesktop.org)? Then let's close this particular bug.
Comment 20 Ferenc Wágner 2012-04-23 12:08:30 UTC
Sure, the console is all right now, closing this bug.

I'll look into the remaining issues; trying a newer Xorg driver isn't entirely off the table, so let's start with that.
Thanks all for your assistance!