Created attachment 45112 [details] black stripes over a pidgin conversation.
This has happened since 2.6.38-rc. KMS is enabled. Running a dual-screen display.
Name : xf86-video-intel
Version : 2.14.0-1
Name : mesa
Version : 7.10-1
Name : xorg-server
Version : 1.9.3.901-1
When clicking on the window, the corruption goes away. I'm attaching two screenshots. The stripes appear to be pixels from other portions of the screen. This is running compiz. I haven't tested without it, but I'm doing it now.
Created attachment 45122 [details] white stripes over a pidgin conversation. Forgot to mention: the blue boxes are intentional, for privacy reasons. The screen below them was correctly rendered. Sometimes the screen gets stretched vertically, with additional 1-pixel-wide lines inserted in between (as if it were interlaced).
Created attachment 45202 [details] the google logo rendered by firefox.
Looks like a tiling ddx/mesa bug.
Should I file this bug somewhere else?
You haven't given much to go on. With an unspecified kernel version and no log files, you could easily be reporting one of the short-lived regressions before rc1 or something much more sinister. If it is indeed a kernel regression, then it would have been trivial to bisect...
Will try to bisect this and report back. Bisecting before rc1 has always proven unfruitful for me in the past.
rc1 and rc2 have the same issue. I will try to bisect from 2.6.37 to 2.6.38-rc1, but it's quite hard to catch (I have to use the system for a while for the bug to show itself).
Bisected this between 2.6.37 and 2.6.38-rc1. It came down to this:
----------------
6bda10d152735c22baf1dcd92937420b4b0a359a is the first bad commit
commit 6bda10d152735c22baf1dcd92937420b4b0a359a
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Sun Dec 5 21:04:18 2010 +0000
    drm/i915: Completely disable fence pipelining.
    I'm still seeing tiling corruption of PutImage and CopyArea (I think)
    under mutter on pnv, so obviously the pipelining logic is deeply flawed.
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
:040000 040000 379a8302a494aee383969f5cefe5a166ac040a47 a15f4dce648440a8445da8c784ef7fa44a99ad42 M drivers
--------------------------
I've reverted this on 2.6.38-rc1 and the artifacts appear to be gone. Does this make any sense?
It's a regression.
I doubt your bisection. That patch does exactly what it says and restores the behaviour to 2.6.37.
Right now I'm testing rc4 with the patch reverted, and things work OK. I will build it without reverting and report back, but I'm almost sure of the outcome :( Testing rc1 with the patch reverted did indeed remove the artifacts for me (?????)
Created attachment 47742 [details] slashdot page with artifacts. Kernel 2.6.38-rc4, without the patch reverted, and it didn't take me long to grab another example. I don't know what triggers this, but reverting this patch fixes the issue for me. Is there anything else I can try to get to the bottom of this? Scrolling the webpage makes the artifacts go away and come back.
Chris, I've been testing 2.6.37 and found no screen corruption. Is it possible that another commit, in conjunction with the one from comment 8, is triggering this? Is there a way I can test this theory?
FWIW, I had seen problems in updating parts on the screen. For example just trying to scroll back in rxvt would cause only parts of rxvt window to be redrawn (usually some 100 pixels at the top and always the last line), but the middle of the windows stayed at the old contents. Same for YouTube videos in firefox (with Adobe's plugin). After reverting the patch I couldn't reproduce the problem yet.
Strangely, I also cannot reproduce the problem without reverting the patch when using compiz. Probably because all the graphic output goes through the compositor and takes a different path in the graphics code.
I suspect you're hitting the same problem as me. Please try Daniel's test patch: https://lkml.org/lkml/2011/2/20/25
diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 17bd766..d275c96 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -762,9 +762,6 @@ static int i915_getparam(struct drm_device *dev, void *data,
 	case I915_PARAM_HAS_BLT:
 		value = HAS_BLT(dev);
 		break;
-	case I915_PARAM_HAS_RELAXED_FENCING:
-		value = 1;
-		break;
 	case I915_PARAM_HAS_COHERENT_RINGS:
 		value = 1;
 		break;
If that does not totally fix it, try his other patch too.
This patch can't be fix for anything! It just leaves "value" uninitialized if "param->param" set to I915_PARAM_HAS_RELAXED_FENCING. gcc complains about uninitialized "value", BTW.
That patch is not supposed to be a fix, but a workaround to test where we are failing.
ah. That's why the variable was left at random value...
Whatever the reasons: it does not help at all. I initialized the value with 0, just to be the opposite to the removed case. Or because zeros happen relatively often in uninitialized on-stack variables?
The patch does not fix the issue for me. :( Going to test the first one next.
(In reply to comment #20)
> Whatever the reasons: it does not help at all. I initialized the value with 0,
> just to be the opposite to the removed case. Or because zeros happen relatively
> often in uninitialized on-stack variables?
Alex, you missed that it removes the whole case, not just the value setting. Initializing to 0 does not help, because the userspace code only checks the return value, not the actual value set. You have to remove the I915_PARAM_HAS_RELAXED_FENCING case.
Tomas, too bad it doesn't work for you. Your corruption looks different from the one I get, but it was worth a try.
Indan, you missed that the variable's value becomes UN-FSCKING-initialized!
Read the code. 'value' isn't used if the param is unknown, because it goes to the default case, which returns -EINVAL. So it's impossible to use it uninitialised, whatever gcc says. And if you read the userspace code which actually uses this ioctl, the damn 'value' isn't even used; it just checks the function return value (which is a bug in itself, but oh well). Anyway, I discovered I have lots of screen corruption when running without xcompmgr, so there are even more problems, though they seem to be caused by user space bugginess.
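The control flow argued about above can be sketched in a few lines. This is a minimal stand-alone model, not the kernel source: `mock_getparam`, `has_feature` and the `MOCK_PARAM_*` numbers are hypothetical stand-ins for `i915_getparam()`, the userspace probe and the real `I915_PARAM_*` constants. It only illustrates that, with the case removed, the parameter falls through to the default branch and `value` is never read.

```c
#include <errno.h>

/* Hypothetical stand-ins for the real I915_PARAM_* constants. */
enum mock_param {
	MOCK_PARAM_HAS_BLT = 1,
	MOCK_PARAM_HAS_RELAXED_FENCING = 2,
	MOCK_PARAM_HAS_COHERENT_RINGS = 3,
};

/*
 * Mimics the shape of the getparam switch with the RELAXED_FENCING case
 * removed. On success the answer is stored in *out and 0 is returned.
 * An unknown parameter returns -EINVAL before 'value' is ever read.
 */
static int mock_getparam(int param, int *out)
{
	int value; /* deliberately left uninitialised, as in the thread */

	switch (param) {
	case MOCK_PARAM_HAS_BLT:
		value = 1;
		break;
	case MOCK_PARAM_HAS_COHERENT_RINGS:
		value = 1;
		break;
	default:
		return -EINVAL; /* the removed case ends up here */
	}

	*out = value;
	return 0;
}

/* Userspace-style probe: only the return code decides the answer. */
static int has_feature(int param)
{
	int ignored = 0;

	return mock_getparam(param, &ignored) == 0;
}
```

Under these assumptions `has_feature(MOCK_PARAM_HAS_RELAXED_FENCING)` reports the feature as absent, which is why initialising `value` to 0 while keeping the case changes nothing for a caller that looks only at the return code.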
I see. You should have just replaced the "break;" in that case with "return -EINVAL;" and your intention would be immediately obvious. I re-tested this (with "return -EINVAL;") on my laptop and could not reproduce any picture corruption running without compiz compositor. Chris? Is this change any better than reverting that "disable fence pipelining" commit? If so, could we have it (or something which you deem better) submitted for v2.6.38? Just now 2.6.38 is almost unusable on this chipset.
It's Daniel's patch, not mine. And his original patch set value = 0, which is clear, but as mentioned before, the userspace code ignores that and only checks the return value. It's just a quick patch trying to narrow down the source of the problem; the patch was called "for-indan-2.patch". Daniel is working on the Intel code in his spare time, supposedly for fun ;-), and I'm just a user like you who tries to keep things working for myself. But anyway, it would be nice to have a real fix before the release, or if the source isn't found, at least a temporary hack to disable the faulty code. I'll test with "disable fence pipelining" reverted and tell you if it makes a difference for me.
I tried plain 2.6.38-rc6 with "disable fence pipelining" reverted and I still got the screen corruption, so reverting 6bda10d152735c22baf1dcd doesn't work for me.
But returning EINVAL for I915_PARAM_HAS_RELAXED_FENCING does?
Yes.
Tomas, does returning -EINVAL for I915_PARAM_HAS_RELAXED_FENCING help in your case? (Just replace the "break;" in the case with "return -EINVAL;").
Tomas, what hardware do you have? Is it a gen 2 or 3 chipset? If so, then perhaps we can use this patch for 2.6.38, assuming the fence stuff actually works for someone:
diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 17bd766..d70edcd 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -763,6 +763,8 @@ static int i915_getparam(struct drm_device *dev, void *data,
 		value = HAS_BLT(dev);
 		break;
 	case I915_PARAM_HAS_RELAXED_FENCING:
+		if (INTEL_INFO(dev)->gen <= 3)
+			return -EINVAL;
 		value = 1;
 		break;
 	case I915_PARAM_HAS_COHERENT_RINGS:
But I think it's safer to remove the I915_PARAM_HAS_RELAXED_FENCING case until it gets sorted out properly.
Alex, Tomas already reported that that patch doesn't help his case.
Hi Indan, I've got a 945GMA (gen3?). I'm testing Daniel's first patch and it fixed my issues. Now I'm going to test the last patch you proposed here, with -EINVAL, to see if it makes a difference. Will report back when I'm done with it.
And I spoke too soon. The first patch doesn't fix it either :(
Hi Tomas, Daniel has a real fix at: http://lists.freedesktop.org/archives/dri-devel/2011-February/008658.html Can you give that a try? Are you using X composition? If not, can you try with xcompmgr -a and see if that improves things for you? If it does, it might be the same bug I'm hitting. Do you get it without compiz? Can you also try an older intel X driver, 2.12 or 2.13 or so? Lastly, can you go back to a clean 2.6.38-rc6 kernel, test it with and without 6bda10d152735c22baf1dcd92937420b4b0a359a reverted, and make really sure that reverting it fixes the problem for you?
> --- Comment #34 from Indan <indan@nul.nu> 2011-02-23 01:53:25 ---
> Daniel has a real fix at:
> http://lists.freedesktop.org/archives/dri-devel/2011-February/008658.html
You can safely ignore this patch. It only fixes a very special corruption due to relaxed tiling. This kind of corruption manifests itself in garbage in the lower-left corner of pixmaps (think ui elements) if and only if the height rounded up to the next multiple of 8 is not a multiple of 16. Your corruptions look different.
[lower-left corner means: at most 8 pixels high, at most half the width of the total pixmap]
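The height condition stated above can be written down as a small predicate. This is a minimal sketch of the arithmetic only: the helper names `round_up_pow2` and `height_matches_relaxed_tiling_case` are made up for illustration and do not come from the driver or from Daniel's patch.

```c
#include <stdbool.h>

/* Round x up to the next multiple of 'a' ('a' must be a power of two). */
static unsigned int round_up_pow2(unsigned int x, unsigned int a)
{
	return (x + a - 1) & ~(a - 1);
}

/*
 * True when the pixmap height, rounded up to the next multiple of 8,
 * is not a multiple of 16 (the condition quoted in the comment above).
 */
static bool height_matches_relaxed_tiling_case(unsigned int height)
{
	return round_up_pow2(height, 8) % 16 != 0;
}
```

For example, a height of 17 rounds up to 24, which is not a multiple of 16, so it matches. A height of 9 rounds up to 16 and does not.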
Indan, I've tested 2.6.38-rc6 and artifacts appeared after several hours of usage. I'm now testing 2.6.38-rc6 with 6bda10d152735c22baf1 reverted and I haven't seen artifacts yet. But as you stated, it's much easier to detect a bad build than a good one, and considering the doubt Chris placed on my bisection, I'm going to test this more to be sure. After this test, I will downgrade userspace and check the results. Thanks for the follow-up on this.
I did not see the artifacts with 6bda10d15 reverted. I went back to a clean rc6 with xf86-video-intel 2.13 and have not seen the artifacts there either. Will keep on testing some more. Next, I will try to bisect the xorg driver, but the last time I tried to bisect xf86-video-intel it was impossible (too many changes produced too many short-lived bugs).
Tomas, if you can't easily trigger a bug then don't bother bisecting. But in this case, if reverting 6bda10d15 fixes it for you, then it should be a kernel bug. Chris seems to be working on it; let's wait until he comes up with a fix, because reverting 6bda10d15 apparently causes other bugs.
Indan, thanks for the input. It's difficult in that it doesn't show up 100% of the time, nor in the same test case. But using the computer for half an hour has always shown it. I've been extra careful in bisecting this, testing working kernels for days at a time. Now I'm actually running a clean rc6 with xf86-video-intel 2.13 and it seems to work. It's definitely something intertwined between the xorg driver and the kernel. And of course, I'm always open to testing patches, use cases, etcetera. I'm sure reverting 6bda10 causes other bugs; the commit itself was introduced to revert to previous behaviour just because of that. But somewhere in the mix something broke here, and that's what I wish to find out ;)
After testing for several days, the bug has appeared. I don't know what else to do. I will revert 6bda10 again and test more exhaustively.
Are you sure it's the same corruption, or is it slightly different? It could be another bug.
The only difference was the time it took to appear.
I noticed the graphic glitches in 2.6.38 final with my Intel GMA3150. An updated version of the patch in comment #31 seems to fix the problem for me.
Created attachment 51002 [details] updated version of the patch in comment #31 for 2.6.38 stable
Daniel's patch that fixed the problem from comment 31 in upstream was reverted because it caused problems for some gen >4 hardware. It's fixed in userspace with libdrm 2.4.24 and xf86-video-intel newer than 2011-02-22 instead. This is unrelated to tomas's problem. Richard, I think GMA3150 is newer than gen 3, so that patch shouldn't change anything for you... Tomas, can you confirm that with 6bda10 you don't get any corruptions?
> --- Comment #45 from Indan <indan@nul.nu> 2011-03-17 00:11:41 ---
> Richard, I think GMA3150 is newer than gen 3, so that patch shouldn't change
> anything for you...
GMA3510 is a gen3 device (codename g33/pineview in the source). It's essentially a i945 with a bigger gtt and hw virtualization support. Only GMA devices with the X in the marketing name are gen4 and later (e.g GMA X3100).
(In reply to comment #46)
> GMA3510 is a gen3 device (codename g33/pineview in the source). It's
> essentially a i945 with a bigger gtt and hw virtualization support. Only
> GMA devices with the X in the marketing name are gen4 and later (e.g GMA
> X3100).
I grepped the source, but didn't find any mentioning of GMA, nor 3510, but a quick internet search showed that it was an Atom, so I expected it would be "new". Intel marketing is a nightmare to navigate. If you go to http://intellinuxgraphics.org/documentation.html there's no mapping from the model names to the docs or chipset generation. Closest thing I found is i915_drv.c, but that uses other code names too. :-(
> Tomas, can you confirm that with 6bda10 you don't get any corruptions?
If you mean with 6bda10 reverted, then yes. I've been running rc6, rc7, rc8 and final with 6bda10 reverted and I have seen no corruption. I did try the final release without reverting 6bda10 and the glitches did appear.
I'm seeing glitches in Firefox, urxvt and awesome (those I use most often) too. Distro is ArchLinux, hardware is Thinkpad x60s.
packages:
xorg-server 1.9.4.901-1
xf86-video-intel 2.14.0-3
libdrm 2.4.23-2
intel-dri 7.10.1-1
hardware: 945GM
Damjan, have you tried reverting 6bda10? Could you post your results?
Created attachment 53312 [details] Image corruption with 2.6.32 I wanted to check if the corruption was happening with the Debian 2.6.32-5-amd64 kernel, too. Before upgrading to Debian's Squeeze release I haven't experienced these issues although I had been using 2.6.32+27~bpo.5+1. So I suspect there could be some user space code that triggers the bug.
Is this the same bug as bug #26002 ?
On Fri, Apr 15, 2011 at 08:34:03AM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> --- Comment #52 from Eddy Petrișor <eddy.petrisor+linbug@gmail.com> 2011-04-15 08:33:58 ---
> Is this the same bug as bug #26002 ?
Maybe, maybe not. Generally with corruptions and gpu hangs it's better to handle bugs separately until a root cause is clearly identified (with either a fix or at least pointing to the same culprit when it's a regression). There are simply too many ways corruptions can show up and gpus can hang. If you hit something like this it's easiest if you could ask us on irc (#intel-gfx on freenode) - it's much quicker to triage such things interactively.
-Daniel
The problem still persists in 2.6.39-rc4 (it looks like it is even worse than in 2.6.38). A "man wmii" (probably any long enough page) in urxvt is absolutely unusable after paging back and forth a few times. After the artifacts (all over the urxvt window) appear, the window starts blinking, roughly once every 5 seconds. To confirm it is not wmii, I tried twm with almost the same effect: the blinking only happens once, ~10 seconds after the last attempt to change the window contents. I used a very minimal environment, just X, window manager, urxvt and man. As before, I couldn't reproduce the situation in exactly the same way under a full desktop environment (an XFCE desktop, with and without Compiz), but there were small quirks here and there: a missing pixel, sluggish or jerky scrolling in a terminal. Just do a long listing ("find /"), then start scrolling the terminal up one line at a time. After a while scrolling pauses. If I release the buttons at that point, the window will be updated after a while (but to something slightly shifted up or down by half a symbol or so). And I also get these:
[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
many times in a row. Sometimes they coincide with a really weird display (kind of expected).
On Tue, Apr 19, 2011 at 10:37 PM, <bugzilla-daemon@bugzilla.kernel.org> wrote:
> [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Your gpu hung. Yes, we're doing quite some work to make this non-fatal by resetting the chip (and if that fails switching to sw rendering). But it's kinda expected to get graphics corruptions when that happens. The kernel should dump an error state in the file i915_error_state in debugfs. Can you please rehang your gpu and attach that?
(In reply to comment #55)
> On Tue, Apr 19, 2011 at 10:37 PM, <bugzilla-daemon@bugzilla.kernel.org> wrote:
> > [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
>
> Your gpu hung.
The case when "GPU hung" happens is VERY rare. What about all the "scrolling in urxvt" cases?
> The kernel should dump an error state in the file i915_error_state in
> debugfs. Can you please rehang your gpu and attach that?
Will do, but as I said, such cases are rare.
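One way to capture the error state for attaching is to copy the debugfs file to a regular file right after the hang. The helper below is a minimal sketch in plain C. `copy_file` is a hypothetical name, and the path used in the usage note assumes debugfs is mounted at /sys/kernel/debug and that the Intel GPU is DRM minor 0. Neither assumption comes from this thread.

```c
#include <stdio.h>

/*
 * Copy the whole contents of 'src_path' into 'dst_path'.
 * Returns the number of bytes copied, or -1 on any error.
 */
static long copy_file(const char *src_path, const char *dst_path)
{
	FILE *src = fopen(src_path, "rb");
	FILE *dst;
	char buf[4096];
	size_t n;
	long total = 0;

	if (!src)
		return -1;

	dst = fopen(dst_path, "wb");
	if (!dst) {
		fclose(src);
		return -1;
	}

	while ((n = fread(buf, 1, sizeof(buf), src)) > 0) {
		if (fwrite(buf, 1, n, dst) != n) {
			total = -1; /* short write */
			break;
		}
		total += (long)n;
	}

	if (ferror(src))
		total = -1; /* read error rather than end of file */

	fclose(src);
	if (fclose(dst) != 0)
		total = -1; /* flush failed */

	return total;
}
```

A call such as `copy_file("/sys/kernel/debug/dri/0/i915_error_state", "i915_error_state.txt")` would then save the dump. Reading debugfs normally requires root.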
Just reporting that I cannot reproduce this in the 2.6.39-rc series. To be honest, I have only tested since rc4 because I could not boot before that.
2.6.39-rc7: scrolling still stalls.
The "scrolling stalls" bisected to a7a09aebe8c0dd2b76c7b97018a9c614ddb483a5:
    drm/i915: Rework execbuffer pinning
    Avoid evicting buffers that will be used later in the batch in order to
    make room for the initial buffers by pinning all bound buffers in a
    single pass before binding (and evicting for) fresh buffer.
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
I am not able to simply revert it; the code is heavily modified and moved into a file of its own.
To Alex Riesen and the scrolling stalls: Please file a separate bug for that, preferably on bugs.freedesktop.org. To everyone else: This bug report got out of hand and mixes too many issues. If you still have hangs, graphics corruptions and other issues on the latest code, please file a new bug report. Thanks everyone for reporting issues.
Haven't seen the scrolling bug for a good while. Will file a report on freedesktop.org if I see it again.