Bug 60530
Created attachment 106825
Password prompt screen right after resume (camera shot)
Right after resume, the screen doesn't look like the file attached to the Description (that is what ksnapshot saved). The "lock screen" password prompt looks like the camera shot attached in comment #1 (scaled down due to the BZ attachment size limit).

Created attachment 106826
Desktop after typing in the password
After I type in the password, the prompt goes away and the desktop looks like this (a corrupted application window is visible).
So far, I haven't been able to reproduce the problem on the IVB-based machine with 3.10.0-rc7, so I'm going to revert commit 19b2dbd and retest.

So far, with commit 19b2dbd reverted, I haven't been able to reproduce the problem.

Easy enough to check: please run intel_reg_dumper before suspend and after resume.

Created attachment 106828
intel_reg_dumper output before suspend
Created attachment 106829
intel_reg_dumper output after resume
Created attachment 106830
intel_reg_dumper output without commit 19b2dbd before suspend
Created attachment 106831
intel_reg_dumper output without commit 19b2dbd after resume
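Comparing the before/after dumps above by eye is tedious. The following is a minimal C sketch, not part of intel-gpu-tools, that lists the registers whose values changed between two dumps. It assumes each relevant dump line has the form NAME: 0xVALUE, optionally followed by a decoded description in parentheses (an assumption about the output format); the names parse_dump and diff_reg_dumps are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_REGS 512
#define NAME_LEN 64

/* One parsed "NAME: 0xVALUE" line. */
struct reg_entry {
	char name[NAME_LEN];
	unsigned long value;
};

/* Parse a dump held in memory into 'out'; returns the number of entries.
 * Lines without "NAME:" followed by a hex number are skipped. */
static int parse_dump(const char *text, struct reg_entry *out, int max)
{
	int n = 0;
	const char *line = text;

	while (*line != '\0' && n < max) {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);
		const char *colon = memchr(line, ':', len);

		if (colon != NULL) {
			const char *name = line;
			char *end;
			unsigned long v;

			/* Skip the padding used to right-align register names. */
			while (name < colon && *name == ' ')
				name++;
			v = strtoul(colon + 1, &end, 16);
			/* Accept only a number that was parsed on this same line. */
			if (end != colon + 1 && end <= line + len &&
			    (size_t)(colon - name) < NAME_LEN) {
				memcpy(out[n].name, name, (size_t)(colon - name));
				out[n].name[colon - name] = '\0';
				out[n].value = v;
				n++;
			}
		}
		line = eol ? eol + 1 : line + len;
	}
	return n;
}

/* Print every register present in both dumps whose value differs,
 * and return how many such registers were found. */
static int diff_reg_dumps(const char *before, const char *after, FILE *out)
{
	/* static to keep the arrays off the stack; not reentrant */
	static struct reg_entry a[MAX_REGS], b[MAX_REGS];
	int na = parse_dump(before, a, MAX_REGS);
	int nb = parse_dump(after, b, MAX_REGS);
	int changed = 0;

	for (int i = 0; i < na; i++) {
		for (int j = 0; j < nb; j++) {
			if (strcmp(a[i].name, b[j].name) != 0)
				continue;
			if (a[i].value != b[j].value) {
				fprintf(out, "%s: 0x%08lx -> 0x%08lx\n",
					a[i].name, a[i].value, b[j].value);
				changed++;
			}
			break;
		}
	}
	return changed;
}
```

The nested lookup is quadratic, which is fine for a few hundred registers; sorting both tables by name would be the obvious refinement for larger dumps.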
Without commit 19b2dbd the problem is definitely not reproducible for me (on two different machines).

Hmm, I was expecting intel_reg_dumper to include the fence registers.

Well, if you have a debug tool that'll give you the information you need and that I can run on 64-bit, please let me know.

I've encountered the same problem on an Asus laptop, although the corruption doesn't seem to be quite as bad for me. Reverting 19b2dbd fixes the issue.

vendor_id : GenuineIntel
cpu family : 6
model : 58
model name : Intel(R) Core(TM) i7-3630QM CPU @ 2.40GHz
stepping : 9
microcode : 0x12

Created attachment 106842
intel_reg_dump before hibernate
Created attachment 106843
intel_reg_dumper output after hibernate, before standby
Created attachment 106844
intel_reg_dumper output after hibernate and standby
I had this issue on my Acer Aspire 5750G laptop as well.

Vendor ID: GenuineIntel
CPU family: 6
Model: 42
Model name: Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
Stepping: 7

It also happened on hibernate (suspend-to-disk). I'm using the past tense here since, ironically, after installing intel-gpu-tools (no kernel update), the corruption no longer occurred, on either standby or hibernate. In case it interests you, I have attached the reg dumps. A remaining problem is that resume seems to break power management: the CPU core temperature, normally under 50°C, skyrockets to over 70°C in about 2 minutes, so I have to restart my computer to avoid endangering the hardware. (Is this a separate bug?)

I suspended to disk and then twice to RAM in a row with no problem. This is an Ivy Bridge desktop PC (Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz) running kernel 3.10. I am using KDE 4.11 beta 2. I read that KDE 4.11 ships some changes to the code handling suspend/resume; I don't know if that matters...

The overheating issues seemed to be unrelated and appear fixed after updating the 'intel-dri' package to version 9.1.4-3 (Arch Linux). Graphical glitches are still present after resume.

So first, a call to order: this bug seems to have caught the attention of Google and has already gathered a few me-too reports. If you think this describes your issue, please first check whether reverting the offending commit (19b2dbde5732170a03bd82cc8bd442cf88d856f7 upstream) works around it. If it doesn't, please file a new bug report. Otherwise we'll quickly end up with a mess of contradicting reporters...

Now, on the bug itself: this looks very much like we've lost track of fences (i.e. the screenshots are rather typical of this, not random garbage). That is strange, since that commit fixes a different case of "lost track of fences". I'll hunt down a few theories and hopefully will have a patch soon.

Created attachment 106848
test patch 1: unmap tiled objects from userspace before suspend
Created attachment 106849
test patch 2: restore fence regs after gem hw init completes
Please test the two attached patches; they should undo whatever the other patch might have changed accidentally, without breaking that fix. In my opinion, patch 2 has the better chance of working out.
Ok, I've pushed an updated intel_reg_dumper to:
http://cgit.freedesktop.org/xorg/app/intel-gpu-tools/
Please grab the latest git from there and attach new reg dumps for working and broken kernels. Also, please attach the contents of dri/0/i915_gem_gtt from debugfs for both cases so that we can correlate the registers.

Created attachment 106850
reg_dumps from commit 7d13205
Taken with the git version of intel-gpu-tools (installed via 'yaourt -S intel-gpu-tools-git' on my Arch laptop).
The power management problems seem to persist after downgrading to 7d1320; I'm looking for that issue on the bug tracker separately now.

Created attachment 106851
reg_dumps on 3.10 (commit 8bb495e)
reg_dumps from commit 8bb495e (with regressions), taken with the same version of intel-gpu-tools as the dumps from commit 7d13205.
Ignore what I said earlier about the temperatures; I strongly suspect the crappy cooling in this laptop is at fault.
After applying the first test patch on master I get this build error:

drivers/gpu/drm/i915/i915_gem.c: In function 'i915_gem_reset_fences':
drivers/gpu/drm/i915/i915_gem.c:2149:26: error: 'obj' undeclared (first use in this function)
  i915_gem_release_mmap(obj);
                        ^
drivers/gpu/drm/i915/i915_gem.c:2149:26: note: each undeclared identifier is reported only once for each function it appears in

Or was I supposed to apply the two patches together?

Created attachment 106852
reg_dumps for master with patch 1 applied
I took a look at the first patch and concluded that it was supposed to say 'reg->obj' instead of 'obj'. With that change it compiled fine, but the display corruption was still there.
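The build error is the classic symptom of using a per-register field without going through the loop variable. Below is a hypothetical, heavily simplified model of the pattern (the types and function names are invented for illustration and are not the actual i915 code or the test patch): it walks a fence table, reaches each object through reg->obj, and skips registers with nothing attached.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_FENCES 16 /* hypothetical; the real count depends on the GPU generation */

/* Hypothetical stand-in for a GEM buffer object. */
struct gem_object {
	int user_mapped; /* nonzero while userspace holds a GTT mapping */
};

/* Hypothetical stand-in for one fence register's bookkeeping. */
struct fence_reg {
	struct gem_object *obj; /* object bound to this fence, or NULL */
};

/* Stand-in for i915_gem_release_mmap(): drop the userspace mapping so the
 * next CPU access faults and goes through the fence setup path again. */
static void release_mmap(struct gem_object *obj)
{
	obj->user_mapped = 0;
}

/* Release the userspace mapping of every object that holds a fence.
 * The object is reached through reg->obj; there is no standalone 'obj'
 * in this scope, which is what the compiler complained about.
 * Returns the number of objects released. */
static int reset_fences(struct fence_reg *regs, int n)
{
	int released = 0;

	for (int i = 0; i < n; i++) {
		struct fence_reg *reg = &regs[i];

		if (reg->obj == NULL)
			continue;
		release_mmap(reg->obj);
		released++;
	}
	return released;
}
```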
Created attachment 106853
reg_dumps for master with patch 2 applied
Patch 2 compiled correctly but did not fix the bug for me (returning to 3.10-rc6 for now).
At least on my SNB here, I can only reproduce the garbage when enabling the UXA xf86-video-intel backend, not with SNA. Can everyone who sees this please check in Xorg.log which backend is in use (just grep for UXA|SNA)?

I used UXA after I got graphical glitches from SNA in Mozilla Firefox (window contents of other applications appearing in images), but using SNA seems to fix this resume issue, even for the latest mainline release.

(In reply to Jona Stubbe from comment #32)
> I used UXA after I got graphical glitches from SNA in Mozilla Firefox
> (window contents of other applications appearing in images).

For the record, they are fixed by

commit daa13e1ca587bc773c1aae415ed1af6554117bd4
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Jun 28 16:54:08 2013 +0100

    drm/i915: Only clear write-domains after a successful wait-seqno

(In reply to Chris Wilson from comment #33)
> For the record, they are fixed by
> commit daa13e1ca587bc773c1aae415ed1af6554117bd4
> drm/i915: Only clear write-domains after a successful wait-seqno

Note that this commit is currently only available in linux-next or drm-intel-nightly. I'll send the pull to Dave for it shortly, but I guess it'd be useful if people could retest whether this fixes the issue; the regression fixed by this patch is a rather old one.

I am the one who reported not being affected by this resume issue (comment 19), though I am on Ivy Bridge. I am indeed using the SNA backend.

That's actually reassuring, since trying to ascribe this to a difference between UXA and SNA across resume was worrisome. I guess the reason Daniel found it easier to reproduce with UXA is that UXA tends to use fences a lot more than SNA.
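The backend check is a one-line grep; for completeness, here is a minimal C equivalent that reports the first backend name found in an Xorg log stream. The function name xorg_accel_backend is hypothetical, and the log path in the usage note is a common default that varies between distributions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Scan an Xorg log stream and return "UXA" or "SNA" for the first line
 * that mentions either (UXA wins if one line mentions both), or NULL if
 * neither appears. Lines longer than the buffer are split by fgets(), so
 * a name straddling the split would be missed; fine for a quick check. */
static const char *xorg_accel_backend(FILE *log)
{
	char line[512];

	while (fgets(line, sizeof line, log) != NULL) {
		if (strstr(line, "UXA") != NULL)
			return "UXA";
		if (strstr(line, "SNA") != NULL)
			return "SNA";
	}
	return NULL;
}
```

For example, open /var/log/Xorg.0.log with fopen(path, "r") and print the result if it is non-NULL; a line such as "(II) intel(0): SNA initialized with IvyBridge backend", as quoted later in this thread, yields "SNA".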
I am using OpenELEC (http://openelec.tv/news/22-releases/99-testing-openelec-3-1-2-released) with the 3.10 kernel (Mesa-9.1.4, xf86-video-intel-2.21.11) and I might be affected by this too (i3-3220 Ivy Bridge). After resume from suspend, XBMC shows a lot of distortion on the screen:
http://i.imgur.com/ktxJ9uG.jpg
http://i.imgur.com/4JrRGVt.jpg
The issue is reported to OpenELEC here: https://github.com/OpenELEC/OpenELEC.tv/issues/2453
Both UXA and SNA have the problem.

On Thursday, July 11, 2013 07:37:43 AM bugzilla-daemon@bugzilla.kernel.org wrote:
> --- Comment #31 from Daniel Vetter <daniel@ffwll.ch> ---
> At least on my snb here I can only reproduce garbage when enabling the UXA
> xf86-video-intel backend, not with SNA. Can everyone who sees this please
> check
> in Xorg.log which backend is in use (just grep for UXA|SNA)?

UXA for me.

I can reproduce this quite easily with the SNA backend: Intel HD 4000, using the latest 3.10.1 kernel on Arch Linux. On almost every resume, gnome-shell becomes totally unusable, covered in graphical garbage and artifacts.

(In reply to Daniel Vetter from comment #31)
> At least on my snb here I can only reproduce garbage when enabling the UXA
> xf86-video-intel backend, not with SNA. Can everyone who sees this please
> check in Xorg.log which backend is in use (just grep for UXA|SNA)?

On my laptop:
[ 30.592] (II) intel(0): SNA initialized with IvyBridge backend

Created attachment 106910
correctly restore fence registers with objects attached
Please test the attached patch and report whether it works or whether I need to go back to banging my head against the wall.
I tested it a bit and it fixes the bug for me (ThinkPad Edge E530 (Intel 3rd gen), 3.10.1 with pf patchset).

Ok, I think this is the real fix (I've gotten other confirmations on IRC, too). Patch merged to drm-intel-fixes; it will get forwarded soon and then trickle back to the stable trees:

commit 94a335dba34ff47cad3d6d0c29b452d43a1be3c8
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Wed Jul 17 14:51:28 2013 +0200

    drm/i915: correctly restore fences with objects attached

Thanks everyone for reporting this issue and testing stuff.

*** Bug 60598 has been marked as a duplicate of this bug. ***

(In reply to Daniel Vetter from comment #43)
> Patch merged to drm-intel-fixes, will get forwarded soon and then
> trickle back to stable trees:
>
> commit 94a335dba34ff47cad3d6d0c29b452d43a1be3c8

Is there anything that needs to be done for this to be backported? It seems that 3.10.3 and 3.10.4 still do not contain the fix. It would be really awesome not to have to restart gnome-shell after every suspend cycle :-)

No backport is required to apply this to 3.10.1+; I already use it with that version (plus the pf patchset) and have no issues.

Sorry, I actually meant getting the fix into the mainline 3.10.x series. At least 3.10.2, .3, and .4 do not contain the fix.

I too just came across this issue and this patch, on a 3.10.4 kernel. The issue is easily triggered by suspending while (e.g.) glxgears is running; three reboots in a row confirmed this (each time I ran glxgears, suspended and resumed, the display was garbled). With the proposed patch I can't trigger the issue anymore across multiple suspend/resume cycles.
I took commit 94a335dba34ff47cad3d6d0c29b452d43a1be3c8 from Linus' tree and applied it to 3.10.4.

Which upstream kernel version will the patch be included in? Is it fixed in the new 3.10.5?

3.10.5 should have the fix:

commit 19a280cac37e30243023a7f53651504a135ac960
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Wed Jul 17 14:51:28 2013 +0200

    drm/i915: correctly restore fences with objects attached

    commit 94a335dba34ff47cad3d6d0c29b452d43a1be3c8 upstream.

    To avoid stalls we delay tiling changes and especially hold of
    committing the new fence state for as long as possible.
    Synchronization points are in the execbuf code and in our gtt fault
    handler.

    Unfortunately we've missed that tricky detail when adding proper
    fence restore code in

    commit 19b2dbde5732170a03bd82cc8bd442cf88d856f7
    Author: Chris Wilson <chris@chris-wilson.co.uk>
    Date:   Wed Jun 12 10:15:12 2013 +0100

        drm/i915: Restore fences after resume and GPU resets

    The result was that we've restored fences for objects with no tiling,
    since the object<->fence link still existed after resume. Now that
    wouldn't have been too bad since any subsequent access would have
    fixed things up, but if we've changed from tiled to untiled real
    havoc happened:

    The tiling stride is stored -1 in the fence register, so a stride of 0
    resulted in all 1s in the top 32bits, and so a completely bogus fence
    spanning everything from the start of the object to the top of the
    GTT. The tell-tale in the register dumps looks like:

    FENCE START 2: 0x0214d001
    FENCE END 2: 0xfffff3ff

    Bit 11 isn't set since the hw doesn't store it, even when writing all
    1s (at least on my snb here).

    To prevent such a gaffle in the future add a sanity check for fences
    with an untiled object attached in i915_gem_write_fence.

    v2: Fix the WARN, spotted by Chris.

    v3: Trying to reuse get_fences looked ugly and obfuscated the code.
    Instead reuse update_fence and to make it really dtrt also move the
    fence dirty state clearing into update_fence.
    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=60530
    Cc: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Stéphane Marchesin <marcheu@chromium.org>
    Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
    Tested-by: Matthew Garrett <matthew.garrett@nebula.com>
    Tested-by: Björn Bidar <theodorstormgrade@gmail.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
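The stride arithmetic described in the commit message can be checked with a small model. The following C sketch assumes a Sandy Bridge-style 64-bit fence value, paraphrasing the driver source of that era rather than quoting it: start page in bits 31:12, end page in bits 63:44, the pitch field (stride / 128 - 1) starting at bit 32, bit 1 for Y tiling and bit 0 for valid. The names encode_fence and write_fence_checked are hypothetical. Because the pitch is computed in unsigned 32-bit arithmetic, a stride of 0 wraps to 0xffffffff and, once shifted, overwrites the whole upper dword, which is where the end address lives.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified gen6-style fence layout (assumed, for illustration only). */
#define FENCE_VALID       (1ull << 0)
#define FENCE_TILING_Y    (1ull << 1)
#define FENCE_PITCH_SHIFT 32
#define PAGE_MASK32       0xfffff000u

enum tiling { TILING_NONE, TILING_X, TILING_Y };

/* Encode a fence covering [gtt_offset, gtt_offset + size) with the given
 * stride in bytes. Mirrors the unsigned "stride / 128 - 1" computation,
 * which wraps to 0xffffffff when stride is 0. */
static uint64_t encode_fence(uint32_t gtt_offset, uint32_t size,
			     uint32_t stride, enum tiling tiling)
{
	uint64_t val;

	val = (uint64_t)((gtt_offset + size - 4096) & PAGE_MASK32) << 32;
	val |= gtt_offset & PAGE_MASK32;
	val |= (uint64_t)(stride / 128 - 1) << FENCE_PITCH_SHIFT;
	if (tiling == TILING_Y)
		val |= FENCE_TILING_Y;
	return val | FENCE_VALID;
}

/* Hypothetical checked writer in the spirit of the fix's sanity check:
 * an untiled object must never be given a fence, so refuse (the real
 * driver emits a WARN) and leave the register untouched.
 * Returns 0 on success, -1 if the request was rejected. */
static int write_fence_checked(uint64_t *reg, uint32_t gtt_offset,
			       uint32_t size, uint32_t stride,
			       enum tiling tiling)
{
	if (tiling == TILING_NONE || stride == 0)
		return -1;
	*reg = encode_fence(gtt_offset, size, stride, tiling);
	return 0;
}
```

With a start offset of 0x0214d000, the low dword comes out as 0x0214d001, matching FENCE START 2 in the tell-tale dump, while with a stride of 0 the high dword is 0xffffffff; the dump shows 0xfffff3ff because, per the commit message, the hardware does not store every bit.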
Created attachment 106824
Snapshot taken with ksnapshot

As described in the linked message:

I've just started to play with a new Acer Aspire S5 test box and noticed that garbage is displayed after resume from suspend to RAM with the 3.10 kernel (under KDE 4.10.3 on openSUSE 12.3). The display corruption goes away after killing X and restarting it. The CPU is a Core i5-3317U (Ivy Bridge) with i915 graphics.

That doesn't happen with 3.9 (same config otherwise). It also turns out to happen on an SNB-based machine (not 100% of the time).