Bug 51181 - [GM45] GPU hang
Summary: [GM45] GPU hang
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - Intel)
Hardware: All Linux
Importance: P1 normal
Assignee: intel-gfx-bugs@lists.freedesktop.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-12-01 11:17 UTC by Mircea Gherzan
Modified: 2013-01-16 14:19 UTC

See Also:
Kernel Version: 3.7.0-rc7+
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
kernel log (66.15 KB, text/plain)
2012-12-01 11:17 UTC, Mircea Gherzan
i915_error_state (203.45 KB, application/x-gzip)
2012-12-04 23:38 UTC, Mircea Gherzan

Description Mircea Gherzan 2012-12-01 11:17:52 UTC
Created attachment 88001
kernel log

X11 will become unusable at a random point in time, with the title bars of the windows (GNOME 3) disappearing.
Comment 1 Daniel Vetter 2012-12-01 11:38:01 UTC
Please rehang your gpu and then attach the i915_error_state file to this bug report (you likely need to gzip it due to stupid kernel bugzilla restrictions). Also:
- is this a regression?
- please attach the versions of all the userspace components of the gfx driver stack you've installed (libdrm, mesa, xf86-video-intel).
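
A minimal sketch of the capture step requested above, assuming debugfs is mounted at /sys/kernel/debug and the GPU is DRM card 0 (both are assumptions and may differ per system). The helper name gzip_copy is hypothetical. Reading the error state typically requires root.

```python
import gzip
import shutil
from pathlib import Path

# Assumed debugfs location of the i915 error state; the card index (0)
# and the debugfs mount point may differ on your system.
ERROR_STATE = Path("/sys/kernel/debug/dri/0/i915_error_state")


def gzip_copy(src, dst):
    """Stream src into a gzip-compressed dst; return the size of dst in bytes."""
    src, dst = Path(src), Path(dst)
    # Stream in chunks: debugfs files report a size of 0, so read until EOF.
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst.stat().st_size
```

After a hang, calling gzip_copy(ERROR_STATE, "i915_error_state.gz") as root should produce a file small enough to attach.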
Comment 2 Mircea Gherzan 2012-12-02 15:15:42 UTC
Yes, this is a regression.

mesa: 8.0.5
libdrm: 2.4.33
xf86-video-intel: 2.19.0
Comment 3 Daniel Vetter 2012-12-02 15:30:34 UTC
So I guess kernel 3.6 worked, or does this regression go back further? Also, a bisect would be awesome.
Comment 4 Mircea Gherzan 2012-12-04 12:54:44 UTC
Yes, the 3.6 kernel worked. The regression was introduced somewhere between 3.6 and 3.7-rc1.
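
For reference, the range above maps onto a git bisect session roughly as sketched below. The helper bisect_start_command is hypothetical and only builds the command line; limiting the bisect to drivers/gpu/drm is an assumption that the regression lives in the DRM code, so drop the path if in doubt.

```python
def bisect_start_command(good, bad, paths=()):
    """Build the 'git bisect start' command for a known good/bad pair.

    git expects the bad revision first, then the good one, optionally
    followed by '--' and paths that restrict which commits are tested.
    """
    cmd = ["git", "bisect", "start", bad, good]
    if paths:
        cmd += ["--", *paths]
    return " ".join(cmd)


# Kernel release tags for the range reported in this bug.
print(bisect_start_command("v3.6", "v3.7-rc1"))
print(bisect_start_command("v3.6", "v3.7-rc1", paths=["drivers/gpu/drm"]))
```

Each step then means building and booting the kernel git checks out and marking it with git bisect good or git bisect bad. Since the hang here can take more than 3 hours to show up, a kernel should only be marked good after a correspondingly long test run.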
Comment 5 Mircea Gherzan 2012-12-04 23:38:30 UTC
Created attachment 88471
i915_error_state
Comment 6 Jesse Barnes 2012-12-12 19:29:52 UTC
Mircea, any chance you can bisect since this seems to be reproducible for you?
Comment 7 Mircea Gherzan 2012-12-17 23:02:59 UTC
Sorry, but this is really hard to reproduce, in the sense that there is no way to trigger the bug. Sometimes it occurs after a quarter of an hour and sometimes it takes more than 3 hours. I want to help, but I need a way to trigger this behaviour. Is there any chance you can tell from the error state how to force this bug?
Comment 8 Daniel Vetter 2012-12-18 11:00:00 UTC
Please try out the patch at

https://patchwork.kernel.org/patch/1885411/

It has a decent chance to reduce GTT thrashing, which might be good enough to
duct-tape over the hangs again. Or maybe it changes the pattern so that you can
reproduce it much quicker. In any case, should be interesting ...
Comment 9 Mircea Gherzan 2012-12-21 11:56:09 UTC
I applied the patch on top of vanilla 3.7 and the GPU no longer hangs. However, I haven't really used the machine a lot. I will try to use it more over the next few days and report back.
Comment 10 Daniel Vetter 2013-01-16 14:19:45 UTC
Presumably fixed with

commit 262b6d363fcff16359c93bd58c297f961f6e6273
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Jan 15 16:17:54 2013 +0000

    drm/i915: Invalidate the relocation presumed_offsets along the slow path

and papered over by

commit 93927ca52a55c23e0a6a305e7e9082e8411ac9fa
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Thu Jan 10 18:03:00 2013 +0100

    drm/i915: Revert shrinker changes from "Track unbound pages"

and

commit 901593f2bf221659a605bdc1dcb11376ea934163
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Dec 19 16:51:06 2012 +0000

    drm: Only evict the blocks required to create the requested hole

Thanks a lot for reporting this regression and please reopen if you're still experiencing issues. Note that the first commit is only available in drm-intel-fixes for now.
