Bug 59941 - Frequent GPU hangs since v3.9
Summary: Frequent GPU hangs since v3.9
Status: RESOLVED INVALID
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - Intel)
Hardware: All Linux
Importance: P1 normal
Assignee: intel-gfx-bugs@lists.freedesktop.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-06-20 07:39 UTC by Chow Loong Jin
Modified: 2013-06-25 09:01 UTC

See Also:
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:


Attachments
i915_error_state from 3.10-rc6 (62.75 KB, application/x-xz; charset=binary)
2013-06-25 06:21 UTC, Chow Loong Jin
glxinfo -l output (14.24 KB, text/plain)
2013-06-25 07:11 UTC, Chow Loong Jin

Description Chow Loong Jin 2013-06-20 07:39:54 UTC
Since v3.9, when running Ubuntu 13.04 with the X and Mesa stack from https://launchpad.net/~xorg-edgers/+archive/ppa, I have been getting frequent hangs on my Thinkpad E220S running with a Sandy Bridge (8086:0116/Core i5-2537M) GPU:

[10121.411125] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[10121.411131] [drm] capturing error event; look for more information in /sys/kernel/debug/dri/0/i915_error_state
--
[53978.637042] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
--
[68285.759379] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[68285.759404] [drm:kick_ring] *ERROR* Kicking stuck semaphore on render ring
--
[69825.557728] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[69825.557740] [drm] capturing error event; look for more information in /sys/kernel/debug/dri/0/i915_error_state

In v3.9.x, X would freeze up, and I would have to kill X via Alt+SysRq+K in order to get out of the situation, but in v3.10 it looks like it's at least able to unhang itself without crashing X.
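
To put a number on "frequent", the hangcheck lines in a log like the excerpt above can be pulled out and the gaps between them computed. The following is a minimal sketch, not part of the original report; the function names `hang_timestamps` and `hang_intervals` are hypothetical, and the pattern assumes the exact `[drm:i915_hangcheck_hung]` prefix shown in the excerpt.

```python
import re

# Matches kernel log lines of the form quoted in the report, e.g.
# [10121.411125] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
# The bracketed number is the kernel uptime in seconds.
HANG_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+\[drm:i915_hangcheck_hung\]")


def hang_timestamps(log_text):
    """Return the uptime (seconds) of every hangcheck report in log_text."""
    stamps = []
    for line in log_text.splitlines():
        match = HANG_RE.match(line)
        if match:
            stamps.append(float(match.group(1)))
    return stamps


def hang_intervals(stamps):
    """Return the gaps, in seconds, between consecutive hang reports."""
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]
```

Feeding the text of `dmesg` output to `hang_timestamps` and passing the result to `hang_intervals` gives the time between hangs, which may help when comparing kernel or Mesa versions.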
Comment 1 Daniel Vetter 2013-06-24 16:46:52 UTC
We need the error state file from debugfs (please attach as gzip to fit into bugzilla size limits).
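
One way to produce the requested gzip file is sketched below. This is an illustration only, under the assumption that debugfs is mounted and the script runs with enough privileges (typically root) to read the path named in the kernel log; the function name `save_error_state` and the output file name are hypothetical.

```python
import gzip
import shutil

# Path quoted in the kernel log in the description. Reading it normally
# requires root and a mounted debugfs (an assumption about the setup).
ERROR_STATE = "/sys/kernel/debug/dri/0/i915_error_state"


def save_error_state(src=ERROR_STATE, dst="i915_error_state.gz"):
    """Copy the error state dump from src to dst, gzip-compressed."""
    # debugfs files report a size of 0, so stream until EOF rather than
    # relying on the file size.
    with open(src, "rb") as infile, gzip.open(dst, "wb") as outfile:
        shutil.copyfileobj(infile, outfile)
    return dst
```

Calling `save_error_state()` with no arguments writes `i915_error_state.gz` to the current directory, ready to attach.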
Comment 2 Chow Loong Jin 2013-06-24 17:04:45 UTC
Here you go: http://people.ubuntu.com/~hyperair/i915-error-state
Comment 3 Daniel Vetter 2013-06-24 17:36:02 UTC
We seem to be executing (or trying to) complete garbage ... or the error state capture code is doing something funny. I'm confused.

Chris?
Comment 4 Chris Wilson 2013-06-24 19:47:08 UTC
That's the old i965g gallium driver.
Comment 5 Chow Loong Jin 2013-06-24 23:52:01 UTC
What? No, I'm not running i965g. I don't even have that installed.
Comment 6 Chow Loong Jin 2013-06-25 06:21:15 UTC
Created attachment 105931
i915_error_state from 3.10-rc6

If it helps, here's one from 3.10-rc6.
Comment 7 Chris Wilson 2013-06-25 06:43:12 UTC
At this moment, glxinfo would be more useful. Those batches are profoundly incorrect (sending commands for the wrong generation of GPUs).
Comment 8 Chow Loong Jin 2013-06-25 07:11:13 UTC
Created attachment 105941
glxinfo -l output
Comment 9 Chris Wilson 2013-06-25 07:18:02 UTC
That last error state is much cleaner; just the usual mesa blorp death.
Comment 10 Daniel Vetter 2013-06-25 09:01:39 UTC
Yeah, this smells like a mesa bug. Please retest with latest git mesa (if possible) and file a bug on bugs.freedesktop.org against Mesa -> Drivers/DRI/i965 with the error_state attached.
