Bug 52661 - GPU hang when transferring large files
Summary: GPU hang when transferring large files
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - Intel)
Hardware: All Linux
Importance: P1 normal
Assignee: intel-gfx-bugs@lists.freedesktop.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-01-13 22:44 UTC by Sudaraka Wijesinghe
Modified: 2013-01-14 16:42 UTC
CC List: 2 users

See Also:
Kernel Version: 3.7
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
lspci output (10.77 KB, text/plain)
2013-01-13 22:44 UTC, Sudaraka Wijesinghe
journalctl output from the HP laptop that completely freezes (45.63 KB, text/plain)
2013-01-13 22:46 UTC, Sudaraka Wijesinghe
/proc/cpuinfo (3.29 KB, text/plain)
2013-01-13 22:47 UTC, Sudaraka Wijesinghe
/proc/meminfo (1.14 KB, text/plain)
2013-01-13 22:47 UTC, Sudaraka Wijesinghe
Kernel configuration (78.64 KB, text/plain)
2013-01-13 22:48 UTC, Sudaraka Wijesinghe
i915_error_state from 3.7.2 kernel after GPU hang (198.33 KB, application/x-gzip)
2013-01-14 16:24 UTC, Sudaraka Wijesinghe

Description Sudaraka Wijesinghe 2013-01-13 22:44:39 UTC
Created attachment 91221 [details]
lspci output

Every time a large file (e.g. 2 GB) is copied from one location to another, the GPU hangs with the following error messages:

-- from: journalctl -f --------------------------------------------------------
Jan 13 20:18:11 sw-main kernel: [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Jan 13 20:18:12 sw-main kernel: [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Jan 13 20:18:12 sw-main kernel: [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
Jan 13 20:18:12 sw-main kernel: [drm:i915_reset] *ERROR* Failed to reset chip.
-------------------------------------------------------------------------------
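The two hangcheck messages one second apart, followed by "declaring wedged", suggest that the driver refuses to attempt a second reset that comes too soon after the previous one. The C sketch below is a simplified model of that policy, not the i915 source. The 5-second window (`HANG_WINDOW_SECONDS`) and the `gpu_state` and `handle_hang` names are assumptions made for illustration.

```c
#include <stdbool.h>
#include <time.h>

/* Assumed reset window; hypothetical stand-in for the driver's own check. */
#define HANG_WINDOW_SECONDS 5

/* Hypothetical, simplified GPU state (not the kernel's struct). */
struct gpu_state {
    time_t last_reset;  /* time of the previous reset attempt, 0 if none */
    bool wedged;        /* set once the model gives up on resetting */
};

/* Returns true if a reset should be attempted for a hang detected at `now`.
 * Returns false, and marks the GPU wedged, if the previous reset happened
 * less than HANG_WINDOW_SECONDS ago. A wedged GPU is never reset again. */
static bool handle_hang(struct gpu_state *gpu, time_t now)
{
    if (gpu->wedged)
        return false;
    if (gpu->last_reset != 0 && now - gpu->last_reset < HANG_WINDOW_SECONDS) {
        gpu->wedged = true;  /* "GPU hanging too fast, declaring wedged!" */
        return false;
    }
    gpu->last_reset = now;   /* remember when this reset was attempted */
    return true;
}
```

Calling `handle_hang(&gpu, 1000)` and then `handle_hang(&gpu, 1001)` mirrors the log: the first call attempts a reset, and the second returns false and leaves `gpu.wedged` set.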

Hardware information for this machine (Acer Aspire 4738) is attached. On another machine (an HP laptop), the system freezes entirely; its error log is attached (hp-error.txt).


-- bisect result --------------------------------------------------------------
504c7267a1e84b157cbd7e9c1b805e1bc0c2c846 is the first bad commit
commit 504c7267a1e84b157cbd7e9c1b805e1bc0c2c846
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Aug 23 13:12:52 2012 +0100

    drm/i915: Use cpu relocations if the object is in the GTT but not mappable
    
    This prevents the case of unbinding the object in order to process the
    relocations through the GTT and then rebinding it only to then proceed
    to use cpu relocations as the object is now in the CPU write domain. By
    choosing to use cpu relocations up front, we can therefore avoid the
    rebind penalty.
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

:040000 040000 090ed3d52b4f3210b988877f747b6ff86e123385 1d48be89ded4777a543b693db833de64877059c4 M	drivers
-------------------------------------------------------------------------------
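The bisected commit describes a choice between two relocation paths: writing relocations through the GTT or with the CPU. As a rough illustration only, the C sketch below models that decision on a hypothetical `gem_object` struct. The field names and `choose_reloc_path` are invented for this example and do not match the kernel's execbuffer code; the sketch shows which path is taken, not why the hang occurs.

```c
#include <stdbool.h>

/* Hypothetical, simplified buffer object; not struct drm_i915_gem_object. */
struct gem_object {
    bool in_cpu_write_domain; /* CPU caches may hold dirty data */
    bool bound_in_gtt;        /* object currently has a GTT binding */
    bool mappable;            /* binding lies in the CPU-visible aperture */
    bool cached;              /* object uses a cached (snooped/LLC) level */
};

enum reloc_path { RELOC_VIA_CPU, RELOC_VIA_GTT };

/* Models the behaviour described in the commit message: an object that is
 * bound in the GTT but outside the mappable aperture is relocated with the
 * CPU up front, instead of being unbound and rebound into the aperture. */
static enum reloc_path choose_reloc_path(const struct gem_object *obj)
{
    if (obj->in_cpu_write_domain || obj->cached)
        return RELOC_VIA_CPU;
    if (obj->bound_in_gtt && !obj->mappable)
        return RELOC_VIA_CPU; /* the case the commit adds */
    return RELOC_VIA_GTT;
}
```

The design point the commit makes is cost avoidance: going straight to CPU relocations skips an unbind/rebind cycle that would end in CPU relocations anyway.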
Comment 1 Sudaraka Wijesinghe 2013-01-13 22:46:29 UTC
Created attachment 91231 [details]
journalctl output from the HP laptop that completely freezes
Comment 2 Sudaraka Wijesinghe 2013-01-13 22:47:11 UTC
Created attachment 91241 [details]
/proc/cpuinfo
Comment 3 Sudaraka Wijesinghe 2013-01-13 22:47:31 UTC
Created attachment 91251 [details]
/proc/meminfo
Comment 4 Sudaraka Wijesinghe 2013-01-13 22:48:08 UTC
Created attachment 91261 [details]
Kernel configuration
Comment 5 Daniel Vetter 2013-01-14 13:28:21 UTC
Can you please rehang your machine and attach the i915_error_state file from debugfs? Also, please test this on latest upstream git branch from Linus' tree. Make sure that you have the following commit:

commit 93927ca52a55c23e0a6a305e7e9082e8411ac9fa
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Thu Jan 10 18:03:00 2013 +0100

    drm/i915: Revert shrinker changes from "Track unbound pages"
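Comment 5 asks for the `i915_error_state` file from debugfs. A minimal C sketch for saving a copy is below. It assumes debugfs is mounted at `/sys/kernel/debug`, that the Intel GPU is DRM minor 0, and that the program runs as root; `save_error_state` is a hypothetical helper, and a plain `cp` of the same file achieves the same result.

```c
#include <stdio.h>

/* Assumed location: debugfs mounted at /sys/kernel/debug, GPU is minor 0. */
#define I915_ERROR_STATE_PATH "/sys/kernel/debug/dri/0/i915_error_state"

/* Copies src to dst in fixed-size chunks. debugfs files often report a size
 * of 0, so the loop reads until EOF instead of trusting the file size.
 * Returns 0 on success, -1 on any open, read, or write failure. */
static int save_error_state(const char *src, const char *dst)
{
    FILE *in = fopen(src, "rb");
    if (!in)
        return -1;
    FILE *out = fopen(dst, "wb");
    if (!out) {
        fclose(in);
        return -1;
    }
    char buf[4096];
    size_t n;
    int ret = 0;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            ret = -1;
            break;
        }
    }
    if (ferror(in))
        ret = -1;
    fclose(in);
    if (fclose(out) != 0)  /* flush errors surface here */
        ret = -1;
    return ret;
}
```

After reproducing the hang, `save_error_state(I915_ERROR_STATE_PATH, "i915_error_state")` writes the copy to the current directory; the attachment in Comment 6 was gzip-compressed before upload.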
Comment 6 Sudaraka Wijesinghe 2013-01-14 16:24:29 UTC
Created attachment 91321 [details]
i915_error_state from 3.7.2 kernel after GPU hang
Comment 7 Sudaraka Wijesinghe 2013-01-14 16:25:07 UTC
On 01/14/13 18:58, bugzilla-daemon@bugzilla.kernel.org wrote:
> --- Comment #5 from Daniel Vetter <daniel@ffwll.ch>  2013-01-14 13:28:21 ---
> Can you please rehang your machine and attach the i915_error_state file from
> debugfs? Also, please test this on latest upstream git branch from Linus'
> tree.

The problem doesn't seem to occur in Linus' tree (3.8.0-rc3), so I guess
it's already fixed.

I will attach the i915_error_state from 3.7.2 anyway if you need it.

You may close this if it is no longer necessary; if there is something
still to be looked into, I'm glad to help with testing.

Thanks.
Comment 8 Daniel Vetter 2013-01-14 16:42:46 UTC
Yeah, we've (re)applied a workaround for this one, and we already have tons of reporters for it. See https://bugs.freedesktop.org/show_bug.cgi?id=55984
