Bug 50731 - Transparent hugepages and VA API: Intel GPU hang
Summary: Transparent hugepages and VA API: Intel GPU hang
Status: RESOLVED UNREPRODUCIBLE
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - Intel)
Hardware: All
OS: Linux
Importance: P1 normal
Assignee: intel-gfx-bugs@lists.freedesktop.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-11-19 16:07 UTC by Jens Weibler
Modified: 2019-09-05 10:32 UTC
CC List: 4 users

See Also:
Kernel Version: 3.7.0-030700rc4-generic
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg | grep drm (36.04 KB, text/plain)
2012-11-22 11:58 UTC, Jens Weibler
some information from dri/0/i915_error_state (4.96 KB, text/plain)
2012-11-22 12:09 UTC, Jens Weibler

Description Jens Weibler 2012-11-19 16:07:52 UTC
I'm not quite sure which component is at fault.

Playing a video with vlc doesn't work correctly if transparent hugepages are set to always and vlc is using the VA API ("Use GPU accelerated decoding").

Sometimes I get only a short burst of audio, then a black video screen, and after a while the following kernel message:
---
Nov 19 16:31:57 jtb kernel: [305355.981827] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Nov 19 16:31:57 jtb kernel: [305355.983390] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
---
But I can't reliably reproduce it. If I get the error again, I'll collect the i915_error_state file.


boot with transparent_hugepage=always
-> affected

boot with transparent_hugepage=always, set /sys/kernel/mm/transparent_hugepage/enabled to never
-> affected

boot without transparent_hugepage=always
-> not affected

boot without transparent_hugepage=always, set /sys/kernel/mm/transparent_hugepage/enabled to always, start vlc afterwards
-> not affected

boot without transparent_hugepage=always, set /sys/kernel/mm/transparent_hugepage/enabled to always, restart X afterwards
-> not affected
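The matrix above separates the boot-time `transparent_hugepage=` parameter from the runtime value in `/sys/kernel/mm/transparent_hugepage/enabled`; only the boot parameter seems to correlate with the hang. When retesting, it may help to record both. A minimal Python sketch, assuming the sysfs file marks the active mode in brackets (e.g. `[always] madvise never`); the helper names are hypothetical and not part of any existing tool:

```python
import re
from typing import Optional

# Hypothetical helpers for recording THP state alongside a test run.

def active_thp_mode(sysfs_text: str) -> Optional[str]:
    """Return the active (bracketed) mode from the contents of
    /sys/kernel/mm/transparent_hugepage/enabled, e.g. "[always] madvise never"."""
    match = re.search(r"\[(\w+)\]", sysfs_text)
    return match.group(1) if match else None


def thp_boot_param(cmdline: str) -> Optional[str]:
    """Return the value of transparent_hugepage= from a kernel command line
    (the contents of /proc/cmdline), or None if the parameter is absent."""
    for token in cmdline.split():
        if token.startswith("transparent_hugepage="):
            return token.split("=", 1)[1]
    return None


def describe_thp_state() -> str:
    """Read both sources on a running Linux system and summarize them."""
    with open("/proc/cmdline") as f:
        boot = thp_boot_param(f.read())
    with open("/sys/kernel/mm/transparent_hugepage/enabled") as f:
        runtime = active_thp_mode(f.read())
    return f"boot param: {boot or '(not set)'}, runtime mode: {runtime}"
```

Calling `describe_thp_state()` before starting vlc would, for example, tell the "boot with transparent_hugepage=always, then set enabled to never" case apart from a plain boot, which is exactly the distinction the matrix depends on.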
Comment 1 Daniel Vetter 2012-11-19 16:33:27 UTC
Can you please add a dmesg with drm.debug=0xe added to your kernel cmdline so we know the details of your machine?
Comment 2 Jens Weibler 2012-11-22 11:58:45 UTC
Created attachment 86981 [details]
dmesg | grep drm

It just happened after a clean boot.
Comment 3 Jens Weibler 2012-11-22 12:06:42 UTC
I can't properly read i915_error_state:

# cat dri/0/i915_error_state 
cat: dri/0/i915_error_state: Cannot allocate memory

# cp dri/0/i915_error_state ~/     
cp: reading `dri/0/i915_error_state': Cannot allocate memory
cp: failed to extend `/root/i915_error_state': Cannot allocate memory

My memory:
# cat /proc/meminfo 
MemTotal:        8055016 kB
MemFree:         3555072 kB
Buffers:            1692 kB
Cached:          2188656 kB
SwapCached:            0 kB
Active:          2361084 kB
Inactive:        1807128 kB
Active(anon):    1986028 kB
Inactive(anon):   199112 kB
Active(file):     375056 kB
Inactive(file):  1608016 kB
Unevictable:       31344 kB
Mlocked:           31344 kB
SwapTotal:       3905532 kB
SwapFree:        3905532 kB
Dirty:               184 kB
Writeback:             0 kB
AnonPages:       2009164 kB
Mapped:           331420 kB
Shmem:            201448 kB
Slab:             158024 kB
SReclaimable:      96464 kB
SUnreclaim:        61560 kB
KernelStack:        4488 kB
PageTables:        51116 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     7933040 kB
Committed_AS:    4589804 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      364528 kB
VmallocChunk:   34359369792 kB
HardwareCorrupted:     0 kB
AnonHugePages:    276480 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      100352 kB
DirectMap2M:     8173568 kB
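In this dump, `AnonHugePages` (anonymous memory currently backed by transparent huge pages) is 276480 kB, while `HugePages_Total` (the separate, statically reserved hugetlbfs pool) is 0, so all huge pages in use here are THP. With a `Hugepagesize` of 2048 kB, that is 276480 / 2048 = 135 huge pages. A small Python sketch of that arithmetic on `/proc/meminfo`-style text; the function names are hypothetical:

```python
from typing import Dict

# Hypothetical helpers; they only parse text, so they can be fed a saved dump.

def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}. Values keep the
    unit used in the file: kB for most fields, a bare count for HugePages_*."""
    info: Dict[str, int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def anon_huge_page_count(info: Dict[str, int]) -> int:
    """Transparent huge pages in use: AnonHugePages / Hugepagesize (both in kB)."""
    return info["AnonHugePages"] // info["Hugepagesize"]
```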


But with less I can see some information. The file starts with some debug information but ends with binary crap. It contains strings like "ECRYPTFS_FNEK_ENCRYPTED.FWbci9AVF10P" or even:
oject-Id-Version: update-notifier
Report-Msgid-Bugs-To: FULL NAME <EMAIL@ADDRESS>
POT-Creation-Date: 2012-04-10 12:15+0100
PO-Revision-Date: 2006-10-22 17:56+0000
Last-Translator: FULL NAME <EMAIL@ADDRESS>
Language-Team: English <en@li.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Plural-Forms: nplurals=2; plural=n != 1;
X-Launchpad-Export-Date: 2012-10-09 15:32+0000
X-Generator: Launchpad (build 16112)

I guess this has nothing to do with i915 and is just some part of my kernel memory?
Comment 4 Chris Wilson 2012-11-22 12:08:25 UTC
Just shoot a few memory hogs (like X) then capture the error-state.
Comment 5 Jens Weibler 2012-11-22 12:09:50 UTC
Created attachment 86991 [details]
some information from dri/0/i915_error_state
Comment 6 Jens Weibler 2012-11-22 12:30:23 UTC
(In reply to comment #4)
> Just shoot a few memory hogs (like X) then capture the error-state.

I shut down/killed everything except init and my shell, but I still get "Cannot allocate memory".
strace:
  fstat(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(5, 1), ...}) = 0
  open("dri/0/i915_error_state", O_RDONLY) = 3
  fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
  fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
  read(3, 0x62f000, 32768)                = -1 ENOMEM (Cannot allocate memory)
  write(2, "cat: ", 5)                    = 5
  write(2, "dri/0/i915_error_state", 22)  = 22
  write(2, ": Cannot allocate memory", 24) = 24

The binary "crap" after what seems to be the debugging information changes on each read. I recognize parts of the MIME database, a license text, and package information from dpkg or similar. I even found some HTTP requests/replies.

BTW: each file saved by less is 4194304 bytes (4 MiB) in size.
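The strace shows `cat` issuing a single 32768-byte `read()` on the debugfs file and getting `ENOMEM` back. That failure originates on the kernel side of `read()`, so the sketch below does not work around it; it only makes the failure explicit when collecting the error state for a report, instead of leaving an ambiguous partial copy. It is a minimal Python sketch with a hypothetical helper name, and it assumes debugfs is mounted somewhere (the kernel message above points at `/debug`; `/sys/kernel/debug` is another common mount point), so the path is passed in:

```python
import errno

# Hypothetical helper; the path is a parameter because debugfs mount points vary.
def capture_error_state(src: str, dst: str, chunk_size: int = 32768) -> int:
    """Copy an i915_error_state-like file from src to dst in raw chunks.

    Returns the number of bytes copied. If a read() fails with ENOMEM,
    raises RuntimeError stating how many bytes were copied before the failure.
    """
    copied = 0
    try:
        # buffering=0 returns a raw FileIO, so each read() is one read(2) call,
        # mirroring the 32768-byte reads cat made in the strace.
        with open(src, "rb", buffering=0) as fin, open(dst, "wb") as fout:
            while True:
                chunk = fin.read(chunk_size)
                if not chunk:
                    break  # EOF
                fout.write(chunk)
                copied += len(chunk)
    except OSError as exc:
        if exc.errno == errno.ENOMEM:
            raise RuntimeError(
                f"kernel returned ENOMEM reading {src} after {copied} bytes"
            ) from exc
        raise
    return copied
```

For example, `capture_error_state("/sys/kernel/debug/dri/0/i915_error_state", "/root/i915_error_state")` would either copy the whole dump or fail with a message naming the offset at which the kernel gave up.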
Comment 7 Chris Wilson 2013-01-23 22:50:57 UTC
I wonder whether this was just the infamous bug fixed in: http://cgit.freedesktop.org/~danvet/drm-intel/commit/?id=262b6d363fcff16359c93bd58c297f961f6e6273

Maybe just try a recent 3.8 on the off-chance it was that bug.
Comment 8 Jens Weibler 2013-03-05 11:34:38 UTC
Sorry, I can't reproduce it; I had to switch notebooks.
Comment 9 Daniel Vetter 2013-03-06 09:33:49 UTC
Ok, closing as unreproducible since we've lost the hw. Thanks anyway for reporting this issue.
