Bug 50731

Summary: Transparent hugepages and VA API: Intel GPU hang
Product: Drivers
Reporter: Jens Weibler (bugzilla-kernel)
Component: Video(DRI - Intel)
Assignee: intel-gfx-bugs (intel-gfx-bugs)
Status: RESOLVED UNREPRODUCIBLE
Severity: normal
CC: chris, daniel, intel-gfx-bugs, leho
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.7.0-030700rc4-generic
Subsystem:
Regression: No
Bisected commit-id:
Attachments: dmesg | grep drm
some information from dri/0/i915_error_state

Description Jens Weibler 2012-11-19 16:07:52 UTC
I'm not quite sure which component is at fault.

Playing a video with vlc doesn't work correctly if transparent hugepages are set to "always" and vlc is using the VA API ("Use GPU accelerated decoding").

Sometimes I get only a short burst of audio, then a black video screen, and after a while the following kernel message:
---
Nov 19 16:31:57 jtb kernel: [305355.981827] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Nov 19 16:31:57 jtb kernel: [305355.983390] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
---
But I can't reproduce it reliably. If I get the error again, I'll collect the i915_error_state file.
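Because the hang is intermittent, it can help to pull every DRM error out of the saved syslog and line the timestamps up with later error-state captures. A minimal sketch, assuming log lines shaped like the excerpt above; `find_drm_errors` is a hypothetical helper name, not part of any existing tool.

```python
import re
from typing import Iterable, List, Tuple

# Matches the "[seconds.microseconds] [drm:function] *ERROR* message" part of a
# kernel log line, as in the syslog excerpt above.
DRM_ERROR_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm:(?P<func>\w+)\]\s+\*ERROR\*\s+(?P<msg>.*)$"
)


def find_drm_errors(lines: Iterable[str]) -> List[Tuple[float, str, str]]:
    """Return (uptime_seconds, drm_function, message) for each DRM error line."""
    hits = []
    for line in lines:
        m = DRM_ERROR_RE.search(line)
        if m:
            hits.append((float(m.group("ts")), m.group("func"), m.group("msg")))
    return hits
```

Only `*ERROR*` lines carrying a `[drm:function]` tag are collected; the informational `[drm] capturing error event` line is skipped.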


boot with transparent_hugepage=always
-> affected

boot with transparent_hugepage=always, set /sys/kernel/mm/transparent_hugepage/enabled to never
-> affected

boot without transparent_hugepage=always
-> not affected

boot without transparent_hugepage=always, set /sys/kernel/mm/transparent_hugepage/enabled to always, start vlc afterwards
-> not affected

boot without transparent_hugepage=always, set /sys/kernel/mm/transparent_hugepage/enabled to always, restart X afterwards
-> not affected
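In these tests only the boot-time parameter correlated with the hang; the runtime sysfs value did not. A report is easier to interpret when it records both. The sketch below parses the two sources from text (the caller would read `/proc/cmdline` and `/sys/kernel/mm/transparent_hugepage/enabled`). The helper names are hypothetical, and `matches_affected_pattern` encodes only the pattern observed in this one report, not a known root cause.

```python
import re
from typing import Optional


def active_thp_mode(sysfs_text: str) -> str:
    """Return the bracketed (active) choice from the contents of
    /sys/kernel/mm/transparent_hugepage/enabled, e.g. "always [madvise] never"."""
    m = re.search(r"\[(\w+)\]", sysfs_text)
    if m is None:
        raise ValueError(f"no active mode in {sysfs_text!r}")
    return m.group(1)


def boot_thp_mode(cmdline: str) -> Optional[str]:
    """Return the value of transparent_hugepage= from a kernel command line
    (the contents of /proc/cmdline), or None if the parameter is absent."""
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "transparent_hugepage" and sep:
            return value
    return None


def matches_affected_pattern(cmdline: str) -> bool:
    # Hypothetical check reflecting only the tests above: the hang was seen
    # whenever transparent_hugepage=always was on the boot command line,
    # regardless of what was later written to sysfs.
    return boot_thp_mode(cmdline) == "always"
```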
Comment 1 Daniel Vetter 2012-11-19 16:33:27 UTC
Can you please add a dmesg with drm.debug=0xe added to your kernel cmdline so we know the details of your machine?
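`drm.debug` is a bitmask of debug categories. A small decoder shows what 0xe turns on. The category values are assumed from `include/drm/drmP.h` in 3.x kernels (newer kernels add further bits above 0x08), so treat the table as a sketch rather than an authoritative list; `decode_drm_debug` is a hypothetical name.

```python
# Debug categories of the drm.debug bitmask, assumed from the 3.x kernel's
# include/drm/drmP.h (DRM_UT_CORE, DRM_UT_DRIVER, DRM_UT_KMS, DRM_UT_PRIME).
DRM_UT_CATEGORIES = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
}


def decode_drm_debug(mask: int) -> list:
    """Return the names of the debug categories enabled by `mask`, lowest bit first."""
    return [name for bit, name in sorted(DRM_UT_CATEGORIES.items()) if mask & bit]
```

For the value requested here, `decode_drm_debug(0xe)` gives `["DRIVER", "KMS", "PRIME"]`, i.e. everything except the noisy core messages.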
Comment 2 Jens Weibler 2012-11-22 11:58:45 UTC
Created attachment 86981
dmesg | grep drm

It just happened, right after a clean boot.
Comment 3 Jens Weibler 2012-11-22 12:06:42 UTC
I can't properly read i915_error_state:

# cat dri/0/i915_error_state 
cat: dri/0/i915_error_state: Cannot allocate memory

# cp dri/0/i915_error_state ~/     
cp: reading `dri/0/i915_error_state': Cannot allocate memory
cp: failed to extend `/root/i915_error_state': Cannot allocate memory

My memory:
# cat /proc/meminfo 
MemTotal:        8055016 kB
MemFree:         3555072 kB
Buffers:            1692 kB
Cached:          2188656 kB
SwapCached:            0 kB
Active:          2361084 kB
Inactive:        1807128 kB
Active(anon):    1986028 kB
Inactive(anon):   199112 kB
Active(file):     375056 kB
Inactive(file):  1608016 kB
Unevictable:       31344 kB
Mlocked:           31344 kB
SwapTotal:       3905532 kB
SwapFree:        3905532 kB
Dirty:               184 kB
Writeback:             0 kB
AnonPages:       2009164 kB
Mapped:           331420 kB
Shmem:            201448 kB
Slab:             158024 kB
SReclaimable:      96464 kB
SUnreclaim:        61560 kB
KernelStack:        4488 kB
PageTables:        51116 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     7933040 kB
Committed_AS:    4589804 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      364528 kB
VmallocChunk:   34359369792 kB
HardwareCorrupted:     0 kB
AnonHugePages:    276480 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      100352 kB
DirectMap2M:     8173568 kB
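A quick calculation from these figures: MemFree is about 3.4 GiB, and AnonHugePages divided by Hugepagesize gives 135 transparent huge pages in use, so the ENOMEM from the error-state read does not look like plain exhaustion of free memory. A minimal parser, assuming the `Key: value [kB]` layout shown above (`parse_meminfo` is a hypothetical name):

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: int}. Values are in kB
    where the line carries a 'kB' suffix, otherwise they are plain counts."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


# A few lines copied from the dump above.
SAMPLE = """\
MemFree:         3555072 kB
AnonHugePages:    276480 kB
Hugepagesize:       2048 kB
"""
info = parse_meminfo(SAMPLE)
thp_pages = info["AnonHugePages"] // info["Hugepagesize"]  # 135 huge pages
free_mib = info["MemFree"] // 1024  # 3471 MiB
```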


But with less I can see some information. The file starts with some debug information but ends with binary crap. It includes strings like "ECRYPTFS_FNEK_ENCRYPTED.FWbci9AVF10P", or even:
oject-Id-Version: update-notifier
Report-Msgid-Bugs-To: FULL NAME <EMAIL@ADDRESS>
POT-Creation-Date: 2012-04-10 12:15+0100
PO-Revision-Date: 2006-10-22 17:56+0000
Last-Translator: FULL NAME <EMAIL@ADDRESS>
Language-Team: English <en@li.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Plural-Forms: nplurals=2; plural=n != 1;
X-Launchpad-Export-Date: 2012-10-09 15:32+0000
X-Generator: Launchpad (build 16112)

I guess this has nothing to do with i915 and is just some part of my kernel memory!?
Comment 4 Chris Wilson 2012-11-22 12:08:25 UTC
Just shoot a few memory hogs (like X) then capture the error-state.
Comment 5 Jens Weibler 2012-11-22 12:09:50 UTC
Created attachment 86991
some information from dri/0/i915_error_state
Comment 6 Jens Weibler 2012-11-22 12:30:23 UTC
(In reply to comment #4)
> Just shoot a few memory hogs (like X) then capture the error-state.

I shut down/killed everything except init and my shell, but I still get "Cannot allocate memory".
strace:
  fstat(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(5, 1), ...}) = 0
  open("dri/0/i915_error_state", O_RDONLY) = 3
  fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
  fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
  read(3, 0x62f000, 32768)                = -1 ENOMEM (Cannot allocate memory)
  write(2, "cat: ", 5)                    = 5
  write(2, "dri/0/i915_error_state", 22)  = 22
  write(2, ": Cannot allocate memory", 24) = 24
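The trace shows `st_size=0` from `fstat` and the very first 32 KiB `read()` failing with ENOMEM, so the error is returned by the kernel side of the debugfs file rather than by `cat` or `cp`. A reader for such files should ignore the reported size and read until EOF. A minimal sketch; `read_debugfs_file` is a hypothetical helper, and the 32768-byte chunk size simply mirrors the trace.

```python
import errno
import os


def read_debugfs_file(path: str, chunk_size: int = 32768) -> bytes:
    """Read a debugfs-style file until EOF, ignoring the size from stat().

    The fstat() above reports st_size=0 for i915_error_state, so the only
    reliable way to get the whole contents is to keep reading until read()
    returns no data.
    """
    chunks = []
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            try:
                data = os.read(fd, chunk_size)
            except OSError as exc:
                if exc.errno == errno.ENOMEM:
                    # os.read() only knows the fd; re-raise with the path
                    # attached. In the trace above this is the first read().
                    raise OSError(errno.ENOMEM, os.strerror(errno.ENOMEM), path) from exc
                raise
            if not data:  # EOF
                break
            chunks.append(data)
    finally:
        os.close(fd)
    return b"".join(chunks)
```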

The binary "crap" after what seems to be the debugging information changes on each read. I recognize parts of the MIME database, a license text, and package information from dpkg or something similar. I even found some HTTP requests/replies.

btw: each file saved by less is 4194304 bytes (exactly 4 MiB) in size.
Comment 7 Chris Wilson 2013-01-23 22:50:57 UTC
I wonder if this was not just the infamous: http://cgit.freedesktop.org/~danvet/drm-intel/commit/?id=262b6d363fcff16359c93bd58c297f961f6e6273

Maybe just try a recent 3.8 on the off-chance it was that bug.
Comment 8 Jens Weibler 2013-03-05 11:34:38 UTC
Sorry, I can't reproduce it; I had to switch notebooks.
Comment 9 Daniel Vetter 2013-03-06 09:33:49 UTC
Ok, closing as unreproducible since we've lost the hw. Thanks anyway for reporting this issue.