Bug 19582 - Watching Sintel in 2K resolution via VAAPI results in GPU Hang and memory allocation issues
Status: RESOLVED INVALID
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - Intel)
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_video-dri-intel@kernel-bugs.osdl.org
Reported: 2010-10-02 18:15 UTC by Julian Andres Klode
Modified: 2010-12-19 11:56 UTC

Kernel Version: 2.6.35.4, 2.6.35.7
Regression: No

Description Julian Andres Klode 2010-10-02 18:15:35 UTC
I tried watching Sintel in 2K resolution using VAAPI; this caused a GPU hang. Debugging the hang was not possible, because reading i915_error_state failed with a message that not enough memory could be allocated.

The following dmesg logs show the hang and the trace from running cat on i915_error_state.

Result from Debian Kernel based on 2.6.35.4
============================================

[  501.423918] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[  501.425867] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 6477 at 6080)
[  506.641302] ------------[ cut here ]------------
[  506.641313] WARNING: at /build/mattems-linux-2.6_2.6.35-1~experimental.3-amd64-XReacf/linux-2.6-2.6.35/debian/build/source_amd64_none/mm/page_alloc.c:1968 __alloc_pages_nodemask+0x17c/0x70b()
[  506.641316] Hardware name: 03017VG
[  506.641317] Modules linked in: cbc hidp aes_x86_64 aes_generic ecryptfs parport_pc ppdev sco lp parport rfcomm bnep l2cap acpi_cpufreq mperf binfmt_misc cpufreq_stats cpufreq_userspace cpufreq_powersave kvm_intel cpufreq_conservative kvm uinput fuse loop snd_hda_codec_intelhdmi snd_hda_codec_realtek btusb joydev bluetooth arc4 uvcvideo thinkpad_acpi videodev snd_hda_intel ecb v4l1_compat v4l2_compat_ioctl32 iwlagn snd_hda_codec iwlcore snd_hwdep snd_seq snd_pcm serio_raw mac80211 snd_timer cfg80211 i2c_i801 pcspkr snd_seq_device rfkill snd_page_alloc tpm_tis snd soundcore led_class tpm tpm_bios psmouse battery nvram processor ac evdev ext4 mbcache jbd2 crc16 btrfs zlib_deflate crc32c libcrc32c sg sr_mod cdrom sd_mod usbhid crc_t10dif ata_generic usb_storage hid i915 ata_piix libata drm_kms_helper drm ehci_hcd i2c_algo_bit scsi_mod i2c_core usbcore thermal video output r8169 mii button thermal_sys nls_base [last unloaded: scsi_wait_scan]
[  506.641378] Pid: 3402, comm: cat Not tainted 2.6.35-trunk-amd64 #1
[  506.641380] Call Trace:
[  506.641387]  [<ffffffff81044307>] ? warn_slowpath_common+0x78/0x8c
[  506.641389]  [<ffffffff810b4866>] ? __alloc_pages_nodemask+0x17c/0x70b
[  506.641397]  [<ffffffff8100938e>] ? apic_timer_interrupt+0xe/0x20
[  506.641403]  [<ffffffff813064db>] ? _raw_spin_unlock_irqrestore+0xb/0x11
[  506.641407]  [<ffffffff810d90ba>] ? alloc_pages_current+0x9f/0xc2
[  506.641410]  [<ffffffff810b3bdb>] ? __get_free_pages+0x9/0x46
[  506.641414]  [<ffffffff810e17f2>] ? __kmalloc+0x3f/0x136
[  506.641418]  [<ffffffff811003fe>] ? seq_read+0x1f6/0x360
[  506.641420]  [<ffffffff810e96ba>] ? vfs_read+0xa1/0xfd
[  506.641422]  [<ffffffff810e97c9>] ? sys_read+0x45/0x6b
[  506.641425]  [<ffffffff810089c2>] ? system_call_fastpath+0x16/0x1b
[  506.641427] ---[ end trace ad0d7a981527aba9 ]---
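
The trace above shows the failed allocation being requested by seq_read through __kmalloc. One plausible reading, assuming that seq_read in this kernel retries with a doubled buffer whenever the output does not fit, and assuming 4 KiB pages and MAX_ORDER = 11 (common x86-64 defaults, not confirmed for this build), is that the error state grew beyond what a single contiguous allocation can provide. The Python sketch below models only that doubling; the function name and the constants are illustrative and are not kernel code.

```python
PAGE_SIZE = 4096   # assumption: 4 KiB pages (x86-64)
MAX_ORDER = 11     # assumption: default page allocator limit


def seq_read_buffer_growth(output_size):
    """Model a reader that doubles its buffer until the output fits.

    Returns the list of buffer sizes tried and a flag saying whether
    the last request stayed within the page allocator's limit.
    """
    largest = PAGE_SIZE << (MAX_ORDER - 1)   # biggest contiguous request
    size = PAGE_SIZE
    tried = [size]
    while size < output_size:
        size <<= 1
        tried.append(size)
        if size > largest:
            # In this model the allocator would warn and return NULL here.
            return tried, False
    return tried, True


if __name__ == "__main__":
    # Hypothetical 6 MiB error state, chosen only to exceed the limit.
    sizes, ok = seq_read_buffer_growth(6 * 1024 * 1024)
    print([s // 1024 for s in sizes], "KiB; fits:", ok)
```

Under these assumptions, any output larger than 4 MiB ends in a request that the page allocator refuses, which would match a read of i915_error_state failing with an out-of-memory error.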

Result from Kernel 2.6.35.7
=============================
[  276.102224] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[  276.104675] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 3735 at 3513)
[  279.986230] SysRq : Changing Loglevel
[  279.988670] Loglevel set to 0
[  295.823788] ------------[ cut here ]------------
[  295.823798] WARNING: at mm/page_alloc.c:1981 __alloc_pages_nodemask+0x17c/0x6f3()
[  295.823800] Hardware name: 03017VG
[  295.823802] Modules linked in: vboxnetadp vboxnetflt vboxdrv
[  295.823808] Pid: 2680, comm: cat Not tainted 2.6.35.7+ #1
[  295.823809] Call Trace:
[  295.823816]  [<ffffffff8106285b>] ? warn_slowpath_common+0x78/0x8c
[  295.823819]  [<ffffffff810ce9df>] ? __alloc_pages_nodemask+0x17c/0x6f3
[  295.823825]  [<ffffffff8102a38e>] ? apic_timer_interrupt+0xe/0x20
[  295.823828]  [<ffffffff8148c38b>] ? _raw_spin_unlock_irqrestore+0xb/0x11
[  295.823833]  [<ffffffff810f328a>] ? alloc_pages_current+0x9f/0xc2
[  295.823836]  [<ffffffff810cdeab>] ? __get_free_pages+0x9/0x46
[  295.823839]  [<ffffffff810fb9c6>] ? __kmalloc+0x3f/0x136
[  295.823842]  [<ffffffff81119f66>] ? seq_read+0x1f6/0x360
[  295.823846]  [<ffffffff8110370b>] ? vfs_read+0xa1/0xfd
[  295.823877]  [<ffffffff8110381a>] ? sys_read+0x45/0x6b
[  295.823880]  [<ffffffff810299c2>] ? system_call_fastpath+0x16/0x1b
[  295.823882] ---[ end trace 22ac58d95ef11a99 ]---
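
For comparing the two hangs, the i915_do_wait_request lines in both logs can be decoded with a small script. This is a sketch, assuming that the two numbers after "awaiting" and "at" are the awaited and the current request sequence numbers; the helper name parse_wait_error is hypothetical. The return value -5 corresponds to EIO on Linux.

```python
import errno
import re

# Pattern for the i915_do_wait_request error line shown in the logs.
WAIT_RE = re.compile(
    r"i915_do_wait_request returns (-?\d+) \(awaiting (\d+) at (\d+)\)"
)


def parse_wait_error(line):
    """Return (errno name, awaited, current, gap), or None if no match."""
    match = WAIT_RE.search(line)
    if match is None:
        return None
    ret, awaited, current = (int(group) for group in match.groups())
    # Kernel functions return negative errno values, so negate for lookup.
    name = errno.errorcode.get(-ret, "unknown")
    return name, awaited, current, awaited - current


if __name__ == "__main__":
    lines = [
        "[  501.425867] [drm:i915_do_wait_request] *ERROR* "
        "i915_do_wait_request returns -5 (awaiting 6477 at 6080)",
        "[  276.104675] [drm:i915_do_wait_request] *ERROR* "
        "i915_do_wait_request returns -5 (awaiting 3735 at 3513)",
    ]
    for line in lines:
        print(parse_wait_error(line))
```
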
Comment 1 Chris Wilson 2010-12-16 13:42:19 UTC
Not a kernel bug, please inform the libva developers.
Comment 2 Julian Andres Klode 2010-12-19 11:48:28 UTC
(In reply to comment #1)
> Not a kernel bug, please inform the libva developers.

How is it not a kernel bug if a user space application running as a normal user can cause the GPU to hang? There might be a bug in libva causing this, but even with that fixed, other user space code could still cause GPU hangs by taking the same path.
Comment 3 Chris Wilson 2010-12-19 11:56:15 UTC
Because userspace is performing undefined operations on the GPU. It is exactly the same as if the application tried to dereference *0, only the exception handling in the GPU is not as robust as in the CPU.
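
The CPU side of this analogy can be reproduced with a short script. This is a minimal sketch, assuming CPython on Linux, where reading address 0 through ctypes raises SIGSEGV; the function name run_null_dereference is hypothetical. The fault is confined to the child process, and the parent simply observes the exit status.

```python
import signal
import subprocess
import sys

# Child program: read from address 0 through ctypes, i.e. "*0".
CHILD = "import ctypes; ctypes.string_at(0)"


def run_null_dereference():
    """Run the faulting code in a child process and return its exit status."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


if __name__ == "__main__":
    status = run_null_dereference()
    if status < 0:
        # A negative status means the child was terminated by a signal.
        print("child killed by", signal.Signals(-status).name)
    else:
        print("child exited with status", status)
```
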
