Bug 15114

Summary: X.org hang with [drm:i915_gem_do_execbuffer] *ERROR* in dmesg
Product: Drivers
Reporter: Matej Laitl (matej)
Component: Video(DRI - Intel)
Assignee: drivers_video-dri-intel (drivers_video-dri-intel)
Status: CLOSED UNREPRODUCIBLE
Severity: high
CC: chris, jacmet, jbarnes, rjw
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.33-rc4-00399-g24bc734
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 14885
Attachments: config, intel_gpu_dump.txt, intel_gpu_dump.txt.bz2

Description Matej Laitl 2010-01-23 19:54:05 UTC
Hello,
from time to time, my X.org freezes. The system is otherwise (remotely) usable, but any application depending on X hangs.

Versions of relevant software:
vanilla kernel 2.6.33-rc4-00399-g24bc734
xf86-video-intel 2.10.0
mesa 7.5.2
libdrm 2.4.17
xorg-server 1.7.4

When it hangs, the following appears in dmesg:

[ 6379.732892] [drm:i915_gem_do_execbuffer] *ERROR* Object ffff880098cd6540 appears more than once in object list
[ 6379.740976] [drm:i915_gem_do_execbuffer] *ERROR* Object ffff880098cd6540 appears more than once in object list
[ 6379.740995] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0
[ 6379.740998] IP: [<ffffffff8122ddb5>] i915_gem_do_execbuffer+0xba5/0x1260
[ 6379.741006] PGD babab067 PUD bb435067 PMD 0 
[ 6379.741010] Oops: 0002 [#1] PREEMPT SMP 
[ 6379.741014] last sysfs file: /sys/devices/pci0000:00/0000:00:1c.2/0000:06:00.0/ieee80211/phy0/rfkill0/state
[ 6379.741017] CPU 1 
[ 6379.741021] Pid: 2186, comm: X Not tainted 2.6.33-rc4-00399-g24bc734 #142 M11D/ESPRIMO Mobile M9400
[ 6379.741023] RIP: 0010:[<ffffffff8122ddb5>]  [<ffffffff8122ddb5>] i915_gem_do_execbuffer+0xba5/0x1260
[ 6379.741027] RSP: 0018:ffff8800b9047b78  EFLAGS: 00213206
[ 6379.741029] RAX: 0000000000000000 RBX: 000000000000004f RCX: ffff880098cac800
[ 6379.741032] RDX: ffff880098caca78 RSI: ffff8800b9047c98 RDI: ffff880098cd6540
[ 6379.741034] RBP: ffff8800b9047c78 R08: ffffffff814b96b5 R09: 0000000000000006
[ 6379.741036] R10: 0000000000000000 R11: 0000000000000003 R12: 000000000000004e
[ 6379.741038] R13: 00000000fffffff7 R14: 0000000000000000 R15: 0000000000000001
[ 6379.741041] FS:  0000000000000000(0000) GS:ffff880001900000(0063) knlGS:00000000f72636c0
[ 6379.741043] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
[ 6379.741045] CR2: 00000000000000a0 CR3: 00000000b9000000 CR4: 00000000000006e0
[ 6379.741048] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 6379.741050] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 6379.741052] Process X (pid: 2186, threadinfo ffff8800b9046000, task ffff8800bb5d8000)
[ 6379.741054] Stack:
[ 6379.741055]  ffffc90023f57000 ffffc90023f56fff ffffc90023f56fff ffffc90023f55000
[ 6379.741059] <0> ffff8800b9047c98 ffff8800bb43c840 ffff8800bf1de800 ffff8800bf1de820
[ 6379.741063] <0> ffff8800b9047bd8 ffff880098cac800 0000000000000000 0000000000000002
[ 6379.741068] Call Trace:
[ 6379.741072]  [<ffffffff8122e6cb>] ? i915_gem_execbuffer+0x6b/0x370
[ 6379.741077]  [<ffffffff810a5f52>] ? __vmalloc_node+0xa2/0xb0
[ 6379.741080]  [<ffffffff8122e6cb>] ? i915_gem_execbuffer+0x6b/0x370
[ 6379.741083]  [<ffffffff8122e816>] i915_gem_execbuffer+0x1b6/0x370
[ 6379.741086]  [<ffffffff8120cd55>] drm_ioctl+0x1d5/0x460
[ 6379.741089]  [<ffffffff8122e660>] ? i915_gem_execbuffer+0x0/0x370
[ 6379.741093]  [<ffffffff81248c35>] i915_compat_ioctl+0x45/0x50
[ 6379.741097]  [<ffffffff810f1659>] compat_sys_ioctl+0xa9/0x1570
[ 6379.741102]  [<ffffffff810b1d5c>] ? vfs_read+0x13c/0x1a0
[ 6379.741106]  [<ffffffff81028424>] sysenter_dispatch+0x7/0x2b
[ 6379.741108] Code: 08 85 c0 74 52 31 db 0f 1f 80 00 00 00 00 48 63 c3 48 8b 8d 68 ff ff ff 48 8d 14 c1 48 8b 02 48 85 c0 74 25 48 8b 80 80 00 00 00 <c7> 80 a0 00 00 00 00 00 00 00 48 8b 3a 48 85 ff 74 0c 48 c7 c6 
[ 6379.741142] RIP  [<ffffffff8122ddb5>] i915_gem_do_execbuffer+0xba5/0x1260
[ 6379.741145]  RSP <ffff8800b9047b78>
[ 6379.741147] CR2: 00000000000000a0
[ 6379.741159] ---[ end trace 0598809afa4c31db ]---
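
For readers of the trace above, here is a minimal sketch (not part of the original report) that decodes two of its fields. It assumes the standard x86 page-fault error code layout for the "Oops: 0002" line, and it takes the ten bytes starting at the <c7> marker in the "Code:" line to be a single `movl $0x0, 0xa0(%rax)` instruction. The helper name `decode_oops_error_code` is made up for illustration.

```python
# Meaning of each x86 page-fault error code bit, lowest bit first:
# (label when the bit is set, label when it is clear; None = say nothing).
PF_BITS = [
    ("protection violation", "page not present"),
    ("write", "read"),
    ("user mode", "kernel mode"),
    ("reserved bit set", None),
    ("instruction fetch", None),
]


def decode_oops_error_code(code):
    """Return human-readable labels for a page-fault error code."""
    labels = []
    for bit, (if_set, if_clear) in enumerate(PF_BITS):
        label = if_set if code & (1 << bit) else if_clear
        if label is not None:
            labels.append(label)
    return labels


# "Oops: 0002" is printed in hexadecimal.
print(decode_oops_error_code(int("0002", 16)))

# Bytes from the "Code:" line, starting at the faulting <c7> opcode.
# Assumed layout: opcode c7, ModRM 80 (base register RAX plus a 32-bit
# displacement), 4 displacement bytes, 4 immediate bytes.
insn = bytes.fromhex("c7 80 a0 00 00 00 00 00 00 00")
displacement = int.from_bytes(insn[2:6], "little")
immediate = int.from_bytes(insn[6:10], "little")

rax = 0x0  # RAX value from the register dump
fault_addr = rax + displacement
print(hex(fault_addr))  # compare with CR2 in the trace
```

Under those assumptions, the oops is a kernel-mode write to an unmapped page, and the computed address matches CR2 (00000000000000a0): a store of zero through a NULL base pointer at offset 0xa0.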
Comment 1 Matej Laitl 2010-01-23 19:55:04 UTC
Created attachment 24687
config
Comment 2 Matej Laitl 2010-01-23 20:01:30 UTC
Created attachment 24688
intel_gpu_dump.txt
Comment 3 Matej Laitl 2010-01-23 20:07:50 UTC
Created attachment 24689
intel_gpu_dump.txt.bz2
Comment 4 Matej Laitl 2010-01-23 20:15:44 UTC
The system runs a 32-bit userland on an x86_64 kernel:
Linux esprimo 2.6.33-rc4-00399-g24bc734 #142 SMP PREEMPT Tue Jan 19 13:50:58 CET 2010 x86_64 Intel(R) Core(TM)2 Duo CPU T7300 @ 2.00GHz GenuineIntel GNU/Linux

The video card is GM965:
Display controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (rev 03)
Comment 5 Rafael J. Wysocki 2010-02-04 19:35:01 UTC
*** Bug 15224 has been marked as a duplicate of this bug. ***
Comment 6 Jesse Barnes 2010-02-05 18:27:06 UTC
Does this happen regardless of which userspace driver you use?  We definitely shouldn't be hitting NULL pointers in this path, but we did fix some GPU hangs in the userspace drivers recently...

Chris, does this ring any bells?
Comment 7 Matej Laitl 2010-02-05 19:05:30 UTC
(In reply to comment #6)
> Does this happen regardless of which userspace driver you use?  We definitely
> shouldn't be hitting NULL pointers in this path, but we did fix some GPU
> hangs in the userspace drivers recently...

By "recently", do you mean after xf86-video-intel 2.10.0? I can try the git tip of the intel driver.

Since then I have upgraded mesa to 7.7. No hangs have occurred so far, but I will have to test for several days to be sure.
Comment 8 Jesse Barnes 2010-02-05 19:17:42 UTC
I think there were some commits after 2.10 was released, yeah.  I remember seeing some mails fly by about it, but I don't remember enough of the keywords to dig them up.  Maybe Chris remembers.
Comment 9 Rafael J. Wysocki 2010-02-08 00:10:18 UTC
On Monday 08 February 2010, Matěj Laitl wrote:
> On Sunday 07 February 2010 23:28:50 Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a summary report
> > of recent regressions.
> > 
> > The following bug entry is on the current list of known regressions
> > from 2.6.32.  Please verify if it still should be listed and let the
> > tracking team know (either way).
> > 
> > 
> > Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=15114
> > Subject             : X.org hang with [drm:i915_gem_do_execbuffer] *ERROR* in dmesg
> > Submitter   : Matej Laitl <strohel@gmail.com>
> > Date                : 2010-01-23 19:54 (16 days old)
> 
> I confirm the regression still exists in 2.6.33-rc6-00228-g56dca4c
Comment 10 Matej Laitl 2010-02-18 19:13:30 UTC
I cannot trigger this any more with 2.6.33-rc8.

I'm closing this for now; I will reopen it should it happen again.