Bug 12419
Summary: | possible circular locking dependency on i915 dma | ||
---|---|---|---|
Product: | Drivers | Reporter: | Rafael J. Wysocki (rjw) |
Component: | Video(DRI - non Intel) | Assignee: | drivers_video-dri |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | james, jobi, sergio, sitsofe |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.28-git | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 12398 |
Description
Rafael J. Wysocki
2009-01-10 17:02:11 UTC
Hello, I see something similar when I run GL apps except in my traces I have i915_gem_execbuffer instead of i915_cmdbuffer --- is this the same issue? On Tuesday 20 January 2009, Wang Chen wrote:
> Rafael J. Wysocki said the following on 2009-1-20 5:32:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.28. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12419
> > Subject : possible circular locking dependency on i915 dma
> > Submitter : Wang Chen <wangchen@cn.fujitsu.com>
> > Date : 2009-01-08 14:11 (12 days old)
> > References : http://marc.info/?l=linux-kernel&m=123142399720125&w=4
> >
>
> This regression is still there in mainline.
On Thursday 05 February 2009, Wang Chen wrote:
> Rafael J. Wysocki said the following on 2009-2-4 18:23:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.28. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12419
> > Subject : possible circular locking dependency on i915 dma
> > Submitter : Wang Chen <wangchen@cn.fujitsu.com>
> > Date : 2009-01-08 14:11 (28 days old)
> > References : http://marc.info/?l=linux-kernel&m=123142399720125&w=4
> >
>
> status not changed.
I'm seeing this on my EeePC 900 with 2.6.29rc5. It's highly reproducible: [ 172.040511] ======================================================= [ 172.040519] [ INFO: possible circular locking dependency detected ] [ 172.040525] 2.6.29-rc5 #16 [ 172.040529] ------------------------------------------------------- [ 172.040534] ioquake3.i386/2981 is trying to acquire lock: [ 172.040539] (&mm->mmap_sem){----}, at: [<c017597b>] might_fault+0x4b/0xa0 [ 172.040556] [ 172.040557] but task is already holding lock: [ 172.040562] (&dev->struct_mutex){--..}, at: [<c02b1020>] i915_cmdbuffer+0x90/0x530 [ 172.040575] [ 172.040577] which lock already depends on the new lock. [ 172.040579] [ 172.040583] [ 172.040584] the existing dependency chain (in reverse order) is: [ 172.040588] [ 172.040590] -> #2 (&dev->struct_mutex){--..}: [ 172.040598] [<c014a92d>] validate_chain+0xb6d/0x1150 [ 172.040609] [<c014c00c>] __lock_acquire+0x31c/0x600 [ 172.040617] [<c014c367>] lock_acquire+0x77/0xa0 [ 172.040624] [<c0402376>] __mutex_lock_common+0x86/0x340 [ 172.040633] [<c040270d>] mutex_lock_nested+0x3d/0x50 [ 172.040641] [<c02a6a42>] drm_vm_open+0x32/0x50 [ 172.040649] [<c01256a3>] dup_mm+0x233/0x320 [ 172.040658] [<c0126893>] copy_process+0xcd3/0xf30 [ 172.040665] [<c0126b74>] do_fork+0x84/0x350 [ 172.040672] [<c01017b4>] sys_clone+0x34/0x40 [ 172.040680] [<c0103355>] sysenter_do_call+0x12/0x35 [ 172.040687] [<ffffffff>] 0xffffffff [ 172.040693] [ 172.040694] -> #1 (&mm->mmap_sem/1){--..}: [ 172.040704] [<c014a92d>] validate_chain+0xb6d/0x1150 [ 172.040712] [<c014c00c>] __lock_acquire+0x31c/0x600 [ 172.040719] [<c014c367>] lock_acquire+0x77/0xa0 [ 172.040727] [<c013ea62>] down_write_nested+0x52/0x70 [ 172.040735] [<c0125536>] dup_mm+0xc6/0x320 [ 172.040742] [<c0126893>] copy_process+0xcd3/0xf30 [ 172.040750] [<c0126b74>] do_fork+0x84/0x350 [ 172.040757] [<c01017b4>] sys_clone+0x34/0x40 [ 172.040763] [<c0103355>] sysenter_do_call+0x12/0x35 [ 172.040770] [<ffffffff>] 0xffffffff [ 172.040787] [ 172.040788] -> #0 (&mm->mmap_sem){----}: [ 172.040796] [<c014a36a>] validate_chain+0x5aa/0x1150 [ 172.040804] [<c014c00c>] __lock_acquire+0x31c/0x600 [ 172.040811] [<c014c367>] lock_acquire+0x77/0xa0 [ 172.040819] [<c01759ac>] might_fault+0x7c/0xa0 [ 172.040825] [<c02b0664>] i915_emit_box+0x24/0x290 [ 172.040832] [<c02b12e2>] i915_cmdbuffer+0x352/0x530 [ 172.040840] [<c02a147d>] drm_ioctl+0xed/0x2d0 [ 172.040847] [<c01931a7>] vfs_ioctl+0x67/0x70 On Monday 16 February 2009, Wang Chen wrote:
> Rafael J. Wysocki said the following on 2009-2-15 4:38:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.28. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12419
> > Subject : possible circular locking dependency on i915 dma
> > Submitter : Wang Chen <wangchen@cn.fujitsu.com>
> > Date : 2009-01-08 14:11 (38 days old)
> > References : http://marc.info/?l=linux-kernel&m=123142399720125&w=4
> >
> >
>
> Yes. It's still there in mainline.
> I think the commit 546b0974c39657017407c86fe79811100b60700d
> "i915: Use struct_mutex to protect ring in GEM mode." brought this
> regression.
>
> The lockdep problem is as following:
> thread-1
> i915_cmdbuffer()
> |
> ---> lock(drm_device->struct_mutex)
> |
> V
> i915_dispatch_cmdbuffer()
> |
> ---->i915_emit_box()
> |
> ----->copy_from_user()
> |
> -----might_fault()
> |
> --->lock(mm->mmap_sem)
>
> thread-2
> dup_mm()
> |
> --->lock(mm->mmap_sem)
> |
> V
> drm_vm_open()
> |
> -------> lock(drm_device->struct_mutex)
>
> The different order to lock "mmap_sem" and "drm_dev->struct_mutex" introduces
> the problem.
> But it seems no way to reverse the lock order in i915.
> So how about refine the lock granularity of drm_dev->struct_mutex and exclude
> the mmap_sem
> lock/unlock out of the drm_dev->struct_mutex lock/unlock range?
First-Bad-Commit : 546b0974c39657017407c86fe79811100b60700d Notify-Also : Eric Anholt <eric@anholt.net> Notify-Also : Dave Airlie <airlied@redhat.com> On Tuesday 24 February 2009, Wang Chen wrote:
> Rafael J. Wysocki said the following on 2009-2-24 5:48:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.28. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12419
> > Subject : possible circular locking dependency on i915 dma
> > Submitter : Wang Chen <wangchen@cn.fujitsu.com>
> > Date : 2009-01-08 14:11 (47 days old)
> > First-Bad-Commit:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=546b0974c39657017407c86fe79811100b60700d
> > References : http://marc.info/?l=linux-kernel&m=123142399720125&w=4
> >
>
> not changed.
>
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.29-rc6-default #165
> -------------------------------------------------------
> X/3940 is trying to acquire lock:
> (&mm->mmap_sem){----}, at: [<c0168e97>] might_fault+0x42/0x7e
>
> but task is already holding lock:
> (&dev->struct_mutex){--..}, at: [<eeb76fed>] i915_cmdbuffer+0xf4/0x411
> [i915]
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #2 (&dev->struct_mutex){--..}:
> [<c013791a>] validate_chain+0x8be/0xbb5
> [<c0138280>] __lock_acquire+0x66f/0x6f9
> [<c0138365>] lock_acquire+0x5b/0x77
> [<c02e56fe>] mutex_lock_nested+0xdb/0x244
> [<eeb5b03e>] drm_vm_open+0x25/0x37 [drm]
> [<c011a8b3>] dup_mm+0x247/0x2f2
> [<c011b312>] copy_process+0x98c/0xfeb
> [<c011bac7>] do_fork+0x120/0x29c
> [<c01016be>] sys_clone+0x25/0x2a
> [<c0102cdd>] sysenter_do_call+0x12/0x31
> [<ffffffff>] 0xffffffff
>
> -> #1 (&mm->mmap_sem/1){--..}:
> [<c013791a>] validate_chain+0x8be/0xbb5
> [<c0138280>] __lock_acquire+0x66f/0x6f9
> [<c0138365>] lock_acquire+0x5b/0x77
> [<c012e6f6>] down_write_nested+0x32/0x4f
> [<c011a711>] dup_mm+0xa5/0x2f2
> [<c011b312>] copy_process+0x98c/0xfeb
> [<c011bac7>] do_fork+0x120/0x29c
> [<c01016be>] sys_clone+0x25/0x2a
> [<c0102cdd>] sysenter_do_call+0x12/0x31
> [<ffffffff>] 0xffffffff
>
> -> #0 (&mm->mmap_sem){----}:
> [<c0137625>] validate_chain+0x5c9/0xbb5
> [<c0138280>] __lock_acquire+0x66f/0x6f9
> [<c0138365>] lock_acquire+0x5b/0x77
> [<c0168eb4>] might_fault+0x5f/0x7e
> [<eeb76d0a>] i915_emit_box+0x1d/0x20c [i915]
> [<eeb7705e>] i915_cmdbuffer+0x165/0x411 [i915]
> [<eeb5685b>] drm_ioctl+0x1a6/0x21b [drm]
> [<c0182b29>] vfs_ioctl+0x3d/0x50
> [<c0183029>] do_vfs_ioctl+0x41b/0x483
> [<c01830d1>] sys_ioctl+0x40/0x5a
> [<c0102cdd>] sysenter_do_call+0x12/0x31
> [<ffffffff>] 0xffffffff
First-Bad-Commit : 546b0974c39657017407c86fe79811100b60700d On Wednesday 04 March 2009, Wang Chen wrote:
> Rafael J. Wysocki said the following on 2009-3-4 3:25:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.28. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12419
> > Subject : possible circular locking dependency on i915 dma
> > Submitter : Wang Chen <wangchen@cn.fujitsu.com>
> > Date : 2009-01-08 14:11 (55 days old)
> > First-Bad-Commit:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=546b0974c39657017407c86fe79811100b60700d
> > References : http://marc.info/?l=linux-kernel&m=123142399720125&w=4
> >
>
> yet not fixed.
untested patch in my for-review branch: commit 891d7ad89882bd81377f09b6dd5823686cc6ba07 Author: Eric Anholt <eric@anholt.net> Date: Wed Mar 11 12:30:04 2009 -0700 drm/i915: Fix lock order reversal with cliprects and cmdbuf in non-GEM paths. This introduces allocation in the batch submission path that wasn't there previously, but this is a compatibility path so we care about simplicity more than performance. kernel.org bug #12419. Signed-off-by: Eric Anholt <eric@anholt.net> Testing the for-review tree (2.6.29-rc7-00158-gbe68829) on http://git.kernel.org/?p=linux/kernel/git/anholt/drm-intel.git;a=summary results in GL not working at all and messages like the following appearing in dmesg on my i915: [ 406.861101] [drm:drm_wait_vblank] *ERROR* failed to acquire vblank counter, -22 [ 406.863975] [drm:i915_cmdbuffer] *ERROR* i915_dispatch_cmdbuffer failed First-Bad-Commit : 546b0974c39657017407c86fe79811100b60700d Thanks for the report. I think this was the bug: - ret = i915_dispatch_cmdbuffer(dev, cmdbuf, cliprects, data); + ret = i915_dispatch_cmdbuffer(dev, cmdbuf, cliprects, batch_data); but I still haven't actually tested that path yet (been trying to get the GEM paths solid). New for-review pushed. OK if this latest fix went into for-review (it was kind of hard to tell because the dates are unchanged but the commit id has) then the problem has been resolved (only the classic [drm:drm_wait_vblank] *ERROR* failed to acquire vblank counter, -22 seems to be logged). On Monday 23 March 2009, Wang Chen wrote:
> Rafael J. Wysocki said the following on 2009-3-22 0:28:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.28. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12419
> > Subject : possible circular locking dependency on i915 dma
> > Submitter : Wang Chen <wangchen@cn.fujitsu.com>
> > Date : 2009-01-08 14:11 (73 days old)
>
> not yet fixed :)
(In reply to comment #13) > Thanks for the report. I think this was the bug: > > - ret = i915_dispatch_cmdbuffer(dev, cmdbuf, cliprects, data); > + ret = i915_dispatch_cmdbuffer(dev, cmdbuf, cliprects, batch_data); > > but I still haven't actually tested that path yet (been trying to get the GEM > paths solid). New for-review pushed. I don't find the code which have "ret = i915_dispatch_cmdbuffer(dev, cmdbuf, cliprects," for check it out, this is for libdrm, for kernel or for drive ? ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.29-16.fc10.i686.PAE #1 ------------------------------------------------------- X/2732 is trying to acquire lock: (&mm->mmap_sem){----}, at: [<c049c5ea>] might_fault+0x48/0x85 but task is already holding lock: (&dev->struct_mutex){--..}, at: [<f8353b97>] i915_gem_execbuffer+0x105/0xad7 [i915] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&dev->struct_mutex){--..}: [<c045a290>] __lock_acquire+0x9a8/0xb1b [<c045a45e>] lock_acquire+0x5b/0x81 [<c070ff00>] __mutex_lock_common+0xda/0x32e [<c07101fb>] mutex_lock_nested+0x33/0x3b [<f81bc7b0>] drm_gem_mmap+0x34/0xf8 [drm] [<c04a384f>] mmap_region+0x255/0x3f9 [<c04a3c3a>] do_mmap_pgoff+0x247/0x297 [<c040c739>] sys_mmap2+0x5f/0x80 [<c040966b>] sysenter_do_call+0x12/0x3f [<ffffffff>] 0xffffffff -> #0 (&mm->mmap_sem){----}: [<c045a165>] __lock_acquire+0x87d/0xb1b [<c045a45e>] lock_acquire+0x5b/0x81 [<c049c607>] might_fault+0x65/0x85 [<c055330c>] copy_from_user+0x2f/0x117 [<f8353d55>] i915_gem_execbuffer+0x2c3/0xad7 [i915] [<f81bb8c2>] drm_ioctl+0x1c4/0x241 [drm] [<c04c18c7>] vfs_ioctl+0x55/0x6e [<c04c1e18>] do_vfs_ioctl+0x46f/0x4a8 [<c04c1e96>] sys_ioctl+0x45/0x5f [<c040966b>] sysenter_do_call+0x12/0x3f [<ffffffff>] 0xffffffff other info that might help us debug this: 1 lock held by X/2732: #0: (&dev->struct_mutex){--..}, at: [<f8353b97>] i915_gem_execbuffer+0x105/0xad7 [i915] stack backtrace: Pid: 2732, comm: X Not tainted 2.6.29-16.fc10.i686.PAE #1 Call Trace: [<c070ec6f>] ? printk+0x14/0x1d [<c04596d1>] print_circular_bug_tail+0x5d/0x68 [<c045a165>] __lock_acquire+0x87d/0xb1b [<c045a45e>] lock_acquire+0x5b/0x81 [<c049c5ea>] ? might_fault+0x48/0x85 [<c049c607>] might_fault+0x65/0x85 [<c049c5ea>] ? might_fault+0x48/0x85 [<c055330c>] copy_from_user+0x2f/0x117 [<f8353d55>] i915_gem_execbuffer+0x2c3/0xad7 [i915] [<c04b0d29>] ? __slab_alloc+0x3d0/0x445 [<c049c625>] ? might_fault+0x83/0x85 [<c055330c>] ? copy_from_user+0x2f/0x117 [<f81bb8c2>] drm_ioctl+0x1c4/0x241 [drm] [<f8353a92>] ? i915_gem_execbuffer+0x0/0xad7 [i915] [<c04c18c7>] vfs_ioctl+0x55/0x6e [<c04c1e18>] do_vfs_ioctl+0x46f/0x4a8 [<c0556448>] ? _raw_spin_unlock+0x74/0x78 [<c04b6b92>] ? fsnotify_modify+0x54/0x5f [<c04b6d83>] ? do_sync_write+0x0/0xee [<c04b769f>] ? vfs_write+0xa9/0xe4 [<c04c1e96>] sys_ioctl+0x45/0x5f [<c040966b>] sysenter_do_call+0x12/0x3f |