Subject    : possible circular locking dependency on i915 dma
Submitter  : Wang Chen <wangchen@cn.fujitsu.com>
Date       : 2009-01-08 14:11
References : http://marc.info/?l=linux-kernel&m=123142399720125&w=4

This entry is being used for tracking a regression from 2.6.28. Please don't close it until the problem is fixed in the mainline.
Hello, I see something similar when I run GL apps, except that my traces show i915_gem_execbuffer instead of i915_cmdbuffer. Is this the same issue?
On Tuesday 20 January 2009, Wang Chen wrote:
> Rafael J. Wysocki said the following on 2009-1-20 5:32:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.28. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry  : http://bugzilla.kernel.org/show_bug.cgi?id=12419
> > Subject    : possible circular locking dependency on i915 dma
> > Submitter  : Wang Chen <wangchen@cn.fujitsu.com>
> > Date       : 2009-01-08 14:11 (12 days old)
> > References : http://marc.info/?l=linux-kernel&m=123142399720125&w=4
> >
>
> This regression is still there in mainline.
On Thursday 05 February 2009, Wang Chen wrote:
> Rafael J. Wysocki said the following on 2009-2-4 18:23:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.28. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry  : http://bugzilla.kernel.org/show_bug.cgi?id=12419
> > Subject    : possible circular locking dependency on i915 dma
> > Submitter  : Wang Chen <wangchen@cn.fujitsu.com>
> > Date       : 2009-01-08 14:11 (28 days old)
> > References : http://marc.info/?l=linux-kernel&m=123142399720125&w=4
> >
>
> status not changed.
I'm seeing this on my EeePC 900 with 2.6.29rc5. It's highly reproducible:

[ 172.040511] =======================================================
[ 172.040519] [ INFO: possible circular locking dependency detected ]
[ 172.040525] 2.6.29-rc5 #16
[ 172.040529] -------------------------------------------------------
[ 172.040534] ioquake3.i386/2981 is trying to acquire lock:
[ 172.040539]  (&mm->mmap_sem){----}, at: [<c017597b>] might_fault+0x4b/0xa0
[ 172.040556]
[ 172.040557] but task is already holding lock:
[ 172.040562]  (&dev->struct_mutex){--..}, at: [<c02b1020>] i915_cmdbuffer+0x90/0x530
[ 172.040575]
[ 172.040577] which lock already depends on the new lock.
[ 172.040579]
[ 172.040583]
[ 172.040584] the existing dependency chain (in reverse order) is:
[ 172.040588]
[ 172.040590] -> #2 (&dev->struct_mutex){--..}:
[ 172.040598]        [<c014a92d>] validate_chain+0xb6d/0x1150
[ 172.040609]        [<c014c00c>] __lock_acquire+0x31c/0x600
[ 172.040617]        [<c014c367>] lock_acquire+0x77/0xa0
[ 172.040624]        [<c0402376>] __mutex_lock_common+0x86/0x340
[ 172.040633]        [<c040270d>] mutex_lock_nested+0x3d/0x50
[ 172.040641]        [<c02a6a42>] drm_vm_open+0x32/0x50
[ 172.040649]        [<c01256a3>] dup_mm+0x233/0x320
[ 172.040658]        [<c0126893>] copy_process+0xcd3/0xf30
[ 172.040665]        [<c0126b74>] do_fork+0x84/0x350
[ 172.040672]        [<c01017b4>] sys_clone+0x34/0x40
[ 172.040680]        [<c0103355>] sysenter_do_call+0x12/0x35
[ 172.040687]        [<ffffffff>] 0xffffffff
[ 172.040693]
[ 172.040694] -> #1 (&mm->mmap_sem/1){--..}:
[ 172.040704]        [<c014a92d>] validate_chain+0xb6d/0x1150
[ 172.040712]        [<c014c00c>] __lock_acquire+0x31c/0x600
[ 172.040719]        [<c014c367>] lock_acquire+0x77/0xa0
[ 172.040727]        [<c013ea62>] down_write_nested+0x52/0x70
[ 172.040735]        [<c0125536>] dup_mm+0xc6/0x320
[ 172.040742]        [<c0126893>] copy_process+0xcd3/0xf30
[ 172.040750]        [<c0126b74>] do_fork+0x84/0x350
[ 172.040757]        [<c01017b4>] sys_clone+0x34/0x40
[ 172.040763]        [<c0103355>] sysenter_do_call+0x12/0x35
[ 172.040770]        [<ffffffff>] 0xffffffff
[ 172.040787]
[ 172.040788] -> #0 (&mm->mmap_sem){----}:
[ 172.040796]        [<c014a36a>] validate_chain+0x5aa/0x1150
[ 172.040804]        [<c014c00c>] __lock_acquire+0x31c/0x600
[ 172.040811]        [<c014c367>] lock_acquire+0x77/0xa0
[ 172.040819]        [<c01759ac>] might_fault+0x7c/0xa0
[ 172.040825]        [<c02b0664>] i915_emit_box+0x24/0x290
[ 172.040832]        [<c02b12e2>] i915_cmdbuffer+0x352/0x530
[ 172.040840]        [<c02a147d>] drm_ioctl+0xed/0x2d0
[ 172.040847]        [<c01931a7>] vfs_ioctl+0x67/0x70
On Monday 16 February 2009, Wang Chen wrote:
> Rafael J. Wysocki said the following on 2009-2-15 4:38:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.28. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry  : http://bugzilla.kernel.org/show_bug.cgi?id=12419
> > Subject    : possible circular locking dependency on i915 dma
> > Submitter  : Wang Chen <wangchen@cn.fujitsu.com>
> > Date       : 2009-01-08 14:11 (38 days old)
> > References : http://marc.info/?l=linux-kernel&m=123142399720125&w=4
> >
> >
>
> Yes. It's still there in mainline.
> I think the commit 546b0974c39657017407c86fe79811100b60700d
> "i915: Use struct_mutex to protect ring in GEM mode." brought this
> regression.
>
> The lockdep problem is as follows:
>
> thread-1
> i915_cmdbuffer()
>     |
>     ---> lock(drm_device->struct_mutex)
>     |
>     V
> i915_dispatch_cmdbuffer()
>     |
>     ----> i915_emit_box()
>               |
>               -----> copy_from_user()
>                          |
>                          -----> might_fault()
>                                     |
>                                     ---> lock(mm->mmap_sem)
>
> thread-2
> dup_mm()
>     |
>     ---> lock(mm->mmap_sem)
>     |
>     V
> drm_vm_open()
>     |
>     -------> lock(drm_device->struct_mutex)
>
> Taking "mmap_sem" and "drm_dev->struct_mutex" in different orders on these
> two paths introduces the problem.
> But there seems to be no way to reverse the lock order in i915.
> So how about refining the lock granularity of drm_dev->struct_mutex and
> moving the mmap_sem lock/unlock out of the drm_dev->struct_mutex
> lock/unlock range?
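For readers less familiar with lockdep, the two call chains above can be reduced to a small dependency graph. The following is only a userspace sketch of that idea, not the kernel's lockdep implementation: `lock_graph`, `reachable` and `record_acquire` are hypothetical names made up for this illustration, and the two lock classes stand in for mm->mmap_sem and dev->struct_mutex.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock classes for illustration; not kernel identifiers. */
enum lock_class { MMAP_SEM, STRUCT_MUTEX, NUM_CLASSES };

/* edge[a][b] is true once some task acquired b while already holding a. */
struct lock_graph {
    bool edge[NUM_CLASSES][NUM_CLASSES];
};

/* Depth-first search: is `to` reachable from `from` along recorded edges? */
bool reachable(const struct lock_graph *g, int from, int to, bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < NUM_CLASSES; next++) {
        if (g->edge[from][next] && !seen[next] &&
            reachable(g, next, to, seen))
            return true;
    }
    return false;
}

/*
 * Record "acquire `want` while holding `held`".
 * Returns false (and records nothing) if the new edge would close a cycle,
 * that is, if `held` is already reachable from `want`.
 */
bool record_acquire(struct lock_graph *g, int held, int want)
{
    bool seen[NUM_CLASSES] = { false };

    assert(held >= 0 && held < NUM_CLASSES);
    assert(want >= 0 && want < NUM_CLASSES);

    if (reachable(g, want, held, seen))
        return false;
    g->edge[held][want] = true;
    return true;
}
```

Feeding it the thread-2 order first (struct_mutex taken while holding mmap_sem, as in dup_mm() -> drm_vm_open()) succeeds; feeding it the thread-1 order afterwards (mmap_sem taken while holding struct_mutex, as in i915_cmdbuffer() -> might_fault()) is rejected, which mirrors the "possible circular locking dependency" warning.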
First-Bad-Commit : 546b0974c39657017407c86fe79811100b60700d
Notify-Also : Eric Anholt <eric@anholt.net>
Notify-Also : Dave Airlie <airlied@redhat.com>
On Tuesday 24 February 2009, Wang Chen wrote:
> Rafael J. Wysocki said the following on 2009-2-24 5:48:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.28. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry  : http://bugzilla.kernel.org/show_bug.cgi?id=12419
> > Subject    : possible circular locking dependency on i915 dma
> > Submitter  : Wang Chen <wangchen@cn.fujitsu.com>
> > Date       : 2009-01-08 14:11 (47 days old)
> > First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=546b0974c39657017407c86fe79811100b60700d
> > References : http://marc.info/?l=linux-kernel&m=123142399720125&w=4
> >
>
> not changed.
>
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.29-rc6-default #165
> -------------------------------------------------------
> X/3940 is trying to acquire lock:
>  (&mm->mmap_sem){----}, at: [<c0168e97>] might_fault+0x42/0x7e
>
> but task is already holding lock:
>  (&dev->struct_mutex){--..}, at: [<eeb76fed>] i915_cmdbuffer+0xf4/0x411 [i915]
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #2 (&dev->struct_mutex){--..}:
>        [<c013791a>] validate_chain+0x8be/0xbb5
>        [<c0138280>] __lock_acquire+0x66f/0x6f9
>        [<c0138365>] lock_acquire+0x5b/0x77
>        [<c02e56fe>] mutex_lock_nested+0xdb/0x244
>        [<eeb5b03e>] drm_vm_open+0x25/0x37 [drm]
>        [<c011a8b3>] dup_mm+0x247/0x2f2
>        [<c011b312>] copy_process+0x98c/0xfeb
>        [<c011bac7>] do_fork+0x120/0x29c
>        [<c01016be>] sys_clone+0x25/0x2a
>        [<c0102cdd>] sysenter_do_call+0x12/0x31
>        [<ffffffff>] 0xffffffff
>
> -> #1 (&mm->mmap_sem/1){--..}:
>        [<c013791a>] validate_chain+0x8be/0xbb5
>        [<c0138280>] __lock_acquire+0x66f/0x6f9
>        [<c0138365>] lock_acquire+0x5b/0x77
>        [<c012e6f6>] down_write_nested+0x32/0x4f
>        [<c011a711>] dup_mm+0xa5/0x2f2
>        [<c011b312>] copy_process+0x98c/0xfeb
>        [<c011bac7>] do_fork+0x120/0x29c
>        [<c01016be>] sys_clone+0x25/0x2a
>        [<c0102cdd>] sysenter_do_call+0x12/0x31
>        [<ffffffff>] 0xffffffff
>
> -> #0 (&mm->mmap_sem){----}:
>        [<c0137625>] validate_chain+0x5c9/0xbb5
>        [<c0138280>] __lock_acquire+0x66f/0x6f9
>        [<c0138365>] lock_acquire+0x5b/0x77
>        [<c0168eb4>] might_fault+0x5f/0x7e
>        [<eeb76d0a>] i915_emit_box+0x1d/0x20c [i915]
>        [<eeb7705e>] i915_cmdbuffer+0x165/0x411 [i915]
>        [<eeb5685b>] drm_ioctl+0x1a6/0x21b [drm]
>        [<c0182b29>] vfs_ioctl+0x3d/0x50
>        [<c0183029>] do_vfs_ioctl+0x41b/0x483
>        [<c01830d1>] sys_ioctl+0x40/0x5a
>        [<c0102cdd>] sysenter_do_call+0x12/0x31
>        [<ffffffff>] 0xffffffff
First-Bad-Commit : 546b0974c39657017407c86fe79811100b60700d
On Wednesday 04 March 2009, Wang Chen wrote:
> Rafael J. Wysocki said the following on 2009-3-4 3:25:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.28. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry  : http://bugzilla.kernel.org/show_bug.cgi?id=12419
> > Subject    : possible circular locking dependency on i915 dma
> > Submitter  : Wang Chen <wangchen@cn.fujitsu.com>
> > Date       : 2009-01-08 14:11 (55 days old)
> > First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=546b0974c39657017407c86fe79811100b60700d
> > References : http://marc.info/?l=linux-kernel&m=123142399720125&w=4
> >
>
> yet not fixed.
untested patch in my for-review branch:

commit 891d7ad89882bd81377f09b6dd5823686cc6ba07
Author: Eric Anholt <eric@anholt.net>
Date:   Wed Mar 11 12:30:04 2009 -0700

    drm/i915: Fix lock order reversal with cliprects and cmdbuf in non-GEM paths.

    This introduces allocation in the batch submission path that wasn't there
    previously, but this is a compatibility path so we care about simplicity
    more than performance.

    kernel.org bug #12419.

    Signed-off-by: Eric Anholt <eric@anholt.net>
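The patch body is not shown in this thread, so the following is only a guess at the general shape such a fix can take, based on the commit message's mention of a new allocation in the submission path: copy the user-supplied data into a private buffer before taking the mutex, so that no user-memory access (and hence no mmap_sem acquisition) happens while the mutex is held. It is a userspace sketch with a pthread mutex; `struct box`, `copy_in` and `submit_boxes` are hypothetical names, and `copy_in` merely stands in for copy_from_user().

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a clip rectangle; hypothetical layout for illustration. */
struct box { int x1, y1, x2, y2; };

/* Stand-in for dev->struct_mutex. */
static pthread_mutex_t struct_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Stand-in for copy_from_user(). In the kernel this access may fault and
 * take mmap_sem, which is why it must not run under struct_mutex.
 */
int copy_in(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
    return 0;
}

/* Returns the number of boxes emitted, or -1 on failure. */
int submit_boxes(const struct box *user_boxes, size_t count,
                 struct box *ring, size_t ring_cap)
{
    struct box *copy;
    size_t i;

    if (count == 0 || count > ring_cap)
        return -1;

    /* Step 1: allocate and copy with no lock held. */
    copy = calloc(count, sizeof(*copy));
    if (!copy)
        return -1;
    if (copy_in(copy, user_boxes, count * sizeof(*copy)) != 0) {
        free(copy);
        return -1;
    }

    /* Step 2: take the mutex and touch only the private copy. */
    pthread_mutex_lock(&struct_mutex);
    for (i = 0; i < count; i++)
        ring[i] = copy[i];
    pthread_mutex_unlock(&struct_mutex);

    free(copy);
    return (int)count;
}
```

The cost is one extra allocation and copy per submission, which matches the trade-off the commit message accepts for a compatibility path.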
Testing the for-review tree (2.6.29-rc7-00158-gbe68829) on http://git.kernel.org/?p=linux/kernel/git/anholt/drm-intel.git;a=summary results in GL not working at all and messages like the following appearing in dmesg on my i915:

[ 406.861101] [drm:drm_wait_vblank] *ERROR* failed to acquire vblank counter, -22
[ 406.863975] [drm:i915_cmdbuffer] *ERROR* i915_dispatch_cmdbuffer failed
Thanks for the report. I think this was the bug:

-	ret = i915_dispatch_cmdbuffer(dev, cmdbuf, cliprects, data);
+	ret = i915_dispatch_cmdbuffer(dev, cmdbuf, cliprects, batch_data);

but I still haven't actually tested that path yet (been trying to get the GEM paths solid). New for-review pushed.
OK, if this latest fix went into for-review (it was kind of hard to tell because the dates are unchanged but the commit id has changed), then the problem has been resolved. Only the classic

[drm:drm_wait_vblank] *ERROR* failed to acquire vblank counter, -22

seems to be logged.
On Monday 23 March 2009, Wang Chen wrote:
> Rafael J. Wysocki said the following on 2009-3-22 0:28:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.28. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry  : http://bugzilla.kernel.org/show_bug.cgi?id=12419
> > Subject    : possible circular locking dependency on i915 dma
> > Submitter  : Wang Chen <wangchen@cn.fujitsu.com>
> > Date       : 2009-01-08 14:11 (73 days old)
>
> not yet fixed :)
(In reply to comment #13)
> Thanks for the report. I think this was the bug:
>
> -	ret = i915_dispatch_cmdbuffer(dev, cmdbuf, cliprects, data);
> +	ret = i915_dispatch_cmdbuffer(dev, cmdbuf, cliprects, batch_data);
>
> but I still haven't actually tested that path yet (been trying to get the GEM
> paths solid). New for-review pushed.

I can't find the code containing "ret = i915_dispatch_cmdbuffer(dev, cmdbuf, cliprects," to check it out. Is this for libdrm, for the kernel, or for the driver?

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.29-16.fc10.i686.PAE #1
-------------------------------------------------------
X/2732 is trying to acquire lock:
 (&mm->mmap_sem){----}, at: [<c049c5ea>] might_fault+0x48/0x85

but task is already holding lock:
 (&dev->struct_mutex){--..}, at: [<f8353b97>] i915_gem_execbuffer+0x105/0xad7 [i915]

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (&dev->struct_mutex){--..}:
       [<c045a290>] __lock_acquire+0x9a8/0xb1b
       [<c045a45e>] lock_acquire+0x5b/0x81
       [<c070ff00>] __mutex_lock_common+0xda/0x32e
       [<c07101fb>] mutex_lock_nested+0x33/0x3b
       [<f81bc7b0>] drm_gem_mmap+0x34/0xf8 [drm]
       [<c04a384f>] mmap_region+0x255/0x3f9
       [<c04a3c3a>] do_mmap_pgoff+0x247/0x297
       [<c040c739>] sys_mmap2+0x5f/0x80
       [<c040966b>] sysenter_do_call+0x12/0x3f
       [<ffffffff>] 0xffffffff

-> #0 (&mm->mmap_sem){----}:
       [<c045a165>] __lock_acquire+0x87d/0xb1b
       [<c045a45e>] lock_acquire+0x5b/0x81
       [<c049c607>] might_fault+0x65/0x85
       [<c055330c>] copy_from_user+0x2f/0x117
       [<f8353d55>] i915_gem_execbuffer+0x2c3/0xad7 [i915]
       [<f81bb8c2>] drm_ioctl+0x1c4/0x241 [drm]
       [<c04c18c7>] vfs_ioctl+0x55/0x6e
       [<c04c1e18>] do_vfs_ioctl+0x46f/0x4a8
       [<c04c1e96>] sys_ioctl+0x45/0x5f
       [<c040966b>] sysenter_do_call+0x12/0x3f
       [<ffffffff>] 0xffffffff

other info that might help us debug this:

1 lock held by X/2732:
 #0:  (&dev->struct_mutex){--..}, at: [<f8353b97>] i915_gem_execbuffer+0x105/0xad7 [i915]

stack backtrace:
Pid: 2732, comm: X Not tainted 2.6.29-16.fc10.i686.PAE #1
Call Trace:
 [<c070ec6f>] ? printk+0x14/0x1d
 [<c04596d1>] print_circular_bug_tail+0x5d/0x68
 [<c045a165>] __lock_acquire+0x87d/0xb1b
 [<c045a45e>] lock_acquire+0x5b/0x81
 [<c049c5ea>] ? might_fault+0x48/0x85
 [<c049c607>] might_fault+0x65/0x85
 [<c049c5ea>] ? might_fault+0x48/0x85
 [<c055330c>] copy_from_user+0x2f/0x117
 [<f8353d55>] i915_gem_execbuffer+0x2c3/0xad7 [i915]
 [<c04b0d29>] ? __slab_alloc+0x3d0/0x445
 [<c049c625>] ? might_fault+0x83/0x85
 [<c055330c>] ? copy_from_user+0x2f/0x117
 [<f81bb8c2>] drm_ioctl+0x1c4/0x241 [drm]
 [<f8353a92>] ? i915_gem_execbuffer+0x0/0xad7 [i915]
 [<c04c18c7>] vfs_ioctl+0x55/0x6e
 [<c04c1e18>] do_vfs_ioctl+0x46f/0x4a8
 [<c0556448>] ? _raw_spin_unlock+0x74/0x78
 [<c04b6b92>] ? fsnotify_modify+0x54/0x5f
 [<c04b6d83>] ? do_sync_write+0x0/0xee
 [<c04b769f>] ? vfs_write+0xa9/0xe4
 [<c04c1e96>] sys_ioctl+0x45/0x5f
 [<c040966b>] sysenter_do_call+0x12/0x3f