Bug 12419

Summary: possible circular locking dependency on i915 dma
Product: Drivers Reporter: Rafael J. Wysocki (rjw)
Component: Video(DRI - non Intel)Assignee: drivers_video-dri
Status: CLOSED CODE_FIX    
Severity: normal CC: james, jobi, sergio, sitsofe
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.28-git Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 12398    

Description Rafael J. Wysocki 2009-01-10 17:02:11 UTC
Subject    : possible circular locking dependency on i915 dma
Submitter  : Wang Chen <wangchen@cn.fujitsu.com>
Date       : 2009-01-08 14:11
References : http://marc.info/?l=linux-kernel&m=123142399720125&w=4

This entry is being used for tracking a regression from 2.6.28.  Please don't
close it until the problem is fixed in the mainline.
Comment 1 James Ettle 2009-01-30 01:35:07 UTC
Hello, I see something similar when I run GL apps except in my traces I have i915_gem_execbuffer instead of i915_cmdbuffer --- is this the same issue?
Comment 2 Rafael J. Wysocki 2009-02-03 15:02:14 UTC
On Tuesday 20 January 2009, Wang Chen wrote:
> Rafael J. Wysocki said the following on 2009-1-20 5:32:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> > 
> > The following bug entry is on the current list of known regressions
> > from 2.6.28.  Please verify if it still should be listed and let me know
> > (either way).
> > 
> > 
> > Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=12419
> > Subject             : possible circular locking dependency on i915 dma
> > Submitter   : Wang Chen <wangchen@cn.fujitsu.com>
> > Date                : 2009-01-08 14:11 (12 days old)
> > References  : http://marc.info/?l=linux-kernel&m=123142399720125&w=4
> > 
> 
> This regression is still there in mainline.
Comment 3 Rafael J. Wysocki 2009-02-05 07:59:08 UTC
On Thursday 05 February 2009, Wang Chen wrote:
> Rafael J. Wysocki said the following on 2009-2-4 18:23:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> > 
> > The following bug entry is on the current list of known regressions
> > from 2.6.28.  Please verify if it still should be listed and let me know
> > (either way).
> > 
> > 
> > Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=12419
> > Subject             : possible circular locking dependency on i915 dma
> > Submitter   : Wang Chen <wangchen@cn.fujitsu.com>
> > Date                : 2009-01-08 14:11 (28 days old)
> > References  : http://marc.info/?l=linux-kernel&m=123142399720125&w=4
> > 
> 
> status not changed.
Comment 4 Sitsofe Wheeler 2009-02-15 07:58:12 UTC
I'm seeing this on my EeePC 900 with 2.6.29rc5. It's highly reproducible:

[  172.040511] =======================================================
[  172.040519] [ INFO: possible circular locking dependency detected ]
[  172.040525] 2.6.29-rc5 #16
[  172.040529] -------------------------------------------------------
[  172.040534] ioquake3.i386/2981 is trying to acquire lock:
[  172.040539]  (&mm->mmap_sem){----}, at: [<c017597b>] might_fault+0x4b/0xa0
[  172.040556] 
[  172.040557] but task is already holding lock:
[  172.040562]  (&dev->struct_mutex){--..}, at: [<c02b1020>] i915_cmdbuffer+0x90/0x530
[  172.040575] 
[  172.040577] which lock already depends on the new lock.
[  172.040579] 
[  172.040583] 
[  172.040584] the existing dependency chain (in reverse order) is:
[  172.040588] 
[  172.040590] -> #2 (&dev->struct_mutex){--..}:
[  172.040598]        [<c014a92d>] validate_chain+0xb6d/0x1150
[  172.040609]        [<c014c00c>] __lock_acquire+0x31c/0x600
[  172.040617]        [<c014c367>] lock_acquire+0x77/0xa0
[  172.040624]        [<c0402376>] __mutex_lock_common+0x86/0x340
[  172.040633]        [<c040270d>] mutex_lock_nested+0x3d/0x50
[  172.040641]        [<c02a6a42>] drm_vm_open+0x32/0x50
[  172.040649]        [<c01256a3>] dup_mm+0x233/0x320
[  172.040658]        [<c0126893>] copy_process+0xcd3/0xf30
[  172.040665]        [<c0126b74>] do_fork+0x84/0x350
[  172.040672]        [<c01017b4>] sys_clone+0x34/0x40
[  172.040680]        [<c0103355>] sysenter_do_call+0x12/0x35
[  172.040687]        [<ffffffff>] 0xffffffff
[  172.040693] 
[  172.040694] -> #1 (&mm->mmap_sem/1){--..}:
[  172.040704]        [<c014a92d>] validate_chain+0xb6d/0x1150
[  172.040712]        [<c014c00c>] __lock_acquire+0x31c/0x600
[  172.040719]        [<c014c367>] lock_acquire+0x77/0xa0
[  172.040727]        [<c013ea62>] down_write_nested+0x52/0x70
[  172.040735]        [<c0125536>] dup_mm+0xc6/0x320
[  172.040742]        [<c0126893>] copy_process+0xcd3/0xf30
[  172.040750]        [<c0126b74>] do_fork+0x84/0x350
[  172.040757]        [<c01017b4>] sys_clone+0x34/0x40
[  172.040763]        [<c0103355>] sysenter_do_call+0x12/0x35
[  172.040770]        [<ffffffff>] 0xffffffff
[  172.040787] 
[  172.040788] -> #0 (&mm->mmap_sem){----}:
[  172.040796]        [<c014a36a>] validate_chain+0x5aa/0x1150
[  172.040804]        [<c014c00c>] __lock_acquire+0x31c/0x600
[  172.040811]        [<c014c367>] lock_acquire+0x77/0xa0
[  172.040819]        [<c01759ac>] might_fault+0x7c/0xa0
[  172.040825]        [<c02b0664>] i915_emit_box+0x24/0x290
[  172.040832]        [<c02b12e2>] i915_cmdbuffer+0x352/0x530
[  172.040840]        [<c02a147d>] drm_ioctl+0xed/0x2d0
[  172.040847]        [<c01931a7>] vfs_ioctl+0x67/0x70
Comment 5 Rafael J. Wysocki 2009-02-16 12:43:55 UTC
On Monday 16 February 2009, Wang Chen wrote:
> Rafael J. Wysocki said the following on 2009-2-15 4:38:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> > 
> > The following bug entry is on the current list of known regressions
> > from 2.6.28.  Please verify if it still should be listed and let me know
> > (either way).
> > 
> > 
> > Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=12419
> > Subject             : possible circular locking dependency on i915 dma
> > Submitter   : Wang Chen <wangchen@cn.fujitsu.com>
> > Date                : 2009-01-08 14:11 (38 days old)
> > References  : http://marc.info/?l=linux-kernel&m=123142399720125&w=4
> > 
> > 
> 
> Yes. It's still there in mainline.
> I think the commit 546b0974c39657017407c86fe79811100b60700d
> "i915: Use struct_mutex to protect ring in GEM mode." brought this
> regression.
> 
> The lockdep problem is as following:
> thread-1
> i915_cmdbuffer()
>       |
>       ---> lock(drm_device->struct_mutex)
>                    |
>                  V
>       i915_dispatch_cmdbuffer()
>                  |
>                  ---->i915_emit_box()
>                              |
>                              ----->copy_from_user()
>                                       |
>                                       -----might_fault()
>                                               |
>                                               --->lock(mm->mmap_sem)
> 
> thread-2
> dup_mm()
>    |
>    --->lock(mm->mmap_sem)
>            |
>          V
>       drm_vm_open()
>          |
>          -------> lock(drm_device->struct_mutex)
> 
> The different order to lock "mmap_sem" and "drm_dev->struct_mutex" introduces
> the problem.
> But it seems no way to reverse the lock order in i915.
> So how about refine the lock granularity of drm_dev->struct_mutex and exclude
> the mmap_sem
> lock/unlock out of the drm_dev->struct_mutex lock/unlock range?
Comment 6 Rafael J. Wysocki 2009-02-16 12:48:57 UTC
First-Bad-Commit : 546b0974c39657017407c86fe79811100b60700d

Notify-Also : Eric Anholt <eric@anholt.net>
Notify-Also : Dave Airlie <airlied@redhat.com>
Comment 7 Rafael J. Wysocki 2009-02-25 14:54:26 UTC
On Tuesday 24 February 2009, Wang Chen wrote:
> Rafael J. Wysocki said the following on 2009-2-24 5:48:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> > 
> > The following bug entry is on the current list of known regressions
> > from 2.6.28.  Please verify if it still should be listed and let me know
> > (either way).
> > 
> > 
> > Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=12419
> > Subject             : possible circular locking dependency on i915 dma
> > Submitter   : Wang Chen <wangchen@cn.fujitsu.com>
> > Date                : 2009-01-08 14:11 (47 days old)
> > First-Bad-Commit:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=546b0974c39657017407c86fe79811100b60700d
> > References  : http://marc.info/?l=linux-kernel&m=123142399720125&w=4
> > 
> 
> not changed.
> 
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.29-rc6-default #165
> -------------------------------------------------------
> X/3940 is trying to acquire lock:
>  (&mm->mmap_sem){----}, at: [<c0168e97>] might_fault+0x42/0x7e
> 
> but task is already holding lock:
>  (&dev->struct_mutex){--..}, at: [<eeb76fed>] i915_cmdbuffer+0xf4/0x411
>  [i915]
> 
> which lock already depends on the new lock.
> 
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #2 (&dev->struct_mutex){--..}:
>        [<c013791a>] validate_chain+0x8be/0xbb5
>        [<c0138280>] __lock_acquire+0x66f/0x6f9
>        [<c0138365>] lock_acquire+0x5b/0x77
>        [<c02e56fe>] mutex_lock_nested+0xdb/0x244
>        [<eeb5b03e>] drm_vm_open+0x25/0x37 [drm]
>        [<c011a8b3>] dup_mm+0x247/0x2f2
>        [<c011b312>] copy_process+0x98c/0xfeb
>        [<c011bac7>] do_fork+0x120/0x29c
>        [<c01016be>] sys_clone+0x25/0x2a
>        [<c0102cdd>] sysenter_do_call+0x12/0x31
>        [<ffffffff>] 0xffffffff
> 
> -> #1 (&mm->mmap_sem/1){--..}:
>        [<c013791a>] validate_chain+0x8be/0xbb5
>        [<c0138280>] __lock_acquire+0x66f/0x6f9
>        [<c0138365>] lock_acquire+0x5b/0x77
>        [<c012e6f6>] down_write_nested+0x32/0x4f
>        [<c011a711>] dup_mm+0xa5/0x2f2
>        [<c011b312>] copy_process+0x98c/0xfeb
>        [<c011bac7>] do_fork+0x120/0x29c
>        [<c01016be>] sys_clone+0x25/0x2a
>        [<c0102cdd>] sysenter_do_call+0x12/0x31
>        [<ffffffff>] 0xffffffff
> 
> -> #0 (&mm->mmap_sem){----}:
>        [<c0137625>] validate_chain+0x5c9/0xbb5
>        [<c0138280>] __lock_acquire+0x66f/0x6f9
>        [<c0138365>] lock_acquire+0x5b/0x77
>        [<c0168eb4>] might_fault+0x5f/0x7e
>        [<eeb76d0a>] i915_emit_box+0x1d/0x20c [i915]
>        [<eeb7705e>] i915_cmdbuffer+0x165/0x411 [i915]
>        [<eeb5685b>] drm_ioctl+0x1a6/0x21b [drm]
>        [<c0182b29>] vfs_ioctl+0x3d/0x50
>        [<c0183029>] do_vfs_ioctl+0x41b/0x483
>        [<c01830d1>] sys_ioctl+0x40/0x5a
>        [<c0102cdd>] sysenter_do_call+0x12/0x31
>        [<ffffffff>] 0xffffffff
Comment 8 Rafael J. Wysocki 2009-03-02 14:17:39 UTC
First-Bad-Commit : 546b0974c39657017407c86fe79811100b60700d
Comment 9 Rafael J. Wysocki 2009-03-04 11:44:46 UTC
On Wednesday 04 March 2009, Wang Chen wrote:
> Rafael J. Wysocki said the following on 2009-3-4 3:25:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> > 
> > The following bug entry is on the current list of known regressions
> > from 2.6.28.  Please verify if it still should be listed and let me know
> > (either way).
> > 
> > 
> > Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=12419
> > Subject             : possible circular locking dependency on i915 dma
> > Submitter   : Wang Chen <wangchen@cn.fujitsu.com>
> > Date                : 2009-01-08 14:11 (55 days old)
> > First-Bad-Commit:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=546b0974c39657017407c86fe79811100b60700d
> > References  : http://marc.info/?l=linux-kernel&m=123142399720125&w=4
> > 
> 
> yet not fixed.
Comment 10 Eric Anholt 2009-03-11 12:37:18 UTC
untested patch in my for-review branch:

commit 891d7ad89882bd81377f09b6dd5823686cc6ba07
Author: Eric Anholt <eric@anholt.net>
Date:   Wed Mar 11 12:30:04 2009 -0700

    drm/i915: Fix lock order reversal with cliprects and cmdbuf in non-GEM paths.
    
    This introduces allocation in the batch submission path that wasn't there
    previously, but this is a compatibility path so we care about simplicity
    more than performance.
    
    kernel.org bug #12419.
    
    Signed-off-by: Eric Anholt <eric@anholt.net>
Comment 11 Sitsofe Wheeler 2009-03-14 07:43:09 UTC
Testing the for-review tree (2.6.29-rc7-00158-gbe68829) on http://git.kernel.org/?p=linux/kernel/git/anholt/drm-intel.git;a=summary results in GL not working at all and messages like the following appearing in dmesg on my i915:

[  406.861101] [drm:drm_wait_vblank] *ERROR* failed to acquire vblank counter, -22
[  406.863975] [drm:i915_cmdbuffer] *ERROR* i915_dispatch_cmdbuffer failed
Comment 12 Rafael J. Wysocki 2009-03-14 14:10:44 UTC
First-Bad-Commit : 546b0974c39657017407c86fe79811100b60700d
Comment 13 Eric Anholt 2009-03-16 15:22:18 UTC
Thanks for the report.  I think this was the bug:

-	ret = i915_dispatch_cmdbuffer(dev, cmdbuf, cliprects, data);
+	ret = i915_dispatch_cmdbuffer(dev, cmdbuf, cliprects, batch_data);

but I still haven't actually tested that path yet (been trying to get the GEM paths solid).  New for-review pushed.
Comment 14 Sitsofe Wheeler 2009-03-17 08:01:53 UTC
OK if this latest fix went into for-review (it was kind of hard to tell because the dates are unchanged but the commit id has) then the problem has been resolved (only the classic [drm:drm_wait_vblank] *ERROR* failed to acquire vblank counter, -22 seems to be logged).
Comment 15 Rafael J. Wysocki 2009-03-23 15:19:10 UTC
On Monday 23 March 2009, Wang Chen wrote:
> Rafael J. Wysocki said the following on 2009-3-22 0:28:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> > 
> > The following bug entry is on the current list of known regressions
> > from 2.6.28.  Please verify if it still should be listed and let me know
> > (either way).
> > 
> > 
> > Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=12419
> > Subject             : possible circular locking dependency on i915 dma
> > Submitter   : Wang Chen <wangchen@cn.fujitsu.com>
> > Date                : 2009-01-08 14:11 (73 days old)
> 
> not yet fixed :)
Comment 16 Sérgio M Basto 2009-03-31 02:10:22 UTC
(In reply to comment #13)
> Thanks for the report.  I think this was the bug:
> 
> -    ret = i915_dispatch_cmdbuffer(dev, cmdbuf, cliprects, data);
> +    ret = i915_dispatch_cmdbuffer(dev, cmdbuf, cliprects, batch_data);
> 
> but I still haven't actually tested that path yet (been trying to get the GEM
> paths solid).  New for-review pushed.

I don't find the code which have "ret = i915_dispatch_cmdbuffer(dev, cmdbuf, cliprects," for check it out, this is for libdrm, for kernel or for drive ? 

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.29-16.fc10.i686.PAE #1                             
-------------------------------------------------------
X/2732 is trying to acquire lock:                      
 (&mm->mmap_sem){----}, at: [<c049c5ea>] might_fault+0x48/0x85

but task is already holding lock:
 (&dev->struct_mutex){--..}, at: [<f8353b97>] i915_gem_execbuffer+0x105/0xad7 [i915]

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (&dev->struct_mutex){--..}:
       [<c045a290>] __lock_acquire+0x9a8/0xb1b
       [<c045a45e>] lock_acquire+0x5b/0x81    
       [<c070ff00>] __mutex_lock_common+0xda/0x32e
       [<c07101fb>] mutex_lock_nested+0x33/0x3b   
       [<f81bc7b0>] drm_gem_mmap+0x34/0xf8 [drm]  
       [<c04a384f>] mmap_region+0x255/0x3f9       
       [<c04a3c3a>] do_mmap_pgoff+0x247/0x297     
       [<c040c739>] sys_mmap2+0x5f/0x80           
       [<c040966b>] sysenter_do_call+0x12/0x3f    
       [<ffffffff>] 0xffffffff                    

-> #0 (&mm->mmap_sem){----}:
       [<c045a165>] __lock_acquire+0x87d/0xb1b
       [<c045a45e>] lock_acquire+0x5b/0x81    
       [<c049c607>] might_fault+0x65/0x85     
       [<c055330c>] copy_from_user+0x2f/0x117 
       [<f8353d55>] i915_gem_execbuffer+0x2c3/0xad7 [i915]
       [<f81bb8c2>] drm_ioctl+0x1c4/0x241 [drm]           
       [<c04c18c7>] vfs_ioctl+0x55/0x6e                   
       [<c04c1e18>] do_vfs_ioctl+0x46f/0x4a8              
       [<c04c1e96>] sys_ioctl+0x45/0x5f                   
       [<c040966b>] sysenter_do_call+0x12/0x3f            
       [<ffffffff>] 0xffffffff                            

other info that might help us debug this:

1 lock held by X/2732:
 #0:  (&dev->struct_mutex){--..}, at: [<f8353b97>] i915_gem_execbuffer+0x105/0xad7 [i915]

stack backtrace:
Pid: 2732, comm: X Not tainted 2.6.29-16.fc10.i686.PAE #1
Call Trace:                                              
 [<c070ec6f>] ? printk+0x14/0x1d                         
 [<c04596d1>] print_circular_bug_tail+0x5d/0x68          
 [<c045a165>] __lock_acquire+0x87d/0xb1b                 
 [<c045a45e>] lock_acquire+0x5b/0x81                     
 [<c049c5ea>] ? might_fault+0x48/0x85                    
 [<c049c607>] might_fault+0x65/0x85                      
 [<c049c5ea>] ? might_fault+0x48/0x85                    
 [<c055330c>] copy_from_user+0x2f/0x117                  
 [<f8353d55>] i915_gem_execbuffer+0x2c3/0xad7 [i915]     
 [<c04b0d29>] ? __slab_alloc+0x3d0/0x445                 
 [<c049c625>] ? might_fault+0x83/0x85                    
 [<c055330c>] ? copy_from_user+0x2f/0x117                
 [<f81bb8c2>] drm_ioctl+0x1c4/0x241 [drm]                
 [<f8353a92>] ? i915_gem_execbuffer+0x0/0xad7 [i915]     
 [<c04c18c7>] vfs_ioctl+0x55/0x6e                        
 [<c04c1e18>] do_vfs_ioctl+0x46f/0x4a8                   
 [<c0556448>] ? _raw_spin_unlock+0x74/0x78               
 [<c04b6b92>] ? fsnotify_modify+0x54/0x5f
 [<c04b6d83>] ? do_sync_write+0x0/0xee
 [<c04b769f>] ? vfs_write+0xa9/0xe4
 [<c04c1e96>] sys_ioctl+0x45/0x5f
 [<c040966b>] sysenter_do_call+0x12/0x3f