Bug 13809 - oprofile: possible circular locking dependency detected
Summary: oprofile: possible circular locking dependency detected
Status: CLOSED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Other (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: drivers_other
URL:
Keywords:
Depends on:
Blocks: 13615
  Show dependency tree
 
Reported: 2009-07-22 13:35 UTC by Jerome Marchand
Modified: 2012-03-13 22:57 UTC (History)
4 users (show)

See Also:
Kernel Version: 2.6.31-rc2
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
config file (79.66 KB, application/octet-stream)
2009-08-05 14:28 UTC, Jerome Marchand
Details

Description Jerome Marchand 2009-07-22 13:35:13 UTC
I've hit a "possible circular locking dependency detected" while trying to use oprofile while doing heavy I/O.

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.31-rc2 #53
-------------------------------------------------------
oprofiled/6216 is trying to acquire lock:
 (&mm->mmap_sem){++++++}, at: [<c048e1e1>] might_fault+0x43/0x80

but task is already holding lock:
 (dcookie_mutex){+.+.+.}, at: [<c04e576b>] sys_lookup_dcookie+0x2a/0x11c

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (dcookie_mutex){+.+.+.}:
       [<c044d0ad>] __lock_acquire+0x987/0xaf6
       [<c044d2a5>] lock_acquire+0x89/0xa6
       [<c0691244>] __mutex_lock_common+0x35/0x2bc
       [<c0691563>] mutex_lock_nested+0x2e/0x36
       [<c04e587d>] get_dcookie+0x20/0xf7
       [<faaeab83>] sync_buffer+0x180/0x395 [oprofile]
       [<faaeada6>] task_exit_notify+0xe/0x12 [oprofile]
       [<c0693f7f>] notifier_call_chain+0x2b/0x4a
       [<c0443701>] __blocking_notifier_call_chain+0x37/0x4c
       [<c0443722>] blocking_notifier_call_chain+0xc/0xe
       [<c0445d86>] profile_task_exit+0x11/0x13
       [<c0431469>] do_exit+0x19/0x5ef
       [<c0431a9d>] do_group_exit+0x5e/0x85
       [<c0431ad7>] sys_exit_group+0x13/0x17
       [<c04029b4>] sysenter_do_call+0x12/0x32
       [<ffffffff>] 0xffffffff

-> #0 (&mm->mmap_sem){++++++}:
       [<c044cfbf>] __lock_acquire+0x899/0xaf6
       [<c044d2a5>] lock_acquire+0x89/0xa6
       [<c048e1fe>] might_fault+0x60/0x80
       [<c05348d8>] copy_to_user+0x2c/0xfc
       [<c04e5831>] sys_lookup_dcookie+0xf0/0x11c
       [<c04029b4>] sysenter_do_call+0x12/0x32
       [<ffffffff>] 0xffffffff

other info that might help us debug this:

1 lock held by oprofiled/6216:
 #0:  (dcookie_mutex){+.+.+.}, at: [<c04e576b>] sys_lookup_dcookie+0x2a/0x11c

stack backtrace:
Pid: 6216, comm: oprofiled Not tainted 2.6.31-rc2 #53
Call Trace:
 [<c0690158>] ? printk+0xf/0x17
 [<c044c462>] print_circular_bug_tail+0x5d/0x68
 [<c044cfbf>] __lock_acquire+0x899/0xaf6
 [<c044d2a5>] lock_acquire+0x89/0xa6
 [<c048e1e1>] ? might_fault+0x43/0x80
 [<c048e1fe>] might_fault+0x60/0x80
 [<c048e1e1>] ? might_fault+0x43/0x80
 [<c05348d8>] copy_to_user+0x2c/0xfc
 [<c04e5831>] sys_lookup_dcookie+0xf0/0x11c
 [<c04029b4>] sysenter_do_call+0x12/0x32
Comment 1 Rafael J. Wysocki 2009-07-22 17:32:47 UTC
Was 2.6.30 OK?
Comment 2 Jerome Marchand 2009-07-23 08:04:25 UTC
I haven't tested it.
Comment 3 Jerome Marchand 2009-08-05 14:28:16 UTC
Created attachment 22616 [details]
config file
Comment 4 Rafael J. Wysocki 2009-08-05 15:35:36 UTC
On Wednesday 05 August 2009, Jerome Marchand wrote:
> Robert Richter wrote:
> > I also tested v2.6.31-rc4 with prove-locking enabled and could not
> > trigger the warning. My regression runs fine. Please provide more
> > information (workloads, oprofile setup, cpu info, kernel config) to be
> > able to reproduce this.
> > 
> > Thanks,
> > 
> > -Robert
> > 
> 
> Hi Robert,
> 
> I hit the warning on a dual i686 box, with both v2.6.31-rc2 and v2.6.31-rc4.
> I can actually trigger it by just running opcontrol --start and then
> opreport.
> It happens with both --separate=none and --separate=all options (the oprofile
> deamon using otherwise default options).
> About the kernel config, prove_locking is on, oprofile is a module and
> oprofile_ibs is not set. For maor detail, I've attached the config file to
> the bugzilla.
> 
> I've tried to reproduce it on 2.6.31-rc5 without success.
Comment 5 Andreas Mohr 2009-12-15 06:40:38 UTC
P3/700 UP Inspiron 8000 (my netbook crashed, el-cheapo SSD, libata "Media error" exceptions after 15 months ;-P), Debian testing,

ii  oprofile                             0.9.5-1                            system-wide profiler for Linux systems
ii  oprofile-common                      0.9.2-4                            system-wide profiler for Linux systems (command line components)
ii  oprofile-gui                         0.9.5-1                            system-wide profiler for Linux systems (GUI components)
, 
happened while having sysvinit analyze concurrent booting possibility,
CONFIG_OPROFILE_EVENT_MULTIPLEX=y: 




CONFIG_HAVE_PERF_EVENTS=y

#
# Kernel Performance Events And Counters
#
CONFIG_PERF_EVENTS=y
CONFIG_EVENT_PROFILE=y
# CONFIG_PERF_COUNTERS is not set
# CONFIG_DEBUG_PERF_USE_VMALLOC is not set
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_PCI_QUIRKS=y
CONFIG_COMPAT_BRK=y
CONFIG_SLAB=y
# CONFIG_SLUB is not set
# CONFIG_SLOB is not set
CONFIG_PROFILING=y
CONFIG_TRACEPOINTS=y
CONFIG_OPROFILE=y
CONFIG_OPROFILE_EVENT_MULTIPLEX=y
CONFIG_HAVE_OPROFILE=y
# CONFIG_KPROBES is not set
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_HAVE_IOREMAP_PROT=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_HAVE_DMA_ATTRS=y
CONFIG_HAVE_DMA_API_DEBUG=y

#
# GCOV-based kernel profiling
#
# CONFIG_GCOV_KERNEL is not set


=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.32.1 #3
-------------------------------------------------------
oprofiled/7807 is trying to acquire lock:
 (&mm->mmap_sem){++++++}, at: [<c108281a>] might_fault+0x46/0x80

but task is already holding lock:
 (dcookie_mutex){+.+.+.}, at: [<c10d2cbf>] sys_lookup_dcookie+0x2a/0x124

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (dcookie_mutex){+.+.+.}:
       [<c104af40>] __lock_acquire+0x10f6/0x141a
       [<c104b2ec>] lock_acquire+0x88/0x9f
       [<c123b88d>] mutex_lock_nested+0x40/0x21b
       [<c10d2dd9>] get_dcookie+0x20/0xf7
       [<c11cca1f>] sync_buffer+0x178/0x35e
       [<c11ccc0f>] task_exit_notify+0xa/0xe
       [<c103efea>] notifier_call_chain+0x49/0x71
       [<c103f1b6>] __blocking_notifier_call_chain+0x37/0x4c
       [<c103f1d7>] blocking_notifier_call_chain+0xc/0xe
       [<c1040fb5>] profile_task_exit+0x11/0x13
       [<c102a512>] do_exit+0x18/0x542
       [<c102aa9d>] do_group_exit+0x61/0x84
       [<c102aad3>] sys_exit_group+0x13/0x17
       [<c10029f9>] syscall_call+0x7/0xb

-> #0 (&mm->mmap_sem){++++++}:
       [<c104ac96>] __lock_acquire+0xe4c/0x141a
       [<c104b2ec>] lock_acquire+0x88/0x9f
       [<c1082837>] might_fault+0x63/0x80
       [<c1116fac>] copy_to_user+0x2c/0xfc
       [<c10d2d8d>] sys_lookup_dcookie+0xf8/0x124
       [<c10029f9>] syscall_call+0x7/0xb

other info that might help us debug this:

1 lock held by oprofiled/7807:
 #0:  (dcookie_mutex){+.+.+.}, at: [<c10d2cbf>] sys_lookup_dcookie+0x2a/0x124

stack backtrace:
Pid: 7807, comm: oprofiled Not tainted 2.6.32.1 #3
Call Trace:
 [<c123a545>] ? printk+0xf/0x12
 [<c10499fb>] print_circular_bug+0x93/0x9f
 [<c104ac96>] __lock_acquire+0xe4c/0x141a
 [<c104b2ec>] lock_acquire+0x88/0x9f
 [<c108281a>] ? might_fault+0x46/0x80
 [<c1082837>] might_fault+0x63/0x80
 [<c108281a>] ? might_fault+0x46/0x80
 [<c1116fac>] copy_to_user+0x2c/0xfc
 [<c10d2d8d>] sys_lookup_dcookie+0xf8/0x124
 [<c10029f9>] syscall_call+0x7/0xb
Comment 6 Robert Richter 2011-06-10 14:30:12 UTC
A fix is in the tip tree and scheduled to go upstream:

commit fe47ae7f53e179d2ef6771024feb000cbb86640f
Author: Robert Richter <robert.richter@amd.com>
Date:   Tue May 31 12:35:41 2011 +0200

    oprofile, dcookies: Fix possible circular locking dependency
    
    The lockdep warning below detects a possible A->B/B->A locking
    dependency of mm->mmap_sem and dcookie_mutex. The order in
    sync_buffer() is mm->mmap_sem/dcookie_mutex, while in
    sys_lookup_dcookie() it is vice versa.
    
    Fixing it in sys_lookup_dcookie() by unlocking dcookie_mutex before
    copy_to_user().
    
    oprofiled/4432 is trying to acquire lock:
     (&mm->mmap_sem){++++++}, at: [<ffffffff810b444b>] might_fault+0x53/0xa3
    
    but task is already holding lock:
     (dcookie_mutex){+.+.+.}, at: [<ffffffff81124d28>] sys_lookup_dcookie+0x45/0x149
    
    which lock already depends on the new lock.
    
    the existing dependency chain (in reverse order) is:
    
    -> #1 (dcookie_mutex){+.+.+.}:
           [<ffffffff8106557f>] lock_acquire+0xf8/0x11e
           [<ffffffff814634f0>] mutex_lock_nested+0x63/0x309
           [<ffffffff81124e5c>] get_dcookie+0x30/0x144
           [<ffffffffa0000fba>] sync_buffer+0x196/0x3ec [oprofile]
           [<ffffffffa0001226>] task_exit_notify+0x16/0x1a [oprofile]
           [<ffffffff81467b96>] notifier_call_chain+0x37/0x63
           [<ffffffff8105803d>] __blocking_notifier_call_chain+0x50/0x67
           [<ffffffff81058068>] blocking_notifier_call_chain+0x14/0x16
           [<ffffffff8105a718>] profile_task_exit+0x1a/0x1c
           [<ffffffff81039e8f>] do_exit+0x2a/0x6fc
           [<ffffffff8103a5e4>] do_group_exit+0x83/0xae
           [<ffffffff8103a626>] sys_exit_group+0x17/0x1b
           [<ffffffff8146ad4b>] system_call_fastpath+0x16/0x1b
    
    -> #0 (&mm->mmap_sem){++++++}:
           [<ffffffff81064dfb>] __lock_acquire+0x1085/0x1711
           [<ffffffff8106557f>] lock_acquire+0xf8/0x11e
           [<ffffffff810b4478>] might_fault+0x80/0xa3
           [<ffffffff81124de7>] sys_lookup_dcookie+0x104/0x149
           [<ffffffff8146ad4b>] system_call_fastpath+0x16/0x1b
    
    other info that might help us debug this:
    
    1 lock held by oprofiled/4432:
     #0:  (dcookie_mutex){+.+.+.}, at: [<ffffffff81124d28>] sys_lookup_dcookie+0x45/0x149
    
    stack backtrace:
    Pid: 4432, comm: oprofiled Not tainted 2.6.39-00008-ge5a450d #9
    Call Trace:
     [<ffffffff81063193>] print_circular_bug+0xae/0xbc
     [<ffffffff81064dfb>] __lock_acquire+0x1085/0x1711
     [<ffffffff8102ef13>] ? get_parent_ip+0x11/0x42
     [<ffffffff810b444b>] ? might_fault+0x53/0xa3
     [<ffffffff8106557f>] lock_acquire+0xf8/0x11e
     [<ffffffff810b444b>] ? might_fault+0x53/0xa3
     [<ffffffff810d7d54>] ? path_put+0x22/0x27
     [<ffffffff810b4478>] might_fault+0x80/0xa3
     [<ffffffff810b444b>] ? might_fault+0x53/0xa3
     [<ffffffff81124de7>] sys_lookup_dcookie+0x104/0x149
     [<ffffffff8146ad4b>] system_call_fastpath+0x16/0x1b
    
    References: https://bugzilla.kernel.org/show_bug.cgi?id=13809
    Cc: <stable@kernel.org> # .27+
    Signed-off-by: Robert Richter <robert.richter@amd.com>
Comment 7 Robert Richter 2011-06-16 12:35:43 UTC
Fix upstream in v3.0-rc3.

Note You need to log in before you can comment on or make changes to this bug.