I've hit a "possible circular locking dependency detected" while trying to use oprofile while doing heavy I/O. ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.31-rc2 #53 ------------------------------------------------------- oprofiled/6216 is trying to acquire lock: (&mm->mmap_sem){++++++}, at: [<c048e1e1>] might_fault+0x43/0x80 but task is already holding lock: (dcookie_mutex){+.+.+.}, at: [<c04e576b>] sys_lookup_dcookie+0x2a/0x11c which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (dcookie_mutex){+.+.+.}: [<c044d0ad>] __lock_acquire+0x987/0xaf6 [<c044d2a5>] lock_acquire+0x89/0xa6 [<c0691244>] __mutex_lock_common+0x35/0x2bc [<c0691563>] mutex_lock_nested+0x2e/0x36 [<c04e587d>] get_dcookie+0x20/0xf7 [<faaeab83>] sync_buffer+0x180/0x395 [oprofile] [<faaeada6>] task_exit_notify+0xe/0x12 [oprofile] [<c0693f7f>] notifier_call_chain+0x2b/0x4a [<c0443701>] __blocking_notifier_call_chain+0x37/0x4c [<c0443722>] blocking_notifier_call_chain+0xc/0xe [<c0445d86>] profile_task_exit+0x11/0x13 [<c0431469>] do_exit+0x19/0x5ef [<c0431a9d>] do_group_exit+0x5e/0x85 [<c0431ad7>] sys_exit_group+0x13/0x17 [<c04029b4>] sysenter_do_call+0x12/0x32 [<ffffffff>] 0xffffffff -> #0 (&mm->mmap_sem){++++++}: [<c044cfbf>] __lock_acquire+0x899/0xaf6 [<c044d2a5>] lock_acquire+0x89/0xa6 [<c048e1fe>] might_fault+0x60/0x80 [<c05348d8>] copy_to_user+0x2c/0xfc [<c04e5831>] sys_lookup_dcookie+0xf0/0x11c [<c04029b4>] sysenter_do_call+0x12/0x32 [<ffffffff>] 0xffffffff other info that might help us debug this: 1 lock held by oprofiled/6216: #0: (dcookie_mutex){+.+.+.}, at: [<c04e576b>] sys_lookup_dcookie+0x2a/0x11c stack backtrace: Pid: 6216, comm: oprofiled Not tainted 2.6.31-rc2 #53 Call Trace: [<c0690158>] ? printk+0xf/0x17 [<c044c462>] print_circular_bug_tail+0x5d/0x68 [<c044cfbf>] __lock_acquire+0x899/0xaf6 [<c044d2a5>] lock_acquire+0x89/0xa6 [<c048e1e1>] ? might_fault+0x43/0x80 [<c048e1fe>] might_fault+0x60/0x80 [<c048e1e1>] ? might_fault+0x43/0x80 [<c05348d8>] copy_to_user+0x2c/0xfc [<c04e5831>] sys_lookup_dcookie+0xf0/0x11c [<c04029b4>] sysenter_do_call+0x12/0x32
Was 2.6.30 OK?
I haven't tested it.
Created attachment 22616 [details] config file
On Wednesday 05 August 2009, Jerome Marchand wrote: > Robert Richter wrote: > > I also tested v2.6.31-rc4 with prove-locking enabled and could not > > trigger the warning. My regression runs fine. Please provide more > > information (workloads, oprofile setup, cpu info, kernel config) to be > > able to reproduce this. > > > > Thanks, > > > > -Robert > > > > Hi Robert, > > I hit the warning on a dual i686 box, with both v2.6.31-rc2 and v2.6.31-rc4. > I can actually trigger it by just running opcontrol --start and then > opreport. > It happens with both --separate=none and --separate=all options (the oprofile > deamon using otherwise default options). > About the kernel config, prove_locking is on, oprofile is a module and > oprofile_ibs is not set. For maor detail, I've attached the config file to > the bugzilla. > > I've tried to reproduce it on 2.6.31-rc5 without success.
P3/700 UP Inspiron 8000 (my netbook crashed, el-cheapo SSD, libata "Media error" exceptions after 15 months ;-P), Debian testing, ii oprofile 0.9.5-1 system-wide profiler for Linux systems ii oprofile-common 0.9.2-4 system-wide profiler for Linux systems (command line components) ii oprofile-gui 0.9.5-1 system-wide profiler for Linux systems (GUI components) , happened while having sysvinit analyze concurrent booting possibility, CONFIG_OPROFILE_EVENT_MULTIPLEX=y: CONFIG_HAVE_PERF_EVENTS=y # # Kernel Performance Events And Counters # CONFIG_PERF_EVENTS=y CONFIG_EVENT_PROFILE=y # CONFIG_PERF_COUNTERS is not set # CONFIG_DEBUG_PERF_USE_VMALLOC is not set CONFIG_VM_EVENT_COUNTERS=y CONFIG_PCI_QUIRKS=y CONFIG_COMPAT_BRK=y CONFIG_SLAB=y # CONFIG_SLUB is not set # CONFIG_SLOB is not set CONFIG_PROFILING=y CONFIG_TRACEPOINTS=y CONFIG_OPROFILE=y CONFIG_OPROFILE_EVENT_MULTIPLEX=y CONFIG_HAVE_OPROFILE=y # CONFIG_KPROBES is not set CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y CONFIG_HAVE_IOREMAP_PROT=y CONFIG_HAVE_KPROBES=y CONFIG_HAVE_KRETPROBES=y CONFIG_HAVE_ARCH_TRACEHOOK=y CONFIG_HAVE_DMA_ATTRS=y CONFIG_HAVE_DMA_API_DEBUG=y # # GCOV-based kernel profiling # # CONFIG_GCOV_KERNEL is not set ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.32.1 #3 ------------------------------------------------------- oprofiled/7807 is trying to acquire lock: (&mm->mmap_sem){++++++}, at: [<c108281a>] might_fault+0x46/0x80 but task is already holding lock: (dcookie_mutex){+.+.+.}, at: [<c10d2cbf>] sys_lookup_dcookie+0x2a/0x124 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (dcookie_mutex){+.+.+.}: [<c104af40>] __lock_acquire+0x10f6/0x141a [<c104b2ec>] lock_acquire+0x88/0x9f [<c123b88d>] mutex_lock_nested+0x40/0x21b [<c10d2dd9>] get_dcookie+0x20/0xf7 [<c11cca1f>] sync_buffer+0x178/0x35e [<c11ccc0f>] task_exit_notify+0xa/0xe [<c103efea>] notifier_call_chain+0x49/0x71 [<c103f1b6>] __blocking_notifier_call_chain+0x37/0x4c [<c103f1d7>] blocking_notifier_call_chain+0xc/0xe [<c1040fb5>] profile_task_exit+0x11/0x13 [<c102a512>] do_exit+0x18/0x542 [<c102aa9d>] do_group_exit+0x61/0x84 [<c102aad3>] sys_exit_group+0x13/0x17 [<c10029f9>] syscall_call+0x7/0xb -> #0 (&mm->mmap_sem){++++++}: [<c104ac96>] __lock_acquire+0xe4c/0x141a [<c104b2ec>] lock_acquire+0x88/0x9f [<c1082837>] might_fault+0x63/0x80 [<c1116fac>] copy_to_user+0x2c/0xfc [<c10d2d8d>] sys_lookup_dcookie+0xf8/0x124 [<c10029f9>] syscall_call+0x7/0xb other info that might help us debug this: 1 lock held by oprofiled/7807: #0: (dcookie_mutex){+.+.+.}, at: [<c10d2cbf>] sys_lookup_dcookie+0x2a/0x124 stack backtrace: Pid: 7807, comm: oprofiled Not tainted 2.6.32.1 #3 Call Trace: [<c123a545>] ? printk+0xf/0x12 [<c10499fb>] print_circular_bug+0x93/0x9f [<c104ac96>] __lock_acquire+0xe4c/0x141a [<c104b2ec>] lock_acquire+0x88/0x9f [<c108281a>] ? might_fault+0x46/0x80 [<c1082837>] might_fault+0x63/0x80 [<c108281a>] ? might_fault+0x46/0x80 [<c1116fac>] copy_to_user+0x2c/0xfc [<c10d2d8d>] sys_lookup_dcookie+0xf8/0x124 [<c10029f9>] syscall_call+0x7/0xb
A fix is in the tip tree and scheduled to go upstream: commit fe47ae7f53e179d2ef6771024feb000cbb86640f Author: Robert Richter <robert.richter@amd.com> Date: Tue May 31 12:35:41 2011 +0200 oprofile, dcookies: Fix possible circular locking dependency The lockdep warning below detects a possible A->B/B->A locking dependency of mm->mmap_sem and dcookie_mutex. The order in sync_buffer() is mm->mmap_sem/dcookie_mutex, while in sys_lookup_dcookie() it is vice versa. Fixing it in sys_lookup_dcookie() by unlocking dcookie_mutex before copy_to_user(). oprofiled/4432 is trying to acquire lock: (&mm->mmap_sem){++++++}, at: [<ffffffff810b444b>] might_fault+0x53/0xa3 but task is already holding lock: (dcookie_mutex){+.+.+.}, at: [<ffffffff81124d28>] sys_lookup_dcookie+0x45/0x149 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (dcookie_mutex){+.+.+.}: [<ffffffff8106557f>] lock_acquire+0xf8/0x11e [<ffffffff814634f0>] mutex_lock_nested+0x63/0x309 [<ffffffff81124e5c>] get_dcookie+0x30/0x144 [<ffffffffa0000fba>] sync_buffer+0x196/0x3ec [oprofile] [<ffffffffa0001226>] task_exit_notify+0x16/0x1a [oprofile] [<ffffffff81467b96>] notifier_call_chain+0x37/0x63 [<ffffffff8105803d>] __blocking_notifier_call_chain+0x50/0x67 [<ffffffff81058068>] blocking_notifier_call_chain+0x14/0x16 [<ffffffff8105a718>] profile_task_exit+0x1a/0x1c [<ffffffff81039e8f>] do_exit+0x2a/0x6fc [<ffffffff8103a5e4>] do_group_exit+0x83/0xae [<ffffffff8103a626>] sys_exit_group+0x17/0x1b [<ffffffff8146ad4b>] system_call_fastpath+0x16/0x1b -> #0 (&mm->mmap_sem){++++++}: [<ffffffff81064dfb>] __lock_acquire+0x1085/0x1711 [<ffffffff8106557f>] lock_acquire+0xf8/0x11e [<ffffffff810b4478>] might_fault+0x80/0xa3 [<ffffffff81124de7>] sys_lookup_dcookie+0x104/0x149 [<ffffffff8146ad4b>] system_call_fastpath+0x16/0x1b other info that might help us debug this: 1 lock held by oprofiled/4432: #0: (dcookie_mutex){+.+.+.}, at: [<ffffffff81124d28>] sys_lookup_dcookie+0x45/0x149 stack backtrace: Pid: 4432, comm: oprofiled Not tainted 2.6.39-00008-ge5a450d #9 Call Trace: [<ffffffff81063193>] print_circular_bug+0xae/0xbc [<ffffffff81064dfb>] __lock_acquire+0x1085/0x1711 [<ffffffff8102ef13>] ? get_parent_ip+0x11/0x42 [<ffffffff810b444b>] ? might_fault+0x53/0xa3 [<ffffffff8106557f>] lock_acquire+0xf8/0x11e [<ffffffff810b444b>] ? might_fault+0x53/0xa3 [<ffffffff810d7d54>] ? path_put+0x22/0x27 [<ffffffff810b4478>] might_fault+0x80/0xa3 [<ffffffff810b444b>] ? might_fault+0x53/0xa3 [<ffffffff81124de7>] sys_lookup_dcookie+0x104/0x149 [<ffffffff8146ad4b>] system_call_fastpath+0x16/0x1b References: https://bugzilla.kernel.org/show_bug.cgi?id=13809 Cc: <stable@kernel.org> # .27+ Signed-off-by: Robert Richter <robert.richter@amd.com>
Fix upstream in v3.0-rc3.