Bug 39852

Summary: deadlock on bd_mutex
Product: IO/Storage
Reporter: yury (urykhy)
Component: Block Layer
Assignee: Jens Axboe (axboe)
Status: RESOLVED OBSOLETE
Severity: normal
CC: alan, tj, witold.baryluk+kernel
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.0
Subsystem:
Regression: No
Bisected commit-id:
Attachments: kernel config, bootlog

Description yury 2011-07-23 15:15:51 UTC
Created attachment 66402
kernel config

On boot I get this error:

=============================================
[ INFO: possible recursive locking detected ]
3.0.0 #1
---------------------------------------------
udisks-part-id/2391 is trying to acquire lock:
 (&bdev->bd_mutex){+.+.+.}, at: [<ffffffff8114f79b>] __blkdev_get+0x3b/0x3f0

but task is already holding lock:
 (&bdev->bd_mutex){+.+.+.}, at: [<ffffffff8114f79b>] __blkdev_get+0x3b/0x3f0

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&bdev->bd_mutex);
  lock(&bdev->bd_mutex);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by udisks-part-id/2391:
 #0:  (&bdev->bd_mutex){+.+.+.}, at: [<ffffffff8114f79b>] __blkdev_get+0x3b/0x3f0
 #1:  (pktcdvd_mutex){+.+...}, at: [<ffffffffa03e14e0>] pkt_open+0x30/0x5fc [pktcdvd]
 #2:  (&ctl_mutex#2){+.+...}, at: [<ffffffffa03e14ee>] pkt_open+0x3e/0x5fc [pktcdvd]

stack backtrace:
Pid: 2391, comm: udisks-part-id Not tainted 3.0.0 #1
Call Trace:
 [<ffffffff81089e66>] __lock_acquire+0x1566/0x1d30
 [<ffffffff8103d8f1>] ? get_parent_ip+0x11/0x50
 [<ffffffff8103f1dd>] ? sub_preempt_count+0x9d/0xd0
 [<ffffffff8100aa82>] ? native_sched_clock+0x22/0x70
 [<ffffffff8108ac84>] lock_acquire+0xa4/0x160
 [<ffffffff8114f79b>] ? __blkdev_get+0x3b/0x3f0
 [<ffffffff8145a36b>] mutex_lock_nested+0x5b/0x3e0
 [<ffffffff8114f79b>] ? __blkdev_get+0x3b/0x3f0
 [<ffffffff81086cdd>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff8114f79b>] ? __blkdev_get+0x3b/0x3f0
 [<ffffffff81459439>] ? mutex_unlock+0x9/0x10
 [<ffffffff8122b6ae>] ? disk_block_events+0x6e/0xa0
 [<ffffffff8114f79b>] __blkdev_get+0x3b/0x3f0
 [<ffffffff8114fd67>] blkdev_get+0x217/0x380
 [<ffffffffa03e161a>] pkt_open+0x16a/0x5fc [pktcdvd]
 [<ffffffff8114f7f7>] __blkdev_get+0x97/0x3f0
 [<ffffffff8114fed0>] ? blkdev_get+0x380/0x380
 [<ffffffff8114fb9f>] blkdev_get+0x4f/0x380
 [<ffffffff8103f1dd>] ? sub_preempt_count+0x9d/0xd0
 [<ffffffff8145c6c0>] ? _raw_spin_unlock+0x30/0x60
 [<ffffffff8114fed0>] ? blkdev_get+0x380/0x380
 [<ffffffff8114ff3f>] blkdev_open+0x6f/0x90
 [<ffffffff8111927e>] __dentry_open+0x17e/0x3b0
 [<ffffffff8145c6c0>] ? _raw_spin_unlock+0x30/0x60
 [<ffffffff811195b1>] nameidata_to_filp+0x71/0x80
 [<ffffffff811289b2>] do_last+0x192/0x7b0
 [<ffffffff8112a30e>] path_openat+0xce/0x410
 [<ffffffff8100aa82>] ? native_sched_clock+0x22/0x70
 [<ffffffff81009dd9>] ? sched_clock+0x9/0x10
 [<ffffffff8112a764>] do_filp_open+0x44/0xa0
 [<ffffffff8145c6c0>] ? _raw_spin_unlock+0x30/0x60
 [<ffffffff81136802>] ? alloc_fd+0xf2/0x140
 [<ffffffff81118ea2>] do_sys_open+0x102/0x1e0
 [<ffffffff81161f55>] compat_sys_open+0x15/0x20
 [<ffffffff8145e4c3>] ia32_do_call+0x13/0x13
Comment 1 Tejun Heo 2011-07-29 05:51:06 UTC
Hello, Yury. Does the thread (or machine) hang afterwards? Are you able to access the block device? Also, can you please attach full bootlog and explain what you need to do to reproduce the problem?

Thanks.
Comment 2 yury 2011-07-29 17:17:40 UTC
Hello,

I got this message at boot time.
The box does not hang; the thread, I think, does not either (I can't find process 2391 now).
I can access my CD-ROM.

To reproduce: just boot. I believe I get this message on every boot.
Comment 3 yury 2011-07-29 17:18:22 UTC
Created attachment 67122
bootlog
Comment 4 Tejun Heo 2011-07-30 11:27:49 UTC
Okay, that's pktcdvd calling blkdev_get() on the underlying device from its ->open() method, which in turn was called from blkdev_get() for the pktcdvd device itself.  To lockdep, the two acquisitions look like the same mutex being taken twice, so it issues a deadlock warning.
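
For illustration, here is a minimal sketch of the call pattern lockdep is flagging.  This is hypothetical stacking-driver code with a made-up stacked_dev type, not the actual pktcdvd source; signatures approximate the 3.0-era block API:

#include <linux/fs.h>
#include <linux/blkdev.h>

/* Hypothetical per-device state for a stacking driver like pktcdvd. */
struct stacked_dev {
	struct block_device *backing_bdev;
};

/*
 * Mirrors block_device_operations->open().  blkdev_get() takes
 * bdev->bd_mutex and then calls the driver's ->open(); a stacking
 * driver opens its backing device from ->open(), so the inner
 * blkdev_get() takes backing_bdev->bd_mutex while the outer bd_mutex
 * is still held.  Two distinct mutexes, but one lock class -- hence
 * the "possible recursive locking" report.
 */
static int stacked_open(struct block_device *bdev, fmode_t mode)
{
	struct stacked_dev *sd = bdev->bd_disk->private_data;

	return blkdev_get(sd->backing_bdev, FMODE_READ | FMODE_NDELAY, NULL);
}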

The warning is spurious, as pktcdvd isn't allowed to recurse or chain.  I guess it can be fixed by adding a flavor of blkdev_get() which takes a lockdep subclass.  I'll ping the pktcdvd maintainer.
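
A minimal sketch of what such a flavor could look like, assuming a hypothetical blkdev_get_nested() helper that does not exist in the tree; mutex_lock_nested() and SINGLE_DEPTH_NESTING are the standard lockdep annotations for legitimate nesting within one lock class:

#include <linux/mutex.h>
#include <linux/lockdep.h>
#include <linux/blkdev.h>

/*
 * Hypothetical flavor of blkdev_get() that passes a lockdep subclass
 * through to bd_mutex.  Acquiring with a non-zero subclass tells
 * lockdep that this acquisition legitimately nests under an
 * already-held bd_mutex of the same class, so no recursion warning
 * is issued.
 */
static int blkdev_get_nested(struct block_device *bdev, fmode_t mode,
			     void *holder, unsigned int subclass)
{
	mutex_lock_nested(&bdev->bd_mutex, subclass);
	/* ... the real __blkdev_get() open path would run here ... */
	mutex_unlock(&bdev->bd_mutex);
	return 0;
}

/*
 * pkt_open() would then open the underlying device with, e.g.:
 *   blkdev_get_nested(backing_bdev, mode, NULL, SINGLE_DEPTH_NESTING);
 */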

Anyway, it's a spurious warning; no need to be too alarmed.

Thanks.
Comment 5 Witold Baryluk 2011-08-26 07:02:06 UTC
Hi, guys. I hit the same problem; I don't know whether it is new, as I only started using packet writing / UDF yesterday.

i386, Pentium-M, gcc 4.6.1-8, Debian unstable.

It shows up right around the time /etc/init.d/udftools starts, loading the modules and configuring the mappings:

[  113.430110] pktcdvd: writer pktcdvd0 mapped to sr0
[  113.454050] 
[  113.454052] =============================================
[  113.454054] [ INFO: possible recursive locking detected ]
[  113.454057] 3.1.0-rc3-t43-prod-00161-g671ee7f-dirty #5
[  113.454059] ---------------------------------------------
[  113.454062] udisks-part-id/5609 is trying to acquire lock:
[  113.454064]  (&bdev->bd_mutex){+.+.+.}, at: [<c110c27f>] __blkdev_get+0x2f/0x370
[  113.454075] 
[  113.454076] but task is already holding lock:
[  113.454077]  (&bdev->bd_mutex){+.+.+.}, at: [<c110c27f>] __blkdev_get+0x2f/0x370
[  113.454082] 
[  113.454083] other info that might help us debug this:
[  113.454085]  Possible unsafe locking scenario:
[  113.454086] 
[  113.454087]        CPU0
[  113.454088]        ----
[  113.454090]   lock(&bdev->bd_mutex);
[  113.454092]   lock(&bdev->bd_mutex);
[  113.454094] 
[  113.454095]  *** DEADLOCK ***
[  113.454096] 
[  113.454097]  May be due to missing lock nesting notation
[  113.454098] 
[  113.454100] 3 locks held by udisks-part-id/5609:
[  113.454102]  #0:  (&bdev->bd_mutex){+.+.+.}, at: [<c110c27f>] __blkdev_get+0x2f/0x370
[  113.454108]  #1:  (pktcdvd_mutex){+.+...}, at: [<f84304ef>] pkt_open+0x1f/0x550 [pktcdvd]
[  113.454116]  #2:  (&ctl_mutex#2){+.+...}, at: [<f84304fb>] pkt_open+0x2b/0x550 [pktcdvd]
[  113.454122] 
[  113.454123] stack backtrace:
[  113.454126] Pid: 5609, comm: udisks-part-id Not tainted 3.1.0-rc3-t43-prod-00161-g671ee7f-dirty #5
[  113.454129] Call Trace:
[  113.454136]  [<c1627479>] ? printk+0x18/0x1a
[  113.454142]  [<c10736fc>] __lock_acquire+0x126c/0x1640
[  113.454146]  [<c1058bc0>] ? worker_rebind_fn+0xc0/0xc0
[  113.454150]  [<c1073f78>] lock_acquire+0x78/0xa0
[  113.454154]  [<c110c27f>] ? __blkdev_get+0x2f/0x370
[  113.454158]  [<c162dfd4>] mutex_lock_nested+0x44/0x2c0
[  113.454162]  [<c110c27f>] ? __blkdev_get+0x2f/0x370
[  113.454165]  [<c162ece8>] ? mutex_unlock+0x8/0x10
[  113.454171]  [<c122c0b6>] ? disk_block_events+0x56/0x80
[  113.454175]  [<c110c27f>] __blkdev_get+0x2f/0x370
[  113.454178]  [<c110c768>] blkdev_get+0x1a8/0x2b0
[  113.454182]  [<c110b624>] ? bdget+0x34/0x110
[  113.454186]  [<c110ac70>] ? bdev_test+0x20/0x20
[  113.454190]  [<f84305fa>] pkt_open+0x12a/0x550 [pktcdvd]
[  113.454194]  [<c110c2cc>] __blkdev_get+0x7c/0x370
[  113.454198]  [<c110c606>] blkdev_get+0x46/0x2b0
[  113.454202]  [<c162fb1d>] ? _raw_spin_unlock+0x1d/0x20
[  113.454205]  [<c110c8be>] blkdev_open+0x4e/0x70
[  113.454211]  [<c10da0b7>] __dentry_open.isra.15+0xe7/0x2a0
[  113.454214]  [<c162fb1d>] ? _raw_spin_unlock+0x1d/0x20
[  113.454218]  [<c10db53c>] nameidata_to_filp+0x5c/0x70
[  113.454221]  [<c110c870>] ? blkdev_get+0x2b0/0x2b0
[  113.454225]  [<c10e98a7>] do_last.isra.53+0x277/0x750
[  113.454229]  [<c10e9e52>] path_openat+0x92/0x330
[  113.454232]  [<c10ea13c>] do_filp_open+0x2c/0x80
[  113.454236]  [<c162fb1d>] ? _raw_spin_unlock+0x1d/0x20
[  113.454240]  [<c10f71df>] ? alloc_fd+0x12f/0x180
[  113.454243]  [<c10db638>] do_sys_open+0xe8/0x1b0
[  113.454246]  [<c10db729>] sys_open+0x29/0x40
[  113.454250]  [<c163091c>] sysenter_do_call+0x12/0x3c
Comment 6 Alan 2013-12-23 14:13:40 UTC
If this is still present in modern kernels, please update the bug.