Bug 39852

Summary: deadlock on bd_mutex
Product: IO/Storage
Reporter: yury (urykhy)
Component: Block Layer
Assignee: Jens Axboe (axboe)
Status: RESOLVED OBSOLETE
Severity: normal
CC: alan, tj, witold.baryluk+kernel
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.0
Subsystem:
Regression: No
Bisected commit-id:
Attachments: kernel config, bootlog

Description yury 2011-07-23 15:15:51 UTC
Created attachment 66402
kernel config

On boot I get this error:

=============================================
[ INFO: possible recursive locking detected ]
3.0.0 #1
---------------------------------------------
udisks-part-id/2391 is trying to acquire lock:
 (&bdev->bd_mutex){+.+.+.}, at: [<ffffffff8114f79b>] __blkdev_get+0x3b/0x3f0

but task is already holding lock:
 (&bdev->bd_mutex){+.+.+.}, at: [<ffffffff8114f79b>] __blkdev_get+0x3b/0x3f0

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&bdev->bd_mutex);
  lock(&bdev->bd_mutex);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by udisks-part-id/2391:
 #0:  (&bdev->bd_mutex){+.+.+.}, at: [<ffffffff8114f79b>] __blkdev_get+0x3b/0x3f0
 #1:  (pktcdvd_mutex){+.+...}, at: [<ffffffffa03e14e0>] pkt_open+0x30/0x5fc [pktcdvd]
 #2:  (&ctl_mutex#2){+.+...}, at: [<ffffffffa03e14ee>] pkt_open+0x3e/0x5fc [pktcdvd]

stack backtrace:
Pid: 2391, comm: udisks-part-id Not tainted 3.0.0 #1
Call Trace:
 [<ffffffff81089e66>] __lock_acquire+0x1566/0x1d30
 [<ffffffff8103d8f1>] ? get_parent_ip+0x11/0x50
 [<ffffffff8103f1dd>] ? sub_preempt_count+0x9d/0xd0
 [<ffffffff8100aa82>] ? native_sched_clock+0x22/0x70
 [<ffffffff8108ac84>] lock_acquire+0xa4/0x160
 [<ffffffff8114f79b>] ? __blkdev_get+0x3b/0x3f0
 [<ffffffff8145a36b>] mutex_lock_nested+0x5b/0x3e0
 [<ffffffff8114f79b>] ? __blkdev_get+0x3b/0x3f0
 [<ffffffff81086cdd>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff8114f79b>] ? __blkdev_get+0x3b/0x3f0
 [<ffffffff81459439>] ? mutex_unlock+0x9/0x10
 [<ffffffff8122b6ae>] ? disk_block_events+0x6e/0xa0
 [<ffffffff8114f79b>] __blkdev_get+0x3b/0x3f0
 [<ffffffff8114fd67>] blkdev_get+0x217/0x380
 [<ffffffffa03e161a>] pkt_open+0x16a/0x5fc [pktcdvd]
 [<ffffffff8114f7f7>] __blkdev_get+0x97/0x3f0
 [<ffffffff8114fed0>] ? blkdev_get+0x380/0x380
 [<ffffffff8114fb9f>] blkdev_get+0x4f/0x380
 [<ffffffff8103f1dd>] ? sub_preempt_count+0x9d/0xd0
 [<ffffffff8145c6c0>] ? _raw_spin_unlock+0x30/0x60
 [<ffffffff8114fed0>] ? blkdev_get+0x380/0x380
 [<ffffffff8114ff3f>] blkdev_open+0x6f/0x90
 [<ffffffff8111927e>] __dentry_open+0x17e/0x3b0
 [<ffffffff8145c6c0>] ? _raw_spin_unlock+0x30/0x60
 [<ffffffff811195b1>] nameidata_to_filp+0x71/0x80
 [<ffffffff811289b2>] do_last+0x192/0x7b0
 [<ffffffff8112a30e>] path_openat+0xce/0x410
 [<ffffffff8100aa82>] ? native_sched_clock+0x22/0x70
 [<ffffffff81009dd9>] ? sched_clock+0x9/0x10
 [<ffffffff8112a764>] do_filp_open+0x44/0xa0
 [<ffffffff8145c6c0>] ? _raw_spin_unlock+0x30/0x60
 [<ffffffff81136802>] ? alloc_fd+0xf2/0x140
 [<ffffffff81118ea2>] do_sys_open+0x102/0x1e0
 [<ffffffff81161f55>] compat_sys_open+0x15/0x20
 [<ffffffff8145e4c3>] ia32_do_call+0x13/0x13
Comment 1 Tejun Heo 2011-07-29 05:51:06 UTC
Hello, Yury. Does the thread (or machine) hang afterwards? Are you able to access the block device? Also, can you please attach full bootlog and explain what you need to do to reproduce the problem?

Thanks.
Comment 2 yury 2011-07-29 17:17:40 UTC
Hello,

I got this message at boot time.
The box does not hang; the thread, I think, does not either (I can't find process 2391 now).
I can access my CD-ROM.

To reproduce: just boot. I believe I get this message on every boot.
Comment 3 yury 2011-07-29 17:18:22 UTC
Created attachment 67122
bootlog
Comment 4 Tejun Heo 2011-07-30 11:27:49 UTC
Okay, that's pktcdvd calling blkdev_get() on the underlying device from its ->open() method, which in turn was called from blkdev_get() for the pktcdvd device itself.  To lockdep, the two acquisitions look like the same mutex being taken twice, so it issues a deadlock warning.
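
For illustration, here is a minimal sketch of the call pattern lockdep is flagging.  This is hypothetical stacking-driver code with a made-up stacked_dev type, not the actual pktcdvd source; signatures approximate the 3.0-era block API:

#include <linux/fs.h>
#include <linux/blkdev.h>

/* Hypothetical per-device state for a stacking driver like pktcdvd. */
struct stacked_dev {
	struct block_device *backing_bdev;
};

/*
 * Mirrors block_device_operations->open().  blkdev_get() takes
 * bdev->bd_mutex and then calls the driver's ->open(); a stacking
 * driver opens its backing device from ->open(), so the inner
 * blkdev_get() takes backing_bdev->bd_mutex while the outer bd_mutex
 * is still held.  Two distinct mutexes, but one lock class -- hence
 * the "possible recursive locking" report.
 */
static int stacked_open(struct block_device *bdev, fmode_t mode)
{
	struct stacked_dev *sd = bdev->bd_disk->private_data;

	return blkdev_get(sd->backing_bdev, FMODE_READ | FMODE_NDELAY, NULL);
}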

The warning is spurious, as pktcdvd isn't allowed to recurse or chain.  I guess it can be fixed by adding a flavor of blkdev_get() which takes a lockdep subclass.  I'll ping the pktcdvd maintainer.
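
A minimal sketch of what such a flavor could look like, assuming a hypothetical blkdev_get_nested() helper that does not exist in the tree; mutex_lock_nested() and SINGLE_DEPTH_NESTING are the standard lockdep annotations for legitimate nesting within one lock class:

#include <linux/mutex.h>
#include <linux/lockdep.h>
#include <linux/blkdev.h>

/*
 * Hypothetical flavor of blkdev_get() that passes a lockdep subclass
 * through to bd_mutex.  Acquiring with a non-zero subclass tells
 * lockdep that this acquisition legitimately nests under an
 * already-held bd_mutex of the same class, so no recursion warning
 * is issued.
 */
static int blkdev_get_nested(struct block_device *bdev, fmode_t mode,
			     void *holder, unsigned int subclass)
{
	mutex_lock_nested(&bdev->bd_mutex, subclass);
	/* ... the real __blkdev_get() open path would run here ... */
	mutex_unlock(&bdev->bd_mutex);
	return 0;
}

/*
 * pkt_open() would then open the underlying device with, e.g.:
 *   blkdev_get_nested(backing_bdev, mode, NULL, SINGLE_DEPTH_NESTING);
 */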

Anyway, it's a spurious warning; no need to be too alarmed.

Thanks.
Comment 5 Witold Baryluk 2011-08-26 07:02:06 UTC
Hi, guys. I hit the same problem; I don't know whether it is new, as I only started using packet writing / UDF yesterday.

i386, Pentium-M, gcc 4.6.1-8, Debian unstable.

It shows up right around the time /etc/init.d/udftools starts, loading the modules and configuring the mappings:

[  113.430110] pktcdvd: writer pktcdvd0 mapped to sr0
[  113.454050] 
[  113.454052] =============================================
[  113.454054] [ INFO: possible recursive locking detected ]
[  113.454057] 3.1.0-rc3-t43-prod-00161-g671ee7f-dirty #5
[  113.454059] ---------------------------------------------
[  113.454062] udisks-part-id/5609 is trying to acquire lock:
[  113.454064]  (&bdev->bd_mutex){+.+.+.}, at: [<c110c27f>] __blkdev_get+0x2f/0x370
[  113.454075] 
[  113.454076] but task is already holding lock:
[  113.454077]  (&bdev->bd_mutex){+.+.+.}, at: [<c110c27f>] __blkdev_get+0x2f/0x370
[  113.454082] 
[  113.454083] other info that might help us debug this:
[  113.454085]  Possible unsafe locking scenario:
[  113.454086] 
[  113.454087]        CPU0
[  113.454088]        ----
[  113.454090]   lock(&bdev->bd_mutex);
[  113.454092]   lock(&bdev->bd_mutex);
[  113.454094] 
[  113.454095]  *** DEADLOCK ***
[  113.454096] 
[  113.454097]  May be due to missing lock nesting notation
[  113.454098] 
[  113.454100] 3 locks held by udisks-part-id/5609:
[  113.454102]  #0:  (&bdev->bd_mutex){+.+.+.}, at: [<c110c27f>] __blkdev_get+0x2f/0x370
[  113.454108]  #1:  (pktcdvd_mutex){+.+...}, at: [<f84304ef>] pkt_open+0x1f/0x550 [pktcdvd]
[  113.454116]  #2:  (&ctl_mutex#2){+.+...}, at: [<f84304fb>] pkt_open+0x2b/0x550 [pktcdvd]
[  113.454122] 
[  113.454123] stack backtrace:
[  113.454126] Pid: 5609, comm: udisks-part-id Not tainted 3.1.0-rc3-t43-prod-00161-g671ee7f-dirty #5
[  113.454129] Call Trace:
[  113.454136]  [<c1627479>] ? printk+0x18/0x1a
[  113.454142]  [<c10736fc>] __lock_acquire+0x126c/0x1640
[  113.454146]  [<c1058bc0>] ? worker_rebind_fn+0xc0/0xc0
[  113.454150]  [<c1073f78>] lock_acquire+0x78/0xa0
[  113.454154]  [<c110c27f>] ? __blkdev_get+0x2f/0x370
[  113.454158]  [<c162dfd4>] mutex_lock_nested+0x44/0x2c0
[  113.454162]  [<c110c27f>] ? __blkdev_get+0x2f/0x370
[  113.454165]  [<c162ece8>] ? mutex_unlock+0x8/0x10
[  113.454171]  [<c122c0b6>] ? disk_block_events+0x56/0x80
[  113.454175]  [<c110c27f>] __blkdev_get+0x2f/0x370
[  113.454178]  [<c110c768>] blkdev_get+0x1a8/0x2b0
[  113.454182]  [<c110b624>] ? bdget+0x34/0x110
[  113.454186]  [<c110ac70>] ? bdev_test+0x20/0x20
[  113.454190]  [<f84305fa>] pkt_open+0x12a/0x550 [pktcdvd]
[  113.454194]  [<c110c2cc>] __blkdev_get+0x7c/0x370
[  113.454198]  [<c110c606>] blkdev_get+0x46/0x2b0
[  113.454202]  [<c162fb1d>] ? _raw_spin_unlock+0x1d/0x20
[  113.454205]  [<c110c8be>] blkdev_open+0x4e/0x70
[  113.454211]  [<c10da0b7>] __dentry_open.isra.15+0xe7/0x2a0
[  113.454214]  [<c162fb1d>] ? _raw_spin_unlock+0x1d/0x20
[  113.454218]  [<c10db53c>] nameidata_to_filp+0x5c/0x70
[  113.454221]  [<c110c870>] ? blkdev_get+0x2b0/0x2b0
[  113.454225]  [<c10e98a7>] do_last.isra.53+0x277/0x750
[  113.454229]  [<c10e9e52>] path_openat+0x92/0x330
[  113.454232]  [<c10ea13c>] do_filp_open+0x2c/0x80
[  113.454236]  [<c162fb1d>] ? _raw_spin_unlock+0x1d/0x20
[  113.454240]  [<c10f71df>] ? alloc_fd+0x12f/0x180
[  113.454243]  [<c10db638>] do_sys_open+0xe8/0x1b0
[  113.454246]  [<c10db729>] sys_open+0x29/0x40
[  113.454250]  [<c163091c>] sysenter_do_call+0x12/0x3c
Comment 6 Alan 2013-12-23 14:13:40 UTC
If this is still present in modern kernels, please update the bug.