Bug 39852 - deadlock on bd_mutex
Summary: deadlock on bd_mutex
Status: RESOLVED OBSOLETE
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Block Layer
Hardware: All
OS: Linux
Importance: P1 normal
Assignee: Jens Axboe
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-07-23 15:15 UTC by yury
Modified: 2013-12-23 14:13 UTC
CC List: 3 users

See Also:
Kernel Version: 3.0
Subsystem:
Regression: No
Bisected commit-id:


Attachments
kernel config (16.19 KB, application/gzip)
2011-07-23 15:15 UTC, yury
bootlog (30.64 KB, application/octet-stream)
2011-07-29 17:18 UTC, yury

Description yury 2011-07-23 15:15:51 UTC
Created attachment 66402 [details]
kernel config

On boot I get this error:

=============================================
[ INFO: possible recursive locking detected ]
3.0.0 #1
---------------------------------------------
udisks-part-id/2391 is trying to acquire lock:
 (&bdev->bd_mutex){+.+.+.}, at: [<ffffffff8114f79b>] __blkdev_get+0x3b/0x3f0

but task is already holding lock:
 (&bdev->bd_mutex){+.+.+.}, at: [<ffffffff8114f79b>] __blkdev_get+0x3b/0x3f0

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&bdev->bd_mutex);
  lock(&bdev->bd_mutex);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by udisks-part-id/2391:
 #0:  (&bdev->bd_mutex){+.+.+.}, at: [<ffffffff8114f79b>] __blkdev_get+0x3b/0x3f0
 #1:  (pktcdvd_mutex){+.+...}, at: [<ffffffffa03e14e0>] pkt_open+0x30/0x5fc [pktcdvd]
 #2:  (&ctl_mutex#2){+.+...}, at: [<ffffffffa03e14ee>] pkt_open+0x3e/0x5fc [pktcdvd]

stack backtrace:
Pid: 2391, comm: udisks-part-id Not tainted 3.0.0 #1
Call Trace:
 [<ffffffff81089e66>] __lock_acquire+0x1566/0x1d30
 [<ffffffff8103d8f1>] ? get_parent_ip+0x11/0x50
 [<ffffffff8103f1dd>] ? sub_preempt_count+0x9d/0xd0
 [<ffffffff8100aa82>] ? native_sched_clock+0x22/0x70
 [<ffffffff8108ac84>] lock_acquire+0xa4/0x160
 [<ffffffff8114f79b>] ? __blkdev_get+0x3b/0x3f0
 [<ffffffff8145a36b>] mutex_lock_nested+0x5b/0x3e0
 [<ffffffff8114f79b>] ? __blkdev_get+0x3b/0x3f0
 [<ffffffff81086cdd>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff8114f79b>] ? __blkdev_get+0x3b/0x3f0
 [<ffffffff81459439>] ? mutex_unlock+0x9/0x10
 [<ffffffff8122b6ae>] ? disk_block_events+0x6e/0xa0
 [<ffffffff8114f79b>] __blkdev_get+0x3b/0x3f0
 [<ffffffff8114fd67>] blkdev_get+0x217/0x380
 [<ffffffffa03e161a>] pkt_open+0x16a/0x5fc [pktcdvd]
 [<ffffffff8114f7f7>] __blkdev_get+0x97/0x3f0
 [<ffffffff8114fed0>] ? blkdev_get+0x380/0x380
 [<ffffffff8114fb9f>] blkdev_get+0x4f/0x380
 [<ffffffff8103f1dd>] ? sub_preempt_count+0x9d/0xd0
 [<ffffffff8145c6c0>] ? _raw_spin_unlock+0x30/0x60
 [<ffffffff8114fed0>] ? blkdev_get+0x380/0x380
 [<ffffffff8114ff3f>] blkdev_open+0x6f/0x90
 [<ffffffff8111927e>] __dentry_open+0x17e/0x3b0
 [<ffffffff8145c6c0>] ? _raw_spin_unlock+0x30/0x60
 [<ffffffff811195b1>] nameidata_to_filp+0x71/0x80
 [<ffffffff811289b2>] do_last+0x192/0x7b0
 [<ffffffff8112a30e>] path_openat+0xce/0x410
 [<ffffffff8100aa82>] ? native_sched_clock+0x22/0x70
 [<ffffffff81009dd9>] ? sched_clock+0x9/0x10
 [<ffffffff8112a764>] do_filp_open+0x44/0xa0
 [<ffffffff8145c6c0>] ? _raw_spin_unlock+0x30/0x60
 [<ffffffff81136802>] ? alloc_fd+0xf2/0x140
 [<ffffffff81118ea2>] do_sys_open+0x102/0x1e0
 [<ffffffff81161f55>] compat_sys_open+0x15/0x20
 [<ffffffff8145e4c3>] ia32_do_call+0x13/0x13
Comment 1 Tejun Heo 2011-07-29 05:51:06 UTC
Hello, Yury. Does the thread (or machine) hang afterwards? Are you able to access the block device? Also, can you please attach full bootlog and explain what you need to do to reproduce the problem?

Thanks.
Comment 2 yury 2011-07-29 17:17:40 UTC
Hello,

I got this message at boot time.
The box does not hang; as for the thread, I think not (I can't find process 2391 now).
I can access my cdrom.

To reproduce: just boot. I believe I get this message on every boot.
Comment 3 yury 2011-07-29 17:18:22 UTC
Created attachment 67122 [details]
bootlog
Comment 4 Tejun Heo 2011-07-30 11:27:49 UTC
Okay, that's pktcdvd calling blkdev_get() on the underlying device from its ->open() method, which in turn was called from blkdev_get() for the pktcdvd device.  To lockdep, the two attempts appear to acquire the same mutex, so it issues a deadlock warning.

The warning is spurious, as pktcdvd isn't allowed to recurse or chain.  I guess it can be fixed by adding a flavor of blkdev_get() which takes a lockdep subclass.  I'll ping the pktcdvd maintainer.

Anyway, it's a spurious warning.  No need to be too alarmed.

Thanks.
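
For illustration, a minimal sketch of the kind of annotation described above. This is hypothetical, not the actual pktcdvd fix: the function name example_open_underlying() is made up, and only mutex_lock_nested() and SINGLE_DEPTH_NESTING are real kernel APIs. The idea is that the inner acquisition of the underlying device's bd_mutex gets its own lockdep subclass, so lockdep no longer treats it as the same lock being taken twice.

/*
 * Hypothetical sketch, not the mainline fix: give the inner bd_mutex
 * acquisition its own lockdep subclass so that opening the underlying
 * device (e.g. sr0) from pkt_open() is not reported as recursive
 * locking against the pktcdvd device's bd_mutex.
 */
#include <linux/fs.h>
#include <linux/mutex.h>
#include <linux/lockdep.h>

static int example_open_underlying(struct block_device *bdev)
{
	/*
	 * SINGLE_DEPTH_NESTING marks this acquisition as one level
	 * below the bd_mutex already held by the outer blkdev_get(),
	 * which is the "lock nesting notation" the lockdep report
	 * says may be missing.
	 */
	mutex_lock_nested(&bdev->bd_mutex, SINGLE_DEPTH_NESTING);

	/* ... open/claim the underlying device here ... */

	mutex_unlock(&bdev->bd_mutex);
	return 0;
}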
Comment 5 Witold Baryluk 2011-08-26 07:02:06 UTC
Hi, guys. I had the same problem; I do not know if it is new, as I only started using packet writing / UDF yesterday.

i386, Pentium M, gcc 4.6.1-8, Debian unstable.

It happens right around the time /etc/init.d/udftools starts, loading the modules and configuring the mappings:

[  113.430110] pktcdvd: writer pktcdvd0 mapped to sr0
[  113.454050] 
[  113.454052] =============================================
[  113.454054] [ INFO: possible recursive locking detected ]
[  113.454057] 3.1.0-rc3-t43-prod-00161-g671ee7f-dirty #5
[  113.454059] ---------------------------------------------
[  113.454062] udisks-part-id/5609 is trying to acquire lock:
[  113.454064]  (&bdev->bd_mutex){+.+.+.}, at: [<c110c27f>] __blkdev_get+0x2f/0x370
[  113.454075] 
[  113.454076] but task is already holding lock:
[  113.454077]  (&bdev->bd_mutex){+.+.+.}, at: [<c110c27f>] __blkdev_get+0x2f/0x370
[  113.454082] 
[  113.454083] other info that might help us debug this:
[  113.454085]  Possible unsafe locking scenario:
[  113.454086] 
[  113.454087]        CPU0
[  113.454088]        ----
[  113.454090]   lock(&bdev->bd_mutex);
[  113.454092]   lock(&bdev->bd_mutex);
[  113.454094] 
[  113.454095]  *** DEADLOCK ***
[  113.454096] 
[  113.454097]  May be due to missing lock nesting notation
[  113.454098] 
[  113.454100] 3 locks held by udisks-part-id/5609:
[  113.454102]  #0:  (&bdev->bd_mutex){+.+.+.}, at: [<c110c27f>] __blkdev_get+0x2f/0x370
[  113.454108]  #1:  (pktcdvd_mutex){+.+...}, at: [<f84304ef>] pkt_open+0x1f/0x550 [pktcdvd]
[  113.454116]  #2:  (&ctl_mutex#2){+.+...}, at: [<f84304fb>] pkt_open+0x2b/0x550 [pktcdvd]
[  113.454122] 
[  113.454123] stack backtrace:
[  113.454126] Pid: 5609, comm: udisks-part-id Not tainted 3.1.0-rc3-t43-prod-00161-g671ee7f-dirty #5
[  113.454129] Call Trace:
[  113.454136]  [<c1627479>] ? printk+0x18/0x1a
[  113.454142]  [<c10736fc>] __lock_acquire+0x126c/0x1640
[  113.454146]  [<c1058bc0>] ? worker_rebind_fn+0xc0/0xc0
[  113.454150]  [<c1073f78>] lock_acquire+0x78/0xa0
[  113.454154]  [<c110c27f>] ? __blkdev_get+0x2f/0x370
[  113.454158]  [<c162dfd4>] mutex_lock_nested+0x44/0x2c0
[  113.454162]  [<c110c27f>] ? __blkdev_get+0x2f/0x370
[  113.454165]  [<c162ece8>] ? mutex_unlock+0x8/0x10
[  113.454171]  [<c122c0b6>] ? disk_block_events+0x56/0x80
[  113.454175]  [<c110c27f>] __blkdev_get+0x2f/0x370
[  113.454178]  [<c110c768>] blkdev_get+0x1a8/0x2b0
[  113.454182]  [<c110b624>] ? bdget+0x34/0x110
[  113.454186]  [<c110ac70>] ? bdev_test+0x20/0x20
[  113.454190]  [<f84305fa>] pkt_open+0x12a/0x550 [pktcdvd]
[  113.454194]  [<c110c2cc>] __blkdev_get+0x7c/0x370
[  113.454198]  [<c110c606>] blkdev_get+0x46/0x2b0
[  113.454202]  [<c162fb1d>] ? _raw_spin_unlock+0x1d/0x20
[  113.454205]  [<c110c8be>] blkdev_open+0x4e/0x70
[  113.454211]  [<c10da0b7>] __dentry_open.isra.15+0xe7/0x2a0
[  113.454214]  [<c162fb1d>] ? _raw_spin_unlock+0x1d/0x20
[  113.454218]  [<c10db53c>] nameidata_to_filp+0x5c/0x70
[  113.454221]  [<c110c870>] ? blkdev_get+0x2b0/0x2b0
[  113.454225]  [<c10e98a7>] do_last.isra.53+0x277/0x750
[  113.454229]  [<c10e9e52>] path_openat+0x92/0x330
[  113.454232]  [<c10ea13c>] do_filp_open+0x2c/0x80
[  113.454236]  [<c162fb1d>] ? _raw_spin_unlock+0x1d/0x20
[  113.454240]  [<c10f71df>] ? alloc_fd+0x12f/0x180
[  113.454243]  [<c10db638>] do_sys_open+0xe8/0x1b0
[  113.454246]  [<c10db729>] sys_open+0x29/0x40
[  113.454250]  [<c163091c>] sysenter_do_call+0x12/0x3c
Comment 6 Alan 2013-12-23 14:13:40 UTC
If this is still present in modern kernels, please update the bug.
