Bug 10504 - losetup possible circular locking
Summary: losetup possible circular locking
Status: CLOSED CODE_FIX
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Block Layer
Hardware: All
OS: Linux
Importance: P1 normal
Assignee: Jens Axboe
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-04-22 06:27 UTC by Zdenek Kabelac
Modified: 2012-05-12 16:25 UTC
CC List: 5 users

See Also:
Kernel Version: 2.6.25
Subsystem:
Regression: No
Bisected commit-id:


Attachments
don't hold mutex while fput in loop_clr_fd (653 bytes, patch)
2008-04-24 19:34 UTC, Dave Young

Description Zdenek Kabelac 2008-04-22 06:27:15 UTC
Latest working kernel version: unknown
Earliest failing kernel version:
Distribution: fedora
Hardware Environment:
Software Environment:
Problem Description:

An INFO trace appears in the log, but no other problems are visible.

Steps to reproduce:

losetup /dev/loop0 file
losetup -o 32256 /dev/loop1 /dev/loop0   # loop1's backing file is itself a block device

losetup -d /dev/loop1                    # LOOP_CLR_FD fputs /dev/loop0, triggering the splat below
losetup -d /dev/loop0

--------------------

EXT3 FS on loop1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.25 #57
-------------------------------------------------------
losetup/9902 is trying to acquire lock:
 (&bdev->bd_mutex){--..}, at: [<ffffffff810e8db8>] __blkdev_put+0x38/0x1e0

but task is already holding lock:
 (&lo->lo_ctl_mutex){--..}, at: [<ffffffffa03e499e>] lo_ioctl+0x4e/0xb40 [loop]

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (&lo->lo_ctl_mutex){--..}:
       [<ffffffff81066249>] __lock_acquire+0x1149/0x11d0
       [<ffffffff81066366>] lock_acquire+0x96/0xe0
       [<ffffffff812ebfd0>] mutex_lock_nested+0xc0/0x330
       [<ffffffffa03e3384>] lo_open+0x34/0x50 [loop]
       [<ffffffff810e9329>] do_open+0xa9/0x350
       [<ffffffff810e960d>] blkdev_open+0x3d/0x90
       [<ffffffff810b7f94>] __dentry_open+0x114/0x390
       [<ffffffff810b82f4>] nameidata_to_filp+0x44/0x60
       [<ffffffff810c76ab>] do_filp_open+0x1fb/0xa40
       [<ffffffff810b7d96>] do_sys_open+0x76/0x100
       [<ffffffff810b7e4b>] sys_open+0x1b/0x20
       [<ffffffff8100c5db>] system_call_after_swapgs+0x7b/0x80
       [<ffffffffffffffff>] 0xffffffffffffffff

-> #0 (&bdev->bd_mutex){--..}:
       [<ffffffff810660e5>] __lock_acquire+0xfe5/0x11d0
       [<ffffffff81066366>] lock_acquire+0x96/0xe0
       [<ffffffff812ebfd0>] mutex_lock_nested+0xc0/0x330
       [<ffffffff810e8db8>] __blkdev_put+0x38/0x1e0
       [<ffffffff810e8f6b>] blkdev_put+0xb/0x10
       [<ffffffff810e8f98>] blkdev_close+0x28/0x40
       [<ffffffff810bb3c5>] __fput+0xc5/0x1f0
       [<ffffffff810bb50d>] fput+0x1d/0x30
       [<ffffffffa03e491d>] loop_clr_fd+0x1ad/0x1e0 [loop]
       [<ffffffffa03e4acf>] lo_ioctl+0x17f/0xb40 [loop]
       [<ffffffff8116b36d>] blkdev_driver_ioctl+0x8d/0xa0
       [<ffffffff8116b5fd>] blkdev_ioctl+0x27d/0x7e0
       [<ffffffff810e86db>] block_ioctl+0x1b/0x20
       [<ffffffff810c8ea1>] vfs_ioctl+0x31/0xa0
       [<ffffffff810c9193>] do_vfs_ioctl+0x283/0x2f0
       [<ffffffff810c9299>] sys_ioctl+0x99/0xa0
       [<ffffffff8100c5db>] system_call_after_swapgs+0x7b/0x80
       [<ffffffffffffffff>] 0xffffffffffffffff

other info that might help us debug this:

1 lock held by losetup/9902:
 #0:  (&lo->lo_ctl_mutex){--..}, at: [<ffffffffa03e499e>] lo_ioctl+0x4e/0xb40 [loop]

stack backtrace:
Pid: 9902, comm: losetup Not tainted 2.6.25 #57

Call Trace:
 [<ffffffff81064e44>] print_circular_bug_tail+0x84/0x90
 [<ffffffff810660e5>] __lock_acquire+0xfe5/0x11d0
 [<ffffffff81064e90>] ? check_usage+0x40/0x2b0
 [<ffffffff810e8db8>] ? __blkdev_put+0x38/0x1e0
 [<ffffffff81066366>] lock_acquire+0x96/0xe0
 [<ffffffff810e8db8>] ? __blkdev_put+0x38/0x1e0
 [<ffffffff812ebfd0>] mutex_lock_nested+0xc0/0x330
 [<ffffffff810e8db8>] ? __blkdev_put+0x38/0x1e0
 [<ffffffff810e8db8>] __blkdev_put+0x38/0x1e0
 [<ffffffff810e8f6b>] blkdev_put+0xb/0x10
 [<ffffffff810e8f98>] blkdev_close+0x28/0x40
 [<ffffffff810bb3c5>] __fput+0xc5/0x1f0
 [<ffffffff810bb50d>] fput+0x1d/0x30
 [<ffffffffa03e491d>] :loop:loop_clr_fd+0x1ad/0x1e0
 [<ffffffffa03e4acf>] :loop:lo_ioctl+0x17f/0xb40
 [<ffffffff81013458>] ? native_sched_clock+0x78/0x80
 [<ffffffff81065464>] ? __lock_acquire+0x364/0x11d0
 [<ffffffff81013458>] ? native_sched_clock+0x78/0x80
 [<ffffffff81013458>] ? native_sched_clock+0x78/0x80
 [<ffffffff81065464>] ? __lock_acquire+0x364/0x11d0
 [<ffffffff81013458>] ? native_sched_clock+0x78/0x80
 [<ffffffff81065464>] ? __lock_acquire+0x364/0x11d0
 [<ffffffff81062afa>] ? get_lock_stats+0x2a/0x70
 [<ffffffff81062b4e>] ? put_lock_stats+0xe/0x30
 [<ffffffff8105a3e3>] ? down+0x33/0x50
 [<ffffffff812ee295>] ? _spin_unlock_irqrestore+0x65/0x90
 [<ffffffff810649e1>] ? trace_hardirqs_on+0x131/0x190
 [<ffffffff812ee275>] ? _spin_unlock_irqrestore+0x45/0x90
 [<ffffffff8105a3e3>] ? down+0x33/0x50
 [<ffffffff8116b36d>] blkdev_driver_ioctl+0x8d/0xa0
 [<ffffffff8116b5fd>] blkdev_ioctl+0x27d/0x7e0
 [<ffffffff81013458>] ? native_sched_clock+0x78/0x80
 [<ffffffff81065464>] ? __lock_acquire+0x364/0x11d0
 [<ffffffff810c4a21>] ? putname+0x31/0x50
 [<ffffffff810b39f5>] ? check_object+0x265/0x270
 [<ffffffff810b3230>] ? init_object+0x50/0x90
 [<ffffffff810e86db>] block_ioctl+0x1b/0x20
 [<ffffffff810c8ea1>] vfs_ioctl+0x31/0xa0
 [<ffffffff810c9193>] do_vfs_ioctl+0x283/0x2f0
 [<ffffffff812edbd9>] ? trace_hardirqs_on_thunk+0x35/0x3a
 [<ffffffff810c9299>] sys_ioctl+0x99/0xa0
 [<ffffffff8100c5db>] system_call_after_swapgs+0x7b/0x80
Comment 1 Dave Young 2008-04-24 00:20:24 UTC
This only happens when the backing file is itself a block device file. But IMO it's safe.

There are two "bd_mutex -> lo_ctl_mutex" conditions:

1. open
lo_refcnt could be changed to an atomic_t; then taking lo_ctl_mutex in lo_open is not necessary at all.

2. release
If LO_FLAGS_AUTOCLEAR is set, lo_release will call loop_clr_fd -- how can we fix that one? (Both conflicting orderings are sketched below.)
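
To make the cycle concrete, here is a sketch of the two lock orderings from the splat, in kernel-style C. This is illustrative only: lo_open follows its 2.6.25-era shape, and the atomic lo_refcnt is the hypothetical change from item 1 above, not code that exists in the tree.

/*
 * Ordering A (open): do_open() already holds bdev->bd_mutex when it
 * calls the driver's open method, so lo_open() nests the locks as
 * bd_mutex -> lo_ctl_mutex.
 *
 * Ordering B (LOOP_CLR_FD): lo_ioctl() holds lo_ctl_mutex while
 * loop_clr_fd() fputs a backing file that is itself a block device,
 * so __blkdev_put() nests them as lo_ctl_mutex -> bd_mutex.
 *
 * A and B together form the ABBA cycle lockdep reports above.
 */
static int lo_open(struct inode *inode, struct file *file)
{
	struct loop_device *lo = inode->i_bdev->bd_disk->private_data;

	/* Hypothetical: with lo_refcnt made an atomic_t, the open path
	 * no longer takes lo_ctl_mutex, removing ordering A entirely.
	 * (Previously: mutex_lock(&lo->lo_ctl_mutex); lo->lo_refcnt++;
	 * mutex_unlock(&lo->lo_ctl_mutex);)
	 */
	atomic_inc(&lo->lo_refcnt);
	return 0;
}

Note this addresses only the open side; the release side (item 2) still takes lo_ctl_mutex under bd_mutex.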
Comment 2 Dave Young 2008-04-24 19:34:19 UTC
Created attachment 15904 [details]
don't hold mutex while fput in loop_clr_fd

Don't hold the lo_ctl_mutex while calling fput, to silence lockdep.

Jens, what do you think about this?
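
For reference, a minimal sketch of what the attachment's summary line describes (an assumption based on the title only, not the actual 653-byte diff): keep the teardown under lo_ctl_mutex as before, but drop the mutex before the final fput of the backing file, so that a blkdev_close() -> __blkdev_put() triggered by that fput no longer takes bd_mutex with lo_ctl_mutex held. The ioctl caller would then have to be adjusted so it does not unlock the mutex a second time.

static int loop_clr_fd(struct loop_device *lo, struct block_device *bdev)
{
	struct file *filp = lo->lo_backing_file;

	/* ... validate state, tear down the transfer, mark the device
	 * Lo_unbound -- all still under lo_ctl_mutex, as before ... */

	mutex_unlock(&lo->lo_ctl_mutex);
	/*
	 * When the backing file is a block device, fput() ends up in
	 * blkdev_close() -> __blkdev_put(), which takes bd_mutex.
	 * Calling it without lo_ctl_mutex held avoids the
	 * lo_ctl_mutex -> bd_mutex ordering from the splat.
	 */
	fput(filp);
	return 0;
}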
Comment 3 Jiri Slaby 2009-02-27 00:13:41 UTC
Ping, we still hit this bug even in current kernels.

Care to send the patch to the list?
Comment 4 Dave Young 2009-02-28 19:58:47 UTC
> ------- Comment #3 from jirislaby@gmail.com  2009-02-27 00:13 -------
> Ping, we still hit this bug even in current kernels.
>
> Care to send the patch to the list?
>
>
Please see:
http://lkml.org/lkml/2008/4/27/322

Al Viro said it has a side effect, but I have no better fix for this; maybe I need to learn more about the block/fs side of things.
