Bug 29172

Summary: releasing loop on top of other loop leads to deadlock
Product: IO/Storage
Reporter: Petr Uzel (petr.uzel)
Component: Block Layer
Assignee: Jens Axboe (axboe)
Status: CLOSED CODE_FIX
Severity: normal
CC: florian, maciej.rutecki, petr.uzel, rjw
Priority: P1
Hardware: All
OS: Linux
URL: https://bugzilla.novell.com/show_bug.cgi?id=669394
Kernel Version: 2.6.38-rc4
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 21782

Description Petr Uzel 2011-02-15 10:00:56 UTC
The following steps lead to a deadlock in the kernel:

dd if=/dev/zero of=img bs=512 count=1000
losetup -f img
mkfs.ext2 /dev/loop0
mkdir -p mnt
mount -t ext2 -o loop /dev/loop0 mnt
umount mnt/

The umount gets stuck in the kernel (verified with strace).

Stacktrace:

[  609.086012] umount        D 0000000000000000     0  3832   3831 0x00000001
[  609.086012]  ffff8800ae4c9d90 0000000000000086 ffff8800b882f830 0000000000012640
[  609.086012]  ffff8800ae4c9fd8 0000000000012640 ffff8800ae4c9fd8 ffff8800ae4c9fd8
[  609.086012]  ffff8800ae58e5b0 0000000000012640 0000000000012640 ffff8800ae4c8000
[  609.086012] Call Trace:
[  609.086012]  [<ffffffff8152197b>] __mutex_lock_slowpath+0x11b/0x1d0
[  609.086012]  [<ffffffff8152151a>] mutex_lock+0x1a/0x40
[  609.086012]  [<ffffffff8118583d>] __blkdev_put+0x3d/0x190
[  609.086012]  [<ffffffff811547fe>] __fput+0xae/0x240
[  609.086012]  [<ffffffffa0579b8b>] loop_clr_fd+0x1fb/0x260 [loop]
[  609.086012]  [<ffffffffa0579c6a>] lo_release+0x7a/0x80 [loop]
[  609.086012]  [<ffffffff811858d0>] __blkdev_put+0xd0/0x190
[  609.086012]  [<ffffffff81155573>] deactivate_locked_super+0x43/0x70
[  609.086012]  [<ffffffff811705a9>] sys_umount+0x59/0xd0
[  609.086012]  [<ffffffff8100318b>] tracesys+0xd9/0xde
[  609.086012]  [<00007f222827f8c7>] 0x7f222827f8c7

This is a regression: it broke somewhere between 2.6.36 and 2.6.37. I tried bisecting[*] the problem, and the bisection identified commit 5704e44d283 as the first bad commit. That result looks odd to me; I would rather suspect commit 2a48fc0ab24241755dc9, which introduced the private loop_mutex as part of the BKL removal process.

From the stack trace, the problem seems to depend on the LO_FLAGS_AUTOCLEAR flag being set.
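One way to check the autoclear hypothesis before running umount is to look at the loop devices' sysfs attributes. The following is a minimal sketch, assuming the running kernel exposes a read-only `loop/autoclear` attribute under `/sys/block/loopN/` for bound devices; the function name `autoclear_devices` is made up for illustration.

```python
from pathlib import Path


def autoclear_devices(sys_block="/sys/block"):
    """Return names of loop devices whose sysfs 'autoclear' attribute reads "1".

    Assumption: bound loop devices expose <sys_block>/loopN/loop/autoclear.
    Devices without that attribute (unbound, or older kernels) are skipped.
    """
    found = []
    for attr in sorted(Path(sys_block).glob("loop*/loop/autoclear")):
        try:
            if attr.read_text().strip() == "1":
                # attr is .../loopN/loop/autoclear, so the device name is two levels up
                found.append(attr.parent.parent.name)
        except OSError:
            # The device may have been detached between glob() and read_text()
            continue
    return found


if __name__ == "__main__":
    print(autoclear_devices())
```

With the reproduction steps above, the device created implicitly by `mount -o loop` would be expected to show up in the result, while the one set up by `losetup -f img` would not.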

I removed the locking/unlocking of loop_mutex from lo_release() and the problem disappeared. I don't know whether this is the proper solution, as I don't understand whether (or why) anything in loop_clr_fd() needs to be protected by loop_mutex.
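The suspected mechanism can be modelled in user space. The sketch below is a toy, not kernel code: it assumes that releasing the upper loop device with autoclear set drops the last reference to its backing device and thereby re-enters the release path for the lower loop device while the driver-wide mutex is still held. The trace above does not show that inner lo_release() call directly, so treat this as one plausible reading. `LoopDevice` and `release` are hypothetical names, and the timeout exists only so the demo reports the deadlock instead of hanging.

```python
import threading


class LoopDevice:
    """Toy stand-in for a loop device (not a kernel structure)."""

    def __init__(self, name, backing=None, autoclear=False):
        self.name = name
        self.backing = backing      # lower LoopDevice used as backing store, if any
        self.autoclear = autoclear  # models LO_FLAGS_AUTOCLEAR
        self.released = False


def release(dev, global_mutex=None, timeout=0.2):
    """Model of lo_release(), optionally serialized on one driver-wide mutex.

    Returns True if the release completed, False if it would block forever.
    """
    if global_mutex is not None:
        # threading.Lock is non-recursive, like a kernel mutex: a second
        # acquire from the same thread never succeeds.
        if not global_mutex.acquire(timeout=timeout):
            return False
    try:
        ok = True
        if dev.autoclear and dev.backing is not None:
            # Models loop_clr_fd() -> fput(): releasing the backing device
            # re-enters release() for the lower loop device.
            ok = release(dev.backing, global_mutex, timeout)
        if ok:
            dev.released = True
        return ok
    finally:
        if global_mutex is not None:
            global_mutex.release()


if __name__ == "__main__":
    lower = LoopDevice("loop0")
    upper = LoopDevice("loop1", backing=lower, autoclear=True)
    print("with global mutex:", release(upper, threading.Lock()))

    lower = LoopDevice("loop0")
    upper = LoopDevice("loop1", backing=lower, autoclear=True)
    print("without global mutex:", release(upper))
```

In this model the nested release fails only when the shared mutex is taken in the release path, which matches the observation that dropping the loop_mutex locking from lo_release() makes the hang go away. It says nothing about whether loop_clr_fd() needs other protection.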


[*] This was my first time bisecting, so it is likely that I did something wrong.

Comment 1 Petr Uzel 2011-02-25 14:31:56 UTC
This still happens with 2.6.38-rc6. I sent a patch that fixes it for me:

https://lkml.org/lkml/2011/2/25/145

Comment 2 Florian Mickler 2011-03-04 23:52:48 UTC
Merged for 2.6.38-rc8 (or final):

commit fd51469fb68b987032e46297e0a4fe9020063c20
Author: Petr Uzel <petr.uzel@suse.cz>
Date:   Thu Mar 3 11:48:50 2011 -0500

    block: kill loop_mutex