Bug 202985

Summary: loop device deadlocks
Product: IO/Storage
Reporter: Jan Kara (jack)
Component: Other
Assignee: Jan Kara (jack)
Status: NEW
Severity: normal
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 4.4-stable
Subsystem:
Regression: No
Bisected commit-id:
Attachments: Kernel messages from hung kernel

Description Jan Kara 2019-03-21 10:23:32 UTC
Created attachment 281943 [details]
Kernel messages from hung kernel

The systemd testsuite deadlocks when running with recent 4.4-stable kernels. (Note: the original report is actually against our distribution (SLES15) 4.12-based kernel, but the commit identified as the culprit is also present in the 4.4-stable tree.)
Comment 1 Jan Kara 2019-03-21 10:24:21 UTC
Looking at the backtraces I can see:

systemd-udevd
__mutex_lock.isra.5+0x178/0x4a0
lo_release+0x44/0xa0 [loop]
__blkdev_put+0x19d/0x1f0
blkdev_close+0x21/0x30
__fput+0xd2/0x210

-> so it holds bdev->bd_mutex and loop_index_mutex, and waits for loop_ctl_mutex.

losetup
__mutex_lock.isra.5+0x178/0x4a0
blkdev_reread_part+0x16/0x30
loop_reread_partitions+0x23/0x50 [loop]
loop_set_status+0x4c8/0x530 [loop]
loop_set_status64+0x40/0x70 [loop]
lo_ioctl+0xfb/0x6f0 [loop]
blkdev_ioctl+0x847/0x940
block_ioctl+0x39/0x40
do_vfs_ioctl+0x90/0x5f0

-> so it holds loop_ctl_mutex and waits for bdev->bd_mutex.

So a classical ABBA deadlock. And actually one that has been there for a long time and that got fixed upstream by a rather intrusive locking rework of the loop device.
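
To make the ordering inversion concrete, here is a minimal userspace sketch of the same ABBA pattern, with pthread mutexes standing in for bdev->bd_mutex and loop_ctl_mutex. This is an illustration only (compile with gcc -pthread), not the kernel code paths themselves:

/* Userspace illustration of the ABBA ordering seen in the traces above. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t bd_mutex = PTHREAD_MUTEX_INITIALIZER;       /* stands in for bdev->bd_mutex */
static pthread_mutex_t loop_ctl_mutex = PTHREAD_MUTEX_INITIALIZER; /* stands in for loop_ctl_mutex */

/* Path A: blkdev_close() -> __blkdev_put() -> lo_release() */
static void *close_path(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&bd_mutex);        /* taken in __blkdev_put() */
        sleep(1);                             /* widen the race window */
        pthread_mutex_lock(&loop_ctl_mutex);  /* lo_release() blocks here */
        pthread_mutex_unlock(&loop_ctl_mutex);
        pthread_mutex_unlock(&bd_mutex);
        return NULL;
}

/* Path B: lo_ioctl() -> loop_set_status() -> loop_reread_partitions() */
static void *ioctl_path(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&loop_ctl_mutex);  /* taken in lo_ioctl() */
        sleep(1);
        pthread_mutex_lock(&bd_mutex);        /* blkdev_reread_part() blocks here */
        pthread_mutex_unlock(&bd_mutex);
        pthread_mutex_unlock(&loop_ctl_mutex);
        return NULL;
}

int main(void)
{
        pthread_t a, b;

        pthread_create(&a, NULL, close_path, NULL);
        pthread_create(&b, NULL, ioctl_path, NULL);
        pthread_join(a, NULL);   /* never returns: both threads are stuck on their second lock */
        pthread_join(b, NULL);
        puts("no deadlock (unlikely with the sleeps above)");
        return 0;
}

Both threads end up blocked on their second lock, which is exactly the state the attached backtraces show.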

Is this reproducible or a one-time occurrence?

I'm asking because I think that commit 8f611d6dde in our tree (block/loop: Use global lock for ioctl() operation (bsc#1124974)) could make this long-standing deadlock easier to hit, because loop_ctl_mutex got converted from a per-device mutex to a global one.
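
For reference, the shape of that conversion is roughly the following (a simplified sketch based on the change description above, not the exact patch text):

/* Before: every loop device carried its own control mutex. */
struct loop_device {
        /* ... */
        struct mutex    lo_ctl_mutex;           /* per-device lock */
        /* ... */
};

/* After: one mutex is shared by all loop devices, so ioctl()s on any
 * loop device now serialize on (and contend for) the same lock. */
static DEFINE_MUTEX(loop_ctl_mutex);            /* global lock */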
Comment 2 Jan Kara 2019-03-21 10:25:19 UTC
Answer regarding reproducibility:

@Jan Kara .. it is reproduced on every run of the systemd testsuite in openQA with the kernel update candidate.