Bug 198647 - device-mapper raid1 without metadata devices hangs on first write
Summary: device-mapper raid1 without metadata devices hangs on first write
Status: NEW
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: LVM2/DM
Hardware: All Linux
Importance: P1 normal
Assignee: Alasdair G Kergon
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-02-02 12:58 UTC by jkulik
Modified: 2018-02-02 22:14 UTC
CC List: 1 user

See Also:
Kernel Version: 4.15
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg output (179.30 KB, text/plain)
2018-02-02 12:58 UTC, jkulik

Description jkulik 2018-02-02 12:58:05 UTC
Created attachment 273975
dmesg output

We recently upgraded from Debian's 3.16 kernel to Debian's 4.9 kernel (I also tested Debian's 4.14 and vanilla 4.15) and noticed that our temporary raid1 setup, created via device-mapper's raid target (see the script below), hangs on writes. /proc/diskstats shows an in-flight request that never finishes, and any dmsetup call that tries to remove or suspend the device hangs, too.

# grep dm /proc/diskstats
 254       0 dm-0 87 0 4152 0 1 0 8 0 1 13728 13728

Using a raid1 without metadata devices worked in 3.16 and 3.2. Since the kernel reports no error when creating the device-mapper target and appears to sync the devices successfully, I suspect the behaviour we now see is a bug.

I've attached the dmesg output (including the task states dumped by echo t > /proc/sysrq-trigger).

You can reproduce the problem with this little script:

#!/bin/sh

# Create two 100 MiB zero-filled backing files.
dd if=/dev/zero of=./disk1 bs=10M count=10
dd if=/dev/zero of=./disk2 bs=10M count=10

# Attach them to free loop devices.
LO1=$(losetup --show -f ./disk1)
LO2=$(losetup --show -f ./disk2)

# Device size in 512-byte sectors.
SIZE=$(blockdev --getsz ${LO1})

# Start with a linear mapping, then swap in a raid1 table whose legs have
# no metadata devices ("-" in each metadata slot).
echo "0 ${SIZE} linear ${LO1} 0" | dmsetup create dm-raid1-bug
dmsetup suspend dm-raid1-bug
echo "0 ${SIZE} raid raid1 2 0 sync 2 - ${LO1} - ${LO2}" | dmsetup load dm-raid1-bug
dmsetup resume dm-raid1-bug
dmsetup message dm-raid1-bug 0 resync

echo "Waiting for sync to finish."
while ! dmsetup status dm-raid1-bug | grep -q idle; do
	sleep 1
done

# This write hangs.
echo "Writing to first sector."
dd if=/dev/zero of=/dev/mapper/dm-raid1-bug bs=512 count=1

Comment 1 Heinz Mauelshagen 2018-02-02 21:18:01 UTC
Without any metadata devices, and hence without any raid superblocks, md.c:md_write_start() still waits for the superblocks to be written, which causes the deadlock.

The solution is not to wait in this case.

Quick hack allowing your setup to succeed:

 bool md_write_start(struct mddev *mddev, struct bio *bi)
 {
        int did_change = 0;
+       struct md_rdev *rdev;
+
        if (bio_data_dir(bi) != WRITE)
                return true;
 
@@ -8081,6 +8086,11 @@ bool md_write_start(struct mddev *mddev, struct bio *bi)
        rcu_read_unlock();
        if (did_change)
                sysfs_notify_dirent_safe(mddev->sysfs_state);
+       rdev_for_each(rdev, mddev)
+               if (rdev->sb_page)
+                       goto wait;
+       return true;
+wait:
        wait_event(mddev->sb_wait,
                   !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) ||
                   mddev->suspended);

Comment 2 Heinz Mauelshagen 2018-02-02 21:20:55 UTC
FWIW: you should be able to hit this deadlock with any raid1/4/5/6/10 mapping that has no metadata devices.

Comment 3 Heinz Mauelshagen 2018-02-02 22:14:10 UTC
Revised patch sent to the dm-devel, linux-kernel, and linux-raid mailing lists.
