Bug 65721
| Summary: | mdadm --stop causes soft lockup and eventual crash | | |
|---|---|---|---|
| Product: | IO/Storage | Reporter: | Richard W.M. Jones (rjones) |
| Component: | MD | Assignee: | io_md |
| Status: | RESOLVED CODE_FIX | | |
| Severity: | normal | CC: | neilb |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 3.13.0 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
| Attachments: | log file | | |
Description
Richard W.M. Jones
2013-11-25 11:49:43 UTC
Thanks for the report.
Was the array performing a resync or recovery at the time?

Quite likely. Note this is a test program which rapidly creates and stops the array. You can see the test program here:
https://github.com/libguestfs/libguestfs/blob/master/tests/md/test-mdadm.sh
and you can see the actual commands that it executes by looking at the log file attached to this bug.

So in this case it looks as if the scenario is:
- Add a four disk MD array to a booting guest.
- Immediately run 'mdadm --stop /dev/mdXXX' as soon as the guest has booted.

The mdadm command hangs, whereas before recent changes it did not hang.

I think I've found it. The bug was caused by the introduction of the MD_STILL_CLOSED flag. This should fix it.

diff --git a/drivers/md/md.c b/drivers/md/md.c
index b6b7a2866c9e..e60cebf3f519 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -7777,7 +7777,7 @@ void md_check_recovery(struct mddev *mddev)
 	if (mddev->ro && !test_bit(MD_RECOVERY_NEEDED, &mddev->recovery))
 		return;
 	if ( ! (
-		(mddev->flags & ~ (1<<MD_CHANGE_PENDING)) ||
+		(mddev->flags & MD_UPDATE_SB_FLAGS & ~ (1<<MD_CHANGE_PENDING)) ||
 		test_bit(MD_RECOVERY_NEEDED, &mddev->recovery) ||
 		test_bit(MD_RECOVERY_DONE, &mddev->recovery) ||
 		(mddev->external == 0 && mddev->safemode == 1) ||

I wonder why I couldn't reproduce it under qemu-kvm. If I understand what is happening correctly, the bug should cause the md123_raid1 thread to spin for a short while until the mdadm thread calls md_unregister_thread, at which point the md123_raid1 thread should just exit.

Please confirm that this patch fixes your problem.

Thanks

Yes, this patch fixes the test in the libguestfs test suite on my Fedora Rawhide machine.
$ make -C tests/md check LIBGUESTFS_DEBUG=1 LIBGUESTFS_TRACE=1 TESTS=test-mdadm.sh
make: Entering directory `/home/rjones/d/libguestfs/tests/md'
make check-TESTS
make[1]: Entering directory `/home/rjones/d/libguestfs/tests/md'
310 seconds: ./test-mdadm.sh
PASS: test-mdadm.sh
=============
1 test passed
=============
make[1]: Leaving directory `/home/rjones/d/libguestfs/tests/md'
make: Leaving directory `/home/rjones/d/libguestfs/tests/md'

Thanks. I'll send the patch upstream.
I would close this bug too, but it seems I cannot. I cannot even assign it to me.... ho hum.

Fixed by pull rq "[GIT PULL REQUEST]: md fixes for 3.13-rc"
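
For anyone reading this later, the effect of the one-line mask change in the patch can be sketched in userspace C. This is only an illustration, not kernel code: the bit numbers and the definition of MD_UPDATE_SB_FLAGS below are assumptions made for the sketch, and the helper names flags_need_work_old and flags_need_work_new are hypothetical. The point it tries to show is that, with the old expression, a flag bit outside the superblock-update set (such as MD_STILL_CLOSED) makes the flags term non-zero, so md_check_recovery does not take its early return.

```c
#include <stdbool.h>

/* Bit numbers are illustrative assumptions for this sketch,
 * not copied from the kernel headers. */
enum {
    MD_CHANGE_DEVS    = 0,
    MD_CHANGE_CLEAN   = 1,
    MD_CHANGE_PENDING = 2,
    MD_STILL_CLOSED   = 4,
};

/* Assumed here: the mask covers only the three "superblock changed" bits. */
#define MD_UPDATE_SB_FLAGS                                   \
    ((1UL << MD_CHANGE_DEVS) | (1UL << MD_CHANGE_CLEAN) |    \
     (1UL << MD_CHANGE_PENDING))

/* Flags term before the patch: every bit except MD_CHANGE_PENDING
 * counts as pending work, including MD_STILL_CLOSED. */
static bool flags_need_work_old(unsigned long flags)
{
    return (flags & ~(1UL << MD_CHANGE_PENDING)) != 0;
}

/* Flags term after the patch: only superblock-update bits
 * (minus MD_CHANGE_PENDING) count as pending work. */
static bool flags_need_work_new(unsigned long flags)
{
    return (flags & MD_UPDATE_SB_FLAGS & ~(1UL << MD_CHANGE_PENDING)) != 0;
}
```

Under these assumptions, a flags word with only MD_STILL_CLOSED set is treated as work by the old term and ignored by the new one, while MD_CHANGE_DEVS is still seen by both.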