Bug 65721

Summary: mdadm --stop causes soft lockup and eventual crash
Product: IO/Storage
Component: MD
Assignee: io_md
Reporter: Richard W.M. Jones (rjones)
CC: neilb
Status: RESOLVED CODE_FIX
Severity: normal
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.13.0
Subsystem:
Regression: No
Bisected commit-id:
Attachments: log file

Description Richard W.M. Jones 2013-11-25 11:49:43 UTC
Created attachment 115881
log file

The libguestfs test suite runs mdadm in various combinations.
Currently the mdadm --stop test causes a soft lockup and eventual
crash.  See the very long stack trace which I'll attach to
this bug.

This has just started happening in the Rawhide kernel, in the
last week.

kernel 3.13.0-0.rc1.git0.1.fc21
mdadm-3.3-4.fc21.x86_64

Fedora bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1033971

The first stack trace is below, but see the attached file for
the many subsequent errors seen.

mdadm --stop /dev/md123
[  157.114285] BUG: soft lockup - CPU#0 stuck for 23s! [md123_raid1:146]
[  157.114285] Modules linked in: raid1 raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx kvm_amd kvm snd_pcsp snd_pcm snd_page_alloc snd_timer serio_raw snd soundcore ata_generic pata_acpi virtio_balloon virtio_pci virtio_mmio virtio_net virtio_scsi virtio_blk virtio_console virtio_rng virtio_ring virtio ideapad_laptop sparse_keymap rfkill sym53c8xx scsi_transport_spi crc8 crc_ccitt crc32 crc_itu_t libcrc32c megaraid megaraid_sas megaraid_mbox megaraid_mm
[  157.114285] irq event stamp: 5730664
[  157.114285] hardirqs last  enabled at (5730663): [<ffffffff8175f926>] _raw_spin_unlock_irqrestore+0x36/0x70
[  157.114285] hardirqs last disabled at (5730664): [<ffffffff8176a5ad>] apic_timer_interrupt+0x6d/0x80
[  157.114285] softirqs last  enabled at (5730448): [<ffffffff8107b098>] __do_softirq+0x198/0x430
[  157.114285] softirqs last disabled at (5730443): [<ffffffff8107b71d>] irq_exit+0xcd/0xe0
[  157.114285] CPU: 0 PID: 146 Comm: md123_raid1 Not tainted 3.13.0-0.rc1.git0.1.fc21.x86_64+debug #1
[  157.114285] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  157.114285] task: ffff88001844a5f0 ti: ffff8800184da000 task.ti: ffff8800184da000
[  157.114285] RIP: 0010:[<ffffffff8175f92b>]  [<ffffffff8175f92b>] _raw_spin_unlock_irqrestore+0x3b/0x70
[  157.114285] RSP: 0018:ffff8800184dbcc8  EFLAGS: 00000296
[  157.114285] RAX: ffff88001844a5f0 RBX: ffff8800184dbc60 RCX: 0000000000000000
[  157.114285] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000296
[  157.114285] RBP: ffff8800184dbcd8 R08: 0000000000000000 R09: 0000000000000000
[  157.114285] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88001844ad90
[  157.114285] R13: 0000000000000002 R14: ffffffff810b6a38 R15: ffff8800184dbc40
[  157.114285] FS:  0000000000000000(0000) GS:ffff88001f000000(0000) knlGS:0000000000000000
[  157.114285] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  157.114285] CR2: 00007f6f943bc000 CR3: 00000000198aa000 CR4: 00000000000006f0
[  157.114285] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  157.114285] DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
[  157.114285] Stack:
[  157.114285]  ffff880019835098 0000000000000296 ffff8800184dbdd8 ffffffffa021ffd5
[  157.114285]  ffff8800198350e0 ffff880019835068 7fffffffffffffff ffff8800195c7450
[  157.114285]  ffff8800184dbe40 ffffffff81390d9b ffff880019835068 0000000000000001
[  157.114285] Call Trace:
[  157.114285]  [<ffffffffa021ffd5>] raid1d+0x6a5/0xe50 [raid1]
[  157.114285]  [<ffffffff81390d9b>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[  157.114285]  [<ffffffff815951e8>] md_thread+0x118/0x130
[  157.114285]  [<ffffffff810c6dc0>] ? abort_exclusive_wait+0xb0/0xb0
[  157.114285]  [<ffffffff815950d0>] ? mddev_unlock+0xe0/0xe0
[  157.114285]  [<ffffffff810a01df>] kthread+0xff/0x120
[  157.114285]  [<ffffffff810a00e0>] ? insert_kthread_work+0x80/0x80
[  157.114285]  [<ffffffff8176987c>] ret_from_fork+0x7c/0xb0
[  157.114285]  [<ffffffff810a00e0>] ? insert_kthread_work+0x80/0x80
[  157.114285] Code: 55 08 48 8d 7f 18 53 48 89 f3 be 01 00 00 00 e8 1c 4d 97 ff 4c 89 e7 e8 d4 81 97 ff f6 c7 02 74 1f e8 0a 22 97 ff 48 89 df 57 9d <66> 66 90 66 90 5b 41 5c 65 ff 0c 25 60 c9 00 00 5d c3 0f 1f 00
Comment 1 Neil Brown 2013-11-26 04:19:33 UTC
Thanks for the report.
Was the array performing a resync or recovery at the time?
Comment 2 Richard W.M. Jones 2013-11-26 08:36:22 UTC
Quite likely.  Note this is a test program which rapidly creates
and stops the array.  You can see the test program here:

https://github.com/libguestfs/libguestfs/blob/master/tests/md/test-mdadm.sh

and you can see the actual commands that it executes by looking
at the log file attached to this bug.
Comment 3 Richard W.M. Jones 2013-11-26 08:39:27 UTC
So in this case it looks as if the scenario is:

- Add a four disk MD array to a booting guest.
- Immediately run 'mdadm --stop /dev/mdXXX' as soon as the guest
  has booted.

The mdadm command hangs, whereas before the recent kernel changes it did not.
Comment 4 Neil Brown 2013-11-27 03:33:23 UTC
I think I've found it.
The bug was caused by the introduction of the MD_STILL_CLOSED flag.
This should fix it.
diff --git a/drivers/md/md.c b/drivers/md/md.c
index b6b7a2866c9e..e60cebf3f519 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -7777,7 +7777,7 @@ void md_check_recovery(struct mddev *mddev)
 	if (mddev->ro && !test_bit(MD_RECOVERY_NEEDED, &mddev->recovery))
 		return;
 	if ( ! (
-		(mddev->flags & ~ (1<<MD_CHANGE_PENDING)) ||
+		(mddev->flags & MD_UPDATE_SB_FLAGS & ~ (1<<MD_CHANGE_PENDING)) ||
 		test_bit(MD_RECOVERY_NEEDED, &mddev->recovery) ||
 		test_bit(MD_RECOVERY_DONE, &mddev->recovery) ||
 		(mddev->external == 0 && mddev->safemode == 1) ||

I wonder why I couldn't reproduce it under qemu-kvm.
If I understand what is happening correctly, the bug should cause the md123_raid1 thread to spin for a short while until the mdadm thread calls md_unregister_thread, at which point the md123_raid1 thread should just exit.
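
For reference, the bit arithmetic behind that spin can be shown in a few lines of user-space C. This is only a sketch: the flag bit numbers below are assumed from the 3.13-era drivers/md/md.h (MD_CHANGE_DEVS=0, MD_CHANGE_CLEAN=1, MD_CHANGE_PENDING=2, MD_STILL_CLOSED=4, with MD_UPDATE_SB_FLAGS covering bits 0-2), not quoted from this report:

#include <stdio.h>

/* Assumed 3.13-era bit numbers from drivers/md/md.h. */
#define MD_CHANGE_DEVS     0
#define MD_CHANGE_CLEAN    1
#define MD_CHANGE_PENDING  2
#define MD_STILL_CLOSED    4   /* set on --stop; needs no superblock update */
#define MD_UPDATE_SB_FLAGS ((1UL << MD_CHANGE_DEVS) | \
                            (1UL << MD_CHANGE_CLEAN) | \
                            (1UL << MD_CHANGE_PENDING))

int main(void)
{
	/* mdadm --stop leaves MD_STILL_CLOSED set and nothing else. */
	unsigned long flags = 1UL << MD_STILL_CLOSED;

	/* Old test: any bit outside MD_CHANGE_PENDING counts as pending
	 * work, so md_check_recovery() never settles and raid1d spins. */
	printf("old test -> %lu (nonzero: spin)\n",
	       flags & ~(1UL << MD_CHANGE_PENDING));

	/* Patched test: only the superblock-update bits count, so
	 * MD_STILL_CLOSED alone no longer looks like work to do. */
	printf("new test -> %lu (zero: idle)\n",
	       flags & MD_UPDATE_SB_FLAGS & ~(1UL << MD_CHANGE_PENDING));
	return 0;
}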

Please confirm that this patch fixes your problem.

Thanks
Comment 5 Richard W.M. Jones 2013-11-27 17:28:14 UTC
Yes, this patch fixes the test in the libguestfs test suite on
my Fedora Rawhide machine.

$ make -C tests/md check LIBGUESTFS_DEBUG=1 LIBGUESTFS_TRACE=1 TESTS=test-mdadm.sh
make: Entering directory `/home/rjones/d/libguestfs/tests/md'
make  check-TESTS
make[1]: Entering directory `/home/rjones/d/libguestfs/tests/md'
310 seconds: ./test-mdadm.sh
PASS: test-mdadm.sh
=============
1 test passed
=============
make[1]: Leaving directory `/home/rjones/d/libguestfs/tests/md'
make: Leaving directory `/home/rjones/d/libguestfs/tests/md'
Comment 6 Neil Brown 2013-11-27 23:36:52 UTC
Thanks.  I'll send the patch upstream.

I would close this bug too, but it seems I cannot.  I cannot even assign it to myself... ho hum.
Comment 7 Richard W.M. Jones 2013-11-28 08:44:13 UTC
Fixed by pull request "[GIT PULL REQUEST]: md fixes for 3.13-rc".