Bug 28652

Summary: Hang-ups and various DOS started by accessing partition of non-existent partitionable RAID array
Product: IO/Storage Reporter: hkmaly
Component: MD Assignee: Neil Brown (neilb)
Status: CLOSED CODE_FIX    
Severity: normal CC: akpm, florian, neilb
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.30,2.6.32 Subsystem:
Regression: No Bisected commit-id:

Description hkmaly 2011-02-08 22:39:37 UTC
Found while repartitioning disks:
On kernel 2.6.30, mdadm -D /dev/md_d1p1 hangs on open of md_d1p1 when md_d1 is not started. After that, even access to /dev/md_d1 hangs. An attempt to shut down or reboot then oopses in do_md_stop (kernel_shutdown_prepare is a little higher on the stack, so I assume it's really the kernel and not some auto-stop program).
Also tested on Ubuntu 2.6.32-28-generic-pae; the result was mdadm stopped responding while taking 100% CPU (the difference might be the kernel version or some hal/udev).

I can do more experiments, but I don't know what would be useful, and I assume this is easily reproducible whenever you stop relying on udev (when using udev, devices for nonexistent arrays don't appear).
Comment 1 hkmaly 2011-02-08 23:03:30 UTC
Note that according to strace the hang is immediately on open ... fsck /dev/md_d1p1 will cause the same problems (and in fact I've just found out that fsck is what originally started it).
Comment 2 Neil Brown 2011-02-09 00:38:46 UTC
md_open is repeatedly returning ERESTARTSYS which is causing an endless loop.

Don't know why yet - but I thought I would let you know I was looking.
Thanks.
Comment 3 Neil Brown 2011-02-09 01:40:34 UTC
This patch appears to fix the problem for me.
If you are in a position to test and confirm, I would appreciate it.
However I am fairly confident and will send it upstream shortly.

Thanks.

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 0cc30ec..1d87668 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -553,6 +553,9 @@ static mddev_t * mddev_find(dev_t unit)
 {
 	mddev_t *mddev, *new = NULL;
 
+	if (unit && MAJOR(unit) != MD_MAJOR)
+		unit &= ~((1<<MdpMinorShift)-1);
+
  retry:
 	spin_lock(&all_mddevs_lock);
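
To illustrate the arithmetic in the two added lines, here is a minimal user-space sketch, not kernel code. It assumes a 20-bit minor field (as in the kernel's kdev_t.h), MD_MAJOR 9, and MdpMinorShift 6; the names MDP_MINOR_SHIFT, EXAMPLE_MDP_MAJOR (254) and normalize_unit are hypothetical and chosen only for the example. The idea shown is that a device number naming a partition of a partitionable array is rounded down to the device number of the whole array before the lookup.

```c
/* User-space sketch of the minor masking done by the patch above.
 * Assumptions (not taken from the bug report): 20-bit minors,
 * MD_MAJOR == 9, MdpMinorShift == 6, and an example mdp major of 254. */
#define MINORBITS        20
#define MINORMASK        ((1U << MINORBITS) - 1)
#define MAJOR(dev)       ((unsigned int)((dev) >> MINORBITS))
#define MINOR(dev)       ((unsigned int)((dev) & MINORMASK))
#define MKDEV(ma, mi)    (((unsigned int)(ma) << MINORBITS) | (unsigned int)(mi))

#define MD_MAJOR          9
#define MDP_MINOR_SHIFT   6      /* assumed value of MdpMinorShift */
#define EXAMPLE_MDP_MAJOR 254    /* hypothetical major for md_d* devices */

/* Mirror of the added check: for a non-zero unit that is not on MD_MAJOR,
 * clear the low partition bits so the unit names the whole array. */
static unsigned int normalize_unit(unsigned int unit)
{
    if (unit && MAJOR(unit) != MD_MAJOR)
        unit &= ~((1U << MDP_MINOR_SHIFT) - 1);
    return unit;
}
```

Under these assumptions, md_d1p1 would have minor (1 << 6) + 1 = 65, and normalize_unit(MKDEV(EXAMPLE_MDP_MAJOR, 65)) yields the unit with minor 64, i.e. md_d1 itself. A unit on MD_MAJOR, or a unit of 0, passes through unchanged.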
Comment 4 Florian Mickler 2011-03-04 23:49:57 UTC
merged for .38-rc7: 

commit 8f5f02c460b7ca74ce55ce126ce0c1e58a3f923d
Author: NeilBrown <neilb@suse.de>
Date:   Wed Feb 16 13:58:51 2011 +1100

    md: correctly handle probe of an 'mdp' device.