Bug 16220

Summary: md/raid/udev fails to create md partition devices (/dev/mdXpY)
Product: IO/Storage
Reporter: Duncan (1i5t5.duncan)
Component: MD
Assignee: Neil Brown (neilb)
Status: CLOSED CODE_FIX
Severity: normal
CC: akpm, neilb, rjw
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.35-rc3
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 16055
Attachments: config 2.6.35-rc3

Description Duncan 2010-06-15 17:19:04 UTC
Created attachment 26784
config 2.6.35-rc3

I'm running multiple md/raid devices, mostly raid-1, some partitioned and some not, all on the same set of four spindles. The spindles are identically partitioned, so that matching partitions sd[abcd]N assemble into one md/raid device.

With kernel 2.6.34, the system boots fine, the md partition devices (mdXpY) show up in /dev, and everything mounts normally.

With kernel 2.6.35-rc3, the localmount initscript reports that it can't find various labeled filesystems to mount.

After a bit of investigation, I realized that the only /dev/mdXpY devices showing up are those for the rootfs, which is assembled from the kernel command line. The unpartitioned md devices and the main devices (/dev/mdX) of the other partitioned arrays do show up, and gdisk (the partitions are GPT, not legacy MBR) sees an apparently intact partition table on them, but the /dev/mdXpY partition devices themselves (other than the rootfs ones) don't appear.
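To tell whether missing mdXpY nodes are absent from the kernel's own partition list or only from /dev, one option is to look at /proc/partitions. The sketch below is a hypothetical diagnostic helper (is_md_partition and count_md_partitions are illustrative names, not part of mdadm, udev or util-linux). It assumes the usual four-column "major minor #blocks name" layout and only recognizes mdXpY-style names, not older md_dNpM ones.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if name has the form "md<digits>p<digits>", e.g. "md1p2". */
static int is_md_partition(const char *name)
{
    const char *p = name;

    if (strncmp(p, "md", 2) != 0)
        return 0;
    p += 2;
    if (!isdigit((unsigned char)*p))
        return 0;
    while (isdigit((unsigned char)*p))
        p++;
    if (*p != 'p')
        return 0;
    p++;
    if (!isdigit((unsigned char)*p))
        return 0;
    while (isdigit((unsigned char)*p))
        p++;
    return *p == '\0';
}

/* Scan a stream in /proc/partitions format, print each md partition
 * entry and return how many were found. */
static int count_md_partitions(FILE *f)
{
    char line[256];
    int count = 0;

    while (fgets(line, sizeof line, f) != NULL) {
        unsigned int major, minor;
        unsigned long long blocks;
        char name[64];

        /* The header and blank lines don't yield four fields: skip them. */
        if (sscanf(line, "%u %u %llu %63s", &major, &minor, &blocks, name) != 4)
            continue;
        if (is_md_partition(name)) {
            printf("%s (%u:%u, %llu blocks)\n", name, major, minor, blocks);
            count++;
        }
    }
    return count;
}
```

On an affected boot, one would pass it the stream from fopen("/proc/partitions", "r") and compare the result with the mdXpY nodes actually present in /dev.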

System is a dual Opteron 290, Tyan s2885 mobo, 6 gigs RAM, running Gentoo/~amd64. All of userspace was recently freshly rebuilt with gcc 4.5.0, which was used to compile 2.6.35-rc3 as well, though the 2.6.34 kernel was compiled earlier with gcc 4.4.3. Related userspace apps: mdadm 3.1.2, util-linux 2.17.2, udev 154. Filesystems are reiserfs (until btrfs can safely replace it). md/raid metadata version 0.90.

As far as I can see, the only thing creating the mdXpY devices is the kernel/udev; no unusual Gentoo or local udev rules or initscript voodoo are involved.

The kernel config is attached, and I can git bisect. But first:

1) This isn't already known and bisected, is it?  New report?

2) No known data corruption bombs waiting to blow up the unwary git bisector in the 2.6.35 development series, I hope? (That's why I wait until rc2 or so before testing; any data corruption bombs should, hopefully, be fixed by then.)
Comment 1 Andrew Morton 2010-06-15 19:18:05 UTC
I'll mark this as a regression.

No, I haven't seen previous reports of this.

Yes please, a bisection would be great.  No, it won't eat your disks :)
Comment 2 Neil Brown 2010-06-16 03:41:09 UTC
Thanks for the report.
The offending commit is 

b821eaa572fd737faaf6928ba046e571526c36c6


Proposed (and tested) fix is below.
If you could test and confirm I would appreciate it.

Thanks,
NeilBrown


diff --git a/drivers/md/md.c b/drivers/md/md.c
index 46b3a04..4edcda8 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5895,6 +5895,7 @@ static int md_open(struct block_device *bdev, fmode_t mode)
 	atomic_inc(&mddev->openers);
 	mutex_unlock(&mddev->open_mutex);
 
+	check_disk_size_change(mddev->gendisk, bdev);
  out:
 	return err;
 }
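The one-line fix makes md_open() call check_disk_size_change() each time it runs. As I read the 2.6.35-era fs/block_dev.c (worth checking against the actual source), that helper compares the gendisk capacity with the size cached on the block device inode. On a mismatch it logs the change, adopts the new size and, through flush_disk(), sets bd_invalidated on a partitionable disk so the partition table is rescanned. That is one plausible reading of why the mdXpY devices come back. The following is a simplified user-space model of that logic, not kernel code; model_disk, model_bdev and model_check_size_change are hypothetical names.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct gendisk. */
struct model_disk {
    const char *name;
    unsigned long long capacity_sectors; /* like get_capacity(): 512-byte sectors */
    bool partitionable;
};

/* Hypothetical stand-in for struct block_device. */
struct model_bdev {
    struct model_disk *disk;
    long long size_bytes; /* like i_size_read(bdev->bd_inode) */
    bool invalidated;     /* like bd_invalidated: rescan partitions on open */
};

/* If the disk capacity no longer matches the cached device size, adopt the
 * new size and, for a partitionable disk, request a partition rescan.
 * Returns true when a change was detected. */
static bool model_check_size_change(struct model_bdev *bdev)
{
    long long disk_size = (long long)(bdev->disk->capacity_sectors << 9);

    if (disk_size == bdev->size_bytes)
        return false;
    printf("%s: detected capacity change from %lld to %lld\n",
           bdev->disk->name, bdev->size_bytes, disk_size);
    bdev->size_bytes = disk_size;
    if (bdev->disk->partitionable)
        bdev->invalidated = true;
    return true;
}
```

The sequence this models: /dev/mdX is opened while the array still has zero capacity, the array is then started, and the next open notices the size change and flags the device for a partition rescan.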
Comment 3 Duncan 2010-06-16 10:55:37 UTC
I just finished a bisect (a nice and straightforward testing one, for once =:^) and yes, it's that commit.  I'll test the patch shortly.
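The bisect that pinned this down is a binary search over the commits between a known-good kernel (2.6.34) and a known-bad one (2.6.35-rc3). Each test build halves the remaining range, so N candidate commits take roughly log2(N) builds. git bisect itself walks the commit graph, merges included; the sketch below flattens that into a linear list of commit indices to show only the search logic. bisect_first_bad, is_bad_fn and culprit_at are hypothetical names, and the approach assumes the bug is monotonic (once it appears, every later commit has it).

```c
#include <stddef.h>

/* Hypothetical test: nonzero if the kernel built at commit index i is bad. */
typedef int (*is_bad_fn)(size_t i, void *ctx);

/* Commit `good` is known good and commit `bad` is known bad. Returns the
 * first bad index and stores the number of test builds in *steps. */
static size_t bisect_first_bad(size_t good, size_t bad,
                               is_bad_fn is_bad, void *ctx, int *steps)
{
    *steps = 0;
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;

        (*steps)++;
        if (is_bad(mid, ctx))
            bad = mid;  /* bug present: culprit is at or before mid */
        else
            good = mid; /* bug absent: culprit is after mid */
    }
    return bad;
}

/* Example predicate for simulation only: ctx points at the index of a
 * hypothetical culprit commit; it and every later commit are "bad". */
static int culprit_at(size_t i, void *ctx)
{
    return i >= *(const size_t *)ctx;
}
```

Replacing culprit_at with "build, boot, and check whether /dev/mdXpY appears" gives the manual workflow that git bisect good / git bisect bad drives.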

BTW, enjoyed your kernel design patterns series on LWN, even if I am more sysadmin than coder! =:^)

Duncan
Comment 4 Duncan 2010-06-16 13:22:45 UTC
Patch confirmed to work. =:^)

Thanks,
Duncan
Comment 5 Rafael J. Wysocki 2010-06-16 20:11:31 UTC
Handled-By : Neil Brown <neilb@suse.de>
Patch : https://bugzilla.kernel.org/show_bug.cgi?id=16220#c2
Comment 6 Neil Brown 2010-06-17 05:28:05 UTC
Thanks for testing.
Patch should appear in -next by the end of the week and in -linus some time later. 

Thanks for the encouragement -- I used to be a sysadmin: it provides a useful perspective....
Comment 7 Rafael J. Wysocki 2010-07-08 22:49:51 UTC
Fixed by commit f3b99be19ded511a1bf05a148276239d9f13eefa .