Bug 11967

Summary: md raid10 fails to resync when disks added
Product: IO/Storage
Reporter: David Bronaugh (bronaugh)
Component: MD
Assignee: Neil Brown (neilb)
Status: RESOLVED CODE_FIX    
Severity: blocking    
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.28-rc3
Subsystem:
Regression: Yes
Bisected commit-id:

Description David Bronaugh 2008-11-06 18:11:00 UTC
Latest working kernel version: 2.6.26
Earliest failing kernel version: 2.6.27
Distribution: Debian
Hardware Environment: Intel P35 chipset based board, 6x Seagate 1.5T disks
Software Environment: mdadm - v2.6.7.1 - 15th October 2008
Problem Description: When disks are removed from a raid10 set and then re-added, the raid10 driver marks them as spares and does not resync them.
See also: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/285156

Steps to reproduce:
mdadm /dev/md<x> --remove <some device>
mdadm /dev/md<x> --add <same device>
cat /proc/mdstat
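
To check the outcome of the last step without reading the output by eye, here is a minimal Python sketch that scans the text of /proc/mdstat for members carrying the (S) spare marker. The name find_spares is made up for illustration and is not part of mdadm or the kernel; the sketch assumes the usual `device[N](S)` member notation on each array line.

```python
import re

# One array member as it appears on an mdstat array line, e.g. "sdb3[6](S)".
# Group 1: device name, group 2: trailing flags such as "(S)" or "(F)".
MEMBER = re.compile(r"([\w/!-]+)\[\d+\]((?:\([A-Z]\))*)")


def find_spares(mdstat_text):
    """Map each md array name to the list of its members flagged (S).

    Hypothetical helper: it only looks at lines that begin with "md",
    which is where /proc/mdstat lists an array and its member devices.
    """
    spares = {}
    for line in mdstat_text.splitlines():
        header = re.match(r"(md\S*)\s*:\s*(.*)", line)
        if not header:
            continue  # continuation lines, "Personalities", "unused devices"
        array, rest = header.group(1), header.group(2)
        spares[array] = []
        for token in rest.split():
            member = MEMBER.fullmatch(token)
            if member and "(S)" in member.group(2):
                spares[array].append(member.group(1))
    return spares
```

Passing it the contents of /proc/mdstat after the --add step should list the re-added device under its array if the driver has left it as a spare.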

Relevant dmesg spew:
[257494.006276] md: bind<sdb3>
[257494.036562] RAID10 conf printout:
[257494.036589]  --- wd:4 rd:6
[257494.036613]  disk 0, wo:0, o:1, dev:sdc3
[257494.036638]  disk 1, wo:0, o:1, dev:sdd3
[257494.036663]  disk 3, wo:0, o:1, dev:sde3
[257494.036687]  disk 5, wo:0, o:1, dev:sda3
[257494.037095] md: recovery of RAID array md4
[257494.037126] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[257494.037156] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[257494.037238] md: using 128k window, over a total of 995904 blocks.
[257494.037267] md: resuming recovery of md4 from checkpoint.
[257494.037294] md: md4: recovery done.
[257494.048583] RAID10 conf printout:
[257494.048608]  --- wd:4 rd:6
[257494.048631]  disk 0, wo:0, o:1, dev:sdc3
[257494.048655]  disk 1, wo:0, o:1, dev:sdd3
[257494.048679]  disk 3, wo:0, o:1, dev:sde3
[257494.048705]  disk 5, wo:0, o:1, dev:sda3
[257494.056925] RAID10 conf printout:
[257494.056959]  --- wd:4 rd:6
[257494.056981]  disk 0, wo:0, o:1, dev:sdc3
[257494.057011]  disk 1, wo:0, o:1, dev:sdd3
[257494.057040]  disk 3, wo:0, o:1, dev:sde3
[257494.057065]  disk 5, wo:0, o:1, dev:sda3
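
The printouts above can also be compared mechanically. The following is a rough Python sketch, not anything shipped with the kernel or mdadm, that pulls the wd/rd counts and the populated slots out of each "RAID10 conf printout" block. The names parse_conf_printouts and empty_slots are hypothetical, and reading wd as "working disks" and rd as "raid disks" is an assumption about the log format.

```python
import re

HEADER = re.compile(r"RAID10 conf printout:")
COUNTS = re.compile(r"--- wd:(\d+) rd:(\d+)")
DISK = re.compile(r"disk (\d+), wo:\d+, o:\d+, dev:(\S+)")


def parse_conf_printouts(log_text):
    """Return one dict per conf printout: {"wd": int, "rd": int, "disks": {slot: dev}}."""
    printouts = []
    current = None
    for line in log_text.splitlines():
        if HEADER.search(line):
            # A new printout starts; later count and disk lines belong to it.
            current = {"wd": None, "rd": None, "disks": {}}
            printouts.append(current)
            continue
        if current is None:
            continue  # nothing to attach this line to yet
        counts = COUNTS.search(line)
        if counts:
            current["wd"] = int(counts.group(1))
            current["rd"] = int(counts.group(2))
            continue
        disk = DISK.search(line)
        if disk:
            current["disks"][int(disk.group(1))] = disk.group(2)
    return printouts


def empty_slots(printout):
    """List slot numbers below rd that have no device in this printout."""
    if printout["rd"] is None:
        return []
    return [slot for slot in range(printout["rd"]) if slot not in printout["disks"]]
```

Run over the log above, all three printouts report wd:4 rd:6 with slots 2 and 4 unpopulated, both before and after the "recovery done" message, which is the symptom being reported.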

Comment 1 Neil Brown 2008-11-06 18:56:28 UTC
Thanks for the report.

This is fixed by

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=a53a6c85756339f82ff19e001e90cfba2d6299a8

which has just been committed to mainline and should go into -stable in due course.

Comment 2 Neil Brown 2008-11-06 18:56:56 UTC
Closing as code fix is available.