Bug 6965 - Kernel doesn't re-sync usb-raid1 after running mdadm --add
Summary: Kernel doesn't re-sync usb-raid1 after running mdadm --add
Status: CLOSED CODE_FIX
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: MD
Hardware: i386 Linux
Importance: P2 normal
Assignee: Neil Brown
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-08-06 01:18 UTC by Adrian Ulrich
Modified: 2008-03-26 23:15 UTC
CC List: 2 users

See Also:
Kernel Version: 2.6.18-rc3
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
mdadm output after adding sda2 (123 bytes, text/plain)
2006-08-06 03:42 UTC, Adrian Ulrich
Details
output of mdadm --examine before adding sda2 (1.56 KB, text/plain)
2006-08-06 03:43 UTC, Adrian Ulrich
Details
Patch to avoid this problem (1.58 KB, patch)
2006-08-06 16:01 UTC, Neil Brown
Details | Diff
Same patch against 2.6.18-rc3 (1.66 KB, patch)
2006-08-07 16:01 UTC, Neil Brown
Details | Diff

Description Adrian Ulrich 2006-08-06 01:18:15 UTC
Most recent kernel where this bug did not occur: 2.6.18-rc2
Distribution: Slackintosh-Current
Hardware Environment:

Mac-Mini (ppc) : Internal Harddisk (rootfs) mirrored to an external
USB2 Harddrive

Problem Description:

The internal harddrive of my MacMini is mirrored to an external
USB2-Harddrive via md-raid1.

Because the kernel doesn't attach the USB device (sda2) to my
raid (md0) by itself after a reboot, I'm using something like this in my
/etc/rc.d/rc.local:

> #!/bin/sh
> postfix start  (<-- note: postfix is started before attaching sda2)
> /sbin/mdadm --manage /dev/md0 --add /dev/sda2

This worked well with kernels before 2.6.18-rc2:
A full sync from hda3 -> sda2 was started after each reboot.
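(On those kernels the running recovery was visible in /proc/mdstat;
the output below is illustrative, only the block count is taken from
this array:)

# cat /proc/mdstat
md0 : active raid1 sda2[2] hda3[0]
      19531136 blocks [2/1] [U_]
      [=>...................]  recovery =  8.3% (1630208/19531136) finish=11.0min speed=27170K/sec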

Beginning with 2.6.18-rc2, the kernel no longer synced
the drive after attaching it: it was added to the raid without
a full resync.

According to 'iostat', the kernel does write to both devices
*after* mdadm attaches the harddrive, but it seems that everything
written to md0 before attaching sda2 never makes it to sda2 :-( :

Example:
--------
mdadm --manage /dev/md0 -f /dev/hda3
mdadm --manage /dev/md0 -r /dev/hda3 #sda2 is now the only active disk of md0
mount /dev/hda3 /mnt/bla
# cat /var/spool/postfix/pid/master.pid 
             649
# cat /mnt/bla/var/spool/postfix/pid/master.pid
             413 <-- ouch!
Comment 1 Neil Brown 2006-08-06 01:52:09 UTC
On Sunday August 6, bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=6965
> 
>            Summary: Kernel doesn't re-sync usb-raid1 after running mdadm --
>                     add
>     Kernel Version: 2.6.18-rc3
> 
> Most recent kernel where this bug did not occur: 2.6.18-rc2

This is very surprising as
   git log v2.6.18-rc2..v2.6.18-rc3 drivers/md/
doesn't list anything.  i.e. there have been no
relevant changes in that time.

There was a change leading up to rc1 that could 
possibly explain the situation. 
 commit 07d84d109d8beedd68df9da2e4e9f25c8217e7fb

If you add a drive to an array, and that drive was previously
part of the array, and no actual writes have happened in the
mean time, then the array will be added with no resync (which
is probably what you want).
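One way to see whether md considers the re-added drive up to date is
to compare the superblock event counters from mdadm --examine; a
sketch with illustrative values (the exact format depends on the
superblock version):

# mdadm --examine /dev/hda3 | grep Events
         Events : 0.42
# mdadm --examine /dev/sda2 | grep Events
         Events : 0.42   <-- equal counters: added without a resync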

It could be that the change to master.pid has not actually
been flushed to disk at the time when you break the array
and remount the usb drive.

Could you try the same experiment, only give the command
  sync ; sync
before failing and removing /dev/hda3 ?
Let me know what happens.

Thanks,
NeilBrown

Comment 2 Adrian Ulrich 2006-08-06 03:40:31 UTC
> This is very surprising as
>   git log v2.6.18-rc2..v2.6.18-rc3 drivers/md/
> doesn't list anything. 

hmm..
It's possible that this happened before rc2... maybe I just didn't notice it.


> It could be that the change to master.pid has not actually
> been flushed to disk at the time when you break the array
> and remount the usb drive.

Postfix (and Linux) had been up for 3 days before I broke the array: this
means that master.pid was unchanged since then, so the system would have had
enough time to write the file to both disks :-/

Anyway:
I changed my init-script to the following (see attachments):

 sleep 5 && mdadm --examine /dev/sda2 >> /tmp/exlog ; \
            mdadm --examine /dev/hda3 >> /tmp/exlog ; \
            /sbin/mdadm --manage /dev/md0 --add /dev/sda2 ; \
            cat /proc/mdstat > /tmp/MDX &

and rebooted.

sda2 was added without a resync.

Then i ran:
# sync ; sync
# mdadm --manage /dev/md0 -f /dev/sda2
# mdadm --manage /dev/md0 -r /dev/sda2

and mounted sda2 to /mnt/floppy:

#cat /var/spool/postfix/pid/master.pid  <-- md0 / hda3
             649
#cat /mnt/floppy/var/spool/postfix/pid/master.pid <-- sda2
             653

*bummer* :-(
Comment 3 Adrian Ulrich 2006-08-06 03:42:45 UTC
Created attachment 8717 [details]
mdadm output after adding sda2
Comment 4 Adrian Ulrich 2006-08-06 03:43:35 UTC
Created attachment 8718 [details]
output of mdadm --examine before adding sda2
Comment 5 Neil Brown 2006-08-06 05:08:44 UTC
On Sunday August 6, bugme-daemon@bugzilla.kernel.org wrote:
> 
> *bummer* :-(

Yeah.... I have a hunch.  I'll look closer in the morning and get
you a patch.

NeilBrown

Comment 6 Neil Brown 2006-08-06 16:01:52 UTC
Created attachment 8721 [details]
Patch to avoid this problem

This patch should fix the problem. With it in place, you should always
get a resync if any updates have happened since the array was broken.

BTW, you might like to try adding a bitmap to your array.
  mdadm --grow /dev/mdX --bitmap=internal
then when you add the USB drive back in, it will only update sections of
the drive that need to be updated, and so the resync should be much faster
and just as safe.
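Whether the bitmap is active shows up as an extra line in /proc/mdstat;
with the numbers from this array it would look roughly like:

# cat /proc/mdstat
md0 : active raid1 sda2[1] hda3[0]
      19531136 blocks [2/2] [UU]
      bitmap: 0/150 pages [0KB], 64KB chunk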
Comment 7 Adrian Ulrich 2006-08-07 11:00:22 UTC
Thanks for the patch... but against what kernel version did you diff?

I've tried to apply the diff to 2.6.17.8, 2.6.18-rc3 and even
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git but failed.
Comment 8 Neil Brown 2006-08-07 16:01:05 UTC
Created attachment 8729 [details]
Same patch against 2.6.18-rc3

Sorry, that was against the latest -mm.  I had forgotten that there has
been a substantial change in that area of code recently.

This patch is against 2.6.18-rc3 and should apply to subsequent rc's.
The bug is not in 2.6.17.8 so no patch is needed there.
Comment 9 Adrian Ulrich 2006-08-08 09:01:45 UTC
The patch fixed the problem:

Rebooted a few minutes ago and the kernel started a full-resync as expected :-)

Thanks!
Comment 10 Adrian Ulrich 2006-08-09 10:53:11 UTC
Neil: btw:

using the internal bitmap wasn't such a good idea:

md0: bitmap file is out of date (0 < 20841588) -- forcing full recovery
md0: bitmap file is out of date, doing full recovery
md0: bitmap initialized from disk: read 10/10 pages, set 305174 bits, status: 0
created bitmap (150 pages) for device md0  <-- created bitmap
raid1: Disk failure on sda2, disabling device.  <-- failed a device
        Operation continuing on 1 devices
md: cannot remove active disk sda2 from md0 ... 
RAID1 conf printout: <-- removed it
 --- wd:1 rd:2
 disk 0, wo:0, o:1, dev:hda3
 disk 1, wo:1, o:0, dev:sda2
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:0, o:1, dev:hda3
md: unbind<sda2>
md: export_rdev(sda2)
 
md: bind<sda2> <-- and re-added it .. -> boom!
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:0, o:1, dev:hda3
 disk 1, wo:1, o:1, dev:sda2
md: syncing RAID array md0
md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
for reconstruction.
md: using 128k window, over a total of 19531136 blocks.
kernel BUG in sync_request at drivers/md/raid1.c:1748!
Oops: Exception in kernel mode, sig: 5 [#1]

Modules linked in: cls_fw sch_htb aes loop bsd_comp ppp_synctty ppp_async
ppp_generic slhc xt_length xt_MARK ipt_REJECT ipt_MASQUERADE xt_state ipt_LOG
iptable_mangle iptable_nat ip_nat iptable_filter
NIP: C01C6358 LR: C01C6328 CTR: 00000000
REGS: c64a7db0 TRAP: 0700   Not tainted  (2.6.18-rc3)
MSR: 00029032 <EE,ME,IR,DR>  CR: 24008024  XER: 00000000
TASK = dffdb220[18445] 'md0_resync' THREAD: c64a6000
GPR00: 00000000 C64A7E60 DFFDB220 00000001 00000000 C64A7E68 00000000 C01C5074 
GPR08: D54EA914 00000001 00000000 00000001 84008042 00000000 00000000 00000000 
GPR16: 00000000 00000000 00000000 00000001 00000000 02540B00 00001000 00000000 
GPR24: C1942000 00000000 00000001 C1936420 00000002 00000000 D54EA8E0 CE6466E0 
NIP [C01C6358] sync_request+0x3a0/0x59c
LR [C01C6328] sync_request+0x370/0x59c
Call Trace:
[C64A7E60] [C01C6328] sync_request+0x370/0x59c (unreliable)
[C64A7EB0] [C01CFD2C] md_do_sync+0x404/0x830
[C64A7F80] [C01CBFEC] md_thread+0x114/0x138
[C64A7FC0] [C003E618] kthread+0xc8/0x104
[C64A7FF0] [C0012A94] kernel_thread+0x44/0x60
Instruction dump:
2f830000 409e001c 801b0044 2f800000 409e0010 80180108 70050040 41820120 
80010008 21200007 39200000 7d294914 <0f090000> 5400482c 7f960000 409d0008 
Comment 11 Neil Brown 2006-08-16 21:36:37 UTC
Hmmm... very uncool.

I cannot see how that could possibly happen, but nor can I see just
now why that condition should be a BUG.....

Is this repeatable?

I'll put it on my list to worry about when I have a bit more brain space.
Comment 12 Adrian Ulrich 2006-08-16 23:14:17 UTC
> Is this repeatable?

Yes: I rebooted 2-3x and adding sda2 always oopsed the kernel.

Removing the bitmap from md0 cured it.
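For reference, the bitmap was removed with the reverse of the --grow
command above, i.e. something like:

# mdadm --grow /dev/md0 --bitmap=none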
Comment 13 Dalvenjah FoxFire 2006-09-24 23:19:05 UTC
Just wanted to throw 2c in, though not sure if it's useful info. Let me know if I can help with more info 
or testing.

Hardware: dual 850MHz PIII, 1GB RAM, dual PATA ide controllers on motherboard with two 60GB drives 
and two 300GB drives, one of each on each motherboard channel. (Root/swap/etc is on two mirrored 
drives on a SCSI controller; the IDE drives are for data only.)

I was running RAID1 under 2.6.12.2, and upgraded to 2.6.18 to be able to user-trigger raid scrubs
(echo check > /sys/block/md5/md/sync_action). This ended up sending a temperamental drive over
the edge (one of the 60GBs); I replaced it with a new one and did the 'mdadm /dev/md5 --add
/dev/hdd1' command to add the new member to the array.

The kernel immediately called the drive good, and didn't perform a recovery sync. Additionally, 
requesting a sync with the above 'echo check' command didn't work either; with the array broken and 
the good drive as the only member, fsck -f -n returns a clean filesystem, while with both members in 
the array (before and after the requested sync) the fsck -f -n returns many errors.

Removing the new drive, zeroing its superblock and re-adding it didn't help. It would
not try to recover at all. md would never do a 'recovery', only a 'resync' (according to /proc/mdstat).

I just finished trying to reconstruct under 2.6.12.2; after zeroing the superblock and doing the mdadm 
-a command, it knew what to do and did a 'recovery' (as seen by /proc/mdstat) just fine.
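For reference, the sequence that worked under 2.6.12.2 was essentially
(device names as above):

# mdadm --zero-superblock /dev/hdd1
# mdadm /dev/md5 --add /dev/hdd1
# cat /proc/mdstat   <-- now shows 'recovery' instead of 'resync'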
Comment 14 Natalie Protasevich 2007-06-24 18:51:52 UTC
Any update on this bug?
Thanks.
Comment 15 Natalie Protasevich 2008-03-26 23:15:25 UTC
Since there has been no recent activity, I'm closing the bug. (It looks like the problem has been resolved.)
