Bug 2316

Summary: Resync of Linux-Software-RAID1 swamps out access to file system
Product: IO/Storage
Reporter: Hans-Peter Bock (xbk)
Component: MD
Assignee: Neil Brown (neilb)
Status: REJECTED WILL_NOT_FIX
Severity: normal
CC: protasnb
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.13
Subsystem:
Regression: ---
Bisected commit-id:

Description Hans-Peter Bock 2004-03-16 08:08:17 UTC
Distribution: Debian testing
Hardware Environment:
* Asus P4P800 Deluxe
* 2x Promise FastTrakS150TX4
* 6x SATA-HDD

Software Environment:
* vanilla-kernel 2.6.4
* gcc version 3.3.3 (Debian)

Problem Description:
Normal processes block on file access a short time after a
resynchronisation of a RAID-1 array starts.

Steps to reproduce:
* /boot is on /dev/md0
* / is on /dev/md1
* create (50GB) RAID-1 with "mdadm --create /dev/md1 --level=raid1
--raid-disks=2 /dev/sda7 /dev/sdc7"
* run (for example) "while [ 1 ]; do clear; cat /proc/mdstat; sleep 1; done";
this will block after a short period of time
* pings to and portscans of the RAID-machine still work
* after the resynchronization has finished, everything continues as usual
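The monitoring loop in the steps above can be sketched as a bounded poll. This is a hedged illustration: `poll_status` and its parameters are made up here so the loop can be exercised without an md array; on the affected machine the file would be /proc/mdstat.

```shell
#!/bin/sh
# Bounded variant of the reproduction loop: read a status file once per
# second, a fixed number of times. The path and iteration count are
# parameters purely so this sketch is runnable anywhere.
poll_status() {
    file=$1
    count=${2:-5}
    i=0
    while [ "$i" -lt "$count" ]; do
        cat "$file"     # this read is what blocks during the resync
        sleep 1
        i=$((i + 1))
    done
}
```

In the reported failure mode it is this `cat` of /proc/mdstat that stalls, not the loop itself.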

This problem does not seem to exist in vanilla kernel 2.6.3.
Comment 1 Hans-Peter Bock 2004-03-16 08:40:43 UTC
The "/dev/md1" in the "mdadm --create" command is a typo and should be "/dev/md2".
Comment 2 Alasdair G Kergon 2005-07-29 07:01:24 UTC
Is this still an issue or can we close this?
Comment 3 Hans-Peter Bock 2005-10-10 02:22:51 UTC
I'm sorry, but it's still an issue, as I had to notice last Saturday.
Comment 4 Neil Brown 2005-10-10 02:41:07 UTC
1/ May I introduce the program 'watch' to you? Very useful for watching
  /proc/mdstat.

2/ Could you get me a copy of /proc/mdstat just before it stops responding?

3/ Can you try
   dd if=/dev/sda7 of=/dev/sdc7 bs=1024k &
  (to effectively do the copy by hand)
  and see how responsive the system is while that is happening?
 (This will of course make a mess of /dev/sdc7; don't do it if you
  value the data there).

Thanks,
NeilBrown
Comment 5 Hans-Peter Bock 2005-10-11 13:10:37 UTC
1/ I recently got introduced to watch, but thank you for the hint. =;)

3/ ok, let's see:

$ sudo mdadm /dev/md2 -r /dev/sde7
$ sudo dd bs=1024k if=/dev/sda7 of=/dev/sde7
This is not an issue.

$ sudo dd bs=1024k if=/dev/sdc7 of=/dev/sde7
This also is not an issue.

2/

$ sudo mdadm /dev/md2 -a /dev/sde7
$ sudo mdadm /dev/md2 -f /dev/sdc7; sudo mdadm /dev/md2 -r /dev/sdc7; sudo
mdadm /dev/md2 -a /dev/sdc7

Every 2s: cat /proc/mdstat                              Tue Oct 11 21:53:19 2005

Personalities : [raid1] [raid5] [raid6]
md2 : active raid1 sdc7[2] sde7[3] sda7[0]
      51375744 blocks [2/1] [U_]
      [>....................]  recovery =  0.2% (130048/51375744) finish=19.6min speed=43349K/sec

md4 : active raid1 sde8[2] sdc8[0] sda8[1]
      51375744 blocks [2/2] [UU]

md6 : active raid1 sdc9[2] sde9[0] sda9[1]
      51375744 blocks [2/2] [UU]
[...]

Every 2s: cat /proc/mdstat                              Tue Oct 11 22:08:59 2005

Personalities : [raid1] [raid5] [raid6]
md2 : active raid1 sdc7[2] sde7[3] sda7[0]
      51375744 blocks [2/1] [U_]
      [============>........]  recovery = 60.5% (31095168/51375744) finish=8.0min speed=41885K/sec
[...]

This is an issue. The cat often blocks on reading /proc/mdstat for several
seconds, sometimes up to several minutes. Access to the filesystems also
blocks during that time.
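The recovery percentage shown in dumps like the ones above can be pulled out with a short grep sketch. The helper name is hypothetical; it reads any file containing /proc/mdstat-formatted content, so it can be tried on a saved copy.

```shell
#!/bin/sh
# Extract the recovery percentage from mdstat-formatted text.
# The first grep isolates the "recovery = N.N%" field; the second
# strips it down to the bare percentage.
recovery_pct() {
    grep -o 'recovery = *[0-9.]*%' "$1" | grep -o '[0-9.]*%'
}
```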

BTW: All this is not an issue on another machine, which has 2 IDE drives and is
running kernel 2.4.27.
Comment 6 Hans-Peter Bock 2005-10-21 11:11:59 UTC
The problem does not seem to exist any longer when using the cfq I/O scheduler
instead of the anticipatory I/O scheduler.
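For reference, the scheduler workaround mentioned above can be applied per device at runtime through the queue/scheduler sysfs node on 2.6 kernels. The helper below is a sketch, not part of the report; it takes the node path as a parameter so it can be tried against an ordinary file first.

```shell
#!/bin/sh
# Select an I/O scheduler for a block device by writing its name into
# the device's queue/scheduler sysfs node, e.g.
# /sys/block/sda/queue/scheduler. The node path is a parameter so this
# sketch can be exercised on a plain file without root privileges.
set_io_scheduler() {
    sched=$1
    node=${2:-/sys/block/sda/queue/scheduler}
    echo "$sched" > "$node"
}
```

Alternatively, the default scheduler can be chosen at boot with the `elevator=cfq` kernel parameter.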
Comment 7 Natalie Protasevich 2007-10-16 05:15:07 UTC
Hans-Peter,
Does this sufficiently resolve the problem for you? Is this still true with the current kernel? If so, we can close this bug, if there are no objections.
Thanks.
Comment 8 Hans-Peter Bock 2007-10-16 05:25:32 UTC
Hello Natalie,
yes, this workaround solves the problem for me.
Best regards, Hans-Peter
Comment 9 Natalie Protasevich 2008-03-03 18:41:49 UTC
Closing the bug. If someone objects and wishes to look further into the scheduler, please reopen.