Bug 68181
| Summary: | mdadm/raid10/md_mod segfaults during the mirror resync | | |
|---|---|---|---|
| Product: | IO/Storage | Reporter: | Damian Nowak (spam) |
| Component: | MD | Assignee: | Neil Brown (neilb) |
| Status: | RESOLVED CODE_FIX | | |
| Severity: | blocking | CC: | mateusz, neilb, spam |
| Priority: | P1 | | |
| Hardware: | x86-64 | | |
| OS: | Linux | | |
| Kernel Version: | 3.12.6-1-ARCH | Subsystem: | |
| Regression: | Yes | Bisected commit-id: | |
Attachments:
- dmesg with mdadm resync segfault
- kernel panic stacktrace screenshot (incomplete)
- full dmesg since server start (second time)
- another kernel panic screenshot with a different stacktrace part visible
- kernel panic from dmesg saved by systemd-journald
- kernel panic screenshot
Created attachment 121011 [details]
kernel panic stacktrace screenshot (incomplete)
Created attachment 121021 [details]
full dmesg since server start (second time)
```
# uname -a
Linux sml.cumulushost.eu 3.12.6-1-ARCH #1 SMP PREEMPT Fri Dec 20 19:39:00 CET 2013 x86_64 GNU/Linux

# lsmod
Module                  Size  Used by
ebt_arp                 1905  28
ebt_ip                  1646  14
vhost_net              10194  14
vhost                  18526  1 vhost_net
macvtap                10801  1 vhost_net
macvlan                 9645  1 macvtap
xt_conntrack            3273  1
ipt_REJECT              2449  2
tun                    19663  32 vhost_net
ip6table_filter         1436  0
ip6_tables             17056  1 ip6table_filter
ebtable_nat             1796  1
ebtables               24210  1 ebtable_nat
iptable_filter          1496  1
ipt_MASQUERADE          2162  4
bridge                 97094  0
iptable_nat             3366  1
stp                     1661  1 bridge
nf_conntrack_ipv4       9206  2
nf_defrag_ipv4          1411  1 nf_conntrack_ipv4
nf_nat_ipv4             3576  1 iptable_nat
llc                     3729  2 stp,bridge
nf_nat                 12335  3 ipt_MASQUERADE,nf_nat_ipv4,iptable_nat
nf_conntrack           73690  6 ipt_MASQUERADE,nf_nat,nf_nat_ipv4,xt_conntrack,iptable_nat,nf_conntrack_ipv4
xt_CHECKSUM             1239  1
xt_tcpudp               3151  5
iptable_mangle          1592  1
ip_tables              17282  3 iptable_filter,iptable_mangle,iptable_nat
x_tables               17359  13 ip6table_filter,ebt_arp,xt_CHECKSUM,ip_tables,xt_tcpudp,ipt_MASQUERADE,ebt_ip,xt_conntrack,iptable_filter,ebtables,ipt_REJECT,iptable_mangle,ip6_tables
kvm_amd                52575  98
evdev                  11085  2
kvm                   388773  1 kvm_amd
microcode              15216  0
mgag200                35408  1
igb                   161570  0
ttm                    63507  1 mgag200
drm_kms_helper         36286  1 mgag200
ptp                     8276  1 igb
drm                   238206  3 ttm,drm_kms_helper,mgag200
psmouse                85460  0
syscopyarea             3486  1 mgag200
sysfillrect             3818  1 mgag200
sysimgblt               2143  1 mgag200
pps_core                8961  1 ptp
dca                     5152  1 igb
i2c_algo_bit            5399  2 igb,mgag200
pcspkr                  2035  0
serio_raw               5049  0
ipmi_si                42664  0
ipmi_msghandler        32144  1 ipmi_si
button                  4677  0
sp5100_tco              5824  0
amd64_edac_mod         21465  0
edac_core              44969  2 amd64_edac_mod
acpi_cpufreq           10779  1
k10temp                 3058  0
i2c_piix4              10592  0
shpchp                 25465  0
edac_mce_amd           13094  1 amd64_edac_mod
i2c_core               24164  6 drm,igb,i2c_piix4,drm_kms_helper,mgag200,i2c_algo_bit
processor              24935  1 acpi_cpufreq
sd_mod                 30821  11
ata_generic             3378  0
pata_acpi               3395  0
ahci                   23056  8
pata_atiixp             4819  0
libahci                21674  1 ahci
ohci_pci                4040  0
ehci_pci                4000  0
ohci_hcd               23937  1 ohci_pci
ehci_hcd               59220  1 ehci_pci
libata                170856  5 ahci,pata_acpi,libahci,ata_generic,pata_atiixp
usbcore               180136  4 ohci_hcd,ohci_pci,ehci_hcd,ehci_pci
scsi_mod              130701  2 libata,sd_mod
usb_common              1656  1 usbcore
raid10                 40470  1
raid1                  27812  1
md_mod                106028  4 raid1,raid10
dm_mod                 82501  6
ext4                  474251  2
crc16                   1367  1 ext4
mbcache                 6082  1 ext4
jbd2                   83504  1 ext4
```

Created attachment 121031 [details]
another kernel panic screenshot with a different stacktrace part visible
Thanks for the report. The bug was introduced in Linux 3.1 by commit e875ecea266a543e643b19e44cf472f1412708f9. It should be fixed by moving 4 lines of code as shown below: those lines released r10_bio back to the pool (and switched to rb2) before the bad-block-recording code below them had finished using it.

The fact that you hit this bug means that you have a bad block on /dev/sdd2. When md tries to read that block while rebuilding /dev/sdc2, it fails, and that leads to the code with the bug. So with the bug fixed, the recovery will simply fail when it gets to that point in the process. However, if that were the case I would expect to see evidence of the read error in the kernel logs, yet we don't. So I'm a little confused. Are you able to build a kernel with the following change applied and try that?

```diff
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index c504e8389e69..bb4334205558 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -3218,10 +3218,6 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr,
 		if (j == conf->copies) {
 			/* Cannot recover, so abort the recovery or
 			 * record a bad block */
-			put_buf(r10_bio);
-			if (rb2)
-				atomic_dec(&rb2->remaining);
-			r10_bio = rb2;
 			if (any_working) {
 				/* problem is that there are bad blocks
 				 * on other device(s)
@@ -3253,6 +3249,10 @@
 				mirror->recovery_disabled = mddev->recovery_disabled;
 			}
+			put_buf(r10_bio);
+			if (rb2)
+				atomic_dec(&rb2->remaining);
+			r10_bio = rb2;
 			break;
 		}
 	}
```

Hi Neil, thanks for digging into my issue. I will apply the patch and let you know. After the error there are a few other entries in dmesg, but all of them are irrelevant - some polkit, sshd and kvm (qemu) logs. So yes, there are no read errors in the log.

Hi Neil, not only did this patch resolve the issue - it let the sync go further. The mirror resync is past 35.4% at this moment and still proceeding. I don't think the disk I sync from is broken. I did `dd if=/dev/disk of=/dev/null` for each disk, and there weren't any read errors. If I am not wrong, an mdadm resync reads the disk much like `dd` does (let me know if I'm wrong), so the conclusion should be that there aren't any bad blocks. If I'm right, there must have been some NULL-pointer dereference in the code you patched. Or maybe the patch you provided ignores the bad block (if it exists) instead of failing?

Hi Neil, the resync reached 100%. No errors in dmesg. There are 25 bad sectors marked by mdadm in /sys/block/md126/md/dev-sdc2/bad_blocks and /sys/block/md126/md/dev-sdd2/bad_blocks. However, I don't know whether they were already there before I first faced the problem described in this issue, because I wasn't aware of the bad_blocks file in /sys. Anyway, mdadm seems to work OK. However, when I try to read one specific file on the disk, I experience a kernel panic. This isn't any OS-critical file, just some file with data (a qemu sparse image). Reading a file that lies on some bad sectors shouldn't lead to that, I guess. ;) ext4, dm_mod, md_mod and raid10 are in the stacktrace. Are you able to identify any problems in the code so the kernel panic could be avoided? Thanks for your help.

```
Call Trace:
 [<ffffffff814ecb53>] dump_stack+0x54/0x8d
 [<ffffffff81061b4d>] warn_slowpath_common+0x7d/0xa0
 [<ffffffff81061bbc>] warn_slowpath_fmt+0x4c/0x50
 [<ffffffff811d9574>] bio_advance+0xa4/0xb0
 [<ffffffffa00e3480>] md_trim_bio+0x30/0xd0 [md_mod]
 [<ffffffffa0117c24>] make_request+0x354/0xe90 [raid10]
 [<ffffffffa00e898c>] md_make_request+0xec/0x290 [md_mod]
 [<ffffffffa00c51ab>] ? dm_request+0x14b/0x200 [dm_mod]
 [<ffffffff8126bfe2>] generic_make_request+0xc2/0x110
 [<ffffffff8126c0a1>] submit_bio+0x71/0x150
 [<ffffffff8119854e>] ? __mem_cgroup_commit_charge+0xce/0x3f0
 [<ffffffff811e19df>] do_mpage_readpage+0x48f/0x6c0
 [<ffffffff81152cc3>] ? __inc_zone_page_state+0x33/0x40
 [<ffffffff81134ee6>] ? add_to_page_cache_locked+0xe6/0x1d0
 [<ffffffff811e1cfc>] mpage_readpages+0xec/0x140
 [<ffffffffa00344a0>] ? _ext4_get_block+0x1b0/0x1b0 [ext4]
 [<ffffffffa00344a0>] ? _ext4_get_block+0x1b0/0x1b0 [ext4]
 [<ffffffff8117d889>] ? alloc_pages_current+0xa9/0x160
 [<ffffffffa003294a>] ext4_readpages+0x3a/0x50 [ext4]
 [<ffffffff81140e76>] __do_page_cache_readahead+0x1b6/0x260
 [<ffffffff8114135a>] ondemand_readahead+0x14a/0x2c0
 [<ffffffff811415a0>] page_cache_async_readahead+0x80/0xa0
 [<ffffffff81135ef8>] generic_file_aio_read+0x468/0x710
 [<ffffffff811a2eba>] do_sync_read+0x5a/0x90
 [<ffffffff811a347a>] vfs_read+0x9a/0x170
 [<ffffffff811a3fc9>] SyS_read+0x49/0xa0
 [<ffffffff814fb6ed>] system_call_fastpath+0x1a/0x1f
```

```
Call Trace:
 [<ffffffffa04acb88>] sd_prep_fn+0x318/0xb80 [sd_mod]
 [<ffffffff8126d01b>] blk_peek_request+0xfb/0x2a0
 [<ffffffffa012ca90>] scsi_request_fn+0x50/0x4a0 [scsi_mod]
 [<ffffffff8106f08b>] ? lock_timer_base.isra.35+0x2b/0x50
 [<ffffffff812696f3>] __blk_run_queue+0x33/0x40
 [<ffffffff8128c95d>] cfq_insert_request+0x34d/0x560
 [<ffffffff8129002f>] ? idr_remove.part.6+0xef/0x250
 [<ffffffff81266acc>] __elv_add_request+0x1cc/0x310
 [<ffffffff8126d4c7>] blk_flush_plug_list+0x1b7/0x210
 [<ffffffff8126d914>] blk_finish_plug+0x14/0x40
 [<ffffffffa01198e8>] raid10d+0x9f8/0x1600 [raid10]
 [<ffffffff814f2199>] ? schedule+0x29/0x70
 [<ffffffff814f36d6>] ? _raw_spin_lock_irqsave+0x6/0x50
 [<ffffffffa00e75fd>] md_thread+0x12d/0x170 [md_mod]
 [<ffffffff81084d40>] ? wake_up_atomic_t+0x30/0x30
 [<ffffffffa00e74d0>] ? md_register_thread+0xe0/0xe0 [md_mod]
 [<ffffffff81083f20>] kthread+0xc0/0xd0
 [<ffffffff81083e60>] ? kthread_create_on_node+0x120/0x120
 [<ffffffff814fb63c>] ret_from_fork+0x7c/0xb0
 [<ffffffff81083e60>] ? kthread_create_on_node+0x120/0x120
```

Created attachment 121141 [details]
kernel panic from dmesg saved by systemd-journald
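Aside: offsets like `make_request+0x354 [raid10]` in the traces above can usually be mapped back to a source line. A minimal sketch, assuming the kernel and its modules were built locally with CONFIG_DEBUG_INFO enabled and the build tree is still available:

```sh
# gdb resolves a symbol+offset inside the module object to file:line
# when the module was built with debug info.
gdb --batch -ex 'list *(make_request+0x354)' drivers/md/raid10.ko
```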
Yeah ... that certainly shouldn't happen.

It is quite possible that the bad blocks were found during the recovery. I didn't think you would have a bad-block log configured - you need a very recent version of mdadm for that. That is why I thought the bad block would abort the recovery. As you do have the log available, the recovery completed.

After a recovery you would expect the bad-block lists for both devices to be identical. Could you confirm that /sys/block/md126/md/dev-sdc2/bad_blocks and /sys/block/md126/md/dev-sdd2/bad_blocks contain exactly the same numbers?

In the case where these two lists are identical, the code which caused the crash shouldn't be reached at all. But then it shouldn't crash if they are different either. So if you could confirm whether they are the same or different before a crash, that would help me know where to focus my search. Thanks.

They are the same.

Hi Neil, the bad_blocks files were identical. I managed to somehow clear the list for sdc2; then I tried to read the broken file, but the kernel still panics. After a resync the bad_blocks lists are identical again. I'll attach a screenshot from the latest kernel panic. This will be the last one, because we must bring this machine back to production.

Created attachment 121751 [details]
kernel panic screenshot
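For the record, the comparison Neil asked for is a one-liner against the sysfs paths quoted above; a minimal sketch (array and device names taken from this report):

```sh
# Each bad_blocks line is "<first-bad-sector> <length>"; after a clean
# recovery the two member lists should match exactly.
diff /sys/block/md126/md/dev-sdc2/bad_blocks \
     /sys/block/md126/md/dev-sdd2/bad_blocks && echo identical
```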
OK, I discovered why I couldn't find the problem before. I was looking in the wrong place - at raid1.c instead of raid10.c. Doh.

The following patch should fix it. Just two very small changes: remove an 's' and add an '_irq'. The first fixes a typo - sectors_handled was computed from r10_bio->sectors (a length) instead of r10_bio->sector (an offset). The second pairs the spin_lock_irq() with a matching spin_unlock_irq(); the plain spin_unlock() released the lock but left interrupts disabled.

It looks like you are the first person to enable the bad-block log on a raid10 array and get bad blocks logged. Thanks for the pioneering spirit!!

I realise that you want to actually use your machine and not keep testing, but if you are able to confirm at some stage that this makes a difference, I would really appreciate it.

```diff
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 73dc8a377522..cc9b777de9d8 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1319,7 +1319,7 @@ read_again:
 		/* Could not read all from this device, so we will
 		 * need another r10_bio.
 		 */
-		sectors_handled = (r10_bio->sectors + max_sectors
+		sectors_handled = (r10_bio->sector + max_sectors
 				   - bio->bi_sector);
 		r10_bio->sectors = max_sectors;
 		spin_lock_irq(&conf->device_lock);
@@ -1327,7 +1327,7 @@ read_again:
 			bio->bi_phys_segments = 2;
 		else
 			bio->bi_phys_segments++;
-		spin_unlock(&conf->device_lock);
+		spin_unlock_irq(&conf->device_lock);
 		/* Cannot call generic_make_request directly
 		 * as that will be queued in __generic_make_request
 		 * and subsequent mempool_alloc might block
```

Neil, thanks for having a look at it. Hope this patch prevents other people from having the same problem. ;-) We may give this patch a go, but Mateusz has already erased all the data and set up the mdadm array from scratch. In that case we won't know whether the bug is fixed or not, unfortunately.

Fair enough. I have the patch queued to go to Linus tomorrow, so we'll assume it is fixed. Thanks again for reporting and testing.

@Neil, just out of curiosity - can such things be mocked and unit tested somehow? I am totally unfamiliar with C and the available toolkit for this language. I once went to a talk about C unit tests and mocking, but I've never got closer than that.

mdadm has a small test suite in the source package, but it doesn't cover situations where bad blocks develop. Maybe it should.
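A minimal sketch of running that suite, assuming a checkout of the mdadm source tree (the tests assemble loop-device-backed arrays and must run as root; exact invocation may differ between mdadm versions):

```sh
# The top-level "test" script drives the cases under tests/.
git clone https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git
cd mdadm
make
sudo ./test
```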
Created attachment 121001 [details]
dmesg with mdadm resync segfault

Hey, I am using the latest kernel, 3.12.6, built by Arch Linux. Recently one of the disks broke, and I replaced it with a new one. However, the mirror doesn't want to resync because of a problem with mdadm/raid10/md_mod. At some point, which is exactly the same every time, mdadm always segfaults with the same stacktrace, with md_mod and raid10 listed. After that, /proc/mdstat is no longer updated. The machine is usable for some time after this happens - sometimes 5 minutes, sometimes an hour. Eventually it kernel panics with ext4, dm_mod, md_mod and raid10 listed in the stacktrace. The most relevant parts of the logs and command outputs are inlined below. Full logs are attached.

mdadm segfault stacktrace:

```
[ 6176.001354] Call Trace:
[ 6176.018596]  [<ffffffffa00eb454>] md_do_sync+0x704/0xdf0 [md_mod]
[ 6176.036266]  [<ffffffff81085e00>] ? wake_up_atomic_t+0x30/0x30
[ 6176.054007]  [<ffffffffa00e761d>] md_thread+0x12d/0x170 [md_mod]
[ 6176.071752]  [<ffffffffa00e74f0>] ? md_register_thread+0xe0/0xe0 [md_mod]
[ 6176.089436]  [<ffffffff81084fe0>] kthread+0xc0/0xd0
[ 6176.106732]  [<ffffffff81084f20>] ? kthread_create_on_node+0x120/0x120
[ 6176.123869]  [<ffffffff814fcffc>] ret_from_fork+0x7c/0xb0
[ 6176.141006]  [<ffffffff81084f20>] ? kthread_create_on_node+0x120/0x120
(...)
[ 6176.194823] RIP  [<ffffffffa01146af>] sync_request+0x5ff/0x1990 [raid10]
[ 6176.329888] ---[ end trace 630b14e1fb4a7af4 ]---
(...)
[ 6176.619024] Call Trace:
[ 6176.637172]  [<ffffffff814ee4fb>] dump_stack+0x54/0x8d
(...)
[ 6176.877469]  [<ffffffff814f5dc8>] page_fault+0x28/0x30
[ 6176.893671]  [<ffffffffa0113bb8>] ? r10buf_pool_free+0xa8/0xc0 [raid10]
[ 6176.910082]  [<ffffffffa01146af>] ? sync_request+0x5ff/0x1990 [raid10]
[ 6176.926580]  [<ffffffffa0114680>] ? sync_request+0x5d0/0x1990 [raid10]
[ 6176.942758]  [<ffffffffa00eb454>] md_do_sync+0x704/0xdf0 [md_mod]
[ 6176.958823]  [<ffffffff81085e00>] ? wake_up_atomic_t+0x30/0x30
[ 6176.974890]  [<ffffffffa00e761d>] md_thread+0x12d/0x170 [md_mod]
[ 6176.990953]  [<ffffffffa00e74f0>] ? md_register_thread+0xe0/0xe0 [md_mod]
[ 6177.007038]  [<ffffffff81084fe0>] kthread+0xc0/0xd0
[ 6177.023021]  [<ffffffff81084f20>] ? kthread_create_on_node+0x120/0x120
[ 6177.039109]  [<ffffffff814fcffc>] ret_from_fork+0x7c/0xb0
[ 6177.055104]  [<ffffffff81084f20>] ? kthread_create_on_node+0x120/0x120
[ 6177.071253] ---[ end trace 630b14e1fb4a7af5 ]---
(...)
```

```
% cat /proc/mdstat
Personalities : [raid1] [raid10]
md125 : inactive sdd1[1](S)
      10477568 blocks super 1.2

md126 : active raid10 sdc2[5] sda2[0] sdd2[4] sdb2[1]
      1932289024 blocks super 1.2 512K chunks 2 near-copies [4/3] [UU_U]
      [=======>.............]  recovery = 35.4% (342860848/966144512) finish=5635.3min speed=1843K/sec

md127 : active raid1 sda1[0] sdb1[1]
      10477440 blocks super 1.2 [2/2] [UU]

unused devices: <none>
```

```
# mdadm -D /dev/md125
/dev/md125:
        Version : 1.2
  Creation Time : Mon Dec 10 14:17:59 2012
     Raid Level : raid10
     Array Size : 1932289024 (1842.77 GiB 1978.66 GB)
  Used Dev Size : 966144512 (921.39 GiB 989.33 GB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Sun Jan 5 17:42:43 2014
          State : active, degraded, recovering
 Active Devices : 3
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 1

         Layout : near=2
     Chunk Size : 512K

 Rebuild Status : 35% complete

           Name : archiso:STORE
           UUID : 4eff1a86:bf500790:976dbff2:46d8aa69
         Events : 2292466

    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync set-A   /dev/sda2
       1       8       18        1      active sync set-B   /dev/sdb2
       5       8       34        2      spare rebuilding    /dev/sdc2
       4       8       50        3      active sync set-B   /dev/sdd2
```

```
# for dev in sda sdb sdc sdd; do echo "=> $dev"; smartctl -a /dev/$dev | grep -i 'device model' -B 1; smartctl -a /dev/$dev | grep -i hours; smartctl -a /dev/$dev | grep -i _sector; done
=> sda
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST1000DM003-9YN162
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       9694
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       9587h+04m+37.325s
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
=> sdb
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST1000DM003-9YN162
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       9692
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       9584h+40m+44.948s
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
Error 4 occurred at disk power-on lifetime: 9677 hours (403 days + 5 hours)
Error 3 occurred at disk power-on lifetime: 9677 hours (403 days + 5 hours)
Error 2 occurred at disk power-on lifetime: 9677 hours (403 days + 5 hours)
Error 1 occurred at disk power-on lifetime: 9677 hours (403 days + 5 hours)
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   001   000    Old_age   Always       -       0
=> sdc
=== START OF INFORMATION SECTION ===
Device Model:     WDC WD1003FZEX-00MK2A0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       30
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
=> sdd
=== START OF INFORMATION SECTION ===
Device Model:     HITACHI HUA7210SASUN1.0T 0817GJABRF
  9 Power_On_Hours          0x0012   098   098   000    Old_age   Always       -       20244
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
```

The hardware is:
- 400 days old HP ProLiant DL165 G7
- 2x ST1000DM003-9YN162 - 400 days old
- 1x WDC WD1003FZEX-00MK2A0 - 1 day old
- 1x HITACHI HUA7210SASUN1.0T - 200 days old in this server, 600 days old just idling
- mdadm dies when resyncing from the Hitachi to the WDC

There haven't been any problems with mdadm/raid10/md_mod for a year. Of course it's possible that the disk the data is synced from is broken and doesn't announce it. However, I don't think md_mod should segfault because of that. The problem first appeared on 3.11.2. I updated to the latest version to see if it still persists on 3.12.6. It does. (Mateusz Kwiatkowski or I will answer all your questions.)
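The thread doesn't record the exact commands used to swap the disk, but for context the usual replacement sequence with mdadm's manage mode looks roughly like this (a sketch; the array and device names are reused from this report, not taken from the reporter's shell history):

```sh
# Hypothetical reconstruction of the replacement step.
mdadm --manage /dev/md126 --fail /dev/sdc2 --remove /dev/sdc2  # retire the dead member
# ...physically swap the drive and recreate the partition layout...
mdadm --manage /dev/md126 --add /dev/sdc2                      # start the rebuild
cat /proc/mdstat                                               # watch recovery progress
```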