Bug 102881 - "add" then "delete missing" failed due to bad replacement drive, now can't mount because "too many missing devices"
Summary: "add" then "delete missing" failed due to bad replacement drive, now can't mo...
Status: NEW
Alias: None
Product: File System
Classification: Unclassified
Component: btrfs
Hardware: All
OS: Linux
Importance: P1 normal
Assignee: Josef Bacik
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-08-14 19:02 UTC by Timothy Miller
Modified: 2016-03-20 10:02 UTC
CC List: 2 users

See Also:
Kernel Version: 4.1.4
Tree: Mainline
Regression: No


Attachments
detail steps (3.38 KB, text/plain)
2015-08-14 19:43 UTC, Chris Murphy

Description Timothy Miller 2015-08-14 19:02:46 UTC
Here's my situation:

- I have a 4-drive RAID1 configuration.
- I had a drive fail, so I removed it and mounted degraded.
- I hooked up a replacement drive, did an "add" on that one, and did a
"delete missing".
- During the rebalance, the replacement drive failed, there were OOPSes, etc.
- Now, although all of my data is there, I can't mount degraded,
because btrfs is complaining that too many devices are missing (3 are
there, but it sees 2 missing).
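
For reference, this is roughly how the state can be checked (the device name below is a placeholder, not one of my actual drives):

# btrfs filesystem show                ### lists each devid per volume and warns about missing devices
# mount -o degraded /dev/sda /mnt      ### the degraded mount that now fails as described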

According to Chris Murphy on linux-btrfs, my situation is definitely an exception to the rule that a RAID1 can't handle more than one missing device, because the failure occurred during a restore operation.  All of my data is there, but btrfs is assuming, without context, that since two devices are "missing" it cannot mount read/write.  What it isn't taking into account is that the second "missing" device isn't really missing, since it was never fully added -- it was in the middle of a restore when it failed.
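
One way to confirm that the half-added replacement is still counted in the filesystem's expected device total is to read num_devices from the superblock; which command is available depends on the btrfs-progs version, and the device name is again a placeholder:

# btrfs-show-super /dev/sda | grep num_devices                     ### older btrfs-progs
# btrfs inspect-internal dump-super /dev/sda | grep num_devices    ### newer btrfs-progs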

This can't be the first time that a replacement drive has turned out to be DOA, so there should be a way to pretend that the restore never got started.  Also, could I get some suggestions on how to get out of this mess?  Nobody on the mailing list has come up with a solution yet.

Thanks.
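
For future reference, the single-step replace path would avoid this window, since it does not add a second devid while the copy is running. A sketch only, not something tried here; the devid and device names are placeholders:

# mount -o degraded /dev/sda /mnt
# btrfs replace start 2 /dev/sdd /mnt    ### 2 = devid of the failed/missing drive, /dev/sdd = replacement
# btrfs replace status /mnt              ### shows progress of the copy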
Comment 1 Chris Murphy 2015-08-14 19:43:56 UTC
Created attachment 184931
detail steps

Easily reproducible with a 2-device RAID 1 as well. These are the summary steps; the detailed steps are in the attachment.

# mkfs.btrfs -draid1 -mraid1 /dev/sd[bc]
# mount /dev/sdb /mnt
# cp -a /usr /mnt
# poweroff
### remove /dev/sdc and replace with new device, then startup
# mount -o degraded /dev/sdb /mnt
# btrfs dev add /dev/sdc /mnt
# btrfs dev delete missing /mnt
# poweroff -f     ### during the rebuild
### remove /dev/sdc (devid 3), and reboot
# mount -o degraded /dev/sdb /mnt
mount: wrong fs type, bad option, bad superblock on /dev/sdb,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.
# dmesg
...
[   83.381006] BTRFS info (device sdb): allowing degraded mounts
[   83.381011] BTRFS info (device sdb): disk space caching is enabled
[   83.381013] BTRFS: has skinny extents
[   83.384056] BTRFS warning (device sdb): devid 3 missing
[   83.387611] BTRFS: too many missing devices, writeable mount is not allowed
[   83.409239] BTRFS: open_ctree failed

The workaround is to mount with -o ro,degraded and use btrfs send/receive to create a new volume. Still, it's a pernicious bug.
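
A rough sketch of that workaround, with placeholder device, mount point, and subvolume names, and assuming the data lives in subvolumes that already carry the read-only flag (or have read-only snapshots) that btrfs send requires:

# mount -o ro,degraded /dev/sdb /mnt
# mkfs.btrfs -draid1 -mraid1 /dev/sd[de]        ### fresh devices for the replacement volume
# mkdir -p /mnt2
# mount /dev/sdd /mnt2
# btrfs send /mnt/snap | btrfs receive /mnt2    ### repeat per read-only snapshot/subvolume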
