Bug 102691 - BTRFS corruption, "failed to read system array"
Summary: BTRFS corruption, "failed to read system array"
Status: NEW
Alias: None
Product: File System
Classification: Unclassified
Component: btrfs
Hardware: All Linux
Importance: P1 normal
Assignee: Josef Bacik
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-08-11 18:03 UTC by Timothy Miller
Modified: 2016-08-02 11:57 UTC

See Also:
Kernel Version: 4.1.0 and 4.1.4
Tree: Mainline
Regression: No


Attachments
dmesg output (25.22 KB, application/octet-stream)
2015-08-11 18:03 UTC, Timothy Miller
New dmesg output after failed "delete missing" (31.69 KB, application/octet-stream)
2015-08-12 17:55 UTC, Timothy Miller

Description Timothy Miller 2015-08-11 18:03:12 UTC
Created attachment 184691
dmesg output

Below is what I posted to the btrfs mailing list.  I was asked to create a bug report and attach my entire dmesg output, so I'm doing that here.


I have a four-drive RAID1 array, and since yesterday, some problem has
rendered it unmountable (read/write anyhow).  One drive reports a read
error, so maybe the drive is failing, but I've had that happen before,
and it was easy to swap in a new drive.  This time, two more drives
are reporting that they "failed to read the system array."  I managed
to mount it read-only (by specifying the node of the fourth drive) and
rsync everything to a backup drive.  Now I'd like to try to repair it.
This is where I'm running into problems.  Since I can't mount it
read-write, I can't do a scrub, so I tried "btrfs check --repair", and
this is what I got:

# btrfs check --repair /dev/sde
enabling repair mode
Checking filesystem on /dev/sde
UUID: ecdff84d-b4a2-4286-a1c1-cd7e5396901c
checking extents
ref mismatch on [1667931533312 524288] extent item 1, found 2
attempting to repair backref discrepency for bytenr 1667931533312
Ref doesn't match the record start and is compressed, please take a
btrfs-image of this file system and send it to a btrfs developer so
they can complete this functionality for bytenr 1667931639808
failed to repair damaged filesystem, aborting

Since this specifically told me to contact a developer, I figured this
is something you guys want to know about.  :)

Also, I was wondering if perhaps someone can help me figure out how to
repair it.

There are only two files that appear to be unrecoverable when I rsync,
and I can restore those from an earlier backup.  Since I can't mount
read/write, I can't go and delete those files, so I seem to be stuck.
Comment 1 Timothy Miller 2015-08-12 17:54:28 UTC
One of the devices was throwing bad sectors, so I mounted the array degraded, added another device, and then did a "delete missing."  It ran for a VERY long time, and then aborted.  I wasn't able to catch the stdout, but I got a ton of messages in dmesg.  I'm attaching everything I have.
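
The sequence described above corresponds roughly to the commands below. This is only a sketch: the device names and the mount point are placeholders rather than values from this report, and the exact options that were used are not recorded here. By default the script only prints the commands; setting RUN=1 (as root) would actually execute them.

```shell
#!/bin/sh
# Sketch only: /dev/sdX, /dev/sdY and /mnt/array are assumed placeholders,
# not the devices from this report.
EXISTING_DEV="${EXISTING_DEV:-/dev/sdX}"   # any member of the array that still reads
NEW_DEV="${NEW_DEV:-/dev/sdY}"             # the drive being added
MNT="${MNT:-/mnt/array}"                   # mount point

# Print the command unless RUN=1, in which case execute it.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

replace_missing() {
    # 'degraded' allows mounting while a member device is absent.
    run mount -o degraded "$EXISTING_DEV" "$MNT" &&
    # Add the new drive to the mounted filesystem.
    run btrfs device add "$NEW_DEV" "$MNT" &&
    # 'missing' refers to a device that was not present at mount time.
    run btrfs device delete missing "$MNT"
}

replace_missing
```

The "delete missing" step is the one that ran for a long time and then aborted; the dmesg output attached below is from that step.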
Comment 2 Timothy Miller 2015-08-12 17:55:02 UTC
Created attachment 184731
New dmesg output after failed "delete missing"
