Bug 85011 - Corrupted metadata. 'btrfs check' fails repair due to an assertion failure: '!(root->ref_cows && trans->transid != root->last_trans)'
Status: RESOLVED CODE_FIX
Alias: None
Product: File System
Classification: Unclassified
Component: btrfs
Hardware: All Linux
Importance: P1 high
Assignee: Josef Bacik
Reported: 2014-09-22 19:36 UTC by bugzilla.kernel.org
Modified: 2014-10-12 17:47 UTC (History)

See Also:
Kernel Version: 3.16.3
Tree: Mainline
Regression: No


Attachments
output of 'btrfs-debug-tree -b 13415158087680 /dev/sdh' (13.33 KB, text/plain)
2014-09-22 19:36 UTC, bugzilla.kernel.org

Description bugzilla.kernel.org 2014-09-22 19:36:40 UTC
Created attachment 151401
output of 'btrfs-debug-tree -b 13415158087680 /dev/sdh'

My log files are spammed with these messages until the drive is full:
Sep 22 21:27:38 nasbak kernel: [ 1911.486180] BTRFS critical (device sdg): corrupt leaf, bad key order: block=13415158087680,root=1, slot=7
Sep 22 21:27:38 nasbak kernel: [ 1911.487179] BTRFS critical (device sdg): corrupt leaf, bad key order: block=13415158087680,root=1, slot=7
Sep 22 21:27:38 nasbak kernel: [ 1911.488334] BTRFS critical (device sdg): corrupt leaf, bad key order: block=13415158087680,root=1, slot=7
Sep 22 21:27:38 nasbak kernel: [ 1911.489497] BTRFS critical (device sdg): corrupt leaf, bad key order: block=13415158087680,root=1, slot=7
Sep 22 21:27:38 nasbak kernel: [ 1911.490512] BTRFS critical (device sdg): corrupt leaf, bad key order: block=13415158087680,root=1, slot=7
Sep 22 21:27:38 nasbak kernel: [ 1911.491551] BTRFS critical (device sdg): corrupt leaf, bad key order: block=13415158087680,root=1, slot=7
Sep 22 21:27:38 nasbak kernel: [ 1911.492557] BTRFS critical (device sdg): corrupt leaf, bad key order: block=13415158087680,root=1, slot=7

The memory has been tested for 72 hours with no problems found. On the IRC channel I learned that I have corrupted metadata, which 'btrfs check' should be able to fix. Unfortunately, 'btrfs check' stops due to an assertion failure:

# btrfs check --repair /dev/sdh
enabling repair mode
Checking filesystem on /dev/sdh
UUID: 20ccaf09-54ea-486e-9495-9dc91b933e9c
checking extents
bad key ordering 7 8
btrfs: ctree.c:267: __btrfs_cow_block: Assertion '!(root->ref_cows && trans->transid != root->last_trans)' failed.
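
For readers unfamiliar with the "bad key order" / "bad key ordering 7 8" messages: items in a btrfs leaf are expected to be sorted by key, and a key is compared as the triple (objectid, type, offset). The following is a minimal sketch of that ordering check, not the actual kernel or btrfs-progs code; the `demo_` names and the simplified key struct are made up for illustration.

```c
#include <stdint.h>

/* Simplified stand-in for a btrfs key (hypothetical, for illustration). */
struct demo_key {
	uint64_t objectid;
	uint8_t  type;
	uint64_t offset;
};

/* Compare two keys field by field: objectid, then type, then offset. */
static int demo_comp_keys(const struct demo_key *a, const struct demo_key *b)
{
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/*
 * Return the first slot whose key is not strictly smaller than the key in
 * the following slot, or -1 if every adjacent pair is correctly ordered.
 */
static int demo_first_bad_slot(const struct demo_key *keys, int nritems)
{
	for (int i = 0; i + 1 < nritems; i++) {
		if (demo_comp_keys(&keys[i], &keys[i + 1]) >= 0)
			return i;
	}
	return -1;
}
```

Under this reading, "7 8" would mean that the keys in slots 7 and 8 of the leaf are not in strictly ascending order. The check alone does not say which of the two slots holds the damaged key.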

I'm also unable to provide an image due to another assertion failure:

# btrfs-image -c9 -t4 /dev/sdh btrfs-image
btrfs-image: disk-io.c:155: readahead_tree_block: Assertion `!(ret)' failed.

I'm using the latest tools and latest stable kernel:

# uname -a
Linux computername 3.16.2-031602-generic #201409052035 SMP Sat Sep 6 00:36:44 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
# btrfs --version
Btrfs v3.16

Other possibly relevant information:

# cat /etc/fstab
UUID=f6162bfd-f793-4541-828c-be31550e0271 /               ext4    discard,noatime,errors=remount-ro 0       1
UUID=20ccaf09-54ea-486e-9495-9dc91b933e9c /home           btrfs   defaults,subvol=home                                       0       0
# sudo mount /home/
# dmesg
[  414.659393] BTRFS info (device sdh): disk space caching is enabled
[  700.423512] init: smbd main process (1006) killed by TERM signal
[  738.454018] BTRFS info (device sdh): disk space caching is enabled
[  739.318381] BTRFS: bdev /dev/sde errs: wr 0, rd 0, flush 0, corrupt 6, gen 0
[  739.318389] BTRFS: bdev /dev/sdd errs: wr 0, rd 0, flush 0, corrupt 6, gen 0
[  739.318393] BTRFS: bdev /dev/sdb errs: wr 0, rd 0, flush 0, corrupt 16, gen 0
[  739.318397] BTRFS: bdev /dev/sda errs: wr 0, rd 0, flush 0, corrupt 24, gen 0
# btrfs fi show
Label: 'btrfs_storage_2'  uuid: 20ccaf09-54ea-486e-9495-9dc91b933e9c
	Total devices 6 FS bytes used 5.08TiB
	devid    1 size 2.73TiB used 2.73TiB path /dev/sdb
	devid    2 size 2.73TiB used 2.73TiB path /dev/sda
	devid    3 size 1.82TiB used 1.82TiB path /dev/sdd
	devid    4 size 1.82TiB used 1.82TiB path /dev/sde
	devid    6 size 2.73TiB used 2.73TiB path /dev/sdg
	devid    7 size 2.73TiB used 2.73TiB path /dev/sdh
# btrfs fi df /home
Data, RAID10: total=7.27TiB, used=5.07TiB
System, RAID10: total=64.00MiB, used=1.14MiB
Metadata, RAID10: total=8.12GiB, used=6.90GiB
unknown, single: total=512.00MiB, used=0.00
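
The "errs:" lines in the dmesg output above report per-device counters for write, read and flush I/O errors, corruption errors and generation errors. The counters are cumulative and persist across mounts. A minimal sketch for extracting them from such a line, assuming exactly the message layout shown above (the `demo_` names are hypothetical):

```c
#include <stdio.h>
#include <string.h>

/* Counters from one "BTRFS: bdev ... errs: ..." line (hypothetical struct). */
struct demo_dev_errs {
	char dev[64];
	unsigned long long wr, rd, flush, corrupt, gen;
};

/*
 * Parse a kernel log line of the form
 *   "... BTRFS: bdev /dev/sde errs: wr 0, rd 0, flush 0, corrupt 6, gen 0"
 * Returns 0 on success, -1 if the line does not match that layout.
 */
static int demo_parse_errs(const char *line, struct demo_dev_errs *out)
{
	/* Skip the timestamp and any other prefix before the message itself. */
	const char *p = strstr(line, "BTRFS: bdev ");
	if (!p)
		return -1;

	int n = sscanf(p,
		       "BTRFS: bdev %63s errs: wr %llu, rd %llu, flush %llu, "
		       "corrupt %llu, gen %llu",
		       out->dev, &out->wr, &out->rd, &out->flush,
		       &out->corrupt, &out->gen);
	return n == 6 ? 0 : -1;
}
```

Summing the `corrupt` field over the four lines above gives 6 + 6 + 16 + 24 = 52 recorded corruption errors, spread over /dev/sde, /dev/sdd, /dev/sdb and /dev/sda.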

Attached you'll find the output of the command: '# btrfs-debug-tree -b 13415158087680 /dev/sdh'

I'm now stuck with a filesystem that is buggy (it locks up) and slow, and whose log files fill up continuously. There is no way out, because btrfs lets me neither delete the files in the affected block nor fix the metadata corruption. I'd very much appreciate a patch for 'btrfs check', or perhaps a workaround that allows me to get rid of the block. I'm very willing to help debug the issue further.

Thanks for your time.
Comment 1 bugzilla.kernel.org 2014-09-25 12:04:56 UTC
I upgraded to 3.16.3 and the latest tools from git. This made no difference.
Comment 2 bugzilla.kernel.org 2014-09-26 07:39:40 UTC
Just an update. Last night Josef worked on getting 'btrfs-image' working, so I'm now able to provide the image. This will hopefully allow debugging of the original issue: the corrupted leaf and the failure of 'btrfs check' to fix it.

Anyway, this is Josef's patch, which fixed btrfs-image in my scenario.

diff --git a/disk-io.c b/disk-io.c
index 26a532e..34c0a97 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -201,7 +201,8 @@ int read_whole_eb(struct btrfs_fs_info *info, struct extent_buffer *eb, int mirr
 		read_len = bytes_left;
 		device = NULL;

-		if (!info->on_restoring) {
+		if (!info->on_restoring &&
+		    eb->start != BTRFS_SUPER_INFO_OFFSET) {
 			ret = btrfs_map_block(&info->mapping_tree, READ,
 					      eb->start + offset, &read_len, &multi,
 					      mirror, NULL);
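
As a reading aid, the condition changed by the hunk above can be sketched as a standalone predicate. This is a minimal sketch, not btrfs-progs code: the `demo_` names are hypothetical, and the constant mirrors BTRFS_SUPER_INFO_OFFSET, which is 64 KiB. The rationale is an assumption rather than something stated in this thread: the superblock sits at a fixed offset on the device, so its address presumably should not be translated through the chunk mapping.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors BTRFS_SUPER_INFO_OFFSET (64 KiB); the name here is hypothetical. */
#define DEMO_SUPER_INFO_OFFSET (64 * 1024)

/*
 * Decide whether a read of the buffer starting at eb_start should go
 * through the logical-to-physical block mapping, following the patched
 * condition: not while restoring, and not for the superblock offset.
 */
static bool demo_needs_block_mapping(bool on_restoring, uint64_t eb_start)
{
	return !on_restoring && eb_start != DEMO_SUPER_INFO_OFFSET;
}
```

With this predicate, a read of the corrupted leaf at 13415158087680 is still mapped as before, while a read at offset 65536 now skips the mapping step.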
Comment 3 bugzilla.kernel.org 2014-09-26 07:51:37 UTC
Lately I'm also receiving this message in the kernel log when I mount my corrupted btrfs fs:

Thu Sep 25 22:28:40 2014: Sep 25 22:28:40 nasbak kernel: [17266.940871] BTRFS: bdev /dev/sde errs: wr 0, rd 0, flush 0, corrupt 6, gen 0
Comment 4 Josef Bacik 2014-09-29 20:56:27 UTC
I talked to you on IRC and such, update the bz at your convenience.
Comment 5 bugzilla.kernel.org 2014-10-12 17:47:47 UTC
After a lot of debugging we've finally been able to fix the corruption.

Relevant patch:
http://www.spinics.net/lists/linux-btrfs/msg38264.html
