Bug 198807 - BTRFS SSD mode unable to mount or recover after roottree node corruption
Summary: BTRFS SSD mode unable to mount or recover after roottree node corruption
Status: NEW
Alias: None
Product: File System
Classification: Unclassified
Component: btrfs (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Josef Bacik
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-02-17 02:24 UTC by Shehbaz
Modified: 2018-02-17 02:35 UTC (History)
0 users

See Also:
Kernel Version: 4.15.3
Subsystem:
Regression: No
Bisected commit-id:


Attachments
workload (427.33 KB, text/plain)
2018-02-17 02:33 UTC, Shehbaz
Details
dm-track.c (9.68 KB, text/x-csrc)
2018-02-17 02:34 UTC, Shehbaz
Details
corruptoffset.c (1.66 KB, text/x-csrc)
2018-02-17 02:34 UTC, Shehbaz
Details
Makefile for dm-track (334 bytes, text/plain)
2018-02-17 02:35 UTC, Shehbaz
Details

Description Shehbaz 2018-02-17 02:24:14 UTC
Hi,

This is related to earlier bugs that I have filed, where corrupting a certain specific data structure in SSD mode BTRFS renders the filesystem unusable. Please see related bugs list for your reference.

I tried corrupting root-tree node of btrfs in SSD mode to check if mount operation would work on a corrupted root tree node. Unlike superblock data structure or chunk-tree data structure corruptions, which are recoverable using btrfs recover utility, I could not find any tool to recover root tree node, which is essential to mount the file system.

I tried running btrfsck, but it couldn't correct the corruption. For btrfs scrub, the prerequisite is that the file system be mounted before starting the scrub. but mount operation does not succeed after root tree node corruption so I am unable to run btrfs scrub.

Steps to replicate this issue:

1. create 2 disks, /dev/sdb and /dev/sdc of 1.5 GB each.
2. attach them to QEMU. make both disks SSD by doing

echo 0 > /sys/dev/block/<dev_id>/queue/rotational

where dev_id is the dev id for /dev/sdb or /dev/sdc as obtained using lsblk -l command. for my case, it is 8:16 for sdb and 8:32 for sdc.

3. create btrfs file system on /dev/sdc, mount it, run a workload on it (attached for your reference), unmount it.

sudo mkfs.btrfs -f /dev/sdc
sudo mount /dev/sdc /mnt
./cmd2000
umount -l /mnt

copy /dev/sdc to /dev/sdb using pv command (faster than dd).

sudo sh -c 'pv < /dev/sdc > /dev/sdb'

we are essentially keeping a backup disk (sdc), because I noticed that everytime I mount the filesystem, the roottree node location gets changed due to btrfs cow property. keeping backup in /dev/sdc helps get the block number of roottree deterministic.

4. we now note the sequence of blocks that are read from btrfs file system during mount operation. for this, use a device mapper (dm-track.c, Makefile attached) that will print both the block number and the node type that gets accessed.

create folder dm-track at your linux source code folder linux/fs/dm-track
mv dm-track.c Makefile linux/fs/dm-track
cd linux/fs/dm-track
make ( creates and links dm-track.ko to linux kernel )
sudo dmsetup create dmtarget --table "0 3145728 track /dev/sdb 0" --readahead none
#at this point some garbage gets printed in /var/log/syslog, please ignore
#however, next instruction (mount) is important, as this is where we get block numbers and block types:

sudo mount -t btrfs -o nospace_cache,nobarrier,flushoncommit,noinode_cache /dev/mapper/dmtarget /mnt

Note I am removing caches because it leads to too many extrenaous reads and makes readability of logs difficult.

5. If everything works so far, this should print a sequence of log statements, similar to the following output:

Feb 16 20:44:47 vm-guest kernel: [17417.344398] track_end_io():block = 17
Feb 16 20:44:47 vm-guest kernel: [17417.344401] XXX generation = 28229632, owner = 1064960:default blkNo=17 nritems= 0 level 0
Feb 16 20:44:47 vm-guest kernel: [17417.345023] track_end_io():block = 17
Feb 16 20:44:47 vm-guest kernel: [17417.345025] XXX generation = 28229632, owner = 1064960:default blkNo=17 nritems= 0 level 0
Feb 16 20:44:47 vm-guest kernel: [17417.346457] track_end_io():block = 17
Feb 16 20:44:47 vm-guest kernel: [17417.346460] XXX generation = 28229632, owner = 1064960:default blkNo=17 nritems= 0 level 0
Feb 16 20:44:47 vm-guest kernel: [17417.346546] BTRFS info (device dm-0): disabling disk space caching
Feb 16 20:44:47 vm-guest kernel: [17417.346549] BTRFS info (device dm-0): turning off barriers
Feb 16 20:44:47 vm-guest kernel: [17417.346553] BTRFS info (device dm-0): turning on flush-on-commit
Feb 16 20:44:47 vm-guest kernel: [17417.346557] BTRFS info (device dm-0): has skinny extents
Feb 16 20:44:47 vm-guest kernel: [17417.352241] track_end_io():block = 264
Feb 16 20:44:47 vm-guest kernel: [17417.352244] XXX generation = 8, owner = 3:chunkTree blkNo=264 nritems= 7 level 0
Feb 16 20:44:47 vm-guest kernel: [17417.352815] track_end_io():block = 6896
Feb 16 20:44:47 vm-guest kernel: [17417.352818] XXX generation = 10, owner = 1:rootTree blkNo=6896 nritems= 19 level 0
Feb 16 20:44:47 vm-guest kernel: [17417.353399] track_end_io():block = 6900
Feb 16 20:44:47 vm-guest kernel: [17417.353402] XXX generation = 10, owner = 2:extentTree blkNo=6900 nritems= 7 level 1
Feb 16 20:44:47 vm-guest kernel: [17417.353979] track_end_io():block = 5456
Feb 16 20:44:47 vm-guest kernel: [17417.353982] XXX generation = 8, owner = 4:devTree blkNo=5456 nritems= 7 level 0
Feb 16 20:44:47 vm-guest kernel: [17417.354499] track_end_io():block = 6916
Feb 16 20:44:47 vm-guest kernel: [17417.354509] XXX generation = 10, owner = 7:csumTree blkNo=6916 nritems= 14 level 1
Feb 16 20:44:47 vm-guest kernel: [17417.354991] track_end_io():block = 5392
Feb 16 20:44:47 vm-guest kernel: [17417.354992] XXX generation = 7, owner = 9:uuidTree blkNo=5392 nritems= 0 level 0
Feb 16 20:44:47 vm-guest kernel: [17417.355389] track_end_io():block = 6908
Feb 16 20:44:47 vm-guest kernel: [17417.355391] XXX generation = 10, owner = 2:extentTree blkNo=6908 nritems= 208 level 0
Feb 16 20:44:47 vm-guest kernel: [17417.355837] track_end_io():block = 6928
Feb 16 20:44:47 vm-guest kernel: [17417.355838] XXX generation = 10, owner = 2:extentTree blkNo=6928 nritems= 209 level 0
Feb 16 20:44:47 vm-guest kernel: [17417.356116] track_end_io():block = 6924
Feb 16 20:44:47 vm-guest kernel: [17417.356118] XXX generation = 10, owner = 2:extentTree blkNo=6924 nritems= 124 level 0
Feb 16 20:44:47 vm-guest kernel: [17417.356407] track_end_io():block = 6912
Feb 16 20:44:47 vm-guest kernel: [17417.356409] XXX generation = 10, owner = 2:extentTree blkNo=6912 nritems= 125 level 0
Feb 16 20:44:47 vm-guest kernel: [17417.356763] track_end_io():block = 6904
Feb 16 20:44:47 vm-guest kernel: [17417.356765] XXX generation = 10, owner = 2:extentTree blkNo=6904 nritems= 259 level 0
Feb 16 20:44:47 vm-guest kernel: [17417.357420] track_end_io():block = 1320
Feb 16 20:44:47 vm-guest kernel: [17417.357423] XXX generation = 4, owner = 18446744073709551607:default blkNo=1320 nritems= 2 level 0
Feb 16 20:44:47 vm-guest kernel: [17417.357807] track_end_io():block = 6240
Feb 16 20:44:47 vm-guest kernel: [17417.357809] XXX generation = 9, owner = 5:fsTree blkNo=6240 nritems= 253 level 1
Feb 16 20:44:47 vm-guest kernel: [17417.358169] track_end_io():block = 6776
Feb 16 20:44:47 vm-guest kernel: [17417.358171] XXX generation = 9, owner = 5:fsTree blkNo=6776 nritems= 127 level 0
Feb 16 20:44:47 vm-guest kernel: [17417.358537] track_end_io():block = 6264
Feb 16 20:44:47 vm-guest kernel: [17417.358539] XXX generation = 9, owner = 5:fsTree blkNo=6264 nritems= 169 level 0


6. note the line printing the root tree log:

owner = 1:rootTree blkNo=6896 nritems= 19 level 0

from this, we can determine what the offset of root tree would be in our backup file system. we compute the offset as follows:
6896 * 4096 = 28246016

7. unmount the filesystem. delete dmtarget. make a copy from backup file system (which would have the same root-tree location as the first run). 
umount -l /mnt
sudo dmsetup remove dmtarget
sudo sh -c 'pv < /dev/sdc > /dev/sdb'

8. corrupt the rootTree block using corruptoffset.c program at the offset where root-tree exists (attached corruptoffset.c)
sudo ./corruptoffset /dev/sdb 28246016 4096

9. try mounting the file system /dev/sdb:

sudo mount -t btrfs -o nospace_cache,nobarrier,flushoncommit,noinode_cache /dev/sdb /mnt
mount: wrong fs type, bad option, bad superblock on /dev/sdb,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.

10. btrfsck output
sudo btrfsck /dev/sdb 
checksum verify failed on 28246016 found D842F9FA wanted 58585858
checksum verify failed on 28246016 found D842F9FA wanted 58585858
bytenr mismatch, want=28246016, have=6365935209750747224
Couldn't setup extent tree
ERROR: cannot open file system

11. btrfs restore <no option to recover roottree node>
$ sudo btrfs restore -l /dev/sdb
checksum verify failed on 28246016 found D842F9FA wanted 58585858
checksum verify failed on 28246016 found D842F9FA wanted 58585858
bytenr mismatch, want=28246016, have=6365935209750747224
Couldn't setup extent tree
 tree key (EXTENT_TREE ROOT_ITEM 0) 28246016 level 1
 tree key (DEV_TREE ROOT_ITEM 0) 22331392 level 0
 tree key (FS_TREE ROOT_ITEM 0) 25542656 level 1
 tree key (CSUM_TREE ROOT_ITEM 0) 28311552 level 1
 tree key (UUID_TREE ROOT_ITEM 0) 22069248 level 0
 tree key (DATA_RELOC_TREE ROOT_ITEM 0) 5390336 level 0


12. btrfs scrub requires file system to be mounted


HDD mode:
========

In HDD mode, the corruption can be corrected from a backup copy of root-tree block, because metadata replication is enabled.

block number for root-tree in HDD mode - 10888

sudo ./corruptoffset /dev/sdb 44597248 4096 #block 10888 * 4096 = 44597248.
sudo mount /dev/sdb /mnt/

Feb 16 21:16:50 vm-guest kernel: [19340.189229] BTRFS info (device sdb): disk space caching is enabled
Feb 16 21:16:50 vm-guest kernel: [19340.189232] BTRFS info (device sdb): has skinny extents
Feb 16 21:16:50 vm-guest kernel: [19340.193918] BTRFS error (device sdb): bad tree block start 6365935209750747224 36208640
Feb 16 21:16:50 vm-guest kernel: [19340.194794] BTRFS info (device sdb): read error corrected: ino 0 off 36208640 (dev /dev/sdb sector 87104)
Feb 16 21:16:50 vm-guest kernel: [19340.195258] BTRFS info (device sdb): read error corrected: ino 0 off 36212736 (dev /dev/sdb sector 87112)
Feb 16 21:16:50 vm-guest kernel: [19340.195717] BTRFS info (device sdb): read error corrected: ino 0 off 36216832 (dev /dev/sdb sector 87120)
Feb 16 21:16:50 vm-guest kernel: [19340.196189] BTRFS info (device sdb): read error corrected: ino 0 off 36220928 (dev /dev/sdb sector 87128)

(36208640 / 512) * 4096 = 10888 block number which was corrected during read.
Comment 1 Shehbaz 2018-02-17 02:33:27 UTC
Created attachment 274209 [details]
workload
Comment 2 Shehbaz 2018-02-17 02:34:26 UTC
Created attachment 274211 [details]
dm-track.c
Comment 3 Shehbaz 2018-02-17 02:34:52 UTC
Created attachment 274213 [details]
corruptoffset.c
Comment 4 Shehbaz 2018-02-17 02:35:51 UTC
Created attachment 274215 [details]
Makefile for dm-track

please create dm-track folder at location 
your_linux_source_code/fs/dm-track

place dm-track.c and Makefile in this folder.

Note You need to log in before you can comment on or make changes to this bug.