Bug 198463 - Corrupting single byte of fs tree node on SSD device makes all directories within the fs tree node inaccessible
Status: NEW
Alias: None
Product: File System
Classification: Unclassified
Component: btrfs
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: Josef Bacik
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-01-12 17:38 UTC by Shehbaz
Modified: 2018-01-13 06:43 UTC

See Also:
Kernel Version: 4.4.90
Subsystem:
Regression: No
Bisected commit-id:


Attachments
workload for 400 randomly generated files and directories (85.01 KB, text/plain)
2018-01-12 17:38 UTC, Shehbaz
corruptoffset program (1.66 KB, text/x-csrc)
2018-01-12 17:39 UTC, Shehbaz
ls cmd, cd cmd and btrfsck output (45.02 KB, text/x-matlab)
2018-01-12 17:47 UTC, Shehbaz
btrfs scrub log (4.37 KB, text/x-matlab)
2018-01-13 06:33 UTC, Shehbaz

Description Shehbaz 2018-01-12 17:38:48 UTC
Created attachment 273567 [details]
workload for 400 randomly generated files and directories

Hi,

I am studying the effects of SSD corruption on the BTRFS file system. My machine configuration looks as follows:

Linux vm-Standard-PC-i440FX-PIIX-1996 4.4.90+ #3 SMP Fri Nov 17 06:55:48 EST 2017 x86_64 x86_64 x86_64 GNU/Linux

I create a workload with 400 files and directories with variable file sizes (4K - 400K) and variable directory names (1-100 bytes in length). Please find the workload attached for your reference.

I then create a 1.5 GB btrfs file system using mkfs.btrfs (part of btrfs-progs v4.14).

Once the filesystem is created, I mount it on /mnt and run the workload on it. After the workload finishes, I unmount the filesystem and dump its metadata with btrfs-debug-tree.

From the btrfs-debug-tree output, I find the on-disk offset (start byte) of the fs-tree node of the filesystem. I have added the relevant snippet of the btrfs-debug-tree output here for your reference.

leaf 22478848 items 143 free space 3615 generation 9 owner 5
leaf 22478848 flags 0x1(WRITTEN) backref revision 1
fs uuid 54953d07-6562-417c-8c68-bd56ae730fe3
chunk uuid 4328e942-cafa-46c3-bc6f-7470dd47edf6
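
For anyone scripting this step, the header line in the snippet above can be pulled apart mechanically. The following is only a minimal sketch: the regular expression covers this one header layout, the function name parse_leaf_header is made up for illustration, and other btrfs-progs versions may print the header differently.

```python
import re

# Header line copied verbatim from the btrfs-debug-tree snippet in this report.
HEADER = "leaf 22478848 items 143 free space 3615 generation 9 owner 5"

# Matches only the layout shown above (an assumption; other versions may differ).
LEAF_RE = re.compile(
    r"leaf (?P<bytenr>\d+) items (?P<items>\d+) free space (?P<free>\d+) "
    r"generation (?P<generation>\d+) owner (?P<owner>\d+)"
)


def parse_leaf_header(line):
    """Return the numeric header fields as a dict, or None if the line does not match."""
    match = LEAF_RE.match(line.strip())
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


leaf = parse_leaf_header(HEADER)
print(leaf["bytenr"], leaf["items"], leaf["owner"])
```

Owner 5 is the object ID of the default fs tree, which is why this leaf was picked as the corruption target.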

I then corrupt a single byte at an arbitrary offset on the device (in my case, using a simple program, corruptoffset.c, attached for your reference).

The corruption program can be run as

./corruptoffset /dev/sdb 22479048 1

where 22479048 is byte offset 200 within the leaf node located at 22478848 (22478848 + 200 = 22479048).
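
To make the offset arithmetic and the single-byte change concrete, here is a rough Python sketch. It is not the attached corruptoffset.c: the function name corrupt_byte and the choice to invert the byte are assumptions made for illustration, and the demo runs against a small temporary file rather than a real device.

```python
import os
import tempfile


def corrupt_byte(path, offset):
    """Invert the single byte at `offset` in `path`; return (old, new) byte values."""
    with open(path, "r+b") as f:
        f.seek(offset)
        old = f.read(1)
        if len(old) != 1:
            raise ValueError("offset is beyond the end of the file")
        new = bytes([old[0] ^ 0xFF])  # flip every bit so the byte always changes
        f.seek(offset)
        f.write(new)
        f.flush()
        os.fsync(f.fileno())
    return old[0], new[0]


LEAF_START = 22478848        # leaf address from the btrfs-debug-tree snippet
OFFSET_IN_LEAF = 200         # offset used in the report
TARGET = LEAF_START + OFFSET_IN_LEAF

# Demo on a 1 KiB scratch file (hypothetical stand-in for the device).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(1024))
    demo_path = tmp.name

old, new = corrupt_byte(demo_path, OFFSET_IN_LEAF)
with open(demo_path, "rb") as f:
    data = f.read()
os.remove(demo_path)

print(TARGET, old, new)
```

Pointing such a function at a block device would need root privileges and an unmounted filesystem, as in the steps described here.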

Once the device is corrupted, I remount the file system and run ls on it. About 143 of the 400 files and directories return an input/output error, whereas ideally only the directory whose entry was corrupted should fail. ls still prints all 400 names along with the I/O errors for 143 of them, but when I cd into any of the directories that failed during listing, I get an I/O error again.

I also run btrfsck on the filesystem. It detects the errors but is unable to correct any of the checksum (cksum) mismatches it finds. I have attached the log for your reference.
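
The reason one flipped byte shows up as a checksum mismatch for the whole leaf is that a tree block carries a single checksum covering everything after its 32-byte checksum field. The sketch below only illustrates that idea: it uses zlib.crc32 (plain CRC-32) as a stand-in for the crc32c that btrfs uses, and the 16 KiB node size and the helper names are assumptions, since the report does not state the node size.

```python
import zlib

NODE_SIZE = 16 * 1024  # assumed node size; not stated in the report
CSUM_FIELD = 32        # the first 32 bytes of a btrfs tree block hold its checksum


def block_checksum(block):
    """Checksum of everything after the checksum field (CRC-32 stand-in for crc32c)."""
    return zlib.crc32(block[CSUM_FIELD:])


def is_block_valid(block, stored_checksum):
    """A reader accepts or rejects the block as a whole, never item by item."""
    return block_checksum(block) == stored_checksum


block = bytearray(b"\x11" * NODE_SIZE)  # dummy contents standing in for a leaf
stored = block_checksum(block)

valid_before = is_block_valid(block, stored)
block[200] ^= 0xFF                      # same in-leaf offset as in the report
valid_after = is_block_valid(block, stored)

print(valid_before, valid_after)
```

With one checksum per block, a mismatch says that something in the block changed but not which item, so every item in the leaf becomes unreadable together.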

Since uncorrupted data is lost (the corruption affected just one key or one block pointer, not all keys or block pointers in the leaf node), this should be treated as a bug.
Comment 1 Shehbaz 2018-01-12 17:39:40 UTC
Created attachment 273569 [details]
corruptoffset program
Comment 2 Shehbaz 2018-01-12 17:47:20 UTC
Created attachment 273571 [details]
ls cmd, cd cmd and btrfsck output

Command output for ls, cd, and btrfsck when run on a filesystem on an SSD device with 1 byte corrupted.
Comment 3 lakshmipathi 2018-01-13 01:49:40 UTC
>I also run btrfsck on the filesystem. It is able to detect the errors but it
>is unable to correct any of the cksum mismatches that get detected. I have
>attached the log for your reference.

Thanks for the scripts and logs. Have you tried running 'scrub'? If not, can you give it a try? https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-scrub
Comment 4 Shehbaz 2018-01-13 06:33:28 UTC
Created attachment 273579 [details]
btrfs scrub log

The attached file contains the btrfs scrub output.
Comment 5 Shehbaz 2018-01-13 06:43:08 UTC
Thank you for your quick reply!

Unfortunately, btrfs scrub is also not able to correct the error. From the code comment:

 * In case a bad checksum
 * is found or the extent cannot be read, good data will be written back if
 * any can be found.

btrfs scrub can only recover from the error if there are duplicate copies of the metadata. However, when mkfs.btrfs is run on an SSD, no duplicate copies are created. Hence, one cannot correct corrupted metadata. Here is what the btrfs file system looks like:

sudo btrfs fi df /mnt
Data, single: total=328.00MiB, used=42.00MiB
System, single: total=32.00MiB, used=16.00KiB
Metadata, single: total=520.00MiB, used=976.00KiB
GlobalReserve, single: total=16.00MiB, used=0.00B

Note the "single" redundancy level for Metadata. Hence, recovery using the scrub tool may not be possible.
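
The same check can be scripted. This is a minimal sketch that parses the btrfs fi df output pasted above; the helper names and the list of redundant profiles are assumptions for illustration, and the list is not meant to be exhaustive.

```python
import re

# Output copied verbatim from the "btrfs fi df /mnt" listing in this comment.
FI_DF_OUTPUT = """\
Data, single: total=328.00MiB, used=42.00MiB
System, single: total=32.00MiB, used=16.00KiB
Metadata, single: total=520.00MiB, used=976.00KiB
GlobalReserve, single: total=16.00MiB, used=0.00B
"""

# "<block group type>, <profile>: ..." as printed above.
LINE_RE = re.compile(r"^(?P<kind>\w+), (?P<profile>\w+): ")

# Profiles that keep a second copy or parity (assumed list, not exhaustive).
REDUNDANT = {"DUP", "RAID1", "RAID10", "RAID5", "RAID6"}


def profiles(text):
    """Map each block group type to its profile name."""
    result = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            result[match.group("kind")] = match.group("profile")
    return result


def has_redundant_metadata(text):
    """True if the Metadata profile gives scrub another copy to repair from."""
    return profiles(text).get("Metadata", "").upper() in REDUNDANT


print(profiles(FI_DF_OUTPUT)["Metadata"], has_redundant_metadata(FI_DF_OUTPUT))
```

For the filesystem in this report the check comes out negative, which matches scrub being unable to repair the leaf.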
