Created attachment 281395
Proof of Concept

[Kernel version]
This bug can be reproduced on kernel 5.0.0-rc8.

[Reproduce]
* Use a VM, since our PoC simulates a crash by triggering a SysRq!

1. Download a base image (128 MB)
   $ wget https://gts3.org/~seulbae/fsimg/btrfs-10.image

2. Mount the image
   $ mkdir /tmp/btrfs
   $ sudo mount -o loop btrfs-10.image /tmp/btrfs

3. Compile and run PoC
   $ gcc poc.c -o poc
   $ sudo ./poc /tmp/btrfs
   (System reboots)

[Check]
1. Re-mount the crashed image
   $ mkdir /tmp/btrfs
   $ sudo mount -o loop btrfs-10.image /tmp/btrfs

2. Check inconsistency
   $ stat /tmp/btrfs/yyy
   -> Size: 8000

[Description]
In the base image, 2 directories and 7 files exist.

0: 0755 (mount_point)
+--257: 0755 foo
+--258: 0755 bar
+--259: 0644 baz (12 bytes, offset: {})
+--259: 0644 hln (12 bytes, offset: {})
+--260: 0644 xattr (0 bytes, offset: {})
+--261: 0644 acl (0 bytes, offset: {})
+--262: 0644 æøå (4 bytes, offset: {})
+--263: 0644 fifo
+--264: 0777 sln -> mnt/foo/bar/baz

Below is the breakdown of the PoC:

1. Create a file "foo/bar/xxx", (line 28)
   fd = syscall(SYS_open, "foo/bar/xxx", O_CREAT | O_RDWR, 0666);

2. increase the size of "foo/bar/xxx" to 8000 through pwrite64, (line 29)
   syscall(SYS_pwrite64, (long)fd, (long)buf, 4000, 4000)

3. flush the data, (line 30)
   syscall(SYS_fdatasync, fd);

4. truncate the file to 3000 bytes, (line 31)
   syscall(SYS_ftruncate, fd, 3000);

5. rename "foo/bar/xxx" to "yyy", (line 32)
   syscall(SYS_rename, "foo/bar/xxx", "yyy");

6. flush the metadata of "yyy", and (line 33)
   syscall(SYS_fsync, fd);

7. simulate a crash by rebooting right away without un-mounting. (line 35)
   system("echo b > /proc/sysrq-trigger");

As we run fsync on file "yyy"'s file descriptor after its size is truncated to 3000 bytes, we expect that the size attribute is successfully flushed to the disk, and when we re-mount the crashed image, we will see that "yyy"'s size is 3000.
However, "yyy" still has its old size, 8000.
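
For readers without access to the attachment, the seven steps above can be sketched roughly as follows. This is not the attached poc.c: the function name run_sequence, the use of the libc wrappers instead of raw syscall(), the mkdirat calls (so it also runs outside the base image), and the omission of the SysRq reboot in step 7 are all assumptions made for illustration. Without the crash it only shows the in-memory size after the sequence, which is the value we expect to survive on disk.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: issues steps 1-6 relative to `root` and returns the
 * file size reported by fstat() afterwards, or -1 if any call fails.
 * Step 7 (echo b > /proc/sysrq-trigger) is deliberately left out. */
long run_sequence(const char *root)
{
    char buf[4000];
    struct stat st;
    long size = -1;
    int dirfd, fd;

    memset(buf, 'a', sizeof(buf));

    dirfd = open(root, O_RDONLY | O_DIRECTORY);
    if (dirfd < 0)
        return -1;

    /* Assumption: the base image already has foo/bar; create it if the
     * sketch is run somewhere else. Errors such as EEXIST are ignored. */
    mkdirat(dirfd, "foo", 0755);
    mkdirat(dirfd, "foo/bar", 0755);

    fd = openat(dirfd, "foo/bar/xxx", O_CREAT | O_RDWR, 0666);   /* step 1 */
    if (fd < 0) {
        close(dirfd);
        return -1;
    }

    if (pwrite(fd, buf, 4000, 4000) != 4000)                     /* step 2 */
        goto out;
    if (fdatasync(fd) != 0)                                      /* step 3 */
        goto out;
    if (ftruncate(fd, 3000) != 0)                                /* step 4 */
        goto out;
    if (renameat(dirfd, "foo/bar/xxx", dirfd, "yyy") != 0)       /* step 5 */
        goto out;
    if (fsync(fd) != 0)                                          /* step 6 */
        goto out;

    if (fstat(fd, &st) == 0)
        size = (long)st.st_size;
out:
    close(fd);
    close(dirfd);
    return size;
}
```

Calling run_sequence("/tmp/btrfs") on the mounted image should return 3000, which is the size we expect to see again after the crash and re-mount.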
[Further Analysis]
I also tested several variations of the aforementioned test case to find the potential root cause.
With any of the minor tweaks below, this bug does not happen. In other words, file "yyy" recovers to size 3000, as expected.

1) Creating file "xxx" not under "foo/bar/", but directly under the mount point ("./xxx"). (line 28)
2) Removing line 30 (fdatasync).
3) Swapping line 32 (rename) with line 33 (fsync).
4) Removing line 32 (rename).

Reported by Seulbae Kim (seulbae@gatech.edu) from SSLab, Gatech.
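
The four variations from the analysis can be written down as switches on the same syscall sequence, which may make it easier to script the comparison. The sketch below is only an illustration: struct variant, its field names, and run_variant are hypothetical and do not appear in poc.c, the libc wrappers stand in for the raw syscall() calls, and the SysRq crash is omitted, so it issues the calls but does not reproduce the bug by itself.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical switches, one per variation in the list above.
 * All zero means the original PoC ordering. */
struct variant {
    int create_at_root;  /* 1) create "xxx" directly under the mount point */
    int skip_fdatasync;  /* 2) remove the fdatasync call */
    int fsync_first;     /* 3) call fsync before rename instead of after */
    int skip_rename;     /* 4) remove the rename call */
};

/* Issues the (possibly modified) sequence under `root`. Returns the size
 * reported by fstat() afterwards, or -1 if any call fails. */
long run_variant(const char *root, const struct variant *v)
{
    const char *src = v->create_at_root ? "xxx" : "foo/bar/xxx";
    char buf[4000];
    struct stat st;
    long size = -1;
    int dirfd, fd;

    memset(buf, 'a', sizeof(buf));

    dirfd = open(root, O_RDONLY | O_DIRECTORY);
    if (dirfd < 0)
        return -1;

    /* Assumption: create foo/bar when not running on the base image. */
    mkdirat(dirfd, "foo", 0755);
    mkdirat(dirfd, "foo/bar", 0755);

    fd = openat(dirfd, src, O_CREAT | O_RDWR, 0666);
    if (fd < 0) {
        close(dirfd);
        return -1;
    }

    if (pwrite(fd, buf, 4000, 4000) != 4000)
        goto out;
    if (!v->skip_fdatasync && fdatasync(fd) != 0)
        goto out;
    if (ftruncate(fd, 3000) != 0)
        goto out;
    if (v->fsync_first && fsync(fd) != 0)
        goto out;
    if (!v->skip_rename && renameat(dirfd, src, dirfd, "yyy") != 0)
        goto out;
    if (!v->fsync_first && fsync(fd) != 0)
        goto out;

    if (fstat(fd, &st) == 0)
        size = (long)st.st_size;
out:
    close(fd);
    close(dirfd);
    return size;
}
```

For example, a struct variant with only skip_fdatasync set corresponds to tweak 2; in a real test run the crash and re-mount would still have to follow each call.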
Fixed by https://patchwork.kernel.org/patch/10837829/ .