Bug 215498 - btrfs fi defrag hangs on small files, 100% CPU thread
Summary: btrfs fi defrag hangs on small files, 100% CPU thread
Status: RESOLVED PATCH_ALREADY_AVAILABLE
Alias: None
Product: File System
Classification: Unclassified
Component: btrfs
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: BTRFS virtual assignee
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-01-16 15:46 UTC by Anthony Ruhier
Modified: 2022-01-17 17:42 UTC
CC List: 1 user

See Also:
Kernel Version: 5.16.0
Subsystem:
Regression: No
Bisected commit-id:


Attachments
File blocking the defrag (1 bytes, application/octet-stream)
2022-01-16 15:48 UTC, Anthony Ruhier

Description Anthony Ruhier 2022-01-16 15:46:09 UTC
Hi,
Since I upgraded from Linux 5.15 to 5.16, `btrfs filesystem defrag -t128K` hangs on small files (~1 byte) and triggers what seems to be a loop in the kernel, leaving one CPU thread pegged at 100%. I cannot kill the process, and rebooting is blocked by btrfs.

Rebooting into Linux 5.15 shows no issue. On 5.16, defragmenting bigger files works without problems (I filter out files smaller than 3.9KB).
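The size filter mentioned above could be sketched roughly as follows. This is only an illustration, not the exact command used in this report: the function name `files_at_least` and the byte threshold are hypothetical, and "3.9KB" is assumed here to mean 3.9 * 1024 bytes.

```python
import os
import stat
from typing import Iterator

# Assumption: "3.9KB" is read as 3.9 * 1024 bytes, rounded down.
MIN_SIZE = int(3.9 * 1024)


def files_at_least(root: str, min_size: int) -> Iterator[str]:
    """Yield regular files under `root` whose size is at least `min_size` bytes.

    Hypothetical helper: it only selects paths and never calls btrfs itself.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat so symlinks are not followed
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode) and st.st_size >= min_size:
                yield path
```

The resulting paths can then be passed to `btrfs filesystem defrag -t128K`, so that the small files that trigger the hang are never handed to the defrag.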

I discussed this on #btrfs on IRC; here is what we found while debugging:

I can reproduce the issue by copying an affected file with `cp --reflink=never`. I attached one of the affected files, named README.md, to this bug.
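For anyone who wants to script the copy step, a minimal sketch is shown below. It assumes that reading the data into a userspace buffer and writing it back out gives the same result as `cp --reflink=never` (a full data copy with no shared extents). The helper name `copy_without_reflink` is hypothetical.

```python
def copy_without_reflink(src: str, dst: str, chunk_size: int = 1 << 20) -> int:
    """Copy `src` to `dst` through a userspace buffer and return bytes copied.

    Plain read()/write() calls are used so the filesystem is never asked
    to clone extents; the destination gets its own freshly written data.
    """
    copied = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:  # empty read means end of file
                break
            fout.write(chunk)
            copied += len(chunk)
    return copied
```

Running `btrfs filesystem defrag -t128K` on the copy is then the step that hangs on an affected 5.16 kernel, so this should only be tried on a machine that can be hard-reset.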

Someone suggested that the bug could be related to inline extents, so we tried to check that.

`filefrag` shows that README.md is 1 inline extent. I created a new 18-byte file of random text (slightly bigger than the other file), which is also 1 inline extent. This file doesn't trigger the bug and defragments without issue.

I then mounted my system with `max_inline=0` and created a copy of README.md. `filefrag` shows that the new file is now 1 regular extent, not inline. This new file also triggers the bug, so the inline extent doesn't seem to be the cause.

Someone asked me to provide the output of `perf top` while the defrag is stuck:

    28.70%  [kernel]          [k] generic_bin_search
    14.90%  [kernel]          [k] free_extent_buffer
    13.17%  [kernel]          [k] btrfs_search_slot
    12.63%  [kernel]          [k] btrfs_root_node
     8.33%  [kernel]          [k] btrfs_get_64
     3.88%  [kernel]          [k] __down_read_common.llvm
     3.00%  [kernel]          [k] up_read
     2.63%  [kernel]          [k] read_block_for_search
     2.40%  [kernel]          [k] read_extent_buffer
     1.38%  [kernel]          [k] memset_erms
     1.11%  [kernel]          [k] find_extent_buffer
     0.69%  [kernel]          [k] kmem_cache_free
     0.69%  [kernel]          [k] memcpy_erms
     0.57%  [kernel]          [k] kmem_cache_alloc
     0.45%  [kernel]          [k] radix_tree_lookup

I can reproduce the bug on 2 different machines, running 2 different Linux distributions (Arch and Gentoo) with 2 different kernel configs.
The kernel on this machine is compiled with clang, the other with GCC.

Mount options:
Machine 1: rw,noatime,compress-force=zstd:2,ssd,discard=async,space_cache=v2,autodefrag
Machine 2: rw,noatime,compress-force=zstd:3,nossd,space_cache=v2

When the error happens, no message is shown in dmesg.

Thanks
Comment 1 Anthony Ruhier 2022-01-16 15:48:01 UTC
Created attachment 300282
File blocking the defrag
