Bug 209039 - xfs_fsr skips most of the files as no improvement will be made
Summary: xfs_fsr skips most of the files as no improvement will be made
Status: RESOLVED OBSOLETE
Alias: None
Product: File System
Classification: Unclassified
Component: XFS
Hardware: All Linux
Importance: P1 normal
Assignee: FileSystem/XFS Default Virtual Assignee
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-08-26 07:40 UTC by mgutt
Modified: 2020-09-01 07:50 UTC
CC List: 1 user

See Also:
Kernel Version: 4.19.107-Unraid
Subsystem:
Regression: No
Bisected commit-id:


Description mgutt 2020-08-26 07:40:57 UTC
I checked the fragmentation factor of disk1 as follows:

    xfs_db -c frag -r /dev/md1
    actual 1718, ideal 674, fragmentation factor 60.77%
    Note, this number is largely meaningless.
    Files on this filesystem average 2.55 extents per file

I tried to defrag disk1:

    xfs_fsr /dev/md1 -v -d
    /mnt/disk1 start inode=0
    ino=133
    ino=133 extents=4 can_save=3 tmp=/mnt/disk1/.fsr/ag0/tmp23917
    DEBUG: fsize=30364684107 blsz_dio=16773120 d_min=512 d_max=2147483136 pgsz=4096
    Temporary file has 4 extents (4 in original)
    No improvement will be made (skipping): ino=133
    ino=135
    ino=135 extents=4 can_save=3 tmp=/mnt/disk1/.fsr/ag1/tmp23917
    orig forkoff 288, temp forkoff 0
    orig forkoff 288, temp forkoff 296
    orig forkoff 288, temp forkoff 296
    orig forkoff 288, temp forkoff 296
    orig forkoff 288, temp forkoff 296
    orig forkoff 288, temp forkoff 296
    orig forkoff 288, temp forkoff 296
    orig forkoff 288, temp forkoff 288
    set temp attr
    DEBUG: fsize=28400884827 blsz_dio=16773120 d_min=512 d_max=2147483136 pgsz=4096
    Temporary file has 4 extents (4 in original)
    No improvement will be made (skipping): ino=135
    ino=138
    ...

This means the file would still consist of 4 parts across the hdd platter after defragmentation and because of that it's skipped. But why isn't it able to merge the parts of this and hundreds of other files?

More details about inode 133:

    xfs_db -r /dev/md1 -c "inode 133" -c "bmap -d"
    data offset 0 startblock 1314074773 (4/240332949) count 2097151 flag 0
    data offset 2097151 startblock 1316171924 (4/242430100) count 2097151 flag 0
    data offset 4194302 startblock 1318269075 (4/244527251) count 2097151 flag 0
    data offset 6291453 startblock 1320366226 (4/246624402) count 1121800 flag 0
Comment 1 bfoster 2020-08-26 10:33:46 UTC
On Wed, Aug 26, 2020 at 07:40:57AM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=209039
> 
>             Bug ID: 209039
>            Summary: xfs_fsr skips most of the files as no improvement will
>                     be made
>            Product: File System
>            Version: 2.5
>     Kernel Version: 4.19.107-Unraid
>           Hardware: All
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: XFS
>           Assignee: filesystem_xfs@kernel-bugs.kernel.org
>           Reporter: marc@gutt.it
>         Regression: No
> 
> I checked the fragmentation factor of disk1 as follows:
> 
>     xfs_db -c frag -r /dev/md1
>     actual 1718, ideal 674, fragmentation factor 60.77%
>     Note, this number is largely meaningless.
>     Files on this filesystem average 2.55 extents per file
> 

Without knowing the details of your fs, it sounds like it's not very
fragmented from the cursory numbers.
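
For readers wondering where those numbers come from, here is a minimal sketch. It assumes the factor is the share of extents in excess of the ideal count, (actual - ideal) / actual, and that the per-file average is actual / ideal. Both are assumptions, chosen because they reproduce the values reported above; the function name is hypothetical.

```python
def frag_factor(actual: int, ideal: int) -> float:
    """Percentage of extents in excess of the ideal count (assumed formula)."""
    if actual == 0:
        return 0.0
    return (actual - ideal) * 100.0 / actual


# Values taken from the xfs_db output quoted above.
actual, ideal = 1718, 674
print(f"fragmentation factor {frag_factor(actual, ideal):.2f}%")
print(f"average {actual / ideal:.2f} extents per file")
```

Under these assumptions a factor of about 60% only means that files have roughly 2.5 extents each on average, which is why the percentage alone says little about performance.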

> I tried to defrag disk1:
> 
>     xfs_fsr /dev/md1 -v -d
>     /mnt/disk1 start inode=0
>     ino=133
>     ino=133 extents=4 can_save=3 tmp=/mnt/disk1/.fsr/ag0/tmp23917
>     DEBUG: fsize=30364684107 blsz_dio=16773120 d_min=512 d_max=2147483136 pgsz=4096
>     Temporary file has 4 extents (4 in original)
>     No improvement will be made (skipping): ino=133
>     ino=135
>     ino=135 extents=4 can_save=3 tmp=/mnt/disk1/.fsr/ag1/tmp23917
>     orig forkoff 288, temp forkoff 0
>     orig forkoff 288, temp forkoff 296
>     orig forkoff 288, temp forkoff 296
>     orig forkoff 288, temp forkoff 296
>     orig forkoff 288, temp forkoff 296
>     orig forkoff 288, temp forkoff 296
>     orig forkoff 288, temp forkoff 296
>     orig forkoff 288, temp forkoff 288
>     set temp attr
>     DEBUG: fsize=28400884827 blsz_dio=16773120 d_min=512 d_max=2147483136 pgsz=4096
>     Temporary file has 4 extents (4 in original)
>     No improvement will be made (skipping): ino=135
>     ino=138
>     ...
> 
> This means the file would still consist of 4 parts across the hdd platter
> after defragmentation and because of that it's skipped. But why isn't it
> able to merge the parts of this and hundreds of other files?
> 

Note that fsr is not guaranteed to do anything. It simply attempts to
reallocate a file and if the new file has better contiguity than the
original, the old is swapped out for the new. The effectiveness depends
on how fragmented the original file is, how much contiguous free space
is available to create the new one, etc. It's usually not worth playing
with fsr unless you observe some measurable performance impact of
fragmentation (as opposed to just reading the fragmentation numbers,
which can be misleading). Is that the case here?

> More details about inode 133:
> 
>     xfs_db -r /dev/md1 -c "inode 133" -c "bmap -d"
>     data offset 0 startblock 1314074773 (4/240332949) count 2097151 flag 0
>     data offset 2097151 startblock 1316171924 (4/242430100) count 2097151 flag 0
>     data offset 4194302 startblock 1318269075 (4/244527251) count 2097151 flag 0
>     data offset 6291453 startblock 1320366226 (4/246624402) count 1121800 flag 0

In this case, it looks like you already have maximum sized (~8GB)
extents for the first three. The extent map for this file is as
efficient as it can possibly be on XFS.

Brian
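
The extent map above can be checked mechanically. The following is a rough sketch, not part of xfsprogs: it parses the four `bmap -d` lines quoted in this report, flags maximum-sized extents, and tests whether each extent starts where the previous one ends. The 2^21 - 1 block limit and the 4096-byte block size are assumptions taken from the `count 2097151` values and the reported filesystem geometry; all function and constant names are hypothetical.

```python
import re

# bmap output for inode 133, copied from the report.
BMAP_OUTPUT = """\
data offset 0 startblock 1314074773 (4/240332949) count 2097151 flag 0
data offset 2097151 startblock 1316171924 (4/242430100) count 2097151 flag 0
data offset 4194302 startblock 1318269075 (4/244527251) count 2097151 flag 0
data offset 6291453 startblock 1320366226 (4/246624402) count 1121800 flag 0
"""

BLOCK_SIZE = 4096                # assumed: bsize of this filesystem
MAX_EXTENT_BLOCKS = 2**21 - 1    # assumed: largest block count of one extent

LINE_RE = re.compile(
    r"data offset (\d+) startblock (\d+) \((\d+)/(\d+)\) count (\d+) flag (\d+)"
)


def parse_bmap(text):
    """Return a list of (file_offset, start_block, block_count) tuples."""
    extents = []
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            offset, start, _ag, _agbno, count, _flag = map(int, match.groups())
            extents.append((offset, start, count))
    return extents


def physically_contiguous(extents):
    """True if every extent begins at the block right after the previous one."""
    return all(
        prev_start + prev_count == next_start
        for (_, prev_start, prev_count), (_, next_start, _) in zip(extents, extents[1:])
    )


extents = parse_bmap(BMAP_OUTPUT)
for offset, start, count in extents:
    size_gib = count * BLOCK_SIZE / 2**30
    print(f"offset={offset} start={start} count={count} "
          f"({size_gib:.2f} GiB) max_sized={count == MAX_EXTENT_BLOCKS}")
print("physically contiguous:", physically_contiguous(extents))
```

For this inode the start blocks line up end to end, so the four extents form one unbroken run on disk; the file is split only because a single extent record cannot describe more blocks.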

Comment 2 mgutt 2020-08-26 11:04:37 UTC
OK. So, since my filesystem has a block size (bsize) of 4 KiB (4096 bytes):

     xfs_info /dev/md1
     meta-data=/dev/md1               isize=512    agcount=11, agsize=268435455 blks
              =                       sectsz=512   attr=2, projid32bit=1
              =                       crc=1        finobt=1, sparse=1, rmapbt=0
              =                       reflink=1
     data     =                       bsize=4096   blocks=2929721331, imaxpct=5
              =                       sunit=0      swidth=0 blks
     naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
     log      =internal log           bsize=4096   blocks=521728, version=2
              =                       sectsz=512   sunit=0 blks, lazy-count=1
     realtime =none                   extsz=4096   blocks=0, rtextents=0

each extent can't be larger than 8 GB, as mentioned in the docs:
https://xfs.org/docs/xfsdocs-xml-dev/XFS_Filesystem_Structure//tmp/en-US/html/Data_Extents.html
> If a file is zero bytes long, it will have no extents, di_nblocks and
> di_nextents will be zero. Any file with data will have at least one extent, and
> each extent can use from 1 to over 2 million blocks (2^21) on the filesystem.
> For a default 4KB block size filesystem, a single extent can be up to 8GB in
> length.

Good to know. Maybe this information should be part of the xfs_fsr output? At the moment, "ideal 674" and "can_save=3" give the impression that the files could be defragmented further.
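
The arithmetic behind this can be sketched as follows. Assuming a 4096-byte block size and a per-extent limit of 2^21 - 1 blocks (the `count 2097151` seen in the bmap output), the smallest possible extent count for a file follows from its size. The file sizes are the `fsize` values from the xfs_fsr debug output; the function name is hypothetical.

```python
BLOCK_SIZE = 4096                # assumed: bsize from xfs_info
MAX_EXTENT_BLOCKS = 2**21 - 1    # assumed: largest block count of one extent


def min_extents(file_size_bytes, block_size=BLOCK_SIZE,
                max_extent_blocks=MAX_EXTENT_BLOCKS):
    """Smallest number of extents that can map a file of the given size."""
    blocks = -(-file_size_bytes // block_size)    # ceiling division
    return -(-blocks // max_extent_blocks)        # ceiling division


# fsize values reported by xfs_fsr for inodes 133 and 135.
for ino, fsize in ((133, 30364684107), (135, 28400884827)):
    print(f"ino={ino} fsize={fsize} minimum extents={min_extents(fsize)}")
```

Both files need at least four extents under these assumptions, so a temporary copy with four extents is already the best possible result, even though can_save=3 suggests otherwise.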
Comment 3 Eric Sandeen 2020-08-26 16:27:43 UTC
If you'd like to send a patch, go ahead; in general, verbose and debugging messages are going to be less user-friendly than normal program output; they are inherently more for developer eyes than for user eyes.

This might also get into messiness about how we assess four separate but contiguous maximally-sized extents; is that 4 "extents" or one?  I'd have to dig into the xfs_fsr reporting to see how it's counting, but I'm not really sure it's all worth the effort - fsr /is/ doing the right thing here.

In the end, the default verbose output is completely correct:  "No improvement will be made" - and it probably doesn't need any update.
Comment 4 mgutt 2020-09-01 07:50:05 UTC
Strictly speaking it should say "No improvement possible", but you are right, in the end it's not worth the effort. I will try to send a patch to the man-pages project so that the command's manual contains more explanation.
