Bug 114451 - btrfs no space on device on rebalance, huge spread between total vs. used metadata
Summary: btrfs no space on device on rebalance, huge spread between total vs. used metadata
Status: NEW
Alias: None
Product: File System
Classification: Unclassified
Component: btrfs
Hardware: All
OS: Linux
Importance: P1 normal
Assignee: Josef Bacik
Reported: 2016-03-12 19:55 UTC by Marc Haber
Modified: 2016-03-20 09:51 UTC
CC List: 1 user

Kernel Version: 4.4.4
Tree: Mainline
Regression: No


Attachments
typescript of filesystem information and my balance script. (3.28 KB, application/octet-stream)
2016-03-12 19:55 UTC, Marc Haber
syslog of a balance script run (87.59 KB, application/octet-stream)
2016-03-12 19:55 UTC, Marc Haber

Description Marc Haber 2016-03-12 19:55:04 UTC
Created attachment 208721
typescript of filesystem information and my balance script.

Hi,

This bug report is the result of a discussion on linux-btrfs which concluded that this might be a unique bug leading to ENOSPC on a btrfs with hundreds of gigabytes free. The thread starts at http://www.spinics.net/lists/linux-btrfs/msg52505.html

A btrfs-image -s (248 MB xz) of the filesystem in question can be downloaded from http://q.bofh.de/~mh/stuff/20160307-fanbtr-image.xz, and I intend to keep this online until at least the fall of 2016.
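
For inspection, the image can be restored onto a scratch device; a minimal sketch (only the -s flag is confirmed above, the rest is standard btrfs-image usage, and /dev/sdX is a placeholder for a spare device):

# Decompress, then write the metadata image back onto a scratch device.
xz -d 20160307-fanbtr-image.xz
# -r restores an image created by btrfs-image.
btrfs-image -r 20160307-fanbtr-image /dev/sdX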

Here is the output of some diagnostic tools:

[7/504]mh@fan:~$ sudo btrfs fi usage /media/tempdisk/
[sudo] password for mh:
Overall:
    Device size:                 417.19GiB
    Device allocated:            133.06GiB
    Device unallocated:          284.12GiB
    Device missing:                  0.00B
    Used:                         82.93GiB
    Free (estimated):            284.80GiB      (min: 142.74GiB)
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 0.00B)

Data,single: Size:79.00GiB, Used:78.33GiB
   /dev/mapper/ofanbtr    79.00GiB

Metadata,DUP: Size:27.00GiB, Used:2.30GiB
   /dev/mapper/ofanbtr    54.00GiB

System,DUP: Size:32.00MiB, Used:16.00KiB
   /dev/mapper/ofanbtr    64.00MiB

Unallocated:
   /dev/mapper/ofanbtr   284.12GiB
[8/505]mh@fan:~$ sudo btrfs fi df /media/tempdisk/
Data, single: total=79.00GiB, used=78.33GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, DUP: total=27.00GiB, used=2.30GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
[9/506]mh@fan:~$ sudo btrfs fi show /media/tempdisk/
Label: 'fanbtr'  uuid: 4198d1bc-e3ce-40df-a7ee-44a2d120bff3
        Total devices 1 FS bytes used 80.63GiB
        devid    1 size 417.19GiB used 133.06GiB path /dev/mapper/ofanbtr

I am attaching an entire typescript of filesystem information collected before trying to balance, the script used to balance, the filesystem information after trying to balance, and the syslog obtained during those balance attempts.

The script I am using to balance was suggested on the linux-btrfs mailing list as well. It basically does:

btrfs balance start /media/tempdisk
btrfs balance start -dprofiles=single /media/tempdisk
btrfs balance start -mprofiles=dup /media/tempdisk
btrfs balance start --force -sprofiles=dup /media/tempdisk
btrfs balance start /media/tempdisk

This sequence was suggested on the mailing list because a balance works through the three chunk types separately: data (-d), metadata (-m), and system (-s).
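
For comparison, the variant usually suggested for shrinking a total-vs-used spread uses usage filters instead of profile filters; a minimal sketch (the 50% cutoff is an arbitrary example, not something from my script):

# Rewrite only chunks that are at most 50% full, so mostly-empty
# chunks get compacted and their space returned to "unallocated".
btrfs balance start -dusage=50 /media/tempdisk
btrfs balance start -musage=50 /media/tempdisk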

In the attachment, one can see how the Metadata total space grows in each step:

[12/510]mh@fan:~$ grep -E '(Metadata|BEGIN|END)' 20160307-fan-btrfs-syslog 
Mar  7 20:32:06 fan root: BEGIN btrfs-balance script
Mar  7 20:32:06 fan root: Metadata, DUP: total=8.50GiB, used=2.31GiB
Mar  7 20:32:06 fan root:     Metadata ratio:               1.00
Mar  7 20:32:06 fan root: Metadata,single: Size:4.01GiB, Used:2.18GiB
Mar  7 20:32:06 fan root: BEGIN btrfs balance start /media/tempdisk
Mar  7 20:55:25 fan root: Metadata, DUP: total=11.00GiB, used=2.30GiB
Mar  7 20:55:25 fan root:     Metadata ratio:               1.00
Mar  7 20:55:25 fan root: Metadata,single: Size:4.01GiB, Used:2.18GiB
Mar  7 20:55:25 fan root: BEGIN btrfs balance start -dprofiles=single /media/tempdisk
Mar  7 21:22:06 fan root: Metadata, DUP: total=11.00GiB, used=2.31GiB
Mar  7 21:22:06 fan root:     Metadata ratio:               1.00
Mar  7 21:22:06 fan root: Metadata,single: Size:4.01GiB, Used:2.18GiB
Mar  7 21:22:06 fan root: BEGIN btrfs balance start -mprofiles=dup /media/tempdisk
Mar  7 21:22:37 fan root: Metadata, DUP: total=19.00GiB, used=2.30GiB
Mar  7 21:22:37 fan root:     Metadata ratio:               1.00
Mar  7 21:22:37 fan root: Metadata,single: Size:4.01GiB, Used:2.18GiB
Mar  7 21:22:37 fan root: BEGIN btrfs balance start --force -sprofiles=dup /media/tempdisk
Mar  7 21:22:37 fan root: Metadata, DUP: total=19.00GiB, used=2.30GiB
Mar  7 21:22:37 fan root:     Metadata ratio:               1.00
Mar  7 21:22:37 fan root: Metadata,single: Size:4.01GiB, Used:2.18GiB
Mar  7 21:22:37 fan root: BEGIN btrfs balance start /media/tempdisk
Mar  7 21:49:47 fan root: Metadata, DUP: total=27.00GiB, used=2.30GiB
Mar  7 21:49:47 fan root:     Metadata ratio:               1.00
Mar  7 21:49:47 fan root: Metadata,single: Size:4.01GiB, Used:2.18GiB
Mar  7 21:49:47 fan root: END btrfs-balance script
[13/511]mh@fan:~$ 
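
The logging itself is nothing special; the attached script is authoritative, but the idea is roughly this sketch:

# Record the chunk allocation state to syslog before and after a step.
log_state() {
    btrfs fi df /media/tempdisk | logger
    btrfs fi usage /media/tempdisk | logger
}

log_state
logger "BEGIN btrfs balance start /media/tempdisk"
btrfs balance start /media/tempdisk
log_state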

If there is some additional information I can give, please ask. I do intend to keep the file system around for some months, until I need the SSD space.

Workload: The btrfs holds the root filesystem of my main home workstation, running Debian unstable. It was created in September 2015 with btrfs-tools 4.1.2. I take daily snapshots of about ten subvolumes, plus a snapshot of one further subvolume (/home) every ten minutes. The cleanup script I wrote was broken for a few months, so a five-digit number of snapshots accumulated before I noticed and cleaned them up. The filesystem now has about 500 snapshots left and is no longer in daily use; I copied the data over to a new btrfs and am working with that one now.
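
For the record, the remaining snapshots can be counted like this (-s limits the listing to snapshot subvolumes):

# Count the snapshots still present on the filesystem.
sudo btrfs subvolume list -s /media/tempdisk | wc -l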

Is there anything I can help with?

Greetings
Marc
Comment 1 Marc Haber 2016-03-12 19:55:41 UTC
Created attachment 208731
syslog of a balance script run
Comment 2 Marc Haber 2016-03-13 12:19:21 UTC
I now see the same issue on a filesystem created a week ago with current btrfs-tools (4.4), so it cannot be blamed on the "old" btrfs-tools the other btrfs was created with.

The new btrfs has never exceeded a thousand snapshots and is currently at 150, so the sheer number of snapshots the old btrfs went through cannot be the cause either.

Will a new syslog help?
