Bug 16508 - BTRFS ENOSPC at 39% use and resulting kernel warnings filling dmesg
Status: RESOLVED OBSOLETE
Alias: None
Product: File System
Classification: Unclassified
Component: btrfs
Hardware: All Linux
Importance: P1 high
Assignee: fs_btrfs@kernel-bugs.osdl.org
Reported: 2010-08-04 02:20 UTC by devsk
Modified: 2013-04-30 15:59 UTC

Kernel Version: 3.3.0
Regression: No

Description devsk 2010-08-04 02:20:47 UTC
Steps to reproduce:

1. Boot into 2.6.35 vanilla kernel.
2. Create a loopback btrfs filesystem on a 4GB disk file.

time dd if=/dev/zero of=btrfs-fs.img count=8000 bs=512k oflag=direct
mkfs.btrfs btrfs-fs.img 
mount -o loop,noatime btrfs-fs.img /mnt/floppy

3. Create a large file in it
cd /mnt/floppy
time dd if=/dev/zero of=tempfile count=7000 bs=512k oflag=direct 

4. Copy a large number of files into it.

cd /mnt/floppy
time cp -a /usr/portage/[a-z]* .
cp: writing `./local/ccache/x64/9/3/591f3f514cd2144eb42e3498afd146-3170951': No space left on device
cp: writing `./local/ccache/x64/9/3/d0ad1d270e933bcf1874a61dfa476d-3505981': No space left on device
...

5. See usage

# df .
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/loop0             4096000   3880592    215408  95% /mnt/floppy

# btrfs filesystem df .
Data: total=3.49GB, used=3.48GB
Metadata: total=208.00MB, used=112.05MB
System: total=12.00MB, used=4.00KB

# find .|wc -l
59739

The Data total is 3.49GB instead of 4GB, and the copy failed before usage could even reach 3.49GB. Even if you take 1KB=1024 bytes into account and add the Metadata total, about 220MB of space is still missing.
Anyway, let's try to break this very fragile FS.
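
The accounting above can be re-checked with a few lines of arithmetic. This is only a sketch of the reporter's calculation, assuming that btrfs filesystem df prints binary units (1GB = 1024MB) as the text suggests. The variable names are illustrative, and the figures are copied from step 5.

```python
# Figures copied from the step 5 output.
device_kib = 4096000         # "1K-blocks" column of df
data_total_gb = 3.49         # Data: total (assumed binary, i.e. GiB)
metadata_total_mb = 208.00   # Metadata: total (assumed MiB)
system_total_mb = 12.00      # System: total (assumed MiB)

# Convert everything to MiB and compare against the device size.
device_mib = device_kib / 1024
allocated_mib = data_total_gb * 1024 + metadata_total_mb + system_total_mb
unaccounted_mib = device_mib - allocated_mib

print(f"device size : {device_mib:8.2f} MiB")
print(f"allocated   : {allocated_mib:8.2f} MiB")
print(f"unaccounted : {unaccounted_mib:8.2f} MiB")
```

With these inputs the unaccounted remainder comes out a little over 200 MiB, the same order as the roughly 220MB quoted above. The sketch says nothing about where that space went.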

6. Remove the large 3.5GB file

# \rm tempfile 

7. Copy small files again.

# time cp -a /usr/portage/local/ .
cp: writing `./local/ccache/x32/5/9/1c179b03251ae69c511aa5f50a988a-282393': No space left on device
cp: cannot create regular file `./local/ccache/x32/5/9/75785af807da1c7dabc90d4643c43b-144796': No space left on device
cp: cannot create regular file `./local/ccache/x32/5/9/30bd5db7b5a0fa5cefe6514fda23d5-1642583': No space left on device
...

8. See usage

# df .
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/loop0             4096000   1568312   2527688  39% /mnt/floppy

# btrfs filesystem df .
Data: total=3.49GB, used=1.12GB
Metadata: total=208.00MB, used=191.92MB
System: total=12.00MB, used=4.00KB

# df -i .
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/loop0                 0       0       0    -  /mnt/floppy

# cp /usr/lib/libc.a .
cp: cannot create regular file `./libc.a': No space left on device

# find .|wc -l
132860
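
The step 8 numbers are easier to compare once each line of the btrfs filesystem df output is turned into a percentage. A minimal sketch, assuming the tool prints binary sizes under the labels KB/MB/GB; parse_btrfs_df is a hypothetical helper written for this note, not part of btrfs-progs.

```python
import re

# Assumption: the btrfs-progs of this era printed binary sizes under the
# labels KB/MB/GB; a bare number such as "0.00" carries no unit at all.
UNITS = {None: 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

LINE_RE = re.compile(
    r"^(?P<kind>[A-Za-z]+)(?:,\s*(?P<profile>\w+))?:\s*"
    r"total=(?P<total>[\d.]+)(?P<total_unit>[KMGT]B)?,\s*"
    r"used=(?P<used>[\d.]+)(?P<used_unit>[KMGT]B)?$"
)


def parse_btrfs_df(text):
    """Parse 'btrfs filesystem df' output into a list of dicts."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip anything that is not a block-group line
        total = float(m["total"]) * UNITS[m["total_unit"]]
        used = float(m["used"]) * UNITS[m["used_unit"]]
        rows.append({
            "kind": m["kind"],
            "profile": m["profile"],  # None when no profile is printed
            "total": total,
            "used": used,
            "pct": 100 * used / total if total else 0.0,
        })
    return rows


# Output of step 8, copied verbatim.
STEP8 = """\
Data: total=3.49GB, used=1.12GB
Metadata: total=208.00MB, used=191.92MB
System: total=12.00MB, used=4.00KB
"""

rows = parse_btrfs_df(STEP8)
for row in rows:
    print(f"{row['kind']:<9} {row['pct']:5.1f}% of allocated space used")
```

On the step 8 figures this puts Data at roughly a third of its allocation while Metadata is above 90%, which would be consistent with small-file creation failing even though df reports 39% use. Whether metadata exhaustion is the actual cause is not confirmed anywhere in this report.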

9. Remove files at this time

# \rm -rf dev-cpp

This hangs, soaking up 2 full CPUs; I had to Ctrl-C out of it. It fills up the (very large) dmesg buffer with:

[43545.427000] ------------[ cut here ]------------
[43545.427000] WARNING: at fs/btrfs/extent-tree.c:3441 btrfs_block_rsv_check+0x10c/0x13f()
[43545.427000] Hardware name: OEM
[43545.427000] Modules linked in: f71882fg pvrusb2 dvb_core cx2341x tveeprom vboxnetadp vboxnetflt vboxdrv snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss tuner_simple tuner_types uvcvideo usb_storage usblp ohci_hcd tun snd_intel8x0 snd_ac97_codec ac97_bus acpi_cpufreq mperf cifs fuse coretemp tda9887 tda8290 wm8775 tuner cx25840 v4l2_common videodev v4l1_compat v4l2_compat_ioctl32 snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_pcm ohci1394 nvidia(P) snd_timer i2c_i801 ieee1394 r8169 pcspkr snd snd_page_alloc pata_jmicron ehci_hcd uhci_hcd evdev [last unloaded: f71882fg]
[43545.427000] Pid: 19024, comm: rm Tainted: P        W   2.6.35 #1
[43545.427000] Call Trace:
[43545.427000]  [<ffffffff81034994>] ? warn_slowpath_common+0x78/0x8c
[43545.427000]  [<ffffffff8116836e>] ? btrfs_block_rsv_check+0x10c/0x13f
[43545.427000]  [<ffffffff81176ae7>] ? __btrfs_end_transaction+0x9f/0x1a8
[43545.427000]  [<ffffffff8117c8b7>] ? btrfs_delete_inode+0x169/0x184
[43545.427000]  [<ffffffff810ac7e3>] ? generic_delete_inode+0x86/0x104
[43545.427000]  [<ffffffff810a513f>] ? do_unlinkat+0xe6/0x13e
[43545.427000]  [<ffffffff810a74cd>] ? vfs_readdir+0x86/0x9c
[43545.427000]  [<ffffffff81099f78>] ? filp_close+0x5f/0x6a
[43545.427000]  [<ffffffff81001f2b>] ? system_call_fastpath+0x16/0x1b
[43545.427000] ---[ end trace be14a72aca4c094f ]---
[43545.427000] block_rsv size 33554432 reserved 8613888 freed 0 0
[43545.428000] ------------[ cut here ]------------ 
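
The byte counts on the block_rsv line are easier to read in MiB. The sketch below only converts the two numbers; it assumes, from the field names alone, that size is the amount the reservation is meant to hold and reserved is what it currently holds. The report itself does not explain the fields.

```python
# Values copied from the "block_rsv size ... reserved ..." line above.
size_bytes = 33554432
reserved_bytes = 8613888

MIB = 1024 ** 2
# Gap between the two figures, under the assumption stated in the lead-in.
shortfall_bytes = size_bytes - reserved_bytes
reserved_pct = 100 * reserved_bytes / size_bytes

print(f"size      : {size_bytes / MIB:6.2f} MiB")
print(f"reserved  : {reserved_bytes / MIB:6.2f} MiB")
print(f"shortfall : {shortfall_bytes / MIB:6.2f} MiB")
print(f"reserved / size = {reserved_pct:.1f}%")
```

Read that way, the reservation holds about a quarter of its 32 MiB size at the moment the warning fires.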

It does not delete any files, and I still can't create a small file.

# df -i .
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/loop0                 0       0       0    -  /mnt/floppy

# cp /usr/lib/libc.a .
cp: cannot create regular file `./libc.a': No space left on device

I have an lzma-compressed image file if you want it, but it's so darn fragile and repeatable that you wouldn't need it. I have no idea how and why it was not even tested under basic stress like this.
Comment 1 devsk 2010-08-06 18:01:56 UTC
Another user easily reproduced the ENOSPC by following the steps. He could not reproduce the hang.

Please refer to http://forums.gentoo.org/viewtopic.php?p=6375043#6375043 for more details.
Comment 2 devsk 2010-11-06 16:01:06 UTC
Can someone please tell me how a filesystem which breaks so easily, and brings the system down with it, is even in the kernel? This bug has been open for months with no comments from the devs.

This condition persists with 2.6.36/2.6.37 code. If a filesystem crashes with just copy and delete operations, how is that not a showstopper bug for it? How come nobody is working on this? Is there any filesystem that BTRFS devs know of which can be brought to its knees, along with the whole system, like this?
Comment 3 Konstantin Weitz 2011-01-15 00:56:12 UTC
Ever wondered why all posts to fs_btrfs@kernel-bugs.osdl.org are in the status NEW?

Q. Why are the subsystem maintainers in kernel tracker sometimes different than the person listed in the MAINTAINER file?

A. The subsystem maintainers in kernel tracker are volunteers to help track bugs in an area they are interested in. Sometimes they are the same person as on kernel.org sometimes they are not. There are still some categories with no maintainers so more volunteers are needed. 

Maybe other btrfs mailing lists are more appreciative of bug reports.
Comment 4 IAN DELANEY 2011-02-28 17:07:24 UTC
I verified the original poster's findings last year on the vanilla 2.6.35 kernel. I have tried btrfs a little and, seeing that months have gone by, repeated the test on the current kernel:
gentoo64 portage-btrfs # uname -r
2.6.37-gentoo-amd64

I initially suspected an improvement, and at first re-ran the test with a bigger volume file. For what it's worth, I then decided to repeat the test line for line, as a direct and legitimate comparison.

binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,noexec,nosuid,nodev)
gentoo64 Documents # cd /var/tmp/portage-btrfs
gentoo64 portage-btrfs # time dd if=/dev/zero of=btrfs-fs.img count=8000 bs=512k oflag=direct 
8000+0 records in
8000+0 records out
4194304000 bytes (4.2 GB) copied, 79.3219 s, 52.9 MB/s

real    1m20.558s
user    0m0.004s
sys     0m1.901s
gentoo64 portage-btrfs # mkfs.btrfs btrfs-fs.img

WARNING! - Btrfs v0.19-35-g1b444cd-dirty IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using

fs created label (null) on btrfs-fs.img
        nodesize 4096 leafsize 4096 sectorsize 4096 size 3.91GB
Btrfs v0.19-35-g1b444cd-dirty

gentoo64 portage-btrfs # cd  /mnt/btrfs
gentoo64 btrfs # time dd if=/dev/zero of=tempfile count=7000 bs=512k oflag=direct && df -h .
7000+0 records in
7000+0 records out
3670016000 bytes (3.7 GB) copied, 80.4149 s, 45.6 MB/s

real    1m20.425s
user    0m0.018s
sys     0m1.473s
Filesystem            Size  Used Avail Use% Mounted on
/dev/loop0            4.0G  3.5G   72M  98% /mnt/btrfs


I shall go no further this time. Those months ago I went further, and got

Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/loop0                3595616    500384  88% /mnt/ftp 

in response to the df .
Now two kernels later, btrfs reports 98% full.

The image file is reported on its creation as 4.2 GB.
Once it is mounted, btrfs reports the volume's size as 4096000 1K-blocks.
After a 3.7 GB file is written to it, it reports the volume as 98% full.

No need to go further. It has lost far, far more space than it did two kernels ago.
Never mind btrees and sub-volumes. Btrfs cannot read and report the correct size of files of its own making.
This is fundamental.
This is why you lost the poster of this bug as a follower of btrfs.
This is why btrfs must be considered experimental.
This still has not been addressed by a dev since its posting almost a year ago.
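
The 88% and 98% figures in this comment can be reproduced from the raw df columns. A sketch under two assumptions: that the df line quoted from the earlier run lost its first column, so its two numbers are Used and Available, and that Use% is computed the way GNU df does it, as used / (used + available) rounded up. The 2.6.38-rc7 numbers are the 1K-block figures from Comment 5.

```python
import math


def df_use_percent(used_kib, available_kib):
    """Use% as GNU df derives it: used / (used + available), rounded up."""
    return math.ceil(100 * used_kib / (used_kib + available_kib))


# (label, Used, Available) in 1K-blocks, copied from the df outputs in this report.
samples = [
    ("earlier run quoted in this comment", 3595616, 500384),
    ("2.6.38-rc7 run in Comment 5", 3593864, 73728),
]

for label, used, avail in samples:
    print(f"{label}: Use% = {df_use_percent(used, avail)}%, "
          f"Available = {avail / 1024:.0f} MiB")
```

Under those assumptions the Used figure is almost identical in both runs. What changed between kernels is the Available column, and that is what moves Use% from 88% to 98%.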
Comment 5 IAN DELANEY 2011-03-03 04:59:51 UTC
Well, I went further.

gentoo64 btrfs # uname -r
2.6.38-rc7-amd64

gentoo64 btrfs # time dd if=/dev/zero of=tempfile count=7000 bs=512k oflag=direct 
7000+0 records in
7000+0 records out
3670016000 bytes (3.7 GB) copied, 81.1489 s, 45.2 MB/s

real    1m21.160s
user    0m0.012s
sys     0m1.477s
gentoo64 btrfs # df .
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/loop0             4096000   3593864     73728  98% /mnt/btrfs

gentoo64 btrfs # time cp -a /mnt/genny/usr/portage/[a-z]* .
^X^Z
[1]+  Stopped                 cp -a /mnt/genny/usr/portage/[a-z]* .

real    16m14.464s
user    0m0.000s
sys     0m0.001s


ls shows that the folders from the first portage folder through the metadata folder were copied.
It seems btrfs has changed its spots.

The expected outcome, as in the initial post, was "No space left on device".
This time it just hung. No escape with error output; it just hung.

Removed tempfile.  Resumed copying of portage files.  It hung.

Changed the spots.
Comment 6 Christian Kujau 2012-03-24 05:23:04 UTC
The -ENOSPC thingy is still an issue. I've filled one directory with lots of zero-byte files:

$ ls | wc -l
4962285

$ touch foo
touch: cannot touch `foo': No space left on device


But there should still be space on the filesystem:

$ df -k .
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/sdb        10485760 8478852   1718236  84% /mnt/disk

$ df -ki .
Filesystem     Inodes IUsed IFree IUse% Mounted on
/dev/sdb            0     0     0     - /mnt/disk

$ btrfs filesystem df .
Data: total=5.01GB, used=3.22GB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=2.48GB, used=2.36GB
Metadata: total=8.00MB, used=0.00

$ uname -r
3.3.0-rc7

There's nothing in the kernel logs, though. I too remember btrfs having issues with ENOSPC a long (!) time ago; it's sad that this has not been fixed yet (not that I could contribute anything...)
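
The figures in this comment can be put side by side with a short calculation. A sketch only, under two assumptions not stated in the report: that the sizes are binary units, and that a DUP profile keeps two copies on disk, so its raw footprint is twice the printed total.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

# (label, printed total in bytes, copies kept on disk), from the output above.
chunks = [
    ("Data",          5.01 * GIB, 1),
    ("System, DUP",   8.00 * MIB, 2),
    ("System",        4.00 * MIB, 1),
    ("Metadata, DUP", 2.48 * GIB, 2),
    ("Metadata",      8.00 * MIB, 1),
]
device_bytes = 10485760 * 1024  # "1K-blocks" column of df -k

# Raw space taken by all chunks, counting DUP chunks twice.
raw_allocated = sum(total * copies for _, total, copies in chunks)
unallocated = device_bytes - raw_allocated

# Fill level of the DUP metadata chunks (used=2.36GB of total=2.48GB).
metadata_dup_used_pct = 100 * 2.36 / 2.48

print(f"raw allocated : {raw_allocated / GIB:.2f} GiB of {device_bytes / GIB:.2f} GiB")
print(f"unallocated   : {unallocated / MIB:.1f} MiB")
print(f"Metadata, DUP : {metadata_dup_used_pct:.1f}% used")
```

With these inputs nearly the whole device is already allocated to chunks and the DUP metadata is about 95% used, which would fit ENOSPC on file creation while df still shows free space. The printed totals are rounded to two decimals, so the remainder is only approximate.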
Comment 7 Josef Bacik 2013-04-30 15:59:59 UTC
Closing. If this is still affecting you on a newer kernel, please reopen.
