Bug 15420
Summary: | EXT4_USE_FOR_EXT23 causes wrong free space calculation on ext2 and ext3 | ||
---|---|---|---|
Product: | File System | Reporter: | Thomas Bächler (thomas) |
Component: | ext4 | Assignee: | fs_ext4 (fs_ext4) |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | CC: | alan, bluewind, jack, sandeen, tytso |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.33 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
.config for my personal kernel
Possible patch from Jan Kara Fix logic in ext4_indirect_calc_metadata_amount |
Description
Thomas Bächler
2010-03-01 18:18:24 UTC
Created attachment 25293 [details]
.config for my personal kernel
Ping? Anyone actually reading this? Yes, but we are reading a lot of things ;) Is it du, or df, or both, which shows the "wrong" amount until sync? It is definitely df. I didn't test du, and have no test case right now (and the other guys who had the problem are not reachable right now). I'll make another test with du as soon as I get to it. This sounds like the block reservation code for delayed allocation is over-aggressive about estimating how many blocks are need for the indirect blocks in the case of indirect-mapped inodes (which would be what you get when mounting an ext2 filesystem using the ext4 file system driver). This should be an aesthetic issue only, but it would be nice to have this be fixed. > http://bugzilla.kernel.org/show_bug.cgi?id=15420
>
>
> Theodore Tso <tytso@mit.edu> changed:
>
> What |Removed |Added
> ----------------------------------------------------------------------------
> CC| |tytso@mit.edu
>
>
>
>
> --- Comment #5 from Theodore Tso <tytso@mit.edu> 2010-03-10 03:12:30 ---
> This sounds like the block reservation code for delayed allocation is
> over-aggressive about estimating how many blocks are need for the indirect
> blocks in the case of indirect-mapped inodes (which would be what you get
> when
> mounting an ext2 filesystem using the ext4 file system driver).
>
> This should be an aesthetic issue only, but it would be nice to have this be
> fixed.
Since bugzilla seems to be down now, I'll reply by email:
Won't it make sence to use 'nodelalloc' by default when ext4 driver is used
for ext2 or ext3 filesystem? Because delayed allocation has some implications
people need not expect - like the overestimation of needed blocks, or changed
semantics of ordered data mode etc...
So something like the patch below?
Honza
(In reply to comment #5) > This sounds like the block reservation code for delayed allocation is > over-aggressive about estimating how many blocks are need for the indirect > blocks in the case of indirect-mapped inodes (which would be what you get > when > mounting an ext2 filesystem using the ext4 file system driver). > > This should be an aesthetic issue only, but it would be nice to have this be > fixed. It is not purely aesthetic, lots of applications abort their operation if space is supposedly running low. However, as this doesn't seem to be high priority for anyone here, we should probably consider disabling EXT4_USE_FOR_EXT23 in our distribution kernel for the time being. (In reply to comment #6) > Won't it make sence to use 'nodelalloc' by default when ext4 driver is used > for ext2 or ext3 filesystem? Because delayed allocation has some implications > people need not expect - like the overestimation of needed blocks, or changed > semantics of ordered data mode etc... Don't see a patch here. What applications do you actually call statfs() and check the amount of space that you have? I'm not aware of any. Does the problem go away if you add the mount option "nodelalloc"? If it does, that's Jan's patch is probably the right workaround. It looks like Bugzilla stripped it off (it's in the e-mail version of his reply). I'll add it to this bug, and you can try it out as well. Created attachment 25448 [details]
Possible patch from Jan Kara
I don't know the exact application that was used, but it one guy said he was unable to work because many applications would constantly complain that there was not enough space and thus abort. I'll ask again. I'll test this later and report back. If delalloc doesn't work with indirect files on ext3 filesystems, then we need to stop telling people they can migrate ext3 to ext4 simply by turning on extents etc, and mounting as ext4. Well, let's figure out what actually broke, and why. The vast majority of files are not appended to after they are first written, I suspect the migration story still works. And this issue with over-aggressive estimation for space needed for indirect blocks is probably something recent in 2.6.33, and I think we can fix it --- and even if it isn't, I'd really like to know what applications are checking statfs and aborting as a result, and under what situations this is happening. OK, I can confirm that the problem is with delayed allocation and indirect files. While downloading a 70 meg kernel source tar ball, the amount of space shown as being used by df goes as high as 1 gigabyte (50% of the space on my disk, about 25% of my available free memory) before dropping back down. as the blocks get allocated. Mounting with nodelalloc makes the problem go away. Using extents, the df result will occasionally be as much as 1-2 megs but it is much smaller in practice. Using nodelalloc for ext2/3 mounts may make sense not just as a workaround, but to help protect users using crappy desktop applications that don't know how to use fsync(). (They'll still get screwed by btrfs, but the lazy application writers and Phoronix they can complain to the btrfs developers about how they are incompetent, and it's no longer my problem at that point. :-) We should try to improve the estimation logic for indirect blocks (I remember being in a hurry to write the code to work around the regression caused by the quota bugfix that has caused us so much heartache), but I don't think this is a problem for people who are migrating from ext3 to ext4, since it's rare that they will be extending an already existing indirect-mapped file. The problem with EXT4_USE_FOR_EXT23 is that they are _not_ migrating, and we're seeing a weakness in delalloc with indirect block files; a weakness that was only introduced recently, and one that I hope we can fix. But for other reasons mentioned above, we probably want to turn on nodelalloc anyway... Indeed, the logic in ext4_indirect_calc_metadata_amount seems to be broken. Attached patch fixes it for me... Created attachment 25477 [details]
Fix logic in ext4_indirect_calc_metadata_amount
As for comment #13 - I fully agree with Ted. When admin does "mount -t ext3 ...", I think we should give him as close behavior to ext3 as possible by the principle of the least surprise - and that means nodelalloc. If user wishes better performance etc, he can always mount the filesystem as ext4 explicitely. Also I remember some discussions with KDE guys when they were at some point considering detecting filesystem type and avoid doing fsync if it was ext3 - hopefully, I convinced them not to do it but I bet someone else had the same idea and actually implemented it. Not that I'd have too much pity with broken applications but I prefer to avoid breaking them silently like this... (In reply to comment #16) > As for comment #13 - I fully agree with Ted. When admin does "mount -t ext3 > ...", I think we should give him as close behavior to ext3 as possible by the > principle of the least surprise - and that means nodelalloc. If user wishes > better performance etc, he can always mount the filesystem as ext4 > explicitely. Under what circumstances, then, do we recommend mounting ext3 with the ext4 driver? The user will get different allocation behavior as well; will that also be a surprise? I just want to be clear in our recommendations about this stuff... > Also I remember some discussions with KDE guys when they were at some point > considering detecting filesystem type and avoid doing fsync if it was ext3 - > hopefully, I convinced them not to do it but I bet someone else had the same > idea and actually implemented it. Not that I'd have too much pity with broken > applications but I prefer to avoid breaking them silently like this... I think by having this config option at -all- we have really muddied the waters, to be honest ... but I suppose this is all better hashed outon the list, not in this bug. :) On Thu, Mar 11, 2010 at 04:00:27PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote: > Indeed, the logic in ext4_indirect_calc_metadata_amount seems to be broken. > Attached patch fixes it for me... Nice catch. Yeah, I really screwed that up! I'll get that into the ext4 patch queue, quickly, and get that pushed to Linus. - Ted |