Bug 103111 - auto_da_alloc mount option not working
Status: RESOLVED INVALID
Alias: None
Product: File System
Classification: Unclassified
Component: ext4
Hardware: Intel Linux
Importance: P1 high
Assignee: fs_ext4@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-08-19 13:34 UTC by Rakesh
Modified: 2015-08-29 19:33 UTC

See Also:
Kernel Version: 2.6.39
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Sample code (234 bytes, text/x-c++src)
2015-08-20 05:54 UTC, Rakesh

Description Rakesh 2015-08-19 13:34:48 UTC
As per the ext4 guide:
ext4 will detect the replace-via-rename and replace-via-truncate patterns and force that any delayed allocation blocks are allocated such that at the next journal commit, in the default data=ordered mode, the data blocks of the new file are forced to disk before the rename() operation is committed.


But it looks like this feature is not working anymore.

Kernel version: 2.6.39.
Filesystem: ext4

Here is the sample code:

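  #include <cstdlib>
  #include <fstream>
  using namespace std;

  int main() {
    ofstream myfile;
    myfile.open("example.txt", std::ofstream::trunc);  // create or truncate the file
    myfile << "Writing this to a file.\n";             // text goes into the stream buffer
    system("mv example.txt example.txt1");             // rename while the stream is still open
    return 0;                                          // myfile is flushed and closed on destruction
  }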

Expected behaviour: ext4 should detect the truncate and allocate the delayed-allocation blocks for the file, so there should be no zero-length file after an abnormal shutdown (within 30 sec).

Actual results: The file has zero length after an abnormal reboot (power outage).
Comment 1 Eric Sandeen 2015-08-19 16:13:38 UTC
2.6.39 is 4 years old at this point.  Have you tested upstream?

> Actual results: File is having zero-length after abnormal reboot(power
> outage).

How long after your sample code runs do you cut the power?

Which "file?"  Do you see "example.txt" or "example.txt1" post-reboot?
Comment 2 Rakesh 2015-08-20 05:48:19 UTC
Same behaviour on kernel 3.14.27 as well.

Power was cut after 15 sec, 20 sec and 25 sec. All three times, the file had zero length.

For the above code, the file "example.txt1" is empty.

If I remove the rename command (system("mv example.txt example.txt1")), example.txt is also blank.

  ofstream myfile;

  myfile.open ("example.txt",std::ofstream::trunc);

  myfile << "Writing this to a file.\n";
Comment 3 Rakesh 2015-08-20 05:53:20 UTC
Same behaviour on kernel 3.14.27 as well.

Power was cut after 15 sec, 20 sec and 25 sec. All three times, the file had zero length.

For the above code, the file "example.txt1" is empty.

If I remove the rename command (system("mv example.txt example.txt1")), example.txt is also blank.

  ofstream myfile;

  myfile.open ("example.txt",std::ofstream::trunc);

  myfile << "Writing this to a file.\n";

Looks like the replace-via-truncate detection is not working.

Uploaded sample file.
Comment 4 Rakesh 2015-08-20 05:54:04 UTC
Created attachment 185311
Sample code
Comment 5 Rakesh 2015-08-21 14:56:55 UTC
Raising the priority because this is a data loss issue.

Is anyone working on this? Is there a contact person to discuss this with?
Comment 6 Eric Sandeen 2015-08-21 14:59:51 UTC
I'll look at it when I have time...

If you want to be sure to avoid data loss, you should use data integrity syscalls (fsync & friends): http://lwn.net/Articles/457667/

The auto-alloc heuristics are just that; if you want guarantees, call fsync.
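As a rough illustration of "call fsync", here is a minimal C++ sketch that writes a file through the POSIX calls directly and syncs it before closing. The helper name write_and_sync is hypothetical and not taken from the attachment; the sketch assumes a Linux/POSIX system and reports any failed step as a plain false.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper (not from the bug's attachment): truncate or create
// `path`, write `data`, and fsync() before closing.
// Returns true only if every step, including fsync(), succeeded.
bool write_and_sync(const std::string& path, const std::string& data) {
    assert(!path.empty());
    int fd = ::open(path.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd < 0) return false;

    std::size_t off = 0;
    while (off < data.size()) {
        ssize_t n = ::write(fd, data.data() + off, data.size() - off);
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted, retry the write
            ::close(fd);
            return false;
        }
        off += static_cast<std::size_t>(n);
    }

    // Ask the kernel to push the file's data to stable storage
    // instead of leaving it to delayed allocation and later writeback.
    bool ok = (::fsync(fd) == 0);
    if (::close(fd) != 0) ok = false;
    return ok;
}
```

A call such as write_and_sync("example.txt", "Writing this to a file.\n") returns only after fsync() has completed. A crash between the truncate and the fsync can still leave an empty file, so this only covers the case where the call has returned successfully.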
Comment 7 Rakesh 2015-08-21 15:13:08 UTC
Thanks Eric. I have already gone through that link and many other forums as well. fsync and fdatasync are the guaranteed solution, but not all open-source libraries call fsync.
Comment 8 Eric Sandeen 2015-08-21 15:16:25 UTC
Yes, that's unfortunately true.
FWIW, you mention that you tested v3.14; that's still over a year old.
If you have the time, a test on latest upstream would be great.
Comment 9 Rakesh 2015-08-21 18:36:08 UTC
OK Eric. I will check v4.x.x and update you.
Comment 10 Eric Sandeen 2015-08-25 17:49:33 UTC
Ok, so there are 2 basic heuristics here.

One is that if we call ext4_truncate to size 0, we set the AUTO_DA_ALLOC flag so that it'll call ext4_alloc_da_blocks in ext4_release_file (essentially on close).

The other is that if we call rename, and we're overwriting an existing file, we call ext4_alloc_da_blocks.

ext4_alloc_da_blocks will start writeback on the file (i.e. the file which was truncated, or the new file overwriting the old file) if there are any delayed allocations still pending; if not, it does nothing.

Note, we don't get to ext4_truncate if the file is already zero length when you open it O_TRUNC.

Also, notice that if we strace your c++ program (with the rename call included), we see:

open("example.txt", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
rename("example.txt", "example.txt1")   = 0
write(3, "Writing this to a file.\n", 24) = 24
close(3)                                = 0

so the rename happens before the write; even if that is overwriting an existing file, there are no delalloc blocks on the new file yet, so the rename heuristic does nothing in this case.

So there are a few prerequisites to make your C++ test work.
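The ordering seen in the strace comes from stream buffering: the text stays in the ofstream buffer until the stream is flushed or closed, so the rename reaches the kernel first. A minimal sketch of a reordered test, assuming the intent is to write a new file and then rename it over an existing one, could look like this. The function name and the file names passed to it are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical rework of the test case: write and close the new file first,
// then rename it over the final name.
bool write_then_rename(const std::string& tmp_name,
                       const std::string& final_name,
                       const std::string& text) {
    assert(tmp_name != final_name);
    std::ofstream out(tmp_name, std::ofstream::trunc);
    if (!out) return false;
    out << text;
    out.close();  // flushes the stream buffer; write(2) and close(2) happen here
    if (out.fail()) return false;
    // The rename is now issued after the data has been handed to the kernel.
    return std::rename(tmp_name.c_str(), final_name.c_str()) == 0;
}
```

This only puts the syscalls in the order write, close, rename. It does not call fsync (std::ofstream does not expose the file descriptor), so whether the data survives a power cut still depends on the heuristic, which is not a guarantee.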
Comment 11 Eric Sandeen 2015-08-25 18:05:08 UTC
IOW, it handles these two cases:

fd = open("foo.new")
write(fd,..)
close(fd)
rename("foo.new", "foo") // syncs out foo.new if it has delalloc blocks

and

fd = open("foo", O_TRUNC)
write(fd,..)
close(fd) // syncs out foo if "foo" had blocks prior to the O_TRUNC truncate

Your testcase does this if foo.new doesn't already exist:

fd = open("foo.new", O_TRUNC) // if foo.new has no blocks, O_TRUNC does nothing
rename("foo.new", "foo") // foo.new has no delalloc blocks, does nothing
write(fd)
close(fd) 

If "foo.new" does exist, 

fd = open("foo.new", O_TRUNC) // if foo.new has blocks, sets da_alloc flag
rename("foo.new", "foo") // foo.new has no delalloc blocks, does nothing
write(fd)
close(fd) // syncs out the data

IOW, if example.txt starts with allocated blocks, this:

# rm example.txt1
# echo foobar > example.txt
# sync
# ./testcase

works as you hope, because testcase does:

fd = open("example.txt", O_TRUNC) // example.txt has blocks, sets da_alloc flag
rename("example.txt", "example.txt1") // "example.txt" has no delalloc blocks, nothing happens
write(fd) // now we have delalloc blocks
close(fd) // syncs out the data


So it's not that the heuristic is broken; your testcase just doesn't necessarily meet the conditions of the heuristic.
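For code that should not depend on the heuristic at all, the usual replace-via-rename sequence with explicit syncs can be sketched as below. This is only an illustration under stated assumptions: the helper name replace_via_rename and the ".new" suffix are hypothetical, a Linux/POSIX system is assumed, and the directory fsync at the end is a common extra step that this thread does not discuss.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <string>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: write `data` to "<path>.new", fsync it, rename it over
// `path`, then fsync the containing directory.
bool replace_via_rename(const std::string& path, const std::string& data) {
    assert(!path.empty());
    const std::string tmp = path + ".new";

    int fd = ::open(tmp.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd < 0) return false;

    std::size_t off = 0;
    while (off < data.size()) {
        ssize_t n = ::write(fd, data.data() + off, data.size() - off);
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted, retry the write
            ::close(fd);
            ::unlink(tmp.c_str());
            return false;
        }
        off += static_cast<std::size_t>(n);
    }

    // Sync the new file's data before the rename makes it visible.
    bool ok = (::fsync(fd) == 0);
    if (::close(fd) != 0) ok = false;
    if (!ok || std::rename(tmp.c_str(), path.c_str()) != 0) {
        ::unlink(tmp.c_str());
        return false;
    }

    // Sync the directory so the rename itself is on disk as well.
    const std::size_t slash = path.find_last_of('/');
    const std::string dir = (slash == std::string::npos) ? "."
                          : (slash == 0) ? "/"
                          : path.substr(0, slash);
    int dfd = ::open(dir.c_str(), O_RDONLY | O_DIRECTORY);
    if (dfd < 0) return false;
    ok = (::fsync(dfd) == 0);
    ::close(dfd);
    return ok;
}
```

With this ordering, a reader of `path` sees either the old contents or the complete new contents, and the result does not depend on whether auto_da_alloc is enabled.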
Comment 12 Rakesh 2015-08-28 16:49:38 UTC
OK Eric, got it. It looks like the only way is to call fsync and/or follow the truncate/rename patterns exactly.

So file system is working as expected.
Comment 13 Rakesh 2015-08-28 16:51:55 UTC
File system working fine as expected.
Changed status accordingly.
Comment 14 Theodore Tso 2015-08-29 03:25:15 UTC
Rakesh, if you are aware of truly broken programs that rename first and then write to the file (which means that they will lose data if they crash after the rename), let me know.  The heuristics were designed to catch the most common cases of application brain-damage, to wit:

1)  write foo.new
2)  fail to use fsync(2) as they should
3)  close the file descriptor for foo.new
4)  rename foo.new to foo

The fact that we also catch the case of

1)  truncate a file containing data down to zero
2)  write a new version of the file, and hope you don't crash right after 1

was because, if I recall correctly, both GNOME and KDE had something like this in their library functions and a lot of programs were calling it.   ***Sigh***

I believe their excuse was that it was too hard to copy the ACLs and xattrs from foo to foo.new, and by using a truncate, they wouldn't have to do all of that hard work to read the ACLs and xattrs from the old file, and set them on foo.new before doing the rename.

One especially brilliant application was rewriting the config file after each time the window was moved a pixel or two, so that the window location could be saved.   So if you dragged the window around, the file would get written dozens if not hundreds of times.

Just in case you ever wondered why many file system developers don't trust application / desktop programmers....
Comment 15 Theodore Tso 2015-08-29 03:28:27 UTC
Just to be clear, no one should be *relying* on these heuristics.  They are not mandated by POSIX, and there is no guarantee that future file systems will implement these heuristics.  They are workarounds for broken applications, in the hopes that these broken applications will get **fixed**.
Comment 16 Rakesh 2015-08-29 19:33:54 UTC
Hi Theodore, Thanks for your reply.

I faced this issue with my own code, so I am not aware of open-source libs that do it this way.

I agree with you. The replace-via-rename and replace-via-truncate heuristics in ext4 are good for broken applications, but for 100% durability, fsync is the solution.
