As per the ext4 guide, ext4 will detect the replace-via-rename and replace-via-truncate patterns and force that any delayed allocation blocks are allocated such that at the next journal commit, in the default data=ordered mode, the data blocks of the new file are forced to disk before the rename() operation is committed. But it looks like this feature is not working anymore.

Kernel version: 2.6.39
Filesystem: ext4

Here is the sample code:

    ofstream myfile;
    myfile.open ("example.txt",std::ofstream::trunc);
    myfile << "Writing this to a file.\n";
    system("mv example.txt example.txt1");

Expected behaviour: ext4 should detect the truncate and allocate blocks for the file, so there should be no zero-length file after an abnormal shutdown (within 30 sec).

Actual results: The file has zero length after an abnormal reboot (power outage).
2.6.39 is 4 years old at this point. Have you tested upstream?

> Actual results: File is having zero-length after abnormal reboot(power
> outage).

How long after your sample code runs do you cut the power? Which "file?" Do you see "example.txt" or "example.txt1" post-reboot?
Same behaviour on kernel 3.14.27 as well. I cut the power after 15 sec, 20 sec and 25 sec; all three times the file had zero length.

For the above code, the file "example.txt1" is empty. If I remove the rename command (system("mv example.txt example.txt1")), then example.txt is also blank:

    ofstream myfile;
    myfile.open ("example.txt",std::ofstream::trunc);
    myfile << "Writing this to a file.\n";

It looks like the replace-via-truncate detection is not working. Uploaded a sample file.
Created attachment 185311: Sample code
Changing priority because this is a data loss issue. Is anyone working on this? Is there a contact person to discuss it with?
I'll look at it when I have time... If you want to be sure to avoid data loss, you should use data integrity syscalls (fsync & friends): http://lwn.net/Articles/457667/ The auto-alloc heuristics are just that; if you want guarantees, call fsync.
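For readers who want the guaranteed route mentioned above, here is a minimal sketch of replacing a file with an explicit fsync before the rename. The helper name `replace_file_durably` and the ".new" temporary suffix are assumptions made for illustration; they do not come from this report or from any library.

```cpp
#include <cerrno>
#include <cstdio>      // std::rename
#include <string>
#include <fcntl.h>     // open
#include <unistd.h>    // write, fsync, close, unlink

// Hypothetical helper: write `data` to a temporary file, fsync it, then
// rename it over `path`. Returns true on success.
bool replace_file_durably(const std::string& path, const std::string& data) {
    const std::string tmp = path + ".new";   // temporary name is an assumption

    int fd = ::open(tmp.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;

    // Write everything, retrying on short writes and EINTR.
    const char* p = data.data();
    size_t left = data.size();
    while (left > 0) {
        ssize_t n = ::write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR) continue;
            ::close(fd);
            ::unlink(tmp.c_str());
            return false;
        }
        p += n;
        left -= static_cast<size_t>(n);
    }

    // Force the new data to stable storage before the rename.
    if (::fsync(fd) != 0) {
        ::close(fd);
        ::unlink(tmp.c_str());
        return false;
    }
    if (::close(fd) != 0) {
        ::unlink(tmp.c_str());
        return false;
    }

    // Switch names only after the data is known to be on disk.
    if (std::rename(tmp.c_str(), path.c_str()) != 0) {
        ::unlink(tmp.c_str());
        return false;
    }
    return true;
}
```

A call such as `replace_file_durably("example.txt", "Writing this to a file.\n")` does not rely on any filesystem heuristic for the file contents. Applications that also need the rename itself to survive a crash commonly fsync the containing directory as well; that step is left out of this sketch.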
Thanks Eric. I have already gone through that link and many other forums as well. fsync and fdatasync are the guaranteed solution, but not all open-source libraries call fsync.
Yes, that's unfortunately true. FWIW, you mention that you tested v3.14; that's still over a year old. If you have the time, a test on latest upstream would be great.
Ok Eric. I will check v4.x.x and update you.
Ok, so there are 2 basic heuristics here.

One is that if we call ext4_truncate to size 0, we set the AUTO_DA_ALLOC flag so that it'll call ext4_alloc_da_blocks in ext4_release_file (essentially on close).

The other is that if we call rename, and we're overwriting an existing file, we call ext4_alloc_da_blocks.

ext4_alloc_da_blocks will start writeback on the file (i.e. the file which was truncated, or the new file overwriting the old file) if there are any delayed allocations still pending; if not, it does nothing.

Note, we don't get to ext4_truncate if the file is already zero length when you open it O_TRUNC.

Also, notice that if we strace your C++ program (with the rename call included), we see:

    open("example.txt", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
    rename("example.txt", "example.txt1") = 0
    write(3, "Writing this to a file.\n", 24) = 24
    close(3) = 0

so the rename happens before the write; even if that is overwriting an existing file, there are no delalloc blocks on the new file yet, so the rename heuristic does nothing in this case.

So there are a few prerequisites to make your C++ test work.
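The strace ordering above follows from std::ofstream buffering its output until the stream is flushed, closed or destroyed. As a hedged sketch, the test could be reordered as below so that write() and close() reach the kernel before the rename. The function name `write_then_rename` is hypothetical, and calling std::rename instead of system("mv ...") is a substitution made here, not part of the original test.

```cpp
#include <cstdio>    // std::rename
#include <fstream>
#include <string>

// Hypothetical variant of the reporter's test: the stream is closed
// before the rename, so the data is handed to the kernel first.
bool write_then_rename(const std::string& from, const std::string& to) {
    std::ofstream myfile(from, std::ofstream::trunc);
    if (!myfile) return false;

    myfile << "Writing this to a file.\n";
    myfile.close();                  // flushes the buffer: write(), then close()
    if (myfile.fail()) return false;

    // rename(2) via the C library instead of spawning "mv".
    return std::rename(from.c_str(), to.c_str()) == 0;
}
```

With this ordering the new file has pending delayed-allocation blocks when the rename runs, which is what the rename heuristic looks for. That heuristic still applies only when the destination already exists, and it remains a heuristic; only fsync gives a guarantee.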
IOW, it handles these two cases:

    fd = open("foo.new")
    write(fd,..)
    close(fd)
    rename("foo.new", "foo") // syncs out foo.new if it has delalloc blocks

and

    fd = open("foo", O_TRUNC)
    write(fd,..)
    close(fd) // syncs out foo if "foo" had blocks prior to the O_TRUNC truncate

Your testcase does this if foo.new doesn't already exist:

    fd = open("foo.new", O_TRUNC) // if foo.new has no blocks, O_TRUNC does nothing
    rename("foo.new", "foo") // foo.new has no delalloc blocks, does nothing
    write(fd)
    close(fd)

If "foo.new" does exist,

    fd = open("foo.new", O_TRUNC) // if foo.new has blocks, sets da_alloc flag
    rename("foo.new", "foo") // foo.new has no delalloc blocks, does nothing
    write(fd)
    close(fd) // syncs out the data

IOW, if example.txt starts with allocated blocks, this:

    # rm example.txt1
    # echo foobar > example.txt
    # sync
    # ./testcase

works as you hope, because testcase does:

    fd = open("example.txt", O_TRUNC) // example.txt has blocks, sets da_alloc flag
    rename("example.txt", "example.txt1") // "example.txt" has no delalloc blocks, nothing happens
    write(fd) // now we have delalloc blocks
    close(fd) // syncs out the data

So it's not that the heuristic is broken; your testcase just doesn't necessarily meet the conditions of the heuristic.
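To make the conditions above easier to check by hand, here is a toy model of the two heuristics as they are described in this thread. It is not kernel code: `ToyInode`, `ToyFs` and `reporter_sequence_triggers_writeback` are hypothetical names, and the model tracks only the flags mentioned in the explanation.

```cpp
#include <map>
#include <memory>
#include <string>

// Toy model only, NOT kernel code: it mirrors the conditions described
// in this thread and nothing else.
struct ToyInode {
    bool has_blocks = false;         // blocks already allocated on disk
    bool has_delalloc = false;       // written data awaiting allocation
    bool da_alloc_flag = false;      // set when a non-empty file is truncated to 0
    bool writeback_started = false;  // a heuristic pushed the new data out
};
using Handle = std::shared_ptr<ToyInode>;

struct ToyFs {
    std::map<std::string, Handle> names;

    // Stand-in for ext4_alloc_da_blocks: acts only if delalloc is pending.
    static void alloc_da_blocks(ToyInode& ino) {
        if (ino.has_delalloc) {
            ino.has_delalloc = false;
            ino.has_blocks = true;
            ino.writeback_started = true;
        }
    }

    Handle open_trunc(const std::string& name) {
        Handle& h = names[name];
        if (!h) h = std::make_shared<ToyInode>();  // O_CREAT
        if (h->has_blocks) {                       // truncate runs only if non-empty
            h->has_blocks = false;
            h->da_alloc_flag = true;
        }
        return h;
    }

    void write(const Handle& h) { h->has_delalloc = true; }

    void close(const Handle& h) {
        if (h->da_alloc_flag) alloc_da_blocks(*h);
    }

    void rename(const std::string& from, const std::string& to) {
        auto it = names.find(from);
        if (it == names.end() || from == to) return;
        Handle src = it->second;
        if (names.count(to)) alloc_da_blocks(*src);  // only when overwriting
        names[to] = src;
        names.erase(from);
    }
};

// Replays the reporter's syscall order: open(O_TRUNC), rename, write, close.
bool reporter_sequence_triggers_writeback(bool example_has_blocks) {
    ToyFs fs;
    if (example_has_blocks) {
        Handle old_file = std::make_shared<ToyInode>();
        old_file->has_blocks = true;
        fs.names["example.txt"] = old_file;
    }
    Handle fd = fs.open_trunc("example.txt");
    fs.rename("example.txt", "example.txt1");
    fs.write(fd);
    fs.close(fd);
    return fd->writeback_started;
}
```

In this model `reporter_sequence_triggers_writeback(false)` returns false and `reporter_sequence_triggers_writeback(true)` returns true, matching the two outcomes described above for a test case that starts without and with allocated blocks.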
Ok Eric. Got it. It looks like the only way is to call fsync and/or use the trunc/rename pattern in the right order. So the file system is working as expected.
The file system is working as expected. Changed the status accordingly.
Rakesh, if you are aware of truly broken programs that rename first and then write to the file (which means that they will lose data if they crash after the rename), let me know. The heuristics were designed to catch the most common cases of application brain-damage, to wit:

1) write foo.new
2) fail to use fsync(2) as they should
3) close the file descriptor for foo.new
4) rename foo.new to foo

The fact that we also catch the case of

1) truncate a file containing data down to zero
2) write a new version of the file, and hope you don't crash right after 1

was because, if I recall correctly, both GNOME and KDE had something like this in their library functions and a lot of programs were calling it. ***Sigh*** I believe their excuse was that it was too hard to copy the ACLs and xattrs from foo to foo.new, and by using a truncate, they wouldn't have to do all of that hard work to read the ACLs and xattrs from the old file, and set them on foo.new before doing the rename.

One especially brilliant application was rewriting the config file after each time the window was moved a pixel or two, so that the window location could be saved. So if you dragged the window around, the file would get written dozens if not hundreds of times. Just in case you ever wondered why many file system developers don't trust application / desktop programmers....
Just to be clear, no one should be *relying* on these heuristics. They are not mandated by POSIX, and there is no guarantee that future file systems will implement these heuristics. They are workarounds for broken applications, in the hopes that these broken applications will get **fixed**.
Hi Theodore, Thanks for your reply. I ran into this issue with my own code, so I am not aware of open-source libraries that do it this way. I agree with you: the replace-via-rename and replace-via-truncate heuristics in ext4 are good for broken applications, but for 100% durability, fsync is the solution.