Bug 33372

Summary: ecrypts does not preserve correctly timestamps for "cp -p"
Product: File System Reporter: rocko (rockorequin)
Component: ecryptfsAssignee: fs_ecryptfs
Status: CLOSED CODE_FIX    
Severity: normal CC: akpm, code, florian, maciej.rutecki, rjw
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.39-rc3 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 32012    

Description rocko 2011-04-17 05:18:09 UTC
With the 2.6.39-rc3 kernel, assuming I have a file tt in a non-ecryptfs file system and I do the following:

 mkdir test
 sudo mount -t ecryptfs test test -o ecryptfs_cipher=aes,ecryptfs_key_bytes=16,ecryptfs_passthrough=n,ecryptfs_enable_filename_crypto=y
 cd test
 cp -p ../tt .

then the recently-copied file 'tt' in ecrypts has its modified timestamp set to when it was copied rather than that of original file.

This also happens when copying within the ecryptfs system.

Strangely, it doesn't always happen immediately - the timestamp may be correct for a minute or so - so maybe there is some caching involved.
Comment 1 Andrew Morton 2011-04-18 22:20:58 UTC
From which kernel version did we regress?  Was 2.6.38 OK?

Thanks.
Comment 2 rocko 2011-04-19 01:11:08 UTC
Yes, 2.6.38 works fine. I'm pretty sure that 2.6.39-rc1 is the first version with the regression.
Comment 3 Tyler Hicks 2011-04-19 16:19:01 UTC
Thanks for the report, rocko.

57db4e8d73ef2b5e94a3f412108dff2576670a8a ("ecryptfs: modify write path to encrypt page in writepage") caused this regression. That patch changed eCryptfs from using write-through caching to write-back caching. It is a useful change because before that patch, sequentially writing 4096 bytes, one byte at at time, caused 4096 page encrypt operations and writes to the lower filesystem. With that patch, writes are cached in the eCryptfs page cache until a page is written back, normally resulting in a single page encryption and write to the lower filesystem. It is important to note that eCryptfs uses vfs_write(), on a file opened in the lower filesystem, to write out a page.

Taking a look at the strace output of cp -p from a file in an ext4 fs (/home/tyhicks/foo) to an eCryptfs mount (/upper/foo):

open("/home/tyhicks/foo", O_RDONLY)     = 3
fstat(3, {st_mode=S_IFREG|0664, st_size=1, ...}) = 0
open("/upper/foo", O_WRONLY|O_CREAT|O_EXCL, 0600) = 4
fstat(4, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
read(3, "\0", 32768)                    = 1
write(4, "\0", 1)                       = 1
read(3, "", 32768)                      = 0
utimensat(4, NULL, {{1303227483, 524432770}, {1303227476, 671099353}}, 0) = 0

This worked before 57db4e8d because the write-through cache model meant that all vfs_writes()'s to the lower filesystem were done before the utimensat() call. After 57db4e8d, all vfs_write()'s to the lower fs are not done before the utimensat() call. This can easily be seen by running sync after the cp -p:

$ cp -p /home/tyhicks/foo /upper/
$ stat -c '%20n: %y' /home/tyhicks/foo /upper/foo
   /home/tyhicks/foo: 2011-04-19 10:37:56.671099353 -0500
          /upper/foo: 2011-04-19 10:37:56.671099353 -0500
$ sync
$ stat -c '%20n: %y' /home/tyhicks/foo /upper/foo
   /home/tyhicks/foo: 2011-04-19 10:37:56.671099353 -0500
          /upper/foo: 2011-04-19 11:01:40.147793258 -0500

I'll need to think about it a little more, but the easiest way to resolve this seems to be saving off the lower inode's i_mtime early in the ecryptfs_writepage() path and then restoring it before returning.
Comment 4 Tyler Hicks 2011-04-19 22:38:55 UTC
(In reply to comment #3)
> I'll need to think about it a little more, but the easiest way to resolve
> this
> seems to be saving off the lower inode's i_mtime early in the
> ecryptfs_writepage() path and then restoring it before returning.

After giving it some more thought, we probably shouldn't be directly changing the lower inode's i_mtime and a getattr() + setattr() around each write to the lower filesystem is very ugly.

I think we'll have to do something similar to what NFS does by writing out dirty data early in the ecryptfs_getattr() and ecryptfs_setattr() paths. This means that all vfs_write()'s to the lower fs will have been completed by the time that the utimensat() => notify_change(ecryptfs) => ecryptfs_setattr() => notify_change(lower fs) call chain happens and the updated mtime value won't be clobbered.
Comment 5 Tyler Hicks 2011-04-26 14:35:02 UTC
Fixed by 5be79de2e1ffa19d871a494697cf76cddee93384. Linus has merged the fix and it should be released in 2.6.39-rc5.
Comment 6 Florian Mickler 2011-04-27 06:12:30 UTC
Thanks for the update.