Bug 16157 - Ecryptfs is very slow at listing directories with many files
Status: RESOLVED WILL_NOT_FIX
Alias: None
Product: File System
Classification: Unclassified
Component: ecryptfs
Hardware: All Linux
Importance: P1 normal
Assignee: Tyler Hicks
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-06-08 11:57 UTC by Pedro C
Modified: 2011-06-17 07:08 UTC

See Also:
Kernel Version: 2.6.34
Subsystem:
Regression: No
Bisected commit-id:


Description Pedro C 2010-06-08 11:57:10 UTC
Listing an ecryptfs directory with lots of files is very slow. Here is how to reproduce this:

• Mount an ecryptfs volume (using encrypted home directories or ~/Private/ for example)
• Create a directory with 10,000 files in it
• Unmount and remount the directory (or logout/login or reboot)
• Run ls on the directory

Running find with no arguments on the same directory returns immediately, so I guess ls is stat()'ing each file and thus needs to load every single one from disk and decrypt it.
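A rough way to check that guess is to time a names-only listing against one that also calls stat() on every entry. The sketch below is not part of the original report: the helper name `time_listing` is made up for illustration, and the numbers only mean something when it is pointed at a freshly remounted eCryptfs directory rather than the throwaway directory used in the demo.

```python
import os
import tempfile
import time


def time_listing(path):
    """Return (names_only_seconds, names_plus_stat_seconds, entry_count)."""
    start = time.perf_counter()
    names = os.listdir(path)  # reads directory entries only, no per-file stat
    names_only = time.perf_counter() - start

    start = time.perf_counter()
    for name in names:
        # One stat call per entry, which is roughly what `ls -l` has to do.
        os.lstat(os.path.join(path, name))
    with_stat = time.perf_counter() - start
    return names_only, with_stat, len(names)


if __name__ == "__main__":
    # Demo on a temporary directory; replace with an eCryptfs path to test.
    with tempfile.TemporaryDirectory() as demo_dir:
        for i in range(1000):
            with open(os.path.join(demo_dir, f"file{i}"), "wb") as fh:
                fh.write(b"x")
        names_s, stat_s, count = time_listing(demo_dir)
        print(f"{count} entries: names only {names_s:.4f}s, with stat {stat_s:.4f}s")
```

If the guess is right, the second figure should dominate on a cold eCryptfs mount while the first stays small.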

Here is a user account of how this problem makes encrypted home directories infeasible:

http://www.satansgarden.org/2010/03/05/removing-encryption-from-home-directories-in-ubuntu-9-10/
Comment 1 Tyler Hicks 2010-06-17 22:41:16 UTC
Hi Pedro - Thanks for opening the bug.  You're correct that the stat() is what hurts performance.  It forces a lookup to be done, and the first lookup on each eCryptfs file is expensive because we have to read the file's eCryptfs metadata from disk.  The unencrypted file size and the magic eCryptfs marker are stored in the first 16 bytes of the metadata.  We read these two values to help determine if the eCryptfs metadata is stored in the file header or in an extended attribute, to determine the decrypted file size, and to ensure that we're dealing with an eCryptfs file.
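To make the 16-byte read concrete, here is a minimal sketch of parsing those bytes from a lower (encrypted) file whose metadata is in the file header. The exact layout is an assumption taken from a reading of the kernel sources, not something stated in this thread: a big-endian 64-bit size, followed by an 8-byte marker whose second half equals the first half XORed with the constant 0x3c81b7f5. The function name is hypothetical.

```python
import struct

# Assumption: marker constant as defined in the kernel's eCryptfs headers.
MAGIC_ECRYPTFS_MARKER = 0x3C81B7F5


def parse_header_prefix(prefix: bytes):
    """Return (decrypted_size, looks_like_ecryptfs) from the first 16 bytes."""
    if len(prefix) < 16:
        raise ValueError("need at least 16 bytes of the lower file")
    # Assumed layout: 8-byte big-endian size, then two 4-byte marker words.
    size, m1, m2 = struct.unpack(">QII", prefix[:16])
    return size, (m1 ^ MAGIC_ECRYPTFS_MARKER) == m2


if __name__ == "__main__":
    # Synthetic prefix for a 1-byte file; real input would come from
    # reading the first 16 bytes of a file in the lower directory.
    sample = struct.pack(">QII", 1, 0x12345678, 0x12345678 ^ MAGIC_ECRYPTFS_MARKER)
    print(parse_header_prefix(sample))
```

Even though the parse itself is trivial, it requires opening and reading every lower file once, which is the cost discussed here.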

We may be able to delay this read when the ecryptfs_xattr_metadata mount argument isn't given.  Very few people, if any, use the xattr metadata support, so it shouldn't be hurting performance for everyone else.  I'll kick around the idea and see if I come up with anything.

As a temporary workaround, if you want a basic, quick directory listing, be sure to use ls --color=none.  Your distro may have ls aliased to "ls --color=always"; mine does.  For ls to decide what color to use, it has to stat().
Comment 2 Pedro C 2010-06-18 13:10:57 UTC
Thanks for the reply and the tip about ls. My main use case was actually Python's os.walk, which I think has to stat each directory entry to see which ones are files and which are directories. It would be great to have this be fast when xattrs aren't being used. Thanks for looking into it.
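For reference, a tree walk that asks the directory entry for its type instead of calling stat on each name can be sketched with os.scandir, which later Python versions provide (it did not exist when this comment was written). `entry.is_dir()` can answer from the entry type returned by the directory read when the file system supplies one, and falls back to a stat otherwise. Whether this avoids the eCryptfs lookup cost is an assumption that has not been tested here; `count_entries` is a hypothetical helper name.

```python
import os


def count_entries(top):
    """Count (files, dirs) below top without an explicit stat per name."""
    files = dirs = 0
    stack = [top]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                # Uses the type from the directory entry when available.
                if entry.is_dir(follow_symlinks=False):
                    dirs += 1
                    stack.append(entry.path)
                else:
                    files += 1
    return files, dirs


if __name__ == "__main__":
    print(count_entries("."))
```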
Comment 3 Tyler Hicks 2010-06-18 16:32:43 UTC
After giving this more thought, this is a design issue that I don't see a way around.  We have to read the first 8 bytes of the metadata to know the decrypted file size.  If we don't do it in ecryptfs_lookup(), we will have to do it in ecryptfs_getattr() in order for the stat() to be correct.  Both of those functions are in the path of a stat() syscall.

We can't skip this step when the metadata xattr isn't used because the header size is still variable (between 8 kB and PAGE_CACHE_SIZE on the system creating the file) and the padding at the end is variable (between 0 and PAGE_CACHE_SIZE bytes).
Comment 4 Pedro C 2010-06-18 16:49:26 UTC
Can't the metadata be stuffed in the directory entry instead of the file so it can be read all in one go? I guess it's a big change of format but as it stands ecryptfs won't work to do full encryption of user directories, which is a shame.
Comment 5 Tyler Hicks 2010-06-18 17:00:58 UTC
That would be a significant design change.  Since eCryptfs stacks on top of another file system, we are restricted to placing metadata at the front of the file or in an xattr.

It isn't fair to say that eCryptfs won't work to encrypt user directories because very few people have 100,000 files in a single directory.  This is the first time I've heard of this performance complaint.

I'm hoping to begin some design changes towards the end of the year.  I will definitely keep this issue in mind.  Thanks again for pointing it out.
Comment 6 Pedro C 2010-06-18 19:01:58 UTC
I see the problem, thanks for looking into it. This tends to crop up when you have something with lots of files. Source code trees and digital photography both generate those and are quite common. They don't have to necessarily all be in the same directory, they just need to be enumerated all at once by an app. Accessing a photo library or grep'ing a source tree will tend to do that.
Comment 7 Pedro C 2011-06-17 07:08:15 UTC
Any progress on the design changes to fix this?

msznapka in the Ubuntu bug report did some benchmarking:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/587408/comments/6

In a nutshell to list (ls -lR) 10k 1 byte files:

ecryptfs: 64.6s
encfs: 0.9s
unencrypted: 0.7s

Seems like encfs doesn't have this issue and can be a reasonable substitute as it's not much slower than normal.

To give you an idea of how important this is: my ~/Photos/ dir alone has 40k files and a kernel tree has 37k, so a test with 10k files is far from academic, and this does indeed make ecryptfs unsuitable for encrypting user directories at the moment.
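Put per file, the timings quoted above work out to roughly 6.5 ms per file for ecryptfs, about 92 times the unencrypted run. The short sketch below only redoes that arithmetic on the three quoted figures; the function names are illustrative, and nothing here is a new measurement.

```python
FILES = 10_000  # number of 1-byte files in the quoted benchmark

# Wall-clock seconds for `ls -lR`, as quoted in this comment.
timings_s = {"ecryptfs": 64.6, "encfs": 0.9, "unencrypted": 0.7}


def per_file_ms(total_seconds, files=FILES):
    """Average cost per file in milliseconds."""
    return total_seconds * 1000.0 / files


def slowdown(name, baseline="unencrypted"):
    """How many times slower `name` is than the baseline."""
    return timings_s[name] / timings_s[baseline]


if __name__ == "__main__":
    for name, seconds in timings_s.items():
        print(f"{name}: {per_file_ms(seconds):.2f} ms/file, {slowdown(name):.1f}x")
```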
