Bug 92271 - Provide a way to really delete files, please
Status: RESOLVED WILL_NOT_FIX
Product: File System
Classification: Unclassified
Component: ext4
Hardware: All Linux
Importance: P1 normal
Assignee: fs_ext4@kernel-bugs.osdl.org
Reported: 2015-01-29 11:47 UTC by Alexander Holler
Modified: 2015-02-03 18:20 UTC
Kernel Version: All
Regression: No

Description Alexander Holler 2015-01-29 11:47:59 UTC
Filesystems should offer the user a way to really delete files by really destroying the contents of a deleted file.

E.g. by overwriting the deleted content or by using "Secure trim".

See here for a more elaborate flame:

https://lkml.org/lkml/2015/1/24/41

Thanks in advance.
Comment 1 Alexander Holler 2015-01-29 19:13:08 UTC
Just to avoid any misunderstanding:

I'm speaking about a way to selectively securely delete one file. Having to shred a whole device or partition isn't appropriate.

One shouldn't have to burn down a whole house in order to destroy a small piece of paper.
Comment 2 Koosha KM 2015-01-29 19:30:11 UTC
What's the problem with doing 'shred -u'?
Comment 3 Alexander Holler 2015-01-29 19:49:51 UTC
It doesn't work on flash-based devices, which have their own logic (e.g. SD cards or SSDs; there, "Secure Trim" is needed).

And just overwriting a file by using open(); write(); close(); likely doesn't work with ext4 on traditional hard disks (or similar block devices) either.

However, I'm not sure about that last point, as I'm not really aware of the internal workings of ext4.

At least I've never read a statement anywhere that writing the same number of bytes to a file as are already there will for sure make the fs overwrite the same on-disk blocks the file already occupies.

So a special syscall or similar is needed to instruct the fs to overwrite the same, already used blocks (even if they are stored only in the metadata because the file is small enough).
Comment 4 Alexander Holler 2015-01-29 20:59:20 UTC
But I have to admit that ext4 seems to be better than btrfs in regard to that problem.

At least the quick test I just did makes me believe that using shred on ext4 might overwrite the previously used blocks.

See bug #92271 to see what I did to quickly test it.

But the problem with SSDs and similar still exists.

Besides that, the FS should somehow promise that it really has overwritten the old blocks (even if it doesn't really have control over what the device does, unless something like "Secure Trim" is used).
Comment 5 Alexander Holler 2015-01-29 21:01:25 UTC
Sorry, the bug I've filed for BTRFS is bug #92261 (has the same topic).
Comment 6 Eric Sandeen 2015-01-29 21:20:00 UTC
As long as you don't truncate the file first, overwriting an existing ext4 file will indeed overwrite the existing blocks allocated to that file.  Ext4 is not a copy on write filesystem.

FWIW, recent kernels and recent util-linux also have "fstrim --secure" to issue a secure discard post-deletion.

-Eric
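
A minimal sketch of the in-place overwrite described in this comment, assuming a non-copy-on-write filesystem such as ext4 on a device that rewrites blocks in place. The helper name `overwrite_in_place` is hypothetical. The important detail is opening the file without truncating it, so the blocks stay allocated while they are overwritten. It gives no guarantee on SSDs or on thin-provisioned or copy-on-write storage.

```python
import os


def overwrite_in_place(path, chunk_size=1 << 20):
    """Overwrite a file's existing contents with zeros, then unlink it.

    Hypothetical helper: it only reaches the old blocks if the filesystem
    and the device both rewrite in place (no copy-on-write, no FTL remapping).
    Returns the number of bytes that were overwritten.
    """
    size = os.path.getsize(path)
    zeros = b"\0" * chunk_size
    # "r+b" opens for update WITHOUT truncating, so allocated blocks are kept.
    with open(path, "r+b") as f:
        remaining = size
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write(zeros[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # push the overwrite to the device before unlinking
    os.unlink(path)
    return size
```

Opening with "wb" instead would truncate the file first, which is exactly the case excluded above: the filesystem may then hand out different blocks for the new data.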
Comment 7 Alexander Holler 2015-01-29 21:55:50 UTC
The problem is the missing promise from the FS that it has overwritten the used blocks.

Even using fstrim --secure doesn't give that promise, because the user doesn't know if the FS has given the blocks back to the device (as free).

And as long as the FS doesn't give such a promise, the only way for the user to be sure is to shred the whole device/partition.

So in case of ext4, implementing a syscall like s[ecure_]unlink() doesn't seem to be a big complicated thing and would be a very great start in solving the problem.

Of course that might not be the final, ultimate solution, because it doesn't solve the problem of what happens when files have been modified, moved and then deleted. Here something like the 's' bit could help.

But I'm a friend of starting with small steps and I think a special unlink syscall would be a great and already useable start in solving the problem.
Comment 8 Alexander Holler 2015-01-29 22:02:48 UTC
Or just make the promise that a file which has the 's' bit set, will be overwritten with unlink(). That might avoid an additional syscall.
Comment 9 Alexander Holler 2015-01-29 22:11:31 UTC
Unfortunately that 's' flag wouldn't be usable for other filesystems which don't know about it, so I would still prefer a special unlink().
Comment 10 Theodore Tso 2015-01-31 17:21:37 UTC
The problem is that for flash-based devices, there ultimately is no way to be completely sure. Even in the case of eMMC devices that support secure trim, that is only going to trim the most recent flash page where the data was located. The FTL could have copied the data to another flash page, and not gotten around to erasing the older flash page. And most flash devices don't even have secure trim, and trim is a discretionary command that the flash device is free to ignore whenever it feels like it. So ultimately, this is a Very Hard Problem.

It's true that we could try to do better at it, but ultimately most users will never know if they can actually trust a file that has been "securely unlinked", whether you do this with a system call or by honoring the 's' chattr bit. It will depend on the file system you are using and the sort of storage stack used by the file system, and so in practice having such a "secure unlink" feature could do more harm than good, because many users could end up trusting it when they really shouldn't.

In terms of your original use case, the simplest solution is to simply not let the signing keys hit a storage device at all.  So for example, you could use a ramdisk.  Perhaps more convenient for users who are not running as root is to use tmpfs.   Try seeing if /var/run/$UID is present, or if not, fall back to a directory in /tmp.   Yes, it's possible for tmpfs to be written to swap, but you can get around that problem by forking a process which mmap's the private keys, and then calls mlock(2) to make sure they stay in memory.   Each user is allowed to mlock 64k of memory by default, which should be plenty for storing some private keys that you want to discard when you're done with the kernel compile.

This has the advantage of working for *everyone*, whereas most SSDs don't have secure trim, and most kernel developers, not having the patience of saints, and not trusting the reliability of cheap flash, are unlikely to be building on top of eMMC flash.

The final thing to consider is what exactly your threat environment is. If the attacker has sufficient privileges to steal deleted files from your build machine, you have much bigger problems. The attacker would be much better served to try to compromise the C compiler, or simply make a copy of the keys during the build process. Sure, there are scenarios where the bad guy might be able to steal the disk after the build was finished, and before the disk blocks get reused, and somehow didn't have the privileges to play monkey business during the build. Such scenarios can be imagined. But is it worth it to spend huge amounts of engineering time creating a "secure unlink" that is only guaranteed to work on storage devices which are highly unlikely to be used for kernel builds in the first place?

This is, of course, open source, so if you want to send patches, we'll be happy to review them.    But in terms of a feature request, this is going to be extremely low priority, especially since there are workarounds which are much better, more applicable, and more secure than your proposed solution.
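
The tmpfs fallback suggested above can be sketched roughly as follows. This is an illustration under assumptions, not a vetted recipe: `pick_key_dir` and `memlock_limit` are hypothetical names, and `XDG_RUNTIME_DIR` and `/run/user/$UID` are candidates added here that the comment does not mention (it names `/var/run/$UID` and `/tmp`). The sketch does not verify that the chosen directory really is a tmpfs, and it only reports the mlock(2) limit rather than locking anything.

```python
import os
import resource
import tempfile


def pick_key_dir(candidates=None):
    """Return the first existing, writable directory from the candidates.

    The default list is an assumption: per-user runtime directories
    (commonly tmpfs) first, then the system temp directory as a fallback.
    """
    if candidates is None:
        uid = os.getuid()
        candidates = [
            os.environ.get("XDG_RUNTIME_DIR", ""),  # assumption, not in the comment
            "/run/user/%d" % uid,                   # assumption, not in the comment
            "/var/run/%d" % uid,                    # the path named in the comment
            tempfile.gettempdir(),                  # fallback, usually /tmp
        ]
    for d in candidates:
        if d and os.path.isdir(d) and os.access(d, os.W_OK):
            return d
    raise RuntimeError("no usable directory found")


def memlock_limit():
    """Soft RLIMIT_MEMLOCK in bytes: how much an unprivileged process may mlock."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return soft
```

A build script could call `pick_key_dir()` to decide where temporary signing keys go, and compare the key size against `memlock_limit()` before trying to pin them in memory.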
Comment 11 Alexander Holler 2015-01-31 18:36:36 UTC
Thanks for closing it with an invalid instead of at least a wontfix.

I don't care about military security levels or the use case the discussion evolved from. I have suffered from that problem for 30 years, and if FS devs even refuse to implement some simple steps (like giving userspace a promise to overwrite something), this problem will never go away.

But I expected nothing else.
Comment 12 Alexander Holler 2015-01-31 20:56:12 UTC
I've taken the time to write a bit more.

As already said, this bug isn't about the little problem which led me to
write this bug.

It's a general problem.

People should be able to give away their phones, tablets, laptops or other
electronic devices by being able to delete everything they don't want
to give away, including but not restricted to contact lists, photos,
messages and any other documents with information they don't want to
share with others or see become public.

And it doesn't help if filesystem developers now blame storage for not
offering a way to delete blocks. Storage people will just answer that
filesystems haven't deleted files anyway in the past 30 years, so why
should storage offer a way to do so? And they are right; at least I'm
not aware of any filesystem which offers the user a way to selectively
delete a file.

So we are now in a catch-22, which is worse than ever as it becomes
increasingly hard, if not impossible, to erase storage, especially because
storage will be more and more fixed (soldered) to devices, and devices
often don't offer a way to destroy the content on the storage.

Maybe you will think again about it, the next time you want to give away
your old phone or tablet.
Comment 13 Theodore Tso 2015-02-01 01:51:46 UTC
I don't like making promises that I can't be 100% sure I can keep. For example, it's possible to create storage devices that simulate a read/write disk but which are backed by a write-once drive, where the copy-on-write is done at the storage device level. You can set up such a beast using ceph, for example.

The only real solution if you want to be able to give away a phone and make sure that the next owner can't recover anything at all is to use encryption. So for example, if you buy a Nexus 6, it comes with encryption enabled by default, and if you do a factory reset, one of the things that happens is that the key, which can be derived from your PIN/password (the downside is you have to type in your PIN after every single reboot/power cycle), is discarded, and that will guarantee that your data is secure. In fact, all of GCE's VM storage options provide encryption as a non-optional feature. See https://cloud.google.com/compute/docs/disks

This is how Google Compute Engine guarantees security for its local SSD product offering: each SSD attached to each VM gets its own encryption key, and when the VM is destroyed, the encryption key is flushed, and that way you don't have to feed the SSD to a shredder (which is the *only* other way to 100% guarantee that all of the data is deleted). In the commercial world (never mind the military world), you really care about your reputation, which means that you must make sure that data can never leak, which is why you don't trust the FTL, and you don't trust vendors' claims around "trim" or "secure trim". You use crypto.

So if you want to use crypto, what to do?   You can either use dm-crypt or ecryptfs, which is what Android and Chrome are currently using, or we're currently working on an ext4 encryption feature which will give you per-file control over which keys get used for which files.   This feature is still very much in development, so if it breaks, you can keep both pieces, but this is why I'm not terribly interested in wasting time trying to wire up secure discard.   If someone else wants to send me patches, I'll look at them, but personally I don't think it's a good use of my time.
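
The key-destruction argument can be illustrated with a toy model. Everything here is an assumption made for illustration: the class `EncryptedStore` is hypothetical, and the cipher (an HMAC-SHA256 keystream in counter mode, chosen only to stay within the Python standard library) is not what dm-crypt, ecryptfs or the ext4 encryption work uses and should not be used for real data. The point is only that once the key is gone, whatever ciphertext lingers on the medium is useless.

```python
import hashlib
import hmac
import os


def keystream_xor(key, nonce, data):
    """XOR data with an HMAC-SHA256 counter-mode keystream (toy cipher)."""
    out = bytearray()
    for block_no in range((len(data) + 31) // 32):
        pad = hmac.new(key, nonce + block_no.to_bytes(8, "big"),
                       hashlib.sha256).digest()
        chunk = data[block_no * 32:(block_no + 1) * 32]
        out.extend(a ^ b for a, b in zip(chunk, pad))
    return bytes(out)


class EncryptedStore:
    """The ciphertext stands in for what lands on disk; the key lives only in RAM."""

    def __init__(self):
        self._key = os.urandom(32)
        self.nonce = b""
        self.ciphertext = b""

    def write(self, plaintext):
        self.nonce = os.urandom(16)  # fresh nonce for every write
        self.ciphertext = keystream_xor(self._key, self.nonce, plaintext)

    def read(self):
        if self._key is None:
            raise PermissionError("key destroyed; data is unrecoverable")
        return keystream_xor(self._key, self.nonce, self.ciphertext)

    def destroy_key(self):
        # "Factory reset": the ciphertext may linger on the medium, the key does not.
        self._key = None
```

In this model `destroy_key()` plays the role of the factory reset or VM teardown: `ciphertext` is still there afterwards, but `read()` can no longer recover the plaintext.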
Comment 14 Alexander Holler 2015-02-01 10:14:39 UTC
The FS doesn't have to give a promise for the storage.

If the storage doesn't overwrite the same blocks (which it did for 25 years and still does on most conventional HDs) and doesn't offer a way to really get rid of stuff (like Secure Trim), then the storage can be blamed.

Using encryption is just a workaround. And what makes you believe that the encryption used is secure? Or that it will still be secure in a few years? Encryption often suffers from bugs in the implementation, bugs in the algorithm itself, and from Moore's law. Just think of all the "encrypted" zip archives. In a few years people might laugh about the stuff used today to encrypt.
Comment 15 Alexander Holler 2015-02-01 10:35:31 UTC
I didn't expect that you would fix the bug, nor did I demand that someone else fix it.

Of course, it would be nice if someone would fix it (maybe I'll do it myself), but there shouldn't be a need to close this bug just to hide it.

I have a problem neither with open bugs nor with TODO lines in sources, and I think the people who do have one have a problem. ;)
Comment 16 Alexander Holler 2015-02-01 11:04:58 UTC
And by the way, I have been using e.g. an encrypted loop-mounted partition as mail storage for my personal IMAP server for many years (can't remember, I assume around 10). So it's nothing new for me or for many other people, and nothing Google has invented.

Using encryption is a nice surplus, but it's just a very bad, cumbersome and very uncomfortable workaround for the inability of all filesystems to offer users a way to really delete files.
Comment 17 Alexander Holler 2015-02-01 16:33:23 UTC
BTW. I wonder how Google does flush the encryption key it must have stored somewhere. ;)
Comment 18 Alexander Holler 2015-02-01 17:18:46 UTC
And if the encrypted content isn't really deleted, what happens if the key has gone away before it was "flushed" (whatever that means)? That doesn't sound so unlikely after what we've learned about the resources governments are using to get their hands on such stuff. So one could say one problem was exchanged for another if the storage isn't really erased (which leads back to the original problem).

Anyway, maybe the easy way I've used to fix fat in a few hours works for extN too. No idea if and when I will try or even post patches, as I personally don't have any problem with all that and am happily able to secure my own stuff according to most of my needs (hopefully). So I'll now stop arguing (likely to your pleasure) ;)
Comment 19 Dmitry Monakhov 2015-02-02 08:32:54 UTC
(In reply to Theodore Tso from comment #10)

I have always wanted to implement a reliable (but not 100% secure) way to delete inodes.
The main idea is to append the freed space to a special inode (similar to what we do with the orphan list) and reliably clean up those blocks in the background (the actual logic depends on the lower device: perform multiple rewrites or sec-trim). I'll be back with a proof-of-concept patch set.
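
A toy model of the deferred-wipe idea above, purely for illustration. It is not ext4 code: `ToyVolume`, its methods and the bytearray "device" are all hypothetical, and a single zero-fill stands in for whatever the real cleanup would be (multiple rewrites or sec-trim). What it shows is the ordering: freed blocks are parked on a wipe list first and only return to the free pool after they have been overwritten.

```python
from collections import deque


class ToyVolume:
    """Toy model: a bytearray stands in for the block device."""

    def __init__(self, n_blocks, block_size=16):
        self.block_size = block_size
        self.data = bytearray(n_blocks * block_size)
        self.free = set(range(n_blocks))
        self.pending_wipe = deque()  # plays the role of the "special inode"

    def write_file(self, payload):
        """Allocate blocks for payload and return their block numbers."""
        bs = self.block_size
        needed = max(1, -(-len(payload) // bs))  # ceiling division
        if needed > len(self.free):
            raise OSError("no space left on toy volume")
        blocks = sorted(self.free)[:needed]
        self.free.difference_update(blocks)
        for i, b in enumerate(blocks):
            chunk = payload[i * bs:(i + 1) * bs].ljust(bs, b"\0")
            self.data[b * bs:(b + 1) * bs] = chunk
        return blocks

    def secure_delete(self, blocks):
        """Unlink: park the blocks on the wipe list instead of freeing them."""
        self.pending_wipe.extend(blocks)

    def wipe_worker(self, budget=None):
        """Background pass: overwrite parked blocks, then release them."""
        bs = self.block_size
        done = 0
        while self.pending_wipe and (budget is None or done < budget):
            b = self.pending_wipe.popleft()
            self.data[b * bs:(b + 1) * bs] = bytes(bs)
            self.free.add(b)
            done += 1
        return done
```

Because a block is never in `free` and `pending_wipe` at the same time, the allocator cannot hand out a block whose old contents are still readable.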
Comment 20 Alexander Holler 2015-02-02 11:13:06 UTC
My proof of concept now works for fat and ext4 (just by overwriting blocks w/o secure_trim, but that's not necessary for the proof):

=====
laptopahbt ~ # dd if=/dev/zero of=/tmp/test.img bs=1M count=50
50+0 records in
50+0 records out
52428800 bytes (52 MB) copied, 0.0372579 s, 1.4 GB/s
laptopahbt ~ # mkfs.ext4 /tmp/test.img                 
mke2fs 1.42.10 (18-May-2014)
Discarding device blocks: done
Creating filesystem with 51200 1k blocks and 12824 inodes
Filesystem UUID: fece1e42-07d5-449f-be29-650a8514648b
Superblock backups stored on blocks:
        8193, 24577, 40961

Allocating group tables: done
Writing inode tables: done
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done

laptopahbt ~ # grep -a akrakadabra /tmp/test.img 
laptopahbt ~ # mount -o loop /tmp/test.img /mnt/
laptopahbt ~ # echo akrakadabra >/mnt/foo.txt
laptopahbt ~ # umount /mnt/
laptopahbt ~ # grep -a akrakadabra /tmp/test.img 
�@@#����▒�����7((�▒���((����7((+Q��~ ((s��c7((i���H�((�1akrakadabra
laptopahbt ~ # mount -o loop /tmp/test.img /mnt/
laptopahbt ~ # rm -s /mnt/foo.txt 
laptopahbt ~ # umount /mnt/
laptopahbt ~ # grep -a akrakadabra /tmp/test.img 
laptopahbt ~ # mount -o loop /tmp/test.img /mnt/
laptopahbt ~ # echo akrakadabra >/mnt/foo.txt
laptopahbt ~ # umount /mnt/
laptopahbt ~ # mount -o loop /tmp/test.img /mnt/
laptopahbt ~ # rm /mnt/foo.txt 
laptopahbt ~ # umount /mnt/
laptopahbt ~ # grep -a akrakadabra /tmp/test.img 
�@@#����▒�����7((�▒���((����7((+Q��~ ((s��c7((i���H�((�1akrakadabra
laptopahbt ~ # cd /usr/src/linux
laptopahbt linux # PAGER= git diff --stat HEAD~5
 arch/x86/syscalls/syscall_32.tbl      |  1 +
 arch/x86/syscalls/syscall_64.tbl      |  1 +
 fs/ext4/ext4.h                        |  2 ++
 fs/ext4/mballoc.c                     | 21 ++++++++++++++++++++-
 fs/ext4/super.c                       | 10 ++++++++++
 fs/fat/fat.h                          |  1 +
 fs/fat/fatent.c                       | 17 +++++++++++++++++
 fs/fat/inode.c                        | 14 ++++++++++++++
 fs/namei.c                            | 35 ++++++++++++++++++++++++++++++-----
 include/asm-generic/audit_dir_write.h |  1 +
 include/linux/fs.h                    |  1 +
 include/linux/syscalls.h              |  1 +
 include/uapi/asm-generic/unistd.h     |  4 +++-
 tools/perf/builtin-trace.c            |  2 ++
 14 files changed, 104 insertions(+), 7 deletions(-)
=====

Most changes are to add the new syscall unlinkat_s(); the changes for fat and ext4 themselves are pretty small.

I will post patches when I've found the time to get them into a shape that doesn't land me in silly discussions again.
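
The `grep -a` check in the transcript above can also be scripted. The following is a small sketch with a hypothetical helper name, `image_contains`; it scans a raw image file for a byte pattern in chunks, so it also works on images too large to read at once.

```python
def image_contains(path, marker, chunk_size=1 << 20):
    """Return True if the byte string marker occurs anywhere in the file at path."""
    if not marker:
        raise ValueError("marker must be non-empty")
    overlap = len(marker) - 1
    tail = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return False
            window = tail + chunk
            if marker in window:
                return True
            # Keep the last len(marker)-1 bytes so that a match spanning
            # a chunk boundary is still found on the next iteration.
            tail = window[-overlap:] if overlap else b""
```

For the test above, `image_contains("/tmp/test.img", b"akrakadabra")` would be expected to be true after a plain rm and false after the wiping unlink, assuming the image is unmounted first as in the transcript.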
Comment 21 Alexander Holler 2015-02-02 17:12:23 UTC
https://lkml.org/lkml/2015/2/2/495
Comment 22 Theodore Tso 2015-02-03 17:29:29 UTC
Alexander, merely overwriting the blocks won't necessarily help if you are using an SSD or something like dm-thin. That's the point; such patches will be making promises that can't be kept in all circumstances. In the case of an SSD, it might require accessing the flash cells directly without the FTL getting in the way.

So before you advertise such a scheme, you need to be very clear about what your threat model is and what capabilities the adversary is supposed to have. And, also, what responsibility you might bear (or a commercial company like Red Hat or SuSE might bear, both in terms of legal liability and reputational damage) if they advertise such a feature, and then it turns out that the way the user happened to configure their system, it wasn't secure after all.


-- Ted
Comment 23 Eric Sandeen 2015-02-03 17:44:47 UTC
I would also suggest reading over previous proposals for this:

http://lwn.net/Articles/462437/

to learn from and build on past efforts, if you wish.

-Eric
Comment 24 Alexander Holler 2015-02-03 17:47:27 UTC
Sorry, but that doesn't make any sense (comment #22).

As all these companies already fail with 'rm' and still offer 'shred', there is nothing more to lose. It can't become worse, just better.

And that is what this approach is or was about (as written on LKML, I've already given up).

And if people do answer with complications like snapshots and whatever (as in bug #92261), I really have to wonder. Where does the documentation for 'rm' or even 'shred' mention that it is unable to delete files from backups, snapshots or the like?

Nobody has to advertise this scheme in a way it doesn't work.

Maybe it was a mistake to mention the word secure, and I should have used unlinkat_w() (for wipe), because everyone seems to panic nowadays when they come into contact with the word secure.
Comment 25 Alexander Holler 2015-02-03 18:20:03 UTC
Thanks for the link to LWN. Nice reading.

As it is from 2011, I assume it ended up somehow like my even more primitive (or worse) approach. ;)
