Created attachment 242821
The reproducer

The child process created by the attached program gets EOVERFLOW on its mkdir(2) call on its tmpfs mount with 4.8.1. With earlier versions, up to 4.7.5 at least, the mkdir(2) call would succeed as expected.
EOVERFLOW also with 4.8.4.
It seems like this "bug" (if it is a bug) was introduced with the following commit:

commit 036d523641c66bef713042894a17f4335f199e49
Author: Eric W. Biederman <ebiederm@xmission.com>
Date:   Fri Jul 1 12:52:06 2016 -0500

    vfs: Don't create inodes with a uid or gid unknown to the vfs

    It is expected that filesystems can not represent uids and gids from
    outside of their user namespace. Keep things simple by not even
    trying to create filesystem nodes with non-sense uids and gids.

    Acked-by: Seth Forshee <seth.forshee@canonical.com>
    Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

That was merged following this pull request: https://lkml.org/lkml/2016/7/26/297

Ludovic, is there a use case where you need tmpfs to handle user namespaces in this way?
I don't really have a "use case" for this. I stumbled upon this issue because one of our unit tests relies on the original behavior: http://git.savannah.gnu.org/cgit/guix.git/tree/tests/syscalls.scm#n149 That said, mounting a tmpfs inside a user namespace doesn't seem that "exotic" to me, so I wouldn't be surprised if there's code out there that broke because of this. Also, AIUI, mkdir(2) is not documented to return EOVERFLOW. Thoughts? Thanks for your reply!
I'm investigating this further. I am however really new to kernel development, so I can't promise any progress. Thank you for the unit test, I'll stick to the C program you provided though, but it's nice to have things in context. I agree with you that EOVERFLOW is a "strange" error code to be returned by mkdir in this case.
So, after further investigation it turns out that this doesn't happen if there is a mapping for the uid and gid in the user namespace (/proc/pid/uid_map and /proc/pid/gid_map). However, tmpfs is a FS_USERNS_MOUNT type filesystem so it would kind of make sense (I think?) to allow non-mappable uids and gids to be used inside tmpfs. Some input from someone who knows more in-depth how this is supposed to work would help greatly :) My current plan is to submit a small patch enabling tmpfs to create objects with non-mapped uids and gids and see what the response is.
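As an illustration of the mapping mentioned above, here is a minimal sketch of how a process could create a user namespace and map its own uid and gid to 0 before mounting tmpfs and calling mkdir(2). This is not the attached reproducer or the patch; the function names (`format_id_map`, `write_file`, `unshare_with_mapping`) are hypothetical, and the sketch assumes a kernel that provides /proc/self/setgroups (3.19 or later) and allows unprivileged user namespaces.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Format one map line: "<id inside ns> <id outside ns> 1\n".
 * Returns the line length, or -1 if the buffer is too small. */
int format_id_map(char *buf, size_t len, unsigned inside, unsigned outside)
{
    int n = snprintf(buf, len, "%u %u 1\n", inside, outside);
    return (n > 0 && (size_t)n < len) ? n : -1;
}

/* Write a whole string to a /proc file with a single write(2). */
int write_file(const char *path, const char *text)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, text, strlen(text));
    close(fd);
    return n == (ssize_t)strlen(text) ? 0 : -1;
}

/* Hypothetical helper: create a user namespace and map the caller to
 * uid 0 / gid 0 inside it. Returns 0 on success, -1 on failure. */
int unshare_with_mapping(void)
{
    char line[64];
    uid_t uid = getuid();   /* read the IDs before unshare(2) */
    gid_t gid = getgid();

    if (unshare(CLONE_NEWUSER) < 0)
        return -1;
    /* An unprivileged process must deny setgroups before writing gid_map. */
    if (write_file("/proc/self/setgroups", "deny") < 0)
        return -1;
    if (format_id_map(line, sizeof line, 0, (unsigned)uid) < 0 ||
        write_file("/proc/self/uid_map", line) < 0)
        return -1;
    if (format_id_map(line, sizeof line, 0, (unsigned)gid) < 0 ||
        write_file("/proc/self/gid_map", line) < 0)
        return -1;
    return 0;
}
```

A caller would typically follow a successful `unshare_with_mapping()` with `unshare(CLONE_NEWNS)`, a tmpfs `mount(2)`, and then the `mkdir(2)` under test; with the mapping in place the process's uid and gid are known to the namespace when the inode is created.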
Created attachment 244701
The patch that "solves" the bug.

Use at your own risk. I give absolutely no guarantee that this patch will not make your system insecure. Please make your own risk assessment before applying this patch.
So the right thing for testing system calls in your tests is to run the tests as a mapped user in a user namespace. Likely 0. Otherwise you are going to hit weird and strange corner cases, simply because your uid and gid are not mapped.

Weird cases like: What uid and gid does stat return?

If this failure shows up in something other than a test it will be fixed.
(In reply to Eric W. Biederman from comment #7)
> So the right thing for testing system calls in your tests is to run the
> tests as a mapped user in a user namespace. Likely 0. Otherwise you are
> going to hit weird and strange corner cases, simply because your uid and gid
> are not mapped.
>
> Weird cases like: What uid and gid does stat return?
>
> If this failure shows up in something other than a test it will be fixed.

It returns the overflow uid/gid: UID: 65534, GID: 65534.

In this case, should it indeed return an error? If yes, maybe the man pages should be updated to reflect this behavior?

Thanks,
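To make the stat observation concrete, here is a small sketch that checks whether stat(2) reports a file's owner as the kernel's overflow uid. The helper names (`parse_id`, `overflow_uid`, `owner_looks_unmapped`) are hypothetical. It assumes the overflow uid is readable from /proc/sys/kernel/overflowuid and falls back to 65534 otherwise. Note that a file genuinely owned by uid 65534 is indistinguishable from an unmapped owner with this check.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Parse a decimal ID as printed by /proc/sys/kernel/overflowuid.
 * Returns -1 if the text does not start with a non-negative number. */
long parse_id(const char *text)
{
    char *end;
    long id = strtol(text, &end, 10);
    if (end == text || id < 0)
        return -1;
    return id;
}

/* Read the kernel's overflow uid, falling back to 65534 if the
 * /proc file cannot be read or parsed. */
long overflow_uid(void)
{
    char buf[32];
    long id = -1;
    FILE *f = fopen("/proc/sys/kernel/overflowuid", "r");
    if (f) {
        if (fgets(buf, sizeof buf, f))
            id = parse_id(buf);
        fclose(f);
    }
    return id >= 0 ? id : 65534;
}

/* Returns 1 if stat(2) reports the overflow uid as the owner of path,
 * 0 if it reports some other uid, and -1 if stat(2) fails. */
int owner_looks_unmapped(const char *path)
{
    struct stat st;
    if (stat(path, &st) < 0)
        return -1;
    return (long)st.st_uid == overflow_uid();
}
```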
FYI, this change broke (I believe) an important tool that had been shipped to users: https://github.com/sandstorm-io/vagrant-spk/issues/213 . (I have no opinion on whether the tool should or shouldn't use a mapped uid.)

I also believe it would have been diagnosed and fixed much faster if the mkdir(2) manpage had documented what circumstances could cause EOVERFLOW to be returned.

Thanks for opening this bug and for documenting it here.
bugzilla-daemon@bugzilla.kernel.org writes:

> https://bugzilla.kernel.org/show_bug.cgi?id=183461
>
> Adam Bliss (abliss@gmail.com) changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |abliss@gmail.com
>
> --- Comment #9 from Adam Bliss (abliss@gmail.com) ---
> FYI, this change broke (I believe) an important tool that had been shipped to
> users: https://github.com/sandstorm-io/vagrant-spk/issues/213 . (I have no
> opinion on whether the tool should or shouldn't use a mapped uid.)

When you are doing anything non-trivial you need to be performing it in a process that has its uids and gids mapped. The case you are describing worked almost by chance.

Mounting a filesystem inside a user namespace limits its uids and gids to the uids and gids that can be represented by the user namespace. Which means in general you can not create files or pretty much anything else that updates an inode. In general attempting to support that would result in filesystem corruption.

I suspect the user namespace you are entering already has an appropriate mapping and the program just needs to call setuid and setgid to get its uid and gid into the user namespace's mappings. At which point you are more thoroughly entered and you won't be hitting very strange edge cases.

Eric
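The setuid/setgid suggestion could look roughly like the following sketch. It is not taken from vagrant-spk or any other tool; `switch_ids` and `join_userns_as_root` are hypothetical names, and the sketch assumes that uid 0 and gid 0 are mapped in the target namespace, that the caller is single-threaded, and that it is permitted to join that namespace with setns(2).

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Switch to IDs that are mapped in the current user namespace.
 * The group is changed first, while the process may still have the
 * privilege to do so. Returns 0 on success, -1 on failure. */
int switch_ids(uid_t uid, gid_t gid)
{
    if (setgid(gid) < 0)
        return -1;
    if (setuid(uid) < 0)
        return -1;
    return 0;
}

/* Hypothetical helper: join the user namespace of process `pid`,
 * then become uid 0 / gid 0 inside it (assumed to be mapped there). */
int join_userns_as_root(pid_t pid)
{
    char path[64];
    int n = snprintf(path, sizeof path, "/proc/%ld/ns/user", (long)pid);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;

    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;
    int rc = setns(fd, CLONE_NEWUSER);
    close(fd);
    if (rc < 0)
        return -1;

    return switch_ids(0, 0);
}
```

After `join_userns_as_root()` succeeds, files the process creates on a filesystem mounted in that namespace are owned by ids the namespace can represent, which is the situation the comment above describes as being more thoroughly entered.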