Bug 183461 - mkdir(2) returns EOVERFLOW on tmpfs in user namespace
Status: NEW
Alias: None
Product: File System
Classification: Unclassified
Component: Other
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: fs_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-10-26 15:53 UTC by Ludovic Courtès
Modified: 2020-01-24 01:25 UTC

See Also:
Kernel Version: 4.8.1
Tree: Mainline
Regression: Yes


Attachments
The reproducer (1.06 KB, application/octet-stream)
2016-10-26 15:53 UTC, Ludovic Courtès
The patch that "solves" the bug. Use at your own risk. (1.83 KB, patch)
2016-11-15 22:30 UTC, Johanna

Description Ludovic Courtès 2016-10-26 15:53:51 UTC
Created attachment 242821
The reproducer

The child process created by the attached program gets EOVERFLOW on its mkdir(2) call on its tmpfs mount with 4.8.1.  With earlier versions, up to 4.7.5 at least, the mkdir(2) call would succeed as expected.
Comment 1 Ludovic Courtès 2016-11-01 20:37:36 UTC
EOVERFLOW also with 4.8.4.
Comment 2 Johanna 2016-11-03 14:10:01 UTC
It seems like this "bug" (if it is a bug) was introduced by the following commit:

commit 036d523641c66bef713042894a17f4335f199e49
Author: Eric W. Biederman <ebiederm@xmission.com>
Date:   Fri Jul 1 12:52:06 2016 -0500

    vfs: Don't create inodes with a uid or gid unknown to the vfs
    
    It is expected that filesystems can not represent uids and gids from
    outside of their user namespace.  Keep things simple by not even
    trying to create filesystem nodes with non-sense uids and gids.
    
    Acked-by: Seth Forshee <seth.forshee@canonical.com>
    Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

That was merged following this pull request: https://lkml.org/lkml/2016/7/26/297

Ludovic, is there a use case where you need tmpfs to handle user namespaces in this way?
Comment 3 Ludovic Courtès 2016-11-04 13:08:00 UTC
I don't really have a "use case" for this.  I stumbled upon this issue because one of our unit tests relies on the original behavior:

  http://git.savannah.gnu.org/cgit/guix.git/tree/tests/syscalls.scm#n149

That said, mounting a tmpfs inside a user namespace doesn't seem that "exotic" to me, so I wouldn't be surprised if there's code out there that broke because of this.

Also, AIUI, mkdir(2) is not documented to return EOVERFLOW.

Thoughts?

Thanks for your reply!
Comment 4 Johanna 2016-11-09 13:19:57 UTC
I'm investigating this further. I am however really new to kernel development, so I can't promise any progress.
Thank you for the unit test. I'll stick to the C program you provided, but it's nice to have things in context.

I agree with you that EOVERFLOW is a "strange" error code to be returned by mkdir in this case.
Comment 5 Johanna 2016-11-15 02:57:27 UTC
So, after further investigation it turns out that this doesn't happen if there is a mapping for the uid and gid in the user namespace (/proc/pid/uid_map and /proc/pid/gid_map).

However, tmpfs is a FS_USERNS_MOUNT type filesystem so it would kind of make sense (I think?) to allow non-mappable uids and gids to be used inside tmpfs.

Some input from someone who knows more in-depth how this is supposed to work would help greatly :)

My current plan is to submit a small patch enabling tmpfs to create objects with non-mapped uids and gids and see what the response is.
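
For anyone who wants to check the mapping condition described above from user space, here is a minimal Python sketch. The helper names (parse_id_map, to_namespace_id, read_own_uid_map) are hypothetical, and the fallback value 65534 assumes the default overflow ID; the live value can be read from /proc/sys/kernel/overflowuid. An empty map means no ID is mapped, which is the situation in which the mkdir(2) failure shows up.

```python
OVERFLOW_ID = 65534  # assumed default; see /proc/sys/kernel/overflowuid


def parse_id_map(text):
    """Parse uid_map/gid_map content into (inside_start, outside_start, count) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            entries.append(tuple(int(f) for f in fields))
    return entries


def to_namespace_id(entries, outside_id):
    """Translate a parent-namespace ID into the namespace, or the overflow ID if unmapped."""
    for inside_start, outside_start, count in entries:
        if outside_start <= outside_id < outside_start + count:
            return inside_start + (outside_id - outside_start)
    return OVERFLOW_ID


def read_own_uid_map():
    """Return the parsed uid map of the calling process (Linux only)."""
    with open("/proc/self/uid_map") as f:
        return parse_id_map(f.read())
```

With a map of "0 1000 1", uid 1000 in the parent namespace shows up as 0 inside; with an empty map, every uid falls back to the overflow ID.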
Comment 6 Johanna 2016-11-15 22:30:26 UTC
Created attachment 244701
The patch that "solves" the bug. Use at your own risk.

I give absolutely no guarantee that this patch will not make your system insecure. Please make your own risk assessment before applying this patch.
Comment 7 Eric W. Biederman 2016-11-22 18:45:29 UTC
So the right thing for testing system calls in your tests is to run the tests as a mapped user in a user namespace.  Likely 0.  Otherwise you are going to hit weird and strange corner cases, simply because your uid and gid are not mapped.

Weird cases like: What uid and gid does stat return?

If this failure shows up in something other than a test it will be fixed.
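
A rough sketch of what "run the tests as a mapped user" can look like, written in Python with ctypes. It assumes an unprivileged, single-threaded process on a kernel that permits unprivileged user namespaces. The helper names enter_mapped_userns and single_id_map are hypothetical, and the CLONE_NEWUSER value is copied from <linux/sched.h>. Writing "deny" to /proc/self/setgroups is required before an unprivileged process may write its gid_map.

```python
import ctypes
import os

CLONE_NEWUSER = 0x10000000  # value from <linux/sched.h>


def single_id_map(inside_id, outside_id):
    """Build a one-entry map line in the 'inside outside count' format."""
    return f"{inside_id} {outside_id} 1\n"


def enter_mapped_userns(inside_id=0):
    """Unshare a user namespace and map the caller's uid/gid to inside_id."""
    # Read the effective IDs before unsharing; afterwards they appear unmapped.
    outer_uid, outer_gid = os.geteuid(), os.getegid()
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.unshare(CLONE_NEWUSER) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    # Unprivileged writers must disable setgroups(2) before writing gid_map.
    with open("/proc/self/setgroups", "w") as f:
        f.write("deny")
    with open("/proc/self/gid_map", "w") as f:
        f.write(single_id_map(inside_id, outer_gid))
    with open("/proc/self/uid_map", "w") as f:
        f.write(single_id_map(inside_id, outer_uid))
```

A test would typically also unshare a mount namespace before mounting its tmpfs; that part is left out here.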
Comment 8 Marcos Souza 2017-11-21 22:55:00 UTC
(In reply to Eric W. Biederman from comment #7)
> So the right thing for testing system calls in your tests is to run the
> tests as a mapped user in a user namespace.  Likely 0.  Otherwise you are
> going to hit weird and strange corner cases, simply because your uid and gid
> are not mapped.
> 
> Weird cases like: What uid and gid does stat return?
> 
> If this failure shows up in something other than a test it will be fixed.

It returns the overflow uid/gid:
UID: 65534, GID: 65534

In this case, should it indeed return an error? If so, maybe the man pages should be updated to reflect this behavior.

Thanks,
Comment 9 Adam Bliss 2020-01-23 19:22:04 UTC
FYI, this change broke (I believe) an important tool that had been shipped to users: https://github.com/sandstorm-io/vagrant-spk/issues/213 . (I have no opinion on whether the tool should or shouldn't use a mapped uid.)

I also believe it would have been diagnosed and fixed much faster if the mkdir(2) manpage had documented what circumstances could cause EOVERFLOW to be returned.

Thanks for opening this bug and for documenting it here.
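
Since the man page does not cover this case, a caller can at least turn the bare error into a hint. This is a minimal Python sketch that assumes the cause discussed in this report (an unmapped uid or gid); describe_mkdir_error and mkdir_with_hint are hypothetical helpers, not part of any library.

```python
import errno
import os


def describe_mkdir_error(exc):
    """Return a hint for an OSError raised by mkdir, based on this report."""
    if exc.errno == errno.EOVERFLOW:
        return ("mkdir failed with EOVERFLOW: the caller's uid or gid may have "
                "no mapping in the user namespace (check /proc/self/uid_map "
                "and /proc/self/gid_map)")
    return f"mkdir failed: {exc.strerror}"


def mkdir_with_hint(path):
    """Create a directory, re-raising failures with the hint attached."""
    try:
        os.mkdir(path)
    except OSError as exc:
        raise OSError(exc.errno, describe_mkdir_error(exc), path) from exc
```

Other causes of EOVERFLOW may exist, so the hint is worded as a possibility rather than a diagnosis.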
Comment 10 Eric W. Biederman 2020-01-24 01:25:34 UTC
(In reply to Adam Bliss from comment #9)
> FYI, this change broke (I believe) an important tool that had been shipped to
> users: https://github.com/sandstorm-io/vagrant-spk/issues/213 . (I have no
> opinion on whether the tool should or shouldn't use a mapped uid.)

When you are doing anything non-trivial you need to be performing it
in a process that has its uids and gids mapped.

The case you are describing worked almost by chance.  Mounting a
filesystem inside a user namespace limits its uids and gids to the uids
and gids that can be represented by the user namespace.  Which means in
general you can not create files or pretty much anything else that
updates an inode.  In general attempting to support that would result in
filesystem corruption.

I suspect the user namespace you are entering already has an appropriate
mapping and the program just needs to call setuid and setgid to get its
mappings into the user namespace.  At which point you are more thoroughly
entered and you won't be hitting very strange edge cases.

Eric
