Bug 209971

Summary: [regression][bisected] procfs: sendfile of /proc/self/mountinfo fails (5.10-rc1, breaks LXC)
Product: File System
Reporter: joanbrugueram
Component: Other
Assignee: fs_other
Status: RESOLVED CODE_FIX
Severity: normal
CC: carnil, hch, jussi.kivilinna
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 5.10-rc1
Subsystem:
Regression: Yes
Bisected commit-id:

Description joanbrugueram 2020-10-30 21:07:32 UTC
Consider the following C program, which copies /proc/self/mountinfo to a memfd using sendfile:

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <assert.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <sys/mman.h>
    #include <sys/sendfile.h>
    
    int main(void) {
        /* Source: the procfs file that triggers the regression. */
        int mntinfo_fd = open("/proc/self/mountinfo", O_RDONLY | O_CLOEXEC);
        assert(!(mntinfo_fd < 0));
    
        /* Destination: an anonymous in-memory file. */
        int memfd = memfd_create("mountinfo", MFD_CLOEXEC);
        assert(!(memfd < 0));
    
        ssize_t copied;
    again:
        /* 0x7ffff000 is the most a single sendfile() call transfers on Linux. */
        copied = sendfile(memfd, mntinfo_fd, NULL, 0x7ffff000);
        if (copied < 0) {
            /* Retry if interrupted by a signal; any other error is fatal. */
            if (errno == EINTR)
                goto again;
    
            fprintf(stderr, "Failed to copy \"/proc/self/mountinfo\"\n");
            return -1;
        }
    
        return 0;
    }

In Linux 5.9 this succeeds (prints nothing); in Linux 5.10-rc1 it fails (prints 'Failed to copy "/proc/self/mountinfo"').

This program is a reduced test case from LXC code at https://github.com/lxc/lxc/blob/8ddf34f7a037325565b8cf8ff995cbf573f9932e/src/lxc/conf.c#L2987, which runs when starting a container. On my system (current Arch Linux, LXC 4.0.5), the container fails to start on 5.10-rc1 (but not on 5.9) because of this problem:

    $ lxc-create -t download -n talpine -- -d alpine -r 3.12 -a amd64
    [...]
    $ lxc-start -n talpine -F
    lxc-start: talpine: conf.c: turn_into_dependent_mounts: 3012 Invalid argument - Failed to copy "/proc/self/mountinfo"
    [...]
    lxc-start: talpine: tools/lxc_start.c: main: 308 The container failed to start

I bisected this problem to commit 36e2c7421f02a22f71c9283e55fdb672a9eb58e7 "fs: don't allow splice read/write without explicit ops". As far as I can tell, sendfile is built on the splice machinery, and this commit removed the fallback used for files that do not implement their own splice operations, such as /proc/self/mountinfo and some other procfs files. So this failure looks like an expected consequence of that commit.

Could a fallback, or proper splice support for these procfs files, be restored in the kernel? Or should user-space applications not rely on sendfile here and use a read/write loop instead?
Comment 1 joanbrugueram 2020-11-08 08:04:55 UTC
Actually, if I understand correctly, the sendfile(2) man page implies that LXC's usage was never supposed to work:

    The in_fd argument must correspond to a file which supports mmap(2)-like operations (i.e., it cannot be a socket).
Comment 2 joanbrugueram 2020-11-08 22:16:27 UTC
It looks like this was found independently, and splice support for some /proc files was restored in commit 6b2c4d52fd38e676fc9ab5d9241a056de565eb1a, but not for /proc/___/mountinfo.
Comment 3 joanbrugueram 2020-11-16 15:32:48 UTC
LXC issue: https://github.com/lxc/lxc/issues/3580

They changed the code not to use sendfile: https://github.com/lxc/lxc/commit/a39fc34bd6842ad1adc6144391071d8b1078667e

I'll keep this bug open for now: other projects may be affected as well, so I'd like to see whether this is worth solving at the kernel level.
Comment 4 jussi.kivilinna 2020-12-26 20:02:58 UTC
5.10.3 is still affected.
Comment 5 joanbrugueram 2020-12-27 20:31:09 UTC
It looks like this was fixed in mainline:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=14e3e989f6a5d9646b6cf60690499cc8bdc11f7d

I'm building a kernel now to test, but it should work since I tested a patch that looked like this some time ago.
Comment 6 joanbrugueram 2020-12-27 23:54:13 UTC
Fixed in Linux 5.11-rc1.
Comment 7 joanbrugueram 2020-12-31 02:30:44 UTC
Fixed in Linux 5.10.4.