as written in http://lwn.net/Articles/281157/ , you still have to issue two commands: mount --bind /vital_data /untrusted_container/vital_data mount -o remount,ro /untrusted_container/vital_data instead of just one: mount -o bind,ro /vital_data /untrusted_container/vital_data and worse - you have to create two entries into fstab per each bind-mount to get some directories binded in read-only. not long ago even `mount` utility was confused by this ridiculous behaviour of kernel [http://thread.gmane.org/gmane.linux.utilities.util-linux-ng/2771] and reported bind-mount with "ro" option while in reality it was in "rw". now it fixed in `mount` and it shows 'rw' but kernel itself still just ignores 'ro' option unless it passed with redundant command/line in fstab. patch from bug URL fixes the issue and makes kernel properly obey (few last lines need to be tweaked a little to apply). they say [http://www.linuxquestions.org/questions/linux-general-1/how-do-i-create-a-read-only-bind-mount-in-fstab-778248/] Fedora, probably, have incorporated the fix but mainstream kernel still is not. what the hell, isn't Linus himself uses Fedora ? this is pain in an ass to set up proper logical paths for fileserver at boot-time and an unexpected behaviour. please, do something with it :(
It would nice to be told why this was marked as obsolete rather than just going ahead and doing it. Has bind changed? Has it been removed? Anything.............?
It was filed against a kernel we no longer care about. If it is still present in current kernels please update the kernel version accordingly. Bugzilla is not however used as a work queue so if you want to change this send patches to the linux filesystem lists.
It's true of ALL kernels. I cannot change the version as I do not have admin access.
Isn't "version" imply "starting from X and all the way up" ? Because changing it for every single update over years-persistent bugs would be fantastically ridiculous. And it is a "work queue", at least for issues, if not for improvements. The one that may be not directed to someone in particular, but anyone's unwillingness to fix some clearly incorrect behaviour does not justify its denial for everyone. Especially for such a thing, for the introduction of which quite a fuss was made at the time (read-only binds, that is).
A lot of reports get filed and then fixed without anyone even realising, or are invalidated simply by other changes so old ones get closed, they are mostly just noise and their time cost exceeds their value in actually being able to find things that do need sorting out. They are still in the database so searches for things like a specific CD drive or a specific oops pattern can still find them. But if you want it to happen then it's sat in bugzilla since 2010, it's not going to suddenely happen. If you want it to happen then I'd do two things - verify if the change is in Fedora as you claim - either grab the Fedora patch or forward put the upstream patch - test it - submit it.
This bug seems worse than it might appear. I don't think it's possible to work around this bug by remounting filesystems if one is working inside a container. For example, the following sandbox fails silently due to this bug and working around the bug by remounting fails with EPERM. #define _GNU_SOURCE #include <assert.h> #include <errno.h> #include <fcntl.h> #include <mntent.h> #include <sched.h> #include <stdbool.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <sys/mount.h> #include <sys/prctl.h> #include <sys/stat.h> #include <sys/types.h> #include <sys/wait.h> #include <unistd.h> #include <linux/capability.h> #include <linux/sched.h> #define SHELL "./bin/sh" #define RUNTIME_NAME "sandbox" #define ARRAY_SIZE(A) (sizeof A / sizeof A[0u]) static char * const shell_arguments[] = { (char *)SHELL, NULL }; static char * const shell_environment[] = { NULL }; static char fstab_config[] = "# <file system> <mount point> <type> <options>\n" "proc proc proc ro,nodev,noexec,nosuid\n" "sys sys sysfs ro,nodev,noexec,nosuid\n" "\n" "/dev dev none rbind\n" "\n" "tmpfs run tmpfs none\n" "tmpfs var tmpfs none\n" "\n" "# Allow connecting to X11\n" "/tmp tmp none ro,bind\n" "\n" "/etc etc none ro,bind\n" "\n" "/lib lib none ro,bind\n" "/lib32 lib32 none ro,bind\n" "/lib64 lib64 none ro,bind\n" "/lib64 lib64 none ro,bind\n" "\n" "/bin bin none ro,bind\n" "/sbin sbin none ro,bind\n" "/usr usr none ro,bind\n"; struct mount { char const * source; char const * target; mode_t mode; char const * filesystem_type; unsigned long mountflags; char const *data; }; int main(void) { int errnum; if (-1 == unshare(CLONE_NEWUSER /* Needed to do the rest of the unsharing */ /* Prevent signals, ptracing of other processes */ | CLONE_NEWPID /* With chroot prevent messing with user files */ | CLONE_NEWNS /* The rest are kind of meh but might as well be done */ | CLONE_NEWIPC | CLONE_NEWUTS | CLONE_NEWNET)) { perror("unshare"); return EXIT_FAILURE; } /* Fork to allow for multithreading and to make the shell less * buggy. */ { pid_t child = fork(); if (-1 == child) { perror("fork"); return EXIT_FAILURE; } if (child != 0) { siginfo_t info; do { errnum = -1 == waitid(P_PID, child, &info, WEXITED) ? errno : 0; } while (EINTR == errnum); if (errnum != 0) { assert(errnum != EINVAL); assert(errnum != ECHILD); assert(false); } return info.si_status; } } FILE * tmp = tmpfile(); if (NULL == tmp) { perror("tmpfile"); return EXIT_FAILURE; } { size_t bytes_to_write = sizeof fstab_config - 1u; if (fwrite(fstab_config, 1u, bytes_to_write, tmp) != sizeof fstab_config - 1u) { perror("fwrite"); return EXIT_FAILURE; } } char tmppath[] = "/proc/self/fd/XXXXXXXXXXX"; sprintf(tmppath, "/proc/self/fd/%i", fileno(tmp)); FILE * fstab = setmntent(tmppath, "r"); if (NULL == fstab) { perror("setmtent"); return EXIT_FAILURE; } if (EOF == fclose(tmp)) { perror("fclose"); return EXIT_FAILURE; } { char const *runtime_dir_string = getenv("XDG_RUNTIME_DIR"); if (NULL == runtime_dir_string) { /* TODO: Fallback somewhere smart */ fputs("missing XDG_RUNTIME_DIR\n", stderr); return EXIT_FAILURE; } if (-1 == chdir(runtime_dir_string)) { perror("chdir"); return EXIT_FAILURE; } } if (-1 == mkdir(RUNTIME_NAME, S_IRWXU)) { errnum = errno; if (errnum != EEXIST) { perror("mkdir"); return EXIT_FAILURE; } } if (-1 == mount(NULL, RUNTIME_NAME, "tmpfs", 0, NULL)) { perror("mount"); return EXIT_FAILURE; } if (-1 == chdir(RUNTIME_NAME)) { perror("chdir"); return EXIT_FAILURE; } for (;;) { errno = 0; struct mntent * entry = getmntent(fstab); if (NULL == entry) { errnum = errno; if (errnum != 0) { perror("getmntent"); return EXIT_FAILURE; } break; } enum { BIND, RBIND, REMOUNT, RO, RW, SUID, NOSUID, NODEV, NOEXEC }; char * const token[] = { [BIND] = "bind", [RBIND] = "rbind", [REMOUNT] = "remount", [RO] = MNTOPT_RO, [RW] = MNTOPT_RW, [SUID] = MNTOPT_SUID, [NOSUID] = MNTOPT_NOSUID, [NODEV] = "nodev", [NOEXEC] = "noexec", NULL }; bool bind = false; bool rec = false; bool remount = false; bool readonly = false; bool readwrite = false; bool suid = true; bool dev = true; bool exec = true; char *value = NULL; { char *subopts = entry->mnt_opts; if (0 == strcmp("none", subopts)) { goto mount; } while (*subopts != '\0') { switch (getsubopt(&subopts, token, &value)) { case BIND: bind = true; break; case RBIND: bind = true; rec = true; break; case REMOUNT: remount = true; break; case RO: readonly = true; break; case RW: readwrite = true; break; case SUID: suid = true; break; case NOSUID: suid = false; break; case NODEV: dev = false; break; case NOEXEC: exec = false; break; default: fprintf(stderr, "No match found for token: /%s/\n", value); goto mount; } } } mount: if (readwrite && readonly) { fprintf(stderr, "Only one of '%s' and '%s' can be specified\n", token[RO], token[RW]); return EXIT_FAILURE; } unsigned long mountflags = 0; if (bind) { mountflags |= MS_BIND; } if (rec) { mountflags |= MS_REC; } if (remount) { mountflags |= MS_REMOUNT; } if (readonly) { mountflags |= MS_RDONLY; } if (!suid) { mountflags |= MS_NOSUID; } if (!dev) { mountflags |= MS_NODEV; } if (!exec) { mountflags |= MS_NOEXEC; } if (-1 == mkdir(entry->mnt_dir, S_IRWXU)) { errnum = errno; if (errnum != EEXIST) { perror("mkdir"); return EXIT_FAILURE; } } if (-1 == mount(0 == strcmp("none", entry->mnt_fsname) ? NULL : entry->mnt_fsname, entry->mnt_dir, entry->mnt_type, mountflags, value)) { perror("mount"); return EXIT_FAILURE; } } if (endmntent(fstab) != 1) { perror("endmntent"); return EXIT_FAILURE; } if (-1 == chdir("..")) { perror("chdir"); return EXIT_FAILURE; } if (-1 == chdir(RUNTIME_NAME)) { perror("chdir"); return EXIT_FAILURE; } if (-1 == chroot("./")) { perror("chroot"); return EXIT_FAILURE; } if (-1 == chdir("/")) { perror("chdir"); return EXIT_FAILURE; } /* Prevent future privilege gains */ if (-1 == prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) { perror("prctl"); return EXIT_FAILURE; } /* Drop all privileges I might possibly have. I'm not sure I need * to do this and I probably can do this in a better way. */ static unsigned long const capability[] = { CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_DAC_READ_SEARCH, CAP_FOWNER, CAP_FSETID, CAP_KILL, CAP_SETGID, CAP_SETUID, CAP_SETPCAP, CAP_LINUX_IMMUTABLE, CAP_NET_BIND_SERVICE, CAP_NET_BROADCAST, CAP_NET_ADMIN, CAP_NET_RAW, CAP_IPC_LOCK, CAP_IPC_OWNER, CAP_SYS_MODULE, CAP_SYS_RAWIO, CAP_SYS_CHROOT, CAP_SYS_PTRACE, CAP_SYS_PACCT, CAP_SYS_ADMIN, CAP_SYS_BOOT, CAP_SYS_NICE, CAP_SYS_RESOURCE, CAP_SYS_TIME, CAP_SYS_TTY_CONFIG, CAP_MKNOD, CAP_LEASE, CAP_AUDIT_WRITE, CAP_AUDIT_CONTROL, CAP_SETFCAP, CAP_MAC_OVERRIDE, CAP_MAC_ADMIN, CAP_SYSLOG, CAP_WAKE_ALARM }; for (size_t ii = 0u; ii < ARRAY_SIZE(capability); ++ii) { if (-1 == prctl(PR_CAPBSET_DROP, capability[ii], 0, 0, 0)) { perror("prctl"); return EXIT_FAILURE; } } /* Favor other processes over this process hierarchy. Only * superuser may lower priorities so this is not stoppable. This * also makes the process hierarchy nicer for the OOM killer. */ if (-1 == setpriority(PRIO_PROCESS, 0, getpriority(PRIO_PROCESS, 0) + 1)) { perror("setpriority"); return EXIT_FAILURE; } execve(SHELL, shell_arguments, shell_environment); perror("execve"); return EXIT_FAILURE; }
A hacky alternative is to use overlayfs. For example, "overlayfs lib overlayfs ro,lowerdir=/lib,upperdir=/lib".
Unfortunately, overlayfs doesn't work as an alternative for readonly recursive bind mounts.
I sent a patch to the LKML to fix this: https://lkml.org/lkml/2014/8/1/563 You might be able to workaround the problem by fetching the sources for your kernel, applying the patch, recompiling and installing the new kernel binary. You would lose whatever kernel signing RedHat might have by doing this though.
I just spotted the patch in the URL. It is functionally identical to what I sent to the list. It would have been the right fix, but it was never merged. I stumbled across this bug report in some casual Google searches after I had written that patch, so it had been written independently.
while the remount workaround does not work for recursive bind mounts it does seem to work for shared bind mounts which are actually better than recursive bind mounts in many ways.
NVM. I must have made a mistake and this doesn't seem to work for shared bind mounts either.
The remount workaround does not seem to work for a bind to a subdirectory of a mount and not a direct bind.