Bug 205727

Summary: creat() fails (EACCES) on non-root owned file when sticky bit set on dir
Product: IO/Storage
Reporter: Trevor Cordes (kernelbugs)
Component: DIO
Assignee: Andrew Morton (akpm)
Status: RESOLVED DOCUMENTED
Severity: normal
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 5.3.12
Subsystem:
Regression: No
Bisected commit-id:

Description Trevor Cordes 2019-12-01 08:52:37 UTC
Just upgraded to F30 with kernel 5.3.12.  Hit a weird error that didn't happen in the previous kernel/OS (5.3.11 F29):

If a dir has the sticky bit set, in this case /tmp, then root can't use tcsh's ">!" redirection on a file in that dir owned by another user (permission denied).  F29 ">!" allowed it just fine (kernel 5.3.11).  I checked, and the tcsh version didn't change apart from the bump to F30.

">!" in tcsh is the same as > in bash.  tcsh lets you set an option that requires an explicit ! after the > to clobber existing files, so you don't accidentally > an existing file without realizing it.

It's funny, because root can still append to the file (>>), rm it, chmod, etc., just not this explicit clobber.

Funnier still, it works in bash (using >), even though the OS still returns permission denied on bash's first attempt.  (See below.)

It looks like tcsh is using creat() and that is the syscall returning the error.

selinux is disabled.  If the sticky bit is not set on the parent dir, then things work as expected.

as root:

tcsh
sudo -u trevor touch /tmp/t
echo foo >> /tmp/t	# ok
echo foo >! /tmp/t	# Permission denied.

strace on 5.3.12 F30 (error):

dup(19)                                 = 0
fcntl(0, F_SETFD, 0)                    = 0
dup2(17, 1)                             = 1
dup2(18, 2)                             = 2
creat("t", 0666)                        = -1 EACCES (Permission denied)
write(18, "t: Permission denied.\n", 22) = 22

strace on 5.3.11 F29 (ok):

dup(19)                                 = 0
fcntl(0, F_SETFD, 0)                    = 0
dup2(17, 1)                             = 1
dup2(18, 2)                             = 2
creat("t", 0666)                        = 3
fcntl(3, F_GETFL)                       = 0x8001 (flags O_WRONLY|O_LARGEFILE)
fcntl(3, F_SETFL, O_WRONLY|O_LARGEFILE) = 0
...

strace with bash on 5.3.12 F30 (no error reported by bash! note the double try!)
(command for bash is just:  echo foo > /tmp/t)

openat(AT_FDCWD, "/tmp/t", O_WRONLY|O_CREAT|O_TRUNC, 0666) = -1 EACCES (Permission denied)
openat(AT_FDCWD, "/tmp/t", O_WRONLY|O_TRUNC) = 3
fcntl(1, F_GETFD)                       = 0
fcntl(1, F_DUPFD, 10)                   = 10
...

strace with bash on 5.3.11 F29 (no error returned):
openat(AT_FDCWD, "/tmp/t", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
fcntl(1, F_GETFD)                       = 0
fcntl(1, F_DUPFD, 10)                   = 10

Note, F30 has bash 5.0.7, F29 4.4.23, so unlike tcsh, bash logic for all of this may have changed.

So bash uses openat instead of creat and still gets EACCES on 5.3.12, but then it retries without O_CREAT and succeeds... almost like bash hit this same bug but has already worked around it.

I checked and the version of tcsh did not change between F29 and F30, so I'm guessing this is a kernel change/bug?  I've been using this >! paradigm for 20+ years and it's always worked in this instance before.  F30 is the first one to break it.

P.S. not sure what bz component to select, so modify or notify me as appropriate.  Thanks!
Comment 1 Trevor Cordes 2019-12-29 00:59:18 UTC
I've simplified the problem somewhat with a sample c program.

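#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main(void) {
  int result;
  /* Create "t" in the current directory, or truncate it if it exists. */
  result = creat("t", 0666);
  /* Prints the new file descriptor on success, -1 on failure. */
  printf("result=%d\n", result);
  return 0;
}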


These very simple tests then show the difference in behaviour of that one system call (creat).  Run as root, and replace the chown user with a non-root user that exists on your system.  /tmp must have the sticky bit set.

#cd /tmp/ ; rm -f t ; touch t ; ./a.out ; chown trevor: t ; ./a.out

F29 / 5.2.11 (good)
result=3
result=3

F30 / 5.3.14 (broken)
result=3
result=-1

Not sure, but in my opinion the semantics of creat() should not change between 5.2 and 5.3, unless some weird capability thing has been added.  Again, this only occurs in sticky-bit dirs like /tmp.  In normal dirs the result is always 3, no matter the kernel version.  (Again, I have selinux disabled.)
Comment 2 Trevor Cordes 2019-12-31 05:45:58 UTC
Al Viro figured it out; per Al:

It is Fedora, all right, but not the kernel.  The idiocy in question
is controlled by /proc/sys/fs/protected_regular (gotta love the approach
to naming, BTW).  Write 0 to it and you'll get the normal behaviour
back.

Setting sits in /usr/lib/sysctl.d/50-default.conf and AFAICS that
comes from systemd 241 and later.  After checking their git tree the
following shows up:

commit 2732587540035227fe59e4b64b60127352611b35
Author: Lucas Werkmeister <mail@lucaswerkmeister.de>
Date:   Wed Jan 16 00:16:10 2019 +0100

    Enable regular file and FIFO protection
    
    These sysctls were added in Linux 4.19 (torvalds/linux@30aba6656f), and
    we should enable them just like we enable the older hardlink/symlink
    protection since v199. Implements #11414.

diff --git a/NEWS b/NEWS
index ee926a1203..c64ef5871b 100644
--- a/NEWS
+++ b/NEWS
@@ -29,6 +29,19 @@ CHANGES WITH 241 in spe:
           -Db_pie=true option to meson to build position-independent
           executables. Note that the meson option is supported since meson-0.49.
 
+        * The fs.protected_regular and fs.protected_fifos sysctls, which were
+          added in Linux 4.19 to make some data spoofing attacks harder, are
+          now enabled by default. While this will hopefully improve the
+          security of most installations, it is technically a backwards
+          incompatible change; to disable these sysctls again, place the
+          following lines in /etc/sysctl.d/60-protected.conf or a similar file:
+
+              fs.protected_regular = 0
+              fs.protected_fifos = 0
+
+          Note that the similar hardlink and symlink protection has been
+          enabled since v199, and may be disabled likewise.
+
 CHANGES WITH 240:
 
         * NoNewPrivileges=yes has been set for all long-running services
diff --git a/sysctl.d/50-default.conf b/sysctl.d/50-default.conf
index b0645f33e7..27084f6242 100644
--- a/sysctl.d/50-default.conf
+++ b/sysctl.d/50-default.conf
@@ -36,3 +36,7 @@ net.core.default_qdisc = fq_codel
 # Enable hard and soft link protection
 fs.protected_hardlinks = 1
 fs.protected_symlinks = 1
+
+# Enable regular file and FIFO protection
+fs.protected_regular = 1
+fs.protected_fifos = 1