Thanks to the addition of seccomp_addfd (SECCOMP_IOCTL_NOTIF_ADDFD), it is now possible to emulate a vast number of system calls in userspace and achieve a TOCTOU-free sandbox. There are, however, three exceptions to this:

1. exec family calls cannot be emulated, so a sandbox disallowing exec calls has no choice but to continue the exec call in the sandbox process, allowing TOCTOU.
2. chdir family calls cannot be emulated, so a sandbox disallowing chdir calls to hide paths has no choice but to continue the chdir call in the sandbox process, allowing TOCTOU.
3. open calls with the O_PATH flag cannot be emulated (addfd returns EBADF on O_PATH fds), so again a sandbox disallowing open calls with the O_PATH flag to hide paths has no choice but to continue the open call in the sandbox process, allowing TOCTOU.

It would be awesome if the kernel provided TOCTOU-free ways to sandbox these three cases.
For a bit of context, I am the author of syd, a seccomp and Landlock based application sandbox with support for namespaces. You can read more about why this feature request is relevant here: http://man.exherbolinux.org/syd.7.html
To quote the relevant bit from the manual page:

BUGS

In the operation of syd, certain system calls are not fully emulated due to seccomp(2) limitations, resulting in the sandbox process continuing these calls directly. These include execve(2), execveat(2) for execution, chdir(2), fchdir(2) for directory changes, and open(2) operations with O_PATH flag. Consequently, this behavior exposes vulnerabilities to time-of-check to time-of-use attacks, allowing for the circumvention of Exec Sandboxing to execute denylisted paths, the bypass of Stat Sandboxing for unauthorized directory access without disclosing directory contents (owing to getdents(2) call emulation), and the detection of hidden files without revealing file metadata, as stat(2) calls are emulated.
I figured out that the exec TOCTOU can be effectively mitigated with a ptracer that uses PTRACE_EVENT_EXEC: at the exec stop, the new /proc/pid/exe magic symbolic link is checked against the sandbox, and the process is killed if the new path is denylisted. This adds a ptrace dependency to the sandbox, though, and it would be nice to have a seccomp-only API to achieve this. In my case this is not a problem, because the privileges required by PTRACE_SEIZE and pidfd_getfd are equivalent.
It should be noted that continuing the system call with seccomp and having a ptracer do the post-validation also makes it possible to deny execve{,at} system calls for the sandbox process itself with a seccomp-bpf filter. Arguably, this can be interpreted as hardening against arbitrary code execution inside a compromised sandbox process. syd implements this as of version 3.17.0; see here if you're interested in further details: http://man.exherbolinux.org/syd.7.html#Enhanced_Execution_Control_(EEC)
> 3. open calls with the O_PATH flag cannot be emulated (addfd returns EBADF on
> o_path fds) again a sandbox disallowing open calls with O_PATH flag to hide
> paths has no choice but to continue the open call in sandbox process allowing
> TOCTOU.

Again, this may be mitigated by "promoting" O_PATH file descriptors into O_RDONLY and using seccomp_addfd as usual. IMHO, it would be nice if seccomp_addfd treated O_PATH the way it treats O_CLOEXEC (a flag to enable O_PATH, like the one for O_CLOEXEC?).
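One way to read the "promotion" idea, sketched under my own assumptions: when the sandbox process asks for O_PATH, the supervisor opens the path with O_RDONLY instead, keeping only the flags that O_PATH itself honours, so that the resulting fd can be passed through addfd. The helper names promote_open_flags and open_promoted are hypothetical.

```c
#define _GNU_SOURCE /* for O_PATH */
#include <fcntl.h>

/* Rewrite an O_PATH request into an O_RDONLY one. open(2) ignores all
 * flags other than O_CLOEXEC, O_DIRECTORY and O_NOFOLLOW when O_PATH is
 * given, so only those are carried over. Other requests are unchanged. */
static int promote_open_flags(int flags)
{
    if (!(flags & O_PATH))
        return flags;
    return (flags & (O_CLOEXEC | O_DIRECTORY | O_NOFOLLOW)) | O_RDONLY;
}

/* Open on behalf of the sandbox process with the promoted flags; the
 * returned fd is a regular fd and is therefore accepted by addfd. */
static int open_promoted(const char *path, int flags)
{
    return open(path, promote_open_flags(flags));
}
```

The promoted fd is not equivalent to an O_PATH fd: opening requires read permission on the file, O_NOFOLLOW on a symbolic link fails with ELOOP instead of returning an fd for the link, and the sandbox process can read from the fd it receives. Those differences are why first-class O_PATH support in addfd would be preferable.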
I have written an article about these three issues, possible mitigations and solutions: https://git.sr.ht/~alip/syd/tree/main/item/doc/toctou-or-gtfo.md