Created attachment 260497
test program

The attached test program should, according to my fairly well informed reading of POSIX, print nothing and exit successfully, and that is what it does on FreeBSD (11.1) and NetBSD (7.1). However, when run on Linux (4.13/x86-64), it prints

fd 256: select timed out

and exits unsuccessfully. Descriptor 256 is closed, so select() should fail with EBADF; instead it times out.

(N.B. the program will exit upon encountering the first closed file descriptor in the range [5, FD_SETSIZE) that doesn't cause select to fail with EBADF, but if you take the `return 1;` out, you will see that _all_ fds in the range [256, FD_SETSIZE) show the bug.)

Even wackier, if the program is being ptraced _at all_ (it suffices to do "gdb ./a.out" and then "r"; no breakpoints need to be set or anything), the range of fds that show this effect changes from [256, FD_SETSIZE) to [64, FD_SETSIZE).

Test program originally found at https://stackoverflow.com/questions/47098097 and revised by me.
Created attachment 260499
test program using poll() instead of select() does not show the problem

This additional test program indicates this is a problem specific to select(); poll() will correctly return 1 with revents set to POLLNVAL for every file descriptor from 3 up to whatever the RLIMIT_NOFILE cap is.
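The poll() check can be sketched along the same lines. Again, this is not the attached program, only an illustration under the same assumptions (zero timeout, hypothetical function names `poll_reports_nval` and `count_closed_fds_without_pollnval`).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>

/* Returns 1 if poll() flags fd with POLLNVAL, 0 if it does not,
 * and -1 if poll() itself fails. Uses a zero timeout. */
int poll_reports_nval(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };

    if (poll(&pfd, 1, 0) == -1)
        return -1;
    return (pfd.revents & POLLNVAL) != 0;
}

/* Counts the closed descriptors in [lo, hi) for which poll() did not
 * report POLLNVAL. Zero means poll() behaved correctly for every
 * closed descriptor in the range. */
int count_closed_fds_without_pollnval(int lo, int hi)
{
    int misses = 0;

    assert(lo >= 0 && lo <= hi);
    for (int fd = lo; fd < hi; fd++) {
        /* fcntl(F_GETFD) fails with EBADF exactly when fd is not open. */
        int closed = fcntl(fd, F_GETFD) == -1 && errno == EBADF;

        if (closed && poll_reports_nval(fd) != 1)
            misses++;
    }
    return misses;
}
```

Unlike select(), poll() takes descriptor numbers directly rather than a bitmap, so the range is not limited by FD_SETSIZE.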
A Stack Overflow commenter pointed out that the 256 vs. 64 thing isn't about ptrace at all; rather, an interactive 'bash' process holds file descriptor 255 open on the terminal, therefore the kernel allocates space for 256 file descriptors in its directly forked children. But if one of those children calls execve, and then the new process image calls fork without opening any high-numbered fds (which is what happens in either the strace or gdb scenario), the grandchild will only get the minimum 64-entry fd table. The threshold for the bug is the size of the currently-allocated fd table.

To demonstrate that this is indeed the issue, let the compiled first test program be 'a.out' and then run these perl one-liners:

# the threshold drops to 64 with any intermediate process; ptrace isn't involved
$ perl -le 'if (fork() == 0) { exec "./a.out" } else { wait; exit $?>>8 }'; echo $?
fd 64: select timed out
1

# if the intermediate process opens a sufficiently high-numbered fd,
# the bug doesn't manifest
$ perl -le 'use POSIX "dup2"; dup2(0, 1023); if (fork() == 0) { exec "./a.out" } else { wait; exit $?>>8 }'; echo $?
0
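The same experiment can be written in C for anyone who would rather not go through perl. This is a rough sketch, not something I attached: `run_via_intermediate` is a hypothetical name, and the calling process plays the role of the intermediate process. Pass a negative `high_fd` to skip the dup2 step.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Optionally duplicates stdin onto high_fd in the calling process,
 * then forks and execs prog in the child. Returns the child's exit
 * status, or -1 on failure or abnormal termination. */
int run_via_intermediate(const char *prog, int high_fd)
{
    assert(prog != NULL);

    /* Opening a high-numbered fd before fork() is what makes the
     * child inherit a larger fd table. */
    if (high_fd >= 0 && dup2(STDIN_FILENO, high_fd) == -1) {
        perror("dup2");
        return -1;
    }

    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: replace the process image with the test program. */
        execlp(prog, prog, (char *)NULL);
        perror("execlp");
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) == -1) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

`run_via_intermediate("./a.out", -1)` corresponds to the first one-liner and `run_via_intermediate("./a.out", 1023)` to the second. The dup2 to 1023 only works if the RLIMIT_NOFILE soft limit is above 1023.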
(Additional important detail: the fd 255 held open by bash is in close-on-exec mode.)