Bug 197751 - select() doesn't report EBADF for closed fd's >= 256 (or >= 64 if ptraced)
Summary: select() doesn't report EBADF for closed fd's >= 256 (or >= 64 if ptraced)
Status: NEW
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Other
Hardware: Intel Linux
Importance: P1 normal
Assignee: io_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-11-03 16:35 UTC by Zack Weinberg
Modified: 2017-11-03 20:02 UTC
CC List: 1 user

See Also:
Kernel Version: 4.13
Subsystem:
Regression: No
Bisected commit-id:


Attachments
test program (842 bytes, text/x-csrc)
2017-11-03 16:35 UTC, Zack Weinberg
test program using poll() instead of select() does not show the problem (1.24 KB, text/plain)
2017-11-03 17:17 UTC, Zack Weinberg

Description Zack Weinberg 2017-11-03 16:35:41 UTC
Created attachment 260497
test program

The attached test program should, according to my fairly well-informed reading of POSIX, print nothing and exit successfully, and that is what it does on FreeBSD (11.1) and NetBSD (7.1). However, when run on Linux (4.13/x86-64), it prints

    fd 256: select timed out

and exits unsuccessfully: the fact that descriptor 256 is closed does not trigger the expected EBADF failure. (N.B. the program will exit upon encountering the first closed file descriptor in the range [5, FD_SETSIZE) that doesn't cause select to fail with EBADF, but if you take the `return 1;` out, you will see that _all_ fds in the range [256, FD_SETSIZE) show the bug.)

Even wackier, if the program is being ptraced _at all_ (it suffices to do "gdb ./a.out" and then "r" -- no breakpoints need to be set or anything), the range of fds that show this effect changes from [256, FD_SETSIZE) to [64, FD_SETSIZE).

Test program originally found at https://stackoverflow.com/questions/47098097 and revised by me.
Comment 1 Zack Weinberg 2017-11-03 17:17:30 UTC
Created attachment 260499
test program using poll() instead of select() does not show the problem

This additional test program indicates this is a problem specific to select(); poll() will correctly return 1 with revents set to POLLNVAL for every file descriptor from 3 up to whatever the RLIMIT_NOFILE cap is.
Comment 2 Zack Weinberg 2017-11-03 20:00:19 UTC
A Stack Overflow commenter pointed out that the 256 vs. 64 thing isn't about ptrace at all. Rather, an interactive 'bash' process holds file descriptor 255 open on the terminal, so the kernel allocates space for 256 file descriptors in its directly forked children. But if one of those children calls execve, and the new process image then calls fork without opening any high-numbered fds (which is what happens in either the strace or gdb scenario), the grandchild gets only the minimum 64-entry fd table. The threshold for the bug is the size of the currently allocated fd table.

To demonstrate that this is indeed the issue, let the compiled first test program be 'a.out' and then run these perl one-liners:

    # the threshold drops to 64 with any intermediate process; ptrace isn't involved
    $ perl -le 'if (fork() == 0) { exec "./a.out" } else { wait; exit $?>>8 }'; echo $?
    fd 64: select timed out
    1

    # if the intermediate process opens a sufficiently high-numbered fd,
    # the bug doesn't manifest
    $ perl -le 'use POSIX "dup2"; dup2(0, 1023); if (fork() == 0) { exec "./a.out" } else { wait; exit $?>>8 }'; echo $?
    0
Comment 3 Zack Weinberg 2017-11-03 20:02:02 UTC
(Additional important detail: the fd 255 held open by bash is in close-on-exec mode.)
