Created attachment 93221 [details]
close.2.patch

Simply calling close() is not sufficient if a recv() or read() is blocked in another thread, because the blocked recv() or read() is not notified that the descriptor has been closed. That can only be done via shutdown(). This behaviour differs from Solaris, where a simple close() is sufficient to terminate a recv() or read() blocked in another thread. See the attached reproducer of this problem from the additional info.
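As a rough illustration of the shutdown() behaviour described above (this is not the attached reproducer), the following sketch blocks a thread in recv() and then calls shutdown() on the same socket from the calling thread. The names shutdown_wakes_recv(), reader_thread() and struct reader_args are made up for this example, and the use of an AF_UNIX socketpair() with a one-second sleep() to let the reader block are assumptions chosen to keep it short.

```c
#include <pthread.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper type for this sketch. */
struct reader_args {
    int fd;          /* socket the thread blocks on */
    ssize_t result;  /* value returned by recv() in the thread */
};

static void *reader_thread(void *arg)
{
    struct reader_args *ra = arg;
    char buf[64];

    /* Blocks: nothing is ever sent on the socket pair. */
    ra->result = recv(ra->fd, buf, sizeof(buf), 0);
    return NULL;
}

/* Returns what recv() returned in the reader thread, or -2 if setup failed. */
ssize_t shutdown_wakes_recv(void)
{
    int sv[2];
    pthread_t tid;
    struct reader_args ra;

    /* Assumption: a UNIX-domain stream socket pair stands in for the
       reporter's socket. */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -2;

    ra.fd = sv[0];
    ra.result = -2;
    if (pthread_create(&tid, NULL, reader_thread, &ra) != 0) {
        close(sv[0]);
        close(sv[1]);
        return -2;
    }

    sleep(1);                    /* crude: give the reader time to block */
    shutdown(sv[0], SHUT_RDWR);  /* wakes the blocked recv() */
    pthread_join(tid, NULL);

    /* Close only after the reader has returned. */
    close(sv[0]);
    close(sv[1]);
    return ra.result;
}
```

With this setup the recv() should return 0 (end of file) once shutdown() is called, so the thread can be joined before the descriptor is closed.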
Created attachment 93231 [details] close.2.reproducer.c
Returning to this ancient bug (and with apologies that I never followed up in the mail thread long ago). Here are some relevant discussions:

[1] https://lore.kernel.org/lkml/3B1E3D86.C7A7874@canal-plus.fr/
    From: thierry.lelegard@canal-plus.fr
    To: linux-kernel@vger.kernel.org
    Subject: PROBLEM: I/O system call never returns if file desc is closed in the meantime
    Date: Wed, 06 Jun 2001 16:26:14 +0200

[2] https://bugzilla.kernel.org/show_bug.cgi?id=546
    "Close notification in poll/select never arrives"
    Reported: 2003-04-07 05:38 UTC by Sampo Kellom
Created attachment 289009 [details] Reproducer using pipes (also allowing experiments with poll()/select())
The behavior you describe is not limited to sockets. The reproducer in comment 3 can be used to show the same thing happening with pipes. In this case, a blocked read() manages to return data that is written to the pipe after the read end of the pipe has been closed in a parallel thread.

    $ ./pipe_thread_close_read
    - starting parent
    17:22:32: Parent: sending data...
    - starting child
    Loop 1
    17:22:33: Child received: This is message 1
    Loop 2
    17:22:36: Thread: closing FD 3
    17:22:37: Parent: sending more data...
    - Parent about to wait on child
    17:22:37: Child received: Another message
    Loop 3
    read: Bad file descriptor
    - exiting child
    - wait completed in parent
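For readers without the attachment at hand, here is a minimal single-process sketch of the same experiment. It is not the attached program: the names read_after_parallel_close(), blocked_reader() and struct read_args are invented for this example, and the one-second sleep() is a crude assumption that the reader thread has blocked in read() by then.

```c
#include <pthread.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper type for this sketch. */
struct read_args {
    int fd;         /* read end of the pipe */
    ssize_t nread;  /* value returned by read() in the thread */
};

static void *blocked_reader(void *arg)
{
    struct read_args *ra = arg;
    char buf[64];

    /* Blocks: the pipe is empty when the thread starts. */
    ra->nread = read(ra->fd, buf, sizeof(buf));
    return NULL;
}

/* Returns what the blocked read() returned, or -2 if setup failed. */
ssize_t read_after_parallel_close(void)
{
    static const char msg[] = "Another message";
    int pfd[2];
    pthread_t tid;
    struct read_args ra;
    ssize_t nwritten;

    if (pipe(pfd) == -1)
        return -2;

    ra.fd = pfd[0];
    ra.nread = -2;
    if (pthread_create(&tid, NULL, blocked_reader, &ra) != 0) {
        close(pfd[0]);
        close(pfd[1]);
        return -2;
    }

    sleep(1);       /* crude: let the reader block in read() */
    close(pfd[0]);  /* close the read end under the blocked read() */

    /* Write only after the read end has been closed. */
    nwritten = write(pfd[1], msg, strlen(msg));

    /* Closing the write end guarantees the reader wakes up (data or EOF). */
    close(pfd[1]);
    pthread_join(tid, NULL);

    return nwritten == -1 ? -2 : ra.nread;
}
```

On Linux, consistent with the transcript above, the blocked read() is expected to return the bytes written after the close(). On systems where close() interrupts the blocked call, the result will differ, and the write() may even raise SIGPIPE.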
Created attachment 289011 [details]
Reproduced for writes to a pipe

I created a similar reproducer for the write() system call on a pipe.

    $ ./pipe_thread_close_write
    PID of child: 45196
    child started
    Child: about to do write 1
    Parent 0: read 10000 bytes (total 10000)
    Parent 1: read 10000 bytes (total 20000)
    Parent 2: read 10000 bytes (total 30000)
    Parent 3: read 10000 bytes (total 40000)
    Child: write 1 completed
    Child: about to do write 2
    Parent 4: read 10000 bytes (total 50000)
    Parent 5: read 10000 bytes (total 60000)
    Parent 6: read 10000 bytes (total 70000)
    Parent 7: read 10000 bytes (total 80000)
    Thread: closing FD 4
    Parent 8: read 10000 bytes (total 90000)
    Parent 9: read 10000 bytes (total 100000)
    Parent 10: read 10000 bytes (total 110000)
    Parent 11: read 10000 bytes (total 120000)
    Parent 12: read 10000 bytes (total 130000)
    Parent 13: read 10000 bytes (total 140000)
    Child: write 2 completed
    Child: about to do write 3
    write: Bad file descriptor
    Parent 14: read 10000 bytes (total 150000)
    Parent 15: read 10000 bytes (total 160000)
    Parent 16: read 10000 bytes (total 170000)
    Parent 17: read 10000 bytes (total 180000)
    Parent 18: read 10000 bytes (total 190000)
    Parent 19: read 10000 bytes (total 200000)
    Read returned 0
    Parent about to wait() on child
    - wait completed in parent
The opinions in [1] and [2] from comment 2 seem to be that applications that expect some different behavior here are broken. I tend to agree. And I don't think documenting shutdown() as a "solution" is the right thing to do. These applications are broken, and should be fixed. But the user-space programmer should at least be advised of what's going on. I added this text to the close(2) manual page:

    Furthermore, consider the following scenario where two threads are performing operations on the same file descriptor:

    1. One thread is blocked in an I/O system call on the file descriptor. For example, it is trying to write(2) to a pipe that is already full, or trying to read(2) from a stream socket which currently has no available data.

    2. Another thread closes the file descriptor.

    The behavior in this situation varies across systems. On some systems, when the file descriptor is closed, the blocking system call returns immediately with an error.

    On Linux (and possibly some other systems), the behavior is different: the blocking I/O system call holds a reference to the underlying open file description, and this reference keeps the description open until the I/O system call completes. (See open(2) for a discussion of open file descriptions.) Thus, the blocking system call in the first thread may successfully complete after the close(2) in the second thread.

I'll close this bug now.
Finally, I note a point for some future work. What happens if a thread is blocked in poll() or select() on a file descriptor that is closed in a parallel thread? The program in Comment 3 can be used to investigate this. From my experiments:

* select() remains blocked; presumably, the select() call becomes equivalent to:

      select(n, NULL, NULL, NULL, NULL);

  Matthias Ulrichs seems to have noted the same in a reply in the LKML discussion mentioned in Comment 2: https://lore.kernel.org/lkml/20010607080626.S21844@noris.de/

* poll() returns immediately, reporting POLLNVAL in 'revents'.

This difference is a little surprising, but is perhaps explicable in terms of the differences between the two APIs.
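One documented API difference that may be relevant shows up even without threads: the two calls report an invalid descriptor differently. The sketch below covers only the simpler case where the descriptor is already closed before the call, not the concurrent close described above. The function names poll_on_closed_fd() and select_on_closed_fd() are made up for this example.

```c
#include <errno.h>
#include <poll.h>
#include <sys/select.h>
#include <unistd.h>

/* Returns the revents that poll() reports for a descriptor closed
   before the call, or -1 on unexpected failure. */
int poll_on_closed_fd(void)
{
    int pfd[2];
    struct pollfd p;
    int n;

    if (pipe(pfd) == -1)
        return -1;
    close(pfd[0]);  /* pfd[0] is now an invalid descriptor number */

    p.fd = pfd[0];
    p.events = POLLIN;
    p.revents = 0;
    n = poll(&p, 1, 0);  /* zero timeout: do not block */

    close(pfd[1]);
    return n == 1 ? p.revents : -1;
}

/* Returns the errno from select() when the set contains a descriptor
   closed before the call, or 0 if select() did not fail. */
int select_on_closed_fd(void)
{
    int pfd[2];
    fd_set rfds;
    struct timeval tv = { 0, 0 };  /* zero timeout: do not block */
    int n, saved_errno;

    if (pipe(pfd) == -1)
        return -1;
    close(pfd[0]);

    FD_ZERO(&rfds);
    FD_SET(pfd[0], &rfds);
    n = select(pfd[0] + 1, &rfds, NULL, NULL, &tv);
    saved_errno = errno;

    close(pfd[1]);
    return n == -1 ? saved_errno : 0;
}
```

poll() has a per-descriptor way to flag an invalid entry (POLLNVAL in 'revents'), while select() can only fail the whole call with EBADF. Whether this fully accounts for the behavior seen when the close happens during the call is not something this sketch establishes.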