Created attachment 190501 [details]
Test program illustrating the problem

The Linux behaviour in the following scenario is incorrect:

1. ThreadA opens, binds, listens and accepts on a socket, waiting for connections.
2. Some time later ThreadB calls shutdown on the socket ThreadA is waiting in accept on.

Here is what happens:

On Linux, the shutdown call in ThreadB succeeds and the accept call in ThreadA returns with EINVAL.

On Solaris, the shutdown call in ThreadB fails and returns ENOTCONN. ThreadA continues to wait in accept.

Relevant POSIX manpages:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/accept.html
http://pubs.opengroup.org/onlinepubs/9699919799/functions/shutdown.html

The POSIX shutdown manpage says:

"The shutdown() function shall cause all or part of a full-duplex connection on the socket associated with the file descriptor socket to be shut down."
...
"[ENOTCONN] The socket is not connected."

Page 229 & 303 of "UNIX System V Network Programming" say:

"shutdown can only be called on sockets that have been previously connected"

"The socket [passed to accept that] fd refers to does not participate in the connection. It remains available to receive further connect indications"

That is pretty clear: sockets being waited on with accept are not connected by definition. Nor is the accept socket connected when a client connects to it; it is the socket returned by accept that is connected to the client. Therefore the Solaris behaviour of failing the shutdown call is correct.

In order to get the required behaviour of ThreadB causing ThreadA to exit the accept call with an error, the correct way is for ThreadB to call close on the socket that ThreadA is waiting on in accept.

On Solaris, calling close in ThreadB succeeds, and the accept call in ThreadA fails and returns EBADF.

On Linux, calling close in ThreadB succeeds but ThreadA continues to wait in accept until there is an incoming connection. That accept returns successfully. However, subsequent accept calls on the same socket return EBADF.

The Linux behaviour is fundamentally broken in three places:

1. Allowing shutdown to succeed on an unconnected socket is incorrect.
2. Returning a successful accept on a closed file descriptor is incorrect, especially as future accept calls on the same socket fail.
3. Once shutdown has been called on the socket, calling close on the socket fails with EBADF. That is incorrect: shutdown should just prevent further IO on the socket, it should not close it.
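For readers without the attachment, the shutdown half of the scenario can be sketched roughly as below. This is not the attached test program: the names struct outcome, struct waiter, thread_a and shutdown_while_accepting are made up for illustration, the delays are crude, and the caller plays the part of ThreadB. It only records what shutdown() and accept() return; which values come back depends on the platform, as described above. Build with -pthread where the toolchain requires it.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <poll.h>
#include <pthread.h>
#include <stdatomic.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical record of what each call returned. */
struct outcome {
    int shutdown_rc, shutdown_errno;
    int accept_rc, accept_errno;
};

/* State shared with ThreadA. */
struct waiter {
    int lfd;          /* listening socket */
    int rc, err;      /* accept() result and errno */
    atomic_int done;  /* set once accept() has returned */
};

/* poll() with no descriptors is used here purely as a millisecond sleep. */
static void sleep_ms(int ms) { poll(NULL, 0, ms); }

/* ThreadA: block in accept() on the listening socket. */
static void *thread_a(void *arg)
{
    struct waiter *w = arg;
    w->rc = accept(w->lfd, NULL, NULL);
    w->err = (w->rc < 0) ? errno : 0;
    atomic_store(&w->done, 1);
    return NULL;
}

/* The caller acts as ThreadB. Returns 0 if the experiment ran, -1 on setup failure. */
static int shutdown_while_accepting(struct outcome *out)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    struct waiter w = { .lfd = -1 };
    pthread_t tid;
    int client = -1;

    assert(out != NULL);
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* port 0: kernel picks one */

    w.lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (w.lfd < 0)
        return -1;
    if (bind(w.lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(w.lfd, 8) < 0 ||
        getsockname(w.lfd, (struct sockaddr *)&addr, &len) < 0 ||
        pthread_create(&tid, NULL, thread_a, &w) != 0) {
        close(w.lfd);
        return -1;
    }
    sleep_ms(200);  /* crude: give ThreadA time to block in accept() */

    out->shutdown_rc = shutdown(w.lfd, SHUT_RDWR);
    out->shutdown_errno = (out->shutdown_rc < 0) ? errno : 0;

    sleep_ms(200);
    if (!atomic_load(&w.done)) {
        /* accept() is still blocked (the Solaris outcome): connect so ThreadA can finish. */
        client = socket(AF_INET, SOCK_STREAM, 0);
        if (client >= 0)
            connect(client, (struct sockaddr *)&addr, sizeof(addr));
    }
    pthread_join(tid, NULL);

    out->accept_rc = w.rc;
    out->accept_errno = w.err;
    if (w.rc >= 0)
        close(w.rc);
    if (client >= 0)
        close(client);
    close(w.lfd);
    return 0;
}
```

Calling shutdown_while_accepting() from main() and printing the four fields should be enough to compare platforms. The close() variant is deliberately left out of this sketch.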
See also https://issues.apache.org/jira/browse/HADOOP-12487
If the listen()ing socket is checked with poll() before accept() then the behaviour is even more bizarre - poll() returns immediately with (POLLOUT|POLLWRBAND) set even when nothing is trying to connect to the listening socket. Any attempt to write() fails with ENOTCONN despite poll() just having said the socket is available for output. An accept() immediately after the poll() returns then waits until there's an actual incoming connection, even though poll() just said the socket was ready. On Solaris the poll() waits until there is an incoming connection on the socket and an accept() on the socket returns immediately with the new connection.
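The usual "poll() before accept()" pattern that this comment relies on looks roughly like the sketch below. It is not the extended test case from the attachment: open_listener and wait_for_connection are hypothetical helper names, and the sketch asks for POLLIN only, on the assumption that a pending connection is reported as readability. It does not try to reproduce the POLLOUT result described above.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: listening TCP socket on 127.0.0.1 with a kernel-chosen port.
 * On success the bound address (including the port) is stored in *addr. */
static int open_listener(struct sockaddr_in *addr)
{
    socklen_t len = sizeof(*addr);
    int fd;

    assert(addr != NULL);
    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(fd, (struct sockaddr *)addr, sizeof(*addr)) < 0 ||
        listen(fd, 8) < 0 ||
        getsockname(fd, (struct sockaddr *)addr, &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Wait up to timeout_ms for a pending connection on lfd.
 * Returns poll()'s return value and stores the reported events in *revents. */
static int wait_for_connection(int lfd, int timeout_ms, short *revents)
{
    struct pollfd pfd = { .fd = lfd, .events = POLLIN, .revents = 0 };
    int n;

    assert(revents != NULL);
    n = poll(&pfd, 1, timeout_ms);
    *revents = pfd.revents;
    return n;
}
```

A caller would invoke wait_for_connection() and call accept() only once POLLIN is reported; with that pattern the accept() is expected to return without blocking.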
Created attachment 190521 [details] Extended test case with poll()
I think Linux's behavior is more useful than Solaris' here. With Solaris' behavior, there is no way to break out of a blocking accept() other than closing the socket. As a real-world example, Hadoop uses shutdown() on sockets which are calling accept() to cause those accept()s to terminate. It would not be possible to use close() in this scenario, since there would be a race condition where the socket FD number could be reused by a newly opened FD between the call to close() and the call to accept(). I would also argue that POSIX doesn't forbid Linux's behavior, and that changing it now would be a compatibility problem.
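The descriptor reuse behind that race can be shown in a few lines. This is a single-threaded sketch with a made-up helper name (close_then_reopen); it assumes the usual rule that a newly opened descriptor gets the lowest free number. In the multithreaded case the second socket() call would come from an unrelated thread, between the close() and the other thread's next accept().

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Close fd, then open an unrelated socket and return its descriptor.
 * With lowest-free-number allocation and no other thread opening files,
 * the returned number is expected to equal the one just closed. */
static int close_then_reopen(int fd)
{
    assert(fd >= 0);
    close(fd);
    return socket(AF_INET, SOCK_STREAM, 0);
}
```

A thread still holding the old number would then call accept() on a different socket than it intended, which is the hazard being described.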
In the case of Hadoop, if you are closing the accept() socket why do you care if it gets reused anywhere else? The Hadoop code in question invalidates the object containing the filehandle, so the filehandle won't be used for any future IO anyway.

The POSIX spec says that shutdown() should return ENOTCONN if the socket is not connected, and sockets in accept() aren't ever connected, so POSIX does rule out the Linux behaviour. Having said that, I do agree that changing the Linux behaviour would most likely break existing software, in which case documenting the behaviour in the manpage may be the only option.

Having said all that, the Linux behaviour of close() and poll() on listen()ing sockets also seems incorrect. Yes, calling close() on a socket that's in accept() may well expose MT applications to potential races, but that's not a justification for the current close() behaviour, where the accept() continues after the close().