Bug 53781

Summary: [PATCH] close.2: Mention a need of shutdown before closing socket
Product: Documentation
Reporter: Peter Schiffer (pschiffe)
Component: man-pages
Assignee: documentation_man-pages (documentation_man-pages)
Status: RESOLVED CODE_FIX
Severity: normal
CC: lczerner, mtk.manpages
Priority: P1
Hardware: All
OS: Linux
URL: http://thread.gmane.org/gmane.linux.man/2075
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:
Attachments:
  close.2.patch
  close.2.reproducer.c
  Reproducer using pipes (also allowing experiments with poll()/select())
  Reproduced for writes to a pipe

Description Peter Schiffer 2013-02-13 13:49:59 UTC
Created attachment 93221 [details]
close.2.patch

Simply calling close() is not sufficient if a recv() or read() is
blocked in another thread, because the blocked recv() or read() is not
notified that the descriptor has been closed. Only shutdown() achieves
that.

This behaviour differs from Solaris, where a simple close() is
sufficient to terminate a recv() or read() blocked in another thread.

See the attached reproducer for this problem.
Comment 1 Peter Schiffer 2013-02-13 13:50:40 UTC
Created attachment 93231 [details]
close.2.reproducer.c
Comment 2 Michael Kerrisk 2020-05-08 15:19:27 UTC
Returning to this ancient bug (and with apologies that I never followed up in the mail thread long ago).

Here are some relevant discussions:

[1]
https://lore.kernel.org/lkml/3B1E3D86.C7A7874@canal-plus.fr/
From: thierry.lelegard@canal-plus.fr
To: linux-kernel@vger.kernel.org
Subject: PROBLEM: I/O system call never returns if file desc is closed in the  meantime
Date: Wed, 06 Jun 2001 16:26:14 +0200

[2]
https://bugzilla.kernel.org/show_bug.cgi?id=546
"Close notification in poll/select never arrives"
Reported: 2003-04-07 05:38 UTC by Sampo Kellom
Comment 3 Michael Kerrisk 2020-05-08 15:21:32 UTC
Created attachment 289009 [details]
Reproducer using pipes (also allowing experiments with poll()/select())
Comment 4 Michael Kerrisk 2020-05-08 15:26:45 UTC
The behavior you describe is not limited to sockets. The reproducer in comment 3 shows the same thing happening with pipes: a blocked read() returns data that is written to the pipe after the read end of the pipe has been closed in a parallel thread.

$ ./pipe_thread_close_read
 - starting parent
17:22:32: Parent: sending data...
 - starting child
Loop 1
17:22:33: Child received: This is message 1
Loop 2
17:22:36: Thread: closing FD 3
17:22:37: Parent: sending more data...
 - Parent about to wait on child
17:22:37: Child received: Another message
Loop 3
read: Bad file descriptor
 - exiting child
 - wait completed in parent
Comment 5 Michael Kerrisk 2020-05-08 15:30:11 UTC
Created attachment 289011 [details]
Reproduced for writes to a pipe

I created a similar reproducer for the write() system call on a pipe.

$ ./pipe_thread_close_write
PID of child: 45196
child started
Child: about to do write 1
	Parent 0: read 10000 bytes (total 10000)
	Parent 1: read 10000 bytes (total 20000)
	Parent 2: read 10000 bytes (total 30000)
	Parent 3: read 10000 bytes (total 40000)
Child: write 1 completed
Child: about to do write 2
	Parent 4: read 10000 bytes (total 50000)
	Parent 5: read 10000 bytes (total 60000)
	Parent 6: read 10000 bytes (total 70000)
	Parent 7: read 10000 bytes (total 80000)
Thread: closing FD 4
	Parent 8: read 10000 bytes (total 90000)
	Parent 9: read 10000 bytes (total 100000)
	Parent 10: read 10000 bytes (total 110000)
	Parent 11: read 10000 bytes (total 120000)
	Parent 12: read 10000 bytes (total 130000)
	Parent 13: read 10000 bytes (total 140000)
Child: write 2 completed
Child: about to do write 3
write: Bad file descriptor
	Parent 14: read 10000 bytes (total 150000)
	Parent 15: read 10000 bytes (total 160000)
	Parent 16: read 10000 bytes (total 170000)
	Parent 17: read 10000 bytes (total 180000)
	Parent 18: read 10000 bytes (total 190000)
	Parent 19: read 10000 bytes (total 200000)
	Read returned 0
Parent about to wait() on child
 - wait completed in parent
Comment 6 Michael Kerrisk 2020-05-08 15:35:25 UTC
The opinion in [1] and [2] from comment 2 seems to be that applications that expect different behavior here are broken. I tend to agree, and I don't think documenting shutdown() as a "solution" is the right thing to do: these applications are broken and should be fixed. But the user-space programmer should at least be advised of what's going on. I added this text to the close(2) manual page:

       Furthermore, consider the following scenario where two threads are
       performing operations on the same file descriptor:

       1. One thread is blocked  in  an  I/O  system  call  on  the  file
          descriptor.   For  example,  it is trying to write(2) to a pipe
          that is already full, or trying to read(2) from a stream socket
          which currently has no available data.

       2. Another thread closes the file descriptor.

       The  behavior  in  this  situation varies across systems.  On some
       systems, when the file descriptor is closed, the  blocking  system
       call returns immediately with an error.

       On Linux (and possibly some other systems), the behavior is
       different: the blocking I/O system call holds a reference to the
       underlying open file description, and this reference keeps the
       description open until the I/O system call completes.  (See
       open(2) for a discussion of open file descriptions.)  Thus, the
       blocking system call in the first thread may successfully complete
       after the close(2) in the second thread.

I'll close this bug now.
Comment 7 Michael Kerrisk 2020-05-08 15:43:40 UTC
Finally, I note a point for some future work. What happens if a thread is blocked in poll() or select() on a file descriptor that is closed in a parallel thread?

The program in Comment 3 can be used to investigate this. From my experiments:

* select() remains blocked; presumably, the select() call
  becomes equivalent to:

       select(n, NULL, NULL, NULL, NULL);

  Matthias Urlichs seems to have noted the same in a reply in the LKML
  discussion mentioned in Comment 2:
  https://lore.kernel.org/lkml/20010607080626.S21844@noris.de/

* poll() returns immediately, telling us POLLNVAL in the 'revents'.

This difference is a little surprising, but is perhaps explicable in terms of the differences between the two APIs.