Bug 9149 - accept() doesn't wake with error when socket descriptor closed
Summary: accept() doesn't wake with error when socket descriptor closed
Status: REJECTED WILL_NOT_FIX
Alias: None
Product: Networking
Classification: Unclassified
Component: IPV4
Hardware: All Linux
Importance: P1 normal
Assignee: Stephen Hemminger
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-10-12 07:42 UTC by Marek Kielar
Modified: 2007-10-29 22:55 UTC (History)
0 users

See Also:
Kernel Version: 2.6.18
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
demo program (1.48 KB, text/x-csrc)
2007-10-12 17:03 UTC, Stephen Hemminger

Description Marek Kielar 2007-10-12 07:42:05 UTC
Most recent kernel where this bug did not occur: no idea
Distribution: no idea - remote server
Hardware Environment: i686
Software Environment: no idea - probably pure console server

Problem Description:
In a multithreaded process, one thread calls accept() on a still-valid listening socket file descriptor sockfd and blocks in it. A second thread then calls close(sockfd). The first thread keeps waiting in accept() even though the descriptor is now invalid. accept() should wake up and return -1 with errno set to EBADF.

Steps to reproduce:
described above
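
The scenario can be sketched in C roughly as follows. This is an illustrative reconstruction, not the demo program attached to this bug. The names acceptor() and accept_woken_by_close() are hypothetical, and the one-second sleeps are a heuristic to let the first thread block in accept() before close() is called.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <pthread.h>
#include <stdatomic.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Set to 1 by the acceptor thread once accept() has returned. */
static atomic_int accept_returned;

/* Thread body: block in accept() on the listening descriptor. */
static void *acceptor(void *arg)
{
    int lfd = *(int *)arg;
    int cfd = accept(lfd, NULL, NULL);

    atomic_store(&accept_returned, 1);
    if (cfd >= 0)
        close(cfd);
    return NULL;
}

/*
 * Returns 1 if close() from another thread woke the blocked accept(),
 * 0 if accept() stayed blocked, and -1 if the setup failed.
 */
int accept_woken_by_close(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    pthread_t tid;
    int woken;
    int lfd = socket(AF_INET, SOCK_STREAM, 0);

    if (lfd < 0)
        return -1;

    /* Listen on an ephemeral loopback port and remember its address. */
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0) {
        close(lfd);
        return -1;
    }

    atomic_store(&accept_returned, 0);
    if (pthread_create(&tid, NULL, acceptor, &lfd) != 0) {
        close(lfd);
        return -1;
    }

    sleep(1);    /* heuristic: let the thread block in accept() */
    close(lfd);  /* the "second thread" closes the descriptor */
    sleep(1);    /* give accept() a chance to notice, if it ever would */
    woken = atomic_load(&accept_returned);

    if (!woken) {
        /* Release the thread: connect to the still-listening socket,
         * or cancel the thread if that fails. */
        int c = socket(AF_INET, SOCK_STREAM, 0);

        if (c < 0 || connect(c, (struct sockaddr *)&addr, sizeof addr) < 0)
            pthread_cancel(tid);
        if (c >= 0)
            close(c);
    }
    pthread_join(tid, NULL);
    return woken;
}
```

Called from main() and built with -pthread, the function is expected to return 0 on a kernel showing the reported behaviour: the thread only leaves accept() once the cleanup connection arrives.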
Comment 1 Anonymous Emailer 2007-10-12 10:06:54 UTC
Reply-To: akpm@linux-foundation.org

On Fri, 12 Oct 2007 07:42:06 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=9149
> 
>            Summary: accept() doesn't wake with error when socket descriptor
>                     closed
>            Product: Networking
>            Version: 2.5
>      KernelVersion: 2.6.18
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: IPV4
>         AssignedTo: shemminger@linux-foundation.org
>         ReportedBy: mkielar@go2.pl
> 
> 
> Most recent kernel where this bug did not occur: no idea
> Distribution: no idea - remote server
> Hardware Environment: i686
> Software Environment: no idea - probably pure console server
> 
> Problem Description:
> In multithreaded process, one thread launches accept() on a valid-so-far
> listening socket file descriptor sockfd and waits on it. After this second
> thread launches close( sockfd ). First thread further waits on accept() even
> though the descriptor is now invalid. accept() should wake up and return with
> -1 and errno EBADF.
> 

I have a feeling this is an FAQ, but I forget what the answer is?
Comment 2 Marek Kielar 2007-10-12 12:43:28 UTC
I searched and found no report mentioning "accept", so I assumed no such report existed.
Comment 3 Stephen Hemminger 2007-10-12 17:03:23 UTC
Created attachment 13140
demo program
Comment 4 Stephen Hemminger 2007-10-25 13:42:25 UTC
The answer is that although all threads share the same file table and socket,
in Linux the file handle is reference counted. Therefore when a thread calls
close(), it doesn't have any effect until the last thread sharing the same
file calls close.

Therefore the other threads waiting on accept() will work normally, and
can even accept new connections!
Comment 5 Marek Kielar 2007-10-25 19:31:01 UTC
1. Isn't it true that the reference count of the file descriptor IS NOT affected when a new thread is created that shares the same file descriptor table (not a copy, as with a new process, but exactly the same memory space)?

2. If threads B, C, D, etc. can do whatever they want while thread A calls close() on a socket file descriptor, and that call affects only thread A, then why can't threads B, C, D, etc. accept() connections after thread A calls close() on this socket?

Important note here on how it actually works now:
     a) if thread B calls accept() and thread A afterwards calls close() on the same socket file descriptor, thread B keeps waiting for incoming connections

     and AT THE SAME TIME (!):
     b) if thread A calls close() and thread B afterwards calls accept(), it returns immediately with errno EBADF

This means that if thread B runs a connection-accepting loop, it will accept ONLY ONE more connection after thread A calls close(): the accept() call that started before close() still completes, and every subsequent accept() call is refused with EBADF. This clearly contradicts the statement that "the other threads waiting on accept() will work normally". They will not; they will accept one connection and no more. Because of this inconsistency I am reopening the report. Read on for more explanation.

As I see it, the situation is this:
 ---- even though multiple threads work with the socket file descriptor, its reference count is still 1
 ---- accept() checks the validity of the file descriptor (the process's, not the thread's, reference count > 0) on entry and then waits
 ---- when close() is called, accept() SHOULD BE AWAKENED and check the validity once again, and go back to waiting if the file descriptor is still valid (it would be invalid if close() succeeded, though)

But instead accept() does not wake at close() to re-check validity; it wakes only on an incoming connection. That is buggy, because the program (or the system) could reopen the same file descriptor, even unintentionally, for a different connection, and in that situation the pending accept() would erroneously accept the new descriptor's incoming connections as if they were meant for it. Both programs might then misbehave: the first could conclude that the other side is violating the protocol, and the second would not get all of its connections, or might also conclude that the other side is violating the protocol. As a result, the other side might be told to resend the "erroneous" request, which would again be treated as erroneous, ending in an endless loop of repeated bad requests.
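
Two of the facts this comment relies on can be checked without any threads: accept() on an already-closed descriptor fails at once with EBADF (case b), and a closed descriptor number is handed out again by the next call that allocates a descriptor. The sketch below is illustrative only; the function names are hypothetical, and it assumes a single-threaded caller so that nothing else allocates descriptors in between. It does not exercise the blocked call of case (a).

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Case (b): call accept() on a descriptor that has already been closed.
 * Returns the errno reported by accept(), 0 if accept() unexpectedly
 * succeeded, or -1 if the socket could not be created.
 */
int accept_errno_after_close(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int c;

    if (fd < 0)
        return -1;
    close(fd);

    errno = 0;
    c = accept(fd, NULL, NULL);   /* fd is no longer in the descriptor table */
    if (c >= 0) {
        close(c);
        return 0;
    }
    return errno;
}

/*
 * Descriptor reuse: returns 1 if the number released by close() is
 * returned again by the next socket() call, 0 if not, -1 on failure.
 */
int descriptor_number_is_reused(void)
{
    int first = socket(AF_INET, SOCK_STREAM, 0);
    int second, reused;

    if (first < 0)
        return -1;
    close(first);

    /* New descriptors take the lowest free number, which is now `first`. */
    second = socket(AF_INET, SOCK_STREAM, 0);
    if (second < 0)
        return -1;
    reused = (second == first);
    close(second);
    return reused;
}
```

The second function shows only that the number is reused. Which open file a blocked accept() is attached to at that point is a separate question, addressed in the next comment.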
Comment 6 Stephen Hemminger 2007-10-29 22:55:53 UTC
Comment by Al Viro:

 Threads in his case _share_ descriptor
table and the thing he's complaining about is that another thread has
removed the descriptor from their (shared) descriptor table and he's
not getting notified.  It's not about struct file (or socket) at all;
it's all on descriptor level.

The reference to struct file is held by accept() itself, _not_ by descriptor
table.  And he would have the same problem if the opened socket had been
inherited from parent (and still opened by it) - it's really not about
the damn thing getting shut down, etc.

What happens is that there is a mapping from descriptors to opened files,
a reference to opened file is obtained by it once per syscall and that file
remains open at least until the end of syscall.   Whether the descriptor
you've passed remains referring to the same file is up to userland code.
If you have another thread and that thread rips the descriptor out of your
shared descriptor table, it's your responsibility to keep them sane and
happy.

close() from another thread is not a way to abort blocked accept().  Never
promised to be that.  Just as close() from another thread is not a way to
abort blocked write() or read() or sendmsg() or...
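
Since close() is not a cancellation mechanism, userland has to arrange its own. One common pattern, not proposed anywhere in this bug and shown only as a sketch, is to wait with poll() on both the listening socket and the read end of a pipe, and to write a byte to the pipe from the thread that wants the wait to stop. The names accept_or_cancel() and ACCEPT_CANCELLED are hypothetical.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical return code: the wait was cancelled through the pipe. */
#define ACCEPT_CANCELLED (-2)

/*
 * Wait until a connection is pending on listen_fd or until wake_fd
 * (the read end of a pipe) becomes readable. Returns the accepted
 * descriptor, ACCEPT_CANCELLED if woken through the pipe, or -1 on error.
 */
int accept_or_cancel(int listen_fd, int wake_fd)
{
    struct pollfd fds[2] = {
        { .fd = listen_fd, .events = POLLIN },
        { .fd = wake_fd,   .events = POLLIN },
    };

    for (;;) {
        if (poll(fds, 2, -1) < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }
        if (fds[1].revents)         /* wake-up byte, or pipe closed */
            return ACCEPT_CANCELLED;
        if (fds[0].revents & POLLIN)
            /* A connection is pending. A non-blocking listen_fd is
             * safer in case the peer vanishes before accept() runs. */
            return accept(listen_fd, NULL, NULL);
        if (fds[0].revents)         /* POLLERR, POLLHUP or POLLNVAL */
            return -1;
    }
}
```

The cancelling thread writes one byte to the write end of the pipe and closes the listening descriptor only after the accepting thread has returned from accept_or_cancel(), so the descriptor is never pulled out from under a call that is still using it.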
