Bug 8013 - select for write hangs on a socket after write returned ECONNRESET
Summary: select for write hangs on a socket after write returned ECONNRESET
Status: REJECTED INVALID
Alias: None
Product: Networking
Classification: Unclassified
Component: IPV4
Hardware: i386 Linux
Importance: P2 normal
Assignee: Stephen Hemminger
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-02-14 19:32 UTC by Andrew Dixie
Modified: 2007-02-22 10:23 UTC

See Also:
Kernel Version: 2.6.16
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
Sample program (3.23 KB, text/plain)
2007-02-14 19:33 UTC, Andrew Dixie

Description Andrew Dixie 2007-02-14 19:32:51 UTC
Distribution: Debian
Also reproduced on: 2.4 based Redhat.

Hardware Environment:
i686/Xeon
Problem Description:

If you write() to a disconnected socket, write() fails with ECONNRESET.
If you then select() on that socket, checking for writability, the select()
never returns.

For example from strace:
write(4, "fred", 4)                     = 4
...
write(4, "fred", 4)                     = -1 ECONNRESET (Connection reset by peer)
select(5, NULL, [4], NULL, NULL ... hung in select

The select documentation says "those in writefds will be watched to see if a
write will not block".
A write on this socket will not block (it fails immediately), so select
should return immediately.

When the program is run on Solaris, AIX and HP-UX, the select returns
immediately.
Comment 1 Andrew Dixie 2007-02-14 19:33:46 UTC
Created attachment 10424
Sample program
Comment 2 Stephen Hemminger 2007-02-22 10:23:36 UTC
After much discussion on the netdev mailing list, it was concluded that the
existing behaviour is correct.

See message from David Miller:

Oh is that the problem?  Someone sees a fatal connection error from
write() then attempts to poll() the socket?

That is illegal.

Socket is dead, you cannot do anything reasonable with it and you know
the socket is errored so there is nothing you can possibly try to
poll() on it for.

One should close() the file descriptor at this point.  Even
getpeername() cannot work at this point, since socket is closed and
has lost identity.

Socket errors are delivered as unique events, once error is delivered
the socket is not in error state any more, it is instead closed.
That's why we clear sk->sk_err after error delivery.

BTW, there was a query about this back in Feb. 2006 on linux-kernel,
nobody replied, he reposted to linux-net in September 2006 and this
is likely where this kernel bugzilla comes from :-)

This is not a kernel bug, let's close this and move on.
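
The handling recommended above (treat a fatal error from write() as final and
close() the descriptor rather than select() or poll() on it) might look roughly
like the following. This is only a minimal sketch, not the reporter's sample
program: the helper name send_or_close is hypothetical, a local socket pair
stands in for a real TCP connection, and a Unix-like platform is assumed.

```python
import socket


def send_or_close(sock: socket.socket, data: bytes) -> bool:
    """Hypothetical helper: send data, closing the socket on a fatal error.

    Returns True if all data was handed to the kernel, False if the
    connection turned out to be dead and the socket has been closed.
    """
    try:
        sock.sendall(data)
        return True
    except (ConnectionResetError, BrokenPipeError):
        # ECONNRESET / EPIPE: the error is delivered once, after which the
        # socket is dead. Do not select() or poll() on it again; close it.
        sock.close()
        return False


if __name__ == "__main__":
    # Assumption: socket.socketpair() is available (Unix-like platform).
    ours, peer = socket.socketpair()
    peer.close()  # simulate the peer going away
    delivered = send_or_close(ours, b"fred")
    print("delivered:", delivered, "fd after call:", ours.fileno())
```

The caller only needs the return value: once it is False the descriptor is
already closed and should be dropped from any select() set the program keeps.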
