In 2.6.35 and earlier, shutdown(2) will fully remove a socket. This does not appear to be true any more and is causing software to misbehave. 2.6.35: $ ./testcase parent: 5957 before: tcp 0 0 0.0.0.0:12345 0.0.0.0:* LISTEN after: child: 5961 $ ./testcase parent: 6001 before: tcp 0 0 0.0.0.0:12345 0.0.0.0:* LISTEN after: child: 6002 2.6.38: $ ./testcase parent: 1138 before: tcp 0 0 0.0.0.0:12345 0.0.0.0:* LISTEN after: child: 1142 $ ./testcase bind: Address already in use The listener doesn't show up in netstat any more, but as long as the child process is running, the socket is unavailable. It is as if the shutdown(2) behavior has partially reverted to close(2) behavior (but in the case of using close(2), the child's socket would remain visible in netstat).
Created attachment 53732 [details] test case showing the behavioral change
Forwarded from https://launchpad.net/bugs/731878
(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface). On Wed, 6 Apr 2011 22:42:38 GMT bugzilla-daemon@bugzilla.kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=32832 There is a tescase attached to this bugzilla report. > Summary: shutdown(2) does not fully shut down socket any more > Product: Networking > Version: 2.5 > Kernel Version: 2.6.38 > Platform: All > OS/Version: Linux > Tree: Mainline > Status: NEW > Severity: normal > Priority: P1 > Component: IPV4 > AssignedTo: shemminger@linux-foundation.org > ReportedBy: kees@outflux.net > Regression: Yes > > > In 2.6.35 and earlier, shutdown(2) will fully remove a socket. This does not > appear to be true any more and is causing software to misbehave. > > 2.6.35: > $ ./testcase > parent: 5957 > before: > tcp 0 0 0.0.0.0:12345 0.0.0.0:* LISTEN > after: > child: 5961 > $ ./testcase > parent: 6001 > before: > tcp 0 0 0.0.0.0:12345 0.0.0.0:* LISTEN > after: > child: 6002 > > 2.6.38: > $ ./testcase > parent: 1138 > before: > tcp 0 0 0.0.0.0:12345 0.0.0.0:* LISTEN > after: > child: 1142 > $ ./testcase > bind: Address already in use > > The listener doesn't show up in netstat any more, but as long as the child > process is running, the socket is unavailable. It is as if the shutdown(2) > behavior has partially reverted to close(2) behavior (but in the case of > using > close(2), the child's socket would remain visible in netstat). >
From: Andrew Morton <akpm@linux-foundation.org> Date: Tue, 12 Apr 2011 16:15:56 -0700 > > (switched to email. Please respond via emailed reply-to-all, not via the > bugzilla web interface). Stephen Hemminger forwarded this to the list last week, and Eric Dumazet is actively working on a fix.
On Tue, 12 Apr 2011 16:17:44 -0700 (PDT) David Miller <davem@davemloft.net> wrote: > From: Andrew Morton <akpm@linux-foundation.org> > Date: Tue, 12 Apr 2011 16:15:56 -0700 > > > > > (switched to email. Please respond via emailed reply-to-all, not via the > > bugzilla web interface). > > Stephen Hemminger forwarded this to the list last week, and Eric > Dumazet is actively working on a fix. OK. Please don't forward bugzilla reports! Instead do a reply-to-all and add the cc's. That way, bugzilla will capture the email discussion and you won't get akpmspammed. Sigh, kernel bugzilla is a PITA. Some people do seem to like and expect it though.
Le mardi 12 avril 2011 à 16:17 -0700, David Miller a écrit : > From: Andrew Morton <akpm@linux-foundation.org> > Date: Tue, 12 Apr 2011 16:15:56 -0700 > > > > > (switched to email. Please respond via emailed reply-to-all, not via the > > bugzilla web interface). > > Stephen Hemminger forwarded this to the list last week, and Eric > Dumazet is actively working on a fix. I worked on it this week end to discover FreeBSD 8.1 would not allow several CLOSE sockets bound to same port even with REUSEADDR. So haproxy claim is a bit wrong (its trick doesnt work on FreeBSD), and used an undocumented linux feature. I feel this case is a call for SO_REUSEPORT, eventually. http://www.unixguide.net/network/socketfaq/4.11.shtml SO_REUSEADDR allows your server to bind to an address which is in a TIME_WAIT state. It does not allow more than one server to bind to the same address. It was mentioned that use of this flag can create a security risk because another server can bind to a the same port, by binding to a specific address as opposed to INADDR_ANY. The SO_REUSEPORT flag allows multiple processes to bind to the same address provided all of them use the SO_REUSEPORT option. Since SO_REUSEPORT is not a 'stable fix', I suggest we revert the patch, and eventually work on SO_REUSEPORT on net-next-2.6 What do you think ?
Le mercredi 13 avril 2011 à 04:55 +0200, Eric Dumazet a écrit : > Since SO_REUSEPORT is not a 'stable fix', I suggest we revert the patch, > and eventually work on SO_REUSEPORT on net-next-2.6 > > What do you think ? > Sorry, I should have mentioned commit id : c191a836a908d1dd6 (tcp: disallow bind() to reuse addr/port)
Reply-To: cyril.bonte@free.fr Le mercredi 13 avril 2011 04:55:27, Eric Dumazet a écrit : > I worked on it this week end to discover FreeBSD 8.1 would not allow > several CLOSE sockets bound to same port even with REUSEADDR. Just to complete the information, yes it does, but only after a shutdown() call. And this is the use case of haproxy, amavisd (quoted in the bugzilla bug report), and others. > So haproxy claim is a bit wrong (its trick doesnt work on FreeBSD), and > used an undocumented linux feature. Both test cases (the one I provided to explain the haproxy issue and the one provided by Kees) are not about binding 2 sockets at the same time but binding a new socket after the first one has been shutdown. Sadly this also looks undocumented on FreeBSD (only saw a reference on it in a code comment). > Since SO_REUSEPORT is not a 'stable fix', I suggest we revert the patch, > and eventually work on SO_REUSEPORT on net-next-2.6 > > What do you think ? Agree. Many thanks for the time you already spent on that.
Le mercredi 13 avril 2011 à 09:06 +0200, Cyril Bonté a écrit : > Le mercredi 13 avril 2011 04:55:27, Eric Dumazet a écrit : > > I worked on it this week end to discover FreeBSD 8.1 would not allow > > several CLOSE sockets bound to same port even with REUSEADDR. > > Just to complete the information, yes it does, but only after a shutdown() > call. And this is the use case of haproxy, amavisd (quoted in the bugzilla > bug > report), and others. Yes, but after a shutdown(), FreeBSD doesnt allow a reuse of the socket. listen() is not available anymore. Its a bit like an unbind, or a full close().
Reply-To: dbaluta@ixiacom.com Hi Eric, Cyril, >> Since SO_REUSEPORT is not a 'stable fix', I suggest we revert the patch, >> and eventually work on SO_REUSEPORT on net-next-2.6 >> >> What do you think ? >> > > Sorry, I should have mentioned commit id : c191a836a908d1dd6 > (tcp: disallow bind() to reuse addr/port) Cyril's use case looks suspect. I don't think that this is a good reason for reverting this commit. I would love to help on adding SO_REUSEPORT option in kernel. thanks, Daniel.
From: Daniel Baluta <dbaluta@ixiacom.com> Date: Wed, 13 Apr 2011 14:57:18 +0300 > Cyril's use case looks suspect. I don't think that this is a good > reason for reverting this commit. I complete disagree. Something that worked perfectly fine, probably for years, we broke. We simply cannot do that, especially since we do not have a reasonable alternative at this time. Adding SO_REUSEPORT is a long range option, and not something that will provide a fix for users right now. So please don't even pretend to suggest that we shouldn't fix this with a revert unless a simple, obvious, kernel fix presents itself.
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Wed, 13 Apr 2011 05:00:08 +0200 > Le mercredi 13 avril 2011 à 04:55 +0200, Eric Dumazet a écrit : > >> Since SO_REUSEPORT is not a 'stable fix', I suggest we revert the patch, >> and eventually work on SO_REUSEPORT on net-next-2.6 >> >> What do you think ? >> > > Sorry, I should have mentioned commit id : c191a836a908d1dd6 > (tcp: disallow bind() to reuse addr/port) I'm commiting the revert as follows to net-2.6, and will queue it up for -stable as well: -------------------- Revert "tcp: disallow bind() to reuse addr/port" This reverts commit c191a836a908d1dd6b40c503741f91b914de3348. It causes known regressions for programs that expect to be able to use SO_REUSEADDR to shutdown a socket, then successfully rebind another socket to the same ID. Programs such as haproxy and amavisd expect this to work. This should fix kernel bugzilla 32832. Signed-off-by: David S. Miller <davem@davemloft.net> --- net/ipv4/inet_connection_sock.c | 5 ++--- net/ipv6/inet6_connection_sock.c | 2 +- 2 files changed, 3 insertions(+), 4 deletions(-) diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index 6c0b7f4..38f23e7 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -73,7 +73,7 @@ int inet_csk_bind_conflict(const struct sock *sk, !sk2->sk_bound_dev_if || sk->sk_bound_dev_if == sk2->sk_bound_dev_if)) { if (!reuse || !sk2->sk_reuse || - ((1 << sk2->sk_state) & (TCPF_LISTEN | TCPF_CLOSE))) { + sk2->sk_state == TCP_LISTEN) { const __be32 sk2_rcv_saddr = sk_rcv_saddr(sk2); if (!sk2_rcv_saddr || !sk_rcv_saddr(sk) || sk2_rcv_saddr == sk_rcv_saddr(sk)) @@ -122,8 +122,7 @@ again: (tb->num_owners < smallest_size || smallest_size == -1)) { smallest_size = tb->num_owners; smallest_rover = rover; - if (atomic_read(&hashinfo->bsockets) > (high - low) + 1 && - !inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb)) { + if (atomic_read(&hashinfo->bsockets) > (high - low) + 1) { spin_unlock(&head->lock); snum = smallest_rover; goto have_snum; diff --git a/net/ipv6/inet6_connection_sock.c b/net/ipv6/inet6_connection_sock.c index 1660546..f2c5b0f 100644 --- a/net/ipv6/inet6_connection_sock.c +++ b/net/ipv6/inet6_connection_sock.c @@ -44,7 +44,7 @@ int inet6_csk_bind_conflict(const struct sock *sk, !sk2->sk_bound_dev_if || sk->sk_bound_dev_if == sk2->sk_bound_dev_if) && (!sk->sk_reuse || !sk2->sk_reuse || - ((1 << sk2->sk_state) & (TCPF_LISTEN | TCPF_CLOSE))) && + sk2->sk_state == TCP_LISTEN) && ipv6_rcv_saddr_equal(sk, sk2)) break; }
Reply-To: shemminger@vyatta.com On Wed, 13 Apr 2011 10:43:17 -0700 (PDT) David Miller <davem@davemloft.net> wrote: > From: Daniel Baluta <dbaluta@ixiacom.com> > Date: Wed, 13 Apr 2011 14:57:18 +0300 > > > Cyril's use case looks suspect. I don't think that this is a good > > reason for reverting this commit. > > I complete disagree. > > Something that worked perfectly fine, probably for years, we broke. > > We simply cannot do that, especially since we do not have a reasonable > alternative at this time. > > Adding SO_REUSEPORT is a long range option, and not something that > will provide a fix for users right now. > > So please don't even pretend to suggest that we shouldn't fix this > with a revert unless a simple, obvious, kernel fix presents itself. Just to echo what Dave said. Even though the semantics of this is not documented in some standard, applications have been built on Linux expecting a certain behavior. If you want to change what happens in this case, you have to have a really good reason (like crash, security hole, or standards violation).
Le mercredi 13 avril 2011 à 12:09 -0700, David Miller a écrit : > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Wed, 13 Apr 2011 05:00:08 +0200 > > > Le mercredi 13 avril 2011 à 04:55 +0200, Eric Dumazet a écrit : > > > >> Since SO_REUSEPORT is not a 'stable fix', I suggest we revert the patch, > >> and eventually work on SO_REUSEPORT on net-next-2.6 > >> > >> What do you think ? > >> > > > > Sorry, I should have mentioned commit id : c191a836a908d1dd6 > > (tcp: disallow bind() to reuse addr/port) > > I'm commiting the revert as follows to net-2.6, and will queue > it up for -stable as well: > > -------------------- > Revert "tcp: disallow bind() to reuse addr/port" > > This reverts commit c191a836a908d1dd6b40c503741f91b914de3348. > > It causes known regressions for programs that expect to be able to use > SO_REUSEADDR to shutdown a socket, then successfully rebind another > socket to the same ID. > > Programs such as haproxy and amavisd expect this to work. > > This should fix kernel bugzilla 32832. > > Signed-off-by: David S. Miller <davem@davemloft.net> > --- Thanks David, I was planning to submit this revert this morning, you beat me ;)
On Wed, Apr 13, 2011 at 04:55:27AM +0200, Eric Dumazet wrote: > Le mardi 12 avril 2011 à 16:17 -0700, David Miller a écrit : > > From: Andrew Morton <akpm@linux-foundation.org> > > Date: Tue, 12 Apr 2011 16:15:56 -0700 > > > > > > > > (switched to email. Please respond via emailed reply-to-all, not via the > > > bugzilla web interface). > > > > Stephen Hemminger forwarded this to the list last week, and Eric > > Dumazet is actively working on a fix. > > I worked on it this week end to discover FreeBSD 8.1 would not allow > several CLOSE sockets bound to same port even with REUSEADDR. > > So haproxy claim is a bit wrong (its trick doesnt work on FreeBSD), and > used an undocumented linux feature. > > I feel this case is a call for SO_REUSEPORT, eventually. > > http://www.unixguide.net/network/socketfaq/4.11.shtml > > SO_REUSEADDR allows your server to bind to an address which is in a > TIME_WAIT state. It does not allow more than one server to bind to > the same address. It was mentioned that use of this flag can create a > security risk because another server can bind to a the same port, by > binding to a specific address as opposed to INADDR_ANY. The > SO_REUSEPORT flag allows multiple processes to bind to the same > address provided all of them use the SO_REUSEPORT option. > > > Since SO_REUSEPORT is not a 'stable fix', I suggest we revert the patch, > and eventually work on SO_REUSEPORT on net-next-2.6 > > What do you think ? Not entirely related, but FWIW I think that SO_REUSEPORT would be rather useful for haproxy. I've been working on allowing haproxy to be reconfigured without dropping or reusing connections. I have done this by re-using open sockets. But it would have been rather a lot easier to achieve using SO_REUSEPORT - assuming that I understand SO_REUSEPORT correctly.
2.6.38 behaviour is not only different from previous user-space ABI, but also inconsistent. If port is not specified in bind(), i.e. port is selected by kernel, then it is possible to bind another socket to the port reported by getsockname() after listen-and-shutdown of the first socket (both sockets have SO_REUSEADDR set).
A patch referencing this bug report has been merged in v2.6.39-rc5: commit 3e8c806a08c7beecd972e7ce15c570b9aba64baa Author: David S. Miller <davem@davemloft.net> Date: Wed Apr 13 12:01:14 2011 -0700 Revert "tcp: disallow bind() to reuse addr/port"
How can I get the bug code and the fixed code?