Setup: a TCP connection in the ESTABLISHED state. The local socket calls shutdown(SHUT_RDWR). After that, the peer calls shutdown(SHUT_RDWR).

The local socket should now be in the TIME_WAIT state (from the specification's point of view). And it is indeed in TIME_WAIT (TCP_TIME_WAIT) if we look at /proc/net/tcp (or netstat -t). However, if one queries the connection state via tcp_info (getsockopt(TCP_INFO)), the reported state is CLOSED (TCP_CLOSE).

The problem appears to be in the tcp_time_wait() function (net/ipv4/tcp_minisocks.c). It is called with state=TCP_TIME_WAIT and sets the tw_state field of the struct inet_timewait_sock *tw to TCP_TIME_WAIT, which is why the state is reported correctly in /proc. At the end, however, it calls tcp_done(sk), which in turn calls tcp_set_state(sk, TCP_CLOSE), so sk->sk_state is set to TCP_CLOSE instead of TCP_TIME_WAIT, and that is the value the TCP_INFO socket option reports.

The problem reproduces on 2.6.26 and 2.6.38 and is probably present on earlier kernels as well.
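The scenario in the report can be tried from user space with a short loopback script. This is only a sketch under stated assumptions: it is Linux-specific, it assumes tcpi_state is the first byte of struct tcp_info, and the helper names (tcpi_state, state_after_mutual_shutdown) and the 0.2 s settle delay are made up for illustration. The state numbers follow include/net/tcp_states.h.

```python
import socket
import time

# State numbers as defined in the kernel's include/net/tcp_states.h.
TCP_STATES = {
    1: "TCP_ESTABLISHED", 2: "TCP_SYN_SENT", 3: "TCP_SYN_RECV",
    4: "TCP_FIN_WAIT1", 5: "TCP_FIN_WAIT2", 6: "TCP_TIME_WAIT",
    7: "TCP_CLOSE", 8: "TCP_CLOSE_WAIT", 9: "TCP_LAST_ACK",
    10: "TCP_LISTEN", 11: "TCP_CLOSING",
}


def tcpi_state(sock):
    """Return tcpi_state, assumed to be the first byte of struct tcp_info."""
    info = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 104)
    return info[0]


def state_after_mutual_shutdown(settle=0.2):
    """Run the reported sequence on loopback.

    Returns (state before shutdown, state after both shutdowns, local port).
    The settle delay is an arbitrary pause to let the FIN exchange finish.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    local = socket.create_connection(listener.getsockname())
    peer, _ = listener.accept()
    try:
        port = local.getsockname()[1]
        before = tcpi_state(local)
        local.shutdown(socket.SHUT_RDWR)  # local side sends its FIN first
        time.sleep(settle)
        peer.shutdown(socket.SHUT_RDWR)   # peer then sends its own FIN
        time.sleep(settle)
        after = tcpi_state(local)         # descriptor is still open here
        return before, after, port
    finally:
        local.close()
        peer.close()
        listener.close()


if __name__ == "__main__":
    before, after, port = state_after_mutual_shutdown()
    print("local port:", port)
    print("before:", TCP_STATES.get(before, before))
    print("after: ", TCP_STATES.get(after, after))
```

If the behavior matches the report, the "after" line should show TCP_CLOSE while netstat -t still lists the same local port in TIME_WAIT for a while.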
(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface).

On Mon, 25 Apr 2011 08:08:36 GMT bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=33902
>
>            Summary: tcpi_state field in tcp_info structure reports
>                     TCP_CLOSE instead of TCP_TIME_WAIT state
>            Product: Networking
>            Version: 2.5
>     Kernel Version: 2.6.38
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: IPV4
>         AssignedTo: shemminger@linux-foundation.org
>         ReportedBy: Dmitry.Izbitsky@oktetlabs.ru
>         Regression: No
From: Andrew Morton <akpm@linux-foundation.org>
Date: Mon, 25 Apr 2011 14:34:21 -0700

> On Mon, 25 Apr 2011 08:08:36 GMT
> bugzilla-daemon@bugzilla.kernel.org wrote:
>
>> Setup - TCP connection in ESTABLISHED state. Local socket calls
>> shutdown(SHUT_RDWR). After that peer calls shutdown(SHUT_RDWR).
>>
>> Local socket should now be in TIME_WAIT state (from specification point
>> of view). And it's indeed in TIME_WAIT (TCP_TIME_WAIT) state if we look at
>> /proc/net/tcp (or netstat -t). However, if one tries to get connection state
>> via tcp_info (getsockopt(TCP_INFO)) the reported state is CLOSED
>> (TCP_CLOSE).
>>
>> Looks like the problem is in tcp_time_wait() function
>> (net/ipv4/tcp_minisocks.c).
>> It's called with state=TCP_TIME_WAIT, and sets inet_timewait_sock
>> *tw->tw_state field to TCP_TIME_WAIT. That's why the state is reported
>> correctly when looking into /proc. However, at the end it calls
>> tcp_done(sk), which itself calls tcp_set_state(TCP_CLOSE), so
>> sk->sk_state is set to TCP_CLOSE instead of TCP_TIME_WAIT. And it's
>> reported this way via TCP_INFO socket option.
>>
>> Problem is reproduced on 2.6.26, 2.6.38 and is probably observed on earlier
>> kernels.

As far as the user side of the socket is concerned, it is TCP_CLOSE.

For timewait connections we create a completely separate light-weight object to manage the network-side visible state of the TCP flow. This object is not accessible from, and is entirely different from, the heavy-weight full socket we keep around until the user gives up his final reference.

So I do not see this behavior changing. It would be quite invasive and expensive to make this work as you expect, and only for marginal gain.
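To see the network-side state that the light-weight timewait object carries, one can read /proc/net/tcp rather than TCP_INFO. The following is a minimal sketch, not kernel code: the function name proc_tcp_states and the SAMPLE row are invented for illustration, and it relies only on the documented column layout (second column is the hex local address:port, fourth column is the hex state).

```python
# State numbers as defined in the kernel's include/net/tcp_states.h.
STATE_NAMES = {
    1: "TCP_ESTABLISHED", 2: "TCP_SYN_SENT", 3: "TCP_SYN_RECV",
    4: "TCP_FIN_WAIT1", 5: "TCP_FIN_WAIT2", 6: "TCP_TIME_WAIT",
    7: "TCP_CLOSE", 8: "TCP_CLOSE_WAIT", 9: "TCP_LAST_ACK",
    10: "TCP_LISTEN", 11: "TCP_CLOSING",
}

# Made-up sample in /proc/net/tcp layout: 127.0.0.1:8080 in state 06.
SAMPLE = "\n".join([
    "  sl  local_address rem_address   st tx_queue rx_queue tr tm->when "
    "retrnsmt   uid  timeout inode",
    "   0: 0100007F:1F90 0100007F:D431 06 00000000:00000000 03:00000A5B "
    "00000000     0        0 0 3 0000000000000000",
])


def proc_tcp_states(local_port, text):
    """Return the state names of all rows whose local port matches."""
    found = []
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        port = int(fields[1].rsplit(":", 1)[1], 16)  # hex port after ':'
        if port == local_port:
            state = int(fields[3], 16)               # 'st' column, hex
            found.append(STATE_NAMES.get(state, fields[3]))
    return found


if __name__ == "__main__":
    print(proc_tcp_states(8080, SAMPLE))
    # On a live Linux system, pass the real table instead:
    #   with open("/proc/net/tcp") as f:
    #       print(proc_tcp_states(8080, f.read()))
```

Comparing this result with tcpi_state for the same local port shows the split described above: the /proc row reflects the timewait object, while TCP_INFO reflects the full socket still held by the application.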