Bug 18952
Summary: | The amount of SYN retries is not equal to /proc/sys/net/ipv4/tcp_syn_retries | ||
---|---|---|---|
Product: | Networking | Reporter: | Yuri Chislov (ychislov) |
Component: | IPV4 | Assignee: | Stephen Hemminger (stephen) |
Status: | RESOLVED OBSOLETE | ||
Severity: | normal | CC: | alan, ole |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.32.12, 2.6.32.15, 2.6.35.4 | Subsystem: | |
Regression: | Yes | Bisected commit-id: |
Description
Yuri Chislov
2010-09-22 08:50:11 UTC
(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface). On Wed, 22 Sep 2010 08:50:12 GMT bugzilla-daemon@bugzilla.kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=18952 > > Summary: The mount of SYN retries is not equal to > /proc/sys/net/ipv4/tcp_syn_retries > Product: Networking > Version: 2.5 > Kernel Version: 2.6.32.12, 2.6.32.15, 2.6.35.4 > Platform: All > OS/Version: Linux > Tree: Mainline > Status: NEW > Severity: normal > Priority: P1 > Component: IPV4 > AssignedTo: shemminger@linux-foundation.org > ReportedBy: yuri@itinteg.net > Regression: No > > > When setting a value in /proc/sys/net/ipv4/tcp_syn_retries, the actual > number > of syn retries is the number set in /proc/sys/net/ipv4/tcp_syn_retries minus > 2. > > For example: > When setting /proc/sys/net/ipv4/tcp_syn_retries to 5 the actual number of SYN > retries is 3. > When setting /proc/sys/net/ipv4/tcp_syn_retries to 7 the actual number of SYN > retries is 5. > However, when setting setting /proc/sys/net/ipv4/tcp_syn_retries to 2 the > actual number of SYN retries is 2. > > Note: In the kernel 2.6.31.9 actual number of SYN = tcp_syn_retries + 1 > > > Steps to reproduce: > sudo iptables -I INPUT 1 -i lo -p tcp --dport 88 -j DROP > sudo tcpdump -n -i lo -v tcp port 88 > > from another console run: > time wget -t 1 -O - --connect-timeout=300 http://0:88 > > tcpdump output: > sudo tcpdump -n -i lo -v tcp port 88 > 11:29:39.820058 IP (tos 0x0, ttl 64, id 14664, offset 0, flags [DF], proto > TCP > (6), length 60) > 127.0.0.1.43730 > 127.0.0.1.kerberos: Flags [S], cksum 0xfe30 (incorrect > -> > 0xecf4), seq 1012617667, win 32792, options [mss 16396,sackOK,TS val 12871819 > ecr 0,nop,wscale 7], length 0 > 11:29:42.824091 IP (tos 0x0, ttl 64, id 14665, offset 0, flags [DF], proto > TCP > (6), length 60) > 127.0.0.1.43730 > 127.0.0.1.kerberos: Flags [S], cksum 0xfe30 (incorrect > -> > 0xe137), seq 1012617667, win 32792, options [mss 16396,sackOK,TS val 12874824 > ecr 0,nop,wscale 7], length 0 > 11:29:48.832153 IP (tos 0x0, ttl 64, id 14666, offset 0, flags [DF], proto > TCP > (6), length 60) > 127.0.0.1.43730 > 127.0.0.1.kerberos: Flags [S], cksum 0xfe30 (incorrect > -> > 0xc9bf), seq 1012617667, win 32792, options [mss 16396,sackOK,TS val 12880832 > ecr 0,nop,wscale 7], length 0 > > wget output: > time wget -t 1 -O - --connect-timeout=300 http://0:88 > Resolving 0... 0.0.0.0 > Connecting to 0|0.0.0.0|:88... failed: Connection timed out. > Giving up. > > > real 0m21.050s > user 0m0.003s > sys 0m0.004s > > It valid for remote host also. > tcp_syn_retries is not an exact calculation. It is input into a calculation which estimates how long that many retransmits (with suitable backoff applied) will take, and that time estimte in turn determines the time limit for when we'll kill the connection attempt. Feel free to update the documentation in Documentation/networking/ip-sysctl.txt to more closely match the behavior. The logic is in net/ipv4/tcp_timer.c:retransmits_timed_out(). From: Yuri Chislov <yuri@itinteg.net> Date: Mon, 27 Sep 2010 10:07:06 +0200 > It looks like the behavior changed in 2.6.32. 2.6.32 and up, uses some > calculation instead of a direct definition of the retries number, that makes > it > harder to achieve the necessary system behavior. > > The default behavior of the system changed completely > (the old default connect timeout was ~ 180 seconds, while the new one is ~21 > sec). > > The new behavior invalidates the kernel documentation and tcp man page. > > It's not possible to set a connect timeout > 25 sec in the applications while > using the default values in /proc. > > From my view point is regression. Agreed, Damian you have to fix this. Otherwise I'm reverting all of your Revert Backoff commits. It looks like the behavior changed in 2.6.32. 2.6.32 and up, uses some
calculation instead of a direct definition of the retries number, that makes it
harder to achieve the necessary system behavior.
The default behavior of the system changed completely
(the old default connect timeout was ~ 180 seconds, while the new one is ~21
sec).
The new behavior invalidates the kernel documentation and tcp man page.
It's not possible to set a connect timeout > 25 sec in the applications while
using the default values in /proc.
From my view point is regression.
On Saturday, September 25, 2010 05:05:57 am David Miller wrote:
> tcp_syn_retries is not an exact calculation.
>
> It is input into a calculation which estimates how long that many
> retransmits (with suitable backoff applied) will take, and that time
> estimte in turn determines the time limit for when we'll kill the
> connection attempt.
>
> Feel free to update the documentation in
> Documentation/networking/ip-sysctl.txt to more closely match the
> behavior.
>
> The logic is in net/ipv4/tcp_timer.c:retransmits_timed_out().
Reply-To: damian@tvk.rwth-aachen.de Hi David, give me some time, please. I'll take a closer look in the evening. Regards Damian Am Montag, den 27.09.2010, 01:10 -0700 schrieb David Miller: > From: Yuri Chislov <yuri@itinteg.net> > Date: Mon, 27 Sep 2010 10:07:06 +0200 > > > It looks like the behavior changed in 2.6.32. 2.6.32 and up, uses some > > calculation instead of a direct definition of the retries number, that > makes it > > harder to achieve the necessary system behavior. > > > > The default behavior of the system changed completely > > (the old default connect timeout was ~ 180 seconds, while the new one is > ~21 > > sec). > > > > The new behavior invalidates the kernel documentation and tcp man page. > > > > It's not possible to set a connect timeout > 25 sec in the applications > while > > using the default values in /proc. > > > > From my view point is regression. > > Agreed, Damian you have to fix this. > > Otherwise I'm reverting all of your Revert Backoff commits. Reply-To: damian@tvk.rwth-aachen.de Ok, I see. When I read your mail this morning, I was afraid, that this is another bug in the calculation routine. However, the routine seems ok to me. The problem is the high discrepancy between the RTO_MIN-based calculation and TCP_TIMEOUT_INIT-based actual backoff in the SYN-case. Yuris example did not reveal a conceptually new bug, as the same behaviour can be observed on pre 2.6.32 kernels at higher values. I tested wget on a 2.6.26 kernel with timeout values of 300 and above. All runs did time out after 189 seconds, with the sysctl-value as their hard limit. My suggestion for solving this issue: Introducing a third boolean parameter for retransmits_timed_out() indicating whether the socket is in SYN state or not. In the SYN case, TCP_TIMEOUT_INIT will be used for the calculation instead of TCP_RTO_MIN. Is that ok? Regards Damian Am Montag, den 27.09.2010, 01:10 -0700 schrieb David Miller: > From: Yuri Chislov <yuri@itinteg.net> > Date: Mon, 27 Sep 2010 10:07:06 +0200 > > > It looks like the behavior changed in 2.6.32. 2.6.32 and up, uses some > > calculation instead of a direct definition of the retries number, that > makes it > > harder to achieve the necessary system behavior. > > > > The default behavior of the system changed completely > > (the old default connect timeout was ~ 180 seconds, while the new one is > ~21 > > sec). > > > > The new behavior invalidates the kernel documentation and tcp man page. > > > > It's not possible to set a connect timeout > 25 sec in the applications > while > > using the default values in /proc. > > > > From my view point is regression. > > Agreed, Damian you have to fix this. > > Otherwise I'm reverting all of your Revert Backoff commits. From: Damian Lukowski <damian@tvk.rwth-aachen.de> Date: Mon, 27 Sep 2010 22:00:08 +0200 > My suggestion for solving this issue: > Introducing a third boolean parameter for retransmits_timed_out() > indicating whether the socket is in SYN state or not. In the SYN case, > TCP_TIMEOUT_INIT will be used for the calculation instead of > TCP_RTO_MIN. > > Is that ok? Sounds fine to me, please prepare a patch. Thanks! What is advantage in the replace compare by calculation? Please clarify, if possible. Thanks. Yuri. --- linux-2.6.31.14/net/ipv4/tcp_timer.c 2010-07-05 17:11:43.000000000 +0000 +++ linux-2.6.32.15/net/ipv4/tcp_timer.c 2010-06-01 16:56:03.000000000 +0000 @@ -137,13 +137,14 @@ { struct inet_connection_sock *icsk = inet_csk(sk); int retry_until; + bool do_reset; if ((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV)) { if (icsk->icsk_retransmits) dst_negative_advice(&sk->sk_dst_cache); retry_until = icsk->icsk_syn_retries ? : sysctl_tcp_syn_retries; } else { - if (icsk->icsk_retransmits >= sysctl_tcp_retries1) { + if (retransmits_timed_out(sk, sysctl_tcp_retries1)) { /* Black hole detection */ tcp_mtu_probing(icsk, sk); @@ -155,13 +156,15 @@ const int alive = (icsk->icsk_rto < TCP_RTO_MAX); retry_until = tcp_orphan_retries(sk, alive); + do_reset = alive || + !retransmits_timed_out(sk, retry_until); - if (tcp_out_of_resources(sk, alive || icsk- >icsk_retransmits < retry_until)) + if (tcp_out_of_resources(sk, do_reset)) return 1; } } - if (icsk->icsk_retransmits >= retry_until) { + if (retransmits_timed_out(sk, retry_until)) { /* Has it gone just too far? */ tcp_write_err(sk); return 1; @@ -279,7 +282,7 @@ * The TCP retransmit timer. */ -static void tcp_retransmit_timer(struct sock *sk) +void tcp_retransmit_timer(struct sock *sk) { struct tcp_sock *tp = tcp_sk(sk); struct inet_connection_sock *icsk = inet_csk(sk); @@ -385,7 +388,7 @@ out_reset_timer: icsk->icsk_rto = min(icsk->icsk_rto << 1, TCP_RTO_MAX); inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, icsk->icsk_rto, TCP_RTO_MAX); - if (icsk->icsk_retransmits > sysctl_tcp_retries1) + if (retransmits_timed_out(sk, sysctl_tcp_retries1 + 1)) __sk_dst_reset(sk); out:; @@ -499,8 +502,7 @@ elapsed = tcp_time_stamp - tp->rcv_tstamp; if (elapsed >= keepalive_time_when(tp)) { - if ((!tp->keepalive_probes && icsk->icsk_probes_out >= sysctl_tcp_keepalive_probes) || - (tp->keepalive_probes && icsk->icsk_probes_out >= tp- >keepalive_probes)) { + if (icsk->icsk_probes_out >= keepalive_probes(tp)) { tcp_send_active_reset(sk, GFP_ATOMIC); tcp_write_err(sk); goto out; On Tuesday, September 28, 2010 06:52:41 am David Miller wrote: > From: Damian Lukowski <damian@tvk.rwth-aachen.de> > Date: Mon, 27 Sep 2010 22:00:08 +0200 > > > My suggestion for solving this issue: > > Introducing a third boolean parameter for retransmits_timed_out() > > indicating whether the socket is in SYN state or not. In the SYN case, > > TCP_TIMEOUT_INIT will be used for the calculation instead of > > TCP_RTO_MIN. > > > > Is that ok? > > Sounds fine to me, please prepare a patch. > > Thanks! On Tue, 28 Sep 2010, Yuri Chislov wrote: > What is advantage in the replace compare by calculation? > Please clarify, if possible. > Thanks. > Yuri. If you did take this from kernel history, reading the particular log message might have helped?!? > --- linux-2.6.31.14/net/ipv4/tcp_timer.c 2010-07-05 17:11:43.000000000 > +0000 > +++ linux-2.6.32.15/net/ipv4/tcp_timer.c 2010-06-01 16:56:03.000000000 > +0000 > @@ -137,13 +137,14 @@ > { > struct inet_connection_sock *icsk = inet_csk(sk); > int retry_until; > + bool do_reset; > > if ((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV)) { > if (icsk->icsk_retransmits) > dst_negative_advice(&sk->sk_dst_cache); > retry_until = icsk->icsk_syn_retries ? : > sysctl_tcp_syn_retries; > } else { > - if (icsk->icsk_retransmits >= sysctl_tcp_retries1) { > + if (retransmits_timed_out(sk, sysctl_tcp_retries1)) { > /* Black hole detection */ > tcp_mtu_probing(icsk, sk); > > @@ -155,13 +156,15 @@ > const int alive = (icsk->icsk_rto < TCP_RTO_MAX); > > retry_until = tcp_orphan_retries(sk, alive); > + do_reset = alive || > + !retransmits_timed_out(sk, retry_until); > > - if (tcp_out_of_resources(sk, alive || icsk- > >icsk_retransmits < retry_until)) > + if (tcp_out_of_resources(sk, do_reset)) > return 1; > } > } > > - if (icsk->icsk_retransmits >= retry_until) { > + if (retransmits_timed_out(sk, retry_until)) { > /* Has it gone just too far? */ > tcp_write_err(sk); > return 1; > @@ -279,7 +282,7 @@ > * The TCP retransmit timer. > */ > > -static void tcp_retransmit_timer(struct sock *sk) > +void tcp_retransmit_timer(struct sock *sk) > { > struct tcp_sock *tp = tcp_sk(sk); > struct inet_connection_sock *icsk = inet_csk(sk); > @@ -385,7 +388,7 @@ > out_reset_timer: > icsk->icsk_rto = min(icsk->icsk_rto << 1, TCP_RTO_MAX); > inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, icsk->icsk_rto, > TCP_RTO_MAX); > - if (icsk->icsk_retransmits > sysctl_tcp_retries1) > + if (retransmits_timed_out(sk, sysctl_tcp_retries1 + 1)) > __sk_dst_reset(sk); > > out:; > @@ -499,8 +502,7 @@ > elapsed = tcp_time_stamp - tp->rcv_tstamp; > > if (elapsed >= keepalive_time_when(tp)) { > - if ((!tp->keepalive_probes && icsk->icsk_probes_out >= > sysctl_tcp_keepalive_probes) || > - (tp->keepalive_probes && icsk->icsk_probes_out >= tp- > >keepalive_probes)) { > + if (icsk->icsk_probes_out >= keepalive_probes(tp)) { > tcp_send_active_reset(sk, GFP_ATOMIC); > tcp_write_err(sk); Thank you. The log is really helpful.
On Tuesday, September 28, 2010 11:47:23 am Ilpo Järvinen wrote:
> On Tue, 28 Sep 2010, Yuri Chislov wrote:
> > What is advantage in the replace compare by calculation?
> >
> > Please clarify, if possible.
> > Thanks.
> > Yuri.
>
> If you did take this from kernel history, reading the particular log
> message might have helped?!?
|