Bug 798 - last timestamp value received (ts_recent) not updated properly; includes solution
Summary: last timestamp value received (ts_recent) not updated properly; includes solu...
Status: REJECTED INVALID
Alias: None
Product: Networking
Classification: Unclassified
Component: IPV4 (show other bugs)
Hardware: i386 Linux
: P2 low
Assignee: Bugme Janitors Team
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2003-06-10 16:47 UTC by Martin Duke
Modified: 2003-06-27 08:25 UTC (History)
0 users

See Also:
Kernel Version: from 2.4.18 through 2.5.70
Subsystem:
Regression: ---
Bisected commit-id:


Attachments

Description Martin Duke 2003-06-10 16:47:16 UTC
Distribution: Red Hat 7.3
Hardware Environment: i386
Software Environment:
Problem Description: Although the current kernel complies with RFC1323 (about 
timestamps, among other things), it does not follow the authors' later 
correction to the timestamp (ts) algorithm (http://www.kohala.com/start/tcplw-
extensions.txt).  In a nutshell:
    If the connection goes down, the subsequent retransmissions will have later 
ts values.  However, if an ACK was lost, the kernel currently does not update 
ts_recent with these values.  If the channel comes back up and a retrans 
succeeds, the client will send an ack with an ts echo equal to the old value of 
ts_recent.
    As the difference between the echo and the current time is quite large, it 
will result in an extremely large RTT measurement, triggering a huge increase 
in the RTO.  In an unreliable, bursty link, such as satellites, another outage 
could occur shortly thereafter, and an inappropriately long timeout can occur.

Steps to reproduce:
(1) Start a long TCP transfer and a tcpdump on the client side
(2) unplug the cable for a while
(3) look at the results:

16:06:01.336173 192.168.2.100.32808 > 192.168.1.100.5000: . 13566465:13567913
(1448) ack 1 win 5840 <nop,nop,timestamp 158539946 20339> (DF)
16:06:01.336296 192.168.2.100.32808 > 192.168.1.100.5000: . 13567913:13569361
(1448) ack 1 win 5840 <nop,nop,timestamp 158539946 20339> (DF)
16:06:01.336320 192.168.1.100.5000 > 192.168.2.100.32808: . ack 13569361 win 
63712 <nop,nop,timestamp 20339 158539946> (DF)
< outage >
16:06:27.993187 192.168.2.100.32808 > 192.168.1.100.5000: . 13505649:13507097
(1448) ack 1 win 5840 <nop,nop,timestamp 158542613 20339> (DF)
16:06:27.993340 192.168.1.100.5000 > 192.168.2.100.32808: . ack 13569361 win 
63712 <nop,nop,timestamp 23005 158539946,nop,nop,sack sack 1 
{13505649:13507097} > (DF)
16:06:27.993827 192.168.2.100.32808 > 192.168.1.100.5000: . 13569361:13570809
(1448) ack 1 win 5840 <nop,nop,timestamp 158542613 23005> (DF)
16:06:27.993866 192.168.1.100.5000 > 192.168.2.100.32808: . ack 13570809 win 
63712 <nop,nop,timestamp 23005 158542613> (DF)

Note that the first packet after the outage triggers an ack with an echo not 
from its timestamp, but from one before the outage.

Solution:
In tcp_rcv_established() of tcp_input.c:

        if (!tcp_sequence(tp, TCP_SKB_CB(skb)->seq, TCP_SKB_CB(skb)->end_seq)) {
		/* RFC793, page 37: "In all states except SYN-SENT, all reset
		 * (RST) segments are validated by checking their SEQ-fields."
		 * And page 69: "If an incoming segment is not acceptable,
		 * an acknowledgment should be sent in reply (unless the RST bit
		 * is set, if so drop the segment and return)".
		 */
		if (!th->rst)
			tcp_send_dupack(sk, skb);
		goto discard;
	}

	if(th->rst) {
		tcp_reset(sk);
		goto discard;
	}

	tcp_replace_ts_recent(tp, TCP_SKB_CB(skb)->seq);

Move the tcp_replace_ts_recent() call from the end to the beginning of this 
block of code.  This updates ts_recent prior to the conditional which is 
executed in the case described above.

Sample correct output (same scenario):
15:54:48.067421 192.168.2.100.32806 > 192.168.1.100.5000: . 9999249:10000697
(1448) ack 1 win 5840 <nop,nop,timestamp 158472657 2096312> (DF)
15:54:48.067544 192.168.2.100.32806 > 192.168.1.100.5000: . 10000697:10002145
(1448) ack 1 win 5840 <nop,nop,timestamp 158472657 2096312> (DF)
15:54:48.067569 192.168.1.100.5000 > 192.168.2.100.32806: . ack 10002145 win 
63712 <nop,nop,timestamp 2096313 158472657> (DF)
<outage>
15:55:14.725019 192.168.2.100.32806 > 192.168.1.100.5000: . 9938433:9939881
(1448) ack 1 win 5840 <nop,nop,timestamp 158475324 2096312> (DF)
15:55:14.725161 192.168.1.100.5000 > 192.168.2.100.32806: . ack 10002145 win 
63712 <nop,nop,timestamp 2098979 158475324,nop,nop,sack sack 1 
{9938433:9939881} > (DF)
15:55:14.725611 192.168.2.100.32806 > 192.168.1.100.5000: . 10002145:10003593
(1448) ack 1 win 5840 <nop,nop,timestamp 158475324 2098979> (DF)
15:55:14.725735 192.168.2.100.32806 > 192.168.1.100.5000: . 10003593:10005041
(1448) ack 1 win 5840 <nop,nop,timestamp 158475324 2098979> (DF)
15:55:14.725790 192.168.1.100.5000 > 192.168.2.100.32806: . ack 10003593 win 
63712 <nop,nop,timestamp 2098979 158475324> (DF)
Comment 1 deletedbugzillaaccount 2003-06-12 19:08:35 UTC
Said draft is buggy and our behavior is correct.  The TCP implementation
community considers our behavior acceptable.  There was a discussion about
exactly this on the tcp-impl list some time ago, sorry I don't have a
reference handy.

In particular, said draft opens up a hole where anyone can break the
connection on us.

Note You need to log in before you can comment on or make changes to this bug.