Bug 202287 - netfilter/iptales conntrack - INVALID tcp ack fin packets
Summary: netfilter/iptales conntrack - INVALID tcp ack fin packets
Status: RESOLVED CODE_FIX
Alias: None
Product: Networking
Classification: Unclassified
Component: IPV4 (show other bugs)
Hardware: Intel Linux
: P1 high
Assignee: Stephen Hemminger
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-01-15 16:18 UTC by jb
Modified: 2019-08-14 12:33 UTC (History)
2 users (show)

See Also:
Kernel Version: 4.20
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Tpr Browser and kernels run log (24.34 KB, text/plain)
2019-01-15 16:18 UTC, jb
Details
git bisect log (1.70 KB, text/plain)
2019-02-14 21:25 UTC, jb
Details
git bisect log (2.12 KB, text/plain)
2019-02-18 19:53 UTC, jb
Details

Description jb 2019-01-15 16:18:13 UTC
Created attachment 280501 [details]
Tpr Browser and kernels run log

Since kernel 4.20 there is a problem with Tor Browser (TB) on close.
The last good kernel was 4.19.12 .

Tested:
kernel 4.20.1 and 4.19.12 .
Tor Browser 8.0.4 (prod version).

Tor Browser (TB) and kernel run logs (as attached):
tor-netfilter-4.20.1-and-4.19.12.txt

kernel 4.20.1
Results:
There is a problem with TB on close:
- TB process left behind
- INVALID messages (ACK FIN and ACK PSH FIN) cought in OUTPUT chain of iptables,
  on lo interface, and logged to system log/journal.
  -A OUTPUT -m conntrack --ctstate INVALID -j loginv

kernel 4.19.12
Results:
There is no problem with TB, i.e. clean close.

There is an old reference to this problem
https://www.linode.com/community/questions/2495/iptables-drop-ackfin-as-invalid
Comment 1 jb 2019-02-14 21:25:43 UTC
Created attachment 281145 [details]
git bisect log
Comment 2 jb 2019-02-14 21:37:46 UTC
Tested on:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Per 'git bisect' (see attached log):
first bad commit: [6f41617bf23a17d9cb7cc6ca8161534f05f80293] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

$ git show 6f41617bf23a17d9cb7cc6ca8161534f05f80293
commit 6f41617bf23a17d9cb7cc6ca8161534f05f80293 (refs/bisect/bad)
Merge: 7bdaae270cb5 cec4de302c5f
Author: David S. Miller <davem@davemloft.net>
Date:   Wed Oct 3 21:00:17 2018 -0700

    Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
    
    Minor conflict in net/core/rtnetlink.c, David Ahern's bug fix in 'net'
    overlapped the renaming of a netlink attribute in net-next.
    
    Signed-off-by: David S. Miller <davem@davemloft.net>
...

Can you or David S. Miller figure it out ?
Comment 3 jb 2019-02-18 19:53:42 UTC
Created attachment 281191 [details]
git bisect log
Comment 4 jb 2019-02-18 20:05:50 UTC
In ref to Tor Browser (TB) and kernel run logs (as attached):
The 5 min (300 secs) timeout is a deciding factor.
The Tor Browser has to be open (just leave it idle or do some browsing) longer
than 5 min. After that, closing the browser causes INVALID messages on lo
interface and a browser process left behind.

net/netfilter/nf_conntrack_proto_tcp.c
...
        [TCP_CONNTRACK_RETRANS]         = 5 MINS,
        [TCP_CONNTRACK_UNACK]           = 5 MINS,
...

# sysctl -a
net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300
net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 300

Here is my second bisect, with wider source code scope, and thus more reliable.
Attached: bisect-2.log

Per 'git bisect':
# first bad commit: [a337531b942bd8a03e7052444d7e36972aac2d92] tcp: up initial rmem to 128KB and SYN rwin to around 64KB

$ git show a337531b9
commit a337531b942bd8a03e7052444d7e36972aac2d92 (refs/bisect/bad)
Author: Yuchung Cheng <ycheng@google.com>
Date:   Thu Sep 27 11:21:19 2018 -0700

    tcp: up initial rmem to 128KB and SYN rwin to around 64KB
    
    Previously TCP initial receive buffer is ~87KB by default and
    the initial receive window is ~29KB (20 MSS). This patch changes
    the two numbers to 128KB and ~64KB (rounding down to the multiples
    of MSS) respectively. The patch also simplifies the calculations s.t.
    the two numbers are directly controlled by sysctl tcp_rmem[1]:
    
      1) Initial receiver buffer budget (sk_rcvbuf): while this should
         be configured via sysctl tcp_rmem[1], previously tcp_fixup_rcvbuf()
         always override and set a larger size when a new connection
         establishes.
    
      2) Initial receive window in SYN: previously it is set to 20
         packets if MSS <= 1460. The number 20 was based on the initial
         congestion window of 10: the receiver needs twice amount to
         avoid being limited by the receive window upon out-of-order
         delivery in the first window burst. But since this only
         applies if the receiving MSS <= 1460, connection using large MTU
         (e.g. to utilize receiver zero-copy) may be limited by the
         receive window.
    
    With this patch TCP memory configuration is more straight-forward and
    more properly sized to modern high-speed networks by default. Several
    popular stacks have been announcing 64KB rwin in SYNs as well.
...

Host data - for comparison:

bisect good 1288aa7a
--------------------
# sysctl -a |grep rmem
net.core.rmem_default = 212992
net.core.rmem_max = 212992
net.ipv4.tcp_rmem = 4096        87380   6291456
net.ipv4.udp_rmem_min = 4096
# sysctl -a |grep wmem
net.core.wmem_default = 212992
net.core.wmem_max = 212992
net.ipv4.tcp_wmem = 4096        16384   4194304
net.ipv4.udp_wmem_min = 4096
# sysctl -a |grep rwin
#
Other 300 secs net values:
# sysctl -a |grep 300
net.core.netdev_budget = 300
net.ipv4.route.error_cost = 300
net.ipv4.route.gc_timeout = 300
net.ipv4.tcp_max_reordering = 300
net.ipv4.tcp_min_rtt_wlen = 300
net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300
net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 300

bisect bad a337531b94 [first bad commit]
----------------------------------------
# sysctl -a |grep rmem
net.core.rmem_default = 212992
net.core.rmem_max = 212992
net.ipv4.tcp_rmem = 4096        131072  6291456        <---------------
net.ipv4.udp_rmem_min = 4096
# sysctl -a |grep wmem
net.core.wmem_default = 212992
net.core.wmem_max = 212992
net.ipv4.tcp_wmem = 4096        16384   4194304
net.ipv4.udp_wmem_min = 4096
# sysctl -a |grep rwin
#
Other 300 secs net values:
# sysctl -a |grep 300
net.core.netdev_budget = 300
net.ipv4.route.error_cost = 300
net.ipv4.route.gc_timeout = 300
net.ipv4.tcp_max_reordering = 300
net.ipv4.tcp_min_rtt_wlen = 300
net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300
net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 300
Comment 5 Yuchung Cheng 2019-02-19 18:09:40 UTC
Hi the patch is indeed buggy. The fix was merged subsequently 

commit 041a14d2671573611ffd6412bc16e2f64469f7fb
Author: Yuchung Cheng <ycheng@google.com>
Date:   Mon Oct 1 15:42:32 2018 -0700

    tcp: start receiver buffer autotuning sooner
Comment 6 jb 2019-02-20 15:01:48 UTC
I verified your fix commit 041a14d26715:

$ git log --oneline v4.19..v4.20 -- include net
...
041a14d26715 tcp: start receiver buffer autotuning sooner <- BAD
...
a337531b942b tcp: up initial rmem to 128KB and SYN rwin to around 64KB <- BAD
9e288cefcc55 clk: renesas: Convert to SPDX identifiers <- GOOD
$

In theory, there could be a GOOD commit in this range, but branch deteriorated
later:
$ git log a337531b..041a14d26 -- include net |wc -l
146
$
So, I bisected it and all commits are BAD.

------

I also verified wider scope, just in case:

$ git log --oneline v4.19..v4.20
...
041a14d26715 tcp: start receiver buffer autotuning sooner <- BAD
...
a337531b942b tcp: up initial rmem to 128KB and SYN rwin to around 64KB <- BAD
3ff6cde84685 hns3: Another build fix. <- GOOD
...
9e288cefcc55 clk: renesas: Convert to SPDX identifiers <- GOOD
$

So, we are back to this faulty commit:
a337531b942b tcp: up initial rmem to 128KB and SYN rwin to around 64KB
Comment 7 Thomas Jarosch 2019-08-08 08:48:53 UTC
Can you give this patch a try:
https://lore.kernel.org/netfilter-devel/20190802132302.13537-4-sashal@kernel.org/

?

There was a bug related to TCP window scaling causing the wrong conntrack timeout to be set for TCP connections (300s instead of 5d). It might be related since your bisect points to changes of the window scaling handling.

(I found your bugzilla entry while tracing down the exact issue the mentioned patch fixes)

Here's the thread how this was diagnosed:
https://lore.kernel.org/netfilter-devel/alpine.DEB.2.20.1907101242560.17522@blackhole.kfki.hu/
Comment 8 jb 2019-08-12 18:27:52 UTC
Great.
Tested on 5.3.0-rc4 from
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
It is OK now.
Thanks to Florian Westphal for the fix and all other users/devs involved.
If there are no comments I will close it in a couple of days.

Note You need to log in before you can comment on or make changes to this bug.