Bug 203671
Summary: | Stuck connections if flow offload enabled in nftables | ||
---|---|---|---|
Product: | Networking | Reporter: | nucleo (nucleo) |
Component: | Netfilter/Iptables | Assignee: | networking_netfilter-iptables (networking_netfilter-iptables) |
Status: | NEW --- | ||
Severity: | normal | CC: | otto, pablo |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 5.1.15-300.fc30.x86_64 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | skip fixup on teardown state |
Description
nucleo
2019-05-21 23:16:54 UTC
I should add that there is no problem after removing the "ip protocol tcp flow offload @ft" or "oif "eth1" ip saddr 10.0.0.0/24 snat to 198.51.100.1" rules. The behaviour is the same with nftables-0.9.1 and kernel 5.1.15-300.fc30.x86_64.

I am also experiencing similar connection stalling with TCP connections. In my case the offload is applied on the NAT gateway before the ct rules, like this:

```
cat /etc/nftables.conf | grep flow
#add flowtable ip filter flows { hook ingress priority -50; devices = {enp6s0f0, enp6s0f1}; }
#add rule ip filter FORWARD counter flow offload @flows comment "FASTPATH TEST"
```

The stalling occurs systematically on long-running TCP connections, but I have not seen any issues with UDP. When the offloading is disabled, the issue no longer occurs.

I can also reproduce the bug with this minimal nftables 0.9.1 setup:

```
table inet filter {
    flowtable ft {
        hook ingress priority filter
        devices = { eth1, eth2 }
    }

    chain forward {
        type filter hook forward priority filter; policy accept;
        ip protocol tcp flow add @ft
    }
}

table ip nat {
    chain prerouting {
        type nat hook prerouting priority dstnat; policy accept;
    }

    chain postrouting {
        type nat hook postrouting priority srcnat; policy accept;
        oif "eth1" ip saddr 10.0.0.0/24 snat to 198.51.100.1
    }
}
```

Could you give a try to these fixes?

https://patchwork.ozlabs.org/patch/1102703/
https://patchwork.ozlabs.org/patch/1102704/
https://patchwork.ozlabs.org/patch/1102705/
https://patchwork.ozlabs.org/patch/1102706/

I can request inclusion into -stable. Thanks.

No stale connections with the patches applied to 4.19.56. I am going to test 5.1.15 as well.

No stale connections in 5.1.15 with the patches either. But I noticed that /proc/net/nf_conntrack sometimes keeps TIME_WAIT or CLOSE entries with a large timeout of 86399; this is hard to reproduce because most connections disappear shortly after closing. When an iperf3 connection is active, all [OFFLOAD] entries disappeared from /proc/net/nf_conntrack after about 60 seconds.
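One practical way to watch for the state described above is that offloaded conntrack entries are tagged [OFFLOAD] and print no countdown timeout field. Below is a minimal sketch of a filter for such entries, demonstrated on two sample lines taken from this report; on a live gateway you would read /proc/net/nf_conntrack instead of the here-string.

```shell
# Sketch: list conntrack entries currently handled by the flowtable.
# Offloaded entries carry the [OFFLOAD] tag and have no timeout field.
# The sample lines are copied from the dumps in this report.
sample='ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49724 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49724 [OFFLOAD] mark=0 zone=0 use=3
ipv4 2 tcp 6 24 SYN_RECV src=10.0.0.2 dst=198.51.100.254 sport=49722 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49722 mark=0 zone=0 use=2'

# On a real system: awk '/\[OFFLOAD\]/ ...' /proc/net/nf_conntrack
printf '%s\n' "$sample" | awk '/\[OFFLOAD\]/ { n++; print $5, $7 } END { print n+0, "offloaded" }'
```

Running this repeatedly (for example under `watch`) makes it easy to see when a flow leaves the flowtable and falls back to plain conntrack tracking.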
There is a race that might trigger the 86400 timeout with TIME_WAIT (this is actually one day, in seconds, which is an internal offload timeout that is leaking to userspace when hitting this bug).

http://patchwork.ozlabs.org/patch/1144577/
http://patchwork.ozlabs.org/patch/1144578/

Please give a try to these patches. Regarding the "entries disappeared from /proc/net/nf_conntrack after 60 seconds", I cannot reproduce it. Could you tell me how you reproduce it there?

Disappearing is hard to reproduce. I just run "iperf3 -c 198.51.100.254 -n 100G" on the client several times and the entries in /proc/net/nf_conntrack randomly behave differently.

First, here are results with kernel 5.2.6-200.fc30.x86_64 without the patches from comment 9, with this ruleset:

```
table inet filter {
    flowtable ft {
        hook ingress priority filter
        devices = { eth1, eth2 }
    }

    chain forward {
        type filter hook forward priority filter; policy accept;
        flow add @ft
    }
}

table ip nat {
    chain prerouting {
        type nat hook prerouting priority dstnat; policy accept;
    }

    chain postrouting {
        type nat hook postrouting priority srcnat; policy accept;
        oif "eth1" ip saddr 10.0.0.0/24 snat to 198.51.100.1
    }
}
```

First iperf3 test, without the "flow add @ft" rule.

Server side:

```
tcp6  0  0 198.51.100.254:5201  198.51.100.1:49806  ESTABLISHED 27619/iperf3
tcp6  0  0 198.51.100.254:5201  198.51.100.1:49808  ESTABLISHED 27619/iperf3
```

Client side:

```
tcp   0        0 10.0.0.2:49806  198.51.100.254:5201  ESTABLISHED 27660/iperf3
tcp   0  3082920 10.0.0.2:49808  198.51.100.254:5201  ESTABLISHED 27660/iperf3
```

/proc/net/nf_conntrack has correct entries during the whole test:

```
ipv4 2 tcp 6 431932 ESTABLISHED src=10.0.0.2 dst=198.51.100.254 sport=49806 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49806 [ASSURED] mark=0 zone=0 use=2
ipv4 2 tcp 6 300 ESTABLISHED src=10.0.0.2 dst=198.51.100.254 sport=49808 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49808 [ASSURED] mark=0 zone=0 use=2
```

/proc/net/nf_conntrack after interrupting the client with
ctrl+c:

```
ipv4 2 tcp 6 118 TIME_WAIT src=10.0.0.2 dst=198.51.100.254 sport=49806 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49806 [ASSURED] mark=0 zone=0 use=2
ipv4 2 tcp 6 8 CLOSE src=10.0.0.2 dst=198.51.100.254 sport=49808 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49808 [ASSURED] mark=0 zone=0 use=2
```

Now several tests with the "flow add @ft" rule.

/proc/net/nf_conntrack after the test started:

```
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49724 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49724 [OFFLOAD] mark=0 zone=0 use=3
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49722 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49722 [OFFLOAD] mark=0 zone=0 use=3
```

/proc/net/nf_conntrack after 30 seconds:

```
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49724 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49724 [OFFLOAD] mark=0 zone=0 use=3
ipv4 2 tcp 6 24 SYN_RECV src=10.0.0.2 dst=198.51.100.254 sport=49722 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49722 mark=0 zone=0 use=2
```

/proc/net/nf_conntrack after 59 seconds (the client side still shows two established connections):

```
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49724 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49724 [OFFLOAD] mark=0 zone=0 use=3
```

/proc/net/nf_conntrack after interrupting the client:

```
ipv4 2 tcp 6 2 CLOSE src=10.0.0.2 dst=198.51.100.254 sport=49724 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49724 mark=0 zone=0 use=2
ipv4 2 tcp 6 52 CLOSE_WAIT src=10.0.0.2 dst=198.51.100.254 sport=49722 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49722 mark=0 zone=0 use=2
```

Next test. /proc/net/nf_conntrack after the test started:

```
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49748 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49748 [OFFLOAD] mark=0 zone=0 use=3
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49746 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49746 [OFFLOAD] mark=0 zone=0 use=3
```

/proc/net/nf_conntrack after 30 seconds:

```
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49748 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49748 [OFFLOAD] mark=0 zone=0 use=3
ipv4 2 tcp 6 86378 SYN_RECV src=10.0.0.2 dst=198.51.100.254 sport=49746 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49746 mark=0 zone=0 use=2
```

/proc/net/nf_conntrack after 59 seconds:

```
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49748 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49748 [OFFLOAD] mark=0 zone=0 use=3
ipv4 2 tcp 6 86344 SYN_RECV src=10.0.0.2 dst=198.51.100.254 sport=49746 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49746 mark=0 zone=0 use=2
```

/proc/net/nf_conntrack after interrupting the client:

```
ipv4 2 tcp 6 7 CLOSE src=10.0.0.2 dst=198.51.100.254 sport=49748 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49748 mark=0 zone=0 use=2
ipv4 2 tcp 6 117 TIME_WAIT src=10.0.0.2 dst=198.51.100.254 sport=49746 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49746 [ASSURED] mark=0 zone=0 use=2
```

Next test. /proc/net/nf_conntrack after the test started:

```
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49760 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49760 [OFFLOAD] mark=0 zone=0 use=3
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49762 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49762 [OFFLOAD] mark=0 zone=0 use=3
```

/proc/net/nf_conntrack after 30 seconds:

```
ipv4 2 tcp 6 26 SYN_RECV src=10.0.0.2 dst=198.51.100.254 sport=49760 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49760 mark=0 zone=0 use=2
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49762 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49762 [OFFLOAD] mark=0 zone=0 use=3
```

/proc/net/nf_conntrack after 59 seconds is empty; on the client side:

```
[ 4] 59.00-60.00 sec 659 MBytes 5.53 Gbits/sec 1764 1.17 MBytes
iperf3: error - unable to write to stream socket: Connection reset by peer
```

Tests with kernel 5.2.7-200.fc30.x86_64 without the patches from comment 9:

/proc/net/nf_conntrack after the test started:

```
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49854 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49854 [OFFLOAD] mark=0 zone=0 use=3
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49856 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49856 [OFFLOAD] mark=0 zone=0 use=3
```

/proc/net/nf_conntrack after 33 seconds:

```
ipv4 2 tcp 6 117 ESTABLISHED src=10.0.0.2 dst=198.51.100.254 sport=49854 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49854 mark=0 zone=0 use=2
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49856 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49856 [OFFLOAD] mark=0 zone=0 use=3
```

/proc/net/nf_conntrack after 154 seconds:

```
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49856 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49856 [OFFLOAD] mark=0 zone=0 use=3
```

/proc/net/nf_conntrack after interrupting the client:

```
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49854 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49854 [OFFLOAD] mark=0 zone=0 use=3
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49856 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49856 [OFFLOAD] mark=0 zone=0 use=3
```

/proc/net/nf_conntrack a couple of seconds later:

```
ipv4 2 tcp 6 39 ESTABLISHED src=10.0.0.2 dst=198.51.100.254 sport=49854 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49854 mark=0 zone=0 use=2
```

The last test was with kernel 5.2.7-200.fc30.x86_64 and with the patches from comment 9.

(In reply to nucleo from comment #10)

> Tests with kernel 5.2.7-200.fc30.x86_64 _with_ patches from comment 9:
>
> /proc/net/nf_conntrack after test started
> ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49854 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49854 [OFFLOAD] mark=0 zone=0 use=3
> ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49856 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49856 [OFFLOAD] mark=0 zone=0 use=3

Both flows have been placed in the flowtable.

> /proc/net/nf_conntrack after 33 seconds
> ipv4 2 tcp 6 117 ESTABLISHED src=10.0.0.2 dst=198.51.100.254 sport=49854 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49854 mark=0 zone=0 use=2
> ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49856 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49856 [OFFLOAD] mark=0 zone=0 use=3

The flowtable sees no packets for flow sport=49854 after 30 seconds, so this flow is pushed out of the flowtable and conntrack recovers control over it. The pickup timeout (120 seconds) kicks in and the entry is set to the ESTABLISHED state (tracking is also set to liberal).

> /proc/net/nf_conntrack after 154 seconds
> ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49856 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49856 [OFFLOAD] mark=0 zone=0 use=3

Flow sport=49854 is gone. There was no traffic for it after a while, and conntrack saw no packets within 120 seconds either (this is the 30-second flowtable timeout plus the 120-second pickup timeout), so the entry sport=49854 is released.

> /proc/net/nf_conntrack after interrupting client
> ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49854 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49854 [OFFLOAD] mark=0 zone=0 use=3
> ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49856 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49856 [OFFLOAD] mark=0 zone=0 use=3

After pressing ctrl-c on the client, here in my testbed I see one entry in TIME_WAIT (in your case, that would be the flow identified by sport=49856) and another flow in ESTABLISHED state, which is this one below...

> /proc/net/nf_conntrack after couple of seconds
> ipv4 2 tcp 6 39 ESTABLISHED src=10.0.0.2 dst=198.51.100.254 sport=49854 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49854 mark=0 zone=0 use=2

... this is sport=49854: the fin/rst packet is sent back to the flowtable, then the entry expires (no packets after 30 seconds) and it goes back to conntrack.

I'll be posting two patches here:

1) do not push back the flow to the flowtable if the packet is fin/rst.
2) likely increase the default flowtable timeout to 120 seconds. I'll also expose toggles to make this configurable.

Thanks for your feedback.

(In reply to Pablo Neira Ayuso from comment #12)
[...]
> 1) do not push back flow to flowtable if packet is fin/rst.

https://patchwork.ozlabs.org/patch/1146133/

With this patch, conntrack entries enter the TIME_WAIT state after fin/rst when the client is interrupted.

(In reply to Pablo Neira Ayuso from comment #13)
> https://patchwork.ozlabs.org/patch/1146133/

Patch version 2: https://patchwork.ozlabs.org/patch/1146419/

Here are my tests with the Fedora 5.2.9 kernel with the patches from comment 9 and comment 14 applied. I repeated "iperf3 -c 198.51.100.254 -n 100G" several times, interrupting it with ctrl+c.
First run. Contents of /proc/net/nf_conntrack:

```
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=51994 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=51994 [OFFLOAD] mark=0 zone=0 use=3
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=51992 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=51992 [OFFLOAD] mark=0 zone=0 use=3
```

After 30 seconds:

```
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=51994 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=51994 [OFFLOAD] mark=0 zone=0 use=3
ipv4 2 tcp 6 18 ESTABLISHED src=10.0.0.2 dst=198.51.100.254 sport=51992 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=51992 mark=0 zone=0 use=2
```

After 60 seconds /proc/net/nf_conntrack is empty, and one second later:

```
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=51994 dport=5201 [UNREPLIED] src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=51994 [OFFLOAD] mark=0 zone=0 use=3
```

After ctrl+c:

```
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=51992 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=51992 [OFFLOAD] mark=0 zone=0 use=3
```

And after that:

```
ipv4 2 tcp 6 112 ESTABLISHED src=10.0.0.2 dst=198.51.100.254 sport=51992 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=51992 mark=0 zone=0 use=2
```

Second run:

```
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=52000 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=52000 [OFFLOAD] mark=0 zone=0 use=3
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=52002 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=52002 [OFFLOAD] mark=0 zone=0 use=3
```

After 30 seconds:

```
ipv4 2 tcp 6 26 ESTABLISHED src=10.0.0.2 dst=198.51.100.254 sport=52000 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=52000 mark=0 zone=0 use=2
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=52002 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=52002 [OFFLOAD] mark=0 zone=0 use=3
```

After 60 seconds /proc/net/nf_conntrack is empty; on the client side:

```
[ 4] 59.00-60.00 sec 528 MBytes 4.42 Gbits/sec 498 1.34 MBytes
iperf3: error - unable to write to stream socket: Connection reset by peer
```

In one of the other runs the test continued with an empty /proc/net/nf_conntrack.

Third run:

```
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=52008 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=52008 [OFFLOAD] mark=0 zone=0 use=3
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=52006 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=52006 [OFFLOAD] mark=0 zone=0 use=3
```

After 30 seconds:

```
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=52008 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=52008 [OFFLOAD] mark=0 zone=0 use=3
ipv4 2 tcp 6 25 ESTABLISHED src=10.0.0.2 dst=198.51.100.254 sport=52006 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=52006 mark=0 zone=0 use=2
```

After 60 seconds:

```
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=52008 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=52008 [OFFLOAD] mark=0 zone=0 use=3
```

After ctrl+c:

```
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=52008 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=52008 [OFFLOAD] mark=0 zone=0 use=3
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=52006 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=52006 [OFFLOAD] mark=0 zone=0 use=3
```

After that:

```
ipv4 2 tcp 6 3 CLOSE src=10.0.0.2 dst=198.51.100.254 sport=52008 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=52008 [ASSURED] mark=0 zone=0 use=2
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=52006 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=52006 [OFFLOAD] mark=0 zone=0 use=3
```

After that:

```
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=52006 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=52006 [OFFLOAD] mark=0 zone=0 use=3
```

And finally:

```
ipv4 2 tcp 6 71 ESTABLISHED src=10.0.0.2 dst=198.51.100.254 sport=52006 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=52006 mark=0 zone=0 use=2
```

Fourth run, finished without interrupting:

```
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=52020 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=52020 [OFFLOAD] mark=0 zone=0 use=3
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=52022 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=52022 [OFFLOAD] mark=0 zone=0 use=3
```

After 30 seconds:

```
ipv4 2 tcp 6 10 ESTABLISHED src=10.0.0.2 dst=198.51.100.254 sport=52020 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=52020 mark=0 zone=0 use=2
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=52022 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=52022 [OFFLOAD] mark=0 zone=0 use=3
```

After 60 seconds /proc/net/nf_conntrack is empty, and after 1 second:

```
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=52022 dport=5201 [UNREPLIED] src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=52022 [OFFLOAD] mark=0 zone=0 use=3
```

After the test finished:

```
ipv4 2 tcp 6 107 TIME_WAIT src=10.0.0.2 dst=198.51.100.254 sport=52020 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=52020 mark=0 zone=0 use=2
```

I cannot reproduce this here on 5.3-rc; I have been repeating similar tests here. Would you mind checking all patches between 4.19 and 5.3 for the flowtable infrastructure? You could do this via:

```
git log --oneline v4.19..v5.3-rc3 net/netfilter/nft_flow_offload.c
```

Also check these files:

```
net/netfilter/nf_flow_table_core.c
net/netfilter/nf_flow_table_ip.c
net/netfilter/nf_flow_table_inet.c
net/ipv4/netfilter/nf_flow_table_ipv4.c
net/ipv6/netfilter/nf_flow_table_ipv6.c
include/net/netfilter/nf_flow_table.h
```

Make sure you get a fresh clone of https://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git to check for any missing patch.

If there are relevant patches already upstream that are not in 4.19 and that fix the problem you report, please send a list of commit IDs and I'll ask the -stable maintainers to include them in the 4.19 -stable release. I agree that the 30-second timer to evict a flow that has seen no traffic from the flowtable is too aggressive, but before making a patch to raise this default timeout (and to expose a knob to allow users to configure it), it would be good to make sure no relevant patch is missing. Thanks!

Created attachment 284587 [details]
skip fixup on teardown state
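The patch audit Pablo asks for above can be scripted. Below is a rough sketch: the file list and tag range are the ones he gives, and the git calls assume you run it from a fresh clone of the nf.git tree with both tags available; outside such a tree the loop simply notes that it skipped.

```shell
# Sketch of the suggested audit: list flowtable-related commits between
# v4.19 and v5.3-rc3 for each file of the flowtable infrastructure.
files='net/netfilter/nft_flow_offload.c
net/netfilter/nf_flow_table_core.c
net/netfilter/nf_flow_table_ip.c
net/netfilter/nf_flow_table_inet.c
net/ipv4/netfilter/nf_flow_table_ipv4.c
net/ipv6/netfilter/nf_flow_table_ipv6.c
include/net/netfilter/nf_flow_table.h'

for f in $files; do
    echo "== $f =="
    # Outside a kernel tree (or without the tags) print a note instead
    # of failing.
    git log --oneline v4.19..v5.3-rc3 -- "$f" 2>/dev/null \
        || echo "(skipped: not in a kernel tree)"
done
```

Any commit that shows up here but is missing from the 4.19 -stable branch is a candidate for the backport list Pablo mentions.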
After 150 seconds (30 seconds to evict the iperf control flow from the flowtable + 120 seconds in ESTABLISHED state), if I press ctrl-c, I can see this:
```
ipv4 2 tcp 6 104 TIME_WAIT src=192.168.10.2 dst=10.0.1.2 sport=33994 dport=5201 src=10.0.1.2 dst=10.0.1.1 sport=5201 dport=33994 mark=0 secctx=null zone=0 use=2
ipv4 2 tcp 6 104 ESTABLISHED src=192.168.10.2 dst=10.0.1.2 sport=33992 dport=5201 src=10.0.1.2 dst=10.0.1.1 sport=5201 dport=33992 mark=0 secctx=null zone=0 use=2
```
The flow with sport=33992 is the iperf control-plane flow.
It seems that iperf sends a data packet right after ctrl-c:
```
20:13:22.268161 IP 192.168.10.2.33992 > 10.0.1.2.5201: Flags [P.], seq 3952723680:3952723681, ack 3326915136, win 502, options [nop,nop,TS val 2165195608 ecr 2773810022], length 1
```
This pushes the flow into the flowtable again. However...
```
20:13:22.268434 IP 10.0.1.2.5201 > 192.168.10.2.33992: Flags [F.], seq 1, ack 1, win 509, options [nop,nop,TS val 2773964852 ecr 2165195608], length 0
20:13:22.268472 IP 192.168.10.2.33992 > 10.0.1.2.5201: Flags [F.], seq 1, ack 2, win 502, options [nop,nop,TS val 2165195608 ecr 2773964852], length 0
20:13:22.268492 IP 10.0.1.2.5201 > 192.168.10.2.33992: Flags [.], ack 2, win 509, options [nop,nop,TS val 2773964852 ecr 2165195608], length 0
```
These TCP FIN packets schedule the flowtable entry for removal, but the state fixup routine takes the conntrack entry from FIN_WAIT back to ESTABLISHED.
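The distinction matters because, per the proposed fix, a FIN or RST must tear the flow down instead of being pushed back into the flowtable. As a toy illustration of that rule (this is not the kernel code), here is the capture above reduced to timestamps and flags, classified the way the fin/rst patch would treat each packet:

```shell
# Sketch: classify packets by TCP flags. A packet carrying F (FIN) or
# R (RST) should trigger flowtable teardown rather than being pushed
# back onto the fast path. Lines are shortened from the capture above.
capture='20:13:22.268161 IP 192.168.10.2.33992 > 10.0.1.2.5201: Flags [P.], length 1
20:13:22.268434 IP 10.0.1.2.5201 > 192.168.10.2.33992: Flags [F.], length 0
20:13:22.268472 IP 192.168.10.2.33992 > 10.0.1.2.5201: Flags [F.], length 0
20:13:22.268492 IP 10.0.1.2.5201 > 192.168.10.2.33992: Flags [.], length 0'

printf '%s\n' "$capture" | awk '{
    flags = $7                       # e.g. "[F.]," or "[P.],"
    if (flags ~ /[FR]/) print $1, "teardown ", flags
    else                print $1, "fast path", flags
}'
```

Under this rule the two FIN packets above would evict the flow from the flowtable, so the state fixup never sees them as ordinary fast-path traffic.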
Scratch that, the patch is not correct.

This patch fixes incorrect timeout initialization of the flowtable entry: https://patchwork.ozlabs.org/patch/1156702/

Are all of the patches from comments 9, 14 and 19 still needed? I can't test the 4.19 kernel because the patches from comments 9 and 14 no longer apply to the latest 4.19.x.

Could you try with the Fedora 5.2.9 kernel?
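For reference, the timeout arithmetic that recurs throughout this thread (the 30-second flowtable eviction and 120-second pickup window from comment 12, and the internal one-day offload timeout behind observed values like 86399 and 86378) can be sanity-checked with a few lines of shell:

```shell
# Timeout arithmetic from the thread, as shell arithmetic.
flowtable_evict=30              # default flowtable timeout: evict after 30 s idle
pickup=120                      # conntrack pickup timeout once the flow is pushed out
offload_day=$((24 * 60 * 60))   # internal offload timeout: one day, in seconds

echo "idle flow fully released after: $((flowtable_evict + pickup)) s"
echo "one-day offload timeout:        $offload_day s"
echo "86399 is $((offload_day - 86399)) s into that countdown"
```

This matches the observations above: entries linger for about 150 seconds after traffic stops, and the large SYN_RECV timeouts (86399, 86378, 86344) are the leaked one-day timer counting down.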