Hi, I am trying to enable flow offload in nftables. I am testing a Fedora 30 virtual machine with three virtio interfaces (eth0 for communication with the host system, eth1 and eth2 for routing), all default settings except net.ipv4.ip_forward=1.

Interface configuration:

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.122.247  netmask 255.255.255.0  broadcast 192.168.122.255
        inet6 fe80::5054:ff:fefd:4919  prefixlen 64  scopeid 0x20<link>
        ether 52:54:00:fd:49:19  txqueuelen 1000  (Ethernet)
        RX packets 375  bytes 27949 (27.2 KiB)
        RX errors 0  dropped 7  overruns 0  frame 0
        TX packets 168  bytes 39187 (38.2 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 198.51.100.1  netmask 255.255.255.0  broadcast 198.51.100.255
        inet6 fe80::5054:ff:fefd:4921  prefixlen 64  scopeid 0x20<link>
        ether 52:54:00:fd:49:21  txqueuelen 1000  (Ethernet)
        RX packets 1204680  bytes 79543005 (75.8 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1547644  bytes 12823278521 (11.9 GiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

eth2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.0.1  netmask 255.255.255.0  broadcast 10.0.0.255
        inet6 fe80::5054:ff:fefd:4920  prefixlen 64  scopeid 0x20<link>
        ether 52:54:00:fd:49:20  txqueuelen 1000  (Ethernet)
        RX packets 8785433  bytes 13300974656 (12.3 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1204716  bytes 79548109 (75.8 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

nftables configuration:

table inet filter {
    flowtable ft {
        hook ingress priority 0
        devices = { eth1, eth2 }
    }

    chain forward {
        type filter hook forward priority 0; policy accept;
        ip protocol tcp flow offload @ft
        oif "eth2" jump lan
    }

    chain lan {
        ct state established,related accept
        drop
    }
}

table ip nat {
    chain prerouting {
        type nat hook prerouting priority -100; policy accept;
    }

    chain postrouting {
        type nat hook postrouting priority 100; policy accept;
        oif "eth1" ip saddr 10.0.0.0/24 snat to 198.51.100.1
    }
}

On the host system there are two veth pairs. One side of each pair is added to the bridges holding the VM's interfaces eth1 and eth2; the other side is placed in one of two network namespaces:

veth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 198.51.100.254  netmask 255.255.255.0  broadcast 198.51.100.255
        inet6 fe80::83b:2cff:fe75:6ea6  prefixlen 64  scopeid 0x20<link>
        ether 0a:3b:2c:75:6e:a6  txqueuelen 1000  (Ethernet)
        RX packets 6917456  bytes 56984049726 (53.0 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 5336448  bytes 352365569 (336.0 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

veth2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.0.2  netmask 255.255.255.0  broadcast 10.0.0.255
        inet6 fe80::1461:8eff:fe7b:8c13  prefixlen 64  scopeid 0x20<link>
        ether 16:61:8e:7b:8c:13  txqueuelen 1000  (Ethernet)
        RX packets 5336683  bytes 352452369 (336.1 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 920209  bytes 56590718932 (52.7 GiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

In the first netns a server is started with "iperf3 -s"; in the second, a client with "iperf3 -c 198.51.100.254 -n 100G".

If I interrupt the test on the client side at 29 seconds or earlier, the test on the server side terminates as it should:

-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 198.51.100.1, port 58010
[  5] local 198.51.100.254 port 5201 connected to 198.51.100.1 port 58012
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   273 MBytes  2.29 Gbits/sec
[  5]   1.00-2.00   sec   308 MBytes  2.58 Gbits/sec
[  5]   2.00-3.00   sec   289 MBytes  2.42 Gbits/sec
....................................................
[  5]  28.00-29.00  sec   282 MBytes  2.37 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-29.00  sec  8.17 GBytes  2.42 Gbits/sec                  receiver
iperf3: the client has terminated
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------

After the test is interrupted, this appears in /proc/net/nf_conntrack:

ipv4 2 tcp 6 5 CLOSE src=10.0.0.2 dst=198.51.100.254 sport=58030 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=58030 mark=0 zone=0 use=2

But if the test is interrupted on the client after more than 30 seconds, then on the server side it never terminates:

Accepted connection from 198.51.100.1, port 58018
[  5] local 198.51.100.254 port 5201 connected to 198.51.100.1 port 58020
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   251 MBytes  2.10 Gbits/sec
[  5]   1.00-2.00   sec   278 MBytes  2.33 Gbits/sec
[  5]   2.00-3.00   sec   286 MBytes  2.40 Gbits/sec
....................................................
[  5]  28.00-29.00  sec   301 MBytes  2.53 Gbits/sec
[  5]  29.00-30.00  sec   293 MBytes  2.46 Gbits/sec
[  5]  30.00-31.00  sec   283 MBytes  2.38 Gbits/sec
[  5]  31.00-32.00  sec   116 MBytes   974 Mbits/sec
[  5]  32.00-33.00  sec  0.00 Bytes   0.00 bits/sec
[  5]  33.00-34.00  sec  0.00 Bytes   0.00 bits/sec
[  5]  34.00-35.00  sec  0.00 Bytes   0.00 bits/sec
[  5]  35.00-36.00  sec  0.00 Bytes   0.00 bits/sec
[  5]  36.00-37.00  sec  0.00 Bytes   0.00 bits/sec
[  5]  37.00-38.00  sec  0.00 Bytes   0.00 bits/sec

After this interruption, /proc/net/nf_conntrack contains only:

ipv4 2 tcp 6 37 CLOSE_WAIT src=10.0.0.2 dst=198.51.100.254 sport=58034 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=58034 mark=0 zone=0 use=2
I should add that there is no problem after removing either the "ip protocol tcp flow offload @ft" rule or the "oif "eth1" ip saddr 10.0.0.0/24 snat to 198.51.100.1" rule.
The same behaviour occurs with nftables-0.9.1 and kernel 5.1.15-300.fc30.x86_64.
I am also experiencing similar connection stalling with TCP connections. In my case the offload is applied on a NAT gateway before the ct rules, like this:

cat /etc/nftables.conf | grep flow
#add flowtable ip filter flows { hook ingress priority -50; devices = {enp6s0f0, enp6s0f1}; }
#add rule ip filter FORWARD counter flow offload @flows comment "FASTPATH TEST"

The stalling occurs systematically on long-running TCP connections, but I have not seen any issues with UDP. When offloading is disabled, the issue no longer occurs.
I can also reproduce the bug with this minimal nftables 0.9.1 setup:

table inet filter {
    flowtable ft {
        hook ingress priority filter
        devices = { eth1, eth2 }
    }

    chain forward {
        type filter hook forward priority filter; policy accept;
        ip protocol tcp flow add @ft
    }
}

table ip nat {
    chain prerouting {
        type nat hook prerouting priority dstnat; policy accept;
    }

    chain postrouting {
        type nat hook postrouting priority srcnat; policy accept;
        oif "eth1" ip saddr 10.0.0.0/24 snat to 198.51.100.1
    }
}
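To sanity-check a minimal ruleset like this before loading it, it can be written to a file and validated with nft's dry-run mode. This is only a sketch: the path /tmp/flow-repro.nft is arbitrary, and the nft binary is assumed to be installed on the router.

```shell
# Write the minimal reproducer ruleset to a file (path is arbitrary).
cat > /tmp/flow-repro.nft <<'EOF'
table inet filter {
	flowtable ft {
		hook ingress priority filter
		devices = { eth1, eth2 }
	}
	chain forward {
		type filter hook forward priority filter; policy accept;
		ip protocol tcp flow add @ft
	}
}
EOF

# Dry-run syntax check only, does not load anything (requires nft and the
# named devices to exist):
#   nft -c -f /tmp/flow-repro.nft

# Quick sanity check that the offload rule made it into the file:
grep -c 'flow add @ft' /tmp/flow-repro.nft   # prints 1
```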
Could you give these fixes a try?

https://patchwork.ozlabs.org/patch/1102703/
https://patchwork.ozlabs.org/patch/1102704/
https://patchwork.ozlabs.org/patch/1102705/
https://patchwork.ozlabs.org/patch/1102706/

I can request inclusion into -stable. Thanks.
No stale connections with the patches applied to 4.19.56. I am going to test 5.1.15 as well.
No stale connections in 5.1.15 with the patches either. But I noticed that sometimes a TIME_WAIT or CLOSE entry with a large timeout (86399) is left in /proc/net/nf_conntrack. This is hard to reproduce, because most connections disappear shortly after closing.
While the iperf3 connection was active, all [OFFLOAD] entries disappeared from /proc/net/nf_conntrack after about 60 seconds.
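For reference, the number of [OFFLOAD] entries can be tracked with a simple grep count. This is a sketch: the sample lines are copied from this report so the pipeline runs without a live router; on the router itself one would read /proc/net/nf_conntrack directly (e.g. in a watch loop).

```shell
# Sample conntrack lines copied from this report; on a live system use
# /proc/net/nf_conntrack as the input instead of $sample.
sample='ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49724 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49724 [OFFLOAD] mark=0 zone=0 use=3
ipv4 2 tcp 6 24 SYN_RECV src=10.0.0.2 dst=198.51.100.254 sport=49722 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49722 mark=0 zone=0 use=2'

# Count the entries currently in [OFFLOAD] state:
printf '%s\n' "$sample" | grep -c '\[OFFLOAD\]'   # prints 1
```

On the router this becomes `watch -n1 "grep -c '\[OFFLOAD\]' /proc/net/nf_conntrack"` to observe the entries disappearing over time.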
There is a race that might trigger the 86400 timeout with TIME_WAIT (this is actually one day, in seconds; it is an internal offload timeout that leaks to userspace when hitting this bug).

http://patchwork.ozlabs.org/patch/1144577/
http://patchwork.ozlabs.org/patch/1144578/

Please give these patches a try.

Regarding the "entries disappeared from /proc/net/nf_conntrack after 60 seconds" issue, I cannot reproduce it. Could you tell me how you reproduce it there?
The disappearing is hard to reproduce. I just run the client "iperf3 -c 198.51.100.254 -n 100G" several times, and the entries in /proc/net/nf_conntrack behave differently at random.

First, here are results with kernel 5.2.6-200.fc30.x86_64 without the patches from comment 9, with this ruleset:

table inet filter {
    flowtable ft {
        hook ingress priority filter
        devices = { eth1, eth2 }
    }

    chain forward {
        type filter hook forward priority filter; policy accept;
        flow add @ft
    }
}

table ip nat {
    chain prerouting {
        type nat hook prerouting priority dstnat; policy accept;
    }

    chain postrouting {
        type nat hook postrouting priority srcnat; policy accept;
        oif "eth1" ip saddr 10.0.0.0/24 snat to 198.51.100.1
    }
}

First, an iperf3 test without the "flow add @ft" rule.

server side:
tcp6  0  0  198.51.100.254:5201  198.51.100.1:49806  ESTABLISHED  27619/iperf3
tcp6  0  0  198.51.100.254:5201  198.51.100.1:49808  ESTABLISHED  27619/iperf3

client side:
tcp  0        0  10.0.0.2:49806  198.51.100.254:5201  ESTABLISHED  27660/iperf3
tcp  0  3082920  10.0.0.2:49808  198.51.100.254:5201  ESTABLISHED  27660/iperf3

During the whole test /proc/net/nf_conntrack has correct entries:

ipv4 2 tcp 6 431932 ESTABLISHED src=10.0.0.2 dst=198.51.100.254 sport=49806 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49806 [ASSURED] mark=0 zone=0 use=2
ipv4 2 tcp 6 300 ESTABLISHED src=10.0.0.2 dst=198.51.100.254 sport=49808 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49808 [ASSURED] mark=0 zone=0 use=2

/proc/net/nf_conntrack after interrupting the client with ctrl+c:

ipv4 2 tcp 6 118 TIME_WAIT src=10.0.0.2 dst=198.51.100.254 sport=49806 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49806 [ASSURED] mark=0 zone=0 use=2
ipv4 2 tcp 6 8 CLOSE src=10.0.0.2 dst=198.51.100.254 sport=49808 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49808 [ASSURED] mark=0 zone=0 use=2

Now several tests with the "flow add @ft" rule.

/proc/net/nf_conntrack after the test started:

ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49724 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49724 [OFFLOAD] mark=0 zone=0 use=3
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49722 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49722 [OFFLOAD] mark=0 zone=0 use=3

after 30 seconds:

ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49724 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49724 [OFFLOAD] mark=0 zone=0 use=3
ipv4 2 tcp 6 24 SYN_RECV src=10.0.0.2 dst=198.51.100.254 sport=49722 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49722 mark=0 zone=0 use=2

after 59 seconds (on the client side there are still two established connections):

ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49724 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49724 [OFFLOAD] mark=0 zone=0 use=3

after interrupting the client:

ipv4 2 tcp 6 2 CLOSE src=10.0.0.2 dst=198.51.100.254 sport=49724 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49724 mark=0 zone=0 use=2
ipv4 2 tcp 6 52 CLOSE_WAIT src=10.0.0.2 dst=198.51.100.254 sport=49722 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49722 mark=0 zone=0 use=2

Next test.

/proc/net/nf_conntrack after the test started:

ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49748 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49748 [OFFLOAD] mark=0 zone=0 use=3
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49746 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49746 [OFFLOAD] mark=0 zone=0 use=3

after 30 seconds:

ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49748 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49748 [OFFLOAD] mark=0 zone=0 use=3
ipv4 2 tcp 6 86378 SYN_RECV src=10.0.0.2 dst=198.51.100.254 sport=49746 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49746 mark=0 zone=0 use=2

after 59 seconds:

ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49748 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49748 [OFFLOAD] mark=0 zone=0 use=3
ipv4 2 tcp 6 86344 SYN_RECV src=10.0.0.2 dst=198.51.100.254 sport=49746 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49746 mark=0 zone=0 use=2

after interrupting the client:

ipv4 2 tcp 6 7 CLOSE src=10.0.0.2 dst=198.51.100.254 sport=49748 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49748 mark=0 zone=0 use=2
ipv4 2 tcp 6 117 TIME_WAIT src=10.0.0.2 dst=198.51.100.254 sport=49746 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49746 [ASSURED] mark=0 zone=0 use=2

Next test.

/proc/net/nf_conntrack after the test started:

ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49760 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49760 [OFFLOAD] mark=0 zone=0 use=3
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49762 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49762 [OFFLOAD] mark=0 zone=0 use=3

after 30 seconds:

ipv4 2 tcp 6 26 SYN_RECV src=10.0.0.2 dst=198.51.100.254 sport=49760 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49760 mark=0 zone=0 use=2
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49762 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49762 [OFFLOAD] mark=0 zone=0 use=3

after 59 seconds /proc/net/nf_conntrack is empty, and on the client side:

[  4]  59.00-60.00  sec   659 MBytes  5.53 Gbits/sec  1764   1.17 MBytes
iperf3: error - unable to write to stream socket: Connection reset by peer

Tests with kernel 5.2.7-200.fc30.x86_64 without patches from comment 9:

/proc/net/nf_conntrack after the test started:

ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49854 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49854 [OFFLOAD] mark=0 zone=0 use=3
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49856 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49856 [OFFLOAD] mark=0 zone=0 use=3

after 33 seconds:

ipv4 2 tcp 6 117 ESTABLISHED src=10.0.0.2 dst=198.51.100.254 sport=49854 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49854 mark=0 zone=0 use=2
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49856 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49856 [OFFLOAD] mark=0 zone=0 use=3

after 154 seconds:

ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49856 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49856 [OFFLOAD] mark=0 zone=0 use=3

after interrupting the client:

ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49854 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49854 [OFFLOAD] mark=0 zone=0 use=3
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49856 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49856 [OFFLOAD] mark=0 zone=0 use=3

after a couple of seconds:

ipv4 2 tcp 6 39 ESTABLISHED src=10.0.0.2 dst=198.51.100.254 sport=49854 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49854 mark=0 zone=0 use=2
The last test was with kernel 5.2.7-200.fc30.x86_64 and with the patches from comment 9.
(In reply to nucleo from comment #10)

> Tests with kernel 5.2.7-200.fc30.x86_64 _with_ patches from comment 9:
>
> /proc/net/nf_conntrack after test started
> ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49854 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49854 [OFFLOAD] mark=0 zone=0 use=3
> ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49856 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49856 [OFFLOAD] mark=0 zone=0 use=3

Both flows have been placed in the flowtable.

> /proc/net/nf_conntrack after 33 seconds
> ipv4 2 tcp 6 117 ESTABLISHED src=10.0.0.2 dst=198.51.100.254 sport=49854 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49854 mark=0 zone=0 use=2
> ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49856 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49856 [OFFLOAD] mark=0 zone=0 use=3

The flowtable sees no packets for the flow sport=49854 after 30 seconds, so this flow is pushed out of the flowtable and conntrack recovers control of it. The pickup timeout (120 seconds) kicks in and the entry is set to the ESTABLISHED state (tracking is also set to liberal mode).

> /proc/net/nf_conntrack after 154 seconds
> ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49856 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49856 [OFFLOAD] mark=0 zone=0 use=3

Flow sport=49854 is gone. There was no traffic for it after a while, and conntrack saw no packets within 120 seconds either (this is the 30-second flowtable timeout plus the 120-second pickup timeout), so the entry for sport=49854 is released.

> /proc/net/nf_conntrack after interrupting client
> ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49854 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49854 [OFFLOAD] mark=0 zone=0 use=3
> ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=49856 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49856 [OFFLOAD] mark=0 zone=0 use=3

After pressing ctrl-c on the client, here in my testbed I see one entry in TIME_WAIT (in your case that would be the flow identified by sport=49856) and another flow in ESTABLISHED state, which is this one below...

> /proc/net/nf_conntrack after couple of seconds
> ipv4 2 tcp 6 39 ESTABLISHED src=10.0.0.2 dst=198.51.100.254 sport=49854 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=49854 mark=0 zone=0 use=2

... this is sport=49854: the fin/rst packet is sent back to the flowtable, then the entry expires (no packets after 30 seconds) and it goes back to conntrack.

I'll be posting two patches here:

1) do not push the flow back into the flowtable if the packet is fin/rst.
2) likely increase the default flowtable timeout to 120 seconds. I'll also expose toggles to make this configurable.

Thanks for your feedback.
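The timeout arithmetic in the analysis above can be sketched as follows. The values are taken from this report (they are kernel-internal defaults here, not tunables).

```shell
# Values taken from the analysis above (kernel-internal defaults, not knobs):
flowtable_timeout=30   # seconds without packets before eviction from flowtable
pickup_timeout=120     # conntrack TCP pickup timeout after the flow is evicted

# Total idle time before an offloaded entry is fully released:
echo $((flowtable_timeout + pickup_timeout))   # prints 150
```

This matches the window after which an idle iperf3 control connection's conntrack entry disappears in the tests above.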
(In reply to Pablo Neira Ayuso from comment #12)
[...]
> 1) do not push back flow to flowtable if packet is fin/rst.

https://patchwork.ozlabs.org/patch/1146133/

With this patch, conntrack entries enter the TIME_WAIT state after the fin/rst when the client is interrupted.
(In reply to Pablo Neira Ayuso from comment #13)
> (In reply to Pablo Neira Ayuso from comment #12)
> [...]
> > 1) do not push back flow to flowtable if packet is fin/rst.
>
> https://patchwork.ozlabs.org/patch/1146133/

Patch version 2: https://patchwork.ozlabs.org/patch/1146419/
Here are my tests with the Fedora 5.2.9 kernel with the patches from comment 9 and comment 14 applied. I repeated "iperf3 -c 198.51.100.254 -n 100G" several times, interrupting it with ctrl+c.

First run. Contents of /proc/net/nf_conntrack:

ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=51994 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=51994 [OFFLOAD] mark=0 zone=0 use=3
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=51992 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=51992 [OFFLOAD] mark=0 zone=0 use=3

after 30 seconds:

ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=51994 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=51994 [OFFLOAD] mark=0 zone=0 use=3
ipv4 2 tcp 6 18 ESTABLISHED src=10.0.0.2 dst=198.51.100.254 sport=51992 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=51992 mark=0 zone=0 use=2

after 60 seconds /proc/net/nf_conntrack is empty, and after one more second:

ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=51994 dport=5201 [UNREPLIED] src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=51994 [OFFLOAD] mark=0 zone=0 use=3

ctrl+c:

ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=51992 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=51992 [OFFLOAD] mark=0 zone=0 use=3

and after that:

ipv4 2 tcp 6 112 ESTABLISHED src=10.0.0.2 dst=198.51.100.254 sport=51992 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=51992 mark=0 zone=0 use=2

Second run:

ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=52000 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=52000 [OFFLOAD] mark=0 zone=0 use=3
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=52002 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=52002 [OFFLOAD] mark=0 zone=0 use=3

after 30 seconds:

ipv4 2 tcp 6 26 ESTABLISHED src=10.0.0.2 dst=198.51.100.254 sport=52000 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=52000 mark=0 zone=0 use=2
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=52002 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=52002 [OFFLOAD] mark=0 zone=0 use=3

after 60 seconds /proc/net/nf_conntrack is empty, and on the client side:

[  4]  59.00-60.00  sec   528 MBytes  4.42 Gbits/sec  498   1.34 MBytes
iperf3: error - unable to write to stream socket: Connection reset by peer

In one of the other runs the test continued with an empty /proc/net/nf_conntrack.

Third run:

ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=52008 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=52008 [OFFLOAD] mark=0 zone=0 use=3
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=52006 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=52006 [OFFLOAD] mark=0 zone=0 use=3

after 30 seconds:

ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=52008 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=52008 [OFFLOAD] mark=0 zone=0 use=3
ipv4 2 tcp 6 25 ESTABLISHED src=10.0.0.2 dst=198.51.100.254 sport=52006 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=52006 mark=0 zone=0 use=2

after 60 seconds:

ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=52008 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=52008 [OFFLOAD] mark=0 zone=0 use=3

ctrl+c:

ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=52008 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=52008 [OFFLOAD] mark=0 zone=0 use=3
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=52006 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=52006 [OFFLOAD] mark=0 zone=0 use=3

after that:

ipv4 2 tcp 6 3 CLOSE src=10.0.0.2 dst=198.51.100.254 sport=52008 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=52008 [ASSURED] mark=0 zone=0 use=2
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=52006 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=52006 [OFFLOAD] mark=0 zone=0 use=3

after that:

ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=52006 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=52006 [OFFLOAD] mark=0 zone=0 use=3

after that:

ipv4 2 tcp 6 71 ESTABLISHED src=10.0.0.2 dst=198.51.100.254 sport=52006 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=52006 mark=0 zone=0 use=2

Fourth run, finished without interrupting:

ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=52020 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=52020 [OFFLOAD] mark=0 zone=0 use=3
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=52022 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=52022 [OFFLOAD] mark=0 zone=0 use=3

after 30 seconds:

ipv4 2 tcp 6 10 ESTABLISHED src=10.0.0.2 dst=198.51.100.254 sport=52020 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=52020 mark=0 zone=0 use=2
ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=52022 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=52022 [OFFLOAD] mark=0 zone=0 use=3

after 60 seconds /proc/net/nf_conntrack is empty, and after 1 second:

ipv4 2 tcp 6 src=10.0.0.2 dst=198.51.100.254 sport=52022 dport=5201 [UNREPLIED] src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=52022 [OFFLOAD] mark=0 zone=0 use=3

test finished:

ipv4 2 tcp 6 107 TIME_WAIT src=10.0.0.2 dst=198.51.100.254 sport=52020 dport=5201 src=198.51.100.254 dst=198.51.100.1 sport=5201 dport=52020 mark=0 zone=0 use=2
I cannot reproduce this here on 5.3-rc; I have been repeating similar tests here. Would you mind checking for all flowtable infrastructure patches between 4.19 and 5.3? You can do this via:

git log --oneline v4.19..v5.3-rc3 net/netfilter/nft_flow_offload.c

Also check these files:

net/netfilter/nf_flow_table_core.c
net/netfilter/nf_flow_table_ip.c
net/netfilter/nf_flow_table_inet.c
net/ipv4/netfilter/nf_flow_table_ipv4.c
net/ipv6/netfilter/nf_flow_table_ipv6.c
include/net/netfilter/nf_flow_table.h

Make sure you get a fresh clone of:

https://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

to check for any missing patch. If there are relevant patches already upstream that are not in 4.19 and that fix the problem you report, please send a list of commit IDs and I'll ask the -stable maintainers to include them in the 4.19 -stable release.

I agree that the 30-second timer to evict a flow that has seen no traffic from the flowtable is too aggressive, but before making a patch to raise this default timeout (and to expose a knob that lets users configure it), it would be good to make sure no relevant patch is missing. Thanks!
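To run the audit over all of the files listed above in one go, something like this can be used inside a fresh clone of nf.git. This is a sketch: the file list is copied from this comment, and the git command is only shown (it needs the clone with the v4.19 and v5.3-rc3 tags present).

```shell
# Flowtable-related files to audit (list copied from the comment above).
FILES='net/netfilter/nft_flow_offload.c
net/netfilter/nf_flow_table_core.c
net/netfilter/nf_flow_table_ip.c
net/netfilter/nf_flow_table_inet.c
net/ipv4/netfilter/nf_flow_table_ipv4.c
net/ipv6/netfilter/nf_flow_table_ipv6.c
include/net/netfilter/nf_flow_table.h'

# Inside a fresh clone of nf.git (word splitting of $FILES is intentional):
#   git log --oneline v4.19..v5.3-rc3 -- $FILES

# Sanity check: seven paths in the list.
printf '%s\n' "$FILES" | wc -l   # prints 7
```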
Created attachment 284587 [details]
skip fixup on teardown state

After 150 seconds (30 seconds to evict the iperf control flow from the flowtable + 120 seconds in ESTABLISHED state), if I press ctrl-c, I can see this:

ipv4 2 tcp 6 104 TIME_WAIT src=192.168.10.2 dst=10.0.1.2 sport=33994 dport=5201 src=10.0.1.2 dst=10.0.1.1 sport=5201 dport=33994 mark=0 secctx=null zone=0 use=2
ipv4 2 tcp 6 104 ESTABLISHED src=192.168.10.2 dst=10.0.1.2 sport=33992 dport=5201 src=10.0.1.2 dst=10.0.1.1 sport=5201 dport=33992 mark=0 secctx=null zone=0 use=2

The flow sport=33992 is the iperf control plane flow. It seems that iperf sends a data packet after ctrl-c:

20:13:22.268161 IP 192.168.10.2.33992 > 10.0.1.2.5201: Flags [P.], seq 3952723680:3952723681, ack 3326915136, win 502, options [nop,nop,TS val 2165195608 ecr 2773810022], length 1

This pushes the flow into the flowtable again, however...

20:13:22.268434 IP 10.0.1.2.5201 > 192.168.10.2.33992: Flags [F.], seq 1, ack 1, win 509, options [nop,nop,TS val 2773964852 ecr 2165195608], length 0
20:13:22.268472 IP 192.168.10.2.33992 > 10.0.1.2.5201: Flags [F.], seq 1, ack 2, win 502, options [nop,nop,TS val 2165195608 ecr 2773964852], length 0
20:13:22.268492 IP 10.0.1.2.5201 > 192.168.10.2.33992: Flags [.], ack 2, win 509, options [nop,nop,TS val 2773964852 ecr 2165195608], length 0

The tcp fin packet schedules the flowtable entry for removal, but the state fixup routine takes the conntrack entry from FIN_WAIT back to ESTABLISHED.
Scratch that; the patch is not correct.
This patch fixes incorrect timeout initialization of the flowtable entry: https://patchwork.ozlabs.org/patch/1156702/
Are all of the patches from comments 9, 14, and 19 needed now? I can't test the 4.19 kernel, because the patches from comments 9 and 14 do not apply to the latest 4.19.x.
Could you try with the Fedora 5.2.9 kernel?