My IPsec stack (ipsec-tools + racoon + opennhrp) works perfectly on Linux 4.10, with a throughput of around 700 Mbps. After upgrading to kernel >= 4.12, the throughput drops drastically to around 500 Kbits/sec. I tried loading the esp4_offload module, but it didn't help. I'm using AES-256 for phase 1 and AES-128 for phase 2.

```
root@sr-a940d:/home/vdeluca# iperf -c tacacs1 -i 1
------------------------------------------------------------
Client connecting to tacacs1, TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  3] local 172.24.0.156 port 16248 connected with 10.104.1.2 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec   191 KBytes  1.57 Mbits/sec
[  3]  1.0- 2.0 sec  80.4 KBytes   659 Kbits/sec
[  3]  2.0- 3.0 sec  63.3 KBytes   518 Kbits/sec
[  3]  3.0- 4.0 sec  79.1 KBytes   648 Kbits/sec
[  3]  4.0- 5.0 sec   145 KBytes  1.19 Mbits/sec
[  3]  5.0- 6.0 sec  83.1 KBytes   680 Kbits/sec
[  3]  6.0- 7.0 sec  80.4 KBytes   659 Kbits/sec
[  3]  7.0- 8.0 sec  65.9 KBytes   540 Kbits/sec
root@sr-a940d:/home/vdeluca# lsmod | grep esp
esp4_offload           16384  40
esp4                   20480  41 esp4_offload
xfrm_algo              16384  2 esp4,af_key
```

Rolling back to kernel 4.10.0 resolves the issue completely. Thank you!
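When reading the iperf output above, note that the two columns use different prefixes: the figures are consistent with Transfer being counted in KBytes of 1024 bytes and Bandwidth in Kbits of 1000 bits (for example 80.4 KBytes in 1 s shows up as 659 Kbits/sec). A minimal C sketch of that conversion, assuming those unit conventions; the function name is hypothetical:

```c
#include <assert.h>

/* Convert one iperf interval to Kbits/sec.
 * ASSUMPTION (consistent with the output above): Transfer uses
 * KBytes of 1024 bytes, Bandwidth uses Kbits of 1000 bits. */
static double iperf_kbits_per_sec(double kbytes, double seconds)
{
    assert(seconds > 0.0);
    /* KBytes -> bytes -> bits -> Kbits, then divide by interval length */
    return kbytes * 1024.0 * 8.0 / 1000.0 / seconds;
}
```

For example, iperf_kbits_per_sec(80.4, 1.0) evaluates to about 658.6, which iperf rounds to the 659 Kbits/sec shown for the 1.0-2.0 sec interval.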
This issue seems to share the same root cause, but in a different context: https://github.com/weaveworks/weave/issues/3075 and https://github.com/moby/moby/issues/33133
Possibly related to this commit: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d77e38e612a017480157fe6d2c1422f42cb5b7e3
Also worth mentioning: my tests were run on AWS and GCP, i.e. on virtualized instances. The network driver is ixgbevf; I tried different versions of ixgbevf and they all show the same behavior.
Disabling TCP segmentation offload (tso off) removes the TCP throughput bottleneck but, as expected, greatly increases CPU utilization by ksoftirqd.
The problem occurs under these circumstances:

```
root@sr-a940d:/home/vdeluca# uname -a
Linux sr-a940d.us-west-2a.usw2-infra.medusa.zdsys.com 4.13.10-041310-generic #201710270531 SMP Fri Oct 27 09:33:21 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
root@sr-a940d:/home/vdeluca# ethtool -k gre1
Features for gre1:
rx-checksumming: off [fixed]
tx-checksumming: on
	tx-checksum-ipv4: off [fixed]
	tx-checksum-ip-generic: on
	tx-checksum-ipv6: off [fixed]
	tx-checksum-fcoe-crc: off [fixed]
	tx-checksum-sctp: off [fixed]
scatter-gather: on
	tx-scatter-gather: on
	tx-scatter-gather-fraglist: on
tcp-segmentation-offload: on
	tx-tcp-segmentation: on
	tx-tcp-ecn-segmentation: on
	tx-tcp-mangleid-segmentation: on
	tx-tcp6-segmentation: on
udp-fragmentation-offload: on
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: on
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: on [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-sctp-segmentation: on
tx-esp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
```
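In the feature list above, the lines that seem most relevant are tx-tcp-segmentation: on next to tx-esp-segmentation: off [fixed] and esp-hw-offload: off [fixed], i.e. gre1 accepts TSO-sized packets while no ESP offload is available, so any segmentation must happen in software before or during the IPsec transform. When comparing such listings across kernels, a small parser for one line of ethtool -k output can help; the sketch below is a hypothetical helper that assumes the "name: on|off [fixed]" layout shown above:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed line of `ethtool -k` output. */
struct feature {
    char name[64];
    int on;    /* 1 = on, 0 = off */
    int fixed; /* 1 if marked [fixed], i.e. cannot be changed */
};

/* Parse a line such as "tx-esp-segmentation: off [fixed]".
 * Leading whitespace (sub-feature indentation) is skipped.
 * Returns 0 on success, -1 if the line is not a "name: on|off" feature
 * line (for example the "Features for gre1:" header). */
static int parse_feature_line(const char *line, struct feature *f)
{
    char state[4];

    assert(line != NULL && f != NULL);
    if (sscanf(line, " %63[^:]: %3s", f->name, state) != 2)
        return -1;
    if (strcmp(state, "on") == 0)
        f->on = 1;
    else if (strcmp(state, "off") == 0)
        f->on = 0;
    else
        return -1;
    f->fixed = strstr(line, "[fixed]") != NULL;
    return 0;
}
```

Feeding it the header line returns -1, while "\ttx-tcp-segmentation: on" yields name tx-tcp-segmentation, on = 1, fixed = 0.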
If you are experiencing the same behavior, you can try applying this patch to your kernel, kindly provided by Steffen Klassert. I tested it on my setup and it resolved the performance issue.

```diff
diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
index 31a2e6d..73ad8c8 100644
--- a/net/xfrm/xfrm_output.c
+++ b/net/xfrm/xfrm_output.c
@@ -105,6 +105,9 @@ static int xfrm_output_one(struct sk_buff *skb, int err)
 		if (xfrm_offload(skb)) {
 			x->type_offload->encap(x, skb);
 		} else {
+			/* Inner headers are invalid now. */
+			skb->encapsulation = 0;
+
 			err = x->type->output(x, skb);
 			if (err == -EINPROGRESS)
 				goto out;
@@ -208,7 +211,6 @@ int xfrm_output(struct sock *sk, struct sk_buff *skb)
 	int err;
 
 	secpath_reset(skb);
-	skb->encapsulation = 0;
 
 	if (xfrm_dev_offload_ok(skb, x)) {
 		struct sec_path *sp;
```
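The patch moves the line that clears skb->encapsulation: instead of clearing it on entry to xfrm_output(), before any software segmentation, it clears it in xfrm_output_one() only on the non-offload path, right before the ESP transform makes the inner headers invalid. One plausible reading of why this matters is sketched below as a toy model. Everything in it is hypothetical (struct toy_skb and the toy_* functions are not kernel APIs), and it ASSUMES, for illustration, that the tunnel GSO step refuses to segment a packet whose encapsulation flag has already been cleared:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct sk_buff. */
struct toy_skb {
    bool encapsulation; /* inner (e.g. GRE-carried) headers still valid */
    bool gso;           /* oversized packet that still needs segmentation */
};

/* Hypothetical tunnel GSO step. ASSUMPTION: it refuses to segment a
 * GSO packet whose encapsulation flag is cleared. */
static bool toy_gso_segment(struct toy_skb *skb)
{
    assert(skb != NULL);
    if (skb->gso && !skb->encapsulation)
        return false; /* segmentation fails, packet is dropped */
    skb->gso = false;
    return true;
}

/* Before the patch: flag cleared on entry, i.e. before GSO runs. */
static bool toy_xfrm_output_old(struct toy_skb *skb)
{
    skb->encapsulation = false; /* cleared too early */
    if (!toy_gso_segment(skb))
        return false;
    /* software ESP transform would run here, per segment */
    return true;
}

/* After the patch: GSO still sees the flag; it is cleared only in the
 * software (non-offload) path, just before the ESP transform. */
static bool toy_xfrm_output_new(struct toy_skb *skb)
{
    if (!toy_gso_segment(skb))
        return false;
    skb->encapsulation = false; /* inner headers are invalid now */
    /* software ESP transform would run here, per segment */
    return true;
}
```

In this model a non-GSO packet passes through either version, which would be consistent with "tso off" hiding the problem at the cost of software segmentation elsewhere; only a GSO packet distinguishes the old ordering from the patched one.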