Bug 197513 - IPsec (ESP) performance drastically reduced from 4.10 to >=4.12
Status: NEW
Alias: None
Product: Networking
Classification: Unclassified
Component: IPV4
Hardware: Intel Linux
Importance: P1 blocking
Assignee: Stephen Hemminger
Reported: 2017-10-28 07:06 UTC by Vicente De Luca
Modified: 2017-10-30 23:54 UTC
Kernel Version: >=4.12
Subsystem:
Regression: Yes
Bisected commit-id:


Description Vicente De Luca 2017-10-28 07:06:45 UTC
My IPsec stack (ipsec-tools + racoon + opennhrp) works perfectly on Linux 4.10, with throughput around 700 Mbit/s, but after upgrading to kernel >=4.12 the throughput drops drastically, to around 500 Kbit/s.

I tried loading the esp4_offload module, but it did not help.
I am using aes256 for phase 1 (p1) and aes128 for phase 2 (p2).

root@sr-a940d:/home/vdeluca# iperf -c tacacs1 -i 1
------------------------------------------------------------
Client connecting to tacacs1, TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  3] local 172.24.0.156 port 16248 connected with 10.104.1.2 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec   191 KBytes  1.57 Mbits/sec
[  3]  1.0- 2.0 sec  80.4 KBytes   659 Kbits/sec
[  3]  2.0- 3.0 sec  63.3 KBytes   518 Kbits/sec
[  3]  3.0- 4.0 sec  79.1 KBytes   648 Kbits/sec
[  3]  4.0- 5.0 sec   145 KBytes  1.19 Mbits/sec
[  3]  5.0- 6.0 sec  83.1 KBytes   680 Kbits/sec
[  3]  6.0- 7.0 sec  80.4 KBytes   659 Kbits/sec
[  3]  7.0- 8.0 sec  65.9 KBytes   540 Kbits/sec


root@sr-a940d:/home/vdeluca# lsmod | grep esp
esp4_offload           16384  40
esp4                   20480  41 esp4_offload
xfrm_algo              16384  2 esp4,af_key

Rolling back to kernel 4.10.0 fully resolves the issue.
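
To compare kernels with a single number instead of reading interval lines by eye, the per-interval rates can be averaged. The following is a minimal sketch, not part of iperf: `iperf_avg_kbits` is a hypothetical helper name, and it assumes iperf (v2) client output in the format shown above. If iperf prints a final summary line, that line is counted as well.

```shell
#!/bin/sh
# Hypothetical helper: average the per-interval bandwidth of iperf (v2)
# client output, in Kbits/sec. Reads the iperf output on stdin.
iperf_avg_kbits() {
    awk '
        # Interval lines look like:
        # [  3]  1.0- 2.0 sec  80.4 KBytes   659 Kbits/sec
        / sec .*bits\/sec/ {
            rate = $(NF-1); unit = $NF
            # Normalize every rate to Kbits/sec.
            if (unit == "Mbits/sec") rate *= 1000
            else if (unit == "Gbits/sec") rate *= 1000000
            else if (unit == "bits/sec") rate /= 1000
            sum += rate; n++
        }
        END { if (n) printf "%.1f\n", sum / n }
    '
}
```

Usage would be along the lines of `iperf -c tacacs1 -i 1 | iperf_avg_kbits`, run once per kernel. For the eight interval lines shown above, the average works out to about 808 Kbits/sec.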



thank you!
Comment 1 Vicente De Luca 2017-10-29 20:43:15 UTC
This issue seems to share the same root cause, but in a different context:

https://github.com/weaveworks/weave/issues/3075

https://github.com/moby/moby/issues/33133
Comment 3 Vicente De Luca 2017-10-29 21:49:22 UTC
Also worth mentioning: my tests were run on AWS and GCP, so on virtualized instances.
Network driver: ixgbevf

I tried different versions of ixgbevf; they all show the same behavior.
Comment 4 Vicente De Luca 2017-10-29 21:59:57 UTC
Disabling TCP segmentation offload (tso off) resolves the TCP throughput bottleneck but, as expected, greatly increases CPU utilization by ksoftirqd.
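
For anyone wanting to try the same workaround, a minimal sketch follows. `set_tso` and the `DRY_RUN` variable are hypothetical names introduced here. The comment above does not say which interface TSO was disabled on, so the interface is left as a parameter; `gre1` in the usage line is only an assumption borrowed from the ethtool output elsewhere in this report.

```shell
#!/bin/sh
# Hypothetical helper: toggle TCP segmentation offload on an interface.
# With DRY_RUN=1 the ethtool command is printed instead of executed.
set_tso() {
    iface=$1
    state=$2    # "on" or "off"
    case $state in
        on|off) ;;
        *) echo "usage: set_tso IFACE on|off" >&2; return 2 ;;
    esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "ethtool -K $iface tso $state"
    else
        ethtool -K "$iface" tso "$state"    # requires root
    fi
}
```

For example, `DRY_RUN=1 set_tso gre1 off` only prints the command; without `DRY_RUN` it is executed. The setting does not persist across reboots, and as noted above it trades throughput for higher CPU load.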
Comment 5 Vicente De Luca 2017-10-30 03:11:49 UTC
The problem happens under these circumstances:

root@sr-a940d:/home/vdeluca# uname -a
Linux sr-a940d.us-west-2a.usw2-infra.medusa.zdsys.com 4.13.10-041310-generic #201710270531 SMP Fri Oct 27 09:33:21 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
root@sr-a940d:/home/vdeluca# ethtool -k gre1
Features for gre1:
rx-checksumming: off [fixed]
tx-checksumming: on
	tx-checksum-ipv4: off [fixed]
	tx-checksum-ip-generic: on
	tx-checksum-ipv6: off [fixed]
	tx-checksum-fcoe-crc: off [fixed]
	tx-checksum-sctp: off [fixed]
scatter-gather: on
	tx-scatter-gather: on
	tx-scatter-gather-fraglist: on
tcp-segmentation-offload: on
	tx-tcp-segmentation: on
	tx-tcp-ecn-segmentation: on
	tx-tcp-mangleid-segmentation: on
	tx-tcp6-segmentation: on
udp-fragmentation-offload: on
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: on
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: on [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-sctp-segmentation: on
tx-esp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
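
A listing this long is hard to compare by eye between a working and a broken kernel. As a sketch only, the enabled features can be reduced to a sorted list and then diffed; `enabled_features` is a hypothetical helper name, and it assumes the `name: on|off [fixed]` layout of the `ethtool -k` output shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the names of all features that `ethtool -k`
# reports as "on" (including "on [fixed]"), one per line, sorted.
# Reads the ethtool output on stdin.
enabled_features() {
    awk -F': ' '
        $2 ~ /^on( |$)/ {
            gsub(/^[ \t]+/, "", $1)   # strip indentation of sub-features
            print $1
        }
    ' | LC_ALL=C sort
}
```

One possible use is `ethtool -k gre1 | enabled_features > features-4.13.txt` on each kernel, followed by `diff` on the two files.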
Comment 6 Vicente De Luca 2017-10-30 23:54:50 UTC
If you are experiencing the same behavior, you can try applying this patch to your kernel, kindly provided by Steffen Klassert.

I tested it on my setup and it resolved the performance issue.

diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
index 31a2e6d..73ad8c8 100644
--- a/net/xfrm/xfrm_output.c
+++ b/net/xfrm/xfrm_output.c
@@ -105,6 +105,9 @@ static int xfrm_output_one(struct sk_buff *skb, int err)
		if (xfrm_offload(skb)) {
			x->type_offload->encap(x, skb);
		} else {
+			/* Inner headers are invalid now. */
+			skb->encapsulation = 0;
+
			err = x->type->output(x, skb);
			if (err == -EINPROGRESS)
				goto out;
@@ -208,7 +211,6 @@ int xfrm_output(struct sock *sk, struct sk_buff *skb)
	int err;

	secpath_reset(skb);
-	skb->encapsulation = 0;

	if (xfrm_dev_offload_ok(skb, x)) {
		struct sec_path *sp;