Bug 218552
| Summary: | GRE passing Linux MPLS network has poor performance for TCP | | |
|---|---|---|---|
| Product: | Networking | Reporter: | Easynet (devel) |
| Component: | Other | Assignee: | Stephen Hemminger (stephen) |
| Status: | NEW --- | | |
| Severity: | high | | |
| Priority: | P3 | | |
| Hardware: | Intel | | |
| OS: | Linux | | |
| Kernel Version: | | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
| Attachments: | GRE over MPLS poor performance | | |
Created attachment 305949: GRE over MPLS poor performance

I'm seeing strange behavior on an MPLS network between two routers built on Linux. When I create a GRE tunnel on one of the routers, or when a GRE tunnel passes through the Linux MPLS network, TCP throughput is very poor, even if I shrink the MTU or the MSS.

The setup is like this:

Inbound traffic:
ISP -> (eth3-0) R02 (eth4-0) -> (MPLS) -> (eth4-0) R01 (eth3-1 & eth4-1) -> VPN server

Outbound traffic:
VPN server -> (eth3-1 & eth4-1) R01 (eth4-0) -> (MPLS) -> (eth4-0) R02 (eth3-0) -> ISP

Routing table on R02:

R02# show ip route vrf internet 89.A.B.1
Routing entry for 89.A.B.1/32
  Known via "bgp", distance 200, metric 0, vrf internet, best
  Last update 12:43:11 ago
    10.100.1.1(vrf default) (recursive), label 81, weight 1
  * 10.100.0.1, via mpls0(vrf default), label IPv4 Explicit Null/81, weight 1

R02# show ip route vrf servers 89.A.B.161
Routing entry for 89.A.B.128/26
  Known via "bgp", distance 200, metric 0, vrf servers, best
  Last update 12:40:35 ago
    10.100.1.1(vrf default) (recursive), label 85, weight 1
  * 10.100.0.1, via mpls0(vrf default), label IPv4 Explicit Null/85, weight 1

R02# show ip route vrf internet 89.A.B.161
Routing entry for 89.A.B.128/26
  Known via "bgp", distance 200, metric 0, vrf internet, best
  Last update 12:42:56 ago
    10.100.1.1(vrf default) (recursive), label 85, weight 1
  * 10.100.0.1, via mpls0(vrf default), label IPv4 Explicit Null/85, weight 1

R02# show ip route vrf internet 178.C.D.0/15
Routing entry for 178.C.D.0/15
  Known via "bgp", distance 20, metric 0, vrf internet, best
  Last update 14:28:23 ago
    193.230.200.47 (recursive), weight 1
  * 89.238.245.113, via wan0.650, weight 1

R02# show ip route vrf servers
Codes: K - kernel route, C - connected, L - local, S - static,
       R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric, t - Table-Direct,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

VRF servers:
S>* 0.0.0.0/0 [1/0] is directly connected, internet (vrf internet), weight 1, 14:29:46

Routing table on R01:

R01# show ip route vrf internet 89.A.B.1
Routing entry for 89.A.B.1/32
  Known via "local", distance 0, metric 0, vrf internet
  Last update 15:07:48 ago
  * directly connected, internet

Routing entry for 89.A.B.1/32
  Known via "connected", distance 0, metric 0, vrf internet, best
  Last update 15:07:48 ago
  * directly connected, internet

R01# show ip route vrf servers 89.A.B.161
Routing entry for 89.A.B.128/26
  Known via "connected", distance 0, metric 0, vrf servers, best
  Last update 14:53:40 ago
  * directly connected, lan0.11

R01# show ip route vrf internet 89.A.B.161
Routing entry for 89.A.B.128/26
  Known via "bgp", distance 20, metric 0, vrf internet, best
  Last update 14:53:50 ago
  * directly connected, servers(vrf servers), weight 1

R01# show ip route vrf internet 178.C.D.0/15
Routing entry for 178.C.D.0/15
  Known via "bgp", distance 200, metric 0, vrf internet, best
  Last update 12:44:27 ago
    10.100.2.1(vrf default) (recursive), label 81, weight 1
  * 10.100.0.2, via eth4-0(vrf default), label IPv4 Explicit Null/81, weight 1

Create a GRE tunnel:

R01# /sbin/ip link add name gre1001 numtxqueues $(nproc) numrxqueues $(nproc) type gre remote 178.C.D.X local 89.A.B.1 ttl 225 key 1001
R01# ip link set gre1001 up

R10# /sbin/ip link add name gre1001 numtxqueues $(nproc) numrxqueues $(nproc) type gre remote 89.A.B.1 local 178.C.D.X ttl 225 key 1001
R10# ip link set gre1001 up

R01# show interface gre1001
Interface gre1001 is up, line protocol is up
  Link ups: 6 last: 2024/03/02 16:50:46.82
  Link downs: 6 last: 2024/03/02 16:50:46.82
  vrf: default
  Description: R01-R10 GRE
  index 206 metric 0 mtu 65507 speed 0 txqlen 1000
  flags: <UP,POINTOPOINT,RUNNING,NOARP>
  Ignore all v4 routes with linkdown
  Ignore all v6 routes with linkdown
  Type: GRE over IP
  HWaddr: 59:26:3a:01
  inet 10.100.100.129/30
  inet6 fe80::5926:3a01/64
  Interface Type GRE
  Interface Slave Type None
  VTEP IP: 89.A.B.1 , remote 178.C.D.X
  protodown: off

R10# show interface gre1001
Interface gre1001 is up, line protocol is up
  Link ups: 38 last: 2024/03/02 16:51:35.67
  Link downs: 30 last: 2024/03/02 16:51:35.66
  vrf: default
  Description: R01-R10 GRE
  index 357 metric 0 mtu 1472 speed 0 txqlen 1000
  flags: <UP,POINTOPOINT,RUNNING,NOARP>
  Type: GRE over IP
  HWaddr: b2:26:6c:bc
  inet 10.100.100.130/30
  inet6 fe80::b226:6cbc/64
  Interface Type GRE
  Interface Slave Type None
  VTEP IP: 178.C.D.X , remote 89.A.B.1
  protodown: off

Testing:

R10# iperf3 -c 10.100.100.129
Connecting to host 10.100.100.129, port 5201
[ 5] local 10.100.100.130 port 51610 connected to 10.100.100.129 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 2.38 MBytes 19.9 Mbits/sec 20 2.77 KBytes
[ 5] 1.00-2.00 sec 0.00 Bytes 0.00 bits/sec 8 2.77 KBytes
[ 5] 2.00-3.00 sec 0.00 Bytes 0.00 bits/sec 8 2.77 KBytes
[ 5] 3.00-4.00 sec 0.00 Bytes 0.00 bits/sec 8 2.77 KBytes
[ 5] 4.00-5.00 sec 0.00 Bytes 0.00 bits/sec 8 4.16 KBytes
[ 5] 5.00-6.00 sec 0.00 Bytes 0.00 bits/sec 8 5.55 KBytes
[ 5] 6.00-7.00 sec 0.00 Bytes 0.00 bits/sec 12 2.77 KBytes
[ 5] 7.00-8.00 sec 0.00 Bytes 0.00 bits/sec 8 2.77 KBytes
[ 5] 8.00-9.00 sec 0.00 Bytes 0.00 bits/sec 12 2.77 KBytes
[ 5] 9.00-10.00 sec 0.00 Bytes 0.00 bits/sec 10 2.77 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 2.38 MBytes 2.00 Mbits/sec 102 sender
[ 5] 0.00-10.04 sec 128 KBytes 104 Kbits/sec receiver

R10# iperf3 -c 10.100.100.129 -R
Connecting to host 10.100.100.129, port 5201
Reverse mode, remote host 10.100.100.129 is sending
[ 5] local 10.100.100.130 port 50280 connected to 10.100.100.129 port 5201
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.03 sec 47.4 MBytes 386 Mbits/sec
[ 5] 1.03-2.02 sec 30.9 MBytes 261 Mbits/sec
[ 5] 2.02-3.00 sec 25.8 MBytes 220 Mbits/sec
[ 5] 3.00-4.01 sec 27.0 MBytes 224 Mbits/sec
[ 5] 4.01-5.01 sec 28.4 MBytes 238 Mbits/sec
[ 5] 5.01-6.00 sec 28.0 MBytes 238 Mbits/sec
[ 5] 6.00-7.01 sec 28.1 MBytes 235 Mbits/sec
[ 5] 7.01-8.00 sec 28.8 MBytes 242 Mbits/sec
[ 5] 8.00-9.03 sec 29.0 MBytes 237 Mbits/sec
[ 5] 9.03-10.01 sec 28.2 MBytes 242 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.05 sec 305 MBytes 255 Mbits/sec 8 sender
[ 5] 0.00-10.01 sec 302 MBytes 253 Mbits/sec receiver

Even in tcpdump on the MPLS network I capture only a very small number of packets.

I tested from the VPN server to several Cisco routers, and every time the GRE tunnel passes through the Linux MPLS network I see the same severe TCP degradation.

If I move the tunnels to GUE, FOU or IPIP, throughput is over 250 Mbits/s.

R10# ip a l gre1001
358: gre1001@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ipip 178.C.D.X peer 89.A.B.1
    inet 10.100.100.130/30 brd 10.100.100.131 scope global gre1001
       valid_lft forever preferred_lft forever
    inet6 fe80::200:5efe:b226:6cbc/64 scope link proto kernel_ll
       valid_lft forever preferred_lft forever

R01# ip a l gre1001
207: gre1001@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ipip 89.A.B.1 peer 178.C.D.X
    inet 10.100.100.129/30 brd 10.100.100.131 scope global gre1001
       valid_lft forever preferred_lft forever
    inet6 fe80::200:5efe:5926:3a01/64 scope link proto kernel_ll
       valid_lft forever preferred_lft forever

root@R10:~# iperf3 -c 10.100.100.129 -R
Connecting to host 10.100.100.129, port 5201
Reverse mode, remote host 10.100.100.129 is sending
[ 5] local 10.100.100.130 port 42162 connected to 10.100.100.129 port 5201
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.02 sec 48.0 MBytes 395 Mbits/sec
[ 5] 1.02-2.01 sec 33.4 MBytes 283 Mbits/sec
[ 5] 2.01-3.01 sec 35.0 MBytes 292 Mbits/sec
[ 5] 3.01-4.01 sec 36.6 MBytes 307 Mbits/sec
[ 5] 4.01-5.01 sec 37.6 MBytes 317 Mbits/sec
[ 5] 5.01-6.00 sec 38.4 MBytes 322 Mbits/sec
[ 5] 6.00-7.00 sec 38.2 MBytes 321 Mbits/sec
[ 5] 7.00-8.00 sec 38.5 MBytes 323 Mbits/sec
[ 5] 8.00-9.01 sec 39.1 MBytes 327 Mbits/sec
[ 5] 9.01-10.01 sec 38.9 MBytes 327 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.05 sec 388 MBytes 324 Mbits/sec 12 sender
[ 5] 0.00-10.01 sec 384 MBytes 322 Mbits/sec receiver

iperf Done.

root@R10:~# iperf3 -c 10.100.100.129
Connecting to host 10.100.100.129, port 5201
[ 5] local 10.100.100.130 port 43416 connected to 10.100.100.129 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 41.1 MBytes 345 Mbits/sec 0 3.75 MBytes
[ 5] 1.00-2.00 sec 46.2 MBytes 388 Mbits/sec 5 1.34 MBytes
[ 5] 2.00-3.00 sec 36.2 MBytes 304 Mbits/sec 0 1.42 MBytes
[ 5] 3.00-4.00 sec 36.2 MBytes 304 Mbits/sec 0 1.48 MBytes
[ 5] 4.00-5.00 sec 38.8 MBytes 325 Mbits/sec 0 1.52 MBytes
[ 5] 5.00-6.00 sec 40.0 MBytes 335 Mbits/sec 0 1.55 MBytes
[ 5] 6.00-7.00 sec 38.8 MBytes 325 Mbits/sec 0 1.57 MBytes
[ 5] 7.00-8.00 sec 40.0 MBytes 336 Mbits/sec 0 1.57 MBytes
[ 5] 8.00-9.00 sec 40.0 MBytes 336 Mbits/sec 0 1.57 MBytes
[ 5] 9.00-10.00 sec 40.0 MBytes 335 Mbits/sec 0 1.57 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 397 MBytes 333 Mbits/sec 5 sender
[ 5] 0.00-10.04 sec 396 MBytes 330 Mbits/sec receiver

iperf Done.

The routers' NICs are 40GbE Mellanox MCX354A-FCBT. The MPLS interfaces have MTU 9216.

The workaround was to move all tunnels between the VPN server and the Cisco routers to IPIP.

Has anybody faced this issue? I have no idea what to tune, or whether this is a kernel bug. Tested on 6.5.x and 6.6.x kernels.

I've attached a small capture of traffic from the test with poor performance.
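
For anyone comparing the MTU values in this report, the following is a rough sketch of the encapsulation arithmetic, not a diagnosis. It uses the standard header sizes (20-byte IPv4 header without options, 4-byte base GRE header, 4-byte GRE key field, 4 bytes per MPLS label, 20-byte TCP header without options). The function names are made up for illustration. The underlay values 1500 and 65535 are assumptions, chosen only because they reproduce the MTUs shown above (1472 and 65507 for GRE with a key, 1480 for IPIP). The two-label stack is taken from the "IPv4 Explicit Null/81" next hops.

```shell
#!/bin/sh
# Header sizes in bytes (standard values; IPv4 and TCP without options).
IPV4_HDR=20
TCP_HDR=20
GRE_BASE_HDR=4
GRE_KEY_FIELD=4
MPLS_LABEL=4

# gre_tunnel_mtu UNDERLAY_MTU [key]
# Inner MTU of a GRE-over-IPv4 tunnel; pass "key" if the tunnel uses a key.
gre_tunnel_mtu() {
    overhead=$((IPV4_HDR + GRE_BASE_HDR))
    if [ "$2" = "key" ]; then
        overhead=$((overhead + GRE_KEY_FIELD))
    fi
    echo $(($1 - overhead))
}

# ipip_tunnel_mtu UNDERLAY_MTU
# Inner MTU of an IPIP tunnel (outer IPv4 header only).
ipip_tunnel_mtu() {
    echo $(($1 - IPV4_HDR))
}

# tcp_mss TUNNEL_MTU
# Largest TCP payload that fits in one inner IPv4 packet.
tcp_mss() {
    echo $(($1 - IPV4_HDR - TCP_HDR))
}

# mpls_packet_size IP_PACKET_SIZE LABEL_COUNT
# Size of an IP packet after LABEL_COUNT MPLS labels are pushed.
mpls_packet_size() {
    echo $(($1 + $2 * MPLS_LABEL))
}

# Assumed underlay MTUs (1500, 65535); two labels as in "Explicit Null/81".
echo "GRE+key over 1500:        $(gre_tunnel_mtu 1500 key)"
echo "GRE+key over 65535:       $(gre_tunnel_mtu 65535 key)"
echo "IPIP over 1500:           $(ipip_tunnel_mtu 1500)"
echo "TCP MSS at tunnel MTU 1472: $(tcp_mss 1472)"
echo "1500-byte packet + 2 labels: $(mpls_packet_size 1500 2)"
```

Under these assumptions, a full-size GRE packet with two labels (1508 bytes) is far below the MPLS interface MTU of 9216, so plain link MTU on the MPLS hop does not look like the limit. The mismatch between the two GRE endpoints (65507 on R01, 1472 on R10) is visible in the interface output but is not shown here to be the cause.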