Bug 199995 - Randomly sent TCP Reset from Kernel with bonding mode "broadcast"
Summary: Randomly sent TCP Reset from Kernel with bonding mode "broadcast"
Status: NEW
Alias: None
Product: Networking
Classification: Unclassified
Component: IPV4
Hardware: All Linux
Importance: P1 normal
Assignee: Stephen Hemminger
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-06-08 16:06 UTC by l.bendel
Modified: 2018-06-08 16:06 UTC
CC List: 0 users

See Also:
Kernel Version: since 4.15.0
Subsystem:
Regression: No
Bisected commit-id:


Attachments
TCP Dump (1.69 KB, application/vnd.tcpdump.pcap)
2018-06-08 16:06 UTC, l.bendel

Description l.bendel 2018-06-08 16:06:40 UTC
Created attachment 276401
TCP Dump

Hi,

After a dist-upgrade from Ubuntu 17.10 (kernel 4.13.x) to Ubuntu 18.04 (kernel 4.15.0), I am seeing randomly generated TCP RST packets, sent (presumably) by the kernel,
on a bonding device that uses bonding mode "broadcast" with 2 physical NICs.

With tcpdump/Wireshark I can see that the kernel randomly sends TCP RST packets after the final ACK of the handshake (SYN, SYN/ACK, ACK) is received (see attached PCAP).
This only happens if the kernel receives the initial SYN packet on both physical NICs (and therefore sees it twice) before the connection is established by sending the SYN/ACK.
It does not happen in 100% of cases, and only if the system can use two or more CPU cores/threads. With only one CPU available to the system, this behaviour is not reproducible.
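
To check for the duplicated SYN and the resulting RST on another system, the capture can be limited to those packets. The following is only a sketch: `handshake_filter` is a hypothetical helper name, and the interface name `bond0` in the usage line is an assumption, not taken from the attached capture.

```shell
# Hypothetical helper: prints a pcap filter that matches only TCP packets
# on the given port that have the SYN or RST flag set (SYN/ACK included).
handshake_filter() {
  port="$1"
  printf 'tcp port %s and (tcp[tcpflags] & (tcp-syn|tcp-rst) != 0)' "$port"
}

# Usage (needs root; "bond0" is an assumed interface name):
#   tcpdump -ni bond0 "$(handshake_filter 7)"
```

Running the same capture on each physical NIC instead of the bond device should show whether the initial SYN really arrives once per NIC.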


I can reproduce this on multiple physical servers with 2 bonded Intel NICs connected to 2 separate switches, and with virtual machines on a KVM host using 2 dedicated host bridges.
This also happens with a freshly installed Ubuntu 18.04 and with Fedora 28 (kernel 4.16), so I compiled and booted kernel 4.17.0 on Ubuntu, with the same result.
Only disabling/blocking the second network connection or reducing the VM to a single CPU core solves the problem, so I think this could be a race condition on systems with more than one CPU core or thread.
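
A comparable bond can be set up with iproute2 roughly as follows. This is a sketch under assumptions, not the configuration used on the affected machines: the function names `run` and `setup_broadcast_bond` and the interface names `bond0`, `eth0` and `eth1` are placeholders. With `DRY_RUN=1` the commands are only printed.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it
# (running the ip commands for real requires root).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Create a bond in broadcast mode and enslave the given NICs.
# Arguments: bond name, then one or more NIC names (all assumed names).
setup_broadcast_bond() {
  bond="$1"
  shift
  run ip link add "$bond" type bond mode broadcast
  for nic in "$@"; do
    # A NIC has to be down before it can be enslaved.
    run ip link set "$nic" down
    run ip link set "$nic" master "$bond"
  done
  run ip link set "$bond" up
}

# Usage (prints the commands only):
#   DRY_RUN=1; setup_broadcast_bond bond0 eth0 eth1
```

An IP address still has to be assigned to the bond device afterwards.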

For my tests I used a very basic Ubuntu 18.04 (x86-64) installation running the xinetd tcp-echo service (port 7/TCP).
On the client I used the netcat-traditional package with the following command:

  while true; do echo $(date) | nc.traditional -q 1 ECHO-SERVER 7; sleep 0.1 ; done
  
  
This gives the following output:

---------------------------------------
Fr 8. Jun 09:12:43 UTC 2018
Fr 8. Jun 09:12:43 UTC 2018
Fr 8. Jun 09:12:43 UTC 2018
Fr 8. Jun 09:12:43 UTC 2018
Fr 8. Jun 09:12:43 UTC 2018
Fr 8. Jun 09:12:43 UTC 2018
Fr 8. Jun 09:12:43 UTC 2018
Fr 8. Jun 09:12:43 UTC 2018
(UNKNOWN) [192.168.86.101] 7 (echo) : Connection reset by peer
(UNKNOWN) [192.168.86.101] 7 (echo) : Connection reset by peer
(UNKNOWN) [192.168.86.101] 7 (echo) : Connection reset by peer
Fr 8. Jun 09:12:44 UTC 2018
Fr 8. Jun 09:12:44 UTC 2018
Fr 8. Jun 09:12:44 UTC 2018
Fr 8. Jun 09:12:44 UTC 2018
Fr 8. Jun 09:12:44 UTC 2018
Fr 8. Jun 09:12:44 UTC 2018
Fr 8. Jun 09:12:44 UTC 2018
Fr 8. Jun 09:12:44 UTC 2018
Fr 8. Jun 09:12:44 UTC 2018
Fr 8. Jun 09:12:44 UTC 2018
(UNKNOWN) [192.168.86.101] 7 (echo) : Connection reset by peer
(UNKNOWN) [192.168.86.101] 7 (echo) : Connection reset by peer
Fr 8. Jun 09:12:44 UTC 2018
---------------------------------------
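
To put a number on how often the reset occurs, output like the above can be tallied. This is only a sketch: `count_resets` is a hypothetical helper name, and it assumes the client loop's output, including netcat's error messages (which are written to stderr), was saved to a file such as `run.log`.

```shell
# Hypothetical helper: reads client output on stdin and prints how many
# connections were echoed back and how many were reset.
count_resets() {
  awk '
    /Connection reset by peer/ { reset++; next }  # netcat error line
    NF                         { ok++ }           # any other non-empty line
    END { printf "ok=%d reset=%d\n", ok, reset }
  '
}

# Usage (run.log is an assumed file name):
#   while true; do ...; done 2>&1 | tee run.log    # stop with Ctrl-C
#   count_resets < run.log
```

For the 24 lines shown above (without the dashed separators) this prints `ok=19 reset=5`.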
