Bug 85091 - neighbour table overflow is not reported and impacts localhost TCP connectivity
Status: NEW
Alias: None
Product: Networking
Classification: Unclassified
Component: IPV4
Hardware: All Linux
Importance: P1 normal
Assignee: Stephen Hemminger
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-09-24 10:44 UTC by Andreas Schultz
Modified: 2016-02-15 20:07 UTC

See Also:
Kernel Version: 3.8 - 3.17-rc6
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description Andreas Schultz 2014-09-24 10:44:11 UTC
With the default gc_thresh values and a busy /16 network attached, the neighbour cache can overflow. Nothing indicates that this has happened, and the overflow impacts TCP connections on localhost.

Test setup:

* about 16k simulated IP/MAC pairs on one interface
* web server behind a second interface
* routing between the two
* HTTP benchmark from the 16k IPs to the web server
* for localhost connectivity verification a netperf instance is run on localhost like so: 'netperf -D 1 -l 600 127.0.0.1'

Result:

The kernel has to learn the 16k IP/MAC combinations. As soon as gc_thresh3 is hit, netperf stalls, and no syslog/kernel message indicates the problem.

The only indication is log entries like this:

  "net_ratelimit: 1464 callbacks suppressed"

No other messages are logged.
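
Since nothing is logged, one way to notice the condition is to compare the current number of IPv4 neighbour entries against gc_thresh3. The following is only a minimal sketch: it assumes the usual sysctl directory /proc/sys/net/ipv4/neigh/default and uses the rows of /proc/net/arp as an approximation of the table size. The helper names and the 0.9 warning margin are hypothetical and not part of the original report.

```python
from pathlib import Path

# Assumed standard locations on a Linux host.
NEIGH_SYSCTL_DIR = Path("/proc/sys/net/ipv4/neigh/default")
ARP_TABLE = Path("/proc/net/arp")


def read_gc_thresholds(sysctl_dir=NEIGH_SYSCTL_DIR):
    """Return (gc_thresh1, gc_thresh2, gc_thresh3) as integers."""
    return tuple(
        int((Path(sysctl_dir) / f"gc_thresh{i}").read_text().strip())
        for i in (1, 2, 3)
    )


def count_arp_entries(arp_text):
    """Count data rows in /proc/net/arp-style text (first line is a header)."""
    return len([line for line in arp_text.splitlines()[1:] if line.strip()])


def table_fill_ratio(entries, gc_thresh3):
    """Fraction of the hard limit (gc_thresh3) currently in use."""
    return entries / gc_thresh3


def main(warn_ratio=0.9):
    """Print the current fill level; warn_ratio is an arbitrary margin."""
    thresh1, thresh2, thresh3 = read_gc_thresholds()
    entries = count_arp_entries(ARP_TABLE.read_text())
    print(f"entries={entries} gc_thresh1={thresh1} "
          f"gc_thresh2={thresh2} gc_thresh3={thresh3}")
    if table_fill_ratio(entries, thresh3) >= warn_ratio:
        print("warning: IPv4 neighbour table is close to gc_thresh3")
```

Calling main() periodically during the benchmark would show whether the entry count approaches gc_thresh3 at the moment netperf stalls.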
Comment 1 Cong Wang 2014-09-25 16:46:15 UTC
What do you mean by "netperf stalls"? When gc_thresh3 is hit, netperf should get EINVAL and then stop, unless it ignores syscall return values.
Comment 2 Andreas Schultz 2014-09-25 17:17:53 UTC
I used netperf only to make the problem easily reproducible. Every *established* local TCP connection (through the lo interface) seems to be affected.

The TCP connection seems to stall; netstat shows that the send queue of the netperf server process fills up.
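
The send-queue observation can also be taken from /proc/net/tcp instead of netstat. This is a rough sketch, assuming the usual /proc/net/tcp layout (hex endpoints, state in the fourth column, tx_queue:rx_queue in the fifth) and a little-endian host for address decoding. The function names are hypothetical.

```python
import socket
import struct


def decode_ipv4_endpoint(hex_endpoint):
    """Decode 'AABBCCDD:PPPP' from /proc/net/tcp into (ip, port).

    Assumes a little-endian host, where the address bytes appear reversed.
    """
    addr_hex, port_hex = hex_endpoint.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(addr_hex, 16)))
    return ip, int(port_hex, 16)


def sockets_with_send_queue(proc_net_tcp_text, min_send_queue=1):
    """Return (local, remote, send_queue_bytes) for ESTABLISHED sockets
    whose send queue holds at least min_send_queue bytes."""
    results = []
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5 or fields[3] != "01":  # 01 is TCP_ESTABLISHED
            continue
        tx_hex, _rx_hex = fields[4].split(":")
        send_queue = int(tx_hex, 16)
        if send_queue >= min_send_queue:
            results.append((decode_ipv4_endpoint(fields[1]),
                            decode_ipv4_endpoint(fields[2]),
                            send_queue))
    return results
```

Feeding it the contents of /proc/net/tcp (for example open("/proc/net/tcp").read()) while the stall is in progress should list the netperf connection with a non-zero send queue, if the stall matches the netstat output described above.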

Traces (systemtap) on the processes show that poll is not reporting the socket as having data. Neither the sender nor the receiver side is getting an EINVAL on the syscalls.

tcpdump on lo shows a distinct "gap" between a single TCP packet and the ACK for it. Sometimes the gap is 20 seconds, sometimes much more.
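
To locate such gaps in a longer capture, the packet timestamps can be scanned for large jumps. The sketch below assumes the capture was printed with `tcpdump -tt`, so that every packet line begins with an epoch timestamp. The function name and the one-second default threshold are illustrative choices only.

```python
def find_gaps(tcpdump_lines, min_gap_seconds=1.0):
    """Return (previous_ts, current_ts, gap) for consecutive packets
    separated by at least min_gap_seconds.

    Expects lines as printed by `tcpdump -tt`, which start with an epoch
    timestamp such as '1411665473.123456'.
    """
    gaps = []
    previous = None
    for line in tcpdump_lines:
        parts = line.split(maxsplit=1)
        if not parts:
            continue  # blank line
        try:
            timestamp = float(parts[0])
        except ValueError:
            continue  # not a packet line (e.g. the tcpdump banner)
        if previous is not None and timestamp - previous >= min_gap_seconds:
            gaps.append((previous, timestamp, timestamp - previous))
        previous = timestamp
    return gaps
```

For example, find_gaps(open("lo.txt"), min_gap_seconds=5.0) would report every pause of five seconds or more, where lo.txt is a hypothetical file holding the text output of the capture on lo.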
Comment 3 Cong Wang 2014-09-25 20:54:51 UTC
Hmm, interesting. Loopback traffic should not need a neigh entry at all. How many concurrent TCP connections do you have? Did you see any memory pressure?

Thanks.
