Bug 214851 - Ipset performance - huge regression
Summary: Ipset performance - huge regression
Status: NEW
Alias: None
Product: Networking
Classification: Unclassified
Component: Netfilter/Iptables
Hardware: All Linux
Importance: P1 high
Assignee: networking_netfilter-iptables@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-10-27 19:25 UTC by kernel
Modified: 2021-11-21 09:21 UTC
CC List: 3 users

See Also:
Kernel Version: 5.13.19
Tree: Mainline
Regression: Yes


Attachments
Selected sets in kernel (9.79 KB, image/jpeg)
2021-11-02 23:28 UTC, kernel
Function to add ip into ipset (1.08 KB, text/plain)
2021-11-03 00:17 UTC, kernel

Description kernel 2021-10-27 19:25:42 UTC
I use a script to insert a large number of IP addresses into ipset tables.
Number of items: about 212,000 in total, inserted into four tables (about 53,000 items per table).
The script runs the same function four times in parallel.

Kernel: 5.6.19 -> time: about 11 minutes
Kernel: 5.13.19 -> time: 1 hour and 55 minutes

I don't remember when the slowdown started, but probably about half a year ago, maybe earlier.
Comment 1 Jozsef Kadlecsik 2021-11-02 16:28:47 UTC
Could you add some information about what types of sets you use, including the extensions? Also, do you add entries with IP addresses only, or are hostnames and DNS lookups involved as well?
Comment 2 kernel 2021-11-02 23:28:39 UTC
Created attachment 299409
Selected sets in kernel
Comment 3 kernel 2021-11-02 23:59:48 UTC
My firewall uses 12 ipset tables. The problematic tables use hash:net. Apart from these 4 tables, the other tables are much smaller: one has about 200 IP addresses and the rest have fewer than 20 entries each.

# ipset list -t|grep -F "Type"
Type: hash:net
Type: hash:ip,mac
Type: hash:ip,mac
Type: hash:net
Type: hash:net
Type: hash:net
Type: hash:net
Type: hash:net
Type: hash:net
Type: hash:net
Type: hash:net
Type: hash:net

The source file with IP addresses contains entries like these:
123.123.123.123
0.0.0.0/8

All entries are IP addresses or networks, not domain names.

In iptables, I am using standard extensions like:
- States (INVALID, ESTABLISHED etc)
- Ports (src and dst)
- IP (src and dst)
- Protocols (UDP, TCP, ICMP)
- MAC
- LOG/ACCEPT/DROP/REJECT
- match-set
Comment 4 kernel 2021-11-03 00:17:31 UTC
Created attachment 299411
Function to add ip into ipset

This file (function.txt) contains the function that I use to add IPs into ipset. The names are in Polish, but the function is short, so you should be able to understand what it does.
This function is run four times in parallel.
Comment 5 kernel 2021-11-09 19:44:44 UTC
Is this enough for you to fix it, or do you need more information?
Comment 6 Jozsef Kadlecsik 2021-11-10 19:26:16 UTC
Could you send me privately the saved content of a set, so that I can reproduce the issue more easily?


In your function you call "ipset add" for every element, which is not efficient. Better create the content of the set in saved format (simply "add setname element" lines in a file) and call "ipset restore < filename", which is way faster.

However, it does explain the slowdown in adding single entries one by one.
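
The restore-based approach mentioned above can be sketched roughly as follows. This is only an illustration: the function name make_restore_file, the set name "blocklist" and the file names are hypothetical, and the sketch assumes the set already exists and that the input has one entry per line, possibly with blank lines and '#' comment lines.

```shell
#!/bin/sh
# Sketch: convert a plain list of addresses/networks into the
# "add setname element" lines that "ipset restore" reads.
# make_restore_file, "blocklist" and the file names are hypothetical.

make_restore_file() {
    set_name=$1
    list_file=$2
    # Skip blank lines and '#' comment lines; emit one add line per entry.
    awk -v name="$set_name" \
        '!/^[[:space:]]*(#|$)/ { print "add", name, $1 }' "$list_file"
}

# Usage (needs root; the set "blocklist" must already exist):
#   make_restore_file blocklist list1 > blocklist.restore
#   ipset restore < blocklist.restore
```

This way ipset is started once per file instead of once per entry.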
Comment 7 kernel 2021-11-10 20:34:13 UTC
I can't send you all my sets, but I will send as much as possible. I will also try to send more of the functions (without data), but not before next week (I should be free to prepare it next Tuesday or Wednesday).
The problematic sets are lists from https://iplists.firehol.org (firehol_level1, firehol_level2, firehol_level3, and firehol_level4).

all_lists = firehol_level1 + firehol_level2 + firehol_level3 + firehol_level4
list1,list2,list3,list4 = all_lists/4
Files named list1, list2 etc. are added into ipset. 

I know that I can save and load all IP addresses as one file, but I wrote these functions for smaller sets, where this wasn't a problem. I started using the firehol lists later. Waiting 10 minutes is not a big problem, but 2 hours is, especially on a laptop with loud fans.

Yes, it explains why adding is slow, but it doesn't explain why there is such a big difference between the old and the new kernel. I am using the same data and functions.
Comment 8 Mahdi Dibaiee 2021-11-20 12:02:42 UTC
Can you isolate the issue to ipset by removing the ipcalc call in your script and running it to see if it's still slower than you anticipate? I tried your script locally and ipcalc is the slowest part for me.

You might pre-filter your IPs using your ipcalc logic, and then feed that into a script that doesn't run the ipcalc check.
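
As a rough sketch of such a pre-filter pass, assuming IPv4-only input with one entry per line: a single grep over the whole file keeps only lines that look like an address or an address with a prefix length. The names prefilter, OCTET and ENTRY_RE are made up for this example, and the check is purely syntactic, so it is not a drop-in replacement for whatever your ipcalc logic verifies.

```shell
#!/bin/sh
# Sketch: keep only lines that look like an IPv4 address or IPv4 network.
# prefilter, OCTET and ENTRY_RE are hypothetical names for illustration.

# One decimal octet, 0-255, without leading zeros.
OCTET='(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])'
# Four octets separated by dots, then an optional /0 to /32 prefix length.
ENTRY_RE="^(${OCTET}\\.){3}${OCTET}(/(3[0-2]|[12]?[0-9]))?\$"

prefilter() {
    # grep prints only the matching lines; everything else is dropped.
    grep -E "$ENTRY_RE" "$1"
}

# Usage:
#   prefilter list1 > list1.clean
```

The filtered file can then be fed to a loading script that no longer calls ipcalc for every entry.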
Comment 9 kernel 2021-11-21 02:56:48 UTC
You are right. I checked, and the problem is caused by ipcalc.
When I tried an older version of ipcalc, my script worked well.
ipcalc was probably updated around the same time I installed the new kernel, so I assumed the problem was in ipset.

They added IPv6 support to ipcalc, and that caused the performance decrease.

PS. Do you know how I can verify whether an item is a valid IP address or network address? I have to replace ipcalc with another tool.
Comment 10 Mahdi Dibaiee 2021-11-21 09:21:50 UTC
Thanks for confirming. Can you not just stick with the older ipcalc version? You might also want to open an issue on ipcalc to let them know of this significant performance degradation: https://github.com/kjokjo/ipcalc
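
On the question of checking entries without ipcalc: one option is a small shell function that needs no external tool. The following is a minimal sketch, assuming IPv4 only and treating leading zeros as invalid; is_ipv4_entry is a made-up name, and it only checks the syntax (four octets in 0-255, optional prefix length 0-32), not whether a network address has its host bits cleared.

```shell
#!/bin/sh
# Sketch: syntactic check for "a.b.c.d" or "a.b.c.d/len" (IPv4 only).
# is_ipv4_entry is a hypothetical helper; it uses global variables
# because POSIX sh has no "local".

is_ipv4_entry() {
    entry=$1
    # Only digits, dots and slashes are allowed; empty input is rejected.
    case $entry in
        ''|*[!0-9./]*) return 1 ;;
    esac
    addr=${entry%%/*}
    # Optional prefix length: digits only, no leading zero, at most 32.
    case $entry in
        */*)
            prefix=${entry#*/}
            case $prefix in
                ''|*[!0-9]*|0?*) return 1 ;;
            esac
            [ "${#prefix}" -le 2 ] && [ "$prefix" -le 32 ] || return 1
            ;;
    esac
    # Exactly four non-empty dot-separated fields.
    case $addr in
        *.*.*.*.*|.*|*.|*..*) return 1 ;;
        *.*.*.*) ;;
        *) return 1 ;;
    esac
    old_ifs=$IFS
    IFS=.
    set -- $addr
    IFS=$old_ifs
    # Each octet: no leading zero, at most three digits, value up to 255.
    for octet in "$@"; do
        case $octet in
            0?*) return 1 ;;
        esac
        [ "${#octet}" -le 3 ] && [ "$octet" -le 255 ] || return 1
    done
    return 0
}

# Usage:
#   if is_ipv4_entry "$item"; then echo "ok: $item"; fi
```

Because everything runs inside the shell, no extra process is started per entry, which matters with this many entries.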
