Most recent kernel where this bug did *NOT* occur: 2.6.19.5
Distribution: slackware based
Hardware Environment: Xeon + e1000
Software Environment: gcc 3.3.x

Problem Description: noticeable (10-15%) routing and bridging (possibly overall) performance drop between 2.6.19.5 and 2.6.20 (2.6.20.1 and 2.6.21-rc1 too), with the same configs. We are experiencing the performance drop on our rather busy (>150 kpps, >700 Mbps in one direction) bridge with ebtables, and on routers (210k routes, >30 kpps duplex). It is not an e1000 issue.

Steps to reproduce:
Reply-To: akpm@linux-foundation.org

ooh.

Begin forwarded message:

Date: Mon, 26 Feb 2007 07:10:48 -0800
From: bugme-daemon@bugzilla.kernel.org
To: bugme-new@lists.osdl.org
Subject: [Bugme-new] [Bug 8085] New: performance drop in 2.6.20

http://bugzilla.kernel.org/show_bug.cgi?id=8085

Summary: performance drop in 2.6.20
Kernel Version: 2.6.20
Status: NEW
Severity: normal
Owner: shemminger@osdl.org
Submitter: a1bert@atlas.cz
Created attachment 10539: cpu load
It would be helpful to get some profiling on the system (see Documentation/basic_profiling.txt), and possibly to bisect to a specific change.
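For reference, a minimal readprofile session along the lines of Documentation/basic_profiling.txt could look like this (a sketch; the profile= boot parameter and the System.map path are assumptions about the reporter's setup):

  # boot the kernel with e.g. profile=2 on the command line
  readprofile -r                    # reset the counters
  # ... let the routing/bridging workload run for a few minutes ...
  readprofile -m /boot/System.map | sort -nr | head -30    # hottest kernel symbols

And to bisect to a specific change, roughly:

  git bisect start v2.6.20 v2.6.19   # bad, then good
  # build, boot and test each step, then mark it: git bisect good (or bad)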
What makes you say it's not an e1000 driver or configuration change?
The results were the same: the latest e1000 driver was used (with the same results as the e1000 drivers from vanilla). The config for 2.6.20 was produced by cp .config; make oldconfig ...
Well, it looks like it's conntrack related: when I do not use conntrack, the performance is the same for both kernels.
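For anyone reproducing this, one plausible way to run the same box with conntrack out of the picture (a sketch; the module names are the ones listed later in this thread, and the rules must be flushed first so the reference counts drop to zero):

  iptables -t nat -F      # remove the REDIRECT rules so the modules become unloadable
  rmmod ipt_REDIRECT iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack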
Created attachment 11345: profile of 2.6.19.5 (NAT enabled), captured with: readprofile -r; sleep 300; readprofile -m ... Since it's tested on live traffic, the traffic processed during each test is not the same, but similar. ;)
Created attachment 11346: profile of 2.6.20.1 (NAT enabled), captured with: readprofile -r; sleep 300; readprofile -m ... Since it's tested on live traffic, the traffic processed during each test is not the same, but similar. ;)
The second profile only shows a single connection tracking function:

  625 nf_ct_attach  25.0000

Please add the .configs used for both kernels, information about which modules are loaded (not sure what you mean by "enabled"), and whether any iptables rules are used.
NAT enabled means NAT rules inserted => conntrack modules loaded:

# Generated by iptables-save v1.3.1 on Tue May 1 22:43:06 2007
*nat
:PREROUTING ACCEPT [50781211298:29582995904030]
:POSTROUTING ACCEPT [1450:93488]
:OUTPUT ACCEPT [1450:93488]
-A PREROUTING -i eth0 -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 80
-A PREROUTING -i eth0 -p tcp -m tcp --dport 3128 -j REDIRECT --to-ports 80
COMMIT
# Completed on Tue May 1 22:43:06 2007

Module                  Size  Used by
ebt_ip                  2432  346
ebtable_broute          2048  1
ebtables               18688  2 ebt_ip,ebtable_broute
ipt_REDIRECT            2048  2
xt_tcpudp               3328  2
iptable_nat             6148  1
ip_nat                 15020  2 ipt_REDIRECT,iptable_nat
ip_conntrack           43148  2 iptable_nat,ip_nat
nfnetlink               5272  2 ip_nat,ip_conntrack
ip_tables              10712  1 iptable_nat
x_tables               11396  4 ipt_REDIRECT,xt_tcpudp,iptable_nat,ip_tables
ipv6                  227244  10
bridge                 48028  1 ebtable_broute
llc                     5908  1 bridge
e1000                 116032  0
8139too                21248  0
crc32                   4224  1 8139too
.config files, please. Is this module list from the old or the new kernel?
The module list is from 2.6.19.5; I will provide the 2.6.20.1 module list tomorrow (CET). I will also try 2.6.20.1 with CONFIG_IP_NF_CONNTRACK tomorrow (the tested kernel with the performance problems uses CONFIG_NF_CONNTRACK_SUPPORT).
Created attachment 11371: config.2.6.19.5
Created attachment 11372: config.2.6.20.1 (please ignore CONFIG_LOCALVERSION)
2.6.20.1 module list:

Module                  Size  Used by
ipt_REDIRECT            1920  1
xt_tcpudp               3072  1
iptable_nat             5892  1
nf_nat                 13612  2 ipt_REDIRECT,iptable_nat
nf_conntrack_ipv4      13068  2 iptable_nat
nf_conntrack           47832  3 iptable_nat,nf_nat,nf_conntrack_ipv4
nfnetlink               4888  2 nf_conntrack_ipv4,nf_conntrack
ip_tables               9544  1 iptable_nat
x_tables               11012  4 ipt_REDIRECT,xt_tcpudp,iptable_nat,ip_tables
ebt_ip                  2176  346
ebtable_broute          1920  1
ebtables               17152  2 ebt_ip,ebtable_broute
ipv6                  226988  10
bridge                 44312  1 ebtable_broute
llc                     5524  1 bridge
e1000                 114368  0
8139too                19968  0
bitrev                  1792  1 8139too
crc32                   3968  1 8139too
So the problem is definitely in CONFIG_NF_CONNTRACK_SUPPORT; with CONFIG_IP_NF_CONNTRACK_SUPPORT there is no difference in performance between 2.6.19.5 and 2.6.20.1.
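The choice between the two implementations in the 2.6.20-era Kconfig presumably comes down to these symbols (a sketch of the relevant .config fragments, not copied from the attached configs):

  # fast case: old layer-3-dependent implementation
  CONFIG_IP_NF_CONNTRACK_SUPPORT=y
  CONFIG_IP_NF_CONNTRACK=m

  # slow case: new layer-3-independent implementation
  CONFIG_NF_CONNTRACK_SUPPORT=y
  CONFIG_NF_CONNTRACK=m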
Thanks for narrowing it down. A small performance decrease is expected, but not this large. Your last profile didn't show (almost) anything conntrack related; could you please post another nf_conntrack profile (maybe let it run a bit longer) and the output of /proc/net/stat/nf_conntrack? Thanks.
Created attachment 11388: profile.2.6.20.1-nf-conntrack (2.6.20.1, nf_conntrack, 10-minute profile)
Created attachment 11389: output of cat /proc/net/stat/nf_conntrack and cat /proc/net/stat/ip_conntrack (2.6.20.1, nf_conntrack)
Created attachment 11434: Avoid double helper/expectation lookup. Unfortunately the profile still doesn't show anything conntrack related, so I'll have to shoot in the dark. Can you check whether this patch improves things, please? It avoids some nf_conntrack-related overhead that might be responsible.
> Unfortunately the profile still doesn't show anything conntrack related

There is no traffic that will match the NAT iptables rules, so no NATing is actually done. Is this why there is no trace of conntrack in the profile?
No; connection tracking is always performed, and even most of the NAT functions are always called, even without any rules.
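This is easy to observe on a running box (a hedged illustration; these are the standard nf_conntrack proc files):

  cat /proc/net/nf_conntrack        # tracked connections exist even when no NAT rule matches
  cat /proc/net/stat/nf_conntrack   # per-CPU counters (found, new, invalid, ...) keep growing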
Any news on the patch?
Created attachment 11508: 2.6.20.1 + nf_conntrack + patch. Sorry for the delay, but there is still about a 10% difference. :(
Any updates on this problem? Thanks.
We have a similar problem: a server with a 2.6.20 kernel has about 30 Mbit/s of HTTP output traffic (30 Mbit/s maximum), but the same server with a 2.6.19 kernel has about 95 Mbit/s of HTTP output traffic (125 Mbit/s maximum).
Please try 2.6.23.
2.6.23 shows the problem too.
Please try to capture a profile as described in Documentation/basic_profiling.txt. The previous profiles attached to this report didn't show anything related to conntrack. What hardware are you running on?
Now we have a dedicated server for tracking this problem down. We tried different x86_64 hardware (Intel and AMD) with different NICs (e1000 and tg3).

Kernels 2.6.15, 2.6.16, 2.6.18, 2.6.19 and 2.6.19.7 had OK network performance; kernels 2.6.20, 2.6.23 and 2.6.24 showed the network performance degradation.

Kernel 2.6.19.7 handles about 2000 requests per second; kernel 2.6.20 handles about 400. Kernel 2.6.20 starts out handling requests like 2.6.19.7, at about 2000 requests per second, but within 10 seconds it steps down to 400 requests per second. After we stop sending requests and wait 3 minutes, 2.6.20 again starts at about 2000 requests per second and again steps down to 400.
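For reference, this kind of requests-per-second measurement can be reproduced with any HTTP load generator; for example, a hypothetical ApacheBench run (the tool choice and URL are assumptions, not necessarily what was used here):

  # sustained load; watch the req/s figure over time to catch the
  # step-down from ~2000 to ~400 requests per second after ~10 seconds
  ab -n 200000 -c 100 http://testserver/index.html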
Created attachment 14750: config for the 2.6.19.7 kernel
Created attachment 14751: config of the 2.6.20 kernel, very similar to the 2.6.19.7 config
Created attachment 14752: profiler output of the 2.6.19.7 kernel, generated by readprofile -r && sleep 60 && readprofile -m /boot/System.map > captured_profile-2.6.19.7
Created attachment 14753: profiler output of the 2.6.20 kernel, generated by readprofile -r && sleep 60 && readprofile -m /boot/System.map > captured_profile-2.6.20
I found my problem by git bisect: commit 72a3effaf633bcae9034b7e176bdbd78d64a71db. Since this commit I have to increase the somaxconn sysctl variable. I set net.core.somaxconn=8192 and net.ipv4.tcp_max_syn_backlog=24376, and everything works fine now.
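For anyone hitting the same wall, a sketch of applying those settings (the sysctl.conf path is the usual default, an assumption about this particular distribution):

  # apply immediately
  sysctl -w net.core.somaxconn=8192
  sysctl -w net.ipv4.tcp_max_syn_backlog=24376

  # persist across reboots, e.g. in /etc/sysctl.conf
  net.core.somaxconn = 8192
  net.ipv4.tcp_max_syn_backlog = 24376

Note that somaxconn only caps the backlog a server passes to listen(), so the application also has to request a large enough backlog for the raised limit to take effect.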
Thanks for tracking this down. Could you send the description of your problem and the related commit to netdev@vger.kernel.org? Please also CC the patch author. Thanks.
No, I should not; this is not a bug. As I understand it, before this patch net.core.somaxconn was effectively hardcoded in the kernel to 512; now it is a sysctl variable with a default value of 128. For a high-load HTTP server that is simply not enough.