I'm using packaged RPMs from CentOS and ELRepo, with the same results, and I can replicate this on any server in our cluster. I have tried installing kernel-3.10.56-11.el6.centos.alt.x86_64 from the centosplus repo to solve a problem where 2.6 was locking up the process tree under high CPU. It fixed that, but it introduced another issue: we get a lot of softirq requests under heavy traffic load. I am also currently running:

    [root@web125-east.domain.com /var/www/html]# uname -a
    Linux web125-east.domain.com 3.17.2-1.el6.elrepo.x86_64 #1 SMP Fri Oct 31 10:37:44 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux

Here is powertop output from a 2.6 series server:

    Summary: 42492.1 wakeups/second, 0.0 GPU ops/seconds, 0.0 VFS ops/sec and 2422.0% CPU use

    Usage         Events/s    Category    Description
    22613 ms/s    23637.4     Process     php-fpm: pool www
    716.9 ms/s    15783.2     Process     nginx: worker process
    21.3 ms/s     1096.1      Process     /usr/bin/java -Xms200m -Xmx2000m -Xss256k -XX:MaxDirectMemorySize=516m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Dage
    5.8 ms/s      674.4       Process     /usr/sbin/gmond
    130.0 ms/s    494.5       Process     /usr/bin/redis-server 127.0.0.1:6379
    73.2 ms/s     487.4       Process     python /usr/bin/statsd-relay.py
    3.8 ms/s      82.7        Process     java -Xmx6g -server -Dfile.encoding=utf-8 -XX:OnOutOfMemoryError=kill -9 %p -XX:+HeapDumpOnOutOfMemoryError -XX:HeapD
    212.4 ms/s    0.00        Interrupt   [3] net_rx(softirq)

Here it is from 3.10:

    Usage         Events/s    Category    Description
    10.2 ms/s     1033.6      Timer       hrtimer_wakeup
    3.3 ms/s      932.7       Process     /usr/bin/java -Xms200m -Xmx2000m -Xss256k
    591.1 ms/s    624.3       Process     php-fpm: pool www
    41.5 ms/s     724.0       Interrupt   [3] net_rx(softirq)

Load pretty much just keeps crawling up into the 500s. There is also a lot of CPU usage from:

    116 root 20 0 0 0 0 R 75.0 0.0 0:04.57 kworker/u66:0

which, from my understanding, handles a lot of the ACPI calls that softirq is doing. I've tried many other 3.x kernels above 3.10 with the same results, so I'm wondering if this is a known issue.
Sorry, here are the NICs we have on the system:

    06:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
    06:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
    06:00.2 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
    06:00.3 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
kworker just handles offloaded work, so if the box is being hammered then it's not unreasonable for it to be high. What makes you think it's doing lots of ACPI calls?
Same issue here with all CentOS 7 and ELRepo kernels. NET_RX is many times bigger than NET_TX:

    NET_TX:      94345      94868      94714      94441      96972      97641
    NET_RX:  466312374  484706991  484924300  494927859  500039928  499807940
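To put a number on that gap, the per-CPU counters can be summed. This is only a minimal Python sketch, assuming the usual /proc/softirqs layout (a header row of CPU names, then one `NAME:` row per softirq type). The function name `softirq_totals` is made up for illustration, and the CPU0..CPU5 header in the sample is an assumption added around the six values posted above.

```python
def softirq_totals(text):
    """Return {softirq_name: [per-CPU counts]} parsed from /proc/softirqs text."""
    rows = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # the header row of CPU names has no colon
        rows[name.strip()] = [int(v) for v in rest.split()]
    return rows


# Sample built from the counters in the post above (header row is assumed).
# On a live box, use open("/proc/softirqs").read() instead.
SAMPLE = """\
          CPU0       CPU1       CPU2       CPU3       CPU4       CPU5
NET_TX:      94345      94868      94714      94441      96972      97641
NET_RX:  466312374  484706991  484924300  494927859  500039928  499807940
"""

rows = softirq_totals(SAMPLE)
tx_total = sum(rows["NET_TX"])
rx_total = sum(rows["NET_RX"])
print(f"NET_TX={tx_total} NET_RX={rx_total} ratio={rx_total / tx_total:.0f}")
```

These are cumulative counts since boot, so the ratio alone does not say when the receive softirqs were raised.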
I'm seeing similar behavior with the same network card. I updated from kernel 3.10.58 to 4.4.38, and under higher traffic I start to see a lot of packet loss, with one or two of my CPU cores getting locked (per htop output). I recently updated to kernel 4.9.20, with the same results. I've tried some options (queue size, GRO, ...) to improve network performance, but the issue is still at the softirq level. In top, the traffic seems to lock onto a softirq, which did not happen with the previous kernel:

    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+   COMMAND
     29 root 20  0    0   0   0 R 99.9  0.0 0:21.30 ksoftirqd/3
    114 root 20  0    0   0   0 R 99.6  0.0 0:19.02 ksoftirqd/17

I didn't find anything strange in /proc/softirqs or /proc/interrupts. Please tell me if there is any more info I can get to help. My server is just doing routing with iptables.

Regards,
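Because the counters in /proc/softirqs are cumulative, a per-CPU rate between two snapshots may show a hot CPU more clearly than the raw totals. A rough Python sketch, assuming the standard layout of that file; the helper names (`read_row`, `per_cpu_rate`, `sample_live`) are hypothetical, and the two snapshots below are invented numbers used only to exercise the code, not data from this thread.

```python
import time


def read_row(text, name):
    """Return the per-CPU counters for one softirq row of /proc/softirqs text."""
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if sep and label.strip() == name:
            return [int(v) for v in rest.split()]
    raise KeyError(name)


def per_cpu_rate(before, after, seconds, name="NET_RX"):
    """Softirqs per second on each CPU between two snapshots."""
    old, new = read_row(before, name), read_row(after, name)
    return [(b - a) / seconds for a, b in zip(old, new)]


def sample_live(interval=1.0, path="/proc/softirqs"):
    """Take two snapshots `interval` seconds apart on a live Linux box."""
    with open(path) as f:
        before = f.read()
    time.sleep(interval)
    with open(path) as f:
        after = f.read()
    return per_cpu_rate(before, after, interval)


# Synthetic snapshots (invented numbers, not taken from the thread).
BEFORE = "        CPU0   CPU1\nNET_RX:  1000   2000\n"
AFTER = "        CPU0   CPU1\nNET_RX:  1500   2100\n"
rates = per_cpu_rate(BEFORE, AFTER, seconds=2.0)
print(rates)
```

On a real system, `sample_live()` would be called instead of using the synthetic strings; a CPU whose NET_RX rate is far above the others would be the one to compare against the busy ksoftirqd threads.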
Some extra info: after reading issue #109581, I set the qdisc to prio_fast and there is no more CPU usage in softirq. The rules I have installed:

    ip link set dev eth2.24 txqlen 1000
    tc qdisc del dev eth2.24 root
    tc qdisc add dev eth2.24 root handle 1: prio bands 3
    tc qdisc add dev eth2.24 parent 1:1 handle 10: pfifo limit 50
    tc qdisc add dev eth2.24 parent 1:2 handle 2: hfsc default 2
    tc class add dev eth2.24 parent 2: classid 2:1 hfsc sc rate 300000kbit ul rate 300000kbit
    tc class add dev eth2.24 parent 2: classid 2:2 hfsc sc rate 300000kbit ul rate 300000kbit

Side note: I notice that this issue is more than one year old. Did you manage to solve it? Can you share how?

Regards,