The kernel worker "kworker/events_power_efficient" uses a lot of CPU (100% on ESXi 6.7, ~30% on ESXi 7.0U3 or later) after upgrading from 5.17.3 to 5.19.17 or later.

dmesg log:

[ 2430.973102] </TASK>
[ 2430.973131] Sending NMI from CPU 1 to CPUs 0:
[ 2430.973241] NMI backtrace for cpu 0
[ 2430.973247] CPU: 0 PID: 22 Comm: kworker/0:1 Not tainted 6.3.3 #1
[ 2430.973254] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
[ 2430.973258] Workqueue: events_power_efficient htable_gc [xt_hashlimit]
[ 2430.973275] RIP: 0010:preempt_count_sub+0x2e/0xa0
[ 2430.973289] Code: 36 01 85 c9 75 1b 65 8b 15 a7 da f8 5e 89 d1 81 e1 ff ff ff 7f 39 f9 7c 16 81 ff fe 00 00 00 76 3b f7 df 65 01 3d 8a da f8 5e <c3> cc cc cc cc e8 98 aa 25 00 85 c0 74 f2 8b 15 da 71 ed 00 85 d2
[ 2430.973294] RSP: 0018:ffffb15ec00dbe58 EFLAGS: 00000297
[ 2430.973299] RAX: 0000000000000000 RBX: ffffb15ec12ad000 RCX: 0000000000000001
[ 2430.973302] RDX: 0000000080000001 RSI: ffffffffa1c3313b RDI: 00000000ffffffff
[ 2430.973306] RBP: dead000000000122 R08: 0000000000000010 R09: 0000746e65696369
[ 2430.973309] R10: 8080808080808080 R11: 0000000000000018 R12: 0000000000000000
[ 2430.973312] R13: 0000000000001e2b R14: ffffb15ec12ad048 R15: ffff91c279c26a05
[ 2430.973316] FS: 0000000000000000(0000) GS:ffff91c279c00000(0000) knlGS:0000000000000000
[ 2430.973320] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2430.973324] CR2: 000055fc138890e0 CR3: 000000010810e002 CR4: 00000000001706f0
[ 2430.973374] Call Trace:
[ 2430.973388]  <TASK>
[ 2430.973390]  __local_bh_enable_ip+0x32/0x70
[ 2430.973413]  htable_selective_cleanup+0x95/0xc0 [xt_hashlimit]
[ 2430.973428]  htable_gc+0xf/0x30 [xt_hashlimit]
[ 2430.973440]  process_one_work+0x1d4/0x360
[ 2430.973459]  ? process_one_work+0x360/0x360
[ 2430.973467]  worker_thread+0x25/0x3b0
[ 2430.973476]  ? process_one_work+0x360/0x360
[ 2430.973483]  kthread+0xe1/0x110
[ 2430.973499]  ? kthread_complete_and_exit+0x20/0x20
[ 2430.973507]  ret_from_fork+0x1f/0x30
[ 2430.973526]  </TASK>

Is there any known issue/resolution for this?
> Is there any known issue/resolution for this?

None that I'm aware of, but you could bisect:
https://docs.kernel.org/admin-guide/bug-bisect.html
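A bisect between the last known-good and first known-bad releases would look roughly like this (tags taken from the versions reported above; adjust to whatever you can actually reproduce on):

# in a clone of the stable tree
git bisect start
git bisect bad v5.19.17      # first version where the regression was seen
git bisect good v5.17.3      # last version known to be fine
# build and boot the kernel git checks out, test the workload, then mark it:
#   git bisect good   (or)   git bisect bad
# repeat until git reports the first bad commit, then clean up with:
git bisect reset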
There are a lot of versions between 5.17.3 and 5.19.17, so it will take time to pin down the exact version where the issue first appears. Meanwhile, can you help me with troubleshooting steps to understand why it is consuming more CPU?

Observation from perf report:

# Children      Self  Command          Shared Object      Symbol
# ........  ........  ...............  .................  .......................
#
    46.20%     0.00%  kworker/0:3-eve  [kernel.kallsyms]  [k] worker_thread
            |
            ---worker_thread
               |
               --46.20%--process_one_work
                  |
                  --46.19%--htable_gc
                     |
                     --46.19%--htable_selective_cleanup
                        |
                        |--20.76%--_raw_spin_lock_bh
                        |     |
                        |     --15.30%--preempt_count_add
                        |           |
                        |           --4.78%--in_lock_functions
                        |
                        |--16.30%--__local_bh_enable_ip
                        |     |
                        |     |--6.12%--preempt_count_sub
                        |     |
                        |     |--2.45%--check_preemption_disabled
                        |     |
                        |     --1.29%--__this_cpu_preempt_check
                        |
                        |--3.25%--__cond_resched
                        |
                        --2.37%--_raw_spin_unlock_bh

    46.20%     0.00%  kworker/0:3-eve  [kernel.kallsyms]  [k] kthread
            |
            ---kthread
               worker_thread
               |
               --46.20%--process_one_work
                  |
                  --46.19%--htable_gc
                     |
                     --46.19%--htable_selective_cleanup
                        |
                        |--20.76%--_raw_spin_lock_bh
                        |     |
                        |     --15.30%--preempt_count_add
                        |           |
                        |           --4.78%--in_lock_functions
                        |
                        |--16.30%--__local_bh_enable_ip
                        |     |
                        |     |--6.12%--preempt_count_sub
                        |     |
                        |     |--2.45%--check_preemption_disabled
                        |     |
                        |     --1.29%--__this_cpu_preempt_check
                        |
                        |--3.25%--__cond_resched
                        |
                        --2.37%--_raw_spin_unlock_bh
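For reference, a profile like the above can be collected with standard perf tooling; the options below are just one reasonable choice, not necessarily what was used here:

# sample all CPUs with call graphs for 30 seconds while the load is present
perf record -a -g -- sleep 30
# fold the samples into the caller/callee tree shown above
perf report --children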
(In reply to Vivek Anand from comment #2)
> There are a lot of versions between 5.17.3 and 5.19.17, so it will take
> time to pin down the exact version where the issue first appears.

To help narrow the version range: does v5.15 have this regression?
(In reply to Bagas Sanjaya from comment #3)
> (In reply to Vivek Anand from comment #2)
> > There are a lot of versions between 5.17.3 and 5.19.17, so it will take
> > time to pin down the exact version where the issue first appears.
>
> To help narrow the version range: does v5.15 have this regression?

Oops, I meant: please try v5.18 first, and then try the latest mainline.
As per our analysis, it seems the root cause of this issue is the enablement of the IBRS mitigation for the spectre_v2 vulnerability, which was added in kernel 5.18.14 by commit 6ad0ad2bf8a6 ("x86/bugs: Report Intel retbleed vulnerability").

I tried reverting the changes introduced by the above commit on top of 5.19.17 and saw a significant improvement in CPU consumption. I also tried different kernel versions: the issue is not visible up to kernel 5.18.13, and it appears from kernel 5.18.14 onwards.

Related links:
https://lkml.org/lkml/2022/7/23/205
https://lkml.iu.edu/hypermail/linux/kernel/2209.1/02248.html
https://www.bleepingcomputer.com/news/linux/vmware-70-percent-drop-in-linux-esxi-vm-performance-with-retbleed-fixes/
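For reference (not from the report above), the mitigation actually selected by a running kernel can be read from sysfs, and the retbleed/IBRS behaviour can be changed at boot via the documented kernel parameters; disabling a mitigation is of course a security trade-off:

# show which mitigations the running kernel selected
cat /sys/devices/system/cpu/vulnerabilities/retbleed
cat /sys/devices/system/cpu/vulnerabilities/spectre_v2
# boot-time knobs documented in Documentation/admin-guide/kernel-parameters.txt:
#   retbleed=off       disable the retbleed mitigation
#   spectre_v2=off     disable spectre_v2 mitigations entirely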
Can you try the latest mainline (v6.4) to see if it fixes your regression?
Issue observed with the latest kernel (v6.4) as well.

~ # uname -r
6.4.0
~ # top | grep events_power_efficient
    8 root      20   0       0      0      0 R 100.0   0.0  11:00.58 kworker/0:1+events_power_efficient
    8 root      20   0       0      0      0 R 100.0   0.0  11:03.58 kworker/0:1+events_power_efficient
    8 root      20   0       0      0      0 R  99.3   0.0  11:06.57 kworker/0:1+events_power_efficient
    8 root      20   0       0      0      0 R  99.7   0.0  11:09.57 kworker/0:1+events_power_efficient
    8 root      20   0       0      0      0 R  99.7   0.0  11:12.57 kworker/0:1+events_power_efficient
    8 root      20   0       0      0      0 R  99.7   0.0  11:15.57 kworker/0:1+events_power_efficient
    8 root      20   0       0      0      0 R  99.7   0.0  11:18.57 kworker/0:1+events_power_efficient
    8 root      20   0       0      0      0 R 100.0   0.0  11:21.57 kworker/0:1+events_power_efficient
    8 root      20   0       0      0      0 R  99.7   0.0  11:24.57 kworker/0:1+events_power_efficient
Unsetting CONFIG_CPU_IBRS_ENTRY also improved things: CPU consumption by the events_power_efficient worker dropped from ~100% to ~7%.

Any suggestions on whether I can go with this change? What would be the impact of disabling CONFIG_CPU_IBRS_ENTRY?
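In case anyone wants to check their own build, the option can be inspected like this (the /boot path assumes a distro-style install; a self-built kernel keeps it in .config in the source tree):

# check whether the running kernel was built with the IBRS entry/exit code
grep CONFIG_CPU_IBRS_ENTRY /boot/config-$(uname -r)
# or, for a kernel built from source:
grep CONFIG_CPU_IBRS_ENTRY .config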
Created attachment 304973 [details]
check empty slots before locking
It may be that the routine is getting called more often than it used to (or is simply getting called way too often). By any chance, have you increased the hash table size for netfilter?

Regardless of any of the above, the cleaning routine is incredibly inefficient. Can you try out the patch I attached? I'm probably going to redo it later, but it would be nice to know whether it sorts the problem out as is.
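For reference, the hashlimit tables and their current usage can be inspected through procfs; the file name matches whatever --hashlimit-name was given in the rule (shown here as a placeholder):

# each hashlimit table appears as a file named after --hashlimit-name
ls /proc/net/ipt_hashlimit/
# the file lists the current entries; a very large or constantly full table
# points at sizing (--hashlimit-htable-size / --hashlimit-htable-max) or expiry settings
cat /proc/net/ipt_hashlimit/<table-name>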
The gcinterval/expire values were set too low (30 ms/15 ms instead of the intended 30 s/15 s, since both options are specified in milliseconds), which was causing the events_power_efficient kworker to run the garbage collector excessively. Setting them to appropriate values fixed the issue.

Issue scenario:
--hashlimit-htable-gcinterval 30 --hashlimit-htable-expire 15

Fixed scenario:
--hashlimit-htable-gcinterval 30000 --hashlimit-htable-expire 15000
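For completeness, a corrected rule would look something like the following; the match, rate, and table name are illustrative only (not from this report), and the relevant part is that gcinterval/expire are given in milliseconds:

# hypothetical rate-limit rule; 30000/15000 ms correspond to the intended 30 s / 15 s
iptables -A INPUT -p tcp --dport 22 -m hashlimit \
  --hashlimit-name ssh_limit --hashlimit-mode srcip --hashlimit-upto 5/min \
  --hashlimit-htable-gcinterval 30000 --hashlimit-htable-expire 15000 \
  -j ACCEPT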