Bug 217586 - High cpu usage caused by kernel process when upgraded to linux 5.19.17 or later
Summary: High cpu usage caused by kernel process when upgraded to linux 5.19.17 or later
Status: NEEDINFO
Alias: None
Product: Linux
Classification: Unclassified
Component: Kernel
Hardware: Other Linux
Importance: P3 high
Assignee: Virtual assignee for kernel bugs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-06-22 10:30 UTC by Vivek Anand
Modified: 2023-09-05 08:17 UTC
CC List: 2 users

See Also:
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:


Attachments
check empty slots before locking (561 bytes, patch)
2023-08-28 20:34 UTC, Mateusz Guzik

Description Vivek Anand 2023-06-22 10:30:01 UTC
The kernel process "kworker/events_power_efficient" uses a lot of CPU (100% on ESXi 6.7, ~30% on ESXi 7.0U3 or later) after upgrading from kernel 5.17.3 to 5.19.17 or later.

dmesg log:
[ 2430.973102]  </TASK>
[ 2430.973131] Sending NMI from CPU 1 to CPUs 0:
[ 2430.973241] NMI backtrace for cpu 0
[ 2430.973247] CPU: 0 PID: 22 Comm: kworker/0:1 Not tainted 6.3.3 #1
[ 2430.973254] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
[ 2430.973258] Workqueue: events_power_efficient htable_gc [xt_hashlimit]
[ 2430.973275] RIP: 0010:preempt_count_sub+0x2e/0xa0
[ 2430.973289] Code: 36 01 85 c9 75 1b 65 8b 15 a7 da f8 5e 89 d1 81 e1 ff ff ff 7f 39 f9 7c 16 81 ff fe 00 00 00 76 3b f7 df 65 01 3d 8a da f8 5e <c3> cc cc cc cc e8 98 aa 25 00 85 c0 74 f2 8b 15 da 71 ed 00 85 d2
[ 2430.973294] RSP: 0018:ffffb15ec00dbe58 EFLAGS: 00000297
[ 2430.973299] RAX: 0000000000000000 RBX: ffffb15ec12ad000 RCX: 0000000000000001
[ 2430.973302] RDX: 0000000080000001 RSI: ffffffffa1c3313b RDI: 00000000ffffffff
[ 2430.973306] RBP: dead000000000122 R08: 0000000000000010 R09: 0000746e65696369
[ 2430.973309] R10: 8080808080808080 R11: 0000000000000018 R12: 0000000000000000
[ 2430.973312] R13: 0000000000001e2b R14: ffffb15ec12ad048 R15: ffff91c279c26a05
[ 2430.973316] FS:  0000000000000000(0000) GS:ffff91c279c00000(0000) knlGS:0000000000000000
[ 2430.973320] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2430.973324] CR2: 000055fc138890e0 CR3: 000000010810e002 CR4: 00000000001706f0
[ 2430.973374] Call Trace:
[ 2430.973388]  <TASK>
[ 2430.973390]  __local_bh_enable_ip+0x32/0x70
[ 2430.973413]  htable_selective_cleanup+0x95/0xc0 [xt_hashlimit]
[ 2430.973428]  htable_gc+0xf/0x30 [xt_hashlimit]
[ 2430.973440]  process_one_work+0x1d4/0x360
[ 2430.973459]  ? process_one_work+0x360/0x360
[ 2430.973467]  worker_thread+0x25/0x3b0
[ 2430.973476]  ? process_one_work+0x360/0x360
[ 2430.973483]  kthread+0xe1/0x110
[ 2430.973499]  ? kthread_complete_and_exit+0x20/0x20
[ 2430.973507]  ret_from_fork+0x1f/0x30
[ 2430.973526]  </TASK>


Is there any known issue or resolution for this?
Comment 1 Artem S. Tashkinov 2023-06-22 10:55:48 UTC
> Is there any known issue or resolution for this?

None that I'm aware of, but you could bisect:

https://docs.kernel.org/admin-guide/bug-bisect.html
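
For reference, a minimal bisect session between the last known-good and first known-bad releases might look like the sketch below. It assumes a clone of the kernel tree with the v5.17.3 and v5.19.17 tags available; since stable point releases live on separate branches, the range may first need to be mapped onto mainline tags, and each step requires building, booting, and testing the checked-out kernel:

```shell
# Sketch of a kernel bisect; assumes a linux-stable clone with both tags.
git bisect start
git bisect bad v5.19.17      # first release known to show the high CPU usage
git bisect good v5.17.3      # last release known to be fine
# Build, install, and boot the kernel git checks out, reproduce the test,
# then mark the result and repeat until git names the first bad commit:
#   git bisect good    # or: git bisect bad
git bisect reset             # return to the original branch when done
```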
Comment 2 Vivek Anand 2023-06-22 13:20:19 UTC
There are a lot of versions between 5.17.3 and 5.19.17.
It will take time to pinpoint the exact version where the issue was introduced.

Meanwhile, can you help me with troubleshooting steps to determine why it is consuming more CPU?

Observation from perf report:

# Children      Self  Command          Shared Object               Symbol                                                                                                                       
# ........  ........  ...............  ..........................  .............................................................................................................................
#
    46.20%     0.00%  kworker/0:3-eve  [kernel.kallsyms]           [k] worker_thread
            |
            ---worker_thread
               |          
                --46.20%--process_one_work
                          |          
                           --46.19%--htable_gc
                                     |          
                                      --46.19%--htable_selective_cleanup
                                                |          
                                                |--20.76%--_raw_spin_lock_bh
                                                |          |          
                                                |           --15.30%--preempt_count_add
                                                |                     |          
                                                |                      --4.78%--in_lock_functions
                                                |          
                                                |--16.30%--__local_bh_enable_ip
                                                |          |          
                                                |          |--6.12%--preempt_count_sub
                                                |          |          
                                                |          |--2.45%--check_preemption_disabled
                                                |          |          
                                                |           --1.29%--__this_cpu_preempt_check
                                                |          
                                                |--3.25%--__cond_resched
                                                |          
                                                 --2.37%--_raw_spin_unlock_bh

    46.20%     0.00%  kworker/0:3-eve  [kernel.kallsyms]           [k] kthread
            |
            ---kthread
               worker_thread
               |          
                --46.20%--process_one_work
                          |          
                           --46.19%--htable_gc
                                     |          
                                      --46.19%--htable_selective_cleanup
                                                |          
                                                |--20.76%--_raw_spin_lock_bh
                                                |          |          
                                                |           --15.30%--preempt_count_add
                                                |                     |          
                                                |                      --4.78%--in_lock_functions
                                                |          
                                                |--16.30%--__local_bh_enable_ip
                                                |          |          
                                                |          |--6.12%--preempt_count_sub
                                                |          |          
                                                |          |--2.45%--check_preemption_disabled
                                                |          |          
                                                |           --1.29%--__this_cpu_preempt_check
                                                |          
                                                |--3.25%--__cond_resched
                                                |          
                                                 --2.37%--_raw_spin_unlock_bh
Comment 3 Bagas Sanjaya 2023-06-23 00:40:25 UTC
(In reply to Vivek Anand from comment #2)
> There are a lot of versions between 5.17.3 and 5.19.17.
> It will take time to pinpoint the exact version where the issue was introduced.
> 

To help narrow the version range: does v5.15 have this regression?
Comment 4 Bagas Sanjaya 2023-06-23 00:41:22 UTC
(In reply to Bagas Sanjaya from comment #3)
> (In reply to Vivek Anand from comment #2)
> > There are a lot of versions between 5.17.3 and 5.19.17.
> > It will take time to pinpoint the exact version where the issue was introduced.
> > 
> 
> To help narrow the version range: does v5.15 have this regression?

Oops, I mean please try v5.18 first, and then try latest mainline.
Comment 5 Vivek Anand 2023-06-26 06:18:18 UTC
As per our analysis, the root cause appears to be the enablement of the IBRS mitigation for the spectre_v2 vulnerability, which was added in kernel 5.18.14 by commit 6ad0ad2bf8a6 ("x86/bugs: Report Intel retbleed vulnerability").

I tried reverting the changes introduced by that commit on top of 5.19.17 and saw a significant improvement in CPU consumption.
I also tried different kernel versions: the issue does not appear up to kernel 5.18.13, but does appear from kernel 5.18.14 onwards.
Relevant links:
https://lkml.org/lkml/2022/7/23/205
https://lkml.iu.edu/hypermail/linux/kernel/2209.1/02248.html
https://www.bleepingcomputer.com/news/linux/vmware-70-percent-drop-in-linux-esxi-vm-performance-with-retbleed-fixes/
Comment 6 Bagas Sanjaya 2023-06-27 02:22:59 UTC
Can you try latest mainline (v6.4) to see if it fixes your regression?
Comment 7 Vivek Anand 2023-06-27 11:19:50 UTC
The issue is observed with the latest kernel (v6.4) as well.

~ # uname -r
6.4.0

~ # top | grep events_power_efficient
    8 root      20   0       0      0      0 R 100.0   0.0  11:00.58 kworker/0:1+events_power_efficient
    8 root      20   0       0      0      0 R 100.0   0.0  11:03.58 kworker/0:1+events_power_efficient
    8 root      20   0       0      0      0 R  99.3   0.0  11:06.57 kworker/0:1+events_power_efficient
    8 root      20   0       0      0      0 R  99.7   0.0  11:09.57 kworker/0:1+events_power_efficient
    8 root      20   0       0      0      0 R  99.7   0.0  11:12.57 kworker/0:1+events_power_efficient
    8 root      20   0       0      0      0 R  99.7   0.0  11:15.57 kworker/0:1+events_power_efficient
    8 root      20   0       0      0      0 R  99.7   0.0  11:18.57 kworker/0:1+events_power_efficient
    8 root      20   0       0      0      0 R 100.0   0.0  11:21.57 kworker/0:1+events_power_efficient
    8 root      20   0       0      0      0 R  99.7   0.0  11:24.57 kworker/0:1+events_power_efficient
Comment 8 Vivek Anand 2023-06-27 11:38:17 UTC
Unsetting the config option "CONFIG_CPU_IBRS_ENTRY" also improved things: CPU consumption by the events_power_efficient worker dropped from ~100% to ~7%.

Any suggestions on whether I can go with the above change?

What would be the impact of disabling the "CONFIG_CPU_IBRS_ENTRY" option?
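
For reference, the mitigation state can be inspected (and toggled per boot) without rebuilding the kernel. A sketch, assuming a kernel that exposes the vulnerabilities sysfs interface:

```shell
# Show the active Spectre v2 / Retbleed mitigation on the running kernel.
grep . /sys/devices/system/cpu/vulnerabilities/spectre_v2 \
       /sys/devices/system/cpu/vulnerabilities/retbleed 2>/dev/null || true

# Instead of rebuilding without CONFIG_CPU_IBRS_ENTRY, the mitigation can
# be disabled per boot from the kernel command line -- at the cost of
# leaving the guest exposed to Spectre v2 / Retbleed:
#   spectre_v2=off retbleed=off
```

Disabling the mitigation trades security for performance, so it should be a deliberate decision per deployment rather than a blanket fix.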
Comment 9 Mateusz Guzik 2023-08-28 20:34:23 UTC
Created attachment 304973 [details]
check empty slots before locking
Comment 10 Mateusz Guzik 2023-08-28 20:35:27 UTC
It may be that the routine is getting called more often than it used to (or is simply being called way too often).

By any chance have you increased the hash size for netfilter?

Regardless of any of the above, the cleanup routine is incredibly inefficient. Can you try out the attached patch? I'm probably going to redo it later, but it would be nice to know whether it sorts the issue out as is.
Comment 11 Vivek Anand 2023-09-05 08:17:56 UTC
The gcinterval/expire values were set far too low (30 ms/15 ms instead of the intended 30 s/15 s), which caused the kworker thread events_power_efficient to run excessively.
Setting them to appropriate values fixed the issue.

Issue Scenario: --hashlimit-htable-gcinterval 30 --hashlimit-htable-expire 15
Fixed Scenario: --hashlimit-htable-gcinterval 30000 --hashlimit-htable-expire 15000
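
For anyone hitting the same trap: both xt_hashlimit options take milliseconds, not seconds, so values intended as seconds must be multiplied by 1000. A quick sanity check:

```shell
# xt_hashlimit's --hashlimit-htable-gcinterval and --hashlimit-htable-expire
# are in MILLISECONDS, so "30" means 30 ms. Convert intended seconds first:
gc_seconds=30
expire_seconds=15
gc_ms=$((gc_seconds * 1000))
expire_ms=$((expire_seconds * 1000))
echo "--hashlimit-htable-gcinterval $gc_ms --hashlimit-htable-expire $expire_ms"
# prints: --hashlimit-htable-gcinterval 30000 --hashlimit-htable-expire 15000
```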
