Bug 218144 - upper limit of number of cpus in get_update_sysctl_factor is hard coded to 8 - should it be higher than that?
Status: NEW
Alias: None
Product: Process Management
Classification: Unclassified
Component: Scheduler
Hardware: All
OS: Linux
Importance: P3 normal
Assignee: Ingo Molnar
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-11-14 12:47 UTC by Colin Ian King
Modified: 2024-03-23 21:52 UTC
CC List: 6 users

See Also:
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:


Attachments
example fix (841 bytes, patch)
2023-11-14 13:52 UTC, Colin Ian King

Description Colin Ian King 2023-11-14 12:47:01 UTC
It appears that the upper limit on the number of CPUs used to calculate the scaling factor in get_update_sysctl_factor() in kernel/sched/fair.c is hard-coded as 8. Is this intentional? Systems nowadays have far more CPUs.

unsigned int cpus = min_t(unsigned int, num_online_cpus(), 8);

As per this article: https://thehftguy.com/2023/11/14/the-linux-kernel-has-been-accidentally-hardcoded-to-a-maximum-of-8-cores-for-nearly-20-years/
Comment 1 Colin Ian King 2023-11-14 13:52:23 UTC
Created attachment 305403
example fix

Example fix, tested on a 24-core Alder Lake i9-12900 with stress-ng. No performance issues with the schedmix, workload, fork and pthread stressors. I observed a 15.3% improvement in context switches for the fifo stressor.
Comment 2 Ronan Pigott 2023-11-16 20:05:55 UTC
So, I don't think it's correct to say that the limit was accidentally introduced [1], and people certainly have noticed before [2].

No comment on the merit of the change, but _assuming_ there is value in allowing this parameter to scale beyond 8 cores, I think the granularity of ilog2 might become unwieldy — 24 cores would get the same value as 16, 96 cores the same as 64, and so on.

[1] https://lore.kernel.org/lkml/1259253950.31676.249.camel@laptop/
[2] https://lore.kernel.org/all/CAKfTPtAKpMj15dHO1MC=dH_XJQe1Os24k93N2jDZ=kgg3O7K7A@mail.gmail.com/#t
Comment 3 Bruno Meneguele 2023-11-16 20:18:02 UTC
Peter already gave a quick reply on this a couple of years ago [1], and I think more tests and numbers are required in different environments, rather than thinking solely about the raw number of cores. At what point does a high-throughput server start to fall over based on the task time slices? Maybe increasing from 8 to 24 is fine for normal memory-intensive tasks, but what about 128 cores on a high-demand network server?
8 has proven to be "enough" so far (compared to other OSes), but, of course, that doesn't mean there isn't room for improvement.

[1] https://lore.kernel.org/all/20211102160402.GX174703@worktop.programming.kicks-ass.net/
