Bug 218195 - Intel hybrid CPU scheduler always prefers E cores
Status: RESOLVED ANSWERED
Alias: None
Product: Process Management
Classification: Unclassified
Component: Scheduler
Hardware: Intel Linux
Importance: P3 normal
Assignee: Ingo Molnar
Reported: 2023-11-27 15:02 UTC by Ramses VdP
Modified: 2023-12-08 13:49 UTC

Regression: No

Attachments
dmesg (109.95 KB, text/plain), 2023-11-27 15:02 UTC, Ramses VdP
sys cpufreq dump (16.49 KB, text/plain), 2023-11-28 16:56 UTC, Ramses VdP
/proc/config.gz (63.52 KB, application/gzip), 2023-11-28 19:04 UTC, Ramses VdP
debug patch for ACPI CPPC probe (1.44 KB, patch), 2023-11-28 23:50 UTC, Srinivas Pandruvada
acpidump (152.09 KB, text/plain), 2023-11-29 08:28 UTC, Ramses VdP
dmesg with debug patch (72.66 KB, text/plain), 2023-11-29 08:32 UTC, Ramses VdP

Description Ramses VdP 2023-11-27 15:02:39 UTC
Created attachment 305483
dmesg

I am running an Intel Alder Lake system (Core i7-1260P), with a mix of P and E cores.

Since Linux 6.6, and also on the current 6.7 RC, the scheduler seems to have a strong preference for the E cores, and single threaded workloads are consistently scheduled on one of the E cores.

With Linux 6.4 and earlier, when I ran a single-threaded CPU-bound process, it was scheduled on a P core. With 6.5, the choice between P and E cores seemed rather random.

I tested this by running "stress" with different numbers of threads. With a single thread on Linux 6.6 and 6.7, I always have an E core at 100% and no load on the P cores. Starting from 3 threads I get some load on the P cores as well, but the E cores stay more heavily loaded.
With "taskset" I can force a process to run on a P core, but clearly it's not very practical to have to do CPU scheduling manually.
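
As a stopgap, the manual pinning described above could be scripted roughly as follows. This is only a sketch: it assumes the kernel exposes the P-core PMU at /sys/devices/cpu_core/cpus, as hybrid Intel systems usually do, and the helper names p_core_cpus and run_on_p_cores are made up for illustration.

```shell
# Print the CPU list of the performance cores, e.g. "0-7".
# Assumption: hybrid Intel kernels expose the P-core PMU at
# /sys/devices/cpu_core/cpus. The optional argument only exists so the
# function can be pointed at a different sysfs root.
p_core_cpus() {
    local sysfs_root="${1:-/sys}"
    local f="$sysfs_root/devices/cpu_core/cpus"
    if [ -r "$f" ]; then
        cat "$f"
    else
        echo "no cpu_core PMU found under $sysfs_root" >&2
        return 1
    fi
}

# Run a command with its CPU affinity restricted to the P cores.
run_on_p_cores() {
    local cpus
    cpus="$(p_core_cpus)" || return 1
    taskset -c "$cpus" "$@"
}
```

For example, "run_on_p_cores stress --cpu 1" would keep the single stress worker on the P cores. It works around the symptom only and does not change how the scheduler ranks cores.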

This severely affects single-threaded performance, since the E cores are considerably slower. Several of my workflows are single-threaded and heavily CPU-bound; they used to run on P cores and are now a lot slower because they are scheduled on E cores.

I am not sure what the exact desired behaviour is here, to balance power consumption and performance, but currently my P cores are barely used for single-threaded workloads.

Is this intended behaviour or is this indeed a regression? Or is there perhaps any configuration that I should have done from my side? Is there any further info that I can provide to help you figure out what's going on?
Comment 1 Srinivas Pandruvada 2023-11-28 15:38:33 UTC
Please dump:
grep -r . /sys/devices/system/cpu/cpu*/acpi_cppc/
grep -r . /sys/devices/system/cpu/cpu*/cpufreq
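
If the acpi_cppc glob matches nothing, grep only complains about a missing file. A small sketch like the one below states explicitly how many CPUs expose the directory. The function name cppc_coverage and its optional sysfs-root argument are illustrative assumptions, not part of any existing tool.

```shell
# Count how many CPUs expose an acpi_cppc directory in sysfs.
# The optional argument is an alternative sysfs root (default: /sys).
cppc_coverage() {
    local sysfs_root="${1:-/sys}" total=0 with_cppc=0 d
    for d in "$sysfs_root"/devices/system/cpu/cpu[0-9]*; do
        # Skip the unexpanded glob when no cpuN directory exists.
        [ -d "$d" ] || continue
        total=$((total + 1))
        if [ -d "$d/acpi_cppc" ]; then
            with_cppc=$((with_cppc + 1))
        fi
    done
    echo "$with_cppc/$total CPUs expose acpi_cppc"
}
```

Calling cppc_coverage without arguments inspects the running system.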
Comment 2 Ramses VdP 2023-11-28 16:56:19 UTC
Created attachment 305497
sys cpufreq dump
Comment 3 Ramses VdP 2023-11-28 16:57:18 UTC
Hi Srinivas

Thanks for your reply.

The first command did not match any file; I don't have any entries called acpi_cppc.

I attached the output of the second command to the issue.
Comment 4 Srinivas Pandruvada 2023-11-28 17:29:37 UTC
Can you check whether you have these kernel config options set:

CONFIG_SCHED_MC_PRIO=y
CONFIG_ACPI_CPPC_LIB=y

Also attach your kernel config file.
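
One way to check both options in a single step is sketched below. It assumes the running kernel's config is available as a gzip file (/proc/config.gz only exists when the kernel was built with CONFIG_IKCONFIG_PROC); check_kconfig is a hypothetical helper name.

```shell
# Report the state of the given CONFIG_ options in a kernel config file.
# Usage: check_kconfig <config file, gzipped or plain> <option>...
check_kconfig() {
    local config="$1"
    shift
    local opt line
    for opt in "$@"; do
        # gzip -cdf decompresses gzip input and passes plain text through.
        line="$(gzip -cdf "$config" | grep -E "^${opt}=")" || line="$opt is not set"
        echo "$line"
    done
}
```

For this report the call would be: check_kconfig /proc/config.gz CONFIG_SCHED_MC_PRIO CONFIG_ACPI_CPPC_LIB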
Comment 5 Ramses VdP 2023-11-28 19:04:20 UTC
Created attachment 305498
/proc/config.gz
Comment 6 Ramses VdP 2023-11-28 19:05:02 UTC
Both are set to y; I attached my kernel config.

I will also build a kernel with the patch that was proposed on the mailing list and see if that makes a difference.
Comment 7 Srinivas Pandruvada 2023-11-28 19:44:55 UTC
What is the output of
cat /proc/sys/kernel/sched_itmt_enabled
Comment 8 Ramses VdP 2023-11-28 21:52:04 UTC
I don't have that file either...

$ ls /proc/sys/kernel/sched_*
/proc/sys/kernel/sched_autogroup_enabled
/proc/sys/kernel/sched_cfs_bandwidth_slice_us
/proc/sys/kernel/sched_child_runs_first
/proc/sys/kernel/sched_deadline_period_max_us
/proc/sys/kernel/sched_deadline_period_min_us
/proc/sys/kernel/sched_rr_timeslice_ms
/proc/sys/kernel/sched_rt_period_us
/proc/sys/kernel/sched_rt_runtime_us
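
A missing sched_itmt_enabled file is a different situation from one that contains 0. The sketch below separates the three cases; itmt_status is a hypothetical name, and the path argument is only there so the check can be pointed at another file.

```shell
# Describe the ITMT state based on the sched_itmt_enabled sysctl file.
# The file is absent when the kernel did not register ITMT support.
itmt_status() {
    local f="${1:-/proc/sys/kernel/sched_itmt_enabled}"
    if [ ! -e "$f" ]; then
        echo "ITMT not available (sysctl missing)"
    elif [ "$(cat "$f")" = "1" ]; then
        echo "ITMT enabled"
    else
        echo "ITMT supported but disabled"
    fi
}
```

On the system in this report, itmt_status would be expected to take the first branch, since the file does not exist.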
Comment 9 Srinivas Pandruvada 2023-11-28 22:45:11 UTC
This means ITMT, which prefers P-cores, is not even enabled.
Can you add "dyndbg=file cppc_acpi.c +p" to the kernel command line and send the output of
dmesg | grep -i cppc
Comment 10 Ramses VdP 2023-11-28 22:57:02 UTC
There's nothing in dmesg:

$ dmesg | grep -i cppc
[    0.000000] Command line: initrd=\efi\nixos\4lwk9adrrbn8nz8vr66mrl7apv1nf4dh-initrd-linux-6.6.2-initrd.efi init=/nix/store/qcmgb9gp3sc31lsy0709d91wla8wxxjk-nixos-system-starbook-20231128.dirty/init video=DP-1:3840x2160@60 iomem=relaxed i915.enable_fbc=1 i915.enable_psr=2 mem_sleep_default=deep nvme.noacpi=1 systemd.gpt_auto=false dyndbg=filecppc_acpi.c +p loglevel=4
[    0.063360] Kernel command line: initrd=\efi\nixos\4lwk9adrrbn8nz8vr66mrl7apv1nf4dh-initrd-linux-6.6.2-initrd.efi init=/nix/store/qcmgb9gp3sc31lsy0709d91wla8wxxjk-nixos-system-starbook-20231128.dirty/init video=DP-1:3840x2160@60 iomem=relaxed i915.enable_fbc=1 i915.enable_psr=2 mem_sleep_default=deep nvme.noacpi=1 systemd.gpt_auto=false dyndbg=file cppc_acpi.c +p loglevel=4
Comment 11 Ramses VdP 2023-11-28 23:00:12 UTC
The space between the command line args was lost while copying, just to avoid any confusion.

$ dmesg | grep -i cppc
[    0.000000] Command line: initrd=\efi\nixos\4lwk9adrrbn8nz8vr66mrl7apv1nf4dh-initrd-linux-6.6.2-initrd.efi init=/nix/store/qcmgb9gp3sc31lsy0709d91wla8wxxjk-nixos-system-starbook-20231128.dirty/init video=DP-1:3840x2160@60 iomem=relaxed i915.enable_fbc=1 i915.enable_psr=2 mem_sleep_default=deep nvme.noacpi=1 systemd.gpt_auto=false dyndbg=file cppc_acpi.c +p loglevel=4
[    0.063360] Kernel command line: initrd=\efi\nixos\4lwk9adrrbn8nz8vr66mrl7apv1nf4dh-initrd-linux-6.6.2-initrd.efi init=/nix/store/qcmgb9gp3sc31lsy0709d91wla8wxxjk-nixos-system-starbook-20231128.dirty/init video=DP-1:3840x2160@60 iomem=relaxed i915.enable_fbc=1 i915.enable_psr=2 mem_sleep_default=deep nvme.noacpi=1 systemd.gpt_auto=false dyndbg=file cppc_acpi.c +p loglevel=4
Comment 12 Srinivas Pandruvada 2023-11-28 23:49:33 UTC
Did you add the full option, including the quotes?

"dyndbg=file cppc_acpi.c +p"

If you did, that means https://elixir.bootlin.com/linux/v6.7-rc3/source/drivers/acpi/cppc_acpi.c#L664 is not called.

Please apply the attached diff, build the kernel, and get the dmesg output.
Also attach the output of
acpidump > acpi.out
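
On the quoting: the kernel splits its command line on whitespace, so without quotes "dyndbg=file" arrives as one parameter and "cppc_acpi.c" and "+p" as two unrelated ones. A rough way to verify what actually reached the kernel is sketched below. has_quoted_dyndbg is a hypothetical helper; it accepts the form used in this thread and the dyndbg="QUERY" form shown in the kernel's dynamic debug documentation.

```shell
# Succeed if the given command line carries the cppc_acpi.c dyndbg query
# as a single quoted parameter, in either of the two accepted spellings.
has_quoted_dyndbg() {
    local cmdline="$1"
    case "$cmdline" in
        *'"dyndbg=file cppc_acpi.c +p"'* | *'dyndbg="file cppc_acpi.c +p"'*)
            return 0 ;;
        *)
            return 1 ;;
    esac
}
```

For example: has_quoted_dyndbg "$(cat /proc/cmdline)" && echo "dyndbg query is quoted"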
Comment 13 Srinivas Pandruvada 2023-11-28 23:50:18 UTC
Created attachment 305500
debug patch for ACPI CPPC probe
Comment 14 Ramses VdP 2023-11-29 08:21:39 UTC
Oh I see, I had to include the quotes. I rebuilt the kernel with the provided patch and corrected the command line argument; this is what I get now:

$ dmesg | grep -i cppc
[    0.000000] Command line: initrd=\efi\nixos\v42q1s2g68jlp4ac8n4ing7njd21p4kc-initrd-linux-6.6.2-initrd.efi init=/nix/store/xv8bfrfbgjbf7sdjyhfr6776f8r4qh7z-nixos-system-starbook-20231129.dirty/init video=DP-1:3840x2160@60 iomem=relaxed i915.enable_fbc=1 i915.enable_psr=2 mem_sleep_default=deep nvme.noacpi=1 systemd.gpt_auto=false "dyndbg=file cppc_acpi.c +p" loglevel=4
[    0.062739] Kernel command line: initrd=\efi\nixos\v42q1s2g68jlp4ac8n4ing7njd21p4kc-initrd-linux-6.6.2-initrd.efi init=/nix/store/xv8bfrfbgjbf7sdjyhfr6776f8r4qh7z-nixos-system-starbook-20231129.dirty/init video=DP-1:3840x2160@60 iomem=relaxed i915.enable_fbc=1 i915.enable_psr=2 mem_sleep_default=deep nvme.noacpi=1 systemd.gpt_auto=false "dyndbg=file cppc_acpi.c +p" loglevel=4
[    0.376135] ACPI CPPC: begin acpi_cppc_processor_probe
[    0.376137] ACPI CPPC: CPPC v2 _OSC not acked
[    0.376185] ACPI CPPC: begin acpi_cppc_processor_probe
[    0.376186] ACPI CPPC: CPPC v2 _OSC not acked
[    0.376223] ACPI CPPC: begin acpi_cppc_processor_probe
[    0.376224] ACPI CPPC: CPPC v2 _OSC not acked
[    0.376259] ACPI CPPC: begin acpi_cppc_processor_probe
[    0.376260] ACPI CPPC: CPPC v2 _OSC not acked
[    0.376294] ACPI CPPC: begin acpi_cppc_processor_probe
[    0.376295] ACPI CPPC: CPPC v2 _OSC not acked
[    0.376329] ACPI CPPC: begin acpi_cppc_processor_probe
[    0.376330] ACPI CPPC: CPPC v2 _OSC not acked
[    0.376364] ACPI CPPC: begin acpi_cppc_processor_probe
[    0.376365] ACPI CPPC: CPPC v2 _OSC not acked
[    0.376399] ACPI CPPC: begin acpi_cppc_processor_probe
[    0.376400] ACPI CPPC: CPPC v2 _OSC not acked
[    0.376434] ACPI CPPC: begin acpi_cppc_processor_probe
[    0.376435] ACPI CPPC: CPPC v2 _OSC not acked
[    0.376469] ACPI CPPC: begin acpi_cppc_processor_probe
[    0.376470] ACPI CPPC: CPPC v2 _OSC not acked
[    0.376503] ACPI CPPC: begin acpi_cppc_processor_probe
[    0.376504] ACPI CPPC: CPPC v2 _OSC not acked
[    0.376539] ACPI CPPC: begin acpi_cppc_processor_probe
[    0.376540] ACPI CPPC: CPPC v2 _OSC not acked
[    0.376572] ACPI CPPC: begin acpi_cppc_processor_probe
[    0.376573] ACPI CPPC: CPPC v2 _OSC not acked
[    0.376608] ACPI CPPC: begin acpi_cppc_processor_probe
[    0.376609] ACPI CPPC: CPPC v2 _OSC not acked
[    0.376643] ACPI CPPC: begin acpi_cppc_processor_probe
[    0.376644] ACPI CPPC: CPPC v2 _OSC not acked
[    0.376676] ACPI CPPC: begin acpi_cppc_processor_probe
[    0.376677] ACPI CPPC: CPPC v2 _OSC not acked
[    0.379887] ACPI CPPC: No CPC descriptor for CPU:0
[    0.379945] ACPI CPPC: No CPC descriptor for CPU:1
[    0.380035] ACPI CPPC: No CPC descriptor for CPU:2
[    0.380085] ACPI CPPC: No CPC descriptor for CPU:3
[    0.380241] ACPI CPPC: No CPC descriptor for CPU:4
[    0.380314] ACPI CPPC: No CPC descriptor for CPU:5
[    0.380453] ACPI CPPC: No CPC descriptor for CPU:6
[    0.380526] ACPI CPPC: No CPC descriptor for CPU:7
[    0.380639] ACPI CPPC: No CPC descriptor for CPU:8
[    0.380762] ACPI CPPC: No CPC descriptor for CPU:9
[    0.380862] ACPI CPPC: No CPC descriptor for CPU:10
[    0.380897] ACPI CPPC: No CPC descriptor for CPU:11
[    0.380922] ACPI CPPC: No CPC descriptor for CPU:12
[    0.380993] ACPI CPPC: No CPC descriptor for CPU:13
[    0.381028] ACPI CPPC: No CPC descriptor for CPU:14
[    0.381085] ACPI CPPC: No CPC descriptor for CPU:15
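
A log like the one above can be condensed into two counts. The sketch below matches the message strings exactly as they appear in this dmesg output, so it assumes the same kernel and debug patch; summarize_cppc_log is a hypothetical name.

```shell
# Read dmesg text on stdin and count the two CPPC failure messages.
summarize_cppc_log() {
    awk '
        /ACPI CPPC: CPPC v2 _OSC not acked/      { osc++ }
        /ACPI CPPC: No CPC descriptor for CPU:/  { nocpc++ }
        END {
            printf "_OSC not acked: %d\n", osc
            printf "CPUs without CPC descriptor: %d\n", nocpc
        }'
}
```

It would be used as: dmesg | summarize_cppc_log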
Comment 15 Ramses VdP 2023-11-29 08:28:55 UTC
Created attachment 305504
acpidump
Comment 16 Ramses VdP 2023-11-29 08:32:09 UTC
Created attachment 305505
dmesg with debug patch
Comment 17 Srinivas Pandruvada 2023-11-29 13:56:14 UTC
The problem is that the platform is showing no support for CPC.
https://elixir.bootlin.com/linux/v6.4.16/C/ident/osc_sb_cppc2_support_acked

Can you boot a 6.4 kernel and check whether you have
cat /proc/sys/kernel/sched_itmt_enabled
and
grep -r . /sys/devices/system/cpu/cpu*/acpi_cppc/
Comment 18 Ramses VdP 2023-11-29 14:51:18 UTC
Neither exists when booting with 6.4 either...

But processes are clearly scheduled on P cores a lot more.

What can explain the absence of CPC support? Is this a platform or firmware bug?
Comment 19 Srinivas Pandruvada 2023-11-29 14:59:24 UTC
This is a firmware support issue. If this system was designed for Windows, then it can't function without CPC support.

Is this some OEM system? It seems that they populate CPC but failed to add it to the OSC support status.
See ACPI specification 6.4, section 6.2.11.2, "Platform-Wide OSPM Capabilities".

It is working by chance on 6.4, but some scheduler optimizations may have changed that.
We can't count on that unless we tell the scheduler to prefer some cores over others.
Comment 20 Ramses VdP 2023-11-29 15:02:05 UTC
This is an OEM system from https://starlabs.systems, I think it was mainly designed to run Linux.

I will contact their support about this.

Is there anything else that can be done at this point?
Comment 21 Srinivas Pandruvada 2023-11-29 16:04:31 UTC
If they add OSC support for CPPC v2, that should be all that is needed. I think they have some reference BIOS from Intel, which should have this.
Comment 22 Adeel Arshad 2023-11-29 20:11:37 UTC
Can you check whether the following features are enabled in the BIOS?

Intel(R) SpeedStep(TM)
Intel Speed Shift Technology(TM)
Comment 23 Ramses VdP 2023-12-04 18:57:51 UTC
The firmware is coreboot and EDK 2; the interface does not offer a lot of options and doesn't have any explicit options for SpeedStep or Speed Shift.
Is there any way to confirm from user space that these settings are effectively enabled?

I contacted the OEM (https://github.com/StarLabsLtd/firmware/issues/143) and they said that it worked correctly on their test system. They are still looking into it, I hope something will come out of it.
A second user reported the same problem in that issue, so at least it's not an isolated case.
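
On confirming these features from user space: one rough indicator is the CPU flags line in /proc/cpuinfo, where "est" stands for Enhanced SpeedStep and "hwp" for hardware P-states (Speed Shift). The sketch below only checks whether a flag is listed; cpu_has_flag is a hypothetical helper, and a listed flag shows what the kernel sees rather than how the firmware menu is set.

```shell
# Succeed if the given flag appears on the first "flags" line of a
# cpuinfo-style file (default: /proc/cpuinfo). Matches whole words only.
cpu_has_flag() {
    local flag="$1" cpuinfo="${2:-/proc/cpuinfo}"
    grep -m1 '^flags' "$cpuinfo" | tr ' \t' '\n\n' | grep -qx "$flag"
}
```

For example: for f in est hwp; do cpu_has_flag "$f" && echo "$f: present" || echo "$f: absent"; done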
Comment 24 Ramses VdP 2023-12-08 08:56:26 UTC
The OEM made a firmware release to address the issue and scheduling seems to work again as expected after flashing it.

Thanks a lot for your help in pinpointing the issue!
Comment 25 Srinivas Pandruvada 2023-12-08 13:49:22 UTC
Glad to help.
