Bug 219364
Summary: | Stalls unless C-states disabled - Intel Lunarlake - Dell XPS 13 9350, HP OmniBook Ultra Flip Laptop 14 | ||
---|---|---|---|
Product: | Power Management | Reporter: | AceLan Kao (acelan) |
Component: | intel_idle | Assignee: | Len Brown (lenb) |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | CC: | bhabeck34, hugh712, jeffbai, kevin.cheng, lenb, max.lee, neo.wong, srinivas.pandruvada |
Priority: | P3 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | Subsystem: | ||
Regression: | No | Bisected commit-id: | |
Attachments: |
dmesg on DC
cpuidle/state* log
dmesg max_cstate=1
dmesg max_cstate=2
test X86_BUG_MONITOR patch
test X86_BUG_MONITOR patch for Linux 6.10 and earlier
standalone program to detect migration stalls on LNL
test X86_BUG_MONITOR patch |
I recorded a video showing the strange pause while moving the cursor: https://people.canonical.com/~acelan/bugs/bz-219364/
The pause also happens when typing on the built-in keyboard. After a pause, the final character may be repeated many times, or nothing may happen at all.

Did this happen with earlier kernels such as 6.10 or 6.11 as well?

The issue reproduces with the v6.11 kernel. This is a new SoC, so I didn't try older kernels.
1. Forcing the CPU to stay in C0 works:
sudo su - ; cat >/dev/cpu_dma_latency <(echo -e -n "\x0\x0\x0\x0" ; sleep inf)
2. Disabling C-states works:
cpuidle.off=1 intel_idle.max_cstate=0

Please share the C-states on this machine out of the box:
grep . /sys/devices/system/cpu/cpu0/cpuidle/*/*
Can you find the deepest state that results in no symptoms by booting with descending values of intel_idle.max_cstate=N (or by disabling states via /sys/devices/system/cpu/cpu*/cpuidle/state*/disable)?
Do you see the same symptoms if you simply boot with maxcpus=1?
Do the boot-time delays also go away with C-states off (please show dmesg for a boot with C-states off), or does disabling C-states only help with the interactive issues?

Questions on the delays when using the system: Are the noticeable delays a DC-mode-only issue? Did you try the latest upstream 6.12-rc* kernel? Does setting /sys/devices/system/cpu/cpu*/power/energy_perf_bias = 7 make any difference in DC mode?

Created attachment 307012 [details]
cpuidle/state* log
With `intel_idle.max_cstate=2` I see only a very minor pause; with `intel_idle.max_cstate=3` the pause while moving the cursor is clearly noticeable.
Boot time is also affected by max_cstate: booting appears to take longer with a larger max_cstate value.
The delay is easier to observe on DC, but I can also observe it on AC after some reboots.
/sys/devices/system/cpu/cpu*/power/energy_perf_bias was 8; setting it to 7 helps, and the pause becomes very minor.
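The state and EPB data requested earlier in this thread can be collected in one pass. The following is a minimal sketch, not something posted in the bug: the `SYSFS_CPU_ROOT` variable and the function names `show_cstates` and `show_epb` are illustrative, and the script assumes the usual cpuidle sysfs attributes (`name`, `latency`, `disable`) and `power/energy_perf_bias` are present.

```shell
#!/bin/sh
# SYSFS_CPU_ROOT is an illustrative override (not from the thread) so the
# functions can also be pointed at a saved copy of the sysfs tree.
SYSFS_CPU_ROOT="${SYSFS_CPU_ROOT:-/sys/devices/system/cpu}"

# show_cstates [CPU]: one line per cpuidle state of CPU (default cpu0) with
# the state's name, exit latency in microseconds, and disable flag.
show_cstates() {
    cpu="${1:-cpu0}"
    for dir in "$SYSFS_CPU_ROOT/$cpu"/cpuidle/state*; do
        [ -d "$dir" ] || continue
        printf '%s %s latency_us=%s disabled=%s\n' \
            "${dir##*/}" "$(cat "$dir/name")" \
            "$(cat "$dir/latency")" "$(cat "$dir/disable")"
    done
}

# show_epb: current energy_perf_bias value of every CPU that exposes it.
show_epb() {
    for f in "$SYSFS_CPU_ROOT"/cpu[0-9]*/power/energy_perf_bias; do
        [ -r "$f" ] || continue
        cpu="${f%/power/energy_perf_bias}"
        printf '%s epb=%s\n' "${cpu##*/}" "$(cat "$f")"
    done
}
```

After sourcing the file, `show_cstates cpu0` and `show_epb` can be run once on AC and once on DC to compare the two cases discussed here.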
Created attachment 307013 [details]
dmesg max_cstate=1
Created attachment 307014 [details]
dmesg max_cstate=2
(In reply to AceLan Kao from comment #8)
> /sys/devices/system/cpu/cpu*/power/energy_perf_bias was 8; setting it to 7
> helps, and the pause becomes very minor.
Who set this value to 8? Is the Dell BIOS doing this? Unless the BIOS changes it, the value should be 6.

I think those two sentences were separate. From what I observe, EPB is 8 on DC and 6 on AC. But who sets those EPB values? The OS didn't.

I got the answer: this is done by the Linux power profile daemon.

Please check whether EPB=7 fixes the issue.

No, EPB=7 doesn't fix the issue. You may still observe the behavior from time to time, especially after letting the system idle for a while: the strange pause while moving the cursor, or the abnormal keyboard behavior.

I also set EPB=7 on my side:
echo 7 | tee /sys/devices/system/cpu/cpu*/power/energy_perf_bias
I can still observe the lag. To summarize all symptoms, I observe system lag when:
- using the touchpad
- using the onboard keyboard
- typing over ssh
- playing music (wav, mp3) via apps (Rhythmbox, Audacious, video...)

Please check whether the following commands fix the issue.
1. Disable ACPI C3:
for i in {0..7}; do echo 1 > /sys/devices/system/cpu/cpu$i/cpuidle/state3/disable;done
2. Offline the E-cores:
for i in {4..7}; do echo 0 > /sys/devices/system/cpu/cpu$i/online;done
3. Change the power mode to Performance in Settings -> Power -> Power Mode.

I tried with the touchpad and music playback at the same time:
1. Made sure I observed the issue, then disabled ACPI C3 --> it works
for i in {0..7}; do echo 1 > /sys/devices/system/cpu/cpu$i/cpuidle/state3/disable;done
2. Rebooted, made sure I observed the issue, then offlined the E-cores --> it works
for i in {4..7}; do echo 0 > /sys/devices/system/cpu/cpu$i/online;done
3. Rebooted, made sure I observed the issue, then set Power Mode to Performance --> this does not work

Verified the tests below with AC connected (EPB = 6):
> 1. Disable ACPI C3
> for i in {0..7}; do echo 1 > /sys/devices/system/cpu/cpu$i/cpuidle/state3/disable;done
This works. Echoing 0 back to those files reproduces the issue quickly.
> 2. Offline E Core
> for i in {4..7}; do echo 0 > /sys/devices/system/cpu/cpu$i/online;done
This works, too. Echoing 1 back to those files reproduces the issue quickly.
> 3. Change the Power Mode in Settings -> Power -> Power Mode to Performance.
(EPB = 0) I still encountered the issue. It doesn't improve the situation; sometimes the pause is as serious as on DC power.

Can you try Linux 6.11.0-8-generic? In comment #3 you mentioned that 6.11 reproduces the issue, but I'd like to double-check whether this specific version reproduces it.

Yes, with 6.11.0-8 the issue is hard to reproduce, but you can still sometimes observe the pause while moving the cursor. The symptoms are much milder than with other kernels, but I won't say there is no issue with this kernel.

Can you help check whether the system can enter C3 or S0ix? BTW, where can I get the mainline v6.12-rc2 kernel? The builds seem to be updated only up to 6.11: https://kernel.ubuntu.com/mainline/

Yes, the Package C2, C6, and C10 counters increase in /sys/kernel/debug/pmc_core/package_cstate_show, and the slp_s0_residency_usec counter increases when suspended. We're moving our servers, so you may have to build the kernel yourself.

Created attachment 307180 [details]
test X86_BUG_MONITOR patch
Please try this patch.
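For anyone who cannot apply a patch yet, the two workarounds confirmed earlier in this thread (disabling state3 and offlining CPUs 4-7) can be made easy to switch on and off. This is only a sketch under assumptions: the helper names `set_state_disable` and `set_cpus_online` and the `SYSFS_CPU_ROOT` variable are made up for illustration, and the writes need root on a real system.

```shell
#!/bin/sh
# SYSFS_CPU_ROOT is an illustrative override, not something from the thread.
SYSFS_CPU_ROOT="${SYSFS_CPU_ROOT:-/sys/devices/system/cpu}"

# set_state_disable STATE VALUE: write VALUE (1 = disable, 0 = re-enable) to
# cpuidle/STATE/disable on every CPU that has a writable file for it.
set_state_disable() {
    for f in "$SYSFS_CPU_ROOT"/cpu[0-9]*/cpuidle/"$1"/disable; do
        [ -w "$f" ] || continue
        echo "$2" > "$f"
    done
}

# set_cpus_online VALUE CPU...: write VALUE (0 = offline, 1 = online) to the
# online file of each listed CPU number; CPUs without that file are skipped.
set_cpus_online() {
    value="$1"
    shift
    for n in "$@"; do
        f="$SYSFS_CPU_ROOT/cpu$n/online"
        [ -w "$f" ] || continue
        echo "$value" > "$f"
    done
}
```

With these helpers, `set_state_disable state3 1` and `set_state_disable state3 0` toggle the first workaround, and `set_cpus_online 0 4 5 6 7` followed by `set_cpus_online 1 4 5 6 7` toggles the second. The state index and CPU numbers are the ones used in this report and may differ on other machines.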
Created attachment 307184 [details]
test X86_BUG_MONITOR patch for Linux 6.10 and earlier
Same logical patch as above,
but this version applies to the syntax of Linux 6.10 and earlier,
while the version above applies to the syntax of 6.11 and later.
*** Bug 219477 has been marked as a duplicate of this bug. ***
Created attachment 307185 [details]
standalone program to detect migration stalls on LNL
The stand-alone program can detect stalls.
It uses sched_setaffinity(2) to migrate itself from cpu(4-7) to cpu(0-3),
using rdtsc to measure how long the migration takes.
The program will complain about migrations that take more than 1ms.
Sometimes it will detect stalls over 1000ms.
By default it migrates 1000 times, once every 250ms, running for 250 sec.
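The attached program itself is not reproduced here. As a much coarser approximation of the same idea, the sketch below moves the running shell between the two CPU sets with `taskset` and times each move with `date`. It assumes util-linux `taskset`, GNU `date` (for `%N`), and a `sleep` that accepts fractional seconds; the function names are illustrative. Because every measurement includes the fork/exec cost of `taskset` and `date`, a 1ms threshold is not meaningful here, and whether this variant exposes the same stalls has not been verified.

```shell
#!/bin/sh
# now_ns: wall-clock time in nanoseconds (assumes GNU date; %N is an extension).
now_ns() { date +%s%N; }

# elapsed_ms START_NS END_NS: whole milliseconds between two now_ns readings.
elapsed_ms() { echo $(( ($2 - $1) / 1000000 )); }

# migrate_and_time FROM_CPUS TO_CPUS: pin this shell to FROM_CPUS, then move it
# to TO_CPUS and store the duration of the second move in MIGRATE_MS.
# It must be called directly, not inside $(...), so that $$ is the process
# doing the measuring.
migrate_and_time() {
    taskset -pc "$1" $$ > /dev/null || return 1
    start_ns=$(now_ns)
    taskset -pc "$2" $$ > /dev/null || return 1
    end_ns=$(now_ns)
    MIGRATE_MS=$(elapsed_ms "$start_ns" "$end_ns")
}

# watch_migrations COUNT THRESHOLD_MS: repeat the cpu 4-7 -> cpu 0-3 move COUNT
# times, 250 ms apart, and report every move slower than THRESHOLD_MS.
watch_migrations() {
    i=0
    while [ "$i" -lt "$1" ]; do
        migrate_and_time 4-7 0-3 || return 1
        [ "$MIGRATE_MS" -gt "$2" ] && echo "migration $i took ${MIGRATE_MS} ms"
        sleep 0.25
        i=$((i + 1))
    done
}
```

For example, `watch_migrations 1000 100` mirrors the default run length described above while only reporting moves slower than 100 ms. The CPU ranges 4-7 and 0-3 are the ones named in this comment.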
Created attachment 307208 [details]
test X86_BUG_MONITOR patch
This patch applies to Linux-6.11 and later
Verified with the mainline v6.12 and Ubuntu 6.11 kernels: the patch fixes the issue. Thanks.

(In reply to Len Brown from comment #31)
> Created attachment 307208 [details]
> test X86_BUG_MONITOR patch
>
> This patch applies to Linux-6.11 and later
Fix verified on an Asus Vivobook S 2024 OLED. Thanks! |
Created attachment 306989 [details]
dmesg on DC

Kernel: v6.12-rc2
In the dmesg, you can see some abnormally huge delays while booting up, e.g.
[ 0.613464] pci_scan_bridge_extend: pci 0000:00:1c.0: scanning [bus 71-71] behind bridge, pass 1
[ 0.613471] pci_scan_child_bus_extend: pci_bus 0000:00: bus scan returning with max=71
[ 0.630696] ACPI: \_SB_.PEPD: Duplicate LPS0 _DSM functions (mask: 0x1)
[ 4.617497] Low-power S0 idle used by default for system suspend
[ 4.648486] ACPI: EC: interrupt unblocked
and
[ 8.911398] int3472-discrete INT3472:00: cannot find GPIO chip INTC10B5:00, deferring
[ 8.939251] int3472-discrete INT3472:00: GPIO type 0x12 unknown; the sensor may not work
[ 8.940853] int3472-discrete INT3472:00: cannot find GPIO chip INTC10B5:00, deferring
[ 10.210778] xe 0000:00:02.0: [drm] Reducing the compressed framebuffer size. This may lead to less power savings than a non-reduced-size. Try to increase stolen memory size if available in BIOS.
[ 11.078838] Creating 1 MTD partitions on "0000:00:1f.5":
[ 11.078854] 0x000000000000-0x000004000000 : "BIOS"
and
[ 11.900024] NET: Registered PF_QIPCRTR protocol family
[ 12.996969] iwlwifi 0000:00:14.3: RFIm is deactivated, reason = 4
[ 13.120034] iwlwifi 0000:00:14.3: Registered PHC clock: iwlwifi-PTP, with index: 0
[ 14.034314] systemd-journald[365]: /var/log/journal/a8e0c8bb11df4dac858c05d4c8ab7e6b/user-1000.journal: Journal file uses a different sequence number ID, rotating.
[ 14.351742] kauditd_printk_skb: 160 callbacks suppressed
The situation is milder on AC, but you can still feel it when using the machine after boot.
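Gaps like the one between 0.630696 and 4.617497 above can be picked out of a full log automatically. The following is a small sketch, assuming the default `[seconds.microseconds]` dmesg timestamp prefix; the function name `dmesg_gaps` and the 1-second default threshold are illustrative choices, not something used in this report.

```shell
#!/bin/sh
# dmesg_gaps [THRESHOLD_SECONDS]: read dmesg-style lines on stdin and print
# every line whose timestamp is more than THRESHOLD_SECONDS (default 1) later
# than the previous timestamped line, prefixed with the size of the gap.
dmesg_gaps() {
    awk -v thr="${1:-1}" '
        match($0, /^\[ *[0-9]+\.[0-9]+\]/) {
            ts = substr($0, 2, RLENGTH - 2) + 0   # text between "[" and "]"
            if (seen && ts - prev > thr)
                printf "gap %.3fs before: %s\n", ts - prev, $0
            prev = ts
            seen = 1
        }'
}
```

Typical use would be `dmesg | dmesg_gaps 1`, or `dmesg_gaps 0.5 < saved-dmesg.txt` for an attached log. Running it on logs from boots with different intel_idle.max_cstate values gives a quick way to compare where the boot stalls.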