Bug 217616

Summary: Worse battery life with 11th gen i3 in kernels after v5.18 (maybe even earlier)
Product: Platform Specific/Hardware
Reporter: prosparety
Component: x86-64
Assignee: platform_x86_64 (platform_x86_64)
Status: RESOLVED PATCH_ALREADY_AVAILABLE
Severity: normal
CC: bagasdotme, regressions
Priority: P3
Hardware: Intel
OS: Linux
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:

Description prosparety 2023-06-30 02:42:37 UTC
I have posted the same thing on the Debian user forums and am linking it here to reduce duplication:
https://forums.debian.net/viewtopic.php?t=154875

To summarise: power usage increases going from kernel 5.10.x to something newer (I have tried many kernels between 5.18.x and 6.3.x). Since these tests were done on kernels from many sources, it seems to be an upstream issue.

On newer kernels, if /sys/devices/system/cpu/intel_pstate/status is set to active (the kernel default), the CPU frequency is almost stuck at 2 GHz. If it is set to passive, the frequency drops to 0.4 GHz and increases when needed. In either case the power usage does not decrease, and it is higher than on the older kernel, where pstate works best and scaling happens as expected, sometimes even without tlp installed/enabled (this is a newer observation, made after my original post, following some tweaking and testing).

I have also tested all the other CPU scaling drivers with the newer kernel: acpi_cpufreq gave better scaling, whereas intel_cpufreq (intel_pstate in passive mode) drew the least power. I have almost no clue where to begin fixing this, and any help is appreciated (if the output of some command is required, kindly check the referenced Debian user forum post).
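
For reference, the intel_pstate mode and per-CPU frequencies described above can be read straight from sysfs. The following is only a minimal sketch: the function name show_scaling and its optional root-directory argument are illustrative and do not come from the report or the forum post.

```shell
#!/bin/sh
# show_scaling [CPU_ROOT]
# Prints the intel_pstate mode, then each CPU's governor and current frequency.
# CPU_ROOT is an illustrative parameter so the function can be pointed at a
# test directory; it defaults to the real sysfs location.
show_scaling() {
    root="${1:-/sys/devices/system/cpu}"
    # intel_pstate/status reads "active", "passive" or "off" when the driver is present
    if [ -r "$root/intel_pstate/status" ]; then
        echo "intel_pstate: $(cat "$root/intel_pstate/status")"
    else
        echo "intel_pstate: not available"
    fi
    for dir in "$root"/cpu[0-9]*/cpufreq; do
        [ -r "$dir/scaling_cur_freq" ] || continue
        cpu=$(basename "$(dirname "$dir")")
        khz=$(cat "$dir/scaling_cur_freq")              # reported in kHz
        gov=$(cat "$dir/scaling_governor" 2>/dev/null)
        echo "$cpu: $gov $((khz / 1000)) MHz"
    done
}
```

Calling show_scaling with no argument on both the old and the new kernel gives two listings that can be compared line by line.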

Thank you in advance, and please forgive my spelling and grammatical mistakes. I apologize if this is a duplicate issue (which may or may not have been resolved); I only searched for a few terms (intel pstate, cpu scaling, power draw) and did not read many of the results. I also apologize if this is not an upstream issue; in that case I will post elsewhere.
Comment 1 Bagas Sanjaya 2023-06-30 06:00:26 UTC
(In reply to sga from comment #0)

Can you bisect between v5.10 and v5.18 please?
Comment 2 prosparety 2023-06-30 08:30:02 UTC
I tried Lubuntu 22.04 because I knew it ships the 5.15 LTS kernel, and found two interesting pieces of information. htop still showed the frequencies stuck at 2 GHz, but powertop showed that the CPU was only sparingly reaching C6. To be sure, I installed tlp and copied the config file from my Debian install; the CPU then stayed in C6 71% of the time and power usage decreased to ~2.5 W. So the problem starts somewhere after 5.15 and no later than 5.18. This also confirms the findings of the user on the Mint forums, which I linked in my post on the Debian forum.

I will try to narrow it down further and will post if I find something.
Comment 3 prosparety 2023-06-30 09:24:36 UTC
I also tested Fedora 36 from the archive, with kernel 5.17.5; it is also broken. Is there any other distro with which I can check 5.16?
Comment 4 Bagas Sanjaya 2023-06-30 12:40:27 UTC
(In reply to sga from comment #3)
> also tested fedora 36 from archive, with kernel 5.17.5, it is also broken,
> is there any other distro from where i can check 5.16

I repeat: Can you please bisect (use git-bisect(1), see Documentation/admin-guide/bug-bisect.rst) between v5.10 and v5.18?
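
A bisection like the one requested here might look roughly as follows. This is only a sketch: it assumes an idle power reading in milliwatts is taken after booting each candidate kernel, and both the helper name bisect_verdict and the 3000 mW cut-off are hypothetical, not values from this report. The return codes follow the convention that "git bisect run" expects (0 good, 1 bad, 125 skip).

```shell
#!/bin/sh
# Outline of the manual workflow (run inside a kernel git tree):
#   git bisect start
#   git bisect bad v5.18      # a version reported to draw more power
#   git bisect good v5.10     # a version reported to behave
#   ...build and boot the commit git checks out, measure, then run
#   "git bisect good" or "git bisect bad" and repeat until git names a commit.

# bisect_verdict MILLIWATTS [THRESHOLD]
# Hypothetical helper: turns an idle power reading into a verdict.
bisect_verdict() {
    reading="$1"
    threshold="${2:-3000}"    # assumed cut-off in mW, not taken from the report
    case "$reading" in
        ''|*[!0-9]*) echo "skip"; return 125 ;;   # no usable measurement
    esac
    if [ "$reading" -le "$threshold" ]; then
        echo "good"; return 0
    fi
    echo "bad"; return 1
}
```

Keeping the same configuration and the same measurement procedure for every step matters more than the exact threshold, since the verdicts are only comparable if each kernel is tested the same way.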
Comment 5 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-06-30 12:43:50 UTC
Increased power usage is considered a regression that must be fixed. But to claim that, you need to bisect this with an upstream kernel, one configuration, and one distro (you really need to find the commit that made things worse, not the version); you also need to check whether the problem is still present in the latest upstream kernel: https://docs.kernel.org/admin-guide/reporting-regressions.html

Maybe some developer will look into this report without this. But I doubt it for reasons outlined here:
https://linux-regtracking.leemhuis.info/post/frequent-reasons-why-linux-kernel-bug-reports-are-ignored/

I know these are likely answers that you don't like to hear, but I think it's better to tell you this than to get no reply at all.
Comment 6 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-06-30 12:44:09 UTC
*** Bug 217615 has been marked as a duplicate of this bug. ***
Comment 7 prosparety 2023-06-30 21:45:48 UTC
I was about to mark it as a regression, but then I thought that if it turned out to be a duplicate, that would just be unnecessarily hyping the issue.

With the newest kernel, 6.4.x (compiled with the config file of Debian's 6.1 kernel, then make localyesconfig and make localmodconfig, then make -j7), it still does not scale and the CPU cannot reach C6, but power draw is pretty low: it dipped down to 3 W. The build did not have all drivers, including the one for my wifi (it is non-free), yet power draw was still pretty low.

I wanted to use bisect but I am not skilled enough, so I decided to do a manual binary search instead. I have already narrowed it down to 5.15 - 5.17 (I know the testing was done with different kernels and different configs, forgive me for that), so I will keep doing this, but I am having issues compiling. I downloaded 5.16.0, copied Debian's 5.10 config, and left all the new options at their defaults, except that I set warnings not to be treated as errors because of the first issue below.

make HOSTCC=gcc-11 CC=gcc-11 -j7

I need some help. I know this is not the correct place to ask, but I am having trouble compiling the kernel. The problem is that gcc throws some warnings that are treated as errors:
In file included from help.c:12:
In function ‘xrealloc’,
    inlined from ‘add_cmdname’ at help.c:24:2:
subcmd-util.h:56:23: error: pointer may be used after ‘realloc’ [-Werror=use-after-free]
   56 |                 ret = realloc(ptr, size);
      |                       ^~~~~~~~~~~~~~~~~~
subcmd-util.h:52:21: note: call to ‘realloc’ here
   52 |         void *ret = realloc(ptr, size);
      |                     ^~~~~~~~~~~~~~~~~~
subcmd-util.h:58:31: error: pointer may be used after ‘realloc’ [-Werror=use-after-free]
   58 |                         ret = realloc(ptr, 1);
      |                               ^~~~~~~~~~~~~~~
subcmd-util.h:52:21: note: call to ‘realloc’ here
   52 |         void *ret = realloc(ptr, size);
      |                     ^~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
This happened even though I had set warnings not to be treated as errors.

I searched for this and found that I should use gcc 11, so I used
make HOSTCC=gcc-11 CC=gcc-11 -j7
It still warns but moves on.

Then it compiled partway and failed with a security key mismatch, so I found that I should set
scripts/config --disable SYSTEM_TRUSTED_KEYS
scripts/config --disable SYSTEM_REVOCATION_KEYS

Now it almost completes, but at the end this appears:
ld: warning: arch/x86/power/hibernate_asm_64.o: missing .note.GNU-stack section implies executable stack
ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
  MODPOST vmlinux.symvers
  MODINFO modules.builtin.modinfo
  GEN     modules.builtin
  LD      .tmp_vmlinux.btf
ld: warning: arch/x86/power/hibernate_asm_64.o: missing .note.GNU-stack section implies executable stack
ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
ld: warning: .tmp_vmlinux.btf has a LOAD segment with RWX permissions
  BTF     .btf.vmlinux.bin.o
  LD      .tmp_vmlinux.kallsyms1
ld: warning: .btf.vmlinux.bin.o: missing .note.GNU-stack section implies executable stack
ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
ld: warning: .tmp_vmlinux.kallsyms1 has a LOAD segment with RWX permissions
  KSYMS   .tmp_vmlinux.kallsyms1.S
  AS      .tmp_vmlinux.kallsyms1.S
  LD      .tmp_vmlinux.kallsyms2
ld: warning: .btf.vmlinux.bin.o: missing .note.GNU-stack section implies executable stack
ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
ld: warning: .tmp_vmlinux.kallsyms2 has a LOAD segment with RWX permissions
  KSYMS   .tmp_vmlinux.kallsyms2.S
  AS      .tmp_vmlinux.kallsyms2.S
  LD      vmlinux
ld: warning: .btf.vmlinux.bin.o: missing .note.GNU-stack section implies executable stack
ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
ld: warning: vmlinux has a LOAD segment with RWX permissions
  BTFIDS  vmlinux
FAILED: load BTF from vmlinux: Invalid argument
make: *** [Makefile:1161: vmlinux] Error 255
make: *** Deleting file 'vmlinux'

I searched for this and found https://lore.kernel.org/lkml/CABXGCsN8LHqz7=OSvBpKCqKdV4L_4FPXtQ32bgYveA9yP2_xiQ@mail.gmail.com/T/

which suggested cross-compiling with a toolchain from https://mirrors.edge.kernel.org/pub/tools/crosstool/
but in the end the original poster in that thread could not complete the build either.
Comment 8 Artem S. Tashkinov 2023-07-01 11:15:46 UTC
You could try running powertop under working and broken kernels and see what changes.
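
Besides powertop, the kernel's cpuidle counters in sysfs can be compared between the two kernels. The sketch below assumes the standard cpuidle layout (stateN/name and stateN/time, with time in cumulative microseconds); the function name cstate_share is illustrative. Note that it reports each state's share of the total recorded idle time since boot, which is not the same figure as powertop's per-interval percentages.

```shell
#!/bin/sh
# cstate_share [CPUIDLE_DIR]
# Prints each idle state's share of the total idle time recorded under one
# CPU's cpuidle directory. "time" holds cumulative microseconds since boot.
cstate_share() {
    dir="${1:-/sys/devices/system/cpu/cpu0/cpuidle}"
    total=0
    for s in "$dir"/state[0-9]*; do
        [ -r "$s/time" ] || continue
        total=$((total + $(cat "$s/time")))
    done
    [ "$total" -gt 0 ] || { echo "no idle time recorded"; return 1; }
    for s in "$dir"/state[0-9]*; do
        [ -r "$s/time" ] || continue
        t=$(cat "$s/time")
        echo "$(cat "$s/name"): $((t * 100 / total))%"
    done
}
```

Running the same workload for the same duration after a fresh boot on each kernel keeps the two sets of numbers comparable.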
Comment 9 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-07-01 11:23:48 UTC
FWIW, I wonder if this might help:

https://lore.kernel.org/all/20230627062442.54008-1-mika.westerberg@linux.intel.com/
Comment 10 prosparety 2023-07-01 18:05:36 UTC
I apologize, but I think it is indeed fixed in the latest release candidate; sorry for misreporting. When I compiled yesterday, I did not compile properly, and my machine requires non-free wireless firmware, which I had not installed; that could have been part of why it was not entering deeper C-states. I can confirm that with 6.4-rc7 it is fixed.

I am really sorry; marking this as resolved.
Comment 11 prosparety 2023-07-02 18:35:08 UTC
I want to add something to this (forgive me for not testing thoroughly the first and second time): I think it is still not completely fixed. I tried 6.4.1 this time. The problem I am facing now is that the governor is not picking the correct power profile. Earlier (5.10), when not charging, my frequencies were limited to 1.5 GHz; now the limit is 2 GHz, which isn't an issue. But when charging, it used to turbo up to 3.7 GHz when needed (CPU-intensive task, when not throttled); now it does not and is still stuck at 2 GHz.

I also noticed that average power draw has increased. I have a small command that shows how much battery time is left; with similar workloads, where I see 7-8 hours on 5.10, I see 6 hours on the new kernel. To check whether this is real, I tried a somewhat sporadic load (typing for 20 seconds, then video playback for 1 minute, then a mix, then idle, then some Java compiles, then installing something, for a total of about 7 minutes (±10 seconds) on each kernel) with powertop monitoring C-states (next time I will try to log this properly). I can tell that the newer kernel enters C6 and C8 less often and spends a larger share of time in C3. On continuous loads (watching a video in full screen while doing nothing else, plain idle, or anything else consistent), the two kernels are within about ±3% of each other, which is totally okay.

I also encountered the bug (linked by Mr Thorsten above) where the system cannot enter lower C-states after suspending, but not consistently. I tried 3 times. The first occurred naturally: I was doing something and left the system with music playing in the background (the system goes to sleep after 10 minutes); when I came back, it could not enter C-states lower than C3. Then I restarted and suspended, and it entered lower C-states just fine. Then I repeated this, but let it suspend by itself after 10 minutes of idle (no music or anything), and it again entered lower C-states just fine.
So possibly that bug now happens only if there was some load.
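
For anyone wanting to reproduce the "time left" comparison, an estimate can be derived from the battery's sysfs attributes. This is not the reporter's actual command, which the report does not show. It is a minimal sketch that assumes a battery named BAT0 exposing energy_now (µWh) and power_now (µW); the function name is illustrative.

```shell
#!/bin/sh
# battery_minutes_left [BATTERY_DIR]
# Estimates remaining runtime in minutes from energy_now (µWh) and
# power_now (µW). BAT0 is an assumed battery name; some batteries expose
# charge_now and current_now instead, which this sketch does not handle.
# The result is only meaningful while the battery is discharging.
battery_minutes_left() {
    bat="${1:-/sys/class/power_supply/BAT0}"
    [ -r "$bat/energy_now" ] && [ -r "$bat/power_now" ] || return 1
    energy=$(cat "$bat/energy_now")
    power=$(cat "$bat/power_now")
    [ "$power" -gt 0 ] || return 1      # no draw reported; avoid dividing by zero
    echo $((energy * 60 / power))
}
```

Because power_now fluctuates from second to second, averaging several readings taken during the same workload gives a steadier comparison between kernels than a single sample.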

On a (I guess) unrelated note, internet (wifi) has also taken a hit: it sometimes just does not work or works very slowly. This is only noticeable when visiting a new site or refreshing the package repos, and not all the time. It is not very serious, though; it adds at most a 2-3 second delay, and the speed in the end is still the same.
Comment 12 prosparety 2023-08-02 13:30:52 UTC
I am sorry I had not marked this resolved yet, mostly because I had stopped trying. As of today, with 6.4.7, both CPU scaling and idle states are working. I also quantitatively measured the average percentage of time spent in C6 idle during the same tasks as mentioned in my previous post, with both kernels, and I was wrong: the difference is mostly negligible (33% on 5.10.172 vs 31% on 6.4.7). I am sorry for being paranoid. The wifi issue is also fixed (I had set up power saving for my wifi, which caused that delay; it used to work as is on the older kernel, but on the newer kernel it seems to add a small delay; I removed that setting and now everything is fine).

Once again, thank you, Linux kernel dev team.