Bug 216347
Summary: | ACPI power profile keeps switching to "low-power" on ThinkPad T14 Gen1 (AMD) | ||
---|---|---|---|
Product: | Drivers | Reporter: | madcatx |
Component: | Platform_x86 | Assignee: | Mark Pearson (mpearson-lenovo) |
Status: | ASSIGNED --- | ||
Severity: | normal | CC: | agurenko, binli, fkrueger, mario.limonciello, mpearson-lenovo, platform_x86_64, postix |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 5.19 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
dmesg with dyndbg
Patch to W/A default profile wrong |
Description
madcatx
2022-08-10 09:08:22 UTC
Do you have power-profiles-daemon installed? That's the most likely cause to this behavior I would expect. In general it's probably better to change the profile using the d-bus API with PPD if you have PPD installed so you don't have two things fighting over the same file. FYI - technically there are different profiles for Lenovo AMD systems on AC adapter or battery but thinkpad_acpi doesn't export them right now. If this changes in the future you may want to keep an eye on it because your udev rules won't make sense anymore. I don't have PPD installed and a quick look around didn't reveal anything that would mess with the power profile. What's the expected power_profile value after boot when nothing touches it? I can try to boot off a lean Live distro and check. I tried to remove the power_profile toggle from the script I trigger through udev rules to check that it wasn't somehow misbehaving but it made no difference. If PPD isn't the cause maybe it's the firmware causing it? If you can modify the kernel driver, can you please add some debugging statement here to see if a related event is coming through: https://github.com/torvalds/linux/blob/master/drivers/platform/x86/thinkpad_acpi.c#L11023 I modded the driver to log the events, here's the output. Note that I log all events and have an extra output for the THM event so the output is a bit noisier that necessary. [ 110.307214] TPACPI event id 24626 * Manual switch to performance [ 110.307221] TPACPI event THM_CSM_COMPLETED [ 110.308287] TPACPI event id 4114 * Drop to low power [ 125.374888] TPACPI event id 24626 [ 125.374894] TPACPI event THM_CSM_COMPLETED * Manual switch to performance [ 178.560859] TPACPI event id 24626 [ 178.560862] TPACPI event THM_CSM_COMPLETED * Drop to low power [ 321.588092] TPACPI event id 24626 [ 321.588100] TPACPI event THM_CSM_COMPLETED * Manual switch to performance [ 340.418211] TPACPI event id 24626 [ 340.418215] TPACPI event THM_CSM_COMPLETED * Manual switch to performance [ 344.953610] TPACPI event id 24626 [ 344.953614] TPACPI event THM_CSM_COMPLETED * Drop to low power [ 350.357321] TPACPI event id 24626 [ 350.357328] TPACPI event THM_CSM_COMPLETED * Manual switch to performance [ 352.865566] TPACPI event id 24626 [ 352.865573] TPACPI event THM_CSM_COMPLETED * Drop to low power [ 362.030351] TPACPI event id 24626 [ 362.030355] TPACPI event THM_CSM_COMPLETED There doesn't seem to be anything hammering the toggle, it just switches itself back to "low-power". There doesn't seem to be any apparent pattern that would provoke this. All I can tell is that the power profile seems to switch itself back from "performance" to "low-power" after some time. It can take anywhere from 5 to 70 seconds for this to happen. It also looks like the "balanced" profile seems to stick but I need to verify this better. > There doesn't seem to be anything hammering the toggle, it just switches > itself back to "low-power". Well it at least proves this is happening from firmware/hardware and not (obviously) from your own doing in software. I think it still needs further characterization. > There doesn't seem to be any apparent pattern that would provoke this. Just as a guess; could this be related to lap mode? There is a lap mode value stored in `dytc_lapmode`. Can you add that to your output? @Mark maybe you can help explain this? If nothing else I think it would be good to extend some dev_info messaging when this is happening. /sys/devices/platform/thinkpad_acpi/dytc_lapmode does exist and reports 0. I tried to rest the computer on my lap for a while but the value didn't change. To clarify, I observe this behavior with the computer sitting on my desk with the AC and an external display plugged in. I haven't checked if having the computer disconnected from everything will make a difference but I guess that it won't. ATM I'm pretty confident that the "balanced" power profile is not getting reset to "low-power", it's just the "performance" mode that behaves oddly. > /sys/devices/platform/thinkpad_acpi/dytc_lapmode does exist and reports 0. > I tried to rest the computer on my lap for a while but the value didn't > change. That's odd - I would expect the sensor to report a change moving to your lap. But it's good you confirmed the lap movement specifically isn't the cause. > it's just the "performance" mode that behaves oddly. Another guess - could be that you hit a thermal threshold while in performance mode and the firmware is bringing you to low power mode to cool? Thermal protection was my first idea but it happens when the laptop is idling. I just checked with no CPU load, stopped fan and CPU temp at 49 °C. Can you turn on dynamic debugging for all thinkpad_acpi and share the full log from startup until this happens? Created attachment 301551 [details]
dmesg with dyndbg
Is this dump sufficient or are the more toggles I should switch on?
Before I enabled dyndbg, I wanted to test a theory that the FW is running some kind of a watchdog that flips the profile periodically. I ran a script that probed the respective sysfs node checking for "low-power" value, flipped it back to "performance" and measured time between each occurrence. Naturally, it happened only once for the entire time I ran the script and I had it running for at least 30 minutes.
As you can probably observe from the dmesg dump, it took about 23 minutes for the profile to switch itself to "low-power".
Oddly enough, the output of the script now looks like this:
It took 3.764500002034765e-05 seconds for the value to switch to "low-power"
It took 41.054596489999994 seconds for the value to switch to "low-power"
It took 57.07886033 seconds for the value to switch to "low-power"
It took 51.06944546 seconds for the value to switch to "low-power"
It took 118.16089479499988 seconds for the value to switch to "low-power"
It took 7.020622905999971 seconds for the value to switch to "low-power"
It took 19.033508302999962 seconds for the value to switch to "low-power"
It took 15.02578347900021 seconds for the value to switch to "low-power"
It took 114.14982450999992 seconds for the value to switch to "low-power"
It took 7.022350104999987 seconds for the value to switch to "low-power"
and the corresponding dmesg output:
[ 1505.869566] [6] thinkpad_acpi: EC reports: Thermal Control Command set completed (DYTC)
[ 1505.869574] TPACPI event id 24626
[ 1505.869577] TPACPI event THM_CSM_COMPLETED
[ 1562.724028] [6] thinkpad_acpi: EC reports: Thermal Control Command set completed (DYTC)
[ 1562.724036] TPACPI event id 24626
[ 1562.724038] TPACPI event THM_CSM_COMPLETED
[ 1562.948446] [6] thinkpad_acpi: EC reports: Thermal Control Command set completed (DYTC)
[ 1562.948454] TPACPI event id 24626
[ 1562.948456] TPACPI event THM_CSM_COMPLETED
[ 1613.144708] [6] thinkpad_acpi: EC reports: Thermal Control Command set completed (DYTC)
[ 1613.144716] TPACPI event id 24626
[ 1613.144718] TPACPI event THM_CSM_COMPLETED
[ 1614.017988] [6] thinkpad_acpi: EC reports: Thermal Control Command set completed (DYTC)
[ 1614.017995] TPACPI event id 24626
[ 1614.017997] TPACPI event THM_CSM_COMPLETED
[ 1732.062505] [6] thinkpad_acpi: EC reports: Thermal Control Command set completed (DYTC)
[ 1732.062512] TPACPI event id 24626
[ 1732.062515] TPACPI event THM_CSM_COMPLETED
[ 1732.181973] [6] thinkpad_acpi: EC reports: Thermal Control Command set completed (DYTC)
[ 1732.181981] TPACPI event id 24626
[ 1732.181983] TPACPI event THM_CSM_COMPLETED
[ 1738.888780] [6] thinkpad_acpi: EC reports: Thermal Control Command set completed (DYTC)
[ 1738.888788] TPACPI event id 24626
[ 1738.888791] TPACPI event THM_CSM_COMPLETED
[ 1739.198088] [6] thinkpad_acpi: EC reports: Thermal Control Command set completed (DYTC)
[ 1739.198096] TPACPI event id 24626
[ 1739.198098] TPACPI event THM_CSM_COMPLETED
[ 1757.413039] [6] thinkpad_acpi: EC reports: Thermal Control Command set completed (DYTC)
[ 1757.413047] TPACPI event id 24626
[ 1757.413049] TPACPI event THM_CSM_COMPLETED
[ 1758.229944] [6] thinkpad_acpi: EC reports: Thermal Control Command set completed (DYTC)
[ 1758.229952] TPACPI event id 24626
[ 1758.229954] TPACPI event THM_CSM_COMPLETED
[ 1772.953590] [6] thinkpad_acpi: EC reports: Thermal Control Command set completed (DYTC)
[ 1772.953598] TPACPI event id 24626
[ 1772.953600] TPACPI event THM_CSM_COMPLETED
[ 1773.258104] [6] thinkpad_acpi: EC reports: Thermal Control Command set completed (DYTC)
[ 1773.258112] TPACPI event id 24626
[ 1773.258115] TPACPI event THM_CSM_COMPLETED
[ 1887.298581] [6] thinkpad_acpi: EC reports: Thermal Control Command set completed (DYTC)
[ 1887.298588] TPACPI event id 24626
[ 1887.298591] TPACPI event THM_CSM_COMPLETED
[ 1887.410913] [6] thinkpad_acpi: EC reports: Thermal Control Command set completed (DYTC)
[ 1887.410920] TPACPI event id 24626
[ 1887.410921] TPACPI event THM_CSM_COMPLETED
[ 1893.848321] [6] thinkpad_acpi: EC reports: Thermal Control Command set completed (DYTC)
[ 1893.848329] TPACPI event id 24626
[ 1893.848332] TPACPI event THM_CSM_COMPLETED
[ 1894.428211] [6] thinkpad_acpi: EC reports: Thermal Control Command set completed (DYTC)
[ 1894.428218] TPACPI event id 24626
[ 1894.428220] TPACPI event THM_CSM_COMPLETED
> pcie_aspm=force I've mentioned this in other bug reports I've noticed it. Just FYI, this is not generally safe to do and you may compromise system stability. I would suggest that if you identify a device that is not configured for ASPM properly by the BIOS that you raise it with the OEM to investigate changing the configuration in the BIOS. They can validate the safety and efficacy of such a change using the proper equipment. > [511] thinkpad_acpi: PSC is supported OK good; this is what I wanted to confirm that you didn't inadvertently get support for both MMC and PSC simultaneously (which would be a BIOS bug) > As you can probably observe from the dmesg dump, it took about 23 minutes for > the profile to switch itself to "low-power". Kernel timestamps are hard to correlate to "real time" unless you use `dmesg -H` / `dmesg -T` I'm curious - if you let the firmware transition you from performance to low power and then you "leave it there", does it go back to performance after some time? I think we'll need to wait for Mark's guidance if this is a FW bug or thinkpad_acpi bug. I've never observed the profile changing back to performance. It's hard to draw any solid conclusions because the problem is so erraric. I did notice something else though. Sometimes after a cold boot the profile value will stay in performance for a very long time. To get the problem to manifest, I tried to poke the FW by running "sensors". That triggered the drop to low-power and the switching cycle began. This doesn't mean that each run or sensors is followed by a switch to low-power but I'm wondering if the temperature or fan speed readout probes the FW in a wrong way and causes this. After a few more days of observation, I can add just a few things: - After a reboot, the laptop always seems to start in "low-power" mode. I don't see any DYTC events in dmesg unless I change the profile through sysfs. Even a cold boot with the AC adapter plugged in defaults to "low-power" profile. Is this supposed to happen? - "balanced" profile seems to stick. Created attachment 301594 [details]
Patch to W/A default profile wrong
That doesn't seem expected to me. I think it's most likely a second BIOS bug that Mark should raise with the Lenovo team, but I think it's possible for thinkpad_acpi to W/A it directly.
Try this attached patch. It should help the default start mode, but the automatically switching modes will still need Lenovo's help and investigation.
Sorry all - I was off last week and struggling to catch up with the Inbox and everything on the todo list so only just caught this thread. I'll have a look, see if I can reproduce, and follow up with the FW team - I have internal ticket LO-1960 for tracking this issue. It does sound wrong and like it's not behaving as it should. Just for my nosiness - can you share your udev script? I tried previously to get some patches accepted to do profile switching depending on battery/AC but they were rejected and I haven't revisited them. The PSC profiles are supposed to be different depending on power mode (that's what Windows does) Give me a nag if I haven't posted an update by next week as it's a bit crazy this week :) Mark > The PSC profiles are supposed to be different depending on power mode
> (that's what Windows does)
Maybe that's the root of this problem. Because there aren't mappings for the battery vs AC profile in thinkpad_acpi something is going wrong.
(In reply to Mark Pearson from comment #15) > Sorry all - I was off last week and struggling to catch up with the Inbox > and everything on the todo list so only just caught this thread. > > I'll have a look, see if I can reproduce, and follow up with the FW team - I > have internal ticket LO-1960 for tracking this issue. > > It does sound wrong and like it's not behaving as it should. Just for my > nosiness - can you share your udev script? I tried previously to get some > patches accepted to do profile switching depending on battery/AC but they > were rejected and I haven't revisited them. The PSC profiles are supposed to > be different depending on power mode (that's what Windows does) > > Give me a nag if I haven't posted an update by next week as it's a bit crazy > this week :) > > Mark Thanks for bringing this to Lenovo techs. My power script is a very simple thing: #! /bin/bash case "$1" in 0) echo powersupersave > /sys/module/pcie_aspm/parameters/policy echo 1 > /sys/module/snd_hda_intel/parameters/power_save echo low-power > /sys/firmware/acpi/platform_profile iwconfig wlp3s0 power on iwconfig wlp3s0 txpower 18 ethtool -s enp5s0 wol d ethtool -s enp2s0f0 wol d ;; 1) echo performance > /sys/module/pcie_aspm/parameters/policy echo 1 > /sys/module/snd_hda_intel/parameters/power_save echo balanced > /sys/firmware/acpi/platform_profile iwconfig wlp3s0 txpower 20 ethtool -s enp5s0 wol d ethtool -s enp2s0f0 wol d ;; esac triggered with SUBSYSTEM=="power_supply", ATTR{online}=="0", RUN+="/opt/toggle_pwr 0" SUBSYSTEM=="power_supply", ATTR{online}=="1", RUN+="/opt/toggle_pwr 1" It's mostly a legacy from times where Linux needed some extra help to get decent PM on laptops. @Mario: Thanks for the patch, I'll give it a go! The patch to set "balanced" profile at the end dytc init seems to work. OK, I sent out that W/A patch. Let's wait for Mark to come back about the other part of the issue with it switching on it's own. I did a bit more digging into this I can't reproduce on my system (Fedora36 latest, and also with the latest 6.0-rc1 kernel). I did confirm that it comes up in default mode and then Gnome goes and sets it to whatever mode was last chosen (interestingly it seems to be retained over reboot). What's slightly interesting from my point of view is the FW isn't in PSC mode by default - so when gnome first sets the mode it's also enabling PSC. I think that means Mario's patch is reasonable (I'll post to the platform-x86 shortly Mario). It essentially forces the FW into PSC mode if that's supported. The only caveat is that it really shouldn't be needed as Gnome will do the same - but I suspect it depends a lot on your distro (or desktop environment) how good this is. Apologies if I've missed it in the above - but what are you running? I did find that if lapmode was activated that it would switch from performance to balanced (though on my system I'm finding it incredibly hard to trigger the lapmode sensor...but it's a prototype system and that particular sensor is out of SW control so I'm putting it down to HW). As you confirmed the lapmode sensor wasn't involved though I'm ignoring it for now. I'm also not sure in the script what effect these two lines have: echo performance > /sys/module/pcie_aspm/parameters/policy echo 1 > /sys/module/snd_hda_intel/parameters/power_save The snd_hda_intel line seems particularly peculiar. Do you see the issue if you're not using your script and just have the system running regularly and toggle only the profile manually? I'll mess around with this a bit more and see if I can reproduce it. > What's slightly interesting from my point of view is the FW isn't in PSC mode > by default - so when gnome first sets the mode it's also enabling PSC. That's quite an interesting result actually. So that means that the system as it boots up probably is completely "untuned" but the act of setting balanced mode will apply a "balanced" tuning. > The only caveat is that it really shouldn't be needed as Gnome will do the > same - but I suspect it depends a lot on your distro (or desktop environment) > how good this is In general I think it's best to not plan the kernel driver to assume that userspace has/will touch the knob, especially if it means better performance out of the box. Not all the distros use PPD etc. (In reply to Mark Pearson from comment #20) > I did a bit more digging into this > > I can't reproduce on my system (Fedora36 latest, and also with the latest > 6.0-rc1 kernel). > I did confirm that it comes up in default mode and then Gnome goes and sets > it to whatever mode was last chosen (interestingly it seems to be retained > over reboot). What's slightly interesting from my point of view is the FW > isn't in PSC mode by default - so when gnome first sets the mode it's also > enabling PSC. > > I think that means Mario's patch is reasonable (I'll post to the > platform-x86 shortly Mario). It essentially forces the FW into PSC mode if > that's supported. The only caveat is that it really shouldn't be needed as > Gnome will do the same - but I suspect it depends a lot on your distro (or > desktop environment) how good this is. Apologies if I've missed it in the > above - but what are you running? I'm not Arch so I don't get almost anything preinstalled. > > I did find that if lapmode was activated that it would switch from > performance to balanced (though on my system I'm finding it incredibly hard > to trigger the lapmode sensor...but it's a prototype system and that > particular sensor is out of SW control so I'm putting it down to HW). As you > confirmed the lapmode sensor wasn't involved though I'm ignoring it for now. Assuming that the value of "/sys/devices/platform/thinkpad_acpi/dytc_lapmode" is supposed to be 1 when the sensor detects a "lap mode", I found it impossible to trigger it myself. I'm not sure what the trigger conditions are so maybe I just wasn't doing the right thing. > > I'm also not sure in the script what effect these two lines have: > echo performance > /sys/module/pcie_aspm/parameters/policy > echo 1 > /sys/module/snd_hda_intel/parameters/power_save > The snd_hda_intel line seems particularly peculiar. > Do you see the issue if you're not using your script and just have the > system running regularly and toggle only the profile manually? Audio chip power saving would be a very unusual suspect. I've removed the enforced ASPM policy from the kernel command and trimmed down the power script to only disable WOL and reduce radio power when on battery. The issue remains though. Is there any update on this? Over the time I also noticed some less-than-expected power profile switches after wakeups from sleep. Kernel 6.0 should drop in a few days so I can do a bit more comprehensive test with that if you're interested. |