Bug 217076 - Charging causes high CPU usage on LG Gram laptops series Z90Q
Summary: Charging causes high CPU usage on LG Gram laptops series Z90Q
Status: NEW
Alias: None
Product: ACPI
Classification: Unclassified
Component: ACPICA-Core
Hardware: All Linux
Importance: P1 normal
Assignee: acpi_acpica-core@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-02-23 13:40 UTC by RobinLabadie
Modified: 2024-03-08 11:08 UTC
CC List: 11 users

See Also:
Kernel Version: 6.1 - 6.3
Subsystem:
Regression: No
Bisected commit-id:



Description RobinLabadie 2023-02-23 13:40:47 UTC
Hello,

This issue has been spotted and reported on Ubuntu forums, and users encounter the same abnormal behavior with at least Ubuntu, Fedora and Clear Linux.
Therefore, it appears to be kernel-related rather than specific to any distro's kernel build. I believe it should be forwarded here, but since I'm new, please pardon me if it shouldn't be posted here or posted this way.

To sum up, symptoms are the following:
- The original post reported that this issue occurred when external monitors were connected via USB-C.
- Charging an LG Gram laptop (14ZB90Q, 16Z90Q, 17Z90Q) via USB-C (standalone charger or docking station) causes significant and abnormal CPU utilization and therefore heat and noise.
- Using default USB-C charger, the issue seems to stop once the laptop is 100% charged.
- Using Thunderbolt USB-C docking station, the issue seems to remain regardless of the charging percentage.
- One person reported that booting to Windows 11 then rebooting to Linux appears to fix the issue for them.

Processes with high CPU utilization on Fedora:
[kworker/u32:8+USBC000:00-con0]
[kworker/0:0+events]
[kworker/0:3+kacpi_notify]

Processes with high CPU utilization on Ubuntu:
kworker/u32:15+USBC000:00-con0
kworker/0:1+kacpi_notify
kworker/u32:2-events_power_efficient
kworker/0:0-events
kworker/1:2+kec_query
kworker/0:3-events
kworker/15:2-events
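
A quick way to collect a list like the ones above is to filter `ps` output for kworker threads. This is only a minimal sketch: the `busy_kworkers` helper name and the 1% threshold are illustrative assumptions, not something taken from the reports.

```shell
# busy_kworkers: hypothetical helper name. Reads "PID %CPU COMMAND" lines on
# stdin and prints only kworker threads at or above a CPU threshold.
busy_kworkers() {
    # $1 is the minimum %CPU to report (assumed default: 1)
    awk -v min="${1:-1}" '$3 ~ /kworker\// && ($2 + 0) >= (min + 0)'
}

# Live use: ten busiest kworker threads (skipped quietly if ps is unavailable).
if command -v ps >/dev/null 2>&1; then
    ps -eo pid,pcpu,comm --sort=-pcpu 2>/dev/null | busy_kworkers 1 | head -n 10 || true
fi
```

Running it once on battery and once while charging should make the difference between the two states easy to compare.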


More technical details and logs on the original topic from "rustyx" on Ubuntu support forum: 
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1987829

Best regards
Comment 1 Cristian Cocos 2023-02-23 16:58:13 UTC
My experience with Clear Linux and Ubuntu:

- Setup: external monitors connected via TB4 docking station (make and model does not matter, it happens with all TB4 docks I got) to an LGGram laptop (I have two recent different models (2021 and 2022), happens on both);
- Symptoms: pretty much all symptoms mentioned in the link posted above (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1987829). The most glaring symptoms are: (a) overheating--CPU temperature hovers around 90C--and (b) syslog spamming;
- Palliative "cure":
       1. run the `sudo rmmod int3403_thermal` command sometime after boot (~1min) either manually or automatically: this stops the kernel spamming, though it does not deal w/the overheating;
       2. blacklisting the int3403_thermal module fixes both issues, at the expense of very high CPU utilization(!), which renders the whole system pretty much unusable.

More details:
- https://community.clearlinux.org/t/execute-command-upon-reboot-systemd-timer/8484/10
- https://www.reddit.com/r/linuxhardware/comments/x97m6l/comment/j2r7irr/?utm_source=reddit&utm_medium=web2x&context=3
Comment 2 Pixel 2023-04-10 10:34:42 UTC
Hello,

Here is my experience. I am using an LG Gram 14Z90Q-G.AA76B and have faced high CPU usage from kworker threads on every distro I have tried (Fedora, ArchLinux, VanillaOS, Ubuntu). From my experience, there is a somewhat reliable way to temporarily stop the high CPU usage. I don't dual-boot Windows, so I cannot test the suggestion of booting into Windows and then rebooting to Linux. Rather, I have noticed the following pattern:

1. Plug In USB-C cable (charging/HDMI)
2. Run "top" to verify that 3 kworker threads appear with high CPU usage. The fan will eventually kick in
3. Suspend the laptop (close the lid, press power button, press Fn+s, or manually suspend) and wait for the fan to completely turn off. Optionally unplug and plug in USB-C cable (sometimes helps).
4. Wake device from sleep.
5. Run "top" again, kworkers will be gone

Now this does not work 100% of the time, as sometimes after waking from sleep the device still keeps the kworker threads up and running, but eventually after doing this they disappear until the next reboot. This workaround gets the job done for me, as long as I don't have to reboot often.
Comment 3 Víctor 2023-04-15 07:32:39 UTC
Hello,

I am experiencing similar issues. I have the LG Gram 17-ZD90Q. I am on Archlinux-KDE and I dual-booted with Windows in order to check whether the problem was related to the kernel or to the BIOS. Windows apparently works fine, so it is something related to Linux. I tried several Linux kernels but none of them worked. Right now I'm on linux-zen.
One day I did something (probably a full package update) and the fans slowed down, as did the CPU usage... That was surprising, but on the next boot the problem reappeared. :(

Also, I cannot suspend my laptop, because when I try to, the laptop freezes... And I think that problem is related to the CPU utilization too, because the one time the error magically disappeared I could suspend my laptop without any problems.

I thought the error could come from inappropriate handling of Intel gen12 in the Linux kernels (6.xxx), but they are supposed to handle Intel gen13 well, so that makes me doubt...

As this problem is also related with the large number of acpi interrupts, some people say that you should mask (at the boot-up) the gpeXX with a high number of interrupts that you get with the command

grep -r enabled /sys/firmware/acpi/interrupts

But that's not a solution!!! It may reduce the high CPU usage, but on the other hand, other important interrupts that should be handled 'instantaneously' (for example, key presses) also get masked... As a result, keys such as the ones for increasing the brightness or the capital letters one do not work properly.
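
To see which GPE is actually flooding, the counters can be sorted by event count instead of read by eye. The following is a sketch that assumes each `gpeXX` file starts with its event count; the `busiest_gpes` helper name and the default of five entries are made up for illustration.

```shell
# busiest_gpes: hypothetical helper. Prints "<count> <name>" for the N busiest
# GPE counters in an ACPI interrupts directory, highest count first.
busiest_gpes() {
    dir="${1:-/sys/firmware/acpi/interrupts}"
    n="${2:-5}"
    for f in "$dir"/gpe[0-9A-F][0-9A-F]; do
        [ -r "$f" ] || continue
        # Each file starts with the event count, followed by status flags.
        read -r count _flags < "$f" || true
        printf '%s %s\n' "$count" "${f##*/}"
    done | sort -rn | head -n "$n"
}

# Live use, only where the sysfs directory exists.
if [ -d /sys/firmware/acpi/interrupts ]; then
    busiest_gpes /sys/firmware/acpi/interrupts 5
fi
```

Running it twice a few seconds apart shows which counter is still climbing, which is more telling than the absolute value.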

Any kind of help is welcome!! Thank you.
Comment 4 Cherrot 2023-05-26 07:15:59 UTC
(In reply to Pixel from comment #2)

> Rather, I have noticed the following pattern:
> 
> 1. Plug In USB-C cable (charging/HDMI)
> 2. Run "top" to verify that 3 kworker threads appear with high CPU usage.
> The fan will eventually kick in
> 3. Suspend the laptop (close the lid, press power button, press Fn+s, or
> manually suspend) and wait for the fan to completely turn off. Optionally
> unplug and plug in USB-C cable (sometimes helps).
> 4. Wake device from sleep.
> 5. Run "top" again, kworkers will be gone
> 
> Now this does not work 100% of the time, as sometimes after waking from
> sleep the device still keeps the kworker threads up and running, but
> eventually after doing this they disappear until the next reboot. This
> workaround gets the job done for me, as long as I don't have to reboot often.

Thanks for sharing! This works for me too. I have 

In my experience the key point here seems to be to wait for several minutes and unplug the cable after suspending. Otherwise the 3 kworker threads will remain in `top`.
Comment 5 Cherrot 2023-05-26 07:19:05 UTC
... I have an LG Gram 16Z90Q-G.CD78C, which runs Arch Linux with kernel 6.3.3-arch1-1
Comment 6 Víctor 2023-05-27 08:41:57 UTC
I found a similar but different workaround.

1- Boot into your linux as always (plugged-in or not, I think it doesn't matter).
2- Immediately after logging into your desktop, suspend the laptop (I did it by closing the lid).
3- Wait for a few seconds (1 or 2) and wake it again from sleep. 
4- The kworkers shouldn't appear now.

It seems that this workaround prevents the kworkers from initiating during startup. Once the laptop is suspended, they are unable to appear again, for some unknown reason.
Comment 7 eric 2023-06-02 12:48:09 UTC
Same issues with LG Gram 17Z90R-A.ADB9U1 tested with debian 6.1.0-9 and torvalds/linux.git 6.4.0-rc4.

When charging via USB-C I'm getting ACPI error messages in dmesg and noticeably high CPU usage with kworker processes like others above.

ACPI Error: Aborting method \_SB.PC00.LPCB.LGEC.SEN1._TMP due to previous error (AE_NOT_EXIST) (20220331/psparse-529)
ACPI Error: Aborting method \_SB.PC00.LPCB.LGEC.SEN2._TMP due to previous error (AE_NOT_EXIST) (20220331/psparse-529)
ACPI Error: No handler for Region [XIN1] (000000004f00cfe9) [UserDefinedRegion] (20220331/evregion-130)
ACPI Error: Region UserDefinedRegion (ID=143) has no handler (20220331/exfldio-261)
thermal thermal_zone7: failed to read out thermal zone (-5)
thermal thermal_zone7: failed to read out thermal zone (-61)
	with thermal_zone 2,3,4,6,7 ...
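
The "failed to read out thermal zone" messages suggest that some zones return an error when polled. A sketch for listing them from user space follows, assuming the usual sysfs layout under /sys/class/thermal; the `failing_thermal_zones` helper name is hypothetical.

```shell
# failing_thermal_zones: hypothetical helper. Prints "<zone> <type>" for every
# thermal zone whose temperature file cannot be read.
failing_thermal_zones() {
    base="${1:-/sys/class/thermal}"
    for z in "$base"/thermal_zone*; do
        [ -d "$z" ] || continue
        if ! cat "$z/temp" >/dev/null 2>&1; then
            ztype=$(cat "$z/type" 2>/dev/null || echo unknown)
            printf '%s %s\n' "${z##*/}" "$ztype"
        fi
    done
}

# Prints nothing when every zone reads cleanly or the directory is absent.
failing_thermal_zones /sys/class/thermal
```

The zone type printed next to each failing zone may help match it to the sensor names in the ACPI errors.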

Workarounds for now:
- Blacklisting module int3403_thermal prevents the ACPI and thermal error messages.
- Booting with kernel parameter "acpi_mask_gpe=0x6E" prevents the high CPU usage.
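
After editing the boot loader configuration it is worth confirming that the running kernel really received the parameter. The sketch below only recognises the exact single-value form `acpi_mask_gpe=<value>`; the `cmdline_masks_gpe` helper name is an assumption for illustration.

```shell
# cmdline_masks_gpe: hypothetical helper. Succeeds if the given kernel command
# line file contains acpi_mask_gpe=<gpe> (compared case-insensitively).
# It does not parse the bitmap-list form of the parameter.
cmdline_masks_gpe() {
    gpe="$1"
    file="${2:-/proc/cmdline}"
    tr ' ' '\n' < "$file" | grep -qix "acpi_mask_gpe=$gpe"
}

if [ -r /proc/cmdline ] && cmdline_masks_gpe 0x6E; then
    echo "GPE 0x6E is masked on the kernel command line"
else
    echo "GPE 0x6E is not masked on the kernel command line"
fi
```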
Comment 8 PinkFromTheFuture 2023-07-15 22:38:02 UTC
Can confirm the issue on my LG Gram 17Z90Q-K (2022 model) running Manjaro (Arch) with Kernels:
6.4.2-3 (latest version I have available on my system)
6.1.38-1 (LTS)

Happens both connected to my Thunderbolt 3 dock that also charges the computer, or to the power supply. Doesn't happen when running on battery power.

Doesn't happen on my LG gram from 2021.

1. I didn't find any reports of this issue for the 2023 models. Did anyone?

I believe that we need either a Linux kernel patch or a BIOS update, but there don't seem to be any updates to the BIOS of the 2022 lineup of LG Gram.

2. How can we get this fixed? Who can save us!?

3. Are there any workarounds that can be programmatically applied?

(In reply to eric from comment #7)
> Workarounds for now:
> - Blacklisting module int3403_thermal prevents the ACPI and thermal error
> messages.
> - Booting with kernel parameter "acpi_mask_gpe=0x6E" prevents the high CPU
> usage.

I'm reluctant to try it because:

```
When you specify a kernel boot parameter like acpi_mask_gpe=0x6E, you're telling the kernel to ignore or mask a specific ACPI GPE, in this case, the GPE with the address 0x6E.

This can be useful in certain scenarios where a specific GPE is causing issues, like system instability or high CPU usage. By masking the GPE, you prevent the system from processing that event, which can work around the issue.

However, you should only use this parameter if you are sure that the specific GPE is causing issues and you understand the implications of masking it. In some cases, it can lead to other problems, like certain hardware events not being correctly detected by the OS.
``` - Chat GPT 4
Comment 9 PinkFromTheFuture 2023-07-16 16:03:00 UTC
Here is my elegant workaround to the problem:
```
[Unit]
Description=Fixes many issues with the 2022 LG Grams running linux and sets charging limit to 80

# save this script to /etc/systemd/system/lg-gram.service
# then run:
# sudo systemctl daemon-reload
# sudo systemctl enable lg-gram.service
# sudo systemctl start lg-gram.service

# more documentation: https://www.reddit.com/r/LGgram/comments/150p3rg/critical_bug_affecting_the_2022_lineup_on_linux/

[Service]
Type=oneshot
# Unmask GPE interrupts to resolve the issue of high temperatures and fan noise even on idle when the laptop is charging through USB-C/TB:
ExecStart=/bin/bash -c "echo 'unmask' > /sys/firmware/acpi/interrupts/gpe6E"

# sets charging limit to 80 to increase battery longevity:
ExecStart=/bin/bash -c "echo 80 > /sys/class/power_supply/CMB0/charge_control_end_threshold"

# Disable "Silent mode":
ExecStart=/bin/bash -c "echo 1 > /sys/devices/platform/lg-laptop/fan_mode"

# Unload the int3403 temp sensor library from the kernel to fix ACPI flood issue:
# ExecStart=/bin/sh -c "rmmod int3403_thermal"

# Disable turbo boost (trade single threaded performance for lower heat output and maybe battery life)
# ExecStart=/bin/sh -c 'echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo'
# ExecStop=/bin/sh -c 'echo 0 > /sys/devices/system/cpu/intel_pstate/no_turbo'

# Fix for thermal throttle issue that on some distros can put the CPU running on low wattages:
# ExecStart=/bin/bash -c "systemctl disable --now thermald"

RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
```

--
The issue seems to be caused by:
some miscommunication between Linux thinking it can probe for thermal info on a device which did not register a handler to do so
Source - https://www.reddit.com/r/linuxhardware/comments/x97m6l/fedora_lg_gram_16_2022_12th_gen_alder_lake/ (added to the original post)

I'm happy to confirm that the kernel parameter acpi_mask_gpe=0x6E seems to fix the issue of fans blasting and high temperatures even on idle!
An observed drawback of this solution is that it breaks the functionality of the screen brightness buttons: they still work, but you can't hold them down, and there is a big delay in applying each setting. This doesn't bother me as I already use some scripts with a custom keyboard shortcut to set the brightness of all my displays at once. Another solution is to unmask the GPE interrupt after boot, rather than leaving it masked through the kernel parameter, with: echo unmask > /sys/firmware/acpi/interrupts/gpe6E. This way the brightness button issues don't manifest.
Furthermore, setting the kernel parameter acpi_mask_gpe=0x6E could affect idle power draw if the values returned by the GPE are used to put CPUs into idle states. Thankfully the issue only happens when the laptop is charging, so it wouldn't kill battery life.
From the kernel documentation I could find:
acpi_mask_gpe=  [HW,ACPI]
                Due to the existence of _Lxx/_Exx, some GPEs triggered
                by unsupported hardware/firmware features can result in
                GPE floodings that cannot be automatically disabled by
                the GPE dispatcher.
                This facility can be used to prevent such uncontrolled
                GPE floodings.
                Format: <byte> or <bitmap-list>
So the documentation is aligned with the observations of the fix/workaround.
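
The runtime counterpart of the boot parameter is the per-GPE sysfs file, which accepts `mask` and `unmask`. Below is a small sketch of a wrapper that refuses anything else; the `set_gpe_mask` helper name and its default path (gpe6E, as used in this report) are assumptions.

```shell
# set_gpe_mask: hypothetical helper. Writes "mask" or "unmask" to a GPE file
# and rejects any other action. Writing the real sysfs file needs root.
set_gpe_mask() {
    action="$1"
    file="${2:-/sys/firmware/acpi/interrupts/gpe6E}"
    case "$action" in
        mask|unmask) ;;
        *) echo "usage: set_gpe_mask mask|unmask [file]" >&2; return 2 ;;
    esac
    [ -w "$file" ] || { echo "cannot write $file" >&2; return 1; }
    printf '%s\n' "$action" > "$file"
}

# Example (as root, on an affected machine):
#   set_gpe_mask unmask
```

Nothing is invoked automatically here, since changing the mask alters how the machine handles EC events.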

--
Furthermore, there is another possible fix: journalctl seems to output many ACPI errors, probably due to the miscommunication between Linux thinking it can probe for thermal info on a device which did not register a handler to do so. The ACPI issue is a flood caused by the int3403 temp sensor library, which can be unloaded from the kernel without any other visible system effect:
sudo rmmod int3403_thermal
A permanent fix seems to be patching the int3403 library to add something like
if (sensorNotPresent) then skip or unregister until next launch

--
There seems to be yet another issue related to this:
power usage goes way up to around 10W at idle (instead of like 3W) and there's a kernel thread with high load (visible in powertop) related to the i915 graphics driver
The solution seems to be:
echo 1 > /sys/kernel/debug/dri/1/i915_hpd_short_storm_ctl
That should be run on every boot to stop another interrupt flood from happening, but this solution might break multi-stream transport on DisplayPort: the short storm detection is enabled by default unless multi-stream transport is supported.

--
Another finding: The laptop is by default running on  "Silent mode" which can be disabled by:
echo 1 > /sys/devices/platform/lg-laptop/fan_mode

--
The laptop indeed seems to have no BIOS updates.

--
Another issue seems to be that on some distros the laptop might thermal throttle to very low wattages, and the fix is:
systemctl disable --now thermald

--
I'm also adding to my original post a script that seems to be an elegant workaround to solve the issues observed.
Comment 10 Cristian Cocos 2023-07-16 16:21:54 UTC
(In reply to PinkFromTheFuture from comment #8)
> Doesn't happen on my LG gram from 2021.

I can confirm that on my LGGram 2021.

> 1. I didn't find any reports of this issue for the 2023 models. Did anyone?

I got a new LGGram *2023*, and yes, it DOES happen there too. (And yes, I have a 2022 LGGram as well.)

> I believe that we need either a Linux kernel patch or a BIOS update, but
> there don't seem to be any updates to the BIOS of the 2022 lineup of LG
> Gram.

Fat chance! I spent a lot of time w/LG support on the phone, and their canned reply is always: "this computer is for WINDOWS ONLY. Linux need not apply!"

> 2. How can we get this fixed? Who can save us!?

Same workarounds: upon reboot (1) mask gpe0x6E, and (2) unload the int3403_thermal module. That does it for both my LGGrams 2022 and 2023. This can be elegantly done via a systemd service.

Note that, in my experience, unmasking gpe0x6E some time after reboot (as some have suggested) triggers the ACPI storm back on, so I prefer to leave it masked. I have not noticed any issues from either of these two workarounds.

Alternatively, you could go back to Windows! :-)

> 
> 3. Are there any workarounds that can be programmatically applied?

See above.
Comment 11 PinkFromTheFuture 2023-07-16 16:36:51 UTC
(In reply to Cristian Cocos from comment #10)
> Same workarounds: upon reboot (1) mask gpe0x6E, and (2) unload the
> int3403_thermal module. That does it for both my LGGrams 2022 and 2023. This
> can be elegantly done via a systemd service.
Thanks! I just posted the results of my saga above.

Could unloading int3403_thermal cause other issues?? I see you said it didn't cause any issues for you, but I heard it could, so I'm not using it as part of my solution (but I added it to the service file I shared, to help others)


> Note that, in my experience, unmasking gpe0x6E some time after reboot (as
> some have suggested) triggers the ACPI storm back on, so I prefer to leave
> it masked. I have not noticed any issues from any of these two workarounds.

I didn't quite understand this point and it feels important to me to understand it. Could you be more precise as to what you're doing?
I thought the solution was to unmask... But you're masking it instead?


> Alternatively, you could go back to Windows! :-)

NO WAY! Argh! lol XD
I'd rather spend hours trying to patch this issue! ;)
Comment 12 Cristian Cocos 2023-07-16 17:11:06 UTC
(In reply to PinkFromTheFuture from comment #11)
 
> Could unloading int3403_thermal cause other issues??

I haven't heard of any issues from anybody about that.

> I didn't quite understand this point and it feels important to me to
> understand it. Could you be more precise as to what you're doing?
> I thought the solution was to unmask... But you're masking it instead?

gpe0x6E needs to be MASKED: acpi_mask_gpe=0x6E (note the "mask" in there). *That* is what stops the ACPI storm. Some people have reported that UNmasking that ACPI interrupt some time AFTER masking it at boot-time gives you the best of both worlds, but that is not my experience. It may work in Ubuntu, but that is not what I am running. I, for one, prefer to keep it masked.

See also this: https://www.reddit.com/r/linuxhardware/comments/x97m6l/comment/izdwsxn/?utm_source=reddit&utm_medium=web2x&context=3
Comment 13 Cherrot 2023-07-17 10:05:41 UTC
(In reply to PinkFromTheFuture from comment #9)
> Here is my elegant workaround to the problem:
> ```
> [Unit]
> Description=...
> # more documentation:
> #
> https://www.reddit.com/r/LGgram/comments/150p3rg/critical_bug_affecting_the_2022_lineup_on_linux/
> 
> [Service]
> Type=oneshot
> Restart=on-failure
> 
> # To resolve the issue with GPE interrupts, causing high temperatures and fan
> noise even on idle when the laptop is charging through USB-C/TB, 
> # add to the kernel parameters `acpi_mask_gpe=0x6E`
> # However, this will cause issues with the keyboard screen brightness
> shortcuts which can be resolved by adding the Unmask GPE interrupts during
> boot: 
> ExecStart=/bin/bash -c "echo 'unmask' > /sys/firmware/acpi/interrupts/gpe6E"
> 
> ...


Thanks for your workaround and explanation! 

However, in my case, I found that unmasking the ACPI interrupt immediately after boot brings the CPU throttle issue back.

I've fixed it by adding a 1 minute sleep: 


```
[Service]
Type=oneshot
Restart=on-failure

ExecStartPre=/bin/sleep 60
ExecStart=/bin/bash -c "echo 'unmask' > /sys/firmware/acpi/interrupts/gpe6E"
```
Comment 14 Ralph Martin 2023-09-24 07:02:12 UTC
Re-enabling the interrupts after boot, even with a one-minute delay, causes the problem to come back for me, on Debian testing, with any recent kernel, on a 2022 LG Gram 16.


As a separate comment, the 'workaround' changes many other things that are unconnected to the problem, that the user may wish to set up differently.
Comment 15 Cristian Cocos 2023-09-24 14:39:09 UTC
(In reply to Ralph Martin from comment #14)
> Re-enabling the interrupts after boot, even with a one-minute delay, causes the
> problem to come back for me, on Debian testing, with any recent kernel, on
> a 2022 LG Gram 16.

Yeah, just keep the interrupt masked and you're golden. I have not noticed any downside so far, and I have been using LG Grams for a couple of years now.
Comment 16 Ralph Martin 2023-09-24 19:40:04 UTC
As noted in an earlier message:
an observed drawback from [masking these interrupts] is that it breaks the functionality of the screen brightness buttons
Comment 17 Cristian Cocos 2023-09-25 00:33:49 UTC
(In reply to Ralph Martin from comment #16)
> As noted in an earlier message:
> an observed drawback from [masking these interrupts] is that it breaks the
> functionality of the screen brightness buttons

Could be. I use my LG Grams mostly through docking stations, where this does not apply.
Comment 18 Víctor 2023-09-25 08:12:53 UTC
(In reply to Ralph Martin from comment #16)
> As noted in an earlier message:
> an observed drawback from [masking these interrupts] is that it breaks the
> functionality of the screen brightness buttons

Yeah, exactly. For the moment I have it like that because I prefer to sacrifice the brightness keys (and change the brightness manually from a widget bar or something) rather than hear the fan at 100% all the time.

On the other hand, I was wondering whether it is possible to move (in some sense) the reading of the brightness keys to some other ACPI events, like those of the other keyboard keys, which are read 'instantaneously'. For example, the volume keys should have a 'similar' behaviour, but they work perfectly.
Maybe this is not possible; I'm not an expert in this, just suggesting something in case you know more...
Comment 19 PinkFromTheFuture 2023-12-28 05:40:06 UTC
The brightness keys are the only easy to spot issue.
Lots of other issues happen because of the workaround.
My computer definitely behaves weirdly and seems crippled.

To the point that I am even considering just using Windows on it :'(

Anyone with more luck on this?
Comment 20 Ralph Martin 2024-01-30 07:25:35 UTC
Even when masking with acpi_mask_gpe=0x6E, I am still getting around 5% cpu usage from each of several kworker tasks.

    122 root      20   0       0      0      0 D   6.6   0.0   0:12.72 kworker/u32:12+USBC000:00-con2
    168 root      20   0       0      0      0 R   5.6   0.0   0:08.44 kworker/0:2+events
      9 root      20   0       0      0      0 D   4.3   0.0   0:07.28 kworker/0:1+kacpi_notify
   1744 root      20   0       0      0      0 I   2.3   0.0   0:00.74 kworker/8:3-kec_query
    143 root      20   0       0      0      0 I   1.7   0.0   0:00.21 kworker/8:1-events
   1506 root      20   0 1200444 128252  89052 S   1.7   0.4   0:06.29 Xorg
    298 root      20   0       0      0      0 I   1.0   0.0   0:01.01 kworker/8:2-events
    136 root      20   0       0      0      0 I   0.7   0.0   0:02.11 kworker/1:1-kec_query
    296 root     -51   0       0      0      0 S   0.7   0.0   0:01.21 irq/79-ELAN0E03:00
    708 root      20   0       0      0      0 I   0.7   0.0   0:00.63 kworker/10:2-mm_percpu_wq

(Other unrelated tasks removed)
Comment 21 Ralph Martin 2024-01-30 07:28:01 UTC
The output above, which shows that ACPI masking does not fully work around the problem, is from Debian testing with kernel 6.5.13-1.
Comment 22 Crypt0Keeper 2024-02-04 18:13:09 UTC
Running PopOs 22.04 LTS on 6.6.10 kernel

LG Gram 17Z90Q-K

Only have to blacklist int3403_thermal, reboot and done deal. System is snappy, brightness works as expected, no ACPI error flood.

Willing to pop around my system to help solve this problem for others if someone with more Linux experience (which is probably most of you) can direct me.
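
For anyone who wants to try the blacklist approach, the usual mechanism is a `blacklist` line in a file under /etc/modprobe.d. A sketch of a helper that writes such a file follows; the `write_blacklist` name and the `blacklist-<module>.conf` file name are arbitrary choices made here, and some distros may also need the initramfs regenerated before the change takes effect.

```shell
# write_blacklist: hypothetical helper. Creates <dir>/blacklist-<module>.conf
# containing a modprobe "blacklist" line for the module.
write_blacklist() {
    module="$1"
    dir="${2:-/etc/modprobe.d}"
    printf 'blacklist %s\n' "$module" > "$dir/blacklist-$module.conf"
}

# Example (as root, then reboot):
#   write_blacklist int3403_thermal
```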
Comment 23 Ralph Martin 2024-02-04 18:26:23 UTC
Not for me. I have blacklisted int3403_thermal, and get various threads running as above, causing the fans to start spinning, with or without masking 6E. Seems worse when not masking.

Still happens on kernel 6.6.13.
Comment 24 Diogo Ivo 2024-02-05 15:47:23 UTC
Can you try running "rmmod ucsi_acpi" and see if that solves the issue?
Comment 25 Ralph Martin 2024-02-05 17:15:06 UTC
Hi Diogo
thanks for that suggestion. 
rmmod ucsi_acpi does indeed seem to fix the problem.
Are there likely to be any unwanted side effects of doing this?
Ralph
Comment 26 Diogo Ivo 2024-02-05 17:57:58 UTC
Not really, UCSI is used to communicate the state of the USB-C port to the OS and also allows the user to set the state manually, so nothing critical. What happens is that when something is plugged into the USB-C port the EC starts spamming notifications for some reason, and I am still trying to understand why.
If anyone has any suggestions I can try them out :)
Comment 27 Cristian Cocos 2024-02-05 18:27:42 UTC
(In reply to Diogo Ivo from comment #26)
> Not really, UCSI is used to communicate the state of the USB-C port to the
> OS and also allows for the user to set the state manually, so nothing
> critical. What happens is that when something is plugged to the USB-C port
> the EC starts spamming notifications for some reason, and I am still trying
> to understand why.
> If anyone has any suggestions I can try them out :)

Have you tried masking gpe0x6E + rmmod int3403_thermal? Does that do it for you?

Note that there are two distinct problems here, and I am not sure how related they are:
1. ACPI error flood;
2. High CPU usage.

Which one of these is rmmod ucsi_acpi a fix for?
Comment 28 Diogo Ivo 2024-02-05 18:38:54 UTC
This fix does not deal with the ACPI flood (in fact on my system I don't see that) and only addresses the high CPU usage.

What is happening, at least on my system and I suspect on the remaining ones as well, is that there is some bug regarding UCSI and the EC keeps reporting events to the CPU via an interrupt when there is something connected to a USB-C port, leading to high CPU usage.

Since the CPU gets notified through GPE0x6E, masking it does also fix the issue, at the expense of all the other functionality also covered by the EC, so this is not the best approach IMHO. rmmod int3403_thermal does nothing on my system.

I will try to dig deeper into the ucsi driver and see if there is anything we can do there or if this is just a firmware bug in the EC and in that case I am unsure of what the best approach is.
Comment 29 Cristian Cocos 2024-02-05 19:01:37 UTC
(In reply to Diogo Ivo from comment #28)
> This fix does not deal with the ACPI flood (in fact on my system I don't see
> that) and only addresses the high CPU usage.

If I remember correctly, the ACPI flood only happens if you hook up your computer to a Thunderbolt docking station (and(?) run an external monitor on the docking station). Do you have any of those lying around?

As for issue #2, if it works, your fix would, indeed, be better than masking gpe0x6E. I'll give it a try, and report back.
Comment 30 Cristian Cocos 2024-02-05 21:42:22 UTC
I can confirm: prima facie, rmmod ucsi_acpi works fine as a substitute for masking gpe0x6E. If it is proven over time to have less of an impact on the system as a whole, I think we have a winner!

(NB: rmmod int3403_thermal is still needed to stem the ACPI flood.)
Comment 31 PinkFromTheFuture 2024-02-07 20:09:08 UTC
I am also going to test the solution with rmmod ucsi_acpi, without the masking of gpe6E.

Please keep in mind that our solution consists of first masking it with the grub options, but after the system has booted, we unmask it - it does not stay masked.

I believe the solution using rmmod ucsi_acpi could have consequences when it comes to battery management, or interfacing with USB devices, so I'm curious to learn more about it.

(In reply to Cristian Cocos from comment #29)
> If I remember correctly, the ACPI flood only happens if you hook up your
> computer to a Thunderbolt docking station (and(?) run an external monitor on
> the docking station). Do you have any of those lying around?

False, it happens when anything is connected on the USB-C port, including the charging cable.
Comment 32 Diogo Ivo 2024-02-07 22:44:09 UTC
Ok, so when you do that masking and unmasking can you check if something regarding a UCSI timeout appears in the dmesg?
Comment 33 Cristian Cocos 2024-02-07 22:52:25 UTC
(In reply to PinkFromTheFuture from comment #31)
> Please keep in mind that our solution consists of first masking it with the
> grub options, but after the system has booted, we unmask it - it does not
> stay masked.

Masking at boot-time and unmasking some time afterward does not always work (see, e.g., comment #14 above). To make sure the issue has been fixed, the mask needs to stay on all the time. This is my experience.
Comment 34 PinkFromTheFuture 2024-02-07 23:40:30 UTC
(In reply to Cristian Cocos from comment #33)
> (In reply to PinkFromTheFuture from comment #31)
> > Please keep in mind that our solution consists of first masking it with the
> > grub options, but after the system has booted, we unmask it - it does not
> > stay masked.
> 
> Masking at boot-time and unmasking some time afterward does not always work
> (see, e.g., comment #14 above). To make sure the issue has been fixed, the
> mask needs to stay on all the time. This is my experience.

Same experience with me, unless I add to my service:
```
ExecStartPre=/bin/sleep 120
```

Before unmasking it.

(In reply to Diogo Ivo from comment #32)
> Ok, so when you do that masking and unmasking can you check if something
> regarding an UCSI timeout appears in the dmesg?

Yes. How?

Just  do the following?
```
sudo dmesg
```

Or maybe I can grep the output?
Comment 35 Diogo Ivo 2024-02-08 15:42:42 UTC
Yes, sudo dmesg and then check if there is anything regarding ucsi by grepping; if you can it would be nice to send the full log to this thread.
Comment 36 ktecho 2024-03-07 18:06:22 UTC
(In reply to Diogo Ivo from comment #35)
> Yes, sudo dmesg and then check if there is anything regarding ucsi by
> grepping; if you can it would be nice to send the full log to this thread.

I have an LG gram 15ZD90R. This is what I get

$ sudo dmesg | grep ucsi
[    9.430616] ucsi_acpi USBC000:00: error -ETIMEDOUT: PPM init failed
Comment 37 ktecho 2024-03-07 18:09:03 UTC
I forgot to say that it's an Intel i7-1360p and I'm booting with this:

GRUB_CMDLINE_LINUX_DEFAULT="acpi_mask_gpe=0x6E"
Comment 38 Diogo Ivo 2024-03-08 10:51:35 UTC
This makes sense: when you mask GPE 0x6E during boot, the initialization of the UCSI driver fails with this timeout, and then it is safe for us to unmask the GPE again since the driver will not care about the EC UCSI notifications.
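
Following that reasoning, a boot-time script could unmask the GPE only when the kernel log shows that the UCSI driver gave up. This is a sketch under that assumption; the `ucsi_init_failed` helper name is made up, and the pattern is taken from the single log line quoted in comment 36, so other kernels may word it differently.

```shell
# ucsi_init_failed: hypothetical helper. Reads kernel log text on stdin and
# succeeds if it contains the ucsi_acpi "PPM init failed" message.
ucsi_init_failed() {
    grep -q 'ucsi_acpi .*PPM init failed'
}

# Example (as root):
#   if dmesg | ucsi_init_failed; then
#       echo unmask > /sys/firmware/acpi/interrupts/gpe6E
#   fi
```

If the message is absent, the driver initialised normally and unmasking would presumably bring the notification flood back.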

Thank you for checking!
Comment 39 Diogo Ivo 2024-03-08 10:58:05 UTC
I have been investigating this problem and I have found out that the UCSI implementation in these laptops does not conform to the UCSI specification in multiple locations.
I am trying to figure out the best way to approach this to mainline a fix, but for the time being my recommendation is to just 'rmmod ucsi_acpi'. Since the goal of this functionality is only to report the current state of the USB-C connectors to the OS, there will be no change in how the system operates, as this control will still be done by the EC regardless.
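
Before unloading, it can help to check whether the module is loaded at all, for instance when GPE masking at boot already stopped it from initialising. A minimal sketch follows, assuming the usual /proc/modules format with the module name in the first column; the `module_loaded` helper name is hypothetical.

```shell
# module_loaded: hypothetical helper. Succeeds if the named module appears in
# the first column of a /proc/modules-style file.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

if [ -r /proc/modules ] && module_loaded ucsi_acpi; then
    echo "ucsi_acpi is loaded; as root, 'rmmod ucsi_acpi' would unload it"
else
    echo "ucsi_acpi is not loaded"
fi
```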
Comment 40 ktecho 2024-03-08 11:04:22 UTC
(In reply to Diogo Ivo from comment #39)
> I have been investigating this problem and I have found out that the UCSI
> implementation in these laptops does not conform to the UCSI specification
> in multiple locations.
> I am trying to figure out the best way to approach this to mainline a fix
> but for the moment being my recommendation is to just 'rmmod ucsi_acpi'.
> Since the goal of this functionality is only to report the current state of
> the USB-C connectors to the OS there will be no change in how the system
> operates, as this control will still be done by the EC regardless.

I was just going to ask if this was an Intel problem, or an LG one, but I guess it's LG's fault, right?

And I guess that should be fixed in the BIOS, and they won't do it, because afaik, LG doesn't release BIOS fixes...
Comment 41 Diogo Ivo 2024-03-08 11:08:13 UTC
Yeah, the fix should come from LG but as you said they are not providing BIOS updates. What happens in these situations is that we accept that the thing is buggy and add quirks for these devices to the drivers (the UCSI driver already has quirks for Dell and Asus laptops).
