Bug 219075
Description
Chris
2024-07-21 11:19:50 UTC
I should add that I tried to install the proposed drivers from https://bugzilla.kernel.org/show_bug.cgi?id=218901, but this did not work.
If that's a regression, could you bisect? https://docs.kernel.org/admin-guide/bug-bisect.html
I have to admit that this is my first time reporting a kernel ticket, and I cannot say whether this is a regression or not.
You said, quote:
> the Laptop extra features stopped working
Now I'm not a native English speaker and my English is far from perfect but as far as I understand this sentence means that the features used to work in the past with previous kernel releases.
If that's the case, it's indeed a regression.
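The bisection asked for above is a binary search over the commits between a known-good and a known-bad kernel. A minimal Python sketch of the idea follows; the commit names and the `is_bad` callback are hypothetical stand-ins for building and booting each candidate kernel, and on a real tree `git bisect` does this bookkeeping over the actual history.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, that the last one is bad,
    and that every commit after the first bad one is bad as well.
    """
    lo, hi = 0, len(commits) - 1  # hi always points at a known-bad commit
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid      # the first bad commit is mid or earlier
        else:
            lo = mid + 1  # the first bad commit is after mid
    return commits[lo]


# Hypothetical history "c0".."c9" in which "c6" introduced the breakage.
history = [f"c{i}" for i in range(10)]
print(first_bad_commit(history, lambda c: int(c[1:]) >= 6))  # prints "c6"
```

The result is a single commit rather than a release tag, which is why a bisection that ends on a version tag usually needs to be continued between that release and the previous one.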
You might want to check if 6.10 is affected as well, as 6.9 will likely be EOL in about two weeks. If it is affected, a dmesg log from a failed kernel would be really helpful. And a bisection in the 6.9 series, *if* earlier 6.9 versions worked fine: https://docs.kernel.org/admin-guide/verify-bugs-and-bisect-regressions.html (In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #5) > You might want to check if 6.10 is affected as well, as 6.9 will likely be > EOL in about two weeks. If it is affected, a dmesg log from a failed kernel > would be really helpful. And a bisection in the 6.9 series, *if* earlier 6.9 > versions worked fine: > https://docs.kernel.org/admin-guide/verify-bugs-and-bisect-regressions.html Thanks for the info. I've been a bit busy the past couple of days, but will try to get to this today if I can. Sorry for not replying in a while. I hope I did this right, but I've narrowed it down to 6.6. --> 31cf7ebee80af912aff36445ba7bd057e2146231 (In reply to Chris from comment #7) > Sorry for not replying in a while. I hope I did this right, but I've > narrowed it down to 6.6. --> 31cf7ebee80af912aff36445ba7bd057e2146231 That's 31cf7ebee80af9 ("Linux 6.6.42") [v6.6.42] – so what you are saying is that 6.6.41 worked fine and 6.6.42 did not? Normally a bisection is meant to find the change that broke things, not a version tag. Created attachment 306624 [details] attachment-7019-0.html That is correct. 6.6.42 does not work, while 6.6.41 works fine. On Sat, Jul 27, 2024 at 7:24 AM <bugzilla-daemon@kernel.org> wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=219075 > > --- Comment #8 from The Linux kernel's regression tracker (Thorsten > Leemhuis) (regressions@leemhuis.info) --- > (In reply to Chris from comment #7) > > Sorry for not replying in a while. I hope I did this right, but I've > > narrowed it down to 6.6. 
--> 31cf7ebee80af912aff36445ba7bd057e2146231 > > That's 31cf7ebee80af9 ("Linux 6.6.42") [v6.6.42] – so what you are saying > is > that 6.6.41 worked fine and 6.6.42 did not? Normally a bisection is meant > to > find the change that broke things, not a version tag. > > -- > You may reply to this email to add a comment. > > You are receiving this mail because: > You are on the CC list for the bug. > You reported the bug. (In reply to Chris from comment #9) > > That is correct. 6.6.42 does not work, while 6.6.41 works fine. Then could you try to bisect between those versions? I fear otherwise no one will look at this. And you should also check if 6.11-rc1 is affected. I've finally finished bisecting and here is the output of the final bisection: git bisect bad 5f5d0799eb0a01d550c21b7894e26b2d9db55763 5f5d0799eb0a01d550c21b7894e26b2d9db55763 is the first bad commit commit 5f5d0799eb0a01d550c21b7894e26b2d9db55763 Author: Jann Horn <jannh@google.com> Date: Tue Jul 2 18:26:52 2024 +0200 filelock: Remove locks reliably when fcntl/close race is detected commit 3cad1bc010416c6dd780643476bc59ed742436b9 upstream. When fcntl_setlk() races with close(), it removes the created lock with do_lock_file_wait(). However, LSMs can allow the first do_lock_file_wait() that created the lock while denying the second do_lock_file_wait() that tries to remove the lock. In theory (but AFAIK not in practice), posix_lock_file() could also fail to remove a lock due to GFP_KERNEL allocation failure (when splitting a range in the middle). After the bug has been triggered, use-after-free reads will occur in lock_get_status() when userspace reads /proc/locks. This can likely be used to read arbitrary kernel memory, but can't corrupt kernel memory. This only affects systems with SELinux / Smack / AppArmor / BPF-LSM in enforcing mode and only works from some security contexts. 
Fix it by calling locks_remove_posix() instead, which is designed to reliably get rid of POSIX locks associated with the given file and files_struct and is also used by filp_flush(). Fixes: c293621bbf67 ("[PATCH] stale POSIX lock handling") Cc: stable@kernel.org Link: https://bugs.chromium.org/p/project-zero/issues/detail?id=2563 Signed-off-by: Jann Horn <jannh@google.com> Link: https://lore.kernel.org/r/20240702-fs-lock-recover-2-v1-1-edd456f63789@google.com Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> [stable fixup: ->c.flc_type was ->fl_type in older kernels] Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> fs/locks.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) (In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #10) > (In reply to Chris from comment #9) > > > > That is correct. 6.6.42 does not work, while 6.6.41 works fine. > > Then could you try to bisect between those versions? I fear otherwise no one > will look at this. > > And you should also check if 6.11-rc1 is affected. I tried 6.11-rc1 and the issue still remains. thx, forwarded by email: https://lore.kernel.org/all/412463d7-5259-4c99-bfda-1f5f9d2893cf@leemhuis.info/ I find the result of that bisection a bit dubious - the "first bad" commit only changes error handling code in an error handler which is only executed when an fcntl() file locking operation races with a close() (which userspace should not be doing in the first place), and the issue description doesn't sound related at all. 
On top of that, there are other changes in the same stable release that explicitly talk about ACPI, including ones that explicitly are about LG platforms: https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.6.42 But also, the "first bad" commit is the very first one in the range you bisected, as if during the second bisection, every version that was tested was bad, which seems a bit weird? Can you please confirm that the issue occurs at commit https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=5f5d0799eb0a01d550c21b7894e26b2d9db55763 but not at its parent commit https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=2eaf5c0d81911ba05bace3a722cbcd708fdbbcba ? Hi, could you share the output of "acpidump"? Created attachment 306664 [details]
ACPI Dump
acpidump, as requested.
You said that the battery_care_limit sysfs attribute does not work. Does this also apply to the other sysfs attributes?
I will provide you with a patched lg-laptop driver which might fix this issue. Can you compile kernel modules on your machine?
Regarding the ACPI errors: It seems that there is a difference in how Windows and Linux handle ACPI if statements, causing the AML code to access an operation region which is not accessible.
I can try to fix this issue as well, provided that you can test the resulting prototype driver.
(In reply to Armin Wolf from comment #17)
> You said that the battery_care_limit sysfs attribute does not work. Does
> this also apply to the other sysfs attributes?
>
> I will provide you with a patched lg-laptop driver which might fix this
> issue. Can you compile kernel modules on your machine?
>
> Regarding the ACPI errors: It seems that there is a difference in how
> Windows and Linux handle ACPI if statements, causing the AML code to access
> a operation region which is not accessible.
>
> I can try to fix this issue as well, provided that you can test the
> resulting prototype driver.
Hi Armin,
None of the other sysfs attributes are writable either. I can also test the prototype driver for you, and I can compile kernel modules on my machine.
(In reply to Jann Horn (Google) from comment #14)
> I find the result of that bisection a bit dubious - the "first bad" commit
> only changes error handling code in an error handler which is only executed
> when an fcntl() file locking operation races with a close() (which userspace
> should not be doing in the first place), and the issue description doesn't
> sound related at all.
> > On top of that, there are other changes in the same stable release that > explicitly talk about ACPI, including ones that explicitly are about LG > platforms: https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.6.42 > > But also, the "first bad" commit is the very first one in the range you > bisected, as if during the second bisection, every version that was tested > was bad, which seems a bit weird? > > Can you please confirm that the issue occurs at commit > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/ > ?id=5f5d0799eb0a01d550c21b7894e26b2d9db55763 > but not at its parent commit > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/ > ?id=2eaf5c0d81911ba05bace3a722cbcd708fdbbcba > ? It might take me a while to get to this as I am on vaction, but I will try to get this checked as soon as possible. Created attachment 306670 [details]
lg-laptop driver with improved error handling for the WMBB method
Can you test what happens if you read/write to the "usb_charge" and "charge_control_end_threshold" attributes? Does reading/writing return an error code (if yes, which error code)?
Sure thing. No error messages are returned when I try to write to usb_charge.
[chris@gramC ~]$ cat /sys/devices/platform/lg-laptop/usb_charge
0
[chris@gramC ~]$ ls -l /sys/devices/platform/lg-laptop/usb_charge
-rw-r--r-- 1 root root 4096 Aug 6 07:54 /sys/devices/platform/lg-laptop/usb_charge
[chris@gramC ~]$ echo 1 | sudo tee /sys/devices/platform/lg-laptop/usb_charge
[sudo] password for chris:
1
[chris@gramC ~]$ ls -l /sys/devices/platform/lg-laptop/usb_charge
-rw-r--r-- 1 root root 4096 Aug 6 08:18 /sys/devices/platform/lg-laptop/usb_charge
[chris@gramC ~]$ cat /sys/devices/platform/lg-laptop/usb_charge
0
I got an error when trying to install the module:
[chris@gramC lg-laptop]$ make
make -C /lib/modules/`uname -r`/build M=`pwd` modules
make[1]: Entering directory '/home/chris/Downloads/lg-laptop'
make[1]: *** /lib/modules/6.10.2-arch1-2/build: No such file or directory. Stop.
make[1]: Leaving directory '/home/chris/Downloads/lg-laptop'
make: *** [Makefile:5: all] Error 2
[chris@gramC lg-laptop]$ make
make -C /lib/modules/`uname -r`/build M=`pwd` modules
make[1]: Entering directory '/usr/lib/modules/6.10.2-arch1-2/build'
CC [M] /home/chris/Downloads/lg-laptop/lg-laptop.o
MODPOST /home/chris/Downloads/lg-laptop/Module.symvers
CC [M] /home/chris/Downloads/lg-laptop/lg-laptop.mod.o
LD [M] /home/chris/Downloads/lg-laptop/lg-laptop.ko
BTF [M] /home/chris/Downloads/lg-laptop/lg-laptop.ko
make[1]: Leaving directory '/usr/lib/modules/6.10.2-arch1-2/build'
[chris@gramC lg-laptop]$ sudo modprobe -r lg-laptop
[chris@gramC lg-laptop]$ sudo insmod lg-laptop.ko
[sudo] password for chris:
insmod: ERROR: could not insert module lg-laptop.ko: Unknown symbol in module
Can you check with dmesg which symbols are not found?
You can try to manually load the "battery", "sparse_keymap" and "wmi" modules before executing "sudo insmod lg-laptop.ko". here are the log entries [ 587.794881] lg_laptop: version magic '6.10.2-arch1-2 SMP preempt mod_unload ' should be '6.10.3-arch1-1 SMP preempt mod_unload ' [ 826.391885] lg_laptop: loading out-of-tree module taints kernel. [ 826.391895] lg_laptop: module verification failed: signature and/or required key missing - tainting kernel [ 826.391984] lg_laptop: Unknown symbol sparse_keymap_entry_from_scancode (err -2) [ 826.392019] lg_laptop: Unknown symbol sparse_keymap_report_entry (err -2) [ 826.392065] lg_laptop: Unknown symbol sparse_keymap_setup (err -2) but I am not sure how to load the individual modules. Can you help me with that? I tried to modprobe your fix as done in 218901 with 1. zstd lg-laptop.ko 2. sudo cp -f lg-laptop.ko.zst /usr/lib/modules/$(uname -r)/kernel/drivers/platform/x86/ 3. sudo depmod 4. sudo modprobe lg-laptop but this didn't seem to work. You can load an individual module using "sudo modprobe <module name>". The "lg_laptop: version magic '6.10.2-arch1-2 SMP preempt mod_unload ' should be '6.10.3-arch1-1 SMP preempt mod_unload '" warning above looks like you built the module for one kernel version and are actually running a kernel with another version... No luck. [ 365.389414] lg_laptop: loading out-of-tree module taints kernel. [ 365.389427] lg_laptop: module verification failed: signature and/or required key missing - tainting kernel [ 365.391386] lg_laptop: product: 16ZB90R-G.AA75G year: 2019 [ 365.392320] input: LG WMI hotkeys as /devices/virtual/input/input18 [ 365.392509] ACPI: battery: new extension: LG Battery Extension [chris@gramC ~]$ (In reply to Chris from comment #29) > No luck. > > [ 365.389414] lg_laptop: loading out-of-tree module taints kernel. 
> [ 365.389427] lg_laptop: module verification failed: signature and/or > required key missing - tainting kernel > [ 365.391386] lg_laptop: product: 16ZB90R-G.AA75G year: 2019 > [ 365.392320] input: LG WMI hotkeys as /devices/virtual/input/input18 > [ 365.392509] ACPI: battery: new extension: LG Battery Extension > [chris@gramC ~]$ I forgot to add that I loaded the modules manually before applying the patched driver. So the prototype module can now be loaded, but the result is still the same? With results i mean the behaviour of the sysfs attributes. Can you check how the prototype module behaves if you set the charging threshold to different values between 0 and 100? (In reply to Armin Wolf from comment #31) > So the prototype module can now be loaded, but the result is still the same? (In reply to Armin Wolf from comment #32) > With results i mean the behaviour of the sysfs attributes. > > Can you check how the prototype module behaves if you set the charging > threshold to different values between 0 and 100? Yes, the module loads correctly. The driver only accepts the values 0 or 80 -> https://docs.kernel.org/admin-guide/laptops/lg-laptop.html. 
But here are the values:
[chris@gramC lg-laptop]$ echo 100 | sudo tee /sys/devices/platform/lg-laptop/battery_care_limit
[sudo] password for chris:
100
[chris@gramC lg-laptop]$ ls -l /sys/devices/platform/lg-laptop/battery_care_limit
-rw-r--r-- 1 root root 4096 Aug 6 19:42 /sys/devices/platform/lg-laptop/battery_care_limit
[chris@gramC lg-laptop]$ echo 10 | sudo tee /sys/devices/platform/lg-laptop/battery_care_limit
10
tee: /sys/devices/platform/lg-laptop/battery_care_limit: Invalid argument
[chris@gramC lg-laptop]$ ls -l /sys/devices/platform/lg-laptop/battery_care_limit
-rw-r--r-- 1 root root 4096 Aug 6 19:44 /sys/devices/platform/lg-laptop/battery_care_limit
[chris@gramC lg-laptop]$ echo 80 | sudo tee /sys/devices/platform/lg-laptop/battery_care_limit
80
[chris@gramC lg-laptop]$ ls -l /sys/devices/platform/lg-laptop/battery_care_limit
-rw-r--r-- 1 root root 4096 Aug 6 19:48 /sys/devices/platform/lg-laptop/battery_care_limit
The timestamps correspond to the times at which I tried to modify the driver value.
My fault, I should have told you that I removed this "80 or 100" limitation from the prototype driver. Can you do the following actions and report back the results:
1. Write <value> into "battery_care_limit".
2. Read "battery_care_limit".
The values for <value> I am interested in are 80, 90, 100.
[chris@gramC ~]$ echo 100 | sudo tee /sys/devices/platform/lg-laptop/battery_care_limit
100
[chris@gramC ~]$ cat /sys/devices/platform/lg-laptop/battery_care_limit
0
[chris@gramC ~]$ echo 90 | sudo tee /sys/devices/platform/lg-laptop/battery_care_limit
90
[chris@gramC ~]$ cat /sys/devices/platform/lg-laptop/battery_care_limit
0
[chris@gramC ~]$ echo 80 | sudo tee /sys/devices/platform/lg-laptop/battery_care_limit
80
[chris@gramC ~]$ cat /sys/devices/platform/lg-laptop/battery_care_limit
0
Strange, it seems that the EC always returns zero. Can you share the full output of dmesg?
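The write-then-read test above can be scripted so that every value is checked the same way. The following is only a sketch: the helper names are made up for illustration, and the demo runs against a temporary stand-in file so that it works anywhere. On the affected machine the path would be /sys/devices/platform/lg-laptop/battery_care_limit, and the script would need root.

```python
import tempfile
from pathlib import Path
from typing import List


def write_and_read_back(attr: Path, value: int) -> int:
    """Write value to a sysfs-style attribute file and return what reads back."""
    attr.write_text(f"{value}\n")
    return int(attr.read_text().strip())


def check_values(attr: Path, values: List[int]) -> List[str]:
    """Return one report line per value: ok, a mismatch, or the write error."""
    report = []
    for value in values:
        try:
            seen = write_and_read_back(attr, value)
        except OSError as exc:  # e.g. "Invalid argument" when the driver rejects it
            report.append(f"{value}: failed ({exc.strerror})")
            continue
        report.append(f"{value}: ok" if seen == value else f"{value}: read back {seen}")
    return report


# Demo against a stand-in file (an assumption for portability, not real sysfs).
with tempfile.TemporaryDirectory() as tmp:
    stand_in = Path(tmp) / "battery_care_limit"
    for line in check_values(stand_in, [80, 90, 100]):
        print(line)
```

With the behaviour reported in this thread, a run against the real attribute would be expected to show a "read back 0" line for each value, since the writes succeed but the attribute always reads as 0.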
here you go [ 91.683363] ACPI: battery: extension unregistered: LG Battery Extension [ 101.929141] ACPI Error: No handler for Region [XIN1] (0000000017e2a6aa) [UserDefinedRegion] (20240322/evregion-126) [ 101.929161] ACPI Error: Region UserDefinedRegion (ID=143) has no handler (20240322/exfldio-261) [ 101.929172] ACPI Error: Aborting method \_SB.PC00.LPCB.LGEC.SEN2._TMP due to previous error (AE_NOT_EXIST) (20240322/psparse-529) [ 101.932559] ACPI Error: No handler for Region [XIN1] (0000000017e2a6aa) [UserDefinedRegion] (20240322/evregion-126) [ 101.932574] ACPI Error: Region UserDefinedRegion (ID=143) has no handler (20240322/exfldio-261) [ 101.932583] ACPI Error: Aborting method \_SB.PC00.LPCB.LGEC.SEN1._TMP due to previous error (AE_NOT_EXIST) (20240322/psparse-529) [ 151.415180] ACPI Error: No handler for Region [XIN1] (0000000017e2a6aa) [UserDefinedRegion] (20240322/evregion-126) [ 151.415196] ACPI Error: Region UserDefinedRegion (ID=143) has no handler (20240322/exfldio-261) [ 151.415212] ACPI Error: Aborting method \_SB.PC00.LPCB.LGEC.SEN2._TMP due to previous error (AE_NOT_EXIST) (20240322/psparse-529) [ 151.418485] ACPI Error: No handler for Region [XIN1] (0000000017e2a6aa) [UserDefinedRegion] (20240322/evregion-126) [ 151.418496] ACPI Error: Region UserDefinedRegion (ID=143) has no handler (20240322/exfldio-261) [ 151.418504] ACPI Error: Aborting method \_SB.PC00.LPCB.LGEC.SEN1._TMP due to previous error (AE_NOT_EXIST) (20240322/psparse-529) [ 216.342349] lg_laptop: loading out-of-tree module taints kernel. 
[ 216.342359] lg_laptop: module verification failed: signature and/or required key missing - tainting kernel [ 216.343744] lg_laptop: product: 16ZB90R-G.AA75G year: 2019 [ 216.344540] input: LG WMI hotkeys as /devices/virtual/input/input18 [ 216.344631] ACPI: battery: new extension: LG Battery Extension [ 224.799863] ACPI Error: No handler for Region [XIN1] (0000000017e2a6aa) [UserDefinedRegion] (20240322/evregion-126) [ 224.799879] ACPI Error: Region UserDefinedRegion (ID=143) has no handler (20240322/exfldio-261) [ 224.799888] ACPI Error: Aborting method \_SB.PC00.LPCB.LGEC.SEN2._TMP due to previous error (AE_NOT_EXIST) (20240322/psparse-529) [ 224.799906] thermal thermal_zone3: Unable to get temperature, disabling! [ 224.803405] ACPI Error: No handler for Region [XIN1] (0000000017e2a6aa) [UserDefinedRegion] (20240322/evregion-126) [ 224.803428] ACPI Error: Region UserDefinedRegion (ID=143) has no handler (20240322/exfldio-261) [ 224.803442] ACPI Error: Aborting method \_SB.PC00.LPCB.LGEC.SEN1._TMP due to previous error (AE_NOT_EXIST) Thanks, but i want the _full_ dmesg output, all of it. Created attachment 306678 [details]
DMESG DUMP
here it is
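The ACPI errors in the dmesg excerpt earlier in this thread repeat the same few lines many times, so it can help to count which methods are being aborted and with which status. A small Python sketch, assuming the "Aborting method ... due to previous error (...)" wording shown in that excerpt; the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches dmesg lines of the form shown in this thread:
#   ACPI Error: Aborting method <path> due to previous error (<status>) ...
ABORT_RE = re.compile(r"ACPI Error: Aborting method (\S+) due to previous error \((\w+)\)")


def count_aborted_methods(dmesg_text: str) -> Counter:
    """Count (method path, status) pairs over all aborted ACPI methods."""
    return Counter(match.groups() for match in ABORT_RE.finditer(dmesg_text))


# Three lines copied from the excerpt in this thread.
sample = r"""
[ 101.929172] ACPI Error: Aborting method \_SB.PC00.LPCB.LGEC.SEN2._TMP due to previous error (AE_NOT_EXIST) (20240322/psparse-529)
[ 101.932583] ACPI Error: Aborting method \_SB.PC00.LPCB.LGEC.SEN1._TMP due to previous error (AE_NOT_EXIST) (20240322/psparse-529)
[ 151.415212] ACPI Error: Aborting method \_SB.PC00.LPCB.LGEC.SEN2._TMP due to previous error (AE_NOT_EXIST) (20240322/psparse-529)
"""

for (method, status), count in count_aborted_methods(sample).most_common():
    print(count, status, method)
```

Fed the full dmesg text instead of the sample, the same function gives a quick overview of which ACPI methods fail and how often.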
The dmesg output shows that the Embedded Controller initialization fails:
[ 0.250239] ACPI BIOS Error (bug): Could not resolve symbol [\_TZ.FN00._OFF], AE_NOT_FOUND (20240322/psargs-330)
[ 0.250254] ACPI Error: Aborting method \_SB.PC00.LPCB.H_EC.EREG due to previous error (AE_NOT_FOUND) (20240322/psparse-529)
[ 0.250260] ACPI Error: Aborting method \_SB.PC00.LPCB.H_EC._REG due to previous error (AE_NOT_FOUND) (20240322/psparse-529)
It also seems that your system contains two embedded controllers at the same ioport addresses. I think this causes the issue, as both controllers will constantly mess with each other. Can you check if a BIOS update is available for your device?
This is the BIOS for my machine, and there doesn't seem to be any update available:
BIOS Information
Vendor: Phoenix Technologies Ltd.
Version: R4ZH0340 X64
Release Date: 01/03/2023
Address: 0xE0000
Runtime Size: 128 kB
ROM Size: 16 MB
Characteristics:
  PCI is supported
  BIOS is upgradeable
  BIOS shadowing is allowed
  Boot from CD is supported
  Selectable boot is supported
  EDD is supported
  Print screen service is supported (int 5h)
  8042 keyboard services are supported (int 9h)
  Serial services are supported (int 14h)
  Printer services are supported (int 17h)
  CGA/mono video services are supported (int 10h)
  ACPI is supported
  USB legacy is supported
  BIOS boot specification is supported
  Function key-initiated network boot is supported
  Targeted content distribution is supported
  UEFI is supported
BIOS Revision: 0.1
Firmware Revision: 34.3
That is not good. Do you have a Windows installation on this device that you can dual-boot into? If yes, then maybe you can use the LG Update software to look for BIOS updates. Also can you share the output of "ls /sys/bus/platform/devices/"?
Unfortunately not at the moment. AFAIK, from looking at the LG website, there doesn't appear to be an update there either.
ls /sys/bus/platform/devices/
ACPI0003:01 ACPI0007:17 ACPI0007:1f ACPI0007:27 ACPI0007:2f ACPI0007:37 ACPI0007:3f i2c_designware.1 INTC10A1:01 LGEX0821:00 PNP0C0C:01 rtc-efi.0
ACPI0007:10 ACPI0007:18 ACPI0007:20 ACPI0007:28 ACPI0007:30 ACPI0007:38 ACPI000C:00 i8042 INTC10A1:02 lg-laptop PNP0C0D:01 serial8250
ACPI0007:11 ACPI0007:19 ACPI0007:21 ACPI0007:29 ACPI0007:31 ACPI0007:39 acpi-cpufreq idma64.0 INTC10A3:00 microcode PNP0C0E:00 skl_hda_dsp_generic
ACPI0007:12 ACPI0007:1a ACPI0007:22 ACPI0007:2a ACPI0007:32 ACPI0007:3a alarmtimer.0.auto idma64.1 INTC6001:00 pcspkr PNP0C14:00 snd-soc-dummy
ACPI0007:13 ACPI0007:1b ACPI0007:23 ACPI0007:2b ACPI0007:33 ACPI0007:3b coretemp.0 INT33A1:00 intel_rapl_msr.0 PNP0103:00 PNP0C14:01 USBC000:00
ACPI0007:14 ACPI0007:1c ACPI0007:24 ACPI0007:2c ACPI0007:34 ACPI0007:3c dmic-codec INTC1048:00 iTCO_wdt PNP0C09:01 PNP0C14:02
ACPI0007:15 ACPI0007:1d ACPI0007:25 ACPI0007:2d ACPI0007:35 ACPI0007:3d efivars.0 INTC1055:00 LGEX0815:00 PNP0C0A:03 reg-dummy
ACPI0007:16 ACPI0007:1e ACPI0007:26 ACPI0007:2e ACPI0007:36 ACPI0007:3e i2c_designware.0 INTC10A0:00 LGEX0820:00 PNP0C0B:00 regulatory.0
I could try to restore the Windows installation that came with the machine to see if an update is available, but I really would like to avoid that if there really is no update available.
I just came across this. Looks like LG doesn't have a good track record for updating the BIOS: https://www.reddit.com/r/LGgram/comments/17uliaa/is_the_lack_of_bios_updates_a_security_concern/
It seems that your machine indeed has two active embedded controllers using the same ioport addresses. This is definitely a BIOS bug: there should be a single embedded controller, not two of them. And since only LG can fix their BIOS, we can just hope that they provide a BIOS update in the future.
Since you said that those features worked in the past, you can try bisecting again to find the faulty commit (what was the last working kernel version?).
For the remaining ACPI error messages: i think i found a solution, just give me some time to come up with another prototype. Created attachment 306687 [details]
lg-laptop driver with improved error handling for the WMBB method and support for the custom operation region
This prototype driver should fix most of the ACPI errors. However, it needs to be loaded during boot. Can you try to replace the preinstalled lg-laptop module with this module and report back if everything works now (and share the full dmesg log of the boot process)?
(In reply to Armin Wolf from comment #48)
> This prototype driver should fix most of the ACPI errors. However it needs
> to be loaded during boot. Can you try to replace the preinstalled lg-laptop
> module with this module and report back if everything works now (and share
> the full dmesg log of the boot process)?.
I tried to replace the preinstalled module with your fix, but the kernel stops loading at this point:
```
1 (1 of 2) A start job is running for /boot (26s/1min 31s)*
[* ] (2 of 2) A start job is running for Load/Save Screen Backlight Brightness of leds:kbd_backlight (30s / 1min 30s)
```
To overwrite the module, I did the following:
1. zstd lg-laptop.ko
2. sudo cp -f lg-laptop.ko.zst /usr/lib/modules/$(uname -r)/kernel/drivers/platform/x86/
3. sudo depmod
Strange, can you try to load this module after booting the kernel without the modified lg-laptop module and send me the output of dmesg?
Created attachment 306694 [details]
DMESG DUMP 2
Here you go. I also had to manually load the "battery", "sparse_keymap" and "wmi" modules before the module would load.
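The manual loading is needed because insmod, unlike modprobe, does not resolve dependencies on its own. The dependency list of a module file can be printed with `modinfo -F depends lg-laptop.ko`, and a short Python sketch can turn that field into the commands to run. The `depends` string in the demo is a hypothetical example of such output, not taken from the prototype driver, and the function names are made up for illustration.

```python
from typing import List


def dependency_names(depends_field: str) -> List[str]:
    """Split the comma-separated 'depends' field as printed by modinfo."""
    return [name.strip() for name in depends_field.split(",") if name.strip()]


def load_commands(module_path: str, depends_field: str) -> List[List[str]]:
    """Build (but do not run) the modprobe and insmod commands in load order."""
    commands = [["modprobe", dep] for dep in dependency_names(depends_field)]
    commands.append(["insmod", module_path])
    return commands


# Hypothetical modinfo output for illustration only.
depends = "wmi,sparse-keymap,battery\n"
for command in load_commands("lg-laptop.ko", depends):
    print("sudo " + " ".join(command))
```

Copying the module into the kernel's module directory, running depmod and then using modprobe avoids this step, because modprobe loads the dependencies automatically.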
Created attachment 306695 [details]
lg-laptop driver with improved error handling for the WMBB method and support for the custom operation region v2
Can you try this again? I fixed an error i made while adding opregion support to the driver. Created attachment 306696 [details]
DMESG DUMP 3
I was able to boot with your fix
Created attachment 306697 [details]
lg-laptop driver with improved error handling for the WMBB method and support for the custom operation region v3
Seems i misunderstood how the opregion handler receives data from the ACPI interpreter. Can you try again? This time much of the ACPI error messages should be gone. Created attachment 306698 [details]
DMESG DUMP 4
Perfect, most of the ACPI errors are gone. I will add a proper interface for enabling/disabling the firmware debug messages, then I will send another prototype module for you to test. If this works, then I can submit the necessary patch upstream. Regarding the EC issues: I will try to contact the maintainer of the ACPI EC driver, maybe he can help us in that regard.
(In reply to Armin Wolf from comment #58)
> Perfect, most of the ACPI errors are gone. I will add a proper interface for
> enabling/disabling the firmware debug messages, then i will send another
> prototype module for you to test.
>
> If this works, then i can submit the necessary patch upstream.
>
> Regarding the EC issues: i will try to contact the maintainer of the ACPI EC
> driver, maybe he can help us in that regard.
Sure thing. Let me know when you are ready and I will be more than happy to test out your patch. It would also be great if you could contact the maintainer of the ACPI EC driver. Thanks!
Created attachment 306701 [details]
lg with opregion support only
Can you test if this driver works? Can you also load the driver with the fw_debug module parameter set to 1 and check if there are any messages like "X temperature is X °C"? Created attachment 306702 [details]
DMESG DUMP 5
I am not able to load fw_debug, but I no longer see any messages related to temperature in dmesg.
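For reference, fw_debug in the request above is a module parameter of the prototype driver rather than a module of its own, so it is passed as name=value when the driver is loaded. The Python sketch below builds that command line and filters dmesg text for the temperature messages; the regular expression is an assumption based on the "X temperature is X °C" wording in the request, and the sample log lines are hypothetical.

```python
import re
from typing import Dict, List

# Assumed message shape, derived from "X temperature is X °C" in the request.
TEMPERATURE_RE = re.compile(r"temperature is\s+-?\d+")


def insmod_command(module_path: str, params: Dict[str, object]) -> List[str]:
    """Build an insmod command line with name=value module parameters."""
    return ["insmod", module_path] + [f"{name}={value}" for name, value in params.items()]


def temperature_messages(dmesg_text: str) -> List[str]:
    """Return the dmesg lines that look like firmware temperature messages."""
    return [line for line in dmesg_text.splitlines() if TEMPERATURE_RE.search(line)]


print("sudo " + " ".join(insmod_command("lg-laptop.ko", {"fw_debug": 1})))

# Hypothetical log lines for illustration only.
log = "[ 1.0] lg_laptop: CPU temperature is 45 °C\n[ 1.1] ACPI: EC: EC started\n"
print(temperature_messages(log))
```

The first print shows `sudo insmod lg-laptop.ko fw_debug=1`, which is the form of the command the request refers to.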
Ok, I think we can assume that the fw_debug module parameter works. I will submit the necessary patch upstream then. Regarding the ACPI EC driver maintainer: he is already aware of similar problems on different machines. I will notify him of this bug report.
Thank you Armin. Regarding the ACPI bug, will you open up another ticket for this that I can track?
There seems to be a thread on the ACPI mailing list, so I will use that.
You can find the thread at: https://lore.kernel.org/linux-acpi/CAJZ5v0gAdAYvx=qwmQd9_tUc-d=LJW5KDzLns2eDDn=ZtCQCMw@mail.gmail.com/T/#t
Some experimental patches are available, but you have to compile a whole kernel to test them.
Created attachment 306710 [details]
DMESG DUMP - Patch
(In reply to Armin Wolf from comment #66)
> You can find the thread at:
>
> https://lore.kernel.org/linux-acpi/CAJZ5v0gAdAYvx=qwmQd9_tUc-
> d=LJW5KDzLns2eDDn=ZtCQCMw@mail.gmail.com/T/#t
>
> Some experimental patches are available, but you have to compile a whole
> kernel to test them.
Hi Armin,
I tried out the patch from Rafael, and it does not seem to fix the issue:
uname -a
Linux gramC 6.11.0-rc2-gde9c2c66ad8e #1 SMP PREEMPT_DYNAMIC Sat Aug 10 08:53:16 CEST 2024 x86_64 GNU/Linux
(In reply to Chris from comment #67)
> Created attachment 306710 [details]
> DMESG DUMP - Patch
>
> (In reply to Armin Wolf from comment #66)
> > You can find the thread at:
> >
> > https://lore.kernel.org/linux-acpi/CAJZ5v0gAdAYvx=qwmQd9_tUc-
> > d=LJW5KDzLns2eDDn=ZtCQCMw@mail.gmail.com/T/#t
> >
> > Some experimental patches are available, but you have to compile a whole
> > kernel to test them.
>
> Hi Armin,
>
> I tried the patch out from raphael, and it does not seem to fix the issue:
>
> uname -a
> Linux gramC 6.11.0-rc2-gde9c2c66ad8e #1 SMP PREEMPT_DYNAMIC Sat Aug 10
> 08:53:16 CEST 2024 x86_64 GNU/Linux
Thank you for building a whole kernel to test.
Are you sure you applied all 3 patches from: https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/log/?h=acpi-ec-fixes ?
Maybe you can just build a kernel from the HEAD of that branch (commit id c3f92dfc4d863b88f3c3c80ca416dbae8449165d)?
The reason I doubt that all 3 patches made it into your test kernel is that your dmesg still shows these errors:
[ 0.239044] ACPI: EC: EC started
[ 0.239045] ACPI: EC: interrupt blocked
[ 0.239787] ACPI BIOS Error (bug): Could not resolve symbol [\_TZ.FN00._OFF], AE_NOT_FOUND (20240322/psargs-330)
[ 0.239794] fbcon: Taking over console
[ 0.239800] ACPI Error: Aborting method \_SB.PC00.LPCB.H_EC.EREG due to previous error (AE_NOT_FOUND) (20240322/psparse-
[ 0.239805] ACPI Error: Aborting method \_SB.PC00.LPCB.H_EC._REG due to previous error (AE_NOT_FOUND) (20240322/psparse-
[ 0.239820] ACPI: EC: EC_CMD/EC_SC=0x66, EC_DATA=0x62
[ 0.239822] ACPI: \_SB_.PC00.LPCB.LGEC: Boot DSDT EC used to handle transactions
Based on debugging done in: https://bugzilla.redhat.com/show_bug.cgi?id=2302253 I now believe that the root cause of the recent ACPI regressions on some laptop models, such as some LG Gram laptops, is that the ACPI tables contain 2 ACPI devices describing an embedded controller (EC), H_EC and LGEC, with only the LGEC one actually supposed to get used by the OS.
So the mentioned branch contains a patch to make sure that _REG is only called on the active embedded controller device. Your dmesg shows that LGEC is the active EC (expected), yet it still shows errors from _REG getting called on the H_EC device, which is unexpected after applying the patches.
It would also be good to verify that, just like on the LG Gram from: https://bugzilla.redhat.com/show_bug.cgi?id=2302253 yours too only has the LGEC ACPI device marked as being actually present on the system and not the H_EC one.
Can you please from a terminal run:
cat /sys/bus/acpi/devices/PNP0C09\:00/path
cat /sys/bus/acpi/devices/PNP0C09\:00/status
cat /sys/bus/acpi/devices/PNP0C09\:01/path
cat /sys/bus/acpi/devices/PNP0C09\:01/status
and let me know the output of all 4 commands, please also copy and paste the actual commands from the terminal into your next comment here in bugzilla so that I can easily match up the output to each command.
Here is an example of the output of this on the LG Gram laptop from the Red Hat bug:
Here it is, with kernel 6.9.6:
$ cat /sys/bus/acpi/devices/PNP0C09\:00/path
\_SB_.PC00.LPCB.H_EC
$ cat /sys/bus/acpi/devices/PNP0C09\:00/status
0
$ cat /sys/bus/acpi/devices/PNP0C09\:01/path
\_SB_.PC00.LPCB.LGEC
$ cat /sys/bus/acpi/devices/PNP0C09\:01/status
15
If possible it would also be good if you can confirm that 6.9.6 is not affected by the problems you were seeing after upgrading to 6.9.10, since 6.9.7 is the first kernel making the extra unwanted _REG call.
(In reply to Hans de Goede from comment #69)
> It would also good to verify that just like on the LG Gram from:
> https://bugzilla.redhat.com/show_bug.cgi?id=2302253
>
> Yours to only has the LGEC ACPI device marked as being actually present on
> the system and not the H_EC one.
>
> Can you please from a terminal run:
>
> cat /sys/bus/acpi/devices/PNP0C09\:00/path
> cat /sys/bus/acpi/devices/PNP0C09\:00/status
> cat /sys/bus/acpi/devices/PNP0C09\:01/path
> cat /sys/bus/acpi/devices/PNP0C09\:01/status
>
> and let me know the output of all 4 commands, please also copy and paste the
> actual commands from the terminal into your next comment here in bugzilla so
> that I can easily match up the output to each command.
> > Here is an example of the output of this on the LG Gram laptop from the Red > Hat bug: > > Here it is, with kernel 6.9.6: > > $ cat /sys/bus/acpi/devices/PNP0C09\:00/path > \_SB_.PC00.LPCB.H_EC > $ cat /sys/bus/acpi/devices/PNP0C09\:00/status > 0 > $ cat /sys/bus/acpi/devices/PNP0C09\:01/path > \_SB_.PC00.LPCB.LGEC > $ cat /sys/bus/acpi/devices/PNP0C09\:01/status > 15 > > If possible it would also be good if you can confirm that 6.9.6 is not > affected by the problems you were seeing after upgrading to 6.9.10, since > 6.9.7 is the first kernel making the extra unwanted _REG call. Here you go: $ cat /sys/bus/acpi/devices/PNP0C09\:00/path \_SB_.PC00.LPCB.H_EC $ cat /sys/bus/acpi/devices/PNP0C09\:00/status 0 $ cat /sys/bus/acpi/devices/PNP0C09\:00/status 0 $ cat /sys/bus/acpi/devices/PNP0C09\:01/status 15 I can also confirm that 6.9.6 is not affected by this bug. Do you also want me to check 6.9.7? (In reply to Chris from comment #70) > Here you go: > > $ cat /sys/bus/acpi/devices/PNP0C09\:00/path > \_SB_.PC00.LPCB.H_EC > $ cat /sys/bus/acpi/devices/PNP0C09\:00/status > 0 > $ cat /sys/bus/acpi/devices/PNP0C09\:00/status > 0 > $ cat /sys/bus/acpi/devices/PNP0C09\:01/status > 15 Note the above is missing the output of: cat /sys/bus/acpi/devices/PNP0C09\:01/path but I'm sure that will point to a LGEC ACPI device so this indeed appears to be the same issue as: https://bugzilla.redhat.com/show_bug.cgi?id=2302253 > I can also confirm that 6.9.6 is not affected by this bug. Do you also want > me to check 6.9.7? If 6.9.6 works and 6.9.10 does not then I'm pretty sure 6.9.7 will be the first broken kernel, so it is not really necessary to test 6.9.7 . Did you get a chance to double-check that your test kernel has all 3 patches from: https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/log/?h=acpi-ec-fixes those are the patches which should fix this. Created attachment 306717 [details]
DMESG with patches applied
Sorry about not including the output:
$ cat /sys/bus/acpi/devices/PNP0C09\:01/path
\_SB_.PC00.LPCB.LGEC
As for trying the patches, I compiled the kernel from the repository you sent me, but unfortunately I am still not able to change any of the driver values
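For anyone repeating the sysfs check requested above, the four reads can be collected in one pass. This is only a minimal sketch and was not posted in this bug: the function names `decode_sta` and `report_ec_devices` are made up for illustration, and the flag names are shorthand for the low four bits of the ACPI _STA value (present, enabled, shown in UI, functioning). It assumes the status file holds a decimal number, as in the output pasted in this thread.

```shell
#!/bin/sh
# Decode an ACPI _STA value (decimal, as printed in the thread) into
# shorthand flag names for its low four bits.
decode_sta() {
    sta=$1
    flags=""
    if [ $((sta & 1)) -ne 0 ]; then flags="$flags present"; fi
    if [ $((sta & 2)) -ne 0 ]; then flags="$flags enabled"; fi
    if [ $((sta & 4)) -ne 0 ]; then flags="$flags visible"; fi
    if [ $((sta & 8)) -ne 0 ]; then flags="$flags functioning"; fi
    if [ -z "$flags" ]; then flags=" absent"; fi
    # Drop the leading space left by the concatenation above.
    echo "${flags# }"
}

# Print the ACPI path and decoded status of every PNP0C09 (EC) device.
# The optional argument overrides the sysfs directory (hypothetical
# convenience, useful for trying the script on a copied tree).
report_ec_devices() {
    base=${1:-/sys/bus/acpi/devices}
    for dev in "$base"/PNP0C09:*; do
        # Skip unmatched globs and unreadable entries.
        [ -r "$dev/path" ] || continue
        path=$(cat "$dev/path")
        if [ -r "$dev/status" ]; then
            sta=$(cat "$dev/status")
            printf '%s %s status=%s (%s)\n' \
                "${dev##*/}" "$path" "$sta" "$(decode_sta "$sta")"
        else
            # Assumption: some devices may not expose a status file.
            printf '%s %s status=n/a\n' "${dev##*/}" "$path"
        fi
    done
}

report_ec_devices
```

With the values pasted in this thread (0 for H_EC, 15 for LGEC), the sketch should label H_EC as absent and LGEC as present, enabled, visible and functioning.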
> As for trying the patches, I compiled the kernel from the repository you sent
> me, but unfortunately I am still not able to change any of the driver values

Are you sure you booted the right kernel? The attached dmesg from comment 72 has the exact same version "6.11.0-rc3-g7c626ce4bae1" as the one from comment 67.

The reporter of:
https://bugzilla.redhat.com/show_bug.cgi?id=2302253
has tested a Fedora 6.10.4 kernel on their LG Gram laptop with all 3 patches from:
https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/log/?h=acpi-ec-fixes
added to it, and for them these errors disappeared from their log:

[ 0.239787] ACPI BIOS Error (bug): Could not resolve symbol [\_TZ.FN00._OFF], AE_NOT_FOUND (20240322/psargs-330)
[ 0.239794] fbcon: Taking over console
[ 0.239800] ACPI Error: Aborting method \_SB.PC00.LPCB.H_EC.EREG due to previous error (AE_NOT_FOUND) (20240322/psparse-
[ 0.239805] ACPI Error: Aborting method \_SB.PC00.LPCB.H_EC._REG due to previous error (AE_NOT_FOUND) (20240322/psparse-

So at a minimum I would expect the patches to make these errors go away for you too.

Created attachment 306720 [details]
DMESG with patches applied
Sorry about that. I downloaded and compiled the commit directly. The issues seem to have disappeared.
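The two checks discussed above (was the right kernel booted, and are the H_EC errors gone) can be scripted. The following is a rough sketch under stated assumptions, not a tool referenced in this bug: `release_matches_commit` and `has_h_ec_errors` are hypothetical helper names, and the first one only works when the kernel release string carries a -g<hash> suffix like the "6.11.0-rc3-g7c626ce4bae1" version quoted earlier.

```shell
#!/bin/sh
# Succeed if the release string ends in a -g<hash> suffix and that hash
# is a prefix of the expected commit id.
release_matches_commit() {
    release=$1
    commit=$2
    case $release in
        *-g*) hash=${release##*-g} ;;
        *) return 1 ;;
    esac
    # Cut anything after the hex digits (for example a "-dirty" suffix).
    hash=${hash%%[!0-9a-f]*}
    # Require a plausible abbreviated hash to limit false matches.
    [ "${#hash}" -ge 7 ] || return 1
    case $commit in
        "$hash"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeed if a saved dmesg log still contains the method-abort errors
# for the H_EC device that were quoted in this thread.
has_h_ec_errors() {
    grep -q 'Aborting method .*\.H_EC\.' "$1"
}

# Example use (commented out; the log file name is a placeholder):
#   release_matches_commit "$(uname -r)" \
#       c3f92dfc4d863b88f3c3c80ca416dbae8449165d || echo "wrong kernel?"
#   has_h_ec_errors dmesg.txt && echo "H_EC errors still present"
```

The release check is deliberately strict: a release string without a -g<hash> suffix is reported as a mismatch rather than guessed at.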
> I downloaded and compiled the commit directly. The issues seem to have
> disappeared.

Great, thank you for confirming that the patches from:
https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/log/?h=acpi-ec-fixes
resolve this issue.

These patches are on their way to being merged into Torvalds' master branch, and after that they should get backported to 6.9.y and 6.10.y soonish.

I don't know if this is related, but I have a similar issue where the lg-laptop driver no longer works after 6.6.30.

Display brightness fn keys work on 6.6.30, but do not work on 6.6.37.
Battery charge limit 80% works on 6.6.30, but does not work on 6.6.37.

LG Gram 17 laptop 13th gen.

However, I do not see the ACPI BIOS Errors.

I reported this here first: https://bugs.gentoo.org/938338

I checked the lg-laptop driver source code in the two kernel versions and no change has happened there, so it must be something else that changed in the kernel code.

(In reply to Benny Lønstrup Ammitzbøll from comment #76)
> I don't know if this is related, but I have a similar issue where the
> lg-laptop driver no longer works after 6.6.30.

6.6.y also has the ACPI EC patches causing this issue, so yes, you are likely seeing the same issue.

The 6.6 stable queue has the fixes for this, so this should be fixed in 6.6.48 once it is released.

(In reply to Hans de Goede from comment #77)
> (In reply to Benny Lønstrup Ammitzbøll from comment #76)
> > I don't know if this is related, but I have a similar issue where the
> > lg-laptop driver no longer works after 6.6.30.
>
> 6.6.y also has the ACPI EC patches causing this issue, so yes you are likely
> seeing the same issue.
>
> The 6.6 stable queue has the fixes for this, so this should be fixed in
> 6.6.48 once it is released.

I can confirm it is fixed in 6.6.52, which I have just upgraded to.
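For 6.6.y users who want to know whether their kernel should already carry the fix, here is a small sketch. It takes 6.6.48 as the threshold only because that is the release the fix was expected in according to the comment above (the thread itself only confirms 6.6.52); the helper names are hypothetical, and `sort -V` is assumed to be available (it is in GNU coreutils). No claim is made for other stable series, since the thread gives no fixed version for them.

```shell
#!/bin/sh
# Succeed if version $1 is at or above version $2, using version sort.
version_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Succeed if a 6.6.y release is at or above 6.6.48 (assumed threshold).
# Returns 2 for any other series, where this sketch makes no claim.
fixed_in_6_6_series() {
    # Strip a distribution suffix such as "-gentoo" before comparing.
    ver=${1%%-*}
    case $ver in
        6.6.*) version_at_least "$ver" 6.6.48 ;;
        *) return 2 ;;
    esac
}

# Example use (commented out):
#   fixed_in_6_6_series "$(uname -r)" && echo "should include the EC fixes"
```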