Bug 218444
Summary: | Lenovo Legion 9i Audio, TrackPad, Battery not detected | ||
---|---|---|---|
Product: | Drivers | Reporter: | Arul (jane.arul88) |
Component: | PCI | Assignee: | Bjorn Helgaas (bjorn) |
Status: | RESOLVED CODE_FIX | ||
Severity: | high | CC: | aman06, andy.shevchenko, bjorn, jwrdegoede, linux, lmcarneiro91, mateusz.kaduk |
Priority: | P1 | ||
Hardware: | Intel | ||
OS: | Linux | ||
Kernel Version: | 6.6.13-200.fc39.x86_64 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | 07eab0901ede ("efi/x86: Remove EfiMemoryMappedIO from E820 map") |
Attachments: |
Lenovo Legion 9i Audio, TrackPad, Battery not detected
dmesg output on kernel 6.7.3 dmesg output on kernel 6.8.0 rc2 partial-dmesg-Legion9-16IRX8-83AG dmesg output on kernel 6.5.0-15-generic acpi dump Proposed workaround v1 (keep_efi_e820) dmesg-keep dmesg-nokeep Diff of dmesg-keep vs dmesg-nokeep patch to skip E820 check dmesg-patch-mcfg-e820 attachment-5325-0.html test patch to skip early E820 check for Ubuntu-6.1.0-15.15 After patch, dmesg from Ubuntu LTS 24.04 LTS |
Description
Arul
2024-02-01 10:41:38 UTC
Please give 6.7.3 or/and 6.8-rc2 a try.
(In reply to Artem S. Tashkinov from comment #1)
> Please give 6.7.3 or/and 6.8-rc2 a try.
Hello Artem S. Tashkinov, thanks for your comment. Unfortunately it did not work on 6.7.3-250 (6.7.3-250.vanilla.fc39.x86_64). Thanks, Arul
I have the same issue with the touchpad on Linux (Debian); it works on Windows, where the driver is hid-i2c. I tried 6.6.13, 6.7.3 and 6.8rc2; none of these work, and there are lots of ACPI-related errors when booting, as well as an IRQ 11 error and a kernel taint regarding the I2C bus. This is the Lenovo Legion 9 16IRX8, model number 83AG (so likely the same machine). Bottom line: 6.8rc2 also fails with the same errors.
Dear Artem S. Tashkinov, I have tried the same on 6.8.0-0.rc2.20240201gt6764c317.221.vanilla.fc39.x86_64; it did not work. Thanks, Arul
Created attachment 305809 [details]
dmesg output on kernel 6.7.3
Created attachment 305810 [details]
dmesg output on kernel 6.8.0 rc2
Created attachment 305811 [details]
partial-dmesg-Legion9-16IRX8-83AG
Because I had to nuke Debian, as it was not usable, I am attaching partial dmesg from Live Debian, as it has a few more errors than Arul's dmesg. I allow myself to attach dmesg from 6.5 version because I know these errors were exactly the same ones up to 6.8rc2 when I had Debian installed. Created attachment 305813 [details]
dmesg output on kernel 6.5.0-15-generic
Hi All, I am also in the same boat with a Lenovo Legion 9i that does not have working audio, trackpad, battery, and in my case wifi card. The wifi card does seem to be recognized; it just seems like there are no Linux drivers for it. It's a Mediatek 79270.
I have never tried other Linux distros other than Ubuntu, so I am on Ubuntu 23.10 and Linux kernel 6.5.0-15-generic.
I am attaching my dmesg as well. If there's anything else I can provide let me know.
Some additional information: I ran the Lenovo Linux diagnostics tool, which apparently has kernel 6.1; there I could use the touchpad and everything seemed to work fine. I saw that on boot it loaded the i2c_designware driver, which is not loaded by my Debian Live for some reason, potentially because of the ACPI problems? The vendor:product ID of the touchpad is 04F3:32BC.
Created attachment 305842 [details]
acpi dump
Adding ACPI dump as file
Hey there! I'm having the same issues with this laptop on my end. I'm relatively new to Linux, but are there any more outputs I can capture on my machine to help with getting a resolution?
Dear Developers, I've come across some errors and have a few questions:
1. Are ACPI errors critical for the laptop's operation, or can they be safely ignored or silenced?
2. Should touchpad, battery, and speaker issues be reported elsewhere, or is this the right place?
3. Are these problems related to each other and caused by some BIOS handling, or are they separate issues to be filed somewhere else? If so, where?
Additionally, I've encountered IRQ errors from the i2c-i801 driver (see dmesg), which I resolved by configuring the i2c-i801 driver not to use interrupts (https://www.kernel.org/doc/html/latest/i2c/busses/i2c-i801.html). This probably hurts performance, but the driver loads now:
echo "options i2c-i801 disable_features=0x10" >> /etc/modprobe.d/i2c-i801.conf
However, I'm still facing the error "azx_interrupt [snd_hda_codec] Disabling IRQ #10." Any suggestions on how to address this without resorting to disabling interrupts system-wide with irqpoll, as that's going to hurt other drivers? Thanks for your help.
Quick follow-up regarding the snd_hda_codec interrupt error: I was able to resolve it by re-enabling MSI interrupt handling:
options snd-hda-intel enable_msi=1
As explained here, it is disabled for some nvidia configurations: https://www.kernel.org/doc/html/v4.13/sound/hd-audio/notes.html#interrupt-handling
The built-in speaker still does not work (only external headphones), but the interrupt issue is not showing up anymore. Battery status is still not showing up:
acpi -V
Battery 0: Discharging, 0%, rate information unavailable
My guess is that the touchpad and speaker might be separate issues, but the battery should be ACPI related.
Follow-up: apparently this TouchPad device was added to intel-lpss in https://lore.kernel.org/all/20220211145055.992179-1-jarkko.nikula@linux.intel.com/ as indicated by the line:
+ { PCI_VDEVICE(INTEL, 0x7a7d), (kernel_ulong_t)&bxt_i2c_info },
However, dmesg | grep lpss shows:
intel-lpss: probe of 0000:00:19.0 failed with error -22
intel-lpss 0000:00:19.1: enabling device (0004 -> 0006)
intel-lpss 0000:00:19.1: can't derive routing for PCI INT B
intel-lpss 0000:00:19.1: PCI INT B: not connected
intel-lpss: probe of 0000:00:19.1 failed with error -22
Bus 0000:00:19.1 is where the Windows driver and Lenovo's Linux diagnostic tool attach. Maybe it's possible to follow https://lwn.net/Articles/143397/ to unbind 0000:19.1 from the intel-lpss-pci driver and bind it to the i2c_designware driver, but I don't know yet how to do that. Anyway, I am not sure if this thread gets the required attention from developers, or anyone able to help, so maybe switching back to the Lenovo forum can get us some help?
(In reply to Mateusz Kaduk from comment #15)
> Follow up, apparently this TouchPad device
It's not a TouchPad, it's an I²C host controller that most likely has that device (TP) connected to it.
> Anyways, I am sure sure if this thread gets required attention from
> developers, or anyone able to help, so maybe switching back to Lenovo forum
> can get us some help?
For that you need to add the correct people to the list. This bug report seems like a mess. Please don't add more to it related to routing of IRQ pins on LPSS devices; that is what bug 212261 is for.
I've been working with a Debian user with a Legion 9i to figure this out.
We obtained the 6.1.60 kernel config from Lenovo and did a git bisect v6.2..v6.1 and found the cause is:
commit 07eab0901ede8b7540c52160663bd300cc238164
Author: Bjorn Helgaas <bhelgaas@google.com>
AuthorDate: Thu Dec 8 13:03:38 2022 -0600
Commit: Bjorn Helgaas <bhelgaas@google.com>
CommitDate: Sat Dec 10 10:31:42 2022 -0600
efi/x86: Remove EfiMemoryMappedIO from E820 map
Firmware can use EfiMemoryMappedIO to request that MMIO regions be mapped by the OS so they can be accessed by EFI runtime services, but should have no other significance to the OS (UEFI r2.10, sec 7.2).
However, most bootloaders and EFI stubs convert EfiMemoryMappedIO regions to E820_TYPE_RESERVED entries, which prevent Linux from allocating space from them (see remove_e820_regions()).
I'll email Bjorn and linux-pci@ with the info and see if we can get a fix.
Thank you, TJ, for your invaluable assistance yesterday, especially with the bisecting and testing guidance! You also kindly shared a preliminary patch that effectively addresses the issue, confirming it as a regression. Post-patching, here's where we stand:
- The touchpad is operational
- Battery status is now accurately displayed
- The speaker's driver recognizes the device, though there's a hiccup with firmware loading, hinting at potential success soon
+Cc: Hans. IIRC you were involved in the saga of e820 EFI memory resources.
Created attachment 306132 [details]
Proposed workaround v1 (keep_efi_e820)
This is a coarse solution that completely disables the regression change when "keep_efi_e820" is on the kernel command line. Once upstream has had a chance to review this there may be better, more nuanced ways to do this instead of hard-coding an arbitrary size limit.
For those wanting to fix the sound firmware issue (firmware file is in linux-firmware repo) see also: https://lore.kernel.org/linux-sound/ecf0f3a0-b83d-4aea-9fcb-8c411598f833@iam.tj/
Some additional notes from drilling down into PCI_Config from the first kernel log attached to this bug:
BIOS-e820: [mem 0x00000000c0000000-0x00000000cfffffff] reserved
...
efi: Remove mem97: MMIO range=[0xc0000000-0xcfffffff] (256MB) from e820 map
...
PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0xc0000000-0xcfffffff] (base 0xc0000000)
PCI: not using MMCONFIG
PCI: Using configuration type 1 for base access
...
ACPI Error: AE_ERROR, Returned by Handler for [PCI_Config] (20230628/evregion-300)
In this case MMCONFIG is in the largest mapping of 256MB. Looking at the attached ACPI table dump (thanks Arul) after disassembly, the highest offset/range into PCI_Config seems to be:
OperationRegion (PCS6, PCI_Config, 0x03E0, 0x08)
So it looks like only 0x0400 (1024 bytes) is needed. I'm not familiar with how this should be handled but naively there seem to be three approaches:
1. delay removal of the 256MB range until after it has been used (but maybe it is required later for hot-plug events?)
2. manipulate the range reservation length (but we hit the same arbitrary size choice as currently unless the MMCONFIG length is encoded in the range - possibly by a null terminating struct?)
3. remove the range after copying all or part of it somewhere else and release or shrink the copy at some point later
More digging. The MMCONFIG term was replaced in commit 704891033b9714 with the PCIe spec's term "ECAM" (Enhanced Configuration Access Mechanism). Quoting from Wikipedia:
"Each device has its own 4 KB space and each device's info is accessible through a simple array dev[bus][device][function] so that 256 MB of physical contiguous space is "stolen" for this use (256 buses × 32 devices × 8 functions × 4 KB = 256 MB). The base physical address of this array is not specified."
This explains the 256MB size.
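The 256 MB figure and the dev[bus][device][function] indexing from the Wikipedia quote can be checked with a short calculation. This is only an illustrative sketch: the helper names `ecam_size` and `ecam_offset` are made up for this example and are not kernel or library APIs, and the base address 0xc0000000 is simply the one reported in the log above.

```python
# Illustrative sketch only: ecam_size() and ecam_offset() are hypothetical
# helper names, not part of the kernel or any library.

FUNCS_PER_DEVICE = 8        # 3-bit function number
DEVICES_PER_BUS = 32        # 5-bit device number
CONFIG_SPACE_BYTES = 4096   # 4 KB of configuration space per function


def ecam_size(bus_count: int) -> int:
    """Bytes of ECAM needed to cover `bus_count` buses."""
    return bus_count * DEVICES_PER_BUS * FUNCS_PER_DEVICE * CONFIG_SPACE_BYTES


def ecam_offset(bus: int, device: int, function: int, register: int = 0) -> int:
    """Byte offset of a register inside the ECAM window (array-style indexing)."""
    if not (0 <= bus < 256
            and 0 <= device < DEVICES_PER_BUS
            and 0 <= function < FUNCS_PER_DEVICE
            and 0 <= register < CONFIG_SPACE_BYTES):
        raise ValueError("bus/device/function/register out of range")
    index = (bus * DEVICES_PER_BUS + device) * FUNCS_PER_DEVICE + function
    return index * CONFIG_SPACE_BYTES + register


if __name__ == "__main__":
    # Full 256-bus window, in MB.
    print(ecam_size(256) // (1024 * 1024), "MB for 256 buses")
    # Config space address of 0000:00:19.1 (the LPSS device mentioned earlier),
    # assuming the ECAM base 0xc0000000 from the log.
    print(hex(0xC0000000 + ecam_offset(0x00, 0x19, 1)))
```

Each bus therefore accounts for exactly 1 MB (32 × 8 × 4 KB), which is why the window size depends only on how many buses the firmware chooses to describe.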
PCIe Base Specification v6.0 says, in section 7.2.2:
The size and base address for the range of memory addresses mapped to the Configuration Space are determined by the design of the host bridge and the firmware. They are reported by the firmware to the operating system in an implementation specific manner. The size of the range is determined by the number of bits that the host bridge maps to the Bus Number field in the configuration address. In § Table 7-1, this number of bits is represented as n, where 1 ≤ n ≤ 8. A host bridge that maps n memory address bits to the Bus Number field supports Bus Numbers from 0 to 2^n - 1, inclusive, and the base address of the range is aligned to a 2^(n+20)-byte memory address boundary. Any bits in the Bus Number field that are not mapped from memory address bits must be Clear.
Table 7-1 Enhanced Configuration Address Mapping
Memory Address | PCI Express Configuration Space
A[(20+n-1):20] | Bus Number, 1 ≤ n ≤ 8
A[19:15] | Device Number
A[14:12] | Function Number
A[11:8] | Extended Register Number
A[7:2] | Register Number
A[1:0] | Along with size of the access, used to generate Byte Enable
So the key phrase here for determining the minimum range size seems to be "... They are reported by the firmware to the operating system in an implementation specific manner. ..."
Focusing on ECAM address and size determination.
This is stored in the ACPI MCFG table:
ACPI: MCFG 0x000000003FACE000 00003C (v01 LENOVO CB-01 00000001 ACPI 00040000)
According to osdev it has this layout:
Offset | Length | Description
0 | 4 | Table Signature ("MCFG")
4 | 4 | Length of table (in bytes)
8 | 1 | Revision (1)
9 | 1 | Checksum (sum of all bytes in table & 0xFF = 0)
10 | 6 | OEM ID (same meaning as other ACPI tables)
16 | 8 | OEM table ID (manufacturer model ID)
24 | 4 | OEM Revision (same meaning as other ACPI tables)
28 | 4 | Creator ID (same meaning as other ACPI tables)
32 | 4 | Creator Revision (same meaning as other ACPI tables)
36 | 8 | Reserved
44 + (16 * n) | 16 | Configuration space base address allocation structures
Each structure uses the following format:
Offset | Length | Description
0 | 8 | Base address of enhanced configuration mechanism
8 | 2 | PCI Segment Group Number
10 | 1 | Start PCI bus number decoded by this host bridge
11 | 1 | End PCI bus number decoded by this host bridge
12 | 4 | Reserved
https://wiki.osdev.org/PCI_Express#Enhanced_Configuration_Mechanism
Based on this and knowing the size of the Legion 9i MCFG (0x3c = 60 bytes), and the fixed header size being 44 bytes, that leaves 16 bytes, which equals a single entry. Therefore, we can get the number of buses reserved in ECAM from offset 44+11=55 (End PCI bus number), and that can be used with the PCIe "Table 7-1 Enhanced Configuration Address Mapping" info previously quoted to determine the addresses and therefore the minimum size of the reserved region if we want to shrink it. Obviously a full implementation of this needs to take into account that there may be multiple entries in the MCFG and each resulting ECAM would need to be evaluated to determine the size, but we might be able to do a rough proof of concept for the Legion 9i to begin with.
Can somebody please attach two dmesg logs from a kernel with TJ's comment #21 patch?
1) Boot where the devices are not detected
2) Boot with "keep_efi_e820", where the devices are detected
Please use the newest kernel that's convenient for you, and use the same kernel for both boots; just add the "keep_efi_e820" kernel parameter for the second boot.
To ensure we capture all useful data, here is the Configuration space base address allocation structure from the Legion 9i that Mateusz kindly provided yesterday, captured using:
sudo od -A none -t x1 -w60 -j44 /sys/firmware/acpi/tables/MCFG
00 00 00 c0 00 00 00 00 00 00 00 e0 00 00 00 00
|____base address_____|          |_end bus number
The base address matches:
MMCONFIG for domain 0000 [bus 00-ff] at [mem 0xc0000000-0xcfffffff] (base 0xc0000000)
End bus number 0xe0 (224) seems excessively large for a laptop (in comparison my AMD Zen-based workstation has 60).
Created attachment 306145 [details]
dmesg-keep
This file contains dmesg with a boot option enabling keep_efi_e820 patch.
Created attachment 306146 [details]
dmesg-nokeep
This file contains dmesg without keep_efi_e820 patch boot option, equivalent to booting unpatched kernel.
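The MCFG layout and the 16-byte allocation structure quoted earlier in this report can be decoded mechanically. The following is a minimal sketch, not the kernel's implementation: `parse_mcfg_entries` and `ecam_range` are hypothetical helper names, the 44-byte header is zero-filled as a placeholder (only the entry bytes come from the od output above), and the 1 MB-per-bus step follows from the PCIe address mapping where the bus number starts at address bit 20.

```python
import struct

MCFG_HEADER_LEN = 44   # fixed header size, per the layout quoted in this report
MCFG_ENTRY_LEN = 16    # one configuration space allocation structure


def parse_mcfg_entries(table: bytes):
    """Return (base, segment, start_bus, end_bus) for each allocation entry."""
    entries = []
    last_start = len(table) - MCFG_ENTRY_LEN
    for off in range(MCFG_HEADER_LEN, last_start + 1, MCFG_ENTRY_LEN):
        # Little-endian: u64 base, u16 segment, u8 start bus, u8 end bus, u32 reserved
        base, segment, start_bus, end_bus, _reserved = struct.unpack_from(
            "<QHBBI", table, off)
        entries.append((base, segment, start_bus, end_bus))
    return entries


def ecam_range(base: int, start_bus: int, end_bus: int):
    """Inclusive (first, last) address covered; each bus occupies 1 MB."""
    first = base + (start_bus << 20)
    last = base + ((end_bus + 1) << 20) - 1
    return first, last


if __name__ == "__main__":
    # Entry bytes as captured with od on the Legion 9i; header is a placeholder.
    entry = bytes.fromhex("00 00 00 c0 00 00 00 00 00 00 00 e0 00 00 00 00")
    table = bytes(MCFG_HEADER_LEN) + entry
    for base, segment, start_bus, end_bus in parse_mcfg_entries(table):
        first, last = ecam_range(base, start_bus, end_bus)
        print(f"segment {segment:04x} bus {start_bus:02x}-{end_bus:02x}: "
              f"[mem {first:#010x}-{last:#010x}]")
```

On a live system the same function could be fed the contents of /sys/firmware/acpi/tables/MCFG (reading it typically requires root). For the Legion 9i entry the decoded window is 225 MB, ending at 0xce0fffff rather than at the 0xcfffffff end of the 256 MB EfiMemoryMappedIO region.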
(In reply to Bjorn Helgaas from comment #26)
> Can somebody please attach two dmesg logs from a kernel with TJ's comment
> #21 patch?
>
> 1) Boot where the devices are not detected
> 2) Boot with "keep_efi_e820", where the devices are detected
>
> Please use the newest kernel that's convenient for you, and use the same
> kernel for both boots; just add the "keep_efi_e820" kernel parameter for the
> second boot.
I attached two dmesg logs, one using the option and one not using it, both from a few-days-old linux master branch (6.9rc3-git 2c71fdf02a95b3dd425b42f28fd47fb2b1d22702). If you have additional questions, let me know.
Created attachment 306147 [details]
Diff of dmesg-keep vs dmesg-nokeep
Generated using:
sed 's/^[^]]*\] \(.*\)/\1/' dmesg-keep.txt > dmesg-keep.clean.txt
sed 's/^[^]]*\] \(.*\)/\1/' dmesg-nokeep.txt > dmesg-nokeep.clean.txt
diff -u dmesg-keep.clean.txt dmesg-nokeep.clean.txt > dmesg-keep-nokeep.diff
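For anyone without sed and diff at hand, the same cleanup can be approximated in Python. This is a rough sketch under the assumption that every dmesg line starts with a bracketed timestamp followed by a space; `strip_timestamp` and `diff_logs` are names invented for this example, and the regular expression mirrors the sed pattern above.

```python
import difflib
import re

# Same idea as the sed expression: drop everything up to the first "] ".
_TIMESTAMP = re.compile(r"^[^\]]*\] (.*)")


def strip_timestamp(line: str) -> str:
    """Remove the leading "[  12.345678] " timestamp, if present."""
    return _TIMESTAMP.sub(r"\1", line)


def diff_logs(keep_lines, nokeep_lines):
    """Unified diff of two dmesg logs after stripping timestamps."""
    keep = [strip_timestamp(line) for line in keep_lines]
    nokeep = [strip_timestamp(line) for line in nokeep_lines]
    return list(difflib.unified_diff(
        keep, nokeep,
        fromfile="dmesg-keep.clean.txt",
        tofile="dmesg-nokeep.clean.txt"))


if __name__ == "__main__":
    # File names follow the attachments in this report.
    with open("dmesg-keep.txt") as f_keep, open("dmesg-nokeep.txt") as f_nokeep:
        print("".join(diff_logs(f_keep.readlines(), f_nokeep.readlines())), end="")
```

Stripping the timestamps first matters because otherwise nearly every line differs between two boots and the real differences are buried.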
Great. Lenovo BIOS blew it. The ECAM space described in the MCFG table is supposed to be reserved by a PNP0C02 device (PCI Firmware r3.3, sec 4.1.2, footnote 2), but this BIOS doesn't do that.
I think what happened is that MCFG described the ECAM space, the "early" MCFG init ignored it because it wasn't reserved correctly, which looks like it caused a bunch of ACPI methods to fail, which caused a train wreck even though the "late" MCFG init did actually enable ECAM usage:
PCI: ECAM [mem 0xc0000000-0xce0fffff] (base 0xc0000000) for domain 0000 [bus 00-e0]
PCI: not using ECAM ([mem 0xc0000000-0xce0fffff] not reserved)
ACPI Error: AE_ERROR, Returned by Handler for [PCI_Config] (20230628/evregion-300)
ACPI: Ignoring error and continuing table load
ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PC00.RP01._SB.PC00], AE_NOT_FOUND (20230628/dswload2-162)
ACPI Error: AE_NOT_FOUND, During name lookup/catalog (20230628/psobject-220)
PCI: ECAM [mem 0xc0000000-0xce0fffff] (base 0xc0000000) for domain 0000 [bus 00-e0]
PCI: [Firmware Info]: ECAM [mem 0xc0000000-0xce0fffff] not reserved in ACPI motherboard resources
PCI: ECAM [mem 0xc0000000-0xce0fffff] is EfiMemoryMappedIO; assuming valid
PCI: ECAM [mem 0xc0000000-0xce0fffff] reserved to work around lack of ACPI motherboard _CRS
Created attachment 306175 [details]
patch to skip E820 check
Please test this patch and attach the dmesg log.
In a nutshell, we currently check that ECAM space is reserved in E820. That check is invalid because the PCI Firmware spec doesn't *require* that space to be mentioned in E820. This patch skips the check for "recent" (2016 and newer) BIOSes.
If you test this and are OK with your name and email address being included in the public git history, please include the relevant "Reported-by" and "Tested-by" tags in your comment.
E.g., mine would look like:
Reported-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Bjorn Helgaas <bhelgaas@google.com>
Created attachment 306176 [details]
dmesg-patch-mcfg-e820
Reported-by: Mateusz Kaduk <mateusz.kaduk@gmail.com>
Tested-by: Mateusz Kaduk <mateusz.kaduk@gmail.com>
(In reply to Bjorn Helgaas from comment #33)
> Please test this patch and attach the dmesg log.
I forgot to mention: the patch indeed fixes the problem. Touchpad and battery status are working. The built-in speaker works after reloading the driver whenever I pause audio, but that's probably a separate issue. Thanks!
Created attachment 306177 [details]
attachment-5325-0.html
Thanks for the hard work! Any word on whether this also impacts the wifi card not being picked up?
Prior to the kernel patch, the touchpad, battery sensor, and speaker were not detected by the kernel (as mentioned in the title). WiFi was unaffected and remained operational. Applying the kernel patch resolved the detection issues for the touchpad, battery sensor, and speaker without impacting the WiFi connectivity, which continues to function properly.
Hi all, thanks for your work on this. I wish I could help but I am out of my depth when it comes to this stuff. I tried to apply the patch but since I am on an Ubuntu distro, I wasn't able to. Is there any way to create a similar patch for an Ubuntu distro?
Or can I only test this by running Arch? Any guidance on what a noob can do would be appreciated. Thanks again.
(In reply to lmcarneiro91 from comment #38)
> Any guidance on what a noob can do would be appreciated.
I would recommend waiting for the patch to be integrated into the main kernel. In the meantime, please avoid posting questions about how to apply the kernel patch in the bug report thread. Thank you!
Created attachment 306197 [details]
test patch to skip early E820 check for Ubuntu-6.1.0-15.15
lmcarneiro91, I'm very sorry that you tripped over this issue. Thank you very much for reporting it and collecting the dmesg log.
This attachment is a backport of the comment #33 test patch to Ubuntu-6.1.0-15.15, which I think is the kernel you're running. I got that Ubuntu kernel source by adding this to my .git/config:
[remote "mantic"]
url = https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/mantic
fetch = +refs/heads/*:refs/remotes/mantic/*
followed by "git fetch mantic; git checkout -b local-Ubuntu-6.1.0-15.15 Ubuntu-6.1.0-15.15", so this patch should apply cleanly there.
I'm not familiar with the process to build a .deb from there, but here's a place to start if you want to try it: https://wiki.ubuntu.com/Kernel/BuildYourOwnKernel
If you manage to build and test it and it actually works, please let us know. Also, if you're OK with being credited in the public git log, include the details as in comment #34. Thanks again for all your help!
Created attachment 306230 [details]
After patch, dmesg from Ubuntu LTS 24.04 LTS
Hello everyone, I hope you are all doing well! I would like to convey my sincere thanks to all, and especially (in no particular order) to:
- Mateusz Kaduk
- Andy Shevchenko
- TJ
- Bjorn Helgaas
- lmcarneiro91@gmail.com
I have tested the same on Ubuntu 24.04 LTS, kernel version 6.8.0-30-generic. I can confirm that the trackpad, battery and audio are working. Kindly find the attached dmesg. Once again I am extending my sincere thanks to all of you!
- Arul
Thank you all. I have a question regarding the audio driver, though. I think the outstanding issue is that tas2781 works randomly: reloading the driver re-enables the sound, but when I pause playback the audio is gone again, and there is nothing peculiar in dmesg or in the journal. I think it's sound driver related and should be reported upstream there. Do you know what would be the appropriate channel to open a new bug report regarding problems with tas2781? Thanks, Mateusz
https://git.kernel.org/linus/199f968f1484 ("x86/pci: Skip early E820 check for ECAM region") appeared in v6.10-rc1 and should resolve the original issue of audio, trackpad, and battery not being detected, so I'm closing this issue.
If you see this issue with a distro kernel, ask the distro to include 199f968f1484 in their kernel. If you see the same or similar issue again, with a kernel that includes this commit, please open a new report and attach a dmesg log.
Mateusz, the tas2781 issue you're seeing sounds like it's probably a different problem. I don't know whether the sound folks pay attention to bugzilla.kernel.org, and since you don't see anything peculiar in dmesg or the journal, there aren't really any useful artifacts to attach to a bug report anyway. So if you can reproduce the problem on an upstream kernel, ideally v6.9 or v6.10-rc1, please send a report to the maintainers.
From ./scripts/get_maintainer.pl sound/pci/hda/tas2781_hda_i2c.c:
Shenghao Ding <shenghao-ding@ti.com> (maintainer:TEXAS INSTRUMENTS AUDIO (ASoC/HDA) DRIVERS)
Kevin Lu <kevin-lu@ti.com> (maintainer:TEXAS INSTRUMENTS AUDIO (ASoC/HDA) DRIVERS)
Baojun Xu <baojun.xu@ti.com> (maintainer:TEXAS INSTRUMENTS AUDIO (ASoC/HDA) DRIVERS)
Jaroslav Kysela <perex@perex.cz> (maintainer:SOUND)
Takashi Iwai <tiwai@suse.com> (maintainer:SOUND)
alsa-devel@alsa-project.org (moderated list:TEXAS INSTRUMENTS AUDIO (ASoC/HDA) DRIVERS)
linux-sound@vger.kernel.org (open list:SOUND)
linux-kernel@vger.kernel.org (open list)
This search: "https://lore.kernel.org/linux-sound/?q=b%3Atas2781" shows recent tas2781 activity, so hopefully you can get it resolved!