Bug 216565
Summary: | pci=no_e820 required for Clevo NL4XLU laptop | ||
---|---|---|---|
Product: | Drivers | Reporter: | Florent DELAHAYE (kernelorg) |
Component: | PCI | Assignee: | Bjorn Helgaas (bjorn) |
Status: | REOPENED --- | ||
Severity: | low | CC: | bjorn, jwrdegoede |
Priority: | P1 | ||
Hardware: | Intel | ||
OS: | Linux | ||
Kernel Version: | 5.19 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
lspci + dmesg + dmidecode
possible workaround
dmesg results after patch 1
workaround v2
dmesg results after patch 2
workaround v3
dmesg results after patch 3
dmesg 6.1-rc4 unmodified
dmesg results after patch 3
Description
Florent DELAHAYE
2022-10-09 14:30:08 UTC
Thank you for your bug report. I have submitted a patch that makes the kernel automatically apply pci=no_e820 for your laptop model from now on. If possible, please try building a kernel with this patch and confirm that things work without adding "pci=no_e820" to your kernel command line.
I expect this to be merged soon, so I'm closing this bug now so that we don't forget to close it later. If this issue somehow ends up not getting resolved, please re-open the bug.
You can find the patch resolving this here:
https://lore.kernel.org/linux-pci/20221010150206.142615-1-hdegoede@redhat.com/
Hi,
I confirm the patch works, thank you.
Florent
Created attachment 303111 [details]
possible workaround
Florent, could I trouble you to test this patch instead of the one Hans posted? If you can, also boot with the "efi=debug" kernel parameter and attach the dmesg log.
Separately, I would also like to see the dmesg log showing the problem, i.e., the log without either fix and without the "pci=no_e820" parameter. I assume it includes a "no space available" or similar warning related to the touchpad device.
Created attachment 303117 [details]
dmesg results after patch 1
Hi Bjorn,
I have adapted your patch to kernel 5.19 (ubuntu 5.19.0-23-generic) since line numbers differed slightly but it didn't solve the issue anyway. Please let me know if a more recent kernel must be used for the test.
Please find dmesg logs in attachment for:
1. Official kernel without parameters
2. Patched kernel without parameters
3. Patched kernel with efi=debug parameter
Florent
Created attachment 303123 [details]
workaround v2
Thanks very much for testing this. I don't think it should matter whether you test v5.19 or v6.0. But I am very confused about why it didn't make a difference.
I tested on my own laptop and I see this difference (- is v6.0, + is v6.0+patch):
-kernel: BIOS-e820: [mem 0x00000000f0000000-0x00000000f7ffffff] reserved
-kernel: BIOS-e820: [mem 0x00000000fe000000-0x00000000fe010fff] reserved
-kernel: BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
-kernel: BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
-kernel: BIOS-e820: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
kernel: efi: mem35: [MMIO |RUN| | | | | | | | | | | | | ] range=[0x00000000f0000000-0x00000000f7ffffff] (128MB)
kernel: efi: mem36: [MMIO |RUN| | | | | | | | | | | | |UC] range=[0x00000000fe000000-0x00000000fe010fff] (0MB)
kernel: efi: mem37: [MMIO |RUN| | | | | | | | | | | | |UC] range=[0x00000000fec00000-0x00000000fec00fff] (0MB)
kernel: efi: mem38: [MMIO |RUN| | | | | | |WP| | |WB|WT| |UC] range=[0x00000000fee00000-0x00000000fee00fff] (0MB)
kernel: efi: mem39: [MMIO |RUN| | | | | | |WP| | |WB|WT| |UC] range=[0x00000000ff000000-0x00000000ffffffff] (16MB)
So with the patch, the EFI MMIO regions are omitted from the E820 table.
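The comparison above can be repeated on any dmesg log captured with "efi=debug". The following is a minimal sketch, not part of any patch in this thread: it assumes the log lines have the same shape as the excerpt above, and the function names are hypothetical. It lists the E820 "reserved" entries whose range exactly matches an EFI MMIO descriptor.

```python
import re

# Matches lines such as:
#   BIOS-e820: [mem 0x00000000f0000000-0x00000000f7ffffff] reserved
E820_RE = re.compile(r"BIOS-e820: \[mem (0x[0-9a-f]+)-(0x[0-9a-f]+)\] (.+)$")

# Matches the range of an "efi: memNN: [MMIO |...] range=[0x...-0x...]" line
# (format assumed from the efi=debug output quoted in this thread).
EFI_MMIO_RE = re.compile(
    r"efi: mem\d+: \[MMIO\s*\|.*range=\[(0x[0-9a-f]+)-(0x[0-9a-f]+)\]"
)


def e820_reserved(lines):
    """Return the set of (start, end) ranges that E820 marks as reserved."""
    ranges = set()
    for line in lines:
        m = E820_RE.search(line)
        if m and m.group(3).strip() == "reserved":
            ranges.add((int(m.group(1), 16), int(m.group(2), 16)))
    return ranges


def efi_mmio(lines):
    """Return the set of (start, end) ranges of EFI MMIO descriptors."""
    ranges = set()
    for line in lines:
        m = EFI_MMIO_RE.search(line)
        if m:
            ranges.add((int(m.group(1), 16), int(m.group(2), 16)))
    return ranges


def reserved_matching_mmio(lines):
    """E820 reserved ranges that coincide exactly with an EFI MMIO range."""
    return sorted(e820_reserved(lines) & efi_mmio(lines))
```

Feeding it the lines of a saved log (for example `open("dmesg.txt").read().splitlines()`, where the file name is only an example) prints nothing for a kernel that already omits those entries, and the matching ranges otherwise.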
I added a little more debugging to this patch, but it shouldn't make any functional difference. Can you double-check to be sure you actually booted the kernel with the patch?
On my system, setup_e820() in the EFI x86-stub.c is the important change. I really doubt your system would exercise the do_add_efi_memmap() path, but the debug would tell us for sure.
I suppose there must be a path to get the E820 table directly from the BIOS (instead of converting the EFI memory map to the E820 format), but your system *seems* to be using EFI.
Indeed I was confused too, I might have booted an unpatched kernel so I will test again and let you know.
I think my patch at comment #4, which tries to omit EfiMemoryMappedIO regions from the E820 table, cannot be a workable solution. I'm not a boot expert, so some of the following is speculation, but here's what I think is happening:
Linux doesn't use the EFI memory map directly; it uses the E820 format. The BIOS INT15 0x0e820 interface fetches the E820 map from the firmware, but Linux doesn't use that interface either. Linux relies on the bootloader to pass the E820 table in memory in a struct boot_params (the "zeropage") [1].
System firmware might supply only an E820 map (via INT15), only an EFI memory map (via EFI_BOOT_SERVICES.GetMemoryMap()), or both. These interfaces are used by either a bootloader (e.g., grub, syslinux, etc.) or an EFI boot stub to construct the E820 map passed to Linux.
For example, the EFI boot stub uses EFI_BOOT_SERVICES.GetMemoryMap() and builds an E820 map here:
  efi_main
    exit_boot
      efi_exit_boot_services
        efi_get_memory_map [2]
          efi_bs_call(get_memory_map, ...)
      setup_e820 [3]
In this particular case we could change the EFI->E820 conversion because while the EFI boot stub is not part of the Linux image proper, it is included in the Linux source tree and the compiled stub is included in the EFI executable.
But in many cases the EFI->E820 conversion is done in bootloaders, e.g., grub [5], syslinux [6], and we can't change them. If bootloaders decided that EfiMemoryMappedIO regions should show up as "reserved" in the E820 maps they build, we're stuck with that.
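To make the EFI->E820 conversion step concrete, here is a rough sketch of what a bootloader or the EFI stub does with each EFI memory descriptor. It is an illustration only: the type table is a simplified assumption loosely modeled on the kind of mapping setup_e820 [3] performs, it ignores memory attributes, and the function name and tuple layout are hypothetical. The point it shows is that an EfiMemoryMappedIO descriptor comes out as an E820 "reserved" entry.

```python
# Simplified, assumed mapping from UEFI memory types to E820 types.
# The authoritative table lives in the converter actually used at boot
# (EFI stub, grub, syslinux, ...); this is only an illustration.
EFI_TO_E820 = {
    "EfiConventionalMemory": "usable",
    "EfiLoaderCode": "usable",
    "EfiLoaderData": "usable",
    "EfiBootServicesCode": "usable",
    "EfiBootServicesData": "usable",
    "EfiACPIReclaimMemory": "ACPI data",
    "EfiACPIMemoryNVS": "ACPI NVS",
    "EfiUnusableMemory": "unusable",
    "EfiReservedMemoryType": "reserved",
    "EfiRuntimeServicesCode": "reserved",
    "EfiRuntimeServicesData": "reserved",
    "EfiMemoryMappedIO": "reserved",  # the case discussed in this bug
}

EFI_PAGE_SIZE = 4096  # EFI descriptors count 4 KiB pages


def efi_to_e820(descriptors):
    """Convert (efi_type, phys_start, num_pages) tuples to E820 entries.

    Returns a list of (start, end, e820_type) with inclusive end addresses.
    Adjacent entries of the same type are merged. Types missing from the
    table are skipped (a choice made for this sketch).
    """
    entries = []
    for efi_type, start, num_pages in descriptors:
        e820_type = EFI_TO_E820.get(efi_type)
        if e820_type is None:
            continue
        end = start + num_pages * EFI_PAGE_SIZE - 1
        if entries and entries[-1][2] == e820_type and entries[-1][1] + 1 == start:
            # Extend the previous entry instead of adding a new one.
            entries[-1] = (entries[-1][0], end, e820_type)
        else:
            entries.append((start, end, e820_type))
    return entries
```

With a 128MB EfiMemoryMappedIO descriptor at 0xf0000000, such as the one in the efi=debug output earlier in this thread, the sketch yields a reserved entry covering 0xf0000000-0xf7ffffff.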
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/include/uapi/asm/bootparam.h?id=v6.0#n228
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/firmware/efi/libstub/mem.c?id=v6.0#n26
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/firmware/efi/libstub/x86-stub.c?id=v6.0#n556
[4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/admin-guide/efi-stub.rst?id=v6.0
[5] https://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/mmap/efi/mmap.c?id=grub-2.06#n32
[6] https://repo.or.cz/syslinux.git/blob/syslinux-6.04-pre3:/efi/main.c#l1005
Created attachment 303146 [details]
dmesg results after patch 2
This time I directly compiled the sources from kernel.org instead of rebuilding the deb package to be 100% sure. I did it for current stable 6.0.7 which ended up in dumping the expected EFI debug table.
Please find dmesg output attached.
Thanks for going to that trouble. Comment #8 was a long-winded way to say "patch 2 cannot possibly work". And indeed, your dmesg shows that the EFI->E820 conversion is being done by a bootloader in your case, so a Linux kernel patch or a Linux EFI stub patch cannot fix it.
Thank you for the information. I close the case since it will work with the workaround anyway.
Created attachment 303228 [details]
workaround v3
If it's practical for you to test this with "efi=debug", I'd be interested in the results. My idea is to remove EfiMemoryMappedIO regions from the E820 map because apparently EfiMemoryMappedIO is not supposed to prevent the OS from using the region.
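The effect of that idea can be modeled with plain interval arithmetic. The sketch below is an illustration under stated assumptions, not the kernel's code: the helper names are hypothetical, ranges are inclusive (start, end) pairs, and clipping is reduced to "remove every reserved range from the host bridge window", which leaves out the special cases the real logic has.

```python
def subtract(region, hole):
    """Return the parts of inclusive range `region` not covered by `hole`."""
    (rs, re_), (hs, he) = region, hole
    if he < rs or hs > re_:
        return [region]  # no overlap, region is untouched
    parts = []
    if hs > rs:
        parts.append((rs, hs - 1))  # piece below the hole
    if he < re_:
        parts.append((he + 1, re_))  # piece above the hole
    return parts


def remove_ranges(regions, holes):
    """Remove every range in `holes` from every range in `regions`."""
    for hole in holes:
        regions = [part for region in regions for part in subtract(region, hole)]
    return regions


def clip_window(window, reserved):
    """Simplified model: what is left of a host bridge window after
    carving out all E820 reserved ranges."""
    return remove_ranges([window], reserved)
```

As a usage example with figures from the dmesg excerpt quoted in the discussion of the comment #13 log: the E820 reserved entry is (0x5bc50000, 0xcfffffff), the EFI MMIO range is (0x6d800000, 0xcfffffff), and the host bridge window is (0x6d800000, 0xbfffffff). In this model `clip_window` returns an empty list while the MMIO range is still reserved; after `remove_ranges` drops the MMIO range, the reserved entry shrinks to (0x5bc50000, 0x6d7fffff) and the window is returned whole.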
Created attachment 303269 [details]
dmesg results after patch 3
Sure, please find dmesg attached.
Thank you very much for testing this, Florent!
Just to confirm, I assume your touchpad still does not work with that v6.1-rc4 kernel unless you either boot with "pci=no_e820" or add the patch from comment #12. Is that right?
Here's what I think you have seen; correct me if I'm wrong:
1) Your initial report was v5.19 and the touchpad doesn't work unless you boot with "pci=no_e820".
2) You tested the comment #2 patch that made the touchpad work even without "pci=no_e820".
3) You're now testing v6.1-rc kernels, and I assume the touchpad still does not work with v6.1-rc4 unless you boot with "pci=no_e820".
4) You tested v6.1-rc4 with the comment #12 patch, and I think the touchpad *does* work even without "pci=no_e820". Right?
The reason v6.1-rc still requires "pci=no_e820" is that we eventually merged https://git.kernel.org/linus/d341838d776a ("x86/PCI: Disable E820 reserved region clipping via quirks"), which checks for "Clevo X170KM-G Barebone" but not your machine ("Clevo NL4XLU Barebone"). This is the maintenance headache of quirks like this.
I'm going to reopen this for now. I know "pci=no_e820" is a workaround, but I don't consider it a fix. The touchpad needs to work out of the box, with no special "pci=" parameters.
We need to either add your machine to the quirk list or merge a patch similar to comment #12.
(In reply to Bjorn Helgaas from comment #14)
> Thank you very much for testing this, Florent!
>
Glad to help!
> Just to confirm, I assume your touchpad still does not work with that
> v6.1-rc4 kernel unless you either boot with "pci=no_e820" or add the patch
> from comment #12. Is that right?
No, it's unfortunately worse. When I wrote comment #13 I tested the 6.1-rc4 kernel without "pci=no_e820" and the touchpad was not detected, like with previous kernels. However, I just tested the 6.1-rc4 kernel again with "pci=no_e820" and it surprisingly doesn't work either (unlike previous kernels), which means the patch prevents the touchpad from working whatever "pci" is set to.
>
> Here's what I think you have seen; correct me if I'm wrong:
>
> 1) Your initial report was v5.19 and the touchpad doesn't work unless you
> boot with "pci=no_e820".
>
> 2) You tested the comment #2 patch that made the touchpad work even
> without "pci=no_e820".
>
> 3) You're now testing v6.1-rc kernels, and I assume the touchpad still
> does not work with v6.1-rc4 unless you boot with "pci=no_e820".
>
Correct so far.
> 4) You tested v6.1-rc4 with the comment #12 patch, and I think the
> touchpad *does* work even without "pci=no_e820". Right?
As described above, it doesn't work with or without "pci=no_e820".
>
> The reason v6.1-rc still requires "pci=no_e820" is that we eventually merged
> https://git.kernel.org/linus/d341838d776a ("x86/PCI: Disable E820 reserved
> region clipping via quirks"), which checks for "Clevo X170KM-G Barebone" but
> not your machine ("Clevo NL4XLU Barebone"). This is the maintenance
> headache of quirks like this.
I understand. If you want, I can test with specific kernel rc/versions to avoid such collisions.
>
> I'm going to reopen this for now. I know "pci=no_e820" is a workaround, but
> I don't consider it a fix. The touchpad needs to work out of the box, with
> no special "pci=" parameters.
>
> We need to either add your machine to the quirk list or merge a patch
> similar to comment #12.
To get back to comment #8, do you know whether there is a reason for Linux to not use INT15 0x0e820 directly instead of the zero page? Perhaps INT15 0x0e820 is 16 bits only? I have read some discussions about it, especially with the introduction of EFI_BOOT_SERVICES.GetMemoryMap() which makes things redundant - therefore more complicated - and I am a bit surprised there isn't a single authoritative source for the memory map, but that's beyond the scope of this case.
Oh, dear. So the touchpad works in v6.1-rc4 (without my patch) booted with "pci=no_e820". But v6.1-rc4 + comment #12 doesn't work.
From your comment #13 dmesg:
  Linux version 6.1.0-rc4+patch3
  Command line: BOOT_IMAGE=/vmlinuz-6.1.0-rc4+patch3 root=UUID=053924a2-45b4-4dd4-9c1c-3e2549068e33 ro quiet splash efi=debug
  BIOS-e820: [mem 0x000000005bc50000-0x00000000cfffffff] reserved
  efi: removing MMIO range=[0x000000006d800000-0x00000000cfffffff] (1576MB) from E820 reservations
  PCI: MMCONFIG at [mem 0xc0000000-0xcfffffff] reserved in ACPI motherboard resources
  PCI: Using E820 reservations for host bridge windows
  pci_bus 0000:00: root bus resource [mem 0x6d800000-0xbfffffff window]
  pci 0000:00:15.0: reg 0x10: [mem 0x00000000-0x00000fff 64bit]
  pci 0000:00:15.0: BAR 0: assigned [mem 0x6d800000-0x6d800fff 64bit]
  pci 0000:00:15.1: BAR 0: assigned [mem 0x6d801000-0x6d801fff 64bit]
So far this is what I expect, and the 00:15.0 assignments are the same as in your original v5.19 dmesg with "pci=no_e820", so that's good. I think your touchpad is on 00:15.0, and I don't see any problem with that device in the log.
However, v5.19 includes this, while v6.1-rc4 + comment #12 does not:
  input: FTCS1000:01 2808:0101 Mouse as /devices/pci0000:00/0000:00:15.0/i2c_designware.0/i2c-1/i2c-FTCS1000:01/0018:2808:0101.0001/input/input11
  input: FTCS1000:01 2808:0101 Touchpad as /devices/pci0000:00/0000:00:15.0/i2c_designware.0/i2c-1/i2c-FTCS1000:01/0018:2808:0101.0001/input/input12
I don't know where that's from; I don't see any of that text in my kernel tree. Can you attach the dmesg log from the working scenario (unmodified v6.1-rc4 with "pci=no_e820")?
Sorry, I don't have a good answer about INT15 0xe820. There's a long messy history of getting the memory map, and I don't know very much of it.
Created attachment 303298 [details]
dmesg 6.1-rc4 unmodified
Working scenario dmesg (6.1-rc4 unpatched + pci=no_e820) in attachment. I have added another dmesg without pci=no_e820. Both have efi=debug.
Thanks, Florent! I'm poring over the comment #17 "pci=no_e820" dmesg (which works) and the comment #13 dmesg (which fails). The PCI configuration is identical between them, so I'm mystified.
I do see several dmesg differences that make me suspect a different kernel config between them, e.g.,
- dmesg with "pci=no_e820"
+ dmesg with comment #12 patch
-ee1004 2-0050: 512 byte EE1004-compliant SPD EEPROM, read-only
-hid-generic 0018:2808:0101.0001: input,hidraw0: I2C HID v1.00 Mouse [FTCS1000:01 2808:0101] on i2c-FTCS1000:01
-hid-multitouch 0018:2808:0101.0001: input,hidraw0: I2C HID v1.00 Mouse [FTCS1000:01 2808:0101] on i2c-FTCS1000:01
-integrity: Platform Keyring initialized
-intel_pmc_core INT33A1:00: initialized
-landlock: Up and running.
-LSM support for eBPF active
-mei_hdcp 0000:00:16.0-b638ab7e-94e2-4ea2-a552-d1c54b627f04: bound 0000:00:02.0 (ops i915_hdcp_component_ops [i915])
-MPTCP token hash table entries: 8192 (order: 5, 196608 bytes, linear)
-NET: Registered PF_QIPCRTR protocol family
-NET: Registered PF_XDP protocol family
-rtsx_pci 0000:01:00.0: enabling device (0000 -> 0002)
-sgx: EPC section 0x600c0000-0x65d7ffff
-systemd[1]: Inserted module 'autofs4'
+systemd[1]: Failed to find module 'autofs4'
Those hid-generic and hid-multitouch lines in particular make me wonder if the kernel is missing those drivers, and maybe that's why the touchpad didn't work?
Indeed, for some reason I compiled using an old kernel config for one kernel and not the other. I will recompile both of them (6.1.0-rc4 and 6.1.0-rc4+patch3) and provide the dmesg soon. Sorry for the confusion.
For posterity, the comment #12 patch will *not* help if the machine can be booted with the BIOS in "legacy" or "CSM" mode because in that case the BIOS itself generates the E820 map, and the EFI memory map used by comment #12 is not available to the kernel.
Florent checked and did not find a way to boot the Clevo NL4XLU in CSM mode: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1948811/comments/8
If it does turn out to be possible (e.g., with an older BIOS version), we may need the comment #2 patch in addition to the comment #12 one.
Created attachment 303372 [details]
dmesg results after patch 3
Good news, the touchpad is indeed detected without "pci=no_e820" when using patch3. Dmesg and config files attached.
Great, thank you, Florent! I know from personal experience that building a bootable, working kernel from scratch is a real hassle, so I appreciate all the time and effort you've put in.
You are welcome, thank you for your help!