Bug 216565 - pci=no_e820 required for Clevo NL4XLU laptop
Status: REOPENED
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI
Hardware: Intel Linux
Importance: P1 low
Assignee: Bjorn Helgaas
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-10-09 14:30 UTC by Florent DELAHAYE
Modified: 2022-12-06 19:21 UTC

See Also:
Kernel Version: 5.19
Subsystem:
Regression: No
Bisected commit-id:


Attachments
lspci + dmesg + dmidecode (26.91 KB, application/zip)
2022-10-09 14:30 UTC, Florent DELAHAYE
possible workaround (2.52 KB, patch)
2022-10-31 18:34 UTC, Bjorn Helgaas
dmesg results after patch 1 (59.18 KB, application/zip)
2022-11-01 17:42 UTC, Florent DELAHAYE
workaround v2 (4.51 KB, patch)
2022-11-02 22:42 UTC, Bjorn Helgaas
dmesg results after patch 2 (42.92 KB, application/zip)
2022-11-08 10:54 UTC, Florent DELAHAYE
workaround v3 (1.35 KB, patch)
2022-11-19 00:14 UTC, Bjorn Helgaas
dmesg results after patch 3 (89.83 KB, text/plain)
2022-11-22 20:07 UTC, Florent DELAHAYE
dmesg 6.1-rc4 unmodified (44.71 KB, application/zip)
2022-11-26 19:27 UTC, Florent DELAHAYE
dmesg results after patch 3 (181.57 KB, application/zip)
2022-12-06 19:01 UTC, Florent DELAHAYE

Description Florent DELAHAYE 2022-10-09 14:30:08 UTC
Created attachment 302965 [details]
lspci + dmesg + dmidecode

Hi,

As requested in https://www.kernel.org/doc/Documentation/admin-guide/kernel-parameters.txt, I am reporting that I need to set the "pci=no_e820" kernel parameter for the touchpad to be detected on my Clevo laptop.

The lspci, dmesg and dmidecode output is attached.

Florent DELAHAYE
Comment 1 Hans de Goede 2022-10-10 15:06:10 UTC
Thank you for your bug report.

I have submitted a patch that makes the kernel automatically apply pci=no_e820 on your laptop model from now on.

If possible, please try building a kernel with this patch and confirm that things work without adding "pci=no_e820" to your kernel command line.

I expect this to be merged soon, so I'm closing this bug now so that we don't forget to close it later.

If this issue somehow ends up not getting resolved, please re-open the bug.
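
When confirming a fix like this, it helps to be sure the workaround parameter is really absent from the boot under test. The following is a minimal sketch, not part of any patch in this thread: the helper names are hypothetical, and it only assumes that "pci=" takes a comma-separated option list.

```python
def pci_options(cmdline):
    """Return every option given via pci=... on a kernel command line string."""
    options = []
    for token in cmdline.split():
        if token.startswith("pci="):
            # pci= accepts a comma-separated list, e.g. pci=nocrs,no_e820
            options.extend(opt for opt in token[len("pci="):].split(",") if opt)
    return options


def has_no_e820(cmdline):
    """True if the pci=no_e820 workaround is present on the command line."""
    return "no_e820" in pci_options(cmdline)
```

On a running system the string to check can be read from /proc/cmdline, e.g. has_no_e820(open("/proc/cmdline").read()).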
Comment 2 Hans de Goede 2022-10-10 15:07:04 UTC
You can find the patch resolving this here:

https://lore.kernel.org/linux-pci/20221010150206.142615-1-hdegoede@redhat.com/
Comment 3 Florent DELAHAYE 2022-10-11 17:50:56 UTC
Hi,

I confirm the patch works, thank you.

Florent
Comment 4 Bjorn Helgaas 2022-10-31 18:34:05 UTC
Created attachment 303111 [details]
possible workaround

Florent, could I trouble you to test this patch instead of the one Hans posted?  If you can, also boot with the "efi=debug" kernel parameter and attach the dmesg log.

Separately, I would also like to see the dmesg log showing the problem, i.e., the log without either fix and without the "pci=no_e820" parameter.  I assume it includes a "no space available" or similar warning related to the touchpad device.
Comment 5 Florent DELAHAYE 2022-11-01 17:42:08 UTC
Created attachment 303117 [details]
dmesg results after patch 1

Hi Bjorn,

I adapted your patch to kernel 5.19 (Ubuntu 5.19.0-23-generic), since the line numbers differed slightly, but it didn't solve the issue. Please let me know if a more recent kernel must be used for the test.

The attached dmesg logs cover:
1. Official kernel without parameters
2. Patched kernel without parameters
3. Patched kernel with the efi=debug parameter

Florent
Comment 6 Bjorn Helgaas 2022-11-02 22:42:35 UTC
Created attachment 303123 [details]
workaround v2

Thanks very much for testing this.  I don't think it should matter whether you test v5.19 or v6.0.  But I am very confused about why it didn't make a difference.

I tested on my own laptop and I see this difference (- is v6.0, + is v6.0+patch):

  -kernel: BIOS-e820: [mem 0x00000000f0000000-0x00000000f7ffffff] reserved
  -kernel: BIOS-e820: [mem 0x00000000fe000000-0x00000000fe010fff] reserved
  -kernel: BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
  -kernel: BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
  -kernel: BIOS-e820: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
   kernel: efi: mem35: [MMIO        |RUN|  |  |  |  |  |  |  |  |   |  |  |  |  ] range=[0x00000000f0000000-0x00000000f7ffffff] (128MB)
   kernel: efi: mem36: [MMIO        |RUN|  |  |  |  |  |  |  |  |   |  |  |  |UC] range=[0x00000000fe000000-0x00000000fe010fff] (0MB)
   kernel: efi: mem37: [MMIO        |RUN|  |  |  |  |  |  |  |  |   |  |  |  |UC] range=[0x00000000fec00000-0x00000000fec00fff] (0MB)
   kernel: efi: mem38: [MMIO        |RUN|  |  |  |  |  |  |WP|  |   |WB|WT|  |UC] range=[0x00000000fee00000-0x00000000fee00fff] (0MB)
   kernel: efi: mem39: [MMIO        |RUN|  |  |  |  |  |  |WP|  |   |WB|WT|  |UC] range=[0x00000000ff000000-0x00000000ffffffff] (16MB)

So with the patch, the EFI MMIO regions are omitted from the E820 table.

I added a little more debugging to this patch, but it shouldn't make any functional difference.  Can you double-check to be sure you actually booted the kernel with the patch?

On my system, setup_e820() in the EFI x86-stub.c is the important change.  I really doubt your system would exercise the do_add_efi_memmap() path, but the debug would tell us for sure.

I suppose there must be a path to get the E820 table directly from the BIOS (instead of converting the EFI memory map to the E820 format), but your system *seems* to be using EFI.
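
The comparison above can be done mechanically on an "efi=debug" dmesg. This is a rough sketch only: the function names are made up for illustration, the regular expressions are assumptions based on the log lines quoted in this comment, and the attribute columns in the sample are shortened.

```python
import re

# Patterns modeled on the dmesg lines quoted in this comment (assumption).
E820_RE = re.compile(r"BIOS-e820: \[mem (0x[0-9a-fA-F]+)-(0x[0-9a-fA-F]+)\] (.+)")
EFI_RE = re.compile(
    r"efi: mem\d+: \[([^|\]]*)\|.*?range=\[(0x[0-9a-fA-F]+)-(0x[0-9a-fA-F]+)\]"
)


def e820_reserved(lines):
    """Inclusive (start, end) ranges that E820 reports as reserved."""
    ranges = []
    for line in lines:
        m = E820_RE.search(line)
        if m and m.group(3).strip() == "reserved":
            ranges.append((int(m.group(1), 16), int(m.group(2), 16)))
    return ranges


def efi_mmio(lines):
    """Inclusive (start, end) ranges the EFI memory map lists as MMIO."""
    ranges = []
    for line in lines:
        m = EFI_RE.search(line)
        if m and m.group(1).strip() == "MMIO":
            ranges.append((int(m.group(2), 16), int(m.group(3), 16)))
    return ranges


def mmio_reserved_in_e820(lines):
    """EFI MMIO ranges that lie entirely inside an E820 reserved range."""
    reserved = e820_reserved(lines)
    return [
        (lo, hi)
        for lo, hi in efi_mmio(lines)
        if any(r_lo <= lo and hi <= r_hi for r_lo, r_hi in reserved)
    ]


SAMPLE = """\
kernel: BIOS-e820: [mem 0x00000000f0000000-0x00000000f7ffffff] reserved
kernel: BIOS-e820: [mem 0x00000000fe000000-0x00000000fe010fff] reserved
kernel: efi: mem35: [MMIO        |RUN|  |  ] range=[0x00000000f0000000-0x00000000f7ffffff] (128MB)
kernel: efi: mem36: [MMIO        |RUN|  |UC] range=[0x00000000fe000000-0x00000000fe010fff] (0MB)
kernel: efi: mem37: [MMIO        |RUN|  |UC] range=[0x00000000fec00000-0x00000000fec00fff] (0MB)
""".splitlines()
```

An empty result from mmio_reserved_in_e820() on a patched boot would suggest the MMIO regions really were left out of the E820 table; a non-empty one would suggest the patched code path did not run.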
Comment 7 Florent DELAHAYE 2022-11-03 19:59:25 UTC
Indeed, I was confused too. I might have booted an unpatched kernel, so I will test again and let you know.
Comment 8 Bjorn Helgaas 2022-11-05 01:46:59 UTC
I think my patch at comment #4, which tries to omit EfiMemoryMappedIO regions from the E820 table, cannot be a workable solution.  I'm not a boot expert, so some of the following is speculation, but here's what I think is happening:

Linux doesn't use the EFI memory map directly; it uses the E820 format.  The BIOS INT15 0x0e820 interface fetches the E820 map from the firmware, but Linux doesn't use that interface either.  Linux relies on the bootloader to pass the E820 table in memory in a struct boot_params (the "zeropage") [1].

System firmware might supply only an E820 map (via INT15), only an EFI memory map (via EFI_BOOT_SERVICES.GetMemoryMap()), or both.  These interfaces are used by either a bootloader (e.g., grub, syslinux, etc.) or an EFI boot stub to construct the E820 map passed to Linux.

For example, the EFI boot stub uses EFI_BOOT_SERVICES.GetMemoryMap() and builds an E820 map here:

  efi_main
    exit_boot
      efi_exit_boot_services
        efi_get_memory_map         [2]
          efi_bs_call(get_memory_map, ...)
      setup_e820                   [3]

In this particular case we could change the EFI->E820 conversion because while the EFI boot stub is not part of the Linux image proper, it is included in the Linux source tree and the compiled stub is included in the EFI executable.

But in many cases the EFI->E820 conversion is done in bootloaders, e.g., grub [5], syslinux [6], and we can't change them.  If bootloaders decided that EfiMemoryMappedIO regions should show up as "reserved" in the E820 maps they build, we're stuck with that.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/include/uapi/asm/bootparam.h?id=v6.0#n228
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/firmware/efi/libstub/mem.c?id=v6.0#n26
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/firmware/efi/libstub/x86-stub.c?id=v6.0#n556
[4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/admin-guide/efi-stub.rst?id=v6.0
[5] https://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/mmap/efi/mmap.c?id=grub-2.06#n32
[6] https://repo.or.cz/syslinux.git/blob/syslinux-6.04-pre3:/efi/main.c#l1005
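
To make the conversion step concrete, here is a simplified model of how an EFI memory map could be turned into an E820-style map. It is a sketch under assumptions: the type table is illustrative and not copied from the EFI stub, grub, or syslinux, and the function name is hypothetical. The one point taken from this comment is that EfiMemoryMappedIO ends up as "reserved".

```python
# Simplified sketch of an EFI -> E820 type conversion. The table is an
# assumption for illustration, not the code of any bootloader or the EFI stub.
USABLE, RESERVED, ACPI, NVS, UNUSABLE = (
    "usable", "reserved", "ACPI data", "ACPI NVS", "unusable",
)

EFI_TO_E820 = {
    "EfiLoaderCode": USABLE,
    "EfiLoaderData": USABLE,
    "EfiBootServicesCode": USABLE,
    "EfiBootServicesData": USABLE,
    "EfiConventionalMemory": USABLE,
    "EfiReservedMemoryType": RESERVED,
    "EfiRuntimeServicesCode": RESERVED,
    "EfiRuntimeServicesData": RESERVED,
    "EfiMemoryMappedIO": RESERVED,  # the mapping this bug is about
    "EfiMemoryMappedIOPortSpace": RESERVED,
    "EfiACPIReclaimMemory": ACPI,
    "EfiACPIMemoryNVS": NVS,
    "EfiUnusableMemory": UNUSABLE,
}


def efi_to_e820(memmap):
    """Convert (efi_type, start, size_in_bytes) descriptors, sorted by start,
    into inclusive (start, end, e820_type) entries, merging adjacent entries
    of the same type."""
    e820 = []
    for efi_type, start, size in memmap:
        e820_type = EFI_TO_E820.get(efi_type)
        if e820_type is None:
            continue  # unknown types are skipped in this sketch
        end = start + size - 1
        if e820 and e820[-1][2] == e820_type and e820[-1][1] + 1 == start:
            e820[-1] = (e820[-1][0], end, e820_type)
        else:
            e820.append((start, end, e820_type))
    return e820
```

Whoever runs this step (the EFI stub or a bootloader) decides what the kernel later sees as "BIOS-e820: ... reserved".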
Comment 9 Florent DELAHAYE 2022-11-08 10:54:37 UTC
Created attachment 303146 [details]
dmesg results after patch 2

This time I compiled the sources directly from kernel.org instead of rebuilding the deb package, to be 100% sure. I did it for the current stable 6.0.7, which did dump the expected EFI debug table.

Please find dmesg output attached.
Comment 10 Bjorn Helgaas 2022-11-08 13:12:37 UTC
Thanks for going to that trouble.  Comment #8 was a long-winded way to say "patch 2 cannot possibly work".

And indeed, your dmesg shows that the EFI->E820 conversion is being done by a bootloader in your case, so a Linux kernel patch or a Linux EFI stub patch cannot fix it.
Comment 11 Florent DELAHAYE 2022-11-08 14:16:49 UTC
Thank you for the information. I am closing the case since the workaround works anyway.
Comment 12 Bjorn Helgaas 2022-11-19 00:14:22 UTC
Created attachment 303228 [details]
workaround v3

If it's practical for you to test this with "efi=debug", I'd be interested in the results.  My idea is to remove EfiMemoryMappedIO regions from the E820 map because apparently EfiMemoryMappedIO is not supposed to prevent the OS from using the region.
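
The idea can be illustrated with plain range arithmetic. This is only a sketch of the concept, not the attached patch: the function names are hypothetical, and the addresses are taken from the dmesg excerpt quoted in comment #16.

```python
def subtract_range(region, hole):
    """Parts of the inclusive range `region` that `hole` does not cover."""
    (r_lo, r_hi), (h_lo, h_hi) = region, hole
    if h_hi < r_lo or h_lo > r_hi:
        return [region]  # no overlap, region is untouched
    parts = []
    if h_lo > r_lo:
        parts.append((r_lo, h_lo - 1))
    if h_hi < r_hi:
        parts.append((h_hi + 1, r_hi))
    return parts


def remove_ranges(regions, holes):
    """Remove every range in `holes` from every range in `regions`."""
    for hole in holes:
        regions = [p for region in regions for p in subtract_range(region, hole)]
    return regions


def clip_window(window, reserved):
    """What remains of a host bridge window once reserved areas are clipped."""
    return remove_ranges([window], reserved)


E820_RESERVED = [(0x5bc50000, 0xcfffffff)]  # BIOS-e820 reserved entry
EFI_MMIO = [(0x6d800000, 0xcfffffff)]       # EfiMemoryMappedIO range
WINDOW = (0x6d800000, 0xbfffffff)           # pci_bus 0000:00 root bus resource
```

With these values, clip_window(WINDOW, E820_RESERVED) leaves nothing for BAR assignment, while clipping against remove_ranges(E820_RESERVED, EFI_MMIO) leaves the whole window available.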
Comment 13 Florent DELAHAYE 2022-11-22 20:07:31 UTC
Created attachment 303269 [details]
dmesg results after patch 3

Sure, please find dmesg attached.
Comment 14 Bjorn Helgaas 2022-11-22 20:42:44 UTC
Thank you very much for testing this, Florent!

Just to confirm, I assume your touchpad still does not work with that v6.1-rc4 kernel unless you either boot with "pci=no_e820" or add the patch from comment #12.  Is that right?

Here's what I think you have seen; correct me if I'm wrong:

  1) Your initial report was v5.19 and the touchpad doesn't work unless you boot with "pci=no_e820".

  2) You tested the comment #2 patch that made the touchpad work even without "pci=no_e820".

  3) You're now testing v6.1-rc kernels, and I assume the touchpad still does not work with v6.1-rc4 unless you boot with "pci=no_e820".

  4) You tested v6.1-rc4 with the comment #12 patch, and I think the touchpad *does* work even without "pci=no_e820".  Right?

The reason v6.1-rc still requires "pci=no_e820" is that we eventually merged https://git.kernel.org/linus/d341838d776a ("x86/PCI: Disable E820 reserved region clipping via quirks"), which checks for "Clevo X170KM-G Barebone" but not your machine ("Clevo NL4XLU Barebone").  This is the maintenance headache of quirks like this.

I'm going to reopen this for now.  I know "pci=no_e820" is a workaround, but I don't consider it a fix.  The touchpad needs to work out of the box, with no special "pci=" parameters.

We need to either add your machine to the quirk list or merge a patch similar to comment #12.
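
The maintenance problem with a quirk list is easy to see in a toy model. Everything below is hypothetical: the table layout, the "board_name" field and its values, and the substring matching rule are assumptions for illustration and do not reproduce the merged commit.

```python
# Hypothetical quirk table: "matches" maps a DMI field name to a substring
# that must appear in the machine's value for that field.
NO_E820_QUIRKS = [
    {"ident": "Clevo X170KM-G Barebone", "matches": {"board_name": "X170KM-G"}},
]


def needs_no_e820(dmi, quirks=NO_E820_QUIRKS):
    """True if the machine's DMI data matches any entry in the quirk table."""
    for quirk in quirks:
        if all(sub in dmi.get(field, "") for field, sub in quirk["matches"].items()):
            return True
    return False
```

A machine that is not in the table gets no workaround, however similar its firmware is, so every affected model needs its own entry and its own bug report.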
Comment 15 Florent DELAHAYE 2022-11-22 21:50:52 UTC
(In reply to Bjorn Helgaas from comment #14)
> Thank you very much for testing this, Florent!
> 
Glad to help!
> Just to confirm, I assume your touchpad still does not work with that
> v6.1-rc4 kernel unless you either boot with "pci=no_e820" or add the patch
> from comment #12.  Is that right?
No, it is unfortunately worse. When I wrote comment #13, I tested the 6.1-rc4 kernel without "pci=no_e820" and the touchpad was not detected, as with previous kernels. However, I have just tested the 6.1-rc4 kernel again with "pci=no_e820" and, surprisingly, it doesn't work either (unlike previous kernels), which means the patch prevents the touchpad from working whatever "pci" is set to.

> 
> Here's what I think you have seen; correct me if I'm wrong:
> 
>   1) Your initial report was v5.19 and the touchpad doesn't work unless you
> boot with "pci=no_e820".
> 
>   2) You tested the comment #2 patch that made the touchpad work even
> without "pci=no_e820".
> 
>   3) You're now testing v6.1-rc kernels, and I assume the touchpad still
> does not work with v6.1-rc4 unless you boot with "pci=no_e820".
> 
Correct so far.
>   4) You tested v6.1-rc4 with the comment #12 patch, and I think the
> touchpad *does* work even without "pci=no_e820".  Right?
As described above, it doesn't work with or without "pci=no_e820".

> 
> The reason v6.1-rc still requires "pci=no_e820" is that we eventually merged
> https://git.kernel.org/linus/d341838d776a ("x86/PCI: Disable E820 reserved
> region clipping via quirks"), which checks for "Clevo X170KM-G Barebone" but
> not your machine ("Clevo NL4XLU Barebone").  This is the maintenance
> headache of quirks like this.
I understand. If you want, I can test specific kernel rc versions to avoid such collisions.

> 
> I'm going to reopen this for now.  I know "pci=no_e820" is a workaround, but
> I don't consider it a fix.  The touchpad needs to work out of the box, with
> no special "pci=" parameters.
> 
> We need to either add your machine to the quirk list or merge a patch
> similar to comment #12.

To get back to comment #8, do you know whether there is a reason for Linux not to use INT15 0x0e820 directly instead of the zero page? Perhaps INT15 0x0e820 is 16-bit only?
I have read some discussions about it, especially since the introduction of EFI_BOOT_SERVICES.GetMemoryMap(), which makes things redundant and therefore more complicated. I am a bit surprised there isn't a single authoritative source for the memory map, but that goes beyond this case.
Comment 16 Bjorn Helgaas 2022-11-22 22:57:32 UTC
Oh, dear.  So the touchpad works in v6.1-rc4 (without my patch) booted with "pci=no_e820".

But v6.1-rc4 + comment #12 doesn't work.  From your comment #13 dmesg:

  Linux version 6.1.0-rc4+patch3
  Command line: BOOT_IMAGE=/vmlinuz-6.1.0-rc4+patch3 root=UUID=053924a2-45b4-4dd4-9c1c-3e2549068e33 ro quiet splash efi=debug
  BIOS-e820: [mem 0x000000005bc50000-0x00000000cfffffff] reserved
  efi: removing MMIO range=[0x000000006d800000-0x00000000cfffffff] (1576MB) from E820 reservations
  PCI: MMCONFIG at [mem 0xc0000000-0xcfffffff] reserved in ACPI motherboard resources
  PCI: Using E820 reservations for host bridge windows
  pci_bus 0000:00: root bus resource [mem 0x6d800000-0xbfffffff window]
  pci 0000:00:15.0: reg 0x10: [mem 0x00000000-0x00000fff 64bit]
  pci 0000:00:15.0: BAR 0: assigned [mem 0x6d800000-0x6d800fff 64bit]
  pci 0000:00:15.1: BAR 0: assigned [mem 0x6d801000-0x6d801fff 64bit]

So far this is what I expect, and the 00:15.0 assignments are the same as in your original v5.19 dmesg with "pci=no_e820", so that's good.  I think your touchpad is on 00:15.0, and I don't see any problem with that device in the log.

However, v5.19 includes this, while v6.1-rc4 + comment #12 does not:

  input: FTCS1000:01 2808:0101 Mouse as /devices/pci0000:00/0000:00:15.0/i2c_designware.0/i2c-1/i2c-FTCS1000:01/0018:2808:0101.0001/input/input11
  input: FTCS1000:01 2808:0101 Touchpad as /devices/pci0000:00/0000:00:15.0/i2c_designware.0/i2c-1/i2c-FTCS1000:01/0018:2808:0101.0001/input/input12

I don't know where that's from; I don't see any of that text in my kernel tree.

Can you attach the dmesg log from the working scenario (unmodified v6.1-rc4 with "pci=no_e820")?

Sorry, I don't have a good answer about INT15 0xe820.  There's a long messy history of getting the memory map, and I don't know very much of it.
Comment 17 Florent DELAHAYE 2022-11-26 19:27:02 UTC
Created attachment 303298 [details]
dmesg 6.1-rc4 unmodified

The dmesg from the working scenario (6.1-rc4 unpatched + pci=no_e820) is attached. I have added another dmesg without pci=no_e820. Both were captured with efi=debug.
Comment 18 Bjorn Helgaas 2022-12-01 20:55:39 UTC
Thanks, Florent!  I'm poring over the comment #17 "pci=no_e820" dmesg (which works) and the comment #13 dmesg (which fails).

The PCI configuration is identical between them, so I'm mystified.  I do see several dmesg differences that make me suspect a different kernel config between them, e.g.,

  - dmesg with "pci=no_e820"
  + dmesg with comment #12 patch

  -ee1004 2-0050: 512 byte EE1004-compliant SPD EEPROM, read-only
  -hid-generic 0018:2808:0101.0001: input,hidraw0: I2C HID v1.00 Mouse [FTCS1000:01 2808:0101] on i2c-FTCS1000:01
  -hid-multitouch 0018:2808:0101.0001: input,hidraw0: I2C HID v1.00 Mouse [FTCS1000:01 2808:0101] on i2c-FTCS1000:01
  -integrity: Platform Keyring initialized
  -intel_pmc_core INT33A1:00:  initialized
  -landlock: Up and running.
  -LSM support for eBPF active
  -mei_hdcp 0000:00:16.0-b638ab7e-94e2-4ea2-a552-d1c54b627f04: bound 0000:00:02.0 (ops i915_hdcp_component_ops [i915])
  -MPTCP token hash table entries: 8192 (order: 5, 196608 bytes, linear)
  -NET: Registered PF_QIPCRTR protocol family
  -NET: Registered PF_XDP protocol family
  -rtsx_pci 0000:01:00.0: enabling device (0000 -> 0002)
  -sgx: EPC section 0x600c0000-0x65d7ffff
  -systemd[1]: Inserted module 'autofs4'
  +systemd[1]: Failed to find module 'autofs4'

Those hid-generic and hid-multitouch lines in particular make me wonder if the kernel is missing those drivers, and maybe that's why the touchpad didn't work?
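
A comparison like the one above can be approximated with a small script. This is a sketch, not the procedure actually used here: the helper names are hypothetical, it assumes the usual "[ seconds.micros] " dmesg timestamp prefix, and it does an exact-text set difference, so real logs will also show lines that differ only in addresses or sizes.

```python
import re

# Leading dmesg timestamp such as "[    1.234567] " (assumed format).
TIMESTAMP_RE = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def normalize(log_text):
    """Set of non-empty log messages with their timestamps stripped."""
    return {
        TIMESTAMP_RE.sub("", line).strip()
        for line in log_text.splitlines()
        if line.strip()
    }


def dmesg_diff(working, failing):
    """Messages only in `working` ('-' prefix), then only in `failing` ('+')."""
    a, b = normalize(working), normalize(failing)
    removed = sorted(a - b, key=str.lower)
    added = sorted(b - a, key=str.lower)
    return ["-" + msg for msg in removed] + ["+" + msg for msg in added]
```

Driver messages that appear on only one side (such as hid-multitouch here) are a hint that the two kernels were built from different configs rather than that the patch changed behavior.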
Comment 19 Florent DELAHAYE 2022-12-05 20:16:35 UTC
Indeed, for some reason I compiled using an old kernel config for one kernel and not the other. I will recompile both of them (6.1.0-rc4 and 6.1.0-rc4+patch3) and provide the dmesg soon.

Sorry for the confusion.
Comment 20 Bjorn Helgaas 2022-12-06 17:25:36 UTC
For posterity, the comment #12 patch will *not* help if the machine can be booted with the BIOS in "legacy" or "CSM" mode because in that case the BIOS itself generates the E820 map, and the EFI memory map used by comment #12 is not available to the kernel.

Florent checked and did not find a way to boot the Clevo NL4XLU in CSM mode:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1948811/comments/8

If it does turn out to be possible (e.g., with an older BIOS version), we may need the comment #2 patch in addition to the comment #12 one.
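
For anyone checking which mode a given boot used: on Linux, the /sys/firmware/efi directory is present only when the kernel was booted through EFI. A minimal sketch follows; the function name is made up, and the path parameter exists only so the check can be exercised without real firmware.

```python
import os


def booted_via_efi(sysfs_efi_dir="/sys/firmware/efi"):
    """True if the EFI sysfs directory exists, i.e. the kernel booted via EFI.

    In legacy/CSM mode the directory is absent, and so is the EFI memory map.
    """
    return os.path.isdir(sysfs_efi_dir)
```

If this returns False on an affected machine, the comment #12 approach has nothing to work with on that boot.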
Comment 21 Florent DELAHAYE 2022-12-06 19:01:39 UTC
Created attachment 303372 [details]
dmesg results after patch 3

Good news: the touchpad is indeed detected without "pci=no_e820" when using patch3. The dmesg and config files are attached.
Comment 22 Bjorn Helgaas 2022-12-06 19:17:22 UTC
Great, thank you, Florent!  I know from personal experience that building a bootable, working kernel from scratch is a real hassle, so I appreciate all the time and effort you've put in.
Comment 23 Florent DELAHAYE 2022-12-06 19:21:51 UTC
You are welcome, thank you for your help!
