Bug 206459 - thinkpad thunderbolt 3 dock gen2 pci memory allocation errors on Yoga C940 unless plugged in before boot
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI
Hardware: Intel Linux
Importance: P1 normal
Assignee: drivers_pci@kernel-bugs.osdl.org
Reported: 2020-02-07 19:52 UTC by Benoit Grégoire
Modified: 2020-06-17 16:49 UTC
CC List: 2 users

Kernel Version: 5.5.2
Tree: Mainline
Regression: No


Attachments
acpidump (1.49 MB, text/plain)
2020-02-07 19:52 UTC, Benoit Grégoire
mainline_5.5.2_notworking_dmesg_dock_plugged_after_boot (86.24 KB, text/plain)
2020-02-07 19:54 UTC, Benoit Grégoire
mainline_5.5.2_notworking_lspci_vvvv_dock_plugged_after_boot (15.40 KB, text/plain)
2020-02-07 19:54 UTC, Benoit Grégoire
mainline_5.5.2_notworking_lsusb_dock_plugged_after_boot (412 bytes, text/plain)
2020-02-07 19:55 UTC, Benoit Grégoire
mainline_5.5.2_working_dmesg_dock_plugged_before_boot (86.16 KB, text/plain)
2020-02-07 19:55 UTC, Benoit Grégoire
mainline_5.5.2_working_lspci_vvvv_dock_plugged_before_boot (15.63 KB, text/plain)
2020-02-07 19:55 UTC, Benoit Grégoire
mainline_5.5.2_working_lsusb_dock_plugged_before_boot (1015 bytes, text/plain)
2020-02-07 19:56 UTC, Benoit Grégoire
mainline_5.6rc2_working_dmesg_dock_plugged_before_boot (86.84 KB, text/plain)
2020-02-18 18:09 UTC, Benoit Grégoire
mainline_5.6rc2_working_lspci_xxxx_dock_plugged_before_boot (146.54 KB, text/plain)
2020-02-18 18:10 UTC, Benoit Grégoire
mainline_5.6rc2_working_lspci_vt_dock_plugged_before_boot (1.29 KB, text/plain)
2020-02-18 18:11 UTC, Benoit Grégoire
mainline_5.6rc2_reference_lspci_vv_dock_not_plugged (29.54 KB, text/plain)
2020-02-18 18:12 UTC, Benoit Grégoire
mainline_5.6rc2_reference_dmesg_dock_not_plugged (75.76 KB, text/plain)
2020-02-18 18:12 UTC, Benoit Grégoire
mainline_5.6rc2_notworking_lspci_vv_dock_plugged_second_port_after_boot (46.07 KB, text/plain)
2020-02-18 18:13 UTC, Benoit Grégoire
mainline_5.6rc2_notworking_dmesg_dock_plugged_second_port_after_boot (84.71 KB, text/plain)
2020-02-18 18:14 UTC, Benoit Grégoire
mainline_5.6rc2_notworking_dmesg_dock_plugged_after_boot (84.81 KB, text/plain)
2020-02-18 18:15 UTC, Benoit Grégoire
mainline_5.6rc2_notworking_dmesg_dock_replugged_after_boot (97.26 KB, text/plain)
2020-02-18 18:15 UTC, Benoit Grégoire
mainline_5.6rc2_working_dmesg_pci_dyndbg_dock_plugged_before_boot (128.99 KB, text/plain)
2020-02-19 03:05 UTC, Benoit Grégoire
mainline_5.6rc2_working_lspci_vnnt_dock_plugged_before_boot (1.46 KB, text/plain)
2020-02-19 03:06 UTC, Benoit Grégoire
mainline_5.6rc2_notworking_dmesg_pci_dyndbg_dock_plugged_after_boot (150.48 KB, text/plain)
2020-02-19 04:19 UTC, Benoit Grégoire
mainline_5.6rc2_cat_proc_iomem_before_attach (3.83 KB, text/plain)
2020-02-19 06:46 UTC, Benoit Grégoire
mainline_5.6rc2_cat_proc_iomem_after_attach (3.93 KB, text/plain)
2020-02-19 06:47 UTC, Benoit Grégoire
mainline_5.6rc2_working_dmesg_pci_dyndbg_dock_plugged_before_boot (129.18 KB, text/plain)
2020-02-19 18:28 UTC, Benoit Grégoire
mainline_5.6rc2_notworking_dmesg_pci_dyndbg_dock_plugged_after_boot (179.39 KB, text/plain)
2020-02-19 18:28 UTC, Benoit Grégoire
mainline_5.6rc2_notworking_dmesg_pci_dyndbg_dock_plugged_after_boot_hpmensize_0 (157.49 KB, text/plain)
2020-02-21 00:41 UTC, Benoit Grégoire
Don't align upstream port resources (2.71 KB, patch)
2020-02-21 09:46 UTC, Mika Westerberg
mainline_5.6rc2_notworking_dmesg_dock_plugged_after_boot_2020-02-21_patch (163.13 KB, text/plain)
2020-02-22 05:03 UTC, Benoit Grégoire
Skip clipping e820 regions (413 bytes, patch)
2020-02-26 15:37 UTC, Mika Westerberg
Do not exclude regions marked as MMIO in EFI memmap (5.66 KB, patch)
2020-02-27 14:23 UTC, Mika Westerberg
mainline_5.6rc3_working_dmesg_dock_plugged_after_boot_patch_287619 (176.33 KB, text/plain)
2020-02-27 18:30 UTC, Benoit Grégoire
mainline_5.6rc3_working_dmesg_dock_plugged_after_boot_patch_287661 (133.82 KB, text/plain)
2020-02-27 18:31 UTC, Benoit Grégoire

Description Benoit Grégoire 2020-02-07 19:52:30 UTC
Created attachment 287231 [details]
acpidump

I have a ThinkPad Thunderbolt 3 Dock Gen 2 that I am trying to use with a new Lenovo Yoga C940-14IIL laptop. The laptop is very recent hardware, with a 10th-gen Intel CPU and a BIOS with very few options :(

- The dock works fine when plugged-in before boot.
- The dock does NOT work when plugged after the system booted.
- The dock does NOT work when plugged-in at boot, subsequently unplugged and plugged back in.
- The dock works fine in Windows in all of the above scenarios.

When it fails, it fails with memory allocation messages such as:

[ 342.507320] pci 0000:2b:00.0: BAR 14: no space for [mem size 0x0c200000]
[ 342.507323] pci 0000:2b:00.0: BAR 14: failed to assign [mem size 0x0c200000]
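
To pull failures like these out of a long dmesg, a small filter can help. The following is only a sketch: the regular expression is an assumption modelled on the two lines quoted above, and `find_bar_failures` is a hypothetical helper name, not part of any kernel tooling.

```python
import re

# Pattern modelled on the two log lines quoted above (an assumption, not a
# documented kernel log format).
BAR_FAIL = re.compile(
    r"pci (?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"BAR (?P<bar>\d+): (?P<what>no space for|failed to assign) "
    r"\[mem size (?P<size>0x[0-9a-f]+)"
)


def find_bar_failures(dmesg_text):
    """Return (device, BAR index, size in bytes, message kind) per matching line."""
    hits = []
    for line in dmesg_text.splitlines():
        m = BAR_FAIL.search(line)
        if m:
            hits.append((m["dev"], int(m["bar"]), int(m["size"], 16), m["what"]))
    return hits


# The two lines from the description, used as sample input.
sample = """\
[ 342.507320] pci 0000:2b:00.0: BAR 14: no space for [mem size 0x0c200000]
[ 342.507323] pci 0000:2b:00.0: BAR 14: failed to assign [mem size 0x0c200000]
"""

for dev, bar, size, what in find_bar_failures(sample):
    print(f"{dev} BAR {bar}: {what} {size // (1024 * 1024)} MiB")
```

The requested size 0x0c200000 works out to 194 MiB, which is the bridge memory window that 2b:00.0 fails to get.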

Things I tried:
- Ubuntu kernel 5.3.0-26, same symptoms
- Kernel mainline 5.4.12, same symptoms
- Kernel mainline 5.5.2, same symptoms, but gets a little further allocating memory on the second pass.
- Plugging the dock after powering up the laptop, but at the grub screen before boot. In this case the dock works fine after boot.

Other potentially useful information to narrow it down:

- The tests were done with only an ethernet cable and power plugged into the dock to minimize the number of moving parts...

- Dock and laptop both have the very latest firmware as of 2020-02-07
cat /sys/bus/thunderbolt/devices/0-0/nvm_version
72.0
cat /sys/bus/thunderbolt/devices/0-3/nvm_version
50.0

- Unfortunately I cannot procure older firmware for the dock to know if the laptop or the dock is the source of the problem (As this dock was released over a year ago, and I cannot find any specific relevant problems with Linux)

- The screens connected to the DisplayPorts on the dock always work. But all other ports (USB, Ethernet, sound) fail when the dock is plugged in after boot.

- Doesn't seem to be a thunderbolt authorization problem:
tbtadm devices 
0-3     Lenovo  ThinkPad Thunderbolt 3 Dock     authorized      not in ACL

Originally reported to ubuntu in: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1860284
Comment 1 Benoit Grégoire 2020-02-07 19:54:07 UTC
Created attachment 287233 [details]
mainline_5.5.2_notworking_dmesg_dock_plugged_after_boot
Comment 2 Benoit Grégoire 2020-02-07 19:54:41 UTC
Created attachment 287235 [details]
mainline_5.5.2_notworking_lspci_vvvv_dock_plugged_after_boot
Comment 3 Benoit Grégoire 2020-02-07 19:55:04 UTC
Created attachment 287237 [details]
mainline_5.5.2_notworking_lsusb_dock_plugged_after_boot
Comment 4 Benoit Grégoire 2020-02-07 19:55:24 UTC
Created attachment 287239 [details]
mainline_5.5.2_working_dmesg_dock_plugged_before_boot
Comment 5 Benoit Grégoire 2020-02-07 19:55:46 UTC
Created attachment 287241 [details]
mainline_5.5.2_working_lspci_vvvv_dock_plugged_before_boot
Comment 6 Benoit Grégoire 2020-02-07 19:56:10 UTC
Created attachment 287243 [details]
mainline_5.5.2_working_lsusb_dock_plugged_before_boot
Comment 7 Benoit Grégoire 2020-02-17 21:21:44 UTC
Still no luck on 5.5.4, and with an updated BIOS (AUCN54WW)

Is there any other information I could provide?
Comment 8 Nicholas Johnson 2020-02-18 00:27:45 UTC
Hi Benoit,

Please try Linux v5.6-rc2: https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.6-rc2/

I have seven patches directly relating to Thunderbolt PCI native enumeration in the v5.6 release, which may help.

In the future, please note that "sudo lspci -xxxx" dumps all information into a file, allowing us to run any lspci command from that file, as if it were on your system. "lspci -F file -vt" for example. I like to have -vt to get a feel for the topology, especially for Thunderbolt.

Thanks for reporting.

Regards,
Nicholas
Comment 9 Mika Westerberg 2020-02-18 07:43:59 UTC
In case the v5.6-rcX kernel does not help, can you boot the system without the device connected and attach 'sudo lspci -vv' and also the full dmesg? It looks like the root port (07.1) gets misconfigured by Linux for some reason upon hotplug.

Another question: if you plug the device into another port, does it work any better? Can you attach 'sudo lspci -vv' and dmesg output of that run as well?
Comment 10 Benoit Grégoire 2020-02-18 18:09:49 UTC
Created attachment 287469 [details]
mainline_5.6rc2_working_dmesg_dock_plugged_before_boot
Comment 11 Benoit Grégoire 2020-02-18 18:10:56 UTC
Created attachment 287471 [details]
mainline_5.6rc2_working_lspci_xxxx_dock_plugged_before_boot
Comment 12 Benoit Grégoire 2020-02-18 18:11:50 UTC
Created attachment 287473 [details]
mainline_5.6rc2_working_lspci_vt_dock_plugged_before_boot
Comment 13 Benoit Grégoire 2020-02-18 18:12:25 UTC
Created attachment 287475 [details]
mainline_5.6rc2_reference_lspci_vv_dock_not_plugged
Comment 14 Benoit Grégoire 2020-02-18 18:12:48 UTC
Created attachment 287477 [details]
mainline_5.6rc2_reference_dmesg_dock_not_plugged
Comment 15 Benoit Grégoire 2020-02-18 18:13:58 UTC
Created attachment 287479 [details]
mainline_5.6rc2_notworking_lspci_vv_dock_plugged_second_port_after_boot
Comment 16 Benoit Grégoire 2020-02-18 18:14:40 UTC
Created attachment 287481 [details]
mainline_5.6rc2_notworking_dmesg_dock_plugged_second_port_after_boot
Comment 17 Benoit Grégoire 2020-02-18 18:15:12 UTC
Created attachment 287483 [details]
mainline_5.6rc2_notworking_dmesg_dock_plugged_after_boot
Comment 18 Benoit Grégoire 2020-02-18 18:15:38 UTC
Created attachment 287485 [details]
mainline_5.6rc2_notworking_dmesg_dock_replugged_after_boot
Comment 19 Benoit Grégoire 2020-02-18 18:19:48 UTC
Hello Nicholas and Mika,

Unfortunately, 5.6rc2 didn't help.

See the new attachments, I believe I included all the information requested. 

In addition, I included separate dmesg for when the dock is plugged after boot, and when it was plugged before boot and subsequently re-plugged.

Thanks for your help!
Comment 20 Nicholas Johnson 2020-02-19 02:38:12 UTC
Thanks for the additional information, Benoit.

If you have other Thunderbolt 3 devices, do they also cause issues with this computer?

Do you have another Thunderbolt 3 computer to boot Linux to try the dock?

Please give "lspci -vnnt" with dock attached before boot and working so I can be sure of topology.

Mika, do you think it could be worth changing the ACPI OSI name to mimic Windows to see if ACPI is treating us differently?

I see there is a conflict with reserved memory (I have never seen this before) but it is with the SPI controller, not Thunderbolt.

The dmesg suggests booting with pci=realloc. It is worth noting that with Ice Lake, Linux refuses to reassign (my theory is that the ACPI _DSM method evaluates to zero).

I would really like the struct resource to be changed in Linux so that the desired alignment is preserved after assignment, so that we can see it. I suspect the dock has funny alignment expectations which we cannot easily see.

For future tests, you may want to pass pci.dyndbg to kernel parameters to give more information.

This is a bunch of random thoughts and observations for now. I will continue to scour the logs for clues.
Comment 21 Nicholas Johnson 2020-02-19 03:01:33 UTC
Benoit, are you comfortable compiling and running your own kernel?
Comment 22 Benoit Grégoire 2020-02-19 03:05:34 UTC
Created attachment 287493 [details]
mainline_5.6rc2_working_dmesg_pci_dyndbg_dock_plugged_before_boot
Comment 23 Benoit Grégoire 2020-02-19 03:06:04 UTC
Created attachment 287495 [details]
mainline_5.6rc2_working_lspci_vnnt_dock_plugged_before_boot
Comment 24 Benoit Grégoire 2020-02-19 03:09:38 UTC
Nicholas, see the two new files with the info you requested (dmesg with pci.dyndbg, and lspci -vnnt)

Unfortunately, I do not have another thunderbolt3 peripheral or other machine with thunderbolt3 on hand.

Yes, I can compile my own kernel to test things if it helps.
Comment 25 Nicholas Johnson 2020-02-19 03:11:51 UTC
Thanks Benoit, I will have a look at them.

Here is another person who was having issues specifically with MMIO resource window when hot plugging. I think it could be related (same bug?):

https://lore.kernel.org/linux-pci/PSXP216MB0438BE9DA58D0AF9F908070680540@PSXP216MB0438.KORP216.PROD.OUTLOOK.COM/T/
Comment 26 Nicholas Johnson 2020-02-19 03:13:58 UTC
Sorry, you had already given an lspci -vt, which was sufficient. I forgot to drop that request before posting.

Could we please have dyndbg after the dock has been hot-added after boot? Thanks
Comment 27 Benoit Grégoire 2020-02-19 04:19:32 UTC
Created attachment 287501 [details]
mainline_5.6rc2_notworking_dmesg_pci_dyndbg_dock_plugged_after_boot
Comment 28 Benoit Grégoire 2020-02-19 04:20:08 UTC
Sure, see new attachment
Comment 29 Nicholas Johnson 2020-02-19 05:36:02 UTC
Hi Benoit,

It does not contain the information I am expecting.

I need to see the pci_dbg() calls at lines 1855 and 1859 here:

https://elixir.bootlin.com/linux/v5.6-rc2/source/drivers/pci/setup-bus.c

Perhaps your log level is excluding them. Can you please see if you can adjust dmesg log level to see "extended by" and "shrunken by"?

Thanks!
Comment 30 Nicholas Johnson 2020-02-19 05:40:05 UTC
There could be a possibility that they all have new_size = size and are skipping the pci_dbg(), but I find that unlikely. But if this is the case then I apologise.
Comment 31 Nicholas Johnson 2020-02-19 05:43:29 UTC
Could I please also have "sudo cat /proc/iomem" before and after dock attached? Must be sudo or else it excludes address information. This gives a complete overview of resources. Thanks
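
For comparing the two captures, something like the following could be used. It is a rough sketch under stated assumptions: each /proc/iomem line is taken to look like `start-end : name` with two spaces of indentation per nesting level, `parse_iomem` and `diff_iomem` are hypothetical helper names, and the sample ranges in the usage lines are made up rather than taken from this machine.

```python
import re

# Assumed line shape: optional indentation, "start-end : name" in hex.
LINE = re.compile(
    r"^(?P<indent>\s*)(?P<start>[0-9a-f]+)-(?P<end>[0-9a-f]+) : (?P<name>.*)$"
)


def parse_iomem(text):
    """Parse /proc/iomem text into a set of (start, end, depth, name) tuples."""
    entries = set()
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            depth = len(m["indent"]) // 2  # assumes 2 spaces per nesting level
            entries.add(
                (int(m["start"], 16), int(m["end"], 16), depth, m["name"].strip())
            )
    return entries


def diff_iomem(before_text, after_text):
    """Return (added, removed) entries between two /proc/iomem captures."""
    before, after = parse_iomem(before_text), parse_iomem(after_text)
    return sorted(after - before), sorted(before - after)


# Made-up sample input, only to show the call shape.
before = "10000000-1fffffff : bus A\n"
after = "10000000-1fffffff : bus A\n  10000000-100fffff : device B\n"
added, removed = diff_iomem(before, after)
for start, end, depth, name in added:
    print(f"+ {start:08x}-{end:08x} (depth {depth}) {name}")
```

Captures taken without sudo show zeroed addresses, so every range would collapse to the same value and the diff would not be meaningful.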
Comment 32 Benoit Grégoire 2020-02-19 06:46:09 UTC
Created attachment 287503 [details]
mainline_5.6rc2_cat_proc_iomem_before_attach
Comment 33 Benoit Grégoire 2020-02-19 06:47:31 UTC
Created attachment 287505 [details]
mainline_5.6rc2_cat_proc_iomem_after_attach
Comment 34 Benoit Grégoire 2020-02-19 06:48:20 UTC
I don't know, I seem to get the messages generated at https://elixir.bootlin.com/linux/v5.6-rc2/source/drivers/pci/pci.c#L1378 (line 1378).
I really don't know what could be filtering the specific ones you want.

The two iomem files are attached above.
Comment 35 Mika Westerberg 2020-02-19 08:36:32 UTC
(In reply to Nicholas Johnson from comment #20)
> Mika, do you think it could be worth changing the ACPI OSI name to mimic
> Windows to see if ACPI is treating us differently?

Linux should do that by default, i.e. Linux looks like Windows to the ACPI code.
Comment 36 Mika Westerberg 2020-02-19 08:52:30 UTC
Can you check if you have CONFIG_PCI_DEBUG=y enabled in your .config? If not please enable it and attach full dmesg of the failure. I think that option is also needed to see the additional debugging information regarding resource allocation and more.
Comment 37 Benoit Grégoire 2020-02-19 18:28:05 UTC
Created attachment 287511 [details]
mainline_5.6rc2_working_dmesg_pci_dyndbg_dock_plugged_before_boot
Comment 38 Benoit Grégoire 2020-02-19 18:28:38 UTC
Created attachment 287513 [details]
mainline_5.6rc2_notworking_dmesg_pci_dyndbg_dock_plugged_after_boot
Comment 39 Benoit Grégoire 2020-02-19 18:29:56 UTC
Ok, thanks Mika.  I compiled my own kernel and the attachments above now have the information Nicholas wanted.
Comment 40 Mika Westerberg 2020-02-20 09:03:51 UTC
I'm still going through your log but in the meantime one option you could try is to put "pci=hpmemsize=0" into the kernel command line and see if it makes any difference.
Comment 41 Benoit Grégoire 2020-02-21 00:41:36 UTC
Created attachment 287523 [details]
mainline_5.6rc2_notworking_dmesg_pci_dyndbg_dock_plugged_after_boot_hpmensize_0

Attempt with pci=hpmemsize=0
Comment 42 Mika Westerberg 2020-02-21 09:46:49 UTC
Created attachment 287533 [details]
Don't align upstream port resources

Can you try the attached patch? For some reason we fail already when the upstream port (2b:00.0) resources are assigned, which is weird because it should simply get all the resources. This one also adds a couple more debug prints, so please attach the full dmesg.

You can also remove "pci=hpmemsize=0" from the command line since it did not help.
Comment 43 Benoit Grégoire 2020-02-22 05:03:18 UTC
Created attachment 287551 [details]
mainline_5.6rc2_notworking_dmesg_dock_plugged_after_boot_2020-02-21_patch

Unfortunately, the patch did not help
Comment 44 Mika Westerberg 2020-02-25 13:45:15 UTC
I have been trying to reproduce this on my reference ICL system without success but today I got my hands on a recent Lenovo Yoga and it has the same issue so now I can reproduce it :) I'll update this as soon as I have some idea what the root cause might be.
Comment 45 Benoit Grégoire 2020-02-25 19:23:27 UTC
That's great news for me!  Thanks a lot Mika, and good luck...
Comment 46 Mika Westerberg 2020-02-26 15:37:39 UTC
Created attachment 287619 [details]
Skip clipping e820 regions

It looks like the Yoga BIOS-e820 memory map includes some of the memory space reserved for root bridge and the devices below it:

4bc50000-cfffffff               BIOS-e820 reserved area
  65400000-bfffffff             Root bridge
    66000000-721fffff           PCIe root port 07.1

There is code in arch/x86/kernel/resource.c (arch_remove_reservations()) that clips the resource so that it avoids these regions. This is why we can't find memory space for the upstream port.
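
The effect can be illustrated with the three ranges listed above. This is a simplified model, not the kernel code: `clip` is a hypothetical helper that removes a reserved range from an available one and keeps the larger remainder, which is an assumption about how the clipping behaves. The needed size is taken from the "BAR 14: no space for" message in the description.

```python
def clip(avail, reserved):
    """Remove `reserved` from `avail`; both are inclusive (start, end) tuples.

    Returns the larger remaining piece, or None if nothing is left.
    """
    a_start, a_end = avail
    r_start, r_end = reserved
    if a_end < r_start or a_start > r_end:
        return avail  # no overlap, nothing to clip
    pieces = []
    if a_start < r_start:
        pieces.append((a_start, r_start - 1))  # part below the reserved range
    if a_end > r_end:
        pieces.append((r_end + 1, a_end))  # part above the reserved range
    return max(pieces, key=lambda p: p[1] - p[0], default=None)


# Ranges from the memory map excerpt above.
E820_RESERVED = (0x4BC50000, 0xCFFFFFFF)
ROOT_BRIDGE = (0x65400000, 0xBFFFFFFF)
ROOT_PORT_07_1 = (0x66000000, 0x721FFFFF)
NEEDED = 0x0C200000  # size from the "BAR 14: no space for" message

window_size = ROOT_PORT_07_1[1] - ROOT_PORT_07_1[0] + 1
print("root port window size:", hex(window_size))
print("left after clipping:", clip(ROOT_PORT_07_1, E820_RESERVED))
```

Under this model the root port window is exactly 0x0c200000 bytes, which matches the size the upstream port asks for, but the window lies entirely inside the reserved area, so nothing is left once the reserved range is removed.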

I wonder if you can try the attached hack patch that skips the clipping?

The changelog in 4dc2287c1805 ("x86: avoid E820 regions when allocating address space") says that Windows seems to ignore these reserved regions which might explain why this works in Windows.
Comment 47 Mika Westerberg 2020-02-27 14:23:53 UTC
Created attachment 287661 [details]
Do not exclude regions marked as MMIO in EFI memmap

This patch is slightly better. Can you try this one?

Bjorn, can you check if this makes sense? The original code is from you so you know this much better than I :) This fixes the issue on Yoga S740 I have here.
Comment 48 Nicholas Johnson 2020-02-27 15:04:55 UTC
Nice catch. Does this affect all Thunderbolt peripherals with MMIO BAR? It sounds like it does.

More abstract questions for thought (not necessarily expecting any answers):
- Why did they do this, why does Windows ignore the reserved region, and why only Lenovo?
- Could this suggest Linux needs to be added into the certification requirements someday?

The thing I love about Ice Lake is it will hopefully give the OEMs less chance to mess up the Thunderbolt implementation than with external chips. However, clearly mistakes can still be made.
Comment 49 Benoit Grégoire 2020-02-27 18:30:53 UTC
Created attachment 287663 [details]
mainline_5.6rc3_working_dmesg_dock_plugged_after_boot_patch_287619

Result of trying patch 287619, it works!
Comment 50 Benoit Grégoire 2020-02-27 18:31:55 UTC
Created attachment 287665 [details]
mainline_5.6rc3_working_dmesg_dock_plugged_after_boot_patch_287661

Result of trying patch 287661, it ALSO works!  Many thanks!
Comment 51 Mika Westerberg 2020-03-02 15:05:39 UTC
Great, thanks for testing. I submitted the patch upstream now:

https://lore.kernel.org/lkml/20200302141451.18983-1-mika.westerberg@linux.intel.com/
Comment 52 Mika Westerberg 2020-03-02 15:07:51 UTC
(In reply to Nicholas Johnson from comment #48)
> Nice catch. Does this affect all Thunderbolt peripherals with MMIO BAR? It
> sounds like it does.

Yes, I think it does.
Comment 53 Benoit Grégoire 2020-06-16 22:26:34 UTC
Any chance this will make it into 5.8 ?
Comment 54 Mika Westerberg 2020-06-17 16:49:29 UTC
I just resent the patch. Hopefully it lands in mainline at some point :)
