Bug 206837

Summary: Thunderbolt bridges take long time to resume from D3cold
Product: Drivers
Reporter: Kai-Heng Feng (kai.heng.feng)
Component: PCI
Assignee: drivers_pci (drivers_pci)
Status: RESOLVED CODE_FIX
Severity: normal
CC: fourdollars, karolherbst, mika.westerberg
Priority: P1
Hardware: All
OS: Linux
Kernel Version: mainline
Subsystem:
Regression: No
Bisected commit-id:
Attachments: dmesg with s2idle
lspci -vvnn
lspci -t
proposed patch to use current link speed
dmesg with the patch
dmesg with debug message
another lspci -vvv
dmesg with speed cap dumped
Use link activation to determine when to start waiting
lspci -vvxxxx
lspci -vvxxxx with linkcap2 patch applied

Description Kai-Heng Feng 2020-03-13 05:07:13 UTC

    
Comment 1 Kai-Heng Feng 2020-03-13 05:39:26 UTC
Created attachment 287907 [details]
dmesg with s2idle
Comment 2 Kai-Heng Feng 2020-03-13 05:39:53 UTC
Created attachment 287909 [details]
lspci -vvnn
Comment 3 Kai-Heng Feng 2020-03-13 05:40:08 UTC
Created attachment 287911 [details]
lspci -t
Comment 4 Kai-Heng Feng 2020-03-13 06:08:21 UTC
Created attachment 287913 [details]
proposed patch to use current link speed
Comment 5 Kai-Heng Feng 2020-03-13 06:08:58 UTC
Created attachment 287915 [details]
dmesg with the patch
Comment 6 Mika Westerberg 2020-03-13 11:33:28 UTC
Can you please add a description of the issue and steps to reproduce? Also please enable CONFIG_PCI_DEBUG=y in your .config, reproduce, and attach dmesg so we can hopefully see the timings logged and maybe figure out where the time is spent.
Comment 7 Kai-Heng Feng 2020-03-13 13:52:02 UTC
Ok, I forgot to copy the description here, sorry.

The issue is that it takes a long time (> 1 s) to runtime resume or system resume Thunderbolt bridges, since msleep(1100) is called in pci_bridge_wait_for_secondary_bus() -> pcie_wait_for_link_delay().
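
For reference, a rough sketch of the code path involved (paraphrased from the pci.c of that era and from the debug messages below; not a verbatim copy of the upstream source):

    /* Sketch only, paraphrased; names match drivers/pci/pci.c */
    static bool pcie_wait_for_link_delay(struct pci_dev *pdev, bool active,
                                         int delay)
    {
            /*
             * Without Data Link Layer Link Active reporting there is no
             * way to tell when the link has trained, so wait for the
             * worst case: 1000 ms training + 100 ms settle = 1100 ms.
             */
            if (!pdev->link_active_reporting) {
                    msleep(1100);
                    return true;
            }

            /* ...otherwise poll PCI_EXP_LNKSTA_DLLLA, then wait 'delay' ms... */
            return true;
    }

    /* Caller, pci_bridge_wait_for_secondary_bus(), for a downstream port: */
            if (pcie_get_speed_cap(dev) <= PCIE_SPEED_5_0GT) {
                    pci_dbg(dev, "waiting %d ms for downstream link\n", delay);
                    msleep(delay);
            } else {
                    pci_dbg(dev, "waiting %d ms for downstream link, after activation\n",
                            delay);
                    if (!pcie_wait_for_link_delay(dev, true, delay))
                            return;
            }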
Comment 8 Kai-Heng Feng 2020-03-13 13:56:42 UTC
Created attachment 287917 [details]
dmesg with debug message

Runtime resume via lspci:
[   49.291962] pcieport 0000:04:00.0: !pdev->link_active_reporting, msleep(1100)
[   49.292495] pcieport 0000:04:02.0: !pdev->link_active_reporting, msleep(1100)

System resume:
[  122.407100] pcieport 0000:04:00.0: !pdev->link_active_reporting, msleep(1100)
[  122.407102] pcieport 0000:04:02.0: !pdev->link_active_reporting, msleep(1100)
Comment 9 Kai-Heng Feng 2020-03-13 13:59:53 UTC
Created attachment 287919 [details]
another lspci -vvv

This is reproduced on another CML-H + Titan Ridge system, because the original system is no longer available.
Comment 10 Kai-Heng Feng 2020-03-13 14:02:16 UTC
If we use the current link speed to determine the speed instead, i.e.
  pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &linksta);
  speed = pcie_link_speed[linksta & PCI_EXP_LNKSTA_CLS];
we can identify that the speed is less than PCIE_SPEED_5_0GT, so we don't need to wait that long.
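
A minimal sketch of that idea as a helper (hypothetical, not the attached patch itself); note Mika's point in comment 14 that the link is not yet trained at this stage, so the current link speed turns out not to be usable here:

    /* Hypothetical helper illustrating the idea above, not the attached patch. */
    static enum pci_bus_speed pcie_get_current_speed(struct pci_dev *dev)
    {
            u16 linksta;

            pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &linksta);
            return pcie_link_speed[linksta & PCI_EXP_LNKSTA_CLS];
    }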
Comment 11 Mika Westerberg 2020-03-13 15:14:36 UTC
Hmm, if we look at the first Titan Ridge downstream port here:

04:00.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06) (prog-if 00 [Normal decode])
...
LnkCap:	Port #0, Speed 2.5GT/s, Width x4, ASPM L1, Exit Latency L1 <1us

dmesg says:

[   49.291959] pcieport 0000:04:00.0: waiting 100 ms for downstream link, after activation
[   49.291962] pcieport 0000:04:00.0: !pdev->link_active_reporting, msleep(1100)

How can it go to the path that calls pcie_wait_for_link_delay() when the cap says it is <= 5 GT/s? I would expect here that it says:

[   49.291959] pcieport 0000:04:00.0: waiting 100 ms for downstream link

I'm surprised because I specifically tested this on CFL-H, which is similar to CML-H with Titan Ridge, and I did not see this happening.
Comment 12 Kai-Heng Feng 2020-03-13 19:11:47 UTC
This is because pciutils only uses PCI_EXP_LNKCAP but pcie_get_speed_cap() prioritizes PCI_EXP_LNKCAP2 over PCI_EXP_LNKCAP.

The PCI_EXP_LNKCAP2 value reads as 8 GT/s in this case.

We also need to update pciutils to use PCI_EXP_LNKCAP2.
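
For context, a condensed paraphrase of that priority in pcie_get_speed_cap() (higher-speed bits omitted; not the exact upstream function):

    pcie_capability_read_dword(dev, PCI_EXP_LNKCAP2, &lnkcap2);
    if (lnkcap2) {
            /* PCIe r3.0+: decode the Supported Link Speeds Vector. */
            if (lnkcap2 & PCI_EXP_LNKCAP2_SLS_8_0GB)
                    return PCIE_SPEED_8_0GT;
            if (lnkcap2 & PCI_EXP_LNKCAP2_SLS_5_0GB)
                    return PCIE_SPEED_5_0GT;
            if (lnkcap2 & PCI_EXP_LNKCAP2_SLS_2_5GB)
                    return PCIE_SPEED_2_5GT;
            return PCI_SPEED_UNKNOWN;
    }

    /*
     * Pre-r3.0 fallback: Max Link Speed in Link Capabilities, which is
     * the field pciutils prints as "Speed 2.5GT/s" in LnkCap above.
     */
    pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
    if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_5_0GB)
            return PCIE_SPEED_5_0GT;
    if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_2_5GB)
            return PCIE_SPEED_2_5GT;
    return PCI_SPEED_UNKNOWN;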
Comment 13 Kai-Heng Feng 2020-03-16 05:59:21 UTC
Filed a request against pciutils:
https://github.com/pciutils/pciutils/issues/38
Comment 14 Mika Westerberg 2020-03-16 09:19:28 UTC
Can you dump the raw PCI_EXP_LNKCAP2 of these ports? It is weird that it claims 8 GT/s, because then it must implement active link reporting according to the spec.

We cannot use the current link speed here because the link is not yet trained. One option is to use target link speed instead (that should reflect the LNKCAP max supported).
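
A sketch of that alternative (hypothetical helper reading the Target Link Speed from Link Control 2; not code from this thread):

    static enum pci_bus_speed pcie_get_target_speed(struct pci_dev *dev)
    {
            u16 lnkctl2;

            pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2);

            switch (lnkctl2 & PCI_EXP_LNKCTL2_TLS) {
            case PCI_EXP_LNKCTL2_TLS_2_5GT:
                    return PCIE_SPEED_2_5GT;
            case PCI_EXP_LNKCTL2_TLS_5_0GT:
                    return PCIE_SPEED_5_0GT;
            case PCI_EXP_LNKCTL2_TLS_8_0GT:
                    return PCIE_SPEED_8_0GT;
            case PCI_EXP_LNKCTL2_TLS_16_0GT:
                    return PCIE_SPEED_16_0GT;
            default:
                    return PCI_SPEED_UNKNOWN;
            }
    }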
Comment 15 Kai-Heng Feng 2020-03-16 10:03:17 UTC
Created attachment 287947 [details]
dmesg with speed cap dumped

The diff:

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index d828ca835a98..c7a795569f6d 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4619,6 +4619,7 @@ static bool pcie_wait_for_link_delay(struct pci_dev *pdev, bool active,
         * case, we wait for 1000 + 100 ms.
         */
        if (!pdev->link_active_reporting) {
+               pci_info(pdev, "!pdev->link_active_reporting, msleep(1100)\n");
                msleep(1100);
                return true;
        }
@@ -5783,7 +5784,9 @@ enum pci_bus_speed pcie_get_speed_cap(struct pci_dev *dev)
         * should use the Supported Link Speeds field in Link Capabilities,
         * where only 2.5 GT/s and 5.0 GT/s speeds were defined.
         */
+       pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
        pcie_capability_read_dword(dev, PCI_EXP_LNKCAP2, &lnkcap2);
+       pci_info(dev, "%s: cap: 0x%0x cap2: 0x%0x\n", __func__, lnkcap, lnkcap2);
        if (lnkcap2) { /* PCIe r3.0-compliant */
                if (lnkcap2 & PCI_EXP_LNKCAP2_SLS_32_0GB)
                        return PCIE_SPEED_32_0GT;
Comment 16 Mika Westerberg 2020-03-16 11:59:42 UTC
It indeed announces 8 GT/s as a supported speed (0xe >> 1 == 0x7, i.e. the Supported Link Speeds Vector has the 2.5, 5.0 and 8.0 GT/s bits set). That is weird because you would expect it to report the same maximum speed in both fields. Also, for > 5 GT/s it must support data link layer active reporting.

I'm wondering if this could be some sort of firmware issue. What NVM version does the card report?
Comment 17 Kai-Heng Feng 2020-03-16 12:11:15 UTC
The nvm_version is 55.0.
Comment 18 Kai-Heng Feng 2020-03-16 12:12:51 UTC
Sorry, that's a typo. The nvm_version is 50.0
Comment 19 Mika Westerberg 2020-03-18 16:51:13 UTC
Created attachment 287973 [details]
Use link activation to determine when to start waiting

I realized that speeds > 5 GT/s and support for the active link reporting capability go hand in hand (it is mandatory in that case for downstream ports). So I'm wondering whether we should just check for that instead of trying to figure this out based on the speed.

Can you try the attached patch?
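
For reference, the shape of that check (a sketch paraphrased from this discussion, not the attached patch or the final upstream diff):

    /* In pci_bridge_wait_for_secondary_bus(), sketch only: */
    if (!dev->link_active_reporting) {
            /*
             * No DLL Link Active reporting: assume the link trains
             * within the normal delay and just sleep for it.
             */
            msleep(delay);
    } else {
            /* The port tells us when the link is up: wait for that. */
            if (!pcie_wait_for_link_delay(dev, true, delay))
                    return;
    }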
Comment 20 Kai-Heng Feng 2020-03-19 09:43:45 UTC
Yes, it solves the issue.
Thanks a lot!

Please collect my Tested-by tag:
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Comment 21 Kai-Heng Feng 2020-05-08 08:16:11 UTC
Created attachment 288997 [details]
lspci -vvxxxx
Comment 22 Kai-Heng Feng 2020-05-08 08:16:49 UTC
Created attachment 288999 [details]
lspci -vvxxxx with linkcap2 patch applied
Comment 23 Kai-Heng Feng 2020-05-08 08:17:44 UTC
The above two lspci outputs are collected from another system, which exhibits the same slow TBT resume symptom.
Comment 24 Kai-Heng Feng 2020-06-02 09:36:26 UTC
commit ec411e02b7a2e785a4ed9ed283207cd14f48699d
Author: Mika Westerberg <mika.westerberg@linux.intel.com>
Date:   Thu May 14 16:30:43 2020 +0300

    PCI/PM: Assume ports without DLL Link Active train links in 100 ms
Comment 25 Karol Herbst 2020-07-17 11:33:05 UTC
(In reply to Kai-Heng Feng from comment #24)
> commit ec411e02b7a2e785a4ed9ed283207cd14f48699d
> Author: Mika Westerberg <mika.westerberg@linux.intel.com>
> Date:   Thu May 14 16:30:43 2020 +0300
> 
>     PCI/PM: Assume ports without DLL Link Active train links in 100 ms

This commit causes a runtime PM (runpm) regression with Nouveau on one of my systems. Bug filed here: https://bugzilla.kernel.org/show_bug.cgi?id=208597