Bug 208741 - Incorrect calculation of latency in ASPM
Summary: Incorrect calculation of latency in ASPM
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: drivers_pci@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-07-29 22:41 UTC by Ian Kumlien
Modified: 2020-09-23 21:48 UTC (History)
0 users

See Also:
Kernel Version: 5.8-rc*
Tree: Mainline
Regression: No


Attachments
lspci output - the ASPM of 03:00.0 has been manually disabled (114.45 KB, text/plain)
2020-07-29 22:41 UTC, Ian Kumlien
Details
lspci output as booted with my patch (118.32 KB, text/plain)
2020-07-29 22:42 UTC, Ian Kumlien
Details
lspci tree output for reference (1.28 KB, text/plain)
2020-07-29 22:42 UTC, Ian Kumlien
Details
full dumps along the path (53.33 KB, text/plain)
2020-09-22 23:42 UTC, Ian Kumlien
Details
dmesg with pci=earlydump (156.59 KB, text/plain)
2020-09-23 21:48 UTC, Ian Kumlien
Details

Description Ian Kumlien 2020-07-29 22:41:39 UTC
Created attachment 290675 [details]
lspci output - the ASPM of 03:00.0 has been manually disabled

Adding this for traceability I assume, start off with the problem description and patch, will add lspci traces after.
---
Currently we check the maximum latency of upstream and downstream
per link, not the maximum for the path

This would work if all links have the same latency, but:
endpoint -> c -> b -> a -> root  (in the order we walk the path)

If c or b has the higest latency, it will not register

Fix this by maintaining the maximum latency value for the path

This change fixes a regression introduced by:
66ff14e59e8a (PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges)

Signed-off-by: Ian Kumlien <ian.kumlien@gmail.com>
---
 drivers/pci/pcie/aspm.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index b17e5ffd31b1..bd53fba7f382 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -434,7 +434,7 @@ static void pcie_get_aspm_reg(struct pci_dev *pdev,

 static void pcie_aspm_check_latency(struct pci_dev *endpoint)
 {
-       u32 latency, l1_switch_latency = 0;
+       u32 latency, l1_max_latency = 0, l1_switch_latency = 0;
        struct aspm_latency *acceptable;
        struct pcie_link_state *link;

@@ -470,8 +470,9 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
                 * substate latencies (and hence do not do any check).
                 */
                latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
+               l1_max_latency = max_t(u32, latency, l1_max_latency);
                if ((link->aspm_capable & ASPM_STATE_L1) &&
-                   (latency + l1_switch_latency > acceptable->l1))
+                   (l1_max_latency + l1_switch_latency > acceptable->l1))
                        link->aspm_capable &= ~ASPM_STATE_L1;
                l1_switch_latency += 1000;
Comment 1 Ian Kumlien 2020-07-29 22:42:15 UTC
Created attachment 290677 [details]
lspci output as booted with my patch
Comment 2 Ian Kumlien 2020-07-29 22:42:37 UTC
Created attachment 290679 [details]
lspci tree output for reference
Comment 3 Ian Kumlien 2020-07-29 22:45:33 UTC
The differences between both - this was from a try to just disable the endpoint but the data was validated against the old lspci output.

diff -u lscpi-root.output lscpi-endpoint.output
--- lscpi-root.output 2020-07-25 21:24:10.661458522 +0200
+++ lscpi-endpoint.output 2020-07-25 21:20:50.316049129 +0200
@@ -3,7 +3,6 @@
 00:00.2
 00:01.0
 00:01.2
- LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
 00:02.0
 00:03.0
 00:03.1
@@ -27,7 +26,6 @@
 00:18.6
 00:18.7
 01:00.0
- LnkCtl: ASPM Disabled; Disabled- CommClk+
 02:03.0
 02:04.0
  LnkCtl: ASPM Disabled; Disabled- CommClk+
Comment 4 Ian Kumlien 2020-07-29 22:46:27 UTC
(difference between both == difference between unpatched and patched. So two bridges are disabled that wasn't disabled before)
Comment 5 Ian Kumlien 2020-09-22 23:42:33 UTC
Created attachment 292563 [details]
full dumps along the path

So that we can look at all latency values in raw data.
Comment 6 Ian Kumlien 2020-09-23 21:48:04 UTC
Created attachment 292601 [details]
dmesg with pci=earlydump

This is still with a patched kernel but booting with pci=earlydump
(5.8.11)

Note You need to log in before you can comment on or make changes to this bug.