Created attachment 301842 [details]
full dmesg while hotplugging two nvmes and spurious link change

An x86_64 machine has a PCI switch (PEX 8747) with four ports; NVMe disks can be attached to two of them. Using a vendor-specific tool I can power each port on/off. With both ports powered on, hot-plugging an NVMe into either port works perfectly fine, but as soon as I plug in a second one, *both* ports receive a PCI_EXP_SLTSTA_DLLSC event. As a consequence, either the previously attached NVMe is detached and only one device remains, or the previously attached NVMe is detached and immediately re-attached but all subsequent IO fails.

It seems very wrong to me that both ports see PCI_EXP_SLTSTA_DLLSC. The problem can be observed with every kernel tested so far. Could this be a firmware issue? What further debug methods do you suggest?

Thanks,
//richard
Created attachment 301843 [details] lspci and lspci -vvv output
There's an errata sheet for the PEX 8747 linked from:
https://www.broadcom.com/products/pcie-switches-bridges/pcie-switches/pex8747
Unfortunately the download fails with "not found". :(
Created attachment 301845 [details]
[PATCH] PCI: pciehp: Disable DLLSC events on PEX 8747 and Intel P5608

@Richard: Here's an experimental patch to disable DLLSC events on PEX 8747, so hotplug only relies on PDC. Would this work for you?
(In reply to Lukas Wunner from comment #3) > Created attachment 301845 [details] > [PATCH] PCI: pciehp: Disable DLLSC events on PEX 8747 and Intel P5608 > > @Richard: Here's an experimental patch to disable DLLSC events on PEX 8747, > so hotplug only relies on PDC. Would this work for you? @Lukas thanks! We'll give it a try and report back ASAP.
Created attachment 301898 [details] full dmesg while hotplugging two nvmes and disappear after access
Created attachment 301899 [details] lspci -vvv with both nvme unplugged; after cold-boot
Created attachment 301900 [details] lspci -vvv with one nvme hotplugged
Created attachment 301901 [details] lspci -vvv with both nvme hotplugged
Created attachment 301902 [details] lspci -vvv with both nvme hotplugged; after failed access
attachment 301898 [details]
attachment 301899 [details]
attachment 301900 [details]
attachment 301901 [details]
attachment 301902 [details]

Sorry for the delay. We applied the experimental patch to disable DLLSC events on the PEX 8747 and did some further testing, with the following results.

The second NVMe no longer disappears on hotplug, so the patch seems to do its job. However, when accessing the disk (e.g. with `fdisk -l`), the error message "nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10" appears, followed by error -19 (ENODEV, as the NVMe can no longer be found), and the disk disappears from the storage subsystem. It is still visible in the NVMe / PCI subsystems. The first disk remains visible in both.

We also checked power-state-related sysfs files during the test scenario: the power state of all PCI ports on the PCI bridge is always "D0", except after accessing the disk as described above, when it switches to "unknown" for the second hotplugged port.

Additionally, we compared the `lspci -vvv` outputs using diffs. The link, device and port states apparently change multiple times, and the `nvme` driver is unbound from the second disk after the failed access. See the attached files.

Do you have any hints on how we could further investigate and debug this?

Thanks,
Aaron
A "diff -u lspci-02_one_hotplugged.txt lspci-03_both_hotplugged.txt" reveals that the Memory Space and Bus Master bits in the PCI Command register (lspci's "Control" line) of the NVMe that was plugged in first are cleared once the second NVMe is plugged in:

@@ -1773,13 +1773,12 @@
 10:00.0 Non-Volatile memory controller: Device 27d1:5216 (rev 01) (prog-if 02 [NVM Express])
 	Subsystem: Device 27d1:5216
 	Physical Slot: 0-1
-	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
+	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

Naturally, that will break access to the NVMe that was plugged in first. No wonder the nvme driver complains.

As to *why* those bits are cleared, I'm stumped. I'm not seeing anything in the dmesg output that would indicate where and why the bits are cleared.

What you could try is amend pci_write_config_word() with something like:

	if (where == PCI_COMMAND && dev->bus->number == 0x10) {
		pci_info(dev, "%s: PCI_COMMAND = %#hx\n", __func__, val);
		dump_stack();
	}

That should help identify the place where those bits are cleared. *If* those bits are cleared by the kernel, that is. It could very well be that firmware fiddles with the bits behind the kernel's back. If the kernel is innocent, the next step would be to attach an acpidump here so that we can check whether firmware is the culprit.

Another theory is that the PLX switch does something to the device which causes it to clear those bits. In that case, the only option might be to ask PLX support (actually Broadcom support) for help.

But I'd suggest amending pci_write_config_word() first as shown above.