Created attachment 301842 [details]
full dmesg while hotplugging two nvmes and spurious link change

An x86_64 machine has a PCI switch (PEX 8747) with four ports; NVMe disks can be attached to two of them. Using a vendor-specific tool I can power each port on/off. With both ports powered on, hot-plugging an NVMe into either port works perfectly fine, but as soon as I plug in a second one, *both* ports receive a PCI_EXP_SLTSTA_DLLSC event. As a consequence, either the previously attached NVMe is detached and only one device remains, or the previously attached NVMe is detached and immediately re-attached but all subsequent IO fails.

It seems very wrong to me that both ports see PCI_EXP_SLTSTA_DLLSC. The problem can be observed with every kernel tested so far. Could this be a firmware issue? What further debug methods do you suggest?

Thanks,
//richard
Created attachment 301843 [details] lspci and lspci -vvv output
There's an errata sheet for the PEX 8747 linked from:
https://www.broadcom.com/products/pcie-switches-bridges/pcie-switches/pex8747
Unfortunately the download fails with "not found". :(
Created attachment 301845 [details]
[PATCH] PCI: pciehp: Disable DLLSC events on PEX 8747 and Intel P5608

@Richard: Here's an experimental patch to disable DLLSC events on PEX 8747, so hotplug only relies on PDC. Would this work for you?
(In reply to Lukas Wunner from comment #3) > Created attachment 301845 [details] > [PATCH] PCI: pciehp: Disable DLLSC events on PEX 8747 and Intel P5608 > > @Richard: Here's an experimental patch to disable DLLSC events on PEX 8747, > so hotplug only relies on PDC. Would this work for you? @Lukas thanks! We'll give it a try and report back ASAP.
Created attachment 301898 [details] full dmesg while hotplugging two nvmes and disappear after access
Created attachment 301899 [details] lspci -vvv with both nvme unplugged; after cold-boot
Created attachment 301900 [details] lspci -vvv with one nvme hotplugged
Created attachment 301901 [details] lspci -vvv with both nvme hotplugged
Created attachment 301902 [details] lspci -vvv with both nvme hotplugged; after failed access
attachment 301898 [details]
attachment 301899 [details]
attachment 301900 [details]
attachment 301901 [details]
attachment 301902 [details]

Sorry for the delay. We applied the experimental patch to disable DLLSC events on the PEX 8747 and did some further testing, with the following results.

The second NVMe no longer disappears on hotplug, so the patch seems to do its job. However, when accessing the disk (e.g. with `fdisk -l`), the error message "nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10" appears, followed by error -19 (ENODEV, as the NVMe can no longer be found), and the disk disappears from the storage subsystem. It is still visible in the NVMe / PCI subsystems. The first disk remains visible in both.

We also checked power-state-related sysfs files during the test scenario: the power state of all PCI ports on the PCI bridge is always "D0", except after accessing the disk as described above, when it switches to "unknown" for the second hotplugged port.

Additionally, we compared the `lspci -vvv` outputs using diffs. The link, device and port states apparently change multiple times, and the `nvme` driver is unbound from the second disk after the failed access. See the attached files.

Do you have any hints on how we could further investigate and debug this?

Thanks,
Aaron
A "diff -u lspci-02_one_hotplugged.txt lspci-03_both_hotplugged.txt" reveals that the Memory Space and Bus Master bits in the PCI Command register (lspci's "Control" line) of the NVMe that was plugged in first are cleared once the second NVMe is plugged in:

@@ -1773,13 +1773,12 @@
 10:00.0 Non-Volatile memory controller: Device 27d1:5216 (rev 01) (prog-if 02 [NVM Express])
 	Subsystem: Device 27d1:5216
 	Physical Slot: 0-1
-	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
+	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

Naturally, that will break access to the NVMe that was plugged in first. No wonder the nvme driver complains.

As to *why* those bits are cleared, I'm stumped. I'm not seeing anything in the dmesg output that would indicate where and why the bits are cleared.

What you could try is amend pci_write_config_word() with something like:

	if (where == PCI_COMMAND && dev->bus->number == 0x10) {
		pci_info(dev, "%s: PCI_COMMAND = %#hx\n", __func__, val);
		dump_stack();
	}

That should help identify the place where those bits are cleared. *If* those bits are cleared by the kernel, that is. It could very well be that firmware fiddles with the bits behind the kernel's back. If the kernel is innocent, the next step would be to attach an acpidump here so that we can check whether firmware is the culprit.

Another theory is that the PLX switch does something to the device which causes it to clear those bits. In that case, the only option might be to ask PLX support (actually Broadcom support) for help.

But I'd suggest amending pci_write_config_word() first as shown above.