Bug 217251
Summary: | pciehp: nvme not visible after re-insert to tbt port | ||
---|---|---|---|
Product: | Drivers | Reporter: | Aleksander Trofimowicz (alex) |
Component: | PCI | Assignee: | drivers_pci (drivers_pci) |
Status: | NEW --- | ||
Severity: | normal | CC: | mika.westerberg |
Priority: | P1 | ||
Hardware: | Intel | ||
OS: | Linux | ||
Kernel Version: | 6.2 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
the tracing of nvme_pci_enable() during re-insertion
kernel buffer after the second insertion
pci buses overview
pci device status after the second insertion
04:01.0 PCI bridge status after the second insertion
the tracing of nvme_pci_enable() during the first insertion
the tracing of nvme_remove() during unplugging of the peripheral
1st dmesg
1st lspci
2nd dmesg
2nd lspci |
Created attachment 304032 [details]
kernel buffer after the second insertion
Created attachment 304033 [details]
pci buses overview
Created attachment 304034 [details]
pci device status after the second insertion
Created attachment 304035 [details]
04:01.0 PCI bridge status after the second insertion
Created attachment 304036 [details]
the tracing of nvme_pci_enable() during the first insertion
Created attachment 304037 [details]
the tracing of nvme_remove() during unplugging of the peripheral
Can you attach the full dmesg and the output of 'sudo lspci -vv' after both insertions?
Created attachment 304053 [details]
1st dmesg
Created attachment 304054 [details]
1st lspci
Created attachment 304055 [details]
2nd dmesg
Created attachment 304056 [details]
2nd lspci
Thanks for the logs! Indeed, the PCIe downstream port 04:01.0 seems to enter D3 (runtime suspend) even though the connected endpoint (nvme 05:00.0) is in D0. That's unexpected. Can you check whether passing "pcie_port_pm=off" works around it?

(In reply to comment #12 from Mika Westerberg, https://bugzilla.kernel.org/show_bug.cgi?id=217251)
I did, and the results were the same. I also decided to widen the problem space: I added 4 other distinct NVMe devices and another mobile platform, TGL. All combinations worked flawlessly except those that included the 970 Pro. In the end I could not confirm the claim of one of your colleagues that something has been botched since the introduction of ADL. As far as I am concerned, we could throw in the towel. Nonetheless, if you think the kernel might be at fault, I am willing to devote my time to nailing it down.

Okay, thanks for checking anyway. Yeah, it could be a device issue, but I'm not an NVMe expert (I'm mostly looking at this because TBT is involved).
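For anyone repeating the "pcie_port_pm=off" test, it may help to confirm that the parameter actually reached the running kernel before drawing conclusions. The following is only a minimal sketch: the helper name kernel_param_values is made up for illustration, and the naive whitespace split does not handle quoted values.

```python
from pathlib import Path


def kernel_param_values(cmdline: str, name: str) -> list:
    """Return every value given as name=value on a kernel command line.

    Hypothetical helper for illustration; tokens are split on whitespace,
    so quoted values containing spaces are not handled.
    """
    values = []
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name and sep:
            values.append(val)
    return values


if __name__ == "__main__":
    try:
        # /proc/cmdline holds the command line the running kernel booted with.
        cmdline = Path("/proc/cmdline").read_text()
    except OSError:
        cmdline = ""
    print("pcie_port_pm:", kernel_param_values(cmdline, "pcie_port_pm"))
```

An empty list means the parameter was not passed at boot, so the test would have to be repeated after fixing the bootloader entry.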
Created attachment 304031 [details]
the tracing of nvme_pci_enable() during re-insertion

Hi,

There is a JHL7540-based device that may host an NVMe device. After the first insertion, the NVMe drive is properly discovered and handled by the relevant modules. Once it has been disconnected, any further insertion attempts fail. The device is visible on the PCI bus, but nvme_pci_enable() ends up calling pci_disable_device() every time; the runtime PM status of the device is "suspended", and the power state of the 04:01.0 PCI bridge is D3.

Preventing the device from being power managed (writing "on" to /sys/devices/../power/control), combined with device removal and a PCI rescan, changes nothing. A host reboot restores the initial state.

I would appreciate any suggestions on how to debug it further.
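The checks described above (runtime PM status, power state, forcing "on" via power/control) could be scripted roughly as follows. This is a sketch under assumptions, not the procedure actually used in this report: the PCI domain prefix 0000 is assumed, the function names are made up for illustration, and the power_state attribute is only present on reasonably recent kernels, so a missing attribute is reported as "n/a".

```python
from pathlib import Path

# Standard sysfs directory that lists PCI devices by domain:bus:dev.fn.
SYSFS_PCI = Path("/sys/bus/pci/devices")


def read_attr(dev_dir: Path, name: str) -> str:
    """Return a sysfs attribute as stripped text, or 'n/a' if unreadable."""
    try:
        return (dev_dir / name).read_text().strip()
    except OSError:
        return "n/a"


def pm_snapshot(bdf: str, root: Path = SYSFS_PCI) -> dict:
    """Collect the power-management attributes discussed in this report."""
    dev = root / bdf
    return {
        "device": bdf,
        "control": read_attr(dev, "power/control"),
        "runtime_status": read_attr(dev, "power/runtime_status"),
        "power_state": read_attr(dev, "power_state"),
    }


def force_on(bdf: str, root: Path = SYSFS_PCI) -> None:
    """Write 'on' to power/control to disable runtime PM (needs root)."""
    (root / bdf / "power" / "control").write_text("on")


if __name__ == "__main__":
    # Bridge and endpoint addresses from this report; domain 0000 is assumed.
    for bdf in ("0000:04:01.0", "0000:05:00.0"):
        print(pm_snapshot(bdf))
```

Taking a snapshot after the first insertion and again after the re-insertion makes the difference in bridge state easy to compare side by side.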