Bug 215525

Summary: HotPlug does not work on upstream kernel 5.17.0-rc1
Product: Drivers
Reporter: blazej.kucman
Component: PCI
Assignee: drivers_pci (drivers_pci)
Status: RESOLVED DOCUMENTED
Severity: normal
CC: bjorn, blazej.kucman, nirmal.patel
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 5.17.0-rc1 upstream
Subsystem:
Regression: No
Bisected commit-id:
Attachments: dmesg
dmesg when hotplugs not works, parameter pci=nommconf and dyndbg added.
dmesg when hotplugs works, without parameter pci=nommconf, dyndbg added.

Description blazej.kucman 2022-01-24 11:46:14 UTC
Created attachment 300308 [details]
dmesg

While testing the latest upstream kernel (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/), we noticed that hotplug and hot-unplug of NVMe drives stopped working as of the merge commit (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d0a231f01e5b25bacd23e6edc7c979a18a517b2b).

Rescanning the PCI bus does not help:
echo "1" > /sys/bus/pci/rescan

The issue does not reproduce on a kernel built from an earlier commit (88db8458086b1dcf20b56682504bdb34d2bca0e2).


After hot-remove, the device does not disappear. However, when we try to do I/O on the disk, an I/O error occurs and the device then disappears.

Before the I/O, dmesg contained no entries for the disk; entries like the following appeared only after the I/O:
[  177.943703] nvme nvme5: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff
[  177.971661] nvme 10000:0b:00.0: can't change power state from D3cold to D0 (config space inaccessible)
[  177.981121] pcieport 10000:00:02.0: can't derive routing for PCI INT A
[  177.987749] nvme 10000:0b:00.0: PCI INT A: no GSI
[  177.992633] nvme nvme5: Removing after probe failure status: -19
[  178.004633] nvme5n1: detected capacity change from 83984375 to 0
[  178.004677] I/O error, dev nvme5n1, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
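
Reads from a PCIe device that is no longer present typically return all ones, so the CSTS=0xffffffff and PCI_STATUS=0xffff values above are consistent with the drive already being physically gone when the nvme driver finally touched it. A minimal Python sketch for spotting such values in a log line follows; the helper name is hypothetical and not part of any kernel tool.

```python
import re

# Log line copied from the dmesg excerpt above.
LINE = ("[  177.943703] nvme nvme5: controller is down; will reset: "
        "CSTS=0xffffffff, PCI_STATUS=0xffff")


def all_ones_registers(line):
    """Return the names of NAME=0x... fields whose hex digits are all 'f'.

    Hypothetical helper for illustration only.
    """
    hits = []
    for name, digits in re.findall(r"(\w+)=0x([0-9a-fA-F]+)", line):
        if set(digits.lower()) == {"f"}:
            hits.append(name)
    return hits


print(all_ones_registers(LINE))
```

An all-ones value is only a hint, not proof of removal, so treat the result as a cue to check whether pciehp reported the event.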


OS: RHEL 8.4 GA
Platform: Intel Purley

The logs were collected on an older upstream kernel, but the issue also occurs on the newest upstream kernel (dd81e1c7d5fb126e5fbc5c9e334d7b3ec29a16a0).
Comment 1 blazej.kucman 2022-01-27 14:27:58 UTC
Created attachment 300338 [details]
dmesg when hotplugs not works, parameter pci=nommconf and dyndbg added.
Comment 2 blazej.kucman 2022-01-27 14:29:19 UTC
Created attachment 300339 [details]
dmesg when hotplugs works, without parameter pci=nommconf, dyndbg added.
Comment 3 Bjorn Helgaas 2022-02-09 20:57:55 UTC
Summary from the email thread at https://lore.kernel.org/r/20220124214635.GA1553164@bhelgaas:

Issue only occurs when booting with "pci=nommconf" and is related to 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe features") [1].

Prior to 04b12ef163d1, when booting with "pci=nommconf", pciehp hotplug did not work for devices in general because Linux requires support for extended config before it requests control of PCIe features [2, 3, 4].  However, hotplug *did* work for devices below a VMD because VMD acts like a host bridge and inherited the default "native_pcie_hotplug" setting [5], which is "OS owns pciehp".

After 04b12ef163d1, VMD inherits the real ACPI host bridge "native_pcie_hotplug" setting, which is "platform owns pciehp" since Linux didn't ask for control of it.  Therefore, pciehp hotplug doesn't work for *any* devices.
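
The ownership logic described above can be sketched as a small truth table. This is a simplified Python model, not kernel code: the function names are hypothetical, and it assumes the platform grants control of pciehp whenever Linux asks for it.

```python
def host_bridge_native_hotplug(extended_config_available):
    """True if the OS owns pciehp on the ACPI host bridge.

    Simplified model: Linux only requests control of PCIe features when
    extended config space is usable, and we ASSUME the platform grants
    the request whenever it is made.
    """
    return extended_config_available


def vmd_native_hotplug(extended_config_available, vmd_honors_osc):
    """True if the OS owns pciehp for devices below a VMD.

    vmd_honors_osc=False models the behavior before 04b12ef163d1 (VMD keeps
    the default "OS owns pciehp"); True models the behavior after it (VMD
    inherits the ACPI host bridge setting).
    """
    if vmd_honors_osc:
        return host_bridge_native_hotplug(extended_config_available)
    return True


for nommconf in (False, True):
    for honors_osc in (False, True):
        owns = vmd_native_hotplug(not nommconf, honors_osc)
        print(f"pci=nommconf={nommconf!s:5} honors_osc={honors_osc!s:5} "
              f"-> OS owns pciehp below VMD: {owns}")
```

Only the combination of "pci=nommconf" with the post-04b12ef163d1 behavior leaves pciehp below the VMD owned by the platform, which matches the reported failure.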

We can avoid the issue by omitting the "pci=nommconf" parameter.  In that case, Linux *will* request control of pciehp, and hotplug will work for all devices, including those below a VMD.
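
To check whether a system is booted with the parameter, the kernel command line can be inspected. The following is a minimal Python sketch with a hypothetical helper name; it relies on the "pci=" parameter taking a comma-separated list of options.

```python
def has_nommconf(cmdline):
    """Return True if the kernel command line contains pci=...nommconf...

    Hypothetical helper; "pci=" options are comma-separated, so
    "pci=nommconf,noaer" also counts.
    """
    for token in cmdline.split():
        if token.startswith("pci="):
            if "nommconf" in token[len("pci="):].split(","):
                return True
    return False


# Sample command lines for illustration; on a running Linux system the
# real one can be read from /proc/cmdline.
print(has_nommconf("BOOT_IMAGE=/vmlinuz root=/dev/sda1 pci=nommconf"))
print(has_nommconf("BOOT_IMAGE=/vmlinuz root=/dev/sda1 quiet"))
```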


[1] https://git.kernel.org/linus/04b12ef163d1
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/acpi/pci_root.c?id=v5.16#n41
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/acpi/pci_root.c?id=v5.16#n388
[4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/acpi/pci_root.c?id=v5.16#n450
[5] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/probe.c?id=v5.16#n581