Bug 217472 - ACPI _OSC features have different values in Host OS and Guest OS
Summary: ACPI _OSC features have different values in Host OS and Guest OS
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI
Hardware: Intel
OS: Linux
Importance: P3 high
Assignee: drivers_pci@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-05-22 16:32 UTC by Nirmal Patel
Modified: 2023-06-12 21:02 UTC
CC List: 2 users

See Also:
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Rhel9.1_Guest_dmesg (61.15 KB, text/plain)
2023-05-22 16:32 UTC, Nirmal Patel
Details
Hypervisor dmesg (206.34 KB, text/plain)
2023-06-12 20:59 UTC, Nirmal Patel
Details
VM dmesg (82.34 KB, text/plain)
2023-06-12 21:00 UTC, Nirmal Patel
Details
VM lspci (136.34 KB, text/plain)
2023-06-12 21:00 UTC, Nirmal Patel
Details
Hypervisor lspci (380.71 KB, text/plain)
2023-06-12 21:00 UTC, Nirmal Patel
Details

Description Nirmal Patel 2023-05-22 16:32:03 UTC
Created attachment 304301 [details]
Rhel9.1_Guest_dmesg

Issue:
NVMe drives are still present in the guest OS after a hot removal. We have tested with different combinations of OSes, drives, and hypervisors; the issue is present across all of them.

The following patch was added to honor the ACPI _OSC values set by the BIOS, and it is what exposed the issue in the VM/guest OS.

https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/drivers/pci/controller/vmd.c?id=04b12ef163d10e348db664900ae7f611b83c7a0e
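For reference, the core of that change is a small helper in drivers/pci/controller/vmd.c that copies the _OSC-derived feature flags from the root host bridge down to the host bridge that vmd creates. The sketch below is paraphrased from memory and from the flag names listed further down; the helper name and exact field list may differ slightly from the commit itself.

#include <linux/pci.h>

/* Paraphrased sketch of commit 04b12ef163d1: make the vmd-created host
 * bridge honor whatever ACPI _OSC granted (or withheld) on the root
 * bridge it sits under.
 */
static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge,
                                       struct pci_host_bridge *vmd_bridge)
{
        vmd_bridge->native_pcie_hotplug = root_bridge->native_pcie_hotplug;
        vmd_bridge->native_shpc_hotplug = root_bridge->native_shpc_hotplug;
        vmd_bridge->native_aer = root_bridge->native_aer;
        vmd_bridge->native_pme = root_bridge->native_pme;
        vmd_bridge->native_ltr = root_bridge->native_ltr;
        vmd_bridge->native_dpc = root_bridge->native_dpc;
}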


I also compared, in the host and guest OS, the values of the parameters touched by the patch. The parameters with different values in the host and guest OS are:

native_pcie_hotplug
native_shpc_hotplug
native_aer
native_ltr

For example:
the value of native_pcie_hotplug in the host OS is 1;
the value of native_pcie_hotplug in the guest OS is 0.

I am not sure why "native_pcie_hotplug" is changed to 0 in the guest.
Isn't it an _OSC-managed parameter? If that is the case, it should
have the same value in the host and guest OS.
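For context on how this flag ends up different in the guest: native_pcie_hotplug starts out as 1 and is cleared by the ACPI PCI root code when the firmware's _OSC does not grant the OS native hotplug control. A rough, paraphrased sketch of that logic from drivers/acpi/pci_root.c (exact code and macro names may vary by kernel version):

        /* In acpi_pci_root_create(): clear the host bridge's native_* bits
         * for any feature the firmware did not hand over via _OSC.
         */
        if (!(root->osc_control_set & OSC_PCI_EXPRESS_NATIVE_HP_CONTROL))
                host_bridge->native_pcie_hotplug = 0;
        if (!(root->osc_control_set & OSC_PCI_SHPC_NATIVE_HP_CONTROL))
                host_bridge->native_shpc_hotplug = 0;
        if (!(root->osc_control_set & OSC_PCI_EXPRESS_AER_CONTROL))
                host_bridge->native_aer = 0;
        if (!(root->osc_control_set & OSC_PCI_EXPRESS_LTR_CONTROL))
                host_bridge->native_ltr = 0;

So if the virtual firmware in the guest (e.g. OVMF/SeaBIOS) does not grant those controls in its _OSC, the guest's host bridge flags come out as 0, which would explain the difference from the host.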
Comment 1 Nirmal Patel 2023-05-23 23:18:05 UTC
Thanks Bjorn and Alex for the quick response.
I agree with the analysis that the guest BIOS is not giving the guest OS control of PCIe native hotplug.

Adding some background about the above patch.
The above patch was added to suppress AER messages when Samsung drives were connected with VMD enabled. I believe AER is enabled by the BIOS (i.e. the pre-OS VMD driver), not by the VMD Linux driver, so the AER flooding would be seen even in a non-Linux environment.

So with the guest BIOS providing different values than the host BIOS, adding this patch to the VMD Linux driver leaves direct-assign functionality broken across all hypervisor and guest OS combinations. As a result, hotplug will not work, which is a major issue.

Before this patch, VMD used pciehp.

What should the ideal behavior be here, i.e. should vmd rely on the native_pcie_hotplug setting or on the BIOS settings?

I am open to a better suggestion, but I can think of two options.

Option 1: Revert the patch (commit 04b12ef163d1) and suggest that the AER fix be added to the BIOS or pre-OS VMD driver.

Option 2: Have VMD enumerate all the devices and set native_pcie_hotplug for all the hotplug-capable devices.
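Purely as an illustration of what Option 2 could look like (the helper below is hypothetical and is not existing vmd code): vmd would walk the bus it owns and, if any port under it advertises a hotplug-capable slot, keep native_pcie_hotplug set on the vmd host bridge regardless of what the guest _OSC reported.

/* Hypothetical sketch only; vmd_bus_has_hotplug_slots() does not exist
 * in the current driver and is shown just to illustrate the idea.
 */
static bool vmd_bus_has_hotplug_slots(struct pci_bus *bus)
{
        struct pci_dev *dev;
        u32 sltcap;

        list_for_each_entry(dev, &bus->devices, bus_list) {
                if (pci_is_pcie(dev)) {
                        pcie_capability_read_dword(dev, PCI_EXP_SLTCAP, &sltcap);
                        if (sltcap & PCI_EXP_SLTCAP_HPC)
                                return true;
                }
                if (dev->subordinate &&
                    vmd_bus_has_hotplug_slots(dev->subordinate))
                        return true;
        }
        return false;
}

/* ...and then, when the vmd host bridge is set up: */
if (vmd_bus_has_hotplug_slots(vmd->bus))
        vmd_bridge->native_pcie_hotplug = 1;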

Thanks
nirmal
Comment 2 Bagas Sanjaya 2023-05-28 12:49:10 UTC
(In reply to Nirmal Patel from comment #1)
> Thanks Bjorn and Alex for the quick response.
> I agree with the analysis that the guest BIOS is not giving the guest OS
> control of PCIe native hotplug.
> 
> Adding some background about the above patch.
> The above patch was added to suppress AER messages when Samsung drives
> were connected with VMD enabled. I believe AER is enabled by the BIOS
> (i.e. the pre-OS VMD driver), not by the VMD Linux driver, so the AER
> flooding would be seen even in a non-Linux environment.
> 
> So with the guest BIOS providing different values than the host BIOS,
> adding this patch to the VMD Linux driver leaves direct-assign
> functionality broken across all hypervisor and guest OS combinations.
> As a result, hotplug will not work, which is a major issue.
> 
> Before this patch, VMD used pciehp.
> 
> What should the ideal behavior be here, i.e. should vmd rely on the
> native_pcie_hotplug setting or on the BIOS settings?
> 
> I am open to a better suggestion, but I can think of two options.
> 
> Option 1: Revert the patch (commit 04b12ef163d1) and suggest that the
> AER fix be added to the BIOS or pre-OS VMD driver.
> 
> Option 2: Have VMD enumerate all the devices and set
> native_pcie_hotplug for all the hotplug-capable devices.
> 

I don't have the context you're referring to. Where is the lore.kernel.org discussion?
Comment 3 Bjorn Helgaas 2023-06-06 23:17:34 UTC
Bagas, it starts here: https://lore.kernel.org/r/ZGz2FQpHPKYgcc0+@bhelgaas
Comment 4 Nirmal Patel 2023-06-12 20:59:37 UTC
Created attachment 304412 [details]
Hypervisor dmesg
Comment 5 Nirmal Patel 2023-06-12 21:00:00 UTC
Created attachment 304413 [details]
VM dmesg
Comment 6 Nirmal Patel 2023-06-12 21:00:21 UTC
Created attachment 304414 [details]
VM lspci
Comment 7 Nirmal Patel 2023-06-12 21:00:44 UTC
Created attachment 304415 [details]
Hypervisor lspci
Comment 8 Nirmal Patel 2023-06-12 21:02:13 UTC
VMD uses a pass-through mechanism to assign drives directly without much
host OS intervention.
I am physically removing a drive from its slot and checking lspci and lsblk
to make sure the hot-removed drive has disappeared from the list.

Reproduction steps:
- Enable VMD and hotplug settings in the BIOS
- Install/boot RHEL, e.g. RHEL 9.1
- Mount an ISO with the package repository and install the required packages, e.g. libvirt, libvirt-python, bridge-utils, virt-manager, libvirt-daemon-config-network, libguestfs-tools, virt-install, qemu-kvm, virt-viewer
- Start the virtualization service, i.e. libvirtd
- Create a virtual machine and add the desired settings (CPU, memory, etc.)
- Assign the PCI device to the VM:
    - Open virt-manager.
    - Select Edit → Virtual Machine Details
    - Click Add Hardware and select PCI Host Device
    - Find the VMD device(s) in the list, e.g. 0000:97:00.5 Intel Corporation Volume Management Device NVMe RAID Controller.

- Boot into the guest OS.
- Verify in the lsblk output that the assigned drives are visible: # lsblk -o +SERIAL
- Hot remove the drives

Expected result:
Hot-removed drives disappear from the guest OS.

Actual result:
Hot-removed drives are still present in the guest OS (lsblk, lspci).

Note:
Hot-removed drives are gone after a guest VM reboot. Drives do not appear in the guest OS after hot plug.

Hot removal works correctly on the host OS: drives disappear from lsblk and lspci.
