Bug 214219 - The host OS cannot boot successfully when enabling VMD in BIOS setup (ice lake processor)
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI
Hardware: x86-64 Linux
Importance: P1 high
Assignee: drivers_pci@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-08-30 07:13 UTC by Adrian Huang
Modified: 2022-06-30 06:38 UTC (History)

See Also:
Kernel Version: 5.14-rc7
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
Serial Log (102.63 KB, text/plain)
2021-08-30 07:13 UTC, Adrian Huang

Description Adrian Huang 2021-08-30 07:13:45 UTC
Created attachment 298517
Serial Log

When enabling VMD in BIOS setup, the host OS cannot boot successfully with the following error message:

[   13.577850] DMAR: DRHD: handling fault status reg 2
[   13.577852] DMAR: [INTR-REMAP] Request device [0x00:0x00.5] fault index 0xa00 [fault reason 0x25] Blocked a compatibility format interrupt request
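
For triage, the fields of this fault line can be pulled out programmatically. The following is a minimal user-space sketch, not kernel code; the struct and function names are hypothetical, and the format string is inferred only from the line quoted above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the fields of one INTR-REMAP fault line. */
struct intr_remap_fault {
    unsigned int bus, dev, fn;  /* requester ID of the faulting device */
    unsigned int index;         /* "fault index" value from the log line */
    unsigned int reason;        /* "fault reason" code from the log line */
};

/* Returns 1 if the line matches the format quoted above, 0 otherwise. */
int parse_intr_remap_fault(const char *line, struct intr_remap_fault *f)
{
    /* Skip the timestamp and "DMAR: [INTR-REMAP]" prefix. */
    const char *p = strstr(line, "Request device [");

    if (!p)
        return 0;

    /* %x accepts an optional "0x" prefix, so "0x00" and "5" both parse. */
    return sscanf(p,
                  "Request device [%x:%x.%x] fault index %x [fault reason %x]",
                  &f->bus, &f->dev, &f->fn, &f->index, &f->reason) == 5;
}
```

Applied to the second log line above, this yields requester 00:00.5 (the VMD endpoint shown by lspci below), index 0xa00 and reason 0x25.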

*** Hardware Info ***
CPU: Ice Lake
VMD: 8086:28c0
    # lspci -s 0000:00.5 -nn
    0000:00:00.5 RAID bus controller [0104]: Intel Corporation Volume Management Device NVMe RAID Controller [8086:28c0] (rev 04)

*** Detail Info ***
`git bisect` points to the following offending commit (ee81ee84f873):

commit ee81ee84f8739e584c9ccf113ba3c796187b7080
Author: Jon Derrick <jonathan.derrick@intel.com>
Date:   Wed Feb 10 09:13:15 2021 -0700

    PCI: vmd: Disable MSI-X remapping when possible

    VMD will retransmit child device MSI-X using its own MSI-X table and
    requester-id. This limits the number of MSI-X available to the whole
    child device domain to the number of VMD MSI-X interrupts.

    Some VMD devices have a mode where this remapping can be disabled,
    allowing child device interrupts to bypass processing with the VMD MSI-X
    domain interrupt handler and going straight the child device interrupt
    handler, allowing for better performance and scaling. The requester-id
    still gets changed to the VMD endpoint's requester-id, and the interrupt
    remapping handlers have been updated to properly set IRTE for child
    device interrupts to the VMD endpoint's context.

    Some VMD platforms have existing production BIOS which rely on MSI-X
    remapping and won't explicitly program the MSI-X remapping bit. This
    re-enables MSI-X remapping on unload.

    Link: https://lore.kernel.org/r/20210210161315.316097-3-jonathan.derrick@intel.com
    Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
    Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
    Reviewed-by: Krzysztof Wilczyński <kw@linux.com>
    Acked-by: Joerg Roedel <jroedel@suse.de>


*** Debugging Info ***
1. Reverting ee81ee84f873 on top of 5.14-rc7 fixes the issue.

2. The issue occurs when VMD MSI remapping is disabled by invoking "vmd_set_msi_remapping(vmd, false)". The IOMMU hardware then blocks the compatibility format interrupt request because both Interrupt Remapping Enable Status (IRES) and Extended Interrupt Mode Enable (EIME) are set. Please refer to section "5.1.4 Interrupt-Remapping Hardware Operation" in the Intel VT-d spec. The following patch can fix the issue:

  diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
  index e3fcdfec58b3..dd0c57e5e9a5 100644
  --- a/drivers/pci/controller/vmd.c
  +++ b/drivers/pci/controller/vmd.c
  @@ -863,8 +863,7 @@ static const struct pci_device_id vmd_ids[] = {
                  .driver_data = VMD_FEAT_HAS_MEMBAR_SHADOW_VSCAP,},
          {PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_VMD_28C0),
                  .driver_data = VMD_FEAT_HAS_MEMBAR_SHADOW |
  -                               VMD_FEAT_HAS_BUS_RESTRICTIONS |
  -                               VMD_FEAT_CAN_BYPASS_MSI_REMAP,},
  +                               VMD_FEAT_HAS_BUS_RESTRICTIONS,},
          {PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x467f),
                  .driver_data = VMD_FEAT_HAS_MEMBAR_SHADOW_VSCAP |
                                  VMD_FEAT_HAS_BUS_RESTRICTIONS |
  
  I know VMD_FEAT_CAN_BYPASS_MSI_REMAP is a new feature in Ice Lake's VMD controller, so option 3 might be the better solution.
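
The blocking rule referenced in item 2 can be modelled in a few lines. This is a rough user-space sketch based on my reading of the VT-d spec, not kernel code: the struct and function names are hypothetical, and the CFIS (Compatibility Format Interrupt Status) term is an addition from the spec that is not mentioned in this report.

```c
#include <stdbool.h>
#include <stdint.h>

/* Per the VT-d spec, bit 4 of the MSI address selects the interrupt
 * format: 0 = compatibility format, 1 = remappable format. */
#define MSI_ADDR_REMAPPABLE_FORMAT (1u << 4)

/* Hypothetical snapshot of the relevant IOMMU state bits. */
struct ir_state {
    bool ires;  /* Interrupt Remapping Enable Status */
    bool eime;  /* Extended Interrupt Mode Enable */
    bool cfis;  /* Compatibility Format Interrupt Status (assumed from spec) */
};

bool msi_is_compat_format(uint32_t msi_addr_lo)
{
    return !(msi_addr_lo & MSI_ADDR_REMAPPABLE_FORMAT);
}

/* True when the hardware would block the request, i.e. the case that is
 * reported as fault reason 0x25 in the log. */
bool compat_irq_blocked(const struct ir_state *s, uint32_t msi_addr_lo)
{
    if (!msi_is_compat_format(msi_addr_lo))
        return false;   /* remappable format: handled via the remap table */
    if (!s->ires)
        return false;   /* interrupt remapping is off: nothing is blocked */
    return s->eime || !s->cfis;
}
```

With IRES and EIME both set, as on the failing system, any compatibility format request is blocked regardless of CFIS.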
 
3. The following patch checks whether the IOMMU has interrupt remapping enabled. If so, VMD keeps MSI-X remapping enabled irrespective of VMD_FEAT_CAN_BYPASS_MSI_REMAP.

  # git diff
  diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
  index e3fcdfec58b3..db72932d049f 100644
  --- a/drivers/pci/controller/vmd.c
  +++ b/drivers/pci/controller/vmd.c
  @@ -6,6 +6,7 @@

   #include <linux/device.h>
   #include <linux/interrupt.h>
  +#include <linux/iommu.h>
   #include <linux/irq.h>
   #include <linux/kernel.h>
   #include <linux/module.h>
  @@ -710,7 +711,8 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
           * acceptable because the guest is usually CPU-limited and MSI
           * remapping doesn't become a performance bottleneck.
           */
  -       if (!(features & VMD_FEAT_CAN_BYPASS_MSI_REMAP) ||
  +       if (iommu_capable(vmd->dev->dev.bus, IOMMU_CAP_INTR_REMAP) ||
  +           !(features & VMD_FEAT_CAN_BYPASS_MSI_REMAP) ||
              offset[0] || offset[1]) {
                  ret = vmd_alloc_irqs(vmd);
                  if (ret)

  The test passes with/without the "intremap=off" kernel parameter.
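
  The condition changed by the option 3 patch can be modelled as a pure function. This is a user-space sketch for illustration only: the function name is hypothetical, the flag value is assumed rather than taken from the driver, and the boolean parameter stands in for the result of iommu_capable(..., IOMMU_CAP_INTR_REMAP).

```c
#include <stdbool.h>

/* Flag value assumed for illustration; the real definition is in vmd.c. */
#define VMD_FEAT_CAN_BYPASS_MSI_REMAP (1ul << 4)

/* Returns true when the patched condition takes the vmd_alloc_irqs()
 * branch, i.e. VMD keeps MSI-X remapping enabled. */
bool vmd_keeps_msi_remapping(bool iommu_intr_remap, unsigned long features,
                             unsigned long long offset0,
                             unsigned long long offset1)
{
    return iommu_intr_remap ||
           !(features & VMD_FEAT_CAN_BYPASS_MSI_REMAP) ||
           offset0 || offset1;
}
```

  In this model the bypass path is taken only when all of the following hold: the IOMMU does not remap interrupts, the device advertises the bypass feature, and both offsets are zero.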

  Jon, I can submit this patch for upstream review if you think it is valid. If you have another solution, please let me know and I can test your patch.
Comment 1 Jon Derrick 2021-08-30 15:13:20 UTC
Hi Adrian,

This will cause performance issues, but I have no immediate solution. I believe the Compatibility Format fault hints that the subdevice could be programmed correctly to deal with IOMMU interrupt remapping as intended. It shouldn't be a compatibility format as far as I know. I'm CCing Nirmal on this thread as he will be maintaining VMD soon.
Comment 2 Jon Derrick 2021-08-30 15:13:44 UTC
Please submit for upstream review. Better to boot than not at all.
Comment 3 KobaKo 2022-05-17 09:03:03 UTC
I see this on a 12th-generation platform that is also equipped with this VMD:
RAID bus controller [0104]: Intel Corporation Volume
  Management Device NVMe RAID Controller [8086:28c0] (rev 04)

After switching to RAID, I got the same DMAR errors:
[   13.577850] DMAR: DRHD: handling fault status reg 2
[   13.577852] DMAR: [INTR-REMAP] Request device [0x00:0x00.5] fault index 0xa00 

After applying this commit, the issue was gone:
6e707d0fc46d ("PCI: vmd: Do not disable MSI-X remapping if interrupt remapping is enabled by IOMMU")

However, other devices then hit different DMAR errors:
[ 843.730480] DMAR: DRHD: handling fault status reg 3
[ 843.730488] DMAR: [DMA Read NO_PASID] Request device [01:00.0] fault addr 0x6e000000 [fault reason 0x0c] non-zero reserved fields in PTE

NV/AMD cards conflict once intel_iommu is turned on:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1965882
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1971146
Comment 5 Adrian Huang 2022-05-24 08:09:12 UTC
Hi Koba,

Looking at the dmesg 'https://launchpadlibrarian.net/592239740/CurrentDmesg.txt' (captured without the VMD device enabled: I don't see the VMD device in dmesg), the following DMAR fault messages might occur when the adapter FW tries to access a reserved memory region that is *NOT* configured in the IOMMU page table (I saw that the IOMMU type is 'translated' instead of 'passthrough').

-------------------------------------------------------------------------------
[   17.152648] DMAR: DRHD: handling fault status reg 3
[   17.152653] DMAR: [DMA Read NO_PASID] Request device [01:00.0] fault addr 0xff001000 [fault reason 0x06] PTE Read access is not set
[   17.152657] DMAR: DRHD: handling fault status reg 3
[   17.152658] DMAR: [DMA Read NO_PASID] Request device [01:00.0] fault addr 0xff00f000 [fault reason 0x06] PTE Read access is not set
[   17.152661] DMAR: DRHD: handling fault status reg 3
[   17.152662] DMAR: [DMA Read NO_PASID] Request device [01:00.0] fault addr 0xff01a000 [fault reason 0x06] PTE Read access is not set
[   17.152664] DMAR: DRHD: handling fault status reg 3
-------------------------------------------------------------------------------

Workaround: I'm guessing the issue might go away if the kernel parameter 'iommu=pt' is appended. Could you please give it a try?

Possible solution (my guess): Some adapter FW requires the UEFI/BIOS to define the corresponding IOMMU RMRR (Reserved Memory Region Reporting Structure) in the ACPI table. This is the so-called vendor's private interface, and it is the general method to make this work when the IOMMU type is 'translated'. Here is a quote from page 45 of "Broadcom 12Gb/s MegaRAID® Tri-Mode Software User Guide" (https://docs.broadcom.com/doc/MR-TM-SW-UG):

-------------------------------------------------------------------------------
System BIOS should support Broadcom’s private interface and add host memory address into the DMAR (DMA remapping)/RMRR (Reserved memory Region Reporting Structure) table for iMegaRAID to work seamlessly in VTd/IOMMU (Intel Virtualization Technology for Directed I/O/input–output memory management unit) enabled system/operating system
-------------------------------------------------------------------------------
Comment 7 Adrian Huang 2022-06-30 06:38:48 UTC
(In reply to Nirmal Patel from comment #6)
> The fix was added recently to upstream kernel.
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/pci/controller/vmd.c?h=v5.19-rc4&id=886e67100b904cb1b106ed1dfa8a60696aff519a
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/pci/controller/vmd.c?h=v5.19-rc4&id=c94f732e8001a860b42aa740b0a178a29907463c

Yes, I have verified the patch set locally. It works. Thanks, Nirmal.
