Bug 214219
Summary: | The host OS cannot boot successfully when enabling VMD in BIOS setup (ice lake processor) | ||
---|---|---|---|
Product: | Drivers | Reporter: | Adrian Huang (ahuang12) |
Component: | PCI | Assignee: | drivers_pci (drivers_pci) |
Status: | RESOLVED CODE_FIX | ||
Severity: | high | CC: | ahuang12, jonathan.derrick, koba.ko, nirmal.patel |
Priority: | P1 | ||
Hardware: | x86-64 | ||
OS: | Linux | ||
Kernel Version: | 5.14-rc7 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: | Serial Log |
Description
Adrian Huang
2021-08-30 07:13:45 UTC
Hi Adrian,

This will cause performance issues, but I have no immediate solution. I believe that the Compatibility Format points to a hint that the subdevice could be programmed correctly to deal with IOMMU interrupt remapping as intended. It shouldn't be a compatibility format as far as I know.

I'm CCing Nirmal on this thread as he will be maintaining VMD soon.

Please submit for upstream review. Better to boot than not at all.

I am on a 12th-gen platform that is also equipped with this VMD:

RAID bus controller [0104]: Intel Corporation Volume Management Device NVMe RAID Controller [8086:28c0] (rev 04)

After changing to RAID, I got the same DMAR errors:

[ 13.577850] DMAR: DRHD: handling fault status reg 2
[ 13.577852] DMAR: [INTR-REMAP] Request device [0x00:0x00.5] fault index 0xa00

After applying this commit, the issue was gone:

6e707d0fc46d ("PCI: vmd: Do not disable MSI-X remapping if interrupt remapping is enabled by IOMMU")

However, other devices then hit different DMAR errors:

[ 843.730480] DMAR: DRHD: handling fault status reg 3
[ 843.730488] DMAR: [DMA Read NO_PASID] Request device [01:00.0] fault addr 0x6e000000 [fault reason 0x0c] non-zero reserved fields in PTE

NV/AMD cards conflict once intel_iommu is on.

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1965882
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1971146

Hi Koba,

From the dmesg 'https://launchpadlibrarian.net/592239740/CurrentDmesg.txt' (without the VMD device enabled: I don't see the VMD device in dmesg), the following error messages (DMAR faults) might occur if the adapter FW tries to access the reserved memory region while that region is *NOT* configured in the IOMMU page table (I saw that the IOMMU type is 'translated' instead of 'passthrough').

-------------------------------------------------------------------------------
[ 17.152648] DMAR: DRHD: handling fault status reg 3
[ 17.152653] DMAR: [DMA Read NO_PASID] Request device [01:00.0] fault addr 0xff001000 [fault reason 0x06] PTE Read access is not set
[ 17.152657] DMAR: DRHD: handling fault status reg 3
[ 17.152658] DMAR: [DMA Read NO_PASID] Request device [01:00.0] fault addr 0xff00f000 [fault reason 0x06] PTE Read access is not set
[ 17.152661] DMAR: DRHD: handling fault status reg 3
[ 17.152662] DMAR: [DMA Read NO_PASID] Request device [01:00.0] fault addr 0xff01a000 [fault reason 0x06] PTE Read access is not set
[ 17.152664] DMAR: DRHD: handling fault status reg 3
-------------------------------------------------------------------------------

Workaround: I'm guessing the issue might go away if the kernel parameter 'iommu=pt' is appended. Could you please give it a try?

Possible solution (my guess): Some adapter FW requires that UEFI/BIOS define the corresponding IOMMU RMRR (Reserved Memory Region Reporting Structure) in the ACPI table. This is the so-called vendor's private interface, and it is the general method to make this work when the IOMMU type is 'translated'.
Here is a quote from page #45 of "Broadcom 12Gb/s MegaRAID® Tri-Mode Software User Guide" (https://docs.broadcom.com/doc/MR-TM-SW-UG):

-------------------------------------------------------------------------------
System BIOS should support Broadcom’s private interface and add host memory address into the DMAR (DMA remapping)/RMRR (Reserved memory Region Reporting Structure) table for iMegaRAID to work seamlessly in VTd/IOMMU (Intel Virtualization Technology for Directed I/O/input–output memory management unit) enabled system/ operating system
-------------------------------------------------------------------------------

The fix was added recently to upstream kernel.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/pci/controller/vmd.c?h=v5.19-rc4&id=886e67100b904cb1b106ed1dfa8a60696aff519a

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/pci/controller/vmd.c?h=v5.19-rc4&id=c94f732e8001a860b42aa740b0a178a29907463c

(In reply to Nirmal Patel from comment #6)
> The fix was added recently to upstream kernel.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/pci/controller/vmd.c?h=v5.19-rc4&id=886e67100b904cb1b106ed1dfa8a60696aff519a
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/pci/controller/vmd.c?h=v5.19-rc4&id=c94f732e8001a860b42aa740b0a178a29907463c

Yes, I have verified the patch set locally. It works. Thanks, Nirmal.
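
For anyone triaging similar reports, here is a minimal Python sketch for pulling apart DMAR fault lines like the ones quoted in this thread and for checking whether a kernel command line already carries the 'iommu=pt' workaround. The names `DmarFault`, `parse_dmar_fault` and `has_iommu_passthrough` are made up for illustration and are not part of any kernel tool. The regular expression assumes only the two message shapes seen in the logs above ("fault index ..." and "fault addr ... [fault reason ...] ..."); other kernel versions may print faults differently.

```python
import re
from typing import NamedTuple, Optional


class DmarFault(NamedTuple):
    """Fields of one DMAR fault line (hypothetical helper type)."""
    kind: str                   # e.g. "INTR-REMAP" or "DMA Read NO_PASID"
    device: str                 # requester ID exactly as printed in the log
    detail: str                 # "fault index 0x..." or "fault addr 0x..."
    reason_code: Optional[int]  # numeric fault reason, if the line has one
    reason: Optional[str]       # fault reason text, if the line has one


# Assumption: matches only the message shapes quoted in this thread.
_FAULT_RE = re.compile(
    r"DMAR: \[(?P<kind>[^\]]+)\] Request device \[(?P<device>[^\]]+)\] "
    r"(?P<detail>fault (?:index|addr) 0x[0-9a-fA-F]+)"
    r"(?: \[fault reason (?P<code>0x[0-9a-fA-F]+)\] (?P<reason>.*))?"
)


def parse_dmar_fault(line: str) -> Optional[DmarFault]:
    """Return the parsed fault, or None if the line is not a fault record."""
    m = _FAULT_RE.search(line.strip())
    if m is None:
        return None
    code = m.group("code")
    reason = m.group("reason")
    return DmarFault(
        kind=m.group("kind"),
        device=m.group("device"),
        detail=m.group("detail"),
        reason_code=int(code, 16) if code else None,
        reason=reason.strip() if reason else None,
    )


def has_iommu_passthrough(cmdline: str) -> bool:
    """True if 'iommu=pt' appears as its own token on the command line."""
    return "iommu=pt" in cmdline.split()


if __name__ == "__main__":
    # Log lines copied from the comments in this thread.
    sample = [
        "[ 13.577850] DMAR: DRHD: handling fault status reg 2",
        "[ 13.577852] DMAR: [INTR-REMAP] Request device [0x00:0x00.5] fault index 0xa00",
        "[ 843.730488] DMAR: [DMA Read NO_PASID] Request device [01:00.0] "
        "fault addr 0x6e000000 [fault reason 0x0c] non-zero reserved fields in PTE",
    ]
    for entry in sample:
        print(parse_dmar_fault(entry))
```

On a running Linux system the command line could be read from /proc/cmdline and passed to `has_iommu_passthrough`; the "handling fault status reg" lines are skipped on purpose because they carry no device or address.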