Bug 76331
| Summary: | kernel BUG at drivers/iommu/intel-iommu.c:844! | | |
|---|---|---|---|
| Product: | Virtualization | Reporter: | Matt (mspeder) |
| Component: | kvm | Assignee: | virtualization_kvm |
| Status: | NEW --- | | |
| Severity: | normal | CC: | alex.williamson, dwmw2, mspeder, szg00000 |
| Priority: | P1 | | |
| Hardware: | x86-64 | | |
| OS: | Linux | | |
| Kernel Version: | 3.14.4 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
Description
Matt
2014-05-16 08:09:08 UTC
Can you please report the IOMMU cap registers? For VT-d, simply:

    dmesg | grep ecap

Also, it shouldn't matter, but what's the size of the specified guest memory and the version of QEMU being used?

Thanks

Hi Alex,

    # dmesg | grep ecap
    [ 0.057396] dmar: IOMMU 0: reg_base_addr fbfff000 ver 1:0 cap c9008010e60262 ecap f020fa
    [ 0.057403] dmar: IOMMU 1: reg_base_addr fbffe000 ver 1:0 cap c9078010ef0462 ecap f020fe
    [ 158.832381] vfio_ecap_init: 0000:00:1b.0 hiding ecap 0x5@0x130

Host RAM is 36G.
Guest RAM was 16G. I just tried with guest RAM at 4G, and it didn't make any difference.

Here is the relevant extract of the libvirt conf:

    <memory unit='KiB'>4194304</memory>
    <currentMemory unit='KiB'>4194304</currentMemory>
    <memtune>
      <hard_limit unit='KiB'>25165824</hard_limit>
    </memtune>

    # qemu-system-x86_64 --version
    QEMU emulator version 2.0.0, Copyright (c) 2003-2008 Fabrice Bellard

If you need anything else, don't hesitate!

Thanks,
Matt

Hi Alex,

Any news? Do you need additional info?

I keep hitting the bug each time I power off the VM (the VM works perfectly fine before the shutdown, with both sound and GPU passthrough). After that I need to reboot the host to launch this VM again, so this is quite an annoying issue.

Thanks!
Matthieu

The DRHD capability registers are reported as:

    IOMMU 0: c9008010e60262
    IOMMU 1: c9078010ef0462

    101111 100110

From the VT-d spec (v2.2), bits 12:8 of the capability register are the Supported Adjusted Guest Address Widths (SAGAW), defined as:

    This 5-bit field indicates the supported adjusted guest address widths (which in turn represents the levels of page-table walks for the 4KB base page size) supported by the hardware implementation. A value of 1 in any of these bits indicates the corresponding adjusted guest address width is supported.
    The adjusted guest address widths corresponding to various bit positions within this field are:
    • 0: Reserved
    • 1: 39-bit AGAW (3-level page-table)
    • 2: 48-bit AGAW (4-level page-table)
    • 3: Reserved
    • 4: Reserved
    Software must ensure that the adjusted guest address width used to set up the page tables is one of the supported guest address widths reported in this field.

This system therefore has one DRHD unit supporting 3-level page tables (IOMMU 0) and the other supporting 4-level page tables (IOMMU 1).

Bits 21:16 are the Maximum Guest Address Width:

    This field indicates the maximum DMA virtual addressability supported by remapping hardware. The Maximum Guest Address Width (MGAW) is computed as (N+1), where N is the value reported in this field. For example, a hardware implementation supporting 48-bit MGAW reports a value of 47 (101111b) in this field.

    If the value in this field is X, untranslated and translated DMA requests to addresses above 2^(X+1)-1 are always blocked by hardware. Device-TLB translation requests to address above 2^(X+1)-1 from allowed devices return a null Translation-Completion Data with R=W=0.

    Guest addressability for a given DMA request is limited to the minimum of the value reported through this field and the adjusted guest address width of the corresponding page-table structure. (Adjusted guest address widths supported by hardware are reported through the SAGAW field).

    Implementations must support MGAW at least equal to the physical addressability (host address width) of the platform.

On this system, IOMMU 0 therefore has an MGAW of 0x26 + 1 = 39 bits, and IOMMU 1 has 0x2f + 1 = 48 bits.

The BUG we're hitting is:

    BUG_ON(addr_width < BITS_PER_LONG && last_pfn >> addr_width);

So the last PFN of the domain is beyond the address width of the domain.
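The SAGAW and MGAW decoding above can be reproduced with a small user-space sketch. The helper names (`decode_sagaw`, `decode_mgaw_bits`, `widest_agaw_bits`) are hypothetical and not taken from the kernel; the bit positions come only from the spec text quoted in this report.

```c
#include <stdint.h>

/* Bits 12:8 of the capability register: SAGAW bitmap. */
static unsigned decode_sagaw(uint64_t cap)
{
    return (unsigned)((cap >> 8) & 0x1f);
}

/* Bits 21:16 hold N; the MGAW in bits is N + 1. */
static unsigned decode_mgaw_bits(uint64_t cap)
{
    return (unsigned)((cap >> 16) & 0x3f) + 1;
}

/*
 * Widest adjusted guest address width advertised in SAGAW.
 * Bit 1 means 39 bits (3-level), bit 2 means 48 bits (4-level).
 * Returns 0 if neither bit is set.
 */
static unsigned widest_agaw_bits(uint64_t cap)
{
    unsigned sagaw = decode_sagaw(cap);

    if (sagaw & (1u << 2))
        return 48;
    if (sagaw & (1u << 1))
        return 39;
    return 0;
}
```

Applied to the two values from dmesg, `0xc9008010e60262` decodes to SAGAW 0x02 with a 39-bit MGAW, and `0xc9078010ef0462` decodes to SAGAW 0x04 with a 48-bit MGAW, which agrees with the analysis above.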
last_pfn here is created from DOMAIN_MAX_PFN(domain->gaw).

All VM domains are created with a 48-bit width (domain->gaw):

    #define DEFAULT_DOMAIN_ADDRESS_WIDTH 48

So the default last_pfn is 0xf_ffff_ffff.

Given the default 48-bit width, the default domain AGAW (Adjusted Guest Address Width) is 2 (domain->agaw).

When we add devices to the domain, the gaw is updated to match:

    /* check if this iommu agaw is sufficient for max mapped address */
    addr_width = agaw_to_width(iommu->agaw);
    if (addr_width > cap_mgaw(iommu->cap))
        addr_width = cap_mgaw(iommu->cap);

    if (dmar_domain->max_addr > (1LL << addr_width)) {
        printk(KERN_ERR "%s: iommu width (%d) is not "
               "sufficient for the mapped address (%llx)\n",
               __func__, addr_width, dmar_domain->max_addr);
        return -EFAULT;
    }
    dmar_domain->gaw = addr_width;

iommu->agaw is calculated from the SAGAW, and will be either 1 or 2 here depending on which IOMMU manages the device.

One bug stands out here: domain->gaw is set to the width of the IOMMU for the last device added, so an initial suspicion would be that you could avoid the problem by re-ordering the qemu command line to create the devices in the reverse order.

So, depending on the order devices were added, domain->gaw is either 48 bits or 39 bits, and therefore last_pfn going into the BUG_ON is either 0xf_ffff_ffff or 0x7ff_ffff.
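The width arithmetic behind these numbers can be sketched in user space as follows. This is only an illustration: the helper names are hypothetical, the page shift is assumed to be 12 (4KB pages), and the AGAW-to-width mapping is assumed to be 30 bits plus 9 bits per level, which is consistent with the 39-bit and 48-bit entries in the spec text quoted earlier.

```c
#include <stdint.h>

#define PAGE_SHIFT_4K 12u   /* assumed equal to VTD_PAGE_SHIFT for 4KB pages */

/* Address width in bits for an AGAW index: 1 -> 39, 2 -> 48. */
static unsigned agaw_to_width_bits(unsigned agaw)
{
    return 30u + 9u * agaw;
}

/* Smallest AGAW index whose width covers `width` bits (width >= 30). */
static unsigned width_to_agaw_index(unsigned width)
{
    return (width - 30u + 8u) / 9u;
}

/*
 * Highest page frame number addressable with a guest address width
 * of `gaw` bits. Valid for 12 <= gaw < 76 so the shift stays in range.
 */
static uint64_t domain_max_pfn(unsigned gaw)
{
    return (UINT64_C(1) << (gaw - PAGE_SHIFT_4K)) - 1;
}
```

With these definitions, `width_to_agaw_index(48)` is 2 and `domain_max_pfn(48)` is 0xf_ffff_ffff (36 one bits), matching the defaults described above. A 39-bit width leaves 27 PFN bits.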
addr_width is set from 'agaw_to_width(domain->agaw) - VTD_PAGE_SHIFT', where domain->agaw is initially 2. However, just beyond the above code snippet we have:

    /*
     * Knock out extra levels of page tables if necessary
     */
    while (iommu->agaw < dmar_domain->agaw) {
        struct dma_pte *pte;

        pte = dmar_domain->pgd;
        if (dma_pte_present(pte)) {
            dmar_domain->pgd = (struct dma_pte *)
                phys_to_virt(dma_pte_addr(pte));
            free_pgtable_page(pte);
        }
        dmar_domain->agaw--;
    }

Therefore, when we add the device behind the 39-bit IOMMU first, we get:

    last_pfn = 0x7ff_ffff
    addr_width = 39

but then we add the device behind the 48-bit IOMMU and get:

    last_pfn = 0xf_ffff_ffff
    addr_width = 39

resulting in the BUG_ON.

The fix might simply be to change setting the GAW here to:

    dmar_domain->gaw = min(dmar_domain->gaw, addr_width);

Hi Alex,

Great news! Yesterday I had the opportunity to recompile my kernel with your suggested fix in the intel-iommu driver:

    dmar_domain->gaw = min(dmar_domain->gaw, addr_width);

After multiple tests I can confirm that this successfully fixed the issue. How can we get this integrated into the official kernel sources?

I also tried to re-order the qemu command line. With or without the fix I don't see any difference, and I always end up with various problems related to the GPU passthrough:
- one VM blue screen at boot (VIDEO_TDR_ERROR)
- one host crash!
- driver error (code 43) and dmesg full of errors like:

    [ 2283.900194] dmar: DMAR:[DMA Read] Request device [06:00.0] fault addr 12de0a000 DMAR:[fault reason 12] non-zero reserved fields in PTE
    [ 2283.900201] dmar: DMAR:[DMA Write] Request device [06:00.0] fault addr aff93000 DMAR:[fault reason 12] non-zero reserved fields in PTE
    [ 2286.149141] dmar: DRHD: handling fault status reg 602

But I'm not sure if this problem is related...

Hi Alex and David,

I've been successfully using Alex's fix for more than a month now.
https://lkml.org/lkml/2014/5/29/932

Would it be possible to close this bug by adding the patch to the official kernel tree?

Thanks!
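For anyone reading this report later, the attach-order failure analysed earlier in the thread can be modelled with a toy user-space sketch. This is not the kernel code: `toy_domain`, `toy_attach` and `toy_would_bug` are hypothetical names, the page shift is assumed to be 12, and the AGAW-to-width mapping is assumed to be 30 + 9 * agaw bits. The `use_min` flag switches between the plain assignment and the proposed `min()` clamp.

```c
#include <stdint.h>

#define PAGE_SHIFT_4K 12u   /* assumed equal to VTD_PAGE_SHIFT */

struct toy_domain {
    unsigned gaw;   /* guest address width in bits */
    unsigned agaw;  /* page-table depth index: 1 -> 39 bits, 2 -> 48 bits */
};

static unsigned agaw_to_width_bits(unsigned agaw)
{
    return 30u + 9u * agaw;
}

/* Model of attaching a device behind an IOMMU with the given limits. */
static void toy_attach(struct toy_domain *d, unsigned iommu_agaw,
                       unsigned iommu_mgaw, int use_min)
{
    unsigned addr_width = agaw_to_width_bits(iommu_agaw);

    if (addr_width > iommu_mgaw)
        addr_width = iommu_mgaw;

    /* Proposed fix: never widen the domain beyond what it already has. */
    if (use_min && d->gaw < addr_width)
        addr_width = d->gaw;
    d->gaw = addr_width;

    /* Knock out extra page-table levels; they are never added back. */
    while (iommu_agaw < d->agaw)
        d->agaw--;
}

/* Evaluates the BUG_ON condition for the domain's current state. */
static int toy_would_bug(const struct toy_domain *d)
{
    uint64_t last_pfn = (UINT64_C(1) << (d->gaw - PAGE_SHIFT_4K)) - 1;
    unsigned addr_width = agaw_to_width_bits(d->agaw) - PAGE_SHIFT_4K;

    return addr_width < 64 && (last_pfn >> addr_width) != 0;
}
```

Starting from a domain of `{48, 2}`, attaching the 39-bit IOMMU (`toy_attach(&d, 1, 39, 0)`) and then the 48-bit one (`toy_attach(&d, 2, 48, 0)`) leaves gaw at 48 with agaw at 1, and `toy_would_bug` reports the condition as true. The same sequence with `use_min` set to 1 keeps gaw at 39 and the condition stays false.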