Bug 86951 - iommu regression
Summary: iommu regression
Status: RESOLVED WILL_NOT_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_pci@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-10-26 13:56 UTC by Will
Modified: 2016-10-28 20:49 UTC
CC List: 3 users

See Also:
Kernel Version: 3.13
Subsystem:
Regression: Yes
Bisected commit-id:


Description Will 2014-10-26 13:56:08 UTC
I am trying to make a Digium TDM410 PCI card available to a guest VM via VT-d / IOMMU. This appears to work under kernel 3.2 but not under any later kernel I have tested.

I noted the pci_find_upstream_pcie_bridge errors (https://bugzilla.kernel.org/show_bug.cgi?id=44881), which appear to be resolved in 3.17 and above, but I was advised to file a new bug as this appears to be a new problem.

Working kernel: Linux 3.2.0-70-generic
Tested but not working: 3.5.0-54-generic, 3.13.0-37-generic, 3.16.6-031606-generic, 3.17.1-031701-generic and 3.18.0-031800rc1-generic.

3.17.1 and 3.18.0-rc1 resolve the pci_find_upstream_pcie_bridge error message, but both still have the DMAR errors; the latter has both read and write errors.

https://bugs.launchpad.net/ubuntu/+source/linux-lts-trusty/+bug/1385388

I believe this type of configuration is often achieved using Xen, but I am using VirtualBox 4.1.34, and it is possible there are some quirks associated with my motherboard.

Motherboard: Gigabyte Technology Co., Ltd. Z87X-D3H/Z87X-D3H-CF, BIOS F9 08/25/2014 (latest)

03:00.0 Ethernet controller: Digium, Inc. Wildcard TDM410 4-port analog card (rev 11)
03:00.0 0200: d161:8005 (rev 11)

02:00.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 41) (prog-if 01 [Subtractive decode])
02:00.0 0604: 8086:244e (rev 41)

Typical errors shown on the host when the module is loaded via modprobe in the guest:

Oct 26 06:35:39 kernel: [ 227.795652] dmar: DRHD: handling fault status reg 3
Oct 26 06:35:39 kernel: [ 227.795678] dmar: DMAR:[DMA Read] Request device [03:00.0] fault addr 73ac6000
Oct 26 06:35:39 kernel: [ 227.795678] DMAR:[fault reason 06] PTE Read access is not set

modprobe freezes in the guest; the guest then causes a high CPU load and freezes on reboot, possibly due to a spin lock.
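
For anyone who needs to pull these faults out of a longer host log, here is a minimal Python sketch. It is modeled only on the three log lines quoted above, so the patterns are an assumption and other kernel versions may format DMAR faults differently. The function name parse_dmar_faults and both regular expressions are hypothetical helpers, not part of any kernel tool. The fault reason number is assumed to be printed in decimal.

```python
import re

# Assumption: these patterns match the log format quoted in this report only.
REQUEST_RE = re.compile(
    r"DMAR:\[DMA (?P<op>Read|Write)\] Request device "
    r"\[(?P<bdf>[0-9a-fA-F:.]+)\] fault addr (?P<addr>[0-9a-fA-F]+)"
)
REASON_RE = re.compile(r"DMAR:\[fault reason (?P<code>\d+)\] (?P<text>.+)")


def parse_dmar_faults(lines):
    """Pair each 'Request device' line with the 'fault reason' line after it."""
    faults = []
    pending = None  # most recent request line still waiting for its reason
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:
            pending = {
                "op": match.group("op"),
                "device": match.group("bdf"),
                "addr": int(match.group("addr"), 16),
            }
            faults.append(pending)
            continue
        match = REASON_RE.search(line)
        if match and pending is not None:
            pending["reason_code"] = int(match.group("code"))
            pending["reason"] = match.group("text").strip()
            pending = None
    return faults


if __name__ == "__main__":
    sample = [
        "Oct 26 06:35:39 kernel: [ 227.795652] dmar: DRHD: handling fault status reg 3",
        "Oct 26 06:35:39 kernel: [ 227.795678] dmar: DMAR:[DMA Read] Request device [03:00.0] fault addr 73ac6000",
        "Oct 26 06:35:39 kernel: [ 227.795678] DMAR:[fault reason 06] PTE Read access is not set",
    ]
    for fault in parse_dmar_faults(sample):
        print(fault["device"], fault["op"], hex(fault["addr"]), fault.get("reason"))
```

Grouping the parsed entries by device and operation should make it easier to compare kernels, for example to confirm that only 3.18.0-rc1 logs write faults as well as read faults.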
Comment 1 Alex Williamson 2014-10-27 19:25:01 UTC
AFAIK vboxpci is a closed source kernel module.  If you'd like to try with KVM/VFIO we might be able to help you.
Comment 2 Bjorn Helgaas 2014-12-05 18:25:31 UTC
Will, is there any chance you can reproduce this problem with an upstream kernel, i.e., without vboxpci?  It's hard for us to identify the source code for the combination of Linux + vbox that you're running (and I know nothing about vbox anyway), so I don't think we can make much progress here.

Another possibility is for you to bisect to identify the commit that broke things.  That might be enough to help figure out a fix.
Comment 3 Will 2014-12-08 22:25:16 UTC
A bisect sounds like a sensible way to find the problem. My issue is that this is a production machine, which limits what I can do easily. I booted an old kernel that was left over from updates to see if it would work, and it did, but that kernel has no Ethernet driver for my card. I thought it would be better to go forwards rather than backwards, hence stopping off here. When I get some spare time with this machine, or some equivalent hardware to test on, I will investigate further and see if I can provide some more useful information.
Comment 4 David Flater 2014-12-30 02:44:25 UTC
I am getting the same kind of DMAR fault messages from unpatched kernel.org kernels when trying to use a PCI sound card without bounce buffers (no virtualization, just intel_iommu=on), but reverting to git tag v3.2 or v3.1 did not fix it.  It only triggered the other problem to appear (warning at pci_find_upstream_pcie_bridge).

Dec 17 19:36:41 lava64 kernel: dmar: DRHD: handling fault status reg 3
Dec 17 19:36:41 lava64 kernel: dmar: DMAR:[DMA Read] Request device [05:00.0] fault addr 7fff0000 
Dec 17 19:36:41 lava64 kernel: DMAR:[fault reason 06] PTE Read access is not set

Gigabyte GA-Z97-HD3 rev. 2.0 with F6 BIOS (new PC, new problem)
Sound card:  VIA Technologies Inc. ICE1712 [Envy24] PCI, module snd_ice1712

I gathered from searching that lots of people have had this problem since the IOMMU first appeared, but it is usually blamed on faulty BIOS with no workaround except to disable the IOMMU ("don't do that then").
Comment 5 Alex Williamson 2014-12-30 04:33:32 UTC
(In reply to David Flater from comment #4)
> I am getting the same kind of DMAR fault messages from unpatched kernel.org
> kernels when trying to use a PCI sound card without bounce buffers (no
> virtualization, just intel_iommu=on), but reverting to git tag v3.2 or v3.1
> did not fix it.  It only triggered the other problem to appear (warning at
> pci_find_upstream_pcie_bridge).
> 
> Dec 17 19:36:41 lava64 kernel: dmar: DRHD: handling fault status reg 3
> Dec 17 19:36:41 lava64 kernel: dmar: DMAR:[DMA Read] Request device
> [05:00.0] fault addr 7fff0000 
> Dec 17 19:36:41 lava64 kernel: DMAR:[fault reason 06] PTE Read access is not
> set
> 
> Gigabyte GA-Z97-HD3 rev. 2.0 with F6 BIOS (new PC, new problem)
> Sound card:  VIA Technologies Inc. ICE1712 [Envy24] PCI, module snd_ice1712
> 
> I gathered from searching that lots of people have had this problem since
> the IOMMU first appeared, but it is usually blamed on faulty BIOS with no
> workaround except to disable the IOMMU ("don't do that then").

Sounds like a different problem; please file a new bug instead of cluttering this one.  My guess would be that the device is trying to flush DMA writes with a DMA read, but the driver hasn't properly mapped anything to the flush address.  Since it has never worked as far as we know, it's not a regression.  Potentially a driver bug, maybe a hardware bug.
Comment 6 Bjorn Helgaas 2016-10-28 20:49:22 UTC
This bug seems stale.  I'm closing it because I don't think we're making any progress on it.  If it's still a problem, please reopen and we'll try again.
