Bug 12578
| Summary: | DMAR errors and driver instability | | |
|---|---|---|---|
| Product: | Platform Specific/Hardware | Reporter: | Andy Isaacson (adi) |
| Component: | x86-64 | Assignee: | David Woodhouse (dwmw2) |
| Status: | RESOLVED CODE_FIX | | |
| Severity: | normal | CC: | adar, antonf, bhavesh, chrisw, markmc |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 2.6.29-rc3 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
Attachments:

- test patch
- another test patch
- dmesg with dmar-coherency-debug.patch
- dmesg with test patch
- further debugging patch (incremental)
- dmesg with debug-me-harder.patch
- dmesg.2.6.28-dmar-00001-g8aeaf45
- Potential fix
- potential fix, really.
- dmesg with A05 bios + 33bfad5
- updated iommu patch
- dmesg with A03, 33bfad5+iwlfix+rwbf-3.patch
- 2.6.29-rc5 kernel log containing DMAR faults from ixgbe
Description
Andy Isaacson
2009-01-29 15:13:06 UTC
ah, neglected to mention that both e1000e and iwlagn are rock solid (GB of transfer without problem) with CONFIG_DMAR=n.

There are classes of driver bug which wouldn't show up, so one thing to try would be the DMA API debug infrastructure posted by Jörg Rödel recently: https://lists.linux-foundation.org/pipermail/iommu/2009-January/000944.html

That would eliminate the majority of possible driver bugs. I should be home early next week and able to look into this in detail then.

Linking https://bugzilla.redhat.com/show_bug.cgi?id=479996 as there are some useful hints there (another workaround is to boot with intel_iommu=off).

(In reply to comment #2)
> There are classes of driver bug which wouldn't show up, so one thing to try
> would be the DMA API debug infrastructure posted by Jörg Rödel recently:
> https://lists.linux-foundation.org/pipermail/iommu/2009-January/000944.html
>
> That would eliminate the majority of possible driver bugs. I should be home
> early next week and able to look into this in detail then.

Adar tried DMA_API_DEBUG and it didn't show anything up: https://bugzilla.redhat.com/479996#c22

Created attachment 20120 [details]
test patch
Can you try this debug patch, just in case it shows anything interesting?
(It highlights a real bug in domain_update_iommu_coherency() where we were previously setting domain->iommu_coherency to 1 temporarily, but I don't think that's very likely to be the problem here unless you have lots of hotplug events).
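For readers skimming the thread, the coherency bug mentioned here is the familiar pattern of publishing an intermediate value in a shared flag while recomputing it: any CPU that reads domain->iommu_coherency during the recompute can briefly see the domain as coherent and skip a needed clflush. A minimal sketch of the safer idiom (compute into a local, publish once) follows; the struct and field names are simplified stand-ins, not the actual intel-iommu code:

```c
#include <stdbool.h>

struct fake_iommu {
	bool ecap_coherent;           /* hardware snoops its own table walks */
};

struct fake_domain {
	struct fake_iommu *iommus[8];
	int nr_iommus;
	bool iommu_coherency;         /* read by the map/unmap fast path */
};

/* Racy: readers can briefly observe 'true' and skip the needed clflush. */
static void update_coherency_racy(struct fake_domain *d)
{
	d->iommu_coherency = true;
	for (int i = 0; i < d->nr_iommus; i++)
		if (!d->iommus[i]->ecap_coherent)
			d->iommu_coherency = false;
}

/* Safer: compute into a local and publish the final value exactly once. */
static void update_coherency_fixed(struct fake_domain *d)
{
	bool coherent = true;

	for (int i = 0; i < d->nr_iommus; i++)
		if (!d->iommus[i]->ecap_coherent) {
			coherent = false;
			break;
		}
	d->iommu_coherency = coherent;
}
```

As noted above, this window only matters when the set of IOMMUs attached to a domain actually changes, e.g. on hotplug, which is why it is unlikely to be the bug seen here.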
Created attachment 20123 [details]
another test patch
This separate test patch applies on top of Jörg's dma api debugging infrastructure, and it will dump the kernel's idea of which regions are mapped for DMA when the fault happens.
If the kernel thinks that the faulting address _should_ be mapped, I'll do a further patch which will take a closer look at the actual DMA page tables...
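The check such a dump enables is a plain containment test: does the faulting DMA address fall inside any region the kernel believes is currently mapped for that device? A self-contained sketch is below; the mapping-entry struct is a hypothetical stand-in for the DMA API debug bookkeeping, and the sample entries and fault addresses are the ones quoted in the dmesg excerpts a couple of comments further down:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record of one tracked DMA mapping for a device. */
struct dma_map_entry {
	uint64_t dma_addr;   /* bus/IOVA address handed to the device */
	uint64_t len;        /* length of the mapping in bytes */
};

/* True if 'fault_addr' lies inside any tracked mapping. */
static bool fault_addr_is_mapped(const struct dma_map_entry *e, size_t n,
				 uint64_t fault_addr)
{
	for (size_t i = 0; i < n; i++)
		if (fault_addr >= e[i].dma_addr &&
		    fault_addr < e[i].dma_addr + e[i].len)
			return true;
	return false;
}

int main(void)
{
	/* Entries for iwlagn 0000:0c:00.0 as reported later in this bug. */
	const struct dma_map_entry iwlagn[] = {
		{ 0xff9de800, 0x400  },   /* idx 239, DMA_TO_DEVICE   */
		{ 0xff9dc000, 0x2100 },   /* idx 238, DMA_FROM_DEVICE */
	};

	/* Fault at 0xff9df000: outside both mappings -> likely a driver bug. */
	printf("ff9df000 mapped: %d\n",
	       fault_addr_is_mapped(iwlagn, 2, 0xff9df000));

	/* Fault at 0xff9dd000: inside ff9dc000..ff9de100, so it should have
	 * been permitted, which points at the IOMMU/coherency side instead. */
	printf("ff9dd000 mapped: %d\n",
	       fault_addr_is_mapped(iwlagn, 2, 0xff9dd000));
	return 0;
}
```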
(In reply to comment #6)
> If the kernel thinks that the faulting address _should_ be mapped, I'll do a
> further patch which will take a closer look at the actual DMA page tables...

That would be quite handy! And FWIW, I tried something similar to your patch that ignores the coherency bit in the iommu ecap register and always clflushes the PTEs, but no luck. See https://bugzilla.redhat.com/show_bug.cgi?id=479996 comment #37.

Created attachment 20127 [details]
dmesg with dmar-coherency-debug.patch

The patch (attachment 20120 [details]) didn't change the failure, but there is a bit more debug output now.

Created attachment 20130 [details]
dmesg with test patch

It'd be easiest for me if you simply provide a git pull URL, that way there's no confusion as to where I should apply the patch... I applied attachment 20123 [details] on top of 5b4ea830 (the tip of joro/linux-2.6-iommu.git dma-api/debug) resulting in my local 19c691b and captured the attached dmesg (I only enclosed the first trace -- the total kernel log for this boot is 23 MB!). Note some interesting printk buffer corruption around 18.167980.

You applied it to the correct tree; thanks. Looks like I should refine the debug patch a little -- to dump only entries for the affected device, and perhaps some locking to prevent simultaneous dumps. The first one looks quite simple though. From:

[ 18.165699] DMAR:[DMA Write] Request device [0c:00.0] fault addr ff9df000
[ 18.165700] DMAR:[fault reason 05] PTE Write access is not set

... to ...

[ 18.167389] e1000e 0000:00:19.0: coherent idx 255 P=1189d5000 D=fffff000 L=1000 DMA_BIDIRECTIONAL

It doesn't look like 0xFF9DF000 is actually supposed to be mapped. The closest entries in the list are

[ 18.166928] e1000e 0000:00:19.0: single idx 206 P=115949012 D=fff9d012 L=5f2 DMA_FROM_DEVICE

... and ...

[ 18.167155] iwlagn 0000:0c:00.0: single idx 239 P=118c51800 D=ff9de800 L=400 DMA_TO_DEVICE

At first glance, unless the list is lossy for some reason, I'm inclined to suspect that the fault here is with the driver -- is it DMAing into a buffer it's already unmapped?

The second one, however...

[ 18.167459] DMAR:[DMA Write] Request device [0c:00.0] fault addr ff9dd000
[ 18.167460] DMAR:[fault reason 05] PTE Write access is not set
[ 18.167700] iwlagn 0000:0c:00.0: single idx 0 P=114ed0000 D=ffe00000 L=2100 DMA_FROM_DEVICE
...
[ 18.327515] iwlagn 0000:0c:00.0: single idx 238 P=114860000 D=ff9dc000 L=2100 DMA_FROM_DEVICE
...

0xff9dc000 + 0x2100 is 0xff9de100, so a DMA write at ff9dd000 ought to be within that range, and permitted by the IOMMU. I'll knock up something which verifies that the CPU's view of the actual page tables would allow that. And if _that_ looks OK, we're back to muttering nasty words about cache-incoherency.

Created attachment 20131 [details]
further debugging patch (incremental)
OK, this patch ought to dump the actual iommu page table entry for the offending address, when a fault happens. I say 'ought'. I haven't tested it here, and it's about 5 minutes to Friday. It does at least build.
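For anyone reproducing such a dump by hand, the index arithmetic it has to perform is the same 9-bits-per-level walk as x86 long-mode paging (4 KiB pages, 512 entries per table). The snippet below only illustrates the index computation; the level count and names are stand-ins, not the intel-iommu page-table code:

```c
#include <stdint.h>
#include <stdio.h>

#define VTD_PAGE_SHIFT  12                      /* 4 KiB pages */
#define LEVEL_STRIDE    9                       /* 512 entries per table */
#define LEVEL_MASK      ((1u << LEVEL_STRIDE) - 1)

/* Index into the page table at 'level' (1 = leaf) for a DMA address. */
static unsigned int pt_index(uint64_t dma_addr, int level)
{
	return (dma_addr >> (VTD_PAGE_SHIFT + (level - 1) * LEVEL_STRIDE))
	       & LEVEL_MASK;
}

int main(void)
{
	uint64_t fault = 0xff9dd000;   /* one of the faulting addresses above */

	/* Assume a 3-level (39-bit address width) table for the example. */
	for (int level = 3; level >= 1; level--)
		printf("level %d index = %u\n", level, pt_index(fault, level));
	return 0;
}
```

The dump then only has to read the entry found at each index and compare its read/write permission bits against the type of access that faulted.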
On the topic of 'nasty words about cache-incoherency'... can you try Bhavesh's patch at https://bugzilla.redhat.com/show_bug.cgi?id=479996#c41

Just to clarify: my patch is against 2.6.28 vanilla. But you would have figured that out :)

Created attachment 20140 [details]
dmesg with debug-me-harder.patch

The first 10,000 lines of dmesg output with debug-me-harder.patch. Note that I didn't actually notice any device failures on this boot. The kernel is 5b4ea830 + attachment 20123 [details] + attachment 20131 [details] (debug-me-harder.patch).

Created attachment 20141 [details]
dmesg.2.6.28-dmar-00001-g8aeaf45

(In reply to comment #14)
> Just to clarify: my patch is against 2.6.28 vanilla. But you would have
> figured that out :)

Indeed. :) The udelay did *not* fix the problem for me, please find attached a dmesg with udelay. This is upstream 2.6.28, 4a6908a3 plus:

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 3d017cf..409d1fc 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -317,6 +317,7 @@ static inline void __iommu_flush_cache(
 {
 	if (!ecap_coherent(iommu->ecap))
 		clflush_cache_range(addr, size);
+	udelay(2);
 }
 
 extern struct dmar_drhd_unit * dmar_find_matched_drhd_unit(struct pci_dev *dev);

Increasing it to udelay(20) made my boot take a lot longer (about 61 seconds to userland) but I can still reproduce similar failures:

[ 494.448749] DMAR:[DMA Read] Request device [00:19.0] fault addr ffea5000
[ 494.448750] DMAR:[fault reason 06] PTE Read access is not set
[ 496.816437] 0000:00:19.0: eth0: Detected Tx Unit Hang:
[ 496.816438]   TDH <7a>
[ 496.816439]   TDT <e3>
[ 496.816440]   next_to_use <e3>
[ 496.816441]   next_to_clean <78>
[ 496.816442] buffer_info[next_to_clean]:
[ 496.816443]   time_stamp <10000bde3>
[ 496.816444]   next_to_watch <7a>
[ 496.816445]   jiffies <10000c034>
[ 496.816446]   next_to_watch.status <0>
[ 498.816204] 0000:00:19.0: eth0: Detected Tx Unit Hang:
[ 498.816205]   TDH <7a>
[ 498.816205]   TDT <e3>
[ 498.816206]   next_to_use <e3>
[ 498.816207]   next_to_clean <78>
[ 498.816207] buffer_info[next_to_clean]:
[ 498.816208]   time_stamp <10000bde3>
[ 498.816208]   next_to_watch <7a>
[ 498.816209]   jiffies <10000c228>
[ 498.816210]   next_to_watch.status <0>

Created attachment 20174 [details]
Potential fix
Please could you test with this patch (against an otherwise clean kernel; especially make sure you take the udelay out).
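As the later comments make clear, the fix being tested here revolves around write-buffer flushing: when the IOMMU either advertises "required write-buffer flushing" in its capability register or is quirked into behaving as if it did, the write buffer has to be flushed after page-table updates so the hardware sees the new PTEs before the device's DMA does. The mock below sketches only that control flow with stand-in types; it is not the attached patch:

```c
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for the IOMMU; only the capability bit we care about here
 * (required write-buffer flushing, RWBF) is modelled. */
struct mock_iommu {
	bool cap_rwbf;
};

static bool rwbf_quirk;        /* forced on for known-problematic chipsets */

static void mock_flush_write_buffer(struct mock_iommu *iommu)
{
	/* The real code sets the write-buffer-flush bit in the global
	 * command register and waits for completion; here we just log. */
	(void)iommu;
	printf("write buffer flushed\n");
}

/* Call after updating DMA page-table entries, before the device can
 * issue a DMA that depends on them. */
static void flush_after_pte_update(struct mock_iommu *iommu)
{
	if (iommu->cap_rwbf || rwbf_quirk)
		mock_flush_write_buffer(iommu);
}

int main(void)
{
	struct mock_iommu iommu = { .cap_rwbf = false };

	rwbf_quirk = true;     /* e.g. quirked on after chipset detection */
	flush_after_pte_update(&iommu);
	return 0;
}
```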
Created attachment 20175 [details]
potential fix, really.
Hm, when it said 'Attach URL' I thought it was going to do something more useful than that. Here's the patch for real this time.
David, I tested your patch against a clean 2.6.29-rc4 tree, using both rwbf_flush and rwbf_iotlb_flush. Neither worked for me; I am still seeing the same issue I reported in https://bugzilla.redhat.com/show_bug.cgi?id=479996#c17.

I'm sorry, I just took a look at your patch and realized that the one I had tested (from another Intel engineer via Bhavesh) is different than this one; it did not have the "dummy IOTLB flush only a single page" case. Tomorrow I'll apply your patch, run it through that particular case (looks like it's the default path taken), and see if it helps.

That other patch was from me too, and wasn't significantly different. This one just has a couple of minor optimisations (single-page flush, not a whole domain, and don't fall through to doing the ineffective write-buffer flush after doing the IOTLB flush).

We thought Bhavesh had said the original workaround _was_ working...

Can you make it follow the workaround path _always_, rather than only if (!cap_rwbf(iommu->cap)) ?

(In reply to comment #22)
> We thought Bhavesh had said the original workaround _was_ working...

Yes, the workaround did appear to solve the problem on Bhavesh's laptop (the Thinkpad x200), but unfortunately, not on my own Dell Precision 5400.

> Can you make it follow the workaround path _always_, rather than only if
> (!cap_rwbf(iommu->cap)) ?

Sure, I'll try that out tomorrow as well.

(In reply to comment #22)
> That other patch was from me too, and wasn't significantly different. This one
> just has a couple of minor optimisations (single-page flush, not a whole
> domain, and don't fall through to doing the ineffective write-buffer flush
> after doing the IOTLB flush).

Pretty much, except that I had changed the quirk param to an explicit intel_iommu_quirk boot time string parameter for my own sanity and testing.

> We thought Bhavesh had said the original workaround _was_ working...

Yes on my Lenovo x200 and on a Dell Latitude E6400 notebook, but not on a Dell Precision T5400 workstation.

> Can you make it follow the workaround path _always_, rather than only if
> (!cap_rwbf(iommu->cap)) ?

If Adar doesn't get around to this, I can try it on his machine.

Also BTW, I tried your patch from comment #19 after backing out my previous patch on 2.6.29-rc4, and it still works on my Lenovo x200 with the default intel_iommu.quirks = 3 (as in no more IOMMU DMA address translation faults and no more e1000e Tx Unit Hangs as a result).

(In reply to comment #24)
> > Can you make it follow the workaround path _always_, rather than only if
> > (!cap_rwbf(iommu->cap)) ?
>
> If Adar doesn't get around to this, I can try it on his machine.

Just a quick clarification on that one: do you want to try the single page flush workaround or the full domain flush workaround?

T5400 is a different chipset and any problem there is probably something different. We revert to our normal stance of assuming the BIOS is wrong, and getting that tested would be the first thing to do.

To confirm: (Adar, T5400) you reported that adding udelay(2) _did_ fix the problem for you, didn't you?

(I don't mind whether you try single-page workaround or full domain flush. They should both have the same effect and I just switched from one to the other as an optimisation, but I don't think that's the issue on the T5400 anyway).

(In reply to comment #26)
> T5400 is a different chipset and any problem there is probably something
> different. We revert to our normal stance of assuming the BIOS is wrong, and
> getting that tested would be the first thing to do.
Yup. Working out of band to do the chipset diagnostics requested on the T5400 and send along the information to you.

> (I don't mind whether you try single-page workaround or full domain flush. They
> should both have the same effect and I just switched from one to the other as
> an optimisation, but I don't think that's the issue on the T5400 anyway).

I'm going to hold off on that since I agree that this may indeed be a buggy BIOS DMAR ACPI table issue. Unless Adar gets around to trying the workaround anyway...

Created attachment 20184 [details]
dmesg with A05 bios + 33bfad5

(In reply to comment #19)
> Created an attachment (id=20175) [details]
> potential fix, really.
>
> Hm, when it said 'Attach URL' I thought it was going to do something more
> useful than that. Here's the patch for real this time.

So the e4300 I've been testing on finally had its RMA replacement arrive (RMAed due to a cosmetic flaw with the LCD). The replacement has BIOS A05 versus the original's A03 BIOS. With the new BIOS I haven't yet reproduced a full-on device failure (tested with 33bfad5 and 33bfad5+flush-debug.patch), although I am still seeing a fairly steady stream of DMAR errors. I'll try out your fix on the A03 BIOS tomorrow.

(In reply to comment #22)
> That other patch was from me too, and wasn't significantly different. This one
> just has a couple of minor optimisations (single-page flush, not a whole
> domain, and don't fall through to doing the ineffective write-buffer flush
> after doing the IOTLB flush).
>
> We thought Bhavesh had said the original workaround _was_ working...
>
> Can you make it follow the workaround path _always_, rather than only if
> (!cap_rwbf(iommu->cap)) ?

I gave this a shot (with rwbf_flush and rwbf_iotlb_flush) and neither seemed to make any difference.

(In reply to comment #18)
> Please could you test with this patch (against an otherwise clean kernel;
> especially make sure you take the udelay out).

I tested with your flush-debug.patch on top of 33bfad5 from Linus' tree on the Dell E4300 with A03 BIOS. I still see DMAR errors associated with iwlagn (if I "sudo ifconfig wlan0 down" the DMAR errors stop). But on the plus side, I can successfully transfer large amounts over e1000e without getting any DMAR errors or device lockups. And, although I do see a steady stream of iwlagn-associated DMAR error messages, the iwlagn seems to work OK. I've got acpidump output if you'd like, and note that upgrading to BIOS A05 seems to resolve the device failures (though not the DMAR error messages).

So it looks like either your patch or A05 fixes the device failures, and iwlagn has some cosmetic-but-noisy driver DMA bug?

iwlagn has a cosmetic-but-noisy driver DMA bug. It writes back to descriptors which ought to be read-only. http://david.woodhou.se/iwlfix has the fix, as does the Fedora rawhide kernel.

Created attachment 20227 [details]
updated iommu patch

Please could you retest with _just_ this patch, and the wireless fix from http://david.woodhou.se/iwlfix

We believe that once the wireless bug is out of the way and not confusing us, this patch is all that's actually needed for the IOMMU. Thanks.

David, I'm wondering if I should also be using your new patch from comment #32 instead of the older patch (from comment #19) that had 3 quirk modes. I'm also wondering when you plan to send an official patch out for review to the mailing lists. Thanks!
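The iwlagn bug described a few comments above (the device writing back into descriptors the driver mapped as device-read-only) is a general class of DMA-API bug that only becomes visible once an IOMMU enforces the mapping direction, which is why it surfaces here as "PTE Write access is not set" faults. The sketch below illustrates the pattern with mock types; it is an assumption about the shape of the fix, not the actual iwlfix patch:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-ins for the DMA-API mapping directions; real drivers
 * use dma_map_single() and friends from <linux/dma-mapping.h>. */
enum dma_dir { DMA_TO_DEVICE, DMA_FROM_DEVICE, DMA_BIDIRECTIONAL };

struct tx_descriptor {
	uint32_t buf_addr;
	uint16_t len;
	uint16_t status;        /* written back by the device on completion */
};

static uintptr_t mock_dma_map_single(void *cpu_addr, size_t len,
				     enum dma_dir dir)
{
	/* Under an IOMMU, DMA_TO_DEVICE yields a device-read-only mapping. */
	printf("mapped %zu bytes, dir=%d\n", len, dir);
	return (uintptr_t)cpu_addr;     /* pretend this is the IOVA */
}

int main(void)
{
	static struct tx_descriptor desc;

	/* Buggy: mapped read-only for the device, yet the hardware writes
	 * 'status' back into it -> "PTE Write access is not set" fault. */
	mock_dma_map_single(&desc, sizeof(desc), DMA_TO_DEVICE);

	/* Plausible fix: map bidirectionally so the write-back is allowed. */
	mock_dma_map_single(&desc, sizeof(desc), DMA_BIDIRECTIONAL);
	return 0;
}
```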
(In reply to comment #33)
> David, I'm wondering if I should also be using your new patch from comment #32
> instead of the older patch (from comment #19) that had 3 quirk modes.

Yes, you should.

> I'm also wondering when you plan to send an official patch out for review to
> the mailing lists. Thanks!

Just as soon as you tell me it works for you :)

I tested with the patch in comment #32, and it works as well as the patch in comment #19 with quirk mode 1: I don't get any more "PTE Read" faults from the IOMMU for DMAs from the e1000e NIC, but I still get the one "PTE Write" fault from the IOMMU for the VGA controller on my Lenovo x200:

[ 0.216043] DMAR: Forcing write-buffer flush capability
[ 0.928007] DMAR:[DMA Write] Request device [00:02.0] fault addr feff5e000
[ 0.928007] DMAR:[fault reason 05] PTE Write access is not set

00:02.0 VGA compatible controller: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller (rev 07)

I noticed that you don't flush the IOTLB any more in this new patch.

Right, we don't flush the IOTLB any more. We don't believe that's actually necessary.

You were still seeing the single VGA fault at startup, even when you had the more paranoid version of the patch which _did_ flush the IOTLB... weren't you? I think that's a separate bug. Or just that the graphics engine is still doing something when the kernel starts up, and we force it to stop (I see something similar from the USB controllers on my test box).

Actually, the VGA DMA fault had disappeared with the most aggressive of your patches: always flush the IOTLB for the whole domain. Across multiple boots.

I suspect that's a separate issue, and it's just coincidence that it goes away. Can you confirm my suspicion that you have CONFIG_DMAR_GFX_WA set anyway? Which means that we're giving direct 1:1 access to the VGA device and its mappings aren't taking this code path anyway? I suspect it's just a timing thing. Will consult...

(In reply to comment #38)
> Can you confirm my suspicion that you have CONFIG_DMAR_GFX_WA set anyway? Which
> means that we're giving direct 1:1 access to the VGA device and its mappings
> aren't taking this code path anyway?

Yup:

CONFIG_DMAR_GFX_WA=y

Using the patch that includes the quirk for iommu_flush_write_buffer plus flushes before flush_context() I still see:

[ 0.157017] Cantiga chipset detected; enabling DMAR workarounds
...
[ 0.742004] DMAR:[DMA Write] Request device [00:02.0] fault addr ffffff000
[ 0.742005] DMAR:[fault reason 05] PTE Write access is not set

OK, that reassures me that it's a separate issue, and if it does go away for Bhavesh it's just a timing thing or some kind of coincidence. There's just no reason why the graphics chipset should be writing to main memory at this stage in the boot anyway; I think it's got to be an issue with the way the BIOS has set it up, and the IOMMU is _supposed_ to be blocking it.

FWIW, this DMA Write fault from the VGA device doesn't cause any ill side effects later anyway, so it is not as serious an issue as those other DMA failures that used to happen much later in boot. But I don't understand your claim: "the IOMMU is _supposed_ to be blocking it". Without an IOMMU in the picture, the device *was* accessing main memory and nobody was stopping it, right? Why the device was accessing main memory in the first place is beyond me, though.
Created attachment 20233 [details]
dmesg with A03, 33bfad5+iwlfix+rwbf-3.patch

(In reply to comment #32)
> Created an attachment (id=20227) [details]
> updated iommu patch
>
> Please could you retest with _just_ this patch, and the wireless fix from
> http://david.woodhou.se/iwlfix
>
> We believe that once the wireless bug is out of the way and not confusing us,
> this patch is all that's actually needed for the IOMMU.

I took 33bfad5 and applied iwlfix and rwbf-3.patch on A03 BIOS. The resulting kernel boots and seems to have fixed all the DMAR errors. I've attached the complete dmesg. So it looks like these two patches resolve the issue completely. Thanks!

-andy

I believe the address it was accessing was marked as 'reserved' in your E820 memory map from the BIOS?

It is for Chris.

So it's not necessarily 'main memory' according to the kernel. Perhaps the BIOS should have included it in an RMRR entry, to 'whitelist' accesses to that address? The graphics chip has a 'hardware status page' which it can write to; we don't believe that the IOMMU is just _inventing_ a DMA write transaction. :) Normally we'd expect that this status page would be disabled while the kernel is booting. But who are we to guess what the BIOS is playing at?

[ 0.000000] BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
[ 0.000000] BIOS-e820: 00000000ff800000 - 0000000100000000 (reserved)
[ 0.000000] BIOS-e820: 0000000100000000 - 000000013c000000 (usable)

[ 0.928007] DMAR:[DMA Write] Request device [00:02.0] fault addr feff5e000

There's no entry for feff5e000 in the E820 map. I wouldn't have expected the BIOS to be doing anything at the time this fault happened, but that's just BSing around without actually looking at the code. And no, I wasn't implying that the IOMMU made up that DMA write transaction: I was just wondering what the side effects of denying a DMA write from the VGA device with the IOMMU in the middle would be.

Patch submitted to Linus: https://lists.linux-foundation.org/pipermail/iommu/2009-February/001117.html

Here's an update on my situation (with the Dell Precision T5400). Having had no luck booting with David's patches, I tried a more drastic approach. Suspecting some sort of bug in the Fusion drivers (for my SAS controller), I removed the controller from my system and plugged a SATA drive directly into the motherboard's SATA headers. I installed Ubuntu 8.10 on this drive, cloned linux-2.6, built a kernel with CONFIG_DMAR_*=y, and booted into it. I was able to successfully boot and the system appears to be quite usable (X was successfully started, I can access the network, etc.).

I do see a lot of DMAR "Present bit in root entry is clear" faults in my kernel log, and they appear to be related to the ixgbe driver powering the Oplin dual-port NIC in my system. I'm not using that NIC for anything, so I unbound the driver from the two devices and the DMAR fault messages have stopped. The system appears to still be OK, and I've attached the log with the messages.

Created attachment 20292 [details]
2.6.29-rc5 kernel log containing DMAR faults from ixgbe
David, any thoughts on the ixgbe issue? Perhaps it's indicative of a bug in the driver?

Was the ixgbe issue fixed by http://git.kernel.org/linus/924b6231edfaf1e764ffb4f97ea382bf4facff58 ?

(In reply to comment #50)
> Was the ixgbe issue fixed by
> http://git.kernel.org/linus/924b6231edfaf1e764ffb4f97ea382bf4facff58 ?

I suspect it was, as that commit fixed nearly all of my DMAR fault issues. I can't confirm, though, because the card has since departed from my system. Thanks.

Let's close this bug then -- you know where to find me...