Bug 12578

Summary: DMAR errors and driver instability
Product: Platform Specific/Hardware Reporter: Andy Isaacson (adi)
Component: x86-64Assignee: David Woodhouse (dwmw2)
Status: RESOLVED CODE_FIX    
Severity: normal CC: adar, antonf, bhavesh, chrisw, markmc
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.29-rc3 Subsystem:
Regression: No Bisected commit-id:
Attachments: test patch
another test patch
dmesg with dmar-coherency-debug.patch
dmesg with test patch
further debugging patch (incremental)
dmesg with debug-me-harder.patch
dmesg.2.6.28-dmar-00001-g8aeaf45
Potential fix
potential fix, really.
dmesg with A05 bios + 33bfad5
updated iommu patch
dmesg with A03, 33bfad5+iwlfix+rwbf-3.patch
2.6.29-rc5 kernel log containing DMAR faults from ixgbe

Description Andy Isaacson 2009-01-29 15:13:06 UTC
Latest working kernel version: unknown
Earliest failing kernel version: 2.6.28, earlier?
Distribution: Ubuntu 8.10
Hardware Environment: Dell Latitude E4300
Software Environment:
Problem Description: CONFIG_DMAR results in many "[DMA Write] Request ... fault" messages, and failures in iwlagn and e1000e

Steps to reproduce: build 18e352e Linux 2.6.29-rc3 with CONFIG_DMAR=y, boot and observe:

[    0.000000] Linux version 2.6.29-rc3-dmar (adi@philipl-e4300) (gcc version 4.3.2 (Ubuntu 4.3.2-1ubuntu11) ) #3 SMP Thu Jan 29 14:48:40 PST 2009
[    0.000000] Command line: root=/dev/sda5
[    0.000000] KERNEL supported cpus:
[    0.000000]   Intel GenuineIntel
[    0.000000]   AMD AuthenticAMD
[    0.000000]   Centaur CentaurHauls
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  BIOS-e820: 0000000000000000 - 000000000009bc00 (usable)
[    0.000000]  BIOS-e820: 000000000009bc00 - 00000000000a0000 (reserved)
[    0.000000]  BIOS-e820: 0000000000100000 - 00000000dd04d400 (usable)
[    0.000000]  BIOS-e820: 00000000dd04d400 - 00000000dd04f400 (ACPI NVS)
[    0.000000]  BIOS-e820: 00000000dd04f400 - 00000000e0000000 (reserved)
[    0.000000]  BIOS-e820: 00000000f8000000 - 00000000fc000000 (reserved)
[    0.000000]  BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
[    0.000000]  BIOS-e820: 00000000fed18000 - 00000000fed1c000 (reserved)
[    0.000000]  BIOS-e820: 00000000fed20000 - 00000000fed90000 (reserved)
[    0.000000]  BIOS-e820: 00000000feda0000 - 00000000feda6000 (reserved)
[    0.000000]  BIOS-e820: 00000000fee00000 - 00000000fee10000 (reserved)
[    0.000000]  BIOS-e820: 00000000ffe60000 - 0000000100000000 (reserved)
[    0.000000]  BIOS-e820: 0000000100000000 - 000000011c000000 (usable)
[    0.000000] DMI 2.4 present.
[    0.000000] last_pfn = 0x11c000 max_arch_pfn = 0x100000000
[    0.000000] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
[    0.000000] last_pfn = 0xdd04d max_arch_pfn = 0x100000000
[    0.000000] init_memory_mapping: 0000000000000000-00000000dd04d000
[    0.000000] last_map_addr: dd04d000 end: dd04d000
[    0.000000] init_memory_mapping: 0000000100000000-000000011c000000
[    0.000000] last_map_addr: 11c000000 end: 11c000000
[    0.000000] RAMDISK: 37bed000 - 37fefadd
[    0.000000] ACPI: RSDP 000FB9F0, 0024 (r2 DELL  )
[    0.000000] ACPI: XSDT DD051E00, 0074 (r1 DELL    M09     27D80A0A ASL        61)
[    0.000000] ACPI: FACP DD051C9C, 00F4 (r4 DELL    M09     27D80A0A ASL        61)
[    0.000000] ACPI Warning (tbfadt-0568): 32/64X length mismatch in Gpe0Block: 128/64 [20081204]
[    0.000000] FADT: X_PM1a_EVT_BLK.bit_width (16) does not match PM1_EVT_LEN (4)
[    0.000000] ACPI: DSDT DD052400, 64B2 (r2 INT430 SYSFexxx     1001 INTL 20050624)
[    0.000000] ACPI: FACS DD060C00, 0040
[    0.000000] ACPI: HPET DD051F00, 0038 (r1 DELL    M09            1 ASL        61)
[    0.000000] ACPI: DMAR DD060400, 0120 (r1 DELL    M09     27D80A0A ASL        61)
[    0.000000] ACPI: APIC DD052000, 0068 (r1 DELL    M09     27D80A0A ASL        47)
[    0.000000] ACPI: ASF! DD051C00, 006A (r32 DELL    M09     27D80A0A ASL        61)
[    0.000000] ACPI: MCFG DD051FC0, 003E (r16 DELL    M09     27D80A0A ASL        61)
[    0.000000] ACPI: SLIC DD05209C, 0176 (r1 DELL    M09     27D80A0A ASL        61)
[    0.000000] ACPI: TCPA DD052300, 0032 (r1                        0 ASL         0)
[    0.000000] ACPI: BOOT DD051BC0, 0028 (r1 DELL    M09     27D80A0A ASL        61)
[    0.000000] ACPI: SSDT DD050331, 066C (r1  PmRef    CpuPm     3000 INTL 20050624)

...

[    0.408471] DMAR:Host address width 36  
[    0.408575] DMAR:DRHD (flags: 0x00000000)base: 0x00000000fed10000
[    0.408688] DMAR:DRHD (flags: 0x00000000)base: 0x00000000fed11000
[    0.408799] DMAR:DRHD (flags: 0x00000000)base: 0x00000000fed12000
[    0.408909] DMAR:DRHD (flags: 0x00000001)base: 0x00000000fed13000
[    0.409020] DMAR:RMRR base: 0x00000000dd7e7000 end: 0x00000000dd7fffff
[    0.409129] DMAR:RMRR base: 0x00000000ddc00000 end: 0x00000000dfffffff
[    0.409296] IOMMU 0xfed12000: using Register based invalidation
[    0.409403] IOMMU 0xfed11000: using Register based invalidation
[    0.409510] IOMMU 0xfed10000: using Register based invalidation
[    0.409617] IOMMU 0xfed13000: using Register based invalidation
[    0.409724] IOMMU: Setting identity map for device 0000:00:02.0 [0xddc00000 - 0xe0000000]
[    0.409754] IOMMU: Setting identity map for device 0000:00:02.1 [0xddc00000 - 0xe0000000]
[    0.412569] IOMMU: Setting identity map for device 0000:00:1d.0 [0xdd7e7000 - 0xdd800000]
[    0.412763] IOMMU: Setting identity map for device 0000:00:1d.1 [0xdd7e7000 - 0xdd800000]
[    0.412953] IOMMU: Setting identity map for device 0000:00:1d.2 [0xdd7e7000 - 0xdd800000]
[    0.413146] IOMMU: Setting identity map for device 0000:00:1d.7 [0xdd7e7000 - 0xdd800000]
[    0.413336] IOMMU: Setting identity map for device 0000:00:1a.0 [0xdd7e7000 - 0xdd800000]
[    0.413526] IOMMU: Setting identity map for device 0000:00:1a.1 [0xdd7e7000 - 0xdd800000]
[    0.413716] IOMMU: Setting identity map for device 0000:00:1a.2 [0xdd7e7000 - 0xdd800000]
[    0.413906] IOMMU: Setting identity map for device 0000:00:1a.7 [0xdd7e7000 - 0xdd800000]
[    0.414099] IOMMU: gfx device 0000:00:02.0 1-1 mapping
[    0.414205] IOMMU: Setting identity map for device 0000:00:02.0 [0x0 - 0x9b000]
[    0.414423] IOMMU: Setting identity map for device 0000:00:02.0 [0x100000 - 0xdd04d000]
[    0.674461] IOMMU: Setting identity map for device 0000:00:02.0 [0x100000000 - 0x11c000000]
[    0.707494] IOMMU: gfx device 0000:00:02.1 1-1 mapping
[    0.707603] IOMMU: Setting identity map for device 0000:00:02.1 [0x0 - 0x9b000]
[    0.707823] IOMMU: Setting identity map for device 0000:00:02.1 [0x100000 - 0xdd04d000]
[    0.967825] IOMMU: Setting identity map for device 0000:00:02.1 [0x100000000 - 0x11c000000]
[    1.000894] IOMMU: Prepare 0-16M unity mapping for LPC
[    1.001004] IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0x1000000] 
[    1.036022] PCI-DMA: Intel(R) Virtualization Technology for Directed I/O
[    1.036155] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0, 0
[    1.036464] hpet0: 4 comparators, 64-bit 14.318180 MHz counter

...

[   19.386279] iwlagn 0000:0c:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
[   19.386352] iwlagn 0000:0c:00.0: restoring config space at offset 0x1 (was 0x100102, writing 0x100106)
[   19.386440] iwlagn 0000:0c:00.0: irq 34 for MSI/MSI-X
[   19.386497] iwlagn 0000:0c:00.0: firmware: requesting iwlwifi-5000-1.ucode
[   19.440326] iwlagn loaded firmware version 5.4.1.16
[   19.457483] DMAR:[DMA Write] Request device [0c:00.0] fault addr ff9df000 
[   19.457484] DMAR:[fault reason 05] PTE Write access is not set
[   19.457583] DMAR:[DMA Write] Request device [0c:00.0] fault addr ff9dd000 
[   19.457584] DMAR:[fault reason 05] PTE Write access is not set
[   19.596507] DMAR:[DMA Write] Request device [0c:00.0] fault addr ffbe3000 
[   19.596509] DMAR:[fault reason 05] PTE Write access is not set
[   19.597114] DMAR:[DMA Write] Request device [0c:00.0] fault addr ffbe1000 
[   19.597115] DMAR:[fault reason 05] PTE Write access is not set
[   19.597191] DMAR:[DMA Write] Request device [0c:00.0] fault addr ffbe0000 
[   19.597192] DMAR:[fault reason 05] PTE Write access is not set
[   19.597269] DMAR:[DMA Write] Request device [0c:00.0] fault addr ffbdf000 
[   19.597270] DMAR:[fault reason 05] PTE Write access is not set
[   19.597341] DMAR:[DMA Write] Request device [0c:00.0] fault addr ffbde000 
[   19.597342] DMAR:[fault reason 05] PTE Write access is not set
[   19.597410] DMAR:[DMA Write] Request device [0c:00.0] fault addr ffbdc000 
[   19.597411] DMAR:[fault reason 05] PTE Write access is not set
[   19.599351] DMAR:[DMA Write] Request device [0c:00.0] fault addr ffbd9000 
[   19.599352] DMAR:[fault reason 05] PTE Write access is not set
[   19.599582] DMAR:[DMA Write] Request device [0c:00.0] fault addr ffbd8000 
[   19.599583] DMAR:[fault reason 05] PTE Write access is not set
[   19.599607] DMAR:[DMA Write] Request device [0c:00.0] fault addr ffbd7000 
[   19.599608] DMAR:[fault reason 05] PTE Write access is not set
[   19.599889] Registered led device: iwl-phy0:radio
[   19.599902] Registered led device: iwl-phy0:assoc
[   19.599915] Registered led device: iwl-phy0:RX
[   19.599928] Registered led device: iwl-phy0:TX
[   19.599940] DMAR:[DMA Write] Request device [0c:00.0] fault addr ffbd4000 
[   19.599941] DMAR:[fault reason 05] PTE Write access is not set
[   19.599945] DMAR:[DMA Write] Request device [0c:00.0] fault addr ffbd3000 
[   19.599946] DMAR:[fault reason 05] PTE Write access is not set
[   19.603489] DMAR:[DMA Write] Request device [0c:00.0] fault addr fff8c000 
[   19.603490] DMAR:[fault reason 05] PTE Write access is not set
[   19.603532] DMAR:[DMA Write] Request device [0c:00.0] fault addr fff8b000 
[   19.603533] DMAR:[fault reason 05] PTE Write access is not set
[   19.603549] DMAR:[DMA Write] Request device [0c:00.0] fault addr fff88000 
[   19.603550] DMAR:[fault reason 05] PTE Write access is not set
[   19.603944] DMAR:[DMA Write] Request device [0c:00.0] fault addr fff86000 
[   19.603945] DMAR:[fault reason 05] PTE Write access is not set
[   19.603978] DMAR:[DMA Write] Request device [0c:00.0] fault addr fff85000 

(note that after the above and some more DMAR messages, iwlagn still works at least well enough to associate with an AP and ssh out!)

Then later attempt to rsync hundreds of MB over eth0 (e1000e) and it falls over:

[  269.369656] DMAR:[fault reason 05] PTE Write access is not set
[  269.779281] DMAR:[DMA Write] Request device [0c:00.0] fault addr fff8d000 
[  269.779283] DMAR:[fault reason 05] PTE Write access is not set
[  269.880362] DMAR:[DMA Write] Request device [0c:00.0] fault addr fff8d000 
[  269.880364] DMAR:[fault reason 05] PTE Write access is not set
[  269.982169] DMAR:[DMA Write] Request device [0c:00.0] fault addr fff8d000 
[  269.982171] DMAR:[fault reason 05] PTE Write access is not set
[  270.394169] DMAR:[DMA Write] Request device [0c:00.0] fault addr ffd5b000 
[  270.394170] DMAR:[fault reason 05] PTE Write access is not set
[  271.927601] DMAR:[DMA Write] Request device [0c:00.0] fault addr ffdbf000 
[  271.927603] DMAR:[fault reason 05] PTE Write access is not set
[  272.234770] DMAR:[DMA Write] Request device [0c:00.0] fault addr ffdbf000 
[  272.234771] DMAR:[fault reason 05] PTE Write access is not set
[  272.337496] DMAR:[DMA Write] Request device [0c:00.0] fault addr fff1f000 
[  272.337498] DMAR:[fault reason 05] PTE Write access is not set
[  273.259504] DMAR:[DMA Read] Request device [00:19.0] fault addr fff6c000 
[  273.259505] DMAR:[fault reason 06] PTE Read access is not set
[  274.816140] 0000:00:19.0: eth0: Detected Tx Unit Hang:
[  274.816141]   TDH                  <1a>
[  274.816148]   TDT                  <3>
[  274.816148]   next_to_use          <3>
[  274.816149]   next_to_clean        <17>
[  274.816150] buffer_info[next_to_clean]:
[  274.816150]   time_stamp           <ffffe5e2>
[  274.816151]   next_to_watch        <1a>
[  274.816152]   jiffies              <ffffe768>
[  274.816152]   next_to_watch.status <0>
[  275.307808] DMAR:[DMA Write] Request device [0c:00.0] fault addr ffd27000 
[  275.307810] DMAR:[fault reason 05] PTE Write access is not set
[  276.126152] DMAR:[DMA Write] Request device [0c:00.0] fault addr fff0f000 
[  276.126154] DMAR:[fault reason 05] PTE Write access is not set
[  276.816323] 0000:00:19.0: eth0: Detected Tx Unit Hang:
[  276.816325]   TDH                  <1a>
[  276.816325]   TDT                  <3>
[  276.816326]   next_to_use          <3>
[  276.816326]   next_to_clean        <17>
[  276.816327] buffer_info[next_to_clean]:
[  276.816328]   time_stamp           <ffffe5e2>
[  276.816328]   next_to_watch        <1a>
[  276.816329]   jiffies              <ffffe95c>
[  276.816330]   next_to_watch.status <0>
[  278.788571] DMAR:[DMA Write] Request device [0c:00.0] fault addr ffd5f000 
[  278.788573] DMAR:[fault reason 05] PTE Write access is not set
[  278.816330] 0000:00:19.0: eth0: Detected Tx Unit Hang:
[  278.816331]   TDH                  <1a>
[  278.816331]   TDT                  <3>
[  278.816332]   next_to_use          <3>
[  278.816333]   next_to_clean        <17>
[  278.816333] buffer_info[next_to_clean]:
[  278.816334]   time_stamp           <ffffe5e2>
[  278.816335]   next_to_watch        <1a>
[  278.816335]   jiffies              <ffffeb50>
[  278.816336]   next_to_watch.status <0>
[  279.300570] DMAR:[DMA Write] Request device [0c:00.0] fault addr fff17000 
[  279.300572] DMAR:[fault reason 05] PTE Write access is not set
[  280.324596] DMAR:[DMA Write] Request device [0c:00.0] fault addr ffbef000 
[  280.324598] DMAR:[fault reason 05] PTE Write access is not set
[  280.631797] DMAR:[DMA Write] Request device [0c:00.0] fault addr fff1f000 
[  280.631799] DMAR:[fault reason 05] PTE Write access is not set
[  280.816304] 0000:00:19.0: eth0: Detected Tx Unit Hang:
[  280.816305]   TDH                  <1a>
[  280.816306]   TDT                  <3>
[  280.816307]   next_to_use          <3>
[  280.816308]   next_to_clean        <17>
[  280.816309] buffer_info[next_to_clean]:
[  280.816310]   time_stamp           <ffffe5e2>
[  280.816316]   next_to_watch        <1a>
[  280.816317]   jiffies              <ffffed44>
[  280.816318]   next_to_watch.status <0>
[  280.939243] DMAR:[DMA Write] Request device [0c:00.0] fault addr fff1f000 
[  280.939245] DMAR:[fault reason 05] PTE Write access is not set
[  281.860614] DMAR:[DMA Write] Request device [0c:00.0] fault addr ffd4b000 
[  281.860616] DMAR:[fault reason 05] PTE Write access is not set
[  282.372661] DMAR:[DMA Write] Request device [0c:00.0] fault addr fff23000 
[  282.372662] DMAR:[fault reason 05] PTE Write access is not set
[  282.816260] ------------[ cut here ]------------
[  282.816263] WARNING: at net/sched/sch_generic.c:226 dev_watchdog+0xcd/0x184()
[  282.816266] Hardware name: Latitude E4300                  
[  282.816268] NETDEV WATCHDOG: eth0 (e1000e): transmit timed out
[  282.816271] Modules linked in: pci_slot sbp2 btusb iwlagn iwlcore rfkill lib80211 button battery ac ohci1394 ieee1394 e1000e thermal fan
[  282.816288] Pid: 0, comm: swapper Not tainted 2.6.29-rc3-dmar #3
[  282.816291] Call Trace:
[  282.816293]  <IRQ>  [<ffffffff81044947>] warn_slowpath+0xd3/0x10f
[  282.816305]  [<ffffffff81486470>] ? _spin_lock_irqsave+0x36/0x3f
[  282.816316]  [<ffffffff8105f616>] ? getnstimeofday+0x58/0xb4
[  282.816318]  [<ffffffff81486550>] ? _spin_lock+0x17/0x1a
[  282.816320]  [<ffffffff81392227>] ? netif_tx_lock+0x72/0x8c
[  282.816323]  [<ffffffff81392241>] ? dev_watchdog+0x0/0x184
[  282.816325]  [<ffffffff8139230e>] dev_watchdog+0xcd/0x184
[  282.816328]  [<ffffffff81392241>] ? dev_watchdog+0x0/0x184
[  282.816331]  [<ffffffff8104dc85>] run_timer_softirq+0x1a3/0x232
[  282.816334]  [<ffffffff81049e31>] __do_softirq+0x8a/0x151
[  282.816338]  [<ffffffff810127dc>] call_softirq+0x1c/0x30
[  282.816340]  [<ffffffff81013860>] do_softirq+0x44/0x8f
[  282.816342]  [<ffffffff81049bb3>] irq_exit+0x3f/0x79
[  282.816345]  [<ffffffff81022e5b>] smp_apic_timer_interrupt+0x93/0xac
[  282.816348]  [<ffffffff810121b3>] apic_timer_interrupt+0x13/0x20
[  282.816350]  <EOI> <4>---[ end trace 6168acfc226aca2d ]---
[  282.912504] DMAR:[DMA Write] Request device [0c:00.0] fault addr fff23000 
[  282.912505] DMAR:[fault reason 05] PTE Write access is not set
[  283.601446] DMAR:[DMA Write] Request device [0c:00.0] fault addr ffb2b000 
[  283.601448] DMAR:[fault reason 05] PTE Write access is not set
[  283.703850] DMAR:[DMA Write] Request device [0c:00.0] fault addr ffb2b000 
[  283.703852] DMAR:[fault reason 05] PTE Write access is not set
[  284.727905] DMAR:[DMA Write] Request device [0c:00.0] fault addr fff0f000 
[  284.727907] DMAR:[fault reason 05] PTE Write access is not set
[  285.651214] DMAR:[DMA Write] Request device [0c:00.0] fault addr ffd1f000 
[  285.651216] DMAR:[fault reason 05] PTE Write access is not set
[  285.992987] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[  288.004772] DMAR:[DMA Write] Request device [0c:00.0] fault addr fff17000 
[  288.004773] DMAR:[fault reason 05] PTE Write access is not set
[  288.108245] DMAR:[DMA Write] Request device [0c:00.0] fault addr fff17000 
[  288.108246] DMAR:[fault reason 05] PTE Write access is not set
[  289.438590] DMAR:[DMA Write] Request device [0c:00.0] fault addr ffd0b000 
[  289.438591] DMAR:[fault reason 05] PTE Write access is not set
[  289.643496] DMAR:[DMA Write] Request device [0c:00.0] fault addr ffd0b000 
[  289.643498] DMAR:[fault reason 05] PTE Write access is not set
[  290.871922] DMAR:[DMA Write] Request device [0c:00.0] fault addr ffdbf000 
[  290.871924] DMAR:[fault reason 05] PTE Write access is not set
[  291.804344] 0000:00:19.0: eth0: Detected Tx Unit Hang:
[  291.804345]   TDH                  <0>
[  291.804346]   TDT                  <1>
[  291.804346]   next_to_use          <1>
[  291.804347]   next_to_clean        <0>
[  291.804347] buffer_info[next_to_clean]:
[  291.804348]   time_stamp           <fffff59a>
[  291.804349]   next_to_watch        <0>
[  291.804349]   jiffies              <fffff7ff>
[  291.804350]   next_to_watch.status <0>
[  292.817665] DMAR:[DMA Write] Request device [0c:00.0] fault addr ffd43000 
[  292.817667] DMAR:[fault reason 05] PTE Write access is not set
[  293.124854] DMAR:[DMA Write] Request device [0c:00.0] fault addr ffd43000 
[  293.124856] DMAR:[fault reason 05] PTE Write access is not set
[  293.227634] DMAR:[DMA Write] Request device [0c:00.0] fault addr fff0f000 
[  293.227636] DMAR:[fault reason 05] PTE Write access is not set
[  294.148916] DMAR:[DMA Write] Request device [0c:00.0] fault addr fff0f000 
[  294.148918] DMAR:[fault reason 05] PTE Write access is not set
[  294.251238] DMAR:[DMA Write] Request device [0c:00.0] fault addr fff0f000 
[  294.251240] DMAR:[fault reason 05] PTE Write access is not set

Similar failures on iwlagn, but they're somewhat harder to reproduce since I have limited wifi bandwidth available.
Comment 1 Andy Isaacson 2009-01-29 16:24:50 UTC
ah, neglected to mention that both e1000e and iwlagn are rock solid (GB of transfer without problem) with CONFIG_DMAR=n.
Comment 2 David Woodhouse 2009-01-29 16:38:36 UTC
There are classes of driver bug which wouldn't show up, so one thing to try would be the DMA API debug infrastructure posted by Jörg Rödel recently:
https://lists.linux-foundation.org/pipermail/iommu/2009-January/000944.html

That would eliminate the majority of possible driver bugs. I should be home early next week and able to look into this in detail then.
Comment 3 Andy Isaacson 2009-01-29 18:02:43 UTC
Linking https://bugzilla.redhat.com/show_bug.cgi?id=479996 as there are some useful hints there (another workaround is to boot with intel_iommu=off).
Comment 4 Mark McLoughlin 2009-01-30 00:37:47 UTC
(In reply to comment #2)
> There are classes of driver bug which wouldn't show up, so one thing to try
> would be the DMA API debug infrastructure posted by Jörg Rödel recently:
> https://lists.linux-foundation.org/pipermail/iommu/2009-January/000944.html
> 
> That would eliminate the majority of possible driver bugs. I should be home
> early next week and able to look into this in detail then.

Adar tried DMA_API_DEBUG and it didn't show anything up:

  https://bugzilla.redhat.com/479996#c22
Comment 5 David Woodhouse 2009-02-05 01:07:33 UTC
Created attachment 20120 [details]
test patch

Can you try this debug patch, just in case it shows anything interesting?
(It highlights a real bug in domain_update_iommu_coherency() where we were previously setting domain->iommu_coherency to 1 temporarily, but I don't think that's very likely to be the problem here unless you have lots of hotplug events).
Comment 6 David Woodhouse 2009-02-05 03:22:13 UTC
Created attachment 20123 [details]
another test patch

This separate test patch applies on top of Jörg's dma api debugging infrastructure, and it will dump the kernel's idea of which regions are mapped for DMA, when the fault happens.

If the kernel thinks that the faulting address _should_ be mapped, I'll do a further patch which will take a closer look at the actual DMA page tables...
Comment 7 Bhavesh Davda 2009-02-05 10:05:13 UTC
(In reply to comment #6)
> If the kernel thinks that the faulting address _should_ be mapped, I'll do a
> further patch which will take a closer look at the actual DMA page tables...

That would be quite handy! And FWIW, I tried something similar to your patch that ignores the coherency bit in the iommu ecap register and always clflushes the PTEs, but no luck. See https://bugzilla.redhat.com/show_bug.cgi?id=479996 comment #37.

 
Comment 8 Andy Isaacson 2009-02-05 14:10:00 UTC
Created attachment 20127 [details]
dmesg with dmar-coherency-debug.patch

The patch (attachment 20120 [details]) didn't change the failure, but there is a bit more debug output now.
Comment 9 Andy Isaacson 2009-02-05 14:43:42 UTC
Created attachment 20130 [details]
dmesg with test patch

It'd be easiest for me if you simply provide a git pull URL, that way there's no confusion as to where I should apply the patch...

I applied attachment 20123 [details] on top of 5b4ea830 (the tip of joro/linux-2.6-iommu.git dma-api/debug) resulting in my local 19c691b and captured the attached dmesg (I only enclosed the first trace -- the total kernel log for this boot is 23 MB!).  Note some interesting printk buffer corruption around 18.167980.
Comment 10 David Woodhouse 2009-02-05 14:59:01 UTC
You applied it to the correct tree; thanks.

Looks like I should refine the debug patch a little -- to dump only entries for the affected device, and perhaps some locking to prevent simultaneous dumps.

The first one looks quite simple though. From:

[   18.165699] DMAR:[DMA Write] Request device [0c:00.0] fault addr ff9df000 
[   18.165700] DMAR:[fault reason 05] PTE Write access is not set
 ... to ...
[   18.167389] e1000e 0000:00:19.0: coherent idx 255 P=1189d5000 D=fffff000 L=1000 DMA_BIDIRECTIONAL

It doesn't look like 0xFF9DF000 is actually supposed to be mapped. The closest entries in the list are 

[   18.166928] e1000e 0000:00:19.0: single idx 206 P=115949012 D=fff9d012 L=5f2 DMA_FROM_DEVICE
 ... and ...
[   18.167155] iwlagn 0000:0c:00.0: single idx 239 P=118c51800 D=ff9de800 L=400 DMA_TO_DEVICE

At first glance, unless the list is lossy for some reason, I'm inclined to suspect that the fault here is with the driver -- is it DMAing into a buffer it's already unmapped?
Comment 11 David Woodhouse 2009-02-05 15:08:21 UTC
The second one, however...

[   18.167459] DMAR:[DMA Write] Request device [0c:00.0] fault addr ff9dd000 
[   18.167460] DMAR:[fault reason 05] PTE Write access is not set
[   18.167700] iwlagn 0000:0c:00.0: single idx 0 P=114ed0000 D=ffe00000 L=2100 DMA_FROM_DEVICE
 ...
[   18.327515] iwlagn 0000:0c:00.0: single idx 238 P=114860000 D=ff9dc000 L=2100 DMA_FROM_DEVICE
 ...

0xff9dc000 + 0x2100 is 0xff9de100, so a DMA write at ff9dd000 ought to be within that range, and permitted by the IOMMU. I'll knock up something which verifies that the CPU's view of the actual page tables would allow that.

And if _that_ looks OK, we're back to muttering nasty words about cache-incoherency.
Comment 12 David Woodhouse 2009-02-05 15:59:09 UTC
Created attachment 20131 [details]
further debugging patch (incremental)

OK, this patch ought to dump the actual iommu page table entry for the offending address, when a fault happens. I say 'ought'. I haven't tested it here, and it's about 5 minutes to Friday. It does at least build.
Comment 13 David Woodhouse 2009-02-05 16:39:33 UTC
On the topic of 'nasty words about cache-incoherency'... can you try Bhavesh's patch at https://bugzilla.redhat.com/show_bug.cgi?id=479996#c41
Comment 14 Bhavesh Davda 2009-02-05 16:44:45 UTC
Just to clarify: my patch is against 2.6.28 vanilla. But you would have figured that out :)
Comment 15 Andy Isaacson 2009-02-06 12:04:41 UTC
Created attachment 20140 [details]
dmesg with debug-me-harder.patch

The first 10,000 lines of dmesg output with debug-me-harder.patch.  Note that I didn't actually notice any device failures on this boot.  The kernel is 5b4ea830 + attachment 20123 [details] + attachment 20131 [details] (debug-me-harder.patch).
Comment 16 Andy Isaacson 2009-02-06 14:26:20 UTC
Created attachment 20141 [details]
dmesg.2.6.28-dmar-00001-g8aeaf45

(In reply to comment #14)
> Just to clarify: my patch is against 2.6.28 vanilla. But you would have
> figured
> that out :)

Indeed. :)  The udelay did *not* fix the problem for me, please find attached a dmesg with udelay.  This is upstream 2.6.28, 4a6908a3 plus:

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 3d017cf..409d1fc 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -317,6 +317,7 @@ static inline void __iommu_flush_cache(
 {
        if (!ecap_coherent(iommu->ecap))
                clflush_cache_range(addr, size);
+       udelay(2);
 }
 
 extern struct dmar_drhd_unit * dmar_find_matched_drhd_unit(struct pci_dev *dev);
Comment 17 Andy Isaacson 2009-02-06 15:32:34 UTC
Increasing it to udelay(20) made my boot take a lot longer (about 61 seconds to userland) but I can still reproduce similar failures:

[  494.448749] DMAR:[DMA Read] Request device [00:19.0] fault addr ffea5000 
[  494.448750] DMAR:[fault reason 06] PTE Read access is not set
[  496.816437] 0000:00:19.0: eth0: Detected Tx Unit Hang:
[  496.816438]   TDH                  <7a>
[  496.816439]   TDT                  <e3>
[  496.816440]   next_to_use          <e3>
[  496.816441]   next_to_clean        <78>
[  496.816442] buffer_info[next_to_clean]:
[  496.816443]   time_stamp           <10000bde3>
[  496.816444]   next_to_watch        <7a>
[  496.816445]   jiffies              <10000c034>
[  496.816446]   next_to_watch.status <0>
[  498.816204] 0000:00:19.0: eth0: Detected Tx Unit Hang:
[  498.816205]   TDH                  <7a>
[  498.816205]   TDT                  <e3>
[  498.816206]   next_to_use          <e3>
[  498.816207]   next_to_clean        <78>
[  498.816207] buffer_info[next_to_clean]:
[  498.816208]   time_stamp           <10000bde3>
[  498.816208]   next_to_watch        <7a>
[  498.816209]   jiffies              <10000c228>
[  498.816210]   next_to_watch.status <0>
Comment 18 David Woodhouse 2009-02-10 00:35:49 UTC
Created attachment 20174 [details]
Potential fix

Please could you test with this patch (against an otherwise clean kernel; especially make sure you take the udelay out).
Comment 19 David Woodhouse 2009-02-10 00:37:41 UTC
Created attachment 20175 [details]
potential fix, really.

Hm, when it said 'Attach URL' I thought it was going to do something more useful than that. Here's the patch for real this time.
Comment 20 Adar Dembo 2009-02-10 01:15:08 UTC
David, I tested your patch against a clean 2.6.29-rc4 tree, using both rwbf_flush and rwbf_iotlb_flush. Neither worked for me; I am still seeing the same issue I reported in https://bugzilla.redhat.com/show_bug.cgi?id=479996#c17.
Comment 21 Adar Dembo 2009-02-10 01:19:42 UTC
I'm sorry, I just took a look at your patch and realized that the one I had tested (from another Intel engineer via Bhavesh) is different than this one; it did not have the "dummy IOTLB flush only a single page" case.

Tomorrow I'll apply your patch, run it through that particular case (looks like it's the default path taken), and see if it helps.
Comment 22 David Woodhouse 2009-02-10 01:48:07 UTC
That other patch was from me too, and wasn't significantly different. This one just has a couple of minor optimisations (single-page flush, not a whole domain, and don't fall through to doing the ineffective write-buffer flush after doing the IOTLB flush).

We thought Bhavesh had said the original workaround _was_ working...

Can you make it follow the workaround path _always_, rather than only if (!cap_rwbf(iommu->cap)) ? 
Comment 23 Adar Dembo 2009-02-10 01:55:41 UTC
(In reply to comment #22)
> We thought Bhavesh had said the original workaround _was_ working...

Yes, the workaround did appear to solve the problem on Bhavesh's laptop (the Thinkpad x200), but unfortunately, not on my own Dell Precision 5400.
 
> Can you make it follow the workaround path _always_, rather than only if
> (!cap_rwbf(iommu->cap)) ? 

Sure, I'll try that out tomorrow as well.
Comment 24 Bhavesh Davda 2009-02-10 08:52:54 UTC
(In reply to comment #22)
> That other patch was from me too, and wasn't significantly different. This
> one
> just has a couple of minor optimisations (single-page flush, not a whole
> domain, and don't fall through to doing the ineffective write-buffer flush
> after doing the IOTLB flush).

Pretty much, except that I had changed the quirk param to an explicit intel_iommu_quirk boot time string parameter for my own sanity and testing.

> We thought Bhavesh had said the original workaround _was_ working...

Yes on my Lenovo x200 and on a Dell Latitude E6400 notebook, but not on a Dell Precision T5400 workstation.

> Can you make it follow the workaround path _always_, rather than only if
> (!cap_rwbf(iommu->cap)) ? 

If Adar doesn't get around to this, I can try it on his machine.
Comment 25 Bhavesh Davda 2009-02-10 08:59:52 UTC
Also BTW, I tried your patch from comment #19 after backing out my previous patch on 2.6.29-rc4, and it still works on my Lenovo x200 with the default intel_iommu.quirks = 3 (as in no more IOMMU DMA address translation faults and no more e1000e Tx Unit Hangs as a result)


(In reply to comment #24)
> > Can you make it follow the workaround path _always_, rather than only if
> > (!cap_rwbf(iommu->cap)) ? 
> 
> If Adar doesn't get around to this, I can try it on his machine.

Just a quick clarification on that one: do you want to try the single page flush workaround or the full domain flush workaround?
Comment 26 David Woodhouse 2009-02-10 09:20:47 UTC
T5400 is a different chipset and any problem there is probably something different. We revert to our normal stance of assuming the BIOS is wrong, and getting that tested would be the first thing to do.

To confirm: (Adar, T5400) you reported that adding udelay(2) _did_ fix the problem for you, didn't you?

(I don't mind whether you try single-page workaround or full domain flush. They should both have the same effect and I just switched from one to the other as an optimisation, but I don't think that's the issue on the T5400 anyway).
Comment 27 Bhavesh Davda 2009-02-10 09:25:25 UTC
(In reply to comment #26)
> T5400 is a different chipset and any problem there is probably something
> different. We revert to our normal stance of assuming the BIOS is wrong, and
> getting that tested would be the first thing to do.

Yup. Working out of band to do the chipset diagnostics requested on the T5400 and send along the information to you.

> (I don't mind whether you try single-page workaround or full domain flush.
> They
> should both have the same effect and I just switched from one to the other as
> an optimisation, but I don't think that's the issue on the T5400 anyway).

I'm going to hold off on that since I agree that this may indeed be a BIOS buggy DMAR ACPI table issue. Unless Adar gets around to trying the workaround any ways...
Comment 28 Andy Isaacson 2009-02-10 13:42:17 UTC
Created attachment 20184 [details]
dmesg with A05 bios + 33bfad5

(In reply to comment #19)
> Created an attachment (id=20175) [details]
> potential fix, really.
> 
> Hm, when it said 'Attach URL' I thought it was going to do something more
> useful than that. Here's the patch for real this time.

So the e4300 I've been testing on finally had its RMA replacement arrive (RMAed due to a cosmetic flaw with the LCD).  The replacement has BIOS A05 versus original's A03 BIOS.  With the new BIOS I haven't yet reproduced a full-on device failure (tested with 33bfad5 and 33bfad5+flush-debug.patch), although I am still seeing a fairly steady stream of DMAR errors.  I'll try out your fix on the A03 BIOS tomorrow.
Comment 29 Adar Dembo 2009-02-10 16:09:31 UTC
(In reply to comment #22)
> That other patch was from me too, and wasn't significantly different. This
> one
> just has a couple of minor optimisations (single-page flush, not a whole
> domain, and don't fall through to doing the ineffective write-buffer flush
> after doing the IOTLB flush).
> 
> We thought Bhavesh had said the original workaround _was_ working...
> 
> Can you make it follow the workaround path _always_, rather than only if
> (!cap_rwbf(iommu->cap)) ? 

I gave this a shot (with rwbf_flush and rwbf_iotlb_flush) and neither seemed to make any difference.
Comment 30 Andy Isaacson 2009-02-12 18:06:56 UTC
(In reply to comment #18)
> Please could you test with this patch (against an otherwise clean kernel;
> especially make sure you take the udelay out).

I tested with your flush-debug.patch on top of 33bfad5 from Linus' tree on the Dell E4300 with A03 BIOS.  I still see DMAR errors associated with iwlagn (if I "sudo ifconfig wlan0 down" the DMAR errors stop).  But on the plus side, I can successfully transfer large amounts over e1000e without getting any DMAR errors or device lockups.  And, although I do see a steady stream of iwlagn-associated DMAR error messages, the iwlagn seems to work OK.

I've got acpidump output if you'd like, and note that upgrading to BIOS A05 seems to resolve the device failures (though not the DMAR error messages).

So it looks like either your patch or A05 fixes the device failures, and iwlagn has some cosmetic-but-noisy driver DMA bug?
Comment 31 David Woodhouse 2009-02-13 00:14:54 UTC
iwlagn has a cosmetic-but-noisy driver DMA bug. It writes back to descriptors which ought to be read-only.

http://david.woodhou.se/iwlfix has the fix, as does the Fedora rawhide kernel.
Comment 32 David Woodhouse 2009-02-13 08:50:38 UTC
Created attachment 20227 [details]
updated iommu patch

Please could you retest with _just_ this patch, and the wireless fix from http://david.woodhou.se/iwlfix 

We believe that once the wireless bug is out of the way and not confusing us, this patch is all that's actually needed for the IOMMU.

Thanks.
Comment 33 Bhavesh Davda 2009-02-13 09:01:12 UTC
David, I'm wondering if I should also be using your new patch from comment #32 instead of the older patch (from comment #19) that had 3 quirk modes. I'm also wondering when you plan to send an official patch out for review to the mailing lists. Thanks!
Comment 34 David Woodhouse 2009-02-13 09:12:20 UTC
(In reply to comment #33)
> David, I'm wondering if I should also be using your new patch from comment
> #32
> instead of the older patch (from comment #19) that had 3 quirk modes. 

Yes, you should.

> I'm also wondering when you plan to send an official patch out for review to
> > the mailing lists. Thanks!

Just as soon as you tell me it works for you :)
Comment 35 Bhavesh Davda 2009-02-13 10:04:21 UTC
I tested with the patch in comment #32, and it works as well as the patch in comment #19 with quirk mode 1: I don't get any more "PTE Read" faults from the IOMMU for DMA's from the e1000e NIC, but I still get the one "PTE Write" fault from the IOMMU for the VGA controller on my Lenovo x200:

[    0.216043] DMAR: Forcing write-buffer flush capability
[    0.928007] DMAR:[DMA Write] Request device [00:02.0] fault addr feff5e000
[    0.928007] DMAR:[fault reason 05] PTE Write access is not set
00:02.0 VGA compatible controller: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller (rev 07)

I noticed that you don't flush the IOTLB any more in this new patch.
Comment 36 David Woodhouse 2009-02-13 10:12:35 UTC
Right, we don't flush the IOTLB any more. We don't believe that's actually necessary.

You were still seeing the single VGA fault at startup, even when you had the more paranoid version of the patch which _did_ flush the IOTLB... weren't you?

I think that's a separate bug. Or just that the graphics engine is still doing something when the kernel starts up, and we force it to stop (I see something similar from the USB controllers on my test box).
Comment 37 Bhavesh Davda 2009-02-13 10:17:19 UTC
Actually, the VGA DMA fault had disappeared with the most aggressive of your patches: always flush the IOTLB for the whole domain. Across multiple boots.
Comment 38 David Woodhouse 2009-02-13 10:24:58 UTC
I suspect that's a separate issue, and it's just coincidence that it goes away.

Can you confirm my suspicion that you have CONFIG_DMAR_GFX_WA set anyway? Which means that we're giving direct 1:1 access to the VGA device and its mappings aren't taking this code path anyway?

I suspect it's just a timing thing. Will consult...
Comment 39 Bhavesh Davda 2009-02-13 10:35:55 UTC
(In reply to comment #38)
> Can you confirm my suspicion that you have CONFIG_DMAR_GFX_WA set anyway?
> Which
> means that we're giving direct 1:1 access to the VGA device and its mappings
> aren't taking this code path anyway?

Yup: CONFIG_DMAR_GFX_WA=y
Comment 40 Chris Wright 2009-02-13 11:04:50 UTC
Using the patch that includes quirk for iommu_flush_write_buffer plus flushes before flush_context() I still see:
[    0.157017] Cantiga chipset detected; enabling DMAR workarounds
...
[    0.742004] DMAR:[DMA Write] Request device [00:02.0] fault addr ffffff000 
[    0.742005] DMAR:[fault reason 05] PTE Write access is not set
Comment 41 David Woodhouse 2009-02-13 11:17:14 UTC
OK, that reassures me that it's a separate issue, and if it does go away for Bhavesh it's just a timing or some kind of coincidence.

There's just no reason why the graphics chipset should be writing to main memory at this stage in the boot anyway; I think it's got to be an issue with the way the BIOS has set it up, and the IOMMU is _supposed_ to be blocking it.
Comment 42 Bhavesh Davda 2009-02-13 11:22:27 UTC
FWIW, this DMA Write fault from the VGA device doesn't cause any ill side effects later any way, so is not as serious of an issue as those other DMA failures that used to happen much later at boot.

But I don't understand your claim: "the IOMMU is _supposed_ to be blocking it". Without an IOMMU in the picture, the device *was* accessing main memory and nobody was stopping it, right? Why the device was accessing main memory in the first place is beyond me, though.
Comment 43 Andy Isaacson 2009-02-13 11:25:43 UTC
Created attachment 20233 [details]
dmesg with A03, 33bfad5+iwlfix+rwbf-3.patch

(In reply to comment #32)
> Created an attachment (id=20227) [details]
> updated iommu patch
> 
> Please could you retest with _just_ this patch, and the wireless fix from
> http://david.woodhou.se/iwlfix 
> 
> We believe that once the wireless bug is out of the way and not confusing us,
> this patch is all that's actually needed for the IOMMU.

I took 33bfad5 and applied iwlfix and rwbf-3.patch on A03 BIOS.  The resulting kernel boots and seems to have fixed all the DMAR errors.  I've attached the complete dmesg.

So it looks like these two patches resolve the issue completely.

Thanks!
-andy
Comment 44 David Woodhouse 2009-02-13 11:28:22 UTC
I believe the address it was accessing was marked as 'reserved' in your E820 memory map from the BIOS? It is for Chris. So it's not necessarily 'main memory' according to the kernel. Perhaps the BIOS should have included it in an RMRR entry, to 'whitelist' accesses to that address?

The graphic chip has a 'hardware status page' which it can write to; we don't believe that the IOMMU is just _inventing_ a DMA write transaction. :)

Normally we'd expect that this status page would be disabled while the kernel is booting. But who are we to guess what the BIOS is playing at? 
Comment 45 Bhavesh Davda 2009-02-13 11:37:05 UTC
[    0.000000]  BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
[    0.000000]  BIOS-e820: 00000000ff800000 - 0000000100000000 (reserved)
[    0.000000]  BIOS-e820: 0000000100000000 - 000000013c000000 (usable)
[    0.928007] DMAR:[DMA Write] Request device [00:02.0] fault addr feff5e000
There's no entry for feff5e000 in the E820 map.
I wouldn't have expected the BIOS to be doing anything at the time this fault happened, but that's just BSing around without actually looking at the code. And no, I wasn't implying that the IOMMU made up that DMA write transaction: I was just wondering what the side effects of denying a DMA write from the VGA device with the IOMMU in the middle would be.
Comment 46 David Woodhouse 2009-02-13 16:01:17 UTC
Patch submitted to Linus:
https://lists.linux-foundation.org/pipermail/iommu/2009-February/001117.html
Comment 47 Adar Dembo 2009-02-17 18:05:16 UTC
Here's an update on my situation (with the Dell Precision T5400).

Having had no luck booting with David's patches, I tried a more drastic approach. Suspecting some sort of bug in Fusion drivers (for my SAS controller), I removed the controller from my system and plugged a SATA drive directly into the motherboard's SATA headers. I installed Ubuntu 8.10 on this drive, cloned linux-2.6, built a kernel with CONFIG_DMAR_*=y, and booted into it. I was able to successfully boot and the system appears to be quite usable (X was successfully started, I can access the network, etc.).

I do see a lot of DMAR "Present bit in root entry is clear" faults in my kernel log, and they appear to be related to the ixgbe driver powering the Oplin dual-port NIC in my system. I'm not using that NIC for anything, so I unbound the driver from the two devices and the DMAR fault messages have stopped. The system appears to still be OK, and I've attached the log with the messages.
Comment 48 Adar Dembo 2009-02-17 18:05:59 UTC
Created attachment 20292 [details]
2.6.29-rc5 kernel log containing DMAR faults from ixgbe
Comment 49 Adar Dembo 2009-02-26 18:05:57 UTC
David, any thoughts on the ixgbe issue? Perhaps it's indicative of a bug in the driver?
Comment 50 David Woodhouse 2009-06-09 17:08:24 UTC
Was the ixgbe issue fixed by http://git.kernel.org/linus/924b6231edfaf1e764ffb4f97ea382bf4facff58 ?
Comment 51 Adar Dembo 2009-06-09 18:56:24 UTC
(In reply to comment #50)
> Was the ixgbe issue fixed by
> http://git.kernel.org/linus/924b6231edfaf1e764ffb4f97ea382bf4facff58 ?

I suspect it was, as that commit fixed nearly all of my DMAR fault issues. I can't confirm, though, because the card has since departed from my system.
Comment 52 David Woodhouse 2009-06-09 18:58:53 UTC
Thanks. Let's close this bug then -- you know where to find me...