Hey, I just bought a Lenovo X201s (CPU is Core i7 LM640) and I have hard lockups in X when using intel_iommu=on. I'm reporting against DRI driver since the involved process is X and I recall some other issues with DMAR and graphics card, but feel free to reassign if it's not the correct component.

Basically, after some time using the laptop, there's a freeze. Using netconsole I managed to get a trace (which doesn't get saved in kern.log afaict):

[ 414.698653] Kernel panic - not syncing: DMAR hardware is malfunctioning
[ 414.698657]
[ 414.698817] Pid: 1482, comm: Xorg Not tainted 3.0.0-1-amd64 #1
[ 414.698914] Call Trace:
[ 414.698964] <IRQ> [<ffffffff81334c28>] ? panic+0x92/0x1a1
[ 414.699075] [<ffffffff81038189>] ? test_tsk_need_resched+0xe/0x17
[ 414.699174] [<ffffffff811ca67b>] ? __iommu_flush_iotlb+0x131/0x178
[ 414.699270] [<ffffffff811caa11>] ? flush_unmaps+0x66/0x11d
[ 414.699357] [<ffffffff811caadd>] ? flush_unmaps_timeout+0x15/0x25
[ 414.699453] [<ffffffff81052b04>] ? run_timer_softirq+0x1bf/0x28a
[ 414.699549] [<ffffffff8104c0ad>] ? raise_softirq_irqoff+0x9/0x2e
[ 414.699642] [<ffffffff811caac8>] ? flush_unmaps+0x11d/0x11d
[ 414.699732] [<ffffffff81066feb>] ? timekeeping_get_ns+0xd/0x2a
[ 414.699823] [<ffffffff8104bdd4>] ? __do_softirq+0xb9/0x178
[ 414.699911] [<ffffffff8133cc9c>] ? call_softirq+0x1c/0x30
[ 414.699997] [<ffffffff8100a9ef>] ? do_softirq+0x3f/0x84
[ 414.700080] [<ffffffff8104c040>] ? irq_exit+0x3f/0xa3
[ 414.700163] [<ffffffff8101f51e>] ? smp_apic_timer_interrupt+0x76/0x86
[ 414.700263] [<ffffffff8133c453>] ? apic_timer_interrupt+0x13/0x20
[ 414.700354] <EOI> [<ffffffff8133ba92>] ? system_call_fastpath+0x16/0x1b
[ 415.884326] panic occurred, switching back to text console
[ 415.884413] BUG: scheduling while atomic: Xorg/1482/0x10000100
[ 415.884500] Modules linked in: thinkpad_acpi nvram netconsole configfs nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables ext2 acpi_cpufreq mperf snd_hda_codec_hdmi snd_hda_codec_conexant arc4 iwlagn snd_hda_intel mac80211 snd_hda_codec snd_hwdep cfg80211 snd_pcm snd_timer btusb bluetooth i2c_i801 pcspkr psmouse snd tpm_tis rfkill battery serio_raw ac soundcore power_supply evdev snd_page_alloc tpm tpm_bios intel_ips wmi processor ext4 mbcache jbd2 crc16 aesni_intel cryptd aes_x86_64 aes_generic xts gf128mul dm_crypt dm_mod sg sd_mod crc_t10dif i915 drm_kms_helper drm i2c_algo_bit thermal ahci libahci e1000e ehci_hcd libata scsi_mod usbcore i2c_core video thermal_sys button [last unloaded: nvram]
[ 415.886115] CPU 0
[ 415.886149] Modules linked in: thinkpad_acpi nvram netconsole configfs nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables ext2 acpi_cpufreq mperf snd_hda_codec_hdmi snd_hda_codec_conexant arc4 iwlagn snd_hda_intel mac80211 snd_hda_codec snd_hwdep cfg80211 snd_pcm snd_timer btusb bluetooth i2c_i801 pcspkr psmouse snd tpm_tis rfkill battery serio_raw ac soundcore power_supply evdev snd_page_alloc tpm tpm_bios intel_ips wmi processor ext4 mbcache jbd2 crc16 aesni_intel cryptd aes_x86_64 aes_generic xts gf128mul dm_crypt dm_mod sg sd_mod crc_t10dif i915 drm_kms_helper drm i2c_algo_bit thermal ahci libahci e1000e ehci_hcd libata scsi_mod usbcore i2c_core video thermal_sys button [last unloaded: nvram]
[ 415.887749]
[ 415.887780] Pid: 1482, comm: Xorg Not tainted 3.0.0-1-amd64 #1 LENOVO 51434JG/51434JG
[ 415.887918] RIP: 0033:[<00007fe14fbc261e>] [<00007fe14fbc261e>] 0x7fe14fbc261d
[ 415.888039] RSP: 002b:00007fff5bdc7d50 EFLAGS: 00003246
[ 415.888119] RAX: 0000000000000000 RBX: ffffffff8133ba92 RCX: 0000000000000000
[ 415.888225] RDX: 000000000205a008 RSI: 0000000000000000 RDI: 000000000205a008
[ 415.888329] RBP: 0000000003e7b7b0 R08: 00000000039be000 R09: 00000000039be000
[ 415.888435] R10: 0000000002059f20 R11: 0000000000003246 R12: ffffffff8133c44e
[ 415.888540] R13: 0000000040406469 R14: 00007fff5bdc7d80 R15: 0000000000000000
[ 415.888646] FS: 00007fe150874880(0000) GS:ffff880137c00000(0000) knlGS:0000000000000000
[ 415.888764] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 415.888851] CR2: 0000000003b9c011 CR3: 0000000131e67000 CR4: 00000000000006f0
[ 415.888955] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 415.889060] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 415.889166] Process Xorg (pid: 1482, threadinfo ffff880132142000, task ffff88012f1a49f0)
[ 415.889282]
[ 415.889310] Call Trace:

and it ends here.

This is on Debian sid with Debian 3.0 kernel. I'm gonna drop the intel_iommu=on boot arg for now, and will try other args like igfx_off, but I'd like to use the I/O MMU when possible. Bios is up to date.

As I just bought the laptop I can't say if it's a regression in the kernel (3.0 was the first kernel I installed, I'll try from a GRML just in case) or in the BIOS (I don't think I had a lockup before upgrading but since I upgraded to latest BIOS pretty fast, it might not be meaningful).

I'll follow up on that bug with complete dmesg after boot (with and without intel_iommu=on), if you need any more information, please ask.
And note that while:

[ 415.884326] panic occurred, switching back to text console

is displayed, there's no switch, it stays in X.
The ironlake iommu does not work with the igfx due to a silicon bug, hence why the kernel disables it...
Hmh, I might be confused but I thought that it was just a matter of excluding igfx (setting up an identity mapping) from DMAR. Do you mean there's *no* way to use the I/O MMU on this laptop?
Created attachment 67082 [details] dmesg with no I/O MMU enabled
Created attachment 67092 [details] dmesg with intel_iommu=on
Created attachment 67102 [details] dmesg with intel_iommu=igfx_off

It seems that, for now, intel_iommu=igfx_off works fine (no lockup, though I've only been booted for half an hour).
Hum, indeed, with igfx_off it seems that it doesn't really use DMAR:

[ 0.939792] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
Ok, I might be wrong but couldn't the same kind of fix as http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=2d9e667e be applied? AIUI in the patch only GFX DMAR is disabled, for the relevant device ([8086:2a40] rev07). In my case my device is:

00:02.0 VGA compatible controller [0300]: Intel Corporation Core Processor Integrated Graphics Controller [8086:0046] (rev 02)

I'll try to extend the patch and see if it leads somewhere.
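To make the idea of extending the quirk concrete, here is a minimal user-space sketch of the matching logic only. It is not the kernel's quirk code: `struct gfx_quirk`, `gfx_quirks` and `gfx_dmar_should_be_disabled()` are hypothetical names, the [8086:2a40] rev 07 entry is taken from the description of the existing patch above, and matching [8086:0046] at any revision is an assumption made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define ANY_REV -1 /* hypothetical wildcard: match every revision */

/* Hypothetical table entry: which device should bypass the GFX DMAR unit. */
struct gfx_quirk {
    uint16_t vendor;
    uint16_t device;
    int revision; /* ANY_REV or an exact revision number */
};

const struct gfx_quirk gfx_quirks[] = {
    { 0x8086, 0x2a40, 0x07 },    /* device/revision named for the existing patch */
    { 0x8086, 0x0046, ANY_REV }, /* graphics device from this report; assumed extension */
};

/* Return 1 if the given PCI ID matches an entry in the table, 0 otherwise. */
int gfx_dmar_should_be_disabled(uint16_t vendor, uint16_t device, int revision)
{
    size_t i;

    for (i = 0; i < sizeof(gfx_quirks) / sizeof(gfx_quirks[0]); i++) {
        const struct gfx_quirk *q = &gfx_quirks[i];

        if (q->vendor == vendor && q->device == device &&
            (q->revision == ANY_REV || q->revision == revision))
            return 1;
    }
    return 0;
}
```

With such a table, the reporter's device (0x8086, 0x0046, rev 0x02) would match, and a real quirk would then clear the graphics mapping flag for it while leaving DMAR enabled for other devices.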
Created attachment 67152 [details] disable iommu for graphics on ironlake too

Ok, using this patch seems to fix the problem while still enabling DMAR.

Is it normal that when booting with intel_iommu=igfx_off the setup does:

  dmar_map_gfx = 0;

but not

  dmar_disabled = 0;

? Debian doesn't have DMAR_DEFAULT_ON enabled (for good reasons) but I think (according to the documentation) that intel_iommu=igfx_off means “DMAR remapping enabled except for IGFX”:

  igfx_off [Default Off]
    By default, gfx is mapped as normal device. If a gfx
    device has a dedicated DMAR unit, the DMAR unit is
    bypassed by not enabling DMAR with this option. In
    this case, gfx device will use physical address for
    DMA.

What do you think? Should I prepare a patch adding dmar_disabled = 0; for intel_iommu=igfx_off?
Ok, I was confused. I still think the quirk is valid (in order to prevent users like me from shooting themselves in the foot) but I didn't realize one could do:

  intel_iommu=on,igfx_off

to have the same behavior as with the quirk.

As Dave Airlie said on #intel-gfx though, it might make sense to extend the quirk to all Cantiga and Ironlake chipsets where apparently the GFX I/O MMU is buggy, while the main one does work.
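The option behavior discussed in the last two comments can be modelled with a short user-space sketch. This is an illustration, not the kernel's parser: `struct iommu_opts`, `iommu_defaults()` and `parse_intel_iommu()` are hypothetical names, and the defaults assume a kernel built without DMAR_DEFAULT_ON, as on Debian. Only the `on`, `off` and `igfx_off` tokens are handled.

```c
#include <string.h>

/* Hypothetical model of the two flags mentioned in this report. */
struct iommu_opts {
    int dmar_disabled; /* 1 = DMA remapping stays off */
    int dmar_map_gfx;  /* 1 = graphics device is mapped through DMAR */
};

/* Assumed defaults for a kernel built without DMAR_DEFAULT_ON. */
struct iommu_opts iommu_defaults(void)
{
    struct iommu_opts o = { 1, 1 };
    return o;
}

/* Parse a comma-separated intel_iommu= value, one token at a time. */
struct iommu_opts parse_intel_iommu(const char *str)
{
    struct iommu_opts o = iommu_defaults();

    while (*str) {
        size_t len = strcspn(str, ","); /* length of the current token */

        if (len == 2 && !strncmp(str, "on", 2))
            o.dmar_disabled = 0;
        else if (len == 3 && !strncmp(str, "off", 3))
            o.dmar_disabled = 1;
        else if (len == 8 && !strncmp(str, "igfx_off", 8))
            o.dmar_map_gfx = 0; /* note: does not touch dmar_disabled */

        str += len;
        while (*str == ',') /* skip separators */
            str++;
    }
    return o;
}
```

In this model, `parse_intel_iommu("igfx_off")` leaves `dmar_disabled` at 1, which matches the SWIOTLB fallback seen in the dmesg above, while `parse_intel_iommu("on,igfx_off")` enables remapping and keeps the graphics device out of it.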
Note that Intel-IOMMU.txt in Documentation says:

  Graphics Problems?
  ------------------
  If you encounter issues with graphics devices, you can try adding
  option intel_iommu=igfx_off to turn off the integrated graphics engine.
  If this fixes anything, please ensure you file a bug reporting the problem.

so this is exactly what I did. My understanding is that intel_iommu=igfx_off is a temporary workaround until a fix is committed to the code (whether it's a real fix or disabling the IGFX I/O MMU). If I'm wrong, maybe rephrasing the documentation would be a good idea?