Bug 205893 - Thunderbolt is broken when "DMA Protection" is enabled in UEFI settings
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: IOMMU
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: drivers_iommu
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-12-17 21:28 UTC by Xesxen
Modified: 2022-10-27 01:09 UTC

See Also:
Kernel Version: 5.4.4
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg log with DMA Protection enabled (968.00 KB, text/plain)
2019-12-17 21:28 UTC, Xesxen
dmesg log with DMA Protection disabled (116.15 KB, text/plain)
2019-12-17 21:29 UTC, Xesxen

Description Xesxen 2019-12-17 21:28:40 UTC
Created attachment 286351
dmesg log with DMA Protection enabled

My laptop, an HP EliteBook 850 G6, has a feature called "DMA Protection" in its UEFI configuration. When it is enabled, Thunderbolt 3 devices do not work under the current stable kernel (5.4.4); only video output works. The following warning appears in dmesg:


[   42.703664] ------------[ cut here ]------------
[   42.703668] WARNING: CPU: 1 PID: 0 at drivers/iommu/intel-iommu.c:3916 bounce_unmap_single+0x100/0x110
[   42.703669] Modules linked in: ccm hid_sensor_als hid_sensor_trigger industrialio_triggered_buffer kfifo_buf hid_sensor_iio_common industrialio hid_sensor_hub intel_ishtp_loader intel_ishtp_hid cros_ec_ishtp cros_ec rfcomm xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo cmac algif_hash algif_skcipher af_alg xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc bnep snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_sof_pci snd_sof_intel_byt snd_sof_intel_ipc ledtrig_audio snd_sof_xtensa_dsp snd_sof_intel_hda_common joydev snd_soc_hdac_hda mousedev snd_sof_intel_hda iwlmvm snd_sof snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi hid_multitouch iTCO_wdt hp_wmi iTCO_vendor_support hid_generic snd_soc_core mei_hdcp mac80211 intel_wmi_thunderbolt mei_wdt sparse_keymap wmi_bmof snd_compress ac97_bus intel_rapl_msr snd_pcm_dmaengine i915 x86_pkg_temp_thermal
[   42.703687]  intel_powerclamp coretemp libarc4 snd_hda_intel kvm_intel snd_intel_nhlt snd_hda_codec kvm iwlwifi nls_iso8859_1 uvcvideo irqbypass i2c_algo_bit snd_hda_core intel_cstate nls_cp437 intel_uncore vfat snd_hwdep videobuf2_vmalloc fat drm_kms_helper intel_rapl_perf snd_pcm btusb videobuf2_memops btrtl btbcm psmouse pcspkr snd_timer videobuf2_v4l2 btintel input_leds videobuf2_common snd bluetooth cfg80211 drm e1000e i2c_i801 videodev soundcore intel_gtt agpgart mei_me ecdh_generic intel_lpss_pci syscopyarea rfkill ecc thunderbolt mc crc16 mei intel_lpss sysfillrect processor_thermal_device intel_ish_ipc sysimgblt intel_rapl_common ucsi_acpi tpm_crb idma64 fb_sys_fops intel_ishtp intel_soc_dts_iosf typec_ucsi intel_pch_thermal i2c_hid typec tpm_tis tpm_tis_core hid wmi battery int3403_thermal tpm int340x_thermal_zone int3400_thermal rng_core ac evdev mac_hid acpi_thermal_rel hp_wireless sg scsi_mod crypto_user ip_tables x_tables btrfs libcrc32c crc32c_generic xor raid6_pq dm_crypt
[   42.703710]  dm_mod crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel serio_raw atkbd libps2 aesni_intel crypto_simd xhci_pci cryptd glue_helper xhci_hcd i8042 serio
[   42.703716] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.4.4 #1
[   42.703716] Hardware name: HP HP EliteBook 850 G6/8549, BIOS R70 Ver. 01.03.04 11/06/2019
[   42.703718] RIP: 0010:bounce_unmap_single+0x100/0x110
[   42.703719] Code: 46 7a 5d 00 49 8b 45 00 48 85 c0 75 e1 65 ff 0d 3e d9 5e 66 75 a3 e8 3f a6 9d ff eb 9c 0f 0b 5b 5d 41 5c 41 5d 41 5e 41 5f c3 <0f> 0b eb 8b 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 e9
[   42.703719] RSP: 0018:ffffa1b3c01a4d70 EFLAGS: 00010046
[   42.703720] RAX: 0000000000000000 RBX: ffff9e2b7e1b00b0 RCX: 0000000000000000
[   42.703720] RDX: 0000000000000008 RSI: 00000000fff7d000 RDI: ffff9e2b7c3957b8
[   42.703721] RBP: 00000000fff7d000 R08: 0000000000000000 R09: ffffffff99a29270
[   42.703721] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000008
[   42.703722] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
[   42.703722] FS:  0000000000000000(0000) GS:ffff9e2b85440000(0000) knlGS:0000000000000000
[   42.703723] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   42.703724] CR2: 00007ffdcd71c448 CR3: 000000004520a006 CR4: 00000000003606e0
[   42.703724] Call Trace:
[   42.703725]  <IRQ>
[   42.703729]  usb_hcd_unmap_urb_setup_for_dma+0x9c/0xd0
[   42.703730]  usb_hcd_unmap_urb_for_dma+0x16/0x160
[   42.703732]  __usb_hcd_giveback_urb+0x36/0x120
[   42.703738]  xhci_giveback_urb_in_irq.isra.0+0x72/0x100 [xhci_hcd]
[   42.703743]  xhci_td_cleanup+0xe1/0x120 [xhci_hcd]
[   42.703747]  xhci_irq+0xbe8/0x1db0 [xhci_hcd]
[   42.703750]  __handle_irq_event_percpu+0x45/0x1b0
[   42.703752]  handle_irq_event_percpu+0x31/0x80
[   42.703753]  handle_irq_event+0x37/0x54
[   42.703754]  handle_edge_irq+0xae/0x1f0
[   42.703756]  do_IRQ+0x84/0x140
[   42.703758]  common_interrupt+0xf/0xf
[   42.703758]  </IRQ>
[   42.703760] RIP: 0010:cpuidle_enter_state+0xc4/0x480
[   42.703761] Code: e8 11 10 9a ff 80 7c 24 0f 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 93 03 00 00 31 ff e8 33 66 a0 ff fb 66 0f 1f 44 00 00 <45> 85 e4 0f 88 be 02 00 00 49 63 cc 4c 2b 6c 24 10 48 8d 04 49 48
[   42.703761] RSP: 0018:ffffa1b3c012fe68 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffda
[   42.703762] RAX: ffff9e2b85440000 RBX: ffffffff9a6bc1c0 RCX: 000000000000001f
[   42.703763] RDX: 0000000000000000 RSI: 000000004041d72a RDI: 0000000000000000
[   42.703763] RBP: ffff9e2b85474a00 R08: 00000009f1560b89 R09: 00000009f2bf320f
[   42.703763] R10: ffff9e2b854697e0 R11: ffff9e2b854697c0 R12: 0000000000000006
[   42.703764] R13: 00000009f1560b89 R14: 0000000000000006 R15: ffff9e2b81a15c40
[   42.703766]  ? cpuidle_enter_state+0x9f/0x480
[   42.703767]  cpuidle_enter+0x29/0x40
[   42.703768]  do_idle+0x1de/0x260
[   42.703770]  cpu_startup_entry+0x19/0x20
[   42.703772]  start_secondary+0x186/0x1d0
[   42.703773]  secondary_startup_64+0xb6/0xc0
[   42.703775] ---[ end trace 86285b5358591f4c ]---
Comment 1 Xesxen 2019-12-17 21:29:21 UTC
Created attachment 286353
dmesg log with DMA Protection disabled
Comment 2 frederik 2019-12-30 11:56:34 UTC
Same problem here with a Dell Precision 7540, but I can't disable DMA protection. Only downgrading to 5.3.x helps.

[  +0,000005] WARNING: CPU: 10 PID: 0 at drivers/iommu/intel-iommu.c:3916 bounce_unmap_single+0x103/0x110
[...]
[  +0,000001] Call Trace:
[  +0,000002]  <IRQ>
[  +0,000003]  usb_hcd_unmap_urb_setup_for_dma+0x9f/0xe0
[  +0,000001]  usb_hcd_unmap_urb_for_dma+0x1c/0x170
[  +0,000002]  __usb_hcd_giveback_urb+0x36/0x120
[  +0,000008]  xhci_giveback_urb_in_irq.isra.0+0x72/0x100 [xhci_hcd]
[  +0,000007]  xhci_td_cleanup+0x101/0x140 [xhci_hcd]
[  +0,000007]  xhci_irq+0xbf0/0x1db0 [xhci_hcd]
[  +0,000005]  __handle_irq_event_percpu+0x44/0x1b0
[  +0,000002]  handle_irq_event_percpu+0x34/0x80
[  +0,000002]  handle_irq_event+0x37/0x54
[  +0,000002]  handle_edge_irq+0xae/0x1f0
[  +0,000002]  do_IRQ+0x84/0x140
[  +0,000003]  common_interrupt+0xf/0xf
[  +0,000001]  </IRQ>
Comment 3 frederik 2019-12-30 23:04:11 UTC
Reverting these commits:
3b53034c268d550d9e8522e613a14ab53b8840d8
c5a5dc4cbbf4540c1891cdb2b70cf469405ea61f
cfb94a372f2d4ee226247447c863f8709863d170
e5e04d051979dbd636a99099b7a595093c50a4bc

fixes the problem with 5.4, so this seems to be a regression.
Comment 4 Lu Baolu 2020-01-13 02:37:20 UTC
Can you please try adding the kernel parameter below?

intel_iommu=nobounce
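
One way to confirm that the parameter actually reached the running kernel is to look for it in /proc/cmdline. The following is a minimal sketch in C and is not part of the original report. The helper names cmdline_has_param and running_kernel_has_param are hypothetical, and the check only matches the parameter as a whole whitespace-separated token.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Returns true if `param` appears in `cmdline` as a complete
 * whitespace-separated token (hypothetical helper). */
static bool cmdline_has_param(const char *cmdline, const char *param)
{
    size_t len = strlen(param);
    const char *p = cmdline;

    if (len == 0)
        return false;

    while (*p) {
        /* Skip separators between tokens. */
        while (*p == ' ' || *p == '\t' || *p == '\n')
            p++;

        /* Measure the current token. */
        const char *start = p;
        while (*p && *p != ' ' && *p != '\t' && *p != '\n')
            p++;

        if ((size_t)(p - start) == len && strncmp(start, param, len) == 0)
            return true;
    }
    return false;
}

/* Reads the first line of /proc/cmdline and checks it for `param`.
 * Returns false if the file cannot be read. */
static bool running_kernel_has_param(const char *param)
{
    char buf[4096];
    FILE *f = fopen("/proc/cmdline", "r");
    bool found = false;

    if (!f)
        return false;
    if (fgets(buf, sizeof(buf), f))
        found = cmdline_has_param(buf, param);
    fclose(f);
    return found;
}
```

Calling running_kernel_has_param("intel_iommu=nobounce") after a reboot would then report whether the option was passed. If the option is combined with others in a single token, this exact-match check will not find it.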
Comment 5 Lu Baolu 2020-01-13 02:56:31 UTC
Can you please point out exactly which line is 3916 in your tree?

[  +0,000005] WARNING: CPU: 10 PID: 0 at drivers/iommu/intel-iommu.c:3916 bounce_unmap_single+0x103/0x110

static void
bounce_unmap_single(struct device *dev, dma_addr_t dev_addr, size_t size,
                    enum dma_data_direction dir, unsigned long attrs)
{
        size_t aligned_size = ALIGN(size, VTD_PAGE_SIZE);
        struct dmar_domain *domain;
        phys_addr_t tlb_addr;

        domain = find_domain(dev);
        if (WARN_ON(!domain))
                return;

        tlb_addr = intel_iommu_iova_to_phys(&domain->domain, dev_addr);
        if (WARN_ON(!tlb_addr))
                return;

        intel_unmap(dev, dev_addr, size);
        if (is_swiotlb_buffer(tlb_addr))
                swiotlb_tbl_unmap_single(dev, tlb_addr, size,
                                         aligned_size, dir, attrs);

        trace_bounce_unmap_single(dev, dev_addr, size);
}
Comment 6 Xesxen 2020-01-13 09:17:57 UTC
Line 3916 in a clean 5.4.4 tree is `if (WARN_ON(!tlb_addr))`. I'll try the kernel parameter later today, once I have access to the TB peripherals again.
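
To make the consequence of that line concrete, here is a small user-space model of the control flow in the function quoted in comment 5. It is only a sketch under stated assumptions: every name prefixed with model_ is hypothetical, the translate callback stands in for the IOVA-to-physical lookup, and the outcome enum exists only for illustration (the kernel function returns void). It shows that a lookup returning 0 takes the second warning path and returns before any unmapping happens.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcome codes for this model only. */
enum model_outcome {
    MODEL_WARN_NO_DOMAIN,    /* first WARN_ON: no domain found */
    MODEL_WARN_NO_TLB_ADDR,  /* second WARN_ON: lookup returned 0 */
    MODEL_UNMAPPED,          /* reached the unmap step */
};

/* Hypothetical stand-in for a domain; `translate` plays the role of
 * the IOVA-to-physical lookup. */
struct model_domain {
    uint64_t (*translate)(uint64_t iova);
};

/* A lookup that never finds a mapping (assumed failure mode). */
static uint64_t model_lookup_fails(uint64_t iova)
{
    (void)iova;
    return 0;
}

/* A lookup that maps every address to itself. */
static uint64_t model_lookup_identity(uint64_t iova)
{
    return iova;
}

/* Mirrors the order of checks in the quoted function. */
static enum model_outcome
model_bounce_unmap_single(const struct model_domain *domain, uint64_t dev_addr)
{
    if (domain == NULL)
        return MODEL_WARN_NO_DOMAIN;

    uint64_t tlb_addr = domain->translate(dev_addr);
    if (tlb_addr == 0)
        return MODEL_WARN_NO_TLB_ADDR;

    /* The real function would unmap here. */
    return MODEL_UNMAPPED;
}
```

In this model, a domain whose lookup always returns 0 never reaches the unmap step, which matches the early return after the warning in the quoted code. Why the lookup returned 0 on the affected machines is not something the model explains.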
Comment 7 Lu Baolu 2020-01-14 03:28:52 UTC
Can you please check whether the commit below solves your problem?

commit 75d18385394f56db76845d91a192532aba421875
Author: Lu Baolu <baolu.lu@linux.intel.com>
Date:   Wed Dec 11 09:40:15 2019 +0800

    iommu/vt-d: Fix dmar pte read access not set error
    
    If the default DMA domain of a group doesn't fit a device, it
    will still sit in the group but use a private identity domain.
    When map/unmap/iova_to_phys come through iommu API, the driver
    should still serve them, otherwise, other devices in the same
    group will be impacted. Since identity domain has been mapped
    with the whole available memory space and RMRRs, we don't need
    to worry about the impact on it.
    
    Link: https://www.spinics.net/lists/iommu/msg40416.html
    Cc: Jerry Snitselaar <jsnitsel@redhat.com>
    Reported-by: Jerry Snitselaar <jsnitsel@redhat.com>
    Fixes: 942067f1b6b97 ("iommu/vt-d: Identify default domains replaced with private")
    Cc: stable@vger.kernel.org # v5.3+
    Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
    Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
    Tested-by: Jerry Snitselaar <jsnitsel@redhat.com>
    Signed-off-by: Joerg Roedel <jroedel@suse.de>
Comment 8 frederik 2020-01-14 16:03:51 UTC
commit 75d18385394f56db76845d91a192532aba421875 

lgtm
Comment 9 Xesxen 2020-01-14 20:34:29 UTC
Commit 75d18385394f56db76845d91a192532aba421875 was merged into the 5.4.x tree as 8a9661847790ad2c0cf16100554f4fac28874ad7 and has been included since 5.4.7. With the 5.4.11 kernel booted, the error no longer appears and Thunderbolt functions as intended.

Looks fixed to me :)
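
For anyone checking whether a given 5.4.x kernel already carries the fix, the version threshold above can be expressed as a small check. This is a hedged sketch: the names kver, kver_parse and is_54_stable_with_fix are hypothetical, and the function only covers the 5.4.x series mentioned in this comment.

```c
#include <stdbool.h>
#include <stdio.h>

/* Parsed "major.minor.patch" prefix of a kernel release string. */
struct kver {
    int major;
    int minor;
    int patch;
};

/* Parses the numeric prefix of a release string such as "5.4.11" or
 * "5.4.7-arch1-1". A missing patch level is treated as 0.
 * Returns false if major and minor cannot be read. */
static bool kver_parse(const char *release, struct kver *out)
{
    out->major = 0;
    out->minor = 0;
    out->patch = 0;
    return sscanf(release, "%d.%d.%d",
                  &out->major, &out->minor, &out->patch) >= 2;
}

/* True only for 5.4.y releases with y >= 7, the range stated in this
 * comment. Other series are outside the scope of this check. */
static bool is_54_stable_with_fix(const char *release)
{
    struct kver v;

    if (!kver_parse(release, &v))
        return false;
    return v.major == 5 && v.minor == 4 && v.patch >= 7;
}
```

The release string can be taken from `uname -r`. A false result for a kernel outside the 5.4.x series means only that this check does not cover it, not that the fix is absent.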
