Created attachment 287245 [details]
Screenshot of the issue
Description of problem:
Windows of OpenGL applications run with the DRI_PRIME=1 environment variable (hybrid graphics) are filled with a white background when scaled instead of being rendered properly.
Steps to Reproduce:
1. Boot with kernel version 5.5-rc1 or higher and log in to your Xorg session (I use xmonad without a compositor)
2. Run any OpenGL application with DRI_PRIME=1 set. I used glxgears for this test (mpv with --vo=opengl and radium are known to be affected as well)
Actual results:
Half of the window is filled with white; the white pixels do not go away when the window is resized.
Expected results:
The application is rendered correctly.
be62dbf554c5b50718a54a359372c148cd9975c7 is the first bad commit
Author: Tom Murphy <firstname.lastname@example.org>
Date: Sun Sep 8 09:56:41 2019 -0700
iommu/amd: Convert AMD iommu driver to the dma-iommu api
Convert the AMD iommu driver to the dma-iommu api. Remove the iova
handling and reserve region code from the AMD iommu driver.
Signed-off-by: Tom Murphy <email@example.com>
Signed-off-by: Joerg Roedel <firstname.lastname@example.org>
drivers/iommu/Kconfig | 1 +
drivers/iommu/amd_iommu.c | 692 +++++-----------------------------------------
2 files changed, 68 insertions(+), 625 deletions(-)
dmesg prints this from radeon/amd-vi once when glxgears is started:
AMD-Vi: Event logged [IO_PAGE_FAULT device=08:00.0 domain=0x0000 address=0xfffffff2c0 flags=0x0010]
awk -f scripts/ver_linux
Linux archy 5.5.0-1-git-10086-g41dcd67e8868 #3 SMP PREEMPT Fri, 07 Feb 2020 22:37:35 +0000 x86_64 GNU/Linux
GNU C 9.2.0
GNU Make 4.3
Linux C Library 2.30
Dynamic linker (ldd) 2.30
Linux C++ Library 6.0.27
Modules Loaded acpi_cpufreq aesni_intel agpgart ahci amdgpu asus_wmi async_memcpy async_pq async_raid6_recov async_tx async_xor battery bcache blake2b_generic btrfs ccp crc32c_generic crc32c_intel crc32_pclmul crc64 crct10dif_pclmul cryptd crypto_simd dca dm_mod dm_raid drm drm_kms_helper eeepc_wmi evdev fat fb_sys_fops ghash_clmulni_intel glue_helper gpio_amdpt gpu_sched hid hid_generic i2c_algo_bit i2c_piix4 igb input_leds ip_tables irqbypass jc42 joydev k10temp kvm kvm_amd libahci libata libcrc32c mac_hid macvlan macvtap mc md_mod mousedev mxm_wmi nls_cp437 nls_iso8859_1 pcspkr pinctrl_amd radeon raid1 raid456 raid6_pq rfkill rng_core scsi_mod sd_mod snd snd_aloop snd_hda_codec snd_hda_codec_hdmi snd_hda_core snd_hda_intel snd_hwdep snd_intel_dspcfg snd_pcm snd_rawmidi snd_timer snd_usb_audio snd_usbmidi_lib soundcore sparse_keymap syscopyarea sysfillrect sysimgblt tap ttm tun usbhid vfat vfio vfio_iommu_type1 vfio_pci vfio_virqfd vhost vhost_net wmi wmi_bmof xfs xhci_hcd xhci_pci xor x_tables
Reverting commit be62dbf554c5b50718a54a359372c148cd9975c7 on top of the 5.5 tag, or disabling the IOMMU in the BIOS, resolves the issue; glxgears is then rendered correctly.
Created attachment 287247 [details]
Created attachment 287253 [details]
lspci -vv on unaffected 5.4.15 kernel
This is likely a bug in the DRM code for your GPU. The recent changes in the AMD IOMMU driver might cause sg-list entries to be merged by the DMA-API. Some drivers probably don't handle this correctly.
I've seen a similar problem with RDMA devices, caused by the same change.
Please report this issue to the developers of your GPU driver.
Please attach a copy of your dmesg output. @Joerg, can you point to what changes are required in drivers to handle this?
Also, has anyone seen similar issues with other IOMMU drivers that use the dma-iommu api? If not, that would point to a problem on the IOMMU side.
Created attachment 287545 [details]
@Alex Added dmesg log
Ticket for drm/amd is at https://gitlab.freedesktop.org/drm/amd/issues/1056
Tried the patch/hack from the RDMA thread (https://www.spinics.net/lists/linux-nfs/msg76402.html) and, FWIW, it helps with this issue as well.
(In reply to Alex Deucher from comment #6)
> Please attach a copy of your dmesg output. @Joerg, can you point to what
> changes are required in drivers to handle this?
The AMD IOMMU driver in v5.5 switched its DMA-API implementation to the common dma-iommu code. The main difference in behavior between the old implementation and dma-iommu is that dma-iommu does sg-list merging, which means it can return fewer mapped segments than requested. See this paragraph from Documentation/DMA-API.txt:
> dma_map_sg(struct device *dev, struct scatterlist *sg,
> int nents, enum dma_data_direction direction)
> Returns: the number of DMA address segments mapped (this may be shorter
> than <nents> passed in if some elements of the scatter/gather list are
> physically or virtually adjacent and an IOMMU maps them with a single
> entry).
(In reply to Alex Deucher from comment #7)
> Also, has anyone seen similar issues with other IOMMU drivers that use
> the dma-iommu api? If not, that would point to a problem on the IOMMU side.
There is only one other driver on x86, the VT-d driver, and it uses its own DMA-API implementation, which doesn't do sg-entry merging.
I have also seen similar bug-reports for RDMA devices, and the culprit there was also that the RDMA device driver did not correctly handle the sg-entry merging case.
fixed in 5.6: