Created attachment 287245 [details] Screenshot of the issue
Description of problem:
Windows of OpenGL applications run with the DRI_PRIME=1 environment variable (hybrid graphics) are filled with white backgrounds when scaled instead of being rendered properly.

How reproducible:
Always

Steps to Reproduce:
1. Boot with kernel version 5.5-rc1 or higher and log in to your Xorg session (I use xmonad without a compositor).
2. Run any OpenGL application with DRI_PRIME=1 set. I used glxgears for this test (I also tested mpv with --vo=opengl; radium is known to be affected as well). For example:
   DRI_PRIME=1 glxgears

Actual results:
Half of the window is filled with white; the white pixels do not go away as you resize the window.

Expected results:
Application is rendered correctly.

Bisect results:
be62dbf554c5b50718a54a359372c148cd9975c7 is the first bad commit
commit be62dbf554c5b50718a54a359372c148cd9975c7
Author: Tom Murphy <murphyt7@tcd.ie>
Date:   Sun Sep 8 09:56:41 2019 -0700

    iommu/amd: Convert AMD iommu driver to the dma-iommu api

    Convert the AMD iommu driver to the dma-iommu api. Remove the iova
    handling and reserve region code from the AMD iommu driver.

    Signed-off-by: Tom Murphy <murphyt7@tcd.ie>
    Signed-off-by: Joerg Roedel <jroedel@suse.de>

 drivers/iommu/Kconfig     |   1 +
 drivers/iommu/amd_iommu.c | 692 +++++----------------------------------------
 2 files changed, 68 insertions(+), 625 deletions(-)

Extra info:
dmesg prints this from radeon/amd-vi once when glxgears is started:
AMD-Vi: Event logged [IO_PAGE_FAULT device=08:00.0 domain=0x0000 address=0xfffffff2c0 flags=0x0010]

awk -f scripts/ver_linux
Linux archy 5.5.0-1-git-10086-g41dcd67e8868 #3 SMP PREEMPT Fri, 07 Feb 2020 22:37:35 +0000 x86_64 GNU/Linux

GNU C                   9.2.0
GNU Make                4.3
Binutils                2.33.1
Util-linux              2.35.1
Mount                   2.35.1
Module-init-tools       26
E2fsprogs               1.45.5
Jfsutils                1.1.15
Reiserfsprogs           3.6.27
Xfsprogs                5.4.0
Bison                   3.5.1
Flex                    2.6.4
Linux C Library         2.30
Dynamic linker (ldd)    2.30
Linux C++ Library       6.0.27
Procps                  3.3.15
Net-tools               2.10
Kbd                     2.2.0
Console-tools           2.2.0
Sh-utils                8.31
Udev                    244
Modules Loaded          acpi_cpufreq aesni_intel agpgart ahci amdgpu asus_wmi async_memcpy async_pq async_raid6_recov async_tx async_xor battery bcache blake2b_generic btrfs ccp crc32c_generic crc32c_intel crc32_pclmul crc64 crct10dif_pclmul cryptd crypto_simd dca dm_mod dm_raid drm drm_kms_helper eeepc_wmi evdev fat fb_sys_fops ghash_clmulni_intel glue_helper gpio_amdpt gpu_sched hid hid_generic i2c_algo_bit i2c_piix4 igb input_leds ip_tables irqbypass jc42 joydev k10temp kvm kvm_amd libahci libata libcrc32c mac_hid macvlan macvtap mc md_mod mousedev mxm_wmi nls_cp437 nls_iso8859_1 pcspkr pinctrl_amd radeon raid1 raid456 raid6_pq rfkill rng_core scsi_mod sd_mod snd snd_aloop snd_hda_codec snd_hda_codec_hdmi snd_hda_core snd_hda_intel snd_hwdep snd_intel_dspcfg snd_pcm snd_rawmidi snd_timer snd_usb_audio snd_usbmidi_lib soundcore sparse_keymap syscopyarea sysfillrect sysimgblt tap ttm tun usbhid vfat vfio vfio_iommu_type1 vfio_pci vfio_virqfd vhost vhost_net wmi wmi_bmof xfs xhci_hcd xhci_pci xor x_tables

Known fixes:
Reverting commit be62dbf554c5b50718a54a359372c148cd9975c7 on the 5.5 tag or disabling the IOMMU in the BIOS resolves the issue, and glxgears is rendered correctly.
Created attachment 287247 [details] Bisect log
Created attachment 287253 [details] lspci -vv on unaffected 5.4.15 kernel
This is likely a bug in the DRM code for your GPU. The recent changes in the AMD IOMMU driver might cause sg-list entries to be merged by the DMA-API. Some drivers probably don't handle this correctly. I've seen a similar problem with RDMA devices, caused by the same change. Please report this issue to the developers of your GPU driver.
Please attach a copy of your dmesg output. @Joerg, can you point to what changes are required in drivers to handle this?
Also, has anyone seen similar issues with other IOMMU drivers that use the dma-iommu api? If not, that would point to a problem on the IOMMU side.
Created attachment 287545 [details] dmesg output
@Alex Added the dmesg log.

The ticket for drm/amd is at https://gitlab.freedesktop.org/drm/amd/issues/1056

I tried the patch/hack from the RDMA thread (https://www.spinics.net/lists/linux-nfs/msg76402.html) and it helps with this issue as well, fwiw.
(In reply to Alex Deucher from comment #6)
> Please attach a copy of your dmesg output. @Joerg, can you point to what
> changes are required in drivers to handle this?

The AMD IOMMU driver in v5.5 switched its DMA-API implementation to the common dma-iommu code. The main difference in behavior between the old implementation and dma-iommu is that dma-iommu does sg-list merging, which means it can return fewer mapped segments than requested. See this paragraph from Documentation/DMA-API.txt:

> int
> dma_map_sg(struct device *dev, struct scatterlist *sg,
>            int nents, enum dma_data_direction direction)
>
> Returns: the number of DMA address segments mapped (this may be shorter
> than <nents> passed in if some elements of the scatter/gather list are
> physically or virtually adjacent and an IOMMU maps them with a single
> entry).

(In reply to Alex Deucher from comment #7)
> Also, has anyone seen similar issues with other IOMMU drivers that use
> the dma-iommu api? If not, that would point to a problem on the IOMMU side.

There is only one other IOMMU driver on x86, the VT-d driver, and it uses its own DMA-API implementation, which doesn't do sg-entry merging. I have also seen similar bug reports for RDMA devices, and the culprit there was also that the RDMA device driver did not correctly handle the sg-entry merging case.
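To make the merging behavior described above concrete, here is a minimal user-space sketch in C. It is not kernel code and not the actual radeon/amdgpu fix: `struct fake_sg`, `fake_map_sg()` and `total_dma_bytes()` are hypothetical stand-ins invented for illustration, and the "IOMMU" is assumed to map each chunk at its own address. It only models the contract quoted from Documentation/DMA-API.txt: the mapping call may return fewer segments than `nents`, and the caller has to walk the returned count and use the DMA-side address and length (in the kernel, `sg_dma_address()` and `sg_dma_len()`) rather than the CPU-side ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for one scatterlist entry. */
struct fake_sg {
    uint64_t phys;     /* CPU-side address of this chunk */
    uint32_t len;      /* CPU-side length of this chunk */
    uint64_t dma_addr; /* DMA-side address, filled in by the mapping call */
    uint32_t dma_len;  /* DMA-side length, filled in by the mapping call */
};

/*
 * Models a mapping call that merges adjacent chunks, as dma-iommu may do.
 * Assumption for this sketch: the DMA address equals the CPU address.
 * Merged segments are stored in the leading entries of the array, and the
 * number of DMA segments is returned; it can be smaller than nents.
 */
static int fake_map_sg(struct fake_sg *sg, int nents)
{
    int mapped = 0;

    assert(sg != NULL && nents > 0);

    for (int i = 0; i < nents; i++) {
        if (mapped > 0 &&
            sg[mapped - 1].dma_addr + sg[mapped - 1].dma_len == sg[i].phys) {
            /* Chunk i directly follows the previous segment: extend it. */
            sg[mapped - 1].dma_len += sg[i].len;
        } else {
            /* Not adjacent: start a new DMA segment. */
            sg[mapped].dma_addr = sg[i].phys;
            sg[mapped].dma_len = sg[i].len;
            mapped++;
        }
    }
    return mapped;
}

/*
 * Driver-side pattern that tolerates merging: iterate over the returned
 * segment count and read only the DMA-side fields. Looping up to the
 * original nents, or pairing dma_addr with the CPU-side len, would hand
 * the device wrong or stale ranges once entries have been merged.
 */
static uint64_t total_dma_bytes(const struct fake_sg *sg, int mapped)
{
    uint64_t total = 0;

    for (int i = 0; i < mapped; i++)
        total += sg[i].dma_len;
    return total;
}
```

With three chunks at 0x1000, 0x2000 and 0x8000, each 0x1000 bytes long, the first two are adjacent, so `fake_map_sg()` reports two segments: one of 0x2000 bytes at 0x1000 and one of 0x1000 bytes at 0x8000. A driver that still programs three entries of 0x1000 bytes each would describe memory that was never mapped that way, which is one plausible route to the kind of IO_PAGE_FAULT reported here.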
See https://bugzilla.kernel.org/show_bug.cgi?id=206895
Fixed in 5.6:
commit 42e67b479eab6d26459b80b4867298232b0435e7
commit 0199172f933342d8b1011aae2054a695c25726f4
https://gitlab.freedesktop.org/drm/amd/issues/1056#note_457214