Bug 208611

Summary: amdgpu crash on sharing image memory between Vulkan and OpenGL
Product: Drivers
Reporter: Ivan Molodetskikh (yalterz)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: RESOLVED INVALID
Severity: normal
CC: alexdeucher
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 5.7.8-200.fc32.x86_64
Subsystem:
Regression: No
Bisected commit-id:
Attachments: The kernel crash part of the log.

Description Ivan Molodetskikh 2020-07-19 12:36:34 UTC
Created attachment 290347: The kernel crash part of the log.

I have two programs. The first (Vulkan) creates an image, allocates memory for it, and exports that memory as an fd. The second (OpenGL) receives the fd, imports it as the memory backing a texture, and blits some data into the texture. The Vulkan program then maps the memory and reads pixel values from it.

Both programs follow the code of this ANGLE test for Vulkan-OpenGL interop very closely: https://github.com/pmatos/WebKit/blob/c42c49d3859ceb5d6e5c502373c8d3e371662ac4/Source/ThirdParty/ANGLE/src/tests/gl_tests/VulkanExternalImageTest.cpp#L434

It works fine with VK_FORMAT_R8G8B8A8_UNORM. After changing the format to R8G8B8_UNORM (no alpha channel) in what I'm fairly certain are all the corresponding places, I got corrupt image data on one run and a kernel crash on another. Here are some excerpts from the log:

amdgpu 0000:01:00.0: GPU fault detected: 146 0x0f20880c for process vulkan-external pid 173502 thread vulkan-external pid 173502
amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x001005E4
amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0608800C
amdgpu 0000:01:00.0: VM fault (0x0c, vmid 3, pasid 32784) at page 1050084, read from 'TC6' (0x54433600) (136)
amdgpu 0000:01:00.0: IH ring buffer overflow (0x0008AF70, 0x00006660, 0x0000AF80)
[drm] Fence fallback timer expired on ring sdma0
gmc_v8_0_process_interrupt: 305 callbacks suppressed
amdgpu 0000:01:00.0: GPU fault detected: 146 0x0e20480c for process vulkan-external pid 174193 thread vulkan-external pid 174193
amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x001005C4
amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A04800C
amdgpu 0000:01:00.0: VM fault (0x0c, vmid 5, pasid 32784) at page 1050052, read from 'TC4' (0x54433400) (72)
[drm] Fence fallback timer expired on ring sdma0
[drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:49:crtc-1] flip_done timed out

Full last part of the log attached.
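
For anyone cross-checking the excerpt, here is a small Python sketch (not part of the original report or of the driver) that pulls the fields out of a "VM fault" line. The regular expression is an assumption fitted to the two fault lines quoted above, not a format guaranteed by the kernel, and parse_vm_fault and decode_client are hypothetical helper names. It shows that the decimal page number is the same value as VM_CONTEXT1_PROTECTION_FAULT_ADDR, and that the hex client ID is the ASCII tag packed into 32 bits.

```python
import re

# Pattern fitted to the "VM fault" lines quoted in this report.
# This is an assumption about the log format, not a kernel guarantee.
VM_FAULT_RE = re.compile(
    r"VM fault \((0x[0-9a-fA-F]+), vmid (\d+), pasid (\d+)\) "
    r"at page (\d+), (read|write) from '([^']*)' \((0x[0-9a-fA-F]+)\)"
)


def decode_client(raw):
    """Unpack a 32-bit client ID such as 0x54433600 into its ASCII tag."""
    return raw.to_bytes(4, "big").rstrip(b"\x00").decode("ascii")


def parse_vm_fault(line):
    """Return the fields of a 'VM fault' log line, or None if it does not match."""
    m = VM_FAULT_RE.search(line)
    if m is None:
        return None
    status, vmid, pasid, page, access, client, client_raw = m.groups()
    return {
        "status": int(status, 16),
        "vmid": int(vmid),
        "pasid": int(pasid),
        "page": int(page),            # printed in decimal in the log
        "access": access,
        "client": client,             # tag as printed, e.g. 'TC6'
        "client_raw": int(client_raw, 16),
    }


if __name__ == "__main__":
    line = (
        "amdgpu 0000:01:00.0: VM fault (0x0c, vmid 3, pasid 32784) "
        "at page 1050084, read from 'TC6' (0x54433600) (136)"
    )
    fault = parse_vm_fault(line)
    # Compare this with the VM_CONTEXT1_PROTECTION_FAULT_ADDR line above it.
    print(hex(fault["page"]))
    print(fault["client"], decode_client(fault["client_raw"]))
```

For the first fault, the page prints as 0x1005e4, matching the 0x001005E4 register value. The second fault (page 1050052, 'TC4') lines up with 0x001005C4 in the same way.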

AMD RX 580, running Fedora 32 with GNOME Wayland.

Comment 1 Alex Deucher 2020-07-19 15:15:54 UTC
This is most likely a bug in mesa.  The kernel driver is just the messenger.

Comment 2 Ivan Molodetskikh 2020-07-19 15:34:15 UTC
(In reply to Alex Deucher from comment #1)
> This is most likely a bug in mesa.  The kernel driver is just the messenger.

I have opened a bug report in mesa: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3291

Comment 3 Ivan Molodetskikh 2023-08-05 04:12:33 UTC
This was resolved in mesa; forgot to close this.