Bug 218993
Summary: | SIGBUS with amdgpu on multi-GPU system on X server with DRI3/GBM | ||
---|---|---|---|
Product: | Drivers | Reporter: | adaha |
Component: | Video(DRI - non Intel) | Assignee: | drivers_video-dri |
Status: | RESOLVED MOVED | ||
Severity: | normal | ||
Priority: | P3 | ||
Hardware: | All | ||
OS: | Linux | ||
URL: | https://gitlab.freedesktop.org/drm/amd/-/issues/3457 | ||
Kernel Version: | 6.9.5-200.fc40.x86_64 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | trace before crash, Xvnc on Ryzen 5 7600, vkcube on Arc A380 |
To clarify, the crash does not come from gbm_bo_map() directly, but from the incorrectly mapped memory, which causes a crash later in the program when that memory is accessed. This doesn't look like a kernel bug to me. You could try asking here though: https://gitlab.freedesktop.org/drm/amd/-/issues
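The distinction above (the mapping call returns normally, the fault shows up on a later access) can be illustrated with a generic sketch. This is not the amdgpu code path and makes no claim about the actual root cause; it only assumes a POSIX system and uses an ordinary temporary file whose backing pages are removed after mapping. The function name `access_after_truncate` is hypothetical.

```python
import mmap
import os
import signal
import tempfile


def access_after_truncate():
    """Return the signal that kills a child touching an unbacked mapping,
    or None if the child exits normally."""
    with tempfile.TemporaryFile() as f:
        f.truncate(mmap.PAGESIZE)
        # The mapping call itself succeeds: the file is one page long.
        mapping = mmap.mmap(f.fileno(), mmap.PAGESIZE)
        # Shrink the file so the mapped page no longer has backing storage.
        f.truncate(0)

        pid = os.fork()
        if pid == 0:
            # Child: the fault, if any, happens only here, on first access.
            _ = mapping[0]
            os._exit(0)  # reached only if the access did not fault

        _, status = os.waitpid(pid, 0)
        mapping.close()
        return os.WTERMSIG(status) if os.WIFSIGNALED(status) else None


if __name__ == "__main__":
    sig = access_after_truncate()
    print("child killed by:", signal.Signals(sig).name if sig else None)
```

The access is done in a forked child so that the parent can report the terminating signal instead of dying itself. In the bug report the analogous access happens inside pixman_blt(), well after the buffer was mapped.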
Created attachment 306503: trace before crash, Xvnc on Ryzen 5 7600, vkcube on Arc A380

I ran into a SIGBUS when using multiple GPUs and DRI with an X server that has GPU acceleration (TigerVNC's Xvnc). This happened on a machine with:

OS: Fedora 40 running 6.9.5-200.fc40.x86_64
iGPU: Ryzen 5 7600
dGPU: RTX 4060 | Arc A380 | RX 7600

The issue occurs when the X server is configured to use an AMD rendernode, and an application wants to use a non-AMD rendernode. When opening the AMD rendernode using gbm_create_device(), a SIGBUS will occur when gbm_bo_map() is called, if the application wants to use another rendernode that is not an AMD GPU.

In my setup, /dev/dri/renderD128 is the AMD iGPU, and /dev/dri/renderD129 is an RTX 4060. If I run the X server with

$ Xvnc :50 -rendernode /dev/dri/renderD128

and vkcube with renderD129 on the X server

$ DISPLAY=:50 vkcube --gpu_number 1

I get the SIGBUS:

(EE)
(EE) Backtrace:
(EE) 0: Xvnc (xorg_backtrace+0x82) [0x560c52b47d42]
(EE) 1: Xvnc (0x560c52991000+0x1b7f4c) [0x560c52b48f4c]
(EE) 2: /lib64/libc.so.6 (0x7f0c99613000+0x40710) [0x7f0c99653710]
(EE) 3: /lib64/libpixman-1.so.0 (0x7f0c99ed0000+0x8a2d0) [0x7f0c99f5a2d0]
(EE) 4: /lib64/libpixman-1.so.0 (pixman_blt+0x81) [0x7f0c99ede8d1]
(EE) 5: Xvnc (vncDRI3SyncPixmapFromGPU+0x10e) [0x560c529f303e]
(EE) 6: Xvnc (0x560c52991000+0x622c3) [0x560c529f32c3]
(EE) 7: Xvnc (dri3_pixmap_from_fds+0xcf) [0x560c52a7fdaf]
(EE) 8: Xvnc (0x560c52991000+0xf1309) [0x560c52a82309]
(EE) 9: Xvnc (Dispatch+0x426) [0x560c52ae3f56]
(EE) 10: Xvnc (dix_main+0x46a) [0x560c52af2d4a]
(EE) 11: /lib64/libc.so.6 (0x7f0c99613000+0x2a088) [0x7f0c9963d088]
(EE) 12: /lib64/libc.so.6 (__libc_start_main+0x8b) [0x7f0c9963d14b]
(EE) 13: Xvnc (_start+0x25) [0x560c529eed75]
(EE)
(EE) Bus error at address 0x7f0c8e211000
(EE)
Fatal server error:
(EE) Caught signal 7 (Bus error). Server aborting
(EE)
Aborted (core dumped)

The same crash occurs when running vkcube on an Arc GPU (A380).
However, running the X server on an Arc or Nvidia GPU, and vkcube on the AMD GPU, does not cause a crash. Neither does running the X server on AMD, and vkcube on a different AMD GPU (iGPU & RX 7600 for example). I've attached a stacktrace with the last call to mmap() before the crash.
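Since the outcome depends on which driver backs each render node, it may help to confirm the node-to-driver mapping before reproducing. The following is a minimal sketch, assuming the usual Linux sysfs layout in which /sys/class/drm/renderD*/device/driver is a symlink to the bound driver; the function name `render_node_drivers` is hypothetical and not part of any tool mentioned in this report.

```python
import os
from pathlib import Path


def render_node_drivers(sysfs_root="/sys/class/drm"):
    """Map each render node path (e.g. /dev/dri/renderD128) to the name of
    the kernel driver bound to it, or None if no driver link is present."""
    result = {}
    for entry in sorted(Path(sysfs_root).glob("renderD*")):
        driver_link = entry / "device" / "driver"
        node = f"/dev/dri/{entry.name}"
        if driver_link.exists():
            # The link target's basename is the driver name, e.g. "amdgpu".
            result[node] = os.path.basename(os.path.realpath(driver_link))
        else:
            result[node] = None
    return result


if __name__ == "__main__":
    for node, driver in render_node_drivers().items():
        print(f"{node}: {driver}")
```

With the mapping known, the node reported for the AMD driver is the one to pass to Xvnc's -rendernode option when reproducing the crashing configuration described above.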