Created attachment 288763
Corrupted graphics

An application I am working on somehow causes the AMDGPU driver to fail, which results in a soft-ish system lockup. Graphics get corrupted, but the system still reacts to some keypresses (I can switch VT, and SysRq works). The problem is *always* reproducible.

Steps to reproduce:

git clone --branch=hdpi-support-fontscale-viewport https://github.com/rokups/imgui
cd imgui/examples/example_sdl_opengl3
make
./example_sdl_opengl3

Interact with the program a bit and try to drag windows outside of the main viewport. The issue happens within seconds.

bal. 27 16:30:21 rk-PC systemd-coredump[15701]: Process 15649 (example_sdl_ope) of user 1000 dumped core.

Stack trace of thread 15649:
#0  0x00007fed0b12ace5 raise (libc.so.6 + 0x3bce5)
#1  0x00007fed0b114857 abort (libc.so.6 + 0x25857)
#2  0x00007fed0b16e2b0 __libc_message (libc.so.6 + 0x7f2b0)
#3  0x00007fed0b1fe06a __fortify_fail (libc.so.6 + 0x10f06a)
#4  0x00007fed0b1fc904 __chk_fail (libc.so.6 + 0x10d904)
#5  0x00007fed085ab0b3 n/a (radeonsi_dri.so + 0x6340b3)
#6  0x00007fed080f28b7 n/a (radeonsi_dri.so + 0x17b8b7)
#7  0x00007fed080b71b8 n/a (radeonsi_dri.so + 0x1401b8)
#8  0x00007fed080a8fcb n/a (radeonsi_dri.so + 0x131fcb)
#9  0x00007fed082e8a63 n/a (radeonsi_dri.so + 0x371a63)
#10 0x00007fed082e8e82 n/a (radeonsi_dri.so + 0x371e82)
#11 0x00005564903169b5 n/a (/home/rk/src/games/Libs/imgui/cmake-build-debug/bin/example_sdl_opengl3 + 0x2a9b5)
#12 0x0000556490313e69 n/a (/home/rk/src/games/Libs/imgui/cmake-build-debug/bin/example_sdl_opengl3 + 0x27e69)
#13 0x00007fed0b116023 __libc_start_main (libc.so.6 + 0x27023)
#14 0x00005564903136ce n/a (/home/rk/src/games/Libs/imgui/cmake-build-debug/bin/example_sdl_opengl3 + 0x276ce)
<...>
bal. 27 16:30:33 rk-PC kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
bal. 27 16:30:37 rk-PC kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
bal. 27 16:30:38 rk-PC kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=542239, emitted seq=542241
bal. 27 16:30:38 rk-PC kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process example_sdl_ope pid 15735 thread example_sd:cs0 pid 15741
bal. 27 16:30:38 rk-PC kernel: amdgpu 0000:08:00.0: GPU reset begin!
bal. 27 16:30:38 rk-PC kernel: amdgpu 0000:08:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
bal. 27 16:30:38 rk-PC kernel: [drm:gfx_v8_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
bal. 27 16:30:39 rk-PC kernel: cp is busy, skip halt cp
bal. 27 16:30:39 rk-PC kernel: rlc is busy, skip halt rlc
bal. 27 16:30:39 rk-PC kernel: amdgpu 0000:08:00.0: GPU BACO reset
bal. 27 16:30:39 rk-PC kernel: amdgpu 0000:08:00.0: GPU reset succeeded, trying to resume
bal. 27 16:30:39 rk-PC kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400300000).
bal. 27 16:30:39 rk-PC kernel: [drm] VRAM is lost due to GPU reset!
bal. 27 16:30:40 rk-PC kernel: [drm] UVD and UVD ENC initialized successfully.
bal. 27 16:30:40 rk-PC rtkit-daemon[7614]: Supervising 8 threads of 5 processes of 1 users.
bal. 27 16:30:40 rk-PC rtkit-daemon[7614]: Successfully made thread 15789 of process 7410 owned by '1000' RT at priority 5.
bal. 27 16:30:40 rk-PC rtkit-daemon[7614]: Supervising 9 threads of 5 processes of 1 users.
bal. 27 16:30:40 rk-PC kernel: [drm] VCE initialized successfully.
bal. 27 16:30:40 rk-PC kernel: [drm] recover vram bo from shadow start
bal. 27 16:30:40 rk-PC kernel: [drm] recover vram bo from shadow done
bal. 27 16:30:40 rk-PC kernel: [drm] Skip scheduling IBs!
bal. 27 16:30:40 rk-PC kernel: [drm] Skip scheduling IBs!
bal. 27 16:30:40 rk-PC kernel: [drm] Skip scheduling IBs!
bal. 27 16:30:40 rk-PC kernel: [drm] Skip scheduling IBs!
bal. 27 16:30:40 rk-PC kernel: [drm] Skip scheduling IBs!
bal. 27 16:30:40 rk-PC kernel: [drm] Skip scheduling IBs!
bal. 27 16:30:40 rk-PC kernel: [drm] Skip scheduling IBs!
bal. 27 16:30:40 rk-PC kernel: [drm] Skip scheduling IBs!
bal. 27 16:30:40 rk-PC kernel: [drm] Skip scheduling IBs!
bal. 27 16:30:40 rk-PC kernel: [drm] Skip scheduling IBs!
bal. 27 16:30:40 rk-PC kernel: [drm] Skip scheduling IBs!
bal. 27 16:30:40 rk-PC kernel: [drm] Skip scheduling IBs!
bal. 27 16:30:40 rk-PC kernel: [drm] Skip scheduling IBs!
bal. 27 16:30:40 rk-PC kernel: [drm] Skip scheduling IBs!
bal. 27 16:30:40 rk-PC kernel: [drm] Skip scheduling IBs!
bal. 27 16:30:40 rk-PC kernel: amdgpu 0000:08:00.0: GPU reset(2) succeeded!
bal. 27 16:30:41 rk-PC audit[2071]: ANOM_ABEND auid=4294967295 uid=0 gid=0 ses=4294967295 pid=2071 comm="Xorg:gdrv0" exe="/usr/lib/Xorg" sig=11 res=1
bal. 27 16:30:41 rk-PC kernel: audit: type=1701 audit(1587994241.386:339): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=2071 comm="Xorg:gdrv0" exe="/usr/lib/Xorg" sig=11 res=1
bal. 27 16:30:41 rk-PC audit: AUDIT1334 prog-id=35 op=LOAD
bal. 27 16:30:41 rk-PC kernel: audit: type=1334 audit(1587994241.399:340): prog-id=35 op=LOAD
bal. 27 16:30:41 rk-PC audit: AUDIT1334 prog-id=36 op=LOAD
bal. 27 16:30:41 rk-PC systemd[1]: Started Process Core Dump (PID 15795/UID 0).
bal. 27 16:30:41 rk-PC audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-coredump@1-15795-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
bal. 27 16:30:41 rk-PC kernel: audit: type=1334 audit(1587994241.399:341): prog-id=36 op=LOAD
bal. 27 16:30:41 rk-PC kernel: audit: type=1130 audit(1587994241.399:342): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-coredump@1-15795-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
bal. 27 16:30:41 rk-PC krunner[13252]: The X11 connection broke (error 1). Did the X11 server die?
bal. 27 16:30:41 rk-PC konsole[11554]: The X11 connection broke (error 1). Did the X11 server die?
bal. 27 16:30:41 rk-PC pulseaudio[7410]: X connection to :0 broken (explicit kill or server shutdown).
bal. 27 16:30:41 rk-PC kdeinit5[7253]: kdeinit5: Fatal IO error: client killed
bal. 27 16:30:41 rk-PC org_kde_powerdevil[7414]: The X11 connection broke (error 1). Did the X11 server die?
bal. 27 16:30:41 rk-PC kdeinit5[7253]: kdeinit5: sending SIGHUP to children.
bal. 27 16:30:41 rk-PC DiscoverNotifier[7388]: The X11 connection broke (error 1). Did the X11 server die?
bal. 27 16:30:41 rk-PC xembedsniproxy[7335]: The X11 connection broke (error 1). Did the X11 server die?
bal. 27 16:30:41 rk-PC plasmashell[7333]: The X11 connection broke (error 1). Did the X11 server die?
bal. 27 16:30:41 rk-PC kscreen_backend_launcher[7301]: The X11 connection broke (error 1). Did the X11 server die?
bal. 27 16:30:41 rk-PC kactivitymanagerd[7395]: The X11 connection broke (error 1). Did the X11 server die?
bal. 27 16:30:41 rk-PC kernel: audit: type=1701 audit(1587994241.892:343): auid=1000 uid=1000 gid=100 ses=1 pid=11413 comm="spotify" exe="/opt/spotify/spotify" sig=6 res=1
bal. 27 16:30:41 rk-PC audit[11413]: ANOM_ABEND auid=1000 uid=1000 gid=100 ses=1 pid=11413 comm="spotify" exe="/opt/spotify/spotify" sig=6 res=1
bal. 27 16:30:41 rk-PC at-spi-bus-launcher[7858]: X connection to :0 broken (explicit kill or server shutdown).
bal. 27 16:30:41 rk-PC ksmserver[7322]: The X11 connection broke (error 1). Did the X11 server die?
bal. 27 16:30:41 rk-PC python[7420]: The X11 connection broke (error 1). Did the X11 server die?
bal. 27 16:30:41 rk-PC kwalletd5[7086]: The X11 connection broke (error 1). Did the X11 server die?
bal. 27 16:30:41 rk-PC gmenudbusmenuproxy[7385]: The X11 connection broke (error 1). Did the X11 server die?
I tested the 5.4.35-1-lts kernel as well. The corruption still happens, but looks a bit different visually. I can also access another VT without issues, and rendering is OK there. Restarting X11 does not recover the system; a reboot is still needed.

I also forgot to specify my GPU: AMD RX 580

Kernel command line:
initrd=\amd-ucode.img initrd=\initramfs-linux-lts.img rd.luks.name=<...>=cryptolvm rd.luks.options=discard,keyfile-timeout=10s rd.luks.key=<...>=/keys/root.key:UUID=<...> root=/dev/mapper/system-root resume=/dev/mapper/system-swap fastboot rw amd_iommu=on amd_iommu=pt nohz_full=8-15,24-31 rcu_nocbs=8-15,24-31 rcu_nocb_poll user_namespace.enable=1

And some info from early boot, in case it is useful:

bal. 27 15:46:49 archlinux kernel: [drm] amdgpu kernel modesetting enabled.
bal. 27 15:46:49 archlinux kernel: fb0: switching to amdgpudrmfb from EFI VGA
bal. 27 15:46:49 archlinux kernel: amdgpu 0000:08:00.0: vgaarb: deactivate vga console
bal. 27 15:46:49 archlinux kernel: [drm] initializing kernel modesetting (POLARIS10 0x1002:0x67DF 0x1462:0x3417 0xE7).
bal. 27 15:46:49 archlinux kernel: [drm] register mmio base: 0xF7C00000
bal. 27 15:46:49 archlinux kernel: [drm] register mmio size: 262144
bal. 27 15:46:49 archlinux kernel: [drm] add ip block number 0 <vi_common>
bal. 27 15:46:49 archlinux kernel: [drm] add ip block number 1 <gmc_v8_0>
bal. 27 15:46:49 archlinux kernel: [drm] add ip block number 2 <tonga_ih>
bal. 27 15:46:49 archlinux kernel: [drm] add ip block number 3 <gfx_v8_0>
bal. 27 15:46:49 archlinux kernel: [drm] add ip block number 4 <sdma_v3_0>
bal. 27 15:46:49 archlinux kernel: [drm] add ip block number 5 <powerplay>
bal. 27 15:46:49 archlinux kernel: [drm] add ip block number 6 <dm>
bal. 27 15:46:49 archlinux kernel: [drm] add ip block number 7 <uvd_v6_0>
bal. 27 15:46:49 archlinux kernel: [drm] add ip block number 8 <vce_v3_0>
bal. 27 15:46:49 archlinux kernel: amdgpu 0000:08:00.0: No more image in the PCI ROM
bal. 27 15:46:49 archlinux kernel: [drm] UVD is enabled in VM mode
bal. 27 15:46:49 archlinux kernel: [drm] UVD ENC is enabled in VM mode
bal. 27 15:46:49 archlinux kernel: [drm] VCE enabled in VM mode
bal. 27 15:46:49 archlinux kernel: [drm] vm size is 512 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
bal. 27 15:46:49 archlinux kernel: amdgpu 0000:08:00.0: VRAM: 8192M 0x000000F400000000 - 0x000000F5FFFFFFFF (8192M used)
bal. 27 15:46:49 archlinux kernel: amdgpu 0000:08:00.0: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
bal. 27 15:46:49 archlinux kernel: [drm] Detected VRAM RAM=8192M, BAR=256M
bal. 27 15:46:49 archlinux kernel: [drm] RAM width 256bits GDDR5
bal. 27 15:46:49 archlinux kernel: [drm] amdgpu: 8192M of VRAM memory ready
bal. 27 15:46:49 archlinux kernel: [drm] amdgpu: 8192M of GTT memory ready.
bal. 27 15:46:49 archlinux kernel: [drm] GART: num cpu pages 65536, num gpu pages 65536
bal. 27 15:46:49 archlinux kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400300000).
bal. 27 15:46:49 archlinux kernel: [drm] Chained IB support enabled!
bal. 27 15:46:49 archlinux kernel: amdgpu: [powerplay] hwmgr_sw_init smu backed is polaris10_smu
bal. 27 15:46:49 archlinux kernel: [drm] Found UVD firmware Version: 1.130 Family ID: 16
bal. 27 15:46:49 archlinux kernel: [drm] Found VCE firmware Version: 53.26 Binary ID: 3
bal. 27 15:46:49 archlinux kernel: [drm] DM_PPLIB: values for Engine clock
bal. 27 15:46:49 archlinux kernel: [drm] DM_PPLIB: 300000
bal. 27 15:46:49 archlinux kernel: [drm] DM_PPLIB: 600000
bal. 27 15:46:49 archlinux kernel: [drm] DM_PPLIB: 927000
bal. 27 15:46:49 archlinux kernel: [drm] DM_PPLIB: 1179000
bal. 27 15:46:49 archlinux kernel: [drm] DM_PPLIB: 1251000
bal. 27 15:46:49 archlinux kernel: [drm] DM_PPLIB: 1294000
bal. 27 15:46:49 archlinux kernel: [drm] DM_PPLIB: 1339000
bal. 27 15:46:49 archlinux kernel: [drm] DM_PPLIB: 1380000
bal. 27 15:46:49 archlinux kernel: [drm] DM_PPLIB: Validation clocks:
bal. 27 15:46:49 archlinux kernel: [drm] DM_PPLIB: engine_max_clock: 138000
bal. 27 15:46:49 archlinux kernel: [drm] DM_PPLIB: memory_max_clock: 200000
bal. 27 15:46:49 archlinux kernel: [drm] DM_PPLIB: level : 8
bal. 27 15:46:49 archlinux kernel: [drm] DM_PPLIB: values for Memory clock
bal. 27 15:46:49 archlinux kernel: [drm] DM_PPLIB: 300000
bal. 27 15:46:49 archlinux kernel: [drm] DM_PPLIB: 1000000
bal. 27 15:46:49 archlinux kernel: [drm] DM_PPLIB: 2000000
bal. 27 15:46:49 archlinux kernel: [drm] DM_PPLIB: Validation clocks:
bal. 27 15:46:49 archlinux kernel: [drm] DM_PPLIB: engine_max_clock: 138000
bal. 27 15:46:49 archlinux kernel: [drm] DM_PPLIB: memory_max_clock: 200000
bal. 27 15:46:49 archlinux kernel: [drm] DM_PPLIB: level : 8
bal. 27 15:46:49 archlinux kernel: [drm] Display Core initialized with v3.2.69!
bal. 27 15:46:49 archlinux kernel: [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
bal. 27 15:46:49 archlinux kernel: [drm] Driver supports precise vblank timestamp query.
bal. 27 15:46:49 archlinux kernel: [drm] UVD and UVD ENC initialized successfully.
bal. 27 15:46:49 archlinux kernel: [drm] VCE initialized successfully.
bal. 27 15:46:49 archlinux kernel: [drm] fb mappable at 0xE0830000
bal. 27 15:46:49 archlinux kernel: [drm] vram apper at 0xE0000000
bal. 27 15:46:49 archlinux kernel: [drm] size 14745600
bal. 27 15:46:49 archlinux kernel: [drm] fb depth is 24
bal. 27 15:46:49 archlinux kernel: [drm] pitch is 10240
bal. 27 15:46:49 archlinux kernel: fbcon: amdgpudrmfb (fb0) is primary device
bal. 27 15:46:49 archlinux kernel: amdgpu 0000:08:00.0: fb0: amdgpudrmfb frame buffer device
bal. 27 15:46:49 archlinux systemd-modules-load[471]: Inserted module 'amdgpu'
bal. 27 15:46:49 archlinux kernel: [drm] Initialized amdgpu 3.36.0 20150101 for 0000:08:00.0 on minor 0
bal. 27 15:46:54 rk-PC systemd[1]: Condition check resulted in Load Kernel Module drm being skipped.
bal. 27 15:46:55 rk-PC kernel: snd_hda_intel 0000:08:00.1: bound 0000:08:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
bal. 27 15:46:58 rk-PC kernel: amdgpu 0000:08:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
This is most likely an application or Mesa bug. The GPU hung and the kernel driver recovered it. You'll need to restart your GUI after a GPU reset.
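For anyone triaging a similar report, the hang-then-recover sequence can be picked out of a journal dump with a few string checks. The following is only a rough sketch: the function name `summarize_gpu_reset` and the `pending` field are made up for illustration, the message strings are copied from the log in this report and may differ on other kernel versions, and reading `emitted - signaled` as the number of outstanding fences is an assumption.

```python
import re

# Matches the amdgpu ring timeout line as it appears in this report's log.
RING_TIMEOUT = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def summarize_gpu_reset(journal_text):
    """Collect GPU hang/reset markers from journal text (hypothetical helper)."""
    timeouts = []
    for match in RING_TIMEOUT.finditer(journal_text):
        signaled = int(match.group("signaled"))
        emitted = int(match.group("emitted"))
        timeouts.append({
            "ring": match.group("ring"),
            "signaled": signaled,
            "emitted": emitted,
            # Assumption: the gap is the number of fences still outstanding.
            "pending": emitted - signaled,
        })
    return {
        "timeouts": timeouts,
        "reset_started": "GPU reset begin!" in journal_text,
        "reset_succeeded": re.search(r"GPU reset\(\d+\) succeeded!", journal_text) is not None,
        "vram_lost": "VRAM is lost due to GPU reset!" in journal_text,
    }


# Lines copied from the log in this report.
SAMPLE = """\
bal. 27 16:30:38 rk-PC kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=542239, emitted seq=542241
bal. 27 16:30:38 rk-PC kernel: amdgpu 0000:08:00.0: GPU reset begin!
bal. 27 16:30:39 rk-PC kernel: [drm] VRAM is lost due to GPU reset!
bal. 27 16:30:40 rk-PC kernel: amdgpu 0000:08:00.0: GPU reset(2) succeeded!
"""

if __name__ == "__main__":
    print(summarize_gpu_reset(SAMPLE))
```

If the summary shows a ring timeout followed by a successful reset, the kernel side did recover, which fits the picture of the fault originating in userspace (the application or Mesa) rather than in the kernel driver.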
This makes sense. I am pretty sure I never observed this issue with the LTS kernel in the past, and now it is present there too. I will report it on the Mesa bug tracker. Thank you for replying.