Created attachment 298081: dmesg using 5.10.53

Original report in Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=991546

The issue seems to be resolved in more recent kernel versions. It was originally found using Debian's linux-image-5.10.0-8-amd64, but I confirmed it with the latest vanilla 5.10.53 kernel.

When the display is switched off and on, the system becomes unresponsive before actually crashing. The easiest way to reproduce it in my case is:
- start a sway session
- start firefox
- cycle the display with: swaymsg "output * dpms off"; sleep 10; swaymsg "output * dpms on"

When the display is switched back on:
- if firefox is displayed correctly, there is no issue
- if the top part of firefox is not displayed correctly (transparent), the issue is present. Firefox is also unresponsive, and after a few seconds the following is emitted:

Jul 25 21:13:40 arrakis kernel: [ 109.007130] (t=5250 jiffies g=8557 q=4735)
Jul 25 21:13:40 arrakis kernel: [ 109.007131] NMI backtrace for cpu 8
Jul 25 21:13:40 arrakis kernel: [ 109.007133] CPU: 8 PID: 1797 Comm: Xwayland Tainted: G S E 5.10.53 #1
Jul 25 21:13:40 arrakis kernel: [ 109.007134] Hardware name: Gigabyte Technology Co., Ltd. AB350M-Gaming 3/AB350M-Gaming 3-CF, BIOS F42d 10/18/2019
Jul 25 21:13:40 arrakis kernel: [ 109.007135] Call Trace:
Jul 25 21:13:40 arrakis kernel: [ 109.007137] <IRQ>
Jul 25 21:13:40 arrakis kernel: [ 109.007142] dump_stack+0x6b/0x83
Jul 25 21:13:40 arrakis kernel: [ 109.007143] nmi_cpu_backtrace.cold+0x32/0x69
Jul 25 21:13:40 arrakis kernel: [ 109.007146] ? lapic_can_unplug_cpu+0x80/0x80
Jul 25 21:13:40 arrakis kernel: [ 109.007148] nmi_trigger_cpumask_backtrace+0xd7/0xe0
Jul 25 21:13:40 arrakis kernel: [ 109.007150] rcu_dump_cpu_stacks+0xa2/0xd0
Jul 25 21:13:40 arrakis kernel: [ 109.007152] rcu_sched_clock_irq.cold+0x1ff/0x3d6
Jul 25 21:13:40 arrakis kernel: [ 109.007154] update_process_times+0x8c/0xc0
Jul 25 21:13:40 arrakis kernel: [ 109.007156] tick_sched_handle+0x22/0x60
Jul 25 21:13:40 arrakis kernel: [ 109.007158] tick_sched_timer+0x7c/0xb0
Jul 25 21:13:40 arrakis kernel: [ 109.007159] ? tick_do_update_jiffies64.part.0+0xc0/0xc0
Jul 25 21:13:40 arrakis kernel: [ 109.007160] __hrtimer_run_queues+0x12a/0x270
Jul 25 21:13:40 arrakis kernel: [ 109.007161] hrtimer_interrupt+0x110/0x2c0
Jul 25 21:13:40 arrakis kernel: [ 109.007163] __sysvec_apic_timer_interrupt+0x5f/0xd0
Jul 25 21:13:40 arrakis kernel: [ 109.007164] asm_call_irq_on_stack+0x12/0x20
Jul 25 21:13:40 arrakis kernel: [ 109.007165] </IRQ>
Jul 25 21:13:40 arrakis kernel: [ 109.007167] sysvec_apic_timer_interrupt+0x72/0x80
Jul 25 21:13:40 arrakis kernel: [ 109.007168] asm_sysvec_apic_timer_interrupt+0x12/0x20
Jul 25 21:13:40 arrakis kernel: [ 109.007182] RIP: 0010:__drm_dbg+0x3e/0x90 [drm]
Jul 25 21:13:40 arrakis kernel: [ 109.007184] Code: 4c 24 48 4c 89 44 24 50 4c 89 4c 24 58 65 48 8b 04 25 28 00 00 00 48 89 44 24 28 31 c0 23 3d 51 1c 05 00 75 12 48 8b 44 24 28 <65> 48 2b 04 25 28 00 00 00 75 40 c9 c3 48 8d 45 10 48 89 34 24 48
Jul 25 21:13:40 arrakis kernel: [ 109.007185] RSP: 0018:ffffb880836d7ba0 EFLAGS: 00000246
Jul 25 21:13:40 arrakis kernel: [ 109.007187] RAX: 4f8e6fb112e3c800 RBX: ffffb880836d7d38 RCX: 0000000200000000
Jul 25 21:13:40 arrakis kernel: [ 109.007188] RDX: 0000000404000000 RSI: ffffffffc09d01f8 RDI: 0000000000000000
Jul 25 21:13:40 arrakis kernel: [ 109.007188] RBP: ffffb880836d7c00 R08: 0000000000000000 R09: 0000000000000000
Jul 25 21:13:40 arrakis kernel: [ 109.007189] R10: 000000000000000a R11: 0000000404000000 R12: ffffb880836d7d38
Jul 25 21:13:40 arrakis kernel: [ 109.007189] R13: 00000000fffffff4 R14: ffff90a352d80000 R15: ffffb880836d7e28
Jul 25 21:13:40 arrakis kernel: [ 109.007252] amdgpu_bo_do_create+0x2a4/0x4f0 [amdgpu]
Jul 25 21:13:40 arrakis kernel: [ 109.007305] amdgpu_bo_create+0x40/0x270 [amdgpu]
Jul 25 21:13:40 arrakis kernel: [ 109.007359] amdgpu_gem_create_ioctl+0x123/0x310 [amdgpu]
Jul 25 21:13:40 arrakis kernel: [ 109.007413] ? amdgpu_gem_object_close+0x200/0x200 [amdgpu]
Jul 25 21:13:40 arrakis kernel: [ 109.007423] drm_ioctl_kernel+0xaa/0xf0 [drm]
Jul 25 21:13:40 arrakis kernel: [ 109.007433] drm_ioctl+0x20f/0x3a0 [drm]
Jul 25 21:13:40 arrakis kernel: [ 109.007486] ? amdgpu_gem_object_close+0x200/0x200 [amdgpu]
Jul 25 21:13:40 arrakis kernel: [ 109.007487] ? do_setitimer+0x179/0x210
Jul 25 21:13:40 arrakis kernel: [ 109.007539] amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
Jul 25 21:13:40 arrakis kernel: [ 109.007541] __x64_sys_ioctl+0x83/0xb0
Jul 25 21:13:40 arrakis kernel: [ 109.007543] do_syscall_64+0x33/0x80
Jul 25 21:13:40 arrakis kernel: [ 109.007545] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 25 21:13:40 arrakis kernel: [ 109.007546] RIP: 0033:0x7fb8fffa2cc7
Jul 25 21:13:40 arrakis kernel: [ 109.007548] Code: 00 00 00 48 8b 05 c9 91 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 99 91 0c 00 f7 d8 64 89 01 48
Jul 25 21:13:40 arrakis kernel: [ 109.007548] RSP: 002b:00007fff2f0db438 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Jul 25 21:13:40 arrakis kernel: [ 109.007550] RAX: ffffffffffffffda RBX: 00007fff2f0db490 RCX: 00007fb8fffa2cc7
Jul 25 21:13:40 arrakis kernel: [ 109.007550] RDX: 00007fff2f0db490 RSI: 00000000c0206440 RDI: 000000000000000a
Jul 25 21:13:40 arrakis kernel: [ 109.007551] RBP: 00000000c0206440 R08: 00000000ffffffff R09: 00007fb90006cbe0
Jul 25 21:13:40 arrakis kernel: [ 109.007551] R10: 0000000000000100 R11: 0000000000000246 R12: 0000555ce28ba2a0
Jul 25 21:13:40 arrakis kernel: [ 109.007552] R13: 000000000000000a R14: 0000000404000000 R15: 0000000000200000

I was asked to test with the latest stable kernel (5.13, no issue) and to bisect to find where the kernel's behavior changes in this respect. The bisection led to this commit:

commit 89fa15ecdca7eb46a711476b961f70a74765bbe4
Author: Huang Rui <ray.huang@amd.com>
Date:   Sat Jan 30 17:14:30 2021 +0800

    drm/amdgpu: fix the issue that retry constantly once the buffer is oversize

    We cannot modify initial_domain every time while the retry starts. That will cause the busy waiting that unable to switch to GTT while the vram is not enough.

    Fixes: f8aab60422c3 ("drm/amdgpu: Initialise drm_gem_object_funcs for imported BOs")
    Signed-off-by: Huang Rui <ray.huang@amd.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
    Reviewed-by: Christian König <christian.koenig@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Cc: stable@vger.kernel.org

As a very naive test, I blindly applied it on top of v5.10 and can confirm that I can no longer reproduce the problem, but I have no idea whether this is correct. I've been asked to file this issue for a possible backport to the 5.10.y line. I'll be happy to help if necessary.

Thank you for your work!
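The commit message describes a buffer-object allocation that keeps retrying in VRAM instead of falling back to GTT. The C code below is a hypothetical, heavily simplified model of that failure mode, not the actual amdgpu code. fake_bo_create, create_buggy, create_fixed and the DOMAIN_* bits are illustrative names only, and VRAM is assumed to be permanently exhausted. The max_attempts cap stands in for the unbounded loop, which appears to keep the CPU busy inside amdgpu_gem_create_ioctl until the RCU stall report in the trace fires.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical memory-domain bits, for illustration only. */
enum { DOMAIN_VRAM = 1, DOMAIN_GTT = 2 };

/* Hypothetical allocator: VRAM is assumed to be exhausted, so a
 * VRAM-only request always fails; a request that allows GTT succeeds. */
static int fake_bo_create(unsigned int domain)
{
    return (domain & DOMAIN_GTT) ? 0 : -ENOMEM;
}

/* Buggy shape: the domain is re-initialised from the original request
 * at the start of every attempt, so the GTT fallback never takes effect.
 * max_attempts caps what would otherwise be an endless retry. */
static int create_buggy(unsigned int requested, int max_attempts, int *attempts)
{
    assert(max_attempts > 0);
    for (int n = 1; n <= max_attempts; n++) {
        unsigned int domain = requested;   /* reset on every retry */
        if (fake_bo_create(domain) == 0) {
            *attempts = n;
            return 0;
        }
        if (domain == DOMAIN_VRAM) {
            domain |= DOMAIN_GTT;          /* fallback is lost: reset above */
            continue;
        }
        *attempts = n;
        return -ENOMEM;
    }
    *attempts = max_attempts;
    return -EBUSY;                         /* stands in for the endless spin */
}

/* Fixed shape: the domain is initialised once, outside the retry loop,
 * so widening it to include GTT carries over to the next attempt. */
static int create_fixed(unsigned int requested, int max_attempts, int *attempts)
{
    unsigned int domain = requested;
    assert(max_attempts > 0);
    for (int n = 1; n <= max_attempts; n++) {
        if (fake_bo_create(domain) == 0) {
            *attempts = n;
            return 0;
        }
        if (domain == DOMAIN_VRAM) {
            domain |= DOMAIN_GTT;          /* persists into the next attempt */
            continue;
        }
        *attempts = n;
        return -ENOMEM;
    }
    *attempts = max_attempts;
    return -EBUSY;
}
```

In this model, create_fixed(DOMAIN_VRAM, ...) succeeds on its second attempt with the widened VRAM|GTT domain. create_buggy(DOMAIN_VRAM, ...) issues a VRAM-only request on every attempt and only stops because of the artificial cap, which mirrors the "busy waiting" the commit message mentions.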
Created attachment 298083: lspci
As mentioned in the Debian bug report, my kernel is tainted:

[ 101.233439] CPU: 9 PID: 1811 Comm: Xwayland Tainted: G S E 5.11.0-rc4-00375-ga692a610d7ed #6

** Tainted: S (4)
 * SMP kernel oops on an officially SMP incapable processor

This is because my Ryzen 1600 is faulty, and I need to disable its C6 state by writing to an MSR (using https://github.com/r4m0n/ZenStates-Linux).
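The letters after "Tainted:" map to bits of the numeric taint value, which is where the "(4)" for S comes from. The C sketch below assumes the bit-to-letter table from the kernel's Documentation/admin-guide/tainted-kernels.rst and covers bits 0 to 17 only. taint_value and TAINT_LETTERS are hypothetical names, not a kernel or tool API.

```c
#include <assert.h>
#include <string.h>

/* Letters for taint bits 0..17 (assumed from tainted-kernels.rst).
 * Bit 0 prints 'P' when set and 'G' when clear. */
static const char TAINT_LETTERS[] = "PFSRMBUDAWCIOELKXT";

/* Convert the flags printed after "Tainted:" (e.g. "G S E") into the
 * numeric taint value. Spaces and 'G' contribute nothing; an unknown
 * letter makes the function return -1. */
static long taint_value(const char *flags)
{
    long value = 0;
    assert(flags != NULL);
    for (const char *p = flags; *p != '\0'; p++) {
        if (*p == ' ' || *p == 'G')
            continue;
        const char *hit = strchr(TAINT_LETTERS, *p);
        if (hit == NULL)
            return -1;
        value |= 1L << (hit - TAINT_LETTERS);   /* index == bit number */
    }
    return value;
}
```

With this table, taint_value("G S E") yields 8196. That is 4 for S, the flag discussed here, plus 8192 for E, which marks that an unsigned module was loaded.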