The system is still operational, and I was able to log in remotely via ssh. This seems to happen more often when the CPU is under load.

Relevant dmesg:

INFO: task Xorg:539 blocked for more than 120 seconds.
      Not tainted 4.7.0 #5
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Xorg            D ffff880085387a48     0   539    537 0x00080084
 ffff880085387a48 ffff880392f1ea40 0000000000000000 ffff880085388000
 ffff88045e658000 ffff88045e658000 ffff880062516840 7fffffffffffffff
 ffff880085387a60 ffffffff815e5dd0 0000000000000000 ffff880085387ad8
Call Trace:
 [<ffffffff815e5dd0>] schedule+0x30/0x80
 [<ffffffff815e8429>] schedule_timeout+0x159/0x1a0
 [<ffffffff810a956e>] ? __wake_up_locked+0xe/0x10
 [<ffffffff8144b452>] fence_default_wait+0x182/0x1d0
 [<ffffffff8144adf0>] ? fence_free+0x20/0x20
 [<ffffffff8144af9e>] fence_wait_timeout+0x4e/0x90
 [<ffffffffc02d89dd>] amdgpu_ctx_add_fence+0x5d/0x100 [amdgpu]
 [<ffffffffc02cad00>] amdgpu_cs_ioctl+0xbf0/0xff0 [amdgpu]
 [<ffffffff813e978d>] drm_ioctl+0x14d/0x530
 [<ffffffffc02ca110>] ? amdgpu_cs_find_mapping+0x90/0x90 [amdgpu]
 [<ffffffff81171b6a>] ? do_readv_writev+0x11a/0x200
 [<ffffffffc02b2047>] amdgpu_drm_ioctl+0x47/0x80 [amdgpu]
 [<ffffffff81182ebd>] do_vfs_ioctl+0x8d/0x570
 [<ffffffff81001255>] ? syscall_slow_exit_work+0xb5/0x100
 [<ffffffff8153ea5d>] ? __sys_recvmsg+0x5d/0x70
 [<ffffffff810fbec6>] ? __audit_syscall_entry+0xa6/0xf0
 [<ffffffff811833dc>] SyS_ioctl+0x3c/0x70
 [<ffffffff8100179a>] do_syscall_64+0x5a/0x110
 [<ffffffff815e9040>] entry_SYSCALL64_slow_path+0x25/0x25
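For anyone triaging similar reports, here is a rough sketch of how the hung-task header and call-trace frames in a dmesg dump like the one above could be pulled out with a script. The function name parse_hung_tasks, the regular expressions, and the shortened SAMPLE text are illustrative assumptions, not part of any kernel tooling. Frames marked with "?" are addresses the unwinder found on the stack but could not confirm as part of the call chain, so the sketch flags them as unreliable.

```python
import re

# Header line printed by the kernel's hung-task detector.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+?):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)

# One call-trace frame, e.g.
#   [<ffffffffc02cad00>] amdgpu_cs_ioctl+0xbf0/0xff0 [amdgpu]
# A leading "? " marks a frame the unwinder could not verify.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\] (?P<uncertain>\? )?(?P<symbol>[\w.]+)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(?P<module>\w+)\])?"
)

# Shortened excerpt of the trace from this report (assumption: the
# real input would be the full dmesg output).
SAMPLE = """\
INFO: task Xorg:539 blocked for more than 120 seconds.
Call Trace:
 [<ffffffff815e5dd0>] schedule+0x30/0x80
 [<ffffffff8144adf0>] ? fence_free+0x20/0x20
 [<ffffffffc02d89dd>] amdgpu_ctx_add_fence+0x5d/0x100 [amdgpu]
 [<ffffffffc02cad00>] amdgpu_cs_ioctl+0xbf0/0xff0 [amdgpu]
"""


def parse_hung_tasks(log_text):
    """Return one dict per hung-task report found in log_text."""
    reports = []
    for line in log_text.splitlines():
        header = HUNG_TASK_RE.search(line)
        if header:
            reports.append({
                "comm": header.group("comm"),
                "pid": int(header.group("pid")),
                "seconds": int(header.group("secs")),
                "frames": [],
            })
            continue
        frame = FRAME_RE.search(line)
        if frame and reports:
            # Attach the frame to the most recent report.
            reports[-1]["frames"].append({
                "symbol": frame.group("symbol"),
                "module": frame.group("module"),  # None for built-in code
                "reliable": frame.group("uncertain") is None,
            })
    return reports


if __name__ == "__main__":
    for report in parse_hung_tasks(SAMPLE):
        print(report["comm"], report["pid"], report["seconds"])
        for frame in report["frames"]:
            if frame["reliable"] and frame["module"] == "amdgpu":
                print("  ", frame["symbol"])
```

Filtering on reliable frames that belong to the amdgpu module narrows the trace to the driver functions that were actually on the blocked path, which here are amdgpu_ctx_add_fence and amdgpu_cs_ioctl.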
Please attach your xorg log and dmesg output.
Created attachment 227861: dmesg
Created attachment 227871: Xorg.log
The previous dmesg is from 4.7.0; the attached one is from git HEAD. Using the _k_ firmware seems to make this happen much less frequently.
I tried to reset the GPU using /sys/kernel/debug/dri/0/amdgpu_gpu_reset, and the result was a NULL pointer dereference in the kernel. The dmesg is attached.
Created attachment 227901: Kernel Oops when resetting the GPU
This seems to be (very) related to https://bugzilla.kernel.org/show_bug.cgi?id=198883