Bug 196291
Summary: | amdgpu: Freeze because of syscall not returning | ||
---|---|---|---|
Product: | Drivers | Reporter: | Tobias Auerochs (tobi291019) |
Component: | Video(DRI - non Intel) | Assignee: | drivers_video-dri |
Status: | RESOLVED INVALID | ||
Severity: | normal | CC: | christian.koenig |
Priority: | P1 | ||
Hardware: | x86-64 | ||
OS: | Linux | ||
Kernel Version: | 4.11.8 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
dmesg with lockup warning at the end
/sys/kernel/debug/dri/0/amdgpu_fence_info after being frozen for a few minutes |
Please provide the output of "cat /sys/kernel/debug/dri/0/amdgpu_fence_info" when this happens.

Created attachment 257449 [details]
/sys/kernel/debug/dri/0/amdgpu_fence_info after being frozen for a few minutes

The freeze occurred again at random; I have attached the output from /sys/kernel/debug/dri/0/amdgpu_fence_info.
That isn't related to any system call. The problem is simply that the hardware has crashed and some task is trying to push new commands to it while waiting for previous commands to complete, which never happens. That is most likely a problem on the user space driver side and not related to the kernel at all. Please open a bug report on FDO for this.

Submitted on freedesktop.org bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101746

Well, after encountering a possibly unrelated (but reproducible) issue that causes the exact same symptoms, and seeing that a GPU reset (triggered through debugfs) seems to recover correctly from it, I think this issue really just comes down to GPU resets not yet being issued automatically on the kernel side.
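
For anyone comparing their own amdgpu_fence_info dump against the explanation above, here is a rough Python sketch that flags rings whose last signaled fence differs from the last emitted one. It assumes the file uses the "--- ring N (name) ---" / "Last signaled fence" / "Last emitted" layout; that layout is an assumption and may differ between kernel versions. The function name `stalled_rings` is made up for illustration.

```python
import re

# Assumed line formats; adjust if your kernel prints something different.
RING_RE = re.compile(r"^--- ring (\d+) \((.+)\) ---$")
FENCE_RE = re.compile(r"^(Last signaled fence|Last emitted)\s+(0x[0-9a-fA-F]+)$")


def stalled_rings(text):
    """Return {ring_name: {"signaled": int, "emitted": int}} for rings
    whose last signaled fence is not equal to the last emitted fence."""
    rings = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        ring = RING_RE.match(line)
        if ring:
            current = ring.group(2)
            rings[current] = {}
            continue
        fence = FENCE_RE.match(line)
        if fence and current is not None:
            key = "signaled" if fence.group(1) == "Last signaled fence" else "emitted"
            rings[current][key] = int(fence.group(2), 16)
    return {
        name: vals
        for name, vals in rings.items()
        if "signaled" in vals and "emitted" in vals
        and vals["signaled"] != vals["emitted"]
    }


if __name__ == "__main__":
    # Reading debugfs normally requires root.
    with open("/sys/kernel/debug/dri/0/amdgpu_fence_info") as f:
        for name, vals in stalled_rings(f.read()).items():
            print(f"{name}: signaled {vals['signaled']:#x}, emitted {vals['emitted']:#x}")
```

A single mismatch only means work is in flight. The hang described here would show up as the same mismatch persisting across reads taken some seconds apart.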
Created attachment 257397 [details]
dmesg with lockup warning at the end

An amdgpu syscall, called by plasmashell, appears to deadlock randomly and freeze X.org completely. Several graphics processes, plasmashell, and X.org are left stuck in D state. Everything else continues to operate correctly, including audio, networking, etc. The issue seems to appear more frequently while running games, although I am unable to find any particular pattern to it.

I am running Arch Linux with a custom-compiled linux-zen kernel (with ACS override patches) and ZFS, although as far as I can tell those are not related to the issue, along with Mesa 17.1.4 on a Radeon RX 480. The issue has been around for a while and I sadly do not remember when it first occurred, but the entire 4.11.x series is definitely affected and I am fairly sure 4.10.x was as well. The issue is far too rare for me to bisect the exact cause, however.
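
To list which processes are stuck in D state (uninterruptible sleep) during a freeze like this, a small Python sketch along these lines could be run over SSH. It reads the state field from /proc/<pid>/stat; the helper names `parse_stat` and `find_d_state` are hypothetical, and this is only an illustration, not something taken from the report.

```python
from pathlib import Path


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate it via the first '(' and the last ')' instead of splitting.
    open_idx = stat_text.index("(")
    close_idx = stat_text.rindex(")")
    pid = int(stat_text[:open_idx].strip())
    comm = stat_text[open_idx + 1:close_idx]
    state = stat_text[close_idx + 1:].split()[0]
    return pid, comm, state


def find_d_state(proc_root="/proc"):
    """Return a sorted list of (pid, comm) for processes in state 'D'."""
    blocked = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue
        try:
            pid, comm, state = parse_stat((entry / "stat").read_text())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan or entry was unreadable
        if state == "D":
            blocked.append((pid, comm))
    return sorted(blocked)


if __name__ == "__main__":
    for pid, comm in find_d_state():
        print(pid, comm)
```

Processes that stay in this list across repeated runs are the ones blocked in the kernel; short-lived D states during normal disk I/O are expected and not a sign of a hang.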