Bug 196291 - amdgpu: Freeze because of syscall not returning
Status: RESOLVED INVALID
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-07-07 18:01 UTC by Tobias Auerochs
Modified: 2017-09-22 01:15 UTC

See Also:
Kernel Version: 4.11.8
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg with lockup warning at the end (87.26 KB, text/plain)
2017-07-07 18:01 UTC, Tobias Auerochs
/sys/kernel/debug/dri/0/amdgpu_fence_info after being frozen for a few minutes (1.29 KB, text/plain)
2017-07-10 18:34 UTC, Tobias Auerochs

Description Tobias Auerochs 2017-07-07 18:01:25 UTC
Created attachment 257397
dmesg with lockup warning at the end

An amdgpu syscall, called by plasmashell, appears to deadlock randomly and freezes X.org completely. Several graphics processes, including plasmashell and X.org, are left stuck in D state (uninterruptible sleep). Everything else continues to operate correctly, including audio and networking.
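To confirm which tasks are stuck while the desktop is frozen (for example over SSH), the process states can be read from procfs. The following is only a minimal sketch: the function names `parse_stat_line` and `uninterruptible_tasks` are made up for illustration, and it relies on the standard `/proc/<pid>/stat` layout, where the state letter follows the parenthesised command name.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so split on the LAST ')' rather than on whitespace.
    """
    left, right = line.rsplit(")", 1)
    pid_str, comm = left.split("(", 1)
    state = right.split()[0]
    return int(pid_str), comm, state


def uninterruptible_tasks(proc_root="/proc"):
    """List (pid, comm) for every process currently in state 'D'."""
    if not os.path.isdir(proc_root):
        return []
    tasks = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError):
            continue  # process exited while scanning, or unreadable entry
        if state == "D":
            tasks.append((pid, comm))
    return tasks


if __name__ == "__main__":
    for pid, comm in uninterruptible_tasks():
        print(pid, comm)
```

A task that shows up in this list only briefly is normal (it is waiting on I/O); tasks that stay in the list across repeated runs match the symptom described here.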

The issue seems to appear more frequently whilst running games, although I am unable to find any particular pattern to it.

Running Arch Linux with a custom-compiled linux-zen kernel (with ACS override patches) and ZFS, although as far as I can tell those are not related to the issue. Mesa 17.1.4 with a Radeon RX 480. The issue has been around for a while and I sadly do not remember when it first occurred, but the entire 4.11.x series is definitely affected and I am fairly sure 4.10.x was as well. The issue is too rare for me to bisect the exact cause, however.
Comment 1 Christian König 2017-07-07 18:10:46 UTC
Please provide the output of "cat /sys/kernel/debug/dri/0/amdgpu_fence_info" when this happens.
Comment 2 Tobias Auerochs 2017-07-10 18:34:20 UTC
Created attachment 257449
/sys/kernel/debug/dri/0/amdgpu_fence_info after being frozen for a few minutes

Got the freeze again randomly, attached the output from /sys/kernel/debug/dri/0/amdgpu_fence_info.
Comment 3 Christian König 2017-07-10 18:50:54 UTC
That isn't related to any system call. The problem is simply that the hardware has crashed and some task is trying to push new commands to it, waiting for previous commands to end (which never happens).
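One way to see this in the fence info output is to compare, per ring, the last signaled fence with the last emitted one: if they differ and do not change between readings, the hardware has stopped completing work. The sketch below is illustrative only. It assumes the debugfs file prints a `--- ring N (name) ---` header followed by `Last signaled fence 0x...` and `Last emitted 0x...` lines, which may differ between kernel versions; `pending_rings` is a hypothetical helper, and the sample text is invented rather than taken from the attachment.

```python
import re

# Assumed line formats of amdgpu_fence_info (may vary by kernel version).
RING_RE = re.compile(r"^--- ring (\d+) \((.+)\) ---$")
SIGNALED_RE = re.compile(r"^Last signaled fence\s+0x([0-9a-fA-F]+)")
EMITTED_RE = re.compile(r"^Last emitted\s+0x([0-9a-fA-F]+)")


def pending_rings(text):
    """Return {ring_name: (signaled, emitted)} for rings with unsignaled fences."""
    rings = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = RING_RE.match(line)
        if m:
            current = m.group(2)
            rings[current] = {}
            continue
        if current is None:
            continue  # ignore anything before the first ring header
        m = SIGNALED_RE.match(line)
        if m:
            rings[current]["signaled"] = int(m.group(1), 16)
            continue
        m = EMITTED_RE.match(line)
        if m:
            rings[current]["emitted"] = int(m.group(1), 16)
    return {
        name: (v["signaled"], v["emitted"])
        for name, v in rings.items()
        if "signaled" in v and "emitted" in v and v["signaled"] != v["emitted"]
    }


# Invented sample, NOT the contents of the attachment on this bug.
SAMPLE = """\
--- ring 0 (gfx) ---
Last signaled fence 0x0001a2b3
Last emitted        0x0001a2b5
--- ring 1 (comp_1.0.0) ---
Last signaled fence 0x00000010
Last emitted        0x00000010
"""

if __name__ == "__main__":
    for name, (signaled, emitted) in pending_rings(SAMPLE).items():
        print(f"{name}: signaled=0x{signaled:08x} emitted=0x{emitted:08x}")
```

A single reading with pending fences only means work is in flight; the same pending values on two readings taken some seconds apart are what point to a hung ring.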

That is most likely a problem on the user space driver side and not related to the kernel at all.

Please open a bug report on FDO for this.
Comment 4 Tobias Auerochs 2017-07-10 19:11:01 UTC
Submitted on freedesktop.org bugzilla:
https://bugs.freedesktop.org/show_bug.cgi?id=101746
Comment 5 Tobias Auerochs 2017-09-22 01:15:25 UTC
Well, after encountering a possibly unrelated (but reproducible) issue that causes the exact same symptoms, and seeing that a GPU reset (triggered via debugfs) seems to recover correctly from it, I think this issue really just comes down to GPU resets not yet being issued automatically on the kernel side.