Bug 206017
Summary: | Kernel 5.4.x unusable with GUI due to crashes (some hard) | ||
---|---|---|---|
Product: | Drivers | Reporter: | udo (udovdh) |
Component: | Video(DRI - non Intel) | Assignee: | drivers_video-dri |
Status: | RESOLVED CODE_FIX | ||
Severity: | blocking | CC: | alexdeucher, kernel, paul.e.hill2, priit, reuben_p |
Priority: | P1 | ||
Hardware: | x86-64 | ||
OS: | Linux | ||
Kernel Version: | 5.4.x | Subsystem: | |
Regression: | No | Bisected commit-id: |
Description
udo
2019-12-30 16:02:23 UTC
See https://gitlab.freedesktop.org/drm/amd/issues/934 for more details.

There is no need to file another bug report. Let's keep this all in one place.

amdgpu.noretry=0 appears to help on 5.4.6. But 5.4.x is not really stable; it crashes easily within a day where 5.3.18 can stay up for a few days.

Firefox is still the trigger. When I do not use it the system remains usable. When I use Firefox the system crashes hard within a few hours.

5.4.8 also suffers from the hard hang; Firefox is involved, playing YouTube and such.

And it happened again, without YouTube playing but while browsing.

5.3.18 takes a lot longer to crash/hang or whatever.

Does the screen corruption I see now and then have something to do with this issue?

5.4.8 runs less than 12 hours until hard crash when used. More like 6 hours or less. I.e.: it is stable and working OK with e.g. mkv playing. Then we start Firefox and boom.

System freezes; 5.4.9 also has this issue. Runs OK with Firefox not being used, as far as I can test and detect. With Firefox the system locks hard after a while.

Hello!

I am experiencing the same issue on 5.4.10 (Fedora 31, KDE Spin). I'm going to attempt the 'amdgpu.noretry=0' fix later today.

I made the below bug report with Fedora:
https://ask.fedoraproject.org/t/fedora-kde-amdgpu-issue/5026

Summarized:
gpu: Radeon Vega 10

Issue: I discovered a lot of these entries within journalctl and dmesg after gui freezes:

kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!

Thank you!

(In reply to Paul from comment #13)
> Hello!
>
> I am experiencing the same issue on 5.4.10 (Fedora 31, KDE Spin). I'm going
> to attempt the 'amdgpu.noretry=0' fix later today.
>
> I made the below bug report with Fedora:
> https://ask.fedoraproject.org/t/fedora-kde-amdgpu-issue/5026
>
>
> Summarized:
> gpu: Radeon Vega 10
>
> Issue: I discovered a lot of these entries within journalctl and dmesg after
> gui freezes:
>
> kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but
> soft recovered
> kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for
> fences timed out!
>
> Thank you!

Just wanted to report in that the 'amdgpu.noretry=0' workaround resolved my issues. Thanks!

Should be fixed in:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7aec9ec1cf324d5c5a8d17b9c78a34c388e5f17b
which should also be landing in various stable kernels.

amdgpu.noretry=0 works as a workaround, so the commit should fix things well. Thanks for the commit!
Still looking for the right component for https://bugzilla.kernel.org/show_bug.cgi?id=206191 :-/

Kernel 5.6.x works very well. Git mesa might help too.

And it's back after several months.
5.6.16-1-MANJARO
mesa 20.0.7-3
amdgpu.ppfeaturemask=0xfffd7fff amdgpu.noretry=0 amdgpu.lockup_timeout=0 amdgpu.gpu_recovery=1 amdgpu.audio=0 amdgpu.deep_color=1 amd_iommu=on iommu=pt

Appears to work OK for me: AMD Ryzen 5 3400G with Radeon Vega Graphics on Gigabyte X570 AORUS PRO, Fedora 31, git mesa, kernel.org 5.6.x, etc.
amdgpu.gttsize=8192 amdgpu.lockup_timeout=1000 amdgpu.gpu_recovery=1 amdgpu.noretry=0 amdgpu.ppfeaturemask=0xfffd3fff

kernel 5.8.0-2-MANJARO; Vega 64 GPU; mesa 20.1.5; xf86-video-amdgpu 19.1.0

aug 20 12:58:47 Zen kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
aug 20 12:58:47 Zen kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
aug 20 12:58:52 Zen kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
aug 20 12:58:52 Zen kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=158674961, emitted seq=158674963
aug 20 12:58:52 Zen kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 933 thread Xorg:cs0 pid 941
aug 20 12:58:53 Zen kernel: amdgpu: [powerplay] Failed to send message: 0x63, ret value: 0xffffffff
aug 20 12:58:53 Zen kernel: amdgpu: [powerplay] Failed to send message: 0x26, ret value: 0xffffffff
aug 20 12:58:53 Zen kernel: amdgpu: [powerplay] Failed to send message: 0x61, ret value: 0xffffffff
aug 20 12:58:53 Zen kernel: amdgpu: [powerplay] Failed message: 0x37, input parameter: 0x0, error code: 0xffffffff
aug 20 12:58:53 Zen kernel: amdgpu: [powerplay] Failed to send message: 0x63, ret value: 0xffffffff
aug 20 12:58:53 Zen kernel: amdgpu: [powerplay] Failed to send message: 0x26, ret value: 0xffffffff
aug 20 12:58:53 Zen kernel: amdgpu: [powerplay] Failed to send message: 0x61, ret value: 0xffffffff
aug 20 12:58:53 Zen kernel: amdgpu: [powerplay] Failed message: 0x37, input parameter: 0x0, error code: 0xffffffff
aug 20 12:58:56 Zen systemd-coredump[109412]: Process 933 (Xorg) of user 0 dumped core.
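
For anyone comparing setups in this thread, here is a minimal Python sketch that pulls the amdgpu.* options out of a kernel command line and counts the error signatures quoted in this report. The function names (parse_amdgpu_params, count_amdgpu_errors) and the SIGNATURES table are hypothetical helpers made up for illustration, not part of any kernel or Mesa tooling, and the patterns only cover the messages that appear in the logs above; other kernel versions may word them differently.

```python
import re
from collections import Counter

# Assumption: these patterns are derived only from the log lines quoted in
# this report; they are not an exhaustive list of amdgpu failure messages.
SIGNATURES = {
    "ring_timeout": re.compile(r"\*ERROR\* ring \w+ timeout"),
    "fence_timeout": re.compile(r"\*ERROR\* Waiting for fences timed out!"),
    "powerplay_failure": re.compile(r"\[powerplay\] Failed (?:to send )?message"),
}


def parse_amdgpu_params(cmdline):
    """Return the amdgpu.<name>=<value> options of a kernel command line as a dict."""
    params = {}
    for token in cmdline.split():
        if token.startswith("amdgpu.") and "=" in token:
            key, value = token.split("=", 1)
            params[key[len("amdgpu."):]] = value
    return params


def count_amdgpu_errors(log_text):
    """Count the log lines that match each signature in SIGNATURES."""
    counts = Counter()
    for line in log_text.splitlines():
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Sample inputs copied from comments in this report.
sample_cmdline = (
    "amdgpu.ppfeaturemask=0xfffd7fff amdgpu.noretry=0 "
    "amdgpu.lockup_timeout=0 amdgpu.gpu_recovery=1 amd_iommu=on iommu=pt"
)
sample_log = "\n".join([
    "kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!",
    "kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered",
    "kernel: amdgpu: [powerplay] Failed to send message: 0x63, ret value: 0xffffffff",
])

params = parse_amdgpu_params(sample_cmdline)
print("noretry:", params.get("noretry", "(not set)"))
print(dict(count_amdgpu_errors(sample_log)))
```

On a live system the same functions could be fed the contents of /proc/cmdline and the text output of dmesg or journalctl -k, which makes it easier to state in a comment which options were active when the hang occurred.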