Bug 205169
Summary: | AMDGPU driver with Navi card hangs Xorg in fullscreen only. | ||
---|---|---|---|
Product: | Drivers | Reporter: | Dmitri Seletski (drjoms) |
Component: | Video(DRI - non Intel) | Assignee: | drivers_video-dri |
Status: | NEW --- | ||
Severity: | normal | CC: | aladjev.andrew, alexdeucher, kernelbug5193, pierre-eric.pelloux-prayer, shtetldik, witold.baryluk+kernel |
Priority: | P1 | ||
Hardware: | x86-64 | ||
OS: | Linux | ||
Kernel Version: | 5.4.0-rc2 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | dmesg Sat 12 Oct 2019 03:34:43 PM IST; .config file Sat 12 Oct 2019 03:36:01 PM IST; possible fix | ||
Description
Dmitri Seletski
2019-10-12 14:24:47 UTC
Created attachment 285479 [details]
dmesg Sat 12 Oct 2019 03:34:43 PM IST
Created attachment 285481 [details]
.config file Sat 12 Oct 2019 03:36:01 PM IST
Module                  Size  Used by
bridge                147456  0
stp                    16384  1 bridge
llc                    16384  2 bridge,stp
tun                    53248  2
uvcvideo              106496  0
videobuf2_vmalloc      16384  1 uvcvideo
videobuf2_memops       16384  1 videobuf2_vmalloc
videobuf2_v4l2         24576  1 uvcvideo
videodev              204800  2 videobuf2_v4l2,uvcvideo
kvm_amd                86016  0
videobuf2_common       49152  2 videobuf2_v4l2,uvcvideo
joydev                 24576  0
mousedev               24576  0
kvm                   659456  1 kvm_amd
amdgpu               3989504  12
irqbypass              16384  1 kvm
snd_virtuoso           49152  2
snd_oxygen_lib         49152  1 snd_virtuoso
snd_mpu401_uart        16384  1 snd_oxygen_lib
gpu_sched              32768  1 amdgpu
i2c_piix4              24576  0
snd_rawmidi            32768  1 snd_mpu401_uart
ttm                    94208  1 amdgpu
sr_mod                 28672  0
cdrom                  36864  1 sr_mod
k10temp                16384  0

I realised that I have LLVM 10 and 9 at the same time on my machine. I removed LLVM 10 and recompiled Mesa.

uname -a
Linux (none)dimko's Desktop 5.4.0-rc2 #1 SMP PREEMPT Tue Oct 8 19:48:16 IST 2019 x86_64 AMD Ryzen 5 1600 Six-Core Processor AuthenticAMD GNU/Linux

I am on AMD64 Gentoo. I will test after Mesa is recompiled with LLVM 9 support and report any changes, if any.

Screen resolution is 3440x1440. Refresh rate is 100; I also tried 60. It did not make any difference.

Interesting find: under Xwayland, the same issue doesn't happen!
I won't blame it on Xorg, because under an older kernel, programs with OpenGL and fullscreen work.

(In reply to Dmitri Seletski from comment #0)
> I have another problem logged with Navi + AMDGPU drivers. It's triggered
> independently and reliable.
> https://bugzilla.kernel.org/show_bug.cgi?id=204725
>
> With that said, starting strictly and specifically with kernel version
> 5.4.0* I have new problem.
>

What kernel version were you using before that didn't have the problem?

(In reply to Pierre-Eric Pelloux-Prayer from comment #7)
> (In reply to Dmitri Seletski from comment #0)
> > I have another problem logged with Navi + AMDGPU drivers. It's triggered
> > independently and reliable.
> > https://bugzilla.kernel.org/show_bug.cgi?id=204725
> >
> > With that said, starting strictly and specifically with kernel version
> > 5.4.0* I have new problem.
> >
>
> What kernel version were you using before that didn't have the problem?

It was 5.3.* when I could open and use OpenGL and Vulkan apps full screen and it wouldn't crash. This is the list of kernels I used from 5.3.*:

ls /boot/ |grep vmlinuz-5.3.
vmlinuz-5.3.0+
vmlinuz-5.3.0-next-20190920
vmlinuz-5.3.0+.old
vmlinuz-5.3.0-rc6
vmlinuz-5.3.0-rc6+
vmlinuz-5.3.0-rc6+.old
vmlinuz-5.3.0-rc8
vmlinuz-5.3.0-rc8.old

I had a couple of LLVM versions. I removed all. Now I have version 9.0.0:

dimko@(none)dimko's Desktop ~ $ ls /boot/ |grep vmlinuz-5.3.
sys-devel/llvm
Latest version available: 9.0.0
Latest version installed: 9.0.0

I have recompiled Mesa with LLVM 9 (previously it was compiled with LLVM 10, which I removed from the system manually).

glxinfo | grep "OpenGL version"
OpenGL version string: 4.5 (Compatibility Profile) Mesa 19.3.0-devel (git-1294f01e06)

"git bisect" identifies this commit as the problematic one: 617089d5837a ("drm/amd/display: revert wait in pipelock").

Reverting this commit on top of amd-staging-drm-next seems to work fine.

(In reply to Pierre-Eric Pelloux-Prayer from comment #10)
> "git bisect" identifies this commit as the problematic one: 617089d5837a
> ("drm/amd/display: revert wait in pipelock").
>
> Reverting this commit on top of amd-staging-drm-next seems to work fine.

uname -a
Linux (none)dimko's Desktop 5.3.0-rc3+ #3 SMP PREEMPT Mon Oct 14 20:49:02 IST 2019 x86_64 AMD Ryzen 5 1600 Six-Core Processor AuthenticAMD GNU/Linux

git checkout 617089d5837a^

Issue no longer happens.

Major downgrade, but no more problem.
Which commit can I use to solve this issue?

Bug 205169 - AMDGPU driver with Navi card hangs Xorg in fullscreen only. (edit)
https://bugzilla.kernel.org/show_bug.cgi?id=204725

Sorry that I take advantage of you here.
I will try to find the 5.3.0 commit. I am new to all this stuff.
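
For readers following along: checking out `617089d5837a^` rewinds the whole tree to the state just before that commit (hence the "major downgrade"), whereas reverting only that commit on top of a newer branch keeps everything that came after it. A minimal sketch of the second approach follows. The helper name `revert_commit` and the example path are assumptions made for illustration; only the commit hash comes from this thread, and the revert may still need manual conflict resolution on some trees.

```shell
#!/bin/sh
# Sketch only: undo a single commit on top of the branch that is checked out,
# rather than checking out that commit's parent.
# revert_commit is a hypothetical helper name, not part of git or the kernel tree.
revert_commit() {
    tree="$1"     # path to a git checkout, e.g. a kernel source tree
    commit="$2"   # commit to undo, e.g. 617089d5837a from the bisect result

    # `git revert` creates a new commit that undoes only $commit;
    # commits made after it stay in place.
    git -C "$tree" revert --no-edit "$commit"
}

# Example call (the path is an assumption):
# revert_commit "$HOME/src/linux" 617089d5837a
```

After the revert, the kernel is rebuilt and installed as usual.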
(In reply to Dmitri Seletski from comment #11)
> (In reply to Pierre-Eric Pelloux-Prayer from comment #10)
> > "git bisect" identifies this commit as the problematic one: 617089d5837a
> > ("drm/amd/display: revert wait in pipelock").
> >
> > Reverting this commit on top of amd-staging-drm-next seems to work fine.
>
> uname -a
> Linux (none)dimko's Desktop 5.3.0-rc3+ #3 SMP PREEMPT Mon Oct 14 20:49:02
> IST 2019 x86_64 AMD Ryzen 5 1600 Six-Core Processor AuthenticAMD GNU/Linux
>
>
> git checkout 617089d5837a^
>
> Issue no longer happens
>
> Major downgrade, but no more problem.
> Which commit can I use to solve this issue?
>
> Bug 205169 - AMDGPU driver with Navi card hangs Xorg in fullscreen only.
> (edit)
> https://bugzilla.kernel.org/show_bug.cgi?id=204725
>
> Sorry that I take advantage of you here.
> I will try to find 5.3.0 commit. I am new into all this stuff.

With regard to that other bug: it has been there since the moment the Navi driver was first introduced.

I had a similar issue with Borderlands 2:
https://gitlab.freedesktop.org/mesa/mesa/issues/2004

After I reverted the patch mentioned in comment 10, the issue seems to be fixed.
The other hang later seems unrelated (looks like sdma is the problem with that one).

Looks like the same issue with Pathfinder: Kingmaker:
https://bugs.freedesktop.org/show_bug.cgi?id=112266

(In reply to ArneJ from comment #13)
> I had a similar issue with Borderlands 2:
> https://gitlab.freedesktop.org/mesa/mesa/issues/2004
>
>
> After I reverted the patch mentioned in comment 10, the issue seems to be
> fixed.
> The other hang later seems unrelated (looks like sdma is the problem with
> that one).

In my case it's with ALL games. Please try others and report back.

(In reply to Shmerl from comment #14)
> Looks like the same issue with Pathfinder: Kingmaker:
> https://bugs.freedesktop.org/show_bug.cgi?id=112266

In my case it's with ALL games. Please try others and report back.

(In reply to Dmitri Seletski from comment #16)
> (In reply to Shmerl from comment #14)
> > Looks like the same issue with Pathfinder: Kingmaker:
> > https://bugs.freedesktop.org/show_bug.cgi?id=112266
>
> in my case its with ALL games. pls try others and report back.

I don't know which games you mean. Some others don't hang for me, such as Ion Fury, The Bard's Tale IV, etc. Yet some others, like Hedon, hang with a gfx_0.0.0 timeout, so that is not the same as the flip_done timed out hang. Anyway, I'll try reverting that commit to check if it helps.

I can confirm that reverting that commit indeed prevents the hang in Pathfinder: Kingmaker!

(In reply to Dmitri Seletski from comment #16)
> (In reply to Shmerl from comment #14)
> > Looks like the same issue with Pathfinder: Kingmaker:
> > https://bugs.freedesktop.org/show_bug.cgi?id=112266
>
> in my case its with ALL games. pls try others and report back.

I tested many games all over. Many had this issue, some did not.
After reverting the aforementioned kernel patch and installing the latest LLVM and Mesa from git, I had no more hangs (around 3-4 weeks without a hang now).

Created attachment 285935 [details]
possible fix
Does this patch help?
(In reply to Alex Deucher from comment #20)
> Created attachment 285935 [details]
> possible fix
>
> Does this patch help?

It did not just solve one problem, but two!
First of all, it solved the original issue.
Second, some games were hanging right before quitting. Xorg was responsive, but the processes did not disappear. I was blaming it on proprietary code. Apparently it was the same bug, just a different invocation of it.

Please close this bug report. My problem is now fixed.

It fixes Pathfinder: Kingmaker too. But first let the patch be upstreamed, then it's OK to close the bug :)

I just let Borderlands 2 run for about one hour in the menu, which causes a hang without this patch in at most 3 minutes.
Consider Borderlands 2 also fixed with this :)

Just FYI, 5.4 is out, but the fix hasn't landed yet, so it still needs to be applied manually.

Also, even with the 100 ms timeout, the flip hang still happens, just very rarely and not in the usual scenarios for me. For example, when playing The Witcher 3 (Wine+dxvk) and minimizing the game window, on some rare occasions that flip hang occurs even with the patch. I suppose it has something to do with KWin (I usually keep compositing disabled in those cases, though). So maybe the 100 ms value is not always enough?

Patch has been upstream for a while:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0e29be9e0bbbf9cb3d718c5c48382b1420ce0749

The kernel driver hangs in production under regular usage. Such issues should be escalated as much as possible: DCN authors and developers meetings, core developer replacements, driver refactoring/rewrite, test coverage. But that works in a commercial environment only; open source provides TIMEOUT_FOR_FLIP_PENDING.

1.5 years have passed: TIMEOUT_FOR_FLIP_PENDING is still here and nobody cares, and I am almost sure that nobody will care about it tomorrow. Thank you.