Bug 199357
Summary: | amdgpu: hang a few seconds after logging in, most likely due to regression | ||
---|---|---|---|
Product: | Drivers | Reporter: | Mathias Tillman (master.homer) |
Component: | Video(DRI - non Intel) | Assignee: | drivers_video-dri |
Status: | RESOLVED CODE_FIX | ||
Severity: | high | CC: | alexdeucher, christian.koenig, harry.wentland |
Priority: | P1 | ||
Hardware: | x86-64 | ||
OS: | Linux | ||
Kernel Version: | v4.16 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | Kernel log of the hang/crash; Hardware info; Kernel log with added logging | ||
Created attachment 275293 [details]
Hardware info
Looks like an issue with DC to me. Can you bisect?

I've just finished running a bisect now, and I have concluded that commit 36cc549d59864b7161f0e23d710c1c4d1b9cf022 (drm/amd/display: disable CRTCs with NULL FB on their primary plane (V2)) causes the lock-up. Let me know if you need anything else.

Thanks, yeah that is clearly DC (display core). Harry can you take a look?

I've no idea why this causes "flip_done timed out" and locks the system right now, but we're currently also dealing with some more fallout from that change, in particular blinking/flickering display if redshift/nightlight is on. I'm reluctant to just revert the offending commit as it's not incorrect but seems to expose some other flaws in our atomic check/commit implementation.

(In reply to Harry Wentland from comment #5)
> I'm reluctant to just revert the offending commit as it's not incorrect
> but seems to expose some other flaws in our atomic check/commit
> implementation.

Unless a fix is at least on the horizon, since this commit introduced multiple issues, it would be nice to our users to revert it for the time being, then re-apply it when it's safe.

Wanted to add some more info. The soft lock up will release after approximately 30 seconds, but after a few seconds it will lock up again and repeat. Looking at the kernel log, it seems that when the lock up happens, it takes an abnormally long time to reach the dm_pflip_high_irq function which is supposed to trigger the flip_done message. I've attached a new log with my added logging in case that helps.

Created attachment 275337 [details]
Kernel log with added logging
Just saw that this has been reverted on git, so I will mark this as resolved.

Since that commit was pushed to v4.16, shouldn't it also be reverted on linux-stable to make it to a future 4.16.y release?

Yes, the revert cc'ed stable so it will show up in 4.16 as well.
Created attachment 275291 [details]
Kernel log of the hang/crash

I've been testing kernel v4.16 on my computer, but it's basically unusable - because after a few seconds or so after logging in it will do a soft lockup, and I can't even switch to VT. I was, however, able to ssh in to it, which is how I was able to get the kernel log. Right as the hang happened, I can see this in the log:

Apr 11 14:04:13 homer-desktop kernel: [ 45.532038] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:45:crtc-1] flip_done timed out
Apr 11 14:04:23 homer-desktop kernel: [ 55.772028] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:37:plane-1] flip_done timed out
Apr 11 14:04:33 homer-desktop kernel: [ 66.012282] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:44:plane-7] flip_done timed out

and after that is the regular kernel crash.

I have tried this on both v4.16 and v4.16.1 with the same results. However, it doesn't happen on v4.15 (which is what I'm running now). So there must be some kind of regression between those releases.

I am running stable KDE neon (which is based on Ubuntu LTS) with precompiled kernels from the ubuntu mainline ppa.
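For anyone triaging a similar log, here is a minimal Python sketch that pulls the "flip_done timed out" entries out of a kernel log and reports the spacing between them. It is only an illustration: the names `FLIP_TIMEOUT`, `find_flip_timeouts` and `SAMPLE_LOG` are made up for this example, and the pattern assumes the message format matches the three lines quoted in this report.

```python
import re

# Assumed format: matches the drm_kms_helper lines quoted in this report.
FLIP_TIMEOUT = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+"
    r"\[drm:drm_atomic_helper_wait_for_dependencies \[drm_kms_helper\]\] "
    r"\*ERROR\* \[(?P<kind>CRTC|PLANE):(?P<id>\d+):(?P<name>[^\]]+)\] "
    r"flip_done timed out"
)


def find_flip_timeouts(log_text):
    """Return (timestamp, kind, object_id, name) for each flip_done timeout."""
    return [
        (float(m["ts"]), m["kind"], int(m["id"]), m["name"])
        for m in FLIP_TIMEOUT.finditer(log_text)
    ]


# The three lines from the report, used as sample input.
SAMPLE_LOG = """\
Apr 11 14:04:13 homer-desktop kernel: [ 45.532038] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:45:crtc-1] flip_done timed out
Apr 11 14:04:23 homer-desktop kernel: [ 55.772028] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:37:plane-1] flip_done timed out
Apr 11 14:04:33 homer-desktop kernel: [ 66.012282] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:44:plane-7] flip_done timed out
"""

events = find_flip_timeouts(SAMPLE_LOG)

# Seconds between consecutive timeouts, from the kernel timestamps.
gaps = [later[0] - earlier[0] for earlier, later in zip(events, events[1:])]

for ts, kind, obj_id, name in events:
    print(f"{ts:10.6f}s  {kind}:{obj_id} ({name})")
print("gaps between timeouts (s):", [round(g, 3) for g in gaps])
```

On the three quoted lines, both gaps come out at roughly 10.24 seconds, so the CRTC and the two planes each time out in turn rather than all at once.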