Bug 209713
Summary: amdgpu drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_link_encoder.c:483 dcn10_get_dig_frontend+0x9e/0xc0 [amdgpu] when resuming from S3 state
Product: Drivers
Reporter: Lahfa Samy (samy)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: REOPENED
Severity: low
CC: bugzilla.kernel.org, fkrueger, kmueller, oliver
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 5.8.13-arch1-1
Regression: No
Attachments:
- parts of dmesg where the call trace happens during the resume from S3 sleep state
- Trace linux 5.10.1
- Trace for 5.10.16 at boot
- Trace after resume from S3 state on 5.11.1
I cannot reproduce this call trace on the new kernel 5.9.1, so I take it this issue was silently fixed? I'll reopen the issue if I see the call trace again.

I'm still hitting this (when fbcon is initialized) with the DRM code queued for 5.10.

I'm hitting this problem too, after resume from s2ram:
- Linux 5.10.1
- CPU: AMD Ryzen 7 3750H with Radeon Vega Mobile Gfx
- Xorg 1.20
- Mesa 20.2

See attached file dcn10_get_dig_frontend.log.

Created attachment 294343: Trace linux 5.10.1 (see entry before)
Seems to be fixed for me since the last firmware update for the Picasso driver:
- xf86-video-amdgpu-19.1.0-lp152.67.5.x86_64
- kernel-firmware-20201218-lp152.36.1.noarch

The same happens on kernel 5.10.7 with kernel-firmware-20210109_d528862:
- CPU: AMD Ryzen 5 3500U with Radeon Vega Mobile Gfx (family: 0x17, model: 0x18, stepping: 0x1)
- Xorg 1.20.10
- Mesa 20.3.2

Yeah, it came back yesterday evening, after it had disappeared for about a week.

It's fixed in kernel 5.10.9 with Mesa 20.3.3.

No, same behavior as before with 5.10.9 and Mesa 20.3.3. Oops, it's the other crash now:

2021-01-23T18:45:31.955962+01:00 localhost kernel: [23110.401847] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:62:crtc-0] flip_done timed out
2021-01-23T18:45:31.955989+01:00 localhost kernel: [23110.401869] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:62:crtc-0] flip_done timed out
2021-01-23T18:45:42.709289+01:00 localhost kernel: [23121.153848] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:52:plane-3] flip_done timed out
2021-01-23T18:45:42.709318+01:00 localhost kernel: [23121.153944] ------------[ cut here ]------------
2021-01-23T18:45:42.709320+01:00 localhost kernel: [23121.154112] WARNING: CPU: 4 PID: 2627 at ../drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:7272 amdgpu_dm_atomic_commit_tail+0x22b1/0x2360 [amdgpu]

Let's wait...

The problem is back with kernel 5.10.10:

[ 89.664494] WARNING: CPU: 6 PID: 4323 at drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_link_encoder.c:483 dcn10_get_dig_frontend+0x94/0xc0 [amdgpu]

I am seeing the aforementioned warning at boot for kernel >= 5.10.10, with kernel-firmware-amdgpu-20210119 (AMD Ryzen 7 PRO 4750U). Kernel 5.10.9 does not have it.
The only change regarding "DCN" from 5.10.9 to 5.10.10 is:

commit 99ea120383b19feb1737c787dc1c8b35ce630fc5
Author: Alex Deucher <alexander.deucher@amd.com>
Date: Mon Jan 4 11:24:20 2021 -0500

drm/amdgpu/display: drop DCN support for aarch64

commit c241ed2f0ea549c18cff62a3708b43846b84dae3 upstream.

From Ard:

"Simply disabling -mgeneral-regs-only left and right is risky, given that the standard AArch64 ABI permits the use of FP/SIMD registers anywhere, and GCC is known to use SIMD registers for spilling, and may invent other uses of the FP/SIMD register file that have nothing to do with the floating point code in question. Note that putting kernel_neon_begin() and kernel_neon_end() around the code that does use FP is not sufficient here, the problem is in all the other code that may be emitted with references to SIMD registers in it.

So the only way to do this properly is to put all floating point code in a separate compilation unit, and only compile that unit with -mgeneral-regs-only."

Disable support until the code can be properly refactored to support this properly on aarch64.

Acked-by: Will Deacon <will@kernel.org>
Reported-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[ardb: backport to v5.10 by reverting c38d444e44badc55 instead]
Acked-by: Alex Deucher <alexander.deucher@amd.com> # v5.10 backport
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Any idea?

(In reply to Frank Kruger from comment #12)
> I am seeing the aforementioned warning at boot for kernel >= 5.10.10, with
> kernel-firmware-amdgpu-20210119 (AMD Ryzen 7 PRO 4750U). Kernel 5.10.9 does
> not have it.

This was originally reported for older kernels, and per comment 2, I was hitting it with the DRM code merged for 5.10 before 5.10-rc1. You probably just didn't hit it with 5.10.9 by luck.
Created attachment 295319: Trace for 5.10.16 at boot

(In reply to Frank Kruger from comment #12)
> I am seeing the aforementioned warning at boot for kernel >= 5.10.10, with
> kernel-firmware-amdgpu-20210119 (AMD Ryzen 7 PRO 4750U). […]

I can confirm this for 5.10.16. Looks like this is fixed in 5.11.0 and 5.11.1.

> Looks like this is fixed in 5.11.0 and 5.11.1.

I'm still getting this issue reliably under kernel 5.11.1 when resuming from a suspended state.

(In reply to Oliver Reeh from comment #16)
So I confirm this for 5.11.1, still not solved.

Created attachment 295425: Trace after resume from S3 state on 5.11.1
I haven't seen any problems since 2021-02-14, running Linux 5.10.16 with this patch applied: https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/tree/releases/5.9.14/revert-amd-amdgpu-disable-vcn-dpg-mode-for-picasso.patch

I hope this really fixed the problem for me. Still no problem here with the 5.11.x kernels.
Created attachment 293025: parts of dmesg where the call trace happens during the resume from S3 sleep state

I suspect this bug is a regression, since I haven't seen this call trace on kernels older than 5.8.12-arch1-1, but I have yet to confirm this. The call trace may also happen only under specific conditions: a USB-C dock is attached to my computer, and the call trace happened when the dock was plugged in and the computer was suspended, then resumed. The machine is a Lenovo ThinkPad T495 (model 20NKS28F00) with an AMD Ryzen 7 3700U and Radeon RX Vega 10 graphics.

Further comments will confirm whether the call trace happens only when the USB-C dock is plugged in, and whether it also happens on kernels older than 5.8.12-arch1-1. The computer does resume successfully, with only a brief screen glitch lasting a fraction of a second, so it's not a very severe bug.