Bug 209713

Summary: amdgpu drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_link_encoder.c:483 dcn10_get_dig_frontend+0x9e/0xc0 [amdgpu] when resuming from S3 state
Product: Drivers
Reporter: Lahfa Samy (samy)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: REOPENED
Severity: low
CC: bugzilla.kernel.org, fkrueger, kmueller, oliver
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 5.8.13-arch1-1
Subsystem:
Regression: No
Bisected commit-id:
Attachments:
  parts of dmesg where the call trace happens during the resume from S3 sleep state.
  Trace linux 5.10.1
  Trace for 5.10.16 at boot
  Trace after resume from S3 state on 5.11.1

Description Lahfa Samy 2020-10-16 09:31:00 UTC
Created attachment 293025 [details]
parts of dmesg where the call trace happens during the resume from S3 sleep state.

I suspect this bug is a regression, since I have not seen this call trace on kernels older than 5.8.12-arch1-1, but I have yet to confirm this.

The call trace may also happen only under specific conditions: my computer has a USB-C dock plugged in, and the call trace appeared when the computer was suspended and then resumed with the dock connected.

It is a Lenovo ThinkPad T495 model 20NKS28F00 with an AMD Ryzen 7 3700U and a Radeon RX Vega 10 GPU.

I will follow up in further comments on whether the call trace happens only when the USB-C dock is plugged in, and whether it also happens on kernels older than 5.8.12-arch1-1.

The computer does resume successfully, with only a minor screen glitch lasting a moment, so it's not a very severe bug.
Comment 1 Lahfa Samy 2020-10-23 04:21:49 UTC
I cannot reproduce this call trace on the new kernel 5.9.1, so I assume this issue was silently fixed?
I'll reopen the issue if the call trace shows up again.
Comment 2 Michel Dänzer 2020-10-23 08:12:50 UTC
I'm still hitting this (when fbcon is initialized) with the DRM code queued for 5.10.
Comment 3 Klaus Mueller 2020-12-26 09:22:06 UTC
I'm hitting this problem, too, after resume from s2ram.

- Linux 5.10.1
- CPU: AMD Ryzen 7 3750H with Radeon Vega Mobile Gfx
- Xorg 1.20
- Mesa 20.2

See attached file dcn10_get_dig_frontend.log
Comment 4 Klaus Mueller 2020-12-26 09:23:54 UTC
Created attachment 294343 [details]
Trace linux 5.10.1

See the previous comment.
Comment 5 Klaus Mueller 2021-01-11 21:27:37 UTC
Seems to be fixed for me since the last firmware update for the Picasso driver:
- xf86-video-amdgpu-19.1.0-lp152.67.5.x86_64
- kernel-firmware-20201218-lp152.36.1.noarch
Comment 6 Oliver Reeh 2021-01-13 19:57:31 UTC
The same happens on kernel 5.10.7 with kernel-firmware-20210109_d528862.

- CPU: AMD Ryzen 5 3500U with Radeon Vega Mobile Gfx (family: 0x17, model: 0x18, stepping: 0x1)
- Xorg 1.20.10
- Mesa 20.3.2
Comment 7 Klaus Mueller 2021-01-13 20:44:15 UTC
Yeah, it came back yesterday evening, after it had disappeared for about one week.
Comment 8 Oliver Reeh 2021-01-20 20:43:49 UTC
It's fixed in kernel 5.10.9 with Mesa 20.3.3.
Comment 9 Klaus Mueller 2021-01-23 18:04:44 UTC
No, same behavior as before with 5.10.9 and Mesa 20.3.3.
Comment 10 Klaus Mueller 2021-01-23 18:10:35 UTC
Oops, it's the other crash now:
2021-01-23T18:45:31.955962+01:00 localhost kernel: [23110.401847] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:62:crtc-0] flip_done timed out
2021-01-23T18:45:31.955989+01:00 localhost kernel: [23110.401869] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:62:crtc-0] flip_done timed out
2021-01-23T18:45:42.709289+01:00 localhost kernel: [23121.153848] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:52:plane-3] flip_done timed out
2021-01-23T18:45:42.709318+01:00 localhost kernel: [23121.153944] ------------[ cut here ]------------
2021-01-23T18:45:42.709320+01:00 localhost kernel: [23121.154112] WARNING: CPU: 4 PID: 2627 at ../drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:7272 amdgpu_dm_atomic_commit_tail+0x22b1/0x2360 [amdgpu]

let's wait ...
Comment 11 Oliver Reeh 2021-01-24 09:03:14 UTC
The problem is back with kernel 5.10.10.

[   89.664494] WARNING: CPU: 6 PID: 4323 at drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_link_encoder.c:483 dcn10_get_dig_frontend+0x94/0xc0 [amdgpu]
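
For anyone who wants to check whether their own log shows the same warning site, here is a minimal sketch that pulls the source file, line number, and function out of a kernel WARNING line. It only assumes the line format seen in the traces quoted in this report; the names `WARN_RE` and `parse_warning` are made up for illustration and are not part of any kernel tooling.

```python
import re

# Pattern for the "WARNING: CPU: N PID: N at path:line func+off/size [module]"
# lines quoted in this report. Using search() lets it skip any leading
# syslog timestamp or "[seconds]" prefix.
WARN_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<path>\S+):(?P<line>\d+) "
    r"(?P<func>\w+)\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
    r"(?: \[(?P<module>\w+)\])?"
)


def parse_warning(log_line):
    """Return a dict describing the warning site, or None if no match."""
    match = WARN_RE.search(log_line)
    if match is None:
        return None
    info = match.groupdict()
    # Convert the purely numeric fields to integers.
    for key in ("cpu", "pid", "line"):
        info[key] = int(info[key])
    # Keep only the file name; the path may contain "../" hops.
    info["file"] = info["path"].rsplit("/", 1)[-1]
    return info
```

Feeding it the line above should yield `dcn10_link_encoder.c`, line 483, and `dcn10_get_dig_frontend`, which makes it easy to tell this warning apart from the `amdgpu_dm.c` one in comment 10.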
Comment 12 Frank Kruger 2021-01-31 10:37:30 UTC
I am seeing the aforementioned warning at boot for kernel >= 5.10.10, with kernel-firmware-amdgpu-20210119 (AMD Ryzen 7 PRO 4750U). Kernel 5.10.9 does not have it.
Comment 13 Frank Kruger 2021-01-31 11:11:41 UTC
The only change regarding "DCN" from 5.10.9 to 5.10.10 is

commit 99ea120383b19feb1737c787dc1c8b35ce630fc5
Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Mon Jan 4 11:24:20 2021 -0500

    drm/amdgpu/display: drop DCN support for aarch64
    
    commit c241ed2f0ea549c18cff62a3708b43846b84dae3 upstream.
    
    From Ard:
    
    "Simply disabling -mgeneral-regs-only left and right is risky, given that
    the standard AArch64 ABI permits the use of FP/SIMD registers anywhere,
    and GCC is known to use SIMD registers for spilling, and may invent
    other uses of the FP/SIMD register file that have nothing to do with the
    floating point code in question. Note that putting kernel_neon_begin()
    and kernel_neon_end() around the code that does use FP is not sufficient
    here, the problem is in all the other code that may be emitted with
    references to SIMD registers in it.
    
    So the only way to do this properly is to put all floating point code in
    a separate compilation unit, and only compile that unit with
    -mgeneral-regs-only."
    
    Disable support until the code can be properly refactored to support this
    properly on aarch64.
    
    Acked-by: Will Deacon <will@kernel.org>
    Reported-by: Ard Biesheuvel <ardb@kernel.org>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    [ardb: backport to v5.10 by reverting c38d444e44badc55 instead]
    Acked-by: Alex Deucher <alexander.deucher@amd.com> # v5.10 backport
    Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Any idea?
Comment 14 Michel Dänzer 2021-01-31 11:32:30 UTC
(In reply to Frank Kruger from comment #12)
> I am seeing the aforementioned warning at boot for kernel >= 5.10.10, with
> kernel-firmware-amdgpu-20210119 (AMD Ryzen 7 PRO 4750U). Kernel 5.10.9 does
> not have it.

This was originally reported for older kernels, and per comment 2, I was hitting it with the DRM code merged for 5.10 before 5.10-rc1. You probably just didn't hit it with 5.10.9 by luck.
Comment 15 Erik Quaeghebeur 2021-02-16 15:41:44 UTC
Created attachment 295319 [details]
Trace for 5.10.16 at boot

(In reply to Frank Kruger from comment #12)
> I am seeing the aforementioned warning at boot for kernel >= 5.10.10, with
> kernel-firmware-amdgpu-20210119 (AMD Ryzen 7 PRO 4750U). […]
I can confirm this for 5.10.16.
Comment 16 Oliver Reeh 2021-02-24 20:13:23 UTC
Looks like this is fixed in 5.11.0 and 5.11.1.
Comment 17 Lahfa Samy 2021-02-25 00:00:28 UTC
(In reply to Oliver Reeh from comment #16)
> Looks like this is fixed in 5.11.0 and 5.11.1.

I'm still getting this issue reliably on kernel 5.11.1 when resuming from suspend, so I can confirm it is still not solved in 5.11.1.
Comment 18 Lahfa Samy 2021-02-25 00:02:45 UTC
Created attachment 295425 [details]
Trace after resume from S3 state on 5.11.1
Comment 19 Klaus Mueller 2021-02-26 19:47:50 UTC
I haven't seen the problem since 2021-02-14, running Linux 5.10.16 with this patch applied: https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/tree/releases/5.9.14/revert-amd-amdgpu-disable-vcn-dpg-mode-for-picasso.patch

Hope that this really fixed the problem for me.
Comment 20 Oliver Reeh 2021-03-01 20:46:51 UTC
Still no problem here with the 5.11.x kernels.