Bug 207589

Summary: amdgpu not working with kernel 5.6.x
Product: Drivers Reporter: fannullone (bit.gossip)
Component: Video(Other) Assignee: drivers_video-other
Status: RESOLVED CODE_FIX    
Severity: normal CC: alexdeucher, andy.holst, dick
Priority: P1    
Hardware: x86-64   
OS: Linux   
URL: https://github.com/Dunedan/mbp-2016-linux/issues/142
Kernel Version: 5.6.x Subsystem:
Regression: No Bisected commit-id:
Attachments: lshw.log
hwinfo.log
dmesg log
Xorg log
built in display working for kernel v5.5.0
built in display NOT working for compiled kernel v5.6.0
dmesg.kernel.v5.6.x_commit_fb95aae6e67c4e319a24b3eea32032d4246a5335.log
dmesg.kernel.v5.5.0+.commit.8815a94f27d2f30fe1216ce10c7da0f6ae69ca0f.bad.log
Xorg.0.8815a94f27d2f30fe1216ce10c7da0f6ae69ca0f.log
Dmesg and Xorg logs after handful times bisecting the git kernel commits
2001-drm-amd-display-Force-link_rate-as-LINK_RATE_RBR2-fo.patch

Description fannullone 2020-05-05 20:42:56 UTC
The platform is an Apple MacBook Pro (MacBookPro13,3), which is equipped with two GPUs:
[mbp ~]$ lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 530 (rev 06)
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon RX 460/560D / Pro 450/455/460/555/555X/560/560X] (rev ef)
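For anyone collecting the same information from several machines, here is a minimal sketch that pulls the slot, device name, and revision out of `lspci` lines like the two above. The regular expression and the function name `parse_vga_controllers` are illustrative assumptions, written only against the output format shown in this report.

```python
import re

# Sample copied from the lspci output in this report.
LSPCI_OUTPUT = """\
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 530 (rev 06)
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon RX 460/560D / Pro 450/455/460/555/555X/560/560X] (rev ef)
"""

# Hypothetical pattern: slot, fixed class string, device name, optional "(rev xx)".
LINE_RE = re.compile(
    r"^(?P<slot>\S+) VGA compatible controller: "
    r"(?P<name>.+?)(?: \(rev (?P<rev>[0-9a-f]+)\))?$"
)


def parse_vga_controllers(text):
    """Return a (slot, name, revision) tuple for each VGA controller line."""
    controllers = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            controllers.append((match["slot"], match["name"], match["rev"]))
    return controllers


gpus = parse_vga_controllers(LSPCI_OUTPUT)
# The discrete GPU is the entry whose name mentions the AMD/ATI vendor tag.
amd_gpus = [gpu for gpu in gpus if "[AMD/ATI]" in gpu[1]]
```

Here `amd_gpus` holds the single Baffin entry at slot `01:00.0`, which is the device amdgpu drives on this machine.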

It worked fine with amdgpu up to kernel 5.5.*, then broke with kernel 5.6.
Comment 1 Alex Deucher 2020-05-07 19:44:45 UTC
Please attach your dmesg output and xorg log (if using X).  Can you bisect?
Comment 2 Andy Holst 2020-05-07 20:38:46 UTC
Created attachment 288983 [details]
lshw.log

lshw log from my MacBookPro13,3 with kernel 5.7 rc3
Comment 3 Andy Holst 2020-05-07 20:40:55 UTC
Created attachment 288985 [details]
hwinfo.log

hwinfo from my MacBookPro13,3 on kernel 5.7 rc3
Comment 4 Andy Holst 2020-05-07 20:43:58 UTC
Created attachment 288987 [details]
dmesg log

kernel 5.7 rc3
Comment 5 Andy Holst 2020-05-07 20:45:22 UTC
Created attachment 288989 [details]
Xorg log

kernel 5.7 rc3
Comment 6 Dick Marinus 2020-05-09 05:53:26 UTC
Thanks for your reply Alex! I've bisected the problem to:

b9f1246df179522bc28fda50b720553c845863db is the first bad commit
commit b9f1246df179522bc28fda50b720553c845863db
Author: Noah Abradjian <noah.abradjian@amd.com>
Date:   Fri Nov 22 16:07:24 2019 -0500

    drm/amd/display: Collapse resource arrays when pipe is disabled
    
    [Why]
    Currently, pipe resources are assigned to an index that matches the pipe position.
    However, if pipe 1 or 2 is disabled, there will be a gap in the arrays which causes a crash when iterating based on pipe_count.
    
    [How]
    Fix resource construct to assign resources to minimum available array index.
    
    Signed-off-by: Noah Abradjian <noah.abradjian@amd.com>
    Reviewed-by: Yongqiang Sun <yongqiang.sun@amd.com>
    Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

 .../gpu/drm/amd/display/dc/dcn21/dcn21_resource.c    | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b9f1246df179522bc28fda50b720553c845863db
Comment 7 Alex Deucher 2020-05-09 13:22:42 UTC
I doubt that commit is the culprit. It changes a file that is not even used on your ASIC. Can you attach your dmesg output from a kernel where it is working as well?
Comment 8 Andy Holst 2020-05-09 17:24:09 UTC
Created attachment 289023 [details]
built in display working for kernel v5.5.0
Comment 9 Andy Holst 2020-05-09 17:25:34 UTC
Created attachment 289025 [details]
built in display NOT working for compiled kernel v5.6.0
Comment 10 Dick Marinus 2020-05-09 19:02:46 UTC
> I doubt that commit is the culprit. It changes a file that is not even used
> on your ASIC.

I was afraid of that; I found out that bisecting isn't that easy. I might have been hunting ghosts: I marked booting with a black screen as "bad" and booting with a working framebuffer as "good".

I've used linux-stable for bisecting so far; I can use another git repository or other ranges if you'd like.
Comment 11 Andy Holst 2020-05-09 20:16:14 UTC
After bisecting a few times (with more steps to go), I found that commit fb95aae6e67c4e319a24b3eea32032d4246a5335 (v5.6.0-rc1) works with the built-in display.
Comment 12 Andy Holst 2020-05-09 20:17:05 UTC
Created attachment 289033 [details]
dmesg.kernel.v5.6.x_commit_fb95aae6e67c4e319a24b3eea32032d4246a5335.log
Comment 13 Andy Holst 2020-05-09 22:31:16 UTC
Created attachment 289037 [details]
dmesg.kernel.v5.5.0+.commit.8815a94f27d2f30fe1216ce10c7da0f6ae69ca0f.bad.log

Commit 8815a94f27d2f30fe1216ce10c7da0f6ae69ca0f causes a black built-in display; however, the display still seems to be identified according to the Xorg log.
Comment 14 Andy Holst 2020-05-09 22:32:12 UTC
Created attachment 289039 [details]
Xorg.0.8815a94f27d2f30fe1216ce10c7da0f6ae69ca0f.log

Xorg.0.8815a94f27d2f30fe1216ce10c7da0f6ae69ca0f.log
Comment 16 Andy Holst 2020-05-11 00:28:57 UTC
Created attachment 289065 [details]
Dmesg and Xorg logs after handful times bisecting the git kernel commits

After bisecting the kernel git commits a handful of times, with v5.5.0 set as the good commit and v5.6.0 as the bad commit, git reports that the first bad commit is:

------------------------------------------------------------------------------

f33a8770cdda79031a22241eaaac4eaf66e304fb is the first bad commit
commit f33a8770cdda79031a22241eaaac4eaf66e304fb
Author: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Date:   Fri Dec 6 12:43:30 2019 -0500

    drm/amdgpu: Add task barrier to XGMI hive.
    
    Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
    Reviewed-by: Le Ma <Le.Ma@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

:040000 040000 664928e1bc23a51cd51d7dc2d40be9af9b332beb 071a99250da92b900dd06f3d7183d0df4abe6653 M      drivers


------------------------------------------------------------------------------

There were two git commits that I couldn't compile. I marked the first one I couldn't compile as good and the second one as bad.

I hope the dmesg and Xorg logs from the kernels built at the bisected git commits can give some clue as to what is going wrong with the built-in display on the MacBookPro13,3.
Comment 17 Andy Holst 2020-05-11 00:34:11 UTC
The two git commits that I couldn't compile are also included in the tar archive.
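A note on commits that cannot be built: the bisections in the comments around this point assign them a guessed verdict, whereas `git bisect skip` exists to mark a commit as untestable. The toy model below is a sketch only (a simplified linear history, not git's actual selection algorithm, with made-up commit indices) showing how a guessed verdict can steer bisection to the wrong commit, while skipping yields a short list of candidates that still contains the real one.

```python
def bisect(test, n, on_untestable="skip"):
    """Bisect a linear history of n commits (0 is known good, n - 1 known bad).

    test(i) returns "good", "bad", or None when commit i cannot be built.
    on_untestable is "skip", or a guessed verdict ("good" or "bad").
    Returns the list of candidate first-bad commits.
    """
    good, bad = 0, n - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        # Try the untested commits in order of distance from the midpoint.
        candidates = sorted(range(good + 1, bad), key=lambda i: abs(i - mid))
        for i in candidates:
            verdict = test(i)
            if verdict is None and on_untestable != "skip":
                verdict = on_untestable  # guess instead of skipping
            if verdict is not None:
                break
        else:
            # Only untestable commits remain between good and bad.
            return list(range(good + 1, bad + 1))
        if verdict == "good":
            good = i
        else:
            bad = i
    return [bad]


def make_test(first_bad, unbuildable):
    """Build a test function for a hypothetical history."""
    def test(i):
        if i in unbuildable:
            return None
        return "bad" if i >= first_bad else "good"
    return test


# History A: the regression lands in commit 4, but commits 4 and 5 do not build.
history_a = make_test(first_bad=4, unbuildable={4, 5})
guess_good_a = bisect(history_a, 10, on_untestable="good")  # [6], the wrong commit
skip_a = bisect(history_a, 10)                              # [4, 5, 6]

# History B: the regression lands in commit 7, and commit 4 does not build.
history_b = make_test(first_bad=7, unbuildable={4})
guess_bad_b = bisect(history_b, 10, on_untestable="bad")    # [4], the wrong commit
skip_b = bisect(history_b, 10)                              # [7]
```

In both histories the guessed verdict names a commit that is not the culprit, which is one possible reason the bisections in this report landed on different commits.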
Comment 18 Andy Holst 2020-05-12 06:50:12 UTC
I bisected the commits a second time between good v5.5.0 and bad v5.6.0, and this time I counted every commit I couldn't compile as bad.

The first bad commit I got this time is

Author: Roman Li <roman.li@amd.com>
Date:   Fri Nov 22 10:58:10 2019 -0500

    drm/amd/display: Default max bpc to 16 for eDP
    
    [Why]
    Some 10bit eDP panels don't lightup after we cap bpc to 8.
    
    [How]
    Set default max_bpc to 16 for edp connector type.
    
    Signed-off-by: Roman Li <roman.li@amd.com>
    Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
    Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

:040000 040000 f23cf38da6c12011608bdcdda5cf6e0f63628254 f108cc1e2e108ce444439a1d05ac1a0b0a228562 M	drivers
Comment 19 Andy Holst 2020-05-12 06:53:52 UTC
Right, the hash of the first bad commit is 4a8ca46bae8affba063aabac85a0b1401ba810a3.
Comment 20 Dick Marinus 2020-07-22 18:57:52 UTC
Created attachment 290453 [details]
2001-drm-amd-display-Force-link_rate-as-LINK_RATE_RBR2-fo.patch

Aun-Ali Zaidi created this patch and this seems to fix the issue on my system.
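The thread does not spell out how the bisected "max bpc" commit and a forced link rate relate, so the following is only a back-of-the-envelope sketch of the arithmetic involved: how the uncompressed stream rate grows with bits per component, compared with what a DisplayPort link can carry at a given rate. The pixel clock and lane count are hypothetical placeholders, not values read from this panel, and the `0x0C` code for `LINK_RATE_RBR2` is an assumption based on the amdgpu display code (3.24 Gbit/s per lane).

```python
# DPCD link-rate codes count in units of 0.27 Gbit/s per lane.
LINK_RATE_UNIT_MBPS = 270
LINK_RATE_RBR2 = 0x0C  # assumed value: 12 * 0.27 = 3.24 Gbit/s per lane


def link_capacity_mbps(link_rate_code, lanes):
    """Usable payload rate of the main link, accounting for 8b/10b coding."""
    raw = link_rate_code * LINK_RATE_UNIT_MBPS * lanes
    return raw * 8 // 10


def stream_rate_mbps(pixel_clock_mhz, bpc):
    """Uncompressed RGB stream rate: three colour components per pixel."""
    return pixel_clock_mhz * 3 * bpc


HYPOTHETICAL_PIXEL_CLOCK_MHZ = 300  # illustrative only, not from the panel's EDID
HYPOTHETICAL_LANES = 4              # illustrative only

capacity = link_capacity_mbps(LINK_RATE_RBR2, HYPOTHETICAL_LANES)
fits = {
    bpc: stream_rate_mbps(HYPOTHETICAL_PIXEL_CLOCK_MHZ, bpc) <= capacity
    for bpc in (8, 10, 16)
}
```

With these placeholder numbers the link carries 10368 Mbit/s, which is enough for 8 and 10 bpc but not for 16 bpc. Whether that is what actually happens on the MacBookPro13,3 would have to be checked against the dmesg logs attached here.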
Comment 21 Alex Deucher 2020-07-22 19:54:17 UTC
There is already a similar fix upstream:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=dec9de2ada523b344eb2428abfedf9d6cd0a0029
Does that patch fix the issue?
Comment 22 Andy Holst 2020-07-23 00:25:30 UTC
I can confirm that the patch https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=dec9de2ada523b344eb2428abfedf9d6cd0a0029, included in kernel version 5.7.10, makes the built-in display work again on MBP model 13,3.
Comment 23 fannullone 2020-08-09 16:15:42 UTC
I can confirm that my MBP 13,3 is running again with amdgpu after upgrading to 5.7.12-200.fc32.x86_64.
I can also use the built-in screen together with an external monitor just fine.
Thanks, everybody, for the great job!
Comment 24 Andy Holst 2020-08-13 04:22:12 UTC
Actually, commit 639e0db2d70fb84833d96e782cc4a01825e03b13, included in v5.8, seems to be the one that fixes the issue, not https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=dec9de2ada523b344eb2428abfedf9d6cd0a0029 as suggested.
Comment 25 Alex Deucher 2020-08-13 04:28:03 UTC
This bug can be closed.
Comment 26 Andy Holst 2020-08-13 22:43:18 UTC
Indeed, the bug can be closed since the built-in display is working again for kernel version 5.7.8+.
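For anyone who wants to check whether a given machine is already past the version mentioned above, here is a small sketch that compares a kernel release string against 5.7.8. The threshold is taken from this comment as reported and has not been verified independently; the helper names are illustrative, and distribution kernels may carry backports that a plain version comparison cannot see.

```python
import re

FIXED_SINCE = (5, 7, 8)  # as reported in this thread, not independently verified


def kernel_version_tuple(release):
    """Extract (major, minor, patch) from a string like '5.7.12-200.fc32.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def has_reported_fix(release):
    """True if the release is at or above the version reported as working."""
    return kernel_version_tuple(release) >= FIXED_SINCE


if __name__ == "__main__":
    import platform

    running = platform.release()  # release string of the running kernel
    print(running, "has reported fix:", has_reported_fix(running))
```

Calling `has_reported_fix("5.7.12-200.fc32.x86_64")` returns `True`, while `has_reported_fix("5.6.0")` returns `False`.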
Comment 27 fannullone 2020-08-14 14:54:28 UTC
Working great, thanks everybody who contributed to the fix!