Created attachment 296399
My .config

Hello,

I have tested this bug and found that it occurs from the bisected commit below through Linux kernel version 5.11.12. I am running Devuan (Debian) Linux with a hand-created kernel config, which I'm attaching.

I've bisected the kernel and found the broken commit. Here's how I got there, in case you're curious:

for i in "start v4.19-rc1 v4.18" bad good good skip skip good good skip good skip good skip good good bad good good bad good bad good bad bad good; do git bisect $i; done

The broken commit is this one:

commit 8eaf2b1faaf4358c6337785f2192055c6ef41e0d
Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Mon Jul 2 14:35:36 2018 -0500

    drm/amdgpu: switch firmware path for SI parts

    Use separate firmware path for amdgpu to avoid conflicts
    with radeon on SI parts.

    Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
    Reviewed-by: Christian König <christian.koenig@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

 drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c | 56 +++++++++++++++++------------------
 drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c | 14 ++++-----
 drivers/gpu/drm/amd/amdgpu/si_dpm.c   | 22 +++++++-------
 3 files changed, 46 insertions(+), 46 deletions(-)

The last few messages before the whole system freezes up or powers down, including the kernel's networking and USB subsystems, are:

#15 #16 #17 #18 #19 #20 #21 #22 #23

I think that's telling me that I have 24 threads in my 3900X.

During boot, I don't even reach init. The whole system freezes. I don't see any boot messages that suggest a problem with anything, and I see only a very few messages at all. I did enable early printk.

Steps to reproduce:
1: Build kernel with my config on a bad commit.
2: Boot new kernel (with an AMD HD7770 GPU installed, of course).
3: profit.

If you even think of suggesting I upgrade to a newer GPU, recall also that there are none in stock, and any GPU that is in stock is priced unbelievably high. I'd love a newer GPU, but I'm going to have to wait, like I have been for years, for RDNA's big Navi. (I was hoping it was compute/gaming oriented like a Titan. It looks like it's not. Grrrr, more waiting...)

Thanks,
David
I suspect the kernel is stalled looking for firmware that it can't find. Do you have the firmware in the new location in your initrd or filesystem? That is, the files moved from radeon/ to amdgpu/. Check whether the firmware for your card exists in /lib/firmware/amdgpu/, or wherever your distro puts the firmware, and make sure your initrd is up to date.
Because this is a custom kernel, I decided to build the FW into the binary. A quick grep of my config would have told you that:

CONFIG_EXTRA_FIRMWARE="radeon/verde_ce.bin radeon/verde_mc.bin radeon/verde_me.bin radeon/verde_pfp.bin radeon/verde_rlc.bin radeon/verde_smc.bin radeon/TAHITI_uvd.bin"

For some reason, when my system booted during the 4.14 series (I think) that I was using, it wanted to load the TAHITI FW also. My card is a Cape Verde, so I also included one of the TAHITI files.
Just in case the path changed, I ran a quick ls for you:

% ls /lib/firmware/{radeon/verde_ce.bin,radeon/verde_mc.bin,radeon/verde_me.bin,radeon/verde_pfp.bin,radeon/verde_rlc.bin,radeon/verde_smc.bin,radeon/TAHITI_uvd.bin}
/lib/firmware/radeon/TAHITI_uvd.bin
/lib/firmware/radeon/verde_me.bin
/lib/firmware/radeon/verde_smc.bin
/lib/firmware/radeon/verde_ce.bin
/lib/firmware/radeon/verde_pfp.bin
/lib/firmware/radeon/verde_mc.bin
/lib/firmware/radeon/verde_rlc.bin

It's all there.
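The manual ls above can be scripted. The following is only a sketch: the function names are made up for illustration, and it assumes the .config holds CONFIG_EXTRA_FIRMWARE on a single line with space-separated entries, as in the config quoted in this report. It lists the built-in firmware entries, checks that each exists under a firmware root (assumed to default to /lib/firmware), and prints the entries that sit outside a given subdirectory such as amdgpu.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the kernel tree or any distro package.

# Print each CONFIG_EXTRA_FIRMWARE entry from a kernel .config, one per line.
# $1: path to the .config file
list_extra_firmware() {
    sed -n 's/^CONFIG_EXTRA_FIRMWARE="\(.*\)"$/\1/p' "$1" | tr ' ' '\n' | sed '/^$/d'
}

# Report whether each entry exists under the firmware root.
# $1: path to .config, $2: firmware root (assumption: /lib/firmware by default)
# Returns non-zero if any entry is missing.
check_extra_firmware() {
    config=$1
    root=${2:-/lib/firmware}
    status=0
    for fw in $(list_extra_firmware "$config"); do
        if [ -f "$root/$fw" ]; then
            echo "ok $fw"
        else
            echo "MISSING $fw"
            status=1
        fi
    done
    return "$status"
}

# Print the entries that are NOT under the given subdirectory.
# $1: path to .config, $2: subdirectory name, e.g. amdgpu
firmware_outside_dir() {
    list_extra_firmware "$1" | grep -v "^$2/"
}
```

For example, `firmware_outside_dir .config amdgpu` would print every radeon/ entry from the config quoted earlier. A file merely existing on disk does not show that it is the path the driver requests, which is why the directory prefix is worth checking separately.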
Having thought about it, it's possible that more than 1 TAHITI*.bin file is desired by the amdgpu driver. It's even possible that all of the radeon firmware is desired by the amdgpu driver. What do you think?
Just out of curiosity, I decided to look at the firmware and see if there were any differences in the count, the naming, or the binary data.

% ls /lib/firmware/radeon/ | grep -i verde
VERDE_ce.bin
VERDE_mc.bin
VERDE_mc2.bin
VERDE_me.bin
VERDE_pfp.bin
VERDE_rlc.bin
VERDE_smc.bin
verde_ce.bin
verde_k_smc.bin
verde_mc.bin
verde_me.bin
verde_pfp.bin
verde_rlc.bin
verde_smc.bin

% ls /lib/firmware/amdgpu/ | grep -i verde
verde_ce.bin
verde_k_smc.bin
verde_mc.bin
verde_me.bin
verde_pfp.bin
verde_rlc.bin
verde_smc.bin

% diff /lib/firmware/{radeon,amdgpu}/verde_ce.bin
Binary files /lib/firmware/radeon/verde_ce.bin and /lib/firmware/amdgpu/verde_ce.bin differ
% diff /lib/firmware/{radeon,amdgpu}/verde_k_smc.bin
% diff /lib/firmware/{radeon,amdgpu}/verde_mc.bin
% diff /lib/firmware/{radeon,amdgpu}/verde_me.bin
Binary files /lib/firmware/radeon/verde_me.bin and /lib/firmware/amdgpu/verde_me.bin differ
% diff /lib/firmware/{radeon,amdgpu}/verde_pfp.bin
Binary files /lib/firmware/radeon/verde_pfp.bin and /lib/firmware/amdgpu/verde_pfp.bin differ
% diff /lib/firmware/{radeon,amdgpu}/verde_rlc.bin
Binary files /lib/firmware/radeon/verde_rlc.bin and /lib/firmware/amdgpu/verde_rlc.bin differ

I'll rebuild my kernel and test with the different firmware and see if that changes anything.
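The file-by-file diff above can be done in one loop. This is a minimal sketch with a hypothetical function name; it uses `cmp -s` for a byte-for-byte comparison and only looks at files whose names start with the given prefix and end in .bin. The glob is case-sensitive, so on a typical Linux filesystem a prefix of verde would not pick up the upper-case VERDE_* files.

```shell
#!/bin/sh
# Hypothetical helper: compare same-named firmware files in two directories.
# $1: first directory, $2: second directory, $3: filename prefix (e.g. verde)
compare_firmware() {
    dir_a=$1
    dir_b=$2
    prefix=$3
    for path in "$dir_a"/"$prefix"*.bin; do
        [ -e "$path" ] || continue          # the glob matched nothing
        name=${path##*/}                    # strip the directory part
        if [ ! -e "$dir_b/$name" ]; then
            echo "only-in-first $name"
        elif cmp -s "$path" "$dir_b/$name"; then
            echo "same $name"
        else
            echo "differ $name"
        fi
    done
}
```

It could be called as `compare_firmware /lib/firmware/radeon /lib/firmware/amdgpu verde`. Files that exist only in the second directory are not reported; swapping the arguments covers that case.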
Ok, I've rebuilt the 4.18 kernel above, and it works with the firmware from the amdgpu directory instead of the firmware from the radeon directory. However, the 5.11 series kernel still exhibits the above bug. I'm going to rebuild it with the latest firmware rather than the firmware from 2019 that Devuan (Debian) Linux offers in its package manager.

It should be noted that including all the FW from the amdgpu and radeon dirs causes the kernel to boot noticeably slower. It takes about 5-10s to load the correct firmware and continue on to init. Therefore, I hope to come up with a more minimal FW config in the future.
Thanks for the help! I would never have guessed that the FW differed between dirs. I also wouldn't normally guess that it would change for a card as old as mine.

Solution: Use the FW from the amdgpu dir as opposed to the radeon dir. Build the kernel with the latest FW.
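To apply the solution, the CONFIG_EXTRA_FIRMWARE line needs amdgpu/ paths. The sketch below, with a hypothetical function name, builds such a line from whatever files matching a prefix are present under a subdirectory of the firmware root. Which of those files the driver actually requests for a given card is not established in this report, so the result is a starting point to trim rather than a known-minimal list.

```shell
#!/bin/sh
# Hypothetical helper: print a CONFIG_EXTRA_FIRMWARE line for every
# <subdir>/<prefix>*.bin file found under the firmware root.
# $1: firmware root (e.g. /lib/firmware), $2: subdirectory (e.g. amdgpu),
# $3: filename prefix (e.g. verde)
extra_firmware_line() {
    root=$1
    subdir=$2
    prefix=$3
    entries=""
    for path in "$root/$subdir/$prefix"*.bin; do
        [ -e "$path" ] || continue          # the glob matched nothing
        # Append "subdir/filename", space-separated after the first entry.
        entries="${entries:+$entries }$subdir/${path##*/}"
    done
    printf 'CONFIG_EXTRA_FIRMWARE="%s"\n' "$entries"
}
```

A call such as `extra_firmware_line /lib/firmware amdgpu verde` prints a line that can be pasted into the .config. The paths in it are resolved relative to CONFIG_EXTRA_FIRMWARE_DIR at build time, so that option should point at the same firmware root.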