Bug 205521

Summary: 5.3.11 update broke AMDGPU Raven Ridge
Product: Drivers
Component: Video(DRI - non Intel)
Status: RESOLVED UNREPRODUCIBLE
Severity: blocking
Priority: P1
Hardware: x86-64
OS: Linux
URL: https://bugzilla.redhat.com/show_bug.cgi?id=1772313
Kernel Version: 5.3.11 and up
Subsystem:
Regression: Yes
Bisected commit-id:
Reporter: Luya Tshimbalanga (luya)
Assignee: drivers_video-dri
CC: alexdeucher
Attachments:
  dmesg reporting broken amd raven ridge firmware
  dmesg with the latest git snapshot

Description Luya Tshimbalanga 2019-11-14 06:15:28 UTC
Created attachment 285903
dmesg reporting broken amd raven ridge firmware

AMD Raven Ridge firmware is currently broken with the recent stable kernel release, resulting in a blank screen on boot and preventing the system from reaching the login screen in either graphical or text mode.

Extract from boot:

nov 13 13:53:55 kernel: amdgpu 0000:03:00.0: Direct firmware load for amdgpu/raven_gpu_info.bin failed with error -2
nov 13 13:53:55 kernel: amdgpu 0000:03:00.0: Failed to load gpu_info firmware "amdgpu/raven_gpu_info.bin"
nov 13 13:53:55 kernel: amdgpu 0000:03:00.0: Fatal error during GPU init
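The "error -2" in the first log line is a negated errno value, which is how kernel functions report failure; errno 2 is ENOENT, meaning the firmware file was not found. A small Python sketch for decoding such codes (the helper name describe_kernel_error is hypothetical, not part of any kernel tool):

```python
import errno
import os


def describe_kernel_error(code: int) -> str:
    """Translate a negative kernel error code, as printed in dmesg, to its errno name."""
    num = -code  # kernel functions return negated errno values, e.g. -2
    name = errno.errorcode.get(num, "UNKNOWN")
    return f"{code} = -{name} ({os.strerror(num)})"


print(describe_kernel_error(-2))
```

On Linux this prints `-2 = -ENOENT (No such file or directory)`.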
Comment 1 Luya Tshimbalanga 2019-11-14 06:16:43 UTC
Created attachment 285905
dmesg with the latest git snapshot

The latest kernel git snapshot is also affected.
Comment 2 Luya Tshimbalanga 2019-11-14 07:40:03 UTC
Added a similar report from freedesktop.org.
Comment 3 Alex Deucher 2019-11-14 13:56:16 UTC
(In reply to Luya Tshimbalanga from comment #0)
> 
> nov 13 13:53:55 kernel: amdgpu 0000:03:00.0: Direct firmware load for
> amdgpu/raven_gpu_info.bin failed with error -2
> nov 13 13:53:55 kernel: amdgpu 0000:03:00.0: Failed to load gpu_info
> firmware "amdgpu/raven_gpu_info.bin"
> nov 13 13:53:55 kernel: amdgpu 0000:03:00.0: Fatal error during GPU init

The kernel is not able to find the firmware image.  If you are using an initrd, please make sure to include the firmware in the initrd.  If you are building the driver into the kernel, you need to build the firmware into the kernel as well.
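A first check along these lines is whether the file the driver requested exists on disk at all. A minimal Python sketch, assuming the default /lib/firmware search root (the kernel also searches other directories, such as /lib/firmware/updates) and checking only the one file named in the log; the helper name missing_firmware is hypothetical. Note that a file present on the root filesystem can still be absent from the initrd, which is the case described above.

```python
from pathlib import Path

# Assumption: only the file named in the dmesg extract is listed here; the
# driver requests more files for this chip than this one.
REQUIRED = ["amdgpu/raven_gpu_info.bin"]


def missing_firmware(root: str, names: list[str]) -> list[str]:
    """Return the firmware names that are not regular files under root."""
    base = Path(root)
    return [name for name in names if not (base / name).is_file()]


if __name__ == "__main__":
    # /lib/firmware is the usual default search root (assumption for this sketch).
    missing = missing_firmware("/lib/firmware", REQUIRED)
    print("missing:", missing or "none")
```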
Comment 4 Luya Tshimbalanga 2019-11-14 16:58:15 UTC
(In reply to Alex Deucher from comment #3)
> (In reply to Luya Tshimbalanga from comment #0)
> > 
> > nov 13 13:53:55 kernel: amdgpu 0000:03:00.0: Direct firmware load for
> > amdgpu/raven_gpu_info.bin failed with error -2
> > nov 13 13:53:55 kernel: amdgpu 0000:03:00.0: Failed to load gpu_info
> > firmware "amdgpu/raven_gpu_info.bin"
> > nov 13 13:53:55 kernel: amdgpu 0000:03:00.0: Fatal error during GPU init
> 
> The kernel is not able to find the firmware image.  If you are using an
> initrd, please make sure to include the firmware in the initrd.  If you
> are building the driver into the kernel, you need to build the firmware into
> the kernel as well.

It is a Fedora kernel. I don't know how this happened with a simple update; I have attached the dmesg for investigation and linked the Fedora bug report for reference.
Comment 5 Luya Tshimbalanga 2019-11-15 16:07:49 UTC
I am closing this report for now because I reinstalled the system. After the reinstall, the update completed normally, and the initramfs now contains the Raven firmware:

sudo lsinitrd /boot/initramfs-5.3.11-300.fc31.x86_64.img | grep raven
-rw-r--r--   2 root     root        86528 Jul 24 15:24 usr/lib/firmware/amdgpu/raven2_asd.bin
-rw-r--r--   1 root     root         9344 Jul 24 15:24 usr/lib/firmware/amdgpu/raven2_ce.bin
-rw-r--r--   1 root     root          316 Jul 24 15:24 usr/lib/firmware/amdgpu/raven2_gpu_info.bin
-rw-r--r--   1 root     root        17536 Jul 24 15:24 usr/lib/firmware/amdgpu/raven2_me.bin
-rw-r--r--   2 root     root       268048 Jul 24 15:24 usr/lib/firmware/amdgpu/raven2_mec2.bin
-rw-r--r--   2 root     root            0 Jul 24 15:24 usr/lib/firmware/amdgpu/raven2_mec.bin
-rw-r--r--   1 root     root        21632 Jul 24 15:24 usr/lib/firmware/amdgpu/raven2_pfp.bin
-rw-r--r--   1 root     root        38324 Jul 24 15:24 usr/lib/firmware/amdgpu/raven2_rlc.bin
-rw-r--r--   1 root     root        17408 Jul 24 15:24 usr/lib/firmware/amdgpu/raven2_sdma.bin
-rw-r--r--   1 root     root       343456 Jul 24 15:24 usr/lib/firmware/amdgpu/raven2_vcn.bin
-rw-r--r--   1 root     root        78336 Jul 24 15:24 usr/lib/firmware/amdgpu/raven_asd.bin
-rw-r--r--   1 root     root         9344 Jul 24 15:24 usr/lib/firmware/amdgpu/raven_ce.bin
-rw-r--r--   1 root     root        23152 Jul 24 15:24 usr/lib/firmware/amdgpu/raven_dmcu.bin
-rw-r--r--   2 root     root          316 Jul 24 15:24 usr/lib/firmware/amdgpu/raven_gpu_info.bin
-rw-r--r--   1 root     root        39084 Jul 24 15:24 usr/lib/firmware/amdgpu/raven_kicker_rlc.bin
-rw-r--r--   1 root     root        17536 Jul 24 15:24 usr/lib/firmware/amdgpu/raven_me.bin
-rw-r--r--   2 root     root       268048 Jul 24 15:24 usr/lib/firmware/amdgpu/raven_mec2.bin
-rw-r--r--   2 root     root            0 Jul 24 15:24 usr/lib/firmware/amdgpu/raven_mec.bin
-rw-r--r--   1 root     root        21632 Jul 24 15:24 usr/lib/firmware/amdgpu/raven_pfp.bin
-rw-r--r--   1 root     root        39084 Jul 24 15:24 usr/lib/firmware/amdgpu/raven_rlc.bin
-rw-r--r--   2 root     root        17408 Jul 24 15:24 usr/lib/firmware/amdgpu/raven_sdma.bin
-rw-r--r--   2 root     root       341728 Jul 24 15:24 usr/lib/firmware/amdgpu/raven_vcn.bin

It appears that dracut somehow failed to include the firmware in the initramfs before the failure. I can no longer reproduce the problem after the reinstall.
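To confirm that a given initramfs actually carries the file from the original error, an `lsinitrd` listing like the one above can be checked programmatically. A minimal Python sketch, assuming the `ls -l`-style output shown above where the path is the last field (symlink lines of the form `name -> target` are not handled); the function name firmware_in_listing is hypothetical and the sample lines are copied from the listing in this comment.

```python
def firmware_in_listing(listing: str, name: str) -> bool:
    """Return True if an lsinitrd-style listing contains firmware `name`.

    Assumes `ls -l`-style lines where the path is the last whitespace-separated
    field; symlink lines (`name -> target`) are not handled.
    """
    for line in listing.splitlines():
        fields = line.split()
        if not fields:
            continue
        path = fields[-1]
        # Match on a path-component boundary so "raven_gpu_info.bin"
        # does not also match a name such as "xraven_gpu_info.bin".
        if path == name or path.endswith("/" + name):
            return True
    return False


# Two lines copied from the lsinitrd listing above.
SAMPLE = """\
-rw-r--r--   1 root     root          316 Jul 24 15:24 usr/lib/firmware/amdgpu/raven2_gpu_info.bin
-rw-r--r--   2 root     root          316 Jul 24 15:24 usr/lib/firmware/amdgpu/raven_gpu_info.bin
"""

print(firmware_in_listing(SAMPLE, "amdgpu/raven_gpu_info.bin"))  # True
print(firmware_in_listing(SAMPLE, "amdgpu/raven_dmcu.bin"))      # False: not in this two-line sample
```

The listing itself could come from running `lsinitrd` on the image in question; capturing that output is left out of the sketch.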