Bug 207767 - amdgpu fails at boot (4 in ~5 times): Invalid PCI ROM header signature (0xffff - fb0: switching to amdgpudrmfb from EFI VGA)
Status: NEW
Alias: None
Product: EFI
Classification: Unclassified
Component: Video
Hardware: All Linux
Importance: P1 normal
Assignee: EFI Virtual User
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-05-17 23:22 UTC by kolAflash
Modified: 2020-05-18 08:19 UTC
CC List: 1 user

See Also:
Kernel Version: 5.6.7
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg from bad boot, random good boot and all-uefi boot (64.18 KB, application/gzip)
2020-05-17 23:22 UTC, kolAflash

Description kolAflash 2020-05-17 23:22:42 UTC
Created attachment 289173
dmesg from bad boot, random good boot and all-uefi boot

About four in five times amdgpu doesn't initialize on boot.
The display simply freezes.
Nevertheless, the system boots in background and is reachable via ssh.

Last message on the display (if booted without "quiet"):
  fb0: switching to amdgpudrmfb from EFI VGA

Most interesting message from dmesg:
  amdgpu 0000:04:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff
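
For readers unfamiliar with that message: a PCI expansion ROM image starts with the two bytes 0x55 0xAA, which read as the little-endian 16-bit value 0xaa55. A value of 0xffff usually means the read returned all ones, i.e. nothing answered at that address. The following is only a minimal Python sketch of such a check, not the kernel's code; the function names and the sample byte strings are made up for illustration.

```python
PCI_ROM_SIGNATURE = 0xAA55  # stored in the image as the bytes 0x55, 0xAA


def rom_signature(image: bytes) -> int:
    """Return the 16-bit little-endian value at offset 0 of a ROM image."""
    if len(image) < 2:
        raise ValueError("ROM image is shorter than the 2-byte signature")
    return int.from_bytes(image[:2], "little")


def has_valid_rom_signature(image: bytes) -> bool:
    """True if the image starts with the expected PCI ROM signature."""
    return rom_signature(image) == PCI_ROM_SIGNATURE


# Hypothetical sample data: a header that starts correctly, and an
# all-ones read like the one reported in the bad boot.
good = bytes([0x55, 0xAA]) + bytes(30)
bad = bytes([0xFF]) * 32

print(hex(rom_signature(good)), has_valid_rom_signature(good))  # 0xaa55 True
print(hex(rom_signature(bad)), has_valid_rom_signature(bad))    # 0xffff False
```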

Computer: HP EliteBook 735 G6
CPU + GPU: Ryzen 7 3700U (Vega 10)
BIOS version: R74 Ver. 01.05.00 04/15/2020
Boot mode: EFI

Tested OS (all AMD64):
- Debian Testing (11 / Bullseye - 20200511) (on internal SSD)
- openSUSE-Tumbleweed-Snapshot20200511 (KDE LiveCD/USB)
- Kubuntu 20.04 LTS (LiveCD/USB)
- Manjaro 20.0.1 (LiveCD/USB)

--

(LOOK HERE FOR WORKAROUND)
The bug does not appear if the BIOS is set to:
  Advanced > Optional ROM Launch Policy > All UEFI
The bad settings are "All Legacy" and "All UEFI Except Video".
Alternatively, the system also boots with "nomodeset", but then "amdgpu" won't be available.

I don't know why someone would choose anything other than "All UEFI".
In my case, I just didn't know about that setting and was wondering why my system didn't boot up normally.
So the main problem is probably just finding out that "All UEFI" helps.
Nevertheless, it would be nice if amdgpu worked with all of these settings.

Note:
The bad settings are only available if this is enabled:
  Advanced > Secure Boot Configuration >
    Configure Legacy ... > Legacy Support Enable ...

--

This is the part of dmesg that differs most between the bad boot and the random good boot.
dmesg also differs in some earlier lines, so feel free to have a look at the attached full dmesg output.

bad:
  amdgpu 0000:04:00.0: firmware: direct-loading firmware amdgpu/picasso_gpu_info.bin
  [drm] BIOS signature incorrect 0 0
  resource sanity check: requesting [mem 0x000c0000-0x000dffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000c3fff window]
  caller pci_map_rom+0x6a/0x17d mapping multiple BARs
  amdgpu 0000:04:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff
  [drm] BIOS signature incorrect 0 0
  [drm:amdgpu_get_bios [amdgpu]] *ERROR* Unable to locate a BIOS ROM
  amdgpu 0000:04:00.0: Fatal error during GPU init
  [drm] amdgpu: finishing device. 0000:04:00.0: Fatal error during GPU init

random good boot (not set to "All UEFI"):
  amdgpu 0000:04:00.0: firmware: direct-loading firmware amdgpu/picasso_gpu_info.bin
  [drm] BIOS signature incorrect 0 0
  resource sanity check: requesting [mem 0x000c0000-0x000dffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000c3fff window]
  caller pci_map_rom+0x6a/0x17d mapping multiple BARs
  ATOM BIOS: SWBRT48929.001
  amdgpu 0000:04:00.0: firmware: direct-loading firmware amdgpu/picasso_sdma.bin

--

Where I found the solution:
https://forum.manjaro.org/t/problems-with-manjaro-linux-on-a-hp-elitebook-755-g5-with-ryzen-7-and-vega-10/65672/71

Maybe related:
https://bugzilla.kernel.org/show_bug.cgi?id=188301
https://forums.opensuse.org/showthread.php/537391-Newest-Linux-5-2-10-1-cannot-switch-to-EFI-VGA-(Radeon)
Comment 1 Ard Biesheuvel 2020-05-18 08:19:59 UTC
This is not a bug, and it is definitely not an EFI bug.

The AMD GPU drivers need the ATOM BIOS to function correctly. Also, the AMD GPU drivers are developed under the assumption that the boot driver executes first.

So it is expected that disabling the UEFI driver makes the system unstable.
