Bug 188301 - AMD Radeon R9 380 and amdgpu: Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_pci@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-11-21 20:22 UTC by Matthias Nagel
Modified: 2023-09-04 23:20 UTC
CC List: 7 users

See Also:
Kernel Version: 4.8.9
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg | egrep -i -e 'amd|fb|drm|graphic' (6.75 KB, text/plain)
2016-11-21 20:22 UTC, Matthias Nagel
dmesg (kernel 4.8.9) (65.08 KB, text/plain)
2016-11-22 19:24 UTC, Matthias Nagel
lspci -vv (kernel 4.8.9) (36.37 KB, text/plain)
2016-11-22 19:25 UTC, Matthias Nagel
dmesg (kernel 4.9-rc6) (65.18 KB, text/plain)
2016-11-22 21:21 UTC, Matthias Nagel
my dirty quick workaround (2.72 KB, patch)
2022-01-26 12:37 UTC, neoe

Description Matthias Nagel 2016-11-21 20:22:04 UTC
Created attachment 245451
dmesg | egrep -i -e 'amd|fb|drm|graphic'

The PCI subsystem reports

Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff

during initialization/loading of the amdgpu driver.
Comment 1 Bjorn Helgaas 2016-11-21 20:29:22 UTC
Hi Matthias, thanks for the report!  Do you know if this is a regression?  Can you please attach the complete dmesg log and "lspci -vv" output?

If you are able to build kernels, could you try v4.9-rc6?  It has a fix for shadow ROMs that could be related.
Comment 2 Matthias Nagel 2016-11-22 19:24:02 UTC
The attached dmesg and "lspci -vv" output come from running my distro's latest stable kernel, 4.8.9.
Comment 3 Matthias Nagel 2016-11-22 19:24:38 UTC
Created attachment 245731
dmesg (kernel 4.8.9)
Comment 4 Matthias Nagel 2016-11-22 19:25:09 UTC
Created attachment 245741
lspci -vv (kernel 4.8.9)
Comment 5 Matthias Nagel 2016-11-22 21:20:46 UTC
I tried 4.9-rc6 now. Same results! The error message is still there. :-( I attached the whole dmesg output, too.

I cannot tell if it is a regression bug, because I installed the card the day before yesterday.
Comment 6 Matthias Nagel 2016-11-22 21:21:14 UTC
Created attachment 245751
dmesg (kernel 4.9-rc6)
Comment 7 Bjorn Helgaas 2016-12-27 22:33:49 UTC
Is there something that is actually not working?  I see several similar reports on the web, but I haven't seen an actual problem other than the alarming message.

The message is printed by pci_get_rom_size().  I'm not sure we should even be printing a message there.  If we *do* print a message, it probably shouldn't be a dev_err(), because it's not necessarily an error if the ROM doesn't exist.  Only the caller knows whether it's really an error.

It sounds like your AMD Radeon R9 380 is a plug-in card, correct?  It looks like it has a ROM on the card and we think it is shadowed in RAM:

  Linux version 4.9.0-rc6 ...
  pci 0000:01:00.0: reg 0x30: [mem 0xf7e40000-0xf7e5ffff pref]
  pci 0000:01:00.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff]
  amdgpu 0000:01:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff

So I think the kernel is complaining that the RAM copy at 0xc0000 starts with 0xffff.

Can you use this: http://cmp.felk.cvut.cz/~pisa/linux/rdwrmem.c to dump the beginning of both 0x000c0000-0x000dffff (the RAM copy) and 0xf7e40000-0xf7e5ffff (the ROM on the card):

  rdwrmem -m -s 0xc0000 -l 32
  setpci -s01:00.0 0x30.l=0xf7e40001    # enable the ROM
  rdwrmem -m -s 0xf7e40000 -l 32
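
A note on the setpci value above: config offset 0x30 is the Expansion ROM Base Address register, and the value written combines the ROM address from the "reg 0x30" dmesg line with the enable bit in bit 0. The following is a minimal Python sketch of that arithmetic only. The helper names are made up for illustration, and nothing here touches hardware.

```python
ROM_ENABLE = 0x1            # bit 0 of the Expansion ROM BAR turns ROM decoding on
ROM_ADDR_MASK = 0xFFFFF800  # bits 31:11 hold the ROM base address


def rom_bar_value(base, enable=True):
    """Build the 32-bit value for config offset 0x30 (hypothetical helper)."""
    if base & ~ROM_ADDR_MASK:
        raise ValueError("ROM base must be 2 KiB aligned and fit in 32 bits")
    return base | (ROM_ENABLE if enable else 0)


def decode_rom_bar(value):
    """Split a register value back into (base address, enabled)."""
    return value & ROM_ADDR_MASK, bool(value & ROM_ENABLE)


# Base address taken from the dmesg line "reg 0x30: [mem 0xf7e40000-0xf7e5ffff pref]"
print(hex(rom_bar_value(0xF7E40000)))  # 0xf7e40001
```

Writing the same address with bit 0 cleared would disable ROM decoding again.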
Comment 8 Matthias Nagel 2017-01-01 09:55:32 UTC
# ./rdwrmem -m -s 0xc0000 -l 32
000C0000:55 AA 80 E9 89 02 00 00 00 00 00 00 00 00 00 00
000C0010:00 00 00 00 00 00 00 00 2C 02 00 00 00 00 49 42
# setpci -s01:00.0 0x30.l=0xf7e40001
# ./rdwrmem -m -s 0xf7e40000 -l 32
F7E40000:55 AA 80 E9 89 02 00 00 00 00 00 00 00 00 00 00
F7E40010:00 00 00 00 00 00 00 00 2C 02 00 00 00 00 49 42

Seems to be identical.

> It sounds like your AMD Radeon R9 380 is a plug-in card, correct?
Yes.

> Is there something that is actually not working?
I do not experience any problems.

If you look at pci_get_rom_size(...) in rom.c, execution skips the calculation at the end of the loop because of the premature break when the ROM signature is wrong. Probably someone had a very good reason to print this error message at this point. Simply removing the error message does not feel correct to me. However, I am not a kernel programmer.
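
To make the effect of that premature break concrete, here is a simplified Python model of such an image walk. It is a sketch under assumptions, not the kernel's code: it assumes the usual expansion ROM layout (a little-endian 0xaa55 word at offset 0, a 16-bit pointer at offset 0x18 to a "PCIR" data structure, the image length in 512-byte units at offset 16 of that structure, and a last-image flag in bit 7 of the byte at offset 21). The names rom_size and make_image are invented for the example.

```python
import struct

HEADER_SIG = 0xAA55   # bytes 55 AA in memory, read as a little-endian word
DATA_SIG = b"PCIR"    # signature of the PCI data structure


def rom_size(rom: bytes, window: int) -> int:
    """Walk the ROM images and return the total size, capped at the window."""
    offset = 0
    while offset + 26 <= len(rom):
        (sig,) = struct.unpack_from("<H", rom, offset)
        if sig != HEADER_SIG:
            break  # premature exit: offset is not advanced for this image
        (pds_ptr,) = struct.unpack_from("<H", rom, offset + 24)
        pds = offset + pds_ptr
        if pds + 22 > len(rom) or rom[pds:pds + 4] != DATA_SIG:
            break
        (length,) = struct.unpack_from("<H", rom, pds + 16)  # 512-byte units
        last_image = rom[pds + 21] & 0x80
        offset += length * 512
        if length == 0 or last_image or offset >= window:
            break
    return min(offset, window)


def make_image(blocks: int, last: bool) -> bytes:
    """Build a synthetic image for the demo (test data, not a real VBIOS)."""
    image = bytearray(blocks * 512)
    struct.pack_into("<H", image, 0, HEADER_SIG)
    struct.pack_into("<H", image, 24, 0x1C)
    image[0x1C:0x20] = DATA_SIG
    struct.pack_into("<H", image, 0x1C + 16, blocks)
    image[0x1C + 21] = 0x80 if last else 0x00
    return bytes(image)


good = make_image(2, last=False) + make_image(1, last=True)
print(rom_size(good, 0x20000))            # 1536
print(rom_size(b"\xff" * 1536, 0x20000))  # 0
```

In this model a ROM that reads back as all 0xff yields a size of 0, so whether that is an error depends on what the caller expected to find.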

Moreover, I wonder why both dumps above actually start with 0xaa55. I had expected the first one to start with 0xffff, because this is what the error message complains about.
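
On the byte order: the dumps list bytes in memory order, and a 16-bit little-endian read of the bytes 55 AA yields 0xaa55, which is the expected value. Both dumps therefore show a valid header at the time they were taken. The Python sketch below decodes the two dump lines from this comment; the function names are illustrative, and the all-FF line is added only for contrast.

```python
import struct


def parse_dump_line(line: str) -> bytes:
    """Turn a line like '000C0000:55 AA 80 ...' into raw bytes."""
    _addr, _, hex_bytes = line.partition(":")
    return bytes.fromhex(hex_bytes)


def read_le16(data: bytes, offset: int) -> int:
    """Read a 16-bit little-endian word, as a 16-bit read does on x86."""
    return struct.unpack_from("<H", data, offset)[0]


dump = parse_dump_line("000C0000:55 AA 80 E9 89 02 00 00 00 00 00 00 00 00 00 00")
dump += parse_dump_line("000C0010:00 00 00 00 00 00 00 00 2C 02 00 00 00 00 49 42")
erased = parse_dump_line("000C0000:FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF")

print(hex(read_le16(dump, 0)))     # 0xaa55, the header signature
print(hex(read_le16(dump, 0x18)))  # 0x22c, the word stored at offset 0x18
print(hex(read_le16(erased, 0)))   # 0xffff, what the error message reports
```

This does not explain why the kernel read 0xffff during driver initialization; it only shows that the dumped bytes match the expected signature.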
Comment 9 Christian Lanig 2017-04-07 18:55:48 UTC
The test is only to determine whether it's a standard ROM. In case it's not the error message is thrown.
Obviously the ROM is not compatible with standards because instead of starting with AA55 it starts with 55AA. That's the case for the original ROMs provided by AMD. The current Nvidia ROMs start with 4E56 by the way. Nevertheless it works as intended.

So in my opinion this message is too "rough" and confusing. I recommend changing the message to something more precise like: "Non-standard ROM detected." and make it more discreet.
Comment 10 Dennis Wagelaar 2018-10-01 08:41:35 UTC
Also happens on other AMD GPU cards, such as my MSI AMD Radeon RX560. Tested on kernel 4.16.16 and 4.18.9. See downstream bugreport for logs: https://bugzilla.redhat.com/show_bug.cgi?id=1634389
Comment 11 neoe 2021-10-10 06:41:38 UTC
I use an HP EliteBook 735 G5 with an AMD Ryzen 2700U CPU.

000C0000:FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
000C0010:FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
F7E40000:FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
F7E40010:FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF

Under some conditions that I cannot reproduce, it works normally and reads 0xaa55:
dmesg: "amdgpu: Fetched VBIOS from ROM BAR"
and then everything is OK.
But under most conditions it reads 0xffff:
dmesg: "amdgpu: Fetched VBIOS from platform"
and then it ends in a dead black screen.
Comment 12 neoe 2021-10-10 06:43:55 UTC
In the 0xffff case, the amdgpu module finishes loading normally, but the result is actually a black screen.
Comment 13 neoe 2022-01-26 05:15:27 UTC
I made a workaround that lets the kernel save the VBIOS when it logs "amdgpu: Fetched VBIOS from ROM BAR" and reuse the saved copy when it logs "amdgpu: Fetched VBIOS from platform". With that, it works on every reboot.
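
The idea behind the workaround can be modeled in a few lines of Python. This is only a sketch of the save-and-fall-back logic as described in this comment, not the attached patch, which is kernel C code and is not reproduced here. The read_rom_bar callable, the function names, and the 55 AA validity check are assumptions made for the example.

```python
import os
import tempfile


def looks_like_vbios(image):
    """True if the image starts with bytes 55 AA (0xaa55 as a little-endian word)."""
    return image is not None and image[:2] == b"\x55\xaa"


def fetch_vbios(read_rom_bar, cache_path):
    """Return (image, source); read_rom_bar is a hypothetical reader callable."""
    image = read_rom_bar()
    if looks_like_vbios(image):
        with open(cache_path, "wb") as f:  # good read: refresh the cached copy
            f.write(image)
        return image, "ROM BAR"
    if os.path.exists(cache_path):         # bad read: fall back to the cache
        with open(cache_path, "rb") as f:
            cached = f.read()
        if looks_like_vbios(cached):
            return cached, "cache"
    return None, "none"


with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, "vbios.cache")
    good = b"\x55\xaa" + bytes(510)
    print(fetch_vbios(lambda: good, cache)[1])           # ROM BAR
    print(fetch_vbios(lambda: b"\xff" * 512, cache)[1])  # cache
```

The fallback only helps after at least one successful read has populated the cache, which matches the description of a good boot followed by bad ones.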
Comment 14 Bjorn Helgaas 2022-01-26 12:07:30 UTC
Can you attach your workaround here?  Maybe there's something we can put in the kernel so you don't need to keep using a workaround.
Comment 15 neoe 2022-01-26 12:37:58 UTC
Created attachment 300321
my dirty quick workaround
Comment 16 neoe 2022-01-26 12:40:20 UTC
(In reply to Bjorn Helgaas from comment #14)
> Can you attach your workaround here?  Maybe there's something we can put in
> the kernel so you don't need to keep using a workaround.

Sure, you are welcome.

Sorry, the patch lost two "#include" lines: I had saved it on a web page, where the header names were swallowed as HTML tags. I cannot recall which headers they were, but they should be the ones for file writing. I hope it still helps.
Comment 17 neoe 2022-01-26 23:06:08 UTC
 (In reply to neoe from comment #16)
> (In reply to Bjorn Helgaas from comment #14)
> 
> sorry, I lost 2 "#include ", because I used to save on a web page caused by
> HTML tag, cannot recall, should be for file writing. hope still helps. 
> 
> Created attachment 300321 [details]
> my dirty quick workaround

BTW, I figured out that the missing includes are:

#include <linux/fs.h>
#include <linux/kernel_read_file.h>
Comment 18 neoe 2022-01-27 06:04:46 UTC
some related dmesg when the workaround works on AMD Ryzen 2700U laptop:
> [    1.579246] [drm] BIOS signature incorrect 0 0
> [    1.579262] resource sanity check: requesting [mem 0x000c0000-0x000dffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000cbfff window]
> [    1.579269] caller pci_map_rom+0x7f/0x1e0 mapping multiple BARs
> [    1.579288] amdgpu 0000:04:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff
> [    1.613953] [drm] BIOS signature incorrect 0 0
> [    1.614218] amdgpu 0000:04:00.0: amdgpu: [neoe]read cached VBIOS 54272
> [    1.614222] amdgpu 0000:04:00.0: amdgpu: Fetched VBIOS from /var/vbios.cache
> [    1.614226] amdgpu: ATOM BIOS: SWBRT38964.001
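
The "resource sanity check" line in this log says that the requested shadow ROM range 0x000c0000-0x000dffff reaches past the PCI bus window 0x000c0000-0x000cbfff. The Python sketch below models only the condition that message describes, using inclusive ranges; it is not the kernel's implementation, and the function name is made up.

```python
def spans_more_than(request, window):
    """True if request overlaps window but is not fully contained in it.

    Both arguments are inclusive (start, end) address tuples.
    """
    req_start, req_end = request
    win_start, win_end = window
    overlaps = req_start <= win_end and req_end >= win_start
    contained = req_start >= win_start and req_end <= win_end
    return overlaps and not contained


request = (0x000C0000, 0x000DFFFF)  # shadow ROM range from the log
window = (0x000C0000, 0x000CBFFF)   # PCI Bus 0000:00 window from the log

print(spans_more_than(request, window))  # True
print(hex(request[1] - window[1]))       # 0x14000 bytes lie beyond the window
```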
Comment 19 Arnd Bergmann 2023-07-27 14:51:48 UTC
I came across the warning on a Dell Inspiron 7375 with a Ryzen 7 2700U APU. It was apparently resolved by a system BIOS update from version 1.5.0 to 1.9.0.
Comment 20 roma_bl9 2023-09-04 23:20:51 UTC
(In reply to neoe from comment #11)
> I use AMD Ryzen 2700U cpu, HP Elitebook 735 G5.
> 
> 000C0000:FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
> 000C0010:FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
> F7E40000:FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
> F7E40010:FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
> 
> in some condition which I cannot reproduce,
> it can go normal and read 0xaa55, 
> dmesg:"amdgpu: Fetched VBIOS from ROM BAR"
> then everything OK.
> but in most condition ,
> it read 0xffff,
> dmesg:"amdgpu: Fetched VBIOS from platform"
> then dead in black screen .

I had the same issue on my HP ProBook 445 G6 with an AMD Ryzen 2700U. In my case the problem went away when I turned off legacy boot and turned on Secure Boot in the BIOS.
