Bug 210929 - MCE bea0000000000108 Crash on heavy/gaming workload since Kernel 5.5
Status: RESOLVED INVALID
Alias: None
Product: Platform Specific/Hardware
Classification: Unclassified
Component: x86-64
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: platform_x86_64@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-12-28 09:56 UTC by binarytamer
Modified: 2021-05-21 12:38 UTC

See Also:
Kernel Version: 5.5
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
LSPCI for checking NV kerneldriver status (30.80 KB, text/plain)
2020-12-28 10:15 UTC, binarytamer
Kernel .config for checking the Userconfiguration (136.26 KB, text/plain)
2020-12-28 10:17 UTC, binarytamer
DMESG after crash (146.12 KB, text/plain)
2020-12-28 10:23 UTC, binarytamer
dmesg after crash(5.10 kernel) (97.78 KB, text/plain)
2021-01-05 11:52 UTC, danknil
current lsmod(5.10 kernel) (4.32 KB, text/plain)
2021-01-05 11:54 UTC, danknil

Description binarytamer 2020-12-28 09:56:20 UTC
Hi,

This report is split off from https://bugzilla.kernel.org/show_bug.cgi?id=206903

I have had MCE crashes on my system since kernel version 5.5, on Solus, Manjaro and Gentoo. When I run a game, the system suddenly reboots and reports the following error:

Dec 23 08:14:45 dp-pc kernel: [    0.533154] mce: [Hardware Error]: Machine check events logged
Dec 23 08:14:45 dp-pc kernel: [    0.533154] mce: [Hardware Error]: CPU 13: Machine Check: 0 Bank 5: bea0000000000108
Dec 23 08:14:45 dp-pc kernel: [    0.533154] mce: [Hardware Error]: TSC 0 ADDR 7fb1c9c9685a MISC d012000100000000 SYND 4d000000 IPID 500b000000000
Dec 23 08:14:45 dp-pc kernel: [    0.533154] mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1608707676 SOCKET 0 APIC b microcode 8701021

This is my hardware setup:

MB: MPG X570 GAMING PLUS (MS-7C37)
CPU: Ryzen 7 3800X
RAM: 32GB 2x16GB DDR4 3200MT/s
GPU: ASUS TUF 3-RX5700XT-O8G-GAMING 
GPU: NVIDIA RTX 2080 only for IOMMU
Using linux-firmware-20201218 at the moment

Here is what I already checked:

- 24h memory test
- CPU stress test to check temps
- tried kernel options discussed in report 206903
- tried EFI options such as disabling S6, disabling PBO, and updating and rolling back the UEFI
- tried different distributions (all rolling): Solus, Manjaro and now Gentoo --> same behavior on all systems

All this starts with kernel versions >= 5.5. When I install kernel 5.4.80 (currently marked stable in Gentoo), the system is rock solid, without a single crash. I have observed this on my rig since version 5.5 came out and have sporadically tried updating to the most recent version, but the same crashes always occur.

Since I can easily build a modified kernel, I could change the kernel config to help locate the problem. In the original bug report someone mentioned that this could be related to GPU power saving. I am not deep into the Linux kernel, but if someone names an option, I will build a test kernel. Thank you all in advance.
Comment 1 Borislav Petkov 2020-12-28 10:06:34 UTC
> GPU: NVIDIA RTX 2080 only for IOMMU

Are you using the nvidia proprietary driver? If so, try to reproduce without it.

And pls upload full dmesg.

Thx.
Comment 2 binarytamer 2020-12-28 10:15:25 UTC
Created attachment 294367 [details]
LSPCI for checking NV kerneldriver status
Comment 3 binarytamer 2020-12-28 10:17:36 UTC
Created attachment 294369 [details]
Kernel .config for checking the Userconfiguration
Comment 4 binarytamer 2020-12-28 10:23:22 UTC
Created attachment 294371 [details]
DMESG after crash
Comment 5 binarytamer 2020-12-28 10:29:55 UTC
Hi,
thank you for your fast answer. I realized that I didn't describe the problem properly.

The crash occurs when I play games natively on Linux with the 5700XT. My NVIDIA card is only used in a virtual machine; it has no proprietary driver, nor is the kernel configured to provide the nouveau driver. No VM is running when a crash happens. The NVIDIA card is in power mode S3.

I attached the following:

- DMESG after the crash
- Current .config of the Kernel
- Output of LSPCI to check the status of the NVIDIA
Comment 6 Borislav Petkov 2020-12-28 17:50:45 UTC
Thanks.

A couple of observations:

[    0.000000] Notice: NX (Execute Disable) protection missing in CPU!

Why is that? Do you have some strange setting in your BIOS which
disables NX? I'd reenable it.

[    0.360218] efi: Error mapping PA 0xff000000 -> VA 0xff000000!
[    0.360218] efi: Error mapping PA 0xff000000 -> VA 0xfffffffeff000000!
[    0.360219] efi: Error mapping PA 0xfedd4000 -> VA 0xfedd4000!
[    0.360219] efi: Error mapping PA 0xfedd4000 -> VA 0xfffffffefefd4000!
[    0.360220] efi: Error mapping PA 0xfedc2000 -> VA 0xfedc2000!
...

this is *really* strange. It basically says that you can't map any EFI
runtime services. Should not happen either.

The MCE itself decodes to:

[  346.622085] [Hardware Error]: System Fatal error.
[  346.627024] [Hardware Error]: CPU:12 (17:1:2) MC5_STATUS[-|UE|MiscV|AddrV|PCC|TCC|SyndV|-|-|-]: 0xbea0000000000108
[  346.637614] [Hardware Error]: Error Addr: 0x0001ffffb50a37bc
[  346.643497] [Hardware Error]: IPID: 0x0000000000000000, Syndrome: 0x000000004d000000
[  346.651444] [Hardware Error]: Execution Unit Ext. Error Code: 0, Watchdog Timeout error.
[  346.659803] [Hardware Error]: cache level: RESV, tx: GEN, mem-tx: GEN

which normally fires when some transaction times out. Which brings us
to the huuge bugzilla entry which you've already quoted, where people
complain about such things. That looks like a hardware issue and not a
kernel issue so far.
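
To make the decoded flags easier to follow, here is a minimal sketch that extracts the flag bits from the raw status value reported in this bug. The bit positions (Val 63, Over 62, UC 61, En 60, MiscV 59, AddrV 58, PCC 57, TCC 55, SyndV 53) are an assumption based on the AMD MCA status layout as used by the kernel's decoder, and the function names are made up for illustration. The value is split into two 32-bit halves so that the arithmetic does not depend on how a given shell handles 64-bit overflow.

```shell
#!/bin/sh
# Sketch: decode selected flag bits of an AMD MCA status value.
# Bit positions below are assumptions, see the text above.

# mca_bit HEX16 BIT -> prints 1 if the bit is set, else 0.
# HEX16 is the 16-digit status without a 0x prefix.
mca_bit() {
    hex=$1
    bit=$2
    hi=${hex%????????}   # first 8 hex digits = bits 63..32
    lo=${hex#"$hi"}      # last 8 hex digits  = bits 31..0
    if [ "$bit" -ge 32 ]; then
        echo $(( (0x$hi >> (bit - 32)) & 1 ))
    else
        echo $(( (0x$lo >> bit) & 1 ))
    fi
}

# mca_flags HEX16 -> prints the names of all set flags, joined by '|'.
mca_flags() {
    out=""
    for pair in Val:63 Over:62 UC:61 En:60 MiscV:59 AddrV:58 PCC:57 TCC:55 SyndV:53; do
        name=${pair%%:*}
        pos=${pair##*:}
        if [ "$(mca_bit "$1" "$pos")" = 1 ]; then
            out="${out:+$out|}$name"
        fi
    done
    echo "$out"
}

# mca_errcode HEX16 -> prints the low 16 bits (the error code field).
mca_errcode() {
    low=${1#????????}
    printf '0x%04x\n' $(( 0x$low & 0xffff ))
}
```

For the value in this report, `mca_flags bea0000000000108` prints `Val|UC|En|MiscV|AddrV|PCC|TCC|SyndV`, which lines up with the UE, MiscV, AddrV, PCC, TCC and SyndV flags in the decoded line above (the kernel prints the UC bit as "UE").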

> All this starts with kernel version >= 5.5 when I install kernel
> 5.4.80 (in Gentoo currently marked stable)the system is rock solid not
> a single crash.

If that is the case you could perhaps bisect the issue - see "man
git-bisect" for how to do that. And ask questions if something's not
clear still.

If 5.4.80 is good and 5.5 is bad, then that should be not too many
bisection steps. If successful, it might point us to a commit which
could be a guilty one.
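
As a rough illustration of why the number of steps stays small: bisection halves the candidate range each round, so the number of build-and-test rounds grows with the base-2 logarithm of the commit count. The sketch below is only an estimate under that assumption; the helper names are hypothetical, and git itself prints its own "roughly N steps" figure while bisecting.

```shell
#!/bin/sh
# Sketch: estimate how many build-and-test rounds a bisection needs.

# bisect_steps N -> smallest k such that 2^k >= N
bisect_steps() {
    n=$1
    steps=0
    span=1
    while [ "$span" -lt "$n" ]; do
        span=$((span * 2))
        steps=$((steps + 1))
    done
    echo "$steps"
}

# count_commits GOOD BAD -> number of commits reachable from BAD but not
# from GOOD (must be run inside a kernel git checkout).
count_commits() {
    git rev-list --count "$1..$2"
}
```

Inside a kernel checkout, `bisect_steps "$(count_commits v5.4 v5.5)"` gives the estimate for the range between the two releases; for example, a range of 15000 commits needs about 14 rounds.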

HTH.
Comment 7 binarytamer 2020-12-29 14:09:58 UTC
Hi,

I will check the NX option; I probably disabled it while checking the EFI settings for the culprit. I read up on the git-bisect procedure and want to ask if you could check my approach, which would look like the following:

- cloning the kernel repo with:  git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git linux-git

- setting the good & bad tags:
git bisect good v5.4
git bisect bad v5.5

- building the kernel and installing it:
make
make modules_install
make install
grub-mkconfig -o /boot/grub/grub.cfg

- testing the kernel

- set the current version to good or bad:
git bisect good / bad

- repeat the steps

- when I found a version that crashes: provide the output of git bisect log

Thank you for checking that. Unfortunately I have never done this before, so sorry for the annoyance...
Comment 8 Borislav Petkov 2020-12-29 18:40:25 UTC
(In reply to binarytamer from comment #7)
> - cloning the kernel repo with:  git clone
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git linux-git

Yap, however you also need to add the stable tree so that you can
test those tags too. After the above step, you do:

$ git remote add stable git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
$ git fetch stable

You say 5.4.80 is good so you should do

git bisect good v5.4.80
git bisect bad v5.5

But do test those two first because you need to make sure they're really good and bad respectively.

> - setting the good & bad tags: git bisect good v5.4 git bisect bad
> v5.5
>
> - building the kernel and installing it: make make modules_install
> make install grub-mkconfig -o /boot/grub/grub.cfg
>
> - testing the kernel
>
> - set the current version to good or bad: git bisect good / bad
>
> - repeat the steps

Yap, exactly.

Just be careful when you do the steps because one mistake and you go off
"into the weeds". Happens to me from time to time so I use a pen and paper
too. :-)

> - when I found a version that crashes: provide the output of git
> bisect log

Once the bisection is done, it'll tell you "the first bad commit is... "

> Thank you for checking that unfortunately I never did this before, so
> sorry for the annoyance...

No worries, thanks for reporting and bisecting!
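
The steps discussed above can be collected into a few small wrappers, which reduces the chance of a typo sending the bisection "into the weeds". This is only a sketch: the function names and the DRY_RUN switch are made up for illustration, while `git bisect start`, `good`, `bad`, `log` and `replay` are the regular git subcommands.

```shell
#!/bin/sh
# Sketch: thin wrappers around the manual bisect steps.
# Set DRY_RUN=1 to print the git commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# bisect_begin GOOD BAD -> start a bisection between two known endpoints.
bisect_begin() {
    run git bisect start
    run git bisect good "$1"
    run git bisect bad "$2"
}

# bisect_mark good|bad -> record the verdict for the kernel just tested.
bisect_mark() {
    case $1 in
        good|bad) run git bisect "$1" ;;
        *) echo "usage: bisect_mark good|bad" >&2; return 1 ;;
    esac
}

# bisect_save FILE -> keep a copy of all decisions made so far.
# If a step was marked wrongly, edit FILE and run: git bisect replay FILE
bisect_save() {
    git bisect log > "$1"
}
```

A typical round would be `bisect_begin v5.4.80 v5.5` once, then build, boot, test, and call `bisect_mark good` or `bisect_mark bad`, saving the log with `bisect_save` after each verdict.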
Comment 9 danknil 2021-01-05 06:45:40 UTC
I have exactly the same issue, but it still occurs on kernel 5.4.
Specs:
CPU: AMD Ryzen 3 3100
GPU: AMD Radeon HD 7970
RAM: 32GB 2x16GB DDR4 3200MT/s
Motherboard: ASRock B450M PRO4-F

Using linux-firmware version 20201218
Tried these kernels: 5.11rc1, 5.10.4, 5.4.80

Error: 
Dec 31 17:13:19 archlinux kernel: mce: [Hardware Error]: CPU 2: Machine Check: 0 Bank 5: bea0000000000108
Dec 31 17:13:19 archlinux kernel: mce: [Hardware Error]: TSC 0 ADDR 1ffffc0373d30 MISC d012000100000000 SYND 4d000000 IPID 500b000000000
Dec 31 17:13:19 archlinux kernel: mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1609409597 SOCKET 0 APIC 8 microcode 8701021
Comment 10 binarytamer 2021-01-05 08:18:10 UTC
Hi,

Thank you for checking. It took a lot of time, but I am now done with bisecting the kernel versions. Here is the output:

a3511321fd004d0b2a6d81dab1837dcc6c752da4 is the first bad commit
commit a3511321fd004d0b2a6d81dab1837dcc6c752da4
Author: Stephen Rothwell <sfr@canb.auug.org.au>
Date:   Thu Nov 21 14:54:03 2019 +1100

    merge fix for "ftrace: Rework event_create_dir()"

    Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

 drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

I will provide additional information if needed. Hopefully this is helpful and I did the procedure correctly. I will also paste the bisect log:

git bisect start
# good: [9f4b26f3ea18cb2066c9e58a84ff202c71739a41] Linux 5.4.80
git bisect good 9f4b26f3ea18cb2066c9e58a84ff202c71739a41
# bad: [d5226fa6dbae0569ee43ecfc08bdcd6770fc4755] Linux 5.5
git bisect bad d5226fa6dbae0569ee43ecfc08bdcd6770fc4755
# good: [219d54332a09e8d8741c1e1982f5eae56099de85] Linux 5.4
git bisect good 219d54332a09e8d8741c1e1982f5eae56099de85
# good: [8c39f71ee2019e77ee14f88b1321b2348db51820] Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
git bisect good 8c39f71ee2019e77ee14f88b1321b2348db51820
# bad: [76bb8b05960c3d1668e6bee7624ed886cbd135ba] Merge tag 'kbuild-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
git bisect bad 76bb8b05960c3d1668e6bee7624ed886cbd135ba
# bad: [21b26d2679584c6a60e861aa3e5ca09a6bab0633] Merge tag '5.5-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
git bisect bad 21b26d2679584c6a60e861aa3e5ca09a6bab0633
# good: [3275a71e76fac5bc276f0d60e027b18c2e8d7a5b] Merge tag 'drm-next-5.5-2019-10-09' of git://people.freedesktop.org/~agd5f/linux into drm-next
git bisect good 3275a71e76fac5bc276f0d60e027b18c2e8d7a5b
# good: [2ef4144d1ea8b181d377d0783c43032cb44889f7] Merge tag 'drm-intel-next-2019-11-01-1' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
git bisect good 2ef4144d1ea8b181d377d0783c43032cb44889f7
# bad: [0a6cad5df541108cfd3fbd79eef48eb824c89bdc] Merge branch 'vmwgfx-coherent' of git://people.freedesktop.org/~thomash/linux into drm-next
git bisect bad 0a6cad5df541108cfd3fbd79eef48eb824c89bdc
# good: [ad4d81dc57e2dff7cf3b55f63356f0d0017050a1] drm/amdgpu/renoir: move gfxoff handling into gfx9 module
git bisect good ad4d81dc57e2dff7cf3b55f63356f0d0017050a1
# good: [78e2ea291ead1e395864ff1583064e07b1adeb62] drm/i915/display: Fix TRANS_DDI_MST_TRANSPORT_SELECT definition
git bisect good 78e2ea291ead1e395864ff1583064e07b1adeb62
# good: [c0e21ea1d0b557bdedd5b54d529162f74e7ef407] drm/amdgpu: put flush_delayed_work at first
git bisect good c0e21ea1d0b557bdedd5b54d529162f74e7ef407
# bad: [1b34de7c3fef0c7ebb3d05acc1756bfb585279ca] drm/amd/amdgpu/sriov skip RLCG s/r list for arcturus VF.
git bisect bad 1b34de7c3fef0c7ebb3d05acc1756bfb585279ca
# good: [8fc41344138831071c5d5f51635c7eb33459e249] drm/amdgpu: disable gfxoff on original raven
git bisect good 8fc41344138831071c5d5f51635c7eb33459e249
# good: [57fb0ab2f1398d81b42a8143a40e5d209a290a48] drm/amdgpu: Update Arcturus golden registers
git bisect good 57fb0ab2f1398d81b42a8143a40e5d209a290a48
# bad: [210b3b3c7563df391bd81d49c51af303b928de4a] drm/amdgpu/gfx10: re-init clear state buffer after gpu reset
git bisect bad 210b3b3c7563df391bd81d49c51af303b928de4a
# bad: [a3511321fd004d0b2a6d81dab1837dcc6c752da4] merge fix for "ftrace: Rework event_create_dir()"
git bisect bad a3511321fd004d0b2a6d81dab1837dcc6c752da4
# first bad commit: [a3511321fd004d0b2a6d81dab1837dcc6c752da4] merge fix for "ftrace: Rework event_create_dir()"

Thanks for the support. If there is anything I should test, just describe it and I will try.
Comment 11 danknil 2021-01-05 11:52:14 UTC
Created attachment 294503 [details]
dmesg after crash(5.10 kernel)

my dmesg log
Comment 12 danknil 2021-01-05 11:54:39 UTC
Created attachment 294505 [details]
current lsmod(5.10 kernel)
Comment 13 Borislav Petkov 2021-01-05 13:09:30 UTC
(In reply to binarytamer from comment #10)
> Hi,
> 
> Thank you for checking. It took a lot of time, but I am now done with
> bisecting the kernel versions. Here is the output:
> 
> a3511321fd004d0b2a6d81dab1837dcc6c752da4 is the first bad commit
> commit a3511321fd004d0b2a6d81dab1837dcc6c752da4
> Author: Stephen Rothwell <sfr@canb.auug.org.au>
> Date:   Thu Nov 21 14:54:03 2019 +1100
> 
>     merge fix for "ftrace: Rework event_create_dir()"

Yeah, I warned you that bisection might veer off into the weeds. So this
is only a build fix for:

                 from drivers/gpu/drm/amd/amdgpu/amdgpu_trace_points.c:29:
./include/trace/../../drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h:520:52: error: expected expression before ‘;’ token
  520 |         __string(ring, sched_job->base.sched->name);


which means that it is highly unlikely that this patch is really causing
the MCE. And you can't revert it on top of 5.5 to check because it really
is only a build fix.

Which means that you could try the bisection again. Yap, that takes a
lot of time but if you do it and encounter yet another innocent commit
as the first bad one, then it very likely could be that this really is a
hardware issue. Just like "danknil" says in comment #9 that he/she can
trigger even on 5.4.

The fact that some transaction times out while you run a game could
point to something GPU-related, like the GPU drawing too much power and
that resulting in a transaction timeout. Without proper equipment that is
very hard to debug, unfortunately. But this is all pure speculation.
Comment 14 danknil 2021-01-05 13:18:58 UTC
>Which means that you could try the bisection again. Yap, that takes a
>lot of time but if you do it and encounter yet another innocent commit
>as the first bad one, then it very likely could be that this really is a
>hardware issue. Just like "danknil" says in comment #9 that he/she can
>trigger even on 5.4.
I also tested some games on Windows 10 for about 4-5 hours without any issues, so I don't think it is a hardware one.
Comment 15 Borislav Petkov 2021-01-05 14:07:06 UTC
(In reply to danknil from comment #14)
> I also tested some games on Windows 10 for about 4-5 hours without any
> issues, so I don't think it is a hardware one.

This happens only when you play games, right? I.e., when the GPU is being stressed.

And yes people have reported that they can't trigger on windoze but that doesn't mean a whole lot: it could be the windoze GPU driver doing something else, power management too or even windoze not reporting the MCEs (I doubt it but still). In any case, see https://bugzilla.kernel.org/show_bug.cgi?id=206903. There are some ideas what to try there, you could try them.
Comment 16 Christian König 2021-01-05 14:44:09 UTC
My best idea would be to disable all power management features and see if the problem disappears.

Try amdgpu.cg_mask=0 and amdgpu.pg_mask=0 on the kernel command line.
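
After rebooting, it is worth confirming that the options actually reached the kernel. The sketch below looks a parameter up on a kernel command line; the function name is hypothetical, and it reads /proc/cmdline when no line is passed explicitly. On many systems the effective value can also be read from /sys/module/amdgpu/parameters/, but whether a given parameter is exposed there is an assumption to verify on the running kernel.

```shell
#!/bin/sh
# Sketch: look up the value of a PARAM=VALUE option on a kernel command line.

# cmdline_value PARAM [CMDLINE]
# Prints VALUE and returns 0 if PARAM=VALUE is present, returns 1 otherwise.
# Without CMDLINE, the command line of the running kernel is used.
cmdline_value() {
    cmdline=${2-$(cat /proc/cmdline)}
    # Unquoted on purpose: split the command line into words.
    for word in $cmdline; do
        case $word in
            "$1"=*) echo "${word#*=}"; return 0 ;;
        esac
    done
    return 1
}
```

For example, `cmdline_value amdgpu.cg_mask` should print `0` once the option above is in effect, and print nothing (with a non-zero exit status) if the bootloader configuration was not regenerated.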
Comment 17 Alex Deucher 2021-01-05 14:53:25 UTC
Or amdgpu.dpm=0 to disable dynamic clock switching.  If that helps, you can also use the ppfeaturemask option to further narrow down which power feature (if any) causes the issue (e.g., drop the dpm=0 and add ppfeaturemask=0x...).  The mask bits are defined here:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/amd/include/amd_shared.h#n196
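
Narrowing down by mask means clearing one bit at a time and retesting. A minimal sketch of the arithmetic follows; the function name is made up, the starting mask 0xffffffff and the bit index in the example are placeholders only, and the meaning of each bit has to be taken from the header linked above.

```shell
#!/bin/sh
# Sketch: compute a feature mask with one bit cleared.

# mask_clear MASK BIT -> prints MASK with bit number BIT cleared, as 0x%08x.
# MASK may be given in hex (0x...) or decimal; BIT counts from 0.
mask_clear() {
    printf '0x%08x\n' $(( $1 & ~(1 << $2) ))
}
```

For example, `mask_clear 0xffffffff 15` prints `0xffff7fff`, which could then be passed as `amdgpu.ppfeaturemask=0xffff7fff` on the kernel command line for one test run.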
Comment 18 Alex Deucher 2021-01-05 14:54:08 UTC
See also:
https://bugzilla.kernel.org/show_bug.cgi?id=206903
Comment 19 danknil 2021-01-05 15:08:43 UTC
>This happens only when you play games, right?
Yes.

>see https://bugzilla.kernel.org/show_bug.cgi?id=206903. There are some ideas
>what to try there, you could try them.
I have already checked it and tried all the methods described (except the possible fix in the attachment).
Comment 20 danknil 2021-01-05 15:13:29 UTC
(In reply to Alex Deucher from comment #17)
> Or amdgpu.dpm=0 to disable dynamic clock switching.  If that helps, you can
> also use the ppfeaturemask option to further narrow down which power feature
> (if any) causes the issue (e.g., drop the dpm=0 and add ppfeaturemask=0x...).
> The mask bits are defined here:
I tried disabling them separately through ppfeaturemask. Tomorrow I will try both.
Comment 21 danknil 2021-01-05 15:14:11 UTC
(In reply to Christian König from comment #16)
> My best idea would be to disable all power management features and see if
> the problem disappears.
> 
> Try amdgpu.cg_mask=0 and amdgpu.pg_mask=0 on the kernel command line.

Okay, I'll test.
Comment 22 danknil 2021-01-06 07:57:14 UTC
(In reply to Christian König from comment #16)
> My best idea would be to disable all power management features and see if
> the problem disappears.
> 
> Try amdgpu.cg_mask=0 and amdgpu.pg_mask=0 on the kernel command line.

nothing changed :(
Comment 24 binarytamer 2021-01-26 10:53:29 UTC
Hi,

I just want to give an update on the current status of the issue on my system. It took a very long time to be sure because the crashes occur sporadically.

I tried to bisect the kernel, but I still cannot pin down a specific commit. So I directed my efforts at finding out which component, or which combination of components, may cause the problem. After building a test system I am sure that this MCE occurs on different AMD systems with my 5700XT GPU. On an Intel Skylake system I could not reproduce the problem. CPU, RAM, PSU and mainboard were all changed during the tests.
Then I read about a bad ASUS firmware on my 5700XT. I had already tried to update the 5700XT in the past, but the ASUS update tool always reported that no update was needed. So this time I flashed a ROM file from the ASUS update with the AMD flash tool.

And the MCE is gone! No reboots on newer kernels >5.4.
Comment 25 Paul Menzel 2021-04-17 08:51:37 UTC
@binarytamer: Thank you for reporting back with the fix for your problem. For the record, what Asus firmware was on the 5700XT, and what version did you flash?

@danknil: As this issue is very complicated, and you were able to reproduce it with Linux 5.4.80, I’d say it’s a different issue, and recommend opening a separate issue.
Comment 26 Paul Menzel 2021-05-17 15:17:59 UTC
(In reply to Paul Menzel from comment #25)
> @binarytamer: Thank you for reporting back with the fix for your problem.
> For the record, what Asus firmware was on the 5700XT, and what version did
> you flash?

In bug 206903, Alex Deucher suggested the command below:

    sudo cat /sys/kernel/debug/dri/0/amdgpu_firmware_info
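
When comparing cards, only a few lines of that output usually matter. The following sketch filters them out of the text on standard input; the function names are hypothetical and the line format is assumed to be "NAME feature version: X, firmware version: Y" plus a final "VBIOS version:" line, as seen in outputs of this file.

```shell
#!/bin/sh
# Sketch: pick single values out of amdgpu_firmware_info text read from stdin.

# vbios_version -> prints the VBIOS version string (empty if none is reported).
vbios_version() {
    sed -n 's/^VBIOS version:[[:space:]]*//p'
}

# fw_version NAME -> prints the firmware version of one block, e.g. SMC or MEC.
fw_version() {
    sed -n "s/^$1 feature version: .*, firmware version: //p"
}
```

Usage would be along the lines of `sudo cat /sys/kernel/debug/dri/0/amdgpu_firmware_info | vbios_version`, which makes it easy to paste just the VBIOS string into a report.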
Comment 27 binarytamer 2021-05-20 06:12:35 UTC
Hello Paul

Sorry for the extremely late answer. Here is the output of cat /sys/kernel/debug/dri/0/amdgpu_firmware_info:

VCE feature version: 0, firmware version: 0x00000000
UVD feature version: 0, firmware version: 0x00000000
MC feature version: 0, firmware version: 0x00000000
ME feature version: 32, firmware version: 0x00000061
PFP feature version: 32, firmware version: 0x00000093
CE feature version: 32, firmware version: 0x00000025
RLC feature version: 1, firmware version: 0x00000080
RLC SRLC feature version: 0, firmware version: 0x00000000
RLC SRLG feature version: 0, firmware version: 0x00000000
RLC SRLS feature version: 0, firmware version: 0x00000000
MEC feature version: 32, firmware version: 0x0000008d
MEC2 feature version: 32, firmware version: 0x0000008d
SOS feature version: 0, firmware version: 0x00100450
ASD feature version: 0, firmware version: 0x2100004a
TA RAS feature version: 0x00000000, firmware version: 0x2100002a
TA XGMI feature version: 0x00000000, firmware version: 0x2100002a
TA HDCP feature version: 0x17000010, firmware version: 0x2100002a
TA DTM feature version: 0x12000003, firmware version: 0x2100002a
SMC feature version: 0, firmware version: 0x002a3f00
SDMA0 feature version: 50, firmware version: 0x00000023
SDMA1 feature version: 50, firmware version: 0x00000023
VCN feature version: 0, firmware version: 0x0510a00d
DMCU feature version: 0, firmware version: 0x00000000
DMCUB feature version: 0, firmware version: 0x00000000
TOC feature version: 0, firmware version: 0x00000000
VBIOS version: 115-D199PI0-101
Comment 28 Arne Brücher 2021-05-21 12:20:57 UTC
Hey everybody,
I'd like to update my GPU's firmware as well, because I experience the same issues using an RX 5700 and a Ryzen 5 3600. Here's my output, but unfortunately the VBIOS version is not displayed for some reason.

VCE feature version: 0, firmware version: 0x00000000
UVD feature version: 0, firmware version: 0x00000000
MC feature version: 0, firmware version: 0x00000000
ME feature version: 32, firmware version: 0x00000061
PFP feature version: 32, firmware version: 0x00000093
CE feature version: 32, firmware version: 0x00000025
RLC feature version: 1, firmware version: 0x00000080
RLC SRLC feature version: 0, firmware version: 0x00000000
RLC SRLG feature version: 0, firmware version: 0x00000000
RLC SRLS feature version: 0, firmware version: 0x00000000
MEC feature version: 32, firmware version: 0x0000008d
MEC2 feature version: 32, firmware version: 0x0000008d
SOS feature version: 0, firmware version: 0x00100450
ASD feature version: 0, firmware version: 0x2100004a
TA RAS feature version: 0x00000000, firmware version: 0x2100002a
TA XGMI feature version: 0x00000000, firmware version: 0x2100002a
TA HDCP feature version: 0x17000010, firmware version: 0x2100002a
TA DTM feature version: 0x12000003, firmware version: 0x2100002a
SMC feature version: 0, firmware version: 0x002a3f00
SDMA0 feature version: 50, firmware version: 0x00000023
SDMA1 feature version: 50, firmware version: 0x00000023
VCN feature version: 0, firmware version: 0x0510a00d
DMCU feature version: 0, firmware version: 0x00000000
DMCUB feature version: 0, firmware version: 0x00000000
TOC feature version: 0, firmware version: 0x00000000
VBIOS version: 

But amdvbflash -ai outputs this:

Adapter  0    SEG=0000, BN=28, DN=00, PCIID=731F1002, SSID=381C1462)
    Asic Family        :  Navi10         
    Flash Type         :  W25Q80      (1024 KB)
    Product Name       :  113-MSITV381MH.281 
    Bios Config File   :  281.bin        
    Bios P/N           :  P/N Not Available
    Bios Version       :  017.001.000.049.000000
    Bios Date          :  11/13/19 04:23 
    ROM Image Type     :  Hybrid Images
    ROM Image Details  :  
        Image[0]: Size(59392 Bytes), Type(Legacy Image)
        Image[1]: Size(44032 Bytes), Type(EFI Image)
Comment 29 Paul Menzel 2021-05-21 12:38:39 UTC
As you have a different card – sorry, my oversight in bug 206903 – please create a separate issue.
