Bug 204975 - AMD-Vi: Command buffer timeout
Summary: AMD-Vi: Command buffer timeout
Status: NEW
Alias: None
Product: Virtualization
Classification: Unclassified
Component: kvm
Hardware: x86-64 Linux
Importance: P1 blocking
Assignee: virtualization_kvm
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-09-23 23:06 UTC by Gluzskiy Alexandr
Modified: 2020-04-02 11:29 UTC
CC List: 2 users

See Also:
Kernel Version: 4.19.75
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg fragment with error (20.61 KB, application/gzip)
2019-09-23 23:06 UTC, Gluzskiy Alexandr
dmesg fragment with error (23.16 KB, application/gzip)
2019-09-23 23:07 UTC, Gluzskiy Alexandr
dmesg fragment with error (31.04 KB, application/gzip)
2019-09-23 23:07 UTC, Gluzskiy Alexandr
dmesg fragment with error (latest bios) (28.38 KB, application/gzip)
2019-09-23 23:08 UTC, Gluzskiy Alexandr
dmesg fragment with error (latest bios) (20.96 KB, application/gzip)
2019-09-23 23:08 UTC, Gluzskiy Alexandr
dmidecode output (2.71 KB, application/gzip)
2019-09-23 23:09 UTC, Gluzskiy Alexandr
lspci output (4.65 KB, text/plain)
2019-09-23 23:09 UTC, Gluzskiy Alexandr

Description Gluzskiy Alexandr 2019-09-23 23:06:58 UTC
Created attachment 285127 [details]
dmesg fragment with error

I have an "ASRock X470 Gaming K4" motherboard and have been using PCI passthrough on it for some time. On BIOS version 1.90 it had some trouble but worked overall. Unfortunately, I decided to update the BIOS to the latest versions (3.40, 3.50), and PCI passthrough stopped working entirely. I guess the problem is related to the new "AMD AGESA Combo-AM4 1.0.0.3"; a little searching on the internet confirms that it is a common problem across different boards. I do not know whether it is an AMD bug, a KVM bug, or a kernel bug. I have already reported the problem to the board manufacturer.

In the console I get something like "vfio: cannot power on device, stuck in D3" from QEMU.

There are also a lot of warnings in dmesg (see attachments).

The device is still visible in lspci, but looks completely unresponsive.
Comment 1 Gluzskiy Alexandr 2019-09-23 23:07:33 UTC
Created attachment 285129 [details]
dmesg fragment with error
Comment 2 Gluzskiy Alexandr 2019-09-23 23:07:54 UTC
Created attachment 285131 [details]
dmesg fragment with error
Comment 3 Gluzskiy Alexandr 2019-09-23 23:08:22 UTC
Created attachment 285133 [details]
dmesg fragment with error (latest bios)
Comment 4 Gluzskiy Alexandr 2019-09-23 23:08:49 UTC
Created attachment 285135 [details]
dmesg fragment with error (latest bios)
Comment 5 Gluzskiy Alexandr 2019-09-23 23:09:09 UTC
Created attachment 285137 [details]
dmidecode output
Comment 6 Gluzskiy Alexandr 2019-09-23 23:09:28 UTC
Created attachment 285139 [details]
lspci output
Comment 7 Alex Williamson 2019-09-23 23:20:24 UTC
When we get the "stuck in D3" message, it usually means that we're getting back -1 on config space reads rather than the device actually being stuck in D3.  The -1 return probably means the downstream bus never recovered when we issued a secondary bus reset to perform a reset on the GPU.  This seems to be common with AGESA updates and AFAICT indicates a hardware/firmware issue, not a kernel issue.  As you indicate, it worked previously and started failing after the BIOS update.  This is the common story: AMD needs to fix secondary bus reset support on their root ports.  I believe some users have had success rolling back their BIOS to a previous release.
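
To illustrate the point about -1 reads, here is a minimal Python sketch (not code from QEMU or the kernel). It assumes the power state is taken from the low two bits of the PCI power management control/status register, which Linux masks with PCI_PM_CTRL_STATE_MASK (0x0003). The helper names are made up for this example, and read_config_header() assumes a Linux host that exposes config space under /sys/bus/pci/devices.

```python
# Low two bits of the power management control/status register (PMCSR)
# encode the D-state; Linux calls this mask PCI_PM_CTRL_STATE_MASK.
PM_CTRL_STATE_MASK = 0x0003


def decode_power_state(pmcsr: int) -> str:
    """Return the D-state encoded in a 16-bit PMCSR value."""
    return f"D{pmcsr & PM_CTRL_STATE_MASK}"


def is_all_ones(raw: bytes) -> bool:
    """True when every byte read back is 0xFF, i.e. the read returned -1."""
    return len(raw) > 0 and all(b == 0xFF for b in raw)


def read_config_header(bdf: str, length: int = 4) -> bytes:
    """Read the first bytes of a device's config space via sysfs (Linux only)."""
    with open(f"/sys/bus/pci/devices/{bdf}/config", "rb") as f:
        return f.read(length)


if __name__ == "__main__":
    # A failed read comes back as all ones; masking it yields state 3.
    print(decode_power_state(0xFFFF))  # D3
    # A healthy, powered-on device reports state 0.
    print(decode_power_state(0x0000))  # D0
```

To check a real device, something like is_all_ones(read_config_header("0000:0a:00.0")) could be used, where the address is only a placeholder for the passed-through GPU. A True result would suggest the device is not answering config reads at all, rather than genuinely sitting in D3.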
Comment 8 Gluzskiy Alexandr 2019-09-23 23:32:18 UTC
I see, got it.
Unfortunately, the manufacturer of this board decided NOT to provide a downgrade option for some reason.
Comment 9 Gluzskiy Alexandr 2019-09-25 13:38:51 UTC
Is it possible to work around this problem somehow?
I am testing another board (X570 based) and see the same problem.
Comment 10 Gluzskiy Alexandr 2019-09-26 14:59:01 UTC
https://community.amd.com/thread/241650 - link to AMD's own forums for reference.
Comment 11 Masato Yoshida 2020-04-02 11:29:20 UTC
BioStar's X570GT8 motherboard with AGESA 1.0.0.4 patch B has the same error.
OS: Ubuntu 19.10
Kernel: 5.4.21

With the PCI-E speed set to Gen2 in the UEFI BIOS, no error occurs and passthrough works without any problem.
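
For anyone trying the Gen2 workaround, a rough Python sketch for confirming the negotiated link speed from Linux follows. It assumes the kernel exposes a current_link_speed attribute under /sys/bus/pci/devices and that its text starts with a number followed by "GT/s"; the exact wording differs between kernel versions, so the parser only looks at that prefix. The function names are hypothetical.

```python
import re
from typing import Optional

# Per-lane transfer rates of PCIe generations 1 through 4, in GT/s.
GT_PER_SEC_TO_GEN = {2.5: 1, 5.0: 2, 8.0: 3, 16.0: 4}


def link_generation(speed_text: str) -> Optional[int]:
    """Map a string such as '5 GT/s' or '5.0 GT/s PCIe' to a PCIe generation."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s*GT/s", speed_text)
    if match is None:
        return None  # unrecognized format, e.g. an unknown-speed marker
    return GT_PER_SEC_TO_GEN.get(float(match.group(1)))


def current_link_generation(bdf: str) -> Optional[int]:
    """Read the negotiated speed of one device from sysfs (Linux only)."""
    with open(f"/sys/bus/pci/devices/{bdf}/current_link_speed") as f:
        return link_generation(f.read())
```

If the UEFI setting took effect, current_link_generation() called with the GPU's PCI address should return 2.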
