Bug 204975
Summary: | AMD-Vi: Command buffer timeout | ||
---|---|---|---|
Product: | Virtualization | Reporter: | Gluzskiy Alexandr (sss123next) |
Component: | kvm | Assignee: | virtualization_kvm |
Status: | NEW --- | ||
Severity: | blocking | CC: | alex.williamson, masato |
Priority: | P1 | ||
Hardware: | x86-64 | ||
OS: | Linux | ||
Kernel Version: | 4.19.75 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
dmesg fragment with error
dmesg fragment with error
dmesg fragment with error
dmesg fragment with error (latest bios)
dmesg fragment with error (latest bios)
dmidecode output
lspci output |
Created attachment 285129 [details]
dmesg fragment with error
Created attachment 285131 [details]
dmesg fragment with error
Created attachment 285133 [details]
dmesg fragment with error (latest bios)
Created attachment 285135 [details]
dmesg fragment with error (latest bios)
Created attachment 285137 [details]
dmidecode output
Created attachment 285139 [details]
lspci output
When we get the "stuck in D3" message, it usually means that config space reads are returning -1, rather than that the device is actually stuck in D3. The -1 return probably means the downstream bus never recovered after we issued a secondary bus reset to reset the GPU. This seems to be common with AGESA updates and, AFAICT, indicates a hardware/firmware issue, not a kernel issue. As you indicate, it worked previously and started failing after a BIOS update. This is the common story: AMD needs to fix secondary bus reset support on their root ports. I believe some users have had success rolling back their BIOS to a previous release.
I see, got it. Unfortunately, the manufacturer of this board decided NOT to provide a downgrade option for some reason. Is it possible to work around this problem somehow?
Testing another board (X570 based): same problem.
https://community.amd.com/thread/241650 - link to AMD's own forums for reference.
BioStar's X570GT8 motherboard with AGESA 1.0.0.4 patch B has the same error (OS: Ubuntu 19.10, kernel 5.4.21). With the PCI-E speed set to Gen2 in the UEFI BIOS, no error occurs and pass-through works without any problem.
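For anyone who wants to check the two observations in the comments above on their own machine (config space reads returning -1, and the link speed the device has actually negotiated), here is a minimal Python sketch. It assumes the standard Linux sysfs layout (`/sys/bus/pci/devices/<BDF>/config` and `current_link_speed`). The helper names `config_is_all_ones` and `probe`, and the device address in the usage note, are made up for illustration and do not come from this report. An all-ones vendor/device ID only suggests that the device is not answering; it does not prove the cause.

```python
from pathlib import Path

# Standard sysfs location for PCI devices on Linux (assumption: sysfs is mounted).
SYSFS_PCI = Path("/sys/bus/pci/devices")


def config_is_all_ones(header: bytes) -> bool:
    """Return True if the first dword (vendor ID + device ID) is 0xFFFFFFFF.

    A config read that "returns -1" shows up as all bits set.
    """
    return len(header) >= 4 and header[:4] == b"\xff\xff\xff\xff"


def probe(bdf: str, root: Path = SYSFS_PCI) -> dict:
    """Collect config-space and link-speed state for one device.

    bdf is the address as printed by `lspci -D`, e.g. "0000:0a:00.0".
    """
    dev = root / bdf
    # The first 64 bytes (the standard header) are readable without root.
    header = (dev / "config").read_bytes()[:64]
    speed_file = dev / "current_link_speed"
    # current_link_speed is absent for non-PCIe devices and on old kernels.
    speed = speed_file.read_text().strip() if speed_file.exists() else None
    return {
        "bdf": bdf,
        "all_ones": config_is_all_ones(header),
        "link_speed": speed,
    }
```

Calling `probe("0000:0a:00.0")` (a hypothetical address; take the real one from lspci) returns a small dict. `all_ones` being true would be consistent with the -1 reads described above, and `link_speed` shows whether a Gen2 setting in the firmware actually took effect.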
Created attachment 285127 [details]
dmesg fragment with error
I have an "ASRock X470 Gaming K4" motherboard and have been using PCI passthrough for some time. It worked on BIOS version 1.90; there were some troubles, but it worked overall. Unfortunately, I decided to update the BIOS to the latest versions (3.40, 3.50), and PCI passthrough stopped working entirely. I guess the problem is related to the new "AMD AGESA Combo-AM4 1.0.0.3"; a little searching on the internet confirms that it is a common problem across different boards. I do not know whether it is an AMD bug, a KVM bug, or a kernel bug. I have already reported the problem to the board manufacturer.
In the console I get something like "vfio: cannot power on device, stuck in D3" from qemu, along with a lot of warnings in dmesg (see attachments). The device is still visible in lspci but looks completely unresponsive.
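To pull the relevant lines out of a long dmesg or QEMU log before attaching it, a small filter like the following may help. It is only a sketch: the two search strings are taken from this report ("AMD-Vi" from the summary, "stuck in D3" from the QEMU message), the exact wording may differ between kernel and QEMU versions, and the function name `matching_lines` is invented here.

```python
import re

# Substrings taken from this report; exact messages may vary by version.
PATTERNS = [re.compile(p) for p in (r"AMD-Vi", r"stuck in D3")]


def matching_lines(log_text: str) -> list:
    """Return the log lines that contain any of the patterns, in order."""
    return [
        line
        for line in log_text.splitlines()
        if any(p.search(line) for p in PATTERNS)
    ]
```

Pass it the text of a saved log, for example the contents of a file written with `dmesg > dmesg.txt`, and it returns only the matching lines.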