Bug 88671

Summary: Radeon driver fails to reset hardware properly after kvm guest reboot
Product: Drivers
Reporter: Tom Stellard (tstellar)
Component: Video(DRI - non Intel)
Assignee: virtualization_kvm
Status: NEW
Severity: normal
CC: alex.williamson, szg00000
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.17.3
Subsystem:
Regression: No
Bisected commit-id:
Attachments: lspci
Backtrace from BUG_ON
Virtual machine definition

Description Tom Stellard 2014-11-21 18:33:14 UTC
Created attachment 158381
lspci

I'm running into this bug while trying to use PCI passthrough of an AMD BONAIRE XT (Radeon HD 7790).

Steps to reproduce:

1. virsh start vm
2. virsh destroy vm
3. virsh start vm
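The three steps above can be scripted so the failing sequence is easy to repeat. This is a minimal sketch, assuming a libvirt domain named "vm" as in the steps; the VIRSH and DOMAIN variables and the repro function name are illustrative, not part of the original report.

```sh
#!/bin/sh
# Sketch of the reproduction steps: start the guest, force it off, start again.
# Assumes a libvirt domain named "vm". VIRSH, DOMAIN and repro are illustrative
# names; VIRSH can be pointed at a stub to exercise the sequence without libvirt.
VIRSH=${VIRSH:-virsh}
DOMAIN=${DOMAIN:-vm}

repro() {
    # Step 1: first boot, where passthrough is reported to work.
    "$VIRSH" start "$DOMAIN" || return 1
    # Step 2: "virsh destroy" is an immediate forced power-off, not a clean shutdown.
    "$VIRSH" destroy "$DOMAIN" || return 1
    # Step 3: second boot, where the reset problem is reported to appear.
    "$VIRSH" start "$DOMAIN"
}
```

Source the file and call repro. The function stops at the first virsh command that fails, so a failed first start is not mistaken for the second-boot problem.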

This bug only appears after starting the VM for the second time.  The first time, the VM boots normally and passthrough works as expected.
Comment 1 Tom Stellard 2014-11-21 18:33:42 UTC
Created attachment 158391
Backtrace from BUG_ON
Comment 2 Tom Stellard 2014-11-21 18:34:11 UTC
Created attachment 158401
Virtual machine definition
Comment 3 Tom Stellard 2014-11-21 18:36:22 UTC
I should also mention that I have this hook executing when the machine starts up:

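#!/bin/sh
# libvirt qemu hook (e.g. /etc/libvirt/hooks/qemu): $1 is the guest name, $2 the operation.
if [ "$2" = "prepare" ]; then
        virsh nodedev-detach pci_0000_01_00_1
fi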
Comment 4 Alex Williamson 2014-11-21 19:14:40 UTC
I'm not sure how you're getting to this BUG_ON, but (a) legacy KVM device assignment is deprecated and (b) the card you've chosen has known reset issues.  You might want to try vfio-pci, which I know can make this card work at least once per host boot, but you're likely to get a BSOD and IOMMU faults on subsequent guest [re]boots.  The reset problem with this card has been reported to AMD, but there is no solution at this time.
Comment 5 Tom Stellard 2014-11-29 02:36:51 UTC
Thanks for the tip about vfio-pci.  I can now sometimes get two working guest boots per host boot.
Comment 6 Tom Stellard 2015-03-02 16:42:11 UTC
I've been playing with this a little more, and it seems to be working correctly, but radeon dynamic power management (dpm) always fails to initialize on the second guest boot.  My questions are:

1. What methods are being used by kvm/qemu/libvirt to reset the GPU on guest shutdown?

2. Is the problem only caused by the fact that GPU reset is not implemented correctly in the radeon driver, or are there improvements needed in kvm/qemu/libvirt to get this working?
Comment 7 Alex Williamson 2015-03-02 17:32:49 UTC
(In reply to Tom Stellard from comment #6)
> I've been playing with this a little more, and it seems to be working
> correctly, but radeon dynamic power management (dpm) always fails to
> initialize on the second guest boot.  My questions are:
> 
> 1. What methods are being used by kvm/qemu/libvirt to reset the GPU on guest
> shutdown?

Secondary PCI bus reset from the parent bridge.
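For reference, a secondary bus reset is triggered by pulsing the Secondary Bus Reset bit (bit 6, mask 0x40) of the Bridge Control register (config offset 0x3e) on the bridge above the device. The sketch below only prints the setpci commands such a pulse would involve, loosely modelled on the kernel's read-modify-write with a roughly 2 ms hold and a 1 s settle time. The function names are hypothetical, and the kernel also saves and restores the config space of the devices behind the bridge, which this sketch does not do, so the printed commands should not be run against a live system.

```sh
#!/bin/sh
# Sketch: print (do not execute) the steps of a secondary bus reset pulse on a
# PCI-to-PCI bridge. BRIDGE_CONTROL is config offset 0x3e; bit 6 (0x40) is the
# Secondary Bus Reset bit. sbr_assert, sbr_deassert and sbr_plan are hypothetical.
SBR_BIT=64  # 0x40

# Input is bare hex as setpci prints it (e.g. 0010); output is 4 hex digits.
# Bridge Control value with SBR set, other bits preserved.
sbr_assert() { printf '%04x\n' $(( 0x$1 | SBR_BIT )); }

# Bridge Control value with SBR cleared, other bits preserved.
sbr_deassert() { printf '%04x\n' $(( 0x$1 & ~SBR_BIT & 0xffff )); }

# $1: bridge address (e.g. 00:01.0), $2: current Bridge Control value (e.g. 0010).
sbr_plan() {
    bridge=$1
    ctl=$2
    echo "setpci -s $bridge BRIDGE_CONTROL=$(sbr_assert "$ctl")"
    echo "sleep 0.002"   # hold reset ~2 ms (fractional sleep is a GNU extension)
    echo "setpci -s $bridge BRIDGE_CONTROL=$(sbr_deassert "$ctl")"
    echo "sleep 1"       # give devices behind the bridge time to reinitialize
}
```

The current value can be read with setpci -s <bridge> BRIDGE_CONTROL and passed as the second argument. Writing back the full read value, rather than a constant, keeps the bridge's other control bits unchanged.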

> 2. Is the problem only caused by the fact that GPU reset is not implemented
> correctly in the radeon driver, or are there improvements needed in
> kvm/qemu/libvirt to get this working?

Without a device-specific reset, we're doing the most thorough standard reset available to us.  I've also tried to use some of the reset mechanisms implemented in the radeon FOSS driver, but they offered no improvement over the bus reset.  There seems to be some state retained in the device that is not cleared by a bus reset.