Bug 112941 - Cannot re-enable SRIOV after disabling SRIOV on AMD GPU
Summary: Cannot re-enable SRIOV after disabling SRIOV on AMD GPU
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI (show other bugs)
Hardware: All Linux
: P1 high
Assignee: drivers_pci@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-02-23 14:58 UTC by Kelly
Modified: 2016-08-24 19:02 UTC (History)
1 user (show)

See Also:
Kernel Version: all versions from v4.3.5 to current (4.5.0-rc3)
Subsystem:
Regression: No
Bisected commit-id:


Attachments
stack trace of warn (3.76 KB, text/plain)
2016-02-23 14:58 UTC, Kelly
Details

Description Kelly 2016-02-23 14:58:50 UTC
Created attachment 204431 [details]
stack trace of warn

Repro steps;
1) Enable SRIOV on AMD GPU
    - lspci shows the VFs added
2) Disable SRIOV on AMD GPU
    - lspci shows the devices gone
3) Enable SROIV on AMD GPU 
    - dmesg shows the following WARNING  
"WARNING: CPU: 1 PID: 3390 at drivers/pci/ats.c:85 pci_disable_ats+0x26/0xa4()"

The code surrounding the WARNING is;

/**
 * pci_disable_ats - disable the ATS capability
 * @dev: the PCI device
 */
void pci_disable_ats(struct pci_dev *dev)
{
        struct pci_dev *pdev;
        u16 ctrl;

        if (WARN_ON(!dev->ats_enabled))
                return;

        if (atomic_read(&dev->ats_ref_cnt))
                return;         /* VFs still enabled */


In kernel versions 3.4.9 to 4.2.0 there were no problems encountered enabling or disabling SRIOV multiple times with AMD GPU.

In versions 4.2.0 up until 4.3.5 there was a bug which prevented SRIOV from being enabled even once.  

In v4.3.5 the bug was fixed and SRIOV can be enabled but only once.  If SRIOV is disabled it cannot be re-enabled again without the WARN message.

Full stack trace is attached
Comment 1 Kelly 2016-02-26 16:42:22 UTC
>     - dmesg shows the following WARNING  
> "WARNING: CPU: 1 PID: 3390 at drivers/pci/ats.c:85
> pci_disable_ats+0x26/0xa4()"
> 
> The code surrounding the WARNING is;
> 
> /**
>  * pci_disable_ats - disable the ATS capability
>  * @dev: the PCI device
>  */
> void pci_disable_ats(struct pci_dev *dev)
> {
>         struct pci_dev *pdev;
>         u16 ctrl;
> 
>         if (WARN_ON(!dev->ats_enabled))
>                 return;
> 
>         if (atomic_read(&dev->ats_ref_cnt))
>                 return;         /* VFs still enabled */
> 
> 


I have made some progress on this and it is related in part to the asymmetrical nature of the iommu attach/detach calls.
There are three code flows that need to be examined and I have summarized them below.

The iommu attach code flows as follows;
.attach_dev = amd_iommu_attach_device
    --> amd_iommu_attach_device
        --> get dev_data from dev.archdata.iommu
               If dev_data->domain != NULL  --> detach_device()   <-- the WARNING is coming from this code path.
               attach_device ()

The iommu detach code path flows as follows; .detach_dev = amd_iommu_detach_device
    --> amd_iommu_detach_device 
        --> detach_device()
            -->  __detach_device()
                --> do_detach()
                    --> set dev_data->domain = NULL;
            --> pci_disable_ats()    <-- expects ats_enabled == 1 on entry
                --> Set ats_enabled = 0

And finally when pci_enable_sriov() is called the following flow is important;
--> virtfn_add()    <-- allocates a new virtual function device
    --> pci_device_add()
        --> iommu_init_device()
            --> find_dev_data()

Now here is the problem.  
When pci_enable_sriov is called for the first time, amd_iommu_attach_device() gets called and it adds dev->archdata.iommu to its dev_data list (stored by device_id which looks like it is related to BDF) . Remember that the value of dev_data->domain is saved here.

When pci_disable_sriov is called, amd_iommu_detach_device is NOT called.  The dev_data for the device is still on the iommu dev_data list even though the device has been destroyed.

When pci_enable_sriov() is called for the second time it creates new dev's for the virtual functions BUT the dev_data for the dev (identified by device_id) still remains in the iommu dev_data list. 

Iommu_init_device() searches the dev_data list to see if a dev_data already exists for this dev. It erroneously finds one.  When amd_iommu_attach_device() is ultimately called it uses the dev_data from the previous dev and sees that dev_data->domain != NULL.  This causes the call to detach_device() which eventually calls pci_disable_ats().  BUT this is a new dev and ats is not enabled yet and the ats_enabled flag == 0.  Hence the WARNING and bug.

I have done the triage but I am not sure where the fix should be.  I quite accidentally found the following somewhat related thread at http://www.gossamer-threads.com/lists/linux/kernel/2225731.  It seems that he is having a similar problem but on a different platform.

I don't know if the asynchronous nature of the iommu attach/detach is by design or if it is broken somewhere up the tree and just not working in my case.  Maybe one of the iommu owners could answer this.

Note You need to log in before you can comment on or make changes to this bug.