Bug 195891 - PCI resource allocation failed for AMD SR-IOV capable GPU
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: drivers_pci@kernel-bugs.osdl.org
 
Reported: 2017-05-27 08:11 UTC by collins.cheng
Modified: 2017-05-30 15:46 UTC

Kernel Version: 4.9
Regression: No


Attachments
dmesg log with "pci=earlydump". Also lspci log included. (171.36 KB, text/plain)
2017-05-27 08:11 UTC, collins.cheng

Description collins.cheng 2017-05-27 08:11:10 UTC
Created attachment 256741
dmesg log with "pci=earlydump". Also lspci log included.

Some AMD GPUs have hardware support for graphics SR-IOV.
If an SR-IOV capable GPU is plugged into a platform that does not support SR-IOV, PCI resource allocation fails in the current Linux kernel.

An AMD graphics device that supports SR-IOV requires a large amount of PCI resources.

If the system BIOS supports SR-IOV, it reserves enough resources for all VF BARs. If the BIOS does not support SR-IOV, or cannot allocate enough resources for the VF BARs, only the PF BARs are assigned and the VF BARs are left empty. When the system then boots Linux, the kernel does not check whether the VF BARs are empty or valid; it reassigns the BAR resources for the PF and all VFs. The problem I saw is that the kernel then fails to allocate the PF BAR resources because some of the resources have been assigned to the VFs, which is not expected. The related code is in sriov_init() in drivers/pci/iov.c.

So the kernel may need to perform a check before reassigning the PF/VF resources, so that the PF BARs are assigned correctly and the user can still use the PF device.
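To make the failure mode and the suggested check concrete, here is a toy model in Python. It is only a sketch under stated assumptions, not the kernel's allocation algorithm: the `Bar` class, the first-fit `place()` helper, both ordering functions, and any sizes or window addresses passed to them are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Bar:
    name: str
    size: int                   # requested size in bytes (power of two)
    is_vf: bool                 # True for an SR-IOV VF BAR, False for a PF BAR
    base: Optional[int] = None  # assigned address, None while unassigned


def align_up(addr: int, size: int) -> int:
    """Round addr up to the next multiple of size (natural BAR alignment)."""
    return (addr + size - 1) // size * size


def place(bars: List[Bar], start: int, length: int) -> None:
    """First-fit placement in the given order; BARs that do not fit stay unassigned."""
    cursor, end = start, start + length
    for bar in bars:
        base = align_up(cursor, bar.size)
        if base + bar.size <= end:
            bar.base = base
            cursor = base + bar.size
        else:
            bar.base = None


def largest_first(bars: List[Bar]) -> List[Bar]:
    """Toy ordering that ignores PF/VF: the biggest request is placed first."""
    return sorted(bars, key=lambda b: -b.size)


def pf_first(bars: List[Bar]) -> List[Bar]:
    """Toy ordering for the suggested check: PF BARs are placed before VF BARs."""
    return sorted(bars, key=lambda b: (b.is_vf, -b.size))
```

With a hypothetical 512 MiB window at 0xE0000000, a 256 MiB PF BAR and a 512 MiB VF BAR, `place(largest_first(bars), ...)` gives the whole window to the VF BAR and leaves the PF BAR unassigned, which mirrors the reported symptom. `place(pf_first(bars), ...)` assigns the PF BAR and leaves only the VF BAR empty, so the PF device stays usable.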

This issue can be worked around by adding "pci=realloc=off" to the kernel boot parameters in GRUB. But not every user knows to add this parameter. I hope the kernel can handle this resource allocation failure gracefully, for example by keeping the BIOS-assigned values if reallocation of a device's resources fails.
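To confirm that the workaround actually reached the kernel, one can inspect the running kernel's command line. The helper below is a minimal sketch; the function name `realloc_setting` is hypothetical, and it assumes the documented behaviour that "pci=" takes a comma-separated option list in which a bare "realloc" means "realloc=on".

```python
from typing import Optional


def realloc_setting(cmdline: str) -> Optional[str]:
    """Return the last pci=realloc value on a kernel command line, or None if absent."""
    setting = None
    for token in cmdline.split():
        if not token.startswith("pci="):
            continue
        # "pci=" accepts several comma-separated options, e.g. "pci=earlydump,realloc=off"
        for opt in token[len("pci="):].split(","):
            if opt == "realloc":
                setting = "on"      # bare "realloc" is treated as "realloc=on"
            elif opt.startswith("realloc="):
                setting = opt.split("=", 1)[1]
    return setting
```

On a running system this could be called as `realloc_setting(open("/proc/cmdline").read())`; a result of "off" indicates the workaround is active.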

I attached a log file, 2.txt, that shows the resource allocation failure for the PF and VFs. Search for device "01:00.0" to find the relevant messages.
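The search can also be scripted. The sketch below filters log lines for one device and for BAR assignment failures. The helper name `bar_failures` is hypothetical, and the message wording matched by the pattern ("no space for", "failed to assign") is an assumption about how kernels of this era report such failures, so adjust it if the log uses different text.

```python
import re
from typing import Iterable, List

# Assumed wording of the kernel's BAR assignment failure messages.
FAILURE = re.compile(r"BAR \d+: (?:no space for|failed to assign) ")


def bar_failures(log_lines: Iterable[str], device: str = "01:00.0") -> List[str]:
    """Return log lines that mention the device and report a BAR assignment failure."""
    return [
        line.rstrip("\n")
        for line in log_lines
        if device in line and FAILURE.search(line)
    ]
```

For the attached log this might be used as `bar_failures(open("2.txt"))`.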
Comment 1 collins.cheng 2017-05-27 08:14:27 UTC
To reproduce this issue, take a normal Intel or AMD x86 desktop motherboard and disable "Above 4G Decoding" in the BIOS (CMOS) setup. Then plug an AMD SR-IOV capable GPU card (e.g. AMD FirePro S7150) into a PCIe slot.
