Bug 92671 - Linux denies 64-bit non-pref BARs in 64-bit prefetchable space even though it's safe
Status: RESOLVED PATCH_ALREADY_AVAILABLE
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: drivers_pci@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-02-04 11:16 UTC by Daniel J Blueman
Modified: 2015-07-30 02:34 UTC

See Also:
Kernel Version: 3.19-rc7
Subsystem:
Regression: No
Bisected commit-id:


Attachments
PCI listing (226.57 KB, text/plain)
2015-02-04 11:16 UTC, Daniel J Blueman
kernel log showing BARs being inhibited (98.00 KB, text/plain)
2015-02-04 11:17 UTC, Daniel J Blueman
PNP: Don't check for overlaps with unassigned PCI BARs (1.59 KB, patch)
2015-03-10 01:16 UTC, Bjorn Helgaas
PCI: Mark invalid BARs as unassigned (1.18 KB, patch)
2015-03-10 01:19 UTC, Bjorn Helgaas
PCI: Show driver, BAR#, and resource on pci_ioremap_bar() failure (1.33 KB, patch)
2015-03-10 01:24 UTC, Bjorn Helgaas
PCI: Fail pci_ioremap_bar() on unassigned resources (1.10 KB, patch)
2015-03-10 01:25 UTC, Bjorn Helgaas
Kernel messages for unpatched 4.0-rc4 (109.98 KB, text/plain)
2015-03-20 08:46 UTC, Daniel J Blueman
Kernel messages for 4.0-rc4 w/ Bjorn's 4 patches and my BAR flag patch (105.95 KB, text/plain)
2015-03-20 08:47 UTC, Daniel J Blueman

Description Daniel J Blueman 2015-02-04 11:16:06 UTC
Created attachment 165821
PCI listing

On NumaConnect systems with a large number of PCI devices, we are running out of 32-bit MMIO space; e.g. a single quad-port NetXtreme-2 adapter takes 128MB.

The PCIe 2.1 spec and later provides guidance on limitations with 64-bit
non-prefetchable BARs (since bridges have only 32-bit non-prefetchable
ranges) stating that vendors can enable the prefetchable bit in BARs under
certain circumstances to allow 64-bit allocation [1].

Vendors can't know a priori which hosts their products will be used in, so they can't simply advertise prefetchable 64-bit BARs. What system firmware can do is use the 64-bit prefetchable window in bridges and assign a 64-bit non-prefetchable device BAR into that area, where it is safe to do so (following the guidance).

At present, Linux denies such allocations and disables the BARs (causing the crash in the attached log). It seems a practical solution to allow such BAR placement if the firmware believes it is safe.

In the attached kernel log there is insufficient space in the 512MB MMIO32 window ("e820: [mem 0xe0000000-0xefffffff] available for PCI devices") for the 3x128MBs of BARs, so NumaConnect firmware allocates the 64-bit non-prefetchable BARs in prefetchable space. [I took the 'lspci' output by adding 'if (pci_domain_nr(pdev->bus)) return -ENODEV;' to mpt2sas_base_map_resources().]

Is there scope to follow this PCIe guidance and relax the requirements? This is at least part of the relevant code:

bus.c pci_bus_alloc_from_region():
/* We cannot allocate a non-prefetching resource
   from a pre-fetching area */
if ((r->flags & IORESOURCE_PREFETCH) &&
    !(res->flags & IORESOURCE_PREFETCH))
	continue;
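
To make the request concrete, here is a minimal userspace sketch of the check above and of one possible relaxation. It is not kernel code and not a patch that was merged: the RES_* constants are local stand-ins for the kernel's IORESOURCE_* bits, and relaxed_can_place() is a hypothetical helper. It also does not verify the safety conditions from the PCIe guidance; it assumes firmware already did.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for the kernel's IORESOURCE_* flag bits (assumption:
 * only the distinctness of the bits matters for this sketch). */
#define RES_MEM       0x00000200u
#define RES_PREFETCH  0x00002000u
#define RES_MEM_64    0x00100000u

/* Models the existing check: a non-prefetchable resource is never
 * placed in a prefetchable window. */
static bool strict_can_place(unsigned window, unsigned res)
{
	assert(res & RES_MEM);	/* sketch only handles memory resources */

	if ((window & RES_PREFETCH) && !(res & RES_PREFETCH))
		return false;
	return true;
}

/* Hypothetical relaxed rule: additionally accept a 64-bit
 * non-prefetchable BAR in a 64-bit prefetchable window. */
static bool relaxed_can_place(unsigned window, unsigned res)
{
	bool window_pref64 = (window & RES_PREFETCH) && (window & RES_MEM_64);

	if (window_pref64 && (res & RES_MEM_64))
		return true;
	return strict_can_place(window, res);
}
```

With these helpers, a 64-bit non-prefetchable BAR is rejected by strict_can_place() for a 64-bit prefetchable window but accepted by relaxed_can_place(), while a 32-bit non-prefetchable BAR is rejected by both.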

-- [1] p13

https://www.pcisig.com/specifications/pciexpress/base2/PCIe_Base_r2.1_Errata_08Jun10.pdf
Comment 1 Daniel J Blueman 2015-02-04 11:17:04 UTC
Created attachment 165831
kernel log showing BARs being inhibited
Comment 2 Bjorn Helgaas 2015-03-10 01:16:02 UTC
Created attachment 170041
PNP: Don't check for overlaps with unassigned PCI BARs

This isn't a fix by itself, but it should get rid of some of the annoying messages like:

  pnp 00:00: disabling [io  0x0061] because it overlaps 0001:05:00.0 BAR 0 [io  0x0000-0x00ff]

This message is incorrect because the PCI BAR 0 should be marked unassigned.  If it were, we wouldn't enable it, so it couldn't conflict with anything.
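
The idea can be illustrated with a small userspace model (a sketch only, not the attached patch): an overlap test that ignores any BAR flagged as unset. struct res, RES_IO, RES_UNSET and pnp_conflicts_with_bar() are hypothetical names chosen for this example; RES_UNSET stands in for the kernel's IORESOURCE_UNSET.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for kernel resource flag bits (assumption). */
#define RES_IO     0x00000100u
#define RES_UNSET  0x20000000u

/* Inclusive address range plus flags, loosely modelled on struct resource. */
struct res {
	uint64_t start, end;
	unsigned flags;
};

static bool ranges_overlap(const struct res *a, const struct res *b)
{
	return a->start <= b->end && b->start <= a->end;
}

/* A BAR that was never assigned will not be enabled, so it cannot
 * conflict with a PNP resource regardless of the range it reports. */
static bool pnp_conflicts_with_bar(const struct res *pnp, const struct res *bar)
{
	assert(pnp->start <= pnp->end && bar->start <= bar->end);

	if (bar->flags & RES_UNSET)
		return false;
	return ranges_overlap(pnp, bar);
}
```

For the message quoted above, the PNP resource [io 0x0061] and BAR 0 [io 0x0000-0x00ff] overlap numerically, but with the BAR flagged unset the model reports no conflict.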
Comment 3 Bjorn Helgaas 2015-03-10 01:19:43 UTC
Created attachment 170051
PCI: Mark invalid BARs as unassigned
Comment 4 Bjorn Helgaas 2015-03-10 01:24:54 UTC
Created attachment 170061
PCI: Show driver, BAR#, and resource on pci_ioremap_bar() failure
Comment 5 Bjorn Helgaas 2015-03-10 01:25:29 UTC
Created attachment 170071
PCI: Fail pci_ioremap_bar() on unassigned resources
Comment 6 Bjorn Helgaas 2015-03-10 01:29:06 UTC
Daniel, can you try the patches in comments #2-5 (in that order)?  They won't help assign resources, but we should fail gracefully instead of dereferencing a null pointer.
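
As a rough illustration of "fail gracefully", the following userspace model (not the kernel implementation) shows a mapping helper that refuses unassigned BARs and a probe routine that checks the result before touching registers. map_bar_model(), probe_model(), struct bar and the RES_* constants are hypothetical names for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for kernel resource flag bits (assumption). */
#define RES_MEM    0x00000200u
#define RES_UNSET  0x20000000u

struct bar {
	uint64_t start, len;
	unsigned flags;
};

/* Pretend register window so the model has something to return. */
static unsigned char fake_mmio[16];

/* Model of a BAR mapping helper: returns NULL rather than a bogus
 * mapping when the BAR is not memory, is unassigned, or is empty. */
static void *map_bar_model(const struct bar *b)
{
	assert(b != NULL);

	if (!(b->flags & RES_MEM) || (b->flags & RES_UNSET) || b->len == 0)
		return NULL;
	return fake_mmio;
}

/* Model of a driver probe: bail out with -ENODEV instead of
 * dereferencing a NULL mapping. */
static int probe_model(const struct bar *b)
{
	volatile unsigned char *regs = map_bar_model(b);

	if (!regs)
		return -ENODEV;
	(void)regs[0];	/* first register access is now safe */
	return 0;
}
```

A driver written this way returns an error for a device whose BAR was left unassigned, rather than crashing on the first register read.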
Comment 7 Daniel J Blueman 2015-03-20 08:42:37 UTC
Many thanks for the patches; I booted 4.0-rc4 with the four patches applied and saw no change. It looks like the test for the BAR being unset happens too late. The "Disabling ... because it overlaps ..." messages do go away if I add:

--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -281,6 +281,13 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
 	pcibios_resource_to_bus(dev->bus, &inverted_region, res);

 	/*
+	 * If firmware doesn't assign a valid PCI address (as legacy IO is below
+	 * PCI IO), mark resource unset to prevent later resource conflicts
+	 */
+	if (region.start == 0)
+		res->flags |= IORESOURCE_UNSET;
+
+	/*
 	 * If "A" is a BAR value (a bus address), "bus_to_resource(A)" is
 	 * the corresponding resource address (the physical address used by
 	 * the CPU.  Converting that resource address back to a bus address

I'll attach the dmesg boot output for the unpatched kernel, and for the kernel with your four patches plus this patch.
Comment 8 Daniel J Blueman 2015-03-20 08:46:49 UTC
Created attachment 171381
Kernel messages for unpatched 4.0-rc4
Comment 9 Daniel J Blueman 2015-03-20 08:47:30 UTC
Created attachment 171391
Kernel messages for 4.0-rc4 w/ Bjorn's 4 patches and my BAR flag patch
Comment 10 Daniel J Blueman 2015-07-30 02:34:25 UTC
Bjorn's patches landed in mainline a while ago, which helps with the unexpected conflicts.

Yinghai added support for 64-bit non-prefetchable BARs in 64-bit prefetchable space in patch 34 of:
https://groups.google.com/forum/?fromgroups#!topic/linux.kernel/AieDDCG3JcM
