Bug 52211 - "pci=nocrs" required to get BAR address assignment for SR-IOV capable card on IBM M4 x3650
Status: RESOLVED INVALID
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: drivers_pci@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-01-03 08:50 UTC by Frank Haverkamp
Modified: 2013-09-10 22:00 UTC

See Also:
Kernel Version: 3.8.0-rc1+
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg without using pci=nocrs (120.25 KB, text/plain)
2013-01-03 08:52 UTC, Frank Haverkamp
Output of acpidump tool (148.51 KB, text/plain)
2013-01-03 08:54 UTC, Frank Haverkamp
Output of dmidecode tool (24.83 KB, text/plain)
2013-01-03 09:06 UTC, Frank Haverkamp
New BIOS without pci=nocrs (24.82 KB, text/plain)
2013-01-09 09:01 UTC, Frank Haverkamp
acpidump using a new BIOS without pci=nocrs (150.97 KB, text/plain)
2013-01-09 09:02 UTC, Frank Haverkamp
dmidecode using a new BIOS without pci=nocrs (24.82 KB, text/plain)
2013-01-09 09:02 UTC, Frank Haverkamp
lspci -vvvs 11:00.0 of my card using a new BIOS without pci=nocrs (3.01 KB, text/plain)
2013-01-09 09:03 UTC, Frank Haverkamp
dmesg with new BIOS, card is working, but still warnings. (102.36 KB, text/plain)
2013-01-10 08:19 UTC, Frank Haverkamp

Description Frank Haverkamp 2013-01-03 08:50:46 UTC
I was trying out an SR-IOV capable PCIe card with 1 physical function and 15 virtual functions, each requiring a large 128 MiB BAR space, on a RHEL6.2 installation (2.6.36). This was working OK for me.

Then I switched to a more recent Fedora17 installation with a 3.x kernel and got messages like the ones below:

[haverkam@oc7383187364 sw]$ grep 11:00.0 dmesg_broken_probing.txt
[    0.238611] pci 0000:11:00.0: [1014:044b] type 00 class 0x120000
[    0.238625] pci 0000:11:00.0: reg 10: [mem 0x3c07f0000000-0x3c07f7ffffff 64bit pref]
[    0.238720] pci 0000:11:00.0: reg 224: [mem 0x3c0778000000-0x3c077fffffff 64bit pref]
[    0.291320] pci 0000:11:00.0: BAR 0: reserving [mem 0x3c07f0000000-0x3c07f7ffffff flags 0x14220c] (d=1, p=1)
[    0.291322] pci 0000:11:00.0: no compatible bridge window for [mem 0x3c07f0000000-0x3c07f7ffffff 64bit pref]
[    0.291327] pci 0000:11:00.0: BAR 7: reserving [mem 0x3c0778000000-0x3c07efffffff flags 0x14220c] (d=1, p=1)
[    0.291329] pci 0000:11:00.0: no compatible bridge window for [mem 0x3c0778000000-0x3c07efffffff 64bit pref]
[    0.304846] pci 0000:11:00.0: reg 224: [mem 0x3c0778000000-0x3c077fffffff 64bit pref]
[    0.305851] pci 0000:11:00.0: reg 224: [mem 0x3c0778000000-0x3c077fffffff 64bit pref]
[    0.305853] pci 0000:11:00.0: BAR 0: can't assign mem pref (size 0x8000000)
...

The 3.x kernel was unable to assign an address to the card. I updated to the latest git kernel with no success.
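
As a rough cross-check of the ranges in the log above, the sizes can be worked out with a few lines of Python. This is an illustrative sketch, not part of the original report. It assumes that "reg 224" reports the size of a single VF BAR and that the "BAR 7" range is the aggregate space reserved for all 15 virtual functions; the variable and function names are made up for the example.

```python
# Illustrative arithmetic only; the addresses are copied from the dmesg lines above.
MIB = 1 << 20
NUM_VFS = 15  # number of virtual functions, from the description


def size(rng):
    """Size in bytes of an inclusive (start, end) address range."""
    start, end = rng
    return end - start + 1


# reg 10: BAR 0 of the physical function
pf_bar = (0x3c07f0000000, 0x3c07f7ffffff)
# reg 224: a single VF BAR (assumption: read from the SR-IOV capability)
vf_bar = (0x3c0778000000, 0x3c077fffffff)
# "BAR 7: reserving" range (assumption: aggregate space for all VFs)
vf_total = (0x3c0778000000, 0x3c07efffffff)

print(f"PF BAR 0      : {size(pf_bar) // MIB} MiB")
print(f"single VF BAR : {size(vf_bar) // MIB} MiB")
print(f"all VF BARs   : {size(vf_total) // MIB} MiB")
print(f"matches {NUM_VFS} x VF BAR: {size(vf_total) == NUM_VFS * size(vf_bar)}")
```

Under these assumptions the card needs 128 MiB for the physical function plus 15 x 128 MiB for the virtual functions, all in 64-bit prefetchable space behind the same bridge.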

Finally I ended up setting "pci=nocrs" on the command line, which caused my card to work again. Patching my kernel to put my system on the blacklist in acpi.c has the same effect:

diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
index 0c01261..8249648 100644
--- a/arch/x86/pci/acpi.c
+++ b/arch/x86/pci/acpi.c
@@ -117,6 +117,17 @@ static const struct dmi_system_id pci_crs_quirks[] __initconst = {
                        DMI_MATCH(DMI_PRODUCT_NAME, "HP xw9300 Workstation"),
                },
        },
+
+       /* FIXME no bugzilla entry yet */
+       {
+               .callback = set_nouse_crs,
+               .ident = "IBM System x3650",
+               .matches = {
+                       DMI_MATCH(DMI_SYS_VENDOR, "IBM"),
+                       DMI_MATCH(DMI_PRODUCT_NAME, "x3650"),
+               },
+       },
+
        {}
 };

My system is a:
DMI: IBM System x3650 M4 -[7915D2G]-/00W2609, BIOS -[VVE116AUS-1.10]- 06/20/2012
Comment 1 Frank Haverkamp 2013-01-03 08:52:46 UTC
Created attachment 90271
dmesg without using pci=nocrs
Comment 2 Frank Haverkamp 2013-01-03 08:54:48 UTC
Created attachment 90281
Output of acpidump tool
Comment 3 Frank Haverkamp 2013-01-03 09:06:23 UTC
Created attachment 90291
Output of dmidecode tool
Comment 4 Bjorn Helgaas 2013-01-03 22:19:52 UTC
Here's the relevant info from the 3.8.0-rc1 dmesg in comment #1:

  ACPI: PCI Root Bridge [IOH0] (domain 0000 [bus 00-ff])
  pci_bus 0000:00: root bus resource [mem 0x3c0700000000-0x3c077fff0000]
  pci_bus 0000:00: root bus resource [mem 0x3c0780000000-0x3c07ffff0000]

  pci 0000:00:02.0: PCI bridge to [bus 11-15]
  pci 0000:00:02.0:   bridge window [mem 0x3c0778000000-0x3c07f7ffffff 64bit pref]
  pci 0000:00:02.0: no compatible bridge window for [mem 0x3c0778000000-0x3c07f7ffffff 64bit pref]

  pci 0000:11:00.0: [1014:044b] type 00 class 0x120000
  pci 0000:11:00.0: reg 10: [mem 0x3c07f0000000-0x3c07f7ffffff 64bit pref]
  pci 0000:11:00.0: reg 224: [mem 0x3c0778000000-0x3c077fffffff 64bit pref]
  pci 0000:11:00.0: no compatible bridge window for [mem 0x3c07f0000000-0x3c07f7ffffff 64bit pref]
  pci 0000:11:00.0: no compatible bridge window for [mem 0x3c0778000000-0x3c07efffffff 64bit pref]

The BIOS told us that [mem 0x3c0700000000-0x3c077fff0000] and [mem
0x3c0780000000-0x3c07ffff0000] are routed to PCI bus 0000:00.  It
doesn't mention the 64KB [mem 0x3c077fff0001-0x3c077fffffff] region
between those two apertures, so we have to assume that region is not
routed to bus 0000:00.  But the BIOS programmed the 00:02.0 bridge
window and the 11:00.0 SR-IOV BAR at 0x224 to use that 64KB region.

Since the card works with "pci=nocrs", it seems pretty clear that the
64KB hole between the reported apertures actually *is* routed to bus
0000:00.

This looks like a BIOS bug.  It looks like all the PNP0A08:00 _CRS
apertures above 4GB are missing the last 64KB.

Using "pci=nocrs" or a quirk like the one above avoids this
immediate issue, but it also will keep hot-plug from working because
without the _CRS info, we don't know what resources are available to
assign to hot-added devices.  So it's better to fix the BIOS if you
can.
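
The hole described above can be reproduced from the quoted ranges with a short Python sketch. This is only an illustration of the arithmetic, not how the kernel implements the check; the helper names (contains, overlaps, holes) are hypothetical.

```python
# Host bridge apertures reported by _CRS with the old BIOS (from the dmesg above).
apertures = [
    (0x3c0700000000, 0x3c077fff0000),
    (0x3c0780000000, 0x3c07ffff0000),
]
# Ranges the BIOS programmed into the 00:02.0 bridge and the 11:00.0 device.
bridge_window = (0x3c0778000000, 0x3c07f7ffffff)
sriov_bar = (0x3c0778000000, 0x3c077fffffff)  # reg 224


def contains(outer, inner):
    """True if the inclusive range 'inner' lies entirely within 'outer'."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def overlaps(a, b):
    """True if the inclusive ranges 'a' and 'b' share at least one address."""
    return a[0] <= b[1] and b[0] <= a[1]


def holes(ranges):
    """Inclusive gaps between sorted, non-overlapping ranges."""
    ordered = sorted(ranges)
    gaps = []
    for (_, end1), (start2, _) in zip(ordered, ordered[1:]):
        if start2 > end1 + 1:
            gaps.append((end1 + 1, start2 - 1))
    return gaps


for hole in holes(apertures):
    print(f"hole [mem {hole[0]:#x}-{hole[1]:#x}], {hole[1] - hole[0] + 1} bytes")
    print("  bridge window uses it:", overlaps(hole, bridge_window))
    print("  SR-IOV BAR uses it:   ", overlaps(hole, sriov_bar))

print("window fits in one aperture:",
      any(contains(a, bridge_window) for a in apertures))
```

With the old BIOS values, the bridge window and the SR-IOV BAR both reach into the unreported region, so neither fits inside any aperture the BIOS advertised.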
Comment 5 Gary Hade 2013-01-04 18:45:41 UTC
(In reply to comment #0)
...
> My system is a:
> DMI: IBM System x3650 M4 -[7915D2G]-/00W2609, BIOS -[VVE116AUS-1.10]-
> 06/20/2012

Frank, Please check to see if this issue reproduces with
the latest released uEFI "Version 1.30 - BuildID: VVE124A".
One of the enhancements listed in the change log for 1.20 is
"Support 4GB PCI inventory access."  I am not familiar with
this terminology but I hope it is related.
Comment 6 Frank Haverkamp 2013-01-09 09:01:22 UTC
Hi,

I tried out a new BIOS with the following version:
  DMI: IBM System x3650 M4 -[7915D2G]-/00W2609, BIOS -[VVE124AUS-1.30]- 11/21/20

The system still shows some messages about problems with the address assignment:

[haverkam@oc7383187364 sw]$ grep 11:00.0 dmsg_with_new_bios_without_nocrs.txt 
[    0.249869] pci 0000:11:00.0: [1014:044b] type 00 class 0x120000
[    0.249889] pci 0000:11:00.0: reg 10: [mem 0x3c07f0000000-0x3c07f7ffffff 64bit pref]
[    0.250028] pci 0000:11:00.0: reg 224: [mem 0x3c0778000000-0x3c077fffffff 64bit pref]
[    0.328180] pci 0000:11:00.0: BAR 0: reserving [mem 0x3c07f0000000-0x3c07f7ffffff flags 0x14220c] (d=1, p=1)
[    0.328182] pci 0000:11:00.0: no compatible bridge window for [mem 0x3c07f0000000-0x3c07f7ffffff 64bit pref]
[    0.328187] pci 0000:11:00.0: BAR 7: reserving [mem 0x3c0778000000-0x3c07efffffff flags 0x14220c] (d=1, p=1)
[    0.328189] pci 0000:11:00.0: no compatible bridge window for [mem 0x3c0778000000-0x3c07efffffff 64bit pref]
[    0.341736] pci 0000:11:00.0: reg 224: [mem 0x3c0778000000-0x3c077fffffff 64bit pref]
[    0.342743] pci 0000:11:00.0: reg 224: [mem 0x3c0778000000-0x3c077fffffff 64bit pref]
[    0.342745] pci 0000:11:00.0: BAR 0: assigned [mem 0x3c0000000000-0x3c0007ffffff 64bit pref]
[    0.342759] pci 0000:11:00.0: BAR 0: set to [mem 0x3c0000000000-0x3c0007ffffff 64bit pref] (PCI address [0x3c0000000000-0x3c0007ffffff])
[    0.342774] pci 0000:11:00.0: reg 224: [mem 0x3c0778000000-0x3c077fffffff 64bit pref]
[    0.342776] pci 0000:11:00.0: BAR 7: assigned [mem 0x3c0008000000-0x3c007fffffff 64bit pref]
[    0.342789] pci 0000:11:00.0: BAR 7: set to [mem 0x3c0008000000-0x3c007fffffff 64bit pref] (PCI address [0x3c0008000000-0x3c007fffffff])
[    0.528321] pci 0000:11:00.0: Signaling PME through PCIe PME interrupt

But the good news is that the kernel now seems able to configure BAR addresses for my card regardless of those messages. Now I can use my card again ;-)! Many thanks for the analysis and the hint about the new BIOS version. I am attaching the new logs so that you can have a look at what might still not be as it should be.

Regards,

Frank
Comment 7 Frank Haverkamp 2013-01-09 09:01:55 UTC
Created attachment 90831
New BIOS without pci=nocrs
Comment 8 Frank Haverkamp 2013-01-09 09:02:28 UTC
Created attachment 90841
acpidump using a new BIOS without pci=nocrs
Comment 9 Frank Haverkamp 2013-01-09 09:02:48 UTC
Created attachment 90851
dmidecode using a new BIOS without pci=nocrs
Comment 10 Frank Haverkamp 2013-01-09 09:03:47 UTC
Created attachment 90861
lspci -vvvs 11:00.0 of my card using a new BIOS without pci=nocrs
Comment 11 Bjorn Helgaas 2013-01-09 17:19:49 UTC
Thanks, Frank.  Can you attach the complete dmesg log?  I suspect you intended to in comment #7, but it's dmidecode output, not dmesg.  The "no compatible bridge window" messages mean the BIOS left things configured in a way inconsistent with the host bridge _CRS, so I think there's still something wrong.
Comment 12 Frank Haverkamp 2013-01-10 08:19:30 UTC
Created attachment 90961
dmesg with new BIOS, card is working, but still warnings.
Comment 13 Bjorn Helgaas 2013-01-10 22:14:56 UTC
From the 3.8.0-rc1+ dmesg log in comment #12:

  ACPI: PCI Root Bridge [IOH0] (domain 0000 [bus 00-ff])
  pci_bus 0000:00: root bus resource [mem 0x3c0700000000-0x3c077fffffff]
  pci_bus 0000:00: root bus resource [mem 0x3c0780000000-0x3c07ffffffff]
  pci 0000:00:02.0: PCI bridge to [bus 11-15]
  pci 0000:00:02.0:   bridge window [mem 0x3c0778000000-0x3c07f7ffffff 64bit pref]
  pci 0000:00:02.0: no compatible bridge window for [mem 0x3c0778000000-0x3c07f7ffffff 64bit pref]

The two host bridge apertures quoted above are contiguous, so the 00:02.0 bridge window should actually be valid.  But Linux isn't smart enough to coalesce the host bridge apertures; it tries to allocate the 00:02.0 bridge window from each host bridge aperture in turn.  Since no single aperture contains the entire window, it fails, and we discard the 00:02.0 window.
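
To illustrate the point, here is a small Python sketch of the two checks: one against each aperture in turn (roughly what is described above) and one against a coalesced list. It is a simplified model under the assumption that apertures are plain inclusive ranges; the coalesce helper is hypothetical and is not something Linux does today.

```python
# Host bridge apertures reported by _CRS with the new BIOS (from the dmesg above).
apertures = [
    (0x3c0700000000, 0x3c077fffffff),
    (0x3c0780000000, 0x3c07ffffffff),
]
bridge_window = (0x3c0778000000, 0x3c07f7ffffff)  # 00:02.0 prefetchable window


def contains(outer, inner):
    """True if the inclusive range 'inner' lies entirely within 'outer'."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def coalesce(ranges):
    """Merge inclusive ranges that overlap or are directly adjacent."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            # Adjacent to or overlapping the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Checking each aperture separately: no single one holds the whole window.
print("fits a single aperture:   ",
      any(contains(a, bridge_window) for a in apertures))

# Hypothetical alternative: merge contiguous apertures first.
print("fits a coalesced aperture:",
      any(contains(a, bridge_window) for a in coalesce(apertures)))
```

The window straddles the boundary at 0x3c0780000000, which is why the per-aperture check rejects it even though the combined range would contain it.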

Discarding the 00:02.0 [mem 0x3c0778000000-0x3c07f7ffffff] window leads to similar failures on 11:00.0 because its BARs were originally inside that window.

The only possible Linux change I can think of would be to support PCI device allocations that span multiple contiguous host bridge apertures.  That's certainly possible, and maybe even desirable.  But we try not to touch the resources given us by the BIOS, partly because we have to feed them back to the BIOS in the same format when we change device resource assignments with _SRS.  So I'm hesitant to put work into this right now, since this is the only report I've seen and we are able to reassign the bridge window and the device ends up working.

It would be possible for the BIOS to change its allocation strategy so device assignments always fit in a single _CRS aperture, of course, which would also get rid of the warning.

The 16:00.0 issue with [mem 0xfffe0000-0xffffffff pref] is for the ROM BAR.  That doesn't seem like a valid address.  It's probably left over from the BIOS sizing the ROM BAR.  There's no host bridge aperture that contains that range, and it seems unlikely anyway because the area just under 4G is typically ROM and reset vector stuff on x86.  So this might be a BIOS issue.  But the BIOS probably didn't hand off with the ROM enabled, so one could argue that Linux should just ignore the contents of the ROM BAR completely (without printing the warning) because we have no way of knowing whether it's valid.  This is similar to this existing issue: https://bugzilla.kernel.org/show_bug.cgi?id=48451
Comment 14 Gary Hade 2013-01-11 00:20:37 UTC
Bjorn, Something else that I found interesting is that the 11:00.0 BIOS
pre-assigned ranges:
  0000:11:00.0: reg 10:  [mem 0x3c07f0000000-0x3c07f7ffffff 64bit pref]
  0000:11:00.0: reg 224: [mem 0x3c0778000000-0x3c077fffffff 64bit pref]
from which the 00:02.0 [mem 0x3c0778000000-0x3c07f7ffffff] bridge window
was derived are non-contiguous so the bridge window also includes
0x3c0780000000-0x3c07efffffff which AFAICT is not used by any other
devices below the bridge.  This makes me wonder if bridges could have
multiple windows with each checked separately against the individual
(non-coalesced) host bridge apertures.

Frank, Since the new BIOS's _CRS is now making the previously
excluded 64KB holes (re: Comment #4) available allowing successful
allocation of memory from one of the other host bridge apertures,
it appears to me that you can safely use the card.
Comment 15 Bjorn Helgaas 2013-01-11 17:29:53 UTC
Host bridges can have an arbitrary number of windows, of course.  PCI-PCI bridges like 00:02.0 can have two memory windows: the non-prefetchable one that is restricted to the area below 4GB, and the prefetchable one that can be anywhere in the 64-bit address space.  In this case, 00:02.0 has only the prefetchable window enabled.

So a PCI-PCI bridge *can* have multiple windows, and Linux already checks them individually against upstream resources, i.e., for 00:02.0, each window is checked separately against the host bridge apertures.

But I'm not sure I understood your question correctly.  Oh, maybe this is it ... the 11:00.0 reg 224 SR-IOV BAR fits completely inside one host bridge aperture:

  pci_bus 0000:00: root bus  [mem 0x3c0700000000-0x3c077fffffff]
  pci 0000:11:00.0: reg 224: [mem 0x3c0778000000-0x3c077fffffff 64bit pref]

and the 11:00.0 reg 10 BAR fits inside another host bridge aperture:

  pci_bus 0000:00: root bus  [mem 0x3c0780000000-0x3c07ffffffff]
  pci 0000:11:00.0: reg 10:  [mem 0x3c07f0000000-0x3c07f7ffffff 64bit pref]

So if there were a way to enable two separate windows through the 00:02.0 bridge, we wouldn't need anything that spans two host bridge apertures.  That's true, but unfortunately there's no way to have separate prefetchable windows through a standard PCI-PCI bridge.
Comment 16 Gary Hade 2013-01-11 19:24:37 UTC
Yeah, it sounds like you did mostly understand my wild idea
although my thought was that the multiple windows ("sub windows"
may be a better term) would be a s/w abstraction describing the
contiguous regions of their larger enclosing prefetchable and
non-prefetchable windows.  However, I guess there could be yet
another system where one of the contiguous regions spans 2 or
more host bridge apertures.  Probably a bad idea.
Comment 17 Bjorn Helgaas 2013-09-10 22:00:33 UTC
Based on comment #12, I think things are working even without "pci=nocrs", although we do see some "no compatible bridge window" warnings that cause us to reallocate some BARs unnecessarily.

I'm going to close this because I think it's fixable in the BIOS (e.g., by coalescing the host bridge windows it reports), and I don't see a good way to fix it in Linux.

Please reopen if you think we still need to do something here.
