Bug 104931 - Expansion ROM Base Address resource requirements
Summary: Expansion ROM Base Address resource requirements
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI
Hardware: x86-64 Linux
Importance: P1 low
Assignee: drivers_pci@kernel-bugs.osdl.org
Reported: 2015-09-24 01:32 UTC by Myron Stowe
Modified: 2016-10-21 01:30 UTC

See Also:
Kernel Version: 3.10
Subsystem:
Regression: No
Bisected commit-id:


Attachments
expansion_rom_dmesg.txt (Expansion ROM BARs not programmed by BIOS) (59.02 KB, text/plain)
2015-09-24 01:32 UTC, Myron Stowe
expansion_rom_dmesg2.txt (94.72 KB, text/plain)
2016-09-01 19:23 UTC, Myron Stowe

Description Myron Stowe 2015-09-24 01:32:58 UTC
Created attachment 188211
expansion_rom_dmesg.txt (Expansion ROM BARs not programmed by BIOS)

I've encountered numerous bugzilla reports related to platform BIOSes not
programming valid values into a PCI device's Type 0 Configuration space
"Expansion ROM Base Address" field (a.k.a. Expansion ROM BAR).  The main
observed consequence is 'dmesg' entries like the following, which get
customers excited enough to file reports against the kernel.

  pci 0000:01:00.0: can't claim BAR 6 [mem 0xfff00000-0xffffffff pref]:
    no compatible bridge window
  pci 0000:04:03.0: can't claim BAR 6 [mem 0xffff0000-0xffffffff pref]:
    no compatible bridge window

After I've provided an analysis similar to [1] (full 'dmesg' log is attached
as 'expansion_rom_dmesg.txt') explaining the issue, the typical response
from the BIOS teams (at two of the major vendors) is:
   "The OS has no business touching the Expansion ROM BARs and it
    provides no value to the equation here.  The Expansion ROM BAR
    is only useful in pre-boot for the BIOS to get boot code from
    a device."

This scenario has occurred enough times now that I'd like to attempt to
"raise the bar" and invite a discussion of this topic based on technical
merit - via a public forum that is archived and provides a source of
reference for use upon future occurrences - and see if a consensus can be
reached between the various vendors' BIOS engineers and kernel engineers.


A little more background context -

The kernel expects device Expansion ROM BARs to be programmed with valid
values - even if the respective Expansion ROM's Enable bit is 0 (i.e. the
device's expansion ROM address space is disabled).  This seems to be the
main point of contention with said BIOS engineers.  If an Expansion ROM BAR
is not programmed, the kernel will attempt to find available resources and,
if successful, program it.  As this occurs, various 'dmesg' entries related
to the kernel's actions are output.

Note that for devices that share decoders between the Expansion ROM BAR and
other BARs the firmware (probably) should not enable the Expansion ROM BAR
at hand-off to the operating system (see the last paragraph of the PCI
Firmware Specification, Rev 3.2, Section 3.5 "Device State at
Firmware/Operating System Handoff").

There is a kernel boot parameter, pci=norom, that is intended to disable the
kernel's resource assignment actions for Expansion ROMs that do not already
have BIOS assigned address ranges.  Note however, if I remember correctly,
that this only works if the Expansion ROM BAR is set to "0" by the BIOS
before hand-off.




[1]  Annotated 'dmesg' log concerning Expansion ROM BARs not setup by BIOS

The "can't claim" messages of interest are:
  pci 0000:01:00.0: can't claim BAR 6 [mem 0xfff00000-0xffffffff pref]:
  no compatible bridge window
  pci 0000:04:03.0: can't claim BAR 6 [mem 0xffff0000-0xffffffff pref]:
  no compatible bridge window

The PCI devices of interest are a device at PCI Bus 1, Device 0, Function
0 (01:00.0) and another device at PCI Bus 4, Device 3, Function 0 (04:03.0).

The "root bridge" that leads to PCI buses 1 and 4 - the buses of interest -
is "PCI0" and its I/O Port space and Memory Mapped I/O (MMIO) space are:

 ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-fe])
 PCI host bridge to bus 0000:00
 pci_bus 0000:00: root bus resource [bus 00-fe]
 pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7]
 pci_bus 0000:00: root bus resource [io  0x0d00-0xffff]
 pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff]
 pci_bus 0000:00: root bus resource [mem 0xc0000000-0xfeafffff]

It's helpful to gather up all the resource related information pertaining to
the devices of interest in one place.  Concentrating on the PCI-to-PCI
bridges and individual PCI devices that lead to 01:00.0, the first device
exhibiting the "can't claim" message (everything that is consuming resources
on PCI bus 0 and PCI bus 1):

    pci 0000:00:1a.0: [8086:1c2d] type 00 class 0x0c0320
    pci 0000:00:1a.0: reg 0x10: [mem 0xc1305000-0xc13053ff]

    pci 0000:00:1d.0: [8086:1c26] type 00 class 0x0c0320
    pci 0000:00:1d.0: reg 0x10: [mem 0xc1304000-0xc13043ff]

    pci 0000:00:1f.2: [8086:1c00] type 00 class 0x01018f
    pci 0000:00:1f.2: reg 0x10: [io  0x3078-0x307f]
    pci 0000:00:1f.2: reg 0x14: [io  0x308c-0x308f]
    pci 0000:00:1f.2: reg 0x18: [io  0x3070-0x3077]
    pci 0000:00:1f.2: reg 0x1c: [io  0x3088-0x308b]
    pci 0000:00:1f.2: reg 0x20: [io  0x3050-0x305f]
    pci 0000:00:1f.2: reg 0x24: [io  0x3040-0x304f]

    pci 0000:00:1f.3: [8086:1c22] type 00 class 0x0c0500
    pci 0000:00:1f.3: reg 0x10: [mem 0xc1302000-0xc13020ff 64bit]
    pci 0000:00:1f.3: reg 0x20: [io  0x3000-0x301f]

    pci 0000:00:1f.5: [8086:1c08] type 00 class 0x010185
    pci 0000:00:1f.5: reg 0x10: [io  0x3068-0x306f]
    pci 0000:00:1f.5: reg 0x14: [io  0x3084-0x3087]
    pci 0000:00:1f.5: reg 0x18: [io  0x3060-0x3067]
    pci 0000:00:1f.5: reg 0x1c: [io  0x3080-0x3083]
    pci 0000:00:1f.5: reg 0x20: [io  0x3030-0x303f]
    pci 0000:00:1f.5: reg 0x24: [io  0x3020-0x302f]

  pci 0000:00:01.0: PCI bridge to [bus 01]
  pci 0000:00:01.0:   bridge window [io  0x2000-0x2fff]
  pci 0000:00:01.0:   bridge window [mem 0xc1200000-0xc12fffff]

    pci 0000:01:00.0: [1000:0072] type 00 class 0x010700
      [1000:0072] - LSI (Symbios) Logic : SAS2008 PCIe Fusion-MPT SAS-2
    pci 0000:01:00.0: reg 0x10: [io  0x2000-0x20ff]
    pci 0000:01:00.0: reg 0x14: [mem 0xc1240000-0xc124ffff 64bit]
    pci 0000:01:00.0: reg 0x1c: [mem 0xc1200000-0xc123ffff 64bit]
x   pci 0000:01:00.0: reg 0x30: [mem 0xfff00000-0xffffffff pref]


The PCI-to-PCI bridge device for Bus 0 to Bus 1 only has one memory space
aperture ("bridge window") active - [mem 0xc1200000-0xc12fffff].  This must
be a subset of one of the "root bus" memory resources and, looking at those,
it is (it falls within [mem 0xc0000000-0xfeafffff]).

The target device - 01:00.0; an 'mpt2sas' device - consumes three memory
ranges.  These correspond to the device's BAR 1 and 2 (64 bit addresses
consume two BAR registers), BAR 3 and 4, and the "Expansion ROM Base Address"
(a.k.a. BAR 6).  Each of these must also be a subset of both the
corresponding root bus resources and the windows of all PCI-to-PCI bridge
devices in the PCI hierarchy leading to the device itself.  Looking at them
we see that the first two satisfy the requirement but the third - [mem
0xfff00000-0xffffffff pref] - does not!  It's because the "Expansion ROM
Base Address" register (a.k.a. BAR 6) does not adhere to the subset
requirement(s) that the kernel later outputs:

 pci 0000:01:00.0: can't claim BAR 6 [mem 0xfff00000-0xffffffff pref]:
 no compatible bridge window
 pci 0000:01:00.0: BAR 6: no space for [mem size 0x00100000 pref]
 pci 0000:01:00.0: BAR 6: failed to assign [mem size 0x00100000 pref]

The "can't claim" message is the kernel alerting us that the BIOS has not
correctly set up resources that fulfil all the requirements (subset,
alignment, type, ...).
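
The subset requirement can be illustrated with a few lines of Python.  This
is only a sketch of the containment test, not the kernel's code; the helper
names (contains, claimable) are made up for illustration and the address
ranges are copied from the log above.

```python
def contains(window, res):
    """True if the inclusive (start, end) range res lies entirely inside window."""
    return window[0] <= res[0] and res[1] <= window[1]

# Memory ranges copied from the annotated log above.
root_windows = [(0x000a0000, 0x000bffff), (0xc0000000, 0xfeafffff)]
bridge_window = (0xc1200000, 0xc12fffff)      # 00:01.0, bridge to bus 01
bars = {
    "BAR 1": (0xc1240000, 0xc124ffff),
    "BAR 3": (0xc1200000, 0xc123ffff),
    "BAR 6": (0xfff00000, 0xffffffff),        # Expansion ROM BAR
}

def claimable(res):
    """A BAR must fit in a root bus window and in the upstream bridge window."""
    in_root = any(contains(w, res) for w in root_windows)
    return in_root and contains(bridge_window, res)

for name, res in bars.items():
    print(name, "ok" if claimable(res) else "can't claim")
```

Only BAR 6 fails the test, which corresponds to the single "can't claim"
message for this device.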

The kernel then "sizes" the BAR to see how much address space that BAR
requires - in this case we see BAR 6 of 01:00.0 needs 1 MB of contiguous
space - and subsequently tries to work around the BIOS' failure, attempting
to find available, currently unused, resource space that meets all the
requirements, which is where the "no space for" message comes from.

There is no contiguous space that meets all the requirements (subset,
alignment, type, ...) available which is fairly easy to see here; there was
only a 1 MB memory aperture provided by the PCI-to-PCI bridge device to
begin with and the 01:00.0 device consumed subsets of that for BARs 1 and 3
so there is no way 1 MB remains free to satisfy BAR 6's needs.  And so the
kernel outputs the "failed to assign" message.
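
To make the "no space" reasoning concrete, here is a toy first-fit search
over the bridge window.  It is a simplification under stated assumptions
(alignment equal to the BAR size, lowest address first) and is not the
kernel's actual allocator; align_up and first_fit are hypothetical helpers.

```python
def align_up(addr, align):
    """Round addr up to the next multiple of align (align is a power of two)."""
    return (addr + align - 1) & ~(align - 1)

def first_fit(window, used, size):
    """Return the lowest size-aligned start inside window that avoids every
    range in used, or None if nothing fits.  Ranges are inclusive."""
    start, end = window
    addr = align_up(start, size)
    for u_start, u_end in sorted(used):
        if addr + size - 1 < u_start:
            break                      # candidate fits below this used range
        if addr <= u_end:              # candidate overlaps, move past it
            addr = align_up(u_end + 1, size)
    return addr if addr + size - 1 <= end else None

# 01:00.0: 1 MB bridge window, already holding BAR 1 and BAR 3.
window = (0xc1200000, 0xc12fffff)
used = [(0xc1240000, 0xc124ffff), (0xc1200000, 0xc123ffff)]
print(first_fit(window, used, 0x00100000))   # BAR 6 needs 1 MB
```

The search returns None for the 1 MB request, matching the "no space for"
and "failed to assign" messages.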


In a very similar scenario, the 04:03.0 device also has not been properly
set up by BIOS ([mem 0xffff0000-0xffffffff pref]).  The difference in this
case is that there were enough available resources left to satisfy all the
subset, alignment, type, ..., requirements and thus the kernel was able to
allocate from them and re-program the device's BAR 6 ([mem
0xc1010000-0xc101ffff pref]) so that the device can function correctly.

  pci 0000:00:1e.0: PCI bridge to [bus 04] (subtractive decode)
  pci 0000:00:1e.0:   bridge window [mem 0xc0800000-0xc10fffff]
  pci 0000:00:1e.0:   bridge window [mem 0xc0000000-0xc07fffff 64bit pref]
  pci 0000:00:1e.0:   bridge window [io  0x0000-0x0cf7] (subtractive decode)
  pci 0000:00:1e.0:   bridge window [io  0x0d00-0xffff] (subtractive decode)
  pci 0000:00:1e.0:   bridge window [mem 0x000a0000-0x000bffff] (subtract d)
  pci 0000:00:1e.0:   bridge window [mem 0xc0000000-0xfeafffff] (subtract d)

    pci 0000:04:03.0: [102b:0532] type 00 class 0x030000
      [102b:0532] - Matrox : MGA G200eW WPCM450 (Graphics)
    pci 0000:04:03.0: reg 0x10: [mem 0xc0000000-0xc07fffff pref]
    pci 0000:04:03.0: reg 0x14: [mem 0xc1000000-0xc1003fff]
    pci 0000:04:03.0: reg 0x18: [mem 0xc0800000-0xc0ffffff]
x   pci 0000:04:03.0: reg 0x30: [mem 0xffff0000-0xffffffff pref]

 pci 0000:04:03.0: can't claim BAR 6 [mem 0xffff0000-0xffffffff pref]:
 no compatible bridge window
 pci 0000:04:03.0: BAR 6: assigned [mem 0xc1010000-0xc101ffff pref]
Comment 1 Myron Stowe 2016-09-01 19:23:04 UTC
Created attachment 231711
expansion_rom_dmesg2.txt
Comment 2 Myron Stowe 2016-09-01 19:27:33 UTC
Another recent example of Expansion ROM related resource issues...

I'm researching device 0000:04:00.3 as it's the device with the issue (along
with all other devices/functions under PCI bus 04 due to possible competing
resource needs).


Analysis from a v4.7.0 kernel run's 'dmesg' log with comments interspersed
(the log is attached to this BZ as "expansion_rom_dmesg2.txt") ...

This platform has two PCI Root Bridges.  The analysis is limited to the
first Root Bridge, which handles PCI buses 0x00 through 0x7e, as it contains
the PCI bus in question - bus 04.

  ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-7e])
  PCI host bridge to bus 0000:00
  pci_bus 0000:00: root bus resource [io  0x0000-0x03bb window]
  pci_bus 0000:00: root bus resource [io  0x03bc-0x03df window]
  pci_bus 0000:00: root bus resource [io  0x03e0-0x0cf7 window]
  pci_bus 0000:00: root bus resource [io  0x1000-0x7fff window]
  pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
  pci_bus 0000:00: root bus resource [mem 0x90000000-0xc7ffbfff window]
  pci_bus 0000:00: root bus resource [mem 0x30000000000-0x33fffffffff window]

CPU addresses falling into the above resource ranges will get intercepted
by the host controller and converted into PCI bus transactions.  Looking
further into the log we find the set of resource ranges (PCI-to-PCI bridge
apertures) corresponding to PCI bus 04.

  pci 0000:00:02.0: PCI bridge to [bus 04]
  pci 0000:00:02.0:   bridge window [io  0x2000-0x2fff]
  pci 0000:00:02.0:   bridge window [mem 0x92000000-0x940fffff]          33M

The following shows what the platform's BIOS programmed into the BARs of
the device(s) under PCI bus 04.

  pci 0000:04:00.0: [1924:0923] type 00 class 0x020000
  pci 0000:04:00.0: reg 0x10: [io  0x2300-0x23ff]
  pci 0000:04:00.0: reg 0x18: [mem 0x93800000-0x93ffffff 64bit]         BAR2
  pci 0000:04:00.0: reg 0x20: [mem 0x9400c000-0x9400ffff 64bit]         BAR4
  pci 0000:04:00.0: reg 0x30: [mem 0xfffc0000-0xffffffff pref]          E ROM
  pci 0000:04:00.1: [1924:0923] type 00 class 0x020000
  pci 0000:04:00.1: reg 0x10: [io  0x2200-0x22ff]
  pci 0000:04:00.1: reg 0x18: [mem 0x93000000-0x937fffff 64bit]
  pci 0000:04:00.1: reg 0x20: [mem 0x94008000-0x9400bfff 64bit]
  pci 0000:04:00.1: reg 0x30: [mem 0xfffc0000-0xffffffff pref]
  pci 0000:04:00.2: [1924:0923] type 00 class 0x020000
  pci 0000:04:00.2: reg 0x10: [io  0x2100-0x21ff]
  pci 0000:04:00.2: reg 0x18: [mem 0x92800000-0x92ffffff 64bit]
  pci 0000:04:00.2: reg 0x20: [mem 0x94004000-0x94007fff 64bit]
  pci 0000:04:00.2: reg 0x30: [mem 0xfffc0000-0xffffffff pref]
  pci 0000:04:00.3: [1924:0923] type 00 class 0x020000
  pci 0000:04:00.3: reg 0x10: [io  0x2000-0x20ff]
  pci 0000:04:00.3: reg 0x18: [mem 0x92000000-0x927fffff 64bit]           8M
  pci 0000:04:00.3: reg 0x20: [mem 0x94000000-0x94003fff 64bit]          16K
  pci 0000:04:00.3: reg 0x30: [mem 0xfffc0000-0xffffffff pref]          256K

It's already obvious that the 33M of MMIO space that the PCI-to-PCI bridge
leading to PCI bus 04 provides (0x92000000-0x940fffff) is not enough space
to fully satisfy the MMIO specific addressing needs of all the device BARs
below it - the 4 combined ports - totaling ((8M + 16K + 256K) * 4) = 33M +
64K.  This is _without_ taking into account any alignment constraints that
likely would increase the bus's needed aperture range even further.
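
The arithmetic can be checked quickly.  The sketch below only restates the
sizes annotated in the log above (8M, 16K and 256K per function) and the
bridge window bounds; the variable names are illustrative.

```python
K, M = 1 << 10, 1 << 20

# Bridge window for bus 04: [mem 0x92000000-0x940fffff]
window = 0x940fffff - 0x92000000 + 1

# MMIO needs of one function: BAR 2 (8M) + BAR 4 (16K) + Expansion ROM (256K)
per_function = 8 * M + 16 * K + 256 * K
needed = 4 * per_function            # four functions (ports)

shortfall = needed - window
print(f"window = {window // M}M, shortfall = {shortfall // K}K")
```

The window is exactly 33M, so the four functions together need 64K more
than the bridge provides, before any alignment padding.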

Note that the values programmed into the device's Expansion ROM BARs do not
fit within any of its immediately upstream bridge's MMIO related apertures.

  pci 0000:04:00.0: can't claim BAR 6 [mem 0xfffc0000-0xffffffff pref]: no
  compatible bridge window
  pci 0000:04:00.1: can't claim BAR 6 [mem 0xfffc0000-0xffffffff pref]: no
  compatible bridge window
  pci 0000:04:00.2: can't claim BAR 6 [mem 0xfffc0000-0xffffffff pref]: no
  compatible bridge window
  pci 0000:04:00.3: can't claim BAR 6 [mem 0xfffc0000-0xffffffff pref]: no
  compatible bridge window

The kernel notices this and attempts to allocate appropriate space for them
from any remaining, available, MMIO space that meets all the alignment
constraints and such.

  pci 0000:04:00.0: BAR 6: assigned [mem 0x94040000-0x9407ffff pref]
  pci 0000:04:00.1: BAR 6: assigned [mem 0x94080000-0x940bffff pref]
  pci 0000:04:00.2: BAR 6: assigned [mem 0x940c0000-0x940fffff pref]
  pci 0000:04:00.3: BAR 6: no space for [mem size 0x00040000 pref]
  pci 0000:04:00.3: BAR 6: failed to assign [mem size 0x00040000 pref]

The kernel was able to satisfy the first three ports' MMIO needs but was
_not_ able to for the last port - there is no remaining available
addressing space within the range to satisfy its needs!

At this point the 0000:04:00.3 device just happens to work by luck due to
the fact that the unmet resource needs correspond to its Expansion ROM
BAR [1].


Next a "user" initiates a PCIe hot-unplug of the sfn7x22f device and the bus
is re-scanned.  As a result, BAR 4 of all 4 of the device's functions fails
to get its appropriate resources allocated.

  pci 0000:00:02.0: PCI bridge to [bus 04]
  pci 0000:00:02.0:   bridge window [io  0x2000-0x2fff]
  pci 0000:00:02.0:   bridge window [mem 0x92000000-0x940fffff]          33M

  pci 0000:04:00.0: BAR 2: assigned [mem 0x92000000-0x927fffff 64bit]
  pci 0000:04:00.1: BAR 2: assigned [mem 0x92800000-0x92ffffff 64bit]
  pci 0000:04:00.2: BAR 2: assigned [mem 0x93000000-0x937fffff 64bit]
  pci 0000:04:00.3: BAR 2: assigned [mem 0x93800000-0x93ffffff 64bit]
  pci 0000:04:00.0: BAR 6: assigned [mem 0x94000000-0x9403ffff pref]
  pci 0000:04:00.1: BAR 6: assigned [mem 0x94040000-0x9407ffff pref]
  pci 0000:04:00.2: BAR 6: assigned [mem 0x94080000-0x940bffff pref]
  pci 0000:04:00.3: BAR 6: assigned [mem 0x940c0000-0x940fffff pref]

At this point -all- available MMIO resource space has been consumed.

  For the more visually inclined (if it's not already obvious).  There's
  probably an easier way to visualize the exhaustion but here is my lame
  attempt:

  PCI Bridge 04's MMIO aperture resource range totals 33M
  ( 0x92000000-0x940fffff ).  The first line below denotes the 33M in
  1M increments (chunks).  The second line denotes the addressing range;
  specifically hex digits 7 and 6 within the resource's range ( 0x9--xxxxx ).
  The last line denotes the port (0 through 3) consuming that portion
  of the resource's range.

   1 2 3 4 5 6 7 8 9101112131415161718192021222324252627282930313233    33M
  202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f40    [76]
   0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3--

  The last 1M is consumed at a smaller granularity, so the above
  conceptualization is expanded to a finer level.

  1M of resource range ( 94000000-940fffff ) visualized in 32K increments
  ( hex digits 5 and 4; 0x940--xxx ).
   1 2 3 4 5 6 7 8 91011121314151617181920212223242526272829303132       1M
  0008101820283038404850586068707880889098a0a8b0b8c0c8d0d8e0e8f0f8      [54]
   0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3

and the remaining needed resource allocation attempts are going to
fail.

  pci 0000:04:00.0: BAR 4: no space for [mem size 0x00004000 64bit]
  pci 0000:04:00.0: BAR 4: failed to assign [mem size 0x00004000 64bit]
  pci 0000:04:00.1: BAR 4: no space for [mem size 0x00004000 64bit]
  pci 0000:04:00.1: BAR 4: failed to assign [mem size 0x00004000 64bit]
  pci 0000:04:00.2: BAR 4: no space for [mem size 0x00004000 64bit]
  pci 0000:04:00.2: BAR 4: failed to assign [mem size 0x00004000 64bit]
  pci 0000:04:00.3: BAR 4: no space for [mem size 0x00004000 64bit]
  pci 0000:04:00.3: BAR 4: failed to assign [mem size 0x00004000 64bit]
  pci 0000:04:00.0: BAR 0: assigned [io  0x2000-0x20ff]
  pci 0000:04:00.1: BAR 0: assigned [io  0x2400-0x24ff]
  pci 0000:04:00.2: BAR 0: assigned [io  0x2800-0x28ff]
  pci 0000:04:00.3: BAR 0: assigned [io  0x2c00-0x2cff]

At this point none of the four functions (ports) - 0000:04:00.{0..3} - were
able to get their necessary resource needs met and thus the device's
functions (NIC ports) do not work.  In fact, I would expect the driver's call
into the kernel's PCI core 'pci_enable_device()' routine to fail [1].
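
The difference between the boot-time and post-rescan outcomes can be
reproduced with a toy model.  This is a sketch under stated assumptions: a
plain first-fit allocator with alignment equal to the BAR size, fed the
requests in the order seen in the logs.  It is not the kernel's allocation
code, and allocate/requests are hypothetical helpers.

```python
def align_up(addr, align):
    """Round addr up to the next multiple of align (align is a power of two)."""
    return (addr + align - 1) & ~(align - 1)

def allocate(window, reqs):
    """First-fit allocation in request order; returns {name: start or None}."""
    start, end = window
    used, result = [], {}
    for name, size in reqs:
        addr = align_up(start, size)
        for u_start, u_end in sorted(used):
            if addr + size - 1 < u_start:
                break                          # fits below this used range
            if addr <= u_end:                  # overlaps, skip past it
                addr = align_up(u_end + 1, size)
        if addr + size - 1 <= end:
            used.append((addr, addr + size - 1))
            result[name] = addr
        else:
            result[name] = None                # "no space for ..."
    return result

WINDOW = (0x92000000, 0x940fffff)              # bridge window for bus 04
SIZES = {2: 0x800000, 4: 0x4000, 6: 0x40000}   # 8M, 16K, 256K

def requests(bar_order):
    """All four functions request each BAR, one BAR number at a time."""
    return [(f"04:00.{fn} BAR {bar}", SIZES[bar])
            for bar in bar_order for fn in range(4)]

boot_like = allocate(WINDOW, requests([2, 4, 6]))
rescan_like = allocate(WINDOW, requests([2, 6, 4]))

for name, addr in rescan_like.items():
    print(name, "failed" if addr is None else hex(addr))
```

With BAR 6 placed before BAR 4, the four 256K ROM ranges fill the last 1M
and every BAR 4 request fails.  With BAR 4 first, only the final BAR 6
request fails, which is the harmless case seen at boot.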


Conclusion ...

The root cause of the issue(s) [2] is the platform's BIOS not providing
enough of, and not properly setting up, the resources that the device
requires - specifically MMIO addressing space related resources.  Most
conspicuous are the device's Expansion ROM BAR(s) as they are improperly
programmed - the initial BIOS programmed values do not fall within any
valid resource ranges of the immediately upstream PCI-to-PCI Bridge's MMIO
apertures.


As for "symptomatic" solutions (just a band-aid to treat the symptom and
not addressing the root cause) ...

Short of getting the platform's BIOS updated to appropriately account for
the device's total needs, a "compromise" solution has been to get them to
program the device's Expansion ROM BAR values with '0'.  This has been done
in the past, so the platform's BIOS engineers choosing not to do that again
in this instance is "out of character" and concerning.  If, and only if, a
device's Expansion ROM BAR is programmed with '0', then adding the
"pci=norom" kernel boot parameter will cause the kernel to ignore, and not
attempt to assign resources to, such BARs.

Short of that, drivers can use, and check the return value of,
pci_enable_rom().  That should fail if it's unassigned.  Looking at it, it
only fails if 'flags == 0' so I'm not sure that catches all cases of it
being unassigned.


[1] For a device's normal BARs - the BARs corresponding to the PCI
    specification's "Base Address 0 through 5" Type 0 configuration header
    space entries - that are initially ill programmed and the kernel can
    not subsequently assign appropriate resources for such, then the
    kernel's PCI core subsystem's 'pci_enable_device()' routine should
    fail.

[2] While the analysis only covers one specific device, the 'dmesg' log
    shows that the same base root cause occurs in at least two additional
    instances.
Comment 3 Lee, Chun-Yi 2016-10-20 07:57:01 UTC
A discussion on the mailing list for anyone interested in this topic:

[RFC] PCI: Unassigned Expansion ROM BARs
http://www.gossamer-threads.com/lists/linux/kernel/2265600
Comment 4 Myron Stowe 2016-10-20 17:12:33 UTC
> Yes, we discussed the Exp ROM BAR value not being 0 with our BIOS team.
> Their initial thought is that there is no code added explicitly to set the
> ROM BAR to a non zero value. We are trying to find out from which generation
> systems, this value started getting set to non zero value. Still looking
> into this. 
> 
> Also, want to mention here that xxxx system does not support PCI hotplug.
> Only hotplug scenario supported on this system is - hot plugging PCIE SSD
> disks. 
> 
> While looking into the details, we have the following thoughts and queries.
> It would be helpful to know your thoughts on them.
> 
> 1.  On Expansion ROM BAR not being set to 0:
> 
> While looking into this detail, we referred to the 'section 3.5 of the PCI
> Firmware Specification Rev 3.2'. Section 3.5 seems to indicate that the
> operating system needs to know if the Exp ROM has been configured before
> handing over to OS and the way to check this is to check the Exp ROM BAR
> enable bit. If this bit is unset, the ROM BAR is not configured and OS needs
> to consider the BAR content as invalid by default.
> 
> We went through the below references you provided -
> 
> https://lkml.org/lkml/2015/9/23/737
> https://bugzilla.kernel.org/show_bug.cgi?id=104931
> 
> They seem to explain the need for the OS to access Exp ROM BAR and various
> use cases, but it is not clear why the OS ignores the value of Exp ROM BAR
> enable bit. Any insight/history/context you can share on this will be
> helpful.
> 
> The spec also seems to suggest that regular BARs and Exp ROM BAR should not
> be enabled together. This may not be an issue as modern devices do not seem
> to share address decoders for regular BARs and Exp ROM BAR. But wanted to
> mention it for completeness. 


Thanks for getting the system firmware team's feedback; please tell them to
feel free to continue the discussion if they have other questions or views.


Since a device's hardware implementation _may_ share (BAR) address decoding
between a regular BAR and an Exp ROM BAR it basically becomes a requirement
for the platform firmware to hand off the device to the OS with the Exp ROM
BAR's 'enable' bit disabled.

The specification [1] - section 6.2.5.2 - states:
  "Bit 0 in the register is used to control whether or not the device
   accepts accesses to its expansion ROM."

The key takeaway is to realize that the Exp ROM BAR's 'enable' bit is
enabling/disabling "address decoding" and _not_ enabling/disabling the Exp
ROM BAR itself in any way (i.e. it's _not_ an indicator as to whether the
Exp ROM exists [is enabled] or not [is disabled]).
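
As a small illustration of that register layout (bit 0 is the decode
enable, bits 31:11 hold the base address, per the PCI Local Bus
Specification), here is a sketch that splits a raw register value into its
two parts.  The function name and the example values are made up for
illustration.

```python
ROM_ENABLE = 0x00000001       # bit 0: expansion ROM address decode enable
ROM_ADDR_MASK = 0xFFFFF800    # bits 31:11: expansion ROM base address

def decode_rom_bar(value):
    """Split a raw 32-bit Expansion ROM BAR value into (base, decode_enabled)."""
    return value & ROM_ADDR_MASK, bool(value & ROM_ENABLE)

# Hypothetical register values: same base address, decoding off and on.
for raw in (0xc1010000, 0xc1010001):
    base, enabled = decode_rom_bar(raw)
    print(f"{raw:#010x}: base={base:#010x} decode_enabled={enabled}")
```

In both cases the base address is the same; the enable bit only says
whether the device currently responds to accesses at that address.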

That leads to the logical question: "Then how can the device signal whether
or not it supports an Exp ROM?"

There seem to be a couple of possibilities.  The device can return all 0's
when attempting to "size" the Exp ROM BAR, indicating there is no Exp ROM
content and thus no MMIO resource requirements need be allocated (this seems
to be the consensus approach).  Another possibility could be for the OS to
attempt to read the Exp ROM content, looking for the proper header where the
first two bytes are required to be 0x55 0xAA (this does not seem to be the
correct approach as we've seen devices utilize this area for other, device
specific, means).

The former approach is what occurs with most platforms and can be seen by
manually attempting to "size" the Exp ROM BAR of two devices, one that does
not have Exp ROM content and one that does:
  // A device without an Exp ROM
  % setpci -s 0f:00.0 30.l              // read the BAR's programmed value
  00000000
  % setpci -s 0f:00.0 30.l=FFFFFFFE     // "size" the BAR
  % setpci -s 0f:00.0 30.l
  00000000

  // A device with an Exp ROM (but currently 'disabled')
  % setpci -s 03:00.1 30.l              // read the BAR's programmed value
  00000000
  % setpci -s 03:00.1 30.l=FFFFFFFE     // "size" the BAR
  % setpci -s 03:00.1 30.l
  FFF80000                              // .5 MB
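
The size implied by the value read back after the all-ones write can be
computed as follows.  This is a minimal sketch, assuming the standard
Expansion ROM BAR layout (address bits 31:11); rom_bar_size is a
hypothetical helper, not a kernel or pciutils function.

```python
ROM_ADDR_MASK = 0xFFFFF800    # bits 31:11 of the Expansion ROM BAR

def rom_bar_size(readback):
    """Size in bytes implied by the value read back after writing 0xFFFFFFFE.
    A result of 0 means the device requests no Expansion ROM space."""
    mask = readback & ROM_ADDR_MASK
    if mask == 0:
        return 0
    # Two's complement of the address mask gives the decoded size.
    return (~mask + 1) & 0xFFFFFFFF

for value in (0x00000000, 0xFFF80000):
    print(f"{value:#010x} -> {rom_bar_size(value):#x} bytes")
```

For the two devices above this gives 0 (no Exp ROM) and 0x80000 bytes
(.5 MB) respectively.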


With respect to the observation concerning the PCI Firmware Specification
[2]: Section 3.5 is concerned with, and focusing upon, device configuration
- what's required and optional for system firmware to configure prior to OS
handoff.  Device configuration concerns whether or not the firmware has
allocated system resources corresponding to the BAR and if so, programmed
the BAR with the platform's address pertaining to such resources.  Most
current platforms configure the entire PCI hierarchy prior to handoff.

The possibility of decoder sharing is also talked about:
  "Since not all devices may be configured prior to the operating system
   handoff, the operating system needs to know whether a specific BAR
   register has been configured by firmware. The operating system makes the
   determination by checking the I/O Enable, and Memory Enable bits in the
   device's command register, and Expansion ROM BAR enable bits. If the
   enable bit is set, then the corresponding resource register has been
   configured."

Later the paragraph is repeated with an additional sentence at the end:
  "...  If the enable bit is not set, the operating system cannot assume
   that the associated BAR register contains valid information."

And then covered again in the last paragraph of the section.

These paragraphs are discussing how the OS can determine whether or not the
system firmware has configured the BAR.  They are specifically concerned
with how the OS can determine whether the BAR itself is valid (i.e. whether
the BAR has been configured appropriately by system firmware).

As we have already seen, the Exp ROM BAR enable bit is for enabling or
disabling the BAR's decoding logic and has nothing to do with whether Exp
ROM content exists.  The only mention concerning Exp ROM content itself is
the very last sentence of the section and it just mentions validity, not
enabling/disabling - "... and that the Expansion ROM BAR content is
correct."


[1] PCI Local Bus Specification; Revision 3.0
[2] PCI Firmware Specification; Revision 3.2


> 
> 2. 
> 
> While looking into the details shared in the issue description, we observed
> that 
> 
> a) During system boot up, the sequence of allocation of BARs is 
> 
> BAR 2
> BAR 4 
> BAR 6  
> 
> But after the PCI remove + rescan, the sequence seems to be 
> 
> BAR 2
> BAR 6
> BAR 4
> 
> The above sequence results in resource allocation failure for BAR 4. If the
> sequence of 2,4 and 6 was followed, the chance of BAR 4 failure is reduced.
> Initially I thought that the BARs are being allocated from high to low,
> meaning, in the sequence of 6, 4, 2, but that does not seem to be the case.
> It is not high to low. Any thoughts here ? From the issue description, it
> seems like this is not the case with Cent OS 6.5 where sequence of resource
> allocation is same in both scenarios (system boot as well as PCI remove +
> rescan). I understand that there is a lot of change that is expected because
> RHEL 7 has a much newer 3.10 kernel. But want to clarify this aspect as well.

Yes, I noticed both also.  The kernel's internals in this specific area are
continually being improved.  Device configuration resources - I/O Port space
and MMIO - are highly contended.  As such, there has been, and will likely
continue to be, effort to maximize the effectiveness of allocating from such
a constrained pool of resources.

>  
> > Another viable solution would be to increase the MMIO ranges available for
> > devices.  In this issue's 'dmesg' log one can see that there are three
> > instances of resources not being able to be allocated for devices due to
> > one or the other (Expansion ROM BAR not being set to '0', or MMIO range of
> > immediately preceding upstream bridge being too small to accommodate the
> > device's needs as is outlined in the analysis of Comment #13).
> 
> This might fix the problem, but not sure if increasing the MMIO range is a
> valid expectation to have from BIOS/firmware because, by increasing the MMIO
> range, we are trying to allocate/accommodate resource for Exp ROM BARs as
> well, which does not seem to be required as per the section 3.5 of the spec
> when Exp ROM BAR enable bit is unset. The OS seems to be enabling every
> device Exp ROM irrespective of whether the BIOS/firmware configured it or
> not.

I think this was covered above, noting that an Exp ROM BAR's 'enable' bit is
enabling/disabling "Exp ROM BAR address decoding".  BIOS/firmware can
eliminate the additional MMIO constraint with respect to Exp ROM(s) by having
the Exp ROM BAR's "sizing" mechanism return 0.
Comment 5 Lee, Chun-Yi 2016-10-21 01:30:24 UTC
Thanks to Myron for creating this bug to collect information.

(In reply to Myron Stowe from comment #0)
> Created attachment 188211 [details]
> expansion_rom_dmesg.txt (Expansion ROM BARs not programmed by BIOS)
> 
[...snip]
> 
> The kernel then "sizes" the BAR to see how much address space that BAR
> requires - in this case we see BAR 6 of 01:00.0 needs 1 MB of contiguous
> space - and subsequently tries to work around the BIOS' failure, attempting
> to find available, currently unused, resource space that meets all the
> requirements which is where the "no space for" message comes from.
> 
> There is no contiguous space that meets all the requirements (subset,
> alignment, type, ...) available which is fairly easy to see here; there was
> only a 1 MB memory aperture provided by the PCI-to-PCI bridge device to
> begin with and the 01:00.0 device consumed subsets of that for BARs 1 and 3
> so there is no way 1 MB remains free to satisfy BAR 6's needs.  And so the
> kernel outputs the "failed to assign" message.
> 

I just want to gather more input to clarify my thinking...

In the above case, where the bridge window is not large enough, if the device
driver really needs the Expansion ROM (BAR 6), then the kernel should
re-program and re-allocate the windows on the upstream PCI-to-PCI bridges to
make sure there is enough free bridge window space to assign "BAR 6" on the
affected device.

Is this the right direction for working around this BIOS problem?

Thanks
