Created attachment 188211 [details]
expansion_rom_dmesg.txt (Expansion ROM BARs not programmed by BIOS)

I've encountered numerous bugzilla reports related to platform BIOSes not programming valid values into a PCI device's Type 0 Configuration space "Expansion ROM Base Address" field (a.k.a. Expansion ROM BAR). The main observed consequence is 'dmesg' entries like the following, which get customers excited enough to file reports against the kernel.

  pci 0000:01:00.0: can't claim BAR 6 [mem 0xfff00000-0xffffffff pref]: no compatible bridge window
  pci 0000:04:03.0: can't claim BAR 6 [mem 0xffff0000-0xffffffff pref]: no compatible bridge window

After I've provided an analysis similar to [1] (full 'dmesg' log is attached as 'expansion_rom_dmesg.txt') explaining the issue, the respective BIOS response (teams from two of the major vendors) is typically:

  "The OS has no business touching the Expansion ROM BARs and it provides no value to the equation here. The Expansion ROM BAR is only useful in pre-boot for the BIOS to get boot code from a device."

This scenario has occurred enough times now that I'd like to attempt to "raise the bar" and invite a discussion of this topic on its technical merits - via a public forum that is archived and provides a source of reference for use upon future occurrences - and see if a consensus can be reached between the various vendors' BIOS engineers and kernel engineers.

A little more background context - The kernel expects device Expansion ROM BARs to be programmed with valid values - even if the respective Expansion ROM's Enable bit is 0 (i.e. the device's expansion ROM address space is disabled). This seems to be the main contention point with said BIOS engineers. If an Expansion ROM BAR is not programmed, the kernel will attempt to find available resources and, if successful, program it. As this occurs, various 'dmesg' entries related to the kernel's actions are output.

Note that for devices that share decoders between the Expansion ROM BAR and other BARs, the firmware (probably) should not enable the Expansion ROM BAR at hand-off to the operating system (see the last paragraph of the PCI Firmware Specification, Rev 3.2, Section 3.5 "Device State at Firmware/Operating System Handoff").

There is a kernel boot parameter, pci=norom, that is intended to disable the kernel's resource assignment actions for Expansion ROMs that do not already have BIOS assigned address ranges. Note however, if I remember correctly, that this only works if the Expansion ROM BAR is set to "0" by the BIOS before hand-off.

[1] Annotated 'dmesg' log concerning Expansion ROM BARs not set up by BIOS

The "can't claim" messages of interest are:

  pci 0000:01:00.0: can't claim BAR 6 [mem 0xfff00000-0xffffffff pref]: no compatible bridge window
  pci 0000:04:03.0: can't claim BAR 6 [mem 0xffff0000-0xffffffff pref]: no compatible bridge window

The PCI devices of interest are a device at PCI Bus 1, Device 0, Function 0 (01:00.0) and another device at PCI Bus 4, Device 3, Function 0 (04:03.0).

The "root bridge" that leads to PCI buses 1 and 4 - the buses of interest - is "PCI0" and its I/O Port space and Memory Mapped I/O (MMIO) space are:

  ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-fe])
  PCI host bridge to bus 0000:00
  pci_bus 0000:00: root bus resource [bus 00-fe]
  pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7]
  pci_bus 0000:00: root bus resource [io 0x0d00-0xffff]
  pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff]
  pci_bus 0000:00: root bus resource [mem 0xc0000000-0xfeafffff]

It's helpful to gather up all the resource related information pertaining to the devices of interest in one place.
Concentrating on the PCI-to-PCI bridges and individual PCI devices that lead to 01:00.0, the first device exhibiting the "can't claim" message (everything that is consuming resources on PCI bus 0 and PCI bus 1):

  pci 0000:00:1a.0: [8086:1c2d] type 00 class 0x0c0320
  pci 0000:00:1a.0: reg 0x10: [mem 0xc1305000-0xc13053ff]
  pci 0000:00:1d.0: [8086:1c26] type 00 class 0x0c0320
  pci 0000:00:1d.0: reg 0x10: [mem 0xc1304000-0xc13043ff]
  pci 0000:00:1f.2: [8086:1c00] type 00 class 0x01018f
  pci 0000:00:1f.2: reg 0x10: [io 0x3078-0x307f]
  pci 0000:00:1f.2: reg 0x14: [io 0x308c-0x308f]
  pci 0000:00:1f.2: reg 0x18: [io 0x3070-0x3077]
  pci 0000:00:1f.2: reg 0x1c: [io 0x3088-0x308b]
  pci 0000:00:1f.2: reg 0x20: [io 0x3050-0x305f]
  pci 0000:00:1f.2: reg 0x24: [io 0x3040-0x304f]
  pci 0000:00:1f.3: [8086:1c22] type 00 class 0x0c0500
  pci 0000:00:1f.3: reg 0x10: [mem 0xc1302000-0xc13020ff 64bit]
  pci 0000:00:1f.3: reg 0x20: [io 0x3000-0x301f]
  pci 0000:00:1f.5: [8086:1c08] type 00 class 0x010185
  pci 0000:00:1f.5: reg 0x10: [io 0x3068-0x306f]
  pci 0000:00:1f.5: reg 0x14: [io 0x3084-0x3087]
  pci 0000:00:1f.5: reg 0x18: [io 0x3060-0x3067]
  pci 0000:00:1f.5: reg 0x1c: [io 0x3080-0x3083]
  pci 0000:00:1f.5: reg 0x20: [io 0x3030-0x303f]
  pci 0000:00:1f.5: reg 0x24: [io 0x3020-0x302f]

  pci 0000:00:01.0: PCI bridge to [bus 01]
  pci 0000:00:01.0: bridge window [io 0x2000-0x2fff]
  pci 0000:00:01.0: bridge window [mem 0xc1200000-0xc12fffff]

  pci 0000:01:00.0: [1000:0072] type 00 class 0x010700
      [1000:0072] - LSI (Symbios) Logic : SAS2008 PCIe Fusion-MPT SAS-2
  pci 0000:01:00.0: reg 0x10: [io 0x2000-0x20ff]
  pci 0000:01:00.0: reg 0x14: [mem 0xc1240000-0xc124ffff 64bit]
  pci 0000:01:00.0: reg 0x1c: [mem 0xc1200000-0xc123ffff 64bit]
x pci 0000:01:00.0: reg 0x30: [mem 0xfff00000-0xffffffff pref]

The PCI-to-PCI bridge device for Bus 0 to Bus 1 only has one memory space aperture ("bridge window") active - [mem 0xc1200000-0xc12fffff]. This must be a subset of one of the "root bus" memory resources and looking at those, it is.
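
The subset requirement can be checked mechanically. The following is only a minimal sketch: the address ranges are copied from the 'dmesg' lines above, while the helper names (contains, fits_any) are made up for illustration and are not kernel functions.

```python
# Ranges copied from the dmesg excerpt; (start, end) are inclusive.
ROOT_BUS_MEM = [
    (0x000a0000, 0x000bffff),
    (0xc0000000, 0xfeafffff),
]
BRIDGE_WINDOW_BUS01 = (0xc1200000, 0xc12fffff)   # 00:01.0 memory window
ROM_BAR_01_00_0 = (0xfff00000, 0xffffffff)       # 01:00.0 reg 0x30


def contains(outer, inner):
    """True if the inclusive range 'inner' lies entirely inside 'outer'."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def fits_any(windows, rng):
    """True if 'rng' is a subset of at least one of 'windows'."""
    return any(contains(w, rng) for w in windows)


if __name__ == "__main__":
    # The bridge window is a subset of a root bus resource.
    print(fits_any(ROOT_BUS_MEM, BRIDGE_WINDOW_BUS01))
    # The value found in reg 0x30 is not a subset of the bridge window.
    print(fits_any([BRIDGE_WINDOW_BUS01], ROM_BAR_01_00_0))
```

The same test, applied to each regular BAR of 01:00.0 against the bridge window, returns True; only the reg 0x30 value falls outside it.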

The target device - 01:00.0; an 'mpt2sas' device - consumes three memory ranges. These correspond to the device's BAR 1 and 2 (64 bit addresses consume two BAR registers), BAR 3 and 4, and the "Expansion ROM Base Address" (a.k.a. BAR 6). These also must be a subset of both the corresponding root bus resources and all PCI-to-PCI bridge devices in the PCI hierarchy leading to the device itself. Looking at them we see that the first two satisfy the requirement but the third - [mem 0xfff00000-0xffffffff pref] - does not!

It's because the "Expansion ROM Base Address" register (a.k.a. BAR 6) does not adhere to the subset requirement(s) that the kernel later outputs:

  pci 0000:01:00.0: can't claim BAR 6 [mem 0xfff00000-0xffffffff pref]: no compatible bridge window
  pci 0000:01:00.0: BAR 6: no space for [mem size 0x00100000 pref]
  pci 0000:01:00.0: BAR 6: failed to assign [mem size 0x00100000 pref]

The "can't claim" message is the kernel alerting us that the BIOS has not correctly set up resources that fulfil all the requirements (subset, alignment, type, ...).

The kernel then "sizes" the BAR to see how much address space that BAR requires - in this case we see BAR 6 of 01:00.0 needs 1 MB of contiguous space - and subsequently tries to work around the BIOS' failure, attempting to find available, currently unused, resource space that meets all the requirements. That is where the "no space for" message comes from.

There is no contiguous space available that meets all the requirements (subset, alignment, type, ...), which is fairly easy to see here: there was only a 1 MB memory aperture provided by the PCI-to-PCI bridge device to begin with, and the 01:00.0 device consumed subsets of that for BARs 1 and 3, so there is no way 1 MB remains free to satisfy BAR 6's needs. And so the kernel outputs the "failed to assign" message.

In a very similar scenario, the 04:03.0 device also has not been properly set up by BIOS ([mem 0xffff0000-0xffffffff pref]).
The difference in this case is that there were enough available resources left to satisfy all the subset, alignment, type, ..., requirements and thus the kernel was able to allocate from such and re-program the device's BAR 6 ([mem 0xc1010000-0xc101ffff pref]) so that the device can function correctly.

  pci 0000:00:1e.0: PCI bridge to [bus 04] (subtractive decode)
  pci 0000:00:1e.0: bridge window [mem 0xc0800000-0xc10fffff]
  pci 0000:00:1e.0: bridge window [mem 0xc0000000-0xc07fffff 64bit pref]
  pci 0000:00:1e.0: bridge window [io 0x0000-0x0cf7] (subtractive decode)
  pci 0000:00:1e.0: bridge window [io 0x0d00-0xffff] (subtractive decode)
  pci 0000:00:1e.0: bridge window [mem 0x000a0000-0x000bffff] (subtract d)
  pci 0000:00:1e.0: bridge window [mem 0xc0000000-0xfeafffff] (subtract d)

  pci 0000:04:03.0: [102b:0532] type 00 class 0x030000
      [102b:0532] - Matrox : MGS G200eW WPCM450 (Graphics)
  pci 0000:04:03.0: reg 0x10: [mem 0xc0000000-0xc07fffff pref]
  pci 0000:04:03.0: reg 0x14: [mem 0xc1000000-0xc1003fff]
  pci 0000:04:03.0: reg 0x18: [mem 0xc0800000-0xc0ffffff]
x pci 0000:04:03.0: reg 0x30: [mem 0xffff0000-0xffffffff pref]

  pci 0000:04:03.0: can't claim BAR 6 [mem 0xffff0000-0xffffffff pref]: no compatible bridge window
  pci 0000:04:03.0: BAR 6: assigned [mem 0xc1010000-0xc101ffff pref]
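
The difference between the two outcomes can be reproduced with a small model. This is a sketch under stated assumptions, not the kernel's allocator: it assumes a BAR must be naturally aligned to its (power-of-two) size, it only considers the non-prefetchable bridge window, and first_aligned_slot() is a hypothetical helper name. The ranges and sizes are taken from the log lines above.

```python
def first_aligned_slot(window, used, size):
    """Lowest size-aligned address in 'window' that does not overlap any
    range in 'used', or None. Ranges are inclusive (start, end) tuples and
    'size' is assumed to be a power of two."""
    start, end = window
    addr = (start + size - 1) & ~(size - 1)      # round up to alignment
    while addr + size - 1 <= end:
        last = addr + size - 1
        if all(last < u_start or u_end < addr for u_start, u_end in used):
            return addr
        addr += size
    return None


# Bus 01: 1 MB window, BARs 1 and 3 of 01:00.0 already inside it.
BUS01_WINDOW = (0xc1200000, 0xc12fffff)
BUS01_USED = [(0xc1240000, 0xc124ffff), (0xc1200000, 0xc123ffff)]

# Bus 04: non-prefetchable window, two regular BARs of 04:03.0 inside it.
BUS04_WINDOW = (0xc0800000, 0xc10fffff)
BUS04_USED = [(0xc1000000, 0xc1003fff), (0xc0800000, 0xc0ffffff)]

if __name__ == "__main__":
    # 01:00.0 needs 1 MB (0x00100000) for BAR 6: no slot exists.
    print(first_aligned_slot(BUS01_WINDOW, BUS01_USED, 0x00100000))
    # 04:03.0 needs 64 KB (0x00010000) for BAR 6: a slot exists.
    slot = first_aligned_slot(BUS04_WINDOW, BUS04_USED, 0x00010000)
    print(hex(slot))
```

Under these assumptions the model finds no slot for 01:00.0 and returns 0xc1010000 for 04:03.0, which agrees with the "failed to assign" and "assigned [mem 0xc1010000-0xc101ffff pref]" messages.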
Created attachment 231711 [details] expansion_rom_dmesg2.txt
Another recent example of Expansion ROM related resource issues...

Researching device 0000:04:00.3 as it's the device with the issue (and all other devices/functions under PCI bus 04 due to possible competing resource needs).

Analysis from v4.7.0 kernel run 'dmesg' log with comments interspersed (dmesg log attached to BZ as "expansion_rom_dmesg2.txt") ...

This platform has two PCI Root Bridges. Limiting analysis to the first Root Bridge handling PCI buses 0x00 through 0x7e as it contains the PCI bus in question - bus 04.

  ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-7e])
  PCI host bridge to bus 0000:00
  pci_bus 0000:00: root bus resource [io 0x0000-0x03bb window]
  pci_bus 0000:00: root bus resource [io 0x03bc-0x03df window]
  pci_bus 0000:00: root bus resource [io 0x03e0-0x0cf7 window]
  pci_bus 0000:00: root bus resource [io 0x1000-0x7fff window]
  pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
  pci_bus 0000:00: root bus resource [mem 0x90000000-0xc7ffbfff window]
  pci_bus 0000:00: root bus resource [mem 0x30000000000-0x33fffffffff window]

CPU addresses falling into the above resource ranges will get intercepted by the host controller and converted into PCI bus transactions.

Looking further into the log we find the set of resource ranges (PCI-to-PCI bridge apertures) corresponding to PCI bus 04.

  pci 0000:00:02.0: PCI bridge to [bus 04]
  pci 0000:00:02.0: bridge window [io 0x2000-0x2fff]
  pci 0000:00:02.0: bridge window [mem 0x92000000-0x940fffff]      33M

The following shows what the platform's BIOS programmed into the BARs of device(s) under PCI bus 04.

  pci 0000:04:00.0: [1924:0923] type 00 class 0x020000
  pci 0000:04:00.0: reg 0x10: [io 0x2300-0x23ff]
  pci 0000:04:00.0: reg 0x18: [mem 0x93800000-0x93ffffff 64bit]    BAR2
  pci 0000:04:00.0: reg 0x20: [mem 0x9400c000-0x9400ffff 64bit]    BAR4
  pci 0000:04:00.0: reg 0x30: [mem 0xfffc0000-0xffffffff pref]     E ROM
  pci 0000:04:00.1: [1924:0923] type 00 class 0x020000
  pci 0000:04:00.1: reg 0x10: [io 0x2200-0x22ff]
  pci 0000:04:00.1: reg 0x18: [mem 0x93000000-0x937fffff 64bit]
  pci 0000:04:00.1: reg 0x20: [mem 0x94008000-0x9400bfff 64bit]
  pci 0000:04:00.1: reg 0x30: [mem 0xfffc0000-0xffffffff pref]
  pci 0000:04:00.2: [1924:0923] type 00 class 0x020000
  pci 0000:04:00.2: reg 0x10: [io 0x2100-0x21ff]
  pci 0000:04:00.2: reg 0x18: [mem 0x92800000-0x92ffffff 64bit]
  pci 0000:04:00.2: reg 0x20: [mem 0x94004000-0x94007fff 64bit]
  pci 0000:04:00.2: reg 0x30: [mem 0xfffc0000-0xffffffff pref]
  pci 0000:04:00.3: [1924:0923] type 00 class 0x020000
  pci 0000:04:00.3: reg 0x10: [io 0x2000-0x20ff]
  pci 0000:04:00.3: reg 0x18: [mem 0x92000000-0x927fffff 64bit]    8M
  pci 0000:04:00.3: reg 0x20: [mem 0x94000000-0x94003fff 64bit]    16K
  pci 0000:04:00.3: reg 0x30: [mem 0xfffc0000-0xffffffff pref]     256K

It's already obvious that the 33M of MMIO space that the PCI-to-PCI bridge leading to PCI bus 04 provides (0x92000000-0x940fffff) is not enough to fully satisfy the MMIO addressing needs of all the BARs of the device below it. The 4 ports combined need ((8M + 16K + 256K) * 4) = 33M + 64K. This is _without_ taking into account any alignment constraints, which would likely increase the bus's needed aperture range even further.

Note that the values programmed into the device's Expansion ROM BARs do not fit within any of its immediately upstream bridge's MMIO related apertures.
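
For anyone who wants to check the arithmetic, here is a short worked calculation. It is only a sketch: the sizes and the bridge window come from the log above, and the variable names are arbitrary.

```python
M, K = 1 << 20, 1 << 10

# Bridge window for bus 04: [mem 0x92000000-0x940fffff] (inclusive range)
window = 0x940fffff - 0x92000000 + 1

# Per-function MMIO needs: BAR 2 (8M) + BAR 4 (16K) + Expansion ROM (256K)
per_function = 8 * M + 16 * K + 256 * K
needed = 4 * per_function          # four functions (ports)

shortfall = needed - window

if __name__ == "__main__":
    print("window   :", window // M, "M")
    print("needed   :", needed // M, "M +", (needed % M) // K, "K")
    print("shortfall:", shortfall // K, "K")
```

The window is exactly 33M, so even with perfect packing the four functions come up 64K short, before alignment is considered.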

  pci 0000:04:00.0: can't claim BAR 6 [mem 0xfffc0000-0xffffffff pref]: no compatible bridge window
  pci 0000:04:00.1: can't claim BAR 6 [mem 0xfffc0000-0xffffffff pref]: no compatible bridge window
  pci 0000:04:00.2: can't claim BAR 6 [mem 0xfffc0000-0xffffffff pref]: no compatible bridge window
  pci 0000:04:00.3: can't claim BAR 6 [mem 0xfffc0000-0xffffffff pref]: no compatible bridge window

The kernel notices this and attempts to allocate appropriate space for them from any remaining, available, MMIO space that meets all the alignment constraints and such.

  pci 0000:04:00.0: BAR 6: assigned [mem 0x94040000-0x9407ffff pref]
  pci 0000:04:00.1: BAR 6: assigned [mem 0x94080000-0x940bffff pref]
  pci 0000:04:00.2: BAR 6: assigned [mem 0x940c0000-0x940fffff pref]
  pci 0000:04:00.3: BAR 6: no space for [mem size 0x00040000 pref]
  pci 0000:04:00.3: BAR 6: failed to assign [mem size 0x00040000 pref]

The kernel was able to satisfy the first three ports' MMIO needs but was _not_ able to for the last port - there is no remaining available addressing space within the range to satisfy its needs!

At this point the 0000:04:00.3 device just happens to work by luck, because the unmet resource needs correspond to its Expansion ROM BAR [1].

Next a "user" initiates a PCIe hot-unplug of the sfn7x22f device, the bus is re-scanned and, as a result, BAR4 of all 4 of the device's functions fails to get its appropriate resources allocated.

  pci 0000:00:02.0: PCI bridge to [bus 04]
  pci 0000:00:02.0: bridge window [io 0x2000-0x2fff]
  pci 0000:00:02.0: bridge window [mem 0x92000000-0x940fffff]      33M

  pci 0000:04:00.0: BAR 2: assigned [mem 0x92000000-0x927fffff 64bit]
  pci 0000:04:00.1: BAR 2: assigned [mem 0x92800000-0x92ffffff 64bit]
  pci 0000:04:00.2: BAR 2: assigned [mem 0x93000000-0x937fffff 64bit]
  pci 0000:04:00.3: BAR 2: assigned [mem 0x93800000-0x93ffffff 64bit]
  pci 0000:04:00.0: BAR 6: assigned [mem 0x94000000-0x9403ffff pref]
  pci 0000:04:00.1: BAR 6: assigned [mem 0x94040000-0x9407ffff pref]
  pci 0000:04:00.2: BAR 6: assigned [mem 0x94080000-0x940bffff pref]
  pci 0000:04:00.3: BAR 6: assigned [mem 0x940c0000-0x940fffff pref]

At this point -all- available MMIO resource space has been consumed.

For the more visually inclined (if it's not already obvious), there's probably an easier way to visualize the exhaustion, but here is my lame attempt:

PCI Bridge 04's MMIO aperture resource range totals 33M ( 0x92000000-0x940fffff ). The first line below denotes the 33M in 1M increments (chunks). The second line denotes the addressing range; specifically hex digits 7 and 6 within the resource's range ( 0x9--xxxxx ). The last line denotes the port (0 through 3) consuming that portion of the resource's range.

   1 2 3 4 5 6 7 8 9101112131415161718192021222324252627282930313233  33M
  202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f40  [76]
   0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3--

The last 1M is consumed at a smaller granularity, so the following expands the above conceptualization to a finer level: 1M of resource range ( 94000000-940fffff ) visualized in 32K increments ( hex digits 5 and 4; 0x940--xxx ).

   1 2 3 4 5 6 7 8 91011121314151617181920212223242526272829303132  1M
  0008101820283038404850586068707880889098a0a8b0b8c0c8d0d8e0e8f0f8  [54]
   0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3

and the remaining needed resource allocation attempts are going to fail.

  pci 0000:04:00.0: BAR 4: no space for [mem size 0x00004000 64bit]
  pci 0000:04:00.0: BAR 4: failed to assign [mem size 0x00004000 64bit]
  pci 0000:04:00.1: BAR 4: no space for [mem size 0x00004000 64bit]
  pci 0000:04:00.1: BAR 4: failed to assign [mem size 0x00004000 64bit]
  pci 0000:04:00.2: BAR 4: no space for [mem size 0x00004000 64bit]
  pci 0000:04:00.2: BAR 4: failed to assign [mem size 0x00004000 64bit]
  pci 0000:04:00.3: BAR 4: no space for [mem size 0x00004000 64bit]
  pci 0000:04:00.3: BAR 4: failed to assign [mem size 0x00004000 64bit]
  pci 0000:04:00.0: BAR 0: assigned [io 0x2000-0x20ff]
  pci 0000:04:00.1: BAR 0: assigned [io 0x2400-0x24ff]
  pci 0000:04:00.2: BAR 0: assigned [io 0x2800-0x28ff]
  pci 0000:04:00.3: BAR 0: assigned [io 0x2c00-0x2cff]

At this point none of the four functions (ports) - 0000:04:00.{0..3} - were able to get their necessary resource needs met and thus the device's functions (NIC ports) do not work. In fact, I would expect the driver's call into the kernel's PCI core 'pci_enable_device()' routine to fail [1].

Conclusion ...

The root cause of the issue(s) [2] is the platform's BIOS not providing enough of, and not properly setting up, the resources that the device requires - specifically MMIO addressing space. Most conspicuous are the device's Expansion ROM BAR(s), as they are improperly programmed - the initial BIOS programmed values do not fall within any valid resource ranges of the immediately upstream PCI-to-PCI Bridge's MMIO apertures.

As for "symptomatic" solutions (just a band-aid to treat the symptom and not addressing the root cause) ...

Short of getting the platform's BIOS updated to appropriately account for the device's total needs, a "compromise" solution has been to get the BIOS engineers to program the device's Expansion ROM BAR values with '0'. This has been done in the past, so the platform's BIOS engineers choosing not to do that again in this instance is "out of character" and concerning.

If, and only if, a device's Expansion ROM BAR is programmed with '0', then adding the "norom" kernel boot parameter will cause the kernel to ignore, and not attempt to assign resources to, such.

Short of that, drivers can use, and check the return value of, pci_enable_rom(). That should fail if it's unassigned. Looking at it, it only fails if 'flags == 0' so I'm not sure that catches all cases of it being unassigned.

[1] For a device's normal BARs - the BARs corresponding to the PCI specification's "Base Address 0 through 5" Type 0 configuration header space entries - that are initially ill programmed and for which the kernel can not subsequently assign appropriate resources, the kernel's PCI core subsystem's 'pci_enable_device()' routine should fail.

[2] While the analysis only covers one specific device, the 'dmesg' log shows that the same base root cause occurs in at least two additional instances.
A mailing list discussion for anyone interested in this topic:

[RFC] PCI: Unassigned Expansion ROM BARs
http://www.gossamer-threads.com/lists/linux/kernel/2265600
> Yes, we discussed the Exp ROM BAR value not being 0 with our BIOS team.
> Their initial thought is that there is no code added explicitly to set the
> ROM BAR to a non zero value. We are trying to find out from which generation
> systems, this value started getting set to non zero value. Still looking
> into this.
>
> Also, want to mention here that xxxx system does not support PCI hotplug.
> Only hotplug scenario supported on this system is - hot plugging PCIE SSD
> disks.
>
> While looking into the details, we have the following thoughts and queries.
> It would be helpful to know your thoughts on them.
>
> 1. On Expansion ROM BAR not being set to 0:
>
> While looking into this detail, we referred to the 'section 3.5 of the PCI
> Firmware Specification Rev 3.2'. Section 3.5 seems to indicate that the
> operating system needs to know if the Exp ROM has been configured before
> handing over to OS and the way to check this is to check the Exp ROM BAR
> enable bit. If this bit is unset, the ROM BAR is not configured and OS needs
> to consider the BAR content as invalid by default.
>
> We went through the below references you provided -
>
> https://lkml.org/lkml/2015/9/23/737
> https://bugzilla.kernel.org/show_bug.cgi?id=104931
>
> They seem to explain the need for the OS to access Exp ROM BAR and various
> use cases, but it is not clear why the OS ignores the value of Exp ROM BAR
> enable bit. Any insight/history/context you can share on this will be
> helpful.
>
> The spec also seems to suggest that regular BARs and Exp ROM BAR should not
> be enabled together. This may not be an issue as modern devices do not seem
> to share address decoders for regular BARs and Exp ROM BAR. But wanted to
> mention it for completeness.

Thanks for getting the system firmware team's feedback. Please tell them to feel free to continue the discussion if they have other questions or views.

Since a device's hardware implementation _may_ share (BAR) address decoding between a regular BAR and an Exp ROM BAR, it basically becomes a requirement for the platform firmware to hand off the device to the OS with the Exp ROM BAR's 'enable' bit disabled.

The specification [1] - section 6.2.5.2 - states:

  "Bit 0 in the register is used to control whether or not the device accepts accesses to its expansion ROM."

The key takeaway is that the Exp ROM BAR's 'enable' bit is enabling/disabling "address decoding" and _not_ enabling/disabling the Exp ROM BAR itself in any way (i.e. it's _not_ an indicator as to whether or not the Exp ROM exists [is enabled] or not [is disabled]).

That leads to the logical question: "Then how can the device signal whether or not it supports an Exp ROM?" There seem to be a couple of possibilities. The device can return all 0's when attempting to "size" the Exp ROM BAR, indicating there is no Exp ROM content and thus no MMIO resource requirements need be allocated (this seems to be the consensus approach). Another possibility could be for the OS to attempt to read the Exp ROM content, looking for the proper header where the first two bytes are required to be 0x55 0xAA (this does not seem to be the correct approach as we've seen devices utilize this area for other, device specific, means).
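
To make the two possibilities concrete, here is a small sketch of how a "sizing" readback and a ROM header could be interpreted. It assumes the Expansion ROM BAR layout from the PCI Local Bus Specification (address in bits 31:11, enable in bit 0); the function names and the sample readback values are illustrative only and are not kernel code.

```python
ROM_ADDR_MASK = 0xFFFFF800   # bits 31:11 = base address, bit 0 = enable


def rom_bar_size(readback):
    """Decode the value read back after writing all 1s to the address bits
    of an Expansion ROM BAR. Returns the required size in bytes, or 0 if
    the device reports no Expansion ROM."""
    addr_bits = readback & ROM_ADDR_MASK
    if addr_bits == 0:
        return 0                              # sizing returned all 0's
    return (~addr_bits + 1) & 0xFFFFFFFF      # two's complement, 32 bit


def has_rom_signature(header):
    """Second possibility: check the first two bytes of ROM content for
    the 0x55 0xAA signature."""
    return bytes(header[:2]) == b"\x55\xaa"


if __name__ == "__main__":
    print(rom_bar_size(0x00000000))           # no Exp ROM implemented
    print(hex(rom_bar_size(0xFFF80000)))      # 0x80000 bytes (512 KB)
    print(has_rom_signature(b"\x55\xaa\x40\x00"))
```

A sizing readback of 0 means nothing needs to be allocated, which is why it works as the "no Exp ROM" signal without relying on the enable bit at all.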

The sizing approach is what occurs with most platforms and can be seen by manually attempting to "size" the Exp ROM BAR of two devices, one that does not have Exp ROM content and one that does:

  // A device without an Exp ROM
  % setpci -s 0f:00.0 30.l            // read the BAR's programmed value
  00000000
  % setpci -s 0f:00.0 30.l=FFFFFFFE   // "size" the BAR
  % setpci -s 0f:00.0 30.l
  00000000

  // A device with an Exp ROM (but currently 'disabled')
  % setpci -s 03:00.1 30.l            // read the BAR's programmed value
  00000000
  % setpci -s 03:00.1 30.l=FFFFFFFE   // "size" the BAR
  % setpci -s 03:00.1 30.l
  FFF80000                            // .5 MB

With respect to the observation concerning the PCI Firmware Specification [2]: Section 3.5 is concerned with, and focuses upon, device configuration - what's required and optional for system firmware to configure prior to OS handoff. Device configuration concerns whether or not the firmware has allocated system resources corresponding to the BAR and, if so, programmed the BAR with the platform's address pertaining to such resources. Most current platforms configure the entire PCI hierarchy prior to handoff.

The possibility of decoder sharing is also talked about:

  "Since not all devices may be configured prior to the operating system handoff, the operating system needs to know whether a specific BAR register has been configured by firmware. The operating system makes the determination by checking the I/O Enable, and Memory Enable bits in the device's command register, and Expansion ROM BAR enable bits. If the enable bit is set, then the corresponding resource register has been configured."

Later the paragraph is repeated with an additional sentence at the end:

  "... If the enable bit is not set, the operating system cannot assume that the associated BAR register contains valid information."

And then covered again in the last paragraph of the section.

These paragraphs are discussing how the OS can determine whether or not the system firmware has configured the BAR.
They are specifically concerned with how the OS can determine whether or not the BAR itself is valid (i.e. has the BAR been configured appropriately by system firmware or not). As we have already seen, the Exp ROM BAR enable bit is for enabling or disabling the BAR's decoding logic; it has nothing to do with whether or not Exp ROM content exists. The only mention concerning Exp ROM content itself is the very last sentence of the section and it just mentions validity, not enabling/disabling - "... and that the Expansion ROM BAR content is correct."

[1] PCI Local Bus Specification; Revision 3.0
[2] PCI Firmware Specification; Revision 3.2

>
> 2.
>
> While looking into the details shared in the issue description, we observed
> that
>
> a) During system boot up, the sequence of allocation of BARs is
>
> BAR 2
> BAR 4
> BAR 6
>
> But after the PCI remove + rescan, the sequence seems to be
>
> BAR 2
> BAR 6
> BAR 4
>
> The above sequence results in resource allocation failure for BAR 4. If the
> sequence of 2,4 and 6 was followed, the chance of BAR 4 failure is reduced.
> Initially i thought that the BAR's are being allocated from high to low,
> meaning, in the sequence of 6, 4, 2, but that does not seem to be the case.
> It is not high to low. Any thoughts here ? From the issue description, it
> seems like this is not the case with Cent OS 6.5 where sequence of resource
> allocation is same in both scenarios (system boot as well as PCI remove +
> rescan). I understand that there is a lot of change that is expected because
> RHEL 7 has a much newer 3.10 kernel. But want to clarify this aspect as well.

Yes, I noticed both also. The kernel's internals in this specific area are continually being improved. Device configuration resources - I/O Port space and MMIO - are highly contended. As such, there has been, and will likely continue to be, effort to try and maximize the effectiveness of allocating from such a constrained pool of resources.
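
The effect of the two sequences can be illustrated with a toy model. This is a simple "bump" allocator written for this comment, not the kernel's resource code: it assumes each BAR is naturally aligned to its size and that BARs are handed out in the given order, function by function, from the start of the 33M window. The window and sizes come from the 'dmesg' analysis in comment #13; the allocate() helper is hypothetical.

```python
WINDOW = (0x92000000, 0x940fffff)                    # bus 04 bridge window
SIZES = {2: 8 << 20, 4: 16 << 10, 6: 256 << 10}      # BAR number -> bytes


def allocate(order, functions=4):
    """Place the BARs of all functions in 'order' (e.g. (2, 6, 4)).
    Returns (placed, failed): placed maps (function, bar) -> base address,
    failed lists the (function, bar) pairs that did not fit."""
    cursor = WINDOW[0]
    placed, failed = {}, []
    for bar in order:
        size = SIZES[bar]
        for fn in range(functions):
            base = (cursor + size - 1) & ~(size - 1)  # align up to size
            if base + size - 1 > WINDOW[1]:
                failed.append((fn, bar))
                continue
            placed[(fn, bar)] = base
            cursor = base + size
    return placed, failed


if __name__ == "__main__":
    for order in ((2, 4, 6), (2, 6, 4)):
        placed, failed = allocate(order)
        print(order, "failed:", failed)
```

In this model the 2, 4, 6 sequence loses only the Expansion ROM of function 3, while the 2, 6, 4 sequence loses BAR 4 of all four functions, matching the two outcomes in the logs. Neither sequence can succeed completely, since the window is 64K too small regardless of order.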

>
> > Another viable solution would be to increase the MMIO ranges available for
> > devices. In this issue's 'dmesg' log one can see that there are three
> > instances of resources not being able to be allocated for devices due to one
> > or the other (Expansion ROM BAR not being set to '0', or MMIO range of
> > immediately preceding upstream bridge being too small to accommodate the
> > device's needs as is outlined in the analysis of Comment #13).
>
> This might fix the problem, but not sure if increasing the MMIO range is a
> valid expectation to have from BIOS/firmware because, by increasing the MMIO
> range, we are trying to allocate/accommodate resource for Exp ROM BAR's as
> well, which does not seem to be required as per the section 3.5 of the spec
> when Exp ROM BAR enable bit is unset. The OS seems to be enabling every
> device Exp ROM irrespective of whether the BIOS/firmware configured it or
> not.

I think this was covered above, noting that an Exp ROM BAR's 'enable' bit is enabling/disabling "Exp ROM BAR address decoding". BIOS/firmware can eliminate the additional MMIO constraint with respect to Exp ROM(s) by having the Exp ROM BAR's "sizing" mechanism return 0.
Thanks, Myron, for creating this bug to collect information.

(In reply to Myron Stowe from comment #0)
> Created attachment 188211 [details]
> expansion_rom_dmesg.txt (Expansion ROM BARs not programmed by BIOS)
> [...snip]
>
> The kernel then "sizes" the BAR to see how much address space that BAR
> requires - in this case we see BAR 6 of 01:00.0 needs 1 MB of contiguous
> space - and subsequently tries to work around the BIOS' failure, attempting
> to find available, currently unused, resource space that meets all the
> requirements which is where the "no space for" message comes from.
>
> There is no contiguous space that meets all the requirements (subset,
> alignment, type, ...) available which is fairly easy to see here; there was
> only a 1 MB memory aperture provided by the PCI-to-PCI bridge device to
> begin with and the 01:00.0 device consumed subsets of that for BARs 1 and 3
> so there is no way 1 MB remains free to satisfy BAR 6's needs. And so the
> kernel outputs the "failed to assign" message.
>

I just want to get more input to clear up my thinking...

In the above case, where the bridge window is not large enough, if the device driver really needs the expansion ROM (BAR 6), then the kernel should re-program and re-allocate the windows on the upstream PCI-to-PCI bridges to make sure there is enough free bridge window space for assigning "BAR 6" on the affected device.

Is that the right direction for working around this BIOS problem?

Thanks