Bug 10461

Summary: PCI resource assignments fail due to poor allocation strategy
Product: Drivers Reporter: TJ (linux)
Component: PCI Assignee: drivers_pci (drivers_pci)
Status: RESOLVED INSUFFICIENT_DATA    
Severity: high CC: alan, andi-bz, bjorn, lenb, mingo, steve, tglx, weiyang
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 3.6 Subsystem:
Regression: No Bisected commit-id:
Attachments: Windows Vista msinfo32 memory mapping
Comparison of Linux and Windows PCI range allocations
Four dmesg logs for different boot scenarios
full dmesg output of failure to allocate PCI resources
full dmesg output of failure to allocate PCI resources with pci=nocrs
full dmesg output of working setup
iomem output for working system
4 log files from v3.6rc4

Description TJ 2008-04-16 23:01:50 UTC
Latest working kernel version: None
Earliest failing kernel version: 2.6.22 (early kernels not checked)
Distribution: kernel.org, Ubuntu
Hardware Environment: Sony Vaio VGN-FE41Z (nVidia GeForce 7600, Intel 945PM northbridge)
Software Environment: kernel
Problem Description:

The laptop ships with 2GB RAM. I recently added 1GB to make a total of 3GB. This is the maximum that can be used due to the 32-bit i945PM chipset (top 1GB reserved for PCI memory-mapped I/O, etc.).

With Linux (x86_64), the integrated nVidia GeForce Go 7600 fails (with the nv or nvidia drivers) because its 64-bit BAR has been placed above the 4GB boundary, which, if I understand correctly, is not reachable since the i945PM chipset can only address up to 4GB (32-bit address bus).
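A minimal sketch of that reachability check, assuming the chipset can only decode addresses below 4GB (2^32). The BAR values come from the lspci output further down (Region 1, 256MB, at 0x100000000 with 3GB of RAM and at 0xb0000000 with 2GB); the function name is hypothetical.

```python
FOUR_GB = 1 << 32
MB = 1 << 20


def bar_below_4g(start, size):
    """Return True if the whole range [start, start + size) lies below 4GB."""
    return start + size <= FOUR_GB


# Region 1 (64-bit, prefetchable, 256MB) of the GeForce Go 7600, from lspci.
for label, start in (("3GB RAM", 0x100000000), ("2GB RAM", 0xB0000000)):
    verdict = "reachable" if bar_below_4g(start, 256 * MB) else "above 4GB"
    print(f"{label}: BAR1 at {start:#x} is {verdict}")
```

With 3GB the BAR starts exactly at 4GB, so no part of it is reachable through a 32-bit address bus.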

Somehow Windows Vista (32-bit) works correctly in all respects with 3GB RAM and its nVidia drivers so there must be a way to work around this.

Steps to reproduce: Boot with more than 2GB RAM installed.

=== 3GB RAM ===

[    0.316940] PCI: Cannot allocate resource region 9 of bridge 0000:00:01.0
[    0.317038] PCI: Cannot allocate resource region 1 of device 0000:01:00.0

[    0.000000] Linux version 2.6.25-rc9 (tj@hephaestion) (gcc version 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)) #3 SMP PREEMPT Thu Apr 17 02:06:09 BST 2008
[    0.000000] Command line: root=UUID=bb2c3a14-1588-4fb9-8411-71f114b568b4 ro debug
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  BIOS-e820: 0000000000000000 - 000000000009f400 (usable)
[    0.000000]  BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
[    0.000000]  BIOS-e820: 00000000000c2000 - 00000000000d0000 (reserved)
[    0.000000]  BIOS-e820: 00000000000dc000 - 0000000000100000 (reserved)
[    0.000000]  BIOS-e820: 0000000000100000 - 00000000bfe90000 (usable)
[    0.000000]  BIOS-e820: 00000000bfe90000 - 00000000bfe9a000 (ACPI data)
[    0.000000]  BIOS-e820: 00000000bfe9a000 - 00000000bff00000 (ACPI NVS)
[    0.000000]  BIOS-e820: 00000000bff00000 - 00000000c0000000 (reserved)
[    0.000000]  BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
[    0.000000]  BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
[    0.000000]  BIOS-e820: 00000000fed00000 - 00000000fed00400 (reserved)
[    0.000000]  BIOS-e820: 00000000fed14000 - 00000000fed1a000 (reserved)
[    0.000000]  BIOS-e820: 00000000fed1c000 - 00000000fed90000 (reserved)
[    0.000000]  BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
[    0.000000]  BIOS-e820: 00000000ff000000 - 0000000100000000 (reserved)


[   22.016654] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
[   22.016655] NVRM: BAR1 is 256M @ 0x00000000 (PCI:0001:00.0)
[   22.016658] NVRM: This is a 64-bit BAR mapped above 4GB by the system BIOS or
[   22.016659] NVRM: Linux kernel. The NVIDIA Linux graphics driver and other
[   22.016660] NVRM: system software do not currently support this configuration
[   22.016662] NVRM: reliably.
[   22.016668] nvidia: probe of 0000:01:00.0 failed with error -1
[   22.016687] NVRM: The NVIDIA probe routine failed for 1 device(s).
[   22.016690] NVRM: None of the NVIDIA graphics adapters were initialized!


01:00.0 VGA compatible controller: nVidia Corporation G70 [GeForce Go 7600] (rev a1) (prog-if 00 [VGA])
	Subsystem: Sony Corporation Unknown device 81ef
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 16
	Region 0: Memory at d1000000 (32-bit, non-prefetchable) [size=16M]
	Region 1: Memory at 100000000 (64-bit, prefetchable) [size=256M]
	Region 3: Memory at d0000000 (64-bit, non-prefetchable) [size=16M]
	Region 5: I/O ports at 2000 [size=128]
	Capabilities: [60] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [68] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
		Address: 0000000000000000  Data: 0000
	Capabilities: [78] Express Endpoint IRQ 0
		Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag-
		Device: Latency L0s <256ns, L1 <4us
		Device: AtnBtn- AtnInd- PwrInd-
		Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
		Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
		Device: MaxPayload 128 bytes, MaxReadReq 512 bytes
		Link: Supported Speed 2.5Gb/s, Width x16, ASPM L0s L1, Port 0
		Link: Latency L0s <256ns, L1 <4us
		Link: ASPM L0s L1 Enabled RCB 128 bytes CommClk+ ExtSynch-
		Link: Speed 2.5Gb/s, Width x16
	Capabilities: [100] Virtual Channel
	Capabilities: [128] Power Budgeting

==== 2GB RAM ====

[    0.000000] Linux version 2.6.25-rc9 (tj@hephaestion) (gcc version 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)) #3 SMP PREEMPT Thu Apr 17 02:06:09 BST 2008
[    0.000000] Command line: root=UUID=bb2c3a14-1588-4fb9-8411-71f114b568b4 ro 
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  BIOS-e820: 0000000000000000 - 000000000009f400 (usable)
[    0.000000]  BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
[    0.000000]  BIOS-e820: 00000000000c2000 - 00000000000d0000 (reserved)
[    0.000000]  BIOS-e820: 00000000000dc000 - 0000000000100000 (reserved)
[    0.000000]  BIOS-e820: 0000000000100000 - 000000007fe90000 (usable)
[    0.000000]  BIOS-e820: 000000007fe90000 - 000000007fe9a000 (ACPI data)
[    0.000000]  BIOS-e820: 000000007fe9a000 - 000000007ff00000 (ACPI NVS)
[    0.000000]  BIOS-e820: 000000007ff00000 - 0000000080000000 (reserved)
[    0.000000]  BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
[    0.000000]  BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
[    0.000000]  BIOS-e820: 00000000fed00000 - 00000000fed00400 (reserved)
[    0.000000]  BIOS-e820: 00000000fed14000 - 00000000fed1a000 (reserved)
[    0.000000]  BIOS-e820: 00000000fed1c000 - 00000000fed90000 (reserved)
[    0.000000]  BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
[    0.000000]  BIOS-e820: 00000000ff000000 - 0000000100000000 (reserved)

01:00.0 VGA compatible controller: nVidia Corporation G70 [GeForce Go 7600] (rev a1) (prog-if 00 [VGA])
	Subsystem: Sony Corporation Unknown device 81ef
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 0
	Interrupt: pin A routed to IRQ 16
	Region 0: Memory at d1000000 (32-bit, non-prefetchable) [size=16M]
	Region 1: Memory at b0000000 (64-bit, prefetchable) [size=256M]
	Region 3: Memory at d0000000 (64-bit, non-prefetchable) [size=16M]
	Region 5: I/O ports at 2000 [size=128]
	Capabilities: [60] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [68] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
		Address: 0000000000000000  Data: 0000
	Capabilities: [78] Express Endpoint IRQ 0
		Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag-
		Device: Latency L0s <256ns, L1 <4us
		Device: AtnBtn- AtnInd- PwrInd-
		Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
		Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
		Device: MaxPayload 128 bytes, MaxReadReq 512 bytes
		Link: Supported Speed 2.5Gb/s, Width x16, ASPM L0s L1, Port 0
		Link: Latency L0s <256ns, L1 <4us
		Link: ASPM L0s L1 Enabled RCB 128 bytes CommClk+ ExtSynch-
		Link: Speed 2.5Gb/s, Width x16
	Capabilities: [100] Virtual Channel
	Capabilities: [128] Power Budgeting
Comment 1 TJ 2008-04-17 18:20:01 UTC
Here is the same set of reports from a 32-bit x86 build of 2.6.25-rc9 with 3GB of RAM installed:

[    0.320738] PCI: Cannot allocate resource region 9 of bridge 0000:00:01.0
[    0.320834] PCI: Cannot allocate resource region 1 of device 0000:01:00.0

[    0.000000] Linux version 2.6.25-rc9-32bit (tj@hephaestion) (gcc version 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)) #1 SMP Fri Apr 18 01:21:54 BST 2008
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  BIOS-e820: 0000000000000000 - 000000000009f400 (usable)
[    0.000000]  BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
[    0.000000]  BIOS-e820: 00000000000c2000 - 00000000000d0000 (reserved)
[    0.000000]  BIOS-e820: 00000000000dc000 - 0000000000100000 (reserved)
[    0.000000]  BIOS-e820: 0000000000100000 - 00000000bfe90000 (usable)
[    0.000000]  BIOS-e820: 00000000bfe90000 - 00000000bfe9a000 (ACPI data)
[    0.000000]  BIOS-e820: 00000000bfe9a000 - 00000000bff00000 (ACPI NVS)
[    0.000000]  BIOS-e820: 00000000bff00000 - 00000000c0000000 (reserved)
[    0.000000]  BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
[    0.000000]  BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
[    0.000000]  BIOS-e820: 00000000fed00000 - 00000000fed00400 (reserved)
[    0.000000]  BIOS-e820: 00000000fed14000 - 00000000fed1a000 (reserved)
[    0.000000]  BIOS-e820: 00000000fed1c000 - 00000000fed90000 (reserved)
[    0.000000]  BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
[    0.000000]  BIOS-e820: 00000000ff000000 - 0000000100000000 (reserved)
[    0.000000] 2174MB HIGHMEM available.
[    0.000000] 896MB LOWMEM available.

01:00.0 0300: 10de:0398 (rev a1) (prog-if 00 [VGA controller])
	Subsystem: 104d:81ef
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 7
	Region 0: Memory at d1000000 (32-bit, non-prefetchable) [size=16M]
	Region 1: Memory at <ignored> (64-bit, prefetchable)
	Region 3: Memory at d0000000 (64-bit, non-prefetchable) [size=16M]
	Region 5: I/O ports at 2000 [size=128]
	Capabilities: [60] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [68] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
		Address: 0000000000000000  Data: 0000
	Capabilities: [78] Express Endpoint IRQ 0
		Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag-
		Device: Latency L0s <256ns, L1 <4us
		Device: AtnBtn- AtnInd- PwrInd-
		Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
		Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
		Device: MaxPayload 128 bytes, MaxReadReq 512 bytes
		Link: Supported Speed 2.5Gb/s, Width x16, ASPM L0s L1, Port 0
		Link: Latency L0s <256ns, L1 <4us
		Link: ASPM L0s L1 Enabled RCB 128 bytes CommClk+ ExtSynch-
		Link: Speed 2.5Gb/s, Width x16
	Capabilities: [100] Virtual Channel
	Capabilities: [128] Power Budgeting
Comment 2 TJ 2008-04-17 18:27:10 UTC
Created attachment 15796 [details]
Windows Vista msinfo32 memory mapping

And here's a snippet showing the Nvidia PCI BAR below 4GB in a Windows Vista msinfo32 report:

0xC0000000-0xFEBFFFFF	PCI bus	OK	
0xC0000000-0xFEBFFFFF	Mobile Intel(R) 945GM/GU/PM/GMS/940GML/943GML and Intel(R) 945GT Express PCI Express Root Port - 27A1	OK	
0xC0000000-0xFEBFFFFF	NVIDIA GeForce Go 7600	OK	
0xFED40000-0xFED44FFF	PCI bus	OK	
0xD0000000-0xD1FFFFFF	Mobile Intel(R) 945GM/GU/PM/GMS/940GML/943GML and Intel(R) 945GT Express PCI Express Root Port - 27A1	OK	
0xD0000000-0xD1FFFFFF	NVIDIA GeForce Go 7600	OK	
0xD1000000-0xD1FFFFFF	NVIDIA GeForce Go 7600	OK
Comment 3 TJ 2008-04-17 19:06:28 UTC
More info from 2.6.25-rc9 x86_64 with 2GB installed:

$ DEV="/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/"
$ echo "$(cat $DEV/vendor):$(cat $DEV/device)"

0x10de:0x0398

$ ls -1 $DEV/resource*
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/resource
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/resource0
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/resource1
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/resource3
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/resource5

$ cat $DEV/resource
0x00000000d1000000 0x00000000d1ffffff 0x0000000000000200
0x00000000b0000000 0x00000000bfffffff 0x000000000000120c
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x00000000d0000000 0x00000000d0ffffff 0x0000000000000204
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000002000 0x000000000000207f 0x0000000000000101
0x0000000000000000 0x0000000000000000 0x0000000000000002
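The resource file above can be decoded with a short script. This is a sketch under stated assumptions: each line is start, end and flags for resource index 0 to 6 (index 6 being the expansion ROM), IORESOURCE_MEM (0x200) and IORESOURCE_IO (0x100) are taken from <linux/ioport.h>, and kernels of this era are assumed to copy a memory BAR's low type bits into the flags (bit 2 = 64-bit, bit 3 = prefetchable). That assumption agrees with the lspci output for Regions 0, 1, 3 and 5, but it is not a documented sysfs interface.

```python
IORESOURCE_IO = 0x100   # assumed, per <linux/ioport.h>
IORESOURCE_MEM = 0x200  # assumed, per <linux/ioport.h>
BAR_64BIT = 0x4         # memory BAR type bits 2:1 == 10b (PCI spec)
BAR_PREFETCH = 0x8      # memory BAR bit 3 (PCI spec)

# Copied from the `cat $DEV/resource` output above (2GB RAM).
SAMPLE = """\
0x00000000d1000000 0x00000000d1ffffff 0x0000000000000200
0x00000000b0000000 0x00000000bfffffff 0x000000000000120c
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x00000000d0000000 0x00000000d0ffffff 0x0000000000000204
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000002000 0x000000000000207f 0x0000000000000101
0x0000000000000000 0x0000000000000000 0x0000000000000002
"""


def decode(text):
    """Return one dict per populated line; the line number is the resource index."""
    rows = []
    for index, line in enumerate(text.splitlines()):
        start, end, flags = (int(field, 16) for field in line.split())
        if start == 0 and end == 0:
            continue  # unused slot (or an unassigned ROM)
        row = {"index": index, "start": start, "size": end - start + 1}
        if flags & IORESOURCE_MEM:
            row["type"] = "mem"
            row["64bit"] = bool(flags & BAR_64BIT)
            row["prefetch"] = bool(flags & BAR_PREFETCH)
        elif flags & IORESOURCE_IO:
            row["type"] = "io"
        rows.append(row)
    return rows


for row in decode(SAMPLE):
    print(f"resource{row['index']}: {row.get('type', '?')} at {row['start']:#x}, size {row['size']:#x}")
```

The populated indices (0, 1, 3, 5) match the resource0/1/3/5 files listed above, and resource1 decodes as a 64-bit prefetchable 256MB range.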
Comment 4 TJ 2008-04-17 20:14:49 UTC
Investigating how Windows Vista might differ from Linux in PCI device configuration, I discovered a Microsoft presentation, "PCI Express Update for Windows Longhorn", whose "Memory Resource Assignment" slide (20) says:

* Windows Server 2003
  * A PCI device with BIOS configuration above 4GB is always assigned resources from below the 4GB region
  * If no range below 4GB region is available, then the device is assigned a range above the 4GB boundary
    * This holds good even if the Windows OS cannot physically access the address range above 4GB

* Longhorn (Vista)
  * A memory address range above 4GB is available for PCI devices only if that range is physically accessible by the OS
  * Within this constraint, Windows will always attempt to respect the BIOS configuration on a PCI device.
  * A PCI device with 64-bit BARs and no BIOS configuration is still assigned a memory address range above 4GB as available on the parent

The key statement I noticed there is: "A memory address range above 4GB is available for PCI devices only if that range is physically accessible by the OS"


Then on slide 22, "BIOS Design Recommendations":

* The ACPI BIOS should describe the memory range above 4GB in the _CRS and/or _PRS of the PCI root bus
  * This is described using the QWord Address Space descriptor as defined in the ACPI spec (Section 6.4.3.5.1).
  * Windows will evaluate the _SRS method with a buffer in the same format as the _CRS/_PRS
* The memory range for the PCI root bus should not overlap with the physical RAM or some other range
* The memory range is required to be physically accessible by the processor/chipset
* The ACPI BIOS should configure the resources on the PCI devices after evaluating the _OSI method to account for the Server 2003 behavior
* The ACPI BIOS should return an appropriate buffer in the evaluation of the _CRS/_PRS on a PCI root bus to account for the Server 2003 behavior

What caught my eye there is "The memory range is required to be physically accessible by the processor/chipset", because that seems to be the issue in this bug.
Comment 5 TJ 2008-04-17 22:32:30 UTC
Created attachment 15797 [details]
Comparison of Linux and Windows PCI range allocations

I've done an analysis of the PCI allocations between Linux with 2GB, 3GB, and Windows 3GB.

The most obvious thing is that Linux allocates small PCI root port ranges in prime locations.

As I understand it, the range allocated to a device must be aligned to its own size: a 256MB range must start on a 256MB boundary. Windows uses the first 256MB of the top 1GB PCI range, at 0xC0000000. Linux puts several small ranges in that space, and the UHCI and other devices start at 0xD2304000, so the second 256MB window is not available either.
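The same-size boundary rule can be made concrete: PCI BAR sizes are powers of two and a BAR must start on a multiple of its size. A small sketch (function names are hypothetical) listing the 256MB-aligned starts in the 0xC0000000-0xDFFFFFFF hole below MMCONFIG (0xE0000000, per the e820 map above) and checking which of them contains the 0xD2304000 address mentioned above:

```python
MB = 1 << 20
BAR_SIZE = 256 * MB  # the GeForce Go 7600's prefetchable BAR


def is_naturally_aligned(start, size):
    """A power-of-two sized BAR must start on a multiple of its size."""
    return start % size == 0


def aligned_slots(lo, hi, size):
    """All size-aligned starts s with lo <= s and s + size <= hi (hi is exclusive)."""
    first = -(-lo // size) * size  # round lo up to a multiple of size
    return list(range(first, hi - size + 1, size))


slots = aligned_slots(0xC0000000, 0xE0000000, BAR_SIZE)
print([hex(s) for s in slots])
for s in slots:
    blocked = s <= 0xD2304000 < s + BAR_SIZE
    print(f"{s:#x}: {'blocked by 0xd2304000' if blocked else 'free of it'}")
```

Only two candidate starts exist in that hole, so once small ranges occupy 0xC0000000 and another device sits at 0xD2304000, nothing is left for a 256MB BAR.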

The allocation of small ranges in the 0xC0000000-0xCFFFFFFF space forces the 256MB of the GFX card to be placed at 0xB0000000-0xBFFFFFFF when there is 2GB of RAM installed (RAM ends at 0x7FFFFFFF).
With 3GB of RAM installed the end address of RAM is 0xBFFFFFFF. That forces the GFX to be allocated a different range and as the 0xC0000000+ range has those small allocations the only place left is above the 4GB boundary.

From examining the Windows allocation addresses with reference to the PCI specifications, it is clear that Windows uses the recommended *subtractive decode* to allocate ranges (in other words, it starts at the top of the address space and allocates working down), whereas Linux appears to use a bottom-up strategy, except for one device, the "Mobile PCI Bridge [8086:2448]".

I'm attaching the range comparison table.
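To illustrate why the search direction matters, here is a toy first-fit allocator that can search either bottom-up or top-down. It is a simplified model, not the kernel's code. Assumptions: the window is the 3GB hole 0xC0000000-0xDFFFFFFF, a single 4KB range at 0xD2304000 stands in for the UHCI and other devices (the size is a guess), the three 1MB requests are hypothetical, and requests are handled strictly in the order given, which the real kernel does not necessarily do.

```python
KB, MB = 1 << 10, 1 << 20
WINDOW = (0xC0000000, 0xE0000000)          # [lo, hi); hi is the MMCONFIG base
FIXED = (0xD2304000, 0xD2304000 + 4 * KB)  # assumed size of the fixed range
REQUESTS = [1 * MB, 1 * MB, 1 * MB, 256 * MB]  # hypothetical request order


def overlaps(a_lo, a_hi, b_lo, b_hi):
    """Half-open intervals [a_lo, a_hi) and [b_lo, b_hi) intersect."""
    return a_lo < b_hi and b_lo < a_hi


def allocate(busy, size, top_down):
    """Place a size-aligned block in WINDOW avoiding busy ranges; return start or None."""
    lo, hi = WINDOW
    first = -(-lo // size) * size      # lowest aligned start
    last = (hi - size) // size * size  # highest aligned start that still fits
    candidate, step = (last, -size) if top_down else (first, size)
    while first <= candidate <= last:
        if not any(overlaps(candidate, candidate + size, b_lo, b_hi) for b_lo, b_hi in busy):
            busy.append((candidate, candidate + size))
            return candidate
        candidate += step
    return None


def run(top_down):
    busy = [FIXED]
    return [allocate(busy, size, top_down) for size in REQUESTS]


for name, top_down in (("bottom-up", False), ("top-down", True)):
    placed = ["FAILED" if s is None else hex(s) for s in run(top_down)]
    print(f"{name:9}: {placed}")
```

In the bottom-up run the small requests take the 0xC0000000 slot and the fixed range blocks 0xD0000000, so the 256MB request fails; top-down packs the small requests just under 0xE0000000 and leaves 0xC0000000 free.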
Comment 6 TJ 2008-04-18 11:03:23 UTC
After a few debug printk() runs watching the allocation strategy I wondered why the PCI resources region doesn't start at the beginning of the largest gap:

[    0.000000] Allocating PCI resources starting at c2000000 (gap: c0000000:20000000)

since, when 3GB RAM is installed, the gap starts at 0xC0000000 but the allocation region begins at 0xC2000000.

The other issue is that only the largest gap seems to be used for allocations, which explains why smaller allocations for other devices effectively choke off use of the range in 32-bit address space.

In contrast, from looking at the addresses in the allocation comparison with Windows, it looks as if Windows uses *all* gaps for allocation rather than just the largest. It is noticeable that Windows allocates smaller regions in the gaps between the various 'high' e820 reservations.
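A sketch that enumerates every hole between 1MB and 4GB in the 3GB e820 map (copied from the log above). Starting at 1MB to skip the legacy area is my choice, and this is a plain sorted-interval walk rather than the kernel's own gap search. The largest hole matches the "gap: c0000000:20000000" line; the remaining small holes sit among the high reservations, where Windows appears to place its smaller regions.

```python
FOUR_GB = 1 << 32

# BIOS-e820 ranges for the 3GB configuration, as [start, end) pairs.
E820_3GB = [
    (0x0000000000000000, 0x000000000009F400),  # usable
    (0x000000000009F400, 0x00000000000A0000),  # reserved
    (0x00000000000C2000, 0x00000000000D0000),  # reserved
    (0x00000000000DC000, 0x0000000000100000),  # reserved
    (0x0000000000100000, 0x00000000BFE90000),  # usable
    (0x00000000BFE90000, 0x00000000BFE9A000),  # ACPI data
    (0x00000000BFE9A000, 0x00000000BFF00000),  # ACPI NVS
    (0x00000000BFF00000, 0x00000000C0000000),  # reserved
    (0x00000000E0000000, 0x00000000F0000000),  # reserved (MMCONFIG)
    (0x00000000FEC00000, 0x00000000FEC10000),  # reserved
    (0x00000000FED00000, 0x00000000FED00400),  # reserved
    (0x00000000FED14000, 0x00000000FED1A000),  # reserved
    (0x00000000FED1C000, 0x00000000FED90000),  # reserved
    (0x00000000FEE00000, 0x00000000FEE01000),  # reserved
    (0x00000000FF000000, 0x0000000100000000),  # reserved
]


def holes(entries, floor=0x100000, ceiling=FOUR_GB):
    """Return (start, size) for every gap in [floor, ceiling) not covered by an entry."""
    found, cursor = [], floor
    for start, end in sorted(entries):
        if start > cursor:
            found.append((cursor, min(start, ceiling) - cursor))
        cursor = max(cursor, end)
        if cursor >= ceiling:
            break
    if cursor < ceiling:
        found.append((cursor, ceiling - cursor))
    return found


gaps = holes(E820_3GB)
for start, size in gaps:
    print(f"gap {start:#x}:{size:#x}")
largest = max(gaps, key=lambda gap: gap[1])
print(f"largest gap {largest[0]:#x}:{largest[1]:#x}")
```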

In looking for the origins of the gap-rounding code I eventually found commit f0eca9626c6becb6fc56106b2e4287c6c784af3d from 2005-09-09:

[PATCH] Update PCI IOMEM allocation start
    
    This fixes the problem with "Averatec 6240 pcmcia_socket0: unable to
    apply power", which was due to the CardBus IOMEM register region being
    allocated at an address that was actually inside the RAM window that had
    been reserved for video frame-buffers in an UMA setup.

This introduces a simple 'rounding up' algorithm to create a 'gap' between top of system RAM and beginning of PCI IOMEM as a guard against unintentional over-writes.
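The rounding can be reproduced in a few lines. This sketch is reconstructed from my reading of the x86 e820 gap-setup code of that period, so treat the exact loop as an assumption. Fed the 3GB gap from the log line above, it yields 0xc2000000, matching the "starting at c2000000" message. Steven's later log in comment 12 ("starting at e0000000 (gap: e0000000:18000000)") suggests newer kernels no longer apply this rounding, since the function would give 0xe2000000 for that gap.

```python
def rounded_pci_mem_start(gap_start, gap_size):
    """Return the first multiple of round_to strictly above gap_start."""
    round_to = 0x100000                # start by rounding to the next 1MB
    while (gap_size >> 4) > round_to:  # grow until at least 1/16 of the gap
        round_to += round_to
    # Two's-complement trick: & -round_to clears the low bits.
    return (gap_start + round_to) & -round_to


print(hex(rounded_pci_mem_start(0xC0000000, 0x20000000)))  # 0xc2000000 (3GB gap)
print(hex(rounded_pci_mem_start(0xE0000000, 0x18000000)))  # 0xe2000000 (Steven's gap)
```

For the 3GB gap of 512MB the rounding grows to 32MB, which is exactly the 0xC0000000-0xC1FFFFFF guard band that makes the 256MB slot at 0xC0000000 unusable.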

The algorithm used was suggested in an example by Linus Torvalds with some provisos but was adopted verbatim in the patch for the Averatec bug. In his email, Linus went on to say:

"The other alternative is to make PCI allocations generally start at the
high range of the allowable - judging by the lspci listings I've seem from
people under Windows, that seems to be what Windows does, which might be a
good idea (ie the closer we match windows allocation patterns, the more
likely we're to not hit some unmarked region - because windows testing
would have hit it too)."

See: http://lists.infradead.org/pipermail/linux-pcmcia/2005-September/002625.html

That comment reflects my findings in dealing with this bug. Looking at the bug, there are four issues:

1. No 256MB region on a 256MB boundary available for the GFX IOMEM in the single largest PCI IOMEM region.

2. The first available 256MB region on a 256MB boundary is unusable because pci_mem_start is being 'rounded up' to gap_start + round.

3. Multiple gaps higher in the address space are left unused whereas Windows uses them for smaller allocations thus keeping the largest gap free for the devices with large requirements.

4. Resources aren't being allocated top-down (subtractive decode) as recommended in PCI specs and Intel chipset datasheets, and done by Windows.

If [3] was implemented in addition to [4] the smaller allocations would be at the top of the 32-bit address space much like Windows.

Implementing [3] and [4] together should avoid the need for commit f0eca962 (Cardbus IOMEM in shared video RAM space) since the Cardbus IOMEM would be in a 'high' gap (as it would be with Windows).

Dropping commit f0eca962 would solve [2] since the GFX could allocate 256MB on the 256MB boundary at 0xC0000000 in the largest gap.
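Issue [2] can be checked directly: search for a free 256MB-aligned slot between a window start and the MMCONFIG base (0xE0000000), treating the GeForce's own non-prefetchable BARs at 0xD0000000-0xD1FFFFFF (Regions 3 and 0 in the lspci output) as already occupied. A minimal sketch; the helper name is hypothetical and nothing else is modelled as busy.

```python
MB = 1 << 20
BAR_SIZE = 256 * MB
WINDOW_END = 0xE0000000                # MMCONFIG starts here (e820 map)
OCCUPIED = [(0xD0000000, 0xD2000000)]  # GeForce Regions 3 and 0, [start, end)


def first_free_slot(window_start, size=BAR_SIZE):
    """Lowest size-aligned start in [window_start, WINDOW_END) avoiding OCCUPIED."""
    start = -(-window_start // size) * size
    while start + size <= WINDOW_END:
        if all(start + size <= lo or hi <= start for lo, hi in OCCUPIED):
            return start
        start += size
    return None


for label, window_start in (("rounded (f0eca962)", 0xC2000000), ("gap start", 0xC0000000)):
    slot = first_free_slot(window_start)
    print(f"{label}: {'no slot' if slot is None else hex(slot)}")
```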

There might be an issue if a system has an undeclared shared video memory region *and* another PCI device that needs a large allocation.

Also, Linus' mention of maintaining an unused gap between top-of-RAM and bottom-of-PCI-IOMEM needs to be considered. Would implementation of [2] and [3] negate the need for it? 
Windows doesn't maintain a similar gap - is there a reason that Linux should?
Comment 7 Bjorn Helgaas 2011-05-27 16:36:34 UTC
Is this still a problem, or should we close it?  If we still need to work on it, a dmesg log from a current kernel, e.g., 2.6.39, would be helpful.
Comment 8 TJ 2011-05-28 15:49:56 UTC
Yes it is. Thanks for visiting Bjorn, I tried to email you on this topic after seeing your commits/reverts but your old email was no longer valid. I've resent my email to your new address.
Comment 9 TJ 2011-05-29 10:59:17 UTC
Created attachment 59962 [details]
Four dmesg logs for different boot scenarios

Four 2.6.38 dmesg logs attached, all with "use_crs":

3GB dock
3GB nodock
2GB dock
2GB nodock
Comment 10 Bjorn Helgaas 2011-06-14 23:59:37 UTC
Oops, the option you need is "pci=use_crs", not "use_crs".

  [mem 0xc0000000-0xfebfffff]    <== host bridge window from _CRS
    [mem 0xc0000000-0xdfffffff]  <== available per default PCI gap
    [mem 0xe0000000-0xefffffff]  <== MMCONFIG space, not available
    [mem 0xf0000000-0xfebfffff]  <== available only with "pci=use_crs"

That will open up the last region, which will help, but it may not be enough.  Give it a try and attach just the 3GB dmesg if it still doesn't work.
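Bjorn's breakdown can be checked with a little arithmetic: removing the MMCONFIG range from the _CRS window leaves two pieces, and only the lower one contains 256MB-aligned slots. So "pci=use_crs" adds 236MB, but not in a shape a naturally aligned 256MB BAR can use. A sketch using inclusive ends as printed in the log; helper names are hypothetical.

```python
MB = 1 << 20


def subtract(window, hole):
    """Remove an inclusive hole from an inclusive window; assumes hole lies inside window."""
    (lo, hi), (h_lo, h_hi) = window, hole
    pieces = []
    if h_lo > lo:
        pieces.append((lo, h_lo - 1))
    if h_hi < hi:
        pieces.append((h_hi + 1, hi))
    return pieces


def aligned_slots(piece, size):
    """Size-aligned starts whose whole range fits inside the inclusive piece."""
    lo, hi = piece
    start = -(-lo // size) * size
    slots = []
    while start + size - 1 <= hi:
        slots.append(start)
        start += size
    return slots


crs_window = (0xC0000000, 0xFEBFFFFF)
mmconfig = (0xE0000000, 0xEFFFFFFF)
for piece in subtract(crs_window, mmconfig):
    size_mb = (piece[1] - piece[0] + 1) // MB
    slots = [hex(s) for s in aligned_slots(piece, 256 * MB)]
    print(f"{piece[0]:#x}-{piece[1]:#x}: {size_mb}MB, 256MB slots {slots}")
```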
Comment 11 Bjorn Helgaas 2011-07-06 20:33:25 UTC
Ping!  Any chance you could try this?  (I'll be on vacation until July 12, but I can look at this again then.)
Comment 12 Steven Newbury 2012-04-09 00:17:04 UTC
I have a closely related problem:

Dell Latitude D830; 4GB RAM, 64 bit, Intel GM965 chipset

D/Dock docking station with Radeon HD5430 via PEX8112 PCIe-to-PCI Bridge

There simply isn't enough address space reserved below 4GB to accommodate the GFX card memory; why can't the space above 0x120000000 be utilised?

BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009f000 (usable)
 BIOS-e820: 000000000009f000 - 00000000000a0000 (reserved)
 BIOS-e820: 0000000000100000 - 00000000df65a800 (usable)
 BIOS-e820: 00000000df65a800 - 00000000e0000000 (reserved)
 BIOS-e820: 00000000f8000000 - 00000000fc000000 (reserved)
 BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
 BIOS-e820: 00000000fed18000 - 00000000fed1c000 (reserved)
 BIOS-e820: 00000000fed20000 - 00000000fed90000 (reserved)
 BIOS-e820: 00000000feda0000 - 00000000feda6000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee10000 (reserved)
 BIOS-e820: 00000000ffe00000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 0000000120000000 (usable)
...
Allocating PCI resources starting at e0000000 (gap: e0000000:18000000)
...
pci 0000:00:1c.1:   bridge window [mem 0xf6c00000-0xf6cfffff]
pci 0000:00:1c.1:   bridge window [mem 0xf0400000-0xf05fffff 64bit pref]
pci 0000:00:1c.3: PCI bridge to [bus 0d-0e]
pci 0000:00:1c.3:   bridge window [io  0xd000-0xdfff]
pci 0000:00:1c.3:   bridge window [mem 0xf6a00000-0xf6bfffff]
pci 0000:00:1c.3:   bridge window [mem 0xf0600000-0xf07fffff 64bit pref]
pci 0000:00:1c.5: PCI bridge to [bus 09-09]
pci 0000:00:1c.5:   bridge window [io  0x4000-0x4fff]
pci 0000:00:1c.5:   bridge window [mem 0xf6900000-0xf69fffff]
pci 0000:00:1c.5:   bridge window [mem 0xf0800000-0xf09fffff 64bit pref]
pci 0000:03:08.0: BAR 15: can't assign mem pref (size 0x10000000)
pci 0000:04:00.0: BAR 0: can't assign mem pref (size 0x10000000)
pci 0000:04:00.0: BAR 0: trying firmware assignment [mem 0xe0000000-0xefffffff 64bit pref]
pci 0000:04:00.0: BAR 0: [mem 0xe0000000-0xefffffff 64bit pref] conflicts with PCI Bus 0000:00 [mem 0xe0000000-0xf7ffffff]
pci 0000:03:08.0: PCI bridge to [bus 04-04]
pci 0000:03:08.0:   bridge window [io  0xc000-0xcfff]
pci 0000:03:08.0:   bridge window [mem 0xf6700000-0xf68fffff]
pci 0000:00:1e.0: PCI bridge to [bus 03-04]
pci 0000:00:1e.0:   bridge window [io  0xc000-0xcfff]
pci 0000:00:1e.0:   bridge window [mem 0xf6700000-0xf68fffff]
Comment 13 Steven Newbury 2012-04-09 09:41:00 UTC
From researching this over the weekend I'm certain TJ's PCI-DRAM project would solve my issue.  TJ, what happened to the code release?

The PCI address space allocation algorithm needs to allow for multiple gaps and to use address space above 4GB when available. From Googling, I've found this actually hits a lot of people; there's no need to prevent systems from working in order to enforce 32-bit limits on 64-bit Linux.
Comment 14 Steven Newbury 2012-04-10 10:10:15 UTC
Created attachment 72869 [details]
full dmesg output of failure to allocate PCI resources

Attached dmesg as requested.  This is the output when booting attached to docking station (not hotplug).
Comment 15 Steven Newbury 2012-04-10 10:11:57 UTC
Above is WITH "pci=use_crs".  I'll reboot with pci=nocrs now...
Comment 16 Steven Newbury 2012-04-10 10:18:34 UTC
Created attachment 72870 [details]
full dmesg output of failure to allocate PCI resources with pci=nocrs

Again, with pci=nocrs.
Comment 17 Steven Newbury 2012-06-01 16:05:23 UTC
Created attachment 73490 [details]
full dmesg output of working setup

On request from Bjorn Helgaas here's the full dmesg of the working patched kernel.  System is booted undocked, then docked, radeon comes up as expected.  If the system is booted while docked it's the same except the DRM devices are reversed card0<->card1.  Everything still works.
Comment 18 Steven Newbury 2012-06-01 16:08:03 UTC
Created attachment 73491 [details]
iomem output for working system

I've also attached the /proc/iomem output for the above.
Comment 19 Bjorn Helgaas 2012-08-31 15:02:17 UTC
The working dmesg Steven posted in comment #17 shows that the BIOS advertises no host bridge windows above 4GB:

pci_root PNP0A03:00: host bridge window [mem 0x000a0000-0x000bffff]
pci_root PNP0A03:00: host bridge window [mem 0x000d0000-0x000dffff]
pci_root PNP0A03:00: host bridge window [mem 0xe0000000-0xf7ffffff]
pci_root PNP0A03:00: host bridge window [mem 0xfc000000-0xfebfffff]
pci_root PNP0A03:00: host bridge window [mem 0xfec10000-0xfecfffff]
pci_root PNP0A03:00: host bridge window [mem 0xfed1c000-0xfed1ffff]
pci_root PNP0A03:00: host bridge window [mem 0xfed90000-0xfed9ffff]
pci_root PNP0A03:00: host bridge window [mem 0xfed40000-0xfed44fff]
pci_root PNP0A03:00: host bridge window [mem 0xfeda7000-0xfedfffff]
pci_root PNP0A03:00: host bridge window [mem 0xfee10000-0xff9fffff]
pci_root PNP0A03:00: host bridge window [mem 0xffc00000-0xffdfffff]

Obviously, some of that space above 4GB does work, but without the _CRS information from the BIOS, we really can't use it safely.
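This can be checked mechanically from the host bridge window lines above: none of them reaches 4GB, and the only window that can hold a naturally aligned 256MB BAR (the Radeon's size, 0x10000000 in comment 12) has exactly one such slot, at 0xe0000000, the same address as the failed firmware assignment in comment 12. A sketch; the regular expression only handles the "[mem 0x...-0x...]" form shown here.

```python
import re

# The eleven host bridge window lines from Steven's working dmesg (comment 17).
WINDOW_LOG = """\
pci_root PNP0A03:00: host bridge window [mem 0x000a0000-0x000bffff]
pci_root PNP0A03:00: host bridge window [mem 0x000d0000-0x000dffff]
pci_root PNP0A03:00: host bridge window [mem 0xe0000000-0xf7ffffff]
pci_root PNP0A03:00: host bridge window [mem 0xfc000000-0xfebfffff]
pci_root PNP0A03:00: host bridge window [mem 0xfec10000-0xfecfffff]
pci_root PNP0A03:00: host bridge window [mem 0xfed1c000-0xfed1ffff]
pci_root PNP0A03:00: host bridge window [mem 0xfed90000-0xfed9ffff]
pci_root PNP0A03:00: host bridge window [mem 0xfed40000-0xfed44fff]
pci_root PNP0A03:00: host bridge window [mem 0xfeda7000-0xfedfffff]
pci_root PNP0A03:00: host bridge window [mem 0xfee10000-0xff9fffff]
pci_root PNP0A03:00: host bridge window [mem 0xffc00000-0xffdfffff]
"""

WINDOW_RE = re.compile(r"\[mem (0x[0-9a-f]+)-(0x[0-9a-f]+)\]")
BAR_SIZE = 0x10000000  # 256MB


def windows(log):
    """Parse inclusive (start, end) pairs from host bridge window lines."""
    return [(int(lo, 16), int(hi, 16)) for lo, hi in WINDOW_RE.findall(log)]


def slots(window, size=BAR_SIZE):
    """Size-aligned starts whose whole range fits inside the inclusive window."""
    lo, hi = window
    start = -(-lo // size) * size
    found = []
    while start + size - 1 <= hi:
        found.append(start)
        start += size
    return found


parsed = windows(WINDOW_LOG)
print("any window above 4GB:", any(hi >= 1 << 32 for _, hi in parsed))
for lo, hi in parsed:
    if slots((lo, hi)):
        print(f"{lo:#x}-{hi:#x}: {[hex(s) for s in slots((lo, hi))]}")
```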

If Windows works in this configuration, it's probably because Windows uses a different allocation strategy that packs the resources better into the space below 4GB.  An AIDA64 report would tell us about that.  I've never seen Windows allocate space outside the host bridge windows, so I'd be very surprised if Windows would use space above 4GB on this machine.

If Dell claims to support this config and if Windows works, we might be able to figure out how to make Linux do the same thing. Otherwise, my opinion is that Steven's issue is unfixable.
Comment 20 Bjorn Helgaas 2012-08-31 15:05:18 UTC
TJ, your logs in comment #9 are with "use_crs", which does nothing.  Can you try a current kernel with "pci=use_crs"?
Comment 21 TJ 2012-09-02 06:02:44 UTC
Created attachment 79031 [details]
4 log files from v3.6rc4

This bug morphed from the "64-bit BAR mapped above 4GB on 32-bit northbridge" since it was only in the earliest reports for older kernels (2.6.2x) that the mapping showed the attempt to go above the 4GB boundary.

In the later kernels (e.g. 2.6.38) that never happens; it always tries to put the VRAM at 0xB0000000 and fails when 3GB RAM is installed. I'm not clear why that is - there's nothing in BIOS that allows the user to influence the placement of resources.

So this bug, for me, morphed as the real issue became apparent and maybe should be re-titled in view of the research retained here, to "PCI resource assignments fail due to poor allocation strategy".

Attached are logs from v3.6rc4. Issue is the same as always. Linux does a poor job of organising allocations compared to Windows - it's more than 10 years behind.
Comment 22 Wei Yang 2013-06-28 08:11:17 UTC
All,

I am trying to look into this problem. 

From reading the comment log, it seems the kernel has allocated some space to the nvidia device, but the driver says this space is not suitable. Am I right?

Here is a snippet from the attachment. 

[    1.339074] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
[    1.339074] NVRM: BAR1 is 0M @ 0x0 (PCI:0000:01:00.0)
[    1.339165] NVRM: The system BIOS may have misconfigured your GPU.
[    1.339240] nvidia: probe of 0000:01:00.0 failed with error -1
[    1.339312] NVRM: The NVIDIA probe routine failed for 1 device(s).
[    1.339364] NVRM: None of the NVIDIA graphics adapters were initialized!

Is this printed by the proprietary nvidia driver?
Comment 23 Wei Yang 2013-07-01 14:01:21 UTC
All, 

I took another look at the logs and am somewhat confused.

The values read from the BARs are the same in both cases.

=======value read from BAR=============
[    0.357566] pci 0000:01:00.0: [10de:0398] type 00 class 0x030000
[    0.357582] pci 0000:01:00.0: reg 10: [mem 0xd5000000-0xd5ffffff]
[    0.357600] pci 0000:01:00.0: reg 14: [mem 0xb0000000-0xbfffffff 64bit pref]
[    0.357618] pci 0000:01:00.0: reg 1c: [mem 0xd4000000-0xd4ffffff 64bit]
[    0.357631] pci 0000:01:00.0: reg 24: [io  0x2000-0x207f]
[    0.357643] pci 0000:01:00.0: reg 30: [mem 0x00000000-0x0001ffff pref]

But the root bus resource apertures differ between 2G and 3G. Look at the last two lines of each: with 3G the root bridge has less space, shrinking from roughly 2G (0x80000000-0xfebfffff) to roughly 1G (0xc0000000-0xfebfffff).
=======2G=============
[    0.341054] pci_bus 0000:00: root bus resource [bus 00-ff]
[    0.341057] pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7]
[    0.341060] pci_bus 0000:00: root bus resource [io  0x0d00-0xffff]
[    0.341062] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff]
[    0.341065] pci_bus 0000:00: root bus resource [mem 0x000d0000-0x000d3fff]
[    0.341067] pci_bus 0000:00: root bus resource [mem 0x000d4000-0x000d7fff]
[    0.341070] pci_bus 0000:00: root bus resource [mem 0x000d8000-0x000dbfff]
[    0.341073] pci_bus 0000:00: root bus resource [mem 0x80000000-0xfebfffff]
[    0.341075] pci_bus 0000:00: root bus resource [mem 0xfed40000-0xfed44fff]

=======3G============= 
[    0.353316] pci_bus 0000:00: root bus resource [bus 00-ff]
[    0.353378] pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7]
[    0.353440] pci_bus 0000:00: root bus resource [io  0x0d00-0xffff]
[    0.353503] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff]
[    0.353566] pci_bus 0000:00: root bus resource [mem 0x000d0000-0x000d3fff]
[    0.353630] pci_bus 0000:00: root bus resource [mem 0x000d4000-0x000d7fff]
[    0.353694] pci_bus 0000:00: root bus resource [mem 0x000d8000-0x000dbfff]
[    0.353758] pci_bus 0000:00: root bus resource [mem 0xc0000000-0xfebfffff]
[    0.353821] pci_bus 0000:00: root bus resource [mem 0xfed40000-0xfed44fff]

As I understand it, on x86 the kernel first tries to claim the BIOS-assigned value with pci_claim_resource(). With 3G this fails because the root bus no longer covers this range.

Since that fails, the kernel resets the resource and tries to allocate it later.

Here is another log:
==============fall back to BIOS value===============
[    0.415338] pci 0000:01:00.0: BAR 1: can't assign mem pref (size 0x10000000)
[    0.415393] pci 0000:01:00.0: BAR 1: trying firmware assignment [mem 0xb0000000-0xbfffffff 64bit pref]
[    0.415457] pci 0000:01:00.0: BAR 1: [mem 0xb0000000-0xbfffffff 64bit pref] conflicts with System RAM [mem 0x00100000-0xbfe8ffff]

This log looks like the kernel tried to assign it and failed, then fell back to the BIOS value, which conflicts with System RAM. So I guess there is no free mem resource left?
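The containment and conflict checks described here can be reproduced from the numbers in these logs. A sketch using inclusive (start, end) pairs as printed by the kernel; it only compares ranges and does not model pci_claim_resource() itself.

```python
BAR1 = (0xB0000000, 0xBFFFFFFF)     # reg 14, the value left by the BIOS
ROOT_2G = (0x80000000, 0xFEBFFFFF)  # root bus resource with 2G
ROOT_3G = (0xC0000000, 0xFEBFFFFF)  # root bus resource with 3G
RAM_3G = (0x00100000, 0xBFE8FFFF)   # System RAM with 3G


def contains(outer, inner):
    """True if the inclusive range inner lies entirely inside outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def overlaps(a, b):
    """True if two inclusive ranges share at least one address."""
    return a[0] <= b[1] and b[0] <= a[1]


print("2G: BAR1 inside root window:", contains(ROOT_2G, BAR1))
print("3G: BAR1 inside root window:", contains(ROOT_3G, BAR1))
print("3G: BAR1 overlaps System RAM:", overlaps(BAR1, RAM_3G))
```

With 3G the BIOS value falls outside the root bus aperture and inside System RAM, which is consistent with both the claim failure and the "conflicts with System RAM" message.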

Could TJ provide the /proc/iomem?