Bug 24392

Summary: AGP aperture disabled, worked in 2.6.35
Product: Drivers Reporter: Stephen Kitt (steve)
Component: Video(AGP)Assignee: Dave Airlie (airlied)
Status: CLOSED CODE_FIX    
Severity: normal CC: bjorn.helgaas, florian, maciej.rutecki, rjw
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.36 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 16444    
Attachments: dmesg booting 2.6.35 with the AGP aperture enabled
dmesg booting 2.6.36 with the AGP aperture disabled
dmesg booting 2.6.36 with the AGP aperture enabled

Description Stephen Kitt 2010-12-06 06:31:47 UTC
Created attachment 39082 [details]
dmesg booting 2.6.35 with the AGP aperture enabled

Hi,

Since upgrading to 2.6.36 I've noticed that the AGP aperture is disabled... My system is an old 32-bit-only Pentium 4 with a fair amount of devices, which means address space is tight, but everything does fit. I've tried reducing the size of the aperture, but that doesn't change anything; the attached logs correspond to a 128MB aperture.

If there's anything I can do to help figure out what's going wrong, feel free to ask!

Regards,

Stephen
Comment 1 Stephen Kitt 2010-12-06 06:32:26 UTC
Created attachment 39092 [details]
dmesg booting 2.6.36 with the AGP aperture disabled
Comment 2 Stephen Kitt 2010-12-06 10:05:42 UTC
I'm guessing this is due to http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=96576a9e1a0cdb8a43d3af5846be0948f52b4460 - I'll try to check this evening (CET).
Comment 3 Stephen Kitt 2010-12-06 21:33:41 UTC
Created attachment 39202 [details]
dmesg booting 2.6.36 with the AGP aperture enabled

I can confirm that the above commit is at fault - reverting it fixes the bug. See the attached dmesg for the log booting this patched version, with the AGP aperture enabled as in 2.6.35!
Comment 4 Rafael J. Wysocki 2010-12-07 23:32:56 UTC
First-Bad-Commit : 96576a9e1a0cdb8a43d3af5846be0948f52b4460
Comment 5 Bjorn Helgaas 2010-12-20 19:17:02 UTC
From broken dmesg (https://bugzilla.kernel.org/attachment.cgi?id=39092):

  pci 0000:00:00.0: reg 10: [mem 0xf0000000-0xf7ffffff pref]
  pci 0000:00:1f.1: reg 24: [mem 0x00000000-0x000003ff]
  pci 0000:00:1e.0: PCI bridge to [bus 02-02] (subtractive decode)
  pci 0000:00:1e.0:   bridge window [mem 0xef000000-0xfbffffff]
  pci 0000:00:00.0: address space collision: [mem 0xf0000000-0xf7ffffff pref] conflicts with PCI Bus 0000:02 [mem 0xef000000-0xfbffffff]
  pci 0000:00:1f.1: BAR 5: assigned [mem 0xc0000000-0xc00003ff]
  agpgart-intel 0000:00:00.0: device not available (can't reserve [mem 0x00000000-0x07ffffff pref])

The conflict between the AGP 00:00.0 BAR 0 and the 00:1e.0 bridge window
looks real, so reassigning the AGP BAR looks like the right thing to do.
We assign 00:1f.1 BAR 5 in the pcibios_assign_resources() path, and
I think we would assign the AGP BAR there, too, except that the class
code of 00:00.0 is probably 0x000600 (PCI_CLASS_BRIDGE_HOST), and we
explicitly ignore host bridges in __dev_sort_resources(), so it's up
to the driver to catch this and assign it explicitly before calling
pci_enable_device().

That's kind of ugly because it's an exception to the normal "call
pci_enable_device() first" rule, and it leads to bogus "conflicts"
like this:

  pnp 00:0d: disabling [mem 0x00000000-0x0009ffff] because it overlaps 0000:00:00.0 BAR 0 [mem 0x00000000-0x07ffffff pref]

but I think we're stuck with it for now, and the simplest solution
is to just revert 96576a9e1a.

Here's the old commit that made us ignore host bridge BARs:
  http://git.kernel.org/?p=linux/kernel/git/torvalds/old-2.6-bkcvs.git;a=commitdiff;h=1d6f81a248eb2febbe24892fa4d54db382a1286c

  2004/12/17 13:44:31-08:00 macro
  [PATCH] PCI: Don't touch BARs of host bridges

   BARs of host bridges often have special meaning and AFAIK are best left
  to be setup by the firmware or system-specific startup code and kept
  intact by the generic resource handler.  For example a couple of host
  bridges used for MIPS processors interpret BARs as target-mode decoders
  for accessing host memory by PCI masters (which is quite reasonable).
  For them it's desirable to keep their decoded address range overlapping
  with the host RAM for simplicity if nothing else (I can imagine running
  out of address space with lots of memory and 32-bit PCI with no DAC
  support in the participating devices).

   This is already the case with the i386 and ppc platform-specific PCI
  resource allocators.  Please consider the following change for the generic
  allocator.  Currently we have a pile of hacks implemented for host bridges
  to be left untouched and I'd be pleased to remove them.

  From: "Maciej W. Rozycki" <macro@mips.com>
  Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>
Comment 6 Rafael J. Wysocki 2010-12-20 20:58:35 UTC
On Monday, December 20, 2010, Bjorn Helgaas wrote:
> > Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=24392
> > First-Bad-Commit:
> http://git.kernel.org/linus/96576a9e1a0cdb8a43d3af5846be0948f52b4460
> 
> From broken dmesg (https://bugzilla.kernel.org/attachment.cgi?id=39092):
> 
>   pci 0000:00:00.0: reg 10: [mem 0xf0000000-0xf7ffffff pref]
>   pci 0000:00:1f.1: reg 24: [mem 0x00000000-0x000003ff]
>   pci 0000:00:1e.0: PCI bridge to [bus 02-02] (subtractive decode)
>   pci 0000:00:1e.0:   bridge window [mem 0xef000000-0xfbffffff]
>   pci 0000:00:00.0: address space collision: [mem 0xf0000000-0xf7ffffff pref]
>   conflicts with PCI Bus 0000:02 [mem 0xef000000-0xfbffffff]
>   pci 0000:00:1f.1: BAR 5: assigned [mem 0xc0000000-0xc00003ff]
>   agpgart-intel 0000:00:00.0: device not available (can't reserve [mem
>   0x00000000-0x07ffffff pref])
> 
> The conflict between the AGP 00:00.0 BAR 0 and the 00:1e.0 bridge window
> looks real, so reassigning the AGP BAR looks like the right thing to do.
> We assign 00:1f.1 BAR 5 in the pcibios_assign_resources() path, and
> I think we would assign the AGP BAR there, too, except that the class
> code of 00:00.0 is probably 0x000600 (PCI_CLASS_BRIDGE_HOST), and we
> explicitly ignore host bridges in __dev_sort_resources(), so it's up
> to the driver to catch this and assign it explicitly before calling
> pci_enable_device().
> 
> That's kind of ugly because it's an exception to the normal "call
> pci_enable_device() first" rule, and it leads to bogus "conflicts"
> like this:
> 
>   pnp 00:0d: disabling [mem 0x00000000-0x0009ffff] because it overlaps
>   0000:00:00.0 BAR 0 [mem 0x00000000-0x07ffffff pref]
> 
> but I think we're stuck with it for now, and the simplest solution
> is to just revert 96576a9e1a.
> 
> Here's the old commit that made us ignore host bridge BARs:
>  
>   http://git.kernel.org/?p=linux/kernel/git/torvalds/old-2.6-bkcvs.git;a=commitdiff;h=1d6f81a248eb2febbe24892fa4d54db382a1286c
> 
>   2004/12/17 13:44:31-08:00 macro
>   [PATCH] PCI: Don't touch BARs of host bridges
> 
>    BARs of host bridges often have special meaning and AFAIK are best left
>   to be setup by the firmware or system-specific startup code and kept
>   intact by the generic resource handler.  For example a couple of host
>   bridges used for MIPS processors interpret BARs as target-mode decoders
>   for accessing host memory by PCI masters (which is quite reasonable).
>   For them it's desirable to keep their decoded address range overlapping
>   with the host RAM for simplicity if nothing else (I can imagine running
>   out of address space with lots of memory and 32-bit PCI with no DAC
>   support in the participating devices).
> 
>    This is already the case with the i386 and ppc platform-specific PCI
>   resource allocators.  Please consider the following change for the generic
>   allocator.  Currently we have a pile of hacks implemented for host bridges
>   to be left untouched and I'd be pleased to remove them.
> 
>   From: "Maciej W. Rozycki" <macro@mips.com>
>   Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>
> --
Comment 7 Florian Mickler 2011-01-30 21:37:00 UTC
Stephen, is this bug still present in current mainline kernels? (2.6.37 or 2.6.38-rc2?)
Comment 8 Stephen Kitt 2011-01-30 22:57:35 UTC
Florian, thanks for taking an interest in this bug!

It is still present in 2.6.37, and the same fix still applies.

Should I send a patch? (Basically, reverting the patch at fault, and adding a comment explaining the apparently incorrect order...)
Comment 9 Florian Mickler 2011-01-31 12:16:59 UTC
Yes, please send the patch with an appropriate description to lkml and Dave Airlie  and put Andrew Morton (akpm@linux-foundation.org), as well as Kulikov Vasiliy and Bjorn Helgaas on the cc list.
Comment 10 Stephen Kitt 2011-01-31 14:30:38 UTC
I've sent a patch, see https://lkml.org/lkml/2011/1/31/212
Comment 11 Florian Mickler 2011-01-31 22:36:35 UTC
Good.

Patch: https://lkml.org/lkml/2011/1/31/212
Comment 12 Florian Mickler 2011-02-19 23:47:20 UTC
merged in .38-rc4: 
commit a70b95c017e8b518e1e069853355e4e497453dbb
Author: Stephen Kitt <steve@sk2.org>
Date:   Mon Jan 31 14:25:43 2011 -0800

    agp: ensure GART has an address before enabling it