Bug 24392
Summary: | AGP aperture disabled, worked in 2.6.35 | ||
---|---|---|---|
Product: | Drivers | Reporter: | Stephen Kitt (steve) |
Component: | Video(AGP) | Assignee: | Dave Airlie (airlied) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | bjorn.helgaas, florian, maciej.rutecki, rjw |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.36 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 16444 | ||
Attachments: |
dmesg booting 2.6.35 with the AGP aperture enabled
dmesg booting 2.6.36 with the AGP aperture disabled dmesg booting 2.6.36 with the AGP aperture enabled |
Created attachment 39092 [details]
dmesg booting 2.6.36 with the AGP aperture disabled
I'm guessing this is due to http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=96576a9e1a0cdb8a43d3af5846be0948f52b4460 - I'll try to check this evening (CET). Created attachment 39202 [details]
dmesg booting 2.6.36 with the AGP aperture enabled
I can confirm that the above commit is at fault - reverting it fixes the bug. See the attached dmesg for the log booting this patched version, with the AGP aperture enabled as in 2.6.35!
First-Bad-Commit : 96576a9e1a0cdb8a43d3af5846be0948f52b4460 From broken dmesg (https://bugzilla.kernel.org/attachment.cgi?id=39092): pci 0000:00:00.0: reg 10: [mem 0xf0000000-0xf7ffffff pref] pci 0000:00:1f.1: reg 24: [mem 0x00000000-0x000003ff] pci 0000:00:1e.0: PCI bridge to [bus 02-02] (subtractive decode) pci 0000:00:1e.0: bridge window [mem 0xef000000-0xfbffffff] pci 0000:00:00.0: address space collision: [mem 0xf0000000-0xf7ffffff pref] conflicts with PCI Bus 0000:02 [mem 0xef000000-0xfbffffff] pci 0000:00:1f.1: BAR 5: assigned [mem 0xc0000000-0xc00003ff] agpgart-intel 0000:00:00.0: device not available (can't reserve [mem 0x00000000-0x07ffffff pref]) The conflict between the AGP 00:00.0 BAR 0 and the 00:1e.0 bridge window looks real, so reassigning the AGP BAR looks like the right thing to do. We assign 00:1f.1 BAR 5 in the pcibios_assign_resources() path, and I think we would assign the AGP BAR there, too, except that the class code of 00:00.0 is probably 0x000600 (PCI_CLASS_BRIDGE_HOST), and we explicitly ignore host bridges in __dev_sort_resources(), so it's up to the driver to catch this and assign it explicitly before calling pci_enable_device(). That's kind of ugly because it's an exception to the normal "call pci_enable_device() first" rule, and it leads to bogus "conflicts" like this: pnp 00:0d: disabling [mem 0x00000000-0x0009ffff] because it overlaps 0000:00:00.0 BAR 0 [mem 0x00000000-0x07ffffff pref] but I think we're stuck with it for now, and the simplest solution is to just revert 96576a9e1a. Here's the old commit that made us ignore host bridge BARs: http://git.kernel.org/?p=linux/kernel/git/torvalds/old-2.6-bkcvs.git;a=commitdiff;h=1d6f81a248eb2febbe24892fa4d54db382a1286c 2004/12/17 13:44:31-08:00 macro [PATCH] PCI: Don't touch BARs of host bridges BARs of host bridges often have special meaning and AFAIK are best left to be setup by the firmware or system-specific startup code and kept intact by the generic resource handler. For example a couple of host bridges used for MIPS processors interpret BARs as target-mode decoders for accessing host memory by PCI masters (which is quite reasonable). For them it's desirable to keep their decoded address range overlapping with the host RAM for simplicity if nothing else (I can imagine running out of address space with lots of memory and 32-bit PCI with no DAC support in the participating devices). This is already the case with the i386 and ppc platform-specific PCI resource allocators. Please consider the following change for the generic allocator. Currently we have a pile of hacks implemented for host bridges to be left untouched and I'd be pleased to remove them. From: "Maciej W. Rozycki" <macro@mips.com> Signed-off-by: Greg Kroah-Hartman <greg@kroah.com> On Monday, December 20, 2010, Bjorn Helgaas wrote:
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=24392
> > First-Bad-Commit:
> http://git.kernel.org/linus/96576a9e1a0cdb8a43d3af5846be0948f52b4460
>
> From broken dmesg (https://bugzilla.kernel.org/attachment.cgi?id=39092):
>
> pci 0000:00:00.0: reg 10: [mem 0xf0000000-0xf7ffffff pref]
> pci 0000:00:1f.1: reg 24: [mem 0x00000000-0x000003ff]
> pci 0000:00:1e.0: PCI bridge to [bus 02-02] (subtractive decode)
> pci 0000:00:1e.0: bridge window [mem 0xef000000-0xfbffffff]
> pci 0000:00:00.0: address space collision: [mem 0xf0000000-0xf7ffffff pref]
> conflicts with PCI Bus 0000:02 [mem 0xef000000-0xfbffffff]
> pci 0000:00:1f.1: BAR 5: assigned [mem 0xc0000000-0xc00003ff]
> agpgart-intel 0000:00:00.0: device not available (can't reserve [mem
> 0x00000000-0x07ffffff pref])
>
> The conflict between the AGP 00:00.0 BAR 0 and the 00:1e.0 bridge window
> looks real, so reassigning the AGP BAR looks like the right thing to do.
> We assign 00:1f.1 BAR 5 in the pcibios_assign_resources() path, and
> I think we would assign the AGP BAR there, too, except that the class
> code of 00:00.0 is probably 0x000600 (PCI_CLASS_BRIDGE_HOST), and we
> explicitly ignore host bridges in __dev_sort_resources(), so it's up
> to the driver to catch this and assign it explicitly before calling
> pci_enable_device().
>
> That's kind of ugly because it's an exception to the normal "call
> pci_enable_device() first" rule, and it leads to bogus "conflicts"
> like this:
>
> pnp 00:0d: disabling [mem 0x00000000-0x0009ffff] because it overlaps
> 0000:00:00.0 BAR 0 [mem 0x00000000-0x07ffffff pref]
>
> but I think we're stuck with it for now, and the simplest solution
> is to just revert 96576a9e1a.
>
> Here's the old commit that made us ignore host bridge BARs:
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/old-2.6-bkcvs.git;a=commitdiff;h=1d6f81a248eb2febbe24892fa4d54db382a1286c
>
> 2004/12/17 13:44:31-08:00 macro
> [PATCH] PCI: Don't touch BARs of host bridges
>
> BARs of host bridges often have special meaning and AFAIK are best left
> to be setup by the firmware or system-specific startup code and kept
> intact by the generic resource handler. For example a couple of host
> bridges used for MIPS processors interpret BARs as target-mode decoders
> for accessing host memory by PCI masters (which is quite reasonable).
> For them it's desirable to keep their decoded address range overlapping
> with the host RAM for simplicity if nothing else (I can imagine running
> out of address space with lots of memory and 32-bit PCI with no DAC
> support in the participating devices).
>
> This is already the case with the i386 and ppc platform-specific PCI
> resource allocators. Please consider the following change for the generic
> allocator. Currently we have a pile of hacks implemented for host bridges
> to be left untouched and I'd be pleased to remove them.
>
> From: "Maciej W. Rozycki" <macro@mips.com>
> Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>
> --
Stephen, is this bug still present in current mainline kernels? (2.6.37 or 2.6.38-rc2?) Florian, thanks for taking an interest in this bug! It is still present in 2.6.37, and the same fix still applies. Should I send a patch? (Basically, reverting the patch at fault, and adding a comment explaining the apparently incorrect order...) Yes, please send the patch with an appropriate description to lkml and Dave Airlie and put Andrew Morton (akpm@linux-foundation.org), as well as Kulikov Vasiliy and Bjorn Helgaas on the cc list. I've sent a patch, see https://lkml.org/lkml/2011/1/31/212 Good. Patch: https://lkml.org/lkml/2011/1/31/212 merged in .38-rc4: commit a70b95c017e8b518e1e069853355e4e497453dbb Author: Stephen Kitt <steve@sk2.org> Date: Mon Jan 31 14:25:43 2011 -0800 agp: ensure GART has an address before enabling it |
Created attachment 39082 [details] dmesg booting 2.6.35 with the AGP aperture enabled Hi, Since upgrading to 2.6.36 I've noticed that the AGP aperture is disabled... My system is an old 32-bit-only Pentium 4 with a fair amount of devices, which means address space is tight, but everything does fit. I've tried reducing the size of the aperture, but that doesn't change anything; the attached logs correspond to a 128MB aperture. If there's anything I can do to help figure out what's going wrong, feel free to ask! Regards, Stephen