Bug 15480
| Summary: | [regression] Fails to boot properly unless given pci=nocrs | | |
|---|---|---|---|
| Product: | ACPI | Reporter: | Yanko Kaneti (yaneti) |
| Component: | Config-Other | Assignee: | Bjorn Helgaas (bjorn.helgaas) |
| Status: | CLOSED CODE_FIX | | |
| Severity: | normal | CC: | acpi-bugzilla, bjorn.helgaas, lenb, maciej.rutecki, rjw, rui.zhang |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 2.6.34-rc1 | Subsystem: | |
| Regression: | Yes | Bisected commit-id: | |
| Bug Depends on: | | | |
| Bug Blocks: | 15310 | | |
Attachments:
boot log without using pci=nocrs
lspci -vvnn
debug patch (trim window to remove conflicts)
/proc/iomem
boot log latest mainline git + att. 25460
boot log latest mainline git + att. 25460
2.6.34-0.11.rc1.git1.fc13.x86_64 + patches dmesg
GA-MA78GM-S2H rev 1.0 F11 - acpidump
GA-MA78GM-S2H rev 1.0 F11 - DSDT
2.6.34-0.17.rc2.git1.fc14.x86_64 + workaround + att. 25523 debug
Windows GA-MA78GM-S2H PCI bus resources
truncate _CRS windows with _LEN > _MAX - _MIN + 1
2.6.34-0.17.1.rc2.git1.fc14.x86_64 + attachment 25691 log
Windows _MIN/_MAX/_LEN parsing
Created attachment 25417 [details]
lspci -vvnn
http://bugzilla.kernel.org/show_bug.cgi?id=15480

This is a regression since 2.6.33. Thanks a lot for the report!

  pci_root PNP0A03:00: host bridge window [io 0x0000-0x0cf7]
  pci_root PNP0A03:00: host bridge window [io 0x0d00-0xffff]
  pci_root PNP0A03:00: host bridge window [mem 0x000a0000-0x000bffff]
  pci_root PNP0A03:00: host bridge window [mem 0x000c0000-0x000dffff]
  pci_root PNP0A03:00: host bridge window [mem 0xfed40000-0xfed44fff]
  pci_root PNP0A03:00: can't allocate host bridge window [mem 0xcff00000-0x10ed0ffff]

The last window completely encloses the previous one, which is fine, so the problem must be that something overlaps *part* of that last window.

Please attach /proc/iomem (booted with "pci=nocrs"), and try the attached patch to find out what region conflicts.

If you happen to have Windows on this machine, I'd also like to know what the Device Manager reports about these host bridge resources.

On Tuesday 09 March 2010 01:56:02 pm Bjorn Helgaas wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=15480
>
> This is a regression since 2.6.33. Thanks a lot for the report!
>
> pci_root PNP0A03:00: host bridge window [io 0x0000-0x0cf7]
> pci_root PNP0A03:00: host bridge window [io 0x0d00-0xffff]
> pci_root PNP0A03:00: host bridge window [mem 0x000a0000-0x000bffff]
> pci_root PNP0A03:00: host bridge window [mem 0x000c0000-0x000dffff]
> pci_root PNP0A03:00: host bridge window [mem 0xfed40000-0xfed44fff]
> pci_root PNP0A03:00: can't allocate host bridge window [mem 0xcff00000-0x10ed0ffff]
>
> The last window completely encloses the previous one, which is fine,
> so the problem must be that something overlaps *part* of that last
> window.

My guess is that the conflict is with a System RAM area, possibly one starting at 0x100000000 like the one here:

http://fixunix.com/debian/514784-bug-492865-installation-report-mostly-good-some-gripes-about-partitioning-installer-error-mesgs.html

That feels like a BIOS bug in the host bridge description, since accesses to the conflict area will probably go to RAM, not to PCI.

Can you try the attached debug patch and report the dmesg output? If this makes your system boot, we'll have to think about whether this is the right workaround, and whether and how we'd want to get the conflicting resource out of kernel/resource.c. There are no other interfaces there that return the conflict resource, so maybe there's a reason for keeping them internal.

Bjorn

commit db86a01c1dd7d0a6c18e1b9edd479c1e6a08de93
Author: Bjorn Helgaas <bjorn.helgaas@hp.com>
Date: Tue Mar 9 11:43:17 2010 -0700

diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
index 6e22454..42d8f01 100644
--- a/arch/x86/pci/acpi.c
+++ b/arch/x86/pci/acpi.c
@@ -118,7 +118,7 @@ static acpi_status
 setup_resource(struct acpi_resource *acpi_res, void *data)
 {
 	struct pci_root_info *info = data;
-	struct resource *res;
+	struct resource *res, *conflict;
 	struct acpi_resource_address64 addr;
 	acpi_status status;
 	unsigned long flags;
@@ -157,21 +157,39 @@ setup_resource(struct acpi_resource *acpi_res, void *data)
 		return AE_OK;
 	}
 
-	if (insert_resource(root, res)) {
+	for (;;) {
+		conflict = insert_resource_conflict(root, res);
+		if (!conflict)
+			break;
+
 		dev_err(&info->bridge->dev,
-			"can't allocate host bridge window %pR\n", res);
-	} else {
-		pci_bus_add_resource(info->bus, res, 0);
-		info->res_num++;
-		if (addr.translation_offset)
-			dev_info(&info->bridge->dev, "host bridge window %pR "
-				 "(PCI address [%#llx-%#llx])\n",
-				 res, res->start - addr.translation_offset,
-				 res->end - addr.translation_offset);
-		else
-			dev_info(&info->bridge->dev,
-				 "host bridge window %pR\n", res);
+			"host bridge window %pR conflicts with %pR\n",
+			res, conflict);
+		if (res->start < conflict->end && conflict->end < res->end)
+			res->start = conflict->end + 1;
+		if (res->start < conflict->start && conflict->start < res->end)
+			res->end = conflict->start - 1;
+
+		if (res->start >= res->end) {
+			dev_err(&info->bridge->dev,
+				"can't allocate host bridge window\n");
+			return AE_OK;
+		}
+
+		dev_info(&info->bridge->dev,
+			"host bridge window trimmed to %pR\n", res);
 	}
+
+	pci_bus_add_resource(info->bus, res, 0);
+	info->res_num++;
+	if (addr.translation_offset)
+		dev_info(&info->bridge->dev, "host bridge window %pR "
+			 "(PCI address [%#llx-%#llx])\n",
+			 res, res->start - addr.translation_offset,
+			 res->end - addr.translation_offset);
+	else
+		dev_info(&info->bridge->dev,
+			 "host bridge window %pR\n", res);
 	return AE_OK;
 }
 
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index dda9841..9f88526 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -117,6 +117,7 @@ extern void reserve_region_with_split(struct resource *root,
 			resource_size_t start, resource_size_t end,
 			const char *name);
 extern int insert_resource(struct resource *parent, struct resource *new);
+extern struct resource *insert_resource_conflict(struct resource *parent, struct resource *new);
 extern void insert_resource_expand_to_fit(struct resource *root, struct resource *new);
 extern int allocate_resource(struct resource *root, struct resource *new,
 			     resource_size_t size, resource_size_t min,
diff --git a/kernel/resource.c b/kernel/resource.c
index 2d5be5d..8ec71a2 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -496,6 +496,16 @@ int insert_resource(struct resource *parent, struct resource *new)
 	return conflict ? -EBUSY : 0;
 }
 
+struct resource *insert_resource_conflict(struct resource *parent, struct resource *new)
+{
+	struct resource *conflict;
+
+	write_lock(&resource_lock);
+	conflict = __insert_resource(parent, new);
+	write_unlock(&resource_lock);
+	return conflict;
+}
+
 /**
  * insert_resource_expand_to_fit - Insert a resource into the resource tree
  * @root: root resource descriptor

Created attachment 25460 [details]
debug patch (trim window to remove conflicts)
same patch as included inline above
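
The trimming step in the debug patch can be tried outside the kernel. The following is a minimal userspace sketch, not the kernel code: `struct range` and `trim_window()` are hypothetical names standing in for `struct resource` and the body of the `for (;;)` loop, and the sketch covers only the partial-overlap cases that the two comparisons in the patch handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Inclusive address range, mirroring the start/end fields of struct resource. */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Apply the same two comparisons as the debug patch to shrink win away
 * from a partially overlapping conflict. A conflict that encloses the
 * whole window is left alone, as in the patch.
 * Returns false when nothing usable is left (start >= end).
 */
static bool trim_window(struct range *win, const struct range *conflict)
{
	assert(win->start <= win->end);
	assert(conflict->start <= conflict->end);

	/* Conflict ends inside the window: move the window start past it. */
	if (win->start < conflict->end && conflict->end < win->end)
		win->start = conflict->end + 1;
	/* Conflict starts inside the window: pull the window end in front of it. */
	if (win->start < conflict->start && conflict->start < win->end)
		win->end = conflict->start - 1;

	return win->start < win->end;
}
```

Calling `trim_window()` on the failing window [mem 0xcff00000-0x10ed0ffff] with an assumed System RAM region starting at 0x100000000 (its end address is made up for illustration) pulls the window end back to 0xffffffff and leaves the start untouched.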
Created attachment 25463 [details]
/proc/iomem
Sorry for the delay, Here is /proc/iomem
I'll try to find the time to test with the patches tomorrow.
Created attachment 25471 [details]
boot log latest mainline git + att. 25460
Yes, with applied attachment 25460 [details] the machine seems to boot ok without pci= tweaking. Fedora like kernel config. Log attached.
Created attachment 25473 [details]
boot log latest mainline git + att. 25460
Something got munged in the previous attachment
Created attachment 25484 [details]
2.6.34-0.11.rc1.git1.fc13.x86_64 + patches dmesg
Latest fedora rawhide kernel + the three patch series from http://lkml.org/lkml/2010/3/11/512
Boots and works fine so far. Dmesg attached

Thanks.

Handled-By : Bjorn Helgaas <bjorn.helgaas@hp.com>

I tried to reproduce this by tweaking the SeaBIOS DSDT to report a similar overlap and booting Windows via qemu. Windows stops with this error: http://support.microsoft.com/kb/314830, so I'm concerned that there's still some _CRS-parsing subtlety we're missing.

Yanko, would you mind attaching an acpidump (see http://kernel.org/pub/linux/kernel/people/helgaas/debug)? Also, if you can apply the patch here: https://bugzilla.kernel.org/show_bug.cgi?id=15533#c5 and attach the resulting dmesg, that would also be useful. Thanks very much.

Created attachment 25673 [details]
GA-MA78GM-S2H rev 1.0 F11 - acpidump
Created attachment 25674 [details]
GA-MA78GM-S2H rev 1.0 F11 - DSDT
Created attachment 25679 [details]
2.6.34-0.17.rc2.git1.fc14.x86_64 + workaround + att. 25523 debug
Created attachment 25690 [details]
Windows GA-MA78GM-S2H PCI bus resources

From the DSDT in comment 13, we can see that the BIOS starts with this template:

    DWordMemory (ResourceProducer, PosDecode, MinFixed, MaxFixed, ...
        0x00100000, // Range Minimum
        0xFEBFFFFF, // Range Maximum
        0xFFF00000, // Length

and fills in the starting address, probably based on the system memory size. What we see in Linux (from comment 14) is this:

    [07] 32-Bit DWORD Address Space Resource
        Resource Type : Memory Range
        Min Relocatability : MinFixed
        Max Relocatability : MaxFixed
        Address Minimum : CFF00000 (_MIN)
        Address Maximum : FEBFFFFF (_MAX)
        Address Length : 3EE10000 (_LEN)

Per ACPI spec, _LEN must be (_MAX - _MIN + 1), but 3EE10000 != FEBFFFFF - CFF00000 + 1, so this looks like a BIOS defect. But Windows deals with it, and Linux should, too.

purana@gmail.com went far out of his way to collect the attached Windows Device Manager screenshot from a GA-MA78GM-S2H. The resources shown there match what Linux found, except for this "end-of-memory to FEBFFFFF" region. There, Windows appears to have trimmed the _LEN so it fits between _MIN and _MAX.

I think it will be much better for Linux to enforce this "_LEN <= _MAX - _MIN + 1" constraint than to trim it based on other resources that conflict. This way, we'll end up with [mem 0xcff00000-0xfebfffff] rather than [mem 0xcff00000-0xffffffff], which should match Windows exactly and will remove the possibility of placing a device at 0xfec00000, where it probably won't work.

Created attachment 25691 [details]
truncate _CRS windows with _LEN > _MAX - _MIN + 1
Yanko, can you test this patch, please? You should only need this patch on top of 2.6.34-0.17.rc2.git1.fc14.x86_64 (or whatever recent upstream kernel you like). We should see [mem 0xcff00000-0xfebfffff] rather than [mem 0xcff00000-0xffffffff], which I think is more accurate.
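
The arithmetic behind the _LEN check can be spelled out with the values reported for this board. This is a hedged sketch rather than the code in attachment 25691: `struct crs_window`, `end_from_len()` and `end_clamped()` are illustrative names, and the clamp is simply the constraint _LEN <= _MAX - _MIN + 1 applied directly.

```c
#include <assert.h>
#include <stdint.h>

/* The three address fields of a _CRS window (field names are illustrative). */
struct crs_window {
	uint64_t min;	/* _MIN */
	uint64_t max;	/* _MAX */
	uint64_t len;	/* _LEN */
};

/* Last address of the window if _LEN is trusted as reported. */
static uint64_t end_from_len(const struct crs_window *w)
{
	assert(w->len > 0);
	return w->min + w->len - 1;
}

/* Last address of the window after limiting _LEN to _MAX - _MIN + 1. */
static uint64_t end_clamped(const struct crs_window *w)
{
	uint64_t len = w->len;
	uint64_t span;

	assert(w->len > 0);
	assert(w->min <= w->max);
	span = w->max - w->min + 1;	/* largest length that fits */
	if (len > span)
		len = span;
	return w->min + len - 1;
}
```

For _MIN = 0xCFF00000, _MAX = 0xFEBFFFFF and _LEN = 0x3EE10000, the largest length that fits is 0x2ED00000. Trusting _LEN gives an end of 0x10ED0FFFF, which is the window Linux could not allocate, while the clamped end is 0xFEBFFFFF.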
Created attachment 25692 [details]
2.6.34-0.17.1.rc2.git1.fc14.x86_64 + attachment 25691 [details] log
Works ok so far.

d558b483d5a73f5718705b270cb2090f66ea48c8
Author: Bjorn Helgaas <bjorn.helgaas@hp.com>
Date: Thu Mar 25 09:28:30 2010 -0600

    x86/PCI: truncate _CRS windows with _LEN > _MAX - _MIN + 1

shipped (via the PCI tree) in Linux-2.6.34-rc3

commit b049fdf93dd1925aea02210e5e8fcedcc607c05c
Author: Bjorn Helgaas <bjorn.helgaas@hp.com>
Date: Thu Mar 25 10:32:49 2010 -0600

    PNPACPI: truncate _CRS windows with _LEN > _MAX - _MIN + 1

is in the acpi tree.

On Thursday 08 April 2010, Bjorn Helgaas wrote:
> On Wednesday 07 April 2010 03:08:37 pm Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a summary report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.33. Please verify if it still should be listed and let the tracking team
> > know (either way).
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15480
> > Subject : [regression] Fails to boot properly unless given pci=nocrs
> > Submitter : Yanko Kaneti <yaneti@declera.com>
> > Date : 2010-03-09 01:24 (30 days old)
> > Handled-By : Bjorn Helgaas <bjorn.helgaas@hp.com>
> > Patch : http://lkml.org/lkml/2010/3/11/512
>
> This should be closed. The fix is in Linus' tree:
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=d558b483d5a73f5718705b270cb2090f66ea48c8

Created attachment 26152 [details]
Windows _MIN/_MAX/_LEN parsing
This experiment used the same QEMU/SeaBIOS environment as https://bugzilla.kernel.org/show_bug.cgi?id=15817

The normal host bridge _CRS contains this:

    DWordMemory (...,
        0xE0000000, // Address Range Minimum
        0xFEBFFFFF, // Address Range Maximum
        0x1EC00000, // Address Length

where 0xFEBFFFFF == 0xE0000000 + 0x1EC00000 - 1.

I replaced the _MAX with 0xF2123456, booted Windows, and collected this screenshot. It appears that Windows ignored _LEN (0x1EC00000) and merely used [_MIN to _MAX].
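
To make the comparison concrete, here is a small sketch of the two readings of the modified descriptor: one derives the window end from _MIN and _LEN, the other uses _MAX directly, as Windows appears to. The type and function names are assumptions made for illustration only; nothing here is taken from Windows or from the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* _MIN, _MAX and _LEN of one address descriptor (hypothetical type). */
struct addr_desc {
	uint64_t min;
	uint64_t max;
	uint64_t len;
};

/* True when the descriptor satisfies _LEN == _MAX - _MIN + 1. */
static bool len_matches(const struct addr_desc *d)
{
	assert(d->min <= d->max);
	return d->len == d->max - d->min + 1;
}

/* End address if the window is taken as _MIN plus _LEN. */
static uint64_t end_by_len(const struct addr_desc *d)
{
	assert(d->len > 0);
	return d->min + d->len - 1;
}

/* End address if _LEN is ignored and _MAX is used as-is. */
static uint64_t end_by_max(const struct addr_desc *d)
{
	return d->max;
}
```

The stock descriptor (0xE0000000, 0xFEBFFFFF, 0x1EC00000) is self-consistent, so both readings end at 0xFEBFFFFF. With _MAX replaced by 0xF2123456 the descriptor no longer matches: the _LEN reading still ends at 0xFEBFFFFF, while the _MAX reading ends at 0xF2123456, which is the range the screenshot is described as showing.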
Created attachment 25416 [details]
boot log without using pci=nocrs
Since commit 7bc5e3f2be32ae6fb0c74cd0f707f986b3a01a26
    x86/PCI: use host bridge _CRS info by default on 2008 and newer machines
my system fails to boot properly.
GigaByte GA-MA78GM-S2H rev.1.0 with BIOS - F11, 09/16/2009
The attached log ends at the point where I decided to reboot.