Bug 16228
Description
Brian Bloniarz
2010-06-16 17:57:11 UTC
Created attachment 26812 [details]
Boot log 2.6.34 (timeout waiting for root device)
Created attachment 26813 [details]
Boot log 2.6.33.5 (successful)
Created attachment 26814 [details]
Boot log 2.6.35-rc3 w/ pci=nocrs (successful)
Created attachment 26819 [details]
allow BAR override
Your root device is connected via AHCI, so the 2.6.34 timeout looks
like it's caused by the same issue of the 00:1f.2 AHCI controller
not working.
We read IRQ info from PCI config space, so the fact that we did read
that tells us nothing about the BAR being moved (my mistake).
My only theory is that there may be two BIOS bugs:
1) BIOS left 0000:00:1f.2 at [mem 0xff970000-0xff9707ff], which is
outside all the windows on the PNP0A03:00 host bridge, and
2) The reported host bridge window [mem 0xbff00000-0xdfffffff] is actually
wrong, and putting the device at [mem 0xbff00000-0xbff007ff] doesn't
work.
Linux looks for unused areas starting at low addresses, and Windows
starts from the top and goes down. If the beginning of that host
bridge window is incorrect, Linux would be more likely to trip over it
than Windows.
Maybe you could use this patch and experiment with placing the device
at different places. I think the easiest thing is to boot 2.6.33.5 first
and collect the /proc/iomem output (please also attach that here). That
will help find unused areas.
After you find a possible spot (I can help, given the /proc/iomem output),
you can try things like "pci=bar=0000:00:1f.2[24]=0xdfff0000" to reassign
that BAR 5.
Created attachment 26825 [details]
/proc/iomem as it appears on 2.6.33.5
Thanks, Brian. Would you mind making your attachments "text/plain", so firefox can easily open them? Here are some things to try. These are working down from high addresses, so I think they're more like what Windows would do. pci=bar=0000:00:1f.2[24]=0xff980800 pci=bar=0000:00:1f.2[24]=0xff970800 pci=bar=0000:00:1f.2[24]=0xfed9f800 pci=bar=0000:00:1f.2[24]=0xfbfff800 Created attachment 26826 [details]
Boot log 2.6.35-rc3 w/ pci=bar=0000:00:1f.2[24]=0xfa000000 (successful)
Could you give me some ideas for addresses to try? I don't know much about PCI.
My first shot in the dark was "pci=bar=0000:00:1f.2[24]=0xfa000000", and that boots successfully. I'm attaching the bootlog from that, if that helps.
That's perfect, thanks! The E820 memory map from BIOS reports a range that overlaps the first megabyte (0xbff00000-0xbfffffff) of that host bridge window: BIOS-e820: 0000000000100000 - 00000000bfdf9c00 (usable) BIOS-e820: 00000000bfdf9c00 - 00000000bfe4bc00 (ACPI NVS) BIOS-e820: 00000000bfe4bc00 - 00000000bfe4dc00 (ACPI data) BIOS-e820: 00000000bfe4dc00 - 00000000c0000000 (reserved) pci_root PNP0A03:00: host bridge window [mem 0xbff00000-0xdfffffff] The 0x100000-0xbfdf9c00 range is system RAM, and I would expect that the ACPI NVS and data area is also RAM, and the last reserved piece is probably RAM, too, because RAM tends to end at nicely aligned boundaries. Table 14-1 in the ACPI 4.0 spec says "reserved" ranges from the E820 table are "unsuitable for a standard device to use as a device memory space." I think maybe Linux should regard those reserved ranges as unavailable for allocation to PCI devices, even if they happen to be included in a host bridge window. What do you think, Yinghai? Awesome, sounds like you have a theory. I also went through the 4 locations you suggested, I have boot logs for those. pci=bar=0000:00:1f.2[24]=0xff980800: succeeds pci=bar=0000:00:1f.2[24]=0xff970800: fails It gives identical BUG behavior. However, the log also contains: [ 3.283225] pci 0000:00:1f.2: no compatible bridge window for [mem 0xff970800-0xff970fff] So is there something funny about that address? pci=bar=0000:00:1f.2[24]=0xfed9f800: succeeds pci=bar=0000:00:1f.2[24]=0xfbfff800: succeeds. Created attachment 26827 [details]
Boot log 2.6.35-rc3 w/ pci=bar=0000:00:1f.2[24]=0xff970800 (fail)
Oops, the 0xff970800 failure is my fault, I should have known that would fail, because it's not inside the [mem 0xff97c000-0xff97ffff] window. I was looking at the /proc/iomem from 2.6.33, and it had the device at 0xff970000, so I figure we could put it right after that. The only reason the device was there is that 2.6.33 just ignores the host bridge windows. Anyway, I think we just need to figure out the best way to ignore the 0xbff00000-0xbfffffff range. Hopefully we can find a generic way that will fix similar issues on other boxes, too. On 06/17/2010 09:37 AM, bugzilla-daemon@bugzilla.kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=16228 > > > Bjorn Helgaas <bjorn.helgaas@hp.com> changed: > > What |Removed |Added > ---------------------------------------------------------------------------- > CC| |yinghai@kernel.org > > > > > --- Comment #8 from Bjorn Helgaas <bjorn.helgaas@hp.com> 2010-06-17 16:37:42 > --- > That's perfect, thanks! > > The E820 memory map from BIOS reports a range that overlaps the first > megabyte (0xbff00000-0xbfffffff) of that host bridge window: > > BIOS-e820: 0000000000100000 - 00000000bfdf9c00 (usable) > BIOS-e820: 00000000bfdf9c00 - 00000000bfe4bc00 (ACPI NVS) > BIOS-e820: 00000000bfe4bc00 - 00000000bfe4dc00 (ACPI data) > BIOS-e820: 00000000bfe4dc00 - 00000000c0000000 (reserved) > pci_root PNP0A03:00: host bridge window [mem 0xbff00000-0xdfffffff] > > The 0x100000-0xbfdf9c00 range is system RAM, and I would expect that > the ACPI NVS and data area is also RAM, and the last reserved piece is > probably RAM, too, because RAM tends to end at nicely aligned boundaries. > > Table 14-1 in the ACPI 4.0 spec says "reserved" ranges from the E820 > table are "unsuitable for a standard device to use as a device memory > space." > > I think maybe Linux should regard those reserved ranges as unavailable > for allocation to PCI devices, even if they happen to be included in a > host bridge window. What do you think, Yinghai? for above 1M area, current kernel will honor setting from HW register and later will use insert_resource_expand_to_fit() with e820 reserved entries. so in this case will have one big 0xbfe4dc00 - 0xe000000 reserved entry in /proc/iomem as parent of 0xbff00000 - 0xe0000000 Sane BIOS should not put allocated range to PCI bridge/device into e820 table as reserved entries. But there IS some BIOS doing that. So Linus decided to trust setting from PCI bar at first. later for pci_assign_unsigned, We should avoid those range. one solution is expanding my one of old version for below 1M handling to: use reserve_region_with_split instead. will get /proc/iomem bfe4dc00 - bfefffff reserved bff00000 - dfffffff PCI BUS #00 bff00000 - bfffffff reserved ... so will use bff0000 - bfffffff reserved as holder to prevent those range to be allocated to unassigned devices. reserve_region_with_split need to need be updated a little bit to make sure it will not put holder on range with children already. something like this Subject: [PATCH] x86, resource: Add reserve_region_with_split_check_child() It will cover the whole region to BUSY, except that some regions that have children under them. those children normally is PCI bar but it is falling into E820_RESERVED. We can not put BUSY on them, otherwise driver can not use pci_request_region() later /proc/iomem will have 00010000-00094fff : System RAM 00095000-0009ffff : reserved 000a0000-000bffff : PCI Bus 0000:00 000a0000-000bffff : reserved 000c0000-000cffff : reserved 000d0000-000dffff : PCI Bus 0000:00 000d0000-000dffff : reserved 000e0000-000fffff : reserved -v2: Add function pointer to put string comparing with caller -v3: expand to use it above 1M resources. Tested-by: Guenter Roeck <guenter.roeck@ericsson.com> Tested-by: Andy Isaacson <adi@hexapodia.org> Signed-off-by: Yinghai Lu <yinghai@kernel.org> --- arch/x86/kernel/e820.c | 15 ++++++++++++--- include/linux/ioport.h | 3 +++ kernel/resource.c | 29 ++++++++++++++++++++++++----- 3 files changed, 39 insertions(+), 8 deletions(-) Index: linux-2.6/arch/x86/kernel/e820.c =================================================================== --- linux-2.6.orig/arch/x86/kernel/e820.c +++ linux-2.6/arch/x86/kernel/e820.c @@ -1094,7 +1094,7 @@ void __init e820_reserve_resources(void) * pci device BAR resource and insert them later in * pcibios_resource_survey() */ - if (e820.map[i].type != E820_RESERVED || res->start < (1ULL<<20)) { + if (e820.map[i].type != E820_RESERVED) { res->flags |= IORESOURCE_BUSY; insert_resource(&iomem_resource, res); } @@ -1128,6 +1128,14 @@ static unsigned long ram_alignment(resou #define MAX_RESOURCE_SIZE ((resource_size_t)-1) +static int __init check_func(struct resource *cf) +{ + if (strstr(cf->name, "PCI Bus")) + return 1; + + return 0; +} + void __init e820_reserve_resources_late(void) { int i; @@ -1135,8 +1143,9 @@ void __init e820_reserve_resources_late( res = e820_res; for (i = 0; i < e820.nr_map; i++) { - if (!res->parent && res->end) - insert_resource_expand_to_fit(&iomem_resource, res); + if (!res->parent && res->end) { + reserve_region_with_split_check_child(&iomem_resource, res->start, res->end, res->name, check_func); + } res++; } Index: linux-2.6/include/linux/ioport.h =================================================================== --- linux-2.6.orig/include/linux/ioport.h +++ linux-2.6/include/linux/ioport.h @@ -120,6 +120,9 @@ void release_child_resources(struct reso extern void reserve_region_with_split(struct resource *root, resource_size_t start, resource_size_t end, const char *name); +void reserve_region_with_split_check_child(struct resource *root, + resource_size_t start, resource_size_t end, + const char *name, int (*check_func)(struct resource *cf)); extern struct resource *insert_resource_conflict(struct resource *parent, struct resource *new); extern int insert_resource(struct resource *parent, struct resource *new); extern void insert_resource_expand_to_fit(struct resource *root, struct resource *new); Index: linux-2.6/kernel/resource.c =================================================================== --- linux-2.6.orig/kernel/resource.c +++ linux-2.6/kernel/resource.c @@ -607,9 +607,14 @@ int adjust_resource(struct resource *res return result; } +static int __init check_func_nop(struct resource *cf) +{ + return 1; +} + static void __init __reserve_region_with_split(struct resource *root, resource_size_t start, resource_size_t end, - const char *name) + const char *name, bool check_child, int (*check_func)(struct resource *cf)) { struct resource *parent = root; struct resource *conflict; @@ -631,13 +636,18 @@ static void __init __reserve_region_with kfree(res); /* conflict covered whole area */ - if (conflict->start <= start && conflict->end >= end) + if (conflict->start <= start && conflict->end >= end) { + if (check_child && !conflict->child && check_func(conflict)) + __reserve_region_with_split(conflict, start, end, name, false, check_func_nop); return; + } if (conflict->start > start) - __reserve_region_with_split(root, start, conflict->start-1, name); + __reserve_region_with_split(root, start, conflict->start-1, name, check_child, check_func); if (conflict->end < end) - __reserve_region_with_split(root, conflict->end+1, end, name); + __reserve_region_with_split(root, conflict->end+1, end, name, check_child, check_func); + if (check_child && !conflict->child && check_func(conflict)) + __reserve_region_with_split(conflict, conflict->start, conflict->end, name, false, check_func_nop); } void __init reserve_region_with_split(struct resource *root, @@ -645,7 +655,16 @@ void __init reserve_region_with_split(st const char *name) { write_lock(&resource_lock); - __reserve_region_with_split(root, start, end, name); + __reserve_region_with_split(root, start, end, name, false, check_func_nop); + write_unlock(&resource_lock); +} + +void __init reserve_region_with_split_check_child(struct resource *root, + resource_size_t start, resource_size_t end, + const char *name, int (*check_func)(struct resource *cf)) +{ + write_lock(&resource_lock); + __reserve_region_with_split(root, start, end, name, true, check_func); write_unlock(&resource_lock); } Created attachment 26832 [details]
Bootlog 2.6.35-rc3 with Yinghai's reserve_region_with_split_check_child patch
I tried booting with this reserve_region_with_split_check_child() patch, my machine still doesn't boot. The bootlog & /proc/iomem is attached if that helps at all. It seems like it tries to use the same address as the unpatched 2.6.35-rc3?
On 06/17/2010 03:26 PM, bugzilla-daemon@bugzilla.kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=16228 > > > > > > --- Comment #13 from Brian Bloniarz <phunge0@hotmail.com> 2010-06-17 > 22:26:27 --- > Created an attachment (id=26832) > --> (https://bugzilla.kernel.org/attachment.cgi?id=26832) > Bootlog 2.6.35-rc3 with Yinghai's reserve_region_with_split_check_child patch > > I tried booting with this reserve_region_with_split_check_child() patch, my > machine still doesn't boot. The bootlog & /proc/iomem is attached if that > helps > at all. It seems like it tries to use the same address as the unpatched > 2.6.35-rc3? > you got bfe4dc00-bfefffff : reserved bff00000-dfffffff : PCI Bus 0000:00 bff00000-bff007ff : 0000:00:1f.2 bff00000-bff007ff : ahci c0000000-cfffffff : PCI Bus 0000:02 should have bfe4dc00-bfefffff : reserved bff00000-dfffffff : PCI Bus 0000:00 bff00000-bfffffff : reserved c0000000-cfffffff : PCI Bus 0000:02 please check this version Subject: [PATCH] x86, resource: Add reserve_region_with_split_check_child() It will cover the whole region to BUSY, except that some regions that have children under them. those children normally is PCI bar but it is falling into E820_RESERVED. We can not put BUSY on them, otherwise driver can not use pci_request_region() later /proc/iomem will have 00010000-00094fff : System RAM 00095000-0009ffff : reserved 000a0000-000bffff : PCI Bus 0000:00 000a0000-000bffff : reserved 000c0000-000cffff : reserved 000d0000-000dffff : PCI Bus 0000:00 000d0000-000dffff : reserved 000e0000-000fffff : reserved -v2: Add function pointer to put string comparing with caller -v3: expand to use it above 1M resources. -v4: handle partial overlap case Tested-by: Guenter Roeck <guenter.roeck@ericsson.com> Tested-by: Andy Isaacson <adi@hexapodia.org> Signed-off-by: Yinghai Lu <yinghai@kernel.org> --- arch/x86/kernel/e820.c | 15 ++++++++++++--- include/linux/ioport.h | 3 +++ kernel/resource.c | 35 ++++++++++++++++++++++++++++++----- 3 files changed, 45 insertions(+), 8 deletions(-) Index: linux-2.6/arch/x86/kernel/e820.c =================================================================== --- linux-2.6.orig/arch/x86/kernel/e820.c +++ linux-2.6/arch/x86/kernel/e820.c @@ -1094,7 +1094,7 @@ void __init e820_reserve_resources(void) * pci device BAR resource and insert them later in * pcibios_resource_survey() */ - if (e820.map[i].type != E820_RESERVED || res->start < (1ULL<<20)) { + if (e820.map[i].type != E820_RESERVED) { res->flags |= IORESOURCE_BUSY; insert_resource(&iomem_resource, res); } @@ -1128,6 +1128,14 @@ static unsigned long ram_alignment(resou #define MAX_RESOURCE_SIZE ((resource_size_t)-1) +static int __init check_func(struct resource *cf) +{ + if (strstr(cf->name, "PCI Bus")) + return 1; + + return 0; +} + void __init e820_reserve_resources_late(void) { int i; @@ -1135,8 +1143,9 @@ void __init e820_reserve_resources_late( res = e820_res; for (i = 0; i < e820.nr_map; i++) { - if (!res->parent && res->end) - insert_resource_expand_to_fit(&iomem_resource, res); + if (!res->parent && res->end) { + reserve_region_with_split_check_child(&iomem_resource, res->start, res->end, res->name, check_func); + } res++; } Index: linux-2.6/include/linux/ioport.h =================================================================== --- linux-2.6.orig/include/linux/ioport.h +++ linux-2.6/include/linux/ioport.h @@ -120,6 +120,9 @@ void release_child_resources(struct reso extern void reserve_region_with_split(struct resource *root, resource_size_t start, resource_size_t end, const char *name); +void reserve_region_with_split_check_child(struct resource *root, + resource_size_t start, resource_size_t end, + const char *name, int (*check_func)(struct resource *cf)); extern struct resource *insert_resource_conflict(struct resource *parent, struct resource *new); extern int insert_resource(struct resource *parent, struct resource *new); extern void insert_resource_expand_to_fit(struct resource *root, struct resource *new); Index: linux-2.6/kernel/resource.c =================================================================== --- linux-2.6.orig/kernel/resource.c +++ linux-2.6/kernel/resource.c @@ -607,9 +607,14 @@ int adjust_resource(struct resource *res return result; } +static int __init check_func_nop(struct resource *cf) +{ + return 1; +} + static void __init __reserve_region_with_split(struct resource *root, resource_size_t start, resource_size_t end, - const char *name) + const char *name, bool check_child, int (*check_func)(struct resource *cf)) { struct resource *parent = root; struct resource *conflict; @@ -631,13 +636,24 @@ static void __init __reserve_region_with kfree(res); /* conflict covered whole area */ - if (conflict->start <= start && conflict->end >= end) + if (conflict->start <= start && conflict->end >= end) { + if (check_child && !conflict->child && check_func(conflict)) + __reserve_region_with_split(conflict, start, end, name, false, check_func_nop); return; + } if (conflict->start > start) - __reserve_region_with_split(root, start, conflict->start-1, name); + __reserve_region_with_split(root, start, conflict->start-1, name, check_child, check_func); if (conflict->end < end) - __reserve_region_with_split(root, conflict->end+1, end, name); + __reserve_region_with_split(root, conflict->end+1, end, name, check_child, check_func); + + if (check_child && check_func(conflict)) { + resource_size_t common_start, common_end; + + common_start = max(start, conflict->start); + common_end = min(end, conflict->end); + __reserve_region_with_split(conflict, common_start, common_end, name, check_child, check_func); + } } void __init reserve_region_with_split(struct resource *root, @@ -645,7 +661,16 @@ void __init reserve_region_with_split(st const char *name) { write_lock(&resource_lock); - __reserve_region_with_split(root, start, end, name); + __reserve_region_with_split(root, start, end, name, false, check_func_nop); + write_unlock(&resource_lock); +} + +void __init reserve_region_with_split_check_child(struct resource *root, + resource_size_t start, resource_size_t end, + const char *name, int (*check_func)(struct resource *cf)) +{ + write_lock(&resource_lock); + __reserve_region_with_split(root, start, end, name, true, check_func); write_unlock(&resource_lock); } Created attachment 26844 [details]
Bootlog 2.6.35-rc3 with Yinghai's -v4 patch
This -v4 patch did the trick for me! The bootlog & /proc/iomem is attached. /proc/iomem shows up as you said it should.
On Thu, 17 Jun 2010 16:36:39 -0700 Yinghai Lu <yinghai@kernel.org> wrote: > On 06/17/2010 03:26 PM, bugzilla-daemon@bugzilla.kernel.org wrote: > > https://bugzilla.kernel.org/show_bug.cgi?id=16228 > > > > > > > > > > > > --- Comment #13 from Brian Bloniarz <phunge0@hotmail.com> 2010-06-17 > 22:26:27 --- > > Created an attachment (id=26832) > > --> (https://bugzilla.kernel.org/attachment.cgi?id=26832) > > Bootlog 2.6.35-rc3 with Yinghai's reserve_region_with_split_check_child > patch > > > > I tried booting with this reserve_region_with_split_check_child() patch, my > > machine still doesn't boot. The bootlog & /proc/iomem is attached if that > helps > > at all. It seems like it tries to use the same address as the unpatched > > 2.6.35-rc3? > > > > you got > bfe4dc00-bfefffff : reserved > bff00000-dfffffff : PCI Bus 0000:00 > bff00000-bff007ff : 0000:00:1f.2 > bff00000-bff007ff : ahci > c0000000-cfffffff : PCI Bus 0000:02 > > > should have > bfe4dc00-bfefffff : reserved > bff00000-dfffffff : PCI Bus 0000:00 > bff00000-bfffffff : reserved > c0000000-cfffffff : PCI Bus 0000:02 > > please check this version > > Subject: [PATCH] x86, resource: Add reserve_region_with_split_check_child() > > It will cover the whole region to BUSY, except that some regions that have > children under them. > > those children normally is PCI bar but it is falling into E820_RESERVED. > We can not put BUSY on them, otherwise driver can not use > pci_request_region() > later > > /proc/iomem will have > 00010000-00094fff : System RAM > 00095000-0009ffff : reserved > 000a0000-000bffff : PCI Bus 0000:00 > 000a0000-000bffff : reserved > 000c0000-000cffff : reserved > 000d0000-000dffff : PCI Bus 0000:00 > 000d0000-000dffff : reserved > 000e0000-000fffff : reserved > > -v2: Add function pointer to put string comparing with caller > -v3: expand to use it above 1M resources. > -v4: handle partial overlap case > > Tested-by: Guenter Roeck <guenter.roeck@ericsson.com> > Tested-by: Andy Isaacson <adi@hexapodia.org> > Signed-off-by: Yinghai Lu <yinghai@kernel.org> > > --- > arch/x86/kernel/e820.c | 15 ++++++++++++--- > include/linux/ioport.h | 3 +++ > kernel/resource.c | 35 ++++++++++++++++++++++++++++++----- > 3 files changed, 45 insertions(+), 8 deletions(-) > > Index: linux-2.6/arch/x86/kernel/e820.c > =================================================================== > --- linux-2.6.orig/arch/x86/kernel/e820.c > +++ linux-2.6/arch/x86/kernel/e820.c > @@ -1094,7 +1094,7 @@ void __init e820_reserve_resources(void) > * pci device BAR resource and insert them later in > * pcibios_resource_survey() > */ > - if (e820.map[i].type != E820_RESERVED || res->start < > (1ULL<<20)) { > + if (e820.map[i].type != E820_RESERVED) { > res->flags |= IORESOURCE_BUSY; > insert_resource(&iomem_resource, res); I thought this hunk was already upstream? If not, it probably should be, didn't we decide to mark the lowest 1M region as busy to avoid trouble? The rest has been commented on iirc, and looks pretty ugly. checking for the string "PCI Bus" seems like a band-aid over the real problem at best... Rez, can you assist? On 06/18/2010 10:36 AM, Jesse Barnes wrote: > On Thu, 17 Jun 2010 16:36:39 -0700 > Yinghai Lu <yinghai@kernel.org> wrote: > >> On 06/17/2010 03:26 PM, bugzilla-daemon@bugzilla.kernel.org wrote: >>> https://bugzilla.kernel.org/show_bug.cgi?id=16228 >>> >>> >>> >>> >>> >>> --- Comment #13 from Brian Bloniarz <phunge0@hotmail.com> 2010-06-17 >>> 22:26:27 --- >>> Created an attachment (id=26832) >>> --> (https://bugzilla.kernel.org/attachment.cgi?id=26832) >>> Bootlog 2.6.35-rc3 with Yinghai's reserve_region_with_split_check_child >>> patch >>> >>> I tried booting with this reserve_region_with_split_check_child() patch, my >>> machine still doesn't boot. The bootlog & /proc/iomem is attached if that >>> helps >>> at all. It seems like it tries to use the same address as the unpatched >>> 2.6.35-rc3? >>> >> >> you got >> bfe4dc00-bfefffff : reserved >> bff00000-dfffffff : PCI Bus 0000:00 >> bff00000-bff007ff : 0000:00:1f.2 >> bff00000-bff007ff : ahci >> c0000000-cfffffff : PCI Bus 0000:02 >> >> >> should have >> bfe4dc00-bfefffff : reserved >> bff00000-dfffffff : PCI Bus 0000:00 >> bff00000-bfffffff : reserved >> c0000000-cfffffff : PCI Bus 0000:02 >> >> please check this version >> >> Subject: [PATCH] x86, resource: Add reserve_region_with_split_check_child() >> >> It will cover the whole region to BUSY, except that some regions that have >> children under them. >> >> those children normally is PCI bar but it is falling into E820_RESERVED. >> We can not put BUSY on them, otherwise driver can not use >> pci_request_region() >> later >> >> /proc/iomem will have >> 00010000-00094fff : System RAM >> 00095000-0009ffff : reserved >> 000a0000-000bffff : PCI Bus 0000:00 >> 000a0000-000bffff : reserved >> 000c0000-000cffff : reserved >> 000d0000-000dffff : PCI Bus 0000:00 >> 000d0000-000dffff : reserved >> 000e0000-000fffff : reserved >> >> -v2: Add function pointer to put string comparing with caller >> -v3: expand to use it above 1M resources. >> -v4: handle partial overlap case >> >> Tested-by: Guenter Roeck <guenter.roeck@ericsson.com> >> Tested-by: Andy Isaacson <adi@hexapodia.org> >> Signed-off-by: Yinghai Lu <yinghai@kernel.org> >> >> --- >> arch/x86/kernel/e820.c | 15 ++++++++++++--- >> include/linux/ioport.h | 3 +++ >> kernel/resource.c | 35 ++++++++++++++++++++++++++++++----- >> 3 files changed, 45 insertions(+), 8 deletions(-) >> >> Index: linux-2.6/arch/x86/kernel/e820.c >> =================================================================== >> --- linux-2.6.orig/arch/x86/kernel/e820.c >> +++ linux-2.6/arch/x86/kernel/e820.c >> @@ -1094,7 +1094,7 @@ void __init e820_reserve_resources(void) >> * pci device BAR resource and insert them later in >> * pcibios_resource_survey() >> */ >> - if (e820.map[i].type != E820_RESERVED || res->start < >> (1ULL<<20)) { >> + if (e820.map[i].type != E820_RESERVED) { >> res->flags |= IORESOURCE_BUSY; >> insert_resource(&iomem_resource, res); > > I thought this hunk was already upstream? If not, it probably should > be, didn't we decide to mark the lowest 1M region as busy to avoid > trouble? yes, that one only handle below 1M problem. and this one handle above 1M problem and below 1M problem. > > The rest has been commented on iirc, and looks pretty ugly. checking > for the string "PCI Bus" seems like a band-aid over the real problem at > best... that seems to be the only way that tell that resource is belong to bridge. Thanks Yinghai > > ... checking
> > for the string "PCI Bus" seems like a band-aid over the real problem at
> > best...
>
> that seems to be the only way that tell that resource is belong to bridge.
Whether it's the only way or not, it's still unacceptably ugly, IMHO.
And the whole reserve_region_with_split_check_child() thing is just
as bad. There's no way we can maintain that.
The resource management framework is based on hierarchical, nested
resources. That makes good sense in most cases, where you have
devices that respond at certain address ranges, and any upstream
bridges have to have windows that completely enclose the device
ranges.
But I don't think it makes quite as much sense for things like the
E820 map. In the E820 map, there's no concept of "one range == one
device." There's not even a concept of devices. So I'm not sure
we can enforce the idea that a device region must have a certain
relationship with an E820 region. As far as I can tell, there are
no requirements about how the BIOS lays out the E820 map. For example,
it may use two E820 entries to describe a single contiguous region.
The assumption that an E820 entry must have a hierarchical relationship
with device ranges has led to things like insert_resource_expand_to_fit()
and the split between e820_reserve_resources() and e820_reserve_resources_late().
Both of those feel like ad hoc workarounds for system configurations
that are probably completely acceptable.
I don't know whether we can fix resources to handle E820 reservations
naturally, or whether we need some kind of separate map (I hope not,
because that sounds like a tangle in itself). I'd like to see what
Windows does with this case ... I vaguely remember that it might keep
overlapping ranges that Linux would complain about.
Hi all, thanks for your help so far. Is there any update on this bug? I will only have access to the machine that exhibits the problem for another 2 weeks if there are patches to be tested. Handled-By : Bjorn Helgaas <bjorn.helgaas@hp.com> Sorry for the delay in looking at this. I know it's a high-priority regression, and I certainly haven't forgotten about it. I won't have a proposal before you lose access to the machine, but I hope the Dell folks on the CC: list will be able to reproduce the problem and test fixes. It's possible that a BIOS change would cover up the Linux problem, so would you mind collecting the BIOS version before you lose your machine, Brian? Attaching the output of "dmidecode" should be enough and doesn't require a reboot. Created attachment 27191 [details]
dmidecode output from the machine
Your dmidecode shows that your are using BIOS A01. I see this problem even with the latest Dell T3500 BIOS A07. So there is (as of today) no BIOS fix. FYI: Fedora Bug Report <https://bugzilla.redhat.com/show_bug.cgi?id=620313>. Fedora is currently working on moving from 2.6.33 to 2.6.34.x for Fedora 13. If anybody has Windows on a box that shows this failure, could you please attach an Everest report of the hardware (free trial version at http://www.lavalys.com/)? Both Linux and Windows try to move PCI devices into host bridge windows. Linux allocates window space from the bottom up, so it does this: pci_root PNP0A03:00: host bridge window [mem 0xbff00000-0xdfffffff] pci 0000:00:1f.2: no compatible bridge window for [mem 0xff970000-0xff9707ff] pci 0000:00:1f.2: BAR 5: assigned [mem 0xbff00000-0xbff007ff] That doesn't work, I think because 0xbff00000 is actually RAM. Windows allocates window space from the top down, so it won't use the 0xbff00000 area unless the window is almost completely consumed by other devices. I expected that even in that case, Windows would avoid the 0xbfe4dc00-0xc0000000 area because it's marked "reserved" in the E820 map. However, I've been experimenting with Windows 2008 R2 on qemu, and it does *not* seem to avoid E820 "reserved" areas when moving PCI devices. If Windows doesn't avoid E820 reserved areas, there must be another reason, such as a PNP device, that would keep it from putting a PCI device at 0xbff00000. Otherwise, if we put a couple 256M graphics cards in the box to consume the host bridge window, Windows would have the same problem we're seeing with Linux. Created attachment 27398 [details] print PNP resources Since nobody seems to have Windows on one of these boxes, could someone please add this patch and post the resulting dmesg log? It's fine to use "pci=nocrs" so the machine boots. And please collect the log with the "dmesg" command or use the "ignore_loglevel" option so the PCI resource information, e.g., "pci 0000:00:1f.2: reg 24: [mem 0xff970000-0xff9707ff]", is included. Also, please collect the DSDT so I can make sure we account for all the resources used by ACPI devices. Instructions are here: http://kernel.org/pub/linux/kernel/people/helgaas/debug Created attachment 27403 [details]
dmesg, acpidump & DSDT after applying PNP patch
Archive with the 3 requested files.
NOTE: the acpixtract from pmtools-20071106 behaves differently from the instructions you provided on the web page. I had to use
acpixtract -sDSDT acpidump.asc
and the resulting file was named DSDT.dat.
I can't provide any debug info from Windows, because none of my machines has ever been installed with Windows :-)
Created attachment 27414 [details]
Everest report of Windows resources
Everest report from a T3500 with A07 BIOS.
I don't see any devices in the BFF00000-BFFFFFFF range (except the
PNP0A03 host bridge window itself), so I don't think anything prevents
Windows from placing a PCI device there. If that happened, I think
that device probably would not work.
But it would be hard to force that to happen, because Windows allocates
space from the top down, and there's a lot of available space above
BFF00000 that would be used first. Here's my accounting of used and
available space under Windows:
BFF00000-F7FFFFFF PNP0A03:00 (host bridge to bus 00)
BFF00000-BFFFFFFF available (0x100000 = 1M)
C0000000-DFFFFFFF 00:03.0 Root Port 3 (bridge to bus 02, 512M window)
C0000000-DFFFFFFF 02:00.0 NVidia FX 580 (512M)
E0000000-F3CFFFFF available (0x13cfffff = 316M)
F3D00000-F3DFFFFF 00:1c.5 Root Port 6 (bridge to bus 05, 1M window)
F3D00000-F3DDFFFF available (0xe0000 = 896K on bus 05)
F3DE0000-F3DEFFFF 05:00.0 Broadcom 57xx NIC (64K)
F3DF0000-F3DFFFFF 05:00.0 Broadcom 57xx NIC (64K)
F3E00000-F3EFFFFF 00:01.0 Root Port 1 (bridge to bus 01, 1M window))
F3E00000-F3EFFFFF available (1M on bus 01)
F3F00000-F3FFFFFF 00:07.0 Root Port 7 (bridge to bus 03, 1M window))
F3F00000-F3FFFFFF available (1M on bus 03)
F4000000-F7EFFFFF 00:03.0 Root Port 3 (bridge to bus 02, 63M window))
F4000000-F5FFFFFF 02:00.0 NVidia FX 580 (32M)
F6000000-F6FFFFFF 02:00.0 NVidia FX 580 (16M)
F7000000-F7EFFFFF available (0xf00000 = 15M on bus 02)
F7F00000-F7FF9FFF available (0xfa000 = 1000K on bus 00)
F7FFA000-F7FFA3FF 00:1a.0 USB Enhanced Host Controller (0x400 = 1K)
F7FFA400-F7FFAFFF available (0xc00 = 3072 bytes on bus 00)
F7FFB000-F7FFB0FF 0:1f.3 SMBus Controller (0x100 = 256 bytes)
F7FFB100-F7FFBFFF available (0xf00 = 3840 bytes on bus 00)
F7FFC000-F7FFFFFF 00:1b.0 Audio Controller (0x4000 = 16K)
FED20000-FED9FFFF PNP0A03:00 (host bridge window to bus 00, 512K)
FED20000-FED9FFFF available (0x80000 = 512K)
FF97C000-FF97FFFF PNP0A03:00 (host bridge window to bus 00, 16K)
FF97C000-FF97FFFF available (0x4000 = 16K)
FF980000-FF980FFF PNP0A03:00 (host bridge window to bus 00, 4K)
FF980000-FF9803FF 00:1d.7 USB Enhanced Host Controller (0x400 = 1K)
FF980400-FF9807FF available (0x400 == 1K)
FF980800-FF980FFF 00:1f.2 SATA RAID Controller (0x800 = 2K)
Any progress on this matter? Do you have a patch that I could test with 2.6.34 or 2.6.35? FYI: Fedora has pushed a 2.6.34 package to F13 stable updates, i.e. normal users will now run into this problem. Created attachment 28952 [details]
allocate from top down within a region
Thanks for the Fedora heads up. I already got a new report, and I'm
afraid we'll get a lot more.
I think the fix for this is to do what Windows does, namely:
1) Allocate new space starting with the top window, going
to the bottom one, and
2) Within each window, allocate from the top to the bottom.
This patch only implements 2), but I think it's probably enough to
work around the immediate problem.
Created attachment 28992 [details]
allocate from top down within a region (v2)
Oops, a kind soul tested the previous patch and reported that it failed.
Here's another try.
Here's the Fedora 13 report of the same problem: https://bugzilla.redhat.com/show_bug.cgi?id=620313 Created attachment 29072 [details]
dmesg log from 2.6.34 with patch V2 applied
With patch V2 applied 2.6.34 boots up fine on my T3500 (dmesg log attached).
Stefan, your machine isn't using CRS, possibly because it's older than 2008. We do open the 1c.0 bridge windows to bus 04, and it looks like we're doing the right thing there, but there's no device on bus 04, so it really doesn't make a difference either way. I would think this machine would boot either with or without "pci=nocrs", even without my patch. I would guess that if you boot with "pci=use_crs", your machine will fail to boot vanilla 2.6.34. And hopefully, if you add my patch, it *will* boot. If so, I'd like to see the dmesg from the boot with my patch. I have another report that the patch isn't sufficient, so if it fails, it would help to see a serial console log or video to see where it goes wrong. Thanks! (In reply to comment #34) > Stefan, your machine isn't using CRS, possibly because it's older than > 2008. We do open the 1c.0 bridge windows to bus 04, and it looks like > we're doing the right thing there, but there's no device on bus 04, so > it really doesn't make a difference either way. I would think this > machine would boot either with or without "pci=nocrs", even without > my patch. I can assure you that the F13 2.6.34 kernel does not boot without "pci=nocrs" on the T3500. That's why I opened Fedora bug #620313. With patch V2 applied to the F13 kernel RPM the machine boots without pci=nocrs, i.e. it works out of the box. The machine was bought last year (2009) and I have installed the latest Dell BIOS (A07). So I'm not sure how your 2008 rule would apply. > I would guess that if you boot with "pci=use_crs", your machine will > fail to boot vanilla 2.6.34. And hopefully, if you add my patch, it > *will* boot. If so, I'd like to see the dmesg from the boot with my > patch. I have another report that the patch isn't sufficient, so if I'll try to boot with "pci=use_crs" with the plain Fedora package and the one with the patch. (In reply to comment #35) > With patch V2 applied to the F13 kernel RPM the machine boots without > pci=nocrs, i.e. it works out of the box. *blush* I picked up the next-in-line F13 2.6.34 RPM that has "pci=nocrs" by default :-/ No wonder that it boots and doesn't show _CRS... I'll try with "pci=use_crs" ASAP. Confirmed: patch V2 does not fix the issue for me. I'll try the latest debug patch attached to the Fedora bz ASAP. Created attachment 29492 [details]
dmesg log from 2.6.34 with patch V3 applied and pci=use_crs
As the Fedora bug has been closed with pci=nocrs as workaround, let's continue the discussion here.
Patch V3 fixes the issue on my T3500:
Kernel command line: ro root=/dev/mapper/VolGroup00-LogVol00 rhgb quiet SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=fi intel_iommu=off pci=use_crs
PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
pci 0000:00:1f.2: reg 10: [io 0xfe00-0xfe07]
pci 0000:00:1f.2: reg 14: [io 0xfe10-0xfe13]
pci 0000:00:1f.2: reg 18: [io 0xfe20-0xfe27]
pci 0000:00:1f.2: reg 1c: [io 0xfe30-0xfe33]
pci 0000:00:1f.2: reg 20: [io 0xfec0-0xfedf]
pci 0000:00:1f.2: reg 24: [mem 0xff970000-0xff9707ff]
pci 0000:00:1f.2: PME# supported from D3hot
pci 0000:00:1f.2: PME# disabled
pci 0000:00:1f.2: no compatible bridge window for [mem 0xff970000-0xff9707ff]
pci 0000:00:1f.2: BAR 5: assigned [mem 0xff980800-0xff980fff]
pci 0000:00:1f.2: BAR 5: set to [mem 0xff980800-0xff980fff] (PCI address [0xff980800-0xff980fff]
ahci 0000:00:1f.2: version 3.0
ahci 0000:00:1f.2: PCI INT C -> GSI 20 (level, low) -> IRQ 20
ahci 0000:00:1f.2: irq 53 for MSI/MSI-X
ahci 0000:00:1f.2: AHCI 0001.0200 32 slots 6 ports 3 Gbps 0x3f impl SATA mode
ahci 0000:00:1f.2: flags: 64bit ncq sntf led clo pio ccc ems sxs
ahci 0000:00:1f.2: setting latency timer to 64
Thanks a lot for testing this, Stefan! Based on the Everest report in comment 28, it looks like we're now doing the same thing as Windows, and in fact, we chose the identical address for the 00:1f.2 SATA device. I'll polish up the V3 patches and post them soon. Created attachment 30172 [details] alloc top-down (v3) This is the patch Stefan tested in comment 38. (This was originally attached to https://bugzilla.redhat.com/show_bug.cgi?id=620313, but I'm also attaching it here for completeness.) Created attachment 30182 [details]
align to end of free space
This is a SeaBIOS patch I used with Windows 7 on qemu to verify that
given an area of available space, Windows uses the *end*, not the
beginning.
Created attachment 30192 [details]
use highest available gap first
This is a SeaBIOS patch I used with Windows 7 on qemu to verify that
given a window with several areas available, e.g., several gaps between
children of the resource, Windows uses the highest area first.
Created attachment 30202 [details]
use highest bridge windows first
This is a SeaBIOS patch I used with Windows 7 on qemu to verify that
given a PCI host bridge with several windows, Windows fills up the
highest windows first.
Stefan, could you please attach the /proc/iomem contents corresponding to your test in comment 38? I'd like to understand what we're doing with this E820 reserved entry: BIOS-e820: 00000000bfe4dc00 - 00000000c0000000 (reserved) Created attachment 30242 [details]
handling of e820 reserved entries
This SeaBIOS patch shows two interesting things:
1) Windows 7 ignores E820 reserved entries when it moves PCI devices.
2) When an E820 reserved entry conflicts with a host bridge window,
Linux *expands* the E820 entry to contain the entire window. Then
we insert it, so it becomes the parent of the window, which means
we effectively ignore the reservation.
On these Dell machines, the E820 map contains a reserved entry that
conflicts with a host bridge window:
BIOS-e820: 00000000bfe4dc00 - 00000000c0000000 (reserved)
pci_root PNP0A03:00: host bridge window [mem 0xbff00000-0xf7ffffff]
As a result, I think we'll see this in Stefan's /proc/iomem:
bfe4dc00-f7ffffff reserved (expanded)
bff00000-f7ffffff PCI Bus 0000:00
When we move the 1f.2 BAR into the window, the area at 0xbff00000 looks
like it's available because we allocate from the window. The e820 reserved
area is now the parent of the window, so it has no effect on the allocation.
Created attachment 30662 [details]
dmesg log from 2.6.34 with patch V4 applied and pci=use_crs
Results with latest patch sent to LKML.
Created attachment 30672 [details]
/proc/iomem from 2.6.34 with patch V4 applied and pci=use_crs
As with patch V3 the system boots fine with pci=use_crs.
On Monday, September 20, 2010, Bjorn Helgaas wrote:
> On Monday, September 20, 2010 01:57:17 pm Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.34 and 2.6.35.
> >
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.34 and 2.6.35. Please verify if it still should
> > be listed and let the tracking team know (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16228
> > Subject : BUG/boot failure on Dell Precision T3500
> (pci/ahci_stop_engine)
> > Submitter : Brian Bloniarz <brian.bloniarz@gmail.com>
> > Date : 2010-06-16 17:57 (97 days old)
> > Handled-By : Bjorn Helgaas <bjorn.helgaas@hp.com>
>
> The four patches here: https://patchwork.kernel.org/patch/189182/
> should resolve this problem. They've been tested by Stefan Becker.
Patch : https://patchwork.kernel.org/patch/189182/ Patch : https://patchwork.kernel.org/patch/189232/ Patch : https://patchwork.kernel.org/patch/189242/ Patch : https://patchwork.kernel.org/patch/189252/ FYI: I just updated my T3500 to the latest BIOS A08 from Dell. It still doesn't boot 2.6.34 without the proposed patches. So still no BIOS fix... Thanks for the update. I think it's critical that we make Linux tolerate this issue. Even if there were a BIOS fix, we shouldn't require users to update their machines. same issue here, Dell Precision T3500 BIOS A07 (will upgrade to A08 as soon as I post) after upgrading to 10.10 Maverick. I'll have to recall how to apply patches and compile the kernel, last time I've done it was years ago RHEL 5 or 6, so spoiled now by Ubuntu but found this yet in another thread: https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/647043/comments/14 which has a link to a patched test kernel, with the same list of patches: > I had Lisa do a series of tests. We subsequently discovered a series of > patches that have been submitted > upstream but are a bit invasive (ie. they won't land in upstream 2.6.36, and > will likely have to wait for 2.6.37). > > http://marc.info/?l=linux-kernel&m=128476278029918&w=2 > Patch : https://patchwork.kernel.org/patch/189182/ > Patch : https://patchwork.kernel.org/patch/189232/ > Patch : https://patchwork.kernel.org/patch/189242/ > Patch : https://patchwork.kernel.org/patch/189252/ > > I've subsequently built a test kernel with the above patches applied. Lisa > has confirmed this fixes the issue > and the USB ports work with this test kernel. The test kernel can be found at > the following: > > http://people.canonical.com/~ogasawara/lp647043/i386/ > > I'll have to discuss with the Ubuntu Kernel SRU team if they will qualify for > a Stable Release Update to Maverick. -avi just FYI, did the A08 BIOS upgrade, and as mentioned earlier, the A08 BIOS update did not resolve the problem. pci=nocrs does work with the 10.10 Ubuntu kernel and A08 BIOS found a report that A08 does not work with the older 10.04 Kernel, but I did not try the older kernel with A08. did not yet try the test patched kernel, since I can boot with pci=nocrs and I need to get some real work done... -avi Do these need to go into stable? commit 1af3c2e45e7a641e774bbb84fa428f2f0bf2d9c9 Author: Bjorn Helgaas <bjorn.helgaas@hp.com> Date: Tue Oct 26 15:41:54 2010 -0600 x86: allocate space within a region top-down commit dc9887dc02e37bcf83f4e792aa14b07782ef54cf Author: Bjorn Helgaas <bjorn.helgaas@hp.com> Date: Tue Oct 26 15:41:44 2010 -0600 x86/PCI: allocate space from the end of a region, not the beginning commit b126b4703afa4010b161784a43650337676dd03b Author: Bjorn Helgaas <bjorn.helgaas@hp.com> Date: Tue Oct 26 15:41:39 2010 -0600 PCI: allocate bus resources from the top down These fixes appeared in 2.6.37-rc1. Confirmation that they still work would be good, then after we get more runtime, we might consider putting the whole lot into stable (the patches mentioned in comment 54, plus several related ones). I tested patch V5 on top of kernel-2.6.34.7-59.fc13.x86_64. The T3500 booted with pci=use_crs. @Bjorn: tested 2.6.37-rc1 from yesterday's Linus git repo and it works fine on my T3500 (2.6.34 to 36 had the above problems). So good fix. I'd just like to say that I hope this fix can be backported into stable. There are a fairly large number of people unable to boot a new kernel after upgrading to Ubuntu 10.10, and the problems seem related: http://ubuntuforums.org/showthread.php?t=1595809 Hi, I'm one of those people - I've been running 2.6.32.26 on a 64-bit Maverick Installation on my Dell T3500, since I cannot boot with 2.6.35 (and have been unable to do so since upgrading to 10.10 in October). Any chance that this will be backported before Natty is released? Thanks. The most likely change for -stable would be to make "pci=nocrs" the default, as discussed here: https://bugzilla.kernel.org/show_bug.cgi?id=22132#c8 That could easily be done in the Ubuntu distro kernel as well, and probably doesn't depend on the change being in -stable itself. I'm not involved in Ubuntu kernel maintenance, so I don't know what it would take to make that happen. In the meantime, using "pci=nocrs" manually should be a workaround that would allow you to boot newer kernels. Created attachment 40472 [details]
avoid e820 regions
We've gone around and around on this, and the current proposal is to
apply the attached patch to 2.6.37-rc6.
If anybody could validate that this fixes the bug, that would be great.
@Bjorn: tested current git both with and without your patch and both appear to work on Dell T3500 Thanks, Bernhard! Current git (e.g., a3383e8372c) without my patch works because it allocates top-down and avoids the 0xbfe4dc00-0xc0000000 reserved region that way. With my patch, it allocates bottom-up and avoids that region because it's mentioned in the E820 table. |