Bug 15744

Summary: [2.6.34-rc1 REGRESSION] ahci 0000:00:1f.2: controller reset failed (0xffffffff)
Product: Drivers Reporter: Maciej Rutecki (maciej.rutecki)
Component: PCIAssignee: drivers_pci (drivers_pci)
Status: CLOSED CODE_FIX    
Severity: normal CC: adi, bjorn.helgaas, chemobejk, GooseYArd, jbarnes, maciej.rutecki, rjw
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.34-rc1 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 15310    
Attachments: dmesg with Yinghai's patch
dmesg.reserve-vga-e820-v2
dmesg with request_resource_conflict patch
dmesg with debugging patch from https://bugzilla.kernel.org/show_bug.cgi?id=15533#c5
acpidump output from t3400 bios a04
dmesg with debug, this time with feeling (and correct spelling)
dmesg with patches suggested in Message-ID: <4BC51312.6080302@oracle.com>
/proc/iomem from 2.6.34-rc3-00412-g0abe3d1
dmesg with yinghai patch v4
framework for testing patch on non-Dell system
test patch to avoid PCI allocation below 1MB
dmesg-avoid-pci-below-1MB
with bjorn's patch, ahci enabled, acpi debug stuff off, _crs on
Log (1/2): Dell T3500 BIOS A07, ACPI debug on, AHCI failed
Log (2/2): Dell T3500 BIOS A07, ACPI debug on, pci=nocrs, AHCI works

Description Maciej Rutecki 2010-04-09 19:08:20 UTC
Subject    : [2.6.34-rc1 REGRESSION] ahci 0000:00:1f.2: controller reset failed (0xffffffff)
Submitter  : Andy Isaacson <adi@hexapodia.org>
Date       : 2010-04-06 22:54
Message-ID : 20100406225425.GA8292@hexapodia.org
References : http://marc.info/?l=linux-kernel&m=127059449031511&w=2

This entry is being used for tracking a regression from 2.6.33.  Please don't
close it until the problem is fixed in the mainline.
Comment 1 adi 2010-04-09 22:31:37 UTC
Created attachment 25933 [details]
dmesg with Yinghai's patch

Bootlog with patch from
Message-ID: <4BBF626A.3060800@oracle.com>
Comment 2 adi 2010-04-10 00:01:53 UTC
Created attachment 25934 [details]
dmesg.reserve-vga-e820-v2

dmesg with -v2 patch from <4BBFB1D8.6090802@oracle.com>
Comment 3 adi 2010-04-10 02:06:03 UTC
Created attachment 25935 [details]
dmesg with request_resource_conflict patch

http://lkml.org/lkml/2010/4/9/428

On Fri, Apr 09, 2010 at 06:10:49PM -0700, Yinghai wrote:
> in addition to -v2 patch
> 
> please apply this patch too
> 
> also please boot with "debug" in boot command line.
> 
> Thanks
> 
> Yinghai
> 
> [PATCH] x86,acpi: use request_resource instead of instead of insert_resource
> 
> So make pci root resouce from _CRS honor the one We reserve in e820 below 1M
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> 
> ---
>  arch/x86/pci/acpi.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Index: linux-2.6/arch/x86/pci/acpi.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/pci/acpi.c
> +++ linux-2.6/arch/x86/pci/acpi.c
> @@ -168,7 +168,7 @@ setup_resource(struct acpi_resource *acp
>               return AE_OK;
>       }
> 
> -     conflict = insert_resource_conflict(root, res);
> +     conflict = request_resource_conflict(root, res);
>       if (conflict) {
>               dev_err(&info->bridge->dev,
>                       "address space collision: host bridge window %pR "

That works.  dmesg.gz attached (I'm pretty sure I'm skating close to
vger's size limit without gz).  "debug" didn't seem to make much
difference?
-andy
Comment 4 adi 2010-04-12 19:26:10 UTC
Created attachment 25969 [details]
dmesg with debugging patch from https://bugzilla.kernel.org/show_bug.cgi?id=15533#c5
Comment 5 adi 2010-04-12 19:31:40 UTC
Created attachment 25970 [details]
acpidump output from t3400 bios a04
Comment 6 adi 2010-04-12 19:39:38 UTC
Created attachment 25971 [details]
dmesg with debug, this time with feeling (and correct spelling)

s/debug_levell/debug_level/
Comment 7 adi 2010-04-14 19:27:23 UTC
Created attachment 26002 [details]
dmesg with patches suggested in Message-ID: <4BC51312.6080302@oracle.com>

My git log -p just to be clear:

commit 0abe3d1ae21d8dd2afc5a3fa7f1527f0592cb0a1
Author: Yinghai Lu <yinghai@kernel.org>
Date:   Wed Apr 14 11:52:02 2010 -0700

    x86, resource: Add reserve_region_with_split_check_child()
    
    It will cover the whole region to BUSY, except that some regions that have
    children under them.
    
    those children normally is PCI bar but it is falling into E820_RESERVED.
    We can not put BUSY on them, otherwise driver can not use pci_request_region()
    later
    
    /proc/iomem will have
    00010000-00094fff : System RAM
    00095000-0009ffff : reserved
    000a0000-000bffff : PCI Bus 0000:00
      000a0000-000bffff : reserved
    000c0000-000cffff : reserved
    000d0000-000dffff : PCI Bus 0000:00
      000d0000-000dffff : reserved
    000e0000-000fffff : reserved
    
    Signed-off-by: Yinghai Lu <yinghai@kernel.org>

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 7bca3c6..47fe371 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -1094,7 +1094,7 @@ void __init e820_reserve_resources(void)
                 * pci device BAR resource and insert them later in
                 * pcibios_resource_survey()
                 */
-               if (e820.map[i].type != E820_RESERVED || res->start < (1ULL<<20)) {
+               if (e820.map[i].type != E820_RESERVED) {
                        res->flags |= IORESOURCE_BUSY;
                        insert_resource(&iomem_resource, res);
                }
@@ -1135,8 +1135,12 @@ void __init e820_reserve_resources_late(void)
 
        res = e820_res;
        for (i = 0; i < e820.nr_map; i++) {
-               if (!res->parent && res->end)
-                       insert_resource_expand_to_fit(&iomem_resource, res);
+               if (!res->parent && res->end) {
+                       if (res->start < (1ULL<<20)) {
+                               reserve_region_with_split_check_child(&iomem_resource, res->start, res->end, res->name);
+                       } else
+                               insert_resource_expand_to_fit(&iomem_resource, res);
+               }
                res++;
        }
 
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 26fad18..2acd9c4 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -119,6 +119,9 @@ void release_child_resources(struct resource *new);
 extern void reserve_region_with_split(struct resource *root,
                             resource_size_t start, resource_size_t end,
                             const char *name);
+void reserve_region_with_split_check_child(struct resource *root,
+                            resource_size_t start, resource_size_t end,
+                            const char *name);
 extern struct resource *insert_resource_conflict(struct resource *parent, struct resource *new);
 extern int insert_resource(struct resource *parent, struct resource *new);
 extern void insert_resource_expand_to_fit(struct resource *root, struct resource *new);
diff --git a/kernel/resource.c b/kernel/resource.c
index 9c358e2..2b61f74 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -608,7 +608,7 @@ int adjust_resource(struct resource *res, resource_size_t start, resource_size_t
 
 static void __init __reserve_region_with_split(struct resource *root,
                resource_size_t start, resource_size_t end,
-               const char *name)
+               const char *name, bool check_child)
 {
        struct resource *parent = root;
        struct resource *conflict;
@@ -630,13 +630,18 @@ static void __init __reserve_region_with_split(struct resource *root,
        kfree(res);
 
        /* conflict covered whole area */
-       if (conflict->start <= start && conflict->end >= end)
+       if (conflict->start <= start && conflict->end >= end) {
+               if (check_child && !conflict->child && strstr(conflict->name, "PCI Bus"))
+                       __reserve_region_with_split(conflict, start, end, name, false);
                return;
+       }
 
        if (conflict->start > start)
-               __reserve_region_with_split(root, start, conflict->start-1, name);
+               __reserve_region_with_split(root, start, conflict->start-1, name, check_child);
        if (conflict->end < end)
-               __reserve_region_with_split(root, conflict->end+1, end, name);
+               __reserve_region_with_split(root, conflict->end+1, end, name, check_child);
+       if (check_child && !conflict->child && strstr(conflict->name, "PCI Bus"))
+               __reserve_region_with_split(conflict, conflict->start, conflict->end, name, false);
 }
 
 void __init reserve_region_with_split(struct resource *root,
@@ -644,7 +649,16 @@ void __init reserve_region_with_split(struct resource *root,
                const char *name)
 {
        write_lock(&resource_lock);
-       __reserve_region_with_split(root, start, end, name);
+       __reserve_region_with_split(root, start, end, name, false);
+       write_unlock(&resource_lock);
+}
+
+void __init reserve_region_with_split_check_child(struct resource *root,
+               resource_size_t start, resource_size_t end,
+               const char *name)
+{
+       write_lock(&resource_lock);
+       __reserve_region_with_split(root, start, end, name, true);
        write_unlock(&resource_lock);
 }
 

commit b6d8921288154127928d25a6eca4a0d3fb9ae0e4
Author: Yinghai Lu <yinghai@kernel.org>
Date:   Wed Apr 14 11:52:02 2010 -0700

    Update e820 at first, and later put them resource tree.
    
    Reserved that early, will not be allocated to unassigned PCI BAR
    
    Signed-off-by: Yinghai Lu <yinghai@kernel.org>
    Cc: Guenter Roeck <guenter.roeck@ericsson.com>
    Cc: Andy Isaacson <adi@hexapodia.org>
    Tested-by: Andy Isaacson <adi@hexapodia.org>
    Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
    Acked-by: H. Peter Anvin <hpa@zytor.com>

diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index 86b1506..f4c0fe4 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -44,7 +44,6 @@ static inline void visws_early_detect(void) { }
 extern unsigned long saved_video_mode;
 
 extern void reserve_standard_io_resources(void);
-extern void i386_reserve_resources(void);
 extern void setup_default_timer_irq(void);
 
 #ifdef CONFIG_X86_MRST
diff --git a/arch/x86/kernel/head32.c b/arch/x86/kernel/head32.c
index b2e2460..966b37f 100644
--- a/arch/x86/kernel/head32.c
+++ b/arch/x86/kernel/head32.c
@@ -22,7 +22,6 @@ static void __init i386_default_early_setup(void)
 {
        /* Initilize 32bit specific setup functions */
        x86_init.resources.probe_roms = probe_roms;
-       x86_init.resources.reserve_resources = i386_reserve_resources;
        x86_init.mpparse.setup_ioapic_ids = setup_ioapic_ids_from_mpc;
 
        reserve_ebda_region();
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index c4851ef..1aa1ebe 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -693,7 +693,7 @@ static void __init trim_bios_range(void)
         * area (640->1Mb) as ram even though it is not.
         * take them out.
         */
-       e820_remove_range(BIOS_BEGIN, BIOS_END - BIOS_BEGIN, E820_RAM, 1);
+       e820_add_region(BIOS_BEGIN, BIOS_END - BIOS_BEGIN, E820_RESERVED);
        sanitize_e820_map(e820.map, ARRAY_SIZE(e820.map), &e820.nr_map);
 }
 
@@ -1052,20 +1052,3 @@ void __init setup_arch(char **cmdline_p)
 
        mcheck_init();
 }
-
-#ifdef CONFIG_X86_32
-
-static struct resource video_ram_resource = {
-       .name   = "Video RAM area",
-       .start  = 0xa0000,
-       .end    = 0xbffff,
-       .flags  = IORESOURCE_BUSY | IORESOURCE_MEM
-};
-
-void __init i386_reserve_resources(void)
-{
-       request_resource(&iomem_resource, &video_ram_resource);
-       reserve_standard_io_resources();
-}
-
-#endif /* CONFIG_X86_32 */

commit d620a7cf05d4f12f5bbb1060d766e8139ab31458
Merge: 2aedd19 45c4d01
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Thu Apr 8 10:02:02 2010 -0700

    Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev
    
    * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
      libata: Fix accesses at LBA28 boundary (old bug, but nasty) (v2)
Comment 8 adi 2010-04-14 19:42:15 UTC
Created attachment 26003 [details]
/proc/iomem from 2.6.34-rc3-00412-g0abe3d1
Comment 9 adi 2010-04-21 19:30:49 UTC
Created attachment 26084 [details]
dmesg with yinghai patch v4

dmesg with patch series from http://lkml.org/lkml/2010/4/21/22
Comment 10 Bjorn Helgaas 2010-04-26 17:34:14 UTC
Created attachment 26148 [details]
framework for testing patch on non-Dell system

The problem is that BIOS placed the AHCI device somewhere outside all the host bridge windows it reported via _CRS.  This is really a BIOS bug, but there's also a bug in the way Linux deals with this situation.

Linux moved the AHCI device into a host bridge window, but the one we chose was at 0xa0000 (see http://marc.info/?l=linux-kernel&m=127059449031511&w=2).  This was not a good choice because there's already a VGA controller there.

I tested this scenario on Windows using QEMU and SeaBIOS, and Windows also moved the PCI device into one of the host bridge windows.  It uses a different allocation algorithm that appears to allocate from high addresses first.  In any event, it did not move the PCI device to 0xa0000.

This patch is a kludge for testing possible Linux fixes on an HP DL360.  It fabricates host bridge windows in the correct order and pretends a PCI device was programmed to be outside them.  With this patch, we see the following:

  pci_root PNP0A03:00: host bridge window [mem 0x000a0000-0x000bffff]
  pci_root PNP0A03:00: host bridge window [mem 0xf5d00000-0xf6ffffff]
  pci_root PNP0A03:01: host bridge window [mem 0xf7e00000-0xf7efffff]
  pci_root PNP0A03:02: host bridge window [mem 0xf7f00000-0xf7ffffff]

  pci 0000:00:05.2: reg 14: BIOS left at 0xf5f60000, moving to 0xf5c00000
  pci 0000:00:05.2: reg 14: [mem 0xf5c00000-0xf5c007ff]
  pci 0000:00:05.2: no compatible bridge window for [mem 0xf5c00000-0xf5c007ff]
  pci 0000:00:05.2: BAR 1: assigned [mem 0x000a0000-0x000a07ff]

We pretended the 00:05.2 device was at 0xf5c00000, which is outside all of the windows.  Linux moved it to the first available space, which is at 0xa0000.
Comment 11 Bjorn Helgaas 2010-04-26 17:45:00 UTC
Created attachment 26149 [details]
test patch to avoid PCI allocation below 1MB

With this patch, I see the following on the DL360:

  pci 0000:00:05.2: reg 14: BIOS left at 0xf5f60000, moving to 0xf5c00000
  pci 0000:00:05.2: reg 14: [mem 0xf5c00000-0xf5c007ff]
  pci 0000:00:05.2: no compatible bridge window for [mem 0xf5c00000-0xf5c007ff]
  pci 0000:00:05.2: BAR 1: assigned [mem 0xf5d34000-0xf5d347ff]

This time, we moved the PCI BAR into the [mem 0xf5d00000-0xf6ffffff] host bridge window, not the legacy VGA window at [mem 0x000a0000-0x000bffff].  I think this is correct behavior.

Any other testing reports would be welcome; please apply just this patch and attach the resulting dmesg log to this bugzilla.
Comment 12 Bjorn Helgaas 2010-04-26 17:48:39 UTC
*** Bug 15841 has been marked as a duplicate of this bug. ***
Comment 13 Andy Bailey 2010-04-26 18:01:17 UTC
Good new Bjorn, all the testing I did with this and the patches from Yinghai were _after_ I updated the Bios on my machine, so it looks like it was your patch that did the trick. I neglected to build my current kernel with acpi debugging enabled, but I should be able to get you an updated boot log some time today or tomorrow.
Comment 14 adi 2010-04-26 18:11:13 UTC
Created attachment 26150 [details]
dmesg-avoid-pci-below-1MB

(In reply to comment #11)
> Created an attachment (id=26149) [details]
> test patch to avoid PCI allocation below 1MB

With this patch on top of c81eddb0e3728661d1585fbc564449c94165cc36 my T3400 boots fine.  dmesg attached.

-andy
Comment 15 Andy Bailey 2010-04-27 13:51:08 UTC
Created attachment 26160 [details]
with bjorn's patch, ahci enabled, acpi debug stuff off, _crs on
Comment 16 Bjorn Helgaas 2010-04-29 19:03:50 UTC
This should be fixed by:

  commit 55051feb57eba600b366006757304a0af3ada2bd
  Author: Bjorn Helgaas <bjorn.helgaas@hp.com>
  Date:   Fri Apr 23 17:05:24 2010 -0600

    x86/PCI: never allocate PCI MMIO resources below BIOS_END

This is in Linus' tree and should appear in 2.6.34-rc6 and 2.6.34.
Comment 17 adi 2010-05-01 05:34:53 UTC
(In reply to comment #16)
> This should be fixed by:
> 
>   commit 55051feb57eba600b366006757304a0af3ada2bd
>   Author: Bjorn Helgaas <bjorn.helgaas@hp.com>
>   Date:   Fri Apr 23 17:05:24 2010 -0600
> 
>     x86/PCI: never allocate PCI MMIO resources below BIOS_END
> 
> This is in Linus' tree and should appear in 2.6.34-rc6 and 2.6.34.

Yes, I've verified that be1066bbcd booted with no problems on the T3400.  This report can be closed.  (Er, I thought I could do that but apparently not -- there's no option to change the bug state.)

-andy
Comment 18 Stefan Becker 2010-08-03 07:51:38 UTC
Are you sure the fix made it to 2.6.34? I just tried the Fedora 13 test package for 2.6.34.1 on a Dell Precision T3500 with the latest A07 BIOS and ran into the exact same problem. The machine only boots up with "pci=nocrs".
Comment 19 adi 2010-08-03 08:07:41 UTC
(In reply to comment #18)
> Are you sure the fix made it to 2.6.34? I just tried the Fedora 13 test
> package
> for 2.6.34.1 on a Dell Precision T3500 with the latest A07 BIOS and ran into
> the exact same problem. The machine only boots up with "pci=nocrs".

The patch did make it in for 2.6.34, and it does work for me.  Can you boot with acpi.debug_level=0x10000 acpi.debug_layer=0x100 and capture the entire dmesg, and append it here?
Comment 20 Stefan Becker 2010-08-03 08:40:55 UTC
Created attachment 27329 [details]
Log (1/2): Dell T3500 BIOS A07, ACPI debug on, AHCI failed
Comment 21 Stefan Becker 2010-08-03 08:42:37 UTC
Created attachment 27330 [details]
Log (2/2): Dell T3500 BIOS A07, ACPI debug on, pci=nocrs, AHCI works

dmesg logs with ACPI debug on, as requested
Comment 22 Bjorn Helgaas 2010-08-03 15:01:36 UTC
Stefan, you are actually seeing a related but slightly different bug.
See bug 16228 comment 4 and 8.  I added you to the CC: list of that bug.
I hope to work on that this week, and I'll need some testers for it.

In both bugs, BIOS left the AHCI controller outside the host bridge
windows it reported via ACPI, and in both cases, Linux moved the
controller to try to fix that.  In this bug (15744), we moved it to
0xa0000, which overlapped the legacy VGA range, so commit 55051feb5
fixed it by making sure we never allocate new PCI resources below
BIOS_END (0x100000).

In bug 16228, we moved the ACHI controller to 0xbff00000, which is
inside an E820 map "reserved" region, and we somehow need to make sure
we don't allocate resources from those regions either.

It may be that if we figure out how to correctly handle these reserved
regions, i.e., "reserved" stuff from E820, legacy VGA, etc., we can
come up with a better fix that will cover both bugs at once.