Bug 17011

Summary: Toshiba Laptop won't boot if I don't provide pci=nocrs on command line
Product: Platform Specific/Hardware
Reporter: Anisse Astier (anisse)
Component: i386
Assignee: Bjorn Helgaas (bjorn.helgaas)
Status: CLOSED CODE_FIX
Severity: normal
CC: anisse, bjorn.helgaas, florian, jbarnes, pmd.lotr.gandalf, rjw, sreenivasa-reddy.berahalli
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.34
Regression: Yes
Attachments: DSDT, lspci, and a patch to fix the problem
Dmesg with pci=nocrs
Dmesg on 2.6.34 with pci=nocrs
coalesce PCI host bridge windows
Dmesg log with given patch on 2.6.35
Dmesg with x86-pci-coalesce-overlapping-host-bridge-windows.patch

Description Anisse Astier 2010-08-25 16:46:01 UTC
Laptop model: Toshiba L670D-10P. Please find lspci -nnvvv and DSDT in attachment.


I didn't bisect anything, but my guess is that the offending commit is:

commit 7bc5e3f2be32ae6fb0c74cd0f707f986b3a01a26
Author: Bjorn Helgaas <bjorn.helgaas@hp.com>
Date:   Tue Feb 23 10:24:41 2010 -0700

    x86/PCI: use host bridge _CRS info by default on 2008 and newer machines
    
    The main benefit of using ACPI host bridge window information is that
    we can do better resource allocation in systems with multiple host bridges,
    e.g., http://bugzilla.kernel.org/show_bug.cgi?id=14183
    
    Sometimes we need _CRS information even if we only have one host bridge,
    e.g., https://bugs.launchpad.net/ubuntu/+source/linux/+bug/341681
    
    Most of these systems are relatively new, so this patch turns on
    "pci=use_crs" only on machines with a BIOS date of 2008 or newer.
    
    Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
    Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


So I guess the assumption is wrong: this 2010 laptop still doesn't report _CRS information correctly. I don't know whether we should revert this commit or add a DMI quirk, as in the patch I propose in the attachment (the patch still needs a more detailed explanation, though).
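The commit's policy is simple enough to state as code. The sketch below is illustrative only: `should_use_crs()`, `parse_pci_option()` and `enum crs_override` are hypothetical names, not the kernel's actual code. It just shows how a BIOS-date default interacts with the `pci=use_crs` and `pci=nocrs` overrides.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical encoding of the "pci=" command-line override. */
enum crs_override { CRS_DEFAULT, CRS_FORCE_ON, CRS_FORCE_OFF };

/*
 * Policy described in the commit message: use host bridge _CRS windows
 * by default when the BIOS date is 2008 or newer, unless the user
 * overrides it on the command line.
 */
static bool should_use_crs(int bios_year, enum crs_override override)
{
    switch (override) {
    case CRS_FORCE_ON:
        return true;            /* "pci=use_crs" */
    case CRS_FORCE_OFF:
        return false;           /* "pci=nocrs" */
    default:
        return bios_year >= 2008;
    }
}

/* Hypothetical parser for the value of a single "pci=" option. */
static enum crs_override parse_pci_option(const char *opt)
{
    if (opt && strcmp(opt, "use_crs") == 0)
        return CRS_FORCE_ON;
    if (opt && strcmp(opt, "nocrs") == 0)
        return CRS_FORCE_OFF;
    return CRS_DEFAULT;
}
```

Under this policy a 2010 machine such as this laptop uses _CRS by default, which is why booting it with `pci=nocrs` works around the problem.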
Comment 1 Anisse Astier 2010-08-25 16:48:04 UTC
Created attachment 27951
DSDT, lspci, and a patch to fix the problem

Forgot the attachment. Here it is.
Comment 2 Bjorn Helgaas 2010-08-25 17:11:46 UTC
Thanks for the report!  Can you please attach a complete dmesg log when
using "pci=use_crs" (use attachment type text/plain)?

I don't know what the problem is yet, but I want to figure it out and
fix it rather than add a quirk to just avoid it.  Windows works on this
box, so I think we'll be better off if we can fix a Linux bug or at
least make it tolerate the same things Windows tolerates.
Comment 3 Anisse Astier 2010-08-26 09:16:17 UTC
I forgot to add that I could boot it correctly with 2.6.33, but not with .34 or .35 (that's why I marked it as a regression, although this machine didn't exist when .33 was released).

I wish I could provide a dmesg log, but I already had trouble debugging this problem because I couldn't get one. Netconsole starts well after the point where the boot fails; I used printk boot_delay to be able to read everything, and saw your message advising to use pci=nocrs.

I could try to make a video, but I'd need the proper equipment to get a clean recording. Let me figure it out.

Or do you have a better idea? I don't have access to a serial port on this laptop.
Comment 4 Anisse Astier 2010-08-26 12:18:28 UTC
Here is what I could get:
http://dl.free.fr/coZV9s47p
Quality is very low, but we can see at least one ACPI error.
Comment 5 Bjorn Helgaas 2010-08-26 15:07:52 UTC
Oh, I'm sorry!  I meant a dmesg log from a boot with "pci=nocrs",
not with "pci=use_crs".  I think your machine boots fine with
"pci=nocrs", doesn't it?  I'll look at the video, too, but the
text log will be easier.
Comment 6 Anisse Astier 2010-08-26 15:20:21 UTC
Created attachment 28031
Dmesg with pci=nocrs

Indeed, we misunderstood each other. Here is the boot log with pci=nocrs. Obviously the video was of the pci=use_crs case.
Comment 7 Bjorn Helgaas 2010-09-06 03:11:21 UTC
You understood me perfectly; I just asked for the wrong thing :-)

Anyway, thanks for the video, it had exactly what I needed.  Your
BIOS reports these windows:

    pci_root PNP0A03:00: host bridge window [mem 0xb0000000-0xffffffff]
    pci_root PNP0A03:00: host bridge window [mem 0xafffffff-0xdfffffff]
    pci_root PNP0A03:00: host bridge window [mem 0xf0000000-0xffffffff]

Notice that the second and third windows overlap the first one.  And the
second one starts one byte before the first one (I'm pretty sure this is
a BIOS bug, because 0xafffffff is an awfully unlikely alignment for the
beginning of a window).
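To make the overlap concrete, here is a small C sketch that classifies a pair of windows as disjoint, nested, or partially overlapping. `struct window` and `classify()` are hypothetical helpers, not kernel code; ranges are inclusive, matching how dmesg prints them.

```c
#include <assert.h>
#include <stdint.h>

/* An inclusive address range [start, end], as printed in dmesg. */
struct window {
    uint64_t start;
    uint64_t end;
};

enum relation { DISJOINT, A_CONTAINS_B, B_CONTAINS_A, PARTIAL_OVERLAP };

/* Classify how two inclusive ranges relate to each other. */
static enum relation classify(struct window a, struct window b)
{
    assert(a.start <= a.end && b.start <= b.end);
    if (a.end < b.start || b.end < a.start)
        return DISJOINT;
    if (a.start <= b.start && b.end <= a.end)
        return A_CONTAINS_B;    /* b is a subrange of a */
    if (b.start <= a.start && a.end <= b.end)
        return B_CONTAINS_A;    /* a is a subrange of b */
    return PARTIAL_OVERLAP;     /* they overlap, but neither contains the other */
}
```

Applied to the three memory windows above, the first and second windows partially overlap (the second starts exactly one byte below the first), the third is nested inside the first, and the second and third are disjoint.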

Linux currently discards the second window because of this collision:

    pci_root PNP0A03:00: address space collision: host bridge window [mem 0xafffffff-0xdfffffff] conflicts with PCI Bus 0000:00 [mem 0xb0000000-0xffffffff]

That's pretty benign.  But we insert the third window as a *child* of the
first, since it's a proper subrange.  Then later, we find more collisions,
I think because we expect the PCI devices to be children of the first
window, not grandchildren:

    pci 0000:00:01.0: address space collision: [mem 0xff300000-0xff4fffff] conflicts with PCI Bus 0000:00 [mem 0xf0000000-0xffffffff]
    pci 0000:00:06.0: address space collision: [mem 0xff600000-0xff6fffff] conflicts with PCI Bus 0000:00 [mem 0xf0000000-0xffffffff]
    ...

I experimented with Windows in a similar configuration, and it keeps all
the host bridge windows and leaves the PCI devices where they are.

I think we need to figure out how to handle this gracefully in Linux.
Resources are currently strictly hierarchical, and these don't quite fit
that model.
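A toy model shows why a strictly hierarchical tree trips over these windows. The sketch below is a simplified resource tree, assuming two operations loosely modelled on the behaviour described above: `tree_insert()` descends into any existing range that fully contains the new one (as happened to the third window), and `tree_request()` claims a range directly under a given parent and fails on any overlap with that parent's children. All names are hypothetical; this is not the kernel's resource code.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NODES 16

/* Simplified resource tree node: inclusive range plus index of its parent. */
struct res_node {
    uint64_t start, end;
    int parent;                 /* -1 for the root */
};

struct res_tree {
    struct res_node nodes[MAX_NODES];
    int count;
};

enum res_result { RES_OK, RES_COLLISION, RES_FULL };

static void tree_init(struct res_tree *t)
{
    /* Node 0 is the root and covers the whole address space. */
    t->nodes[0] = (struct res_node){ 0, UINT64_MAX, -1 };
    t->count = 1;
}

/* Index of the first child of `parent` overlapping [start, end], or -1. */
static int find_overlap(const struct res_tree *t, int parent,
                        uint64_t start, uint64_t end)
{
    for (int i = 1; i < t->count; i++) {
        const struct res_node *n = &t->nodes[i];
        if (n->parent == parent && start <= n->end && n->start <= end)
            return i;
    }
    return -1;
}

static enum res_result add_child(struct res_tree *t, int parent,
                                 uint64_t start, uint64_t end, int *out)
{
    assert(start <= end);
    if (t->count == MAX_NODES)
        return RES_FULL;
    t->nodes[t->count] = (struct res_node){ start, end, parent };
    if (out)
        *out = t->count;
    t->count++;
    return RES_OK;
}

/* Claim [start, end] directly under `parent`; any overlap is a collision. */
static enum res_result tree_request(struct res_tree *t, int parent,
                                    uint64_t start, uint64_t end,
                                    int *out, int *conflict)
{
    int c = find_overlap(t, parent, start, end);
    if (c >= 0) {
        if (conflict)
            *conflict = c;
        return RES_COLLISION;
    }
    return add_child(t, parent, start, end, out);
}

/*
 * Insert [start, end] below the root, descending into any node that fully
 * contains it. A partial overlap is a collision. (This sketch does not
 * handle a new range that would enclose existing nodes.)
 */
static enum res_result tree_insert(struct res_tree *t, uint64_t start,
                                   uint64_t end, int *out, int *conflict)
{
    int parent = 0;
    int c;
    while ((c = find_overlap(t, parent, start, end)) >= 0) {
        const struct res_node *n = &t->nodes[c];
        if (!(n->start <= start && end <= n->end)) {
            if (conflict)
                *conflict = c;
            return RES_COLLISION;
        }
        parent = c;             /* proper subrange: becomes a child of n */
    }
    return add_child(t, parent, start, end, out);
}
```

With the Toshiba windows, the second window collides with the first, the third becomes a child of the first, and a device range such as [mem 0xff300000-0xff4fffff] requested directly under the first window then conflicts with the third, mirroring the "address space collision" messages above.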
Comment 8 Pramod Dematagoda 2010-09-06 08:02:27 UTC
I have this same problem on my Dell Inspiron M501R.

I have attached my dmesg log when using the pci=nocrs option for the kernel.

I will attach a video as well.
Comment 9 Pramod Dematagoda 2010-09-06 08:04:29 UTC
Created attachment 29082
Dmesg on 2.6.34 with pci=nocrs
Comment 10 Pramod Dematagoda 2010-09-06 08:32:19 UTC
Video of kernel booting up:
http://www.mediafire.com/file/qanzajgk1aqcvmm/06092010012.mp4

The quality isn't really good, sorry about that.
Comment 11 Bjorn Helgaas 2010-09-06 17:53:28 UTC
Thanks, Pramod!  I'm quite sure you're seeing the same problem.  You have
these windows, which overlap in exactly the same way they do on the
Toshiba L670D-10P:

  pci_root PNP0A03:00: host bridge window [mem 0xc0000000-0xffffffff] (ignored)
  pci_root PNP0A03:00: host bridge window [mem 0xbfffffff-0xdfffffff] (ignored)
  pci_root PNP0A03:00: host bridge window [mem 0xf0000000-0xffffffff] (ignored)

As soon as I resolve https://bugzilla.kernel.org/show_bug.cgi?id=16228, I'll
get to work on this one.
Comment 12 Bjorn Helgaas 2010-09-20 22:02:46 UTC
Created attachment 30862
coalesce PCI host bridge windows

Can you please test this patch and attach the resulting dmesg?
Comment 13 Pramod Dematagoda 2010-09-21 02:44:52 UTC
Tested this patch with 2.6.35 vanilla and it booted up perfectly.

The dmesg log is attached.

Thanks. :)
Comment 14 Pramod Dematagoda 2010-09-21 02:45:58 UTC
Created attachment 30882
Dmesg log with given patch on 2.6.35
Comment 15 Anisse Astier 2010-09-22 15:38:01 UTC
Created attachment 30972
Dmesg with x86-pci-coalesce-overlapping-host-bridge-windows.patch

It works!

[    0.195643] pci_root PNP0A03:00: host bridge window [io  0x0000-0x0cf7]
[    0.195648] pci_root PNP0A03:00: host bridge window [io  0x0d00-0xffff]
[    0.195652] pci_root PNP0A03:00: host bridge window [mem 0x000a0000-0x000bffff]
[    0.195657] pci_root PNP0A03:00: host bridge window [mem 0x000c0000-0x000fffff]
[    0.195662] pci_root PNP0A03:00: host bridge window [mem 0xb0000000-0xffffffff]
[    0.195667] pci_root PNP0A03:00: host bridge window [mem 0xafffffff-0xdfffffff]
[    0.195672] pci_root PNP0A03:00: host bridge window [mem 0xf0000000-0xffffffff]
[    0.195677] pci_root PNP0A03:00: host bridge window expanded to [mem 0xafffffff-0xffffffff]; [mem 0xafffffff-0xdfffffff] ignored
[    0.195684] pci_root PNP0A03:00: host bridge window expanded to [mem 0xafffffff-0xffffffff]; [mem 0xf0000000-0xffffffff] ignored



Tested-by: Anisse Astier <anisse@astier.eu>


Thanks
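The "expanded to ... ignored" lines in the log above come from the coalescing patch. Here is a minimal sketch of the idea, assuming overlapping windows of the same type are simply merged into the earlier one; `struct host_window` and `coalesce_windows()` are hypothetical names, not the patch's actual code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

enum win_type { WIN_IO, WIN_MEM };

/* A host bridge window; `ignored` marks windows folded into another one. */
struct host_window {
    enum win_type type;
    uint64_t start, end;        /* inclusive, as printed in dmesg */
    int ignored;
};

/*
 * For each window, absorb every later window of the same type that
 * overlaps it: grow the earlier window to cover both and mark the later
 * one ignored. Windows that are merely adjacent are left alone.
 */
static void coalesce_windows(struct host_window *w, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        if (w[i].ignored)
            continue;
        for (int j = i + 1; j < n; j++) {
            if (w[j].ignored || w[j].type != w[i].type)
                continue;
            if (w[j].end < w[i].start || w[i].end < w[j].start)
                continue;       /* no overlap */
            if (w[j].start < w[i].start)
                w[i].start = w[j].start;
            if (w[j].end > w[i].end)
                w[i].end = w[j].end;
            w[j].ignored = 1;
            const char *name = w[i].type == WIN_MEM ? "mem" : "io";
            printf("host bridge window expanded to [%s 0x%llx-0x%llx]; "
                   "[%s 0x%llx-0x%llx] ignored\n",
                   name, (unsigned long long)w[i].start,
                   (unsigned long long)w[i].end,
                   name, (unsigned long long)w[j].start,
                   (unsigned long long)w[j].end);
            j = i;              /* window i grew: rescan the later windows */
        }
    }
}
```

Fed the seven windows from the log above, this merges [mem 0xafffffff-0xdfffffff] and [mem 0xf0000000-0xffffffff] into [mem 0xafffffff-0xffffffff] and leaves the adjacent but non-overlapping [mem 0x000a0000-0x000bffff] and [mem 0x000c0000-0x000fffff] alone. Restarting the inner scan after each merge means a window that only overlaps once the earlier window has grown is still caught.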
Comment 16 Pramod Dematagoda 2010-11-03 00:27:14 UTC
The problem is still there in kernel 2.6.37-rc1. Has this patch been submitted upstream yet?
Comment 17 Bjorn Helgaas 2010-11-16 15:25:47 UTC
The patch didn't make it in 2.6.37-rc1, but it is in -rc2.  Can you give
that a try?
Comment 18 Bjorn Helgaas 2010-12-03 16:26:45 UTC
Can anybody test 2.6.37-rc2 or later and confirm that this problem is fixed?
Thanks!
Comment 19 Anisse Astier 2010-12-03 18:25:01 UTC
Sorry for the delay. I starred your first notification email, but didn't have time to do the test.

I won't have access to the hardware before Monday, but it should be pretty quick to test then.
Comment 20 Anisse Astier 2010-12-06 09:45:08 UTC
I just tested with 2.6.37-rc4+ (Linus' tree as of Dec 2), and it works as expected on Toshiba L670D:


[    0.177326] pci_root PNP0A03:00: host bridge window [io  0x0000-0x0cf7]
[    0.177326] pci_root PNP0A03:00: host bridge window [io  0x0d00-0xffff]
[    0.177326] pci_root PNP0A03:00: host bridge window [mem 0x000a0000-0x000bffff]
[    0.177326] pci_root PNP0A03:00: host bridge window [mem 0x000c0000-0x000fffff]
[    0.177326] pci_root PNP0A03:00: host bridge window [mem 0xb0000000-0xffffffff]
[    0.177326] pci_root PNP0A03:00: host bridge window [mem 0xafffffff-0xdfffffff]
[    0.177326] pci_root PNP0A03:00: host bridge window [mem 0xf0000000-0xffffffff]
[    0.177326] pci_root PNP0A03:00: host bridge window expanded to [mem 0xafffffff-0xffffffff]; [mem 0xafffffff-0xdfffffff] ignored
[    0.177326] pci_root PNP0A03:00: host bridge window expanded to [mem 0xafffffff-0xffffffff]; [mem 0xf0000000-0xffffffff] ignored


Thanks