Bug 23132 - without "pci=nocrs", Dell Inspiron 1545 hangs
Summary: without "pci=nocrs", Dell Inspiron 1545 hangs
Status: CLOSED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI
Hardware: All Linux
Importance: P1 normal
Assignee: Bjorn Helgaas
URL: http://lkml.org/lkml/2010/11/17/324
Keywords:
Depends on:
Blocks: 16444
Reported: 2010-11-17 19:18 UTC by Bjorn Helgaas
Modified: 2010-12-19 12:13 UTC
CC List: 4 users

See Also:
Kernel Version: 2.6.37-rc2
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
2.6.36 dmesg log (41.93 KB, text/plain)
2010-11-17 19:18 UTC, Bjorn Helgaas
2.6.37-rc2 log (41.82 KB, text/plain)
2010-11-18 08:27 UTC, David
debug patch to skip PCI assignments (2.48 KB, patch)
2010-12-10 23:16 UTC, Bjorn Helgaas
avoid e820 regions (22.39 KB, patch)
2010-12-16 22:27 UTC, Bjorn Helgaas

Description Bjorn Helgaas 2010-11-17 19:18:59 UTC
Created attachment 37352 [details]
2.6.36 dmesg log

[From email:]

Commit dc9887dc ("x86/PCI: allocate space from the end of a region, not
the beginning") causes the kernel to hang on my Dell Inspiron 1545.
2.6.36 works fine, as does booting with pci=nocrs. Reverting this commit
also works. I am attaching the kernel log and system iomem from the working
2.6.36. The hang occurs consistently at this location:

pci-stub: invalid id string ""
ACPI: AC Adapter [AC] (on-line)
input: Lid Switch as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0D:00/input/input0
<hang>

Attached dmesg log is from 2.6.36.
Comment 1 David 2010-11-18 08:27:28 UTC
Created attachment 37402 [details]
2.6.37-rc2 log

Attached the 2.6.37-rc2 log file with dc9887dc reverted and pnp.debug enabled.
Comment 2 Bjorn Helgaas 2010-12-10 23:16:54 UTC
Created attachment 39742 [details]
debug patch to skip PCI assignments

Here's something to experiment with.

Reverting dc9887dc should only affect these PCI bridge window assignments:

  pci 0000:00:1c.0: BAR 8: assigned [mem 0xffc00000-0xffdfffff]
  pci 0000:00:1c.0: BAR 9: assigned [mem 0xfef00000-0xff0fffff 64bit pref]
  pci 0000:00:1c.1: BAR 9: assigned [mem 0xff100000-0xff2fffff 64bit pref]
  pci 0000:00:1c.2: BAR 9: assigned [mem 0xff300000-0xff4fffff 64bit pref]
  pci 0000:00:1c.0: BAR 7: assigned [io  0x7000-0x7fff]
  pci 0000:00:1c.1: BAR 7: assigned [io  0x8000-0x8fff]

These are done in that order from these host bridge windows:

  pci_root PNP0A03:00: host bridge window [mem 0xffc00000-0xffdfffff]
  pci_root PNP0A03:00: host bridge window [mem 0xfee10000-0xff9fffff]
  pci_root PNP0A03:00: host bridge window [io  0x0d00-0xffff]

You can see that we're starting with the highest host bridge windows,
and filling each window from bottom to top (there's some unused space
below the 00:1c.0 BAR 9 to align it).

Commit dc9887dc will change that so we fill each window from top to
bottom.  This should not affect 00:1c.0 BAR 8 because it completely
fills the host bridge window, but will affect the others.
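
To make the difference concrete, here is a rough sketch (not the kernel's
allocator) that places three 2 MB prefetchable windows in the
[mem 0xfee10000-0xff9fffff] host bridge window, first bottom-up and then
top-down.  The function names are made up for illustration, and the 1 MB
alignment is an assumption chosen because it reproduces the 0xfef00000
start address seen in the log.

```python
def align_down(value, align):
    """Round value down to a multiple of align (align is a power of two)."""
    return value & ~(align - 1)


def align_up(value, align):
    """Round value up to a multiple of align (align is a power of two)."""
    return (value + align - 1) & ~(align - 1)


def allocate(window, sizes, align, top_down):
    """Place each request in window, an inclusive (start, end) pair.

    Simplified illustration only: requests are placed in order, each one
    directly below (top_down) or above (bottom-up) the previous one.
    """
    start, end = window
    low, high = start, end + 1  # free space is the half-open range [low, high)
    placed = []
    for size in sizes:
        if top_down:
            base = align_down(high - size, align)
            if base < low:
                raise ValueError("window exhausted")
            high = base
        else:
            base = align_up(low, align)
            if base + size > high:
                raise ValueError("window exhausted")
            low = base + size
        placed.append((base, base + size - 1))
    return placed


WINDOW = (0xfee10000, 0xff9fffff)  # host bridge window from the log
SIZES = [0x200000] * 3             # BAR 9 of 00:1c.0, 00:1c.1, 00:1c.2
ALIGN = 0x100000                   # assumed 1 MB bridge window alignment

bottom_up = allocate(WINDOW, SIZES, ALIGN, top_down=False)
top_down = allocate(WINDOW, SIZES, ALIGN, top_down=True)

for label, ranges in (("bottom-up", bottom_up), ("top-down", top_down)):
    for lo, hi in ranges:
        print(f"{label}: [mem {lo:#010x}-{hi:#010x}]")
```

Under these assumptions the bottom-up results match the three BAR 9
assignments listed above, while the top-down results occupy
0xff400000-0xff9fffff, the upper end of the same host bridge window.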

I don't think we actually *need* any of these windows we're assigning;
both devices behind the bridges (0c:00.0 behind 00:1c.1 and 09:00.0
behind 00:1c.2) already have all the resources they need.

My theory is that there's an unreported device somewhere in the
[mem 0xff500000-0xff9fffff] range, and when we assign a bridge
window there, it causes a conflict.  Windows *should* see a similar
problem, but maybe it takes a lazy approach and assigns bridge
windows only when necessary, which would hide the problem.
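
For reference, the [mem 0xff500000-0xff9fffff] range is what is left at
the top of the second host bridge window after the bottom-up BAR 9
assignments.  A small sketch of that subtraction, using only the ranges
quoted in this comment (free_gaps is a hypothetical helper, and other
resources in the window that are not quoted here are ignored):

```python
def free_gaps(window, assigned):
    """Return the inclusive (start, end) parts of window not covered by assigned."""
    start, end = window
    gaps = []
    cursor = start  # lowest address not yet accounted for
    for a_start, a_end in sorted(assigned):
        if a_start > cursor:
            gaps.append((cursor, a_start - 1))
        cursor = max(cursor, a_end + 1)
    if cursor <= end:
        gaps.append((cursor, end))
    return gaps


WINDOW = (0xfee10000, 0xff9fffff)  # host bridge window from the log
ASSIGNED = [
    (0xfef00000, 0xff0fffff),  # 00:1c.0 BAR 9
    (0xff100000, 0xff2fffff),  # 00:1c.1 BAR 9
    (0xff300000, 0xff4fffff),  # 00:1c.2 BAR 9
]

gaps = free_gaps(WINDOW, ASSIGNED)
for lo, hi in gaps:
    print(f"unused: [mem {lo:#010x}-{hi:#010x}]")
```

The first gap is the alignment padding below the 00:1c.0 BAR 9; the
second is the range the theory is about.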

If you apply this debug patch and boot with this parameter:

  pci=skip=00:1c.0:9;00:1c.1:9;00:1c.2:9

we won't assign those windows either, and it might boot.  If
that doesn't work, try skipping the I/O port windows as well.
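
As a reading aid, the parameter value above can be split into
(device, resource index) pairs.  This is only a sketch of how the string
appears to be structured, inferred from the BAR numbers in the log lines;
it is not taken from the debug patch, and parse_skip is a hypothetical
name.

```python
def parse_skip(arg):
    """Split 'dev:idx;dev:idx;...' into a list of (device, index) pairs.

    Assumes each entry ends in ':<resource index>' and that everything
    before that last colon is the PCI device address.
    """
    entries = []
    for item in arg.split(";"):
        device, _, index = item.rpartition(":")
        entries.append((device, int(index)))
    return entries


SKIP = "00:1c.0:9;00:1c.1:9;00:1c.2:9"
skipped = parse_skip(SKIP)
for device, index in skipped:
    print(f"skip BAR {index} of {device}")
```

Read this way, skipping the I/O port windows as well would mean adding
entries with index 7 for 00:1c.0 and 00:1c.1.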
Comment 3 Bjorn Helgaas 2010-12-11 00:19:38 UTC
Here's a better approach that's already in the kernel and doesn't
require a patch.  If you boot with "reserve=0xff500000,0x500000",
we'll still assign all the windows, but we should avoid the
[mem 0xff500000-0xff9fffff] range.
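
The reserve= option takes a base address followed by a size, so an
inclusive range like the one above has to be converted.  A minimal
sketch of that arithmetic (reserve_param is a hypothetical helper, not
a kernel interface):

```python
def reserve_param(start, end):
    """Build a reserve=<base>,<size> string for the inclusive range [start, end]."""
    size = end - start + 1
    return f"reserve={start:#x},{size:#x}"


param = reserve_param(0xff500000, 0xff9fffff)
print(param)
```

For [mem 0xff500000-0xff9fffff] the size works out to 0x500000, with
0xff500000 as the base.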

If that boots, it's a good clue that there's an unreported device
there.  I'm not sure where to go from there.  We could do a series
of experiments like I did here:

  https://bugzilla.kernel.org/show_bug.cgi?id=23332#c20

to determine exactly where the problem is, then add a quirk for
the Inspiron 1545.

The problem with that approach is that if Linux assigns bridge windows
more aggressively than Windows does, we'll likely trip over the
same problem on other machines, and I really don't want to have
to deal with them one-by-one.

This situation (a bridge with nothing behind it, where BIOS leaves
the windows disabled, and Linux assigns them) seems fairly common,
so I should be able to find a machine with Windows so I can find out
what it does.
Comment 4 Bjorn Helgaas 2010-12-16 22:27:13 UTC
Created attachment 40482 [details]
avoid e820 regions

We plan to apply this patch to 2.6.37-rc6.  I believe it will fix
this issue, because one of the things the patch does is revert
commit dc9887dc, but confirmation would be great.
