Bug 23132

Summary: without "pci=nocrs", Dell Inspiron 1545 hangs
Product: Drivers
Reporter: Bjorn Helgaas (bjorn.helgaas)
Component: PCI
Assignee: Bjorn Helgaas (bjorn.helgaas)
Status: CLOSED CODE_FIX
Severity: normal
CC: florian, maciej.rutecki, rjw, sreenivasa-reddy.berahalli
Priority: P1
Hardware: All
OS: Linux
URL: http://lkml.org/lkml/2010/11/17/324
Kernel Version: 2.6.37-rc2
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 16444
Attachments:
  2.6.36 dmesg log
  2.6.37-rc2 log
  debug patch to skip PCI assignments
  avoid e820 regions

Description Bjorn Helgaas 2010-11-17 19:18:59 UTC
Created attachment 37352
2.6.36 dmesg log

[From email:]

Commit dc9887dc ("x86/PCI: allocate space from the end of a region, not
the beginning") causes the kernel to hang on my Dell Inspiron 1545.
2.6.36 works fine, as does booting with pci=nocrs. Reverting this commit
also works. I am attaching the kernel log and system iomem from the working
2.6.36. It hangs at this location consistently:

pci-stub: invalid id string ""
ACPI: AC Adapter [AC] (on-line)
input: Lid Switch as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0D:00/input/input0
<hang>

Attached dmesg log is from 2.6.36.

Comment 1 David 2010-11-18 08:27:28 UTC
Created attachment 37402
2.6.37-rc2 log

Attached the 2.6.37-rc2 log file with dc9887dc reverted and pnp.debug enabled.

Comment 2 Bjorn Helgaas 2010-12-10 23:16:54 UTC
Created attachment 39742
debug patch to skip PCI assignments

Here's something to experiment with.

Reverting dc9887dc should only affect these PCI bridge window assignments:

  pci 0000:00:1c.0: BAR 8: assigned [mem 0xffc00000-0xffdfffff]
  pci 0000:00:1c.0: BAR 9: assigned [mem 0xfef00000-0xff0fffff 64bit pref]
  pci 0000:00:1c.1: BAR 9: assigned [mem 0xff100000-0xff2fffff 64bit pref]
  pci 0000:00:1c.2: BAR 9: assigned [mem 0xff300000-0xff4fffff 64bit pref]
  pci 0000:00:1c.0: BAR 7: assigned [io  0x7000-0x7fff]
  pci 0000:00:1c.1: BAR 7: assigned [io  0x8000-0x8fff]

These are done in that order from these host bridge windows:

  pci_root PNP0A03:00: host bridge window [mem 0xffc00000-0xffdfffff]
  pci_root PNP0A03:00: host bridge window [mem 0xfee10000-0xff9fffff]
  pci_root PNP0A03:00: host bridge window [io  0x0d00-0xffff]

You can see that we're starting with the highest host bridge windows,
and filling each window from bottom to top (there's some unused space
below the 00:1c.0 BAR 9 to align it).

Commit dc9887dc will change that so we fill each window from top to
bottom.  This should not affect 00:1c.0 BAR 8 because it completely
fills the host bridge window, but will affect the others.
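
To make the difference concrete, here is a rough sketch of the two fill
orders for the [mem 0xfee10000-0xff9fffff] host bridge window.  It is not
the kernel allocator.  The 2 MiB window size and 1 MiB alignment are
assumptions read off the log lines above, and the function names are made
up for illustration.

```python
MiB = 1 << 20

def align_up(addr, align):
    # Round addr up to the next multiple of align (align is a power of two).
    return (addr + align - 1) & ~(align - 1)

def align_down(addr, align):
    # Round addr down to a multiple of align (align is a power of two).
    return addr & ~(align - 1)

def fill_bottom_up(start, end, size, align, count):
    # Place `count` windows starting at the lowest usable address.
    out, cur = [], start
    for _ in range(count):
        base = align_up(cur, align)
        if base + size - 1 > end:
            break
        out.append((base, base + size - 1))
        cur = base + size
    return out

def fill_top_down(start, end, size, align, count):
    # Place `count` windows starting at the highest usable address.
    out, limit = [], end + 1
    for _ in range(count):
        base = align_down(limit - size, align)
        if base < start:
            break
        out.append((base, base + size - 1))
        limit = base
    return out

# Host bridge window from the log; size and alignment are assumptions.
HOST = (0xfee10000, 0xff9fffff)
for name, fn in (("bottom-up", fill_bottom_up), ("top-down", fill_top_down)):
    for lo, hi in fn(HOST[0], HOST[1], 2 * MiB, 1 * MiB, 3):
        print("%-9s [mem %#010x-%#010x]" % (name, lo, hi))
```

Under those assumptions the bottom-up order reproduces the three BAR 9
assignments listed above, while the top-down order would start at the top
of the window, at 0xff800000.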

I don't think we actually *need* any of these windows we're assigning;
both devices behind the bridges (0c:00.0 behind 00:1c.1 and 09:00.0
behind 00:1c.2) already have all the resources they need.

My theory is that there's an unreported device somewhere in the
[mem 0xff500000-0xff9fffff] range, and when we assign a bridge
window there, it causes a conflict.  Windows *should* see a similar
problem, but maybe it takes a lazy approach and assigns bridge
windows only when necessary, which would hide the problem.
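
As a sanity check on that theory, a small sketch that tests the BAR 9
assignments from the log above against the suspect range.  The addresses
are copied from the log; the helper name is made up for illustration.

```python
# Inclusive address range suspected of hiding an unreported device.
SUSPECT = (0xff500000, 0xff9fffff)

# BAR 9 windows as assigned in the log above (filled bottom to top).
ASSIGNED = {
    "00:1c.0 BAR 9": (0xfef00000, 0xff0fffff),
    "00:1c.1 BAR 9": (0xff100000, 0xff2fffff),
    "00:1c.2 BAR 9": (0xff300000, 0xff4fffff),
}

def overlaps(a, b):
    # Two inclusive ranges overlap if each starts no later than the other ends.
    return a[0] <= b[1] and b[0] <= a[1]

hits = [name for name, rng in ASSIGNED.items() if overlaps(rng, SUSPECT)]
print(hits)  # prints []
```

None of the working assignments touch the suspect range, which is
consistent with the theory but does not prove it.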

If you apply this debug patch and boot with this parameter:

  pci=skip=00:1c.0:9;00:1c.1:9;00:1c.2:9

we won't assign those windows either, and it might boot.  If
that doesn't work, try skipping the I/O port windows as well.

Comment 3 Bjorn Helgaas 2010-12-11 00:19:38 UTC
Here's a better approach that's already in the kernel and doesn't
require a patch.  If you boot with "reserve=0xff500000,0x500000",
we'll still assign all the windows, but we should avoid the
[mem 0xff500000-0xff9fffff] range.
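
For reference, a quick sketch of the arithmetic, assuming "reserve="
takes a base address followed by a size in bytes (my reading of the
kernel-parameters documentation).  The function name is made up for
illustration.

```python
def reserve_param(start, end):
    # Build a reserve=<base>,<size> string for an inclusive [start, end] range.
    size = end - start + 1
    return "reserve=%#x,%#x" % (start, size)

print(reserve_param(0xff500000, 0xff9fffff))
# prints reserve=0xff500000,0x500000
```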

If that boots, it's a good clue that there's an unreported device
there.  I'm not sure where to go from there.  We could do a series
of experiments like I did here:

  https://bugzilla.kernel.org/show_bug.cgi?id=23332#c20

to determine exactly where the problem is, then add a quirk for
the Inspiron 1545.

The problem with that approach is that if Linux assigns bridge windows
more aggressively than Windows does, we'll likely trip over the
same problem on other machines, and I really don't want to have
to deal with them one-by-one.

This situation (a bridge with nothing behind it, where BIOS leaves
the windows disabled, and Linux assigns them) seems fairly common,
so I should be able to find a machine with Windows so I can find out
what it does.

Comment 4 Bjorn Helgaas 2010-12-16 22:27:13 UTC
Created attachment 40482
avoid e820 regions

We plan to apply this patch to 2.6.37-rc6.  I believe it will fix
this issue, because one of the things the patch does is revert
commit dc9887dc, but confirmation would be great.