Bug 96961 - ia64 disk, network devices unusable
Summary: ia64 disk, network devices unusable
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI
Hardware: All Linux
Importance: P1 normal
Assignee: Bjorn Helgaas
URL: http://lkml.kernel.org/r/CA+8MBbJv-Rw...
Keywords:
Depends on:
Blocks:
 
Reported: 2015-04-20 22:00 UTC by Bjorn Helgaas
Modified: 2015-04-30 13:32 UTC
0 users

See Also:
Kernel Version: v4.1 merge window
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
dmesg log (working) (37.67 KB, text/plain)
2015-04-20 22:00 UTC, Bjorn Helgaas
dmesg log (failing) (37.50 KB, text/plain)
2015-04-20 22:03 UTC, Bjorn Helgaas
dmesg log (working, with ACPI _CRS descriptor dump) (43.78 KB, text/plain)
2015-04-20 22:19 UTC, Bjorn Helgaas
test patch to ignore Producer/Consumer bit (1.21 KB, patch)
2015-04-20 22:21 UTC, Bjorn Helgaas
dmesg log (working, ignoring Producer/Consumer) (33.17 KB, text/plain)
2015-04-20 22:26 UTC, Bjorn Helgaas

Description Bjorn Helgaas 2015-04-20 22:00:32 UTC
Created attachment 174601 [details]
dmesg log (working)

Tony Luck reported that after c770cb4cb505 ("PCI: Mark invalid BARs as unassigned"), some disk and network devices did not work.  I'm attaching a dmesg log with c770cb4cb505 reverted.
Comment 1 Bjorn Helgaas 2015-04-20 22:03:08 UTC
Created attachment 174611 [details]
dmesg log (failing)

Here's a dmesg log *with* c770cb4cb505, where the disk and net devices don't work.

We think the host bridge apertures are (note that we didn't find an
MMIO aperture):

  ACPI: PCI Root Bridge [PCI1] (domain 0000 [bus 80-ff])
  pci_bus 0000:80: root bus resource [io  0x9000-0xfffe]
  pci_bus 0000:80: root bus resource [bus 80-ff]

but BIOS set up MMIO windows on the bridges at 80:01.0 and 80:05.0,
and it assigned space for igb and mptsas BARs below those bridges:

  pci 0000:80:01.0: PCI bridge to [bus 81]
  pci 0000:80:01.0:   bridge window [mem 0xa0100000-0xa01fffff]
  pci 0000:81:00.0: reg 0x10: [mem 0xa0160000-0xa017ffff]
  pci 0000:81:00.1: reg 0x10: [mem 0xa0120000-0xa013ffff]

  pci 0000:80:05.0: PCI bridge to [bus 83]
  pci 0000:80:05.0:   bridge window [mem 0xa0000000-0xa00fffff]
  pci 0000:83:00.0: reg 0x14: [mem 0xa0010000-0xa0013fff 64bit]

Prior to c770cb4cb505, we complained about the PCI bridge windows not
being inside a host bridge window, but we enabled the igb and mptsas
devices anyway.  After c770cb4cb505, we don't enable them (Tony reported
that his disk is broken, and I think those two NICs are also broken).
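
The containment rule described above can be sketched in C.  This is only
an illustration, not the kernel's code: struct range, range_contains() and
window_is_claimable() are hypothetical names made up for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive address range, like the [mem start-end] ranges in dmesg. */
struct range {
    uint64_t start;
    uint64_t end;
};

/* True if `inner` lies entirely within `outer`. */
static bool range_contains(struct range outer, struct range inner)
{
    return outer.start <= inner.start && inner.end <= outer.end;
}

/*
 * True if any of the n host bridge MMIO apertures contains `win`.
 * With n == 0 (no MMIO aperture found) this is always false.
 */
static bool window_is_claimable(const struct range *apertures, size_t n,
                                struct range win)
{
    for (size_t i = 0; i < n; i++)
        if (range_contains(apertures[i], win))
            return true;
    return false;
}
```

For PCI1 in the failing log there is no MMIO aperture at all, so calling
window_is_claimable(NULL, 0, win) for the 80:01.0 window
[mem 0xa0100000-0xa01fffff] returns false, which is the situation the
kernel complained about.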
Comment 2 Bjorn Helgaas 2015-04-20 22:19:18 UTC
Created attachment 174621 [details]
dmesg log (working, with ACPI _CRS descriptor dump)

This log shows several ACPI _CRS descriptors that we currently ignore
because their Producer/Consumer bits are set to "Consumer".   Here's one
that contains the MMIO area used by these igb and mptsas devices:

  host bridge resource 12 length 0x38       ACPI_RESOURCE_TYPE_ADDRESS32
    : 00000000: 00 01 00 01 01 01 00 00 00 00 00 00 00 00 00 00
    : 00000010: 00 00 00 a0 ff ff ff ef 00 00 00 00 00 00 00 50
    : 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    : 00000030: 0d 00 00 00 50 00 00 00
  [mem 0xa0000000-0xefffffff]

Decoded:
  00 acpi_resource_address32.resource_type (ACPI_MEMORY_RANGE)
  01 acpi_resource_address32.producer_consumer (ACPI_CONSUMER)

This [mem 0xa0000000-0xefffffff] is marked "Consumer", which means the
device consumes the range itself and does not produce it on the downstream
side of the bridge.  But this is the range that contains these downstream
device BARs (among others):

  pci 0000:80:13.0: reg 0x10: [mem 0xa0220000-0xa0220fff]
  pci 0000:80:16.0: reg 0x10: [mem 0xa0200000-0xa0203fff 64bit]
  pci 0000:81:00.0: reg 0x10: [mem 0xa0160000-0xa017ffff]
  pci 0000:81:00.1: reg 0x10: [mem 0xa0120000-0xa013ffff]
  pci 0000:83:00.0: reg 0x14: [mem 0xa0010000-0xa0013fff 64bit]
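
As a rough cross-check of the decode above, the dump can be read back with
a few lines of C.  Bytes 0 and 1 are the fields named in the decode; the
offsets 0x10, 0x14 and 0x1c for minimum, maximum and length are an
assumption read off this one dump (they reproduce the printed
[mem 0xa0000000-0xefffffff]), not a statement about the structure layout
in general.  All names here are hypothetical.

```c
#include <stdint.h>

/* Values as shown in the decode; names are local to this sketch. */
enum { SKETCH_MEMORY_RANGE = 0, SKETCH_CONSUMER = 1 };

struct decoded_window {
    uint8_t  resource_type;
    uint8_t  producer_consumer;
    uint32_t min;
    uint32_t max;
    uint32_t length;
};

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Offsets 0x10/0x14/0x1c are assumptions taken from this dump only. */
static struct decoded_window decode_dump(const uint8_t *buf)
{
    struct decoded_window w = {
        .resource_type     = buf[0x00],
        .producer_consumer = buf[0x01],
        .min               = le32(buf + 0x10),
        .max               = le32(buf + 0x14),
        .length            = le32(buf + 0x1c),
    };
    return w;
}

/* First 0x20 bytes of the "host bridge resource 12" dump. */
static const uint8_t resource12[0x20] = {
    0x00, 0x01, 0x00, 0x01, 0x01, 0x01, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0xa0, 0xff, 0xff, 0xff, 0xef,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x50,
};
```

Under those assumptions decode_dump(resource12) gives a memory range
marked Consumer with min 0xa0000000, max 0xefffffff and length
0x50000000, and max - min + 1 equals the length.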
Comment 3 Bjorn Helgaas 2015-04-20 22:21:13 UTC
Created attachment 174631 [details]
test patch to ignore Producer/Consumer bit

This patch ignores the Producer/Consumer bit in PCI host bridge _CRS descriptors.
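
The effect of the change can be modeled roughly as below.  This is not
the attached patch, just a sketch of the decision it alters; struct
crs_window and use_as_host_bridge_window() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Producer/Consumer values; names are local to this sketch. */
enum { SKETCH_PRODUCER = 0, SKETCH_CONSUMER = 1 };

/* One address-space descriptor from a host bridge _CRS. */
struct crs_window {
    uint8_t  producer_consumer;
    uint64_t start;
    uint64_t end;
};

/*
 * Decide whether a _CRS descriptor becomes a host bridge window.
 * With ignore_producer_consumer == false, "Consumer" descriptors are
 * skipped (the behavior described in comment 2).  With it set to
 * true, every descriptor is accepted (the behavior of the test patch).
 */
static bool use_as_host_bridge_window(const struct crs_window *w,
                                      bool ignore_producer_consumer)
{
    if (ignore_producer_consumer)
        return true;
    return w->producer_consumer == SKETCH_PRODUCER;
}
```

For the [mem 0xa0000000-0xefffffff] descriptor, which is marked Consumer,
the function returns false without the flag and true with it.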
Comment 4 Bjorn Helgaas 2015-04-20 22:26:18 UTC
Created attachment 174641 [details]
dmesg log (working, ignoring Producer/Consumer)

This log shows working mptsas and igb devices that use the host bridge windows that are marked "Consumer".  Ignoring the Producer/Consumer bit adds the windows marked "+"
below:

   ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-7f])
   acpi PNP0A08:00: host bridge window [io  0x0000-0x0cf7]
   acpi PNP0A08:00: host bridge window [io  0x1000-0x8fff]
  +acpi PNP0A08:00: host bridge window [mem 0x000a0000-0x000bffff]
  +acpi PNP0A08:00: host bridge window [mem 0x000c0000-0x000fffff]
  +acpi PNP0A08:00: host bridge window [mem 0xfec00000-0xfec3ffff]
  +acpi PNP0A08:00: host bridge window [mem 0xfed1c000-0xfed1c0ff]
  +acpi PNP0A08:00: host bridge window [mem 0xfed40000-0xfedfffff]
   acpi PNP0A08:00: host bridge window [mem 0x50000000-0x9fffffff]
  +acpi PNP0A08:00: host bridge window [mem 0x10000000000-0x100fffffffe]
  +acpi PNP0A08:00: host bridge window [mem 0x10100000000-0x101fffffffe]
  +acpi PNP0A08:00: host bridge window [mem 0x10200000000-0x102fffffffe]
  +acpi PNP0A08:00: host bridge window [mem 0x10300000000-0x103fffffffe]

   ACPI: PCI Root Bridge [PCI1] (domain 0000 [bus 80-ff])
   acpi PNP0A08:01: host bridge window [io  0x9000-0xfffe]
  +acpi PNP0A08:01: host bridge window [mem 0xfec40000-0xfec7ffff]
  +acpi PNP0A08:01: host bridge window [mem 0xa0000000-0xefffffff]
  +acpi PNP0A08:01: host bridge window [mem 0x10400000000-0x104fffffffe]
  +acpi PNP0A08:01: host bridge window [mem 0x10500000000-0x105fffffffe]
  +acpi PNP0A08:01: host bridge window [mem 0x10600000000-0x106fffffffe]
  +acpi PNP0A08:01: host bridge window [mem 0x10700000000-0x107fffffffe]
Comment 5 Bjorn Helgaas 2015-04-30 13:32:27 UTC
This should be fixed by http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=9fbbda5c8e0a, which appeared in v4.1-rc1.
