Bug 85991

Summary: Issue sizing BAR(s)
Product: Drivers
Reporter: Myron Stowe (myron.stowe)
Component: PCI
Assignee: drivers_pci (drivers_pci)
Status: RESOLVED CODE_FIX
Severity: normal
CC: bjorn
Priority: P1
Hardware: Intel
OS: Linux
Kernel Version: 3.17
Subsystem:
Regression: No
Bisected commit-id:
Attachments:
  dmesg log
  Nix failing dmesg (pristine v3.18.5)
  Nix working dmesg (v3.18.5 with 36e8164882ca reverted)

Description Myron Stowe 2014-10-10 20:14:23 UTC
Encountering the following error messages during boot:

  pci 0000:7f:1e.3: BAR 0: error updating (0xc7ffd00a != 0x00001a)
  pci 0000:ff:1e.3: BAR 0: error updating (0xc7ffd01a != 0x00001a)

Focusing on 0000:7f:1e.3 in 'dmesg' yields:

  e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
  
  PCI: Discovered peer bus 7f
  PCI: root bus 7f: using default resources
  PCI: Probing PCI hardware (bus 7f)
  PCI host bridge to bus 0000:7f
  pci_bus 0000:7f: root bus resource [io  0x0000-0xffff]
  pci_bus 0000:7f: root bus resource [mem 0x00000000-0x3fffffffffff]
  pci_bus 0000:7f: No busn resource found for root bus, will use [bus 7f-ff]

  pci 0000:7f:1e.3: [8086:2fc0] type 00 class 0x088000
  pci 0000:7f:1e.3: reg 0x10: [mem 0x00000010-0x0000001f pref]

  pci 0000:7f:1e.3: address space collision: [mem 0x00000010-0x0000001f pref] conflicts with reserved [mem 0x00000000-0x00000fff]
  pci 0000:7f:1e.3: BAR 0: assigned [mem 0xc7ffd000-0xc7ffd00f pref]
  pci 0000:7f:1e.3: BAR 0: error updating (0xc7ffd00a != 0x00001a)
  pci_bus 0000:7f: resource 4 [io  0x0000-0xffff]
  pci_bus 0000:7f: resource 5 [mem 0x00000000-0x3fffffffffff]

It looks like device enumeration encountered device 7f:1e.3 whose BAR 0 was programmed with 0x00000010 - a fairly suspect address.

Later, the kernel detects that the BAR conflicts with the area 0x00000000-0x00000fff, which the e820 update at the top of the log marked as "reserved", and attempts to reassign the BAR to an available region, 0xc7ffd000-0xc7ffd00f.  Programming the BAR fails, however: the value read back after writing 0xc7ffd00a is still 0x0000001a, so the write had no effect.

A similar sequence occurs for 0000:ff:1e.3.

In short, BAR 0 of devices 7f:1e.3 and ff:1e.3 on this platform does not seem to accept the addresses the kernel programs into it.  These devices appear to be part of, or related to, the processor's (Xeon(R) CPU E5-2690 - Haswell) "uncore" functions:
  [8086:2fc0] Xeon E5 v3/Core i7 Power Control Unit
Comment 1 Myron Stowe 2014-10-10 20:23:28 UTC
Created attachment 153141
dmesg log

This dmesg log is from a RHEL7.0 system but the issue also occurs on an upstream v3.17 kernel.
Comment 2 Myron Stowe 2014-10-10 20:30:14 UTC
Intel responded, indicating that these devices should be hidden via a workaround in the platform's BIOS.

The issue is covered in: http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e5-v3-spec-update.pdf (refer to HSE43).
Comment 3 Myron Stowe 2014-10-30 21:59:48 UTC
I posted a patch to the linux-pci list that should catch such occurrences in the future:
  https://lkml.org/lkml/2014/10/30/637


From "[PATCH 0/3] PCI: Fix detection of read-only BARs" -

While non-conformant, PCI devices having read-only (r/o) BARs - registers
that, when read, return fixed, non-zero values regardless of whether or
not they are being sized - occasionally turn up.  Pre-git commit 1307ef662199
[1] was the initial attempt to detect such BARs.  The detection mechanism
used however ended up exposing further unexpected behaviors on enough
devices that it had to be reverted [2].

A subsequent solution, which is still currently in use, was put in place
with pre-git commit 2c79a80ab7b7 ("[PCI] Correctly size hardwired empty
BARs").  This solution's logic detects r/o BARs via the following (see
'pci_size()'):
        /* base == maxbase can be valid only if the BAR has
           already been programmed with all 1s.  */
        if (base == maxbase && ((base | size) & mask) != mask)
                return 0;

Later, commit 6ac665c63dca ("PCI: rewrite PCI BAR reading code") was
introduced, re-factoring PCI's core BAR sizing logic.  The commit altered
__pci_read_base's local variable 'l', stripping off its lower,
non-addressing related bits, prior to being passed in as the 'base'
parameter to pci_size().  This masking broke the r/o BAR detection logic's
first comparison check for r/o BARs that have any of their lower order
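bits, bits that are not part of a BAR's "base address" field, set.  For
such cases, 'base' has those bits cleared while 'maxbase', the value read
back after all 1s are written, still contains them, so the
'base == maxbase' comparison check will no longer ever be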
"true".

This series restores the r/o BAR detection logic so that it will once
again catch, and ignore, such occurrences as have been encountered to
date:
  - AGP aperture BAR of AMD-7xx host bridges; if the AGP window is
    disabled, this BAR is read-only and reads as 0x00000008 [1]
  - BAR0-4 of ALi IDE controllers can be non-zero and read-only [1]
  - Intel Sandy Bridge - Thermal Management Controller [8086:0103];
    BAR 0 returning 0xfed98004 [3]
  - Intel Xeon E5 v3/Core i7 Power Control Unit [8086:2fc0];
    BAR 0 returning 0x00001a [4]


[1] Pre-git commit 1307ef662199 ("PCI: probing read-only Bars"), from
    Thomas Gleixner's "Linux kernel history" repository:
    https://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/drivers/pci/probe.c?id=1307ef6621991f1c4bc3cec1b5a4ebd6fd3d66b9
[2] Pre-git commit 182d090b9dfe ("Undo due to weird behaviour on various
    boxes")
[3] https://bugzilla.kernel.org/show_bug.cgi?id=43331
[4] https://bugzilla.kernel.org/show_bug.cgi?id=85991
Comment 4 Myron Stowe 2015-02-03 20:59:28 UTC
Nix encountered a regression on a Soekris NET5501 board running a 3.18.3 "stable" kernel.  The regression was bisected to upstream commit 36e8164882ca ("PCI: Restore detection of read-only BARs"), which was recently back-ported to 3.18 "stable" as commit efdb9b956aa0.

Reference: https://lkml.org/lkml/2015/1/30/694

Nix subsequently supplied 'dmesg', 'lspci -tv', and 'lspci -vvv' logs from kernels both with and without the commit in question.

Reference: https://lkml.org/lkml/2015/2/2/516
Comment 5 Myron Stowe 2015-02-03 21:11:17 UTC
Using the logs Nix supplied, I studied what was occurring and, with Bjorn's help, worked up an analysis.

Reference: https://lkml.org/lkml/2015/2/3/465

Based on the analysis, the device in question appears not to behave properly with respect to PCI BAR sizing.  Nix instrumented the kernel to verify this: the device does not respond properly to sizing requests, and its BARs behave as read-only.  So the upstream commit in question is working as intended.

Reference: https://lkml.org/lkml/2015/2/3/651


In past cases (see the start of this BZ for occurrences), we ignored read-only BARs because they were bogus.  In this case, however, the device needs these BARs in order to probe, attach, and function correctly.

From Bjorn:
"Prior to 36e8164882ca, we didn't detect that, and we computed the size based on the lowest order bit that was set in the address.  This gave us incorrect sizes, but it did work in the sense that the driver could operate the device.

After 36e8164882ca, we detect that the BARs are read-only and ignore them completely, which breaks this case because we don't mark the resource as I/O and we don't fill in the starting address, so even though the quirk runs, it just sets the first resource to [??? 0x0000-0x0007], which doesn't work.

I think this is the right behavior for the PCI core because we can't tell how big these BARs are.  The only alternative is to assume they are as big as possible given their current addresses.  But that would mean a 4KB read-only BAR that happened to be aligned on a 2GB boundary would have to consume 2GB of address space, and *that* doesn't seem reasonable.

We already have a quirk for this device, so I think the best fix is to change the quirk so it reads these three BARs directly and restores the resources based on its hard-coded knowledge of how big they are."
Comment 6 Bjorn Helgaas 2015-02-04 03:03:45 UTC
Created attachment 165761
Nix failing dmesg (pristine v3.18.5)
Comment 7 Bjorn Helgaas 2015-02-04 03:07:08 UTC
Created attachment 165771
Nix working dmesg (v3.18.5 with 36e8164882ca reverted)

Relevant diffs from dmesg:

--- nix.good	2015-02-03 21:00:49.887519055 -0600
+++ nix.bad	2015-02-03 21:01:08.203216967 -0600
@@ -1,7 +1,7 @@
-Good boot (36e8164882ca ("PCI: Restore detection of read-only BARs") reverted):
+Bad boot (pristine 3.18.5):

 pci 0000:00:14.0: [1022:2090] type 00 class 0x060100
-pci 0000:00:14.0: reg 0x10: [io  0x6000-0x7fff]
-pci 0000:00:14.0: reg 0x14: [io  0x6100-0x61ff]
-pci 0000:00:14.0: reg 0x18: [io  0x6200-0x63ff]
 pci 0000:00:14.0: CS5536 ISA bridge bug detected (incorrect header); workaround applied

-cs5535-gpio cs5535-gpio: reserved resource region [io  0x6100-0x61ff]
-cs5535-mfgpt cs5535-mfgpt: reserved resource region [io  0x6200-0x63ff]
-cs5535-mfgpt cs5535-mfgpt: 8 MFGPT timers available
+cs5535-gpio cs5535-gpio: can't request region
+cs5535-gpio: probe of cs5535-gpio failed with error -5
+cs5535-mfgpt cs5535-mfgpt: can't request region
+cs5535-mfgpt: probe of cs5535-mfgpt failed with error -5
 cs5535-mfd 0000:00:14.0: 5 devices registered.

-cs5535-smb cs5535-smb: SCx200 device 'CS5535 ACB0' registered
+scx200_acb: can't allocate io 0x0-0x7
+cs5535-smb: probe of cs5535-smb failed with error -5
Comment 8 Bjorn Helgaas 2015-02-19 14:56:53 UTC
Fixed by 36e8164882ca ("PCI: Restore detection of read-only BARs"), which appeared in v3.19-rc1.

The problem reported in comment #4, which was exposed by 36e8164882ca, was fixed by 06cf35f903aa ("PCI: Handle read-only BARs on AMD CS553x devices"), which appeared in v3.19.