Bug 43331

Summary: error updating BAR
Product: Drivers
Component: PCI
Reporter: Bjorn Helgaas (bjorn)
Assignee: Bjorn Helgaas (bjorn)
Status: RESOLVED CODE_FIX
Severity: normal
Priority: P1
CC: martin, myron.stowe, perry_yuan, unruh
Hardware: All
OS: Linux
URL: https://lkml.org/lkml/2012/5/31/471
Kernel Version: 3.3
Subsystem:
Regression: No
Bisected commit-id:
Attachments:
    dmesg fragment, lspci, /proc/iomem
    complete dmesg log
    Complete Aida report on Windows on this machine
    acpidump of my Linux system
    dmesg with pci=nocrs
    SSDT5.dsl
    debug patch to trace BAR sizing & update
    rdwrmem.c
    check for read-only BARs
    proposed patch
    dmesg comparison with Myron's patch on 3.17.2

Description Bjorn Helgaas 2012-06-01 15:29:17 UTC
From Bill (original report at https://lkml.org/lkml/2012/5/31/471):

System: Panasonic Toughbook CF-S10 (Intel graphics using RAM for its memory, Intel 6205 wireless card).

I am running Mageia 2 kernel 3.3.6-desktop586-2.mga2

Every time I boot up I get the error messages
pci 0000:00:04.0: BAR 0: error updating (0xdfa00004 != 0xfed98004)

I have no idea what this means. The system appears to be running OK (i.e., I have
not noticed problems). I do have to run with noapic, or the system randomly crashes (usually during boot, but sometimes after bootup).

It was suggested that I try the pci=nocrs option which does get rid of that
error message, but also results in a much slower boot and slower running of
the system.
Comment 1 Bjorn Helgaas 2012-06-01 15:41:35 UTC
Created attachment 73489 [details]
dmesg fragment, lspci, /proc/iomem

More details from original report.
Comment 2 Bjorn Helgaas 2012-06-01 22:39:41 UTC
Created attachment 73492 [details]
complete dmesg log

Attaching the complete dmesg log.  Here's the relevant part:

    ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-3e])
    pci_root PNP0A08:00: host bridge window [io  0x0000-0x0cf7]
    pci_root PNP0A08:00: host bridge window [io  0x0d00-0xffff]
    pci_root PNP0A08:00: host bridge window [mem 0x000a0000-0x000bffff]
    pci_root PNP0A08:00: host bridge window [mem 0x000d4000-0x000d7fff]
    pci_root PNP0A08:00: host bridge window [mem 0x000d8000-0x000dbfff]
    pci_root PNP0A08:00: host bridge window [mem 0x000dc000-0x000dffff]
    pci_root PNP0A08:00: host bridge window [mem 0xdfa00000-0xfeafffff]
    pci_root PNP0A08:00: host bridge window [mem 0xfed40000-0xfed44fff]
    pci 0000:00:04.0: [8086:0103] type 0 class 0x001180
    pci 0000:00:04.0: reg 10: [mem 0xfed98000-0xfed9ffff 64bit]
    pci 0000:00:04.0: BAR 0: assigned [mem 0xdfa00000-0xdfa07fff 64bit]
    pci 0000:00:04.0: no compatible bridge window for [mem 0xfed98000-0xfed9ffff 64bit]
    pci 0000:00:04.0: BAR 0: assigned [mem 0xdfa00000-0xdfa07fff 64bit]
    pci 0000:00:04.0: BAR 0: error updating (0xdfa00004 != 0xfed98004)
    pnp 00:0e: [mem 0xfed98000-0xfed9ffff]
    system 00:0e: [mem 0xfed98000-0xfed9ffff] has been reserved
    system 00:0e: Plug and Play ACPI device, IDs PNP0c02 (active)

Looks like a straightforward case of the BIOS leaving a device outside the host bridge apertures, and we're trying to fix it.  Microsoft Windows normally does the same thing, but where Linux uses the lowest unused space (0xdfa00000-0xdfa07fff in this case), Windows uses the highest unused space.  It's also conceivable that Windows notices the ACPI resource ([mem 0xfed98000-0xfed9ffff]) that overlaps it and takes that into account.
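
The "outside the host bridge apertures" observation can be checked mechanically. The following is a minimal Python sketch that uses only the ranges printed in the dmesg excerpt above; the helper names `inside_any_window` and `overlaps` are made up for illustration and are not kernel functions.

```python
# Host bridge memory windows from the _CRS lines in the dmesg excerpt,
# written as inclusive (start, end) pairs.
HOST_BRIDGE_MEM_WINDOWS = [
    (0x000A0000, 0x000BFFFF),
    (0x000D4000, 0x000D7FFF),
    (0x000D8000, 0x000DBFFF),
    (0x000DC000, 0x000DFFFF),
    (0xDFA00000, 0xFEAFFFFF),
    (0xFED40000, 0xFED44FFF),
]


def inside_any_window(start, end, windows=HOST_BRIDGE_MEM_WINDOWS):
    """Return True if [start, end] lies entirely inside a single window."""
    return any(lo <= start and end <= hi for lo, hi in windows)


def overlaps(a, b):
    """Return True if inclusive ranges a and b share at least one address."""
    return a[0] <= b[1] and b[0] <= a[1]


BIOS_BAR = (0xFED98000, 0xFED9FFFF)   # "reg 10" value left by the BIOS
LINUX_BAR = (0xDFA00000, 0xDFA07FFF)  # "BAR 0: assigned" range
PNP_00_0E = (0xFED98000, 0xFED9FFFF)  # PNP0c02 resource from the log

print(inside_any_window(*BIOS_BAR))   # False: outside every aperture
print(inside_any_window(*LINUX_BAR))  # True: inside 0xdfa00000-0xfeafffff
print(overlaps(BIOS_BAR, PNP_00_0E))  # True: same range as the ACPI device
```

The sketch only compares ranges; it says nothing about whether the device actually responds at either address.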

If you happen to have Windows on this box, it would be very interesting to see what it does with the 00:04.0 device via Device Manager or something like http://www.aida64.com/
Comment 3 W Unruh 2012-06-02 02:58:25 UTC
Created attachment 73493 [details]
Complete Aida report on Windows on this machine

Enclosed is the Aida Extreme report. It seems that PCI bus 0 device 4 is the
Intel Sandy Bridge - Thermal Management Controller. This may explain why I only get 4 hours from the battery when Panasonic claims 11 hours (and the Windows battery monitor claimed 4.5 hours remaining when Linux, for the same battery level, claimed only 1.7 hours, i.e., the power management was not working on Linux).
The following seem to be some of the relevant sections (assuming 00:04.0 really is bus 0, device 4, function 0):

[ System devices / Intel(R) Dynamic Power Performance Management Processor Driver ]
 
		Device Properties:
			Driver Description  	Intel(R) Dynamic Power Performance Management Processor Driver
			Driver Date  	1/28/2011
			Driver Version  	5.0.2.1040
			Driver Provider  	Intel
			INF File  	oem34.inf
			Hardware ID  	PCI\VEN_8086&DEV_0103&SUBSYS_833810F7&REV_09
			Location Information  	PCI bus 0, device 4, function 0
			PCI Device  	Intel Sandy Bridge - Thermal Management Controller
 
		Device Resources:
			IRQ  	16
			Memory  	FEAF8000-FEAFFFFF 


....[ Intel Sandy Bridge - Thermal Management Controller ]
 
		Device Properties:
			Device Description  	Intel Sandy Bridge - Thermal Management Controller
			Bus Type  	PCI
			Bus / Device / Function  	0 / 4 / 0
			Device ID  	8086-0103
			Subsystem ID  	10F7-8338
			Device Class  	1180 (Data Acquisition / Signal Processing Controller)
			Revision  	09
			Fast Back-to-Back Transactions  	Supported, Disabled
 
		Device Features:
			66 MHz Operation  	Not Supported
			Bus Mastering  	Enabled 

............
Memory FEAF8000-FEAFFFFF  	Exclusive  	Intel(R) Dynamic Power Performance Management Processor Driver
Comment 4 Bjorn Helgaas 2012-06-02 03:51:22 UTC
Thanks, that's perfect!  The Linux 00:04.0 notation indeed means bus 0, device 4, function 0, a Thermal Management Controller.  And Windows does move it: the BIOS would have left it at [mem 0xfed98000-0xfed9ffff 64bit] just as when booting Linux, but Aida shows that it's now at FEAF8000-FEAFFFFF.  That's the highest available space where it would fit, in this aperture:

  pci_root PNP0A08:00: host bridge window [mem 0xdfa00000-0xfeafffff]
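
The two placement policies (lowest fit for Linux, highest fit for Windows) can be illustrated with a short Python sketch. It is a simplification under loud assumptions: the window is treated as otherwise empty, the block size is a power of two, and `place_in_window` is a hypothetical helper, not the allocator either OS uses.

```python
def place_in_window(window, size, highest=False):
    """Place a naturally aligned block of `size` bytes at the bottom or
    top of an (assumed empty) inclusive window; return (start, end)."""
    lo, hi = window
    if highest:
        start = (hi + 1 - size) & ~(size - 1)  # round down to alignment
    else:
        start = (lo + size - 1) & ~(size - 1)  # round up to alignment
    end = start + size - 1
    if start < lo or end > hi:
        raise ValueError("block does not fit in window")
    return start, end


APERTURE = (0xDFA00000, 0xFEAFFFFF)  # host bridge window from dmesg
BAR_SIZE = 0x8000                    # 32K, as Linux sized BAR 0

for label, top in (("lowest", False), ("highest", True)):
    start, end = place_in_window(APERTURE, BAR_SIZE, highest=top)
    print(f"{label}: {start:#010x}-{end:#010x}")
# lowest: 0xdfa00000-0xdfa07fff
# highest: 0xfeaf8000-0xfeafffff
```

Under these assumptions the lowest fit reproduces the range in the Linux log and the highest fit reproduces the FEAF8000-FEAFFFFF range that Aida reports.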

Interestingly, Aida says the ACPI PNP0C02 device at 0xfed98000 is *also* a thermal monitoring device:

  Hardware ID  	ACPI\PNP0C02
  PnP Device  	Thermal Monitoring ACPI Device
 
  Device Resources:
    Memory  	FED98000-FED9FFFF
 
I think probably these are really two ways of referring to the same device.  Can you collect an acpidump from your machine?  Usually the BIOS doesn't expose one device in two ways, but maybe there's something in the ACPI namespace that tells the OS the connection.  I'm doubtful, because apparently Windows didn't connect them either, since it moved the PCI device, and the ACPI device still claims to be at the original location.

I wonder if that BAR is really read-only, and Windows just *thinks* it moved it but really didn't, while Linux checks whether it moved and noticed that it didn't.  In that case, maybe all we need to do is reword that error message.

It is interesting that with "pci=nocrs" your system is slower.  I would expect the message to go away because we won't try to move the device into a host bridge aperture.  But I don't know why the system would be slower.  Can you attach a dmesg log of that boot?
Comment 5 W Unruh 2012-06-02 04:41:44 UTC
Created attachment 73494 [details]
acpidump of my Linux system 

Here is an acpidump of my system. (acpidump>/tmp/a)
(I sure cannot read it)
Comment 6 W Unruh 2012-06-02 05:13:45 UTC
Looking at the timing in dmesg for pci=nocrs, there is no evidence of a slowdown at all. I must have imagined the slowdown. 

Could the pci problems be causing the increased power drain?
Comment 7 W Unruh 2012-06-02 05:26:53 UTC
Created attachment 73495 [details]
dmesg with pci=nocrs

Anyway, here is the dmesg with pci=nocrs just in case it is of help
Comment 8 Bjorn Helgaas 2012-06-04 00:18:11 UTC
Created attachment 73505 [details]
SSDT5.dsl

To decode acpidump:
  $ acpixtract -a acpidump
  $ iasl -d DSDT.dat
  $ iasl -d SSDT5.dat
  ...

SSDT5.dsl seems of interest here, but I still can't figure much out.  It mentions these:

    \_SB\DDRC
        _HID PNP0C02
        _UID 0x04
        _CRS [mem 0xFED98000-0xFED9FFFF]
    \_SB\DPPM
        _HID INT3400 Intel(R) Dynamic Power Performance Management Driver
        _CID PNP0C02

Windows sees the INT3400 device (DPPM), but Linux doesn't.  And Windows associates the DDRC device with "Thermal Monitoring", but I don't know how.

I doubt this PCI issue is causing increased power drain.  I guess if you wanted to experiment, you could see whether the battery lasts longer with "pci=nocrs", since that avoids the attempt to move the device.  But I think it's more likely that Windows just has better power management support on this laptop -- for example, I don't think Linux has anything corresponding to the "Dynamic Power Performance Management" driver that binds to the INT3400 and related devices.

Re the PCI BAR update issue, I suspect that what we ought to do is either ignore the fact that the update failed (as Windows seems to do), or maybe better, notice that it failed and fall back to using the original BAR value, possibly marking it with IORESOURCE_PCI_FIXED so we know that we can't move it in the future.
Comment 9 Bjorn Helgaas 2012-07-06 18:06:38 UTC
On Sat, Jun 02, 2012 at 11:30:23AM +0800, Jiang Liu wrote:
> ... address range 0xfed98000-0xfed9ffff has been reserved by motherboard
> device(PNP0C02).  I guess that BIOS has assigned address "0xfed98000" to
> 0000:00:04.0 for thermal management functionality. The BAR0 of
> 0000:00:04.0 may be locked down (can't be changed by OS) because the ACPI
> BIOS may have dependency on the assigned address ranges.

I don't think the BAR can be completely read-only.  If it were, we wouldn't
have any way to determine its size.  We believe it is 32K in size:

    pci 0000:00:04.0: reg 10: [mem 0xfed98000-0xfed9ffff 64bit]

so we should have written 0xffffffff to the low 32 bits of the BAR and read
back 0xffff8004 (32K = 2^15, so the low-order 15 bits should be read-only,
including the prefetchable bit (0), the type bits (10 for 64-bit), and the
memory space indicator (0)).
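
The arithmetic in the paragraph above can be written out as a small Python sketch. It decodes only the low dword of the sizing readback and assumes the upper dword of the 64-bit BAR reads back as all ones; `decode_sizing_readback` is an illustrative name, not a kernel function.

```python
def decode_sizing_readback(readback):
    """Decode the low dword read back after writing 0xffffffff to a
    memory BAR.  Returns the flag fields and the implied size in bytes."""
    is_io = bool(readback & 0x1)          # bit 0: 0 = memory space
    mem_type = (readback >> 1) & 0x3      # bits 2:1: 0b10 = 64-bit
    prefetchable = bool(readback & 0x8)   # bit 3
    addr_mask = readback & 0xFFFFFFF0     # writable address bits
    size = (~addr_mask + 1) & 0xFFFFFFFF  # two's complement of the mask
    return {
        "is_io": is_io,
        "mem_type": mem_type,
        "prefetchable": prefetchable,
        "size": size,
    }


info = decode_sizing_readback(0xFFFF8004)
print(info["size"])      # 32768 (32K)
print(info["mem_type"])  # 2 (binary 10, i.e. 64-bit)
```

A conforming 32K BAR would therefore read back 0xffff8004 during sizing, which is the value the paragraph above expects.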

Can you experiment with setting that BAR manually, e.g., by running these
commands as root:

    # setpci -s 00:04.0 COMMAND BASE_ADDRESS_0 BASE_ADDRESS_1
    # setpci -s 00:04.0 BASE_ADDRESS_0=0xdfa00000
    # setpci -s 00:04.0 BASE_ADDRESS_0 BASE_ADDRESS_1

That's basically what the kernel does in pci_update_resource(), so this
will likely fail, too.

In __pci_read_base(), where we size the BAR, we disable decoding first,
which we *don't* do in pci_update_resource().  So if the above doesn't
work, can you try this:

    # setpci -s 00:04.0 COMMAND BASE_ADDRESS_0 BASE_ADDRESS_1
    # setpci -s 00:04.0 COMMAND=0
    # setpci -s 00:04.0 BASE_ADDRESS_0=0xdfa00000
    # setpci -s 00:04.0 BASE_ADDRESS_0 BASE_ADDRESS_1
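
What the write-then-read-back experiment looks for can be modelled in a few lines of Python. This is a simulation only: `SimulatedBar` and `update_bar` are hypothetical names, and the comparison is a simplified stand-in for what pci_update_resource() does, not a copy of it.

```python
class SimulatedBar:
    """Toy model of the low dword of a memory BAR."""

    def __init__(self, value, size, read_only=False):
        self.value = value
        self.size = size
        self.read_only = read_only

    def write(self, new):
        if self.read_only:
            return  # writes are silently ignored
        # Only address bits above the size are writable; flags are fixed.
        self.value = (new & ~(self.size - 1) & 0xFFFFFFFF) | (self.value & 0xF)

    def read(self):
        return self.value


def update_bar(bar, new_addr):
    """Write a new address, read it back, and report a mismatch."""
    new = new_addr | (bar.read() & 0xF)  # keep the low flag bits
    bar.write(new)
    check = bar.read()
    if new != check:
        return "BAR 0: error updating (%#x != %#x)" % (new, check)
    return None


stuck = SimulatedBar(0xFED98004, 0x8000, read_only=True)
print(update_bar(stuck, 0xDFA00000))
# BAR 0: error updating (0xdfa00004 != 0xfed98004)

normal = SimulatedBar(0xFED98004, 0x8000)
print(update_bar(normal, 0xDFA00000))  # None
```

If the BAR ignores writes, both setpci sequences above would be expected to read back fed98004 unchanged.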
Comment 10 W Unruh 2012-07-06 23:37:16 UTC
On Fri, 6 Jul 2012, Bjorn Helgaas wrote:

> On Sat, Jun 02, 2012 at 11:30:23AM +0800, Jiang Liu wrote:
>> ... address range 0xfed98000-0xfed9ffff has been reserved by motherboard
>> device(PNP0C02).  I guess that BIOS has assigned address "0xfed98000" to
>> 0000:00:04.0 for thermal management functionality. The BAR0 of
>> 0000:00:04.0 may be locked down (can't be changed by OS) because the ACPI
>> BIOS may have dependency on the assigned address ranges.
>
> I don't think the BAR can be completely read-only.  If it were, we wouldn't
> have any way to determine its size.  We believe it is 32K in size:
>
>    pci 0000:00:04.0: reg 10: [mem 0xfed98000-0xfed9ffff 64bit]
>
> so we should have written 0xffffffff to the low 32 bits of the BAR and read
> back 0xffff8004 (32K = 2^15, so the low-order 15 bits should be read-only,
> including the prefetchable bit (0), the type bits (10 for 64-bit), and the
> memory space indicator (0)).
>
> Can you experiment with setting that BAR manually, e.g., by running these
> commands as root:
>
>    # setpci -s 00:04.0 COMMAND BASE_ADDRESS_0 BASE_ADDRESS_1
>    # setpci -s 00:04.0 BASE_ADDRESS_0=0xdfa00000
>    # setpci -s 00:04.0 BASE_ADDRESS_0 BASE_ADDRESS_1


planet:0[root]>setpci -s 00:04.0 COMMAND BASE_ADDRESS_0 BASE_ADDRESS_1
0006
fed98004
00000000
planet:0[root]>setpci -s 00:04.0 BASE_ADDRESS_0=0xdfa00000
planet:0[root]>setpci -s 00:04.0 BASE_ADDRESS_0 BASE_ADDRESS_1
fed98004
00000000
planet:0[root]


So the command seems to work but does not change the base address (or was that what
you meant by "does not work"?).



>
> That's basically what the kernel does in pci_update_resource(), so this
> will likely fail, too.
>
> In __pci_read_base(), where we size the BAR, we disable decoding first,
> which we *don't* do in pci_update_resource().  So if the above doesn't
> work, can you try this:
>
>    # setpci -s 00:04.0 COMMAND BASE_ADDRESS_0 BASE_ADDRESS_1
>    # setpci -s 00:04.0 COMMAND=0
>    # setpci -s 00:04.0 BASE_ADDRESS_0=0xdfa00000
>    # setpci -s 00:04.0 BASE_ADDRESS_0 BASE_ADDRESS_1

planet:0[root]>setpci -s 00:04.0 COMMAND BASE_ADDRESS_0 BASE_ADDRESS_1
0006
fed98004
00000000
planet:0[root]>setpci -s 00:04.0 COMMAND=0
planet:0[root]>setpci -s 00:04.0 BASE_ADDRESS_0=0xdfa00000
planet:0[root]>setpci -s 00:04.0 BASE_ADDRESS_0 BASE_ADDRESS_1
fed98004
00000000
planet:0[root]>

I.e., same thing: no change in the address.
Comment 11 Bjorn Helgaas 2012-07-07 00:14:05 UTC
Created attachment 74971 [details]
debug patch to trace BAR sizing & update

Huh.  I don't know what's going on.  If you can recompile a kernel, would you mind trying this patch and attaching the resulting dmesg log?  It won't fix anything, but maybe we can see what happens when we size the BAR.
Comment 12 Bjorn Helgaas 2012-07-07 00:44:21 UTC
Can you also include the output of "lspci -xxx -s00:00.0"?  The "2nd Generation Intel Core Processor Family Mobile" spec (vol 2) mentions an MCHBAR at 48-4Fh 00:00.0 config space that seems related to the PCU, which seems related to thermal management.
Comment 13 Yinghai Lu 2012-07-07 00:55:53 UTC
On Fri, Jul 6, 2012 at 4:37 PM, Bill Unruh <unruh@physics.ubc.ca> wrote:
> On Fri, 6 Jul 2012, Bjorn Helgaas wrote:
>
>> On Sat, Jun 02, 2012 at 11:30:23AM +0800, Jiang Liu wrote:
>>>
>>> ... address range 0xfed98000-0xfed9ffff has been reserved by motherboard
>>> device(PNP0C02).

No, pnp resource reservation is after pci bar reservation.

>>>I guess that BIOS has assigned address "0xfed98000" to
>>> 0000:00:04.0 for thermal management functionality. The BAR0 of
>>> 0000:00:04.0 may be locked down (can't be changed by OS) because the ACPI
>>> BIOS may have dependency on the assigned address ranges.
>>
>>
>> I don't think the BAR can be completely read-only.  If it were, we
>> wouldn't
>> have any way to determine its size.  We believe it is 32K in size:
>>
>>    pci 0000:00:04.0: reg 10: [mem 0xfed98000-0xfed9ffff 64bit]
>>
>> so we should have written 0xffffffff to the low 32 bits of the BAR and
>> read
>> back 0xffff8004 (32K = 2^15, so the low-order 15 bits should be read-only,
>> including the prefetchable bit (0), the type bits (10 for 64-bit), and the
>> memory space indicator (0)).

because it is locked down by BIOS to chipset, readback should be 0xfed98004.

and pci_size will return 32k for 0xfed98000.

>>
>> Can you experiment with setting that BAR manually, e.g., by running these
>> commands as root:
>>
>>    # setpci -s 00:04.0 COMMAND BASE_ADDRESS_0 BASE_ADDRESS_1
>>    # setpci -s 00:04.0 BASE_ADDRESS_0=0xdfa00000
>>    # setpci -s 00:04.0 BASE_ADDRESS_0 BASE_ADDRESS_1
>
>
>
> planet:0[root]>setpci -s 00:04.0 COMMAND BASE_ADDRESS_0 BASE_ADDRESS_1
> 0006
> fed98004
> 00000000
> planet:0[root]>setpci -s 00:04.0 BASE_ADDRESS_0=0xdfa00000
> planet:0[root]>setpci -s 00:04.0 BASE_ADDRESS_0 BASE_ADDRESS_1
> fed98004
> 00000000
> planet:0[root]

BIOS _CRS

[    1.956431] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[    1.956921] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-3e])
[    1.957225] pci_root PNP0A08:00: host bridge window [io  0x0000-0x0cf7]
[    1.957227] pci_root PNP0A08:00: host bridge window [io  0x0d00-0xffff]
[    1.957229] pci_root PNP0A08:00: host bridge window [mem 0x000a0000-0x000bffff]
[    1.957232] pci_root PNP0A08:00: host bridge window [mem 0x000d4000-0x000d7fff]
[    1.957234] pci_root PNP0A08:00: host bridge window [mem 0x000d8000-0x000dbfff]
[    1.957235] pci_root PNP0A08:00: host bridge window [mem 0x000dc000-0x000dffff]
[    1.957237] pci_root PNP0A08:00: host bridge window [mem 0xdfa00000-0xfeafffff]
[    1.957239] pci_root PNP0A08:00: host bridge window [mem 0xfed40000-0xfed44fff]

This does not include 0xfed98000 for 00:04.0, so the kernel cannot reserve it.
The kernel rejects it and tries to get a new range for 00:04.0, but the BIOS
has locked that down in a chipset setting register.

So try to get an updated BIOS that could return the resource for the root bus.

Or you could live with booting with pci=nocrs until you get a new BIOS.

Thanks

Yinghai
Comment 14 W Unruh 2012-07-07 07:15:34 UTC
On Sat, 7 Jul 2012, bugzilla-daemon@bugzilla.kernel.org wrote:

> --- Comment #12 from Bjorn Helgaas <bhelgaas@google.com>  2012-07-07 00:44:21 ---
> Can you also include the output of "lspci -xxx -s00:00.0"?  The "2nd Generation
> Intel Core Processor Family Mobile" spec (vol 2) mentions an MCHBAR at 48-4Fh
> 00:00.0 config space that seems related to the PCU, which seems related to
> thermal management.

planet:0[root]>lspci -xxx -s 00:00.0
00:00.0 Host bridge: Intel Corporation 2nd Generation Core Processor Family DRAM Controller (rev 09)
00: 86 80 04 01 06 00 90 20 09 00 00 06 00 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 f7 10 38 83
30: 00 00 00 00 e0 00 00 00 00 00 00 00 00 00 00 00
40: 01 90 d1 fe 00 00 00 00 01 00 d1 fe 00 00 00 00
50: 11 02 00 00 91 00 00 00 00 00 00 00 01 00 00 db
60: 05 00 00 f8 00 00 00 00 01 80 d1 fe 00 00 00 00
70: 00 00 00 fe 00 00 00 00 00 0c 00 fe 7f 00 00 00
80: 10 11 11 01 00 00 11 00 1a 00 00 00 00 00 00 00
90: 01 00 00 00 01 00 00 00 01 00 50 1e 01 00 00 00
a0: 01 00 00 00 01 00 00 00 01 00 60 1e 01 00 00 00
b0: 01 00 a0 db 01 00 80 db 01 00 00 db 01 00 a0 df
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 09 00 0c 01 9e 61 00 e2 90 00 08 14 00 00 00 00
f0: 00 00 00 00 00 00 00 00 b8 0f 06 00 00 00 00 00
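
For reference, the MCHBAR field that comment 12 asks about (offsets 48-4Fh) can be read out of the dump as a little-endian 64-bit value. A minimal Python sketch, with `read_le` as a hypothetical helper; it does not interpret individual bits, whose meaning would have to come from the Intel spec.

```python
# Row 0x40 of the "lspci -xxx -s 00:00.0" dump above.
ROW_40 = "40: 01 90 d1 fe 00 00 00 00 01 00 d1 fe 00 00 00 00"


def read_le(row, row_base, offset, nbytes):
    """Read an nbytes-wide little-endian value at config offset `offset`
    from one 16-byte lspci hexdump row that starts at `row_base`."""
    hex_part = row.split(":", 1)[1].replace(" ", "")
    data = bytes.fromhex(hex_part)
    start = offset - row_base
    return int.from_bytes(data[start:start + nbytes], "little")


mchbar = read_le(ROW_40, 0x40, 0x48, 8)
print(hex(mchbar))  # 0xfed10001
```

The raw value 0xfed10001 is unrelated to the 0xfed98000 range in BAR 0 of 00:04.0 as far as this dump alone shows.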
Comment 15 W Unruh 2012-07-07 07:22:25 UTC
On Fri, 6 Jul 2012, Yinghai Lu wrote:
>
> so try get one update bios that could return resource for root bus.
>
> or you could try to live with booting with pci=nocrs before your get new
> BIOS.

Where would I get a new BIOS ? Has Panasonic released a bios update for this
machine that fixes this problem?

I guess the other possibility is to live with the error message, or is this
error part of the reason why my machine has such bad battery use? (less than 
1/2 the time that Windows gets on the same machine).
Comment 16 W Unruh 2012-07-07 11:09:28 UTC
On Sat, 7 Jul 2012, bugzilla-daemon@bugzilla.kernel.org wrote:
>
> Huh.  I don't know what's going on.  If you can recompile a kernel, would you
> mind trying this patch and attaching the resulting dmesg log?  It won't fix
> anything, but maybe we can see what happens when we size the BAR.

Right now I'm at a conference at which my computer is needed, so I will not be
able to do so for a while.
Comment 17 Bjorn Helgaas 2012-07-07 13:42:01 UTC
> because it is locked down by BIOS to chipset, readback should be 0xfed98004.
>
> and pci_size will return 32k for 0xfed98000.

A device with a read-only BAR doesn't conform to the PCI spec.  We
can't determine how much space the device consumes.
It's just an accident that BIOS put it at an address that we happen to
interpret as 32K.  We have no idea if the device consumes 4K, 8K, 16K,
or 32K.
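
Why the 32K figure is an accident of the address can be shown with a short Python sketch. It assumes the sizing logic ends up taking the lowest set address bit of the value it reads back; `apparent_size` is an illustrative name, and 0xfed90004 is a made-up value used only for contrast.

```python
def apparent_size(bar_value):
    """Size implied by the lowest set address bit of a BAR value that
    never changes, i.e. what sizing sees on a read-only BAR."""
    addr = bar_value & 0xFFFFFFF0  # drop the low flag bits
    return addr & -addr            # isolate the lowest set bit


print(hex(apparent_size(0xFED98004)))  # 0x8000 (32K), the value on this machine
print(hex(apparent_size(0xFED90004)))  # 0x10000 (64K), hypothetical address
```

The same hardware parked at a differently aligned address would be reported with a different size, so the result is at best an upper bound on what the device decodes.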

> so try get one update bios that could return resource for root bus.
>
> or you could try to live with booting with pci=nocrs before your get new
> BIOS.

A BIOS update is not a useful answer.  The point is that we need
better reporting of the situation so we don't have to spend all this
time debugging this issue again.  We need to figure out how to
identify this as a non spec-compliant device and continue as best we
can.  We probably need to report it to the user because if there's a
device of unknown size consuming address space, we're liable to cause
a conflict by placing another device on top of it.
Comment 18 Yinghai Lu 2012-07-07 19:02:21 UTC
On Sat, Jul 7, 2012 at 12:22 AM, Bill Unruh <unruh@physics.ubc.ca> wrote:
> On Fri, 6 Jul 2012, Yinghai Lu wrote:
> Where would I get a new BIOS ? Has Panasonic released a bios update for this
> machine that fixes this problem?

Looks like they do not have a BIOS update for the S10 yet.

http://www.panasonic.com/business/toughbook/computer-support-bios.asp#CF-U1

maybe you can try to email or call their support.

>
> I guess the other possibility is to live with the error message, or is this
> error part of the reason why my machine has such bad battery use? (less than
> 1/2 the time that Windows gets on the same machine).

Not sure. You can try booting with pci=nocrs to see if there is any difference.

Thanks

Yinghai
Comment 19 Bjorn Helgaas 2012-07-09 17:26:42 UTC
> looks like they do not bios update for S10 yet.
>
> http://www.panasonic.com/business/toughbook/computer-support-bios.asp#CF-U1
>
> maybe you can try to email or call their support.

Please stop suggesting a BIOS upgrade.  The BIOS is totally out of our
control, and if the current BIOS works with Windows but not Linux,
99.99% of users will just run Windows.  Those users will just give up
on Linux, and we'll never even hear about it.

Fortunately, Bill is a sophisticated user who cared enough to report
the issue, so we have a chance to do something about it.  What we need
is for Linux to deal with this situation better, and that *IS* in our
control.

For example, we could detect the BAR being read-only when we size it.
We could use the read-only address to derive the worst-case BAR size
(32K in this case).  We could mark it as IORESOURCE_PCI_FIXED and use
that to avoid trying to move it.  We could log a message to this
effect so the next time it will take 10 minutes instead of 6 weeks to
diagnose the problem.

The end result for Bill will be the same (an unexpected message while
booting), but we can at least make the message intelligible rather
than "BAR 0: error updating (0xdfa00004 != 0xfed98004)".
Comment 20 Bjorn Helgaas 2012-07-09 17:54:12 UTC
Created attachment 75151 [details]
rdwrmem.c

Per this message and the /proc/iomem contents:

  pci 0000:00:04.0: BAR 0: assigned [mem 0xdfa00000-0xdfa07fff 64bit]

Linux believes it has relocated the device to 0xdfa00000.  But the setpci experiment shows that the BAR still contains 0xfed98000.  If you compile the attached program and run it as follows:

  # ./rdwrmem -b4 -l32768 -s0xdfa00000 -m
  # ./rdwrmem -b4 -l32768 -s0xfed98000 -m
  # setpci -s 00:04.0 COMMAND=0
  # ./rdwrmem -b4 -l32768 -s0xfed98000 -m

I suspect we'll see nothing at 0xdfa00000 (all 0xffffffff data), useful data at 0xfed98000 the first time, and probably nothing at 0xfed98000 after disabling memory decoding.

Bill, can you try this and attach the output?  If this theory is correct, we should at least change Linux so we don't pretend that we relocated the device when we haven't.
Comment 21 W Unruh 2012-07-09 20:23:28 UTC
>
>  # ./rdwrmem -b4 -l32768 -s0xdfa00000 -m

All 2048 lines like
DFA00000:FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF

>  # ./rdwrmem -b4 -l32768 -s0xfed98000 -m

First few lines are
mmap 0xfed98000 -> 0xb77c7000
FED98000:00000000 00000930 00000006 00083235
FED98010:01F01F28 01F01E29 00000000 00000000
FED98020:0001811F 0001811F 00000000 00000000
FED98030:000000FF 00000A08 00000000 00000000

>  # setpci -s 00:04.0 COMMAND=0

Nothing returned

>  # ./rdwrmem -b4 -l32768 -s0xfed98000 -m

First 4 lines
mmap 0xfed98000 -> 0xb775c000
FED98000:FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF
FED98010:FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF
FED98020:FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF
FED98030:FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF

and all the rest the same.

> I suspect we'll see nothing at 0xdfa00000 (all 0xffffffff data), useful data
> at 0xfed98000 the first time, and probably nothing at 0xfed98000 after
> disabling memory decoding.
>
> Bill, can you try this and attach the output?  If this theory is correct, we
> should at least change Linux so we don't pretend that we relocated the device
> when we haven't.

Do you really want the full output or is what I gave you sufficient?
Comment 22 Bjorn Helgaas 2012-07-09 20:26:22 UTC
> Do you really want the full output or is what I gave you sufficient?

That's perfect; the output you included is sufficient to confirm my hypothesis.  Thanks!
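
The reasoning behind that confirmation is that a range nothing decodes reads back as all 0xFFFFFFFF. A minimal Python sketch that classifies dump lines copied from comment 21; the line format is taken from that output, and the helper names are made up for illustration (this is not rdwrmem.c itself).

```python
def parse_dump_line(line):
    """Split 'ADDR:WORD WORD ...' into (address, [32-bit words])."""
    addr, words = line.split(":", 1)
    return int(addr, 16), [int(w, 16) for w in words.split()]


def all_ones(lines):
    """True if every word in the dump lines is 0xFFFFFFFF."""
    return all(
        word == 0xFFFFFFFF
        for line in lines
        for word in parse_dump_line(line)[1]
    )


AT_DFA00000 = ["DFA00000:FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF"]
AT_FED98000_DECODING_ON = [
    "FED98000:00000000 00000930 00000006 00083235",
    "FED98010:01F01F28 01F01E29 00000000 00000000",
]
AT_FED98000_DECODING_OFF = ["FED98000:FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF"]

print(all_ones(AT_DFA00000))               # True: nothing at the "assigned" address
print(all_ones(AT_FED98000_DECODING_ON))   # False: device still at the BIOS address
print(all_ones(AT_FED98000_DECODING_OFF))  # True: gone once COMMAND=0
```

Together the three results indicate that the device never moved and that it is the PCI function, not some other agent, responding at 0xfed98000.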
Comment 23 Bjorn Helgaas 2012-07-16 22:14:29 UTC
Created attachment 75511 [details]
check for read-only BARs

Bill, if you're able to recompile your kernel, can you test the attached patch?
Comment 24 Bjorn Helgaas 2012-07-30 15:39:35 UTC
> Bill, if you're able to recompile your kernel, can you test the attached
> patch?

Ping?  Let me know if it would help for me to build a kernel image for
you.  Thanks!
Comment 25 W Unruh 2012-08-04 12:06:47 UTC
That might help. I have been travelling like mad again and have not found time
to do it. Thanks

Bill

On Mon, 30 Jul 2012, bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=43331
>
> --- Comment #24 from Bjorn Helgaas <bhelgaas@google.com>  2012-07-30 15:39:35 ---
>> Bill, if you're able to recompile your kernel, can you test the attached
>> patch?
>
> Ping?  Let me know if it would help for me to build a kernel image for
> you.  Thanks!
Comment 26 Bjorn Helgaas 2012-08-06 23:21:08 UTC
OK, give this a try.  It likely won't have all the drivers for your machine, but if it has enough for your root filesystem, that should be all we need for this test.

http://helgaas.com/linux/kbz43331/bzImage
Comment 27 Bjorn Helgaas 2012-10-01 20:54:31 UTC
Ping, Bill, let us know if you have a chance to try the patch in comment #23 (or the kernel from comment #26, which includes the patch).
Comment 28 Bjorn Helgaas 2014-06-05 23:44:26 UTC
Created attachment 138311 [details]
proposed patch

To summarize: device 00:04.0 has a BAR that is read-only.  Since the BAR is read-only, we cannot accurately determine its size.  We assume it is 32K in size based on the lowest-order bit set in the BAR address, but it could be smaller (16K, 8K, 4K, etc.):

  pci 0000:00:04.0: reg 10: [mem 0xfed98000-0xfed9ffff 64bit]
  pci 0000:00:04.0: no compatible bridge window for [mem 0xfed98000-0xfed9ffff 64bit]
  pci 0000:00:04.0: BAR 0: assigned [mem 0xdfa00000-0xdfa07fff 64bit]
  pci 0000:00:04.0: BAR 0: error updating (0xdfa00004 != 0xfed98004)

The BIOS happened to leave the BAR programmed to an address outside all host bridge apertures, so we tried to move it into an aperture (Windows also tries to move it).

There is an ACPI PNP0c02 device at the same address:

  pnp 00:0e: [mem 0xfed98000-0xfed9ffff]
  system 00:0e: Plug and Play ACPI device, IDs PNP0C02 (active)

This is probably the result of a BIOS defect.  I think the BIOS intent was to hide the PCI device and expose it only as the PNP0C02 device.

I think I'll propose the attached patch to notice that it is read-only and mark it as IORESOURCE_PCI_FIXED.  This is basically the same as what I proposed in comment #23.
Comment 29 Martin Lucina 2014-08-28 16:03:44 UTC
Hi,

I also have this machine (Panasonic CF-S10), albeit with a newer BIOS:

BIOS Information
        Vendor: American Megatrends Inc.
        Version: V3.00L10
        Release Date: 07/22/2011

I've been seeing this exact same error and just now came across this bug. I have tried the attached patch against the 3.14 longterm kernel, which is what I'm currently using, and it does indeed make the error go away.  However, I do get a bunch more messages about PCI device 00:04.0 with the patch.  Would you like me to send dmesg logs with/without the patch?  Shall I use 3.14 or the newest kernel?

Note to the original submitter: I never had to use any special pci options on the machine (had it since kernel 3.2.x) and the power management has massively improved since, the biggest gain was with the Intel P-State driver CONFIG_X86_INTEL_PSTATE and the support for RC6 power saving states in the Intel graphics drivers.
Comment 30 Bjorn Helgaas 2014-10-10 17:14:56 UTC
[+cc Myron]

Hi Martin, I had second thoughts about the approach in the comment #23 patch.  I now think it would be better to completely ignore that read-only BAR and pretend it doesn't exist at all because (1) we can only guess at the size, (2) we can't change it, and (3) there's a similar defect where the register at the architected BAR location does not contain an address at all (see erratum HSE43 in [1]).

If we pretend the BAR doesn't exist, we will avoid all the resource management headaches, which is probably what causes the rest of the 00:04.0 messages you're seeing.

Myron is working on a patch to do this; when he posts it, it'd be great if you could give it a whirl.

[1] http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e5-v3-spec-update.pdf
Comment 31 Martin Lucina 2014-10-13 14:17:14 UTC
Hi Bjorn. Sure, I will test the patch once it is ready. I may have to test against a 3.14 kernel as I had problems getting newer kernels to cooperate with the Debian wheezy userspace installed on my machine, however that should have no impact on your patch.
Comment 32 Myron Stowe 2014-10-30 22:07:22 UTC
Martin:

I posted a patch to the linux-pci list that should catch such occurrences in the future:
  https://lkml.org/lkml/2014/10/30/637

Please test when you get time and let me know your results.  When testing, it would be good if you could capture 'dmesg' logs from before and after and then compare those against each other to make sure I didn't introduce any unwanted regressions.  I've done such comparisons on my machines but one can never test enough.

Thanks,
 Myron


From "[PATCH 0/3] PCI: Fix detection of read-only BARs" -

While non-conformant, PCI devices having read-only (r/o) BARs - registers
that when read return fixed, non-zero, values regardless of whether or
not they are being sized - occasionally turn up.  Pre-git commit 1307ef662199
[1] was the initial attempt to detect such BARs.  The detection mechanism
used however ended up exposing further unexpected behaviors on enough
devices that it had to be reverted [2].

A subsequent solution, which is still currently in use, was put in place
with pre-git commit 2c79a80ab7b7 ("[PCI] Correctly size hardwired empty
BARs").  This solution's logic detects r/o BARs via the following (see
'pci_size()'):
        /* base == maxbase can be valid only if the BAR has
           already been programmed with all 1s.  */
        if (base == maxbase && ((base | size) & mask) != mask)
                return 0;

Later, commit 6ac665c63dca ("PCI: rewrite PCI BAR reading code") was
introduced, re-factoring PCI's core BAR sizing logic.  The commit altered
__pci_read_base's local variable 'l', stripping off its lower,
non-addressing related bits, prior to being passed in as the 'base'
parameter to pci_size().  This masking broke the r/o BAR detection logic's
first comparison check for r/o BARs that have any of their lower order
bits, bits that are not part of a BARs "base address" field, set.  For
such cases, the 'base == maxbase' comparison check will no longer ever be
"true".

This series restores the r/o BAR detection logic so that it will once
again catch, and ignore, such occurrences as have been encountered to
date:
  - AGP aperture BAR of AMD-7xx host bridges; if the AGP window is
    disabled, this BAR is read-only and reads as 0x00000008 [1]
  - BAR0-4 of ALi IDE controllers can be non-zero and read-only [1]
  - Intel Sandy Bridge - Thermal Management Controller [8086:0103];
    BAR 0 returning 0xfed98004 [3]
  - Intel Xeon E5 v3/Core i7 Power Control Unit [8086:2fc0];
    BAR 0 returning 0x00001a [4]


[1] From Thomas Gleixner's "Linux kernel history" repository:
    https://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/drivers/pci/probe.c?id=1307ef6621991f1c4bc3cec1b5a4ebd6fd3d66b9
    pre-git commit 1307ef662199  "PCI: probing read-only Bars"
[2] Pre-git commit 182d090b9dfe  "Undo due to weird behaviour on various
    boxes"
[3] https://bugzilla.kernel.org/show_bug.cgi?id=43331
[4] https://bugzilla.kernel.org/show_bug.cgi?id=85991
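
The pci_size() check and the way masking defeats it, as described in the cover letter above, can be reproduced with a Python transliteration. Only the two-line `base == maxbase` test is quoted in this thread; the rest of the function body is reconstructed from memory of drivers/pci/probe.c and should be treated as an assumption, not as the exact kernel source.

```python
MEM_MASK = 0xFFFFFFF0  # 32-bit view of PCI_BASE_ADDRESS_MEM_MASK


def pci_size(base, maxbase, mask):
    """Return the extent (size - 1) of a BAR, or 0 if it looks invalid.
    base: value before sizing; maxbase: value read after writing all 1s."""
    size = mask & maxbase            # keep only the address bits
    if not size:
        return 0
    size = (size & ~(size - 1)) - 1  # lowest set bit, minus one
    # base == maxbase can be valid only if the BAR has
    # already been programmed with all 1s.
    if base == maxbase and ((base | size) & mask) != mask:
        return 0
    return size


RO_VALUE = 0xFED98004  # what 00:04.0 BAR 0 returns before and after sizing

# Unmasked base: the r/o BAR is detected and rejected.
print(hex(pci_size(RO_VALUE, RO_VALUE, MEM_MASK)))             # 0x0

# Base with low bits stripped first: the comparison can no longer
# match, and the BAR is accepted as a 32K region.
print(hex(pci_size(RO_VALUE & MEM_MASK, RO_VALUE, MEM_MASK)))  # 0x7fff
```

The second call models the behaviour attributed to commit 6ac665c63dca, where `base` arrives already masked while `maxbase` still carries the low flag bits.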
Comment 33 Martin Lucina 2014-11-03 12:50:35 UTC
Created attachment 156381 [details]
dmesg comparison with Myron's patch on 3.17.2
Comment 34 Martin Lucina 2014-11-03 12:52:22 UTC
Myron,

I've tested your patch against the 3.17.2 kernel and it seems to work, with nothing untoward showing in dmesg (that wasn't there before). See the attached diff of the relevant portion.

Is there anything else you'd like me to test which might be affected by the patch?

Thanks,

Martin
Comment 35 Bjorn Helgaas 2014-11-03 17:20:22 UTC
Martin, that dmesg diff looks perfect.  Previously "lspci -vs00:04.0" probably showed something like:

  00:04.0
    Memory at fed98000 (64-bit, non-prefetchable) [size=32K]

With Myron's patch, that memory region should disappear from the lspci output.
Comment 36 Martin Lucina 2014-11-03 17:39:11 UTC
Bjorn,

Before:

  Memory at dfa00000 (64-bit, non-prefetchable) [size=32K]

After:

  Memory at <ignored> (64-bit, non-prefetchable)
Comment 37 Myron Stowe 2014-11-03 19:13:31 UTC
Right, the BAR is now recognized as invalid -

  + pci 0000:00:04.0: [Firmware Bug]: reg 0x10: invalid BAR (can't size)

and subsequently, the kernel no longer tries to allocate resources for it -

  - pci 0000:00:04.0: can't claim BAR 0 [mem 0xfed98000-0xfed9ffff 64bit]: no compatible bridge window
    ...
  - pci 0000:00:04.0: BAR 0: assigned [mem 0xdfa00000-0xdfa07fff 64bit]
  - pci 0000:00:04.0: BAR 0: error updating (0xdfa00004 != 0xfed98004)

Thanks Martin!
Comment 38 Martin Lucina 2014-11-03 20:30:23 UTC
You're welcome. Will this patch make it into any of the LTS kernels? (3.14?)
Comment 39 Myron Stowe 2014-11-03 20:44:59 UTC
Hey Martin:

Interesting question.  Technically, the breakage goes all the way back to v2.6.27 -

  13:38:29 zim:~/kernels/linux% git tag --contains 6ac665c63dca
  v2.6.27
  ...

My initial thought is: assuming this makes it upstream, I would like it to get some "soak time" to make sure nothing else pops out as a regression before we consider LTS kernels since the fix is in such core, low-level, code.

I'll discuss this with Bjorn.

Myron
Comment 40 Bjorn Helgaas 2015-02-19 14:55:28 UTC
Fixed by 36e8164882ca ("PCI: Restore detection of read-only BARs"), which appeared in v3.19-rc1.

06cf35f903aa ("PCI: Handle read-only BARs on AMD CS553x devices") fixed a related problem exposed by 36e8164882ca and appeared in v3.19.

Kudos to Myron!  This was a long haul but I think we ended up with a nice clean fix.