Bug 200769 - 0 resource size pci devices found and break the calculation of bridge window's size
Summary: 0 resource size pci devices found and break the calculation of bridge window'...
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI (show other bugs)
Hardware: Intel Linux
: P1 normal
Assignee: drivers_pci@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-08-09 06:25 UTC by AceLan Kao
Modified: 2018-08-23 08:50 UTC (History)
2 users (show)

See Also:
Kernel Version: 4.17
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Initial patch for the issue (1.35 KB, application/mbox)
2018-08-09 06:25 UTC, AceLan Kao
Details
lspci -vvnn (262.10 KB, text/plain)
2018-08-09 06:26 UTC, AceLan Kao
Details
dmesg (124.52 KB, text/plain)
2018-08-09 06:26 UTC, AceLan Kao
Details
dmesg with patch applied and print out the r_size (128.76 KB, text/plain)
2018-08-09 06:28 UTC, AceLan Kao
Details

Description AceLan Kao 2018-08-09 06:25:57 UTC
Created attachment 277767 [details]
Initial patch for the issue

There are some 0 resource size pci devices, and it leads to the
accumulator fails to maintain the correct value.
It results in a strange issue on my machine that xhci_hcd failed to init.
[    2.437278] xhci_hcd 0000:05:00.0: init 0000:05:00.0 fail, -16
[    2.437300] xhci_hcd: probe of 0000:05:00.0 failed with error -16
    
To fix this, check if the resource size equals to 0, don't increase size.
Comment 1 AceLan Kao 2018-08-09 06:26:33 UTC
Created attachment 277769 [details]
lspci -vvnn
Comment 2 AceLan Kao 2018-08-09 06:26:58 UTC
Created attachment 277771 [details]
dmesg
Comment 3 AceLan Kao 2018-08-09 06:28:24 UTC
Created attachment 277773 [details]
dmesg with patch applied and print out the r_size
Comment 4 AceLan Kao 2018-08-10 01:55:19 UTC
From my colleague Ivan Hu:

Checking the Base Address registers(BARs) of the PCI devices by tool RU on UEFI shell enviroments, which means under bios boot environment, haven't booted to OS yet. Find a lot of BARs of PCI devices are zero, such as,
B0.D5.F0, B0.D5.F2, B0.D8.F0-2, B0.D11.F0-1, B14.D5.F0, B14.D5.F2, B14.D8.F0-7,...

And I try to set all-one value to offset 0x10(BARs) on B0.D5.F0, and got 0 return, looks like device's requested memory size in an encoded form is 0. I think that's why the BARs are zero.
Comment 5 Bjorn Helgaas 2018-08-15 00:24:55 UTC
> And I try to set all-one value to offset 0x10(BARs) on B0.D5.F0, and
> got 0 return, looks like device's requested memory size in an encoded
> form is 0.

That's incorrect.  If you read zero back after writing all ones, it doesn't mean the requested memory size it zero; it means the BAR is unimplemented.  See PCIe r4.0, sec 7.5.1.2.1.

Your dmesg log shows no BARs for 00:05.0:

  [    0.889609] pci 0000:00:05.0: [8086:2024] type 00 class 0x088000
  [    0.890524] pci 0000:00:05.2: [8086:2025] type 00 class 0x088000

and your lspci output also correctly shows no BARs.

Note that this is for "00:05.0 System peripheral [0880]: Intel Corporation Sky Lake-E MM/Vt-d Configuration Registers", not the Thunderbolt USB device at 05:00.0.
Comment 6 AceLan Kao 2018-08-21 02:38:57 UTC
Bjron, 

Do you have any idea how we can fix the issue introduced by this commit>
   c9c75143a596 PCI: Fix calculation of bridge window's size and alignment

It doesn't consider the r_size == 0 case, and increase the size.
+                       size += max(r_size, align);
Comment 7 Bjorn Helgaas 2018-08-21 14:04:10 UTC
That loop looks like this:

  for (i = 0; i < PCI_NUM_RESOURCES; i++) {
    struct resource *r = &dev->resource[i];

    if (r->parent || (r->flags & IORESOURCE_PCI_FIXED) ||
        ((r->flags & mask) != type && ...))
            continue;
    r_size = resource_size(r);

The question here is why we didn't take the "continue" branch.  We apparently think we have a valid resource (with r->flags that probably indicate a MEM resource) of size zero.  But I don't think that should happen.

If IORESOURCE_MEM is set, the size should be non-zero.  Otherwise, drivers will try to use this zero-sized resource, e.g., via ioremap(), which isn't going to work.

Your log from comment #2 shows that when we initially size the BAR, we find a 64KB memory BAR:

[    1.171947] pci 0000:05:00.0: [8086:15d4] type 00 class 0x0c0330
[    1.171979] pci 0000:05:00.0: reg 0x10: [mem 0x00000000-0x0000ffff]
...
[    1.401685] pci 0000:05:00.0: BAR 0: no space for [mem size 0x00010000]
[    1.401687] pci 0000:05:00.0: BAR 0: failed to assign [mem size 0x00010000]

The "no space" line means we don't think we have space for the BAR.  We need to preserve the size so we know how much space the BAR requires.

But since you're seeing r_size == 0, it seems like we must have cleared out the size.  We need to figure out where that happens and why and probably change that code.
Comment 8 AceLan Kao 2018-08-23 08:50:19 UTC
After we found the below commit introduced this issue, my first thought is to preserve it's previous code flow, and before applied below commit, it runs "size += r_size", so I think maybe we should keep it that way. That's why I didn't take the "continue" branch.
   c9c75143a596 PCI: Fix calculation of bridge window's size and alignment

I'll spend some time to do further debugging, and try if "continue" branch works better.

Note You need to log in before you can comment on or make changes to this bug.