Bug 42606

Summary: PCI I/O range not assigned under newer kernels
Product: Drivers Reporter: Martin Burnicki (martin.burnicki)
Component: PCI Assignee: Bjorn Helgaas (bjorn)
Status: RESOLVED CODE_FIX    
Severity: normal CC: alan, bjorn, szg00000
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.32 vs. 2.6.36.1/3.1.4 Subsystem:
Regression: Yes Bisected commit-id:
Attachments: dmesg log from kernel 2.6.32.6 (Knoppix 6.2.1 based on Debian squeeze/sid)
dmesg log from kernel 3.1.4 (Ubuntu 10.10)
dmesg log from kernel 2.6.32-36-generic (Ubuntu 10.10)
dmesg log from a 3.1.4 kernel with the card in a slot which works
Windows resource info (from AIDA64)
quirk to ignore _CRS on Supermicro X8DTH-i/6/iF/6F

Description Martin Burnicki 2012-01-19 11:22:09 UTC
Created attachment 72121 [details]
dmesg log from kernel  2.6.32.6 (Knoppix 6.2.1 based on Debian squeeze/sid)

I observed a problem with a PCI Express card in a server PC whose mainboard has 8 PCI Express slots and no other PCI slots. All slots are empty except the one holding the card under test, which requires a legacy I/O address range for base address 0.

In some of the slots the card works properly, but in other slots on the same machine it doesn't work because the associated kernel driver cannot access it. When it doesn't work, lspci -v reports the following for I/O base address 0:

 Region 0: I/O ports at <ignored>

I found that this problem did *not* occur with kernels 2.6.32.6 and earlier, but *does* occur with kernels 2.6.36.1 and later, up to at least 3.1.4, on the same hardware.

I have not yet tried any kernel versions between 2.6.32 and 2.6.36 to identify exactly which version introduced the problem.

Appending dmesg output from kernels 2.6.32.6 (now) and 3.1.4 (later) as requested by Bjorn.
Comment 1 Martin Burnicki 2012-01-19 11:26:49 UTC
Created attachment 72122 [details]
dmesg log from kernel  3.1.4 (Ubuntu 10.10)

Another dmesg log from a more recent kernel where no I/O address range is assigned.
Comment 2 Martin Burnicki 2012-01-23 15:04:02 UTC
Created attachment 72171 [details]
dmesg log from kernel 2.6.32-36-generic (Ubuntu 10.10)

dmesg log from a 2.6.32 kernel where the problem doesn't occur.
Replaces an earlier log which was incomplete and thus didn't contain all the required information.
Comment 3 Martin Burnicki 2012-01-23 17:01:38 UTC
Created attachment 72176 [details]
dmesg log from a 3.1.4 kernel with the card in a slot which works
Comment 4 Bjorn Helgaas 2012-01-23 19:45:20 UTC
Created attachment 72180 [details]
Windows resource info (from AIDA64)

We have two host bridges; both report the same [io  0xf000-0xffff] region:

    ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-7f])
    pci_root PNP0A08:00: host bridge window [io  0x0d00-0xefff]
    pci_root PNP0A08:00: host bridge window [io  0xf000-0xffff]
    ACPI: PCI Root Bridge [BR50] (domain 0000 [bus 80-fb])
    pci_root PNP0A08:01: host bridge window [io  0xf000-0xffff]

When the card is below the PCI0 bridge, BIOS puts it in the [io  0x0d00-0xefff] range, which works (3.1.4 dmesg log from attachment #72176):

    pci 0000:00:07.0: PCI bridge to [bus 04-04]
    pci 0000:00:07.0:   bridge window [io  0xd000-0xdfff]
    pci 0000:04:00.0: reg 10: [io  0xdc00-0xdcff]

When the card is below BR50, BIOS puts it in the [io  0xf000-0xffff] range, which Linux believes is assigned to PCI0 and unavailable for BR50 (3.1.4 dmesg log from attachment #72122):

    pci 0000:80:07.0: PCI bridge to [bus 85-85]
    pci 0000:80:07.0:   bridge window [io  0xf000-0xffff]
    pci 0000:80:07.0: address space collision: [io  0xf000-0xffff] conflicts with PCI Bus 0000:00 [io  0xf000-0xffff] 
    pci 0000:85:00.0: reg 10: [io  0xfc00-0xfcff]
    pci 0000:85:00.0: no compatible bridge window for [io  0xfc00-0xfcff]
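
The collision in the log above can be modeled with a small sketch. This is a simplified illustration, not the kernel's resource code: the names `HostWindow` and `check_bridge_window` are hypothetical, and it assumes the shared [io  0xf000-0xffff] range ends up owned only by PCI0, as the log suggests.

```python
from typing import List, NamedTuple, Optional, Tuple


class HostWindow(NamedTuple):
    """An I/O window that Linux has attributed to one host bridge (hypothetical model)."""
    root: str
    start: int
    end: int  # inclusive, matching the [io 0x....-0x....] notation in dmesg


def check_bridge_window(
    host_windows: List[HostWindow], root: str, start: int, end: int
) -> Tuple[bool, Optional[HostWindow]]:
    """Return (fits, conflict) for a bridge window requested below `root`.

    The window fits if a host window of the same root fully contains it.
    Otherwise, report the first window of another root that it overlaps.
    """
    for w in host_windows:
        if w.root == root and w.start <= start and end <= w.end:
            return True, None
    for w in host_windows:
        if w.root != root and w.start <= end and start <= w.end:
            return False, w
    return False, None


# Windows as Linux sees them in the 3.1.4 logs: both ranges belong to PCI0.
host_windows = [
    HostWindow("PCI0", 0x0D00, 0xEFFF),
    HostWindow("PCI0", 0xF000, 0xFFFF),
]

# Bridge 00:07.0 below PCI0 (working slot) and bridge 80:07.0 below BR50 (failing slot).
working = check_bridge_window(host_windows, "PCI0", 0xD000, 0xDFFF)
failing = check_bridge_window(host_windows, "BR50", 0xF000, 0xFFFF)
```

In this model the working slot fits inside a PCI0 window, while the BR50 bridge window overlaps a range already attributed to PCI0, so the device BAR at [io  0xfc00-0xfcff] has no usable parent window.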

The attached AIDA64 dump shows that Windows Server 2008 accepts the BR50 configuration (device at 85:00.0, with ports 0xfc00-0xfcff), and it works.

I think the PCI0 [io  0xf000-0xffff] _CRS descriptor is likely a BIOS bug.  No devices or bridges below PCI0 use that range.

We could consider relaxing the address space collision check, at least at the host bridge level.

Booting with "pci=nocrs" is a workaround.
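
To confirm that the workaround is active on a running system, the kernel command line can be inspected. The following is a minimal sketch; `pci_options` is a hypothetical helper and the example command line is made up for illustration. It relies on "pci=" accepting a comma-separated list of options.

```python
def pci_options(cmdline: str) -> set:
    """Collect all options passed via pci=... on a kernel command line."""
    opts = set()
    for token in cmdline.split():
        if token.startswith("pci="):
            # "pci=" takes a comma-separated list, e.g. pci=nocrs,noaer
            opts.update(o for o in token[len("pci="):].split(",") if o)
    return opts


# Hypothetical command line; on a real system read it from /proc/cmdline.
example = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro pci=nocrs quiet"
nocrs_active = "nocrs" in pci_options(example)
```

On a live system the same check would be applied to the contents of /proc/cmdline.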
Comment 5 Martin Burnicki 2012-02-01 11:44:43 UTC
Is there anything else I could or should do?
Comment 6 Alan 2012-08-30 13:52:23 UTC
Was this ever resolved, or was it left with the workaround?
Comment 7 Martin Burnicki 2012-08-30 14:09:26 UTC
I don't know, since there was no further reply to my latest comment. :-(
Comment 8 Bjorn Helgaas 2014-06-03 21:47:44 UTC
Can you check whether there are any BIOS updates available?  I know that's not an ideal solution, and theoretically if Windows works, Linux should work too.  But I'm not sure how we could work around this in a generic way.
Comment 9 Martin Burnicki 2014-06-04 20:57:47 UTC
Bjorn, just a quick note right now:
I'm currently on vacation, so I'm unable to check this. I'll come back to it when I'm back at the office after June 16.
Comment 10 Bjorn Helgaas 2016-10-28 21:04:06 UTC
This looks like a BIOS bug on Supermicro X8DTH-i/6/iF/6F/X8DTH, BIOS 2.0a    09/29/2010.

If this is still an issue, we might consider some kind of DMI-based quirk to remove [io  0xf000-0xffff] from the PCI0 _CRS.
Comment 11 Martin Burnicki 2016-10-31 09:14:03 UTC
Bjorn,

the workaround you mentioned in comment 4, booting with "pci=nocrs", really helped to get the PCI cards working again.

I'm not sure a workaround for such a specific BIOS bug should be added to the Linux kernel in general, as long as the kernel still supports "pci=nocrs" to fix this locally where required.

So I think this issue can be closed.

Thanks for your help in sorting this out.

Martin
Comment 12 Bjorn Helgaas 2016-12-28 21:14:14 UTC
Created attachment 248861 [details]
quirk to ignore _CRS on Supermicro X8DTH-i/6/iF/6F

Hi Martin, here's a patch that basically turns on "pci=nocrs" automatically on this system.  If you have time to test it, that'd be great.
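
The idea of a DMI-based quirk that behaves like "pci=nocrs" can be sketched as follows. This is an illustrative model only, not the attached patch: `CRS_QUIRKS`, `dmi_matches`, and `use_crs` are hypothetical names, and the match strings are assumptions taken from the board and BIOS identification quoted in this report. The substring comparison mirrors how the kernel's DMI_MATCH compares strings.

```python
# Hypothetical quirk table; field names follow the files under /sys/class/dmi/id/.
# The strings are assumptions based on the system identification in this report.
CRS_QUIRKS = [
    {
        "ident": "Supermicro X8DTH",
        "matches": {
            "sys_vendor": "Supermicro",
            "product_name": "X8DTH-i/6/iF/6F",
            "bios_version": "2.0a",
        },
    },
]


def dmi_matches(dmi: dict, matches: dict) -> bool:
    """True if every quirk string occurs as a substring of the system's DMI field."""
    return all(needle in dmi.get(field, "") for field, needle in matches.items())


def use_crs(dmi: dict, nocrs_on_cmdline: bool = False) -> bool:
    """Decide whether host bridge windows from _CRS should be honored."""
    if nocrs_on_cmdline:
        return False
    # A matching quirk entry has the same effect as booting with pci=nocrs.
    return not any(dmi_matches(dmi, q["matches"]) for q in CRS_QUIRKS)


affected = {
    "sys_vendor": "Supermicro",
    "product_name": "X8DTH-i/6/iF/6F",
    "bios_version": "2.0a",
}
other = {"sys_vendor": "Example Corp", "product_name": "Board", "bios_version": "1.0"}
```

With this table, `use_crs(affected)` is false and `use_crs(other)` is true. If the real DMI strings differ from the assumed ones, the quirk would not match and "pci=nocrs" would still be needed on the command line.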
Comment 13 Bjorn Helgaas 2017-01-25 22:31:02 UTC
The comment #12 patch is in Linus' tree [1] and appeared in v4.10-rc5, so I'm going to close this as resolved.  Please reopen it if you still need to boot with "pci=nocrs" -- it's possible that I got the DMI strings wrong.  If that's the case, please attach the output of "sudo dmidecode" so I can fix the quirk strings.

[1] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=89e9f7bcd874