Bug 51331 - pci address space collision on Stratus ftServer 4500 (usb and network are not recognized)
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI
Hardware: All Linux
Importance: P1 high
Assignee: drivers_pci@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-12-05 17:23 UTC by Fadeeva Marina
Modified: 2013-01-04 09:22 UTC

See Also:
Kernel Version: 3.6.6
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
log from non-working 3.6.6 kernel with no network/usb detected (67.82 KB, text/plain)
2012-12-05 17:23 UTC, Fadeeva Marina
kernel boot log from working 2.6.32 kernel (80.29 KB, text/plain)
2012-12-05 17:24 UTC, Fadeeva Marina
dmidecode from ftserver 4500 (20.83 KB, text/plain)
2012-12-10 20:01 UTC, Fadeeva Marina
lspci from ftserver 4500 (8.13 KB, text/plain)
2012-12-18 11:04 UTC, Fadeeva Marina

Description Fadeeva Marina 2012-12-05 17:23:46 UTC
Created attachment 88521
log from non-working 3.6.6 kernel with no network/usb detected

Hello,

I hope this is the right place to ask.

I've found that with 3.X kernels (vanilla and Fedora 17), network cards and USB devices are not recognized on a Stratus ftServer 4500 system.

The kernel reports many 'address space collision' messages, which seem to be the reason the devices are not seen.

The problem appears with all 3.X kernels; the 2.6.32 kernel boots and works fine on this hardware (I do not know about other kernel versions).

Attached are the 3.6.6 kernel boot log, where no network (igb) or USB devices are recognized, and the boot log from the working 2.6.32 kernel on the same system.

Could you please advise how to solve this issue? I'm ready to provide any information required.

Thank you in advance.
Comment 1 Fadeeva Marina 2012-12-05 17:24:19 UTC
Created attachment 88531
kernel boot log from working 2.6.32 kernel
Comment 2 Bjorn Helgaas 2012-12-06 18:44:30 UTC
> https://bugzilla.kernel.org/show_bug.cgi?id=51331

> I've found that with 3.X (vanilla and Fedora 17) kernels network cards and
> usb devices are not recognized on Stratus ftserver 4500 system.
>
> Kernel reports many 'address space collision' that seems to be the reason why
> the devices are not seen.
>
> The problem appears with all 3.X kernels, 2.6.32 kernel boots and works fine
> on this hardware (do not know about other kernel versions).

Your 2.6.32 kernel detects the 03:01.0 bridge that leads to the igb device:

      pci 0000:03:01.0: PCI bridge to [bus 0e-11]
      pci 0000:0e:1c.0: PCI bridge to [bus 0f]
      pci 0000:0f:00.0: reg 10 32bit mmio: [0x8a140000-0x8a15ffff]
      igb 0000:0f:00.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19

but the 3.6.6 kernel doesn't find 03:01.0 or the devices behind it.

This looks like the problem fixed by this commit:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=284f5f9dbac170b054c1e386ef92cbf654e91bba

That commit appeared in v3.5, so it should be in your 3.6.6 kernel.
It adds a quirk that scans all devices when the DMI SYS_VENDOR matches
"ftServer".  That quirk should be turned on automatically on your box,
but for some reason, your 3.6.6 kernel didn't find the correct DMI
information.

The 2.6.32 kernel found this:

    DMI: Stratus ftServer 4500/G7KRV, BIOS BIOS Version 5.0:17 10/19/2010

which would match the quirk.  But the 3.6.6 kernel didn't print this
DMI information.  Was that boot on the same BIOS version?

The 3.6.6 log looks like a console capture, and it omits KERN_DEBUG
messages.  Can you boot 3.6.6 with the "early_ioremap_debug
pci=pcie_scan_all" flags and attach a dmesg dump (which will include
KERN_DEBUG messages)?
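
On a running system, the DMI strings can also be read back from sysfs and compared with what the quirk expects. The following is only a minimal user-space sketch: the helper name read_dmi_field() is hypothetical, and it assumes the kernel exposes the usual /sys/class/dmi/id/ directory.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read the first line of a sysfs-style text file
 * into buf and strip the trailing newline.
 * Returns 0 on success, -1 if the file cannot be opened or read.
 */
static int read_dmi_field(const char *path, char *buf, size_t len)
{
    FILE *f;

    if (len == 0)
        return -1;

    f = fopen(path, "r");
    if (!f)
        return -1;

    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);

    /* sysfs attributes end with '\n'; cut the string there. */
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

Calling it on /sys/class/dmi/id/sys_vendor and /sys/class/dmi/id/product_name shows the two strings that a DMI_SYS_VENDOR or DMI_PRODUCT_NAME match would be compared against.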
Comment 3 Fadeeva Marina 2012-12-10 19:18:47 UTC
Yes, I've already found this myself.
The pci=pcie_scan_all parameter allows the 3.6.6 kernel to boot.
I had to add the quirk below to the kernel to avoid passing the parameter:

diff -up linux/arch/x86/pci/common.c.stratus linux/arch/x86/pci/common.c
--- linux/arch/x86/pci/common.c.stratus 2012-12-10 18:50:44.235982802 +0300
+++ linux/arch/x86/pci/common.c 2012-12-10 18:56:23.996856200 +0300
@@ -436,6 +436,14 @@ static const struct dmi_system_id __devi
                        DMI_MATCH(DMI_SYS_VENDOR, "ftServer"),
                },
        },
+       {
+               .callback = set_scan_all,
+               .ident = "Stratus ftServer 4500",
+               .matches = {
+                       DMI_MATCH(DMI_SYS_VENDOR, "Stratus"),
+                       DMI_MATCH(DMI_PRODUCT_NAME, "ftServer 4500"),
+               },
+       },
        {}
 };
Comment 4 Fadeeva Marina 2012-12-10 20:01:46 UTC
Created attachment 88861
dmidecode from ftserver 4500
Comment 5 Myron Stowe 2012-12-18 01:44:14 UTC
Hi Fadeeva:

Bjorn and I have been looking into this and have a couple of options that we are considering that should resolve the issue for you.

Would it be possible for you to attach a couple of logs from the offending machine?  If so I would like to see the output of 'lspci -t' and 'lspci -vv'.

Thanks,
 Myron
Comment 6 Fadeeva Marina 2012-12-18 11:03:56 UTC
Hi Myron,

Sorry, I do not have access to the Stratus servers anymore.
I only have the 'lspci -m -vvv' output from an ftServer 4500. The output was taken while running RHEL 6 with the 2.6.32 kernel.

By the way, I noticed that the ftServer 6310 also requires the pci=pcie_scan_all parameter to boot a 3.6 kernel, so I've added the quirk below to pci/common.c to avoid passing the parameter to the kernel.

+       {
+               .callback = set_scan_all,
+               .ident = "Stratus ftServer 6310",
+               .matches = {
+                       DMI_MATCH(DMI_SYS_VENDOR, "Stratus"),
+                       DMI_MATCH(DMI_PRODUCT_NAME, "ftServer 6310"),
+               },
+       },

With the above change it boots OK.
Comment 7 Fadeeva Marina 2012-12-18 11:04:44 UTC
Created attachment 89411
lspci from ftserver 4500
Comment 8 Myron Stowe 2012-12-21 00:16:52 UTC
Thanks Fadeeva.  I submitted a patch to linux-pci yesterday - https://lkml.org/lkml/2012/12/19/358 - that should solve the issue upstream.

We toyed with the idea of completely reverting commit 284f5f9 along with the only_one_child() parts of commit f07852d.  That approach worked, but in the end we decided to just augment the existing Stratus quirk, much as you did.

I augmented the existing quirk rather than adding new quirks targeted at specific model numbers because DMI matching uses strstr() (in dmi_matches()), so "ftServer" should end up matching all Stratus 'ftServer' platforms.

Thanks for the help in tracking this down,
 Myron
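
The substring matching described above can be illustrated with a small user-space sketch. This is not the kernel code: struct dmi_info, field_contains() and quirk_matches() are hypothetical names, and the field values are assumptions taken from this report ("Stratus" as system vendor, "ftServer 4500" and "ftServer 6310" as product names, as used in the quirks from comments 3 and 6). It shows why a vendor match against "ftServer" cannot succeed on these machines, while a vendor match on "Stratus" combined with a product match on "ftServer" covers both models.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the DMI strings reported by the firmware. */
struct dmi_info {
    const char *sys_vendor;
    const char *product_name;
};

/* Substring test, analogous to the strstr() check mentioned above. */
static bool field_contains(const char *field, const char *substr)
{
    assert(substr != NULL);   /* a NULL pattern is a programming error */
    return field != NULL && strstr(field, substr) != NULL;
}

/*
 * A quirk entry matches only if every pattern it lists is found in the
 * corresponding field. A NULL pattern means "do not check this field".
 */
static bool quirk_matches(const struct dmi_info *dmi,
                          const char *vendor_substr,
                          const char *product_substr)
{
    if (vendor_substr && !field_contains(dmi->sys_vendor, vendor_substr))
        return false;
    if (product_substr && !field_contains(dmi->product_name, product_substr))
        return false;
    return true;
}

/* Values assumed from this report, not read from real firmware. */
static const struct dmi_info ftserver_4500 = { "Stratus", "ftServer 4500" };
static const struct dmi_info ftserver_6310 = { "Stratus", "ftServer 6310" };
```

With these assumed values, quirk_matches(&ftserver_4500, "ftServer", NULL) is false because the vendor string is just "Stratus", whereas quirk_matches(&ftserver_4500, "Stratus", "ftServer") and the same call for ftserver_6310 are both true, so a single entry is enough for both models.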
Comment 9 Fadeeva Marina 2012-12-21 07:20:20 UTC
Hi Myron,

Great! Thank you!
Comment 10 Florian Mickler 2013-01-04 09:22:42 UTC
A patch referencing this bug report has been merged in Linux v3.8-rc2:

commit 1278998f8ff6d66044ed00b581bbf14aacaba215
Author: Myron Stowe <myron.stowe@redhat.com>
Date:   Wed Dec 26 10:39:23 2012 -0700

    PCI: Work around Stratus ftServer broken PCIe hierarchy (fix DMI check)
