Bug 10583

Summary: x86 PCI dmi_check_system() quirk checks not being run
Product: Platform Specific/Hardware Reporter: Matt Domsch (Matt_Domsch)
Component: x86-64Assignee: Matt Domsch (Matt_Domsch)
Status: CLOSED CODE_FIX    
Severity: normal CC: bunk, jgarzik, linux-bugs, muli
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.25 Subsystem:
Regression: Yes Bisected commit-id:
Attachments: 0001-fix-x86-DMI-checks-for-PCI-quirks.patch

Description Matt Domsch 2008-05-01 07:01:10 UTC
Latest working kernel version: unknown
Earliest failing kernel version: unknown
Distribution: Fedora 8
Hardware Environment: Dell PowerEdge 1955
Software Environment: Fedora 8
Problem Description:

DMI matching to auto-enable pci=bfsort is not working.  pci=bfsort is not being
enabled automatically, leading to onboard NICs being assigned names "backwards"
from expectations. dmi_check_system(pciprobe_dmi_table) is not getting run anymore on x86.

pci_acpi_init() ran and set pcibios_scanned++.
pci_legacy_init() then ran, saw pcibios_scanned > 0, and returned.
If it hadn't returned, it would have called pcibios_scan_root(), which calls
dmi_check_system(pciprobe_dmi_table), which is how all the DMI-based quirks are discovered and used.

The solution will involve moving the dmi_check_system(pciprobe_dmi_table) call out of pci_legacy_init() and into somewhere that is executed regardless of pcibios_scanned having been run.  Perhaps inside of or ahead of pci_acpi_init().


I'm not sure when this broke.
Fedora 8 gold kernel 2.6.23.1-42.fc8 and and all intermediate kernels up until Fedora rawhide kernel 2.6.25-8.fc9.x86_64 also fail.



Steps to Reproduce:
1. install Fedora 8 with above kernel on a system listed in pciprobe_dmi_table[] such as a Dell PowerEdge 1955.
2. boot.  dmesg | grep bfsort  should yield something
  
Actual results:
no results, ethernet NICs named backwards

Expected results:
dmesg shows bfsort getting enabled, ethernet NICs named properly.

Filed in Red Hat Bugzilla: 
https://bugzilla.redhat.com/show_bug.cgi?id=444791

Comment #3 From Dave Jones (davej@redhat.com) 	on 2008-04-30 14:01 EST 	[reply] 	 

if you boot with initcall_debug, does pci_legacy_init get called ?

I wonder if we changed the default PCI probing type so that we're not preferring
PCI BIOS on that system any more.


Comment #4 From Matt Domsch (matt_domsch@dell.com) 	on 2008-04-30 22:39 EST 	[reply] 	 

pci_legacy_init() gets called, but exits immediately.

Calling initcall 0xffffffff81487af9: pci_acpi_init+0x0/0xae()
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq".  If it helps, post a report
initcall 0xffffffff81487af9: pci_acpi_init+0x0/0xae() returned 0.
initcall 0xffffffff81487af9 ran for 0 msecs: pci_acpi_init+0x0/0xae()
Calling initcall 0xffffffff81487ba7: pci_legacy_init+0x0/0xf8()
initcall 0xffffffff81487ba7: pci_legacy_init+0x0/0xf8() returned 0.
initcall 0xffffffff81487ba7 ran for 0 msecs: pci_legacy_init+0x0/0xf8()

pci_acpi_init() ran and set pcibios_scanned++.
pci_legacy_init() then ran, saw pcibios_scanned > 0, and returned.
If it hadn't returned, it would have called pcibios_scan_root(), which calls
dmi_check_system(), which is how all the DMI-based quirks are discovered and used.

So we'll need to get dmi_check_system() executed regardless of if it's
pci_acpi_init() or pci_legacy_init() that's really used.

This may have broken before the i386/x86-64 merger, I can't tell very well.
Comment 1 Matt Domsch 2008-05-01 10:49:20 UTC
2.6.22 worked fine; 2.6.23.1 as in Fedora 8 failed.  I'm bisecting now...
Comment 2 Matt Domsch 2008-05-01 12:44:46 UTC
git commit 08f1c192c3c32797068bfe97738babb3295bbf42 I believe is the culprit.  Note how it removes the call to pcibios_scan_root().

commit 08f1c192c3c32797068bfe97738babb3295bbf42
Author: Muli Ben-Yehuda <muli@il.ibm.com>
Date:   Sun Jul 22 00:23:39 2007 +0300

    x86-64: introduce struct pci_sysdata to facilitate sharing of ->sysdata

    This patch introduces struct pci_sysdata to x86 and x86-64, and
    converts the existing two users (NUMA, Calgary) to use it.

    This lays the groundwork for having other users of sysdata, such as
    the PCI domains work.

    The Calgary bits are tested, the NUMA bits just look ok.

    Signed-off-by: Jeff Garzik <jeff@garzik.org>
    Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Comment 3 Matt Domsch 2008-05-01 14:36:46 UTC
I'm working on a patch to pull the DMI testing out of arch/x86/common.c, putting it in arch/x86/dmi.c, and calling it as a subsys_init() before the ACPI or legacy subsys_init() routines run.  That will set the flags appropriately early enough for all the current users of this feature.
Comment 4 Matt Domsch 2008-05-01 20:00:10 UTC
Created attachment 16005 [details]
0001-fix-x86-DMI-checks-for-PCI-quirks.patch

patch submitted to LKML.  It also applies cleanly to 2.6.25.1, so should be a candidate for -stable if accepted to mainline/mm.
Comment 5 Matt Domsch 2008-05-02 07:11:21 UTC
There's a cleaner solution in x86.git already, pending push to Linus.

commit 9817aa147000086bc11b571620ecc1c73a4a614b
Author: Yinghai Lu <yhlu.kernel@gmail.com>
Date:   Mon Apr 14 15:40:37 2008 -0700

    x86 PCI: call dmi_check_pciprobe()

    this change:

    | commit 08f1c192c3c32797068bfe97738babb3295bbf42
    | Author: Muli Ben-Yehuda <muli@il.ibm.com>
    | Date:   Sun Jul 22 00:23:39 2007 +0300
    |
    |    x86-64: introduce struct pci_sysdata to facilitate sharing of ->sysdata
    |
    |    This patch introduces struct pci_sysdata to x86 and x86-64, and
    |    converts the existing two users (NUMA, Calgary) to use it.
    |
    |    This lays the groundwork for having other users of sysdata, such as
    |    the PCI domains work.
    |
    |    The Calgary bits are tested, the NUMA bits just look ok.

    replaces pcibios_scan_root with pci_scan_bus_parented...

    but in pcibios_scan_root we have a DMI check:

        dmi_check_system(pciprobe_dmi_table);

    when when have several peer root buses this could be called multiple
    times (which is bad), so move that call to pci_access_init().

    Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Comment 6 Matt Domsch 2008-05-06 10:32:57 UTC
the above commit has been pushed to 2.6.26-rc1, and I've submitted the equivalent backported fix to stable@.
Comment 7 Matt Domsch 2008-05-07 14:01:08 UTC
in mainline and queued for 2.6.25.2.  Closing.