Bug 10396
Summary: | BUG: soft lockup - CPU#0 stuck for 61s! [modprobe:2096] | ||
---|---|---|---|
Product: | Drivers | Reporter: | TJ (linux) |
Component: | PCI | Assignee: | TJ (linux) |
Status: | RESOLVED DUPLICATE | ||
Severity: | high | CC: | acpi-bugzilla, bunk |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | v2.6.25-rc8 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 56331 | ||
Attachments:
v2.6.20 successful boot log
v.2.6.25-rc8 failed boot log
v.2.6.25-rc8 pci=noacpi (successful) boot log
PowerEdge 6300 acpidump -b (binary)
PowerEdge 6300 lspci -vxxx
PowerEdge 6300 pirq_mps
PowerEdge 6300 acpidump (hex)
v.2.6.25-rc8 acpi-pci-bus debug failed boot log
v2.6.25-rc8 acpi-pci-bus debug "PINPOINTED" failed boot log
v2.6.25-rc8 acpi-pci-bus debug "FIXED" *successful* boot log
Proposed patch to i450NX fix-up
v2.6.25-rc8 acpi-pci-bus *no-debug* "FIXED" successful boot log
Proposed patch to i450NX fix-up
Revised less-wordy patch as submitted to linux-pci mailing list
Much better ACPI patch from Matthew Wilcox
Description
TJ
2008-04-05 08:53:09 UTC
Created attachment 15617 [details]
v2.6.20 successful boot log
Created attachment 15618 [details]
v.2.6.25-rc8 failed boot log
I found a mailing list report of a similar problem where ACPI was the cause:
"kernel-2.6.23.1-30.fc8 problem with symbios scsi controllers"
http://www.redhat.com/archives/rhl-devel-list/2007-October/msg02035.html
That prompted me to try the boot option "pci=noacpi", and the kernel started. Numerous other issues appear as a result, so it doesn't solve the regression since v2.6.20, but it does narrow down the hunt for the culprit.
In part the v2.6.25-rc8 boot log reports:

[ 16.207231] Adaptec aacraid driver 1.1-5[2455]-ms
[ 47.074484] AAC0: kernel 2.8-1[6099]
[ 47.074498] AAC0: monitor 2.8-1[6099]
[ 47.078494] AAC0: bios 2.8-1[6099]
[ 47.082496] AAC0: serial 8A0376
[ 47.086509] scsi2 : percraid
[ 47.091184] scsi 2:0:0:0: Direct-Access DELL Array1 V1.0 PQ: 0 ANSI: 2
[ 46.643256] e100: eth0: e100_probe: addr 0xfe5ff000, irq 11, MAC addr 00:02:b3:bb:d7:8c
[ 47.095591] scsi 2:0:1:0: Direct-Access DELL Archive V1.0 PQ: 0 ANSI: 2
[ 46.670604] e100: eth1: e100_probe: addr 0xfe5fe000, irq 14, MAC addr 00:02:b3:bb:d7:8d
[ 47.113160] scsi 2:1:0:0: Direct-Access IBM DMVS09M 0220 PQ: 0 ANSI: 3
[ 47.117449] scsi 2:1:1:0: Direct-Access IBM DMVS09M 0220 PQ: 0 ANSI: 3
[ 47.122638] scsi 2:1:2:0: Direct-Access IBM DMVS09M 0220 PQ: 0 ANSI: 3
[ 47.127812] scsi 2:1:3:0: Direct-Access IBM DMVS09M 0220 PQ: 0 ANSI: 3
[ 47.133005] scsi 2:1:4:0: Direct-Access IBM DMVS09M 0220 PQ: 0 ANSI: 3
[ 47.145724] scsi 2:1:5:0: Direct-Access HITACHI DK32DJ-18MC D4D4 PQ: 0 ANSI: 3
[ 46.749850] ata_piix 0000:00:10.0: no available native port
[ 47.181601] scsi 2:1:6:0: Processor DELL 1x6 U2W SCSI BP 5.39 PQ: 0 ANSI: 2
[ 46.693844] PCI: Enabling device 0000:00:02.2 (0000 -> 0001)
[ 46.693882] PCI: No IRQ known for interrupt pin D of device 0000:00:02.2. Probably buggy MP table.

Created attachment 15619 [details]
v.2.6.25-rc8 pci=noacpi (successful) boot log
I'm attempting to re-assign this bug to ACPI > Config-Interrupts

v2.6.25-rc8 with "pci=noacpi"
Output of Perl script dump_pirq

$ wget http://linux.dell.com/files/tools/dump_pirq
$ chmod 755 dump_pirq
$ sudo ./dump_pirq
Interrupt routing table found at address 0xfc7a0:
  Version 1.0, size 0x00d0
  Interrupt router is device 00:02.0
  PCI exclusive interrupt mask: 0x0000 []
  Compatible router: vendor 0x8086 device 0x122e
Device 00:02.0 (slot 0):
  INTA: link 0x64, irq mask 0x4000 [14]
  INTB: link 0x65, irq mask 0x8000 [15]
  INTD: link 0x63, irq mask 0x5cf8 [3,4,5,6,7,10,11,12,14]
Device 02:04.0 (slot 0):
  INTA: link 0x65, irq mask 0x5cf8 [3,4,5,6,7,10,11,12,14]
Device 02:06.0 (slot 0):
  INTA: link 0x66, irq mask 0x5cf8 [3,4,5,6,7,10,11,12,14]
Device 02:08.0 (slot 0):
  INTA: link 0x64, irq mask 0x5cf8 [3,4,5,6,7,10,11,12,14]
Device 00:06.0 (slot 1):
  INTA: link 0x69, irq mask 0x5cf8 [3,4,5,6,7,10,11,12,14]
  INTB: link 0x64, irq mask 0x5cf8 [3,4,5,6,7,10,11,12,14]
  INTC: link 0x67, irq mask 0x5cf8 [3,4,5,6,7,10,11,12,14]
  INTD: link 0x68, irq mask 0x5cf8 [3,4,5,6,7,10,11,12,14]
Device 00:08.0 (slot 2):
  INTA: link 0x68, irq mask 0x5cf8 [3,4,5,6,7,10,11,12,14]
  INTB: link 0x69, irq mask 0x5cf8 [3,4,5,6,7,10,11,12,14]
  INTC: link 0x64, irq mask 0x5cf8 [3,4,5,6,7,10,11,12,14]
  INTD: link 0x67, irq mask 0x5cf8 [3,4,5,6,7,10,11,12,14]
Device 00:0a.0 (slot 3):
  INTA: link 0x67, irq mask 0x5cf8 [3,4,5,6,7,10,11,12,14]
  INTB: link 0x68, irq mask 0x5cf8 [3,4,5,6,7,10,11,12,14]
  INTC: link 0x69, irq mask 0x5cf8 [3,4,5,6,7,10,11,12,14]
  INTD: link 0x64, irq mask 0x5cf8 [3,4,5,6,7,10,11,12,14]
Device 03:01.0 (slot 4):
  INTA: link 0x63, irq mask 0x5cf8 [3,4,5,6,7,10,11,12,14]
  INTB: link 0x60, irq mask 0x5cf8 [3,4,5,6,7,10,11,12,14]
  INTC: link 0x61, irq mask 0x5cf8 [3,4,5,6,7,10,11,12,14]
  INTD: link 0x62, irq mask 0x5cf8 [3,4,5,6,7,10,11,12,14]
Device 03:03.0 (slot 5):
  INTA: link 0x62, irq mask 0x5cf8 [3,4,5,6,7,10,11,12,14]
  INTB: link 0x63, irq mask 0x5cf8 [3,4,5,6,7,10,11,12,14]
  INTC: link 0x60, irq mask 0x5cf8 [3,4,5,6,7,10,11,12,14]
  INTD: link 0x61, irq mask 0x5cf8 [3,4,5,6,7,10,11,12,14]
Device 03:05.0 (slot 6):
  INTA: link 0x61, irq mask 0x5cf8 [3,4,5,6,7,10,11,12,14]
  INTB: link 0x62, irq mask 0x5cf8 [3,4,5,6,7,10,11,12,14]
  INTC: link 0x63, irq mask 0x5cf8 [3,4,5,6,7,10,11,12,14]
  INTD: link 0x60, irq mask 0x5cf8 [3,4,5,6,7,10,11,12,14]
Device 03:07.0 (slot 7):
  INTA: link 0x60, irq mask 0x5cf8 [3,4,5,6,7,10,11,12,14]
  INTB: link 0x61, irq mask 0x5cf8 [3,4,5,6,7,10,11,12,14]
  INTC: link 0x62, irq mask 0x5cf8 [3,4,5,6,7,10,11,12,14]
  INTD: link 0x63, irq mask 0x5cf8 [3,4,5,6,7,10,11,12,14]
Interrupt router at 00:02.0: Intel 82371AB PIIX4/PIIX4E PCI-to-ISA bridge
  PIRQ1 (link 0x60): unrouted
  PIRQ2 (link 0x61): unrouted
  PIRQ3 (link 0x62): irq 5
  PIRQ4 (link 0x63): unrouted
  Serial IRQ: [disabled] [quiet] [frame=21] [pulse=4]
  Level mask: 0x4e20 [5,9,10,11,14]

$ cat /proc/interrupts
      CPU0  CPU1  CPU2  CPU3
  0:   178     0     0     0  IO-APIC-edge     timer
  1:    33     0     2     1  IO-APIC-edge     i8042
  2:     0     0     0     0  XT-PIC-XT        cascade
  4:   235     0     1     0  IO-APIC-edge     serial
  6:     2     1     0     0  IO-APIC-edge     floppy
  7:     0     0     0     0  IO-APIC-edge     parport0
  8:     3     0     0     0  IO-APIC-edge     rtc
  9:     0     0     0     0  IO-APIC-edge     acpi
 11:  1938     1     0     0  IO-APIC-fasteoi  eth0
 12:   104     0     0     1  IO-APIC-edge     i8042
 14:   105     0     1     0  IO-APIC-fasteoi  ata_piix, aic7xxx
 15:     0     0     0     0  IO-APIC-edge     ata_piix
 18:  2390     1     0     0  IO-APIC-fasteoi  aacraid
 20:   228     1     0     0  IO-APIC-fasteoi  aic7xxx
 21:    14     1     0     0  IO-APIC-fasteoi  aic7xxx
 22:    14     0     1     0  IO-APIC-fasteoi  aic7xxx
NMI:     0     0     0     0  Non-maskable interrupts
LOC: 19217  8829 11373  6098  Local timer interrupts
RES:  3614  4379  3396  2418  Rescheduling interrupts
CAL:    88   203   209   220  function call interrupts
TLB:  1076  1121  1100   599  TLB shootdowns
TRM:     0     0     0     0  Thermal event interrupts
SPU:     0     0     0     0  Spurious interrupts
ERR:     0
MIS:     0

Created attachment 15620 [details]
PowerEdge 6300 acpidump -b (binary)
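As a reading aid for the dump_pirq output earlier in this report: each "irq mask" is a 16-bit bitmap in which bit n set means IRQ n is permitted for that link, which is why 0x5cf8 is listed as [3,4,5,6,7,10,11,12,14]. A minimal sketch of that decoding follows; the function name irq_mask_to_list is made up for illustration and is not part of dump_pirq or the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: decode a 16-bit PIRQ irq mask.
 * Bit n set means IRQ n is allowed. The IRQ numbers are written to
 * out (which must hold at least 16 ints) in ascending order, and the
 * number of entries written is returned.
 */
static size_t irq_mask_to_list(uint16_t mask, int out[16])
{
    size_t n = 0;

    assert(out != NULL);
    for (int irq = 0; irq < 16; irq++) {
        if (mask & (1u << irq))
            out[n++] = irq;
    }
    return n;
}
```

Feeding it the values from the table (0x4000, 0x8000, 0x5cf8) and the "Level mask" value 0x4e20 should give back the bracketed lists that dump_pirq prints next to each mask.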
Created attachment 15621 [details]
PowerEdge 6300 lspci -vxxx
Created attachment 15622 [details]
PowerEdge 6300 pirq_mps
Generated using pirq_mps from bug #8714 and downloaded from http://bugzilla.kernel.org/attachment.cgi?id=14134
Please re-attach the acpidump without any parameters when using acpidump. :)
Created attachment 15653 [details]
PowerEdge 6300 acpidump (hex)
Since the issue is a mismatch between the busses ACPI knows about, and those that PCI does, I'm currently building a debug kernel with printk()s of calls/returns for the ACPI/PCI bus discovery path:
drivers/acpi/pci_root.c::acpi_pci_root_add()
arch/x86/pci/acpi.c::pci_acpi_scan_root()
drivers/pci/probe.c::pci_scan_bus_parented()
drivers/pci/probe.c::pci_create_bus()
drivers/pci/search.c::pci_find_bus()
drivers/pci/search.c::pci_do_find_bus()
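For anyone wanting to try the same kind of instrumentation, here is a rough user-space sketch of entry/exit tracing in the "PCI: file::function(args)" and "...()=result" style. It is not the debug patch used for the attached logs: trace_printf() is a stand-in for printk() that appends to a buffer, and find_bus_model() and SRC_PATH are hypothetical names that only imitate the message shape.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumed label for the traced source file (illustration only). */
#define SRC_PATH "drivers/pci/search.c"

/* Trace output accumulates here so it can be inspected afterwards. */
static char trace_log[1024];

/* Stand-in for printk(): append one formatted message to trace_log. */
static void trace_printf(const char *fmt, ...)
{
    size_t used = strlen(trace_log);
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(trace_log + used, sizeof(trace_log) - used, fmt, ap);
    va_end(ap);
}

/* Toy lookup that always misses, traced on entry and on exit. */
static void *find_bus_model(int domain, int busnr)
{
    void *ret = NULL; /* nothing is registered in this model */

    trace_printf("PCI: %s::%s(%d, %d)\n", SRC_PATH, __func__, domain, busnr);
    trace_printf("PCI: %s::%s()=%s\n", SRC_PATH, __func__,
                 ret ? "non-NULL" : "NULL");
    return ret;
}
```

Logging the return value on every exit path is the part that matters: a function with several error exits otherwise leaves the reader guessing which one was taken.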
Please do not get distracted by this comment; the driver has a timeout of 3 minutes on the first interrupt driven command. This timeout is supposed to print a message:

printk(KERN_ERR "aacraid: aac_fib_send: first asynchronous command timed out.\n"
    "Usually a result of a PCI interrupt routing problem;\n"
    "update mother board BIOS or consider utilizing one of\n"
    "the SAFE mode kernel options (acpi, apic etc)\n");

The assumption was that a udelay used in this loop would not result in a CPU stuck condition. Can someone please advise me on the recommended changes in this loop in .../linux/drivers/scsi/aacraid/commsup.c so that we can report this condition accurately without harm? 3 minutes was considered a possible start-up time for the Adapter with a fully populated set of targets, so even without the ACPI problem reported here, we still have to ride through this worst-case Firmware Startup.
Sincerely -- Mark Salyzyn

Created attachment 15677 [details]
v.2.6.25-rc8 acpi-pci-bus debug failed boot log
This log contains the printk() function entry/exit messages mentioned in comment #11. There are a *lot* of messages since the kernel command line was: "debug acpi.debug_level=0xff acpi.debug_layer=0x1f"
Of the three root buses, the first one is always detected correctly. In this log that begins at 19.363070. The second root bus handling begins at 36.936161 and the third root bus at 37.023840.
Both the latter fail with a similar pattern:

[ 36.910354] bus: 'acpi': driver_probe_device: matched device PNP0A03:01 with driver pci_root
[ 36.914329] bus: 'acpi': really_probe: probing driver pci_root with device PNP0A03:01
[ 36.918350] nsutils-0869 [03] ns_get_node : _SEG, AE_NOT_FOUND
[ 36.928169] Execute Method: [\_SB_.PX0B._BBN] (Node f5412d20)
[ 36.936161] ACPI: PCI Root Bridge [PX0B] (0000:02)
[ 36.938332] PCI: arch/x86/pci/acpi.c::pci_acpi_scan_root(device, 0, 2)
[ 36.942336] PCI: drivers/pci/probe.c::pci_scan_bus_parented()
[ 36.946329] PCI: drivers/pci/probe.c::pci_create_bus()
[ 36.950331] PCI: drivers/pci/search.c::pci_find_bus(0, 2)
[ 36.954331] PCI: drivers/pci/search.c::pci_do_find_bus(bus, 2)
[ 36.958330] PCI: drivers/pci/search.c::pci_do_find_bus(bus, 2)
[ 36.962330] PCI: drivers/pci/search.c::pci_do_find_bus()=NULL
[ 36.966330] PCI: drivers/pci/search.c::pci_do_find_bus()=NULL
[ 36.970332] PCI: drivers/pci/search.c::pci_do_find_bus(bus, 2)
[ 36.974332] PCI: drivers/pci/search.c::pci_do_find_bus()=F5462C00
[ 36.978332] PCI: drivers/pci/search.c::pci_find_bus()=F5462C00
[ 36.982332] PCI: drivers/pci/probe.c::pci_create_bus()=NULL
[ 36.986331] PCI: pci_scan_bus_parented()=NULL
[ 36.990332] PCI: arch/x86/pci/acpi.c::pci_acpi_scan_root()=NULL
[ 36.994333] ACPI: Bus 0000:02 not present in PCI namespace

Unfortunately I neglected to report each of the exit paths from drivers/pci/probe.c::pci_create_bus(), so I'm adding those, rebuilding, and will report back when I have a boot log with those included.
Without that, though, I can't help wondering about the mismatch between the ACPI (DSDT) allocated bus numbers (0, 2, 3) and the PNP0A03 numbers (0, 1, 2) - I'm hoping those are simply indexes into a PNP0A03 table with one for each discovered bus. E.g.
[ 36.914329] bus: 'acpi': really_probe: probing driver pci_root with device PNP0A03:01
[ 36.936161] ACPI: PCI Root Bridge [PX0B] (0000:02)

Of the four possible exit paths from drivers/pci/probe.c::pci_create_bus() the first, "err_out", would result in a kernel message "PCI: Bus %04x:%02x already known\n". That doesn't show up, so the exit path must be one of the remaining three.
The 2nd is a device_register() error (dev_reg_err)
The 3rd is a device_register() error (class_dev_reg_err)
The 4th is a device_create_file() error (dev_create_file_err)

Created attachment 15685 [details]
v2.6.25-rc8 acpi-pci-bus debug "PINPOINTED" failed boot log
Well it had to prove me wrong. The cause of the issue is "Bus already known", despite what I speculated in comment #13 (unlucky for some!).
For some reason the call to pr_debug() in drivers/pci/probe.c::pci_create_bus() isn't resulting in a kernel message, so it misled my reasoning as to the error.

if (pci_find_bus(pci_domain_nr(b), bus)) {
    /* If we already got to this bus through a different bridge, ignore it */
    pr_debug("PCI: Bus %04x:%02x already known\n", pci_domain_nr(b), bus);
    printk(KERN_INFO "PCI: drivers/pci/probe.c::pci_create_bus() Bus already known\n");
    goto err_out;
}

[ 37.222374] bus: 'acpi': driver_probe_device: matched device PNP0A03:01 with driver pci_root
[ 37.226348] bus: 'acpi': really_probe: probing driver pci_root with device PNP0A03:01
[ 37.230370] nsutils-0869 [03] ns_get_node : _SEG, AE_NOT_FOUND
[ 37.238371] Execute Method: [\_SB_.PX0B._BBN] (Node f5412d20)
[ 37.247233] ACPI: PCI Root Bridge [PX0B] (0000:02)
[ 37.250352] PCI: arch/x86/pci/acpi.c::pci_acpi_scan_root(device, 0, 2)
[ 37.254361] PCI: drivers/pci/probe.c::pci_scan_bus_parented()
[ 37.258349] PCI: drivers/pci/probe.c::pci_create_bus()
[ 37.262351] PCI: drivers/pci/search.c::pci_find_bus(0, 2)
[ 37.266350] PCI: drivers/pci/search.c::pci_find_next_bus(0)
[ 37.270351] PCI: drivers/pci/search.c::pci_find_next_bus(F5462800)
[ 37.274351] PCI: drivers/pci/search.c::pci_do_find_bus(bus, 2)
[ 37.278351] PCI: drivers/pci/search.c::pci_do_find_bus(bus, 2)
[ 37.282350] PCI: drivers/pci/search.c::pci_do_find_bus()=NULL
[ 37.286350] PCI: drivers/pci/search.c::pci_do_find_bus()=NULL
[ 37.290351] PCI: drivers/pci/search.c::pci_find_next_bus(F5462800)
[ 37.294352] PCI: drivers/pci/search.c::pci_find_next_bus(F5462C00)
[ 37.298352] PCI: drivers/pci/search.c::pci_do_find_bus(bus, 2)
[ 37.302352] PCI: drivers/pci/search.c::pci_do_find_bus()=F5462C00
[ 37.306352] PCI: drivers/pci/search.c::pci_find_bus()=F5462C00
[ 37.310352] PCI: drivers/pci/probe.c::pci_create_bus() Bus already known
[ 37.314353] PCI: drivers/pci/probe.c::pci_create_bus()=NULL
[ 37.318352] PCI: pci_scan_bus_parented()=NULL
[ 37.322354] PCI: arch/x86/pci/acpi.c::pci_acpi_scan_root()=NULL
[ 37.326355] ACPI: Bus 0000:02 not present in PCI namespace

Looking back through the log for references to the buses I found:

[ 4.382109] device: 'PNP0A03:00': device_add
[ 4.384305] PM: Adding info for acpi:PNP0A03:00
[ 4.388291] bus: 'acpi': add device PNP0A03:00
[ 11.666574] device: 'PNP0A03:01': device_add
[ 11.668762] PM: Adding info for acpi:PNP0A03:01
[ 11.672747] bus: 'acpi': add device PNP0A03:01
[ 11.824765] device: 'PNP0A03:02': device_add
[ 11.828773] PM: Adding info for acpi:PNP0A03:02
[ 11.832757] bus: 'acpi': add device PNP0A03:02
[ 19.401234] device: 'pci0000:00': device_add
[ 19.405256] PM: Adding info for No Bus:pci0000:00
[ 19.409237] device: '0000:00': device_add
[ 19.413257] PM: Adding info for No Bus:0000:00
[ 19.425226] PCI: drivers/pci/probe.c::pci_create_bus()=F5462800
[ 19.481239] device: 'pci0000:02': device_add
[ 19.485255] PM: Adding info for No Bus:pci0000:02
[ 19.489241] device: '0000:02': device_add
[ 19.493261] PM: Adding info for No Bus:0000:02
[ 19.497312] PCI: drivers/pci/probe.c::pci_create_bus()=F5462C00
[ 19.593247] device: 'pci0000:03': device_add
[ 19.597262] PM: Adding info for No Bus:pci0000:03
[ 19.601248] device: '0000:03': device_add
[ 19.605267] PM: Adding info for No Bus:0000:03
[ 19.609310] PCI: drivers/pci/probe.c::pci_create_bus()=F550E000

The ACPI PNP0A03 devices are:
PX0A: _UID 1, _BBN 0, _ADR 0x00120000, pci_bus=0xF5462800, device 0000:0
PX0B: _UID 2, _BBN 2?, _ADR 0x00130000, pci_bus=0xF5462C00, device 0000:2
PX1A: _UID 3, _BBN 3?, _ADR 0x00140000, pci_bus=0xF550E000, device 0000:3
The ? above indicates _BBN is calculated in a method and the values I show here are the ones reported by ACPI as bus number.
So buses 0:2 and 0:3 are created when the first root bridge (PX0A) is scanned. That causes the later scan/create in ACPI/PCI to fail with the "Bus already known" reason.

[ 37.306352] PCI: drivers/pci/search.c::pci_find_bus()=F5462C00
[ 37.310352] PCI: drivers/pci/probe.c::pci_create_bus() Bus already known
[ 37.430360] PCI: drivers/pci/search.c::pci_find_bus()=F550E000
[ 37.434360] PCI: drivers/pci/probe.c::pci_create_bus() Bus already known

Here's the call logic when 0000:0 has been created and is being scanned itself:

[ 19.369259] ACPI: PCI Root Bridge [PX0A] (0000:00)
[ 19.373231] PCI: arch/x86/pci/acpi.c::pci_acpi_scan_root(device, 0, 0)
[ 19.377237] PCI: drivers/pci/probe.c::pci_scan_bus_parented()
[ 19.381229] PCI: drivers/pci/probe.c::pci_create_bus()
[ 19.385232] PCI: drivers/pci/search.c::pci_find_bus(0, 0)
[ 19.389230] PCI: drivers/pci/search.c::pci_find_next_bus(0)
[ 19.393230] PCI: drivers/pci/search.c::pci_find_next_bus(0)
[ 19.397230] PCI: drivers/pci/search.c::pci_find_bus()=NULL
[ 19.401234] device: 'pci0000:00': device_add
[ 19.405256] PM: Adding info for No Bus:pci0000:00
[ 19.409237] device: '0000:00': device_add
[ 19.413257] PM: Adding info for No Bus:0000:00
[ 19.425226] PCI: drivers/pci/probe.c::pci_create_bus()=F5462800
[ 19.425495] * Found PM-Timer Bug on the chipset. Due to workarounds for a bug,
[ 19.425503] * this clock source is slow. Consider trying other clock sources
[ 19.429300] pci 0000:00:02.3: quirk: region 0800-083f claimed by PIIX4 ACPI
[ 19.433238] pci 0000:00:02.3: quirk: region 0850-085f claimed by PIIX4 SMB
[ 19.437545] pci 0000:00:10.0: Searching for i450NX host bridges
[ 19.441239] PCI: drivers/pci/probe.c::pci_scan_bus_parented()
[ 19.445233] PCI: drivers/pci/probe.c::pci_create_bus()
[ 19.449235] PCI: drivers/pci/search.c::pci_find_bus(0, 2)
[ 19.453233] PCI: drivers/pci/search.c::pci_find_next_bus(0)
[ 19.457234] PCI: drivers/pci/search.c::pci_find_next_bus(F5462800)
[ 19.461235] PCI: drivers/pci/search.c::pci_do_find_bus(bus, 2)
[ 19.465234] PCI: drivers/pci/search.c::pci_do_find_bus()=NULL
[ 19.469235] PCI: drivers/pci/search.c::pci_find_next_bus(F5462800)
[ 19.473235] PCI: drivers/pci/search.c::pci_find_next_bus(0)
[ 19.477234] PCI: drivers/pci/search.c::pci_find_bus()=NULL
[ 19.481239] device: 'pci0000:02': device_add
[ 19.485255] PM: Adding info for No Bus:pci0000:02
[ 19.489241] device: '0000:02': device_add
[ 19.493261] PM: Adding info for No Bus:0000:02
[ 19.497312] PCI: drivers/pci/probe.c::pci_create_bus()=F5462C00

So it appears the issue is caused by an interaction with the i450NX fix-up code in:
arch/x86/pci/fixup.c::pci_fixup_i450nx()
I've been looking at the git commit logs for this (formerly known as arch/i386/pci/fixup.c) and the ACPI/PCI source files but so far haven't noticed anything that would explain this - certainly not between v2.6.20 and v2.6.22.
I think this is the point where the experts need to get involved.

I found this reference to the issue in AKM's 2.6.0 mm tree and the linux-scsi mailing list archive:
"I can tell you what's going on here. This is a 450NX based motherboard. The 450NX chipset from Intel was the first chipset to have peer PCI busses. For backwards compatibility, some machine makers hacked their PCI BIOS to have a fake bridge device on PCI bus 0 that points to the same bus number as the peer bus. This way if the OS didn't know about the peer bus registers it would still find the devices by scanning behind the bridge. In this case we are scanning behind this fake bridge and then also scanning based upon the peer bus registers in the chipset, and as a result we are finding the device twice. In order to fix this problem you need to change the peer bus quirk code for the 450NX chipset to scan the list of bus 0 devices looking for a bridge that has the same config as the peer bus registers and if so delete the bridge from the list. That will avoid double scanning and will avoid having the PCI code try and configure sub busses via a fake bridge when it should do all configurations via the 450NX peer bus registers.
-- Doug Ledford <dledford@redhat.com>"
http://marc.info/?l=linux-scsi&m=106839680416899&w=2

I have created a patch to arch/x86/pci/fixup.c::pci_fixup_i450nx() that does a dmi_check_system() against a struct dmi_system_id with a DMI_MATCH pair. If the ident matches it *doesn't* do the i450NX secondary bus scan.
I'll report back once the revised kernel has been built and tested.
Created attachment 15691 [details]
v2.6.25-rc8 acpi-pci-bus debug "FIXED" *successful* boot log
My patch appears to have fixed the issue, although I'd like some feedback on the patch itself and the boot log in case it could cause knock-on effects due to the way the buses are detected and scanned now.
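For readers following along, the failure traced earlier in this report can be modelled in a few lines of user-space C. This is a toy reconstruction of the ordering problem only, not kernel code and not either of the attached patches: struct bus_model, find_bus(), create_bus() and replay_boot() are hypothetical names, and the only behaviour taken from the real pci_create_bus() is the quoted "already known" check that returns NULL for a bus number that is already registered.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BUSES 8

/* Hypothetical, much simplified stand-in for a registered PCI bus. */
struct bus_model {
    int domain;
    int number;
};

static struct bus_model buses[MAX_BUSES];
static size_t bus_count;

/* Return the registered bus with this domain and number, or NULL. */
static struct bus_model *find_bus(int domain, int number)
{
    for (size_t i = 0; i < bus_count; i++) {
        if (buses[i].domain == domain && buses[i].number == number)
            return &buses[i];
    }
    return NULL;
}

/* Like the check quoted in this report: a bus that is already known
 * is not created a second time, and the caller gets NULL back. */
static struct bus_model *create_bus(int domain, int number)
{
    if (find_bus(domain, number) || bus_count == MAX_BUSES)
        return NULL;
    buses[bus_count].domain = domain;
    buses[bus_count].number = number;
    return &buses[bus_count++];
}

/* Replay the order of events seen in the debug boot log and return
 * how many ACPI root bridge scans end up with NULL. */
static int replay_boot(void)
{
    int failed = 0;

    bus_count = 0;
    if (!create_bus(0, 0))  /* ACPI root bridge PX0A, bus 0 */
        failed++;
    create_bus(0, 2);       /* i450NX fix-up runs while bus 0 is scanned */
    create_bus(0, 3);
    if (!create_bus(0, 2))  /* ACPI root bridge PX0B, bus 2 */
        failed++;
    if (!create_bus(0, 3))  /* ACPI root bridge PX1A, bus 3 */
        failed++;
    return failed;
}
```

In this model replay_boot() reports two failed root scans, matching the two "Bus already known" messages in the log; skipping the two fix-up calls makes all three ACPI scans succeed, which is roughly what the DMI-based patch is meant to achieve on this machine.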
Created attachment 15692 [details]
Proposed patch to i450NX fix-up
Please check this and consider alongside the kernel boot log attached to the previous comment to ensure no unexpected side effects.
Created attachment 15701 [details]
v2.6.25-rc8 acpi-pci-bus *no-debug* "FIXED" successful boot log
Another kernel log, this time without kernel DEBUG messages (still contains my trace-path printk() reports)
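On the earlier puzzle of why the pr_debug() call in pci_create_bus() printed nothing while the added printk() did: one likely explanation, stated here as an assumption rather than something confirmed in this report, is that pr_debug() in kernels of this era only produces output when the source file is built with DEBUG defined. The sketch below imitates that in user space; printk_model(), pr_debug_model(), MODEL_DEBUG and report_known_bus() are made-up names, not kernel API.

```c
#include <assert.h>
#include <stdio.h>

/* Counts every message that is actually emitted. */
static int messages_emitted;

/* Stand-in for printk(): always prints and counts the message. */
#define printk_model(...) (messages_emitted++, printf(__VA_ARGS__))

/*
 * Assumption: like pr_debug(), this variant only emits when a debug
 * macro is defined at compile time (MODEL_DEBUG here, standing in for
 * DEBUG). Otherwise it expands to nothing at all.
 */
#ifdef MODEL_DEBUG
#define pr_debug_model(...) printk_model(__VA_ARGS__)
#else
#define pr_debug_model(...) ((void)0)
#endif

/* Emit both kinds of message and return how many were printed. */
static int report_known_bus(int domain, int bus)
{
    (void)domain; /* unused when pr_debug_model() is compiled out */
    (void)bus;

    messages_emitted = 0;
    pr_debug_model("PCI: Bus %04x:%02x already known\n", domain, bus);
    printk_model("PCI: pci_create_bus() Bus already known\n");
    return messages_emitted;
}
```

Built without -DMODEL_DEBUG, only the second message appears, which is the same pattern as in the "PINPOINTED" log. If the assumption holds, the missing message says nothing about which exit path was taken.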
Created attachment 15702 [details]
Proposed patch to i450NX fix-up
Corrected the number of arguments used by DBG()
Re-assigning to TJ as this is not an ACPI bug.
Created attachment 15712 [details]
Revised less-wordy patch as submitted to linux-pci mailing list
Created attachment 15730 [details]
Much better ACPI patch from Matthew Wilcox
This was provided on the linux-pci mailing list. I have tested it and it works and is much better than my DMI matching solution.
Zhao Yakui reported on linux-acpi that another bug, #10124, and its associated patch should fix this. I grabbed the patch mentioned:
http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.25-rc8/2.6.25-rc8-mm2/broken-out/acpi-unneccessary-to-scan-the-pci-bus-already-scanned.patch
and confirmed it does solve this issue too.
Marking this bug as a duplicate.
*** This bug has been marked as a duplicate of bug 10124 ***