Bug 5847 - 2.6.14/15 PCI-X scan fails if ACPI MCFG is enabled.
Summary: 2.6.14/15 PCI-X scan fails if ACPI MCFG is enabled.
Status: CLOSED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI
Hardware: i386 Linux
Importance: P2 normal
Assignee: Jesse Barnes
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-01-07 07:25 UTC by Gilboa Davara
Modified: 2008-05-01 17:11 UTC
CC List: 4 users

See Also:
Kernel Version: 2.6.15
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
Vanilla 2.6.15 kernel config. (53.60 KB, text/plain)
2006-01-07 07:26 UTC, Gilboa Davara
Kernel boot log. (2.6.15, normal) (10.30 KB, text/plain)
2006-01-07 07:26 UTC, Gilboa Davara
Kernel boot log. (2.6.15, pci=routeirq) (12.67 KB, text/plain)
2006-01-07 07:27 UTC, Gilboa Davara
lspci -vv output (15.91 KB, text/plain)
2006-01-07 07:32 UTC, Gilboa Davara
Kernel boot log. (2.6.15, normal, take 2) (10.98 KB, text/plain)
2006-01-07 22:09 UTC, Gilboa Davara
lspci -v on a 2.6.14 kernel (6.88 KB, text/plain)
2006-01-09 07:50 UTC, Gilboa Davara
lspci -vv on a 2.6.14 kernel (13.38 KB, text/plain)
2006-01-09 07:52 UTC, Gilboa Davara
lspci -vvxx on a 2.6.14 kernel (17.64 KB, text/plain)
2006-01-09 07:53 UTC, Gilboa Davara

Description Gilboa Davara 2006-01-07 07:25:12 UTC
Most recent kernel where this bug did not occur:
FC4 x86_64 2.6.13-1532smp.

Distribution:
Fedora Core 4.
x86_64.

Hardware Environment:
Motherboard:
Tyan Thunder K8WE, BIOS 1.01.
nVidia nForce Professional 2200 and 2050 chipset.
4 x 512MB PC3200 ECC/REG.
550w PP&C SLI PSU.

Expansion Slots:
Slot0: PCI-E 16x, free.
Slot1: PCI, Creative Sound Blaster Audigy 2zs.
Slot2: PCI-E 16x, Gigabyte GF6800GT/256MB.
Slot3: PCI-X 133MHz, free.
Slot4: PCI-X 100MHz, free.
Slot5: PCI-X 100MHz, Adaptec 29160/PCI64bit.

Disk Drives:
4 x 36GB 10K RPM U160 SCSI in software MD5/LVM setup.
1 x IDE DVDRW drive.

Software Environment:
Fedora Core 4.
ARCH: x86_64
GCC: 4.0.2 20051125
GLIBC: 2.3.5

Problem Description:
Machine was problem-free using 2.6.11 (FC3, both i386 and 64-bit kernels), 2.6.12
and 2.6.13 kernels.
It fails to boot with 2.6.14 and 2.6.15 kernels, both vanilla and FC4 builds.
When the machine tries to boot a 2.6.14/15 kernel, the aic7xxx module is
loaded but does not detect any device. (No output is printed after the initial
module load. See attached logs.)
I tried building a vanilla 2.6.15 kernel with all aic7xxx debug options turned
on, but saw nothing.
I tried booting both normally and with pci=routeirq; both failed.

Steps to reproduce:
1. Build kernel using the supplied configuration file. (See below)
2. Try booting.
3. aic7xxx loads. No output.
4. Devices are not detected, root mount fails. 
5. Kernel panic.
Comment 1 Gilboa Davara 2006-01-07 07:26:08 UTC
Created attachment 6955 [details]
Vanilla 2.6.15 kernel config.
Comment 2 Gilboa Davara 2006-01-07 07:26:47 UTC
Created attachment 6956 [details]
Kernel boot log. (2.6.15, normal)
Comment 3 Gilboa Davara 2006-01-07 07:27:11 UTC
Created attachment 6957 [details]
Kernel boot log. (2.6.15, pci=routeirq)
Comment 4 Gilboa Davara 2006-01-07 07:32:04 UTC
Created attachment 6958 [details]
lspci -vv output
Comment 5 Andrew Morton 2006-01-07 13:28:26 UTC
Strange, it seems to have completely ignored the adaptec adapter.

Could you also attach the output of `lspci -vvxx -s 0a:09.0'?

Also, your dmesg output is a bit mangled and isn't all there.  Can you
do `dmesg -s 1000000 > foo' and then reattach `foo' to this report?

Thanks.
Comment 6 Gilboa Davara 2006-01-07 21:51:13 UTC
[root@gilboa-home-dev ~]# lspci -vvxx -s 0a:09.0
0a:09.0 SCSI storage controller: Adaptec AIC-7892A U160/m (rev 02)
        Subsystem: Adaptec 29160 Ultra160 SCSI Controller
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr-
Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
        Latency: 72 (10000ns min, 6250ns max), Cache Line Size 10
        Interrupt: pin A routed to IRQ 177
        BIST result: 00
        Region 0: I/O ports at 3000 [disabled] [size=256]
        Region 1: Memory at c0500000 (64-bit, non-prefetchable) [size=4K]
        [virtual] Expansion ROM at 80000000 [disabled] [size=128K]
        Capabilities: [dc] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
00: 05 90 80 00 16 01 b0 02 02 00 00 01 10 48 00 80
10: 01 30 00 00 04 00 50 c0 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 05 90 a0 e2
30: 00 00 00 00 dc 00 00 00 00 00 00 00 0b 01 28 19

Comment 7 Gilboa Davara 2006-01-07 21:53:17 UTC
The kernel log was pulled from a serial console (the kernel dies due to the
lack of a mountable root).
I'll redo the first garbled kernel log. (The second one seems OK.)
Comment 8 Gilboa Davara 2006-01-07 22:09:26 UTC
Created attachment 6967 [details]
Kernel boot log. (2.6.15, normal, take 2)
Comment 9 James Bottomley 2006-01-08 08:01:01 UTC
> Machine was problem-free using 2.6.11 (FC3, both i386 and 64-bit kernels), 2.6.12
> and 2.6.13 kernels.
> It fails to boot with 2.6.14 and 2.6.15 kernels, both vanilla and FC4 builds.
> When the machine tries to boot a 2.6.14/15 kernel, the aic7xxx module is
> loaded but does not detect any device. (No output is printed after the initial
> module load. See attached logs.)
> I tried building a vanilla 2.6.15 kernel with all aic7xxx debug options turned
> on, but saw nothing.
> I tried booting both normally and with pci=routeirq; both failed.


> [root@gilboa-home-dev ~]# lspci -vvxx -s 0a:09.0
> 0a:09.0 SCSI storage controller: Adaptec AIC-7892A U160/m (rev 02)
>         Subsystem: Adaptec 29160 Ultra160 SCSI Controller
>         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr-
> Stepping- SERR+ FastB2B-
>         Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
> <TAbort- <MAbort- >SERR- <PERR-
>         Latency: 72 (10000ns min, 6250ns max), Cache Line Size 10
>         Interrupt: pin A routed to IRQ 177
>         BIST result: 00
>         Region 0: I/O ports at 3000 [disabled] [size=256]
>         Region 1: Memory at c0500000 (64-bit, non-prefetchable) [size=4K]
>         [virtual] Expansion ROM at 80000000 [disabled] [size=128K]
>         Capabilities: [dc] Power Management version 2
>                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
> PME(D0-,D1-,D2-,D3hot-,D3cold-)
>                 Status: D0 PME-Enable- DSel=0 DScale=0 PME-
> 00: 05 90 80 00 16 01 b0 02 02 00 00 01 10 48 00 80
> 10: 01 30 00 00 04 00 50 c0 00 00 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 05 90 a0 e2
> 30: 00 00 00 00 dc 00 00 00 00 00 00 00 0b 01 28 19

It could be that the driver is being overly fussy and that there's a
mismatch in the PCI ID table.  Someone with better PCI knowledge might
like to comment, but I think that's

vendor 0x9005 (VENDOR_ID_ADAPTEC2)
Device 0x0080 (DEVICE_ID_ADAPTEC2_7892A)
Subvendor 0x0116
Subdevice 0x02b0

The subvendor and subdevice are ones that don't exist in the adaptec PCI
tables; we have four possible 0080 devices, all with either aic or cpq
subvendors, so it's not matching in the subdevice/subvendor.  The
curiosity is that I don't think this code has changed for ages, so I
don't see why it would previously have matched.

James
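
One way to cross-check the ID reading above is to decode the hex dump against the standard type 0 PCI configuration header layout, where the 16-bit words at offsets 0x04 and 0x06 are the command and status registers and the subsystem vendor and subsystem IDs sit at offsets 0x2c and 0x2e. The following is only a sketch: the dump is copied from the lspci output in this report, and the helper names (parse_dump, decode_header) are made up for illustration.

```python
import struct

# First 64 bytes of config space, copied from the lspci -vvxx output above.
DUMP = """\
00: 05 90 80 00 16 01 b0 02 02 00 00 01 10 48 00 80
10: 01 30 00 00 04 00 50 c0 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 05 90 a0 e2
30: 00 00 00 00 dc 00 00 00 00 00 00 00 0b 01 28 19
"""


def parse_dump(text):
    """Turn 'offset: xx xx ...' lines into a bytes object (hypothetical helper)."""
    data = bytearray()
    for line in text.splitlines():
        _, _, hexbytes = line.partition(":")
        data.extend(bytes.fromhex("".join(hexbytes.split())))
    return bytes(data)


def decode_header(cfg):
    """Read the little-endian 16-bit fields of a type 0 PCI header."""
    def le16(offset):
        return struct.unpack_from("<H", cfg, offset)[0]

    return {
        "vendor": le16(0x00),
        "device": le16(0x02),
        "command": le16(0x04),
        "status": le16(0x06),
        "subsystem_vendor": le16(0x2C),
        "subsystem_id": le16(0x2E),
    }


header = decode_header(parse_dump(DUMP))
for name, value in header.items():
    print(f"{name:17s} 0x{value:04x}")
```

Read this way, 0x0116 and 0x02b0 are the command and status words (which agrees with the Control/Status flags lspci printed), and the subsystem pair comes out as 9005:e2a0. That would suggest the card carries an ordinary Adaptec subsystem ID, which fits the observation that the matching code had not changed.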


Comment 10 Gilboa Davara 2006-01-08 10:06:39 UTC
As far as I remember my card is a normal 29160.

I reckon we could add a small patch that will fix the problem (but it'll be ugly
as hell):

aic7xxx_pci.h:102
+#define ID_AHA_29160G          0x0080900502b00116ull

aic7xxx_pci.c:354
+{
+  ID_AHA_29160G,
+  ID_ALL_MASK,
+  "Gilboa's weird Adaptec 29160 Ultra160 SCSI adapter",
+  ahc_aic7892_setup
+},

But what strikes me as odd is that, for the life of me, I can't understand how
it worked in the first place.
The check code in aic7xxx.c should have failed in older kernels just as well.
Obscure compiler problem with 64-bit bit operations? I doubt it.

When I get back home I'll add a couple of printks to try to see what
changed between 2.6.13 and 2.6.15.
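
The 64-bit constant in the patch above appears to pack four 16-bit IDs, highest field first: device, vendor, subdevice, subvendor. A minimal sketch of that packing and of a masked table comparison follows. The field order is inferred from the constant itself rather than checked against the driver source, ID_ALL_MASK is assumed to mean "compare all 64 bits", and compose_id and matches are illustrative names, not driver functions.

```python
# Assumption: ID_ALL_MASK compares every bit of the packed ID.
ID_ALL_MASK = 0xFFFFFFFFFFFFFFFF


def compose_id(device, vendor, subdevice, subvendor):
    """Pack four 16-bit PCI IDs into one 64-bit value (assumed field order)."""
    for field in (device, vendor, subdevice, subvendor):
        if not 0 <= field <= 0xFFFF:
            raise ValueError(f"not a 16-bit ID: {field:#x}")
    return (device << 48) | (vendor << 32) | (subdevice << 16) | subvendor


def matches(card_id, table_id, mask=ID_ALL_MASK):
    """Compare only the bits selected by the table entry's mask."""
    return (card_id & mask) == (table_id & mask)


# The four values listed in comment 9 reproduce the constant from the patch.
patch_id = compose_id(0x0080, 0x9005, 0x02B0, 0x0116)
print(f"0x{patch_id:016x}")

# Same device and vendor, but with the words stored at config offsets
# 0x2e/0x2c of the dump ("05 90 a0 e2") used as subdevice/subvendor.
dump_id = compose_id(0x0080, 0x9005, 0xE2A0, 0x9005)
print(f"0x{dump_id:016x}")
print(matches(patch_id, dump_id))
```

With a full mask the two values do not match, so an entry built from the first set of numbers would only help if the card really reported them as its subsystem IDs.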
Comment 11 Gilboa Davara 2006-01-09 05:31:35 UTC
Baaah!
I added a couple of printks to the aic7xxx driver and the PCI code, and it seems
that the problem is the PCI scan failing to detect the PCI-X bridge.
I'll create a 2.6.14/15 bootpxe setup and try to post the lspci output it sees.
Comment 12 Gilboa Davara 2006-01-09 07:50:44 UTC
Created attachment 6977 [details]
lspci -v on a 2.6.14 kernel
Comment 13 Gilboa Davara 2006-01-09 07:52:54 UTC
Created attachment 6978 [details]
lspci -vv on a 2.6.14 kernel
Comment 14 Gilboa Davara 2006-01-09 07:53:22 UTC
Created attachment 6979 [details]
lspci -vvxx on a 2.6.14 kernel
Comment 15 Gilboa Davara 2006-01-09 07:59:01 UTC
AFAIK this is a PCI/BIOS/ACPI/whatever bug and not an aic7xxx one.
The 2.6.14/15 bus scan fails to enumerate the HT links and the PCI-X bridge (and
the underlying cards).

Changing bug title and description.
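
For anyone repeating the comparison, a small sketch like the one below can list which bus addresses appear in the lspci output of a working kernel but not in that of a failing one. The function names are hypothetical, and the sample text is made up apart from the 0a:09.0 line quoted earlier in this report; in practice the two strings would come from saved lspci output.

```python
import re

# Matches the "bus:device.function" (optionally "domain:bus:device.function")
# prefix that starts each device line in plain lspci output.
BDF = re.compile(r"^((?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+(.*)$")


def devices(lspci_text):
    """Map bus address -> description for every device line."""
    found = {}
    for line in lspci_text.splitlines():
        m = BDF.match(line)
        if m:
            found[m.group(1)] = m.group(2)
    return found


def missing(good_text, bad_text):
    """Devices seen by the working kernel but not by the failing one."""
    good, bad = devices(good_text), devices(bad_text)
    return {addr: good[addr] for addr in sorted(good) if addr not in bad}


# Illustrative sample only; the first line is a placeholder device.
GOOD = """\
00:00.0 Memory controller: example host bridge
0a:09.0 SCSI storage controller: Adaptec AIC-7892A U160/m (rev 02)
"""
BAD = """\
00:00.0 Memory controller: example host bridge
"""

for addr, desc in missing(GOOD, BAD).items():
    print(addr, desc)
```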
Comment 16 Gilboa Davara 2006-01-15 20:22:23 UTC
Tried v1.02 BIOS. Still the same problem.
Comment 17 Gilboa Davara 2006-01-18 21:54:58 UTC
After contacting Tyan, it was suggested that I update the BIOS and disable the
ACPI MCFG table by selecting (and here comes the "duh!") "Linux" instead of
"Other" as the OS type in the BIOS setup.

Nevertheless, I wonder if this is a problem with the table parsing in mmconfig.c?
Should I leave this bug open?
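
To inspect what the BIOS actually hands over, the MCFG table can be decoded by hand. As far as the ACPI/PCI firmware layout goes, it is a 36-byte ACPI table header, 8 reserved bytes, and then 16-byte entries holding a 64-bit base address, a 16-bit segment group, a start bus, an end bus and 4 reserved bytes. The sketch below parses that layout; parse_mcfg is a hypothetical helper, the sample table is synthetic (the base address 0xe0000000 is made up), and where a raw copy of the table can be read from depends on the kernel (newer kernels expose it under /sys/firmware/acpi/tables/MCFG, which is an assumption for anything as old as the kernels discussed here).

```python
import struct

HEADER_LEN = 36 + 8  # ACPI table header plus 8 reserved bytes
ENTRY_LEN = 16


def parse_mcfg(table):
    """Return the MMCONFIG allocations described by a raw MCFG table."""
    if len(table) < HEADER_LEN or table[:4] != b"MCFG":
        raise ValueError("not an MCFG table")
    # Bytes 4..7 of the ACPI header hold the total table length.
    length = min(struct.unpack_from("<I", table, 4)[0], len(table))
    entries = []
    for off in range(HEADER_LEN, length - ENTRY_LEN + 1, ENTRY_LEN):
        base, segment, start_bus, end_bus = struct.unpack_from("<QHBB", table, off)
        entries.append({
            "base": base,
            "segment": segment,
            "start_bus": start_bus,
            "end_bus": end_bus,
        })
    return entries


# Synthetic table with one entry; every value here is illustrative.
SAMPLE = (
    b"MCFG" + struct.pack("<I", 60) + bytes(28)   # 36-byte header
    + bytes(8)                                    # reserved
    + struct.pack("<QHBB4x", 0xE0000000, 0, 0, 255)
)

for entry in parse_mcfg(SAMPLE):
    print(f"segment {entry['segment']} buses {entry['start_bus']}-{entry['end_bus']} "
          f"at 0x{entry['base']:x}")
```

Comparing the bus range in such an entry with the buses that go missing from the scan is one way to tell a bad table from a parsing problem.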
Comment 18 Jesse Barnes 2008-03-14 12:18:16 UTC
Gilboa, 2.6.25-rc has some MCFG-related changes (2.6.24 as well, iirc), so the kernel should work no matter which "OS" type you select in the BIOS.  Care to retest and confirm?
Comment 19 Gilboa Davara 2008-03-15 10:30:03 UTC
Sadly enough, the machine is offline at the moment. (Water damage)
I'll do my best to revive it and re-test.

- Gilboa
Comment 20 Jesse Barnes 2008-05-01 17:11:31 UTC
I'm pretty sure this one is fixed now (Yinghai fixed some HT mapping bugs fairly recently), but if you can still reproduce it, please re-open.

Thanks,
Jesse
