Most recent kernel where this bug did not occur: FC4 x86_64 2.6.13-1532smp.
Distribution: Fedora Core 4, x86_64.

Hardware Environment:
Motherboard: Tyan Thunder K8WE, BIOS 1.01.
nVidia nForce Professional 2200 and 2050 chipset.
4 x 512MB PC3200 ECC/REG.
550W PP&C SLI PSU.

Expansion Slots:
Slot0: PCI-E 16x, free.
Slot1: PCI, Creative Sound Blaster Audigy 2zs.
Slot2: PCI-E 16x, Gigabyte GF6800GT/256MB.
Slot3: PCI-X 133MHz, free.
Slot4: PCI-X 100MHz, free.
Slot5: PCI-X 100MHz, Adaptec 29160/PCI64bit.

Disk Drives:
4 x 36GB 10K RPM U160 SCSI in software MD5/LVM setup.
1 x IDE DVDRW drive.

Software Environment:
Fedora Core 4.
ARCH: x86_64
GCC: 4.0.2 20051125
GLIBC: 2.3.5

Problem Description:
Machine was problem free using 2.6.11 (FC3, both i386 and 64bit kernels), 2.6.12 and 2.6.13 kernels.
Fails to boot with 2.6.14 and 2.6.15 kernels, both vanilla and FC4 builds.
When the machine tries to boot using a 14/15 kernel, the aic7xxx module is loaded, but does not detect any device. (No output is printed after the initial module load. See attached logs.)
I tried building a 2.6.15 vanilla kernel with all aic7xxx debug options turned on, but saw nothing.
I tried booting both normally and with pci=routeirq; both failed.

Steps to reproduce:
1. Build kernel using the supplied configuration file. (See below)
2. Try booting.
3. aic7xxx loads. No output.
4. Devices are not detected, root mount fails.
5. Kernel panic.
Created attachment 6955: Vanilla 2.6.15 kernel config.
Created attachment 6956: Kernel boot log. (2.6.15, normal)
Created attachment 6957: Kernel boot log. (2.6.15, pci=routeirq)
Created attachment 6958: lspci -vv output
Strange, it seems to have completely ignored the Adaptec adapter. Could you also attach the output of `lspci -vvxx -s 0a:09.0'? Also, your dmesg output is a bit mangled and isn't all there. Can you do `dmesg -s 1000000 > foo' and then reattach `foo' to this report? Thanks.
[root@gilboa-home-dev ~]# lspci -vvxx -s 0a:09.0
0a:09.0 SCSI storage controller: Adaptec AIC-7892A U160/m (rev 02)
        Subsystem: Adaptec 29160 Ultra160 SCSI Controller
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 72 (10000ns min, 6250ns max), Cache Line Size 10
        Interrupt: pin A routed to IRQ 177
        BIST result: 00
        Region 0: I/O ports at 3000 [disabled] [size=256]
        Region 1: Memory at c0500000 (64-bit, non-prefetchable) [size=4K]
        [virtual] Expansion ROM at 80000000 [disabled] [size=128K]
        Capabilities: [dc] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
00: 05 90 80 00 16 01 b0 02 02 00 00 01 10 48 00 80
10: 01 30 00 00 04 00 50 c0 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 05 90 a0 e2
30: 00 00 00 00 dc 00 00 00 00 00 00 00 0b 01 28 19
The kernel log was pulled from a serial console (the kernel dies due to the lack of a mountable root). I'll redo the first, garbled kernel log. (The second one seems OK.)
Created attachment 6967: Kernel boot log. (2.6.15, normal, take 2)
> Machine was problem free using 2.6.11 (FC3, both i386 and 64bit kernels), 2.6.12
> and 2.6.13 kernels.
> Fails to boot with 2.6.14 and 2.6.15 kernels. Both vanilla and FC4 builds.
> When the machine tries to boot using a 14/15 kernel, the aic7xxx module is
> loaded, but does not detect any device. (No output is printed after the initial
> module load. See attached logs)
> I tried building a 2.6.15 vanilla kernel with all aic7xxx debug options turned
> on, but saw nothing.
> I tried booting both normally and with pci=routeirq; both failed.

> [root@gilboa-home-dev ~]# lspci -vvxx -s 0a:09.0
> 0a:09.0 SCSI storage controller: Adaptec AIC-7892A U160/m (rev 02)
>         Subsystem: Adaptec 29160 Ultra160 SCSI Controller
>         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr-
> Stepping- SERR+ FastB2B-
>         Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
> <TAbort- <MAbort- >SERR- <PERR-
>         Latency: 72 (10000ns min, 6250ns max), Cache Line Size 10
>         Interrupt: pin A routed to IRQ 177
>         BIST result: 00
>         Region 0: I/O ports at 3000 [disabled] [size=256]
>         Region 1: Memory at c0500000 (64-bit, non-prefetchable) [size=4K]
>         [virtual] Expansion ROM at 80000000 [disabled] [size=128K]
>         Capabilities: [dc] Power Management version 2
>                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
> PME(D0-,D1-,D2-,D3hot-,D3cold-)
>                 Status: D0 PME-Enable- DSel=0 DScale=0 PME-
> 00: 05 90 80 00 16 01 b0 02 02 00 00 01 10 48 00 80
> 10: 01 30 00 00 04 00 50 c0 00 00 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 05 90 a0 e2
> 30: 00 00 00 00 dc 00 00 00 00 00 00 00 0b 01 28 19

It could be that the driver is being overly fussy and that there's a mismatch in the PCI ID table.
Someone with better PCI knowledge might like to comment, but I think that's

vendor 0x9005 (VENDOR_ID_ADAPTEC2)
Device 0x0080 (DEVICE_ID_ADAPTEC2_7892A)
Subvendor 0x0116
Subdevice 0x02b0

The subvendor and subdevice are ones that don't exist in the Adaptec PCI tables; we have four possible 0080 devices, all with either aic or cpq subvendors, so it's not matching on the subdevice/subvendor. The curiosity is that I don't think this code has changed for ages, so I don't see why it would previously have matched.

James
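One way to cross-check those four values against the hex dump is to decode it with the standard type 0 PCI configuration header layout, in which offsets 0x04 and 0x06 hold the command and status registers and the subsystem vendor and subsystem ID sit at 0x2c and 0x2e. The sketch below does only that; the struct and function names (`pci_ids`, `le16`, `decode_ids`) are made up for illustration and are not taken from the driver. Read this way, 0x0116 and 0x02b0 come out as command and status (0x0116 sets the Mem, BusMaster, MemWINV and SERR bits shown on the "Control:" line), and the subsystem pair reads as 0x9005/0xe2a0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First 64 bytes of config space for 0a:09.0, copied from the lspci -vvxx dump. */
static const uint8_t cfg[64] = {
    0x05, 0x90, 0x80, 0x00, 0x16, 0x01, 0xb0, 0x02,
    0x02, 0x00, 0x00, 0x01, 0x10, 0x48, 0x00, 0x80,
    0x01, 0x30, 0x00, 0x00, 0x04, 0x00, 0x50, 0xc0,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x05, 0x90, 0xa0, 0xe2,
    0x00, 0x00, 0x00, 0x00, 0xdc, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x0b, 0x01, 0x28, 0x19,
};

/* Hypothetical container for the fields of interest. */
struct pci_ids {
    uint16_t vendor, device;       /* offsets 0x00, 0x02 */
    uint16_t command, status;      /* offsets 0x04, 0x06 */
    uint16_t subvendor, subdevice; /* offsets 0x2c, 0x2e (type 0 header) */
};

/* Config space is little-endian: low byte first. */
static uint16_t le16(const uint8_t *p, size_t off)
{
    return (uint16_t)(p[off] | (p[off + 1] << 8));
}

static struct pci_ids decode_ids(const uint8_t *p, size_t len)
{
    struct pci_ids ids;

    assert(len >= 0x30); /* need everything up to the subsystem ID */
    ids.vendor    = le16(p, 0x00);
    ids.device    = le16(p, 0x02);
    ids.command   = le16(p, 0x04);
    ids.status    = le16(p, 0x06);
    ids.subvendor = le16(p, 0x2c);
    ids.subdevice = le16(p, 0x2e);
    return ids;
}
```

Calling `decode_ids(cfg, sizeof cfg)` fills in the six fields from the dump, so the values can be compared directly with the IDs quoted above.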
As far as I remember my card is a normal 29160.
I reckon we could add a small patch that will fix the problem (but it'll be ugly as hell):

aic7xxx_pci.h:102
+#define ID_AHA_29160G 0x0080900502b00116ull

aic7xxx_pci.c:354
+{
+	ID_AHA_29160G,
+	ID_ALL_MASK,
+	"Gilboa's weird Adaptec 29160 Ultra160 SCSI adapter",
+	ahc_aic7892_setup
+},

But what strikes me as odd is that, for the life of me, I can't understand how it worked in the first place. The check code in aic7xxx.c should have failed in older kernels just as well. Obscure compiler problem with 64bit bit operations? I doubt it.
When I get back home I'll add a couple of printks and try to see what changed between 2.6.13 and 2.6.15.
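For readers unfamiliar with the 64-bit constant in that patch, here is a minimal sketch of how such an ID appears to be packed and matched. The field order (device, vendor, subdevice, subvendor, from the top 16 bits down) is inferred from the proposed constant and the IDs quoted earlier in the thread; `compose_id`, `struct id_entry` and `find_entry` are illustrative names, and the masked-compare rule is an assumption about how a table lookup with an ID mask would work, not a copy of the driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack four 16-bit PCI IDs into one 64-bit key (layout inferred from the patch). */
static uint64_t compose_id(uint16_t device, uint16_t vendor,
                           uint16_t subdevice, uint16_t subvendor)
{
    return ((uint64_t)device << 48) | ((uint64_t)vendor << 32) |
           ((uint64_t)subdevice << 16) | (uint64_t)subvendor;
}

/* Hypothetical table entry: an ID, the bits that must match, and a label. */
struct id_entry {
    uint64_t full_id;
    uint64_t id_mask;
    const char *name;
};

/* Return the first entry whose masked ID equals the probed ID, or NULL. */
static const struct id_entry *find_entry(const struct id_entry *table, size_t n,
                                         uint64_t full_id)
{
    assert(table != NULL);
    for (size_t i = 0; i < n; i++) {
        if ((full_id & table[i].id_mask) == table[i].full_id)
            return &table[i];
    }
    return NULL;
}

/* One-entry table holding only the ID proposed in the patch, all bits significant. */
static const struct id_entry example_table[] = {
    { 0x0080900502b00116ULL, 0xFFFFFFFFFFFFFFFFULL, "proposed 29160 entry" },
};
```

With an all-ones mask, a card only matches if all four IDs are identical, so any difference in the subsystem fields is enough for the lookup to return NULL.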
Baaah! I added a couple of printks to the aic7xxx driver and the PCI driver, and it seems the problem is that the PCI scan fails to detect the PCI-X bridge. I'll create a 2.6.14/15 PXE boot setup and try to post the lspci output it sees.
Created attachment 6977: lspci -v on a 2.6.14 kernel
Created attachment 6978: lspci -vv on a 2.6.14 kernel
Created attachment 6979: lspci -vvxx on a 2.6.14 kernel
AFAIK this is a PCI/BIOS/ACPI/whatever bug and not an aic7xxx one. The 2.6.14/15 kernels fail to enumerate the HT links and the PCI-X bridge (and the underlying cards). Changing bug title and description.
Tried v1.02 BIOS. Still the same problem.
After contacting Tyan, it was suggested that I update the BIOS and disable the ACPI MCFG table by selecting (and here comes the duh!) "Linux" instead of "Other" as the OS type. Nevertheless, I wonder if this is a problem with the table parsing in mmconfig.c? Should I leave this bug open?
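As background on why the MCFG table matters for enumeration: with memory-mapped configuration access, each function's 4 KB of config space is reached at a fixed offset from the base address that the MCFG table reports, so a wrong or mis-parsed base can make devices behind some buses invisible. The sketch below shows only the standard PCI Express offset arithmetic (bus in bits 27:20, device in 19:15, function in 14:12, register in 11:0); `ecam_offset` is a hypothetical name, and nothing here claims to reproduce what mmconfig.c did in these kernels.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Offset of a config register from the MMCONFIG base address.
 * Each bus gets 1 MB, each device 32 KB, each function 4 KB.
 */
static uint64_t ecam_offset(uint8_t bus, uint8_t dev, uint8_t fn, uint16_t reg)
{
    assert(dev < 32);   /* 5-bit device number */
    assert(fn < 8);     /* 3-bit function number */
    assert(reg < 4096); /* 4 KB of config space per function */

    return ((uint64_t)bus << 20) | ((uint64_t)dev << 15) |
           ((uint64_t)fn << 12) | (uint64_t)reg;
}
```

For the SCSI controller at 0a:09.0, `ecam_offset(0x0a, 0x09, 0, 0)` gives the offset of its vendor ID register; the physical address is that offset added to whatever base the firmware's MCFG table supplies.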
Gilboa, 2.6.25-rc has some MCFG related changes (2.6.24 as well iirc), so the kernel should work no matter which "OS" type you select in the BIOS. Care to retest and confirm?
Sadly enough, the machine is off line at the moment. (Water damage) I'll do my best to revive it and re-test. - Gilboa
I'm pretty sure this one is fixed now (Yinghai fixed some HT mapping bugs fairly recently), but if you can still reproduce it, please re-open. Thanks, Jesse