Bug 6038

Summary: boot hang w/o pci=nommconf - nVidia CK804
Product: Drivers Reporter: Patrick (ps_mail)
Component: PCI Assignee: Patrick (ps_mail)
Status: CLOSED PATCH_ALREADY_AVAILABLE    
Severity: normal CC: acpi-bugzilla, bunk, zwane
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.14 up to 2.6.23.8 Subsystem:
Regression: Yes Bisected commit-id:
Attachments: output of 2.6.6-rc2-mm1 without hda attached
output of 2.6.6-rc2-mm1 without hda attached2
output of 2.6.6-rc2-mm1 with hda attached
acpidump from 2.6.13.5
/proc/interrupts from 2.6.13.5
lspci -vv from 2.6.13.5
dmesg from 2.6.13.5
Serial-Console-2.6.13.5
Serial-Console-2.6.16-normal (with acpi....)
2.6.16-normal(with acpi)-and-debug
2.6.16-acpi=off and bios setting apic=disabled
2.6.16-acpi=off and debug and bios apic=enabled
2.6.16 noapic and debug and bios apic enabled
2.6.16 noapic and debug and bios apic=disabled
2.6.16 /proc/interrupts with acpi=off
output of 2.6.23.8 without pci=nommconf
output of 2.6.23.8 with pci=nommconf

Description Patrick 2006-02-09 07:43:00 UTC
Most recent kernel where this bug did not occur: 2.6.13.5
Distribution: Debian Sid
Hardware Environment: X86_64
Software Environment: Debian Sid, kernel compiled with gcc-Version 4.0.3
20060128 (prerelease) (Debian 4.0.2-8)
Problem Description: 
ACPI has not worked since kernel 2.6.14. Kernels up to 2.6.13.5 work fine;
2.6.14 through 2.6.16-rc2-mm1 hang on boot.

The kernel hangs on boot while detecting the partitions on my IDE/SATA drives
and prints dma_timer_expiry messages. Booting with acpi=off works; booting with
acpi=noirq and noapic does not.

When I remove hda from the nVidia PATA controller, the system boots without
hanging but does not find any other hard drives on the Promise controller or
on SATA (the CD-ROM drives on PATA are found).

I have uploaded screenshots of booting 2.6.16-rc2-mm1 with and without hda,
plus /proc/interrupts, acpidump, dmesg and lspci -vv from 2.6.13.5, to
http://home.arcor.de/ps_mail/

I hope you can find the bug. If you need more information or want me to test
anything, let me know.

Steps to reproduce:
press the powerbutton to boot the system ;-)
Comment 1 Patrick 2006-02-09 07:46:38 UTC
Created attachment 7278 [details]
output of 2.6.6-rc2-mm1 without hda attached
Comment 2 Patrick 2006-02-09 07:49:30 UTC
Created attachment 7279 [details]
output of 2.6.6-rc2-mm1 without hda attached2
Comment 3 Patrick 2006-02-09 07:50:30 UTC
Created attachment 7280 [details]
output of 2.6.6-rc2-mm1 with hda attached
Comment 4 Patrick 2006-02-09 07:51:38 UTC
Created attachment 7281 [details]
acpidump from 2.6.13.5
Comment 5 Patrick 2006-02-09 07:52:23 UTC
Created attachment 7282 [details]
/proc/interrupts from 2.6.13.5
Comment 6 Patrick 2006-02-09 07:55:14 UTC
Created attachment 7283 [details]
lspci -vv from 2.6.13.5
Comment 7 Patrick 2006-02-09 07:58:15 UTC
So, after I found out where to upload the attachments... here they are^^

A short version of the boot log:

normal boot dmesg (2.6.13.5):
...
NFORCE-CK804: IDE controller at PCI slot 0000:00:06.0
NFORCE-CK804: chipset revision 242
NFORCE-CK804: not 100% native mode: will probe irqs later
NFORCE-CK804: BIOS didn't set cable bits correctly. Enabling workaround.
NFORCE-CK804: 0000:00:06.0 (rev f2) UDMA133 controller
    ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:DMA, hdb:DMA
    ide1: BM-DMA at 0xf008-0xf00f, BIOS settings: hdc:DMA, hdd:DMA
Probing IDE interface ide0...
hda: IC35L180AVV207-1, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
hdc: ASUS CRW-4012A, ATAPI CD/DVD-ROM drive
hdd: _NEC DVD_RW ND-1300A, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
PDC20265: IDE controller at PCI slot 0000:05:08.0
ACPI: PCI Interrupt Link [APC3] enabled at IRQ 18
ACPI: PCI Interrupt 0000:05:08.0[A] -> Link [APC3] -> GSI 18 (level,
low) -> IRQ 18
PDC20265: chipset revision 2
PDC20265: ROM enabled at 0x60000000
PDC20265: 100% native mode on irq 18
PDC20265: (U)DMA Burst Bit DISABLED Primary PCI Mode Secondary PCI Mode.
PDC20265: FORCING BURST BIT 0x00->0x01 ACTIVE
    ide2: BM-DMA at 0x9400-0x9407, BIOS settings: hde:pio, hdf:pio
    ide3: BM-DMA at 0x9408-0x940f, BIOS settings: hdg:pio, hdh:pio
Probing IDE interface ide2...
hde: ST380021A, ATA DISK drive
ide2 at 0x8400-0x8407,0x8802 on irq 18
Probing IDE interface ide3...
hda: max request size: 1024KiB
hda: 361882080 sectors (185283 MB) w/7965KiB Cache, CHS=22526/255/63,
UDMA(100)
hda: cache flushes supported
 hda: hda1 < hda5 hda6 hda7 hda8 >
...

boot dmesg from 2.6.16-rc2-mm1:
NFORCE-CK804: IDE controller at PCI slot 0000:00:06.0
NFORCE-CK804: chipset revision 242
NFORCE-CK804: not 100% native mode: will probe irqs later
NFORCE-CK804: BIOS didn't set cable bits correctly. Enabling workaround.
NFORCE-CK804: 0000:00:06.0 (rev f2) UDMA133 controller
    ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:DMA, hdb:DMA
NFORCE-CK804: simplex device: DMA disabled
ide1: NFORCE-CK804 Bus-Master DMA disabled (BIOS)
hda: IC35L180AVV207-1, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hdc: ASUS CRW-4012A, ATAPI CD/DVD-ROM drive
hdd: _NEC DVD_RW ND-1300A, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
PDC20265: IDE controller at PCI slot 0000:05:08.0
ACPI: PCI Interrupt Link [APC3] BIOS reported IRQ 5, using IRQ 18
ACPI: PCI Interrupt Link [APC3] enabled at IRQ 18
GSI 16 sharing vector 0xB1 and IRQ 16
ACPI: PCI Interrupt 0000:05:08.0[A] -> Link [APC3] -> GSI 18 (level,
low) -> IRQ 16
PDC20265: chipset revision 2
PDC20265: 100% native mode on irq 16
PDC20265: port 0x01f0 already claimed by ide0
PDC20265: port 0x0170 already claimed by ide1
PDC20265: neither IDE port enabled (BIOS)
hda: max request size: 512KiB
hda: 361882080 sectors (185283 MB) w/7965KiB Cache, CHS=22526/255/63,
UDMA(100)
hda: cache flushes supported
 hda: <4>hda: dma_timer_expiry: dma status == 0xff
hda: dma_timer_expiry: dma status == 0xff
hda: dma_timer_expiry: dma status == 0xff
...
... and so on, never ending...
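
The repeated status value in the log above can be picked out of a captured console log mechanically. The following is a minimal sketch, not part of the original report: the helper names `parse_dma_status` and `status_is_all_ones` are hypothetical, and the idea that a status of 0xff (all bits set) often means nothing answered at the register's I/O address is an assumption about typical x86 behaviour, not something the log itself proves.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Extract the status byte from a log line such as
 *   "hda: dma_timer_expiry: dma status == 0xff"
 * Returns 1 and stores the value in *status on success,
 * 0 if the line does not contain such a message.
 * (Hypothetical helper, for illustration only.)
 */
static int parse_dma_status(const char *line, unsigned int *status)
{
    static const char marker[] = "dma_timer_expiry: dma status == 0x";
    const char *p;

    assert(line != NULL && status != NULL);

    p = strstr(line, marker);
    if (p == NULL)
        return 0;

    /* The hex digits follow directly after the "0x" in the marker. */
    return sscanf(p + strlen(marker), "%x", status) == 1;
}

/*
 * Assumption: an all-ones byte suggests the read did not reach a
 * responding device. This is a heuristic, not a diagnosis.
 */
static int status_is_all_ones(unsigned int status)
{
    return (status & 0xffu) == 0xffu;
}
```

Feeding each line of a serial-console capture through `parse_dma_status` and counting the hits where `status_is_all_ones` is true gives a quick way to compare logs from different kernels.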
Comment 8 Shaohua 2006-02-09 18:44:00 UTC
>ACPI: PCI Interrupt Link [APC3] BIOS reported IRQ 5, using IRQ 18
>ACPI: PCI Interrupt Link [APC3] enabled at IRQ 18
Can you provide a full dmesg for the failure case, for example captured from a serial console?
Comment 9 Patrick 2006-02-10 03:09:09 UTC
Created attachment 7287 [details]
dmesg from 2.6.13.5

Oh, I forgot the full dmesg of 2.6.13.5...

I will try to get the dmesg from 2.6.16-rc2-mm1 if I can...
Comment 10 Patrick 2006-02-10 03:19:52 UTC
Hm, a question: wouldn't the serial console show the same as my JPEGs? (OK,
I have not compiled the network setup into the 2.6.16 kernel, but the important
parts for the IDE controllers are included, so I think the JPEGs should show you
the output of dmesg. Please correct me if I'm wrong.)
Comment 11 Shaohua 2006-02-12 17:14:59 UTC
You can capture the full dmesg over a serial console (from another PC). Please
refer to Documentation/serial-console.txt in the kernel source.
Comment 12 Len Brown 2006-02-12 23:45:57 UTC
Please boot the latest kernel with "acpi=off",
attach the complete output from "dmesg -s64000"
and paste a copy of /proc/interrupts.

You mentioned that booting with "noapic" failed also.
Please boot with "noapic" and "debug" and attach
the complete serial console log.
Comment 13 Patrick 2006-02-14 17:20:51 UTC
Created attachment 7328 [details]
Serial-Console-2.6.13.5

ok here comes the serial console output...
Comment 14 Patrick 2006-02-14 17:21:47 UTC
Created attachment 7329 [details]
Serial-Console-2.6.16-normal (with acpi....)
Comment 15 Patrick 2006-02-14 18:00:41 UTC
Created attachment 7330 [details]
2.6.16-normal(with acpi)-and-debug
Comment 16 Patrick 2006-02-14 18:03:47 UTC
Created attachment 7331 [details]
2.6.16-acpi=off and bios setting apic=disabled
Comment 17 Patrick 2006-02-14 18:04:39 UTC
Created attachment 7332 [details]
2.6.16-acpi=off and debug and bios apic=enabled
Comment 18 Patrick 2006-02-14 18:06:02 UTC
Created attachment 7333 [details]
2.6.16 noapic and debug and bios apic enabled
Comment 19 Patrick 2006-02-14 18:13:30 UTC
Created attachment 7334 [details]
2.6.16 noapic and debug and bios apic=disabled
Comment 20 Patrick 2006-02-14 18:18:47 UTC
Created attachment 7335 [details]
2.6.16 /proc/interrupts with acpi=off
Comment 21 Patrick 2006-02-14 18:23:07 UTC
So, I hope everything is uploaded now...
Comment 22 Patrick 2006-02-22 11:26:46 UTC
FYI: tested today with 2.6.16-rc4-mm1, the output is still the same...
Comment 23 Patrick 2006-02-24 09:02:04 UTC
So... after many test compiles I found out that the bug was introduced in
2.6.14-rc1 (2.6.13-git12 runs fine)...

As a workaround to get the kernel booting, I've added the following lines:

in arch/x86_64/pci/mmconfig.c line 155
        /* Kludge for now. Don't use mmconfig on AMD systems because
           those have some busses where mmconfig doesn't work,
           and we don't parse ACPI MCFG well enough to handle that.
           Remove when proper handling is added. */
        if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
                return 0;


So I think something is still buggy in the ACPI MCFG handling^^
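
The condition that the kludge above tests can be checked from user space before patching a kernel. The following is a minimal sketch, not the kernel code: it assumes that `X86_VENDOR_AMD` corresponds to the CPUID vendor string "AuthenticAMD" as shown in the `vendor_id` line of /proc/cpuinfo, and the function names `vendor_line_is_amd` and `cpu_is_amd` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if the given /proc/cpuinfo line is a "vendor_id" line
 * naming AuthenticAMD, 0 otherwise.
 * (Hypothetical helper, for illustration only.)
 */
static int vendor_line_is_amd(const char *line)
{
    assert(line != NULL);

    if (strncmp(line, "vendor_id", 9) != 0)
        return 0;
    return strstr(line, "AuthenticAMD") != NULL;
}

/*
 * Scan a cpuinfo-style file. Returns 1 if an AMD vendor line is
 * found, 0 if not, and -1 if the file cannot be opened.
 */
static int cpu_is_amd(const char *path)
{
    char line[256];
    int amd = 0;
    FILE *f;

    assert(path != NULL);

    f = fopen(path, "r");
    if (f == NULL)
        return -1;

    while (fgets(line, sizeof line, f) != NULL) {
        if (vendor_line_is_amd(line)) {
            amd = 1;
            break;
        }
    }
    fclose(f);
    return amd;
}
```

Calling `cpu_is_amd("/proc/cpuinfo")` on the affected machine would be expected to return 1 if the workaround's vendor test applies there; it says nothing about whether mmconfig itself works on that system.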
Comment 24 Patrick 2006-02-24 09:03:08 UTC
(tested with 2.6.15)
Comment 25 Patrick 2006-03-20 06:59:29 UTC
tested 2.6.16 => no change
Comment 26 Fu Michael 2007-11-07 00:25:58 UTC
Hi, Patrick, MCFG handling has been changed in recent kernels. Would you please retest to see if this bug still exists? Thanks.
Comment 27 Patrick 2007-11-07 01:19:52 UTC
Hi Michael,

I'll test it this weekend and post the results...
Comment 28 Patrick 2007-11-17 10:06:55 UTC
still the same with 2.6.23.8...

do you need the output?
Comment 29 ykzhao 2007-12-13 21:34:14 UTC
Hi, Patrick
Will you please attach the dmesg output? It would be great if you could enable ACPI and PCI debugging in the kernel configuration.
Thanks.
Comment 30 ykzhao 2008-01-02 00:34:10 UTC
Hi, Patrick
Will you please try booting the system with the pci=nommconf option and attach the dmesg output?
Thanks.
Comment 31 Patrick 2008-01-03 00:21:27 UTC
Hi,

I've just tested the pci=nommconf option and the system boots fine.
I'll post the dmesg output in the next few days.
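
When comparing boots with and without the option, it helps to confirm that the running kernel really received it. The following is a minimal sketch under stated assumptions: it expects the caller to pass the contents of /proc/cmdline as a string, the function name `cmdline_has_option` is hypothetical, and it only matches the option as a whole whitespace-delimited word, so a combined form such as a comma-separated `pci=` list would not be detected.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Return 1 if `option` appears in `cmdline` as a complete
 * whitespace-delimited word, 0 otherwise.
 * (Hypothetical helper, for illustration only. It does not split
 * comma-separated values after "pci=".)
 */
static int cmdline_has_option(const char *cmdline, const char *option)
{
    size_t len;
    const char *p;

    assert(cmdline != NULL && option != NULL);

    len = strlen(option);
    if (len == 0)
        return 0;

    p = cmdline;
    while ((p = strstr(p, option)) != NULL) {
        /* The match must start at the beginning or after whitespace... */
        int starts = (p == cmdline) || isspace((unsigned char)p[-1]);
        /* ...and end at the end of the string or before whitespace. */
        int ends = (p[len] == '\0') || isspace((unsigned char)p[len]);

        if (starts && ends)
            return 1;
        p += len;
    }
    return 0;
}
```

A caller would read /proc/cmdline into a buffer and pass it together with the string "pci=nommconf"; a return value of 1 indicates the option was on the command line of the running kernel.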
Comment 32 Patrick 2008-01-13 09:25:13 UTC
Created attachment 14435 [details]
output of 2.6.23.8 without pci=nommconf
Comment 33 Patrick 2008-01-13 09:26:06 UTC
Created attachment 14436 [details]
output of 2.6.23.8 with pci=nommconf
Comment 34 Jesse Barnes 2008-03-14 12:43:55 UTC
Patrick, this code changed quite a bit in 2.6.24 and 2.6.25. Does the problem still occur for you there?
Comment 35 Patrick 2008-03-31 12:56:41 UTC
Hi Jesse,

The problem is fixed!!! (tested with 2.6.24.4)

thanks to all who worked on it!
Comment 36 Adrian Bunk 2008-03-31 13:37:06 UTC
Thanks for this update.