Most recent kernel where this bug did *NOT* occur: none known
Distribution: Debian Sid (unstable)
Hardware Environment: Gigabyte GA-M57SLI-S4
Software Environment: early boot before init
Problem Description: Without the boot parameters pci=noacpi,routeirq the kernel fails to boot and hangs shortly after APIC initialisation. After a reset (not a power cycle) the kernel boots a little further without these parameters. Booting with the parameter noapic is also possible, but then some onboard components fail to work.
Steps to reproduce: Take the mainboard "Gigabyte GA-M57SLI-S4" and try to boot a kernel without the above parameters.
What is the most recent kernel that booted without any cmdline parameters? Please attach the output from dmesg -s64000 for that kernel, and also paste a copy of /proc/interrupts. From that boot, please also capture the output from acpidump and also lspci -vv Regarding the failing boot... Please update to 2.6.21-rc4 and confirm the issue is still present. Is it possible to boot with cmdline parameter "debug" and capture the serial console log showing the hang?
please re-open if this is still an issue
Created attachment 10914 [details] Serial Log cmdline: apic=debug acpi_dbg_level=0xffffffff debug console=ttyS0
Created attachment 10915 [details] dmesg -s64000 cmdline: apic=debug acpi_dbg_level=0xffffffff pci=noacpi,routeirq snd-hda-intel.enable_msi=1
Created attachment 10916 [details] acpidump
Created attachment 10917 [details] lspci -vv
Created attachment 10918 [details] /proc/interrupts
Sorry for the late reply, I had been ill :-(. The "Stuck ??" output is the most suspicious part.

Things I forgot to mention (sorry, the cold had already started when I submitted this bug, so the initial report was ... buggy :-)
Installed the latest BIOS version, F7.
This is a rev 1.0 board, not a rev 1.1.

Things that changed: 2.6.20.2 gives me:

ENABLING IO-APIC IRQs
..TIMER: vector=0x31 apic1=0 pin1=0 apic2=-1 pin2=-1
..MP-BIOS bug: 8254 timer not connected to IO-APIC
...trying to set up timer (IRQ0) through the 8259A ... failed.
...trying to set up timer as Virtual Wire IRQ... failed.
...trying to set up timer as ExtINT IRQ... failed :(.
Kernel panic - not syncing: IO-APIC + timer doesn't work!

This message is gone, and the system boots as far as you can see in the serial log.

It's somewhat ironic: I thought a board that is supported by linuxbios.org would have no problems with Linux support.

I will be more responsive now.
I just tested 2.6.21-rc5. It boots as far as 2.6.21-rc4 (judging by screen output and not serial console output)
Linux version 2.6.21-rc4
...
Nvidia board detected. Ignoring ACPI timer override.
If you got timer trouble try acpi_use_timer_override
ACPI: PM-Timer IO Port: 0x1008
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 (Bootup-CPU)
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
Processor #1
ACPI: LAPIC_NMI (acpi_id[0x00] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 2, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: BIOS IRQ0 pin2 override ignored.

What do you see if you boot with "acpi_use_timer_override"?
The board boots just fine using "acpi_use_timer_override".
Confirmed with Gigabyte GA-M57SLI-S4 HW 1.0, firmware F9. Requires acpi_use_timer_override (or noapic) to boot the latest stable kernel (both gentoo's current stable and the latest from kernel.org 2.6.22??).
Created attachment 12075 [details] acpidump - Linux mainline 2.6.22-gentoo-r1 #2 SMP Fri Jul 20 08:03:19 PDT 2007 x86_64 AMD Athlon(tm) X2 Dual Core Processor BE-2350 AuthenticAMD GNU/Linux
Created attachment 12076 [details] dmesg - Linux mainline 2.6.22-gentoo-r1 #2 SMP Fri Jul 20 08:03:19 PDT 2007 x86_64 AMD Athlon(tm) X2 Dual Core Processor BE-2350 AuthenticAMD GNU/Linux
Created attachment 12077 [details] interrupts - Linux mainline 2.6.22-gentoo-r1 #2 SMP Fri Jul 20 08:03:19 PDT 2007 x86_64 AMD Athlon(tm) X2 Dual Core Processor BE-2350 AuthenticAMD GNU/Linux
Created attachment 12078 [details] lspci -vv - Linux mainline 2.6.22-gentoo-r1 #2 SMP Fri Jul 20 08:03:19 PDT 2007 x86_64 AMD Athlon(tm) X2 Dual Core Processor BE-2350 AuthenticAMD GNU/Linux
Possibly related bug ids... http://bugzilla.kernel.org/show_bug.cgi?id=8714 http://bugzilla.kernel.org/show_bug.cgi?id=8368
I also found the gentoo patch list, it doesn't look like they've provided anything that could cause this error. http://dev.gentoo.org/~dsd/genpatches/patches-2.6.22-1.htm
Looks like I found another thread covering the same problem... http://bugzilla.kernel.org/show_bug.cgi?id=7928
Problem is known, but hasn't been fixed yet. Basically the early quirk needs to be changed to cover the NF3/NF4 PCI IDs only
Do ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/early-quirks-unification ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/nvidia-timer-quirk (apply both in this order) fix it?
Created attachment 12084 [details] early-quirks-unification patch rejects and partials
Andi Kleen, the first patch had two chunks rejected (maybe one of Gentoo's other patches took out that one include...), so I manually applied them. I've attached the partially patched files so you can see where the rejects were. Compiling and testing now.
The patch is against mainline (2.6.22-git15). Please test with that. No gentoo or other patches please.
Well... since I'm not yet using it for my mythtv storage backend I suppose I can try something bleeding edge. Where's that howto I saw about git checkouts... (If you don't hear from me in 30 min I probably haven't found it yet...)

--

My compile starts to die here...

arch/x86_64/kernel/early-quirks.c:19:23: error: asm/iommu.h: No such file or directory
arch/x86_64/kernel/early-quirks.c: In function ‘via_bugs’:
arch/x86_64/kernel/early-quirks.c:25: error: ‘force_iommu’ undeclared (first use in this function)
arch/x86_64/kernel/early-quirks.c:25: error: (Each undeclared identifier is reported only once
arch/x86_64/kernel/early-quirks.c:25: error: for each function it appears in.)
arch/x86_64/kernel/early-quirks.c:26: error: ‘iommu_aperture_allowed’ undeclared (first use in this function)
arch/x86_64/kernel/early-quirks.c:29: error: ‘iommu_aperture_disabled’ undeclared (first use in this function)
make[1]: *** [arch/x86_64/kernel/early-quirks.o] Error 1
make: *** [arch/x86_64/kernel] Error 2

./include/config/iommu.h
./include/asm-powerpc/iommu.h
./include/asm-powerpc/iseries/iommu.h
./include/asm-sparc64/iommu.h
./include/asm-sparc/iommu.h
using
http://linux.yyz.us/git-howto.html
http://www.kernel.org/pub/software/scm/git/docs/everyday.html

steps
git-clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux-2.6
cd linux-2.6
patch -p1 < (patches in order)
...

Without looking at the rejects (same files and chunk failures), I'd say I checked out the wrong tree. What source should I use?
Luckily kernel.org hosts a difference between the last major release and the current git version. However they've just updated to git16. So I'm now testing the patches against linux-2.6.22-git16. They fail in the exact same places and the same way as noted earlier.

I copied my old config, and ran make oldconfig... It does however error out more cleanly...

arch/x86_64/kernel/early-quirks.c:22:23: error: asm/iommu.h: No such file or directory
make[1]: *** [arch/x86_64/kernel/early-quirks.o] Error 1
make: *** [arch/x86_64/kernel] Error 2

Looking deeper, I can't even find where IOMMU gets selected, but there's no prompt for it in make menuconfig. I'm looking right about an item that's just under it.
Created attachment 12085 [details] Altered two patch chunks slightly, now applies cleanly against kernel.org's git16 patch.
Andi Kleen, do these changes work for you too? I was able to boot this kernel, attaching dmesg next, would you like anything else?

--- linux.orig/arch/x86_64/kernel/early-quirks.c
+++ linux/arch/x86_64/kernel/early-quirks.c
@@ -16,6 +16,8 @@
 #include <asm/proto.h>
 #include <asm/dma.h>
 
+#include <asm/io_apic.h>
+
 static void __init via_bugs(void)
 {

--- linux.orig/arch/i386/kernel/Makefile
+++ linux/arch/i386/kernel/Makefile
@@ -42,5 +42,6 @@ obj-$(CONFIG_EARLY_PRINTK) += early_prin
 obj-$(CONFIG_HPET_TIMER) += hpet.o
 obj-$(CONFIG_K8_NB) += k8.o
+obj-$(CONFIG_PCI) += early-quirks.o
 obj-$(CONFIG_VMI) += vmi.o vmiclock.o
 obj-$(CONFIG_PARAVIRT) += paravirt.o
@
Created attachment 12086 [details] dmesg of successful boot with the last two patches.
The patch you downloaded was against -git15 (from ftp://ftp.kernel.org/pub/linux/kernel/v2.6/snapshots). But git has already moved further along, which is where the conflicts come from. I just uploaded new patches that should apply to git HEAD.

Anyway, thanks for testing. The main issue with the patches is that the list of PCI IDs used to detect the range of Nvidia chipsets that suffer from this BIOS bug may be incomplete or too broad. We'll need some wider testing to find all of that out.

The status of this bug could be changed to patch available.
It seems like the bug was corrected in BIOS v. F11b (MB rev. 1.1). (But virtualization doesn't work.)
Tested it just now on F11B for my Gigabyte Technology Co., Ltd. M57SLI-S4 (MB rev 1.0). I don't recall seeing an HPET option in the BIOS before. I made sure it was in 64-bit mode, but it's now detected as 3 32-bit HPET timers. IIRC, the whole bug is that there wasn't a valid HPET timer, but it required a mode that was previously specified to only work with an HPET timer. (The previous patch allowed it to get far enough in to detect the legacy mode timer and use that.) I believe similar BIOS flashes for other motherboards should also resolve this issue for any recently released kernel.
moving this bug to the timers category from the ACPI category