Bug 8219 - boot hang unless "acpi_use_timer_override" Gigabyte GA-M57SLI-S4
Summary: boot hang unless "acpi_use_timer_override" Gigabyte GA-M57SLI-S4
Status: RESOLVED CODE_FIX
Alias: None
Product: Timers
Classification: Unclassified
Component: Interval Timers
Hardware: i386 Linux
Importance: P2 normal
Assignee: Andi Kleen
URL:
Keywords:
Depends on: 8368
Blocks:
Reported: 2007-03-16 14:43 UTC by Seiden TIge
Modified: 2007-09-05 15:16 UTC
CC List: 4 users

See Also:
Kernel Version: 2.6.21-rc3
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
Serial Log (14.11 KB, text/plain)
2007-03-22 15:23 UTC, Seiden TIge
dmesg -s64000 (24.06 KB, text/plain)
2007-03-22 15:24 UTC, Seiden TIge
acpidump (91.35 KB, text/plain)
2007-03-22 15:26 UTC, Seiden TIge
lspci -vv (17.29 KB, text/plain)
2007-03-22 15:27 UTC, Seiden TIge
/proc/interrupts (912 bytes, text/plain)
2007-03-22 15:28 UTC, Seiden TIge
acpidump - Linux mainline 2.6.22-gentoo-r1 #2 SMP Fri Jul 20 08:03:19 PDT 2007 x86_64 AMD Athlon(tm) X2 Dual Core Processor BE-2350 AuthenticAMD GNU/Linux (19.88 KB, application/octet-stream)
2007-07-20 20:23 UTC, Michael Evans
dmesg - Linux mainline 2.6.22-gentoo-r1 #2 SMP Fri Jul 20 08:03:19 PDT 2007 x86_64 AMD Athlon(tm) X2 Dual Core Processor BE-2350 AuthenticAMD GNU/Linux (27.95 KB, application/octet-stream)
2007-07-20 20:23 UTC, Michael Evans
interrupts - Linux mainline 2.6.22-gentoo-r1 #2 SMP Fri Jul 20 08:03:19 PDT 2007 x86_64 AMD Athlon(tm) X2 Dual Core Processor BE-2350 AuthenticAMD GNU/Linux (329 bytes, application/octet-stream)
2007-07-20 20:24 UTC, Michael Evans
lspci -vv - Linux mainline 2.6.22-gentoo-r1 #2 SMP Fri Jul 20 08:03:19 PDT 2007 x86_64 AMD Athlon(tm) X2 Dual Core Processor BE-2350 AuthenticAMD GNU/Linux (3.93 KB, application/octet-stream)
2007-07-20 20:24 UTC, Michael Evans
early-quirks-unification patch rejects and partials (4.65 KB, application/octet-stream)
2007-07-21 08:22 UTC, Michael Evans
Altered two patch chunks slightly, now applies cleanly against kernel.org's git16 patch. (6.02 KB, patch)
2007-07-21 16:52 UTC, Michael Evans
dmesg of successful boot with the last two patches. (28.97 KB, application/octet-stream)
2007-07-21 17:07 UTC, Michael Evans

Description Seiden TIge 2007-03-16 14:43:10 UTC
Most recent kernel where this bug did *NOT* occur: none known
Distribution: Debian Sid (unstable)
Hardware Environment: Gigabyte GA-M57SLI-S4
Software Environment: early boot before init
Problem Description: The kernel hangs shortly after APIC initialisation when
booted without the following boot parameters: pci=noacpi,routeirq
After a reset (not a power cycle), the kernel gets a little further without
these parameters. It is also possible to boot with the parameter noapic, but
then some onboard components fail to work.

Steps to reproduce: On a Gigabyte GA-M57SLI-S4 mainboard, try to boot the
kernel without the above parameters.
Comment 1 Len Brown 2007-03-16 16:17:58 UTC
What is the most recent kernel that booted without any cmdline parameters?

Please attach the output from dmesg -s64000 for that kernel,
and also paste a copy of /proc/interrupts.
From that boot, please also capture the output from acpidump
and also lspci -vv

Regarding the failing boot...
Please update to 2.6.21-rc4 and confirm the issue is still present.
Is it possible to boot with cmdline parameter "debug" and capture
the serial console log showing the hang?

Comment 2 Len Brown 2007-03-21 18:24:22 UTC
please re-open if this is still an issue
Comment 3 Seiden TIge 2007-03-22 15:23:22 UTC
Created attachment 10914
Serial Log

cmdline: apic=debug acpi_dbg_level=0xffffffff debug console=ttyS0
Comment 4 Seiden TIge 2007-03-22 15:24:29 UTC
Created attachment 10915
dmesg -s64000

cmdline: apic=debug acpi_dbg_level=0xffffffff pci=noacpi,routeirq
snd-hda-intel.enable_msi=1
Comment 5 Seiden TIge 2007-03-22 15:26:34 UTC
Created attachment 10916
acpidump
Comment 6 Seiden TIge 2007-03-22 15:27:04 UTC
Created attachment 10917
lspci -vv
Comment 7 Seiden TIge 2007-03-22 15:28:25 UTC
Created attachment 10918
/proc/interrupts
Comment 8 Seiden TIge 2007-03-22 15:50:19 UTC
Sorry for the late reply, I had been ill :-(.
The "Stuck ??" output is the most suspicious.

Things I forgot to mention (sorry, the cold had already started
when I submitted this bug, so the initial report was ... buggy :-)
Installed the latest BIOS version, F7.
This is a rev 1.0 board, not a rev 1.1.

Things that changed:
2.6.20.2 gives me:
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 apic1=0 pin1=0 apic2=-1 pin2=-1
..MP-BIOS bug: 8254 timer not connected to IO-APIC
...trying to set up timer (IRQ0) through the 8259A ... failed.
...trying to set up timer as Virtual Wire IRQ... failed.
...trying to set up timer as ExtINT IRQ... failed :(.
Kernel panic - not syncing: IO-APIC + timer doesn't work!

In the newer kernel this message is gone, and the system boots as far as shown
in the serial log.

It's somewhat ironic: I thought a board supported by linuxbios.org would have
no problems with Linux support.

I will be more responsive now.
Comment 9 Seiden TIge 2007-03-27 02:54:07 UTC
I just tested 2.6.21-rc5. It gets as far as 2.6.21-rc4 did (judging by screen
output, not serial console output).
Comment 10 Len Brown 2007-03-30 18:32:24 UTC
Linux version 2.6.21-rc4 
...
Nvidia board detected. Ignoring ACPI timer override.
If you got timer trouble try acpi_use_timer_override
ACPI: PM-Timer IO Port: 0x1008
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 (Bootup-CPU)
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
Processor #1
ACPI: LAPIC_NMI (acpi_id[0x00] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 2, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: BIOS IRQ0 pin2 override ignored.

What do you see if you boot with "acpi_use_timer_override"?
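
For anyone reading the log above: the INT_SRC_OVR entry is the BIOS asking for ISA IRQ 0 (the timer) to be delivered on global_irq 2, and the line after it shows the kernel discarding that request. Below is a minimal sketch, in Python, of checking a saved dmesg for this combination. The function names and the regular expression are illustrative assumptions and not part of any kernel tool; only the two message formats are taken from the log above.

```python
import re

# Matches the interrupt source override line as it appears in the log above.
OVERRIDE_RE = re.compile(
    r"ACPI: INT_SRC_OVR \(bus (\d+) bus_irq (\d+) global_irq (\d+) "
)


def parse_overrides(log_text):
    """Return a list of (bus, bus_irq, global_irq) tuples found in log_text."""
    return [
        tuple(int(group) for group in match.groups())
        for match in OVERRIDE_RE.finditer(log_text)
    ]


def timer_override_ignored(log_text):
    """True if the BIOS remaps IRQ0 and the kernel reports ignoring that."""
    remaps_irq0 = any(
        bus_irq == 0 and global_irq != 0
        for _bus, bus_irq, global_irq in parse_overrides(log_text)
    )
    return remaps_irq0 and "BIOS IRQ0 pin2 override ignored" in log_text
```

If timer_override_ignored() returns True for a log from a board that then hangs, booting with acpi_use_timer_override is the obvious next thing to try.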
Comment 11 Seiden TIge 2007-04-03 00:43:38 UTC
The board boots just fine using "acpi_use_timer_override".
Comment 12 Michael Evans 2007-07-20 20:14:03 UTC
Confirmed with Gigabyte GA-M57SLI-S4 HW 1.0, firmware F9.  Requires acpi_use_timer_override (or noapic) to boot the latest stable kernel (both gentoo's current stable and the latest from kernel.org 2.6.22??).
Comment 13 Michael Evans 2007-07-20 20:23:06 UTC
Created attachment 12075
acpidump - Linux mainline 2.6.22-gentoo-r1 #2 SMP Fri Jul 20 08:03:19 PDT 2007 x86_64 AMD Athlon(tm) X2 Dual Core Processor BE-2350 AuthenticAMD GNU/Linux
Comment 14 Michael Evans 2007-07-20 20:23:42 UTC
Created attachment 12076
dmesg - Linux mainline 2.6.22-gentoo-r1 #2 SMP Fri Jul 20 08:03:19 PDT 2007 x86_64 AMD Athlon(tm) X2 Dual Core Processor BE-2350 AuthenticAMD GNU/Linux
Comment 15 Michael Evans 2007-07-20 20:24:17 UTC
Created attachment 12077
interrupts - Linux mainline 2.6.22-gentoo-r1 #2 SMP Fri Jul 20 08:03:19 PDT 2007 x86_64 AMD Athlon(tm) X2 Dual Core Processor BE-2350 AuthenticAMD GNU/Linux
Comment 16 Michael Evans 2007-07-20 20:24:53 UTC
Created attachment 12078
lspci -vv - Linux mainline 2.6.22-gentoo-r1 #2 SMP Fri Jul 20 08:03:19 PDT 2007 x86_64 AMD Athlon(tm) X2 Dual Core Processor BE-2350 AuthenticAMD GNU/Linux
Comment 17 Michael Evans 2007-07-21 02:12:10 UTC
Possibly related bug ids...

http://bugzilla.kernel.org/show_bug.cgi?id=8714
http://bugzilla.kernel.org/show_bug.cgi?id=8368
Comment 18 Michael Evans 2007-07-21 02:35:05 UTC
I also found the Gentoo patch list; it doesn't look like they've included anything that could cause this error.

http://dev.gentoo.org/~dsd/genpatches/patches-2.6.22-1.htm
Comment 19 Michael Evans 2007-07-21 02:42:44 UTC
Looks like I found another thread covering the same problem...

http://bugzilla.kernel.org/show_bug.cgi?id=7928
Comment 20 Andi Kleen 2007-07-21 03:26:54 UTC
The problem is known but hasn't been fixed yet. Basically, the early quirk
needs to be changed to cover only the NF3/NF4 PCI IDs.
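
A rough sketch of the matching logic described above, written in Python rather than kernel C so it can be run standalone. It assumes the quirk should skip the ACPI timer override only when the chipset is one of the affected Nvidia parts and acpi_use_timer_override was not given on the command line. The device IDs below are placeholders, not the real NF3/NF4 IDs, and every name here is hypothetical.

```python
NVIDIA_VENDOR_ID = 0x10DE  # PCI vendor ID assigned to Nvidia

# PLACEHOLDER values only: the real NF3/NF4 device IDs are not listed in
# this report and would have to be filled in from the actual patch.
AFFECTED_DEVICE_IDS = frozenset({0x1111, 0x2222})


def ignore_timer_override(vendor_id, device_id, use_timer_override=False):
    """Decide whether the ACPI IRQ0 override should be ignored.

    use_timer_override models booting with acpi_use_timer_override.
    """
    if use_timer_override:
        # The user explicitly asked for the BIOS override to be honoured.
        return False
    if vendor_id != NVIDIA_VENDOR_ID:
        return False
    # Narrowed check: only the listed chipsets, not every Nvidia board.
    return device_id in AFFECTED_DEVICE_IDS
```

The point of the narrowing is the last line: a board like the GA-M57SLI-S4, whose chipset would not be in the list, keeps its timer override without needing a boot parameter.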
Comment 22 Michael Evans 2007-07-21 08:22:14 UTC
Created attachment 12084
early-quirks-unification patch rejects and partials

Andi Kleen, the first patch had two chunks rejected (maybe one of Gentoo's other patches took out that one include...), so I manually applied them.  I've attached the partially patched files so you can see where the rejects were.  Compiling and testing now.
Comment 23 Andi Kleen 2007-07-21 08:32:23 UTC
The patch is against mainline (2.6.22-git15). Please test with that. No
gentoo or other patches please.
Comment 24 Michael Evans 2007-07-21 08:44:45 UTC
Well... since I'm not yet using it for my mythtv storage backend I suppose I can try something bleeding edge.  Where's that howto I saw about git checkouts...

(If you don't hear from me in 30 min I probably haven't found it yet...)

--
My compile starts to die here... 
arch/x86_64/kernel/early-quirks.c:19:23: error: asm/iommu.h: No such file or
directory
arch/x86_64/kernel/early-quirks.c: In function ‘via_bugs’:
arch/x86_64/kernel/early-quirks.c:25: error: ‘force_iommu’ undeclared
(first use in this function)
arch/x86_64/kernel/early-quirks.c:25: error: (Each undeclared identifier is
reported only once
arch/x86_64/kernel/early-quirks.c:25: error: for each function it appears in.)
arch/x86_64/kernel/early-quirks.c:26: error: ‘iommu_aperture_allowed’
undeclared (first use in this function)
arch/x86_64/kernel/early-quirks.c:29: error: ‘iommu_aperture_disabled’
undeclared (first use in this function)
make[1]: *** [arch/x86_64/kernel/early-quirks.o] Error 1
make: *** [arch/x86_64/kernel] Error 2


./include/config/iommu.h
./include/asm-powerpc/iommu.h
./include/asm-powerpc/iseries/iommu.h
./include/asm-sparc64/iommu.h
./include/asm-sparc/iommu.h
Comment 25 Michael Evans 2007-07-21 08:57:06 UTC
using
http://linux.yyz.us/git-howto.html
http://www.kernel.org/pub/software/scm/git/docs/everyday.html

steps
git-clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux-2.6
cd linux-2.6
patch -p1 < (patches in order)

...

Without looking at the rejects (same files and chunk failures), I'd say I checked out the wrong tree.  What source should I use?
Comment 26 Michael Evans 2007-07-21 09:32:26 UTC
Luckily kernel.org hosts a diff between the last major release and the current git version.  However, they've just updated to git16, so I'm now testing the patches against linux-2.6.22-git16.  They fail in the exact same places and in the same way as noted earlier.

I copied my old config, and ran make oldconfig...

It does however error out more cleanly...

arch/x86_64/kernel/early-quirks.c:22:23: error: asm/iommu.h: No such file or directory
make[1]: *** [arch/x86_64/kernel/early-quirks.o] Error 1
make: *** [arch/x86_64/kernel] Error 2

Looking deeper, I can't even find where IOMMU gets selected, and there's no prompt for it in make menuconfig.  I'm looking right at an item that's just under it.
Comment 27 Michael Evans 2007-07-21 16:52:05 UTC
Created attachment 12085
Altered two patch chunks slightly, now applies cleanly against kernel.org's git16 patch.

Andi Kleen, do these changes work for you too?  I was able to boot this kernel, attaching dmesg next, would you like anything else?

--- linux.orig/arch/x86_64/kernel/early-quirks.c
+++ linux/arch/x86_64/kernel/early-quirks.c
@@ -16,6 +16,8 @@
 #include <asm/proto.h>
 #include <asm/dma.h>

+#include <asm/io_apic.h>
+
 static void __init via_bugs(void)
 {


--- linux.orig/arch/i386/kernel/Makefile
+++ linux/arch/i386/kernel/Makefile
@@ -42,5 +42,6 @@ obj-$(CONFIG_EARLY_PRINTK)    += early_prin
 obj-$(CONFIG_HPET_TIMER)       += hpet.o
 obj-$(CONFIG_K8_NB)            += k8.o
+obj-$(CONFIG_PCI)              += early-quirks.o

 obj-$(CONFIG_VMI)              += vmi.o vmiclock.o
 obj-$(CONFIG_PARAVIRT)         += paravirt.o
Comment 28 Michael Evans 2007-07-21 17:07:02 UTC
Created attachment 12086
dmesg of successful boot with the last two patches.
Comment 29 Andi Kleen 2007-07-22 06:28:28 UTC
The patch you downloaded was against -git15
(from ftp://ftp.kernel.org/pub/linux/kernel/v2.6/snapshots).
But git has already moved on; that is where the conflicts come from. I just
uploaded new patches that should apply to git HEAD.

Anyway, thanks for testing. The main issue with the patches is that
the PCI IDs used to detect the range of Nvidia chipsets that suffer from
this BIOS bug may be incomplete or too broad. We'll need some
wider testing to sort that out.

The status of this bug could be changed to patch available.
Comment 30 Vlad 2007-08-10 22:35:25 UTC
It seems the bug was corrected in BIOS v. F11b (MB rev. 1.1).
(But virtualization doesn't work.)
Comment 31 Michael Evans 2007-08-15 03:23:57 UTC
Tested it just now on F11B for my Gigabyte Technology Co., Ltd. M57SLI-S4 (MB rev 1.0).  I don't recall seeing an HPET option in the BIOS before; I made sure it was in 64-bit mode, but it is now detected as three 32-bit HPET timers.

IIRC, the whole bug is that there wasn't a valid HPET timer, but it required a mode that was previously specified to only work with an HPET timer.  (The previous patch allowed it to get far enough in to detect the legacy mode timer and use that.)

I believe similar BIOS flashes for other motherboards should also resolve this issue for any recently released kernel as well.
Comment 32 Len Brown 2007-09-05 15:16:30 UTC
moving this bug to the timers category
from the ACPI category
