Bug 8136

Summary: 2.6.21-rc2-mm2 won't boot
Product: Alternate Trees Reporter: Nicolas Mailhot (Nicolas.Mailhot)
Component: mm Assignee: Andrew Morton (akpm)
Status: CLOSED CODE_FIX    
Severity: normal    
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.21-rc2-mm2 Subsystem:
Regression: --- Bisected commit-id:
Attachments: config for 2.6.21-rc2.mm2
diff with working 2.6.21-rc2.mm1 config
lspci
2.6.21-rc2.mm1 dmesg
Printscreen with vesfg disabled and earlyprintk=vga

Description Nicolas Mailhot 2007-03-06 12:07:06 UTC
Most recent kernel where this bug did *NOT* occur: 2.6.21-rc2-mm1
Distribution: Fedora Devel
Hardware Environment: Giga-byte Technology GA-K8N Ultra-9 Mainboard (AMD 64 X2 +
Nvidia CK804)
Software Environment: N/A
Problem Description: the kernel won't boot: blank screen and no activity after
leaving the bootloader (no messages, no penguins)

Steps to reproduce: try to boot
Comment 1 Nicolas Mailhot 2007-03-06 12:07:55 UTC
Created attachment 10624 [details]
config for 2.6.21-rc2.mm2
Comment 2 Nicolas Mailhot 2007-03-06 12:08:47 UTC
Created attachment 10625 [details]
diff with working 2.6.21-rc2.mm1 config
Comment 3 Nicolas Mailhot 2007-03-06 12:09:15 UTC
Created attachment 10626 [details]
lspci
Comment 4 Nicolas Mailhot 2007-03-06 12:43:31 UTC
Created attachment 10627 [details]
2.6.21-rc2.mm1 dmesg
Comment 5 Anonymous Emailer 2007-03-06 14:42:28 UTC
Reply-To: akpm@linux-foundation.org


Can you please add

	earlyprintk=vga

to the kernel boot parameters, see if we get any useful
information?  If so, a digital photograph of the screen
might be useful.

Thanks.
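
A minimal sketch of the request above, assuming the kernel command line is available as a plain string (the example command line and the helper name add_boot_param are hypothetical, not taken from this report). It appends earlyprintk=vga only if it is not already present:

```python
def add_boot_param(cmdline: str, param: str = "earlyprintk=vga") -> str:
    """Return cmdline with param appended, unless it is already there."""
    tokens = cmdline.split()
    if param not in tokens:
        tokens.append(param)
    return " ".join(tokens)


# Hypothetical command line, for illustration only.
print(add_boot_param("ro root=/dev/sda1"))
```

How the resulting line gets into the bootloader configuration depends on the bootloader and is not covered here.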

Comment 6 Nicolas Mailhot 2007-03-06 15:36:29 UTC
Created attachment 10631 [details]
Printscreen with vesfg disabled and earlyprintk=vga
Comment 7 Nicolas Mailhot 2007-03-06 15:41:30 UTC
That's funny, I was just re-reading http://lkml.org/lkml/2006/10/25/26 and
wondering whether HPET was working on my CK804 system.

My dmesg says
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
then
..MP-BIOS bug: 8254 timer not connected to IO-APIC

(and on a working kernel
..MP-BIOS bug: 8254 timer not connected to IO-APIC
Using local APIC timer interrupts.
result 12558072
Detected 12.558 MHz APIC timer.)

It looks like the problem described in the LKML thread.
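
The comparison above can be sketched as a small log check. This is only an illustration built from the two excerpts quoted in this comment; the function name and the three result labels are made up for the example, and real boot logs may contain other relevant lines:

```python
TIMER_BUG = "MP-BIOS bug: 8254 timer not connected to IO-APIC"
APIC_FALLBACK = "Using local APIC timer interrupts."


def classify_timer_log(log: str) -> str:
    """Classify a boot log by what follows the 8254 timer warning."""
    lines = [line.strip() for line in log.splitlines()]
    # Index of the first line carrying the warning, or None if absent.
    bug_at = next((i for i, line in enumerate(lines) if TIMER_BUG in line), None)
    if bug_at is None:
        return "no-timer-warning"
    # On the working kernel the warning is followed by the APIC fallback line.
    if any(APIC_FALLBACK in line for line in lines[bug_at + 1:]):
        return "warning-then-apic-timer"
    return "warning-then-nothing"


working = """..MP-BIOS bug: 8254 timer not connected to IO-APIC
Using local APIC timer interrupts.
result 12558072
Detected 12.558 MHz APIC timer."""

failing = """ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
..MP-BIOS bug: 8254 timer not connected to IO-APIC"""

print(classify_timer_log(working))
print(classify_timer_log(failing))
```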

Comment 8 Anonymous Emailer 2007-03-06 16:16:03 UTC
Reply-To: akpm@linux-foundation.org

On Tue, 6 Mar 2007 15:36:29 -0800
bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=8136
> 

Let's take this to email.

> ------- Additional Comments From Nicolas.Mailhot@LaPoste.net  2007-03-06 15:36 -------
> Created an attachment (id=10631)
>  --> (http://bugzilla.kernel.org/attachment.cgi?id=10631&action=view)
> Printscreen with vesfg disabled and earlyprintk=vga
> 

So rc2-mm2 panics due to "MP-BIOS bug: 8254 timer not connected to IO-APIC" and
rc2-mm1 does not.

Could be ACPI, could be x86_64 timer changes, could be something else.

Would you have time to bisect it? 
http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt
explains how.

If so, I'd suggest you drill in on the patches between
x86_64-mm-defconfig-update.patch and
optimize-and-simplify-get_cycles_sync.patch: the x86 changes.
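
As a rough illustration of what bisecting a patch series means, here is a generic binary search over an ordered list of patches. It is not the procedure from bisecting-mm-trees.txt; the boots callback (apply a prefix of the series, build, try to boot) and the patch names in the example are hypothetical, and the sketch assumes that once the tree stops booting, every longer prefix also fails:

```python
from typing import Callable, Sequence


def first_bad_patch(
    patches: Sequence[str],
    boots: Callable[[Sequence[str]], bool],
) -> str:
    """Return the first patch whose inclusion makes the tree stop booting.

    Assumes the tree boots with no patches applied and fails with all of
    them applied.
    """
    if not patches:
        raise ValueError("empty patch series")
    lo, hi = 0, len(patches)  # prefix of length lo boots, length hi fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if boots(patches[:mid]):
            lo = mid
        else:
            hi = mid
    return patches[hi - 1]


# Hypothetical series in which "c.patch" breaks booting.
series = ["a.patch", "b.patch", "c.patch", "d.patch", "e.patch"]
print(first_bad_patch(series, lambda applied: "c.patch" not in applied))
```

Each call to boots costs a build and a reboot, so the search needs about log2(n) test boots for a series of n patches.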

Comment 9 Anonymous Emailer 2007-03-06 16:29:37 UTC
Reply-To: rjw@sisk.pl

Hi,

On Wednesday, 7 March 2007 01:15, Andrew Morton wrote:
> On Tue, 6 Mar 2007 15:36:29 -0800
> bugme-daemon@bugzilla.kernel.org wrote:
> 
> > http://bugzilla.kernel.org/show_bug.cgi?id=8136
> > 
> 
> Let's take this to email.
> 
> > ------- Additional Comments From Nicolas.Mailhot@LaPoste.net  2007-03-06 15:36 -------
> > Created an attachment (id=10631)
> >  --> (http://bugzilla.kernel.org/attachment.cgi?id=10631&action=view)
> > Printscreen with vesfg disabled and earlyprintk=vga
> > 
> 
> So rc2-mm2 panics due to "MP-BIOS bug: 8254 timer not connected to IO-APIC" and
> rc2-mm1 does not.

I'm observing a similar thing on my dual-core AMD64 testbed desktop.  Still,
another dual-core AMD64 machine I have runs -rc2-mm2 just fine.

One of the differences between them is that the failing one uses gcc 4.1.0 (sigh).

> Could be ACPI, could be x86_64 timer changes, could be something else.
> 
> Would you have time to bisect it? 
> http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt
> explains how.
> 
> If so, I'd suggest you drill in on the patches between
> x86_64-mm-defconfig-update.patch and
> optimize-and-simplify-get_cycles_sync.patch: the x86 changes.

I'll try to debug it tomorrow.

Greetings,
Rafael

Comment 10 Nicolas Mailhot 2007-03-06 22:56:57 UTC
$ rpm -q gcc
gcc-4.1.2-3.x86_64
Comment 11 Nicolas Mailhot 2007-03-06 23:36:35 UTC
On Tuesday, 6 March 2007 at 16:15 -0800, Andrew Morton wrote:

> So rc2-mm2 panics due to "MP-BIOS bug: 8254 timer not connected to IO-APIC" and
> rc2-mm1 does not.
> 
> Could be ACPI, could be x86_64 timer changes, could be something else.
> 
> Would you have time to bisect it? 
> http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt
> explains how.
> 
> If so, I'd suggest you drill in on the patches between
> x86_64-mm-defconfig-update.patch and
> optimize-and-simplify-get_cycles_sync.patch: the x86 changes.

I may have some more debug time this evening (CET), probably not enough
for a full bisection. I'd really love to have timer/clock problems
nailed once and for all on this box (MP BIOS, RTC, HPET, whatever)

Comment 12 Nicolas Mailhot 2007-03-07 14:07:02 UTC
On Tuesday, 6 March 2007 at 16:15 -0800, Andrew Morton wrote:
> On Tue, 6 Mar 2007 15:36:29 -0800
> bugme-daemon@bugzilla.kernel.org wrote:
> 
> > http://bugzilla.kernel.org/show_bug.cgi?id=8136

> So rc2-mm2 panics due to "MP-BIOS bug: 8254 timer not connected to IO-APIC" and
> rc2-mm1 does not.
> 
> Could be ACPI, could be x86_64 timer changes, could be something else.
> 
> Would you have time to bisect it? 
> 
> I'd suggest you drill in on the patches between
> x86_64-mm-defconfig-update.patch and
> optimize-and-simplify-get_cycles_sync.patch: the x86 changes.

Removing the x86 patchset (342-430) and utrace (647-663) makes the
system boot (no surprise, but good to confirm). I'll try a few more
tests tomorrow; I need to sleep now.
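
A minimal sketch of dropping those two ranges from a patch list, assuming the numbers are 1-based, inclusive positions in the series (the comment does not say so explicitly). The patch names and the series length of 700 are hypothetical:

```python
def drop_ranges(series, ranges):
    """Return series without entries whose 1-based position is in any range."""
    return [
        name
        for pos, name in enumerate(series, start=1)
        if not any(lo <= pos <= hi for lo, hi in ranges)
    ]


# Ranges quoted in the comment: x86 patchset and utrace.
REMOVED = [(342, 430), (647, 663)]

# Hypothetical series of 700 patches, named by position.
series = [f"patch-{n:04d}.patch" for n in range(1, 701)]
kept = drop_ranges(series, REMOVED)
print(len(series) - len(kept))  # prints 106 (89 + 17 patches dropped)
```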

Comment 13 Nicolas Mailhot 2007-03-08 10:51:30 UTC
2.6.21-rc3.mm2 works again, so I'll close the bug.
Comment 14 Anonymous Emailer 2007-03-08 13:30:55 UTC
Reply-To: rjw@sisk.pl

On Wednesday, 7 March 2007 01:32, Rafael J. Wysocki wrote:
> Hi,
> 
> On Wednesday, 7 March 2007 01:15, Andrew Morton wrote:
> > On Tue, 6 Mar 2007 15:36:29 -0800
> > bugme-daemon@bugzilla.kernel.org wrote:
> > 
> > > http://bugzilla.kernel.org/show_bug.cgi?id=8136
> > > 
> > 
> > Let's take this to email.
> > 
> > > ------- Additional Comments From Nicolas.Mailhot@LaPoste.net  2007-03-06 15:36 -------
> > > Created an attachment (id=10631)
> > >  --> (http://bugzilla.kernel.org/attachment.cgi?id=10631&action=view)
> > > Printscreen with vesfg disabled and earlyprintk=vga
> > > 
> > 
> > So rc2-mm2 panics due to "MP-BIOS bug: 8254 timer not connected to IO-APIC" and
> > rc2-mm1 does not.
> 
> I'm observing a similar thing on my dual-core AMD64 testbed desktop.  Still,
> another dual-core AMD64 machine I have runs -rc2-mm2 just fine.
> 
> One of the differences between them is that the failing one uses gcc 4.1.0 (sigh).
> 
> > Could be ACPI, could be x86_64 timer changes, could be something else.
> > 
> > Would you have time to bisect it? 
> > http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt
> > explains how.
> > 
> > If so, I'd suggest you drill in on the patches between
> > x86_64-mm-defconfig-update.patch and
> > optimize-and-simplify-get_cycles_sync.patch: the x86 changes.
> 
> I'll try to debug it tomorrow.

OK, this seems to be fixed in 2.6.21-rc3-mm2.