Bug 18402

Summary: 2.6.36-rc4 kernel panics whenever booting on a Pentium 4 HT system
Product: Platform Specific/Hardware
Reporter: Michael Marley (michael)
Component: i386
Assignee: platform_i386
Severity: blocking
CC: hpa, maciej.rutecki, rjw
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.36-rc4
Tree: Mainline
Regression: Yes
Bug Depends on:
Bug Blocks: 16444
Attachments: Kernel configuration I compiled with
lspci -vvvnnn
dmesg from last good kernel (2.6.36-rc3)
Kernel panic message
Boot with "noapic"

Description Michael Marley 2010-09-13 15:07:43 UTC
Created attachment 29732
Kernel configuration I compiled with

Sometime between 2.6.36-rc3 and 2.6.36-rc4, a regression was introduced that prevents the kernel from booting on a Pentium 4 HT system with an i875/ICH5 chipset.  Instead, it panics with a message indicating that the system timer is not ticking.  (I don't have the exact message, since the system in question is headless and I won't have access to a monitor for it for quite some time.)  I normally have "clocksource=hpet" on the kernel command line, but I also tried without it and got the same problem.  I have attached some information about the system.
Comment 1 Michael Marley 2010-09-13 15:08:14 UTC
Created attachment 29742
Comment 2 Michael Marley 2010-09-13 15:08:37 UTC
Created attachment 29752
Comment 3 Michael Marley 2010-09-13 15:08:57 UTC
Created attachment 29762
Comment 4 Michael Marley 2010-09-13 15:09:25 UTC
Created attachment 29772
lspci -vvvnnn
Comment 5 Michael Marley 2010-09-13 15:09:52 UTC
Created attachment 29782
Comment 6 Michael Marley 2010-09-13 15:10:34 UTC
Created attachment 29792
Comment 7 Rafael J. Wysocki 2010-09-13 20:30:30 UTC
Please attach boot log from the last known good kernel.

It would be good to know the point in the boot sequence where the panic happens.
Comment 8 Michael Marley 2010-09-13 22:51:22 UTC
Created attachment 29842
dmesg from last good kernel (2.6.36-rc3)

Here is the dmesg from 2.6.36-rc3, the last known-good kernel.
Comment 9 Michael Marley 2010-09-13 22:53:18 UTC
I don't know the exact point at which the kernel panic occurs because I don't have a monitor for the system, but I can say that it always occurs about half a second after the system begins to boot.  During a normal boot, the network LED flashes about 1 second in, but this never happens with 2.6.36-rc4.
Comment 10 Michael Marley 2010-09-16 01:24:08 UTC
Any progress on this?

Judging from the dmesg of the successful boot, it appears to panic around the point where it would normally print:

[    0.341986] hpet clockevent registered
[    0.341986] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
[    0.341986] hpet0: 3 comparators, 64-bit 14.318180 MHz counter
[    0.345025] Switching to clocksource hpet
Comment 11 Michael Marley 2010-09-16 01:40:02 UTC
Now I think I remember that it said "Timer not working," so I googled that, but only came up with a bunch of APIC-related results.  I already tried booting with "noapic," and that had no effect.
Comment 12 Thomas Gleixner 2010-09-18 12:39:04 UTC
Hard to tell without any hint of what kind of panic it runs into. I just went
through the x86 changes between -rc3 and -rc4 and I can't see an
obvious candidate. It might be something that was hidden due to
different timing, though. Any chance you could hook up a serial console?
Comment 13 Michael Marley 2010-09-18 12:59:35 UTC
I have secured a monitor to borrow for the system, but I won't be able to hook that up until tomorrow evening.  I will post the exact error then.
Comment 14 Michael Marley 2010-09-20 00:51:29 UTC
Created attachment 30642
Kernel panic message

Here is the exact output from the kernel panic.  I tried rebooting with apic=debug, but that had no effect.
Comment 15 Michael Marley 2010-09-20 00:52:55 UTC
Created attachment 30652
Boot with "noapic"

If I boot with "noapic," the kernel just freezes.  Here is what it looks like after freezing.
Comment 16 Michael Marley 2010-09-20 19:35:31 UTC
I got this message:

"This message has been generated automatically as a part of a summary report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.35.  Please verify if it still should be listed and let the tracking team
know (either way)."

For the record, this bug is a regression between 2.6.36-rc3 and 2.6.36-rc4.
Comment 17 Michael Marley 2010-09-21 01:54:45 UTC
No surprise, but -rc5 does not fix the issue; it produces exactly the same kernel panic.  Any progress on finding out what caused it?
Comment 18 Thomas Gleixner 2010-09-21 10:19:01 UTC
> --- Comment #17 from Michael Marley <michael@michaelmarley.com> 2010-09-21 01:54:45 ---
> No surprise, but -rc5 does not fix the issue; it produces exactly the same
> kernel panic.  Any progress on finding out what caused it?

Hmm, I have to admit that I have no clue at all. The delta in the
related areas (ioapic, apic, hpet, 8259, 8253) from rc3 to rc4 is
exactly zero. Any chance that you can bisect it?
Comment 19 Michael Marley 2010-09-21 10:22:35 UTC
I will try.
Comment 20 Michael Marley 2010-09-21 16:39:09 UTC
OK, I officially have no idea what is going on here.  My bisect completely failed: all 8 kernels I built during it failed in exactly the same way.  So I tried building 2.6.36-rc3 again, and got the same problem I had been getting.  Then I tried compiling both -rc3 and -rc5 with gcc-4.4 instead of 4.5, and still got the same error on both.
Comment 21 H. Peter Anvin 2010-09-21 17:28:08 UTC
*Which* gcc 4.4 and 4.5 are you using?  In particular, are you using stock gcc or a distro build?

Stock gcc 4.4.4 is known to miscompile the kernel.
Comment 22 Michael Marley 2010-09-21 19:08:37 UTC
I am using the GCC 4.4.4 and 4.5.1 builds from Ubuntu Maverick.  I had been using 4.5.0 and 4.5.1 successfully for a while (keeping them up to date, of course), but things stopped working sometime between -rc3 and -rc4.  Now I cannot successfully compile any 2.6.36 kernel with any compiler on my system.
Comment 23 Michael Marley 2010-09-22 02:06:22 UTC
I found the problem.  It is not a kernel issue or a compiler issue, but a linker issue.  A binutils update was released for Ubuntu not long after I compiled 2.6.36-rc3, and it caused all subsequent builds to work improperly.  I will file a bug there.  Sorry to bother you.