Summary: 2.6.36-rc4 KPs whenever booting on a Pentium 4 HT system
Product: Platform Specific/Hardware
Reporter: Michael Marley (michael)
Severity: blocking
CC: hpa, maciej.rutecki, rjw
Bug Depends on:
Kernel configuration I compiled with
dmesg from last good kernel (2.6.36-rc3)
Kernel panic message
Boot with "noapic"
Description Michael Marley 2010-09-13 15:07:43 UTC
Created attachment 29732: Kernel configuration I compiled with
Sometime between 2.6.36-rc3 and 2.6.36-rc4, a regression was introduced that prevents the kernel from booting on a Pentium 4 HT system with an i875/ICH5 chipset. Instead, it panics with a message indicating that the system timer is not ticking. (I don't have the exact message, since the system in question is headless and I will not have access to a monitor for it for quite some time.) I normally have "clocksource=hpet" on the kernel command line, but I also tried without it and got the same problem. I have attached some info about the system.
Comment 7 Rafael J. Wysocki 2010-09-13 20:30:30 UTC
Please attach boot log from the last known good kernel. It would be good to know the point in the boot sequence where the panic happens.
Comment 8 Michael Marley 2010-09-13 22:51:22 UTC
Created attachment 29842: dmesg from last good kernel (2.6.36-rc3)
Here is the dmesg from 2.6.36-rc3, the last known-good kernel.
Comment 9 Michael Marley 2010-09-13 22:53:18 UTC
I don't know the exact point at which the KP occurs because I don't have a monitor for the system, but I can say that it always occurs about half a second after the system begins to boot. During a normal boot, the network LED flashes about 1 second into the bootup, but this never happens with 2.6.36-rc4.
Comment 10 Michael Marley 2010-09-16 01:24:08 UTC
Any progress on this? Judging from the dmesg of the successful boot, it panics around the time it would normally print:
[ 0.341986] hpet clockevent registered
[ 0.341986] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
[ 0.341986] hpet0: 3 comparators, 64-bit 14.318180 MHz counter
[ 0.345025] Switching to clocksource hpet
Comment 11 Michael Marley 2010-09-16 01:40:02 UTC
I now think I remember it saying "Timer not working", so I googled that, but only came up with a bunch of APIC-related stuff. I already tried booting with "noapic", and that had no effect.
Comment 12 Thomas Gleixner 2010-09-18 12:39:04 UTC
Hard to tell w/o any hint what kind of panic it runs into. I just went through the x86 changes between -rc3 and -rc4 and I can't see an obvious candidate. It might be something which was hidden due to different timing though. Any chance to hook up a serial console?
Thanks, tglx
Comment 13 Michael Marley 2010-09-18 12:59:35 UTC
I have secured a monitor to borrow for the system, but I won't be able to hook that up until tomorrow evening. I will post the exact error then.
Comment 14 Michael Marley 2010-09-20 00:51:29 UTC
Created attachment 30642: Kernel panic message
Here is the exact output from the kernel panic. I tried rebooting with apic=debug, but that had no effect.
Comment 15 Michael Marley 2010-09-20 00:52:55 UTC
Created attachment 30652: Boot with "noapic"
If I boot with "noapic", the kernel just freezes. Here is what it looks like after freezing.
Comment 16 Michael Marley 2010-09-20 19:35:31 UTC
I got this message: "This message has been generated automatically as a part of a summary report of recent regressions. The following bug entry is on the current list of known regressions from 2.6.35. Please verify if it still should be listed and let the tracking team know (either way)." The bug is a regression from 2.6.36-rc3 to 2.6.36-rc4.
Comment 17 Michael Marley 2010-09-21 01:54:45 UTC
No surprise, but -rc5 does not fix the issue; it produces exactly the same kernel panic. Any progress on finding out what caused it?
Comment 18 Thomas Gleixner 2010-09-21 10:19:01 UTC
> --- Comment #17 from Michael Marley <firstname.lastname@example.org> 2010-09-21 01:54:45 ---
> No surprise, but -rc5 does not fix the issue; it produces exactly the same
> kernel panic. Any progress on finding out what caused it?
Hmm, I have to admit that I have no clue at all. The delta in the related areas (ioapic, apic, hpet, 8259, 8253) from rc3 to rc4 is exactly zero. Any chance that you can bisect it?
Thanks, tglx
Comment 19 Michael Marley 2010-09-21 10:22:35 UTC
I will try.
Comment 20 Michael Marley 2010-09-21 16:39:09 UTC
OK, I officially have no idea what is going on here. My bisect completely failed: all 8 kernels I built during it failed in exactly the same way. So I tried building 2.6.36-rc3 again, and it now shows the same problem. Then I tried compiling both -rc3 and -rc5 with gcc-4.4 instead of 4.5, and still got the same error on both.
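The pattern described here (every bisected kernel bad, and then the known-good -rc3 bad when rebuilt) is what a bisection looks like when the "good" endpoint is no longer reproducible in the current build environment. A minimal Python sketch of that sanity check follows. It is only an illustration: `find_first_bad`, `is_bad`, and the revision list are hypothetical names, not part of git or the kernel tree, and it assumes the revisions are ordered oldest to newest.

```python
def find_first_bad(revisions, is_bad):
    """Return the first revision for which is_bad() is true.

    Assumes revisions are ordered oldest to newest and that every
    revision after the first bad one is also bad. `is_bad` is a
    hypothetical callback that builds and boot-tests one revision.
    """
    if not revisions:
        raise ValueError("no revisions to search")

    # Re-test both endpoints in the current build environment first.
    if is_bad(revisions[0]):
        raise RuntimeError(
            "known-good revision now fails; suspect the build environment"
        )
    if not is_bad(revisions[-1]):
        raise RuntimeError("known-bad revision now passes; nothing to bisect")

    lo, hi = 0, len(revisions) - 1  # invariant: lo is good, hi is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid
        else:
            lo = mid
    return revisions[hi]
```

If the first check raises, the search itself cannot say anything useful, and something outside the source tree (compiler, linker, configuration) is the more likely suspect.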
Comment 21 H. Peter Anvin 2010-09-21 17:28:08 UTC
*Which* gcc 4.4 and 4.5 are you using? In particular, are you using stock gcc or a distro build? Stock gcc 4.4.4 is known to miscompile the kernel.
Comment 22 Michael Marley 2010-09-21 19:08:37 UTC
I am using the GCC 4.4.4 and 4.5.1 builds from Ubuntu Maverick. I had been using 4.5.0 and 4.5.1 successfully for a while (keeping them up-to-date, of course), but my builds stopped working sometime between -rc3 and -rc4. Now I cannot build a working 2.6.36 kernel with any compiler on my system.
Comment 23 Michael Marley 2010-09-22 02:06:22 UTC
I found the problem. It is not a kernel issue or a compiler issue, but a linker issue. There was a Binutils update on Ubuntu not long after I compiled 2.6.36-rc3 that caused all subsequent builds to come out broken. I will file a bug there. Sorry to bother you.
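Since the cause turned out to be a toolchain update between two builds, one way to make that kind of change visible is to record the compiler and linker versions next to each build. The following is a rough Python sketch of that idea, not anything used in this report. It assumes `gcc` and `ld` are on the PATH and accept `--version`; the helper names `first_line`, `tool_version`, and `toolchain_snapshot` are made up for illustration.

```python
import shutil
import subprocess


def first_line(text):
    """Return the first non-empty line of text, stripped, or '' if none."""
    for line in text.splitlines():
        if line.strip():
            return line.strip()
    return ""


def tool_version(tool):
    """Return the first line of `<tool> --version`, or None if unavailable."""
    if shutil.which(tool) is None:
        return None
    result = subprocess.run(
        [tool, "--version"], capture_output=True, text=True, check=False
    )
    return first_line(result.stdout) or None


def toolchain_snapshot(tools=("gcc", "ld")):
    """Map each tool name to its reported version line (None if missing)."""
    return {tool: tool_version(tool) for tool in tools}
```

Saving the result of `toolchain_snapshot()` alongside each kernel image and comparing it between a good and a bad build would show a changed `ld` line even when the kernel source and configuration are identical.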