Latest working kernel version: 2.6.27
Earliest failing kernel version: 2.6.28-rc1
Distribution: Ubuntu Hardy
Hardware Environment: Dual Processor Pentium 4 Xeon Whitebox
Software Environment:
Problem Description: oops in early boot

My fileserver boots under 2.6.27, but it fails to boot on 2.6.28-rc2. It took me a while to bisect, so after I finished the bisection, I retested with the latest mainline (v2.6.28-rc3-54-g75fa677), and the problem still shows up.

Essentially, the system panics in early boot, resulting in multiple oopses. I was finally able to capture the very first oops, and the image of that oops can be found here:

http://thunk.org/tytso/2.6.27-regress/92b29b8/IMG_0331.JPG

From the console snapshot, it looks like two CPUs oopsed simultaneously with:

BUG: unable to handle kernel NULL dereference at 00000000
BUG: unable to handle kernel NULL dereference at 00000038

On the stack is "scheduler_tick+0x83/0x15f".

When doing a bisection, the last good commit (i.e., the last one which I can boot on my system) is git id: d6c88a50 (which precedes 2.6.28-rc1). The first bad git ID is:

commit d6c88a507ef0b6afdb013cba4e7804ba7324d99a
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Wed Oct 15 15:27:23 2008 +0200

    genirq: revert dynarray

    Revert the dynarray changes. They need more thought and polishing.

    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

... but in fact, the failure is different from the above messages. The failure is once again in early boot, but the oops message is quite different:

http://thunk.org/tytso/2.6.27-regress/b9d7ccf/IMG_0322.JPG

The failure was in kmem_cache_alloc+0xab/0xc4, called by __create_workqueue_key+0x21/0x145.

Walking forwards, the first git ID which shows the same failure as what shows up in 2.6.28-rc2 and 2.6.28-rc3 is git ID: 92b29b8, which apparently is a merge of the tracing-v28-for-linus branch.

Because there were two failures back-to-back, it's possible that either I or git bisect got confused, since a bisect normally doesn't terminate on a merge commit. I'll try double-checking the two ancestors of the merge commit by hand, but in the meantime I thought I'd send what I have in case it rings a bell.

Again, my system has been totally failing to boot since 2.6.28-rc1. This is an Aberdeen (white box) fileserver, with a SuperMicro X6DH8-XG2 motherboard, with two Pentium 4 Xeon 3.0GHz processors with hyperthreading.
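For anyone who wants to repeat the by-hand check of the merge's two ancestors, here is a minimal sketch of how the parents of a merge commit can be listed so that each one can be checked out, built, and boot-tested separately. The helper name merge_parents is made up for illustration; the only real command it relies on is git rev-list --parents. It assumes it is run from inside a kernel git tree, and the build commands in the commented-out loop are just the usual ones, not something taken from this report.

```shell
#!/bin/sh
# Hypothetical helper: print each parent of a commit, one per line.
# For a merge commit this gives the two (or more) ancestors to test by hand.
# Usage: merge_parents <commit>   (run inside the git tree)
merge_parents() {
    # "git rev-list --parents -n 1 X" prints: <X> <parent1> <parent2> ...
    # Split on spaces and drop the first field (the commit itself).
    git rev-list --parents -n 1 "$1" | tr ' ' '\n' | tail -n +2
}

# Example for the merge named in this report (commented out because it
# needs a kernel tree and a machine to boot-test on):
# for p in $(merge_parents 92b29b8); do
#     git checkout "$p" && make oldconfig && make &&
#         echo "built $p; install and boot-test it"
# done
```

If one parent boots and the other does not, the bisection can be restarted inside the failing side only; if both parents boot, the breakage comes from the combination of the two branches.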
Created attachment 18664: Bisection log
Created attachment 18665: Boot log from a successful boot of 2.6.27 kernel
Created attachment 18666: Kernel config file from a failing kernel during the bisection
Created attachment 18667: The cpuinfo file from my system
Created attachment 18668: /proc/timer_list from my 2.6.27 kernel
Note: The LKML thread discussing this bug can be found here: http://lkml.org/lkml/2008/11/4/450
References : http://lkml.org/lkml/2008/11/4/450
Handled-By : Yinghai Lu <yinghai@kernel.org>
Patch : http://lkml.org/lkml/2008/11/4/431
Patch : http://lkml.org/lkml/2008/11/5/81
Fixed by:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=1b4897688011cd05e07f00dcfe6af3331eb36a3c
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=c78d0cf2925bffae8a6f00e7d9b8e971b0392edd