Bug 11950 - Early boot failure for 2.6.28-rc[123]: NULL pointer deref stemming from scheduler_tick
Summary: Early boot failure for 2.6.28-rc[123]: NULL pointer deref stemming from sched...
Status: CLOSED INVALID
Alias: None
Product: Timers
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 normal
Assignee: john stultz
URL:
Keywords:
Depends on:
Blocks: 11808
Reported: 2008-11-04 14:28 UTC by Linux ext4 mailing list
Modified: 2008-11-04 15:38 UTC
CC List: 2 users

See Also:
Kernel Version: 2.6.28-rc2
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
Bisection log (2.56 KB, text/plain)
2008-11-04 14:30 UTC, Linux ext4 mailing list
Boot log from a successful boot of 2.6.27 kernel (47.99 KB, text/plain)
2008-11-04 14:30 UTC, Linux ext4 mailing list
Kernel config file from a failing kernel during the bisection (59.66 KB, text/plain)
2008-11-04 14:31 UTC, Linux ext4 mailing list
The cpuinfo file from my system (2.38 KB, text/plain)
2008-11-04 14:32 UTC, Linux ext4 mailing list
/proc/timer_list from my 2.6.27 kernel (6.32 KB, text/plain)
2008-11-04 14:32 UTC, Linux ext4 mailing list
/proc/timer_list from my 2.6.27 kernel (6.32 KB, text/plain)
2008-11-04 14:32 UTC, Linux ext4 mailing list

Description Linux ext4 mailing list 2008-11-04 14:28:56 UTC
Latest working kernel version: 2.6.27
Earliest failing kernel version: 2.6.28-rc1
Distribution: Ubuntu
Hardware Environment: Pentium 4 Xeon Dual Processor
Software Environment:
Problem Description:  Failure in early boot

My fileserver boots under 2.6.27, but it is failing to boot on
2.6.28-rc2.  It took me a while to bisect, so after I finished the
bisection, I retested with the latest mainline
(v2.6.28-rc3-54-g75fa677), and the problem still shows up.

Essentially, the system panics in early boot, producing multiple
oopses.  I was finally able to get the very first oops, and an image of
that oops can be found here:

http://thunk.org/tytso/2.6.27-regress/92b29b8/IMG_0331.JPG

From the console snapshot, it looks like two CPUs simultaneously
oopsed with:

BUG: unable to handle kernel NULL dereference at 00000000
BUG: unable to handle kernel NULL dereference at 00000038

On the stack is "scheduler_tick+0x83/0x15f"

When bisecting, the last good commit (i.e., the last one that I can
boot on my system) is git ID d6c88a50 (which precedes 2.6.28-rc1).

The first bad git ID is:

commit d6c88a507ef0b6afdb013cba4e7804ba7324d99a
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Wed Oct 15 15:27:23 2008 +0200

    genirq: revert dynarray

    Revert the dynarray changes. They need more thought and polishing.

    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

... but in fact, the failure at this commit is different from the
messages above.  The failure is once again in early boot, but the oops
message is quite different:

http://thunk.org/tytso/2.6.27-regress/b9d7ccf/IMG_0322.JPG

The failure was in kmem_cache_alloc+0xab/0xc4, called by
__create_workqueue_key+0x21/0x145.

Walking forwards, the first git ID that shows the same failure as
2.6.28-rc2 and 2.6.28-rc3 is 92b29b8, which apparently is a merge of
the tracing-v28-for-linus branch.  Because there were two failures
back-to-back, it's possible that either I or git bisect got confused,
since bisect doesn't normally terminate on a merge commit.  I'll try
double-checking the two parents of the merge commit by hand, but in the
meantime I thought I'd send what I have in case it rings a bell.

Again, my system has completely failed to boot since 2.6.28-rc1.  This
is an Aberdeen (white box) fileserver with a SuperMicro X6DH8-XG2
motherboard and two 3.0GHz Pentium 4 Xeons with hyperthreading.
Comment 1 Linux ext4 mailing list 2008-11-04 14:30:15 UTC
Created attachment 18658
Bisection log
Comment 2 Linux ext4 mailing list 2008-11-04 14:30:58 UTC
Created attachment 18659
Boot log from a successful boot of 2.6.27 kernel
Comment 3 Linux ext4 mailing list 2008-11-04 14:31:42 UTC
Created attachment 18660
Kernel config file from a failing kernel during the bisection
Comment 4 Linux ext4 mailing list 2008-11-04 14:32:12 UTC
Created attachment 18661
The cpuinfo file from my system
Comment 5 Linux ext4 mailing list 2008-11-04 14:32:41 UTC
Created attachment 18662
/proc/timer_list from my 2.6.27 kernel
Comment 6 Linux ext4 mailing list 2008-11-04 14:32:52 UTC
Created attachment 18663
/proc/timer_list from my 2.6.27 kernel
Comment 7 Linux ext4 mailing list 2008-11-04 14:36:27 UTC
Argh, I was logged in as the wrong user, and there doesn't seem to be any way to fix the reporter ID.  I don't want to spam the linux-ext4 mailing list, so I'll close this and re-open it.  Sigh....
