Bug 5397

Summary: init_workqueues() should be called before enabling timer interrupts
Product: Timers
Reporter: Boris Weissman (weissman)
Component: Other
Assignee: john stultz (john.stultz)
Status: CLOSED INSUFFICIENT_DATA
Severity: normal
CC: akpm
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.13
Subsystem:
Regression: ---
Bisected commit-id:

Description Boris Weissman 2005-10-07 15:18:15 UTC
Most recent kernel where this bug did not occur:
Distribution: all 2.6 based ones
Hardware Environment: i386 & x64
Software Environment:
Problem Description:

Steps to reproduce:

Verified that this is present in 2.6.13 (and all earlier 2.6 kernels
that I checked).

There is a simple initialization order bug: workqueues are initialized
too late. init/main.c:init() executes initializers in this order:
	smp_prepare_cpus(max_cpus);  <-- start APIC timer interrupts

	do_pre_smp_initcalls();

	fixup_cpu_present_map();
	smp_init();
	sched_init_smp();

	cpuset_init_smp();

	/*
	 * Do this before initcalls, because some drivers want to access
	 * firmware files.
	 */
	populate_rootfs();

	do_basic_setup();            <-- calls init_workqueues() - too late

The problem is that we have APIC timer interrupts enabled before
workqueues are initialized. This is bad because some timer interrupt
callbacks defer work via workqueues. For instance, con_init() schedules
a timer callback (well, in 10 minutes). This callback uses a workqueue:

    static void blank_screen_t(unsigned long dummy)
    {
	blank_timer_expired = 1;
	schedule_work(&console_work);   <<--- the culprit
    }

There may be other similar callbacks now or there will be some in the
future. Since workqueues are a mechanism to defer work, it is better to
have them initialized before timer (and other) interrupts are enabled.
A workqueue is a simple data structure, so it should be possible to move
init_workqueues() above smp_prepare_cpus() in init().
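
The ordering hazard can be illustrated with a small userspace model. This is
only a sketch, not kernel code: every model_* name below is hypothetical, and
the model returns false where the real schedule_work() would touch an
uninitialized workqueue. It compares a boot order in which the timer callback
runs before the workqueue is set up with one in which the workqueue is set up
first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
typedef void (*model_work_fn)(void);

struct model_workqueue {
    model_work_fn pending[8];
    size_t count;
};

static struct model_workqueue model_wq_storage;
static struct model_workqueue *model_wq;   /* NULL until initialized */
static int model_blank_requests;           /* work items actually run */

/* Stands in for init_workqueues(). */
static void model_init_workqueues(void)
{
    model_wq_storage.count = 0;
    model_wq = &model_wq_storage;
}

/* Stands in for schedule_work(); reports failure instead of crashing. */
static bool model_schedule_work(model_work_fn fn)
{
    if (model_wq == NULL || model_wq->count == 8)
        return false;
    model_wq->pending[model_wq->count++] = fn;
    return true;
}

static void model_console_work(void)
{
    model_blank_requests++;
}

/* Stands in for a timer callback such as blank_screen_t(). */
static bool model_timer_callback(void)
{
    return model_schedule_work(model_console_work);
}

/* Runs and clears everything that was queued. */
static void model_run_pending(void)
{
    assert(model_wq != NULL);
    for (size_t i = 0; i < model_wq->count; i++)
        model_wq->pending[i]();
    model_wq->count = 0;
}

/* Simulates one boot; returns whether the callback could queue its work. */
static bool model_boot(bool init_workqueues_first)
{
    bool queued;

    model_wq = NULL;
    model_blank_requests = 0;
    if (init_workqueues_first) {
        model_init_workqueues();
        queued = model_timer_callback();
    } else {
        queued = model_timer_callback();   /* timer fires too early */
        model_init_workqueues();
    }
    model_run_pending();
    return queued;
}
```

In this model, model_boot(false) loses the deferred work, while
model_boot(true) queues and runs it, which is the effect the proposed
reordering is meant to have.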

We are actually hitting this bug in VMware. In our case, the screen 
blanking callback above is triggered before init_workqueues(). It is
triggered before its scheduled 10-minute timeout because of Bug 5366, which
might cause jiffies to jump forward into the future.
http://bugzilla.kernel.org/show_bug.cgi?id=5366
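
Why a forward jump in jiffies makes a 10-minute timer fire early can be
sketched in userspace as follows. The sketch assumes a tick rate of 1000 and
simply takes a forward jump as given; it does not model the cause described in
Bug 5366. All model_* names and MODEL_HZ are hypothetical, and the comparison
mirrors the usual wraparound-safe signed-difference idiom.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_HZ 1000UL   /* assumed tick rate, for illustration only */

/* Wraparound-safe "a is at or after b" on a free-running tick counter. */
static bool model_time_after_eq(unsigned long a, unsigned long b)
{
    return (long)(a - b) >= 0;
}

/* Deadline for a timer armed at 'now' with a 10-minute timeout. */
static unsigned long model_blank_deadline(unsigned long now)
{
    return now + 10UL * 60UL * MODEL_HZ;
}

static bool model_timer_expired(unsigned long now, unsigned long deadline)
{
    return model_time_after_eq(now, deadline);
}

/* Walks through the scenario: 3 seconds of real time, then a jump. */
static void model_check_early_expiry(void)
{
    unsigned long armed_at = 5000UL;
    unsigned long deadline = model_blank_deadline(armed_at);
    unsigned long three_seconds_later = armed_at + 3UL * MODEL_HZ;
    unsigned long jump = 10UL * 60UL * MODEL_HZ;   /* assumed jump size */

    /* Without a jump, the timer is still far from expiring. */
    assert(!model_timer_expired(three_seconds_later, deadline));
    /* With the counter jumped forward, the same instant counts as expired. */
    assert(model_timer_expired(three_seconds_later + jump, deadline));
}
```

The timer itself is behaving correctly here: it only compares the counter with
its deadline, so any forward jump of at least the remaining timeout makes the
callback run as early as the next timer tick.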

Filing under "Timers", but this might not be the most appropriate category.

Comment 1 Andrew Morton 2005-10-10 20:13:39 UTC
hm, I thought this came up before and we fixed it.  Oh well.

Please send a patch ;)

Comment 2 john stultz 2006-02-01 13:47:39 UTC
Does this bug still exist now that #5366 is closed?

Comment 3 john stultz 2006-07-10 11:38:50 UTC
Boris: This bug is a bit stale. Now that bug #5366 is closed, do you still see
the issue w/ 2.6.16+ ?

Comment 4 john stultz 2006-09-21 12:52:56 UTC
No response for a while now. Closing.