Most recent kernel where this bug did not occur:
Distribution: all 2.6 based ones
Hardware Environment: i386 & x64
Software Environment:
Problem Description:
Steps to reproduce:

Verified that this is present in 2.6.13 (and all earlier 2.6 kernels that I checked).

There is a simple initialization order bug: workqueues are initialized too late. init/main.c:init() executes initializers in this order:

    smp_prepare_cpus(max_cpus);   <-- start APIC timer interrupts

    do_pre_smp_initcalls();

    fixup_cpu_present_map();
    smp_init();
    sched_init_smp();

    cpuset_init_smp();

    /*
     * Do this before initcalls, because some drivers want to access
     * firmware files.
     */
    populate_rootfs();

    do_basic_setup();             <-- calls init_workqueues() - too late

The problem is that APIC timer interrupts are enabled before workqueues are initialized. This is bad because some timer interrupt callbacks defer work via workqueues. For instance, con_init() schedules a timer callback (well, in 10 minutes). This callback uses a workqueue:

    static void blank_screen_t(unsigned long dummy)
    {
        blank_timer_expired = 1;
        schedule_work(&console_work);   <<--- the culprit
    }

There may be other similar callbacks now, or there will be some in the future.

Since workqueues are a mechanism to defer work, it is better to have them initialized before timer (and other) interrupts are enabled. A workqueue is a simple data structure, so it should be possible to move init_workqueues() above smp_prepare_cpus() in init().

We are actually hitting this bug in VMware. In our case, the screen blanking callback above is triggered before init_workqueues(). It fires before its scheduled 10-minute timeout because of Bug 5366, which can cause jiffies to jump forward into the future.
http://bugzilla.kernel.org/show_bug.cgi?id=5366

Filing under "Timers", but this might not be the most appropriate category.
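To make the ordering hazard concrete, here is a rough user-space model of it. This is only a sketch: every model_* name is hypothetical and none of this is kernel code. It assumes the queue is reached through a pointer that stays NULL until initialization, and it reports an early call by returning false, where the real kernel has no such check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_work {
    void (*func)(void);
    struct model_work *next;
};

struct model_workqueue {
    struct model_work *head;
    struct model_work *tail;
};

static struct model_workqueue model_queue_storage;
static struct model_workqueue *model_wq;   /* NULL until initialized */

/* Stands in for init_workqueues(): must run exactly once. */
void model_init_workqueues(void)
{
    assert(model_wq == NULL);
    model_queue_storage.head = NULL;
    model_queue_storage.tail = NULL;
    model_wq = &model_queue_storage;
}

/* Stands in for schedule_work(). Returns false when called before
 * initialization; this guard is an assumption of the model only. */
bool model_schedule_work(struct model_work *work)
{
    if (model_wq == NULL)
        return false;
    work->next = NULL;
    if (model_wq->tail != NULL)
        model_wq->tail->next = work;
    else
        model_wq->head = work;
    model_wq->tail = work;
    return true;
}

/* Runs all queued work items in order; returns how many ran. */
int model_run_pending(void)
{
    int ran = 0;

    if (model_wq == NULL)
        return 0;
    while (model_wq->head != NULL) {
        struct model_work *work = model_wq->head;

        model_wq->head = work->next;
        if (model_wq->head == NULL)
            model_wq->tail = NULL;
        work->func();
        ran++;
    }
    return ran;
}

static int model_blank_timer_expired;
static int model_console_work_runs;

static void model_console_callback(void)
{
    model_console_work_runs++;
}

static struct model_work model_console_work = { model_console_callback, NULL };

/* Stands in for the blank_screen_t() timer callback. */
bool model_blank_screen_t(void)
{
    model_blank_timer_expired = 1;
    return model_schedule_work(&model_console_work);
}
```

Calling model_blank_screen_t() before model_init_workqueues() corresponds to the timer firing too early: the work item is never queued. Calling it after initialization queues the item, and model_run_pending() then runs it once.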
hm, I thought this came up before and we fixed it. Oh well. Please send a patch ;)
Does this bug still exist now that #5366 is closed?
Boris: This bug is a bit stale. Now that bug #5366 is closed, do you still see the issue with 2.6.16+?
No response for a while now. Closing.