Created attachment 61762: crash stack trace

About 5 percent of our machines running Linux 2.6.32.11 hang after 209 days of running. The kernel crash stack trace is in attachment 1. When we disassembled the machine code, we found that it hits a divide-by-zero error. The faulting statement is in kernel/sched.c, in update_sg_lb_stats():

sgs->avg_load = (sgs->group_load * SCHED_LOAD_SCALE) / group->cpu_power;

It seems that update_group_power computes the power as zero, so group->cpu_power is zero when this division runs. The kernel configuration is attached below.
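To make the failure mode concrete, here is a minimal user-space sketch of the division quoted above. It is not the kernel code: struct sg_stats and compute_avg_load() are hypothetical names made up for illustration, and the value 1024 for SCHED_LOAD_SCALE is an assumption. The helper refuses to divide when cpu_power is zero, which is the case where the unguarded kernel expression traps.

```c
#include <stdbool.h>

/* Assumed value for illustration; check the headers of the kernel in use. */
#define SCHED_LOAD_SCALE 1024UL

/* Hypothetical stand-in for the two fields of the kernel's stats struct used here. */
struct sg_stats {
    unsigned long group_load;
    unsigned long avg_load;
};

/*
 * Compute avg_load the same way as the statement in the report.
 * Returns false without dividing when cpu_power is zero, i.e. in the
 * situation where the unguarded expression raises a divide error.
 */
static bool compute_avg_load(struct sg_stats *sgs, unsigned long cpu_power)
{
    if (cpu_power == 0)
        return false;

    sgs->avg_load = (sgs->group_load * SCHED_LOAD_SCALE) / cpu_power;
    return true;
}
```

With group_load = 2048 and cpu_power = 1024 the helper stores avg_load = 2048. With cpu_power = 0 it returns false and leaves avg_load untouched.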
Created attachment 61772: kernel configuration
(apparently the scheduler code is unmaintained) I believe this was later fixed by adding the check "if (!power) power = 1;" to update_cpu_power() in kernel/sched_fair.c. Either we forgot to backport that fix into 2.6.32.11, or we weren't maintaining the 2.6.32.x stream by the time the fix was merged.
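The guard mentioned above can be sketched in user-space C as follows. Only the "if (!power) power = 1;" check comes from this comment. The helper name scale_power(), the multiply-and-shift scaling step, and the constant values are assumptions chosen to show how a computed power could reach zero and how the clamp keeps a later division safe.

```c
/* Assumed values for illustration only. */
#define SCHED_LOAD_SHIFT 10
#define SCHED_LOAD_SCALE (1UL << SCHED_LOAD_SHIFT)

/*
 * Hypothetical helper: scale a power value by a factor expressed in
 * units of SCHED_LOAD_SCALE, then clamp the result to at least 1 so
 * that it can never be used as a zero divisor.
 */
static unsigned long scale_power(unsigned long power, unsigned long scale)
{
    power *= scale;
    power >>= SCHED_LOAD_SHIFT;

    /* The check quoted in the comment above. */
    if (!power)
        power = 1;

    return power;
}
```

A scale factor of 0 would otherwise produce a power of 0. With the clamp, scale_power(SCHED_LOAD_SCALE, 0) returns 1, while ordinary factors pass through unchanged.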