Kernel Bug Tracker – Bug 37382
machine hang due to task rebalance after about 209 days uptime
Last modified: 2012-08-24 14:38:28 UTC
Created attachment 61762 [details]
crash stack trace
About 5 percent of our machines running on linux 220.127.116.11 hangs after 209 days running, kernel crash stack trace in attachment1 [details].
when we disasemble the machine code, we found it meet divide-by-zero error.where
the sentence is at:
kernel/sched.c :: update_sg_lb_stats:
sgs->avg_load = (sgs->group_load * SCHED_LOAD_SCALE) / group->cpu_power;
it seemes like when update_group_power, the power is calculated as zero.
kernel configure attached follow.
Created attachment 61772 [details]
(apparently the scheduler code is unmaintained)
I believe this was later fixed by
power = 1;
in kernel/sched_fair.c:update_cpu_power(). Either we forgot to backport that fix into 18.104.22.168 or we weren't maintaining the 2.6.32.x stream by the time the fix was merged.