Subject : regression introduced by - timers: fix itimer/many thread hang
Submitter : Doug Chapman <doug.chapman@hp.com>
Date : 2008-11-06 11:03
References : http://marc.info/?l=linux-kernel&m=122596943416648&w=4
Handled-By : Frank Mayhar <fmayhar@google.com>
Handled-By : Peter Zijlstra <peterz@infradead.org>
Handled-By : Ingo Molnar <mingo@elte.hu>

This entry is being used for tracking a regression from 2.6.27. Please don't close it until the problem is fixed in the mainline.
Created attachment 18734
Proposed fix.

Doug, I've attached a proposed patch (as outlined by Oleg) to the bug. Can you pick it up and see if it fixes the crash? Thanks.
(In reply to comment #1)
> Created an attachment (id=18734)
> Proposed fix.
>
> Doug, I've attached a proposed patch (as outlined by Oleg) to the bug. Can you
> pick it up and see if it fixes the crash? Thanks.

I hit a lockup during boot with this:

Detected change(s) the following file(s):
BUG: spinlock lockup on CPU#1, mktemp/2880, a000000100ae9100

Call Trace:
 [<a000000100016360>] show_stack+0x40/0xa0
     sp=e000010089f9fba0 bsp=e000010089f99290
 [<a0000001000163f0>] dump_stack+0x30/0x60
     sp=e000010089f9fd70 bsp=e000010089f99278
 [<a0000001003359c0>] _raw_spin_lock+0x200/0x260
     sp=e000010089f9fd70 bsp=e000010089f99240
 [<a0000001006d10c0>] _spin_lock+0x20/0x40
     sp=e000010089f9fd70 bsp=e000010089f99220
 [<a00000010007f760>] task_rq_lock+0xa0/0x120
     sp=e000010089f9fd70 bsp=e000010089f991d0
 [<a000000100087080>] try_to_wake_up+0x160/0x680
     sp=e000010089f9fd70 bsp=e000010089f99178
 [<a000000100087630>] wake_up_state+0x30/0x60
     sp=e000010089f9fd80 bsp=e000010089f99150
 [<a0000001000b6e40>] signal_wake_up+0x60/0xa0
     sp=e000010089f9fd80 bsp=e000010089f99128
 [<a0000001000b72c0>] complete_signal+0x440/0x4a0
     sp=e000010089f9fd80 bsp=e000010089f990e0
 [<a0000001000b79a0>] send_signal+0x3c0/0x420
     sp=e000010089f9fd80 bsp=e000010089f99098
 [<a0000001000b7eb0>] __group_send_sig_info+0x30/0x60
     sp=e000010089f9fd80 bsp=e000010089f99068
 [<a0000001000badf0>] do_notify_parent+0x410/0x480
     sp=e000010089f9fd80 bsp=e000010089f99028
 [<a0000001000a3780>] do_exit+0xfc0/0x1220
     sp=e000010089f9fe20 bsp=e000010089f98fc8
 [<a0000001000a3b60>] do_group_exit+0x180/0x200
     sp=e000010089f9fe30 bsp=e000010089f98f88
 [<a0000001000a3c00>] sys_exit_group+0x20/0x40
     sp=e000010089f9fe30 bsp=e000010089f98f30
 [<a00000010000c4d0>] ia64_trace_syscall+0xf0/0x130
     sp=e000010089f9fe30 bsp=e000010089f98f30
 [<a000000000010720>] __kernel_syscall_via_break+0x0/0x20
     sp=e000010089fa0000 bsp=e000010089f98f30
On Fri, 2008-11-07 at 12:49 -0800, bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=11965
>
> ------- Comment #2 from doug.chapman@hp.com 2008-11-07 12:49 -------
> (In reply to comment #1)
> > Created an attachment (id=18734)
> > Proposed fix.
>
> I hit a lockup during boot with this:
>
> BUG: spinlock lockup on CPU#1, mktemp/2880, a000000100ae9100

Okay, thanks. Back to the drawing board...
First-Bad-Commit : f06febc96ba8e0af80bcc3eaec0a109e88275fac
For the record, I do not hit a lockup (or any other problem) with this fix on amd64.
Notify-Also : Oleg Nesterov <oleg@redhat.com>
fixed by:

commit ad474caca3e2a0550b7ce0706527ad5ab389a4d4
Author: Oleg Nesterov <oleg@redhat.com>
Date:   Mon Nov 10 15:39:30 2008 +0100

    fix for account_group_exec_runtime(), make sure ->signal can't be freed