Bug 11965

Summary: regression introduced by - timers: fix itimer/many thread hang
Product: Timers
Reporter: Rafael J. Wysocki (rjw)
Component: Other
Assignee: Frank Mayhar (fmayhar)
Status: CLOSED CODE_FIX
Severity: normal
CC: a.p.zijlstra, doug.chapman, fmayhar
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.28-rc
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 11808
Attachments: Proposed fix.

Description Rafael J. Wysocki 2008-11-06 15:52:51 UTC
Subject    : regression introduced by - timers: fix itimer/many thread hang
Submitter  : Doug Chapman <doug.chapman@hp.com>
Date       : 2008-11-06 11:03
References : http://marc.info/?l=linux-kernel&m=122596943416648&w=4
Handled-By : Frank Mayhar <fmayhar@google.com>
Handled-By : Peter Zijlstra <peterz@infradead.org>
Handled-By : Ingo Molnar <mingo@elte.hu>

This entry is being used for tracking a regression from 2.6.27.  Please don't
close it until the problem is fixed in the mainline.
Comment 1 Frank Mayhar 2008-11-07 12:17:19 UTC
Created attachment 18734: Proposed fix.

Doug, I've attached a proposed patch (as outlined by Oleg) to the bug.  Can you pick it up and see if it fixes the crash?  Thanks.
Comment 2 Doug Chapman 2008-11-07 12:49:19 UTC
(In reply to comment #1)
I hit a lockup during boot with this:

Detected change(s) the following file(s):
  BUG: spinlock lockup on CPU#1, mktemp/2880, a000000100ae9100

Call Trace:
 [<a000000100016360>] show_stack+0x40/0xa0
                                sp=e000010089f9fba0 bsp=e000010089f99290
 [<a0000001000163f0>] dump_stack+0x30/0x60
                                sp=e000010089f9fd70 bsp=e000010089f99278
 [<a0000001003359c0>] _raw_spin_lock+0x200/0x260
                                sp=e000010089f9fd70 bsp=e000010089f99240
 [<a0000001006d10c0>] _spin_lock+0x20/0x40
                                sp=e000010089f9fd70 bsp=e000010089f99220
 [<a00000010007f760>] task_rq_lock+0xa0/0x120
                                sp=e000010089f9fd70 bsp=e000010089f991d0
 [<a000000100087080>] try_to_wake_up+0x160/0x680
                                sp=e000010089f9fd70 bsp=e000010089f99178
 [<a000000100087630>] wake_up_state+0x30/0x60
                                sp=e000010089f9fd80 bsp=e000010089f99150
 [<a0000001000b6e40>] signal_wake_up+0x60/0xa0
                                sp=e000010089f9fd80 bsp=e000010089f99128
 [<a0000001000b72c0>] complete_signal+0x440/0x4a0
                                sp=e000010089f9fd80 bsp=e000010089f990e0
 [<a0000001000b79a0>] send_signal+0x3c0/0x420
                                sp=e000010089f9fd80 bsp=e000010089f99098
 [<a0000001000b7eb0>] __group_send_sig_info+0x30/0x60
                                sp=e000010089f9fd80 bsp=e000010089f99068
 [<a0000001000badf0>] do_notify_parent+0x410/0x480
                                sp=e000010089f9fd80 bsp=e000010089f99028
 [<a0000001000a3780>] do_exit+0xfc0/0x1220
                                sp=e000010089f9fe20 bsp=e000010089f98fc8
 [<a0000001000a3b60>] do_group_exit+0x180/0x200
                                sp=e000010089f9fe30 bsp=e000010089f98f88
 [<a0000001000a3c00>] sys_exit_group+0x20/0x40
                                sp=e000010089f9fe30 bsp=e000010089f98f30
 [<a00000010000c4d0>] ia64_trace_syscall+0xf0/0x130
                                sp=e000010089f9fe30 bsp=e000010089f98f30
 [<a000000000010720>] __kernel_syscall_via_break+0x0/0x20
                                sp=e000010089fa0000 bsp=e000010089f98f30
Comment 3 Frank Mayhar 2008-11-07 13:06:39 UTC
Okay, thanks.  Back to the drawing board...
Comment 4 Rafael J. Wysocki 2008-11-09 11:19:52 UTC
First-Bad-Commit : f06febc96ba8e0af80bcc3eaec0a109e88275fac
Comment 5 Frank Mayhar 2008-11-10 11:43:40 UTC
For the record, I do not hit a lockup (or any other problem) with this fix on amd64.
Comment 6 Rafael J. Wysocki 2008-11-16 09:40:27 UTC
Notify-Also : Oleg Nesterov <oleg@redhat.com>
Comment 7 Ingo Molnar 2008-11-22 23:40:28 UTC
Fixed by:

commit ad474caca3e2a0550b7ce0706527ad5ab389a4d4
Author: Oleg Nesterov <oleg@redhat.com>
Date:   Mon Nov 10 15:39:30 2008 +0100

    fix for account_group_exec_runtime(), make sure ->signal can't be freed