Bug 14727

Summary: livelock in futex_wait_setup
Product: Process Management Reporter: Stas Sergeev (stsp2)
Component: SchedulerAssignee: Ingo Molnar (mingo)
Status: CLOSED UNREPRODUCIBLE    
Severity: normal CC: a.p.zijlstra, akpm, mingo, rjw, tglx
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.32-rc5-00338-g2e2ec95-dirty Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 14230    
Attachments: stack trace

Description Stas Sergeev 2009-12-03 20:55:46 UTC
Created attachment 24006 [details]
stack trace

I've got the unkillable thunderbird-bin process
eating 100% of CPU. The stack trace shows that
the futex_wait_setup was re-entered, and maybe
this caused a problem. Is there some lock that
had to be irqsafe and is not?

Dec  3 23:17:42 lin2 kernel: thunderbird-b R  running task        0  2941   2937 0x0000008c
Dec  3 23:17:42 lin2 kernel: ffff88005d61bb18 ffffffff8149f51f ffff88005d61ba68 00000000beca0c41
Dec  3 23:17:42 lin2 kernel: ffff880064180f60 ffffffff8100c200 ffff88005d61bc70 00000000beca0c41
Dec  3 23:17:42 lin2 kernel: ffff88005d61ba98 0000000000000000 0000000000000002 0000000000000000
Dec  3 23:17:42 lin2 kernel: Call Trace:
Dec  3 23:17:42 lin2 kernel: [<ffffffff8149f51f>] ? thread_return+0x9c/0x7ed
Dec  3 23:17:42 lin2 kernel: [<ffffffff8100c200>] ? restore_args+0x0/0x30
Dec  3 23:17:42 lin2 kernel: [<ffffffff8109ba52>] ? futex_wait_setup+0x92/0x150
Dec  3 23:17:42 lin2 kernel: [<ffffffff8100c14e>] ? common_interrupt+0xe/0x13
Dec  3 23:17:42 lin2 kernel: [<ffffffff814a3085>] ? _spin_unlock+0x15/0x80
Dec  3 23:17:42 lin2 kernel: [<ffffffff8109bad0>] ? futex_wait_setup+0x110/0x150
Dec  3 23:17:42 lin2 kernel: [<ffffffff8109d5fb>] ? futex_wait_requeue_pi+0x1bb/0x430
Dec  3 23:17:42 lin2 kernel: [<ffffffff8104f7f5>] ? try_to_wake_up+0x2c5/0x2f0
Dec  3 23:17:42 lin2 kernel: [<ffffffff8149ff11>] ? preempt_schedule+0x61/0x80
Dec  3 23:17:42 lin2 kernel: [<ffffffff814a30de>] ? _spin_unlock+0x6e/0x80
Dec  3 23:17:42 lin2 kernel: [<ffffffff8109c1a9>] ? futex_wake+0x129/0x140
Dec  3 23:17:42 lin2 kernel: [<ffffffff8109e20c>] ? do_futex+0x8c/0xae0
Dec  3 23:17:42 lin2 kernel: [<ffffffff8107feb0>] ? autoremove_wake_function+0x0/0x60
Dec  3 23:17:42 lin2 kernel: [<ffffffff81133aa3>] ? fget_light+0x83/0x150
Dec  3 23:17:42 lin2 kernel: [<ffffffff8109ed36>] ? sys_futex+0xd6/0x190
Dec  3 23:17:42 lin2 kernel: [<ffffffff81096a8d>] ? trace_hardirqs_on_caller+0x16d/0x1c0
Dec  3 23:17:42 lin2 kernel: [<ffffffff8100b7ab>] ? system_call_fastpath+0x16/0x1b
Comment 1 Stas Sergeev 2009-12-03 21:23:10 UTC
Another trace here, looks easily triggerable...

Dec  4 00:16:24 lin2 kernel: thunderbird-b R  running task        0  2923   2919 0x0000008c
Dec  4 00:16:24 lin2 kernel: ffffffff8100c14e ffff88005cc91b28 ffffffff8109ba52 0000000000000000
Dec  4 00:16:24 lin2 kernel: 0000000000000000 ffffffff81d249d0 0000000000000001 0000000000000000
Dec  4 00:16:24 lin2 kernel: ffffffff81d249d0 000000000000b4b0 0950000000000000 ffffffffffffff8e
Dec  4 00:16:24 lin2 kernel: Call Trace:
Dec  4 00:16:24 lin2 kernel: [<ffffffff8100c14e>] ? common_interrupt+0xe/0x13
Dec  4 00:16:24 lin2 kernel: [<ffffffff8109ba52>] ? futex_wait_setup+0x92/0x150
Dec  4 00:16:24 lin2 kernel: [<ffffffff81099517>] ? lock_acquire+0xb7/0x190
Dec  4 00:16:24 lin2 kernel: [<ffffffff8109ba52>] ? futex_wait_setup+0x92/0x150
Dec  4 00:16:24 lin2 kernel: [<ffffffff81099bb4>] ? lock_release+0xe4/0x250
Dec  4 00:16:24 lin2 kernel: [<ffffffff814a3295>] ? _spin_lock+0x45/0x70
Dec  4 00:16:24 lin2 kernel: [<ffffffff8109ba52>] ? futex_wait_setup+0x92/0x150
Dec  4 00:16:24 lin2 kernel: [<ffffffff8109ba52>] ? futex_wait_setup+0x92/0x150
Dec  4 00:16:24 lin2 kernel: [<ffffffff8109bb07>] ? futex_wait_setup+0x147/0x150
Dec  4 00:16:24 lin2 kernel: [<ffffffff8109d5fb>] ? futex_wait_requeue_pi+0x1bb/0x430
Dec  4 00:16:24 lin2 kernel: [<ffffffff8104f7f5>] ? try_to_wake_up+0x2c5/0x2f0
Dec  4 00:16:24 lin2 kernel: [<ffffffff8149ff11>] ? preempt_schedule+0x61/0x80
Dec  4 00:16:24 lin2 kernel: [<ffffffff814a30de>] ? _spin_unlock+0x6e/0x80
Dec  4 00:16:24 lin2 kernel: [<ffffffff8109c1a9>] ? futex_wake+0x129/0x140
Dec  4 00:16:24 lin2 kernel: [<ffffffff8109e20c>] ? do_futex+0x8c/0xae0
Dec  4 00:16:24 lin2 kernel: [<ffffffff8107feb0>] ? autoremove_wake_function+0x0/0x60
Dec  4 00:16:24 lin2 kernel: [<ffffffff81133aa3>] ? fget_light+0x83/0x150
Dec  4 00:16:24 lin2 kernel: [<ffffffff8109ed36>] ? sys_futex+0xd6/0x190
Dec  4 00:16:24 lin2 kernel: [<ffffffff81096a8d>] ? trace_hardirqs_on_caller+0x16d/0x1c0
Dec  4 00:16:24 lin2 kernel: [<ffffffff8100b7ab>] ? system_call_fastpath+0x16/0x1b
Comment 2 Andrew Morton 2009-12-03 23:12:11 UTC
I marked this as a regression (which I assume it is).

Also reassigned it to "process management".

Does 2.6.32 fail too?
Comment 3 Stas Sergeev 2009-12-04 05:31:22 UTC
Dunno if its a regression - I used that
kernel without any problems, but now I
upgraded my thunderbird, and this started
to happen...
I'll check with the latest git later or
tomorrow.

Dec  4 08:27:16 lin2 kernel: thunderbird-b R  running task        0  2941   2937 0x0000008c
Dec  4 08:27:16 lin2 kernel: ffff8800642dfa98 ffffffff8149f51f ffff8800642dfa18 ffffffff8104e94e
Dec  4 08:27:16 lin2 kernel: ffffffff8104e8d0 00000000f369a346 0000000000000000 00000000fa680f44
Dec  4 08:27:16 lin2 kernel: 0000000000000000 ffff8800642de000 ffff880063f04f90 ffff880063f05320
Dec  4 08:27:16 lin2 kernel: Call Trace:
Dec  4 08:27:16 lin2 kernel: [<ffffffff8149f51f>] ? thread_return+0x9c/0x7ed
Dec  4 08:27:16 lin2 kernel: [<ffffffff8104e94e>] ? finish_task_switch+0x7e/0x120
Dec  4 08:27:16 lin2 kernel: [<ffffffff8104e8d0>] ? finish_task_switch+0x0/0x120
Dec  4 08:27:16 lin2 kernel: [<ffffffff8100c200>] ? restore_args+0x0/0x30
Dec  4 08:27:16 lin2 kernel: [<ffffffff81096a8d>] ? trace_hardirqs_on_caller+0x16d/0x1c0
Dec  4 08:27:16 lin2 kernel: [<ffffffff8100c14e>] ? common_interrupt+0xe/0x13
Dec  4 08:27:16 lin2 kernel: [<ffffffff8109bad0>] ? futex_wait_setup+0x110/0x150
Dec  4 08:27:16 lin2 kernel: [<ffffffff81099af9>] ? lock_release+0x29/0x250
Dec  4 08:27:16 lin2 kernel: [<ffffffff8109b8d7>] ? get_futex_value_locked+0x47/0x90
Dec  4 08:27:16 lin2 kernel: [<ffffffff814a309d>] ? _spin_unlock+0x2d/0x80
Dec  4 08:27:16 lin2 kernel: [<ffffffff8109bad0>] ? futex_wait_setup+0x110/0x150
Dec  4 08:27:16 lin2 kernel: [<ffffffff8109b9c9>] ? futex_wait_setup+0x9/0x150
Dec  4 08:27:16 lin2 kernel: [<ffffffff8109d5fb>] ? futex_wait_requeue_pi+0x1bb/0x430
Dec  4 08:27:16 lin2 kernel: [<ffffffff8104f7f5>] ? try_to_wake_up+0x2c5/0x2f0
Dec  4 08:27:16 lin2 kernel: [<ffffffff8149ff11>] ? preempt_schedule+0x61/0x80
Dec  4 08:27:16 lin2 kernel: [<ffffffff814a30de>] ? _spin_unlock+0x6e/0x80
Dec  4 08:27:16 lin2 kernel: [<ffffffff8109c1a9>] ? futex_wake+0x129/0x140
Dec  4 08:27:16 lin2 kernel: [<ffffffff8109e20c>] ? do_futex+0x8c/0xae0
Dec  4 08:27:16 lin2 kernel: [<ffffffff8107feb0>] ? autoremove_wake_function+0x0/0x60
Dec  4 08:27:16 lin2 kernel: [<ffffffff81133aa3>] ? fget_light+0x83/0x150
Dec  4 08:27:16 lin2 kernel: [<ffffffff8109ed36>] ? sys_futex+0xd6/0x190
Dec  4 08:27:16 lin2 kernel: [<ffffffff81096a8d>] ? trace_hardirqs_on_caller+0x16d/0x1c0
Dec  4 08:27:16 lin2 kernel: [<ffffffff8100b7ab>] ? system_call_fastpath+0x16/0x1b
Comment 4 Stas Sergeev 2009-12-05 22:08:56 UTC
Still no lockups on .32, will reopen
if that happens again.
Comment 5 Stas Sergeev 2009-12-07 07:45:53 UTC
OK, but on .32 I am _always_ getting the
firefox lockups with the trace below. But
it is killable with kill -9, so I don't
know if it is a kernel bug or not... My
linux desktop feels very unlucky lately. :)
Or maybe its time to run memtest...


Dec  7 01:41:27 lin2 kernel: firefox       S 0000000000000000     0 16856      1 0x00000080
Dec  7 01:41:27 lin2 kernel: ffff88005e73bc18 0000000000000046 0000000000000002 0000000000000000
Dec  7 01:41:27 lin2 kernel: ffff88007dde0000 0000000000000000 0000000000000000 ffff880002215918
Dec  7 01:41:27 lin2 kernel: 00000001008a357e ffff88005e73a000 ffff88007dde02b0 ffff88005e73bfd8
Dec  7 01:41:27 lin2 kernel: Call Trace:
Dec  7 01:41:27 lin2 kernel: [<ffffffff81084883>] futex_wait_queue_me+0xb3/0xf0
Dec  7 01:41:27 lin2 kernel: [<ffffffff81084ab2>] futex_wait+0x1f2/0x370
Dec  7 01:41:27 lin2 kernel: [<ffffffff810e4f5b>] ? __do_fault+0x13b/0x4b0
Dec  7 01:41:27 lin2 kernel: [<ffffffff81086ca9>] do_futex+0x109/0xab0
Dec  7 01:41:27 lin2 kernel: [<ffffffff812273f3>] ? __up_read+0x23/0xc0
Dec  7 01:41:27 lin2 kernel: [<ffffffff8107fd6d>] ? trace_hardirqs_on+0xd/0x10
Dec  7 01:41:27 lin2 kernel: [<ffffffff8140cbed>] ? _spin_unlock_irqrestore+0x3d/0x70
Dec  7 01:41:27 lin2 kernel: [<ffffffff81227469>] ? __up_read+0x99/0xc0
Dec  7 01:41:27 lin2 kernel: [<ffffffff81087716>] sys_futex+0xc6/0x170
Dec  7 01:41:27 lin2 kernel: [<ffffffff8140c654>] ? lockdep_sys_exit_thunk+0x35/0x67
Dec  7 01:41:27 lin2 kernel: [<ffffffff8140c5de>] ? trace_hardirqs_on_thunk+0x3a/0x3f
Dec  7 01:41:27 lin2 kernel: [<ffffffff8100b42b>] system_call_fastpath+0x16/0x1b
Comment 6 Rafael J. Wysocki 2009-12-29 20:49:23 UTC
On Tuesday 29 December 2009, Stas Sergeev wrote:
> 29.12.2009 18:28, Rafael J. Wysocki wrote:
> >  The following bug entry is on the current list of known regressions
> >  introduced between 2.6.31 and 2.6.32.  Please verify if it still should
> >  be listed and let me know (either way).
> >  Bug-Entry  : http://bugzilla.kernel.org/show_bug.cgi?id=14727
> >  Subject            : livelock in futex_wait_setup
> >  Submitter  : Stas Sergeev<stsp@aknet.ru>
> >  Date               : 2009-12-03 20:55 (27 days old)
> Hi, I think it may not be listed as
> the regression. It changed the behaveour
> after an update to 2.6.32 from -rc5, and
> it changes the behaveour after userspace
> updates, so I can't reliably reproduce it
> any more, and the dependancy on a particular
> kernel version is unclear...