Created attachment 24006 [details] stack trace I've got the unkillable thunderbird-bin process eating 100% of CPU. The stack trace shows that the futex_wait_setup was re-entered, and maybe this caused a problem. Is there some lock that had to be irqsafe and is not? Dec 3 23:17:42 lin2 kernel: thunderbird-b R running task 0 2941 2937 0x0000008c Dec 3 23:17:42 lin2 kernel: ffff88005d61bb18 ffffffff8149f51f ffff88005d61ba68 00000000beca0c41 Dec 3 23:17:42 lin2 kernel: ffff880064180f60 ffffffff8100c200 ffff88005d61bc70 00000000beca0c41 Dec 3 23:17:42 lin2 kernel: ffff88005d61ba98 0000000000000000 0000000000000002 0000000000000000 Dec 3 23:17:42 lin2 kernel: Call Trace: Dec 3 23:17:42 lin2 kernel: [<ffffffff8149f51f>] ? thread_return+0x9c/0x7ed Dec 3 23:17:42 lin2 kernel: [<ffffffff8100c200>] ? restore_args+0x0/0x30 Dec 3 23:17:42 lin2 kernel: [<ffffffff8109ba52>] ? futex_wait_setup+0x92/0x150 Dec 3 23:17:42 lin2 kernel: [<ffffffff8100c14e>] ? common_interrupt+0xe/0x13 Dec 3 23:17:42 lin2 kernel: [<ffffffff814a3085>] ? _spin_unlock+0x15/0x80 Dec 3 23:17:42 lin2 kernel: [<ffffffff8109bad0>] ? futex_wait_setup+0x110/0x150 Dec 3 23:17:42 lin2 kernel: [<ffffffff8109d5fb>] ? futex_wait_requeue_pi+0x1bb/0x430 Dec 3 23:17:42 lin2 kernel: [<ffffffff8104f7f5>] ? try_to_wake_up+0x2c5/0x2f0 Dec 3 23:17:42 lin2 kernel: [<ffffffff8149ff11>] ? preempt_schedule+0x61/0x80 Dec 3 23:17:42 lin2 kernel: [<ffffffff814a30de>] ? _spin_unlock+0x6e/0x80 Dec 3 23:17:42 lin2 kernel: [<ffffffff8109c1a9>] ? futex_wake+0x129/0x140 Dec 3 23:17:42 lin2 kernel: [<ffffffff8109e20c>] ? do_futex+0x8c/0xae0 Dec 3 23:17:42 lin2 kernel: [<ffffffff8107feb0>] ? autoremove_wake_function+0x0/0x60 Dec 3 23:17:42 lin2 kernel: [<ffffffff81133aa3>] ? fget_light+0x83/0x150 Dec 3 23:17:42 lin2 kernel: [<ffffffff8109ed36>] ? sys_futex+0xd6/0x190 Dec 3 23:17:42 lin2 kernel: [<ffffffff81096a8d>] ? trace_hardirqs_on_caller+0x16d/0x1c0 Dec 3 23:17:42 lin2 kernel: [<ffffffff8100b7ab>] ? system_call_fastpath+0x16/0x1b
Another trace here, looks easily triggerable... Dec 4 00:16:24 lin2 kernel: thunderbird-b R running task 0 2923 2919 0x0000008c Dec 4 00:16:24 lin2 kernel: ffffffff8100c14e ffff88005cc91b28 ffffffff8109ba52 0000000000000000 Dec 4 00:16:24 lin2 kernel: 0000000000000000 ffffffff81d249d0 0000000000000001 0000000000000000 Dec 4 00:16:24 lin2 kernel: ffffffff81d249d0 000000000000b4b0 0950000000000000 ffffffffffffff8e Dec 4 00:16:24 lin2 kernel: Call Trace: Dec 4 00:16:24 lin2 kernel: [<ffffffff8100c14e>] ? common_interrupt+0xe/0x13 Dec 4 00:16:24 lin2 kernel: [<ffffffff8109ba52>] ? futex_wait_setup+0x92/0x150 Dec 4 00:16:24 lin2 kernel: [<ffffffff81099517>] ? lock_acquire+0xb7/0x190 Dec 4 00:16:24 lin2 kernel: [<ffffffff8109ba52>] ? futex_wait_setup+0x92/0x150 Dec 4 00:16:24 lin2 kernel: [<ffffffff81099bb4>] ? lock_release+0xe4/0x250 Dec 4 00:16:24 lin2 kernel: [<ffffffff814a3295>] ? _spin_lock+0x45/0x70 Dec 4 00:16:24 lin2 kernel: [<ffffffff8109ba52>] ? futex_wait_setup+0x92/0x150 Dec 4 00:16:24 lin2 kernel: [<ffffffff8109ba52>] ? futex_wait_setup+0x92/0x150 Dec 4 00:16:24 lin2 kernel: [<ffffffff8109bb07>] ? futex_wait_setup+0x147/0x150 Dec 4 00:16:24 lin2 kernel: [<ffffffff8109d5fb>] ? futex_wait_requeue_pi+0x1bb/0x430 Dec 4 00:16:24 lin2 kernel: [<ffffffff8104f7f5>] ? try_to_wake_up+0x2c5/0x2f0 Dec 4 00:16:24 lin2 kernel: [<ffffffff8149ff11>] ? preempt_schedule+0x61/0x80 Dec 4 00:16:24 lin2 kernel: [<ffffffff814a30de>] ? _spin_unlock+0x6e/0x80 Dec 4 00:16:24 lin2 kernel: [<ffffffff8109c1a9>] ? futex_wake+0x129/0x140 Dec 4 00:16:24 lin2 kernel: [<ffffffff8109e20c>] ? do_futex+0x8c/0xae0 Dec 4 00:16:24 lin2 kernel: [<ffffffff8107feb0>] ? autoremove_wake_function+0x0/0x60 Dec 4 00:16:24 lin2 kernel: [<ffffffff81133aa3>] ? fget_light+0x83/0x150 Dec 4 00:16:24 lin2 kernel: [<ffffffff8109ed36>] ? sys_futex+0xd6/0x190 Dec 4 00:16:24 lin2 kernel: [<ffffffff81096a8d>] ? trace_hardirqs_on_caller+0x16d/0x1c0 Dec 4 00:16:24 lin2 kernel: [<ffffffff8100b7ab>] ? system_call_fastpath+0x16/0x1b
I marked this as a regression (which I assume it is). Also reassigned it to "process management". Does 2.6.32 fail too?
Dunno if its a regression - I used that kernel without any problems, but now I upgraded my thunderbird, and this started to happen... I'll check with the latest git later or tomorrow. Dec 4 08:27:16 lin2 kernel: thunderbird-b R running task 0 2941 2937 0x0000008c Dec 4 08:27:16 lin2 kernel: ffff8800642dfa98 ffffffff8149f51f ffff8800642dfa18 ffffffff8104e94e Dec 4 08:27:16 lin2 kernel: ffffffff8104e8d0 00000000f369a346 0000000000000000 00000000fa680f44 Dec 4 08:27:16 lin2 kernel: 0000000000000000 ffff8800642de000 ffff880063f04f90 ffff880063f05320 Dec 4 08:27:16 lin2 kernel: Call Trace: Dec 4 08:27:16 lin2 kernel: [<ffffffff8149f51f>] ? thread_return+0x9c/0x7ed Dec 4 08:27:16 lin2 kernel: [<ffffffff8104e94e>] ? finish_task_switch+0x7e/0x120 Dec 4 08:27:16 lin2 kernel: [<ffffffff8104e8d0>] ? finish_task_switch+0x0/0x120 Dec 4 08:27:16 lin2 kernel: [<ffffffff8100c200>] ? restore_args+0x0/0x30 Dec 4 08:27:16 lin2 kernel: [<ffffffff81096a8d>] ? trace_hardirqs_on_caller+0x16d/0x1c0 Dec 4 08:27:16 lin2 kernel: [<ffffffff8100c14e>] ? common_interrupt+0xe/0x13 Dec 4 08:27:16 lin2 kernel: [<ffffffff8109bad0>] ? futex_wait_setup+0x110/0x150 Dec 4 08:27:16 lin2 kernel: [<ffffffff81099af9>] ? lock_release+0x29/0x250 Dec 4 08:27:16 lin2 kernel: [<ffffffff8109b8d7>] ? get_futex_value_locked+0x47/0x90 Dec 4 08:27:16 lin2 kernel: [<ffffffff814a309d>] ? _spin_unlock+0x2d/0x80 Dec 4 08:27:16 lin2 kernel: [<ffffffff8109bad0>] ? futex_wait_setup+0x110/0x150 Dec 4 08:27:16 lin2 kernel: [<ffffffff8109b9c9>] ? futex_wait_setup+0x9/0x150 Dec 4 08:27:16 lin2 kernel: [<ffffffff8109d5fb>] ? futex_wait_requeue_pi+0x1bb/0x430 Dec 4 08:27:16 lin2 kernel: [<ffffffff8104f7f5>] ? try_to_wake_up+0x2c5/0x2f0 Dec 4 08:27:16 lin2 kernel: [<ffffffff8149ff11>] ? preempt_schedule+0x61/0x80 Dec 4 08:27:16 lin2 kernel: [<ffffffff814a30de>] ? _spin_unlock+0x6e/0x80 Dec 4 08:27:16 lin2 kernel: [<ffffffff8109c1a9>] ? futex_wake+0x129/0x140 Dec 4 08:27:16 lin2 kernel: [<ffffffff8109e20c>] ? do_futex+0x8c/0xae0 Dec 4 08:27:16 lin2 kernel: [<ffffffff8107feb0>] ? autoremove_wake_function+0x0/0x60 Dec 4 08:27:16 lin2 kernel: [<ffffffff81133aa3>] ? fget_light+0x83/0x150 Dec 4 08:27:16 lin2 kernel: [<ffffffff8109ed36>] ? sys_futex+0xd6/0x190 Dec 4 08:27:16 lin2 kernel: [<ffffffff81096a8d>] ? trace_hardirqs_on_caller+0x16d/0x1c0 Dec 4 08:27:16 lin2 kernel: [<ffffffff8100b7ab>] ? system_call_fastpath+0x16/0x1b
Still no lockups on .32, will reopen if that happens again.
OK, but on .32 I am _always_ getting the firefox lockups with the trace below. But it is killable with kill -9, so I don't know if it is a kernel bug or not... My linux desktop feels very unlucky lately. :) Or maybe its time to run memtest... Dec 7 01:41:27 lin2 kernel: firefox S 0000000000000000 0 16856 1 0x00000080 Dec 7 01:41:27 lin2 kernel: ffff88005e73bc18 0000000000000046 0000000000000002 0000000000000000 Dec 7 01:41:27 lin2 kernel: ffff88007dde0000 0000000000000000 0000000000000000 ffff880002215918 Dec 7 01:41:27 lin2 kernel: 00000001008a357e ffff88005e73a000 ffff88007dde02b0 ffff88005e73bfd8 Dec 7 01:41:27 lin2 kernel: Call Trace: Dec 7 01:41:27 lin2 kernel: [<ffffffff81084883>] futex_wait_queue_me+0xb3/0xf0 Dec 7 01:41:27 lin2 kernel: [<ffffffff81084ab2>] futex_wait+0x1f2/0x370 Dec 7 01:41:27 lin2 kernel: [<ffffffff810e4f5b>] ? __do_fault+0x13b/0x4b0 Dec 7 01:41:27 lin2 kernel: [<ffffffff81086ca9>] do_futex+0x109/0xab0 Dec 7 01:41:27 lin2 kernel: [<ffffffff812273f3>] ? __up_read+0x23/0xc0 Dec 7 01:41:27 lin2 kernel: [<ffffffff8107fd6d>] ? trace_hardirqs_on+0xd/0x10 Dec 7 01:41:27 lin2 kernel: [<ffffffff8140cbed>] ? _spin_unlock_irqrestore+0x3d/0x70 Dec 7 01:41:27 lin2 kernel: [<ffffffff81227469>] ? __up_read+0x99/0xc0 Dec 7 01:41:27 lin2 kernel: [<ffffffff81087716>] sys_futex+0xc6/0x170 Dec 7 01:41:27 lin2 kernel: [<ffffffff8140c654>] ? lockdep_sys_exit_thunk+0x35/0x67 Dec 7 01:41:27 lin2 kernel: [<ffffffff8140c5de>] ? trace_hardirqs_on_thunk+0x3a/0x3f Dec 7 01:41:27 lin2 kernel: [<ffffffff8100b42b>] system_call_fastpath+0x16/0x1b
On Tuesday 29 December 2009, Stas Sergeev wrote: > 29.12.2009 18:28, Rafael J. Wysocki wrote: > > The following bug entry is on the current list of known regressions > > introduced between 2.6.31 and 2.6.32. Please verify if it still should > > be listed and let me know (either way). > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14727 > > Subject : livelock in futex_wait_setup > > Submitter : Stas Sergeev<stsp@aknet.ru> > > Date : 2009-12-03 20:55 (27 days old) > Hi, I think it may not be listed as > the regression. It changed the behaveour > after an update to 2.6.32 from -rc5, and > it changes the behaveour after userspace > updates, so I can't reliably reproduce it > any more, and the dependancy on a particular > kernel version is unclear...