Steps to reproduce: Run a threaded program that creates a lot of threads and passes a lot of signals back and forth (using tkill or tgkill) to the threads calling pthread_create. If a signal arrives at an inopportune moment, we take this path in clone: /* * Process group and session signals need to be delivered to just the * parent before the fork or both the parent and the child after the * fork. Restart if a signal comes in before we add the new process to * it's process group. * A fatal signal pending means that current will exit, so the new * thread can't slip out of an OOM kill (or normal SIGKILL). */ recalc_sigpending(); if (signal_pending(current)) { spin_unlock(¤t->sighand->siglock); write_unlock_irq(&tasklist_lock); retval = -ERESTARTNOINTR; goto bad_fork_cleanup_namespace; } There's a number of other failure paths too but this is the easiest to reproduce without resource starvation. The problem is that some lines earlier we did this: if (clone_flags & CLONE_PARENT_SETTID) if (put_user(p->pid, parent_tidptr)) goto bad_fork_cleanup_delays_binfmt; We shouldn't really be doing this until we know that the fork is going to succeed. Otherwise, a walk through NPTL's data structures will temporarily show a TID which was never actually created. This can lead to some confusion in the code for broadcasting setuid to multiple threads; the thread appears to have been created, but in fact will be created a moment later with an entirely different TID. I think it should be safe to move the put_user down.
Hi All, I am currently 2.6.18.8 kernel for armv4tl platform. I am finding the exactly same issue as mentioned in this bug. The pthread_create call is failing since there is a signal pending and clone is returning ERESTARTNOINTR to the userspace. Please share the details of the fix mentioned in this Bug 7210 so that I can patch it to kernel 2.6.18.8 Thanks, Raghu
Just for future reference, patch is available at https://lkml.org/lkml/2007/1/2/295 -Murali
I find below reply which has the contents of patch ------------------------------------------------------------------------------ From: Daniel Jacobowitz <dan@codesourcery.com> Do not implement CLONE_PARENT_SETTID until we know that clone will succeed. If we do it too early NPTL's data structures temporarily reference a non-existant TID. Signed-off-by: Daniel Jacobowitz <dan@codesourcery.com> --- On Tue, Sep 26, 2006 at 08:59:15PM -0700, Linus Torvalds wrote: > > On Tue, 26 Sep 2006, Roland McGrath wrote: > > > > It can go last, right before return, after unlock. > > Userland only cares that parent_tidptr set before parent syscall returns, > > and child_tidptr set before child returns. > > Ok, as long as people are sure, I don't care. Then we have to just ignore > the error, though, since we can't recover (we've already "exposed" the > child on the task lists). > > I don't think it's a big deal. Ignoring the error just means that if you > pass in an invalid ptr, it's as if the bit to set that value wasn't set. > Not a problem. > > Especially if there is a test-program, can we just have a patch to try > that has been verified? It _sounded_ like somebody actually had a program > that could trigger this with some horrid code that sent signals and cloned > all the time? I never got back to you about this... Refresher, if there isn't enough above: CLONE_PARENT_SETTID is currently implemented right after a TID is assigned. There's a lot of clone left to go at that point including a check for pending signals which can lead to clone failing. This leaves a TID in NPTL's thread list which doesn't correspond to a thread. I found Sunday another place where this is a problem, besides the process-global UID stuff in glibc. GDB tries to attach to the nonexistant thread and gets upset. I've made it cope, but at the same time it provides a convenient test case. Without the attached patch, tls.exp in the GDB testsuite would intermittently report that it could not attach to a thread - always within half an hour. With the patch it ran for four hours without a problem. kernel/fork.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) Index: linux-source-2.6.18/kernel/fork.c =================================================================== --- linux-source-2.6.18.orig/kernel/fork.c 2007-01-02 13:45:28.000000000 -0500 +++ linux-source-2.6.18/kernel/fork.c 2007-01-02 13:52:09.000000000 -0500 @@ -1012,10 +1012,6 @@ static struct task_struct *copy_process( delayacct_tsk_init(p); /* Must remain after dup_task_struct() */ copy_flags(clone_flags, p); p->pid = pid; - retval = -EFAULT; - if (clone_flags & CLONE_PARENT_SETTID) - if (put_user(p->pid, parent_tidptr)) - goto bad_fork_cleanup_delays_binfmt; INIT_LIST_HEAD(&p->children); INIT_LIST_HEAD(&p->sibling); @@ -1251,6 +1247,14 @@ static struct task_struct *copy_process( total_forks++; spin_unlock(¤t->sighand->siglock); write_unlock_irq(&tasklist_lock); + + /* + * Now that we know the fork has succeeded, record the new + * TID. It's too late to back out if this fails. + */ + if (clone_flags & CLONE_PARENT_SETTID) + put_user(p->pid, parent_tidptr); + proc_fork_connector(p); return p; @@ -1281,7 +1285,6 @@ bad_fork_cleanup_policy: bad_fork_cleanup_cpuset: #endif cpuset_exit(p); -bad_fork_cleanup_delays_binfmt: delayacct_tsk_free(p); if (p->binfmt) module_put(p->binfmt->module); -- Daniel Jacobowitz CodeSourcery -