Bug 13355 - Core pattern piping race
Summary: Core pattern piping race
Status: RESOLVED CODE_FIX
Alias: None
Product: Other
Classification: Unclassified
Component: Other
Hardware: All
OS: Linux
Importance: P1 normal
Assignee: Neil Horman
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-05-21 15:27 UTC by Earl Chew
Modified: 2010-01-07 11:47 UTC
CC List: 2 users

See Also:
Kernel Version: 2.6.29.4
Subsystem:
Regression: No
Bisected commit-id:


Attachments
patch to block core_pattern dumping until the dump application exits (2.06 KB, patch)
2009-05-22 14:32 UTC, Neil Horman
kmod.c patch against older implementation (1.02 KB, patch)
2009-05-22 23:27 UTC, Earl Chew
new patch (8.66 KB, patch)
2009-06-05 01:53 UTC, Neil Horman
new version of core_pattern patch (10.53 KB, patch)
2009-06-16 19:36 UTC, Neil Horman
Minor edit to allow core_pipe_limit to be accessible. (408 bytes, patch)
2009-06-19 16:58 UTC, Earl Chew

Description Earl Chew 2009-05-21 15:27:12 UTC
In:

http://lwn.net/Articles/280959/

there is a sample core pipe receiver which does this:

>            /* Change our current working directory to that of the
>               crashing process */
>
>            snprintf(cwd, PATH_MAX, "/proc/%s/cwd", argv[1]);
>            chdir(cwd);

My inspection of the patches suggests that there is no
guarantee that the failing process is still available
when the pipe receiver activates and works on the core file.

Some experiments that I've done seem to confirm this:

a. I have my core pipe receiver simply sleep(30), and I cannot
   find the /proc/pid/ entry for the failing process using
   another interactive shell window.

b. I remove the sleep(30), and my core pipe receiver can find
   /proc/pid/exe.


If my analysis is correct, I'm wondering if there is a straightforward
way to hold off destruction of the failing process until the
core pipe receiver "acknowledges" (somehow) that it has dealt with
the image.

I think this would be useful because it would then allow the receiver
to inspect /proc/pid/* to emit useful things about the failing
process.
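
For reference, a minimal sketch of the receiver used in experiment (a) above
(illustrative only: it assumes core_pattern is set to something like
'|/path/to/receiver %p' so that argv[1] is the crashing pid, and error
handling is omitted):

#include <stdio.h>
#include <unistd.h>
#include <limits.h>

int main(int argc, char **argv)
{
	char path[PATH_MAX], buf[4096];
	ssize_t n;

	if (argc < 2)
		return 1;

	sleep(30);	/* window during which the failing process may be reaped */

	snprintf(path, sizeof(path), "/proc/%s/exe", argv[1]);
	if (access(path, F_OK) == 0)
		fprintf(stderr, "%s still present\n", path);
	else
		fprintf(stderr, "%s already gone\n", path);

	/* drain the core image so the kernel side is not left blocked */
	while ((n = read(STDIN_FILENO, buf, sizeof(buf))) > 0)
		;
	return 0;
}

Dropping the sleep(30) gives experiment (b).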
Comment 1 Alan 2009-05-21 16:19:09 UTC
Using /proc to get the path wouldn't be sufficient anyway - the following could have occurred

core dump begins
mv core-dir old-core-dir
mkdir core-dir

You would need an actual handle to that directory to be sure.

I guess the logical approach would be to start the core dumping thread in the cwd of the core dumped
task if we don't do so already.

This isn't, however, a bug - so really if you want it fixed you need to submit patches upstream.
Comment 2 Neil Horman 2009-05-21 17:21:59 UTC
You miss the point here: this is using the core_pattern infrastructure, and it's the user space core collector reading the pipe of the core dump that might also need access to the crashing process's /proc/<pid> directory.  We need to block the return of do_coredump until such time as the user space application is finished and exits.
Comment 3 Alan 2009-05-21 19:35:09 UTC
I don't think I am but maybe my description is confusing:

At the point the core pattern pipe is executed, the directory the program ran in may not even be attached to the file system. It may be in a different namespace, it may have been deleted, it may be blocked by security. Thus we cannot pass it by name.

If we allowed dead processes to get stuck forever when the core dump handler wedged, mayhem would result, so we can't block without adding a lot of risk.

So you would have to find a way of passing the directory handle somehow and the obvious way would be to make call_usermodehelper_pipe() able to run the helper with the same cwd as the core dumping process.
Comment 4 Neil Horman 2009-05-22 00:19:10 UTC
I think I see our disconnect: the race we're talking about isn't with writing to a directory, it's with reading files in a directory, specifically the /proc/<pid> directory of the crashing process.  That directory can't be moved, detached, etc. while the process is crashing, but when we specify core_pattern as being a pipe to an executable, we may or may not be able to access that crashing process's /proc/<pid> directory.  Whether or not we can depends on the size of the crashing process and the size of the stdin pipe.  If the crashing process is small, and the core image fits into the pipe without blocking, do_coredump in the kernel will complete and clean up the crashing process, removing its proc directory along the way.  Since the contents of that directory can be very useful to a core_pattern receiving process, we'd like to be able to guarantee its accessibility.  I agree that simply blocking on the process completion in the kernel is prone to errors and other dangers, but I've got a few other ideas that I'm playing with at the moment.
Comment 5 Alan 2009-05-22 09:40:59 UTC
The example you posted uses /proc/<pid> to retrieve the cwd - that cwd in /proc is useless so the example is pointless even if you could use /proc/pid reliably.

The core dump pattern formatter can insert things like the pid - because that is the only safe way to get hold of them. What do you need from /proc/pid that can be used in any other context?
Comment 6 Neil Horman 2009-05-22 10:30:24 UTC
I didn't post it, but regardless, the fact that the example may not work doesn't make the info in proc useless.  With proc still around you can access data about the process that's not in the core file: the full path to the executable, what its cwd was (even if those things have moved), what its limits or personality flags were, etc.  Relevant debug information...
Comment 7 Neil Horman 2009-05-22 14:32:34 UTC
Created attachment 21492 [details]
patch to block core_pattern dumping until the dump application exits

Here's a patch for you to try, Earl. I've not tested it yet, but it should do what you need.

For all the reasons Alan has pointed out, and then some, I'm not really thrilled with the idea of blocking like this in do_coredump, but this should prove at least that doing so will keep the crashing process around. 

I'm going to fiddle with the notion of just holding a reference to the proc directory as an alternative (perhaps by setting core_pattern's cwd to /proc/<pid>), but if you could test this in the meantime, that would be great.
Comment 8 Alan 2009-05-22 14:39:14 UTC
So put the relevant bits into the formatting options - which is what we always did before when one turned up that was really useful
Comment 9 Neil Horman 2009-05-22 14:54:52 UTC
That's a lot of bits to put on the command line, but it might be an option, depending on how the above testing turns out.
Comment 10 Alan 2009-05-22 15:07:25 UTC
You only have to pass what is really needed - you can also pass it as environment variables, which might actually be better for compatibility and for size?
Comment 11 Earl Chew 2009-05-22 16:12:24 UTC
(In reply to comment #7)
> I'm going to fiddle with the notion of just holding a reference to the proc
> directory as an alternative (perhaps by setting core_patterns cwd to
> /proc/<pid>), but if you could test this in the meantime, that would be great

Looks promising for our current needs. I'll try to work it into our kernel this afternoon.
Comment 12 Earl Chew 2009-05-22 16:37:03 UTC
(In reply to comment #7)
> For all the reasons Alan has pointed out, and then some, I'm not really
> thrilled with the idea of blocking like this in do_coredump, but this should
> prove at least that doing so will keep the crashing process around. 

I know there is concern about keeping the kernel code blocked for long periods of time. This is understandable and I share that concern.

I think that this is an underlying issue that already exists with the current
code, and I don't see this particular weakness having been discussed (at
least not in the current context). The susceptibility to blocking likely arose
the moment provision was made to allow core dumps to be made anywhere other
than in the file system alongside the executable (i.e. the introduction of
core_pattern was perhaps the turning point).

For example:

a. echo '/dev/ttya' > /proc/sys/kernel/core_pattern

   In other words the writing of the core might be rate limited by the
   target file/device.

b. echo '|/usr/local/bin/slowreaper' >  /proc/sys/kernel/core_pattern

   Suppose slowreaper is something like:

           while (...) {
              sleep(10);
              ch = getchar();
              ...
           }

   Then even the current code will block in the kernel once the pipe
   buffers fill.

Granted these cases are somewhat pathological, but I think not much more so than the case where the wait4(2) takes too long to complete, or perhaps never
completes.

Admittedly all of this is rather theoretical because in practice things
usually turn out "all right". However, the kernel needs to handle lots
of situations, and in particular try to be well behaved when things are
not turning out right.

Some questions:

1. Is it reasonable to require that binfmt->core_dump() complete within
   a bounded amount of time? Perhaps a timeout scaled to the size
   of the executable?

2. Is it reasonable to require that the core_pattern pipe process complete
   within a bounded amount of time (IOW, don't wait4(2) indefinitely)?


I think it is entirely reasonable to allow the core_pattern pipe process
to gather whatever diagnostic information it sees fit. As an
application writer, I see /proc/<pid> as essential in this regard.

I think it is somewhat unwieldy to attempt to foretell what information
might (or might not) be useful and provision that in various formatting
options in do_coredump(). This approach couples do_coredump() more
tightly to modifications and enhancements in /proc/<pid>, and I
think that would not be the best approach.
Comment 13 Neil Horman 2009-05-22 17:03:55 UTC
Alan:
"You only have to pass what is really needed" Thats what I see as the problem, I'm not entirely sure what is really needed, and theres lots of info to choose from.  I know that early users of this interface wanted to interrogate proc so they could gather info on program metadata for use in identifying program versions and releases as part of automatic bug filing utilities.  I don't really know what others wanted to do with it.  If we had a better idea of what subset of data was relevant it might be easier.  As for puttin git in env variables, that was discussed before (ubuntu tried it that way initially).  It was turned down IIRC as being prone to application breakage and aliasing (i.e. if we add a new env variable for relevant data later and programs are using that variable to hold internal information, they break.


Earl:
I think you make a good point regarding the timing of the program.  In fact, that's probably what has let this work for so long (programs are of sufficient size such that they stall the pipe writer in the kernel, giving the core collector time to interrogate proc).

Regarding the use of timeouts, I'm sure that would be safe, but I'm not sure how we synchronize that timeout event with userspace.  I was thinking we might be able to hold an extra ref count on the task struct and prevent proc deregistration until the user space app was done with the process.  Let me know how the test results go and I'll keep working on it.
Comment 14 Earl Chew 2009-05-22 21:56:27 UTC
(In reply to comment #7)
> Created an attachment (id=21492) [details]
> patch to block core_pattern dumping until the dump application exits

I am concerned about this part of the patch:

diff --git a/kernel/kmod.c b/kernel/kmod.c
index b750675..b4de1d4 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -257,6 +257,8 @@ static void __call_usermodehelper(struct work_struct *work)
 		pid = kernel_thread(____call_usermodehelper, sub_info,
 				    CLONE_VFORK | SIGCHLD);
 
+	sub_info->retval = pid;
+
 	switch (wait) {
 	case UMH_NO_WAIT:
 		break;
@@ -264,7 +266,6 @@ static void __call_usermodehelper(struct work_struct *work)
 	case UMH_WAIT_PROC:
 		if (pid > 0)
 			break;
-		sub_info->retval = pid;
 		/* FALLTHROUGH */
 
 	case UMH_WAIT_EXEC:


The code immediately preceding this is:

	/* CLONE_VFORK: wait until the usermode helper has execve'd
	 * successfully We need the data structures to stay around
	 * until that is done.  */
	if (wait == UMH_WAIT_PROC || wait == UMH_NO_WAIT)
		pid = kernel_thread(wait_for_helper, sub_info,
				    CLONE_FS | CLONE_FILES | SIGCHLD);
	else
		pid = kernel_thread(____call_usermodehelper, sub_info,
				    CLONE_VFORK | SIGCHLD);



In the case of UMH_WAIT_EXEC, the else-arm is executed, and there is
code in ____call_usermodehelper() that does:

	retval = kernel_execve(sub_info->path, sub_info->argv, sub_info->envp);

	/* Exec failed? */
	sub_info->retval = retval;
	do_exit(0);


IOW, the thread executing ____call_usermodehelper() writes sub_info->retval,
and I think it will race with your patch.


Consider:

--- kmod.c.orig 2009-05-22 14:09:04.000000000 -0700
+++ kmod.c      2009-05-22 14:55:42.000000000 -0700
@@ -124,6 +124,7 @@
        char **envp;
        enum umh_wait wait;
        int retval;
+       int pid;
        struct file *stdin;
        void (*cleanup)(char **argv, char **envp);
 };
@@ -255,14 +256,15 @@
                pid = kernel_thread(____call_usermodehelper, sub_info,
                                    CLONE_VFORK | SIGCHLD);

+       sub_info->pid = pid;
+
        switch (wait) {
        case UMH_NO_WAIT:
                break;

        case UMH_WAIT_PROC:
-               if (pid > 0)
+               if (pid >= 0)
                        break;
-               sub_info->retval = pid;
                /* FALLTHROUGH */

        case UMH_WAIT_EXEC:
@@ -476,7 +478,10 @@
        if (wait == UMH_NO_WAIT)        /* task has freed sub_info */
                goto unlock;
        wait_for_completion(&done);
-       retval = sub_info->retval;
+       if (wait == UMH_WAIT_EXEC)
+               retval = sub_info->pid;
+       else
+               retval = sub_info->retval;

 out:
        call_usermodehelper_freeinfo(sub_info);



Note that I changed the test for success to (pid >= 0) to complement
the usual test for failure (pid < 0) as per https://lists.linux-foundation.org/pipermail/kernel-janitors/2006-July/015990.html.
Comment 15 Earl Chew 2009-05-22 23:26:30 UTC
(In reply to comment #11)
> Looks promising for our current needs. I'll try to work it into our kernel
> this afternoon.

The outcome looks good.

Our kernel has a slightly older version of kmod.c, so I had to make the
change to kmod.c in an equivalent way. See the attachment.


Here's my preliminary testing with a slow reaper.

# echo '|/tmp/slowreaper %p' > /proc/sys/kernel/core_pattern
# ls -l coredump
-rwxr-xr-x    1 root     root         3064 May 22 15:22 coredump
# ./coredump

Note the small size of the coredump program (essentially kill(getpid(), SIGSEGV)).
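
For completeness, the coredump test program is essentially just this (a sketch; any small program that raises SIGSEGV with the default disposition will do):

#include <sys/types.h>
#include <signal.h>
#include <unistd.h>

int main(void)
{
	/* raise SIGSEGV so the kernel produces a (tiny) core image */
	kill(getpid(), SIGSEGV);
	return 0;
}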

The slowreaper script is:

#!/bin/sh

echo Sleep $1 > /dev/console
sleep 30
echo Wake $1 > /dev/console
ls -l /proc/$1/ > /dev/console


After 30 seconds I see the content of /proc/$1/ on the console.
Comment 16 Earl Chew 2009-05-22 23:27:51 UTC
Created attachment 21497 [details]
kmod.c patch against older implementation
Comment 17 Neil Horman 2009-05-27 00:58:16 UTC
Ok, I've been thinking about this, and it occurred to me that this will actually be a safe solution as long as I fix the other bug you reported, which stops the core_pattern helper from recursing infinitely.  The resource starvation that happens if a core collection process blocks indefinitely isn't much worse than if the pipe is never drained.  I'm going to fix the recursion issue and push this upstream.
Comment 18 Neil Horman 2009-06-04 10:51:27 UTC
Just an update: I've not forgotten about this issue.  I'm working on ways to do crash recursion detection.  I'll post all the changes when I have them working.
Comment 19 Neil Horman 2009-06-05 01:53:52 UTC
Created attachment 21759 [details]
new patch

Here's a new version of the patch.  It revamps the way we detect recursive core dumps, and seems pretty robust in my testing.  It also waits for core_pattern helpers to complete before cleaning up the crashed process, and adds a new sysctl to limit the number of parallel core dumps, to prevent flooding the box with long-running core collection processes.  It works pretty well for me, but I would appreciate some extra testing before I post it for review.  Thanks!
Comment 20 Earl Chew 2009-06-10 16:33:26 UTC
(In reply to comment #19)
> Heres a new version of the patch.

One new section generates a warning about possible recursion:

+			if (!last_exe_file) {
+				printk(KERN_WARNING "Abort on possible recursive core\n");
+				mmput(cdmm);
+				goto fail_fput;
+			}

As far as I can tell, last_exe_file can only be null in this context if pattern_task is null in the following context:

+		if (dump_count == 1) {
+			struct pid *ps = find_get_pid(pid);
+			pattern_task = get_pid_task(ps, PIDTYPE_PID);
+			if (pattern_task) {
+				cdmm = get_task_mm(pattern_task);
+				current_exe = get_mm_exe_file(cdmm);
+				last_exe_file = current_exe;
+				put_task_struct(pattern_task);
+				mmput(cdmm);
+			}
+			put_pid(ps);
+		}


This leaves me a little puzzled as to how the "possible recursive core" can be inferred from !last_exe_file.

Would you elaborate?



Further along:

+fail_dropcount:
+	if (ispipe)
+		atomic_dec(&core_dump_count);


I'm thinking that there should be some code to reset last_exe_file:

       if (ispipe && ! atomic_dec_return(&core_dump_count))
               last_exe_file = NULL;
Comment 21 Neil Horman 2009-06-10 17:39:58 UTC
>This leaves me a little puzzled as to how the "possible recursive core" can be
>inferred from !last_exe_file .

It can't, really.  That particular clause is there as a safety catch.  If something is wrong with the specification of /proc/sys/kernel/core_pattern, and call_usermodehelper_pipe fails, then the core dump which sets dump_count to 1 won't have an exe file struct to point to.  If another cpu is handling a second core in parallel, there's no way to compare the current exe file struct to the last one (since it's still null), but we don't want to go on, because I don't want to accidentally wait forever on a garbage pid.  The real check for recursion is in the else clause of that conditional, in which we check current_exe == last_exe.  I agree that "possible recursion" is a bit misleading, but I couldn't think of anything else concise to say there.  Suggestions for a better error message welcome.

>I'm thinking that there should be some code to reset last_exe_file:
We could, I suppose, but I didn't want to worry about having to lock access to that pointer and slow this path down further.  If we reset it to NULL there, then it's possible that, if we decrement dump count to zero there, a second cpu can immediately bump it back to 1, and we risk NULLing that pointer after the second cpu sets it to a new value.  By just leaving it alone, we can use the counter as the lock.  When we decrement to zero, the next increment will set it to its new value, which effectively prevents parallel writes (since we only do the write in one place, and it's guarded by the fact that only one execution context can see the counter as being 1 at any one time).  There's a chance that, if you have lots of core dumps happening in parallel, and you try to change the core_pattern executable while they're happening, you might get a race where the recursion check will fail inadvertently, but that's what the core_dump_limit sysctl can in part help with.

How has your testing gone?
Comment 22 Earl Chew 2009-06-10 18:02:21 UTC
(In reply to comment #21)
> We could, I suppose, but I didn't want to worry about having to lock access to
> that pointer and slow this path down further.  If we reset it to NULL there,
> then it's possible that, if we decrement dump count to zero there, a second cpu
> can immediately bump it back to 1, and we risk NULLing that pointer after the
> second cpu sets it to a new value.


Yikes. Sorry about that silly suggestion. Not enough coffee this morning ;-)

What I should have written was:

       if (ispipe) {
               if (dump_count == 1)
                   last_exe_file = NULL;
               atomic_dec(&core_dump_count);
       }

If dump_count was one, this thread was privileged enough to be the first, and thus has setting rights on last_exe_file, so it is safe for it to also have resetting rights.


If you do not reset last_exe_file, the predicate !last_exe_file will never be true after the first core dump is taken after boot --- which renders the whole if(!last_exe_file){} rather useless in practical terms.


> There's a chance that, if you have lots of core dumps happening in parallel,
> and you try to change the core_pattern executable while they're happening, you
> might get a race where the recursion check will fail inadvertently, but that's
> what the core_dump_limit sysctl can in part help with.


Yes, there is a reasonable chance that two processes will race to dump core simultaneously.



One of those processes will be the first to dump, and the other the second, as counted by core_dump_count.

The second might get to run this code:


+            if (!last_exe_file) {
+                printk(KERN_WARNING "Abort on possible recursive core\n");
+                mmput(cdmm);
+                goto fail_fput;
+            }


before the first, and so will fail to dump for (apparently) no good reason.


All in all, I think you're better off forgetting about the "possible" clause and going with the simpler:

+		if (dump_count > 1) {
+			cdmm = get_task_mm(current);
+			current_exe = get_mm_exe_file(cdmm);
+			if (last_exe_file && last_exe_file == current_exe) {
+				printk(KERN_WARNING "Abort on recursive core\n");
+				mmput(cdmm);
+				goto fail_fput;
+			}


It appears that get_mm_exe_file() can return NULL (if it's confused?), but I'm not sure what this code should do if that eventuates -- probably nothing. I can't see any other use of current_exe.


> How has your testing gone?


I only started looking at this today with a view of merging your patch, so I don't have any concrete test details to report (apart from these musings upon reviewing your changes).
Comment 23 Neil Horman 2009-06-10 20:20:57 UTC
>If dump_count was one, this thread was privileged enough to be the first, and
>thus has setting rights on last_exe_file, so it is safe for it to also have
>resetting rights.
Unfortunately, that's not the case, because the do_coredump code isn't serialized.  Just because a given context grabs the dump_count = 1 reference doesn't mean that other cpus aren't processing dumps while this one is finishing.  Resetting last_exe_file to NULL there would prevent those parallel dumps from making the recursive crash check.



>before the first, and so will fail to dump for (apparently) no good reason.

That's true; my concern is that the last_exe_file == NULL when dump_count > 1 case is ambiguous in meaning.  Does it mean that a previous attempt to dump failed in forking the user space app, or does it mean that dump_count==2 raced ahead of dump_count==1?  I'm really worried about letting recursive dumps through, although it wouldn't hurt to try the other way.  Feel free to make that null check change if you'd like to test with it.

Let me know how your testing goes; we'll make a final version of the patch based on those tests, and then I'll post.  Thanks!
Comment 24 Earl Chew 2009-06-12 17:09:18 UTC
(In reply to comment #23)
> >If dump_count was one, this thread was privileged enough to be the first, and
> >thus has setting rights on last_exe_file, so it is safe for it to also have
> >resetting rights.
> Unfortunately, that's not the case, because the do_coredump code isn't
> serialized.  Just because a given context grabs the dump_count = 1 reference
> doesn't mean that other cpus aren't processing dumps while this one is
> finishing.  Resetting last_exe_file to NULL there would prevent those parallel
> dumps from making the recursive crash check.

I got dragged off to some other work. I'm still working on the merge and test.

In the meantime, I thought of another approach to detect recursion which is more general and robust.

Suppose we add the ability to mark a process as being a core dumper. Say introduce a COREDUMPER attribute to the process. Normally a process is not a core dumper.

When the core_pattern indicates a pipe, that pipe is established and the target process is marked as a core dumper.

The core dumper attribute is sticky and transfers across a fork() and across an exec().

do_coredump() need only check whether the dumpee is a core dumper (i.e. has the COREDUMPER attribute), and if so can take evasive action.

The advantages of this approach are that it can detect likely recursion under many conditions.

The evasive action on detecting that the dumpee is a core dumper is probably to revert to the default "core" action at the very least. Perhaps we can get a little more creative in this regard.
Comment 25 Neil Horman 2009-06-12 18:06:51 UTC
No, I'd rather not.

It's not a bad idea, but to implement it we'd need to introduce a new field to task_struct, and nobody is going to want that (burning space for such a rare use case is going to be frowned upon, and all the existing flags fields are already in use).  What we have in the patch above works well in my testing, and I don't see any holes in it.  Please test it as is, and let me know the results.
Comment 26 Alan 2009-06-12 18:18:06 UTC
Just set the coresize limit to zero for the dumper - that's something that root can override with care if they wish (e.g. to debug dump tools) and needs no new features?
Comment 27 Earl Chew 2009-06-12 18:32:05 UTC
> What we have in the patch above works well in my testing, and
> I don't see any holes in it.  Please test it as is, and let me know the
> results.

Ok.


I understand that there may be objections to using up valuable flag space.
I believe the PF_* flags are the ones in question:

[snip]
#define PF_VCPU		0x00000010	/* I'm a virtual CPU */
#define PF_FORKNOEXEC	0x00000040	/* forked but didn't exec */
#define PF_SUPERPRIV	0x00000100	/* used super-user privileges */
[snip]
#define PF_USED_MATH	0x00002000	/* if unset the fpu must be initialized before use */
#define PF_NOFREEZE	0x00008000	/* this thread should not be frozen */
[snip]
#define PF_THREAD_BOUND	0x04000000	/* Thread bound to specific cpu */
#define PF_MEMPOLICY	0x10000000	/* Non-default NUMA mempolicy */
[snip]

It seems to me that the following flag bits are available:

             0x00000020
             0x00000080
             0x00004000
             0x08000000

Thus whilst there may be a great reluctance to use these, I don't think it's true to say that they're all used up.
Comment 28 Earl Chew 2009-06-12 18:45:04 UTC
(In reply to comment #26)
> Just set the coresize limit to zero for the dumper - thats something that
> root
> can override with care if they wish (eg to debug dump tools) and needs no new
> features ?

I don't think this will work out of the box.


IIRC the current strategy is to ignore the coresize limit if a pipe (i.e. |) is seen in the core_pattern. In this case, do_coredump() proceeds to launch the process and to pipe in the core dump.


I'm ok with changing the semantics so that the piped core_pattern proceeds if the coresize limit is non-zero. This would not be a problem for the application(s) I have in mind. Others might have differing opinions.
Comment 29 Neil Horman 2009-06-12 19:56:06 UTC
Earl:
     Yes, you are correct, they aren't completely exhausted, technically, but I'm still very hesitant to burn a flag on indicating that a process can't recurse through do_coredump.  It's just such a rare condition, and I really think that doing so isn't needed.  The code I have as it stands now catches recursion in all the testing that I've done.  About the only case it doesn't catch is the case in which a core dump helper forks a child and that segfaults, which the core dump limit sysctl catches.  I'd really like to get some more robust testing on what we have currently, please.

Alan:
    This is actually a good point.  We already set RLIMIT_CORE to 0 in ____call_usermodehelper, so that part is taken care of.  We are currently ignoring the RLIMIT_CORE setting when a pipe is used in do_coredump, though, because a partial core to a usermodehelper isn't very useful.  It's that ignoring of RLIMIT_CORE that opens us to this recursion problem.  If we were to start honoring the core limit we could avoid the recursion issue altogether, and the price would be the need to manually set ulimit -c unlimited on any process that was to use a piped core helper.  IIRC the initial argument against this was that it required the co-ordination of each process in the system to make use of this facility.  Perhaps it's not too high a price to pay...

Thoughts?
Comment 30 Alan 2009-06-15 16:54:36 UTC
Interpreting 0 as "no dump" for pipe cases as well would probably not be unreasonable - the length has no meaning in a pipe but the 0 still implies a limit.
Comment 31 Neil Horman 2009-06-15 19:31:29 UTC
Yeah, agreed; that won't make for too pleasant a backport to many distros, but whatever, it's sounding more and more like the most sane solution.  I'll rewrite this in the next few days to use RLIMIT_CORE as the protection mechanism.
Comment 32 Neil Horman 2009-06-16 19:36:02 UTC
Created attachment 21946 [details]
new version of core_pattern patch

Ok, here's a new variant of the patch.  As we've been discussing, it takes a new approach to recursion detection.  Instead of actually looking for recursive cores, we simply fail to dump any core for a process whose RLIMIT_CORE value is 0.  Since the usermodehelper process sets that value, we don't recurse.  Of course we could if the process in user space changed its rlim, but it runs as root, so it's allowed to do stupid things.  Process waiting and the core limit sysctl are in this version of the patch as well (details in the patch itself).  Earl, this patch has worked on my test system here quite well.  Please provide feedback ASAP, and I'll get it posted for review.  Thanks!
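
For anyone testing this, a quick sanity check of the new behaviour can be done with a small program like the following sketch (the expectation, under the approach described above, is that no pipe helper is launched for this process because its RLIMIT_CORE is 0):

#include <sys/resource.h>
#include <sys/types.h>
#include <signal.h>
#include <unistd.h>

int main(void)
{
	struct rlimit rl = { 0, 0 };

	/* same state ____call_usermodehelper leaves the helper in */
	setrlimit(RLIMIT_CORE, &rl);

	/* crash; with the patch applied this should not invoke the pipe helper */
	kill(getpid(), SIGSEGV);
	return 0;
}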
Comment 33 Earl Chew 2009-06-19 16:58:57 UTC
Created attachment 22009 [details]
Minor edit to allow core_pipe_limit to be accessible.

I had to make this modification to have the core_pipe_limit readable on my kernel.
Comment 34 Neil Horman 2009-06-19 17:34:11 UTC
Yep, thanks for that.  Is the testing going well?
Comment 35 Earl Chew 2009-06-19 17:36:31 UTC
(In reply to comment #34)
> Yep, thanks for that.  Is the testing going well?

Yes. No other anomalies to report so far. I'll attempt to stress
some of the corner cases this afternoon after I get some other
kernel code review snippets out to my colleagues.

Earl
Comment 36 Neil Horman 2009-06-19 17:53:04 UTC
Copy that. If I don't hear from you on this over the weekend, I'll assume all is well and post it Monday late morning.  Thanks!
