Bug 43300

Summary: ::prctl(PR_SET_PDEATHSIG, SIGKILL) uses parent thread death to signal instead of parent process death.
Product: Documentation Reporter: David Wilcox (davidvsthegiant)
Component: man-pagesAssignee: documentation_man-pages (documentation_man-pages)
Status: RESOLVED DOCUMENTED    
Severity: normal CC: alan, AlphaEtaPi, filbranden, hans, hansecke, j, mtk.manpages, oleg
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: - Subsystem:
Regression: No Bisected commit-id:
Attachments: Test case with parameters and signal handler
Patch to forget_original_parent() on kernel/exit.c
Updated test case calling pthread_exit() from main thread
git commit
Test case illustrating pdeath_signal losing effect if thread exits before parent
Test case illustrating prctl(PR_GET_PDEATHSIG) from a thread
Patch moving pdeath_signal from task_struct to signal_struct

Description David Wilcox 2012-05-25 21:50:40 UTC
I have a process that is forking to a child process. The child process should not exist if the parent process exists. So, I call ::prctl(PR_SET_PDEATHSIG, SIGKILL) in the child process to kill it if the parent dies. What ends up happening is the parent thread calls pthread_exit, and that thread ends up being the catalyst that kills the child process.

Steps to Reproduce:

Here is my code:

parent.cpp:

#include <sys/prctl.h>
#include <signal.h>
#include <unistd.h>
#include <pthread.h>
#include <iostream>

void* run(void* ptr) {

    std::cout << "thread:" << getpid() << ":" << std::hex << pthread_self() << ":" << std::dec << getppid() << std::endl;
    auto pid = fork();
    if ( pid != 0 ) {
        sleep(1);
    }
    else {
        char* arg = NULL;
        execv("./child", &arg);
    }
    return NULL;
}

int main() {

    std::cout << "main:" << getpid() << ":" << std::hex << pthread_self() << ":" << std::dec << getppid() << std::endl;

    pthread_t threadid;
    pthread_attr_t attr;

    ::pthread_attr_init( &attr );
    ::pthread_attr_setdetachstate( &attr, PTHREAD_CREATE_DETACHED );
    ::pthread_create(&threadid,&attr,run,NULL);

    sleep(6);

    return 0;
}


child.cpp:

#include <sys/prctl.h>
#include <signal.h>
#include <unistd.h>
#include <iostream>

int main() {
    std::cout << "child:" << getpid() << ":" << std::hex << pthread_self() << ":" << std::dec << getppid() << std::endl;
    ::prctl( PR_SET_PDEATHSIG, SIGKILL );
    sleep(6);


    return 0;
}

Compile parent.cpp and child.cpp:

$ g++ ../parent.cpp -lpthread -o parent
$ g++ ../child_main.cpp -lpthread -o child

Run parent while at the same time grepping the status of child.

Run this command in one terminal:

$ for i in {1..10000}; do ps aux | grep child ; sleep .5; done

Run parent.

$ ./parent

Notice the output of the grep. After one second, the child goes defunct. The child process is receiving a SIGKILL because the parent thread dies after one second. It sleeps for one second after the fork. I would expect the child not to go defunct and to stay alive since the parent process is still alive (although the thread that forked is not anymore).

Remove the call to prctl in child.cpp and run the test again. Notice that the child does not go defunct.

I'm running Ubuntu 12.04 with the default installation. I have verified that this same problem occurs on Centos 4.

This man page seems to indicate that it should be called with the parent process dies, not the parent thread.http://www.kernel.org/doc/man-pages/online/pages/man2/prctl.2.html

PR_SET_PDEATHSIG (since Linux 2.1.57)
              Set the parent process death signal of the calling process to arg2
              (either a signal value in the range 1..maxsig, or 0 to clear).       This is the signal that the calling process will get when its parent dies.
              This value is cleared for the child of a fork(2).

Is there any way to make prctl send the signal with the parent PROCESS dies, and not the parent THREAD?
Comment 1 Filipe Brandenburger 2012-05-28 06:00:06 UTC
Created attachment 73437 [details]
Test case with parameters and signal handler

Hi,

I wrote a test case with some changes to make it easier to reproduce and watch the issue. Some of the changes are:
- print start/end events from execution so it can be traced without extra shell
- parameters to specify how long to sleep in main/thread/child
- parameter to decide whether main or thread spawns the child
- send SIGTERM instead of SIGKILL, add a handler to SIGTERM on child
- don't exec an external binary, just run code from same binary

I tried the test script under kernel 3.4.0 from Linus' git repository as of commit 1e2aec8... and this was the output:

[filbranden@rawhide bug43300]$ uname -a
Linux rawhide.lan 3.4.0+ #1 SMP Sun May 27 12:30:10 EDT 2012 i686 i686 i386 GNU/Linux
[filbranden@rawhide bug43300]$ ./bug43300 4 2 6 thread
0: main: pid=29386 self=0xb77ae900 ppid=29328
0: thread: pid=29386 self=0xb77adb40 ppid=29328
0: child: pid=29388 self=0xb77adb40 ppid=29386
2: thread: slept 2 seconds, exiting now.
2: child: received SIGTERM, exiting now.
4: main: slept 4 seconds, exiting now.

As you can see, "child" is receiving a SIGTERM at second 2 right after "thread" exits, which reproduces the bug clearly.

Cheers,
Fil
Comment 2 Filipe Brandenburger 2012-05-28 06:11:14 UTC
Created attachment 73438 [details]
Patch to forget_original_parent() on kernel/exit.c

This patch seems to fix the problem.

When a thread forks a child process it ends up in the "children" list of the task corresponding to the thread and not the thread group leader. When the thread dies, exit_notify() is called, which will end up calling forget_original_parent() in order to reset the parent of its children processes.

In cases where forget_original_parent() is called for a thread of a process which still has other threads running, then find_new_reaper() will pick another thread in that same process and will set it as the new reaper for the children processes.

By checking whether the new reaper is in a different thread group than the old father (by checking whether they tgid's match) we can detect whether this was just a single thread of a process that still has other running threads that exited, or if it was really the parent process that exited (and now the child is being inherited by "init".)

Here's a run of the test case under the patched kernel:

[filbranden@rawhide bug43300]$ uname -a
Linux rawhide.lan 3.4.0-bug43300+ #2 SMP Mon May 28 00:42:24 EDT 2012 i686 i686 i386 GNU/Linux
[filbranden@rawhide bug43300]$ ./bug43300 4 2 6 thread
0: main: pid=1458 self=0xb7727900 ppid=1400
0: thread: pid=1458 self=0xb7726b40 ppid=1400
0: child: pid=1460 self=0xb7726b40 ppid=1458
2: thread: slept 2 seconds, exiting now.
4: main: slept 4 seconds, exiting now.
4: child: received SIGTERM, exiting now.

Even though the child is being spawned by the thread, it's only receiving SIGTERM when the main thread exits.

I might be missing some corner cases here. I thought of the case of when either process is being ptrace'd, not sure how prctl should behave in that case, I still couldn't come up with a test case for that but will try to do that.

I also tried to think of whether tgid can change or still be different between threads of the same process. (What if the task with tid = tgid is exiting but the other threads are still running? Is that case even valid? Will the tgid be updated in that case?) I could not come up with a valid test case for this case either, I'll look deeper into it.

I hope you find this patch useful. I'm just getting started with contributing patches to the Linux kernel, I'll have another look at the corner cases I pointed out and will look into the process of having this submitted to the official kernel.

Cheers,
Fil
Comment 3 Filipe Brandenburger 2012-05-29 00:21:31 UTC
Created attachment 73450 [details]
Updated test case calling pthread_exit() from main thread

I updated the test case to include the possibility of calling pthread_exit() from the main thread, by doing that and specifying a longer sleep for thread than for main, it's possible that the process will still be alive while the main thread will not (it's in some kind of zombie state.)

It seems that the unpatched kernel seems to have the same issue on that specific case, main forks child but main calls pthread_exit() before thread is done:

[filbranden@rawhide bug43300]$ uname -a
Linux rawhide.lan 3.4.0+ #1 SMP Sun May 27 12:30:10 EDT 2012 i686 i686 i386
GNU/Linux
[filbranden@rawhide bug43300]$ ./bug43300 2 4 6 main_pthread_exit
0: main: pid=6175 self=0xb76626c0 ppid=14927
0: thread: pid=6175 self=0xb7661b70 ppid=14927
0: child: pid=6177 self=0xb76626c0 ppid=6175
2: main: slept 2 seconds, exiting now.
2: child: received SIGTERM, exiting now.
4: thread: slept 4 seconds, exiting now.


On the other hand, the patch seems to correctly take care of this case as well, as in that case the main thread will call forget_original_parent() but the new reaper will be the thread it started but both of them have the same tgid so the prctl signal will not trigger (yet):

[filbranden@rawhide bug43300]$ uname -a
Linux rawhide.lan 3.4.0-bug43300+ #2 SMP Mon May 28 00:42:24 EDT 2012 i686 i686 i386 GNU/Linux
[filbranden@rawhide bug43300]$ ./bug43300 2 4 6 main_pthread_exit
0: main: pid=11881 self=0xb77bd900 ppid=1893
0: child: pid=11883 self=0xb77bd900 ppid=11881
0: thread: pid=11881 self=0xb77bcb40 ppid=1893
2: main: slept 2 seconds, exiting now.
4: thread: slept 4 seconds, exiting now.
4: child: received SIGTERM, exiting now.


I also checked that the tgid doesn't change in that case, in the case above it will be 11881 which is the pid of the main thread. That's probably why the task_struct for it is kept around in that case and it goes into a zombie or zombie-like state as it isn't running any code. The parent will not collect it until all the other threads finish, so that should be the behavior of prctl as well.
Comment 4 Filipe Brandenburger 2012-05-29 00:29:07 UTC
I also tested the ptrace case, even though it wouldn't make sense for it to change anything as, although it messes with ->parent or ->real_parent (don't recall which), it will not change the ->children list which is the one that is traversed when checking whether the prctl signal must be sent.

I tried both cases, first running everything under strace:

[filbranden@rawhide bug43300]$ strace -f -o /tmp/s.txt ./bug43300 4 2 6 thread

And also starting the script on one window with some longer sleeps, then copying the pid of the child and pasting it on another window as a parameter to "strace -p", in order to have only the child ptrace'd but not the parent. In any of the two cases, it didn't change anything (whether on the patched or unpatched kernel.)

I'm fairly confident with the patch at this point. Two things:

1) Not 100% sure whether comparing ->tgid is the best, another alternative I looked into was comparing ->group_leader instead, not sure whether that would make any difference though. I'll try to check whether a similar comparison is done elsewhere in the kernel code and I'll try to be consistent with it if I find it.

2) I'd add an unlikely(...) to that if condition, since I believe both t->pdeath_signal being set and the case where the child was forked from a thread other than the main thread are unlikely to happen.

None of those two are biggies though and the patch seems to work fine as it is. I'll review the documentation on submitting patches, I'll turn it into a properly documented git commit and I'll submit it to the mailing list.

Any comments before I do that?

Thanks,
Fil
Comment 5 Filipe Brandenburger 2012-05-29 04:58:37 UTC
Created attachment 73453 [details]
git commit

Here's a git commit with a patch and commit message.

I found the same_thread_group() inline function which compares the ->tgid of two task_struct's so I used that function instead of just comparing the fields myself.

Patch was checked with scripts/checkpatch.pl

If there's no objections I'll send it to the maintainers tomorrow.

Thanks,
Fil
Comment 6 David Wilcox 2012-05-29 15:54:42 UTC
Filipe,

It sounds like you have a good handle on the bug and a fix for it. I have no additional comments.
Comment 7 Filipe Brandenburger 2012-06-07 04:03:47 UTC
Created attachment 73516 [details]
Test case illustrating pdeath_signal losing effect if thread exits before parent

Compile with:
$ gcc -o bug43300-test2 bug43300-test2.c -lpthread

To illustrate the issue, run:
$ ./bug43300-test2 4 6 2

The thread of the child process that sets prctl(PR_SET_PDEATHSIG) exits in 2 seconds, later on when the parent exits after 4 seconds the main thread doesn't receive SIGTERM as pdeath_signal was assigned only to the thread (that no longer exists.)

On the unpatched kernel:

$ ./bug43300-test2 4 6 2
0: main: pid=27488 self=0xb76ce6c0 ppid=27002
0: child: pid=27489 self=0xb76ce6c0 ppid=27488
0: child_thread: pid=27489 self=0xb76cdb70 ppid=27488
2: child_thread: slept 2 seconds, exiting now.
4: main: slept 4 seconds, exiting now.
6: child: slept 6 seconds, pthread_exiting now.

Notice there's no SIGTERM received here. The original patch doesn't fix this issue, the patch I'll attach soon (which moves pdeath_signal to signal_struct) does solve this issue.
Comment 8 Filipe Brandenburger 2012-06-07 04:06:09 UTC
Created attachment 73517 [details]
Test case illustrating prctl(PR_GET_PDEATHSIG) from a thread

Another test case, build with:
$ gcc -o bug43300-test4 bug43300-test4.c -lpthread

Run with just:
$ ./bug43300-test4

On the unpatched kernel, calling prctl(PR_GET_PDEATHSIG, &sig) from a thread of the main process returns 0 even though it was set on another thread before:

$ ./bug43300-test4 
main: setting prctl(PR_SET_PDEATHSIG, SIGTERM)
main: prctl(PR_GET_PDEATHSIG) = 15
thread: prctl(PR_GET_PDEATHSIG) = 0
child: prctl(PR_GET_PDEATHSIG) = 0

The original patch doesn't fix this issue, the new patch I'm about to attach does.
Comment 9 Filipe Brandenburger 2012-06-07 04:12:24 UTC
Created attachment 73518 [details]
Patch moving pdeath_signal from task_struct to signal_struct

As discussed with Oleg Nesterov, this patch moves pdeath_signal from task_struct to signal_struct which makes it behave a bit more sanely on cases where there are multi-threaded processes.

It passes the three test cases attached to this bug. Here is the output of running them on a kernel that includes the patch:

$ ./bug43300 4 2 6 thread
0: main: pid=1991 self=0xb778c900 ppid=1934
0: thread: pid=1991 self=0xb778bb40 ppid=1934
0: child: pid=1993 self=0xb778bb40 ppid=1991
2: thread: slept 2 seconds, exiting now.
4: main: slept 4 seconds, exiting now.
4: child: received SIGTERM, exiting now.

$ ./bug43300-test2 4 6 2
0: main: pid=2055 self=0xb7731900 ppid=1998
0: child: pid=2056 self=0xb7731900 ppid=2055
0: child_thread: pid=2056 self=0xb7730b40 ppid=2055
2: child_thread: slept 2 seconds, exiting now.
4: main: slept 4 seconds, exiting now.
4: child_thread: self=0xb7731900: received SIGTERM, exiting now.

$ ./bug43300-test4
main: setting prctl(PR_SET_PDEATHSIG, SIGTERM)
main: prctl(PR_GET_PDEATHSIG) = 15
thread: prctl(PR_GET_PDEATHSIG) = 15
child: prctl(PR_GET_PDEATHSIG) = 0
Comment 10 Adam H. Peterson 2012-06-07 18:26:17 UTC
This fix would be very helpful to one of the main projects we're working on at the company I work for (which I won't name, but you've heard of it).  Our current workaround is to kludge the parent process so that worker threads survive for the lifetime of the process (even if they have no more useful work to do), to prevent child processes from being signaled prematurely.

Here's a side thought about the behavior change.  Will this fix make it so that the signal will be delivered to the process instead of a specific thread?  I'm guessing yes.  For our project, that won't make a significant difference, because we're registering from the main thread for a SIGKILL (or a SIGUSR1 which logs and then raises a SIGKILL to itself) and any active thread would handle that the same way.  I'm guessing it won't matter to most other current users of PDEATH_SIGNAL because the interaction with threading is confusing enough that threaded applications have probably avoided using this facility.
Comment 11 Oleg Nesterov 2012-06-08 17:23:45 UTC
(In reply to comment #9)
?
> As discussed with Oleg Nesterov, this patch moves pdeath_signal from
> task_struct to signal_struct which makes it behave a bit more sanely on cases
> where there are multi-threaded processes.

Looks correct at first glance, please send it to lkml.

Not sure we can change this historical behaviour, this
was already discussed before. But we can try again.
Comment 12 Oleg Nesterov 2012-06-08 17:41:32 UTC
(In reply to comment #11)
>
> Not sure we can change this historical behaviour, this
> was already discussed before. But we can try again.

I've found one (but not the first) discussion about this
oddity, see http://marc.info/?l=linux-kernel&m=117622718413276
and the whole thread.
Comment 13 Filipe Brandenburger 2012-06-13 03:40:51 UTC
Updated patch sent to lkml, thread at:
http://thread.gmane.org/gmane.linux.kernel/1305164

David Wilcox and Adam Peterson, should I cc you on that thread so you can explain the reasons why you're for this change?

Thanks,
Filipe
Comment 14 Adam H. Peterson 2012-06-15 20:46:23 UTC
(In reply to comment #13)
> David Wilcox and Adam Peterson, should I cc you on that thread so you can
> explain the reasons why you're for this change?

I would be great with that.  I'll also go over them here:

In our project, this change would be valuable because we fork-exec worker processes as helpers to our main daemon, which is currently multithreaded.  We use the death signal so that when the main daemon exits, the workers will exit as well, since the work they are doing is useless in the absence of the main daemon.  Our main daemon happens to be multithreaded, but the worker processes do not care about the thread contexts within the main daemon; they service the daemon as a whole.  Some worker processes are started in a pool, but others are created on demand (e.g., in response to an Ice call).  Consequently, they are not always created by the main daemon's main thread.

Once a worker is created, it becomes part of the pool until the main daemon tells it to shut down (or until the main daemon itself ends, either via a shutdown sequence or a signal).  We use the death signal so that we can terminate the main daemon (whether through a normal shutdown, a TERM, or a KILL), and not have to run a second command to clean up the worker processes.  Cleaning up the worker processes manually becomes particularly tricky once a new instance of the main daemon has started and has created more workers.

Under the current behavior, we must either keep all threads alive after creation and put threads that have outlived their purpose to sleep if they might have forked a worker process, or we must forego using the death signal and find another way to clean up the process list when the daemon goes down.  Having the child process poll for the parent process to exit would be a partial solution, but would leave workers running for some time after their work is no longer needed.  This may leave the server in a state of high resource usage for a long time after the main daemon has been restarted (which is not an uncommon situation for the main daemon to be restarted).

Consequently, our current workaround is to prevent all threads of the main process from exiting so that the worker processes won't be signaled prematurely before the main daemon shuts down.

The documentation (at least what I have on my machines) describes the death signal in terms of processes, without mentioning threads, and I also feel that the process-oriented behavior meets a better need than the thread-oriented behavior.  If a process wishes to know when its parent thread dies, it is most likely being developed in conjunction with the parent process code (IMHO the threading state is an internal detail of a process), and it is fairly reasonable for the parent thread to maintain a list of processes to wait/signal before the thread exits.  However, a process receiving a SIGKILL has no chance to forward signals on to running children, and similarly it is (IME) inadvisable/impractical for the process to forward signals in response to SIGABRT or SIGSEGV.  If a process ends, its threads will also end, so the death signal would still be usable for fail-safe signalling of a thread's child processes --- if a thread is in a consistent, responsive state, it can manually signal children, and if it's in a corrupted state, the process is also corrupted and when it aborts, the child processes will then be signaled.
Comment 15 Adam H. Peterson 2012-06-15 20:58:09 UTC
Perhaps this call could me made to act per-thread when the parent is in the same thread-group and per-process if the parent is in a different process.  I believe this would meet both our needs and Albert Calahan's needs.

Alternatively, creating a new prctl setting that is process-oriented and fixing the documentation for the current one would also meet both our needs.  (In any event, the documentation will have to be fixed if the current behavior, in any form, is preserved.)
Comment 16 Filipe Brandenburger 2012-06-15 21:24:07 UTC
David and Adam, thanks a lot for chiming in on the mailing list discussion!
Comment 17 David Wilcox 2015-02-19 17:12:15 UTC
Has this been fixed in the most recent kernel? Can you please give a commit that fixes the problem? The documentation has not been changed to match current functionality.

http://man7.org/linux/man-pages/man2/prctl.2.html
Comment 18 Alan 2015-02-19 19:11:51 UTC
It was closed as OBSOLETE not fixed
Comment 19 David Wilcox 2015-02-19 19:48:11 UTC
Would you mind commenting on how the issue has become obsolete? Is prctl not supported anymore?
Comment 20 Alan 2015-02-19 19:52:41 UTC
Nobody has fixed, nobody cares enough to fix it in several years.
Comment 21 David Wilcox 2015-02-19 20:08:15 UTC
So, even though the documentation does not match actual behavior in supported modules, we're closing it because "nobody cares".

Wow.
Comment 22 Oleg Nesterov 2015-02-19 20:20:04 UTC
On 02/19, bugzilla-daemon@bugzilla.kernel.org wrote:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=43300
>
> --- Comment #21 from David Wilcox <davidvsthegiant@gmail.com> ---
> So, even though the documentation does not match actual behavior in supported
> modules,

Perhaps we should fix the documentation...

> we're closing it because "nobody cares".
>
> Wow.

It is not that nobody cares. We discussed this several times. And yes,
the current behaviour looks just ugly. The problem is, unlikely we can
change it now, this can obviously break the applications which rely on
the fact that pdeath_signal is per-thread.
Comment 23 Oleg Nesterov 2015-02-19 20:26:40 UTC
(In reply to Oleg Nesterov from comment #22)
> 
> It is not that nobody cares. We discussed this several times. And yes,
> the current behaviour looks just ugly. The problem is, unlikely we can
> change it now, this can obviously break the applications which rely on
> the fact that pdeath_signal is per-thread.

Just for example, see http://marc.info/?l=linux-kernel&m=117622288008087
and the whole thread.

And this is not the first discussion. Perhaps my memory fools me, but
it seems to me that I too tried to propose this change many years ago
too, but it was nacked.
Comment 24 Alan 2015-02-19 21:47:01 UTC
Ok so lets move it to documentation if the docs are in error
Comment 25 Michael Kerrisk 2015-05-05 10:15:30 UTC
I've added the following text to prctl(2) under the description of PR_SET_PDEATHSIG:

              Warning:  the  "parent" in this case is considered to be
              the thread that created this process.  In  other  words,
              the  signal  will  be  sent  when that thread terminates
              (via, for example, pthread_exit(3)), rather  than  after
              all of the threads in the parent process terminate.

This suffices, I believe. If not, please reopen the bug.

Thanks,

Michael
Comment 26 Jürg Billeter 2017-09-09 10:37:47 UTC
For your information, I've submitted a patch that introduces PR_SET_PDEATHSIG_PROC to provide this feature without breaking backward compatibility. See https://lkml.org/lkml/2017/9/9/39
Comment 27 hansecke 2017-09-11 12:11:31 UTC
Amazing Jürg, thank you! I've been waiting for this for a long time. It's absolutely vital to making sure my child processes get cleaned up properly.

I'm an outsider to kernel development, but I could not help but notice that nobody has replied to your patch so far. Would it be out of line to send a reply on LKML "please review and merge, this is important"?
Comment 28 Jürg Billeter 2017-09-11 13:29:24 UTC
(In reply to hansecke from comment #27)
> I'm an outsider to kernel development, but I could not help but notice that
> nobody has replied to your patch so far. Would it be out of line to send a
> reply on LKML "please review and merge, this is important"?

It hasn't been long since I've submitted the patch, especially considering the weekend. Too early for a ping, in my opinion. You can reply to the mail if you have something to contribute to the discussion, e.g., a use case that isn't already clear from the patch description. Otherwise I would hold off on that.

BTW, my patch passes the three attached test cases as well (after s/PDEATHSIG/PDEATHSIG_PROC/). One intentional difference is that the setting is inherited across fork to allow killing a whole process subtree without race conditions. Thus, the output of the third test is:

# ./bug43300-test4      
main: setting prctl(PR_SET_PDEATHSIG_PROC, SIGTERM)
main: prctl(PR_GET_PDEATHSIG_PROC) = 15
thread: prctl(PR_GET_PDEATHSIG_PROC) = 15
child: prctl(PR_GET_PDEATHSIG_PROC) = 15
Comment 29 Oleg Nesterov 2017-09-11 17:03:25 UTC
On 09/11, bugzilla-daemon@bugzilla.kernel.org wrote:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=43300
>
> --- Comment #28 from Jürg Billeter (j@bitron.ch) ---
> (In reply to hansecke from comment #27)
> > I'm an outsider to kernel development, but I could not help but notice that
> > nobody has replied to your patch so far. Would it be out of line to send a
> > reply on LKML "please review and merge, this is important"?
>
> It hasn't been long since I've submitted the patch, especially considering
> the
> weekend. Too early for a ping, in my opinion. You can reply to the mail if
> you
> have something to contribute to the discussion, e.g., a use case that isn't
> already clear from the patch description. Otherwise I would hold off on that.

FYI, I can't reply to that email because I am not subscribed to lkml and I was
not cc'ed ;)

I am looking at https://lkml.org/lkml/2017/9/9/39 and at first glance the patch
looks fine... except I do not understand the change in copy_signal().

Oleg.
Comment 30 Jürg Billeter 2017-09-11 17:24:32 UTC
(In reply to Oleg Nesterov from comment #29)
> FYI, I can't reply to that email because I am not subscribed to lkml and I
> was
> not cc'ed ;)

Forwarded as attachment.

> I am looking at https://lkml.org/lkml/2017/9/9/39 and at first glance the
> patch
> looks fine... except I do not understand the change in copy_signal().

Thanks for taking a look. copy_signal() is called on clone()/fork() and the effect of this change is that the setting gets inherited across fork. This allows using PDEATHSIG_PROC with SIGKILL to ensure that there are no stray descendants when the parent dies. If that's not desired, PDEATHSIG_PROC can be cleared after fork.

Combined with no_new_privs (to avoid setuid children) and a seccomp filter (to disallow further changes to PDEATHSIG_PROC) this can also be used for sandboxing.
Comment 31 Oleg Nesterov 2017-09-12 17:14:35 UTC
On 09/11, bugzilla-daemon@bugzilla.kernel.org wrote:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=43300
>
> --- Comment #30 from Jürg Billeter (j@bitron.ch) ---
> (In reply to Oleg Nesterov from comment #29)
> > FYI, I can't reply to that email because I am not subscribed to lkml and I
> > was
> > not cc'ed ;)
>
> Forwarded as attachment.

Thanks,

> > I am looking at https://lkml.org/lkml/2017/9/9/39 and at first glance the
> > patch
> > looks fine... except I do not understand the change in copy_signal().
>
> Thanks for taking a look. copy_signal() is called on clone()/fork() and the
> effect of this change is that the setting gets inherited across fork.

This is clear, but I am not sure this is right... Lets continue this discussion
on lkml.

Oleg.
Comment 32 Jürg Billeter 2018-12-05 17:34:50 UTC
For those that are still interested, I've started another attempt, introducing PR_SET_KILL_DESCENDANTS_ON_EXIT:

https://lore.kernel.org/lkml/20181130080004.23635-1-j@bitron.ch/