Commit 296291cd ("mm: make sendfile(2) killable") causes docker#18180, in which some processes hang in a zombie-like state when they run on Docker with AUFS.
With commit 296291cd, mm/filemap.c:generic_perform_write() can keep returning -EINTR indefinitely, which fs/aufs/xino.c:do_xino_fwrite() cannot tolerate:
static ssize_t do_xino_fwrite(vfs_writef_t func, struct file *file, void *kbuf,
                              size_t size, loff_t *pos)
{
        ...
        do {
                /* cannot escape from this loop
                   when func returns -EINTR infinitely! */
                err = func(file, buf.u, size, pos);
        } while (err == -EAGAIN || err == -EINTR);
        ...
}
As do_xino_fwrite() loops infinitely, zap_pid_ns_processes() (executed in another LWP) cannot return from schedule() when running on a single processor.
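The retry problem can be reproduced in miniature in userspace. The following is only a sketch, not AUFS or kernel code: fake_fatal_signal_pending, fake_write(), retry_blindly() and retry_until_fatal() are hypothetical names, and fake_write() merely assumes that the write path fails with -EINTR for as long as a fatal signal is pending. The max_tries bound exists only so that the demonstration terminates; the loop quoted above has no such bound. retry_until_fatal() shows one possible way a caller could tolerate the new behavior, which is not necessarily how AUFS handles it.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical stand-in for a "fatal signal is pending" check. */
static bool fake_fatal_signal_pending = false;

/*
 * Hypothetical stand-in for the write function passed as func.
 * Assumption: while a fatal signal is pending, every call fails with
 * -EINTR; otherwise the whole buffer is reported as written.
 */
static ssize_t fake_write(size_t size)
{
    if (fake_fatal_signal_pending)
        return -EINTR;
    return (ssize_t)size;
}

/*
 * Retry loop in the style of do_xino_fwrite(). It retries on -EAGAIN and
 * -EINTR; max_tries is an artificial bound added for this demo only.
 */
static ssize_t retry_blindly(size_t size, int max_tries, int *tries)
{
    ssize_t err;

    *tries = 0;
    do {
        err = fake_write(size);
        (*tries)++;
    } while ((err == -EAGAIN || err == -EINTR) && *tries < max_tries);
    return err;
}

/*
 * One possible tolerant variant: stop retrying as soon as the pending
 * signal is fatal, and hand the error back to the caller.
 */
static ssize_t retry_until_fatal(size_t size, int *tries)
{
    ssize_t err;

    *tries = 0;
    do {
        err = fake_write(size);
        (*tries)++;
    } while ((err == -EAGAIN || err == -EINTR) && !fake_fatal_signal_pending);
    return err;
}
```

With fake_fatal_signal_pending set to true, retry_blindly() uses up every allowed attempt and still returns -EINTR, while retry_until_fatal() gives up after the first attempt.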
Although AUFS has not been merged upstream, I think generic_perform_write() should keep its original behavior.
I made a Docker container for ease of debugging: https://github.com/AkihiroSuda/test18180/tree/v0.0.1
$ docker run -it --rm akihirosuda/test18180
[INFO] Checking whether hitting docker#18180.
<-- hangs up here with commit 296291cd
[INFO] OK. not hitting docker#18180.
[INFO] Checking whether sendfile(2) is killable.
[INFO] If the container hangs up here, you are still facing the bug that linux@296291cd tried to fix.
<-- hangs up here without commit 296291cd
[INFO] OK. sendfile(2) is killable.
<-- No kernel can reach here
AUFS is going to support commit 296291cd, so I suggest closing this ticket.