Commit 296291cd ("mm: make sendfile(2) killable") has been causing docker#18180, in which some processes hang in a zombie-like state when they run on Docker with AUFS.

https://github.com/torvalds/linux/commit/296291cd
https://github.com/docker/docker/issues/18180#issuecomment-167042078

Commit 296291cd produces an infinite -EINTR loop in mm/filemap.c:generic_perform_write(), which fs/aufs/xino.c:do_xino_fwrite() cannot tolerate:

    static ssize_t do_xino_fwrite(vfs_writef_t func, struct file *file, void *kbuf,
                                  size_t size, loff_t *pos)
    {
    ..
            do {
                    /* cannot escape from this loop when func returns -EINTR infinitely! */
                    err = func(file, buf.u, size, pos);
            } while (err == -EAGAIN || err == -EINTR);
    ..
    }

Because do_xino_fwrite() loops forever, zap_pid_ns_processes() (executed in another LWP) cannot return from schedule() when running on a single processor.

Although AUFS has not been merged upstream, I think generic_perform_write() should keep its original behavior.

I made a Docker container for ease of debugging:
https://github.com/AkihiroSuda/test18180/tree/v0.0.1

    $ docker run -it --rm akihirosuda/test18180
    [INFO] Checking whether hitting docker#18180.    <-- hangs up here with commit 296291cd
    [INFO] OK. not hitting docker#18180.
    [INFO] Checking whether sendfile(2) is killable.
    [INFO] If the container hangs up here, you are still facing the bug that linux@296291cd tried to fix.    <-- hangs up here without commit 296291cd
    [INFO] OK. sendfile(2) is killable.    <-- No kernel can reach here
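
To make the failure mode concrete, here is a minimal user-space model of the retry loop. It is only a sketch and not the AUFS code or its eventual fix: `writef_t`, `always_eintr`, `eintr_twice_then_ok`, `fake_fatal_signal_pending`, and `xino_fwrite_sketch` are all hypothetical names. The flag stands in for a check such as the kernel's `fatal_signal_pending(current)`, and the assumption is that a caller which stops retrying on -EINTR once a fatal signal is pending would no longer spin.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical, simplified stand-in for vfs_writef_t (user space only). */
typedef ssize_t (*writef_t)(void *ctx, const void *buf, size_t size);

/* Hypothetical stand-in for fatal_signal_pending(current). */
static bool fake_fatal_signal_pending = false;

/* Models a write path that returns -EINTR on every call,
 * as described above for a task with a pending fatal signal.
 * ctx points to a call counter. */
static ssize_t always_eintr(void *ctx, const void *buf, size_t size)
{
    (void)buf;
    (void)size;
    ++*(unsigned long *)ctx;
    return -EINTR;
}

/* Models a transient interruption: -EINTR twice, then success. */
static ssize_t eintr_twice_then_ok(void *ctx, const void *buf, size_t size)
{
    unsigned long *calls = ctx;
    (void)buf;
    return (++*calls <= 2) ? -EINTR : (ssize_t)size;
}

/* Same loop shape as do_xino_fwrite(), plus one assumed escape hatch:
 * stop retrying on -EINTR when a fatal signal is pending. */
static ssize_t xino_fwrite_sketch(writef_t func, void *ctx,
                                  const void *buf, size_t size)
{
    ssize_t err;

    assert(func != NULL);
    do {
        err = func(ctx, buf, size);
        if (err == -EINTR && fake_fatal_signal_pending)
            break; /* without this check, always_eintr never terminates */
    } while (err == -EAGAIN || err == -EINTR);

    return err;
}
```

With `fake_fatal_signal_pending` left false, calling `xino_fwrite_sketch()` with `always_eintr` spins forever, which mirrors the hang reported here; transient interruptions such as `eintr_twice_then_ok` are still retried until they succeed.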
AUFS is going to support commit 296291cd, so I suggest closing this ticket.

http://article.gmane.org/gmane.linux.file-systems.aufs.user/5343