Bug 109971 (docker-18180) - Regression in commit 296291cd (mm/filemap.c): Docker hangs up
Summary: Regression in commit 296291cd (mm/filemap.c): Docker hangs up
Status: RESOLVED MOVED
Alias: docker-18180
Product: File System
Classification: Unclassified
Component: VFS
Hardware: All Linux
Importance: P1 high
Assignee: fs_vfs
URL: https://github.com/docker/docker/issu...
Keywords:
Depends on:
Blocks:
 
Reported: 2015-12-25 04:38 UTC by Akihiro Suda
Modified: 2016-01-10 14:37 UTC
CC List: 1 user

See Also:
Kernel Version: 4.1.13
Subsystem:
Regression: Yes
Bisected commit-id:



Description Akihiro Suda 2015-12-25 04:38:43 UTC
Commit 296291cd ("mm: make sendfile(2) killable") causes docker#18180: some processes hang in an odd zombie state when they run on Docker with AUFS.

https://github.com/torvalds/linux/commit/296291cd
https://github.com/docker/docker/issues/18180#issuecomment-167042078

Since commit 296291cd, mm/filemap.c:generic_perform_write() returns -EINTR on every call while a fatal signal is pending. The retry loop in fs/aufs/xino.c:do_xino_fwrite() cannot tolerate this and spins forever:

    static ssize_t do_xino_fwrite(vfs_writef_t func, struct file *file, void *kbuf,
                      size_t size, loff_t *pos)
    {
    ..
        do {
            /* cannot escape from this loop
             * when func keeps returning -EINTR! */
            err = func(file, buf.u, size, pos);
        } while (err == -EAGAIN || err == -EINTR);
    ..
    }
    
Because do_xino_fwrite() loops forever, zap_pid_ns_processes() (executed in another LWP) can never return from schedule() when running on a single processor.
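
For illustration only, here is a minimal user-space sketch of the failure mode and one possible way out of it. It is not the AUFS code and not necessarily the fix AUFS adopted. `fake_fatal_signal_pending`, `fake_write()` and `retry_write()` are hypothetical names: the flag stands in for the kernel's fatal_signal_pending(current), and `fake_write()` stands in for a write path that fails with -EINTR for as long as that flag is set.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for fatal_signal_pending(current). */
static bool fake_fatal_signal_pending;

/* Hypothetical stand-in for a write path that returns -EINTR
 * on every call while a fatal signal is pending. */
static ssize_t fake_write(const void *buf, size_t size)
{
    (void)buf;
    if (fake_fatal_signal_pending)
        return -EINTR;
    return (ssize_t)size;
}

/* Retry loop shaped like the one quoted above, with an added escape:
 * if the failure is -EINTR and the fatal signal is still pending,
 * another attempt cannot succeed, so stop retrying. */
static ssize_t retry_write(const void *buf, size_t size)
{
    ssize_t err;

    do {
        err = fake_write(buf, size);
        if (err == -EINTR && fake_fatal_signal_pending)
            break;
    } while (err == -EAGAIN || err == -EINTR);

    return err;
}
```

Without the `break`, calling `retry_write()` while the flag is set would never return, which mirrors the hang described above. With it, the caller gets -EINTR back and can unwind.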

Although AUFS has not been merged upstream, I think generic_perform_write() should keep its original behavior.


I made a Docker container for ease of debugging: https://github.com/AkihiroSuda/test18180/tree/v0.0.1

    $ docker run -it --rm akihirosuda/test18180
    [INFO] Checking whether hitting docker#18180.
    <-- hangs up here with commit 296291cd
    [INFO] OK. not hitting docker#18180.
    [INFO] Checking whether sendfile(2) is killable.
    [INFO] If the container hangs up here, you are still facing the bug that linux@296291cd tried to fix.
    <-- hangs up here without commit 296291cd
    [INFO] OK. sendfile(2) is killable.
    <-- No kernel can reach here

Comment 1 Akihiro Suda 2015-12-28 17:14:07 UTC
AUFS is going to support commit 296291cd, so I suggest closing this ticket.
http://article.gmane.org/gmane.linux.file-systems.aufs.user/5343
