Bug 197069 - systemd service with ProtectHome=yes causes ELOOP when accessing /home
Summary: systemd service with ProtectHome=yes causes ELOOP when accessing /home
Status: RESOLVED INVALID
Alias: None
Product: File System
Classification: Unclassified
Component: ext4
Hardware: All Linux
Importance: P1 high
Assignee: fs_ext4@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-09-28 20:17 UTC by Jack
Modified: 2017-10-01 00:37 UTC
CC List: 2 users

See Also:
Kernel Version: kernel-lt 4.4 and 4.9 series
Subsystem:
Regression: No
Bisected commit-id:



Description Jack 2017-09-28 20:17:05 UTC
Description of problem:
Having ProtectHome=yes in any service file causes access to a symlinked or autofs-mounted /home directory to fail with ELOOP:
ls: cannot open directory /home: Too many levels of symbolic links

Version-Release number of selected component (if applicable):
Red Hat 7.0+

How reproducible:
1. Start a service with ProtectHome=yes
2. Start autofs with /home automounted
3. Access to /home returns ELOOP
ls: cannot open directory /home: Too many levels of symbolic links
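The `ls` error above is the text for ELOOP. To check for the symptom without relying on `ls` output, a small probe can attempt the same directory open and report the raw errno. This is a minimal sketch, not part of the original report. It assumes a Linux host with Python 3, and `probe_dir` is a hypothetical helper name. To compare the two views of /home, run it once from a normal shell and once inside the affected service's mount namespace (for example via `nsenter --target <PID> --mount`, where `<PID>` is the service's main process).

```python
import errno
import os


def probe_dir(path: str) -> int:
    """Try to list `path` (as `ls` does); return 0 on success, else the errno."""
    try:
        os.listdir(path)
    except OSError as exc:
        return exc.errno
    return 0


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "/home"
    err = probe_dir(target)
    if err == errno.ELOOP:
        # The failure mode described in this bug.
        print(f"{target}: ELOOP ({os.strerror(err)})")
    elif err:
        print(f"{target}: {errno.errorcode.get(err, err)} ({os.strerror(err)})")
    else:
        print(f"{target}: OK")
```

A genuine symlink loop produces the same errno. So an ELOOP result on its own does not distinguish this bug from a misconfigured link; the comparison between the two namespaces is what matters.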

Additional info:
This Fedora bug report describes the issue.
https://bugzilla.redhat.com/show_bug.cgi?id=1444223
Comment 1 Theodore Tso 2017-09-30 00:34:43 UTC
This has nothing to do with the kernel or a file system.  It's a systemd bug.
Comment 2 Jack 2017-09-30 01:04:11 UTC
According to CentOS, it's not a systemd bug: https://bugs.centos.org/view.php?id=13927
Comment 3 Jack 2017-09-30 01:07:48 UTC
Before the following commit was merged into the -lt branch, autofs and a symlinked /home worked fine. After that commit, this issue appeared.

commit 839d42687dfce0ed0ea2c6bd8d707cc0e276fbe7
Author: Eric W. Biederman <ebiederm@xmission.com>
Date:   Fri Jan 20 18:28:35 2017 +1300

    mnt: Tuck mounts under others instead of creating shadow/side mounts.
    
    commit 1064f874abc0d05eeed8993815f584d847b72486 upstream.
    
    Ever since mount propagation was introduced in cases where a mount in
    propagated to parent mount mountpoint pair that is already in use the
    code has placed the new mount behind the old mount in the mount hash
    table.
    
    This implementation detail is problematic as it allows creating
    arbitrary length mount hash chains.
    
    Furthermore it invalidates the constraint maintained elsewhere in the
    mount code that a parent mount and a mountpoint pair will have exactly
    one mount upon them.  Making it hard to deal with and to talk about
    this special case in the mount code.
    
    Modify mount propagation to notice when there is already a mount at
    the parent mount and mountpoint where a new mount is propagating to
    and place that preexisting mount on top of the new mount.
    
    Modify unmount propagation to notice when a mount that is being
    unmounted has another mount on top of it (and no other children), and
    to replace the unmounted mount with the mount on top of it.
    
    Move the MNT_UMUONT test from __lookup_mnt_last into
    __propagate_umount as that is the only call of __lookup_mnt_last where
    MNT_UMOUNT may be set on any mount visible in the mount hash table.
    
    These modifications allow:
     - __lookup_mnt_last to be removed.
     - attach_shadows to be renamed __attach_mnt and its shadow
       handling to be removed.
     - commit_tree to be simplified
     - copy_tree to be simplified
    
    The result is an easier to understand tree of mounts that does not
    allow creation of arbitrary length hash chains in the mount hash table.
    
    The result is also a very slight userspace visible difference in semantics.
    The following two cases now behave identically, where before order
    mattered:
    
    case 1: (explicit user action)
            B is a slave of A
            mount something on A/a , it will propagate to B/a
            and than mount something on B/a
    
    case 2: (tucked mount)
            B is a slave of A
            mount something on B/a
            and than mount something on A/a
    
    Histroically umount A/a would fail in case 1 and succeed in case 2.
    Now umount A/a succeeds in both configurations.
    
    This very small change in semantics appears if anything to be a bug
    fix to me and my survey of userspace leads me to believe that no programs
    will notice or care of this subtle semantic change.
    
    v2: Updated to mnt_change_mountpoint to not call dput or mntput
    and instead to decrement the counts directly.  It is guaranteed
    that there will be other references when mnt_change_mountpoint is
    called so this is safe.
    
    v3: Moved put_mountpoint under mount_lock in attach_recursive_mnt
        As the locking in fs/namespace.c changed between v2 and v3.
    
    v4: Reworked the logic in propagate_mount_busy and __propagate_umount
        that detects when a mount completely covers another mount.
    
    v5: Removed unnecessary tests whose result is alwasy true in
        find_topper and attach_recursive_mnt.
    
    v6: Document the user space visible semantic difference.
    
    Fixes: b90fa9ae8f51 ("[PATCH] shared mount handling: bind and rbind")
    Tested-by: Andrei Vagin <avagin@virtuozzo.com>
    Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Comment 4 Theodore Tso 2017-09-30 05:28:25 UTC
According to Centos, it's not a bug in the Centos kernel.  That's why they closed it.

And it's not an ext4 bug, which is why I'm closing it here.

Opening bugs in random bug trackers is not going to help you win friends, and it's certainly not going to help you get anyone to look at it.
Comment 5 Jack 2017-09-30 07:40:36 UTC
Your suggestion that I made a mistake by opening bug reports in random bug trackers, and that I'm trying to win friends in a place where it wouldn't help either way, is very condescending. I'm reporting this bug because it happens in the 4.4 and 4.9 -lt kernel series after the commit I mentioned previously. The bug does not happen in the same 4.4 and 4.9 series before that commit.

Instead of closing my issue, how about helping me find the correct place to report it? Obviously I don't know where to file it, since I seem to have posted it in a random place.
Comment 6 Theodore Tso 2017-10-01 00:37:13 UTC
The Fedora bug report you referenced indicates that it is fixed by installing systemd 232 or above.
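Since the suggested fix is a systemd upgrade, the first thing to check is which systemd a host is running. The sketch below parses the first line of `systemctl --version` output (for example `systemd 219`) and compares it against 232, the version cited above. It is illustrative only. It assumes `systemctl` is on `PATH`, and `parse_systemd_version` and `FIXED_IN` are hypothetical names. A distribution may backport fixes without changing the major version, so a lower number is a hint, not proof.

```python
import re
import subprocess

# Per comment 6, the Fedora report indicates systemd 232 or above fixes the issue.
FIXED_IN = 232


def parse_systemd_version(text: str) -> int:
    """Extract the major version from `systemctl --version` output."""
    first = text.strip().splitlines()[0]  # e.g. "systemd 219"
    match = re.match(r"systemd\s+(\d+)", first)
    if match is None:
        raise ValueError(f"unexpected version line: {first!r}")
    return int(match.group(1))


def has_fix(text: str) -> bool:
    """True if the reported major version is at least FIXED_IN."""
    return parse_systemd_version(text) >= FIXED_IN


if __name__ == "__main__":
    try:
        out = subprocess.run(
            ["systemctl", "--version"], capture_output=True, text=True, check=True
        ).stdout
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"could not run systemctl --version: {exc}")
    else:
        version = parse_systemd_version(out)
        status = "at or above" if version >= FIXED_IN else "below"
        print(f"systemd {version} is {status} {FIXED_IN}")
```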

I suggest you file a bug in the bug tracker used by your distribution. And if their answer is that they don't support newer kernels on an older / obsolete distribution, then that's your answer.

You indicated you are using Red Hat 7.0.  I'm going to assume that's RHEL 7.0, which is based on a 3.10 kernel, and a systemd from 3+ years ago.   If things fall apart when you try installing a newer kernel, that won't be the first time RHEL has compatibility problems with newer kernels.  If you are using RHEL 7, that generally means you value stability more than you do new features.  If you want to be using a newer kernel, you should try upgrading to a more modern distribution *first*.

If you are indeed using RHEL, then the place to ask is Red Hat Support. I can tell you that a 4.4 or 4.9 kernel is not a supported kernel, so it may very well be that there is no right place, and what you are doing is just not something anyone is interested in supporting. If you are willing to pay someone enough money, or to retain your own Linux experts, they might be able to make it work (as the old NASA saying goes, "anything will fly if you give it enough thrust"), but it's highly likely that no one is going to be interested in supporting you for free.
