Bug 117651 - Root NFS and autofs - mount disappears due to inode revalidate failed
Status: CLOSED CODE_FIX
Alias: None
Product: File System
Classification: Unclassified
Component: NFS
Hardware: All Linux
Importance: P1 normal
Assignee: Trond Myklebust
Reported: 2016-05-05 11:21 UTC by alexpro
Modified: 2016-08-02 13:44 UTC
Kernel Version: 4.1.15
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
log fragment with failed getattr (3.69 KB, application/octet-stream)
2016-05-05 11:21 UTC, alexpro
Test Patch v2 (1.19 KB, patch)
2016-07-31 20:46 UTC, [account disabled by administrator]
Test Patch v3 (807 bytes, patch)
2016-08-01 15:16 UTC, [account disabled by administrator]
Handle EADDRNOTAVAIL on connection failure (1.28 KB, patch)
2016-08-01 19:15 UTC, Trond Myklebust

Description alexpro 2016-05-05 11:21:37 UTC
Created attachment 215321 [details]
log fragment with failed getattr

After upgrading from 3.14.12 to 4.1.15 I see strange behaviour with autofs.

It happens on a diskless machine with root on NFS. There is an autofs indirect mount on /storage, where NFS volumes are mounted on demand.

Everything mounts normally, but after mount expiration (during unmount) the autofs mount disappears: not only is /storage/<volume> unmounted, but /storage itself is unmounted as well. This happens quite often, usually on the very first expiration.

automount logs a umount error in such cases.

Without autofs, mount/umount of the same NFS volume works correctly.

With NFS debug enabled, I get the following errors:

NFS reply getattr: -512
nfs_revalidate_inode: (0:15/6291480) getattr failed, error=-512
NFS: nfs_lookup_revalidate(/storage) is invalid

After the inode revalidate failure, all mounts on this inode are unmounted, so the autofs mount disappears while the automount daemon itself continues to run.

tcpdump shows that the server replies to the getattr without error, but the client does not see this reply and returns an error instead.
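
A note for readers of the log above: -512 is not an ordinary errno value. In the kernel's include/linux/errno.h, 512 is ERESTARTSYS, one of a small set of kernel-internal codes that are normally used when a wait is interrupted by a signal and that should never reach user space. The sketch below is a small user-space lookup for such log values; the helper name decode_log_error is hypothetical and is not part of any kernel or library API.

```c
#include <stddef.h>
#include <string.h>

/* Kernel-internal error codes from include/linux/errno.h. */
static const struct {
    int code;
    const char *name;
} kernel_errs[] = {
    { 512, "ERESTARTSYS" },
    { 513, "ERESTARTNOINTR" },
    { 514, "ERESTARTNOHAND" },
    { 515, "ENOIOCTLCMD" },
    { 516, "ERESTART_RESTARTBLOCK" },
};

/* Hypothetical helper: turn a value such as -512 from a debug log into a
 * symbolic name. Ordinary errno values fall back to strerror(). */
const char *decode_log_error(int err)
{
    int code = err < 0 ? -err : err;
    size_t i;

    for (i = 0; i < sizeof(kernel_errs) / sizeof(kernel_errs[0]); i++)
        if (kernel_errs[i].code == code)
            return kernel_errs[i].name;
    return strerror(code);
}
```

Calling decode_log_error(-512) returns "ERESTARTSYS", which matches the getattr failure logged above.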


I am ready to supply additional information.
Comment 1 alexpro 2016-05-06 17:23:11 UTC
I've tracked the source of the problem: commit https://github.com/torvalds/linux/commit/402e23b4ed9ed81852b6c15b793fcf84ea91e491

That is, before this commit SO_REUSEPORT was not set (due to invalid argument)
and everything worked fine.
Comment 2 [account disabled by administrator] 2016-07-23 03:32:56 UTC
Did you actually test it by reverting that commit or bisecting to it? I am wondering, as it seems highly unlikely that commit would be causing your issues.
Comment 3 alexpro 2016-07-27 08:00:52 UTC
(In reply to bastienphilbert from comment #2)
> Did you actually test it by reverting that commit or bisecting to it? I am
> wondering, as it seems highly unlikely that commit would be causing your
> issues.

Yes, I did a bisection.
Actually, there is one more thing about this bug: it happens reliably only with the default autofs timeout of 300 sec. With smaller values it is much harder to reproduce.

For now I've simply disabled the call to xs_sock_set_reuseport, and everything works fine even with the 300 sec timeout.
Comment 4 [account disabled by administrator] 2016-07-27 15:32:08 UTC
In the function you bisected to can you change:
kernel_setsockopt(sock, SOL_SOCKET, SO_REUSEPORT,
		(char *)&opt, sizeof(opt));
to:
kernel_setsockopt(sock, SOL_REUSEADDR, SO_REUSEPORT,
		(char *)&opt, sizeof(opt));
and see if that helps with your current issue.
Comment 5 alexpro 2016-07-29 11:25:08 UTC
(In reply to bastienphilbert from comment #4)
> In the function you bisected to can you change:
> kernel_setsockopt(sock, SOL_SOCKET, SO_REUSEPORT,
>               (char *)&opt, sizeof(opt));
> to:
> kernel_setsockopt(sock, SOL_REUSEADDR, SO_REUSEPORT,
>               (char *)&opt, sizeof(opt));
> and see if that helps with your current issue.

SOL_REUSEADDR is not defined anywhere in the kernel, so I assume you mean the SO_REUSEADDR option?

With "kernel_setsockopt(sock, SOL_SOCKET, SO_REUSEADDR"
the problem remains.
Comment 6 [account disabled by administrator] 2016-07-31 20:46:27 UTC
See if the below patch fixes the issue or does something different.
Comment 7 [account disabled by administrator] 2016-07-31 20:46:42 UTC
Created attachment 227041 [details]
Test Patch v2
Comment 8 alexpro 2016-08-01 06:30:44 UTC
(In reply to bastienphilbert from comment #7)
> Created attachment 227041 [details]
> Test Patch v2

Well, the mount works, but possibly only because setsockopt returns error -92 (ENOPROTOOPT);
I added a printk to check the return value.
Comment 9 [account disabled by administrator] 2016-08-01 15:16:35 UTC
See if the below patch helps out.
Comment 10 [account disabled by administrator] 2016-08-01 15:16:52 UTC
Created attachment 227181 [details]
Test Patch v3
Comment 11 Trond Myklebust 2016-08-01 19:15:34 UTC
Created attachment 227191 [details]
Handle EADDRNOTAVAIL on connection failure

Please could you see if the attached patch helps?

Thanks!
Comment 12 alexpro 2016-08-02 08:28:37 UTC
(In reply to Trond Myklebust from comment #11)
> Created attachment 227191 [details]
> Handle EADDRNOTAVAIL on connection failure
> 
> Please could you see if the attached patch helps?
> 
> Thanks!

With this patch everything works!

Patch from comment #10 didn't make any difference.
Comment 13 Trond Myklebust 2016-08-02 13:44:17 UTC
Thanks for testing! I'll merge this patch upstream with a Cc:stable in the coming weeks.
