Bug 117651
Summary: | Root NFS and autofs - mount disappears due to inode revalidate failed | ||
---|---|---|---|
Product: | File System | Reporter: | alexpro |
Component: | NFS | Assignee: | Trond Myklebust (trondmy) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | bastienphilbert |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 4.1.15 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: | log fragment with failed getattr; Test Patch v2; Test Patch v3; Handle EADDRNOTAVAIL on connection failure | ||
I've tracked down the source of the problem: commit https://github.com/torvalds/linux/commit/402e23b4ed9ed81852b6c15b793fcf84ea91e491

That is, before this commit SO_REUSEPORT was not set (due to an invalid argument) and everything worked fine.

Did you actually test it by reverting that commit or bisecting to it? I am wondering, as it seems highly unlikely that commit would be causing your issues.

(In reply to bastienphilbert from comment #2)
> Did you actually test it by reverting that commit or bisecting to it? I am
> wondering, as it seems highly unlikely that commit would be causing your
> issues.

Yes, I did a bisection. Actually, there is one more thing to this bug: it reliably happens only with the default autofs timeout of 300 sec. With smaller values it is much harder to reproduce. Still, I've just disabled the call to xs_sock_set_reuseport for now, and everything works fine even with the 300 sec timeout.

In the function you bisected to, can you change:
kernel_setsockopt(sock, SOL_SOCKET, SO_REUSEPORT,
                  (char *)&opt, sizeof(opt));
to:
kernel_setsockopt(sock, SOL_REUSEADDR, SO_REUSEPORT,
                  (char *)&opt, sizeof(opt));
and see if that helps with your current issue?

(In reply to bastienphilbert from comment #4)
> In the function you bisected to, can you change:
> kernel_setsockopt(sock, SOL_SOCKET, SO_REUSEPORT,
>                   (char *)&opt, sizeof(opt));
> to:
> kernel_setsockopt(sock, SOL_REUSEADDR, SO_REUSEPORT,
>                   (char *)&opt, sizeof(opt));
> and see if that helps with your current issue?

SOL_REUSEADDR is not defined anywhere in the kernel, so I assume you mean the SO_REUSEADDR option? With "kernel_setsockopt(sock, SOL_SOCKET, SO_REUSEADDR" the problem remains.

See if the below patch fixes the issue or does something different.

Created attachment 227041
Test Patch v2
(In reply to bastienphilbert from comment #7)
> Created attachment 227041
> Test Patch v2

Well, the mount works, but possibly only because setsockopt returns error -92 (ENOPROTOOPT); I added a printk to check the return value.

See if the below patch helps out.

Created attachment 227181
Test Patch v3
Created attachment 227191
Handle EADDRNOTAVAIL on connection failure
Please could you see if the attached patch helps?
Thanks!
(In reply to Trond Myklebust from comment #11)
> Created attachment 227191
> Handle EADDRNOTAVAIL on connection failure
>
> Please could you see if the attached patch helps?
>
> Thanks!

With this patch everything works! The patch from comment #10 didn't make any difference.

Thanks for testing! I'll merge this patch upstream with a Cc: stable in the coming weeks.
Created attachment 215321
log fragment with failed getattr

After upgrading from 3.14.12 to 4.1.15 I've got strange behaviour with autofs. It happens on a diskless machine with root on NFS. There is an autofs indirect mount on /storage, where NFS volumes are mounted on demand. Everything mounts normally, but after mount expiration (during unmount) the autofs mount disappears; that is, not only is /storage/<volume> unmounted, but /storage is unmounted as well. This happens quite often, usually on the very first expiration. automount logs a umount error in such cases. Without autofs, mount/umount of the same NFS volume works correctly.

With NFS debug enabled, I get the following errors:

NFS reply getattr: -512
nfs_revalidate_inode: (0:15/6291480) getattr failed, error=-512
NFS: nfs_lookup_revalidate(/storage) is invalid

After the inode revalidate failure, all mounts on this inode are unmounted and so the autofs mount disappears, while the automount daemon itself continues to run. tcpdump shows that the server replies to getattr without error, but the client doesn't see this reply and returns an error instead.

I am ready to supply additional information.
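
As a reading aid for the numeric codes quoted in this report, the sketch below maps them to names. The helper debug_errname is hypothetical and written only for illustration. The value 512 is ERESTARTSYS, a kernel-internal code from include/linux/errno.h that is not exported through userspace <errno.h>; it normally means a wait was interrupted by a signal and is not meant to be seen outside the kernel. The other two names come from the standard <errno.h>.

```c
#include <errno.h>

/* Kernel-internal value (include/linux/errno.h); absent from userspace <errno.h>. */
#define KERNEL_ERESTARTSYS 512

/*
 * Hypothetical helper, for illustration only.
 * Accepts either a positive or a negative (kernel-style) error value
 * and returns a name for the codes that appear in this bug report.
 */
static const char *debug_errname(int err)
{
    if (err < 0)
        err = -err;

    if (err == KERNEL_ERESTARTSYS)
        return "ERESTARTSYS";
    if (err == ENOPROTOOPT)
        return "ENOPROTOOPT";
    if (err == EADDRNOTAVAIL)
        return "EADDRNOTAVAIL";
    return "unknown";
}
```

With this mapping, the "getattr failed, error=-512" line in the log reads as ERESTARTSYS, which is consistent with the observation that the server's reply arrived on the wire but the client stopped waiting for it.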