Bug 197887

Summary: NFS under heavy load - shares lose permissions sporadically
Product: File System
Reporter: fireon (linux)
Component: NFS
Assignee: Trond Myklebust (trondmy)
Status: NEW
Severity: normal
CC: bfields, daduke
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 4.13.4-1
Subsystem:
Regression: No
Bisected commit-id:

Description fireon 2017-11-15 20:45:09 UTC
Hello, 

If you use an NFS share with group permissions, the permissions are lost sporadically when NFS is under heavy load from copying files.

Reproduce: 

Add an NFS share and create, for example, 5 folders with group permissions like:

rwxrwx---  root  docs  /myshare
rwxrwx---  root  video /myvideoshare
...

The user is a member of these groups. Then copy about 8 GB from one share to the other, or copy files within a share into another folder. After that the permissions are gone: you can no longer enter or read the folders.
Tested on different hardware in different situations; the behavior is the same with NFSv3 and NFSv4, with and without ACLs.
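
To make the symptom easier to catch while a large copy is running, a small check like the following could be run as the affected user. This is only a sketch: the function name find_denied is made up for illustration, and the share paths you pass in (for example /myshare and /myvideoshare from the listing above) are whatever your own setup uses.

```python
import os


def find_denied(dirs):
    """Return the directories in `dirs` that the current user cannot list.

    Only PermissionError is treated as the symptom; other errors
    (for example a missing path) are deliberately not swallowed.
    """
    denied = []
    for d in dirs:
        try:
            os.listdir(d)  # needs read + search permission on the directory
        except PermissionError:
            denied.append(d)
    return denied
```

Calling find_denied(["/myshare", "/myvideoshare"]) in a loop during the copy should return an empty list as long as the group permissions are honored, and start returning paths once the problem appears.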

The problem is also user-specific. If I copy some files and get "permission denied", every other user still has the right permissions. Once they copy files over the NFS shares too, they run into the same problem.

This works fine here with the 4.10.17-5 Ubuntu kernel. I can't quickly test another 4.13 version because I am using ZFS here.

Thanks a lot
Best Regards
Fireon
Comment 1 fireon 2017-11-15 20:46:56 UTC
Addendum: To fix this temporarily, you have to restart the NFS service and remount on the client.
Comment 2 Christian Herzog 2017-12-18 12:54:18 UTC
Hi,

we can confirm this bug. Since we upgraded our file server kernel to 4.13.0.19.26 we've been seeing random permission denied errors on NFS too. Back to 4.10.0.42.44 and the problem is gone.

Please let us know if we can help debug.

thanks,
-Christian




--
Dr. Christian Herzog <herzog@phys.ethz.ch>  support: +41 44 633 26 68
IT Services Group, HPT H 8                    voice: +41 44 633 39 50
Department of Physics, ETH Zurich
8093 Zurich, Switzerland                     http://nic.phys.ethz.ch/
Comment 3 bfields 2017-12-19 16:10:47 UTC
This sounds like the bug fixed by bdcf0a423ea1c40bbb40e7ee483b50fc8aa3d758 "kernel: make groups_sort calling a responsibility group_info allocators", though I'd assumed that was a long-standing bug, not a recent regression.  Are you running rpc.mountd with -g/--manage-gids?
Comment 4 bfields 2017-12-19 16:15:42 UTC
OK, I see, the regression was probably introduced by b7b2562f7252878e18de60c24f320052076f9de8 "kernel/groups.c: use sort library function", which first appeared in 4.13.  Previously the function used to sort group lists was a no-op in the case of a list that was already sorted.  (I wonder whether userspace was passing down already-sorted group lists?)
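
A rough illustration of why a sort that is not a no-op on sorted input could matter here: a membership check based on binary search only works while the list is in sorted order. The sketch below is a simplified user-space model in Python, not kernel code. The function name in_group_list and the sample gids are made up, and the "heap order" list assumes the library sort temporarily rearranges the array the way a heapsort does.

```python
from bisect import bisect_left


def in_group_list(gids, gid):
    """Binary-search membership test; only valid if `gids` is sorted."""
    i = bisect_left(gids, gid)
    return i < len(gids) and gids[i] == gid


sorted_gids = [10, 20, 30, 40, 50]

# The same gids arranged as a max-heap, as an in-place heapsort might
# leave them partway through, even when the input was already sorted.
mid_sort_gids = [50, 40, 30, 10, 20]

print(in_group_list(sorted_gids, 10))    # True
print(in_group_list(mid_sort_gids, 10))  # False, although 10 is present
```

If a reader looked at a shared group list while it was in such an intermediate state, a group the user really belongs to could be missed, which would match the sporadic "permission denied" reports above. Whether this is exactly what happens in the kernel is for the referenced commits to confirm.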
Comment 5 Christian Herzog 2017-12-20 06:36:35 UTC
while I can't answer your questions, we'd be happy to run tests on our dev system.

thanks,
-Christian
Comment 6 bfields 2017-12-20 15:14:03 UTC
Your options are probably:

- stay on a pre-4.13 kernel until you have a 4.15-based kernel, or
- turn off the -g/--manage-gids rpc.mountd option (I don't know where that's configured on Ubuntu), or
- apply the "make groups_sort calling..." patch to your kernel manually.

And if any of those work that'd probably be enough confirmation that the bug is what I think it is.
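
To see whether a given server falls under the conditions listed above, something like this sketch could help. Both function names are made up for illustration. The version window (4.13 up to, but not including, 4.15) is taken only from the options above, and distribution kernels may carry backported fixes, so treat the result as a hint. The flag check only recognizes -g and --manage-gids as separate arguments. On Ubuntu the rpc.mountd options are often set through RPCMOUNTDOPTS in /etc/default/nfs-kernel-server, but that location is an assumption and is not confirmed in this thread.

```python
import re


def has_manage_gids(argv):
    """True if an rpc.mountd argument list contains -g or --manage-gids."""
    return any(arg in ("-g", "--manage-gids") for arg in argv[1:])


def in_suspect_range(release):
    """True if a kernel release string like '4.13.4-1' is >= 4.13 and < 4.15."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if not m:
        return False
    version = (int(m.group(1)), int(m.group(2)))
    return (4, 13) <= version < (4, 15)
```

For example, in_suspect_range(platform.release()) checks the running kernel, and the argument list of a running rpc.mountd can be obtained by reading /proc/<pid>/cmdline and splitting it on NUL bytes.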
Comment 7 fireon 2017-12-20 21:25:32 UTC
> turn off the -g/--manage-gids rpc.mountd option
I'll check this...

Thanks!
Comment 8 Christian Herzog 2017-12-21 06:11:15 UTC
(In reply to bfields from comment #6)
> Your options are probably:
> 
> - stay on a pre-4.13 kernel until you have a 4.15-based kernel, or
> - turn off the -g/--manage-gids rpc.mountd option (I don't know where that's
> configured on Ubuntu), or
> - apply the "make groups_sort calling..." patch to your kernel manually.
> 
> And if any of those work that'd probably be enough confirmation that the bug
> is what I think it is.

4.15 will be fine? Waiting then is the easiest option, but we might give the other two a shot...

thanks,
-Christian
Comment 9 fireon 2017-12-29 09:28:49 UTC
4.15 will be fine? I'm waiting for that version too, because I couldn't find out where to disable the rpc.mountd option. Strange.