Bug 60563

Summary: FANOTIFY: On root squash NFS mounted file systems, unable to mark directories or read from file descriptors without dropping privileges
Product: File System
Reporter: Craig (iwonbigbro)
Component: NFS
Assignee: Eric Paris (eparis)
Status: NEW
Severity: normal
CC: douglas.leeder, jlayton, Josephmitchell36, prajval02, szg00000, tim.pickersgill, trondmy
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.X
Subsystem:
Regression: No
Bisected commit-id:
Attachments:
    Test demonstrating drop privileges required for fanotify_mark() on root squash fs.
    Revised test script.

Description Craig 2013-07-17 08:31:54 UTC
When calling fanotify_mark on an NFS root squash mounted fs, fanotify_mark fails because it doesn't have permission to access the mount point.  The workaround for this is trivial, since it involves marking the file system in a separate process with privileges dropped to match those of the mounted fs, returned by stat.

However, once you receive an fanotify event from this mark, the file descriptor returned is not readable/seekable, since NFS verifies the permissions of the accessing process and causes the system call to return EIO errors.  Again, a trivial workaround would be to fork, drop privileges to the appropriate user, and perform the IO operations in a separate process.  However, in a multi-threaded environment, this becomes dangerous and creates memory allocation deadlock scenarios, unless the forked process performs an exec.  Constantly forking and exec'ing in order to perform seeks and reads on the file descriptor would create extra processing overhead, resulting in performance degradation.  Alternatively, it requires managing IO processing daemons that are started up when an event is triggered, using sockets to communicate with the parent process.  This adds a lot of unnecessary complexity to handling IO.
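
A minimal sketch of the fork-and-drop-privileges workaround described above, written in Python for brevity (the attached test program is C). The names `run_as` and `action` are hypothetical and not part of any fanotify API; `action` stands for whatever needs the file owner's credentials, such as the fanotify_mark() call or a read on the event file descriptor.

```python
import os


def run_as(uid, gid, action):
    """Run action() in a forked child after switching to uid/gid.

    Returns the child's exit status: 0 if action() returned normally,
    1 if dropping privileges or action() raised, -1 if the child was
    killed by a signal.
    """
    pid = os.fork()
    if pid == 0:
        # Child process. Drop the group before the user: once root is
        # given up, the process may no longer be allowed to change its gid.
        status = 1
        try:
            if os.geteuid() == 0:
                os.setgroups([])  # clearing supplementary groups needs root
            os.setresgid(gid, gid, gid)
            os.setresuid(uid, uid, uid)
            action()
            status = 0
        finally:
            # Leave immediately so the child never runs the parent's code.
            os._exit(status)
    _, wait_status = os.waitpid(pid, 0)
    if os.WIFEXITED(wait_status):
        return os.WEXITSTATUS(wait_status)
    return -1
```

A caller would typically take the ids from `os.stat()` on the mount point and pass them in, for example `run_as(st.st_uid, st.st_gid, do_mark)` with a hypothetical `do_mark` callable. The sketch forks without exec, so for the reasons given above it is only reasonable in a single-threaded process.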


To reproduce this issue as root, create a directory and a file: 
    mkdir /tmp/nfs_export
    date > /tmp/nfs_export/example 
    chown -R nobody:nogroup /tmp/nfs_export

Add this to /etc/exports:
    /tmp/nfs_export *(rw,async,root_squash)

Reload NFS kernel server: 
    /etc/init.d/nfs-kernel-server reload

Create a mount point and mount the file system:
    mkdir /tmp/nfs_mount
    mount localhost:/tmp/nfs_export /tmp/nfs_mount

Attempt to access file:
    cat /tmp/nfs_mount/example

This should fail with an EACCES error.

Now just write or use a simple fanotify program to mark the /tmp/nfs_mount mount point and handle the event for an access attempt on /tmp/nfs_mount/example.  You will not be able to read from the file descriptor provided by the fanotify event; the read fails with an EIO error.
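
For readers unfamiliar with what such a program receives, here is a rough sketch of decoding the buffer read from the fanotify file descriptor. The field layout follows struct fanotify_event_metadata in <linux/fanotify.h> (24 bytes per event); `parse_events` and the dictionary keys are hypothetical names chosen for illustration.

```python
import struct

# struct fanotify_event_metadata:
#   u32 event_len, u8 vers, u8 reserved, u16 metadata_len,
#   u64 mask, s32 fd, s32 pid
EVENT_FORMAT = "=IBBHQii"
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)  # 24 bytes

# A few mask bits from <linux/fanotify.h>.
FAN_ACCESS = 0x00000001
FAN_OPEN = 0x00000020
FAN_OPEN_PERM = 0x00010000
FAN_ACCESS_PERM = 0x00020000


def parse_events(buf):
    """Split a buffer read from the fanotify fd into a list of events."""
    events = []
    offset = 0
    while offset + EVENT_SIZE <= len(buf):
        (event_len, vers, _reserved, metadata_len,
         mask, fd, pid) = struct.unpack_from(EVENT_FORMAT, buf, offset)
        if event_len < EVENT_SIZE:
            break  # malformed record; stop rather than loop forever
        events.append({
            "vers": vers,
            "metadata_len": metadata_len,
            "mask": mask,
            "fd": fd,    # descriptor for the accessed file
            "pid": pid,  # process that triggered the event
            "needs_response": bool(mask & (FAN_OPEN_PERM | FAN_ACCESS_PERM)),
        })
        offset += event_len  # advance to the next record
    return events
```

The `fd` field is the descriptor whose reads fail with EIO in this report.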
Comment 1 Craig 2013-07-17 08:45:20 UTC
This is applicable to NFS protocol version 3
Comment 2 Craig 2013-07-17 09:34:54 UTC
For NFSv4, the issue is somewhat worse.  You can drop privileges to mark the mount point with fanotify_mark, but when poll signifies that an fanotify event is available on the fanotify file descriptor, reading from the fanotify file descriptor fails with EACCES for events relating to files that are not accessible by a squashed root user.  With NFSv3, you can at least read the events from the fanotify file descriptor; you just can't read from the event's associated file descriptor.
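
The poll-then-read sequence described above can be sketched as follows. This is an illustration only: `wait_and_read` is a hypothetical helper, and it simply reports the errno from the read instead of raising, so a caller can tell a timeout apart from a failure such as EACCES.

```python
import os
import select


def wait_and_read(fanotify_fd, timeout_ms=1000, bufsize=4096):
    """Poll the descriptor, then try to read from it.

    Returns (data, err): data is the bytes read or None, and err is the
    errno of a failed read or None. A timeout returns (None, None).
    """
    poller = select.poll()
    poller.register(fanotify_fd, select.POLLIN)
    if not poller.poll(timeout_ms):
        return None, None  # nothing became readable in time
    try:
        return os.read(fanotify_fd, bufsize), None
    except OSError as exc:
        # poll reported readiness, yet the read itself failed; on NFSv4
        # this is where the report sees EACCES.
        return None, exc.errno
```

Comparing the returned errno against `errno.EACCES` lets a handler log the failure and keep polling rather than exit.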
Comment 3 Trond Myklebust 2013-07-17 13:07:57 UTC
Assigning this to the fanotify maintainers to decide if there is a bug
here. As far as the NFS filesystem is concerned, it is performing as per
design, and there is no bug.
Comment 4 Eric Paris 2013-07-17 15:43:53 UTC
I'm very confused.  You are chowning /tmp/nfs_export to nobody:nogroup but NFS is going to squash to nfsnobody.  So your client doesn't have read permission on the directory you are trying to mark.

I just set up something similar, but chowned the server backing filesystem to the right uid and everything worked....
Comment 5 Craig 2013-07-17 18:33:40 UTC
I forgot to mention changing the mode to 0700 on the directory and file, or 0600 on the file to be strictly correct.

When you say the right uid, what uid are you referring to?  nobody:nogroup

I will set up a standalone script that I can reproduce the issue with here and attach it...
Comment 6 Eric Paris 2013-07-17 18:38:38 UTC
I'm saying that root_squash is going to map to nfsnobody.  nfsnobody != nobody

So if the file/dir is 600/700 it is not accessible to root on the client, and this is expected...

I'm confused....
Comment 7 Craig 2013-07-17 19:22:21 UTC
Sorry.  Let me elaborate a little.

If you have an NFS export with root squash, owned by a user that exists on the client, but not root and that local user has read access, then there is potential for a malicious remote file to be copied to the local system.  Fanotify provides hooks to allow a given process to control this with the FAN_OPEN_PERM mode of operation.

So /tmp/nfs_export/example is owned by 'foo' with a mode of 0600 and /tmp/nfs_export is also owned by 'foo' with a mode of 0700.  When I call fanotify_mark on the /tmp/nfs_mount, in the on-access monitoring process, running as root, fanotify_mark returns -1 because NFS squashes access and prevents root access as you quite rightly pointed out.  I work around this by executing a subprocess that inherits the fanotify_fd and performs the fanotify_mark with dropped privileges as 'foo'.  Something root has always been able to do with root_squash, which is why I don't know why it squashes you to nfsnobody as opposed to the owner of the file (providing they exist locally).  Anyway, that's another issue, I am side tracking...

So now I have a mark placed on a root-owned fanotify_fd, using a subprocess I ran as 'foo'.  If I now read the /tmp/nfs_mount/example file as user 'foo':

sudo -u foo cat /tmp/nfs_mount/example

...I get an event in the fanotify process running as root, that has a file descriptor copied from kernel space to user space, for the file being accessed by foo.  However, the root running fanotify process gets an EIO error reading from this file descriptor, when determining if it is "safe".  Meanwhile, user 'foo' is blocked while the process decides whether to fail open or not.

This is again, because NFS squashes root access to nfsnobody.  Again, it is possible to work around this by forking yet another process to perform seeks and reads on the event file descriptor, running as 'foo'; 'foo' being determined by stat'ing the /proc/`event.pid`/fd/`event.fd` file and looking at the value of st_uid.
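
A small sketch of the owner lookup mentioned above, assuming the handler stats the descriptor through its own /proc entry. `event_file_owner` is a hypothetical name, and defaulting to /proc/self is an assumption made here for illustration.

```python
import os


def event_file_owner(event_fd, pid=None):
    """Return (uid, gid) of the file behind an event file descriptor.

    The lookup goes through /proc/<pid>/fd/<fd>; pid defaults to the
    calling process (an assumption for this sketch).
    """
    proc_path = "/proc/{}/fd/{}".format("self" if pid is None else pid,
                                        event_fd)
    st = os.stat(proc_path)  # os.stat follows the /proc link to the file
    return st.st_uid, st.st_gid
```

The returned ids are the ones a privilege-dropping child would then switch to before reading from the descriptor.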

My issue is that forking is very cumbersome for on-access, given that it will increase the load average of the system for file operations like a recursive grep.  It is also not thread safe, so if you consider the fanotify process having 5 or more threads for parsing fanotify events and responding based on some evaluation, forking becomes very problematic and you can't just drop privileges in a given thread without first blocking the other threads.

This is also the case with GVFS mounted file systems, but I am more concerned that for NFS v4, the read operations on fanotify_fd fail with EACCES under these scenarios, so you can't even obtain the event struct.

I would expect the kernel to be able to allow root to obtain file owner privileges when constructing the fanotify event and creating the accessed file descriptor.  Without this, fanotify can't secure GVFS or NFS file system access.

I am part way through my test script, do you still want me to attach it?

Thanks
Craig
Comment 8 Eric Paris 2013-07-17 20:09:44 UTC
I guess I'm going to need to see the test script to figure out what you are really doing.

Seems like this is trivially fixed by using anon_uid=nobody in your exports instead of squashing root to have 0 access...
Comment 9 Craig 2013-07-17 21:41:26 UTC
Created attachment 106913 [details]
Test demonstrating drop privileges required for fanotify_mark() on root squash fs.

This exhibits the fanotify_mark() issue, but for some yet unknown reason, it doesn't exhibit the read failure on the event file descriptor.  I am sifting through the production code to see if I can identify how it differs from my stand-alone example.  The only obvious one is that my stand-alone example is single threaded, so could it be lock/mutex related?

The argument to the script is the owner of the NFS export directory and files, so you can specify some other user than the default of 'mail' that I have used.
Comment 10 Craig 2013-07-17 21:45:23 UTC
The script can also be used to demonstrate the NFS v4 issue.  Just change the vers=3 mount option to vers=4 and you get the following output:

fanotify_init(): Initialising...
fanotify_init(): Initialised on fd 3
fanotify_mark(): Marking (null)...
Error: 13: Permission denied
Okay, so let's do that again with dropped privileges...
(This wouldn't work in a multi-threaded environment without forking and executing another process, so let's do it like that)
Getting stat info: /tmp/fanotify_test.sh.lxwvvNN1/nfs_mount
Waiting for fanotify_mark() child process...
child: Setting real & effective uid: 8
Closing fanotify fd: 3
Child exited with exit status: 0
fanotify_mark(): Marked /tmp/fanotify_test.sh.lxwvvNN1/nfs_mount
loop: Waiting for events...
loop: Waiting for events...
loop: Waiting for events...
loop: Error: Failed to read from fanotify fd: 20: Not a directory
loop: Waiting for events...
Received SIGTERM, shutting down...
loop: Shutdown gracefully
Closing fanotify fd: 3
Comment 11 Craig 2013-07-17 21:46:45 UTC
Also, noticed a printf ordering bug in the form of Marking (null).  Feel free to fix that one :-)
Comment 12 Craig 2013-07-18 09:39:50 UTC
I have been able to reproduce the issue on NFS 3 with my stand-alone script.  It seems that on Kernel version 3.2.0-23, it works.  But on the 3.5.0-34 kernel I have, it doesn't work.  I will paste the output separately...
Comment 13 Craig 2013-07-18 09:40:53 UTC
Kernel version: 3.5.0-34-generic
Compiling /tmp/fanotify_test.sh.E55gudiy/fanotify_test.c...
Setting up directories...
Testing NFS export owned by mail...
drwxrwxrwx 4 root root 4096 Jul 18 10:37 /tmp/fanotify_test.sh.E55gudiy
drwx------ 2 mail root 4096 Jul 18 10:37 /tmp/fanotify_test.sh.E55gudiy/nfs_export
-rwx------ 1 mail root 29 Jul 18 10:37 /tmp/fanotify_test.sh.E55gudiy/nfs_export/example
-rw-r--r-- 1 root root 6521 Jul 18 10:37 /tmp/fanotify_test.sh.E55gudiy/fanotify_test.c
drwxrwxrwx 2 root root 4096 Jul 18 10:37 /tmp/fanotify_test.sh.E55gudiy/nfs_mount
-rwxr-xr-x 1 root root 13744 Jul 18 10:37 /tmp/fanotify_test.sh.E55gudiy/fanotify_test
-rw-r--r-- 1 root root 389 Jul 18 10:37 /tmp/fanotify_test.sh.E55gudiy/exports.bak
Configuring NFS server...
 * Re-exporting directories for NFS kernel daemon... [ OK ]
Mounting the NFS export to /tmp/fanotify_test.sh.E55gudiy/nfs_mount...
Starting fanotify handler as root...
Waiting for process to settle...
Attempting to access /tmp/fanotify_test.sh.E55gudiy/nfs_mount/example as nobody...
Thu Jul 18 10:37:56 BST 2013
=== Output from fanotify handler process ===
fanotify_init(): Initialising...
fanotify_init(): Initialised on fd 3
fanotify_mark(): Marking (null)...
Error: 13: Permission denied
Okay, so let's do that again with dropped privileges...
(This wouldn't work in a multi-threaded environment without forking and executing another process, so let's do it like that)
Getting stat info: /tmp/fanotify_test.sh.E55gudiy/nfs_mount
Waiting for fanotify_mark() child process...
child: Setting real & effective uid: 8
Closing fanotify fd: 3
Child exited with exit status: 0
fanotify_mark(): Marked /tmp/fanotify_test.sh.E55gudiy/nfs_mount
loop: Waiting for events...
loop: Waiting for events...
loop: Waiting for events...
loop: Read 24 bytes from fanotify fd
loop: Event file descriptor: 4
loop: Error: Readlink failed: 2: No such file or directory
loop: Allowing access to file: /proc/6595/fd/4
loop: Reading from event file descriptor...
loop: Error: Read failed: 5: Input/output error
loop: Waiting for events...
Received SIGTERM, shutting down...
loop: Shutdown gracefully
Closing fanotify fd: 3
=== End of output ===
Restoring exports from /tmp/fanotify_test.sh.E55gudiy...
 * Re-exporting directories for NFS kernel daemon...      



Note the EIO error trying to read from the event file descriptor.
Comment 14 Craig 2013-07-18 09:41:53 UTC
Kernel version: 3.2.0-23-generic
Compiling /tmp/fanotify_test.sh.MdFC1yq7/fanotify_test.c...
Setting up directories...
Testing NFS export owned by mail...
drwxrwxrwx 4 root root 4096 Jul 18 10:41 /tmp/fanotify_test.sh.MdFC1yq7
-rw-r--r-- 1 root root 389 Jul 18 10:41 /tmp/fanotify_test.sh.MdFC1yq7/exports.bak
drwxrwxrwx 2 root root 4096 Jul 18 10:41 /tmp/fanotify_test.sh.MdFC1yq7/nfs_mount
-rw-r--r-- 1 root root 6521 Jul 18 10:41 /tmp/fanotify_test.sh.MdFC1yq7/fanotify_test.c
-rwxr-xr-x 1 root root 12212 Jul 18 10:41 /tmp/fanotify_test.sh.MdFC1yq7/fanotify_test
drwx------ 2 mail root 4096 Jul 18 10:41 /tmp/fanotify_test.sh.MdFC1yq7/nfs_export
-rwx------ 1 mail root 29 Jul 18 10:41 /tmp/fanotify_test.sh.MdFC1yq7/nfs_export/example
Configuring NFS server...
 * Re-exporting directories for NFS kernel daemon... [ OK ]
Mounting the NFS export to /tmp/fanotify_test.sh.MdFC1yq7/nfs_mount...
Starting fanotify handler as root...
Waiting for process to settle...
Attempting to access /tmp/fanotify_test.sh.MdFC1yq7/nfs_mount/example as nobody...
Thu Jul 18 10:41:16 BST 2013
=== Output from fanotify handler process ===
fanotify_init(): Initialising...
fanotify_init(): Initialised on fd 3
fanotify_mark(): Marking (null)...
Error: 13: Permission denied
Okay, so let's do that again with dropped privileges...
(This wouldn't work in a multi-threaded environment without forking and executing another process, so let's do it like that)
Getting stat info: /tmp/fanotify_test.sh.MdFC1yq7/nfs_mount
Waiting for fanotify_mark() child process...
child: Setting real & effective uid: 8
Closing fanotify fd: 3
Child exited with exit status: 0
fanotify_mark(): Marked /tmp/fanotify_test.sh.MdFC1yq7/nfs_mount
loop: Waiting for events...
loop: Waiting for events...
loop: Waiting for events...
loop: Read 24 bytes from fanotify fd
loop: Event file descriptor: 4
loop: Error: Readlink failed: 2: No such file or directory
loop: Allowing access to file: /proc/2940/fd/4
loop: Reading from event file descriptor...
loop: Read 29 bytes from file: /proc/2940/fd/4
loop: Waiting for events...
Received SIGTERM, shutting down...
loop: Shutdown gracefully
Closing fanotify fd: 3
=== End of output ===
Restoring exports from /tmp/fanotify_test.sh.MdFC1yq7...
 * Re-exporting directories for NFS kernel daemon...       



Note on this version, it works fine and there is no EIO error.
Comment 15 Craig 2013-07-18 10:14:36 UTC
Kernel version: 3.8.0-26-generic
Compiling /tmp/fanotify_test.sh.eqXYHrki/fanotify_test.c...
Setting up directories...
Testing NFS export owned by mail...
drwxrwxrwx 4 root root 4096 Jul 18 11:14 /tmp/fanotify_test.sh.eqXYHrki
-rw-r--r-- 1 root root 389 Jul 18 11:14 /tmp/fanotify_test.sh.eqXYHrki/exports.bak
drwxrwxrwx 2 root root 4096 Jul 18 11:14 /tmp/fanotify_test.sh.eqXYHrki/nfs_mount
-rw-r--r-- 1 root root 6521 Jul 18 11:14 /tmp/fanotify_test.sh.eqXYHrki/fanotify_test.c
-rwxr-xr-x 1 root root 12212 Jul 18 11:14 /tmp/fanotify_test.sh.eqXYHrki/fanotify_test
drwx------ 2 mail root 4096 Jul 18 11:14 /tmp/fanotify_test.sh.eqXYHrki/nfs_export
-rwx------ 1 mail root 29 Jul 18 11:14 /tmp/fanotify_test.sh.eqXYHrki/nfs_export/example
Configuring NFS server...
 * Re-exporting directories for NFS kernel daemon... [ OK ]
Mounting the NFS export to /tmp/fanotify_test.sh.eqXYHrki/nfs_mount...
Starting fanotify handler as root...
Waiting for process to settle...
Attempting to access /tmp/fanotify_test.sh.eqXYHrki/nfs_mount/example as nobody...
Thu Jul 18 11:14:15 BST 2013
=== Output from fanotify handler process ===
fanotify_init(): Initialising...
fanotify_init(): Initialised on fd 3
fanotify_mark(): Marking /tmp/fanotify_test.sh.eqXYHrki/nfs_mount...
Error: 13: Permission denied
Okay, so let's do that again with dropped privileges...
(This wouldn't work in a multi-threaded environment without forking and executing another process, so let's do it like that)
Getting stat info: /tmp/fanotify_test.sh.eqXYHrki/nfs_mount
Waiting for fanotify_mark() child process...
child: Setting real & effective uid: 8
Closing fanotify fd: 3
Child exited with exit status: 0
fanotify_mark(): Marked /tmp/fanotify_test.sh.eqXYHrki/nfs_mount
loop: Waiting for events...
loop: Waiting for events...
loop: Waiting for events...
loop: Read 24 bytes from fanotify fd
loop: Event file descriptor: 4
loop: Error: Readlink failed: 2: No such file or directory
loop: Allowing access to file: /proc/1886/fd/4
loop: Reading from event file descriptor...
loop: Error: Read failed: 5: Input/output error
loop: Waiting for events...
Received SIGTERM, shutting down...
loop: Shutdown gracefully
Closing fanotify fd: 3
=== End of output ===
Restoring exports from /tmp/fanotify_test.sh.eqXYHrki...
 * Re-exporting directories for NFS kernel daemon...
Comment 16 Craig 2013-07-23 11:21:06 UTC
I have noticed on 32bit platforms and a 3.2 kernel, the EIO errors occur intermittently.  Below are two identical runs, several seconds apart.  The first run worked fine and was able to read from the file descriptor.  The second run failed and was not able to read from the file descriptor (EIO error).


The good run:
      
Platform: Linux somehost.somedomain 3.2.0-48-generic #74-Ubuntu SMP Thu Jun 6 19:45:16 UTC 2013 i686 i686 i386 GNU/Linux
Compiling /tmp/fanotify_test.sh.rTXTGfRG/fanotify_test.c...
Setting up directories...
Testing NFS export owned by mail...
drwxrwxrwx 4 root root 4096 Jul 23 12:17 /tmp/fanotify_test.sh.rTXTGfRG
-rw-r--r-- 1 root root 389 Jul 23 12:17 /tmp/fanotify_test.sh.rTXTGfRG/exports.bak
drwxrwxrwx 2 root root 4096 Jul 23 12:17 /tmp/fanotify_test.sh.rTXTGfRG/nfs_mount
-rw-r--r-- 1 root root 6521 Jul 23 12:17 /tmp/fanotify_test.sh.rTXTGfRG/fanotify_test.c
-rwxr-xr-x 1 root root 12212 Jul 23 12:17 /tmp/fanotify_test.sh.rTXTGfRG/fanotify_test
drwx------ 2 mail root 4096 Jul 23 12:17 /tmp/fanotify_test.sh.rTXTGfRG/nfs_export
-rwx------ 1 mail root 29 Jul 23 12:17 /tmp/fanotify_test.sh.rTXTGfRG/nfs_export/example
Configuring NFS server...
 * Re-exporting directories for NFS kernel daemon... [ OK ]
Mounting the NFS export to /tmp/fanotify_test.sh.rTXTGfRG/nfs_mount...
Starting fanotify handler as root...
Waiting for process to settle...
Attempting to access /tmp/fanotify_test.sh.rTXTGfRG/nfs_mount/example as nobody...
Tue Jul 23 12:17:37 BST 2013
=== Output from fanotify handler process ===
fanotify_init(): Initialising...
fanotify_init(): Initialised on fd 3
fanotify_mark(): Marking /tmp/fanotify_test.sh.rTXTGfRG/nfs_mount...
Error: 13: Permission denied
Okay, so let's do that again with dropped privileges...
(This wouldn't work in a multi-threaded environment without forking and executing another process, so let's do it like that)
Getting stat info: /tmp/fanotify_test.sh.rTXTGfRG/nfs_mount
Waiting for fanotify_mark() child process...
child: Setting real & effective uid: 8
Closing fanotify fd: 3
Child exited with exit status: 0
fanotify_mark(): Marked /tmp/fanotify_test.sh.rTXTGfRG/nfs_mount
loop: Waiting for events...
loop: Waiting for events...
loop: Waiting for events...
loop: Read 24 bytes from fanotify fd
loop: Event file descriptor: 4
loop: Error: Readlink failed: 2: No such file or directory
loop: Allowing access to file: /proc/2068/fd/4
loop: Reading from event file descriptor...
loop: Read 29 bytes from file: /proc/2068/fd/4
loop: Waiting for events...
Received SIGTERM, shutting down...
loop: Shutdown gracefully
Closing fanotify fd: 3
=== End of output ===
Restoring exports from /tmp/fanotify_test.sh.rTXTGfRG...
 * Re-exporting directories for NFS kernel daemon...          




The failed run:

Platform: Linux somehost.somedomain 3.2.0-48-generic #74-Ubuntu SMP Thu Jun 6 19:45:16 UTC 2013 i686 i686 i386 GNU/Linux
Compiling /tmp/fanotify_test.sh.H8b9ap4o/fanotify_test.c...
Setting up directories...
Testing NFS export owned by mail...
drwxrwxrwx 4 root root 4096 Jul 23 12:17 /tmp/fanotify_test.sh.H8b9ap4o
-rw-r--r-- 1 root root 389 Jul 23 12:17 /tmp/fanotify_test.sh.H8b9ap4o/exports.bak
drwxrwxrwx 2 root root 4096 Jul 23 12:17 /tmp/fanotify_test.sh.H8b9ap4o/nfs_mount
-rw-r--r-- 1 root root 6521 Jul 23 12:17 /tmp/fanotify_test.sh.H8b9ap4o/fanotify_test.c
-rwxr-xr-x 1 root root 12212 Jul 23 12:17 /tmp/fanotify_test.sh.H8b9ap4o/fanotify_test
drwx------ 2 mail root 4096 Jul 23 12:17 /tmp/fanotify_test.sh.H8b9ap4o/nfs_export
-rwx------ 1 mail root 29 Jul 23 12:17 /tmp/fanotify_test.sh.H8b9ap4o/nfs_export/example
Configuring NFS server...
 * Re-exporting directories for NFS kernel daemon... [ OK ]
Mounting the NFS export to /tmp/fanotify_test.sh.H8b9ap4o/nfs_mount...
Starting fanotify handler as root...
Waiting for process to settle...
Attempting to access /tmp/fanotify_test.sh.H8b9ap4o/nfs_mount/example as nobody...
Tue Jul 23 12:17:40 BST 2013
=== Output from fanotify handler process ===
fanotify_init(): Initialising...
fanotify_init(): Initialised on fd 3
fanotify_mark(): Marking /tmp/fanotify_test.sh.H8b9ap4o/nfs_mount...
Error: 13: Permission denied
Okay, so let's do that again with dropped privileges...
(This wouldn't work in a multi-threaded environment without forking and executing another process, so let's do it like that)
Getting stat info: /tmp/fanotify_test.sh.H8b9ap4o/nfs_mount
Waiting for fanotify_mark() child process...
child: Setting real & effective uid: 8
Closing fanotify fd: 3
Child exited with exit status: 0
fanotify_mark(): Marked /tmp/fanotify_test.sh.H8b9ap4o/nfs_mount
loop: Waiting for events...
loop: Waiting for events...
loop: Waiting for events...
loop: Read 24 bytes from fanotify fd
loop: Event file descriptor: 4
loop: Error: Readlink failed: 2: No such file or directory
loop: Allowing access to file: /proc/2148/fd/4
loop: Reading from event file descriptor...
loop: Error: Read failed: 5: Input/output error
loop: Waiting for events...
Received SIGTERM, shutting down...
loop: Shutdown gracefully
Closing fanotify fd: 3
=== End of output ===
Restoring exports from /tmp/fanotify_test.sh.H8b9ap4o...
 * Re-exporting directories for NFS kernel daemon...
Comment 17 Craig 2013-07-23 11:28:34 UTC
I ran the aforementioned tests with tcpdump running to capture the NFS packets.  As far as NFS is concerned, all three requests were served and we can see 29 bytes being transferred in all three dumps.  But the first two failed with EIO errors, while the last one was able to read 29 bytes from the event file descriptor.  Does this suggest that NFS is not preventing access?
Comment 18 Craig 2013-07-23 11:29:34 UTC
Created attachment 106995 [details]
Revised test script.

I have made some changes to the test script.
Comment 19 Craig 2013-07-24 09:59:28 UTC
I have eliminated the intermittency of the EIO error on NFSv3 and isolated it to whether a response has been processed or not.  In my sample program, I didn't want to unnecessarily block IO, so ensured I sent a response first (ALLOW).  In doing so, if this response is handled in time, reading from the event file descriptor does not cause an EIO error.  

However, I revised this decision and moved the sending of the response to after the attempts to read from the event file descriptor.  If I don't send a response, I consistently get EIO errors when reading from the event file descriptor (some sort of kernel file locking issue maybe?).

I also moved the sending of the response back to before the read and stuck a second sleep after the write() call to attempt synchronisation.  With the sleep in after sending the response, we get no EIO errors at all, but it does present the problem that a response has been sent before the handler has identified what type to send.
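
For reference, the ordering a content-checking handler wants (inspect first, respond second) can be sketched like this. The response layout follows struct fanotify_response in <linux/fanotify.h>; `respond_after_inspection`, `is_safe` and the `on_error` policy are hypothetical and chosen for illustration. On the affected kernels the read in this ordering is the step that fails with EIO, which the sketch treats as an error and answers according to `on_error`.

```python
import os
import struct

FAN_ALLOW = 0x01
FAN_DENY = 0x02
RESPONSE_FORMAT = "=iI"  # struct fanotify_response: s32 fd, u32 response


def respond_after_inspection(fanotify_fd, event_fd, is_safe,
                             max_bytes=4096, on_error=FAN_DENY):
    """Read from the event fd, decide, and only then write the response.

    Returns the verdict written (FAN_ALLOW or FAN_DENY). on_error is the
    verdict used when the read fails; denying by default is an assumption
    of this sketch, not something the report prescribes.
    """
    try:
        data = os.read(event_fd, max_bytes)
        verdict = FAN_ALLOW if is_safe(data) else FAN_DENY
    except OSError:
        verdict = on_error  # for example EIO from the event descriptor
    # The blocked process is released only once this write is processed.
    os.write(fanotify_fd, struct.pack(RESPONSE_FORMAT, event_fd, verdict))
    os.close(event_fd)  # event descriptors must be closed by the handler
    return verdict
```

Passing `on_error=FAN_ALLOW` would give the fail-open behaviour instead.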
Comment 20 Craig 2013-07-24 15:08:26 UTC
I enabled NFS debugging output:

rpcdebug -m nfs -s all
rpcdebug -m nfsd -s all

It seems that the originating process, after being unblocked by the ALLOW response, performs a flush and release on the inode.  This seems to allow the root fanotify process to read from the event file descriptor.

I guess this is where my input will stop, since I don't know enough about what is going on in kernel space.  I've had a glance over the NFS codebase, but all I can see is an area of code that descends from the fh_verify() call, which ensures the current credentials satisfy NFSD_MAY_LOCK or ( NFSD_MAY_READ and NFSD_MAY_USER_OVERRIDE ).

The ball is in your court now; I hope my feedback has been useful.
Comment 21 Prajval 2014-01-30 17:35:28 UTC

> Seems like this is trivially fixed by using anon_uid=nobody in your exports
> instead of squashing root to have 0 access...

I tried setting this option in the exports, but it still does not allow the root fanotify process to read from the event file descriptor. Any further update on this would be useful.
Comment 22 Douglas Leeder 2014-05-19 08:46:16 UTC
We have bisected the NFSv4 issue (unable to receive fanotify events - just get "Not a directory" errors).

The bug originated at:

commit 1788ea6e3b2a58cf4fb00206e362d9caff8d86a7
Author: Jeff Layton <jlayton@redhat.com>
Date:   Fri Nov 4 13:31:21 2011 -0400

    nfs: when attempting to open a directory, fall back on normal lookup (try #5)
    
    commit d953126 changed how nfs_atomic_lookup handles an -EISDIR return
    from an OPEN call. Prior to that patch, that caused the client to fall
    back to doing a normal lookup. When that patch went in, the code began
    returning that error to userspace. The d_revalidate codepath however
    never had the corresponding change, so it was still possible to end up
    with a NULL ctx->state pointer after that.
    
    That patch caused a regression. When we attempt to open a directory that
    does not have a cached dentry, that open now errors out with EISDIR. If
    you attempt the same open with a cached dentry, it will succeed.
    
    Fix this by reverting the change in nfs_atomic_lookup and allowing
    attempts to open directories to fall back to a normal lookup
    
    Also, add a NFSv4-specific f_ops->open routine that just returns
    -ENOTDIR. This should never be called if things are working properly,
    but if it ever is, then the dprintk may help in debugging.
    
    To facilitate this, a new file_operations field is also added to the
    nfs_rpc_ops struct.
    
    Cc: stable@kernel.org
    Signed-off-by: Jeff Layton <jlayton@redhat.com>
    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


This fits with the errno of ENOTDIR that we are getting from the fanotify event read.
Comment 24 Douglas Leeder 2014-05-19 09:19:20 UTC
To clarify: the 3.{1,2} kernels, where the regression was introduced, give ENOTDIR.

The current tip of Linus' tree gives 13 - Permission denied when reading the event.
Comment 25 Trond Myklebust 2014-05-19 12:28:51 UTC
So is fanotify trying to call dentry_open() on behalf of the filesystem in fs/notify/fanotify/fanotify_user.c:create_fd()? That would be a bug...
Comment 26 Douglas Leeder 2014-05-19 13:05:09 UTC
How should fanotify reopen the file, ideally skipping authentication checks?
Comment 27 Trond Myklebust 2014-05-19 13:20:12 UTC
Why does a notification system need to do this in the first place?

Open by dentry is race-prone in NFS: there is no guarantee that the file won't have been replaced on the server.

There is no way to skip authentication checks in NFS. Every RPC call that is sent is authenticated, and the server will check whether or not that user has permission to perform that particular operation at this time.
Comment 28 Douglas Leeder 2014-05-19 13:30:16 UTC
fanotify passes the fd to the user-space process to do content-based access control.

See: http://lwn.net/Articles/339253/

How about passing the credentials from a different process? create_fd is called within the context of the fanotify process, but needs to use the credentials of the original process.
Comment 29 Trond Myklebust 2014-05-19 22:04:43 UTC
If you really must run this on a NFS client (rather than on the server), then I suggest creating a proper callback into the filesystem. Trying to hack it by directly calling VFS filesystem helper functions such as dentry_open() isn't going to be supported.