Bug 14501

Summary: NFSv4 mount: Open,fork,Lock,Read: Read denied because it uses the OPEN state instead of the LOCK state
Product: File System Reporter: Jean-Louis ROCHETTE (rochette_jean-louis)
Component: NFSAssignee: Trond Myklebust (trondmy)
Status: CLOSED CODE_FIX    
Severity: normal CC: fleite, jcavallaro, jlayton
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.29.4-167.fc11.i686 Subsystem:
Regression: No Bisected commit-id:
Attachments: Source code to reproduce
NFSv4: Ensure that we track the NFSv4 lock state in read/write requests.

Description Jean-Louis ROCHETTE 2009-10-28 16:27:04 UTC
Created attachment 23561 [details]
Source code to reproduce

When a single process does open(), fcntl(F_SETLK), read(), 
the read operation uses the LOCK state and succeeds.

However, if the process opens the file and then forks a child process, and the child process locks the file and reads it, the read operation is done with the OPEN state, and the server denies the file access.

Attached is the source code I used, with comments for the configuration
and network trace.

Thanks for your help on this issue!
Jean-Louis.
Comment 1 Jean-Louis ROCHETTE 2009-10-28 16:29:05 UTC
I omitted to say that it is OK with Solaris 5.11.
Comment 2 Jean-Louis ROCHETTE 2009-11-06 10:52:54 UTC
Hi Trond,

did you have a chance to look at this issue?
Our customer is waiting on it.
Is there any workaround?

Thanks a lot!
Jean-Louis.
Comment 3 Trond Myklebust 2009-11-07 20:55:34 UTC
Sorry. I've been travelling, and so haven't been too responsive.

Yes, this is definitely a bug. It is basically due to the fact that we do
not keep a record of the lockowner in the struct nfs_page (that tracks dirty
regions on a page).
I'm still pondering how to fix this. The thing causing me to hesitate is
that I want to avoid further growing the size of struct nfs_page if possible.

I am working on the issue, though. A patch will be forthcoming.
Comment 4 Fabio Olive Leite 2009-11-09 17:05:03 UTC
Trond,

Is this caused by nfs_readpage_async, when calling nfs_create_request, only having access to nfs_open_context? In this case, why would the open context be uptodate if all operations are carried out in the same process, and not if the child does the lock and read but not the open?

Or am I completely off? :)

Cheers,
Fábio Olivé
Comment 5 Trond Myklebust 2009-11-10 09:53:37 UTC
No. It is rather because the nfs_create_request fails to record the posix
lock owner (i.e. the value of current->files), and so when we get round
to actually writing out the data to the server, we have lost track of
the correct lock stateid to use.
Comment 6 Juan J. Cavallaro 2009-11-30 13:58:26 UTC
Hi Trond, I am definitely interested in this fix, do you have any news regarding this? Thanks! :)
Comment 7 Jeff Layton 2010-03-22 19:12:41 UTC
Trond, any more progress on the patch for this?
Comment 8 Trond Myklebust 2010-03-23 23:03:31 UTC
I've got a plan for how to proceed.

I want to have 1 struct nfs_open_context per lockowner in the case where the application uses locking. That way we do not have to increase the size of each individual struct nfs_page.

Long term, it will also allow us various cleanups w.r.t. the NFSv4 locking code,
and provide a vehicle for adding the RELEASE_LOCKOWNER-on-close (which various server implementors are asking for).

I'm still not done coding it, though. Could people please stop reporting other bugs while I do? :-) :-) :-)
Comment 9 Juan J. Cavallaro 2010-06-09 14:55:12 UTC
Hi Trond! Have you done any progress regarding this issue? Thanks!
Comment 10 Trond Myklebust 2010-06-30 21:15:55 UTC
Created attachment 26983 [details]
NFSv4: Ensure that we track the NFSv4 lock state in read/write requests.

The attached patch adds an extra field to the struct nfs_page in order to track the lock owner for each change. Can you please check if it suffices to fix the bug?
Comment 11 Jean-Louis ROCHETTE 2010-08-12 08:38:18 UTC
Congratulations, Trond!
This issue is now fixed.
I let you deal with the status of this Buzilla entry, since I'm not sure how to deal with it.
Thanks!
Jean-Louis.

Retried the test on non patched Linux machine, reproduced the error:
--------------------------------------------------------------------
jlr@newlnxjlr(55) uname -a
Linux newlnxjlr 2.6.18-1.2798.fc6 #1 SMP Mon Oct 16 14:37:32 EDT 2006 i686 i686 i386 GNU/Linux

jlr@newlnxjlr(53) g++ lock_read.cxx -o lock_read
jlr@newlnxjlr(54) lock_read /mnt/ns801s2_v4/testfile.0
Waiting for child 0 to exit
Error while reading [Input/output error] iteration=0, pid=30977
Waiting for child 1 to exit
Error while reading [Input/output error] iteration=0, pid=30979
Error while reading [Input/output error] iteration=0, pid=30978
Waiting for child 2 to exit
Exiting parent

Installed the patch on a virtual machine, test now succeeds (ran several times)
-----------------------------------------------------------
jlr@vmfc12jlr(34) uname -a
Linux vmfc12jlr 2.6.32.14 #1 SMP Thu Jul 1 11:54:07 CEST 2010 x86_64 x86_64 x86_64 GNU/Linux

jlr@vmfc12jlr(22) g++ lock_read.cxx -o lock_read
jlr@vmfc12jlr(32) lock_read /mnt/ns801s2_v4/testfile.0
Waiting for child 0 to exit
Successfully completed 5 iterations, pid=16405
Waiting for child 1 to exit
Successfully completed 5 iterations, pid=16406
Waiting for child 2 to exit
Successfully completed 5 iterations, pid=16407
Exiting parent

The network trace shows that now read operations are done with the lock stateid.
Comment 12 Trond Myklebust 2010-08-12 15:16:27 UTC
Sorry it took such a long time to fix, but this patch has now gone upstream
and should appear in 2.6.36-rc1.

The upstream commit id is f11ac8db5d07b6e99d41ff4aa39d878ee5cef1c5