Bug 9454

Summary: F_SETLEASE/F_WRLCK active on file causes it to appear modified over NVSv3 mount
Product: File System Reporter: starlight
Component: NFSAssignee: bfields
Severity: normal CC: akpm, alan, bfields, jlayton, neilb
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.9-55.0.12.ELsmp Tree: Mainline
Regression: No
Attachments: nfs_lease_bug.c
remove lease_get_mtime

Description starlight 2007-11-24 12:24:10 UTC
Most recent kernel where this bug did not occur:

don't know


RHEL 4.5

Hardware Environment:

Athlon 4800+, P4

Software Environment:

x86_64, i686, portable bug

Problem Description:

Whenever a file has a lease applied with

   fcntl(fd, F_SETLEASE, F_WRLCK);

the modification time for the file as seen via NFSv3
is temporarily set to the time of the lease acquire.
Once lease is removed, modification time reverts to
correct original value.  Altered modification time
appease *only* via NFS.  It does not change on the
local system.

Steps to reproduce:

Run attached test program on a file and 'ls -l' the
file via a remote NFS mount.  Observe modification
time.  Once test program exits, check again.


Specific impact of this bug is spurious 'make' target rebuilding 
when a single source tree is used to build executables for 
multiple platforms, including Windows.

Reported to Red Hat Bugzilla too.  Complete evolution of discovery 
and analysis appears in report.

Comment 1 starlight 2007-11-24 12:25:13 UTC
Created attachment 13739 [details]
Comment 2 Trond Myklebust 2007-11-24 12:38:00 UTC
Can you reproduce this on a mainline client? AFAIK, that should just return

Otherwise, this looks like a problem for Red Hat...
Comment 3 starlight 2007-11-24 12:46:17 UTC
Don't have time to build vanilla kernel and try it, but the 
testcase is trivial and should be easy to check on existing 

Reported this as a courtesy in case it's in vanilla kernel.  
It's obscure--took months to figure out what was causing the 
spurious 'make' rebuilding.  Frustrating experience.
Comment 4 starlight 2007-11-24 19:08:54 UTC
>Can you reproduce this on a mainline client?

BTW, this is a server-side issue.
Comment 5 bfields 2007-11-26 08:57:56 UTC
This is by design--see fs/locks.c:lease_get_mtime().

The argument is: if somebody has a write lease on the file (are you exporting this via Samba?  That'd be the typical user), then they're caching writes--they're explicitly telling us that we do not know whether the file is still the same, because we may have modified it on a remote client and not told us about it.  So lease_get_mtime() reports the current time as the mtime, prompting you to actually try opening the file, at which point the write lease gets broken and any cached writes get flushed out.

It's a terrible kludge, I agree, and maybe we should remove it.  But I'd first like to understand what circumstances prompted smoebody to add it originally, and talk to the Samba people about how they're using these write leases.

(In NFSv4, by the way, there's a callback to the client to allow the server to find out the attributes of a file that the client holding a write lease is caching, which solves this problem.  We haven't implemented that yet; it seems likely to be an enormous pain.  I wonder if Samba would want something similar?)
Comment 6 starlight 2007-11-26 10:17:08 UTC
Yes, using a Samba mount along with NFS mounts to 'make' build
an application on multiple platforms simultaneously.  When
re-building existing trees the result is spurious target building,
a real problem.  Took months of frustration to figure out what
was going on.

Actually reported this as a bug to Samba along with RH, and a
request for info from the Samba team provoked the analysis that
isolated the problem.  Closed the Samba bug but it's all still


My impression for looking at the Samba code is that the leases
are put in place so that files can be monitored for changes,
but that's just a guess.  Perhaps Samba can use a read lease
instead of a write lease.
Comment 7 bfields 2007-11-26 10:55:17 UTC
Thanks for the pointer to the samba bug.  I understand the frustration.

My feeling is that: yes, it's suboptimal (but perhaps not a bug) for samba to be requesting a write lease when a read lease (sufficient to alert it to modifications of the file) would be enough.

It seems to me that the real bug is the incomplete lease implementation--if the purpose of a lease is to allow a remote client to cache writes to the file, then there's no way for us to give a sensible answer to a stat call, unless we break the lease first.

Perhaps we could find out (in the samba case) what the consequences would be of not updating mtime in this case?  I suppose the worst case would be that a modification to a file made on a samba client could be indefinitely delayed from being flushed to the server.

If we agree that that would be less of a problem than these spurious bumps in the mtime, then the best solution for now may just be to rip out lease_get_mtime().  I'll cook up a straw-man patch....
Comment 8 bfields 2007-11-26 10:56:16 UTC
Created attachment 13763 [details]
remove lease_get_mtime

Not sure if this is what we want to do, but it's at least something you could test to confirm the source of the problem.
Comment 9 starlight 2007-11-26 11:13:01 UTC
Thanks!  Happen to have a kernel source tree set up for the
Centos/RH images running on the server, so I can try this out
in the next week or so.  Seems certain to work.

In the interim perhaps you could start a dialog with Samba about
the reasoning behind the application of the F_WRLCK?  Or if
you wish I could do so using the Samaba bug report cited
Comment 10 starlight 2007-11-27 19:56:13 UTC
Created attachment 13773 [details]

Adapted patch for RHEL 4.5 2.6.9-55 kernel (attached).  Works as 
expected and eliminates the spurious target rebuilding when a 
concurrent Samaba and NFS 'make' is run on same source tree. 
Test case also shows expected behavior.

Obviously did not do any regression testing with other Samba 
file scenarios.  In our build trees the same files and 
directories are never written by more than one server.
Comment 11 Jeff Layton 2008-01-08 12:53:35 UTC
Hmmm...kludge or not, it seems like using lease_get_mtime makes everything err on the side of tighter cache consistency, and that seems to be a good thing.

I assume that samba is doing this to emulate oplocks on the share. In fact, snipping from the smb.conf manpage:

          Kernel  oplocks support allows Samba oplocks to be broken whenever a
          local UNIX process or NFS operation accesses a file that smbd(8) has
          oplocked.  This  allows  complete data consistency between SMB/CIFS,
          NFS and local file access (and is a very cool feature :-).

...but if we remove lease_get_mtime then that "very cool feature" is now a "very broken feature". A samba client could get an oplock on a file which may never be revoked even though clients may have the file open read only and are polling for changes.

While lease_get_mtime is a kludge, it's one that appears to help data consistency and may help prevent corruption. I think that's generally a good thing and not something we should eliminate without some careful consideration.

Finally, this probably can be worked around in userspace by setting "kernel oplocks = no" in smb.conf, though I haven't tested it and can't confirm it.
Comment 12 starlight 2008-04-25 13:13:38 UTC
Tried "kernel oplocks = no" in the Samba config
and it does work around the issue.
Comment 13 Jeff Layton 2008-04-25 13:39:52 UTC
Thanks. That's good to know.

The only downside of using that is that samba is not taking out leases for oplocks and is just tracking them internally. Accesses by other applications (and NFS) won't revoke the oplock so you won't have the same data consistency guarantees.

On the other hand, the kernel patch that Bruce posted would just make NFS a lot less consistent about revoking the oplocks. When the client can satisfy the read from the cache, then the lease wouldn't be broken. When it has to satisfy the read by querying the server it would be. When the lease isn't broken then the oplock holder can continue to buffer up the writes, and you can end up with a large amount of dirty pages on the client outstanding.

Bruce's patch might still be reasonable, but we need to be aware that there might be drawbacks and it could break some people relying on this behavior. Given that this is a nfsv2/3 specific kludge, I'm less keen on breaking it.

Finally, NFSv4 doesn't use lease_get_mtime since it's a stateful protocol, and
actually has nfsd hold the file open. I assume that sharing the same data with samba and nfsv4 might be a lot smoother, but you may just be best off having the Linux clients use CIFS if you need consistency with oplocks.