Most recent kernel where this bug did not occur:
Athlon 4800+, P4
x86_64, i686, portable bug
Whenever a file has a lease applied with
fcntl(fd, F_SETLEASE, F_WRLCK);
the modification time for the file as seen via NFSv3
is temporarily set to the time of the lease acquire.
Once lease is removed, modification time reverts to
correct original value. Altered modification time
appears *only* via NFS. It does not change on the
Steps to reproduce:
Run attached test program on a file and 'ls -l' the
file via a remote NFS mount. Observe modification
time. Once test program exits, check again.
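For readers without the attachment, a reproducer along these lines should be enough. This is a minimal sketch, not the attached test program: the helper name hold_write_lease and its arguments are made up for illustration, and it assumes a Linux server where the caller owns the file (or has CAP_LEASE) and nobody else has the file open.

```c
#define _GNU_SOURCE           /* F_SETLEASE is Linux-specific */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (not from the attachment): hold a write lease on
 * `path` for `seconds`. Returns 0 on success, -1 on failure. */
int hold_write_lease(const char *path, unsigned int seconds)
{
    int fd = open(path, O_RDWR);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    /* A write lease is granted only if no other descriptor has the file
     * open, and the caller must own the file or hold CAP_LEASE. */
    if (fcntl(fd, F_SETLEASE, F_WRLCK) < 0) {
        perror("fcntl(F_SETLEASE, F_WRLCK)");
        close(fd);
        return -1;
    }

    /* While this sleeps, run 'ls -l' on the file from an NFSv3 client. */
    sleep(seconds);

    /* Drop the lease explicitly; closing the descriptor would also drop it. */
    fcntl(fd, F_SETLEASE, F_UNLCK);
    close(fd);
    return 0;
}
```

Calling hold_write_lease("/export/somefile", 60) from a small main() on the server gives a one-minute window to compare 'ls -l' output locally and over NFS; the path here is only a placeholder.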
Specific impact of this bug is spurious 'make' target rebuilding
when a single source tree is used to build executables for
multiple platforms, including Windows.
Reported to Red Hat Bugzilla too. Complete evolution of discovery
and analysis appears in report.
Created attachment 13739
Can you reproduce this on a mainline client? AFAIK, that should just return
Otherwise, this looks like a problem for Red Hat...
Don't have time to build vanilla kernel and try it, but the
testcase is trivial and should be easy to check on existing
Reported this as a courtesy in case it's in vanilla kernel.
It's obscure--took months to figure out what was causing the
spurious 'make' rebuilding. Frustrating experience.
>Can you reproduce this on a mainline client?
BTW, this is a server-side issue.
This is by design--see fs/locks.c:lease_get_mtime().
The argument is: if somebody has a write lease on the file (are you exporting this via Samba? That'd be the typical user), then they're caching writes--they're explicitly telling us that we do not know whether the file is still the same, because they may have modified it on a remote client and not told us about it. So lease_get_mtime() reports the current time as the mtime, prompting you to actually try opening the file, at which point the write lease gets broken and any cached writes get flushed out.
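To make that decision concrete, here is a userspace model of it. It is only a sketch of the behaviour described in this comment, not the code in fs/locks.c: struct model_inode and model_lease_get_mtime are invented names, and the real function works on kernel inode and lock structures.

```c
#include <stdbool.h>
#include <time.h>

/* Simplified stand-in for per-file state; NOT the kernel's struct inode. */
struct model_inode {
    time_t mtime;            /* modification time as stored on disk */
    bool   has_write_lease;  /* true while an F_WRLCK lease is held */
};

/* Model of the rule described above: while a write lease is held,
 * report "now" instead of the stored mtime; otherwise report the
 * stored mtime unchanged. */
time_t model_lease_get_mtime(const struct model_inode *inode, time_t now)
{
    if (inode->has_write_lease)
        return now;
    return inode->mtime;
}
```

Under this model the stored mtime is never touched, which matches the report: the altered time is visible only to callers that go through this path (the NFS server), and it reverts as soon as the lease is gone.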
It's a terrible kludge, I agree, and maybe we should remove it. But I'd first like to understand what circumstances prompted somebody to add it originally, and talk to the Samba people about how they're using these write leases.
(In NFSv4, by the way, there's a callback to the client to allow the server to find out the attributes of a file that the client holding a write lease is caching, which solves this problem. We haven't implemented that yet; it seems likely to be an enormous pain. I wonder if Samba would want something similar?)
Yes, using a Samba mount along with NFS mounts to 'make' build
an application on multiple platforms simultaneously. When
re-building existing trees the result is spurious target building,
a real problem. Took months of frustration to figure out what
was going on.
Actually reported this as a bug to Samba along with RH, and a
request for info from the Samba team provoked the analysis that
isolated the problem. Closed the Samba bug but it's all still
My impression from looking at the Samba code is that the leases
are put in place so that files can be monitored for changes,
but that's just a guess. Perhaps Samba can use a read lease
instead of a write lease.
Thanks for the pointer to the samba bug. I understand the frustration.
My feeling is that: yes, it's suboptimal (but perhaps not a bug) for samba to be requesting a write lease when a read lease (sufficient to alert it to modifications of the file) would be enough.
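For comparison, taking a read lease looks like this. Again a sketch under assumptions: take_read_lease is a made-up helper, and whether a read lease would actually be sufficient for Samba's oplock handling is exactly the open question here, so this only shows the fcntl() side.

```c
#define _GNU_SOURCE           /* F_SETLEASE is Linux-specific */
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: take a read lease on `path`.
 * Returns the descriptor holding the lease, or -1 on failure. */
int take_read_lease(const char *path)
{
    /* A read lease can only be placed on a read-only descriptor. */
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Fails if, for example, the file is currently open for writing. */
    if (fcntl(fd, F_SETLEASE, F_RDLCK) < 0) {
        close(fd);
        return -1;
    }
    return fd;   /* the lease lasts until F_UNLCK or close(fd) */
}
```

The lease holder is signalled (SIGIO by default) when another process opens the file for writing or truncates it, which is the "alert it to modifications" case. Going by the description of lease_get_mtime() earlier in this report, only a write lease triggers the mtime substitution, so a read lease should not disturb the mtime seen over NFS; I have not verified that against every kernel version.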
It seems to me that the real bug is the incomplete lease implementation--if the purpose of a lease is to allow a remote client to cache writes to the file, then there's no way for us to give a sensible answer to a stat call, unless we break the lease first.
Perhaps we could find out (in the samba case) what the consequences would be of not updating mtime in this case? I suppose the worst case would be that a modification to a file made on a samba client could be indefinitely delayed from being flushed to the server.
If we agree that that would be less of a problem than these spurious bumps in the mtime, then the best solution for now may just be to rip out lease_get_mtime(). I'll cook up a straw-man patch....
Created attachment 13763
Not sure if this is what we want to do, but it's at least something you could test to confirm the source of the problem.
Thanks! Happen to have a kernel source tree set up for the
Centos/RH images running on the server, so I can try this out
in the next week or so. Seems certain to work.
In the interim perhaps you could start a dialog with Samba about
the reasoning behind the application of the F_WRLCK? Or if
you wish I could do so using the Samba bug report cited
Created attachment 13773
Adapted patch for RHEL 4.5 2.6.9-55 kernel (attached). Works as
expected and eliminates the spurious target rebuilding when a
concurrent Samba and NFS 'make' is run on same source tree.
Test case also shows expected behavior.
Obviously did not do any regression testing with other Samba
file scenarios. In our build trees the same files and
directories are never written by more than one server.
Hmmm...kludge or not, it seems like using lease_get_mtime makes everything err on the side of tighter cache consistency, and that seems to be a good thing.
I assume that samba is doing this to emulate oplocks on the share. In fact, snipping from the smb.conf manpage:
Kernel oplocks support allows Samba oplocks to be broken whenever a
local UNIX process or NFS operation accesses a file that smbd(8) has
oplocked. This allows complete data consistency between SMB/CIFS,
NFS and local file access (and is a very cool feature :-).
...but if we remove lease_get_mtime then that "very cool feature" is now a "very broken feature". A samba client could get an oplock on a file which may never be revoked even though clients may have the file open read only and are polling for changes.
While lease_get_mtime is a kludge, it's one that appears to help data consistency and may help prevent corruption. I think that's generally a good thing and not something we should eliminate without some careful consideration.
Finally, this probably can be worked around in userspace by setting "kernel oplocks = no" in smb.conf, though I haven't tested it and can't confirm it.
Tried "kernel oplocks = no" in the Samba config
and it does work around the issue.
Thanks. That's good to know.
The only downside of using that is that samba is not taking out leases for oplocks and is just tracking them internally. Accesses by other applications (and NFS) won't revoke the oplock so you won't have the same data consistency guarantees.
On the other hand, the kernel patch that Bruce posted would just make NFS a lot less consistent about revoking the oplocks. When the client can satisfy the read from its cache, the lease wouldn't be broken. When it has to satisfy the read by querying the server, it would be. When the lease isn't broken, the oplock holder can continue to buffer up writes, and you can end up with a large number of dirty pages outstanding on the client.
Bruce's patch might still be reasonable, but we need to be aware that there might be drawbacks and it could break some people relying on this behavior. Given that this is an NFSv2/3-specific kludge, I'm less keen on breaking it.
Finally, NFSv4 doesn't use lease_get_mtime since it's a stateful protocol, and
actually has nfsd hold the file open. I assume that sharing the same data with samba and nfsv4 might be a lot smoother, but you may just be best off having the Linux clients use CIFS if you need consistency with oplocks.