Created attachment 38442 [details] Programs for reproducing Environment: Debian Lenny with kernel 2.6.37-rc3 i686 nfs-common 1:1.1.2-6lenny2 nfs-kernel-server 1:1.1.2-6lenny2 172.22.0.10 is a server 172.22.0.11 is a client /etc/exports on a server: /srv/test 172.22.0.11(rw,sync,no_subtree_check) Steps to reproduce: client# mount -t nfs 172.22.0.10:/srv/test /srv/test server$ cd /srv/test server$ ./linlock r file press Ctrl-C... client$ cd /srv/test client$ ./linlock w file fcntl: Resource temporary unavailable client$ ./lockinfo file F_UNLCK SEEK_SET start 0 len 0 ./linlock tool set read (r) or write (w) lock on a file. ./lockinfo shows what F_GETLK returns. This test shows that a client can not set a write lock on a read lock of server but F_GETLK tells that file is not locked. The bug does not occur if client with 2.6.21 kernel. The bug occurs if client with 2.6.22-rc1 and higher.
Created attachment 38452 [details] [PATCH] NFS: Fix fcntl F_GETLK not reporting some conflicts Narrowing the search range to 2.6.21..2.6.22-rc1 was very helpful. Please try this patch (for the current HEAD, but applies even to 2.6.22-rc1 with some fuzz).
Thank you for fast fix. With this patch F_GETLK tells that file is locked. I also found there is other difference between a client and a server. If a lock is set on client F_GETLK on server reports a pid of process which set a lock ([lockd] pid). But if a lock is set on server F_GETLK on client returns pid equal to 0. This difference seems more old (is was exist always?) than bug with lock type because it also exists on 2.6.21 and 2.6.18.
http://lxr.free-electrons.com/source/fs/lockd/clntproc.c?a=m68knommu#L439 Can it be the reason of the problem? If so, it looks like it should be replaced with "fl->fl_pid = req->a_res.lock.fl.fl_pid;", but of course there may be some reasons not to do it (setting it to zero looks very strange).
(In reply to comment #3) > http://lxr.free-electrons.com/source/fs/lockd/clntproc.c?a=m68knommu#L439 Here req->a_res.lock.fl.fl_pid does not really contain a pid - according to the NFS lock manager protocol specs, it is just an unique value generated by the system which requested the lock. In Linux this is an arbitrary number not actually related with a real pid: http://lxr.free-electrons.com/source/fs/lockd/clntproc.c?a=m68knommu#L76 And a real pid value would be useless anyway, because there is no standard field in struct flock to return any identification of the machine where that process is running (some systems have such fields, but not Linux). Even worse, returning a remote pid value which happens to match some unrelated local process may be harmful - returning 0 meaning "a remote process which cannot be identified in a reliable and portable way" is much safer.
Thanks for your comment! Now it's clear for me.
(In reply to comment #4) When a lock is set on a client and fcntl is called on a server F_GETLK returns lockd pid instead of 0. So if pid != 0 then it is not always mean that a lock is set by a local program. On 2.6.8 F_GETLK always returns lockd pid in case of remote lock. But now we receive either 0 or lockd pid.
fixed in 2.6.37-rc6 by commit 21ac19d484a8ffb66f64487846c8d53afef04d2b Author: Sergey Vlasov <vsu@altlinux.ru> Date: Sun Nov 28 21:04:05 2010 +0000 NFS: Fix fcntl F_GETLK not reporting some conflicts