Bug 23892 - fcntl(..., F_GETLK, ...) does not return correct info about a read lock which is set on server
Summary: fcntl(..., F_GETLK, ...) does not return correct info about a read lock which...
Status: CLOSED CODE_FIX
Alias: None
Product: File System
Classification: Unclassified
Component: NFS (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Trond Myklebust
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-11-28 14:49 UTC by Alexander Morozov
Modified: 2013-10-11 21:15 UTC (History)
5 users (show)

See Also:
Kernel Version: 2.6.22-rc1 and higher
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
Programs for reproducing (1.03 KB, application/bzip2)
2010-11-28 14:49 UTC, Alexander Morozov
Details
[PATCH] NFS: Fix fcntl F_GETLK not reporting some conflicts (1.54 KB, patch)
2010-11-28 17:18 UTC, Sergey Vlasov
Details | Diff

Description Alexander Morozov 2010-11-28 14:49:37 UTC
Created attachment 38442 [details]
Programs for reproducing

Environment:

Debian Lenny with kernel 2.6.37-rc3 i686
nfs-common         1:1.1.2-6lenny2
nfs-kernel-server  1:1.1.2-6lenny2
172.22.0.10 is a server
172.22.0.11 is a client

/etc/exports on a server:
/srv/test  172.22.0.11(rw,sync,no_subtree_check)


Steps to reproduce:

client# mount -t nfs 172.22.0.10:/srv/test /srv/test

server$ cd /srv/test
server$ ./linlock r file
press Ctrl-C...

client$ cd /srv/test
client$ ./linlock w file
fcntl: Resource temporary unavailable
client$ ./lockinfo file
F_UNLCK SEEK_SET start 0 len 0

./linlock tool set read (r) or write (w) lock on a file. ./lockinfo shows what F_GETLK returns. This test shows that a client can not set a write lock on a read lock of server but F_GETLK tells that file is not locked.


The bug does not occur if client with 2.6.21 kernel.
The bug occurs if client with 2.6.22-rc1 and higher.
Comment 1 Sergey Vlasov 2010-11-28 17:18:15 UTC
Created attachment 38452 [details]
[PATCH] NFS: Fix fcntl F_GETLK not reporting some conflicts

Narrowing the search range to 2.6.21..2.6.22-rc1 was very helpful. Please try this patch (for the current HEAD, but applies even to 2.6.22-rc1 with some fuzz).
Comment 2 Alexander Morozov 2010-11-28 20:30:37 UTC
Thank you for fast fix. With this patch F_GETLK tells that file is locked.

I also found there is other difference between a client and a server. If a lock is set on client F_GETLK on server reports a pid of process which set a lock ([lockd] pid). But if a lock is set on server F_GETLK on client returns pid equal to 0. This difference seems more old (is was exist always?) than bug with lock type because it also exists on 2.6.21 and 2.6.18.
Comment 3 Pavel Shilovsky 2010-11-29 19:08:54 UTC
http://lxr.free-electrons.com/source/fs/lockd/clntproc.c?a=m68knommu#L439

Can it be the reason of the problem? If so, it looks like it should be replaced with
"fl->fl_pid = req->a_res.lock.fl.fl_pid;", but of course there may be some reasons not to do it (setting it to zero looks very strange).
Comment 4 Sergey Vlasov 2010-11-29 21:10:33 UTC
(In reply to comment #3)
> http://lxr.free-electrons.com/source/fs/lockd/clntproc.c?a=m68knommu#L439

Here req->a_res.lock.fl.fl_pid does not really contain a pid - according to the NFS lock manager protocol specs, it is just an unique value generated by the system which requested the lock. In Linux this is an arbitrary number not actually related with a real pid:

http://lxr.free-electrons.com/source/fs/lockd/clntproc.c?a=m68knommu#L76

And a real pid value would be useless anyway, because there is no standard field in struct flock to return any identification of the machine where that process is running (some systems have such fields, but not Linux). Even worse, returning a remote pid value which happens to match some unrelated local process may be harmful - returning 0 meaning "a remote process which cannot be identified in a reliable and portable way" is much safer.
Comment 5 Pavel Shilovsky 2010-11-30 06:16:07 UTC
Thanks for your comment! Now it's clear for me.
Comment 6 Alexander Morozov 2010-11-30 12:05:00 UTC
(In reply to comment #4)

When a lock is set on a client and fcntl is called on a server F_GETLK returns lockd pid instead of 0. So if pid != 0 then it is not always mean that a lock is set by a local program.

On 2.6.8 F_GETLK always returns lockd pid in case of remote lock. But now we receive either 0 or lockd pid.
Comment 7 Florian Mickler 2011-01-23 15:34:55 UTC
fixed in 2.6.37-rc6 by
commit 21ac19d484a8ffb66f64487846c8d53afef04d2b
Author: Sergey Vlasov <vsu@altlinux.ru>
Date:   Sun Nov 28 21:04:05 2010 +0000

    NFS: Fix fcntl F_GETLK not reporting some conflicts

Note You need to log in before you can comment on or make changes to this bug.