Bug 8159 - NFS file locking is too slow
Summary: NFS file locking is too slow
Status: CLOSED PATCH_ALREADY_AVAILABLE
Alias: None
Product: File System
Classification: Unclassified
Component: NFS
Hardware: i386 Linux
Importance: P2 normal
Assignee: Trond Myklebust
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-03-09 06:00 UTC by Matthias Meinke
Modified: 2007-03-09 08:50 UTC
CC List: 1 user

See Also:
Kernel Version: 2.6.20.1 with revised patch http://bugzilla.kernel.org/show_bug.
Subsystem:
Regression: ---
Bisected commit-id:



Description Matthias Meinke 2007-03-09 06:00:06 UTC
Most recent kernel where this bug did *NOT* occur: 2.6.20.1
Distribution: Suse 10.1
Hardware Environment: Dell cluster nodes and PCs with Intel dual-core 64-bit processors
Software Environment: kernel 2.6.20.1 with the revised patch from
http://bugzilla.kernel.org/show_bug.cgi?id=7916
Problem Description: File locking over NFS works, but a single lock/unlock takes about 30 seconds.

Steps to reproduce: run the following Perl script on an NFS-mounted file system:

#!/usr/bin/perl
 use Fcntl ':flock'; # import LOCK_* constants
 print("Just before opening file.\n");
 open(LOCKFILE,">testlockfile") or
   die "Error: Could not write to lock file: testlockfile: $!\n";
 print("Just before locking file.\n");
 flock(LOCKFILE,LOCK_EX);
 print("Just before unlocking file.\n");
 flock(LOCKFILE,LOCK_UN);
 print("All done.  File locked and unlocked.\n");
 unlink("testlockfile");
 exit(0);
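A possible variant of the test script (a sketch, not part of the original report) times the LOCK_EX and LOCK_UN calls separately, so lock latency is isolated from interpreter start-up and file creation. It assumes Perl's core Time::HiRes module is available and, like the original, creates testlockfile in the current directory, so it should be run from inside the NFS mount.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:flock);        # import LOCK_* constants
use Time::HiRes qw(time);    # time() with sub-second resolution

my $lockfile = "testlockfile";   # created in the current directory

open(my $fh, '>', $lockfile)
    or die "Error: Could not write to lock file: $lockfile: $!\n";

# Time only the lock/unlock pair; open() and unlink() are excluded.
my $t0 = time;
flock($fh, LOCK_EX) or die "Error: flock(LOCK_EX) failed on $lockfile: $!\n";
my $t_locked = time;
flock($fh, LOCK_UN) or die "Error: flock(LOCK_UN) failed on $lockfile: $!\n";
my $t1 = time;

close($fh);
unlink($lockfile) or warn "Warning: could not remove $lockfile: $!\n";

my $elapsed = $t1 - $t0;
printf("LOCK_EX took %.3f s, LOCK_UN took %.3f s (total %.3f s)\n",
       $t_locked - $t0, $t1 - $t_locked, $elapsed);
```

On an affected client one would expect nearly all of the delay to show up in the LOCK_EX figure, since acquiring the first lock is the step where lockd asks statd to monitor the server; this is an expectation, not a measured result from this report.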
Comment 1 Matthias Meinke 2007-03-09 06:11:19 UTC
Here is some more information about the problem:

We are running a Dell cluster and other PCs with Intel dual-core processors,
all with a Suse 10.1 distribution. On all platforms we noticed the same
behaviour:

The 2.6.18 kernel shipped with the Suse distribution writes corrupt data to disk
when files are written over NFS. After updating to kernel 2.6.20.1 this problem
was resolved, but under some NFS load a kernel bug (kernel BUG at fs/inode.c)
occurred regularly, so we applied the revised patch from

http://bugzilla.kernel.org/show_bug.cgi?id=7916

Now the system and NFS seem to run stably, but unfortunately file locking is
so slow that applications run into timeouts, and we see the
following kernel messages (from dmesg):
...
statd: server localhost not responding, timed out
lockd: cannot monitor HOSTNAME
lockd: failed to monitor HOSTNAME
After some googling, I found the following Perl script to test file locking:
#!/usr/bin/perl
 use Fcntl ':flock'; # import LOCK_* constants
 print("Just before opening file.\n");
 open(LOCKFILE,">testlockfile") or
   die "Error: Could not write to lock file: testlockfile: $!\n";
 print("Just before locking file.\n");
 flock(LOCKFILE,LOCK_EX);
 print("Just before unlocking file.\n");
 flock(LOCKFILE,LOCK_UN);
 print("All done.  File locked and unlocked.\n");
 unlink("testlockfile");
 exit(0);

Running this on an NFS-mounted file system gives the following result:

time perlscript
Just before opening file.
Just before locking file.
Just before unlocking file.
All done.  File locked and unlocked.

real    0m30.068s
user    0m0.012s
sys     0m0.000s

So the lock/unlock cycle takes about 30 seconds. On other machines with the same
hardware and software configuration but running the older 2.6.18 kernel, I get
the same messages, but the times are:
real    0m0.297s
user    0m0.028s
sys     0m0.012s

Any hints as to what is going wrong?

Here are the details of the system:
uname -a
Linux HOSTNAME 2.6.20.1 #4 SMP Tue Mar 6 11:57:25 CET 2007 x86_64 x86_64 x86_64 GNU/Linux

Comment 2 Olaf Kirch 2007-03-09 06:29:21 UTC
I think what you're seeing is related to the kernel statd support in
Suse (mea culpa). You upgraded to mainline, which doesn't have that (yet),
so you also need to run the user space rpc.statd (which Suse doesn't ship).
You probably need to get it from nfs-utils and build and install it yourself.
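One way to check whether a user-space statd is actually running is to ask the portmapper which RPC programs are registered. The sketch below is a hypothetical helper, not from this report: it assumes the rpcinfo utility is installed and relies on the fact that the status monitor (NSM, served by rpc.statd) registers as RPC program 100024, listed as "status" by rpcinfo -p.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# NSM (rpc.statd) registers with the portmapper as RPC program 100024 ("status").
my $NSM_PROGRAM = 100024;

# Hypothetical helper: return true if "rpcinfo -p" style output lists NSM.
sub statd_registered {
    my ($rpcinfo_output) = @_;
    for my $line (split /\n/, $rpcinfo_output) {
        # Data lines start with the program number; the header line does not.
        my ($prog) = $line =~ /^\s*(\d+)\s/;
        return 1 if defined $prog && $prog == $NSM_PROGRAM;
    }
    return 0;
}

# Query the local portmapper; stderr is discarded so a missing rpcinfo is quiet.
my $output = `rpcinfo -p localhost 2>/dev/null`;
if (!defined $output || $output eq '') {
    print "Could not query the portmapper with rpcinfo.\n";
} elsif (statd_registered($output)) {
    print "rpc.statd (program $NSM_PROGRAM) is registered.\n";
} else {
    print "rpc.statd is not registered; lockd's monitor requests may time out.\n";
}
```

If the last message appears on an affected client, that would be consistent with the "statd: server localhost not responding" lines in the dmesg output above.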
Comment 3 Matthias Meinke 2007-03-09 07:52:29 UTC
So I installed:

libgssapi-0.10.tar.gz
librpcsecgss-0.14
nfs-utils-1.0.10.tar.gz

After starting "statd" from nfs-utils, the problem vanished. So this seems to be
a Suse-specific problem.

Thank you, Olaf Kirch, for the fast response. I do think, though, that Suse
should have supplied a kernel update for the data corruption over NFS that I
mentioned in my problem description; that has caused us quite a headache.
Comment 4 Trond Myklebust 2007-03-09 08:50:09 UTC
Closing bug, since this appears to be a SuSE distribution issue rather than
a mainline kernel breakage.
