Between versions 2.6.39 and 3.0, a client accessing a previously uncached file over NFS causes one or two of the nfsd processes to spend large amounts of time in system time. This has resulted in a slowdown on clients from almost 50MB/sec to 1.3MB/sec (cold cache on both client and server).
I am using NFSv4 over GigE with an MTU of 9000. The relevant line from /proc/mounts shows:
192.168.1.2:/home/ /home nfs4 rw,relatime,vers=4,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.1.3,minorversion=0,fsc,local_lock=none,addr=192.168.1.2 0 0
(it says fsc, but I don't have any cache in actual use)
I have bisected the problem down to the following commit:
Author: Olga Kornievskaia <email@example.com>
Date: Tue Oct 21 14:13:47 2008 -0400
Subject: svcrpc: take advantage of tcp autotuning
Reverting this commit on top of 3.0.1 eliminates the problem.
Server issue. Reassigning to Bruce.
I have been able to reproduce this on 3.2 as well; however, I have some additional details.
I see this effect only when I increase the nfsd process' priority via:
renice -N $(pidof nfsd)
The larger the N (and so the higher the priority), the worse the effect.
It could easily be argued that using renice to make nfsd run sooner is a mistake, and the obvious fix is simply to stop doing that. Still, it remains an unexpected regression.
"I see this effect only when I increase the nfsd process' priority"
That's a very odd thing to do.
Nevertheless, I would be curious to know why it's happening.
Do you have a simple reproducer? (What commands are you running on the client, exactly?)
It's fairly simple for me to reproduce: simply cat a big file on the NFS-mounted filesystem to /dev/null. I use dd since it reports the transfer rate at the end, but other tools work too. It has to be an uncached, non-sparse file, however. Files that are cached on the server, or that are sparse, don't trigger the problem. I don't know whether the use of jumbo frames is a requirement.
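Roughly, the reproducer looks like the following sketch. The path /tmp/bigfile here is a local stand-in so the commands run anywhere; on the real setup the input would be a large, non-sparse, previously uncached file on the NFS mount instead.

```shell
# Create a stand-in for the large non-sparse file (on the real setup this
# would already exist on the NFS-mounted filesystem).
dd if=/dev/zero of=/tmp/bigfile bs=1M count=64

# Read it to /dev/null; dd prints the transfer rate on stderr when it
# finishes, which is the number that drops from ~50MB/sec to ~1.3MB/sec.
dd if=/tmp/bigfile of=/dev/null bs=1M

rm /tmp/bigfile
```

On the real mount you would also want the client cache cold before the read (e.g. by dropping caches or remounting), since cached files don't trigger the problem.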
One extra data point I forgot to mention until now -- the underlying filesystem is encrypted with dm-crypt.
Huh, I would have expected Olga's commit to affect writes, not reads.
Is it reproducible without dm-crypt?
Apparently not. A test file on a non-encrypted partition went through at full speed regardless of the nfsd nice level.
Can you get any profiling information? (perf should be able to do this, yes? Sorry, you're on your own when it comes to figuring out how, but it shouldn't be too difficult...)
BTW, could you test the latest upstream? This may be fixed by d10f27a750312ed5638c876e4bd6aa83664cccd8 "svcrpc: fix svc_xprt_enqueue/svc_recv busy-looping".