Bug 40912 - Excessive NFS system load on server with cold cache
Summary: Excessive NFS system load on server with cold cache
Alias: None
Product: File System
Classification: Unclassified
Component: NFS
Hardware: All
OS: Linux
Importance: P1 normal
Assignee: bfields
Depends on:
Reported: 2011-08-10 21:05 UTC by Bruce Guenter
Modified: 2012-08-30 17:11 UTC
CC: 2 users

See Also:
Kernel Version: 3.0
Regression: Yes
Bisected commit-id:


Description Bruce Guenter 2011-08-10 21:05:26 UTC
Between versions 2.6.39 and 3.0, a client accessing a previously uncached file over NFS causes one or two of the nfsd processes to spend large amounts of time in system time.  This has slowed transfers to clients from almost 50MB/sec to 1.3MB/sec (with cold caches on both client and server).

I am using NFSv4 over GigE with an MTU of 9000.  The relevant line from /proc/mounts shows: /home nfs4 rw,relatime,vers=4,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=,minorversion=0,fsc,local_lock=none,addr= 0 0

(it says fsc, but I don't have any cache in actual use)

I have bisected the problem down to the following commit:

commit 9660439861aa8dbd5e2b8087f33e20760c2c9afc
Author: Olga Kornievskaia <aglo@citi.umich.edu>
Date:   Tue Oct 21 14:13:47 2008 -0400
Subject: svcrpc: take advantage of tcp autotuning

Reverting this commit on top of 3.0.1 eliminates the problem.
Comment 1 Trond Myklebust 2011-08-10 22:05:29 UTC
Server issue.  Reassigning to Bruce.
Comment 2 Bruce Guenter 2012-01-13 18:53:25 UTC
I have been able to reproduce this on 3.2 as well; however, I now have some additional details.

I see this effect only when I increase the nfsd process' priority via:

renice -N $(pidof nfsd)

The larger the N (and so the higher the priority), the worse the effect.

It could easily be argued that using renice to make nfsd run more immediately is a mistake, and the obvious fix is to just stop doing that.  However, it remains an unexpected regression.
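For reference, the kind of invocation described above can be sketched as a small helper; the nice value of -10 is chosen only for illustration (the report just says "-N"), and raising priority this way requires root:

```shell
# Illustrative helper, not part of the original report.  Gives every
# running nfsd thread a more negative nice value, i.e. a higher
# scheduling priority.  Default of -10 is an arbitrary example.
boost_nfsd() {
    renice -n "${1:--10}" -p $(pidof nfsd)
}
```

Per the report, the more negative the value passed here, the worse the slowdown becomes.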
Comment 3 bfields 2012-01-14 00:16:44 UTC
"I see this effect only when I increase the nfsd process' priority"

That's a very odd thing to do.

Nevertheless, I would be curious to know why it's happening.

Do you have a simple reproducer?  (What commands are you running on the client, exactly?)
Comment 4 Bruce Guenter 2012-01-16 22:35:31 UTC
It's fairly simple for me to reproduce: simply cat a big file on the NFS-mounted filesystem to /dev/null.  I use dd, since it reports the transfer rate at the end, but other tools work too.  It has to be an uncached, non-sparse file, however; files that are cached on the server, or that are sparse, don't trigger the problem.  I do not know whether the use of jumbo frames is a requirement.

One extra data point I forgot to mention until now: the underlying filesystem is encrypted with dm-crypt.
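A minimal sketch of the reproducer described above (the path is hypothetical; substitute any large, non-sparse file on the NFS mount that the server has not yet cached):

```shell
# Sketch of the reproducer: stream an uncached, non-sparse file on the
# NFS mount to /dev/null.  dd prints the transfer rate on completion,
# which is how the 50MB/sec vs 1.3MB/sec difference was observed.
nfs_cold_read() {
    # $1: file on the NFS mount, e.g. /mnt/home/bigfile (hypothetical path)
    dd if="$1" of=/dev/null bs=1M
}
```

For example: `nfs_cold_read /mnt/home/bigfile` (run with cold caches on both client and server).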
Comment 5 bfields 2012-01-16 22:46:56 UTC
Huh, I would have expected Olga's commit to affect writes, not reads.

Is it reproducible without dm-crypt?
Comment 6 Bruce Guenter 2012-01-16 22:54:54 UTC
Apparently not.  A test file on a non-encrypted partition went through at full speed no matter the nfsd nice level.
Comment 7 bfields 2012-01-16 23:05:49 UTC
Can you get any profiling information?  (perf should be able to do this, yes?  Sorry, you're on your own when it comes to figuring out how, but it shouldn't be too difficult...)
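One plausible way to collect the requested profile (flags shown are standard perf options; the output path and 10-second window are arbitrary choices, and recording usually needs root):

```shell
# Hypothetical profiling helper: record system-wide samples with call
# graphs for a few seconds while the slow NFS read is in progress, then
# print a report sorted by command and symbol to see where the nfsd
# threads spend their time.
profile_nfsd() {
    secs="${1:-10}"
    perf record -a -g -o /tmp/nfsd.perf.data -- sleep "$secs"
    perf report -i /tmp/nfsd.perf.data --stdio --sort comm,symbol
}
```

Running `profile_nfsd 10` during the dd transfer should show whether the time in system is going to the network stack, dm-crypt, or the svcrpc code.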
Comment 8 bfields 2012-08-30 17:11:02 UTC
BTW, could you test the latest upstream?  This may be fixed by d10f27a750312ed5638c876e4bd6aa83664cccd8 "svcrpc: fix svc_xprt_enqueue/svc_recv busy-looping".
