Bug 40912
| Summary: | Excessive NFS system load on server with cold cache | | |
|---|---|---|---|
| Product: | File System | Reporter: | Bruce Guenter (bruce) |
| Component: | NFS | Assignee: | bfields |
| Status: | RESOLVED INSUFFICIENT_DATA | | |
| Severity: | normal | CC: | alan, trondmy |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 3.0 | Subsystem: | |
| Regression: | Yes | Bisected commit-id: | |
Description
Bruce Guenter
2011-08-10 21:05:26 UTC
Server issue. Reassigning to Bruce

I have been able to reproduce this on 3.2 as well; however, I have some additional details. I see this effect only when I increase the nfsd process' priority via: renice -N $(pidof nfsd) The larger the N (and so the higher the priority), the worse the effect. It could easily be argued that using renice to make nfsd run more immediately is a mistake, and the obvious fix is to just stop doing that. However, it remains an unexpected regression.

"I see this effect only when I increase the nfsd process' priority" That's a very odd thing to do. Nevertheless, I would be curious to know why it's happening. Do you have a simple reproducer? (What commands are you running on the client, exactly?)

It's fairly simple for me to reproduce. Simply cat a big file on the NFS-mounted filesystem to /dev/null. I use dd since it reports the rate at which data was transferred at the end, but there are other tools that work. It has to be an uncached, non-sparse file, however. Files that are cached on the server or are sparse don't trigger the problem. I do not know if the use of jumbo frames is a requirement.

One extra data point I have forgotten to mention until now -- the underlying filesystem is encrypted with dm-crypt.

Huh, I would have expected Olga's commit to affect writes, not reads. Is it reproducible without dm-crypt?

Apparently not. A test file on a non-encrypted partition went through at full speed no matter the nfsd nice level.

Can you get any profiling information? (perf should be able to do this, yes? Sorry, you're on your own when it comes to figuring out how, but it shouldn't be too difficult...)

BTW, could you test the latest upstream? This may be fixed by d10f27a750312ed5638c876e4bd6aa83664cccd8 "svcrpc: fix svc_xprt_enqueue/svc_recv busy-looping".
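For anyone trying to repeat the steps described in this thread, the following is a rough sketch of the reproduction as shell helpers. It is not the reporter's actual script: the function names, the file paths in the usage comments, and the block size are all assumptions chosen for illustration. The renice call uses the POSIX `-n`/`-p` spelling rather than the `renice -N` form quoted above, and the perf invocation is only one plausible way to collect a system-wide profile.

```shell
#!/bin/sh
# Sketch of the reproduction steps from this report. All names and paths
# here are hypothetical; nothing runs until a function is called.

# make_test_file PATH SIZE_MB  (server)
# Writes SIZE_MB MiB of real data. dd writes every block, so the file is
# allocated rather than sparse (assuming no filesystem-level compression).
make_test_file() {
    dd if=/dev/zero of="$1" bs=1048576 count="$2" 2>/dev/null
}

# drop_server_cache  (server, root)
# Flushes dirty pages, then drops the page cache so the next read is cold.
drop_server_cache() {
    sync
    echo 3 > /proc/sys/vm/drop_caches
}

# renice_nfsd N  (server, root)
# Raises nfsd priority by passing -N as the nice value, assuming the
# threads currently run at nice 0.
renice_nfsd() {
    renice -n "-$1" -p $(pidof nfsd)
}

# read_test_file PATH  (client)
# Streams the file to /dev/null; dd reports the transfer rate on stderr.
read_test_file() {
    dd if="$1" of=/dev/null bs=1048576
}

# profile_server SECONDS  (server, root)
# Records a system-wide profile with call graphs while the client reads;
# inspect the result afterwards with "perf report".
profile_server() {
    perf record -a -g -- sleep "$1"
}

# Example sequence (paths are placeholders):
#   server: make_test_file /export/crypt/bigfile 4096
#   server: drop_server_cache; renice_nfsd 5
#   client: read_test_file /mnt/nfs/bigfile
```

The test file should live on the dm-crypt-backed export, since the thread reports that a file on a non-encrypted partition did not show the slowdown.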