Most recent kernel where this bug did not occur: 2.6.14.7
Distribution: Debian GNU/Linux 3.1 (Sarge)
Hardware Environment: NFSv3 server on a dual Opteron; NFSv3 clients on dual PIII, Opteron, or even VMware
Software Environment: Attached nfsbench program, which exposes the problem; other than that, standard Debian installations. The problem can even be triggered with an old Red Hat inside VMware.

Problem Description:
Certain I/O patterns run horribly slowly due to a cache bug in the NFS client code in kernels later than 2.6.14.7. GNU ld (the linker) in particular exposes this problem: link jobs on large files run 10-50 times slower on kernels newer than 2.6.14.7 than they did on older kernels. The attached nfsbench program exposes the problem too, by exercising roughly the same I/O pattern as GNU ld (a sketch of that pattern follows below).

Steps to reproduce:
1) You need an NFS server and an NFS client.
2) Compile the test program: gcc -o nfsbench nfsbench.c
3) On the NFS client, execute the test program 100 times:
   for i in `seq 1 100`; do ./nfsbench; done

On a good kernel, all runs complete in roughly a second each. On a bad kernel, *most* (but not necessarily all) runs take roughly one minute to execute.

The attached patch seems to fix the problem.
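The attachment itself is not reproduced here. As a purely hypothetical sketch of the kind of I/O pattern described above (GNU ld-style small writes and read-backs interleaved at scattered offsets in a large file) — file name, sizes, and iteration count are my own illustrative choices, not values from the actual attachment:

    /*
     * Hypothetical sketch only -- NOT the actual nfsbench.c attachment.
     * Mimics the described linker-like pattern: small writes and reads
     * interleaved at scattered offsets within a large file.
     */
    #define _XOPEN_SOURCE 500
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/types.h>
    #include <unistd.h>

    #define FILE_SIZE (8 * 1024 * 1024)  /* 8 MB scratch file (illustrative) */
    #define CHUNK     4096
    #define PASSES    1000

    int main(void)
    {
        char buf[CHUNK];
        const char *path = "nfsbench.tmp";  /* must live on the NFS mount */
        int fd, i;

        memset(buf, 0xaa, sizeof(buf));
        fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }
        if (ftruncate(fd, FILE_SIZE) < 0) { perror("ftruncate"); return 1; }

        /* Jump around the file, alternating a write with a read-back at
         * a different offset, the way a linker patches an output file. */
        for (i = 0; i < PASSES; i++) {
            off_t woff = (off_t)(rand() % (FILE_SIZE / CHUNK)) * CHUNK;
            off_t roff = (off_t)(rand() % (FILE_SIZE / CHUNK)) * CHUNK;

            if (pwrite(fd, buf, CHUNK, woff) != CHUNK) { perror("pwrite"); return 1; }
            if (pread(fd, buf, CHUNK, roff) != CHUNK)  { perror("pread");  return 1; }
        }

        close(fd);
        unlink(path);
        return 0;
    }

If the diagnosis in this report is right, a run of such a program on an affected kernel should trigger repeated client-side cache invalidation and take far longer than on a good kernel.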
Created attachment 8115 [details]
Program to reproduce the regression

Just compile and run. Run the test program 100 times to make sure that you catch bad kernels: a bad kernel *may* execute one run of the program in about a second, but *most* runs will take about a minute. A good kernel should finish every run in about one second.
Created attachment 8116 [details]
Patch for 2.6.16

Patch from Trond, adapted by ksm@evalesco.com for 2.6.16.
I can still reproduce this problem on Linux 2.6.17-rc6
Created attachment 8266 [details]
Original patch from Trond, works with 2.6.17-rc6
Already queued up for 2.6.18. Will not fix for 2.6.17 since it is not a critical bug.
Today I did some testing on NFS-related problems in kernels > 2.6.14. I'd like to point out that the patch against 2.6.17(.9) works in our environment. However, there's a slight twist that might interest you: the bad cache behaviour in the NFS client appears to happen _only_ when the NFS server is very busy or slow.

When testing (with an unpatched 2.6.17 kernel) against an idle server, all is well: one run of the nfsbench program takes about one second and generates about 1000 RPC calls. Testing against a busy Linux server with filesystems exported as "async", all is also well: 1 second, 1000 RPCs. But testing against a busy, slow server (filesystems exported as "sync"), all is not well: 10 seconds, 10000 RPCs. (One way to observe these counts is sketched below.)

My conclusion is that caching fails when the NFS server is slow to respond. This could lead to a downward spiral: NFS server slow -> caching fails -> more RPCs -> NFS server even slower -> ... -> crunch!

I'm pretty sure I've seen this happening on our production systems running 2.6.15 with MySQL instances on NFS shares (although I didn't realize it at the time). Our response then was to go back to 2.6.14, and we're still running 2.6.14 on those systems.
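The comment doesn't say how the RPC counts were obtained; one way to see them from the client (my assumption, using the standard nfs-utils tool) is to diff the client-side call counter around a run:

    nfsstat -c        # note the "calls" value under the client rpc stats
    ./nfsbench
    nfsstat -c        # the increase in "calls" is the RPC count for one run

Against a slow, sync-exported server on an affected kernel, that difference should jump from roughly 1000 to roughly 10000, matching the numbers above.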
UDP mount? If so, then that is quite expected. Don't use UDP in situations where you have frequent congestion issues. TCP mounts can also see poor performance if you have too few nfsd threads, since the Linux NFS server limits the number of allowed TCP connections. The limit is proportional to the number of threads (see the example below).
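As a general illustration (not something stated in this bug), the thread count can be inspected and raised on the server with the standard knfsd interfaces:

    # on the NFS server
    cat /proc/fs/nfsd/threads     # current number of nfsd threads
    rpc.nfsd 64                   # run 64 threads (the default is often 8)

On Debian the setting can be made persistent via RPCNFSDCOUNT in /etc/default/nfs-kernel-server.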
The patch was merged into 2.6.18-rc1, so I'm closing the bug.