Bug 6557
Summary: | NFS client (10x) performance regression 2.6.14.7 -> all later kernels | ||
---|---|---|---|
Product: | File System | Reporter: | Jakob (joe) |
Component: | NFS | Assignee: | Trond Myklebust (trondmy) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | ksm |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.17-rc4 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: |
Program to reproduce the regression
Patch for 2.6.16
Original patch from Trond, works with 2.6.17-rc6 |
Description
Jakob
2006-05-15 06:11:01 UTC
Created attachment 8115
Program to reproduce the regression
Just compile and run.
Run the test program 100 times to make sure that you catch bad kernels - a bad
kernel *may* execute one run of the program in about a second, but *most* runs
will take about a minute. Good kernels should finish a run of the program in
about one second.
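
The "run it 100 times" advice above could be scripted roughly as follows. This is only a sketch, not part of the attached reproducer: the helper names `time_runs` and `count_slow`, the binary path `./nfsbench`, and the 10-second cut-off for a "slow" run are all assumptions made for illustration.

```python
import subprocess
import time


def time_runs(cmd, runs=100):
    """Run cmd `runs` times; return the wall-clock duration of each run in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)  # raises CalledProcessError if the program fails
        durations.append(time.perf_counter() - start)
    return durations


def count_slow(durations, threshold=10.0):
    """Count runs slower than `threshold` seconds (the cut-off is an assumed value)."""
    return sum(1 for d in durations if d > threshold)


# Example use, assuming the compiled reproducer is ./nfsbench (hypothetical path)
# and the current directory is on the NFS mount under test:
#
#   durations = time_runs(["./nfsbench"])
#   print(len(durations), "runs,", count_slow(durations), "slow,",
#         "max %.1fs" % max(durations))
```

On a good kernel every run should take about a second, so `count_slow` would be expected to return 0. On an affected kernel most runs should exceed the threshold.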
Created attachment 8116
Patch for 2.6.16

Patch from Trond, adapted by ksm@evalesco.com for 2.6.16

I can still reproduce this problem on Linux 2.6.17-rc6

Created attachment 8266
Original patch from Trond, works with 2.6.17-rc6

Already queued up for 2.6.18. Will not fix for 2.6.17 since it is not a critical bug.

Today I did some testing on NFS-related problems in kernels > 2.6.14. I'd like to point out that the patch against 2.6.17(.9) works in our environment.

However, there's a slight twist that might interest you: it appears that the bad cache behaviour in the NFS client _only_ happens when the NFS server is very busy / slow.

When testing (with an unpatched 2.6.17 kernel) against an idle server, all is well. One run of the nfsbench program takes about one second and generates about 1000 RPC calls.

Testing against a busy Linux server, with filesystems exported as "async", also all is well: 1 second, 1000 RPCs.

But testing against a busy, slow server (filesystems exported as "sync"), all is not well: 10 seconds, 10000 RPCs.

My conclusion is that caching fails when the NFS server is slow in responding. This could lead to a downward spiral: NFS server slow -> caching fails -> more RPCs -> NFS server even slower -> ... -> crunch!

I'm pretty sure I've seen this happening on our production systems running 2.6.15 with mysql instances on NFS shares (although I didn't realize it at the time). Our response then was to go back to 2.6.14. We're still running 2.6.14 on those systems.

UDP mount? If so, then that is quite expected. Don't use UDP in situations where you have frequent congestion issues. TCP mounts can also see poor performance if you have too few nfsd threads, since the Linux NFS server limits the number of allowed TCP connections. The limit is proportional to the number of threads.

The patch was merged into 2.6.18-rc1, so I'm closing the bug.
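
The RPC counts quoted in the thread (about 1000 calls per run against a responsive server, about 10000 against a slow one) can be checked by sampling the client-side RPC counter before and after one run. The sketch below assumes the counter is read from `/proc/net/rpc/nfs`, where the line starting with `rpc` carries the total call count as its first number. The function names are made up for this example, and the counter covers all NFS mounts on the client, so other NFS traffic would inflate the result.

```python
def parse_rpc_calls(text):
    """Return the total RPC call count from the contents of /proc/net/rpc/nfs."""
    for line in text.splitlines():
        fields = line.split()
        # The "rpc" line reads: rpc <calls> <retransmissions> <authrefresh>
        if fields and fields[0] == "rpc":
            return int(fields[1])
    raise ValueError("no 'rpc' line found")


def read_rpc_calls(path="/proc/net/rpc/nfs"):
    """Read the current client-wide RPC call count (assumes an NFS client is loaded)."""
    with open(path) as f:
        return parse_rpc_calls(f.read())


# Example use around a single run of the reproducer:
#
#   before = read_rpc_calls()
#   ...run the test program once...
#   print("RPC calls for this run:", read_rpc_calls() - before)
```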