Bug 9566
Summary: | NFS client 10x performance regression with lseek/read/write mix | ||
---|---|---|---|
Product: | File System | Reporter: | Olivier Paquet (olivier.paquet) |
Component: | NFS | Assignee: | Trond Myklebust (trondmy) |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | CC: | akpm, bfields, drepper, ismail, neilb, nickpiggin, randy.dunlap |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.23 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: |
simple test case
output of tcpdump when writing a 1 MB file NFS: Fix a read performance regression |
Description
Olivier Paquet
2007-12-14 14:58:14 UTC
Created attachment 14044 [details]
simple test case
Do you have a 'strace' that demonstrates what glibc is doing? Also a tcpdump might be useful (binary). Something like tcpdump -s 9000 -w dump.out host my.server and port 2049 should do the trick. The attached program contains both a version which goes through glibc and one with syscalls which I wrote from the strace. The glibc strace looks like: open("slow_nfs_test_file", O_RDWR|O_CREAT|O_TRUNC, 0666) = 3 fstat64(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 mmap2(NULL, 32768, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb 7f55000 _llseek(3, 0, [0], SEEK_SET) = 0 write(3, "\234<\211\277\226:\366\267\3\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2560) = 25 60 _llseek(3, 0, [0], SEEK_SET) = 0 read(3, "\234<\211\277\226:\366\267\3\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 32768) = 25 60 write(3, "\234<\211\277\226:\366\267\3\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2560) = 25 60 _llseek(3, 0, [0], SEEK_SET) = 0 read(3, "\234<\211\277\226:\366\267\3\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 32768) = 51 20 write(3, "\234<\211\277\226:\366\267\3\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2560) = 25 60 [...] _llseek(3, 1343488, [1343488], SEEK_SET) = 0 read(3, "\0\0\0\0\370\277\365\267\250\205\247\2774\200\4\10\210"..., 32768) = 8192 write(3, "|~\247\277\226\252\364\267\3\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2560) = 2560 _llseek(3, 1343488, [1343488], SEEK_SET) = 0 read(3, "\0\0\0\0\370\277\365\267\250\205\247\2774\200\4\10\210"..., 32768) = 10752 write(3, "|~\247\277\226\252\364\267\3\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2560) = 2560 _llseek(3, 1343488, [1343488], SEEK_SET) = 0 read(3, "\0\0\0\0\370\277\365\267\250\205\247\2774\200\4\10\210"..., 32768) = 13312 write(3, "|~\247\277\226\252\364\267\3\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2560) = 2560 [...] I'm not quite sure what's up with there being so many writes to the same location (don't remember that) but it does not matter for reproducing the problem. The simplified version has the following strace and the same slowdown: open("slow_nfs_test_file", O_RDWR|O_CREAT|O_TRUNC, 0666) = 3 lseek(3, 0, SEEK_SET) = 0 read(3, "", 32768) = 0 write(3, "\274\367\272\277\226J\362\267\3\0\0\0\0\0\0\0\0\0\0\0\0"..., 2560) = 2 560 lseek(3, 2560, SEEK_SET) = 2560 read(3, "", 32768) = 0 write(3, "\274\367\272\277\226J\362\267\3\0\0\0\0\0\0\0\0\0\0\0\0"..., 2560) = 2 560 lseek(3, 5120, SEEK_SET) = 5120 read(3, "", 32768) = 0 write(3, "\274\367\272\277\226J\362\267\3\0\0\0\0\0\0\0\0\0\0\0\0"..., 2560) = 2 560 [... all the same but with incrementing offsets until the end] Created attachment 14047 [details]
output of tcpdump when writing a 1 MB file
That is a really _sucky_ implementation of buffered i/o. I fail to see what that read() is supposed to be good for. That needs to be filed as a glibc bug. Looking at what it is doing, I'll bet that the real reason for the regression is once again commit 44dd151d5c21234cc534c47d7382f5c28c3143cd (NFS: Don't mark a written page as uptodate until it is on disk). I'm adding a patch that should fix this particular regression, but really this should be fixed on the glibc side. Created attachment 14048 [details]
NFS: Fix a read performance regression
Fix the regression by setting the page as uptodate if we're appending to
the file.
Reply-To: akpm@linux-foundation.org On Sat, 15 Dec 2007 09:59:44 -0800 (PST) bugme-daemon@bugzilla.kernel.org wrote: > That is a really _sucky_ implementation of buffered i/o. I fail to see what > that > read() is supposed to be good for. That needs to be filed as a glibc bug. Ulrich, are your ears burning? The patch works ok here. Thanks. The glibc involved is 2.3.6 and is already 2 years old. I tried building 2.7 to see if it has the same problem but had no luck getting it to work. I'll file a glibc bug if I can't verify that they've fixed this already by some other means. (In reply to comment #7) > Ulrich, are your ears burning? The stdio implementation is unfortunately littered with idiosyncrasies due to backward compatibility. These make it hard to impossible to make fundamental changes. In this case I don't see much of a problem. The implementation cannot make assumptions about the file content. The previously known file size might be wrong at the time when the next buffer has to be read from disk. Therefore the reads should all be necessary. While I appreciate that backward compatibility may be a problem, I still don't fully understand your point. AFAICS the application isn't actually requesting a fread(). It is rather just performing a series of fseek() and fwrite() operations. While I can understand that the buffer contents may need to be revalidated should the client issue a fread() request, that is not the case here. I'm therefore surprised that glibc is inserting these read() syscalls which would appear to serve no useful purpose as far as the example application is concerned. Ultimately, this problem was fixed by commit efc91ed0191e3fc62bb1c556ac93fc4e661214d2 (NFS: Optimise append writes with holes). Marking as resolved. |