Bug 38572
Summary: | New NFS: directory XXX contains a readdir loop seems to be triggered by well-behaving server | ||
---|---|---|---|
Product: | File System | Reporter: | Petr Vandrovec (petr) |
Component: | NFS | Assignee: | Trond Myklebust (trondmy) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | akpm, anna, difrost.kernel, florian, maciej.rutecki, rjw |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 3.0.0-rc4-00096 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 32012 | ||
Attachments: |
wireshark decoded trace
[PATCH] NFS: Fix spurious readdir cookie loop messages [PATCH v2] NFS: Fix spurious readdir cookie loop messages [PATCH v3] NFS: Fix spurious readdir cookie loop messages |
This is marked as a regression. From which kernel version? Thanks. Message is new since March. It was not present in 2.6.38, but it is present in 2.6.39 (and 3.0.0-rc). My testing with 2.6.38 kernel shows no message, and 'ls -U' does not report any duplicates. commit 8ef2ce3e16d9bec6cf015207c1c82a5b864046ac Author: Bryan Schumaker <bjschuma@netapp.com> Date: Wed Mar 23 15:04:31 2011 -0400 NFS: Detect loops in a readdir due to bad cookies Some filesystems (such as ext4) can return the same cookie value for multiple files. If we try to start a readdir with one of these cookies, the server will return the first file found with a cookie of the same value. This can cause the client to enter an infinite loop. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Hmm... So you are basically alternating between growing and shrinking the directory, while running 'ls'? I'll see if I can look into that over the vacation. Not really. One script creates and then deletes lots of files & directories, while other script list directory content. I've never hit problem when first script is in creation phase, only 'ls' during deletion phase generates duplicate warning. Originally I noticed problem by deleting directory content while watching progress with 'ls | wc -l' ... Created attachment 64402 [details]
[PATCH] NFS: Fix spurious readdir cookie loop messages
If the directory contents change, then we have to accept that the
file->f_pos value may shrink if we do a 'search-by-cookie'. In that
case, we should turn off the loop detection and let the NFS client
try to recover.
Does this patch help?
I'm afraid that for whatever reason it does not. [539613.027512] FS-Cache: Netfs 'nfs' registered for caching [539650.085051] NFS: directory petr/x contains a readdir loop. Please contact your server vendor. Offending cookie: 42528 [539652.812132] NFS: directory petr/x contains a readdir loop. Please contact your server vendor. Offending cookie: 54200 [539654.673139] NFS: directory petr/x contains a readdir loop. Please contact your server vendor. Offending cookie: 62156 [539655.846894] NFS: directory petr/x contains a readdir loop. Please contact your server vendor. Offending cookie: 66900 I guess I got same issue. Will try to trace cookies but looks like this is not ,,vendor'' problem. Created attachment 67042 [details]
[PATCH v2] NFS: Fix spurious readdir cookie loop messages
Bryan Schumaker reports that this slight variant on the original patch
appears to fix the spurious loop warnings in his testing.
Could you please check if it works for you.
Gah... Never mind, that one is buggy too. I'm working on a fix, and will have Bryan test before I ask anyone else to do so... Created attachment 67052 [details]
[PATCH v3] NFS: Fix spurious readdir cookie loop messages
OK. This one seems to have passed everything we have thrown at it so far.
In addition to the bug that was fixed by the v2 patch, it also fixes an
issue whereby the loop detection would kick in too early.
The patch seems to have been merged in v3.1-rc1: commit 0c0308066ca53fdf1423895f3a42838b67b3a5a8 Author: Trond Myklebust <Trond.Myklebust@netapp.com> Date: Sat Jul 30 12:45:35 2011 -0400 NFS: Fix spurious readdir cookie loop messages Trond looks like your patch is correct and fix the issue in my case. |
Created attachment 64052 [details] wireshark decoded trace When I run for a in `seq 1 1000`; do mkdir $a; touch $a-test; done for a in `seq 1 1000`; do rmdir $a; done for a in `seq 1 1000`; do rm $a-test; done in one terminal, and while true; do ls; done in the another, every now and then kernel says: Jun 30 11:38:36 petr-dev3 kernel: [65277.074773] NFS: directory petr/asd contains a readdir loop. Please contact your server vendor. Offending cookie: 41912 Jun 30 11:38:37 petr-dev3 kernel: [65277.520933] NFS: directory petr/asd contains a readdir loop. Please contact your server vendor. Offending cookie: 42904 First I talked to our IT, but after capturing packets I cannot see anything wrong on server's side. In wireshark capture I see that cookie 41912 was returned twice for the same readdir loop (server does not use cookieverf, it returns 0 always): petr-dev3:~# grep '\(cookie: 0\|Cookie: 41912\|EOF: 1\)' trace.txt4 cookie: 0 Cookie: 41912 Cookie: 41912 cookie: 0 Cookie: 41912 Cookie: 41912 But it did so because client issued very same readdir request twice, completely ignoring all data from first request: petr-dev3:~# grep '\(cookie:\|Cookie: 41912\|EOF: 1\)' trace.txt4 | grep -C2 41912 cookie: 40584 cookie: 41744 Cookie: 41912 cookie: 41744 Cookie: 41912 cookie: 42904 cookie: 44068 -- cookie: 40616 cookie: 41776 Cookie: 41912 cookie: 41776 Cookie: 41912 cookie: 42936 cookie: 44100 Textual decoded trace of readdirplus cycle which triggered kernel messages is attached. Non-decoded trace of full session is 80MB :-(