Most recent kernel where this bug did not occur:
Distribution: RedHat AS 4
Hardware Environment: Dell PowerEdge 6850, quad 3.2 GHz HT, 8 GB RAM, 5 SCSI disks (146 GB, 15k RPM)
Software Environment: RH AS 4 with kernel 2.6.17.11

Problem Description:

Several of our processes hang in "D" (disk wait) state forever. Once they are in that state we cannot strace, gdb, or pstack them to find out what they were doing or where they were.

It is a race condition between two processes renaming directories on an NFS volume. Assume the directory structure "/nfs/a/b/c", where "/nfs" is mounted on an NFS volume. Process 1 tries to rename "/nfs/a/b/c" to "/nfs/a/d" at the same time that process 2 (on a second server) tries to rename "/nfs/a/b" to "/nfs/a/d". Process 1 may hang. The error also occurs when both processes run on the same server.

For more detail see:
http://marc.theaimsgroup.com/?l=linux-kernel&m=115868947321633&w=2
http://marc.theaimsgroup.com/?l=linux-kernel&m=115876518005796&w=2

Steps to reproduce:

/**********************************
 renamedir

 This program can be used to reproduce the deadlock in
 vfs_rename_dir, kernel 2.6.17

 Instructions:
 1. Mount an NFS volume on two servers (e.g. /nfs)
 2. Create a directory (e.g. testdir) in this volume
 3. On the first server, run 'renamedir -p /nfs/testdir'
 4. On the second server, run 'renamedir -p /nfs/testdir -c'

 Wait for a while. The process on the first server should
 deadlock in the VFS and get stuck in D state.

 Fernando Soto - f.soto () terra ! com ! br - 20/Sep/2006
 Terra Networks Brasil S/A
*********************************/
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>   /* strerror() */
#include <errno.h>

void usage(void)
{
    fprintf(stderr,"renamedir -p <path> [-c]\n");
    exit(1);
}

int main(int argc, char *argv[])
{
    int test_case = 0;
    char source[FILENAME_MAX];
    char target[FILENAME_MAX];
    const char *p1 = NULL;
    int i;

    // read options
    while ((i = getopt(argc, argv, "p:c")) != -1) {
        switch (i) {
        case 'c':
            test_case = 1;
            break;
        case 'p':
            p1 = optarg;
            break;
        default:
            // unknown option or missing argument
            usage();
        }
    }
    if (!p1) {
        usage();
    }

    // create test environment: <path>/b, <path>/b/e, <path>/b/c, <path>/b/c/f
    snprintf(target,sizeof(target),"%s/b",p1);
    if (mkdir(target,0755) == -1 && errno != EEXIST) {
        fprintf(stderr,"Could not create dir %s: %s\n",target,strerror(errno));
        exit(1);
    }
    snprintf(target,sizeof(target),"%s/b/e",p1);
    if (mkdir(target,0755) == -1 && errno != EEXIST) {
        fprintf(stderr,"Could not create dir %s: %s\n",target,strerror(errno));
        exit(1);
    }
    snprintf(target,sizeof(target),"%s/b/c",p1);
    if (mkdir(target,0755) == -1 && errno != EEXIST) {
        fprintf(stderr,"Could not create dir %s: %s\n",target,strerror(errno));
        exit(1);
    }
    snprintf(target,sizeof(target),"%s/b/c/f",p1);
    if (mkdir(target,0755) == -1 && errno != EEXIST) {
        fprintf(stderr,"Could not create dir %s: %s\n",target,strerror(errno));
        exit(1);
    }

    // prepare test cases
    if (test_case) {
        // second server: rename <path>/b <-> <path>/d
        snprintf(source,sizeof(source),"%s/b",p1);
        snprintf(target,sizeof(target),"%s/d",p1);
    } else {
        // first server: rename <path>/b/c <-> <path>/d
        snprintf(source,sizeof(source),"%s/b/c",p1);
        snprintf(target,sizeof(target),"%s/d",p1);
    }

    // test loop: rename back and forth forever; errors are expected and ignored
    while (1) {
        rename(source,target);
        rename(target,source);
    }
}
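Since the hung process cannot be traced, one way to confirm that the reproducer has wedged is to read its state letter from /proc. The following is only a minimal sketch, assuming the usual "pid (comm) state ..." layout of /proc/<pid>/stat; the helper names stat_line_state() and proc_state() are made up for this example and are not part of the reproducer.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Extract the state letter from one /proc/<pid>/stat line.
 * Assumed layout: "pid (comm) state ...". The comm field may itself
 * contain spaces or parentheses, so look for the LAST ')' in the line.
 * Returns the state character, or 0 if the line does not match. */
char stat_line_state(const char *line)
{
    const char *close = strrchr(line, ')');

    if (close == NULL || close[1] != ' ' || close[2] == '\0')
        return 0;
    return close[2];
}

/* Read /proc/<pid>/stat and return the state letter ('D' means
 * uninterruptible sleep), or 0 if the file cannot be read. */
char proc_state(pid_t pid)
{
    char path[64];
    char line[1024];
    FILE *fp;

    snprintf(path, sizeof(path), "/proc/%ld/stat", (long)pid);
    fp = fopen(path, "r");
    if (fp == NULL)
        return 0;
    if (fgets(line, sizeof(line), fp) == NULL) {
        fclose(fp);
        return 0;
    }
    fclose(fp);
    return stat_line_state(line);
}
```

Calling proc_state() on the PID of the renamedir process every few seconds and seeing 'D' on every sample would be consistent with the hang described above; a single 'D' sample is not conclusive, because a healthy process also passes through that state briefly during NFS I/O.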
The problem only occurs on an NFS file system. We tried to reproduce it on a local file system (ext3), but the problem does not occur there.

Regards,
Ranieri
You can reproduce this more reliably by doing something like the following:

---------------------------------------------------
On client 1                  On client 2
-----------                  -----------
mkdir foo foo/bar foo/baz
cd foo/bar
                             mv foo/bar .
                             mv foo bar
mv ../baz foo/bag

--- deadlocks on rename semaphore in vfs_rename() ---
---------------------------------------------------

Basically, we need to prevent directories from being instantiated twice in the same namespace.
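To make the invariant concrete, here is a toy userspace model of "a directory inode may appear only once in the namespace". It is a sketch under stated assumptions, not kernel code: struct toy_dentry, toy_find_alias() and toy_splice_alias() are hypothetical names, and returning -ELOOP when the existing entry is an ancestor of the lookup directory is this sketch's choice of error handling.

```c
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for a cached directory entry (hypothetical type). */
struct toy_dentry {
    struct toy_dentry *parent;  /* NULL for the root */
    unsigned long ino;          /* inode number this entry points at */
};

/* Find an already cached entry for directory inode 'ino', or NULL. */
struct toy_dentry *toy_find_alias(struct toy_dentry **cache, size_t n,
                                  unsigned long ino)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (cache[i]->ino == ino)
            return cache[i];
    return NULL;
}

/* Return 1 if 'anc' is 'd' itself or one of its ancestors. */
int toy_is_ancestor(const struct toy_dentry *anc, const struct toy_dentry *d)
{
    for (; d != NULL; d = d->parent)
        if (d == anc)
            return 1;
    return 0;
}

/* A lookup inside 'dir' returned an inode that already has the cached
 * entry 'alias'. Rather than creating a second entry for the same
 * directory, move the existing one under 'dir'. If 'alias' is 'dir' or
 * one of its ancestors, the move would create a loop, so refuse. */
int toy_splice_alias(struct toy_dentry *alias, struct toy_dentry *dir)
{
    if (toy_is_ancestor(alias, dir))
        return -ELOOP;
    alias->parent = dir;
    return 0;
}
```

In the two-client sequence above, client 1 still has foo cached as the parent of its working directory bar, while on the server foo now lives inside bar. When client 1 looks up "foo" from inside bar, the model finds the cached foo entry as an alias and refuses the lookup, so the same directory never shows up at two places that a later rename would then try to lock.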
Should be fixed in mainline now. See http://kernel.org/git/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=9eaef27b36a6b716384948da94b8fc5bfba7b712