Most recent kernel where this bug did *NOT* occur: 2.6.18
Distribution: FC6 + vanilla 2.6.20 kernel
Hardware Environment: Intel i386 NFS client, NetApp NFS server
Software Environment:
Problem Description:

On 2.6.19+ kernels we see these entries in our logs:

  NFS: server jookiba-20 error: fileid changed fsid 0:f: expected fileid 0x55b7f5, got 0x477663
  NFS: server jookiba-20 error: fileid changed fsid 0:f: expected fileid 0x55b7f5, got 0x2afea7
  NFS: server jookiba-20 error: fileid changed fsid 0:f: expected fileid 0x55b7f5, got 0x291078
  (etc.)

I traced the fileid 0x55b7f5 back to a directory. However, the directory seems perfectly normal: on this server it is on a read-only mounted filesystem, and the last change to this directory was over a month ago.

I could also trace the "got 0xXXXXX" fileids back; they are completely unrelated. They belong to other mounts/qtrees and point to data that is not referenced on this particular server.

Steps to reproduce:
1) Run a 2.6.19+ kernel.
2) Mount a number of filesystems with the same fsid. This is typically done by creating several qtrees on a NetApp file server and mounting each qtree individually. Since the qtrees are on the same volume, the NetApp assigns the same fsid to all of the exports, and 2.6.19+ kernels group them all under a single superblock.
3) Generate activity on the mounted volumes.
4) Wait.

When I cd into this directory and run "stat .", sometimes I get regular stat output:

  $ stat .
    File: `.'
    Size: 4096        Blocks: 8          IO Block: 32768  directory
  Device: fh/15d      Inode: 5617653     Links: 2
  Access: (2775/drwxrwsr-x)  Uid: ( 3005/ bnn01ap)   Gid: (20041/ cwbnn01)
  Access: 2007-05-31 10:21:48.795174000 +0200
  Modify: 2007-04-27 11:02:27.259644000 +0200
  Change: 2007-04-27 11:02:27.259644000 +0200

and sometimes I get:

  $ stat .
  stat: cannot stat `.': Stale NFS file handle
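The "fileid changed" messages print fileids in hexadecimal, while stat and find work with decimal inode numbers: 0x55b7f5 is 5617653, the Inode value in the stat output above. The helpers below are a minimal sketch for doing that conversion in bulk. fileid_to_inum and parse_fileid_errors are hypothetical names, not part of any existing tool, and the sketch assumes a shell whose printf accepts 0x-prefixed arguments (bash and dash both do).

```shell
#!/bin/sh
# Hypothetical helpers (not part of any existing tool) for tracing the
# fileids in "fileid changed" messages back to files.
# The kernel message shows fileids in hex; stat(1) and find(1) use decimal.

# fileid_to_inum HEX: print the decimal value of a 0x-prefixed fileid.
fileid_to_inum() {
    printf '%d\n' "$1"
}

# parse_fileid_errors: read kernel log lines on stdin and print
# "expected=<dec> got=<dec>" for each "fileid changed" message.
# Lines without an "expected fileid ..., got ..." pair are ignored.
parse_fileid_errors() {
    sed -n 's/.*expected fileid \(0x[0-9a-fA-F]*\), got \(0x[0-9a-fA-F]*\).*/\1 \2/p' |
        while read -r expected got; do
            printf 'expected=%d got=%d\n' "$expected" "$got"
        done
}
```

For example, `dmesg | parse_fileid_errors` lists the decimal values, and `find /mnt/ro -inum 5617653` can then locate the directory. In this report the inode number shown by stat is exactly the fileid, so the two can be compared directly.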
Actually, there is an easier way to reproduce the problem:

1+2) Same as above. Just make sure that you have ro- and rw-mounted filesystems on the same fsid and that the rw-mounted filesystems "win" (i.e. the merged superblock sees all NFS filesystems as rw).
3) Make sure that /mnt/ro is a read-only exported filesystem and that /mnt/ro/foodir exists, is a directory, and is writable by the user.
4) $ while :; do mkdir /mnt/ro/foodir/subdir; done

This triggers the "fileid changed" behaviour.
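The loop in step 4 runs until interrupted. A bounded variant, sketched here as a hypothetical shell function (repro_mkdir is not an existing tool), makes a fixed number of attempts and reports how many failed, so the kernel log can be checked afterwards. It assumes the /mnt/ro/foodir setup from step 3.

```shell
#!/bin/sh
# Hypothetical helper: a bounded version of the endless reproducer loop.
# repro_mkdir PARENT COUNT tries "mkdir PARENT/subdir" COUNT times.
# On a read-only export every attempt is expected to fail; failures are
# counted instead of stopping the loop.
repro_mkdir() {
    parent=$1
    count=$2
    failed=0
    i=0
    while [ "$i" -lt "$count" ]; do
        if ! mkdir "$parent/subdir" 2>/dev/null; then
            failed=$((failed + 1))
        fi
        i=$((i + 1))
    done
    echo "attempts=$count failed=$failed"
}
```

A possible run: `repro_mkdir /mnt/ro/foodir 1000` followed by `dmesg | grep -c 'fileid changed'` to count how many of the error messages appeared.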
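This is due to a known bug in the NetApp filers. Under certain circumstances they can return uninitialised post-op attributes. The client compares the fileid in those attributes with the one it has cached for the inode, so garbage attributes show up as "fileid changed" errors. I believe that a fix is supposed to be forthcoming in ONTAP.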
To be more precise, the fix is available in ONTAP 7.2.3 and later. Note that the bug only appears on read-only exports, so one workaround is simply to export the volume as read-write from the filers and then mount it read-only on the clients.
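On the client side, the workaround amounts to an ordinary read-only NFS mount of a qtree that the filer exports read-write. A sketch, where the filer name filer1 and the qtree path are hypothetical placeholders; the filer-side change to a read-write export is not shown.

```shell
# Hypothetical filer name and qtree path. The qtree is assumed to be
# exported read-write on the filer; the client mounts it read-only.
mount -t nfs -o ro filer1:/vol/vol1/qtree1 /mnt/ro

# Equivalent /etc/fstab entry (same hypothetical names):
# filer1:/vol/vol1/qtree1  /mnt/ro  nfs  ro  0 0
```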