Bug 8557 - NFS: server error: fileid changed
Summary: NFS: server error: fileid changed
Status: CLOSED INVALID
Alias: None
Product: File System
Classification: Unclassified
Component: NFS
Hardware: i386 Linux
Importance: P2 normal
Assignee: Trond Myklebust
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-05-31 01:29 UTC by Dick Snippe
Modified: 2008-07-24 12:27 UTC

See Also:
Kernel Version: 2.6.20
Subsystem:
Regression: ---
Bisected commit-id:



Description Dick Snippe 2007-05-31 01:29:52 UTC
Most recent kernel where this bug did *NOT* occur: 2.6.18
Distribution: FC6 + vanilla 2.6.20 kernel
Hardware Environment: Intel i386 NFS-client, Netapp NFS-server
Software Environment: 
Problem Description:

On 2.6.19+ kernels we see these entries in our logs:

NFS: server jookiba-20 error: fileid changed
fsid 0:f: expected fileid 0x55b7f5, got 0x477663
NFS: server jookiba-20 error: fileid changed
fsid 0:f: expected fileid 0x55b7f5, got 0x2afea7
NFS: server jookiba-20 error: fileid changed
fsid 0:f: expected fileid 0x55b7f5, got 0x291078
(etc. etc.)
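
To see at a glance which expected fileid keeps being reported, the log entries above can be grouped per fsid and expected fileid. This is only a minimal sketch: the regular expression is inferred from the sample lines shown here, and the function name summarize_fileid_errors is made up for illustration.

```python
import re
from collections import defaultdict

# Matches the second line of each two-line message, as seen in the sample above.
# The pattern is an assumption derived from those samples, not from kernel source.
DETAIL_RE = re.compile(
    r"fsid (?P<fsid>\S+): expected fileid (?P<expected>0x[0-9a-fA-F]+), "
    r"got (?P<got>0x[0-9a-fA-F]+)"
)


def summarize_fileid_errors(lines):
    """Map (fsid, expected fileid) to the list of fileids the client got instead."""
    summary = defaultdict(list)
    for line in lines:
        match = DETAIL_RE.search(line)
        if match is None:
            continue  # e.g. the "NFS: server ... error: fileid changed" line
        key = (match.group("fsid"), int(match.group("expected"), 16))
        summary[key].append(int(match.group("got"), 16))
    return dict(summary)


sample_log = [
    "NFS: server jookiba-20 error: fileid changed",
    "fsid 0:f: expected fileid 0x55b7f5, got 0x477663",
    "NFS: server jookiba-20 error: fileid changed",
    "fsid 0:f: expected fileid 0x55b7f5, got 0x2afea7",
    "NFS: server jookiba-20 error: fileid changed",
    "fsid 0:f: expected fileid 0x55b7f5, got 0x291078",
]

for (fsid, expected), got in summarize_fileid_errors(sample_log).items():
    print(fsid, hex(expected), [hex(g) for g in got])
```

Feeding it the real log (for example the lines of /var/log/messages) instead of sample_log should work the same way, provided the message format matches.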

I traced the fileid 0x55b7f5 back to a directory. However, the directory seems
perfectly normal (on this server it's on a read-only mounted filesystem and the
last change on this directory was over a month ago).
I could also trace the "got 0xXXXXX" fileids back; these are totally unrelated.
They are on other mounts/qtrees and point to data that is not referenced on this
particular server.
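
One way to do this kind of tracing is sketched below, assuming the fileid in the log is the inode number in hexadecimal (0x55b7f5 is 5617653 in decimal, which is the inode that stat reports for the directory in question). The helper names fileid_to_inode and find_by_inode are hypothetical, and walking a large NFS tree this way can be slow.

```python
import os


def fileid_to_inode(fileid_hex):
    """Convert a fileid as printed in the log (e.g. '0x55b7f5') to a decimal inode number."""
    return int(fileid_hex, 16)


def find_by_inode(root, inode):
    """Return paths under root with the given inode number, without leaving root's filesystem."""
    root_stat = os.lstat(root)
    matches = [root] if root_stat.st_ino == inode else []
    for dirpath, dirnames, filenames in os.walk(root):
        same_fs_dirs = []
        for name in list(dirnames) + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # unreadable entry or stale handle: skip it
            if st.st_dev != root_stat.st_dev:
                continue  # another filesystem is mounted here
            if st.st_ino == inode:
                matches.append(path)
            if name in dirnames:
                same_fs_dirs.append(name)
        dirnames[:] = same_fs_dirs  # prune so os.walk does not cross mount points
    return matches


print(fileid_to_inode("0x55b7f5"))
```

A call such as find_by_inode("/mnt/ro", fileid_to_inode("0x55b7f5")) would then list candidate paths; the mount point here is just the example path used later in this report.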

Steps to reproduce:
1) run a 2.6.19+ kernel
2) mount a number of filesystems with the same fsid
(this is typically done by creating a number of qtrees on a netapp fileserver
and mounting the qtrees individually. Since they are on the same volume, the
netapp will assign the same fsid to the exports and in 2.6.19+ the kernel will
group all superblocks into one superblock)
3) generate activity on the mounted volumes
4) wait
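
To check whether step 2 actually applies on a client, the following sketch groups NFS mount points by device number. It assumes that mounts sharing one superblock report the same st_dev, and that the "0:f" in the log is that device's major:minor in hex (stat shows "Device: fh/15d" for the affected directory, which fits). The function names are invented for this example.

```python
import os
from collections import defaultdict


def nfs_mount_points(mounts_text):
    """Extract NFS mount points from /proc/mounts-style text.

    Note: /proc/mounts escapes spaces in paths as \\040; this sketch does not decode them.
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
            points.append(fields[1])
    return points


def group_by_device(mount_points, stat=os.stat):
    """Return {(major, minor): [mount points]} for devices shared by more than one mount."""
    groups = defaultdict(list)
    for point in mount_points:
        dev = stat(point).st_dev
        groups[(os.major(dev), os.minor(dev))].append(point)
    return {dev: pts for dev, pts in groups.items() if len(pts) > 1}


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as handle:
        shared = group_by_device(nfs_mount_points(handle.read()))
    for (major, minor), points in shared.items():
        print("%x:%x" % (major, minor), points)
```

Any group printed with more than one mount point would be a candidate for the merged-superblock situation described in step 2.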

When I cd into this directory and do a "stat .", sometimes I get regular stat output:
$ stat .
  File: `.'
  Size: 4096            Blocks: 8          IO Block: 32768  directory
Device: fh/15d  Inode: 5617653     Links: 2
Access: (2775/drwxrwsr-x)  Uid: ( 3005/ bnn01ap)   Gid: (20041/ cwbnn01)
Access: 2007-05-31 10:21:48.795174000 +0200
Modify: 2007-04-27 11:02:27.259644000 +0200
Change: 2007-04-27 11:02:27.259644000 +0200

and sometimes I get:
$ stat .
stat: cannot stat `.': Stale NFS file handle
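
Since the failure is intermittent, it can help to measure how often it happens. The sketch below repeats the stat call and counts ESTALE ("Stale NFS file handle") results; count_stale is a hypothetical helper, and the client's attribute cache may hide some failures when calls follow each other quickly.

```python
import errno
import os


def count_stale(path, attempts, stat=os.stat):
    """Call stat on path repeatedly; return (successes, stale handle errors)."""
    ok = stale = 0
    for _ in range(attempts):
        try:
            stat(path)
            ok += 1
        except OSError as exc:
            if exc.errno != errno.ESTALE:
                raise  # some other problem: do not hide it
            stale += 1
    return ok, stale


if __name__ == "__main__":
    print(count_stale(".", 100))
```

Run from inside the affected directory, this gives a rough ratio of good to stale results for the same path.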
Comment 1 Dick Snippe 2007-05-31 03:48:08 UTC
Actually it's easier to reproduce the problem:

1 + 2) Same as above. Just make sure that you have ro and rw mounted filesystems
on the same fsid and that the rw-mounted filesystems "win" (i.e. the merged
superblock sees all NFS filesystems as rw)
3) Make sure that /mnt/ro is a read-only exported filesystem and that
/mnt/ro/foodir exists, is a directory and is writable by the user
4) $ while :; do mkdir /mnt/ro/foodir/subdir; done

This triggers the "fileid changed" behaviour
Comment 2 Trond Myklebust 2007-05-31 05:16:14 UTC
This is due to a known bug in the netapp filers. Under certain circumstances they
can return uninitialised post-op attributes.

I believe that a fix is supposed to be forthcoming in OnTap.
Comment 3 Trond Myklebust 2008-07-24 12:26:51 UTC
To be more precise, the fix is available in OnTap 7.2.3 and later.

Note that the bug only appears on read-only exports, so one workaround
is to simply export the volume as read-write from the filers, and then
mount it as read-only on the clients.
