Most recent kernel where this bug did not occur: 2.4.18 (maybe 2.4 in general) Distribution: any (Debian, SuSE, ...) Hardware Environment: any Software Environment: coreutils-5.92 to CVS HEAD. ====================================================================== Problem Description: While NFS apparently never claimed to be fully POSIX-compliant (fcntl and lockf) there seems to be a regression in one point with 2.6: NFS updates file modification time on close() which clashes with the way the standrd "cp" command is implemented in coreutils-5.92 and newer. Apparently, POSIX does not allow updating the modification time on close: http://lists.gnu.org/archive/html/bug-coreutils/2005-10/msg00221.html Quoting from there: > > POSIX is pretty careful in specifying when st_atime, st_mtime, and > > st_ctime can be changed (see, e.g., readdir()). POSIX doesn't grant > > permission for close() to change these times, so close() shouldn't > > change the times. This is independent of whether utimes() was called. According to http://lists.gnu.org/archive/html/bug-coreutils/2005-12/msg00219.html the usage pattern of the new cp from coreutils works fine with Linux-2.4.18 and an NFS Server as server. The problem of updating modification times on close (but apparently only if there were buffers to be written) causes trouble in "cp" because it updates the modification time of the file before it closes the file. This is perfectly legal according to POSIX and worked fine in the above test with Linux-2.4 The workaround for "cp" would be to call fsync() before updating the modification time, then close() does not update it again, and cp -p, -a and --preserve=timestamps work. The problem has been analzed in the Debiang bugtracker and http://bugs.donarmstrong.com/cgi-bin/bugreport.cgi?bug=340236 The end of it is also recorded in bug-coreutils: http://lists.gnu.org/archive/html/bug-coreutils/2005-12/msg00219.html The outcome there was that it would be a kernel problem. I don't know if it's possible to make NFS behave as it did in the test with 2.4.18 and Solaris as Server, but I'm filing this to make the problem researchable which look into this bugzilla for an answer and to hopefully get an answer too. ================================================================= Steps to reproduce: Simple local check using 2.6.16-rc4 with a local NFS server: mkdir /tmp/coreutils-test cd /tmp/coreutils-test wget http://ftp.gnu.org/pub/gnu/coreutils/coreutils-5.94.tar.bz2 tar xvfj coreutils-5.94.tar.bz2 cd coreutils-5.94 ./configure make mkdir -p test chown nobody test echo "$PWD/test 127.0.0.1(rw,sync)" >>/etc/exports # start/reload the kernel nfsserver now mkdir mountpoint mount localhost:/space/test $PWD/mountpoint # cat /proc/mounts shows me these mount options: # rw,v3,rsize=32768,wsize=32768,hard,lock,proto=tcp,addr=localhost cp -p configure configure.new ls -l configure configure.new # (modification time idential) src/cp -p configure mountpoint # (expect an error: failed to preserve ownerships / Operation not permitted) ls -l configure mountpoint/configure # (modification time differ, while the same cp command works locally) umount $PWD/mountpoint # stop nfsserver/remove exports entry and reload it. Summary: Either NFS needs to set the modification time on the last write (and never on close) or cp needs a workaround to call fsync() before calling utime().
Further explanation: cp versions prior to 5.92 closed the destination file before triggering the utime system call to set the modification time of the destination file, while 5.92 to 5.94 and the current CVS close the file _*_after_*_ setting the modfication time/ That's a sample of how the newer versions do it (from cp -p): fd 4 is the destination file here: > open("~", O_WRONLY|O_TRUNC|O_LARGEFILE) = 4 > fstat64(4, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 > write(4, "foo\n", 4) = 4 > utimes("/proc/self/fd/4", {1134767841, 0}) = 0 > chmod("baz", 0100644) = 0 > close(4) = 0 It has also been mentioned: > 2. Timestamps of empty files are preserved, which reinforces the above > suspicion (no writes for empty file) So the cause of the update may indeed that some data is pending to be written slightly after the utimes call and in any case finally on close, but this should, according to POSIX not update the mtime/not be visible to the user.
I've a possible better workaround for cp (avoids fsync) now: It creates a copy of the destination fd, closes dest_desc and continues the remaining operations (utime/utimes/futimesat and fchmod with the copy: --- src/copy.c +++ src/copy.c @@ -420,6 +420,12 @@ timespec[0] = get_stat_atime (src_sb); timespec[1] = get_stat_mtime (src_sb); + int dest_desc2 = dup (dest_desc); + + close(dest_desc); /* needed for NFS with Linux-2.6 */ + + dest_desc = dest_desc2; + if (futimens (dest_desc, dst_name, timespec) != 0) { error (0, errno, _("preserving times for %s"), quote (dst_name)); copying /etc/passwd then looks like this: read(3, "root:x:0:0:root:/root:/bin/bash\n"..., 32768) = 1254 write(4, "root:x:0:0:root:/root:/bin/bash\n"..., 1254) = 1254 read(3, "", 32768) = 0 dup(4) = 5 close(4) = 0 utimes("/proc/self/fd/5", {1140803734, 0}) = 0 fchown(5, 0, 0) = 0 close(5) = 0 This avoids the need for fsync which waits until the data is written, which would have a negative impact on the performance of cp, expecially when copy huge numbers of small files locally. About using "/proc/self/fd/5" here: It's simply symlink, so nothing special, but also suboptimal I think since it creates an extra indifection which the kernel has to resolve internally. Not a big one of course, only a few function calls in the VFS layer. I think it would be better to use the futimesat() system call, but that's offtopic here, I'll report that to the coreutils people. Still it is a workaround only.
I found this, Paul Eggert of UCLA got this this approved by the Austin Group: > I asked the POSIX folks about this (see > <http://www.opengroup.org/austin/mailarchives/ag-review/msg01971.html>) > and the answer came back that the Cygwin behavior was indeed a bug; > see <http://www.opengroup.org/austin/mailarchives/ag/msg08912.html> > and look for "XBD ERN 61". ---------------------------------------------------------------------- XBD ERN 61 File times update Accept as marked Change XBD page 99 lines 3089-3091 (File Times Update) from: All fields that are marked for update shall be updated when the file ceases to be open by any process, or when a stat(), fstat(), or lstat() is performed on the file. to: All fields that are marked for update shall be updated when the file ceases to be open by any process or before a stat(), fstat(), lstat(), fsync(), utime(), or utimes() is successfully performed on the file. ---------------------------------------------------------------------- So the POSIX folks want to have the fields updated before the functions are called which makes sense, at least I, from a API user's viewpoint, would like to have defined, always consistant behaviour. I have also read that it works this way on OpenBSD 3.4 NFS client and a Solaris 8 NFS server in this post: http://lists.gnu.org/archive/html/bug-coreutils/2005-12/msg00219.html I don't know how utimes() is implemented in NFS, but if this is the way it's wanted then probably the implementation of utimes in NFS. I just looked at fs/open.c:do_utimes() which calls fs/attr.c:notify_change() which calls fs/nfs/inode.c:nfs_setattr(), which has this check: /* Write all dirty data if we're changing file permissions or size */ if ((attr->ia_valid & (ATTR_MODE|ATTR_UID|ATTR_GID|ATTR_SIZE)) != 0) { filemap_write_and_wait(inode->i_mapping); nfs_wb_all(inode); } Could we add ATTR_CTIME|ATTR_MTIME|ATTR_ATIME which are set by du_utimes() to this check to write all dirty data also if we are changing timestamps?
Created attachment 7472 [details] NFS: writes should not clobber utimes() calls Ensure that we flush out writes in the case when someone calls utimes() in order to set the file times.
Trond, what about getting this patch into 2.6.16?
I'm sceptical about doing that as it isn't really a critical regression: in fact, this is the first complaint despite the fact that the issue has existed for all(?) 2.6 kernels. Unless someone complains loudly, I'd therefore prefer to wait until 2.6.17.
Patch merged into mainline kernel prior to 2.6.17-rc1.