Bug 28502

Summary: nfsclient does not fall back to v2 synchronous mode when using O_SYNC
Product: File System Reporter: Stefan Bader (stefan.bader)
Component: NFS Assignee: Trond Myklebust (trondmy)
Severity: normal CC: joseph.salisbury
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.32 to 2.6.38-rc3 Tree: Mainline
Regression: No

Description Stefan Bader 2011-02-07 15:38:18 UTC

Used the following in /etc/exports:
/nfs_export (rw,no_root_squash,sync,no_wdelay,no_subtree_check)

1. Mount the NFS export on the client:
sudo mount server:/nfs_export /nfs_mount

2. Start collecting tcpdump data on client and server.

3. Perform a simple dd to cause an NFS write (using oflag=sync):
strace -o /tmp/strace.dd.joe.out dd if=/dev/zero of=/nfs_mount/syncfile bs=1k count=1 oflag=sync

4. Review the tcpdump data and note that the client does not issue any "nfs_file_sync" write requests.

The oflag=sync option causes dd to open the file with O_SYNC. From the descriptions of this flag, it sounds as though the client should switch from the newer UNSTABLE/COMMIT mode back to FILE_SYNC mode. Looking at the code, however, it seems FILE_SYNC would only be used on retries or on writeback writes for reclaim.

So the question here is: is the assumption wrong, or the implementation?
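
For anyone who wants to reproduce this without dd, here is a minimal sketch of what the test boils down to at the syscall level. It assumes dd opens its output with O_WRONLY|O_CREAT|O_TRUNC and adds O_SYNC for oflag=sync; the function name write_sync_blocks and its parameters are made up for illustration and are not part of any existing tool.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `count` zero-filled blocks of `block_size`
 * bytes to `path`, with the file opened O_SYNC (as dd oflag=sync is
 * assumed to do). Returns the total number of bytes written, or -1 on
 * any error.
 */
long write_sync_blocks(const char *path, size_t block_size, int count)
{
    char *buf = calloc(1, block_size);   /* zero-filled, like /dev/zero */
    if (buf == NULL)
        return -1;

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0644);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    long total = 0;
    for (int i = 0; i < count; i++) {
        size_t done = 0;
        /* Loop in case write() accepts only part of the block. */
        while (done < block_size) {
            ssize_t n = write(fd, buf + done, block_size - done);
            if (n <= 0) {
                close(fd);
                free(buf);
                return -1;
            }
            done += (size_t)n;
        }
        total += (long)done;
    }

    int rc = close(fd);
    free(buf);
    return rc == 0 ? total : -1;
}
```

Calling write_sync_blocks("/nfs_mount/syncfile", 1024, 1) from a small main() should correspond to the dd invocation in step 3. With O_SYNC set, each write() is expected to return only after the data has reached the server's stable storage; how the client achieves that on the wire (FILE_SYNC versus UNSTABLE plus COMMIT) is exactly what the tcpdump is meant to show.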
Comment 1 Stefan Bader 2011-02-08 13:46:19 UTC
To add a little detail here: I slightly modified the test case to write out 100 blocks in sequence. The strace on both kernel versions (2.6.32 based and 2.6.38 based) shows only reads and writes (no fsync or sync). The tcpdump, however, shows, again in both cases, that every write (with the UNSTABLE flag) is followed by a COMMIT.

This suggests to me that, from a data integrity point of view, the result is what one would expect: every write is waited for before continuing with the next write.

The only downside I can see is that, instead of one write request with the FILE_SYNC flag, this requires two requests for each write, which seems a bit of a waste. It also seems unexpected compared to the documentation of O_SYNC I found at
http://www.faqs.org/docs/Linux-HOWTO/NFS-HOWTO.html#MOUNTOPTIONS in section 5.9.
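
To put a number on the overhead, here is a rough sketch of the request count under the behaviour seen in the traces. The enum values are the NFSv3 stable_how constants from RFC 1813; the function rpcs_for_sync_writes is a made-up name, and it assumes every UNSTABLE write is followed by its own COMMIT while a stable write needs none.

```c
/* stable_how values for the NFSv3 WRITE call, as defined in RFC 1813. */
enum stable_how { UNSTABLE = 0, DATA_SYNC = 1, FILE_SYNC = 2 };

/*
 * Hypothetical model: number of RPC requests needed for `nwrites`
 * O_SYNC writes. Assumes an UNSTABLE write costs WRITE + COMMIT (two
 * requests), while DATA_SYNC and FILE_SYNC writes cost one request.
 */
long rpcs_for_sync_writes(long nwrites, enum stable_how how)
{
    long per_write = (how == UNSTABLE) ? 2 : 1;
    return nwrites * per_write;
}
```

For the 100-block test, this model gives 200 requests with UNSTABLE plus COMMIT against 100 with FILE_SYNC, so the client pays one extra round trip per write.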
Comment 2 Trond Myklebust 2011-04-15 21:30:06 UTC
This should be fixed in the upstream kernel with commit
b31268ac793fd300da66b9c28bbf0a200339ab96 (FS: Use stable writes when not
doing a bulk flush).

Please reopen this bug if the above commit doesn't fix the problem.