Most recent kernel where this bug did not occur: applies to all kernels
Distribution:
Hardware Environment: x86 and others
Software Environment:
Problem Description:

On Linux, it is not possible to modify the setting of the O_SYNC status flag using the fcntl(F_SETFL) operation (i.e., this flag can only be set during open(2)). However, all other Unix implementations that I have tested do allow this status flag to be modified using fcntl(F_SETFL). I have tested FreeBSD 6.0, Tru64 5.1, Solaris 8, and HP-UX 11. A test program is provided below.

My reading of the SUSv3 specification of fcntl(F_SETFL) also confirms that an application should be able to modify the O_SYNC setting using fcntl(F_SETFL), and thus Linux is non-conformant.

Cheers,

Michael

/* fcntl_O_SYNC.c */

#include <sys/types.h>
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <fcntl.h>

#define errExit(msg) do { perror(msg); exit(EXIT_FAILURE); \
                     } while (0)

#define usageErr(msg, progName) \
                     do { fprintf(stderr, "Usage: "); \
                          fprintf(stderr, msg, progName); \
                          exit(EXIT_FAILURE); } while (0)

int
main(int argc, char *argv[])
{
    int flags, fd;

    if (argc != 2 || strcmp(argv[1], "--help") == 0)
        usageErr("%s path\n", argv[0]);

    /* Open with O_SYNC set, then try to clear it via F_SETFL */
    fd = open(argv[1], O_RDWR | O_SYNC);
    if (fd == -1) errExit("open");

    flags = fcntl(fd, F_GETFL);
    if (flags == -1) errExit("fcntl");

    assert(flags & O_SYNC);

    if (fcntl(fd, F_SETFL, flags & ~O_SYNC) == -1) errExit("fcntl");

    /* Re-read the flags to see whether the change took effect */
    flags = fcntl(fd, F_GETFL);
    if (flags == -1) errExit("fcntl");

    if (flags & O_SYNC) {
        printf("O_SYNC was left unchanged (non-conformant)\n");
        exit(EXIT_FAILURE);
    } else {
        printf("O_SYNC was changed (conformant)\n");
        exit(EXIT_SUCCESS);
    }
} /* main */
It appears to be unspecified what occurs. Also rather tricky is the semantic question of what happens to queued I/O at the point you set the flag.
My advice from Geoff Clare at the Open Group is that the behavior is specified, and Linux doesn't conform. And as noted, every other system that I tested does support setting O_SYNC with fcntl().
Could you share his reasoning on this? It's not obvious to me from the spec.

Also, what do other systems do if you do

write(lots)
lseek(back a bit)
f_setfl(O_SYNC)
write(overlapped)

Which bits are synchronous, and what ordering is guaranteed? I can't find any clear view on this at all.
I think the reasoning went like this. All "file status flags" that are specified in the standard must be settable via F_SETFL, unless otherwise specified. The standard says:

"Bits corresponding to the file access mode and the file creation flags, as defined in <fcntl.h>, that are set in arg shall be ignored."

Under the specification of <fcntl.h>, we find the following file status flags specified:

7851      File status flags used for open( ) and fcntl( ) are as follows:
7852      O_APPEND    Set append mode.
7853 SIO  O_DSYNC     Write according to synchronized I/O data integrity completion.
7854      O_NONBLOCK  Non-blocking mode.
7855 SIO  O_RSYNC     Synchronized read I/O operations.
7856      O_SYNC      Write according to synchronized I/O file integrity completion.

(The SIO marking indicates a feature that is part of an Option (Synchronized Input and Output) for POSIX; thus O_DSYNC and O_RSYNC are not mandatory.)

As I remarked in the initial report, Linux seems to be alone in not allowing O_SYNC to be settable using F_SETFL. Furthermore, Linux does allow O_APPEND and O_NONBLOCK (the other flags required by POSIX) to be modified using F_SETFL.
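To make the comparison between the status flags concrete, here is a minimal sketch of a per-flag probe. The helper name can_toggle_flag() is hypothetical and not part of the original test program; it only reports what F_GETFL shows after an F_SETFL call, so it says nothing about how writes actually behave afterwards.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original report).
 * Tries to flip `flag` on `fd` with F_SETFL and re-reads the flags.
 * Returns 1 if the flag changed, 0 if the kernel left it unchanged,
 * and -1 if any fcntl() call failed. The original flags are restored
 * on a best-effort basis before returning.
 */
int can_toggle_flag(int fd, int flag)
{
    int before, after;

    assert(fd >= 0);

    before = fcntl(fd, F_GETFL);
    if (before == -1)
        return -1;

    /* Request the opposite setting of `flag`; other bits are untouched. */
    if (fcntl(fd, F_SETFL, before ^ flag) == -1)
        return -1;

    after = fcntl(fd, F_GETFL);
    if (after == -1)
        return -1;

    /* Best-effort restore of the original status flags. */
    (void) fcntl(fd, F_SETFL, before);

    /* Every bit of `flag` must differ between before and after. */
    return ((before ^ after) & flag) == flag;
}
```

Calling can_toggle_flag(fd, O_APPEND), can_toggle_flag(fd, O_NONBLOCK) and can_toggle_flag(fd, O_SYNC) on the same descriptor should show whether a given system treats the three mandatory flags alike.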
Again, maybe it would be wise to drop a line about this on LKML, so this topic won't get lost in the obscurity of Bugzilla.
Created attachment 51602
Fix to allow O_SYNC to be set via fcntl

Here is a patch that fixes the problem. The question about outstanding I/O is interesting, but not a tough problem to solve. An application gets what it asks for. If it does a bunch of delayed writes and then wants to switch to synchronous writes, it had better call fsync(2) first to ensure that its file is consistent. Otherwise the results are undefined.
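The application-side ordering described above (flush first, then switch) could look roughly like the following. This is a sketch only: switch_to_sync_writes() is a hypothetical name, and the code does not depend on the attached patch. It simply reports whether the kernel honoured the request.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (an illustration, not code from the patch).
 * Flushes any delayed writes with fsync(2), then asks for O_SYNC via
 * F_SETFL. Returns 1 if F_GETFL afterwards reports O_SYNC, 0 if the
 * kernel silently ignored the request, and -1 on error.
 */
int switch_to_sync_writes(int fd)
{
    int flags;

    assert(fd >= 0);

    /* Make earlier delayed writes durable before changing modes. */
    if (fsync(fd) == -1)
        return -1;

    flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;

    if (fcntl(fd, F_SETFL, flags | O_SYNC) == -1)
        return -1;

    /* F_SETFL can succeed without changing anything, so re-check. */
    flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;

    return (flags & O_SYNC) == O_SYNC;
}
```

A caller that gets 0 back would have to fall back to something else, for example calling fsync(2) after each write or reopening the file with O_SYNC.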
https://patchwork.kernel.org/patch/591481/ It looks like this patch still hasn't been merged into the mainline. What's its status? It's causing real-world problems (see the note on http://docs.basho.com/riak/1.3.0/tutorials/choosing-a-backend/Bitcask/), and this seems like a simple enough fix (if it doesn't bring it to total compliance, it still brings it a lot closer than it was before).
Stuart, I think your best bet may be to restart the thread at http://thread.gmane.org/gmane.linux.kernel/1105833 or start a new thread that points to that thread and this bug, and CC all of the contributors to the bug report and also the earlier thread.
Stuart, I meant to add that if you do start an email thread, please CC me only at mtk.manpages@gmail.com. It was a mistake that I entered this bug under my other email address.