Distribution: Debian GNU/Linux 3.0r2 Hardware Environment: x86 Software Environment: Any software updating files solely via. mmap(). In this particular scenario, I see the problem with the SysOrb Network Monitoring System (http://www.sysorb.com) - it holds three database files; main.odb, main.odbj and main.tsdb. main.odb and main.tsdb are updated with read/write, while main.odbj is updated solely by means of an mmap(). Problem Description: The file(s) updated solely by means of an mmap() (open/mmap/{modify mem}/close) do not get their mtime updated. This is a serious problem when running backups, as the backup software will see such files as "not modified". I believe that this is a bug; mtime should reflect when file data changes. Solaris 9 does update the mtime correctly in the scenario that I tested. Windows 2000 exhibits the same problem as Linux when using the M$ equivalent of mmap. If that's any consolation ;) Steps to reproduce: See the atime/mtime/ctime using 'stat' of a file. Update the file with a open/mmap/{modify mem}/close Use 'stat' again to see that the mtime does not reflect the fact that the file data was changed.
Hi can you please tell me how to reproduce the bug, i have used a program to reproduce the bug!. and the time stamps were changed !. (which indicates this is not a bug). please go though the attached program. thanks in advance . any suggestions and corrections are welcomed. regards badari mail id : h.badari@gmail.com #include <stdio.h> #include <sys/types.h> #include <sys/stat.h> #include <sys/wait.h> #include <fcntl.h> #include <unistd.h> #include <time.h> #include <sys/mman.h> #define ungraceful_exit 2 #define graceful_exit 0 struct stat time_buffer1, time_buffer2; void * marea; static void show_usage(const char *prog_name, const char *mesg ){ fprintf(stderr, "Error: %s\n" "Usage is:\n%s fname\n\tWhere fname is a file name to use\n", mesg, prog_name); } static void error(char *mesg){ perror(mesg); exit(ungraceful_exit); } int main (void) { int child, status; pid_t wait_status; caddr_t mmap_ptr; char buffer[80]; int fd, i; char buff[256], ch; if ( (fd = open ("test", O_RDWR) ) < 0 ) error ("OPEN FAIL \n"); if ( fstat ( fd, &time_buffer1 ) < 0 ) error (" fstat failed \n"); if ( strcpy (buff,ctime ( & time_buffer1 . st_mtime) ) == NULL) error ("strcpy error \n"); buff [ strlen (buff) + 1] = '\n'; buff [ strlen (buff) + 2] = '\0'; printf ("\n first mtime stamp = %s \n", buff ); if ( strcpy (buff,ctime ( & time_buffer1 . st_atime) ) == NULL) error ("strcpy error \n"); buff [ strlen (buff) + 1] = '\n'; buff [ strlen (buff) + 2] = '\0'; printf ("\n first atime stamp = %s \n", buff ); if ( strcpy (buff,ctime ( & time_buffer1 . st_ctime) ) == NULL) error ("strcpy error \n"); buff [ strlen (buff) + 1] = '\n'; buff [ strlen (buff) + 2] = '\0'; printf ("\n first ctime stamp = %s \n", buff ); status = ftruncate(fd, sizeof(buffer)); /* make the file the buffer size */ if (status){ fprintf(stderr,"Could not ftruncate(%d, %d) = %d\n", fd,sizeof(buffer), status ); error("Bad Ftruncate"); } mmap_ptr = mmap((caddr_t) 0, /* Memory Location, 0 means O/S chooses */ sizeof(buffer),/* How many bytes to mmap */ PROT_READ | PROT_WRITE, /* Read and write permissions */ MAP_SHARED, /* Accessible by another process */ fd, /* which file is associated with mmap */ (off_t) 0); /* Offset in page frame */ if (mmap_ptr == MAP_FAILED) error("Memory Map Failed, QUIT!"); sprintf(buffer, "This is the secret message!"); memcpy(mmap_ptr, buffer, sizeof(buffer)); if ( fstat ( fd, &time_buffer2 ) < 0 ) perror (" fstat failed \n"); if ( strcpy (buff,ctime ( & time_buffer2 . st_mtime) ) == NULL) perror ("strcpy error \n"); printf ("\n second mtime stamp = %s \n", ctime (& time_buffer2 . st_mtime ) ); if ( strcpy (buff,ctime ( & time_buffer2 . st_atime) ) == NULL) error ("strcpy error \n"); buff [ strlen (buff) + 1] = '\n'; buff [ strlen (buff) + 2] = '\0'; printf ("\n second atime stamp = %s \n", ctime (& time_buffer2 . st_atime ) ); if ( strcpy (buff,ctime ( & time_buffer2 . st_ctime) ) == NULL) error ("strcpy error \n"); buff [ strlen (buff) + 1] = '\n'; buff [ strlen (buff) + 2] = '\0'; printf ("\n second ctime stamp = %s \n", ctime (& time_buffer2 . st_ctime ) ); status = close (fd); if (status < 0 ) error (" close open file failed \n"); exit ( graceful_exit ); }
And this was tested on three machines with three different os 1. AMD Athlon xp, 2400, 2GHz, 128 mb ram, Redhat linux release 9 2.4.20 kernel 2. Intel pentium 4, 256 mb ram, Fedora core 2 2.6.12- kernel 3. AMD sempron ,256 mb ram, DEBIAN LINUX 2.4.18 - gcov.
Your test program modifies the metatadata of the file (by calling ftruncate). This causes the stat information to be updated correctly. Try commenting out the ftruncate call after the first run of the program. You will then see, that no matter how many times you run the program and update the file, only atime is modified - no file modification is apparant from viewing the stat data (mtime/ctame remain the same). Modifying the file data with a write() will cause the correct stat data updates. The missing mmap() updates are a major problem for memory mapped database files (a journal file in my usage scenario).
Hello, I have tested this program, before and after commenting the ftruncate call , the results indicated as suspected i.e, only atime was modified ( mtime and ctime stamps were not modified ), which indicates this is a bug. Howvever , as of my knowledge goes , database concurrency controll system will not use time stamps protocols, but strict 2PL protocols. So i believe this will not create problems. But data base prevention schemes of linux uses time stamps, if time stamp does not modify it might lead to dead locks. please do correct me if im wrong in my concepts. Sathya Narayana P
It is a bug, yes, we agree. However, your arguments about this not causing problems are wrong. It does cause problems. Lots of tools like backup utilities or tmpreaper depend on mtime/ctime records to decide whether or not to delete or copy files. I wouldn't have files this bug report in the first place, were it not for commonly used tools doing the wrong thing (specifically for me, tmpreaper deleting database files that were frequently updated - but it could have been any other problem... In the worst case, a problem that could have gone unnoticed for a long time, like missing backups). I do not understand your arguments about database "concurrency controll systems" or "data base prevention schemes of linux", but I assume that we still agree that this is a bug.
See http://lkml.org/lkml/2007/1/11/223 for ongoing work. Not sure what the status of this patch is.
What is the status of the bug, is it still being worked on? Thanks.
As far as I know, it is still not being worked on.
I changed the test program Badari wrote above. The program will be attached to this bug soon. This program shows that the msync() function has a bug: it does not update the st_mtime and st_ctime fields as the POSIX standard defines. Now I'm trying to fix the problem and waiting for any comments about my view on this bug.
Created attachment 14254 [details] The test program
The Open Group defines the behavior of the mmap() function as follows. The st_ctime and st_mtime fields of a file that is mapped with MAP_SHARED and PROT_WRITE shall be marked for update at some point in the interval between a write reference to the mapped region and the next call to msync() with MS_ASYNC or MS_SYNC for that portion of the file by any process. If there is no such call and if the underlying file is modified as a result of a write reference, then these fields shall be marked for update at some time after the write reference. The above citation was taken from the following link: http://www.opengroup.org/onlinepubs/009695399/functions/mmap.html Therefore, the msync() function should be called before verifying the time stamps st_mtime and st_ctime in the test program Badari wrote above. Otherwise, the time stamps may be updated at some unspecified moment.
Anton, et.al, it you feel like fixing this problem, you are more than welcome to propose a patch.
A patch for this bug was proposed in the following message: http://lkml.org/lkml/2008/1/4/296
Created attachment 14312 [details] The proposed patch
There have been three solutions proposed to fix this bug: 1) Peter Staubach's one rejected due to the several problems but received no reaction when a slightly modified patch was proposed later: http://lkml.org/lkml/2006/5/17/138 2) Miklos Szeredi's patch was not accepted as well: http://lkml.org/lkml/2007/2/27/295 3) my solution was ignored: http://lkml.org/lkml/2008/1/7/234 It's logical to presume the community has not enough interest in the above solutions, and this bug could be closed already as WONTFIX. Natalie, what are you thinking about closing the bug?
Please re-submit to LKML. This bug causes backup systems to *miss* changed files. This bug does cause data loss in common real-world deployments (I gave an example with a database when posting the bug, but this affects the data from all mmap using applications with common backup systems). Silent exclusion from backups is very very nasty.
(In reply to comment #16) > Please re-submit to LKML. > > This bug causes backup systems to *miss* changed files. > > This bug does cause data loss in common real-world deployments (I gave an > example with a database when posting the bug, but this affects the data from > all mmap using applications with common backup systems). > > Silent exclusion from backups is very very nasty. > OK, I understand and have already replied to my last message in LKML: http://lkml.org/lkml/2008/1/9/110
Thank you, I will try and back this up on LKML too. Considering the fuzz there was about atime updates, I am surprised that no one seems to care about mtime (which, unlike atime, has very serious implications on peoples backups). Thanks Anton.
Created attachment 14398 [details] The test program, version 2 I changed the unit test due to Peter's remark in LKML (http://lkml.org/lkml/2008/1/9/267): > http://pygx.sourceforge.net/mmap.c > > This program showed that the msync() function had a bug: > it did not update the st_mtime and st_ctime fields. > > The program shows the appropriate behavior of the msync() > function using the kernel with the proposed patch applied. > Specifically, the ctime and mtime time stamps do change > when modifying the mapped memory and do not change when > there have been no write references between the mmap() > and msync() system calls. > > Sorry, I don't see where the test program shows that the file times did not change if there had not been an intervening modification to the mmap'd region. It appears to me that it just shows the file times changing or not when there has been intervening modification after the mmap call and before the fstat call. Or am I looking in the wrong place? :-)
Now I'm working on my next solution for this bug as I've already written in LKML (http://lkml.org/lkml/2008/1/9/387). My first solution is not acceptable because of the intervening sync() case which Peter Staubach has mentioned in LKML (http://lkml.org/lkml/2008/1/9/267): > The patch adds a call to the file_update_time() function to change > the file metadata before syncing. The patch also contains > substantial code cleanup: consolidated error check > for function parameters, using the PAGE_ALIGN() macro instead of > "manual" alignment check, improved readability of the loop, > which traverses the process memory regions, updated comments. > > These changes catch the simple case, where the file is mmap'd, modified via the mmap'd region, and then an msync is done, all on a mostly quiet system. However, I don't see how they will work if there has been something like a sync(2) done after the mmap'd region is modified and the msync call. When the inode is written out as part of the sync process, I_DIRTY_PAGES will be cleared, thus causing a miss in this code. The I_DIRTY_PAGES check here is good, but I think that there needs to be some code elsewhere too, to catch the case where I_DIRTY_PAGES is being cleared, but the time fields still need to be updated.
Created attachment 14407 [details] The cleanup patch The first part of my second solution was proposed in LKML: http://lkml.org/lkml/2008/1/10/485
Created attachment 14408 [details] The functional changes patch The second part of my second solution was proposed in LKML: http://lkml.org/lkml/2008/1/10/486
Created attachment 14430 [details] The test program, version 3
I proposed my third version of the solution for this bug in LKML: http://lkml.org/lkml/2008/1/12/196
Created attachment 14432 [details] The cleanup patch, version 2 The error checks are separated out as suggested by Rik van Riel.
Created attachment 14433 [details] The functional changes patch, version 2 New since the previous version: 1) no need to explicitly call msync() to update file times; 2) changing block device data is visible to all device files associated with the block device; 3) some small refinements according to the LKML comments.
The unit test program produces the following output. Using the latest git version of the vanilla kernel: >>> Modifying file... Flushing data using sync()... Failure: time not changed. Not modifying file... Flushing data using msync()... Success: time not changed. Not modifying file... Flushing data using fsync()... Success: time not changed. Modifying file... Flushing data using msync()... Failure: time not changed. Modifying file... Flushing data using fsync()... Failure: time not changed. <<< With the patch applied: >>> Modifying file... Flushing data using sync()... Success: time changed. Not modifying file... Flushing data using msync()... Success: time not changed. Not modifying file... Flushing data using fsync()... Success: time not changed. Modifying file... Flushing data using msync()... Success: time changed. Modifying file... Flushing data using fsync()... Success: time changed. <<< Similar results were obtained for a block device file.
I proposed my fourth version of the solution for this bug in LKML: http://lkml.org/lkml/2008/1/15/202
Created attachment 14464 [details] The cleanup patch, version 3
Created attachment 14465 [details] The functional changes patch, version 3
I proposed the fifth version of my solution for this bug in LKML: http://marc.info/?l=linux-kernel&m=120053180919064 http://marc.info/?l=linux-kernel&m=120053180919074 http://marc.info/?l=linux-kernel&m=120053181019078
Created attachment 14486 [details] The cleanup patch, version 4
Created attachment 14487 [details] The functional changes patch, version 4
Created attachment 14493 [details] The performance test program
I proposed the sixth version of my solution for this bug in LKML: http://lkml.org/lkml/2008/1/17/382
Created attachment 14495 [details] The cleanup patch, version 5
Created attachment 14496 [details] The functional changes patch, version 5
Unfortunately, I have no more time for working on this bug.
Almost fixed already, this bug can probably be closed soon, because Linus sent some comments in LKML.
1. Requirements 1.1) the POSIX standard requires updating ctime and mtime not later than at the call to msync() with MS_SYNC or MS_ASYNC flags; 1.2) in existing POSIX implementations, ctime and mtime get updated not later than at the call to fsync(); 1.3) in existing POSIX implementation, ctime and mtime get updated not later than at the call to sync(), the "auto-update" feature; 1.4) the customers require and the common sense suggests that ctime and mtime should be updated not later than at the call to munmap() or exit(), the latter function implying an implicit call to munmap(); 1.5) the (1.1) item should be satisfied if the file is a block device special file; 1.6) the (1.1) item should be satisfied for files residing on memory-backed filesystems such as tmpfs, too. The following operating systems were used as the reference platforms and are referred to as the "existing implementations" above: HP-UX B.11.31 and FreeBSD 6.2-RELEASE. 2. Lazy update All attempts before version 6 implemented the "lazy update" approach to satisfying the requirements given above. Within the latter approach, ctime and mtime get updated at last moment allowable. Since we don't update the file times immediately, some Flag has to be used. When up, this Flag means that the file data was modified and the file times need to be updated as soon as possible. Any existing "dirty" flag which, when up, mean that a page has been written to, is not suitable for this purpose. Indeed, msync() called with MS_ASYNC would have to reset this "dirty" flag after updating ctime and mtime. The sys_msync() function itself is basically a no-op in the MS_ASYNC case. Thereby, the synchronization routines relying upon this "dirty" flag would lose data. Therefore, a new Flag has to be introduced. The (1.5) item coupled with (1.3) requirement leads to hard work with the block device inodes. Specifically, during writeback it is impossible to tell which block device file was originally mapped. Therefore, we need to traverse the list of "active" devices associated with the block device inode. This would lead to updating file times for block device files, which were not taking part in the data transfer. Also all versions prior to version 6 failed to correctly process ctime and mtime for files on the memory-backed filesystems such as tmpfs. So the (1.6) requirement was not satisfied. If a write reference has occurred between two consecutive calls to msync() with MS_ASYNC, the second call to the latter function should take into account the last write reference. The last write reference can not be caught if no pagefault occurs. Hence a pagefault needs to be forced. This can be done using two different approaches. The first one is to synchronize data even when msync() was called with MS_ASYNC. This is not acceptable because the current design of the sys_msync() routine forbids starting I/O for the MS_ASYNC case. The second approach is to write protect the page for triggering a pagefault at the next write reference. Note that the dirty flag for the page should not be cleared thereby. In the "lazy update" approach, the requirements (1.1), (1.2), (1.3), and (1.4) taken together result in adding code at least to the following kernel routines: sys_msync(), do_fsync(), some routine in the unmap() call path, some routine in the sync() call path. Finally, a file_update_time()-like function would have to be created for processing the inode objects, not file objects. This is due to the fact that during the sync() operation, the file object may not exist any more, only the inode is known. To sum up: this "lazy" approach leads to massive changes, incurs overhead in the block device case, and requires complicated design decisions. 3. Immediate update OK, still reading? There's a better way. In a fashion analogous to what happens at write(2), react to the fact that the page gets dirtied by updating the file times immediately. Thereby any page writeback happens when the write reference has already been accounted for from the view point of file times. The only problem which remains is to force refreshing file times at the write reference following a call to msync() with MS_ASYNC. As mentioned above, all that is needed here is to force a pagefault. The vma_wrprotect() routine introduced in this patch series is called from sys_msync() in the MS_ASYNC case. The former routine is essentially a version of existing page_mkclean_one() function from mm/rmap.c. Unlike the latter function, the vma_wrprotect() does not touch the dirty bit.
Created attachment 14519 [details] The cleanup patch, version 6
Created attachment 14520 [details] The functional changes patch, version 6
1. The cost of updating the file times at pagefault This section presents the results of measuring the performance impact of enabling the mtime/ctime update in Linux kernel. The relevant portions of the test program are: >>> #define FILE_SIZE (1024 * 1024 * 256) p = mmap(0, FILE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); /* Bring the pages in */ for (i = 0; i < FILE_SIZE; i += 4096) tmp = p[i]; /* Dirty the pages */ for (i = 0; i < FILE_SIZE; i += 4096) p[i] = i; /* Write-protect the pages */ msync(p, FILE_SIZE, MS_ASYNC); /* Measure the cost of file_update_time() */ gettimeofday(&tv_start, NULL); for (i = 0; i < FILE_SIZE; i += 4096) p[i] = i; gettimeofday(&tv_stop, NULL); <<< An ext4 partition was used to keep the memory-mapped file. Here is the relevant part of dmesg output: >>> EXT4 FS on hdb, internal journal EXT4-fs: mounted filesystem with ordered data mode. EXT4-fs: file extents enabled <<< The tests were performed using the following platforms: 1. KVM x86_64 guest OS, current Git kernel. RAM size: 1G. Host system: Intel(R) Core(TM)2 Duo CPU T7300 @ 2.00GHz. 2. Pentium III-based system, current Git kernel. RAM size: 512M. The following tables give the time difference between the two calls to gettimeofday(). The test program was run three times in a raw with a delay of one second between consecutive runs. The following commands were issued prior to running the tests: echo 80 >/proc/sys/vm/dirty_ratio echo 80 >/proc/sys/vm/dirty_background_ratio echo 30000 >/proc/sys/vm/dirty_expire_centisecs sync Tables 1.1 and 1.2 give the time difference in microseconds between the two calls to gettimeofday() in the test program. Table 1.1. The KVM guest system. Memory: 1G. File size: 256M. --------------------------------------------------- | | Before the patch | After the patch | --------------------------------------------------- | First run | 251898 usec | 978355 usec | --------------------------------------------------- | Second run | 260171 usec | 975127 usec | --------------------------------------------------- | Third run | 257235 usec | 983775 usec | --------------------------------------------------- Table 1.2. The Pentium III system. Memory: 512M. File size: 256M. --------------------------------------------------- | | Before the patch | After the patch | --------------------------------------------------- | First run | 10283 usec | 96564 usec | --------------------------------------------------- | Second run | 10898 usec | 96569 usec | --------------------------------------------------- | Third run | 10107 usec | 89970 usec | --------------------------------------------------- 2. The cost of clearing the write flag Following are the results of the measurements of the performance cost of clearing the write flag when the msync() function is called with the MS_ASYNC flag. Here are the relevant portions of the test program: >>> #define FILE_SIZE (1024 * 1024 * 256) p = mmap(0, FILE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); /* Bring the pages in */ for (i = 0; i < FILE_SIZE; i += 4096) tmp = p[i]; /* Dirty the pages */ for (i = 0; i < FILE_SIZE; i += 4096) p[i] = i; /* How long did we spend in msync(MS_ASYNC)? */ gettimeofday(&tv_start, NULL); msync(p, FILE_SIZE, MS_ASYNC); gettimeofday(&tv_stop, NULL); <<< The tests were performed using the following platforms: 1. KVM x86_64 guest OS, current Git kernel. RAM size: 1G. Host system: Intel(R) Core(TM)2 Duo CPU T7300 @ 2.00GHz. 2. Pentium III-based system, current Git kernel. RAM size: 512M. Tables 2.1 and 2.2 give the time difference between the two calls to gettimeofday() in the test program. The test program was run three times in a raw with a delay of one second between consecutive runs. The following commands were issued prior to running the tests: echo 80 >/proc/sys/vm/dirty_ratio echo 80 >/proc/sys/vm/dirty_background_ratio echo 30000 >/proc/sys/vm/dirty_expire_centisecs sync Table 2.1. The KVM guest system. Memory: 1G. File size: 256M. --------------------------------------------------- | | Before the patch | After the patch | --------------------------------------------------- | First run | 31 usec | 13578 usec | --------------------------------------------------- | Second run | 35 usec | 13236 usec | --------------------------------------------------- | Third run | 31 usec | 13301 usec | --------------------------------------------------- Table 2.2. The Pentium III system. Memory: 512M. File size: 256M. --------------------------------------------------- | | Before the patch | After the patch | --------------------------------------------------- | First run | 8 usec | 14134 usec | --------------------------------------------------- | Second run | 8 usec | 14460 usec | --------------------------------------------------- | Third run | 8 usec | 14262 usec | ---------------------------------------------------
I proposed the eighth version of my solution for this bug in LKML: http://lkml.org/lkml/2008/1/22/371
Accidentally changed the bug status. Reopening.
I don't want to submit my patch series anymore, because Linus appeared to be very tired of my bad code. Reassigning this bug to default assignee of selected component.
After reporting the fact that this bug had been reassigned to default assignee in LKML, the following patch was committed to the upstream kernel: http://lkml.org/lkml/diff/2008/1/22/370/1
You can close the bug as fixed now.
The patch applied to the upstream kernel fixes this bug. Closing. The remaining problem is to satisfy the paper standard requirement. This requirement is totally ambiguous and requires invasive changes and expensive code. This is the MS_ASYNC case of the msync() function. Googling shows that the msync() function is most probably not called with the MS_ASYNC flag by any popular application. Therefore, it is left unimplemented for now.
Thanks Anton, for the code and analysis. This was a tough one...
Quoting from http://lkml.org/lkml/2009/5/7/634: Christoph Hellwig <hch@infradead.org> writes: >> Ferenc Wagner <wferi@niif.hu> writes: >> >>> I've noticed that the last modification times of our RRD files got >>> stuck after upgrading from 2.6.24 to 2.6.26 (Debian Etch -> Lenny; I >>> also tested with 2.6.30-rc5, they are still stuck). It has some >>> literature, most notably kernel bug #2645, but that's closed long ago >>> and the resulting patch http://lkml.org/lkml/2008/1/22/370 is present >>> in my kernels. Still, the test program (version 3 from the bug report) >>> gives failures: > > The problem is pretty simple. do_wp_page and __do_fault use > file_update_time to update ctime and mtime. But this function is only > a helper for simply filesystems that have a binary inode dirty/non dirty > state and keep the m/ctime purely in the Linux inode. It must not be > called from generic code as more complex filesystems need a notification > through ->setattr to update the timestamps. This will also affect other > filesystems like ubifs. I'm not entirely sure why it ever worked > before, we must have picked up those c/mtime updates by accident > somehow. > [...] > Doing this correctly in the framework of the current code is > unfortunately not so easy, as calling ->setattr requires taking i_mutex > which we can't in the pagefaul path. > > To fix this properly we need to actually update the timestamps during > msync and co as done by the patches from Miklos: > > http://lkml.org/lkml/2007/2/28/166 > > and Peter: > > http://lkml.org/lkml/2006/5/31/176 Did any further progress happen?
Looks like I can't find a way to reopen this bug, as Christoph Hellwig suggested. Somebody with the power, please do. Thanks.
(In reply to comment #52) > Looks like I can't find a way to reopen this bug, > as Christoph Hellwig suggested. Somebody with the power, > please do. Thanks. As far as I remember, any other generic FS-independent methods except calling file_update_time() function when a page has been just dirtied cannot be used in the current code. The reason why msync() function itself haven't changed within the last solution accepted by Linus was that the timestamps should be updated sometimes without additional calls to msync(), anyway. And I agree with Ray Lee (http://lkml.org/lkml/2009/5/12/274) that it must be a filesystem-specific issue, not a generic one since it works for ramfs and ext2 filesystems. I'm almost sure that it's rather worth filing a new bug against those filesystems forgetting about the ctime/mtime timestamps than putting an FS-specific code into the generic part of Linux or repeating of the previous history quite long, I'd say. As for reopening and suggesting completed versions of the previous solutions (including both of two mentioned by Christoph), they had been already proposed. As you can see here in this bug, all were rejected by the community due to objective reasons for each.
Sorry for the late reply. Finally I did some further testing, which showed that under 2.6.30-rc8 the test program reports full failure on XFS and RAMFS (contrary to the above, would you mind rechecking?), full success on EXT3, VFAT and ReiserFS, and mixed results on TMPFS: $ ./mmap /dev/shm/testfile Modifying /dev/shm/testfile... Flushing data using sync()... Failure: time not changed. Not modifying /dev/shm/testfile... Flushing data using msync()... Success: time not changed. Not modifying /dev/shm/testfile... Flushing data using fsync()... Success: time not changed. Modifying /dev/shm/testfile... Flushing data using msync()... Failure: time not changed. Modifying /dev/shm/testfile... Flushing data using fsync()... Failure: time not changed. So this doesn't look like az XFS-specific issue. Do you still recommend opening separate new reports for the affected filesystems? Thanks, Feri.
(In reply to comment #54) > Sorry for the late reply. Finally I did some further testing, which showed > that under 2.6.30-rc8 the test program reports full failure on XFS and RAMFS > (contrary to the above, would you mind rechecking?), full success on EXT3, > VFAT > and ReiserFS, and mixed results on TMPFS: > > $ ./mmap /dev/shm/testfile > Modifying /dev/shm/testfile... > Flushing data using sync()... > Failure: time not changed. > Not modifying /dev/shm/testfile... > Flushing data using msync()... > Success: time not changed. > Not modifying /dev/shm/testfile... > Flushing data using fsync()... > Success: time not changed. > Modifying /dev/shm/testfile... > Flushing data using msync()... > Failure: time not changed. > Modifying /dev/shm/testfile... > Flushing data using fsync()... > Failure: time not changed. > > So this doesn't look like az XFS-specific issue. Do you still recommend > opening > separate new reports for the affected filesystems? > > Thanks, > Feri. Unfortunately, I've not been following the changes into the kernel in the parts related to this bug, so it's possible that the fix is no more good for the recent kernels. However, if we suppose that the fix is still there, it's worth filing a bug against at least XFS. Then, according to the results of that new bug, the further development would be much easier. As far as I vaguely remember some of the file systems such as XFS and probably RAMFS, too, overload the functionality of updating the time stamps by their own code, not using the general facilities.
Denoting subtest success/failure in the mmap test program with +/-, the picture is the following: XFS RAMFS TMPFS EXT2 VFAT 2.6.30-rc8 ----- ----- -++-- +++++ +++++ 2.6.31 -++-- -++-- -++-- +++++ +++++ 2.6.32-rc4 +++++ -++-- -++-- +++++ +++++ Christoph Hellwig fixed XFS some days ago, yay! On the other hand, the improvement between 2.6.30-rc8 and 2.6.31 probably isn't FS specific, as RAMFS didn't change much during that period. Of course, RAMFS (and thus TMPFS) would be nice to get fully fixed as well.