Bug 15572

Summary: Bug on JFFS2: some nodes are written back with old size
Product: File System Reporter: ppkwuxuan (wux)
Component: OtherAssignee: David Woodhouse (dwmw2)
Status: RESOLVED CODE_FIX    
Severity: normal CC: akpm
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: Linux-2.6.24.7 Subsystem:
Regression: No Bisected commit-id:

Description ppkwuxuan 2010-03-18 08:52:46 UTC
We are running Linux-2.6.24.7 on a board of ARM926EJ-S. The file system is JFFS2. A test program is used for testing SQLlite functions. The program runs after system start-up automatically and  then add or remove several records in some database files. After adding and remove a large amount of records, this program restart the system. So this program can run all along.
    Sometimes, after system startup, this program found that one database file is corrupted. The size of this file is shorter than the size before previous restart. So we use the program "jffs2dump" to check the partition, and found that the last several node of this file has an invalid value of "isize" field(struct jffs2_raw_inode), which is as follows:
node at 0x007fc748, totlen 00x0000045b, #ino     88, version 54212, isize  1540096, csize     1047, dsize     4096, offset   1536000
node at 00x007fcba4, totlen 0x00000558, #ino     88, version 54213, isize  1539072, csize     1300, dsize     4096, offset   241664
    So after system start-up, JFFS2 thinks that the length of this file is only 1539072 bytes, so the last 4096 bytes are not included in this file. This is the reason of corruption of this file.
    We check JFFS2 source code, and found that this probelm may occur under the following condition:
    1. The function "jffs2_write_end"(fs/jffs2/file.c) is invoked in a process.This function invokes the function "jffs2_write_inode_range"(fs/jffs2/write.c) to write data in this file back. We guess that during this function, the actual file length will increase. But after this function returns, the field i_size of inode(struct inode) is still the original value.
    2. GC(garbage collection) thread interrupts the execution of that process and invoke the function "jffs2_garbage_collect_dnode"(fs/jffs2/gc.c)  to write back. In this function, some nodes of this file may be written back. But the field "isize" of these nodes are the original value before step 1.
    3. The garbage collection thread complete the work and control is returned back to the process.
    4. The function "jffs2_write_end" completes the rest work, such as updates i_size field of inode.
    So if the system restart at that time before or just after step 3, those nodes written by GC thread in step 2 has a shorter size and higher version than those written by the function "jffs2_write_inode_range" in step 1. So after system start up again, JFFS2 will think those nodes written by the process are obsolete. And this probelm will occur
Comment 1 Andrew Morton 2010-03-22 21:40:21 UTC
Hi, Dave ;)
Comment 2 David Woodhouse 2010-03-23 11:04:34 UTC
Hm, f->sem is supposed to protect against garbage collection from happening while we're processing a write -- but we're updating the isize after we drop f->sem, which opens up the race condition.

Thank you very much for your excellent diagnosis -- it looks entirely sane to me. If you're right, this patch should 'fix' it, and then we'll look for a cleaner way to achieve the same thing (and worry about the rules for updating inode->i_size)...

--- a/fs/jffs2/write.c
+++ b/fs/jffs2/write.c
@@ -392,6 +392,8 @@ int jffs2_write_inode_range(struct jffs2_sb_info *c, struct jffs2_inode_info *f,
 			}
 			break;
 		}
+		JFFS2_F_I_SIZE(f) = je32_to_cpu(ri->isize);
+
 		ret = jffs2_add_full_dnode_to_inode(c, f, fn);
 		if (f->metadata) {
 			jffs2_mark_node_obsolete(c, f->metadata->raw);