Bug 14472

Summary: EXT4 corruption
Product: File System
Reporter: Rafael J. Wysocki (rjw)
Component: ext4
Assignee: fs_ext4 (fs_ext4)
Status: CLOSED CODE_FIX
Severity: normal
CC: parag.lkml, sandeen
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.32-rc4
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 14230

Description Rafael J. Wysocki 2009-10-26 16:52:41 UTC
Subject    : [2.6.32-rc4] + EXT4 corruption
Submitter  : Shawn Starr <shawn.starr@rogers.com>
Date       : 2009-10-13 2:07
References : http://marc.info/?l=linux-kernel&m=125539997508256&w=4
Handled-By : Theodore Tso <tytso@mit.edu>
Notify-Also : Andy Lutomirski <luto@mit.edu>

This entry is being used for tracking a regression from 2.6.31.  Please don't
close it until the problem is fixed in the mainline.
Comment 1 Rafael J. Wysocki 2009-10-29 21:31:10 UTC
On Thursday 29 October 2009, Andrew Lutomirski wrote:
> On Mon, Oct 26, 2009 at 2:55 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.31.  Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14472
> > Subject         : EXT4 corruption
> > Submitter       : Shawn Starr <shawn.starr@rogers.com>
> > Date            : 2009-10-13 2:07 (14 days old)
> > References      : http://marc.info/?l=linux-kernel&m=125539997508256&w=4
> > Handled-By      : Theodore Tso <tytso@mit.edu>
> >
> 
> 
> This bug is *not* fixed.  I just triggered it a few minutes ago by
> abusing i915 and drm, which caused a panic.  This is slightly newer
> than 2.6.32-rc5, with a couple of i915 bugfixes thrown in.
> 
> Photos are here:
> http://web.mit.edu/luto/www/ext4_crashphotos/
> 
> This is a very nasty regression, for obvious reasons.
Comment 2 paragw 2009-10-29 21:38:47 UTC
I looked at the fsck pics. I went through the same thing a few days ago.

Aneesh suggested applying the patch below. After applying it and crashing the machine a couple of times, I have not observed the corruption, so I have reason to hope that this patch on top of today's git improves things. Please try it.

commit a8836b1d6f92273e001012c7705ae8f4c3d5fb65
Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Date:   Tue Oct 27 15:36:38 2009 +0530

   ext4: discard preallocation during truncate

   We need to make sure when we drop and reacquire the inode's
   i_data_sem we discard the inode preallocation. Otherwise we
   could have blocks marked as free in bitmap but still belonging
   to prealloc space.

   Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 5c5bc5d..a1ef1c3 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -209,6 +209,12 @@ static int try_to_extend_transaction(handle_t *handle, struct inode *inode)
       up_write(&EXT4_I(inode)->i_data_sem);
       ret = ext4_journal_restart(handle, blocks_for_truncate(inode));
       down_write(&EXT4_I(inode)->i_data_sem);
+       /*
+        * We have dropped i_data_sem. So somebody else could have done
+        * block allocation. So discard the prealloc space created as a
+        * part of block allocation
+        */
+       ext4_discard_preallocations(inode);

       return ret;
 }
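
To illustrate the failure mode the commit message describes, here is a minimal toy model in C. It is only a sketch under stated assumptions, not ext4 code: every name in it (toy_fs, toy_inode, toy_preallocate, toy_discard_preallocations, toy_truncate_scenario) is hypothetical, the "bitmap" is a plain array, and the lock drop is replayed sequentially rather than with real threads. It also simplifies the discard step to just forgetting the range, since the truncate in this model frees every block anyway.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NBLOCKS 16

/* Hypothetical stand-ins for illustration; these are not ext4 structures. */
struct toy_fs {
	bool in_use[TOY_NBLOCKS];	/* block bitmap: true = allocated */
};

struct toy_inode {
	int pa_start;	/* first block of the inode's preallocated range */
	int pa_len;	/* length of that range; 0 means no preallocation */
};

/* Reserve [start, start + len) for the inode and mark it used in the bitmap. */
static void toy_preallocate(struct toy_fs *fs, struct toy_inode *ino,
			    int start, int len)
{
	assert(start >= 0 && len >= 0 && start + len <= TOY_NBLOCKS);
	for (int i = start; i < start + len; i++)
		fs->in_use[i] = true;
	ino->pa_start = start;
	ino->pa_len = len;
}

/* Forget the inode's preallocated range (toy analogue of the discard call). */
static void toy_discard_preallocations(struct toy_inode *ino)
{
	ino->pa_start = 0;
	ino->pa_len = 0;
}

/* Count blocks the inode still claims as preallocated but the bitmap calls free. */
static int toy_stale_prealloc_blocks(const struct toy_fs *fs,
				     const struct toy_inode *ino)
{
	int stale = 0;

	for (int i = ino->pa_start; i < ino->pa_start + ino->pa_len; i++)
		if (!fs->in_use[i])
			stale++;
	return stale;
}

/*
 * Replay the sequence sequentially: the lock is dropped, someone else
 * allocates (creating preallocation), the lock is retaken, and the
 * truncate frees the blocks. Returns the number of inconsistent blocks.
 */
static int toy_truncate_scenario(bool discard_after_relock)
{
	struct toy_fs fs = { { false } };
	struct toy_inode ino = { 0, 0 };

	/* "Lock dropped": a concurrent writer preallocates blocks 4..7. */
	toy_preallocate(&fs, &ino, 4, 4);

	/* "Lock retaken": optionally discard the preallocation, as the patch does. */
	if (discard_after_relock)
		toy_discard_preallocations(&ino);

	/* Truncate proceeds and marks every block free in the bitmap. */
	for (int i = 0; i < TOY_NBLOCKS; i++)
		fs.in_use[i] = false;

	return toy_stale_prealloc_blocks(&fs, &ino);
}
```

In this model, toy_truncate_scenario(false) returns 4: four blocks are free in the bitmap yet still belong to the inode's prealloc range, so a later allocation could hand them out twice. toy_truncate_scenario(true) returns 0 because the range is dropped before the blocks are freed.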
Comment 3 Eric Sandeen 2009-10-29 21:44:45 UTC
Lest champagne break out too early, I have still seen corruption with this patch in place while running my testcase (mentioned in bug #14354).

-Eric
Comment 4 Rafael J. Wysocki 2009-11-17 22:30:40 UTC
On Tuesday 17 November 2009, Andy Lutomirski wrote:
> I think this was the journal checksumming bug, which is fixed.
> 
> 
> On Nov 16, 2009, at 5:37 PM, "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> 
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.31.  Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry    : http://bugzilla.kernel.org/show_bug.cgi?id=14472
> > Subject        : EXT4 corruption
> > Submitter    : Shawn Starr <shawn.starr@rogers.com>
> > Date        : 2009-10-13 2:07 (35 days old)
> > References    : http://marc.info/?l=linux-kernel&m=125539997508256&w=4
> > Handled-By    : Theodore Tso <tytso@mit.edu>