Bug 9692

Summary: journal_data mount option causes filesystem corruption with blocksize != 4096
Product: File System
Reporter: Harald Judt (h.judt)
Component: ext3
Assignee: Andrew Morton (akpm)
Status: CLOSED OBSOLETE
Severity: high
CC: alan, erik.andren, jbacik, jm
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.23.9
Subsystem:
Regression: No
Bisected commit-id:

Description Harald Judt 2008-01-05 09:52:12 UTC
Most recent kernel where this bug did not occur: -
Older kernels have this problem too (I think I noticed this booting >= 2.6.21, definitely 2.6.22).

Distribution: Gentoo Linux x86
This bug seems to be hardware-independent (tested on three different machines which all use quite different drivers). If you need hardware information or any other log or configuration files, let me know please.

Problem Description:
When an ext3 filesystem is created with the journal_data option and a block size other than 4096 (tested: 1024, 2048), filesystem corruption occurs if certain operations are performed (see below).
Corruption will not occur if 4096 block size is used, or if any other block size is used together with journal_data_ordered or journal_data_writeback.
No errors in dmesg.

Steps to reproduce:
I found this bug using an audio file tagger, so you need exfalso which is part of quodlibet (http://www.sacredchao.net/quodlibet/). No other file tagger I used produced this kind of problem. Still, this has to be a kernel problem, right??

1. Create ext3 file system:
mkfs.ext3 -O has_journal,dir_index -b 1024 /dev/sdd1
tune2fs -c 0 -i 0 -m 0 -o journal_data /dev/sdd1

tune2fs 1.40.3 (05-Dec-2007)  (filtered)
Filesystem volume name:   <none>
Last mounted on:          <not available>
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal resize_inode dir_index filetype sparse_super
Filesystem flags:         signed directory hash
Default mount options:    journal_data
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              126976
Block count:              1012060
Reserved block count:     0
Free blocks:              976865
Free inodes:              126965
First block:              1
Block size:               1024
Fragment size:            1024
Reserved GDT blocks:      256
Blocks per group:         8192
Fragments per group:      8192
Inodes per group:         1024
Inode blocks per group:   128
Last mount time:          n/a
Mount count:              0
Maximum mount count:      -1
Check interval:           0 (<none>)
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               128
Journal inode:            8
Default directory hash:   tea
Journal backup:           inode blocks

2. Mount it and copy mp3,ogg,... files to it. This does not cause any file system corruption (which you can confirm by running fsck).

pmount /dev/sdd1:
/dev/sdd1 on /media/sdd1 type ext3 (rw,noexec,nosuid,nodev,errors=continue)

3. Use quodlibet/exfalso to change the id3 tags. Add tags to it if not present, or delete them if already present. This will lead to file system corruption.

brw-r----- 1 root disk 8, 49 /dev/sdd1

4. Unmount the volume.
pumount /dev/sdd1

5. Run fsck -fvD /dev/sdd1. It will complain about wrong i_size.

e2fsck 1.40.3 (05-Dec-2007)
Pass 1: Checking inodes, blocks, and sizes
Inode 47106, i_size is 5015509, should be 5017600.  Fix<y>? yes
Inode 47107, i_size is 4657736, should be 4661248.  Fix<y>? yes
Inode 47109, i_size is 11928555, should be 11931648.  Fix<y>? yes
Inode 47111, i_size is 5698454, should be 5701632.  Fix<y>? yes
Inode 47112, i_size is 9384018, should be 9388032.  Fix<y>? yes
Inode 47114, i_size is 5679228, should be 5681152.  Fix<y>? yes
Inode 47115, i_size is 6107218, should be 6111232.  Fix<y>? yes
Inode 47117, i_size is 4354297, should be 4358144.  Fix<y>? yes
Inode 47118, i_size is 4512286, should be 4513792.  Fix<y>? yes
Inode 47120, i_size is 7010846, should be 7012352.  Fix<y>? yes

Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 3A: Optimizing directories
Pass 4: Checking reference counts
Pass 5: Checking group summary information

/dev/sdd1: ***** FILE SYSTEM WAS MODIFIED *****

      28 inodes used (0.02%)
      14 non-contiguous inodes (50.0%)
         # of inodes with ind/dind/tind blocks: 15/15/0
  123417 blocks used (12.19%)
       0 bad blocks
       0 large files

      16 regular files
       3 directories
       0 character device files
       0 block device files
       0 fifos
       0 links
       0 symbolic links (0 fast symbolic links)
       0 sockets
--------
      19 files

Reproducible: Always.
No binary modules were loaded, clean boot from vanilla kernel. But of course, it also happens with gentoo-sources and tuxonice-sources and the nvidia binary loaded ;-).
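The i_size values in the e2fsck output above can be checked with some quick arithmetic. The following is a minimal sketch in Python and is not part of the original report; the 4096 figure is an assumption (the usual x86 page size, which the report does not state), and the names `round_up` and `summarize` are made up for illustration. For all ten inodes, the "should be" value equals the reported i_size rounded up to the next multiple of 4096, even though the filesystem block size is 1024. This is only an observation about the numbers; it does not by itself show whether the kernel or e2fsck is at fault.

```python
# (i_size reported by e2fsck, "should be" value) pairs copied from the Pass 1 output.
REPORTED = [
    (5015509, 5017600),
    (4657736, 4661248),
    (11928555, 11931648),
    (5698454, 5701632),
    (9384018, 9388032),
    (5679228, 5681152),
    (6107218, 6111232),
    (4354297, 4358144),
    (4512286, 4513792),
    (7010846, 7012352),
]

BLOCK_SIZE = 1024  # from the tune2fs output in the report
PAGE_SIZE = 4096   # assumption: x86 page size, not stated in the report


def round_up(value, unit):
    """Round value up to the next multiple of unit."""
    return -(-value // unit) * unit


def summarize(pairs):
    """For each pair, compare "should be" with block- and page-aligned sizes."""
    rows = []
    for i_size, should_be in pairs:
        block_end = round_up(i_size, BLOCK_SIZE)
        rows.append({
            "i_size": i_size,
            "should_be": should_be,
            # True if e2fsck's value is i_size rounded up to a 4096 multiple.
            "matches_page_round_up": should_be == round_up(i_size, PAGE_SIZE),
            # Whole 1024-byte blocks beyond the last block that i_size needs.
            "extra_blocks": (should_be - block_end) // BLOCK_SIZE,
        })
    return rows


if __name__ == "__main__":
    for row in summarize(REPORTED):
        print(row)
```

The `extra_blocks` field counts how many 1024-byte blocks lie between the end of the last block the file size requires and the size e2fsck asks for.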
Comment 1 Anonymous Emailer 2008-01-05 19:15:38 UTC
Reply-To: akpm@linux-foundation.org

On Sat,  5 Jan 2008 09:52:15 -0800 (PST) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=9692
>
>            Summary: journal_data mount option causes filesystem corruption
>                     with blocksize != 4096
>            Product: File System
>            Version: 2.5
>      KernelVersion: 2.6.23.9
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: high
>           Priority: P1
>          Component: ext3
>         AssignedTo: akpm@osdl.org
>         ReportedBy: h.judt@gmx.at
> 
> 
> Most recent kernel where this bug did not occur: -
> Older kernels have this problem too (I think I noticed this booting >= 2.6.21,
> definitely 2.6.22).

I'm getting the feeling that we should just disable data=journal.  Make it
use data=ordered mode instead.  It isn't getting a lot of attention..

Comment 2 Jayson R. King 2008-01-06 19:51:49 UTC
If I understood the bug right, it isn't a bug in the filesystem code but an off-by-one problem in e2fsck.

Please see the proposed resolution, with a patch for e2fsprogs, on the linux-ext4 mailing list: http://marc.info/?l=linux-ext4&m=119967534809038&w=4
Comment 3 Dark Shadow 2008-01-07 05:48:46 UTC
Jayson R. King, you're right. I have not had time to try the patch yet, but I can confirm the issue with e2fsck. I verified this by running md5sum on the files before and after the e2fsck test: the files on vfat and ext3 are identical before fsck, but the files on ext3 show different md5 sums after the alleged file system errors have been corrected.

Thank you, I will report back on the patch later.
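The md5 comparison described in this comment could be scripted roughly as follows. This is a sketch only, not the commenter's actual procedure; the function names `md5_of_file`, `snapshot`, and `changed_files` are hypothetical. It records one checksum per file before e2fsck runs and lists the files whose checksum differs afterwards.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root):
    """Map each file's path (relative to root) to its MD5 digest."""
    sums = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            sums[os.path.relpath(full, root)] = md5_of_file(full)
    return sums


def changed_files(before, after):
    """Return sorted paths whose digest differs or that exist in only one snapshot."""
    paths = before.keys() | after.keys()
    return sorted(p for p in paths if before.get(p) != after.get(p))
```

One would take a snapshot of the mounted volume (for example /media/sdd1, the mount point from the report) after tagging, unmount it, run e2fsck, remount, take a second snapshot, and pass both to `changed_files`.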
Comment 4 Harald Judt 2008-01-07 07:03:19 UTC
I can confirm the md5 test. I've also applied the patch from the mailing list to 1.40.4, but I still got an error message about i_size (though only one, and not all files were corrupt after correction). I now also get an additional error during pass 5: Block bitmap differences +(982934--9282935).

>I'm getting the feeling that we should just disable data=journal.  Make it
>use data=ordered mode instead.  It isn't getting a lot of attention..

Hmmm... While it might be only an issue with e2fsck, it doesn't occur with journal_data_ordered or journal_data_writeback. That does not have to mean anything special, but it makes this worth considering. Is journal_data unsafe or not recommended?
Comment 5 Harald Judt 2008-01-12 08:43:18 UTC
I've now also tried the patch from http://www.mail-archive.com/linux-ext4@vger.kernel.org/msg04370.html
but the results were the same.

The block bitmap difference error only shows up when leaving the i_size error uncorrected, so sorry - just ignore it.

There is only one i_size error now, no matter how many files I change (tested 40 files). However, I have to copy a certain amount of data to produce the error (at least 50MiB of data).

fsck 1.40.4 (31-Dec-2007)
e2fsck 1.40.4 (31-Dec-2007)
Pass 1: Checking inodes, blocks, and sizes
Inode 99334, i_size is 11928555, should be 11930624.  Fix<y>? yes

Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 3A: Optimizing directories
Pass 4: Checking reference counts
Pass 5: Checking group summary information

/dev/sdd1: ***** FILE SYSTEM WAS MODIFIED *****

      67 inodes used (0.05%)
      25 non-contiguous inodes (37.3%)
         # of inodes with ind/dind/tind blocks: 47/47/0
  375920 blocks used (37.14%)
       0 bad blocks
       0 large files

      51 regular files
       7 directories
       0 character device files
       0 block device files
       0 fifos
       0 links
       0 symbolic links (0 fast symbolic links)
       0 sockets
--------
      58 files

Unfortunately, the above patches for e2fsck do not solve the problem completely.
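The single remaining i_size complaint in this comment can be put through some simple arithmetic. The following is a minimal Python sketch and not part of the original comment; the 4096 page size is an assumption, and `round_up` and `analyze` are illustrative names. With 1024-byte blocks, an i_size of 11928555 needs data up to byte 11928576. The value e2fsck asks for here (11930624) is two blocks beyond that and one block short of the next multiple of 4096 (11931648), which is the value the original report showed for an inode with the same i_size. The sketch only restates the numbers and does not identify the cause.

```python
BLOCK_SIZE = 1024  # block size from the tune2fs output in the description
PAGE_SIZE = 4096   # assumption: x86 page size, not stated in the report


def round_up(value, unit):
    """Round value up to the next multiple of unit."""
    return -(-value // unit) * unit


def analyze(i_size, should_be):
    """Relate e2fsck's "should be" value to block- and page-aligned sizes."""
    min_end = round_up(i_size, BLOCK_SIZE)   # end of the last block i_size needs
    page_end = round_up(i_size, PAGE_SIZE)   # end of the enclosing 4096-byte unit
    return {
        "min_end": min_end,
        "page_end": page_end,
        "blocks_past_min": (should_be - min_end) // BLOCK_SIZE,
        "blocks_short_of_page": (page_end - should_be) // BLOCK_SIZE,
    }


if __name__ == "__main__":
    # Values from the e2fsck 1.40.4 output in this comment.
    print(analyze(11928555, 11930624))
```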
Comment 6 Anonymous Emailer 2008-02-26 13:49:49 UTC
Reply-To: akpm@linux-foundation.org

On Sat, 5 Jan 2008 19:15:52 -0800 Andrew Morton <akpm@linux-foundation.org> wrote:

> I'm getting the feeling that we should just disable data=journal.  Make it
> use data=ordered mode instead.  It isn't getting a lot of attention..
> 

As discussed today - this is the bug which makes me wonder how
useful/popular journalled-data mode is.
Comment 7 Josef Bacik 2008-02-27 12:52:37 UTC
Trying to reproduce this on the latest git pull from Linus's tree and the latest pu/ branch of e2fsprogs, I'm not seeing this problem. I ran mkfs on my fs, copied over the kernel source tree and some of my mp3 collection with data=journal, unmounted, ran e2fsck, and got no errors. Can you try the latest stuff and see if you still see the same problem?

Josef
Comment 8 Harald Judt 2008-08-08 03:07:25 UTC
I gave this another try after booting the latest 2.6.24 + gentoo-patches + tuxonice (I had trouble compiling kernels > 2.6.24 because of a make error in vdso32-sym.lds, but I don't have enough time to solve that now).

However, using e2fsck 1.40.11 (17-June-2008), the problem persists. Maybe some patches still didn't make it in there?

I will do another test using a newer kernel version when I find time to solve my compiling issue.
Comment 9 Erik Andrén 2009-12-30 14:38:19 UTC
Is this still an issue with a recent kernel?