Bug 13292 - ext4 without journal reproducible file corruption
Summary: ext4 without journal reproducible file corruption
Status: RESOLVED PATCH_ALREADY_AVAILABLE
Alias: None
Product: File System
Classification: Unclassified
Component: ext4
Hardware: All
OS: Linux
Importance: P1 blocking
Assignee: fs_ext4@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-05-13 09:22 UTC by Thibault Mondary
Modified: 2009-07-21 09:02 UTC
CC List: 4 users

See Also:
Kernel Version: 2.6.29.3
Subsystem:
Regression: No
Bisected commit-id:


Attachments
The harddrive image to mount and chroot (69 bytes, text/plain)
2009-05-13 09:22 UTC, Thibault Mondary

Description Thibault Mondary 2009-05-13 09:22:42 UTC
Created attachment 21323
The harddrive image to mount and chroot

Hi,

I found file corruption using ext4 without a journal on two different machines: a Dell Vostro 1700 (64-bit Arch Linux and 64-bit Ubuntu Jaunty, custom 2.6.29.3 kernel) and an Acer Aspire One A110 with an SSD (32-bit Arch Linux, same kernel version).


The problem can be reproduced using the attached hard disk image. It is an ext4 image, without a journal, of a minimal 64-bit Arch Linux system.

Steps to reproduce:

A-

1/ extract the archive on a 64-bit system running 2.6.29.3
2/ mount the .img as ext4 through a loopback device
3/ chroot into the mounted directory
4/ execute "locale-gen"
5/ execute "locale". The result is correct
6/ look at the /usr/lib/locale/locale-archive file (with vi or another editor)
7/ exit the chroot
8/ unmount the hd image

B-

1/ mount the hd image again
2/ chroot again
3/ execute "locale". There are 3 errors at the beginning, and the output differs from before
4/ look at /usr/lib/locale/locale-archive: the content has changed!
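
Steps A and B can be scripted so the check does not depend on eyeballing the file in an editor. The following is a minimal sketch (Python, not part of the original report) under stated assumptions: it must run as root, it compares MD5 digests of locale-archive before unmounting and after remounting (the same check Comment 4 does with md5sum), and it skips the "locale" calls. `reproduce`, `md5_of` and the injectable `run` parameter are hypothetical names chosen for illustration.

```python
import hashlib
import subprocess
from pathlib import Path

ARCHIVE = "usr/lib/locale/locale-archive"  # path inside the image, from the report


def md5_of(path: Path) -> str:
    """Hex MD5 of a file, read in chunks (the same digest md5sum prints)."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def reproduce(image: str, mnt: str, run=subprocess.run) -> bool:
    """Run steps A and B; return True if locale-archive changed across the remount.

    Needs root. `run` is injectable (hypothetical design choice) so the command
    sequence can be exercised without actually mounting anything.
    """
    mount = ["mount", "-t", "ext4", "-o", "loop", image, mnt]
    archive = Path(mnt) / ARCHIVE

    run(mount, check=True)                              # A-2: loop-mount the image
    try:
        run(["chroot", mnt, "locale-gen"], check=True)  # A-3/A-4: generate locales inside the image
        before = md5_of(archive)                        # digest as seen before unmounting
    finally:
        run(["umount", mnt], check=True)                # A-8: unmount

    run(mount, check=True)                              # B-1: mount again
    try:
        after = md5_of(archive)                         # digest as read back after remount
    finally:
        run(["umount", mnt], check=True)
    return before != after
```

On an affected kernel, `reproduce("archlinux_64_minimal_ext4_without_journal.img", "/mnt/loop")` would be expected to return True; with a journal added it should return False.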


Repeat these steps with a journal (tune2fs -O has_journal, then fsck -f) and you'll see that the locale-archive file and the output of the locale command stay consistent across unmount/remount.
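
Before running steps A and B it helps to confirm which variant of the image is being tested. A minimal sketch, assuming the standard ext2/3/4 on-disk layout (primary superblock 1024 bytes into the image, s_magic at offset 0x38, s_feature_compat at 0x5C, has_journal flag 0x4), that reads the flag straight from the image file. `dumpe2fs -h` on the image reports the same thing in its "Filesystem features" line; `parse_has_journal` and `image_has_journal` are hypothetical helper names.

```python
import struct

SUPERBLOCK_OFFSET = 1024     # primary superblock starts 1 KiB into the image
S_MAGIC = 0x38               # s_magic, __le16, offset within the superblock
S_FEATURE_COMPAT = 0x5C      # s_feature_compat, __le32, offset within the superblock
EXT_SUPER_MAGIC = 0xEF53
COMPAT_HAS_JOURNAL = 0x0004  # the "has_journal" feature that tune2fs -O has_journal sets


def parse_has_journal(raw: bytes) -> bool:
    """Given the first 2 KiB (or more) of an ext2/3/4 image, report has_journal."""
    if len(raw) < SUPERBLOCK_OFFSET + S_FEATURE_COMPAT + 4:
        raise ValueError("too short to hold an ext superblock")
    (magic,) = struct.unpack_from("<H", raw, SUPERBLOCK_OFFSET + S_MAGIC)
    if magic != EXT_SUPER_MAGIC:
        raise ValueError(f"not an ext2/3/4 image (magic 0x{magic:04x})")
    (compat,) = struct.unpack_from("<I", raw, SUPERBLOCK_OFFSET + S_FEATURE_COMPAT)
    return bool(compat & COMPAT_HAS_JOURNAL)


def image_has_journal(path: str) -> bool:
    """Read just enough of the image file to check the has_journal flag."""
    with open(path, "rb") as f:
        return parse_has_journal(f.read(2048))


if __name__ == "__main__":
    import sys
    for image in sys.argv[1:]:
        print(image, "has_journal" if image_has_journal(image) else "no journal")
```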

The locale-archive file is not the only one affected. On another system, it was the nvidia kernel module that was corrupted between installation and reboot.


Regards,

Thibault
Comment 1 Frank Mayhar 2009-05-13 23:10:14 UTC
I would be interested to know what your actual output looks like.  I've tried to reproduce this and don't seem to be able to, at least not in my environment.

My attempts looked like:
[/root]# mount -t ext4 -o loop /foo/archlinux_64_minimal_ext4_without_journal.img /mnt
[/root]# chroot /mnt
bash-3.2# locale-gen
Generating locales...
  fr_FR.UTF-8... done
  fr_FR.ISO-8859-1... done
  fr_FR.ISO-8859-15@euro... done
Generation complete.
bash-3.2# locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=en_US
LC_CTYPE="en_US"
LC_NUMERIC="en_US"
LC_TIME="en_US"
LC_COLLATE="en_US"
LC_MONETARY="en_US"
LC_MESSAGES="en_US"
LC_PAPER="en_US"
LC_NAME="en_US"
LC_ADDRESS="en_US"
LC_TELEPHONE="en_US"
LC_MEASUREMENT="en_US"
LC_IDENTIFICATION="en_US"
LC_ALL=
bash-3.2# LANG=fr_FR
bash-3.2# locale
LANG=fr_FR
LC_CTYPE="fr_FR"
LC_NUMERIC="fr_FR"
LC_TIME="fr_FR"
LC_COLLATE="fr_FR"
LC_MONETARY="fr_FR"
LC_MESSAGES="fr_FR"
LC_PAPER="fr_FR"
LC_NAME="fr_FR"
LC_ADDRESS="fr_FR"
LC_TELEPHONE="fr_FR"
LC_MEASUREMENT="fr_FR"
LC_IDENTIFICATION="fr_FR"
LC_ALL=
bash-3.2# exit
[/root]# umount /mnt
[/root]# mount -t ext4 -o loop /foo/archlinux_64_minimal_ext4_without_journal.img /mnt
[/root]# chroot /mnt
bash-3.2# locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=en_US
LC_CTYPE="en_US"
LC_NUMERIC="en_US"
LC_TIME="en_US"
LC_COLLATE="en_US"
LC_MONETARY="en_US"
LC_MESSAGES="en_US"
LC_PAPER="en_US"
LC_NAME="en_US"
LC_ADDRESS="en_US"
LC_TELEPHONE="en_US"
LC_MEASUREMENT="en_US"
LC_IDENTIFICATION="en_US"
LC_ALL=
bash-3.2# LANG=fr_FR
bash-3.2# locale
LANG=fr_FR
LC_CTYPE="fr_FR"
LC_NUMERIC="fr_FR"
LC_TIME="fr_FR"
LC_COLLATE="fr_FR"
LC_MONETARY="fr_FR"
LC_MESSAGES="fr_FR"
LC_PAPER="fr_FR"
LC_NAME="fr_FR"
LC_ADDRESS="fr_FR"
LC_TELEPHONE="fr_FR"
LC_MEASUREMENT="fr_FR"
LC_IDENTIFICATION="fr_FR"
LC_ALL=
bash-3.2#
Comment 2 Thibault Mondary 2009-05-14 14:14:01 UTC
(In reply to comment #1)
I'm starting from the locale "fr_FR@euro" on my system.


Output WITHOUT journal :

root@ubuntu ~# mount archlinux_64_minimal_ext4_without_journal.img /mnt/loop -o loop
root@ubuntu ~# chroot /mnt/loop

bash-3.2# locale-gen
Generating locales...
  fr_FR.UTF-8... done
  fr_FR.ISO-8859-1... done
  fr_FR.ISO-8859-15@euro... done
Generation complete.

bash-3.2# locale
LANG=fr_FR@euro
LC_CTYPE="fr_FR@euro"
LC_NUMERIC="fr_FR@euro"
LC_TIME="fr_FR@euro"
LC_COLLATE="fr_FR@euro"
LC_MONETARY="fr_FR@euro"
LC_MESSAGES="fr_FR@euro"
LC_PAPER="fr_FR@euro"
LC_NAME="fr_FR@euro"
LC_ADDRESS="fr_FR@euro"
LC_TELEPHONE="fr_FR@euro"
LC_MEASUREMENT="fr_FR@euro"
LC_IDENTIFICATION="fr_FR@euro"
LC_ALL=

bash-3.2# exit
root@ubuntu:~# umount /mnt/loop

root@ubuntu:~# mount archlinux_64_minimal_ext4_without_journal.img /mnt/loop -o loop
root@ubuntu ~# chroot /mnt/loop

********BEGINNING OF THE PROBLEM: the locales should already be there, since they were generated during the previous mount**********

bash-3.2# locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=fr_FR@euro
LC_CTYPE="fr_FR@euro"
LC_NUMERIC="fr_FR@euro"
LC_TIME="fr_FR@euro"
LC_COLLATE="fr_FR@euro"
LC_MONETARY="fr_FR@euro"
LC_MESSAGES="fr_FR@euro"
LC_PAPER="fr_FR@euro"
LC_NAME="fr_FR@euro"
LC_ADDRESS="fr_FR@euro"
LC_TELEPHONE="fr_FR@euro"
LC_MEASUREMENT="fr_FR@euro"
LC_IDENTIFICATION="fr_FR@euro"
LC_ALL=

bash-3.2# LANG=fr_FR
bash-3.2# locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=fr_FR
LC_CTYPE="fr_FR"
LC_NUMERIC="fr_FR"
LC_TIME="fr_FR"
LC_COLLATE="fr_FR"
LC_MONETARY="fr_FR"
LC_MESSAGES="fr_FR"
LC_PAPER="fr_FR"
LC_NAME="fr_FR"
LC_ADDRESS="fr_FR"
LC_TELEPHONE="fr_FR"
LC_MEASUREMENT="fr_FR"
LC_IDENTIFICATION="fr_FR"
LC_ALL=

**********END************


*************VERSION 2 : adding a journal to the image ************

root@ubuntu:~# tune2fs -O has_journal archlinux_64_minimal_ext4_without_journal.img
tune2fs 1.41.4 (27-Jan-2009)
Creating journal inode: done
This filesystem will be automatically checked every 39 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

root@ubuntu:~# e2fsck -f archlinux_64_minimal_ext4_without_journal.img
e2fsck 1.41.4 (27-Jan-2009)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
archlinux_64_minimal_ext4_without_journal.img: 21848/65536 files (0.1% non-contiguous), 86608/262144 blocks

root@ubuntu:~# mount archlinux_64_minimal_ext4_without_journal.img /mnt/loop -o loop
root@ubuntu:~# chroot /mnt/loop

bash-3.2# locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=fr_FR@euro
LC_CTYPE="fr_FR@euro"
LC_NUMERIC="fr_FR@euro"
LC_TIME="fr_FR@euro"
LC_COLLATE="fr_FR@euro"
LC_MONETARY="fr_FR@euro"
LC_MESSAGES="fr_FR@euro"
LC_PAPER="fr_FR@euro"
LC_NAME="fr_FR@euro"
LC_ADDRESS="fr_FR@euro"
LC_TELEPHONE="fr_FR@euro"
LC_MEASUREMENT="fr_FR@euro"
LC_IDENTIFICATION="fr_FR@euro"
LC_ALL=

bash-3.2# locale-gen
Generating locales...
  fr_FR.UTF-8... done
  fr_FR.ISO-8859-1... done
  fr_FR.ISO-8859-15@euro... done
Generation complete.

bash-3.2# locale
LANG=fr_FR@euro
LC_CTYPE="fr_FR@euro"
LC_NUMERIC="fr_FR@euro"
LC_TIME="fr_FR@euro"
LC_COLLATE="fr_FR@euro"
LC_MONETARY="fr_FR@euro"
LC_MESSAGES="fr_FR@euro"
LC_PAPER="fr_FR@euro"
LC_NAME="fr_FR@euro"
LC_ADDRESS="fr_FR@euro"
LC_TELEPHONE="fr_FR@euro"
LC_MEASUREMENT="fr_FR@euro"
LC_IDENTIFICATION="fr_FR@euro"
LC_ALL=

bash-3.2# exit
root@ubuntu:~# umount /mnt/loop 

root@ubuntu:~# mount archlinux_64_minimal_ext4_without_journal.img /mnt/loop -o loop
root@ubuntu:~# chroot /mnt/loop
*****************HERE, NO PROBLEM, locales are taken from previous session*******

bash-3.2# locale
LANG=fr_FR@euro
LC_CTYPE="fr_FR@euro"
LC_NUMERIC="fr_FR@euro"
LC_TIME="fr_FR@euro"
LC_COLLATE="fr_FR@euro"
LC_MONETARY="fr_FR@euro"
LC_MESSAGES="fr_FR@euro"
LC_PAPER="fr_FR@euro"
LC_NAME="fr_FR@euro"
LC_ADDRESS="fr_FR@euro"
LC_TELEPHONE="fr_FR@euro"
LC_MEASUREMENT="fr_FR@euro"
LC_IDENTIFICATION="fr_FR@euro"
LC_ALL=


exit, umount...
Comment 3 Frank Mayhar 2009-05-14 20:07:25 UTC
Thank you for the very detailed information you provided.  Unfortunately I can't reproduce this problem in my environment.  That environment is, however, somewhat different from yours, in that it's a 2.6.26 kernel plus ext4 patches up to March 16 (minus a few patches that depended on changes in other parts of the kernel).

My suggestion is that you drop back to the March 16 (or so) kernel and see if you can reproduce the problem.  If you can't reproduce it then you can do a binary search to see where the problem started, i.e. try an April 15 kernel, etc., until you home in on the offending commit.
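
The search suggested above is an ordinary bisection over an ordered list of kernels. A minimal sketch, assuming the candidates are ordered oldest to newest, the oldest is known good, the newest is known bad, and the bug stays present once introduced. `first_bad` and the `reproduces` callback (which in practice would build, boot and run steps A and B) are hypothetical; for individual commits, `git bisect` automates the same loop.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(candidates: Sequence[T], reproduces: Callable[[T], bool]) -> T:
    """Binary search for the first candidate that reproduces the bug.

    Assumes candidates[0] is good, candidates[-1] is bad, and the bug,
    once introduced, is present in every later candidate.
    """
    lo, hi = 0, len(candidates) - 1  # invariant: candidates[lo] good, candidates[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if reproduces(candidates[mid]):
            hi = mid  # bug already present: the culprit is at or before mid
        else:
            lo = mid  # still good: the culprit is after mid
    return candidates[hi]
```

Each probe halves the remaining range, so a month of daily snapshots needs about five test cycles rather than thirty.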

If you _can_ reproduce the problem with the March 16 kernel then something more complex is going on and I'll have to try again to reproduce it here.
Comment 4 Theodore Tso 2009-05-20 01:02:32 UTC
I've been able to replicate the problem using a 2.6.30-rc6 kernel with the ext4 patch queue applied.

It seems to be utterly repeatable, and it seems to have to do with how the locale-gen program writes out /usr/lib/locale/locale-archive.  After you run locale-gen, an md5sum of that file gives you:

e98e9a55061c63f7ae089f7ac016eac6  /mnt/usr/lib/locale/locale-archive

but after you unmount and remount the filesystem, an md5 of that file gives you:

5ab6d62d18431d057a514eb7dbd78428  /mnt/usr/lib/locale/locale-archive

If I manually copy the file into place, it seems to be OK.  So the problem must be in how locale-gen puts the file into place.

Unfortunately the image doesn't have strace, but I've tried stracing locale-gen on a (32-bit x86) Ubuntu system, and it appears that locale-gen modifies the file using a combination of mmap and direct writes (?!?):

28124 open("/usr/lib/locale/locale-archive", O_RDWR|O_LARGEFILE) = 3
28124 fstat64(3, {st_mode=S_IFREG|0644, st_size=1330544, ...}) = 0
28124 fcntl64(3, F_SETLKW64, {type=F_WRLCK, whence=SEEK_CUR, start=0, len=56}, 0xfffb3f20) = 0
28124 stat64("/usr/lib/locale/locale-archive", {st_mode=S_IFREG|0644, st_size=1330544, ...}) = 0
28124 read(3, "\t\1\2\336\0\0\0\0008\0\0\0\2\0\0\0\213\3\0\0\274*\0\0\26\0\0\0L\35\0\0\10"..., 56) = 56
28124 mmap2(NULL, 103860, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0xf6d58000
28124 _llseek(3, 0, [1330544], SEEK_END) = 0
28124 write(3, "\27\20\5 \23\0\0\0T\0\0\0X\0\0\0d\0\0\0d\4\0\0\0\202\2\0p\235\2\0|"..., 962094) = 962094
28124 _llseek(3, 0, [2292638], SEEK_END) = 0
28124 write(3, "\0\0"..., 2)            = 2
28124 write(3, "\24\21\3 \6\0\0\0 \0\0\0\"\0\0\0$\0\0\0(\0\0\0,\0\0\0000\0\0\0."..., 3584) = 3584
28124 munmap(0xf6d58000, 103860)        = 0
28124 close(3)                          = 0
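
The syscall sequence in the trace can be replayed on an arbitrary file to experiment with the access pattern outside locale-gen. A minimal sketch under assumptions: strace does not show where locale-gen stores into the shared mapping, so the placement of the in-place `patch` below is a guess, and the sizes are parameters rather than the traced values (1330544-byte file, 103860-byte mapping, 962094 + 2 + 3584 bytes appended). `replay_locale_gen_pattern` is a hypothetical name. The check that matters is comparing md5sum of the file before unmounting and after remounting; fsx did not trigger the bug, so this pattern alone is not guaranteed to reproduce it either.

```python
import fcntl
import mmap
import os

HEADER_LEN = 56  # the trace locks and reads the first 56 bytes


def _write_all(fd: int, data: bytes) -> None:
    """os.write may write less than asked; loop until everything is written."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def replay_locale_gen_pattern(path: str, map_len: int, patch: bytes,
                              appended: bytes, tail_writes: list[bytes]) -> None:
    """Replay the open/lock/read/mmap/append/munmap/close sequence from the strace.

    map_len must not exceed the file's current size (Python's mmap refuses that).
    """
    fd = os.open(path, os.O_RDWR)
    try:
        # ~ fcntl(F_SETLKW, F_WRLCK, start=0, len=56): blocking write lock on the header
        fcntl.lockf(fd, fcntl.LOCK_EX, HEADER_LEN, 0, os.SEEK_SET)
        os.read(fd, HEADER_LEN)  # header read; contents unused in this sketch
        mm = mmap.mmap(fd, map_len, flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE)
        try:
            # Assumed: an in-place update through the shared mapping (not visible in strace)
            mm[:len(patch)] = patch
            os.lseek(fd, 0, os.SEEK_END)
            _write_all(fd, appended)      # one large append past the old EOF
            os.lseek(fd, 0, os.SEEK_END)
            for chunk in tail_writes:     # then small writes at the new EOF
                _write_all(fd, chunk)
        finally:
            mm.close()                    # munmap, with no msync(), as in the trace
    finally:
        os.close(fd)                      # closing also drops the fcntl lock
```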

All I can posit is that somehow some dirty bits aren't getting set, so some data blocks never get written back to disk and their contents are lost when the filesystem is unmounted and remounted.  Looking at the file with debugfs, it does indeed look like the blocks are never written out to disk.  With debugfs "dump /usr/lib/locale/locale-archive /tmp/foo", I see the same contents we see after the filesystem is unmounted and remounted.  It's not at all clear why not using a journal makes a difference, though.
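
To see exactly which blocks were lost, the copy read through the mounted filesystem (saved with cp before unmounting) can be compared block by block with the debugfs dump. A minimal sketch; the 4096-byte block size is an assumption (check the image's actual block size with `dumpe2fs -h`), and `differing_blocks` is a hypothetical helper.

```python
from typing import Iterator

BLOCK_SIZE = 4096  # assumption: the image's block size; verify with dumpe2fs -h


def differing_blocks(a_path: str, b_path: str, block_size: int = BLOCK_SIZE) -> Iterator[int]:
    """Yield indices of logical blocks whose contents differ between two files.

    A block present in only one file (length mismatch) also counts as differing.
    """
    with open(a_path, "rb") as a, open(b_path, "rb") as b:
        index = 0
        while True:
            x = a.read(block_size)
            y = b.read(block_size)
            if not x and not y:
                return
            if x != y:
                yield index
            index += 1
```

The on-disk copy can be produced non-interactively with `debugfs -R "dump /usr/lib/locale/locale-archive /tmp/foo"` on the image; if the hypothesis above holds, the differing indices should cluster in the regions locale-gen rewrote.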

I've tried running fsx on a filesystem without a journal, and it's not showing the problem.
Comment 5 Frank Mayhar 2009-05-20 17:36:26 UTC
Just FYI, as I said I was unable to reproduce this in our current environment.  We'll shortly be pulling in another set of patches from the ext4 stable tree (bringing us from March to whatever is current at the time), at which point I'll try again.  If I'm successful I'll do a bisection to figure out which patch or patches introduced the problem.

Assuming Ted doesn't beat me to it, anyway. :-)
Comment 6 Frank Mayhar 2009-05-27 19:23:42 UTC
Well, I pulled in a bunch of patches from the ext4 stable tree (although not all of them, unfortunately, since our base is well behind top-of-tree) and was still unable to reproduce this.  I'll continue to keep an eye on it, however.
Comment 7 Frank Mayhar 2009-05-28 21:16:09 UTC
Yesterday I pulled down Ted's ext4-stable tree and built it.  Today I used it to actually reproduce this problem.  Now to try to track it down...

BTW, my suspicion is that the problem is either somewhere in the rest of the kernel or maybe some post-2.6.26 change elsewhere is tickling a problem in ext4 itself.  I suspect this because we're very nearly up-to-date with ext4 itself (modulo some patches that don't seem directly relevant to this issue) but the rest of our kernel is still pretty much straight 2.6.26.  If it were strictly an ext4 issue I would think we would be able to reproduce it with our kernel, but we can't.

I'm going to work on tracking it down; I'm posting here just to make sure I'm not duplicating someone else's effort.  Ted?
Comment 8 Thibault Mondary 2009-07-21 09:02:44 UTC
Hi,

I tested again using 2.6.31-rc3, and the bug seems to be resolved: no more corruption with locale-gen.


Thibault
