Bug 15576

Summary: Data Loss (flex_bg and ext4_mb_generate_buddy errors)
Product: File System            Reporter: xpenev
Component: ext4                 Assignee: fs_ext4 (fs_ext4)
Status: RESOLVED DOCUMENTED
Severity: normal                CC: alan, tytso
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.31-20-generic #58-Ubuntu SMP
Subsystem:
Regression: No
Bisected commit-id:

Description xpenev 2010-03-19 01:05:11 UTC
# create a 484 cylinder disk [3.7 GB]
dd of=disk.bin bs=512 count=0 seek=$((484*255*63))

# associate with loop device
losetup /dev/loop0 disk.bin

# generate bad blocks file [600 MB]
for((i=360491;i<=497992;i++)); do echo $i; done > omit

# format disk with ext4
mkfs.ext4 -l omit /dev/loop0

# mount disk
mkdir foobar; mount /dev/loop0 foobar

# create a 2 GB file
cd foobar; dd if=/dev/zero bs=1024 count=$((1024 * 1024 * 2))

# check dmesg
[ 9200.006021] EXT4-fs error (device loop0): ext4_mb_generate_buddy: EXT4-fs: group 12: 0 blocks in bitmap, 2 in gd
[ 9200.010311] EXT4-fs error (device loop0): ext4_mb_generate_buddy: EXT4-fs: group 13: 0 blocks in bitmap, 2 in gd
[ 9200.010359] EXT4-fs error (device loop0): ext4_mb_generate_buddy: EXT4-fs: group 14: 0 blocks in bitmap, 2 in gd
[ 9200.010683] EXT4-fs error (device loop0): ext4_mb_generate_buddy: EXT4-fs: group 15: 9911 blocks in bitmap, 9913 in gd

Worse, however: if, rather than creating a 2 GB file, you use this partition as the target root partition for an installation with the latest [32-bit] Ubuntu installer, ext4 consistently reports data loss at 57 percent of the install.

[ 1129.344600] EXT4-fs error (device sda1): ext4_mb_generate_buddy: EXT4-fs: group 12: 0 blocks in bitmap, 2 in gd
[ 1129.344626] Aborting journal on device sda1:8.
[ 1129.380671] EXT4-fs error (device sda1): ext4_journal_start_sb: Detected aborted journal
[ 1129.380697] EXT4-fs (sda1): Remounting filesystem read-only
[ 1129.492154] EXT4-fs (sda1): Remounting filesystem read-only
[ 1129.542049] EXT4-fs error (device sda1): ext4_mb_generate_buddy: EXT4-fs: group 13: 0 blocks in bitmap, 2 in gd
[ 1129.554043] EXT4-fs error (device sda1): ext4_mb_generate_buddy: EXT4-fs: group 14: 0 blocks in bitmap, 2 in gd
[ 1129.574283] EXT4-fs error (device sda1): ext4_mb_generate_buddy: EXT4-fs: group 15: 9911 blocks in bitmap, 9913 in gd
[ 1129.574343] mpage_da_map_blocks block allocation failed for inode 41510 at logical offset 0 with max blocks 6 with error -30
[ 1129.574352] This should not happen.!! Data will be lost
[ 1129.574393] ext4_da_writepages: jbd2_start: 1000 pages, ino 41510; err -30
[ 1129.574406] Pid: 11796, comm: pdflush Not tainted 2.6.31-14-generic #48-Ubuntu
[ 1129.574414] Call Trace:
[ 1129.574440]  [<c056e41c>] ? printk+0x18/0x1c
[ 1129.574456]  [<c0266162>] ext4_da_writepages+0x452/0x490
[ 1129.574474]  [<c01ba551>] do_writepages+0x21/0x40
[ 1129.574489]  [<c02033fe>] writeback_single_inode+0x16e/0x3d0
[ 1129.574503]  [<c0150510>] ? process_timeout+0x0/0x10
[ 1129.574515]  [<c0203afd>] generic_sync_sb_inodes+0x38d/0x4a0
[ 1129.574528]  [<c0203ced>] writeback_inodes+0x4d/0xe0
[ 1129.574539]  [<c01b9432>] wb_kupdate+0xa2/0x110
[ 1129.574551]  [<c01bac27>] __pdflush+0xf7/0x1f0
[ 1129.574562]  [<c01bad20>] ? pdflush+0x0/0x40
[ 1129.574573]  [<c01bad20>] ? pdflush+0x0/0x40
[ 1129.574583]  [<c01bad59>] pdflush+0x39/0x40
[ 1129.574594]  [<c01b9390>] ? wb_kupdate+0x0/0x110
[ 1129.574606]  [<c015bf8c>] kthread+0x7c/0x90
[ 1129.574616]  [<c015bf10>] ? kthread+0x0/0x90
[ 1129.574630]  [<c0104007>] kernel_thread_helper+0x7/0x10
Comment 1 Theodore Tso 2010-03-22 02:13:37 UTC
On Fri, Mar 19, 2010 at 01:05:23AM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> # create a 484 cylinder disk [3.7 GB]
> dd of=disk.bin bs=512 count=0 seek=$((484*255*63))
> 
> # associate with loop device
> losetup /dev/loop0 disk.bin
> 
> # generate bad blocks file [600 MB]
> for((i=360491;i<=497992;i++)); do echo $i; done > omit
> 
> # format disk with ext4
> mkfs.ext4 -l omit /dev/loop0

This is an e2fsprogs bug.  If you run e2fsck at this point, pass 5
errors will be reported that correspond exactly with what you report
the kernel ends up complaining about:

Free blocks count wrong for group #12 (2, counted=0).
Free blocks count wrong for group #13 (2, counted=0).
Free blocks count wrong for group #14 (2, counted=0).
Free blocks count wrong for group #15 (9913, counted=9911).
Free blocks count wrong (800730, counted=800722).
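
The check can be run straight against the backing file from the reproduction steps; a forced, read-only pass is enough to surface the discrepancy:

```shell
# -f forces a full check even if the fs looks clean; -n answers "no" to
# all prompts, so nothing is modified; e2fsck accepts the backing file
# directly, no losetup required
e2fsck -fn disk.bin
```

With the buggy mke2fs this exits non-zero and prints the "Free blocks count wrong" lines quoted above.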

> Worse off, however, if rather than creating a 2 GB file, you use
> this partition as the target root partition for installation using
> the latest [32-bit] Ubuntu installer ... consistently at 57 percent
> of the install ext4 reports data loss.

That's because the file system is getting remounted read-only when
the file system corruption is detected:


> [ 1129.344600] EXT4-fs error (device sda1): ext4_mb_generate_buddy: EXT4-fs:
> group 12: 0 blocks in bitmap, 2 in gd
> [ 1129.380697] EXT4-fs (sda1): Remounting filesystem read-only

The basic idea behind this is that when there is a discrepancy between
the pass #5 summary statistics and the block allocation bitmap, the
problem could be in the block allocation bitmap.  (In this case it is
the summary statistics, but there's no way for the code to know that.)
If the block allocation bitmap is bogus, it's very dangerous to
continue writing into the file system, since we may end up allocating
blocks that are already in use by other files, and this would cause
data loss when those data blocks get overwritten.
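
Whether detected corruption leads to a read-only remount (as here), a kernel panic, or is ignored is controlled by the filesystem's configured error behavior; this is a general ext2/3/4 mechanism, not specific to this bug. It can be inspected with tune2fs (device name taken from the log above):

```shell
# print the on-disk default error behavior: "Continue",
# "Remount read-only", or "Panic"; set at mkfs time or with tune2fs -e,
# and overridable at mount time with -o errors=
tune2fs -l /dev/sda1 | grep -i 'errors behavior'
```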

Once the file system is marked as read-only, data written just before
the file system was remounted read-only can't be pushed out to disk,
which is the reason for the warning message:

> [ 1129.574343] mpage_da_map_blocks block allocation failed for inode 41510 at
> logical offset 0 with max blocks 6 with error -30
> [ 1129.574352] This should not happen.!! Data will be lost

(Error -30 is "EROFS".)
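
The -30/EROFS mapping is easy to confirm from the C library's errno table; one quick check from the shell (assumes python3 is available):

```shell
# on Linux, errno 30 is EROFS: "Read-only file system"
python3 -c 'import errno, os; print(errno.EROFS, os.strerror(errno.EROFS))'
# prints: 30 Read-only file system
```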

We should probably improve the error messages here, but there's not
much else we can do.

The real core issue is the fact that mke2fs isn't doing the right
thing when there are bad blocks and flex_bg is specified.  It's
something we don't test for, since in practice it never happens with
modern disk drives. 
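
Until mke2fs is fixed, one plausible workaround (a sketch, not a tested recommendation from this thread) is to disable flex_bg whenever a bad-blocks list has to be supplied, and to verify the result before trusting it:

```shell
# rebuild the image from the reproduction steps, but with flex_bg disabled;
# -F lets mkfs.ext4 work on a regular file, so losetup isn't needed
dd of=disk.bin bs=512 count=0 seek=$((484*255*63))
for ((i=360491; i<=497992; i++)); do echo $i; done > omit
mkfs.ext4 -F -q -O '^flex_bg' -l omit disk.bin
e2fsck -fn disk.bin
```

An exit status of 0 from e2fsck means no pass-5 discrepancies were found.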

						- Ted
Comment 2 xpenev 2010-03-22 05:34:37 UTC
You guys are awesome :) This lets me get back on my way.