Bug 194567

Summary: ext4 no longer mounts
Product: File System
Reporter: bugzilla.kernel.org
Component: ext4
Assignee: fs_ext4 (fs_ext4)
Severity: normal
CC: achim.ledermueller, sandeen, tytso
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 4.4.48
Subsystem:
Regression: No
Bisected commit-id:
Attachments: ext4: fix fencepost in s_first_meta_bg validation

Description bugzilla.kernel.org 2017-02-12 20:55:10 UTC
I have a single ext4 filesystem on my workstation used to host MySQL database files. I don't know how old it is (months, years), but I happened to keep the mkfs command used to create it (the device is 67108864 sectors):

mke2fs -t ext4 -b 4096 -N 65536 -I 128 -L DB -m 0 -O dir_index,extent,filetype,flex_bg,^has_journal,large_file,^resize_inode,sparse_super,uninit_bg

It worked fine up to and including 4.4.47. It no longer mounts in 4.4.48:

[  206.596713] EXT4-fs (dm-24): VFS: Can't find ext4 filesystem
[  214.028205] EXT4-fs (dm-24): first meta block group too large: 2 (group descriptor block count 2)
[  242.185792] EXT4-fs (dm-24): first meta block group too large: 2 (group descriptor block count 2)

e2fsck -f (from "e2fsck 1.43.4 (31-Jan-2017)") finds no errors. Not sure if this is a bug in mkfs, e2fsck, or the kernel, but since it is a regression in the kernel, I reported it here.
Comment 1 Eric Sandeen 2017-02-13 04:00:11 UTC
Can you add the dumpe2fs -h output for the device?

I guess this would be thanks to:


ext4: validate s_first_meta_bg at mount time
commit 3a4b77cd47bb837b8557595ec7425f281f2ca1fe upstream.

Ralf Spenneberg reported that he hit a kernel crash when mounting a
modified ext4 image. And it turns out that kernel crashed when
calculating fs overhead (ext4_calculate_overhead()), this is because
the image has very large s_first_meta_bg (debug code shows it's
842150400), and ext4 overruns the memory in count_overhead() when
setting bitmap buffer, which is PAGE_SIZE.

  buf = get_zeroed_page(GFP_NOFS);  <=== PAGE_SIZE buffer
  blks = count_overhead(sb, i, buf);

  for (j = ext4_bg_num_gdb(sb, grp); j > 0; j--) { <=== j = 842150400
          ext4_set_bit(EXT4_B2C(sbi, s++), buf);   <=== buffer overrun

This can be reproduced easily for me by this script:

  rm -f fs.img
  mkdir -p /mnt/ext4
  fallocate -l 16M fs.img
  mke2fs -t ext4 -O bigalloc,meta_bg,^resize_inode -F fs.img
  debugfs -w -R "ssv first_meta_bg 842150400" fs.img
  mount -o loop fs.img /mnt/ext4

Fix it by validating s_first_meta_bg first at mount time, and
refusing to mount if its value exceeds the largest possible meta_bg
number.

Reported-by: Ralf Spenneberg <ralf@os-t.de>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
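
To make the overrun described in the commit message concrete, here is a minimal Python sketch of the bit indices the loop would touch. It is a model, not kernel code: the 4 KiB PAGE_SIZE, the starting block, and the helper names highest_bit_touched() and overruns_page() are assumptions for illustration, and blocks_per_cluster only stands in roughly for what EXT4_B2C() does on a bigalloc filesystem.

```python
PAGE_SIZE = 4096                 # ASSUMPTION: 4 KiB pages
BITS_PER_PAGE = PAGE_SIZE * 8    # a one-page bitmap holds 32768 bits


def highest_bit_touched(first_block: int, num_gdb: int,
                        blocks_per_cluster: int = 1) -> int:
    """Highest bit index set when marking num_gdb consecutive blocks.

    Hypothetical helper: models a loop that sets one bit per block,
    starting at first_block, after block-to-cluster conversion.
    """
    return (first_block + num_gdb - 1) // blocks_per_cluster


def overruns_page(first_block: int, num_gdb: int,
                  blocks_per_cluster: int = 1) -> bool:
    """True if any bit would land outside the one-page buffer."""
    if num_gdb <= 0:
        return False
    return highest_bit_touched(first_block, num_gdb,
                               blocks_per_cluster) >= BITS_PER_PAGE


# A sane descriptor count stays inside the page ...
print(overruns_page(first_block=1, num_gdb=2))
# ... while the value from the modified image runs far past it,
# even with 16 blocks per cluster (an assumed bigalloc ratio).
print(overruns_page(first_block=1, num_gdb=842150400))
print(overruns_page(first_block=1, num_gdb=842150400, blocks_per_cluster=16))
```

The point of the sketch is only that the loop count comes straight from an on-disk field, so the buffer size has to be defended by validating that field before the loop runs.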
Comment 2 bugzilla.kernel.org 2017-02-13 14:59:38 UTC
dumpe2fs 1.43.4 (31-Jan-2017)
Filesystem volume name:   DB
Last mounted on:          /db
Filesystem UUID:          68f492ad-4c47-49ea-9ad9-3948f14b2ce3
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      ext_attr dir_index filetype meta_bg extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash 
Default mount options:    user_xattr acl
Filesystem state:         not clean
Errors behavior:          Remount read-only
Filesystem OS type:       Linux
Inode count:              90112
Block count:              8388608
Reserved block count:     0
Free blocks:              1628674
Free inodes:              90022
First block:              0
Block size:               4096
Fragment size:            4096
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         352
Inode blocks per group:   11
First meta block group:   2
Flex block group size:    16
Filesystem created:       Wed Dec  4 13:14:34 2013
Last mount time:          Mon Feb 13 12:13:01 2017
Last write time:          Mon Feb 13 12:13:01 2017
Mount count:              3
Maximum mount count:      -1
Last checked:             Sun Feb 12 21:36:42 2017
Check interval:           0 (<none>)
Lifetime writes:          2525 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               128
Default directory hash:   half_md4
Directory Hash Seed:      01f2a95d-8a5d-488a-b5ce-d2f1a810edc3
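
The numbers in this dump can be checked against the kernel message with a little arithmetic. The sketch below is only a worked calculation in Python, under two assumptions that the dump does not state directly: group descriptors are 32 bytes (the 64bit feature is not in the feature list) and device sectors are 512 bytes.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer division rounding up."""
    return -(-a // b)


# Values copied from the report and the dumpe2fs output above.
DEVICE_SECTORS = 67108864
BLOCK_COUNT = 8388608
FIRST_BLOCK = 0
BLOCK_SIZE = 4096
BLOCKS_PER_GROUP = 32768
INODE_COUNT = 90112
INODES_PER_GROUP = 352
FIRST_META_BG = 2

DESC_SIZE = 32      # ASSUMPTION: 32-byte descriptors (no 64bit feature)
SECTOR_SIZE = 512   # ASSUMPTION: 512-byte sectors

groups = ceil_div(BLOCK_COUNT - FIRST_BLOCK, BLOCKS_PER_GROUP)
descs_per_block = BLOCK_SIZE // DESC_SIZE
gdb_count = ceil_div(groups, descs_per_block)

print("block groups:", groups)
print("descriptors per block:", descs_per_block)
print("group descriptor blocks:", gdb_count)
print("first meta block group:", FIRST_META_BG)
```

Under those assumptions the filesystem has 256 block groups, 128 descriptors fit in one block, and the group descriptor block count is 2, which matches the "group descriptor block count 2" in the mount error. The first meta block group is also 2, so the two values are equal rather than one exceeding the other.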
Comment 3 Theodore Tso 2017-02-15 06:29:12 UTC
Created attachment 254761
ext4: fix fencepost in s_first_meta_bg validation

Oops, thanks for reporting the bug.  This fix should take care of your issue.

I was able to reproduce such a file system using:

mke2fs -t ext4 -b 4096 -N 90112 -I 128 -L DB -m 0 -O dir_index,extent,filetype,flex_bg,^has_journal,large_file,^resize_inode,sparse_super,uninit_bg,meta_bg,^metadata_csum,^64bit /tmp/test.img 8388608

I'm not sure how it got into that state, but I'm guessing it involved online resizing?
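
The fencepost named in the attachment title can be illustrated with a small Python sketch. The attachment itself is not quoted in this thread, so the two comparisons below are my reading of the problem (an inclusive versus an exclusive bound), and the function names are hypothetical. The reasoning assumed here is that a first meta block group equal to the group descriptor block count is still a valid value, so only larger values should be refused.

```python
def rejects_off_by_one(first_meta_bg: int, gdb_count: int) -> bool:
    """Assumed behaviour of the 4.4.48 check: refuses equality too."""
    return first_meta_bg >= gdb_count


def rejects_inclusive(first_meta_bg: int, gdb_count: int) -> bool:
    """Assumed behaviour after the fix: equality is allowed."""
    return first_meta_bg > gdb_count


# The reporter's filesystem: both values are 2.
print(rejects_off_by_one(2, 2))   # mount refused
print(rejects_inclusive(2, 2))    # mount allowed

# The crafted image from the original commit message is refused either way
# (gdb_count=1 is an assumed value for a 16M image).
print(rejects_off_by_one(842150400, 1))
print(rejects_inclusive(842150400, 1))
```

Either form still blocks the corrupted value that motivated the validation; only the boundary case differs.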
Comment 4 bugzilla.kernel.org 2017-02-15 11:30:50 UTC
First, I'm impressed how quickly this was handled. I am in no pressing need of a fix (4.4.47 works just as well) and would like to avoid compiling a kernel just for this fix. Hope this is ok and the issue is "obvious" enough to fix without it, so from my side, all is well and I can check once this hits a release kernel.

As for resizing, my lvm history doesn't go back that far, and I don't specifically remember having it resized, but since the fs has moved through at least three different disks and contains a slowly growing database, it would be a prime target for resizing - I certainly do have a habit of resizing filesystems in general.

On the other hand, wouldn't -O ^resize_inode preclude online resizing? No need to answer that, you certainly know best. If, however, online resizing didn't work, I probably would have played around with tune2fs and/or offline resizing to make it work.
Comment 5 Theodore Tso 2017-02-15 23:46:59 UTC
The meta_bg feature allows file system resizing when (a) there is no resize_inode, OR (b) when the file system has more than 2**32 blocks. We don't enable the meta_bg feature and turn off resize_inode by default because for smaller file systems on HDDs, using meta_bg slows down the mount by a little (since the block group descriptors get spread out across the disk). So in general the strategy is to use the resize_inode until the file system grows beyond 2**32 blocks, and only then to switch on the meta_bg feature. So I was a bit surprised to see your smallish file system with meta_bg. This is supported primarily for debugging / development purposes (so we can easily test meta_bg without needing huge test disks), and not something I had necessarily intended for use in production.

(But thanks for being a guinea pig so we could find this bug!  :-)  :-)  :-)
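
For a sense of scale, the 2**32-block threshold mentioned above can be compared with the reported filesystem. This is just arithmetic in Python using the block size and block count from the dumpe2fs output in Comment 2; the variable names are mine.

```python
BLOCK_SIZE = 4096            # from dumpe2fs: Block size
REPORTED_BLOCKS = 8388608    # from dumpe2fs: Block count
THRESHOLD_BLOCKS = 2**32     # threshold named in the comment above

TIB = 2**40
GIB = 2**30

threshold_bytes = THRESHOLD_BLOCKS * BLOCK_SIZE
reported_bytes = REPORTED_BLOCKS * BLOCK_SIZE
ratio = THRESHOLD_BLOCKS // REPORTED_BLOCKS

print("threshold size in TiB:", threshold_bytes // TIB)
print("reported size in GiB:", reported_bytes // GIB)
print("threshold / reported:", ratio)
```

With 4096-byte blocks the threshold works out to 16 TiB, while the reported filesystem is 32 GiB, a factor of 512 below it, which is why meta_bg on this filesystem looked unusual.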
Comment 6 bugzilla.kernel.org 2018-04-16 18:27:22 UTC
Thanks for this explanation from the master himself, and indeed this is fixed in current stable kernels. Thanks!