Bug 194567 - ext4 no longer mounts
Summary: ext4 no longer mounts
Status: RESOLVED CODE_FIX
Alias: None
Product: File System
Classification: Unclassified
Component: ext4 (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: fs_ext4@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-02-12 20:55 UTC by bugzilla.kernel.org
Modified: 2018-04-16 18:27 UTC (History)
3 users (show)

See Also:
Kernel Version: 4.4.48
Subsystem:
Regression: No
Bisected commit-id:


Attachments
ext4: fix fencepost in s_first_meta_bg validation (1.20 KB, patch)
2017-02-15 06:29 UTC, Theodore Tso
Details | Diff

Description bugzilla.kernel.org 2017-02-12 20:55:10 UTC
I have a single ext4 filesystem on my workstatioon used to host mysql database files. I don't know how old it is (months, years), but I happen to keep the mkfs command used to create it (the device is 67108864 sectors):

mke2fs -t ext4 -b 4096 -N 65536 -I 128 -L DB -m 0 -O dir_index,extent,filetype,flex_bg,^has_journal,large_file,^resize_inode,sparse_super,uninit_bg

It worked fine till 4.4.47. it no longer mounts in 4.4.48:

[  206.596713] EXT4-fs (dm-24): VFS: Can't find ext4 filesystem
[  214.028205] EXT4-fs (dm-24): first meta block group too large: 2 (group descriptor block count 2)
[  242.185792] EXT4-fs (dm-24): first meta block group too large: 2 (group descriptor block count 2)

e2fsck -f (from "e2fsck 1.43.4 (31-Jan-2017)") finds no errors. Not sure if this is a bug in mkfs, e2fsck, or the kernel, but since it is a regression in the kernel, I reported it here.
Comment 1 Eric Sandeen 2017-02-13 04:00:11 UTC
Can you add the dumpe2fs -h output for the device?

I guess this would be thanks to:

http://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-4.4.y&id=e21a3cad35bc2f4c7fff317e2c7d38eed363a430

ext4: validate s_first_meta_bg at mount time
commit 3a4b77cd47bb837b8557595ec7425f281f2ca1fe upstream.

Ralf Spenneberg reported that he hit a kernel crash when mounting a
modified ext4 image. And it turns out that kernel crashed when
calculating fs overhead (ext4_calculate_overhead()), this is because
the image has very large s_first_meta_bg (debug code shows it's
842150400), and ext4 overruns the memory in count_overhead() when
setting bitmap buffer, which is PAGE_SIZE.

ext4_calculate_overhead():
  buf = get_zeroed_page(GFP_NOFS);  <=== PAGE_SIZE buffer
  blks = count_overhead(sb, i, buf);

count_overhead():
  for (j = ext4_bg_num_gdb(sb, grp); j > 0; j--) { <=== j = 842150400
          ext4_set_bit(EXT4_B2C(sbi, s++), buf);   <=== buffer overrun
          count++;
  }

This can be reproduced easily for me by this script:

  #!/bin/bash
  rm -f fs.img
  mkdir -p /mnt/ext4
  fallocate -l 16M fs.img
  mke2fs -t ext4 -O bigalloc,meta_bg,^resize_inode -F fs.img
  debugfs -w -R "ssv first_meta_bg 842150400" fs.img
  mount -o loop fs.img /mnt/ext4

Fix it by validating s_first_meta_bg first at mount time, and
refusing to mount if its value exceeds the largest possible meta_bg
number.

Reported-by: Ralf Spenneberg <ralf@os-t.de>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Comment 2 bugzilla.kernel.org 2017-02-13 14:59:38 UTC
dumpe2fs 1.43.4 (31-Jan-2017)
Filesystem volume name:   DB
Last mounted on:          /db
Filesystem UUID:          68f492ad-4c47-49ea-9ad9-3948f14b2ce3
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      ext_attr dir_index filetype meta_bg extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash 
Default mount options:    user_xattr acl
Filesystem state:         not clean
Errors behavior:          Remount read-only
Filesystem OS type:       Linux
Inode count:              90112
Block count:              8388608
Reserved block count:     0
Free blocks:              1628674
Free inodes:              90022
First block:              0
Block size:               4096
Fragment size:            4096
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         352
Inode blocks per group:   11
First meta block group:   2
Flex block group size:    16
Filesystem created:       Wed Dec  4 13:14:34 2013
Last mount time:          Mon Feb 13 12:13:01 2017
Last write time:          Mon Feb 13 12:13:01 2017
Mount count:              3
Maximum mount count:      -1
Last checked:             Sun Feb 12 21:36:42 2017
Check interval:           0 (<none>)
Lifetime writes:          2525 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               128
Default directory hash:   half_md4
Directory Hash Seed:      01f2a95d-8a5d-488a-b5ce-d2f1a810edc3
Comment 3 Theodore Tso 2017-02-15 06:29:12 UTC
Created attachment 254761 [details]
ext4: fix fencepost in s_first_meta_bg validation

Oops, thanks for reporting the bug.  This fix should take care of your issue.

I was able to reproduce such a file system using:


export MKE2FS_FIRST_META_BG=2
mke2fs -t ext4 -b 4096 -N 90112 -I 128 -L DB -m 0 -O dir_index,extent,filetype,flex_bg,^has_journal,large_file,^resize_inode,sparse_super,uninit_bg,meta_bg,^metadata_csum,^64bit /tmp/test.img 8388608

I'm not sure how it got into that state, but I'm guess it involved online resizing?
Comment 4 bugzilla.kernel.org 2017-02-15 11:30:50 UTC
First, I'm impressed how quickly this was handled. I am in no pressing need of a fix (4.4.47 works just as well) and would like to avoid compiling a kernel just for this fix. Hope this is ok and the issue is "obvious" enough to fix without it, so from my side, all is well and I can check once this hits a release kernel.

As for resizing, my lvm history doesn't go back that far, and I don't specifically remember having it resized, but since the fs has moved through at least three different disks and contains a slowly growing database, it would be a prime target for resizing - I certainly do have a habit of resizing filesystems in general.

On the other hand, wouldn't -O ^resize_inode preclude online resizing? No need to answer that, you certainly know best. If, however, online resizing wouldn't work I probably would have played ayround with tune2fs andf/or offline resizing to make it work.
Comment 5 Theodore Tso 2017-02-15 23:46:59 UTC
The meta_bg feature allows file system resizing when (a) there is no resize_inode, OR (b) when the file system has more than 2**32 blocks.   We don't enable the meta_bg field and turn off resize_inode by default because for smaller file systems on HDD's, using the meta_bg slows down the mount by a little (since the block group descriptors get spread out across the disk).   So in general the strategy is to use the resize_inode until the file system grows beyond 2**32 blocks, and only then to switch on the meta_bg feature.   So I was a bit surprised to see your smallish file system with meta_bg.   This is supported primarily for debugging / development processes (so we can easily test meta_bg without needing huge test disks), and not something I had necessarily had intended for use in production.

(But thanks for being a guinea pig so we could find this bug!  :-)  :-)  :-)
Comment 6 bugzilla.kernel.org 2018-04-16 18:27:22 UTC
Thanks for this explanation from the master himself, and indeed this is fixed in current stable kernels. Thanks!

Note You need to log in before you can comment on or make changes to this bug.