I have a single ext4 filesystem on my workstation used to host MySQL database files. I don't know how old it is (months, years), but I happened to keep the mkfs command used to create it (the device is 67108864 sectors):

  mke2fs -t ext4 -b 4096 -N 65536 -I 128 -L DB -m 0 -O dir_index,extent,filetype,flex_bg,^has_journal,large_file,^resize_inode,sparse_super,uninit_bg

It worked fine up to and including 4.4.47; it no longer mounts in 4.4.48:

  [  206.596713] EXT4-fs (dm-24): VFS: Can't find ext4 filesystem
  [  214.028205] EXT4-fs (dm-24): first meta block group too large: 2 (group descriptor block count 2)
  [  242.185792] EXT4-fs (dm-24): first meta block group too large: 2 (group descriptor block count 2)

e2fsck -f (from "e2fsck 1.43.4 (31-Jan-2017)") finds no errors. I am not sure whether this is a bug in mkfs, e2fsck, or the kernel, but since it is a regression in the kernel, I am reporting it here.
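For reference, the field the mount error complains about can be read straight from the superblock without mounting. A minimal sketch, assuming the device is the /dev/dm-24 seen in the dmesg lines above (adjust the path to your actual LV/device):

  # Print only the feature list and the "First meta block group" field,
  # which is what the 4.4.48 mount check is rejecting.
  dumpe2fs -h /dev/dm-24 2>/dev/null | grep -E 'Filesystem features|First meta block group'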
Can you add the dumpe2fs -h output for the device?

I guess this would be thanks to:
http://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-4.4.y&id=e21a3cad35bc2f4c7fff317e2c7d38eed363a430

  ext4: validate s_first_meta_bg at mount time

  commit 3a4b77cd47bb837b8557595ec7425f281f2ca1fe upstream.

  Ralf Spenneberg reported that he hit a kernel crash when mounting a
  modified ext4 image. And it turns out that kernel crashed when
  calculating fs overhead (ext4_calculate_overhead()), this is because
  the image has very large s_first_meta_bg (debug code shows it's
  842150400), and ext4 overruns the memory in count_overhead() when
  setting bitmap buffer, which is PAGE_SIZE.

  ext4_calculate_overhead():
    buf = get_zeroed_page(GFP_NOFS);  <=== PAGE_SIZE buffer
    blks = count_overhead(sb, i, buf);

  count_overhead():
    for (j = ext4_bg_num_gdb(sb, grp); j > 0; j--) { <=== j = 842150400
            ext4_set_bit(EXT4_B2C(sbi, s++), buf);   <=== buffer overrun
            count++;
    }

  This can be reproduced easily for me by this script:

    #!/bin/bash
    rm -f fs.img
    mkdir -p /mnt/ext4
    fallocate -l 16M fs.img
    mke2fs -t ext4 -O bigalloc,meta_bg,^resize_inode -F fs.img
    debugfs -w -R "ssv first_meta_bg 842150400" fs.img
    mount -o loop fs.img /mnt/ext4

  Fix it by validating s_first_meta_bg first at mount time, and refusing
  to mount if its value exceeds the largest possible meta_bg number.

  Reported-by: Ralf Spenneberg <ralf@os-t.de>
  Signed-off-by: Eryu Guan <guaneryu@gmail.com>
  Signed-off-by: Theodore Ts'o <tytso@mit.edu>
  Reviewed-by: Andreas Dilger <adilger@dilger.ca>
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
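For completeness, one way to check whether a given stable tree already carries this backport is to search its history for the commit subject. A minimal sketch, assuming a local clone of linux-stable with the v4.4.48 tag available:

  # Look for the backported s_first_meta_bg validation patch in 4.4.48.
  # (Upstream commit 3a4b77cd47bb, 4.4.y backport e21a3cad35bc.)
  git log --oneline v4.4.48 --grep='validate s_first_meta_bg at mount time' -- fs/ext4/super.c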
dumpe2fs 1.43.4 (31-Jan-2017)
Filesystem volume name:   DB
Last mounted on:          /db
Filesystem UUID:          68f492ad-4c47-49ea-9ad9-3948f14b2ce3
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      ext_attr dir_index filetype meta_bg extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         not clean
Errors behavior:          Remount read-only
Filesystem OS type:       Linux
Inode count:              90112
Block count:              8388608
Reserved block count:     0
Free blocks:              1628674
Free inodes:              90022
First block:              0
Block size:               4096
Fragment size:            4096
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         352
Inode blocks per group:   11
First meta block group:   2
Flex block group size:    16
Filesystem created:       Wed Dec  4 13:14:34 2013
Last mount time:          Mon Feb 13 12:13:01 2017
Last write time:          Mon Feb 13 12:13:01 2017
Mount count:              3
Maximum mount count:      -1
Last checked:             Sun Feb 12 21:36:42 2017
Check interval:           0 (<none>)
Lifetime writes:          2525 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               128
Default directory hash:   half_md4
Directory Hash Seed:      01f2a95d-8a5d-488a-b5ce-d2f1a810edc3
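The numbers above also show why this filesystem sits exactly on the boundary the new check trips over. A small sketch of the arithmetic, assuming 32-byte group descriptors (the 64bit feature is not set here):

  # 8388608 blocks / 32768 blocks per group = 256 block groups
  echo $(( 8388608 / 32768 ))     # 256
  # 256 descriptors * 32 bytes = 8192 bytes = exactly 2 blocks of 4096 bytes
  echo $(( 256 * 32 / 4096 ))     # 2
  # So "First meta block group: 2" equals the group descriptor block count,
  # which the strict comparison added in 4.4.48 rejected (the fencepost).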
Created attachment 254761
ext4: fix fencepost in s_first_meta_bg validation

Oops, thanks for reporting the bug. This fix should take care of your issue.

I was able to reproduce such a file system using:

  export MKE2FS_FIRST_META_BG=2
  mke2fs -t ext4 -b 4096 -N 90112 -I 128 -L DB -m 0 -O dir_index,extent,filetype,flex_bg,^has_journal,large_file,^resize_inode,sparse_super,uninit_bg,meta_bg,^metadata_csum,^64bit /tmp/test.img 8388608

I'm not sure how it got into that state, but I'm guessing it involved online resizing?
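If anyone wants to confirm the reproducer locally, a minimal sketch, assuming an affected 4.4.48 kernel, the /tmp/test.img created by the command above, and /mnt/test as a placeholder mount point:

  # Loop-mount the reproduced image; on 4.4.48 (without the fencepost fix)
  # the mount should fail and dmesg should show the
  # "first meta block group too large" message from the original report.
  mkdir -p /mnt/test
  mount -o loop /tmp/test.img /mnt/test || dmesg | tail -n 5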
First, I'm impressed by how quickly this was handled. I am in no pressing need of a fix (4.4.47 works just as well) and would like to avoid compiling a kernel just for this, so I hope the issue is "obvious" enough to fix without me testing the patch. From my side all is well, and I can check again once this hits a release kernel.

As for resizing, my LVM history doesn't go back that far, and I don't specifically remember having resized it, but since the filesystem has moved through at least three different disks and contains a slowly growing database, it would be a prime target for resizing - I certainly do have a habit of resizing filesystems in general. On the other hand, wouldn't -O ^resize_inode preclude online resizing? No need to answer that, you certainly know best. If online resizing wouldn't have worked, I probably would have played around with tune2fs and/or offline resizing to make it work.
The meta_bg feature allows file system resizing when (a) there is no resize_inode, OR (b) the file system has more than 2**32 blocks. We don't enable meta_bg and turn off resize_inode by default because for smaller file systems on HDDs, using meta_bg slows down the mount a little (since the block group descriptors get spread out across the disk). So in general the strategy is to use the resize_inode until the file system grows beyond 2**32 blocks, and only then to switch on the meta_bg feature.

So I was a bit surprised to see your smallish file system with meta_bg. This is supported primarily for debugging / development purposes (so we can easily test meta_bg without needing huge test disks), and not something I had necessarily intended for use in production. (But thanks for being a guinea pig so we could find this bug! :-) :-) :-)
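To see that default strategy in practice, a quick sketch with two small scratch images (the /tmp paths are just placeholders):

  # A small file system created with defaults gets resize_inode but not
  # meta_bg; the meta_bg layout has to be requested explicitly, as the
  # filesystem in this report did.
  fallocate -l 64M /tmp/default.img
  mke2fs -F -q -t ext4 /tmp/default.img
  dumpe2fs -h /tmp/default.img 2>/dev/null | grep 'Filesystem features'

  fallocate -l 64M /tmp/metabg.img
  mke2fs -F -q -t ext4 -O meta_bg,^resize_inode /tmp/metabg.img
  dumpe2fs -h /tmp/metabg.img 2>/dev/null | grep 'Filesystem features'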
Thanks for this explanation from the master himself, and indeed this is fixed in current stable kernels. Thanks!