Although ext4 filesystems can grow beyond 2^32 blocks, block-mapped / bitmapped files cannot contain any physical block number at or above 2^32. The block-based allocator must know to search only groups & blocks < 2^32. Unfortunately this could lead to -ENOSPC for bitmap files on non-full filesystems, but c'est la vie, that's the price we pay for ext3 compatibility I guess. Doing otherwise would lead to lots of complexity that I've seen on xfs* and would hate to revisit in ext4...

*we would need some heuristic to prefer higher blocks for extent-mapped files, to leave room below 2^32 for bitmap files... yuck.
As a simple solution to this, ext4_new_blocks() can artificially limit the number of groups searched for block-mapped files so that they stay below the 2^32 block limit.

In ext4_sb_info, save the 32-bit group limit to avoid constantly recalculating it:

	__u64 max_block = 1ULL << 32;

	/* do_div() leaves the quotient in max_block */
	do_div(max_block, EXT4_BLOCKS_PER_GROUP(sb));
	sbi->s_bitmap_group_limit = max_block;

In ext4_new_blocks():

 	ext4_get_group_no_and_offset(sb, goal, &group_no, &grp_target_blk);
+
+	ngroups = sbi->s_groups_count;
+	smp_rmb();
+	/* limit allocations to 2^32 blocks for block-mapped files */
+	if (!(EXT4_I(inode)->i_flags & EXT4_EXTENT_FL) &&
+	    ngroups > sbi->s_bitmap_group_limit) {
+		ngroups = sbi->s_bitmap_group_limit;
+		/* goal group is out of range, restart the search at group 0 */
+		if (group_no >= ngroups)
+			group_no = 0;
+	}
 	goal_group = group_no;
 retry_alloc:
 	:
 	:
-	ngroups = EXT4_SB(sb)->s_groups_count;
-	smp_rmb();
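To make the arithmetic concrete, here is a minimal user-space sketch of the same idea. It is not kernel code: the helper names (bitmap_group_limit, groups_to_search, clamp_goal_group) are made up for illustration, and it assumes the limit is simply 2^32 divided by the blocks-per-group count, ignoring any first-data-block offset.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of block groups whose blocks all sit
 * below 2^32, assuming group 0 starts at block 0.
 */
static uint64_t bitmap_group_limit(uint32_t blocks_per_group)
{
	assert(blocks_per_group != 0);
	return (1ULL << 32) / blocks_per_group;
}

/*
 * Hypothetical helper: how many groups the allocator may search.
 * Extent-mapped files may use every group; block-mapped files are
 * capped at the precomputed limit.
 */
static uint64_t groups_to_search(uint64_t groups_count,
				 uint64_t group_limit, int uses_extents)
{
	if (!uses_extents && groups_count > group_limit)
		return group_limit;
	return groups_count;
}

/*
 * Hypothetical helper: group numbers are zero-based, so a goal group
 * equal to ngroups is already out of range; fall back to group 0.
 */
static uint64_t clamp_goal_group(uint64_t goal_group, uint64_t ngroups)
{
	return goal_group >= ngroups ? 0 : goal_group;
}
```

With 32768 blocks per group (8 * 4096, i.e. a 4 KiB block size), the limit works out to 2^32 / 2^15 = 131072 groups, so a block-mapped file on a 200000-group filesystem would only ever search the first 131072.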
I wonder if it would make sense to automatically convert to extents format in this situation (i.e. if the block the allocator wants does not fit in 32 bits...)
Aneesh did a patch that did not allow mount with noextent on >2^32 blocks:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=c07651b556323e0e763c452587fe29d2b034b314

    ext4: Don't allow nonextenst mount option for large filesystem

    The block mapped inode format can address only blocks within 2**32. This
    causes a number of issues, the biggest of which is that the block
    allocator needs to be taught that certain inodes can not utilize block
    numbers > 2**32. So until this is fixed, it is simplest to fail
    mounting of file systems with more than 2**32 blocks if the -o noextents
    option is given.

    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
    Signed-off-by: Mingming Cao <cmm@us.ibm.com>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1004,6 +1004,7 @@ static int parse_options (char *options, struct super_block *sb,
 	int qtype, qfmt;
 	char *qname;
 #endif
+	ext4_fsblk_t last_block;
 
 	if (!options)
 		return 1;
@@ -1326,6 +1327,20 @@ set_qf_format:
 			set_opt (sbi->s_mount_opt, EXTENTS);
 			break;
 		case Opt_noextents:
+			/*
+			 * When e2fsprogs support resizing an already existing
+			 * ext3 file system to greater than 2**32 we need to
+			 * add support to block allocator to handle growing
+			 * already existing block mapped inode so that blocks
+			 * allocated for them fall within 2**32
+			 */
+			last_block = ext4_blocks_count(sbi->s_es) - 1;
+			if (last_block > 0xffffffffULL) {
+				printk(KERN_ERR "EXT4-fs: Filesystem too "
+						"large to mount with "
+						"-o noextents options\n");
+				return 0;
+			}
 			clear_opt (sbi->s_mount_opt, EXTENTS);
 			break;
 		case Opt_i_version:
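The boundary in that check is easy to get off by one, so here is a small user-space sketch of just the size test. The function and macro names are hypothetical, and it assumes the caller never passes a block count of zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name: highest block number a block-mapped inode can record. */
#define MAX_32BIT_BLOCK 0xffffffffULL

/*
 * Return 1 if a filesystem with blocks_count blocks is too large to be
 * mounted with -o noextents, 0 otherwise. Mirrors the
 * "last_block > 0xffffffffULL" comparison in the patch.
 */
static int too_large_for_noextents(uint64_t blocks_count)
{
	uint64_t last_block;

	assert(blocks_count >= 1);	/* assumption: never an empty filesystem */
	last_block = blocks_count - 1;
	return last_block > MAX_32BIT_BLOCK;
}
```

A filesystem with exactly 2^32 blocks has last block 0xffffffff and is still accepted; one more block and the mount is refused.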
Hmm, with Aneesh's patch, what about mounting an existing ext3 fs as ext4dev with -o extents, then starting to write to the old ext3 files? What prevents ext4 from trying to allocate blocks >2^32 for these non-extent-format files?
Should be fixed in 2.6.32 by:

commit fb0a387dcdcd21aab1b09ee7fd80b7c979bdbbfd
Author: Eric Sandeen <sandeen@redhat.com>
Date:   Wed Sep 16 14:45:10 2009 -0400

    ext4: limit block allocations for indirect-block files to < 2^32

    Today, the ext4 allocator will happily allocate blocks past
    2^32 for indirect-block files, which results in the block
    numbers getting truncated, and corruption ensues.

    This patch limits such allocations to < 2^32, and adds
    BUG_ONs if we do get blocks larger than that.

    This should address RH Bug 519471, ext4 bitmap allocator
    must limit blocks to < 2^32

    * ext4_find_goal() is modified to choose a goal < UINT_MAX,
      so that our starting point is in an acceptable range.

    * ext4_xattr_block_set() is modified such that the goal block
      is < UINT_MAX, as above.

    * ext4_mb_regular_allocator() is modified so that the group
      search does not continue into groups which are too high

    * ext4_mb_use_preallocated() has a check that we don't use
      preallocated space which is too far out

    * ext4_alloc_blocks() and ext4_xattr_block_set() add some BUG_ONs

    No attempt has been made to limit inode locations to < 2^32,
    so we may wind up with blocks far from their inodes. Doing
    this much already will lead to some odd ENOSPC issues when the
    "lower 32" gets full, and further restricting inodes could
    make that even weirder.

    For high inodes, choosing a goal of the original, % UINT_MAX,
    may be a bit odd, but then we're in an odd situation anyway,
    and I don't know of a better heuristic.

    Signed-off-by: Eric Sandeen <sandeen@redhat.com>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
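For anyone wondering what "truncated" means in practice, the following user-space sketch shows the failure mode and the two kinds of guard the commit message describes (clamping the goal, and asserting on the result). All names are hypothetical. The goal is reduced with a 32-bit mask here, which is an assumption for illustration; the commit message words it as "% UINT_MAX", so the real code may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name: largest physical block an indirect-block file can hold. */
#define MAX_BLOCK_FILE_PHYS 0xFFFFFFFFULL

/* What an unchecked 32-bit indirect-block slot ends up storing. */
static uint32_t store_truncated(uint64_t pblock)
{
	return (uint32_t)pblock;	/* high bits are silently dropped */
}

/* Return 1 if pblock fits in a 32-bit on-disk block pointer. */
static int block_is_addressable(uint64_t pblock)
{
	return pblock <= MAX_BLOCK_FILE_PHYS;
}

/*
 * Pull the allocation goal back into the addressable range for files
 * that do not use extents; extent-mapped files keep their goal.
 */
static uint64_t clamp_goal(uint64_t goal, int uses_extents)
{
	if (!uses_extents)
		goal &= MAX_BLOCK_FILE_PHYS;
	return goal;
}

/* Checked store: plays the role of the BUG_ON in the description above. */
static uint32_t store_checked(uint64_t pblock)
{
	assert(block_is_addressable(pblock));
	return (uint32_t)pblock;
}
```

Block 0x100000005 stored unchecked becomes block 5, which belongs to some other file or to metadata. That is the corruption the patch closes off.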