Bug 9502 - ext4 bitmap allocator must limit blocks to < 2^32
Summary: ext4 bitmap allocator must limit blocks to < 2^32
Status: RESOLVED CODE_FIX
Alias: None
Product: File System
Classification: Unclassified
Component: ext4
Hardware: All Linux
Importance: P1 normal
Assignee: Eric Sandeen
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-12-04 15:10 UTC by Eric Sandeen
Modified: 2009-11-13 22:16 UTC

See Also:
Kernel Version: 2.6.24-rc3
Subsystem:
Regression: No
Bisected commit-id:



Description Eric Sandeen 2007-12-04 15:10:16 UTC
Although ext4 filesystems can grow beyond 2^32 blocks, block-mapped / bitmapped files cannot reference any physical block number of 2^32 or higher.  The block-based allocator must therefore search only groups & blocks < 2^32.

Unfortunately this could lead to -ENOSPC for bitmap files for non-full filesystems, but c'est la vie, price we pay for ext3 compatibility I guess.  Doing otherwise would lead to lots of complexity that I've seen on xfs* and hate to revisit in ext4...

*we would need to have some heuristic to prefer higher blocks for extent-mapped files, to leave room below 2^32 for bitmap files... yuck.
Comment 1 Andreas Dilger 2007-12-04 15:32:24 UTC
As a simple solution to this, ext4_new_blocks() can artificially limit the number of groups for block-mapped files to be below the 2^32 block limit:

In ext4_sb_info save the 32-bit block limit to avoid constantly recalculating:

         __u64 max_block = 1ULL << 32;
         do_div(max_block, EXT4_BLOCKS_PER_GROUP(sb));
         sbi->s_bitmap_group_limit = max_block;

In ext4_new_blocks()

         ext4_get_group_no_and_offset(sb, goal, &group_no, &grp_target_blk);
+
+        ngroups = sbi->s_groups_count;
+        smp_rmb();
+        /* limit allocations to 2^32 blocks for block-mapped files */
+        if (!(EXT4_I(inode)->i_flags & EXT4_EXTENT_FL) &&
+            ngroups > sbi->s_bitmap_group_limit) {
+               ngroups = sbi->s_bitmap_group_limit;
+               /* goal_group is not assigned yet at this point; test group_no */
+               if (group_no >= ngroups)
+                       group_no = 0;
+        }
         goal_group = group_no;
retry_alloc:
:
:
-        ngroups = EXT4_SB(sb)->s_groups_count;
-        smp_rmb();
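
The arithmetic behind the suggestion above can be sketched in plain userspace C. This is only an illustration under stated assumptions: bitmap_group_limit(), searchable_groups() and clamp_goal_group() are hypothetical helper names, not kernel functions, and ordinary 64-bit division stands in for do_div().

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): number of block groups
 * that lie entirely below block 2^32 for a given blocks-per-group value.
 */
static uint64_t bitmap_group_limit(uint32_t blocks_per_group)
{
	return (UINT64_C(1) << 32) / blocks_per_group;
}

/*
 * Hypothetical helper: how many groups the allocator may search.
 * Extent-mapped files may use every group; block-mapped files are
 * clamped to the groups below the 2^32 block boundary.
 */
static uint64_t searchable_groups(uint64_t groups_count,
				  uint64_t group_limit, int uses_extents)
{
	if (!uses_extents && groups_count > group_limit)
		return group_limit;
	return groups_count;
}

/*
 * Hypothetical helper: restart the search at group 0 when the goal
 * group falls outside the searchable range (valid groups are
 * 0 .. ngroups - 1).
 */
static uint64_t clamp_goal_group(uint64_t goal_group, uint64_t ngroups)
{
	return goal_group >= ngroups ? 0 : goal_group;
}
```

With 32768 blocks per group (the usual value for 4 KiB blocks), bitmap_group_limit() yields 131072, so a block-mapped file on a 200000-group filesystem would only search groups 0 through 131071.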
Comment 2 Eric Sandeen 2008-01-15 07:56:28 UTC
I wonder if it would make sense to automatically convert to extents format in this situation (i.e. if the block the allocator wants is > 32 bits...)
Comment 3 Mingming Cao 2008-08-22 12:50:24 UTC
Aneesh wrote a patch that disallows mounting with -o noextents on filesystems with more than 2^32 blocks:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=c07651b556323e0e763c452587fe29d2b034b314

ext4: Don't allow nonextenst mount option for large filesystem

The block mapped inode format can address only blocks within 2**32. This
causes a number of issues, the biggest of which is that the block
allocator needs to be taught that certain inodes can not utilize block
numbers > 2**32.  So until this is fixed, it is simplest to fail
mounting of file systems with more than 2**32 blocks if the -o noextents
option is given.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1004,6 +1004,7 @@ static int parse_options (char *options, struct super_block *sb,
        int qtype, qfmt;
        char *qname;
 #endif
+       ext4_fsblk_t last_block;
 
        if (!options)
                return 1;
@@ -1326,6 +1327,20 @@ set_qf_format:
                        set_opt (sbi->s_mount_opt, EXTENTS);
                        break;
                case Opt_noextents:
+                       /*
+                        * When e2fsprogs support resizing an already existing
+                        * ext3 file system to greater than 2**32 we need to
+                        * add support to block allocator to handle growing
+                        * already existing block  mapped inode so that blocks
+                        * allocated for them fall within 2**32
+                        */
+                       last_block = ext4_blocks_count(sbi->s_es) - 1;
+                       if (last_block  > 0xffffffffULL) {
+                               printk(KERN_ERR "EXT4-fs: Filesystem too "
+                                               "large to mount with "
+                                               "-o noextents options\n");
+                               return 0;
+                       }
                        clear_opt (sbi->s_mount_opt, EXTENTS);
                        break;
                case Opt_i_version:
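
The size test in the patch above can be tried outside the kernel with a small sketch. noextents_mount_allowed() is a hypothetical name used only for illustration, and the sketch assumes blocks_count is at least 1, as it would be for any real filesystem.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the check in the patch: mounting with
 * -o noextents is only acceptable when the last block number still
 * fits in 32 bits. Assumes blocks_count >= 1.
 */
static bool noextents_mount_allowed(uint64_t blocks_count)
{
	uint64_t last_block = blocks_count - 1;

	return last_block <= 0xffffffffULL;
}
```

A filesystem of exactly 2^32 blocks passes, because its last block is 0xffffffff; one block more and the mount would be refused.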
Comment 4 Mingming Cao 2008-08-22 13:24:36 UTC
Hmm, with Aneesh's patch, what about mounting an existing ext3 fs as ext4dev with -o extents and then writing to the old ext3 files? What prevents ext4 from trying to allocate blocks >2^32 for these non-extent-format files?
Comment 5 Eric Sandeen 2009-11-13 22:16:21 UTC
Should be fixed in 2.6.32 by:

commit fb0a387dcdcd21aab1b09ee7fd80b7c979bdbbfd
Author: Eric Sandeen <sandeen@redhat.com>
Date:   Wed Sep 16 14:45:10 2009 -0400

    ext4: limit block allocations for indirect-block files to < 2^32
    
    Today, the ext4 allocator will happily allocate blocks past
    2^32 for indirect-block files, which results in the block
    numbers getting truncated, and corruption ensues.
    
    This patch limits such allocations to < 2^32, and adds
    BUG_ONs if we do get blocks larger than that.
    
    This should address RH Bug 519471, ext4 bitmap allocator
    must limit blocks to < 2^32
    
    * ext4_find_goal() is modified to choose a goal < UINT_MAX,
      so that our starting point is in an acceptable range.
    
    * ext4_xattr_block_set() is modified such that the goal block
      is < UINT_MAX, as above.
    
    * ext4_mb_regular_allocator() is modified so that the group
      search does not continue into groups which are too high
    
    * ext4_mb_use_preallocated() has a check that we don't use
      preallocated space which is too far out
    
    * ext4_alloc_blocks() and ext4_xattr_block_set() add some BUG_ONs
    
    No attempt has been made to limit inode locations to < 2^32,
    so we may wind up with blocks far from their inodes.  Doing
    this much already will lead to some odd ENOSPC issues when the
    "lower 32" gets full, and further restricting inodes could
    make that even weirder.
    
    For high inodes, choosing a goal of the original, % UINT_MAX,
    may be a bit odd, but then we're in an odd situation anyway,
    and I don't know of a better heuristic.
    
    Signed-off-by: Eric Sandeen <sandeen@redhat.com>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
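
The truncation described in the commit message, and the goal heuristic it mentions, can be illustrated with a userspace sketch. All three function names are hypothetical, and the modulo simply follows the commit message's wording ("% UINT_MAX"); it is not claimed to match the code that was actually merged. UINT32_MAX is used so the result does not depend on the platform's width of unsigned int.

```c
#include <stdint.h>

/*
 * Hypothetical helper: an indirect-block slot holds a 32-bit block
 * number, so storing a larger physical block silently drops the
 * high bits.
 */
static uint32_t store_in_indirect_slot(uint64_t pblk)
{
	return (uint32_t)pblk;
}

/* Hypothetical helper: can this block be referenced by a block-mapped file? */
static int fits_indirect(uint64_t pblk)
{
	return pblk <= UINT32_MAX;
}

/*
 * Hypothetical helper: pull an allocation goal back into the
 * addressable range, following the commit message's "% UINT_MAX".
 */
static uint64_t limit_goal(uint64_t goal)
{
	return goal % UINT32_MAX;
}
```

For example, block 2^32 + 5 would be stored as block 5, which is how a block-mapped file ends up pointing at someone else's data; limiting the goal first keeps the starting point of the search below the boundary.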
