Can mout volume over 16TB ( 34TB ,40TB ) with controler accraid / hardware rais btrfs runs , etx4 not 14 TB raid can mount
Walter, please provide way more information here: describe the setup, provide the exact error messages, did it work before, what changed that it doesn't work anymore, etc. Maybe report the bug to the ext4 mailing list before logging a tracking bug here. See also: [How to report Linux kernel bugs](https://www.kernel.org/doc/html/latest/admin-guide/reporting-bugs.html#how-to-report-linux-kernel-bugs)
hi , bug on two servers , using kernel 5.6x no problems with kernel 5.7 only volumes under 16TB can be mount. testet also clear volume new partion , an make fs ext4 . new created vloume dont mount . use btrfs not problem. return to kernel 5.6 , there are no problems. so me can say there are no hardware problems or lowlevel drivers--- btrfs works --- only ext4 not
On Mon, 20 Apr 2020, bugzilla-daemon@bugzilla.kernel.org wrote: > with kernel 5.7 only volumes under 16TB can be mount. While this bug report is still missing details, I was able to reproduce this issue. Contrary to the subject line, it is not hardware related at all. Linux 5.5 (Debian), creating a 17 TB sparse device (4 GB backing device): $ echo "0 36507222016 zero" | dmsetup create zero0 $ echo "0 36507222016 snapshot /dev/mapper/zero0 /dev/vdb p 128" | \ dmsetup create sparse0 $ mkfs.ext4 -F /dev/mapper/sparse0 Creating filesystem with 4563402752 4k blocks and 285212672 inodes Creating journal (262144 blocks): done $ mount -t ext4 /dev/mapper/sparse0 /mnt/disk/ $ df -h /mnt/disk/ Filesystem Size Used Avail Use% Mounted on /dev/mapper/sparse0 17T 24K 17T 1% /mnt/disk The same fails on 5.7-rc2 (vanilla) with: ------------[ cut here ]------------ would truncate bmap result WARNING: CPU: 0 PID: 640 at fs/iomap/fiemap.c:121 iomap_bmap_actor+0x3a/0x40 Modules linked in: dm_zero 9p xhci_pci xhci_hcd virtio_balloon 9pnet_virtio loop fuse sunrpc CPU: 0 PID: 640 Comm: mount Not tainted 5.7.0-rc2 #3 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014 RIP: 0010:iomap_bmap_actor+0x3a/0x40 Code: 70 08 0f b6 8f 86 00 00 00 49 03 30 48 d3 ee 48 81 fe ff ff ff 7f 77 06 48 89 30 31 c0 c3 48 c7 c7 fa 71 56 a9 e8 04 f7 ea ff <0f> 0b 31 c0 c3 cc 41 54 55 53 48 83 ec 08 48 8b 47 48 48 89 34 24 RSP: 0018:ffffb8fd8090bb80 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffffffffa9424bb0 RCX: 0000000000000000 RDX: ffff996abbc1ed40 RSI: ffff996abbc180c8 RDI: ffff996abbc180c8 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000277 R10: 0000000000000774 R11: ffffb8fd8090ba35 R12: 0000000000000000 R13: ffff996aafc6e9b8 R14: ffffb8fd8090bc70 R15: 0000000000001000 FS: 00007efc34fafc80(0000) GS:ffff996abbc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007efc352f29a0 CR3: 000000016e074003 CR4: 0000000000360eb0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: iomap_apply+0xf4/0x1a0 ? iomap_fiemap_actor+0x90/0x90 iomap_bmap+0x70/0x90 ? iomap_fiemap_actor+0x90/0x90 bmap+0x1d/0x30 jbd2_journal_init_inode+0x2b/0xe0 ext4_fill_super+0x29c4/0x3300 ? mount_bdev+0x171/0x1a0 ? ext4_calculate_overhead+0x480/0x480 mount_bdev+0x171/0x1a0 ? ext4_calculate_overhead+0x480/0x480 legacy_get_tree+0x22/0x40 vfs_get_tree+0x1b/0x80 ? ns_capable_common+0x29/0x50 do_mount+0x713/0x9f0 ? memdup_user+0x49/0x90 __x64_sys_mount+0x89/0xc0 do_syscall_64+0x60/0x3b0 ? do_page_fault+0x243/0x4bb entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7efc351c7e1a Code: 48 8b 0d 79 e0 0b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 46 e0 0b 00 f7 d8 64 89 01 48 RSP: 002b:00007ffe6e5bce18 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5 RAX: ffffffffffffffda RBX: 000055d888f5cb00 RCX: 00007efc351c7e1a RDX: 000055d888f5cd10 RSI: 000055d888f5ea30 RDI: 000055d888f5cd30 RBP: 00007efc35318204 R08: 0000000000000000 R09: 000055d888f5ee00 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 000055d888f5cd30 R15: 000055d888f5cd10 ---[ end trace 513dea1cc94aa289 ]--- jbd2_journal_init_inode: Cannot locate journal superblock EXT4-fs (dm-1): Could not load journal inode HTH, Christian.
Yikes, jbd2 really ought to be ported to a newer internal API than bmap...
Hello All, On 4/21/20 5:21 AM, bugzilla-daemon@bugzilla.kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=207367 > > --- Comment #3 from Christian Kujau (lists@nerdbynature.de) --- > On Mon, 20 Apr 2020, bugzilla-daemon@bugzilla.kernel.org wrote: >> with kernel 5.7 only volumes under 16TB can be mount. > > While this bug report is still missing details, I was able to reproduce > this issue. Contrary to the subject line, it is not hardware related at > all. > > Linux 5.5 (Debian), creating a 17 TB sparse device (4 GB backing device): > > $ echo "0 36507222016 zero" | dmsetup create zero0 > $ echo "0 36507222016 snapshot /dev/mapper/zero0 /dev/vdb p 128" | \ > dmsetup create sparse0 > > $ mkfs.ext4 -F /dev/mapper/sparse0 > Creating filesystem with 4563402752 4k blocks and 285212672 inodes > Creating journal (262144 blocks): done > > $ mount -t ext4 /dev/mapper/sparse0 /mnt/disk/ > $ df -h /mnt/disk/ > Filesystem Size Used Avail Use% Mounted on > /dev/mapper/sparse0 17T 24K 17T 1% /mnt/disk > > > The same fails on 5.7-rc2 (vanilla) with: > > > ------------[ cut here ]------------ > would truncate bmap result > WARNING: CPU: 0 PID: 640 at fs/iomap/fiemap.c:121 > iomap_bmap_actor+0x3a/0x40 Sorry about not seeing this through in the first place. So the problem really is that the iomap_bmap() API gives WARNING and does't return the physical block address in case if the addr is > INT_MAX. (I guess this could be mostly since the ioctl_fibmap() passes a user integer pointer and users of iomap_bmap() may mostly be coming from ioctl path till now). FYI - I do see that bmap() is also used by below APIs/subsystem. Not sure if any of subsystems mentioned below may still fail later if the underlying FS moved to iomap_bmap() interface or for any existing callers of iomap_bmap() :- 1. mm/page-io.c (generic_swapfile_activate() func) 2. fs/cachefiles/rdwr.c 3. fs/ecryptfs/mmap.c 4. fs/jbd2/journal.c But the changes done in ext4 to move to iomap_bmap() interface resulted in this issue since jbd2 tries to find the block mapping of on disk journal inode of ext4 and on a larger filesystem this may fail given the design of iomap_bmap() to not return addr if > INT_MAX. So as I see it there are 3 options from here. Wanted to put this on mailing list for discussion. 1. Make changes in iomap_bmap() to return the block address mapping. But I still would like to mention that iomap designers may not agree with this here Since the direction in general is to get rid of bmap() interface anyways. 2. Revert the patch series of "bmap & fiemap to move to iomap interface" (why fiemap too? - since if we decide to revert bmap anyways, then we better fix the performance numbers report too coming from fiemap. Also due to 3rd option below since if iomap_bmap() is not changed, then we better keep both of this interface as is until we get the solution like 3 below.) 3. To move to a new internal API like fiemap. But we need to change fiemap in a way that it should also be allowed to used by internal kernel APIs. Since as of now fiemap_extent struct is assumed to be a user pointer. (But the 3rd option as I see, won't be possible given the time frame to fix this issue. Also note if we decide to revert the changes, then the long term path would be to work on making fiemap used by internal kernel APIs too). struct fiemap_extent_info { unsigned int fi_flags; /* Flags as passed from user */ unsigned int fi_extents_mapped; /* Number of mapped extents */ unsigned int fi_extents_max; /* Size of fiemap_extent array */ struct fiemap_extent __user *fi_extents_start; /* Start of fiemap_extent array */ }; -ritesh
On Tue, Apr 21, 2020 at 09:50:38AM +0530, Ritesh Harjani wrote: > Hello All, > > On 4/21/20 5:21 AM, bugzilla-daemon@bugzilla.kernel.org wrote: > > https://bugzilla.kernel.org/show_bug.cgi?id=207367 > > > > --- Comment #3 from Christian Kujau (lists@nerdbynature.de) --- > > On Mon, 20 Apr 2020, bugzilla-daemon@bugzilla.kernel.org wrote: > > > with kernel 5.7 only volumes under 16TB can be mount. > > > > While this bug report is still missing details, I was able to reproduce > > this issue. Contrary to the subject line, it is not hardware related at > > all. > > > > Linux 5.5 (Debian), creating a 17 TB sparse device (4 GB backing device): > > > > $ echo "0 36507222016 zero" | dmsetup create zero0 > > $ echo "0 36507222016 snapshot /dev/mapper/zero0 /dev/vdb p 128" | \ > > dmsetup create sparse0 > > > > $ mkfs.ext4 -F /dev/mapper/sparse0 > > Creating filesystem with 4563402752 4k blocks and 285212672 inodes > > Creating journal (262144 blocks): done > > > > $ mount -t ext4 /dev/mapper/sparse0 /mnt/disk/ > > $ df -h /mnt/disk/ > > Filesystem Size Used Avail Use% Mounted on > > /dev/mapper/sparse0 17T 24K 17T 1% /mnt/disk > > > > > > The same fails on 5.7-rc2 (vanilla) with: > > > > > > ------------[ cut here ]------------ > > would truncate bmap result > > WARNING: CPU: 0 PID: 640 at fs/iomap/fiemap.c:121 > > iomap_bmap_actor+0x3a/0x40 > > Sorry about not seeing this through in the first place. > > So the problem really is that the iomap_bmap() API > gives WARNING and does't return the physical block address in case > if the addr is > INT_MAX. (I guess this could be mostly since > the ioctl_fibmap() passes a user integer pointer and users of > iomap_bmap() may mostly be coming from ioctl path till now). No, it's because bmap is fundamentally broken when it comes to block ranges > INT_MAX. The filesystem in question is a >16TB filesystem, so the block range for that filesystem is >32bits, and hence usage of bmap in the jbd2 code is broken. Basically, jbd2 needs fixing to be able to map blocks that are at higher offsets than bmap can actually report. > FYI - I do see that bmap() is also used by below APIs/subsystem. > Not sure if any of subsystems mentioned below may still fail later > if the underlying FS moved to iomap_bmap() interface or for > any existing callers of iomap_bmap() :- > > 1. mm/page-io.c (generic_swapfile_activate() func) Filesystems using iomap infrastructure should be providing aops->swap_activate() to map swapfile extents via iomap_swapfile_activate() (e.g. see xfs_iomap_swapfile_activate()), not using generic_swapfile_activate(). > 2. fs/cachefiles/rdwr.c Known problem, work being done to stop using bmap() here > 3. fs/ecryptfs/mmap.c Just a wrapper to pass ->bmap calls through to the lower layer. > 4. fs/jbd2/journal.c Broken on filesystems where the journal file might be placed beyond a 32 bit block number, iomap_bmap() just makes that obvious. Needs fixing. You also missed f2fs copy-n-waste using it for internal swapfile mapping: /* Copied from generic_swapfile_activate() to check any holes */ That needs fixing, too. And you missed the MD bitmap code uses bmap() to map it's bitmap storage file, which means that is broken is the bitmap file is on a filesystem/block device > 16TB, too... > But the changes done in ext4 to move to iomap_bmap() interface > resulted in this issue since jbd2 tries to find the block mapping > of on disk journal inode of ext4 and on a larger filesystem > this may fail given the design of iomap_bmap() to not > return addr if > INT_MAX. > > So as I see it there are 3 options from here. Wanted to put this > on mailing list for discussion. > > 1. Make changes in iomap_bmap() to return the block address mapping. > But I still would like to mention that iomap designers may not agree > with this here Since the direction in general is to get rid of bmap() > interface anyways. Nope. bmap() is broken. Get rid of it. > 2. Revert the patch series of "bmap & fiemap to move to iomap interface" > (why fiemap too? - since if we decide to revert bmap anyways, > then we better fix the performance numbers report too coming from > fiemap. Also due to 3rd option below since if iomap_bmap() is > not changed, then we better keep both of this interface as is until > we get the solution like 3 below.) The use of bmap was broken prior to this conversion - shooting the messenger doesn't fix the problem. Get rid of bmap(). > 3. To move to a new internal API like fiemap. But we need to change > fiemap in a way that it should also be allowed to used by internal > kernel APIs. Since as of now fiemap_extent struct is assumed to be > a user pointer. Fiemap cannot be used this way. It's a diagnostic interface that provides no guarantee of coherency or atomicity, so you can't use it in this way in userspace or the kernel. IMO, the correct thing to do is for the caller to supply jbd with a block mapping callback. i.e. jbd2_journal_init_inode() gets called from both ext4 and ocfs2 with a callback that does the block mapping for that specific filesystem. Indeed, jbd2 will need to cache that callback, because it needs to call it to map journal blocks when committing transactions.... Cheers, Dave.
Hello, On 4/21/20 10:38 AM, Dave Chinner wrote: > On Tue, Apr 21, 2020 at 09:50:38AM +0530, Ritesh Harjani wrote: >> Hello All, >> >> On 4/21/20 5:21 AM, bugzilla-daemon@bugzilla.kernel.org wrote: >>> https://bugzilla.kernel.org/show_bug.cgi?id=207367 >>> >>> --- Comment #3 from Christian Kujau (lists@nerdbynature.de) --- >>> On Mon, 20 Apr 2020, bugzilla-daemon@bugzilla.kernel.org wrote: >>>> with kernel 5.7 only volumes under 16TB can be mount. >>> >>> While this bug report is still missing details, I was able to reproduce >>> this issue. Contrary to the subject line, it is not hardware related at >>> all. >>> >>> Linux 5.5 (Debian), creating a 17 TB sparse device (4 GB backing device): >>> >>> $ echo "0 36507222016 zero" | dmsetup create zero0 >>> $ echo "0 36507222016 snapshot /dev/mapper/zero0 /dev/vdb p 128" | \ >>> dmsetup create sparse0 >>> >>> $ mkfs.ext4 -F /dev/mapper/sparse0 >>> Creating filesystem with 4563402752 4k blocks and 285212672 inodes >>> Creating journal (262144 blocks): done >>> >>> $ mount -t ext4 /dev/mapper/sparse0 /mnt/disk/ >>> $ df -h /mnt/disk/ >>> Filesystem Size Used Avail Use% Mounted on >>> /dev/mapper/sparse0 17T 24K 17T 1% /mnt/disk >>> >>> >>> The same fails on 5.7-rc2 (vanilla) with: >>> >>> >>> ------------[ cut here ]------------ >>> would truncate bmap result >>> WARNING: CPU: 0 PID: 640 at fs/iomap/fiemap.c:121 >>> iomap_bmap_actor+0x3a/0x40 >> >> Sorry about not seeing this through in the first place. >> >> So the problem really is that the iomap_bmap() API >> gives WARNING and does't return the physical block address in case >> if the addr is > INT_MAX. (I guess this could be mostly since >> the ioctl_fibmap() passes a user integer pointer and users of >> iomap_bmap() may mostly be coming from ioctl path till now). > > No, it's because bmap is fundamentally broken when it comes to block > ranges > INT_MAX. The filesystem in question is a >16TB filesystem, > so the block range for that filesystem is >32bits, and hence usage > of bmap in the jbd2 code is broken. IIUC, what you meant is that bmap in general as a interface is broken because if some user tries to call this interface via an ioctl for a file which is placed beyond 32 bits on a filesystem, then it truncates the results (basically gives us wrong results). Also this means as a user interface this was not designed properly keeping the block addresses range in mind (that someday it will extent beyong 32 bits). Due to above reason, we don't make iomap_bmap() API return the u64 addr in it's argument, even though it is capable of doing that. Is that the reason? And this is because since the overall interface from userspace side is broken and so we want to get rid of that. So what I would still like to understand is- that bmap() internal kernel function can definitely handle the sector_t block addresses, right? Isn't the bmap() kernel function was designed for this purpose in mind, to help internal kernel callers to provide with u64 block addresses? But for e.g. here, when jbd2 calls for bmap(), it will eventually call for ext4_bmap() -> iomap_bmap(). Now even though this chain is capable of handling u64 addresses, but it's the iomap_bmap() which doesn't provide this. So the question simply is why? Sorry about my query here, it is to only to understand more about this and to not force in anyway to change iomap_bmap() to just get this working. > > Basically, jbd2 needs fixing to be able to map blocks that are at > higher offsets than bmap can actually report. > >> FYI - I do see that bmap() is also used by below APIs/subsystem. >> Not sure if any of subsystems mentioned below may still fail later >> if the underlying FS moved to iomap_bmap() interface or for >> any existing callers of iomap_bmap() :- >> >> 1. mm/page-io.c (generic_swapfile_activate() func) > > Filesystems using iomap infrastructure should be providing > aops->swap_activate() to map swapfile extents via > iomap_swapfile_activate() (e.g. see xfs_iomap_swapfile_activate()), > not using generic_swapfile_activate(). > >> 2. fs/cachefiles/rdwr.c > > Known problem, work being done to stop using bmap() here > >> 3. fs/ecryptfs/mmap.c > > Just a wrapper to pass ->bmap calls through to the lower layer. > >> 4. fs/jbd2/journal.c > > Broken on filesystems where the journal file might be placed beyond > a 32 bit block number, iomap_bmap() just makes that obvious. Needs > fixing. > > You also missed f2fs copy-n-waste using it for internal swapfile > mapping: > > /* Copied from generic_swapfile_activate() to check any holes */ > > That needs fixing, too. > > And you missed the MD bitmap code uses bmap() to map it's bitmap > storage file, which means that is broken is the bitmap file is on a > filesystem/block device > 16TB, too... > >> But the changes done in ext4 to move to iomap_bmap() interface >> resulted in this issue since jbd2 tries to find the block mapping >> of on disk journal inode of ext4 and on a larger filesystem >> this may fail given the design of iomap_bmap() to not >> return addr if > INT_MAX. >> >> So as I see it there are 3 options from here. Wanted to put this >> on mailing list for discussion. >> >> 1. Make changes in iomap_bmap() to return the block address mapping. >> But I still would like to mention that iomap designers may not agree >> with this here Since the direction in general is to get rid of bmap() >> interface anyways. > > Nope. bmap() is broken. Get rid of it. > >> 2. Revert the patch series of "bmap & fiemap to move to iomap interface" >> (why fiemap too? - since if we decide to revert bmap anyways, >> then we better fix the performance numbers report too coming from >> fiemap. Also due to 3rd option below since if iomap_bmap() is >> not changed, then we better keep both of this interface as is until >> we get the solution like 3 below.) > > The use of bmap was broken prior to this conversion - shooting > the messenger doesn't fix the problem. Get rid of bmap(). > >> 3. To move to a new internal API like fiemap. But we need to change >> fiemap in a way that it should also be allowed to used by internal >> kernel APIs. Since as of now fiemap_extent struct is assumed to be >> a user pointer. > > Fiemap cannot be used this way. It's a diagnostic interface that Sure. > provides no guarantee of coherency or atomicity, so you can't use it > in this way in userspace or the kernel. hmm. Yes, I guess even with FIEMAP_FLAG_SYNC what it mostly could provide is to make sure the dirty data goes and sit on disk. But I guess it won't provide guarantee that in case if the data is journalled then it is moved to it's right location on disk before the results are returned to user. So I understood the coherency part. But why do you say atomicity? How does bmap() interface provides atomicity() ? > > IMO, the correct thing to do is for the caller to supply jbd with a > block mapping callback. i.e. jbd2_journal_init_inode() gets called > from both ext4 and ocfs2 with a callback that does the block mapping > for that specific filesystem. Indeed, jbd2 will need to cache that > callback, because it needs to call it to map journal blocks when > committing transactions.... Sure thanks for this. Somehow I feel that the callback will be a similar API to what a_ops->bmap() does today. So from jbd2 perspective will the new block mapping callback needs same level of coherency and atomicity guarantee? Thanks -ritesh
On Tue, Apr 21, 2020 at 03:08:50PM +1000, Dave Chinner wrote: > > FYI - I do see that bmap() is also used by below APIs/subsystem. > > Not sure if any of subsystems mentioned below may still fail later > > if the underlying FS moved to iomap_bmap() interface or for > > any existing callers of iomap_bmap() :- > > > > 1. mm/page-io.c (generic_swapfile_activate() func) > > Filesystems using iomap infrastructure should be providing > aops->swap_activate() to map swapfile extents via > iomap_swapfile_activate() (e.g. see xfs_iomap_swapfile_activate()), > not using generic_swapfile_activate(). And we also need to eventually phase generic_swapfile_activate out, maybe by having a version with a get_blocks callback for the non-iomap case. > > 4. fs/jbd2/journal.c > > Broken on filesystems where the journal file might be placed beyond > a 32 bit block number, iomap_bmap() just makes that obvious. Needs > fixing. I think this wants to use iomap, as that would solve all the problems. > And you missed the MD bitmap code uses bmap() to map it's bitmap > storage file, which means that is broken is the bitmap file is on a > filesystem/block device > 16TB, too... This probably needs to use the in-kernel direct I/O interface, just as it is planned for cachefiles.
hey at all, me use it at two servers, run without on 5.6 not at 5.7_rcx over 16 TB vol. --- btrfs runs on same device -- so it only can be in ext4 subsys
On Tue 21-04-20 01:04:05, Christoph Hellwig wrote: > On Tue, Apr 21, 2020 at 03:08:50PM +1000, Dave Chinner wrote: > > > 4. fs/jbd2/journal.c > > > > Broken on filesystems where the journal file might be placed beyond > > a 32 bit block number, iomap_bmap() just makes that obvious. Needs > > fixing. > > I think this wants to use iomap, as that would solve all the problems. Well, there are two problems with this - firstly, ocfs2 is also using jbd2 and it knows nothing about iomap. So that would have to be implemented. Secondly, you have to somehow pass iomap ops to jbd2 so it all boils down to passing some callback to jbd2 during journal init to map blocks anyway as Dave said. And then it is upto filesystem to do the mapping - usually directly using its internal block mapping function - so no need for iomap AFAICT. Honza
On Tue, Apr 21, 2020 at 06:29:10PM +0200, Jan Kara wrote: > Well, there are two problems with this - firstly, ocfs2 is also using jbd2 > and it knows nothing about iomap. So that would have to be implemented. > Secondly, you have to somehow pass iomap ops to jbd2 so it all boils down > to passing some callback to jbd2 during journal init to map blocks anyway > as Dave said. And then it is upto filesystem to do the mapping - usually > directly using its internal block mapping function - so no need for iomap > AFAICT. You'll need to describe the mapping some how. So why not reuse an existing mechanism instead of creating a new ad-hoc one?
all about this why it is running on kernel 5.6 and before
On Tue, Apr 21, 2020 at 09:45:54AM -0700, Christoph Hellwig wrote: > On Tue, Apr 21, 2020 at 06:29:10PM +0200, Jan Kara wrote: > > Well, there are two problems with this - firstly, ocfs2 is also using jbd2 > > and it knows nothing about iomap. So that would have to be implemented. > > Secondly, you have to somehow pass iomap ops to jbd2 so it all boils down > > to passing some callback to jbd2 during journal init to map blocks anyway > > as Dave said. And then it is upto filesystem to do the mapping - usually > > directly using its internal block mapping function - so no need for iomap > > AFAICT. > > You'll need to describe the mapping some how. So why not reuse an > existing mechanism instead of creating a new ad-hoc one? Well, we could argue that bmap() is an "existing mechanism" --- again, bmap() returns a u64, so it's perfectly fine. It's FIBMAP which is "fundamentally broken", not bmap(). If the goal is to eventually eliminate bmap() and aops->bmap(), sure, then we should force march all file systems to use iomap_bmap(), including ocfs2. Otherwise, if the goal alert users of FIBMAP when it's returning an corrutped block number, why not move the check if the block is larger than INT_MAX to ioctl_fibmap() in fs/ioctl.c, instead of in iomap_bmap()? If we can't fix this, I'm beginning to think that switching to iomap for fiemap and bmap is actually a lose for ext4. It's causing performance regressions, and now we see it's causing functionality regressions. Sure, it's saving a bit of code size, but is it really worth it to use iomap for fiemap/bmap? - Ted