Bug 213627 - Fail to read block descriptors data of ext4 filesystem
Status: NEW
Alias: None
Product: File System
Classification: Unclassified
Component: ext4
Hardware: x86-64 Linux
Importance: P1 high
Assignee: fs_ext4@kernel-bugs.osdl.org
Reported: 2021-06-30 18:06 UTC by Nipuna
Modified: 2021-07-01 00:24 UTC
Kernel Version: 5.3.x-5.4.x
Subsystem:
Regression: No

Description Nipuna 2021-06-30 18:06:52 UTC
Our product backs up filesystems (ext2/3/4, xfs, btrfs) after taking a snapshot of the volume. We use our own drivers to take the snapshot.

After taking the snapshot, we calculate the block group count and the number of group descriptor blocks in block group 0. From the group descriptors, we read the block bitmaps and inode bitmaps.
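For readers following along, the calculation described above can be sketched roughly as follows. This is only an illustration, not the reporter's actual code: the function name gdt_layout and the returned dictionary keys are made up here, the field offsets are the ones given in the ext4 on-disk layout documentation, and the sketch assumes the META_BG feature is not in use (with META_BG the descriptors are not all stored after the primary superblock).

```python
import struct

EXT4_MAGIC = 0xEF53
INCOMPAT_64BIT = 0x0080  # descriptors may be larger than 32 bytes when set


def gdt_layout(sb: bytes) -> dict:
    """Derive group count and descriptor-table size from a raw superblock.

    `sb` is the 1024-byte superblock read from byte offset 1024 of the
    volume. Hypothetical helper; assumes no META_BG.
    """
    (magic,) = struct.unpack_from("<H", sb, 0x38)
    if magic != EXT4_MAGIC:
        raise ValueError("not an ext2/3/4 superblock")

    (blocks_lo,) = struct.unpack_from("<I", sb, 0x04)
    first_data_block, log_block_size = struct.unpack_from("<II", sb, 0x14)
    (blocks_per_group,) = struct.unpack_from("<I", sb, 0x20)
    (incompat,) = struct.unpack_from("<I", sb, 0x60)
    (desc_size_field,) = struct.unpack_from("<H", sb, 0xFE)
    (blocks_hi,) = struct.unpack_from("<I", sb, 0x150)

    if blocks_per_group == 0:
        raise ValueError("blocks_per_group is zero")

    is_64bit = bool(incompat & INCOMPAT_64BIT)
    block_size = 1024 << log_block_size
    blocks_count = (blocks_lo | (blocks_hi << 32)) if is_64bit else blocks_lo
    desc_size = desc_size_field if is_64bit else 32
    if desc_size < 32:
        raise ValueError("descriptor size too small")

    # Ceiling divisions: groups in the filesystem, then blocks needed
    # to hold one descriptor per group.
    data_blocks = blocks_count - first_data_block
    group_count = -(-data_blocks // blocks_per_group)
    gdt_blocks = -(-(group_count * desc_size) // block_size)

    return {
        "block_size": block_size,
        "desc_size": desc_size,
        "group_count": group_count,
        "gdt_blocks": gdt_blocks,
        # The descriptor table starts in the block after the superblock.
        "gdt_start_block": first_data_block + 1,
    }
```

Comparing these numbers against the output of dumpe2fs -h on the same snapshot is a quick way to check whether the superblock itself was read correctly before looking at the descriptors.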

All of this worked well up to the 5.0.x kernels, but starting with the 5.3.x kernels the block bitmap and inode bitmap values we read are garbage. It does not happen all the time. Every time after a reboot, it works fine.
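One possible way to tell "garbage" from valid values is to range-check the bitmap locations stored in each group descriptor. The sketch below is an assumption about how such a check could look, not something taken from the reporter's product: bitmap_locations and looks_sane are hypothetical names, and the offsets (low 32 bits at 0x0, 0x4, 0x8; high 32 bits at 0x20, 0x24, 0x28 for 64-byte descriptors) follow the ext4 on-disk layout documentation.

```python
import struct


def bitmap_locations(desc: bytes, desc_size: int = 32) -> tuple:
    """Return (block_bitmap, inode_bitmap, inode_table) block numbers
    from one raw group descriptor. Hypothetical helper."""
    block_bitmap, inode_bitmap, inode_table = struct.unpack_from("<III", desc, 0x00)
    if desc_size >= 64:
        # 64-bit filesystems store the upper 32 bits separately.
        bb_hi, ib_hi, it_hi = struct.unpack_from("<III", desc, 0x20)
        block_bitmap |= bb_hi << 32
        inode_bitmap |= ib_hi << 32
        inode_table |= it_hi << 32
    return block_bitmap, inode_bitmap, inode_table


def looks_sane(locations, first_data_block: int, blocks_count: int) -> bool:
    """A location is plausible only if it lies after the superblock's
    block and before the end of the filesystem."""
    return all(first_data_block < b < blocks_count for b in locations)
```

A check like this only catches values that are out of range. A stale but in-range descriptor would still pass, so it is a first filter rather than proof that the data is correct.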

Our driver simply bypasses read/write calls to system block driver, so we are not sure why the data is corrupted.

Can you please help us understand what changed between the 5.0.x and 5.3.x kernels in the ext and block driver code?

We are not seeing this issue with the xfs and btrfs filesystems.

We suspect that something changed in ext2/3/4 and the block driver in the 5.3.x kernels.
Comment 1 Nipuna 2021-06-30 18:07:40 UTC
Please let me know if more details are required.
Comment 2 Theodore Tso 2021-07-01 00:24:04 UTC
I'm guessing that it's your snapshot driver which is buggy.   Certainly, if you take a snapshot using LVM, things work fine.  e.g.:

# mke2fs -t ext4 /dev/cwcc-wg/scratch
# mount -t ext4 /dev/cwcc-wg/scratch /mnt
# cp -r /etc /mnt
# lvcreate --snapshot -n snap -L 5G cwcc-wg/scratch
# e2fsck -fn /dev/cwcc-wg/snap

You can see everything that has changed via a command such as "git log v5.0..v5.3 block fs/ext4".    In terms of what might be a relevant change, without understanding how your snapshot driver works, your guess is probably going to be better than mine --- since you have access to your snapshot driver and know how it works.

When you say that your driver "bypasses read/write calls to system block driver", I'm not 100% sure how it works, but at a guess, some things I'd look at are: (a) ext4 uses the buffer cache to read/write metadata blocks.   Maybe your driver isn't properly intercepting buffer cache reads/writes?    (b) Ext4 at mount time reads the superblock via the buffer cache with the block size set to 1k; and then after it determines the block size of the file system (say, 4k), it switches the block size of the buffer cache to the block size of the file system.    Ext[234] has been doing this for decades, but depending on how your snapshot driver is working, perhaps there is some change in how the buffer cache works which is confusing your driver.
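To make point (b) concrete, here is a small arithmetic sketch of why the block-size switch matters to anything that tracks I/O by block number. It does not model the kernel's buffer cache; locate is a hypothetical helper, and the only fact it relies on is that the primary superblock sits at byte offset 1024 of the volume.

```python
SUPERBLOCK_OFFSET = 1024  # byte offset of the primary superblock


def locate(byte_offset: int, block_size: int) -> tuple:
    """Map a byte offset to (block number, offset within that block)
    for a given block size."""
    return divmod(byte_offset, block_size)


# With a 1k block size the superblock is block 1, offset 0.
sb_at_1k = locate(SUPERBLOCK_OFFSET, 1024)
# With a 4k block size the same bytes live inside block 0, at offset 1024.
sb_at_4k = locate(SUPERBLOCK_OFFSET, 4096)
```

The same on-disk bytes therefore get a different block number before and after the switch, so a driver that caches or remaps data keyed by block number needs to account for the block size in effect at the time of each request.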

Sorry I can't help more.
