Bug 211971 - Incorrect fix by e2fsck for blocks_count corruption
Summary: Incorrect fix by e2fsck for blocks_count corruption
Status: NEW
Alias: None
Product: File System
Classification: Unclassified
Component: ext4
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: fs_ext4@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-02-26 21:39 UTC by tmahmud
Modified: 2021-03-03 17:18 UTC
0 users

See Also:
Kernel Version: Linux 5.4.0-65-generic
Tree: Mainline
Regression: No


Attachments
log files from mke2fs, dumpe2fs and e2fsck (26.15 KB, application/gzip)
2021-02-26 21:39 UTC, tmahmud
The before and after image after using debugfs (35.46 KB, application/octet-stream)
2021-03-03 17:18 UTC, tmahmud

Description tmahmud 2021-02-26 21:39:42 UTC
Created attachment 295497
log files from mke2fs, dumpe2fs and e2fsck

For an ext4 file system image with only one superblock, if the blocks_count field in the superblock is corrupted, e2fsck does not fix it correctly: in the "fixed" image the corrupted blocks_count is left unchanged, and other fields (e.g., the free blocks count) are adjusted to match it.
The same issue occurs in images with multiple superblocks. For example, for an ext4 image with a primary superblock and backup superblocks that are not in their default locations (e.g., the first backup is at block 513), if the blocks_count field in the primary superblock is corrupted, e2fsck again leaves the corrupted blocks_count unchanged and adjusts other fields (e.g., the free blocks count) to match it.

e2fsprogs_version_used: e2fsprogs 1.45.6 (20-Mar-2020) 
The commands that I ran to recreate the scenario are:
For image with only one superblock:

dd if=/dev/zero bs=1024 count=8193 of=/home/hdd/image
mke2fs -b 1024 image 8193
debugfs -w image
debugfs:  ssv blocks_count 4000
debugfs:  q
e2fsck -yf image
e2fsck -yf image

# e2fsck fixes the blocks_count corruption incorrectly
# In the clean image blocks_count was 8193; in the fixed image it is 4000
# The second run of e2fsck is consistent with the first run: it doesn't fix anything, but blocks_count is still 4000
# Expected: e2fsck would fix the blocks_count corruption instead of changing other fields (e.g., the free blocks count)
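For reference, the block-group arithmetic behind the single-superblock case can be sketched as follows. This assumes the usual formula group count = ceil((blocks_count - first_data_block) / blocks_per_group), mke2fs's default of 8192 blocks per group for 1 KiB blocks, and first_data_block = 1 for 1 KiB blocks; `group_count` is a hypothetical helper, not an e2fsprogs function.

```python
def group_count(blocks_count: int, blocks_per_group: int, first_data_block: int) -> int:
    """Number of block groups implied by three superblock fields (ceiling division)."""
    data_blocks = blocks_count - first_data_block
    return -(-data_blocks // blocks_per_group)  # integer ceil, no floating point


# Assumed defaults for a 1 KiB-block file system: 8192 blocks/group, first_data_block = 1
print(group_count(8193, 8192, 1))  # 1 (clean image)
print(group_count(4000, 8192, 1))  # 1 (after "ssv blocks_count 4000")
```

Both values describe a valid one-group layout, so nothing in the edited superblock is internally inconsistent; e2fsck presumably just treats the trailing blocks as lying outside the file system.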

For image with multiple superblocks:
dd if=/dev/zero bs=1024 count=8193 of=/home/hdd/image1
mke2fs -b 1024 -g 512 image1 8193
debugfs -w image1
debugfs:  ssv blocks_count 4000
debugfs:  q
e2fsck -yf image1
e2fsck -yf image1  

# e2fsck fixes the blocks_count corruption incorrectly
# In the clean image blocks_count was 8193; in the fixed image it is 4000
# The second run of e2fsck is consistent with the first run: it doesn't fix anything, but blocks_count is still 4000
# There were 16 block groups in the clean image, but only 7 in the fixed image
# Expected: e2fsck would fix the blocks_count corruption instead of changing other fields (e.g., the free blocks count) and removing block groups
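With `-g 512` the clean image has (8193 - 1) / 512 = 16 groups. Assuming the sparse_super feature (enabled in mke2fs's default configuration), backup superblocks live only in group 1 and in groups whose number is a power of 3, 5 or 7, each at the first block of its group. The sketch below lists those locations; the helper names are hypothetical.

```python
def is_power_of(n: int, base: int) -> bool:
    """True if n == base**k for some k >= 1."""
    if n < base:
        return False
    while n % base == 0:
        n //= base
    return n == 1


def has_backup_superblock(group: int) -> bool:
    """sparse_super rule: group 0 (primary), group 1, and powers of 3, 5 and 7."""
    if group <= 1:
        return True
    return any(is_power_of(group, base) for base in (3, 5, 7))


def backup_superblock_blocks(blocks_count: int, blocks_per_group: int,
                             first_data_block: int) -> list:
    """Block numbers of all backup superblocks (group 0's primary is excluded)."""
    groups = -(-(blocks_count - first_data_block) // blocks_per_group)
    return [first_data_block + g * blocks_per_group
            for g in range(1, groups) if has_backup_superblock(g)]


print(backup_superblock_blocks(8193, 512, 1))  # [513, 1537, 2561, 3585, 4609]
```

The first entry, block 513, is the non-default backup location mentioned in the report.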

I attached the images and also the logs from mke2fs, dumpe2fs and e2fsck.
Comment 1 Amy 2021-02-27 00:58:36 UTC
Can you replicate this on modern 5.4 from kernel.org? -generic kernels
are from Canonical and are sometimes broken compared to upstream. If
you can't replicate this on mainline, you'll need to contact
Canonical. We can't do anything if the problem only persists on
distribution kernels.

Comment 2 Theodore Tso 2021-02-27 01:29:49 UTC
On Fri, Feb 26, 2021 at 04:58:23PM -0800, Amy Parker wrote:
> Can you replicate this on modern 5.4 from kernel.org? -generic kernels
> are from Canonical and are sometimes broken compared to upstream. If
> you can't replicate this on mainline, you'll need to contact
> Canonical. We can't do anything if the problem only persists on
> distribution kernels.

This has nothing to do with the kernel.  What the user is complaining
about is that e2fsck trusts the blocks count field in the superblock
as the source of truth.  If that field is artificially changed to a
smaller value, e2fsck will assume the file system has the size
indicated by that changed value.

That's an intentional design choice of e2fsck.  Given that with modern
ext4 file systems, we have metadata checksums, if the superblock has
been accidentally corrupted, the checksum will fail, and then e2fsck
will try using the backup superblock instead.

For older file systems that don't have metadata checksums enabled, we
could check to see if certain "fundamental constants" in the primary
superblock are different from the secondary superblock, but...

> > debugfs -w image
> > debugfs:  ssv blocks_count 4000
> > debugfs:  q

This will update the blocks_count in the primary and all secondary
backups.  So that's not going to really help the user.  Effectively,
the complaint is "I pointed the gun at my foot, and pulled the
trigger, and now my foot hurts!"
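The "fundamental constants" comparison mentioned above could look roughly like the sketch below. The field list and the dictionaries are illustrative assumptions, not what e2fsck implements, and reading them from a real image is left out. As noted, such a check only helps if the backups still hold the original values.

```python
from __future__ import annotations

# Fields expected to be identical in the primary and every backup superblock.
# This particular list is an assumption made for illustration.
FUNDAMENTAL_FIELDS = ("blocks_count", "blocks_per_group", "first_data_block",
                      "log_block_size", "inodes_count")


def mismatched_fields(primary: dict, backup: dict) -> list[str]:
    """Names of fundamental fields whose values differ between two superblocks."""
    return [name for name in FUNDAMENTAL_FIELDS
            if primary.get(name) != backup.get(name)]


# Hypothetical values modelled on the report: only the primary was edited.
primary = {"blocks_count": 4000, "blocks_per_group": 512,
           "first_data_block": 1, "log_block_size": 0}
backup = dict(primary, blocks_count=8193)
print(mismatched_fields(primary, backup))  # ['blocks_count']
```

If debugfs had rewritten every backup as well, `backup` would equal `primary` and the function would return an empty list.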

> > # Expected that e2fsck would fix the blocks count corruption instead of
> > changing other fields (e.g.,free blocks_count)

The problem is that e2fsck can't really determine that the blocks
count field has been corrupted.  We could warn the user if the
blocks_count is smaller than the reported size of the device,
but... that's actually something that can happen in real life, and
it's not necessarily a file system "corruption", but rather an
intentional choice by the system administrator.  If we were to give a
warning, or worse, assume that blocks count should be adjusted to be
the size of the device, we'd be getting complaints from users who
deliberately chose to set the file system size to be something smaller
than the block device.
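For illustration, the size comparison described above amounts to the arithmetic below; a non-zero result is exactly the situation that can be deliberate, which is why it is not treated as corruption. The helper names are hypothetical. `device_size_bytes` assumes the path is readable and relies on seeking to the end, which works for regular image files and, on Linux, for block devices.

```python
import os


def device_size_bytes(path: str) -> int:
    """Size of an image file or block device, found by seeking to its end."""
    with open(path, "rb") as f:
        return f.seek(0, os.SEEK_END)


def unused_tail_blocks(device_bytes: int, blocks_count: int, block_size: int) -> int:
    """Whole blocks on the device that lie beyond the end of the file system."""
    return max(device_bytes // block_size - blocks_count, 0)


# The report's image: 8193 KiB, 1 KiB blocks, blocks_count edited to 4000.
print(unused_tail_blocks(8193 * 1024, 4000, 1024))  # 4193
```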

So this is a case of e2fsck working as intended.

Cheers,

					- Ted
Comment 3 tmahmud 2021-03-03 17:14:27 UTC
Hello Ted,

Thank you very much for the detailed clarification! It mostly makes sense to me. But I still have two questions regarding the debugfs/e2fsck behavior.


(1)
> > > debugfs -w image
> > > debugfs:  ssv blocks_count 4000
> > > debugfs:  q
> 
> This will update the blocks_count in the primary and all secondary
> backups.  

This is different from what I observed. In my experiment, "debugfs: ssv blocks_count 4000" only updated the blocks_count (and the checksum) in the primary superblock. None of the secondary backups were updated (neither the blocks_count nor the checksum). Does this imply that there is a potential bug in debugfs (because it didn't update all backups as you described)? I'm attaching two images, before and after "debugfs: ssv blocks_count 4000", for reference ("image1_before", "image1_after"). I verified that the backups were not updated by dumping the backup superblocks with dumpe2fs.
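One way to repeat this check is to have dumpe2fs read each superblock in turn, using its `-h` (superblock only) and `-o superblock=... -o blocksize=...` options, and extract the "Block count:" line. A minimal sketch, assuming dumpe2fs is on PATH and prints that line in its usual `Block count:   N` form; the helper names are hypothetical.

```python
import re
import subprocess
from typing import Optional


def parse_block_count(dumpe2fs_output: str) -> int:
    """Extract the value of the 'Block count:' line from `dumpe2fs -h` output."""
    match = re.search(r"^Block count:\s+(\d+)", dumpe2fs_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Block count:' line in dumpe2fs output")
    return int(match.group(1))


def block_count_at(image: str, superblock: Optional[int] = None,
                   block_size: int = 1024) -> int:
    """Read blocks_count from the primary (superblock=None) or a backup superblock."""
    cmd = ["dumpe2fs", "-h"]
    if superblock is not None:
        cmd += ["-o", f"superblock={superblock}", "-o", f"blocksize={block_size}"]
    cmd.append(image)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_block_count(result.stdout)
```

For the `-g 512` image, `block_count_at("image1_after")` reads the primary and `block_count_at("image1_after", 513)` reads the first backup (block 513, assuming sparse_super); differing results would confirm that only the primary was modified.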


(2)
> The problem is that e2fsck can't really determine that the blocks
> count field has been corrupted.  

In my experiment, I observed that e2fsck was able to fix the debugfs-modified primary superblock using the secondary superblocks when they are located in their default locations (e.g., block 8193). However, in an image where the secondary superblocks are not in their default locations (e.g., block 513), e2fsck could not fix the primary superblock using them. So e2fsck's behavior is inconsistent, depending on the location of the secondary superblocks. Could you please comment on this?
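A possible explanation, stated as an assumption rather than a reading of the e2fsck source: when falling back to a backup, e2fsck appears to look only where the first backup would be if blocks_per_group had its default value of 8 x block size. With `-g 512`, group 1 (and its backup) starts at block 513, which is never one of those guesses. The helper names below are hypothetical.

```python
def default_backup_guesses(block_sizes=(1024, 2048, 4096)) -> dict:
    """First-backup block per block size, assuming blocks_per_group = 8 * block_size."""
    guesses = {}
    for block_size in block_sizes:
        first_data_block = 1 if block_size == 1024 else 0  # only 1 KiB blocks start at 1
        guesses[block_size] = first_data_block + 8 * block_size
    return guesses


def first_backup_block(blocks_per_group: int, first_data_block: int) -> int:
    """Actual location of the group-1 backup superblock."""
    return first_data_block + blocks_per_group


print(default_backup_guesses())    # {1024: 8193, 2048: 16384, 4096: 32768}
print(first_backup_block(512, 1))  # 513
```

If that is the cause, pointing e2fsck at the backup explicitly with its `-b` and `-B` options, for example `e2fsck -b 513 -B 1024 image1`, should let it use the non-default backup.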
Comment 4 tmahmud 2021-03-03 17:18:12 UTC
Created attachment 295609
The before and after image after using debugfs
