Bug 218626

Summary: fstest ext4/014 fails when using filesystem quotas
Product: File System
Reporter: Luis Henriques (luis.henriques)
Component: ext4
Assignee: fs_ext4 (fs_ext4)
Status: NEW ---    
Severity: normal    
Priority: P3    
Hardware: All   
OS: Linux   
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:
Attachments: full test log
Fix lost+found directory

Description Luis Henriques 2024-03-22 11:54:19 UTC
Created attachment 306024 [details]
full test log

I've started looking into a failure in fstest ext4/014.  It's very easy to reproduce: the test simply has to be run with quotas enabled (without quotas, the test passes).

Here are the fstests options I'm using:

export MOUNT_OPTIONS="-o quota"
export MKFS_OPTIONS="-O quota"

And here's the output (the last line is where the test fails):

QA output created by 014
+ create scratch fs
+ mount fs image
+ make some files
+ check fs
+ corrupt image
+ mount image
+ modify files
broken: 1
+ repair fs
+ mount image (2)
+ chattr -R -i
+ modify files (2)
broken: 0
+ check fs (2)
fsck should not fail

Basically, the test corrupts the first block of the root directory by doing the following:

- create and mount the filesystem
- create a few files
- unmount filesystem
- debugfs -w -R "zap -f / 0" <dev>
- mount, and attempt to modify some of the files created before, and unmount
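
For anyone wanting to try this outside of fstests, the steps above can be sketched as the script below.  It's only an approximation and not the code from the test itself: the 'run' and 'corrupt_root_dir' helpers, the file names and the DRY_RUN switch are made up for illustration.  It needs root and a scratch device whose contents can be destroyed.

```shell
#!/bin/sh
# Rough sketch of the corruption steps; NOT the actual ext4/014 code.
# Helper names, file names and DRY_RUN are assumptions for illustration.

# Print the command when DRY_RUN is set, execute it otherwise.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# usage: corrupt_root_dir <scratch-dev> <mountpoint>
corrupt_root_dir() {
    dev=$1
    mnt=$2

    # create and mount the filesystem with quotas enabled
    run mkfs.ext4 -F -O quota "$dev"
    run mount -o quota "$dev" "$mnt"

    # create a few files, then unmount
    for i in 1 2 3; do
        run touch "$mnt/file$i"
    done
    run umount "$mnt"

    # zero the first block of the root directory
    run debugfs -w -R "zap -f / 0" "$dev"

    # mount again and try to modify the files; failures are expected here
    run mount -o quota "$dev" "$mnt"
    for i in 1 2 3; do
        run touch "$mnt/file$i" || echo "broken: file$i"
    done
    run umount "$mnt"
}
```

Running it as 'DRY_RUN=1 corrupt_root_dir <dev> <mnt>' only prints the commands, which is a safe way to check them before pointing the function at a real scratch device.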

After this last unmount, e2fsck attempts to fix the filesystem, including the quota info.  It exits with exit code '1' ('File system errors corrected'), which is what the test expects.

However, after yet another filesystem mount/unmount cycle, another e2fsck run will still complain about quota info being inconsistent.

The test will pass if another e2fsck is run immediately after the first one, which seems to indicate there's some fix missing in the first pass, but I couldn't figure out what (the code isn't particularly easy to grasp...).
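
To make the sequence concrete, here's a sketch of the e2fsck runs around the extra mount/unmount cycle, with an optional second repair pass.  The 'run' and 'fsck_sequence' helpers and the DRY_RUN/EXTRA_PASS switches are hypothetical, and '-fy' (repair) and '-fn' (check only) are my choice of flags, not necessarily what fstests passes to e2fsck.

```shell
#!/bin/sh
# Sketch of the fsck sequence; helper names and switches are assumptions.

# Print the command when DRY_RUN is set, execute it otherwise.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# usage: fsck_sequence <scratch-dev> <mountpoint>
fsck_sequence() {
    dev=$1
    mnt=$2

    # first repair pass; exit code 1 means 'File system errors corrected'
    run e2fsck -fy "$dev"
    echo "repair exit code: $?"

    # optional second repair pass, run right after the first one
    if [ -n "$EXTRA_PASS" ]; then
        run e2fsck -fy "$dev"
        echo "second repair exit code: $?"
    fi

    # one more mount/unmount cycle
    run mount -o quota "$dev" "$mnt"
    run umount "$mnt"

    # check-only pass; this is the run that still complains about quota info
    run e2fsck -fn "$dev"
    echo "check exit code: $?"
}
```

Comparing the last exit code with and without EXTRA_PASS set should show the difference described above.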

On the other hand, I'd also expect the kernel to notice that something is wrong when the filesystem is mounted after e2fsck is run, but dmesg doesn't show anything.

For reference, I'm attaching the full test log that includes the output of e2fsck.
Comment 1 Luis Henriques 2024-03-26 15:38:09 UTC
Created attachment 306044 [details]
Fix lost+found directory

OK, I have a patch that seems to fix the issue.

When mke2fs creates the 'lost+found' directory, it ensures that there are a few empty blocks in it.  However, this test (ext4/014) corrupts the filesystem and this directory needs to be created again.

When e2fsck recreates the lost+found directory, however, it doesn't make sure that these empty blocks are there.  These extra blocks are then taken into account in the quota calculation, as they were in the initial (non-corrupted) filesystem.  The patch I'm attaching basically copies the loop that adds these blocks to the empty directory.  There are a few details I haven't understood yet -- for example, the '16*1024' magic number is still a mystery.
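
To illustrate what such a loop does to the directory size, here's a small simulation.  It assumes the loop adds one block at a time until the directory is at least 16*1024 bytes and at least two blocks long, and that it never goes past the 12 direct blocks of an inode.  That shape is an assumption made for illustration, not a copy of the patch, and 'lpf_blocks' is a made-up helper name.

```shell
#!/bin/sh
# Simulation only: how many blocks lost+found ends up with for a given
# block size, ASSUMING the loop shape described above.

# usage: lpf_blocks <blocksize-in-bytes>
lpf_blocks() {
    blocksize=$1
    blocks=1        # the freshly created directory starts with one block
    lpf_size=0
    i=1
    # never go past the 12 direct blocks of an inode
    while [ "$i" -lt 12 ]; do
        lpf_size=$((lpf_size + blocksize))
        # stop once the directory is >= 16*1024 bytes and >= 2 blocks
        if [ "$lpf_size" -ge $((16 * 1024)) ] &&
           [ "$lpf_size" -ge $((2 * blocksize)) ]; then
            break
        fi
        blocks=$((blocks + 1))   # stands in for expanding the directory
        i=$((i + 1))
    done
    echo "$blocks"
}

for bs in 1024 4096 65536; do
    echo "blocksize $bs: $(lpf_blocks "$bs") blocks"
done
```

Under these assumptions, a 4096-byte block size gives a 4-block (16 KiB) lost+found, while a directory recreated with a single block would be 12 KiB smaller.  That is the kind of difference that would show up in the quota accounting.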

Any thoughts?