Since 5.5 I have been getting persistent hits on the check added in 109ba779d6cca ("ext4: check for directory entries too close to block end"). It is 100% reproducible when running Docker containers with the overlay2 storage driver. Here's an example log entry:
kernel: EXT4-fs error (device dm-0): ext4_search_dir:1395: inode #28320400: block 113246792: comm dockerd: bad entry in directory: directory entry too close to block end - offset=0, inode=28320403, rec_len=32, name_len=8, size=4096
dockerd: time="2020-04-08T11:03:35.148433258-07:00" level=error msg="Error removing mounted layer c520f6ce1d0b493e51aa9cdaea2240c6f65f104c3da8fb9767999dc526086f85: unlinkat /var/lib/docker/overlay2/01c0c02ee4841227fefe595eeef8912fee32bc2b63a2264cb513f924e6366950/diff/tmp/apt-key-gpghome.TauCtRwzyD: directory not empty"
To clarify, this error has happened elsewhere as well, so it doesn't seem to be specific to overlay2.
At first I thought that my filesystem was borked somehow, so I went so far as to reformat the partition, but that didn't help.
Could you run dumpe2fs -h on the file system and attach the output to the bug?
dumpe2fs 1.45.6 (20-Mar-2020)
Filesystem volume name: root
Last mounted on: /
Filesystem UUID: 1ca09d01-202a-4a0c-a150-8d078c57d751
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent 64bit flex_bg inline_data sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
Filesystem flags: signed_directory_hash
Default mount options: user_xattr acl
Filesystem state: clean with errors
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 31031296
Block count: 124111616
Reserved block count: 6205580
Free blocks: 37127036
Free inodes: 26630302
First block: 0
Block size: 4096
Fragment size: 4096
Group descriptor size: 64
Reserved GDT blocks: 1024
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 8192
Inode blocks per group: 512
Flex block group size: 16
Filesystem created: Tue Apr 7 23:31:35 2020
Last mount time: Thu Jul 9 09:47:02 2020
Last write time: Thu Jul 9 11:30:13 2020
Mount count: 6
Maximum mount count: 38
Last checked: Fri May 22 14:10:16 2020
Check interval: 15552000 (6 months)
Next check after: Wed Nov 18 13:10:16 2020
Lifetime writes: 2756 GB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 32
Desired extra isize: 32
Journal inode: 8
First orphan inode: 7122511
Default directory hash: half_md4
Directory Hash Seed: 7fd796d0-3b05-456b-8550-2734924aa361
Journal backup: inode blocks
FS Error count: 2
First error time: Thu Jul 9 11:30:13 2020
First error function: ext4_search_dir
First error line #: 1399
First error inode #: 28320400
First error block #: 113246792
Last error time: Thu Jul 9 11:30:13 2020
Last error function: ext4_search_dir
Last error line #: 1399
Last error inode #: 28328032
Last error block #: 113247269
Checksum type: crc32c
Journal features: journal_incompat_revoke journal_64bit journal_checksum_v3
Journal size: 1024M
Journal length: 262144
Journal sequence: 0x0052bb7e
Journal start: 66275
Journal checksum type: crc32c
Journal checksum: 0x17d02966
I can still trigger this pretty reliably with docker on overlayfs. Anything I can do to help narrow this down?
Can you give a reliable repro that will work everywhere?
Also, can you try reformatting the file system without inline_data and see if the problem goes away? Inline_data is not something which I consider as mature as other ext4 features.
FWIW I've checked fs/ext4/inline.c, and the way it calls ext4_search_dir(), which ends up calling ext4_check_dir_entry(), indeed looks broken. I'll look into fixing that.
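For readers following along, here is a minimal user-space sketch of the bounds check that produces the "too close to block end" message, to illustrate how passing the wrong buffer bounds could yield a false positive for an inline directory. It is modelled on my reading of ext4_check_dir_entry() and is not the kernel code. The function name check_entry(), the DIR_REC_LEN macro, and all numeric values in the note below are assumptions for illustration only.

```c
#include <stddef.h>

/* Minimum record length for a name of the given length: an 8-byte
 * header plus the name, rounded up to a multiple of 4. This is meant
 * to mirror the kernel's EXT4_DIR_REC_LEN macro (assumption). */
#define DIR_REC_LEN(name_len) ((((size_t)(name_len)) + 8 + 3) & ~(size_t)3)

/*
 * Hypothetical stand-in for the directory entry validity check.
 *   de_off   - offset of the entry from the start of the buffer
 *   rec_len  - record length stored in the entry
 *   name_len - name length stored in the entry
 *   size     - size of the buffer the entry is supposed to live in
 * Returns NULL if the entry looks valid, else an error string.
 */
static const char *check_entry(size_t de_off, size_t rec_len,
                               size_t name_len, size_t size)
{
    size_t end = de_off + rec_len;

    if (rec_len < DIR_REC_LEN(1))
        return "rec_len is smaller than minimal";
    if (rec_len % 4 != 0)
        return "rec_len % 4 != 0";
    if (rec_len < DIR_REC_LEN(name_len))
        return "rec_len is too small for name_len";
    if (end > size)
        return "directory entry overrun";
    /* The entry must either end exactly at the end of the buffer or
     * leave room for at least one more minimal entry after it. */
    if (end + DIR_REC_LEN(1) > size && end != size)
        return "directory entry too close to block end";
    return NULL;
}
```

With hypothetical numbers: if an inline directory region is 32 bytes long and holds a single entry, check_entry(0, 32, 8, 32) returns NULL because the entry ends exactly at the end of the region. If the same entry is instead measured against the enclosing 4096-byte block, at a made-up offset of 4060 within it, check_entry(4060, 32, 8, 4096) reports "directory entry too close to block end". Note that the values printed in the log above (offset=0, rec_len=32, size=4096) would pass this check on their own, which suggests the offset used by the check differs from the one that is logged.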
Created attachment 290705
[PATCH] ext4: Fix checking of entry validity
This patch fixes the failures for me. I've submitted it to Ted for inclusion.
The patch fixes the bug for me as well. Thanks Jan!
Thanks for the info. Ted has picked up the patch, so I'm closing the bug.