Bug 215879 - EXT4-fs error - __ext4_find_entry:1612: inode #2: comm systemd: reading directory lblock 0
Summary: EXT4-fs error - __ext4_find_entry:1612: inode #2: comm systemd: reading direc...
Status: RESOLVED INVALID
Alias: None
Product: File System
Classification: Unclassified
Component: ext4
Hardware: x86-64 Linux
Importance: P1 high
Assignee: fs_ext4@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-04-24 15:20 UTC by sander44
Modified: 2023-02-19 23:16 UTC
CC List: 2 users

See Also:
Kernel Version: 5.18.0-rc3
Subsystem:
Regression: No
Bisected commit-id:


Attachments
config kernel (263.72 KB, text/plain)
2022-04-24 15:23 UTC, sander44
photo bug (1.24 MB, image/jpeg)
2022-04-24 15:31 UTC, sander44

Description sander44 2022-04-24 15:20:35 UTC
Hi Kernel Team,

I noticed this issue:

EXT4-fs error - __ext4_find_entry:1612: inode #2: comm systemd: reading directory lblock 0


I think this error is sporadic.


Drives:    Local Storage: total: 953.87 GiB used: 45.04 GiB (4.7%) 
           ID-1: /dev/nvme0n1 vendor: Intel model: SSDPEKNU010TZ size: 953.87 GiB temp: 52.9 C 
Partition: ID-1: / size: 250.25 GiB used: 40.4 GiB (16.1%) fs: ext4 dev: /dev/nvme0n1p2 
           ID-2: /boot/efi size: 252 MiB used: 274 KiB (0.1%) fs: vfat dev: /dev/nvme0n1p1 
           ID-3: /home size: 678.39 GiB used: 4.65 GiB (0.7%) fs: ext4 dev: /dev/nvme0n1p4 
Swap:      ID-1: swap-1 type: partition size: 8 GiB used: 0 KiB (0.0%) dev: /dev/nvme0n1p3 

OS: Debian 11/MXLinux KDE
Kernel: 5.18.0-rc3 vanilla

cat /proc/cmdline 
BOOT_IMAGE=/boot/vmlinuz-5.18.0-1-generic root=UUID=166304ea-bc80-458b-99a1-8a39a4e71a09 ro quiet splash clocksource=hpet init=/lib/systemd/systemd

dpkg -l | grep -E "systemd|preload"
ii  libpam-systemd:amd64                          1:247.3-6mx21                          amd64        system and service manager - PAM module
ii  libsystemd0:amd64                             1:247.3-6mx21                          amd64        systemd utility library
ii  preload                                       0.6.4-5+b1                             amd64        adaptive readahead daemon
ii  systemd                                       1:247.3-6mx21                          amd64        system and service manager
ii  systemd-shim                                  10-5                                   amd64        shim for systemd

lspci
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne IOMMU
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP Bridge
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:02.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:02.4 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir Internal PCIe GPP Bridge to Bus
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 51)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 166a
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 166b
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 166c
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 166d
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 166e
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 166f
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1670
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1671
01:00.0 VGA compatible controller: NVIDIA Corporation GA106M [GeForce RTX 3060 Mobile / Max-Q] (rev a1)
01:00.1 Audio device: NVIDIA Corporation Device 228e (rev a1)
02:00.0 Network controller: MEDIATEK Corp. Device 7961
03:00.0 Non-Volatile memory controller: Intel Corporation Device f1aa (rev 03)
04:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne (rev c4)
04:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Renoir Radeon High Definition Audio Controller
04:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor
04:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne USB 3.1
04:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne USB 3.1
04:00.5 Multimedia controller: Advanced Micro Devices, Inc. [AMD] Raven/Raven2/FireFlight/Renoir Audio Processor (rev 01)
04:00.6 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) HD Audio Controller

CPU Model name:                      AMD Ryzen 9 5900HS with Radeon Graphics
Comment 1 sander44 2022-04-24 15:23:39 UTC
Created attachment 300795
config kernel
Comment 2 sander44 2022-04-24 15:31:58 UTC
Created attachment 300796
photo bug
Comment 3 Theodore Tso 2022-04-25 04:12:03 UTC
We should really improve the error message; what this indicates is that while trying to read from a directory, ext4 received an I/O error from the storage device:

		wait_on_buffer(bh);
		if (!buffer_uptodate(bh)) {
			EXT4_ERROR_INODE_ERR(dir, EIO,
					     "reading directory lblock %lu",
					     (unsigned long) block);
			brelse(bh);
			ret = ERR_PTR(-EIO);
			goto cleanup_and_exit;
		}

I'm not sure why systemd is trying to read so many directories, and why you aren't seeing any I/O error messages logged on the console.  At a guess, the block device has completely failed, and there were messages about the unfolding disaster that have since scrolled off your screen.

Bottom line, this is very likely a hardware problem of some sort.
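
As a minimal sketch of how to read the message, the Python snippet below splits an EXT4-fs error line into its fields. The regular expression and the `parse_ext4_error` helper are illustrative assumptions based only on the message variants quoted in this report; they are not a kernel-provided format specification. Inode #2 is the root directory of an ext4 filesystem and lblock 0 is its first logical block, so this message means that reading the first block of the filesystem's root directory failed.

```python
import re
from typing import Optional

# Hypothetical pattern, derived only from the messages quoted in this report.
EXT4_ERROR_RE = re.compile(
    r"EXT4-fs error"
    r"(?: \(device (?P<device>[^)]+)\))?"  # device is absent in the reporter's transcription
    r"\s*[:-]\s*"
    r"(?P<func>\w+):(?P<line>\d+): "       # kernel function and source line
    r"inode #(?P<inode>\d+): "
    r"comm (?P<comm>.+?): "                # process that triggered the lookup
    r"(?P<msg>.*)"
)

EXT4_ROOT_INO = 2  # inode number of the root directory on ext2/3/4


def parse_ext4_error(text: str) -> Optional[dict]:
    """Return the fields of an EXT4-fs error line, or None if it does not match."""
    m = EXT4_ERROR_RE.search(text)
    if m is None:
        return None
    fields = m.groupdict()
    fields["line"] = int(fields["line"])
    fields["inode"] = int(fields["inode"])
    fields["is_root_dir"] = fields["inode"] == EXT4_ROOT_INO
    return fields


# The message from the description of this bug.
sample = ("EXT4-fs error - __ext4_find_entry:1612: inode #2: "
          "comm systemd: reading directory lblock 0")
print(parse_ext4_error(sample))
```

The `comm` field only names the process that happened to perform the lookup; it does not identify the cause of the failed read.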
Comment 4 Vladimir G. Ivanovic 2022-05-26 19:18:29 UTC
I am getting the same error as sander44 (the OP) reported except the name of the program generating the error is different.

EXT4-fs error (device sdb1): __ext4_find_entry:1612: inode #2: comm vorta: reading directory lblock 0

The behavior of the Samsung 1TB SDXC card is predictable. It works for minutes/hour, then throws the above error. This particular SD card is used for borg backups, which occur every 15 minutes (and also every 3 hours).

My (temporary) workaround is to unmount the SD card, run e2fsck, then re-mount the card.

I'm suspicious of this being exclusively a hardware error because it is always the same error at the same spot. Might it be an unexpected call or unexpected function arguments? (I have no knowledge of ext4 internals, so it's likely I'm talking nonsense.)

I'm happy to provide more detail or even roll my own kernel to provide debugging symbols. (I'm using Arch 5.15.43-1-lts, but the error is independent of what kernel I'm using.)

— Vladimir
Comment 5 sander44 2023-02-19 14:30:09 UTC
Hi,

Today I noticed this issue with kernel version 6.1.12.

I found this report in the Ubuntu bug tracker:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1993205

My search also turned up these workarounds:
https://unix.stackexchange.com/questions/375450/random-ssd-turn-off-ext4-find-entry-reading-directory-lblock0
https://askubuntu.com/questions/905710/ext4-fs-error-after-ubuntu-17-04-upgrade

I will test with nvme_core.default_ps_max_latency_us=200 to see whether the issue still reproduces.
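
One way to confirm that the parameter actually reached the kernel is to look for it on the kernel command line. The Python sketch below is an illustration under assumptions: the helper names `parse_cmdline` and `apst_latency_from_cmdline` are made up for this example, and the parser is deliberately simplified (it does not handle quoted values containing spaces).

```python
from typing import Dict, Optional


def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {name: value}; bare flags map to None.

    Simplification: quoted values containing spaces are not handled.
    """
    params: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        # Split on the first '=' only, so root=UUID=... keeps its full value.
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def apst_latency_from_cmdline(cmdline: str) -> Optional[int]:
    """Return the requested nvme_core.default_ps_max_latency_us, if present."""
    value = parse_cmdline(cmdline).get("nvme_core.default_ps_max_latency_us")
    return int(value) if value else None


# Command line from the original description, with the test parameter appended.
cmdline = (
    "BOOT_IMAGE=/boot/vmlinuz-5.18.0-1-generic "
    "root=UUID=166304ea-bc80-458b-99a1-8a39a4e71a09 ro quiet splash "
    "clocksource=hpet init=/lib/systemd/systemd "
    "nvme_core.default_ps_max_latency_us=200"
)
print(apst_latency_from_cmdline(cmdline))
```

On a running system the same text can be read from /proc/cmdline, as in the description above.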
Comment 6 Theodore Tso 2023-02-19 23:16:14 UTC
Note that what you found in your Stack Exchange search was from five years ago, and described a workaround in Linux kernel version 4.10.  In addition to manually disabling APST (a quirk for a very specific Samsung SSD which has since been added to newer kernels), other suggestions in the Stack Exchange or linked web pages included "removing SDD, blowing air into M.2 connector and reinserting it back" and "switching off the 'UEFI Secure Boot' setting in the BIOS".

All of which is to say that the symptom is caused by an I/O error, and there are many potential causes for an I/O error --- everything from missing quirks (to work around broken firmware / hardware design) to bad connections to misconfigured BIOS settings to just plain broken hardware.

This is why blindly web searching based on symptoms can often lead to misleading results; an abdominal pain could mean anything from indigestion, to a pulled muscle, to an infected appendix, to a heart attack.  It's also why I am not fond of people finding bug reports on the web and assuming that anything that has the same symptom must have the same root cause.....
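
Since the ext4 message is only the symptom, the more useful evidence is whatever the lower layers logged just before it. The Python sketch below scans saved kernel log text for lines mentioning an I/O error ahead of each EXT4-fs error. The matched substring (`I/O error`), the window size, and the helper name `io_errors_before_ext4` are assumptions made for illustration; the exact wording of lower-level messages varies by kernel version and driver. The first sample log line is invented for the demonstration and does not come from this report.

```python
from typing import List, Tuple


def io_errors_before_ext4(
    lines: List[str], window: int = 20
) -> List[Tuple[str, List[str]]]:
    """For each EXT4-fs error line, collect earlier lines within `window`
    that mention an I/O error (ext4's own messages are excluded)."""
    results = []
    for i, line in enumerate(lines):
        if "EXT4-fs error" not in line:
            continue
        start = max(0, i - window)
        context = [
            prev for prev in lines[start:i]
            if "I/O error" in prev and "EXT4-fs" not in prev
        ]
        results.append((line, context))
    return results


log = [
    # Hypothetical lower-level message, invented for this example.
    "I/O error, dev sdb, sector 2048",
    # Message quoted in comment 4.
    "EXT4-fs error (device sdb1): __ext4_find_entry:1612: inode #2: "
    "comm vorta: reading directory lblock 0",
]
for ext4_line, context in io_errors_before_ext4(log):
    print(ext4_line)
    for prev in context:
        print("  preceded by:", prev)
```

An empty context list does not rule out a hardware fault; it may only mean the earlier messages were not captured in the saved log.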
