Bug 40822 - EXT4-fs error (device dm-4): ext4_lookup:1044: inode #3308277: comm rm: deleted inode referenced: 3058008
Summary: EXT4-fs error (device dm-4): ext4_lookup:1044: inode #3308277: comm rm: delet...
Status: RESOLVED INVALID
Alias: None
Product: File System
Classification: Unclassified
Component: ext4
Hardware: All Linux
Importance: P1 normal
Assignee: fs_ext4@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-08-10 08:57 UTC by Sebastien Koechlin
Modified: 2014-06-05 14:59 UTC
CC List: 4 users

See Also:
Kernel Version: 3.7
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
Result of the dumpe2fs on FS before fsck (2.46 KB, text/plain)
2011-08-10 08:57 UTC, Sebastien Koechlin

Description Sebastien Koechlin 2011-08-10 08:57:49 UTC
Created attachment 68272
Result of the dumpe2fs on FS before fsck

RAID1+LVM, with a 250GB ext4 filesystem: 70% full by data but 100% full by inodes.

I was running "rm -rf" on ~6,000 directories referencing nearly 1,000,000 inodes (with a great many hard links; it is a backup filesystem).

The "rm" was running when I got:

[654463.455734] EXT4-fs warning (device dm-4): empty_dir:1917: bad directory (dir #2630064) - no `.' or `..'
[657796.710284] EXT4-fs error (device dm-4): ext4_lookup:1044: inode #3308277: comm rm: deleted inode referenced: 3058008
[657796.710500] Aborting journal on device dm-4-8.
[657796.725559] EXT4-fs (dm-4): Remounting filesystem read-only

No stack trace (btw I had one two hours before the first EXT4-fs warning related to RX buffer, see #13561)

The first line may be related to http://lwn.net/Articles/452117/ (not sure).

dumpe2fs reports the FS as "clean with errors".

fsck without "-f" ran in a few seconds, only recovering the journal:

# fsck.ext4 /dev/vg_main/lv_lagerbe_ss
e2fsck 1.41.12 (17-May-2010)
/dev/vg_main/lv_lagerbe_ss: recovering journal
/dev/vg_main/lv_lagerbe_ss: clean, 4148253/5013504 files, 43804713/67108864 blocks
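The summary line e2fsck prints has the form `used/total files, used/total blocks`. The minimal Python sketch below turns that line into usage percentages. It assumes exactly the format shown in this report. The regular expression is our own and is not part of e2fsprogs.

```python
import re

# Matches e.g. "4148253/5013504 files, 43804713/67108864 blocks" and tolerates an
# optional "(0.5% non-contiguous)" between "files" and the comma (assumed format).
SUMMARY_RE = re.compile(r"(\d+)/(\d+) files(?: \([^)]*\))?, (\d+)/(\d+) blocks")

def usage_from_summary(line: str) -> tuple[float, float]:
    """Return (inode usage %, block usage %) parsed from an e2fsck summary line."""
    m = SUMMARY_RE.search(line)
    if m is None:
        raise ValueError(f"not an e2fsck summary line: {line!r}")
    used_i, total_i, used_b, total_b = map(int, m.groups())
    return 100.0 * used_i / total_i, 100.0 * used_b / total_b

line = "/dev/vg_main/lv_lagerbe_ss: clean, 4148253/5013504 files, 43804713/67108864 blocks"
inodes_pct, blocks_pct = usage_from_summary(line)
print(f"inodes {inodes_pct:.1f}%, blocks {blocks_pct:.1f}%")  # inodes 82.7%, blocks 65.3%
```

Applied to the line above, this gives about 82.7% of inodes in use after the interrupted `rm`. The "100% inode full" figure in the description presumably refers to the state before the deletion started.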

fsck -f found an error:

# fsck.ext4 -f /dev/vg_main/lv_lagerbe_ss
e2fsck 1.41.12 (17-May-2010)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Entry 'settings.sol' in /2011-08-08/home/bob/.macromedia/Flash_Player/macromedia.com/support/flashplayer/sys/#a.example.com (3308277) has deleted/unused inode 3058008.  Clear<y>? yes
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

/dev/vg_main/lv_lagerbe_ss: ***** FILE SYSTEM WAS MODIFIED *****
/dev/vg_main/lv_lagerbe_ss: 4148253/5013504 files (0.5% non-contiguous), 43804713/67108864 blocks
barberine:/space#

I took an LVM snapshot before running fsck, so I can extract more information for a few days if asked.
Comment 1 Sebastien Koechlin 2013-01-20 20:32:01 UTC
I have the exact same problem again with kernel 3.7.1.
Comment 2 Theodore Tso 2013-01-21 02:01:26 UTC
The file system corruption reported by e2fsck is consistent with the complaint from the kernel.  Specifically, there is an entry in the directory which is pointing at a deleted inode.
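Roughly, the kernel's `ext4_lookup` does the following:

- It reads the inode number from the matching directory entry and loads that inode.
- If the inode's link count is zero, it treats the inode as deleted and reports "deleted inode referenced".
- The message names the directory's inode after `inode #` and the dangling target at the end.

e2fsck's pass 2 flags the same condition. The Python below is a simplified, hypothetical model of that consistency rule. `Inode`, `DirEntry` and `dangling_entries` are illustrative names, not ext4's on-disk structures or e2fsck's code.

```python
from dataclasses import dataclass

@dataclass
class Inode:
    links_count: int  # 0 means the inode is deleted/unused

@dataclass
class DirEntry:
    name: str
    ino: int          # inode number stored in the directory entry

def dangling_entries(dir_ino: int, entries: list[DirEntry], inode_table: dict[int, Inode]):
    """Yield (dir_ino, name, ino) for entries whose target inode is missing or has no links."""
    for e in entries:
        target = inode_table.get(e.ino)
        if target is None or target.links_count == 0:
            yield dir_ino, e.name, e.ino

# Toy data shaped like this report: directory inode 3308277 still lists
# 'settings.sol' -> 3058008, but inode 3058008 has been freed.
inodes = {3058008: Inode(links_count=0)}
entries = [DirEntry("settings.sol", 3058008)]
for d, name, ino in dangling_entries(3308277, entries, inodes):
    print(f"Entry '{name}' in directory {d} has deleted/unused inode {ino}")
```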

What's not clear is whether this file system corruption is caused by a kernel bug, or a hardware error (bit flip by the hard drive or the memory), or something else.

What did you mean by this?

    No stack trace (btw I had one two hours before the first EXT4-fs warning
related to RX buffer, see #13561)

What do you mean by see "#13561"?
Comment 3 Catalin(ux) M. BOIE 2013-09-06 07:35:59 UTC
I am getting the same error.

I was running a Fedora 19 guest inside a Fedora 19 host.
At some point, the partition holding the qcow2 guest image ran out of free space.
qemu paused the machine, and I pressed resume several times.
I freed some space, and some time later I got this error.
Comment 4 Sebastien Koechlin 2014-03-25 10:11:23 UTC
Sorry, I had not noticed the reply on Bugzilla. I will keep an eye on it from now on.

Here is the situation:

- The problem came back every few months during this time. I ran fsck many times; it usually corrects things, but maybe not this particular deleted-inode problem (fsck bug?).

- On 2014-02-10 I updated to kernel 3.13.2 and e2fsck.static 1.42.9-3 (e2fsck-static_1.42.9-2_i386.deb), and the problem happened again today.

- About hardware errors: this server has been running continuously since the initial bug report without problems (no other filesystem corrupted, no segfaults, no kernel oopses...).

- About the host: Debian Linux with a 32-bit userland and a 64-bit kernel, RAID1+LVM2, very many hard links (one inode has 29542 links), and the filesystem has been grown by resizing many times. It is always between 85% and 95% full.

In dmesg I have:
[3654666.866636] EXT4-fs error (device dm-4): ext4_lookup:1437: inode #300243: comm rsync: deleted inode referenced: 2748375
[3654666.911302] Aborting journal on device dm-4-8.
[3654666.970296] EXT4-fs (dm-4): Remounting filesystem read-only
[3654667.104392] EXT4-fs error (device dm-4): ext4_lookup:1437: inode #300243: comm rsync: deleted inode referenced: 2748375

The previous occurrence was with kernel 3.12.6:
[ 3226.048137] EXT4-fs (dm-4): mounted filesystem with ordered data mode. Opts: errors=remount-ro,user_xattr
[ 3526.870020] EXT4-fs (dm-4): error count: 21
[ 3526.870108] EXT4-fs (dm-4): initial error at 1372893876: ext4_lookup:1428: inode 915940
[ 3526.870271] EXT4-fs (dm-4): last error at 1372930167: ext4_put_super:762: inode 1676891
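The `initial error at` and `last error at` values are Unix timestamps from the error-tracking fields that ext4 keeps in its superblock. The quick Python sketch below decodes them to UTC.

```python
from datetime import datetime, timezone

def decode(ts: int) -> str:
    """Format a Unix timestamp (seconds since the epoch) as a UTC date and time."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")

print("initial error:", decode(1372893876))  # initial error: 2013-07-03 23:24:36 UTC
print("last error:   ", decode(1372930167))  # last error:    2013-07-04 09:29:27 UTC
```

Both recorded errors date from early July 2013, about ten hours apart.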

e2fsck just recovers the journal, and with -f it does not find anything wrong.
Comment 5 Sebastien Koechlin 2014-04-11 09:08:51 UTC
It just happened again last night with the same vanilla 3.13.2.

[5119710.472436] EXT4-fs error (device dm-4): ext4_lookup:1437: inode #2152907: comm rsync: deleted inode referenced: 2707130
[5119710.522811] Aborting journal on device dm-4-8.
[5119710.574721] EXT4-fs (dm-4): Remounting filesystem read-only
[5119710.574919] EXT4-fs error (device dm-4): ext4_lookup:1437: inode #539523: comm rsync: deleted inode referenced: 2707130
[5119710.575622] EXT4-fs error (device dm-4): ext4_lookup:1437: inode #2368102: comm rsync: deleted inode referenced: 2707130
[5119710.576139] EXT4-fs error (device dm-4): ext4_lookup:1437: inode #2466174: comm rsync: deleted inode referenced: 2707130
[5119710.576593] EXT4-fs error (device dm-4): ext4_lookup:1437: inode #2563212: comm rsync: deleted inode referenced: 2707130
[5119710.577058] EXT4-fs error (device dm-4): ext4_lookup:1437: inode #2608359: comm rsync: deleted inode referenced: 2707130
[5119710.577537] EXT4-fs error (device dm-4): ext4_lookup:1437: inode #2459036: comm rsync: deleted inode referenced: 2707130
[5119710.577986] EXT4-fs error (device dm-4): ext4_lookup:1437: inode #507856: comm rsync: deleted inode referenced: 2707130
[5119710.578412] EXT4-fs error (device dm-4): ext4_lookup:1437: inode #461587: comm rsync: deleted inode referenced: 2707130
[5119710.578859] EXT4-fs error (device dm-4): ext4_lookup:1437: inode #506061: comm rsync: deleted inode referenced: 2707130

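In these messages, the number after `inode #` is the directory being searched. The number after `deleted inode referenced:` is the inode its entry points to (compare the e2fsck pass 2 output in the description). The small Python sketch below groups the directories by the inode they reference. It assumes the exact message format shown in this report.

```python
import re
from collections import defaultdict

# Assumes the message format seen in this report.
MSG_RE = re.compile(
    r"EXT4-fs error \(device (?P<dev>[^)]+)\): ext4_lookup:\d+: "
    r"inode #(?P<dir>\d+): comm (?P<comm>\S+): deleted inode referenced: (?P<ino>\d+)"
)

def group_by_target(lines):
    """Map each referenced (deleted) inode to the sorted directory inodes that point at it."""
    groups = defaultdict(set)
    for line in lines:
        m = MSG_RE.search(line)
        if m:  # non-matching lines (e.g. "Aborting journal") are skipped
            groups[int(m["ino"])].add(int(m["dir"]))
    return {ino: sorted(dirs) for ino, dirs in groups.items()}

log = [
    "[5119710.472436] EXT4-fs error (device dm-4): ext4_lookup:1437: inode #2152907: comm rsync: deleted inode referenced: 2707130",
    "[5119710.522811] Aborting journal on device dm-4-8.",
    "[5119710.574919] EXT4-fs error (device dm-4): ext4_lookup:1437: inode #539523: comm rsync: deleted inode referenced: 2707130",
]
print(group_by_target(log))  # {2707130: [539523, 2152907]}
```

Run over the full log above, this yields a single target, 2707130, referenced from ten different directories. That is consistent with one heavily hard-linked file in the backup tree.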

After umount, dumpe2fs says "Filesystem state: clean with errors".
I had to run fsck (1.42.9) twice; a third run came back clean:

barberine:/# e2fsck.static -f /dev/vg_main/lv_lagerbe
e2fsck 1.42.9 (28-Dec-2013)
/dev/vg_main/lv_lagerbe: recovering journal
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong (8520386, counted=6873111).
Fix<y>? yes
Free inodes count wrong (1313708, counted=800557).
Fix<y>? yes

/dev/vg_main/lv_lagerbe: ***** FILE SYSTEM WAS MODIFIED *****
/dev/vg_main/lv_lagerbe: 4212947/5013504 files (0.4% non-contiguous), 60235753/67108864 blocks
barberine:/# e2fsck.static -f /dev/vg_main/lv_lagerbe
e2fsck 1.42.9 (28-Dec-2013)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Invalid inode number for '.' in directory inode 2022770.
Fix<y>? yes
Pass 3: Checking directory connectivity
'..' in /2012-02-01/home/xxxx/.gconf/desktop/gnome/peripherals/keyboard/host-ramoth/0 (2022770) is /2012-02-01/home/xxxx/.gconf/desktop/gnome/peripherals/keyboard/host-ramothold (2022765), should be /2012-02-01/home/xxxx/.gconf/desktop/gnome/peripherals/keyboard/host-ramoth (2022764).
Fix<y>? yes
Pass 4: Checking reference counts
Inode 2022764 ref count is 4, should be 3.  Fix<y>? yes
Inode 2022765 ref count is 2, should be 3.  Fix<y>? yes
Inode 2238203 ref count is 26, should be 25.  Fix<y>? yes
Inode 4195764 ref count is 23, should be 24.  Fix<y>? yes
Pass 5: Checking group summary information

/dev/vg_main/lv_lagerbe: ***** FILE SYSTEM WAS MODIFIED *****
/dev/vg_main/lv_lagerbe: 4212947/5013504 files (0.4% non-contiguous), 60235753/67108864 blocks
barberine:/space/gbu# e2fsck.static -f /dev/vg_main/lv_lagerbe
e2fsck 1.42.9 (28-Dec-2013)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/vg_main/lv_lagerbe: 4212947/5013504 files (0.4% non-contiguous), 60235753/67108864 blocks
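The Pass 5 corrections in the first run match its final summary line: the used count equals the total minus the free count e2fsck actually counted. The Python below is a quick arithmetic check that uses only numbers from the transcript above.

```python
# Totals and (superblock value, counted value) pairs from the first e2fsck run.
TOTAL_INODES, TOTAL_BLOCKS = 5013504, 67108864
FREE_INODES = (1313708, 800557)   # "Free inodes count wrong (1313708, counted=800557)"
FREE_BLOCKS = (8520386, 6873111)  # "Free blocks count wrong (8520386, counted=6873111)"

def used(total: int, free_counted: int) -> int:
    """Objects in use = total - free, as printed in e2fsck's 'used/total' form."""
    return total - free_counted

used_inodes = used(TOTAL_INODES, FREE_INODES[1])
used_blocks = used(TOTAL_BLOCKS, FREE_BLOCKS[1])
print(f"{used_inodes}/{TOTAL_INODES} files, {used_blocks}/{TOTAL_BLOCKS} blocks")
# 4212947/5013504 files, 60235753/67108864 blocks  (the summary line above)

# How far the stored free counts exceeded the counted ones:
print(FREE_INODES[0] - FREE_INODES[1], FREE_BLOCKS[0] - FREE_BLOCKS[1])  # 513151 1647275
```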
Comment 6 Theodore Tso 2014-04-11 12:33:49 UTC
The fact that the first fsck found just summary accounting errors in pass 5, and that a subsequent, back-to-back fsck then found errors in pass 2, screams either (a) some kind of hardware error, (b) the two mirrored copies of the RAID1 being out of sync, or (c) someone or something else modifying the filesystem at the same time (i.e., the filesystem was mounted and for some reason e2fsck's safety checks didn't trigger, or some other fsck or userspace program was running at the same time, etc.)

I'm going to guess that (b) is the most likely, since this is a local filesystem (i.e., you're not using fibre channel, or a remote block device, or a dual-hosted SCSI arrangement or anything else exotic).   Could you try to force a resync of the RAID mirrors, and then run another e2fsck -f check?
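Hypothesis (b) can be illustrated with a deliberately simplified Python model. RAID1 may serve a given read from either mirror. If the copies have diverged, two consecutive passes over the same block can therefore see different data. The alternate-mirror-per-run read policy and the tiny block size below are assumptions for illustration only; md's actual read balancing is more involved. `mismatched_blocks` shows, conceptually, what a check of the mirrors does: it compares the copies block by block.

```python
BLOCK = 4  # toy block size in bytes

def make_mirrors() -> tuple[bytearray, bytearray]:
    """Two RAID1 copies that should be identical but have silently diverged in block 1."""
    a = bytearray(b"AAAABBBBCCCC")
    b = bytearray(a)
    b[4:8] = b"XXXX"
    return a, b

def read_block(mirrors, blk: int, run: int) -> bytes:
    """Toy read policy: pass number `run` reads from mirror run % 2 (NOT md's real algorithm)."""
    disk = mirrors[run % len(mirrors)]
    return bytes(disk[blk * BLOCK:(blk + 1) * BLOCK])

def mismatched_blocks(a: bytes, b: bytes) -> list[int]:
    """Compare the two copies block by block and return the indices that differ."""
    return [i for i in range(len(a) // BLOCK)
            if a[i * BLOCK:(i + 1) * BLOCK] != b[i * BLOCK:(i + 1) * BLOCK]]

mirrors = make_mirrors()
print(read_block(mirrors, 1, run=0), read_block(mirrors, 1, run=1))  # b'BBBB' b'XXXX'
print(mismatched_blocks(*mirrors))  # [1]
```

A checker reading block 1 would see one version on the first pass and another on the second. This is the kind of run-to-run inconsistency described above.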
Comment 7 Sebastien Koechlin 2014-06-05 14:59:35 UTC
It's a Debian system; it performs an array check every month (by echoing check into the relevant /sys/block/$array/md/sync_action), and I have never seen any error in dmesg.
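The monthly check described here goes through md's sysfs interface. Its result is exposed in `mismatch_cnt`, so that file can be read directly after a check instead of relying on dmesg alone. The minimal Python sketch below assumes the standard `/sys/block/<array>/md/` layout and root privileges. The function names are our own.

```python
from pathlib import Path

# Standard sysfs location of md arrays; overridable so the functions can be tested.
SYS_BLOCK = Path("/sys/block")

def start_check(array: str, sys_block: Path = SYS_BLOCK) -> None:
    """Same effect as: echo check > /sys/block/<array>/md/sync_action (requires root)."""
    (sys_block / array / "md" / "sync_action").write_text("check\n")

def check_status(array: str, sys_block: Path = SYS_BLOCK) -> tuple[str, int]:
    """Return (sync_action, mismatch_cnt); sync_action reads 'idle' once the check is done."""
    md = sys_block / array / "md"
    action = (md / "sync_action").read_text().strip()
    mismatches = int((md / "mismatch_cnt").read_text().strip())
    return action, mismatches
```

Usage, as root and with a hypothetical array `md0`:

1. Call `start_check("md0")`.
2. Poll until `check_status("md0")[0] == "idle"`.
3. Treat a non-zero second element as a sign that the mirrors disagree.

Per the kernel's md documentation, `mismatch_cnt` counts sectors that a check would have rewritten. It can exceed the number of actual errors because md works in pages rather than sectors.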

But I found a strange drive: its SMART attributes (Power_On_Hours in particular) had not been updated for more than 6 months.

I replaced this drive and can no longer reproduce any error. You probably guessed right, so I am closing this bug.

Thanks a lot.
