I use a 24TB SW-RAID6 with 10 3TB HDDs. The array contains a dm-crypt container, which in turn contains an ext4 FS [1]. One of the disks had errors and got kicked out of the array. Before I was able to replace it, the ext4 FS began to throw errors like these:

EXT4-fs error (device dm-1): ext4_find_dest_de:1802: inode #3833864: block 61343924: comm nfsd: bad entry in directory: rec_len % 4 != 0 - offset=1000, inode=2620025549, rec_len=30675, name_len=223, size=4096
EXT4-fs error (device dm-1): ext4_lookup:1577: inode #172824586: comm tvh:tasklet: iget: bad extra_isize 13022 (inode size 256)
EXT4-fs error (device dm-1): htree_dirblock_to_tree:1010: inode #7372807: block 117967811: comm tar: bad entry in directory: rec_len % 4 != 0 - offset=104440, inode=1855122647, rec_len=12017, name_len=209, size=4096

I then used e2fsck to check the FS for errors, but it only produced dozens of output lines like the following:

German original: "Block %$b von Inode %$i steht in Konflikt mit kritischen Metadaten, Blockprüfungen werden übersprungen."
Translated: "Inode %$i block %$b conflicts with critical metadata, skipping block checks."

It also filled up my RAM, so I added ~100G of swap space. After scanning for a few days, the e2fsck process died. Afterwards I was able to replace the disk and re-scan the FS with e2fsck, and after 2 hours the ext4 FS was clean again.

To understand the problem, I set one of the disks faulty, and suddenly the ext4 errors occurred again. I added a spare disk and resynced the array, but e2fsck is unable to fix the errors; it only prints "Inode %$i block %$b conflicts with critical metadata, skipping block checks." while using more and more RAM.

Used software versions:
Kernel 4.19.52-gentoo
e2fsprogs-1.44.5
mdadm-4.1

Please let me know if I should provide additional information.

Kind regards,
Yann

##########################################
additional information

#e2fsck -c /dev/mapper/share
e2fsck 1.44.5 (15-Dec-2018)
badblocks: ungültige letzter Block - 5860269055
  [translation: "badblocks: invalid last block - 5860269055"]
/dev/mapper/share: Updating bad block inode.
Durchgang 1: Inodes, Blöcke und Größen werden geprüft
  [translation: "Pass 1: Checking inodes, blocks, and sizes"]
Block %$b von Inode %$i steht in Konflikt mit kritischen Metadaten, Blockprüfungen werden übersprungen.
[...]
Block %$b von Inode %$i steht in Konflikt mit kritischen Metadaten, Blockprüfungen werden übersprungen.
Signal (6) SIGABRT si_code=SI_TKILL
e2fsck(+0x33469)[0x55ec4942a469]
/lib64/libc.so.6(+0x39770)[0x7f562449f770]
/lib64/libc.so.6(gsignal+0x10b)[0x7f562449f6ab]
/lib64/libc.so.6(abort+0x123)[0x7f5624488539]
/lib64/libext2fs.so.2(+0x18af5)[0x7f5624ac3af5]
e2fsck(+0x18c30)[0x55ec4940fc30]
/lib64/libext2fs.so.2(+0x1a1ec)[0x7f5624ac51ec]
/lib64/libext2fs.so.2(+0x1a589)[0x7f5624ac5589]
/lib64/libext2fs.so.2(+0x1b04b)[0x7f5624ac604b]
e2fsck(+0x1978c)[0x55ec4941078c]
e2fsck(+0x1b0f6)[0x55ec494120f6]
e2fsck(+0x1b1dc)[0x55ec494121dc]
/lib64/libext2fs.so.2(ext2fs_get_next_inode_full+0x8e)[0x7f5624adc53e]
e2fsck(e2fsck_pass1+0xa12)[0x55ec49412d22]
e2fsck(e2fsck_run+0x6a)[0x55ec4940b54a]
e2fsck(main+0xefd)[0x55ec4940703d]
/lib64/libc.so.6(__libc_start_main+0xeb)[0x7f5624489e6b]
e2fsck(_start+0x2a)[0x55ec494092ea]

---------------------------------------------------------------------

#tune2fs -l /dev/mapper/share
tune2fs 1.44.5 (15-Dec-2018)
Filesystem volume name:   <none>
Last mounted on:          /home/share
Filesystem UUID:          c5f0559d-e3bd-473f-abc0-7c42b3115897
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      ext_attr dir_index filetype extent 64bit flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean with errors
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              366268416
Block count:              5860269056
Reserved block count:     0
Free blocks:              755383351
Free inodes:              363816793
First block:              0
Block size:               4096
Fragment size:            4096
Group descriptor size:    64
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         2048
Inode blocks per group:   128
RAID stride:              128
RAID stripe width:        1024
Flex block group size:    16
Filesystem created:       Sat Mar 17 14:36:16 2018
Last mount time:          Fri Jun 21 05:25:34 2019
Last write time:          Fri Jun 21 05:30:27 2019
Mount count:              3
Maximum mount count:      -1
Last checked:             Thu Jun 20 08:55:17 2019
Check interval:           0 (<none>)
Lifetime writes:          139 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Default directory hash:   half_md4
Directory Hash Seed:      4c37872e-3207-4ff4-8939-a428feaeb49f
Journal backup:           inode blocks
FS Error count:           20776
First error time:         Thu Jun 20 14:18:47 2019
First error function:     ext4_lookup
First error line #:       1577
First error inode #:      172824586
First error block #:      0
Last error time:          Fri Jun 21 05:53:24 2019
Last error function:      ext4_lookup
Last error line #:        1577
Last error inode #:       172824586
Last error block #:       0

---------------------------------------------------------------------

#cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
md2 : active raid6 sde[0] sdc[9] sdb[8] sdj[7] sdi[6] sda[11] sdd[10] sdh[3] sdg[2] sdf[1]
      23441080320 blocks super 1.2 level 6, 512k chunk, algorithm 2 [10/10] [UUUUUUUUUU]
      bitmap: 0/22 pages [0KB], 65536KB chunk

md1 : active raid1 sdk2[0] sdl2[2]
      52396032 blocks super 1.2 [2/2] [UU]

md0 : active raid1 sdk1[0] sdl1[2]
      511680 blocks super 1.2 [2/2] [UU]

unused devices: <none>
This seems like a RAID problem and not an ext4 problem. The RAID array shouldn't be returning random garbage if one of the drives is unavailable. Maybe it is not doing data parity verification on reads, so that it is blindly returning bad blocks from the failed drive rather than reconstructing valid data from parity if the drive does not fail completely?
Did you resync the disks *before* you ran e2fsck? Or only afterwards?
Andreas & Ted, thank you for your replies. (In reply to Andreas Dilger from comment #1) > This seems like a RAID problem and not an ext4 problem. The RAID array > shouldn't be returning random garbage if one of the drives is unavailable. > Maybe it is not doing data parity verification on reads, so that it is > blindly returning bad blocks from the failed drive rather than > reconstructing valid data from parity if the drive does not fail completely? How can I check that? At least running "checkarray" did not find anything new or helpful. (In reply to Theodore Tso from comment #2) > Did you resync the disks *before* you ran e2fsck? Or only afterwards? 1. my RAID6 got degraded and ext4 errors showed up 2. I ran e2fsck, it consumed all memory and showed only "Inode %$i block %$b conflicts with critical metadata, skipping block checks." 3. I replaced the faulty disk and resynced the RAID6 4. e2fsck was able to clean the filesystem 5. I simulated a drive fault (so my RAID6 had n+1 working disks left) 6. the ext4 FS got corrupted again 7. although the RAID is clean again, e2fsck is not able to clean the FS (like in step 2)
That sounds *very* clearly like a RAID bug. If RAID6 is returning garbage to the file system in degraded mode, there is nothing the file system can do. What worries me is that if the RAID6 system was returning garbage when *reading*, who knows how it was trashing the file system image when the ext4 kernel code was *writing* to it?

In any case, there's very little we as ext4 developers can do here to help, except give you some advice for how to recover your file system.

What I'd suggest that you do is to use the debugfs tool to sanity-check the inode. If the inode number reported by e2fsck was 123456, you can look at it by using the debugfs command "stat <123456>". If the timestamps, user id and group id numbers, etc. look insane, you can speed up the recovery time by using the command "clri <123456>", which zeros out the inode.
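Purely as an illustration of that workflow (123456 is Ted's placeholder inode number, not a real one from this filesystem; the filesystem should be unmounted, and -w opens the device read-write):

# debugfs -w /dev/mapper/share
debugfs:  stat <123456>
debugfs:  clri <123456>
debugfs:  quit

"stat <123456>" prints the on-disk inode so its timestamps, uid/gid and size can be sanity-checked; "clri <123456>" should only be issued if those fields are clearly garbage, since it irrevocably zeroes the inode.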
Thank you for your support, Ted.

About one week before the RAID got degraded, I created a full file-based backup, so at least I can expect minimal data loss - I just would like to save the time it would take to copy ~22TB back :-)

Before filing a possible bug in the correct section for RAID, I'd like to understand the steps you described. In fact, e2fsck does not report inode numbers, only variables ("%$i" for inodes, and likewise "%$b" for blocks). Using the total inode count in debugfs leads to an error:

share ~ # tune2fs -l /dev/mapper/share | grep "Inode count"
Inode count:              366268416
share ~ # debugfs /dev/mapper/share
debugfs 1.44.5 (15-Dec-2018)
debugfs:  stat 366268416
366268416: File not found by ext2_lookup

Or did I get you wrong?
That's because the German translation is busted. I've complained to the German maintainer of the e2fsprogs messages file at the Translation Project, but it hasn't been fixed yet. See [1] for more details.

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=892173#10

It *should* have actually printed the inode number, and it does if you just use the built-in English text. In general, I *really* wish people would disable the use of the translations when reporting bugs, because very often the translations add noise that makes life harder for developers, and in this case, it was losing information for you, the system administrator.

See the debugfs man page for more details, but to specify inode numbers to debugfs, you need to surround the inode number with angle brackets. Here are some examples:

% debugfs /tmp/test.img
debugfs 1.45.2 (27-May-2019)
debugfs:  ls a
 12  (12) .    2  (12) ..    13  (988) motd
debugfs:  stat /a/motd
Inode: 13   Type: regular    Mode:  0644   Flags: 0x80000
Generation: 0    Version: 0x00000000
User:     0   Group:     0   Size: 286
File ACL: 0
Links: 1   Blockcount: 2
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x5d0d416e -- Fri Jun 21 16:43:26 2019
 atime: 0x5d0d416e -- Fri Jun 21 16:43:26 2019
 mtime: 0x5d0d416e -- Fri Jun 21 16:43:26 2019
Inode checksum: 0x0000857d
EXTENTS:
(0):20
debugfs:  stat <13>
Inode: 13   Type: regular    Mode:  0644   Flags: 0x80000
Generation: 0    Version: 0x00000000
User:     0   Group:     0   Size: 286
File ACL: 0
Links: 1   Blockcount: 2
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x5d0d416e -- Fri Jun 21 16:43:26 2019
 atime: 0x5d0d416e -- Fri Jun 21 16:43:26 2019
 mtime: 0x5d0d416e -- Fri Jun 21 16:43:26 2019
Inode checksum: 0x0000857d
EXTENTS:
(0):20
debugfs:  ncheck 13
Inode	Pathname
13	/a/motd
debugfs:  stat a
Inode: 12   Type: directory    Mode:  0755   Flags: 0x80000
Generation: 0    Version: 0x00000000
User:     0   Group:     0   Size: 1024
File ACL: 0
Links: 2   Blockcount: 2
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x5d0d416b -- Fri Jun 21 16:43:23 2019
 atime: 0x5d0d416b -- Fri Jun 21 16:43:23 2019
 mtime: 0x5d0d416b -- Fri Jun 21 16:43:23 2019
Inode checksum: 0x000042d4
EXTENTS:
(0):18
debugfs:  block_dump 18
0000  0c00 0000 0c00 0102 2e00 0000 0200 0000  ................
0020  0c00 0202 2e2e 0000 0d00 0000 dc03 0401  ................
0040  6d6f 7464 0000 0000 0000 0000 0000 0000  motd............
0060  0000 0000 0000 0000 0000 0000 0000 0000  ................
*
1760  0000 0000 0000 0000 0c00 00de be4e 16e9  .............N..
debugfs:
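As an aside, a quick way to get the untranslated English messages (and with them the real inode numbers) is to override the locale just for the check, for example:

LC_ALL=C e2fsck -f /dev/mapper/share

The -f merely forces a full check; the important part is LC_ALL=C, which makes the tools fall back to their built-in English strings.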
I changed the locale to en_GB and now got the actual 5 or 6 inode numbers that caused the extremely slow e2fsck run. I zeroed them with debugfs, and afterwards e2fsck was able to fix many, many errors. lost+found now contains 934G of data (53190 entries). Over the next few days, I will try to examine them.

While copying files back from my backup, I re-tested a disk failure and removed one disk from the array. After a while I re-added it, and the array is now resyncing. I will watch the logs for possible FS errors - for now, everything seems clean.

Thanks a lot for your support!
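A side note on examining lost+found, purely as a sketch: e2fsck names recovered entries after their inode number, so a rough first triage could look like this (the entry name '#12345' is a hypothetical example):

cd /home/share/lost+found
ls | wc -l            # how many entries were recovered
file '#12345'         # identify the content type of one recovered entry
ls -ld '#12345'       # timestamps and size often hint at what it was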
It seems as if the data corruption only affects the directories and files that the system wrote to while the ext4 FS was not clean. A quick rsync from my backup fileserver copied none of the static data back to the production system - so I assume that these files should be okay (but of course, I will test that, at least randomly). To sum it up: the data loss affects the backup files stored locally on this system and my TV recordings - 934G of total data is just about right for that.
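One possible way to make that test more systematic (just a sketch; the paths are placeholders, not the real ones from this setup) is a checksum-based rsync dry run against the backup, which lists every file whose content differs without copying anything:

rsync -rnc --itemize-changes /backup/share/ /home/share/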