Bug 203943
| Summary: | ext4 corruption after RAID6 degraded; e2fsck skips block checks and fails | | |
| --- | --- | --- | --- |
| Product: | File System | Reporter: | Yann Ormanns (yann) |
| Component: | ext4 | Assignee: | fs_ext4 (fs_ext4) |
| Status: | NEW | Resolution: | --- |
| Severity: | normal | CC: | adilger.kernelbugzilla, tytso |
| Priority: | P1 | | |
| Hardware: | Intel | | |
| OS: | Linux | | |
| Kernel Version: | 4.19.52-gentoo | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
Description: Yann Ormanns, 2019-06-21 06:51:27 UTC

Andreas Dilger (comment #1):
This seems like a RAID problem and not an ext4 problem. The RAID array shouldn't be returning random garbage if one of the drives is unavailable. Maybe it is not doing data parity verification on reads, so that it is blindly returning bad blocks from the failed drive rather than reconstructing valid data from parity if the drive does not fail completely?

Theodore Tso (comment #2):
Did you resync the disks *before* you ran e2fsck? Or only afterwards?

Yann Ormanns:
Andreas & Ted, thank you for your replies.

(In reply to Andreas Dilger from comment #1)
> This seems like a RAID problem and not an ext4 problem. The RAID array
> shouldn't be returning random garbage if one of the drives is unavailable.
> Maybe it is not doing data parity verification on reads, so that it is
> blindly returning bad blocks from the failed drive rather than
> reconstructing valid data from parity if the drive does not fail completely?

How can I check that? At least running "checkarray" did not find anything new or helpful.

(In reply to Theodore Tso from comment #2)
> Did you resync the disks *before* you ran e2fsck? Or only afterwards?

1. My RAID6 got degraded and ext4 errors showed up.
2. I ran e2fsck; it consumed all memory and showed only "Inode %$i block %$b conflicts with critical metadata, skipping block checks."
3. I replaced the faulty disk and resynced the RAID6.
4. e2fsck was able to clean the filesystem.
5. I simulated a drive fault (so my RAID6 had n+1 working disks left).
6. The ext4 FS got corrupted again.
7. Although the RAID is clean again, e2fsck is not able to clean the FS (as in step 2).

Theodore Tso:
That sounds *very* clearly like a RAID bug. If RAID6 is returning garbage to the file system in degraded mode, there is nothing the file system can do. What worries me is that if the RAID6 system was returning garbage when *reading*, who knows how it was trashing the file system image when the ext4 kernel code was *writing* to it?

In any case, there's very little we as ext4 developers can do here to help, except give you some advice for how to recover your file system. What I'd suggest you do is use the debugfs tool to sanity-check the inode. If the inode number reported by e2fsck was 123456, you can look at it by using the debugfs command "stat <123456>". If the timestamps, user id and group id numbers, etc. look insane, you can speed up the recovery time by using the command "clri <123456>", which zeros out the inode.

Yann Ormanns:
Thank you for your support, Ted. About one week before the RAID got degraded, I created a full file-based backup, so at least I can expect minimal data loss - I just would like to save the time it would take to copy ~22TB back :-)

Before filing a possible bug in the correct section for RAID, I'd like to understand the steps you described. In fact, e2fsck does not report inode numbers, only variables ("%$i", and likewise for blocks, "%$b"). Using the total inode count in debugfs leads to an error:

share ~ # tune2fs -l /dev/mapper/share | grep "Inode count"
Inode count:              366268416
share ~ # debugfs /dev/mapper/share
debugfs 1.44.5 (15-Dec-2018)
debugfs:  stat 366268416
366268416: File not found by ext2_lookup

Or did I get you wrong?

Theodore Tso:
That's because the German translation is busted. I've complained to the German maintainer of the e2fsprogs messages file at the Translation Project, but it hasn't been fixed yet. See [1] for more details.

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=892173#10

It *should* have actually printed the inode number, and it does if you just use the built-in English text. In general, I *really* wish people would disable the use of translations when reporting bugs, because very often the translations add noise that makes life harder for developers, and in this case it was losing information for you, the system administrator.

See the debugfs man page for more details, but to specify inode numbers to debugfs, you need to surround the inode number with angle brackets. Here are some examples:

% debugfs /tmp/test.img
debugfs 1.45.2 (27-May-2019)
debugfs:  ls a
 12  (12) .    2  (12) ..    13  (988) motd
debugfs:  stat /a/motd
Inode: 13   Type: regular    Mode:  0644   Flags: 0x80000
Generation: 0    Version: 0x00000000
User:     0   Group:     0   Size: 286
File ACL: 0
Links: 1   Blockcount: 2
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x5d0d416e -- Fri Jun 21 16:43:26 2019
 atime: 0x5d0d416e -- Fri Jun 21 16:43:26 2019
 mtime: 0x5d0d416e -- Fri Jun 21 16:43:26 2019
Inode checksum: 0x0000857d
EXTENTS:
(0):20
debugfs:  stat <13>
Inode: 13   Type: regular    Mode:  0644   Flags: 0x80000
Generation: 0    Version: 0x00000000
User:     0   Group:     0   Size: 286
File ACL: 0
Links: 1   Blockcount: 2
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x5d0d416e -- Fri Jun 21 16:43:26 2019
 atime: 0x5d0d416e -- Fri Jun 21 16:43:26 2019
 mtime: 0x5d0d416e -- Fri Jun 21 16:43:26 2019
Inode checksum: 0x0000857d
EXTENTS:
(0):20
debugfs:  ncheck 13
Inode   Pathname
13      /a/motd
debugfs:  stat a
Inode: 12   Type: directory    Mode:  0755   Flags: 0x80000
Generation: 0    Version: 0x00000000
User:     0   Group:     0   Size: 1024
File ACL: 0
Links: 2   Blockcount: 2
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x5d0d416b -- Fri Jun 21 16:43:23 2019
 atime: 0x5d0d416b -- Fri Jun 21 16:43:23 2019
 mtime: 0x5d0d416b -- Fri Jun 21 16:43:23 2019
Inode checksum: 0x000042d4
EXTENTS:
(0):18
debugfs:  block_dump 18
0000  0c00 0000 0c00 0102 2e00 0000 0200 0000  ................
0020  0c00 0202 2e2e 0000 0d00 0000 dc03 0401  ................
0040  6d6f 7464 0000 0000 0000 0000 0000 0000  motd............
0060  0000 0000 0000 0000 0000 0000 0000 0000  ................
*
1760  0000 0000 0000 0000 0c00 00de be4e 16e9  .............N..

debugfs:
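For reference, the recovery workflow described above boils down to something like the following sketch. This is not taken from the bug report itself: /dev/mapper/share is the device from the earlier comments, and 123456 is a placeholder inode number, not one actually reported here.

# run e2fsck untranslated so the real inode and block numbers are printed
LC_ALL=C e2fsck -f /dev/mapper/share

# inspect a reported inode read-only; angle brackets tell debugfs it is an inode number
debugfs -R "stat <123456>" /dev/mapper/share

# if the inode contents look like garbage, zero it out and re-run the check
debugfs -w -R "clri <123456>" /dev/mapper/share
e2fsck -fy /dev/mapper/share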
Yann Ormanns:
I changed the locale to en_GB and now got the actual numbers of the 5 or 6 inodes that were causing the extremely slow e2fsck run. I zeroed them with debugfs, and afterwards e2fsck was able to fix many, many errors. lost+found contains 934G of data (53190 entries); over the next days I will try to examine them.

While copying files out of my backup, I re-tested a disk failure and removed the disk from the array. After a while I re-added it, and now the array is resyncing. I will watch the logs for possible FS errors - for now, all seems clean. Thanks a lot for your support!

Yann Ormanns:
It seems as if the data corruption only affects the directories and files that the system wrote to while the ext4 FS was not clean. A quick rsync from my backup fileserver copied none of the static data back to the productive system, so I assume these files should be okay (but of course I will test that, at least randomly). To sum it up: the data loss affects my local backup files on this system and my TV recordings - 934G of total data is just about right for that.
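For reference, a parity consistency check on an md array can be triggered through sysfs; the "checkarray" script mentioned earlier is essentially a wrapper around this interface. A minimal sketch, assuming the array is /dev/md0 (the device name is a placeholder):

# start a background consistency check of the array
echo check > /sys/block/md0/md/sync_action

# watch progress and array state
cat /proc/mdstat
mdadm --detail /dev/md0

# after the check completes, a non-zero value indicates parity mismatches
cat /sys/block/md0/md/mismatch_cnt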