Bug 32182
Summary: | EXT4-fs error: bad header/extent | ||
---|---|---|---|
Product: | File System | Reporter: | Ramses VdP (ramses) |
Component: | ext4 | Assignee: | fs_ext4 (fs_ext4) |
Status: | RESOLVED OBSOLETE | ||
Severity: | high | CC: | adam.surak, alan, hany+kernel.org, hdfssk, ihhuang, sandeen, tytso |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.38.1 | Subsystem: | |
Regression: | No | Bisected commit-id: |
Description
Ramses VdP
2011-03-29 16:47:44 UTC
Reply-To: xiaoqiangnk@gmail.com

Could the bug be reproduced? If so, how to?

On Wed, Mar 30, 2011 at 12:47 AM, <bugzilla-daemon@bugzilla.kernel.org> wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=32182
>
> Summary: EXT4-fs error: bad header/extent
> Product: File System
> Version: 2.5
> Kernel Version: 2.6.38.1
> Platform: All
> OS/Version: Linux
> Tree: Mainline
> Status: NEW
> Severity: high
> Priority: P1
> Component: ext4
> AssignedTo: fs_ext4@kernel-bugs.osdl.org
> ReportedBy: ramses.rommel@gmail.com
> Regression: No
>
> I started noticing ext4-fs errors in my dmesg output, they look like this:
>
> EXT4-fs error (device sda2): ext4_ext_check_inode:428: inode #3760218: comm dropbox: bad header/extent: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)
> EXT4-fs error (device sda2): ext4_ext_check_inode:428: inode #3760208: comm dropbox: bad header/extent: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)
> EXT4-fs error (device sda2): ext4_ext_check_inode:428: inode #3760218: comm thunderbird-bin: bad header/extent: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)
> EXT4-fs error (device sda2): ext4_ext_check_inode:428: inode #3760208: comm thunderbird-bin: bad header/extent: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)
>
> I fsck'ed the partition which showed nothing but the regular output but said the file system was modified, after a reboot the same errors returned though.
> I'm using e2fsprogs 1.41.14 and linux 2.6.38.1 and after a downgrade to linux 2.6.37.5 the errors kept reappearing (I'll test 2.6.38.2 soon but I didn't find commit messages that seemed relevant).
>
> I ran debugfs on one of the inodes:
>
> % debugfs -R 'stat <3760208>' /dev/disk/by-label/home
> debugfs 1.41.14 (22-Dec-2010)
> Inode: 3760208   Type: regular   Mode: 0644   Flags: 0x80000
> Generation: 231468018   Version: 0x00000001
> User: 1000   Group: 100   Size: 42
> File ACL: 0   Directory ACL: 0
> Links: 1   Blockcount: 0
> Fragment: Address: 0   Number: 0   Size: 0
> ctime: 0x4b2a09ff -- Thu Dec 17 11:37:51 2009
> atime: 0x4d909367 -- Mon Mar 28 15:55:51 2011
> mtime: 0x490976f8 -- Thu Oct 30 09:57:28 2008
> EXTENTS:
>
> I'm not sure what extra info to give you, but I'd be glad to provide stuff you need to reproduce/fix this.

I don't have to do anything to reproduce it, the errors show up in dmesg while starting up programs mostly. I also don't think there was a specific trigger that caused it, just normal fs usage.

I guess there is either some corruption in my file system that fsck does not fix or that the kernel is reporting errors while it shouldn't.

I can provide you with more debug info, but I'm not that familiar with the tools to do so, if you could explain what data you need to understand where the bug sits, I'd be happy to provide it to you.

HTH,
Ramses

If most recent fsck doesn't find anything, but runtime finds errors, perhaps you can provide a bzip2'd e2image -r of the filesystem in question? If filenames are sensitive/private, there is an option to scramble them.

That way we can see what it's finding, why it's complaining, and why fsck doesn't seem to care.

It's interesting that the "bad" data is all 0s though. What kind of storage is this?
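The zeros in the error message line up with the fields of the ext4 extent header, which sits at the start of the inode's 60-byte i_block area. The following is a minimal sketch, assuming the standard on-disk layout (four 16-bit fields and one 32-bit field, little-endian, with magic 0xF30A). The function names are hypothetical, and the message it builds only approximates the kernel's wording; it is not the kernel's actual check.

```python
import struct

# Magic number expected at the start of every ext4 extent header.
EXT4_EXT_MAGIC = 0xF30A


def parse_extent_header(raw: bytes) -> dict:
    """Decode the 12-byte little-endian extent header (hypothetical helper)."""
    if len(raw) < 12:
        raise ValueError("an extent header needs at least 12 bytes")
    magic, entries, max_entries, depth, generation = struct.unpack_from("<HHHHI", raw)
    return {
        "magic": magic,
        "entries": entries,
        "max": max_entries,
        "depth": depth,
        "generation": generation,
    }


def describe_header(raw: bytes) -> str:
    """Return 'ok' or a message resembling the dmesg error (approximation)."""
    hdr = parse_extent_header(raw)
    if hdr["magic"] != EXT4_EXT_MAGIC:
        return "invalid magic - magic %x, entries %u, max %u, depth %u" % (
            hdr["magic"], hdr["entries"], hdr["max"], hdr["depth"])
    return "ok"


# An all-zero i_block, as reported in this bug, fails the magic check.
print(describe_header(bytes(60)))
# A header with the right magic, one entry, room for four, depth 0 passes.
print(describe_header(struct.pack("<HHHHI", EXT4_EXT_MAGIC, 1, 4, 0, 0)))
```

An inode flagged as extent-mapped (Flags: 0x80000) whose i_block is entirely zero would fail this check on every access, which would be consistent with the repeated messages above.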
This is about my /home partition, stored on a conventional hard drive (a seagate momentus). I'll add the requested image.

The image was too big to attach, you can find it at http://db.tt/uXGjQPT

Still present in 2.6.38.3, does the image I uploaded reveal anything useful? I can provide more info if needed.

I hit this problem with 2.6.35.13 after a power outage.

The affected inode was empty & had no extents - these commands just returned nothing (was using "debugfs 1.41.14 (22-Dec-2010)"):

    # debugfs -R 'cat <122877>' /dev/mapper/Fedora-FedoraSystem
    debugfs 1.41.14 (22-Dec-2010)
    # debugfs -R 'dump_extents <122877>' /dev/mapper/Fedora-FedoraSystem
    debugfs 1.41.14 (22-Dec-2010)

The file using this inode wouldn't respond to shell commands:

    # find -inum 122877
    /usr/libexec/gdm-simple-slave
    # stat gdm-simple-slave
    stat: cannot stat `gdm-simple-slave': Input/output error
    # rm gdm-simple-slave
    rm: cannot remove `gdm-simple-slave': Input/output error

Multiple fsck.ext4 runs didn't fix things.

I eventually made the offending file go away - I've no idea if this procedure is safe, or a good idea! - with these commands.

    # debugfs -R 'testi <122877>' /dev/mapper/Fedora-FedoraSystem
    debugfs 1.41.14 (22-Dec-2010)
    Inode 122877 is marked in use
    # debugfs -w -R 'kill_file <122877>' /dev/mapper/Fedora-FedoraSystem
    debugfs 1.41.14 (22-Dec-2010)
    # debugfs -R 'testi <122877>' /dev/mapper/Fedora-FedoraSystem
    debugfs 1.41.14 (22-Dec-2010)
    Inode 122877 is not in use
    # ls /usr/libexec/gdm-simple-slave
    ls: cannot access /usr/libexec/gdm-simple-slave: Input/output error
    # debugfs -w -R 'rm /usr/libexec/gdm-simple-slave' /dev/mapper/Fedora-FedoraSystem
    debugfs 1.41.14 (22-Dec-2010)
    # ls /usr/libexec/gdm-simple-slave
    ls: cannot access /usr/libexec/gdm-simple-slave: No such file or directory
    # ls gdm*
    ls: cannot access gdm*: No such file or directory

Hi all, is there any further discussion about the issue? I met the same trouble with 2.6.35.13. Thanks.
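For anyone collecting the same diagnostics, the read-only debugfs requests used in this thread (stat, dump_extents, testi) can be generated per inode. This is a small sketch that only builds the command lines and does not run them; the helper name is hypothetical, and the device and inode are the ones quoted above. It deliberately leaves out the `-w` requests (kill_file, rm), whose safety the poster was unsure about.

```python
import shlex
from typing import List


def debugfs_inspect_commands(device: str, inode: int) -> List[List[str]]:
    """Build argv lists for read-only debugfs requests on one inode.

    Hypothetical helper: without -w, debugfs opens the filesystem read-only.
    """
    requests = ["stat", "dump_extents", "testi"]
    return [["debugfs", "-R", f"{req} <{inode}>", device] for req in requests]


# Print the commands in a copy-pasteable, shell-quoted form.
for argv in debugfs_inspect_commands("/dev/mapper/Fedora-FedoraSystem", 122877):
    print(shlex.join(argv))
```

Keeping the commands as argv lists means they can be passed to `subprocess.run` unchanged if the output needs to be captured for a bug report.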
Ramses, sorry, I somehow missed your attachment, I'll see what I can see.

I'm seeing similar errors during boot:

    [ 91.652208] EXT4-fs error (device dm-0): ext4_ext_check_inode: inode #1839235: (comm Xorg) bad header/extent: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)
    [ 91.940546] EXT4-fs error (device dm-0): ext4_ext_check_inode: inode #1839235: (comm Xorg) bad header/extent: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)
    [ 91.954354] EXT4-fs error (device dm-0): ext4_ext_check_inode: inode #1839235: (comm Xorg) bad header/extent: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)
    [ 93.449371] EXT4-fs error (device dm-0): ext4_ext_check_inode: inode #1839235: (comm Xorg) bad header/extent: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)

Some additional information:

    # uname -r
    2.6.35.14-95.fc14.x86_64

If this is still seen on modern kernels please re-open/update, thanks.

Hello, I'm still seeing this:

    $ dmesg | grep "EXT4-fs error"
    [ 70.498238] EXT4-fs error (device dm-0): ext4_ext_check_inode:412: inode #1839235: comm Xorg: bad header/extent: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)
    [ 70.659036] EXT4-fs error (device dm-0): ext4_ext_check_inode:412: inode #1839235: comm Xorg: bad header/extent: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)
    [ 70.744199] EXT4-fs error (device dm-0): ext4_ext_check_inode:412: inode #1839235: comm Xorg: bad header/extent: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)
    [ 72.746596] EXT4-fs error (device dm-0): ext4_ext_check_inode:412: inode #1839235: comm Xorg: bad header/extent: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)
    $ uname -r
    3.4.7-1.fc16.x86_64

Same PC and filesystem as last time. I wonder whether copying to a new filesystem would help, because since last time I updated Xorg several times, replacing the binary (usual Fedora updates).

The error messages listed above just mean that the file system has been corrupted.
Have you tried rebooting and letting e2fsck repair the file system?

If you can reliably cause the file system to get corrupted in this way, please open a new bug report with the reproduction instructions. But the mere fact that you see these messages doesn't mean that the kernel is buggy....

Reboot + fsck was done quite a lot of times since it started occurring. Some bugs do get found and fixed but the message never goes away. I have no further info to open a new bug. Any suggestions as to what I might try to either fix the inode or compile some more information so as to file a bug report? Thanks in advance.

Could you please open a new bug report with the e2fsck logs, and dumpe2fs output, and lots of detail about your hardware setup, and whether you are using LVM, software raid, etc. We need to rule out all sorts of things, including hardware bugs, problems with partition tables, etc. Since no one else is reporting this I have to rule out hardware bugs (disk bugs, memory bugs, etc.) as well as stupid sysadmin configuration problems. (This isn't to impugn your skills, but a large number of bug reports I get end up not being kernel bugs at all.)

We experience this problem and the FS remounts to read-only. BuildServer is our write-heavy application; the server has RAID0 using MD. We have seen this problem on servers with Ubuntu kernel (3.13.x) and MD raid (Samsung SSDs) and also on servers with OVH custom built kernel (3.10) and LSI HW raid (Intel S3500 SSDs) - although we see the issue on the servers with MD raid more often.

Syslog
======
    Mar 5 13:30:01 c5-use-1 CRON[12504]: (root) CMD (/usr/bin/run-chef-client.sh)
    Mar 5 13:30:35 c5-use-1 kernel: [10890197.121882] EXT4-fs error (device md3): ext4_ext_remove_space:2935: inode #26869814: comm BuildServer: pblk 107512509 bad header/extent: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)
    Mar 5 13:30:35 c5-use-1 kernel: [10890197.205494] Aborting journal on device md3-8.
    Mar 5 13:30:35 c5-use-1 kernel: [10890197.232881] EXT4-fs (md3): Remounting filesystem read-only
    Mar 5 13:30:35 c5-use-1 kernel: [10890197.244669] EXT4-fs error (device md3): ext4_journal_check_start:56: Detected aborted journal
    Mar 5 13:30:35 c5-use-1 kernel: [10890197.244673] EXT4-fs (md3): Remounting filesystem read-only
    Mar 5 13:30:35 c5-use-1 kernel: [10890197.342604] EXT4-fs error (device md3) in ext4_ext_remove_space:3006: IO failure
    Mar 5 13:30:35 c5-use-1 kernel: [10890197.398828] EXT4-fs error (device md3) in ext4_ext_truncate:4544: IO failure
    Mar 5 13:30:35 c5-use-1 kernel: [10890197.457592] EXT4-fs error (device md3) in ext4_reserve_inode_write:4874: Journal has aborted
    Mar 5 13:30:35 c5-use-1 kernel: [10890197.521148] EXT4-fs error (device md3) in ext4_truncate:3797: Journal has aborted
    Mar 5 13:30:35 c5-use-1 kernel: [10890197.588798] EXT4-fs error (device md3) in ext4_reserve_inode_write:4874: Journal has aborted
    Mar 5 13:30:35 c5-use-1 kernel: [10890197.659161] EXT4-fs error (device md3) in ext4_orphan_del:2685: Journal has aborted
    Mar 5 13:30:35 c5-use-1 kernel: [10890197.732682] EXT4-fs error (device md3) in ext4_reserve_inode_write:4874: Journal has aborted
    Mar 5 13:31:01 c5-use-1 CRON[13879]: (root) CMD ( /usr/bin/netstat.sh > /tmp/netstat.json)

Debugfs
=======
    debugfs -R 'stat <26869814>' /dev/md3
    debugfs 1.42.9 (4-Feb-2014)
    Inode: 26869814   Type: regular   Mode: 0644   Flags: 0x80000
    Generation: 2509781484   Version: 0x00000000:00000001
    User: 1001   Group: 1001   Size: 0
    File ACL: 0   Directory ACL: 0
    Links: 0   Blockcount: 816256
    Fragment: Address: 0   Number: 0   Size: 0
    ctime: 0x54f85a7b:0fc8be80 -- Thu Mar 5 13:30:35 2015
    atime: 0x54ee0aac:127f86a8 -- Wed Feb 25 17:47:24 2015
    mtime: 0x54ee0ab5:693a9f7c -- Wed Feb 25 17:47:33 2015
    crtime: 0x54ee0aac:127f86a8 -- Wed Feb 25 17:47:24 2015
    dtime: 0x0194088b -- Tue Nov 3 11:12:11 1970
    Size of extra inode fields: 28
    EXTENTS:
    (ETB0):107512509

uname -a
========
    Linux c5-use-1 3.13.0-37-generic #64-Ubuntu SMP Mon Sep 22 21:28:38 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

mount
=====
    /dev/md3 on /home type ext4 (rw,noatime,nodiratime,errors=remount-ro)

I also have SMART output, dumpe2fs, lspci, mdadm, dpkg, dmesg and e2fsck outputs, but the files are quite extensive so I've put them at http://www.surak.eu/fs-ro.tar.gz

Let me know if anything more is needed about the config of the server.
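The reports in this thread use three slightly different message layouts (with or without a source line number, `comm X:` versus `(comm X)`, and an optional `pblk` field). A rough sketch for pulling out the device, inode and process from such lines and counting hits per inode is given below. The regular expression is an assumption fitted only to the lines quoted in this bug, not a description of every format the kernel may emit, and the function names are hypothetical.

```python
import re
from collections import Counter

# Pattern fitted to the "bad header/extent" lines quoted in this thread (assumption).
EXT4_ERROR_RE = re.compile(
    r"EXT4-fs error \(device (?P<device>[^)]+)\): "
    r"(?P<function>\w+)(?::(?P<line>\d+))?: "
    r"inode #(?P<inode>\d+): "
    r"\(?comm (?P<comm>[^:)]+)\)?:? "
    r".*?bad header/extent: (?P<reason>.+)$"
)

# Sample lines copied from the reports above.
SAMPLE_LINES = [
    "EXT4-fs error (device sda2): ext4_ext_check_inode:428: inode #3760218: "
    "comm dropbox: bad header/extent: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)",
    "[ 91.652208] EXT4-fs error (device dm-0): ext4_ext_check_inode: inode #1839235: "
    "(comm Xorg) bad header/extent: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)",
    "Mar 5 13:30:35 c5-use-1 kernel: [10890197.121882] EXT4-fs error (device md3): "
    "ext4_ext_remove_space:2935: inode #26869814: comm BuildServer: pblk 107512509 "
    "bad header/extent: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)",
    "Mar 5 13:30:35 c5-use-1 kernel: [10890197.232881] EXT4-fs (md3): Remounting filesystem read-only",
]


def parse_ext4_errors(lines):
    """Yield one dict of named fields per matching log line; skip the rest."""
    for line in lines:
        match = EXT4_ERROR_RE.search(line.strip())
        if match:
            yield match.groupdict()


def count_by_inode(lines):
    """Count matching errors per (device, inode) pair."""
    return Counter((e["device"], int(e["inode"])) for e in parse_ext4_errors(lines))


for (device, inode), hits in sorted(count_by_inode(SAMPLE_LINES).items()):
    print(f"{device} inode {inode}: {hits} error(s)")
```

Feeding it the output of `dmesg` or a syslog file gives a quick list of affected inodes to pass to `debugfs -R 'stat <inode>'` when preparing a new report.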