Bug 32182

Summary: EXT4-fs error: bad header/extent
Product: File System Reporter: Ramses VdP (ramses)
Component: ext4 Assignee: fs_ext4 (fs_ext4)
Status: RESOLVED OBSOLETE    
Severity: high CC: adam.surak, alan, hany+kernel.org, hdfssk, ihhuang, sandeen, tytso
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.38.1 Subsystem:
Regression: No Bisected commit-id:

Description Ramses VdP 2011-03-29 16:47:44 UTC
I started noticing ext4-fs errors in my dmesg output. They look like this:

EXT4-fs error (device sda2): ext4_ext_check_inode:428: inode #3760218: comm dropbox: bad header/extent: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)
EXT4-fs error (device sda2): ext4_ext_check_inode:428: inode #3760208: comm dropbox: bad header/extent: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)
EXT4-fs error (device sda2): ext4_ext_check_inode:428: inode #3760218: comm thunderbird-bin: bad header/extent: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)
EXT4-fs error (device sda2): ext4_ext_check_inode:428: inode #3760208: comm thunderbird-bin: bad header/extent: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)

I fsck'ed the partition; it showed nothing but the regular output, although it reported that the file system was modified. After a reboot the same errors returned. I'm using e2fsprogs 1.41.14 and Linux 2.6.38.1, and the errors kept reappearing after a downgrade to Linux 2.6.37.5 (I'll test 2.6.38.2 soon, but I didn't find commit messages that seemed relevant).

I ran debugfs on one of the inodes:

% debugfs -R 'stat <3760208>' /dev/disk/by-label/home
debugfs 1.41.14 (22-Dec-2010)
Inode: 3760208   Type: regular    Mode:  0644   Flags: 0x80000
Generation: 231468018    Version: 0x00000001
User:  1000   Group:   100   Size: 42
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 0
Fragment:  Address: 0    Number: 0    Size: 0
ctime: 0x4b2a09ff -- Thu Dec 17 11:37:51 2009
atime: 0x4d909367 -- Mon Mar 28 15:55:51 2011
mtime: 0x490976f8 -- Thu Oct 30 09:57:28 2008
EXTENTS:

I'm not sure what extra info to give you, but I'd be glad to provide stuff you need to reproduce/fix this.
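
For triage, the affected inodes can be pulled out of a log like the one above with a small script. The following is a minimal Python sketch, not part of the original report. The regular expression and the names EXT4_ERROR_RE and parse_ext4_errors are assumptions based only on the message layouts quoted in this bug; other kernel versions may format the line differently.

```python
import re
from collections import defaultdict

# Matches the message layouts quoted in this report, for example:
#   EXT4-fs error (device sda2): ext4_ext_check_inode:428: inode #3760218: comm dropbox: ...
#   EXT4-fs error (device dm-0): ext4_ext_check_inode: inode #1839235: (comm Xorg) ...
# The pattern is an assumption derived from those samples only.
EXT4_ERROR_RE = re.compile(
    r"EXT4-fs error \(device (?P<device>[^)]+)\): "
    r"(?P<func>\w+)(?::(?P<line>\d+))?: "
    r"inode #(?P<inode>\d+): "
    r"\(?comm (?P<comm>[^:)]+)[:)]"
)


def parse_ext4_errors(text):
    """Map (device, inode) to the set of process names that hit the error."""
    hits = defaultdict(set)
    for match in EXT4_ERROR_RE.finditer(text):
        key = (match.group("device"), int(match.group("inode")))
        hits[key].add(match.group("comm"))
    return dict(hits)
```

Feeding it the captured output of dmesg gives one entry per damaged inode, which can then be passed to debugfs as shown in the description.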
Comment 1 Anonymous Emailer 2011-04-01 14:07:56 UTC
Reply-To: xiaoqiangnk@gmail.com

Can the bug be reproduced?
If so, how?

On Wed, Mar 30, 2011 at 12:47 AM,  <bugzilla-daemon@bugzilla.kernel.org> wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=32182
>
>           Summary: EXT4-fs error: bad header/extent
>           Product: File System
>           Version: 2.5
>    Kernel Version: 2.6.38.1
>          Platform: All
>        OS/Version: Linux
>              Tree: Mainline
>            Status: NEW
>          Severity: high
>          Priority: P1
>         Component: ext4
>        AssignedTo: fs_ext4@kernel-bugs.osdl.org
>        ReportedBy: ramses.rommel@gmail.com
>        Regression: No
>
Comment 2 Ramses VdP 2011-04-01 15:27:22 UTC
I don't have to do anything to reproduce it; the errors show up in dmesg, mostly while programs are starting up. I also don't think there was a specific trigger that caused it, just normal fs usage.
I guess either there is some corruption in my file system that fsck does not fix, or the kernel is reporting errors when it shouldn't.
I can provide more debug info, but I'm not that familiar with the tools for doing so. If you could explain what data you need to understand where the bug sits, I'd be happy to provide it.

HTH,
Ramses
Comment 3 Eric Sandeen 2011-04-01 15:29:37 UTC
If the most recent fsck doesn't find anything, but runtime finds errors, perhaps you can provide a bzip2'd e2image -r of the filesystem in question?  If filenames are sensitive/private, there is an option to scramble them.

That way we can see what it's finding, why it's complaining, and why fsck doesn't seem to care.

It's interesting that the "bad" data is all 0s though.  What kind of storage is this?
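
To illustrate what "magic 0" means here: an inode with the extents flag set (Flags: 0x80000 in the debugfs output) is expected to start its block map with a 12-byte extent header whose magic is 0xF30A. The Python sketch below decodes such a header and applies a few sanity checks. It is a simplified illustration, not the kernel's actual check (which also validates depth and the maximum entry count for the block size); the function names are hypothetical.

```python
import struct

EXT4_EXT_MAGIC = 0xF30A    # expected eh_magic value
EXT4_EXTENTS_FL = 0x80000  # inode flag: the file uses an extent tree

# struct ext4_extent_header: eh_magic, eh_entries, eh_max, eh_depth
# (little-endian 16-bit each), then eh_generation (32-bit); 12 bytes total.
HEADER = struct.Struct("<HHHHI")


def uses_extents(inode_flags):
    """True if the inode flags say the block map is an extent tree."""
    return bool(inode_flags & EXT4_EXTENTS_FL)


def check_extent_header(raw):
    """Return a list of problems found in a raw 12-byte extent header.

    Simplified sketch: only the magic and the entry counts are checked.
    """
    magic, entries, max_entries, _depth, _generation = HEADER.unpack(raw[:HEADER.size])
    problems = []
    if magic != EXT4_EXT_MAGIC:
        problems.append(f"invalid magic - magic {magic:x}")
    if max_entries == 0:
        problems.append("invalid eh_max")
    if entries > max_entries:
        problems.append("invalid eh_entries")
    return problems
```

An all-zero header, as in the reports above, fails the magic check first, which matches the "magic 0, entries 0, max 0(0), depth 0(0)" text in the log.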
Comment 4 Ramses VdP 2011-04-01 22:01:55 UTC
This is about my /home partition, stored on a conventional hard drive (a Seagate Momentus). I'll add the requested image.
Comment 5 Ramses VdP 2011-04-01 23:21:22 UTC
The image was too big to attach, you can find it at http://db.tt/uXGjQPT
Comment 6 Ramses VdP 2011-04-22 10:59:26 UTC
Still present in 2.6.38.3. Does the image I uploaded reveal anything useful? I can provide more info if needed.
Comment 7 Honore Doktorr 2011-06-06 19:59:05 UTC
I hit this problem with 2.6.35.13 after a power outage. The affected inode was empty & had no extents - these commands just returned nothing (was using "debugfs 1.41.14 (22-Dec-2010)"):

# debugfs -R 'cat <122877>' /dev/mapper/Fedora-FedoraSystem
debugfs 1.41.14 (22-Dec-2010)
# debugfs -R 'dump_extents <122877>' /dev/mapper/Fedora-FedoraSystem
debugfs 1.41.14 (22-Dec-2010)

The file using this inode wouldn't respond to shell commands:

# find -inum 122877
/usr/libexec/gdm-simple-slave
# stat gdm-simple-slave
stat: cannot stat `gdm-simple-slave': Input/output error
# rm gdm-simple-slave 
rm: cannot remove `gdm-simple-slave': Input/output error

Multiple fsck.ext4 runs didn't fix things. I eventually made the offending file go away - I've no idea if this procedure is safe, or a good idea! - with these commands.

# debugfs -R 'testi <122877>' /dev/mapper/Fedora-FedoraSystem
debugfs 1.41.14 (22-Dec-2010)
Inode 122877 is marked in use
# debugfs -w -R 'kill_file <122877>' /dev/mapper/Fedora-FedoraSystem
debugfs 1.41.14 (22-Dec-2010)

# debugfs -R 'testi <122877>' /dev/mapper/Fedora-FedoraSystem 
debugfs 1.41.14 (22-Dec-2010)
Inode 122877 is not in use
# ls /usr/libexec/gdm-simple-slave
ls: cannot access /usr/libexec/gdm-simple-slave: Input/output error
# debugfs -w -R 'rm /usr/libexec/gdm-simple-slave' /dev/mapper/Fedora-FedoraSystem 
debugfs 1.41.14 (22-Dec-2010)

# ls /usr/libexec/gdm-simple-slave
ls: cannot access /usr/libexec/gdm-simple-slave: No such file or directory
# ls gdm*
ls: cannot access gdm*: No such file or directory
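
Before trying anything destructive such as kill_file, the read-only debugfs requests used in this thread (stat, testi, dump_extents) can be collected for an inode in one go. The Python sketch below only wraps those commands; the helper names are hypothetical, and it assumes debugfs is on the PATH and is run with enough privileges to read the device. Without -w, debugfs opens the file system read-only.

```python
import subprocess

# Read-only debugfs requests that appear in this thread.
READ_ONLY_REQUESTS = ("stat", "testi", "dump_extents")


def debugfs_argv(request, inode, device):
    """Build the argv for one read-only debugfs request on one inode."""
    if request not in READ_ONLY_REQUESTS:
        raise ValueError(f"not a read-only request: {request}")
    return ["debugfs", "-R", f"{request} <{int(inode)}>", device]


def inspect_inode(inode, device):
    """Run each read-only request and return its stdout keyed by request name."""
    results = {}
    for request in READ_ONLY_REQUESTS:
        proc = subprocess.run(
            debugfs_argv(request, inode, device),
            capture_output=True, text=True, check=False,
        )
        results[request] = proc.stdout
    return results
```

For the inode above, inspect_inode(122877, "/dev/mapper/Fedora-FedoraSystem") would run the same three commands shown in this comment and return their output for attaching to the bug.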
Comment 8 Randall 2011-07-07 05:15:11 UTC
Hi all, is there any further discussion about the issue?
I met the same trouble with 2.6.35.13.

Thanks.
Comment 9 Eric Sandeen 2011-07-07 14:45:15 UTC
Ramses, sorry, I somehow missed your attachment, I'll see what I can see.
Comment 10 Peter Hanecak 2011-08-24 21:50:19 UTC
I'm seeing similar errors during boot:

[   91.652208] EXT4-fs error (device dm-0): ext4_ext_check_inode: inode #1839235: (comm Xorg) bad header/extent: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)
[   91.940546] EXT4-fs error (device dm-0): ext4_ext_check_inode: inode #1839235: (comm Xorg) bad header/extent: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)
[   91.954354] EXT4-fs error (device dm-0): ext4_ext_check_inode: inode #1839235: (comm Xorg) bad header/extent: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)
[   93.449371] EXT4-fs error (device dm-0): ext4_ext_check_inode: inode #1839235: (comm Xorg) bad header/extent: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)

Some additional information:

# uname -r
2.6.35.14-95.fc14.x86_64
Comment 11 Alan 2012-08-20 15:23:05 UTC
If this is still seen on modern kernels please re-open/update thanks
Comment 12 Peter Hanecak 2012-08-20 20:06:46 UTC
Hello,

I'm still seeing this:

$ dmesg | grep "EXT4-fs error"
[   70.498238] EXT4-fs error (device dm-0): ext4_ext_check_inode:412: inode #1839235: comm Xorg: bad header/extent: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)
[   70.659036] EXT4-fs error (device dm-0): ext4_ext_check_inode:412: inode #1839235: comm Xorg: bad header/extent: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)
[   70.744199] EXT4-fs error (device dm-0): ext4_ext_check_inode:412: inode #1839235: comm Xorg: bad header/extent: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)
[   72.746596] EXT4-fs error (device dm-0): ext4_ext_check_inode:412: inode #1839235: comm Xorg: bad header/extent: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)

$ uname -r
3.4.7-1.fc16.x86_64


Same PC and filesystem as last time. I wonder whether copying to a new filesystem would help, because since last time I have updated Xorg several times, replacing the binary (usual Fedora updates).
Comment 13 Theodore Tso 2012-08-21 00:44:56 UTC
The error messages listed above just mean that the file system has been corrupted.

Have you tried rebooting and letting e2fsck repair the file system?   If you can reliably cause the file system to get corrupted in this way, please open a new bug report with the reproduction instructions.

But the mere fact that you see these messages doesn't mean that the kernel is buggy....
Comment 14 Peter Hanecak 2012-08-21 20:33:30 UTC
Reboot + fsck has been done quite a lot of times since it started occurring. Some bugs do get found and fixed, but the message never goes away. I have no further info with which to open a new bug.

Any suggestions as to what I might try to either fix the inode or compile some more information so as to file bug report? Thanks in advance.
Comment 15 Theodore Tso 2012-08-21 21:34:33 UTC
Could you please open a new bug report with the e2fsck logs, and dumpe2fs output, and lots of detail about your hardware setup, and whether you are using LVM, software raid, etc.    We need to rule out all sorts of things, including hardware bugs, problems with partition tables, etc.   Since no one else is reporting this I have to rule out hardware bugs (disk bugs, memory bugs, etc.) as well as stupid sysadmin configuration problems.   (This isn't to impugn your skills, but a large number of bug reports I get end up not being kernel bugs at all.)
Comment 16 Adam Surak 2015-03-05 14:32:44 UTC
We experience this problem, and the FS gets remounted read-only. BuildServer is our write-heavy application; the server has RAID0 using MD. We have seen this problem on servers with the Ubuntu kernel (3.13.x) and MD raid (Samsung SSDs) and also on servers with the OVH custom-built kernel (3.10) and LSI HW raid (Intel S3500 SSDs), although we see the issue more often on the servers with MD raid.

Syslog
======
Mar  5 13:30:01 c5-use-1 CRON[12504]: (root) CMD (/usr/bin/run-chef-client.sh)
Mar  5 13:30:35 c5-use-1 kernel: [10890197.121882] EXT4-fs error (device md3): ext4_ext_remove_space:2935: inode #26869814: comm BuildServer: pblk 107512509 bad header/extent: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)
Mar  5 13:30:35 c5-use-1 kernel: [10890197.205494] Aborting journal on device md3-8.
Mar  5 13:30:35 c5-use-1 kernel: [10890197.232881] EXT4-fs (md3): Remounting filesystem read-only
Mar  5 13:30:35 c5-use-1 kernel: [10890197.244669] EXT4-fs error (device md3): ext4_journal_check_start:56: Detected aborted journal
Mar  5 13:30:35 c5-use-1 kernel: [10890197.244673] EXT4-fs (md3): Remounting filesystem read-only
Mar  5 13:30:35 c5-use-1 kernel: [10890197.342604] EXT4-fs error (device md3) in ext4_ext_remove_space:3006: IO failure
Mar  5 13:30:35 c5-use-1 kernel: [10890197.398828] EXT4-fs error (device md3) in ext4_ext_truncate:4544: IO failure
Mar  5 13:30:35 c5-use-1 kernel: [10890197.457592] EXT4-fs error (device md3) in ext4_reserve_inode_write:4874: Journal has aborted
Mar  5 13:30:35 c5-use-1 kernel: [10890197.521148] EXT4-fs error (device md3) in ext4_truncate:3797: Journal has aborted
Mar  5 13:30:35 c5-use-1 kernel: [10890197.588798] EXT4-fs error (device md3) in ext4_reserve_inode_write:4874: Journal has aborted
Mar  5 13:30:35 c5-use-1 kernel: [10890197.659161] EXT4-fs error (device md3) in ext4_orphan_del:2685: Journal has aborted
Mar  5 13:30:35 c5-use-1 kernel: [10890197.732682] EXT4-fs error (device md3) in ext4_reserve_inode_write:4874: Journal has aborted
Mar  5 13:31:01 c5-use-1 CRON[13879]: (root) CMD ( /usr/bin/netstat.sh > /tmp/netstat.json)

Debugfs
=======
debugfs -R 'stat <26869814>' /dev/md3
debugfs 1.42.9 (4-Feb-2014)
Inode: 26869814   Type: regular    Mode:  0644   Flags: 0x80000
Generation: 2509781484    Version: 0x00000000:00000001
User:  1001   Group:  1001   Size: 0
File ACL: 0    Directory ACL: 0
Links: 0   Blockcount: 816256
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x54f85a7b:0fc8be80 -- Thu Mar  5 13:30:35 2015
 atime: 0x54ee0aac:127f86a8 -- Wed Feb 25 17:47:24 2015
 mtime: 0x54ee0ab5:693a9f7c -- Wed Feb 25 17:47:33 2015
crtime: 0x54ee0aac:127f86a8 -- Wed Feb 25 17:47:24 2015
dtime: 0x0194088b -- Tue Nov  3 11:12:11 1970
Size of extra inode fields: 28
EXTENTS:
(ETB0):107512509

uname -a
========
Linux c5-use-1 3.13.0-37-generic #64-Ubuntu SMP Mon Sep 22 21:28:38 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

mount
=====
/dev/md3 on /home type ext4 (rw,noatime,nodiratime,errors=remount-ro)

I also have SMART output, dumpe2fs, lspci, mdadm, dpkg, dmesg and e2fsck outputs, but the files are quite extensive, so I've put them at http://www.surak.eu/fs-ro.tar.gz

Let me know if anything more is needed about the config of the server.