Bug 13201

Summary: kernel BUG at fs/ext4/extents.c:2737
Product: File System
Reporter: Franco Broi (franco)
Component: ext4
Assignee: fs_ext4 (fs_ext4)
Status: CLOSED OBSOLETE
Severity: high
CC: alan, grahame, matt, sandeen, tytso, vaurora
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.29.1
Subsystem:
Regression: No
Bisected commit-id:
Attachments: messages file

Description Franco Broi 2009-04-28 09:26:21 UTC
I got this while testing ext4 on an external RAID system. The setup has 4 identical RAID systems, each with a single 13TB filesystem. Only one of the 4 failed the test, which was simply to write 8GB files until the disk fills up.
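The test amounts to nothing more elaborate than writing fixed-size files until the filesystem is full. A minimal sketch of such a fill test follows; the mount point, file naming, and chunk size are illustrative assumptions, not the actual test script:

#!/usr/bin/env python
# Minimal disk-fill test sketch: write 8GB files until the disk is full.
# MOUNTPOINT, the file names, and the 1MB chunk are illustrative assumptions.
import errno
import os

MOUNTPOINT = "/data143"          # hypothetical test mount point
FILE_SIZE = 8 * 1024**3          # 8GB per file
CHUNK = b"\0" * (1024 * 1024)    # write in 1MB chunks

def fill_disk():
    n = 0
    while True:
        path = os.path.join(MOUNTPOINT, "testfile.%06d" % n)
        try:
            with open(path, "wb") as f:
                written = 0
                while written < FILE_SIZE:
                    f.write(CHUNK)
                    written += len(CHUNK)
        except OSError as e:
            # ENOSPC means the disk filled up, which ends the test normally;
            # anything else (e.g. EIO, as seen in this report) is a failure.
            if e.errno == errno.ENOSPC:
                print("disk full after %d complete files" % n)
                return
            raise
        n += 1

if __name__ == "__main__":
    fill_disk()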

The complete messages file is attached.

Apr 25 01:59:38 echo19 kernel: EXT4-fs error (device dm-2): __ext4_get_inode_loc: unable to read inode block - inode=761860, block=3612686232
Apr 25 01:59:38 echo19 kernel: EXT4-fs error (device dm-2) in ext4_reserve_inode_write: IO failure
Apr 25 01:59:38 echo19 kernel: mpage_da_map_blocks block allocation failed for inode 761860 at logical offset 699276 with max blocks 1024 with error -5
Apr 25 01:59:38 echo19 kernel: This should not happen.!! Data will be lost
Apr 25 01:59:38 echo19 kernel: ------------[ cut here ]------------
Apr 25 01:59:38 echo19 kernel: kernel BUG at fs/ext4/extents.c:2737!

The filesystem was totally inaccessible, so I reset the system. On reboot, the filesystem couldn't be mounted - bad superblock.

I ran fsck a few times before I could remount the filesystem; all the directories were lost, but the test files were intact in lost+found.

I can't see any errors anywhere that might indicate this is a hardware problem, but these are brand-new systems using SAS host connections, which we haven't used before.

I've remade the broken filesystem and restarted the test on it and 11 other identical filesystems; I'll let you know if the problem recurs.
Comment 1 Franco Broi 2009-04-28 09:36:55 UTC
Created attachment 21153 [details]
messages file.

messages file for previous post.
Comment 2 Franco Broi 2009-04-29 01:06:09 UTC
Of the 12 tests, 2 produced errors.

EXT4-fs error (device dm-5): ext4_mb_generate_buddy: EXT4-fs: group 9: 32768 blocks in bitmap, 1023 in gd

The filesystem seems OK, I can ls the test files.
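For context, the mballoc message above comes from a consistency check: ext4 counts the free blocks recorded in the group's block bitmap and compares the total against the free-block count stored in the group descriptor. A rough sketch of that comparison, assuming 32768 blocks per group; the function and variable names are illustrative, not the kernel's, though the numbers mirror the dm-5 message above:

# Sketch of the check behind "N blocks in bitmap, M in gd": count the
# free (zero) bits in a block-group bitmap and compare the total against
# the free-block count the group descriptor claims.

BLOCKS_PER_GROUP = 32768                 # one bit per block -> 4096-byte bitmap

def free_blocks_in_bitmap(bitmap):
    """Count zero bits (free blocks) in a raw bitmap."""
    used = sum(bin(byte).count("1") for byte in bitmap)
    return BLOCKS_PER_GROUP - used

bitmap = bytes(BLOCKS_PER_GROUP // 8)    # a zeroed bitmap reads as all-free
gd_free_count = 1023                     # what the group descriptor claims

free = free_blocks_in_bitmap(bitmap)
if free != gd_free_count:
    print("group 9: %d blocks in bitmap, %d in gd" % (free, gd_free_count))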

EXT4-fs error (device dm-3): ext4_mb_generate_buddy: EXT4-fs: group 0: 32768 blocks in bitmap, 970 in gd
EXT4-fs error (device dm-3): ext4_mb_generate_buddy: EXT4-fs: group 0: 32768 blocks in bitmap, 32766 in gd
EXT4-fs error (device dm-3): ext4_init_block_bitmap: Checksum bad for group 1
EXT4-fs error (device dm-3): ext4_mb_generate_buddy: EXT4-fs: group 1: 0 blocks in bitmap, 1023 in gd
EXT4-fs error (device dm-3): ext4_dx_find_entry: bad entry in directory #15: directory entry across blocks - offset=28672, inode=0, rec_len=65536, name_len=0
EXT4-fs error (device dm-3): ext4_add_entry: bad entry in directory #15: directory entry across blocks - offset=0, inode=0, rec_len=65536, name_len=0
EXT4-fs error (device dm-3): htree_dirblock_to_tree: bad entry in directory #2: directory entry across blocks - offset=0, inode=0, rec_len=65536, name_len=0
EXT4-fs error (device dm-3): htree_dirblock_to_tree: bad entry in directory #2: directory entry across blocks - offset=0, inode=0, rec_len=65536, name_len=0
EXT4-fs error (device dm-3): htree_dirblock_to_tree: bad entry in directory #2: directory entry across blocks - offset=0, inode=0, rec_len=65536, name_len=0
EXT4-fs error (device dm-3): htree_dirblock_to_tree: bad entry in directory #2: directory entry across blocks - offset=0, inode=0, rec_len=65536, name_len=0
EXT4-fs error (device dm-3): htree_dirblock_to_tree: bad entry in directory #2: directory entry across blocks - offset=0, inode=0, rec_len=65536, name_len=0

Although df looks OK:
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/vgdata--143-data143
                     13456415384 13232157624 224257760  99% /data143
# ls /data143

Produces no output.

At this point I will need to switch back to ext3 so that I can get this disk into production, but I do have a small window to run some more tests if anyone has any ideas.
Comment 3 Eric Sandeen 2009-04-29 03:15:31 UTC
Franco, sorry we haven't gotten back with suggestions on this.  It looks like you have hit a couple of different end results.  We've had a few reports of corruption on larger filesystems, which makes us wonder if there might be a problem somewhere above 8T...

The current upstream git tree (or the 2.6.30-rc3-git5 prepatch) has more extent validity checking in it; if you do have the time for another test, running on that codebase may yield more info, depending on where the problem lies.

-Eric
Comment 4 Franco Broi 2009-04-30 03:00:48 UTC
I ran a test overnight using 2.6.30-rc3-git5 and it didn't fail. Not sure if this is a good or bad thing.

I've deleted the files and started the test again.

By the way, deleting files with ext4 is lightning fast; it only takes about 5 minutes to delete 13TB! Again, not sure if this is a good or bad thing; it doesn't give you much time to hit Ctrl-C...
Comment 5 Franco Broi 2009-04-30 09:39:21 UTC
I've now got filesystem corruption with 2.6.30-rc3-git5, looks pretty much the same as before.

Apr 30 17:30:56 echo20 kernel: EXT4-fs error (device dm-3): ext4_mb_generate_buddy: EXT4-fs: group 0: 32768 blocks in bitmap, 23495 in gd
Apr 30 17:30:56 echo20 kernel: EXT4-fs error (device dm-3): ext4_mb_mark_diskspace_used: Allocating block 1024 in system zone of 0 group

When I do an ls in the test directory I get lots of Input/output errors:

EXT4-fs error (device dm-3): ext4_lookup: deleted inode referenced: 127
EXT4-fs error (device dm-3): ext4_lookup: deleted inode referenced: 358
EXT4-fs error (device dm-3): ext4_lookup: deleted inode referenced: 196

Anything you want me to try?
Comment 6 Theodore Tso 2009-05-19 18:04:05 UTC
Could you try replicating this problem in 2.6.30-rc6?   We fixed a race condition in i_cached_extents that could very well have caused your problem.   I'm hoping it will close this and a few other mystery bug reports we've had over the past couple of months.  (The bug is an old one, but we had struggled to find a reliable reproduction case.)
Comment 7 Franco Broi 2009-05-23 10:10:50 UTC
I won't be able to recreate the original test conditions, but I'll run a test with a single large filesystem within a couple of weeks.
Comment 8 Franco Broi 2009-06-05 00:51:52 UTC
I haven't been able to recreate the problem using 2.6.30-rc8, but the test conditions aren't identical to before. Would it make a difference that only a single filesystem is being written to, and not 4 simultaneously as in the original tests?
Comment 9 Theodore Tso 2009-06-08 16:49:28 UTC
If this is the same problem as the one we fixed, which had identical symptoms, what matters is multiple processes/threads writing to the same file at the same time.  People using NFS or Samba on a backup server seemed to be the most common scenario for triggering this (admittedly very hard to reproduce) bug.   We finally got lucky in that someone had a setup which allowed reliable reproduction of the bug, so we could finally sink our teeth into it.

So if what you saw was the same as the bug we fixed in 2.6.30-rc6, no, it shouldn't make a difference.   If it is a completely different bug, then of course all bets are off.  In general, though, whether you are writing to one filesystem or to 4 shouldn't make a difference, except that it might change the timing needed to hit a race condition. The bug we found and fixed was highly timing-dependent: even after we found the problem, we couldn't come up with a reliable reproduction case, even though the problem was obvious on paper, and the one user who could reliably reproduce it reported that it went away once the patch was applied. IIRC, Eric finally put a delay into the code to widen the race window to the point where he could replicate it.
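To illustrate the debugging technique described above: a check-then-act race that almost never fires can be made to fire on nearly every run by inserting an artificial delay between the read and the write, i.e. by widening the race window. This is a purely hypothetical sketch, not ext4's actual i_cached_extents code:

# Hypothetical sketch of widening a race window with a delay (not the
# actual ext4 fix). Two threads do an unlocked read-modify-write on shared
# state; the sleep between the read and the write stands in for the added
# delay, turning a window a few instructions wide into one that loses an
# update on essentially every run.
import threading
import time

shared = {"count": 0}                 # stands in for shared cached state
lock = threading.Lock()

def update(widen_race, use_lock):
    if use_lock:
        with lock:                    # the fix: serialize the update
            shared["count"] += 1
        return
    snapshot = shared["count"]        # read
    if widen_race:
        time.sleep(0.01)              # artificially widen the race window
    shared["count"] = snapshot + 1    # write back a possibly stale value

def run(widen_race, use_lock=False):
    shared["count"] = 0
    threads = [threading.Thread(target=update, args=(widen_race, use_lock))
               for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["count"]

print("narrow window:", run(False))        # usually 2; the race rarely fires
print("widened window:", run(True))        # almost always 1: a lost update
print("with locking:  ", run(True, True))  # always 2; the race is gone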
Comment 10 Franco Broi 2009-06-09 05:10:31 UTC
(In reply to comment #9)
> If this is the same problem as the one we fixed, which had identical
> symptoms, what matters is multiple processes/threads writing to the same
> file at the same time.

Then it doesn't sound like it's the same bug. My tests are very simple: they just keep writing 8GB files until the disk fills up. There is no concurrent access to files or even to the filesystem, and the machines are completely standalone.
Comment 11 Valerie Aurora 2009-08-26 18:11:07 UTC
Given that the bug appears to be fixed, and we can't reproduce the original conditions or get more data, it seems like we should close this bug.