Bug 61631 - kernel BUG at fs/ext4/super.c:818 umounting md raid6 volume
Summary: kernel BUG at fs/ext4/super.c:818 umounting md raid6 volume
Status: RESOLVED CODE_FIX
Alias: None
Product: File System
Classification: Unclassified
Component: ext4
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: fs_ext4@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-09-18 22:05 UTC by Jim Faulkner
Modified: 2013-10-20 22:32 UTC
CC List: 2 users

See Also:
Kernel Version: 3.11.1
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
3.11.1 .config (23.54 KB, application/x-gzip)
2013-09-18 22:06 UTC, Jim Faulkner
dmesg including BUG message (213.91 KB, text/plain)
2013-09-18 22:07 UTC, Jim Faulkner
dumpe2fs output before the umount and bug_on (1.93 KB, text/plain)
2013-10-12 00:17 UTC, Jim Faulkner
dumpe2fs output after the umount and bug_on (1.91 KB, text/plain)
2013-10-12 00:17 UTC, Jim Faulkner
dmesg output after umount and bug_on (253.21 KB, text/plain)
2013-10-12 00:19 UTC, Jim Faulkner
debugfs -R 'stat <20035337>' /dev/md1 > debugfs.1 after umount/bug_on (582 bytes, text/plain)
2013-10-12 00:21 UTC, Jim Faulkner

Description Jim Faulkner 2013-09-18 22:05:53 UTC
Under recent kernels, including 3.11.1, I've been seeing kernel BUGs when umounting a large md raid6 volume.  umount itself gives me a segfault.  I will attach my config.gz and dmesg, which includes the BUG message.
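
[Editorial note, not part of the original report: the file and line in the BUG message can be looked up directly in the matching release tag. A minimal sketch, assuming a local clone of the linux-stable tree at ~/src/linux-stable (that path is an assumption):

# Check out the reporter's kernel version and print the context around
# fs/ext4/super.c:818, the line named in the BUG message for 3.11.1.
git -C ~/src/linux-stable checkout v3.11.1
sed -n '810,825p' ~/src/linux-stable/fs/ext4/super.c
]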
Comment 1 Jim Faulkner 2013-09-18 22:06:32 UTC
Created attachment 108831 [details]
3.11.1 .config
Comment 2 Jim Faulkner 2013-09-18 22:07:22 UTC
Created attachment 108841 [details]
dmesg including BUG message
Comment 3 Jim Faulkner 2013-09-18 22:10:39 UTC
I've done a full fsck -f -v -y on this filesystem using e2fsck 1.42.7 (21-Jan-2013).  umounts still result in a BUG.
Comment 4 Jim Faulkner 2013-09-18 22:27:02 UTC
FYI, this bug only appears after some period of usage.  After a reboot, I can umount without error.  After a reboot and an rsync of the gentoo portage tree to the filesystem, I can still umount without error.  It seems the filesystem only fails to unmount after the filesystem has been in use for a while.  However, it is quite consistent in failing to umount after several days of usage.

Some more details on this filesystem:

thud ~ # df -h  | grep md1
/dev/md1        7.2T  6.7T  488G  94% /nfs
thud ~ # df -i | grep md1
/dev/md1       484597760 41868439 442729321    9% /nfs
thud ~ # tune2fs -l /dev/md1
tune2fs 1.42.7 (21-Jan-2013)
Filesystem volume name:   <none>
Last mounted on:          /nfs
Filesystem UUID:          a9f907fb-4d04-4a28-8164-a2ac4d5a26f4
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash 
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Remount read-only
Filesystem OS type:       Linux
Inode count:              484597760
Block count:              1938377664
Reserved block count:     0
Free blocks:              127846725
Free inodes:              442729321
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      561
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Flex block group size:    16
Filesystem created:       Sat Oct  9 08:53:46 2010
Last mount time:          Wed Sep 18 18:25:38 2013
Last write time:          Wed Sep 18 18:25:38 2013
Mount count:              4
Maximum mount count:      23
Last checked:             Wed Sep 11 04:36:58 2013
Check interval:           15552000 (6 months)
Next check after:         Mon Mar 10 04:36:58 2014
Lifetime writes:          18 TB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      1584912d-08c0-46a6-a652-60800c55c4df
Journal backup:           inode blocks
thud ~ #
Comment 5 Theodore Tso 2013-09-19 19:20:38 UTC
On Wed, Sep 18, 2013 at 10:27:02PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> FYI, this bug only appears after some period of usage.  After a reboot, I can
> umount without error.  After a reboot and an rsync of the gentoo portage tree
> to the filesystem, I can still umount without error.  It seems the filesystem
> only fails to unmount after the filesystem has been in use for a while. 
> However, it is quite consistent in failing to umount after several days of
> usage.

Hmm, can you say a bit more about what sort of files you store on the
file system and how the file system gets used?  What it looks like is
going on is that there is a whole series of inodes that have been left
stalled on the orphaned inode list.  By the time we reach that point in
the unmount, the in-memory orphan list should have been cleared.

So here are a couple of things that would be really useful to try.

First of all, if you could try to reproduce the crash, and then before
you do the umount, run "dumpe2fs -h /dev/md1 > ~/dumpe2fs.md1.save;
sync".  Then if the system crashes with the same BUG_ON, send us the
dumpe2fs.md1.save, along with the console output.

The thing which I am trying to determine is whether the on-disk
orphaned inode list is set at the time of the umount.  If it is set,
it would be interesting if you could run sync, wait for things to
settle, check to see if dumpe2fs shows that the orphaned list is
empty, and then see if you can trigger the crash.

The other thing that would be useful is to grab one of the inodes
listed in the console, i.e.:

[642473.269223]   inode md1:20289506 at ffff88000499aed0: mode 100644, nlink 0, next 20367069

... and then run the command: "debugfs -R 'stat <20289506>' /dev/md1"

What I am interested in is the inode's atime/ctime/dtime.  It would be
interesting to see if the file was deleted right before the umount was
attempted.

Thanks for the bug report!  Hopefully we'll be able to figure out why
you are seeing this.

Cheers,

					- Ted
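
[Editorial note: the capture steps above, condensed into a sketch. The device, mount point, and inode number are placeholders taken from the examples in this comment; substitute whatever your own dmesg reports.

# 1. Before attempting the umount: save the superblock summary, then sync.
dumpe2fs -h /dev/md1 > ~/dumpe2fs.md1.save
sync

# 2. Attempt the umount; if it hits the BUG_ON, save the kernel log.
umount /nfs
dmesg > ~/dmesg.after-umount.txt

# 3. Inspect one of the inodes dmesg lists on the orphan list,
#    e.g. "inode md1:20289506 ... nlink 0, next 20367069".
debugfs -R 'stat <20289506>' /dev/md1 > ~/debugfs.20289506.txt
]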
Comment 6 Jan Kara 2013-09-20 22:41:39 UTC
Since the mountpoint is named /nfs, I suppose you are exporting the filesystem via NFS, right? There has been a bug in NFS server code leading to exactly the problems you are describing. Commit bf7bd3e98be5c74813bee6ad496139fb0a011b3b should fix the issue (in 3.12-rc1).
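
[Editorial note: a quick way to check whether a given kernel carries that commit is to ask git. A sketch, assuming local clones at ~/src/linux and ~/src/linux-stable (paths are assumptions); stable backports get new hashes, so for stable releases search by the subject line rather than the mainline hash.

# Show the mainline commit's subject line.
git -C ~/src/linux show -s --oneline bf7bd3e98be5c74813bee6ad496139fb0a011b3b

# List the mainline tags that already contain it (should include v3.12-rc1).
git -C ~/src/linux tag --contains bf7bd3e98be5c74813bee6ad496139fb0a011b3b

# For a stable series, search the range by the subject printed above:
# git -C ~/src/linux-stable log --oneline v3.11..v3.11.5 --grep='<subject>'
]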
Comment 7 Jim Faulkner 2013-09-21 02:08:04 UTC
(In reply to Theodore Tso from comment #5)

> Hmm, can you say a bit more about what sort of files you store on the
> file system and how the file system gets used?  What looks like is
> going on is that there is a whole series of inodes that have been left
> stalled on the orpaned inode list.  By the time we reach that point in
> the unmount, the in-memory orphan list should have been cleared.

This filesystem holds a wide variety of files.  A fair amount (and the majority of the disk usage) is large mp3 (~5 MB) and video (500 MB to 4 GB) files.  However, most I/O happens on small files.  It hosts the gentoo portage tree, which I update regularly, as well as regular rdiff-backups of other servers on my network.  Both of these involve a lot of reading and modifying of small files.

(In reply to Jan Kara from comment #6)
> Since the mountpoint is named /nfs, I suppose you are exporting the
> filesystem via NFS, right? There has been a bug in NFS server code leading
> to exactly the problems you are describing. Commit
> bf7bd3e98be5c74813bee6ad496139fb0a011b3b should fix the issue (in 3.12-rc1).

Yes, I serve large video and mp3 files over NFS.  But in addition, the gentoo portage tree on this filesystem is NFS-exported to all gentoo hosts on my network.  Any gentoo portage operation (emerge --metadata as well as updates) results in a lot of NFS I/O on many small files.
Comment 8 Jim Faulkner 2013-09-21 02:11:27 UTC
I suspect commit bf7bd3e98be5c74813bee6ad496139fb0a011b3b will indeed fix this, since it sounds exactly like my issue.

However, before applying any patch, I'll do the debug tests that Ted suggests so we can be sure.  I need some uptime to be sure I can reproduce this bug, so I'll report back in a few days.
Comment 9 Jim Faulkner 2013-10-12 00:14:36 UTC
I ran dumpe2fs -h followed by sync a few times, but was not able to trigger the bug, nor did the dumpe2fs output differ between syncs.

umount did trigger the bug, however, and the dumpe2fs output was different after the umount.

After triggering the bug, I was able to capture the debugfs output of one of the listed inodes.
Comment 10 Jim Faulkner 2013-10-12 00:17:12 UTC
Created attachment 110761 [details]
dumpe2fs output before the umount and bug_on
Comment 11 Jim Faulkner 2013-10-12 00:17:48 UTC
Created attachment 110771 [details]
dumpe2fs output after the umount and bug_on
Comment 12 Jim Faulkner 2013-10-12 00:19:48 UTC
Created attachment 110781 [details]
dmesg output after umount and bug_on
Comment 13 Jim Faulkner 2013-10-12 00:21:25 UTC
Created attachment 110791 [details]
debugfs -R 'stat <20035337>' /dev/md1 > debugfs.1 after umount/bug_on

Please let me know if there's any more information I can provide.
Comment 14 Jan Kara 2013-10-15 08:04:15 UTC
Umm, no, it really looks like a ton of inodes remain on the orphan list, and I really think this is the NFS bug. So just try a kernel with that fix (bf7bd3e98be5c74813bee6ad496139fb0a011b3b) applied. Thanks!
Comment 15 Jim Faulkner 2013-10-20 02:10:45 UTC
Looks like bf7bd3e98be5c74813bee6ad496139fb0a011b3b indeed fixed it.  I ran 3.11.5 (which I believe includes that patch) for a few days, doing normal activities such as NFS I/O and rdiff-backups.  I was able to umount without problems.  Thanks for fixing this!
Comment 16 Jan Kara 2013-10-20 22:32:31 UTC
Thanks for confirmation.
