Bug 10802 - BUG at fs/hfs/bnode.c:416 with corrupted image
Summary: BUG at fs/hfs/bnode.c:416 with corrupted image
Status: CLOSED OBSOLETE
Alias: None
Product: File System
Classification: Unclassified
Component: HFS/HFSPLUS
Hardware: All Linux
Importance: P1 normal
Assignee: Christoph Hellwig
 
Reported: 2008-05-27 01:21 UTC by Eric Sesterhenn
Modified: 2012-05-21 15:36 UTC

See Also:
Kernel Version: 2.6.27-rc6
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Corrupted Image (256.33 KB, application/x-bzip)
2008-05-27 01:22 UTC, Eric Sesterhenn

Description Eric Sesterhenn 2008-05-27 01:21:23 UTC
Latest working kernel version: - 
Earliest failing kernel version: 2.6.26-rc2 (didn't test older versions)
Distribution: Ubuntu
Hardware Environment: Pentium III
Problem Description:

Running fsfuzzer to produce a corrupted image and then running some
tests on it produces the oops below. fsfuzzer only mounts the image
and runs the following commands:

        echo "+++ Checking dir..."
        ls /media/test >/dev/null 2>&1
        ls -Z /media/test >/dev/null 2>&1
        echo "+++ Making files..."
        touch /media/test/file >/dev/null 2>&1
        ln -s /media/test/file /media/test/fileb >/dev/null 2>&1
        mkdir /media/test/dir1 >/dev/null 2>&1
        echo "+++ Checking stat..."
        stat /media/test/file >/dev/null 2>&1
        stat /media/test/fileb >/dev/null 2>&1
        stat /media/test/dir1 >/dev/null 2>&1
        echo "+++ Writing to files..."
        echo "test" > /media/test/file
        cat /media/test/file > /dev/null 2>&1
        chcon -u user_u /media/test/file 2>/dev/null
        chown nobody,nobody COPYING 2>/dev/null
        chmod 0600 COPYING 2>/dev/null
        echo "+++ Reading from files..."
        cat /media/test/* > /dev/null 2>&1
        echo "+++ device files..."
        rm /media/test/null > /dev/null 2>&1
        mknod /media/test/null c 1 3 > /dev/null 2>&1
        echo "+++ Writing to dirs..."
        cat /media/test/file > /media/test/dir1 2>/dev/null
        cp /media/test/file /media/test/dir1 >/dev/null 2>&1

After the last command the oops appears and cp segfaults:

[  172.512183] ------------[ cut here ]------------
[  172.512357] kernel BUG at fs/hfs/bnode.c:416!
[  172.512375] invalid opcode: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
[  172.512375] Modules linked in:
[  172.512375] 
[  172.512375] Pid: 4243, comm: cp Not tainted (2.6.26-rc4 #44)
[  172.512375] EIP: 0060:[<c0252c1f>] EFLAGS: 00010286 CPU: 0
[  172.512375] EIP is at hfs_bnode_create+0x13f/0x150
[  172.512375] EAX: cafb0000 EBX: 00000000 ECX: 00000001 EDX: 00000000
[  172.512375] ESI: cae54a1c EDI: cacc25a0 EBP: cafb0cec ESP: cafb0cc8
[  172.512375]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
[  172.512375] Process cp (pid: 4243, ti=cafb0000 task=caf7eeb0 task.ti=cafb0000)
[  172.512375] Stack: cafb0cd4 c06c85c7 00000001 cae549b0 c025229d cae54a1c 00000000 00000001 
[  172.512375]        cacc27a0 cafb0d34 c0254128 cafb0d10 cae549b0 00000246 cacc2750 00000000 
[  172.512375]        00000100 c8b0a000 000003f2 cafb0d4a cafb0d24 c011b552 00000002 00f80d3c 
[  172.512375] Call Trace:
[  172.512375]  [<c06c85c7>] ? _spin_unlock+0x27/0x50
[  172.512375]  [<c025229d>] ? hfs_bnode_put+0x7d/0x90
[  172.512375]  [<c0254128>] ? hfs_bmap_alloc+0x328/0x350
[  172.512375]  [<c011b552>] ? kmap+0x42/0x70
[  172.512375]  [<c0253060>] ? hfs_bnode_split+0x20/0x360
[  172.512375]  [<c0252411>] ? hfs_bnode_read+0x41/0x50
[  172.512375]  [<c0253446>] ? hfs_brec_insert+0xa6/0x320
[  172.512375]  [<c025484b>] ? hfs_cat_create+0x10b/0x2d0
[  172.512375]  [<c02554ac>] ? hfs_create+0x3c/0x80
[  172.512375]  [<c01883e4>] ? vfs_create+0xa4/0x100
[  172.512375]  [<c018b542>] ? do_filp_open+0x672/0x770
[  172.512375]  [<c06c85c7>] ? _spin_unlock+0x27/0x50
[  172.512375]  [<c017e489>] ? do_sys_open+0x49/0xe0
[  172.512375]  [<c017e589>] ? sys_open+0x29/0x40
[  172.512375]  [<c0103d7d>] ? sysenter_past_esp+0x6a/0xb1
[  172.512375]  =======================
[  172.512375] Code: 18 5b 5e 5f 5d c3 89 d8 bb fb ff ff ff e8 1a f6 ff ff 89 d8 83 c4 18 5b 5e 5f 5d c3 83 c4 18 bb f4 ff ff ff 89 d8 5b 5e 5f 5d c3 <0f> 0b eb fe 8d b6 00 00 00 00 8d bc 27 00 00 00 00 55 89 e5 83 
[  172.512375] EIP: [<c0252c1f>] hfs_bnode_create+0x13f/0x150 SS:ESP 0068:cafb0cc8
[  172.638459] ---[ end trace 7ddc9efd931d077f ]---
Comment 1 Eric Sesterhenn 2008-05-27 01:22:00 UTC
Created attachment 16289
Corrupted Image
Comment 2 Alan 2008-09-23 07:12:17 UTC
Verified
Comment 3 Luciano Chavez 2009-05-20 17:57:55 UTC
Hello,

I am wondering if there have been any additional findings on this bug. A test team reported basically the same problem to my group internally while they were testing an enterprise Linux release originally based on a 2.6.27 kernel. I haven't had a chance to investigate this too deeply yet, but this is what I have so far.

I have written a couple of debug patches to find out why the code encounters a duplicate bnode, but I still have to refine the debug output to isolate the cause.

One part of the debug patch replaced the BUG_ON() with a test and an error that hfs_bnode_create() returns when it encounters the error condition. I mainly did this because whenever we hit the BUG_ON(), which kills the calling process, we seem to leave a btree lock held that was initially acquired by the call to hfs_find_init() in hfs_cat_create(). This in turn causes subsequent accesses to hang and also prevents unmounting the fs.

--- linux-2.6.27.19-5/fs/hfs/bnode.c.orig	2009-05-08 10:42:30.000000000 -0500
+++ linux-2.6.27.19-5/fs/hfs/bnode.c	2009-05-08 10:56:01.000000000 -0500
@@ -413,7 +413,10 @@ struct hfs_bnode *hfs_bnode_create(struc
 	spin_lock(&tree->hash_lock);
 	node = hfs_bnode_findhash(tree, num);
 	spin_unlock(&tree->hash_lock);
-	BUG_ON(node);
+	if (unlikely(node)) {
+		printk(KERN_ERR "hfs: bnode %d already exists in B*Tree!\n", num);
+		return ERR_PTR(-EEXIST);
+	}
 	node = __hfs_bnode_create(tree, num);
 	if (!node)
 		return ERR_PTR(-ENOMEM);

Replacing the BUG_ON() is likely not the real fix, but it does avoid hangs on subsequent accesses and also allows the fs to be unmounted. This is because the error propagates back to hfs_cat_create(), allowing it to invoke hfs_find_exit() on the error path, which releases the tree lock.
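
To make the locking argument concrete, here is a minimal user-space sketch of the pattern, not kernel code. Every name in it (model_tree, model_find_init(), model_find_exit(), model_bnode_create(), model_cat_create(), MODEL_NODES) is a hypothetical stand-in that I made up for illustration. The lock is modeled as a plain flag and the bnode hash as a small array, and the real call chain between hfs_cat_create() and hfs_bnode_create() is collapsed into a single call.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

#define MODEL_NODES 8	/* hypothetical size of the modeled node cache */

/* Hypothetical stand-in for the B-tree state; not the kernel structure. */
struct model_tree {
	bool locked;			/* models the tree lock */
	bool cached[MODEL_NODES];	/* cached[n]: node n is already hashed */
};

/* Models hfs_find_init(): take the tree lock. */
static void model_find_init(struct model_tree *t)
{
	assert(!t->locked);	/* relocking a held lock would hang */
	t->locked = true;
}

/* Models hfs_find_exit(): release the tree lock. */
static void model_find_exit(struct model_tree *t)
{
	t->locked = false;
}

/* Models the patched check: report a duplicate node instead of BUG(). */
static int model_bnode_create(struct model_tree *t, unsigned int num)
{
	if (num >= MODEL_NODES)
		return -EINVAL;
	if (t->cached[num]) {
		fprintf(stderr, "model: bnode %u already exists\n", num);
		return -EEXIST;
	}
	t->cached[num] = true;
	return 0;
}

/* Models the caller: lock, try to create, unlock on every path. */
static int model_cat_create(struct model_tree *t, unsigned int num)
{
	int err;

	model_find_init(t);
	err = model_bnode_create(t, num);
	model_find_exit(t);	/* reached only because the error is returned */
	return err;
}
```

With a tree in which cached[1] is already set (standing in for the corrupted image), model_cat_create(&t, 1) returns -EEXIST and leaves t.locked false, so a later call can take the lock again. If model_bnode_create() aborted the caller instead of returning, as BUG_ON() does, model_find_exit() would never run and the flag would stay set.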

My understanding is that the fsfuzzer tests are not necessarily meant to completely bulletproof the fs, but at a minimum to make it handle unexpected problems with metadata more gracefully, so as to avoid security exploits and crashes. I am not sure that this one is that bad, unless you consider the tree lock being left held by hitting this condition a denial-of-service exploit.

I would appreciate any help in finding the cause and a solution for the reported problem. I'll try to do some debugging on my own as well, but I am not yet very familiar with the HFS data structures.
Comment 4 Vlad Codrea 2010-10-21 04:39:54 UTC
CC'ed Christoph Hellwig since he took over maintainership.
Comment 5 Christoph Hellwig 2010-10-21 04:58:46 UTC
I'm only looking at hfsplus for now.
