Bug 9718
Summary: | ext3 oops on unmount of corrupted image | ||
---|---|---|---|
Product: | File System | Reporter: | Eric Sesterhenn (snakebyte) |
Component: | ext3 | Assignee: | Andrew Morton (akpm) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | sandeen |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.24-rc7-gcdf71a10 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: | Image before and after testing |
Description
Eric Sesterhenn
2008-01-09 07:34:26 UTC
Eric, if you'd like to send me the image, I'll try to find time to take a look. Thanks, -Eric Created attachment 14380 [details]
Image before and after testing
the .img is the image after the run of run_test, the .img.bck is the one which is not modified yet. I only added the following lines to run_test, with snakebyte being a local user since it once happened to me that one of both tests went wild and deleted the entire file instead of only the directory
mkdir /media/test/stress
chown snakebyte:snakebyte /media/test/stress
echo "+++ fsstress..."
sudo -u snakebyte fsstress -c -d /media/test/stress -l 20
echo "+++ fsx..."
sudo -u snakebyte fsx -N 100 /media/test/stress/test
Eric, which version of fsfuzzer are you using; did you add the fsstress/fsx tests at the end? looks like maybe we missed an ERR_PTR in there somewhere, dereferencing fffffffa... do you know where __lock_acquire+0x1d3/0x1050 lands in your build, or can you attach some disassembly here? Thanks, -Eric Hi, i am basically using fsfuzzer-0.6-lmh with some modifications. To reduce the testcase, in the run_test file, the perform_operations() function only contains the following now: perform_operations() { # perform basic operations mkdir /media/test/stress chown snakebyte:snakebyte /media/test/stress echo "+++ fsx..." sudo -u snakebyte fsx -N 100 /media/test/stress/test } I changed the fuzzer to mount the ext3 file system with -o debug, which gave some additional output; [ 408.387190] EXT3-fs error (device loop0): ext3_new_block: Allocating block in system zone - blocks from 11, length 1 [ 408.388489] EXT3-fs error (device loop0): ext3_new_block: Allocating block in system zone - blocks from 15, length 1 [ 408.404410] EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks in system zones - Block = 15, count = 1 [ 408.406595] EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks in system zones - Block = 11, count = 1 [ 408.502057] Aborting journal on device loop0. [ 408.504328] journal commit I/O error [ 408.506609] BUG: unable to handle kernel paging request at virtual address fffffffa [ 408.506682] printing eip: c013ece3 *pde = 0087f067 *pte = 00000000 [ 408.506832] Oops: 0000 [#1] PREEMPT [ 408.506947] Modules linked in: [ 408.506986] [ 408.507010] Pid: 3920, comm: umount Not tainted (2.6.24-rc7-gcdf71a10 #29) [ 408.507092] EIP: 0060:[<c013ece3>] EFLAGS: 00010046 CPU: 0 [ 408.507140] EIP is at __lock_acquire+0x1d3/0x1050 [ 408.507164] EAX: 00000002 EBX: 00000246 ECX: fffffff6 EDX: 00000002 [ 408.507188] ESI: 00000000 EDI: 00000000 EBP: cf32cda4 ESP: cf32cd3c [ 408.507270] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068 [ 408.507295] Process umount (pid: 3920, ti=cf32c000 task=c23a4000 task.ti=cf32c000) [ 408.507318] Stack: cf785000 c11ef0a0 cf96af00 cf785050 cf96af00 cf785000 cf32cd80 c016cebf [ 408.507674] 00000002 00000000 fffffff6 c23a4000 c23a4000 cf32cd88 c013e453 00000000 [ 408.507970] c23a4000 c016eba5 00000287 cf32cda8 c013e5f3 cf785000 00000046 00000246 journal_force_commit+0xd/0x30 [ 408.509085] [<c01fab75>] ext3_force_commit+0x25/0x30 [ 408.509139] [<c01f4bfc>] ext3_write_inode+0x2c/0x40 [ 408.509241] [<c018e056>] __writeback_single_inode+0x266/0x330 [ 408.509291] [<c018e538>] sync_sb_inodes+0x1c8/0x290 [ 408.509392] [<c018e67a>] sync_inodes_sb+0x7a/0x90 [ 408.509435] [<c0173791>] __fsync_super+0x61/0x70 [ 408.509550] [<c01737ab>] fsync_super+0xb/0x20 [ 408.509593] [<c01739c2>] generic_shutdown_super+0x22/0xd0 [ 408.509695] [<c0173a7f>] kill_block_super+0xf/0x20 [ 408.509737] [<c0173b37>] deactivate_super+0x47/0x60 [ 408.509838] [<c0187994>] mntput_no_expire+0x44/0x60 [ 408.509893] [<c0178cc5>] path_release_on_umount+0x15/0x20 [ 408.510003] [<c0187f81>] sys_umount+0x41/0x210 [ 408.510046] [<c0188167>] sys_oldumount+0x17/0x20 [ 408.510148] [<c0104272>] sysenter_past_esp+0x5f/0xa5 [ 408.510191] ======================= [ 408.510212] Code: c3 37 75 c0 e8 6f 09 fe ff c7 04 24 74 5a 75 c0 e8 c3 8d 74 26 00 8b 4d c0 <8b> 71 04 85 f6 0f 84 78 fe ff ff ff 86 c0 00 00 00 89 f0 e8 05 [ 408.512372] EIP: [<c013ece3>] __lock_acquire+0x1d3/0x1050 SS:ESP 0068:cf32cd3c [ 408.512495] ---[ end trace d536df78a3986ddd ]--- The oops seems to happen in kernel/lockdep.c at: static int __lock_acquire(...) ... if (unlikely(subclass >= MAX_LOCKDEP_SUBCLASSES)) { debug_locks_off(); printk("BUG: MAX_LOCKDEP_SUBCLASSES too low!\n"); printk("turning off the locking correctness validator.\n"); return 0; } if (!subclass) class = lock->class_cache; <--- ### here ### /* * Not cached yet or subclass? */ Great, thanks for the extra info. I submitted a patch containing the following to linux-ext4 mailing list yesterday... it's in -mm already. The above oops is caused by __lock_acquire being called even though handle was set with ERR_PTR... in the oops above, err=-EROFS. diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c index 08ff6c7..038ed74 100644 --- a/fs/jbd/transaction.c +++ b/fs/jbd/transaction.c @@ -288,10 +288,12 @@ handle_t *journal_start(journal_t *journal, int nblocks) jbd_free_handle(handle); current->journal_info = NULL; handle = ERR_PTR(err); + goto out; } lock_acquire(&handle->h_lockdep_map, 0, 0, 0, 2, _THIS_IP_); +out: return handle; } Jonas, thanks. I saw that patch but failed to put two and two together :) thanks, fixes the bug for me too |