Latest working kernel version: - Earliest failing kernel version: 2.6.24-rc7-gcdf71a10 Distribution: Gentoo Hardware Environment: Pentium III (Coppermine) Software Environment: Linux snakebyte-desktop 2.6.24-rc7-gcdf71a10 #29 PREEMPT Wed Jan 9 16:03:37 CET 2008 i686 GNU/Linux Gnu C 4.1.3 Gnu make 3.81 binutils 2.18 util-linux 2.13 mount 2.13 module-init-tools 3.3-pre2 e2fsprogs 1.40.2 reiserfsprogs 3.6.19 reiser4progs 1.0.6 xfsprogs 2.9.0 pcmciautils 014 PPP 2.4.4 Linux C Library 2.6.1 Dynamic linker (ldd) 2.6.1 Procps 3.2.7 Net-tools 1.60 Console-tools 0.2.3 Sh-utils 5.97 udev 113 Problem Description: I added fsx-linux from ltp and fsstress to the fsfuzzer tests, now during unmount i sometimes get the following oops: [ 4719.741202] BUG: unable to handle kernel paging request at virtual address fffffffa [ 4719.741261] printing eip: c013ece3 *pde = 0087f067 *pte = 00000000 [ 4719.741409] Oops: 0000 [#1] PREEMPT [ 4719.741524] Modules linked in: [ 4719.741563] [ 4719.741586] Pid: 12258, comm: umount Not tainted (2.6.24-rc7-g3ce54450 #28) [ 4719.741669] EIP: 0060:[<c013ece3>] EFLAGS: 00010046 CPU: 0 [ 4719.741718] EIP is at __lock_acquire+0x1d3/0x1050 [ 4719.741741] EAX: 00000002 EBX: 00000246 ECX: fffffff6 EDX: 00000002 [ 4719.741765] ESI: 00000000 EDI: 00000000 EBP: cca78da4 ESP: cca78d3c [ 4719.741848] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068 [ 4719.741872] Process umount (pid: 12258, ti=cca78000 task=cf208000 task.ti=cca78000) [ 4719.741896] Stack: cf43e0f0 c11e87c0 cf96bf00 cf43e000 cf96bf00 cf43e0f0 cca78d80 c016ce3f [ 4719.742248] 00000002 00000000 fffffff6 cf208000 cf208000 cca78d88 c013e453 00000000 [ 4719.742542] cf208000 c016eb25 00000287 cca78da8 c013e5f3 cf43e0f0 00000046 00000246 journal_force_commit+0xd/0x30 [ 4719.743648] [<c01faaf5>] ext3_force_commit+0x25/0x30 [ 4719.743704] [<c01f4b7c>] ext3_write_inode+0x2c/0x40 [ 4719.743805] [<c018dfd6>] __writeback_single_inode+0x266/0x330 [ 4719.743854] [<c018e4b8>] sync_sb_inodes+0x1c8/0x290 [ 4719.743955] [<c018e5fa>] sync_inodes_sb+0x7a/0x90 [ 4719.743997] [<c0173711>] __fsync_super+0x61/0x70 [ 4719.744113] [<c017372b>] fsync_super+0xb/0x20 [ 4719.744157] [<c0173942>] generic_shutdown_super+0x22/0xd0 [ 4719.744258] [<c01739ff>] kill_block_super+0xf/0x20 [ 4719.744300] [<c0173ab7>] deactivate_super+0x47/0x60 [ 4719.744401] [<c0187914>] mntput_no_expire+0x44/0x60 [ 4719.744457] [<c0178c45>] path_release_on_umount+0x15/0x20 [ 4719.744565] [<c0187f01>] sys_umount+0x41/0x210 [ 4719.744609] [<c01880e7>] sys_oldumount+0x17/0x20 [ 4719.744710] [<c0104272>] sysenter_past_esp+0x5f/0xa5 [ 4719.744754] ======================= [ 4719.744775] Code: d3 37 75 c0 e8 6f 09 fe ff c7 04 24 7c 5a 75 c0 e8 c3 8d 74 26 00 8b 4d c0 <8b> 71 04 85 f6 0f 84 78 fe ff ff ff 86 c0 00 00 00 89 f0 e8 05 [ 4719.746918] EIP: [<c013ece3>] __lock_acquire+0x1d3/0x1050 SS:ESP 0068:cca78d3c [ 4719.747041] ---[ end trace a415aae57fb6632a ]--- So far i was unable to reproduce this only with fsx or fstress on the corrupted filesystem, havent seen this on ext2 so far, will try with ext4
Eric, if you'd like to send me the image, I'll try to find time to take a look. Thanks, -Eric
Created attachment 14380 [details] Image before and after testing the .img is the image after the run of run_test, the .img.bck is the one which is not modified yet. I only added the following lines to run_test, with snakebyte being a local user since it once happened to me that one of both tests went wild and deleted the entire file instead of only the directory mkdir /media/test/stress chown snakebyte:snakebyte /media/test/stress echo "+++ fsstress..." sudo -u snakebyte fsstress -c -d /media/test/stress -l 20 echo "+++ fsx..." sudo -u snakebyte fsx -N 100 /media/test/stress/test
Eric, which version of fsfuzzer are you using; did you add the fsstress/fsx tests at the end? looks like maybe we missed an ERR_PTR in there somewhere, dereferencing fffffffa... do you know where __lock_acquire+0x1d3/0x1050 lands in your build, or can you attach some disassembly here? Thanks, -Eric
Hi, i am basically using fsfuzzer-0.6-lmh with some modifications. To reduce the testcase, in the run_test file, the perform_operations() function only contains the following now: perform_operations() { # perform basic operations mkdir /media/test/stress chown snakebyte:snakebyte /media/test/stress echo "+++ fsx..." sudo -u snakebyte fsx -N 100 /media/test/stress/test } I changed the fuzzer to mount the ext3 file system with -o debug, which gave some additional output; [ 408.387190] EXT3-fs error (device loop0): ext3_new_block: Allocating block in system zone - blocks from 11, length 1 [ 408.388489] EXT3-fs error (device loop0): ext3_new_block: Allocating block in system zone - blocks from 15, length 1 [ 408.404410] EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks in system zones - Block = 15, count = 1 [ 408.406595] EXT3-fs error (device loop0): ext3_free_blocks: Freeing blocks in system zones - Block = 11, count = 1 [ 408.502057] Aborting journal on device loop0. [ 408.504328] journal commit I/O error [ 408.506609] BUG: unable to handle kernel paging request at virtual address fffffffa [ 408.506682] printing eip: c013ece3 *pde = 0087f067 *pte = 00000000 [ 408.506832] Oops: 0000 [#1] PREEMPT [ 408.506947] Modules linked in: [ 408.506986] [ 408.507010] Pid: 3920, comm: umount Not tainted (2.6.24-rc7-gcdf71a10 #29) [ 408.507092] EIP: 0060:[<c013ece3>] EFLAGS: 00010046 CPU: 0 [ 408.507140] EIP is at __lock_acquire+0x1d3/0x1050 [ 408.507164] EAX: 00000002 EBX: 00000246 ECX: fffffff6 EDX: 00000002 [ 408.507188] ESI: 00000000 EDI: 00000000 EBP: cf32cda4 ESP: cf32cd3c [ 408.507270] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068 [ 408.507295] Process umount (pid: 3920, ti=cf32c000 task=c23a4000 task.ti=cf32c000) [ 408.507318] Stack: cf785000 c11ef0a0 cf96af00 cf785050 cf96af00 cf785000 cf32cd80 c016cebf [ 408.507674] 00000002 00000000 fffffff6 c23a4000 c23a4000 cf32cd88 c013e453 00000000 [ 408.507970] c23a4000 c016eba5 00000287 cf32cda8 c013e5f3 cf785000 00000046 00000246 journal_force_commit+0xd/0x30 [ 408.509085] [<c01fab75>] ext3_force_commit+0x25/0x30 [ 408.509139] [<c01f4bfc>] ext3_write_inode+0x2c/0x40 [ 408.509241] [<c018e056>] __writeback_single_inode+0x266/0x330 [ 408.509291] [<c018e538>] sync_sb_inodes+0x1c8/0x290 [ 408.509392] [<c018e67a>] sync_inodes_sb+0x7a/0x90 [ 408.509435] [<c0173791>] __fsync_super+0x61/0x70 [ 408.509550] [<c01737ab>] fsync_super+0xb/0x20 [ 408.509593] [<c01739c2>] generic_shutdown_super+0x22/0xd0 [ 408.509695] [<c0173a7f>] kill_block_super+0xf/0x20 [ 408.509737] [<c0173b37>] deactivate_super+0x47/0x60 [ 408.509838] [<c0187994>] mntput_no_expire+0x44/0x60 [ 408.509893] [<c0178cc5>] path_release_on_umount+0x15/0x20 [ 408.510003] [<c0187f81>] sys_umount+0x41/0x210 [ 408.510046] [<c0188167>] sys_oldumount+0x17/0x20 [ 408.510148] [<c0104272>] sysenter_past_esp+0x5f/0xa5 [ 408.510191] ======================= [ 408.510212] Code: c3 37 75 c0 e8 6f 09 fe ff c7 04 24 74 5a 75 c0 e8 c3 8d 74 26 00 8b 4d c0 <8b> 71 04 85 f6 0f 84 78 fe ff ff ff 86 c0 00 00 00 89 f0 e8 05 [ 408.512372] EIP: [<c013ece3>] __lock_acquire+0x1d3/0x1050 SS:ESP 0068:cf32cd3c [ 408.512495] ---[ end trace d536df78a3986ddd ]--- The oops seems to happen in kernel/lockdep.c at: static int __lock_acquire(...) ... if (unlikely(subclass >= MAX_LOCKDEP_SUBCLASSES)) { debug_locks_off(); printk("BUG: MAX_LOCKDEP_SUBCLASSES too low!\n"); printk("turning off the locking correctness validator.\n"); return 0; } if (!subclass) class = lock->class_cache; <--- ### here ### /* * Not cached yet or subclass? */
Great, thanks for the extra info.
I submitted a patch containing the following to linux-ext4 mailing list yesterday... it's in -mm already. The above oops is caused by __lock_acquire being called even though handle was set with ERR_PTR... in the oops above, err=-EROFS. diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c index 08ff6c7..038ed74 100644 --- a/fs/jbd/transaction.c +++ b/fs/jbd/transaction.c @@ -288,10 +288,12 @@ handle_t *journal_start(journal_t *journal, int nblocks) jbd_free_handle(handle); current->journal_info = NULL; handle = ERR_PTR(err); + goto out; } lock_acquire(&handle->h_lockdep_map, 0, 0, 0, 2, _THIS_IP_); +out: return handle; }
Jonas, thanks. I saw that patch but failed to put two and two together :)
thanks, fixes the bug for me too