Distribution: Hardware Environment: Software Environment: Problem Description: Steps to reproduce: Following is the lkml post from Jesse for this oops on 2.6.12-rc4. Later on the same was tried on 2.6.12-rc4-mm2 with kdump enabled Under heavy filesystem load (I have a stress test program that creates many directories and files and constantly creates, reads, and deletes them) , I get the following oops reliably under both 2.6.11.8 and 2.6.12-rc4. I was also constantly creating/reading/deleting files over NFS (server was 2.6.11.8) at the same time. 2.4.30 runs fine for long periods of time with the same tests. 2.6.12-rc4 and 2.6.11.8 usually fail within an hour. Running a Dell PE2850 (dual 2.8 xeon), Debian woody, ext3, SMP, HT. After the oops, I see kjournald using 99.9% CPU. May 9 15:10:31 markov kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000 May 9 15:10:31 markov kernel: printing eip: May 9 15:10:31 markov kernel: c01c019a May 9 15:10:31 markov kernel: *pde = 00000000 May 9 15:10:31 markov kernel: Oops: 0002 [#1] May 9 15:10:31 markov kernel: PREEMPT SMP May 9 15:10:31 markov kernel: Modules linked in: e1000 May 9 15:10:31 markov kernel: CPU: 3 May 9 15:10:31 markov kernel: EIP: 0060:[log_do_checkpoint+142/472] Not tainted VLI May 9 15:10:31 markov kernel: EFLAGS: 00010202 (2.6.12-rc4) May 9 15:10:31 markov kernel: EIP is at log_do_checkpoint+0x8e/0x1d8 May 9 15:10:31 markov kernel: eax: 00000000 ebx: e90605f8 ecx: c2305080 edx: d23ce000 May 9 15:10:31 markov kernel: esi: 00000000 edi: ee3eade4 ebp: d23ce000 esp: d23cfbe8 May 9 15:10:31 markov kernel: ds: 007b es: 007b ss: 0068 May 9 15:10:31 markov kernel: Process pdflush (pid: 1656, threadinfo=d23ce000 task=df323a60) May 9 15:10:31 markov kernel: Stack: c2174814 c2174800 00000fed c2174800 c21748e4 0003af8f e3f106fc f77ce780 May 9 15:10:31 markov kernel: 00000000 00000004 e1d0983c e1d095cc e1d092fc e191459c e19140bc e164038c May 9 15:10:31 markov kernel: e1640ddc e191489c e1640b6c e164068c e126a92c e126a44c e126a1ac e0e9368c May 9 15:10:31 markov kernel: Call Trace: May 9 15:10:31 markov kernel: [__log_wait_for_space+124/164] __log_wait_for_space+0x7c/0xa4 May 9 15:10:31 markov kernel: [start_this_handle+732/812] start_this_handle+0x2dc/0x32c May 9 15:10:31 markov kernel: [autoremove_wake_function+0/64] autoremove_wake_function+0x0/0x40 May 9 15:10:31 markov kernel: [autoremove_wake_function+0/64] autoremove_wake_function+0x0/0x40 May 9 15:10:31 markov kernel: [journal_start+139/184] journal_start +0x8b/0xb8 May 9 15:10:32 markov kernel: [gcc2_compiled.+44/76] ext3_journal_start_sb+0x2c/0x4c May 9 15:10:32 markov kernel: [ext3_ordered_writepage+106/320] ext3_ordered_writepage+0x6a/0x140 May 9 15:10:32 markov kernel: [mpage_writepages+437/800] mpage_writepages+0x1b5/0x320 May 9 15:10:32 markov kernel: [ext3_ordered_writepage+0/320] ext3_ordered_writepage+0x0/0x140 May 9 15:10:32 markov kernel: [do_writepages+53/60] do_writepages +0x35/0x3c May 9 15:10:32 markov kernel: [__sync_single_inode+101/460] __sync_single_inode+0x65/0x1cc May 9 15:10:32 markov kernel: [__writeback_single_inode+279/292] __writeback_single_inode+0x117/0x124 May 9 15:10:32 markov kernel: [iput+67/108] iput+0x43/0x6c May 9 15:10:32 markov kernel: [sync_sb_inodes+505/712] sync_sb_inodes +0x1f9/0x2c8 May 9 15:10:32 markov kernel: [writeback_inodes+127/216] writeback_inodes+0x7f/0xd8 May 9 15:10:32 markov kernel: [background_writeout+71/156] background_writeout+0x47/0x9c May 9 15:10:32 markov kernel: [gcc2_compiled.+267/424] __pdflush +0x10b/0x1a8 May 9 15:10:32 markov kernel: [pdflush+0/40] pdflush+0x0/0x28 May 9 15:10:32 markov kernel: [pdflush+30/40] pdflush+0x1e/0x28 May 9 15:10:32 markov kernel: [background_writeout+0/156] background_writeout+0x0/0x9c May 9 15:10:32 markov kernel: [kthread+125/172] kthread+0x7d/0xac May 9 15:10:32 markov kernel: [kthread+0/172] kthread+0x0/0xac May 9 15:10:32 markov kernel: [kernel_thread_helper+5/12] kernel_thread_helper+0x5/0xc May 9 15:10:32 markov kernel: Code: 24 24 00 00 00 00 8b 40 40 89 44 24 1c 8b 50 04 89 54 24 14 8b 58 28 89 df 8b 43 30 89 44 24 18 90 89 fb 8b 7b 2c 8b 33 ff 45 14 <f0> 0f ba 2e 13 19 c0 85 c0 74 30 ff 4d 14 8b 45 08 a8 08 74 05 May 9 15:10:32 markov kernel: <6>note: pdflush[1656] exited with preempt_count 2 Kernel config is attached. Thanks, Jesse
From log_buf Unable to handle kernel NULL pointer dereference at virtual address 00000000 printing eip: = 00000000 Oops: 0002 [#1] PREEMPT SMP Modules linked in: w83781d i2c_sensor i2c_i801 i2c_core e1000 e100 mii CPU: 1 EIP: 0060:[<c01e9451>] Not tainted VLI EFLAGS: 00210213 (2.6.12-rc4-mm2) EIP is at log_do_checkpoint+0xa1/0x230 eax: 00000002 ebx: ea2ee30c ecx: 00000001 edx: e5eb2000 esi: 00000000 edi: d45cd854 ebp: e5eb3dcc esp: e5eb3c90 ds: 007b es: 007b ss: 0068 Process DiskFSTest (pid: 28251, threadinfo=e5eb2000 task=f3a1ba70) Stack: f61054e4 ea2ee888 e5eb3cc0 e5eb3cb8 e5eb3cbc e5eb2000 000c41e3 decb9c64 f7d59500 f61054e4 00000000 00000006 d893ea50 d706c3c0 d893e540 d893eed0 e2added0 e2add9c0 f56bba80 f56bb570 e0108630 f56bb420 f56bbdb0 e0108ea0 Call Trace: [<c01040cf>] show_stack+0x7f/0xa0 [<c0104283>] show_registers+0x163/0x1d0 [<c01044b1>] die+0x101/0x190 [<c0117ec5>] do_page_fault+0x335/0x673 [<c0103cef>] error_code+0x4f/0x54 [<c01e8ebf>] __log_wait_for_space+0x9f/0xc0 [<c01e412f>] start_this_handle+0xff/0x360 [<c01e4488>] journal_start+0xa8/0xd0 [<c01da187>] ext3_create+0x37/0xf0 [c0173d43>] vfs_create+0x83/0xf0 [<c01745c0>] open_namei+0x600/0x660 [<c0162c0c>] filp_open+0x3c/0x60 [<c016309c>] sys_open+0x4c/0xf0 [<c01031d9>] syscall_call+0x7/0xb Code: 04 89 85 dc fe ff ff 8b 5a 28 8b 43 30 89 df 89 85 e0 fe ff ff 90 8d 74 26 00 89 fb b8 01 00 00 00 8b 7f 2c 8b 33 e8 7f 21 f3 ff <f0> 0f ba 2e 13 19 c0 85 c0 0f 85 43 01 00 00 89 5c 24 04 8b 55 <0>Fatal exception: panic in 5 seconds scheduling while atomic: DiskFSTest/0x00000002/28251 [<c010410e>] dump_stack+0x1e/0x30 [<c03d96f4>] schedule+0xd14/0xd20 [<c03da197>] schedule_timeout+0x57/0xb0 [<c012ae90>] msleep+0x40/0x50 [<c0104504>] die+0x154/0x190 [<c0117ec5>] do_page_fault+0x335/0x673 [<c0103cef>] error_code+0x4f/0x54 [<c01e8ebf>] __log_wait_for_space+0x9f/0xc0 [<c01e412f>] start_this_handle+0xff/0x360 [<c01e4488>] journal_start+0xa8/0xd0 [<c01da187>] ext3_create+0x37/0xf0 [<c0173d43>] vfs_create+0x83/0xf0 [<c01745c0>] open_namei+0x600/0x660 [<c0162c0c>] filp_open+0x3c/0x60 [<c016309c>] sys_open+0x4c/0xf0 [<c01031d9>] syscall_call+0x7/0xb Kernel panic - not syncing: Fatal exception " [maneesh@bigbang giles]$ gdb ./vmlinux-2.6.12-rc4-mm2 crashdump-2.6.12-rc4-mm2 GNU gdb Red Hat Linux (6.1post-1.20040607.41rh) Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/tls/libthread_db.so.1". #0 0xc03da9f5 in _spin_lock (lock=0xf61054e4) at processor.h:595 595 processor.h: No such file or directory. in processor.h (gdb) info thread 4 process 0 0xc03da9f5 in _spin_lock (lock=0xf61054e4) at processor.h:595 3 process 0 0xc03da9f5 in _spin_lock (lock=0xf61054e4) at processor.h:595 2 process 28251 crash_get_current_regs (regs=0xe5eb3ac8) at arch/i386/kernel/crash.c:99 * 1 process 1052 0xc03da9f5 in _spin_lock (lock=0xf61054e4) at processor.h:595 (gdb) bt #0 0xc03da9f5 in _spin_lock (lock=0xf61054e4) at processor.h:595 #1 0xc01e7355 in journal_commit_transaction (journal=0xf6105400) at fs/jbd/commit.c:302 #2 0xc01eaa14 in kjournald (arg=0xf6105400) at fs/jbd/journal.c:164 #3 0xc0101265 in kernel_thread_helper () at arch/i386/kernel/process.c:302 (gdb) thread 2 [Switching to thread 2 (process 28251)]#0 crash_get_current_regs ( regs=0xe5eb3ac8) at arch/i386/kernel/crash.c:99 99 arch/i386/kernel/crash.c: No such file or directory. in arch/i386/kernel/crash.c (gdb) bt #0 crash_get_current_regs (regs=0xe5eb3ac8) at arch/i386/kernel/crash.c:99 #1 0xc011681a in crash_save_self () at arch/i386/kernel/crash.c:107 #2 0xc014147e in crash_kexec () at kernel/kexec.c:1032 #3 0xc012033b in panic (fmt=0x1 <Address 0x1 out of bounds>) at kernel/panic.c:86 #4 0xc0104510 in die (str=0xc03efda1 "page fault", regs=0xffffffff, err=2) at arch/i386/kernel/traps.c:391 #5 0xc0117ec5 in do_page_fault (regs=0xe5eb3c5c, error_code=2) at arch/i386/mm/fault.c:472 #6 0xc0103cef in error_code () at atomic.h:175 #7 0xea2ee30c in ?? () #8 0x00000001 in ?? () #9 0xe5eb2000 in ?? () #10 0x00000000 in ?? () (gdb) thread 3 [Switching to thread 4 (process 0)]#0 0xc03da9f5 in _spin_lock ( lock=0xf61054e4) at processor.h:595 595 processor.h: No such file or directory. in processor.h (gdb) bt #0 0xc03da9f5 in _spin_lock (lock=0xf61054e4) at processor.h:595 #1 0xc01e7355 in journal_commit_transaction (journal=0xf6105400) at fs/jbd/commit.c:302 #2 0xc01eaa14 in kjournald (arg=0xf6105400) at fs/jbd/journal.c:164 #3 0xc0101265 in kernel_thread_helper () at arch/i386/kernel/process.c:302 (gdb) thread 4 [Switching to thread 4 (process 0)]#0 0xc03da9f5 in _spin_lock ( lock=0xf61054e4) at processor.h:595 595 in processor.h (gdb) bt #0 0xc03da9f5 in _spin_lock (lock=0xf61054e4) at processor.h:595 #1 0xc01e7355 in journal_commit_transaction (journal=0xf6105400) at fs/jbd/commit.c:302 #2 0xc01eaa14 in kjournald (arg=0xf6105400) at fs/jbd/journal.c:164 #3 0xc0101265 in kernel_thread_helper () at arch/i386/kernel/process.c:302 (gdb) info thread * 4 process 0 0xc03da9f5 in _spin_lock (lock=0xf61054e4) at processor.h:595 * 3 process 0 0xc03da9f5 in _spin_lock (lock=0xf61054e4) at processor.h:595 2 process 28251 crash_get_current_regs (regs=0xe5eb3ac8) at arch/i386/kernel/crash.c:99 1 process 1052 0xc03da9f5 in _spin_lock (lock=0xf61054e4) at processor.h:595 (gdb) thread 1 [Switching to thread 1 (process 1052)]#0 0xc03da9f5 in _spin_lock ( lock=0xf61054e4) at processor.h:595 595 in processor.h (gdb) bt #0 0xc03da9f5 in _spin_lock (lock=0xf61054e4) at processor.h:595 #1 0xc01e7355 in journal_commit_transaction (journal=0xf6105400) at fs/jbd/commit.c:302 #2 0xc01eaa14 in kjournald (arg=0xf6105400) at fs/jbd/journal.c:164 #3 0xc0101265 in kernel_thread_helper () at arch/i386/kernel/process.c:302 (gdb) print *(journal_t *) 0xf6105400 $4 = {j_flags = 16, j_errno = 0, j_sb_buffer = 0xf6204c90, j_superblock = 0xf37b5000, j_format_version = 2, j_state_lock = {slock = 1, break_lock = 0}, j_barrier_count = 0, j_barrier = {count = {counter = 1}, sleepers = 0, wait = {lock = {slock = 1, break_lock = 0}, task_list = { next = 0xf6105430, prev = 0xf6105430}}}, j_running_transaction = 0x0, j_committing_transaction = 0xf7d59900, j_checkpoint_transactions = 0xf52bb100, j_wait_transaction_locked = {lock = { slock = 1, break_lock = 0}, task_list = {next = 0xf610544c, prev = 0xf610544c}}, j_wait_logspace = {lock = {slock = 1, break_lock = 0}, task_list = {next = 0xf610545c, prev = 0xf610545c}}, j_wait_done_commit = {lock = {slock = 1, break_lock = 0}, task_list = { next = 0xf610546c, prev = 0xf610546c}}, j_wait_checkpoint = {lock = { slock = 1, break_lock = 0}, task_list = {next = 0xf610547c, prev = 0xf610547c}}, j_wait_commit = {lock = {slock = 1, break_lock = 0}, task_list = {next = 0xf610548c, prev = 0xf610548c}}, j_wait_updates = {lock = {slock = 1, break_lock = 0}, task_list = { next = 0xf610549c, prev = 0xf610549c}}, j_checkpoint_sem = {count = { counter = -1}, sleepers = 1, wait = {lock = {slock = 1, break_lock = 0}, task_list = {next = 0xe92fdbb0, prev = 0xf18f3c58}}}, j_head = 6324, j_tail = 7209, j_free = 885, j_first = 1, j_last = 8192, j_dev = 0xf7d33100, j_blocksize = 4096, j_blk_offset = 0, j_fs_dev = 0xf7d33100, j_maxlen = 8192, j_list_lock = {slock = 0, break_lock = 1}, j_inode = 0xf523d158, j_tail_sequence = 803299, j_transaction_sequence = 803305, j_commit_sequence = 803303, j_commit_request = 803304, j_uuid = '\0' <repeats 15 times>, j_task = 0xf7e69030, j_max_transaction_buffers = 2048, j_commit_interval = 5000, j_commit_timer = 0xf6197fb8, j_revoke_lock = { slock = 1, break_lock = 0}, j_revoke = 0xc5aac4dc, j_revoke_table = { 0xc5aac4dc, 0xc5aac4d0}, j_wbuf = 0xc5aad000, j_wbufsize = 512, j_private = 0xf6105c00}
Hmm, I must say I don't quite understand the crash dump because the Oops is in log_do_checkpoint() while the dump from the crash dump shows kjournald is stuck on spin_lock(j_list_lock) in journal_commit_transaction(). It would be useful to find out where is log_do_checkpoint+0xa1/0x230 (e.g. line number). I have also attached patch (untested but trivial - famous last words ;) that can help it this case. Would it be possible to test it on the crashing machine?
Created attachment 5103 [details] Move journal_remove_journal_head into lock_bh_state. This patch could close some races.
I requested Jesse Giles <JGilles@starkinvestments.com> to test the patch. Meanwhile here is some more analysis form the the dump It looks like thread 2 (cpu 1) has crashed.. where as other cpus are spinning for some spinlock (gdb) disassemble log_do_checkpoint Dump of assembler code for function log_do_checkpoint: 0xc01e93b0 <log_do_checkpoint+0>: push %ebp 0xc01e93b1 <log_do_checkpoint+1>: mov %esp,%ebp 0xc01e93b3 <log_do_checkpoint+3>: push %edi 0xc01e93b4 <log_do_checkpoint+4>: push %esi 0xc01e93b5 <log_do_checkpoint+5>: xor %esi,%esi 0xc01e93b7 <log_do_checkpoint+7>: push %ebx 0xc01e93b8 <log_do_checkpoint+8>: sub $0x130,%esp 0xc01e93be <log_do_checkpoint+14>: mov 0x8(%ebp),%eax 0xc01e93c1 <log_do_checkpoint+17>: mov %esi,0xfffffeec(%ebp) 0xc01e93c7 <log_do_checkpoint+23>: mov %eax,(%esp) 0xc01e93ca <log_do_checkpoint+26>: call 0xc01e95e0 <cleanup_journal_tail> 0xc01e93cf <log_do_checkpoint+31>: test %eax,%eax 0xc01e93d1 <log_do_checkpoint+33>: mov %eax,%edx 0xc01e93d3 <log_do_checkpoint+35>: jle 0xc01e94f6 <log_do_checkpoint+326> 0xc01e93d9 <log_do_checkpoint+41>: mov 0x8(%ebp),%edx 0xc01e93dc <log_do_checkpoint+44>: add $0xe4,%edx 0xc01e93e2 <log_do_checkpoint+50>: mov %edx,%eax 0xc01e93e4 <log_do_checkpoint+52>: mov %edx,0xfffffee8(%ebp) 0xc01e93ea <log_do_checkpoint+58>: call 0xc03da970 <_spin_lock> 0xc01e93ef <log_do_checkpoint+63>: mov 0x8(%ebp),%edx 0xc01e93f2 <log_do_checkpoint+66>: mov 0x40(%edx),%eax 0xc01e93f5 <log_do_checkpoint+69>: test %eax,%eax 0xc01e93f7 <log_do_checkpoint+71>: je 0xc01e94d9 <log_do_checkpoint+297> 0xc01e93fd <log_do_checkpoint+77>: mov $0xffffe000,%edx 0xc01e9402 <log_do_checkpoint+82>: and %esp,%edx 0xc01e9404 <log_do_checkpoint+84>: mov %edx,0xfffffed8(%ebp) 0xc01e940a <log_do_checkpoint+90>: lea 0x0(%esi),%esi 0xc01e9410 <log_do_checkpoint+96>: mov %eax,0xfffffee4(%ebp) 0xc01e9416 <log_do_checkpoint+102>: xor %ebx,%ebx 0xc01e9418 <log_do_checkpoint+104>: mov 0xfffffee4(%ebp),%edx 0xc01e941e <log_do_checkpoint+110>: mov %ebx,0xfffffef0(%ebp) 0xc01e9424 <log_do_checkpoint+116>: mov 0x4(%eax),%eax 0xc01e9427 <log_do_checkpoint+119>: mov %eax,0xfffffedc(%ebp) 0xc01e942d <log_do_checkpoint+125>: mov 0x28(%edx),%ebx 0xc01e9430 <log_do_checkpoint+128>: mov 0x30(%ebx),%eax 0xc01e9433 <log_do_checkpoint+131>: mov %ebx,%edi 0xc01e9435 <log_do_checkpoint+133>: mov %eax,0xfffffee0(%ebp) 0xc01e943b <log_do_checkpoint+139>: nop 0xc01e943c <log_do_checkpoint+140>: lea 0x0(%esi),%esi 0xc01e9440 <log_do_checkpoint+144>: mov %edi,%ebx 0xc01e9442 <log_do_checkpoint+146>: mov $0x1,%eax 0xc01e9447 <log_do_checkpoint+151>: mov 0x2c(%edi),%edi 0xc01e944a <log_do_checkpoint+154>: mov (%ebx),%esi ---Type <return> to continue, or q <return> to quit--- 0xc01e944c <log_do_checkpoint+156>: call 0xc011b5d0 <add_preempt_count> 0xc01e9451 <log_do_checkpoint+161>: lock btsl $0x13,(%esi) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Crash here as esi=0 0xc01e9456 <log_do_checkpoint+166>: sbb %eax,%eax 0xc01e9458 <log_do_checkpoint+168>: test %eax,%eax 0xc01e945a <log_do_checkpoint+170>: jne 0xc01e95a3 <log_do_checkpoint+499> 0xc01e9460 <log_do_checkpoint+176>: mov %ebx,0x4(%esp) 0xc01e9464 <log_do_checkpoint+180>: mov 0x8(%ebp),%edx 0xc01e9467 <log_do_checkpoint+183>: lea 0xfffffef0(%ebp),%eax From the disassembly it looks like we are in jbd_trylock_bh_state(), in log_do_checkpoint(), and bh is NULL next_jh = jh->b_cpnext; bh = jh2bh(jh); if (!jbd_trylock_bh_state(bh)) { jbd_sync_bh(journal, bh); spin_lock(&journal->j_list_lock); journal_head is in ebx, which has b_bh = 0, (gdb) print *(struct journal_head *) 0xea2ee30c $1 = {b_bh = 0x0, b_jcount = 0, b_jlist = 0, b_modified = 0, b_frozen_data = 0x0, b_committed_data = 0x0, b_transaction = 0x0, b_next_transaction = 0x0, b_tnext = 0xea2ee888, b_tprev = 0xea1f4c64, b_cp_transaction = 0x0, b_cpnext = 0xd45cd854, b_cpprev = 0xdecb9c64} transaction = journal->j_checkpoint_transactions = 0xf52bb100 (gdb) print *(transaction_t *) 0xf52bb100 $2 = {t_journal = 0xf6105400, t_tid = 803301, t_state = T_FINISHED, t_log_start = 235, t_nr_buffers = 0, t_reserved_list = 0x0, t_locked_list = 0x0, t_buffers = 0x0, t_sync_datalist = 0x0, t_forget = 0x0, t_checkpoint_list = 0xd94cbc30, t_iobuf_list = 0x0, t_shadow_list = 0x0, t_log_list = 0x0, t_handle_lock = {slock = 1, break_lock = 0}, t_updates = 0, t_outstanding_credits = 0, t_cpnext = 0xf7d59580, t_cpprev = 0xf7d59980, t_expires = 247686436, t_handle_count = 65650} If needed, we can loop through t_checkpoint_list, but looks like it is quite huge to paste here..
Great, thanks for the analysis. It confirms that freed jh somehow got into the checkpoint list. Hopefuly the attached patch helps. If not we have to look again...
Hi Jan, Jesse tried the patch but it still oopsed at the same location. The new oops is as below. We have got one more kdump due to this new oops. Any suggestions to look at some specific things related to journelling or ext3? ================================ Jun 2 11:37:42 salk kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000 Jun 2 11:37:42 salk kernel: printing eip: Jun 2 11:37:42 salk kernel: c01e9451 Jun 2 11:37:42 salk kernel: *pde = 00000000 Jun 2 11:37:42 salk kernel: Oops: 0002 [#1] Jun 2 11:37:42 salk kernel: PREEMPT SMP Jun 2 11:37:42 salk kernel: Modules linked in: w83781d i2c_sensor i2c_i801 i2c_core e1000 e100 mii Jun 2 11:37:42 salk kernel: CPU: 1 Jun 2 11:37:42 salk kernel: EIP: 0060:[log_do_checkpoint+161/560] Not tainted VLI Jun 2 11:37:42 salk kernel: EFLAGS: 00210213 (2.6.12-rc4-mm2-patched) Jun 2 11:37:42 salk kernel: EIP is at log_do_checkpoint+0xa1/0x230 Jun 2 11:37:42 salk kernel: eax: 00000002 ebx: e531f680 ecx: 00000001 edx: d5664000 Jun 2 11:37:42 salk kernel: esi: 00000000 edi: e983cd34 ebp: d5665dcc esp: d5665c90 Jun 2 11:37:42 salk kernel: ds: 007b es: 007b ss: 0068 Jun 2 11:37:42 salk kernel: Process DiskFSTest (pid: 2576, threadinfo=d5664000 task=f7c65550) Jun 2 11:37:42 salk kernel: Stack: f741cae4 f73e05e4 d5665cc0 d5665cb8 d5665cbc d5664000 000c485b ebd54958 Jun 2 11:37:42 salk kernel: f74b9880 f741cae4 00000000 00000006 e70c99f0 e70c9330 e70c9cc0 e70c9c00 Jun 2 11:37:42 salk kernel: e70c9960 e70c9c30 e70c9c60 e70c9c90 e70c9cf0 ecf0ce40 e70c9d20 e70c9d80 Jun 2 11:37:42 salk kernel: Call Trace: Jun 2 11:37:42 salk kernel: [show_stack+127/160] show_stack+0x7f/0xa0 Jun 2 11:37:42 salk kernel: [show_registers+355/464] show_registers+0x163/0x1d0 Jun 2 11:37:42 salk kernel: [die+257/400] die+0x101/0x190 Jun 2 11:37:42 salk kernel: [do_page_fault+821/1651] do_page_fault+0x335/0x673 Jun 2 11:37:42 salk kernel: [error_code+79/84] error_code+0x4f/0x54 Jun 2 11:37:42 salk kernel: [__log_wait_for_space+159/192] __log_wait_for_space+0x9f/0xc0 Jun 2 11:37:42 salk kernel: [start_this_handle+255/864] start_this_handle+0xff/0x360 Jun 2 11:37:42 salk kernel: [journal_start+168/208] journal_start+0xa8/0xd0 Jun 2 11:37:42 salk kernel: [ext3_create+55/240] ext3_create+0x37/0xf0 Jun 2 11:37:42 salk kernel: [vfs_create+131/240] vfs_create+0x83/0xf0 Jun 2 11:37:42 salk kernel: [open_namei+1536/1632] open_namei+0x600/0x660 Jun 2 11:37:42 salk kernel: [filp_open+60/96] filp_open+0x3c/0x60 Jun 2 11:37:42 salk kernel: [sys_open+76/240] sys_open+0x4c/0xf0 Jun 2 11:37:42 salk kernel: [syscall_call+7/11] syscall_call+0x7/0xb Jun 2 11:37:42 salk kernel: Code: 04 89 85 dc fe ff ff 8b 5a 28 8b 43 30 89 df 89 85 e0 fe ff ff 90 8d 74 26 00 89 fb b8 01 00 00 00 8b 7f 2c 8b 33 e8 7f 21 f3 ff <f0> 0f ba 2e 13 19 c0 85 c0 0f 85 43 01 00 00 89 5c 24 04 8b 55 Jun 2 11:37:42 salk kernel: <0>Fatal exception: panic in 5 seconds Jun 2 11:37:42 salk kernel: scheduling while atomic: DiskFSTest/0x00000002/2576 Jun 2 11:37:42 salk kernel: [dump_stack+30/48] dump_stack+0x1e/0x30 Jun 2 11:37:42 salk kernel: [schedule+3348/3360] schedule+0xd14/0xd20 Jun 2 11:37:42 salk kernel: [schedule_timeout+87/176] schedule_timeout+0x57/0xb0 Jun 2 11:37:42 salk kernel: [msleep+64/80] msleep+0x40/0x50 Jun 2 11:37:42 salk kernel: [die+340/400] die+0x154/0x190 Jun 2 11:37:42 salk kernel: [do_page_fault+821/1651] do_page_fault+0x335/0x673 Jun 2 11:37:42 salk kernel: [error_code+79/84] error_code+0x4f/0x54 Jun 2 11:37:42 salk kernel: [__log_wait_for_space+159/192] __log_wait_for_space+0x9f/0xc0 Jun 2 11:37:42 salk kernel: [start_this_handle+255/864] start_this_handle+0xff/0x360 Jun 2 11:37:42 salk kernel: [journal_start+168/208] journal_start+0xa8/0xd0 Jun 2 11:37:42 salk kernel: [ext3_create+55/240] ext3_create+0x37/0xf0 Jun 2 11:37:42 salk kernel: [vfs_create+131/240] vfs_create+0x83/0xf0 Jun 2 11:37:42 salk kernel: [open_namei+1536/1632] open_namei+0x600/0x660 Jun 2 11:37:42 salk kernel: [filp_open+60/96] filp_open+0x3c/0x60 Jun 2 11:37:42 salk kernel: [sys_open+76/240] sys_open+0x4c/0xf0 Jun 2 11:37:42 salk kernel: [syscall_call+7/11] syscall_call+0x7/0xb ================================
Created attachment 5117 [details] config file for patched kernel
Created attachment 5118 [details] hardware information about the test setup
Linus merged a couple of patches today which should fix this. ext3: fix list scanning in __cleanup_transaction ext3: fix log_do_checkpoint() assertion failure
If you would like to check the kdump yourself, the 06/02 kdump is provided by Jesse at this location alongwith the kernels used people.msoe.edu/~gillesj
Thanks Andrew, for pointing this out. So, these two patches should be tried in-addition to the patch mentioned in comment #3 or the older patch has to be backed out?
please test Linus's current tree.
Thanks for testing with my patch. Please test also with the current Linus's kernel as Andrew wrote. I don't belive much that the bug is fixed (as the bugs that got fixed manifested themselves differently) but maybe something mystical is happening there and the fixes will help ;). If the bug is still there - can you maybe send me the programs triggering it and I'll try to find some SMP machine around and try to reproduce it myself?
Got the following response from Jesse: From jgilles@starkinvestments.com Tue Jun 7 09:17:19 2005 Subject: Re: Fw: Re: [Ext2-devel] Re: oops in log_do_checkpoint with 2.6.11.8 and 2.6.12-rc4 From: Jesse Gilles <jgilles@starkinvestments.com> To: maneesh@in.ibm.com Date: Mon, 06 Jun 2005 17:16:57 -0500 X-Mailer: Evolution 2.2.1.1 Maneesh, I tested both 2.6.12-rc5-mm2 and 2.6.12-rc5-git8 as they both contain the ext3 patches Andrew referred to. 2.6.12-rc5-mm2 crashed and I was able to get a kdump again--it is posted at the usual place. Surprisingly, I was not able to reproduce the problem with 2.6.12-rc5-git8. Thanks, Jesse On Fri, 2005-06-03 at 15:22 +0530, Maneesh Soni wrote: > Hi Jesse, > > There has been more activies in this bug. Could you please try the two > patches (I think you need to download it from Linus's git tree) as mentioned > in the bugzilla. > > http://bugme.osdl.org/show_bug.cgi?id=4683 > > Thanks > Maneesh > >
Jan, Looks like 2.6.12-rc5-git8 fixes the bug, so should we consider this as closed or you would like to get the 2.6.12-rc5-mm2 kdump analised?
I let the tests on 2.6.12-rc5-git8 keep running after I mailed Maneesh yesterday, and it ended up crashing overnight. Previous tests have caused crashes usually within 30 min to an hour and after running tests on rc5-git8 all day yesterday, I thought it was fixed. Unable to handle kernel NULL pointer dereference at virtual address 00000000 printing eip: c01df5be *pde = 00000000 Oops: 0002 [#1] PREEMPT SMP Modules linked in: w83781d i2c_sensor i2c_i801 i2c_core e1000 e100 mii CPU: 1 EIP: 0060:[<c01df5be>] Not tainted VLI EFLAGS: 00210202 (2.6.12-rc5-git8) EIP is at log_do_checkpoint+0x8e/0x220 eax: d704c000 ebx: f5dfcbdc ecx: 00000000 edx: d704c000 esi: 00000000 edi: e25089a0 ebp: e21075c4 esp: d704dcb0 ds: 007b es: 007b ss: 0068 Process DiskFSTest (pid: 31741, threadinfo=d704c000 task=d914c020) Stack: f7f1d2e4 d55585f8 d704dcdc d704dcd4 d704dcd8 d704c000 000c5e5b f76bd080 f7f1d2e4 00000000 0000001b cb60659c e060e8cc d72b60bc c7aced1c d4b2274c d7cf7e3c d75d59bc c4ab5b6c d2a22b9c c3ac8b9c e29261dc e4e0611c d34494dc Call Trace: [<c01df06f>] __log_wait_for_space+0x9f/0xc0 [<c01da362>] start_this_handle+0x132/0x3f0 [<c0130c50>] autoremove_wake_function+0x0/0x60 [<c01ceab5>] ext3_find_entry+0x235/0x440 [<c0130c50>] autoremove_wake_function+0x0/0x60 [<c01da71d>] journal_start+0xad/0xe0 [<c01d0437>] ext3_create+0x37/0xf0 [<c016ba77>] vfs_create+0x87/0xd0 [<c016c2c1>] open_namei+0x5f1/0x650 [<c014ab78>] free_pgtables+0x78/0x90 [<c015b08e>] filp_open+0x3e/0x70 [<c015b3c2>] get_unused_fd+0xa2/0xd0 [<c015b4e9>] sys_open+0x49/0x90 [<c0102f45>] syscall_call+0x7/0xb Code: 1c 31 f6 8b 54 24 1c 89 74 24 28 8b 40 04 89 44 24 18 8b 5a 28 8b 6b 30 89 df 8d 76 00 8b 44 24 14 89 fb 8b 33 8b 7f 2c ff 40 14 <f0> 0f ba 2e 13 19 c0 85 c0 0f 85 46 01 00 0 0 89 5c 24 04 8d 44 <6>note: DiskFSTest[31741] exited with preempt_count 2 scheduling while atomic: DiskFSTest/0x10000002/31741 [<c03aa9d4>] schedule+0xcd4/0xce0 [<c0152105>] page_remove_rmap+0x35/0x50 [<c014b11e>] zap_pte_range+0x11e/0x280 [<c0153cc1>] free_pages_and_swap_cache+0x71/0xa0 [<c03ab127>] cond_resched+0x27/0x40 [<c014b4cb>] unmap_vmas+0x18b/0x280 [<c01501ff>] exit_mmap+0x9f/0x190 [<c01191b8>] mmput+0x38/0xa0 [<c011e02f>] do_exit+0xbf/0x3a0 [<c0104228>] die+0x188/0x190 [<c01137f3>] do_page_fault+0x2e3/0x5b9 [<c0124fa6>] run_timer_softirq+0x126/0x1c0 [<c0120996>] __do_softirq+0xd6/0xf0 [<c0120ad5>] irq_exit+0x45/0x50 [<c0103990>] apic_timer_interrupt+0x1c/0x24 [<c0113510>] do_page_fault+0x0/0x5b9 [<c0103a37>] error_code+0x4f/0x54 [<c01df5be>] log_do_checkpoint+0x8e/0x220 [<c01df06f>] __log_wait_for_space+0x9f/0xc0 [<c01da362>] start_this_handle+0x132/0x3f0 [<c0130c50>] autoremove_wake_function+0x0/0x60 [<c01ceab5>] ext3_find_entry+0x235/0x440 [<c0130c50>] autoremove_wake_function+0x0/0x60 [<c01da71d>] journal_start+0xad/0xe0 [<c01d0437>] ext3_create+0x37/0xf0 [<c016ba77>] vfs_create+0x87/0xd0 [<c016c2c1>] open_namei+0x5f1/0x650 [<c014ab78>] free_pgtables+0x78/0x90 [<c015b08e>] filp_open+0x3e/0x70 [<c015b3c2>] get_unused_fd+0xa2/0xd0 [<c015b4e9>] sys_open+0x49/0x90 [<c0102f45>] syscall_call+0x7/0xb
I guess, we are seeing same problems with 2.6.12-rc5-mm2 and 2.6.12-rc5-git8. As we have a crashdump from 2.6.12-rc5-mm2, I looked at it. The analysis is quite similar as we have with earlier dumps in comment #1 and comment #4. Please look at the below attached log. This time also it is crashing at the same place, that is in log_do_checkpoint() ===>jbd_trylock_bh_state() because of journal head with NULL buffer_head pointer.
Created attachment 5137 [details] Analysis for 0606 kdump from 2.6.12-rc5-mm2 kernel
OK, thanks for the analysis. I'm going to look into new ways how this can happen. I will attach to this bug a debugging patch that should help in tracing the problem - it will show the name of the caller that caused that jh->j_bh is NULL. Can you please try the kernel with this patch applied and send me an output when the crash happens? Thanks.
Created attachment 5143 [details] Patch for debugging journal lists operations
I ran the tests with the debugging patch and I will attach the output from the crash. I used 2.6.12-rc5 with the patch.
Created attachment 5145 [details] crash output with debugging patch
Thanks for the debug output. I don't know what I expected but this output was certainly unexpected ;). It was so strange that I started suspecting the locking code itself and though I don't understand it too much I think I've found the problem. In cond_resched_lock() we don't return 1 everytime we drop the lock. As JBD is actually the only place using this behaviour there's no wonder we did not spot it earlier... Anyway try the attached patch if it fixes the problem.
Created attachment 5157 [details] Make rescheduling function return 1 whenever it drops the lock
I ran the tests for quite a while and this patch seems to have fixed things. Thanks!
Perfect. Finally we've solved it. Thanks for testing.
Closing the bug. Thanks everybody.