Bug 4963

Summary: Encounter this Kernel oops while running file system race around test suite
Product: File System Reporter: Sharyathi (sharyathi)
Component: JFSAssignee: Dave Kleikamp (shaggy)
Status: REJECTED DUPLICATE    
Severity: normal CC: bnpoorni, maneesh, raosanth, sglass
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.13-rc3-mm2 Subsystem:
Regression: --- Bisected commit-id:

Description Sharyathi 2005-07-29 06:31:10 UTC
Distribution:
SLES 9 SP1
------------------------
Hardware Environment:
SMP, 2 way Xeon TM 2.8GHz, 1.5G RAM
Network Interface (Tigon3)
Disk I/O
SCSI storage controller: IBM server RAID
------------------------
Software Environment:
1.kernel  linux-2.6.13-rc3-mm2
-------------------------
Problem Description:
  While Running File system Race test on file systems ext2,ext3,reiserfs,js and
xfs encountered this oops message

PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file:
Modules linked in:
CPU:    1
EIP:    0060:[<c024f6f5>]    Not tainted VLI
EFLAGS: 00010296   (2.6.13-rc3-mm2-I)
EIP is at txLock+0x3b5/0x530
eax: d244d088   ebx: 00000000   ecx: d244d02c   edx: d483742c
esi: 000002d8   edi: f88ed600   ebp: ca40da20   esp: ca40d9b4
ds: 007b   es: 007b   ss: 0068
Process dir_create.sh (pid: 26440, threadinfo=ca40c000 task=c6926530)
Stack: 00000286 000256b4 000256b8 ca40d983 c049fe41 ca40d9e0 00000000 d244d02c
       00000001 00000000 00000000 ca40da00 c02416d8 00000000 ca40da20 c018c16d
       d244d258 d244d074 00000000 00000000 00000001 d244d02c 00000000 f657a800
Call Trace:
 [<c0103fbb>] show_stack+0xab/0xc0
 [<c0104154>] show_registers+0x164/0x1d0
 [<c0104392>] die+0x122/0x1c0
 [<c04a089c>] do_page_fault+0x3dc/0x61d
 [<c0103c4b>] error_code+0x4f/0x54
 [<c02358b5>] xtInsert+0x365/0x590
 [<c024a27a>] extAlloc+0x26a/0x3f0
 [<c0230b8a>] jfs_get_blocks+0x21a/0x2c0
 [<c0230c66>] jfs_get_block+0x36/0x40
 [<c016a658>] nobh_prepare_write+0xe8/0x420
 [<c0230d48>] jfs_prepare_write+0x28/0x30
 [<c0147119>] generic_file_buffered_write+0x1a9/0x680
 [<c014790a>] __generic_file_aio_write_nolock+0x31a/0x580
 [<c0147cb0>] __generic_file_write_nolock+0x90/0xb0
 [<c0147ec2>] generic_file_write+0x52/0xd0
 [<c0166168>] vfs_write+0xb8/0x190
 [<c016630b>] sys_write+0x4b/0x80
 [<c01030ff>] sysenter_past_esp+0x54/0x75<2>EXT2-fs error (device dm-0):
ext2_readdir: bad page in #1014

Code: 89 72 50 b8 78 eb 51 <2>EXT2-fs error (device dm-0): ext2_readdir: bad
page in #1015844
c0 e8 99 01 25 00 8b 45 b0 8b 15 70 0a 5f c0 83 c0 5c a3 70 0a 5f c0 8b 4d b0 89
50 04 c7 41 5c 6c 0a 54
05 25 00 e9 5a ff ff ff 8b 4d 10 66
Comment 1 Dave Kleikamp 2005-07-29 08:38:27 UTC
Would it be possible to determine where in txLock the trap occurred?  At the
least, could you "make fs/jfs/jfs_txnmgr.s" so that I can compare the failing
code to the assembler listing?  I don't have a xeon system to compile it on myself.
Comment 2 Andrew Morton 2005-07-29 11:21:06 UTC
OK, well there's one known bug in the BIO layer which could cause the dm
layer to return bad data.  That might explain the ext2 warning, although
JFS shouldn't be oopsing.  I'd suggest that you add
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.13-rc3/2.6.13-rc3-mm3/broken-out/bio_clone-fix.patch.
 That's already in 2.6.13-rc4.

Comment 3 Sharyathi 2005-07-31 22:29:41 UTC
I had captured the dump and here is the preliminary analysis over the dump.
Let me know If you need more information
Meanwhile I will test on the rc4-mm1 and let you know, if this bug is fixed
-----------------------------------
GNU gdb Red Hat Linux (6.3.0.0-0.31rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db
library "/lib/tls/libthread_db.so.1".
-----------------------------
#0  0xc01e3655 in journal_unmap_buffer (journal=0xc5f92200, bh=0xdfd95c9c) at
include/asm/bitops.h:143
143             __asm__ __volatile__( LOCK_PREFIX
-----------------------------
(gdb) info thread
  4 process 1122  submit_bh (rw=1, bh=0xe3bc8754) at fs/buffer.c:2752
  3 process 0  0xc01e3655 in journal_unmap_buffer (journal=0xc5f92200,
bh=0xdfd95c9c)
    at include/asm/bitops.h:143
  2 process 0  0xc01e3655 in journal_unmap_buffer (journal=0xc5f92200,
bh=0xdfd95c9c)
    at include/asm/bitops.h:143
* 1 process 2284  0xc01e3655 in journal_unmap_buffer (journal=0xc5f92200,
bh=0xdfd95c9c)
    at include/asm/bitops.h:143
-----------------------------
(gdb) bt
#0  0xc01e3655 in journal_unmap_buffer (journal=0xc5f92200, bh=0xdfd95c9c) at
include/asm/bitops.h:143
#1  0xc01e3909 in journal_invalidatepage (journal=0xc5f92200, page=0xc52385a0,
offset=0)
    at fs/jbd/transaction.c:1906
#2  0xc0155b16 in do_invalidatepage (page=0xc5e98030, offset=3755564188) at
mm/truncate.c:25
#3  0xc0155b60 in truncate_complete_page (mapping=0xc5e98030, page=0xc52385a0)
at mm/truncate.c:52
#4  0xc0155d25 in truncate_inode_pages_range (mapping=0xf5ed7850,
lstart=warning: Unhandled dwarf expresion opcode DW_OP_piece
3755564188, lend=warning: Unhandled dwarf expresion opcode DW_OP_piece
-987527776)
    at mm/truncate.c:162
#5  0xc0155ee7 in truncate_inode_pages (mapping=0xc5e98030, lstart=warning:
Unhandled dwarf expresion opcode DW_OP_piece
3755564188) at mm/truncate.c:219
#6  0xc015b405 in vmtruncate (inode=0xf5ed7780, offset=warning: Unhandled dwarf
expresion opcode DW_OP_piece
444096111) at mm/memory.c:1587
#7  0xc0186da8 in inode_setattr (inode=0xf5ed7780, attr=0xee70cf4c) at fs/attr.c:74
#8  0xc01d5cf9 in ext3_setattr (dentry=0xc5e98030, attr=0xee70cf4c) at
fs/ext3/inode.c:2807
#9  0xc0187038 in notify_change (dentry=0xf6e88414, attr=0xee70cf4c) at
fs/attr.c:160
#10 0xc0168bd3 in do_truncate (dentry=0xf6e88414, length=warning: Unhandled
dwarf expresion opcode DW_OP_piece
3755564188) at fs/open.c:210
#11 0xc016920b in sys_ftruncate64 (fd=3320414256, length=warning: Unhandled
dwarf expresion opcode DW_OP_piece
444096111) at fs/open.c:317
#12 0xc01040bf in sysenter_past_esp () at include/asm/current.h:9
#13 0x00000005 in ?? ()
#14 0x1a785e6f in ?? ()
#15 0x00000000 in ?? ()
-----------------------------
(gdb) list journal_unmap_buffer
1734     *
1735     * We're outside-transaction here.  Either or both of j_running_transaction
1736     * and j_committing_transaction may be NULL.
1737     */
1738    static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh)
1739    {
1740            transaction_t *transaction;
1741            struct journal_head *jh;
1742            int may_free = 1;
1743            int ret;
-----------------------------
Comment 4 Dave Kleikamp 2005-08-03 11:19:35 UTC
The first oops reported looks a lot like bug #4987.  Both appear to be trapping
on a store of a 16-bit value.  Both show the current instruction without the 66
prefix, making it look like a 32-bit store.  (Does the processor increment the
IP past the prefix?)  I don't why we're seeing an oops.  In both cases the
register being dereferenced has already been dereferenced.  In neither case
should the structure be freed by another thread.  Both bug reports are missing
the beginning part of the oops text.

The second oops is in ext3, and I haven't looked too closely at it.
Comment 5 Sharyathi 2005-08-03 23:18:07 UTC
I am still not able to replicate the bug in 2.6.13-rc4 kernel, will update you
on encountering the problem
Comment 6 Andrew Morton 2005-08-04 16:21:29 UTC
Sooo...  where do we stand on this bug now?
Comment 7 Dave Kleikamp 2005-08-05 12:06:10 UTC
This is still under investigation.  The problem doesn't appear in 2.6.13-rc4,
but a similar oops happens in 2.6.13-rc4-mm1 (bug #4987).
Comment 8 Sonny Rao 2005-08-05 12:21:10 UTC
So, I took a quick look at this, compiled 2.6.13-rc3-mm2 on a P4 Xeon box --
though I don't know exactly what options were in the config so my version could
be slightly different from the version Sharyathi compiled.

I haven't reproduced the problem myself so hopefully this isn't all 
totally bogus analysis  ;-)

It looks like we could be oopsing in this piece of code:

	/* initialize type dependent area for linelock */
	linelock = (struct linelock *) & tlck->lock;
	linelock->next = 0;
	linelock->flag = tlckLINELOCK;
	linelock->maxcnt = TLOCKSHORT;
	linelock->index = 0;

from jfs_txnmgr.c: lines 765-770

Here's output of objdump -ld on jfs_txnmgr.o looking at roughly the area
where the oops occurs:

fs/jfs/jfs_txnmgr.c:747
    1180:       8b 5c 24 10             mov    0x10(%esp),%ebx
    1184:       66 89 73 1a             mov    %si,0x1a(%ebx)
fs/jfs/jfs_txnmgr.c:766
    1188:       8d 55 10                lea    0x10(%ebp),%edx
fs/jfs/jfs_txnmgr.c:767
    118b:       66 c7 45 10 00 00       movw   $0x0,0x10(%ebp)
fs/jfs/jfs_txnmgr.c:768
    1191:       66 c7 42 04 00 20       movw   $0x2000,0x4(%edx)
fs/jfs/jfs_txnmgr.c:769
    1197:       c6 42 02 14             movb   $0x14,0x2(%edx)
fs/jfs/jfs_txnmgr.c:770
    119b:       c6 42 03 00             movb   $0x0,0x3(%edx)


To me it looked like the oopsing instruction was at 0x1191 where we first
dereference %edx.  I'm confused why the instruction before that dereferences
%ebp (instead of %edx) for line 767.  

Basically, this would mean that the linelock pointer is either somehow invalid,
which seems to be unlikely, or possibly incorrectly aligned?  You're playing
some interesting games with the whole overlay thing, maybe a union would be more
appropriate there?

I'll try and reproduce on a my xeon machine...
Comment 9 Dave Kleikamp 2005-08-05 14:45:33 UTC
I think the code looks closer to lines 757-761:

    757                         jfs_ip->atltail = lid;
    758                         TXN_LOCK();
    759                         list_add_tail(&jfs_ip->anon_inode_list,
    760                                       &TxAnchor.anon_list);
    761                         TXN_UNLOCK();

fs/jfs/jfs_txnmgr.c:757
    12eb:       8b 45 a0                mov    0xffffffa0(%ebp),%eax
    12ee:       66 89 43 50             mov    %ax,0x50(%ebx)
fs/jfs/jfs_txnmgr.c:758
    12f2:       b8 1c 00 00 00          mov    $0x1c,%eax
    12f7:       e8 fc ff ff ff          call   12f8 <txLock+0x5f8>
include/linux/list.h:54
    12fc:       c7 43 5c 6c 00 00 00    movl   $0x6c,0x5c(%ebx)
include/linux/list.h:52
    1303:       8b 15 70 00 00 00       mov    0x70,%edx
include/linux/list.h:81
    1309:       89 d8                   mov    %ebx,%eax
    130b:       83 c0 5c                add    $0x5c,%eax
include/linux/list.h:53
    130e:       a3 70 00 00 00          mov    %eax,0x70
include/linux/list.h:56
    1313:       89 02                   mov    %eax,(%edx)
include/linux/list.h:55
    1315:       89 50 04                mov    %edx,0x4(%eax)
fs/jfs/jfs_txnmgr.c:761
    1318:       b8 1c 00 00 00          mov    $0x1c,%eax
    131d:       e8 fc ff ff ff          call   131e <txLock+0x61e>
    1322:       e9 d2 fe ff ff          jmp    11f9 <txLock+0x4f9>

It's hard to tell for sure, since it compiled differently for each of us.  The
actual instruction it trapped on is:
 89 72 50                  mov    %esi,0x50(%edx)
but I believe the real instruction is
 66 89 72 50               mov    %si,0x50(%edx)

This would be consistent with what I observed in bug #4987, although I have no
idea how we would have jumped into the middle of an instruction.
Comment 10 Dave Kleikamp 2005-08-10 08:31:27 UTC
Okay, I completely misunderstood the end of the oops.  I thought the Code dump
started at EIP, but it starts 43 bytes before EIP.  ksymoops figures it out, but
since the oops output was mangled, I didn't catch it.

I finally figured out what was happening and this is the same as bug #4987.  A
patch can be found there.

*** This bug has been marked as a duplicate of 4987 ***