Bug 4987

Summary: Kernel OOPs while running Files System Stress on JFS
Product: File System Reporter: Sharyathi (sharyathi)
Component: JFSAssignee: Dave Kleikamp (shaggy)
Status: CLOSED CODE_FIX    
Severity: high CC: bnpoorni, maneesh, sglass
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.13-rc4-mm1 Subsystem:
Regression: --- Bisected commit-id:
Attachments: Analysis of dump taken through the kdump
Log buff out put from the dump
Fix race in txLock()

Description Sharyathi 2005-08-02 23:54:32 UTC
Distribution:
SLES 9 SP1
--------------------------------
Hardware Environment:
SMP, 2 way Xeon TM 2.8GHz, 1.5G RAM
Network Interface (Tigon3)
Disk I/O
SCSI storage controller: IBM server RAID
--------------------------------
Software Environment:
Linux x235b 2.6.13-rc4-mm1-II #1 PREEMPT Tue Aug 2 08:26:47 PDT 2005 i686 i686
i386 GNU/Linux
--------------------------------
Problem Description:
  Encountered this OOps message while Running FSstress on JFS file system

PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /kernel/crash_notes
Modules linked in:
CPU:    1
EIP:    0060:[<c024fbfc>]    Not tainted VLI
EFLAGS: 00010202   (2.6.13-rc4-mm1-I)
EIP is at txCommit+0x16c/0x300
eax: d0a6ab74   ebx: d0a6ab18   ecx: f3f5142c   edx: d2610108
esi: d0a6ad44   edi: 0004e1f6   ebp: eb57fe18   esp: eb57fda8
ds: 007b   es: 007b   ss: 0068
Process fsstress (pid: 16597, threadinfo=eb57e000 task=eb509030)
Stack: d0a6adfc eb57fdbc c04986c9 eb57fdbc 00000001 00000000 f883ef08 f7e2c180
       00000612 00000612 00000000 f7e2c180 f5e94800 00000001 eb57fe40 000ec454
       00000000 00000000 00000000 eb57fe18 c024ef8b 00000000 00000001 004b119a
Call Trace:
 [<c0103fcb>] show_stack+0xab/0xc0
 [<c0104164>] show_registers+0x164/0x1d0
 [<c01043a2>] die+0x122/0x1c0
 [<c04996dc>] do_page_fault+0x3dc/0x61d
 [<c0103c53>] error_code+0x4f/0x54
 [<c023044b>] jfs_commit_inode+0x10b/0x110
 [<c02304a4>] jfs_write_inode+0x54/0x70
 [<c018be94>] write_inode+0x44/0x50
 [<c018bfbf>] __sync_single_inode+0x11f/0x230
 [<c018c14f>] __writeback_single_inode+0x7f/0x150
 [<c018c408>] generic_sync_sb_inodes+0x1e8/0x300
 [<c018c54d>] sync_sb_inodes+0x2d/0x40
 [<c018c6a7>] sync_inodes_sb+0x77/0x90
 [<c018c7d7>] __sync_inodes+0xd7/0xf0
 [<c018c80e>] sync_inodes+0x1e/0x50
 [<c0167982>] do_sync+0x22/0x80
 [<c01679f2>] sys_sync+0x12/0x20
 [<c01030ff>] sysenter_past_esp+0x54/0x75
Code: 89 41 1a 66 c7 43 50 00 00 b8 78 6b 51 c0 66 c7 43 4e 00 00 e8 c6 8a 24 00
8d 86 30 fe ff ff 8b 9
Comment 1 Sharyathi 2005-08-03 00:00:20 UTC
Created attachment 5487 [details]
Analysis of dump taken through the kdump

I am attaching the preliminary analysis of the dump taken through kdump. Please
let us know if you need more information from the dump
Comment 2 Adrian Bunk 2005-08-03 04:19:30 UTC
*** Bug 4986 has been marked as a duplicate of this bug. ***
Comment 3 Dave Kleikamp 2005-08-03 08:17:49 UTC
It would be helpful to have the top of the oops message.  There is usually
useful information before the line, "PREEMPT SMP ...".

I'm not sure how far I can trust the oops output as to where it trapped.  The
code disassembles to:
00000000 <_EIP>:
Code;  c024fbfc No symbols available   <=====
   0:   89 41 1a                  mov    %eax,0x1a(%ecx)   <=====
Code;  c024fbff No symbols available
   3:   66 c7 43 50 00 00         movw   $0x0,0x50(%ebx)
Code;  c024fc05 No symbols available
   9:   b8 78 6b 51 c0            mov    $0xc0516b78,%eax
Code;  c024fc0a No symbols available
   e:   66 c7 43 4e 00 00         movw   $0x0,0x4e(%ebx)
Code;  c024fc10 No symbols available
  14:   e8 c6 8a 24 00            call   248adf <_EIP+0x248adf>

But, if I compile jfs_txnmgr.c myself, the first instruction should have a '66'
prefix, making it:
   0:   66 89 41 1a               mov    %ax,0x1a(%ecx)   <=====

Does this mean that the processor jumped into the middle of the instruction, or
is the oops processing in the kernel not quite accurate?

It would be helpful to see the full oops message, though.
Comment 4 Sharyathi 2005-08-03 22:31:31 UTC
Dave 
  I hit upon this bug again, But my console is not able to show me any more
information, above PREEMPT SMP, apart from Oops: 0002[#]
Here again I am attaching the complete Oops message. 
------------------------------------
Oops: 0002 [#]
PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /kernel/crash_notes
Modules linked in:
CPU:    1
EIP:    0060:[<c024fbfc>]    Not tainted VLI
EFLAGS: 00010202   (2.6.13-rc4-mm1-I)
EIP is at txCommit+0x16c/0x300
eax: dc862108   ebx: dc8620ac   ecx: f395c4ac   edx: c05e8a6c
esi: dc8622d8   edi: 0000a027   ebp: e5053b70   esp: e5053b00
ds: 007b   es: 007b   ss: 0068
Process dd (pid: 11719, threadinfo=e5052000 task=f54ff530)
Stack: d5238e00 e5053b14 c04986c9 e5053b14 00000001 00000000 f882bfe8 f37fda80
       0000030a 0000030a 00000000 f37fda80 f7e82200 00000001 e5053b98 00241037
       00000000 00000000 00000000 c5ab5ca0 00000000 00000000 00000000 00000000
Call Trace:
 [<c0103fcb>] show_stack+0xab/0xc0
 [<c0104164>] show_registers+0x164/0x1d0
 [<c01043a2>] die+0x122/0x1c0
 [<c04996dc>] do_page_fault+0x3dc/0x61d
 [<c0103c53>] error_code+0x4f/0x54
 [<c023044b>] jfs_commit_inode+0x10b/0x110
 [<c02304a4>] jfs_write_inode+0x54/0x70
 [<c018be94>] write_inode+0x44/0x50
 [<c018bfbf>] __sync_single_inode+0x11f/0x230
 [<c018c14f>] __writeback_single_inode+0x7f/0x150
 [<c018c408>] generic_sync_sb_inodes+0x1e8/0x300
 [<c018c54d>] sync_sb_inodes+0x2d/0x40
 [<c018c625>] writeback_inodes+0xc5/0xd0
 [<c014b738>] balance_dirty_pages+0x48/0x150
 [<c014b8bd>] balance_dirty_pages_ratelimited+0x7d/0xb0
 [<c0147204>] generic_file_buffered_write+0x2b4/0x680
 [<c01478ea>] __generic_file_aio_write_nolock+0x31a/0x580
 [<c0147c90>] __generic_file_write_nolock+0x90/0xb0
 [<c0147ea2>] generic_file_write+0x52/0xd0
 [<c0166258>] vfs_write+0xb8/0x190
 [<c01663fb>] sys_write+0x4b/0x80
 [<c01030ff>] sysenter_past_esp+0x54/0x75
Code: 89 41 1a 66 c7 43 50 00 00 b8 78 6b 51 c0 66 c7 43 4e 00 00 e8 c6 8a 24 00
8d 86 30 fe ff ff 8b 9
 Linux version 2.6.13-rc4-mm1-II (root@x235b) (gcc version 3.3.3 (SuSE Linux))
#1 PREEMPT Tue Aug 2 08:5
BIOS-provided physical RAM map:
## Secondary Kernel booting
------------------------------------
I have memory dump with me let me know if you nees any information from dump.
Comment 5 Maneesh Soni 2005-08-03 22:39:39 UTC
Sharyathi,

If you have the dump you could get the entire log_buf and could probably see the
missing line in oops message. Use "x/100s log_buf" to dump the log_buf from dump
in gdb.
Comment 6 Sharyathi 2005-08-03 23:12:20 UTC
Created attachment 5501 [details]
Log buff out put from the dump

This is the out put of log buffer from the dump I had taken.
Comment 7 Dave Kleikamp 2005-08-05 12:35:23 UTC
I was able to reproduce the problem.

        movzwl  78(%esi), %eax
        cmpw    $0, 26(%edx)
        movw    %ax, 24(%edx)
        jne     .L474
        movzwl  80(%esi), %eax
        movw    %ax, 26(%edx)

The instruction pointer is 1 byte into the above instruction, just past the 66
prefix that makes this a 16-bit store.  The contents of eax are c16fe7f0, which
should not be possible, cine the previous instruction should have zeroed the
high order 16 bits.  The only explanation, is that the processor took a bad
branch or return into the wrong place.

.L474:
        movw    $0, 80(%esi)
        movl    $jfsTxnLock, %eax
        movw    $0, 78(%esi)
        call    _spin_lock
        leal    -472(%ebx), %eax
        movl    -472(%ebx), %edx
Comment 8 Dave Kleikamp 2005-08-09 13:17:07 UTC
I recreated the oops using kgdb, and this time it shows me I'm at line 1246 of
jfs_txnmgr.c:
        list_del_init(&jfs_ip->anon_inode_list);

jfs_ip->anon-inode_list.prev looks like a valid address, but it can't be
accessed, so it looks like an inode was freed without being taken off the
anon_inode_list.  I should be able to figure it out from here.
Comment 9 Dave Kleikamp 2005-08-10 07:47:04 UTC
Created attachment 5583 [details]
Fix race in txLock()
Comment 10 Dave Kleikamp 2005-08-10 07:49:33 UTC
I believe this patch fixes the problem.  It took a while to hit the oops, so I'm
going to keep running for a while to make sure.
Comment 11 Dave Kleikamp 2005-08-10 08:31:33 UTC
*** Bug 4963 has been marked as a duplicate of this bug. ***