Bug 2841 - XFS internal error xfs_da_do_buf + umount oops
Status: REJECTED (INSUFFICIENT_DATA)
Alias: None
Product: File System
Classification: Unclassified
Component: XFS
Hardware: i386 Linux
Importance: P2 high
Assignee: XFS Guru
Reported: 2004-06-06 04:14 UTC by Jan-Frode Myklebust
Modified: 2007-03-05 20:53 UTC
CC List: 1 user

Kernel Version: 2.6.8.1
Regression: ---


Description Jan-Frode Myklebust 2004-06-06 04:14:50 UTC
Distribution: WhiteBox Enterprise Linux 3.0 (Red Hat Enterprise clone)
Hardware Environment: IBM x330, dual Pentium III, 2 GB memory, QLogic QLA2300
Fibre Channel-connected disks
Software Environment: NFS-exported filesystems are XFS.
Problem Description:

I just had a crash on my NFS server, reported in
http://bugme.osdl.org/show_bug.cgi?id=2840

After booting the node up again, I got lots of XFS internal errors on one of
the filesystems:

0x0: 00 00 00 00 00 00 00 af 07 a8 00 28 00 30 00 20
Filesystem "sdb1": XFS internal error xfs_da_do_buf(2) at line 2273 of
file fs/xfs/xfs_da_btree.c.  Caller 0xc020a867
Call Trace:
 [<c020a3f2>] xfs_da_do_buf+0x5e2/0x9b0
 [<c020a867>] xfs_da_read_buf+0x47/0x60
 [<c020a867>] xfs_da_read_buf+0x47/0x60
 [<c023c0e2>] xfs_trans_read_buf+0x2d2/0x330
 [<c020a867>] xfs_da_read_buf+0x47/0x60
 [<c020e412>] xfs_dir2_block_lookup_int+0x52/0x1a0
 [<c020e412>] xfs_dir2_block_lookup_int+0x52/0x1a0
 [<c0250033>] xfs_initialize_vnode+0x2d3/0x2e0
 [<c0250c29>] vfs_init_vnode+0x39/0x40
 [<c01fc80d>] xfs_bmap_last_offset+0xbd/0x120
 [<c020e330>] xfs_dir2_block_lookup+0x20/0xb0
 [<c020c91b>] xfs_dir2_lookup+0xab/0x110
 [<c0221c37>] xfs_ilock+0x57/0x100
 [<c023d1a8>] xfs_dir_lookup_int+0x38/0x100
 [<c0221c37>] xfs_ilock+0x57/0x100
 [<c02425f6>] xfs_lookup+0x66/0x90
 [<c024ded4>] linvfs_lookup+0x64/0xa0
 [<c015ecf9>] __lookup_hash+0x89/0xb0
 [<c015ed80>] lookup_one_len+0x50/0x60
 [<c01d2bba>] compose_entry_fh+0x5a/0x120
 [<c01d3084>] encode_entry+0x404/0x510
 [<c024bbf6>] linvfs_readdir+0x196/0x240
 [<c01d31e0>] nfs3svc_encode_entry_plus+0x0/0x50
 [<c0151d6c>] open_private_file+0x1c/0x90
 [<c0162949>] vfs_readdir+0x99/0xb0
 [<c01d31e0>] nfs3svc_encode_entry_plus+0x0/0x50
 [<c01ca279>] nfsd_readdir+0x79/0xc0
 [<c03c8aa6>] svcauth_unix_accept+0x286/0x2a0
 [<c01cffe0>] nfsd3_proc_readdirplus+0xe0/0x1c0
 [<c01d31e0>] nfs3svc_encode_entry_plus+0x0/0x50
 [<c01d21b0>] nfs3svc_decode_readdirplusargs+0x0/0x180
 [<c01c493e>] nfsd_dispatch+0xbe/0x19c
 [<c03c785d>] svc_authenticate+0x4d/0x80
 [<c03c4e92>] svc_process+0x4d2/0x5f9
 [<c0119b40>] default_wake_function+0x0/0x10
 [<c01c46f0>] nfsd+0x1b0/0x340
 [<c01c4540>] nfsd+0x0/0x340
 [<c0104b2d>] kernel_thread_helper+0x5/0x18

I then did an 'exportfs -a -u' and tried unmounting this sdb1
filesystem, but then the machine froze and I got this oops:
                                                                                
Unable to handle kernel paging request at virtual address 65000000
 printing eip:
c013c9a5
*pde = 00000000
Oops: 0002 [#1]
SMP
CPU:    0
EIP:    0060:[<c013c9a5>]    Not tainted
EFLAGS: 00010012   (2.6.6)
EIP is at free_block+0x65/0xf0
eax: eadd60c0   ebx: eab6c000   ecx: eab6cd30   edx: 65000000
esi: f7d4dda0   edi: 00000010   ebp: f7d4ddb8   esp: f26a5e28
ds: 007b   es: 007b   ss: 0068
Process umount (pid: 3699, threadinfo=f26a5000 task=f28128e0)
Stack: f7d4ddc8 0000001b f7d608c4 f7d608c4 00000292 eaba8ca0 f7d5f000 c013cb05
       0000001b f7d608b4 f7d4dda0 f7d608b4 00000292 eaba8ca0 eaba9c40 c013cd79
       eaba8ca0 00000001 00000001 c02456a5 eaba9c40 00000000 01af8f60 c0251006
Call Trace:
 [<c013cb05>] cache_flusharray+0xd5/0xe0
 [<c013cd79>] kmem_cache_free+0x49/0x50
 [<c02456a5>] xfs_finish_reclaim+0xe5/0x110
 [<c0251006>] vn_reclaim+0x56/0x60
 [<c02514e8>] vn_purge+0x138/0x160
 [<c02421d4>] xfs_inactive+0xf4/0x4b0
 [<c013ca11>] free_block+0xd1/0xf0
 [<c0251668>] vn_remove+0x58/0x5a
 [<c025022f>] linvfs_clear_inode+0xf/0x20
 [<c0168e74>] clear_inode+0xb4/0xd0
 [<c0168ecc>] dispose_list+0x3c/0x80
 [<c016905d>] invalidate_inodes+0x8d/0xb0
 [<c0156eca>] generic_shutdown_super+0x8a/0x190
 [<c0157a57>] kill_block_super+0x17/0x40
 [<c0156d3b>] deactivate_super+0x6b/0xa0
 [<c016c0cb>] sys_umount+0x3b/0x90
 [<c014596a>] unmap_vma_list+0x1a/0x30
 [<c016c137>] sys_oldumount+0x17/0x20
 [<c010689f>] syscall_call+0x7/0xb
                                                                                
Code: 89 02 8b 43 0c c7 03 00 01 10 00 31 d2 c7 43 04 00 02 20 00
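
(The oops itself is in the slab allocator rather than XFS: free_block(),
reached via kmem_cache_free() from xfs_finish_reclaim(), takes a write fault
(Oops: 0002) at 0x65000000.  That pattern is typical of slab free-list
metadata having been overwritten, so that a later free follows a garbage
value.  A minimal user-space sketch of the failure mode follows; fake_slab
and fake_free_block are hypothetical simplifications, not the kernel's
structures.)

/*
 * Illustrative sketch of the failure mode: a 2.6-style slab keeps a
 * per-slab free list of object indices.  If that metadata is
 * overwritten (double free, use-after-free, stray write), the next
 * free_block()-style operation uses a garbage index, and in the
 * kernel that becomes an "unable to handle kernel paging request".
 */
#include <stdio.h>

struct fake_slab {                /* hypothetical, simplified slab   */
    unsigned free;                /* index of first free object      */
    unsigned bufctl[4];           /* bufctl[i] = next free index     */
};

static void fake_free_block(struct fake_slab *s, unsigned idx)
{
    s->bufctl[idx] = s->free;     /* push object onto the free list  */
    s->free = idx;
}

int main(void)
{
    struct fake_slab s = { .free = 3, .bufctl = { 1, 2, 3, 0 } };

    fake_free_block(&s, 1);       /* a normal free: list head is now 1 */

    /* Simulate stray data overwriting the metadata; compare the
     * dumped registers above (edx = 0x65000000 at the fault). */
    s.free = 0x65000000u;

    if (s.free >= 4) {            /* the kernel has no such guard here */
        fprintf(stderr, "corrupt free-list head 0x%x: free_block() "
                        "would dereference this and oops\n", s.free);
        return 1;
    }
    return 0;
}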

After getting the node back up again, I ran xfs_repair, but everything
looked OK:
                                                                                
# xfs_repair  /dev/sdb1
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - clear lost+found (if it exists) ...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - ensuring existence of lost+found directory
        - traversing filesystem starting at / ...
        - traversal finished ...
        - traversing all unattached subtrees ...
        - traversals finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done

Steps to reproduce:
Comment 1 Jan-Frode Myklebust 2004-10-01 06:46:48 UTC
Got a similar error yesterday during the nightly backup (Tivoli Storage
Manager). Lots of repeated:

0x0: 3c 3f 78 6d 6c 20 76 65 72 73 69 6f 6e 3d 22 31
Filesystem "sdb1": XFS internal error xfs_da_do_buf(2) at line 2273 of file
fs/xfs/xfs_da_btree.c.  Caller 0xc020dd57
 [<c020d963>] xfs_da_do_buf+0x503/0x850
 [<c020dd57>] xfs_da_read_buf+0x47/0x60
 [<c020dd57>] xfs_da_read_buf+0x47/0x60
 [<c0135e39>] find_lock_page+0x29/0xa0
 [<c020dd57>] xfs_da_read_buf+0x47/0x60
 [<c0211512>] xfs_dir2_block_lookup_int+0x52/0x1a0
 [<c0211512>] xfs_dir2_block_lookup_int+0x52/0x1a0
 [<c023d2ef>] xfs_trans_read_buf+0x29f/0x300
 [<c020043d>] xfs_bmap_last_offset+0xbd/0x120
 [<c0211430>] xfs_dir2_block_lookup+0x20/0xb0
 [<c020fb7b>] xfs_dir2_lookup+0xab/0x110
 [<c0251133>] xfs_initialize_vnode+0x2d3/0x2e0
 [<c016d11c>] wake_up_inode+0xc/0x40
 [<c023e288>] xfs_dir_lookup_int+0x38/0x100
 [<c0223e67>] xfs_ilock+0x57/0x100
 [<c0243556>] xfs_lookup+0x66/0x90
 [<c025183b>] linvfs_get_parent+0x5b/0xb0
 [<c0223990>] xfs_iget+0x140/0x180
 [<c016a6db>] d_alloc+0x1b/0x180
 [<c016a93c>] d_alloc_anon+0x4c/0x150
 [<c02518f0>] linvfs_get_dentry+0x60/0x80
 [<c01c7aa3>] find_exported_dentry+0x493/0x660
 [<c036648f>] dev_queue_xmit+0x22f/0x2a0
 [<c0382f50>] ip_finish_output2+0x0/0x186
 [<c0382ff1>] ip_finish_output2+0xa1/0x186
 [<c0382f50>] ip_finish_output2+0x0/0x186
 [<c036f742>] nf_hook_slow+0xb2/0xf0
 [<c0382f50>] ip_finish_output2+0x0/0x186
 [<c0382f20>] dst_output+0x0/0x30
 [<c0380a0c>] ip_finish_output+0x1dc/0x1f0
 [<c0382f50>] ip_finish_output2+0x0/0x186
 [<c0382f20>] dst_output+0x0/0x30
 [<c0382f31>] dst_output+0x11/0x30
 [<c036f742>] nf_hook_slow+0xb2/0xf0
 [<c0382f20>] dst_output+0x0/0x30
 [<c0382a89>] ip_push_pending_frames+0x419/0x470
 [<c0382f20>] dst_output+0x0/0x30
 [<c0118a1d>] find_busiest_group+0xfd/0x350
 [<c01176f0>] recalc_task_prio+0x80/0x160
 [<c011785a>] activate_task+0x8a/0xa0
 [<c01c7fa0>] export_decode_fh+0x50/0x6c
 [<c01ca0d0>] nfsd_acceptable+0x0/0xf0
 [<c01c7f50>] export_decode_fh+0x0/0x6c
 [<c01ca536>] fh_verify+0x376/0x560
 [<c01ca0d0>] nfsd_acceptable+0x0/0xf0
 [<c01d2d93>] nfsd3_proc_getattr+0x73/0xb0
 [<c01d4820>] nfs3svc_decode_fhandle+0x0/0x90
 [<c01c863e>] nfsd_dispatch+0xbe/0x19c
 [<c03cc4d2>] svc_process+0x4d2/0x5f9
 [<c01195d0>] default_wake_function+0x0/0x10
 [<c01c83f0>] nfsd+0x1b0/0x340
 [<c01c8240>] nfsd+0x0/0x340
 [<c0103fdd>] kernel_thread_helper+0x5/0x18


I have now upgraded to 2.6.9-rc3 (mostly because of bug #3395). I haven't
tried running xfs_repair yet. Should I?

Rerunning the backup now to see if it will trigger the error again.
Comment 2 Adrian Bunk 2006-12-07 07:48:48 UTC
Is this issue still present in kernel 2.6.19?
Comment 3 Adrian Bunk 2007-02-17 11:58:05 UTC
Please reopen this bug if it's still present with kernel 2.6.20.
Comment 4 lazx888 2007-03-05 20:53:44 UTC
I just suffered a major XFS filesystem failure with error messages similar to
the ones in this bug.

http://oss.sgi.com/bugzilla/show_bug.cgi?id=741
