Distribution: WhiteBox Enterprise Linux 3.0 (RedHat Enterprise clone)
Hardware Environment: IBM x330, dual Pentium III, 2 GB memory, QLogic qla2300 Fibre Channel connected disks
Software Environment: NFS-exported filesystems are XFS.
Problem Description:
I just had a crash on my NFS server, reported in http://bugme.osdl.org/show_bug.cgi?id=2840
After booting the node up again I got lots of XFS internal errors on one of the filesystems:

0x0: 00 00 00 00 00 00 00 af 07 a8 00 28 00 30 00 20
Filesystem "sdb1": XFS internal error xfs_da_do_buf(2) at line 2273 of file fs/xfs/xfs_da_btree.c.  Caller 0xc020a867
Call Trace:
 [<c020a3f2>] xfs_da_do_buf+0x5e2/0x9b0
 [<c020a867>] xfs_da_read_buf+0x47/0x60
 [<c020a867>] xfs_da_read_buf+0x47/0x60
 [<c023c0e2>] xfs_trans_read_buf+0x2d2/0x330
 [<c020a867>] xfs_da_read_buf+0x47/0x60
 [<c020e412>] xfs_dir2_block_lookup_int+0x52/0x1a0
 [<c020e412>] xfs_dir2_block_lookup_int+0x52/0x1a0
 [<c0250033>] xfs_initialize_vnode+0x2d3/0x2e0
 [<c0250c29>] vfs_init_vnode+0x39/0x40
 [<c01fc80d>] xfs_bmap_last_offset+0xbd/0x120
 [<c020e330>] xfs_dir2_block_lookup+0x20/0xb0
 [<c020c91b>] xfs_dir2_lookup+0xab/0x110
 [<c0221c37>] xfs_ilock+0x57/0x100
 [<c023d1a8>] xfs_dir_lookup_int+0x38/0x100
 [<c0221c37>] xfs_ilock+0x57/0x100
 [<c02425f6>] xfs_lookup+0x66/0x90
 [<c024ded4>] linvfs_lookup+0x64/0xa0
 [<c015ecf9>] __lookup_hash+0x89/0xb0
 [<c015ed80>] lookup_one_len+0x50/0x60
 [<c01d2bba>] compose_entry_fh+0x5a/0x120
 [<c01d3084>] encode_entry+0x404/0x510
 [<c024bbf6>] linvfs_readdir+0x196/0x240
 [<c01d31e0>] nfs3svc_encode_entry_plus+0x0/0x50
 [<c0151d6c>] open_private_file+0x1c/0x90
 [<c0162949>] vfs_readdir+0x99/0xb0
 [<c01d31e0>] nfs3svc_encode_entry_plus+0x0/0x50
 [<c01ca279>] nfsd_readdir+0x79/0xc0
 [<c03c8aa6>] svcauth_unix_accept+0x286/0x2a0
 [<c01cffe0>] nfsd3_proc_readdirplus+0xe0/0x1c0
 [<c01d31e0>] nfs3svc_encode_entry_plus+0x0/0x50
 [<c01d21b0>] nfs3svc_decode_readdirplusargs+0x0/0x180
 [<c01c493e>] nfsd_dispatch+0xbe/0x19c
 [<c03c785d>] svc_authenticate+0x4d/0x80
 [<c03c4e92>] svc_process+0x4d2/0x5f9
 [<c0119b40>] default_wake_function+0x0/0x10
 [<c01c46f0>] nfsd+0x1b0/0x340
 [<c01c4540>] nfsd+0x0/0x340
 [<c0104b2d>] kernel_thread_helper+0x5/0x18

I then did an 'exportfs -a -u' and tried unmounting this sdb1 filesystem, but then the machine froze, and I got this oops:

Unable to handle kernel paging request at virtual address 65000000
 printing eip:
c013c9a5
*pde = 00000000
Oops: 0002 [#1]
SMP
CPU:    0
EIP:    0060:[<c013c9a5>]    Not tainted
EFLAGS: 00010012   (2.6.6)
EIP is at free_block+0x65/0xf0
eax: eadd60c0   ebx: eab6c000   ecx: eab6cd30   edx: 65000000
esi: f7d4dda0   edi: 00000010   ebp: f7d4ddb8   esp: f26a5e28
ds: 007b   es: 007b   ss: 0068
Process umount (pid: 3699, threadinfo=f26a5000 task=f28128e0)
Stack: f7d4ddc8 0000001b f7d608c4 f7d608c4 00000292 eaba8ca0 f7d5f000 c013cb05
       0000001b f7d608b4 f7d4dda0 f7d608b4 00000292 eaba8ca0 eaba9c40 c013cd79
       eaba8ca0 00000001 00000001 c02456a5 eaba9c40 00000000 01af8f60 c0251006
Call Trace:
 [<c013cb05>] cache_flusharray+0xd5/0xe0
 [<c013cd79>] kmem_cache_free+0x49/0x50
 [<c02456a5>] xfs_finish_reclaim+0xe5/0x110
 [<c0251006>] vn_reclaim+0x56/0x60
 [<c02514e8>] vn_purge+0x138/0x160
 [<c02421d4>] xfs_inactive+0xf4/0x4b0
 [<c013ca11>] free_block+0xd1/0xf0
 [<c0251668>] vn_remove+0x58/0x5a
 [<c025022f>] linvfs_clear_inode+0xf/0x20
 [<c0168e74>] clear_inode+0xb4/0xd0
 [<c0168ecc>] dispose_list+0x3c/0x80
 [<c016905d>] invalidate_inodes+0x8d/0xb0
 [<c0156eca>] generic_shutdown_super+0x8a/0x190
 [<c0157a57>] kill_block_super+0x17/0x40
 [<c0156d3b>] deactivate_super+0x6b/0xa0
 [<c016c0cb>] sys_umount+0x3b/0x90
 [<c014596a>] unmap_vma_list+0x1a/0x30
 [<c016c137>] sys_oldumount+0x17/0x20
 [<c010689f>] syscall_call+0x7/0xb
Code: 89 02 8b 43 0c c7 03 00 01 10 00 31 d2 c7 43 04 00 02 20 00

After getting the node back up again, I ran xfs_repair, but everything looked OK:

# xfs_repair /dev/sdb1
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - clear lost+found (if it exists) ...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - ensuring existence of lost+found directory
        - traversing filesystem starting at / ...
        - traversal finished ...
        - traversing all unattached subtrees ...
        - traversals finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done

Steps to reproduce:
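For reference, the recovery sequence attempted above (unexport via exportfs, unmount, then xfs_repair) can be sketched as a small shell script. This is a minimal sketch, not part of the original report: the `is_mounted` helper and the `-n` dry-run pass are assumptions added here; only the device path `/dev/sdb1` is taken from the report.

```shell
#!/bin/sh
# Sketch of the recovery sequence from the report above (assumed
# device /dev/sdb1). is_mounted and the -n dry run are additions:
# xfs_repair must only ever run on an unmounted filesystem.

is_mounted() {
    # /proc/mounts lists one "device mountpoint fstype ..." per line
    grep -q "^$1 " /proc/mounts
}

recover() {
    dev=$1
    exportfs -ua                  # stop NFS serving the filesystem
    umount "$dev" || return 1     # the step that froze the machine above
    if is_mounted "$dev"; then
        echo "refusing: $dev is still mounted" >&2
        return 1
    fi
    xfs_repair -n "$dev"          # -n: dry run, report problems only
    xfs_repair "$dev"             # actual repair pass
}

# recover /dev/sdb1
```

The guard exists because running xfs_repair against a mounted filesystem corrupts it; checking /proc/mounts after the umount catches the case where the unmount silently failed.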
Got a similar error yesterday during the nightly backup (Tivoli Storage Manager). Lots of repeated:

0x0: 3c 3f 78 6d 6c 20 76 65 72 73 69 6f 6e 3d 22 31
Filesystem "sdb1": XFS internal error xfs_da_do_buf(2) at line 2273 of file fs/xfs/xfs_da_btree.c.  Caller 0xc020dd57
 [<c020d963>] xfs_da_do_buf+0x503/0x850
 [<c020dd57>] xfs_da_read_buf+0x47/0x60
 [<c020dd57>] xfs_da_read_buf+0x47/0x60
 [<c0135e39>] find_lock_page+0x29/0xa0
 [<c020dd57>] xfs_da_read_buf+0x47/0x60
 [<c0211512>] xfs_dir2_block_lookup_int+0x52/0x1a0
 [<c0211512>] xfs_dir2_block_lookup_int+0x52/0x1a0
 [<c023d2ef>] xfs_trans_read_buf+0x29f/0x300
 [<c020043d>] xfs_bmap_last_offset+0xbd/0x120
 [<c0211430>] xfs_dir2_block_lookup+0x20/0xb0
 [<c020fb7b>] xfs_dir2_lookup+0xab/0x110
 [<c0251133>] xfs_initialize_vnode+0x2d3/0x2e0
 [<c016d11c>] wake_up_inode+0xc/0x40
 [<c023e288>] xfs_dir_lookup_int+0x38/0x100
 [<c0223e67>] xfs_ilock+0x57/0x100
 [<c0243556>] xfs_lookup+0x66/0x90
 [<c025183b>] linvfs_get_parent+0x5b/0xb0
 [<c0223990>] xfs_iget+0x140/0x180
 [<c016a6db>] d_alloc+0x1b/0x180
 [<c016a93c>] d_alloc_anon+0x4c/0x150
 [<c02518f0>] linvfs_get_dentry+0x60/0x80
 [<c01c7aa3>] find_exported_dentry+0x493/0x660
 [<c036648f>] dev_queue_xmit+0x22f/0x2a0
 [<c0382f50>] ip_finish_output2+0x0/0x186
 [<c0382ff1>] ip_finish_output2+0xa1/0x186
 [<c0382f50>] ip_finish_output2+0x0/0x186
 [<c036f742>] nf_hook_slow+0xb2/0xf0
 [<c0382f50>] ip_finish_output2+0x0/0x186
 [<c0382f20>] dst_output+0x0/0x30
 [<c0380a0c>] ip_finish_output+0x1dc/0x1f0
 [<c0382f50>] ip_finish_output2+0x0/0x186
 [<c0382f20>] dst_output+0x0/0x30
 [<c0382f31>] dst_output+0x11/0x30
 [<c036f742>] nf_hook_slow+0xb2/0xf0
 [<c0382f20>] dst_output+0x0/0x30
 [<c0382a89>] ip_push_pending_frames+0x419/0x470
 [<c0382f20>] dst_output+0x0/0x30
 [<c0118a1d>] find_busiest_group+0xfd/0x350
 [<c01176f0>] recalc_task_prio+0x80/0x160
 [<c011785a>] activate_task+0x8a/0xa0
 [<c01c7fa0>] export_decode_fh+0x50/0x6c
 [<c01ca0d0>] nfsd_acceptable+0x0/0xf0
 [<c01c7f50>] export_decode_fh+0x0/0x6c
 [<c01ca536>] fh_verify+0x376/0x560
 [<c01ca0d0>] nfsd_acceptable+0x0/0xf0
 [<c01d2d93>] nfsd3_proc_getattr+0x73/0xb0
 [<c01d4820>] nfs3svc_decode_fhandle+0x0/0x90
 [<c01c863e>] nfsd_dispatch+0xbe/0x19c
 [<c03cc4d2>] svc_process+0x4d2/0x5f9
 [<c01195d0>] default_wake_function+0x0/0x10
 [<c01c83f0>] nfsd+0x1b0/0x340
 [<c01c8240>] nfsd+0x0/0x340
 [<c0103fdd>] kernel_thread_helper+0x5/0x18

I have now upgraded to 2.6.9-rc3 (mostly because of bug #3395). Haven't tried running xfs_repair yet. Should I? Rerunning the backup now, to see if it will trigger it again.
Is this issue still present in kernel 2.6.19?
Please reopen this bug if it's still present with kernel 2.6.20.
I just suffered a major XFS filesystem failure with error messages similar to this one: http://oss.sgi.com/bugzilla/show_bug.cgi?id=741