Distribution: Debian Sarge

Hardware Environment:
Whitebox P4 2.4 Xeon server w/ 1G ECC RAM
Tyan S2723 motherboard
3ware 7506 IDE RAID card

Software Environment:
Plain kernel 2.6.3
XFS
RAID 5
LVM 1.0

Problem Description:
A small file (about 1.5M) cannot be removed from /home. ls cannot list the file, but find ./ can. The kernel logs the following error every time a remove is attempted.

Filesystem "dm-4": corrupt dinode 214000215, extent total = -1044130556, nblocks = 13962305913605067584. Unmount and run xfs_repair.
0x0: 49 4e 81 b4 01 02 00 01 00 00 03 ec 00 00 03 ec
Filesystem "dm-4": XFS internal error xfs_iformat(1) at line 475 of file fs/xfs/xfs_inode.c. Caller 0xc01d4393
Call Trace:
 [<c01d2e91>] xfs_iformat+0x2fc/0x615
 [<c01d4393>] xfs_iread+0x21d/0x266
 [<c01d4393>] xfs_iread+0x21d/0x266
 [<c01d4393>] xfs_iread+0x21d/0x266
 [<c01d1783>] xfs_iget_core+0xf9/0x5a9
 [<c01d1d8a>] xfs_iget+0x157/0x189
 [<c01ede8a>] xfs_dir_lookup_int+0xb4/0x12b
 [<c01f37ef>] xfs_lookup+0x50/0x88
 [<c01fff01>] linvfs_lookup+0x67/0x9f
 [<c015e5db>] real_lookup+0xcd/0xf0
 [<c015e851>] do_lookup+0x96/0xa1
 [<c015ece8>] link_path_walk+0x48c/0x8e0
 [<c015f633>] __user_walk+0x49/0x5e
 [<c015a7bf>] vfs_lstat+0x1c/0x58
 [<c015aeba>] sys_lstat64+0x1b/0x39
 [<c015268d>] __fput+0xaa/0x101
 [<c010a877>] syscall_call+0x7/0xb

The problem was fixed after unmounting /home and running xfs_repair.

I found this bug report: http://oss.sgi.com/bugzilla/show_bug.cgi?id=197
It looks like the same problem.

Steps to reproduce:
xfs_repair result:

# xfs_repair /dev/mapper/vg0-home
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
        - agno = 21
        - agno = 22
        - agno = 23
        - agno = 24
        - agno = 25
        - agno = 26
        - agno = 27
        - agno = 28
        - agno = 29
        - agno = 30
        - agno = 31
        - agno = 32
        - agno = 33
        - agno = 34
        - agno = 35
        - agno = 36
        - agno = 37
        - agno = 38
        - agno = 39
        - agno = 40
        - agno = 41
        - agno = 42
        - agno = 43
        - agno = 44
        - agno = 45
        - agno = 46
        - agno = 47
        - agno = 48
        - agno = 49
        - agno = 50
        - agno = 51
bad non-zero extent size value 3250852672 for non-realtime inode 214000215, resetting to zero
bad attr fork offset 19 in inode 214000215, should be 15
cleared inode 214000215
        - agno = 52
        - agno = 53
        - agno = 54
        - agno = 55
        - agno = 56
        - agno = 57
        - agno = 58
        - agno = 59
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - clear lost+found (if it exists) ...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
        - agno = 21
        - agno = 22
        - agno = 23
        - agno = 24
        - agno = 25
        - agno = 26
        - agno = 27
        - agno = 28
        - agno = 29
        - agno = 30
        - agno = 31
        - agno = 32
        - agno = 33
        - agno = 34
        - agno = 35
        - agno = 36
        - agno = 37
        - agno = 38
        - agno = 39
        - agno = 40
        - agno = 41
        - agno = 42
        - agno = 43
        - agno = 44
        - agno = 45
        - agno = 46
        - agno = 47
        - agno = 48
        - agno = 49
        - agno = 50
        - agno = 51
entry "15.pdf" in shortform directory 214000198 references free inode 214000215
junking entry "15.pdf" in directory inode 214000198
        - agno = 52
        - agno = 53
        - agno = 54
        - agno = 55
        - agno = 56
        - agno = 57
        - agno = 58
        - agno = 59
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - ensuring existence of lost+found directory
        - traversing filesystem starting at / ...
        - traversal finished ...
        - traversing all unattached subtrees ...
        - traversals finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done
The same disaster happened again today. The following error message appears in the kernel log.

==========================================================================
Filesystem "lvm(58,5)": corrupt inode 167132010 ((a)extents = 623044872). Unmount and run xfs_repair.
0x0: 49 4e 81 ed 01 02 00 01 00 00 03 ea 00 00 03 ea
Filesystem "lvm(58,5)": XFS internal error xfs_iformat_extents(1) at line 678 of file xfs_inode.c. Caller 0xc02004cb
d7f6dd20 c0200a31 c034e6c7 00000001 f715d800 c034e613 000002a6 c02004cb
c02004cb 03531db0 2522e908 c50af150 00000076 f0988a00 f715d800 8ac4c2b2
c02004cb c50af100 f0988a00 00000000 c50af22c c0200e2d c50af242 f0988a16
Call Trace:
 [<c0200a31>] [<c02004cb>] [<c02004cb>] [<c02004cb>] [<c0200e2d>] [<c02016b2>]
 [<c01feb94>] [<c01ff127>] [<c021b69c>] [<c0220d20>] [<c022b7af>] [<c014f1f2>]
 [<c014f990>] [<c014fcb9>] [<c0150019>] [<c014bf0f>] [<c010927f>]
==============================================================================

I unmounted the partition and ran xfs_repair on it; it seems OK so far. The xfs_repair log is attached below.

tux:~# xfs_repair -v /dev/vg0/data
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
zero_log: head block 187092 tail block 187092
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
bad non-zero extent size value 2301803360 for non-realtime inode 167132010, resetting to zero
bad attr fork offset 118 in inode 167132010, should be 15
cleared inode 167132010
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - clear lost+found (if it exists) ...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
entry "recover_1.3b-1_powerpc.deb" at block 0 offset 440 in directory inode 167120659 references free inode 167132010
        clearing inode number in entry at offset 440...
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - ensuring existence of lost+found directory
        - traversing filesystem starting at / ...
rebuilding directory inode 167120659
        - traversal finished ...
        - traversing all unattached subtrees ...
        - traversals finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done
PS: This time, the XFS error occurred on the same system but with a 2.4.25 kernel.
Is this problem still present in recent 2.6 kernels?
Looks more like a hardware issue than anything else. Nothing here suggests an XFS problem anyway: XFS detected on-disk corruption each time and shut down the filesystem... cheers.
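
For anyone reading the "0x0:" lines in the kernel messages: those 16 bytes appear to be the start of the on-disk inode. Below is a minimal Python sketch for decoding them. It assumes the classic big-endian inode core layout (magic, mode, version, format, onlink, uid, gid) and the usual data fork format codes. The function name decode_inode_head and the dictionary keys are made up for illustration and are not part of any XFS tool.

```python
import stat
import struct

# Assumed layout of the first 16 bytes of the on-disk inode core, big-endian:
# u16 magic, u16 mode, s8 version, s8 format, u16 onlink, u32 uid, u32 gid.
CORE_HEAD = struct.Struct(">HHbbHII")

# Assumed data fork format codes.
FORMATS = {0: "dev", 1: "local", 2: "extents", 3: "btree", 4: "uuid"}

XFS_DINODE_MAGIC = 0x494E  # ASCII "IN"


def decode_inode_head(hex_dump):
    """Decode the 16 space-separated hex bytes printed after '0x0:'."""
    raw = bytes.fromhex(hex_dump.replace(" ", ""))
    if len(raw) < CORE_HEAD.size:
        raise ValueError("need at least 16 bytes of inode data")
    magic, mode, version, fmt, onlink, uid, gid = CORE_HEAD.unpack(
        raw[:CORE_HEAD.size]
    )
    return {
        "magic_ok": magic == XFS_DINODE_MAGIC,
        "is_regular_file": stat.S_ISREG(mode),
        "permissions": oct(stat.S_IMODE(mode)),
        "version": version,
        "format": FORMATS.get(fmt, "unknown"),
        "onlink": onlink,
        "uid": uid,
        "gid": gid,
    }


if __name__ == "__main__":
    # The two dumps quoted in the kernel logs in this report.
    dumps = [
        "49 4e 81 b4 01 02 00 01 00 00 03 ec 00 00 03 ec",
        "49 4e 81 ed 01 02 00 01 00 00 03 ea 00 00 03 ea",
    ]
    for dump in dumps:
        print(decode_inode_head(dump))
```

Under those assumptions, both dumps decode to a regular file with a valid "IN" magic and extents format, which suggests the damage the kernel complains about (extent counts, nblocks) sits in later fields of the inode and not in these leading bytes.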