Distribution: Debian Sarge

Hardware Environment:
- CPU: Intel(R) Pentium(R) 4 CPU 2.40GHz stepping 09
  CPU0: Intel P4/Xeon Extended MCE MSRs (12) available
  Detected 2398.381 MHz processor.
- hda: ST340014A, ATA DISK drive (Seagate 40G)
  Using cfq io scheduler
  hda: max request size: 1024KiB
  hda: 78165360 sectors (40020 MB) w/2048KiB Cache, CHS=16383/255/63, UDMA(100)
- SGI XFS with ACLs, security attributes, realtime, no debug enabled
  SGI XFS Quota Management subsystem
- VP_IDE: VIA vt8235 (rev 00) IDE UDMA133 controller on pci0000:00:11.1

Software Environment:
- xfs mount 2.12-7
- xfs_repair 2.6.11-1
- rsync 2.6.2-1
- gcc 4:3.3.4-1
- binutils 2.14.90.0.7-8
- glibc-2.3.2.ds1-13

Problem Description:
This system ran for a long time (2 months) without any problem as a terminal server. When I needed to update the programs on this machine, I installed them (deb packages) on another machine, which has the same hardware and kernel but 1.5 GB of RAM, and afterwards I rsynced all the files (/lib/, /usr/, /var/, etc.) across. This process takes about 15 minutes. The problem occurred during this rsync, and the kernel printed a lot of messages like this on the terminal:
.......
[<c01af5b5>] xfs_alloc_read_agf+0xd7/0x1d1
[<c01af1ef>] xfs_alloc_fix_freelist+0x37c/0x3fe
[<c01af1ef>] xfs_alloc_fix_freelist+0x37c/0x3fe
[<c01af1ef>] xfs_alloc_fix_freelist+0x37c/0x3fe
[<c01af7a5>] xfs_alloc_vextent+0xf6/0x37f
[<c01ddd5a>] xfs_ialloc_ag_alloc+0x147/0x5d5
[<c020743d>] pagebuf_get+0x159/0x181
[<c01faf58>] xfs_trans_read_buf+0x243/0x312
[<c01df92e>] xfs_ialloc_read_agi+0x7a/0x10d
[<c01de59a>] xfs_dialloc+0x125/0x9f2
[<c020715d>] _pagebuf_find+0x53/0x1af
[<c01ee6b0>] xlog_grant_log_space+0x113/0x33c
[<c01e519f>] xfs_ialloc+0x62/0x437
[<c01fbfe7>] xfs_dir_ialloc+0x82/0x26e
[<c01f95c0>] xfs_trans_reserve+0x7d/0x199
[<c0200e6e>] xfs_create+0x279/0x6a0
[<c01abcde>] xfs_acl_vhasacl_default+0x36/0x42
[<c020af73>] linvfs_mknod+0x304/0x399
[<c01cf04f>] xfs_dir2_lookup+0xfb/0xfd
[<c020b5bb>] linvfs_setattr+0xfa/0x146
[<c020b475>] linvfs_permission+0x0/0x13
[<c020b484>] linvfs_permission+0xf/0x13
[<c014d98e>] vfs_create+0x8d/0xf2
[<c014df37>] open_namei+0x355/0x3a4
[<c0141b8a>] filp_open+0x2d/0x4e
[<c0141f2d>] sys_open+0x4d/0x78
[<c0103b4d>] sysenter_past_esp+0x52/0x71
.........
(the same trace repeats many times, ending the same way)
.........
(in some places a different trace appears, as below)
[<c01af5b5>] xfs_alloc_read_agf+0xd7/0x1d1
[<c01af1ef>] xfs_alloc_fix_freelist+0x37c/0x3fe
[<c01af1ef>] xfs_alloc_fix_freelist+0x37c/0x3fe
[<c01af1ef>] xfs_alloc_fix_freelist+0x37c/0x3fe
[<c01138ae>] recalc_task_prio+0x8f/0x183
[<c01139fe>] activate_task+0x5c/0x6f
[<c01af7a5>] xfs_alloc_vextent+0xf6/0x37f
[<c01ddd5a>] xfs_ialloc_ag_alloc+0x147/0x5d5
[<c020743d>] pagebuf_get+0x159/0x181
[<c01faf58>] xfs_trans_read_buf+0x243/0x312
[<c01df92e>] xfs_ialloc_read_agi+0x7a/0x10d
[<c01de59a>] xfs_dialloc+0x125/0x9f2
[<c029f29c>] ip_rcv_finish+0x0/0x230
[<c02921cc>] nf_hook_slow+0xbb/0x105
[<c029f29c>] ip_rcv_finish+0x0/0x230
[<c029f085>] ip_rcv+0x39d/0x43c
[<c01ee6b0>] xlog_grant_log_space+0x113/0x33c
[<c01e519f>] xfs_ialloc+0x62/0x437
[<c01fbfe7>] xfs_dir_ialloc+0x82/0x26e
[<c01f95c0>] xfs_trans_reserve+0x7d/0x199
[<c0200e6e>] xfs_create+0x279/0x6a0
[<c01abcde>] xfs_acl_vhasacl_default+0x36/0x42
[<c020af73>] linvfs_mknod+0x304/0x399
[<c012aa70>] file_read_actor+0x0/0xca
[<c01cf04f>] xfs_dir2_lookup+0xfb/0xfd
[<c020b475>] linvfs_permission+0x0/0x13
[<c020b484>] linvfs_permission+0xf/0x13
[<c014d98e>] vfs_create+0x8d/0xf2
[<c014df37>] open_namei+0x355/0x3a4
[<c0141b8a>] filp_open+0x2d/0x4e
[<c0141f2d>] sys_open+0x4d/0x78
[<c0103b4d>] sysenter_past_esp+0x52/0x71
................
After a reboot, the kernel and lilo appeared to be fine, but when the root filesystem was being mounted this error was shown:

XFS mounting filesystem hda1
Starting XFS recovery on filesystem: hda1 (dev: hda1)
[<c01b2d19>] xfs_alloc_read_agf+0xd7/0x1d1
[<c01b29be>] xfs_alloc_fix_freelist+0x3e7/0x3fe
[<c01b29be>] xfs_alloc_fix_freelist+0x3e7/0x3fe
[<c01b29be>] xfs_alloc_fix_freelist+0x3e7/0x3fe
[<c012d940>] buffered_rmqueue+0xc6/0x151
[<c012dc84>] __alloc_pages+0x2b9/0x2f5
[<c01f1e14>] xlog_grant_log_space+0x113/0x33c
[<c01b321b>] xfs_free_extent+0x89/0xd4
[<c0131345>] cache_alloc_refill+0x130/0x1c8
[<c01f6b26>] xlog_recover_process_efi+0x167/0x1b6
[<c01f6bc6>] xlog_recover_process_efis+0x51/0x53
[<c01f7ff0>] xlog_recover_finish+0x1d/0xad
[<c01f003d>] xfs_log_mount_finish+0x17/0x18
[<c01f9700>] xfs_mountfs+0x818/0xea4
[<c01f893a>] xfs_xlatesb+0x43/0x1d7
[<c020b968>] xfs_setsize_buftarg+0x33/0x6b
[<c020052f>] xfs_mount+0x2ce/0x53d
[<c0210f6e>] vfs_mount+0x22/0x2a
[<c0210ddc>] linvfs_fill_super+0x7e/0x1c9
[<c021d58f>] snprintf+0x1f/0x27
[<c016cbec>] disk_name+0x5c/0xa5
[<c0147aeb>] get_sb_bdev+0xf9/0x124
[<c0210f42>] linvfs_get_sb+0x1b/0x25
[<c0210d5e>] linvfs_fill_super+0x0/0x1c9
[<c0147ce4>] do_kern_mount+0x7a/0xeb
[<c0158693>] do_add_mount+0x68/0x14a
[<c0158975>] do_mount+0x14f/0x194
[<c021e0ca>] __copy_from_user_ll+0x54/0x58
[<c021e147>] copy_from_user+0x34/0x61
[<c01587ce>] copy_mount_options+0x59/0xb1
[<c0158ca3>] sys_mount+0x7a/0xb7
[<c03c0c4e>] do_mount_root+0x27/0x98
[<c03c0d08>] mount_block_root+0x49/0xf4
[<c0100399>] init+0x0/0xf3
[<c03c0ed3>] mount_devfs+0x2f/0x33
[<c03c0dfb>] prepare_namespace+0x22/0xcb
[<c0100399>] init+0x0/0xf3
[<c0100399>] init+0x0/0xf3
[<c0100399>] init+0x0/0xf3
[<c0100487>] init+0xee/0xf3
[<c0102244>] kernel_thread_helper+0x0/0xb
[<c0102249>] kernel_thread_helper+0x5/0xb
Ending XFS recovery on filesystem: hda1 (dev: hda1)
VFS: Mounted root (xfs filesystem) readonly.
Mounted devfs on /dev
Freeing unused kernel memory: 160k freed
[<c01b2d19>] xfs_alloc_read_agf+0xd7/0x1d1
[<c01b2b45>] xfs_alloc_pagf_init+0x1f/0x3e
[<c01b2b45>] xfs_alloc_pagf_init+0x1f/0x3e
[<c01b2b45>] xfs_alloc_pagf_init+0x1f/0x3e
[<c01e1a76>] xfs_ialloc_ag_select+0x12a/0x28d
[<c01e2596>] xfs_dialloc+0x9bd/0x9f2
[<c012a4fe>] find_or_create_page+0x1c/0x9f
[<c012a205>] wake_up_page+0xe/0x2e
[<c020a704>] _pagebuf_lookup_pages+0x1fe/0x2d9
[<c01c323d>] xfs_bmap_search_extents+0x5c/0x71
[<c020a92b>] _pagebuf_find+0xbd/0x1af
[<c01f1e14>] xlog_grant_log_space+0x113/0x33c
[<c01e8903>] xfs_ialloc+0x62/0x437
[<c01ff74b>] xfs_dir_ialloc+0x82/0x26e
[<c01fcd24>] xfs_trans_reserve+0x7d/0x199
[<c02045d2>] xfs_create+0x279/0x6a0
[<c01af442>] xfs_acl_vhasacl_default+0x36/0x42
[<c020e6d7>] linvfs_mknod+0x304/0x399
[<c01d74b3>] xfs_dir2_leaf_lookup+0x2b/0xbd
[<c01d30b0>] xfs_dir2_isleaf+0x20/0x60
[<c01d279d>] xfs_dir2_lookup+0xe5/0xfd
[<c0104510>] common_interrupt+0x18/0x20
[<c012b2c6>] filemap_nopage+0x1c8/0x2f4
[<c020ebd9>] linvfs_permission+0x0/0x13
[<c020ebe8>] linvfs_permission+0xf/0x13
[<c014d99a>] vfs_create+0x8d/0xf2
[<c014df43>] open_namei+0x355/0x3a4
[<c0141b22>] filp_open+0x2d/0x4e
[<c0141eb5>] sys_open+0x4d/0x78
[<c0103b51>] sysenter_past_esp+0x52/0x71

I then booted this machine from a bootable CD distribution to recover from this situation (that distribution has kernel *2.4.26*). I tried to repair the filesystem with xfs_repair, but the tool stopped in pass 2 and told me to "mount and umount to replay the log, or use -L to zero the log" (something like that). When I tried to mount it (mount /dev/hda1 /mnt/restore), this _other version_ of the kernel panicked with this message:
.............
SGI XFS with realtime, no debug enabled
SGI XFS Quota Management subsystem
XFS mounting filesystem ide0(3,1)
Starting XFS recovery on filesystem: ide0(3,1) (dev: ide0(3,1))
0x0: 58 41 47 46 00 00 00 01 00 00 00 0d 00 09 51 23
Filesystem "ide0(3,1)": XFS internal error xfs_alloc_read_agf at line 2201 of file xfs_alloc.c.
Caller 0xf8ba94c4
ef01fb98 f8bd3fc8 00000001 00000000 00000000 f8bd40bd f8c10584 00000001
ef753000 f8c1052e 00000899 f8ba94c4 ef753000 f8ba9c4f f8c10584 00000001
ef753000 eef31200 f8c1052e 00000899 f8ba94c4 ef753000 ef0dfc40 ef0dfc40
Call Trace:
[<f8bd3fc8>] [<f8bd40bd>] [<f8c10584>] [<f8c1052e>] [<f8ba94c4>] [<f8ba9c4f>]
[<f8c10584>] [<f8c1052e>] [<f8ba94c4>] [<f8ba94c4>] [<c013630e>] [<c013679a>]
[<c0133bbc>] [<f8baa0b2>] [<f8be9b7b>] [<f8be9bf2>] [<f8beafa8>] [<f8be3580>]
[<f8becb7e>] [<f8c1b578>] [<f8bdfc2e>] [<f8bf3b46>] [<f8c035ad>] [<f8c0329e>]
[<c0142986>] [<c014336c>] [<f8c1be8c>] [<f8c1be8c>] [<c0155ba6>] [<c014355c>]
[<f8c1be8c>] [<c0156bd6>] [<c0156e5a>] [<c0156cd4>] [<c015722b>] [<c0108997>]
0x0: 58 41 47 46 00 00 00 01 00 00 00 0d 00 09 51 23
Filesystem "ide0(3,1)": XFS internal error xfs_alloc_read_agf at line 2201 of file xfs_alloc.c.
Caller 0xf8ba94c4
ef01fa88 f8bd3fc8 00000001 00000000 00000000 f8bd40bd f8c10584 00000001
ef753000 f8c1052e 00000899 f8ba94c4 ef753000 f8ba9c4f f8c10584 00000001
ef753000 eef31200 f8c1052e 00000899 f8ba94c4 ef753000 ef0df798 ef0df798
Call Trace:
[<f8bd3fc8>] [<f8bd40bd>] [<f8c10584>] [<f8c1052e>] [<f8ba94c4>] [<f8ba9c4f>]
[<f8c10584>] [<f8c1052e>] [<f8ba94c4>] [<f8ba94c4>] [<f8baa0b2>] [<f8bb9aee>]
[<f8bdcf07>] [<f8bf726b>] [<f8c1be00>] [<f8c03eda>] [<f8c02bf8>] [<c0153712>]
[<c01542c6>] [<f8be9f0d>] [<f8beafc7>] [<f8be3580>] [<f8becb7e>] [<f8c1b578>]
[<f8bdfc2e>] [<f8bf3b46>] [<f8c035ad>] [<f8c0329e>] [<c0142986>] [<c014336c>]
[<f8c1be8c>] [<f8c1be8c>] [<c0155ba6>] [<c014355c>] [<f8c1be8c>] [<c0156bd6>]
[<c0156e5a>] [<c0156cd4>] [<c015722b>] [<c0108997>]
xfs_force_shutdown(ide0(3,1),0x8) called from line 4049 of file xfs_bmap.c. Return address = 0xf8c037f1
Filesystem "ide0(3,1)": Corruption of in-memory data detected. Shutting down filesystem: ide0(3,1)
Please umount the filesystem, and rectify the problem(s)
Ending XFS recovery on filesystem: ide0(3,1) (dev: ide0(3,1))
................

After that I unmounted it and ran rmmod on the xfs module, but these messages appeared:
...........
kmem_cache_destroy: Can't free all objects eeff4a28
kmem_cache_destroy: Can't free all objects eeff4934
............

Then, when I tried to modprobe xfs again, this occurred:
............
SGI XFS with realtime, no debug enabled
kernel BUG at slab.c:815!
invalid operand: 0000
CPU: 0
EIP: 0010:[<c01333db>] Not tainted
EFLAGS: 00010246
eax: 00000000 ebx: eeff4eec ecx: eeff4f58 edx: eeff4a94
esi: eeff4a8d edi: f8c14474 ebp: c0352e10 esp: ee55de84
ds: 0018 es: 0018 ss: 0018
Process modprobe (pid: 3052, stackpage=ee55d000)
Stack: 00000000 00000000 ef0b2ea4 ffffffea eeff4f0c ee55dea0 00000004 00000064
       f8c07219 f8c14467 00000104 00000010 00000000 00000000 00000000 f8bf3420
       00000104 f8c14467 00000094 f8c1445a 00000010 f8c14450 00000150 f8c14443
Call Trace:
[<f8c07219>] [<f8c14467>] [<f8bf3420>] [<f8c14467>] [<f8c1445a>] [<f8c14450>]
[<f8c14443>] [<f8c033f8>] [<c01367e8>] [<c0136809>] [<c011c89d>] [<f8ba5060>]
[<c0108997>]
Code: 0f 0b 2f 03 a0 46 27 c0 8b 12 81 fa 4c ac 2b c0 75 d3 a1 4c
..................

I saw the warning "Corruption of in-memory data detected.", so I switched to another machine to test (the other Pentium 4 with the same configuration), and the same problems occurred. After running memtest86+ for 2 days, nothing was reported (no memory errors). This machine also does backups, producing a lot of bz2 files, and none of them appears to be corrupted.

After trying to repair, I dumped 128MB of this broken filesystem into an image with:

dd if=/dev/hda1 of=xfs_bug.img bs=1024k count=100

I don't know if I did the right thing, nor if it is useful, but I can upload the image; just send me an email to request it (64 MBytes bzipped).

So, to repair, I ran xfs_repair -L /dev/hda1, and this fixed the problem. Nothing in the filesystem appears corrupted after the repair. (I ran rsync again with -b --backup-dir=/tmp/ to see the differences, and nothing looks wrong.)

Steps to reproduce:
I'm very sorry, but I can't reproduce it on demand. After a lot of overwriting with rsync, however, the XFS filesystem ends up in a "stable" broken state, where I can't mount it, or repair it without zeroing the log.
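As an aside on the "0x0:" line in the 2.4.26 output above: those 16 bytes are the start of the on-disk AGF (allocation group free space) header that xfs_alloc_read_agf was reading. Below is a minimal Python sketch that decodes them, assuming the usual on-disk layout of four big-endian 32-bit fields (magic, version, AG sequence number, AG length in filesystem blocks). The function name is made up for illustration.

```python
import struct

def decode_agf_prefix(hex_dump: str) -> dict:
    """Decode the first 16 bytes of an XFS AGF header from a hex dump string.

    Assumed layout (big-endian): magic (4 bytes), version (u32),
    sequence number (u32), length in filesystem blocks (u32).
    """
    raw = bytes.fromhex(hex_dump.replace(" ", ""))
    if len(raw) < 16:
        raise ValueError("need at least 16 bytes of AGF header")
    magic, version, seqno, length = struct.unpack(">4sIII", raw[:16])
    return {
        "magic": magic.decode("ascii", errors="replace"),
        "version": version,
        "seqno": seqno,
        "length_blocks": length,
    }

# The bytes printed by the kernel in the report above.
fields = decode_agf_prefix("58 41 47 46 00 00 00 01 00 00 00 0d 00 09 51 23")
print(fields)
```

The first four fields decode to plausible values (magic "XAGF", version 1, AG number 13), so whatever the kernel objected to was presumably in a field beyond these 16 bytes. That last part is only a guess from the dump.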
Just to knock out a couple things...

> After that I unmounted it and ran rmmod on the xfs module, but these messages appeared:
> ...........
> kmem_cache_destroy: Can't free all objects eeff4a28
> kmem_cache_destroy: Can't free all objects eeff4934

looks like the forced shutdown path leaks some zone allocations; hard to know for sure which zones from this info. If it happens again, check /proc/slabinfo to see which xfs zones are still there.

After that,

> Then, when I tried to modprobe xfs again, this occurred:
> ............
> SGI XFS with realtime, no debug enabled
> kernel BUG at slab.c:815!

this is expected, since the zones did not get cleaned up before. A reboot will take care of that problem.

The original traces - were those oopses? Looks like you edited them a bit too much.

Thanks for trying to save out the first part of the device; we'll need to look and see if by chance that'll be useful.

In-memory corruption does not necessarily mean bad memory; it just means that some in-memory variable was not as expected. This could be due to any number of reasons.
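The /proc/slabinfo check suggested above could be scripted roughly like this. It is only a sketch: it assumes that XFS cache names contain "xfs" and that the second column is the active object count, as in the usual slabinfo layout. The function name and the sample text are invented for illustration and are not output from the affected machine.

```python
def leftover_xfs_caches(slabinfo_text: str) -> list:
    """Return (cache_name, active_objects) for every slab cache whose name
    contains "xfs" in the given /proc/slabinfo text."""
    leftovers = []
    for line in slabinfo_text.splitlines():
        parts = line.split()
        # Header and comment lines have no numeric second column, so they are skipped.
        if len(parts) >= 2 and "xfs" in parts[0] and parts[1].isdigit():
            leftovers.append((parts[0], int(parts[1])))
    return leftovers

# Hypothetical sample in the style of /proc/slabinfo.
sample = """slabinfo - version: 1.1
kmem_cache            80     80    244    5    5    1
xfs_inode             12    144    408    2   16    1
xfs_buf_item           0      0    148    0    0    1
dentry_cache        3000   3300    128  110  110    1
"""

result = leftover_xfs_caches(sample)
print(result)
```

On a live system the text would come from reading /proc/slabinfo (which may need root) after the rmmod; any xfs entry still listed at that point is a cache that was not destroyed.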
Hello, I'm sorry about the time it took to respond, but I was solving other Linux problems in another city.

Well, the problem occurred again. I tried to identify what happened, but the only thing I got was the last messages:

...
xfs_alloc_read_agf+0xd7/0x1d1
.
...
sysenter_past_esp+0x52/0x71

This part repeats "forever", with the same sequence and content, every time a program tries to access the filesystem. I forgot to mention it, but the kernel version is now 2.6.7.

The image can be downloaded from http://www.ambientebrasil.com.br/download/xfs_bug.img.bz2 (*64M* bytes of bzip2 image).

I don't know much about the kernel (I'm no guru, of course), but I think the XFS messages come from a small bug in the read functions when they try to read an already damaged filesystem, with the damage caused by some bad write function. The write mistake is not reported, only the failing read access, and the system does not really freeze; it is only put into read-only mode.

Would it be a good idea to enable XFS debugging when compiling the kernel? Would that help? What can I do to capture as many messages as possible? I can spend more time debugging if the "crash" happens again.
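Before digging into the downloaded image, it may be worth confirming that the decompressed file really starts with an XFS superblock. Below is a rough Python sketch, assuming the dump begins at the start of the partition, as the dd command given earlier suggests. It only reads the superblock magic ("XFSB") and the block size, both stored big-endian at the very start; the helper names are invented here.

```python
import struct

def parse_sb_prefix(data: bytes) -> dict:
    """Decode the first 8 bytes of an XFS superblock: magic and block size."""
    if len(data) < 8:
        raise ValueError("need at least 8 bytes")
    magic, blocksize = struct.unpack(">4sI", data[:8])
    return {"is_xfs": magic == b"XFSB", "blocksize": blocksize}

def check_image(path: str) -> dict:
    """Read the start of an image file, e.g. the decompressed xfs_bug.img."""
    with open(path, "rb") as f:
        return parse_sb_prefix(f.read(8))

# Synthetic 8-byte prefix for demonstration only (not taken from the real image).
demo = parse_sb_prefix(b"XFSB" + struct.pack(">I", 4096))
print(demo)
```

Calling check_image("xfs_bug.img") on the decompressed download would show whether the magic matches. It says nothing about whether the damaged metadata itself falls inside the dumped region.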
>> Then, when I tried to modprobe xfs again, this occurred:
>> ............
>> SGI XFS with realtime, no debug enabled
>> kernel BUG at slab.c:815!
> this is expected, since the zones did not get cleaned up before. A reboot
> will take care of that problem.
> The original traces - were those oopses? Looks like you edited them a bit
> too much.

Sorry about the lack of information, but this other "BUG" happened with kernel 2.4.26, because I used my Debian rescue disk, which has this version, and I think this rescue kernel does not have kernel debugging enabled.
Is this issue still present in kernel 2.6.19?
Please reopen this bug if it's still present with kernel 2.6.20.