Bug 2936 - XFS internal error xfs_alloc_read_agf
Summary: XFS internal error xfs_alloc_read_agf
Status: REJECTED INSUFFICIENT_DATA
Alias: None
Product: File System
Classification: Unclassified
Component: XFS
Hardware: i386 Linux
Importance: P2 normal
Assignee: XFS Guru
Reported: 2004-06-22 13:59 UTC by Leonardo Marques de Souza
Modified: 2007-02-17 12:00 UTC

See Also:
Kernel Version: 2.6.6
Tree: Mainline
Regression: ---


Description Leonardo Marques de Souza 2004-06-22 13:59:29 UTC
Distribution: Debian Sarge 
Hardware Environment: 
- CPU: Intel(R) Pentium(R) 4 CPU 2.40GHz stepping 09 
  CPU0: Intel P4/Xeon Extended MCE MSRs (12) available 
  Detected 2398.381 MHz processor. 
- hda: ST340014A, ATA DISK drive (Seagate 40G) 
  Using cfq io scheduler 
  hda: max request size: 1024KiB 
  hda: 78165360 sectors (40020 MB) w/2048KiB Cache, CHS=16383/255/63, UDMA(100) 
- SGI XFS with ACLs, security attributes, realtime, no debug enabled 
  SGI XFS Quota Management subsystem 
- VP_IDE: VIA vt8235 (rev 00) IDE UDMA133 controller on pci0000:00:11.1 
 
Software Environment:  
- xfs mount 2.12-7 
- xfs_repair 2.6.11-1 
- rsync 2.6.2-1 
- gcc 4:3.3.4-1 
- binutils 2.14.90.0.7-8 
- glibc-2.3.2.ds1-13 
 
Problem Description: 
 This system ran for a long time (2 months) without any problem as a terminal
server.
 When I need to update programs on this machine, I install the programs (deb
packages) on another machine, which has the same hardware and kernel but
1.5 GB of RAM, and afterwards I rsync all the files (/lib/, /usr/, /var/,
etc.) across. This process takes about 15 minutes.
 The problem occurred during this rsync, and the kernel printed a lot of
messages like this on the terminal:
 
....... 
[<c01af5b5>] xfs_alloc_read_agf+0xd7/0x1d1 
[<c01af1ef>] xfs_alloc_fix_freelist+0x37c/0x3fe 
[<c01af1ef>] xfs_alloc_fix_freelist+0x37c/0x3fe 
[<c01af1ef>] xfs_alloc_fix_freelist+0x37c/0x3fe 
[<c01af7a5>] xfs_alloc_vextent+0xf6/0x37f 
[<c01ddd5a>] xfs_ialloc_ag_alloc+0x147/0x5d5 
[<c020743d>] pagebuf_get+0x159/0x181 
[<c01faf58>] xfs_trans_read_buf+0x243/0x312 
[<c01df92e>] xfs_ialloc_read_agi+0x7a/0x10d 
[<c01de59a>] xfs_dialloc+0x125/0x9f2 
[<c020715d>] _pagebuf_find+0x53/0x1af 
[<c01ee6b0>] xlog_grant_log_space+0x113/0x33c 
[<c01e519f>] xfs_ialloc+0x62/0x437 
[<c01fbfe7>] xfs_dir_ialloc+0x82/0x26e 
[<c01f95c0>] xfs_trans_reserve+0x7d/0x199 
[<c0200e6e>] xfs_create+0x279/0x6a0 
[<c01abcde>] xfs_acl_vhasacl_default+0x36/0x42 
[<c020af73>] linvfs_mknod+0x304/0x399 
[<c01cf04f>] xfs_dir2_lookup+0xfb/0xfd 
[<c020b5bb>] linvfs_setattr+0xfa/0x146 
[<c020b475>] linvfs_permission+0x0/0x13 
[<c020b484>] linvfs_permission+0xf/0x13 
[<c014d98e>] vfs_create+0x8d/0xf2 
[<c014df37>] open_namei+0x355/0x3a4 
[<c0141b8a>] filp_open+0x2d/0x4e 
[<c0141f2d>] sys_open+0x4d/0x78 
[<c0103b4d>] sysenter_past_esp+0x52/0x71 
......... (the same trace occurs many times, ending the same way)
......... (in some places a different trace occurs, as below)
[<c01af5b5>] xfs_alloc_read_agf+0xd7/0x1d1 
[<c01af1ef>] xfs_alloc_fix_freelist+0x37c/0x3fe 
[<c01af1ef>] xfs_alloc_fix_freelist+0x37c/0x3fe 
[<c01af1ef>] xfs_alloc_fix_freelist+0x37c/0x3fe 
[<c01138ae>] recalc_task_prio+0x8f/0x183 
[<c01139fe>] activate_task+0x5c/0x6f 
[<c01af7a5>] xfs_alloc_vextent+0xf6/0x37f 
[<c01ddd5a>] xfs_ialloc_ag_alloc+0x147/0x5d5 
[<c020743d>] pagebuf_get+0x159/0x181 
[<c01faf58>] xfs_trans_read_buf+0x243/0x312 
[<c01df92e>] xfs_ialloc_read_agi+0x7a/0x10d 
[<c01de59a>] xfs_dialloc+0x125/0x9f2 
[<c029f29c>] ip_rcv_finish+0x0/0x230 
[<c02921cc>] nf_hook_slow+0xbb/0x105 
[<c029f29c>] ip_rcv_finish+0x0/0x230 
[<c029f085>] ip_rcv+0x39d/0x43c 
[<c01ee6b0>] xlog_grant_log_space+0x113/0x33c 
[<c01e519f>] xfs_ialloc+0x62/0x437 
[<c01fbfe7>] xfs_dir_ialloc+0x82/0x26e 
[<c01f95c0>] xfs_trans_reserve+0x7d/0x199 
[<c0200e6e>] xfs_create+0x279/0x6a0 
[<c01abcde>] xfs_acl_vhasacl_default+0x36/0x42 
[<c020af73>] linvfs_mknod+0x304/0x399 
[<c012aa70>] file_read_actor+0x0/0xca 
[<c01cf04f>] xfs_dir2_lookup+0xfb/0xfd 
[<c020b475>] linvfs_permission+0x0/0x13 
[<c020b484>] linvfs_permission+0xf/0x13 
[<c014d98e>] vfs_create+0x8d/0xf2 
[<c014df37>] open_namei+0x355/0x3a4 
[<c0141b8a>] filp_open+0x2d/0x4e 
[<c0141f2d>] sys_open+0x4d/0x78 
[<c0103b4d>] sysenter_past_esp+0x52/0x71 
................ 
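
For readers comparing the repeated traces above: each frame has the form
[<address>] symbol+0xoffset/0xsize. The following is only a minimal sketch
(not part of the original report) of how the frames could be parsed and the
repeated traces collapsed into counts. The function names are hypothetical,
and it assumes every trace starts at xfs_alloc_read_agf, as the ones shown
here do.

```python
import re
from collections import Counter

# One 2.6-style frame, e.g. "[<c01af5b5>] xfs_alloc_read_agf+0xd7/0x1d1"
FRAME_RE = re.compile(
    r"\[<([0-9a-f]{8})>\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def parse_frames(text):
    """Return (address, symbol, offset, size) for every frame found in text."""
    return [
        (int(addr, 16), sym, int(off, 16), int(size, 16))
        for addr, sym, off, size in FRAME_RE.findall(text)
    ]


def split_traces(text, top="xfs_alloc_read_agf"):
    """Group frames into traces, starting a new trace at each `top` frame.

    Assumption: every trace begins with the `top` symbol.
    """
    traces = []
    for _addr, sym, _off, _size in parse_frames(text):
        if sym == top or not traces:
            traces.append([])
        traces[-1].append(sym)
    return traces


def count_unique_traces(text):
    """Count how often each distinct sequence of symbols occurs."""
    return Counter(tuple(trace) for trace in split_traces(text))
```

Feeding a saved console log to count_unique_traces() would show how many
distinct call paths reach xfs_alloc_read_agf and how often each one repeats.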
 
 After a reboot, the kernel and lilo seemed fine, but when the root filesystem
was to be mounted, this error was shown:
XFS mounting filesystem hda1 
Starting XFS recovery on filesystem: hda1 (dev: hda1) 
[<c01b2d19>] xfs_alloc_read_agf+0xd7/0x1d1 
[<c01b29be>] xfs_alloc_fix_freelist+0x3e7/0x3fe 
[<c01b29be>] xfs_alloc_fix_freelist+0x3e7/0x3fe 
[<c01b29be>] xfs_alloc_fix_freelist+0x3e7/0x3fe 
[<c012d940>] buffered_rmqueue+0xc6/0x151 
[<c012dc84>] __alloc_pages+0x2b9/0x2f5 
[<c01f1e14>] xlog_grant_log_space+0x113/0x33c 
[<c01b321b>] xfs_free_extent+0x89/0xd4 
[<c0131345>] cache_alloc_refill+0x130/0x1c8 
[<c01f6b26>] xlog_recover_process_efi+0x167/0x1b6 
[<c01f6bc6>] xlog_recover_process_efis+0x51/0x53 
[<c01f7ff0>] xlog_recover_finish+0x1d/0xad 
[<c01f003d>] xfs_log_mount_finish+0x17/0x18 
[<c01f9700>] xfs_mountfs+0x818/0xea4 
[<c01f893a>] xfs_xlatesb+0x43/0x1d7 
[<c020b968>] xfs_setsize_buftarg+0x33/0x6b 
[<c020052f>] xfs_mount+0x2ce/0x53d 
[<c0210f6e>] vfs_mount+0x22/0x2a 
[<c0210ddc>] linvfs_fill_super+0x7e/0x1c9 
[<c021d58f>] snprintf+0x1f/0x27 
[<c016cbec>] disk_name+0x5c/0xa5 
[<c0147aeb>] get_sb_bdev+0xf9/0x124 
[<c0210f42>] linvfs_get_sb+0x1b/0x25 
[<c0210d5e>] linvfs_fill_super+0x0/0x1c9 
[<c0147ce4>] do_kern_mount+0x7a/0xeb 
[<c0158693>] do_add_mount+0x68/0x14a 
[<c0158975>] do_mount+0x14f/0x194 
[<c021e0ca>] __copy_from_user_ll+0x54/0x58 
[<c021e147>] copy_from_user+0x34/0x61 
[<c01587ce>] copy_mount_options+0x59/0xb1 
[<c0158ca3>] sys_mount+0x7a/0xb7 
[<c03c0c4e>] do_mount_root+0x27/0x98 
[<c03c0d08>] mount_block_root+0x49/0xf4 
[<c0100399>] init+0x0/0xf3 
[<c03c0ed3>] mount_devfs+0x2f/0x33 
[<c03c0dfb>] prepare_namespace+0x22/0xcb 
[<c0100399>] init+0x0/0xf3 
[<c0100399>] init+0x0/0xf3 
[<c0100399>] init+0x0/0xf3 
[<c0100487>] init+0xee/0xf3 
[<c0102244>] kernel_thread_helper+0x0/0xb 
[<c0102249>] kernel_thread_helper+0x5/0xb 
 
Ending XFS recovery on filesystem: hda1 (dev: hda1) 
VFS: Mounted root (xfs filesystem) readonly. 
Mounted devfs on /dev 
Freeing unused kernel memory: 160k freed 
[<c01b2d19>] xfs_alloc_read_agf+0xd7/0x1d1 
[<c01b2b45>] xfs_alloc_pagf_init+0x1f/0x3e 
[<c01b2b45>] xfs_alloc_pagf_init+0x1f/0x3e 
[<c01b2b45>] xfs_alloc_pagf_init+0x1f/0x3e 
[<c01e1a76>] xfs_ialloc_ag_select+0x12a/0x28d 
[<c01e2596>] xfs_dialloc+0x9bd/0x9f2 
[<c012a4fe>] find_or_create_page+0x1c/0x9f 
[<c012a205>] wake_up_page+0xe/0x2e 
[<c020a704>] _pagebuf_lookup_pages+0x1fe/0x2d9 
[<c01c323d>] xfs_bmap_search_extents+0x5c/0x71 
[<c020a92b>] _pagebuf_find+0xbd/0x1af 
[<c01f1e14>] xlog_grant_log_space+0x113/0x33c 
[<c01e8903>] xfs_ialloc+0x62/0x437 
[<c01ff74b>] xfs_dir_ialloc+0x82/0x26e 
[<c01fcd24>] xfs_trans_reserve+0x7d/0x199 
[<c02045d2>] xfs_create+0x279/0x6a0 
[<c01af442>] xfs_acl_vhasacl_default+0x36/0x42 
[<c020e6d7>] linvfs_mknod+0x304/0x399 
[<c01d74b3>] xfs_dir2_leaf_lookup+0x2b/0xbd 
[<c01d30b0>] xfs_dir2_isleaf+0x20/0x60 
[<c01d279d>] xfs_dir2_lookup+0xe5/0xfd 
[<c0104510>] common_interrupt+0x18/0x20 
[<c012b2c6>] filemap_nopage+0x1c8/0x2f4 
[<c020ebd9>] linvfs_permission+0x0/0x13 
[<c020ebe8>] linvfs_permission+0xf/0x13 
[<c014d99a>] vfs_create+0x8d/0xf2 
[<c014df43>] open_namei+0x355/0x3a4 
[<c0141b22>] filp_open+0x2d/0x4e 
[<c0141eb5>] sys_open+0x4d/0x78 
[<c0103b51>] sysenter_past_esp+0x52/0x71 
 
 Well, I booted this machine from a bootable CD distribution to recover from
this situation (that distribution has kernel *2.4.26*).
 
 I tried to repair with xfs_repair, but the tool stopped in pass 2 and told me
to "mount and umount to restore log or use -L to zero log" (something like
that).
 After I tried to mount (mount /dev/hda1 /mnt/restore), this _other version_ of
the kernel panicked with this message:
............. 
SGI XFS with realtime, no debug enabled 
SGI XFS Quota Management subsystem 
XFS mounting filesystem ide0(3,1) 
Starting XFS recovery on filesystem: ide0(3,1) (dev: ide0(3,1)) 
0x0: 58 41 47 46 00 00 00 01 00 00 00 0d 00 09 51 23 
Filesystem "ide0(3,1)": XFS internal error xfs_alloc_read_agf at line 2201 of 
file xfs_alloc.c.  Caller 0xf8ba94c4 
ef01fb98 f8bd3fc8 00000001 00000000 00000000 f8bd40bd f8c10584 00000001 
       ef753000 f8c1052e 00000899 f8ba94c4 ef753000 f8ba9c4f f8c10584 00000001 
       ef753000 eef31200 f8c1052e 00000899 f8ba94c4 ef753000 ef0dfc40 ef0dfc40 
Call Trace:    [<f8bd3fc8>] [<f8bd40bd>] [<f8c10584>] [<f8c1052e>] [<f8ba94c4>] 
  [<f8ba9c4f>] [<f8c10584>] [<f8c1052e>] [<f8ba94c4>] [<f8ba94c4>] [<c013630e>] 
  [<c013679a>] [<c0133bbc>] [<f8baa0b2>] [<f8be9b7b>] [<f8be9bf2>] [<f8beafa8>] 
  [<f8be3580>] [<f8becb7e>] [<f8c1b578>] [<f8bdfc2e>] [<f8bf3b46>] [<f8c035ad>] 
  [<f8c0329e>] [<c0142986>] [<c014336c>] [<f8c1be8c>] [<f8c1be8c>] [<c0155ba6>] 
  [<c014355c>] [<f8c1be8c>] [<c0156bd6>] [<c0156e5a>] [<c0156cd4>] [<c015722b>] 
  [<c0108997>] 
0x0: 58 41 47 46 00 00 00 01 00 00 00 0d 00 09 51 23 
Filesystem "ide0(3,1)": XFS internal error xfs_alloc_read_agf at line 2201 of 
file xfs_alloc.c.  Caller 0xf8ba94c4 
ef01fa88 f8bd3fc8 00000001 00000000 00000000 f8bd40bd f8c10584 00000001 
       ef753000 f8c1052e 00000899 f8ba94c4 ef753000 f8ba9c4f f8c10584 00000001 
       ef753000 eef31200 f8c1052e 00000899 f8ba94c4 ef753000 ef0df798 ef0df798 
Call Trace:    [<f8bd3fc8>] [<f8bd40bd>] [<f8c10584>] [<f8c1052e>] [<f8ba94c4>] 
  [<f8ba9c4f>] [<f8c10584>] [<f8c1052e>] [<f8ba94c4>] [<f8ba94c4>] [<f8baa0b2>] 
  [<f8bb9aee>] [<f8bdcf07>] [<f8bf726b>] [<f8c1be00>] [<f8c03eda>] [<f8c02bf8>] 
  [<c0153712>] [<c01542c6>] [<f8be9f0d>] [<f8beafc7>] [<f8be3580>] [<f8becb7e>] 
  [<f8c1b578>] [<f8bdfc2e>] [<f8bf3b46>] [<f8c035ad>] [<f8c0329e>] [<c0142986>] 
  [<c014336c>] [<f8c1be8c>] [<f8c1be8c>] [<c0155ba6>] [<c014355c>] [<f8c1be8c>] 
  [<c0156bd6>] [<c0156e5a>] [<c0156cd4>] [<c015722b>] [<c0108997>] 
xfs_force_shutdown(ide0(3,1),0x8) called from line 4049 of file xfs_bmap.c.  
Return address = 0xf8c037f1 
Filesystem "ide0(3,1)": Corruption of in-memory data detected.  Shutting down 
filesystem: ide0(3,1) 
Please umount the filesystem, and rectify the problem(s) 
Ending XFS recovery on filesystem: ide0(3,1) (dev: ide0(3,1)) 
................ 
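
The "0x0:" line in the log above is a dump of the first 16 bytes of the buffer
XFS was checking. As a minimal sketch (not part of the original report), and
assuming the usual on-disk AGF layout in which the first four fields are
big-endian 32-bit values (magic number, version, AG sequence number, AG length
in blocks), those bytes can be decoded like this. The function name is
hypothetical.

```python
import struct

XFS_AGF_MAGIC = 0x58414746  # ASCII "XAGF"


def decode_agf_prefix(hex_dump):
    """Decode the first four big-endian 32-bit fields of an AGF header.

    Assumption: the fields are magic, version, seqno and length, in that
    order, as in the on-disk AGF structure.
    """
    raw = bytes.fromhex(hex_dump)
    magic, version, seqno, length = struct.unpack(">IIII", raw[:16])
    return {
        "magic_ok": magic == XFS_AGF_MAGIC,
        "magic_ascii": raw[:4].decode("ascii"),
        "version": version,
        "seqno": seqno,
        "length_blocks": length,
    }


fields = decode_agf_prefix("58 41 47 46 00 00 00 01 00 00 00 0d 00 09 51 23")
print(fields)
```

Under that assumption, the dump shows the magic "XAGF", version 1, allocation
group 13 and a length of 610595 blocks, so the part of the header that was
printed looks well-formed. That suggests, without proving it, that the check
which failed concerns fields beyond the 16 bytes shown.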
 
After that I unmounted it and removed the xfs module with rmmod, but these
messages occurred:
........... 
kmem_cache_destroy: Can't free all objects eeff4a28 
kmem_cache_destroy: Can't free all objects eeff4934 
............ 
After that, when I tried to modprobe xfs again, this occurred:
............ 
SGI XFS with realtime, no debug enabled 
kernel BUG at slab.c:815! 
invalid operand: 0000 
CPU:    0 
EIP:    0010:[<c01333db>]    Not tainted 
EFLAGS: 00010246 
eax: 00000000   ebx: eeff4eec   ecx: eeff4f58   edx: eeff4a94 
esi: eeff4a8d   edi: f8c14474   ebp: c0352e10   esp: ee55de84 
ds: 0018   es: 0018   ss: 0018 
Process modprobe (pid: 3052, stackpage=ee55d000) 
Stack: 00000000 00000000 ef0b2ea4 ffffffea eeff4f0c ee55dea0 00000004 00000064 
       f8c07219 f8c14467 00000104 00000010 00000000 00000000 00000000 f8bf3420 
       00000104 f8c14467 00000094 f8c1445a 00000010 f8c14450 00000150 f8c14443 
Call Trace:    [<f8c07219>] [<f8c14467>] [<f8bf3420>] [<f8c14467>] [<f8c1445a>] 
  [<f8c14450>] [<f8c14443>] [<f8c033f8>] [<c01367e8>] [<c0136809>] [<c011c89d>] 
  [<f8ba5060>] [<c0108997>] 
 
Code: 0f 0b 2f 03 a0 46 27 c0 8b 12 81 fa 4c ac 2b c0 75 d3 a1 4c 
.................. 
 
I saw that the message
 "Corruption of in-memory data detected."
was reported, so I switched to another machine to test (the other Pentium 4
with the same configuration), and the same problems occurred. After running
memtest86+ for 2 days, nothing was reported (no memory errors).
 This machine also does backups, with a lot of bz2 files, and none of them
appears corrupted.
 
After trying to repair, I dumped 128MB of this broken file system to an image
with
 dd if=/dev/hda1 of=xfs_bug.img bs=1024k count=100
 I don't know if I did the right thing, or if it is useful, but I can upload
the image; just send me an email to request it (64 MB bzipped).
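
A note on the size of that dump, as a small worked calculation rather than a
correction of the report: in GNU dd a "k" suffix multiplies by 1024, so
bs=1024k is 1 MiB per block, and the command as written would copy
count x bs bytes. The helper names below are hypothetical.

```python
# Multipliers for the binary size suffixes used here (GNU dd: k = 1024).
SUFFIXES = {"k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a dd-style size such as '1024k' into a number of bytes."""
    if text and text[-1] in SUFFIXES:
        return int(text[:-1]) * SUFFIXES[text[-1]]
    return int(text)


def dd_total_bytes(bs, count):
    """Total bytes dd copies for a given bs= and count= (full blocks only)."""
    return parse_size(bs) * count


total = dd_total_bytes("1024k", 100)
print(total, "bytes =", total // (1024 ** 2), "MiB")
```

With bs=1024k and count=100 this gives 104857600 bytes, i.e. 100 MiB, so the
image would cover the first 100 MiB of /dev/hda1 rather than 128 MB.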
 
 So, to repair, I ran xfs_repair -L /dev/hda1, and this fixed the problem.
Nothing in the filesystem appeared corrupted after the repair.
 (I ran rsync again, with -b --backup-dir=/tmp/ to see differences, and nothing
showed up wrong.)
 
Steps to reproduce: 
 I'm very sorry, but I can't reproduce it at will. However, after a lot of
overwriting with rsync, the XFS filesystem ends up with a "stable" bug, where
I can't mount it, or repair it without zeroing the log.
Comment 1 Eric Sandeen 2004-06-23 09:05:03 UTC
Just to knock out a couple things...

> After that I unmounted it and removed the xfs module with rmmod, but these
> messages occurred:
> ........... 
> kmem_cache_destroy: Can't free all objects eeff4a28 
> kmem_cache_destroy: Can't free all objects eeff4934 

looks like the forced shutdown path leaks some zone allocations; hard to know
for sure which zones from this info.  If it happens again, check /proc/slabinfo
to see which xfs zones are still there.
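
One way to do that check is sketched below. This is an illustration only, not
something from the XFS tree: it assumes that the XFS zones appear in
/proc/slabinfo under names beginning with "xfs", and that the second and third
columns are the active and total object counts. The function names are
hypothetical.

```python
def xfs_slab_caches(slabinfo_text, prefix="xfs"):
    """Return {cache_name: (active_objs, num_objs)} for caches matching prefix.

    Assumption: data lines look like "name active_objs num_objs ...", and the
    header lines start with "slabinfo" or "#".
    """
    caches = {}
    for line in slabinfo_text.splitlines():
        if line.startswith("slabinfo") or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3 or not fields[0].startswith(prefix):
            continue
        caches[fields[0]] = (int(fields[1]), int(fields[2]))
    return caches


def read_live_xfs_caches(path="/proc/slabinfo"):
    """Read the running kernel's slabinfo (may require root)."""
    with open(path) as f:
        return xfs_slab_caches(f.read())
```

Calling read_live_xfs_caches() after the rmmod would list any xfs caches that
still exist, along with how many objects are still active in each.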

After that,

> After that, when I tried to modprobe xfs again, this occurred:
> ............ 
> SGI XFS with realtime, no debug enabled 
> kernel BUG at slab.c:815! 

this is expected, since the zones did not get cleaned up before.  A reboot
will take care of that problem.

The original traces - were those oopses?  Looks like you edited them a bit
too much.

Thanks for trying to save out the first part of the device; we'll need to look
and see if by chance that'll be useful.

in-memory corruption does not necessarily mean bad memory; it just means that
some in-memory variable was not as expected.  This could be due to any number
of reasons.
Comment 2 Leonardo Marques de Souza 2004-07-08 15:44:17 UTC
Hello, 
  I'm sorry about the slow response, but I was solving another Linux problem
in another city. 
 
 Well, the problem occurred again. I tried to identify what happened, but the
only thing I got is the last messages 
... xfs_alloc_read_agf+0xd7/0x1d1  
. 
... sysenter_past_esp+0x52/0x71  
This part repeats "forever", with the same sequence and content, every time a
program tries to access the filesystem. 
  I forgot to mention that the kernel version is now 2.6.7. 
 
the image can be downloaded from 
 
http://www.ambientebrasil.com.br/download/xfs_bug.img.bz2 
 
( *64M* bytes of bzip2 image) 
 
 I don't know much about the kernel (I am no guru, of course), but I think the
XFS messages come from a small bug in the "read functions" when trying to read
an already damaged filesystem, with the damage caused by some "bad write
function". The write mistake is not reported, only the bad read access, and the
system does not really freeze; it is only put into read-only mode. 
 
 Would it be a good choice to enable XFS debug in the kernel compilation? Would
it help? What can I do to trap as many messages as possible? I can spend more
time debugging if the "crash" happens again. 
 
Comment 3 Leonardo Marques de Souza 2004-07-09 06:51:32 UTC
>> After that, when I tried to modprobe xfs again, this occurred:  
>> ............  
>> SGI XFS with realtime, no debug enabled  
>> kernel BUG at slab.c:815!  
 
>this is expected, since the zones did not get cleaned up before.  A reboot 
>will take care of that problem. 
 
>The original traces - were those oopses?  Looks like you edited them a bit 
>too much. 
 
Sorry about the lack of information, but this other "BUG" happened with kernel
2.4.26, because I used my Debian rescue disk, which has this version, and I
think this rescue kernel does not have kernel debugging turned on. 
 
Comment 4 Adrian Bunk 2006-12-07 07:48:10 UTC
Is this issue still present in kernel 2.6.19?
Comment 5 Adrian Bunk 2007-02-17 12:00:12 UTC
Please reopen this bug if it's still present with kernel 2.6.20.
