Latest working kernel version: Unknown
Earliest failing kernel version: 2.6.26-rc4-00168-gc3b25b3
Distribution: Debian sid
Hardware Environment: Dell Inspiron 6400, CPU Intel Core Duo T2300 @ 1.66GHz, 1G RAM, HDD 60G 5400 rpm Seagate ST96812as, ICH7 family chipset, 945GM graphics
Software Environment:
Linux 2.6.26-rc4-00168-gc3b25b3
Distribution: Debian sid (unstable)
Kernel built with gcc 4.2
xfsprogs: latest from Debian (2.9.8-1)

Filesystem information:
2 partitions: / and /var, both XFS
Filesystem was created with lazy-count enabled

Problem Description:
I have run out of space on /, which uses an XFS filesystem. Now the filesystem is corrupt and unmountable.

I was running the following commands in a gnome terminal:
dd if=/dev/zero of=xt bs=100M count=9&
dd if=/dev/zero of=yt bs=100M count=9&
rm xt; rm yt

I ran out of space on / (got a message from gnome that it was more than 99% full).
After that point I couldn't run any commands (not even cat and df); the shell said no such command. I guess the root filesystem got unmounted automatically.
I switched to the console, but it was garbled, and switching back to X was impossible too.

I rebooted, and then got a failure when mounting the / filesystem (see below).
This is with a 2.6.26-rc4 kernel (and since I cannot boot I can't try building a newer kernel).
I also tried booting a 2.6.25-2 distro kernel, and I got a similar error message.

I've got Fedora 8 installed on another partition, and I could attempt recovery from there.
Before doing that, is there any information you would need to analyze this problem?

Running 'xfs_repair /dev/sda6' from Fedora shows:
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

So should I go ahead and destroy the log, or is there anything in there that you need to diagnose the problem?

XFS: correcting sb_features alignment problem
XFS mounting filesystem sda6
Starting XFS recovery on filesystem: sda6 (logdev: internal)
00000000: 58 41 47 46 00 00 00 01 00 00 00 04 00 02 54 29  XAGF..........T
Filesystem "sda6": XFS internal error xfs_alloc_read_agf at line 2194 of file /var/local/src/linux-2.6/fs/xfs/xfs_alloc.c.  Caller 0xc023b75a
Pid: 2273, comm: mount Not tainted 2.6.26-rc4-00168-gc3b25b3 #26
 [<c02615c7>] xfs_error_report+0x4e/0x50
 [<c023b75a>] ? xfs_alloc_pagf_init+0x1e/0x3b
 [<c026160c>] xfs_corruption_error+0x43/0x4b
 [<c023b75a>] ? xfs_alloc_pagf_init+0x1e/0x3b
 [<c023b63c>] xfs_alloc_read_agf+0xbb/0x1bb
 [<c023b75a>] ? xfs_alloc_pagf_init+0x1e/0x3b
 [<c023b75a>] xfs_alloc_pagf_init+0x1e/0x3b
 [<c027bf88>] xfs_initialize_perag_data+0xce/0x16a
 [<c027c52a>] xfs_mountfs+0x487/0x69c
 [<c02b08a2>] ? _atomic_dec_and_lock+0x46/0x64
 [<c0288f3c>] ? kmem_zalloc+0xc/0x30
 [<c027d144>] ? xfs_mru_cache_create+0xdc/0x107
 [<c0283287>] xfs_mount+0x2f9/0x342
 [<c029252a>] xfs_fs_fill_super+0xa8/0x1eb
 [<c0283287>] xfs_mount+0x2f9/0x342
 [<c029252a>] xfs_fs_fill_super+0xa9/0x1eb
 [<c017f7f6>] get_sb_bdev+0xea/0x114
 [<c02b139f>] ? idr_pre_get+0x1a/0x44
 [<c0291382>] xfs_fs_get_sb+0x21/0x27
 [<c0292482>] ? xfs_fs_fill_super+0x0/0x1eb
 [<c017f465>] vfs_kern_mount+0x59/0x117
 [<c017f56d>] do_kern_mount+0x33/0xbd
 [<c0194446>] do_new_mount+0x59/0x77
 [<c0195238>] do_mount+0x1ce/0x1e4
 [<c0427bfa>] ? error_code+0x72/0x78
 [<c015007b>] ? acct_file_reopen+0x2/0xf8
 [<c042b7a4>] ? iret_exc+0x418/0x980
 [<c01952ce>] sys_mount+0x80/0xb2
 [<c0103c72>] syscall_call+0x7/0xb
 [<c0420000>] ? detect_ht+0x7e/0x13b
mount: Structure needs cleaning
Begin: Running /scripts/local-bottom ...
Done.
Done.
Begin: Running /scripts/init-bottom ...
Done.
mount: No such file or directory
mount: No such file or directory
Target filesystem doesn't have /sbin/init.
No init found. Try passing init= bootarg.

BusyBox v1.9.2 (Debian 1:1.9.2-3) built-in shell (ash)
Enter 'help' for a list of built-in commands.

/bin/sh: can't access tty; job control turned off
(initramfs)

The error message from the 2.6.25-2 distro kernel is similar:
Filesystem "sda6": XFS internal error xfs_alloc_read_agf at line 2195 of file fs/xfs/xfs_alloc.c.  Caller 0xf8a01801
 [<...>] xfs_alloc_read_agf+0x129/0x1a6 [xfs]
 [<...>] xfs_alloc_pagf_init+0x15/0x31 [xfs]
 [<...>] xfs_alloc_pagf_init+0x15/0x31 [xfs]
 [<...>] xfs_alloc_pagf_init+0x15/0x31 [xfs]
 [<...>] xfs_ialloc_pagi_init+0x2d/0x33 [xfs]
 [<...>] xfs_initialize_perag_data+0x69/0x140 [xfs]
 [<...>] xfs_mountfs+0x34a/0x5e3 [xfs]
 [<...>] kmem_alloc+0x53/0xa8 [xfs]
 [<...>] default_wake_function+0x0/0x8
.......

Steps to reproduce:
[These are the steps I took when the problem happened; since the system doesn't boot I can't check whether the same sequence reproduces it.]
Have a / filesystem with XFS, with ~800M free space.
Run these commands:
dd if=/dev/zero of=xt bs=100M count=9&
dd if=/dev/zero of=yt bs=100M count=9&
rm xt; rm yt
Wait till the filesystem is full.
Root filesystem is inaccessible/unmountable.

If you need additional information, please ask.
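The 16 bytes dumped just before the xfs_alloc_read_agf error in the report above look like the start of the AGF buffer that failed the check. A minimal Python sketch of decoding them, assuming the on-disk AGF begins with four big-endian 32-bit fields (magic number, version, sequence number, length), which is the order xfs_db prints them in. decode_agf_header is a hypothetical helper name, not part of any XFS tool:

```python
import struct

# First 16 bytes of the buffer, copied from the hex dump in the mount log.
DUMP = "58 41 47 46 00 00 00 01 00 00 00 04 00 02 54 29"


def decode_agf_header(hex_bytes):
    """Decode the leading AGF fields (assumed layout: four big-endian u32s)."""
    raw = bytes.fromhex(hex_bytes.replace(" ", ""))
    magic, versionnum, seqno, length = struct.unpack(">4sIII", raw[:16])
    return {
        "magicnum": magic.decode("ascii"),
        "versionnum": versionnum,
        "seqno": seqno,
        "length": length,
    }


header = decode_agf_header(DUMP)
print(header)
```

Under that assumption the dump decodes to AG number 4 with a length of 152617 blocks, so the header that tripped the check would belong to AG 4.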
Mount has encountered a corrupt AGF. You will have to destroy the log and run repair so this filesystem can be mounted again. Fortunately the panic occurred after the log was replayed so the damage should be minimised.

But before you do that could you do the following?

# xfs_db /dev/sda6
xfs_db> agf 0
xfs_db> print
magicnum = 0x58414746
versionnum = 1
...
xfs_db> agf 1
xfs_db> print
magicnum = 0x58414746
versionnum = 1
...

and keep doing that for each AG in the filesystem and post the output. There shouldn't be too many AGs (probably about 4).

And if possible run xfs_metadump on this filesystem.
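Typing "agf N" and "print" for every AG gets tedious, so the per-AG dump requested above could also be scripted. A rough Python sketch that only builds the command lines and does not run them, assuming xfs_db's -r (read-only) and -c (run one command) options. agf_dump_commands is a hypothetical helper, and the AG count of 4 is just the guess from the comment above, not a value read from the filesystem:

```python
import shlex


def agf_dump_commands(device, agcount):
    """Build one non-interactive xfs_db invocation per allocation group."""
    commands = []
    for agno in range(agcount):
        # -r opens the device read-only; each -c runs one xfs_db command.
        argv = ["xfs_db", "-r", "-c", f"agf {agno}", "-c", "print", device]
        commands.append(argv)
    return commands


# Print the commands for review instead of executing them.
for argv in agf_dump_commands("/dev/sda6", 4):
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Each list can be handed to subprocess.run() as root once the real AG count is known; the read-only flag is there so that collecting the output does not touch the damaged filesystem.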
(In reply to comment #1)
> Mount has encountered a corrupt AGF. You will have to destroy the log and run
> repair so this filesystem can be mounted again. Fortunately the panic occurred
> after the log was replayed so the damage should be minimised. But before you do
> that could you do the following?
>

There are 16 AGFs (0 through 15); here is the output:

# xfs_db /dev/sda6
xfs_db: cannot init perag data (117)
xfs_db> agf 0
xfs_db> print
magicnum = 0x58414746
versionnum = 1
seqno = 0
length = 152617
bnoroot = 21179
cntroot = 2977
bnolevel = 2
cntlevel = 2
flfirst = 13
fllast = 18
flcount = 6
freeblks = 1809
longest = 10
btreeblks = 4
xfs_db> agf 1
xfs_db> print
magicnum = 0x58414746
versionnum = 1
seqno = 1
length = 152617
bnoroot = 170
cntroot = 648
bnolevel = 1
cntlevel = 1
flfirst = 126
fllast = 1
flcount = 4
freeblks = 1360
longest = 10
btreeblks = 0
xfs_db> agf 2
xfs_db> print
magicnum = 0x58414746
versionnum = 1
seqno = 2
length = 152617
bnoroot = 4350
cntroot = 4490
bnolevel = 1
cntlevel = 1
flfirst = 108
fllast = 111
flcount = 4
freeblks = 1410
longest = 10
btreeblks = 0
xfs_db> agf 3
xfs_db> print
magicnum = 0x58414746
versionnum = 1
seqno = 3
length = 152617
bnoroot = 915
cntroot = 1677
bnolevel = 1
cntlevel = 1
flfirst = 118
fllast = 121
flcount = 4
freeblks = 1640
longest = 10
btreeblks = 0
xfs_db> agf 4
xfs_db> print
magicnum = 0x58414746
versionnum = 1
seqno = 4
length = 152617
bnoroot = 2582
cntroot = 3055
bnolevel = 2
cntlevel = 2
flfirst = 124
fllast = 1
flcount = 6
freeblks = 4294967292
longest = 11
btreeblks = 4
xfs_db> agf 5
xfs_db> print
magicnum = 0x58414746
versionnum = 1
seqno = 5
length = 152617
bnoroot = 169
cntroot = 180
bnolevel = 1
cntlevel = 1
flfirst = 85
fllast = 88
flcount = 4
freeblks = 860
longest = 10
btreeblks = 0
xfs_db> agf 6
xfs_db> print
magicnum = 0x58414746
versionnum = 1
seqno = 6
length = 152617
bnoroot = 315
cntroot = 1213
bnolevel = 1
cntlevel = 1
flfirst = 60
fllast = 63
flcount = 4
freeblks = 998
longest = 11
btreeblks = 0
xfs_db> agf 7
xfs_db> print
magicnum = 0x58414746
versionnum = 1
seqno = 7
length = 152617
bnoroot = 699
cntroot = 753
bnolevel = 1
cntlevel = 1
flfirst = 66
fllast = 69
flcount = 4
freeblks = 849
longest = 10
btreeblks = 0
xfs_db> agf 8
xfs_db> print
magicnum = 0x58414746
versionnum = 1
seqno = 8
length = 152617
bnoroot = 12543
cntroot = 12545
bnolevel = 2
cntlevel = 2
flfirst = 34
fllast = 39
flcount = 6
freeblks = 1696
longest = 12
btreeblks = 4
xfs_db> agf 9
xfs_db> print
magicnum = 0x58414746
versionnum = 1
seqno = 9
length = 152617
bnoroot = 18838
cntroot = 14324
bnolevel = 2
cntlevel = 2
flfirst = 124
fllast = 1
flcount = 6
freeblks = 4048
longest = 12
btreeblks = 4
xfs_db> agf 10
xfs_db> print
magicnum = 0x58414746
versionnum = 1
seqno = 10
length = 152617
bnoroot = 2102
cntroot = 3239
bnolevel = 1
cntlevel = 1
flfirst = 7
fllast = 10
flcount = 4
freeblks = 1384
longest = 10
btreeblks = 0
xfs_db> agf 11
xfs_db> print
magicnum = 0x58414746
versionnum = 1
seqno = 11
length = 152617
bnoroot = 465
cntroot = 1211
bnolevel = 1
cntlevel = 1
flfirst = 112
fllast = 115
flcount = 4
freeblks = 433
longest = 10
btreeblks = 0
xfs_db> agf 12
xfs_db> print
magicnum = 0x58414746
versionnum = 1
seqno = 12
length = 152617
bnoroot = 139
cntroot = 265
bnolevel = 2
cntlevel = 2
flfirst = 55
fllast = 60
flcount = 6
freeblks = 3203
longest = 10
btreeblks = 5
xfs_db> agf 13
xfs_db> print
magicnum = 0x58414746
versionnum = 1
seqno = 13
length = 152617
bnoroot = 29595
cntroot = 218
bnolevel = 2
cntlevel = 2
flfirst = 59
fllast = 64
flcount = 6
freeblks = 2018
longest = 10
btreeblks = 4
xfs_db> agf 14
xfs_db> print
magicnum = 0x58414746
versionnum = 1
seqno = 14
length = 152617
bnoroot = 2813
cntroot = 4492
bnolevel = 1
cntlevel = 1
flfirst = 89
fllast = 92
flcount = 4
freeblks = 499
longest = 10
btreeblks = 0
xfs_db> agf 15
xfs_db> print
magicnum = 0x58414746
versionnum = 1
seqno = 15
length = 152617
bnoroot = 76
cntroot = 145
bnolevel = 2
cntlevel = 2
flfirst = 40
fllast = 45
flcount = 6
freeblks = 2482
longest = 10
btreeblks = 4
xfs_db> agf 16
bad allocation group number 16

Here is the metadump of the filesystem (19M bzip2 compressed):
http://www.hotlinkfiles.com/files/1499548_on9su/metadump_1.bz2

I have run xfs_repair -L /dev/sda6, and mounted the filesystem.
Here is xfs_info:
meta-data=/dev/sda6              isize=256    agcount=16, agsize=152617 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=2441872, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=65536  blocks=0, rtextents=0

Is there any other information you'd need?
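With 16 AGFs to read through, a simple sanity check on the xfs_db output above can be scripted. A rough Python sketch that flags any AGF whose freeblks is larger than the AG's own length, assuming the "name = value" layout that xfs_db prints. parse_agf_dump and suspicious_agfs are hypothetical helper names, and SAMPLE is an abbreviated copy of two of the AGFs posted above:

```python
import re


def parse_agf_dump(text):
    """Group 'name = value' pairs into one dict per AGF, starting at magicnum."""
    agfs = []
    for name, value in re.findall(r"(\w+) = (\w+)", text):
        if name == "magicnum":
            agfs.append({})  # every AGF printout begins with its magic number
        if agfs:
            agfs[-1][name] = int(value, 0)  # base 0 accepts decimal and 0x hex
    return agfs


def suspicious_agfs(agfs):
    """Return seqno of every AGF that claims more free blocks than it contains."""
    return [agf["seqno"] for agf in agfs if agf["freeblks"] > agf["length"]]


# Abbreviated copy of the posted output for AGF 3 and AGF 4.
SAMPLE = """
xfs_db> agf 3
xfs_db> print
magicnum = 0x58414746
seqno = 3
length = 152617
freeblks = 1640
xfs_db> agf 4
xfs_db> print
magicnum = 0x58414746
seqno = 4
length = 152617
freeblks = 4294967292
"""

print(suspicious_agfs(parse_agf_dump(SAMPLE)))
```

The check is deliberately loose (free blocks cannot exceed the AG size) and says nothing about whether the remaining counters agree with the free space btrees.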
AGF 4 has a bogus free block count of 4294967292 (should be less than 152617). Thanks for the metadump, I'll pull it down and see what else I can dig out of it.
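For what it's worth, 4294967292 is exactly 2^32 - 4, which is the value a 32-bit unsigned counter shows after wrapping below zero by 4. A small Python sketch of that arithmetic follows. Reading the bogus count as an underflow is an assumption on my part, not something established in this thread, and as_signed32 is a hypothetical helper:

```python
def as_signed32(value):
    """Reinterpret an unsigned 32-bit value as two's-complement signed."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


freeblks = 4294967292  # value reported for AGF 4
agsize = 152617        # AG length reported by xfs_db and xfs_info

print(as_signed32(freeblks))   # the same bit pattern read as a signed number
print((0 - 4) % (1 << 32))     # a 32-bit unsigned counter decremented 4 below zero
print(freeblks < agsize)       # the sanity bound from the comment above
```

If that reading is right, the count would have been decremented by 4 more blocks than were accounted as free, but the arithmetic alone does not show where that happened.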
Closing as obsolete, please re-open if this is incorrect