Bug 10421
| Summary: | Filesystem failing without any warning or etc | | |
| --- | --- | --- | --- |
| Product: | File System | Reporter: | Denys Fedoryshchenko (nuclearcat) |
| Component: | XFS | Assignee: | XFS Guru (xfs-masters) |
| Status: | CLOSED CODE_FIX | | |
| Severity: | high | CC: | bunk, jeffrey.hundstad |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 2.6.25-rc8-git1 | Subsystem: | |
| Regression: | Yes | Bisected commit-id: | |
| Bug Depends on: | | | |
| Bug Blocks: | 9832 | | |
Description
Denys Fedoryshchenko 2008-04-07 23:11:22 UTC
What's the error messages and stack trace prior to the xfs_force_shutdown message? i.e. why did the filesystem shut down? Were there any I/O errors? If there is no message, please set /proc/sys/fs/xfs/error_level to 11 so the next time it occurs there is a stack trace and error message prior to the shutdown.

Also, can you post the output of 'xfs_info <mntpt>' for the three filesystems in question (this needs to be done while they are mounted and not shut down). The output of lspci would also be handy so we know what hardware you are actually using....

Cheers, Dave.

I will set error_level and will wait until tonight.

SUPERPROXY ~ # xfs_info /cache1
meta-data=/dev/sdc2   isize=256    agcount=16, agsize=869894 blks
         =            sectsz=512   attr=0
data     =            bsize=4096   blocks=13918304, imaxpct=25
         =            sunit=0      swidth=0 blks
naming   =version 2   bsize=4096
log      =internal    bsize=4096   blocks=32768, version=1
         =            sectsz=512   sunit=0 blks, lazy-count=0
realtime =none        extsz=65536  blocks=0, rtextents=0
SUPERPROXY ~ # xfs_info /cache2
meta-data=/dev/sdd1   isize=256    agcount=16, agsize=1120031 blks
         =            sectsz=512   attr=0
data     =            bsize=4096   blocks=17920496, imaxpct=25
         =            sunit=0      swidth=0 blks
naming   =version 2   bsize=4096
log      =internal    bsize=4096   blocks=32768, version=1
         =            sectsz=512   sunit=0 blks, lazy-count=0
realtime =none        extsz=65536  blocks=0, rtextents=0
SUPERPROXY ~ # xfs_info /cache3
meta-data=/dev/sde1   isize=256    agcount=16, agsize=1120031 blks
         =            sectsz=512   attr=0
data     =            bsize=4096   blocks=17920496, imaxpct=25
         =            sunit=0      swidth=0 blks
naming   =version 2   bsize=4096
log      =internal    bsize=4096   blocks=32768, version=1
         =            sectsz=512   sunit=0 blks, lazy-count=0
realtime =none        extsz=65536  blocks=0, rtextents=0
SUPERPROXY ~ # xfs_info /cache4
meta-data=/dev/sdf1   isize=256    agcount=16, agsize=1120031 blks
         =            sectsz=512   attr=0
data     =            bsize=4096   blocks=17920496, imaxpct=25
         =            sunit=0      swidth=0 blks
naming   =version 2   bsize=4096
log      =internal    bsize=4096   blocks=32768, version=1
         =            sectsz=512   sunit=0 blks, lazy-count=0
realtime =none        extsz=65536  blocks=0, rtextents=0
SUPERPROXY ~ # lspci
00:01.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 13)
00:01.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC (rev 01)
00:02.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 13)
00:02.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC (rev 01)
00:06.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8111 PCI (rev 07)
00:07.0 ISA bridge: Advanced Micro Devices [AMD] AMD-8111 LPC (rev 05)
00:07.1 IDE interface: Advanced Micro Devices [AMD] AMD-8111 IDE (rev 03)
00:07.2 SMBus: Advanced Micro Devices [AMD] AMD-8111 SMBus 2.0 (rev 02)
00:07.3 Bridge: Advanced Micro Devices [AMD] AMD-8111 ACPI (rev 05)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:01.0 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 03)
01:01.1 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 03)
01:02.0 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 03)
01:02.1 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 03)
02:03.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064 PCI-X Fusion-MPT SAS (rev 02)
03:00.0 USB Controller: Advanced Micro Devices [AMD] AMD-8111 USB (rev 0b)
03:00.1 USB Controller: Advanced Micro Devices [AMD] AMD-8111 USB (rev 0b)
03:03.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
04:01.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 13)
04:01.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC (rev 01)
04:02.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 13)
04:02.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC (rev 01)

Again the FS crashed, and no new information appeared in the log, even with debug set to 11:

[305409.607467] xfs_force_shutdown(sdd1,0x1) called from line 420 of file fs/xfs/xfs_rw.c. Return address = 0xf8b79dd2

That's all I have.

SUPERPROXY /proc/sys/fs/xfs # xfs_repair -v /dev/sdd1
Phase 1 - find and verify superblock...
        - block cache size set to 376368 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 60243 tail block 60243
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 134763204, moving to lost+found
Phase 7 - verify and correct link counts...

        XFS_REPAIR Summary    Tue Apr  8 23:53:00 2008

Phase           Start           End             Duration
Phase 1:        04/08 23:52:23  04/08 23:52:24  1 second
Phase 2:        04/08 23:52:24  04/08 23:52:31  7 seconds
Phase 3:        04/08 23:52:31  04/08 23:52:57  26 seconds
Phase 4:        04/08 23:52:57  04/08 23:52:57
Phase 5:        04/08 23:52:57  04/08 23:52:57
Phase 6:        04/08 23:52:57  04/08 23:52:57
Phase 7:        04/08 23:52:57  04/08 23:52:57

Total run time: 34 seconds
done

SUPERPROXY /proc/sys/fs/xfs # /bin/mount -o noatime,nodiratime,logbufs=8 -L CACHE2 /cache2
SUPERPROXY /cache2/lost+found # ls -la
drwxr-xr-x 2 root  root  22 Apr  8 23:52 .
drwxr-xr-x 4 squid root  35 Apr  6  2007 ..
-rw-r----- 1 squid squid  0 Apr  2 23:40 134763204
SUPERPROXY /proc/sys/fs/xfs # grep "" *
age_buffer_centisecs:1500
error_level:11
filestream_centisecs:3000
inherit_noatime:1
inherit_nodefrag:1
inherit_nodump:1
inherit_nosymlinks:0
inherit_sync:1
irix_sgid_inherit:0
irix_symlink_mode:0
panic_mask:0
restrict_chown:1
rotorstep:1
stats_clear:0
xfsbufd_centisecs:100
xfssyncd_centisecs:3000

> xfs_force_shutdown(sdd1,0x1) called from line 420 of file
> fs/xfs/xfs_rw.c. Return address = 0xf8b79dd2
(sdd1,0x1) means that the shutdown flags were 0x1:
#define SHUTDOWN_META_IO_ERROR 0x0001 /* write attempt to metadata failed */
There are a total of 5 places this can come from. From:
- xfs_trans_read_buf() we'd see I/O error alerts in the log prior
to the shutdown
- xfs_inactive() we'd see an error saying 'xfs_ifree returned an error'
- xfs_bwrite() won't log an extra error
- xfs_buf_error_relse() won't log an extra error
- xfs_bmap_finish() won't log an extra error
The three that don't log extra errors are the result of I/O errors or
object validation errors. Object validation errors typically log lots
more errors and stack dumps, though, so I'm tending towards a new
set of I/O errors coming from below XFS at this point....
What I really need to know is what function is at the return address
0xf8b79dd2. 'ksymoops -A 0xf8b79dd2' or something similar should give us
the symbol....
Also, can you run xfs_check on the broken filesystem before you run
xfs_repair so we can see all the errors on disk as a result of the shutdown.
SUPERPROXY /proc # cat /proc/kallsyms | grep f8b79d
f8b79ddd t xfs_write_sync_logforce      [xfs]
f8b79d37 t xfs_bwrite   [xfs]

So it looks like xfs_bwrite.

I got a little bit more last night:

Apr  9 03:50:53 194.146.153.28 kernel: [340245.809495] xfs_inotobp: xfs_imap() returned an error 22 on sde1.  Returning error.
Apr  9 03:50:53 194.146.153.28 kernel: [340245.809495] xfs_iunlink_remove: xfs_inotobp() returned an error 22 on sde1.  Returning error.
Apr  9 03:50:53 194.146.153.28 kernel: [340245.809495] xfs_inactive: xfs_ifree() returned an error = 22 on sde1
Apr  9 03:50:53 194.146.153.28 kernel: [340245.809495] xfs_force_shutdown(sde1,0x1) called from line 1737 of file fs/xfs/xfs_vnodeops.c. Return address = 0xf8b797d7
Apr  9 03:50:53 194.146.153.28 kernel: [340245.845721] Filesystem "sde1": I/O Error Detected. Shutting down filesystem: sde1
Apr  9 03:50:53 194.146.153.28 kernel: [340245.845721] Please umount the filesystem, and rectify the problem(s)

Then when I noticed that and unmounted:

Apr  9 08:54:32 194.146.153.28 kernel: [334261.479051] xfs_force_shutdown(sde1,0x1) called from line 420 of file fs/xfs/xfs_rw.c. Return address = 0xf8b79dd2

I wonder if turning on the debugging (all the #ifdef DEBUG) in xfs_dilocate would be useful to get more info about the apparent xfs_inotobp->xfs_imap->xfs_dilocate failure? (if it's possible for you to do this...)

During the last two days I have not had such crashes on either host. One (with only 2 XFS mounts) crashed 2-3 times and then stopped crashing. The other, with 4 mounted XFS filesystems, crashed more often, and is now not crashing either. I will wait longer; if it does not reappear within 1-2 weeks on the latest git, I will close the bug.

From comment #5, that error path indicates an invalid inode number in the unlinked list. i.e. the inode being freed is being inserted into the unlinked list, but the current inode the list points to is non-existent.
Given that the problem is not consistent and there is no corruption on disk, this smells of memory corruption of some kind. I think that turning on all the memory debugging and poisoning would be a good idea. Also, if it happens again and xfs_repair puts a disconnected inode into lost+found, can you dump the inode contents for us using:

# xfs_db -r -c 'inode <inode #>' -c p <dev>

The inode # is the name of the inode in lost+found.

Most probably it can't be hardware corruption: I have 6 Sun servers, and only the 2 of them that were upgraded to 2.6.25-rcX are having these issues. I will wait to see if the issue happens again.

I just got more dmesg messages while upgrading to rc9, but without debug, and I didn't have time to run xfs_db. Maybe it will help somehow. I have now enabled debug, and if I catch the issue in time I will also run xfs_db. Here is the dmesg:

[  317.410931] xfs_inotobp: xfs_imap() returned an error 22 on sde1.  Returning error.
[  317.410931] xfs_iunlink_remove: xfs_inotobp() returned an error 22 on sde1.  Returning error.
[  317.410931] xfs_inactive: xfs_ifree() returned an error = 22 on sde1
[  317.410931] xfs_force_shutdown(sde1,0x1) called from line 1737 of file fs/xfs/xfs_vnodeops.c. Return address = 0xf8b7985f
[  317.555550] Filesystem "sde1": I/O Error Detected. Shutting down filesystem: sde1
[  317.555731] Please umount the filesystem, and rectify the problem(s)
[  318.724694] xfs_inotobp: xfs_imap() returned an error 22 on sdd1.  Returning error.
[  318.724702] xfs_iunlink_remove: xfs_inotobp() returned an error 22 on sdd1.  Returning error.
[  318.724706] xfs_inactive: xfs_ifree() returned an error = 22 on sdd1
[  318.724712] xfs_force_shutdown(sdd1,0x1) called from line 1737 of file fs/xfs/xfs_vnodeops.c. Return address = 0xf8b7985f
[  318.791361] xfs_difree: xfs_inobt_lookup_le returned() an error 5 on sdd1.  Returning error.
[  318.896324] xfs_difree: xfs_inobt_lookup_le returned() an error 5 on sdd1.  Returning error.
[  319.183699] xfs_iunlink_remove: xfs_itobp() returned an error 5 on sdd1.  Returning error.
[  319.183743] xfs_iunlink_remove: xfs_trans_read_buf() returned an error 5 on sdd1.  Returning error.
[  319.228470] Filesystem "sdd1": I/O Error Detected. Shutting down filesystem: sdd1
[  319.228690] Please umount the filesystem, and rectify the problem(s)
[ 1110.111013] xfs_force_shutdown(sde1,0x1) called from line 420 of file fs/xfs/xfs_rw.c. Return address = 0xf8b79e5a
[ 1120.966164] xfs_force_shutdown(sdd1,0x1) called from line 420 of file fs/xfs/xfs_rw.c. Return address = 0xf8b79e5a

I caught it. Some memory debugging is enabled (R/O protection and VM debug); the other options I cannot enable because they would cause a major slowdown. Here is the output with debug enabled in the function in question:

[18882.132386] Filesystem "sde1": xfs_dilocate: agno (127) >= mp->m_sb.sb_agcount (16)
[18882.132561] Filesystem "sde1": xfs_dilocate: agbno (0x1fffff) >= mp->m_sb.sb_agblocks (0x11171f)
[18882.132900] Pid: 1987, comm: squid Not tainted 2.6.25-rc9-build-0027 #6
[18882.132924]  [<f8b61487>] xfs_dilocate+0x163/0x400 [xfs]
[18882.133139]  [<f8b673b9>] xfs_imap+0x12c/0x1d9 [xfs]
[18882.133230]  [<c013b358>] ? mark_held_locks+0x41/0x5c
[18882.133501]  [<c013b4d3>] ? trace_hardirqs_on+0xe1/0x102
[18882.133541]  [<f8b67bf6>] xfs_inotobp+0x31/0x1b1 [xfs]
[18882.133611]  [<f8b85f0c>] ? xfs_buf_read_flags+0x14/0x78 [xfs]
[18882.133728]  [<f8b7b8b2>] ? xfs_trans_read_buf+0x25f/0x29d [xfs]
[18882.133774]  [<f8b68382>] xfs_iunlink_remove+0x272/0x350 [xfs]
[18882.133966]  [<f8b69831>] xfs_ifree+0x15/0xe5 [xfs]
[18882.134016]  [<f8b823a4>] xfs_inactive+0x312/0x3cb [xfs]
[18882.134068]  [<f8b8a48a>] xfs_fs_clear_inode+0x7d/0xbe [xfs]
[18882.134109]  [<c0170e6d>] clear_inode+0x63/0xb2
[18882.134121]  [<c0170f35>] generic_delete_inode+0x79/0xcb
[18882.134133]  [<c0170f99>] generic_drop_inode+0x12/0x137
[18882.134147]  [<c0170661>] iput+0x63/0x66
[18882.134159]  [<c016907d>] do_unlinkat+0xb9/0x12b
[18882.134192]  [<c013ea70>] ? sys_futex+0xb8/0xce
[18882.134202]  [<c01037fd>] ? sysenter_past_esp+0x9a/0xa5
[18882.134210]  [<c013b4d3>] ? trace_hardirqs_on+0xe1/0x102
[18882.134222]  [<c01690ff>] sys_unlink+0x10/0x12
[18882.134227]  [<c01037c2>] sysenter_past_esp+0x5f/0xa5
[18882.134265]  =======================
[18882.134271] xfs_inotobp: xfs_imap() returned an error 22 on sde1.  Returning error.
[18882.134276] xfs_iunlink_remove: xfs_inotobp() returned an error 22 on sde1.  Returning error.
[18882.134282] xfs_inactive: xfs_ifree() returned an error = 22 on sde1
[18882.134287] xfs_force_shutdown(sde1,0x1) called from line 1737 of file fs/xfs/xfs_vnodeops.c. Return address = 0xf8b823eb
[18882.203955] Filesystem "sde1": I/O Error Detected. Shutting down filesystem: sde1
[18882.204206] Please umount the filesystem, and rectify the problem(s)

So, this is an invalid inode number; Dave thinks it's a bad number on the unlinked list, I think? This is where slab poisoning would be good, to catch use-after-frees and the like, just in case. If repair turns up another inode, can you do as Dave suggested in comment #8?

The agno and agbno values (both derived from the (bad) inode number itself) are a bit interesting: 127 and 0x1fffff, which are both all 1's in binary. It's too bad it didn't print the actual original inode number; I wonder if this could possibly be a negative error code that somehow snuck in as an inode number?

No, on repair there are no lost inodes this time, so there is nothing to debug with xfs_db :-(

Dave may have better ideas, but I'd be interested in something like this in xfs_dilocate, just before the xfs_stack_trace() and EINVAL return, to see what actual inode number we got, just in case it offers any clues:

xfs_fs_cmn_err(CE_ALERT, mp, "inode number 0x%llx\n", ino);

sgi guys, do you think there's any chance that http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=c319b58b13bb22f9a2478825b06c641c825f51ec is to blame here?
Maybe I can just revert this patch and try without it? But it seems that for now it is getting more and more difficult to trigger the bug :-(

Eric, I don't think that patch has anything to do with it. Denys, it's still worth trying a revert, though I think that adding more printk()s in the error paths to identify the inode and the failure path would be more useful....

Can you reproduce the bug w/o aufs?

The issue is that it is happening less and less. Since the last report, nothing. I cannot disable aufs; it will most probably not handle my load. But I will try. Also I will upgrade today to the release. aufs is squid's way of handling files (threads, instead of linear file-by-file processing (UFS)). The kernel is not patched; it is fully vanilla.

Never mind on disabling aufs; I thought you were using the kernel module. Thanks, -Eric

Here it is with the inode number:

[  342.535990] Filesystem "sdd1": xfs_dilocate: agno (127) >= mp->m_sb.sb_agcount (16)
[  342.535996] Filesystem "sdd1": xfs_dilocate: agbno (0x1fffff) >= mp->m_sb.sb_agblocks (0x11171f)
[  342.536995] Filesystem "sdd1": inode number 0xffffffff
[  342.537178] Pid: 1690, comm: squid Not tainted 2.6.25-build-0028 #16
[  342.537181]  [<f8b0b0d2>] ? xfs_fs_cmn_err+0x1b/0x21 [xfs]
[  342.537223]  [<f8b0d4c9>] xfs_dilocate+0x17d/0x415 [xfs]
[  342.537324]  [<f8b133f5>] xfs_imap+0x12c/0x1d9 [xfs]
[  342.537364]  [<c013a4b4>] ? mark_held_locks+0x41/0x5c
[  342.537384]  [<c013a62f>] ? trace_hardirqs_on+0xe1/0x102
[  342.537425]  [<f8b13c32>] xfs_inotobp+0x31/0x1b1 [xfs]
[  342.537459]  [<f8b31f48>] ? xfs_buf_read_flags+0x14/0x78 [xfs]
[  342.537505]  [<f8b278ee>] ? xfs_trans_read_buf+0x25f/0x29d [xfs]
[  342.537547]  [<f8b143be>] xfs_iunlink_remove+0x272/0x350 [xfs]
[  342.537636]  [<f8b1586d>] xfs_ifree+0x15/0xe5 [xfs]
[  342.537687]  [<f8b2e3e0>] xfs_inactive+0x312/0x3cb [xfs]
[  342.537741]  [<f8b364c6>] xfs_fs_clear_inode+0x7d/0xbe [xfs]
[  342.537778]  [<c0170000>] clear_inode+0x63/0xb2
[  342.537788]  [<c01700c8>] generic_delete_inode+0x79/0xcb
[  342.537800]  [<c017012c>] generic_drop_inode+0x12/0x137
[  342.537812]  [<c016f7f4>] iput+0x63/0x66
[  342.537821]  [<c01681d1>] do_unlinkat+0xb9/0x12b
[  342.537861]  [<c013dbcc>] ? sys_futex+0xb8/0xce
[  342.537873]  [<c01037f9>] ? sysenter_past_esp+0x9a/0xa5
[  342.537883]  [<c013a62f>] ? trace_hardirqs_on+0xe1/0x102
[  342.537898]  [<c0168253>] sys_unlink+0x10/0x12
[  342.537904]  [<c01037be>] sysenter_past_esp+0x5f/0xa5
[  342.537957]  =======================
[  342.537994] xfs_inotobp: xfs_imap() returned an error 22 on sdd1.  Returning error.
[  342.537998] xfs_iunlink_remove: xfs_inotobp() returned an error 22 on sdd1.  Returning error.
[  342.538002] xfs_inactive: xfs_ifree() returned an error = 22 on sdd1
[  342.538006] xfs_force_shutdown(sdd1,0x1) called from line 1737 of file fs/xfs/xfs_vnodeops.c. Return address = 0xf8b2e427
[  342.541407] xfs_iunlink_remove: xfs_trans_read_buf() returned an error 5 on sdd1.  Returning error.
[  342.868966] Filesystem "sdd1": I/O Error Detected. Shutting down filesystem: sdd1
[  342.869066] Please umount the filesystem, and rectify the problem(s)

That is the last dmesg, from the 2.6.25 release.

One more, and this time I did run xfs_db:

        - moving disconnected inodes to lost+found ...
disconnected inode 201364285, moving to lost+found
Phase 7 - verify and correct link counts...
done

Proxy-Karam114 ~ # xfs_db -r -c 'inode 201364285' -c p /dev/sdd1
core.magic = 0x494e
core.mode = 0100640
core.version = 1
core.format = 2 (extents)
core.nlinkv1 = 1
core.uid = 101
core.gid = 101
core.flushiter = 0
core.atime.sec = Tue Apr 22 09:39:11 2008
core.atime.nsec = 497758374
core.mtime.sec = Tue Apr 22 15:18:21 2008
core.mtime.nsec = 843550184
core.ctime.sec = Tue Apr 22 15:19:16 2008
core.ctime.nsec = 945175699
core.size = 0
core.nblocks = 0
core.extsize = 0
core.nextents = 0
core.naextents = 0
core.forkoff = 0
core.aformat = 2 (extents)
core.dmevmask = 0
core.dmstate = 0
core.newrtbm = 0
core.prealloc = 0
core.realtime = 0
core.immutable = 0
core.append = 0
core.sync = 0
core.noatime = 0
core.nodump = 0
core.rtinherit = 0
core.projinherit = 0
core.nosymlinks = 0
core.extsz = 0
core.extszinherit = 0
core.nodefrag = 0
core.filestream = 0
core.gen = 151
next_unlinked = null
u = (empty)

OK, so the inode number is -1... Did you use %llx to print it, i.e. is it really 32 bits of F's, not 64? Hmm, are we getting hold of a NULLAGINO somehow... I wonder if it would be crazy to set NULLAGINO to ((xfs_agino_t)-2) to test this theory and see if you got 0xfffffffe. I wouldn't do that yet unless Dave chimes in on whether or not I'm crazy :) I haven't really had time to look at the code to see if this makes sense.

I just copy&pasted the request from message #15. I cannot put any risky patches on this server, and I have a trip to another country soon, so I will probably have to switch to another filesystem, because I cannot handle such surprises from abroad. I will try to write a script to handle that.

It probably is not risky, but I was hoping for a second opinion. I do understand being hesitant to keep changing the code on your server, and I appreciate the debugging you've done so far. It may be enough that with code inspection this bug can still be found.

When the filesystems have these problems, do you know if they went through log recovery during mount, prior to the error?
If I just unmount the FS and mount it back, it will crash after a while with other kinds of errors. I checked that. So I have to unmount, mount (so the log will be cleared), unmount again, xfs_repair, then mount. Then it works fine for 1-2 days.

An incore inode saying its inode number is -1 (NULLFSINO)? That's a new one. That would also explain the:

"xfs_dilocate: agno (127) >= mp->m_sb.sb_agcount (16)"

error message as well. The reason we are trying to look up the inode is that the inode number matches the base unlinked inode list number (of NULLAGINO). This actually means "this unlinked list is empty", so we are right to be shutting down here, as there's definitely in-memory corruption of some kind. FWIW, this is definitely an in-memory problem, as the inode number does not exist on disk at all...

What I don't understand yet is why, if the inode number is -1, we didn't abort earlier in xfs_iunlink_remove when we tried to read way beyond the end of the device to pull in the AGI buffer.... Hmmmm.

I suspect that this is not a full 64-bit word corruption. I think the lower 32 bits of the inode number have been overwritten with 0xffffffff. The agno printed in the above error message should have been much larger than 127 (0x7f) if it was a full 64-bit word corruption. If we look at the agino - that's 25 bits of 1s - and the agno is 7 bits of 1s packed tightly against the agino. Hence we have 32 bits of 1s in a 64-bit word.

On top of that, the corruption had to happen between calculating the agno from ip->i_ino and calculating the agino from ip->i_ino. In between these we read the AGI buffer, so there's a big window for corruption to occur there (possible disk I/O). I think we are definitely looking at some kind of memory corruption occurring here, and we are no closer to isolating the cause at all. It's likely to be a use-after-free or some such issue.
Can you please turn on every single memory debugging option (slab poisoning, etc.) in the kernel to try to catch whatever is corrupting memory. Ideally, I'd like you to run with KDB or another debugger so you can dump the memory around the inode that has been corrupted, so we can see whether or not it's just this 32-bit word that has been corrupted.

<sigh> I'm just not with it today. Disregard my last add - I obviously can't tell the difference between xfs_inotobp and xfs_itobp in error messages.

Hum. This looks similar: http://pld.pastebin.com/f56c4c24a

Denys, how much time do you have before you bail out and can't run any more modified xfs code... Thanks, -Eric

Just for reference, repair output for that previous pastebin: http://pld.pastebin.com/f410a4f93

I can run it for now; even if I switch to another FS, if there is anything I can do in one crash, I will do it. But if I have to wait a few days for an answer, I cannot keep waking up at 3 at night to fix the crashed proxy.

> http://marc.info/?l=linux-kernel&m=120942140024743&w=2
> is it related?

No.
Marco, yes, it probably is related. Any information you can supply on how you tripped over this would be appreciated...

I might be hitting something similar here. It's an odd setup - 2.6.25 xfs backported to an older kernel; I need to try upstream. But if I run this test overnight...

#!/bin/bash
mkdir /mnt/test/dir1
mkdir /mnt/test/dir2
mkdir /mnt/test/dir3
mkdir /mnt/test/dir4
dd if=/dev/zero of=/mnt/test/dir1/file1 bs=1M count=20480 &
dd if=/dev/zero of=/mnt/test/dir2/file2 bs=1M count=20480 &
dd if=/dev/zero of=/mnt/test/dir3/file3 bs=1M count=20480 &
dd if=/dev/zero of=/mnt/test/dir4/file4 bs=1M count=20480 &
sync
sync
rm -f /mnt/test/dir1/file1 &
rm -f /mnt/test/dir2/file2 &
rm -f /mnt/test/dir3/file3 &
rm -f /mnt/test/dir4/file4 &
wait

...I eventually hit something similar. I changed NULLAGINO and NULLFSINO:

-#define NULLFSINO ((xfs_ino_t)-1)
-#define NULLAGINO ((xfs_agino_t)-1)
+#define NULLFSINO ((xfs_ino_t)-2)
+#define NULLAGINO ((xfs_agino_t)-3)

in both userspace & kernelspace, and when I trip it I get:

xfs_inotobp: xfs_imap() returned an error 22 on sdb5 for inode 4294967293.  Returning error.

so that is a NULLAGINO. -Eric

Further: I actually ran the above test under restricted memory (mem=512m) on a 4-cpu machine, in a mkfs; mount; run test; umount loop. I added more printks & a dump_stack and ran overnight; the stack dump is probably not that interesting, but:

Call Trace:
 [<ffffffff883f56e9>] :xfs:xfs_iunlink_remove+0x2c7/0x3df
 [<ffffffff884058aa>] :xfs:xfs_trans_reserve+0xea/0x1cc
 [<ffffffff883f6421>] :xfs:xfs_ifree+0x18/0xf3
 [<ffffffff8840df71>] :xfs:xfs_inactive+0x313/0x3d7
 [<ffffffff88416654>] :xfs:xfs_fs_clear_inode+0xa4/0xec
 [<ffffffff80022c2d>] clear_inode+0xd2/0x123
 [<ffffffff8002f340>] generic_delete_inode+0xde/0x143
 [<ffffffff8003c2f6>] do_unlinkat+0xd5/0x141
 [<ffffffff8005d229>] tracesys+0x71/0xe0
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

I also added a printk in xfs_iunlink_remove():

        xfs_trans_brelse(tp, last_ibp);
    }
    next_ino = XFS_AGINO_TO_INO(mp, agno, next_agino);
+   if (next_ino == 4294967293) {
+       printk("ip->i_ino %llu agino %u next_ino %llu next_agino %u agno %u bucket_index %d\n",
+              ip->i_ino, agino, next_ino, next_agino, agno, bucket_index);
+       dump_stack();
+   }
    error = xfs_inotobp(mp, tp, next_ino, &last_dip,
                &last_ibp, &last_offset);
    if (error) {

and got, in several runs (remember NULLAGINO is now "-3" in my kernel):

ip->i_ino 268640385 agino 204929 next_ino 4294967293 next_agino 4294967293 agno 2 bucket_index 1
ip->i_ino 134217857 agino 129 next_ino 4294967293 next_agino 4294967293 agno 1 bucket_index 1
ip->i_ino 402653313 agino 129 next_ino 4294967293 next_agino 4294967293 agno 3 bucket_index 1
ip->i_ino 134217857 agino 129 next_ino 4294967293 next_agino 4294967293 agno 1 bucket_index 1
ip->i_ino 402653313 agino 129 next_ino 4294967293 next_agino 4294967293 agno 3 bucket_index 1

The inode numbers are correct, FWIW:

132       /mnt/test/dir1/file1
268640385 /mnt/test/dir3/file3
134217857 /mnt/test/dir2/file2
402653313 /mnt/test/dir4/file4

I can hit it with CONFIG_XFS_DEBUG too:

/redhat/BUILD/xfs-kmod-0.5/xfs/xfs_inode.c, line: 2068
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at .../src/redhat/BUILD/xfs-kmod-0.5/xfs/support/debug.c:81
invalid opcode: 0000 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:0d.0/0000:03:00.0/irq
CPU 1
Modules linked in: xfs(U) autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 xfrm_nalgo crypto_api cpufreq_ondemand dm_multipath video sbs backlight i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport sg tg3 k8_edac edac_mc i2c_nforce2 k8temp hwmon i2c_core shpchp serio_raw pcspkr dm_snapshot dm_zero dm_mirror dm_mod qla2xxx scsi_transport_fc mptspi mptscsih scsi_transport_spi sd_mod scsi_mod mptbase ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 4540, comm: rm Tainted: G   2.6.18-88.el5 #1
RIP: 0010:[<ffffffff885dd23d>]  [<ffffffff885dd23d>] :xfs:assfail+0x1a/0x26
RSP: 0018:ffff8100025ddd08  EFLAGS: 00010286
RAX: 0000000000000095 RBX: 0000000000000001 RCX: ffffffff802ec9a8
RDX: ffffffff802ec9a8 RSI: 0000000000000000 RDI: ffffffff802ec9a0
RBP: ffff81000fdc4000 R08: ffffffff802ec9a8 R09: 0000000000000046
R10: ffff8100025dd9a8 R11: 0000000000000080 R12: ffff81001a2afd00
R13: ffff81000fdc4000 R14: ffff81000f2c1840 R15: 0000000000000000
FS:  00002b40c784d260(0000) GS:ffff81000116f7c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000003f552c2080 CR3: 000000000249e000 CR4: 00000000000006e0
Process rm (pid: 4540, threadinfo ffff8100025dc000, task ffff81000130d860)
Stack:  0000000000000000 ffffffff885b2c2d 0000000000000001 00000001025ddd48
 ffff8100025ddd94 ffff81001d8b4400 0000000000000001 000000000000000c
 0000000000000000 00000081fffffffe 0008000000010000 0000000000000000
Call Trace:
 [<ffffffff885b2c2d>] :xfs:xfs_iunlink_remove+0x1f2/0x586
 [<ffffffff885ca88a>] :xfs:xfs_trans_log_inode+0x86/0xd4
 [<ffffffff885b5510>] :xfs:xfs_ifree+0x12d/0x209
 [<ffffffff885d28ed>] :xfs:xfs_inactive+0x3b5/0x47c
 [<ffffffff885dba39>] :xfs:xfs_fs_clear_inode+0xa4/0x111
 [<ffffffff80022c2d>] clear_inode+0xd2/0x123
 [<ffffffff8002f340>] generic_delete_inode+0xde/0x143
 [<ffffffff8003c2f6>] do_unlinkat+0xd5/0x141
 [<ffffffff8005d229>] tracesys+0x71/0xe0
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Code: 0f 0b 68 59 0b 5f 88 c2 51 00 58 c3 41 55 83 e7 07 41 54 49
RIP  [<ffffffff885dd23d>] :xfs:assfail+0x1a/0x26
 RSP <ffff8100025ddd08>
<0>Kernel panic - not syncing: Fatal exception

OK, time to go actually read the code ;) It does seem to be some sort of a race.

Looking at comment #40, I presume this means that we are calling xfs_ifree on the same inode on multiple occasions. So, xfs_ifree() gets called on the last reference of an unlinked inode, and so it must call xfs_iunlink_remove to take the inode out of the unlinked list. The code checks the head of the list for a match; otherwise it traverses the list (found from the hash bucket in the agi) looking for the inode. And in our cases it doesn't find the inode that it was expecting to be there, as it manages to get to the end of the list. Is it a case of us trying to free the inode multiple times, so that we had already removed the inode from the list previously? Just trying to get up to speed on what this is about... --Tim

Tim, no, sorry - the printks in comment #40 were from separate runs. If you look at the script in comment #39, each inode only gets one unlink call. -Eric

:-) Okay, that makes more sense. I imagined that there was only one unlink system call per file, but I was wondering if we were getting to ifree multiple times somehow. I didn't realise it was multiple-run output ... oops. --Tim

From doing some bisection, I'm pretty sure this: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=a9759f2de38a3443d5107bddde03b4f3f550060e is the culprit.
I don't know if this is related, but after 20 days of uptime, Linux 2.6.25 has started logging these messages:

May 14 17:00:23 Pleiadi kernel: SKB BUG: Invalid truesize (260) len=134, sizeof(sk_buff)=164
May 14 17:00:23 Pleiadi kernel: SKB BUG: Invalid truesize (260) len=118, sizeof(sk_buff)=164
May 14 17:00:23 Pleiadi kernel: SKB BUG: Invalid truesize (324) len=166, sizeof(sk_buff)=164
May 14 17:00:23 Pleiadi kernel: SKB BUG: Invalid truesize (324) len=166, sizeof(sk_buff)=164
May 14 17:00:23 Pleiadi kernel: SKB BUG: Invalid truesize (260) len=118, sizeof(sk_buff)=164
May 14 17:00:23 Pleiadi kernel: SKB BUG: Invalid truesize (260) len=118, sizeof(sk_buff)=164
May 14 17:00:23 Pleiadi kernel: SKB BUG: Invalid truesize (292) len=166, sizeof(sk_buff)=164
May 14 17:00:23 Pleiadi kernel: SKB BUG: Invalid truesize (324) len=166, sizeof(sk_buff)=164
May 14 17:00:23 Pleiadi kernel: SKB BUG: Invalid truesize (324) len=166, sizeof(sk_buff)=164
May 14 17:00:24 Pleiadi kernel: SKB BUG: Invalid truesize (260) len=118, sizeof(sk_buff)=164

So my hunch is that this part of the patch:

@@ -1191,7 +1164,7 @@ _xfs_buf_ioapply(
	/* Special code path for reading a sub page size buffer in --
	 * we populate up the whole page, and hence the other metadata
	 * in the same page.  This optimization is only valid when the
	 * filesystem block size is not smaller than the page size.
	 */
	if ((bp->b_buffer_length < PAGE_CACHE_SIZE) &&
-	    (bp->b_flags & XBF_READ) && locking &&
+	    (bp->b_flags & XBF_READ) &&
	    (blocksize >= PAGE_CACHE_SIZE)) {
		bio = bio_alloc(GFP_NOIO, 1);

is allowing another metadata read to clobber our agi with on-disk data after it's been updated.
Some tracing showed:

<7>3438/2 xfs_iunlink ino 268640385 agibp ffff81001c1533c0
<7>3438/2 xfs_iunlink:1995: agi bp ffff81001c1533c0 ffff81000f19b400->agi_unlinked[1] set to agino 204929
<7>3437/2 xfs_iunlink_remove ino 134217857
<7>3437/2 xfs_iunlink_remove:2120: agi bp ffff81000e8a9840 ffff81001e1c1400->agi_unlinked[1] set to next_agino 4294967295
<7>3438/0 xfs_iunlink_remove ino 268640385
<4>3438/0 xfs_iunlink_remove:2078: agi bp ffff81001c1533c0 ffff81000f19b400->agi_unlinked[1] == NULLAGINO

... without any code assignments in between, we went from the proper value back to NULLAGINO (the on-disk value). I think this optimization is not honoring the locking for the other xfs_bufs around it, since the locking used to be page-based but is now per-bp. hch's patch at http://verein.lst.de/~hch/xfs-buffer-locking (modulo a missing brace) lets my testcase pass.
Thanks,
-Eric

While compiling under 2.6.25, I got an error after applying the patch:
CC [M] fs/xfs/linux-2.6/xfs_buf.o
fs/xfs/linux-2.6/xfs_buf.c: In function '_xfs_buf_lookup_pages':
fs/xfs/linux-2.6/xfs_buf.c:453: error: invalid storage class for function '_xfs_buf_map_pages'
fs/xfs/linux-2.6/xfs_buf.c:449: warning: ISO C90 forbids mixed declarations and code
fs/xfs/linux-2.6/xfs_buf.c:711: error: invalid storage class for function 'mem_to_page'
fs/xfs/linux-2.6/xfs_buf.c:975: error: invalid storage class for function 'xfs_buf_wait_unpin'
fs/xfs/linux-2.6/xfs_buf.c:1001: error: invalid storage class for function 'xfs_buf_iodone_work'
fs/xfs/linux-2.6/xfs_buf.c:1105: error: invalid storage class for function '_xfs_buf_ioend'
fs/xfs/linux-2.6/xfs_buf.c:1116: error: invalid storage class for function 'xfs_buf_bio_end_io'
fs/xfs/linux-2.6/xfs_buf.c:1152: error: invalid storage class for function '_xfs_buf_ioapply'
fs/xfs/linux-2.6/xfs_buf.c:1384: error: invalid storage class for function 'xfs_alloc_bufhash'
fs/xfs/linux-2.6/xfs_buf.c:1400: error: invalid storage class for function 'xfs_free_bufhash'
fs/xfs/linux-2.6/xfs_buf.c:1414: error: invalid storage class for function 'xfs_register_buftarg'
fs/xfs/linux-2.6/xfs_buf.c:1423: error: invalid storage class for function 'xfs_unregister_buftarg'
fs/xfs/linux-2.6/xfs_buf.c:1456: error: invalid storage class for function 'xfs_setsize_buftarg_flags'
fs/xfs/linux-2.6/xfs_buf.c:1489: error: invalid storage class for function 'xfs_setsize_buftarg_early'
fs/xfs/linux-2.6/xfs_buf.c:1507: error: invalid storage class for function 'xfs_mapping_buftarg'
fs/xfs/linux-2.6/xfs_buf.c:1540: error: invalid storage class for function 'xfs_alloc_delwrite_queue'
fs/xfs/linux-2.6/xfs_buf.c:1590: error: invalid storage class for function 'xfs_buf_delwri_queue'
fs/xfs/linux-2.6/xfs_buf.c:1640: error: invalid storage class for function 'xfs_buf_runall_queues'
fs/xfs/linux-2.6/xfs_buf.c:1648: error: invalid storage class for function 'xfsbufd_wakeup'
fs/xfs/linux-2.6/xfs_buf.c:1671: error: invalid storage class for function 'xfs_buf_delwri_split'
fs/xfs/linux-2.6/xfs_buf.c:1708: error: invalid storage class for function 'xfsbufd'
fs/xfs/linux-2.6/xfs_buf.c:1846: error: expected declaration or statement at end of input
make[2]: *** [fs/xfs/linux-2.6/xfs_buf.o] Error 1
make[1]: *** [fs/xfs] Error 2
make: *** [fs] Error 2
make: *** Waiting for unfinished jobs....
make[2]: *** [net/ipv4/netfilter] Interrupt
make[1]: *** [net/netfilter] Interrupt
make[1]: *** [net/ipv4] Interrupt
make: *** [net] Interrupt
make[3]: *** [drivers/char/ipmi/modules.order] Interrupt
make[2]: *** [drivers/char/ipmi] Interrupt
make[1]: *** [drivers/char] Interrupt
make: *** [drivers] Interrupt
It looks like the patch is invalid? For example:

-	if (blocksize < PAGE_CACHE_SIZE && !PagePrivate(page)) {
+	if (blocksize >= PAGE_CACHE_SIZE) {
+		if (flags & XBF_READ)
+			bp->b_flags |= _XBF_PAGE_LOCKED;

>>>>>>> Probably a brace is missing here? Or one should be removed 3 lines up.

+		else if (!PagePrivate(page)) {
 		if (test_page_region(page, offset, nbytes))
 			page_count++;
 	}
Yes, there's a missing brace (as noted in comment #49) :) Try:

http://www.linux.sgi.com/archives/xfs/2008-05/msg00190.html

-Eric

3 days now and it is still fine. Yes, it fixes the bug. Is it going to be in mainline?

Yep....

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=6ab455eeaff6893cd06da33843e840d888cdc04a

I think you can close this one now.